Success Stories
Custom data. Fine-tuned models. Real-world fluency across multiple domains.
Multilingual Whisper Fine-Tuning for Global Media Intelligence
We collaborated with a global media analytics company to deliver a production-ready multilingual speech recognition system for high-accuracy transcription across Japanese, Korean, Mandarin, Cantonese, and French. The goal was to enable automated subtitle generation and content indexing in noisy, mixed-language media environments such as TV shows, films, and podcasts.
We designed a pipeline to collect and align over 1,200 hours of domain-specific speech data, sourced from licensed entertainment content, interviews, and public media archives. All audio was segmented, speaker-labeled, and aligned to exact subtitle timestamps, then packaged in a Whisper-compatible format and enriched with token-based metadata for language switching and noise classification.
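For illustration, here is a minimal sketch of what such a packaging step can look like using Hugging Face datasets and the Whisper processor. The file paths, metadata fields, and record schema are hypothetical stand-ins, not the client's actual pipeline.

```python
# Minimal sketch: package segmented, subtitle-aligned audio into a
# Whisper-compatible Hugging Face dataset. Paths, metadata fields,
# and the example record are illustrative, not the real schema.
from datasets import Dataset, Audio
from transformers import WhisperProcessor

processor = WhisperProcessor.from_pretrained("openai/whisper-small")

records = [
    {
        "audio": "clips/ep01_0001.wav",     # hypothetical segment path
        "text": "字幕に合わせた書き起こし",     # subtitle-aligned transcript
        "language": "japanese",
        "speaker": "spk_03",                # speaker label from diarization
        "noise_class": "studio",            # noise-condition tag
    },
    # ... one record per aligned segment
]

ds = Dataset.from_list(records).cast_column("audio", Audio(sampling_rate=16000))

def prepare(batch):
    # Log-Mel input features expected by Whisper's encoder.
    audio = batch["audio"]
    batch["input_features"] = processor(
        audio["array"], sampling_rate=audio["sampling_rate"]
    ).input_features[0]
    # The tokenizer prepends the language and task tokens, which is how
    # per-segment language metadata reaches the model during training.
    processor.tokenizer.set_prefix_tokens(
        language=batch["language"], task="transcribe"
    )
    batch["labels"] = processor.tokenizer(batch["text"]).input_ids
    return batch

ds = ds.map(prepare, remove_columns=["audio", "text"])
```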
Fine-tuning was performed on a per-language basis using the Whisper small and medium models, followed by targeted LoRA adapters for code-switching and latency-sensitive streaming scenarios. Evaluation covered word error rate (WER), character error rate (CER), BLEU, and real-user transcription fluency scores.
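As a rough illustration of the adapter stage, the sketch below attaches a LoRA adapter to a Whisper checkpoint with the peft library. The rank, target modules, and checkpoint name are illustrative choices rather than the production configuration.

```python
# Minimal sketch: add a LoRA adapter to a Whisper checkpoint using peft.
# Hyperparameters here are assumptions, not the production settings.
from transformers import WhisperForConditionalGeneration
from peft import LoraConfig, get_peft_model

model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-medium")

lora_config = LoraConfig(
    r=16,                                 # adapter rank
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # attention projections
    lora_dropout=0.05,
    bias="none",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically a small fraction of the full model
```

Because only the adapter weights are trained, separate adapters can be swapped in per scenario (for example, one tuned for code-switched dialogue and one for low-latency streaming) without duplicating the base model.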
The result? Sub-15% WER across five languages in noisy, real-world conditions — enabling the client to build internal AI subtitle pipelines and scale multilingual content indexing across markets.
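For teams that want to reproduce the headline metric on their own transcripts, WER and CER can be computed with the jiwer library, as sketched below. The reference and hypothesis strings are placeholders; for languages written without spaces, such as Japanese, Mandarin, and Cantonese, CER is usually the more meaningful of the two.

```python
# Minimal sketch: compute word error rate (WER) and character error
# rate (CER) for a batch of transcripts with jiwer. The strings below
# are placeholder examples, not client data.
import jiwer

references = ["the quick brown fox", "elle est arrivée en retard"]
hypotheses = ["the quick brown fox", "elle est arrivé en retard"]

print(f"WER: {jiwer.wer(references, hypotheses):.3f}")
print(f"CER: {jiwer.cer(references, hypotheses):.3f}")
```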
Supervised Action Recognition for Competitive Sports Analysis
We developed a domain-specific action recognition system tailored for analyzing technical movements in competitive sports. The objective was to detect and classify precise athletic techniques—including transitions, takedowns, and holds—within full-length match videos.
Leveraging a fully supervised training approach, we collaborated with subject-matter experts to create a structured dataset featuring 100+ labeled action classes. Each video segment was manually annotated with start/end timestamps, action type, and contextual metadata to support accurate downstream modeling.
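To make the annotation structure concrete, the sketch below expresses one such record as a Python dataclass. The field names and example values are illustrative, not the project's actual schema.

```python
# Minimal sketch: a structured annotation record with start/end timestamps,
# action class, and contextual metadata. Fields and values are hypothetical.
from dataclasses import dataclass, field

@dataclass
class ActionAnnotation:
    video_id: str
    start_sec: float      # segment start within the full match video
    end_sec: float        # segment end
    action_class: str     # one of 100+ expert-defined technique labels
    annotator_id: str
    context: dict = field(default_factory=dict)  # e.g. round, score state

ann = ActionAnnotation(
    video_id="match_0042",
    start_sec=312.4,
    end_sec=316.9,
    action_class="single_leg_takedown",
    annotator_id="expert_07",
    context={"round": 2},
)
```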
The final model, built on top of a transformer-based video backbone, achieved 78% top-1 accuracy on clip-level predictions and supported real-time inference. A custom labeling interface was also delivered to enable rapid expansion of the dataset and human-in-the-loop quality control.
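As an illustration of clip-level inference with a transformer video backbone, the sketch below runs a pretrained VideoMAE classifier from Hugging Face as a stand-in for the delivered model. The checkpoint, the 16-frame clip length, and the dummy frames are all assumptions for the example.

```python
# Minimal sketch: clip-level action classification with a transformer
# video backbone (VideoMAE), used here as a stand-in for the delivered
# model. The checkpoint and clip shape are assumptions.
import numpy as np
import torch
from transformers import VideoMAEImageProcessor, VideoMAEForVideoClassification

ckpt = "MCG-NJU/videomae-base-finetuned-kinetics"
processor = VideoMAEImageProcessor.from_pretrained(ckpt)
model = VideoMAEForVideoClassification.from_pretrained(ckpt)

# A clip is a list of 16 RGB frames; random pixels stand in for real video.
clip = list(np.random.randint(0, 255, (16, 224, 224, 3), dtype=np.uint8))

inputs = processor(clip, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

pred = logits.argmax(-1).item()
print(model.config.id2label[pred])  # top-1 predicted action class
```

In production, the classification head would be fine-tuned on the expert-labeled action classes, and clips would be sampled in a sliding window over the match video so that predictions can be mapped back to timestamps.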