Success Story
Custom data. Fine-tuned models. Real-world fluency across multiple domains.
Multilingual Whisper Fine-Tuning for Global Media Intelligence
We collaborated with a global media analytics company to deliver a production-ready multilingual speech recognition system for high-accuracy transcription across Japanese, Korean, Mandarin, Cantonese, and French. The goal was to enable automated subtitle generation and content indexing in noisy, mixed-language media environments such as TV shows, films, and podcasts.
We designed a pipeline to collect and align over 1,200 hours of domain-specific speech data, sourced from licensed entertainment content, interviews, and public media archives. All audio was segmented, speaker-labeled, and aligned to exact subtitle timestamps. Data was packaged in Whisper-compatible format, enriched with token-based metadata for language switching and noise classification.
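To make the packaging step concrete, a minimal sketch of a subtitle-aligned manifest record is shown below. The field names and JSONL layout are illustrative assumptions, not the client's actual schema:

```python
import json

# Hypothetical manifest entry for one aligned audio segment; field names
# are illustrative, not the production schema.
def make_manifest_entry(audio_path, start_s, end_s, text, language, speaker, noise_tag):
    """Package one subtitle-aligned, speaker-labeled segment as a JSONL record."""
    return {
        "audio": audio_path,
        "start": round(start_s, 3),   # subtitle-aligned timestamps, seconds
        "end": round(end_s, 3),
        "text": text,
        "language": language,         # e.g. "ja", "ko", "zh", "yue", "fr"
        "speaker": speaker,
        "noise": noise_tag,           # e.g. "clean", "music", "crowd"
    }

entries = [
    make_manifest_entry("ep01.wav", 12.48, 15.90, "こんにちは", "ja", "spk_01", "music"),
]
manifest = "\n".join(json.dumps(e, ensure_ascii=False) for e in entries)
print(manifest)
```

One line per segment keeps the corpus streamable and easy to shard across fine-tuning workers.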
Fine-tuning was performed on a per-language basis using Whisper small and medium models, followed by targeted LoRA adapters for code-switching and latency-sensitive streaming scenarios. Evaluation included WER, CER, BLEU, and real-user transcription fluency scores.
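For reference, word error rate is the word-level edit distance between reference and hypothesis, divided by the reference length. A minimal self-contained sketch (production evaluation would normally normalize text first and use a library such as jiwer):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit distance over word sequences.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

print(wer("the cat sat on the mat", "the cat sat on mat"))  # one deletion / 6 words ≈ 0.167
```

CER is the same computation over characters instead of words, which is why both metrics are reported for languages like Mandarin and Japanese where word segmentation is ambiguous.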
The result? Sub-15% WER across five languages in noisy, real-world conditions, enabling the client to build internal AI subtitle pipelines and scale multilingual content indexing across markets.
Supervised Action Recognition for Competitive Sports Analysis
We developed a domain-specific action recognition system tailored for analyzing technical movements in competitive sports. The objective was to detect and classify precise athletic techniques, including transitions, takedowns, and holds, within full-length match videos.
Leveraging a fully supervised training approach, we collaborated with subject-matter experts to create a structured dataset featuring 100+ labeled action classes. Each video segment was manually annotated with start/end timestamps, action type, and contextual metadata to support accurate downstream modeling.
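The annotation structure described above can be sketched as a simple record type; the class and field names here are illustrative assumptions, not the project's actual schema:

```python
from dataclasses import dataclass, field

# Illustrative clip-level annotation record for supervised action recognition.
@dataclass
class ActionAnnotation:
    video_id: str
    start_ts: float          # seconds from video start
    end_ts: float
    action_class: str        # one of the 100+ expert-defined classes
    context: dict = field(default_factory=dict)  # e.g. match phase, athlete IDs

    def duration(self) -> float:
        return self.end_ts - self.start_ts

clip = ActionAnnotation("match_042.mp4", 81.2, 84.6, "single_leg_takedown",
                        {"phase": "neutral"})
print(clip.duration())
```

Explicit start/end timestamps let the same record drive both clip-level training and temporal-localization evaluation.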
The final model, built on top of a transformer-based video backbone, achieved 78% top-1 accuracy on clip-level predictions and supported real-time inference. A custom labeling interface was also delivered to enable rapid expansion of the dataset and human-in-the-loop quality control.
Sneakers Radar: Real-Time Inventory Intelligence
Sneakers Radar tracks sneaker stock levels and pricing across global retailers such as Nike, JD Sports, and Foot Locker. It aggregates live product data into a unified dashboard, providing real-time analytics on inventory, restocks, and discounts.
Built on a large-scale distributed web crawling architecture, Sneakers Radar continuously monitors product availability and pricing across multiple global retailers with precision, scalability, and real-time reliability.
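The restock and discount alerts reduce to diffing successive inventory snapshots per SKU. A minimal sketch of that idea, with illustrative field names rather than the production logic:

```python
# Compare two {sku: {"stock": int, "price": float}} snapshots and emit events.
def detect_events(previous: dict, current: dict) -> list:
    events = []
    for sku, now in current.items():
        before = previous.get(sku)
        if before is None:
            events.append(("new_listing", sku))
        elif before["stock"] == 0 and now["stock"] > 0:
            events.append(("restock", sku))
        elif now["price"] < before["price"]:
            events.append(("discount", sku))
    return events

prev = {"AJ1-CHI": {"stock": 0, "price": 180.0}}
curr = {"AJ1-CHI": {"stock": 24, "price": 180.0},
        "DUNK-PANDA": {"stock": 5, "price": 110.0}}
print(detect_events(prev, curr))  # → [('restock', 'AJ1-CHI'), ('new_listing', 'DUNK-PANDA')]
```

At scale the same comparison runs per retailer feed, with the crawler supplying fresh snapshots and the dashboard consuming the event stream.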
Talk2Dom β AI-Powered Element Locator
Talk2Dom transforms natural language into browser automation by identifying DOM elements intelligently. It supports Selenium WebDriver and Playwright, making element location effortless through an OpenAI-compatible API.
Powered by state-of-the-art language models, Talk2Dom empowers QA engineers, automation developers, and data teams to interact with complex web interfaces through natural language, boosting productivity and simplifying workflows at scale.
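The core idea can be sketched as: send the page HTML plus a natural-language description to an LLM and get back a selector. This is not Talk2Dom's actual API; the `locate` function, its prompt, and the `ask_llm` callable (standing in for any OpenAI-compatible chat call) are all hypothetical:

```python
# Illustrative natural-language-to-selector sketch; NOT Talk2Dom's real API.
def locate(html: str, instruction: str, ask_llm) -> str:
    """Ask a language model for a CSS selector matching the instruction."""
    prompt = (
        "Given this HTML, return only a CSS selector for the element "
        f"described as: {instruction!r}\n\n{html}"
    )
    return ask_llm(prompt).strip()

# With a real backend, the selector would feed straight into Selenium, e.g.
# driver.find_element(By.CSS_SELECTOR, locate(driver.page_source, "the login button", ask_llm))
fake_llm = lambda prompt: "button#login\n"
print(locate("<button id='login'>Sign in</button>", "the login button", fake_llm))
# → button#login
```

Keeping the LLM call behind a plain callable is what makes the approach backend-agnostic: any OpenAI-compatible endpoint can be swapped in.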