Chris Tegho

LLMs / VLMs / graph knowledge systems / small-model training

PROJECTS

Live video understanding with vision-language models

For Unsabotage, I have been exploring what it takes to make a video stream legible to a local model in real time. Can a small VLM describe movement as it unfolds, without depending on a remote API or an offline annotation pipeline?

I built a video sampler that batches temporal context into VLM requests, outputs movement descriptions, and runs Qwen3-VL GGUF models locally through a llama-server-compatible API. I also experimented with a V-JEPA path for comparing generated descriptions with learned video representations.
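The sampler's core decision is which frames from a rolling buffer to pack into a single VLM request. A minimal sketch of one reasonable policy (evenly spaced, stride-centred sampling) is below; the function name and signature are illustrative, not the project's actual API.

```python
def sample_frame_indices(buffer_len: int, n_frames: int) -> list[int]:
    """Pick n_frames evenly spaced indices from a rolling buffer of
    buffer_len frames, so one request spans the whole temporal window."""
    if buffer_len <= n_frames:
        return list(range(buffer_len))
    step = buffer_len / n_frames
    # Centre each sample within its stride to avoid clustering at the edges.
    return [int(step * i + step / 2) for i in range(n_frames)]
```

The selected frames would then be encoded and sent as one multi-image request to the local llama-server endpoint, giving the model temporal context without streaming every frame.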

VLMs / live video understanding / Qwen3-VL / V-JEPA / TTS

Retrieval-augmented generation for post-sales intelligence

At gregthe.ai, I worked on making post-sales knowledge searchable enough for an LLM to reason over it usefully. This meant building retrieval from scratch: deciding what should be retrieved, how it should be embedded, and how failures should be diagnosed.

I architected and deployed a scalable multi-tenant RAG system, designing the ingestion, embedding, indexing, and retrieval optimization pipeline that moved the product from no retrieval capability to production insight surfacing over growing SaaS document repositories.
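Stripped of the production concerns (multi-tenancy, ingestion, index maintenance), the retrieval core reduces to ranking chunk embeddings by similarity to a query embedding. A minimal sketch, with the real embedding model and vector index replaced by plain lists:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query_vec: list[float], index: list[tuple[str, list[float]]], k: int = 3) -> list[str]:
    """index: (chunk_id, embedding) pairs. Returns chunk ids ranked by similarity."""
    scored = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [chunk_id for chunk_id, _ in scored[:k]]
```

In production this brute-force scan is replaced by an approximate nearest-neighbour index, but the diagnostic questions stay the same: are the right chunks embedded, and do they actually rank above the noise?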

RAG / embeddings / indexing / retrieval optimization / SaaS

Video-centric multimodal policy violation detection

At Unitary, I worked on detecting policy violations in video: representing temporal evidence while still drawing on text, OCR, and ASR whenever language carried part of the signal.

I designed and deployed a video-centric multimodal transformer system combining frame-level visual models, temporal transformers, and text/audio-derived encoders. The system improved detection across moderation categories and increased confident no-violation predictions in production.
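The actual system uses temporal transformers over frame features; as a much simpler stand-in, the fusion idea can be sketched as pooling per-frame scores over time and blending them with text- and audio-derived scores. The weights and function below are purely illustrative.

```python
def pool_clip_score(frame_scores: list[float], text_score: float,
                    audio_score: float, w=(0.6, 0.25, 0.15)) -> float:
    """Late-fusion sketch: pool per-frame violation scores over time,
    then blend with text- and audio-derived scores. Weights are illustrative."""
    # Mean pooling smooths frame-level noise; max pooling would instead
    # favour brief, high-confidence events.
    visual = sum(frame_scores) / len(frame_scores)
    wv, wt, wa = w
    return wv * visual + wt * text_score + wa * audio_score
```

A temporal transformer replaces the fixed mean pool with learned attention over the frame sequence, which is what lets the full system localise evidence rather than just average it away.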

video-language models / multimodal transformers / ASR / OCR

Robust video object and motion detection

At Calipsa, I worked on video models that had to be useful under messy deployment conditions: very short CCTV clips, low light, low resolution, thermal cameras, occlusions, compression artifacts, and distribution shifts across sites.

I led the end-to-end development of object and motion detection systems for alarm filtering, designing the data pipeline, evaluation framework, deployment metrics, and robustness checks needed to reduce false alarms while preserving very high recall.
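The tension in alarm filtering is that every filtered clip saves operator time, but a missed real event is far more costly. One way to operationalise "reduce false alarms while preserving very high recall" is to pick the highest score threshold whose recall on labelled data stays above a floor. A hedged sketch (names and signature are illustrative):

```python
def pick_threshold(scores: list[float], labels: list[int], min_recall: float = 0.99) -> float:
    """scores: model confidence that a clip contains a real event; labels: 1 = real.
    Returns the highest threshold whose recall on the labelled set >= min_recall."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    if not positives:
        return 0.0
    # Walk candidate thresholds from strictest to loosest; the first one
    # that satisfies the recall floor filters the most alarms.
    for t in sorted(set(scores), reverse=True):
        recall = sum(s >= t for s in positives) / len(positives)
        if recall >= min_recall:
            return t
    return 0.0
```

In practice this selection has to be validated per site, since distribution shifts across cameras can silently move the operating point.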

computer vision / video understanding / object detection / recall

Few-shot tear gas canister detection

With Forensic Architecture, I worked on detecting tear-gas canisters in videos for human-rights investigations. The research question was how to make object detection useful when the training set is far smaller than conventional CNN detectors expect.

I experimented with few-shot detectors trained from as few as 10 images, scaling up to 300 canister examples, and tested synthetic augmentation for improving generalization in open-source investigation footage.
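One common form of synthetic augmentation for detection is cut-and-paste: compositing cut-out object crops onto background frames at random scales and positions, generating bounding-box annotations for free. The sketch below shows only the geometry of that step, with illustrative names; it is not the project's actual augmentation code.

```python
import random

def synth_paste_bbox(bg_size: tuple[int, int], obj_size: tuple[int, int],
                     rng: random.Random, scale_range=(0.5, 1.5)) -> tuple[int, int, int, int]:
    """Choose a scale and position for pasting a cut-out object crop onto a
    background frame; returns (x, y, w, h) for the synthetic annotation."""
    bg_w, bg_h = bg_size
    obj_w, obj_h = obj_size
    s = rng.uniform(*scale_range)
    # Clamp so the scaled crop always fits inside the background.
    w = max(1, min(bg_w, int(obj_w * s)))
    h = max(1, min(bg_h, int(obj_h * s)))
    x = rng.randint(0, bg_w - w)
    y = rng.randint(0, bg_h - h)
    return (x, y, w, h)
```

The compositing itself (blending, lighting, motion blur) is where most of the realism, and hence most of the generalisation benefit, comes from.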

few-shot learning / object detection / computer vision / WACV

Bayesian deep reinforcement learning for dialogue

In the Dialogue Systems group at Cambridge, I studied uncertainty in deep reinforcement learning for dialogue policy optimization. The motivation was whether neural policies could become more sample-efficient by knowing when they were uncertain.

I implemented Bayes-by-Backprop, dropout, concrete dropout, bootstrapped ensembles, and alpha-divergences in PyDial, leading to ICASSP and NeurIPS Bayesian Deep Learning workshop publications. The internal write-up is available here.
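The core move in Bayes-by-Backprop is the reparameterisation trick: each weight is sampled as w = mu + softplus(rho) * eps with eps ~ N(0, 1), so gradients flow through the variational parameters mu and rho rather than through the sampling itself. A minimal sketch of that one step:

```python
import math
import random

def softplus(rho: float) -> float:
    # sigma = log(1 + exp(rho)) guarantees a positive standard deviation.
    return math.log1p(math.exp(rho))

def sample_weight(mu: float, rho: float, rng: random.Random) -> float:
    """Reparameterised weight sample: w = mu + softplus(rho) * eps, eps ~ N(0, 1).
    Because eps is drawn independently, d w / d mu and d w / d rho are
    deterministic, which is what makes the variational objective trainable."""
    eps = rng.gauss(0.0, 1.0)
    return mu + softplus(rho) * eps
```

The uncertainty the policy reports then comes from the spread of outputs across repeated weight samples, which is what the dropout, ensemble, and alpha-divergence variants estimate by other means.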

Bayesian neural networks / deep RL / dialogue systems / ICASSP

Applied ML mentoring for scientific modelling

As a project mentor for the Schmidt Data for Science Residency Program, I helped researchers turn scientific questions into modelable ML problems. The projects ranged from nationwide health-register modelling to language analysis of scholarly discourse.

I supervised work on comorbidity prediction, topic modelling, sequence-to-sequence demand forecasting, and protein sequence fitness prediction for enzymatic plastic degradation, with an emphasis on evaluation under scientific constraints.

topic models / seq2seq / health data / protein sequences

RESEARCH

Research interests

Video-language models, LLMs, RL, Bayesian modelling, computer vision for video understanding, object and movement detection, graph knowledge systems, agent workflows, retrieval systems, and practical methods for running and training smaller models.

VLMs / RL / Bayesian ML / small models / graphs / agents

Publications

  • D'Cruz, A.*, Tegho, C.*, Greaves, S.*, and Kermode, L. (2022). Detecting Tear Gas Canisters With Limited Training Data. IEEE/CVF Winter Conference on Applications of Computer Vision (WACV).
  • Tegho, C., Budzianowski, P., and Gasic, M. (2018). Benchmarking Uncertainty Estimates With Deep Reinforcement Learning for Dialogue Policy Optimisation. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Best Student Paper Award.
  • Tegho, C., Budzianowski, P., and Gasic, M. (2017). Uncertainty Estimates for Efficient Neural Network-based Dialogue Policy Optimisation. Bayesian Deep Learning Workshop, NeurIPS.
  • Tegho, C. Bayes By Backprop Neural Networks for Dialogue Management. MPhil thesis, Machine Learning, Speech and Language Technology, University of Cambridge.
  • Ploix, B., Rashid, M., Agrawal, S., D'Cruz, A., Tegho, C., and Veselev, A. (2021). Method and system for reviewing and analysing video alarms (GB Patent Application No. GB2585919A). Calipsa Ltd. https://patents.google.com/patent/GB2585919A/en

ABOUT

I build machine learning systems: fine-tuned models, graph knowledge systems, agent workflows, video-language models, RL, and retrieval systems. I am interested in video understanding, LLMs, agentic AI, designing smaller models, and making model outputs traceable to structured evidence. I work mainly in Python and JavaScript, with TensorFlow, PyTorch, JAX, and production ML tooling.