AI Researcher · Paris, France
PhD researcher specialising in efficient multimodal AI (MLLMs) — compressing Vision-Language & Vision-Language-Action models for the real world.
Introduction
I am a PhD researcher at Université Paris-Saclay (IBISC Lab & CEA), working under the supervision of Dr. Martyna Poreba, Dr. Michal Szczepanski, and Prof. Samia Bouchafa. My research sits at the intersection of computer vision, large multimodal models, and resource-constrained deployment.
My core focus is making Vision-Language Models (VLMs) and Vision-Language-Action (VLA) models efficient enough to run on edge hardware — without retraining. I design training-free methods that analyse and reduce redundant visual tokens before they reach the LLM backbone, with a particular emphasis on deployment on the NVIDIA Jetson Orin for autonomous systems.
I am a Demythif.AI Fellow under Institut DATAIA, co-funded by the Marie Skłodowska-Curie Actions (MSCA) and the European Union — a programme dedicated to responsible, human-centred AI.
Focus Areas
A training-free method that inserts SVD-based leverage scoring into the LLM forward pass to prune redundant visual tokens. Evaluated on LLaVA-1.5 (7B and 13B) and Qwen-VL across ScienceQA, GQA, POPE, AI2D, TextVQA, and MMBench. Achieves a 4× speedup when keeping 16 visual tokens instead of the 576-token baseline. The injection layer is configurable via environment variables.
Extending visual token pruning to VLA models for robotic manipulation, an ongoing effort building on the VLM work. Working with OpenVLA, π₀ (pi_0), and CogACT on the LIBERO simulation benchmark. A key challenge: spatial token importance in VLAs differs fundamentally from that in VLMs, and removing spatially critical tokens can cause task failure. Targeting an ICRA workshop, CoRL 2026, and NeurIPS 2026.
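As a rough illustration of what environment-variable configuration of the injection layer could look like, here is a minimal Python sketch. The variable names `PRUNE_INJECT_LAYER` and `PRUNE_KEEP_TOKENS`, the `PruneConfig` class, and the defaults are hypothetical placeholders, not the names used in the actual method.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class PruneConfig:
    inject_layer: int  # LLM layer index at which pruning is applied
    keep_tokens: int   # number of visual tokens kept after pruning


def load_prune_config(env: Mapping[str, str] = os.environ) -> PruneConfig:
    """Read pruning settings from environment variables (hypothetical names)."""
    layer = int(env.get("PRUNE_INJECT_LAYER", "0"))
    keep = int(env.get("PRUNE_KEEP_TOKENS", "576"))
    if layer < 0:
        raise ValueError("PRUNE_INJECT_LAYER must be non-negative")
    if not 1 <= keep <= 576:
        raise ValueError("PRUNE_KEEP_TOKENS must be between 1 and 576")
    return PruneConfig(inject_layer=layer, keep_tokens=keep)


# Example: prune down to 16 tokens at layer 2.
config = load_prune_config({"PRUNE_INJECT_LAYER": "2", "PRUNE_KEEP_TOKENS": "16"})
print(config)
```

Passing the mapping explicitly keeps the function easy to test. Calling `load_prune_config()` with no argument reads the real process environment.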
All pruning work targets real-world deployment on the NVIDIA Jetson Orin, a key constraint shaping the design of SVD-PRUNE. Benchmarking inference latency, memory footprint, and accuracy trade-offs directly on edge hardware for autonomous systems applications.
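A minimal sketch of the kind of latency and memory measurement described above, assuming a generic Python callable stands in for one model inference. The `benchmark` helper is hypothetical. `tracemalloc` only tracks Python heap allocations, so on a GPU device it would need to be replaced by the framework's own memory counters, and the device would need to be synchronised before each clock read.

```python
import statistics
import time
import tracemalloc


def benchmark(fn, warmup: int = 3, runs: int = 10) -> dict:
    """Return median latency (ms) and peak Python heap usage (bytes) of fn()."""
    # Warm-up calls are excluded so one-off setup costs do not skew the timing.
    for _ in range(warmup):
        fn()

    latencies_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        latencies_ms.append((time.perf_counter() - start) * 1e3)

    # Memory is measured in a separate call so tracing overhead
    # does not affect the latency numbers.
    tracemalloc.start()
    fn()
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()

    return {
        "median_ms": statistics.median(latencies_ms),
        "peak_python_bytes": peak_bytes,
    }


# Stand-in workload; a real run would call the model's forward pass instead.
result = benchmark(lambda: sum(i * i for i in range(10_000)))
print(result)
```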
A systematic survey benchmarking state-of-the-art token pruning and merging techniques for VLMs. Covers key design dimensions — where to prune, how to score token importance, merging strategies (e.g. G2TM-style graph merging), training-free vs fine-tuning approaches — and open challenges for the community.
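To make the difference between pruning and merging concrete, here is a small sketch of greedy token merging by cosine similarity. It is deliberately simplified and is not G2TM or any specific published method: at each step it averages the single most similar pair of tokens. The function name and the toy data are assumptions for illustration.

```python
import numpy as np


def merge_most_similar(tokens: np.ndarray, n_merges: int) -> np.ndarray:
    """Greedily merge the most cosine-similar token pair, n_merges times.

    Assumes every token has a non-zero norm. Merged tokens are plain averages;
    this sketch does not weight by how many tokens were merged before.
    """
    if not 0 <= n_merges < len(tokens):
        raise ValueError("n_merges must be in [0, number of tokens)")
    tokens = tokens.astype(float).copy()
    for _ in range(n_merges):
        unit = tokens / np.linalg.norm(tokens, axis=1, keepdims=True)
        sim = unit @ unit.T
        np.fill_diagonal(sim, -np.inf)  # a token must not be merged with itself
        i, j = np.unravel_index(np.argmax(sim), sim.shape)
        merged = (tokens[i] + tokens[j]) / 2.0
        rest = [k for k in range(len(tokens)) if k not in (i, j)]
        tokens = np.vstack([tokens[rest], merged])  # merged token goes last
    return tokens


# Toy example: the first two tokens are nearly parallel, so they are merged.
toy = np.array([[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]])
print(merge_most_similar(toy, n_merges=1))
```

Unlike pruning, which discards tokens outright, merging keeps a summary of the removed tokens in the representation that remains.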
Academic Output
Core Research Method
VLMs like LLaVA-1.5 encode each image as 576 visual tokens, many of which are redundant background patches. SVD-based leverage scoring identifies which tokens carry the most information before they enter the LLM backbone.
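The idea can be sketched with NumPy's standard row leverage scores. This is a generic illustration under stated assumptions, not the exact scoring used in the method: the rank of 16, the token dimension of 1024, the random token matrix, and the function names are all placeholders.

```python
import numpy as np


def leverage_scores(tokens: np.ndarray, rank: int) -> np.ndarray:
    """Rank-k row leverage scores of an (n_tokens, dim) token matrix."""
    # Thin SVD: tokens = U @ diag(S) @ Vt.
    u, _, _ = np.linalg.svd(tokens, full_matrices=False)
    # The leverage of token i is the squared norm of row i of the top-k
    # left singular vectors.
    return np.sum(u[:, :rank] ** 2, axis=1)


def prune_tokens(tokens: np.ndarray, keep: int, rank: int = 16):
    """Keep the `keep` highest-leverage tokens, preserving their original order."""
    scores = leverage_scores(tokens, rank)
    kept = np.sort(np.argsort(scores)[-keep:])
    return tokens[kept], kept


# Random stand-in for 576 visual tokens; real tokens come from the vision encoder.
rng = np.random.default_rng(0)
visual_tokens = rng.standard_normal((576, 1024))
pruned, kept_idx = prune_tokens(visual_tokens, keep=16)
print(pruned.shape)  # (16, 1024)
```

Sorting the kept indices preserves the spatial order of the surviving patches, which matters when position information is tied to token order.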
VLA Research Platform
My VLA pruning work builds directly on π₀ — Physical Intelligence's generalist robot foundation model trained across 8 distinct robot embodiments. Each robot presents a different kinematic chain and task distribution — making token importance estimation fundamentally harder than in standard VLMs. pi.website ↗
Toolkit
Credentials
Side Work
Interactive dashboard for comparing VLM architectures and visualising token pruning benchmarks. Includes VLM timeline, pruning comparisons, and model performance charts. Open ↗
React-based news web application with real-time AI integration via ALAN AI voice assistant. Live ↗
Full-stack replica with authentication, product listing, cart, and secure checkout flow. Live ↗
Robust HR management web application built on the Spring-Struts-MyBatis (SSM) framework. GitHub ↗
Android application featuring user registration and picture publication. GitHub ↗
Swift iOS app for vehicle sales: user registration, listings, and booking functionality. GitHub ↗