AI Researcher · Paris, France

Yvon Apedo

PhD researcher specialising in efficient multimodal AI (MLLMs) — compressing Vision-Language & Vision-Language-Action models for the real world.

MSCA Fellow · VLM Models · VLA Models · Token Pruning · LLaVA · Qwen-VL · OpenVLA · π₀ · Edge AI · Jetson Orin · IBISC Lab · Paris-Saclay

Introduction

About Me

I am a PhD researcher at Université Paris-Saclay (IBISC Lab & CEA), working under the supervision of Dr. Martyna Poreba, Dr. Michal Szczepanski, and Prof. Samia Bouchafa. My research sits at the intersection of computer vision, large multimodal models, and resource-constrained deployment.

My core focus is making Vision-Language Models (VLMs) and Vision-Language-Action (VLA) models efficient enough to run on edge hardware — without retraining. I design training-free methods that analyse and reduce redundant visual tokens before they reach the LLM backbone, with a particular emphasis on deployment on the NVIDIA Jetson Orin for autonomous systems.

I am a Demythif.AI Fellow under Institut DATAIA, co-funded by the Marie Skłodowska-Curie Actions (MSCA) and the European Union — a programme dedicated to responsible, human-centred AI.

Workshop paper accepted at ICRA 2026 LOWI Workshop · available
MSc · Northwestern Polytechnical University, Xi'an
BSc (Eng) · Yunnan Tech & Business University
Adv. Diploma · IPMC-Ghana
MSCA / Institut DATAIA Fellow
Institution: Université Paris-Saclay / IBISC Lab & CEA
Research area: Efficient Multimodal AI (VLM & VLA)
Location: Paris, France
Target hardware: NVIDIA Jetson Orin (edge deployment)

Focus Areas

Research

VLM Token Pruning — SVD-PRUNE

Vision-Language Models · ICRA LOWI Workshop 2026

SVD-PRUNE is a training-free method that inserts SVD-based leverage scoring into the LLM forward pass to prune redundant visual tokens. It is evaluated on LLaVA-1.5 7B & 13B and Qwen-VL across ScienceQA, GQA, POPE, AI2D, TextVQA, and MMBench, and achieves a 4× speedup at 16 tokens versus the 576-token baseline. The injection layer is configurable via environment variables.

LLaVA-1.5 7B · LLaVA-1.5 13B · Qwen-VL · LLaMA backbone · lmms-eval · VLMEvalKit
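
To make the idea of SVD-based leverage scoring concrete, here is a minimal NumPy sketch. It is not the SVD-PRUNE implementation: the function names, the rank parameter, the random stand-in features, and the environment variable names `SVD_PRUNE_KEEP` and `SVD_PRUNE_LAYER` are all assumptions made for illustration. The layer index is only read, since wiring it into a model's forward pass depends on the model code.

```python
import os

import numpy as np


def leverage_scores(tokens: np.ndarray, rank: int) -> np.ndarray:
    """Rank-`rank` leverage score of each row (token) of `tokens`.

    tokens: (n_tokens, dim) matrix of visual token features.
    The score of token i is the squared norm of row i of the top-`rank`
    left singular vectors, so the scores are non-negative and sum to `rank`.
    """
    u, _, _ = np.linalg.svd(tokens, full_matrices=False)
    return np.sum(u[:, :rank] ** 2, axis=1)


def prune_tokens(tokens: np.ndarray, keep: int, rank: int):
    """Keep the `keep` highest-scoring tokens, preserving their original order."""
    scores = leverage_scores(tokens, rank)
    top = np.argsort(scores)[-keep:]
    kept_idx = np.sort(top)
    return tokens[kept_idx], kept_idx


# Hypothetical environment variable names, for illustration only.
keep = int(os.environ.get("SVD_PRUNE_KEEP", "16"))
layer = int(os.environ.get("SVD_PRUNE_LAYER", "2"))

rng = np.random.default_rng(0)
visual_tokens = rng.standard_normal((576, 64))  # random stand-in features
pruned, kept_idx = prune_tokens(visual_tokens, keep=keep, rank=8)
print(pruned.shape, layer)
```

Sorting the kept indices keeps the surviving tokens in their original sequence order, which is one reasonable choice when positional information matters downstream.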

VLA Pruning — Robot Action Models

Vision-Language-Action Models · Ongoing

Extending visual token pruning to VLA models for robotic manipulation — an ongoing effort building on the VLM work. Working with OpenVLA, π₀ (pi_0), and CogAct on the LIBERO simulation benchmark. A key challenge: spatial token importance differs fundamentally from VLMs — removing spatially critical tokens can cause task failure. Targeting ICRA workshop, CoRL 2026, and NeurIPS 2026.

OpenVLA · π₀ (pi_0) · CogAct · DINOv2 · SigLIP · LIBERO

Edge Deployment

Jetson Orin · Embedded Systems

All pruning work targets real-world deployment on the NVIDIA Jetson Orin, a key constraint shaping the design of SVD-PRUNE. I benchmark inference latency, memory footprint, and accuracy trade-offs directly on edge hardware for autonomous-systems applications.

Jetson Orin · RTX 3080 16GB · Latency benchmarking · Memory efficiency
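
As an illustration of the latency side of this benchmarking, the sketch below times a callable with warm-up runs and reports simple statistics. It is a generic harness under stated assumptions, not the benchmarking code used in this work: `benchmark_latency` and `dummy_workload` are hypothetical names, and the workload is a plain Python loop standing in for a model forward pass.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark_latency(fn: Callable[[], object], warmup: int = 3, runs: int = 20) -> Dict[str, float]:
    """Time `fn` and report latency statistics in milliseconds."""
    for _ in range(warmup):  # warm-up runs are discarded
        fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    return {
        "median_ms": statistics.median(samples),
        # Approximate 90th percentile taken from the sorted samples.
        "p90_ms": samples[int(0.9 * (len(samples) - 1))],
        "min_ms": samples[0],
    }


def dummy_workload() -> int:
    # Stand-in for a model forward pass.
    return sum(i * i for i in range(50_000))


stats = benchmark_latency(dummy_workload)
print(stats)
```

When timing GPU inference with PyTorch, the timed callable would typically need to call `torch.cuda.synchronize()` before returning, because CUDA kernels are launched asynchronously.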

Survey: Token Pruning for VLMs

Literature Review · Ongoing

A systematic survey benchmarking state-of-the-art token pruning and merging techniques for VLMs. Covers key design dimensions — where to prune, how to score token importance, merging strategies (e.g. G2TM-style graph merging), training-free vs fine-tuning approaches — and open challenges for the community.

ToMe · FastV · G2TM · VLA-Pruner · SP-VLA · LightVLA
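
For readers new to token merging, the following NumPy sketch shows a simplified similarity-based merge in the spirit of bipartite matching. It is an illustrative assumption rather than a faithful reimplementation of ToMe, G2TM, or any other surveyed method: it splits tokens into two alternating sets, matches each token in the first set to its most similar token in the second, and averages the `r` most similar pairs. The function name `merge_tokens` is hypothetical.

```python
import numpy as np


def merge_tokens(tokens: np.ndarray, r: int) -> np.ndarray:
    """Merge the `r` most similar (A, B) token pairs by averaging.

    tokens: (n, dim) float array with n >= 2 and non-zero rows.
    Tokens at even positions form set A, odd positions form set B.
    Returns an (n - r, dim) array: unmerged A tokens followed by B tokens,
    so the original interleaved order is not preserved.
    """
    a, b = tokens[0::2], tokens[1::2]
    r = min(r, a.shape[0])
    # Cosine similarity between every token in A and every token in B.
    a_n = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_n = b / np.linalg.norm(b, axis=1, keepdims=True)
    sim = a_n @ b_n.T
    best_b = sim.argmax(axis=1)    # closest B token for each A token
    best_sim = sim.max(axis=1)
    order = np.argsort(-best_sim)  # A tokens, most similar first
    merged_a, kept_a = order[:r], np.sort(order[r:])

    # Average each merged A token into its matched B token.
    sums = b.copy()
    counts = np.ones(b.shape[0])
    np.add.at(sums, best_b[merged_a], a[merged_a])
    np.add.at(counts, best_b[merged_a], 1.0)
    b_out = sums / counts[:, None]
    return np.concatenate([a[kept_a], b_out], axis=0)


rng = np.random.default_rng(0)
features = rng.standard_normal((576, 32))  # random stand-in features
merged = merge_tokens(features, r=100)
print(merged.shape)
```

Unlike pruning, merging folds the removed tokens into their neighbours instead of discarding them, which is the main design difference between the two families.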

Academic Output

Publications

Toward Embedded Vision-Language Perception for Long-Term Autonomous Robots via Training-Free Token Pruning
Yvon Apedo, Dr. Martyna Poreba, Dr. Michal Szczepanski, Prof. Samia Bouchafa
ICRA 2026 · LOWI Workshop 2026

Weakly Supervised Crack Segmentation Using Adversarial Learning and Transformers
Yvon Apedo, Huanjie Tao
Multimedia Systems, Springer 2025
Published

Unsupervised Domain Adaptation for Crack Segmentation Based on Cross-Domain Stylization and Dual Adversarial Feature Learning
Yvon Apedo, Huanjie Tao, Wu Gao, Chao Xie, Shusen Zhao
Journal of Computing for Civil Engineering · ASCE 2025
Published

ADB-Crack: A Transformer-Based Framework with Adaptive Context Fusion and Dynamic Feature Refinement for High-Precision Pavement Crack Segmentation
Yvon Apedo, Huanjie Tao, Chao Xie, Shusen Zhao
Automation in Construction · Elsevier 2026
Under Review

Systematic Literature Review on Forecasting and Prediction of Technical Debt Evolution
Ajibode Adekunle, Yvon Apedo
arXiv 2024
Published

Core Research Method

Visual Token Pruning

VLMs like LLaVA-1.5 encode each image as 576 visual tokens, most of which are redundant background patches. SVD-based leverage scoring identifies which tokens carry the most information before they enter the LLM backbone.

Figure: visual token importance map (24 × 24 = 576 tokens), with kept and pruned tokens marked. Initial state: 576 tokens, 1.0× speedup, 0% pruned.
Architecture: Image (336×336) → Vision encoder (CLIP ViT-L) → SVD-PRUNE (576 → 576) → LLM (Vicuna-7B) → Answer (text output).
Chart: accuracy retention at the current token count, relative to the 576-token baseline (LLaVA-1.5 7B).
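
The token counts above follow from simple arithmetic, sketched below. The 14-pixel patch size is an assumption based on the CLIP ViT-L/14 encoder at 336×336 resolution, which gives the 24 × 24 grid. The helper names are hypothetical, and the sketch covers only the token budget, not the resulting speedup, which has to be measured.

```python
def visual_token_count(image_size: int, patch_size: int) -> int:
    """Number of patch tokens for a square image and square patches."""
    per_side = image_size // patch_size
    return per_side * per_side


def pruned_fraction(total: int, kept: int) -> float:
    """Fraction of tokens removed when `kept` of `total` remain."""
    return 1.0 - kept / total


total = visual_token_count(336, 14)  # 24 * 24 = 576
for kept in (576, 144, 64, 16):
    print(kept, f"{pruned_fraction(total, kept):.1%}")
```

Keeping 16 of 576 tokens corresponds to pruning roughly 97.2% of the visual tokens.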

VLA Research Platform

π₀ Cross-Embodiment Dataset

My VLA pruning work builds directly on π₀, Physical Intelligence's generalist robot foundation model trained across 8 distinct robot embodiments. Each robot presents a different kinematic chain and task distribution, which makes token importance estimation fundamentally harder than in standard VLMs. (pi.website)

Robots shown: UR5e (single arm, 6-DOF) · Bimanual UR5e · Franka · Bimanual Trossen · Bimanual ARX · Mobile Trossen · Mobile Fibocom
Tasks: Make coffee · Load dishes · Fold laundry · Bag groceries · Pack bottles · Open popcorn · Bus table · Plug in cable

Toolkit

Technical Skills

VLM / VLA Models

LLaVA-1.5 · Qwen-VL · OpenVLA · π₀ (pi_0) · CogAct · Prismatic-7B · DINOv2 · SigLIP · LLaMA-2

Frameworks & Libraries

PyTorch · HuggingFace · lmms-eval · VLMEvalKit · timm · NumPy · OpenCV

Research Methods

Token Pruning · SVD / Leverage Scores · Token Merging · Attention Analysis · Semantic Segmentation · Domain Adaptation

Benchmarks

ScienceQA · GQA · POPE · MMStar · AI2D · TextVQA · MMBench · LIBERO

Hardware & Infra

Jetson Orin · RTX 3080 · Linux Server (SSH) · Docker · Conda · Git

Languages

Python · Java · JavaScript · Swift (iOS) · SQL · Bash

Credentials

Certifications

Learning AI Through Visualization
Columbia University
2025 · Taught by Ali Hirsa on Columbia+

Google Advanced Data Analytics Specialization
Google / Coursera
2025 · 6-course specialisation

Machine Learning Specialization
Stanford University · DeepLearning.AI
2023 · Taught by Andrew Ng

Deep Learning Specialization
DeepLearning.AI
2023 · Taught by Andrew Ng · CNNs, RNNs, Transformers

Side Work

Software Projects

VLM Research Hub

Interactive dashboard for comparing VLM architectures and visualising token pruning benchmarks. Includes VLM timeline, pruning comparisons, and model performance charts.

ALAN AI News App

React-based news web application with real-time AI integration via ALAN AI voice assistant.

Amazon Clone

Full-stack replica with authentication, product listing, cart, and secure checkout flow.

HR-SSM

Robust HR management web application built on the Spring-Struts-MyBatis (SSM) framework.

Tourism Blog

Dynamic, data-driven tourism blog built with JSP and a relational back-end.

Project Peter (Android)

Android application featuring user registration and picture publication.

KOMI — Vehicles Sales (iOS)

Swift iOS app for vehicle sales: user registration, listings, and booking functionality.
