AI Researcher · Paris, France

Yvon Apedo

PhD researcher specialising in efficient multimodal AI (MLLMs), compressing Vision-Language and Vision-Language-Action models for real-world deployment.

MSCA Fellow, VLMs, VLAs, Token Pruning, LLaVA · Qwen-VL, OpenVLA · π₀, Edge AI · Jetson Orin, IBISC Lab · Paris-Saclay


Introduction

About Me

I am a PhD researcher at Université Paris-Saclay (IBISC Lab & CEA), working under the supervision of Dr. Martyna Poreba, Dr. Michal Szczepanski, and Prof. Samia Bouchafa. My research sits at the intersection of computer vision, large multimodal models, and resource-constrained deployment.

My core focus is making Vision-Language Models (VLMs) and Vision-Language-Action (VLA) models efficient enough to run on edge hardware without retraining. I design training-free methods that identify and remove redundant visual tokens before they reach the LLM backbone, with a particular emphasis on deployment on the NVIDIA Jetson Orin for autonomous systems.

I am a Demythif.AI Fellow at Institut DATAIA. Demythif.AI is a programme dedicated to responsible, human-centred AI, co-funded by the European Union through the Marie Skłodowska-Curie Actions (MSCA).

Workshop paper accepted at the ICRA 2026 LOWI Workshop · available on arXiv

MSc · Northwestern Polytechnical University, Xi'an

BSc (Eng) · Yunnan Tech & Business University

Adv. Diploma · IPMC-Ghana

MSCA / Institut DATAIA Fellow


institution

Université Paris-Saclay / IBISC Lab & CEA

research area

Efficient Multimodal AI (VLM & VLA)

supervisors

Prof. Samia Bouchafa, Dr. Martyna Poreba, Dr. Michal Szczepanski

location

Paris, France

target hardware

NVIDIA Jetson Orin (edge deployment)

links

GitHub, LinkedIn, Google Scholar, ORCID 0000-0002-3098-8545, komi.apedo@universite-paris-saclay.fr

Focus Areas

Research

VLM Token Pruning — SVD-PRUNE

Vision-Language Models · ICRA LOWI Workshop 2026

A training-free method that inserts SVD-based leverage scoring into the LLM forward pass to prune redundant visual tokens. Evaluated on LLaVA-1.5 7B and 13B and Qwen-VL across ScienceQA, GQA, POPE, AI2D, TextVQA, and MMBench. With 16 visual tokens, it achieves a 4× speedup over the 576-token LLaVA-1.5 baseline. The injection layer, i.e. the LLM layer at which pruning is applied, is configurable via environment variables.
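The leverage-scoring idea can be illustrated with a minimal NumPy sketch. It assumes the visual tokens at the chosen layer form a matrix `X` of shape (num_tokens, hidden_dim), scores each token by its rank-`r` leverage score (the squared norm of its row in the top-`r` left singular vectors), and keeps the `k` highest-scoring tokens in their original order. The function name `leverage_prune`, the rank `r`, and the absence of any centering or normalisation are illustrative assumptions, not the exact SVD-PRUNE formulation.

```python
import numpy as np


def leverage_prune(X: np.ndarray, k: int, r: int = 32) -> np.ndarray:
    """Return sorted indices of the k tokens with the highest rank-r leverage scores.

    X: (num_tokens, hidden_dim) visual token features (hypothetical input).
    """
    # Thin SVD: X = U @ diag(S) @ Vt, with U of shape (num_tokens, min(num_tokens, hidden_dim))
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    r = min(r, U.shape[1])
    # Rank-r leverage score of token i: squared L2 norm of row i of U[:, :r]
    scores = np.sum(U[:, :r] ** 2, axis=1)
    # Indices of the k largest scores, re-sorted so kept tokens preserve positional order
    top = np.argpartition(-scores, k - 1)[:k]
    return np.sort(top)


rng = np.random.default_rng(0)
X = rng.standard_normal((576, 1024)).astype(np.float32)  # 576 tokens, ViT-L-sized features
keep = leverage_prune(X, k=16)
X_pruned = X[keep]
print(keep.shape, X_pruned.shape)  # (16,) (16, 1024)
```

Leverage scores favour tokens whose feature vectors point in directions few other tokens share, which is one way to formalise "carries the most information" for a set of largely redundant patches.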

LLaVA-1.5 7B, LLaVA-1.5 13B, Qwen-VL, LLaMA backbone, lmms-eval, VLMEvalKit

VLA Pruning — Robot Action Models

Vision-Language-Action Models · Ongoing

Extending visual token pruning to VLA models for robotic manipulation, an ongoing effort building on the VLM work. I work with OpenVLA, π₀ (pi_0), and CogACT on the LIBERO simulation benchmark. A key challenge is that token importance in VLAs differs fundamentally from that in VLMs: removing spatially critical tokens can cause task failure. Target venues: an ICRA workshop, CoRL 2026, and NeurIPS 2026.

OpenVLA, π₀ (pi_0), CogACT, DINOv2, SigLIP, LIBERO

Edge Deployment

Jetson Orin · Embedded Systems

All pruning work targets real-world deployment on the NVIDIA Jetson Orin, and this constraint shapes the design of SVD-PRUNE. I benchmark inference latency, memory footprint, and accuracy trade-offs directly on edge hardware for autonomous-systems applications.
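A latency comparison of this kind usually follows a warm-up-then-measure pattern. The sketch below is a generic harness under stated assumptions: `benchmark_latency` and `dummy_workload` are hypothetical names, and the workload is a CPU stand-in rather than a VLM. For PyTorch GPU inference (including on Jetson), kernels run asynchronously, so the timed callable should call `torch.cuda.synchronize()` before returning; peak memory can be read separately with `torch.cuda.max_memory_allocated()`.

```python
import statistics
import time
from typing import Callable


def benchmark_latency(fn: Callable[[], object], warmup: int = 5, iters: int = 20) -> dict:
    """Time repeated calls to fn and return latency statistics in milliseconds."""
    # Warm-up runs absorb one-off costs such as lazy initialisation and cache filling
    for _ in range(warmup):
        fn()
    samples_ms = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        samples_ms.append((time.perf_counter() - start) * 1e3)
    samples_ms.sort()
    return {
        "mean_ms": statistics.fmean(samples_ms),
        "median_ms": statistics.median(samples_ms),
        # Approximate 90th percentile taken from the sorted samples
        "p90_ms": samples_ms[int(0.9 * (len(samples_ms) - 1))],
    }


def dummy_workload(n: int) -> int:
    # Stand-in for a model call whose cost grows with the number of tokens
    return sum(i * i for i in range(n))


full = benchmark_latency(lambda: dummy_workload(576_000))
pruned = benchmark_latency(lambda: dummy_workload(16_000))
print(f"median speedup: {full['median_ms'] / pruned['median_ms']:.1f}x")
```

Reporting the median alongside the mean keeps occasional scheduler or thermal hiccups on embedded boards from skewing the comparison.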

Jetson Orin, RTX 3080 16GB, Latency benchmarking, Memory efficiency

Survey: Token Pruning for VLMs

Literature Review · Ongoing

A systematic survey benchmarking state-of-the-art token pruning and merging techniques for VLMs. It covers the key design dimensions (where to prune, how to score token importance, which merging strategy to use, e.g. G2TM-style graph merging, and training-free vs. fine-tuning approaches) and highlights open challenges for the community.
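As a concrete instance of the merging dimension, here is a simplified NumPy sketch of ToMe-style bipartite soft matching: tokens are split alternately into sets A and B, each A token finds its most similar B token by cosine similarity, and the `r` most redundant A tokens are averaged into their partners. This is a simplification, not the reference ToMe code: it uses raw features instead of attention keys, plain averaging instead of size-weighted averaging, and no proportional attention. G2TM's graph-based merging is not shown, and `bipartite_merge` is a hypothetical name.

```python
import numpy as np


def bipartite_merge(x: np.ndarray, r: int) -> np.ndarray:
    """Merge r tokens of x (num_tokens, dim); returns shape (num_tokens - r, dim)."""
    a, b = x[0::2], x[1::2]  # alternate tokens into sets A and B
    r = min(r, len(a))
    # Cosine similarity between every A token and every B token
    an = a / (np.linalg.norm(a, axis=1, keepdims=True) + 1e-8)
    bn = b / (np.linalg.norm(b, axis=1, keepdims=True) + 1e-8)
    sim = an @ bn.T  # shape (|A|, |B|)
    best_b = sim.argmax(axis=1)  # most similar B partner for each A token
    best_score = sim.max(axis=1)
    order = np.argsort(-best_score)
    merged_a, kept_a = order[:r], order[r:]  # the r most redundant A tokens get merged
    # Accumulate each merged A token into its B partner, then take the mean
    b_sum = b.copy()
    counts = np.ones(len(b))
    np.add.at(b_sum, best_b[merged_a], a[merged_a])
    np.add.at(counts, best_b[merged_a], 1)
    b_merged = b_sum / counts[:, None]
    return np.concatenate([a[np.sort(kept_a)], b_merged], axis=0)


x = np.random.default_rng(0).standard_normal((576, 64))
print(bipartite_merge(x, r=100).shape)  # (476, 64)
```

Unlike pruning, merging never discards a token outright; its content survives inside the averaged partner, which is why surveys treat the two as separate design choices.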

ToMe, FastV, G2TM, VLA-Pruner, SP-VLA, LightVLA

Academic Output

Publications

Toward Embedded Vision-Language Perception for Long-Term Autonomous Robots via Training-Free Token Pruning

Yvon Apedo, Dr. Martyna Poreba, Dr. Michal Szczepanski, Prof. Samia Bouchafa

ICRA 2026 · LOWI Workshop 2026

arXiv ↗ Code ↗

Under Review

Weakly Supervised Crack Segmentation Using Adversarial Learning and Transformers

Yvon Apedo, Huanjie Tao

Multimedia Systems · Springer 2025

Paper ↗ Code ↗

Published

Unsupervised Domain Adaptation for Crack Segmentation Based on Cross-Domain Stylization and Dual Adversarial Feature Learning

Yvon Apedo, Huanjie Tao, Wu Gao, Chao Xie, Shusen Zhao

Journal of Computing in Civil Engineering · ASCE 2025

Paper ↗ Code ↗

Published

ADB-Crack: A Transformer-Based Framework with Adaptive Context Fusion and Dynamic Feature Refinement for High-Precision Pavement Crack Segmentation

Yvon Apedo, Huanjie Tao, Chao Xie, Shusen Zhao

Automation in Construction · Elsevier 2026

Paper ↗ Code ↗

Under Review

Systematic Literature Review on Forecasting and Prediction of Technical Debt Evolution

Ajibode Adekunle, Yvon Apedo

arXiv 2024

arXiv ↗

Published

Core Research Method

Visual Token Pruning

VLMs such as LLaVA-1.5 encode each image as 576 visual tokens, most of which are redundant background patches. SVD-based leverage scoring identifies which tokens carry the most information before they enter the LLM backbone, and the remaining tokens are pruned to improve model efficiency.
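The 576 figure follows from the encoder geometry: a 336 × 336 input cut into 14 × 14 patches (CLIP ViT-L/14) gives a 24 × 24 grid of patch tokens. The sketch below maps kept token indices back onto that grid, as a token importance map does, and computes the pruned fraction. Row-major (raster) token ordering is an assumption, and `keep_mask` and `pruned_fraction` are hypothetical helper names.

```python
import numpy as np

IMAGE_SIZE = 336  # LLaVA-1.5 input resolution
PATCH_SIZE = 14  # CLIP ViT-L/14 patch size
GRID = IMAGE_SIZE // PATCH_SIZE  # 24 patches per side
NUM_TOKENS = GRID * GRID  # 576 visual tokens


def keep_mask(kept_indices: np.ndarray) -> np.ndarray:
    """Map flat token indices (assumed row-major) onto a GRID x GRID kept/pruned map."""
    mask = np.zeros(NUM_TOKENS, dtype=bool)
    mask[kept_indices] = True
    return mask.reshape(GRID, GRID)


def pruned_fraction(num_kept: int) -> float:
    """Fraction of the 576 visual tokens removed when num_kept remain."""
    return 1.0 - num_kept / NUM_TOKENS


print(GRID, NUM_TOKENS)  # 24 576
print(f"{pruned_fraction(16):.1%}")  # 97.2%
print(np.argwhere(keep_mask(np.array([0, 25, 575]))).tolist())  # [[0, 0], [1, 1], [23, 23]]
```

At 16 kept tokens, roughly 97% of the visual sequence is removed, which is the regime where the 4× speedup quoted for SVD-PRUNE applies.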

Visual token importance map (24 × 24 = 576 tokens; kept vs. pruned)

Architecture

Image (336×336) → Vision Enc. (CLIP ViT-L) → SVD-PRUNE (576 → k visual tokens) → LLM (Vicuna-7B) → Answer (text output)
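The tensor shapes along this pipeline can be traced with random stand-in arrays. The sketch assumes LLaVA-1.5's dimensions (1024-wide CLIP ViT-L features, a two-layer GELU MLP projector, 4096-wide Vicuna-7B embeddings). The projector weights, the 40-token prompt, and the pruning rule (top-k by L2 norm) are hypothetical placeholders: the pruning step here is not SVD-PRUNE's scoring, and real LLaVA inserts image tokens at a placeholder inside the prompt rather than simply prepending them.

```python
import math

import numpy as np

rng = np.random.default_rng(0)
CLIP_DIM = 1024  # CLIP ViT-L hidden size
LLM_DIM = 4096  # Vicuna-7B hidden size


def gelu(x: np.ndarray) -> np.ndarray:
    # Tanh approximation of GELU
    return 0.5 * x * (1 + np.tanh(math.sqrt(2 / math.pi) * (x + 0.044715 * x**3)))


# 1. Vision encoder output: 576 patch features (random stand-in for CLIP ViT-L)
vision_feats = rng.standard_normal((576, CLIP_DIM), dtype=np.float32)

# 2. Two-layer MLP projector into the LLM embedding space (random placeholder weights)
W1 = rng.standard_normal((CLIP_DIM, LLM_DIM), dtype=np.float32) * 0.01
W2 = rng.standard_normal((LLM_DIM, LLM_DIM), dtype=np.float32) * 0.01
visual_tokens = gelu(vision_feats @ W1) @ W2  # (576, 4096)

# 3. Pruning stage: keep k tokens (placeholder rule: top-k by L2 norm, order preserved)
k = 64
keep = np.sort(np.argsort(-np.linalg.norm(visual_tokens, axis=1))[:k])
pruned = visual_tokens[keep]  # (64, 4096)

# 4. LLM input: pruned visual tokens followed by text-prompt embeddings
text_embeds = rng.standard_normal((40, LLM_DIM), dtype=np.float32)
llm_input = np.concatenate([pruned, text_embeds], axis=0)
print(visual_tokens.shape, pruned.shape, llm_input.shape)  # (576, 4096) (64, 4096) (104, 4096)
```

Because visual tokens dominate the LLM's input length, shrinking them from 576 to k shortens every attention and MLP computation downstream, which is where the efficiency gain comes from.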

Accuracy retention at the selected token count, relative to the 576-token baseline (LLaVA-1.5 7B)

VLA Research Platform

π₀ Cross-Embodiment Dataset

My VLA pruning work builds directly on π₀, Physical Intelligence's generalist robot foundation model, trained across 8 distinct robot embodiments. Each robot has a different kinematic chain and task distribution, which makes token importance estimation fundamentally harder than in standard VLMs. pi.website ↗

UR5e, Bimanual UR5e, Franka, Bimanual Trossen, Bimanual ARX, Mobile Trossen, Mobile Fibocom

UR5e · Single Arm · 6-DOF

Make coffee, Load dishes, Fold laundry, Bag groceries, Pack bottles, Open popcorn, Bus table, Plug in cable

Toolkit

Technical Skills

VLM / VLA Models

LLaVA-1.5, Qwen-VL, OpenVLA, π₀ (pi_0), CogACT, Prismatic-7B, DINOv2, SigLIP, LLaMA-2

Frameworks & Libraries

PyTorch, HuggingFace, lmms-eval, VLMEvalKit, timm, NumPy, OpenCV

Research Methods

Token Pruning, SVD / Leverage Scores, Token Merging, Attention Analysis, Semantic Segmentation, Domain Adaptation

Benchmarks

ScienceQA, GQA, POPE, MMStar, AI2D, TextVQA, MMBench, LIBERO

Hardware & Infra

Jetson Orin, RTX 3080, Linux Server (SSH), Docker, Conda, Git

Languages

Python, Java, JavaScript, Swift (iOS), SQL, Bash

Credentials

Certifications

Learning AI Through Visualization

Columbia University

2025 · Taught by Ali Hirsa on Columbia+

View Certificate ↗

Google Advanced Data Analytics Specialization

Google / Coursera

2025 · 6-course specialisation

View Certificate ↗

Machine Learning Specialization

Stanford University · DeepLearning.AI

2023 · Taught by Andrew Ng

View Certificate ↗

Deep Learning Specialization

DeepLearning.AI

2023 · Taught by Andrew Ng · CNNs, RNNs, Transformers

View Certificate ↗

Side Work

Software Projects

VLM Research Hub

Interactive dashboard for comparing VLM architectures and visualising token pruning benchmarks. Includes a VLM timeline, pruning comparisons, and model performance charts.

Open ↗

ALAN AI News App

React-based news web application with real-time AI integration via the ALAN AI voice assistant.

Live ↗

Amazon Clone

Full-stack replica with authentication, product listing, cart, and secure checkout flow.

Live ↗

HR-SSM

Robust HR management web application built on the Spring-Struts-MyBatis (SSM) framework.

GitHub ↗

Tourism Blog

Dynamic, data-driven tourism blog built with JSP and a relational back-end.

GitHub ↗

Project Peter (Android)

Android application featuring user registration and picture publication.

GitHub ↗

KOMI — Vehicles Sales (iOS)

Swift iOS app for vehicle sales: user registration, listings, and booking functionality.

GitHub ↗