MedGRPO: Multi-Task Reinforcement Learning for Heterogeneous Medical Video Understanding

CVPR 2026

Task: The work introduces the MedVidBench benchmark (531K video-instruction pairs) and the MedGRPO RL framework, which uses cross-dataset reward normalization and a medical LLM judge to stabilize training and advance medical video understanding.
Results: Supervised fine-tuning on MedVidBench outperforms GPT-4.1 and Gemini-2.5-Flash across all tasks, with MedGRPO further improving performance over the SFT baseline on multiple tasks.

MedGRPO pairs a large-scale dataset, MedVidBench, with a multi-task reinforcement learning framework for medical video understanding.

MedVidBench Dataset

531K video-instruction pairs: Large-scale dataset for medical video understanding
8 diverse tasks: Covers video-level, segment-level, and frame-level understanding
Enables comprehensive evaluation of medical video models

MedGRPO Framework

Multi-Task Reinforcement Learning: Novel RL framework designed for heterogeneous medical video tasks
Cross-Dataset Reward Normalization: Ensures balanced training across different tasks and datasets
Medical LLM Judge: Specialized evaluation for medical video caption quality
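
The summary does not give the normalization formula, so the following is only a minimal sketch of one plausible reading of cross-dataset reward normalization: each reward is z-scored against the other rewards from the same dataset, so that reward functions with very different scales (for example an IoU in [0, 1] versus a judge score on a wider range) contribute comparably to the policy update. The function name, the `eps` parameter, and the use of batch statistics are assumptions made for illustration, not details taken from the paper.

```python
from collections import defaultdict
from statistics import mean, pstdev


def normalize_rewards_per_dataset(rewards, dataset_ids, eps=1e-6):
    """Z-score each reward using the mean/std of its own dataset.

    Hypothetical helper: the paper's actual normalization may differ.
    """
    # Group raw rewards by the dataset each sample came from.
    groups = defaultdict(list)
    for reward, dataset in zip(rewards, dataset_ids):
        groups[dataset].append(reward)

    # Per-dataset mean and population standard deviation.
    stats = {d: (mean(v), pstdev(v)) for d, v in groups.items()}

    # Normalize every reward with its own dataset's statistics;
    # eps avoids division by zero when a dataset has constant rewards.
    return [
        (reward - stats[dataset][0]) / (stats[dataset][1] + eps)
        for reward, dataset in zip(rewards, dataset_ids)
    ]


# Two datasets whose raw rewards differ by two orders of magnitude.
raw = [0.1, 0.3, 10.0, 30.0]
ids = ["grounding", "grounding", "caption", "caption"]
normalized = normalize_rewards_per_dataset(raw, ids)
```

After normalization, the lower-scoring sample of each dataset lands near -1 and the higher-scoring one near +1, regardless of the raw reward scale.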

Performance Improvements

Compared to the supervised fine-tuning (SFT) baseline:

+0.074 mIoU@0.3 on temporal action grounding
+0.588 LLM score on video summary generation
Consistent improvements across all 8 tasks
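
For readers unfamiliar with the grounding metric, here is a small sketch of temporal IoU between a predicted and a ground-truth segment, plus a thresholded mean. How the 0.3 threshold enters mIoU@0.3 is not defined in this summary; the sketch assumes one possible convention in which IoU values below the threshold count as zero before averaging. Both function names are hypothetical.

```python
def temporal_iou(pred, gt):
    """Intersection-over-union of two (start, end) segments in seconds."""
    (pred_start, pred_end), (gt_start, gt_end) = pred, gt
    intersection = max(0.0, min(pred_end, gt_end) - max(pred_start, gt_start))
    union = (pred_end - pred_start) + (gt_end - gt_start) - intersection
    return intersection / union if union > 0 else 0.0


def mean_iou_at(preds, gts, threshold=0.3):
    """Mean IoU where values below `threshold` are zeroed.

    Assumed convention for illustration; the paper may define it differently.
    """
    ious = [temporal_iou(p, g) for p, g in zip(preds, gts)]
    kept = [iou if iou >= threshold else 0.0 for iou in ious]
    return sum(kept) / len(kept) if kept else 0.0


# First prediction overlaps well (IoU = 1/3); the second barely overlaps (IoU = 0.05).
preds = [(0.0, 10.0), (0.0, 10.0)]
gts = [(5.0, 15.0), (9.0, 20.0)]
score = mean_iou_at(preds, gts, threshold=0.3)
```

Under this convention a reported gain of +0.074 would correspond to predicted segments overlapping their ground-truth segments more closely on average.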