MedGRPO: Multi-Task Reinforcement Learning for Heterogeneous Medical Video Understanding

CVPR 2026

MedGRPO is a framework for medical video understanding that pairs a large-scale dataset, MedVidBench, with a multi-task reinforcement learning method.

MedVidBench Dataset

531K video-instruction pairs: Large-scale dataset for medical video understanding
8 diverse tasks: Covers video-level, segment-level, and frame-level understanding
Enables comprehensive evaluation of medical video models

MedGRPO Framework

Multi-Task Reinforcement Learning: Novel RL framework designed for heterogeneous medical video tasks
Cross-Dataset Reward Normalization: Ensures balanced training across different tasks and datasets
Medical LLM Judge: Specialized evaluation for medical video caption quality
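
This summary does not spell out how cross-dataset reward normalization is computed. As a minimal sketch of one plausible reading, the Python below z-scores raw rewards within each dataset, so that rewards on different scales (for example an IoU in [0, 1] and a judge score on another scale) contribute comparably to training. The function name, the dataset labels "grounding" and "caption", and the choice of z-scoring are all assumptions for illustration, not the authors' implementation.

```python
from collections import defaultdict
from statistics import mean, pstdev


def normalize_rewards_per_dataset(samples, eps=1e-6):
    """Z-score raw rewards within each dataset (hypothetical helper).

    samples: list of (dataset_name, raw_reward) pairs.
    Returns the normalized rewards in the same order as the input.
    """
    # Group raw rewards by the dataset they came from.
    by_dataset = defaultdict(list)
    for name, reward in samples:
        by_dataset[name].append(reward)

    # Per-dataset mean and population standard deviation.
    stats = {name: (mean(vals), pstdev(vals)) for name, vals in by_dataset.items()}

    # eps guards against division by zero when a dataset has constant rewards.
    return [(reward - stats[name][0]) / (stats[name][1] + eps)
            for name, reward in samples]


# Illustrative values only: two datasets whose rewards live on different scales.
batch = [("grounding", 0.10), ("grounding", 0.30),
         ("caption", 3.0), ("caption", 5.0)]
normalized = normalize_rewards_per_dataset(batch)
```

After normalization, the below-average sample in each dataset lands near -1 and the above-average sample near +1, regardless of the original reward scale.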

Performance Improvements

Compared to the supervised fine-tuning (SFT) baseline:

+0.074 mIoU@0.3 on temporal action grounding
+0.588 LLM score on video summary generation
Consistent improvements across all 8 tasks
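
The exact definition of mIoU@0.3 is not given in this summary. For orientation, the Python sketch below computes temporal IoU between predicted and ground-truth segments and shows two common aggregations: the mean IoU, and the fraction of predictions whose IoU reaches a 0.3 threshold. The function names and example segments are hypothetical, and the authors' metric may aggregate differently.

```python
def temporal_iou(pred, gt):
    """IoU of two (start, end) time intervals, e.g. in seconds."""
    intersection = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - intersection
    return intersection / union if union > 0 else 0.0


def mean_iou(preds, gts):
    """Average temporal IoU over paired predictions and ground truths."""
    ious = [temporal_iou(p, g) for p, g in zip(preds, gts)]
    return sum(ious) / len(ious)


def recall_at_iou(preds, gts, threshold=0.3):
    """Fraction of predictions whose IoU meets the threshold."""
    ious = [temporal_iou(p, g) for p, g in zip(preds, gts)]
    return sum(iou >= threshold for iou in ious) / len(ious)


# Illustrative segments only: the first pair overlaps by 5 s, the second not at all.
preds = [(0.0, 10.0), (20.0, 30.0)]
gts = [(5.0, 15.0), (40.0, 50.0)]
avg = mean_iou(preds, gts)
hit_rate = recall_at_iou(preds, gts, threshold=0.3)
```

Here the first pair has IoU 5/15 and the second has IoU 0, so the mean IoU is 1/6 and half of the predictions clear the 0.3 threshold.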