MedGRPO: Multi-Task Reinforcement Learning for Heterogeneous Medical Video Understanding

Task: The work introduces the MedVidBench benchmark (531K video-instruction pairs) and the MedGRPO RL framework, which uses cross-dataset reward normalization and a medical LLM judge to stabilize training and improve medical video understanding.
Results: Supervised fine-tuning on MedVidBench outperforms GPT-4.1 and Gemini-2.5-Flash across all tasks, with MedGRPO further improving performance over the SFT baseline on multiple tasks.

The work pairs a large-scale dataset, MedVidBench, with MedGRPO, a multi-task reinforcement learning framework for medical video understanding.

MedVidBench Dataset

  • 531K video-instruction pairs: Large-scale dataset for medical video understanding
  • 8 diverse tasks: Covers video-level, segment-level, and frame-level understanding
  • Enables comprehensive evaluation of medical video models

MedGRPO Framework

  • Multi-Task Reinforcement Learning: Novel RL framework designed for heterogeneous medical video tasks
  • Cross-Dataset Reward Normalization: Ensures balanced training across different tasks and datasets
  • Medical LLM Judge: Specialized evaluation for medical video caption quality
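The idea behind cross-dataset reward normalization can be illustrated with a small sketch. This summary does not give the exact formulation used in MedGRPO, so the code below is an assumption: it z-scores each reward against the other rewards from the same dataset, so that tasks with large raw reward scales do not dominate the policy update. The function name `normalize_rewards_per_dataset`, the dataset labels, and the reward values are all hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def normalize_rewards_per_dataset(rewards, dataset_ids, eps=1e-6):
    """Z-score each reward using the mean/std of its own dataset.

    Assumption: this is an illustrative per-dataset standardization,
    not necessarily the normalization used in MedGRPO.
    """
    # Collect the raw rewards observed for each dataset.
    grouped = defaultdict(list)
    for reward, dataset in zip(rewards, dataset_ids):
        grouped[dataset].append(reward)

    # Population mean and standard deviation per dataset.
    stats = {d: (mean(v), pstdev(v)) for d, v in grouped.items()}

    # Standardize every reward with its dataset's statistics;
    # eps guards against division by zero for constant rewards.
    return [
        (reward - stats[dataset][0]) / (stats[dataset][1] + eps)
        for reward, dataset in zip(rewards, dataset_ids)
    ]


# Hypothetical batch: grounding rewards live in [0, 1], caption scores in [1, 5].
rewards = [0.2, 0.4, 3.0, 5.0]
dataset_ids = ["grounding", "grounding", "caption", "caption"]
normalized = normalize_rewards_per_dataset(rewards, dataset_ids)
```

After normalization, the below-average sample of each dataset maps to roughly -1 and the above-average sample to roughly +1, regardless of the raw scale. In practice one would likely use running statistics over many batches rather than a single batch, which is again an assumption rather than something stated here.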

Performance Improvements

Compared to the supervised fine-tuning (SFT) baseline:

  • +0.074 mIoU@0.3 on temporal action grounding
  • +0.588 LLM score on video summary generation
  • Consistent improvements across all 8 tasks
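For readers unfamiliar with temporal grounding metrics, the sketch below shows how temporal IoU between a predicted and a ground-truth segment is commonly computed, along with a thresholded score at IoU 0.3. The exact definition of mIoU@0.3 is not given in this summary, so the aggregation here (fraction of predictions with IoU at or above the threshold) is an assumption, and the function names and example segments are hypothetical.

```python
def temporal_iou(pred, gt):
    """Intersection over union of two (start, end) segments in seconds."""
    intersection = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - intersection
    return intersection / union if union > 0 else 0.0


def hit_rate_at_iou(preds, gts, threshold=0.3):
    """Fraction of predictions whose IoU with the ground truth meets the threshold.

    Assumption: one plausible reading of an "@0.3" grounding metric;
    the benchmark's own definition may differ.
    """
    ious = [temporal_iou(p, g) for p, g in zip(preds, gts)]
    return sum(iou >= threshold for iou in ious) / len(ious)


# Hypothetical predictions and ground-truth segments (seconds).
preds = [(2.0, 6.0), (10.0, 12.0)]
gts = [(4.0, 8.0), (20.0, 25.0)]

ious = [temporal_iou(p, g) for p, g in zip(preds, gts)]
score = hit_rate_at_iou(preds, gts, threshold=0.3)
```

In this example the first prediction overlaps its target for 2 of 6 seconds of union (IoU of 1/3) and the second does not overlap at all, so one of the two predictions clears the 0.3 threshold.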