MedGRPO: Multi-Task Reinforcement Learning for Heterogeneous Medical Video Understanding

MedGRPO is a framework for medical video understanding that pairs a large-scale instruction dataset, MedVidBench, with a multi-task reinforcement learning method.

MedVidBench Dataset

  • 531K video-instruction pairs: Large-scale dataset for medical video understanding
  • 8 diverse tasks: Covers video-level, segment-level, and frame-level understanding
  • Enables comprehensive evaluation of medical video models

MedGRPO Framework

  • Multi-Task Reinforcement Learning: Novel RL framework designed for heterogeneous medical video tasks
  • Cross-Dataset Reward Normalization: Ensures balanced training across different tasks and datasets
  • Medical LLM Judge: Specialized evaluation for medical video caption quality
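
To make the idea of cross-dataset reward normalization concrete, here is a minimal sketch. It assumes the normalization is a per-dataset z-score applied to raw rewards before they are mixed in one batch. The text above does not specify the exact formula, so the function name, the `eps` parameter, and the choice of z-scoring are illustrative assumptions, not the MedGRPO implementation.

```python
from collections import defaultdict
from statistics import mean, pstdev


def normalize_rewards_per_dataset(rewards, dataset_ids, eps=1e-6):
    """Z-score each reward against the rewards from the same dataset.

    Hypothetical helper: assumes per-dataset standardization, which puts
    rewards with very different raw scales (e.g. IoU in [0, 1] vs. a 1-5
    judge score) on a comparable footing before policy updates.
    """
    # Collect raw rewards by the dataset they came from.
    grouped = defaultdict(list)
    for reward, dataset in zip(rewards, dataset_ids):
        grouped[dataset].append(reward)

    # Population mean and standard deviation per dataset.
    stats = {d: (mean(v), pstdev(v)) for d, v in grouped.items()}

    # eps keeps the division finite when a dataset has zero variance.
    return [
        (reward - stats[dataset][0]) / (stats[dataset][1] + eps)
        for reward, dataset in zip(rewards, dataset_ids)
    ]


raw = [0.1, 0.3, 10.0, 30.0]
ids = ["grounding", "grounding", "caption", "caption"]
normalized = normalize_rewards_per_dataset(raw, ids)
```

In this toy batch the two datasets differ in raw scale by a factor of 100, yet after normalization the lower and higher samples of each dataset land at roughly the same negative and positive values, so neither dataset dominates the update simply because of its reward range.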

Performance Improvements

Compared to the supervised fine-tuning (SFT) baseline:

  • +0.074 mIoU@0.3 on temporal action grounding
  • +0.588 LLM score on video summary generation
  • Consistent improvements across all 8 tasks
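
For readers unfamiliar with the temporal grounding metric, the sketch below shows how the intersection-over-union (IoU) between a predicted and a ground-truth time segment is computed, and how a 0.3 threshold can be applied. The text above does not define mIoU@0.3 precisely, so the thresholded score here (the fraction of predictions with IoU of at least 0.3) is one common reading and an assumption. The function names are hypothetical.

```python
def temporal_iou(pred, gt):
    """IoU of two (start, end) time segments given in seconds."""
    pred_start, pred_end = pred
    gt_start, gt_end = gt
    # Overlap is zero when the segments do not intersect.
    intersection = max(0.0, min(pred_end, gt_end) - max(pred_start, gt_start))
    union = (pred_end - pred_start) + (gt_end - gt_start) - intersection
    return intersection / union if union > 0 else 0.0


def fraction_above_iou(preds, gts, threshold=0.3):
    """Share of predictions whose IoU with the paired ground truth meets the threshold.

    Assumed interpretation of a thresholded grounding score; the source
    does not state how mIoU@0.3 is aggregated.
    """
    if not preds:
        return 0.0
    hits = sum(1 for p, g in zip(preds, gts) if temporal_iou(p, g) >= threshold)
    return hits / len(preds)


predictions = [(0.0, 10.0), (0.0, 10.0)]
ground_truth = [(5.0, 15.0), (20.0, 30.0)]
score = fraction_above_iou(predictions, ground_truth)
```

The first pair overlaps for 5 of 15 seconds (IoU of one third) and passes the threshold, while the second pair does not overlap at all, so half of the predictions count as hits.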