HGRPO: Hierarchical Grouped Reward Policy Optimization for Multi-Turn Conversational Agents
Computer Science
At Data Fest 2026 in Belgrade, Karina Romanova, Senior LLM Research Engineer, presented HGRPO — a hierarchical modification of GRPO for multi-turn dialogue agents. Applied to a booking agent in Yandex Alice, the method improved truthfulness by 8.0 percentage points and reduced dialogue length by 10.7%. #MultimodalAI