An automatic, extensible framework to evaluate user-proxy agents for human-likeness. 🌟 Star if you like it!
calibration evaluation-metrics dialogue-systems evaluation-framework conversational-ai user-simulation user-proxy llm-evaluation llm-as-a-judge chatbot-arena user-proxy-evaluation clariq qulac oasst1
Updated Jan 16, 2026 - Jupyter Notebook