Chatbot Arena

6.2. Chatbot Arena

Chatbot Arena is a crowdsourced version of the vibe check. It can also be seen as an open A/B testing platform for LLMs.

The premise is simple: let the public interact with your language model, and collect their ratings and feedback in real time.

Chatbot Arena is a website that crowdsources the vibe check by blind-testing LLMs.

Here’s how it typically works:

  1. You host your language model on a public-facing interface, like a website or chatbot platform.

  2. Users can freely interact with your model, asking it questions, giving it tasks, or just having casual conversations.

  3. After each interaction, users are prompted to rate the model’s performance on various criteria, similar to a vibe check (e.g., coherence, relevance, quality).

  4. Users can also leave open-ended feedback, sharing their thoughts and experiences with the model.
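
The steps above can be sketched as a small in-memory feedback log. This is a minimal sketch, not Chatbot Arena's actual implementation: the `Feedback` record, the 1-5 rating scale, the model names, and the sample data are all assumptions for illustration. The criterion names come from the examples in step 3.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class Feedback:
    """One user's feedback on a single interaction (hypothetical schema)."""
    model: str
    ratings: dict[str, int]  # criterion -> score on an assumed 1-5 scale
    comment: str = ""        # optional open-ended feedback (step 4)


def average_ratings(log: list[Feedback]) -> dict[str, dict[str, float]]:
    """Return the mean score per model and per criterion."""
    scores: dict[str, dict[str, list[int]]] = defaultdict(lambda: defaultdict(list))
    for fb in log:
        for criterion, score in fb.ratings.items():
            scores[fb.model][criterion].append(score)
    return {
        model: {criterion: mean(values) for criterion, values in by_criterion.items()}
        for model, by_criterion in scores.items()
    }


# Made-up feedback for two hypothetical models.
log = [
    Feedback("model-x", {"coherence": 4, "relevance": 5}, "Clear answer."),
    Feedback("model-x", {"coherence": 2, "relevance": 4}),
    Feedback("model-y", {"coherence": 5, "relevance": 3}, "Drifted off topic."),
]
summary = average_ratings(log)
print(summary["model-x"]["coherence"])  # mean of 4 and 2
```

Averaging per criterion keeps the signal from each rating dimension separate. The open-ended comments are stored but not aggregated, since they would need manual review.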

Chatbot Arena shares some of the downsides of the vibe check. The feedback can be noisy and inconsistent, as different users have different expectations and criteria. However, as of early 2024, this seems to work well enough in practice: evaluators' votes usually converge on consistent relative ratings of the LMs.
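
One way to make the agreement between evaluators concrete is to measure how often voters side with the majority on each model pair. The following is a minimal sketch, assuming votes are recorded as blind pairwise comparisons (two model names plus the winner). The vote format, the model names, and the data are hypothetical and not taken from Chatbot Arena. Ties are not handled.

```python
from collections import Counter, defaultdict


def majority_agreement(votes: list[tuple[str, str, str]]) -> dict[tuple[str, str], float]:
    """For each model pair, return the share of votes won by the majority winner.

    Each vote is (model_a, model_b, winner); the pair is order-independent.
    A value near 1.0 means voters mostly agree; near 0.5 means a coin flip.
    """
    winners: dict[tuple[str, str], Counter] = defaultdict(Counter)
    for model_a, model_b, winner in votes:
        pair = tuple(sorted((model_a, model_b)))
        winners[pair][winner] += 1
    return {
        pair: counts.most_common(1)[0][1] / sum(counts.values())
        for pair, counts in winners.items()
    }


# Made-up votes: the display order of the two models varies, as in a blind test.
votes = [
    ("model-x", "model-y", "model-x"),
    ("model-y", "model-x", "model-x"),
    ("model-x", "model-y", "model-y"),
    ("model-x", "model-y", "model-x"),
]
agreement = majority_agreement(votes)
print(agreement[("model-x", "model-y")])  # 3 of 4 votes agree -> 0.75
```

A low agreement share on a pair would point to the noise described above, while a high share suggests that voters converge despite their differing criteria.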