Are there any good benchmarks for chatbot memory capabilities?

For instance, MemGPT uses an "LLM judge" that is instructed to evaluate whether the generated response is consistent with the gold response, using the Multi-Session Chat (MSC) dataset: https://parl.ai/projects/msc/.
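A minimal sketch of this kind of LLM-judge scoring loop, assuming each test case is a (question, gold answer, generated answer) triple. The prompt wording, the CORRECT/WRONG verdict format, and the helper names are all hypothetical illustrations, not MemGPT's actual prompt or code. `call_llm` stands in for whatever model client you use.

```python
from typing import Callable, Sequence

# Hypothetical judge prompt; NOT MemGPT's actual wording.
JUDGE_TEMPLATE = (
    "You are grading a chatbot's answer against a reference answer.\n"
    "Question: {question}\n"
    "Reference (gold) answer: {gold}\n"
    "Generated answer: {generated}\n"
    "Reply with exactly one word: CORRECT if the generated answer is "
    "consistent with the reference answer, otherwise WRONG."
)


def build_judge_prompt(question: str, gold: str, generated: str) -> str:
    """Fill the hypothetical template with one evaluation example."""
    return JUDGE_TEMPLATE.format(question=question, gold=gold, generated=generated)


def parse_verdict(reply: str) -> bool:
    """Map the judge's reply to a boolean; anything not starting with CORRECT counts as wrong."""
    return reply.strip().upper().startswith("CORRECT")


def judge_accuracy(
    examples: Sequence[tuple[str, str, str]],
    call_llm: Callable[[str], str],
) -> float:
    """Fraction of (question, gold, generated) triples the judge marks as consistent."""
    if not examples:
        return 0.0
    verdicts = [parse_verdict(call_llm(build_judge_prompt(q, g, a))) for q, g, a in examples]
    return sum(verdicts) / len(verdicts)


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model: CORRECT if the gold text appears in the generated answer."""
    gold = prompt.split("Reference (gold) answer: ")[1].split("\n")[0]
    generated = prompt.split("Generated answer: ")[1].split("\n")[0]
    return "CORRECT" if gold.lower() in generated.lower() else "WRONG"


examples = [
    ("What is my dog's name?", "Rex", "Your dog is called Rex."),
    ("Where do I work?", "a bakery", "You work at a hospital."),
]
print(judge_accuracy(examples, fake_llm))  # 0.5
```

Passing the model call in as a function keeps the scoring logic independent of any particular API, so the same loop can be pointed at a real judge model or, as here, at a trivial stub for testing.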

Anything else?

submitted by /u/lorepieri