I’ve been setting up the foundations to add node summaries to Delta. Ideally, I will use the same model to create the node summaries as I use to generate the responses, since this keeps the model dependencies minimal. However, my early experiments have shown some inconsistency in how a shared prompt behaves across models. To try to understand this and smooth it out as much as possible, I plan to set up evals (sketched after the list below) to ensure the summaries:

  • stay under a certain length
  • don’t include direct references to the user or assistant
  • are a single sentence or fragment
  • include the relevant topic(s) discussed
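
As a rough illustration, here is a minimal sketch of what those checks could look like as plain Python functions. The character limit, the banned-word pattern, and the keyword-based topic check are all assumptions on my part, not the final criteria.

```python
import re

# Assumed threshold for illustration; the real limit is still undecided.
MAX_CHARS = 120

def within_length(summary: str, max_chars: int = MAX_CHARS) -> bool:
    """Summary stays under an assumed character budget."""
    return len(summary.strip()) <= max_chars

def no_direct_references(summary: str) -> bool:
    """Summary does not mention the user or assistant directly."""
    return re.search(r"\b(user|assistant)\b", summary, re.IGNORECASE) is None

def single_sentence_or_fragment(summary: str) -> bool:
    """At most one terminal punctuation mark: one sentence, or a fragment."""
    return len(re.findall(r"[.!?]", summary.strip())) <= 1

def mentions_topics(summary: str, expected_topics: list[str]) -> bool:
    """Naive keyword check: at least one expected topic appears in the summary."""
    lowered = summary.lower()
    return any(topic.lower() in lowered for topic in expected_topics)

# Checks that only need the summary text; the topic check takes extra input
# and is handled separately in the harness below.
CHECKS = {
    "length": within_length,
    "no_refs": no_direct_references,
    "one_sentence": single_sentence_or_fragment,
}
```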

I am currently looking for a straightforward, local option for running evals. I’d like to avoid cloud products, and testing across multiple models needs to be easy. My ideal output would be a matrix: one row per model, one column per eval check, showing how each model’s output fares on each check given the same system and user prompts.
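
To make the matrix concrete, here is a sketch of the harness I have in mind, reusing the check functions above. `generate_summary` is a hypothetical stand-in for however each model actually gets invoked locally, and the model names and prompts are dummy values.

```python
def generate_summary(model: str, system_prompt: str, user_prompt: str) -> str:
    """Hypothetical hook: swap in the real call to each local model here."""
    # Canned output so the harness runs end to end while I experiment.
    return "Discussion of setting up local evals for node summaries."

MODELS = ["model-a", "model-b", "model-c"]  # placeholder model names

def run_matrix(system_prompt: str, user_prompt: str, expected_topics: list[str]) -> None:
    """Print one row per model with pass/fail for each eval check."""
    print("\t".join(["model", *CHECKS, "topics"]))
    for model in MODELS:
        summary = generate_summary(model, system_prompt, user_prompt)
        row = [model]
        row += ["pass" if check(summary) else "fail" for check in CHECKS.values()]
        row.append("pass" if mentions_topics(summary, expected_topics) else "fail")
        print("\t".join(row))

run_matrix("You summarise conversation nodes.", "Summarise this exchange...", ["evals"])
```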