Meta released Llama 2 yesterday and the hype has ensued. While it’s exciting to see more powerful models become available, a model with weights is not the same as an API. It is still far less accessible.
A paper came out on the measurement of the degradation ChatGPT’s reasoning abilities. As real-time peer review took place over the course of the day on Twitter, the most compelling explanation that I heard to explain these findings was that OpenAI has further fine-tuned the models to respond in a manner consistent with the level of the prompt, because this is a better experience for the user. We’ll see if this explanation holds up over time.
A paper earlier this month investigated the ability of language models to use longer context supplied to them and finds that performance is the best for information at the beginning and the end of the context. If this is an architectural limitation of LLMs, we may be using semantic and contextual search for a while longer to ensure the model has the relevant context in a prompt of reasonable length, to ensure high quality results.