One of the greatest misconceptions concerning LLMs is the idea that they are easy to use. They really aren’t: getting great results out of them requires a great deal of experience and hard-fought intuition, combined with deep domain knowledge of the problem you are applying them to.

The whole “LLMs are useful” section hits for me. My experience is similar to Simon’s, and I also wouldn’t claim LLMs are without issues or controversy. For me they are unquestionably useful. They help me get ideas out of my head and into the real world more quickly and effectively. They help me learn faster, and they help me solve problems and answer questions as I’m learning. They increase my capabilities as an individual in the same sort of way that getting access to Google for the first time did. These days, not having access to a language model feels like having an essential tool taken away, like losing my calculator or documentation.

https://simonwillison.net/2024/Mar/22/claude-and-chatgpt-case-study/


The models claude-3-opus, claude-3-sonnet, and gpt-4 solved today’s Connections on their first tries. All but opus failed on subsequent tries.

gpt-3.5-turbo doesn’t seem to understand it is supposed to be playing the game, not me. groq-llama-2 got just 1/4 correct. mistral-large got 2/4.

As an intermediate player of this game, I would say today’s puzzle was on the easier side. The models don’t usually do so well.
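For anyone unfamiliar with the scoring above: Connections asks you to sort 16 words into 4 groups of 4, so “2/4” means the model identified two of the four groups. A minimal sketch of how that tally could be computed — the function, data shapes, and example words here are hypothetical, not my actual eval code:

```python
def score_groups(guessed: list[set[str]], solution: list[set[str]]) -> int:
    """Count how many guessed groups exactly match one of the solution groups."""
    return sum(1 for group in guessed if group in solution)

# Hypothetical example: a model that finds two of the four groups scores 2/4.
guessed = [
    {"apple", "banana", "cherry", "grape"},
    {"red", "blue", "green", "yellow"},
    {"dog", "cat", "fox", "owl"},
    {"run", "jump", "swim", "fly"},
]
solution = [
    {"apple", "banana", "cherry", "grape"},
    {"red", "blue", "green", "yellow"},
    {"dog", "cat", "bird", "fish"},
    {"run", "jump", "walk", "fly"},
]
print(score_groups(guessed, solution))  # 2
```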


The BOOX Palma is a phone-sized Android device with an E Ink display. I immediately wanted one when I saw it.


While working on a script that evaluates language models’ ability to play Connections, I was using Cursor to make some improvements to the code. On a whim, I decided to have it swap all my convenience print statements for log statements. I’ve done this on a smaller scale before, but with a ~200 line file I figured it would probably take the model longer than it would take me to do it myself. As I watched Cursor regenerate my file via inference and show a diff, I was struck by the idea that running complex, multi-pass LLM edits over a code base with targeted instructions may become the new “my code is compiling”.
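Roughly the kind of mechanical swap involved — a minimal sketch, with the function and messages made up for illustration rather than taken from my actual script:

```python
import logging

logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger(__name__)

def evaluate_guess(guess: list[str], solution: list[str]) -> bool:
    # Before: print(f"checking guess: {guess}")
    logger.debug("checking guess: %s", guess)
    correct = sorted(guess) == sorted(solution)
    # Before: print(f"guess correct: {correct}")
    logger.info("guess correct: %s", correct)
    return correct
```

Trivial on its own, but repeated consistently across a couple hundred lines it is exactly the kind of tedious, low-risk edit I’m happy to hand off and review as a diff.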


An infinitely zoomable Game of Life simulating itself. The simulation adjusts its speed as you zoom so that each level moves at a perceptible pace, letting you really appreciate it. Extremely well executed.

https://oimo.io/works/life/