I tried the Vision Pro today.
I had heard mixed reviews, mostly about how heavy it is and people getting headaches or vertigo.
Those challenges are real.
Even still, the experience was pretty incredible.
I might need to go back for another demo, so I can choose my own adventure rather than staying on the standard demo path.
The eye-tracking selection was natural and pretty effective.
I did find my eyes getting a bit tired, though, and eventually ended up with a mild headache.
The pinch-to-select was also quite good, but when I crossed my arms the device’s cameras couldn’t see my pinch and I had to put my hands back in my lap.
An Apple Watch could probably solve this somehow.
I added image support for the chat shortcode of this site.
assistant: The image depicts a LEGO chef figure in a playful, detailed kitchen setting. The chef, dressed in a white chef’s coat, apron, and white chef’s hat, appears to be cooking or serving food. The LEGO figure is smiling and holding utensils, with plates of colorful items like tomatoes in the foreground, resembling food. The background features a kitchen environment with soft, out-of-focus lighting and various kitchen elements, contributing to the warm and inviting atmosphere.
I tried stacking multiple pages of a PDF vertically into a single image and then asking a model to extract data from it.
It didn’t work.
I imagine this is because models aren’t trained on much data like this.
The inference seemed to output made-up data.
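A minimal sketch of the experiment, assuming pdf2image and Pillow for the stacking and an OpenAI-style vision endpoint for the extraction (those specific choices are illustrative):

```python
# Stack every page of a PDF into one tall image, then ask a model to extract data from it.
# Assumes pdf2image (needs poppler) and Pillow; the model/API choice is illustrative.
import base64
import io

from pdf2image import convert_from_path
from PIL import Image
from openai import OpenAI

pages = convert_from_path("document.pdf", dpi=150)

# Stack the rendered pages vertically into a single canvas.
width = max(p.width for p in pages)
height = sum(p.height for p in pages)
stacked = Image.new("RGB", (width, height), "white")
y = 0
for p in pages:
    stacked.paste(p, (0, y))
    y += p.height

buf = io.BytesIO()
stacked.save(buf, format="PNG")
b64 = base64.b64encode(buf.getvalue()).decode()

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",  # illustrative
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Extract the line items from this document as JSON."},
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```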
An interesting pitch written by Hillel for preferring reStructuredText to Markdown.
Multiple studies have shown that hallucinations can be significantly reduced by giving the model the right context via retrieval or tools that the model can use to gather context (e.g., web search).
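A minimal sketch of the idea, with the retrieval function and model name left as placeholders:

```python
# Give the model the right context instead of asking it to answer from memory.
# retrieve() is a placeholder for whatever search/retrieval is available.
from openai import OpenAI

client = OpenAI()

def answer_with_context(question: str, retrieve) -> str:
    snippets = retrieve(question)  # e.g. web search results or a vector-store lookup
    context = "\n\n".join(snippets)
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative
        messages=[
            {"role": "system", "content": "Answer using only the provided context. Say you don't know if the context is insufficient."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return response.choices[0].message.content
```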
I wrote (and screen-recorded myself building) a Python app that calls a model to extract structured data from an image, making heavy use of codegen with Cursor.
The same protobuf is used both as instructions in the prompt and to unpack the model’s result into an instance of the class generated from the protobuf via protoc.
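A rough sketch of the pattern, with the proto file, generated module, and field names all illustrative:

```python
# The .proto text doubles as the schema in the prompt; the model's JSON reply is
# parsed into the protoc-generated class. receipt.proto / receipt_pb2 are hypothetical,
# e.g. generated with `protoc --python_out=. receipt.proto`.
import json
from pathlib import Path

from google.protobuf import json_format
from openai import OpenAI

import receipt_pb2  # hypothetical generated module

client = OpenAI()

proto_text = Path("receipt.proto").read_text()
image_data_url = "data:image/png;base64,..."  # base64-encoded input image (placeholder)

response = client.chat.completions.create(
    model="gpt-4o",  # illustrative
    response_format={"type": "json_object"},
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": (
                "Extract the fields defined by this protobuf schema from the image. "
                "Reply with a single JSON object whose keys match the message fields.\n\n"
                + proto_text
            )},
            {"type": "image_url", "image_url": {"url": image_data_url}},
        ],
    }],
)

# Unpack the model's JSON into an instance of the generated class.
receipt = json_format.ParseDict(
    json.loads(response.choices[0].message.content), receipt_pb2.Receipt()
)
print(receipt)
```

ParseDict rejects keys that don’t match the schema by default, which is part of the appeal: malformed output fails loudly instead of silently.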
I’m planning to open source this pattern once I get it into a better state.
Also, I’m looking into ways to host the video of the screen recording for fun and to reference later.
This point resonates with me.
The more time I spend prompting models, the clearer it becomes that the clarity of the instructions is what matters most.
Writing clear, unambiguous instructions is not easy.
Decrease scope and you have a chance of doing it well.
I ran the code from my Fine-tuning “Connections” post using gpt-4o-mini.
I was hoping the results might be a bit better, which could motivate an effort to fine-tune the model.
I’m not sure where my original version of this code went, so I reconstructed a repo for it.
Once I was done, I ran 100 prompts through the model to get a sense of where its baseline performance was.
Correct: 2.00%
Incorrect: 98.00%
Total Categories Correct: 19.25%
Not great, and not much different from gpt-3.5-turbo.
With these kinds of results, I wasn’t particularly motivated to put in the effort to do more fine-tunes.
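The baseline run was roughly this shape (the prompts file and the scoring rule below are placeholders, not the exact code from the repo):

```python
# Send each Connections prompt to the model and score the reply.
# "connections_prompts.jsonl" and score() are placeholders for what the repo actually uses.
import json

from openai import OpenAI

client = OpenAI()

with open("connections_prompts.jsonl") as f:  # hypothetical file name
    puzzles = [json.loads(line) for line in f][:100]

def score(reply: str, solution: list[list[str]]) -> tuple[bool, int]:
    """A category counts if all four of its words appear together on one line of
    the reply; the puzzle counts if all four categories do."""
    lines = [line.lower() for line in reply.splitlines()]
    hits = sum(
        any(all(word.lower() in line for word in group) for line in lines)
        for group in solution
    )
    return hits == 4, hits

n_correct = 0
n_categories = 0
for puzzle in puzzles:
    reply = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": puzzle["prompt"]}],
    ).choices[0].message.content
    solved, hits = score(reply, puzzle["solution"])
    n_correct += solved
    n_categories += hits

print(f"Correct: {n_correct / len(puzzles):.2%}")
print(f"Total Categories Correct: {n_categories / (4 * len(puzzles)):.2%}")
```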
Tried to join in on the llama3.1-405b hype using Groq but sadly, no dice:
```bash
curl -X POST https://api.groq.com/openai/v1/chat/completions \
  -H "Authorization: Bearer $GROQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.1-405b-reasoning",
    "messages": [
      {
        "role": "user",
        "content": "Hello, how are you?"
      }
    ]
  }'
```

```json
{"error":{"message":"The model `llama-3.1-405b-reasoning` does not exist or you do not have access to it.","type":"invalid_request_error","code":"model_not_found"}}
```
The queue to try it out in their chat is also quite long, so I guess either the infra needs to scale up or the hype needs to die down.
I’ve been wanting to create a chat component for this site for a while, because I really don’t like quoting conversations and manually formatting them each time.
When using a model playground, there is usually a code snippet option that generates Python code you can copy out into a script.
Using that feature, I can now copy the message list and paste it as JSON into a Hugo shortcode and get results like this:
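The messages that get pasted come out of a playground export roughly like this one (the conversation contents and model name are placeholders):

```python
# Roughly what the playground's code-snippet export looks like; the messages list
# is the part that gets copied out and pasted into the chat shortcode as JSON.
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",  # whatever model the conversation used
    messages=[
        {"role": "user", "content": "Describe this image."},
        {"role": "assistant", "content": "The image depicts a LEGO chef figure..."},
        {"role": "user", "content": "What is the chef holding?"},
    ],
)
```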
espanso
Just to try something new, I tried out espanso for configuring text expansions rather than using Alfred.
This is the PR to add it to my Nix configurations.
The existing examples are a toy configuration.
The tool seems to support far more complex configuration that I still need to look into further.
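For context, espanso expansions live in YAML match files along these lines (the triggers below are made-up examples, not the ones in the PR):

```yaml
# A match file like match/base.yml; triggers here are made-up examples.
matches:
  - trigger: ":addr"
    replace: "123 Example Street, Springfield"
  - trigger: ":shrug"
    replace: "¯\\_(ツ)_/¯"
```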
Incredible writing and insight by Linus in Synthesizer for thought.
I will probably need to revisit this work several times.