2024-07-22

[logs] July 22, 2024

I’ve been wanting to create a chat component for this site for a while, because I really don’t like quoting conversations and manually formatting them each time. When using a model playground, usually there is a code snippet option that generates Python code you can copy out intro a script. Using that feature, I can now copy the message list and paste it as JSON into a Hugo shortcode and get results like this:

2024-07-21

[logs] July 21, 2024

espanso

I tried out adding espanso to configure text expansions rather than using Alfred just to try something new. This is the PR to add it to my Nix configurations. The existing examples are a toy configuration. The tool seems to support far more complex configuration that I still need to look into further.

`gpt-4o-mini`

people frame this like it’s somehow a win over llama, when in fact the goal of llama has wildly succeeded: commoditize models and drive token cost to zero
Read More…

2024-07-20

[logs] July 20, 2024

Incredible writing and insight by Linus in Synthesizer for thought. I will probably need to revisit this work several times.

2024-07-19

[logs] July 19, 2024

How can I add videos to Google Gemini as context (is this even what their newest model is called anymore) and why is it so hard to figure it out? https://gemini.google.com only let’s me upload images. I assume I need to pay for something.

I played around with Cohere’s chat. They support web search and calculator and a python interpreter as tools as well as files and an internet search connector. I added “Web Search with Site” pointed to Steph Curry’s stats on Basketball Reference. Then I prompted

2024-07-16

[logs] July 16, 2024

Research and experimentation with models presents different problems than I am used to dealing with on a daily basis. The structure of what you want to try out changes often, so I understand why some folks prefer to use notebooks. Personally, notebooks haven’t caught on for my so I’m still just writing scripts. Several times now, I’ve run a relatively lengthy (and expensive) batch of prompts through a model only to realize something about my setup wasn’t quite right. Definitely bringing back memories of finetuning gpt-3.5-turbo to play Connections but I’m learning a lot along the way.

2024-07-14

[logs] July 14, 2024

I spent some more time experimenting with thought partnership with language models. I’ve previously experimented with this idea when building write-partner. Referring back to this work, the prompts still seemed pretty effective for the goal at hand. My original idea was to incrementally construct and iterate on a document by having a conversation with a language model. A separate model would analyze that conversation and update the working draft of the document to include new information, thoughts or insights from the conversation. It worked reasonably with gpt-3.5-turbo. I’m eager to try it with claude-3.5-sonnet. Today, I rebuilt a small version of this idea with ollama in Python. The goal was to try the idea out focused on a local-first experience. For this, I used a smaller model. Initially, I tried mistral but ended up settling on llama3 as it was a bit better at following my instructions. Instead of writing to the working file after each conversation turn, I decided to add a done command that allowed me to do that on-demand.

2024-07-13

[logs] July 13, 2024

claude-3.5-sonnet

While I didn’t have much success getting gpt-4o to perform Task 1 - Counting Line Intersection from the Vision Language Models Are Blind paper, I pulled down some code and did a bit of testing with Claude 3.5 Sonnet. The paper reports the following success rate for Sonnet for this line intersection task:

Thickness	Sonnet 3.5
2	80.00
3	79.00
4	73.00
Average	77.33

I used the code from the paper to generate 30 similar images with line thickness 4 of intersecting (or not) lines. I chose a thickness of 4 because this was the worst performing thickness according to the paper. With these classified manually (I didn’t realize the configurations.json file already had ground truths in it), I ran the prompt from the paper against these images using Sonnet.

2024-07-11

[logs] July 11, 2024

We probably are living in a simulation and we’re probably about to create the next one.
Martin Casado

https://podcasts.apple.com/us/podcast/invest-like-the-best-with-patrick-oshaughnessy/id1154105909?i=1000661628717

2024-07-10

[logs] July 10, 2024

VLMs are Blind showed a number of interesting cases where vision language models fail to solve problems that humans can easily solve. I spent some time trying to build examples with additional context that could steer the model to correctly complete Task 1: Counting line intersections, but didn’t have much success.

2024-07-08

[logs] July 8, 2024

Kent wrote this post on how to engage an audience by switching the first and second slide of a presentation. The audience focuses more as they try to fill in the gaps of what you’ve introduced them to so far.