How can I add videos to Google Gemini as context (is that even what their newest model is called anymore), and why is it so hard to figure out? https://gemini.google.com only lets me upload images. I assume I need to pay for something.


I played around with Cohere’s chat. They support web search, a calculator, and a Python interpreter as tools, as well as file uploads and an internet search connector. I added “Web Search with Site” pointed at Steph Curry’s stats on Basketball Reference. Then I prompted it.

Research and experimentation with models presents different problems than I am used to dealing with on a daily basis. The structure of what you want to try out changes often, so I understand why some folks prefer to use notebooks. Personally, notebooks haven’t caught on for me, so I’m still just writing scripts. Several times now, I’ve run a relatively lengthy (and expensive) batch of prompts through a model only to realize something about my setup wasn’t quite right. It definitely brings back memories of finetuning gpt-3.5-turbo to play Connections, but I’m learning a lot along the way.

2024-07-14

I spent some more time experimenting with thought partnership with language models. I’ve previously explored this idea when building write-partner. Referring back to that work, the prompts still seemed pretty effective for the goal at hand. My original idea was to incrementally construct and iterate on a document by having a conversation with a language model. A separate model would analyze that conversation and update the working draft of the document to include new information, thoughts, or insights from the conversation. It worked reasonably well with gpt-3.5-turbo. I’m eager to try it with claude-3.5-sonnet. Today, I rebuilt a small version of this idea with ollama in Python. The goal was to try the idea out with a focus on a local-first experience. For this, I used a smaller model. Initially, I tried mistral but ended up settling on llama3 as it was a bit better at following my instructions. Instead of writing to the working file after each conversation turn, I added a done command that let me do it on-demand.
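Roughly, the loop looked like the sketch below. This is a reconstruction rather than the exact script: it assumes a local ollama server at its default port and talks to its HTTP chat API directly, and the update prompt wording is my own placeholder.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # assumes a local ollama server
DRAFT_PATH = "draft.md"

def chat(messages: list[dict]) -> str:
    """One non-streaming chat turn against the local llama3 model."""
    body = json.dumps({"model": "llama3", "messages": messages, "stream": False}).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"content-type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["message"]["content"]

def build_update_prompt(draft: str, conversation: list[dict]) -> str:
    """Prompt for the second pass that folds the conversation into the draft."""
    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in conversation)
    return (
        "Update the working draft to include any new information, thoughts, "
        f"or insights from the conversation.\n\nDraft:\n{draft}\n\nConversation:\n{transcript}"
    )

def main() -> None:
    conversation: list[dict] = []
    draft = ""
    while True:
        user_input = input("> ").strip()
        if user_input == "done":  # on-demand: rewrite the working file only when asked
            draft = chat([{"role": "user", "content": build_update_prompt(draft, conversation)}])
            with open(DRAFT_PATH, "w") as f:
                f.write(draft)
            continue
        conversation.append({"role": "user", "content": user_input})
        reply = chat(conversation)
        conversation.append({"role": "assistant", "content": reply})
        print(reply)
```

Keeping the draft-update step behind the done command means only one extra model call per checkpoint instead of one per turn.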

While I didn’t have much success getting gpt-4o to perform Task 1 - Counting Line Intersection from the Vision Language Models Are Blind paper, I pulled down some code and did a bit of testing with Claude 3.5 Sonnet. The paper reports the following success rate for Sonnet for this line intersection task:

| Thickness | Sonnet 3.5 |
|-----------|------------|
| 2         | 80.00      |
| 3         | 79.00      |
| 4         | 73.00      |
| Average   | 77.33      |

I used the code from the paper to generate 30 similar images of intersecting (or not) lines with line thickness 4. I chose a thickness of 4 because it was the worst-performing thickness according to the paper. After classifying these manually (I didn’t realize the configurations.json file already had ground truths in it), I ran the prompt from the paper against these images using Sonnet.
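The run itself was a small script along these lines. This is a reconstruction, not the exact code: the prompt text below is a placeholder rather than the paper’s actual wording, and it assumes PNG images and an ANTHROPIC_API_KEY in the environment, calling the Anthropic Messages HTTP API directly.

```python
import base64
import json
import os
import urllib.request

# Placeholder prompt -- the real run used the prompt from the paper.
PROMPT = "How many times do the two lines intersect? Answer with a single number."

def ask_sonnet(image_path: str) -> str:
    """Send one image plus the prompt to claude-3-5-sonnet and return its answer."""
    with open(image_path, "rb") as f:
        data = base64.b64encode(f.read()).decode()
    body = json.dumps({
        "model": "claude-3-5-sonnet-20240620",
        "max_tokens": 50,
        "messages": [{"role": "user", "content": [
            {"type": "image",
             "source": {"type": "base64", "media_type": "image/png", "data": data}},
            {"type": "text", "text": PROMPT},
        ]}],
    }).encode()
    req = urllib.request.Request(
        "https://api.anthropic.com/v1/messages",
        data=body,
        headers={
            "x-api-key": os.environ["ANTHROPIC_API_KEY"],
            "anthropic-version": "2023-06-01",
            "content-type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["content"][0]["text"].strip()

def accuracy(predictions: list[str], labels: list[str]) -> float:
    """Percent of answers that match the manual labels."""
    correct = sum(p == l for p, l in zip(predictions, labels))
    return 100 * correct / len(labels)
```

Scoring the 30 answers against the manual labels is then just a call to accuracy().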

2024-07-10

VLMs are Blind showed a number of interesting cases where vision language models fail to solve problems that humans can easily solve. I spent some time trying to build examples with additional context that could steer the model to correctly complete Task 1: Counting line intersections, but didn’t have much success.

2024-07-08

Kent wrote this post on how to engage an audience by switching the first and second slide of a presentation. The audience focuses more as they try to fill in the gaps of what you’ve introduced them to so far.

I’ve been chatting with qwen2, a model from Alibaba. I mostly chatted with it in English, but it appears to support several other languages. I noticed a bit of Chinese leaking through even though I don’t speak it, so I’m not sure how it got introduced into the conversation.

user: do you have a name

assistant: As an AI, I don’t have personal names or identities like humans do. However, you can call me Assistant to easily refer to me in our conversation.

I was inspired by Daniel’s post to add sidenotes to this blog. I used claude-3.5-sonnet to generate the CSS and Hugo HTML shortcode to do this, and I was impressed with how well it turned out. It was almost too easy; now I need to read the CSS in more detail to understand what Claude did. I’m not the most competent CSS writer and I had never written a Hugo shortcode before. In several turns with Sonnet in Cursor, I was able to create a basic styled shortcode for a sidenote that initially appeared as a superscript number. I prompted the model to let me use in-line content as the shortcode anchor, and it generated those modifications. Then I hit an issue where the sidenote content overflowed on the right side if the anchor was too far right, and most of the time on mobile, so I asked the model for some options. It suggested showing the sidenote content below the main post content. I liked that, but didn’t like how revealing the sidenote shifted the content, so I prompted it to show the sidenote above the rest of the content.

2024-07-02

A nice read by Stuart on Python development tools. It introduced me to the pyproject.toml configuration file, which is more comprehensive than a requirements file. I’ll need to research it a bit more before I’m ready to confidently adopt it.
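For a sense of what “more comprehensive” means: where requirements.txt only pins dependencies, pyproject.toml also carries project metadata, the build backend, and optional extras in one file. A minimal sketch (the project name and versions below are made up):

```toml
[project]
name = "example-project"        # hypothetical project
version = "0.1.0"
description = "A small example"
requires-python = ">=3.9"
dependencies = [
    "requests>=2.31",
]

[project.optional-dependencies]
dev = ["pytest"]

[build-system]
requires = ["setuptools>=61"]
build-backend = "setuptools.build_meta"
```

With this in place, `pip install .` installs the project and its dependencies, and `pip install ".[dev]"` adds the dev extras.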


Claude’s character

A video about the personality of the AI, Claude. I haven’t become a big “papers” person yet, so this was my first introduction to “Constitutional AI”, a training approach where you use the model to help train itself by having it evaluate its own responses against the principles with which it was trained.