How can I add videos to Google Gemini as context (is this even what their newest model is called anymore), and why is it so hard to figure out?
https://gemini.google.com only lets me upload images.
I assume I need to pay for something.
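If it turns out the consumer app just can’t do it, my best guess (unverified, so treat this as an assumption) is that the developer API accepts video through a file upload. A rough sketch with the google-generativeai Python client, where the file name and model string are placeholders:

```python
import time
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder key

# Upload the video, then wait for server-side processing to finish.
video = genai.upload_file(path="clip.mp4")  # hypothetical file
while video.state.name == "PROCESSING":
    time.sleep(5)
    video = genai.get_file(video.name)

# Pass the processed file alongside a text prompt as context.
model = genai.GenerativeModel("gemini-1.5-pro")
response = model.generate_content([video, "Summarize what happens in this video."])
print(response.text)
```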
I played around with Cohere’s chat.
They support web search, a calculator, and a Python interpreter as tools, as well as file uploads and an internet search connector.
I added “Web Search with Site” pointed to Steph Curry’s stats on Basketball Reference.
Then I prompted
Research and experimentation with models presents different problems than I am used to dealing with on a daily basis.
The structure of what you want to try out changes often, so I understand why some folks prefer to use notebooks.
Personally, notebooks haven’t caught on for me, so I’m still just writing scripts.
Several times now, I’ve run a relatively lengthy (and expensive) batch of prompts through a model only to realize something about my setup wasn’t quite right.
Definitely bringing back memories of finetuning gpt-3.5-turbo to play Connections but I’m learning a lot along the way.
I spent some more time experimenting with thought partnership with language models.
I’ve previously experimented with this idea when building write-partner.
Referring back to this work, the prompts still seemed pretty effective for the goal at hand.
My original idea was to incrementally construct and iterate on a document by having a conversation with a language model.
A separate model would analyze that conversation and update the working draft of the document to include new information, thoughts or insights from the conversation.
It worked reasonably well with gpt-3.5-turbo.
I’m eager to try it with claude-3.5-sonnet.
Today, I rebuilt a small version of this idea with ollama in Python.
The goal was to try the idea out with a focus on a local-first experience.
For this, I used a smaller model.
Initially, I tried mistral but ended up settling on llama3, as it was a bit better at following my instructions.
Instead of writing to the working file after each conversation turn, I decided to add a done command that allowed me to do that on-demand.
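A minimal sketch of the idea, assuming the ollama Python client; the prompts and draft file name here are placeholders, not my exact script:

```python
from pathlib import Path
import ollama

draft_path = Path("draft.md")  # placeholder path for the working document
history = []

while True:
    user_input = input("> ")
    if user_input.strip() == "done":
        # On demand, fold the conversation so far into the working draft
        # instead of rewriting the file after every turn.
        draft = draft_path.read_text() if draft_path.exists() else ""
        transcript = "\n".join(f"{m['role']}: {m['content']}" for m in history)
        update = ollama.chat(
            model="llama3",
            messages=[{
                "role": "user",
                "content": (
                    f"Here is the current draft:\n{draft}\n\n"
                    f"Here is our conversation:\n{transcript}\n\n"
                    "Rewrite the draft to incorporate any new information, "
                    "thoughts, or insights from the conversation."
                ),
            }],
        )
        draft_path.write_text(update["message"]["content"])
        continue

    history.append({"role": "user", "content": user_input})
    reply = ollama.chat(model="llama3", messages=history)
    history.append({"role": "assistant", "content": reply["message"]["content"]})
    print(reply["message"]["content"])
```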
While I didn’t have much success getting gpt-4o to perform Task 1 - Counting Line Intersections from the Vision Language Models Are Blind paper, I pulled down some code and did a bit of testing with Claude 3.5 Sonnet.
The paper reports the following success rate for Sonnet for this line intersection task:
| Thickness | Sonnet 3.5 (%) |
|---|---|
| 2 | 80.00 |
| 3 | 79.00 |
| 4 | 73.00 |
| Average | 77.33 |
I used the code from the paper to generate 30 similar images of intersecting (or non-intersecting) lines with a line thickness of 4.
I chose a thickness of 4 because this was the worst performing thickness according to the paper.
With these classified manually (I didn’t realize the configurations.json file already had ground truths in it), I ran the prompt from the paper against these images using Sonnet.
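Roughly, the evaluation loop looks like the sketch below, assuming the Anthropic Python SDK; the image directory, prompt wording, and model string are placeholders rather than the paper’s exact setup:

```python
import base64
from pathlib import Path
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Placeholder prompt; the paper uses its own exact wording.
PROMPT = "How many times do the two lines in this image intersect? Answer with a single number."

for image_path in sorted(Path("images").glob("*.png")):  # hypothetical output directory
    image_data = base64.standard_b64encode(image_path.read_bytes()).decode("utf-8")
    message = client.messages.create(
        model="claude-3-5-sonnet-20240620",
        max_tokens=64,
        messages=[{
            "role": "user",
            "content": [
                {"type": "image",
                 "source": {"type": "base64", "media_type": "image/png", "data": image_data}},
                {"type": "text", "text": PROMPT},
            ],
        }],
    )
    print(image_path.name, message.content[0].text)
```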
We probably are living in a simulation and we’re probably about to create the next one.
Martin Casado
https://podcasts.apple.com/us/podcast/invest-like-the-best-with-patrick-oshaughnessy/id1154105909?i=1000661628717
VLMs are Blind showed a number of interesting cases where vision language models fail to solve problems that humans can easily solve.
I spent some time trying to build examples with additional context that could steer the model to correctly complete Task 1: Counting line intersections, but didn’t have much success.
Kent wrote this post on how to engage an audience by switching the first and second slides of a presentation.
The audience focuses more as they try to fill in the gaps of what you’ve introduced them to so far.
I’ve been chatting with qwen2, a model from Alibaba.
I mostly chatted with it in English, but it appears to support several other languages. I noticed a bit of Chinese leaking through even though I don’t speak it, so I’m not sure how I would have introduced it to the conversation.
user: do you have a name
assistant: As an AI, I don’t have personal names or identities like humans do. However, you can call me Assistant to easily refer to me in our conversation.
I was inspired by Daniel’s post to add sidenotes to this blog.
I used claude-3.5-sonnet to generate the CSS and HTML shortcode to do this.
I was impressed by how well it turned out.
Now I need to read the CSS in more detail to understand what Claude did.
It was almost too easy.
I’m not the most competent CSS writer and I had never written a Hugo shortcode before.
In several turns with Sonnet in Cursor, I was able to create a basic styled shortcode for a sidenote that appeared as a superscript number to start.
I prompted the model to allow me to use content in-line as the shortcode anchor and it generated those modifications.
Then I had an issue with the content overflowing on the right side when the anchor was too far to the right (and most of the time on mobile), so I asked the model for some options.
It suggested showing the sidenote content below the main post content.
I liked that, but didn’t like how it was shifting the content when it revealed the sidenote, so I prompted it to show the sidenote above the rest of the content.
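I won’t paste the generated code here, but the shortcode itself ends up being only a few lines. This is a simplified sketch rather than Claude’s exact output; the file path, class names, and parameter handling are my own guesses, and the numbering and positioning live in the CSS:

```html
<!-- layouts/shortcodes/sidenote.html — simplified sketch, not the exact generated code -->
<span class="sidenote-wrapper">
  <span class="sidenote-anchor">{{ .Get "anchor" }}</span>
  <span class="sidenote-content">{{ .Inner }}</span>
</span>
```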
A nice read by Stuart on Python development tools.
This introduced me to the pyproject.toml configuration file, which is more comprehensive than a requirements file.
It’s something I’ll need to research a bit more before I’m ready to confidently adopt it.
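For my own future reference, a minimal example (not from Stuart’s post; the project name and tool settings are made up) shows why it’s more than a dependency list: metadata, dependencies, build settings, and tool configuration all live in one file.

```toml
[project]
name = "example-project"        # hypothetical project metadata
version = "0.1.0"
requires-python = ">=3.11"
dependencies = [
    "requests>=2.31",           # what requirements.txt would normally hold
]

[build-system]
requires = ["setuptools>=68"]
build-backend = "setuptools.build_meta"

[tool.ruff]                     # per-tool configuration can live here too
line-length = 100
```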
Claude’s character
A video about the personality of the AI, Claude.
I’ve not become a big “papers” person yet, so this was my first introduction to “Constitutional AI”, a training approach where you use the model to train itself by having it evaluate its own responses against the principles with which it was trained.