I tried the Vision Pro today.
I had heard mixed reviews, mostly about how heavy it is and people getting headaches or vertigo.
Those challenges are real.
Even still, the experience was pretty incredible.
I might need to go back for another demo, so I can choose my own adventure rather than staying on the standard demo path.
The eye-tracking selection was natural and pretty effective.
I did find my eyes getting a bit tired, though, and eventually ended up with a mild headache.
The pinch-to-select was also quite good, but when I crossed my arms the device’s cameras couldn’t see my pinch and I had to put my hands back in my lap.
An Apple Watch could probably solve this somehow.
I added image support for the chat shortcode of this site.
assistant: The image depicts a LEGO chef figure in a playful, detailed kitchen setting. The chef, dressed in a white chef’s coat, apron, and white chef’s hat, appears to be cooking or serving food. The LEGO figure is smiling and holding utensils, with plates of colorful items like tomatoes in the foreground, resembling food. The background features a kitchen environment with soft, out-of-focus lighting and various kitchen elements, contributing to the warm and inviting atmosphere.
I tried stacking multiple pages of a PDF vertically into a single image and then asking a model to extract data from it.
It didn’t work.
I imagine this is because models aren’t trained on much data like this.
The inference seemed to output made-up data.
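A minimal sketch of the experiment, assuming pdf2image and Pillow for the stacking and an OpenAI-style vision endpoint for the extraction (those specific choices are illustrative):

```python
# Stack every page of a PDF into one tall image, then ask a model to extract data from it.
# Assumes pdf2image (needs poppler) and Pillow; the model/API choice is illustrative.
import base64
import io

from pdf2image import convert_from_path
from PIL import Image
from openai import OpenAI

pages = convert_from_path("document.pdf", dpi=150)

# Stack the rendered pages vertically into a single canvas.
width = max(p.width for p in pages)
height = sum(p.height for p in pages)
stacked = Image.new("RGB", (width, height), "white")
y = 0
for p in pages:
    stacked.paste(p, (0, y))
    y += p.height

buf = io.BytesIO()
stacked.save(buf, format="PNG")
b64 = base64.b64encode(buf.getvalue()).decode()

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",  # illustrative
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Extract the line items from this document as JSON."},
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```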
An interesting pitch written by Hillel for preferring reStructuredText to Markdown.
Multiple studies have shown that hallucinations can be significantly reduced by giving the model the right context via retrieval or tools that the model can use to gather context (e.g., web search).
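A minimal sketch of the idea, with the retrieval function and model name left as placeholders:

```python
# Give the model the right context instead of asking it to answer from memory.
# retrieve() is a placeholder for whatever search/retrieval is available.
from openai import OpenAI

client = OpenAI()

def answer_with_context(question: str, retrieve) -> str:
    snippets = retrieve(question)  # e.g. web search results or a vector-store lookup
    context = "\n\n".join(snippets)
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative
        messages=[
            {"role": "system", "content": "Answer using only the provided context. Say you don't know if the context is insufficient."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return response.choices[0].message.content
```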
I wrote (and screen-recorded myself building) a Python app that calls a model to extract structured data from an image, making heavy use of codegen with Cursor.
The same protobuf is used both as instructions in the prompt and to unpack the model’s result into an instance of the class generated from the protobuf via protoc.
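A rough sketch of the pattern, with the proto file, generated module, and field names all illustrative:

```python
# The .proto text doubles as the schema in the prompt; the model's JSON reply is
# parsed into the protoc-generated class. receipt.proto / receipt_pb2 are hypothetical,
# e.g. generated with `protoc --python_out=. receipt.proto`.
import json
from pathlib import Path

from google.protobuf import json_format
from openai import OpenAI

import receipt_pb2  # hypothetical generated module

client = OpenAI()

proto_text = Path("receipt.proto").read_text()
image_data_url = "data:image/png;base64,..."  # base64-encoded input image (placeholder)

response = client.chat.completions.create(
    model="gpt-4o",  # illustrative
    response_format={"type": "json_object"},
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": (
                "Extract the fields defined by this protobuf schema from the image. "
                "Reply with a single JSON object whose keys match the message fields.\n\n"
                + proto_text
            )},
            {"type": "image_url", "image_url": {"url": image_data_url}},
        ],
    }],
)

# Unpack the model's JSON into an instance of the generated class.
receipt = json_format.ParseDict(
    json.loads(response.choices[0].message.content), receipt_pb2.Receipt()
)
print(receipt)
```

ParseDict rejects keys that don’t match the schema by default, which is part of the appeal: malformed output fails loudly instead of silently.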
I’m planning to open source this pattern once I get it into a better state.
Also, I’m looking into ways to host the video of the screen recording for fun and to reference later.
This point resonates with me.
The more time I spend prompting models, the clearer it becomes that the clarity of the instructions is what matters most.
Writing clear, unambiguous instructions is not easy.
Decrease scope and you have a chance of doing it well.
I ran the code from my Fine-tuning “Connections” post using gpt-4o-mini.
I was hoping the results might be a bit better, which could motivate an effort to fine-tune the model.
I’m not sure where my original version of this code went, so I reconstructed a repo for it.
Once I was done, I ran 100 prompts through the model to get a sense of where its baseline performance was.
Correct: 2.00%
Incorrect: 98.00%
Total Categories Correct: 19.25%
Not great, and not much different from gpt-3.5-turbo.
With these kinds of results, I wasn’t particularly motivated to put in the effort to do more fine-tunes.
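The baseline run was roughly this shape (the prompts file and the scoring rule below are placeholders, not the exact code from the repo):

```python
# Send each Connections prompt to the model and score the reply.
# "connections_prompts.jsonl" and score() are placeholders for what the repo actually uses.
import json

from openai import OpenAI

client = OpenAI()

with open("connections_prompts.jsonl") as f:  # hypothetical file name
    puzzles = [json.loads(line) for line in f][:100]

def score(reply: str, solution: list[list[str]]) -> tuple[bool, int]:
    """A category counts if all four of its words appear together on one line of
    the reply; the puzzle counts if all four categories do."""
    lines = [line.lower() for line in reply.splitlines()]
    hits = sum(
        any(all(word.lower() in line for word in group) for line in lines)
        for group in solution
    )
    return hits == 4, hits

n_correct = 0
n_categories = 0
for puzzle in puzzles:
    reply = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": puzzle["prompt"]}],
    ).choices[0].message.content
    solved, hits = score(reply, puzzle["solution"])
    n_correct += solved
    n_categories += hits

print(f"Correct: {n_correct / len(puzzles):.2%}")
print(f"Total Categories Correct: {n_categories / (4 * len(puzzles)):.2%}")
```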
Tried to join in on the llama3.1-405b hype using Groq but sadly, no dice:
```bash
curl -X POST https://api.groq.com/openai/v1/chat/completions \
  -H "Authorization: Bearer $GROQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.1-405b-reasoning",
    "messages": [
      {
        "role": "user",
        "content": "Hello, how are you?"
      }
    ]
  }'
```

```json
{"error":{"message":"The model `llama-3.1-405b-reasoning` does not exist or you do not have access to it.","type":"invalid_request_error","code":"model_not_found"}}
```
The queue to try it out in their chat is also quite long, so I guess either the infra needs to scale up or the hype needs to die down.
I’ve been wanting to create a chat component for this site for a while, because I really don’t like quoting conversations and manually formatting them each time.
When using a model playground, there is usually a code snippet option that generates Python code you can copy out into a script.
Using that feature, I can now copy the message list and paste it as JSON into a Hugo shortcode and get results like this:
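The messages that get pasted come out of a playground export roughly like this one (the conversation contents and model name are placeholders):

```python
# Roughly what the playground's code-snippet export looks like; the messages list
# is the part that gets copied out and pasted into the chat shortcode as JSON.
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",  # whatever model the conversation used
    messages=[
        {"role": "user", "content": "Describe this image."},
        {"role": "assistant", "content": "The image depicts a LEGO chef figure..."},
        {"role": "user", "content": "What is the chef holding?"},
    ],
)
```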
espanso
Just to try something new, I tried out espanso for configuring text expansions rather than using Alfred.
This is the PR to add it to my Nix configurations.
The existing examples are a toy configuration.
The tool seems to support far more complex configuration that I still need to look into further.
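For context, espanso expansions live in YAML match files along these lines (the triggers below are made-up examples, not the ones in the PR):

```yaml
# A match file like match/base.yml; triggers here are made-up examples.
matches:
  - trigger: ":addr"
    replace: "123 Example Street, Springfield"
  - trigger: ":shrug"
    replace: "¯\\_(ツ)_/¯"
```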
Incredible writing and insight by Linus in Synthesizer for thought.
I will probably need to revisit this work several times.