2024-08-18

[logs] August 18, 2024

I can't believe we're back to discussing LLMs' ability to reason. Where have you been these past two years? In a bunker? If you'd actually worked with LLMs during this time, you'd know by now that they're obviously pattern-matching machines. Try asking one to write incorrect… pic.twitter.com/KPcDCI2cjD
— Andriy Burkov (@burkov) August 18, 2024

I haven’t viewed the LLMs-can, LLMs-can’t discourse through this lens explicitly.

they’re obviously pattern-matching machines

I’m not sure if I understand at what point these are different things. Maybe it’s a consequence of how I learn, but I generally develop skills on the foundations of seeing and understanding how someone more skilled than myself solves a problem.

2024-08-12

[logs] August 12, 2024

huggingface

I tried to run florence-2 and colpali using the Hugging Face serverless inference API. Searching around, there seems to be pretty early support for image-text-to-text models. On GitHub, I only found a few projects that even reference these types of models.

I didn’t really know what I was doing, so I copied the example code then tried to use a model to augment it to call florence-2. Initially, it seemed like it was working:

2024-08-11

[logs] August 11, 2024

embeddings

I’ve been doing some experimentation with smaller models and embeddings, including distilbert/distilbert-base-uncased-finetuned-sst-2-english and cardiffnlp/twitter-roberta-base-sentiment-latest as binary sentiment classifiers and google/vit-base-patch16-224 as an image classifier.

Also GoogleNews-vectors-negative300 and fasttext-wiki-news-subwords-300 for embeddings to try and find semantically similar words and concepts.
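As a rough illustration of what "finding semantically similar words" means with embeddings like these, here is a minimal sketch of nearest-neighbor lookup by cosine similarity. The three-dimensional `TOY_VECTORS` table and the `most_similar` helper are made up for the example; the real models use 300-dimensional vectors loaded from disk, and this is not the code from my experiments.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def most_similar(word, vectors, top_n=3):
    """Rank every other word in `vectors` by cosine similarity to `word`."""
    query = vectors[word]
    scored = [
        (other, cosine_similarity(query, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Hypothetical toy embeddings, only here to make the sketch runnable.
TOY_VECTORS = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.85, 0.9, 0.15],
    "apple": [0.1, 0.2, 0.95],
    "pear": [0.15, 0.1, 0.9],
}

if __name__ == "__main__":
    print(most_similar("king", TOY_VECTORS, top_n=2))
```

With real embeddings the lookup is the same idea, just over a much larger vocabulary and usually with a vectorized or indexed search instead of a Python loop.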

2024-08-09

[logs] August 9, 2024

I figured out the issue with adding mistral-large. After a bit of debugging, I realized by manually calling llm_mistral.refresh_models() that something was wrong with how I had added the secret on Modal. It turns out the environment variable name for the Mistral API key needed to be LLM_MISTRAL_KEY. I’m going to try and make a PR to the repo to document this behavior.
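A small guard like the following would have surfaced the problem sooner. It is only a sketch: the `mistral_key_configured` helper is hypothetical, and the only detail taken from my debugging is the variable name `LLM_MISTRAL_KEY`.

```python
import os

# The variable name the Mistral key needed to be stored under.
MISTRAL_KEY_ENV_VAR = "LLM_MISTRAL_KEY"


def mistral_key_configured(environ=None):
    """Return True if a non-empty key is present under the expected name.

    `environ` defaults to the process environment; passing a dict makes
    the check easy to exercise without touching real secrets.
    """
    environ = os.environ if environ is None else environ
    return bool(environ.get(MISTRAL_KEY_ENV_VAR, "").strip())


if __name__ == "__main__":
    if not mistral_key_configured():
        print(f"{MISTRAL_KEY_ENV_VAR} is not set; check the secret's variable name.")
```

Running a check like this at container startup makes a misnamed secret fail loudly instead of showing up later as a missing model.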

I’ve been trying to run models locally. Most specifically colpali and florence-2. This has not been easy. It’s possible these require GPUs and might not be macOS friendly. I’ve ended up deep in GitHub threads and dependency hell trying to get basic inference running. I might need to start with something simpler and smaller and build up from there.

2024-08-08

[logs] August 8, 2024

I did some experimentation deriving a data model iteratively (something I am currently calling “data model distillation”) by sequentially passing multiple images (could work with text as well) to a language model and prompting it to improve the schema using any new learnings from the current image. Results so far have been unimpressive.
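The loop itself is simple. Here is a minimal sketch of its shape, assuming the schema is a plain dict of field names to type names. `distill_schema` and `fake_propose_schema` are hypothetical names, and the fake function stands in for the actual model call, which would take the current schema plus an image and return a revised schema.

```python
def distill_schema(images, propose_schema, initial_schema=None):
    """Refine a schema by showing the model one image at a time.

    `propose_schema(schema, image)` is expected to return an improved
    schema given the current one and the next image.
    """
    schema = dict(initial_schema or {})
    for image in images:
        schema = propose_schema(schema, image)
    return schema


def fake_propose_schema(schema, image):
    """Stand-in for the model call (an assumption for this sketch).

    Pretends the model 'sees' the fields listed on the image and adds
    any that the schema does not already contain.
    """
    updated = dict(schema)
    for field, type_name in image["fields"].items():
        updated.setdefault(field, type_name)
    return updated


if __name__ == "__main__":
    images = [
        {"fields": {"total": "float"}},
        {"fields": {"total": "float", "vendor": "str"}},
    ]
    print(distill_schema(images, fake_propose_schema))
```

Keeping the model call behind a function argument makes it easy to swap models or prompts without changing the loop.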

I’ve been hearing good things about mistral-large-2. I’m working on adding it to bots-doing-things but have had a bit of dependency trouble so far.

2024-08-06

[logs] August 6, 2024

I watched Jeremy Howard’s interview with Carson Gross, the author of htmx. As someone who learned my first bits of web dev with jQuery, I feel like I appreciate the foundations of the approach in the library, but am still early in fully developing my mental model. Jeremy built a Python wrapper on top of htmx called FastHTML, and the combination of these technologies is pretty well aligned with the technology I like to work with.

2024-08-04

[logs] August 4, 2024

vision_pro

I tried the Vision Pro today. I had heard mixed reviews, mostly about how heavy it is and people getting headaches or vertigo. Those challenges are real. Even still, the experience was pretty incredible. I might need to go back for another demo, so I can choose my own adventure rather than staying on the standard demo path.

The eye tracking selection was natural and pretty effective. I did find my eyes getting a bit tired though and eventually got a bit of a headache. The pinch to select was also quite good, though I found myself wanting to cross my arms and then the device cameras couldn’t see my pinching and I had to put my hands back on my lap. An Apple Watch could probably solve this somehow.

2024-08-02

[logs] August 2, 2024

I added image support for the chat shortcode of this site.

gpt-4o

user

Describe this image

assistant

The image depicts a LEGO chef figure in a playful, detailed kitchen setting. The chef, dressed in a white chef’s coat, apron, and white chef’s hat, appears to be cooking or serving food. The LEGO figure is smiling and holding utensils, with plates of colorful items like tomatoes in the foreground, resembling food. The background features a kitchen environment with soft, out-of-focus lighting and various kitchen elements, contributing to the warm and inviting atmosphere.

2024-07-31

[logs] July 31, 2024

I tried stacking multiple pages of a PDF vertically into a single image, passing it to a model, and then doing data extraction from it. It didn’t work. I imagine this is because models aren’t trained on much data like this. The model seemed to output made-up data.

An interesting pitch written by Hillel for preferring reStructuredText to Markdown.

Multiple studies have shown that hallucinations can be significantly reduced by giving the model the right context via retrieval or tools that the model can use to gather context (e.g., web search).

2024-07-30

[logs] July 30, 2024

protobuf

I wrote and screen-recorded myself building a Python app to call a model to extract structured data from an image, making heavy use of codegen with Cursor. The same protobuf is used as instructions in the prompt and to unpack the result returned by the model into an instance of the class generated from the protobuf via protoc. I’m planning to open source this pattern once I get it into a better state. Also, I’m looking into ways to host the video of the screen recording for fun and to reference later.