The following is the notebook I used to experiment with training an image model to classify types of rowing shells (with people rowing them) and then to classify the same dataset by rowing technique (sweep vs. scull). There are a few cells that output a batch of the data. I decided not to include these because the rowers in those images didn’t ask to be on my website. I’ll keep this in mind when selecting future datasets, since showing the data batches in the notebook/post is helpful for understanding what is going on.
I set out to do a project using my learnings from the first chapter of the fast.ai course. My first idea was to try and train a Ruby/Python classifier. ResNets are not designed to do this, but I was curious how well one would perform.

Classifying images of source code by language

The plan was to download a bunch of source code from GitHub, sort it by language, then convert it to images with Carbon.
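As a rough sketch of what that training step could look like with fastai (assuming the Carbon screenshots end up sorted into per-language folders; the paths and hyperparameters here are placeholders, not what I actually ran):

```python
from fastai.vision.all import *

# Assumes Carbon-rendered screenshots saved under code_images/ruby and code_images/python
path = Path("code_images")
dls = ImageDataLoaders.from_folder(path, valid_pct=0.2, item_tfms=Resize(224))

# Fine-tune a pretrained ResNet on the two-class dataset
learn = vision_learner(dls, resnet34, metrics=error_rate)
learn.fine_tune(3)
```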
I’ve enjoyed using fasthtml to deploy small, easily hosted webpages for little apps I’ve been building. I’m still getting used to it, but it takes almost no effort at all to deploy. Recently, I built an app that would benefit from having a loading spinner upon submitting a form, but I couldn’t quite figure out how I would do that with htmx in FastHTML, so I built a small project to experiment with various approaches.
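One of the approaches relies on htmx’s hx-indicator attribute; here’s a minimal sketch of that idea (the routes, spinner image path, and fake delay are placeholders I made up for illustration, not my actual app):

```python
import time
from fasthtml.common import *

app, rt = fast_app()

@rt("/")
def get():
    return Titled(
        "Spinner demo",
        Form(
            Input(name="prompt", placeholder="Type something"),
            Button("Submit"),
            # htmx keeps .htmx-indicator elements hidden until a request referencing them is in flight
            Img(src="/static/spinner.gif", id="spinner", cls="htmx-indicator"),
            hx_post="/submit",
            hx_target="#result",
            hx_indicator="#spinner",
        ),
        Div(id="result"),
    )

@rt("/submit")
def post(prompt: str):
    time.sleep(2)  # stand-in for slow work, e.g. a language model call
    return Div(f"You said: {prompt}", id="result")

serve()
```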
I revisited Eugene’s excellent work, “Prompting Fundamentals and How to Apply Them Effectively”. From this I learned about the ability to prefill Claude’s responses. Using this technique, you can quickly get Claude to output JSON without any negotiation and avoid issues with leading codefences (e.g. ```json). While JSON isn’t as good an example as XML, which ends less ambiguously, here’s a quick script showing the concept:
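A minimal sketch of the idea with the anthropic Python SDK (the model name and prompt are placeholders; the important part is the prefilled assistant turn):

```python
import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-3-5-sonnet-latest",  # placeholder model name
    max_tokens=512,
    messages=[
        {"role": "user", "content": "List three types of rowing shells as a JSON array of strings."},
        # Prefilling the assistant turn with "[" (or "{") forces the reply to continue as raw JSON,
        # skipping any preamble or ```json codefence.
        {"role": "assistant", "content": "["},
    ],
)

# The response continues from the prefill, so prepend it to get the full JSON
print("[" + message.content[0].text)
```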
One challenge I’ve continued to have is figuring out how to use the models on Huggingface. There are usually Python snippets to “run” models that often seem to require GPUs and always seem to run into some sort of issue when I try to install the various Python dependencies. Today, I learned how to run model inference on a Mac with an M-series chip using llama-cpp and a gguf file built from safetensors files on Huggingface.
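Once you have a .gguf file (converted from the safetensors weights with llama.cpp’s conversion script, or downloaded pre-quantized), inference with the llama-cpp-python bindings looks roughly like this; the model path and settings are placeholders:

```python
from llama_cpp import Llama

# Load a quantized GGUF; n_gpu_layers=-1 offloads every layer to Metal on Apple Silicon
llm = Llama(
    model_path="./models/mistral-7b-instruct-v0.2.Q4_K_M.gguf",  # placeholder path
    n_ctx=4096,
    n_gpu_layers=-1,
)

response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "In one sentence, what is a GGUF file?"}],
)
print(response["choices"][0]["message"]["content"])
```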
I’ve been experimenting with FastHTML for making quick demo apps, often involving language models. It’s a pretty simple but powerful framework that lets me deploy a client and server in a single main.py – something I appreciate a lot for little projects I want to ship quickly. I currently use it the way you might use streamlit. I ran into an issue where I was struggling to submit a form with multiple images.
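For context, here’s a rough sketch of the shape of that setup (not necessarily my eventual fix): a multi-file input submitted via htmx with multipart encoding, and a handler that reads the files off the Starlette request. The route names and markup are made up for illustration.

```python
from fasthtml.common import *
from starlette.requests import Request

app, rt = fast_app()

@rt("/")
def get():
    return Titled(
        "Upload images",
        Form(
            # multiple=True lets the user pick several files under the same field name
            Input(type="file", name="images", multiple=True),
            Button("Upload"),
            hx_post="/upload",
            hx_target="#result",
            hx_encoding="multipart/form-data",
        ),
        Div(id="result"),
    )

@rt("/upload")
async def post(request: Request):
    form = await request.form()
    files = form.getlist("images")  # each entry is a starlette UploadFile
    sizes = [len(await f.read()) for f in files]
    return Div(f"Received {len(files)} image(s), sizes in bytes: {sizes}", id="result")

serve()
```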
I spent a bit of time configuring WezTerm to my liking. This exercise was similar to rebuilding my iTerm setup in Alacritty. I found WezTerm to be more accessible and strongly appreciated the built-in terminal multiplexing because I don’t like using tmux. I configured WezTerm to provide the following experience. Getting this working probably took me 30 minutes spread across a few sessions as I noticed things I was missing.
In Python, the most straightforward path to implementing a gRPC server for a Protobuf service is to use protoc to generate code that can be imported in a server, which then defines the service logic. Let’s take a simple example Protobuf service:

```proto
syntax = "proto3";

package simple;

message HelloRequest {
  string name = 1;
}

message HelloResponse {
  string message = 1;
}

service Greeter {
  rpc SayHello (HelloRequest) returns (HelloResponse);
}
```

Next, we run some variant of python -m grpc_tools.protoc to generate the client and server stubs.
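For instance, an invocation like `python -m grpc_tools.protoc -I. --python_out=. --grpc_python_out=. simple.proto` (the exact flags depend on your layout) produces `simple_pb2.py` and `simple_pb2_grpc.py`. A minimal server wired up from those generated modules might look like this; the greeting text and port are placeholders:

```python
from concurrent import futures

import grpc

# Modules generated by grpc_tools.protoc from simple.proto (assumed file names)
import simple_pb2
import simple_pb2_grpc


class Greeter(simple_pb2_grpc.GreeterServicer):
    def SayHello(self, request, context):
        # Build the response from the request's name field
        return simple_pb2.HelloResponse(message=f"Hello, {request.name}!")


def serve():
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=4))
    simple_pb2_grpc.add_GreeterServicer_to_server(Greeter(), server)
    server.add_insecure_port("[::]:50051")
    server.start()
    server.wait_for_termination()


if __name__ == "__main__":
    serve()
```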
Temporal provides helpful primitives called Workflows and Activities for orchestrating processes. A common pattern I’ve found useful is the ability to run multiple “child workflows” in parallel from a single “parent” workflow. Let’s say we have the following activity and workflow (imports omitted for brevity).

Activity code

```python
@dataclass
class MyGoodActivityArgs:
    arg1: str
    arg2: str


@dataclass
class MyGoodActivityResult:
    arg1: str
    arg2: str
    random_val: float


@activity.defn
async def my_good_activity(args: MyGoodActivityArgs) -> MyGoodActivityResult:
    activity.logger.info("running my_good_activity")  # assumed: the original body is truncated here
    return MyGoodActivityResult(arg1=args.arg1, arg2=args.arg2, random_val=random.random())
```
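The workflow side is cut off in the excerpt above, so here is a hedged sketch of how the parent/child fan-out might look with the Temporal Python SDK; the workflow names, timeout, and child workflow IDs are my own placeholders, and it assumes the activity definitions above are importable:

```python
import asyncio
from datetime import timedelta

from temporalio import workflow


@workflow.defn
class MyChildWorkflow:
    @workflow.run
    async def run(self, args: MyGoodActivityArgs) -> MyGoodActivityResult:
        # Each child workflow just runs the activity once
        return await workflow.execute_activity(
            my_good_activity,
            args,
            start_to_close_timeout=timedelta(seconds=30),
        )


@workflow.defn
class MyParentWorkflow:
    @workflow.run
    async def run(self, batches: list[MyGoodActivityArgs]) -> list[MyGoodActivityResult]:
        # Kick off one child workflow per input, then wait for all of them in parallel
        coros = [
            workflow.execute_child_workflow(
                MyChildWorkflow.run,
                args,
                id=f"my-child-workflow-{i}",
            )
            for i, args in enumerate(batches)
        ]
        return list(await asyncio.gather(*coros))
```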
I spent some time experimenting with multi-modal models (also called vision models on the ollama site) to see how they perform. You can try these out with the CLI via ollama run <model>, but I opted to use the ollama Python client. I didn’t find explicit documentation in the README on how to pass images to the model, but the type hints in the code made it pretty easy to figure out and there are several examples around GitHub.
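Here’s roughly what that looks like with the ollama Python package; the model name and image path are placeholders:

```python
import ollama

# Image file paths go in the "images" key of a chat message
response = ollama.chat(
    model="llava",  # placeholder vision model
    messages=[
        {
            "role": "user",
            "content": "Describe what is happening in this photo.",
            "images": ["./photos/regatta.jpg"],
        }
    ],
)
print(response["message"]["content"])
```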