The following is the notebook I used to experiment with training an image model to classify types of rowing shells (with people rowing them) and then to classify the same dataset by rowing technique (sweep vs. scull). There are a few cells that output a batch of the data. I decided not to include these because the rowers in those images didn’t ask to be on my website. I’ll keep this in mind when selecting future datasets, since showing the data batches in the notebook/post is helpful for understanding what is going on.
I set out to do a project using my learnings from the first chapter of the fast.ai course. My first idea was to try and train a Ruby/Python classifier. ResNets are not designed to do this, but I was curious how well one would perform.

Classifying images of source code by language

The plan was to download a bunch of source code from GitHub, sort it by language, then convert it to images with Carbon.
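As a rough sketch of what that training step could look like with fastai (assuming the Carbon screenshots end up sorted into per-language folders; the paths and hyperparameters here are placeholders, not what I actually ran):

```python
from fastai.vision.all import *

# Assumes Carbon-rendered screenshots saved under code_images/ruby and code_images/python
path = Path("code_images")
dls = ImageDataLoaders.from_folder(path, valid_pct=0.2, item_tfms=Resize(224))

# Fine-tune a pretrained ResNet on the two-class dataset
learn = vision_learner(dls, resnet34, metrics=error_rate)
learn.fine_tune(3)
```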
I’ve enjoyed using fasthtml to deploy small, easily hosted webpages for little apps I’ve been building. I’m still getting used to it, but it takes almost no effort at all to deploy. Recently, I built an app that would benefit from having a loading spinner upon submitting a form, but I couldn’t quite figure out how I would do that with htmx in FastHTML, so I built a small project to experiment with various approaches.
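One of the approaches relies on htmx’s hx-indicator attribute; here’s a minimal sketch of that idea (the routes, spinner image path, and fake delay are placeholders I made up for illustration, not my actual app):

```python
import time
from fasthtml.common import *

app, rt = fast_app()

@rt("/")
def get():
    return Titled(
        "Spinner demo",
        Form(
            Input(name="prompt", placeholder="Type something"),
            Button("Submit"),
            # htmx keeps .htmx-indicator elements hidden until a request referencing them is in flight
            Img(src="/static/spinner.gif", id="spinner", cls="htmx-indicator"),
            hx_post="/submit",
            hx_target="#result",
            hx_indicator="#spinner",
        ),
        Div(id="result"),
    )

@rt("/submit")
def post(prompt: str):
    time.sleep(2)  # stand-in for slow work, e.g. a language model call
    return Div(f"You said: {prompt}", id="result")

serve()
```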
I revisited Eugene’s excellent work, “Prompting Fundamentals and How to Apply Them Effectively”. From this I learned about the ability to prefill Claude’s responses. Using this technique, you can quickly get Claude to output JSON without any negotiation and avoid issues with leading codefences (e.g. ```json). While JSON isn’t as good an example as XML, which ends less ambiguously, here’s a quick script showing the concept:
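A minimal sketch of the idea with the anthropic Python SDK (the model name and prompt are placeholders; the important part is the prefilled assistant turn):

```python
import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-3-5-sonnet-latest",  # placeholder model name
    max_tokens=512,
    messages=[
        {"role": "user", "content": "List three types of rowing shells as a JSON array of strings."},
        # Prefilling the assistant turn with "[" (or "{") forces the reply to continue as raw JSON,
        # skipping any preamble or ```json codefence.
        {"role": "assistant", "content": "["},
    ],
)

# The response continues from the prefill, so prepend it to get the full JSON
print("[" + message.content[0].text)
```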
One challenge I’ve continued to have is figuring out how to use the models on Huggingface. There are usually Python snippets to “run” models that often seem to require GPUs and always seem to run into some sort of issue when I try to install the various Python dependencies. Today, I learned how to run model inference on a Mac with an M-series chip using llama-cpp and a gguf file built from safetensors files on Huggingface.
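Once you have a .gguf file (converted from the safetensors weights with llama.cpp’s conversion script, or downloaded pre-quantized), inference with the llama-cpp-python bindings looks roughly like this; the model path and settings are placeholders:

```python
from llama_cpp import Llama

# Load a quantized GGUF; n_gpu_layers=-1 offloads every layer to Metal on Apple Silicon
llm = Llama(
    model_path="./models/mistral-7b-instruct-v0.2.Q4_K_M.gguf",  # placeholder path
    n_ctx=4096,
    n_gpu_layers=-1,
)

response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "In one sentence, what is a GGUF file?"}],
)
print(response["choices"][0]["message"]["content"])
```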
I’ve been experimenting with FastHTML for making quick demo apps, often involving language models. It’s a pretty simple but powerful framework that lets me deploy a client and server in a single main.py – something I appreciate a lot for little projects I want to ship quickly. I currently use it the way you might use streamlit. I ran into an issue where I was struggling to submit a form with multiple images.
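For context, here’s a rough sketch of the shape of that setup (not necessarily my eventual fix): a multi-file input submitted via htmx with multipart encoding, and a handler that reads the files off the Starlette request. The route names and markup are made up for illustration.

```python
from fasthtml.common import *
from starlette.requests import Request

app, rt = fast_app()

@rt("/")
def get():
    return Titled(
        "Upload images",
        Form(
            # multiple=True lets the user pick several files under the same field name
            Input(type="file", name="images", multiple=True),
            Button("Upload"),
            hx_post="/upload",
            hx_target="#result",
            hx_encoding="multipart/form-data",
        ),
        Div(id="result"),
    )

@rt("/upload")
async def post(request: Request):
    form = await request.form()
    files = form.getlist("images")  # each entry is a starlette UploadFile
    sizes = [len(await f.read()) for f in files]
    return Div(f"Received {len(files)} image(s), sizes in bytes: {sizes}", id="result")

serve()
```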
I spent a bit of time configuring WezTerm to my liking. This exercise was similar to rebuilding my iTerm setup in Alacritty. I found WezTerm to be more accessible and strongly appreciated the built-in terminal multiplexing because I don’t like using tmux. I configured WezTerm to provide the following experience. Getting this working probably took me 30 minutes spread across a few sessions as I noticed things I was missing.
In Python, the most straightforward path to implementing a gRPC server for a Protobuf service is to use protoc to generate code that can be imported in a server, which then defines the service logic. Let’s take a simple example Protobuf service:

```proto
syntax = "proto3";

package simple;

message HelloRequest {
  string name = 1;
}

message HelloResponse {
  string message = 1;
}

service Greeter {
  rpc SayHello (HelloRequest) returns (HelloResponse);
}
```

Next, we run some variant of python -m grpc_tools.protoc to generate the client and server stubs.
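For instance, an invocation like `python -m grpc_tools.protoc -I. --python_out=. --grpc_python_out=. simple.proto` (the exact flags depend on your layout) produces `simple_pb2.py` and `simple_pb2_grpc.py`. A minimal server wired up from those generated modules might look like this; the greeting text and port are placeholders:

```python
from concurrent import futures

import grpc

# Modules generated by grpc_tools.protoc from simple.proto (assumed file names)
import simple_pb2
import simple_pb2_grpc


class Greeter(simple_pb2_grpc.GreeterServicer):
    def SayHello(self, request, context):
        # Build the response from the request's name field
        return simple_pb2.HelloResponse(message=f"Hello, {request.name}!")


def serve():
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=4))
    simple_pb2_grpc.add_GreeterServicer_to_server(Greeter(), server)
    server.add_insecure_port("[::]:50051")
    server.start()
    server.wait_for_termination()


if __name__ == "__main__":
    serve()
```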
Temporal provides helpful primitives called Workflows and Activities for orchestrating processes. A common pattern I’ve found useful is the ability to run multiple “child workflows” in parallel from a single “parent” workflow. Let’s say we have the following activity and workflow (imports omitted for brevity).

Activity code

```python
@dataclass
class MyGoodActivityArgs:
    arg1: str
    arg2: str


@dataclass
class MyGoodActivityResult:
    arg1: str
    arg2: str
    random_val: float


@activity.defn
async def my_good_activity(args: MyGoodActivityArgs) -> MyGoodActivityResult:
    activity.logger.info("running my_good_activity")  # assumed: the original body is truncated here
    return MyGoodActivityResult(arg1=args.arg1, arg2=args.arg2, random_val=random.random())
```
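The workflow side is cut off in the excerpt above, so here is a hedged sketch of how the parent/child fan-out might look with the Temporal Python SDK; the workflow names, timeout, and child workflow IDs are my own placeholders, and it assumes the activity definitions above are importable:

```python
import asyncio
from datetime import timedelta

from temporalio import workflow


@workflow.defn
class MyChildWorkflow:
    @workflow.run
    async def run(self, args: MyGoodActivityArgs) -> MyGoodActivityResult:
        # Each child workflow just runs the activity once
        return await workflow.execute_activity(
            my_good_activity,
            args,
            start_to_close_timeout=timedelta(seconds=30),
        )


@workflow.defn
class MyParentWorkflow:
    @workflow.run
    async def run(self, batches: list[MyGoodActivityArgs]) -> list[MyGoodActivityResult]:
        # Kick off one child workflow per input, then wait for all of them in parallel
        coros = [
            workflow.execute_child_workflow(
                MyChildWorkflow.run,
                args,
                id=f"my-child-workflow-{i}",
            )
            for i, args in enumerate(batches)
        ]
        return list(await asyncio.gather(*coros))
```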
I spent some time experimenting with multi-modal models (also called vision models on the ollama site) to see how they perform. You can try these out with the CLI via ollama run <model>, but I opted to use the ollama Python client. I didn’t find explicit documentation in the README on how to pass images to the model, but the type hints in the code made it pretty easy to figure out and there are several examples around GitHub.
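Here’s roughly what that looks like with the ollama Python package; the model name and image path are placeholders:

```python
import ollama

# Image file paths go in the "images" key of a chat message
response = ollama.chat(
    model="llava",  # placeholder vision model
    messages=[
        {
            "role": "user",
            "content": "Describe what is happening in this photo.",
            "images": ["./photos/regatta.jpg"],
        }
    ],
)
print(response["message"]["content"])
```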