I’ve been experimenting with FastHTML for making quick demo apps, often involving language models.
It’s a pretty simple but powerful framework, which allows me to deploy a client and server in a single main.py
– something I appreciate a lot for little projects I want to ship quickly.
I currently use it the way you might use streamlit.
I ran into an issue where I was struggling to submit a form with multiple images.
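For reference, here is a minimal sketch of the kind of multi-image form I was wrestling with, assuming FastHTML's Starlette underpinnings; the route names and handler shape are illustrative rather than the exact code from my app. The detail that matters is reading the raw form and using getlist so every file comes through.

from fasthtml.common import *
from starlette.requests import Request

app, rt = fast_app()

@rt("/")
def get():
    # A form with a multi-file input; enctype is required for file uploads.
    return Form(
        Input(type="file", name="images", multiple=True, accept="image/*"),
        Button("Upload"),
        enctype="multipart/form-data", method="post", action="/upload",
    )

@rt("/upload")
async def post(req: Request):
    # Read the raw form and use getlist so all files arrive, not just the first.
    form = await req.form()
    images = form.getlist("images")
    sizes = [len(await f.read()) for f in images]
    return P(f"Received {len(images)} images totalling {sum(sizes)} bytes")

serve()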
I spent a bit of time configuring WezTerm to my liking.
This exercise was similar to rebuilding my iTerm setup in Alacritty.
I found WezTerm to be more accessible, and I strongly appreciated the built-in terminal multiplexing because I don’t like using tmux.
Getting this working probably took me 30 minutes spread across a few sessions as I noticed things I was missing. I configured WezTerm to provide the following experience:
- Monokai-like theme
- Horizontal and vertical pane splitting
- Dimmed inactive panes
- Steady cursor
- Immediate pane closing with confirmation if something is still running
- Pane full screening
- Command+arrow navigation between panes
- Command+option+arrow navigation between tabs
- Moving between words in the command prompt with option+arrow
- Hotkey to clear terminal
What went well
I found achieving this configuration to be much easier in WezTerm than in Alacritty, or at least it took me less time.
The blend of native UI with dotfile-style configurable settings hits a sweet spot for my preferences as well, and I haven’t even scratched the surface of scripting things with Lua.
I’ve done some experimentation extracting structured data from documents using VLMs.
A summary of one approach I’ve tried can be found in my repo, impulse.
I’ve found using Protobufs to be a relatively effective approach for extracting values from documents.
The high-level idea is that you write a Protobuf as your target data model, then use that Protobuf itself as most of the prompt (I really need a name for this, as I reference the concept so frequently).
I discussed the approach in more detail in this post, so I am going to jump right into it.
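As a rough illustration of the idea (not the exact code from impulse; the message, model, and helper names here are made up), the prompt is mostly the .proto definition itself, plus an instruction to emit JSON that conforms to it:

import json
from openai import OpenAI

INVOICE_PROTO = """
syntax = "proto3";

message Invoice {
  string vendor = 1;
  string invoice_number = 2;
  double total = 3;
}
"""

client = OpenAI()

def extract_invoice(document_text: str) -> dict:
    # The Protobuf definition doubles as the data-model description in the prompt.
    prompt = (
        "Extract the fields of the following Protobuf message from the document.\n\n"
        f"{INVOICE_PROTO}\n"
        "Respond with a single JSON object whose keys match the message's field names.\n\n"
        f"Document:\n{document_text}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},
    )
    return json.loads(resp.choices[0].message.content)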
I’ve been prompting models to output JSON for about as long as I’ve been using models.
Since text-davinci-003, getting valid JSON out of OpenAI’s models didn’t seem like that big of a challenge, but maybe I wasn’t seeing the long tails of misbehavior because I hadn’t massively scaled up a use case.
As adoption has picked up, OpenAI has released features to make it easier to get JSON output from a model.
Here are three examples using structured outputs, function calling, and just prompting, respectively.
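The exact snippets aren’t reproduced here, but the following is a compact, illustrative sketch of the three approaches; the person schema, model IDs, and function name are assumptions for the sake of the example.

import json
from openai import OpenAI

client = OpenAI()
PERSON_SCHEMA = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
    "required": ["name", "age"],
    "additionalProperties": False,
}
TEXT = "Ada Lovelace was 36 when she died."

# 1. Structured outputs: the model is constrained to the supplied JSON schema.
structured = client.chat.completions.create(
    model="gpt-4o-2024-08-06",
    messages=[{"role": "user", "content": f"Extract the person from: {TEXT}"}],
    response_format={"type": "json_schema",
                     "json_schema": {"name": "person", "schema": PERSON_SCHEMA, "strict": True}},
)
print(json.loads(structured.choices[0].message.content))

# 2. Function calling: the schema is attached to a tool and the model "calls" it.
called = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": f"Extract the person from: {TEXT}"}],
    tools=[{"type": "function",
            "function": {"name": "record_person", "parameters": PERSON_SCHEMA}}],
    tool_choice={"type": "function", "function": {"name": "record_person"}},
)
print(json.loads(called.choices[0].message.tool_calls[0].function.arguments))

# 3. Just prompting: ask for JSON and parse whatever comes back.
prompted = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user",
               "content": f"Extract the person from: {TEXT}\n"
                          'Respond with only a JSON object with keys "name" and "age".'}],
)
print(json.loads(prompted.choices[0].message.content))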
In light of OpenAI releasing structured output in the model API, let’s move output structuring another level up the stack to the microservice/RPC level.
A light intro to Protobufs
Many services (mostly in microservice land) use Protocol Buffers (protobufs) to establish contracts for what data an RPC requires and what it will return.
If you’re completely unfamiliar with protobufs, you can read up on them here.
Here is an example of a message that a protobuf service might return.
In Python, the most straightforward path to implementing a gRPC server for a Protobuf service is to use protoc
to generate code that can be imported in a server, which then defines the service logic.
Let’s take a simple example Protobuf service:
syntax = "proto3";

package simple;

message HelloRequest {
  string name = 1;
}

message HelloResponse {
  string message = 1;
}

service Greeter {
  rpc SayHello (HelloRequest) returns (HelloResponse);
}
Next, we run some variant of python -m grpc_tools.protoc to generate code (assuming we’ve installed grpcio and grpcio-tools). Here’s an example for .proto files in a protos folder:
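The exact command isn’t shown here, but one plausible variant (with assumed output paths) can be run from Python via grpc_tools.protoc, which is equivalent to invoking python -m grpc_tools.protoc with the same arguments:

from grpc_tools import protoc

# Generates simple_pb2.py (messages) and simple_pb2_grpc.py (service stubs)
# from protos/simple.proto into the current directory.
protoc.main([
    "grpc_tools.protoc",
    "-Iprotos",
    "--python_out=.",
    "--grpc_python_out=.",
    "protos/simple.proto",
])

With the generated modules in place, a minimal server sketch for the Greeter service above might look like this (the port and worker count are arbitrary choices):

from concurrent import futures
import grpc
import simple_pb2
import simple_pb2_grpc

class Greeter(simple_pb2_grpc.GreeterServicer):
    def SayHello(self, request, context):
        # The service logic lives here; everything else is generated.
        return simple_pb2.HelloResponse(message=f"Hello, {request.name}!")

server = grpc.server(futures.ThreadPoolExecutor(max_workers=4))
simple_pb2_grpc.add_GreeterServicer_to_server(Greeter(), server)
server.add_insecure_port("[::]:50051")
server.start()
server.wait_for_termination()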
Using models daily for various purposes has been a satisfying endeavor for me because they can be used as tools to help make your vision for something come to life.
Models are powerful generators that can produce code, writing, images and more based on a user’s description of what they want.
But models “fill in the gaps” on behalf of the user to resolve ambiguity in the user’s prompt.
I attempted to reproduce the results for one task from the VLMs are Blind paper.
Specifically, Task 1: Counting line intersections.
I ran 150 examples of lines generated by the code from the project with line thickness 4.
I started with the prompt “How many times do the blue and red lines intersect?” using the model claude-3.5-sonnet with temperature 0. The paper reported 73.00% correctness for claude-3.5-sonnet with thickness 4.
Reproducing the experiment
My results were a bit better than those reported in the paper.
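For reference, each example can be sent to the model with a call along these lines; the snippet below is a sketch rather than my exact harness, and the model snapshot ID and file handling are assumptions.

import base64
import anthropic

client = anthropic.Anthropic()
PROMPT = "How many times do the blue and red lines intersect?"

def count_intersections(image_path: str) -> str:
    # Encode the generated line plot as base64 and attach it to the message.
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()
    message = client.messages.create(
        model="claude-3-5-sonnet-20240620",  # assumed claude-3.5-sonnet snapshot
        max_tokens=100,
        temperature=0,
        messages=[{
            "role": "user",
            "content": [
                {"type": "image",
                 "source": {"type": "base64", "media_type": "image/png", "data": image_b64}},
                {"type": "text", "text": PROMPT},
            ],
        }],
    )
    return message.content[0].text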
I’m trying something a bit new, writing some of my thoughts about how the future might look based on patterns I’ve been observing lately.
From where I’m sitting, it seems language models are positioned to become an indispensable tool for the software engineer.
While model-driven agents continue to advance and can autonomously accomplish software-creation tasks of increasing complexity, it’s not clear whether, or how long it will take for, these agents to completely replace the job function responsible for writing, testing, deploying, and maintaining software in a production environment.
If we accept the premise that the job of software engineer will exist in some capacity for the next several years, it becomes interesting to explore how widespread use of language models in software development will affect the job and evolution of the field.
I spent some time working with Claude Artifacts for the first time.
I started with this prompt:
I want to see what you can do. Can you please create a 2d rendering of fluid moving around obstacles of different shapes?
I need to figure out a better way to share conversations from all the different models I interact with, including multi-modal models. Ideally, I could export these in a consistent JSON structure to make rendering them in a standard conversation format easier. Static media (images, video, etc.) would be straightforward, but things like Artifacts, which are rendered by the Claude UI, fit this structure less cohesively. In an effort to not spend this whole post quoting prompts, I’ve exported the whole conversation returned from the Anthropic API, using the response from the following endpoint (an API that may change in the future and isn’t externally documented).