A new thing I am trying is sending thanks to folks who write articles or build projects that I find useful. I got a taste of this after writing an article on fine-tuning gpt-3.5 to solve the Connections word game, even though the results didn’t turn out that well. Getting that positive feedback was quite motivating, and my hope is to show others the same appreciation for the positive impact their work has on me.
I’m currently working on building a language-model-based chatbot that can answer questions about the contents of a database. There are a lot of products and libraries tackling this problem. To start, I tried out the Vanna.ai open source library. I followed this guide to get started with ChromaDB for the indices and OpenAI as the language model, querying a Postgres database. I set up the Postgres database with Docker and loaded it with the Chinook dataset, which I downloaded for Postgres from this repo. The dataset is described in detail here. To start Postgres in Docker and load the data, I ran a couple of commands from my host machine (not inside a Docker container).
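A minimal sketch of that setup, assuming a plain postgres image and the Chinook SQL script from the repo above (the container name, password, port, and file name are all placeholders):

```bash
# Start a throwaway Postgres container (name, password, and port are placeholders)
docker run --name chinook-pg -e POSTGRES_PASSWORD=postgres -p 5432:5432 -d postgres

# Load the Chinook schema and data from the downloaded SQL script
# (the exact file name and the database it creates depend on the repo)
PGPASSWORD=postgres psql -h localhost -U postgres -f Chinook_PostgreSql.sql
```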
For several days now, I’ve been looking into recording audio in a browser and streaming it to a backend over a websocket, with the intent of doing speech-to-text transcription with an AI model.
I know the pieces are all there and I’ve done something like this before (streamed audio from a Twilio IVR to a Node backend, then sent that to a Google Dialogflow CX agent).
The current challenge is finding which pieces I want to connect.
I’ve used a lot of Next.js lately.
I like the developer experience.
It’s enjoyable to use for building frontends.
It also has route handlers, which are backend functions that run on Lambda when you deploy to Vercel.
These route handlers can’t really support a websocket backend because they aren’t designed to be long-lived, something I learned when I worked around it by creating a secondary route handler as an async function.
Apparently, these can now run for up to five minutes on Vercel, an increase from the previous limit that allows more complex operations to be handled directly within route handlers.
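To make that concrete, here’s roughly what one looks like; if I have the config right, the maxDuration segment export is how you opt into the longer limit on Vercel (the route path and payload below are made up):

```ts
// app/api/process/route.ts: an illustrative Next.js route handler
export const maxDuration = 300; // opt in to the longer duration limit on Vercel (seconds)

export async function POST(request: Request) {
  const { audioUrl } = await request.json();

  // Longer-running work can happen here, but the function still has to return
  // within the limit; it can't hold a websocket open for an ongoing stream.
  return Response.json({ ok: true, received: audioUrl });
}
```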
To support a websocket, though, I would still need to stand up a separate backend.
That seemed fair enough, so I started looking at Deno, which I’ve also used recently and enjoyed.
Deno supports websockets out of the box.
It also supports importing npm modules – I plan to use @google-cloud/speech to do speech-to-text conversion.
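Roughly, I’m picturing a websocket endpoint in Deno that forwards incoming audio chunks to Google’s streaming recognizer. Here’s a sketch; I haven’t verified that the gRPC-based @google-cloud/speech client runs cleanly under Deno’s npm compatibility, credentials are assumed to come from GOOGLE_APPLICATION_CREDENTIALS, and the encoding settings assume Opus-in-WebM audio from the browser:

```ts
// server.ts: a sketch of a Deno websocket endpoint that forwards audio to
// Google Cloud Speech-to-Text (untested; see assumptions above)
import speech from "npm:@google-cloud/speech";

const client = new speech.SpeechClient();

Deno.serve((req) => {
  if (req.headers.get("upgrade") !== "websocket") {
    return new Response("expected a websocket", { status: 400 });
  }
  const { socket, response } = Deno.upgradeWebSocket(req);
  socket.binaryType = "arraybuffer";

  // One streaming recognition request per websocket connection
  const recognizeStream = client
    .streamingRecognize({
      config: {
        encoding: "WEBM_OPUS", // assumes MediaRecorder's Opus-in-WebM output
        sampleRateHertz: 48000,
        languageCode: "en-US",
      },
      interimResults: true,
    })
    .on("data", (data: any) => {
      const transcript = data.results?.[0]?.alternatives?.[0]?.transcript;
      if (transcript) socket.send(transcript); // send text back to the browser
    })
    .on("error", (err: Error) => console.error(err));

  socket.onmessage = (event) => {
    // Binary audio chunks from the browser get forwarded to Google
    recognizeStream.write(new Uint8Array(event.data as ArrayBuffer));
  };
  socket.onclose = () => recognizeStream.end();

  return response;
});
```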
The remaining question is how I can stream audio captured in the browser with navigator.getUserMedia over a websocket to forward to Google to convert to text.
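On the browser side, the piece I have in mind is MediaRecorder feeding compressed chunks into the websocket, something like this (the endpoint URL and chunk interval are placeholders, and it assumes a server along the lines of the Deno sketch above):

```ts
// Browser-side sketch: capture microphone audio and stream it over a websocket
const socket = new WebSocket("ws://localhost:8000"); // placeholder endpoint

socket.addEventListener("open", async () => {
  // navigator.mediaDevices.getUserMedia is the current form of getUserMedia
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });

  // MediaRecorder produces compressed chunks (audio/webm;codecs=opus in most browsers)
  const recorder = new MediaRecorder(stream, { mimeType: "audio/webm;codecs=opus" });
  recorder.ondataavailable = (event) => {
    if (event.data.size > 0 && socket.readyState === WebSocket.OPEN) {
      socket.send(event.data); // each Blob chunk goes out as a binary message
    }
  };
  recorder.start(250); // emit a chunk roughly every 250 ms
});

socket.addEventListener("message", (event) => {
  console.log("transcript:", event.data); // transcripts coming back from the server
});
```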
Hardly seemed worth a TIL post because it was too easy, but I learned gpt-4 is proficient at building working ffmpeg commands.
I wrote the prompt
convert m4a to mp3 with ffmpeg
and it responded with
ffmpeg -i input.m4a -codec:v copy -codec:a libmp3lame -q:a 2 output.mp3
Since the problem at hand was low stakes, I just ran the command and, to my satisfaction, it worked. Language models can’t solve every problem but they can be absolutely delightful when they work.
After a bit of exploration and feedback, I spent another hour playing around with different techniques to try to teach and convince gpt-4 to play Connections properly.
I incorporated two new techniques (sketched below):
- Asking for one category at a time, then giving the model feedback (correct, incorrect, 3/4)
- Using the chain-of-thought prompting technique
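Wired up with the openai npm package, the feedback loop looks roughly like this sketch (the system prompt here is illustrative, not the exact prompt I ended up with, and OPENAI_API_KEY is assumed to be set in the environment):

```ts
// A sketch of the "one category at a time, with feedback" loop
import OpenAI from "openai";

const openai = new OpenAI();

const messages: { role: "system" | "user" | "assistant"; content: string }[] = [
  {
    role: "system",
    content:
      "We are playing NYT Connections. You are given 16 words that form four " +
      "groups of four. Propose ONE group of four words at a time. Use only words " +
      "from the list, never reuse a word you have already placed, and explain " +
      "your reasoning step by step before naming the group.",
  },
  {
    role: "user",
    content: "Words: <the 16 words go here>. Suggest your first group.",
  },
];

// Call once per guess; pass feedback like "correct", "incorrect", or "3/4 correct"
// from the previous round so the model can adjust.
async function nextGuess(feedback?: string): Promise<string> {
  if (feedback) {
    messages.push({ role: "user", content: `${feedback}. Suggest the next group.` });
  }
  const completion = await openai.chat.completions.create({
    model: "gpt-4",
    messages,
  });
  const reply = completion.choices[0].message.content ?? "";
  messages.push({ role: "assistant", content: reply });
  return reply;
}
```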
Despite all sorts of shimming and instructions, I still struggled to get the model to:
- only suggest each word once, even when it had already gotten a category correct
- only suggest words from the 16-word list
Even giving a follow-up message with feedback that the previous guess was invalid didn’t seem to help. This was the prompt I ended up with. It wasn’t all that effective.
After some experimentation with GitHub Copilot Chat, my review is mixed. I like the ability to copy from the sidebar chat to the editor a lot; it makes the chat more useful. The chat is pretty chatty, though, and thus somewhat slow to finish responding. I’ve also found that inline generation doesn’t consistently respect instructions or highlighted context, which was a little disappointing since that’s probably the most common way I use Cursor. To get similar behavior with Copilot, I sometimes needed to run a generation for the whole file, but the lack of specific highlighted context meant I had to write more detailed instructions, which was more time-consuming than highlighting and giving shorter, more contextual ones. It is easy to edit the prompt and resubmit if the completion is close but not quite right, so that is helpful.
I worked through a basic SwiftUI 2 tutorial to build a simple Mac app. Swift and SwiftUI are an alternative for accomplishing the same kinds of things JavaScript and React do for the web. I could also use something like Electron to build a cross-platform app with web technology, but after reading Mihhail’s article about using macOS-native technology to develop Paper, I was curious to dip my toe in and see what the state of the ecosystem looked like. He opted to use Objective-C for performance reasons. I decided to try Swift because I’d written a bit of Objective-C years ago. I like the ergonomics of Swift as a language well enough, but I can’t say I’m a huge fan of Xcode. My hardware is almost certainly too old, but Xcode is sluggish and not fun to use in a way that my web development tools are not (at least on my machine). Seeing all the things PWAs can do today, I’m unsure whether it makes sense to invest in learning SwiftUI unless I want to build native Mac apps.
I enjoyed this article by Robin about writing software for yourself. I very much appreciate the reminder of how gratifying it can be to build tools for yourself.
I read Swyx’s article Learn in Public today and it’s inspired me to open source most of my projects on GitHub.
A beautifully written and thought-provoking piece by Henrik about world models, exploring vs. exploiting in life, among other things.
I finally had a chance to use GitHub Copilot Chat in VS Code. It has a feature to chat inline, like Cursor, which has worked quite well in my initial use of it. I’m looking forward to using this more. Unfortunately, it’s not available for all IDEs yet, but hopefully it will be soon!
I watched lesson 3 of the FastAI course. I’ve really enjoyed Jeremy Howard’s lectures so far.