I spent some time experimenting with OpenDevin using claude-3-opus (I couldn’t find an easy way to use claude-3.5-sonnet). The agentic capabilities were not bad. I gave a prompt and behind the scenes, the agent iterated, created files, ran code and course corrected. I didn’t love that there wasn’t an obvious way to interrupt or help course correct. My first attempt was with the same prompt I sent to Sonnet to build Tactic. The result wasn’t bad. OpenDevin (running in a container, which makes me feel better) setup several files in a project then seemed to install dependencies, iterate on issues and inspect the site running in the browser. At least, this was what it claimed to be doing. It was hard to validate what was happening, especially because the tool’s built in browser never loaded anything. Maybe this was a bug or something unusual happening on the model side.

2024-06-24

I weirdly was running into an issue where whenever a ⌘F search didn’t return a result, my screen would flash white. It was irritating me for several days. Fortunately, I was able to find a solution that addressed it.

sudo killall coreaudiod

I was writing code with claude-3.5-sonnet and prompted it to add input validation for input arguments.

Claude 3.5 Sonnet Code Completion

Most of the code what straight forward and I was expecting the one sentence prompt to get me 5-10 lines of code with path validation, I did not expect these lines.

I’m trying to avoid buying too much into the hype (maybe it’s too late), but here are several folks talking about their notably impressive experiences with claude-3.5-sonnet.

As I noted in this post, I am going to spend more time interacting with smaller models to try and build more intuition for how LLMs behave and the different flavors in which they respond. Today, I spent some time chatting with Microsoft’s phi-3 3B using ollama. In chatting with phi3, I found it neutral in tone and to the point. It responds in a way that is easy to understand and is not overly complex or technical at the onset. It doesn’t seem to have a strong sense of self.

I enabled Cursor’s Copilot++ today. Magical. So much better predictive capabilities than Copilot. The way it anticipates my needs is pretty cool. Edit: It’s not great for writing markdown or only prose.


I learned a bit more about Crafter, a Minecraft like game that language models can play.

2024-06-19

I enjoyed reading Jordan’s post, a walk down memory lane of his career so far through a series of emails. He includes things like following up on internship opportunities, negotiating, and meeting people who would change the course of his career. He inspired me to look back through some old emails as well, both to remember this time and acknowledge how much has changed since then.

I managed to find my original offer letter from Uber in 2016, which brought back many memories that I might write about in longer form sometime in the future.

2024-06-14

For the first time in a while I used iTunes. I mean the Music app, sorry. I clicked on the album art while I was playing a song and the app switch to mini-player mode. I…didn’t see what I could click to get back to the main player. From my perspective the app had just shrunken down to the mini-player. I hesitantly clicked the lyrics button. Yeah, definitely not that. I clicked the red button of the traffic lights. Ah. Has this always been this way? It’s been a while.

2024-06-12

A few years old, excellent response written by Maxim about extracting the most value from Temporal by using it “as a service mesh for invocations of child workflows and activities hosted by different services”.