Kent wrote this post on how to engage an audience by switching the first and second slide of a presentation. The audience focuses more as they try to fill in the gaps of what you’ve introduced them to so far.

I’ve been chatting with qwen2, a model from Alibaba. I mostly chatted with it in English, but it appears to support several other languages, and I noticed a bit of Chinese leaking through. I don’t speak Chinese, so I’m not sure how I would have introduced it to the conversation.
user: do you have a name
assistant: As an AI, I don’t have personal names or identities like humans do.

I was inspired by Daniel’s post to add sidenotes to this blog. I used claude-3.5-sonnet to generate the CSS and HTML shortcode to do this. I was impressed by how well it turned out. Now I need to read the CSS in more detail to understand what Claude did. It was almost too easy. I’m not the most competent CSS writer and I had never written a Hugo shortcode before. In several turns with Sonnet in Cursor, I was able to create a basic styled shortcode for a sidenote that appeared as a superscript number to start.

A nice read by Stuart on Python development tools. This introduced me to the pyproject.toml configuration file, which is more comprehensive than a requirements file. It’s something I’ll need to research a bit more before I’m ready to confidently adopt it.
Claude’s character
A video about the personality of the AI, Claude. I’ve not become a big “papers” person yet, so this was my first introduction to “Constitutional AI”, which is a training approach where you use the model to train itself, by having it evaluate its own responses against the principles with which it was trained.

I reproduced Josh’s claude-3.5-sonnet mirror test. I hadn’t realized gpt-4 and claude-3-opus had also been “passing” this test since back in March. More interesting still, Sonnet actually seems to resist speaking in the first person about itself. Fascinating research and evolution of the models’ behaviors. After reading a bit more, apparently this type of model behavior has been around at least since Bing/Sydney (paywall, sorry).
https://onemillioncheckboxes.com is an amusing, massively-parallel art project(?

I spent some time experimenting with OpenDevin using claude-3-opus (I couldn’t find an easy way to use claude-3.5-sonnet). The agentic capabilities were not bad. I gave a prompt and, behind the scenes, the agent iterated, created files, ran code and course-corrected. I didn’t love that there wasn’t an obvious way to interrupt it or help it course-correct. My first attempt was with the same prompt I sent to Sonnet to build Tactic.

Weirdly, I was running into an issue where whenever a ⌘F search didn’t return a result, my screen would flash white. It had been irritating me for several days. Fortunately, I was able to find a solution that addressed it.
sudo killall coreaudiod

I was writing code with claude-3.5-sonnet and prompted it to add input validation for input arguments.
Most of the code was straightforward, and I was expecting the one-sentence prompt to get me 5-10 lines of code with path validation. I did not expect these lines.

I’m trying to avoid buying too much into the hype (maybe it’s too late), but here are several folks talking about their notably impressive experiences with claude-3.5-sonnet.
https://twitter.com/SullyOmarr/status/1804656718283935845
https://twitter.com/jdjkelly/status/1804226265886363719
https://twitter.com/mattshumer_/status/1804519779077636459
https://twitter.com/alexalbert__/status/1803804677701869748
https://twitter.com/marissamary/status/1804172736488415593

As I noted in this post, I am going to spend more time interacting with smaller models to try and build more intuition for how LLMs behave and the different flavors in which they respond. Today, I spent some time chatting with Microsoft’s phi-3 3B using ollama. In chatting with phi3, I found it neutral in tone and to the point. It responds in a way that is easy to understand and is not overly complex or technical at the onset.

I enabled Cursor’s Copilot++ today. Magical. So much better predictive capabilities than Copilot. The way it anticipates my needs is pretty cool. Edit: It’s not great for writing markdown or only prose.
I learned a bit more about Crafter, a Minecraft-like game that language models can play.