I'm taking a break from sketchybar for now. I'm currently looking into building an NL-to-SQL plugin or addition to Datasette that uses a language model to write queries.
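To make the idea concrete, here is a minimal sketch of just the NL-to-SQL step, outside of any Datasette plugin machinery. It assumes the openai Python client and an OPENAI_API_KEY in the environment; the prompt wording and the helper name are only illustrative, not what the plugin will actually do.

```python
# Rough sketch of the NL-to-SQL idea (not the eventual Datasette plugin):
# give the model the schema plus a natural-language question, get SQLite back.
# Assumes the openai package is installed and OPENAI_API_KEY is set.
import sqlite3
from openai import OpenAI

client = OpenAI()

def nl_to_sql(db_path: str, question: str) -> str:
    conn = sqlite3.connect(db_path)
    # Pull the CREATE TABLE statements so the model knows the schema.
    schema = "\n".join(
        row[0]
        for row in conn.execute(
            "SELECT sql FROM sqlite_master WHERE type = 'table' AND sql IS NOT NULL"
        )
    )
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {
                "role": "system",
                "content": "Translate the question into a single SQLite SELECT statement. Reply with SQL only.",
            },
            {"role": "user", "content": f"Schema:\n{schema}\n\nQuestion: {question}"},
        ],
    )
    return response.choices[0].message.content.strip()
```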
🤖 Connections (claude-3-opus)
Puzzle #287
🟩🟩🟩🟩
🟨🟨🟨🟨
🟦🟦🟦🟦
🟪🟪🟪🟪

I got this result twice in a row. gpt-4 couldn't solve it. Here is one attempt.

🤖 Connections (gpt-4)
Puzzle #287
🟩🟪🟩🟩
🟩🟩🟩🟩
🟦🟨🟦🟨
🟦🟨🟦🟨
🟦🟨🟦🟦

I tried https://echochess.com/. Kind of fun.

I remember when my high school teachers used to tell us Wikipedia wasn't a legitimate source. It sort of feels like education is having this type of moment now with language models.
One of the greatest misconceptions concerning LLMs is the idea that they are easy to use. They really aren't: getting great results out of them requires a great deal of experience and hard-fought intuition, combined with deep domain knowledge of the problem you are applying them to. The whole "LLMs are useful" section hits for me. I have an experience similar to Simon's and I also wouldn't claim LLMs are without issue or controversy.
Did a bit more work on an LLM evaluator for Connections. I'm mostly trying it with gpt-4 and claude-3-opus. On today's puzzle, the best either did was 2/4 correct. I'm unsure how much more improvement is possible with prompting or even fine-tuning, but it's an interesting challenge.

Darwin, who kept a notebook where he wrote down facts that contradicted him, observed that frustrating, cognitively dissonant things were the first to slip his memory.
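For the Connections evaluator mentioned above, the scoring step might look roughly like the sketch below. It's a minimal illustration: the words, groups, and data shapes are made up, not the evaluator's actual format.

```python
# Minimal sketch of scoring a Connections attempt: each puzzle is four
# groups of four words, and a guessed group counts only if it exactly
# matches one of the answer groups. All data below is made up.

def score_attempt(guessed_groups: list[set[str]], answer_groups: list[set[str]]) -> int:
    """Count how many guessed groups exactly match an answer group."""
    remaining = [set(g) for g in answer_groups]
    correct = 0
    for guess in guessed_groups:
        if set(guess) in remaining:
            remaining.remove(set(guess))
            correct += 1
    return correct

answers = [
    {"apple", "banana", "cherry", "grape"},
    {"red", "green", "blue", "yellow"},
    {"oak", "pine", "elm", "birch"},
    {"violin", "cello", "viola", "bass"},
]
guesses = [
    {"apple", "banana", "cherry", "grape"},  # correct
    {"red", "green", "blue", "oak"},         # mixed up
    {"pine", "elm", "birch", "yellow"},      # mixed up
    {"violin", "cello", "viola", "bass"},    # correct
]
print(score_attempt(guesses, answers))  # -> 2, a result like today's
```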
Set up a Temporal worker in Ruby and got familiar with its ergonomics. Tried out this gpt-4v demo repo. Experimented with the OCR capabilities of open source multi-modal language models. Tried llava:32b (1.6) and bakllava, but neither seemed to touch gpt-4-vision's performance. It was cool to see the former run on a MacBook though.
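For the OCR experiments, I'm assuming the local models were served through Ollama; if so, a call like the sketch below is roughly what's involved. The prompt and image path are made up.

```python
# Sketch of sending an image to a locally served multimodal model for OCR
# via Ollama's HTTP API. Assumes `ollama serve` is running and the model
# has been pulled; the prompt and file name are illustrative only.
import base64
import json
import urllib.request

def ocr_with_ollama(image_path: str, model: str = "llava") -> str:
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()
    payload = json.dumps({
        "model": model,
        "prompt": "Transcribe all text visible in this image.",
        "images": [image_b64],
        "stream": False,
    }).encode()
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# print(ocr_with_ollama("receipt.png"))
```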
I use the hyper+u keyboard shortcut to open a language model playground for convenience. I might use this 10-20 times a day. For the last year or so that I've been doing this, it has always pointed to https://platform.openai.com/playground. As of today, I've switched it to point to https://console.anthropic.com/workbench?new=1. Lately, I've preferred claude-3-opus to gpt-4. For a while, I had completely stopped looking for other models as gpt-4 seemed to be unchallenged, but it's exciting to see new options available.
I tried setting up sqlite-vss with Deno following these instructions but got stuck on this error:

    ❯ deno task dev
    Task dev deno run --allow-env --allow-read --allow-write --allow-net --unstable-ffi --allow-ffi --watch main.ts
    Watcher Process started.
    error: Uncaught (in promise) TypeError: readCstr is not a function
    export const SQLITE_VERSION = readCstr(sqlite3_libversion());
                                  ^
        at https://deno.land/x/[email protected]/src/database.ts:101:31
        at eventLoopTick (ext:core/01_core.js:169:7)

so I pivoted to Python. That effort eventually turned into this post.
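For reference, the Python route is much simpler. Roughly following the sqlite-vss README, loading the extension and running a toy search looks something like this sketch (the table name and vectors are made up; it assumes `pip install sqlite-vss` and a SQLite build that allows loadable extensions):

```python
# Sketch of using sqlite-vss from Python, roughly per its README.
import sqlite3
import sqlite_vss

db = sqlite3.connect(":memory:")
db.enable_load_extension(True)
sqlite_vss.load(db)
db.enable_load_extension(False)

print(db.execute("select vss_version()").fetchone()[0])

# Toy example: a vss0 virtual table holding 2-dimensional vectors.
db.execute("create virtual table vss_demo using vss0(vec(2))")
db.executemany(
    "insert into vss_demo(rowid, vec) values (?, ?)",
    [(1, "[1.0, 2.0]"), (2, "[3.0, 4.0]")],
)
db.commit()  # the index is written on commit

rows = db.execute(
    """
    select rowid, distance
    from vss_demo
    where vss_search(vec, vss_search_params('[1.0, 2.1]', 2))
    """
).fetchall()
print(rows)
```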
I've spent almost a week, on and off, trying to install ollama using Nix in such a way that ollama serve will be run and managed automatically in the background. Initially, I had tried to install ollama via home-manager. This was straightforward, but finding a way to have ollama serve run automatically, so that I didn't need to start it myself every time I wanted to interact with or pull a model, was the hard part.
I spent some time exploring Deepgram's Next.js starter app. I was hoping I could use it to generate a transcription in real time, but it was more like real-time captions: the responses from the server were sometimes corrections of previous transcriptions. Maybe there is a way to turn this into a proper transcription, but I wasn't sure. I also tried out vocode's Python library for building voice-based LLM applications. By far the hardest part of this was getting an Azure account, which I believe is used to synthesize the LLM response as speech.
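On the Deepgram question above: my understanding is that the live-streaming responses mark interim versus finalized segments (an is_final flag alongside channel.alternatives[].transcript), so a running transcript could probably be assembled by keeping only the finalized ones. A rough sketch of that idea, with the websocket plumbing omitted:

```python
# Sketch of turning interim/final streaming results into a running transcript.
# Assumes messages shaped like Deepgram's live results: an is_final flag plus
# channel.alternatives[].transcript. The websocket handling is omitted.

transcript_parts: list[str] = []

def handle_message(message: dict) -> None:
    """Append only finalized segments; interim results are corrections in progress."""
    alternatives = message.get("channel", {}).get("alternatives", [])
    text = alternatives[0]["transcript"] if alternatives else ""
    if message.get("is_final") and text:
        transcript_parts.append(text)

def full_transcript() -> str:
    return " ".join(transcript_parts)
```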
I played around with AgentGPT using Reworkd's cloud-hosted instance. I tried a few different goals. The first was travel related. I was pleasantly surprised (unnerved) to see the agent return links for flights to my proposed destination from my actual current origin (I guess they used my IP to determine this). One of my first impressions was its likeness to Perplexity, particularly in how the links were displayed. It turns out Reworkd has open sourced code for this exact thing.