It’s been a little while since I’ve done any work on Delta, but I still have the itch to have access to the source of the tool that becomes my daily LLM driver.
I recently set up Chorus, which advertises itself as a tool to chat with multiple models at once.
This isn’t really a main use case for me, but besides not being open source, it’s basically everything I want:
- Clean interface
- Text streaming
- Use my own API keys
- Local model support (ollama, LM Studio)
- Broad model API support
- Mini-mode triggered by a hotkey
- Conversation history
- Image support
I don’t love that their cloud is involved with the service.
I understand this is required for them to make money, but I can’t see myself using a cloud product for this in the long term, especially for sending images of my system.
I feel like there is too much potential for things to go wrong.
OpenAI shipped a new version of 4o that can generate images.
I tried generating images in ChatGPT and it popped open a side chat and edit tool I hadn’t seen before.
I got rate limited before I could try much.
I read this paper on “chain of draft”, a more compressed version of chain of thought that achieves comparable results to full chain-of-thought reasoning.
The main unlock with this approach is a reduction in inference latency and cost, since far fewer tokens are needed to reach similar performance.
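Out of curiosity, here’s roughly what trying this looks like against a chat API. This is a minimal sketch: the system prompt is paraphrased from the paper’s idea rather than quoted, and the model name is just a placeholder.

```typescript
import OpenAI from "openai";

const client = new OpenAI(); // assumes OPENAI_API_KEY is set in the environment

// Chain-of-draft style instruction, paraphrased from the paper: keep each
// reasoning step to a terse draft instead of full sentences.
const COD_PROMPT =
  "Think step by step, but keep only a minimal draft of at most five words " +
  "per step. Return the final answer after a '####' separator.";

async function solve(question: string): Promise<string> {
  const response = await client.chat.completions.create({
    model: "gpt-4o-mini", // placeholder; any chat model works
    messages: [
      { role: "system", content: COD_PROMPT },
      { role: "user", content: question },
    ],
  });
  return response.choices[0].message.content ?? "";
}

solve(
  "A bat and a ball cost $1.10 together. The bat costs $1.00 more than the ball. How much is the ball?"
).then(console.log);
```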
I mostly browse social media apps like LinkedIn and Bluesky on my phone.
Recently, LinkedIn started surfacing a popup prompting me to download the app after a bit of scrolling in the browser.
This prompts me to close LinkedIn entirely.
I’m curious if my behavior is unusual, or if they’re getting enough conversions to app downloads for it to be worth it.
It’s certainly caused me to use LinkedIn less which I am guessing is not the point.
So much of the world is outside of what can be specified.
https://podcasts.apple.com/us/podcast/ai-and-i/id1719789201?i=1000696284548
I feel like .cursorrules are finally starting to snap into place for me.
I’ve made many messes in Cursor, but once I exhaust what’s possible without structure, I need to create structure if I want to continue.
There’s nothing like a painful refactor to reinforce how and why you should define conventions for your codebase.
Usually, when diving into a new idea, I don’t love to think about this stuff, but it’s always time well spent.
With this in mind, my plan is to try and build rules and structure as I go rather than pushing my projects to the limit then needing to clean up.
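For a flavor of what I mean, here’s a hypothetical sketch of the kind of conventions a .cursorrules file might hold. Every rule here is made up for illustration, not pulled from a real project:

```
# Conventions for this codebase (hypothetical example)
- Use TypeScript with strict mode enabled; avoid `any`.
- All data fetching goes through helpers in src/lib/api/.
- One React component per file under src/components/.
- Prefer named exports; default exports only for Next.js pages and routes.
- When adding a module, include a short comment at the top explaining its role.
```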
Stumbled upon the react-three-fiber library today and now I am building a game where a lander can fly around a mini-solar system.
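For reference, here’s a minimal react-three-fiber scene along the lines of what I’m starting from. The component names, sizes, and colors are placeholder choices for illustration, not the actual game code.

```tsx
import { Canvas, useFrame } from "@react-three/fiber";
import { useRef } from "react";
import * as THREE from "three";

// A placeholder "planet": a sphere orbiting the origin in a circle.
function Planet({ radius, distance, speed, color }: {
  radius: number; distance: number; speed: number; color: string;
}) {
  const ref = useRef<THREE.Mesh>(null!);
  useFrame(({ clock }) => {
    const t = clock.getElapsedTime() * speed;
    ref.current.position.set(Math.cos(t) * distance, 0, Math.sin(t) * distance);
  });
  return (
    <mesh ref={ref}>
      <sphereGeometry args={[radius, 32, 32]} />
      <meshStandardMaterial color={color} />
    </mesh>
  );
}

export default function MiniSolarSystem() {
  return (
    <Canvas camera={{ position: [0, 8, 12] }}>
      <ambientLight intensity={0.3} />
      <pointLight position={[0, 0, 0]} intensity={2} />
      {/* The "sun" at the center */}
      <mesh>
        <sphereGeometry args={[1.5, 32, 32]} />
        <meshStandardMaterial color="black" emissive="orange" emissiveIntensity={2} />
      </mesh>
      <Planet radius={0.4} distance={4} speed={0.5} color="steelblue" />
      <Planet radius={0.6} distance={7} speed={0.3} color="tomato" />
    </Canvas>
  );
}
```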
Learned more about the post-training phase of LLM development and how the model initially goes through a pre-training phase.
From there, it is fine-tuned to contribute to a token stream with a human user, using prompt tokens to demarcate whether a message was written by the user or the assistant.
For example:

```
<|im_start|>user
Hi there!<|im_end|>
<|im_start|>assistant
Nice to meet you!<|im_end|>
<|im_start|>user
Can I ask a question?<|im_end|>
```
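To see a chat template rendered programmatically, something like this sketch with transformers.js should work. The specific model is just an example of a family that uses these <|im_start|> / <|im_end|> markers:

```typescript
import { AutoTokenizer } from "@huggingface/transformers";

// Qwen's instruct models are one family that uses ChatML-style markers;
// the exact model name here is only an example.
const tokenizer = await AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct");

const messages = [
  { role: "user", content: "Hi there!" },
  { role: "assistant", content: "Nice to meet you!" },
  { role: "user", content: "Can I ask a question?" },
];

// Render the conversation as raw prompt text (tokenize: false) and append
// the tokens that cue the model to respond (add_generation_prompt: true).
const prompt = tokenizer.apply_chat_template(messages, {
  tokenize: false,
  add_generation_prompt: true,
});
console.log(prompt);
```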
Finally, labs have continued to improve model benchmark performance using additional fine-tuning techniques like RLHF, where humans pick the best of a set of responses from the model, and the model is then tuned on this preference data.
I found an interesting library for building 3D games specifically “built for Cursor” called viber3d.
I assume the name is a reference to “vibe coding”.
This is the first library I have seen ship a starter scaffold with Cursor rules.
This is an interesting development in how frameworks are being built now.
Since language models don’t know about brand-new frameworks, frameworks are now shipping with content that helps the models use them, as coding with models seems to be an increasingly popular way to code.
I’ve built a few prototypes with the OpenAI voice-to-text API with code largely written using Cursor.
This has been fast and easy to incorporate into Next.js apps.
I can add an audio-recording-to-text feature to any app in a couple of minutes, ready for use in a production environment.
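As a sketch of how small the integration can be, here’s a minimal Next.js App Router endpoint that forwards recorded audio to OpenAI’s transcription API. The route path and form field name are my own choices for illustration:

```typescript
// app/api/transcribe/route.ts (hypothetical path in a Next.js App Router project)
import OpenAI from "openai";
import { NextResponse } from "next/server";

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

export async function POST(request: Request) {
  // Expect a multipart form upload with an "audio" field from the recorder UI.
  const form = await request.formData();
  const audio = form.get("audio") as File;

  const transcription = await client.audio.transcriptions.create({
    file: audio,
    model: "whisper-1",
  });

  return NextResponse.json({ text: transcription.text });
}
```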
There are several other options for voice-to-text as well.
macOS has a built-in voice-to-text feature, and there are several other Whisper wrappers available, some of which can run locally.
Talking is much faster than typing and allows me to capture raw thoughts faster, which I can then refine later.
LLMs are also quite good at structuring these raw thoughts into a more refined form that I can then edit.
I’d like to see authors being surprised by what readers end up learning from their material, because the author is not just sending out something static.
They’re sending out a program which is capable of emergent behavior.
So the reader will be able to try out different things and discover things the author hadn’t intended.