2024-10-12

Reading fastbook, I get the sense we could teach math more effectively if we did so through spreadsheets and Python code.
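For instance, a concept like the derivative shrinks to a few lines you can poke at instead of a symbol-pushing ritual (a toy sketch; names are illustrative):

```python
# The derivative as "rise over run" you can experiment with.
def slope(f, x, h=1e-6):
    return (f(x + h) - f(x)) / h

f = lambda x: x ** 2
print(slope(f, 3))  # ~6.0, matching the analytic derivative 2x
# Shrink h, swap f, plot the results: the definition becomes an experiment.
```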

2024-10-11

Current theory on why nbdev and notebooks in general make sense and can work: writing code for most software is actually pretty similar to writing code for models, but in most software you pay less of a cost for not yet knowing what you’re doing (aka exploring). You still pay some cost, but it isn’t a deal-breaker, unlike model training, where re-running the entire data load and clean job before every experiment would slow the feedback loop too much to keep moving quickly. Notebooks shrink that feedback loop to about as short as imaginable while still letting you experiment without totally knowing where you’re going yet. Because you can still make forward(ish) progress, you can sketch in the general direction you want to go without as many constraints from abstraction, structure, or the slow parts of your workflow, which the notebook can easily memoize.
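A rough sketch of what that memoization buys you, recreated outside a notebook (the load_and_clean pipeline here is hypothetical):

```python
# In a notebook, the cell itself is the cache: run the slow step once,
# then iterate in cells below it. Outside one, you can approximate that:
import pickle
from pathlib import Path

def cached(path, build):
    """Load a pickled result if present; otherwise build and save it."""
    p = Path(path)
    if p.exists():
        return pickle.loads(p.read_bytes())
    result = build()
    p.write_bytes(pickle.dumps(result))
    return result

# Hypothetical slow step standing in for a data load-and-clean job:
# df = cached("clean_data.pkl", load_and_clean)
# ...every experiment after this starts from df without a full re-run.
```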

2024-10-10

There are many tools for doing evals. I used ell and braintrust together for fun and disaster. The integration is actually not terrible, though I’m not 100% sure they’re an obvious pair to link together. It seems ell is striving to build its own eval capabilities as well.

Some quotes from Lesson 3 of course.fast.ai by Jeremy Howard.

I remember a few years ago when I said something like this in a class, somebody on the forum was like, “this reminds me of that thing about how to draw an owl.” Jeremy’s basically saying: okay, step one, draw two circles; step two, draw the rest of the owl. The thing I find I have a lot of trouble explaining to students is that when it comes to deep learning, there’s nothing between these two steps. When you have ReLUs getting added together, gradient descent to optimize the parameters, and samples of inputs and of what you want, the computer draws the owl. That’s it.
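A toy PyTorch sketch of that recipe, fitting a curve with nothing but ReLUs added together and gradient descent (layer sizes and learning rate are arbitrary):

```python
import torch

# Fit y = x**2 on [-2, 2] with a sum of 20 ReLUs (a one-hidden-layer net).
x = torch.linspace(-2, 2, 200).unsqueeze(1)
y = x ** 2

model = torch.nn.Sequential(
    torch.nn.Linear(1, 20),  # 20 lines: a*x + b
    torch.nn.ReLU(),         # clip the negatives to zero
    torch.nn.Linear(20, 1),  # add them together
)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

for step in range(1000):
    loss = torch.nn.functional.mse_loss(model(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
# The "owl": model(x) now approximates x**2, with nothing between the steps.
```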

2024-10-08

I tried to use aider to build a crossword generator in Python. Even with a preselected set of words, this proved difficult. Or perhaps the preselected set of words was why it was difficult. Either way, the AI model doesn’t really understand the concept of word overlap in the context of a crossword. That seemed solvable. Instead, I had it write code to precalculate word overlap from the word list, then use that to place the words. The program seemed to hang indefinitely. Adding debug statements revealed extensive looping and attempts to place words that likely couldn’t fit within the constraints of the puzzle. I had selected a group of words around the theme “celestial bodies” without considering their potential for overlap. It was probably impossible to place them all.
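The overlap precomputation was roughly shaped like this (a hedged sketch, not the code aider generated):

```python
from itertools import product

# For each pair of words, find every (i, j) where letter i of one word
# equals letter j of the other -- the only positions where they can cross.
def overlaps(a: str, b: str) -> list[tuple[int, int]]:
    return [(i, j) for i, j in product(range(len(a)), range(len(b))) if a[i] == b[j]]

words = ["comet", "meteor", "nebula", "quasar"]  # sample themed list
for a, b in product(words, repeat=2):
    if a < b:
        print(a, b, overlaps(a, b))
# A theme list with few shared letters yields few crossings, which is why
# placement can loop forever: there may be no valid arrangement at all.
```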

2024-10-07

One of the most painful lessons beginners have to learn is just how often everyone is wrong about everything.

Imagine a spreadsheet where every time you change something you must open a terminal, run the compiler and scan through the cell / value pairs in the printout to see the effects of your change. We wouldn’t put up with UX that appalling in any other tool but somehow that is still the state of the art for programming tools.

2024-09-27

Erik wrote about how it’s hard to write code for humans.

Getting started is the product!


Found a cool script by David that streams output through glow. I can’t seem to figure out why it strips away the terminal colors for me.

2024-09-26

Cool article by Jacob on re-writing his blog in Astro. I’ve been getting a bit of a re-write itch lately, but I don’t want it to be a distraction. Might need to wait until the end of the FastAI course, with just a little exploration on the side.

2024-09-25

Several interesting releases today/recently.

Multi-modal Llama: Llama 3.2. Tons of model infra providers announced day-one availability. We seem to be getting into a bit of a rhythm here. It’s also convenient for Meta, which doesn’t need to scale the infra (though they of all companies would probably be capable) – providers do it for them.

AllenAI’s Molmo: another interesting, open-source multi-modal model.

Open source is catching up in multi-modal. I’m looking forward to experimenting with both of these.

I finally found some time to run more comprehensive evals of Connections with one guess at a time, using Python code to validate the guesses and give feedback. I ran about 100 puzzles with gpt-4o-mini, gpt-4o, and claude-3-5-sonnet, but it became clear that Sonnet was going to perform the best, so I decided to complete the 466 puzzles released as of today with Sonnet only. This wasn’t cheap, but it was interesting to see the results. I’m going to write up some more comprehensive findings and push the code soon.
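
The validation step was roughly shaped like this (a minimal sketch, not the actual harness; names are illustrative):

```python
# One-guess-at-a-time checking: `solution` maps each category to its four
# words, and a guess is a set of four words from the board.
def check_guess(guess: set[str], solution: dict[str, set[str]]) -> str:
    for category, members in solution.items():
        hits = len(guess & members)
        if hits == 4:
            return f"correct: {category}"
        if hits == 3:
            return "one away"  # the feedback Connections itself gives
    return "incorrect"

solution = {"fish": {"bass", "sole", "pike", "carp"}}  # toy single category
print(check_guess({"bass", "sole", "pike", "carp"}, solution))  # correct: fish
print(check_guess({"bass", "sole", "pike", "tuna"}, solution))  # one away
```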