Ran several experiments using small local LLMs (a few billion parameters) like llama3.2 and phi3 to generate a random number between 1 and 100.
The exact prompt was:
System: You are a random number generator that provides a number between 1 and 100.
User: Generate a random number between 1 and 100. Provide the output in the following format: ‘Random number: X’, where X is the generated number. Ensure the number is an integer and do not include any additional text or explanations.
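The post doesn't say how the replies were collected or counted. Here is a minimal sketch that assumes you already have the raw reply strings, one per request. Because the prompt asks for a fixed format, the replies are easy to parse and tally. `parse_number` and `tally` are hypothetical helper names. Anything that doesn't match the format, or falls outside 1-100, is counted as a failure rather than silently dropped.

```python
import re
from collections import Counter
from typing import Iterable, Optional, Tuple

# Matches the requested format, e.g. "Random number: 42" (surrounding quotes are ignored).
PATTERN = re.compile(r"Random number:\s*(\d+)")


def parse_number(reply: str) -> Optional[int]:
    """Return the integer in a reply, or None if it is missing or out of range."""
    match = PATTERN.search(reply)
    if match is None:
        return None
    n = int(match.group(1))
    return n if 1 <= n <= 100 else None


def tally(replies: Iterable[str]) -> Tuple[Counter, int]:
    """Count how often each valid number appears, plus how many replies failed to parse."""
    counts: Counter = Counter()
    failures = 0
    for reply in replies:
        n = parse_number(reply)
        if n is None:
            failures += 1
        else:
            counts[n] += 1
    return counts, failures


replies = ["Random number: 42", "‘Random number: 42’", "Random number: 57", "Sure! Here you go: 7"]
counts, failures = tally(replies)
print(counts.most_common(), failures)  # → [(42, 2), (57, 1)] 1
```

Keeping a separate failure count helps at high temperatures, where replies may stop following the format at all.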
I didn’t expect this approach to work as a uniform number generator, but it was interesting to see how it doesn’t work.
At lower temperatures, most models output only a handful of distinct values, all in the 40-60 range.
There was little to no variability.
With higher temperatures (between 1 and 3), the distribution begins to look bimodal for several models. Past that range, most models' outputs start to break down, and at temperature 7 and higher they output only single-digit numbers (I am aware this is generally not a recommended way of using a language model).
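Inference servers typically apply temperature by dividing the model's logits by T before the softmax. That gives a rough intuition for the low-temperature clustering and the high-temperature spread.

The sketch below uses made-up logits for a handful of candidate answers. These values are hypothetical, not measured from llama3.2 or phi3. It shows the distribution sharpening as T drops and flattening toward uniform as T rises.

It does not model the fact that a number like 57 may span more than one token. So it says nothing about the bimodal shape or the single-digit collapse.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Convert logits to probabilities after dividing them by the temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy_bits(probs: List[float]) -> float:
    """Shannon entropy in bits; higher means a flatter distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Hypothetical logits for six candidate answers; the first three stand in for "40-60" favorites.
tokens = ["42", "47", "57", "73", "7", "99"]
logits = [4.0, 3.5, 3.0, 1.0, 0.5, 0.0]

for t in [0.3, 1.0, 3.0, 7.0]:
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}: p({tokens[0]})={probs[0]:.2f}  entropy={entropy_bits(probs):.2f} bits")
```

As T grows, the top answer's probability shrinks and the entropy climbs toward log2(6), about 2.58 bits, the value for a uniform choice among six options.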
Reading a bunch.
Also got inspired to play around with generating random numbers with language models across different temperatures to see their distributions.
There is an enormous amount of jargon in deep learning, including terms like rectified linear unit. The vast vast majority of this jargon is no more complicated than can be implemented in a short line of code, as we saw in this example. The reality is that for academics to get their papers published they need to make them sound as impressive and sophisticated as possible. One of the ways that they do that is to introduce jargon. Unfortunately, this has the result that the field ends up becoming far more intimidating and difficult to get into than it should be. You do have to learn the jargon, because otherwise papers and tutorials are not going to mean much to you. But that doesn’t mean you have to find the jargon intimidating. Just remember, when you come across a word or phrase that you haven’t seen before, it will almost certainly turn out to be referring to a very simple concept.
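This is an illustration of how little code sits behind a term like "rectified linear unit". It is a plain-Python sketch, not fastbook's own example, which works on PyTorch tensors. All it does is replace negative values with zero.

```python
from typing import List


def relu(x: float) -> float:
    """Rectified linear unit: negative inputs become 0, positive inputs pass through."""
    return max(x, 0.0)


def relu_all(xs: List[float]) -> List[float]:
    """Apply the ReLU element-wise, as a layer would to a vector of activations."""
    return [relu(x) for x in xs]


print(relu_all([-2.0, -0.5, 0.0, 1.5, 3.0]))  # → [0.0, 0.0, 0.0, 1.5, 3.0]
```

With PyTorch tensors, the same operation is `torch.relu(t)` or `t.clamp(min=0)`.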
Reading fastbook, I get the sense we could teach math more effectively if we did so through spreadsheets and Python code.
Current theory on why nbdev and notebooks in general make sense and can work:
Writing code for most software is actually pretty similar to writing code for models, but usually you pay less of a cost for not knowing what you’re doing yet (aka exploring).
You still pay some cost, but it isn’t a deal-breaker in most software settings, unlike having to rerun the entire data loading and cleaning job every time you change something while training a model.
In the latter case, a full rerun every time would slow the feedback loop down too much to keep moving quickly.
Notebooks shorten the feedback loop about as much as imaginable, while still letting you experiment before you know exactly where you’re going.
Because you can still move forward (more or less), you can sketch in the general direction you want to go without as many of the constraints of abstraction and structure, and without rerunning the slow parts of your workflow, since the notebook effectively memoizes them.
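In a notebook, the memoizing mostly happens for free. You run the slow cell once, and its result stays in the kernel's memory while you iterate on later cells. Outside a notebook you can get a similar effect explicitly.

In this minimal sketch, `load_and_clean` is a hypothetical stand-in for an expensive data step. `functools.lru_cache` keeps its result for the life of the process.

```python
import functools
from typing import Tuple

call_count = 0


@functools.lru_cache(maxsize=None)
def load_and_clean(path: str) -> Tuple[float, ...]:
    """Hypothetical expensive step; the body only runs once per distinct `path`."""
    global call_count
    call_count += 1
    raw = ["1.5", " 2.0", "", "oops", "3.25"]  # stand-in for actually reading `path`
    cleaned = []
    for value in raw:
        try:
            cleaned.append(float(value))
        except ValueError:
            continue  # drop rows that don't parse
    # Return an immutable tuple so callers can't mutate the cached result.
    return tuple(cleaned)


first = load_and_clean("data.csv")   # does the work
second = load_and_clean("data.csv")  # served from the cache
print(first, call_count)  # → (1.5, 2.0, 3.25) 1
```

One difference from notebook state: the cache won't notice if the file on disk changes. You would need to call `load_and_clean.cache_clear()` to force a reload.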
There are many tools for doing evals.
I used ell and braintrust together for fun and disaster.
The integration is actually not terrible, though I’m not 100% sure they’d be obvious things to try to link together.
It seems ell is striving to build its own eval capabilities as well.
Some quotes from Lesson 3 of course.fast.ai by Jeremy Howard.
I remember a few years ago, when I said something like this in a class, somebody on the forum was like “this reminds me of that thing about how to draw an owl”.
Jeremy’s basically saying: okay, step one, draw two circles; step two, draw the rest of the owl.
The thing I find I have a lot of trouble explaining to students is when it comes to deep learning, there’s nothing between these two steps.
When you have ReLUs getting added together and gradient descent to optimize the parameters and samples of inputs and of what you want, the computer draws the owl.
That’s it.
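To make the quote concrete, here is a tiny sketch in plain Python. It uses no deep learning library and is not code from the lesson. The model is just a few ReLUs added together plus a constant. Gradient descent adjusts their slopes and intercepts to fit samples of y = x^2 on [-2, 2].

The target function, number of units, learning rate, and step count are arbitrary choices for illustration.

```python
import random
from typing import List, Tuple

Params = Tuple[List[float], List[float], float]  # slopes, intercepts, constant offset


def relu(z: float) -> float:
    return z if z > 0 else 0.0


def predict(ms: List[float], bs: List[float], c: float, x: float) -> float:
    """Sum of ReLU 'hinges' relu(m*x + b), plus a constant offset c."""
    return sum(relu(m * x + b) for m, b in zip(ms, bs)) + c


def mse(ms: List[float], bs: List[float], c: float, xs: List[float], ys: List[float]) -> float:
    return sum((predict(ms, bs, c, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def init_params(n_units: int, seed: int) -> Params:
    rng = random.Random(seed)
    ms = [rng.uniform(-1, 1) for _ in range(n_units)]
    bs = [rng.uniform(-1, 1) for _ in range(n_units)]
    return ms, bs, 0.0


def fit(xs: List[float], ys: List[float], params: Params,
        lr: float = 0.01, steps: int = 2000) -> Params:
    ms, bs, c = list(params[0]), list(params[1]), params[2]  # copy the starting point
    n = len(xs)
    for _ in range(steps):
        grad_m = [0.0] * len(ms)
        grad_b = [0.0] * len(bs)
        grad_c = 0.0
        for x, y in zip(xs, ys):
            d = 2 * (predict(ms, bs, c, x) - y) / n  # d(MSE)/d(prediction) for this sample
            grad_c += d
            for k in range(len(ms)):
                if ms[k] * x + bs[k] > 0:  # a ReLU only passes gradient when it is active
                    grad_m[k] += d * x
                    grad_b[k] += d
        # Gradient descent: move every parameter a small step against its gradient.
        for k in range(len(ms)):
            ms[k] -= lr * grad_m[k]
            bs[k] -= lr * grad_b[k]
        c -= lr * grad_c
    return ms, bs, c


xs = [i / 10 for i in range(-20, 21)]  # 41 samples of the input on [-2, 2]
ys = [x * x for x in xs]               # what we want: y = x^2

start = init_params(n_units=4, seed=0)
ms, bs, c = fit(xs, ys, start)
print(f"MSE before: {mse(*start, xs, ys):.3f}  after: {mse(ms, bs, c, xs, ys):.3f}")
```

More units, more data, or a different target all use the same loop. Libraries like PyTorch mainly compute these gradients for you automatically.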
I tried to use aider to build a crossword generator in Python.
Even with a preselected set of words, this proved difficult.
Or perhaps the preselected set of words was why it was difficult.
Either way, the AI model doesn’t really understand the concept of word overlap in the context of a crossword.
That seemed solvable.
Instead, I had it write code to precompute where the words in the list could overlap (their shared letters), then use that to place the words.
The program seemed to hang indefinitely.
Adding debug statements revealed extensive looping and attempts to place words that likely couldn’t fit within the constraints of the puzzle.
I had selected a group of words around the theme “celestial bodies” without considering their potential for overlap.
It was probably impossible to place them all.
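The post doesn't include the code aider wrote, so this is a separate, minimal sketch of the two ideas involved. The function names and sample words are illustrative.

- `overlaps` precomputes every position where two words share a letter. Those are the only places the words could cross.
- `uncrossable` flags words that share no letter with any other word in the list. In a connected crossword, such words can never be attached to the grid.

Passing this check is only a necessary condition, not a guarantee that a layout exists. Still, running something like it up front would catch a hopeless word list before the placement search starts looping.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Crossing = Tuple[int, int]  # (index in first word, index in second word)


def overlaps(a: str, b: str) -> List[Crossing]:
    """All (i, j) positions where a[i] == b[j], i.e. where the two words could cross."""
    return [(i, j) for i, ca in enumerate(a) for j, cb in enumerate(b) if ca == cb]


def overlap_table(words: List[str]) -> Dict[Tuple[str, str], List[Crossing]]:
    """Precompute crossings for every unordered pair of words."""
    return {(a, b): overlaps(a, b) for a, b in combinations(words, 2)}


def uncrossable(words: List[str]) -> List[str]:
    """Words that share no letter with any other word can never join the grid."""
    connected = set()
    for (a, b), crossings in overlap_table(words).items():
        if crossings:
            connected.update((a, b))
    return [w for w in words if w not in connected]


print(overlaps("moon", "sun"))             # → [(3, 2)]
print(uncrossable(["mars", "io", "sun"]))  # → ['io']
```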
One of the most painful lessons beginners have to learn is just how often everyone is wrong about everything.
Imagine a spreadsheet where every time you change something you must open a terminal, run the compiler and scan through the cell / value pairs in the printout to see the effects of your change.
We wouldn’t put up with UX that appalling in any other tool but somehow that is still the state of the art for programming tools.
Erik wrote about how it’s hard to write code for humans.
Getting started is the product!
Found a cool script by David for streaming output through glow.
Can’t seem to figure out why it strips away the terminal colors for me.