I tried training a language model with fastai on Modal.
First I attempted it in a standalone Modal script: I wrote a script to unpack the data to a volume, then ran fit_one_cycle on the learner. I ran into an issue with counter.pkl sort of similar to this issue, but I haven't figured out how to resolve it yet.
On a whim, I checked to see if I could run a Jupyter notebook on Modal.

Jon wrote an interesting blog post built on top of Cloudflare Workers and KV.
I’ve been seeing more and more notebook-like products and UX. A few I’ve seen recently:
https://github.com/fastai/fastbook
https://nbdev.fast.ai/
https://runme.dev/
https://github.com/tzador/makedown
https://github.com/dim0x69/mdx

Ran several experiments using local LLMs (~7B-parameter models) like llama3.2 and phi3 to generate a random number between 1 and 100. The exact prompt was:
llama3.2 Chat
system: You are a random number generator that provides a number between 1 and 100.
user: Generate a random number between 1 and 100. Provide the output in the following format: 'Random number: X', where X is the generated number. Ensure the number is an integer and do not include any additional text or explanations.

Reading a bunch. Also got inspired to play around with generating random numbers with language models across different temperatures to see their distributions.

There is an enormous amount of jargon in deep learning, including terms like rectified linear unit. The vast, vast majority of this jargon is no more complicated than can be implemented in a short line of code, as we saw in this example. The reality is that for academics to get their papers published they need to make them sound as impressive and sophisticated as possible. One of the ways they do that is to introduce jargon.

Reading fastbook, I get the sense we could teach math more effectively if we did so through spreadsheets and Python code.

Current theory on why nbdev and notebooks in general make sense and can work: writing code for most software is actually pretty similar to writing code for models, but you usually pay less of a cost for not yet knowing what you're doing (aka exploring). You still pay some cost, but it isn't a deal-breaker in most software settings, unlike needing to rerun an entire data load-and-clean job while training a model.

There are many tools for doing evals. I used ell and braintrust together for fun and disaster. The integration is actually not terrible, though I'm not 100% sure they'd be obvious things to try to link together. It seems ell is striving to build its own eval capabilities as well.

Some quotes from Lesson 3 of course.fast.ai by Jeremy Howard:
"I remember a few years ago when I said something like this in a class, somebody on the forum was like, 'this reminds me of that thing about how to draw an owl.' Jeremy's basically saying, okay, step one: draw two circles; step two: draw the rest of the owl. The thing I find I have a lot of trouble explaining to students is that when it comes to deep learning, there's nothing between these two steps."

I tried to use aider to build a crossword generator in Python. Even with a preselected set of words, this proved difficult. Or perhaps the preselected set of words was why it was difficult. Either way, the AI model doesn't really understand the concept of word overlap in the context of a crossword. That seemed solvable, though: instead, I had it write code to precalculate word overlap from the word list, then use that to place the words.
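The actual precalculation code aider wrote isn't shown here, but as a sketch of the idea (the function names are mine, not from the project): an overlap table can map each ordered pair of words to the index pairs where their letters match, which are exactly the points where one word could cross the other in a grid.

```python
from itertools import permutations

def overlaps(w1: str, w2: str):
    # All index pairs (i, j) with w1[i] == w2[j]: the positions where
    # w2 could cross w1 in a crossword grid.
    return [(i, j) for i, a in enumerate(w1) for j, b in enumerate(w2) if a == b]

def precalculate(words):
    # Map each ordered pair of distinct words to its crossing points.
    return {(w1, w2): overlaps(w1, w2) for w1, w2 in permutations(words, 2)}

table = precalculate(["cat", "tab", "dog"])
print(table[("cat", "tab")])  # [(1, 1), (2, 0)] -- the shared 'a' and 't'
```

With this table in hand, word placement becomes a lookup rather than something the model has to reason about on the fly.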
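For the temperature experiments above, the prompt asks the model to reply in the exact format 'Random number: X'. The sampling loop itself (calling llama3.2 or phi3 locally at various temperatures) is omitted; this is just a minimal, hypothetical sketch of the parsing-and-tallying side, with helper names of my own invention:

```python
import re
from collections import Counter

def parse_random_number(reply: str):
    # Pull X out of a reply of the form 'Random number: X'; return None
    # if the model ignored the format or went out of the 1..100 range.
    m = re.search(r"Random number:\s*(\d+)", reply)
    if m and 1 <= int(m.group(1)) <= 100:
        return int(m.group(1))
    return None

def distribution(replies):
    # Tally how often each number appears across many sampled replies --
    # one Counter per run, e.g. one per temperature setting.
    return Counter(n for n in map(parse_random_number, replies) if n is not None)

fake_replies = ["Random number: 42", "Random number: 42", "Random number: 7", "I cannot comply."]
print(distribution(fake_replies))  # Counter({42: 2, 7: 1})
```

Comparing the resulting Counters across temperatures is then enough to see how far each model's "random" numbers are from uniform.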
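The "short line of code" point from the jargon quote above holds up: a rectified linear unit, for all the sophistication of the name, is just

```python
def relu(x):
    # Rectified linear unit: negatives become zero, positives pass through.
    return max(0.0, x)

print([relu(v) for v in (-3.0, 0.0, 2.5)])  # [0.0, 0.0, 2.5]
```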