Trying out Windsurf
When you are curious about something, you have the right cocktail of neurotransmitters present to make that information stick. If you get the answer to something in the context of your curiosity, then it’s going to stay with you.
we can improve the accuracy of nearly any kind of machine learning algorithm by training it multiple times, each time on a different random subset of the data, and averaging its predictions
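The technique described above is bagging (bootstrap aggregating). A minimal stdlib-only sketch, using a hypothetical 1-nearest-neighbor base learner on a toy single-feature dataset (the data and names here are my own, purely for illustration — for class labels, "averaging" becomes a majority vote):

```python
import random

def nearest_label(train, x):
    """Base learner: 1-nearest-neighbor on a single feature."""
    return min(train, key=lambda point: abs(point[0] - x))[1]

def bagged_predict(data, x, n_models=25, seed=0):
    """Train each 'model' on a bootstrap sample, then combine predictions."""
    rng = random.Random(seed)
    votes = []
    for _ in range(n_models):
        # A different random subset each time: sample with replacement
        bootstrap = [rng.choice(data) for _ in data]
        votes.append(nearest_label(bootstrap, x))
    # Majority vote over the ensemble's predictions
    return max(set(votes), key=votes.count)

# Toy data: (feature, label) pairs with two well-separated classes
data = [(0, 0), (1, 0), (2, 0), (10, 1), (11, 1), (12, 1)]
```

Each ensemble member sees a slightly different view of the data, so their individual errors tend to cancel when the predictions are combined.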
I wanted to get more hands-on with the language model trained in chapter 12 of the FastAI course, so I got some Google Colab credits and actually ran the training on an A100. It cost about $2.50 and took about 1:40, but generally worked quite well. There was a minor issue with auto-saving the notebook, probably due to my use of this workaround to avoid needing to give Colab full Google Drive access. Regardless, I was still able to train the language model, run sentence completions, and then use the fine-tuned language model as an encoder to build a sentiment classifier. Seeing how long this process took, then seeing it work, helped me build a bit more intuition about what to expect when training models. I was also a bit surprised by how fast next-token prediction and classification inference were. I might try a smaller fine-tune on my local machine now that I have a better sense of what this process looks like end to end.
The following code allowed me to successfully download the IMDB dataset with fastai to a Modal volume:
import modal
import os

# Set before importing fastai so data lands on the mounted volume
os.environ["FASTAI_HOME"] = "/data/fastai"

from fastai.text.all import *

app = modal.App("imdb-dataset-train")
vol = modal.Volume.from_name("modal-llm-data", create_if_missing=True)

@app.function(
    gpu="any",
    image=modal.Image.debian_slim().pip_install("fastai"),
    volumes={"/data": vol},
)
def download():
    path = untar_data(URLs.IMDB)
    print(f"Data downloaded to: {path}")
    return path
Run it with:
modal run train.py::download
Next, I tried to run one epoch of language model training with fastai on Modal. I first attempted it in a standalone Modal script: I wrote a script to unpack the data to a volume, then ran the fit_one_cycle function with the learner. I ran into an issue with counter.pkl, sort of similar to this issue, but I haven’t figured out how to resolve it yet.
On a whim, I checked to see if I could run a Jupyter notebook on Modal. Apparently, you can!
Jon wrote an interesting blog on top of Cloudflare Workers and KV.
I’ve been seeing more and more notebook-like products and UX. A few I’ve seen recently:
Ran several experiments using local LLMs (~7B parameter models) like llama3.2 and phi3 to generate a random number between 1 and 100.
The exact prompt was
I didn’t expect this approach to work as a uniform random number generator, but it was interesting to see how it fails. At lower temperatures, most models output only a few distinct values, mostly in the range of 40-60, with little to no variability. As temperature increases (between 1 and 3), the distribution begins to look bimodal for several models. Beyond that threshold, most models’ outputs start to break down, emitting only single-digit numbers at temperature 7 and higher (I’m aware this is generally not a recommended way of using a language model).
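The shift with temperature makes sense mechanically: sampling temperature divides the model’s logits before the softmax, so low temperatures concentrate probability mass on a few tokens while high temperatures flatten the distribution toward uniform. A minimal stdlib sketch (toy logits, not from any real model):

```python
import math

def softmax_with_temperature(logits, temperature):
    """Scale logits by 1/temperature, then apply softmax."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Toy logits for three candidate tokens
logits = [2.0, 1.0, 0.0]
cold = softmax_with_temperature(logits, 0.5)  # sharpened: top token dominates
hot = softmax_with_temperature(logits, 5.0)   # flattened: closer to uniform
```

This is only the sampling-side story; it doesn’t explain the single-digit breakdown at very high temperatures, which presumably comes from the model sampling degenerate token sequences.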
Reading a bunch. Also got inspired to play around with generating random numbers with language models across different temperatures to see their distributions.
There is an enormous amount of jargon in deep learning, including terms like rectified linear unit. The vast vast majority of this jargon is no more complicated than can be implemented in a short line of code, as we saw in this example. The reality is that for academics to get their papers published they need to make them sound as impressive and sophisticated as possible. One of the ways that they do that is to introduce jargon. Unfortunately, this has the result that the field ends up becoming far more intimidating and difficult to get into than it should be. You do have to learn the jargon, because otherwise papers and tutorials are not going to mean much to you. But that doesn’t mean you have to find the jargon intimidating. Just remember, when you come across a word or phrase that you haven’t seen before, it will almost certainly turn out to be referring to a very simple concept.
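As a concrete case of that point, the rectified linear unit mentioned above really is a one-line function:

```python
def relu(x):
    """Rectified linear unit: pass positive inputs through, zero out negatives."""
    return max(0.0, x)
```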