I’m working on a conversation branching tool called “Delta” (for now). The idea first came to me while chatting with Llama 3.2 and experimenting with different system prompts; I was actually trying to build myself a local version of an app I’ve been fascinated by called Dot.

I noticed that conversations with models became more interesting as they progressed. A friend made a point that really stuck in my head: you “build trust” with a model over multiple conversation turns. You can write system prompts that steer the model toward more detailed responses and longer paragraphs, but I observed that regardless of the system prompt, as conversations went on longer (more turns, more exchanges, longer message history), the responses became more interesting and better calibrated to what I was looking for. The model seemed to display a more coherent understanding of what I was talking about.

2024-11-16

Would another day of editing fundamentally change the value readers get? Probably not. Ship it and move on to your next idea while you’re still energized.

2024-11-09

We can improve the accuracy of nearly any kind of machine learning algorithm by training it multiple times, each time on a different random subset of the data, and averaging its predictions.
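That’s essentially bagging. Here’s a quick sketch of the idea using scikit-learn decision trees as the base models (my own illustration, not code from the course): each model sees a different bootstrap sample, and the ensemble prediction is just the mean.

import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=1000, n_features=20, noise=10, random_state=0)
rng = np.random.default_rng(0)

# Train several trees, each on a different random subset (a bootstrap sample) of the data
trees = []
for _ in range(25):
    idx = rng.integers(0, len(X), len(X))  # sample rows with replacement
    trees.append(DecisionTreeRegressor().fit(X[idx], y[idx]))

# The ensemble prediction is the average of the individual trees' predictions
ensemble_pred = np.mean([t.predict(X) for t in trees], axis=0)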

I wanted to get more hands-on with the language model trained in chapter 12 of the FastAI course, so I got some Google Colab credits and actually ran the training on an A100. It cost about $2.50 and took about 1:40, but generally worked quite well. There was a minor issue with auto-saving the notebook, probably due to my use of this workaround to avoid giving Colab full Google Drive access. Regardless, I was still able to train the language model, run sentence completions, and then use the fine-tuned language model as an encoder to build a sentiment classifier. Seeing how long this process took, and then seeing it work, helped me build a bit more intuition about what to expect when training models. I was also a bit surprised by how fast the next-token prediction and classification inference were. I might try a smaller fine-tune on my local machine now that I have a better sense of what this process looks like end to end.
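The overall flow in the notebook looked roughly like this (a sketch from memory of the fastai API, not the exact notebook code; hyperparameters and names are illustrative):

from fastai.text.all import *

path = untar_data(URLs.IMDB)

# Fine-tune the language model on the IMDB text
dls_lm = TextDataLoaders.from_folder(path, is_lm=True, valid_pct=0.1)
learn_lm = language_model_learner(dls_lm, AWD_LSTM, metrics=[accuracy, Perplexity()]).to_fp16()
learn_lm.fit_one_cycle(1, 2e-2)

# Sentence completion with the fine-tuned model
print(learn_lm.predict("I liked this movie because", 30, temperature=0.75))

# Save the encoder, then reuse it in a sentiment classifier
learn_lm.save_encoder("finetuned_enc")
dls_clas = TextDataLoaders.from_folder(path, valid="test", text_vocab=dls_lm.vocab)
learn_clas = text_classifier_learner(dls_clas, AWD_LSTM, metrics=accuracy).to_fp16()
learn_clas.load_encoder("finetuned_enc")
learn_clas.fit_one_cycle(1, 2e-2)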

2024-10-29

The following code allowed me to successfully download the IMDB dataset with fastai to a Modal volume:

import os

import modal

# Point fastai's cache at the mounted volume so the dataset persists across runs
os.environ["FASTAI_HOME"] = "/data/fastai"

from fastai.text.all import *

app = modal.App("imdb-dataset-train")
vol = modal.Volume.from_name("modal-llm-data", create_if_missing=True)


@app.function(
    gpu="any",
    image=modal.Image.debian_slim().pip_install("fastai"),
    volumes={"/data": vol},
)
def download():
    # untar_data respects FASTAI_HOME, so the archive lands on the volume
    path = untar_data(URLs.IMDB)
    print(f"Data downloaded to: {path}")
    return path

Run it with:

modal run train.py::download

Next, I tried to run one epoch of language model training with fastai on Modal.

First, I attempted it in a standalone Modal script: I wrote a script to unpack the data to a volume, then ran the fit_one_cycle function with the learner. I ran into an issue with counter.pkl sort of similar to this issue, but I haven’t figured out how to resolve it yet.
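For reference, the training function I was aiming for looked roughly like this (a sketch, not the exact script; the volume name and FASTAI_HOME path match the download step above, but the learner setup and hyperparameters are just illustrative):

import os

import modal

os.environ["FASTAI_HOME"] = "/data/fastai"

from fastai.text.all import *

app = modal.App("imdb-lm-train")
vol = modal.Volume.from_name("modal-llm-data", create_if_missing=True)


@app.function(
    gpu="any",
    image=modal.Image.debian_slim().pip_install("fastai"),
    volumes={"/data": vol},
    timeout=60 * 60,
)
def train_one_epoch():
    # The dataset is already on the volume from the download step
    path = untar_data(URLs.IMDB)
    dls_lm = TextDataLoaders.from_folder(path, is_lm=True, valid_pct=0.1)
    learn = language_model_learner(dls_lm, AWD_LSTM, metrics=[accuracy, Perplexity()]).to_fp16()
    learn.fit_one_cycle(1, 2e-2)  # one epoch of fine-tuning
    learn.save_encoder("finetuned_enc")
    vol.commit()  # persist the saved encoder on the volume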

On a whim, I checked to see if I could run a Jupyter notebook on Modal. Apparently, you can!
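If I remember right, the Modal CLI has a built-in helper for this, so spinning up a notebook server can be as simple as the following (I’m going from memory here, so check modal launch jupyter --help for GPU and volume options):

modal launch jupyter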