Learned more about the post-training phase of LLM development and how a model first goes through a pre-training phase. From there, it is fine-tuned to contribute to a token stream shared with a human user, using special tokens to demarcate whether a message was written by the user or the assistant.

For example:

<|im_start|>user
Hi there!<|im_end|>
<|im_start|>assistant
Nice to meet you!<|im_end|>
<|im_start|>user
Can I ask a question?<|im_end|>
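
To make that concrete for myself, here is a minimal sketch of how a list of chat messages might get rendered into that kind of token stream. The `render_chatml` function is my own stand-in for illustration; real tokenizers map these markers to dedicated special-token IDs rather than treating them as plain text.

```python
# A minimal sketch of rendering a chat into a ChatML-style token stream.
# The marker strings match the example above.

def render_chatml(messages, add_generation_prompt=True):
    """Turn a list of {"role", "content"} dicts into one ChatML-style string."""
    parts = []
    for message in messages:
        parts.append(f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Leave an assistant turn open so the model "completes" the stream.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)

print(render_chatml([
    {"role": "user", "content": "Hi there!"},
    {"role": "assistant", "content": "Nice to meet you!"},
    {"role": "user", "content": "Can I ask a question?"},
]))
```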

Finally, labs have continued to improve benchmark performance with further fine-tuning stages like RLHF, where humans rank a set of responses from the model, a reward model is trained on those preferences, and the model is then tuned further against that reward signal.
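
My rough mental model of the preference step, sketched in code: the human rankings train a reward model that scores a (prompt, response) pair, typically with a pairwise Bradley-Terry style loss, and that reward signal then drives the reinforcement-learning step (commonly PPO) that tunes the chat model. The `reward_model` below is a hypothetical scalar-scoring model, not any particular library's API.

```python
# A toy sketch of the preference-tuning objective, assuming a hypothetical
# reward model that maps a sequence of token IDs to a single scalar score.

import torch
import torch.nn.functional as F

def preference_loss(reward_model, prompt_ids, chosen_ids, rejected_ids):
    """Bradley-Terry pairwise loss: push the preferred response's score
    above the rejected one's."""
    r_chosen = reward_model(torch.cat([prompt_ids, chosen_ids], dim=-1))
    r_rejected = reward_model(torch.cat([prompt_ids, rejected_ids], dim=-1))
    # -log sigmoid(r_chosen - r_rejected) is minimized when chosen outranks rejected.
    return -F.logsigmoid(r_chosen - r_rejected).mean()

if __name__ == "__main__":
    # Stand-in "reward model" that just sums token IDs, only to show the shapes.
    dummy_reward = lambda ids: ids.float().sum(dim=-1)
    prompt = torch.tensor([[1, 2, 3]])
    chosen = torch.tensor([[9, 9]])
    rejected = torch.tensor([[4, 4]])
    print(preference_loss(dummy_reward, prompt, chosen, rejected))
```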

Progress is slow, but I feel like I am finally beginning to develop more of a mental model of what is happening in model training. When I trained my own language model on the posts from the blog, I understood that I was training a completion model but didn't fully appreciate the additional steps I would need to take to shape that “base” model into a chat model myself. Now, I feel I have a better understanding of how that process works.

This brings me back to a question I have been asking for a while: what happened to the completion models? Why do I have to use a model fine-tuned on <|im_start|> and <|im_end|> tokens?