As I noted in this post, I am going to spend more time interacting with smaller models to try to build more intuition for how LLMs behave and the different flavors in which they respond. Today, I spent some time chatting with Microsoft’s phi-3 mini (3.8B) using ollama. In chatting with phi3, I found it neutral in tone and to the point. It responds in a way that is easy to understand and is not overly complex or technical at the outset. It doesn’t seem to have a strong sense of self.
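The transcripts below are from ollama’s interactive prompt. If you’d rather script the same kind of exchange, a rough sketch against ollama’s local HTTP API (default port 11434) looks something like this; the prompt here is just an example:

```python
import requests

# Ask the locally running ollama server (default port 11434) for a single,
# non-streamed completion from phi3.
response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "phi3",
        "prompt": "who are you?",
        "stream": False,
    },
)
print(response.json()["response"])
```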

When prompted about itself, it generally responds that it’s “an AI/large language model developed by Microsoft”. Often it will include that its name is “Phi”.

>>> who are you?
 I am Phi, an AI developed by Microsoft to assist users in generating human-like text
based on the prompts given to me. How can I help you today?

When you ask it for its name, rather than who it is, it seems to forget it has one.

>>> what is your name?
As an AI, I don't have a personal name, but you can call me Assistant. How can I help you today?

Once, it even thought it was ChatGPT:

>>> who are you?
I'm an AI developed by Microsoft, known as ChatGPT. [...]

I then prompted it to have a conversation with me about the future of technology in society, in which we would alternate asking each other questions. It quickly slipped into outputting long lists of ways governments or society could prepare. Occasionally, strange artifacts showed up in the output, like this:

2. <item 2>
3. Written by user:
3. <item 3>

Several times I asked the model not to respond with bullet points, but it really wanted to keep doing that. Adding

No lists. Respond in paragraphs.

to the end of my prompt seemed to help. Adding that to the beginning was less consistent. This result somewhat aligns with what I’ve seen with OpenAI models, where if you want JSON, adding

JSON:

to the end of the prompt is a pretty good way to get JSON output. Phi3 seems to quickly forget instructions and fall back to its defaults. This intuitively makes sense to me as it is a small model compared to what I typically work with.
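To make the placement concrete, here’s a rough sketch of what that looks like against ollama’s local HTTP API; the question and the exact suffix wording are just illustrative:

```python
import requests

question = "How should society prepare for rapid advances in AI?"

# Putting the formatting instruction at the end of the prompt seemed to
# stick better than putting it at the beginning.
prompt = f"{question}\n\nNo lists. Respond in paragraphs."

response = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "phi3", "prompt": prompt, "stream": False},
)
print(response.json()["response"])
```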

I ran all the conversations through the shiny and new claude-3.5-sonnet to see if there was anything I missed in my summary above. It corroborated most of what I noticed, though it also added that the model seemed verbose, “sometimes to the point of being overly wordy”. This exchange with Claude got me thinking about the categories one might use to score what it’s like to chat with a language model. These categories would be qualitative: not a measure of the model’s performance on benchmarks, but a description of what the vibe is like when you interact with it.

I think there’s a lot of potential depth to describing models this way, so I am going to save that exploration for another time. I’ll conclude with the vibes I got from phi3. Since this is my first shallow dive into a model like this, I don’t have many similar models to compare it to, and these impressions may be a little unfair given that most of my experience is with larger models. I’ll try to focus on what it’s like to talk to the model rather than “how good it is at things”.

Nevertheless, I would say phi3 is

  • verbose
  • formal
  • lacking a sense of self or identity
  • inconsistent in following instructions

Because it seemed a little unfair to review only phi3, I decided to chat a bit with llama3 8b as well. I immediately felt I had been unfair to phi3. I think I prefer llama3’s tone, but its response pattern was a bit repetitive:

  • a response to what you said
  • another side of the argument
  • its perceived role as LLaMA (“it’s my job to…”)

When I started a conversation, it didn’t follow my instructions to alternate asking questions and instead just kept asking them. When I asked my own question, its response was in lists and very repetitive. Far less nice than anything phi3 said throughout our conversation.

When I asked llama3 about itself, it had more of a personality than phi3. It used analogies to describe what it was like to be a model, whereas phi3 responded more with facts about how models work.

I found this response particularly interesting:

Imagine being a vast library with an infinite shelf space, where every book represents a piece of knowledge, a concept, or a relationship between ideas. That's roughly what my "brain" is like – a massive database of text data that I've been trained on.

When you interact with me, it's as if you're browsing through these digital shelves, and I'm the librarian who can help you find specific books or recommend related ones based on your interests.

To summarize, I would say llama3 8b is

  • friendly
  • engaging
  • articulates a sense of self