I tried out jsonformer to see how it would perform with some of the structured data use cases I’ve been exploring.

Setup

python -m venv env
. env/bin/activate
pip install jsonformer transformers torch

Code

⚠️ Running this code will download 10+ GB of model weights ⚠️

import json

from jsonformer import Jsonformer
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("databricks/dolly-v2-12b")
tokenizer = AutoTokenizer.from_pretrained("databricks/dolly-v2-12b")

json_schema = {
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "RestaurantReview",
  "type": "object",
  "properties": {
    "review": {
      "type": "string"
    },
    "sentiment": {
      "type": "string",
      "enum": ["UNKNOWN", "POSITIVE", "MILDLY_POSITIVE", "NEGATIVE", "MILDLY_NEGATIVE"]
    },
    "likes": {
      "type": "array",
      "items": {
        "type": "string"
      }
    },
    "dislikes": {
      "type": "array",
      "items": {
        "type": "string"
      }
    }
  },
  "required": ["review", "sentiment"]
}
prompt = """From the provided restaurant review, respond with JSON adhering to the schema.
Use content from the review only.
Review:
Amazing food, I like their brisket sandwiches! Also, they give you a lot of sides! Excited to come again.
Response:
"""
jsonformer = Jsonformer(model, tokenizer, json_schema, prompt)
generated_data = jsonformer()

print(json.dumps(generated_data, indent=2))

Results

(env) ~/ time python run_review.py
{
  "review": "Amazing food, I like their brisket sandwiches",
  "sentiment": "POSITIVE",
  "likes": [
    "They give you a lot of sides!"
  ],
  "dislikes": [
    "I'm not a fan of the rice"
  ]
}
150.52s user 98.48s system 104% cpu 3:57.68 total

(env) ~/ time python run_review.py
{
  "review": "Amazing food, I like their brisket sandwiches",
  "sentiment": "POSITIVE",
  "likes": [
    "Excited to come again"
  ],
  "dislikes": [
    "Their sandwiches are too expensive"
  ]
}
141.12s user 92.58s system 109% cpu 3:34.12 total

(env) ~/ time python run_review.py
{
  "review": "Amazing food, I like their brisket sandwiches",
  "sentiment": "POSITIVE",
  "likes": [
    "Excited to come again"
  ],
  "dislikes": [
    "They give you a lot of sides"
  ]
}
148.66s user 96.66s system 106% cpu 3:50.38 total

Takeaways

jsonformer has a nice API for mandating structured output from a language model. The quality of the output from dolly isn’t the best: there are hallucinations, and only a single like and dislike is generated for each completion. It would be nice if it supported more than just JSON schemas. It runs quite slowly on an M1 MacBook Pro. This library could become much more compelling if support for OpenAI models is added.

Imagine we have a query in our application that has become slow under load. We have several options to remedy this issue. If we settle on using a cache, we should consider the following failure domain when designing the architecture, to determine whether a cache actually is a good fit for the use case.

Motivations for using a cache

When the cache is available and populated, it removes load from the database. As a result, responses for the query will likely be faster than when we were issuing it against the underlying database directly. However, we should consider how the application will behave if the cache isn’t available (either the entry has expired or the infrastructure is unstable). A starting approach in code might look something like the sketch below.
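Here, cache_client and query_database are hypothetical stand-ins for the real cache and database interfaces:

def get_query_results(key):
    # Happy path: serve the result from the cache when it's reachable
    # and the entry hasn't expired.
    try:
        cached = cache_client.get(key)  # hypothetical cache client
        if cached is not None:
            return cached
    except ConnectionError:
        # The cache infrastructure is unstable; fall through to the database.
        pass

    # Failure domain: this request now pays the full cost of the slow query,
    # and the database absorbs the load the cache was shielding it from.
    results = query_database(key)  # hypothetical slow query

    try:
        cache_client.set(key, results)
    except ConnectionError:
        pass

    return results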

I’ve written several posts on using JSON and Pydantic schemas to structure LLM responses. Recently, I’ve done some work using a similar approach with protobuf message schemas as the data contract. Here’s an example to show what that looks like.

Example

Imagine we have the following questionnaire that we send out to new employees when they join our company so their teammates can get to know them better.

  1. What are your hobbies or interests outside of work? Are there any particular activities or hobbies that you enjoy doing in your free time?
  2. Are there any unique talents or skills that you possess that your teammates might find interesting or helpful?
  3. What is one thing you would like your teammates to know about you that may not be immediately apparent?
  4. Are there any favorite books, movies, or TV shows that you enjoy? Feel free to share a few recommendations with your teammates.
  5. What is one interesting or memorable travel experience you’ve had? It could be an adventure, a cultural immersion, or simply a unique encounter that left a lasting impression. Share a brief description with your teammates.

For fun, we want to reward those who read and engage with these emails because we think it helps with team building, so we want to periodically run some trivia using all employees’ answers to these questions. An example of one of these trivia questions could be “Who has the unique talent of juggling bowling pins?”. If we have a lot of employees, it becomes cumbersome to manage all this data, but it’s important to do so for our trivia game. We don’t want to have to re-read response emails each week to create our trivia; we want the employee, the question, and their answer readily available.
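Capturing that contract as a protobuf message might look something like this sketch (the message and field names are illustrative, not an existing schema):

syntax = "proto3";

// One employee's answer to a single questionnaire question.
message QuestionnaireAnswer {
  string question = 1;
  string answer = 2;
}

// All of an employee's answers -- the employee, question, answer
// triples we want readily available for trivia.
message EmployeeQuestionnaire {
  string employee_name = 1;
  repeated QuestionnaireAnswer answers = 2;
}

With responses stored in this shape, pulling a random employee/question/answer triple for a trivia round is a simple lookup rather than a re-read of old emails.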

Plenty of data is ambiguous without additional description or schema to clarify its meaning. It’s easy to come up with structured data that can’t easily be interpreted without its accompanying schema. Here’s an example:

{
  "data": [
    12,
    "21",
    true,
    {
      "name": "John Doe",
      "age": 30
    }
  ]
}

You can argue that this is “bad” structured data, but if you have this data structure, there is little meaning you can derive without additional insight into what the data represents.
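By contrast, pairing the data with a schema assigns each element a meaning. The interpretation below is entirely made up, but it shows how much of the ambiguity a schema can remove:

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "DeviceReport",
  "type": "object",
  "properties": {
    "data": {
      "type": "array",
      "items": [
        { "type": "integer", "description": "Temperature reading in degrees Celsius" },
        { "type": "string", "description": "Device identifier" },
        { "type": "boolean", "description": "Whether the device is currently online" },
        {
          "type": "object",
          "description": "The person responsible for the device",
          "properties": {
            "name": { "type": "string" },
            "age": { "type": "integer" }
          }
        }
      ]
    }
  }
}

With that schema in hand, the same four values read as a coherent record rather than an arbitrary grab bag.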

The most popular language model use cases I’ve seen around have been

  • chatbots
  • agents
  • “chat your X” use cases

These use cases are quite cool, but they often stand alone, separate from existing products, or get added on as isolated features.

Expanding production use cases for language models

I’ve been thinking about what it could look like to naturally embed a call to a language model in code to flexibly make use of its capabilities in a production application. In this pursuit, I’ve been hooked by the idea that declarative, unambiguous data schemas can serve as the bridge between language model capabilities and applications. Generally, schemas provide the contract for the data that will be sent into and expected out of a procedure. With this approach, we’re treating the language model as a sort of magic RPC or API call. The request schema provides detailed context about the data we are sending into the language model. The response schema becomes an instruction set for the language model to interpret and fill in, given the request data and its accompanying schema as a kind of explanation. Finally, the response schema also serves as a validation layer in the application. If the language model returns data that fails to comply with the response schema, a validation error will prevent that bad data from making it any further into the system. Here’s an example of what failed validations might look like:
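A minimal sketch of that validation layer, using Pydantic and reusing the restaurant review shape from earlier (the malformed payload here is invented for illustration):

import json
from typing import List

from pydantic import BaseModel, ValidationError

class RestaurantReview(BaseModel):
    review: str
    sentiment: str
    likes: List[str] = []
    dislikes: List[str] = []

# Suppose the language model responded with a payload that drops the
# required "sentiment" field and returns "likes" as a string instead of a list.
llm_response = json.loads('{"review": "Amazing food", "likes": "brisket sandwiches"}')

try:
    RestaurantReview(**llm_response)
except ValidationError as e:
    # Pydantic reports which fields failed and why, and the bad data
    # goes no further into the system.
    print(e)

The raised ValidationError names the offending fields, which makes it straightforward to log the failure or retry the call.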

It’s necessary to pay attention to the shape of a language model’s response when incorporating it as a component in a software application. You can’t programmatically tap into the power of a language model if you can’t reliably parse its response. In the past, I have mostly used a combination of prose and examples to define the shape of the language model response in my prompts. Something like:

Respond using JSON with the following shape:
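followed by a hand-written example of the desired shape, for instance something along these lines:

{
  "sentiment": "POSITIVE",
  "likes": ["..."],
  "dislikes": ["..."]
}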

Auto-GPT is a popular project on GitHub that attempts to build an autonomous agent on top of an LLM. This is not my first time using Auto-GPT: I used it shortly after it was released and gave it a second try a week or two later, which makes this my third zero-to-running effort.

I installed it

python -m venv env
. env/bin/activate
pip install -r requirements.txt

and then ran it

I believe that language models are most useful when available at your fingertips in the context of what you’re doing. GitHub Copilot is a well-known application that applies language models in this manner. There is no need to pre-prompt the model. It knows you’re writing code and that you’re going to ask it to write code for you based on the contents of your comment. GitHub is further extending this idea with Copilot for CLI, which looks promising but isn’t generally available yet. I’ll describe a tool I’ve created, called “Shell AI” (sai), that integrates a language model directly into the command line interface to generate and run shell commands on the fly.

Over the years, I’ve developed a system for capturing knowledge that has been useful to me. The idea behind this practice is to provide immediate access to useful snippets and learnings, often with examples. I’ll store things like:

Amend commit message

git commit --amend -m "New commit message"

with tags like #git, #commit, and #amend after I searched Google for “how to amend a git commit message”. With the knowledge available locally, within my own file system, I’ve added search capabilities on top, which allow me to quickly find this snippet any time I need it. I use a system hotkey to bring up the search function, then type loosely what I am looking for (maybe “git amend commit”), and this snippet is returned as a result.

I know a little about nix. Not a lot. I know some things about Python virtual environments and asdf, and a few things about package managers. I’ve heard from a number of engineers I trust that the combo of direnv and nix is fantastic, but I haven’t had the chance to figure out what these tools can really do. I could totally ask someone what their setup is, but I decided to ask ChatGPT (gpt-4) to see what would happen. The following is a summary of what I did that worked, as instructed and summarized by the LM, with a bit of editing and commentary by me.