I’m currently working on building a language model based chatbot that can answer questions about the contents of a database. There are a lot of products and libraries making efforts at this problem. To start, I tried out the Vanna.ai open source library. I followed this guide to get started with ChromaDB for the indices and OpenAI as the language model to query a Postgres database. I also set up a Postgres database with Docker and the Chinook dataset. I downloaded the Chinook dataset for Postgres from this repo. The dataset is described in detail here. To start up Docker and load the data, I ran the following from my host machine (not inside a Docker container)

# start postgres
docker run --name vanna-postgres -e POSTGRES_PASSWORD=mysecretpassword -p 5432:5432 -d postgres

# create the database
createdb -h localhost -p 5432 -U postgres chinook

# load the data
psql -h localhost -p 5432 -U postgres chinook -1 -f ~/path.to/chinook_pg_serial_pk_proper_naming.sql

Vanna indexes the database content into, then can take a question and convert it into a SQL query. Finally, it executes a SQL query and returns the results as a dataframe.