Your project has a youthful optimism that I hope you won’t lose as you go. And in fact it might be the way to win in the long run.
I tried out Llama 2 today using ollama. At first pass, it seemed OK at writing Python code, but I struggled to get it to effectively generate or adhere to a specific schema. I’ll have to try a few more things, but my initial impressions are mixed (relative to OpenAI models).
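One way to make "adhering to a schema" concrete is to parse the model's reply as JSON and check it against the fields you expected. The sketch below is only an illustration: `REQUIRED_FIELDS` and `validate_reply` are made-up names, not part of ollama or Llama 2, and the example schema is arbitrary.

```python
import json

# Hypothetical schema: field name -> expected Python type.
REQUIRED_FIELDS = {"name": str, "age": int}


def validate_reply(raw: str, required: dict = REQUIRED_FIELDS) -> dict:
    """Parse a model reply as JSON and check it has the required typed fields.

    Raises ValueError if the reply is not a JSON object, or if a field is
    missing or has the wrong type.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("reply is not a JSON object")
    for field, expected_type in required.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        if not isinstance(data[field], expected_type):
            raise ValueError(f"wrong type for field: {field}")
    return data
```

A reply that wraps the JSON in chatty prose fails at the parsing step, which is a cheap way to count how often a model actually sticks to the requested format.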
It’s hard to think because it’s hard to think.
Finally learned that RAG stands for “Retrieval-Augmented Generation” after seeing it all over the place for months. Not sure how I missed that one.
Meta released Llama 2 yesterday and the hype has ensued. While it’s exciting to see more powerful models become available, a model with weights is not the same as an API. It is still far less accessible.
A paper came out measuring the degradation of ChatGPT’s reasoning abilities. As real-time peer review took place over the course of the day on Twitter, the most compelling explanation I heard for these findings was that OpenAI has further fine-tuned the models to respond in a manner consistent with the level of the prompt, because this is a better experience for the user. We’ll see if this explanation holds up over time.
I’ve been playing around more with nix lately.
I like what I’ve seen from it so far: declare dependencies and get an isolated shell with those dependencies.
If distributed, the environment can be trivially recreated on another machine.
So far, it’s been a struggle to get a working Python environment with dependencies set up.
I’ve gotten a lot of cryptic error messages after trying a number of different flake.nix files.
I plan to continue to experiment, but thus far the learning curve is tough.
Some unstructured thoughts on the types of tasks language models seem to be good (and bad) at completing:
A language model is an effective tool for solving problems when you can describe the answer or output you want from it with language. A language model is a good candidate to replace manual processes performed by humans, where judgement or application of semantic rules is needed to get the right answer. Existing machine learning approaches are already good at classifying or predicting over a large number of features, specifically when one doesn’t know how things can or should be clustered or labelled just by looking at the data points. To give an example where a language model will likely not perform well: imagine you want to generate a prediction for the value of a house and the land it sits on, given a list of data points describing it:
Experimenting with using a language model to improve the input prompt, then using that output as the actual prompt for the model and returning the result. It’s a bit of a play on the “critique” approach. Some of the outputs were interesting, but I need a better way to evaluate the results.
import sys

import openai

# Note: openai.ChatCompletion is the interface of the pre-1.0 openai package.
MODEL = "gpt-3.5-turbo-16k"

IMPROVER_PROMPT = """
You are an expert prompt writer for a language model. Please convert the user's message into an effective prompt that will be sent to a language model to produce a helpful and useful response.
Output the improved prompt only.
"""


def generate_improved_prompt(prompt: str) -> str:
    """Ask the model to rewrite the user's prompt into a better one."""
    completion = openai.ChatCompletion.create(
        model=MODEL,
        temperature=1.0,
        messages=[
            {
                "role": "system",
                "content": IMPROVER_PROMPT,
            },
            {
                "role": "user",
                "content": prompt,
            },
        ],
    )
    return completion.choices[0].message.content


def generate_completion(prompt: str) -> str:
    """Send the prompt to the model as-is and return the reply text."""
    completion = openai.ChatCompletion.create(
        model=MODEL,
        temperature=1.0,
        messages=[
            {
                "role": "user",
                "content": prompt,
            },
        ],
    )
    return completion.choices[0].message.content


def main():
    # The prompt is everything passed on the command line.
    prompt = ' '.join(sys.argv[1:])

    standard_result = generate_completion(prompt)
    print("Standard completion:")
    print(standard_result)

    improved_prompt = generate_improved_prompt(prompt)
    print("\nImproved prompt:")
    print(improved_prompt)

    improved_result = generate_completion(improved_prompt)
    print("\nImproved completion:")
    print(improved_result)
    return improved_result


if __name__ == "__main__":
    main()
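Since the hard part is evaluating the results, it may help to separate the two-step flow from the API call. What follows is a sketch, not the script above: `improve_then_complete`, `stub_complete`, and the `complete` parameter are hypothetical names, the improver prompt is abbreviated, and `complete` stands for any function that takes a list of chat messages and returns the reply text. With that split, the flow can be exercised against a stub before spending tokens.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]
# Assumption: `complete` wraps whatever chat API is in use and returns text.
CompleteFn = Callable[[List[Message]], str]

# Abbreviated version of the improver prompt, for illustration only.
IMPROVER_PROMPT = (
    "You are an expert prompt writer for a language model. "
    "Please convert the user's message into an effective prompt. "
    "Output the improved prompt only."
)


def improve_then_complete(prompt: str, complete: CompleteFn) -> Dict[str, str]:
    """Run the prompt as-is, then rewrite it and run the rewritten prompt."""
    standard = complete([{"role": "user", "content": prompt}])
    improved_prompt = complete(
        [
            {"role": "system", "content": IMPROVER_PROMPT},
            {"role": "user", "content": prompt},
        ]
    )
    improved = complete([{"role": "user", "content": improved_prompt}])
    return {
        "standard": standard,
        "improved_prompt": improved_prompt,
        "improved": improved,
    }


def stub_complete(messages: List[Message]) -> str:
    """Stand-in for a model: tags the last message so the flow is visible."""
    role = "rewrite" if messages[0]["role"] == "system" else "answer"
    return f"[{role}] {messages[-1]['content']}"
```

Returning all three strings together makes it easier to log standard and improved completions side by side, which is a starting point for comparing them.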
I’ve been working through a series on nix-flakes.
It’s well written and shows some interesting applications of the tool set.
I’m still trying to wrap my head around exactly where nix could fit into my development lifecycle.
It seems like it wraps up builds and package management into one.
Sort of like docker, bazel, pip/npm/brew all in one.
The tutorial has shown some useful variations and has convinced me flakes is the way to go, but I need to spend some more time better understanding the primitives as well.
I understand little of what’s going on in the flake.nix files I’ve looked at.
Facebook (Meta, whatever) announced Threads today, to launch on July 6th. Given how much worse it feels like Twitter has become (my experience only), on one hand, I could see people migrating there because no great alternative has really emerged. On the other, Facebook has zero “public” products where the user experience is even palatable for me, personally (I use WhatsApp but it’s basically iMessage). Instagram and Facebook both rapidly became completely intolerable for me due to their content. Maybe that is a matter of curation, but I bet, at least in some part, it’s a result of how Facebook runs their business and why Twitter never made much ad revenue compared to them (and why Reddit struggles to as well). If I had to make a bet, I would bet on people migrating to Threads. Personally, I won’t until they have a webapp.
A simple shell function to set up a Python project scaffold.
It’s idempotent, so it won’t overwrite an existing folder or env.
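pproj () {
  # Create the project folder if needed and move into it.
  mkdir -p "$1"
  cd "$1" || return
  # Create the virtualenv in ./env and activate it.
  python -m venv env
  . env/bin/activate
}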
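For comparison, roughly the same scaffold can be written in Python with the standard library's `venv` module. This is a sketch under assumptions: `make_project` is a hypothetical name, and unlike the shell function it cannot `cd` or activate the environment in the calling shell, so it only creates the folder and the `env` directory.

```python
import venv
from pathlib import Path


def make_project(name: str) -> Path:
    """Create the project folder and an `env` virtualenv inside it.

    Safe to re-run: an existing folder is left alone and an existing
    `env` is not recreated.
    """
    project = Path(name)
    project.mkdir(parents=True, exist_ok=True)  # like `mkdir -p`
    env_dir = project / "env"
    if not (env_dir / "pyvenv.cfg").exists():
        # with_pip=False keeps this quick; `python -m venv` installs pip.
        venv.create(env_dir, with_pip=False)
    return env_dir
```

Calling `make_project("myproj")` twice leaves the first environment in place; activation still has to happen in the shell with `. myproj/env/bin/activate`.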