Today, I played around with Matt Rickard’s ReLLM library, another take on constraining LLM output, in this case, with regex. I tried to use it to steer a language model to generate structure (JSON) from unstructured input. This exercise is sort of like parsing or validating JSON with regex – it’s not a good idea. Complicated regular expressions to describe JSON are hard to write. I do love the demo that generates acronyms though. Matt also wrote parserLLM, which provides the ability to use a context-free grammar to constrain the next predicted token from the language model. I prefer the context-free grammar approach at a high level, but I believe the language model constraints need to be directly connected with the data structures we intend to use in code if we want to effectively weave language models into our existing applications.
I’ve been trying to find a simple way to host a website that formats and serves a nice looking version of a recipe written in markdown.
There are a few open source projects available, but nothing that has fit the bill yet.
I briefly had the idea to try out something with Next.js and MDX, but I found when I scaffolded a new app that I didn’t even recognize the project structure.
Next.js has moved to the “App Router” approach for structuring projects.
It’s not immediately obvious or intuitive how this works.
As a batteries-included framework, it makes sense that different approaches to structure will have their own learning curves.
Nevertheless, it’s a bit irritating how frequently this structure changes.
I have a ~2-3 year old Next.js project that looks nothing like the project I am currently working on.
And the project I am currently working on looks very different from a newly scaffolded project with App Router today.
I did a bit more work with Clojure today.
My imperative programming habits are still bleeding through.
The exercise introduced cond as a sort of case statement for flow control.
I wrote a simple cond statement but was getting a bizarre runtime error:
(defn my-fn
  [x]
  (cond
    (x < 0) "negative"
    (x = 0) "zero"
    (x > 0) "positive"
  )
)
user=> (my-fn 1)
Execution error (ClassCastException) at user/my-fn (REPL:4).
class java.lang.Long cannot be cast to class clojure.lang.IFn (java.lang.Long is in module java.base of loader 'bootstrap'; clojure.lang.IFn is in unnamed module of loader 'app')
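It took me a frustratingly long time to realize I needed to use prefix notation for the conditions in the cond. With infix syntax like (x < 0), Clojure tries to call x, a Long, as a function, which is why the error complains that Long cannot be cast to IFn.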
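Here is a minimal sketch of the corrected function, assuming the intent was simply to classify a number by its sign. The only change is that each comparison operator moves to the front of its form:

```clojure
(defn my-fn
  [x]
  (cond
    ;; Each test is a normal function call: operator first, then arguments.
    (< x 0) "negative"
    (= x 0) "zero"
    (> x 0) "positive"))
```

With this version, (my-fn 1) at the REPL returns "positive" instead of throwing.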
I’ve been following the work Exercism has been doing for several years now. I used their product to learn a bit of Elixir and Go a few years back. I got an email from them today promoting a focus on functional programming languages in the month of June. I decided to learn a bit of Clojure, since I’ve been working with the JVM lately. I’ve done a few of the exercises and my takeaways so far are
The API does not just change without us telling you. The models are static there.
— Logan Kilpatrick (@OfficialLoganK) May 31, 2023
Logan says any changes to the model would have been communicated. It seems some folks have data that show the model’s degradation. As competition emerges in the space, it could be a problem for OpenAI if they lose user trust on model versioning and evolution.
Tried to set up Falcon 40B. Used their provided example code and downloaded about 90GB of weights. Ran the Python code and it failed. Did a search on the error. Found many others were seeing the same in the HuggingFace forum. Eventually, got the program to run in some form. Maxed out my MacBook’s memory at about 90GB (went to swap) and crashed the process. I wonder if a Mac can be tuned to make this work.
Tons of reports on HN that GPT-4 has gotten significantly worse. Are people experiencing this? pic.twitter.com/VBminyUj6r
— Nabeel S. Qureshi (@nabeelqu) May 31, 2023
A number of folks are reporting gpt-4 appears to be performing less impressively as of late (additional conversation). I was using gpt-4 to write code earlier today, and anecdotally, I can say it seems to be less effective at code generation. It still writes working code, but the test cases it provides aren’t always correct, and it seems to be less “expert level” than I recall it being initially.
I’ve been following Eric’s posts about SudoLang since the first installment back in March. I’ve skimmed through the spec and find the value proposition quite compelling. SudoLang seeks to allow programmers of all levels to instruct LLMs and can also be transpiled into your programming language of choice. While I’m still early in my understanding of how to use this technique, it’s one I’m following closely and continuing to experiment with.
What if we set GPT-4 free in Minecraft? ⛏️
I’m excited to announce Voyager, the first lifelong learning agent that plays Minecraft purely in-context. Voyager continuously improves itself by writing, refining, committing, and retrieving *code* from a skill library.
GPT-4 unlocks… pic.twitter.com/hjTxk6Qb1x
— Jim Fan (@DrJimFan) May 26, 2023
NVIDIA researchers introduce an LLM-based agent with “lifelong learning” capabilities that can navigate, discover, and accomplish goals in Minecraft without human intervention.
The Alexandria Index is building embeddings for large, public data sets, to make them more searchable and accessible.
That people produce HTML with string templates is telling us something. I think about this phenomenon often, though I personally find most string template systems that produce HTML difficult to use: Django templates, Handlebars, Rails ERB, and Hugo templates, just to name a few. My experience has been that these systems are difficult to debug and are practically their own full programming languages. I think React finds the sweet spot for the challenges that these other systems run into, with the abilities of TypeScript/JavaScript (maybe the braces-in-JSX syntax notwithstanding). React can still be difficult to debug, but it feels much more like you’re writing the thing that will be rendered rather than an abstraction on top of that (yes, React is probably a higher-level abstraction than any of these, but it’s about the experience over the implementation details).
I’ve seen a lot of “GPT detection” products floating around lately. Sebastian discusses some of the products and their approaches in this article. Some products claim to have developed an “algorithm with an accuracy rate of text detection higher than 98%”. Unfortunately, this same algorithm determined a GPT-4 generated response from the prompt “write a paragraph in the style of Edgar Allan Poe” was 0% AI GPT. In my experience, you don’t need to try very hard to trick “AI-detection” systems. It seems that adding “in the style of…” to pretty much any prompt can thwart detection approaches. Even though these products don’t seem to work, there is clearly a market for them and many products in that market, which seems to indicate a desire for them to work. From these products’ marketing and news references, it appears folks in education are quite interested in them. I can’t begin to appreciate the challenges educators must be experiencing as they attempt to adjust to the changes brought on by the accessibility of LLMs. However, as someone who struggled to learn in the traditional education system, I do hope teachers will pivot their energies to adapting their curriculums rather than trying to maintain the status quo.