I’ve been following the work Exercism has been doing for several years now. I used their product to learn a bit of Elixir and Go a few years back. I got an email from them today promoting a focus on functional programming languages in the month of June. I decided to learn a bit of Clojure, since I’ve been working with the JVM lately. I’ve done a few of the exercises and my takeaways so far are
The API does not just change without us telling you. The models are static there. — Logan Kilpatrick (@OfficialLoganK) May 31, 2023 Logan says any changes to the model would have been communicated. Still, some folks seem to have data showing the model has degraded. As competition emerges in the space, it could be a problem for OpenAI if they lose user trust on model versioning and evolution. Tried to set up Falcon 40B.
Tons of reports on HN that GPT-4 has gotten significantly worse. Are people experiencing this? pic.twitter.com/VBminyUj6r — Nabeel S. Qureshi (@nabeelqu) May 31, 2023 A number of folks are reporting gpt-4 appears to be performing less impressively as of late (additional conversation). I was using gpt-4 to write code earlier today and, anecdotally, can say it seems to be less effective at code generation. It still writes working code, but the test cases it provides aren’t always correct and it seems to be less “expert level” than I recall initially.
I’ve been following Eric’s posts about SudoLang since the first installment back in March. I’ve skimmed through the spec and the value proposition is quite compelling. SudoLang seeks to allow programmers of all levels to instruct LLMs and can also be transpiled into your programming language of choice. While I’m still early in my understanding of how to use this technique, it’s one I’m following closely and continuing to experiment with.
What if we set GPT-4 free in Minecraft? ⛏️ I’m excited to announce Voyager, the first lifelong learning agent that plays Minecraft purely in-context. Voyager continuously improves itself by writing, refining, committing, and retrieving *code* from a skill library. GPT-4 unlocks… pic.twitter.com/hjTxk6Qb1x — Jim Fan (@DrJimFan) May 26, 2023 NVIDIA researchers introduce an LLM-based agent with “lifelong learning” capabilities that can navigate, discover, and accomplish goals in Minecraft without human intervention.
The Alexandria Index is building embeddings for large, public data sets, to make them more searchable and accessible. That people produce HTML with string templates is telling us something. I think about this phenomenon often, though I personally find most string template systems that produce HTML difficult to use: Django templates, Handlebars, Rails ERB, and Hugo templates, just to name a few. My experience has been that these systems are difficult to debug and are practically their own full programming languages.
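For a concrete picture of what "producing HTML with string templates" means, here is a minimal sketch using Python's standard-library `string.Template`. It is not taken from any of the systems named above; the `PAGE` and `ITEM` templates, the `render_links` helper, and the sample data are all hypothetical and exist only for illustration. The template itself knows nothing about HTML, so every value has to be escaped by hand before substitution.

```python
import html
from string import Template

# Hypothetical templates, for illustration only. They are plain strings:
# nothing here understands HTML structure.
PAGE = Template("<ul>\n$items\n</ul>")
ITEM = Template('  <li><a href="$url">$title</a></li>')


def render_links(links):
    """Render (title, url) pairs as an HTML list, escaping every value."""
    rows = [
        ITEM.substitute(
            url=html.escape(url, quote=True),  # safe inside an attribute
            title=html.escape(title),          # safe as element text
        )
        for title, url in links
    ]
    return PAGE.substitute(items="\n".join(rows))


print(render_links([("Tom & Jerry", "https://example.com/?a=1&b=2")]))
```

Forgetting a single `html.escape` call still yields a string that renders, which is one reason this style can be hard to debug: the mistake only shows up later as broken or unsafe markup.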
I’ve seen a lot of “GPT detection” products floating around lately. Sebastian discusses some of the products and their approaches in this article. Some products claim to have developed an “algorithm with an accuracy rate of text detection higher than 98%”. Unfortunately, this same algorithm determined a GPT-4 generated response from the prompt “write a paragraph in the style of Edgar Allan Poe” was 0% AI GPT. In my experience, you don’t need to try very hard to trick “AI-detection” systems.
A low-effort quality-of-life improvement for oncall has been starting a week-long shift on a Friday instead of a Monday. Beginning a weekend with oncall isn’t the best, but it’s more than offset by how good it feels to finish the week and oncall at the same time next Friday.
LMQL is a SQL-like programming language for interacting with LMs. It takes a declarative approach to specifying the output constraints for a language model, with a SQL flavor. Microsoft created a project called guidance, which is an LLM-agnostic language to “interleave generation, prompting, and logical control into a single continuous flow matching how the language model actually processes the text”. It’s based on Handlebars templates and provides in-template notation for system and user messages.