I experimented with deriving a data model iteratively (something I am currently calling “data model distillation”): I pass multiple images to a language model one at a time (this could work with text as well) and prompt it to improve the schema using anything new it learns from the current image. Results so far have been unimpressive.
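
For reference, the loop looks roughly like the sketch below. This is a minimal illustration and not the exact code I ran: `distill_schema`, `ask_model`, and the prompt wording are hypothetical names made up for this example, and `ask_model` stands in for whatever client call actually sends a prompt plus one image (or text chunk) to the model and returns its reply.

```python
from typing import Callable, Iterable, List

# Hypothetical model interface: (prompt, item) -> revised schema text.
# In practice this would wrap a real call to a vision-capable language model.
AskModel = Callable[[str, str], str]

# Illustrative prompt wording only; an assumption, not a tested prompt.
PROMPT_TEMPLATE = (
    "Here is the current schema:\n{schema}\n\n"
    "Improve it using anything new you learn from the attached input. "
    "Return only the revised schema."
)


def distill_schema(
    items: Iterable[str],
    ask_model: AskModel,
    initial_schema: str = "{}",
) -> List[str]:
    """Feed items to the model one at a time, carrying the schema forward.

    Returns the schema after every step so the revisions can be compared.
    """
    schema = initial_schema
    history: List[str] = []
    for item in items:
        # Each prompt embeds the schema produced by the previous step.
        prompt = PROMPT_TEMPLATE.format(schema=schema)
        schema = ask_model(prompt, item)
        history.append(schema)
    return history
```

Keeping the full history instead of only the final schema makes it easier to see where a revision stops adding anything or starts dropping fields from earlier images.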

I’ve been hearing good things about mistral-large-2. I’m working on adding it to bots-doing-things, but I’ve run into a bit of dependency trouble so far.