I keep seeing a lot of buzz about ColPali and Qwen2-VL. I’d like to try them out but haven’t put together enough of the pieces to make sense of them yet. I am also seeing a lot of conversation about how traditional OCR-to-LLM pipelines will be superseded by these approaches. Based on my experience with VLMs, this seems directionally correct. Still, the sheer amount of noise makes it tough to figure out what is worth focusing on and what is real vs. hype.