Having gotten more into using llama 7b and 30b lately, this take seems likes it could hold water. Model inference still isn’t free when you scale a consumer app. Maybe I can use llama3 for all my personal use cases, but I still need infra to scale it. The price probably goes down significantly though with so many model inference providers and the speed will go way up once Groq starts running it (if they can run multi-modal models).