I’ve been keeping an eye out for language models that can run locally so that I can use them on personal data sets for tasks like summarization and knowledge retrieval without sending all my data up to someone else’s cloud.
Anthony sent me a link to a Twitter thread about a product called deepsparse by Neural Magic, which claims to offer “[a]n inference runtime offering GPU-class performance on CPUs and APIs to integrate ML into your application.”