Matthew wrote a thread summarizing Apple's Private Cloud Compute and its security approach. Between how quickly models are changing and how large they are, it's not currently practical to run things like LLM inference on an iPhone, at least not with the best available models. Those models will probably always be larger, or require more compute, than a handheld device can supply. If you want to use the best models, you'll need to offload inference to a data center.

There is plenty more in the press release, including this comparison of the new models:

Our on-device model, with ~3B parameters, outperforms larger models including Phi-3-mini, Mistral-7B, and Gemma-7B. Our server model compares favorably to DBRX-Instruct, Mixtral-8x22B, and GPT-3.5-Turbo while being highly efficient.

It sounds like the options will be (sketched roughly in code below):

  • a close-to-best-in-class small model on device
  • private inference with a model that is (maybe?) about 12 months behind best-in-class models
  • farm out the task to OpenAI to get a response from the best model available
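
If I were sketching how an app might route a request across those three tiers, it could look something like the Swift below. To be clear, this is just an illustration of the decision structure: none of these types, flags, or functions correspond to a real Apple (or OpenAI) API, and in practice the consent and routing logic lives inside the OS rather than in app code.

```swift
// Hypothetical sketch of the three tiers above -- not Apple's actual API.
// Every type and name here is made up for illustration.

enum InferenceTier {
    case onDevice       // the ~3B-parameter local model
    case privateCloud   // Apple's server model on Private Cloud Compute
    case thirdParty     // e.g. ChatGPT, only with explicit user consent
}

struct InferenceRequest {
    let prompt: String
    let fitsOnDevice: Bool            // small local tasks: summaries, rewrites
    let wantsBestAvailableModel: Bool // tasks worth sending outside Apple
    let userApprovedThirdParty: Bool  // per-request consent
}

// Prefer the device, fall back to the private cloud, and only leave
// Apple's infrastructure when the user has explicitly opted in.
func chooseTier(for request: InferenceRequest) -> InferenceTier {
    if request.fitsOnDevice {
        return .onDevice
    }
    if request.wantsBestAvailableModel && request.userApprovedThirdParty {
        return .thirdParty
    }
    return .privateCloud
}

let tier = chooseTier(for: InferenceRequest(
    prompt: "Summarize this email thread",
    fitsOnDevice: true,
    wantsBestAvailableModel: false,
    userApprovedThirdParty: false
))
print(tier) // onDevice
```

The interesting design choice is the default: anything that can't stay on the device goes to Apple's private cloud, and the third-party tier is opt-in per request rather than a silent fallback.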