I figured out the issue with adding mistral-large. After a bit of debugging, I realized by manually calling llm_mistral.refresh_models() that something was wrong with how I had added the secret on Modal. It turns out the environment variable name for the Mistral API key needed to be LLM_MISTRAL_KEY. I’m going to try and make a PR to the repo to document this behavior.


I’ve been trying to run models locally. Mostly specifically colpali and florence-2. This has not been easy. It’s possible these require GPUs and might not be macOS friendly. I’ve ended up deep in Github threads and dependency help trying to get basic inference running. I might need to start with something more simpler and smaller and build up from there.

A taste of the problems

❯ pip install flash_attn
Collecting flash_attn
  Using cached flash_attn-2.6.3.tar.gz (2.6 MB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... error
  error: subprocess-exited-with-error

  × Getting requirements to build wheel did not run successfully.
  ...
      ModuleNotFoundError: No module named 'torch'
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.
❯ pip install --extra-index-url https://download.pytorch.org/whl/cpu torch==2.4.0+cpu
Looking in indexes: https://pypi.org/simple, https://download.pytorch.org/whl/cpu
ERROR: Could not find a version that satisfies the requirement torch==2.4.0+cpu (from versions: 2.0.0, 2.0.1, 2.1.0, 2.1.1, 2.1.2, 2.2.0, 2.2.1, 2.2.2, 2.3.0, 2.3.1, 2.4.0)
ERROR: No matching distribution found for torch==2.4.0+cpu

Not great.