iris-vision-70b

Flagship multimodal model. Llama 3.3 70B base with a custom vision adapter trained on screenshots, photos, and diagrams. 128k context. Strongest at reading interfaces and document layouts.

HUGGING FACE →

iris-vision-24b

Mid-size workhorse on Mistral 24B. Runs on a single 4090. Fast enough for live camera streaming, cheap enough for high-volume vision queries.

HUGGING FACE →

iris-mobile-9b

The lightweight option on Gemma 9B. Runs on edge hardware — 3090, M2 Mac, even a high-end phone. For when you want IRIS entirely on-device.

HUGGING FACE →

## Why these models

Closed labs ship vision APIs that refuse half your prompts and rate-limit the other half. Self-hosted open models work but require infrastructure most builders don't want to run.

IRIS sits in the middle — open weights, uncensored outputs, hosted on a peer-to-peer mesh of consumer GPUs, billed per token in $IRIS. You can self-host if you want full control. You can call the API if you want zero ops.

Each model is tuned specifically on mobile/agent use cases — reading screenshots, parsing UI screens, OCR with layout, photo→action chains. Different from a generic vision model that was bolted on after training.