IRIS publishes open-source vision-language models tuned for mobile-first multimodal agents. Uncensored and served by the network. Weights live on Hugging Face, so you can self-host whenever.
Flagship multimodal model. Llama 3.3 70B base with a custom vision adapter trained on screenshots, photos, and diagrams. 128k context. Strongest at reading interfaces and document layouts.
Mid-size workhorse on Mistral 24B. Runs on a single 4090. Fast enough for live camera streaming, cheap enough for high-volume vision queries.
The lightweight option on Gemma 9B. Runs on edge hardware: a 3090, an M2 Mac, even a high-end phone. For when you want IRIS entirely on-device.
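To sanity-check which tier fits your hardware, a rough rule is weight memory ≈ parameter count × bytes per parameter. The sketch below applies that rule to the three base-model sizes named above. The precision levels (`fp16`, `int8`, `int4`) and the helper names are assumptions for illustration; the page does not say which precision the IRIS weights ship in, and the estimate ignores the KV cache, activations, and the vision adapter, so real usage will be higher. For reference, a 4090 and a 3090 each have 24 GB of VRAM.

```python
# Rough weight-memory estimate: parameters x bytes per parameter.
# Weights only; KV cache, activations, and the vision adapter are NOT included.

# Assumed precision levels (hypothetical; not stated on this page).
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

# Parameter counts of the base models named above, in billions.
BASE_MODELS = {"Llama 3.3 70B": 70, "Mistral 24B": 24, "Gemma 9B": 9}


def weight_gb(params_billion: float, precision: str) -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes)."""
    return params_billion * BYTES_PER_PARAM[precision]


def fits(params_billion: float, precision: str, vram_gb: float) -> bool:
    """True if the weights alone fit within the given VRAM budget."""
    return weight_gb(params_billion, precision) <= vram_gb


if __name__ == "__main__":
    for name, size in BASE_MODELS.items():
        row = ", ".join(f"{p}: {weight_gb(size, p):.1f} GB" for p in BYTES_PER_PARAM)
        print(f"{name}: {row}")
```

Under these assumptions, a 24B model needs about 48 GB at fp16 but about 12 GB at 4-bit, so fitting it on a single 24 GB card implies some form of quantization.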
Closed labs ship vision APIs that refuse half your prompts and rate-limit the other half. Self-hosted open models work but require infrastructure most builders don't want to run.
IRIS sits in the middle — open weights, uncensored outputs, hosted on a peer-to-peer mesh of consumer GPUs, billed per token in $IRIS. You can self-host if you want full control. You can call the API if you want zero ops.
Each model is tuned specifically on mobile and agent use cases: reading screenshots, parsing UI screens, OCR with layout, and photo→action chains. That sets it apart from a generic model whose vision component was bolted on after training.