Skip to main content

Cortex

info

Real-world Use: Cortex.cpp powers Jan, our on-device ChatGPT-alternative.

Cortex.cpp is in active development. If you have any questions, please reach out to us on GitHub or Discord

Cortex Cover Image

Cortex is a Local AI API Platform that is used to run and customize LLMs.

Key Features:

  • Straightforward CLI (inspired by Ollama)
  • Full C++ implementation, packageable into Desktop and Mobile apps
  • Pull from Huggingface, or Cortex Built-in Model Library
  • Models stored in universal file formats (vs blobs)
  • Swappable Inference Backends (default: llamacpp, future: ONNXRuntime, TensorRT-LLM)
  • Cortex can be deployed as a standalone API server, or integrated into apps like Jan.ai

Cortex's roadmap is to implement the full OpenAI API including Tools, Runs, Multi-modal and Realtime APIs.

Inference Backends

  • Default: llama.cpp: cross-platform, supports most laptops, desktops and OSes
  • Future: ONNX Runtime: supports Windows Copilot+ PCs & NPUs
  • Future: TensorRT-LLM: supports Nvidia GPUs

If GPU hardware is available, Cortex is GPU accelerated by default.

Models

Cortex.cpp allows users to pull models from multiple Model Hubs, offering flexibility and extensive model access.

Note: As a very general guide: You should have >8 GB of RAM available to run the 7B models, 16 GB to run the 14B models, and 32 GB to run the 32B models.

Cortex Built-in Models & Quantizations

Model /Enginellama.cppCommand
phi-3.5cortex run phi3.5
llama3.2cortex run llama3.2
llama3.1cortex run llama3.1
codestralcortex run codestral
gemma2cortex run gemma2
mistralcortex run mistral
ministralcortex run ministral
qwen2cortex run qwen2.5
openhermes-2.5cortex run openhermes-2.5
tinyllamacortex run tinyllama

View all Cortex Built-in Models.

Cortex supports multiple quantizations for each model.


❯ cortex-nightly pull llama3.2
Downloaded models:
llama3.2:3b-gguf-q2-k
Available to download:
1. llama3.2:3b-gguf-q3-kl
2. llama3.2:3b-gguf-q3-km
3. llama3.2:3b-gguf-q3-ks
4. llama3.2:3b-gguf-q4-km (default)
5. llama3.2:3b-gguf-q4-ks
6. llama3.2:3b-gguf-q5-km
7. llama3.2:3b-gguf-q5-ks
8. llama3.2:3b-gguf-q6-k
9. llama3.2:3b-gguf-q8-0
Select a model (1-9):