# Cortex Basic Usage
Cortex exposes an API server that runs at `localhost:39281` by default. The port can be changed by setting the `apiServerPort` parameter in the `.cortexrc` file.
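For reference, here is a minimal sketch of the relevant `.cortexrc` entry. The file is YAML; `apiServerPort` is the parameter named above, while `apiServerHost` is an assumption about the config schema.

```yaml
# Minimal .cortexrc sketch. Only apiServerPort is documented above;
# apiServerHost is an assumed companion setting.
apiServerHost: 127.0.0.1
apiServerPort: 39281
```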
## Server

### Start Cortex Server

```sh
# By default, the server starts on port 39281
cortex

# Start the server with a different address and port
cortex -a <address> -p <port_number>

# Set the data folder directory
cortex --dataFolder <dataFolderPath>
```
### Terminate Cortex Server

```sh
curl --request DELETE \
  --url http://127.0.0.1:39281/processManager/destroy
```
## Engines
Cortex currently supports three industry-standard engines: llama.cpp, ONNXRuntime, and TensorRT-LLM.
By default, Cortex installs the llama.cpp engine, which runs on most laptops, desktops, and operating systems.
For more information, see Engine Management.
### List Available Engines

```sh
curl --request GET \
  --url http://127.0.0.1:39281/v1/engines
```
### Install an Engine (e.g., llama-cpp)

```sh
curl --request POST \
  --url http://127.0.0.1:39281/v1/engines/install/llama-cpp
```
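Engine installation may take a moment to complete. One way to confirm it finished is to re-query the list endpoint shown above; the sketch below assumes the response body names installed engines.

```sh
# Hypothetical readiness check: poll the documented engines list
# until llama-cpp appears. Assumes installed engines are named in the response.
until curl -s http://127.0.0.1:39281/v1/engines | grep -q 'llama-cpp'; do
  sleep 2
done
echo "llama-cpp is installed"
```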
## Manage Models

### Pull Model

```sh
curl --request POST \
  --url http://127.0.0.1:39281/v1/models/pull \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "tinyllama:gguf",
    "id": "my-custom-model-id"
  }'
```
If the model download was interrupted, this request will download the remainder of the model files.
The downloaded models are saved to the Cortex Data Folder.
### Stop Model Download

```sh
curl --request DELETE \
  --url http://127.0.0.1:39281/v1/models/pull \
  --header 'Content-Type: application/json' \
  --data '{
    "taskId": "tinyllama:1b-gguf"
  }'
```
### List Models

```sh
curl --request GET \
  --url http://127.0.0.1:39281/v1/models
```
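To pull out just the model identifiers, you can filter the response with `jq`. This is a hedged sketch: it assumes the endpoint returns an OpenAI-compatible body with a `data` array whose entries carry an `id` field.

```sh
# Assumption: response is OpenAI-compatible, i.e. {"data": [{"id": ...}, ...]}
curl -s http://127.0.0.1:39281/v1/models | jq -r '.data[].id'
```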
### Delete Model

```sh
curl --request DELETE \
  --url http://127.0.0.1:39281/v1/models/tinyllama:1b-gguf
```
## Run Models

### Start Model

```sh
# Start the model
curl --request POST \
  --url http://127.0.0.1:39281/v1/models/start \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "tinyllama:1b-gguf"
  }'
```
### Create Chat Completion

```sh
# Invoke the chat completions endpoint
curl --request POST \
  --url http://localhost:39281/v1/chat/completions \
  -H "Content-Type: application/json" \
  --data '{
    "messages": [
      {
        "role": "user",
        "content": "Write a Haiku about cats and AI"
      }
    ],
    "model": "tinyllama:1b-gguf",
    "stream": false
  }'
```
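Because the request accepts a `stream` flag, a streaming call is presumably the same request with `stream` set to `true`. The sketch below makes that assumption and adds curl's `-N` flag so chunks print as they arrive instead of being buffered.

```sh
# Assumed streaming variant: same endpoint with "stream": true.
# -N disables curl's output buffering so tokens print as they arrive.
curl -N --request POST \
  --url http://localhost:39281/v1/chat/completions \
  -H "Content-Type: application/json" \
  --data '{
    "messages": [
      { "role": "user", "content": "Write a Haiku about cats and AI" }
    ],
    "model": "tinyllama:1b-gguf",
    "stream": true
  }'
```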
### Stop Model

```sh
curl --request POST \
  --url http://127.0.0.1:39281/v1/models/stop \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "tinyllama:1b-gguf"
  }'
```
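Putting the steps together, an end-to-end sketch using only the endpoints documented above could look like the following. The model id and the fixed sleep are illustrative assumptions; a real script would poll until the download finishes rather than sleeping.

```sh
#!/usr/bin/env sh
# End-to-end sketch: pull, start, chat, stop.
# MODEL and the sleep duration are illustrative assumptions.
BASE=http://127.0.0.1:39281
MODEL=tinyllama:1b-gguf

# Pull the model and give the download time to finish
curl --request POST "$BASE/v1/models/pull" \
  --header 'Content-Type: application/json' \
  --data "{\"model\": \"$MODEL\"}"
sleep 60

# Load the model into memory
curl --request POST "$BASE/v1/models/start" \
  --header 'Content-Type: application/json' \
  --data "{\"model\": \"$MODEL\"}"

# Ask for a completion
curl --request POST "$BASE/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data "{\"messages\": [{\"role\": \"user\", \"content\": \"Hello\"}], \"model\": \"$MODEL\", \"stream\": false}"

# Unload the model
curl --request POST "$BASE/v1/models/stop" \
  --header 'Content-Type: application/json' \
  --data "{\"model\": \"$MODEL\"}"
```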