Cortex
Overview
Cortex is a Rust-based AI inference and memory system built with Candle. It combines local LLM inference with persistent memory and state management, letting developers run models on their own hardware while keeping semantic memory across sessions.
Key Features
- Local LLM Inference: GGUF model support via Candle framework for running AI models locally
- Semantic Memory: Vector storage with similarity search and optional embedding models
- Session Management: Persistent chat sessions with state checkpointing
- State Checkpointing: Save and restore conversation states (a sketch follows this list)
- CLI Interface: Interactive chat and single-shot generation modes
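To make the session and checkpointing features above more concrete, here is a minimal, standard-library-only sketch of a save/restore layer for conversation turns. The `Checkpoint` struct, its fields, and the line-based on-disk format are assumptions for illustration only, not Cortex's actual API.

```rust
use std::fs;
use std::io;
use std::path::Path;

// Hypothetical conversation state; names and format are illustrative,
// not taken from Cortex's source. A real implementation would likely
// use a proper serialization format rather than tab-separated lines.
#[derive(Debug, Clone)]
struct Checkpoint {
    session_id: String,
    turns: Vec<(String, String)>, // (role, text) pairs
}

impl Checkpoint {
    // Persist the conversation as simple "role\ttext" lines.
    fn save(&self, dir: &Path) -> io::Result<()> {
        let body: String = self
            .turns
            .iter()
            .map(|(role, text)| format!("{role}\t{text}\n"))
            .collect();
        fs::create_dir_all(dir)?;
        fs::write(dir.join(format!("{}.ckpt", self.session_id)), body)
    }

    // Restore a previously saved conversation by session id.
    fn load(dir: &Path, session_id: &str) -> io::Result<Self> {
        let raw = fs::read_to_string(dir.join(format!("{session_id}.ckpt")))?;
        let turns = raw
            .lines()
            .filter_map(|l| l.split_once('\t'))
            .map(|(r, t)| (r.to_string(), t.to_string()))
            .collect();
        Ok(Self { session_id: session_id.to_string(), turns })
    }
}

fn main() -> io::Result<()> {
    let ckpt = Checkpoint {
        session_id: "my-session".to_string(),
        turns: vec![
            ("user".to_string(), "hello".to_string()),
            ("assistant".to_string(), "hi there".to_string()),
        ],
    };
    ckpt.save(Path::new("checkpoints"))?;
    let restored = Checkpoint::load(Path::new("checkpoints"), "my-session")?;
    println!("restored {} turns", restored.turns.len());
    Ok(())
}
```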
Technical Architecture
The system is built around a modular architecture with the following core components:
- Runtime: Core execution environment with memory and state primitives
- Inference: Pluggable text generation backends (Candle, stub engines); sketched after this list
- Memory: Vector storage with similarity search capabilities
- State: Checkpoint and session management
- Config: Centralized configuration system
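The pluggable inference layer can be pictured as a small trait boundary between the runtime and whichever engine produces text. The sketch below is hypothetical: the `TextGenerator` trait, `StubEngine` type, and `run` helper are illustrative names, not identifiers from Cortex's source; a Candle-backed engine would simply implement the same trait.

```rust
// Hypothetical backend interface; names are illustrative assumptions.
pub trait TextGenerator {
    fn generate(&mut self, prompt: &str, max_tokens: usize) -> Result<String, String>;
}

// A stub engine, useful for tests when no GGUF model is available.
pub struct StubEngine;

impl TextGenerator for StubEngine {
    fn generate(&mut self, prompt: &str, _max_tokens: usize) -> Result<String, String> {
        Ok(format!("[stub] echo: {prompt}"))
    }
}

// The runtime depends only on the trait, so a Candle-backed engine can be
// swapped in behind the same interface without touching callers.
fn run(engine: &mut dyn TextGenerator, prompt: &str) -> Result<String, String> {
    engine.generate(prompt, 256)
}

fn main() {
    let mut engine = StubEngine;
    println!("{}", run(&mut engine, "Explain quantum computing").unwrap());
}
```

Keeping the runtime behind a trait like this is what makes the stub engine possible: the rest of the system can be exercised without downloading or loading a model.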
Memory System
The memory system provides powerful semantic search capabilities:
- Vector Storage: Efficient similarity search with configurable thresholds (see the sketch after this list)
- Automatic Embeddings: Downloads and uses embedding models when the --memory flag is used
- Persistent Storage: Memory persists across sessions
- Configurable Limits: Set maximum entries and similarity thresholds
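As a rough illustration of the vector storage described above, the sketch below implements an in-memory store with cosine-similarity search, a configurable similarity threshold, and a maximum-entry limit. The `MemoryStore` type and its methods are assumptions for illustration only; the real system additionally persists entries across sessions and produces embeddings with a downloaded model.

```rust
// Minimal in-memory vector store with cosine-similarity search.
// Struct and method names are illustrative, not Cortex's actual API.
struct MemoryStore {
    max_entries: usize,
    threshold: f32,
    entries: Vec<(String, Vec<f32>)>, // (text, embedding)
}

impl MemoryStore {
    fn new(max_entries: usize, threshold: f32) -> Self {
        Self { max_entries, threshold, entries: Vec::new() }
    }

    fn insert(&mut self, text: String, embedding: Vec<f32>) {
        if self.entries.len() == self.max_entries {
            self.entries.remove(0); // evict the oldest entry when full
        }
        self.entries.push((text, embedding));
    }

    // Return entries whose cosine similarity to the query meets the
    // threshold, most similar first.
    fn search(&self, query: &[f32]) -> Vec<(&str, f32)> {
        let mut hits: Vec<(&str, f32)> = self
            .entries
            .iter()
            .map(|(text, emb)| (text.as_str(), cosine(query, emb)))
            .filter(|(_, score)| *score >= self.threshold)
            .collect();
        hits.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
        hits
    }
}

fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

fn main() {
    let mut store = MemoryStore::new(100, 0.7);
    store.insert("the cat sat on the mat".into(), vec![0.9, 0.1, 0.0]);
    store.insert("quantum computing basics".into(), vec![0.1, 0.9, 0.2]);
    for (text, score) in store.search(&[0.85, 0.15, 0.05]) {
        println!("{score:.2}  {text}");
    }
}
```

A brute-force scan like this is typically adequate for memory stores with a modest entry limit; a larger store would call for an approximate nearest-neighbor index instead.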
Usage Examples
The CLI provides intuitive commands for interacting with the system:
```bash
# Interactive chat with semantic memory
cortex chat --model path/to/model.gguf --memory

# Single generation
cortex generate --model path/to/model.gguf "Explain quantum computing"

# Session management
cortex sessions
cortex delete-session my-session
```
Technology Stack
- Language: Rust (100%)
- ML Framework: Candle (Rust ML framework by Hugging Face)
- Model Format: GGUF (quantized models)
- License: MIT