Cortex
Overview
Cortex is a Rust-based AI inference and memory system built with Candle. It combines local LLM inference with persistent memory and state management, letting developers run models on their own hardware while keeping semantic memory across sessions.
Key Features
- Local LLM Inference: GGUF model support via Candle framework for running AI models locally
- Semantic Memory: Vector storage with similarity search and optional embedding models
- Session Management: Persistent chat sessions with state checkpointing
- State Checkpointing: Save and restore conversation states (a sketch follows this list)
- CLI Interface: Interactive chat and single-shot generation modes
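To make the session and checkpointing features above more concrete, here is a minimal, standard-library-only sketch of a save/restore layer for conversation turns. The `Checkpoint` struct, its fields, and the line-based on-disk format are assumptions for illustration only, not Cortex's actual API.

```rust
use std::fs;
use std::io;
use std::path::Path;

// Hypothetical conversation state; names and format are illustrative,
// not taken from Cortex's source. A real implementation would likely
// use a proper serialization format rather than tab-separated lines.
#[derive(Debug, Clone)]
struct Checkpoint {
    session_id: String,
    turns: Vec<(String, String)>, // (role, text) pairs
}

impl Checkpoint {
    // Persist the conversation as simple "role\ttext" lines.
    fn save(&self, dir: &Path) -> io::Result<()> {
        let body: String = self
            .turns
            .iter()
            .map(|(role, text)| format!("{role}\t{text}\n"))
            .collect();
        fs::create_dir_all(dir)?;
        fs::write(dir.join(format!("{}.ckpt", self.session_id)), body)
    }

    // Restore a previously saved conversation by session id.
    fn load(dir: &Path, session_id: &str) -> io::Result<Self> {
        let raw = fs::read_to_string(dir.join(format!("{session_id}.ckpt")))?;
        let turns = raw
            .lines()
            .filter_map(|l| l.split_once('\t'))
            .map(|(r, t)| (r.to_string(), t.to_string()))
            .collect();
        Ok(Self { session_id: session_id.to_string(), turns })
    }
}

fn main() -> io::Result<()> {
    let ckpt = Checkpoint {
        session_id: "my-session".to_string(),
        turns: vec![
            ("user".to_string(), "hello".to_string()),
            ("assistant".to_string(), "hi there".to_string()),
        ],
    };
    ckpt.save(Path::new("checkpoints"))?;
    let restored = Checkpoint::load(Path::new("checkpoints"), "my-session")?;
    println!("restored {} turns", restored.turns.len());
    Ok(())
}
```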
Technical Architecture
The system is built around a modular architecture with the following core components:
- Runtime: Core execution environment with memory and state primitives
- Inference: Pluggable text generation backends (Candle, stub engines); sketched after this list
- Memory: Vector storage with similarity search capabilities
- State: Checkpoint and session management
- Config: Centralized configuration system
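The pluggable inference layer can be pictured as a small trait boundary between the runtime and whichever engine produces text. The sketch below is hypothetical: the `TextGenerator` trait, `StubEngine` type, and `run` helper are illustrative names, not identifiers from Cortex's source; a Candle-backed engine would simply implement the same trait.

```rust
// Hypothetical backend interface; names are illustrative assumptions.
pub trait TextGenerator {
    fn generate(&mut self, prompt: &str, max_tokens: usize) -> Result<String, String>;
}

// A stub engine, useful for tests when no GGUF model is available.
pub struct StubEngine;

impl TextGenerator for StubEngine {
    fn generate(&mut self, prompt: &str, _max_tokens: usize) -> Result<String, String> {
        Ok(format!("[stub] echo: {prompt}"))
    }
}

// The runtime depends only on the trait, so a Candle-backed engine can be
// swapped in behind the same interface without touching callers.
fn run(engine: &mut dyn TextGenerator, prompt: &str) -> Result<String, String> {
    engine.generate(prompt, 256)
}

fn main() {
    let mut engine = StubEngine;
    println!("{}", run(&mut engine, "Explain quantum computing").unwrap());
}
```

Keeping the runtime behind a trait like this is what makes the stub engine possible: the rest of the system can be exercised without downloading or loading a model.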
Memory System
The memory system provides powerful semantic search capabilities:
- Vector Storage: Efficient similarity search with configurable thresholds (see the sketch after this list)
- Automatic Embeddings: Downloads and uses embedding models when the --memory flag is used
- Persistent Storage: Memory persists across sessions
- Configurable Limits: Set maximum entries and similarity thresholds
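As a rough illustration of the vector storage described above, the sketch below implements an in-memory store with cosine-similarity search, a configurable similarity threshold, and a maximum-entry limit. The `MemoryStore` type and its methods are assumptions for illustration only; the real system additionally persists entries across sessions and produces embeddings with a downloaded model.

```rust
// Minimal in-memory vector store with cosine-similarity search.
// Struct and method names are illustrative, not Cortex's actual API.
struct MemoryStore {
    max_entries: usize,
    threshold: f32,
    entries: Vec<(String, Vec<f32>)>, // (text, embedding)
}

impl MemoryStore {
    fn new(max_entries: usize, threshold: f32) -> Self {
        Self { max_entries, threshold, entries: Vec::new() }
    }

    fn insert(&mut self, text: String, embedding: Vec<f32>) {
        if self.entries.len() == self.max_entries {
            self.entries.remove(0); // evict the oldest entry when full
        }
        self.entries.push((text, embedding));
    }

    // Return entries whose cosine similarity to the query meets the
    // threshold, most similar first.
    fn search(&self, query: &[f32]) -> Vec<(&str, f32)> {
        let mut hits: Vec<(&str, f32)> = self
            .entries
            .iter()
            .map(|(text, emb)| (text.as_str(), cosine(query, emb)))
            .filter(|(_, score)| *score >= self.threshold)
            .collect();
        hits.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
        hits
    }
}

fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

fn main() {
    let mut store = MemoryStore::new(100, 0.7);
    store.insert("the cat sat on the mat".into(), vec![0.9, 0.1, 0.0]);
    store.insert("quantum computing basics".into(), vec![0.1, 0.9, 0.2]);
    for (text, score) in store.search(&[0.85, 0.15, 0.05]) {
        println!("{score:.2}  {text}");
    }
}
```

A brute-force scan like this is typically adequate for memory stores with a modest entry limit; a larger store would call for an approximate nearest-neighbor index instead.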
Usage Examples
The CLI provides intuitive commands for interacting with the system:
```bash
# Interactive chat with semantic memory
cortex chat --model path/to/model.gguf --memory

# Single generation
cortex generate --model path/to/model.gguf "Explain quantum computing"

# Session management
cortex sessions
cortex delete-session my-session
```
Technology Stack
- Language: Rust (100%)
- ML Framework: Candle (Rust ML framework by Hugging Face)
- Model Format: GGUF (quantized models)
- License: MIT