E-Labs Copilot

What can I help with?

Local + cloud AI orchestration. The CEO model routes your prompt to the best muscle automatically.

Generate a cyberpunk image with neon lighting

Write a GPU temperature monitor in Python

Explain LoRA vs full fine-tuning

Create a 30s product launch script

Non-admin model visibility

Local models via Ollama + ComfyUI • Cloud APIs optional • Your data stays on your machine

⚙️ General

Default routing mode, API keys, and system information.

Default Mode

Which routing mode to use for new chats

API Keys

System

Hardware

VRAM Usage

—

🚀 Performance

Fast mode, benchmarks, and active inference parameters.

⚡ Fast Mode

Reduces the context window for faster responses; may lower output quality

Performance Targets

Tokens/sec

—

Target tok/s

16+

Context Window

—

Max Output

—

Benchmarks the model currently loaded in VRAM. Mount a model in the Models tab first.
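
The tokens/sec figure could be derived as in the sketch below. It assumes the benchmark reads the eval_count and eval_duration fields that Ollama returns with a completed generation (eval_duration is in nanoseconds). The function names and the use of 16 tok/s as a pass threshold are illustrative assumptions, not the dashboard's confirmed implementation.

```python
TARGET_TOK_PER_SEC = 16.0  # mirrors the "Target tok/s: 16+" value shown above


def tokens_per_second(eval_count: int, eval_duration_ns: int) -> float:
    """Generation speed from Ollama's eval_count and eval_duration (ns) fields."""
    if eval_duration_ns <= 0:
        return 0.0  # avoid dividing by zero on an empty or failed run
    return eval_count / (eval_duration_ns / 1e9)


def meets_target(eval_count: int, eval_duration_ns: int,
                 target: float = TARGET_TOK_PER_SEC) -> bool:
    """True when the measured rate reaches the configured target."""
    return tokens_per_second(eval_count, eval_duration_ns) >= target


# Made-up numbers for illustration: 320 tokens generated in 10 seconds.
rate = tokens_per_second(320, 10_000_000_000)
print(f"{rate:.1f} tok/s, meets target: {meets_target(320, 10_000_000_000)}")
```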

Active Parameters

🤖 Models

Install, manage, and monitor local LLMs via Ollama.

Install a New Model

R&D Dashboard

Choose a model that fits in your 12 GB of VRAM. Models download from Ollama's registry automatically.

Or type a custom model name:

Running Models (in VRAM)

Installed Models

Default	Model	Service	Buy $	Sell $	Credits	User	Admin

🔧 Advanced

Ollama inference parameters and system diagnostics.

Ollama Parameters

Fine-tune individual inference parameters. Changes apply to all future requests.

System Info

GPU Layers

num_gpu = 99 → all layers run on the GPU, with no CPU offload

99
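
As a rough illustration, the GPU Layers value could be passed to Ollama through the options object of a generate request. The endpoint and the num_gpu option are part of Ollama's API; the helper name and the model name "example-model" are hypothetical, and whether E-Labs Copilot builds its requests this way is an assumption.

```python
import json

# Ollama listens on port 11434 by default.
OLLAMA_GENERATE_URL = "http://127.0.0.1:11434/api/generate"


def build_generate_payload(model: str, prompt: str, num_gpu: int = 99) -> dict:
    """Request body asking Ollama to place up to num_gpu layers on the GPU."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"num_gpu": num_gpu},
    }


# "example-model" is a placeholder, not a model named in this page.
payload = build_generate_payload("example-model", "Hello")
print(json.dumps(payload, indent=2))  # body to POST to OLLAMA_GENERATE_URL
```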

Model Storage

~/.ollama/models (default Ollama location)

Loaded Models

—

🔌 AI Backends

Configure local and cloud AI providers. Only enabled backends are used by smart routing.

Smart Routing

Automatically pick the best model from enabled backends

Local first
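
A minimal sketch of what a "Local first" policy might look like: only enabled backends are considered, and local ones are preferred over cloud ones. The Backend class, its fields, and the tie-breaking by configuration order are assumptions made for illustration; the real router may weigh model quality or load as well.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Backend:
    name: str
    is_local: bool
    enabled: bool


def pick_backend(backends: List[Backend]) -> Optional[Backend]:
    """Return the first enabled local backend, else the first enabled cloud one."""
    enabled = [b for b in backends if b.enabled]
    # sorted() is stable, so backends keep their configured order within each group.
    ordered = sorted(enabled, key=lambda b: not b.is_local)
    return ordered[0] if ordered else None


backends = [
    Backend("cloud-api", is_local=False, enabled=True),
    Backend("ollama", is_local=True, enabled=True),
    Backend("llamacpp", is_local=True, enabled=False),
]
choice = pick_backend(backends)
print(choice.name if choice else "no backend enabled")
```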

Active Backend:

Cloud Tools / Extensions

☁️ Claude Code CLI

Route tasks or individual swarm nodes to Anthropic's Claude Code CLI instead of Ollama.
Requires the claude CLI to be installed and authenticated via claude /login.

🤖 GitHub Copilot SDK

Access GitHub Copilot's model fleet (GPT-5, Claude, Gemini, Grok and more) via the copilot Python SDK.
Requires the SDK to be installed in .venv and authenticated via copilot login.

⚡ RotorQuant KV & llama.cpp

Serve KV-cache-compressed models via llama-server on port 8095.
Tier 1 — q8_0 KV compression (~40K ctx, ~2× savings) — works now with standard llama.cpp.
Tier 2 — RotorQuant iso3/planar3 (~200K ctx, ~10×) — requires CUDA 12.4 + fork build.
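
The "~2× savings" for Tier 1 can be checked with a little arithmetic. In llama.cpp's q8_0 format, each block of 32 values is stored as 32 one-byte integers plus a 2-byte scale, so a value costs 34/32 bytes instead of the 2 bytes of f16. The model dimensions below are hypothetical and chosen only to make the numbers concrete.

```python
F16_BYTES = 2.0
Q8_0_BYTES = 34 / 32  # 32 int8 values plus one 2-byte scale per block


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   n_ctx: int, bytes_per_value: float) -> float:
    """Size of the KV cache: one K and one V vector per layer, head, and token."""
    return 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per_value


# Hypothetical model shape; real values depend on the model being served.
dims = dict(n_layers=32, n_kv_heads=8, head_dim=128, n_ctx=40_000)
f16 = kv_cache_bytes(**dims, bytes_per_value=F16_BYTES)
q8 = kv_cache_bytes(**dims, bytes_per_value=Q8_0_BYTES)
print(f"f16: {f16 / 2**30:.2f} GiB, q8_0: {q8 / 2**30:.2f} GiB, "
      f"savings: {f16 / q8:.2f}x")
```

The ratio works out to 32/17, a little under 2×, whatever the model shape.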

Default tier

When enabled, models in the registry with backend: llamacpp are routed to http://127.0.0.1:8095 instead of Ollama. A subtle ⚡llama.cpp · q8_0 KV badge appears on responses served by llama-server. Disable to force all models through Ollama (e.g. when llama-server is offline).
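
The routing rule described above can be sketched as a small lookup. The backend: llamacpp field and the port 8095 address come from the description; the Ollama address is its default port, and the function name and registry entry shape are hypothetical.

```python
OLLAMA_URL = "http://127.0.0.1:11434"    # Ollama's default address
LLAMACPP_URL = "http://127.0.0.1:8095"   # llama-server address given above


def resolve_base_url(registry_entry: dict, llamacpp_enabled: bool) -> str:
    """Pick the server for a model; fall back to Ollama when the toggle is off."""
    if llamacpp_enabled and registry_entry.get("backend") == "llamacpp":
        return LLAMACPP_URL
    return OLLAMA_URL


entry = {"name": "example-model", "backend": "llamacpp"}  # hypothetical entry
print(resolve_base_url(entry, llamacpp_enabled=True))
print(resolve_base_url(entry, llamacpp_enabled=False))
```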

⚡ Skills

Manage built-in and community skills. The active skill is applied automatically by the CEO router.

🧠 Reasoning Models

Local and cloud reasoning models for Agent and Plan modes. Models that exceed your hardware's capacity are marked with ⚠️.

Show Performance Tips

Display inline tips during chat when using reasoning models

Local Models (Ollama)

Cloud Models (API Key Required)

🧩 Auto-Solve

Autonomous multi-step task solving. The system reasons, calls tools, and iterates until the task is complete.

Enable Auto-Solve

Master switch — when off, auto-solve requests fall back to normal chat

Reasoning Model

Primary model for auto-solve (must support tool calling or chain-of-thought reasoning)

Max Iterations

Maximum reasoning steps per solve (hard cap: 25)
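
A sketch of how the iteration limit might be enforced: the requested value is clamped to the hard cap of 25, and the solver stops as soon as a step reports completion. The step callback and both function names are hypothetical stand-ins for the real reason/call-tool cycle.

```python
from typing import Callable

HARD_CAP = 25  # from "hard cap: 25" above


def clamp_iterations(requested: int) -> int:
    """Keep the setting between 1 and the hard cap."""
    return max(1, min(requested, HARD_CAP))


def auto_solve(step: Callable[[int], bool], max_iterations: int) -> int:
    """Run step(i) until it returns True or the budget runs out.

    Returns the number of steps that were executed.
    """
    limit = clamp_iterations(max_iterations)
    for i in range(1, limit + 1):
        if step(i):  # the step signals that the task is complete
            return i
    return limit


# A toy step that finishes on its third call.
print(auto_solve(lambda i: i == 3, max_iterations=10))
```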

Thinking Mode

Enable chain-of-thought reasoning (when a reasoning model is not used)

Tool Categories

Enable or disable entire categories of tools the auto-solver can use.

🔧 General Tools

Code execution, web search, delegation, planning, math

🎨 Image & Video

Generate, edit, and animate images/videos via ComfyUI

💾 Memory

Search and save facts in the memory sidecar

🔊 Text-to-Speech

Generate speech and narrate videos

📄 File Access

Read workspace files

Resource Management

Auto VRAM Management

Automatically load/unload models during auto-solve
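
One plausible mechanism for this, assuming the feature relies on Ollama's keep_alive field: a generate request with no prompt loads a model, and the same request with keep_alive set to 0 unloads it. The helper names and the "5m" default are illustrative assumptions, and the model names are placeholders.

```python
from typing import List


def load_payload(model: str, keep_alive: str = "5m") -> dict:
    """Body for /api/generate that loads a model without generating text."""
    return {"model": model, "keep_alive": keep_alive}


def unload_payload(model: str) -> dict:
    """Body for /api/generate that asks Ollama to unload the model right away."""
    return {"model": model, "keep_alive": 0}


def swap_plan(loaded: List[str], needed: str) -> List[dict]:
    """Requests to send so that only the needed model stays in VRAM."""
    plan = [unload_payload(m) for m in loaded if m != needed]
    if needed not in loaded:
        plan.append(load_payload(needed))
    return plan


for body in swap_plan(["chat-model"], "reasoning-model"):
    print(body)
```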

💾 Memory Management

View and manage persistent memory across different scopes: global facts, project-specific context, and conversation history.

🧠 Memory Architecture

Global Memory: User facts (name, job, preferences) saved across all chats
Project Memory: Project-specific context (tech stack, custom rules) isolated per project
Current Chat: Messages in this conversation only
Conversation History: Recent messages from other conversations for context
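
The split between global and project memory could be modeled as in the sketch below. The MemoryStore class and its methods are hypothetical and cover only those two scopes; chat messages and conversation history are left out, and the real memory sidecar may store facts quite differently.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class MemoryStore:
    global_facts: List[str] = field(default_factory=list)
    project_facts: Dict[str, List[str]] = field(default_factory=dict)

    def save_global(self, fact: str) -> None:
        self.global_facts.append(fact)

    def save_project(self, project: str, fact: str) -> None:
        self.project_facts.setdefault(project, []).append(fact)

    def context_for(self, project: Optional[str] = None) -> List[str]:
        """Global facts always apply; project facts only for that project."""
        facts = list(self.global_facts)
        if project is not None:
            facts.extend(self.project_facts.get(project, []))
        return facts


store = MemoryStore()
store.save_global("Prefers Python")
store.save_project("site-redesign", "Stack: TypeScript")
store.save_project("gpu-tools", "Stack: CUDA")
print(store.context_for("gpu-tools"))
```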

💾 Global Memory (0 facts)

📁 Project Memory

No projects yet. Create a project to see its memory.

⚙️ Data Controls

⚡ Swarm Pipeline

Configure THE MACHINE's self-improvement pipeline — models, context budgets, worker counts, timeouts, and apply behavior.

📖 Help Guide

🖥️ Systems Portal

Live reachability status of E-Labs services. Click Refresh to re-probe all ports.
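
A reachability probe of this kind can be as simple as attempting a TCP connection to each port. The sketch below assumes a plain connect check on localhost; the service table is illustrative (11434 is Ollama's default port and 8095 is the llama-server port mentioned under AI Backends), and the portal may check more than this.

```python
import socket
from typing import Dict


def is_reachable(host: str, port: int, timeout: float = 0.5) -> bool:
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or unreachable
        return False


def probe_all(services: Dict[str, int], host: str = "127.0.0.1") -> Dict[str, bool]:
    """Probe every service once, as a Refresh click might."""
    return {name: is_reachable(host, port) for name, port in services.items()}


if __name__ == "__main__":
    services = {"ollama": 11434, "llama-server": 8095}
    for name, up in probe_all(services).items():
        print(f"{name}: {'reachable' if up else 'unreachable'}")
```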

📋 Saved Workflows

Reusable workflow templates saved from completed projects. Click Launch to create a new project from any template.

Copilot Workspace

Chat Proven Methods (Admin)

Configure Service

Available Models

Personas

Files

Workspace Files

Model Stack

Help

Artifact

🧪 Stress Test Dashboard

Test Results

Quick Single Test

Available Test Cases