
Homelab LLM Infrastructure (homelab-llm-infra)

TurboQuant multi-model inference with voice pipeline.

Overview

Hybrid LLM manager for homelab GPU inference. It manages multiple Ollama models with TurboQuant optimization, orchestrates context windows, hands requests off between models automatically, and integrates a voice pipeline for real-time interaction.
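The automatic model handoff described above could, as one minimal sketch, route each request to the smallest model whose context window fits it. Everything below — model names, window sizes, and the token heuristic — is an illustrative assumption, not the repo's actual roster or logic:

```python
"""Sketch of context-based model handoff (all names/thresholds are assumptions)."""

from dataclasses import dataclass


@dataclass
class ModelSlot:
    name: str      # Ollama model tag (hypothetical)
    max_ctx: int   # context window in tokens


# Hypothetical roster, ordered from cheapest to largest context window.
ROSTER = [
    ModelSlot("llama3:8b", 8_192),
    ModelSlot("qwen2.5:32b", 32_768),
    ModelSlot("long-context:70b", 262_144),
]


def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    return max(1, len(text) // 4)


def pick_model(prompt: str, history_tokens: int = 0) -> str:
    """Hand off to the smallest model whose window fits the request."""
    needed = estimate_tokens(prompt) + history_tokens
    for slot in ROSTER:
        if needed <= slot.max_ctx:
            return slot.name
    # Nothing fits: fall back to the largest window available.
    return ROSTER[-1].name
```

A real manager would also track per-model VRAM pressure and warm/cold load state before switching, but the selection step reduces to a threshold walk like this.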

Stack

  • Python
  • Ollama

Quick Start

# Clone
git clone ssh://git@192.168.183.110:2222/pook/homelab-llm-infra.git
cd homelab-llm-infra
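After cloning, inference settings live in config.py at the repo root. As a purely hypothetical sketch of what TurboQuant-style settings might look like (every name and value here is an assumption for illustration, not the repo's actual config):

```python
# Hypothetical config.py fragment — names and values are illustrative only.
CONTEXT_WINDOW = 262_144        # 262K-token context window
KV_CACHE_TYPE = "q8_0"          # quantized KV cache to cut VRAM usage
OLLAMA_HOST = "127.0.0.1:11434" # default Ollama listen address
```

Quantizing the KV cache trades a small accuracy loss for roughly half the cache memory versus f16, which is what makes very large context windows feasible on homelab GPUs.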

Status

Active

License

Private — All rights reserved