Context

Codebases are graphs of interconnected functions, types, and documentation, yet our primary discovery tool is still grep. Chizu is a local knowledge graph that indexes your codebase into a queryable database, letting you ask "what tests cover the checkout flow" instead of hunting through thousands of text matches.

Key Takeaways

  • Chizu transforms your codebase into a queryable knowledge graph using tree-sitter parsing, SQLite storage, and optional vector embeddings - everything runs locally with no code ever leaving your machine
  • Natural language queries like "how does error handling work in the API layer" return ranked entities with explanations, replacing the grep-and-hope approach to code discovery
  • The graph structure captures relationships (defines, uses, tested_by, mentions) that text search cannot express, enabling questions about code architecture rather than just string locations

The Problem with Code Discovery

Modern software development involves navigating enormous graphs of interconnected concepts. Yet our primary discovery tool is still grep - a line-oriented text search that treats code as a flat sequence of characters. This works fine for finding where a specific string appears, but falls apart when you need to understand relationships.

Consider the kinds of relationships a typical codebase contains:

Relationship                            | Example
----------------------------------------+--------------------------------------
Functions call other functions          | process_order calls validate_payment
Types reference other types             | User struct contains Address
Tests validate implementations          | test_checkout_flow tests checkout.rs
Documentation mentions APIs             | README references Config struct
Infrastructure deploys services         | Terraform deploys Docker containers
Configuration wires everything together | app.conf sets database URLs

Each of these is a graph edge. But grep sees only lines.

A seemingly simple question illustrates the gap: "What tests cover the user authentication flow?" With traditional tools, you might find files that mention "auth," identify which functions handle authentication, search for test files that import those functions, then manually verify which tests actually test the flow versus just mention it.

What Chizu Does

Chizu treats your codebase as a graph. It parses source files, extracts meaningful entities, and creates edges between them based on their relationships.

Entity Types

Type          | Description                       | Example ID
--------------+-----------------------------------+---------------------------------------
symbol        | Functions, structs, traits, types | symbol::src/auth.rs::validate_token
test          | Test functions                    | test::src/auth.rs::test_token_expired
source_unit   | Source files                      | source_unit::src/auth.rs
doc           | Markdown documentation            | doc::docs/auth.md
infra_root    | Terraform directories             | infra_root::infra/base
containerized | Dockerfiles                       | containerized::Dockerfile

Edge Types

Edge      | Meaning                   | Example
----------+---------------------------+-----------------------------------------
defines   | File contains symbol      | auth.rs --defines--> validate_token
uses      | Symbol references symbol  | handle_request --uses--> validate_token
tested_by | File has associated tests | router.rs --tested_by--> test_routing
mentions  | Doc references symbol     | README.md --mentions--> Config
deploys   | Infra deploys container   | base-infra --deploys--> Dockerfile

This structure enables queries that understand context. Instead of grepping for strings, you traverse relationships.

Architecture

The pipeline moves from source code to a queryable graph in four stages: input, indexing, storage, and query.

Input (Rust, TypeScript, Astro, Terraform, Markdown)
            |
            v
+-----------------------+
| Indexing Pipeline     |
| - File discovery      |
| - Tree-sitter parsing |
| - Entity extraction   |
| - Edge creation       |
| - Embedding generation|
+-----------------------+
            |
            v
+-----------------------+
| Storage Layer         |
| - SQLite (entities)   |
| - usearch (vectors)   |
| - Blake3 (hashes)     |
+-----------------------+
            |
            v
+-----------------------+
| Query Interface       |
| - Natural language    |
| - Entity inspection   |
| - Graph traversal     |
| - Vector search       |
+-----------------------+

Design Principles

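Local-first: Indexing, storage, and embeddings all run on your machine. The LLM step in chizu plan is only as local as the model you configure: the sample configuration below names gpt-4o-mini, a hosted model, so point it at a local model if nothing may leave your system.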

Incremental: Chizu only re-indexes files that have changed, using content hashing to detect modifications quickly.

Language-agnostic: The parser architecture can support any language that has a tree-sitter grammar. It currently supports Rust, TypeScript, Astro, Terraform, and Markdown.

Graph-native: Relationships are first-class citizens, not afterthoughts.

Query-flexible: Access your data via CLI, direct SQL, or natural language.

Using Chizu

Indexing

# Index a repository
chizu index /path/to/repo
# With embeddings (requires Ollama)
chizu index --embed /path/to/repo

The index is stored in .chizu/ at the repository root:

  • graph.db - SQLite database with entities and edges
  • vectors.usearch - Vector index for semantic search
  • content_hashes.json - Per-file content hashes used for incremental updates

Natural Language Queries

chizu plan "how does routing work"

This uses an LLM to interpret your question, query the graph, and return relevant entities with explanations. The reranking system considers keyword matches, semantic similarity, entity type relevance, graph connectivity, and path matching.

Direct Queries

# List all symbols
chizu query entities --kind symbol
# Find specific entity
chizu inspect "symbol::src/main.rs::main"

SQL Access

Since the underlying storage is SQLite, you can query directly:

cd /path/to/repo/.chizu
# Count entities by type
sqlite3 graph.db "SELECT kind, COUNT(*) FROM entities GROUP BY kind;"
# Find all tests associated with files whose ID matches "router"
sqlite3 graph.db "SELECT e.name FROM entities e
  JOIN edges ed ON e.id = ed.dst_id
  WHERE ed.src_id LIKE '%router%'
    AND ed.rel = 'tested_by'
    AND e.kind = 'test';"

How Indexing Works

File Discovery

Chizu walks the directory tree, respecting .gitignore and configurable exclude patterns. It computes a Blake3 hash of each file's content to detect changes.
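The change-detection idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Chizu's code: BLAKE2b from the standard library stands in for Blake3, the excluded directory names are hard-coded instead of read from .gitignore, and the function names are hypothetical.

```python
import hashlib
import os
import tempfile

def content_hash(path):
    """Hash file contents. BLAKE2b stands in for Blake3, which is not in the stdlib."""
    h = hashlib.blake2b(digest_size=32)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def changed_files(root, previous):
    """Return (paths whose hash differs from `previous`, the new hash map)."""
    current = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Hypothetical hard-coded excludes; pruning in place stops os.walk descending.
        dirnames[:] = [d for d in dirnames if d not in {"target", "node_modules", ".git"}]
        for name in filenames:
            path = os.path.join(dirpath, name)
            current[os.path.relpath(path, root)] = content_hash(path)
    changed = sorted(p for p, digest in current.items() if previous.get(p) != digest)
    return changed, current

# Tiny demonstration in a temporary directory.
with tempfile.TemporaryDirectory() as root:
    for name, body in [("a.rs", "fn a() {}\n"), ("b.rs", "fn b() {}\n")]:
        with open(os.path.join(root, name), "w") as f:
            f.write(body)
    first, hashes = changed_files(root, {})       # first run: everything is new
    with open(os.path.join(root, "b.rs"), "w") as f:
        f.write("fn b() { changed(); }\n")
    second, hashes = changed_files(root, hashes)  # second run: only b.rs differs
    print(first, second)
```

Only the files in the second list would need re-parsing. The sketch ignores deleted files, which a real indexer also has to remove from the graph.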

Parsing

Files are parsed using tree-sitter, a parser generator that produces concrete syntax trees. For each file:

  • Rust: Extracts functions, structs, enums, traits, impl blocks, tests
  • TypeScript: Extracts functions, classes, interfaces, types
  • Astro: Extracts components, frontmatter
  • Terraform: Extracts resources, modules, variables
  • Markdown: Extracts headers, code blocks, symbol mentions

Entity Extraction

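The parsed syntax trees are traversed to extract entities. Each entity gets a unique ID made of its kind, its file path, and (for entities inside a file) its name, joined by "::":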

symbol::src/auth.rs::validate_token
test::src/auth.rs::test_validate_token_expired
source_unit::src/auth.rs
doc::docs/auth.md
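Because the IDs are structured, they are easy to take apart in scripts. The parser below is a sketch based only on the examples above; the kind::path[::name] grammar is inferred from them, and EntityId and parse_entity_id are hypothetical names.

```python
from typing import NamedTuple, Optional

class EntityId(NamedTuple):
    kind: str
    path: str
    name: Optional[str]  # None for file-level entities such as source_unit or doc

def parse_entity_id(raw):
    """Split 'kind::path[::name]' into its parts, as inferred from the examples."""
    # maxsplit=2 keeps any further '::' inside the name, e.g. a qualified method name.
    parts = raw.split("::", 2)
    if len(parts) < 2 or not all(parts):
        raise ValueError(f"not an entity ID: {raw!r}")
    name = parts[2] if len(parts) == 3 else None
    return EntityId(parts[0], parts[1], name)

print(parse_entity_id("symbol::src/auth.rs::validate_token"))
print(parse_entity_id("doc::docs/auth.md"))
```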

Edge Creation

As entities are extracted, relationships are recorded:

  • A file "defines" all symbols it contains
  • A symbol "uses" symbols it references
  • A source file is "tested_by" the tests in the test file named after it
  • Documentation "mentions" symbols referenced in backticks
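The "mentions" rule is the simplest to illustrate. The sketch below scans Markdown for single-backtick spans and emits an edge for each span that matches a known symbol name. The regex, the mention_edges helper, and the name-to-ID lookup are assumptions made for this example; the post does not say how Chizu resolves ambiguous names.

```python
import re

# Matches single-backtick spans such as `Config` or `validate_token`.
BACKTICK_SPAN = re.compile(r"`([^`\n]+)`")

def mention_edges(doc_id, markdown, symbols_by_name):
    """Return (doc_id, symbol_id, 'mentions') for each known symbol named in backticks."""
    edges = []
    seen = set()
    for span in BACKTICK_SPAN.findall(markdown):
        symbol_id = symbols_by_name.get(span.strip())
        if symbol_id and symbol_id not in seen:  # one edge per symbol, however often mentioned
            seen.add(symbol_id)
            edges.append((doc_id, symbol_id, "mentions"))
    return edges

# Illustrative lookup table; a real index would build this from extracted entities.
symbols = {
    "Config": "symbol::src/config.rs::Config",
    "validate_token": "symbol::src/auth.rs::validate_token",
}
text = "Load a `Config`, then call `validate_token`. `Config` is reloaded on SIGHUP; see `cargo run`."
for edge in mention_edges("doc::README.md", text, symbols):
    print(edge)
```

Backticked text that is not a known symbol, like the `cargo run` span here, produces no edge.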

Embedding Generation

If embeddings are enabled, Chizu sends entity text to a local Ollama instance and stores the resulting vectors in a usearch index. This enables semantic search: finding entities related by meaning, not just by shared keywords.
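What "related by meaning" boils down to is nearest-neighbour search over vectors. Here is a brute-force sketch in plain Python using cosine similarity. The three-dimensional vectors are made up for the example, and the exhaustive scan is a stand-in for the indexed search that usearch provides; the similarity metric Chizu actually uses is not stated in this post.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def nearest(query_vec, index, k=2):
    """Brute-force top-k by cosine similarity (a vector index avoids the full scan)."""
    scored = sorted(index.items(), key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [entity_id for entity_id, _ in scored[:k]]

# Toy 3-dimensional vectors, invented for illustration; real embeddings are far wider.
index = {
    "symbol::src/auth.rs::validate_token": [0.9, 0.1, 0.0],
    "symbol::src/auth.rs::refresh_session": [0.8, 0.3, 0.1],
    "symbol::src/order.rs::process_order": [0.0, 0.2, 0.9],
}
query = [1.0, 0.2, 0.0]  # pretend embedding of "check login credentials"
print(nearest(query, index))
```

The query shares no keyword with validate_token, yet that entity ranks first because its vector points in nearly the same direction.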

Query Processing

When you run chizu plan "how does error handling work", here is what happens:

  1. Entity Retrieval: Fetch candidate entities from the graph
  2. Keyword Matching: Score entities whose names contain query terms
  3. Vector Search (if enabled): Find semantically similar entities
  4. Reranking: Combine scores using weighted factors for task routing, keyword relevance, name match quality, vector similarity, entity type preference, export status, and path relevance
  5. LLM Synthesis: Present top entities to an LLM with context, get structured answer
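The reranking step can be pictured as a weighted sum. The sketch below uses the weight names and values from the sample .chizu.toml in this post; treating the combination as a plain linear sum of signals in the range 0 to 1 is an assumption, and the candidate signal values are invented.

```python
# Weights copied from the sample .chizu.toml in this post.
WEIGHTS = {
    "task_route": 0.30, "keyword": 0.20, "name_match": 0.15, "vector": 0.20,
    "kind_preference": 0.05, "exported": 0.05, "path_match": 0.05,
}

def combined_score(signals):
    """Weighted sum of per-signal scores in [0, 1]; missing signals count as 0."""
    return sum(weight * signals.get(name, 0.0) for name, weight in WEIGHTS.items())

def rerank(candidates, limit=15):
    """Sort candidate entities by combined score, highest first."""
    ranked = sorted(candidates.items(), key=lambda item: combined_score(item[1]), reverse=True)
    return [(entity_id, round(combined_score(signals), 3)) for entity_id, signals in ranked[:limit]]

# Signal values are invented for illustration.
candidates = {
    "symbol::src/error.rs::ApiError": {"keyword": 1.0, "name_match": 0.8, "vector": 0.7, "exported": 1.0},
    "test::src/error.rs::test_error_mapping": {"keyword": 0.5, "vector": 0.6},
    "doc::docs/errors.md": {"task_route": 1.0, "keyword": 0.5, "vector": 0.9},
}
for entity_id, score in rerank(candidates):
    print(f"{score:.3f}  {entity_id}")
```

With these weights a strong task-routing signal can outrank an exact name match, which is why the documentation entity comes out on top here. Tuning the weights shifts that balance.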

Configuration

Create .chizu.toml in your repository root:

[index]
exclude_patterns = ["**/target/**", "**/node_modules/**"]
parallel_workers = 4
[query]
default_limit = 15
[query.rerank_weights]
task_route = 0.30
keyword = 0.20
name_match = 0.15
vector = 0.20
kind_preference = 0.05
exported = 0.05
path_match = 0.05
[llm]
default_model = "gpt-4o-mini"
timeout_secs = 60
[embedding]
enabled = true
provider = "ollama"
base_url = "http://localhost:11434/v1"
model = "nomic-embed-text-v2-moe:latest"
dimensions = 768

Use Cases

Onboarding to a New Codebase

chizu plan "explain the architecture of the payment system"

Get a high-level overview without reading hundreds of files.

Finding Relevant Tests

chizu plan "what tests cover the checkout flow"

Skip the grep-and-hope approach.

Understanding Dependencies

sqlite3 .chizu/graph.db "SELECT dst_id FROM edges
WHERE src_id = 'symbol::src/order.rs::process_order'
AND rel = 'uses';"

See exactly what a function depends on.
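Direct dependencies are one hop. For transitive dependencies, SQLite's recursive common table expressions do the walking. The sketch below runs against an in-memory table that copies the src_id/dst_id/rel column names from the query above; the rows, including the deliberate cycle, are invented, and transitive_uses is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed minimal edges table; column names follow the queries in this post.
conn.execute("CREATE TABLE edges (src_id TEXT, dst_id TEXT, rel TEXT)")
conn.executemany("INSERT INTO edges VALUES (?, ?, 'uses')", [
    # Illustrative rows only.
    ("symbol::src/order.rs::process_order", "symbol::src/pay.rs::validate_payment"),
    ("symbol::src/pay.rs::validate_payment", "symbol::src/pay.rs::check_card"),
    ("symbol::src/pay.rs::check_card", "symbol::src/pay.rs::validate_payment"),  # cycle
])

TRANSITIVE_USES = """
WITH RECURSIVE deps(id, depth) AS (
    SELECT dst_id, 1 FROM edges WHERE src_id = :root AND rel = 'uses'
    UNION
    SELECT e.dst_id, d.depth + 1
    FROM edges e JOIN deps d ON e.src_id = d.id
    WHERE e.rel = 'uses' AND d.depth < :max_depth
)
SELECT id, MIN(depth) FROM deps GROUP BY id ORDER BY MIN(depth), id
"""

def transitive_uses(root, max_depth=5):
    """Everything reachable from `root` over 'uses' edges, with the shortest hop count."""
    return conn.execute(TRANSITIVE_USES, {"root": root, "max_depth": max_depth}).fetchall()

for dep_id, depth in transitive_uses("symbol::src/order.rs::process_order"):
    print(depth, dep_id)
```

The depth limit is what keeps the recursion finite when functions call each other in a cycle; grouping by id then reports each dependency once at its shortest distance.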

Documentation Gap Analysis

sqlite3 .chizu/graph.db "SELECT s.name FROM entities s
LEFT JOIN edges e ON s.id = e.dst_id AND e.rel = 'mentions'
WHERE s.kind = 'symbol' AND e.dst_id IS NULL;"

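Find symbols that no documentation mentions. The query as written covers every symbol; restricting it to exported ones needs an additional filter.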

Comparison with Existing Tools

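Tool           | Approach        | Local   | Graph   | Natural Language
---------------+-----------------+---------+---------+-----------------
grep           | Text search     | Yes     | No      | No
ctags          | Symbol index    | Yes     | No      | No
Sourcegraph    | Code search     | No      | Partial | Yes
GitHub Copilot | AI completion   | Partial | No      | Limited
Chizu          | Knowledge graph | Yes     | Yes     | Yes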

Chizu occupies a unique space: it provides AI-powered code understanding that runs entirely locally, using a structured graph representation rather than just text search.

Current Limitations

Chizu is early software. Current limitations:

  • Language coverage: Only Rust, TypeScript, Astro, Terraform, and Markdown. Python, Go, Java, and others need parsers.
  • Cross-file analysis: Import resolution and cross-file type inference are limited.
  • Git integration: No blame information or commit history in the graph yet.

Future directions include language server protocol integration, a web UI for graph visualization, code complexity metrics, and automated documentation generation.

Why Rust

Chizu is written in Rust because parsing millions of lines of code needs to be fast. Tree-sitter parsing, content hashing, and database operations all benefit from Rust's zero-cost abstractions and memory safety. Because incremental indexing re-parses only the files whose hashes changed, the aim is for re-indexing a large repository to take seconds, not minutes.

Getting Started

# Clone and build
git clone https://github.com/l1x/chizu
cd chizu
cargo build --release
# Index your project
./target/release/chizu index /path/to/your/project
# Start exploring
./target/release/chizu plan "how does this codebase work"

Conclusion

I built Chizu because I was tired of grepping. Tired of the thirty-minute spirals where you chase imports through five files only to realize you were looking at the wrong abstraction the whole time. Tired of knowing the information was in there somewhere, but having no map to find it.

Codebases are graphs. It is time our tools treated them that way.

Chizu is an experiment in bringing knowledge graph technology to local code exploration. It will not replace your IDE or your ability to read code, but it might just give you that map you have been missing - the one that shows you how everything connects.

If you are working with large codebases and frustrated with code discovery, give it a try. The project is open source and contributions - especially new language parsers - are welcome.


Chizu (地図) means "map" in Japanese. Because every codebase needs a map.