The Codebase Map I Wish I Had: A Local Knowledge Graph for Code Discovery
Context
Codebases are graphs of interconnected functions, types, and documentation, yet our primary discovery tool is still grep. Chizu is a local knowledge graph that indexes your codebase into a queryable database, letting you ask "what tests cover the checkout flow" instead of hunting through thousands of text matches.
Key Takeaways
- Chizu transforms your codebase into a queryable knowledge graph using tree-sitter parsing, SQLite storage, and optional vector embeddings - everything runs locally with no code ever leaving your machine
- Natural language queries like "how does error handling work in the API layer" return ranked entities with explanations, replacing the grep-and-hope approach to code discovery
- The graph structure captures relationships (defines, uses, tested_by, mentions) that text search cannot express, enabling questions about code architecture rather than just string locations
The Problem with Code Discovery
Modern software development involves navigating enormous graphs of interconnected concepts. Yet our primary discovery tool is still grep - a line-oriented text search that treats code as a flat sequence of characters. This works fine for finding where a specific string appears, but falls apart when you need to understand relationships.
Consider the kinds of relationships a typical codebase contains:
| Relationship | Example |
|---|---|
| Functions call other functions | process_order calls validate_payment |
| Types reference other types | User struct contains Address |
| Tests validate implementations | test_checkout_flow tests checkout.rs |
| Documentation mentions APIs | README references Config struct |
| Infrastructure deploys services | Terraform deploys Docker containers |
| Configuration wires everything together | app.conf sets database URLs |
Each of these is a graph edge. But grep sees only lines.
A seemingly simple question illustrates the gap: "What tests cover the user authentication flow?" With traditional tools, you might find files that mention "auth," identify which functions handle authentication, search for test files that import those functions, then manually verify which tests actually test the flow versus just mention it.
What Chizu Does
Chizu treats your codebase as a graph. It parses source files, extracts meaningful entities, and creates edges between them based on their relationships.
Entity Types
| Type | Description | Example ID |
|---|---|---|
| symbol | Functions, structs, traits, types | symbol::src/auth.rs::validate_token |
| test | Test functions | test::src/auth.rs::test_token_expired |
| source_unit | Source files | source_unit::src/auth.rs |
| doc | Markdown documentation | doc::docs/auth.md |
| infra_root | Terraform directories | infra_root::infra/base |
| containerized | Dockerfiles | containerized::Dockerfile |
Edge Types
| Edge | Meaning | Example |
|---|---|---|
| defines | File contains symbol | auth.rs --defines--> validate_token |
| uses | Symbol references symbol | handle_request --uses--> validate_token |
| tested_by | File has associated tests | router.rs --tested_by--> test_routing |
| mentions | Doc references symbol | README.md --mentions--> Config |
| deploys | Infra deploys container | base-infra --deploys--> Dockerfile |
This structure enables queries that understand context. Instead of grepping for strings, you traverse relationships.
Architecture
The pipeline moves from source code to queryable graph through four stages:
Input (Rust, TypeScript, Astro, Terraform, Markdown)
|
v
+-----------------------+
| Indexing Pipeline |
| - File discovery |
| - Tree-sitter parsing |
| - Entity extraction |
| - Edge creation |
| - Embedding generation|
+-----------------------+
|
v
+-----------------------+
| Storage Layer |
| - SQLite (entities) |
| - usearch (vectors) |
| - Blake3 (hashes) |
+-----------------------+
|
v
+-----------------------+
| Query Interface |
| - Natural language |
| - Entity inspection |
| - Graph traversal |
| - Vector search |
+-----------------------+
Design Principles
Local-first: Everything runs on your machine. No code, no embeddings, no metadata ever leaves your system.
Incremental: Chizu only re-indexes files that have changed, using content hashing to detect modifications quickly.
Language-agnostic: The parser architecture supports any language with a tree-sitter grammar. It currently supports Rust, TypeScript, Astro, Terraform, and Markdown.
Graph-native: Relationships are first-class citizens, not afterthoughts.
Query-flexible: Access your data via CLI, direct SQL, or natural language.
Using Chizu
Indexing
# Index a repository
chizu index /path/to/repo
# With embeddings (requires Ollama)
chizu index --embed /path/to/repo
The index is stored in .chizu/ at the repository root:
- graph.db - SQLite database with entities and edges
- vectors.usearch - Vector index for semantic search
- content_hashes.json - Content addressing for incremental updates
Natural Language Queries
chizu plan "how does routing work"
This uses an LLM to interpret your question, query the graph, and return relevant entities with explanations. The reranking system considers keyword matches, semantic similarity, entity type relevance, graph connectivity, and path matching.
Direct Queries
# List all symbols
chizu query entities --kind symbol
# Find specific entity
chizu inspect "symbol::src/main.rs::main"
SQL Access
Since the underlying storage is SQLite, you can query directly:
cd /path/to/repo/.chizu
# Count entities by type
sqlite3 graph.db "SELECT kind, COUNT(*) FROM entities GROUP BY kind;"
# Find all tests for a module
sqlite3 graph.db "SELECT e.name FROM entities e
JOIN edges ed ON e.id = ed.dst_id
WHERE ed.src_id LIKE '%router%' AND ed.rel = 'tested_by' AND e.kind = 'test';"
How Indexing Works
File Discovery
Chizu walks the directory tree, respecting .gitignore and configurable exclude patterns. It computes a Blake3 hash of each file's content to detect changes.
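Hash-based change detection can be sketched in a few lines. This is an illustration, not Chizu's implementation: BLAKE2b from Python's standard library stands in for Blake3, the function names and suffix filter are hypothetical, and .gitignore handling is left out.

```python
import hashlib
import json
from pathlib import Path

def file_digest(path: Path) -> str:
    # BLAKE2b stands in for Blake3, which is not in the standard library.
    return hashlib.blake2b(path.read_bytes(), digest_size=32).hexdigest()

def changed_files(root: Path, manifest_path: Path, suffixes=(".rs", ".ts", ".md")):
    """Return files whose content hash differs from the stored manifest,
    then write the refreshed manifest back to disk."""
    old = json.loads(manifest_path.read_text()) if manifest_path.exists() else {}
    new = {}
    changed = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        rel = path.relative_to(root).as_posix()
        new[rel] = file_digest(path)
        if old.get(rel) != new[rel]:
            changed.append(rel)  # new or modified since the last run
    manifest_path.parent.mkdir(parents=True, exist_ok=True)
    manifest_path.write_text(json.dumps(new, indent=2))
    return changed
```

Calling `changed_files(Path("."), Path(".chizu/content_hashes.json"))` twice in a row returns an empty list the second time, which is what lets an indexer skip unchanged files.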
Parsing
Files are parsed using tree-sitter, a parser generator that produces concrete syntax trees. For each file:
- Rust: Extracts functions, structs, enums, traits, impl blocks, tests
- TypeScript: Extracts functions, classes, interfaces, types
- Astro: Extracts components, frontmatter
- Terraform: Extracts resources, modules, variables
- Markdown: Extracts headers, code blocks, symbol mentions
Entity Extraction
The parsed syntax trees are traversed to extract entities. Each entity gets a unique ID:
symbol::src/auth.rs::validate_token
test::src/auth.rs::test_validate_token_expired
source_unit::src/auth.rs
doc::docs/auth.md
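The IDs above follow a `kind::path` or `kind::path::name` shape. A small sketch of parsing and formatting them, assuming that shape holds for every entity; the `EntityId` type and function names are hypothetical and not part of Chizu's API.

```python
from typing import NamedTuple, Optional

class EntityId(NamedTuple):
    kind: str
    path: str
    name: Optional[str]  # None for file-level entities such as source_unit or doc

def parse_entity_id(raw: str) -> EntityId:
    """Split 'kind::path' or 'kind::path::name' into its parts."""
    # maxsplit=2 keeps any further '::' inside the name component.
    parts = raw.split("::", 2)
    if len(parts) < 2 or not all(parts):
        raise ValueError(f"not an entity ID: {raw!r}")
    name = parts[2] if len(parts) == 3 else None
    return EntityId(parts[0], parts[1], name)

def format_entity_id(entity: EntityId) -> str:
    fields = [entity.kind, entity.path] + ([entity.name] if entity.name else [])
    return "::".join(fields)
```

Because the kind and path are embedded in the ID, a query such as `src_id LIKE '%router%'` can filter by file without joining back to the entities table.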
Edge Creation
As entities are extracted, relationships are recorded:
- A file "defines" all symbols it contains
- A symbol "uses" symbols it references
- A source file is "tested_by" the tests in the test file named after it
- Documentation "mentions" symbols referenced in backticks
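The last rule, documentation mentioning symbols in backticks, is simple enough to sketch. This is a naive illustration rather than Chizu's extractor: the regular expression ignores fenced code blocks, and `symbols_by_name` is a hypothetical lookup from a bare name to the symbol IDs that carry it.

```python
import re

# Inline code spans: one backtick, some text on a single line, one backtick.
BACKTICK_SPAN = re.compile(r"`([^`\n]+)`")

def mention_edges(doc_id: str, markdown: str, symbols_by_name: dict[str, list[str]]):
    """Return (doc_id, 'mentions', symbol_id) for each backticked name that
    matches a known symbol. Each edge is emitted once per document."""
    seen = set()
    edges = []
    for match in BACKTICK_SPAN.finditer(markdown):
        name = match.group(1).strip()
        for symbol_id in symbols_by_name.get(name, []):
            if symbol_id not in seen:
                seen.add(symbol_id)
                edges.append((doc_id, "mentions", symbol_id))
    return edges
```

Matching on bare names means two symbols with the same name in different files both receive an edge. A real indexer would need some disambiguation, which ties into the cross-file analysis limits noted later.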
Embedding Generation
If embeddings are enabled, Chizu sends entity text to a local Ollama instance and stores the resulting vectors in usearch. This enables semantic search - finding entities related by meaning, not just keyword.
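What "related by meaning" comes down to is vector similarity. The sketch below ranks entities by cosine similarity with a plain linear scan. It uses toy three-dimensional vectors and does not call Ollama or usearch; a vector index replaces the scan with a much faster approximate search. The function names are hypothetical.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0  # treat a zero vector as unrelated

def nearest_entities(query_vec, vectors, k=3):
    """Return (entity_id, score) for the k vectors most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), entity_id)
              for entity_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(entity_id, round(score, 3)) for score, entity_id in scored[:k]]

# Toy vectors for illustration; real embeddings have hundreds of dimensions.
vectors = {
    "symbol::src/auth.rs::validate_token": [0.9, 0.1, 0.0],
    "symbol::src/order.rs::process_order": [0.0, 0.2, 0.9],
    "doc::docs/auth.md": [0.8, 0.3, 0.1],
}
print(nearest_entities([1.0, 0.0, 0.0], vectors, k=2))
```

With these vectors, the two auth-related entities rank above the order-processing symbol even though no keyword is involved.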
Query Processing
When you run chizu plan "how does error handling work", here is what happens:
- Entity Retrieval: Fetch candidate entities from the graph
- Keyword Matching: Score entities whose names contain query terms
- Vector Search (if enabled): Find semantically similar entities
- Reranking: Combine scores using weighted factors for task routing, keyword relevance, name match quality, vector similarity, entity type preference, export status, and path relevance
- LLM Synthesis: Present top entities to an LLM with context, get structured answer
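The reranking step can be pictured as a weighted sum. The sketch below assumes each signal is already normalized to the range 0 to 1 and uses the `[query.rerank_weights]` values from the Configuration section. How Chizu actually computes each signal is not shown here, and `keyword_score` is one plausible, hypothetical definition.

```python
# Weights copied from the example .chizu.toml in the Configuration section.
WEIGHTS = {
    "task_route": 0.30,
    "keyword": 0.20,
    "name_match": 0.15,
    "vector": 0.20,
    "kind_preference": 0.05,
    "exported": 0.05,
    "path_match": 0.05,
}

def keyword_score(query: str, entity_name: str) -> float:
    """Fraction of query terms that appear in the entity name (case-insensitive)."""
    terms = query.lower().split()
    if not terms:
        return 0.0
    name = entity_name.lower()
    return sum(term in name for term in terms) / len(terms)

def combined_score(signals: dict[str, float], weights: dict[str, float] = WEIGHTS) -> float:
    """Weighted sum of per-signal scores; missing signals count as 0."""
    return sum(weight * signals.get(name, 0.0) for name, weight in weights.items())

def rerank(candidates: dict[str, dict[str, float]], limit: int = 15) -> list[str]:
    """Order candidate entity IDs by combined score, best first."""
    ranked = sorted(candidates, key=lambda eid: combined_score(candidates[eid]), reverse=True)
    return ranked[:limit]
```

Because the weights sum to 1.0, the combined score also stays between 0 and 1, which makes scores comparable across queries.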
Configuration
Create .chizu.toml in your repository root:
[index]
exclude_patterns = ["**/target/**", "**/node_modules/**"]
parallel_workers = 4
[query]
default_limit = 15
[query.rerank_weights]
task_route = 0.30
keyword = 0.20
name_match = 0.15
vector = 0.20
kind_preference = 0.05
exported = 0.05
path_match = 0.05
[llm]
default_model = "gpt-4o-mini"
timeout_secs = 60
[embedding]
enabled = true
provider = "ollama"
base_url = "http://localhost:11434/v1"
model = "nomic-embed-text-v2-moe:latest"
dimensions = 768
Use Cases
Onboarding to a New Codebase
chizu plan "explain the architecture of the payment system"
Get a high-level overview without reading hundreds of files.
Finding Relevant Tests
chizu plan "what tests cover the checkout flow"
Skip the grep-and-hope approach.
Understanding Dependencies
sqlite3 .chizu/graph.db "SELECT dst_id FROM edges
WHERE src_id = 'symbol::src/order.rs::process_order'
AND rel = 'uses';"
See exactly what a function depends on.
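The query above returns direct dependencies only. A recursive common table expression extends it to transitive ones. This is a sketch against an assumed `edges(src_id, dst_id, rel)` table; the `src/payment.rs` symbols are invented sample data, and the depth cap is there so that cycles terminate.

```python
import sqlite3

TRANSITIVE_USES = """
WITH RECURSIVE deps(id, depth) AS (
    SELECT dst_id, 1 FROM edges WHERE src_id = :start AND rel = 'uses'
    UNION
    SELECT e.dst_id, d.depth + 1
    FROM edges e JOIN deps d ON e.src_id = d.id
    WHERE e.rel = 'uses' AND d.depth < :max_depth
)
SELECT id, MIN(depth) AS min_depth FROM deps GROUP BY id ORDER BY min_depth, id
"""

def transitive_dependencies(conn, start_id, max_depth=5):
    """Return (entity_id, shortest_distance) for everything start_id uses,
    directly or indirectly, up to max_depth hops."""
    params = {"start": start_id, "max_depth": max_depth}
    return conn.execute(TRANSITIVE_USES, params).fetchall()

# Hypothetical sample data, including a cycle between two payment functions.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE edges (src_id TEXT, dst_id TEXT, rel TEXT);
    INSERT INTO edges VALUES
      ('symbol::src/order.rs::process_order', 'symbol::src/payment.rs::validate_payment', 'uses'),
      ('symbol::src/payment.rs::validate_payment', 'symbol::src/payment.rs::charge_card', 'uses'),
      ('symbol::src/payment.rs::charge_card', 'symbol::src/payment.rs::validate_payment', 'uses'),
      ('source_unit::src/order.rs', 'symbol::src/order.rs::process_order', 'defines');
    """
)
for entity_id, depth in transitive_dependencies(conn, "symbol::src/order.rs::process_order"):
    print(depth, entity_id)
```

Grouping by ID and taking the minimum depth collapses the repeated visits a cycle produces into one row per dependency.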
Documentation Gap Analysis
sqlite3 .chizu/graph.db "SELECT s.name FROM entities s
LEFT JOIN edges e ON s.id = e.dst_id AND e.rel = 'mentions'
WHERE s.kind = 'symbol' AND e.dst_id IS NULL;"
Find symbols that no documentation mentions. As written, the query covers every symbol, not only exported ones.
Comparison with Existing Tools
| Tool | Approach | Local | Graph | Natural Language |
|---|---|---|---|---|
| grep | Text search | Yes | No | No |
| ctags | Symbol index | Yes | No | No |
| Sourcegraph | Code search | No | Partial | Yes |
| GitHub Copilot | AI completion | Partial | No | Limited |
| Chizu | Knowledge graph | Yes | Yes | Yes |
Chizu occupies a unique space: it provides AI-powered code understanding that runs entirely locally, using a structured graph representation rather than just text search.
Current Limitations
Chizu is early software. Current limitations:
- Language coverage: Only Rust, TypeScript, Astro, Terraform, and Markdown. Python, Go, Java, and others need parsers.
- Cross-file analysis: Import resolution and cross-file type inference are limited.
- Git integration: No blame information or commit history in the graph yet.
Future directions include language server protocol integration, a web UI for graph visualization, code complexity metrics, and automated documentation generation.
Why Rust
Chizu is written in Rust because parsing millions of lines of code needs to be fast. Tree-sitter parsing, content hashing, and database operations all benefit from Rust's zero-cost abstractions and memory safety. Because incremental indexing re-parses only files whose content hash has changed, re-indexing a large repository can take seconds rather than minutes.
Getting Started
# Clone and build
git clone https://github.com/l1x/chizu
cd chizu
cargo build --release
# Index your project
./target/release/chizu index /path/to/your/project
# Start exploring
./target/release/chizu plan "how does this codebase work"
Conclusion
I built Chizu because I was tired of grepping. Tired of the thirty-minute spirals where you chase imports through five files only to realize you were looking at the wrong abstraction the whole time. Tired of knowing the information was in there somewhere, but having no map to find it.
Codebases are graphs. It is time our tools treated them that way.
Chizu is an experiment in bringing knowledge graph technology to local code exploration. It will not replace your IDE or your ability to read code, but it might just give you that map you have been missing - the one that shows you how everything connects.
If you are working with large codebases and frustrated with code discovery, give it a try. The project is open source and contributions - especially new language parsers - are welcome.
Chizu (地図) means "map" in Japanese. Because every codebase needs a map.