The Codebase Map I Wish I Had: A Local Knowledge Graph for Code Discovery

Istvan · Saturday, March 28 2026 · 8 min

rust developer tools code intelligence knowledge graphs semantic search

Context

Codebases are graphs of interconnected functions, types, and documentation, yet our primary discovery tool is still grep. Chizu is a local knowledge graph that indexes your codebase into a queryable database, letting you ask "what tests cover the checkout flow" instead of hunting through thousands of text matches.

Key Takeaways

Chizu transforms your codebase into a queryable knowledge graph using tree-sitter parsing, SQLite storage, and optional vector embeddings - everything runs locally with no code ever leaving your machine
Natural language queries like "how does error handling work in the API layer" return ranked entities with explanations, replacing the grep-and-hope approach to code discovery
The graph structure captures relationships (defines, uses, tested_by, mentions) that text search cannot express, enabling questions about code architecture rather than just string locations

The Problem with Code Discovery

Modern software development involves navigating enormous graphs of interconnected concepts. Yet our primary discovery tool is still grep - a line-oriented text search that treats code as a flat sequence of characters. This works fine for finding where a specific string appears, but falls apart when you need to understand relationships.

Consider what modern software development actually involves:

Relationship	Example
Functions call other functions	`process_order` calls `validate_payment`
Types reference other types	`User` struct contains `Address`
Tests validate implementations	`test_checkout_flow` tests `checkout.rs`
Documentation mentions APIs	README references `Config` struct
Infrastructure deploys services	Terraform deploys Docker containers
Configuration wires everything together	`app.conf` sets database URLs

Each of these is a graph edge. But grep sees only lines.

A seemingly simple question illustrates the gap: "What tests cover the user authentication flow?" With traditional tools, you might find files that mention "auth," identify which functions handle authentication, search for test files that import those functions, then manually verify which tests actually test the flow versus just mention it.

What Chizu Does

Chizu treats your codebase as a graph. It parses source files, extracts meaningful entities, and creates edges between them based on their relationships.

Entity Types

Type	Description	Example ID
`symbol`	Functions, structs, traits, types	`symbol::src/auth.rs::validate_token`
`test`	Test functions	`test::src/auth.rs::test_token_expired`
`source_unit`	Source files	`source_unit::src/auth.rs`
`doc`	Markdown documentation	`doc::docs/auth.md`
`infra_root`	Terraform directories	`infra_root::infra/base`
`containerized`	Dockerfiles	`containerized::Dockerfile`

Edge Types

Edge	Meaning	Example
`defines`	File contains symbol	`auth.rs --defines--> validate_token`
`uses`	Symbol references symbol	`handle_request --uses--> validate_token`
`tested_by`	File has associated tests	`router.rs --tested_by--> test_routing`
`mentions`	Doc references symbol	`README.md --mentions--> Config`
`deploys`	Infra deploys container	`base-infra --deploys--> Dockerfile`

This structure enables queries that understand context. Instead of grepping for strings, you traverse relationships.
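To make "traverse relationships" concrete, here is a minimal in-memory sketch in Python. It is not how Chizu stores its graph (that lives in SQLite); the `Edge` and `Graph` classes and the `src/api.rs` path are made up for illustration, while the entity IDs and edge names follow the tables above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    src: str  # entity ID the edge starts from
    rel: str  # relationship name, e.g. "defines" or "uses"
    dst: str  # entity ID the edge points to


class Graph:
    """Tiny adjacency index keyed by (source ID, relation)."""

    def __init__(self, edges):
        self._out = defaultdict(list)
        for e in edges:
            self._out[(e.src, e.rel)].append(e.dst)

    def neighbors(self, src, rel):
        """Return the IDs reachable from `src` over one `rel` edge."""
        return list(self._out.get((src, rel), []))


# Hypothetical sample data; src/api.rs is an invented path.
edges = [
    Edge("source_unit::src/auth.rs", "defines", "symbol::src/auth.rs::validate_token"),
    Edge("symbol::src/api.rs::handle_request", "uses", "symbol::src/auth.rs::validate_token"),
    Edge("source_unit::src/auth.rs", "tested_by", "test::src/auth.rs::test_token_expired"),
    Edge("doc::docs/auth.md", "mentions", "symbol::src/auth.rs::validate_token"),
]
graph = Graph(edges)

# "Which tests are attached to auth.rs?" becomes a one-hop lookup.
auth_tests = graph.neighbors("source_unit::src/auth.rs", "tested_by")
print(auth_tests)
```

The point of the sketch is the shape of the question: the answer comes from following a typed edge, not from matching the string "auth" in test files.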

Architecture

The pipeline moves from source code to a queryable graph in three stages: indexing, storage, and query.

Input (Rust, TypeScript, Astro, Terraform, Markdown)
                    |
                    v
        +-----------------------+
        |   Indexing Pipeline   |
        | - File discovery      |
        | - Tree-sitter parsing |
        | - Entity extraction   |
        | - Edge creation       |
        | - Embedding generation|
        +-----------------------+
                    |
                    v
        +-----------------------+
        |    Storage Layer      |
        | - SQLite (entities)   |
        | - usearch (vectors)   |
        | - Blake3 (hashes)     |
        +-----------------------+
                    |
                    v
        +-----------------------+
        |   Query Interface     |
        | - Natural language    |
        | - Entity inspection   |
        | - Graph traversal     |
        | - Vector search       |
        +-----------------------+

Design Principles

Local-first: Everything runs on your machine. No code, no embeddings, no metadata ever leaves your system.

Incremental: Chizu only re-indexes files that have changed, using content hashing to detect modifications quickly.

Language-agnostic: The parser architecture supports any language with a tree-sitter grammar. It currently supports Rust, TypeScript, Astro, Terraform, and Markdown.

Graph-native: Relationships are first-class citizens, not afterthoughts.

Query-flexible: Access your data via CLI, direct SQL, or natural language.

Using Chizu

Indexing

# Index a repository
chizu index /path/to/repo

# With embeddings (requires Ollama)
chizu index --embed /path/to/repo

The index is stored in .chizu/ at the repository root:

graph.db - SQLite database with entities and edges
vectors.usearch - Vector index for semantic search
content_hashes.json - Content addressing for incremental updates

Natural Language Queries

chizu plan "how does routing work"

This uses an LLM to interpret your question, query the graph, and return relevant entities with explanations. The reranking system considers keyword matches, semantic similarity, entity type relevance, graph connectivity, and path matching.

Direct Queries

# List all symbols
chizu query entities --kind symbol

# Find specific entity
chizu inspect "symbol::src/main.rs::main"

SQL Access

Since the underlying storage is SQLite, you can query directly:

cd /path/to/repo/.chizu

# Count entities by type
sqlite3 graph.db "SELECT kind, COUNT(*) FROM entities GROUP BY kind;"

# Find all tests for a module (follow tested_by edges only)
sqlite3 graph.db "SELECT e.name FROM entities e
    JOIN edges ed ON e.id = ed.dst_id
    WHERE ed.src_id LIKE '%router%'
      AND ed.rel = 'tested_by'
      AND e.kind = 'test';"

How Indexing Works

File Discovery

Chizu walks the directory tree, respecting .gitignore and configurable exclude patterns. It computes a Blake3 hash of each file's content to detect changes.
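The change-detection idea can be sketched in a few lines of Python. This is an illustration of the technique, not Chizu's code: it uses BLAKE2b from the standard library as a stand-in for Blake3, ignores .gitignore and exclude patterns, does not report deleted files, and the manifest name `hashes.json` is invented.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def file_digest(path):
    """Hex digest of a file's bytes (BLAKE2b here, standing in for Blake3)."""
    return hashlib.blake2b(path.read_bytes(), digest_size=32).hexdigest()


def changed_files(root, manifest_path):
    """Return relative paths whose digest differs from the stored manifest,
    then rewrite the manifest with the current digests."""
    old = json.loads(manifest_path.read_text()) if manifest_path.exists() else {}
    new = {
        str(p.relative_to(root)): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file() and p != manifest_path  # never hash the manifest itself
    }
    manifest_path.write_text(json.dumps(new, indent=2))
    return sorted(path for path, digest in new.items() if old.get(path) != digest)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    manifest = root / "hashes.json"  # hypothetical manifest name
    (root / "a.rs").write_text("fn a() {}")
    (root / "b.rs").write_text("fn b() {}")

    first = changed_files(root, manifest)   # no manifest yet: every file is new
    second = changed_files(root, manifest)  # nothing edited since the last run
    (root / "b.rs").write_text("fn b() { todo!() }")
    third = changed_files(root, manifest)   # only b.rs has a new digest

print(first, second, third)
```

Hashing content rather than comparing modification times means a `git checkout` that touches timestamps without changing bytes does not trigger re-indexing.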

Parsing

Files are parsed using tree-sitter, a parser generator that produces concrete syntax trees. For each file:

Rust: Extracts functions, structs, enums, traits, impl blocks, tests
TypeScript: Extracts functions, classes, interfaces, types
Astro: Extracts components, frontmatter
Terraform: Extracts resources, modules, variables
Markdown: Extracts headers, code blocks, symbol mentions

Entity Extraction

The parsed syntax trees are traversed to extract entities. Each entity gets a unique ID:

symbol::src/auth.rs::validate_token
test::src/auth.rs::test_validate_token_expired
source_unit::src/auth.rs
doc::docs/auth.md
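These IDs are easy to take apart in scripts. The helper below is a sketch inferred from the examples above, assuming the general shape is `kind::path` with an optional `::name` suffix; `EntityId` and `parse_entity_id` are hypothetical names, not part of Chizu.

```python
from typing import NamedTuple, Optional


class EntityId(NamedTuple):
    kind: str            # e.g. "symbol", "test", "source_unit", "doc"
    path: str            # repository-relative path
    name: Optional[str]  # symbol or test name; None for file-level entities

    def __str__(self):
        parts = [self.kind, self.path] + ([self.name] if self.name else [])
        return "::".join(parts)


def parse_entity_id(raw):
    """Split 'kind::path[::name]' into its components.

    Splitting at most twice keeps any further '::' inside the name.
    """
    parts = raw.split("::", 2)
    if len(parts) < 2 or not all(parts):
        raise ValueError(f"not an entity ID: {raw!r}")
    name = parts[2] if len(parts) == 3 else None
    return EntityId(parts[0], parts[1], name)


sym = parse_entity_id("symbol::src/auth.rs::validate_token")
doc = parse_entity_id("doc::docs/auth.md")
print(sym, doc)
```

Because the path is embedded in the ID, a prefix match such as `symbol::src/auth.rs::` is enough to list everything defined in one file.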

Edge Creation

As entities are extracted, relationships are recorded:

A file "defines" all symbols it contains
A symbol "uses" symbols it references
A source file is "tested_by" the tests in the test file named after it
Documentation "mentions" symbols referenced in backticks
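The "mentions" rule is the simplest of these to illustrate. The sketch below scans Markdown for single-backtick spans and emits an edge when the span matches a known symbol name. It is an assumption about how such a rule could work, not Chizu's extractor: the regex ignores fenced code blocks, matching is by bare name only, and `src/config.rs` is an invented path.

```python
import re

# Inline code spans delimited by single backticks on one line.
BACKTICK_SPAN = re.compile(r"`([^`\n]+)`")


def mention_edges(doc_id, markdown, symbols_by_name):
    """Return (doc_id, 'mentions', symbol_id) triples for each backticked
    name in `markdown` that matches a known symbol, without duplicates."""
    edges = []
    seen = set()
    for span in BACKTICK_SPAN.findall(markdown):
        symbol_id = symbols_by_name.get(span.strip())
        if symbol_id and symbol_id not in seen:
            seen.add(symbol_id)
            edges.append((doc_id, "mentions", symbol_id))
    return edges


# Hypothetical symbol table: name -> entity ID.
symbols = {
    "Config": "symbol::src/config.rs::Config",
    "validate_token": "symbol::src/auth.rs::validate_token",
}
readme = (
    "Load a `Config` first, then call `validate_token`. "
    "See the `Config` docs; `cargo build` is not a symbol."
)
edges = mention_edges("doc::README.md", readme, symbols)
print(edges)
```

Name-only matching is cheap but ambiguous when two files define the same name, which is one reason cross-file resolution is listed under the limitations.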

Embedding Generation

If embeddings are enabled, Chizu sends entity text to a local Ollama instance and stores the resulting vectors in usearch. This enables semantic search - finding entities related by meaning, not just keyword.
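What "related by meaning" boils down to is a nearest-neighbor lookup over vectors. The sketch below does that by brute force with cosine similarity in plain Python. It stands in for the usearch index (which uses an approximate index rather than a full scan), the three-dimensional vectors are made up, and `refresh_session` is an invented symbol.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest(query_vec, vectors, k=2):
    """Brute-force top-k entity IDs by cosine similarity to `query_vec`."""
    scored = sorted(
        ((cosine(query_vec, vec), entity_id) for entity_id, vec in vectors.items()),
        reverse=True,
    )
    return [entity_id for _, entity_id in scored[:k]]


# Made-up 3-dimensional vectors; real embeddings have hundreds of dimensions.
vectors = {
    "symbol::src/auth.rs::validate_token": [0.9, 0.1, 0.0],
    "symbol::src/auth.rs::refresh_session": [0.8, 0.3, 0.1],
    "symbol::src/order.rs::process_order": [0.0, 0.2, 0.9],
}
hits = nearest([1.0, 0.2, 0.0], vectors, k=2)
print(hits)
```

In this toy data the two auth symbols point in nearly the same direction as the query, so they rank ahead of `process_order` even though no keyword is involved.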

Query Processing

When you run chizu plan "how does error handling work", here is what happens:

Entity Retrieval: Fetch candidate entities from the graph
Keyword Matching: Score entities whose names contain query terms
Vector Search (if enabled): Find semantically similar entities
Reranking: Combine scores using weighted factors for task routing, keyword relevance, name match quality, vector similarity, entity type preference, exported status, and path relevance
LLM Synthesis: Present top entities to an LLM with context, get structured answer
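The reranking step can be pictured as a weighted sum. The sketch below assumes each factor produces a signal between 0 and 1 and that the final score is a plain linear combination; the weights are the defaults from the Configuration section, but the signal values, the `ApiError` symbol, and the combination formula itself are assumptions, since this post does not spell out the exact scoring.

```python
# Default weights as listed under [query.rerank_weights] in this post.
WEIGHTS = {
    "task_route": 0.30,
    "keyword": 0.20,
    "name_match": 0.15,
    "vector": 0.20,
    "kind_preference": 0.05,
    "exported": 0.05,
    "path_match": 0.05,
}


def combined_score(signals, weights=WEIGHTS):
    """Weighted sum of per-factor signals; missing factors count as 0."""
    return sum(w * signals.get(factor, 0.0) for factor, w in weights.items())


def rerank(candidates, limit=15):
    """Sort (entity_id, signals) pairs by combined score, highest first."""
    ranked = sorted(candidates, key=lambda c: combined_score(c[1]), reverse=True)
    return [entity_id for entity_id, _ in ranked[:limit]]


# Hypothetical candidates with invented signal values.
candidates = [
    ("symbol::src/order.rs::process_order", {"vector": 0.3}),
    ("doc::docs/errors.md", {"keyword": 1.0, "vector": 0.9}),
    (
        "symbol::src/error.rs::ApiError",
        {"keyword": 1.0, "name_match": 1.0, "vector": 0.8, "exported": 1.0},
    ),
]
ranking = rerank(candidates)
print(ranking)
```

With these numbers the exported symbol that matches by keyword, name, and vector wins, the doc page comes second, and the vector-only match trails. Shifting weight from `keyword` to `vector` would favor fuzzy matches over exact ones.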

Configuration

Create .chizu.toml in your repository root:

[index]
exclude_patterns = ["**/target/**", "**/node_modules/**"]
parallel_workers = 4

[query]
default_limit = 15

[query.rerank_weights]
task_route = 0.30
keyword = 0.20
name_match = 0.15
vector = 0.20
kind_preference = 0.05
exported = 0.05
path_match = 0.05

[llm]
default_model = "gpt-4o-mini"
timeout_secs = 60

[embedding]
enabled = true
provider = "ollama"
base_url = "http://localhost:11434/v1"
model = "nomic-embed-text-v2-moe:latest"
dimensions = 768

Use Cases

Onboarding to a New Codebase

chizu plan "explain the architecture of the payment system"

Get a high-level overview without reading hundreds of files.

Finding Relevant Tests

chizu plan "what tests cover the checkout flow"

Skip the grep-and-hope approach.

Understanding Dependencies

sqlite3 .chizu/graph.db "SELECT dst_id FROM edges 
    WHERE src_id = 'symbol::src/order.rs::process_order' 
    AND rel = 'uses';"

See exactly what a function depends on.
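The query above returns direct dependencies only. For the transitive set, SQLite's recursive common table expressions can walk `uses` edges until nothing new turns up. The sketch below runs against an in-memory table whose shape is assumed from the column names in this post; `src/payment.rs` and `check_card` are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed table shape; only the column names appear in this post's queries.
conn.execute("CREATE TABLE edges (src_id TEXT, rel TEXT, dst_id TEXT)")
conn.executemany(
    "INSERT INTO edges VALUES (?, 'uses', ?)",
    [
        ("symbol::src/order.rs::process_order", "symbol::src/payment.rs::validate_payment"),
        ("symbol::src/payment.rs::validate_payment", "symbol::src/payment.rs::check_card"),
        # A cycle back to the start, to show that the walk still terminates.
        ("symbol::src/payment.rs::check_card", "symbol::src/order.rs::process_order"),
    ],
)

# UNION (not UNION ALL) drops rows already seen, which stops cycles.
TRANSITIVE_USES = """
WITH RECURSIVE deps(id) AS (
    SELECT dst_id FROM edges WHERE src_id = :start AND rel = 'uses'
    UNION
    SELECT e.dst_id FROM edges e JOIN deps d ON e.src_id = d.id
    WHERE e.rel = 'uses'
)
SELECT id FROM deps WHERE id <> :start ORDER BY id
"""


def transitive_uses(conn, start):
    """All symbols reachable from `start` over one or more `uses` edges."""
    return [row[0] for row in conn.execute(TRANSITIVE_USES, {"start": start})]


deps = transitive_uses(conn, "symbol::src/order.rs::process_order")
print(deps)
```

The same SQL can be pasted into the sqlite3 shell against `.chizu/graph.db` by replacing `:start` with a quoted entity ID.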

Documentation Gap Analysis

sqlite3 .chizu/graph.db "SELECT s.name FROM entities s 
    LEFT JOIN edges e ON s.id = e.dst_id AND e.rel = 'mentions'
    WHERE s.kind = 'symbol' AND e.dst_id IS NULL;"

Find symbols that no documentation mentions. The query does not filter by visibility, so private symbols are included too.

Comparison with Existing Tools

Tool	Approach	Local	Graph	Natural Language
grep	Text search	Yes	No	No
ctags	Symbol index	Yes	No	No
Sourcegraph	Code search	No	Partial	Yes
GitHub Copilot	AI completion	Partial	No	Limited
Chizu	Knowledge graph	Yes	Yes	Yes

Chizu occupies a unique space: it provides AI-powered code understanding that runs entirely locally, using a structured graph representation rather than just text search.

Current Limitations

Chizu is early software. Current limitations:

Language coverage: Only Rust, TypeScript, Astro, Terraform, and Markdown. Python, Go, Java, and others need parsers.
Cross-file analysis: Import resolution and cross-file type inference are limited.
Git integration: No blame information or commit history in the graph yet.

Future directions include language server protocol integration, a web UI for graph visualization, code complexity metrics, and automated documentation generation.

Why Rust

Chizu is written in Rust because parsing millions of lines of code needs to be fast. Tree-sitter parsing, content hashing, and database operations all benefit from Rust's zero-cost abstractions and memory safety. The incremental indexing process can handle large repositories in seconds, not minutes.

Getting Started

# Clone and build
git clone https://github.com/l1x/chizu
cd chizu
cargo build --release

# Index your project
./target/release/chizu index /path/to/your/project

# Start exploring
./target/release/chizu plan "how does this codebase work"

Conclusion

I built Chizu because I was tired of grepping. Tired of the thirty-minute spirals where you chase imports through five files only to realize you were looking at the wrong abstraction the whole time. Tired of knowing the information was in there somewhere, but having no map to find it.

Codebases are graphs. It is time our tools treated them that way.

Chizu is an experiment in bringing knowledge graph technology to local code exploration. It will not replace your IDE or your ability to read code, but it might just give you that map you have been missing - the one that shows you how everything connects.

If you are working with large codebases and frustrated with code discovery, give it a try. The project is open source and contributions - especially new language parsers - are welcome.

Chizu (地図) means "map" in Japanese. Because every codebase needs a map.