Research
GKG
Overview
Research prototype that compresses a codebase into a multi-level directed graph (Macro, Meso, Micro, Nano) where nodes are code objects and edges their relationships, enabling LLMs to navigate and modify code with fewer tokens, higher output quality, and cheaper models.
GKG addresses a core problem in LLM-assisted development: feeding raw source files to a model is token-expensive, tends to produce tangled, inconsistent changes when generating PRs that span multiple files, and works poorly with cheaper models. The alternative, maintaining handwritten architecture docs in Markdown, often drifts from the code the moment someone pushes a commit, and LLMs may ignore those docs, hallucinate from them, or fail to update them anyway. GKG replaces both approaches with a deterministic, AST-derived graph that stays in sync with the source by construction.
The graph has four zoom levels. Macro nodes represent logical modules (rendering, networking, data, etc.), inferred from directory structure and optionally refined by an LLM clustering pass. Meso nodes represent classes and structs. Micro nodes represent individual methods and functions. Nano captures skeleton logic within function bodies. Edges encode five relationship types: OWN (parent-child hierarchy), CALLS (caller invokes callee), SEND (callee return value is consumed by caller), IMPLEMENTS (class inheritance), and DEPENDS_ON (file-level include/import dependencies).
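The four levels and five edge types described above can be pictured as a small typed graph. The following is a minimal sketch, not GKG's actual data model: the class names (Level, EdgeType, Node, Graph), the dotted node ids, and the children helper are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class Level(Enum):
    MACRO = "macro"  # logical modules
    MESO = "meso"    # classes and structs
    MICRO = "micro"  # methods and functions
    NANO = "nano"    # skeleton logic inside function bodies


class EdgeType(Enum):
    OWN = "OWN"                # parent-child hierarchy
    CALLS = "CALLS"            # caller invokes callee
    SEND = "SEND"              # callee return value is consumed by caller
    IMPLEMENTS = "IMPLEMENTS"  # class inheritance
    DEPENDS_ON = "DEPENDS_ON"  # file-level include/import dependency


@dataclass(frozen=True)
class Node:
    id: str
    level: Level


@dataclass
class Graph:
    nodes: dict = field(default_factory=dict)  # node id -> Node
    edges: set = field(default_factory=set)    # (src id, EdgeType, dst id)

    def add_node(self, node_id, level):
        self.nodes[node_id] = Node(node_id, level)

    def add_edge(self, src, kind, dst):
        # Reject edges that point at nodes the graph does not know about.
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both endpoints must exist before adding an edge")
        self.edges.add((src, kind, dst))

    def children(self, node_id):
        # "Zooming in" one level means following OWN edges from a node.
        return sorted(
            dst for src, kind, dst in self.edges
            if src == node_id and kind is EdgeType.OWN
        )


# Hypothetical example: one module, one class, two methods.
g = Graph()
g.add_node("rendering", Level.MACRO)
g.add_node("rendering.Renderer", Level.MESO)
g.add_node("rendering.Renderer.draw", Level.MICRO)
g.add_node("rendering.Renderer.clear", Level.MICRO)
g.add_edge("rendering", EdgeType.OWN, "rendering.Renderer")
g.add_edge("rendering.Renderer", EdgeType.OWN, "rendering.Renderer.draw")
g.add_edge("rendering.Renderer", EdgeType.OWN, "rendering.Renderer.clear")
g.add_edge("rendering.Renderer.draw", EdgeType.CALLS, "rendering.Renderer.clear")
```

Storing edges as typed triples keeps the hierarchy (OWN) and the behavioral relationships (CALLS, SEND) in one structure, so a consumer can zoom by filtering on edge type instead of maintaining separate trees.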
Structure extraction is fully deterministic: Python files are parsed via the ast module, C/C++/JavaScript/TypeScript/Go via tree-sitter, and all other languages via a regex fallback. The only LLM call in the mapping phase is an optional clustering and intent-labeling pass that names modules and annotates nodes with one-sentence descriptions -- the graph topology itself requires no model.
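To illustrate how deterministic extraction from Python source might look, here is a rough sketch that uses only the standard-library ast module to derive OWN and CALLS edges. The function name extract_edges and the sample source are hypothetical, and callee names are left unresolved (just the called identifier), which a real extractor would presumably refine.

```python
import ast


def extract_edges(source):
    """Return (own, calls) as sorted lists of (src, dst) name pairs."""
    tree = ast.parse(source)
    own, calls = set(), set()
    definitions = (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)

    def visit(node, owner):
        for child in ast.iter_child_nodes(node):
            if isinstance(child, definitions):
                # A nested definition is owned by the enclosing class/function.
                name = f"{owner}.{child.name}" if owner else child.name
                if owner:
                    own.add((owner, name))
                visit(child, name)
            else:
                if isinstance(child, ast.Call) and owner:
                    func = child.func
                    if isinstance(func, ast.Name):         # e.g. log(...)
                        calls.add((owner, func.id))
                    elif isinstance(func, ast.Attribute):  # e.g. self.clear()
                        calls.add((owner, func.attr))
                visit(child, owner)

    visit(tree, "")
    return sorted(own), sorted(calls)


# Hypothetical input used only to exercise the sketch.
SAMPLE = '''
class Renderer:
    def clear(self):
        pass

    def draw(self):
        self.clear()
        log("drawn")
'''

own, calls = extract_edges(SAMPLE)
```

Because the output depends only on the parsed syntax tree, running the function twice on the same source yields the same edges, which is the property the mapping phase relies on.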
The repo includes two A/B benchmark notebooks. The first compares raw single-shot LLM output against GKG-blueprint-guided generation on four Python tasks, measuring AST parse rate, symbol coverage, forbidden-pattern violations, and graph structure scores. The second runs the same comparison on a real C++ codebase with four feature-implementation tasks. A separate quest runner tests three conditions (full-codebase dump, file-tree navigation, and GKG-guided navigation) across six tasks of increasing complexity, with token-overlap verification and optional LLM judge scoring.
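Two of the metrics named above, AST parse rate and symbol coverage, could be computed roughly as follows. This is a sketch under assumed definitions, not the notebooks' actual scoring code: parse rate is taken to be the fraction of generated samples that ast.parse accepts, and symbol coverage the fraction of required class/function names that the output defines. All function names here are hypothetical.

```python
import ast


def parses(code):
    """True if the generated code is syntactically valid Python."""
    try:
        ast.parse(code)
        return True
    except SyntaxError:
        return False


def parse_rate(samples):
    """Fraction of samples that parse (assumed definition)."""
    if not samples:
        return 0.0
    return sum(parses(s) for s in samples) / len(samples)


def defined_symbols(code):
    """Names of all classes and functions defined anywhere in the code."""
    definitions = (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)
    return {n.name for n in ast.walk(ast.parse(code)) if isinstance(n, definitions)}


def symbol_coverage(code, required):
    """Fraction of required symbol names the code defines (assumed definition)."""
    required = set(required)
    if not required:
        return 1.0
    if not parses(code):
        return 0.0  # unparseable output covers nothing
    return len(defined_symbols(code) & required) / len(required)


# Hypothetical generated outputs: the second has a deliberate syntax error.
samples = [
    "def load():\n    return 1\n",
    "def save(:\n    pass\n",
]
rate = parse_rate(samples)
coverage = symbol_coverage(samples[0], {"load", "save"})
```

Both metrics are purely syntactic, so they can be evaluated without executing the generated code.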
Results on a 1.5B-parameter model show GKG-guided generation matching or exceeding raw output quality while producing more consistent results (lower variance across runs), with the graph acting as an architectural contract that keeps the model from drifting off-structure.
Similar project: https://github.com/safishamsi/graphify