Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
- .woodpecker.yaml: image paths -> library/autojanet-{agent,dispatcher}
- .woodpecker.yaml: secret names RS_HARBOR_USER / RS_HARBOR_PASS (global)
- container/Dockerfile: restore COPY skills/, skills/ populated from opencode config
- skills/: 84 opencode skills bundled into image
- k8s/manifests: update image refs to library/
132 lines
5.3 KiB
Markdown
132 lines
5.3 KiB
Markdown
---
|
|
name: understand-knowledge
|
|
description: Analyze a Karpathy-pattern LLM wiki knowledge base and generate an interactive knowledge graph with entity extraction, implicit relationships, and topic clustering.
|
|
argument-hint: [wiki-directory]
|
|
---
|
|
|
|
# /understand-knowledge
|
|
|
|
Analyzes a Karpathy-pattern LLM wiki — a three-layer knowledge base with raw sources, wiki markdown, and a schema file — and produces an interactive knowledge graph dashboard.
|
|
|
|
## What It Detects
|
|
|
|
The **Karpathy LLM wiki pattern** (see https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f):
|
|
- **Raw sources** — immutable source documents (articles, papers, data files)
|
|
- **Wiki** — LLM-generated markdown files with wikilinks (`[[target]]` syntax)
|
|
- **Schema** — CLAUDE.md, AGENTS.md, or similar configuration file
|
|
- **index.md** — content catalog organized by categories
|
|
- **log.md** — chronological operation log
|
|
|
|
Detection signals: has `index.md` + multiple `.md` files with wikilinks. May have `raw/` directory and schema file.
|
|
|
|
## Instructions
|
|
|
|
### Phase 1: DETECT
|
|
|
|
1. Determine the target directory:
|
|
- If the user provided a path argument, use that
|
|
- Otherwise, use the current working directory
|
|
|
|
2. Run the format detection script bundled with this skill:
|
|
```
|
|
python3 <SKILL_DIR>/parse-knowledge-base.py <TARGET_DIR>
|
|
```
|
|
- If the script exits with an error, tell the user this doesn't appear to be a Karpathy-pattern wiki and explain what was expected
|
|
- If successful, proceed. The script writes `scan-manifest.json` to `<TARGET_DIR>/.understand-anything/intermediate/`
|
|
|
|
3. Read the scan-manifest.json and announce the results:
|
|
- "Detected Karpathy wiki: N articles, N sources, N topics, N wikilinks (N unresolved)"
|
|
- List the categories found from index.md
|
|
|
|
### Phase 2: SCAN (already done)
|
|
|
|
The parse script in Phase 1 already performed the deterministic scan. The scan-manifest.json contains:
|
|
- Article nodes (one per wiki .md file) with extracted wikilinks, headings, frontmatter
|
|
- Source nodes (one per raw/ file)
|
|
- Topic nodes (from index.md section headings)
|
|
- `related` edges (from wikilinks)
|
|
- `categorized_under` edges (from index.md sections)
|
|
|
|
No additional scanning is needed. Proceed to Phase 3.
|
|
|
|
### Phase 3: ANALYZE
|
|
|
|
Dispatch `article-analyzer` subagents to extract implicit knowledge:
|
|
|
|
1. Read the scan-manifest.json to get the article list
|
|
|
|
2. Prepare batches of 10-15 articles each, grouped by category when possible (articles in the same category are more likely to have implicit cross-references)
|
|
|
|
3. For each batch, dispatch an `article-analyzer` subagent with:
|
|
- The batch of articles (id, name, summary, wikilinks, category, content from knowledgeMeta)
|
|
- The full list of existing node IDs (so the agent can reference them)
|
|
- The batch number for output file naming
|
|
- The intermediate directory path: `$INTERMEDIATE_DIR = <TARGET_DIR>/.understand-anything/intermediate`
|
|
|
|
The agent will write `analysis-batch-{N}.json` to the intermediate directory.
|
|
|
|
4. Run up to 3 batches concurrently. Wait for all batches to complete.
|
|
|
|
5. If any batch fails, log a warning but continue — the scan-manifest provides a solid base graph even without LLM analysis.
|
|
|
|
### Phase 4: MERGE
|
|
|
|
1. Run the merge script bundled with this skill:
|
|
```
|
|
python3 <SKILL_DIR>/merge-knowledge-graph.py <TARGET_DIR>
|
|
```
|
|
|
|
2. The script:
|
|
- Combines scan-manifest.json + all analysis-batch-*.json files
|
|
- Deduplicates entities (case-insensitive name matching)
|
|
- Normalizes node/edge types via alias maps
|
|
- Builds layers from index.md categories
|
|
- Builds a tour from index.md section ordering
|
|
- Writes `assembled-graph.json` to the intermediate directory
|
|
|
|
3. Read the merge report from stderr and announce:
|
|
- Total nodes, edges, layers, tour steps
|
|
- How many entities/claims the LLM analysis added
|
|
|
|
### Phase 5: SAVE
|
|
|
|
1. Read the assembled-graph.json
|
|
|
|
2. Run basic validation:
|
|
- Every edge source/target must reference an existing node
|
|
- Every node must have: id, type, name, summary, tags, complexity
|
|
- Remove any edges with dangling references
|
|
|
|
3. Copy the validated graph to `<TARGET_DIR>/.understand-anything/knowledge-graph.json`
|
|
|
|
4. Write metadata to `<TARGET_DIR>/.understand-anything/meta.json`:
|
|
```json
|
|
{
|
|
"lastAnalyzedAt": "<ISO timestamp>",
|
|
"gitCommitHash": "<from git rev-parse HEAD or empty>",
|
|
"version": "1.0.0",
|
|
"analyzedFiles": <number of wiki articles>
|
|
}
|
|
```
|
|
|
|
5. Clean up intermediate files:
|
|
```
|
|
rm -rf <TARGET_DIR>/.understand-anything/intermediate
|
|
```
|
|
|
|
6. Report summary to the user:
|
|
- "Knowledge graph saved: N articles, N entities, N topics, N claims, N sources"
|
|
- "N edges (N wikilink, N categorized, N implicit)"
|
|
- "N layers, N tour steps"
|
|
|
|
7. Auto-trigger the dashboard:
|
|
```
|
|
/understand-dashboard <TARGET_DIR>
|
|
```
|
|
|
|
## Notes
|
|
|
|
- The parse script handles ALL deterministic extraction (wikilinks, headings, frontmatter, categories from index.md). The LLM agents only add implicit knowledge that requires inference.
|
|
- Categories and taxonomy come from index.md section headings, NOT from filename prefixes. The Karpathy spec is intentionally abstract about naming conventions.
|
|
- The graph uses `kind: "knowledge"` to signal the dashboard to use force-directed layout instead of hierarchical dagre.
|
|
- Source nodes from raw/ are lightweight (filename + size only) — we don't parse PDFs or binary files.
|