From 1751f4d29d5efd072d159674a15d61741c3778ac Mon Sep 17 00:00:00 2001
From: Tomas Grosup
Date: Thu, 12 Feb 2026 14:53:35 +0100
Subject: [PATCH] Create SKILL.md

---
 .github/skills/review-council/SKILL.md | 113 +++++++++++++++++++++++++
 1 file changed, 113 insertions(+)
 create mode 100644 .github/skills/review-council/SKILL.md

diff --git a/.github/skills/review-council/SKILL.md b/.github/skills/review-council/SKILL.md
new file mode 100644
index 0000000000..223efc915f
--- /dev/null
+++ b/.github/skills/review-council/SKILL.md
@@ -0,0 +1,113 @@
---
name: review-council
description: "Multi-agent review council. Invoke for: review council, fellowship review, multi-agent review, panel review, diverse model review."
---

# Review Council

A multi-model code review panel. You orchestrate 3 phases: build a briefing pack, dispatch 15 parallel assessment agents (5 dimensions × 3 models), and consolidate findings.

## Phase 1: Briefing Pack

Auto-detect the PR from the current branch. Build a self-contained briefing document containing all of the following:

1. **PR metadata**: title, body, labels
2. **Linked issues**: for every issue referenced in the PR body or commits, fetch the full issue body and **all comments** — this is where the real requirements and context live
3. **Full diff**: merge-base diff only (`git diff $(git merge-base HEAD main)..HEAD`) — shows only what the PR adds, not what main gained since the branch point
4. **Changed files list**: with change type (added/modified/deleted), test files called out separately
5. **Commit messages**: useful for correlating claims to specific code changes, but not a source of requirements

Use `claude-opus-4.6` for this phase. The output is a single structured briefing document. Every assessment agent in Phase 2 receives this briefing verbatim — they should not need to re-fetch any of this context.

## Phase 2: Parallel Assessment

Dispatch **15 agents in parallel**: each of the 5 dimensions below is assessed independently by each of 3 models.

**Models**: `claude-opus-4.6`, `gemini-3-pro-preview`, `gpt-5.2-codex`

Each agent receives:
- The full briefing pack from Phase 1
- Its specific dimension prompt (below)
- An instruction to return findings as a list, each with: location (file:line), description, severity (critical/high/medium/low)

---

### Dimension 1: Claims Coverage

Are the goals met? Cross-reference every claim in the PR description, linked issues, and commit messages against actual code changes. Flag:

- **Orphan claims**: stated in PR/issue but no corresponding code change implements it
- **Orphan changes**: code changed but not mentioned in any claim — why is this here?
- **Partial implementations**: claim says X, code does X-minus-something — what's missing?

---

### Dimension 2: Test Coverage

Is every feature, fix, or behavioral change covered with tests? Check for:

- **Happy path**: does the basic intended usage work and is it tested end-to-end?
- **Negative path**: what happens with invalid input, malformed syntax, error conditions? Are diagnostics tested?
- **Feature interactions**: this is a compiler — edge cases are about how the change interacts with other F# features. Example: a codegen change should test both reference types and value types. A syntax change should test interaction with generics, constraints, computation expressions, etc.
- **Assertion quality**: tests must actually assert the claimed behavior, not just "compiles and runs without throwing". A test that calls the function but doesn't check the result is not a test.
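
For illustration, a minimal sketch of what "assertion quality" means in practice. This is a hypothetical example assuming an xUnit-style test project; `parseAttribute` and the scenario are invented, not code from this repository:

```fsharp
open Xunit

// Hypothetical function under test: parses "width=<n>" to Some n, anything else to None.
let parseAttribute (s: string) : int option =
    match s.Split('=') with
    | [| "width"; v |] ->
        match System.Int32.TryParse v with
        | true, n -> Some n
        | _ -> None
    | _ -> None

// Weak: exercises the code path but asserts nothing about the result.
[<Fact>]
let ``parser runs`` () =
    parseAttribute "width=42" |> ignore

// Better: pins down the claimed behavior, including the negative path.
[<Fact>]
let ``parses a numeric width`` () =
    Assert.Equal(Some 42, parseAttribute "width=42")

[<Fact>]
let ``rejects a non-numeric width`` () =
    Assert.True(parseAttribute "width=abc" |> Option.isNone)
```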

Flag any behavioral change in the diff that lacks a corresponding test. Tests should be in the appropriate layer — pick based on what the issue is and what changed:
- **Typecheck tests**: the bulk of coverage — type inference, constraint solving, overload resolution, expected compiler warnings and errors
- **SyntaxTreeTests**: parser/syntax changes
- **EmittedIL tests**: codegen/IL shape changes
- **compileAndRun tests**: end-to-end behavioral correctness that absolutely needs proper execution on the .NET runtime
- **Service.Tests**: FCS API, editor features
- **FSharp.Core.Tests**: core library changes

A PR can and often should have tests in multiple categories.

---

### Dimension 3: Code Quality

Assess structural quality of the changes:

- **Logical layer placement**: is the code in the right module/file, or shoved somewhere convenient? Would a reader expect to find this logic here?
- **Ad-hoc "if/then" patches**: flag any `if condition then specialCase else normalPath` that looks like a band-aid rather than a systematic fix. These are symptoms of not understanding the root cause — the fix should be at the source, not patched at a consumer. A conditional that exists only to work around a bug elsewhere is a code smell.
- **Duplicated logic within the PR diff**: same or near-same code appearing in multiple places in the changeset
- **Error handling**: not swallowing exceptions, not ignoring Result values, not using `failwith` where a typed error would be appropriate

---

### Dimension 4: Code Reuse & Higher-Order Patterns

Search the codebase for existing patterns that match the new code's structure. F# allows extracting logic into composable pieces — mappable structures, foldables, walkers, visitors. Look for:

- **Highly similar nested pattern matches**: a familiar structure of nested `match ... with` but with a minor tweak. This is the #1 symptom — slight differences can almost always be extracted into a higher-order function or otherwise parameterized or made generic.
- **Copy-paste-modify**: new code that duplicates an existing function with small changes. The difference should be a parameter, a generic type argument, or a function argument (higher-order function).
- **Missed abstractions**: where two pieces of code share structure but differ in a specific operation — that operation should be a parameter, a generic type argument, or a function argument.
- **Existing utilities ignored**: the codebase may already have helpers, combinators, or active patterns that do what the new code reimplements from scratch. Search for them.

---

### Dimension 5: Cyclomatic Complexity

Assess complexity of added/changed code:

- **Pyramid of nested doom**: deeply nested `if/then/else`, heavily nested `match`, or interleaving with `for` and other branching constructs. Any nesting beyond 2 levels should be questioned — is there a flatter way?
- **F# offers better tools**: pattern matching and active patterns handle non-trivial branching logic flatter and more readably than chains of `if/elif/else`. Suggest them as alternatives.
- **Active patterns for complex conditions**: when a match guard or if-condition encodes domain logic, an active pattern names it and makes it reusable.
- **Pipelines over nesting**: sequential operations should be pipelined, not nested. Collections, Result, Option, ValueOption all support this — use `|>`, `bind`, `map` chains instead of nested `match` or `if/then` (see the sketch after this section).
- **High branch count**: functions with many `match` arms or `if` branches — consider whether the cases can be grouped, or whether the function is doing too much and should be split.
- **Flatter is better**: a flat pattern match with 10 arms is easier to read than 4 levels of nesting with 10 combinations. Prefer wide over deep.

---
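
To make Dimensions 4 and 5 concrete, here is a small, hypothetical before/after sketch. The types and names are invented for illustration and are not taken from this codebase; the point is the shape of the refactor, not the specifics:

```fsharp
// Hypothetical types and helpers, invented purely for illustration.
type Attr = { Name: string; Value: string option }

let tryParseInt (v: string) =
    match System.Int32.TryParse v with
    | true, n -> Some n
    | _ -> None

// Before: a pyramid of nested matches; a near-identical twin for "height"
// would have to duplicate the whole structure.
let tryWidthNested (attrs: Attr list) =
    match attrs |> List.tryFind (fun a -> a.Name = "width") with
    | Some attr ->
        match attr.Value with
        | Some v -> tryParseInt v
        | None -> None
    | None -> None

// After: the shared structure is one helper, the differing parts are parameters,
// and the nesting is flattened into an Option pipeline.
let tryAttr name parse (attrs: Attr list) =
    attrs
    |> List.tryFind (fun a -> a.Name = name)
    |> Option.bind (fun a -> a.Value)
    |> Option.bind parse

let tryWidth attrs = attrs |> tryAttr "width" tryParseInt
let tryHeight attrs = attrs |> tryAttr "height" tryParseInt
```

Dimension 4 asks whether a helper like `tryAttr` already exists in the codebase before new copies of the nested version are accepted; Dimension 5 asks whether the nesting itself is necessary.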

## Phase 3: Consolidation

Collect all findings from the 15 agents. Then:

1. **Deduplicate**: multiple agents will find the same issue. Merge findings that point at the same location and describe the same problem. Keep the best-written description.
2. **Classify** into three buckets:
   - **Behavioral**: missing feature coverage, missing tests, incorrect logic, claims not met — things that affect correctness
   - **Quality**: code structure, readability, complexity, reuse opportunities — it works, but could be better
   - **Nitpick**: typos, naming, formatting, minor style — agents love producing these to have *something* to say. Low priority. Only surface if there are no higher-level findings.
3. **Rank within each bucket**: prefer findings flagged by more agents (cross-model agreement = higher confidence).
4. **Present**: Behavioral first, then Quality, then Nitpicks (if any). For each finding: location, dimension, description, how many agents flagged it.

If a model is unavailable at runtime, proceed with the remaining models. Minimum viable council = 2 models.
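
For illustration only, one way a consolidated finding might be presented. The file, line number, and wording are invented placeholders, not a prescribed format:

```
### Behavioral

1. Missing negative-path test (Dimension 2: Test Coverage; flagged by 3/3 models)
   Location: src/Compiler/Checking/SomeFile.fs:123
   The new overload-resolution branch is exercised only by a happy-path test;
   no test asserts the expected error diagnostic.
```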