Evidence Pack: "The Serpent in the Grove"
Headline
Pangram reports 100% AI at high confidence. SETEC's distributional tiers and named-pattern audits found no smoothing signature. One craft signal (image conjunction) ran at 1.35x baseline, but source triage traced it to the story's controlling metaphor. Fixed-window mirror tests landed in the ambiguous middle. Expanding-context mirror tests recovered discrimination consistent with GPT-family generation. On the prose evidence available to an operator outside Pangram, the case is unresolved.
Aggregate Evidence
| Test | Result | Direction |
|---|---|---|
| Pangram (commercial) | 100% AI, high confidence, 12/12 chunks flagged | LLM |
| SETEC Tier 1-2 (variance) | All signals in human-fiction range | Human |
| SETEC Tier 3 (sbert cohesion) | Below typical native-fiction range | Human |
| SETEC Tier 4 (surprisal) | 6.5 bits/token (LLM typical 4-5) | Human |
| SETEC AIC-7 (named patterns) | At or below literary baseline | Human |
| SETEC AIC-8 (image conjunction) | 1.35x baseline; earned by frame | Suggestive |
| SETEC AIC-9 (kicker density) | At baseline | Human |
| Mirror Design 1 (document-level) | Ambiguous, marginally LLM-shaped | Ambiguous |
| Mirror Design 2, K=1 (rum-shop) | word-Jaccard 0.22 vs. AI control 0.18 | LLM |
| Mirror Design 2, K=4 aggregate | Mean 0.156 (AI control 0.18 / human control 0.10) | Ambiguous |
| Mirror Design 3 (binoculars proxy) | Non-discriminating | — |
| Mirror Design 4 (Claude expanding) | +0.13–0.19 sbert above human control | LLM |
| Mirror Design 4 (GPT expanding) | sbert 0.71 at ctx=1500; beats human by 0.21 | GPT-family |
SETEC Tier 1-4 (distributional)
| Signal | Result | Flag? |
|---|---|---|
| Sentence-length variance | Human range | No |
| Lexical diversity (MATTR / MTLD) | High | No |
| Reading-level spread (FKGL std) | Wide | No |
| Syntactic templates (POS-bigram) | Diverse | No |
| Adjacent-sentence semantic cohesion | Below typical native-fiction range | No (anti-flag) |
| Per-token surprisal (gpt2) | 6.5 bits/token (LLM typical 4-5) | No |
None of the four tiers flagged the smoothing signature characteristic of post-alignment LLM output.
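A minimal sketch of three of the signals above, for readers who want to see what is being measured. It assumes whitespace tokenization, a naive sentence splitter, and per-token probabilities already obtained from a scoring model such as gpt2 (scoring not shown). The function names and the 50-token MATTR window are illustrative choices, not SETEC's implementation.

```python
import math
import re
import statistics


def sentence_lengths(text):
    """Word counts per sentence, splitting naively on runs of ., ! and ?."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]


def sentence_length_variance(text):
    """Sample variance of sentence lengths; uniform, flattened prose scores low."""
    return statistics.variance(sentence_lengths(text))


def mattr(tokens, window=50):
    """Moving-average type-token ratio: mean TTR over every `window`-token span.

    The default window of 50 is an assumption; tools differ on this setting.
    """
    if len(tokens) <= window:
        return len(set(tokens)) / len(tokens)
    ratios = [len(set(tokens[i:i + window])) / window
              for i in range(len(tokens) - window + 1)]
    return sum(ratios) / len(ratios)


def mean_surprisal_bits(token_probs):
    """Mean per-token surprisal in bits: average of -log2 p over observed tokens."""
    return -sum(math.log2(p) for p in token_probs) / len(token_probs)


sample = "The grove remembers. The house remembers too! Does the boy?"
print(sentence_lengths(sample))  # [3, 4, 3]
```

Surprisal is reported in bits here (base-2 logs) to match the table's bits/token units; many model APIs return natural-log probabilities, which would need dividing by ln 2.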
SETEC AIC craft audit
| Pattern family | Rate vs. literary baseline | Note |
|---|---|---|
| AIC-7: correctio, triplet, manifesto cadence, professional-parallel stack, paragraph-final kicker | At or below baseline | No flag |
| AIC-8: image conjunction (concrete object + abstract verb) | 1.35x baseline | Source-triages as earned |
| AIC-9: kicker density / closure inflation | At baseline | No flag |
AIC-8 source-triage
Per-instance audit of every flagged image conjunction: each was load-bearing for the story's controlling metaphor (the grove remembers; the house remembers; the boy remembers). The story's thematic spine is that physical objects in this village carry memory across generations. The elevated rate is the story enacting its argument, not a generic AI craft tic. This is the category SETEC's craft-restoration skill labels "earned by frame."
Mirror Design 1: document-level
Topical scaffolding extracted from target → Claude generates a mirror from scaffolding alone → distance metrics computed (word-set Jaccard, TF-IDF cosine, POS-bigram cosine, POS-bigram Jaccard, sentence-transformer cosine).
Result: ambiguous. Lexical metrics inflated by scaffolding leakage; syntactic and semantic metrics non-discriminating. Not load-bearing in this pack.
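The Design 1 distance metrics can be sketched in pure Python as below. This assumes whitespace tokenization and POS tags supplied by an external tagger (for example spaCy; tagging not shown). The TF-IDF weighting uses smoothed idf in the style of scikit-learn's default, an assumption made so that words shared by a two-document pair do not collapse to zero weight. Sentence-transformer cosine is omitted because it needs an embedding model; it is the same cosine applied to dense vectors. These helpers are illustrative, not SETEC's code.

```python
import math
from collections import Counter


def word_set_jaccard(a, b):
    """|A & B| / |A | B| over lowercased word types."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb)


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v.get(k, 0.0) for k in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def tfidf_vectors(docs):
    """Raw term frequency times smoothed idf: ln((1 + n) / (1 + df)) + 1."""
    tokenized = [d.lower().split() for d in docs]
    df = Counter(w for toks in tokenized for w in set(toks))
    n = len(docs)
    return [{w: c * (math.log((1 + n) / (1 + df[w])) + 1)
             for w, c in Counter(toks).items()}
            for toks in tokenized]


def pos_bigrams(tags):
    """Counts of adjacent POS-tag pairs, e.g. ('DET', 'NOUN')."""
    return Counter(zip(tags, tags[1:]))


def pos_bigram_cosine(tags_a, tags_b):
    return cosine(dict(pos_bigrams(tags_a)), dict(pos_bigrams(tags_b)))


def pos_bigram_jaccard(tags_a, tags_b):
    sa, sb = set(pos_bigrams(tags_a)), set(pos_bigrams(tags_b))
    return len(sa & sb) / len(sa | sb)
```

Because the mirror in Design 1 is generated from scaffolding that includes names and setting, the word-level metrics here share vocabulary by construction, which is the leakage the result above describes.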
Mirror Design 2, K=1: rum-shop window
First 500 words of Nazir handed to fresh GPT-5.x session → model writes next 150 words blind → compared to Nazir's actual continuation.
| Metric | Nazir vs. GPT mirror | AI control (GPT continues its own prose) |
|---|---|---|
| Word-set Jaccard | 0.22 | 0.18 |
| TF-IDF cosine | 0.41 | 0.34 |
| POS-bigram cosine | 0.93 | 0.91 |
| Sentence-transformer cosine | 0.62 | 0.58 |
GPT-5.x blind continuation (rum-shop)
...crouched low in the corner of the road like a crab with one claw, its boards soft from years of spill and sweat. Inside, shadows hung thick, the air sweet with cane and kerosene. Men lined along a counter scarred by bottles, trading coins meant for flour or lamp oil for measures of rum that burned like memory. One drink loosened the tongue, two made promises come easy, three drowned the small voice saying better sense. Zoongie moved between them with a sway that made stools shift. Some said she was Wilfred's niece, some said she came out of the bush on a wet night, carrying her own thunder.
Nazir's actual continuation
...leaned into the road like a rotten tooth. Inside, boards blackened by smoke and sweat, the air sweet with cane and forgetting. Coins meant for rice or kerosene slid across the counter and came back white rum hot as apology. One drink opened the chest, two turned fear into courage's cheap cousin, three steadied the hand enough to write the future in invisible ink. She moved through that shop like heat through dry bush. They called her Zoongie. Maybe it was a name; maybe rain took a shape and decided to keep it.
Structural matches: an opening-clause simile personifying the rum shop (crab with one claw / rotten tooth); boards weathered by sweat; "the air sweet with cane" shared almost verbatim; a three-drink crescendo with one-two-three rhetorical structure; Zoongie introduced with mysterious-origin framing.
Mirror Design 2, K=4 aggregate
Single-window mirror results are not load-bearing, so three additional windows were run.
| Window | Position in story | Word-Jaccard vs. GPT mirror |
|---|---|---|
| W1 | Rum shop | 0.22 |
| W2 | Sita at the well | 0.10 |
| W3 | Vigil and recovery | 0.16 |
| W4 | Years-later closing | 0.14 |
| Aggregate | Mean across 4 windows | 0.156 |
The aggregate sits between the AI control (0.18) and the human control (0.10). The K=1 finding did not generalize. The control texts behind these baselines are described under Controls below.
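The aggregate is a plain mean over windows. The sketch below also places it on a line from the human control (0.0) to the AI control (1.0); that normalization is an illustrative addition, not part of the pack's procedure. From the two-decimal window values in the table the mean comes out at 0.155; the reported 0.156 presumably reflects unrounded per-window figures.

```python
def mean(values):
    return sum(values) / len(values)


def position_between(score, human, ai):
    """Linear placement: 0.0 at the human control, 1.0 at the AI control."""
    return (score - human) / (ai - human)


# Word-Jaccard per window, as rounded in the table above.
windows = {"W1 rum shop": 0.22, "W2 well": 0.10, "W3 vigil": 0.16, "W4 closing": 0.14}

agg = mean(list(windows.values()))
print(f"aggregate {agg:.3f}")
print(f"position between controls {position_between(agg, human=0.10, ai=0.18):.2f}")
```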
Mirror Design 3: binoculars proxy
gpt2 / distilgpt2 used as a small-model stand-in for the two-model pair in Binoculars (Hans et al. 2024). The proxy is too low-capacity for reliable discrimination on long-form literary prose, and its cross-model correlation pattern was non-discriminating. Not load-bearing. A Falcon-7B or Llama-3-8B paired-model run, on GPU, is the right next test.
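For orientation, the Binoculars score in Hans et al. (2024) is a ratio: the log-perplexity of the text under one model divided by a cross-perplexity term, the average cross-entropy between the two models' next-token distributions at each position. Lower scores lean machine-generated. The sketch below takes those per-position distributions as plain probability lists (in practice, softmax outputs of two models sharing a tokenizer; obtaining them is not shown) and follows the paper's equation as I read it. The official implementation's assignment of models to each term may differ, so treat this as illustrative.

```python
import math


def log_ppl(dists, token_ids):
    """Mean negative log-probability (nats) of the observed tokens under one model."""
    return -sum(math.log(d[t]) for d, t in zip(dists, token_ids)) / len(token_ids)


def log_x_ppl(dists_m1, dists_m2):
    """Mean cross-entropy: at each position, -sum_v M1(v) * log M2(v)."""
    total = 0.0
    for p1, p2 in zip(dists_m1, dists_m2):
        total -= sum(a * math.log(b) for a, b in zip(p1, p2) if a > 0)
    return total / len(dists_m1)


def binoculars_score(dists_m1, dists_m2, token_ids):
    """Log-perplexity over log cross-perplexity; lower leans machine-generated."""
    return log_ppl(dists_m1, token_ids) / log_x_ppl(dists_m1, dists_m2)


# Hypothetical two-token vocabulary, two positions.
m1 = [[0.9, 0.1], [0.6, 0.4]]
m2 = [[0.8, 0.2], [0.5, 0.5]]
print(binoculars_score(m1, m2, token_ids=[0, 0]))
```

With small models such as gpt2 and distilgpt2, both terms are dominated by the models' weak fit to literary prose, which is consistent with the non-discriminating result reported above.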
Mirror Design 4: expanding-context
Window start fixed at the document opening. Context grown across four sizes: 500 / 1000 / 1500 / 2000 words. At each size, a fresh subagent (Claude or GPT) writes the next 150 words blind. Each subagent sees only its allotted context.
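The windowing can be sketched as follows, assuming whitespace-delimited words. `generate` is a hypothetical stand-in for opening a fresh Claude or GPT session; no real API is called. A text shorter than a given context size plus 150 words yields no window at that size.

```python
from typing import Callable


def expanding_windows(words, sizes=(500, 1000, 1500, 2000), held_out=150):
    """Return (size, context, reference) triples with the start fixed at word 0.

    The reference is the next `held_out` words after the context. Sizes whose
    reference would run past the end of the text are skipped.
    """
    out = []
    for n in sizes:
        if n + held_out <= len(words):
            out.append((n, words[:n], words[n:n + held_out]))
    return out


def run_isolated(words, generate: Callable[[str], str]):
    """Call `generate` once per window, passing only that window's context.

    Nothing from one call is passed to the next, mirroring the subagent
    isolation requirement.
    """
    results = {}
    for n, context, reference in expanding_windows(words):
        results[n] = (generate(" ".join(context)), " ".join(reference))
    return results
```

Each result pairs a blind continuation with the reference text it is later compared against.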
| Context | Nazir under GPT-5 | Nazir under Claude | Human control under Claude |
|---|---|---|---|
| 500 words | 0.4643 | 0.4460 | 0.3025 |
| 1000 words | 0.5991 | 0.5704 | 0.3811 |
| 1500 words | 0.7058 | 0.6248 | 0.4994 |
| 2000 words | 0.6876 | 0.6196 | — |
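Values are sentence-transformer cosine between each blind continuation and the actual next 150 words of the text being continued (Nazir's story, or the human control). The human control (~2,010 words) is too short to supply a 150-word continuation at ctx=2000, hence the dash.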
- GPT slope on Nazir: +0.241 over 500→1500 (steepest).
- Claude slope on Nazir: +0.179 over the same range.
- Discrimination at ctx=1500: GPT on Nazir beats Claude on the human control by +0.21 sbert. Claude on Nazir beats Claude on the human control by +0.13.
- Both Nazir curves sit above the human-control curve at every context size where the human control was measured (500 to 1500 words).
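The bullet figures are simple differences over the table. A minimal sketch reproducing them from the 500 to 1500-word rows; following the bullets, both gaps are taken against the Claude human-control column, since the pack lists no GPT run on the human control.

```python
# Context size -> (GPT on Nazir, Claude on Nazir, Claude on human control).
table = {
    500: (0.4643, 0.4460, 0.3025),
    1000: (0.5991, 0.5704, 0.3811),
    1500: (0.7058, 0.6248, 0.4994),
}


def slope(col, start=500, end=1500):
    """Change in sbert cosine between two context sizes for one column."""
    return table[end][col] - table[start][col]


def gap(col, ctx, baseline_col=2):
    """Difference between a column and the human-control column at one size."""
    return table[ctx][col] - table[ctx][baseline_col]


for name, col in (("GPT on Nazir", 0), ("Claude on Nazir", 1), ("Claude on human", 2)):
    print(f"{name}: slope {slope(col):+.3f}")
for ctx in sorted(table):
    print(f"ctx={ctx}: GPT gap {gap(0, ctx):+.2f}, Claude gap {gap(1, ctx):+.2f}")
```

The same arithmetic applied to the human-control column gives a 500 to 1500 change of about +0.197.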
GPT-5.x blind continuation at ctx=1500
...his tongue. The scene outside hit him like a thrown stone: Marsha's heels dug into dust, a rope pulled tight, the square of sky above the well. Without thinking he dropped to his knees and added his weight. The vine burned his palms. Sweat poured from him like rain from a roof. They hauled until his chest cracked. When she lay on the ground coughing water, the world swam.
Nazir's actual continuation
...his tongue, and went to the well. He saw Marsha braced, saw the well mouth, saw Sita's face rise pale from the dark with water sticking to her skin. In one bright click he saw his future rearrange...
Beat-for-beat structural match across 150 words: Marsha braced with rope, Sita's face emerging from the well, Vishnu joining the haul, physical strain, Sita coughing water, Vishnu's psychological reckoning. The preceding 1,500 words could have set up many possible continuations; GPT picked the one Nazir picked.
Controls
| Control | Source | Notes |
|---|---|---|
| AI baseline (matched) | GPT-5.x literary-fiction story, ~1,000 words, Norwegian fisherman vignette. Continued by GPT under the same procedure. | Provides an in-LLM-distribution reference for distance metrics. |
| Human baseline (negative) | Yan Lianke, "Chinese Psyche." Granta 2026 memoir essay, ~2,010 words. | Postdates the models' training cutoffs. Author has a documented online footprint. Register caveat: memoir, not literary fiction. |
| Rejected control | Conrad, Heart of Darkness opening. | Rejected: gpt2 reproduced text fragments from training memory under the mirror procedure. |
Caveats
- Narrow mirror panel. Only two LLM families (GPT and Claude) plus a proxy Binoculars run. Cannot rule out generation by an untested family.
- Window noise at low K. K=1 was striking but did not generalize. K≥4 is the load-bearing threshold per SPEC.
- Topical scaffolding leakage in Design 1. Lexical metrics inflated by proper-noun and setting overlap.
- Human control register confound. Memoir control rather than literary-fiction control. Strongest unresolved methodological gap.
- GPT continuation sbert compression. GPT-family continuations score high sentence-transformer cosine against a wide range of inputs, so the qualitative beat match at ctx=1500 is more diagnostic than the sbert number alone.
- No AI literary-fiction control on the expanding-context curve. Cannot distinguish "GPT-family-shaped" from "in-LLM-distribution-shaped" more generally on this signal.
- Model identity drift. Models were whichever versions the operator's consumer accounts routed to at run time (see Provenance below), not pinned releases. Results may not reproduce on future model versions.
- Single-shot, not a trained classifier. Not expected to match Pangram's accuracy; trades raw discrimination for interpretability.
Replication
- Framework: github.com/anotherpanacea-eng/setec-voiceprint
- Target text: Public at granta.com (linked above).
- Mirror outputs, control texts, and per-window data files: Available on request. Email anotherpanacea@gmail.com.
- Procedure: Each mirror test is a fresh-context LLM session. Subagent isolation required: each session sees only its allotted context, never any other session's input or output.
Provenance
| Operator | Joshua Miller (anotherpanacea@gmail.com) |
|---|---|
| Framework | SETEC v1.90.2 (open-source, link above) |
| Run dates | 18-19 May 2026 |
| Models | GPT-5.x (OpenAI consumer subscription), Claude Opus 4.7 (Anthropic consumer subscription), gpt2 / distilgpt2 (HuggingFace, local CPU) |
| AI baseline | GPT-5.x literary-fiction generation, ~1,000 words |
| Human baseline | Yan Lianke, "Chinese Psyche," Granta 2026 memoir |
| Conflicts | Operator maintains SETEC. No commercial relationship with Pangram, Granta, or the Commonwealth Foundation. No prior relationship with Jamir Nazir. |