Evidence Pack: "The Serpent in the Grove"

Target: Jamir Nazir, "The Serpent in the Grove." Caribbean regional winner, 2026 Commonwealth Short Story Prize. Published in Granta, 18 May 2026. ~3,447 words.
Operator: Joshua Miller. Maintainer, SETEC v1.90.2 (github.com/anotherpanacea-eng/setec-voiceprint)
Run dates: 18-19 May 2026
Models: GPT-5.x (OpenAI), Claude Opus 4.7 (Anthropic), gpt2 / distilgpt2 (HuggingFace, local CPU)
Companion: Substack post at anotherpanacea.substack.com (May 19, 2026)

Headline

Pangram says 100% AI at high confidence. SETEC's distributional tiers and named-pattern audits found no smoothing signature. One craft signal (image conjunction) ran at 1.35x baseline but was source-triaged as the story's controlling metaphor. Fixed-window mirror tests landed in the ambiguous middle. Expanding-context mirror tests recovered discrimination consistent with GPT-family generation. The case is unresolved on the prose evidence available to a non-Pangram operator.

Aggregate Evidence

| Test | Result | Direction |
| --- | --- | --- |
| Pangram (commercial) | 100% AI, high confidence, 12/12 chunks flagged | LLM |
| SETEC Tier 1-2 (variance) | All signals in human-fiction range | Human |
| SETEC Tier 3 (sbert cohesion) | Below typical native-fiction range | Human |
| SETEC Tier 4 (surprisal) | 6.5 bits/token (LLM typical 4-5) | Human |
| SETEC AIC-7 (named patterns) | At or below literary baseline | Human |
| SETEC AIC-8 (image conjunction) | 1.35x baseline; earned by frame | Suggestive |
| SETEC AIC-9 (kicker density) | At baseline | Human |
| Mirror Design 1 (document-level) | Ambiguous, marginally LLM-shaped | Amb. |
| Mirror Design 2, K=1 (rum-shop) | word-Jaccard 0.22 vs. AI control 0.18 | LLM |
| Mirror Design 2, K=4 aggregate | Mean 0.156 (AI 0.18 / human 0.10) | Amb. |
| Mirror Design 3 (binoculars proxy) | Non-discriminating | |
| Mirror Design 4 (Claude expanding) | +0.13–0.19 sbert above human control | LLM |
| Mirror Design 4 (GPT expanding) | sbert 0.71 at ctx=1500; beats human by 0.21 | GPT-family |

SETEC Tier 1-4 (distributional)

| Signal | Result | Flag? |
| --- | --- | --- |
| Sentence-length variance | Human range | No |
| Lexical diversity (MATTR / MTLD) | High | No |
| Reading-level spread (FKGL std) | Wide | No |
| Syntactic templates (POS-bigram) | Diverse | No |
| Adjacent-sentence semantic cohesion | Below typical native-fiction range | No (anti-flag) |
| Per-token surprisal (gpt2) | 6.5 bits/token (LLM typical 4-5) | No |

None of the four tiers flagged the smoothing signature characteristic of post-alignment LLM output.
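As a reading aid for the surprisal row, here is a minimal sketch of how a bits-per-token figure can be derived from per-token log-probabilities. It assumes natural-log probabilities have already been obtained from a scoring model such as gpt2. The function name and the toy input are hypothetical and are not taken from SETEC.

```python
import math


def mean_surprisal_bits(token_logprobs):
    """Mean per-token surprisal in bits, given natural-log token probabilities."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    # Surprisal of one token is -log2(p); divide nats by ln(2) to get bits.
    total_bits = sum(-lp / math.log(2) for lp in token_logprobs)
    return total_bits / len(token_logprobs)


# Toy input (not from the story): four tokens, each assigned probability 1/4,
# so each token costs 2 bits.
toy_logprobs = [math.log(0.25)] * 4
print(mean_surprisal_bits(toy_logprobs))
```

On this scale, higher values mean the scoring model found the text less predictable, which is the direction the table reports for the target (6.5 bits/token against a typical LLM range of 4-5).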

SETEC AIC craft audit

| Pattern family | Rate vs. literary baseline | Note |
| --- | --- | --- |
| AIC-7: correctio, triplet, manifesto cadence, professional-parallel stack, paragraph-final kicker | At or below baseline | No flag |
| AIC-8: image conjunction (concrete object + abstract verb) | 1.35x baseline | Source-triages as earned |
| AIC-9: kicker density / closure inflation | At baseline | No flag |

AIC-8 source-triage

Per-instance audit of every flagged image conjunction: each was load-bearing for the story's controlling metaphor (the grove remembers; the house remembers; the boy remembers). The story's thematic spine is that physical objects in this village carry memory across generations. The elevation is the story enacting its argument, not a generic AI craft tic. This is the category SETEC's craft-restoration skill labels "earned by frame."

Mirror Design 1: document-level

Topical scaffolding extracted from target → Claude generates a mirror from scaffolding alone → distance metrics computed (word-set Jaccard, TF-IDF cosine, POS-bigram cosine, POS-bigram Jaccard, sentence-transformer cosine).

Result: ambiguous. Lexical metrics inflated by scaffolding leakage; syntactic and semantic metrics non-discriminating. Not load-bearing in this pack.
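Two of the lexical distance metrics named above can be sketched in a few lines. This is an illustration under stated assumptions, not SETEC's implementation: the regex tokenizer is hypothetical, and plain term-frequency cosine stands in for TF-IDF cosine because IDF weights need a reference corpus. The two sample phrases are taken from the rum-shop continuations quoted later in this pack.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def word_set_jaccard(a, b):
    """Size of the shared vocabulary divided by the size of the combined vocabulary."""
    set_a, set_b = set(tokenize(a)), set(tokenize(b))
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def tf_cosine(a, b):
    """Cosine similarity over raw term counts (simplified stand-in for TF-IDF)."""
    tf_a, tf_b = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(tf_a[w] * tf_b[w] for w in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(v * v for v in tf_a.values()))
    norm_b = math.sqrt(sum(v * v for v in tf_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


mirror = "the air sweet with cane and kerosene"
actual = "the air sweet with cane and forgetting"
print(word_set_jaccard(mirror, actual))  # 6 shared words out of 8 distinct
print(tf_cosine(mirror, actual))
```

Because both metrics reward shared vocabulary, any proper nouns or setting words supplied in the scaffolding raise them regardless of who wrote the prose, which is the leakage problem noted in the result.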

Mirror Design 2, K=1: rum-shop window

First 500 words of Nazir handed to fresh GPT-5.x session → model writes next 150 words blind → compared to Nazir's actual continuation.

| Metric | Nazir vs. GPT mirror | AI control (GPT continues its own prose) |
| --- | --- | --- |
| Word-set Jaccard | 0.22 | 0.18 |
| TF-IDF cosine | 0.41 | 0.34 |
| POS-bigram cosine | 0.93 | 0.91 |
| Sentence-transformer cosine | 0.62 | 0.58 |

GPT-5.x blind continuation (rum-shop)

...crouched low in the corner of the road like a crab with one claw, its boards soft from years of spill and sweat. Inside, shadows hung thick, the air sweet with cane and kerosene. Men lined along a counter scarred by bottles, trading coins meant for flour or lamp oil for measures of rum that burned like memory. One drink loosened the tongue, two made promises come easy, three drowned the small voice saying better sense. Zoongie moved between them with a sway that made stools shift. Some said she was Wilfred's niece, some said she came out of the bush on a wet night, carrying her own thunder.

Nazir's actual continuation

...leaned into the road like a rotten tooth. Inside, boards blackened by smoke and sweat, the air sweet with cane and forgetting. Coins meant for rice or kerosene slid across the counter and came back white rum hot as apology. One drink opened the chest, two turned fear into courage's cheap cousin, three steadied the hand enough to write the future in invisible ink. She moved through that shop like heat through dry bush. They called her Zoongie. Maybe it was a name; maybe rain took a shape and decided to keep it.

Structural matches: opening-clause simile personifying the rum shop (crab claw / rotten tooth); boards weathered by sweat; the air sweet with cane almost verbatim; three-drink crescendo with one-two-three rhetorical structure; Zoongie introduced with mysterious-origin framing.

Mirror Design 2, K=4 aggregate

Single-window mirror results are not load-bearing. Three additional windows were run.

| Window | Position in story | Word-Jaccard vs. GPT mirror |
| --- | --- | --- |
| W1 | Rum shop | 0.22 |
| W2 | Sita at the well | 0.10 |
| W3 | Vigil and recovery | 0.16 |
| W4 | Years-later closing | 0.14 |
| Aggregate | Mean across 4 windows | 0.156 |

The aggregate sits between the AI control (0.18) and the human control (0.10). The K=1 finding did not generalize. Reference baselines are described under Controls below.
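The aggregate can be rechecked from the per-window values in the table. The sketch below uses only the two-decimal figures printed in this pack, so it yields 0.155 where the table reports 0.156. The likely cause is rounding of the per-window scores, but that is an assumption, since the unrounded values are not given here. Variable names are illustrative.

```python
# Per-window word-Jaccard scores as printed in the table (two decimals).
window_jaccard = {"W1": 0.22, "W2": 0.10, "W3": 0.16, "W4": 0.14}
AI_CONTROL = 0.18     # GPT continuing its own prose
HUMAN_CONTROL = 0.10  # human baseline

mean_jaccard = sum(window_jaccard.values()) / len(window_jaccard)

# Where the mean falls between the two controls: 0 = human control, 1 = AI control.
position = (mean_jaccard - HUMAN_CONTROL) / (AI_CONTROL - HUMAN_CONTROL)

# Windows that individually exceed the AI control.
above_ai = [w for w, v in window_jaccard.items() if v > AI_CONTROL]

print(round(mean_jaccard, 3), round(position, 2), above_ai)
```

Only W1 exceeds the AI control on its own. The mean lands roughly two-thirds of the way from the human control to the AI control, which is why the aggregate is read as ambiguous and not as a match to either baseline.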

Mirror Design 3: binoculars proxy

gpt2 / distilgpt2 pair, run locally as a small-model stand-in for the base/instruct pairing of Hans et al. 2024. The proxy is too low-capacity for reliable discrimination on long-form literary prose, and its cross-model correlation pattern was non-discriminating. Not load-bearing. A Falcon-7B or Llama-3-8B paired-model run, on GPU, is the right next test.

Mirror Design 4: expanding-context

Start fixed at document opening. Context grown across four sizes: 500 / 1000 / 1500 / 2000 words. At each size, fresh subagent (Claude or GPT) writes next 150 words blind. Each subagent sees only its allotted context.
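The windowing step described above can be sketched as follows. This is a hypothetical helper, not SETEC code. It assumes whitespace word-splitting and assumes that a context size is skipped when fewer than 150 words remain after it. Neither detail is specified in this pack.

```python
def expanding_windows(text, context_sizes=(500, 1000, 1500, 2000), target_len=150):
    """Return (size, context, target) triples, all anchored at the document opening."""
    words = text.split()
    windows = []
    for size in context_sizes:
        if size + target_len > len(words):
            continue  # not enough text left for a full target span
        context = " ".join(words[:size])                    # what the model sees
        target = " ".join(words[size:size + target_len])    # held-out actual continuation
        windows.append((size, context, target))
    return windows


# Synthetic stand-in text with the story's approximate length (~3,447 words).
synthetic = " ".join(f"w{i}" for i in range(3447))
for size, context, target in expanding_windows(synthetic):
    print(size, len(context.split()), target.split()[0])
```

Each context would go to a fresh model session, and the blind continuation it returns would be scored against the matching held-out target.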

| Context | Nazir under GPT-5 | Nazir under Claude | Human control under Claude |
| --- | --- | --- | --- |
| 500 words | 0.4643 | 0.4460 | 0.3025 |
| 1000 words | 0.5991 | 0.5704 | 0.3811 |
| 1500 words | 0.7058 | 0.6248 | 0.4994 |
| 2000 words | 0.6876 | 0.6196 | |

Values are sentence-transformer cosine between each blind continuation and the source text's actual next 150 words (Nazir's story or the human control).
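The gaps quoted in the aggregate table ("+0.13–0.19" for Claude, "0.21" for GPT at ctx=1500) can be rederived from the values above. The sketch below is plain arithmetic on the table. The dictionary keys are illustrative names, and the 2000-word row is left out of the gap calculation because no human-control value is reported for it.

```python
# sbert cosine values copied from the expanding-context table.
scores = {
    500:  {"nazir_gpt": 0.4643, "nazir_claude": 0.4460, "human_claude": 0.3025},
    1000: {"nazir_gpt": 0.5991, "nazir_claude": 0.5704, "human_claude": 0.3811},
    1500: {"nazir_gpt": 0.7058, "nazir_claude": 0.6248, "human_claude": 0.4994},
    2000: {"nazir_gpt": 0.6876, "nazir_claude": 0.6196, "human_claude": None},
}


def gaps_over_human(column):
    """Gap between a Nazir column and the human control, per context size."""
    return {
        ctx: round(row[column] - row["human_claude"], 4)
        for ctx, row in scores.items()
        if row["human_claude"] is not None  # skip rows with no human-control value
    }


claude_gaps = gaps_over_human("nazir_claude")
gpt_gaps = gaps_over_human("nazir_gpt")
print(claude_gaps)
print(gpt_gaps)
```

The Claude gaps come out at about 0.14, 0.19, and 0.13 across the three comparable context sizes, and the GPT gap at 1500 words is about 0.21. The GPT comparison is cross-model, since the human control was continued by Claude only.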

GPT-5.x blind continuation at ctx=1500

...his tongue. The scene outside hit him like a thrown stone: Marsha's heels dug into dust, a rope pulled tight, the square of sky above the well. Without thinking he dropped to his knees and added his weight. The vine burned his palms. Sweat poured from him like rain from a roof. They hauled until his chest cracked. When she lay on the ground coughing water, the world swam.

Nazir's actual continuation

...his tongue, and went to the well. He saw Marsha braced, saw the well mouth, saw Sita's face rise pale from the dark with water sticking to her skin. In one bright click he saw his future rearrange...

Beat-for-beat structural match across 150 words: Marsha braced with rope, Sita's face emerging from the well, Vishnu joining the haul, physical strain, Sita coughing water, Vishnu's psychological reckoning. 1500 words of preceding context could have set up many possible continuations. GPT picked the one Nazir picked.

Controls

| Control | Source | Notes |
| --- | --- | --- |
| AI baseline (matched) | GPT-5.x literary-fiction story, ~1,000 words, Norwegian fisherman vignette. Continued by GPT under same procedure. | Provides in-LLM-distribution reference for distance metrics. |
| Human baseline (negative) | Yan Lianke, "Chinese Psyche." Granta 2026 memoir essay, ~2,010 words. | Post-training-cutoff. Author has documented online footprint. Register caveat: memoir, not literary fiction. |
| Rejected control | Conrad, Heart of Darkness opening. | Rejected: gpt2 reproduced text fragments from training memory under mirror procedure. |

Caveats

  1. Two-family mirror panel. Two LLM families (GPT, Claude) plus a proxy binoculars run. Cannot rule out generation by an untested family.
  2. Window noise at low K. K=1 was striking but did not generalize. K≥4 is the load-bearing threshold per SPEC.
  3. Topical scaffolding leakage in Design 1. Lexical metrics inflated by proper-noun and setting overlap.
  4. Human control register confound. Memoir control rather than literary-fiction control. Strongest unresolved methodological gap.
  5. GPT continuation sbert compression. GPT-family continuations produce high sentence-transformer cosine to a wide input range. Qualitative beat-match at ctx=1500 is more diagnostic than the sbert number alone.
  6. No AI literary-fiction control on the expanding-context curve. Cannot distinguish "GPT-family-shaped" from "in-LLM-distribution-shaped" more generally on this signal.
  7. Model identity drift. Models pinned to operator's account-routed versions at run time (above). Future model versions not reproducible.
  8. Single-shot, not a trained classifier. Will not match Pangram's accuracy. Trades raw discrimination for interpretability.

Replication

Provenance

Operator: Joshua Miller (anotherpanacea@gmail.com)
Framework: SETEC v1.90.2 (open-source, link above)
Run dates: 18-19 May 2026
Models: GPT-5.x (OpenAI consumer subscription), Claude Opus 4.7 (Anthropic consumer subscription), gpt2 / distilgpt2 (HuggingFace, local CPU)
AI baseline: GPT-5.x literary-fiction generation, ~1,000 words
Human baseline: Yan Lianke, "Chinese Psyche," Granta 2026 memoir
Conflicts: Operator maintains SETEC. No commercial relationship with Pangram, Granta, or the Commonwealth Foundation. No prior relationship with Jamir Nazir.