Evidence Pack: "The Serpent in the Grove"

Target: Jamir Nazir, "The Serpent in the Grove." Caribbean regional winner, 2026 Commonwealth Short Story Prize. Published in Granta, 18 May 2026. ~3,447 words. Source · Archive

Operator: Joshua Miller. Maintainer, SETEC v1.90.2 (github.com/anotherpanacea-eng/setec-voiceprint)

Run dates: 18-19 May 2026

Models: GPT-5.x (OpenAI), Claude Opus 4.7 (Anthropic), gpt2 / distilgpt2 (HuggingFace, local CPU)

Companion: Substack post at anotherpanacea.substack.com (19 May 2026)

Headline

Pangram says 100% AI at high confidence. SETEC's distributional tiers and named-pattern audits found no smoothing signature. One craft signal (image conjunction) ran at 1.35x baseline, but source-triage attributed it to the story's controlling metaphor. Fixed-window mirror tests landed in the ambiguous middle. Expanding-context mirror tests recovered discrimination consistent with GPT-family generation. The case is unresolved on the prose evidence available to a non-Pangram operator.

Aggregate Evidence

Test	Result	Direction
Pangram (commercial)	100% AI, high confidence, 12/12 chunks flagged	LLM
SETEC Tier 1-2 (variance)	All signals in human-fiction range	Human
SETEC Tier 3 (sbert cohesion)	Below typical native-fiction range	Human
SETEC Tier 4 (surprisal)	6.5 bits/token (LLM typical 4-5)	Human
SETEC AIC-7 (named patterns)	At or below literary baseline	Human
SETEC AIC-8 (image conjunction)	1.35x baseline; earned by frame	Suggestive
SETEC AIC-9 (kicker density)	At baseline	Human
Mirror Design 1 (document-level)	Ambiguous, marginally LLM-shaped	Amb.
Mirror Design 2, K=1 (rum-shop)	word-Jaccard 0.22 vs. AI control 0.18	LLM
Mirror Design 2, K=4 aggregate	Mean 0.156 (AI 0.18 / human 0.10)	Amb.
Mirror Design 3 (binoculars proxy)	Non-discriminating	—
Mirror Design 4 (Claude expanding)	+0.13–0.19 sbert above human control	LLM
Mirror Design 4 (GPT expanding)	sbert 0.71 at ctx=1500; beats human by 0.21	GPT-family

SETEC Tier 1-4 (distributional)

Signal	Result	Flag?
Sentence-length variance	Human range	No
Lexical diversity (MATTR / MTLD)	High	No
Reading-level spread (FKGL std)	Wide	No
Syntactic templates (POS-bigram)	Diverse	No
Adjacent-sentence semantic cohesion	Below typical native-fiction range	No (anti-flag)
Per-token surprisal (gpt2)	6.5 bits/token (LLM typical 4-5)	No

None of the four tiers flagged the smoothing signature characteristic of post-alignment LLM output.
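
For readers unfamiliar with the Tier 1-2 signals, the following is a minimal sketch of two of them, sentence-length variance and MATTR, in standard-library Python. It is not SETEC's implementation: the function names, the regex tokenizer, the punctuation-based sentence splitter, and the 50-token window are all assumptions made for illustration. The sample text is the closing of the Nazir excerpt quoted later in this pack.

```python
import re
import statistics


def sentence_lengths(text):
    """Split on ., !, ? and return the word count of each sentence."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]


def sentence_length_variance(text):
    """Population variance of sentence lengths, in words."""
    lengths = sentence_lengths(text)
    return statistics.pvariance(lengths) if lengths else 0.0


def mattr(text, window=50):
    """Moving-average type-token ratio over a sliding token window.

    The window size is an illustrative assumption, not a SETEC setting.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    if len(tokens) <= window:
        # Text shorter than one window: fall back to plain type-token ratio.
        return len(set(tokens)) / len(tokens)
    ratios = [
        len(set(tokens[i:i + window])) / window
        for i in range(len(tokens) - window + 1)
    ]
    return sum(ratios) / len(ratios)


sample = (
    "She moved through that shop like heat through dry bush. "
    "They called her Zoongie. "
    "Maybe it was a name; maybe rain took a shape and decided to keep it."
)
print(sentence_lengths(sample))
print(sentence_length_variance(sample))
print(mattr(sample))
```

A three-sentence sample is far too short to place against any baseline; the point is only to show what each signal measures. Low variance and low MATTR would be the smoothing direction, and the pack reports neither.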

SETEC AIC craft audit

Pattern family	Rate vs. literary baseline	Note
AIC-7: correctio, triplet, manifesto cadence, professional-parallel stack, paragraph-final kicker	At or below baseline	No flag
AIC-8: image conjunction (concrete object + abstract verb)	1.35x baseline	Source-triages as earned
AIC-9: kicker density / closure inflation	At baseline	No flag

AIC-8 source-triage

Per-instance audit of every flagged image conjunction: each was load-bearing for the story's controlling metaphor (the grove remembers; the house remembers; the boy remembers). The story's thematic spine is that physical objects in this village carry memory across generations. The elevation is the story enacting its argument, not a generic AI craft tic. This is the category SETEC's craft-restoration skill labels "earned by frame."

Mirror Design 1: document-level

Topical scaffolding extracted from target → Claude generates a mirror from scaffolding alone → distance metrics computed (word-set Jaccard, TF-IDF cosine, POS-bigram cosine, POS-bigram Jaccard, sentence-transformer cosine).

Result: ambiguous. Lexical metrics inflated by scaffolding leakage; syntactic and semantic metrics non-discriminating. Not load-bearing in this pack.
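
The two lexical metrics named above can be sketched in a few lines. This is an illustrative version under stated assumptions, not the SETEC code: the tokenizer, the function names, and the smoothed IDF formula (fitted on just the two texts being compared) are choices made here. The POS-bigram and sentence-transformer metrics need a tagger and an embedding model and are left out.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; a simplifying assumption for this sketch."""
    return re.findall(r"[a-z']+", text.lower())


def word_set_jaccard(a, b):
    """Size of the shared vocabulary divided by size of the combined vocabulary."""
    sa, sb = set(tokenize(a)), set(tokenize(b))
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def tfidf_cosine(a, b):
    """Cosine similarity of TF-IDF vectors fitted on the two texts only."""
    docs = [Counter(tokenize(a)), Counter(tokenize(b))]
    vocab = set(docs[0]) | set(docs[1])
    n = len(docs)
    # Smoothed IDF, so terms present in both texts keep a nonzero weight.
    idf = {
        t: math.log((1 + n) / (1 + sum(t in d for d in docs))) + 1
        for t in vocab
    }
    vecs = [{t: d[t] * idf[t] for t in d} for d in docs]
    dot = sum(w * vecs[1].get(t, 0.0) for t, w in vecs[0].items())
    norms = [math.sqrt(sum(w * w for w in vec.values())) for vec in vecs]
    return dot / (norms[0] * norms[1]) if all(norms) else 0.0


mirror = "the air sweet with cane and kerosene"
target = "the air sweet with cane and forgetting"
print(word_set_jaccard(mirror, target))
print(tfidf_cosine(mirror, target))
```

Both functions reward shared proper nouns and setting words as much as shared style, which is one way to see why scaffolding leakage inflates the lexical metrics in this design.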

Mirror Design 2, K=1: rum-shop window

First 500 words of Nazir handed to fresh GPT-5.x session → model writes next 150 words blind → compared to Nazir's actual continuation.

Metric	Nazir vs. GPT mirror	AI control (GPT continues its own prose)
Word-set Jaccard	0.22	0.18
TF-IDF cosine	0.41	0.34
POS-bigram cosine	0.93	0.91
Sentence-transformer cosine	0.62	0.58

GPT-5.x blind continuation (rum-shop)

...crouched low in the corner of the road like a crab with one claw, its boards soft from years of spill and sweat. Inside, shadows hung thick, the air sweet with cane and kerosene. Men lined along a counter scarred by bottles, trading coins meant for flour or lamp oil for measures of rum that burned like memory. One drink loosened the tongue, two made promises come easy, three drowned the small voice saying better sense. Zoongie moved between them with a sway that made stools shift. Some said she was Wilfred's niece, some said she came out of the bush on a wet night, carrying her own thunder.

Nazir's actual continuation

...leaned into the road like a rotten tooth. Inside, boards blackened by smoke and sweat, the air sweet with cane and forgetting. Coins meant for rice or kerosene slid across the counter and came back white rum hot as apology. One drink opened the chest, two turned fear into courage's cheap cousin, three steadied the hand enough to write the future in invisible ink. She moved through that shop like heat through dry bush. They called her Zoongie. Maybe it was a name; maybe rain took a shape and decided to keep it.

Structural matches: opening-clause simile personifying the rum shop (crab claw / rotten tooth); boards weathered by sweat; the phrase "the air sweet with cane" almost verbatim; three-drink crescendo with one-two-three rhetorical structure; Zoongie introduced with mysterious-origin framing.

Mirror Design 2, K=4 aggregate

Single-window mirror results are not load-bearing. Three additional windows were run.

Window	Position in story	Word-Jaccard vs. GPT mirror
W1	Rum shop	0.22
W2	Sita at the well	0.10
W3	Vigil and recovery	0.16
W4	Years-later closing	0.14
Aggregate	Mean across 4 windows	0.156

Aggregate sits between AI control (0.18) and human control (0.10). The K=1 finding did not generalize. The reference baselines are described under Controls below.
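
The aggregate can be checked from the table with a short script. This is a sketch of the arithmetic only; the variable names and the "position" normalization (0 at the human control, 1 at the AI control) are assumptions added here for illustration. Averaging the four rounded table values gives 0.155; the reported 0.156 presumably reflects unrounded per-window scores, which are not included in this pack.

```python
# Word-Jaccard per window, as rounded in the table above.
window_scores = {"W1": 0.22, "W2": 0.10, "W3": 0.16, "W4": 0.14}

# Reference points quoted in the pack.
AI_CONTROL = 0.18
HUMAN_CONTROL = 0.10

mean_score = sum(window_scores.values()) / len(window_scores)

# Where the mean falls between the controls: 0.0 = human, 1.0 = AI.
position = (mean_score - HUMAN_CONTROL) / (AI_CONTROL - HUMAN_CONTROL)

print(f"mean={mean_score:.3f} position={position:.2f}")
```

On the rounded values the mean falls roughly two-thirds of the way from the human control toward the AI control, which is consistent with the "Amb." label. W1 alone sits above the AI control and W2 sits on the human control.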

Mirror Design 3: binoculars proxy

gpt2 / distilgpt2 used as a small-model stand-in for the base/instruct model pair in Hans et al. 2024 (neither model is instruction-tuned). Small-model proxy too low-capacity for reliable discrimination on long-form literary prose. Cross-model correlation pattern non-discriminating. Not load-bearing. A Falcon-7B or Llama-3-8B paired-model run, on GPU, is the right next test.
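
For orientation, the paired-model score can be sketched on toy data. As I read Hans et al. 2024, the score is the log-perplexity of the text under one model divided by the cross-perplexity between the two models' next-token distributions. The code below follows that reading with hand-written probability tables in place of real model outputs; the function names, the `m1` / `m2` labels, and the toy numbers are all illustrative assumptions, and this is not the code used for the run reported here.

```python
import math


def log_perplexity(m1_dists, tokens):
    """Mean negative log-probability that model 1 assigns to the actual tokens."""
    total = sum(math.log(dist[tok]) for dist, tok in zip(m1_dists, tokens))
    return -total / len(tokens)


def log_cross_perplexity(m1_dists, m2_dists):
    """Mean cross-entropy of model 2's predictions, weighted by model 1's."""
    total = 0.0
    for p, q in zip(m1_dists, m2_dists):
        total -= sum(p[tok] * math.log(q[tok]) for tok in p)
    return total / len(m1_dists)


def binoculars_score(m1_dists, m2_dists, tokens):
    """Ratio of log-perplexity to log-cross-perplexity (toy version)."""
    return log_perplexity(m1_dists, tokens) / log_cross_perplexity(m1_dists, m2_dists)


# Toy next-token distributions over a three-word vocabulary, two positions.
m1 = [{"a": 0.7, "b": 0.2, "c": 0.1}, {"a": 0.1, "b": 0.6, "c": 0.3}]
m2 = [{"a": 0.6, "b": 0.3, "c": 0.1}, {"a": 0.2, "b": 0.5, "c": 0.3}]
tokens = ["a", "b"]

print(binoculars_score(m1, m2, tokens))
```

In the paper's framing, lower scores point toward machine text. The ratio is only informative when the two models are strong enough to predict the prose well, which is the capacity problem noted above for the gpt2 / distilgpt2 proxy.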

Mirror Design 4: expanding-context

Start fixed at document opening. Context grown across four sizes: 500 / 1000 / 1500 / 2000 words. At each size, fresh subagent (Claude or GPT) writes next 150 words blind. Each subagent sees only its allotted context.
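
The windowing step can be sketched as follows. This is a hypothetical helper, not the SETEC procedure itself: the function name, whitespace word-splitting, and the rule of skipping any context size that leaves fewer than 150 held-out words are assumptions made for illustration.

```python
def expanding_windows(text, context_sizes=(500, 1000, 1500, 2000), continuation_len=150):
    """Build (context, held-out continuation) pairs that all start at word 0."""
    words = text.split()
    windows = []
    for size in context_sizes:
        if size + continuation_len > len(words):
            # Not enough text left for a full held-out continuation.
            continue
        windows.append({
            "context_words": size,
            # What the blind subagent is allowed to see.
            "context": " ".join(words[:size]),
            # What its continuation is later compared against.
            "held_out": " ".join(words[size:size + continuation_len]),
        })
    return windows


# Stand-in document of 3,447 placeholder words (the target's approximate length).
demo = " ".join(f"w{i}" for i in range(3447))
for window in expanding_windows(demo):
    print(window["context_words"], len(window["held_out"].split()))
```

Under this skipping rule a text of about 2,010 words yields only the 500, 1000, and 1500 windows. Each window's context would go to a separate fresh session, so no session sees another window's held-out text.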

Context	Nazir under GPT-5.x	Nazir under Claude	Human control under Claude
500 words	0.4643	0.4460	0.3025
1000 words	0.5991	0.5704	0.3811
1500 words	0.7058	0.6248	0.4994
2000 words	0.6876	0.6196	—

Sentence-transformer cosine between blind continuation and Nazir's actual next 150 words.

GPT slope on Nazir: +0.241 over 500→1500 (steepest).
Claude slope on Nazir: +0.179 over the same range.
Discrimination at ctx=1500: GPT on Nazir beats Claude on the human control by +0.21 sbert. Claude on Nazir beats Claude on the human control by +0.13.
Both Nazir curves sit above the human-control curve at every measured context size.
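
The slope and discrimination figures are simple differences on the table above and can be recomputed directly. The dictionary layout and function names below are illustrative assumptions; the numbers are copied from the table.

```python
# Sentence-transformer cosine by context size, copied from the table above.
sbert = {
    "nazir_gpt":    {500: 0.4643, 1000: 0.5991, 1500: 0.7058, 2000: 0.6876},
    "nazir_claude": {500: 0.4460, 1000: 0.5704, 1500: 0.6248, 2000: 0.6196},
    "human_claude": {500: 0.3025, 1000: 0.3811, 1500: 0.4994},
}


def gain(curve, start=500, end=1500):
    """Rise in similarity between two context sizes."""
    return curve[end] - curve[start]


def gap_over_control(curve, control):
    """Per-context difference, at the sizes where the control was measured."""
    return {ctx: curve[ctx] - control[ctx] for ctx in sorted(control) if ctx in curve}


gpt_gain = gain(sbert["nazir_gpt"])
claude_gain = gain(sbert["nazir_claude"])
gpt_gap = gap_over_control(sbert["nazir_gpt"], sbert["human_claude"])
claude_gap = gap_over_control(sbert["nazir_claude"], sbert["human_claude"])

print(f"GPT gain 500->1500:    {gpt_gain:+.4f}")
print(f"Claude gain 500->1500: {claude_gain:+.4f}")
for ctx in gpt_gap:
    print(f"ctx={ctx}: GPT gap {gpt_gap[ctx]:+.4f}, Claude gap {claude_gap[ctx]:+.4f}")
```

The Claude gaps across the three measured sizes run from about 0.13 to about 0.19, matching the range given in the Aggregate Evidence table. Both Nazir curves dip slightly between 1500 and 2000 words, so the gains are quoted over 500→1500 only.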

GPT-5.x blind continuation at ctx=1500

...his tongue. The scene outside hit him like a thrown stone: Marsha's heels dug into dust, a rope pulled tight, the square of sky above the well. Without thinking he dropped to his knees and added his weight. The vine burned his palms. Sweat poured from him like rain from a roof. They hauled until his chest cracked. When she lay on the ground coughing water, the world swam.

Nazir's actual continuation

...his tongue, and went to the well. He saw Marsha braced, saw the well mouth, saw Sita's face rise pale from the dark with water sticking to her skin. In one bright click he saw his future rearrange...

Beat-for-beat structural match across 150 words: Marsha braced with rope, Sita's face emerging from the well, Vishnu joining the haul, physical strain, Sita coughing water, Vishnu's psychological reckoning. 1500 words of preceding context could have set up many possible continuations. GPT picked the one Nazir picked.

Controls

Control	Source	Notes
AI baseline (matched)	GPT-5.x literary-fiction story, ~1,000 words, Norwegian fisherman vignette. Continued by GPT under same procedure.	Provides in-LLM-distribution reference for distance metrics.
Human baseline (negative)	Yan Lianke, "Chinese Psyche." Granta 2026 memoir essay, ~2,010 words.	Post-training-cutoff. Author has documented online footprint. Register caveat: memoir, not literary fiction.
Rejected control	Conrad, Heart of Darkness opening.	Rejected: gpt2 reproduced text fragments from training memory under mirror procedure.
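
One way to screen a candidate control for the memorization problem that disqualified the Conrad text is to check whether the model's continuation shares long word sequences with the source. The sketch below is a hypothetical check, not the method used in this pack, which does not say how the reproduction was detected. The function names and the 8-word n-gram length are assumptions, and the strings are placeholders.

```python
def ngrams(words, n):
    """Set of all contiguous n-word tuples in a word list."""
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def verbatim_overlap(continuation, source, n=8):
    """Fraction of the continuation's n-grams that also occur in the source."""
    cont = ngrams(continuation.lower().split(), n)
    src = ngrams(source.lower().split(), n)
    return len(cont & src) / len(cont) if cont else 0.0


# Placeholder texts: the first continuation copies a run of the source verbatim.
source = "one two three four five six seven eight nine ten eleven twelve"
copied = "three four five six seven eight nine ten eleven"
fresh = "alpha beta gamma delta epsilon zeta eta theta iota"

print(verbatim_overlap(copied, source))
print(verbatim_overlap(fresh, source))
```

The continuation should be compared against the full source text, including the held-out portion the model was never shown. Any nonzero overlap at this n-gram length suggests recall from training data instead of blind continuation.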

Caveats

Narrow mirror panel. Two LLM families (GPT, Claude) plus a proxy binoculars run. Cannot rule out generation by an untested family.
Window noise at low K. K=1 was striking but did not generalize. K≥4 is the load-bearing threshold per SPEC.
Topical scaffolding leakage in Design 1. Lexical metrics inflated by proper-noun and setting overlap.
Human control register confound. Memoir control rather than literary-fiction control. Strongest unresolved methodological gap.
GPT continuation sbert compression. GPT-family continuations produce high sentence-transformer cosine against a wide range of inputs. Qualitative beat-match at ctx=1500 is more diagnostic than the sbert number alone.
No AI literary-fiction control on the expanding-context curve. Cannot distinguish "GPT-family-shaped" from "in-LLM-distribution-shaped" more generally on this signal.
Model identity drift. Models pinned to operator's account-routed versions at run time (above). Results may not be reproducible under future model versions.
Single-shot, not a trained classifier. Will not match Pangram's accuracy. Trades raw discrimination for interpretability.

Replication

Framework: github.com/anotherpanacea-eng/setec-voiceprint
Target text: Public at granta.com (linked above).
Mirror outputs, control texts, and per-window data files: Available on request. Email anotherpanacea@gmail.com.
Procedure: Each mirror test is a fresh-context LLM session. Subagent isolation required: each session sees only its allotted context, never any other session's input or output.

Provenance

Operator	Joshua Miller (anotherpanacea@gmail.com)
Framework	SETEC v1.90.2 (open-source, link above)
Run dates	18-19 May 2026
Models	GPT-5.x (OpenAI consumer subscription), Claude Opus 4.7 (Anthropic consumer subscription), gpt2 / distilgpt2 (HuggingFace, local CPU)
AI baseline	GPT-5.x literary-fiction generation, ~1,000 words
Human baseline	Yan Lianke, "Chinese Psyche," Granta 2026 memoir
Conflicts	Operator maintains SETEC. No commercial relationship with Pangram, Granta, or the Commonwealth Foundation. No prior relationship with Jamir Nazir.