Dataset v7.2: The Pattern F Gap Is Closed, the Hard Negatives Expanded
This is a maintenance-and-extension release. It makes no new claim; it makes the corpus more complete and more internally consistent. That is what the real work of a research archive looks like: unglamorous, but it keeps its word.
When I released v7.1, I disclosed a gap in my own data — Pattern F had no pure examples — and said closing it was a priority. This release does that, adds a second batch of hard negatives, and completes the DOI record in the dataset README. None of it is a breakthrough. I would rather report it plainly than dress it up.
The Pattern F gap is closed
Pattern F — "Mundane Parallel Life" — is one of six sub-patterns of the Atmosphere Contradiction rule (Rule 6 of the Bulut Doctrine). It is the case where the detail that breaks the scene's emotion is an ordinary person living an ordinary life, indifferent to the character's crisis: a neighbour airing a rug while someone waits for a life-changing phone call, a child picking the tomato out of a sandwich in a hospital waiting room.
In v7.1 I defined Pattern F but openly stated that there were zero pure-corpus examples of it in the 500-scene corpus — it existed only inside hard negative target outputs. A pattern with no clean positive examples is a definition, not data.
v7.2 adds ten pure Pattern F scenes (pattern_F_pure_corpus_batch1.jsonl, five Turkish and five English, pf_001–pf_010) across ten emotion categories. Each applies the five-criterion structural signature that separates Pattern F from its neighbours: the detail is independent of the main character's action; it is ordinary daily life, not a professional role (this is what divides F from Pattern B); its metaphorical load is zero; the stakes of the scene and of the detail are wildly mismatched; and the detail enters, exists, and leaves without interacting with the character.
I want to be precise here. These ten scenes apply the signature — they were built to satisfy the five criteria. That is not the same as validating it. Validation would require testing whether the criteria reliably distinguish Pattern F from neighbouring patterns on scenes I did not write. That validation is still open. What v7.2 closes is the absence of clean positive examples; it does not, on its own, prove the typology correct.
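The five criteria can be written down as a small manual-annotation record. This is only a sketch: the class name and the five field names are hypothetical, and the post does not specify the actual layout of pattern_F_signature. It records a human judgement per criterion and reports which ones fail. It does not detect anything, and it does not validate the typology.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass(frozen=True)
class PatternFSignature:
    """Hypothetical record of the five criteria, filled in by a human annotator."""
    independent_of_character_action: bool  # detail does not depend on what the character does
    ordinary_daily_life: bool              # not a professional role (the line between F and Pattern B)
    zero_metaphorical_load: bool           # the detail stands for nothing
    stakes_mismatch: bool                  # scene stakes and detail stakes are wildly mismatched
    no_interaction: bool                   # detail enters, exists, and leaves untouched


def failed_criteria(sig: PatternFSignature) -> List[str]:
    """Return the names of the criteria the annotator marked as not satisfied."""
    return [f.name for f in fields(sig) if not getattr(sig, f.name)]


# A nurse adjusting a drip is a professional role, so it fails the daily-life criterion.
nurse_scene = PatternFSignature(True, False, True, True, True)
print(failed_criteria(nurse_scene))
```

A scene counts as a pure Pattern F candidate under this sketch only when failed_criteria returns an empty list.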
Hard Negatives Batch 2
The hard negatives corpus targets a failure mode I keep seeing in models trained on the standard corpus: the "I removed the emotion label, so I'm compliant" shortcut. A model learns to drop "she was terrified" and "like a cage," then ports the same emotion into adverbs, into pseudo-objective numbers, into a seven-item cliché inventory, or into atmosphere that reinforces rather than contradicts. Batch 1 covered five such patterns across five categories.
Batch 2 extends the same five violation types into five new categories: shame (cliché inventory), determination (pseudo-objective numbers), awe (hidden simile), remorse (emotion-loaded adverbs), and jealousy (atmospheric reinforcement). Ten new scenes, Turkish and English in parallel, same schema as Batch 1. Each carries load_bearing_elements — spans that must survive an edit operation, because the most common way a model breaks an OP-compliant scene is by "tightening" it and deleting the very detail doing the work. That deletion is summarization bias acting directly against the method.
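One way to use load_bearing_elements is as a guard on edit operations: after a model rewrites a scene, check that every load-bearing span still appears. The sketch below assumes the spans are stored as plain strings and that verbatim survival is the test. Both are assumptions, and so are the function name and the example record. Only the field name load_bearing_elements comes from the dataset description.

```python
from typing import Iterable, List


def missing_load_bearing(edited_text: str, load_bearing_elements: Iterable[str]) -> List[str]:
    """Return the load-bearing spans that no longer appear verbatim in the edited text."""
    return [span for span in load_bearing_elements if span not in edited_text]


# Hypothetical record; the "target" key and the scene text are illustrative only.
record = {
    "target": "She waited by the phone. Across the yard a neighbour aired a rug.",
    "load_bearing_elements": ["a neighbour aired a rug"],
}

# A "tightened" edit that deletes the detail doing the work.
tightened = "She waited by the phone."
print(missing_load_bearing(tightened, record["load_bearing_elements"]))
```

A non-empty result marks the edit as having deleted something the scene depends on. Exact substring matching is deliberately strict; a paraphrase-tolerant check would need a different comparison.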
The corpus is still deliberately small — twenty hard negatives in total. They are expensive to build correctly; each needs a bad output that just fails and a target that just succeeds. I would rather have twenty sharp pairs than two hundred loose ones.
A note on transparency
Two honest disclosures, because the method has to be auditable.
First, the Pattern F scenes carry two experimental schema fields (pattern_F_signature and atmosfer_celiskisi_pattern) that are not part of the stable v7.1 schema. They are flagged schema_extension: "v8-alpha" so no one mistakes them for settled structure, and they are annotated manually (annotation_method: "manual_pattern_F_v1"), not by the rule-based pipeline — apply_rules.py detects atmosphere contradiction at only about 9.8% reliability and would miss most of these. Labelling them by hand and saying so was the honest call.
Second, the DOI record in the dataset README is now complete: forty-four Zenodo deposits, the Hugging Face primary DOI, and the canonical Narrative Entropy (Sₙ) reference. Earlier versions of the README listed only a subset, which was inconsistent with the full chain. That is now fixed.
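For anyone who wants only the stable v7.1 structure, the schema_extension flag makes the experimental fields easy to set aside. A minimal sketch, assuming the flag and the two experimental fields are top-level keys of each JSONL record (the post does not say where they sit); the function names, the "id" key, and the example values are hypothetical.

```python
from typing import Any, Dict

# Field names are from the release notes; their top-level placement is an assumption.
EXPERIMENTAL_FIELDS = ("pattern_F_signature", "atmosfer_celiskisi_pattern")


def is_experimental(record: Dict[str, Any]) -> bool:
    """True if the record is flagged as carrying v8-alpha schema extensions."""
    return record.get("schema_extension") == "v8-alpha"


def to_stable_view(record: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the record without the experimental fields and their flag."""
    if not is_experimental(record):
        return dict(record)
    dropped = set(EXPERIMENTAL_FIELDS) | {"schema_extension"}
    return {key: value for key, value in record.items() if key not in dropped}


example = {
    "id": "pf_001",                        # key name is hypothetical
    "schema_extension": "v8-alpha",
    "annotation_method": "manual_pattern_F_v1",
    "pattern_F_signature": {},             # contents not specified in the release notes
    "atmosfer_celiskisi_pattern": "F",     # value format is an assumption
}
print(sorted(to_stable_view(example)))
```

The sketch keeps annotation_method on purpose, so a downstream user can still see that the record was labelled by hand.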
What this is for
Everything here is openly licensed and auditable. If you train on it, critique it, or try to break the Pattern F typology with a counterexample, that is exactly the use it is meant for. The dataset is at huggingface.co/datasets/leventbulut/objective-projection
On questions and contact
I'm glad this work draws interest, and I read every message that comes in. Because of the volume, I can't reply to each one individually. Most of the questions I'm asked (how the methodology works, what's coming next) are already answered in the dataset README, with sources attached, since the whole project is deliberately open and auditable. Why I run this in the open rather than through traditional journals is explained separately in Why I Work in the Open. The dataset is also where to follow new work: every release is announced there and here first. Dataset: huggingface.co/datasets/leventbulut/objective-projection