G-Verified: Levent Bulut

Dataset v7: Detecting the Six Golden Rules — A Process Note

Case Studies May 29, 2026

Version 7 of the Objective Projection dataset added three pieces that turn the methodology from a claim into an auditable structure: a transparent detection pipeline that marks which rules each scene satisfies, machine-readable citation infrastructure, and full-text methodology papers. This note explains why we took this route and which limits we declare openly.

The problem: the "I deleted the label, so I'm compliant" fallacy

Objective Projection rests on six rules: Emotion Embargo, Simile Prohibition, Materialized Metaphors, Micro-Focus (Ng), Temporal Anchor, Atmosphere Contradiction. It is easy for a dataset to say it follows these rules; it is hard to prove it. Until v7, whether each of the 500 scenes obeyed a given rule was left to the reader's eye and trust. For an academic record, that is not enough.

The solution: a deterministic, rule-based, open detection pipeline

In v7, every scene gained an applied_rules block. The block is produced not by a language model (there is no LLM-as-judge step) but by apply_rules.py, a single-file, dependency-free Python script that uses word-boundary matching. We chose this deliberately:

  • Reproducible. Clone the repo, run the script, get byte-identical output. Unlike an LLM judge, the result does not drift.
  • Auditable. Exactly which patterns each rule checks for is written plainly in the script. A researcher can inspect a specific decision, change thresholds, or contest a call.
  • Transparent. No black box. The full detection logic is readable.
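
To make the idea of word-boundary matching concrete, here is a minimal sketch of how a lexicon-based prohibition check could look. It is not the contents of apply_rules.py: the lexicons, the helper names compile_lexicon and satisfies_prohibition, and the choice of case-insensitive matching are all assumptions made for illustration.

```python
import re

# Hypothetical lexicons for illustration only; the real patterns are
# written out in apply_rules.py and may differ entirely.
SIMILE_MARKERS = ["like", "as if", "as though"]
EMOTION_WORDS = ["sad", "happy", "afraid", "angry"]


def compile_lexicon(terms):
    """Compile terms into one case-insensitive regex anchored on word boundaries."""
    # Longest terms first, so multi-word markers win over shorter alternatives.
    escaped = [re.escape(t) for t in sorted(terms, key=len, reverse=True)]
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)


SIMILE_RE = compile_lexicon(SIMILE_MARKERS)
EMOTION_RE = compile_lexicon(EMOTION_WORDS)


def satisfies_prohibition(text, pattern):
    """A prohibition-style rule holds when no forbidden term occurs in the text."""
    return pattern.search(text) is None
```

With these assumed lexicons, the scene "The saddle creaked. He disliked the rain." passes both checks, because "sad" inside "saddle" and "like" inside "disliked" do not sit on word boundaries. "The lamp hung like a fruit." fails the simile check. The same input always yields the same answer, which is the property the list above relies on.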

Each applied_rules block carries six boolean flags, an active_count, primary_rule, detection_method, and doctrine_version. No existing field was modified, renamed, or removed — only added.
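
The shape of such a block can be sketched as follows. The field names active_count, primary_rule, detection_method, and doctrine_version come from the description above. Everything else is an assumption: the snake_case flag names, the rule for picking primary_rule, the default detection_method value, and the helper names build_applied_rules, attach, and serialize.

```python
import json

# Assumed snake_case names for the six flags; the dataset may spell them differently.
RULE_ORDER = [
    "emotion_embargo",
    "simile_prohibition",
    "materialized_metaphors",
    "micro_focus",
    "temporal_anchor",
    "atmosphere_contradiction",
]


def build_applied_rules(flags, doctrine_version, detection_method="rule_based"):
    """Assemble an applied_rules block from six boolean flags."""
    missing = [r for r in RULE_ORDER if r not in flags]
    if missing:
        raise ValueError(f"missing flags: {missing}")
    active = [r for r in RULE_ORDER if flags[r]]
    block = {r: bool(flags[r]) for r in RULE_ORDER}
    block["active_count"] = len(active)
    # Assumption: the first active rule in RULE_ORDER is treated as primary.
    # The source does not say how primary_rule is actually chosen.
    block["primary_rule"] = active[0] if active else None
    block["detection_method"] = detection_method  # placeholder default value
    block["doctrine_version"] = doctrine_version
    return block


def attach(scene, block):
    """Return a copy of the scene with applied_rules added; existing fields stay as they are."""
    if "applied_rules" in scene:
        raise KeyError("scene already has an applied_rules field")
    return {**scene, "applied_rules": block}


def serialize(record):
    """Serialize with sorted keys and fixed separators so repeated runs give identical text."""
    return json.dumps(record, sort_keys=True, ensure_ascii=False, separators=(",", ":"))
```

Because attach only adds a key and refuses to overwrite one, it mirrors the add-only guarantee stated above. Sorting keys in serialize is one simple way to keep output stable across runs.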

An honest limit, declared: the rules are not equally reliable

This is the most important part of the release, and the part that is easiest to leave out. A rule-based checker is, by design, a blunt instrument, and its detection rates on the target outputs vary by rule:

  • High reliability (95%+): Simile Prohibition and Emotion Embargo — deterministic lexicon matching.
  • Moderate (60–80%): Temporal Anchor, Materialized Metaphors, Micro-Focus — structural/heuristic patterns.
  • Deliberately conservative (~10%): Atmosphere Contradiction.

That last figure is intentional. Atmosphere Contradiction encodes a semantic authorial choice that regular expressions cannot reliably see, so we tuned the pipeline to favour false negatives over false positives: it would rather miss the rule than wrongly claim it. The reason is simple: keeping the dataset's positive labels trustworthy matters more than coverage. To a peer reviewer, this is not a weakness; it is a signal of strength.
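
One way to picture this trade-off is a detector that fires only when the evidence is strong on both sides. The sketch below is an illustration, not the logic in apply_rules.py. It assumes, purely for the example, that the rule could be approximated by opposing cue words appearing in the same scene. The cue lists, the function names, and the min_hits_per_side threshold are all hypothetical.

```python
import re

# Hypothetical cue lists; the real Atmosphere Contradiction patterns may look nothing like this.
CALM_CUES = ["sunlight", "birdsong", "breeze"]
THREAT_CUES = ["blood", "knife", "siren"]


def count_hits(text, cues):
    """Count how many distinct cues appear in the text as whole words."""
    return sum(
        1
        for cue in cues
        if re.search(r"\b" + re.escape(cue) + r"\b", text, re.IGNORECASE)
    )


def flags_contradiction(text, min_hits_per_side=2):
    """Flag a scene only when both opposing cue sets are well represented.

    Raising min_hits_per_side trades recall for precision: more genuine
    cases are missed, but fewer scenes receive a wrong positive label.
    """
    return (
        count_hits(text, CALM_CUES) >= min_hits_per_side
        and count_hits(text, THREAT_CUES) >= min_hits_per_side
    )
```

Under these assumptions, a scene with two calm cues and only one threat cue stays unflagged at the default threshold. That is a possible false negative, accepted so that the positive labels stay trustworthy.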

The citation infrastructure that ships with it

  • CITATION.cff — Citation File Format v1.2.0. Hugging Face, GitHub, and Zenodo recognise this file and surface a "Cite this dataset" affordance automatically. It carries the primary HF DOI (10.57967/hf/8960) and the Zenodo archive DOI (10.5281/zenodo.19511369), plus cross-references to the architectural framework and the Sₙ pilot report.
  • Two full-text papers under academic/: the short-form methodology paper (Beyond the Cortical Label) and the Sₙ pilot report (10.5281/zenodo.20362901).

In short

v7 was not about adding scenes. It was about making the existing 500 scenes provable. If you want a methodology to take its own claim — "literature is not a feeling, it is a physics" — seriously, that claim has to be auditable, reproducible, and honestly bounded. That is what v7 aimed at.

Dataset: huggingface.co/datasets/leventbulut/objective-projection · DOI: 10.57967/hf/8960 · Full technical write-up: huggingface.co/blog/leventbulut/objective-projection

Levent Bulut

Levent Bulut

Founder of the Objective Projection (Nesnel İzdüşüm) and Narrative Engineering methodologies within the framework of the Bulut Doctrine, systems theorist, and author. He conducts research on the physics of literature and parametric narrative construction.