A diagnostic checklist for testing whether your AI-generated content actually sounds like you. Twelve specific checks covering voice match, default vocabulary, structural sameness and the seven generic-AI failure modes. Run it on any draft before publishing.
Run this 12-point audit on any AI-generated draft before publishing. Score above 80%: ship. 60-80%: edit specific sections. Below 60%: rebuild the voice prompt before generating more drafts. Takes 10-15 minutes per draft. The audit catches the failures that make AI content recognisable as AI: default vocabulary, hedging, structural sameness, and absent point of view.
The standard mistake when AI content sounds wrong is to regenerate. A fresh run, same model, same voice prompt. The output drifts in different directions but rarely improves systematically. The reason: regeneration treats the symptom (this draft is off), not the cause (the voice prompt is too vague, or the task prompt is asking for the wrong shape).
The audit reverses the diagnosis. Instead of judging "is this draft good", it asks "which specific failure modes is this draft showing". The failures are diagnostic. They tell you what to fix in the voice prompt, not just in the draft.
The seven causes of generic AI content (covered in AI content that doesn't sound like AI) provide the framework. The 12 checks below operationalise it.
Take three drafts the AI has produced this week. Score each draft against the 12 checks. Each check is pass or fail. The total out of 12 is the audit score.
Three drafts, not one. Single drafts hide patterns that only show up in repetition. If all three drafts open with the same hook structure, that is structural sameness even if each individual hook reads fine.
Section 1 · Voice match (4 checks)
Check 1 · Personality match
Read the voice prompt's voice essence section. Then read the draft. Does the draft feel like it was written by the person described? Not "is it grammatical" or "is it on topic". Does the personality match?
Check 2 · Sentence length distribution
Check sentence length distribution. Voice prompts that work specify a range (e.g. 4-22 words, average 11). Drafts that are within range and varied pass. Drafts where every sentence is the same length fail, even if the average is right.
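This check mechanises easily. A minimal Python sketch, assuming the 4-22 range from the example above (function and field names are illustrative):

```python
import re
import statistics

def sentence_length_report(draft: str, lo: int = 4, hi: int = 22) -> dict:
    """Word-count stats per sentence for the range check."""
    # A naive split on ., ! and ? is good enough for an audit pass.
    sentences = [s.strip() for s in re.split(r"[.!?]+", draft) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    if not lengths:
        return {"average": 0.0, "spread": 0.0, "out_of_range": 0}
    return {
        "average": round(statistics.mean(lengths), 1),
        "spread": round(statistics.pstdev(lengths), 1),
        "out_of_range": sum(1 for n in lengths if not lo <= n <= hi),
    }
```

A spread near zero means every sentence is the same length, which fails the check even when the average lands in range.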
Check 3 · Signature moves
Pick one of your signature moves (callback structures, recurring metaphors, contrarian setups). Does it show up in the draft? Not all of them, every time. But at least one in every long-form draft. None at all is a fail.
Check 4 · Context-appropriate tone
Voice shifts by context. A LinkedIn post is not a sales page is not a friend email. Does the draft's tone match its intended context? A stiffly formal LinkedIn post from a writer who is naturally direct fails this check.
Section 2 · Default vocabulary and structure (4 checks)
Check 5 · Banned vocabulary
Run a search across the draft for the banned word list in the voice prompt. Default AI vocabulary (leverage, cutting-edge, thought leader, in this fast-paced world, unlock, navigate, streamline, robust, seamlessly) should not appear. Personal banned words (whatever is in the voice prompt) should not appear either.
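The search scripts easily. A minimal sketch, seeded with the default terms above (swap personal_banned for whatever your voice prompt bans):

```python
import re

DEFAULT_AI_VOCAB = [
    "leverage", "cutting-edge", "thought leader", "in this fast-paced world",
    "unlock", "navigate", "streamline", "robust", "seamlessly",
]

def banned_word_hits(draft: str, personal_banned: list[str]) -> list[str]:
    """Every banned term that appears in the draft, case-insensitive."""
    hits = []
    for term in DEFAULT_AI_VOCAB + personal_banned:
        if re.search(r"\b" + re.escape(term) + r"\b", draft, re.IGNORECASE):
            hits.append(term)
    return hits  # an empty list is a pass
```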
Check 6 · Default openers
Check the first sentence of each draft. Do they all open with similar patterns ("Let me tell you about...", "Here's the truth about...", "I've been thinking about...")? Default openers are a structural sameness flag.
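One way to surface the pattern: compare the first few words of each draft side by side. A sketch (exact-match fingerprints are crude, but they catch repeated openers fast):

```python
def opener_fingerprints(drafts: list[str], n_words: int = 4) -> list[str]:
    """The first few words of each draft, lowercased, for side-by-side comparison."""
    openers = []
    for draft in drafts:
        first_sentence = draft.strip().split(".")[0]
        openers.append(" ".join(first_sentence.lower().split()[:n_words]))
    return openers

# Two matching fingerprints out of three flag a default opener.
```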
Check 7 · Specificity
Scan for vague terms: "businesses", "professionals", "many people", "various challenges". Replace with specific details where possible. The draft should name actual scenarios, numbers, or examples — not categories.
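The scanner from check 5 doubles as a vague-term detector; pass the categories in as the personal list. A usage sketch, reusing banned_word_hits from above:

```python
VAGUE_TERMS = ["businesses", "professionals", "many people", "various challenges"]

# Any hit is a specificity flag to resolve with a real scenario, number, or example.
flags = banned_word_hits(draft, personal_banned=VAGUE_TERMS)
```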
Check 8 · Template structure
Compare the three drafts as a set. Same opening rhythm? Same body structure (intro, three points, takeaway)? Same closing pattern? If structure is shared across drafts, the AI is defaulting to a template even when the topics differ.
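A rough structural fingerprint makes the comparison concrete. A sketch, assuming paragraphs are separated by blank lines:

```python
import re

def structure_fingerprint(draft: str) -> tuple[int, ...]:
    """Rough shape of a draft: one approximate sentence count per paragraph."""
    paragraphs = [p for p in draft.split("\n\n") if p.strip()]
    return tuple(
        len([s for s in re.split(r"[.!?]+", p) if s.strip()])
        for p in paragraphs
    )

# Three near-identical fingerprints on three different topics mean the model
# is pouring every draft into the same template.
```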
Section 3 · Point of view and finish (4 checks)
Check 9 · Committed position
Does the draft commit to a position? Or does it hedge with "it depends", "many believe", "some argue"? AI defaults to hedging because hedging is statistically safer. A voice prompt that does its job overrides the default.
Check 10 · Hedging density
"May", "could", "in some cases", "potentially". Allowed when the uncertainty is real and specified. Not allowed as a default register. If half the sentences hedge, the writer's authority is being eroded by the AI's defaults.
Check 11 · Closing line
The closing line is one of the highest-leverage spots in any post. Does it provoke, reframe, or commit? Or does it default to "Let me know your thoughts" or "What do you think"? Generic closes signal generic content.
Check 12 · One takeaway
Even posts without explicit CTAs implicitly direct the reader. The draft should leave the reader with one specific takeaway, action, or shift in thinking. Not three. Not zero. One.
10-12 / 12 (above 80%): ship. The voice prompt is doing its job. Edit any specific failures, then publish.
7-9 / 12 (60-80%): edit, don't regenerate. The spine is right. Specific sections need attention. Identify which checks failed and edit the draft directly. Regenerating from the same voice prompt rarely improves output.
Below 7 / 12 (under 60%): rebuild the voice prompt. Systemic failure. The voice prompt is too vague or too narrow. Don't generate more drafts until the prompt is tightened. How to build a voice prompt covers reconstruction; how to reverse engineer your own voice covers discovery if the issue is upstream of construction.
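The bands as a small helper, mirroring the thresholds above:

```python
def audit_verdict(score: int) -> str:
    """Map a 12-point audit score onto the three bands."""
    if score >= 10:   # above 80%: ship
        return "ship"
    if score >= 7:    # roughly 60-80%: edit the failed checks directly
        return "edit"
    return "rebuild"  # under 60%: rebuild the voice prompt first
```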
The audit is only useful if failures inform iteration. The mapping: failed voice-match checks (1-4) point at the voice essence, sentence-length spec, and signature moves in the voice prompt; failed vocabulary and structure checks (5-8) point at the banned word list and the instructions on openers and structure; failed point-of-view checks (9-12) point at the stance and closing instructions. Treat the audit as feedback for the voice prompt, not just for the draft. Otherwise the same failures recur every week.
Once you have the 12 checks, you can ask ChatGPT or Claude to run the audit on its own output. Use Prompt 17 from best ChatGPT prompts for LinkedIn or build your own:
"Here is a draft. Audit it against the voice prompt above using these 12 checks: [paste checklist]. Report which checks pass, which fail, and one specific edit per failure that would lift the draft. Be honest. Generic praise is useless."
AI catches the surface checks (banned words, sentence length, hedging count) reliably. It misses the harder ones (structural sameness across drafts, point of view commitment, signature moves) because each draft is judged in isolation. Use AI for the first six. Run the last six yourself.
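A minimal sketch of wiring that prompt into a script, assuming the OpenAI Python SDK (the model name is illustrative; the same shape works with Claude via the Anthropic SDK):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ai_audit(voice_prompt: str, draft: str, checklist: str) -> str:
    """Ask the model to run the surface checks on a finished draft."""
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative; use whichever model drafted the content
        messages=[
            {"role": "system", "content": voice_prompt},
            {"role": "user", "content": (
                f"Here is a draft:\n\n{draft}\n\n"
                f"Audit it against the voice prompt above using these 12 checks:\n"
                f"{checklist}\n"
                "Report which checks pass, which fail, and one specific edit per "
                "failure that would lift the draft. Be honest. Generic praise is useless."
            )},
        ],
    )
    return response.choices[0].message.content
```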
Three rhythms work:
The DFY Voice System builds and stress-tests a voice prompt against the same 12-point audit before delivery. Most builds ship at 80-90% match on first-draft output. £497 founder pricing. The Voice Build methodology, applied to your existing writing.
See The Voice Build

How do you spot AI-sounding content without running the full audit?
Three quick signals: default vocabulary appearing, drafts that sound interchangeable, and hedging instead of point-of-view. Anything more rigorous needs the 12-point audit.
How long does the audit take?
10-15 minutes per draft once you know the checklist. First time is 30 minutes; after three runs you can score in under five.
What score should a draft hit before publishing?
Above 80%. 60-80% means edit specific sections. Below 60% means rebuild the voice prompt before generating more drafts.
Can ChatGPT or Claude run the audit themselves?
Yes for the surface checks (banned words, hedging counts). No for the pattern checks (structural sameness across drafts); those need human eyes.
Can the audit be automated?
Partially. Mechanical checks automate. Judgement-heavy checks need a human. Most teams run a hybrid.
Should a failing draft be edited or regenerated?
Specific failures: edit. Systemic failures: tighten the voice prompt. Don't keep regenerating from a vague prompt.