How to Prepare Your Text for Readability Formulas

Imagine you’re a teacher preparing reading materials for your students. You need to match these materials to their grade level, so you turn to readability formulas, which tell you whether your students will find the materials easy or difficult to comprehend. One day, you come across a fascinating article that you believe would captivate your fifth-grade class. Excitedly, you run the text through a readability formula, only to receive an unexpectedly high difficulty score: the equivalent of college-level reading!

Perplexed, you begin to investigate. As you examine the text more closely, you realize that it’s riddled with formatting quirks, inconsistent spellings, and even some stray HTML tags. It becomes clear that these anomalies are affecting the readability score.

This experience highlights a critical aspect of readability analysis: cleaning or “sanitizing” the text before applying readability formulas. The cleaning process ensures the formulas accurately score your text’s readability and output correct word stats.

Here’s how to clean your text and achieve an accurate score:

TIP #1: Abbreviations. Embedded punctuation may confuse a readability program when it counts the number of sentences. Most programs find the end of a sentence by looking for a punctuation mark. Sometimes this punctuation falls within a sentence rather than at the end, but the computer cannot tell the difference. Since the computer interprets any period as a sentence stop, remove any embedded punctuation, such as the periods in abbreviations, Roman numerals, and decimal numbers.

Example: “The meeting will be held at 10:30 a.m. because that’s when the CEO arrives.”
Explanation: Some programs might mistake the periods in “a.m.” for the end of a sentence.
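As a rough illustration, here is a minimal Python sketch of this cleanup. The abbreviation table, the function names, and the naive sentence counter are all assumptions made for the example, not part of any particular readability tool. Run it on a copy of your text that you use only for scoring, since it also removes an abbreviation's period when the abbreviation happens to end a sentence.

```python
import re

# Hypothetical starter table; extend it with the abbreviations your text uses.
ABBREVIATIONS = {"a.m.": "am", "p.m.": "pm", "e.g.": "eg", "i.e.": "ie"}

def strip_embedded_periods(text: str) -> str:
    """Remove periods inside known abbreviations and decimal numbers."""
    for abbreviation, plain in ABBREVIATIONS.items():
        text = text.replace(abbreviation, plain)
    # Drop a period only when it sits between two digits (3.5 -> 35).
    return re.sub(r"(?<=\d)\.(?=\d)", "", text)

def count_sentences(text: str) -> int:
    """Naive counter: every run of . ? or ! is treated as a sentence end."""
    return len(re.findall(r"[.?!]+", text))

sample = "The meeting will be held at 10:30 a.m. because the CEO arrives then."
print(count_sentences(sample))                          # 3
print(count_sentences(strip_embedded_periods(sample)))  # 1
```

The naive counter sees three sentences in the original and one after cleaning, which is the kind of miscount this tip is meant to prevent.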

TIP #2: Titles/Headings. A readability program cannot distinguish titles, headings, and bulleted list items from ordinary sentences, because they usually have no end punctuation. As the computer keeps searching for a punctuation mark (.?!), it will treat the heading text as part of the first sentence that follows it, and it will overestimate that sentence's length. Either add a period to the end of each heading or remove headings before scoring.
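One possible way to handle headings is to add a period to any line that lacks end punctuation. The sketch below assumes each heading or paragraph sits on its own line (no hard-wrapped text); the function name is made up for this example.

```python
def terminate_headings(text: str) -> str:
    """Append a period to every non-empty line that lacks end punctuation."""
    fixed_lines = []
    for line in text.splitlines():
        stripped = line.rstrip()
        # Ignore closing quotes and brackets when looking for the last mark.
        core = stripped.rstrip("\"')]")
        if stripped and not core.endswith((".", "?", "!")):
            stripped += "."
        fixed_lines.append(stripped)
    return "\n".join(fixed_lines)

print(terminate_headings("How to Clean Text\nStart with the headings."))
```

If your text is hard-wrapped at a fixed width, join the wrapped lines into paragraphs first, or this approach will add periods in the middle of sentences.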

TIP #3: Normalize Numbers: You can represent numbers in different ways, such as numerals (3) or words (three). Decide on a standard format for numbers and stick to it throughout the text, as inconsistency might skew word and syllable counts.

TIP #4: Hyphenated Words: Sometimes hyphenated words can be counted as one word or multiple words. To maintain consistency, decide whether to keep or split hyphenated words and apply your decision uniformly across the text. Example: “The well-known author gave a heart-warming speech at the state-of-the-art theatre.”
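To see how much the choice matters, the following sketch counts the words in the example sentence both ways. The helper names and the whitespace-based word count are assumptions for illustration; your readability program may tokenize differently.

```python
import re

# A hyphen counts as "inside a word" only when word characters sit on both sides.
INNER_HYPHEN = re.compile(r"(?<=\w)-(?=\w)")

def split_hyphenated(text: str) -> str:
    """Treat each part of a hyphenated word as its own word."""
    return INNER_HYPHEN.sub(" ", text)

def count_words(text: str) -> int:
    """Naive count: words are whatever whitespace separates."""
    return len(text.split())

sentence = ("The well-known author gave a heart-warming speech "
            "at the state-of-the-art theatre.")
print(count_words(sentence))                    # 11
print(count_words(split_hyphenated(sentence)))  # 16
```

Whichever policy you choose, apply the same function to every text you plan to compare.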

TIP #5: Compound Sentences: If your text has compound sentences connected by conjunctions (and, or, but), consider breaking them into simpler sentences. This will average out sentence lengths more accurately.

TIP #6: Possessives/Contractions: Words like “can’t” or “John’s” might be misread by some programs. If possible, expand contractions to their full form (cannot, do not) to aid in accurate word and syllable counts. Note that “John’s” can be either a possessive or a contraction of “John is”; expand it only when it is a contraction.
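A small lookup table is often enough to expand the most common contractions. The table and function below are a hypothetical starting point, not a complete list. Forms ending in ’s are deliberately left alone, because a program cannot easily tell a possessive from a contraction.

```python
import re

# Hypothetical starter table; add the contractions that appear in your text.
CONTRACTIONS = {
    "can't": "cannot",
    "won't": "will not",
    "don't": "do not",
    "isn't": "is not",
    "i'm": "I am",
}

_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(key) for key in CONTRACTIONS) + r")\b",
    re.IGNORECASE,
)

def expand_contractions(text: str) -> str:
    """Expand known contractions, keeping an initial capital letter."""
    text = text.replace("\u2019", "'")  # treat curly apostrophes as straight

    def replace(match: re.Match) -> str:
        word = match.group(0)
        expanded = CONTRACTIONS[word.lower()]
        if word[0].isupper():
            expanded = expanded[0].upper() + expanded[1:]
        return expanded

    return _PATTERN.sub(replace, text)

print(expand_contractions("I can't go, and John's dog won't either."))
# I cannot go, and John's dog will not either.
```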

TIP #7: Quotations: Quotations can disrupt sentence flow and structure. Clearly differentiate quotations from the main text. Analyze them separately if they form a significant part of the document.

Example:

Original: As Mark Twain once said, “Truth is stranger than fiction, but it is because Fiction is obliged to stick to possibilities; Truth isn’t.”
Revised: Mark Twain made a notable point about truth and fiction. He stated: “Truth is stranger than fiction, but it is because Fiction is obliged to stick to possibilities; Truth isn’t.” This statement reflects…
Reason: In the revised version, the quotation is separated from the main text with a full sentence introducing it. This can help readability programs better understand the structure of the text and not confuse the quoted material with the author’s own sentences.

TIP #8: Special Characters/Symbols: Write out characters like ampersands (&), percent signs (%), and currency symbols ($) as words (and, percent, dollars) so the program interprets them correctly.
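Here is a minimal sketch of that substitution for the three symbols mentioned above. The function name is an assumption, and the output is only approximate English (for instance, "$1" becomes "1 dollars"), which is usually acceptable for a scoring copy.

```python
import re

def spell_out_symbols(text: str) -> str:
    """Replace $, % and & with the words a reader would say."""
    # "$1,250.50" -> "1,250.50 dollars"
    text = re.sub(r"\$(\d+(?:,\d{3})*(?:\.\d+)?)", r"\1 dollars", text)
    # "5%" or "5 %" -> "5 percent"
    text = re.sub(r"(\d)\s*%", r"\1 percent", text)
    # "Tom & Jerry" -> "Tom and Jerry"
    text = re.sub(r"\s*&\s*", " and ", text)
    return text

print(spell_out_symbols("Tom & Jerry earned $200, a 5% raise."))
# Tom and Jerry earned 200 dollars, a 5 percent raise.
```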

TIP #9: Inline Lists: If you use lists within sentences (e.g., “The colors are red, green, blue, and yellow.”), you can represent them in bullet form or as separate sentences to avoid inflating word counts within a sentence.

TIP #10: Footnotes/Endnotes: If your text has footnotes or endnotes, remove them from the main text body. These often contain punctuation and formatting that could confuse the readability analysis.

TIP #11: Ellipses/Dashes: Use ellipses (…) and dashes (—) uniformly and replace any informal use with standardized punctuation marks, as these can affect sentence and word counts.

Example: I don’t know . . . I might come to the party – if I finish work early.
Revised: I don’t know… I might come to the party—if I finish work early.
Reason: The proper ellipsis indicates a pause or a thought that trails off, and the em dash correctly indicates a break in the sentence or an interjection.
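The same normalization can be sketched in Python with two regular expressions. The function name and the exact patterns are assumptions for this example; they cover spaced periods, spaced hyphens, en dashes, and double hyphens, and may need adjusting for other informal styles.

```python
import re

ELLIPSIS = "\u2026"  # the single ellipsis character
EM_DASH = "\u2014"   # the em dash character

def normalize_ellipses_and_dashes(text: str) -> str:
    """Replace informal ellipses and dashes with one standard form each."""
    # Three or more periods, with or without spaces between them.
    text = re.sub(r"\s*\.(?:\s?\.){2,}", ELLIPSIS, text)
    # A spaced hyphen, double hyphen, or en dash, or an unspaced double hyphen.
    text = re.sub(r"\s+(?:-{1,2}|\u2013)\s+|--", EM_DASH, text)
    return text

informal = "I don't know . . . I might come to the party \u2013 if I finish work early."
print(normalize_ellipses_and_dashes(informal))
```

Applied to the informal sentence above, this produces the same result as the revised example.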

TIP #12: Quotation Marks: Use straight quotes (" ") instead of curly quotes (“ ”), as some programs might misread the latter. Also, make sure every opening quote has a matching closing quote so the program can detect sentence boundaries correctly.
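Converting curly quotes to straight ones is a simple character swap. The sketch below uses Python's built-in str.translate; the function name is made up for this example.

```python
# Map each curly quote character to its straight equivalent.
QUOTE_MAP = str.maketrans({
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark (also the curly apostrophe)
})

def straighten_quotes(text: str) -> str:
    """Replace curly quotes and apostrophes with straight ones."""
    return text.translate(QUOTE_MAP)

print(straighten_quotes("\u201cIt\u2019s done,\u201d she said."))
# "It's done," she said.
```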

TIP #13: Emojis/Emoticons: Replace emojis and emoticons with their descriptive phrase or remove them entirely since they are not standard text characters and most readability formulas won’t process them correctly.
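If you choose to remove emojis, a regular expression over the common emoji code-point ranges handles most cases. The ranges below are an approximation chosen for this sketch, not a complete emoji definition (flag sequences, for example, are not covered), and the function name is hypothetical.

```python
import re

# Approximate coverage: pictographs, emoticons, misc symbols and dingbats,
# plus the variation selector and zero-width joiner used inside emoji sequences.
EMOJI = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF\uFE0F\u200D]+")

def remove_emojis(text: str) -> str:
    """Delete emoji characters and tidy the spacing they leave behind."""
    text = EMOJI.sub("", text)
    text = re.sub(r" {2,}", " ", text)          # collapse doubled spaces
    text = re.sub(r"\s+([.?!,])", r"\1", text)  # no space before punctuation
    return text.strip()

print(remove_emojis("Great job \U0001F389!"))  # Great job!
```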

TIP #14: Paragraph Spacing: Make sure paragraph breaks are consistent. Extra line breaks or line spaces can be interpreted as sentence breaks, affecting sentence counts.
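A short sketch of one way to make paragraph breaks consistent: convert all line endings to one style, strip trailing spaces, and collapse runs of blank lines into a single blank line. The function name is an assumption for this example.

```python
import re

def normalize_paragraph_breaks(text: str) -> str:
    """Leave exactly one blank line between paragraphs."""
    text = text.replace("\r\n", "\n").replace("\r", "\n")  # unify line endings
    text = re.sub(r"[ \t]+\n", "\n", text)                 # drop trailing spaces
    text = re.sub(r"\n{3,}", "\n\n", text)                 # collapse extra blank lines
    return text.strip()

messy = "First paragraph.\r\n\r\n\r\n\r\nSecond paragraph.  \n"
print(normalize_paragraph_breaks(messy))
```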

TIP #15: Bullet Points/Numbered Lists: Transform bullet points and numbered list items into full sentences if possible, or remove them if they are not crucial, as these can disrupt the flow and structure of connected text.

TIP #16: Complex Vocabulary: Replace unnecessarily complex or highly technical terms with simpler synonyms to prevent the program from overestimating the reading difficulty based on syllable count.

TIP #17: American/British Spelling: Choose either American or British English spelling conventions and apply them consistently to avoid miscounts of words.

TIP #18: Non-Textual Elements: Remove non-textual elements like images, graphs, or tables since readability formulas cannot process them.

TIP #19: Acronyms: Expand acronyms on their first use (e.g., “NASA, the National Aeronautics and Space Administration”) to ensure the program does not misinterpret the periods or count them as individual words.

TIP #20: Dialogue: If you use dialogue, consider rewriting it in narrative form, as the varied punctuation in dialogue (like dashes, ellipses, and interrupted sentences) can confuse a readability program.

Example: “Where are you going?” John asked.
“To the store,” Mary replied, “I need to buy some milk.”
Revised: John inquired where Mary was headed. She explained that she was going to the store to purchase some milk.
Reason: The revised version omits the punctuation marks by converting the dialogue into indirect speech, thus presenting it in narrative form. This can make it easier for the readability program to analyze the text.

TIP #21: HTML: Remove any HTML tags, hyperlinks, and URLs, as they contain syntax that is not standard in regular prose and can confuse readability measures.
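For text copied from a web page, Python's standard library can do most of this work. The sketch below uses html.parser.HTMLParser to keep only the visible text and a simple regular expression to drop bare URLs. The class name, function name, and URL pattern are assumptions for this example; the URL pattern is intentionally loose and will also swallow punctuation attached to the end of a link.

```python
import re
from html.parser import HTMLParser

class _TextExtractor(HTMLParser):
    """Collect text content, skipping anything inside <script> or <style>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # turns &amp; into & for us
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)

URL = re.compile(r"(?:https?://|www\.)\S+")

def strip_html_and_urls(text: str) -> str:
    """Return plain prose with tags, comments, and URLs removed."""
    parser = _TextExtractor()
    parser.feed(text)
    parser.close()
    plain = URL.sub("", "".join(parser.parts))
    return re.sub(r"\s+", " ", plain).strip()

html_text = "<p>Cleaning <b>matters</b> &amp; it is easy.</p><!-- note --> https://example.com/help"
print(strip_html_and_urls(html_text))
# Cleaning matters & it is easy.
```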

TIP #22: Punctuation Marks: Double-check for any misplaced or unneeded punctuation marks such as commas, colons, and semicolons.

TIP #23: Foreign Languages: If your text has foreign phrases or passages, remove or translate them to maintain consistency.

Example: The spirit of ‘joie de vivre’ is what makes Paris so special.
Revised: The spirit of ‘joyful living’ is what makes Paris so special.
Reason: A readability program might find the original sentence problematic because it cannot process foreign-language phrases.
Example: The menu included items such as piña colada, jalapeño peppers, and crème brûlée.
Revised: The menu included items such as pina colada, jalapeno peppers, and creme brulee.
Reason: In the original sentence, words like “piña” and “jalapeño” contain the Spanish letter ‘ñ’, which is a special character not found in the English alphabet. Additionally, “crème brûlée” has French accented characters. In the revised sentence, these special characters are replaced with their closest English alphabet equivalents.
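Replacing accented letters by hand is tedious, so here is a minimal sketch that does it with Python's standard unicodedata module. It decomposes each accented letter into a base letter plus a combining mark, then drops the mark. The function name is an assumption, and letters that have no such decomposition (for example ø or ß) pass through unchanged.

```python
import unicodedata

def strip_accents(text: str) -> str:
    """Replace accented letters with their unaccented base letters."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

menu = "pi\u00f1a colada, jalape\u00f1o peppers, and cr\u00e8me br\u00fbl\u00e9e"
print(strip_accents(menu))
# pina colada, jalapeno peppers, and creme brulee
```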

TIP #24: Capitalization: Maintain uniform capitalization rules for headings, titles, and general text to avoid confusing the software, which may treat capitalized words as proper nouns or the start of new sentences.
