Page & Bell


How to Clean Up Text Pasted from AI, Word, and PDFs

Last updated: 2026-06-13

Text copied from AI assistants, Microsoft Word, and PDFs looks clean but often carries typographic and invisible characters that break code, search, and data files. The fix is to normalize the typographic characters back to plain ASCII and strip the invisible ones, a two-minute cleanup that can save hours of debugging.

What gets silently swapped in

  • Curly/smart quotes (“ ” ‘ ’) replacing straight quotes — they break code and JSON instantly.
  • Em and en dashes (— –) replacing hyphens — a tell-tale sign of AI and Word output.
  • The ellipsis character (…) replacing three periods.
  • Non-breaking spaces (U+00A0) that look like spaces but are not, breaking matches and CSV columns.
  • Zero-width and other invisible characters — covered in depth in the zero-width characters guide.
  • Bullet glyphs and soft hyphens carried over from formatted documents.
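To see exactly which of these characters a paste contains, a short Python sketch can report each one with its position, code point, and Unicode name. The character set is an assumption assembled from the list above plus a few common zero-width relatives, and `find_suspects` is a hypothetical helper, not part of any of the tools in this guide.

```python
import unicodedata

# Characters from the list above (an assumed, non-exhaustive set).
SUSPECTS = set(
    "\u201c\u201d\u2018\u2019"         # curly double and single quotes
    "\u2014\u2013"                     # em and en dashes
    "\u2026"                           # ellipsis character
    "\u00a0"                           # non-breaking space
    "\u200b\u200c\u200d\u2060\ufeff"   # zero-width characters, word joiner, BOM
    "\u2022\u00ad"                     # bullet glyph, soft hyphen
)

def find_suspects(text: str) -> list[tuple[int, str, str]]:
    """Return (index, code point, Unicode name) for each suspect character."""
    found = []
    for i, ch in enumerate(text):
        if ch in SUSPECTS:
            found.append((i, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")))
    return found

print(find_suspects("It\u2019s 5\u00a0km"))
# [(2, 'U+2019', 'RIGHT SINGLE QUOTATION MARK'), (6, 'U+00A0', 'NO-BREAK SPACE')]
```

Reporting the code point rather than the glyph matters because several of these characters are invisible or look identical to their ASCII counterparts.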

Why it matters beyond looks

These characters cause exact-match searches to fail, corrupt CSV imports, break JSON parsing, and make two visually identical strings compare as unequal. In published content, mixing smart and straight quotes on the same line also reads as sloppy.
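A quick Python check makes the point concrete: a curly apostrophe looks almost the same as a straight one, yet the strings are not equal, and JSON written with curly quotes will not parse. The sample strings are illustrative.

```python
import json

straight = "don't"
curly = "don\u2019t"  # renders as don’t, easy to miss on screen

# Different code points, so equality and substring search both fail.
print(straight == curly)                 # False
print(straight in f"I {curly} know")     # False

# JSON only accepts ASCII double quotes around keys and strings.
good = '{"name": "Ada"}'
bad = "{\u201cname\u201d: \u201cAda\u201d}"
print(json.loads(good))                  # {'name': 'Ada'}
try:
    json.loads(bad)
except json.JSONDecodeError as err:
    print("Invalid JSON:", err.msg)
```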


The cleanup, step by step

  1. Run the text through the AI text cleaner to normalize smart quotes, dashes, and ellipses to plain ASCII and remove invisible characters.
  2. Use the extra space remover to collapse double spaces and trim trailing whitespace.
  3. If the source mangled line breaks (common with PDF copy-paste), fix them with the line break remover.
  4. For aggressive symbol stripping, finish with the special character remover.
  5. Need a general tidy-up at any point? The all-in-one text cleaner bundles the common fixes.
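If you prefer a script, steps 1 to 3 can be approximated in a few lines of Python. This is a minimal sketch, not how the tools above work internally: the ASCII mapping (both dashes become `-`) and the set of invisible characters are assumptions you may want to change, and the line unwrapping treats blank lines as paragraph breaks and does not rejoin words hyphenated across a PDF line break. All function names are hypothetical.

```python
import re

# Assumed ASCII mapping for step 1; adjust to taste (e.g. em dash -> "--").
PUNCTUATION = str.maketrans({
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2018": "'", "\u2019": "'",   # curly single quotes / apostrophe
    "\u2014": "-", "\u2013": "-",   # em and en dashes
    "\u2026": "...",                # ellipsis character
    "\u00a0": " ",                  # non-breaking space
})
# Zero-width characters, word joiner, BOM, and soft hyphen are deleted.
INVISIBLE = re.compile("[\u200b\u200c\u200d\u2060\ufeff\u00ad]")

def normalize_punctuation(text: str) -> str:
    """Step 1: typographic punctuation to ASCII, invisible characters removed."""
    return INVISIBLE.sub("", text.translate(PUNCTUATION))

def tidy_spaces(text: str) -> str:
    """Step 2: collapse runs of spaces/tabs and trim trailing whitespace per line."""
    text = re.sub(r"[ \t]{2,}", " ", text)
    return "\n".join(line.rstrip() for line in text.split("\n"))

def unwrap_lines(text: str) -> str:
    """Step 3: join hard-wrapped lines, keeping blank-line paragraph breaks."""
    paragraphs = re.split(r"\n\s*\n", text)
    return "\n\n".join(
        " ".join(line.strip() for line in p.splitlines() if line.strip())
        for p in paragraphs
    )

def clean(text: str) -> str:
    return unwrap_lines(tidy_spaces(normalize_punctuation(text)))

raw = "\u201cHello\u201d \u2014 it\u2019s  here\u2026\nnext line  \n\nNew para\u200b"
print(clean(raw))
```

The order matters a little: invisible characters go first so they cannot hide inside runs of spaces, and spacing is tidied before lines are joined so no double spaces survive the join.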

When to keep the fancy characters

Normalization is for code, data, and anywhere consistency matters. For polished prose meant for human readers, curly quotes and em dashes are correct typography — clean for your CSV and your code editor, but do not strip them out of an article where they belong.

Frequently asked questions

Why does code copied from a chatbot or document fail to run?

It usually contains curly quotes or em dashes substituted for straight quotes and hyphens. The compiler or interpreter sees them as different characters from the ASCII ones the language expects and rejects the code, so normalize them to plain ASCII first.
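Python makes the failure easy to reproduce. The pasted line here is an illustrative example: compiling it raises `SyntaxError` because U+201C and U+201D are not valid Python syntax, and swapping in straight quotes fixes it.

```python
# A line as it might arrive from a chat window, with curly quotes around the string.
pasted = "print(\u201cHello\u201d)"

try:
    compile(pasted, "<pasted>", "exec")
except SyntaxError as err:
    print("SyntaxError:", err.msg)

# Replace the curly quotes with straight ones and the same line compiles and runs.
fixed = pasted.replace("\u201c", '"').replace("\u201d", '"')
exec(compile(fixed, "<pasted>", "exec"))  # prints Hello
```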

What is a non-breaking space and why is it a problem?

It is a space-like character (U+00A0) that looks identical to a normal space but is a different code point, so it breaks exact matches, CSV columns, and search.
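A short Python example shows how a non-breaking space defeats an exact search even though the text looks right. The sample sentence is made up for illustration.

```python
text = "Meet me in New\u00a0York"   # NBSP between the last two words, invisible on screen

print("New York" in text)       # False: the search string uses an ordinary space
print(text.split(" "))          # ['Meet', 'me', 'in', 'New\xa0York']

# The explicit fix: replace every U+00A0 with a regular space.
cleaned = text.replace("\u00a0", " ")
print("New York" in cleaned)    # True
```

One caveat: Python's no-argument `str.split()` counts U+00A0 as whitespace, so some code paths appear to work while `split(" ")` and substring search still fail. Replacing the character is the dependable fix.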

Is my text uploaded anywhere when I clean it?

No. The cleanup tools run entirely in your browser; the text never leaves your device.

Should I always remove smart quotes and em dashes?

Only for code, data, and consistency-sensitive contexts. In prose written for people, curly quotes and em dashes are correct typography and worth keeping.
