
Text Cleaner

Clean messy text with toggles for spaces, blank lines, tabs, smart quotes, hidden characters, and HTML tags.

Why pasted text is such a mess

Copy a paragraph out of a Word doc, a PDF, or a Slack thread and paste it somewhere else. Looks fine, right? Then your build breaks, or your CSV won’t parse, or a curly apostrophe shows up as â€™ in someone’s inbox. That junk is invisible until it isn’t.

This cleaner runs your text through a stack of fixes you can switch on and off. Every toggle you flip re-runs instantly, so you watch the right pane update as you go. No button to hammer, no upload, nothing leaves the tab.

What each toggle actually does

There are eight switches. Mix and match:

  • Collapse multiple spaces turns Hello   World into Hello World. Just the runs of two or more, down to one.
  • Trim line whitespace strips leading and trailing spaces and tabs from every line. Indentation and end-of-line cruft both go.
  • Remove blank lines deletes empty lines, including the ones that only hold spaces. Triple line breaks become single.
  • Remove tabs swaps tab characters for a single space so nothing jumps to weird column widths.
  • Smart quotes/dashes to ASCII rewrites “ ” and ‘ ’ as plain " and ', em and en dashes as -, and … as three dots. The stuff that wrecks code and JSON.
  • Strip zero-width / non-printable kills zero-width spaces, byte-order marks, direction overrides, and control characters. These are the truly invisible ones that break diffs and search.
  • Strip HTML tags pulls out anything between angle brackets, so <p>Hi <b>there</b></p> collapses to Hi there.
  • Lowercase everything is off by default. Flip it when you need normalized slugs or keys.

The order is fixed and sensible: HTML and quotes get handled before spacing, so tag removal doesn’t leave double spaces behind.
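Here is a rough sketch of how a pipeline like that could be wired up. It is not the tool's actual source: the option names, the defaults (everything on except lowercase), and the exact regexes are all assumptions made for illustration.

```javascript
// Hypothetical option names and defaults; the real tool's internals are not shown here.
const DEFAULTS = {
  stripHtml: true,
  asciiQuotes: true,
  stripInvisible: true,
  removeTabs: true,
  trimLines: true,
  collapseSpaces: true,
  removeBlankLines: true,
  lowercase: false,
};

function cleanText(input, options = {}) {
  const opts = { ...DEFAULTS, ...options };
  let text = input;

  // Content-level fixes run first so they cannot leave stray spacing behind.
  if (opts.stripHtml) text = text.replace(/<[^>]*>/g, "");
  if (opts.asciiQuotes) {
    text = text
      .replace(/[\u201C\u201D]/g, '"')   // curly double quotes
      .replace(/[\u2018\u2019]/g, "'")   // curly single quotes
      .replace(/[\u2013\u2014]/g, "-")   // en and em dashes
      .replace(/\u2026/g, "...");        // single-character ellipsis
  }
  if (opts.stripInvisible) {
    // Zero-width characters, BOM, bidi controls, and control characters
    // other than tab, newline, and carriage return.
    text = text.replace(
      /[\u200B-\u200D\uFEFF\u202A-\u202E\u0000-\u0008\u000B\u000C\u000E-\u001F\u007F]/g,
      ""
    );
  }

  // Spacing fixes run last, on whatever the earlier steps left behind.
  if (opts.removeTabs) text = text.replace(/\t/g, " ");
  if (opts.trimLines) {
    text = text
      .split("\n")
      .map((line) => line.replace(/^[ \t]+|[ \t]+$/g, ""))
      .join("\n");
  }
  if (opts.collapseSpaces) text = text.replace(/ {2,}/g, " ");
  if (opts.removeBlankLines) {
    text = text
      .split("\n")
      .filter((line) => line.trim() !== "")
      .join("\n");
  }
  if (opts.lowercase) text = text.toLowerCase();
  return text;
}

// Removing the empty <b></b> leaves two spaces; the later collapse step fixes that.
console.log(cleanText("Hi <b></b> there"));                            // Hi there
console.log(cleanText("Hi <b></b> there", { collapseSpaces: false })); // Hi  there
```

The second call shows why the order matters: with collapsing off, the gap left by the removed tag stays in the output.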

A real example

Say you paste a line out of a design brief: the word “panic” wrapped in curly quotes, an em dash before “it’s”, three extra spaces between two of the words, a single-character ellipsis (…) at the end, and a sneaky zero-width space trailing it.

With the defaults on, you get back Don't "panic" - it's fine... with straight quotes, a plain hyphen, three literal dots, single spaces, and the hidden character gone. That version pastes cleanly into a code editor, a JSON file, or a database field without surprises.

The little counter under the output tells you how many characters got cut. Sometimes it’s two. Sometimes a paste from a PDF sheds a few hundred.
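To make the example concrete, here is a small sketch that rebuilds that pasted line with escape codes and cleans it by hand. The exact input string is a reconstruction from the description above, and treating the counter as a plain before/after length difference is an assumption about how the tool counts.

```javascript
// Reconstructed sample: curly quotes, an em dash, three extra spaces,
// a one-character ellipsis, and a trailing zero-width space (U+200B).
const pasted =
  "Don\u2019t \u201Cpanic\u201D \u2014 it\u2019s    fine\u2026\u200B";

const cleaned = pasted
  .replace(/[\u201C\u201D]/g, '"')  // curly double quotes to "
  .replace(/[\u2018\u2019]/g, "'")  // curly single quotes to '
  .replace(/[\u2013\u2014]/g, "-")  // en and em dashes to -
  .replace(/\u2026/g, "...")        // ellipsis character to three dots
  .replace(/\u200B/g, "")           // drop the zero-width space
  .replace(/ {2,}/g, " ");          // collapse runs of spaces

// Assumed counter: net change in length (UTF-16 code units).
const removed = pasted.length - cleaned.length;

console.log(cleaned); // Don't "panic" - it's fine...
console.log(removed); // 2
```

Note the count is a net figure in this sketch: three spaces and one hidden character go away, but the ellipsis grows from one character to three.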

Where this saves you time

Developers hit this before committing. Trailing whitespace, smart quotes that sneak into a string literal, a zero-width space that makes a variable name look identical but compile differently. One pass and it’s gone.

Writers and editors flatten formatting between apps with it. Notion, Google Docs, and Word each inject their own characters. Strip them and your text behaves the same everywhere.

Data folks lean on the blank-line and trim options when cleaning exports. Form submissions and scraped pages show up with inconsistent spacing, and tidy input means fewer parsing headaches.

One thing to keep in mind: collapsing spaces has no idea what’s inside quotes or code. If you’ve got intentionally aligned comments or an ASCII table, leave that toggle off so it doesn’t squash your spacing.
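A quick illustration of that caveat, using a made-up three-row table and the same kind of naive space-collapsing regex assumed elsewhere on this page:

```javascript
// A tiny hand-aligned table (hypothetical data).
const table = [
  "name   qty",
  "apple    3",
  "kiwi    12",
].join("\n");

// Collapse every run of two or more spaces to one.
const collapsed = table.replace(/ {2,}/g, " ");

console.log(collapsed);
// name qty
// apple 3
// kiwi 12
```

The columns no longer line up, which is exactly what you don't want for aligned comments or ASCII tables.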

FAQ

Does the cleaner send my text anywhere?

Nope. Everything runs in your browser with plain JavaScript. Close the tab and it’s gone. Nothing gets uploaded.

What counts as a zero-width character?

Zero-width spaces (U+200B), zero-width joiners and non-joiners (U+200D and U+200C), byte-order marks (U+FEFF), and bidirectional overrides (U+202D and U+202E). They have no visible glyph but they live in your text and break searches, diffs, and string comparisons.
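If you want to see where these characters are hiding before you strip them, something like the sketch below works. The character range and the findInvisible helper are assumptions for illustration, not the tool's own list or API.

```javascript
// Assumed set for this sketch: U+200B-U+200D, U+FEFF, and U+202A-U+202E.
const INVISIBLE = /[\u200B-\u200D\uFEFF\u202A-\u202E]/g;

// Hypothetical helper: report the position and code point of each hidden character.
function findInvisible(text) {
  return [...text.matchAll(INVISIBLE)].map((match) => ({
    index: match.index,
    codePoint:
      "U+" + match[0].codePointAt(0).toString(16).toUpperCase().padStart(4, "0"),
  }));
}

const clean = "user_id";
const tainted = "user\u200B_id"; // looks identical on screen

console.log(clean === tainted);                        // false
console.log(findInvisible(tainted));                   // one hit: index 4, U+200B
console.log(tainted.replace(INVISIBLE, "") === clean); // true
```

The two strings render the same, yet the comparison fails until the zero-width space is removed.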

Will stripping HTML keep the text content?

Yep. It removes the tags and leaves what’s between them. So <a href="...">click</a> becomes click. It doesn’t decode entities like &amp; though.
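A minimal sketch of that behavior, assuming the simplest possible approach of a regex that removes anything between angle brackets (the tool may do it differently):

```javascript
// Hypothetical helper: delete every <...> span, leave everything else alone.
const stripTags = (html) => html.replace(/<[^>]*>/g, "");

console.log(stripTags('<a href="...">click</a>'));  // click
console.log(stripTags("<p>Fish &amp; chips</p>"));  // Fish &amp; chips
```

Entities pass through untouched because they contain no angle brackets. One side effect of a regex this simple: a stray < followed later by a > in plain text would be treated as a tag and removed along with everything between them.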

Can I turn off the smart-quote conversion?

Sure. Uncheck that toggle and curly quotes, em dashes, and ellipses stay exactly as they were. Handy when you’re cleaning prose for print rather than code.

Why is lowercase off by default?

Because most cleanups want to preserve case. It’s there for slugs, keys, and normalization jobs, but flipping it on by accident would mangle normal paragraphs, so you opt in.

