The Beauty of Unicode: Zero-Width Characters

Published 26th Sep 2022 under: Informational


When browsing through Unicode tables, which is something nerdy Localization Engineers occasionally do, I sometimes come across characters that deserve a closer look. Did you know that Unicode includes characters that take up no width at all? Below are five of them. What could their purpose be? Let’s sort it out…
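To make the five characters discussed below concrete, here is a small Python sketch that prints their code points, general categories and official names using the standard `unicodedata` module. The list `ZERO_WIDTH_CHARS` and the helper `find_zero_width` are hypothetical names chosen for this example. It also shows the practical catch: the characters are invisible when rendered, yet each still counts as one character in a string.

```python
import unicodedata

# The five characters covered in this article (hypothetical list name).
ZERO_WIDTH_CHARS = ["\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"]

for ch in ZERO_WIDTH_CHARS:
    # All five belong to general category "Cf" (format characters).
    print(f"U+{ord(ch):04X}  {unicodedata.category(ch)}  {unicodedata.name(ch)}")

def find_zero_width(text: str) -> list[tuple[int, str]]:
    """Return (index, Unicode name) for each of the five characters found in text."""
    return [(i, unicodedata.name(ch)) for i, ch in enumerate(text) if ch in ZERO_WIDTH_CHARS]

sample = "hello\u200bworld"
print(len(sample))              # 11: the invisible character still counts
print(find_zero_width(sample))  # [(5, 'ZERO WIDTH SPACE')]
```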

Zero-width space (U+200B)

The zero-width space can be used to allow line breaks inside long words, in languages that don’t use spaces to separate words, or after certain characters such as a slash (/). Most applications treat the zero-width space like a regular space for word-wrapping purposes: a line may break at that point, even though the character itself is invisible.

In the illustration below, the first string contains no spaces of any kind, while the second string contains a zero-width space before each capital letter. When the window is narrowed, the difference in word wrapping becomes visible.

How zero-width spaces affect word breaking
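The transformation used in the illustration, a zero-width space before each capital letter, can be sketched in a few lines of Python. The sample string and the function name `add_break_opportunities` are hypothetical; the regular expression inserts U+200B at every position that is preceded by some character and followed by a capital A–Z, so the visible text stays the same while the string gains break opportunities.

```python
import re

ZWSP = "\u200b"  # ZERO WIDTH SPACE

def add_break_opportunities(text: str) -> str:
    """Insert a zero-width space before every capital letter except at the very start."""
    return re.sub(r"(?<=.)(?=[A-Z])", ZWSP, text)

original = "TheQuickBrownFoxJumpsOverTheLazyDog"  # hypothetical sample string
wrappable = add_break_opportunities(original)

print(wrappable == original)                    # False: invisible characters were added
print(wrappable.replace(ZWSP, "") == original)  # True: the visible text is unchanged
print(len(original), len(wrappable))            # 35 43
```

In HTML the same character can be written as the numeric reference `&#x200B;`.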

Zero-width non-joiner (U+200C)

The zero-width non-joiner is a non-printing character used in writing systems that make use of ligatures or joined letter forms. When placed between two characters that would otherwise be combined into a ligature or joined, a zero-width non-joiner tells the font engine to keep them separate.

In the Persian example below, the phrase “I want…” requires a zero-width non-joiner (indicated with a red line) after the first two letters to prevent the ligature from forming. If the zero-width non-joiner is missing, the ligature is formed as seen in the first line.

How a zero-width non-joiner prevents ligature
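At the code-point level, the Persian case looks like this. The sketch assumes the phrase in the illustration is the verb می‌خواهم (“I want”), spelled here as MEEM + FARSI YEH, then U+200C, then the rest of the word; that spelling is our reconstruction, not taken from the image. Listing the code points makes the otherwise invisible ZWNJ easy to spot.

```python
import unicodedata

ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER

# Assumed spelling of "I want": prefix MEEM + FARSI YEH, ZWNJ, then KHAH WAW ALEF HEH MEEM.
with_zwnj = "\u0645\u06cc" + ZWNJ + "\u062e\u0648\u0627\u0647\u0645"
without_zwnj = with_zwnj.replace(ZWNJ, "")  # the variant that joins the prefix to the verb

for ch in with_zwnj:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")
```

The two strings contain the same letters; only the presence of U+200C tells the renderer not to join the YEH to the following KHAH.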

Zero-width joiner (U+200D)

The zero-width joiner is essentially the opposite of the zero-width non-joiner. When placed between two characters that would otherwise not be connected, a zero-width joiner causes them to be rendered in their connected forms, if such forms exist.

In the Devanagari example below, adding a zero-width joiner in the second line changes the appearance of the character.

How a zero-width joiner affects character rendering
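A well-known Devanagari case (our own example, not necessarily the one in the illustration) is KA + VIRAMA + SSA, which most fonts render as the conjunct क्ष. Inserting U+200D after the virama asks the renderer to show the half form of KA followed by SSA instead. The sketch below only builds and inspects the two sequences; how they actually look depends on the font.

```python
import unicodedata

ZWJ = "\u200d"  # ZERO WIDTH JOINER

conjunct = "\u0915\u094d\u0937"              # KA + VIRAMA + SSA: usually shown as the conjunct
half_form = "\u0915\u094d" + ZWJ + "\u0937"  # ZWJ after VIRAMA: requests the half form of KA

for label, text in [("conjunct", conjunct), ("half form", half_form)]:
    names = ", ".join(unicodedata.name(ch) for ch in text)
    print(f"{label}: {text}  ->  {names}")
```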

Word joiner (U+2060)

The word joiner prevents a line break between the characters on either side of it. This is particularly useful for scripts that don’t use spaces, where it lets you manually control line breaking in applications that don’t know how to wrap such text properly. Its function is identical to that of the zero-width no-break space.

At the time of writing, not every application supports the word joiner; Internet Explorer, for example, does not. Chrome and Firefox, however, do.

In the example below, the second line contains a word joiner between the last two Chinese characters. This keeps a single character from wrapping onto the new line on its own and prevents the word for “broadband” (宽带) from being split.

How a word joiner affects Chinese text wrapping
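Python’s standard library does not implement the full Unicode line breaking algorithm, so the following is only a toy model of the behaviour in the illustration. It assumes a break is allowed between any two Chinese characters except where a word joiner sits between them, and it measures line width in visible characters. The function `toy_wrap` and the sample sentence (“we provide broadband”) are hypothetical.

```python
WJ = "\u2060"  # WORD JOINER

def toy_wrap(text: str, width: int) -> list[str]:
    """Toy CJK wrapper: breaks are allowed between any two characters, except around a WJ."""
    # Group characters into unbreakable units; a WJ glues its neighbours into one unit.
    units: list[str] = []
    for ch in text:
        if units and (ch == WJ or units[-1].endswith(WJ)):
            units[-1] += ch
        else:
            units.append(ch)
    # Greedily fill lines, counting only visible characters toward the width.
    lines: list[str] = []
    current = ""
    for unit in units:
        if current and len((current + unit).replace(WJ, "")) > width:
            lines.append(current)
            current = unit
        else:
            current += unit
    if current:
        lines.append(current)
    return lines

plain = "我们提供宽带"          # hypothetical sentence ending in 宽带 ("broadband")
joined = "我们提供宽" + WJ + "带"  # same text with a word joiner inside 宽带

print(toy_wrap(plain, 5))   # 带 ends up alone on the second line
print(toy_wrap(joined, 5))  # 宽带 moves to the second line as a whole
```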

Zero-width no-break space (U+FEFF)

The zero-width no-break space should no longer be used for its original purpose. Character U+FEFF should be used only as a byte order mark (BOM) at the start of a Unicode text file. To keep a line from breaking between two characters, use the word joiner (above) instead.
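Because U+FEFF now serves as a byte order mark, code that reads text files often has to recognise or strip it. Here is a minimal Python sketch using the BOM constants from the standard `codecs` module; the helper `detect_bom` and its labels are hypothetical. Note that the UTF-32 little-endian BOM (FF FE 00 00) begins with the UTF-16 little-endian BOM (FF FE), so the longer marks are checked first.

```python
import codecs
from typing import Optional

# Longest marks first, so UTF-32 LE is not mistaken for UTF-16 LE.
BOMS = [
    (codecs.BOM_UTF32_LE, "UTF-32 LE"),
    (codecs.BOM_UTF32_BE, "UTF-32 BE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16 LE"),
    (codecs.BOM_UTF16_BE, "UTF-16 BE"),
]

def detect_bom(raw: bytes) -> Optional[str]:
    """Return a label for the encoding indicated by a leading byte order mark, or None."""
    for bom, label in BOMS:
        if raw.startswith(bom):
            return label
    return None

data = "Hello".encode("utf-8-sig")         # "utf-8-sig" writes U+FEFF first, as EF BB BF
print(detect_bom(data))                     # UTF-8
print(data.decode("utf-8")[0] == "\ufeff")  # True: plain UTF-8 decoding keeps the BOM
print(data.decode("utf-8-sig"))             # Hello: "utf-8-sig" strips it
```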
