It handles combining characters correctly.
Simplistic implementations of ROT13, such as the one at rot13.com, when presented with the word café, leave the é (U+00E9 LATIN SMALL LETTER E WITH ACUTE) untouched and produce pnsé.
When this string is sent through a Unicode-aware text system, it may be subject to normalization, changing from pnsé to pnsé. Although these two strings look identical, in the latter the U+00E9 LATIN SMALL LETTER E WITH ACUTE has been decomposed into two characters: U+0065 LATIN SMALL LETTER E followed by U+0301 COMBINING ACUTE ACCENT.
When simplistic ROT13 is applied again, the U+0065 LATIN SMALL LETTER E is now a plain letter, so it does get rotated, to r, while the U+0301 COMBINING ACUTE ACCENT is left attached to it. The "decoded" string is cafŕ, which is not the original input.
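The failure described above can be reproduced with a short sketch. The function name naive_rot13 is hypothetical and stands in for any implementation that rotates only the ASCII letters; it is not the code used by rot13.com. NFD is used here as one example of a normalization that decomposes é.

```python
import unicodedata


def naive_rot13(text: str) -> str:
    """Hypothetical simplistic ROT13: rotates only A-Z and a-z."""
    out = []
    for ch in text:
        if "a" <= ch <= "z":
            out.append(chr((ord(ch) - ord("a") + 13) % 26 + ord("a")))
        elif "A" <= ch <= "Z":
            out.append(chr((ord(ch) - ord("A") + 13) % 26 + ord("A")))
        else:
            # Non-ASCII letters and combining marks pass through unchanged.
            out.append(ch)
    return "".join(out)


original = "caf\u00e9"                               # café, precomposed é
encoded = naive_rot13(original)                      # é is left untouched
decomposed = unicodedata.normalize("NFD", encoded)   # é becomes e + U+0301
decoded = naive_rot13(decomposed)                    # the bare e is now rotated

print([hex(ord(c)) for c in decoded])
```

The final string ends in r followed by U+0301 rather than é, so the round trip does not return the original word.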
Here, we avoid this issue by applying NFD normalization before encoding/decoding. The correct encoded form of both café (precomposed) and café (decomposed) is pnsŕ, and the decoded form of pnsŕ is, of course, café.
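A minimal sketch of the normalize-first approach, assuming Python's standard unicodedata module. The name rot13_nfd and the translation table are illustrative and are not taken from this page's actual implementation. The sketch leaves its output in NFD form; whether the real tool recomposes the result afterwards is not stated here.

```python
import string
import unicodedata

# Translation table that rotates ASCII letters by 13 places.
_ROT13_TABLE = str.maketrans(
    string.ascii_lowercase + string.ascii_uppercase,
    string.ascii_lowercase[13:] + string.ascii_lowercase[:13]
    + string.ascii_uppercase[13:] + string.ascii_uppercase[:13],
)


def rot13_nfd(text: str) -> str:
    """Decompose to NFD first, then rotate the base letters."""
    decomposed = unicodedata.normalize("NFD", text)
    # Combining marks are not in the table, so they pass through unchanged
    # and stay attached to the rotated base letter.
    return decomposed.translate(_ROT13_TABLE)


precomposed = "caf\u00e9"    # é as a single code point
decomposed = "cafe\u0301"    # e followed by a combining acute accent

# Both spellings of café encode to the same string.
assert rot13_nfd(precomposed) == rot13_nfd(decomposed)

# Decoding gives back café (in NFD form), however the input was normalized.
round_trip = rot13_nfd(rot13_nfd(precomposed))
assert unicodedata.normalize("NFC", round_trip) == precomposed
```

Because every input is decomposed before rotation, the result no longer depends on whether an intermediate system normalized the text.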