Unicode - Normalization needed after case folding
Given an NFC-normalized string, if I apply full case folding to it, can I assume the result is NFC-normalized too?
I don't understand what the Unicode standard is trying to tell me in this quote:
"Normalization interacts with case folding. For any string X, let Q(X) = NFC(toCasefold(NFD(X))). In other words, Q(X) is the result of normalizing X, then case folding the result, then putting the result into Normalization Form NFC format. Because of the way normalization and case folding are defined, Q(Q(X)) = Q(X). Repeatedly applying Q does not change the result; case folding is closed under canonical normalization for either Normalization Form NFC or NFD."
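One way to read the quoted definition is as a small function. The following is only a sketch: it assumes Python's `str.casefold()` as a stand-in for toCasefold and `unicodedata.normalize` for NFC and NFD. The name `q` and the sample strings are illustrative and do not come from the standard.

```python
import unicodedata


def q(x: str) -> str:
    """Q(X) = NFC(toCasefold(NFD(X))), with str.casefold() assumed as toCasefold."""
    decomposed = unicodedata.normalize("NFD", x)
    folded = decomposed.casefold()
    return unicodedata.normalize("NFC", folded)


# Illustrative samples only; this spot-checks the property, it does not prove it.
samples = ["Stra\u00dfe", "\u00df\u0301", "\u00c9COLE"]

# Applying Q a second time leaves each sample unchanged.
idempotent = all(q(q(s)) == q(s) for s in samples)
print(idempotent)
```

Note what the quote does and does not promise: it says Q is stable under repetition, where Q already ends with an NFC step. It does not say that case folding alone keeps a string in NFC.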
A Unicode string might not be in NFC after case folding. An example is U+00DF (LATIN SMALL LETTER SHARP S) followed by U+0301 (COMBINING ACUTE ACCENT).
X = U+00DF U+0301
NFC(X) = U+00DF U+0301
toCasefold(NFC(X)) = U+0073 U+0073 U+0301
NFC(toCasefold(NFC(X))) = U+0073 U+015B
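The same example can be checked in code. This is a minimal sketch, again assuming Python's `str.casefold()` as the full case folding operation; the helper `nfc` and the variable names are made up for illustration.

```python
import unicodedata


def nfc(s: str) -> str:
    """Return s in Normalization Form C."""
    return unicodedata.normalize("NFC", s)


x = "\u00df\u0301"  # sharp s followed by combining acute accent

# The input is already in NFC: sharp s has no precomposed form with an acute.
input_is_nfc = nfc(x) == x

# Full case folding maps sharp s to "ss", leaving the accent after the second s.
folded = x.casefold()

# The folded string is not in NFC, because s + U+0301 composes to U+015B.
folded_is_nfc = nfc(folded) == folded
renormalized = nfc(folded)

print(input_is_nfc, folded_is_nfc)
print([f"U+{ord(c):04X}" for c in renormalized])
```

So if the result has to be in NFC, normalize again after case folding rather than relying on the input having been normalized.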