"lol :Man facepalming: :medium light skintone:" becomes the skintone applying to...

rlayton2 · on March 23, 2020

Agree. I think reversing non-ASCII text should always be thought of as "per-token", where in English each character is a token. So the reverse of what you gave would be:

":medium light skintone: :Man facepalming: lol"

(with the "lol" reversed too). Framed this way, it is a much harder problem than, say, mystring[::-1] in Python. So "reverse a string" is a different problem from "reverse an array".

Accented characters would be kept as is in my scenario.
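A rough Python sketch of the difference between reversing code points and reversing user-perceived characters, using the facepalm-with-skin-tone emoji from above. The helpers `clusters` and `reverse_clusters` are hypothetical and use a deliberately simplified rule (combining marks, skin-tone modifiers, and zero-width-joiner sequences stick to the preceding character). This is NOT the full Unicode grapheme-cluster algorithm (UAX #29); for real segmentation something like the third-party `regex` module's `\X` pattern would be a better starting point.

```python
import unicodedata

ZWJ = "\u200d"  # ZERO WIDTH JOINER: glues emoji into one sequence

def _extends(ch: str) -> bool:
    """Simplified stand-in for 'attaches to the previous character' (not UAX #29)."""
    return (
        unicodedata.category(ch).startswith("M")  # combining marks, incl. variation selectors
        or 0x1F3FB <= ord(ch) <= 0x1F3FF          # emoji skin-tone modifiers
        or ch == ZWJ
    )

def clusters(text: str) -> list[str]:
    """Split text into approximate grapheme clusters (hypothetical helper)."""
    out: list[str] = []
    for ch in text:
        # Glue extenders, and whatever follows a ZWJ, onto the previous cluster.
        if out and (_extends(ch) or out[-1].endswith(ZWJ)):
            out[-1] += ch
        else:
            out.append(ch)
    return out

def reverse_clusters(text: str) -> str:
    """Reverse by approximate cluster instead of by code point."""
    return "".join(reversed(clusters(text)))

facepalm = "\U0001F926\U0001F3FC\u200d\u2642\ufe0f"  # man facepalming: medium-light skin tone
s = "lol " + facepalm

print(repr(s[::-1]))                              # code-point reversal: VS16 first, skin tone detached
print(reverse_clusters(s) == facepalm + " lol")   # True: the emoji survives intact
print(reverse_clusters("cafe\u0301"))             # the combining accent stays on its "e"
```

The naive s[::-1] splits the emoji sequence apart, while the cluster-wise version keeps it (and the accented "e") whole, at the cost of needing a segmentation rule at all.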

PeCaN · on March 23, 2020

The "tokens" you're thinking of are "grapheme clusters" in Unicode.

Unfortunately, reversing by grapheme clusters alone doesn't solve the problem, because of directional formatting codes: if you have, e.g., a right-to-left embedding (U+202B) followed by a pop directional formatting (U+202C), you can't naively reverse them, since the pop would then come before the embedding it is supposed to close.
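A small Python sketch of the embedding/pop problem. The helper `pops_are_balanced` is hypothetical; it only tracks the explicit embeddings and overrides (LRE, RLE, LRO, RLO) and their shared closer PDF, and ignores the newer isolate pair (LRI/RLI/FSI with PDI) for simplicity. In this input none of the characters combine with their neighbours, so reversing by cluster gives the same result as reversing by code point.

```python
# Explicit directional embedding/override openers and their shared closer.
OPENERS = {"\u202a", "\u202b", "\u202d", "\u202e"}  # LRE, RLE, LRO, RLO
PDF = "\u202c"                                        # POP DIRECTIONAL FORMATTING

def pops_are_balanced(text: str) -> bool:
    """True if no PDF appears before an embedding/override it could close."""
    depth = 0
    for ch in text:
        if ch in OPENERS:
            depth += 1
        elif ch == PDF:
            if depth == 0:
                return False  # a pop with nothing open to pop
            depth -= 1
    return True

s = "abc\u202bxyz\u202c"  # "xyz" wrapped in a right-to-left embedding

print(pops_are_balanced(s))        # True
print(pops_are_balanced(s[::-1]))  # False: the PDF now precedes the RLE
```

A reversal routine that wants to preserve the markup would have to treat each opener/PDF pair as a bracket and swap them, rather than moving each code independently.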

naniwaduni · on March 23, 2020

Grapheme clusters are a poor approximation of the vaguely-defined linguistic-level concept you're groping for.

PeCaN · on March 23, 2020

Well, yes, but we gotta stop somewhere or just give up any hope of computers operating on text.

Although I think grapheme clusters are a pretty good approximation, in that they're usually the unit you want backspace to delete in a word processor.

diegoperini · on March 23, 2020

Is there a better approximation?