"lol :Man facepalming: :medium light skintone:" reversed leaves the skintone applying to nothing (which might crash?) and a man of the wrong colour. (e + accent, a), which makes éa, becomes (a + accent, e), which incorrectly makes áe, or possibly makes an invalid combination. Right-to-left markers[1] and left-to-right markers will change which sections of the text are reversed unless you swap them over.
Codepoints can combine more than once, to the point where, if you're nitpicky enough, you can't validly take a substring either; you can only read a string from the first codepoint onwards. They could become invalid sequences if reversed, possibly?
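To make the accent case concrete, here is a minimal Python sketch. It assumes the accent is stored as a separate combining codepoint (U+0301) rather than as a precomposed letter; the names naive_reverse, original and reversed_text are just for illustration.

```python
import unicodedata

def naive_reverse(s: str) -> str:
    # Slicing reverses codepoint by codepoint, with no notion of clusters.
    return s[::-1]

# "e" + COMBINING ACUTE ACCENT + "a" renders as "éa".
original = "e\u0301a"
reversed_text = naive_reverse(original)

# The accent now follows "a", so the result renders as "áe" rather than "aé".
print([unicodedata.name(ch) for ch in reversed_text])
```

A precomposed é (U+00E9) would survive this reversal, so whether the bug shows up depends on how the input happens to be normalized.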
Agree. I think reversing non-ASCII text should always be thought of as "per-token", where in English each character is a token. So the reverse of what you gave would be:
":medium light skintone: :Man facepalming: lol"
(with the lol reversed). That makes it a much harder problem than, say, mystring[::-1] in Python. So "reverse a string" is a different problem from "reverse an array".
Accented characters would be kept as is in my scenario.
The "tokens" you're thinking of are "grapheme clusters" in Unicode.
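A rough sketch of reversing by cluster rather than by codepoint, in Python. This is only an approximation and not the full segmentation algorithm from the Unicode standard: it assumes a cluster is a base codepoint followed by combining marks, emoji skin tone modifiers, or zero-width-joiner sequences, and it ignores other cases such as Hangul jamo and regional-indicator flags. The helper names are made up for this example.

```python
import unicodedata

ZWJ = "\u200d"  # ZERO WIDTH JOINER, glues emoji sequences together

def is_extender(ch: str) -> bool:
    # Combining marks (includes variation selectors), skin tone modifiers, ZWJ.
    return (
        unicodedata.category(ch) in ("Mn", "Mc", "Me")
        or "\U0001F3FB" <= ch <= "\U0001F3FF"
        or ch == ZWJ
    )

def approx_clusters(s: str) -> list[str]:
    clusters: list[str] = []
    for ch in s:
        # Attach extenders, and whatever follows a ZWJ, to the previous cluster.
        if clusters and (is_extender(ch) or clusters[-1].endswith(ZWJ)):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters

def reverse_clusters(s: str) -> str:
    return "".join(reversed(approx_clusters(s)))

# Man facepalming, medium-light skin tone: base + modifier + ZWJ + sign + VS16.
facepalm = "\U0001F926\U0001F3FC\u200d\u2642\ufe0f"
print(reverse_clusters("lol" + facepalm) == facepalm + "lol")
```

With this grouping the emoji sequence and the accented letter each move as one unit. For real text, a library that implements the full grapheme cluster rules would be the safer choice.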
Unfortunately, just reversing by grapheme clusters doesn't solve the problem, because of directional formatting codes: if you have, e.g., a right-to-left embedding followed by a pop directional formatting, you can't naively reverse them.
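A small Python sketch of why the pairing breaks. It only looks at the explicit embedding and override characters (U+202A, U+202B, U+202D, U+202E) and the pop character (U+202C), and it treats them like brackets; the function name embeddings_balanced is made up for this example, and the isolate characters are left out.

```python
LRE, RLE = "\u202a", "\u202b"  # left-to-right / right-to-left embedding
LRO, RLO = "\u202d", "\u202e"  # left-to-right / right-to-left override
PDF = "\u202c"                 # pop directional formatting

OPENERS = {LRE, RLE, LRO, RLO}

def embeddings_balanced(s: str) -> bool:
    # Treat openers and PDF like brackets: a pop needs an earlier opener.
    depth = 0
    for ch in s:
        if ch in OPENERS:
            depth += 1
        elif ch == PDF:
            if depth == 0:
                return False
            depth -= 1
    return depth == 0

text = "abc" + RLE + "xyz" + PDF
print(embeddings_balanced(text))        # opener comes before the pop
print(embeddings_balanced(text[::-1]))  # pop now comes before the opener
```

Each of these characters is its own grapheme cluster, so a cluster-aware reversal moves them just like a codepoint reversal does: the pop ends up ahead of the embedding it was meant to close.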
[1] https://en.wikipedia.org/wiki/Right-to-left_mark