
If your character encoding is fixed-width (e.g. UTF-32), then yes, every code point takes exactly one code unit, so the two are effectively the same thing. If your encoding is variable-width, then a code point is a "character" and a code unit is the thing that a character may take one or more of (i.e. a byte in UTF-8, or a 2-byte unit in UTF-16).
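
To make the distinction concrete, here's a rough Python sketch that counts code points and code units for the same string under a few encodings. The helper name `count_units` is made up for illustration, and it relies on Python 3's `len(str)` counting code points.

```python
def count_units(text):
    """Count code points and code units of `text` (hypothetical helper)."""
    return {
        # Python 3 strings are sequences of code points.
        "code_points": len(text),
        # UTF-8 code units are single bytes.
        "utf8_units": len(text.encode("utf-8")),
        # UTF-16 code units are 2 bytes each ("-le" avoids a BOM).
        "utf16_units": len(text.encode("utf-16-le")) // 2,
        # UTF-32 code units are 4 bytes each: one per code point.
        "utf32_units": len(text.encode("utf-32-le")) // 4,
    }


# 'a' (U+0061), 'é' (U+00E9), and an emoji outside the BMP (U+1F600)
counts = count_units("a\u00e9\U0001F600")
print(counts)
```

Only in the fixed-width case (UTF-32) does the unit count match the code point count for every string.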

The point is that it's better to treat strings as sequences of Unicode code points ("characters") than as sequences of 16-bit UTF-16 code units, like Java does.
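
As a rough illustration of why that matters, the sketch below emulates indexing by UTF-16 code unit in Python and compares it with indexing by code point. It's a simulation, not actual Java, and `utf16_units` is a hypothetical helper.

```python
import struct


def utf16_units(text):
    """Return the UTF-16 code units of `text` as a list of ints (hypothetical helper)."""
    data = text.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(data) // 2), data))


s = "a\U0001F600b"  # 'a', an emoji outside the BMP, 'b'
units = utf16_units(s)

# Indexing by code point gives the whole character.
by_code_point = s[1]

# Indexing by UTF-16 code unit gives half of a surrogate pair.
by_code_unit = units[1]
is_lone_surrogate = 0xD800 <= by_code_unit <= 0xDBFF

print(len(s), len(units), hex(by_code_unit), is_lone_surrogate)
```

With code-unit indexing, index 1 lands on a high surrogate that isn't a character by itself, and 'b' ends up at index 3 instead of 2.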


