[tech] Zero-width characters and tracking via pasted text

Oliver Simmons oliversimmo at gmail.com
Mon Mar 15 16:43:33 GMT 2021


On Sun, 14 Mar 2021 at 16:55, nervuri <nervuri at disroot.org> wrote:
>
> First, as a point of reference, here are a few positive-width Unicode
> characters:
> 0020: _ _ | 00E9: _é_ | 03A9: _Ω_ | 5B57: _字_ | 1F407: __
>

All fine for me!
(Gmail seems to strip emoji in plain-text replies though, which is rather odd.)

>  FFF9: __
>  FFFA: __
>  FFFB: __

These three show as the replacement box for me.
I've never quite understood what the interlinear annotation characters
(anchor, separator, terminator) are for, but I think they're some form
of format/control character, so having them display as a box when used
incorrectly might be correct.
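
For anyone who wants to check what these three code points are, here is
a minimal Python sketch using the standard unicodedata module. The
helper name describe() is just something made up for illustration; the
output depends on the Unicode database shipped with your Python.

```python
import unicodedata

def describe(codepoint):
    """Return (general category, official name) for one code point."""
    ch = chr(codepoint)
    return unicodedata.category(ch), unicodedata.name(ch)

# U+FFF9..U+FFFB are the three interlinear annotation characters.
for cp in (0xFFF9, 0xFFFA, 0xFFFB):
    category, name = describe(cp)
    print(f"U+{cp:04X} {category} {name}")
```

A category of "Cf" means "format character", i.e. something with no
glyph of its own that is meant to be interpreted rather than drawn.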

>
> E0020: _󠀠_
> ... (E0020–E007F used for invisibly tagging texts by language)
> E007F: _󠁿_
>

These *were* used for tagging texts by language, but that use has been
deprecated in favour of out-of-band (non-Unicode) metadata.
They are now used in emoji tag sequences, which encode flags (though
not widely supported) for regions whose codes are longer than 2
characters, such as US states or the countries of the UK (England,
Scotland, Wales).
Wikipedia has a reasonable description of their history.
=> https://en.wikipedia.org/wiki/Tags_(Unicode_block)
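
To make the mechanism concrete, here is a rough Python sketch of how
tag characters mirror printable ASCII (tag = U+E0000 + ASCII value) and
how a subdivision flag is built as black flag + tagged region code +
cancel tag. The function names are my own invention, not any library's
API, and whether the flag actually renders depends on the font. Since
the tags are invisible, the same trick can hide an identifier in pasted
text, so a strip function is included too.

```python
TAG_BASE = 0xE0000           # tag character = TAG_BASE + ASCII code
CANCEL_TAG = "\U000E007F"    # terminates an emoji tag sequence
BLACK_FLAG = "\U0001F3F4"    # base emoji for subdivision flags

def to_tags(ascii_text):
    """Map printable ASCII (0x20..0x7E) onto invisible tag characters."""
    for c in ascii_text:
        if not 0x20 <= ord(c) <= 0x7E:
            raise ValueError(f"not printable ASCII: {c!r}")
    return "".join(chr(TAG_BASE + ord(c)) for c in ascii_text)

def subdivision_flag(code):
    """Build a flag sequence from a region code such as 'gbeng'."""
    return BLACK_FLAG + to_tags(code.lower()) + CANCEL_TAG

def decode_tags(text):
    """Recover the ASCII text carried by any tag characters in text."""
    return "".join(
        chr(ord(c) - TAG_BASE) for c in text if 0xE0020 <= ord(c) <= 0xE007E
    )

def strip_tags(text):
    """Remove every character in the Tags block (U+E0000..U+E007F)."""
    return "".join(c for c in text if not 0xE0000 <= ord(c) <= 0xE007F)

england = subdivision_flag("gbeng")
print([f"U+{ord(c):04X}" for c in england])
print(decode_tags(england))
```

Note that strip_tags() will also break legitimate flag sequences (you
are left with a plain black flag), so it is a blunt instrument.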

-Oliver Simmons

