[tech] Zero-width characters and tracking via pasted text

nervuri nervuri at disroot.org
Mon Mar 22 13:59:14 GMT 2021


On Mon, Mar 15, 2021, Oliver Simmons wrote:
>> E0020: _󠀠_
>> ... (E0020–E007F used for invisibly tagging texts by language)
>> E007F: _󠁿_
>
>These *were* used for tagging texts by language, but have been
>deprecated in favour of using other non-Unicode metadata for this
>purpose.
>They are planned to be used in emojis and are (were?) used (but not
>widely supported) for country codes/flags with codes longer than 2
>characters (3?), such as USA states or counties of England.
>Wikipedia has a ~ok description of their history.
>=> https://en.wikipedia.org/wiki/Tags_(Unicode_block)

Thanks, I replaced "used" with "formerly used".  Wikipedia says "The
release of Emoji 5.0 in March 2017 considers these characters to be
emoji for use as modifiers in special sequences."  I take that to mean
that they will remain zero-width, but will generate emojis when used in
special sequences, as with the flag of England:

🏴󠁧󠁢󠁥󠁮󠁧󠁿
=
🏴<U+E0067><U+E0062><U+E0065><U+E006E><U+E0067><U+E007F>
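
For anyone who wants to poke at this, here is a rough Python sketch that
builds the sequence from its parts and prints it with the tag characters
made visible.  The function names (tag_sequence, describe) are made up
for illustration; the only facts it relies on are that each tag
character is its ASCII counterpart plus 0xE0000, and that U+E007F
(CANCEL TAG) closes the sequence.

```python
TAG_BASE = 0xE0000         # start of the Unicode "Tags" block
CANCEL_TAG = 0xE007F       # terminates an emoji tag sequence
BLACK_FLAG = "\U0001F3F4"  # base emoji for subdivision flags


def tag_sequence(region: str) -> str:
    """Build a flag tag sequence from a lowercase ASCII region code.

    Hypothetical helper: maps each ASCII character to its tag
    counterpart (code point + 0xE0000) and appends CANCEL TAG.
    """
    tags = "".join(chr(TAG_BASE + ord(c)) for c in region)
    return BLACK_FLAG + tags + chr(CANCEL_TAG)


def describe(text: str) -> str:
    """Replace each invisible tag character with a visible <U+XXXX> marker."""
    return "".join(
        f"<U+{ord(c):04X}>" if TAG_BASE <= ord(c) <= CANCEL_TAG else c
        for c in text
    )


england = tag_sequence("gbeng")
print(describe(england))
```

The same describe() function can be run over any pasted text to reveal
tag characters hiding in it, since they render as nothing at all outside
a supported flag sequence.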

Unicode keeps getting weirder.

