[spec] Limit valid encodings of text/gemini to UTF-8

Stephane Bortzmeyer stephane at sources.org
Sun Jan 3 13:49:43 GMT 2021


On Mon, Dec 28, 2020 at 02:16:27PM +0100,
 Philip Linde <linde.philip at gmail.com> wrote 
 a message of 69 lines which said:

> While it is the case that impact is minimal, I suggest that the
> specification reflects the much simpler situation these statistics
> indicate rather than keep itself open to the general problem of
> representing text/gemini in encodings that might not even have the
> meta information characters encoded in the same way, and—if IRIs are
> introduced—creates the problem of how IRIs should be represented in
> e.g. ISO-8859-1.

Note also that saying "gemtexts MUST be in UTF-8" does not settle
everything. We may (or may not) also want to mandate an end-of-line
convention (lines can be terminated with CR, LF, CR-LF, LS or PS, the
last two being Unicode-only, absent from ASCII) and a normalization form.
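
To illustrate the end-of-line point, here is a minimal Python sketch of
what a client would have to do if all five terminators were accepted.
The names EOL and split_lines are made up for this example and come
from no Gemini library.

```python
import re

# The five end-of-line conventions listed above. CR-LF comes first so
# that it is matched as one terminator, not as CR followed by LF.
EOL = re.compile("\r\n|\r|\n|\u2028|\u2029")

def split_lines(text: str) -> list[str]:
    """Split text on CR-LF, CR, LF, LS (U+2028) or PS (U+2029)."""
    return EOL.split(text)

sample = "a\r\nb\nc\rd\u2028e\u2029f"
print(split_lines(sample))  # ['a', 'b', 'c', 'd', 'e', 'f']
```

Mandating a single terminator in the specification would make this
kind of tolerant splitting unnecessary.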

If we go that way, there is an existing standard for Unicode text, RFC
5198 <gemini://gemini.bortzmeyer.org/rfc-mirror/rfc5198.txt>. It
mandates CR-LF line endings and recommends NFC normalization (a MUST
and a SHOULD, respectively).
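
As a rough sketch of what following RFC 5198 on these two points could
look like, the Python below rewrites a UTF-8 byte string with CR-LF
line endings and NFC normalization. The function name canonicalize is
hypothetical, and the sketch covers only those two points: RFC 5198
has further rules (on control characters, for instance) that are not
handled here.

```python
import re
import unicodedata

# Same five terminators as discussed above, CR-LF first.
EOL = re.compile("\r\n|\r|\n|\u2028|\u2029")

def canonicalize(raw: bytes) -> bytes:
    """Return raw re-encoded with CR-LF line endings and NFC text."""
    text = raw.decode("utf-8")  # raises UnicodeDecodeError if not UTF-8
    text = EOL.sub("\r\n", text)  # every terminator becomes CR-LF
    text = unicodedata.normalize("NFC", text)  # compose characters
    return text.encode("utf-8")

# "e" followed by a combining acute accent (U+0301), ended with a bare LF.
decomposed = "cafe\u0301\n".encode("utf-8")
print(canonicalize(decomposed))  # b'caf\xc3\xa9\r\n'
```

Whether such rewriting belongs in authoring tools, in servers, or
nowhere at all is exactly the kind of question the specification would
have to answer.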


