[spec] Limit valid encodings of text/gemini to UTF-8

Petite Abeille petite.abeille at gmail.com
Sun Jan 3 16:02:54 GMT 2021



> On Jan 3, 2021, at 14:46, Stephane Bortzmeyer <stephane at sources.org> wrote:
> 
> UTF-8 has a quasi-monopoly.

Not quite.

For text/gemini, your stats read:

• Unspecified: 42,322
• utf-8: 6,513
• us-ascii: 3

Unspecified rules, by far. In practice, most of it is likely plain ASCII.


Could you run 'file --mime-type --mime-encoding' on all these text/gemini documents?

$ openssl s_client -quiet -crlf -connect mozz.us:1965 <<< gemini://mozz.us/ 2>/dev/null | file --brief --mime-type --mime-encoding -
text/plain; charset=utf-8


Validating the encoding would be informative as well:

$ openssl s_client -quiet -crlf -connect mozz.us:1965 <<< gemini://mozz.us/ 2>/dev/null | iconv -f utf-8 -t utf-8 > /dev/null; echo $?
0
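A rough sketch that folds the ASCII question and the iconv validation into one verdict per document. It assumes a POSIX sh, tr, mktemp, and an iconv (as in glibc) that exits non-zero on malformed input. classify_encoding is a made-up name, not an existing tool:

```shell
#!/bin/sh
# classify_encoding: hypothetical helper, not an existing tool.
# Reads a byte stream on stdin and prints "us-ascii", "utf-8" or "invalid".
classify_encoding() {
  tmp=$(mktemp) || return 1
  cat > "$tmp"
  # tr deletes every byte in 0x00-0x7F; whatever survives is non-ASCII.
  non_ascii=$(LC_ALL=C tr -d '\000-\177' < "$tmp" | wc -c | tr -d ' ')
  if [ "$non_ascii" -eq 0 ]; then
    result=us-ascii
  # Non-ASCII bytes present: a UTF-8 to UTF-8 round trip fails on malformed input.
  elif iconv -f UTF-8 -t UTF-8 < "$tmp" > /dev/null 2>&1; then
    result=utf-8
  else
    result=invalid
  fi
  rm -f "$tmp"
  echo "$result"
}
```

For example, pipe the openssl s_client command above into classify_encoding instead of iconv. The piped stream also carries the Gemini response header line, which is normally plain ASCII, so it does not change the verdict.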


Ditto for guessing the actual language:

# echo $(openssl s_client -quiet -crlf -connect mozz.us:1965 <<< gemini://mozz.us/ 2>/dev/null ) | polyglot detect | cut -d' ' -f1 | uniq
English

https://polyglot.readthedocs.io/en/latest/Detection.html


℀ ±𝟤¢


