Ambiguity in spec regarding line endings

Ryan Kavanagh rak at rak.ac
Thu Jun 4 17:08:44 BST 2020


I'm reading the current version of the spec, and have come across the
following ambiguous paragraph in §3.3:

    When in canonical form, media subtypes of the "text" type use CRLF
    as the text line break.  Gemini relaxes this requirement and allows
    the transport of text media with plain LF alone (but NOT a plain CR
    alone) representing a line break when it is done consistently for an
    entire response body.  Gemini clients MUST accept CRLF and bare LF
    as being representative of a line break in text media received via
    HTTP.

How do the second and third sentences interact? In particular, how does

    [...] when it is done consistently for an entire response body.

interact with

    Gemini clients MUST accept CRLF and bare LF as being representative
    of a line break in text media received via HTTP.

How should Gemini clients behave when both CRLF and LF appear in the
same text/gemini transmission? Are both to be equivalently treated as
line breaks?

I've looked through the archives to see what has been said in the past
about line breaks, and the two following messages appear most relevant:

On Sat, Sep 07, 2019 at 04:30:14PM -0400, Jason McBrayer wrote:
> IMO, it makes sense to require CRLF in the plain text parts of the
> protocol (after requests, after the status line of a response), but I
> don't think that the text/gemini file format needs to have CR/LF; IMO
> clients should be prepared to accept either LF or CR/LF just as they
> would with text/plain. And maybe if we're serious about supporting old
> devices, clients should be prepared for bare CR, too (Classic MacOS).
> But it's a pain in the arse to authors to have to save text documents
> with non-native line endings, and I don't feel like servers need to be
> in the business of reformatting the content they serve.

On Sun, Sep 08, 2019 at 02:42:08PM +0000, solderpunk wrote:
> I will admit that the current liberal use of CRLF throughout the
> Gemini spec is the result of me blindly copying from Gopher and other
> RFCs (as Sean mentioned, it's ubiquitous).

Here's [0,1] some of the history of requiring CRLF in network protocols
and in requiring CRLF for text/ subtypes [2] during transmission.

TL;DR: every system has a different native line ending sequence (LF vs
CR vs CRLF). To ensure all can communicate with each other (and to
simplify parsing of communications), transmissions are required to
represent all line endings in text formats by CRLF. Line endings used in
the local storage of text files have *nothing to do* with the line
endings used in transmission, and clients are expected to convert from
CRLF to whatever local format is preferred. So indeed, servers are in
the business of reformatting text/* content that they serve, and they do
so to ensure interoperability between systems with different line ending
conventions.

I think there's a conceptual point to be made here: text/gemini files
are not binary data, but rather, *text files*. This means that their
transmission should not attempt to provide byte-for-byte identical
copies of the local data, but should instead follow well-defined and
agreed-upon representations. If your goal is to transmit a byte-for-byte
identical copy of your file, there are other mime types you can use to
accomplish this (e.g., application/octet-stream).

The FTP protocol makes a similar conceptual distinction. It allows for
text transmission (ASCII and EBCDIC types), where end-of-lines are
defined to be CLRF (ASCII type) and NL (EBCDIC type). It also allows for
a stream / binary transfer mode for transmitting text (and other data)
without any conversion. Quoting from the RFC [4, §3.4]:

    For the purpose of standardized transfer, the sending host will
    translate its internal end of line or end of record denotation into
    the representation prescribed by the transfer mode and file
    structure, and the receiving host will perform the inverse
    translation to its internal denotation.  [...]  Since these
    transformations imply extra work for some systems, identical systems
    transferring non-record structured text files might wish to use a
    binary representation and stream mode for the transfer.

However, in keeping with Postel's law, I suggest allowing clients to
accept LF as a line ending, as is done by RFC 7230 §3.5 [3]:

     Although the line terminator for the start-line and header fields
     is the sequence CRLF, a recipient MAY recognize a single LF as a
     line terminator and ignore any preceding CR.

Conclusion:

To eliminate ambiguity and to make the gemini protocol consistent with
every other text transmission protocol I know of, I propose amending the
ambiguous paragraph in the spec as follows:

    As specified in RFC 2046 §4.1.1, the canonical form of any MIME
    "text" subtype MUST always represent a line break as a CRLF
    sequence. For robustness, a recipient MAY recognize a single LF as
    a line terminator and ignore any preceding CR in text media.

Best,
Ryan

[0] https://www.rfc-editor.org/old/EOLstory.txt
[1] https://tools.ietf.org/html/rfc318
    [ page 8, "End of Line Convention" ]
[2] https://tools.ietf.org/html/rfc2046#section-4.1.1
[3] https://tools.ietf.org/html/rfc7230#section-3.5
[4] https://tools.ietf.org/html/rfc959

-- 
|)|/  Ryan Kavanagh      | GPG: 4E46 9519 ED67 7734 268F
|\|\  https://rak.ac     |      BD95 8F7B F8FC 4A11 C97A
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 1873 bytes
Desc: not available
URL: <https://lists.orbitalfox.eu/archives/gemini/attachments/20200604/fbf5d0ed/attachment.sig>


More information about the Gemini mailing list