[SPEC] Backwards-compatible metadata in Gemini
cowan at ccil.org
Wed Feb 24 14:03:23 GMT 2021
On Tue, Feb 23, 2021 at 4:16 AM Lars Noodén <lars.nooden at gmx.com> wrote:
> The metadata does not have to be marked up in a difficult manner to be
> both machine readable and human readable. Borrowing from the link
> syntax ,
> which could look like this in the body, but would be up to the client as
> to how it is dealt with.
I agree that this syntax makes sense and is easy to read: as you say, it
degrades to plain text nicely, so there is no pressure to do anything in
particular with it. At the same time, it makes the information available to
search engines, so that documents can be found by their metadata.
VERY IMPORTANT: I do *not* think that this convention needs to be part of
the text/gemini spec, because understanding it is not a requirement for
clients. One client (for human use) can render an =: line as an ordinary
gemtext line; another client can ignore such lines; a smarter client can
render them but translate "language en" into "English language" and "format
text/html" into "HTML".
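As a rough illustration of those three client behaviours, here is a minimal Python sketch. The function name `render`, the mode names, and the two lookup tables are all hypothetical and invented for this example; nothing here is part of any Gemini spec or existing client.

```python
# Hypothetical display-name tables; a real client would need far larger ones.
LANGUAGES = {"en": "English"}
FORMATS = {"text/html": "HTML", "text/plain": "plain text"}


def render(line, mode="plain"):
    """Return the text to show for one gemtext line, or None to hide it.

    mode "plain":    show "=:" lines like any other text line
    mode "hide":     drop "=:" lines entirely
    mode "friendly": translate a few well-known keys for human readers
    """
    if not line.startswith("=:"):
        return line  # not metadata: always shown unchanged
    if mode == "plain":
        return line
    if mode == "hide":
        return None

    # "friendly" mode: split into key and value on the first run of whitespace.
    parts = line[2:].strip().split(None, 1)
    key = parts[0] if parts else ""
    value = parts[1] if len(parts) > 1 else ""
    if key == "language":
        return LANGUAGES.get(value, value) + " language"
    if key == "format":
        # Ignore parameters such as "; charset=..." when looking up the name.
        media_type = value.split(";", 1)[0].strip()
        return FORMATS.get(media_type, value)
    return line  # unknown key: fall back to the raw line
```

For instance, `render("=: language en", "friendly")` gives "English language", while the same line in "hide" mode gives `None`.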
Search engines and other metadata processors have the same three choices.
A simple approach to making use of metadata is to treat "creator Crowther,
Mary" in the index as if it were "creator:Crowther creator:Mary", thus
allowing people to search for these things Google-style, without confusing
them with subject:Crowther.
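A minimal Python sketch of that indexing idea follows. The function name `index_terms` is hypothetical, and splitting values on non-word characters is an assumption made for brevity; a real search engine would tokenize more carefully (dates, for instance, would be broken into pieces here).

```python
import re


def index_terms(line):
    """Turn one "=: key value" line into "key:word" index terms.

    Lines that are not metadata, or that have no value, yield no terms.
    """
    if not line.startswith("=:"):
        return []
    parts = line[2:].strip().split(None, 1)
    if len(parts) < 2:
        return []
    key, value = parts
    # Assumption: every run of word characters in the value is one term.
    return [f"{key}:{word}" for word in re.findall(r"\w+", value)]
```

Applied to "=: creator Crowther, Mary Owens", this yields creator:Crowther, creator:Mary and creator:Owens, so a query for creator:Mary will not match a document whose subject is Mary.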
Here are some lines in the above format for characterizing one of Project
Gutenberg's books. This is an extensive example: I do not mean that
typical metadata creators will use anything this complex.
=: pgterms.ebook 22222
=: creator Crowther, Mary Owens
=: language en
=: subject Etiquette
=: type Text
=: title How to Write Letters (Formerly The Book of Letters) A Complete
Guide to Correct Business and Personal Correspondence
=: issued 2007-08-02
=: lcc PE
=: rights Public domain in the USA
=: publisher Project Gutenberg
=> https://www.gutenberg.org/files/22222/22222.txt =: format text/plain;
charset=us-ascii =: size 392109 bytes
=> https://www.gutenberg.org/ebooks/22222.kindle.images =: format
application/x-mobipocket-ebook =: size 3304322 bytes
=> https://www.gutenberg.org/files/22222/22222-8.txt =: format text/plain;
charset=iso-8859-1 =: size 392115 bytes
=> https://www.gutenberg.org/ebooks/22222.kindle.noimages =: format
application/x-mobipocket-ebook =: size 917781 bytes
=> https://www.gutenberg.org/files/22222/22222-h/22222-h.htm =: format
text/html; charset=iso-8859-1 =: size 508856 bytes
=> https://www.gutenberg.org/ebooks/22222.rdf =: format application/rdf+xml
Note that some metadata lines are actually links to other formats of this
book, so a metadata-aware processor would look at links and see that after
the URL there is an "=:" and process it as metadata. For this reason, I do
not think that metadata lines should be required to be in a fixed place in
the document: I have put the links at the end because they are most likely
less important to people than the rest of the metadata.
In addition, "=:" lines can be joined together if they are related, with a
second "=:" on the same line, since that is unlikely to be part of the
value. This provides the benefit of structured metadata with a depth of 1.
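To make the two rules above concrete (metadata after a link URL, and several "=:" pairs on one line), here is a minimal Python sketch of a metadata-aware line parser. The name `parse_metadata` and its return shape are hypothetical. It also assumes that each gemtext line arrives as a single line, unlike the mail-wrapped listing above, and that values never contain "=:".

```python
def parse_metadata(line):
    """Return (url, pairs) for one gemtext line.

    url   is the link target for "=>" lines, otherwise None.
    pairs is a list of (key, value) tuples, empty if the line has no metadata.
    """
    url = None
    if line.startswith("=>"):
        parts = line[2:].strip().split(None, 1)
        if not parts:
            return None, []
        url = parts[0]
        rest = parts[1] if len(parts) > 1 else ""
        if not rest.startswith("=:"):
            return url, []  # ordinary link with a plain label
    elif line.startswith("=:"):
        rest = line
    else:
        return None, []

    pairs = []
    # Each "=:" starts a new key/value pair (structure of depth 1).
    for segment in rest.split("=:"):
        fields = segment.strip().split(None, 1)
        if fields:
            value = fields[1] if len(fields) > 1 else ""
            pairs.append((fields[0], value))
    return url, pairs
```

Given the first link line of the example, once unwrapped, this returns the URL together with the pairs ("format", "text/plain; charset=us-ascii") and ("size", "392109 bytes").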
Note to Lars and other metadata people: I have simplified
"dcterms.creator" to "creator" and "dcterms.subject.LCSH" to "subject", so
as not to be too scary-looking. I have also omitted some of the available
formats of this particular book.
John Cowan http://vrici.lojban.org/~cowan cowan at ccil.org
Unless it was by accident that I had offended someone, I never apologized.