Metadata Without A Proposal

nothien at uber.space
Fri Feb 26 12:59:55 GMT 2021


Philip Linde <linde.philip at gmail.com> wrote:
> > 2. Must not be English-specific.
> 
> What is the preferable alternative? We could use numbers to indicate
> element type, but ultimately numbers are dependent on numeral systems,
> which depend on language and culture.

... Did you read the rest of the e-mail?  I listed specific ways in
which we can support date, author, and license metadata, using existing
formats and conventions, none of which use English (for licenses,
instead of SPDX license identifiers, Petite Abeille has neatly suggested
using links, although I don't agree with their format).

> If instead of using English directly, we define opaque strings of
> characters for the tags, such that the tag "author" consistently means
> "author", we really achieve the same thing. That is a simple solution
> that is language independent.
> 
> Or we could use emoji, although I believe most computer users in the
> world would have a harder time typing out a given emoji than a given
> opaque, ASCII- and English-compatible string.

So you want to force non-English Gemini writers to use English words?
When (as my original proposal shows) it's unnecessary?  Imagine if you had
to end every Gemini document with the magic incantation "ίĭئٻɨƁϸͶѠGڧ".
That doesn't seem fun.

> > 3. Must be machine-parsable.
> 
> We should consider the difference between needs and wants here. If I
> have no interest in specifying another license to use my work than what
> is implied from my sharing it, that doesn't necessarily mean I don't
> want to specify date or author, so perhaps all or most elements should
> be optional.

The sole purpose of giving a fixed format to metadata is so that it is
machine parsable; all other metadata can simply be stated using natural
language.  And yes, if you had read the rest of my e-mail, you would have
noticed that everything in it is completely optional.

> > 4. Should affect presentation.
> >   
> >    gemtext as a whole is about separating content from presentation.
> >    Some of the earlier metadata proposals referred to metadata for
> >    presentation, e.g. to specify a color to view the text in.  This is
> >    against the spirit of gemtext/Gemini (if not the spec).
> 
> Agreed, but as I understand it you do *not* want it to affect
> presentation.

Yep, typo.

> > 5. Must be difficult to extend.
> > 
> >   ...
> 
> What do you propose that prevents conventional use from dictating
> reality? And why is it important that the specification can not be
> extended? Unlike e.g. text/gemini, if a client doesn't support some
> superset of the tags initially specified, there is no degradation. If
> in the future we want to extend a meta data format to support e.g.
> specifying where, in addition to when, it was written, the clients
> that don't support it shouldn't suffer from it.
> 
> The only important concern to me is that there is a canonical
> description of tags. That description can be extended indefinitely as
> far as I'm concerned, for as long as the original meanings of the
> initial set of supported tags aren't changed or overloaded by newer
> tags.

Non-extensibility is a fundamental part of the spirit of Gemini.  We
want to prevent metadata from being used for anything other than the
specified purposes, so that it is not misused in the future.  Consider,
for example, the 'color' metadata key that was suggested early on in the
original metadata thread.  We want to prevent these kinds of misuses
from happening at all.  Notice that my proposal-not-proposal handles
each metadata field on a case-by-case basis; there is no way provided to
handle additional fields.  In addition, I've stated that other metadata
fields, which don't have to be known to search engines, can use an
arbitrary, capsule-specific convention, so that you can use additional
metadata fields internally.

> I think that instead of defining ourselves what fields are important
> we should start from a standard, e.g. DCMI with the element set
> defined in IETF RFC 5013.
> 
> With that as a basis, if there is no suitable format already, we can
> define a human readable, text-compatible data format and a
> corresponding text/xyz MIME type. Then, a text/gemini document that
> feels like supplying additional metadata can link to a metadata file
> which the server serves with the above MIME type. A client that does
> not support the MIME type should defer to serving unknown text/* types
> as plain text. A client that does support it can localize the
> elements, including things like names and date and time formats. If
> the client is a crawler, it should find the linked metadata document
> as a matter of its normal operation because it is linked from the
> document.

This has a few problems:

1. It is extensible.  As I've argued above, we don't want extensibility.
   Avoiding it would mean that we have to have a very strict format for this
   metadata file, and given how few fields are really necessary to be
   machine-parsable, this would be a very small file.  With my proposal,
   we can embed all the necessary metadata into the existing files.

2. The keys specified in IETF RFC 5013 are English-specific.  As I've
   explained in my original mail, this is not sustainable for
   non-English Gemini clients and writers, as either the writers are
   forced to use English (bad), or the clients are forced to support the
   same keywords across a /lot/ of languages (bad).

> Personally I don't think this is a standard I would use either way.
> It's mostly for the benefit of robots that there's a point in
> formalizing information like this. Humans can interpret such
> information as indicated in the document itself in a much wider
> variety of formats. It's not my intention, primarily, to serve robots.

Many gemlogs use the gmisub format, which essentially provides date
metadata.  There are uses, and making your content understandable to
'robots' will also make it understandable to the users behind them.  One
particularly helpful area that my proposal-not-proposal provides for is
basic search engine filtering (by date, author, and license).
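
To make the date case concrete, here is a rough sketch of how a crawler
might pull dates out of a gemlog index and filter on them.  It assumes
the convention I mean by gmisub is the one where a link line's label
starts with an ISO 8601 date (YYYY-MM-DD); the function names
`dated_links` and `published_since` are made up for illustration, and
the sketch ignores preformatted blocks for brevity.

```python
import re
from datetime import date

# Gemtext link line: "=>", optional whitespace, a URL, then an optional label.
LINK_RE = re.compile(r"^=>\s*(\S+)(?:\s+(.*))?$")
# Assumed convention: the label begins with an ISO 8601 date (YYYY-MM-DD).
DATE_RE = re.compile(r"^(\d{4})-(\d{2})-(\d{2})\b")


def dated_links(gemtext):
    """Return (date, url, title) tuples for link lines whose label starts with a date."""
    entries = []
    for line in gemtext.splitlines():
        link = LINK_RE.match(line)
        if not link or not link.group(2):
            continue  # not a link line, or a link without a label
        url, label = link.group(1), link.group(2)
        m = DATE_RE.match(label)
        if not m:
            continue  # ordinary link, carries no date metadata
        try:
            when = date(int(m.group(1)), int(m.group(2)), int(m.group(3)))
        except ValueError:
            continue  # looks like a date but is not a valid calendar day
        # Drop the date and any separator (space, dash, colon) before the title.
        title = label[m.end():].strip(" \t-:")
        entries.append((when, url, title))
    return entries


def published_since(entries, cutoff):
    """Keep only the entries dated on or after the cutoff."""
    return [entry for entry in entries if entry[0] >= cutoff]
```

Nothing in this depends on English keywords: the only fixed pieces are
the "=>" link syntax and the numeric date, which is the kind of
language-neutral format I am arguing for.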

~aravk | ~nothien

