[spec] [tech] Companion Specification Proposal for Metadata

John Cowan cowan at ccil.org
Thu Feb 25 19:57:24 GMT 2021

On Thu, Feb 25, 2021 at 2:32 PM Gary Johnson <lambdatronic at disroot.org> wrote:

> Gemtext pages may be tagged with information that can be useful to
> automated clients (e.g., search engines, archiving bots, and maybe
> proxies) that is otherwise difficult or impossible to infer from
> performing a full text search of the Gemtext file's contents.
>
> ## Against
> Metadata represents a slippery slope to uncontrolled extensibility. It
> might be abused for server-specified styling, requesting external
> resources (e.g., supporting client-side scripting or background images
> of kittens), or just generally making Gemtext pages hard to read in
> clients that don't hide inline metadata or make page concatenation
> difficult with the end-of-file metadata proposal that's been discussed
> at some length on the mailing list.

As has been shown, text lines are equally abusable.

> 1. Metadata /within/ a Gemtext file carries a number of liabilities that
>    make some of our community members nervous (understandably so IMO).

To understand all is to forgive all.

> 2. The subset of metadata that is meant to be read and understood by a
>    human reader using a typical Gemini client can already be expressed
>    in natural language without any community-approved tag
>    standardization.

Sometimes having both is unavoidable: books have both a title page and
cataloging-in-publication data, which also includes the title and the
publisher.  (Whether a title page is part of the book or just more metadata
is OT here.)  But surely if both humans and bots can be informed by the
same thing, that's better?  Don't Repeat Yourself, for when updating, one
copy will be forgotten.

> 1. $DOCUMENT_ROOT/.metadata.gmi
> 2. $DOCUMENT_ROOT/.well-known/metadata.gmi

Such proposals always fall down (for me, YMMV) on the issue of where the
document root actually is.  Multi-homing makes it possible for every user
of a shared site to have their own domain name, but not everyone wants
that, and it creates issues:

1) Apache has a global access control file, but it turns out that different
parts of a website need different access controls, so the
per-website-directory ".htaccess" file was invented to make this scalable.

2) Robots.txt (on a website) also has to know about everything precisely
because it is global: multiple users can have their own policies, but they
have to then persuade a site admin (as opposed to a website admin) to get
them added, which becomes bureaucratic over time.

3) Originally the addresses of all hosts on the internet (!) were
maintained in a hosts.txt file that every site had to keep an up-to-date
copy of (!!), usually via FTP.  That broke and was replaced by the DNS we
have today, with authority distributed into DNS zones (not quite the same
as domains, but close enough for this conversation).

The principle of subsidiarity <https://en.wikipedia.org/wiki/Subsidiarity>
is a generalization of this.  We should avoid adding yet another
centralized (even if per-host) solution.  Capsules are a honking good idea,
but we should not conflate them with DNS host names.
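
To make the contrast with a single per-host file concrete, here is a
minimal sketch of an .htaccess-style lookup, in which a client tries the
directory nearest the page first and then walks up toward the host root.
The file name ".metadata.gmi" is taken from the quoted proposal; the
walk-up order, the function name metadata_candidates, and the example
host are assumptions made purely for illustration, not anything that has
been specified.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

# File name from the quoted proposal; looking for it in every directory
# (rather than only at the document root) is an assumption of this sketch.
METADATA_NAME = ".metadata.gmi"


def metadata_candidates(url: str) -> list[str]:
    """Return candidate metadata URLs for a page, nearest directory first."""
    parts = urlsplit(url)
    path = PurePosixPath(parts.path or "/")
    # A trailing slash (or an empty path) names a directory; otherwise the
    # page lives in the parent directory of the last path segment.
    is_directory = parts.path == "" or parts.path.endswith("/")
    directory = path if is_directory else path.parent

    candidates = []
    # directory.parents yields each ancestor in turn, ending at "/".
    for d in [directory, *directory.parents]:
        location = (d / METADATA_NAME).as_posix()
        candidates.append(f"{parts.scheme}://{parts.netloc}{location}")
    return candidates
```

For a page at gemini://example.org/~alice/notes/post.gmi this yields the
metadata file in /~alice/notes/ first, then /~alice/, then the host root,
so a user on a shared host could publish their own file without asking
the site admin to edit a global one.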

John Cowan          http://vrici.lojban.org/~cowan        cowan at ccil.org
Sir, I quite agree with you, but what are we two against so many?
    --George Bernard Shaw,
         to a man booing at the opening of _Arms and the Man_