Metadata Without A Proposal

Oliver Simmons oliversimmo at
Fri Feb 26 21:25:57 GMT 2021

On Fri, 26 Feb 2021 at 10:51, <nothien at> wrote:
> I've lost track of the currently raging metadata thread entirely, and so
> I've started this as a new post.

Good choice :)

> 3. Must be machine-parsable.
>    Search engines, archivers, and other crawler-style clients need to be
>    attended to.  Some of the information they need is: date, author, and
>    license.

Every form of somewhat organised info is "machine readable"; only
free-form sentences and the like aren't.
(Machine learning is getting really good at those too, but I don't
think anyone wants to rely on that.)

> 5. Must be difficult to extend.
>    Again, this comes from the general Gemini philosophy that anything
>    that can be misused will be misused.  This rules out lots of current
>    proposals because they specify tags, and the usage of tags can only be
>    controlled by convention, which is subject to change.

We use a text-based format, so this is semi-bogus.
I can easily add stuff, such as styling, to my documents *without* a
tag format and make software to support it - without extending the
spec.
Gemini wants to be "non-extensible", but having freeform text breaks that.
This is an unfixable problem though, and just a side effect of what Gemini is.
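
To make that concrete, here is a minimal sketch of a client-side
convention layered on freeform text. The `*word*` emphasis rule and the
`render_line` name are made up for illustration; nothing here is part
of the Gemini spec or of any existing client.

```python
import re

# Hypothetical convention, not part of any Gemini spec: *word* marks emphasis.
# The first character after the opening "*" must be non-space, so "2 * 3 * 4"
# is left alone.
EMPHASIS = re.compile(r"\*([^*\s][^*]*?)\*")


def render_line(line: str) -> str:
    """Render one gemtext line, upper-casing *emphasised* spans in text lines."""
    # Leave link, heading, list, quote, and preformat-toggle lines untouched.
    if line.startswith(("=>", "#", "* ", ">", "```")):
        return line
    return EMPHASIS.sub(lambda m: m.group(1).upper(), line)
```

Any client could ship something like this tomorrow, and no change to
the spec would be needed to allow it or able to prevent it.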

> ## Dates
> My proposal with dates is to use what we already have - the gmisub companion spec.
> Search engines
> and crawlers can still choose to include date information based on when
> they last crawled the page.

This would only really work for things that are looking at sites as a
whole, mainly search engines.
My issue with these metadata-in-a-separate-location ideas is that they
create additional work, and additional network requests, just to get
the info about one file.
Also, for more one-off things with dates, creating gmisub stuff for
them is slightly overboard.

> ## Licenses

This is really nice, I didn't know there was a convention for it.

> ## Authors
> There are two possibilities I see with author metadata: either take it
> from the license line, discussed above, or extend the gmisub spec to
> also allow for an optional author field.

See above about gmisub.
The licence line makes the most sense to me. However, not everyone adds
a licence (meaning they keep full copyright), and they may still want
their name on it. The current licence-first method doesn't work in
that case.

- Oliver Simmons

(`- name` is how I sign my emails and stuff when I remember)
There are probably many other ways this could be done; the above was
just a quickly typed example.
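
As a rough sketch of how a crawler might read such a sign-off, assuming
the `- name` line is the last non-blank line of the document (the
`find_author` name and that rule are my own assumptions, not an agreed
convention):

```python
import re
from typing import Optional

# Assumed convention: the last non-blank line, if it looks like "- Name",
# is the author's sign-off.
SIGN_OFF = re.compile(r"^- (\S.*)$")


def find_author(document: str) -> Optional[str]:
    """Return the name from a trailing sign-off line, or None if absent."""
    for line in reversed(document.splitlines()):
        stripped = line.strip()
        if not stripped:
            continue  # skip trailing blank lines
        # Only the last non-blank line is considered.
        match = SIGN_OFF.match(stripped)
        return match.group(1) if match else None
    return None
```

Checking only the last line keeps dash-prefixed lines elsewhere in the
text from being mistaken for an author.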

In the example I have pointed out a second issue - licenses that aren't in SPDX.
I'm not entirely sure what SPDX is, but from a quick search it appears
it doesn't contain the DBAD license (which is what I personally use
for stuff I really don't care about).


> ## Other Fields
> Clearly, other fields aren't supported by this.  If you want to place
> additional metadata in your content, then I suggest writing it in
> natural language.  If it is absolutely necessary to have it
> machine-parsable (so that it can be specially understood by e.g. search
> engines) then we can talk about that here on the ML, but others have
> argued against e.g. tags because they allow easily manipulating search
> results.  Expect resistance.

Agreed on this. Tag metadata formats are just a catch-all, and
catch-alls are typically bad.

> ## Conclusion
> I don't think we need a 'metadata proposal' to achieve the goals we're
> looking for.  The format conventions are already mostly in place; we
> just need to formalize them.

I agree that the catch-all metadata proposals are unneeded, and I think
we should stop with them.
I also think we should start calling them "catch-all metadata" or
something similar. There's a distinction between a generic format that
allows any metadata and dedicated formats for individual pieces of
metadata, such as dates, authors and licenses.

- Oliver Simmons
