Metadata Without A Proposal

nothien at uber.space
Fri Feb 26 22:25:34 GMT 2021


Glad to see we agree on most of the details.

Oliver Simmons <oliversimmo at gmail.com> wrote:
> Every form of somewhat organised info is "machine readable", only
> sentences and stuff aren't.

My point is that thus far, we've kept Gemini software from trying to
read any natural language.  I want to keep it that way.

> > 5. Must be difficult to extend.
> >
> >    ...
> 
> We use a text-based format, so this is semi-bogus.  I can easily add
> stuff, such as styling, to my documents *without* a tag format and
> make software to support it - without extending the spec.  Gemini
> wants to be "non-extensible", but having freeform text breaks that.
> This is an unfixable problem though, and just a side effect of what
> Gemini is.

I disagree: although you _can_ use additional syntaxes and write
software to support them, Gemini is already too big for any one person's
content alone to spread an extension and make it a shared consensus.
Still, I want to stay on the safe side (in particular, with the
'individual actor' assumption), so I'm trying to avoid adding new areas
of extensibility.

> > ## Dates
> >
> > My proposal with dates is to use what we already have - the gmisub
> > companion spec.
> > [...]
> > Search engines and crawlers can still choose to include date
> > information based on when they last crawled the page.
> 
> This would only really work for things that are looking at sites as a
> whole, mainly search engines.  My issue with these metadata in
> separate location ideas is that it creates additional work, and
> network requests, to get the info about one file.  Also for more
> one-off things with dates, creating gmisub stuff for it is slightly
> overboard.

You're right, this only works for crawlers.  What other use cases can
you think of where the software has to know the date of external
content?

If there are actual use cases, then we can probably tweak the license
line format, using full dates instead of just the year.  For example:

```Example license line using an ISO 8601 date
-- © 2021-02-26 nothien
```
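
To make the idea concrete, here is a rough sketch of how a client might
read such a line.  It is only an illustration, not part of any spec: the
pattern, the `LicenseLine` type and `parse_license_line` are all
hypothetical names, and it assumes the line is exactly "-- ©", then a
year or a full ISO 8601 date, then the author.  Anything after the date
is treated as the author, so a license identifier is not handled here.

```python
import re
from datetime import date
from typing import NamedTuple, Optional


class LicenseLine(NamedTuple):
    year: int
    full_date: Optional[date]  # None when only a year was given
    author: str


# Assumed (hypothetical) shape: "-- © <YYYY or YYYY-MM-DD> <author>".
LICENSE_LINE = re.compile(r"^-- © (\d{4})(?:-(\d{2})-(\d{2}))? (.+)$")


def parse_license_line(line: str) -> Optional[LicenseLine]:
    """Return the parsed fields, or None if the line does not match."""
    match = LICENSE_LINE.match(line.strip())
    if match is None:
        return None
    year, month, day, author = match.groups()
    full_date = None
    if month is not None:
        try:
            full_date = date(int(year), int(month), int(day))
        except ValueError:
            return None  # e.g. 2021-02-30 is not a real calendar date
    return LicenseLine(int(year), full_date, author.strip())


if __name__ == "__main__":
    print(parse_license_line("-- © 2021-02-26 nothien"))
    print(parse_license_line("-- © 2021 nothien"))
```

Year-only lines still parse under this sketch, so existing license lines
would not need to change if full dates were allowed.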

> > ## Licenses
> 
> This is really nice, I didn't know there was a convention for it.

Yeah, it's pretty neat!

> > ## Authors
> >
> > There are two possibilities I see with author metadata: either take
> > it from the license line, discussed above, or extend the gmisub spec
> > to also allow for an optional author field.
> 
> See above about gmisub.  The licence line makes most sense to me,
> however not everyone adds licenses (meaning they get copyright), and
> may still want their name on it, the current method of licence-first
> doesn't work in this case.

The format can be tweaked, sure.  But I think it's more important first
to agree that this is the way to go before trying to choose a specific
syntax.

> In the example I have pointed out a second issue - licenses that
> aren't in SPDX.  I'm not entirely sure what SPDX is, but from a quick
> search it appears it doesn't contain the DBAD license (which is what I
> personally use for stuff I really don't care about).
> 
> => https://dbad-license.org/

Oh no.  I don't want non-SPDX licenses to be second-class citizens.  The
most 'correct' way to deal with this in Gemini, IMO, is to use a URL (so
a link line) to the license, but I'm worried about the extra typing
needed, which would put authors off typing out license lines.
Boilerplate is a powerful tool for generating irritation.  However,
in most cases short links (e.g. 'gemini://spdx.dev/GPL-3.0-or-later',
although that URL doesn't actually work) should ease that issue while
still allowing other licenses to be used.  Of course, this would require
reworking the syntax, but as I said above, that's fine.  Thoughts?

> > ## Conclusion
> 
> I agree that the catch-all metadata proposals are unneeded, I think we
> should stop with them.  I would also think we should start calling
> them catch-all metadata or something similar, there's a distinction
> between a generic format that allows any metadata, and dedicated
> formats for individual pieces of metadata, such as dates, authors and
> licenses.

Not a bad idea.

~aravk | ~nothien

