[SPEC] Encouraging HTTP Proxies to support Gemini hosts self-blacklisting

Mansfield mansfield at ondollo.com
Sun Feb 21 21:37:18 GMT 2021


On Sun, Feb 21, 2021 at 1:48 PM Johann Galle <johann at qwertqwefsday.eu>
wrote:

> Hi,
>
> why is robots.txt not the obvious answer here? The companion
> specification[1] has a "User-agent: webproxy" for this specific case:
>
>  > ### Web proxies
>  > Gemini bots which fetch content in order to translate said content into
> HTML and publicly serve the result over HTTP(S) (in order to make
> Geminispace accessible from within a standard web browser) should respect
> robots.txt directives aimed at a User-agent of "webproxy".
>
> So this should suffice:
>
> ```
> User-agent: webproxy
> Disallow: /
> ```
>
> Regards,
> Johann
>

I must admit I'm woefully lacking in skill and background with robots.txt. It
does seem like it could be a great answer.

A few questions to help me educate myself:

 1. How often should the proxy fetch that file? One answer might be to check
that URL before every request, but that heads in the direction of some of the
negative feedback about the favicon: one user action turning into more than
one Gemini request.
 2. Is 'webproxy' a standard name that any proxy should match, or is that
something left to us to decide?
 3. Are there globbing-like syntax rules for the Disallow field?
 4. I'm assuming there could be multiple rules that need to be combined. Is
there a standard algorithm for that process? E.g. (a rough sketch of how I
imagine resolving this follows the list):

```
User-agent: webproxy
Disallow: /a
Allow: /a/b
Disallow: /a/b/c
```
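
To make questions 1 and 4 concrete, here is roughly what I imagine the proxy
doing: a minimal sketch that caches robots.txt per host (the one-hour TTL is
just an assumption, not anything from the spec) and resolves overlapping
Allow/Disallow rules by longest-prefix match, with Allow winning ties (the
precedence I've seen described for web robots.txt; the Gemini companion spec
may well keep things simpler). The fetch_gemini helper is hypothetical.

```python
import time
from urllib.parse import urlparse

ROBOTS_TTL = 60 * 60   # assumption: re-fetch robots.txt at most once an hour
_robots_cache = {}     # host -> (fetched_at, parsed rules)


def parse_rules(text, agent="webproxy"):
    """Collect (is_allow, path_prefix) rules for `agent` (or '*')."""
    rules, applies = [], False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            applies = value.lower() in (agent, "*")
        elif applies and field in ("allow", "disallow") and value:
            rules.append((field == "allow", value))
    return rules


def path_allowed(rules, path):
    """Longest matching prefix wins; Allow wins ties; no match means allowed."""
    best_len, best_allow = -1, True
    for is_allow, prefix in rules:
        if path.startswith(prefix):
            if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
                best_len, best_allow = len(prefix), is_allow
    return best_allow


def proxy_may_fetch(url, fetch_gemini):
    """fetch_gemini(url) -> str is a hypothetical helper from the proxy."""
    host = urlparse(url).hostname
    cached = _robots_cache.get(host)
    if cached is None or time.time() - cached[0] > ROBOTS_TTL:
        try:
            text = fetch_gemini(f"gemini://{host}/robots.txt")
        except Exception:
            text = ""   # no robots.txt reachable -> treat everything as allowed
        cached = (time.time(), parse_rules(text))
        _robots_cache[host] = cached
    return path_allowed(cached[1], urlparse(url).path or "/")
```

With the example rules above, that would make /a/b/anything allowed, while
/a/other and /a/b/c/anything would be disallowed, which is the behaviour I
would hope for. Whether that matches what host operators expect is exactly
what I'm asking in question 4.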

Again - it seems like this could work out really well.

Thanks for helping me learn a bit more!