[spec] The Tragedy of &

Sean Conner sean at conman.org
Sun Jan 31 23:16:18 GMT 2021

It was thus said that the Great Gary Johnson once stated:
> Sean Conner <sean at conman.org> writes:
> > Not if the CGI interface is properly written.  All I had to do was write
> > this CGI script and drop it into my tests directory [1]:
> >
> > 	gemini://gemini.conman.org/test/pathseg.cgi

  [ snip ]

> Thanks for sharing some code, Sean. I, of course, realize that one could
> write a CGI script to pick apart the PATH_INFO for user inputs. This
> issue I raised in my message was that this doesn't make any sense in the
> context of a CGI script which is looked up using the path on the remote
> filesystem.
> In your example, your script is located at /test/pathseg.cgi. However,
> lacking side information, I see no indicator (outside of the --
> admittedly optional -- cgi extension on your file name) of which path
> segments should be considered part of the CGI filename lookup and which
> parts are meant to be user input data in your example link:
> /test/pathseg.cgi/name=and%20a%20one/age=and%20a%20two/action=skidoosh

  That's a particular implementation detail of GLV-1.12556 [1].  Other
servers could require the extension, or some other mechanism.

> This feels like a massive hack to me and an abuse of path segments TBH.
> If I were to embrace this approach, I can see that I would have to
> reprogram my server to do some additional path preprocessing magic. I
> could either:
> 1. Check every sequence of path segments starting from the document root
>    to see if any of them correspond to an executable file or have the
>    blessed CGI file extension for my server.

  I see your server just accepts the requested path as is.  GLV-1.12556
(once it gets into the filesystem handler) walks down the document root
checking each path segment looking for an executable file (which indicates a
CGI script) or symbolic link (which indicates a SCGI script).
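
  A minimal sketch of that kind of walk, in Python for illustration.  This
is not the GLV-1.12556 code; the function name, the return values, and the
handling of "." and ".." are assumptions made for the example:

```python
import os
import stat

def find_script(docroot, request_path):
    """Walk docroot one path segment at a time.

    Returns (kind, filesystem_path, remaining_segments), or None if a
    segment cannot be resolved.  kind is "cgi", "scgi", "file" or "dir".
    (Hypothetical helper, not taken from any real server.)
    """
    segments = [s for s in request_path.split("/") if s]
    current = docroot
    for i, seg in enumerate(segments):
        if seg in (".", ".."):
            return None  # assumption: refuse to step outside the root
        current = os.path.join(current, seg)
        try:
            info = os.lstat(current)  # lstat so symlinks are not followed
        except OSError:
            return None
        rest = segments[i + 1:]
        if stat.S_ISLNK(info.st_mode):
            return ("scgi", current, rest)  # symbolic link: SCGI script
        if stat.S_ISREG(info.st_mode):
            # executable regular file: CGI script; otherwise a plain file
            kind = "cgi" if os.access(current, os.X_OK) else "file"
            return (kind, current, rest)
        if not stat.S_ISDIR(info.st_mode):
            return None  # sockets, devices and the like are not served
    return ("dir", current, [])
```

  The walk stops at the first segment that is not a directory, so whatever
segments are left over are the extra path data for the script.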

> Once one of these 3 approaches enables the server to successfully detect
> that a particular path corresponds to a CGI script that is not actually
> located where that path is pointing, then the server would need to
> execute that script with PATH_INFO bound to the entire path. Every
> installed CGI script would then be responsible for manually removing
> SCRIPT_NAME from PATH_INFO and splitting it up to get the user inputs,
> which puts an additional burden on CGI developers.

  If you want to follow RFC-3875, that's not the case.  PATH_INFO only
contains data past the script name (section 4.1.5). This link:




  There is no PATH_INFO or PATH_TRANSLATED because neither is needed.  However:



	PATH_INFO = /path/to/nowhere
	PATH_TRANSLATED = /home/spc/projects/gemini/non-checkin/gemini.conman.org/path/to/nowhere

  The work is on the server side, not the CGI script side.
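
  A rough sketch of that server-side split, assuming the server already
knows which leading part of the request path names the script.  The
function name, the script name /example.cgi and the document root
/srv/gemini are made up for the example:

```python
from urllib.parse import unquote

def cgi_path_vars(docroot, request_path, script_name):
    """Derive SCRIPT_NAME, PATH_INFO and PATH_TRANSLATED for a request.

    Hypothetical helper: script_name is the part of request_path that
    the server has already matched to a script.
    """
    if request_path != script_name and \
       not request_path.startswith(script_name + "/"):
        raise ValueError("request path does not start with the script name")
    env = {"SCRIPT_NAME": script_name}
    extra = request_path[len(script_name):]
    if extra:
        # PATH_INFO is only the data past the script name, and it is
        # passed to the script percent-decoded (RFC 3875, section 4.1.5).
        path_info = unquote(extra)
        env["PATH_INFO"] = path_info
        # Simplification: map PATH_INFO straight onto the document root.
        env["PATH_TRANSLATED"] = docroot.rstrip("/") + path_info
    return env

# A request for /example.cgi alone yields only SCRIPT_NAME.
print(cgi_path_vars("/srv/gemini", "/example.cgi", "/example.cgi"))
# Extra segments past the script name end up in PATH_INFO.
print(cgi_path_vars("/srv/gemini", "/example.cgi/path/to/nowhere",
                    "/example.cgi"))
```

  The script never has to strip its own name off the front of anything; it
just reads PATH_INFO.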

> So I've now heard from multiple folks that we should all just get on
> with these path segment hacks and accept that as the best we can do in
> Gemini.
> While I can see that it's technically possible (though arguably ugly) to
> do so, I suppose my question is:
> "What exactly does Gemini lose by allowing chained query parameters?
> (with &)"

  Nothing as far as I can see, as long as the characters '=' and '&' are
escaped if they appear in the input (to prevent confusion).  
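
  As a sketch of what that escaping could look like (nothing here is in
the Gemini spec; the two function names are invented for the example),
percent-encode every name and value before joining them, and decode only
after splitting:

```python
from urllib.parse import quote, unquote

def encode_query(pairs):
    """Join (name, value) pairs with '&', escaping '=' and '&' in the data."""
    # safe="" makes quote() escape every reserved character,
    # including '=' and '&'.
    return "&".join(quote(name, safe="") + "=" + quote(value, safe="")
                    for name, value in pairs)

def decode_query(query):
    """Split on the literal '&' and '=' first, then undo the escaping."""
    if not query:
        return []
    pairs = []
    for field in query.split("&"):
        name, _, value = field.partition("=")
        pairs.append((unquote(name), unquote(value)))
    return pairs

fields = [("name", "and a one"), ("expr", "a=b&c")]
query = encode_query(fields)
print(query)
assert decode_query(query) == fields
```

  Because the delimiters are split on before anything is decoded, an
escaped '=' or '&' in the data can never be mistaken for a separator.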

> What am I missing here, folks?

  Somebody to do a proof-of-concept probably.

> Any chance of weighing in here, Solderpunk?

  Is he still alive?


[1]	https://github.com/spc476/GLV-1.12556
