Supporting optional underscores for italics

John Cowan cowan at
Fri Nov 13 02:38:52 GMT 2020

On Thu, Nov 12, 2020 at 9:02 PM Sean Conner <sean at> wrote:

>   Oooh!  A bike sheeding thread!

Yup.  Let's sheed some bikes together!

> So with my introduction out of the way, let me nitpick [1]

Yes, you can say that.  Head lice do not yet have a pressure group
insisting that you call their eggs something more polite.

> > Given that we are *not* going to change the definition of text/gemini, but
>                     ^^^^^ shouldn't that be _not_?  Or are you going for
> strong emphasis here?

No, just habit.  I write a *lot* of GitHub-flavored Markdown.

>   I've found that terminating italic sections before sentence-terminating
> punctuation can lead to very ugly output.

Yes, though that's up to the content author.  But this isn't about
terminating the italic section; it's about deciding whether an underscore
actually does terminate it.  See below.

> why not just say that once in an emphasized text section, the next
> underscore ends it.

So that lines like "It is important to understand that _although the
standard in C is to use snake_case for variables, C compilers do not
support numbers like 123_456_789_." are interpreted correctly.  To put it
in HTML terms, the first underscore is preceded by whitespace, so it is an
<i>, and the last one is followed by terminating punctuation, so it is an
</i>.  The others, however, don't satisfy either rule 1 or rule 2, so the
emphatic text just goes on right through them.
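
A minimal sketch of how that could look in Python.  The two rules are
paraphrased from the paragraph above (the exact wording of rule 1 and rule 2
is not restated in this message), and the names render_italics, is_opener,
is_closer and the tiny ASCII TERMINATORS set are illustrative assumptions,
not part of any spec:

```python
# Assumption: a small ASCII stand-in for the real terminating-punctuation list.
TERMINATORS = set(".,;:!?)]}\"'")


def is_opener(line: str, i: int) -> bool:
    """Assumed rule 1: underscore at start of line or preceded by whitespace."""
    return i == 0 or line[i - 1].isspace()


def is_closer(line: str, i: int) -> bool:
    """Assumed rule 2: underscore at end of line, or followed by whitespace
    or terminating punctuation."""
    if i == len(line) - 1:
        return True
    nxt = line[i + 1]
    return nxt.isspace() or nxt in TERMINATORS


def render_italics(line: str) -> str:
    """Render one text/gemini line, turning _..._ spans into <i>...</i>."""
    out = []
    open_at = None  # index in `out` of an opener still waiting for its closer
    for i, ch in enumerate(line):
        if ch != "_":
            out.append(ch)
        elif open_at is None and is_opener(line, i):
            open_at = len(out)
            out.append("_")  # stays a literal underscore unless it gets closed
        elif open_at is not None and is_closer(line, i):
            out[open_at] = "<i>"
            out.append("</i>")
            open_at = None
        else:
            out.append("_")  # satisfies neither rule: ordinary character
    return "".join(out)


example = (
    "It is important to understand that _although the standard in C is to "
    "use snake_case for variables, C compilers do not support numbers like "
    "123_456_789_."
)
print(render_italics(example))
```

The function works on one line at a time, so an opener with no closer on the
same line is simply left as a literal underscore.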

>         blah blabh _lorem ipsum dolor
>         sit amet_ blah blah blah

They will quickly find out that that doesn't work.  In prose, a text/gemini
line typically holds a whole paragraph, and italic text doesn't normally
cross paragraph boundaries.

>   That might be updated at the next Unicode revision.

That's true.  But as time goes by, the new scripts with script-specific
punctuation become fewer and harder to find.  Until we join the Galactic
Federation, there just aren't many more scripts out there.  Newly invented
ones tend to use Latin/Greek/Cyrillic/etc. punctuation.

>   -spc (Unicode is hard!  Let's do rocketry!)

You kidding?  This is one of the easy bits!  All the work has been done for
us.  We don't even need regular expressions to figure it out; just keep the
352 characters in two arrays.  Unless your browser runs on an Arduino, that
is practically free.
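
For what it's worth, a rough sketch of the two-array idea in Python.  The
contents below are a handful of illustrative characters, not the actual 352,
and the split into "opening" and "terminating" tables is my assumption about
how the two arrays would be divided; the names are hypothetical:

```python
from bisect import bisect_left

# Assumption: tiny illustrative subsets; the real tables would hold all of
# the punctuation characters in question, sorted once up front.
OPENING_PUNCT = tuple(sorted("([{\u00ab\u2018\u201c"))
TERMINATING_PUNCT = tuple(sorted(".,;:!?)]}\u00bb\u2019\u201d\u3002"))


def in_table(table: tuple, ch: str) -> bool:
    """Binary search for a single character in a sorted tuple."""
    i = bisect_left(table, ch)
    return i < len(table) and table[i] == ch


print(in_table(TERMINATING_PUNCT, "."))  # a full stop is in the sample table
print(in_table(TERMINATING_PUNCT, "a"))  # a letter is not
```

With only a few hundred entries, a sorted array plus binary search (or even
a linear scan) costs next to nothing in memory or time.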

John Cowan        cowan at
I amar prestar aen, han mathon ne nen,
han mathon ne chae, a han noston ne 'wilith.  --Galadriel, LOTR:FOTR