Possible Bug using indented html with <pre>

Completed

October 27, 2020 13:51

Version: nvUltra v 1.0.0 (68)

If I enter the following into nvUltra it looks like that (right side):

That's a test:

> <pre>
> <mark>Test</mark>
> </pre>

If I compare that to what Typora (for comparison) displays, Typora looks like that (which is much preferred by me):

So my question:

What is correct? Am I doing something wrong?

Comments

3 comments

Fletcher Penney

October 27, 2020 16:30
To be a bit more precise (to help with future similar questions):

1. nvUltra uses MultiMarkdown as the Markdown engine. Barring a *really* unusual bug, what the MultiMarkdown engine receives and produces is exactly the source text you have in nvUltra and the HTML shown in the preview. So, more properly, questions like this are questions about MultiMarkdown, not nvUltra.

2. I don't know what Markdown engine Typora uses. I say this only because what Typora specifically does may or may not have any relevance on what *should* happen, depending on which engine it uses.

3. The best way to get a sense of what "should" happen is to use [Babelmark 2](https://johnmacfarlane.net/babelmark2). It will run specified source text through a variety of Markdown parsers, and consolidate matching output to give you a sense of what the consensus looks like. Everyone will have their own favorites, of course, but there are only a few specific implementations that I care much about:

1. MultiMarkdown of course. In particular, I will look to see whether versions 5 & 6 agree with each other. If not, it is usually an odd edge case, or something fundamental to the parser implementation. But sometimes it's a bug.
2. Markdown.pl for historical reasons, but Gruber's implementation is somewhat bug prone so take it with a grain of salt. I do try to be somewhat intentional about diverging from the original.
3. CommonMark -- there are multiple things I disagree with about CommonMark, but they have created a fairly self-consistent implementation and specification. When I am trying to decide what MultiMarkdown should do, and following Markdown.pl doesn't make sense, I will often look to CommonMark.
4. All of the other implementations are simply viewed through the lens of a crowd for consensus, should I need a tie breaker of some sort.

Using your specific example in Babelmark reveals a few key things:

1. Get rid of the apostrophe, as that causes irrelevant distinction between implementations that support smart typography and those that don't.

2. CommonMark, Markdown.pl, and MultiMarkdown v5 all result in the same output (along with others).

3. MultiMarkdown v6 and PHP Markdown Extra (a "second tier" implementation in the hierarchy of those that I pay more attention to) are the only two that retain the extra `>` characters. But they do so in different ways, and it's not immediately obvious that this is a "correct" alternative interpretation.

4. Unless you're familiar with coding Markdown parsers, this is about as far as you can likely go. It is certainly reasonable to ask whether this is a bug in MultiMarkdown v6 at this point.

5. If you're familiar with Markdown parsers, you'll know that parsing block quotes is a bit tricky. Each line may have one or more `>` characters at the beginning of the line that need to be used to determine whether to increase the level of the block quote, but need to be ignored otherwise. The way in which you do this can have dramatic effects on how things are parsed. So some additional testing may be helpful.

6. Testing `> <pre><mark>Test</mark></pre>` results in identical output between MultiMarkdown 6 and the others, EXCEPT for the odd inclusion of the leading `>`. This was debatable in the first instance when the second and third `>` characters were retained, since they were inside the `<pre>`, but it really doesn't make sense to retain the first one, especially in the second example.

7. The other issue is what happens inside raw HTML. Technically, your example has a `>` character inside raw HTML, which MAY or MAY NOT be parsed as Markdown, depending on the structure.

I'll need to dig into the exact logic of the block quote parser. At this point I can say:

1. Retaining the extra `>` in the first line is definitely a bug.
2. Retaining the `>` characters in the second and third lines may be a bug or may be a different precedence order between block quote and raw HTML, and may or may not be something that is readily modifiable.
3. Given that Markdown.pl, MultiMarkdown v5, and CommonMark handle this in the same way, I would *like* for MultiMarkdown v6 to do the same, but not to the extent that I would rewrite v6 to change the behavior (as a hypothetical example).

Thank you for pointing out this example!
0
Fletcher Penney

October 27, 2020 22:25
I dug into this and found the error. It has been fixed in MultiMarkdown, and will be included in the next release of nvUltra.

In exploring further, however, it does appear that mixing block quotes with some other Markdown structures is handled very inconsistently across implementations. I even found some instances that I believe are errors in CommonMark's handling (which is rare), in addition to various "differences of opinion."
0
Tim Hansinger

October 28, 2020 08:45
Dear Fletcher,

thank you so much for the detailed explanation.

For me it's also not a problem to write it like that, so thank's a lot again:
```
That's a test:

> <pre>
<mark>Test</mark>
</pre>
```
0

Please sign in to leave a comment.

Possible Bug using indented html with <pre>

Comments

Didn't find what you were looking for?