-
Notifications
You must be signed in to change notification settings - Fork 253
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adds Markdown support, separates smiley parsing from BBC parsing, adds SMF\Parser base class and subclasses #8349
Adds Markdown support, separates smiley parsing from BBC parsing, adds SMF\Parser base class and subclasses #8349
Conversation
Hm. I used to have a complete unit testing file for this back when I was working on it during the summer, but now I can't find it. Anyone who wants one can just grab a copy of the GFM spec, extract all the examples it gives, and run them through |
I know you have already done lots of work, but it feels a bit backward if we add code parsing BBC and then markdown everywhere it needs. Anyone using our code would also need to update theirs to parse markdown. I wonder if it's appropriate to add a 'parser' class that would have an enum (at least in 8.1+, but constants for 8.0) that defines whether you want BBC, markdown, or both. Something like:
Backward compatible function calls just define to use BBC. Mod authors who are supporting 3.0 code natively, can just call the parser and get both automatically. If we support more/different options in the future, it will be easier to just add it. |
An abstract class and an interface might be a good idea, yes. However, when I was writing this I found that it did not work well to conflate BBCode parsing and Markdown parsing into a single call. Most of the time it is best to parse BBCode and then Markdown, but sometimes it is best to do things in the opposite order. So we need the flexibility to call them in the order that we need. There's also the problem that both parsers need various parameters passed to them, but those sets of parameters have nothing to do with each other. Attempting to munge them all together as one big set of parameters to a single function call would be nasty in a number of different ways. Nevertheless, you have raised a very important point that I hadn't thought about. The approach I've taken so far in this PR would indeed mean that "Anyone using our code would also need to update theirs to parse markdown," which doesn't work well at all with our goals regarding backward compatibility in 3.0. You are quite right that this is a problem that needs to be addressed. I'll have to think some more about the best way to do so... 🤔 |
Well, without mods supporting 3.0 native, the code is OOB is backward compatible. We just didn't enable a new feature. Mods would have to build a native 3.0 code to get the MD support. The feature is complete as is and doesn't need overhaul changes if you don't want to. I would have personally wanted to extend the messages table to add a 'format' column with 1/default = Modern BBC and 2 = markdown and then have an option on the posting page to switch between bbc/md, Some sort of 'version' or other column would also be added to indicate how be how we handle HTML entities as we need to clean that up from the various times we clean and then unclean entries to validate things during inputs. It would be versioned to the SMF version the input was written on and allow our parser to know how to handle cleanups/processing based on us changing the input sanitation rules/other fixes. A scheduled task could even be written to 'upgrade' old posts to newer versions to reduce the amount of parsing old posts go through for security checks. |
3e13f00
to
767734a
Compare
Signed-off-by: Jon Stovell <[email protected]>
Signed-off-by: Jon Stovell <[email protected]>
Also fixes WYSIWYG bugs with the php BBCode Signed-off-by: Jon Stovell <[email protected]>
Signed-off-by: Jon Stovell <[email protected]>
This is complicated, because we need to adjust the HTML tags in the output depending on the context in which the HTML is shown. Signed-off-by: Jon Stovell <[email protected]>
Also changes the content of the tab substitute string to something better and unique. Signed-off-by: Jon Stovell <[email protected]>
Signed-off-by: Jon Stovell <[email protected]>
Signed-off-by: Jon Stovell <[email protected]>
Signed-off-by: Jon Stovell <[email protected]>
Signed-off-by: Jon Stovell <[email protected]>
Signed-off-by: Jon Stovell <[email protected]>
Signed-off-by: Jon Stovell <[email protected]>
Signed-off-by: Jon Stovell <[email protected]>
Signed-off-by: Jon Stovell <[email protected]>
Signed-off-by: Jon Stovell <[email protected]>
Signed-off-by: Jon Stovell <[email protected]>
767734a
to
4333c16
Compare
I took another crack at abstracting our parsing in order to handle the process of transforming one markup type into another. I seem to have solved it this time. |
4a04c89
to
8f5f02f
Compare
I like it. I didn't expect the smilies to become a parser, but that makes complete sense. There's a triple win here (Markdown, Parser customization, and smiley parsing separation). |
8f5f02f
to
4b2284c
Compare
Signed-off-by: Jon Stovell <[email protected]>
Signed-off-by: Jon Stovell <[email protected]>
4b2284c
to
95d6b70
Compare
This was planned for Alpha 4, but I don't want to worry about conflicts, so if no one objects, I'd like to merge it soon. |
// Allow mods access before parsing. | ||
$smileys = !empty($input_types & self::INPUT_SMILEYS); | ||
|
||
IntegrationHook::call('integrate_pre_parsebbc', [&$string, &$smileys, &$options['cache_id'], &$options['parse_tags']]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we call the pre_parsebbc here or in the BBC logic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved these two hooks here so that hooked functions could have access to the input string before any changes were made and after all changes were made, which is what these hooks were expected to do back when BBCode parsing was the only type of parsing. If I had left them in BBCodeParser::parse()
, hooked functions would have been called in the midst of the overall parsing process, which would have had negative effects depending on what the hooked function tried to do.
If I were creating the hooks from scratch I would have named them integrate_pre_parse
and integrate_post_parse
(without the trailing bbc
). But since the hooks already existed with these names, they continue to have those names.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense in terms of keeping compatibility with older code.
You can add new things while leaving the old ones: /* @deprecated since 3.0 */
IntegrationHook::call('integrate_pre_parsebbc', [&$string, &$smileys, &$options['cache_id'], &$options['parse_tags']]);
IntegrationHook::call('integrate_pre_parse', [&$string, &$smileys, &$options['cache_id'], &$options['parse_tags']]); |
editor.toggleSourceMode(); | ||
editor.toggleSourceMode(); | ||
editor.sourceEditorCaret({start: caretPos, end: caretPos}); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does this code do? I'm having trouble understanding what behavoir it adds or modifies.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without the lines that toggle the mode and then set the caret position, I found that the caret ended up jumping to an unexpected position (the start of the line, IIRC).
This PR adds support for Markdown syntax. (It also does some other stuff; see "Other Changes".)
The specific flavour of Markdown this supports is GitHub Flavoured Markdown, which is CommonMark plus a tables extension and a strikethrough extension.
Main features:
[h1]
to[h6]
BBCodes have been added.[tt]
BBCode has been un-deprecated and defined to output inline<code>
elements.[list]
BBCode will now output an<ol>
or a<ul>
as appropriate according to the value of the BBCode'stype
attribute. (Previously, we always output a<ul>
even when the CSS made the list appear as if it were ordered.)Minor features:
<p>
elements when Markdown support is enabled. This is better for accessibility, among other benefits.Other changes:
SMF\Parser
base class.SMF\BBCodeParser
toSMF\Parsers\BBCodeParser
.SMF\Parsers\BBCodeParser
intoSMF\Parsers\SmileyParser
.BBCodeParser::parse()
throughout the codebase with calls toParser::transform()
, which is able to leverage all three ofSMF\Parsers\BBCodeParser
,SMF\Parsers\MarkdownParser
, andSMF\Parsers\SmileyParser
in order to intelligently handle the process of transforming markup (i.e. BBCode, Markdown, HTML, and smiley codes) from one form to another as needed.<strong>
,<em>
, etc.).Config::$modSettings['enablePostHTML']
when BBCode was disabled.More info:
SMF\Parsers\MarkdownParser::parse()
can be used to generate the following types of output:a+b+c:d
,made-up-scheme://foo,bar
, orhttp://../
. SMF is capable of much more robust URI validation, so we use it.