WIP: Proper highlighting for the Wolfram Language (Mathematica) #2706

Merged on Nov 6, 2020 (32 commits)
Commits (32):
75dc896 Add first try for proper highlighting for the Wolfram Language (halirutan, Sep 22, 2020)
1b5a3ae Include keywords as list, break up regexes, add features (halirutan, Sep 28, 2020)
2c01097 Disable autodetect for Mathematica until I have a good idea how this … (halirutan, Sep 28, 2020)
b78e9fc Merge branch 'master' into WIP_Wolfram_Language (halirutan, Sep 28, 2020)
6216916 Merge remote-tracking branch 'upstream/master' into WIP_Wolfram_Language (halirutan, Sep 28, 2020)
0437cd8 Merge remote-tracking branch 'origin/WIP_Wolfram_Language' into WIP_W… (halirutan, Sep 28, 2020)
8d61d15 Improve things Josh suggested (halirutan, Sep 28, 2020)
165ea9d Improve things Josh suggested (halirutan, Oct 31, 2020)
9cc315e Merge branch 'master' into WIP_Wolfram_Language (halirutan, Oct 31, 2020)
7e7262d Merge branch 'test_rule_improvements' into WIP_Wolfram_Language (halirutan, Oct 31, 2020)
050cad2 Make more general for going into core HLJS. (halirutan, Oct 31, 2020)
15d4251 Introduce specific class-names and use alias table. (halirutan, Nov 1, 2020)
ae6e1e3 Update src/languages/mathematica.js (halirutan, Nov 1, 2020)
3ee0fd0 Update src/languages/mathematica.js (halirutan, Nov 1, 2020)
95e1d91 Update src/languages/mathematica.js (halirutan, Nov 1, 2020)
b869c93 Update src/languages/mathematica.js (halirutan, Nov 1, 2020)
2648bd1 Small changes regarding relevance and position of comment rules. (halirutan, Nov 1, 2020)
47ed49e first pass as classNameAliases and docs (joshgoebel, Nov 1, 2020)
5c02def separate symbol rules (joshgoebel, Nov 1, 2020)
b06b5ce add missing escape for UTF-8 compat (joshgoebel, Nov 1, 2020)
33fbc4d proper type (joshgoebel, Nov 1, 2020)
7f43954 declare better type info for on:begin (joshgoebel, Nov 2, 2020)
1fa1040 we need a naked object (joshgoebel, Nov 2, 2020)
67ceb03 just use ModeCallback, duh (joshgoebel, Nov 2, 2020)
92b413a Remove trailing commas (halirutan, Nov 2, 2020)
c4838b6 final cleanups (joshgoebel, Nov 2, 2020)
ad71815 Merge branch 'WIP_Wolfram_Language' of github.com:halirutan/highlight… (joshgoebel, Nov 2, 2020)
0653dcd classNameAliases needs to be a null object to avoid `constructor` issues (joshgoebel, Nov 2, 2020)
a178a71 Merge branch 'master' into WIP_Wolfram_Language (joshgoebel, Nov 3, 2020)
6d240a7 Add change-log for Mathemtica and put myself into the champions-club (halirutan, Nov 3, 2020)
c20aa2a tweak changelog (joshgoebel, Nov 6, 2020)
5d2d68a Merge branch 'master' into WIP_Wolfram_Language (joshgoebel, Nov 6, 2020)
1 change: 1 addition & 0 deletions AUTHORS.txt
@@ -298,3 +298,4 @@ Contributors:
- Jonathan Sharpe <[email protected]>
- Michael Rush <[email protected]>
- Florian Bezdeka <[email protected]>
- Patrick Scheibe <[email protected]>
11 changes: 10 additions & 1 deletion CHANGES.md
@@ -21,11 +21,19 @@ Language Improvements:
- enh(php) highlight variables (#2785) [Taufik Nurrohman][]
- fix(python) Handle comments on decorators (#2804) [Jonathan Sharpe][]
- enh(diff) improve highlighting of diff for git patches [Florian Bezdeka][]

- enh(mathematica) Rework entire implementation [Patrick Scheibe][]
  - Correct matching of the many variations of Mathematica's numbers
  - Matching of named-characters aka special symbols like `\[Gamma]`
  - Updated list of version 12.1 built-in symbols
  - Matching of patterns, slots, message-names and braces

Dev Improvements:

- chore(dev) add theme picker to the tools/developer tool (#2770) [Josh Goebel][]

Parser:

- enh(grammars) allow `classNameAliases` for more complex grammars [Josh Goebel][]

New themes:

- *StackOverflow Dark* by [Jan Pilzer][]
@@ -38,6 +46,7 @@ New themes:
[Jan Pilzer]: https://github.com/Hirse
[Jonathan Sharpe]: https://github.com/textbook
[Michael Rush]: https://github.com/rushimusmaximus
[Patrick Scheibe]: https://github.com/halirutan


## Version 10.3.1
117 changes: 82 additions & 35 deletions docs/mode-reference.rst
@@ -1,4 +1,4 @@
Mode reference
Mode Reference
==============

Types
@@ -23,29 +23,81 @@ Types of attributes values in this reference:
+------------+-------------------------------------------------------------------------------------+


Attributes
----------
Language Only Attributes
------------------------

These attributes are only valid at the language level (i.e., they may only exist on the top-most language object and have no meaning if specified in child modes).


name
^^^^

- **type**: string

The canonical name of this language, e.g. "JavaScript".


case_insensitive
^^^^^^^^^^^^^^^^

**type**: boolean
- **type**: boolean

Case insensitivity of language keywords and regexps. Used only on the top-level mode.


aliases
^^^^^^^

**type**: array
- **type**: array

A list of additional names (besides the canonical one given by the filename) that can be used to identify a language in HTML classes and in a call to :ref:`getLanguage <getLanguage>`.


classNameAliases
^^^^^^^^^^^^^^^^

Review comment (Member):
@allejo Any thoughts on this naming? It seems clear to me... Other ideas were nesting, but that seems more complex:

themes: { aliases: {}}

Reply (Contributor Author):
I'd be fine with classNameAliases. Another suggestion would be styleClassAliases (or even styleAliases) which makes it a bit clearer what we're talking about. You should decide this having the newbie user in mind.

Reply (@joshgoebel, Member, Nov 2, 2020):
className has meaning though because it's the key we use to specify such things already... making className and classNameAliases consistent.

- **type**: object

A mapping table of any custom class names your grammar uses and the supported class names they correspond to. Perhaps your language has a concept of "slots" that roughly correspond to variables in other languages. This allows you to write grammar code like:

::

    {
      classNameAliases: {
        slot: "variable",
        "message-name": "string"
      },
      contains: [
        {
          className: "slot",
          begin: // ...
        }
      ]
    }

The final HTML output will render slots with the CSS class ``hljs-variable``. This feature exists to make it easier for grammar maintainers to think in their own language when maintaining a grammar.

For a list of all supported class names please see the :doc:`CSS class reference
</css-classes-reference>`.


disableAutodetect
^^^^^^^^^^^^^^^^^

- **type**: boolean

Disables autodetection for this language.
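For orientation, a minimal sketch of a grammar function that sets the language-only attributes described above on the top-level object. The language name, the aliases, and the single ``slot`` rule are hypothetical placeholders, not the Mathematica grammar from this PR.

```javascript
// Hypothetical grammar: only the top-level (language-only) attributes matter here.
function exampleLanguage(hljs) {
  return {
    name: "Example",            // canonical name shown to users
    aliases: ["ex", "example"], // extra names for HTML classes and getLanguage
    case_insensitive: true,     // keywords and regexps ignore case
    disableAutodetect: true,    // skipped by automatic language detection
    classNameAliases: {         // custom class name -> supported class name
      slot: "variable"
    },
    contains: [
      // Child modes: language-only attributes would have no meaning here.
      { className: "slot", begin: /#\d*/ }
    ]
  };
}

// The `hljs` argument is not needed for this sketch.
const lang = exampleLanguage(null);
console.log(lang.name, lang.aliases.join(","), lang.disableAutodetect);
// Example ex,example true
```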



Mode Attributes
---------------


className
^^^^^^^^^

**type**: identifier
- **type**: identifier

The name of the mode. It is used as a class name in HTML markup.

@@ -56,16 +108,16 @@ for one thing like string in single or double quotes.
begin
^^^^^

**type**: regexp
- **type**: regexp

Regular expression starting a mode. For example a single quote for strings or two forward slashes for C-style comments.
If absent, ``begin`` defaults to a regexp that matches anything, so the mode starts immediately.


on:begin
^^^^^^^^^^^
^^^^^^^^

**type**: callback (matchData, response)
- **type**: callback (matchData, response)

This callback is triggered the moment a begin match is detected. ``matchData`` includes the typical regex match data: the full match, match groups, etc. The ``response`` object is used to tell the parser how it should handle the match. It can also be used to temporarily store data.

@@ -78,7 +130,7 @@ For an example of usage see ``END_SAME_AS_BEGIN`` in ``modes.js``.
end
^^^

**type**: regexp
- **type**: regexp

Regular expression ending a mode. For example a single quote for strings or "$" (end of line) for one-line comments.

@@ -93,9 +145,9 @@ This is achieved with :ref:`endsWithParent <endsWithParent>` attribute.


on:end
^^^^^^^^^^^
^^^^^^

**type**: callback (matchData, response)
- **type**: callback (matchData, response)

This callback is triggered the moment an end match is detected. ``matchData`` includes the typical regex match data: the full match, match groups, etc. The ``response`` object is used to tell the parser how it should handle the match. It can also be used to retrieve data stored by the ``on:begin`` callback.
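The two callbacks are usually paired: ``on:begin`` stores part of the begin match in ``response.data``, and ``on:end`` calls ``response.ignoreMatch()`` when the end match disagrees with it, which is roughly the idea behind ``END_SAME_AS_BEGIN``. The sketch below assumes that ``response`` shape; the heredoc-style regexes and the ``tryEnd`` driver, which fakes the parser's calls so the example runs on its own, are hypothetical.

```javascript
// A mode that only ends when the end text repeats the tag captured at begin,
// e.g. heredoc-style <<EOT ... EOT. The regexes are illustrative only.
const HEREDOC = {
  className: "string",
  begin: /<<(\w+)/,
  end: /(\w+)/,
  "on:begin": (match, response) => {
    response.data._beginMatch = match[1]; // remember the tag, e.g. "EOT"
  },
  "on:end": (match, response) => {
    if (response.data._beginMatch !== match[1]) {
      response.ignoreMatch(); // not our tag: keep scanning
    }
  }
};

// Hypothetical stand-in for the parser: true if the end match is accepted.
function tryEnd(mode, response, endText) {
  let ignored = false;
  response.ignoreMatch = () => { ignored = true; };
  mode["on:end"](mode.end.exec(endText), response);
  return !ignored;
}

const response = { data: {} };
HEREDOC["on:begin"](HEREDOC.begin.exec("<<EOT"), response);
console.log(tryEnd(HEREDOC, response, "FOO")); // false
console.log(tryEnd(HEREDOC, response, "EOT")); // true
```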

@@ -106,9 +158,9 @@ For an example of usage see ``END_SAME_AS_BEGIN`` in ``modes.js``.


beginKeywords
^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^

**type**: string
- **type**: string

Used instead of ``begin`` for modes starting with keywords to avoid needless repetition:

@@ -140,7 +192,7 @@ Ex. ``class A { ... }`` would match while ``A.class == B.class`` would not.
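Conceptually, ``beginKeywords`` is shorthand for a ``begin`` regex built from the space-separated list, plus a check that skips keywords preceded by a dot. The sketch below approximates that behaviour, assuming plain-word keywords; ``compileBeginKeywords`` and ``matchBeginKeywords`` are hypothetical names, and the regex highlight.js actually generates may differ.

```javascript
// Approximation of turning a beginKeywords string into a begin regex.
// Assumes plain words (no regex metacharacters that would need escaping).
function compileBeginKeywords(beginKeywords, flags = "") {
  const alternatives = beginKeywords.split(" ").join("|");
  // Whole keyword, not followed by "." and ending at a word boundary or whitespace.
  return new RegExp("\\b(" + alternatives + ")(?!\\.)(?=\\b|\\s)", flags);
}

// Returns the first keyword that is not preceded by "." (e.g. skips "A.class").
function matchBeginKeywords(beginKeywords, text) {
  const re = compileBeginKeywords(beginKeywords, "g");
  let m;
  while ((m = re.exec(text)) !== null) {
    if (m.index === 0 || text[m.index - 1] !== ".") return m[1];
  }
  return null;
}

console.log(matchBeginKeywords("class interface", "class A { }"));        // class
console.log(matchBeginKeywords("class interface", "A.class == B.class")); // null
```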
endsWithParent
^^^^^^^^^^^^^^

**type**: boolean
- **type**: boolean

A flag showing that a mode ends when its parent ends.

@@ -169,7 +221,7 @@ This is when ``endsWithParent`` comes into play:
endsParent
^^^^^^^^^^^^^^

**type**: boolean
- **type**: boolean

Forces closing of the parent mode right after the current mode is closed.

@@ -215,7 +267,7 @@ endSameAsBegin (deprecated as of 10.1)
``END_SAME_AS_BEGIN`` mode or use the ``on:begin`` and ``on:end`` attributes to
build more complex paired matchers.

**type**: boolean
- **type**: boolean

Acts as ``end`` matching exactly the same string that was found by the
corresponding ``begin`` regexp.
@@ -244,7 +296,7 @@ and ``endSameAsBegin: true``.
lexemes (now keywords.$pattern)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

**type**: regexp
- **type**: regexp

A regular expression that extracts individual "words" from the code to compare
against :ref:`keywords <keywords>`. The default value is ``\w+`` which works for
@@ -260,7 +312,7 @@ constant that you repeat multiple times within different modes of your grammar.
keywords
^^^^^^^^

**type**: object
- **type**: object / string

Keyword definition comes in two forms:

@@ -273,7 +325,7 @@ For detailed explanation see :doc:`Language definition guide </language-guide>`.
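A rough sketch of how the two ``keywords`` forms and ``$pattern`` fit together: the pattern extracts candidate words, and each word is looked up in a table built from the keyword groups, with the string form treated as a single ``keyword`` group. ``buildKeywordTable`` and ``classifyWords`` are hypothetical helpers; relevance scoring and case-insensitive matching are left out.

```javascript
// Normalize both forms of `keywords` into a Map of word -> class name.
function buildKeywordTable(keywords) {
  const table = new Map();
  const groups = typeof keywords === "string" ? { keyword: keywords } : keywords;
  for (const [className, words] of Object.entries(groups)) {
    if (className === "$pattern") continue; // the pattern is not a keyword group
    for (const word of words.split(" ")) table.set(word, className);
  }
  return table;
}

// Extract "words" with $pattern (default \w+) and classify the known ones.
function classifyWords(keywords, text) {
  const pattern = (typeof keywords === "object" && keywords.$pattern) || /\w+/;
  const re = new RegExp(pattern.source, "g");
  const table = buildKeywordTable(keywords);
  const result = [];
  for (const [word] of text.matchAll(re)) {
    if (table.has(word)) result.push([word, table.get(word)]);
  }
  return result;
}

// String form: every listed word is a plain keyword.
console.log(classifyWords("if else", "if x else y"));
// Object form with a custom $pattern that also allows "$" inside words.
console.log(classifyWords(
  { keyword: "If Module", built_in: "$Version", $pattern: /[\w$]+/ },
  "If[$Version > 12, Module[{x}, x]]"
));
```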
illegal
^^^^^^^

**type**: regexp
- **type**: regexp

A regular expression that defines symbols illegal for the mode.
When the parser finds a match for the illegal expression, it immediately stops parsing the whole language.
@@ -282,7 +334,7 @@ When the parser finds a match for illegal expression it immediately drops parsin
excludeBegin, excludeEnd
^^^^^^^^^^^^^^^^^^^^^^^^

**type**: boolean
- **type**: boolean

Exclude the beginning or ending lexeme from the mode's generated markup. For example, in CSS syntax a rule ends with a semicolon.
However, visually it's better not to color it as part of the rule contents. Having ``excludeEnd: true`` forces the ``<span>`` element for the rule to close before the semicolon.
@@ -291,7 +343,7 @@ However visually it's better not to color it as the rule contents. Having ``excl
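To make the effect concrete, the hypothetical ``renderMode`` function below wraps a mode's text in a ``<span>`` and places the begin or end lexeme outside it when the corresponding flag is set. It only illustrates the resulting markup (assuming the default ``hljs-`` class prefix), not how the emitter is implemented.

```javascript
// Wrap a mode's text in a span; excluded lexemes are emitted outside the span.
function renderMode(mode, beginLexeme, body, endLexeme) {
  const open = `<span class="hljs-${mode.className}">`;
  const before = mode.excludeBegin ? beginLexeme + open : open + beginLexeme;
  const after = mode.excludeEnd ? "</span>" + endLexeme : endLexeme + "</span>";
  return before + body + after;
}

// A hypothetical CSS declaration mode whose terminating ";" stays uncolored.
const rule = { className: "attribute", excludeEnd: true };
console.log(renderMode(rule, "", "color: red", ";"));
// <span class="hljs-attribute">color: red</span>;
```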
returnBegin
^^^^^^^^^^^

**type**: boolean
- **type**: boolean

Returns the just-found beginning lexeme back to the parser. This is used when the beginning of a sub-mode is a complex expression
that should not only be found within a parent mode but also parsed according to the rules of the sub-mode.
@@ -302,7 +354,7 @@ Since the parser is effectively goes back it's quite possible to create a infini
returnEnd
^^^^^^^^^

**type**: boolean
- **type**: boolean

Returns the just-found ending lexeme back to the parser. This is used, for example, to parse JavaScript embedded into HTML.
A JavaScript block ends with the HTML closing tag ``</script>``, which cannot be parsed with JavaScript rules.
@@ -314,15 +366,15 @@ Since the parser is effectively goes back it's quite possible to create a infini
contains
^^^^^^^^

**type**: array
- **type**: array

The list of sub-modes that can be found inside the mode. For detailed explanation see :doc:`Language definition guide </language-guide>`.


starts
^^^^^^

**type**: identifier
- **type**: identifier

The name of the mode that will start right after the current mode ends. The new mode won't be contained within the current one.

@@ -333,7 +385,7 @@ Tags ``<script>`` and ``<style>`` start sub-modes that use another language defi
variants
^^^^^^^^

**type**: array
- **type**: array

Modifications to the main definition of the mode, effectively expanding it into several similar modes,
each having all the attributes from the main definition augmented or overridden by the variants::
@@ -366,10 +418,11 @@ Further info: https://github.com/highlightjs/highlight.js/issues/826
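A minimal sketch of that expansion: each variant is merged over a copy of the parent mode, so attributes set in the variant win. ``expandVariants`` is a hypothetical helper; it assumes a shallow merge and simply omits the ``variants`` key from the resulting modes.

```javascript
// Expand a mode with `variants` into one mode per variant (shallow merge).
function expandVariants(mode) {
  if (!mode.variants) return [mode];
  const { variants, ...base } = mode; // shared attributes, without `variants`
  return variants.map((variant) => ({ ...base, ...variant }));
}

const STRING = {
  className: "string",
  contains: [],
  variants: [
    { begin: /'/, end: /'/ },
    { begin: /"/, end: /"/ }
  ]
};

for (const m of expandVariants(STRING)) {
  console.log(m.className, m.begin.source, m.end.source);
}
// string ' '
// string " "
```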

.. _subLanguage:


subLanguage
^^^^^^^^^^^

**type**: string or array
- **type**: string or array

Highlights the entire contents of the mode with another language.

@@ -381,10 +434,11 @@ The value of the attribute controls which language or languages will be used for
* empty array: auto detection with all the languages available
* array of language names: auto detection constrained to the specified set
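The cases can be summarized as a small dispatch function. ``describeSubLanguage`` and its return shape are hypothetical; the single-string case (one named language, no auto-detection) is assumed from the attribute's "string or array" type.

```javascript
// Decide how a mode's subLanguage value selects the language(s) for its contents.
function describeSubLanguage(subLanguage) {
  if (typeof subLanguage === "string") {
    return { mode: "fixed", languages: [subLanguage] }; // one named language
  }
  if (Array.isArray(subLanguage) && subLanguage.length === 0) {
    return { mode: "auto", languages: "all" }; // every available language
  }
  if (Array.isArray(subLanguage)) {
    return { mode: "auto", languages: subLanguage }; // constrained candidate set
  }
  throw new TypeError("subLanguage must be a string or an array");
}

console.log(describeSubLanguage("xml"));
console.log(describeSubLanguage([]));
console.log(describeSubLanguage(["css", "javascript"]));
```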


skip
^^^^

**type**: boolean
- **type**: boolean

Skips any markup processing for the mode ensuring that it remains a part of its
parent buffer along with the starting and the ending lexemes. This works in
@@ -407,10 +461,3 @@ handle pairs of ``/* .. */`` to correctly find the ending ``?>``::
Without ``skip: true`` every comment would cause the parser to drop out back
into the HTML mode.

disableAutodetect
^^^^^^^^^^^^^^^^^

**type**: boolean

Disables autodetection for this language.

5 changes: 3 additions & 2 deletions src/highlight.js
@@ -174,7 +174,8 @@ const HLJS = function(hljs) {
buf = "";

relevance += keywordRelevance;
emitter.addKeyword(match[0], kind);
const cssClass = language.classNameAliases[kind] || kind;
emitter.addKeyword(match[0], cssClass);
} else {
buf += match[0];
}
@@ -225,7 +226,7 @@ const HLJS = function(hljs) {
*/
function startNewMode(mode) {
if (mode.className) {
emitter.openNode(mode.className);
emitter.openNode(language.classNameAliases[mode.className] || mode.className);
}
top = Object.create(mode, { parent: { value: top } });
return top;
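Both changed call sites use the same lookup: take the alias if the grammar defines one, otherwise keep the original class name. The sketch below isolates that lookup (`resolveClassName` and `toNullObject` are hypothetical names) and illustrates the reasoning behind commit 0653dcd: with a plain `{}`, a class name such as `constructor` resolves to the inherited `Object` constructor instead of itself, whereas an object created with `Object.create(null)` has no inherited keys.

```javascript
// Same lookup as the diff: the alias if one exists, otherwise the name itself.
function resolveClassName(classNameAliases, className) {
  return classNameAliases[className] || className;
}

// Copy a grammar's aliases into an object that has no prototype.
function toNullObject(aliases) {
  return Object.assign(Object.create(null), aliases);
}

const plain = { slot: "variable" };
const safe = toNullObject(plain);

console.log(resolveClassName(safe, "slot"));                // variable
console.log(resolveClassName(safe, "keyword"));             // keyword
console.log(typeof resolveClassName(plain, "constructor")); // function (inherited)
console.log(resolveClassName(safe, "constructor"));         // constructor
```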