> I was wondering what the policy was on duplicate extensions.

Linguist's policy is loosely detailed in the CONTRIBUTING.md file, but it is specific to the analysis of individual files. In short, we accept duplicates. We rely on samples to train the classifier, and on heuristics, to tell languages apart based on the content of files that share a common extension. This, however, has no influence on what you are reporting. Code blocks are rendered entirely by the markup library, and it will only highlight blocks with the language YOU tell it to use.
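For files (not code blocks) that share an extension, the heuristics mentioned above come down to content checks. Below is a minimal TypeScript sketch that assumes one hypothetical rule for `.md`. The pattern is only an illustration and is not copied from Linguist's `heuristics.yml`. The real pipeline also runs the sample-trained classifier, which this sketch leaves out.

```typescript
// Hypothetical, simplified rule: NOT copied from Linguist's heuristics.yml.
interface Rule {
  language: string;
  pattern: RegExp;
}

// Rules tried in order for files ending in ".md".
const mdRules: Rule[] = [
  // Assumed signal for GCC Machine Description: a line starting with a
  // Lisp-style ";;" comment or a "(define_" form.
  { language: "GCC Machine Description", pattern: /^(;;|\(define_)/m },
];

// Pick a language for .md file content; fall back to Markdown when no rule matches.
function classifyMd(content: string): string {
  for (const rule of mdRules) {
    if (rule.pattern.test(content)) {
      return rule.language;
    }
  }
  return "Markdown";
}

console.log(classifyMd('(define_insn "addsi3" ...)')); // GCC Machine Description
console.log(classifyMd("# Title\n\n**bold**")); // Markdown
```

None of this applies to fenced code blocks, which never go through file classification.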
> I think extensions affect how markdown fenced code works here on GH.

They do; however, they're not the only thing. The language name and aliases are also used. And this is where the crux of your issue is...
> My assumption here is that most people would want `` ```md `` to highlight as markdown, not Lisp.

You are indeed making an assumption, and someone else may be making the inverse assumption. The markup library has no way of knowing whether your assumption or theirs is the intended result. So in cases of ambiguity, it goes for the first in the list, as ordered by the …

So to get the correct highlighting in cases of ambiguity, be more precise: use the language name, or an alias, that YOU want.

I want this block to be highlighted as Markdown, so I use:

```markdown
let a = 123; /* Lisp? */
# markdown?
**bold**
```

I want this block to be highlighted as XML, so I use:

```xml
let a = 123; /* TS? */
<!--xml?-->
<xml>foo</xml>
```

I want this block to be highlighted as TSX, so I use:

```tsx
let a = 123; /* TS? */
import * as React from 'react'
```

I want this block to be highlighted as ASP.NET, but it's a pain to write that out, so I use the alias:

```aspx
<%@ Register Src="~/Account/OpenAuthProviders.ascx" TagPrefix="uc" TagName="OpenAuthProviders" %>
```

I really don't know what to use for this block, so I don't specify a language:
So, in short, YOU are being ambiguous in your request to markup about how to highlight your code block. In turn, you have highlighted a shortcoming of using extensions to indicate the language of a code block. It only comes up with extensions that several languages share. The only solution is to be more explicit about what YOU want; neither Linguist nor markup can guess this.
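The "first in the list wins" behaviour described above can be sketched in a few lines of TypeScript. The language table is hypothetical and heavily simplified (real names, aliases, and extensions live in Linguist's `languages.yml`). The resolver assumes that the language name, aliases, and extensions are all matched against the fence info string and that list order settles ties. The markup library's actual lookup may differ.

```typescript
// Hypothetical, simplified language table; not the real languages.yml data.
interface Language {
  name: string;
  aliases: string[];
  extensions: string[]; // without the leading dot
}

const languages: Language[] = [
  { name: "ASP.NET", aliases: ["aspx"], extensions: ["aspx", "ascx"] },
  { name: "GCC Machine Description", aliases: [], extensions: ["md"] },
  { name: "Markdown", aliases: [], extensions: ["md", "markdown"] },
  { name: "TSX", aliases: [], extensions: ["tsx"] },
  { name: "XML", aliases: [], extensions: ["xml", "tsx"] },
];

// Return the first language, in list order, whose name, alias, or extension
// matches the fence info string; undefined when nothing matches.
function resolveFence(info: string): string | undefined {
  const key = info.trim().toLowerCase();
  for (const lang of languages) {
    if (
      lang.name.toLowerCase() === key ||
      lang.aliases.indexOf(key) !== -1 ||
      lang.extensions.indexOf(key) !== -1
    ) {
      return lang.name; // first match wins, even if a later entry also matches
    }
  }
  return undefined;
}

for (const info of ["md", "markdown", "tsx", "xml", "aspx"]) {
  console.log(`${info} -> ${resolveFence(info)}`);
}
// md -> GCC Machine Description
// markdown -> Markdown
// tsx -> TSX
// xml -> XML
// aspx -> ASP.NET
```

The ambiguous `md` falls through to whichever entry comes first. The full name `markdown` leaves nothing to guess.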
> a) I think some policy might be useful

We've already got one. We accept duplicates, and we use samples to train the classifier, plus heuristics, to make language selection more accurate when analysing files. Code blocks have nothing to do with Linguist, so such a policy would need to be documented in the markup docs. Even then, it really comes down to "YOU are telling markup what language to use, so be more precise about what you want it to do; it can't guess".
> b) I can make a PR to remove `.tsx` from XML and `.md` from GCC Lisp

Don't. This will be rejected.
Hi!
Macro Q: I was wondering what the policy was on duplicate extensions. I imagine there are duplicates for niche languages, but if there's a clear, super popular use of an extension, should the others be removed?
Micro Q: In particular, I see that `.tsx` is included in the XML language, and `.md` in some GCC Lisp flavor.

I think extensions affect how markdown fenced code works here on GH.
But regardless, extensions here will also affect other tools using the provided data.
From what I can tell, the grammars here are loaded into the GitHub app by index, which is case sensitive. That means `.tsx` is used for TS (as it comes before XML), whereas GCC is used for `.md`, as it comes before Markdown.

My assumption here is that most people would want `` ```md `` to highlight as Markdown, not Lisp. Note: the highlighting here on GH does not match how it renders on GH:

Note: see that Lisp wins over Markdown, and TS over XML.
Proposal:

a) I think some policy might be useful.

b) I can make a PR to remove `.tsx` from XML and `.md` from GCC Lisp. There might be others!
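Finding the "others" is mechanical: invert the language-to-extensions mapping and keep every extension claimed by more than one language. Below is a TypeScript sketch over a hypothetical, hand-copied excerpt. The real data is Linguist's `languages.yml`, and loading it would need a YAML parser, which is not shown here.

```typescript
// Hypothetical excerpt of language -> extensions data (not the full languages.yml).
const extensionsByLanguage: Record<string, string[]> = {
  "GCC Machine Description": [".md"],
  Markdown: [".md", ".markdown"],
  TSX: [".tsx"],
  XML: [".xml", ".tsx"],
  TypeScript: [".ts"],
};

// Map each extension to every language that claims it, then keep only the shared ones.
function sharedExtensions(data: Record<string, string[]>): Record<string, string[]> {
  const owners: Record<string, string[]> = {};
  for (const language of Object.keys(data)) {
    for (const ext of data[language]) {
      (owners[ext] = owners[ext] || []).push(language);
    }
  }
  const shared: Record<string, string[]> = {};
  for (const ext of Object.keys(owners)) {
    if (owners[ext].length > 1) {
      shared[ext] = owners[ext];
    }
  }
  return shared;
}

const duplicates = sharedExtensions(extensionsByLanguage);
for (const ext of Object.keys(duplicates)) {
  console.log(`${ext}: ${duplicates[ext].join(", ")}`);
}
// .md: GCC Machine Description, Markdown
// .tsx: TSX, XML
```

Such a list only shows where ambiguity exists. Per the reply above, Linguist keeps these duplicates on purpose and resolves them per file.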