Add integrity check for LaTeX special characters #8712
Comments
The workflow for users is as follows:
|
@Siedlerchr @koppor With this new feature, I now receive tons of LaTeX Parsing Errors when using "Check integrity", which irritated me for a while and made me think: Field: File This happens because the JabRef Browser Extension adds linked files as follows:
|
@systemoperator Thank you for letting us know! CC @Zylence |
@systemoperator A new version with a fix will be available in 30 min at #10401. A comment linking to the binaries should then appear there. |
Awesome! I'm gonna check this out. |
I am terribly sorry, I did not check for this use case. (Honestly, I only tested manual input. 😐) |
No worries at all - and thank @systemoperator for the quick report! Background: This is one consequence of our "merge early" development strategy. Instead of asking users to try out PRs, we "just" merge and fix ASAP if something goes wrong. We get more velocity into JabRef - and I think this is good for the tool as a whole. -- Otherwise, we had many PRs piling up, because some details were missing. Then, they got abandoned and never got revived. OK, most of the non-merged PRs were 20% solutions. These currently pile up at https://github.com/koppor/jabref/pulls |
@koppor The mentioned error has gone now. :) I've noticed other things for imported entries, using the JabRef Browser Extension: For DOI fields: For Comment, Comment-username, Abstract, Title and Journaltitle fields: |
TBH, I don't use the integrity check, too much to fix 🙈. Thus, I am relying on feedback here. Currently, we just exclude "verbatim" fields. I think all identifier fields should also be classified as verbatim. On the other hand, DOI etc. should be checked for HTML characters (shouldn't they)? Regarding comment and abstract: they should have proper LaTeX. Maybe we need to allow the commands you mentioned? |
Can you give an example text raising "LaTeX Parsing Error: Math superscript (^) and subscript (_) characters are not allowed in text mode"? |
We could maybe check only the fields which are going to be typeset using LaTeX in the common bibliography styles - to avoid noise? Comment, abstract, ... would not be checked, but title, author, ... would be. |
In my case, I pasted some URLs into the Comments or Comments-username field, which contained the underscores. |
In that case it would be expected to fail, as _ and ^ are only allowed in latex math environments. |
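For illustration, the rule stated above (an unescaped _ or ^ is only valid inside math mode) could be sketched roughly as follows. This is not how JabRef or SnuggleTeX implement it; the class and method names are hypothetical, and the sketch assumes that only $...$ and $$...$$ count as math, so \( ... \) and math environments are not recognized.

```java
// Hypothetical sketch, not JabRef's or SnuggleTeX's actual check.
final class TextModeScriptCheck {

    /** Returns true if an unescaped '_' or '^' occurs outside $...$ or $$...$$. */
    static boolean hasScriptCharOutsideMath(String value) {
        boolean inMath = false;
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c == '\\') {
                i++; // skip the escaped character, e.g. \_ or \$
            } else if (c == '$') {
                // Treat "$$" as a single delimiter (assumption: no empty inline math "$$").
                if (i + 1 < value.length() && value.charAt(i + 1) == '$') {
                    i++;
                }
                inMath = !inMath;
            } else if ((c == '_' || c == '^') && !inMath) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // A URL pasted into a comment field is flagged because of the underscore.
        System.out.println(hasScriptCharOutsideMath(
                "https://en.wikipedia.org/w/index.php?title=Information_extraction"));
        // Subscripts inside math mode are not flagged.
        System.out.println(hasScriptCharOutsideMath("Energy $E_0$ of the system"));
    }
}
```

With a check like this, a URL pasted into a comment field is reported purely because of its underscore, which matches the behaviour described in this thread.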
Sure. It doesn't break anything, it's just informative.
Or only allowing URL characters. (maybe excluding &, ? and #?)
I guess, there would be more commands to consider, which can show up. |
Concerning the title, some authors use extraordinary characters there. 🤔 |
The implementation of the underlying snuggletex is not complete. To add new commands, we'd either need to make PRs to snuggletex or inject new commands as shown here: JabRef#646 (comment) (takeaways section). The latter is probably faster. |
I have pasted URLs into the URL field, where also some parameters exist, which are concatenated with the &. The same applies to the File field, where the JabRef Browser Extension sends the Links, which can also contain "&" characters. Examples: |
Ideas ranked from most to least favorite:
I cannot really come up with a better idea than the one proposed first. May need a little more time to think. |
Okay, but that's the "fault" of the AmpersandChecker which was introduced earlier.
[Edit]: Just realized that won't make much sense for that case, since the ampersands are not part of the query parameter. |
Please don't mess with url fields. Rather exclude them. As far as I know, url is also a verbatim field.
Julian Kirsch ***@***.***> wrote on Fri, 22 Sept 2023, 01:45:
… Found 2 unescaped '&'
Okay, but that's the "fault" of the AmpersandChecker which was introduced
earlier <#9758>.
As the problem seems to be predominantly induced by urls, url-encoding now
ranks higher in my opinion. We could intercept *paste* and additionally
warn the user.
So for example:
https://en.wikipedia.org/w/index.php?title=Information_extraction&oldid=948091115
becomes:
https%3A%2F%2Fen.wikipedia.org%2Fw%2Findex.php%3Ftitle%3DInformation%5fextraction%26oldid%3D948091115
which is manageable.
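A minimal sketch of the URL-encoding idea, using only java.net.URLEncoder from the Java standard library (the Charset overload needs Java 10 or later). The class and method names are hypothetical and nothing here is taken from JabRef. Note that URLEncoder applies form encoding and leaves _ as it is, so the underscore is not turned into %5f as in the hand-written example above.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of encoding a pasted URL; not part of JabRef.
final class UrlEncodeSketch {

    /** Percent-encodes every reserved character, including ':', '/', '?', '=' and '&'. */
    static String encode(String pastedUrl) {
        return URLEncoder.encode(pastedUrl, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String url = "https://en.wikipedia.org/w/index.php?title=Information_extraction&oldid=948091115";
        // prints https%3A%2F%2Fen.wikipedia.org%2Fw%2Findex.php%3Ftitle%3DInformation_extraction%26oldid%3D948091115
        System.out.println(encode(url));
    }
}
```

The encoded value no longer contains a raw &, but it is also no longer a clickable URL, which is one reason to prefer simply excluding verbatim fields from the check.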
|
Then the ampersand checker needs to be updated to exclude verbatim fields. #quickwin |
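As a rough sketch of that quick win, a checker could skip verbatim fields before counting unescaped ampersands. Everything below is hypothetical: the class name, the method, and in particular the set of field names are assumptions for illustration and not JabRef's actual AmpersandChecker or its field model.

```java
import java.util.Locale;
import java.util.Optional;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical sketch; JabRef's real checker and field model differ.
final class AmpersandCheckSketch {

    // Assumed set of verbatim fields, for illustration only.
    static final Set<String> VERBATIM_FIELDS = Set.of("url", "file", "doi");

    // An '&' that is not directly preceded by a backslash.
    static final Pattern UNESCAPED_AMPERSAND = Pattern.compile("(?<!\\\\)&");

    /** Returns a message for non-verbatim fields that contain unescaped '&'. */
    static Optional<String> check(String fieldName, String value) {
        if (VERBATIM_FIELDS.contains(fieldName.toLowerCase(Locale.ROOT))) {
            return Optional.empty(); // verbatim fields are never typeset, so skip them
        }
        long count = UNESCAPED_AMPERSAND.matcher(value).results().count();
        return count == 0
                ? Optional.empty()
                : Optional.of("Found " + count + " unescaped '&'");
    }

    public static void main(String[] args) {
        System.out.println(check("url", "https://example.org/?a=1&b=2")); // skipped
        System.out.println(check("title", "Tom & Jerry & Co"));           // reported
    }
}
```

The lookbehind is deliberately simple: it would treat the & in `\\&` (an escaped backslash followed by an ampersand) as escaped, which a real implementation would need to handle.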
As usual, I would do both ^^ -- PR to https://github.com/davemckain/snuggletex (reference is more for me) However, I am not 100% sure about the error "Undefined command". We can never be package-complete. On the one hand, I assume the sources come from publishers, which "should" use correct commands. On the other hand, we should add the most common commands to avoid "typos" ( |
@koppor I've checked the latest build. I would recommend, allowing underscores in In the Furthermore, in my opinion, it would make sense to adapt the messages in the integrity check so that "LaTeX Parsing Error" is only printed if there REALLY is a LaTeX parsing error. (Initially, I thought something was broken in my file, but actually, everything was fine.) I think a message like "LaTeX Parsing Warning" or something different (like "LaTeX Inconsistency", or even "LaTeX Improvement") would be more appropriate, so that a user does not panic when reading it. Basically, any message in the integrity check suggests improvements in terms of inconsistencies, without anything being broken. (Probably, if something is really broken, like a wrongly inserted "}" or "{" in the bib file, I guess even JabRef cannot properly parse it anymore.) |
@systemoperator I fully agree. Then, my PR #10436 does not work. I need to re-check. Sorry for that many iterations on this. I will also work on the text. I think, On the one hand, we could classify the errors at https://github.com/davemckain/snuggletex/blob/development_1_2_x/snuggletex-core/src/main/resources/uk/ac/ed/ph/snuggletex/core-error-messages.properties between warning and error. On the other hand, always "Warning" feels currently right. The issue with the missing |
@systemoperator I was testing this again today and cannot reproduce your issue. For me, no error related to DOI is raised |
@Siedlerchr After three iterations, we managed. 😅🙈🎉🎉 |
I've checked it with my reference library, with more than 500 entries. Everything is printed properly now. 🎉 |
This issue was inspired by inspecting and working on issues #8673, plurimath/unicode2latex#19, #8650, #8682 and #3644
Problem:
Desired solution:
From Table 1: LaTeX escapable "special" characters: %, $, _, # and &
\%, \$, \_, \# and \&, because this should be handled at text paste.
From Table 2: Predefined LaTeX 2ε Text-mode Commands:
Table 3: Commands Defined to Work in Both Math and Text Mode:
...
- PS. not yet sure about this one. Maybe the ... should be added to the Unicode2LaTeX converter instead. Please exclude this character for now. Work on the others first!
_ and \_), so no need to implement an integrity check for them! :-)
Additional info: