Fix issue 145 and 172 #174

Open
wants to merge 1 commit into
base: master

LeeiFrankJaw

This squashed commit fixes issue [145][1] and three of the points raised
in issue [172][2].  It now supports basic UTF-8 characters and performs
proper case conversion for common Latin letters with diacritics.  At the
same time, it remains backward compatible with most existing syntax.
The following examples demonstrate this.

```
"200 \LaTeX \ae Foö{bar}{\'o \ae}{{\'o}}" smart.upper.case top$
"Perturbations in the A $\Sigma_u^+$ state of Na$_2$" smart.sentence.case top$
```

The above code produces `200 \LaTeX \AE FOÖ{bar}{\'O \AE}{{\'o}}` and
`Perturbations in the A $\Sigma_u^+$ state of Na$_2$. {\H {c}a{d{e}}}o`.
It mostly follows the `x_change_case` procedure implemented in
[_bibtex.web_][3].  One obvious difference is that commands at brace
level 0 (except those for single letters with diacritics) do not undergo
case transformation.  Brace protection is still honored.  For UTF-8
characters and simple math expressions, though, you probably won't need
braces, as the above examples indicate.

[1]: zepinglee#145
[2]: zepinglee#172 (comment)
[3]: https://tug.org/svn/texlive/trunk/Build/source/texk/web2c/bibtex.web?revision=57915&view=markup#l8884
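
To make the brace-level rules above concrete, here is a minimal Python
sketch of the upper-casing behaviour.  It is not the `.bst`
implementation from this PR: the function name `sketch_upper_case`, the
`LETTER_COMMANDS` table, and the way a "special character" group is
detected are all assumptions made for illustration, and math mode is
ignored entirely.

```python
import re

# Hypothetical table: commands that stand for a single letter and have an
# uppercase counterpart.  The real style may cover a different set.
LETTER_COMMANDS = {"ae": "AE", "oe": "OE", "o": "O", "l": "L", "aa": "AA"}

# A control word (\LaTeX) or a control symbol (\').
COMMAND = re.compile(r"\\([A-Za-z]+|.)", re.DOTALL)


def sketch_upper_case(text):
    """Upper-case text at brace level 0 and inside level-1 groups that
    start with a backslash; leave every other braced group untouched."""
    out = []
    depth = 0
    in_special = False  # inside a level-1 group of the form {\...}
    i = 0
    while i < len(text):
        ch = text[i]
        convert = depth == 0 or (depth == 1 and in_special)
        if ch == "{":
            depth += 1
            if depth == 1 and text.startswith("\\", i + 1):
                in_special = True
            out.append(ch)
            i += 1
        elif ch == "}":
            if depth == 1:
                in_special = False
            depth = max(depth - 1, 0)
            out.append(ch)
            i += 1
        elif ch == "\\" and COMMAND.match(text, i):
            m = COMMAND.match(text, i)
            name = m.group(1)
            # Only single-letter commands change case; \LaTeX stays as is.
            if convert:
                name = LETTER_COMMANDS.get(name, name)
            out.append("\\" + name)
            i = m.end()
        else:
            out.append(ch.upper() if convert else ch)
            i += 1
    return "".join(out)


print(sketch_upper_case(r"200 \LaTeX \ae Foö{bar}{\'o \ae}{{\'o}}"))
# -> 200 \LaTeX \AE FOÖ{bar}{\'O \AE}{{\'o}}
```

The sketch reproduces the first example: `{bar}` and `{{\'o}}` are
protected, `{\'o \ae}` is converted as a special character, and `\LaTeX`
at brace level 0 is left alone.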

Squashed commit of the following:

commit bc1574f
Author: Lei Zhao <[email protected]>
Date:   Mon Dec 2 10:46:11 2024 +0800

    Rewrite normalize.page.range again

commit b54d856
Author: Lei Zhao <[email protected]>
Date:   Fri Nov 29 03:31:35 2024 +0800

    Rewrite normalize.page.range

commit bdb9fdd
Author: Lei Zhao <[email protected]>
Date:   Thu Nov 28 08:22:00 2024 +0800

    Implement functions for texchar semantics

commit a1d3f5b
Author: Lei Zhao <[email protected]>
Date:   Wed Nov 27 17:26:57 2024 +0800

    Support Latin Extended-A

commit 775f797
Author: Lei Zhao <[email protected]>
Date:   Wed Nov 27 16:33:01 2024 +0800

    Implement smart.upper.case

commit 20bffbd
Author: Lei Zhao <[email protected]>
Date:   Sat Nov 23 00:46:39 2024 +0800

    Support polymorphism when tokenizing

commit 31ef32e
Author: Lei Zhao <[email protected]>
Date:   Thu Nov 21 21:57:27 2024 +0800

    Update tests for Dublin Core entry

commit 4abe601
Author: Lei Zhao <[email protected]>
Date:   Thu Nov 21 09:08:03 2024 +0800

    Increase compatibility of font selection

commit aad1cb9
Author: Lei Zhao <[email protected]>
Date:   Thu Nov 21 06:48:52 2024 +0800

    Use page.range.separator

    Rewrite `hyphenate` and rename it to `normalize.page.range`.  Use
    `page.range.separator` to configure separator in page ranges.  Also
    update DocStrip options for added or modified configuration variables.

commit 0450382
Author: Lei Zhao <[email protected]>
Date:   Thu Nov 21 04:49:56 2024 +0800

    Follow the existing convention

    Follow the convention of using functions to define constants

commit cf3c5c6
Author: Lei Zhao <[email protected]>
Date:   Thu Nov 21 02:06:25 2024 +0800

    Refactoring

     * Now `is.all.lower` returns true for empty strings.  Following the
       convention of modern predicate logic, it assumes no existential
       import.  Update functions which depend on it.

     * The second argument of the return value of `split.first.char.from.str`
       is of polymorphic type.  It returns an empty string for an empty
       string instead of a null char.

     * Some functions are rewritten to enable short-circuit evaluation.

commit 352b89b
Author: Lei Zhao <[email protected]>
Date:   Wed Nov 20 19:30:07 2024 +0800

    Do some refactoring and renaming

commit bbf5add
Author: Lei Zhao <[email protected]>
Date:   Wed Nov 20 07:57:18 2024 +0800

    Enable lowercase.word.after.colon by default

    Also update tests for this

commit 6a7c3bc
Author: Lei Zhao <[email protected]>
Date:   Wed Nov 20 07:49:36 2024 +0800

    Update tests for smart.sentence.case

commit b3c9d79
Author: Lei Zhao <[email protected]>
Date:   Wed Nov 20 06:40:38 2024 +0800

    Add basic UTF-8 support

commit 456687e
Author: Lei Zhao <[email protected]>
Date:   Sat Nov 16 05:53:13 2024 +0800

    Remove ignore.extra.interword.space

    This feature is extraneous since the extra spaces are already
    preprocessed by BibTeX.

commit b15d673
Author: Lei Zhao <[email protected]>
Date:   Thu Nov 14 20:33:20 2024 +0800

    Improve smart.sentence.case.lower.token

commit d7e7f53
Author: Lei Zhao <[email protected]>
Date:   Thu Nov 14 19:18:29 2024 +0800

    Basically finish the smart lowercase feature

commit 5da7b11
Author: Lei Zhao <[email protected]>
Date:   Mon Nov 11 22:22:24 2024 +0800

    Update the source dtx file

commit dc88ba8
Author: Lei Zhao <[email protected]>
Date:   Mon Nov 11 20:38:39 2024 +0800

    Add en.dash.in.pages option

    Also process UTF-8 en dash (–)
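
The page-range commits in the log above (`normalize.page.range`,
`page.range.separator`, and the handling of the UTF-8 en dash) can be
illustrated with a small Python sketch.  This is only an assumption
about the intended behaviour, not a port of the `.bst` code: the name
`normalize_page_range` and its `separator` parameter are hypothetical
stand-ins for the style's function and configuration variable.

```python
import re

EN_DASH = "\u2013"


def normalize_page_range(pages, separator=EN_DASH):
    """Replace every run of ASCII hyphens or en dashes, together with any
    surrounding spaces, by the configured separator."""
    return re.sub(r"\s*[-\u2013]+\s*", separator, pages.strip())


print(normalize_page_range("123--130"))
print(normalize_page_range("123 - 130", separator="--"))
```

Under these assumptions, `123-130`, `123--130`, and `123–130` all
normalize to the same string, and a single page such as `45` is returned
unchanged.
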
@zepinglee
Owner

Thanks! However, I may need some time to check whether there are any other issues.
