org-utf-to-xetex

About

If you want to completely replace a Unicode-friendly WYSIWYG text editor like Microsoft Word or OpenOffice with Org Mode for documents that are primarily Latin script then this package will help you do it.

For Skilled LaTeX Users: Please start here.

Read Me First

You wrote Have a great day 😄 in your primarily Latin script Org-Mode file. Then you export it to PDF. Instead of seeing a cute smiling face, you see a white box instead: Have a great day □. If you are in this situation, then you might be interested in reading on.

Org-Mode exports to many formats, and each format has different strengths. When Org-Mode exports your document, it abstracts away the differences between destination formats: you specify the intended character in Org-Mode’s markup and Org-Mode chooses the right representation for the destination. Org-Mode provides this abstraction through its Symbols mechanism. A great example here is the smiley face.

In Org-Mode, the smiley face is represented by the string \smiley. When your document is exported, it gets converted to the correct representation for its destination. Here are some example conversions (language, output):

  • LaTeX: \smiley{}
  • HTML: ☺
  • ASCII: :-)
  • UTF-8: ☺

You will get a smiley face in every export. This mechanism should get you what you want. For my computer, this wasn’t enough.

I am pretty lazy: I want to write a document once and have it look pretty close to its final version. I don’t want to have to use a Symbol for every non-Latin character, and there aren’t enough Symbols anyway. Using UTF-8 files makes it easy to use whatever character you want, so that is the right place to start.

The list of supported characters is simple: you can use any Unicode character. Once you find the right font, it looks good in a text file, just how you would expect it. When you export to HTML, it “just works.” On my box, though, it didn’t “just work” for PDF. The solution is to switch to a different LaTeX compiler. It lets you generate PDFs just as easily as with every other Org-Mode export: all of your characters show up correctly.

Org-Mode makes it easy to use the same “Unicode Everywhere” workflow by switching from the PDFLaTeX compiler to the XeTeX compiler. To use this package, you must use XeTeX. Most of us are switching here from PDFLaTeX. Once you make the switch to XeTeX, and configure your system using this package, your PDFs print the same characters as your text files. If you are interested in that workflow, then you might be interested in this package.

Overview

org-utf-to-xetex is a little package that teaches Org-Mode and XeTeX what fonts to use for some of the Unicode characters in your document. This is a little package because most of the steps happen in your LaTeX configuration. This package provides helpful functionality for setting up your LaTeX compiler for Unicode characters and, most importantly, configuring your Org LaTeX exporter to use it. The critical functionality, though, is provided by your LaTeX compiler, as you will see in the workflow steps below. Read on to see the empty white box problem and how this package resolves it.

Using the “out of the box” default font settings for LaTeX makes most of your PDF documents look great because 99% of the characters that you use are Latin and the default font supports all of them. The problem is the 1% that it doesn’t support, which are most likely non-Latin Unicode characters. Instead of the character that you expected to see, you see an empty white box. Here is an example of the Org-Mode document, the intermediate LaTeX code, and the resulting PDF.

Workflow Without This Package:

| File In Workflow | Screenshot            |
|------------------+-----------------------|
| Org-Mode Source  | /images/orgfile.png   |
| LaTeX Source     | /images/latexfile.png |
| Generated PDF    | /images/pdffile.png   |

That clearly isn’t what you expected.

When your LaTeX compiler created the PDF, it used the default font. But that font doesn’t handle the Unicode character you wrote. Your PDF wants you to know that it tried to show you something for that character but could not. It tells you by showing you an empty white box.

This is pretty common because fonts do not and cannot cover all of the Unicode symbols out there (there are too many). The solution is to specify a different font to handle the characters that the default font doesn’t know about. This package teaches Org-Mode and XeTeX how to do that.

Now your Org-Mode document and generated PDF should look something like this.

Workflow With This Package:

| File In Workflow                       | Screenshot                         |
|----------------------------------------+------------------------------------|
| Org-Mode Source Without Prettification | /images/orgfiledonenotpretty.png   |
| Org-Mode Source With Prettification    | /images/orgfiledonepretty.png      |
| LaTeX Source With Font Commands        | /images/latexfiledone.png          |
| PDF With Correct Fonts                 | /images/pdffiledone.png            |

Examples

Requirements And Compatibility

  • Compiler: XeTeX
  • Distribution: An OS Specific TeX Distribution
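  • Org-Mode Compatibility:
    | Version   | Compatible?                                                                              |
    |-----------+------------------------------------------------------------------------------------------|
    | 9.x       | Yes                                                                                      |
    | 8.x       | Yes                                                                                      |
    | Below 8.x | No: 8.0 introduced the new exporter framework with which you would use this package.     |
  • Emacs Compatibility:
    | Version    | Compatible? |
    |------------+-------------|
    | 26.x       | Yes         |
    | 25.x       | Yes         |
    | Below 25.x | No          |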

Installation

Download this package to ~/src/.

Add the following to your init file in order to:

  • Add it to your load path.
  • Load it.
  • Add it to your Org-Mode hook.
(add-to-list 'load-path "~/src/org-utf-to-xetex")
(require 'org-utf-to-xetex)
(add-hook 'org-mode-hook #'org-utf-to-xetex-prettify)

Workflow

First stop here, take a few breaths, then seriously consider this:

Initially setting up this workflow might feel intimidating, but remember that #1 if I can set it up, then anybody can set it up, and #2 learning this technology will ultimately empower you to use Org-mode with any Unicode characters from here forward. This technology completely frees you from WYSIWYG editors and lets you indeed “Organize Your Life In Plain [Unicode] Text” for the rest of your life. You can do it!

Here are the steps to use this package, starting from the bottom layer with XeTeX all the way up to the top layer with Org-Mode.

TeX

If you followed along above, then you’ve already installed a TeX distribution. Otherwise, read above and install the TeX distribution for your operating system.

Start the update utility and update everything. On macOS, it is called TeX Live Utility.

Be sure to read the “Introduction To [MacTeX|MikTeX|Your Distribution …]”.

Create your TeX configuration resources. The following are for MacTeX, and you can adapt them to your local distribution. Whether you are new to TeX or not, it is always good to back up the original configuration and store your configuration resources in Git.

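# Back up the distribution's configuration before changing anything.
cd /usr/local/texlive/2019
ls
sudo cp texmf.cnf "texmf.cnf-$(date '+%Y_%m_%d__%H_%M_%S')"
ls
# Start from a fresh clone of your personal TeX tree (this deletes any existing ~/src/texmf).
cd ~/src
rm -rf ./texmf
git clone github:yourname/texmf.git
cd ~/src/texmf
# Create the directory and the empty package file for your personal configuration.
mkdir -p tex/latex/yourname
touch tex/latex/yourname/yourname.sty
# Point TeX at your personal tree and rebuild the filename database.
sudo tlmgr conf texmf TEXMFHOME ~/src/texmf
sudo mktexlsr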

Now configure your default fonts for your PDF documents. Out of the box, you can configure the following fonts:

Main Font
Default for all text
Sans Font
Characters without serifs.
Mono Font
Monospaced characters like code, for example.

You probably already have a preferred font for these kinds of characters. If you don’t, you can find plenty of options online. When you are ready to configure your default fonts you can specify them easily by name like this:

\setmainfont{DejaVu Serif}
\setsansfont{DejaVu Sans}[Scale=MatchLowercase]
\setmonofont{Hack}[Scale=MatchLowercase]
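One place these lines can live is the yourname.sty file created earlier. The following is only a minimal sketch of such a file, assuming that the DejaVu and Hack fonts are installed and that you want the package to load fontspec itself. The package date and description are placeholders.

```latex
% yourname.sty: a minimal sketch of a personal package (names are placeholders)
\NeedsTeXFormat{LaTeX2e}
\ProvidesPackage{yourname}[2019/01/01 Personal font configuration]

% fontspec provides \setmainfont and friends; it needs XeTeX or LuaTeX
\RequirePackage{fontspec}

% Default fonts for the whole document
\setmainfont{DejaVu Serif}
\setsansfont{DejaVu Sans}[Scale=MatchLowercase]
\setmonofont{Hack}[Scale=MatchLowercase]

\endinput
```

Loading fontspec here means a plain LaTeX document only needs \usepackage{yourname}. If fontspec is also loaded elsewhere without options, the second request is simply skipped.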

Create a new file new.ltx with the following LaTeX code:

\documentclass{article}
\begin{document}
Hello World

\texttt{code goes here}.
\end{document}

Play with it in the LaTeX editor included with your distribution until you are comfortable creating a PDF using XeTeX. On macOS the editor that comes with MacTeX is TeXShop. When you are using TeXShop be sure to start it from the command line. Open the PDF.

It should look something like this:

/images/typesettingcheck.png
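To check that XeTeX also finds your personal package, you could try a variant of new.ltx along these lines. It is a sketch that assumes yourname.sty lives in the tree that TEXMFHOME points at and that it loads fontspec and sets your default fonts.

```latex
% new.ltx, extended to load the personal package (compile with xelatex)
\documentclass{article}
% Assumption: yourname.sty is found through TEXMFHOME
\usepackage{yourname}
\begin{document}
Hello World

% One sample each of the sans and mono fonts
\textsf{sans text goes here} and \texttt{code goes here}.
\end{document}
```

If the compiler reports that yourname.sty cannot be found, revisit the TeX configuration steps above before going any further.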

Now you have enough configured that your PDF should look right when you create it. This is a requirement, not a suggestion: ensure that everything looks right before moving forward, because it is the only way to know that your system is working correctly before configuring additional fonts. Verifying that the toolchain works correctly step by step is critical for being able to create a minimal, complete and verifiable example (MCVE) should you ever run into any problems (which do happen).

You’ll use this document throughout the rest of the configuration.

Configure your LaTeX editor until you feel good about it because getting comfortable here will make the whole process pleasant and even fun.

Org-mode

In Org-Mode, change the LaTeX compiler to XeTeX. Force Org-Mode to produce PDFs.

(setq org-latex-compiler "xelatex")
(setq org-latex-pdf-process '("latexmk -xelatex -quiet -shell-escape -f %f"))

As this article explains, XeTeX uses the fontspec package instead of inputenc or fontenc, so add ("" "fontspec") to org-latex-packages-alist:

(add-to-list 'org-latex-packages-alist '("" "fontspec"))

Use your personal LaTeX configuration package (the STY file you created up above) by adding it to your default Org-mode package list:

(add-to-list 'org-latex-packages-alist '("" "yourname"))

Now recreate the test document above using Org-mode instead of LaTeX.

Create a file new.org with the following Org-Mode code:

Hello, world.

~code goes here~.

Hit C-c C-e l L to look at the generated LaTeX code in the newly created buffer named *Org LATEX Export*. It will contain a lot of code, but look for the code that is identical to the sample file you created above. Although you do not need to become a LaTeX expert in order to use Org-Mode and this package, you should start to get comfortable looking at it because it will become an important part of your PDF creating life now.

Now try exporting the Org file to a PDF and immediately opening it by hitting C-c C-e l o.

The PDF document should be identical to the PDF that you compiled from the LaTeX file above:

/images/typesettingcheck.png

Now you have enough configured that your PDF should look right when you create it. This is a requirement, not a suggestion: make sure that everything looks right before moving forward, because it is the only way to know that your system is working correctly before configuring additional fonts.

You’ll use new.org throughout the rest of the configuration.

Configure Emacs until you feel good about using Org because getting comfortable here will make the whole process pleasant and even fun.

Now that both your LaTeX and Org toolchain are working correctly, you can move forward and configure this package.

org-utf-to-xetex

Start by installing this package using the directions above.

Add some Unicode characters to both new.ltx and new.org. For example, A 我-⍋+☀APPLE🙋ZEBRA. Compile them. White boxes will appear for some of the characters you entered. This is how you know that XeTeX doesn’t know what fonts to use for all of the Unicode blocks right now.
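If you would rather confirm the missing characters from the compiler output than by scanning the PDF, one option is to raise \tracinglostchars in new.ltx. This is only a sketch. It assumes a reasonably recent XeTeX, where a value of 2 is meant to show “Missing character” messages on the terminal as well as in the log file. On older releases you may need to look in new.log instead.

```latex
% new.ltx with a few non-Latin characters (compile with xelatex)
\documentclass{article}
\usepackage{fontspec}
% Ask TeX to report characters that the current font cannot render.
% Assumption: your engine shows these on the terminal when the value is 2;
% otherwise search the .log file for "Missing character".
\tracinglostchars=2
\begin{document}
A 我-⍋+☀APPLE🙋ZEBRA
\end{document}
```

Each reported character is one that still needs a font configured for its Unicode block.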

The reason I had you work with the LaTeX document again is simply to continue to help you get comfortable with it. It’s just for the fun of it right now. From here forward, though, you’ll only work with new.org.

For every empty white box that you want to be replaced with a real character, you’ll need to go through the following steps. This example walks through the entire process for the character 🙋.

Wrap 🙋 in the macro from this package by highlighting it and calling org-utf-to-xetex-insert-or-wrap-with-macro. The following images show how your buffer should look with the visualization options configured for:

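| Prettified-Mode | Org Hide Macros | Screenshot                                   |
|-----------------+-----------------+----------------------------------------------|
| True            | True            | /images/workflow-wrap-pretty-hide.png        |
| False           | True            | /images/workflow-nowrap-nopretty-hide.png    |
| False           | False           | /images/workflow-wrap-nopretty-nohide.png    |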

Move your cursor to the first line of the document. Install the macro from this package by calling the org-utf-to-xetex-insert-setup-file-line function. With the cursor on the #+SETUPFILE line that was just inserted, hit C-c C-c so that Org-Mode will refresh its setup. Now your document can use the macro.

/images/workflow-macro-install.png

Look back at the test document and the PDF it created. Each one of those characters that is rendered as an empty square box needs to have a font configured for its Unicode block. In order to configure the Unicode block, you need to know the block name. Identify the Unicode block for the character 🙋 by placing the cursor in front of it and calling org-utf-to-xetex-get-unicode-block-for-string-char-after.

/images/workflow-get-unicode-block.png

The name of the Unicode block will appear in the Minibuffer and also *Messages*. This package ignores most Latin characters. So if you inspect a Latin character, you will see a message explaining that this package ignores Latin characters. It looks like this when you attempt to use it on the character a:

/images/unicode-block-for.png

That means you have nothing more to do here. There is nothing that you need to do to configure the font for this character. However, if this package cares about that character, then it will tell you its Unicode block name. Take note of it because you will use it later.

Find a font that XeTeX should use for rendering the character 🙋. An easy way to find one is to ask Emacs what font it is using for that character. Place your cursor on that character and call M-x describe-char.

/images/workflow-describe-char.png

Take note of it because you will use it later.

Tell XeTeX what font to use for characters in this Unicode block. This package creates XeTeX commands to help you configure new LaTeX fontcommands for the character’s Unicode block. They follow a standard format like you see in the example below. You can create a buffer with commands for every block name by calling M-x org-utf-to-xetex-command-for-every-block for convenience and reference, but you won’t need them all, only the one for 🙋's block: Emoticons. Here is the code you will use by copying it:

\newfontfamily\Emoticons{font}
\DeclareTextFontCommand{\textEmoticons}{\Emoticons}

Add these to your custom package, the file yourname.sty.

Specify what font you decided to use for this block. Here is an example from my configuration for the Emoticons block using Symbola, which includes a lot of characters. Here is the code you will use by copying it:

\newfontfamily\Emoticons{Symbola}
\DeclareTextFontCommand{\textEmoticons}{\Emoticons}

Add these to your custom package, the file yourname.sty, and either use Symbola or replace it with another font that you like.
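If you want to try the fontcommand on its own before going back to Org-Mode, a standalone LaTeX file like the following should be enough. It is a sketch that assumes the Symbola font is installed, and it defines the two commands inline only so that the file is self-contained. In the real setup they stay in yourname.sty.

```latex
% Standalone check of the Emoticons fontcommand (compile with xelatex)
\documentclass{article}
\usepackage{fontspec}
% In the real setup these two lines live in yourname.sty.
\newfontfamily\Emoticons{Symbola}
\DeclareTextFontCommand{\textEmoticons}{\Emoticons}
\begin{document}
% Only the wrapped character switches to Symbola; the rest uses the main font.
Have a great day \textEmoticons{🙋}
\end{document}
```

If the character renders here but not in the PDF exported from Org-Mode, the problem is more likely in the Org-Mode layer than in your font configuration.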

Compile new.org again and view its code with C-c C-e l L.

Open that buffer and verify that your character 🙋 is specified with the correct fontcommand. It should look like this:

\textEmoticons{🙋} (Joy)

Viewing this generated LaTeX is meant to continue the gentle introduction to LaTeX. Equally as important, you can use it to verify that things are working correctly so that you can confidently use and rely on this toolchain moving forward.

Return to new.org. Create a PDF for it by hitting C-c C-e l o. A PDF is created and opened. It should now render the character 🙋 correctly instead of using a white box.

This is what it takes to teach Org-Mode and XeTeX to use the correct font for a Unicode character in a single Unicode block.

After setting up XeTeX to handle all of the Unicode blocks that you typically use, you’ll be converting all of your documents to Org-Mode in no time. Also, consider that now you’ve got a working XeLaTeX toolchain set up that you can use not only from Org-Mode but also from Pandoc: you’ve got a lot of great ways to publish moving forward.

Have fun and be well!

Public API Features

First
Play around with them. See what you can do with them.
Second
Use them to configure your system.
Third
If you are really curious then read their source code by placing the cursor on their name, hitting C-h f and hitting enter, placing the cursor on the underlined filename org-utf-to-xetex.el, and hitting enter again. Now you are presented with a buffer containing the source code and the cursor is positioned on that function. To return to wherever you came from, hit C-x b enter.

API:

| Goal | Function | Documentation |
|------+----------+---------------|
| What Unicode block does the character after the cursor live in? | org-utf-to-xetex-get-unicode-block-for-string-char-after | This is the Unicode block name for this character. |
| What Unicode block does this character live in? | org-utf-to-xetex-get-unicode-block-for-string, str | This Unicode block name is used for the LaTeX fontcommands. |
| Tell XeTeX about the Unicode block for some characters (so this package knows what font to use). | org-utf-to-xetex-string-to-xetex, str | Provides a LaTeX string with the font environment you want. |
| Wrap some text with the package macro, or just insert it. | org-utf-to-xetex-insert-or-wrap-with-macro | See goal. |
| Make the Org-Mode markup for this package easier to read. | org-utf-to-xetex-prettify | Use prettify-symbols-mode and org-hide-macro-markers to hide parentheses. Add to org-mode-hook. |
| Tell what fonts to use for what kinds of characters. | org-utf-to-xetex-command-for-every-block | Pop up a window with commands necessary for every Unicode block. |
| Tell your Org-Mode document to load this package’s macro. | org-utf-to-xetex-insert-setup-file-line | See goal. |

Verification

This package is working correctly when:

  • All of the tests pass.
  • You’ve configured enough font blocks to cover the characters in your source document and they appear correctly in the PDF.

Here is how to run the tests:

  • Go to your command line.
  • Verify that Emacs is in your path. You can do that by running the command emacs --version. You should get a message that looks like this: /images/emacs-in-path.png
  • Run:
    emacs -batch \
          -l ert \
          -l ~/src/org-utf-to-xetex/org-utf-to-xetex.el \
          -l ~/src/org-utf-to-xetex/org-utf-to-xetex-test.el \
          -f ert-run-tests-batch-and-exit
        

The test report should say that all of the tests ran as expected.

For example

Ran 8 tests, 8 results as expected (2018-06-26 21:16:34-0500)

Read Me Later

What Is The Value Of Setting Up Your XeLaTeX Toolchain?

Once you’ve set up your XeLaTeX you can use it for the rest of your life.

You can use it with straight LaTeX.

You can use it from Org-Mode.

You can also use it from any markup language that compiles down to LaTeX. For example, you can use it with Pandoc and Markdown. Read more about it here.

Leveraging Your Investment In XeLaTeX With Pandoc

Pandoc is a universal document converter. It is known for being a super high-quality piece of software, and it works well for just about anything involving document conversion. If you’re unfamiliar with it, then now is the time to dig in. In particular, read more about the first-class LaTeX support.

LaTeX markup can be used inside of Pandoc Markdown documents. It simply works just as you would expect. When you use XeLaTeX behind the scenes, it will work identically to how org-utf-to-xetex compiles PDFs. For example, consider the example used in the workflow section. Here is how you would write it with Pandoc Markdown.

Create a file hi.md. Paste into it the following.

Hi 🙋!

Install Pandoc and then try compiling it like this (assuming you’ve already got your XeLaTeX setup working).

pandoc --from=markdown hi.md --to=latex --pdf-engine=xelatex -o hi.pdf

You’ll immediately get a warning message telling you what you already know from configuring this package: XeLaTeX can’t find the character in the current main font.

[WARNING] Missing character: There is no 🙋 (U+1F64B) in font [lmroman10-regular]:mapping=tex-text;!

The solution is to update the test file with the LaTeX information that XeLaTeX needs to choose the correct fonts. You already know how to do this because you made the same configuration change using this package. If you dug a little further into the workflow process’s intermediate steps, you’d probably recognize this as the same code that is generated when you compile your Org-Mode file to LaTeX.

---
header-includes:
- \usepackage{polyglossia}
- \usepackage{listings}
- \usepackage{yourname}
---

Hi \textEmoticons{🙋}!

Recompile the file, and you are presented with a PDF just as you expect.

/images/pandoc-hi.png

Aside:

Pandoc provides command-line abstractions so you don’t even need to know the LaTeX markup required to configure the LaTeX feature. For example you can specify the command-line argument to configure the font. You can read more about how Pandoc Markdown works with XeLaTeX here for Creating a PDF and configuring other Variables for LaTeX.

Making Pandoc available with XeTeX is genuinely a powerful and exciting toolchain for your publishing workflow. Whether you use Org-Mode or not, it is worth setting up XeLaTeX and Pandoc. If you ever needed to migrate off of Org-Mode, Pandoc would be a great place to go.

What Languages Use Latin Characters?

See here.

Unicode And You

Learning more about Unicode will serve you well beyond using this package. Here are some fun ways to explore Unicode.

Code Charts
Click on a code block and see the characters that live there. This is useful when you find the block for characters that you are not familiar with, and you want to see what other characters are in there. Remember that you can use org-utf-to-xetex-get-unicode-block-for-string to get the block for any Non-Latin character. It was fun to see the APL Symbols in the Miscellaneous Technical Block.
The Story Of A Unicode Emoji
Ostensibly only about Unicode Emoji but serves as a great introduction to just about every interesting aspect of Unicode.
The unicode-fonts Package
Configures Emacs with the font to use for each Unicode block. Its default configuration chooses good defaults, so your job is only to install the fonts themselves. After you have found fonts that you like, you can use this package to specify the same font for XeTeX, resulting in a “What You See Is What You Get” experience from Emacs to PDF.
The view-hello-file Function
Call it to “Display the HELLO file, which lists languages and characters.” This is a fun way to learn more about characters using describe-char and org-utf-to-xetex-get-unicode-block-for-string-char-after.

Kinds Of Users

If you are reading this, then it is safe to say that you are an Org-Mode user. You doubtless fit into one of the following profiles:

  • You are not a LaTeX and XeTeX user, but you are willing to set up Org-Mode for both and get very comfortable with them.
  • You are already a LaTeX and XeTeX user and have already set up Org-Mode for both. You are very comfortable with both.

This guide attempts to be useful for any level of Org-Mode, LaTeX, and XeTeX users. If you aren’t yet comfortable, then please know that:

  • It is worth learning because you will use it for the rest of your life.
  • They are all relatively easy to learn.

Once you get comfortable with the tools, then the workflow for this package will feel simple to you.

Until you reach that point, please take your time and learn at your own pace. Don’t hesitate to contact me with any questions. Once you get everything set up right you’ll be very happy to have first-class Unicode support through your entire publishing workflow.

If you are already an advanced user then you might value reading this section for skilled LaTeX users.

For Comfortable Org-Mode Users

This section aims to capture an imagined conversation between another Emacs+Org-Mode user who wants to know more about this package and me.

Should I Learn LaTeX To Use Org-Mode In General?

80% of the time, when you are using Org-Mode, you should never have to learn how to use LaTeX. Org-Mode provides an abstraction away from the publishing format.

For example, text marked as bold is automatically converted to the destination format’s markup for bold. Also, consider how the Org-Mode Symbols mechanism is used for abstracting away common symbols: one scenario for them is expanded upon here. Org-Mode, of course, abstracts away much more than the two examples listed here.

Given that LaTeX is such a rich, deep, and at times intimidating platform: 80% of the time, you should never need to learn LaTeX when you are using Org-Mode because it would be a poor use of your time. Org-Mode saves you a lot of time.

When Might I Learn LaTeX To Use Org-Mode In General?

You’re never going to learn LaTeX for using Org-Mode in general: it will only ever be because you want to do something with the LaTeX Export mechanism. For most of us, that means using it to create a PDF.

Typically you reach this point when your generated PDF doesn’t look how you want it to look. For example, the document size is Letter instead of A4, the font is wrong, or the table is going off the edge of the page.

That everything “Just Works” most of the time for most people is the power of Org-Mode. The second that it doesn’t “Just Work” for us is when learning some LaTeX changes from being specialized information to general information that we need to know immediately.

Here are some examples that you might encounter relatively quickly:

  • Where to add a package that you found on the Internet.
  • How to make your tables look right.
  • How to make your images look right.
  • How to insert a horizontal rule.

Configuring your Org-Mode document and its LaTeX exporter can seem both simple and complicated at the same time.

On the one hand, it is simple because you only need some understanding of how to use your publishing format. For example, if you publish to a format limited to a width of 80 characters, there is no way to ignore that limit. Org-Mode is an abstraction for its publishing formats, so it is your responsibility as the publisher to understand the destination formats.

On the other hand, it isn’t straightforward. You start using Org-Mode so that you can write instead of fiddling around with the underlying publishing mechanisms. Have you looked for the options for how to configure your LaTeX tables? They are not everyone’s idea of how to spend the next thirty minutes of learning.

The key point to reflect on here is that simple doesn’t mean simplistic. Every publishing mechanism is non-trivial and requires effort to utilize, be it LaTeX, MS Word, or even HTML. Based on my experience, I’ve found that learning and using the simplest LaTeX configurations like the ones given above quickly does two things for you:

  • Bolsters your confidence in using LaTeX.
  • Opens the door for you to explore and use much of the rich and powerful LaTeX packages available for your publishing process.

Once you start configuring your LaTeX exporter, you immediately become part of the 20% of “Skilled LaTeX Users.” How skilled exactly? That is a matter of opinion. Whatever the case consider that you have broken the taboo that “You should never touch LaTeX when you use Org-Mode” because you did use it, it went well, and you will probably use it again.

In the long run, using LaTeX in Org-Mode makes using both Org-Mode and LaTeX easier. Consider it a good investment that will pay great dividends both immediately and in the long run.

Why Does org-utf-to-xetex Expose ANY LaTeX To The User At All?

  • Because it was the easiest way to implement this functionality.

Since it’s a problem that I solved for myself, there was no external feedback to shape it. Since I made it as simple as possible, and it included LaTeX, I left it alone.

  • Because it is hard to guess what technical level of Org-Mode users will use this.

Generally, there are two very large groups of Org-Mode users: those who want to write and avoid the “technical details” as much as possible, and those who want to write and to get into the “technical details” at any level.

This package could have been aimed at the former. It would have used the Customize interface: no code would be written at all. It would not have used a custom LaTeX package: instead, it would have attempted to include all necessary functionality. Everything that could be automated would be.

This package could also have been aimed at the latter. It could have used advanced Unicode packages instead of vanilla LaTeX code. It would have used a custom exporter to allow for a better writing experience for the user.

Good or bad, this package has elements of both. However, the parts used are guided by a singular goal: to implement the desired functionality in as simple a way as possible, to make it as straightforward as possible for users, and to leave enough flexibility to grow it. This approach wasn’t my plan; it was just a happy accident that can partially be attributed to expertise and more likely attributed to pure pragmatism.

The future users (or lack thereof) will guide how this package moves forward because right now, it is impossible to guess where this package will go (or stay).

  • Because it is hard to guess what kind of users will use this.

Will troff users switch to this? I’m not sure why they would change, their problem is solved, and it has been for a long time.

Will Pandoc users switch to this? Maybe. Whether you use this package or not, you’ll need to set up your LaTeX toolchain somehow to handle Unicode. If you already know Emacs, then yes, it is a natural choice. If you don’t, it is much less likely.

Will Microsoft Word (Word) users switch to this? You might be surprised. If you work alone, moving off Word is relatively easy, especially if you are a techie who is willing to learn five of the most common LaTeX packages. On the other hand, asking a non-techie to write with Emacs and then to set up XeTeX will be a hard pill to swallow.

Will Org-Mode But Not LaTeX users switch to this? Yes, they are the most likely candidate. They are already comfortable with Emacs, which overtly or not is quite technical. Since even the least technical level user can quickly complete the setup, it is even more likely for users to switch.

Will Org-Mode users who are already LaTeX users switch to this? Maybe. Read this to see where you fit.

Will Org-Mode users who have already given up hope that they can easily use Unicode in their PDFs switch? I have absolutely no idea, I can’t even guess. The extremely wide range of Org-Mode users makes it virtually impossible to predict what is the “right thing” for them.

The best way to move forward with any solution is to make it good for yourself and document it well. If people see things the same way, then they will be able to utilize it with the least effort and most joy possible.

Why Not Put All Of That LaTeX Code In The Org-Mode Configuration?

That is an excellent question to pose because it is true that you can put everything in the Org-Mode level.

You can place all of your LaTeX configuration inside of your Org-Mode settings. Having followed this approach for years, I’m confident sharing that it works very well. The generated LaTeX looks exactly the same, and everything works as expected. However, there are three big topics that I had to face, and you will, too, with this approach: they are probably going to be show-stoppers preventing you from using the Org-Mode LaTeX configuration approach.

Problem #1 is that it is very rigid. When you make changes to your Org-Mode/LaTeX configuration, you have mostly made this your global configuration. In the short term, it is somewhat convenient primarily because you don’t need to touch the custom package. The second that you start customizing your documents differently for different situations, this rigid approach becomes a big problem. What different situations? Simple: you write papers on a computer for reading on a computer, and you write letters to be printed and mailed to someone. You end up with a lot of different settings for the different use cases. Managing this in Org-Mode is painful, and in LaTeX, it is trivially easy. It is a no-brainer to go with the custom package here.
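As a rough illustration of how the custom package can absorb different use cases, yourname.sty could accept an option per situation. This is only a sketch: the option names print and screen, and the choice to switch the main font, are made up for the example and are not something this package provides.

```latex
% yourname.sty: sketch of per-use-case settings via package options
\NeedsTeXFormat{LaTeX2e}
\ProvidesPackage{yourname}
\RequirePackage{fontspec}

% Hypothetical switch: true for printed letters, false for on-screen papers
\newif\ifyourname@print
\DeclareOption{print}{\yourname@printtrue}
\DeclareOption{screen}{\yourname@printfalse}
\ProcessOptions\relax

\ifyourname@print
  \setmainfont{DejaVu Serif}% example choice for paper
\else
  \setmainfont{DejaVu Sans}% example choice for screens
\fi

\endinput
```

A document would then pick a use case with \usepackage[print]{yourname}. Note that LaTeX reports an option clash if the same package was already loaded with different options, so this assumes the package is loaded only once per document.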

Problem #2 is that eventually, you will want or need to work directly with the LaTeX that creates your PDF. In theory, some of you will want to learn it just for the sake of learning it. However, that is unlikely. Most of you will have no choice but to work directly with it because you face a configuration problem with a package, and you will probably be doing so in anger. The context here is that you are using a package, and it “just isn’t working right.” In cases like this, configuring the package from Org-Mode is a useless abstraction that makes it harder to configure the package. When you solve your problem by working directly with the LaTeX, you need to put that code somewhere, and the best place to put it is in that custom package. Finally, once you resolve your issue, the resolution “just works” inside the package, so there is often little to no benefit in moving it back into your Org-Mode configuration.

Problem #3 is that using your Org-Mode configuration of LaTeX outside Org-Mode ranges from painful to nearly impossible. For an experienced LaTeX user, it is relatively painless. But if you are switching over using org-utf-to-xetex, then you are probably not an expert, and it will be a painful waste of time dealing with the issues that you face. This package strives to balance ease of use and flexibility of configuration.

No matter where you put the LaTeX configuration, understanding it will take effort. By judiciously deciding whether to put each piece in the LaTeX layer or the Org-Mode layer, org-utf-to-xetex makes things easier for you in the long run, which makes it the best way to start. As your familiarity and expertise with Org-Mode and LaTeX ramp up, there will naturally come a right time to reflect on the approach taken in org-utf-to-xetex and how you personally want to move forward from there.

For Skilled LaTeX Users

The purpose of this section is to capture an imagined conversation between me and another Emacs+Org-Mode+LaTeX user who wants to know more about this package.

This package is written primarily for users who have never directly used any form of LaTeX before. This section, by contrast, is addressed directly to already skilled LaTeX users, so it is terser and less explanatory than the rest of the document. Links to relevant resources supply the background needed for the full picture in each section.

Get To The Point: What Does org-utf-to-xetex Do?

It helps you configure fontspec to create a newfontfamily then DeclareTextFontCommand for it.
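
As a minimal sketch of what that means in LaTeX terms, the document below sets up one font family and one text command for it. The names `\emoticonsfont` and `\textemoticons` and the choice of the Symbola font are illustrative assumptions, not necessarily what this package generates. Compile it with XeLaTeX, using a font that is actually installed on your system.

```latex
\documentclass{article}
\usepackage{fontspec}

% Hypothetical names: a font family for the Emoticons block.
% "Symbola" is an assumption; substitute any installed font with the glyphs.
\newfontfamily\emoticonsfont{Symbola}

% A text command that typesets its argument in that font family.
\DeclareTextFontCommand{\textemoticons}{\emoticonsfont}

\begin{document}
Have a great day \textemoticons{😄}
\end{document}
```

Only the wrapped character switches fonts; the surrounding text keeps the document's main font.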

Keep It Simple: Why Does org-utf-to-xetex Exist?

org-utf-to-xetex exists to give detailed step-by-step instructions to people switching from Unicode WYSIWYG editors to Org-Mode, so they don’t quit using Org-Mode in a fit of rage because their characters don’t get rendered correctly in the PDF.

org-utf-to-xetex exists so that new Org-Mode users, who are completely unfamiliar with LaTeX and need to generate high-quality PDFs, get a package that, once configured, automatically uses the correct fonts for every character possible with as little effort as possible and as quickly as possible. Nothing else like this exists for people coming to Org-Mode.

org-utf-to-xetex decidedly doesn’t exist to help Org-Mode users learn LaTeX, LaTeX packages, Org-Mode internals, or a personal publishing workflow.

org-utf-to-xetex’s singular purpose is to help people make the PDFs they generate look the same as their documents do inside of Emacs.

Should You Use XeLaTeX Or Some Other LaTeX?

If you want to easily use any font and write your documents entirely in UTF-8, then XeTeX makes it easy.

As you may or may not expect, there are almost always packages for plain PDFLaTeX that let you do the same thing with varying degrees of effort.

If you already do everything you want in PDFLaTeX, then one can only guess as to why you would use a different compiler.

This package requires XeLaTeX.

Should You Use org-utf-to-xetex For CJK Characters?

For documents primarily written in various Asian scripts:

No, you should not.

The LaTeX package cjk already provides support for that functionality.

Specifically, it supports (copied directly from the link)

  • Chinese (both traditional and simplified).
  • Japanese.
  • Korean.
  • Thai.
  • A special add-on feature is an interface to the Emacs editor (cjk-enc.el) which gives simultaneous, easy-to-use support to a bunch of other scripts in addition to the above:
    • Cyrillic.
    • Greek.
    • Latin-based scripts.
    • Russian.
    • Vietnamese.

For XeTeX, the xecjk package is available for “typesetting CJK documents in the way users have become used to, in the CJK package.”

If your mother tongue is one of these languages, either you are already using cjk in some form or another (via LaTeX or something that compiles down to LaTeX), or you should be using it.

Should You Use org-utf-to-xetex For Primarily Multi-Language Documents?

No, probably not.

polyglossia seems to be the best solution available.

Should You Use LaTeX Libraries That Already Do The Same Thing Instead?

Yes, definitely.

If you are primarily a Non-Latin language user, you are probably already using a solution like cjk mentioned above.

That still leaves a wide range of language users ranging from people who want to include Emojis in their letter to graduate students writing dissertations comparing literature written in four different languages. There is a really good chance that a solution already exists for their use case on CTAN.

If you’re unfamiliar with the existing LaTeX packages on The Comprehensive TeX Archive Network (CTAN) that solve the same problem as this one, then you should start by studying the following:

When I did the research, I found all of them to be feature-rich, highly-configurable, and flexible. At the very least, you should read the introductory paragraph for each of those packages and then compare them to the approach that this package takes.

If you’re familiar with those packages then you probably already #1 had a problem you needed to solve, #2 chose a solution, and #3 solved it obviating the need for this package.

What none of these packages provide, however, is any level of integration with your Org-Mode workflow. For that, you need to explore org-utf-to-xetex, another package, or a custom exporter that you have written.

Should You Switch To org-utf-to-xetex From An Existing Solution?

Maybe.

First read this and this.

If you are solving hard problems with the existing packages, then this package couldn’t replace them.

If you are already a user of those packages for solving easy problems, then this package might be able to replace them.

If you are already a user of those packages for solving easy problems and you are an Emacs and Org-Mode user looking for something simpler, then this might be a good replacement.

If you are an Emacs and Org-Mode user looking for something simple, then this is a good place to start.

Your workflow is usually so personal that it is hard to assume anything. org-utf-to-xetex certainly makes no assumptions about you either.

Did You Really Need To Write Yet Another Package To Solve This Problem?

In 2018 the answer was “Yes.”

babel didn’t work for me. Neither did ucharclasses. I didn’t know about polyglossia at the time. Studying how ucharclasses worked, though, showed me exactly what I wanted to accomplish in the first place.

It wasn’t much: automatically choose a font for a character. When I researched how to do something like that in straight LaTeX, the approach turned out to be extremely simple. Once I had that working by hand, I quit looking for a package and went on my merry way: Emoticons worked fine in my (PDF) letters, so there was no more work to do. As time wore on, I used more and more symbols from Unicode blocks. That required finding the right font, adding it to my config, and other slightly tedious tasks. Eventually, I got tired of it and automated it. After automating it, it dawned on me that all of that work could be automated from Emacs Lisp during the Org-Mode LaTeX export. It was simple from the beginning and remains simple today: the entire solution is easy to implement by hand and trivial to automate in code.

In 2020 the answer is still “Yes.”

It is yes because the approach that org-utf-to-xetex follows is very simple. #1 specify a font per Unicode block. #2 say what block you want the character to use. Done.

You still have to perform the same amount of work with LaTeX and the compiler and the toolchain. That is unavoidable. However, you don’t need to learn yet another LaTeX package or worry about changes to it. Migrating from this approach to any other would be trivial since, at the Org level, you are using a macro that never has to change.

That is another benefit: whenever you want to make changes, you can do just about everything in Emacs Lisp. Org makes it easy to abstract things away: you don’t have to deal with LaTeX stuff. LaTeX provides plenty of abstractions of its own, too, so even that is nothing to worry too much about since you are unlikely to ever write custom LaTeX functionality yourself anyway. So learn Emacs Lisp, and you can get what you want instantly: that is why this package is so small and simple.

Finally and most importantly, I realized that for me to ever recommend that someone switch to Org-Mode from programs like Microsoft Office or Open Office, there should be a way for them to easily handle Unicode characters in their documents. When they are presented with empty white boxes in their PDFs, it would be a complete disservice to them and completely irresponsible for me to reply, “Well, I don’t know. Guess you should learn LaTeX!” That is a completely unrealistic expectation for 99% of people who are trying to switch to Org-Mode and are inevitably faced with the small yet show-stopping issue of incorrectly rendered characters.

Now I have a solution to that problem that is simple and easy to understand, modify, and maintain. I understand it completely, any Emacs and Org-Mode user can easily make sense of it, use it, and maintain it, and it doesn’t rely on any external packages, either Emacs Lisp or LaTeX. Now I can recommend that anyone switch to Org-Mode knowing that they can use any font that they want to use, any time, easily. This is one way to do it, one of many, and for me, it is the easiest.

How Does org-utf-to-xetex Work And Why?

Essentially the workflow is:

  • Find a character that isn’t rendered properly.
  • Find a font for it.
  • Assign that font to the Unicode block the character lives in.
  • Add a little LaTeX code to a custom package for the Unicode block.
  • In Org-Mode, wrap your character with the helper macro.
  • During the export, the macro will look up the correct markup for the Unicode block the character lives in.
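
The custom-package step in the workflow above can be sketched as a small `.sty` file with one font family and one text command per Unicode block. Everything named here is an assumption for illustration: the package name `myunicodeblocks`, the command names, and the fonts Symbola and DejaVu Sans. The markup that this package actually emits during export may differ.

```latex
% myunicodeblocks.sty -- hypothetical package name, for illustration only.
\NeedsTeXFormat{LaTeX2e}
\ProvidesPackage{myunicodeblocks}
\RequirePackage{fontspec}

% Unicode block: Emoticons.
% The font is an assumption; use one installed on your system.
\newfontfamily\blockemoticonsfont{Symbola}
\DeclareTextFontCommand{\textblockemoticons}{\blockemoticonsfont}

% Unicode block: Arrows.
\newfontfamily\blockarrowsfont{DejaVu Sans}
\DeclareTextFontCommand{\textblockarrows}{\blockarrowsfont}

\endinput
```

A document compiled with XeLaTeX would then load the file with `\usepackage{myunicodeblocks}` and wrap each character in the command for its block, for example `\textblockarrows{→}`. Adding support for another block means adding one more pair of lines.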

The rest of this README explains the workflow in greater detail, and in context it is even easier to understand. Everything this package does can be done by hand in about 2 minutes per Unicode block, depending on your familiarity with Unicode and Emacs.

org-utf-to-xetex is simple but not simplistic.

Why Does Compilation Get Slow With A Lot Of Fonts?

When you use more than five fonts with the XeTeX compiler, compilation gets slower and slower. This behavior is discussed frequently on TeX.StackExchange.com, and I’ve run into the same thing. However, you are unlikely to run into this issue for two reasons:

  1. You are probably going to use only two or three Unicode blocks. For example, `Emoticons`, `Arrows`, and `Dingbats`. Three blocks won’t be an issue.
  2. Your documents are primarily Latin-based (for documents that are not, see here or here), so you won’t configure many blocks using this approach. Again, two or three blocks won’t be an issue.

However, if you need to use more than a few blocks, the good news is that your document will still compile. It will just take a lot longer: up to five times longer, in my experience.

If your package configures blocks that a document doesn’t use and they are slowing down your builds, then the solution is simple: create separate packages for each grouping of blocks that you want to include in your document. For example, use one package for `Emoticons` and `Arrows` that you are likely to use in every document. Then create separate packages for the language or two that you are using in your current document. Although this isn’t a perfect solution, it is one of the best options since the compiler determines the build time behavior.

Credits

  • rolandwalker’s unicode-fonts Package showed how to utilize Unicode fonts in Emacs. The code showed what font blocks to ignore. Educational. Sweet. One of a kind package!

Development

  • Contributing
    • Read the contributing guidelines.
    • Before you commit, make sure that byte-compile-file, checkdoc, and package-lint-current-buffer don’t report any errors. The first two are included with Emacs. You can install package-lint either from MELPA or by hand, like you did the other packages:
      cd ~/src
      git clone https://github.com/purcell/package-lint.git
              

      Use this code to load it:

      (add-to-list 'load-path "~/src/package-lint")
      (require 'package-lint)
              
  • Testing
    • Emacs Lisp Regression Testing
    • Manual System Testing
      • See Examples above. Export them and compare the export to the sample files.
  • Rules

License

User List

  • Cyberdyne Systems
  • ENCOM
  • LexCorp
  • Protovision
  • Setec Astronomy
  • Tyrell Corporation
  • Wayne Enterprises
  • Yoyodyne Propulsion Systems

It is hard to know.