ArcMesg is a set of simple tools used to move and manipulate message files, originating from both e-mails and USENET news. Each message must be stored in a separate file, and will be named after the SHA1 of its universally unique message ID.
Before you can use ArcMesg to collate message files, you should first download some.
First, install fetchmail
. Then create the file ~/.fetchmailrc
, and chmod 600 ~/.fetchmailrc
. Give it the following content:
poll pop3.example.com
protocol pop3
username "foo"
password "bar"
fetchall
mda ~/incoming-messages/save.sh
Change pop3.example.com
to your mail server, foo
to your username, and bar
to your password. This will cause fetchmail
, when you next run it, to pipe each e-mail message, one at a time, to the script ~/incoming-messages/save.sh
before deleting it from the mail server.
Next, mkdir ~/incoming-messages
and create the file ~/incoming-messages/save.sh
with the following content:
#!/bin/sh
number=1
filename="/home/foo/incoming-messages/${number}.txt"
while [ -f $filename ]; do
number=$(( $number + 1 ))
filename="/home/foo/incoming-messages/${number}.txt"
done
tee $filename
Replace foo
with your username. chmod 755 ~/incoming-messages/save.sh
so you can execute it. This script saves each piece of text passed to it in a sequentially numbered file.
Test the script with echo 'Test' | ~/incoming-messages/save.sh
. You should now have a file called ~/incoming-messages/1.txt
with the content Test
. If so, delete it, double check your ~/.fetchmailrc
file, and run fetchmail
.
You should now have several e-mails stored together, one file per e-mail.
### Splitting up mbox files
<Explain how to get mailing list archives; how to get USENET newsgroup archives>
First, install git
, which has the useful utility git mailsplit
.
Next, ensure your mbox files end in the .mbox
extension.
Finally, run the included script split.sh
.
Now you're ready to collate and sort your message files.
To use ArcMesg, set up a file in your home directory called .arcmesgrc
with the following information, in tab-separated format:
# The optional custom message repository directory
DocumentRoot ~/mesg
- I believe I should read the message headers as ASCII, and only if a Content-Type header is present that specifies an alternative charset should the message then be closed and re-opened using the appropriate charset. Currently, all messages are opened and processed as ISO-8859-1. Of course, it would be ideal to ignore the charset entirely, and assume the headers are ASCII and everything else should be preserved as-is without understanding or interpreting it at all.
- Port to C