Introduction
"Spam stinks." That's a statement that few would dispute. Email spam is one of the rare things in life that have no redeeming value whatsoever. It's evil, evil, evil! Spam abuses a system that was built on trust, using it to peddle scams and smut. Pure evil it is, in both form and substance.
The question is what we can do about spam. The most common approach is filtering. But filters do not work well enough, and spammers often find ways to circumvent them. We have all received junk mail from fsd7853@yahoo.com or djkpm3423@hotmail.com before. The risk of losing important messages prohibits more aggressive filtering.
Project MOWA aims to create a new email system designed with spam prevention in mind. The main problem with today's Internet email system is a lack of accountability. Anyone can send a message to anyone as anybody. When you receive a message, you have no way of knowing immediately whether it was really sent by the stated sender, or whether that sender even exists. The key feature of MOWA is therefore mandatory authentication through public-key encryption.
The project is currently in the conception stage. Comments are welcome.
The following example illustrates how MOWA works on a high level:
Now Makar has just finished writing a letter to his good friend Varvara. He wants to send it to her via email. When he clicks on the Send button, his computer encrypts and digitally seals his letter with his private key. Then it sends the result to Varvara's mail server.
Makar's public key resides on his mail server and is available for download. To read his letter, Varvara will need this key.
A day later Varvara checks her email. Her computer sees Makar's email address on the list of new messages, so it connects to his mail server to download his public key. It is also possible that, since Makar and Varvara exchange letters regularly, his key is already in her key cache. If so, her computer will check whether the cached entry is still valid and use it if it is. Her computer deciphers the message. Then it checks the digital seal to ensure the message really is from Makar.
If no problem is encountered during any of the steps, Makar's letter will appear onscreen.
It should be obvious why emails sent in this scheme must have valid return addresses. If the sender had forged the address, either with a non-existent account name or a non-existent server address, then the recipient's attempt to download his public key would fail. Without the key she wouldn't be able to decrypt his message. If he had sent the email with someone else's address, decrypting the message using that account's public key would produce garbage, as it was not encrypted with the matching private key.
INVALID RETURN ADDRESS = NO KEY = UNREADABLE MESSAGE
Enforcing this policy—that's the reason why we're encrypting the message. In MOWA it's impossible to read email with invalid return addresses. Even if you choose to, you can't do it.
Having valid return addresses greatly aids the fight against spam. It permits us to track down the bastards who sent it out. It also makes it easier to block emails from known sources of spam.
Note that the sender's account has to be valid at the time when the recipient reads the message, as opposed to just valid at the time of delivery. Depending on how frequently people check their email, the time difference could be a few hours or a few days. If the account is deactivated before an email is retrieved, then it is essentially lost even though it has already been delivered.
MOWA imposes not only an identity requirement, but also a duration-of-existence requirement. To see how the arrangement retards spam, let us consider a hypothetical situation:
Smerdyakov is a spammer. He has just set up a mailbox at inquisitor.net, an email service provider (because he can't send email without a valid account). He immediately sends out 5000 copies of a spam message. Within a few hours 500 people have retrieved it. One hundred of them decide that they aren't going to put up with this nonsense, so they quickly complain to inquisitor.net. After receiving the complaints, the service provider suspends Smerdyakov's account and rejects all subsequent requests for his public key. The 4500 pieces of spam still sitting in people's mailboxes instantly become unreadable. And Smerdyakov will have to set up another mailbox to send out his stuff.
For this to work, the service provider needs to respond to complaints quickly. We therefore need a robust automated complaint handling system.
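The scenario above can be replayed with a small simulation. This is only a sketch: the `KeyServer` class, its method names, and the threshold of 100 complaints are assumptions made for illustration, not part of the MOWA design.

```python
class KeyServer:
    """Toy stand-in for a provider's public-key service (hypothetical API)."""

    def __init__(self, complaint_threshold):
        self.keys = {}          # mailbox -> public key (opaque in this sketch)
        self.complainers = {}   # mailbox -> set of addresses that complained
        self.suspended = set()
        self.complaint_threshold = complaint_threshold

    def register(self, mailbox, public_key):
        self.keys[mailbox] = public_key

    def complain(self, about, complainer):
        who = self.complainers.setdefault(about, set())
        who.add(complainer)
        # Suspend automatically once enough distinct people have complained.
        if len(who) >= self.complaint_threshold:
            self.suspended.add(about)

    def get_public_key(self, mailbox):
        """Return the key, or None for unknown or suspended mailboxes."""
        if mailbox in self.suspended:
            return None
        return self.keys.get(mailbox)


def simulate(sent=5000, read_early=500, complaints=100, threshold=100):
    """Replay the Smerdyakov scenario; return (readable, unreadable) counts."""
    spammer = "smerdyakov@inquisitor.net"
    server = KeyServer(threshold)
    server.register(spammer, "toy-public-key")

    # The first readers fetch the key while the account is still active.
    readable = sum(server.get_public_key(spammer) is not None
                   for _ in range(read_early))

    for i in range(complaints):
        server.complain(spammer, f"reader{i}@example.org")

    # Everyone else tries to fetch the key after the complaints came in.
    unreadable = sum(server.get_public_key(spammer) is None
                     for _ in range(sent - read_early))
    return readable, unreadable
```

With the default arguments, the 500 early readers get the key and the remaining 4500 requests are refused. The slower the provider reacts (a higher threshold, in this model), the more copies remain readable.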
The scenario doesn't change much if our spammer decides to operate his own server on a dial-up or broadband connection. He is actually easier to stop in this case, since his server is sitting on the net waiting for public-key requests. Acting on a single complaint, his ISP can immediately establish his guilt by connecting to his server.
The preliminary design of MOWA calls for a system running on top of HTTP. The protocol is well tested and server software is readily available. Web server technologies like JSP and PHP provide a great deal of power and flexibility.
A proof of concept server is up and running on a LAMP (Linux + Apache + MySQL + PHP) system. You can find out more about the software here and establish test accounts here. You can download the proof of concept client here. It runs on Win98/2000.
The different parts of a message (plain text, HTML text, attachments) are digitally sealed and encrypted individually, so that they can be retrieved separately. To reduce bandwidth and storage requirements, we deflate the data first.
To create the digital seal, we pass the compressed data through the MD5 hash function, which yields a 128-bit digest. We then encrypt the digest with the RSA algorithm. The encryption function is:
C = (T^D) mod N
where T is the MD5 digest, N is the product of P and Q, two large prime numbers, D is the secret exponent, and C is the digital seal. N and D together form the private key.*
Using the same (not encrypted) MD5 digest as the key, we encrypt the compressed data with the AES cipher.
(* In practice, a faster algorithm that takes P, Q, dP, dQ, and N as input is used.)
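The footnote's faster algorithm is presumably the usual Chinese Remainder Theorem shortcut for RSA private-key operations. The Python sketch below compares it with the plain formula C = (T^D) mod N. The key is a toy built from two well-known Mersenne primes and is not secure. The public exponent 65537, the big-endian conversion of the digest, and all function names are assumptions for illustration.

```python
import hashlib

# Toy key material: small, publicly known primes. Do not use for real mail.
P = 2**127 - 1
Q = 2**107 - 1
N = P * Q
E = 65537                            # assumed public exponent
D = pow(E, -1, (P - 1) * (Q - 1))    # secret exponent (needs Python 3.8+)
DP = D % (P - 1)                     # satisfies E*DP = 1 mod (P-1)
DQ = D % (Q - 1)                     # satisfies E*DQ = 1 mod (Q-1)


def digest_as_int(data):
    """MD5 digest of `data` as an integer (big-endian, an assumption)."""
    return int.from_bytes(hashlib.md5(data).digest(), "big")


def seal_plain(t):
    """Textbook form: C = (T^D) mod N."""
    return pow(t, D, N)


def seal_crt(t):
    """Same result from two half-size exponentiations (Garner's formula)."""
    q_inv = pow(Q, -1, P)            # derived from P and Q
    m1 = pow(t, DP, P)
    m2 = pow(t, DQ, Q)
    h = (q_inv * (m1 - m2)) % P
    return m2 + h * Q
```

Both functions return the same seal for any digest. The second works with numbers half the size of N, which is where the speed-up comes from.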
To read a message part (the plain text, for example), we need the digital seal for that part and the public key of the sender.
First, we decrypt the digital seal to get back the MD5 digest. The decryption function is:
T = (C^E) mod N
where C is the seal, N is the same N used by the sender, E is the public exponent, and T is the MD5 digest. N and E together form the public key.
Using the resulting MD5 digest as the key, we decrypt the AES-encrypted data. We feed the output through the MD5 function to get another digest. If the calculated digest matches the one supplied by the sender, the message is genuine. If not, then either the message is forged or it has been mangled during transport, and the mail client shouldn't proceed further.
In the final step, the data is reinflated.
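The sealing and reading steps can be tried end to end with a short Python sketch. It deliberately leaves out the AES step, because the Python standard library has no AES implementation, so the "encrypted" part here is just the deflated data. The toy key, the exponent 65537, the raw Deflate framing, the big-endian digest conversion, and the function names are all assumptions for illustration.

```python
import hashlib
import zlib

# Toy key built from two known Mersenne primes. Not secure.
P, Q, E = 2**127 - 1, 2**107 - 1, 65537
N = P * Q
D = pow(E, -1, (P - 1) * (Q - 1))


def deflate(data):
    """Raw Deflate stream (RFC 1951), without the zlib wrapper."""
    c = zlib.compressobj(wbits=-15)
    return c.compress(data) + c.flush()


def seal_part(data, d=D, n=N):
    """Sender side: deflate, hash, and RSA-encrypt the digest."""
    compressed = deflate(data)
    t = int.from_bytes(hashlib.md5(compressed).digest(), "big")
    return compressed, pow(t, d, n)


def open_part(compressed, seal, e, n):
    """Recipient side: recover the digest, verify it, then inflate."""
    t = pow(seal, e, n)
    if t >= 2**128:
        raise ValueError("seal does not decrypt to a 128-bit digest")
    digest = t.to_bytes(16, "big")
    # A real client would AES-decrypt the data here, using `digest` as the key.
    if hashlib.md5(compressed).digest() != digest:
        raise ValueError("digest mismatch: forged or damaged message part")
    return zlib.decompress(compressed, wbits=-15)
```

Opening a part with the matching public key returns the original bytes. Altering the data, or using some other account's public key, raises `ValueError` before anything is inflated.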
Sample code can probably explain this better than the diagrams above. Listed below is Object Pascal code taken from the proof-of-concept client.
|
|
MOWA relies on HTTP for the transfer of data. It's a point-to-point system. The sender connects directly to the recipient's mail server (unlike SMTP, where the sender connects to his own server) and uploads the content of an email. There's no routing involved.
The following table lists the HTTP commands and URIs for the different operations.
Operation | HTTP command | URI | Authorization required |
Post message | POST | /mowa-mailbox/ | N |
Get message list | GET | /mowa-maillist/ | Y |
Get message part | GET | /mowa-mailpart/ | Y |
Get public key | GET | /mowa-publickey/ | N |
Get private key | GET | /mowa-privatekey/ | Y |
Delete message | DELETE | /mowa-mail/ | Y |
Post spam complaint | POST | /mowa-complaintbox/ | N |
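The table translates naturally into a lookup structure. The Python sketch below builds (but does not send) a request for any of the operations. It assumes that parameters travel in the query string, that "the regular HTTP mechanism" means Basic authentication, and that plain `http://` is used. The host name and the function name are made up for the example.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request

# operation -> (HTTP command, URI, authorization required)
OPERATIONS = {
    "post_message":     ("POST",   "/mowa-mailbox/",      False),
    "get_message_list": ("GET",    "/mowa-maillist/",     True),
    "get_message_part": ("GET",    "/mowa-mailpart/",     True),
    "get_public_key":   ("GET",    "/mowa-publickey/",    False),
    "get_private_key":  ("GET",    "/mowa-privatekey/",   True),
    "delete_message":   ("DELETE", "/mowa-mail/",         True),
    "post_complaint":   ("POST",   "/mowa-complaintbox/", False),
}


def build_request(host, operation, params=None, credentials=None):
    """Return an unsent urllib Request for one MOWA operation."""
    method, path, needs_auth = OPERATIONS[operation]
    if needs_auth and credentials is None:
        raise ValueError(f"{operation} requires authorization")
    url = f"http://{host}{path}"
    if params:
        url += "?" + urlencode(params)   # assumed parameter placement
    request = Request(url, method=method)
    if needs_auth:
        user, password = credentials
        token = base64.b64encode(f"{user}:{password}".encode()).decode()
        request.add_header("Authorization", "Basic " + token)
    return request
```

For example, `build_request("mail.example.org", "get_public_key", {"mailbox": "makar"})` yields an unauthenticated GET, while asking for a message list without credentials raises `ValueError`.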
To send an email, we connect directly to the recipient's server and perform an HTTP POST request:
|
HTTP header descriptions:
Host | The host's DNS name. Allows for virtual hosting. |
Content-type | Use multipart/form-data for HTTP file uploading. |
From | Sender's address. |
To | Comma delimited list of recipients. The server picks out those addresses that it hosts and saves the message to the appropriate mailboxes. |
Cc, Bcc | Lists of carbon-copy recipients. |
Subject | Subject of message, encoded in the character set specified by Text-charset. |
Text-charset | The character set this message employs. (optional - iso-8859-2 if omitted) |
Priority | A number indicating the priority of the message. 1 is the highest. 3 is normal. 5 is junk. (optional - 3 if omitted) |
Message-ID | A unique hex number identifying this particular message. Must be within 32-bit range. Meaningful only to the sender. When replying to the message, the recipient should place this number in the In-Reply-To field. The recipient can also use it to post a "message received" notice on the sender's server. (optional) |
In-Reply-To | Message-ID of the original message, to which this is a reply. (optional) |
If a PHP script is handling the request, the multipart/form-data formatted data will be parsed automatically into global variables. Other languages should have similar capabilities.
Variable descriptions:
key_valid[] | An associative array holding the key validation tokens for the individual recipients. The token is the RSA-encrypted MD5 digest of the recipient's email address (used as the index here). The server will save it along with the message to the recipient's mailbox. When the recipient's email program asks the server for a list of her emails, it will get a key_valid for each mail. It should use the token to validate the sender's public key before it starts downloading the message (no point in downloading messages that you can't read). It can also use the token to check whether a cached key is still valid. The address should always be converted to lower case before the MD5 digest is calculated. |
text[] | An associative array holding the message text in various formats. text[plain] holds the plain text version, while text[html] holds the rich text version. The filenames for this field are dummy values. They're only there to ensure the binary data gets saved correctly. PHP, for instance, truncates the binary data if it's sent as simple POST variables. Note that both the variable name and the indices are in lower case. |
text_key[] | An associative array holding the digital seals of the texts. |
media[] | An array holding the files needed by the HTML text—image files for example. Should refer to them as though they're stored in a subdirectory named "media." (e.g. <img src="media/me.jpg">) |
media_key[] | An array holding the digital seals of the media files |
attachment[] | An array holding the file attachments. |
attachment_key[] | An array holding the digital seals of the files attached. |
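As an illustration of the key_valid entry described above, the following Python sketch creates a token and checks it against a public key. The toy key, the exponent 65537, the UTF-8 encoding of the address, the big-endian digest conversion, and the function names are assumptions. Only the rule of lower-casing the address before hashing comes from the description.

```python
import hashlib

# Toy sender key built from two known Mersenne primes. Not secure.
P, Q, E = 2**127 - 1, 2**107 - 1, 65537
N = P * Q
D = pow(E, -1, (P - 1) * (Q - 1))


def address_digest(address):
    """MD5 digest of the lower-cased address, as an integer."""
    raw = hashlib.md5(address.lower().encode("utf-8")).digest()
    return int.from_bytes(raw, "big")


def make_key_valid(recipient, d=D, n=N):
    """Sender side: RSA-encrypt the digest with the private exponent."""
    return pow(address_digest(recipient), d, n)


def key_is_valid(token, recipient, e, n):
    """Recipient side: does this public key decrypt the token correctly?"""
    return pow(token, e, n) == address_digest(recipient)
```

A client could run `key_is_valid` on a cached key before downloading anything. A `False` result means the cached key is stale or the return address is not genuine.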
If the operation is successful, the server responds with the HTTP status code 202 (Accepted). It returns 202 even when it fails to deliver the mail to any of the intended recipients. It lists the causes of the failed deliveries in the body of the response.
|
Attribute descriptions for <error ... >:
error.mailbox | Name of the mailbox that could not be reached. |
error.code | Error code. Two codes are currently defined: 404, when the mailbox doesn't exist, and 301, when the mailbox has been moved permanently. |
error.extra | Extra information associated with the error. For error 301, it's the address where the mailbox was moved to. |
error.description | Text description of the error |
If parts of a message (keys, key-valid) are missing, the server returns 406 (Not acceptable). If it's blocking the sender, it returns 403 (Forbidden).
Downloading the message list differs little from requesting a regular web page. We send an HTTP GET request to the server and the server responds with the list. We use the regular HTTP mechanism for authorization.
|
GET field descriptions:
mailbox | Name of the mailbox. This does not have to be the same as the user name provided by the Authorization field. It's possible for a number of users to have access to the same mailbox. |
from_id | Return only those messages whose id is greater than from_id. The entire list is returned if omitted. |
If no message is found, the server returns status code 204 (No content). Otherwise it returns 200 (OK) with the message list in the body.
|
Attribute descriptions for <mail ... >, <media ... > and <attachment ... >:
mail.id | A numeric id for use in subsequent get mail-part requests. This is not the Message-ID provided by the sender. It's merely a number used by the server to reference the message (most likely a database primary-key). For a given mailbox, this number is not necessarily consecutive. Message #4 could be followed by message #5000. Newer messages will always have a larger id. Deleting a message does not change the IDs of messages that follow it. |
mail.priority | A number indicating the priority of the message. 1 is urgent. 5 means the message can be ignored. |
mail.size | The approximate size of the message. |
mail.from | Email address of the sender |
mail.date | Date and time when the message was received. |
mail.subject | Message's subject. |
mail.in_reply_to | ID of the message that this one is a reply to. (Note the underscores instead of dashes.) |
mail.key_valid | Message's key validation token. It's the MD5 digest of the mailbox's address encrypted using the private key of the sender (whose email address is given in mail.from). The email client should fetch the corresponding public key, decrypt the token with it, and compare the result to the digest generated by hashing the address. Only if the two match should the client proceed to downloading the message. |
media.id | A numeric identifier of the media file. Not necessarily consecutive. Not necessarily unique across different messages. |
media.filename | Name of the media file. |
media.size | Size of the media file. |
attachment.id | A numeric identifier of the attached file. |
attachment.filename | Name of the attached file. |
attachment.size | Size of the attached file. |
If a mail tag indicates a message with no subject, size zero, and which has an in_reply_to ID, it's a mail-received acknowledgment. The email client should either just ignore it, flag the appropriate message in the sent box as received, or communicate this fact to the user in some other way.
The client software can use the message priority to decide whether to ask the user if an acknowledgment should be sent. For example, it may launch a Yes/No popup window if the priority is 2 or higher (that is, numbered 1 or 2).
An acknowledgment is just an empty email with an In-Reply-To ID and no subject.
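A client might apply these rules to the entries of a message list roughly as follows. The `MailEntry` class is a hypothetical in-memory form of a mail tag. Only the field names mirror the attributes described above, and the cutoff of 2 is simply the example value from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MailEntry:
    """Hypothetical in-memory form of one <mail ...> entry."""
    id: int
    priority: int = 3
    size: int = 0
    subject: str = ""
    in_reply_to: Optional[str] = None


def is_acknowledgment(mail):
    """Empty message, no subject, but with an in_reply_to ID."""
    return mail.subject == "" and mail.size == 0 and mail.in_reply_to is not None


def should_offer_acknowledgment(mail, cutoff=2):
    """Ask the user only for real messages of high enough priority.

    Priority 1 is the highest, so "2 or higher" means a number <= 2.
    """
    return not is_acknowledgment(mail) and mail.priority <= cutoff


def next_from_id(entries, last_seen=0):
    """Largest id seen so far, for use as from_id in the next list request."""
    return max([last_seen] + [mail.id for mail in entries])
```

Because newer messages always carry larger ids, remembering the maximum is enough to fetch only new mail the next time.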
MOWA allows us to download different parts of an email (plain text, HTML text, graphic files, attachments) separately. We send an HTTP GET request to retrieve each part, placing the necessary parameters in the URI.
Downloading the text part:
|
GET field descriptions:
mailbox | Name of the mailbox. |
mail_id | ID of the email as given in the mail list. |
text | Text format requested, "plain" or "html." |
The server returns 200 (OK) if the retrieval is successful, 204 (No content) if the email doesn't have message text in the requested format (or has no message text, period), and 404 if the email is missing.
|
The Key header field is used to decipher the data. First we convert the hex string to a 512 bit integer, and perform the RSA decrypt operation (T=C^E mod N) on it. The result should be a 128 bit number. Using this number as a key we decrypt the data with AES. We then compute the MD5 digest of the decrypted data and compare it to the 128 bit key (in MOWA, data is always encrypted with its own MD5 digest). If the two match, we inflate the data to get back the original content.
For text retrieval requests the server returns additional information about the email, such as the subject and date, in the HTTP header. Most of this information has already appeared in the mail list. Providing it here again just simplifies the design of web clients.
For attachment or media file requests, the server provides Key, Filename, and Size in the header.
Downloading an attached file:
|
GET field descriptions:
mailbox | Name of the mailbox. |
mail_id | ID of the email as given in the mail list. |
attachment_id | ID of the attachment as given in the mail list. |
|
Downloading a media file is done in the same manner with media_id in place of attachment_id.
Getting public keys is easy. We just connect to the sender's server and do a GET.
|
GET field descriptions:
mailbox | A comma delimited list of mailboxes whose public keys are required. |
If the server finds all or any of the keys, it returns the status code 200 (OK) and lists them in the response body.
|
Attribute descriptions for <key ... >:
e | Public exponent. |
n | Modulus, product of p and q. |
If the server fails to find any of the keys it returns 204 (No content).
Before the sender can send out any email, he needs his private key. We download the key from the server with a GET request:
|
The server returns status code 200 and the key in the response body in a key tag.
|
Attribute descriptions for <key ... >:
p | Large prime number p. |
q | Large prime number q. |
dP | Exponent of p, where e*dP=1 mod(p–1) |
dQ | Exponent of q, where e*dQ=1 mod(q–1) |
n | Modulus, product of p and q. Although this can be calculated from p and q, it should not be omitted. |
The server returns 403 (Forbidden) if the user has no access to the key (he can't send).
To delete an email from the server, we send an HTTP DELETE request:
|
The server returns 202 (Accepted) if the user is authorized to delete mails from the mailbox. Otherwise it returns 403 (Forbidden).
Deleting a non-existent mail does not produce an error.
|
The fact that emails in MOWA must have valid return addresses makes it much easier to regulate junk mail. The system is designed in a way that enables servers to respond to spam complaints quickly and automatically. It follows a simple principle in deciding who's a spammer and who's not: If someone has sent emails to a large number of people and many of these recipients (say 50 of them) file complaints against him, then he's a spammer.
When someone has received a spam mail and wishes to complain about it, she sends complaints to both her mail server and the sender's server, giving both a chance to respond. Her server, after receiving enough complaints, might decide to add the sender's address to its blocklist, or it might decide to block the originating domain altogether. The sender's server, again after receiving enough complaints, might decide to suspend his account.
When someone's account is suspended, people can no longer download his public key. This renders all the emails that he has sent unreadable.
A complaint has two components: the key-valid token of the offending email, and the complaint-valid token of the complaint. The former is the RSA-encrypted MD5 digest of the recipient's email address. The latter is the RSA-encrypted MD5 digest of the key-valid token. Figure 7 illustrates how each is generated.
After receiving a complaint, the server uses the sender's public key to decrypt the key-valid token. Since it is created using the sender's private key, theoretically only he could have created it. If it decrypts to the digest of the recipient's email address, it proves that he has emailed the recipient. Fifty key-valid tokens decrypting to digests of fifty distinct addresses prove that he emailed at least fifty people. The server can safely assume that he's a spammer and suspend his account or put a block on his address.
You can only be branded a spammer by mass-mailing people.
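The counting rule can be sketched in Python as follows. The `ComplaintBox` class and its methods are hypothetical, the key is a toy built from known Mersenne primes, and the exponent 65537 and digest encoding are assumptions. The threshold of 50 is the example figure used in the text.

```python
import hashlib

# Toy key for the sender being complained about. Not secure.
P, Q, E = 2**127 - 1, 2**107 - 1, 65537
N = P * Q
D = pow(E, -1, (P - 1) * (Q - 1))


def address_digest(address):
    raw = hashlib.md5(address.lower().encode("utf-8")).digest()
    return int.from_bytes(raw, "big")


class ComplaintBox:
    """Counts distinct recipients a sender provably emailed (hypothetical)."""

    def __init__(self, threshold=50):
        self.threshold = threshold
        self.proven = {}   # sender -> set of recipient addresses

    def file(self, sender, sender_key, recipient, key_valid):
        """Record a complaint if the token proves the sender mailed `recipient`."""
        e, n = sender_key
        if pow(key_valid, e, n) != address_digest(recipient):
            return False   # token was not made with the sender's private key
        self.proven.setdefault(sender, set()).add(recipient.lower())
        return True

    def is_spammer(self, sender):
        return len(self.proven.get(sender, ())) >= self.threshold
```

Repeated complaints from the same recipient add nothing to the count, and a token that does not decrypt to the complainer's address is ignored. Complaints alone therefore cannot brand someone who has not actually mailed many people.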
In a similar fashion, since the complaint-valid token is created using the recipient's (i.e. the complainer's) private key, it proves that it's she who has filed the complaint. Decrypting this token is generally not necessary, since the complainer's possession of the key-valid token already proves she's the recipient. Her mail server might find it easier, however, to decrypt this token instead, since it has her public key already. In an environment where the users are trusted, the server can institute a block on a particular address based on complaints from its users without checking whether the sender had in fact emailed the complainers.
We send complaints with HTTP POST requests:
|
Header descriptions:
From | The recipient's (complainer's) email address. |
About | The sender's (spammer's) email address. |
Complaint-type | Type of complaint. Must be "spam." |
complaint_valid[] | The complaint valid token. RSA-encrypted MD5-digest of key_valid. It's stored in an array only to maintain the convention of storing all variables in the POST request body as arrays. The 'key_valid' index is a reminder that it's the encrypted digest of the key_valid token. |
key_valid[] | Key-valid token of the offending email. RSA-encrypted MD5-digest of the recipient's email address (given as the array index here). |
The address provided in From should match the index of key_valid[].
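A server-side check of a complete complaint might look like the Python sketch below. Both keys are toys built from known Mersenne primes. The function names and the exponent 65537 are assumptions. So is hashing the key-valid token in its lowercase hex-string form, because the text does not say which byte representation of the token is hashed.

```python
import hashlib

E = 65537


def toy_key(p, q):
    """Return (d, n) for a toy RSA key. Not secure."""
    return pow(E, -1, (p - 1) * (q - 1)), p * q


SENDER_D, SENDER_N = toy_key(2**127 - 1, 2**107 - 1)
COMPLAINER_D, COMPLAINER_N = toy_key(2**107 - 1, 2**89 - 1)


def md5_int(data):
    return int.from_bytes(hashlib.md5(data).digest(), "big")


def make_key_valid(recipient, d, n):
    return pow(md5_int(recipient.lower().encode("utf-8")), d, n)


def make_complaint_valid(key_valid, d, n):
    # Assumption: the token is hashed as a lowercase hex string.
    return pow(md5_int(format(key_valid, "x").encode("ascii")), d, n)


def check_complaint(from_addr, key_valid_index, key_valid, complaint_valid,
                    sender_n, complainer_n):
    """True if From matches the index and both tokens decrypt correctly."""
    if from_addr.lower() != key_valid_index.lower():
        return False
    expected = md5_int(from_addr.lower().encode("utf-8"))
    if pow(key_valid, E, sender_n) != expected:
        return False
    expected = md5_int(format(key_valid, "x").encode("ascii"))
    return pow(complaint_valid, E, complainer_n) == expected
```

The sketch checks both tokens for completeness. As noted earlier, a server may reasonably check only one of them, depending on which public key it already holds.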
When the user makes a spam complaint, the email client should automatically add the sender's address to its blocklist, just in case the complaint doesn't result in the spammer's banishment.
Under MOWA, running mailing lists is tricky, because they have a mailing pattern similar to that of spammers. Running an e-mail newsletter is virtually impossible, since people with contrary views can sign up and then shut it down with spam complaints.
MOWA currently has no defined way to handle situations where the sender of a message is not its author. More thought on this is required...
A possible alternative to the mailing list is the public mailbox. Anyone can read its content but only a select few can delete. The mailbox acts essentially as a bulletin board.
This requires more thought...
The RSA algorithm was invented in 1978 by Ronald Rivest, Adi Shamir, and Leonard Adleman. "RSA" is a trademark of RSA Security Inc.
The algorithm selected by NIST to be AES, Rijndael, was created by Joan Daemen and Vincent Rijmen.
The MD5 digest function was created by Ronald Rivest. It's described in RFC 1321.
The Deflate format is explained in RFC 1951.
The Hypertext Transfer Protocol (HTTP) is described in RFC 2068 and 2616.
You can contact me by regular e-mail. Better yet, download the POC client, set up an account at one of the test servers, and email me through MOWA. My address is karolina@conradish.com.
- Chung W. Leong
Last update: April 14th 2002