On 1/2/08, Tom Limoncelli wrote:
Email systems should de-dup attachments. There's no reason to store the same 50G file multiple times. Even better would be for email systems to store the attachment in a hidden URL and replace the attachment with a link to that URL. (I think Frank Wojcik was the first to propose this idea). Then all sorts of problems go away.
AOL has stored the attachment(s) separately from the "body" of the message since the mid-90s. Multiple attachments to a single message would be stored as a single blob, however.
What they didn't do was de-dup the attachment system, although they had tried it -- at the time, they found that this created "hot spots" that would cause performance to go into the toilet, and the system just couldn't survive that kind of performance hit.
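For anyone who wants to see what de-duplication means concretely, here is a minimal sketch in Python, assuming a toy in-memory store keyed by SHA-256 digest. The class name `AttachmentStore` and its methods are made up for illustration; this is not how AOL or any other real system did it.

```python
import hashlib


class AttachmentStore:
    """Toy in-memory content-addressed store (hypothetical, for illustration)."""

    def __init__(self):
        self.blobs = {}     # digest -> attachment bytes, stored once
        self.refcount = {}  # digest -> number of messages pointing at the blob

    def put(self, data: bytes) -> str:
        """Store data if it is new and return its digest as the lookup key."""
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self.blobs:
            self.blobs[digest] = data
        self.refcount[digest] = self.refcount.get(digest, 0) + 1
        return digest

    def release(self, digest: str) -> None:
        """Drop one reference; delete the blob when nobody points at it."""
        self.refcount[digest] -= 1
        if self.refcount[digest] == 0:
            del self.blobs[digest]
            del self.refcount[digest]


store = AttachmentStore()
first = store.put(b"x" * 1000)   # first copy is stored
second = store.put(b"x" * 1000)  # identical copy only bumps the refcount
```

Note that every message carrying a popular attachment ends up pointing at the same key, so reads concentrate on one blob. That concentration is at least one plausible way to get the kind of "hot spots" described above.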
Ironically, Mailman can do this sort of thing -- strip attachments from messages that are posted to the list and store them on the server, and replace the attachment in the message with a URL back to the file. I think a lot of "mail big files" type of solutions work the same way.
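The strip-and-link idea can be sketched with Python's standard `email` package. This is only an illustration under stated assumptions, not Mailman's actual code: `BASE_URL`, `strip_attachments`, and the plain dict used as a store are all hypothetical, and keying the store by SHA-256 digest is my own addition to show where de-duplication could slot in.

```python
import hashlib
from email.message import EmailMessage

# Hypothetical location where stripped attachments would be served from.
BASE_URL = "https://lists.example.org/attachments/"


def strip_attachments(msg: EmailMessage, store: dict) -> EmailMessage:
    """Return a copy of msg with attachments replaced by links.

    store is a plain dict (digest -> bytes) standing in for server storage.
    """
    out = EmailMessage()
    for name in ("From", "To", "Subject"):
        if msg[name]:
            out[name] = str(msg[name])

    # Keep only the plain-text body, if there is one.
    body = msg.get_body(preferencelist=("plain",))
    text = body.get_content() if body is not None else ""

    links = []
    for part in msg.iter_attachments():
        data = part.get_payload(decode=True)
        digest = hashlib.sha256(data).hexdigest()
        store.setdefault(digest, data)  # identical files are stored once
        filename = part.get_filename() or "attachment"
        links.append(f"{filename}: {BASE_URL}{digest}")

    if links:
        text = text.rstrip("\n") + "\n\nAttachments:\n" + "\n".join(links) + "\n"
    out.set_content(text)
    return out
```

A real deployment would also need access control on the URLs and some expiry policy, which this sketch ignores.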
I don't know whether or not they do any de-duplication, however.

-- 
Brad Knowles <brad@xxxxxxxxxxxxxxxxx>
LinkedIn Profile: <http://tinyurl.com/y8kpxu>