mirror of
https://bitbucket.org/smil3y/kde-playground.git
synced 2025-02-23 18:32:51 +00:00
.. | ||
add_attachments.php | ||
courier_to_kmail_maildirformat.php | ||
create_vcards.php | ||
extract_contacts.php | ||
README | ||
run.sh | ||
to_valid_maildir.php |
These scripts are designed to make use of the Enron email dataset (see http://www.cs.cmu.edu/~enron/). They try to retrieve contact information from the messages and create an address book (VCARD format) from them. To start the whole process, just run './run.sh'. It needs wget, tar, gzip, cat, sort, uniq and PHP. Following is a list of the scripts: - extract_contacts.php Reads the mail files, analyzes their addressee information (names, company names, email addresses) and outputs it. Takes about 4 minutes on a Pentium 4 2.4 GHz to read 128326 mail files. - create_vcards.php Reads the output from extract_contacts.php and creates a VCARD address book from it. Additional information, like telephone numbers, birthdays, photographs, is randomly generated and added to the VCARDs. Takes about 20 seconds on a Pentium 4 2.4 GHz to create 32072 VCARDS. - add_attachments.php Randomly adds fake attachments to the email messages, in order to create a more realistic dataset. (c) 2007 Robert Zwerus <arzie@dds.nl>