Last week, the dump of the Ashley Madison database was published. A lot of people have started digging into it to find interesting stuff, including me.
So first I downloaded the relatively large dump (10 GB). Then I extracted the archive corresponding to the email addresses (aminno_member_email.dump) and, with a few lines of Python, ended up with a list of 36 396 162 email addresses.
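The post does not show those few lines of Python. As a minimal sketch, assuming the dump is a MySQL-style text file whose INSERT statements hold the addresses as quoted strings, a loose regular expression is enough to pull out anything that looks like an email address. The function names and the output file are illustrative, not the original script.

```python
import re
from typing import Iterable, Iterator

# Deliberately loose pattern: local part, "@", then a domain with at least one dot.
# Assumption: addresses appear as quoted values inside INSERT statements.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+\-]+@[A-Za-z0-9.\-]+\.[A-Za-z]{2,}")


def extract_emails(lines: Iterable[str]) -> Iterator[str]:
    """Yield every email-looking substring found in the given lines."""
    for line in lines:
        yield from EMAIL_RE.findall(line)


def write_email_list(dump_path: str, out_path: str) -> int:
    """Stream the dump and write one lowercased address per line; return the count."""
    count = 0
    with open(dump_path, encoding="utf-8", errors="replace") as dump, \
            open(out_path, "w", encoding="utf-8") as out:
        for email in extract_emails(dump):
            out.write(email.lower() + "\n")
            count += 1
    return count
```

For example, `write_email_list("aminno_member_email.dump", "emails.txt")` streams the file line by line, so the 10 GB dump never has to fit in memory. A pattern this loose may also catch email-like strings from other columns, so its count is not guaranteed to match the figure above.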
Let’s wander a little bit
From that, I computed a few statistics (grep + wc). Most are of little interest, but some are funny:
.fr: 275 708
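The statistics above came from grep + wc (presumably something like counting the lines that end in `.fr`). For readers who prefer to stay in Python, here is a hypothetical equivalent that counts every top-level domain in a single pass, assuming a file with one address per line.

```python
from collections import Counter
from typing import Iterable


def count_tlds(emails: Iterable[str]) -> Counter:
    """Count addresses per top-level domain (the part after the last dot)."""
    counts: Counter = Counter()
    for email in emails:
        email = email.strip().lower()
        # Skip lines that are not of the form local@domain.tld.
        if "@" in email and "." in email.rsplit("@", 1)[1]:
            counts["." + email.rsplit(".", 1)[1]] += 1
    return counts
```

Calling `count_tlds(open("emails.txt"))` and then `.most_common(10)` on the result lists the ten most frequent domains; the file name is only an example.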
And … a very amusing one:
Then, I checked a few email addresses of people I know, just in case …
Up to the next level
OK, so now what? Well, what if I could check everyone I know against this list? Still just in case …
Getting emails from people I know
I have a file containing all email addresses registered on Ashley Madison, but I still need the email addresses of the people I know. Exporting my address book is not enough: I’d like to include everyone I’ve sent an email to or received an email from. So how can I extract this from my desktop email client (here, Apple Mail)?
It seems unlikely that this can be done from the user interface; at least, I did not find a way. But I found two interesting posts on the web (http://superuser.com/questions/192227/how-to-export-email-addresses-from-apple-mail and http://c-command.com/spamsieve/help/how-can-i-rebuild-apple). From the first one, I learned that Mail stores its information in SQLite databases. The other one told me about an interesting file:
/Users//Library/Mail/V3/MailData/Envelope Index, which is one of these SQLite files. It is 31 MB and was last written just a few minutes ago. Running strings on it shows that it contains (a lot of) emails plus their headers, including email addresses. So a few sqlite commands later, here I am with a file containing all the email addresses I have (directly or indirectly) interacted with during the last two years. Here are the sqlite commands I used, for those who are interested:
> sqlite3 Envelope\ Index
SQLite version 3.8.5 2014-08-15 22:37:57
Enter ".help" for usage hints.
sqlite> .tables
addresses feeds recipients
attachments labels subjects
duplicates_unread_count mailboxes threads
sqlite> .schema addresses
CREATE TABLE addresses (ROWID INTEGER PRIMARY KEY, address COLLATE NOCASE, comment, UNIQUE(address, comment));
CREATE INDEX addresses_address_index ON addresses(address);
sqlite> .output addresses.txt
sqlite> select address from addresses;
This gave me a file of 5669 lines, one email address per line.
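The same export can also be scripted with Python's built-in sqlite3 module. This is a sketch, not the author's method: it adds DISTINCT and lower() so that an address stored with several different comments (the table is only unique on the (address, comment) pair) is written once. It is safer to run it on a copy of Envelope Index rather than on the file Mail is actively writing to; the function name is hypothetical.

```python
import sqlite3


def export_addresses(db_path: str, out_path: str) -> int:
    """Write each distinct, lowercased address from the `addresses` table, one per line."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT DISTINCT lower(address) FROM addresses "
            "WHERE address IS NOT NULL ORDER BY 1"
        ).fetchall()
    finally:
        conn.close()
    with open(out_path, "w", encoding="utf-8") as out:
        for (address,) in rows:
            out.write(address + "\n")
    return len(rows)
```

Because of the de-duplication, the number of lines written this way can be lower than what a plain `select address from addresses;` returns.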
Computing the matching
Once I have these two files, computing the intersection is quite easy with a few lines of Python and a little thought about optimization. Some important points of the implementation:
- I first sorted the list of email addresses from the dump alphabetically, to speed up the search.
- I then sort of hashed it, using the first two letters of each address as a key.
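The actual implementation is not reproduced in this post, so the following is only a sketch of the two ideas listed above, with hypothetical function names: addresses are normalized to lowercase and grouped into buckets keyed by their first two characters, each bucket is sorted, and a lookup runs a binary search (bisect) inside the matching bucket.

```python
from bisect import bisect_left
from collections import defaultdict
from typing import Dict, Iterable, List


def build_index(dump_emails: Iterable[str]) -> Dict[str, List[str]]:
    """Group normalized addresses by their first two characters and sort each bucket."""
    buckets: Dict[str, List[str]] = defaultdict(list)
    for email in dump_emails:
        email = email.strip().lower()
        if email:
            buckets[email[:2]].append(email)
    for bucket in buckets.values():
        bucket.sort()
    return buckets


def contains(index: Dict[str, List[str]], email: str) -> bool:
    """Binary-search the address in the bucket selected by its first two letters."""
    email = email.strip().lower()
    bucket = index.get(email[:2], [])  # .get() does not create empty buckets
    i = bisect_left(bucket, email)
    return i < len(bucket) and bucket[i] == email


def find_matches(index: Dict[str, List[str]], contacts: Iterable[str]) -> List[str]:
    """Return the contacts that also appear in the dump."""
    return [c for c in contacts if contains(index, c)]
```

For comparison, a plain set intersection, `set(dump) & set(contacts)`, finds the same matches with average constant-time lookups; the bucketed version mainly mirrors the approach described above. Note that a binary search over one fully sorted list of 36 million entries already needs only about 25 comparisons per lookup, so the buckets are a modest gain.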
The implementation can be found here.
The result of my experiment:
[ok] Load data base
[ok] Test search in data
[ok] Search match between address book and silly emails
(0/5649) No match found. Your address book is made of respectable people.
Unfortunately, none of my email contacts was registered on Ashley Madison …
As a bonus, here is a sample of the descriptions people used in their Ashley Madison profiles. Enjoy.