Organising a massive photo collection

I say massive, but it’s not actually that big. It does make for a good title though. I’d like to share some of the methods I use to organise my photo collection. Most of the stuff I do is a result of reading somebody’s blog and then implementing it in my own way. Organising my photo collection has been incredibly liberating these past few days. I set out to de-duplicate my photos, sort them into some sort of folder structure, shrink their file size somewhat without visibly affecting the quality of the images, and then finally automate all of this for the future. This article assumes you are using a Mac, as most of the software featured here isn’t available on Windows. I also recommend that anyone with a keen interest in this subject follows Bradley Chambers on Twitter; he shares this passion and invests more time in it than I do.

Like many others, I probably would not have written this blog post had Everpix not shut down. Pre-Everpix I had photos stored everywhere, mostly dumped into random folders synced to Dropbox. I would often restore iOS betas to my iPhone, dumping my photos out to randomly named folders before doing so. My dad had 20,000-something photos on his iMac in iPhoto dating back to 2006, which I synced to my Everpix account and then backed up using CrashPlan. Everpix was the be all and end all of photo management software in the cloud: install the desktop uploader and the iOS app, tell it to upload everything, and stay ignorant of the complete mess behind the scenes. Everpix hid this and let you view your photos with no duplicates and no folder structure, just pure photos. Boy was Everpix a good product. It was a massive time saver, something I did not fully realise until now.

Exporting Your iPhoto Library

Sven Fechner wrote up an excellent blog post that details how to correctly export your iPhoto library and then sort those photos into a date-based folder structure. At first I followed Sven’s method, which involves using Phoshare to export your photos instead of File > Export in iPhoto. The reason behind this is that iPhoto sets Date Modified and Date Created on all the files it exports to today’s date, which hampers the automated sorting process later on. I have since discovered DIM (Digital Image Mover), which can sort photos based on Exif data rather than the Date Created / Date Modified time stamps.

Sorting your photos into folders

Once you have an export of all your images (or, if you do not use iPhoto, you may already have all your photos dumped into folders), you’ll want to organise them into subfolders based on year, month and, depending on the number of photos you have, maybe even day. When I first experimented with this I used Hazel to organise my photos into subfolders, as described by Sven. Due to limitations in OS X Spotlight, Hazel can only organise photos based on the file’s ‘Date Created’ or ‘Date Modified’ and not on Exif data from the image itself such as ‘Date Time Digitized’ and ‘Date Time Original’. Really, if you’re going to organise your photos accurately you’ll want to use Exif data, as this remains unchanged from the moment your camera first recorded the image, unlike ‘Date Created’ and ‘Date Modified’. For this I used DIM: Digital Image Mover 5.0 Beta.


I only stumbled across DIM the other day. It looks to have been created in 2001; the latest beta version, released in Dec 2013, still has a UI from 2001, but that really doesn’t matter. The tasks the software is designed to do it does very well. It sorted my 27,000+ photos into subfolders in less than 5 minutes. The software is very self-explanatory, and it will copy your photos by default rather than altering the originals. To speed things up a bit I switched off the Geolocation functionality, which lets you sort your photos based on location. You can see which DIM settings I switched on and off here and here.
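To make the idea of sorting by Exif date concrete, here is a minimal sketch in Python. It is not how DIM works internally (I don’t know that); it only shows how an Exif ‘Date Time Original’ string, which is written as "YYYY:MM:DD HH:MM:SS", can be turned into a Year/Month/Day subfolder. Reading the tag out of the image is assumed to happen elsewhere, and the function names are made up for illustration.

```python
from datetime import datetime
from pathlib import PurePosixPath

# Exif stores 'Date Time Original' as text in this layout.
EXIF_DATE_FORMAT = "%Y:%m:%d %H:%M:%S"


def dated_subfolder(exif_date_time_original: str) -> PurePosixPath:
    """Return a Year/Month/Day subfolder for an Exif timestamp string."""
    taken = datetime.strptime(exif_date_time_original, EXIF_DATE_FORMAT)
    return PurePosixPath(f"{taken:%Y}") / f"{taken:%m}" / f"{taken:%d}"


def destination_for(filename: str, exif_date_time_original: str) -> PurePosixPath:
    """Hypothetical helper: the relative path a photo would be copied to."""
    return dated_subfolder(exif_date_time_original) / filename


if __name__ == "__main__":
    print(destination_for("IMG_0001.JPG", "2013:12:01 09:30:00"))
```

Because the path comes from the timestamp inside the image, it stays the same however many times the file has been exported or copied around.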


If, like me, you have a ton of duplicate photos, there are two pieces of software that can help with that.

MacPaw Gemini is an awesome piece of software; you can use it to de-duplicate any file, not just photos. Gemini scanned through my 60GB photo collection in less than 1 minute, showing me all the duplicates. The software itself is pretty self-explanatory and doesn’t require any guidance. Gemini detects duplicates based on the file itself rather than the image content, i.e. files that are exactly the same size and share the same MD5 hash. It won’t detect two copies of the same photo if one has been resized or compressed. For this you need PhotoSweeper.
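The size-plus-hash approach is easy to sketch. The Python below is only an illustration of the technique, not Gemini’s actual implementation: it groups files by size first (cheap), then hashes only the files whose size matches another file. The function names are my own.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large photos are not read into memory at once."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(root: Path) -> list[list[Path]]:
    """Group files under root that share both size and MD5 hash."""
    by_size = defaultdict(list)
    for path in root.rglob("*"):
        if path.is_file():
            by_size[path.stat().st_size].append(path)

    groups = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a unique size cannot have an exact duplicate
        by_hash = defaultdict(list)
        for path in same_size:
            by_hash[md5_of(path)].append(path)
        groups.extend(sorted(group) for group in by_hash.values() if len(group) > 1)
    return groups
```

A resized or recompressed copy has different bytes and therefore a different hash, which is exactly why this kind of check misses it.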


PhotoSweeper is an excellent piece of software that has helped me to remove all of what I call ‘WhatsApp duplicates’. These are low-resolution copies of photos that WhatsApp creates and that end up in the recipient’s Camera Roll when you send somebody a photo, e.g. your partner, who syncs their photos to the same Dropbox account as you. PhotoSweeper isn’t as accurate as Gemini, because its detection methods are based on the image content itself; however, I found it more than accurate enough to rid my collection of nearly all duplicates.

Saving space

When I first started sorting through my photo collection I told myself I wouldn’t compress or shrink any of my photos. After researching the process behind compressing a JPEG photo I changed my mind. Although I do not fully understand the science behind losslessly compressing a JPEG image, I do trust it and the results are amazing. At no point have I shrunk the resolution of any of my photos; the lossless step only re-organises the Huffman tables to shrink the size of the file itself. The software I am using is jpegoptim, available as a command line tool for Mac OS X and as a GUI. I will show you how to implement an automated workflow using the command line tool.

Lossless compression (100% quality) produced significant savings in photos where colours are similar across the whole image. I then started to experiment with lower quality settings, which are lossy. At 80% I was hitting near 60% space savings on every image I ran through jpegoptim, with no visual loss of image quality. I ran a handful of completely different photos from different devices through jpegoptim with a maximum quality of 80%, zoomed in to 100% and could spot no differences from the originals. My whole photo collection now comes in at 25.29 GB whereas before running this process it was just over 55 GB; that’s a reduction of roughly 54% with no visible loss in photo quality.

Automating this – My Workflow

As I only shoot using my iPhone, my workflow is perhaps simpler than most. I use Dropbox to automatically sync my photos and then have a very simple set of Hazel rules that compress, tag and sort my photos from there. Dropbox’s camera upload app is very well made: files are dropped into a ‘Camera Uploads’ folder, renamed with the date the photo was taken (extracted from Exif), and ‘Date Created’ is also set as per Exif.

Rule 1

Rule Name: Compress, tag, move to photos folder
Target: Dropbox/Camera Uploads
Conditions: Extension is jpg
Actions: Run shell script- embedded script:

jpegoptim -m80 -p "$1"

Add tags: jpegoptim80
Move: to folder: Dropbox/Photos

The embedded script broken down: jpegoptim calls the executable, -m80 sets the maximum quality to 80%, -p preserves the file’s timestamps, and "$1" is the file that Hazel has detected in Dropbox/Camera Uploads.
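If you would rather drive jpegoptim from a script than from Hazel, the same command can be assembled in Python. This is a sketch under the assumption that jpegoptim is installed and on your PATH; the two function names are mine, and only the -m and -p options from the rule above are used.

```python
import shutil
import subprocess


def jpegoptim_command(path: str, max_quality: int = 80) -> list[str]:
    """Build the same command the Hazel rule runs for one file."""
    if not 0 <= max_quality <= 100:
        raise ValueError("max_quality must be between 0 and 100")
    # -m<N> caps the JPEG quality, -p preserves the file's timestamps.
    return ["jpegoptim", f"-m{max_quality}", "-p", path]


def compress(path: str, max_quality: int = 80) -> bool:
    """Run jpegoptim on path in place; return False if jpegoptim is not installed."""
    if shutil.which("jpegoptim") is None:
        return False
    subprocess.run(jpegoptim_command(path, max_quality), check=True)
    return True
```

Passing the arguments as a list rather than one shell string means file names with spaces (common in photo folders) need no extra quoting. Keeping the timestamps matters here because the sorting step that follows relies on the file dates.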


Rule 2

Rule Name: once moved to photos, sort into subfolders
Target: Dropbox/Photos
Conditions: Extension is jpg
Actions: Sort into subfolders: date created (2013) ▸ date created (12 – Dec) ▸ date created (01)


This sorts the photos that Rule 1 moved to Dropbox/Photos into a dated folder structure. Here we can use Hazel because Dropbox has set the Date Created / Date Modified attributes on the file as per Exif.

What about Video?

I’m yet to work out how I can de-duplicate, compress and accurately sort my videos. Nearly 70% of my Dropbox is consumed by video and I’d love to get that number down. If anybody has a similar workflow for video I’d love to hear it.

Virgin Media, why are you manipulating my traffic?


  • 6th April: TalkTalk appear to be doing something similar
  • 7th April: Plusnet are doing it too. The responses to my post have highlighted that using DNSCrypt + OpenDNS doesn’t allow you to opt out of this behaviour, which suggests a deal between the ISPs, Google and OpenDNS has been made.

Virgin Media, why does www.google.com resolve to host-62-253-8-99.not-set-yet.virginmedia.net?
What a funny name for a PTR record, but seriously, why are you manipulating my traffic?

I was testing something only to find that google.com and google.co.uk both resolve to an IP address owned by Virgin Media.

PING google.com ( 56 data bytes
64 bytes from icmp_seq=0 ttl=58 time=17.569 ms

nslookup www.google.com
Server:
Address:
Non-authoritative answer:
Name: www.google.com
Address:

host domain name pointer host-62-253-8-103.not-set-yet.virginmedia.net.
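The check I was doing by hand (resolve the name, then look at the PTR record of each address) can be sketched in Python. This is only an illustration: the function names are made up, the lookup needs network access, and a PTR record under your ISP’s domain is a hint rather than proof of interception.

```python
import socket


def belongs_to(ptr_name: str, isp_domain: str) -> bool:
    """True if a reverse-DNS name sits under the given ISP domain."""
    name = ptr_name.rstrip(".").lower()
    domain = isp_domain.rstrip(".").lower()
    return name == domain or name.endswith("." + domain)


def suspicious_answers(hostname: str, isp_domain: str) -> list[str]:
    """Resolve hostname and return the addresses whose PTR record points at the ISP.

    Needs network access; addresses without a PTR record are skipped.
    """
    _, _, addresses = socket.gethostbyname_ex(hostname)
    flagged = []
    for address in addresses:
        try:
            ptr_name, _, _ = socket.gethostbyaddr(address)
        except OSError:
            continue
        if belongs_to(ptr_name, isp_domain):
            flagged.append(address)
    return flagged
```

Calling suspicious_answers("www.google.com", "virginmedia.net") on an affected connection would list the addresses in question; on an unaffected one the list should come back empty.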

Performing a dig a google.com +trace fools me into thinking that ns1.google.com is dishing out these Virgin-owned IPs, yet the same query from elsewhere tells me otherwise.

Using Virgin Media

dig a google.com @ns1.google.com
;; ANSWER SECTION:
google.com. 300 IN A
google.com. 300 IN A 
google.com. 300 IN A 
google.com. 300 IN A 
google.com. 300 IN A 
google.com. 300 IN A 
google.com. 300 IN A 
google.com. 300 IN A 
google.com. 300 IN A 
google.com. 300 IN A 
google.com. 300 IN A 
google.com. 300 IN A 
google.com. 300 IN A 
google.com. 300 IN A 
google.com. 300 IN A 
google.com. 300 IN A

Another ISP

dig a google.com @ns1.google.com
;; ANSWER SECTION:
google.com. 300 IN A
google.com. 300 IN A 
google.com. 300 IN A 
google.com. 300 IN A 
google.com. 300 IN A 
google.com. 300 IN A

Most odd, especially seeing as I do not use Virgin Media’s DNS resolvers. I use OpenDNS (m7.lon.opendns.com to be exact, according to www.dnsleaktest.com).
OpenDNS’ cache check matches my other ISP: a whole bunch of IPs, none of which are anywhere near the ones we’re seeing from Virgin Media.

So for some reason someone is manipulating the DNS response I receive from OpenDNS for google.com, google.co.uk and possibly other domains. They’re also proxying google.com to me, as loading it in a web browser shows me Google’s home page. Creepy.
Ok so where does a traceroute take me?

traceroute: Warning: google.com has multiple addresses; using traceroute to google.com (, 64 hops max, 52 byte packets

  • 1 ( 4.610 ms 4.257 ms 34.474 ms
  • 2 cpc10-sotn8-2-0-gw.15-1.cable.virginm.net ( 22.904 ms 79.800 ms 14.122 ms
  • 3 sotn-core-2a-ae6-610.network.virginmedia.net ( 13.692 ms 12.621 ms 11.575 ms
  • 4 popl-bb-1c-ae14-0.network.virginmedia.net ( 33.107 ms 16.609 ms 27.541 ms
  • 5 brnt-bb-1c-et-000-0.network.virginmedia.net ( 28.404 ms brnt-bb-1c-et-510-0.network.virginmedia.net ( 15.146 ms 25.651 ms
  • 6 haye-icdn-1-ae0-0.network.virginmedia.net ( 14.849 ms 16.701 ms 16.381 ms
  • 7 *
  • 8 *

Most interesting that it stops here: haye-icdn-1-ae0-0.network.virginmedia.net (

haye-icdn-1, what do you do? A quick google (ironic) reveals this thread titled ‘Virgin hijacking’. One user suggests:

Content Distribution Network ran by Virgin to try and speed things up. It’s not really hijacking, per-se and if it worked, it would actually be a good thing. The problem is, it’s heavily congested so has the opposite effect.


I have no idea why Virgin and OpenDNS feel the need to proxy or CDN google.com for me. The ping response time to one of Google’s actual IPs is 20.049 ms. From now on I will encrypt my DNS traffic to OpenDNS using DNSCrypt and one of the suggested DNS providers, it takes 5 seconds to install their app.

Ahh, that’s better 🙂

Some Notes

  • My Virgin SuperHub is in modem mode
  • It’s entirely possible Virgin Media has struck a deal with OpenDNS however I couldn’t find mention of that anywhere and it seems unlikely. The responses to this post have led me to believe some deal has been made.

Discuss at HN: https://news.ycombinator.com/item?id=7504737

Everpix is awesome

I use Everpix to back up and organise every photo I’ve ever taken, all 27,159 of them.

Not so long ago I used to take my camera with me everywhere, see my Flickr for proof. Over a few years the number of photos I’d taken started to add up (only a handful ended up on Flickr). The saddest part is that I have easily taken 5x the amount I have today. I can’t remember how but I stumbled across Everpix roughly 6 months ago and I’m fairly gutted I didn’t discover it sooner. Everpix is the answer to so many problems. I own a laptop, I own an iPhone and I own a digital camera. Where possible I want my photos to be automatically synced and stored in the cloud forever, without having to worry about storage limits.

I take photos on my iPhone every single day, but only the most recent 1000 photos are synced to iCloud Photo Stream. If I take photos using my DSLR (rare these days) I’ll import those photos into iPhoto, and the most recent 1000 photos from iPhoto are uploaded to Photo Stream, pushing my iPhone photos out of Photo Stream. If I want to view a photo I’ve taken on my iPhone using my laptop I have to open iPhoto, wait a minute for the photos to download and import, then view it. In 2013 this 1 minute wait isn’t cool. Photo Stream doesn’t have a web interface. With Everpix everything is instantly accessible through your web browser [1] and you can store an unlimited number of photos. Apple’s idea is that you use iCloud Photo Stream to sync your photos from your iPhone or iPad back to your Mac and then use Time Machine to store a backup from there. Apple also offer 5GB of free iCloud backup, which I should imagine most people don’t have switched on. Once you hit your 5GB limit (roughly 1700 photos) you can then purchase expensive additional iCloud storage.

Until now my photos were stored in unorganised folders that had accumulated over the years. I used Picasa to sort through it all. Unfortunately most of my photography is lost forever; the good stuff is on Flickr but mostly in scaled down formats. Thinking back I don’t really understand how I’ve managed to lose so many of the originals. I think I possibly got fed up, installed a clean OS on my machine at the time and thought to myself ‘the good stuff is on Flickr, I don’t need the originals’. The same goes for my iPhone: I installed a beta OS and accidentally wiped my photos in the process. Having said that, I still have just over 27,000 photos (55GB) that aren’t lost.

Why I love Everpix

  1. No Duplicates. Everpix automatically de-duplicates your whole collection. It does it in a few ways. It makes a note of your files’ checksums to ensure it doesn’t re-upload any photos that are already in your collection. Photos that aren’t exact copies (a different size or resolution, say) are de-duplicated server-side too, so that only the best quality version is visible to you through the web interface and in the apps. The metadata from the copies is saved and listed beside the original so that you know where a photo came from.
  2. Everything is private by default. Everpix is designed to store all your photos including photos of your privates. By default nothing you upload to Everpix is shared with anyone. Sharing an album of 300 photos is achieved in a few clicks but you have to explicitly tell Everpix you want to share those photos.
  3. Unlimited. You can upload as many photos as you like to Everpix. I’ve even seen stories on Twitter of people uploading well over 300,000 photos with no issues. I currently have just over 27,000 photos (with 2,000 duplicates). For free, Everpix will store your last year’s worth of photos; if you want to store more than that you’ll have to upgrade to the paid plan. For the price of a coffee each month it really is worth it.
  4. Sharing lots of photos is easy. When I get back from holiday I like to round up all my best photos and then share those with my family (not the whole of Facebook). The easiest way to do that used to be to email lower res versions. With Everpix you can highlight as many photos as you like and send them across in seconds. You have the choice of sending a link to a private page with your photos on, or ‘Everpix mail’, which is basically just an email with that same link in it. If the recipients have an Everpix account the photos will also show up in their web interface or the Everpix app.
  5. Nothing gets deleted. Everpix doesn’t sync your photos, it uploads them and keeps them there forever. If you run out of local storage space and need to make room for more, go for it – your photos will still stay in your Everpix account.

Everpix encourage you to rediscover old photos. This is my favourite feature. Built into their apps is a feature called ‘Flashback’. This will show you photos that were taken 1 year ago today, 2 years ago today, 3 years ago today etc. The 1 year ago today view always catches me out and makes me realise how quickly time passes. I constantly find myself re-discovering classic moments and emailing them off with the subject of ‘This was one year ago!’. Everpix has another feature that encourages rediscovery, called Explore; it allows you to browse photos by subject, for example ‘Food’, ‘Nature’, ‘City’ or ‘People’. Explore is still in its early stages and isn’t perfect; sometimes it doesn’t work at all. When it does, it’s a fantastic way of looking through your photo collection from all years based on photo subject. Food is particularly interesting. For some reason, if I’m about to eat something that looks badass I’ll take a photo of it and then never do anything with that photo; now I can trawl through all those photos re-living those awesome fatty moments.

I like Everpix because I trust it. It’s simple and it works well. You install the desktop client on your laptop and it will upload photos from any source (folders, iPhoto, Aperture) automatically. You install the iPhone app and it will sync all your photos, in the background, automatically. The post processing of your photos is nice but at the moment not the deal breaker. Everpix is what iCloud Photo Stream / iCloud backup should be.

Right now I don’t believe Everpix is designed as a standalone solution for professional photographers. There are three things that would stop a professional photographer from using Everpix.

  1. Doesn’t sync RAWs
  2. Doesn’t sync videos
  3. Compression. Everpix use a proprietary image optimization algorithm, a fancy way of saying they’ll do their absolute best to preserve the quality of your photos at full resolution. They don’t touch the originals; instead they make a copy of the original and compress that. Uploads are quicker and for them it saves on storage space requirements. I am not a professional photographer so I don’t have an issue with this, and at 100% I can’t tell the difference between an original and an Everpix-original. Picturelife deals with RAWs and videos; I might take a look at this soon.

I recently read an article named Regular People Have No Idea How To Manage Photos On Their iPhone and it highlights that regular people really don’t do enough to look after their photos. Let Everpix do it for you and never worry about losing a photo again.

 1. As per Everpix website: Everpix can receive millions of new photos per day, which must then be processed by our very advanced Image Analysis. After this step is done, the automated Everpix views must also be updated. There might therefore be some delay between when a photo is uploaded and it showing up on Everpix.

Securing your digital world

Everything requires a password. A long time ago it bugged me that I couldn’t think of something secure, so I visited a web page that randomly generated passwords for you and I have used the same 8 character password ever since; it contained 1 number and 7 letters, one of which was uppercase. What if one of your favourite services gets hacked? What if someone hacks LinkedIn and leaks 6 million user passwords? Perhaps one day your favourite dating website might get hacked? These leaks give some interesting insights into people’s passwords: of the 1,215,846 passwords that were analysed in the recent eHarmony password dump, 99.5% didn’t contain a special character.

I think it’s safe (or not) to assume that if you’re giving your password to somebody else to store and they don’t openly publish how they’re storing it, then it’s entirely possible it will get leaked for the world to use and abuse. In that case we need to start setting completely different passwords for each website, but that’s going to be impossible to manage, right? This is where password managers are handy: one database holding all your passwords for all your sites, encrypted with a master key. There are plenty of password managers out there, and I wouldn’t have a problem storing my passwords with any of them; ensuring your passwords are protected is what these guys do, and the cloud-based products are very attractive. I’m probably a bit old school but I opted for a KeePass database stored in my Dropbox, which is protected using two-factor authentication; the one-time passwords required to access my Dropbox can only be generated using my YubiKey.
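Password managers will generate a different random password for every site for you. As a rough idea of what that involves, here is a small Python sketch using the standard library’s secrets module. The character set, the default length and the function names are my own choices for illustration, not what KeePass or any other manager actually does.

```python
import secrets
import string

# Assumed set of special characters; individual sites accept different ones.
SPECIAL = "!@#$%^&*()-_=+"
ALPHABET = string.ascii_letters + string.digits + SPECIAL


def has_special_character(password: str) -> bool:
    """True if the password contains anything other than letters and digits."""
    return any(not c.isalnum() for c in password)


def generate_password(length: int = 24) -> str:
    """Return a random password with a lowercase, uppercase, digit and special character."""
    if length < 4:
        raise ValueError("length must be at least 4")
    while True:
        candidate = "".join(secrets.choice(ALPHABET) for _ in range(length))
        if (any(c.islower() for c in candidate)
                and any(c.isupper() for c in candidate)
                and any(c.isdigit() for c in candidate)
                and has_special_character(candidate)):
            return candidate
```

The secrets module draws from the operating system’s cryptographic random source, which is what you want for passwords; the ordinary random module is not designed for this.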

So what is YubiKey and how can two-factor authentication help me?

The YubiKey is basically a USB device that any computer (laptop, PS3, iPad etc) will recognise as a hardware keyboard. It has a contact/button on the top that allows you to generate one-time passwords, OATH-HOTP codes, challenge-response and 32 character static passwords. The YubiKey has memory for up to two configuration profiles: press the contact for between 0.3 and 1.5 seconds and you’ll get a code from profile 1, press the contact for 2-5 seconds and you’ll get a code from profile 2. My YubiKey is set up as follows: Configuration profile 1 = Static password, Configuration profile 2 = Yubico Cloud OTP.


One-time passwords generated by the YubiKey are 44 characters in length and by nature change each time they’re generated. The best bit is that once an OTP has been used it cannot be used again. Here’s an example of where this could help: you visit your favourite website (eBay, say) whilst on holiday and, unknown to you, the internet cafe PC you’re using has a keylogger installed. You log in and eBay asks for your username, password and an OTP generated from your YubiKey. That OTP is then checked against YubiCloud, or against the time-based one-time password algorithm you set up earlier, and you’re authenticated. The hacker then tries to use those same details but they can’t get in! The YubiKey OTP they recorded is now worthless because it’s already been used.

YubiCloud is completely open source and you can even run your own authentication servers. Developers can integrate their applications with YubiCloud using documented APIs for free. Unfortunately not all services support YubiCloud yet, Google being one example, however there is talk that Google is trialling it internally.

More and more of the web is starting to support two-factor authentication. Google and Dropbox today both support two-factor authentication in the form of time-based one-time passwords. This is basically a one-time password that stays the same for an amount of time and then changes; for Google and Dropbox this time is 30 seconds. Both companies also offer one-time codes via text message or phone call: they text you a 6-digit code to use alongside your username and password. Google will offer to call you if you can’t receive the text message (whenever that would be?!) and read out the 6-digit code to you. Unfortunately some clever chap figured out a way to redirect voicemail to a VoIP service. Matthew Prince, CEO and co-founder of CloudFlare, was victim to an attack like this, and what’s to say your network provider won’t suffer the same social engineering hack? Anyone can spoof a phone number, and many voicemail systems will grant access to your mailbox if they think you’re calling from your own number; enter the default PIN of 0000 and they’re in! This relies on knowing a fair bit about the victim, but still, not very secure!

Google Authenticator is a mobile app that generates time-based OTP for you and works with Dropbox too. There are alternative apps out there such as Authy that will generate these codes for you too.
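For the curious, the time-based codes these apps produce follow a published standard (RFC 6238, built on RFC 4226), so they can be reproduced in a few lines of Python. This is a minimal sketch for understanding the mechanism, assuming HMAC-SHA1, 6 digits and a 30 second period, which are the common defaults; it is not a replacement for a proper authenticator app, and the function names are my own.

```python
import base64
import hashlib
import hmac
import struct
import time


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password (RFC 4226) using HMAC-SHA1."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation: the low nibble picks the offset
    value = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10 ** digits).zfill(digits)


def totp(secret: bytes, at=None, period: int = 30, digits: int = 6) -> str:
    """Time-based one-time password (RFC 6238): HOTP over the current time step."""
    now = time.time() if at is None else at
    return hotp(secret, int(now // period), digits)


def secret_from_base32(text: str) -> bytes:
    """Decode the base32 secret that services show alongside the QR code."""
    cleaned = text.replace(" ", "").upper()
    padding = "=" * (-len(cleaned) % 8)
    return base64.b32decode(cleaned + padding)
```

The server and the app share the same secret and the same clock, so both arrive at the same 6 digits for any given 30 second window without ever talking to each other.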


I use Authy, myopt.net & KeePass. Myopt.net is an alternative to Google Authenticator that securely stores the secret used to generate your time-based one-time passwords, protected by your YubiKey/YubiCloud. When you turn on two-factor authentication for Dropbox, Google etc it will give you a QR code to scan or a randomly generated code that you then enter into your authenticator app; the app then uses this to generate those time-based one-time passwords. Myopt.net does not require registration: you simply log in with your YubiKey, you are authenticated using YubiCloud and you are let into your own safe of time-based one-time passwords. The website functions the same way an app would, generating new codes every 30 seconds based on that randomly generated code you were given when you enabled two-factor authentication. You’re not required to provide any identifiable information to use the website and you can give random descriptions to each code you have the site generate, so even if somebody were to extract the codes from the site they’d be no use unless you had specified the description as your full email address. It basically provides a third layer of security to those two-factor codes that are already providing the second layer of security.

What if you lose your YubiKey/smart phone? Most services that allow you to protect your account using YubiCloud will let you register more than one YubiKey against it. I’ve done this with a few of my services and then kept a spare YubiKey at home (my girlfriend Stacey can then use the spare to authenticate against my stuff if she needs to). If you lose your smart phone, and therefore the secrets behind your time-based one-time passwords, then you could be in trouble. Dropbox will allow you to recover your account using an ‘emergency code’, which is given to you when you enable two-factor authentication. Gaining access to my Gmail wouldn’t be an issue, as long as I hadn’t lost both my YubiKeys. If I had, then I guess Google have some stringent way of checking you are the owner of your account; I don’t actually know how they verify this but I do know it’s a manual process and can take up to 5 days. I also take a backup of my KeePass database file and store it in other places online; it really doesn’t matter where I store it as it’s encrypted.

After you have set up a password manager, YubiKey etc, test to see if you can gain access to your own accounts like a hacker would. Test your recovery options. If you see one there that you don’t like the look of, remove it. Whilst writing this post I tried to do a password reset on my Gmail account and noticed an old recovery question that I set years ago and didn’t even know was there; I’ve now removed this. Of course this doesn’t stop you becoming Mat Honan and falling victim to a company’s poor security policies. This poor guy had his MacBook, iPad & iPhone all remotely wiped and, worst of all, he didn’t have a backup of any of his data. The hackers used social engineering to have Amazon hand over the last 4 digits of his credit card, and Apple then accepted those last 4 digits as authorisation to issue a password reset on his iCloud account.

Update: I recently discovered a Chrome plugin called One Last Pass that generates passwords and auto-fills password fields on websites based on a master password. Nothing is stored with the developers of the plugin and you can tell the plugin to remember your master key so you’re not typing it in all the time.

YubiKey and Mac OS X FileVault

If there’s one thing I seem to be obsessed with lately, it’s security. I’ve never had any of my accounts hacked but that doesn’t stop me. It happens all the time though, and working in IT I should know how to prevent it. If you haven’t heard about YubiKey and you want to prevent your accounts getting hacked then you should check it out. It’s basically a USB key that anything you plug it into will see as a keyboard; if the application you’re authenticating against supports it then it can generate one-time passwords for you, so even keyloggers can’t catch you out. It works with the likes of Gmail and Dropbox’s two-factor authentication, and more and more services are starting to add it as a security feature.


YubiKey allows you to use one of its memory banks for a 32 character static password for services and devices that don’t support two-factor authentication, such as Mac OS X FileVault. When you enable FileVault it requires you to enter your password pre-boot so that it can decrypt your stuff, similar to TrueCrypt and BitLocker for Windows. So I set up a 32 character randomly generated password on my YubiKey and then set this as my OS X login password. I rebooted, and here’s where the issue occurs: I press the button on my YubiKey to enter the password and OS X tells me it’s the wrong password.

Apparently there is a bug in the EFI firmware that doesn’t cope with the speed at which the YubiKey enters the password and causes it to drop a few characters.

There is a workaround for this.

Using the YubiKey Personalization Tool that you programmed your YubiKey with, you can set the Output Character Rate, which is the speed at which the YubiKey types the characters. There are three increments in speed. I found that a 40ms delay works great; 20ms, however, suffered from the same issue.

So there it is: if you’ve programmed your YubiKey with a static password and you’re having problems authenticating at boot after enabling OS X FileVault, then set your Output Character Rate to a 40ms delay. Voilà!
