Sven Utcke's Software Page

Latest Articles

Face Detection and Recognition for Linux

Fixed a bug in the "Face" suite of tools for face detection where the export to F-Spot could not handle tag names that were already in use. Besides F-Spot, the tools can also export to DigiKam (although not in the full 2.x format) and to the files' EXIF data. Please let me know which other exporters you would like to see: what is your favourite photo manager?

What is it?

Screenshots

How does it work?

This is currently implemented as a suite of little scripts, which essentially do the following:

face-find:

Searches for frontal views of faces in a given list of files or directories (optionally descending recursively into all subdirectories). This step is also called face detection. It uses OpenCV's Haar cascade classifier as its back end, which implements a variant of the face-detection algorithm known, after its authors, as the Viola-Jones algorithm.
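The Viola-Jones detector builds its cascade from Haar-like features: differences between pixel sums over adjacent rectangles, each of which can be evaluated in constant time from an integral image. The plain-Python sketch below shows only that building block. It is not face-find's code (face-find delegates detection to OpenCV), the boosted cascade that selects and thresholds thousands of such features is omitted, and the function names are illustrative.

```python
def integral_image(img):
    """Summed-area table with an extra zero row and column:
    ii[y][x] is the sum of img over all rows < y and columns < x."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            # sum of the rows above plus the running sum of the current row
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the w-by-h rectangle with top-left corner (x, y), in O(1)."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii, x, y, w, h):
    """Horizontal two-rectangle Haar-like feature: left half minus right half.
    w is assumed to be even."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)
```

Because every rectangle sum costs four lookups regardless of its size, a detector can afford to evaluate many features at every position and scale of the image.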

Note that while this approach will find most frontal faces in the images, it will also find a number of non-faces. How many depends on your particular set of images: while there might be only 10% to 20% false positives in image sets that mostly contain people (group photos, say), you might see up to 80% or 90% if most of your images do not contain any persons at all.

face-learn:

face-sort:

These two tools make up the face-recognition part (the only approach currently implemented). face-learn essentially performs a PCA on all the face candidates found by face-find, calculating a set of so-called eigenfaces. face-sort then calculates distances in the space spanned by those eigenfaces and uses them to find images that "look similar to this image" or "look similar to images with this tag", and to label images either as non-faces or with an appropriate tag.
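The eigenface idea can be sketched with NumPy as below. This is a minimal illustration, not face-learn's or face-sort's actual code: it assumes all face candidates have been scaled to the same size and flattened into rows, computes the PCA via an SVD of the centred data (equivalent to diagonalising the covariance matrix), and ranks faces by Euclidean distance in eigenface space. The rule used for "similar to images with this tag" (distance to the nearest tagged face) is an assumption; the page does not say how face-sort combines several tagged images. All function names are hypothetical.

```python
import numpy as np


def learn_eigenfaces(faces, k):
    """PCA on face candidates (hypothetical stand-in for face-learn).

    faces: (n, d) array, one flattened, equally sized grey-scale face per row.
    Returns the mean face and the top-k eigenfaces as rows of a (k, d) array.
    """
    mean = faces.mean(axis=0)
    centered = faces - mean
    # Right singular vectors of the centred data are the covariance
    # eigenvectors, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:k]


def project(faces, mean, eigenfaces):
    """Coordinates of each face in the space spanned by the eigenfaces."""
    return (faces - mean) @ eigenfaces.T


def rank_by_similarity(coords, query_index):
    """All face indices, most similar to face query_index first."""
    dists = np.linalg.norm(coords - coords[query_index], axis=1)
    return np.argsort(dists, kind="stable")


def rank_by_tag(coords, tagged_indices):
    """Rank faces by distance to the nearest face carrying a tag (assumed rule)."""
    tagged = coords[tagged_indices]
    # (n, 1, k) - (1, m, k) -> pairwise distances of shape (n, m)
    dists = np.linalg.norm(coords[:, None, :] - tagged[None, :, :], axis=2)
    return np.argsort(dists.min(axis=1), kind="stable")
```

Labelling then amounts to walking down such a ranking and confirming or rejecting each candidate, which is also why removing the non-faces and retraining helps: they no longer dominate the directions found by the PCA.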

face2f-spot:

Once some (or all) images have been labelled, this can be used to export these labels to F-Spot.

check_dups:

One of the side effects of face-find is that it calculates md5sums of the actual image content (i.e., the image without headers). These can be used to find (and weed out) duplicate images whose content is identical but whose files differ, duplicates which are otherwise hard to find.
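Grouping files by a digest of their decoded pixel data can be sketched as follows. The page does not say how face-find decodes images. The function below therefore takes the raw pixel bytes as input; one way to obtain them, assumed here and not confirmed by the source, is Pillow's `Image.open(path).tobytes()`. Function names are illustrative.

```python
import hashlib
from collections import defaultdict


def content_md5(pixel_bytes):
    """MD5 of the decoded image data only, so identical pictures stored
    with different headers (EXIF, comments, ...) get the same digest."""
    return hashlib.md5(pixel_bytes).hexdigest()


def find_duplicates(pixels_by_file):
    """pixels_by_file: mapping from file name to raw pixel bytes.
    Returns sorted groups of files whose pixel content is identical."""
    groups = defaultdict(list)
    for name, data in pixels_by_file.items():
        groups[content_md5(data)].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]
```

Note that this only catches bit-identical content: a picture that was re-encoded or resized decodes to different pixels and will not be reported.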

The way this is currently used is:

1. Run face-find on all your images so far. This will take a long time!
2. Run face-learn. This will learn the best representation of your candidate faces, most of which unfortunately aren't faces at all.
3. Run face-sort. The really important thing to do here is to get rid of as many non-face images as possible, but you probably will not be able to resist labelling quite a few faces anyway (even though I am telling you now that this will work even better once you have got rid of the non-faces and retrained).
4. Retrain by running face-learn once more.
5. Label your faces in face-sort. In my experience, the best way to do this is to click on a face, label a few faces, then "sort by tag", label a few more faces, click on a face again, and so on, i.e. iterate between "sort by thumbnail" and "sort by tag".
6. Repeat steps 4 and 5 as needed.
7. Once new images are added, repeat step 1 (no image will be added twice, so it is safe to recurse into a directory containing both old and new images) and step 5 (this time using only "sort by tag").
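Re-running face-find on a directory that holds both old and new images works because no image is added twice. The page does not say how face-find remembers what it has already processed. The sketch below assumes a set of absolute paths (a set of content md5sums would work the same way) and uses a hypothetical list of image extensions; it only illustrates collecting files from a list of files and directories, optionally recursively, and skipping known ones.

```python
import os

# Assumed set of extensions; face-find's actual file filter is not documented here.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def is_image(name):
    return os.path.splitext(name)[1].lower() in IMAGE_EXTENSIONS


def collect_images(paths, recursive=False):
    """Yield image files from a list of files and directories; descend into
    subdirectories only if recursive is True. Named files are always yielded."""
    for path in paths:
        if os.path.isfile(path):
            yield path
        elif os.path.isdir(path):
            if recursive:
                for root, dirs, files in os.walk(path):
                    dirs.sort()  # deterministic traversal order
                    for name in sorted(files):
                        if is_image(name):
                            yield os.path.join(root, name)
            else:
                for name in sorted(os.listdir(path)):
                    full = os.path.join(path, name)
                    if os.path.isfile(full) and is_image(name):
                        yield full


def new_images(paths, already_seen, recursive=True):
    """Return only files not processed before and record them in
    already_seen (a set of absolute paths, updated in place)."""
    fresh = []
    for path in collect_images(paths, recursive):
        key = os.path.abspath(path)
        if key not in already_seen:
            already_seen.add(key)
            fresh.append(path)
    return fresh
```

Persisting `already_seen` between runs (for example in the same store as the detection results) is what would make repeating the first step cheap for images that were already processed.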