Digital image matching, part 1: Hashing

Posted by Artem Popov on September 5, 2014

One particularly interesting feature planned for Elog.io is matching images found online against the catalog, which makes it possible to retrieve metadata by comparing image contents to the records in our database.

This, for instance, allows looking up metadata instantly from the browser when we see an image on the web. After some exploration of image matching, described in Jonas’s earlier post, we came up with an image hashing algorithm that produces satisfactory image “fingerprints” usable for approximate matching of two images and for fuzzy searching.

The hashing algorithm is based on the Block Mean Value algorithm [1]. Essentially, it divides an image into NxN blocks and calculates the sum of the RGBA values for every block. Each block sum is then compared to the median of the block sums, and depending on whether it is greater or less than the median, a 0 or 1 bit is added to the hash.
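The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the code from our packages: the names `block_hash` and `bits_to_hex` are made up for the example, the image is assumed to be a plain list of rows of (R, G, B, A) tuples whose width and height are divisible by N, and the choice of emitting 1 for a block above the median (and 0 otherwise) is an assumption. Block overlapping and weighting are left out.

```python
from statistics import median


def block_hash(pixels, n):
    """Hash an image given as a list of rows of (r, g, b, a) tuples.

    Returns a list of n*n bits, one per block, in row-major order.
    """
    height = len(pixels)
    width = len(pixels[0])
    if height % n or width % n:
        # Simplification for this sketch: no partial or overlapping blocks.
        raise ValueError("image dimensions must be divisible by n")
    block_h, block_w = height // n, width // n

    # Sum of all RGBA channel values inside each block.
    sums = []
    for by in range(n):
        for bx in range(n):
            total = 0
            for y in range(by * block_h, (by + 1) * block_h):
                for x in range(bx * block_w, (bx + 1) * block_w):
                    total += sum(pixels[y][x])
            sums.append(total)

    # Compare every block sum to the median of all block sums.
    # Assumption: 1 means "above the median", 0 means "at or below".
    med = median(sums)
    return [1 if s > med else 0 for s in sums]


def bits_to_hex(bits):
    """Pack a list of bits into a zero-padded hexadecimal string."""
    width = (len(bits) + 3) // 4
    return format(int("".join(map(str, bits)), 2), "0{}x".format(width))


if __name__ == "__main__":
    white = (255, 255, 255, 255)
    black = (0, 0, 0, 255)
    # 4x4 image: left half white, right half black.
    image = [[white, white, black, black] for _ in range(4)]
    print(bits_to_hex(block_hash(image, 2)))  # prints "a" (bits 1, 0, 1, 0)
```

Using the median rather than a fixed threshold means that roughly half of the bits end up set regardless of how bright or dark the image is overall.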

Block Median Value algorithm source image and resulting hash. Hashed image: The Abduction of Europa by Rembrandt Harmensz. van Rijn. Digital image courtesy of the Getty’s Open Content Program.

Further refinements include overlapping the blocks and distributing pixel values across neighboring blocks by weight, both aimed at increasing accuracy during the search stage.

So far we have two implementations of the algorithm: JavaScript and Python (command-line). Usage instructions can be found in the package README files. The Python implementation is suitable for offline, i.e. server-side, generation of hashes, and the JavaScript module can be used for calculating hashes right in the browser. We intend to use the latter for querying the Elog.io catalog by hash, and we’ll talk more about the search algorithm in one of the forthcoming posts.
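To give a rough idea of what approximate matching on such hashes can look like, here is a minimal sketch that compares two hashes by counting differing bits (the Hamming distance). This is only an illustration and not the search algorithm itself: the metric, the hexadecimal string format, the function names and the default threshold of 10 bits are all assumptions made for the example.

```python
def hamming_distance(hash_a, hash_b):
    """Count the bits that differ between two equal-length hex hashes."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    # XOR leaves a 1 bit wherever the two hashes disagree.
    return bin(int(hash_a, 16) ^ int(hash_b, 16)).count("1")


def is_probable_match(hash_a, hash_b, max_distance=10):
    """Treat two hashes as a match if at most max_distance bits differ.

    The default threshold is a hypothetical value chosen for illustration.
    """
    return hamming_distance(hash_a, hash_b) <= max_distance


if __name__ == "__main__":
    print(hamming_distance("ff00", "ff01"))  # prints 1
    print(is_probable_match("ff00", "ff01"))  # prints True
```

A small distance suggests that the two images are visually similar, for example the same picture after resizing or recompression, while unrelated images are expected to differ in roughly half of their bits.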

[1] Yang, Bian, Fan Gu, and Xiamu Niu. “Block mean value based image perceptual hashing.” International Conference on Intelligent Information Hiding and Multimedia Signal Processing (IIH-MSP’06). IEEE, 2006.
