Article

A Black Box Comparison of Machine Learning Reverse Image Search for Cybersecurity OSINT Applications

Esther Nanjala Wekesa 1, Casimer DeCusatis 1,* and Andy Zhu 2

1 School of Computer Science and Mathematics, Marist College, Poughkeepsie, NY 12601, USA
2 Spackenkill High School, Poughkeepsie, NY 12603, USA
* Author to whom correspondence should be addressed.
Electronics 2023, 12(23), 4822; https://doi.org/10.3390/electronics12234822
Submission received: 17 October 2023 / Revised: 20 November 2023 / Accepted: 27 November 2023 / Published: 29 November 2023
(This article belongs to the Section Artificial Intelligence)

Abstract

Machine learning algorithms for reverse image search (a subset of open source intelligence, or OSINT) provide a free, useful tool for determining the content of an image, where and when it was captured, and, in some cases, whether it has been digitally modified. Using a test data set of 24 images, we compared the performance of reverse image search for Google, Bing, and Yandex. Our black box experimental results are presented for three different categories of images (uncluttered images, images with significant background clutter, and facial recognition). The total number of correct identifications was highest for Google (65%), followed by Bing (55%) and Yandex (50%). Google was also the best at identifying cluttered, uncluttered, and facial images. We compare these results with previous studies and review how relative performance has changed over time. Accurate recognition rates for all reverse search platforms tested were higher for original images not previously uploaded than for images used in earlier studies. We validate our results using exchangeable image file format (EXIF) data and error level analysis (ELA) of selected images. Based on these results, best practices for OSINT image investigation are proposed.

1. Introduction

Open source intelligence (OSINT) refers to the use of publicly accessible information for cybersecurity analysis. Open source intelligence collection and analysis techniques are based on the OSINT framework [1] and are increasingly useful in a world where more and more information is added to the internet every day. With billions of internet users sharing information on themselves, their organizations, and people and events they have knowledge of, the internet is a resource-rich environment for intelligence collection and operational security (OPSEC). There are a number of OSINT cybersecurity training programs available [2]. This is a broad field which has applications in cyberwarfare, computer forensics, and related disciplines. The approach can be so effective that some OSINT tools have been banned in certain parts of the world.
Reverse image search is a subset of OSINT which allows cybersecurity investigators to identify people, locations, and objects as well as to assess the veracity of an image. Reverse image search is widely available as a feature on many search engines using machine learning algorithms. Results from these tools can vary widely, and in some cases may even be highly biased or influenced by corrupt training data, which is very hard to detect. The model with the largest training data set will not necessarily be the most accurate, since different algorithms will yield significantly different results. A possible defense against this is to compare results from different models, rather than rely on a single source [3,4]. Since the algorithms also change over time and may include vendor proprietary extensions, it is necessary to conduct repeated controlled black box studies to assess the ongoing behavior of machine learning for reverse image search. This is not unique to our testing; machine learning is typically studied as a black box in cybersecurity, despite concerns such as the identification of spurious correlations that may impact generalizing conclusions [5]. Spurious correlations result from artifacts that correlate with the task that a machine learning system is trying to address but are not actually related to that task. The learning model may adapt to these artifacts instead of addressing the intended problem, leading to false associations. Sampling bias is a common cause of spurious correlations. When dealing with black box approaches to machine learning, depending on the objectives of the learning-based system, spurious correlations in one context may be considered a valid signal in a different context. A 2019 study of reverse image search for digital investigations [6] used a set of reference images to compare performance of several major reverse image search algorithms. This work notes that a relatively small number of images (10–15 or so) is sufficient to demonstrate significant differences in results. Further, this work points out a difficulty in comparing different machine learning algorithms, since the test images used in a given study become part of the training data set and may bias any subsequent studies.
In this paper, we compare the reverse image recognition capabilities of several different machine learning tools against a set of reference images and make recommendations about which approach is the most effective for different types of images. It is important to compare the results obtained from different reverse image search engines in order to assess their relative accuracy when evaluating different types of input images and to establish a level of consistency across results from multiple search engines. Not all search engines provide reverse image identification; for example, DuckDuckGo does not offer native reverse image search at this time. We selected three of the largest reverse image search engines for comparison (Google, Bing, and Yandex). Since these engines have been studied before [6], we can also compare our results with previous findings. We further note that some engines are popular for fraud detection and facial recognition but cannot be used to reliably identify objects or locations. We considered two popular fraud detection systems, TinEye and PimEyes, which are used to locate copies of images online for plagiarism and copyright enforcement. However, since they were unable to identify image content, these two engines were excluded from the rest of our study. Each of the search engines we used has pros and cons. Google is by far the most popular and widely used; however, this does not necessarily mean that it has the best image search capabilities. Both Google and Bing are owned by private companies which collect data and track user preferences as part of their business model, raising concerns about privacy [7]. Yandex may also engage in data harvesting and, as a Russian company, raises additional concerns [8]; it has also recently been impacted by the Russian war against Ukraine [9].
The remainder of this paper is organized as follows. After the introduction, we describe the framework for our study and the classification of our test images. We then present experimental results from Google, Bing, and Yandex, tested using five different compute platforms to remove any bias associated with the operating system or browser level. Results are assessed and compared with other sources of image verification data, such as the exchangeable image file format (EXIF) content and results of error level analysis (ELA).

2. Materials and Methods

Many of the leading search engines have not published the proprietary details of their machine learning algorithms. This has raised concerns about bias in these algorithms, which has led to recent legislation forcing companies to publish certain details about their algorithms [10]. Each of these systems has indexed tens of billions of images, with thousands of key features for each image, as its training data set, and yet is able to respond to a reverse image search query in a few hundred milliseconds or less. Further, both the training data sets and algorithms are being updated continuously. This makes it very difficult to describe the relationship between the algorithms and observed performance with any degree of certainty. In order to provide some context for our black box testing, we provide a brief, high-level review of the available information on machine learning and image search algorithms.
Traditional systems for image retrieval from a large database use concept-based image indexing, which relies on natural language text, keywords, headings, and similar text-based indexing. This is also known as description-based indexing. In contrast, content-based image retrieval (CBIR) refers to analyzing the contents of an image rather than descriptive metadata associated with the image. The content-based methods are more desirable, since the accuracy of concept-based techniques depends on the quality and completeness of the metadata and thus can be much lower. Google, Bing, and Yandex all use some form of content-based image retrieval.
Concept-based methods require human operators to annotate each image, which is error prone and may be impractical for large data sets or for cases where images are generated automatically in near real time (such as video streaming). Miscataloging of images is fairly common, since the metadata might use different synonyms for the same object, or might classify images according to different naming conventions. For example, an image of a car might be classified as a vehicle, automobile, or by its specific brand and model year. Translation errors from different languages can also impair the accuracy of these systems. Many concept-based image retrieval systems suffer from these issues.
Image retrieval based on syntactical features inherent to the image, such as color or shape, avoids the pitfalls associated with concept-based retrieval systems [11]. For image identification or comparison, this approach uses both global features (such as the color histogram of the entire image) and local features (visual structures described by a small group of pixels). However, such systems can pose their own unique problems. A query for a concept-based system will rely on traditional database structured query language methods, whereas there are many different ways to format a query for content-based systems. A common method used in reverse image search is to provide the system with an approximation of the desired image. The system will return results that share common elements with the provided reference image. This avoids the problems associated with having to describe an image verbally. Common methods for comparing two images with CBIR include image distance measures, which compare the two images in various dimensions (color, shape, texture, etc.). The image distance is smaller when two images are more alike; an image distance of zero represents a perfect match. Some properties, such as color, are independent of the size or orientation of the target image. For example, it is possible to construct a histogram that identifies the proportion of pixels in an image which contain a given color. Many different methods for measuring image distance have been proposed (so-called similarity models), and most commercially available reverse image search engines do not specify which combination of proprietary image distances they employ (hence the need for black box testing as in this paper). The sketch below illustrates one such measure.
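As an illustration of a color-based image distance, the following minimal sketch (using the OpenCV library, with hypothetical file names) compares two images by their normalized color histograms; the commercial engines' actual similarity models are proprietary and almost certainly more sophisticated.

import cv2

def histogram_distance(path_a, path_b):
    # Compute a normalized 3D color histogram (8 bins per channel) for each image
    hists = []
    for path in (path_a, path_b):
        img = cv2.imread(path)  # loads the image as a BGR pixel array
        h = cv2.calcHist([img], [0, 1, 2], None, [8, 8, 8],
                         [0, 256, 0, 256, 0, 256])
        cv2.normalize(h, h)  # normalization makes the comparison size-independent
        hists.append(h)
    # Bhattacharyya distance: 0.0 for a perfect match, larger as images differ
    return cv2.compareHist(hists[0], hists[1], cv2.HISTCMP_BHATTACHARYYA)

# Example: histogram_distance("query.jpg", "candidate.jpg")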
Machine learning techniques have become more common in CBIR [12]. This is often combined with a fingerprinting algorithm that extracts select features of the image to form a unique fingerprint for comparison with other images. Examples include perceptual hashing (pHash), which is used to compare images without regard to scale, aspect ratio, or minor coloring differences [13]. Another variation is Google’s locality-sensitive hashing (LSH) algorithm, used with approximate nearest neighbor (ANN) search [14]. The LSH algorithm hashes similar input items into the same buckets, with the number of buckets being much smaller than the number of possible inputs. This differs from conventional hash techniques in that the number of hash collisions is maximized rather than minimized. It is a form of dimensionality reduction which preserves the relative distance between items in the search space; thus, LSH is useful for data clustering and nearest neighbor searches. It is an example of a data-independent hashing algorithm. Smaller image features such as textons (clusters as small as 5 × 5 pixels) can be evaluated for texture histograms to improve accuracy [15]. A simplified fingerprinting example follows.
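To make the fingerprinting idea concrete, here is a self-contained average-hash sketch using the Pillow imaging library. This is a simpler cousin of pHash (which uses a DCT-based fingerprint); it is illustrative only and is not any engine's production algorithm.

from PIL import Image

def average_hash(path, size=8):
    # Reduce the image to an 8x8 grayscale thumbnail, discarding scale and color
    img = Image.open(path).convert("L").resize((size, size))
    pixels = list(img.getdata())
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:  # one bit per pixel: brighter or darker than the mean
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits  # a 64-bit fingerprint

def hamming_distance(h1, h2):
    # Few differing bits suggests the two images are near-duplicates
    return bin(h1 ^ h2).count("1")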
It is also generally acknowledged that most machine learning algorithms for reverse image search likely employ algorithms such as feature extraction, similarity search, or dimensional reduction [16]. One approach to simplify the image recognition problem involves reducing the number of features in an image to a minimal set that still allows accurate reverse image search. Dimensionality reduction is the general term for the process of reducing the dimensions of an image feature set, also known as feature elimination or feature extraction. For example, a popular method of dimensionality reduction involves remapping higher dimensional data into a lower dimensional space (such as reducing 3D data to 2D data). Such techniques, known as principal component analysis (PCA), preserve the original properties of the data set while reducing the computational complexity of image recognition. There is a significant body of work on topological mapping of images, such as the common assumption that higher dimensionality data sets lie along lower dimensional manifolds. Such methods are often employed for machine learning and specialized deep learning algorithms, such as those we are testing for reverse image search.
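A brief sketch of PCA applied to image feature vectors, using scikit-learn with synthetic data (the feature dimensions here are arbitrary assumptions, not any engine's actual representation):

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(seed=0)
features = rng.random((100, 512))  # e.g., 100 images, each a 512-D feature vector

pca = PCA(n_components=32)             # project onto the top 32 principal components
reduced = pca.fit_transform(features)  # resulting shape: (100, 32)

# Fraction of the original variance retained by the reduced representation
print(pca.explained_variance_ratio_.sum())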
Many algorithms employ scale-invariant feature transforms to extract local features of an image [17], or machine vision techniques such as maximally stable extremal regions [18]. Specifically, Google analyzes the input image data and metadata, constructs a mathematical model of the image, then uses machine learning to facilitate image recognition. Google integrated its Google Lens product, which originally performed reverse image searches on the Google Pixel camera, into the Chrome reverse image search in 2022.
Google is known to use a number of additional approaches, including the scale invariant feature transform (SIFT), histograms of oriented gradients (HOG), the generalized search tree (GIST), and speeded up robust features (SURF). Several of these approaches can be used to pre-condition training data for machine learning. The scale invariant feature transform (SIFT) is an algorithm designed to detect and match local features in an image. The SIFT algorithm is used to find local image features known as key points, which are scale and rotation invariant [19]. A database stores key reference points of an image, and when a reverse image search is performed, objects are recognized in the new image by comparing them with features in the database and computing the Euclidean distance between their feature vectors. The SIFT algorithm is invariant to image orientation, uniform scaling, and changes in illumination; it is robust at identifying images even in the presence of clutter or partial occlusion. A related algorithm, which runs somewhat faster but may not be as accurate in all cases, is speeded-up robust features (SURF), whose descriptors are based on the same principles as SIFT but differ in their method for determining key points of the image and describing local image neighborhoods. The SURF algorithm detects interesting points within the image scale space; it is suitable for larger files due to its faster execution time [20]. In both SIFT and SURF, the dimensionality of the image descriptors plays a significant role in computational complexity, robustness, and accuracy. SURF employs a form of wavelet transforms to achieve better computational speed and performance.

Another form of feature descriptor widely used for reverse image search is the histogram of oriented gradients (HOG). The HOG is a feature descriptor, like SIFT, which focuses on the shape of an object [21]. This technique counts occurrences of gradient orientation in localized parts of an image (similar to scale-invariant feature transform descriptors). It is computed using a dense, uniformly spaced grid overlay with overlapping local contrast normalization (which improves accuracy). The image is divided into uniform grid segments (either rectangular or radial in shape); the magnitude and angle of the gradient are computed at each pixel, and a histogram of gradient directions is compiled for each 8 × 8 pixel segment, called a cell. Groups of four neighboring cells are combined into overlapping blocks, and the normalized histograms of each block are concatenated to form a feature vector which can be used for image comparison. A concatenation of these local grid histograms forms the HOG image descriptor. Grouping the cells together in this manner ensures local normalization of the gradient strengths, thus making the transform invariant to changes in illumination or contrast. The descriptor is also robust to small local geometric transformations, since it operates on local grid elements.

While SIFT and SURF descriptors are usually computed at sparse, scale-invariant key points and rotated to align their orientations, HOG descriptors are computed in dense grids at a single scale, without rotational alignment. Further, HOG descriptors do not require image pre-processing (such as normalizing color values), which is often employed as a separate step for SIFT and SURF to improve accuracy. A short key point matching sketch follows.
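As an illustration of key-point-based matching, the following sketch (using OpenCV, with hypothetical file names) detects SIFT key points in two images and counts descriptor matches that pass Lowe's ratio test; Google's actual pipeline is, of course, proprietary.

import cv2

def sift_match_count(path_a, path_b, ratio=0.75):
    sift = cv2.SIFT_create()
    img_a = cv2.imread(path_a, cv2.IMREAD_GRAYSCALE)
    img_b = cv2.imread(path_b, cv2.IMREAD_GRAYSCALE)
    # Detect scale- and rotation-invariant key points; each gets a 128-D descriptor
    _, desc_a = sift.detectAndCompute(img_a, None)
    _, desc_b = sift.detectAndCompute(img_b, None)
    # Brute-force matcher using Euclidean distance between descriptor vectors
    matches = cv2.BFMatcher().knnMatch(desc_a, desc_b, k=2)
    # Lowe's ratio test: keep a match only if it clearly beats the runner-up
    return sum(1 for m, n in matches if m.distance < ratio * n.distance)

# A higher count indicates more shared local structure between the two images.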
The generalized search tree (GIST) algorithm is another way to provide enhanced indexing of images, using PostgreSQL and creating different types of self-balancing tree structures [22]. Both a data structure and an application programming interface, GIST provides a height-balanced search tree structure that can be used to implement a wide range of indexed, disk-based search trees. These can be used for any data type that can be ordered into a hierarchy of supersets and allow the use of any query predicates which may be convenient for the data set. GIST supports both lossy and lossless compression. It is a good example of software extensibility, supporting nearest neighbor search and a variety of statistical search approximation extensions. It has been implemented in the PostgreSQL relational database and many other database systems. Google may also use other proprietary approaches which have not yet been published. An indexing sketch is shown below.
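To illustrate, here is a minimal sketch (assuming a PostgreSQL server with the cube extension installed and the psycopg2 Python driver; the table and database names are hypothetical) that builds a GIST index over image feature vectors and runs an index-assisted nearest neighbor query:

import psycopg2

conn = psycopg2.connect("dbname=images")  # hypothetical database
cur = conn.cursor()
cur.execute("CREATE EXTENSION IF NOT EXISTS cube;")
cur.execute("CREATE TABLE IF NOT EXISTS img_features "
            "(id serial PRIMARY KEY, vec cube);")
# A GIST index enables ordered (nearest neighbor) scans over the feature vectors
cur.execute("CREATE INDEX IF NOT EXISTS img_vec_idx "
            "ON img_features USING GIST (vec);")
# '<->' is the cube Euclidean distance operator; the index accelerates this query
cur.execute("SELECT id FROM img_features "
            "ORDER BY vec <-> cube(%s::float8[]) LIMIT 5;",
            ([0.1, 0.2, 0.3],))
print(cur.fetchall())  # ids of the five most similar stored feature vectors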
Bing and Yandex are both known to use CBIR and fingerprinting. Bing generates a variety of features to describe image content, then uses a three-level identification and cascaded ranking system [23]. Yandex is known to use a facial recognition algorithm called FindClone (formerly known as SearchFace) [24]. For other types of image search, Yandex employs a machine learning algorithm called CatBoost [25]. Since most machine learning tools will not accept features unless they are converted into numerical form, CatBoost uses gradient-boosted decision trees to map non-numeric image features into a useful format, as sketched below.
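A toy example of CatBoost's handling of non-numeric (categorical) features, using the open source catboost Python library; the feature names and values here are hypothetical illustrations, not drawn from Yandex's actual system:

from catboost import CatBoostClassifier

# Hypothetical rows: dominant_color (categorical), scene_type (categorical),
# edge_density (numeric); labels mark a toy binary image category.
X = [["blue",  "outdoor", 0.42],
     ["green", "outdoor", 0.17],
     ["gray",  "indoor",  0.88],
     ["blue",  "indoor",  0.61]]
y = [1, 1, 0, 0]

# cat_features tells CatBoost which columns are categorical; it encodes them
# internally into numeric statistics, so no manual one-hot encoding is needed.
model = CatBoostClassifier(iterations=50, depth=3, verbose=False)
model.fit(X, y, cat_features=[0, 1])
print(model.predict([["blue", "outdoor", 0.40]]))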

3. Results and Discussion

We evaluated reverse image search results for Google, Bing, and Yandex using a set of 24 test images, as shown in Figure 1. The correct identification for each image and related image classification information are given in Table 1. The number of images was selected based on similar prior research [6] which indicated that this is a sufficiently large data set for comparisons between three different reverse image search approaches.
Test images represent the U.S., Eastern and Western Europe, Russia, and South America. As indicated in Table 1, images 1–8 were used in a prior 2019 study [6]; images labeled “original” were not previously uploaded anywhere on the Internet, and images labeled “stock photo” were taken from public Internet searches. There are also several different classifications of images, which were presented in a scrambled order to minimize any ordering bias from the machine learning algorithms [3]. Eight images show a single object/building and are thus classified as uncluttered images. Another eight images are cluttered to some degree with background images of people, cars, or skylines (they lack a single object focal point). The final eight images show people so that we can evaluate facial recognition, but not all of these are images of real people. Image 20 is a known fake image, artificially generated by a Russian troll farm [26]. Images 23 and 24 are also fake images, created using artificial intelligence, from ThisPersonDoesNotExist.com (accessed on 25 November 2023).
We ran five independent tests for each image on a given search engine. We used a simple rubric which scored one point for correctly identifying the image as compared with the results in Table 1, and zero points otherwise. Thus, a search engine could theoretically achieve a perfect score of 120 points for correctly identifying each image on every attempt. For images 20, 23, and 24, correct identification required not only identifying the person’s name but also recognizing that the image was a known fake. Results are shown in Figure 2. None of the reverse image searches tested were perfect. The algorithm with the most correct identifications was Google (78 out of 120 points, or 65%). The next best was Bing (55%), followed by Yandex (50%). While different data sets may yield somewhat different percentages, prior publications suggest that our data set is large enough to be representative [6]. These results differ from the 2019 study [6], which ranked Yandex highest and Google lowest; this is likely due to changes in the algorithms and training data over time. One image (#4) could not be identified by any search engine; this is the cluttered street scene from Sao Paulo, Brazil, which was previously part of the Bellingcat data set [6].
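The scoring arithmetic can be summarized in a few lines of Python; the Bing and Yandex point totals below are derived from the reported percentages (55% and 50% of 120 points), and the Google total is the 78/120 reported above:

runs_per_engine = 5
num_images = 24
max_score = runs_per_engine * num_images   # 120 possible points per engine

correct = {"Google": 78, "Bing": 66, "Yandex": 60}
for engine, score in correct.items():
    print(f"{engine}: {score}/{max_score} = {score / max_score:.0%}")
# Google: 78/120 = 65%, Bing: 66/120 = 55%, Yandex: 60/120 = 50%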
We further analyzed the performance of each machine learning system when identifying uncluttered images, cluttered images, and people. Results are summarized in Figure 3, which indicates the number of correctly identified images in each case. For our test data set, Google and Bing yielded similar results for cluttered and uncluttered images, while Yandex performed significantly worse. For identification of people, Google and Yandex yielded similar results, while Bing was significantly worse. In one notable exception, the Russian engine Yandex did not identify image 20, which originated from a Russian troll farm, as fake.
As noted previously, these results can change over time as the machine learning algorithms are updated or additional training data are collected. We are aware of several recent updates to the Google and Bing algorithms, such as the integration of Google Lens with Google reverse image search. We conducted our original testing in July 2022, then repeated the testing in October 2022. Both Google and Bing made significant improvements to their algorithms in that interval, as shown in Figure 4, while Yandex (not shown in this figure) remained unchanged. We note that all results discussed so far use the most recently available version of the reverse image search algorithms.
We recognize that each time a black box test is conducted, the results may affect subsequent studies, since the images used in a given study become part of the training data set for subsequent studies. To evaluate this effect, we retested images 1–8, which are taken from a 2019 study published by Bellingcat [6], and compared the results with images 9–19, which have not been previously uploaded. The results are shown in Figure 5. While we hypothesized that previously used images might be identified more accurately than original images, this was not found to be the case. Rather, all three engines did a better job identifying the images which had not been previously uploaded. As before, none of the engines was able to correctly identify all of the test images, regardless of whether or not they had been used previously.
For cybersecurity analysis, it is important not only to test an image with multiple machine learning systems but also to verify the results using other image analysis tools. The EXIF industry standard specifies formats for images and related metadata used by all digital camera manufacturers as well as by other systems handling images and sound files recorded by digital cameras. This standard is regularly updated to keep pace with advances in digital image hardware and software. We used the most recently available update (March 2023) in our work. The metadata tags defined in the EXIF standard cover a wide range of topics. These include digital camera settings (information such as the camera make and model, aperture, shutter speed, focal length, and more), image metrics (pixel dimensions, resolution, file size, and color space), date and time stamps, location stamps, thumbnail images for previewing pictures in file managers, copyright information, and more. This metadata can be compared with results from a reverse image search to corroborate the correct identification of an image. For example, EXIF metadata was available for images 9–16, as shown in the sample screen shot of Figure 6 (data were collected using the open source ExifTool [27]; all highlighted fields are editable).
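EXIF tags can also be read programmatically; the short sketch below uses the Pillow library (an alternative to the ExifTool utility cited above, with a hypothetical file name):

from PIL import Image
from PIL.ExifTags import TAGS

def dump_exif(path):
    # getexif() returns raw numeric tag ids; TAGS maps them to readable names
    exif = Image.open(path).getexif()
    return {TAGS.get(tag_id, tag_id): value for tag_id, value in exif.items()}

# Example: fields such as "Model" or "DateTime" can corroborate where and
# when a photo was captured, supporting a reverse image search result.
# print(dump_exif("image_09.jpg"))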
Image modifications can also be detected using error level analysis (ELA), a technique which highlights differences in JPEG compression rates. High contrast edges, textures, and surfaces can be used to assess whether an image has been digitally modified from its original content. We reviewed all 24 images using the ELA tool in the open source Fotoforensics toolkit [28]. A sample result for image 3 is shown in Figure 7, from which it can be determined that the person in the middle of the photo was digitally added to the original image.
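The core of ELA is straightforward to sketch in Python with Pillow: resave the JPEG at a known quality level, then amplify the per-pixel differences between the original and the resaved copy. Regions edited after the original save tend to recompress at different error levels. This is only a minimal approximation of the Fotoforensics tool used above, with a hypothetical file name.

from PIL import Image, ImageChops, ImageEnhance
import io

def error_level_analysis(path, quality=90, scale=15.0):
    original = Image.open(path).convert("RGB")
    buffer = io.BytesIO()
    original.save(buffer, "JPEG", quality=quality)  # recompress at a fixed quality
    buffer.seek(0)
    resaved = Image.open(buffer)
    diff = ImageChops.difference(original, resaved)  # per-pixel compression error
    return ImageEnhance.Brightness(diff).enhance(scale)  # amplify for inspection

# Example: error_level_analysis("image_03.jpg").show()
# Uniform noise suggests a single save; bright patches suggest local edits.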

4. Conclusions

Machine learning algorithms for reverse image search provide a free, useful tool for determining the content of an image, where and when it was captured, and, in some cases, whether it has been digitally modified. Using a test data set of 24 images, we compared the performance of reverse image search for Google, Bing, and Yandex (other reverse image search engines can locate copies of an image but are not designed to identify image content). Our black box testing results showed that Google currently offers the most accurate reverse image search results, followed by Bing and Yandex. This differs from a prior study published in 2019; we also determined that both Google and Bing have made significant improvements in the past six months. Despite this, none of the engines was able to achieve better than 65% accuracy, and we found at least one image which was not correctly identified by any of the engines. Counterintuitively, we found that previously used test images were not easier to identify than images which had never been previously uploaded. For our test data set, Google and Bing yielded similar results for cluttered and uncluttered images, while Yandex performed significantly worse. For identification of people, Google and Yandex yielded similar results, while Bing was significantly worse; there were also issues identifying several known fake images of people. Our results suggest that while Google currently has the highest percentage of correct image identifications, no single engine should be relied upon exclusively at this time. As a best practice, OSINT images should be analyzed using multiple reverse image search algorithms to achieve the highest possible accuracy. Further, the veracity of images should be validated using other tools, such as EXIF data (when available) or ELA analysis. Additional studies are planned to assess future changes in the algorithms and improvements in training data sets.

Author Contributions

Conceptualization and methodology, C.D.; validation and formal analysis, C.D., E.N.W. and A.Z.; investigation and data curation, E.N.W. and A.Z.; writing—original draft preparation, C.D., E.N.W. and A.Z.; writing—review and editing, visualization, supervision, project administration, C.D. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Our data set will be made available on the Marist Innovation Lab public GitHub site: https://github.com/Marist-Innovation-Lab (accessed on 25 November 2023).

Acknowledgments

We acknowledge the support of Andreas Ramdas and Jack Mullane for collecting some of the raw data for the image comparison work. We also acknowledge the support of Jennifer Maloney and Christine Upright of Spackenkill High School for facilitating Andy’s student mentorship.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Nordine, J. OSINT Open Source Framework. Available online: https://osintframework.com/ (accessed on 20 December 2022).
  2. The SANS Institute OSINT Training Course. Available online: https://www.sans.org/cyber-security-courses/advanced-open-source-intelligence-gathering-analysis/ (accessed on 20 December 2022).
  3. Doctorow, C. Backdooring a Summarizerbot to Shape Opinion. Available online: https://pluralistic.net/2022/10/21/let-me-summarize/#i-read-the-abstract (accessed on 20 December 2022).
  4. Bagdasaryan, E.; Shmatikov, V. Spinning Language Models: Risks of Propaganda-as-a-Service and Countermeasures. 8 April 2022. Available online: https://arxiv.org/pdf/2112.05224.pdf (accessed on 20 December 2022).
  5. Arp, D.; Quiring, E.; Pendlebury, F.; Warnecke, A.; Pierazzi, F.; Wressnegger, C.; Cavallaro, L.; Rieck, K. Dos and Don’ts of Machine Learning in Computer Security. In Proceedings of the USENIX Security Symposium 2022, Boston, MA, USA, 10–12 August 2022; Available online: https://arxiv.org/abs/2010.09470 (accessed on 20 December 2022).
  6. Toler, A. Guide to Using Reverse Image Search for Investigations. 2019. Available online: https://www.bellingcat.com/resources/how-tos/2019/12/26/guide-to-using-reverse-image-search-for-investigations/ (accessed on 20 December 2022).
  7. Nield, D. All the Ways Google Tracks You and How to Stop It. Wired, 27 May 2019. Available online: https://www.wired.com/story/google-tracks-you-privacy/ (accessed on 20 December 2022).
  8. McGee, P. Data Harvesting Code in Mobile Apps Sends Users to Russia’s Google. Financial Times/Ars Technica, 29 March 2022. Available online: https://arstechnica.com/information-technology/2022/03/data-harvesting-code-in-mobile-apps-sends-user-data-to-russias-google/?amp=1 (accessed on 20 December 2022).
  9. Starobin, P. When War Came for Russia’s Biggest Tech Company. Wired, 22 March 2022. Available online: https://www.wired.com/story/yandex-arkady-volozh-russia-largest-tech-company/ (accessed on 20 December 2022).
  10. Vincent, J. Google, Meta, and Others Will Have to Explain Their Algorithms under EU Legislation. The Verge. Available online: https://www.theverge.com/2022/4/23/23036976/eu-digital-services-act-finalized-algorithms-targeted-advertising (accessed on 20 December 2022).
  11. Lew, M.; Sebe, N.; Djeraba, C.; Jain, R. Content-Based Multimedia Information Retrieval: State of the Art and Challenges. ACM Trans. Multimedia Comput. Commun. Appl. 2006. Available online: http://www.ugmode.com/prior_art/lew2006cbm.pdf (accessed on 20 December 2022).
  12. Cardoso, D.N.M.; Muller, D.J.; Alexandre, F.; Neves, L.A.P.; Trevisani, P.M.G.; Giraldi, G.A. Iterative Techniques for Content-Based Image Retrieval Using Multiple SVM Ensembles; Federal University of Parana: Curitiba, Brazil, 2013.
  13. Klinger, E.; Starkweather, D. pHash Open Source Perceptual Hash Library. Available online: www.phash.org (accessed on 20 December 2022).
  14. Andoni, A.; Indyk, P. Near optimal hashing algorithms for approximate nearest neighbor in higher dimensions. Commun. ACM 2008, 51, 117–122.
  15. Zhu, S.C.; Guo, C.; Wu, Y.N.; Wang, Y. What Are Textons? In Proceedings of the European Conference on Computer Vision, Copenhagen, Denmark, 2002; Available online: https://escholarship.org/uc/item/5pk7v8c5 (accessed on 20 December 2022).
  16. Koul, A.; Ganju, S.; Kasam, M. Practical Deep Learning for Mobile, Cloud, and Edge; O’Reilly: New York, NY, USA, 2019; Chapter 4.
  17. Wang, Z.; Mei, Y.; Yan, F. A New Web Image Search Engine by Using SIFT Algorithm. In Proceedings of the 2009 International Conference on Web Information Systems and Mining, Shanghai, China, 7–8 November 2009; Available online: https://www.computer.org/csdl/proceedingsarticle/wism/2009/3817a366/12OmNzd7bLg (accessed on 20 December 2022).
  18. Matas, J.; Chum, O.; Urban, M.; Pajdla, T. Robust wide-baseline stereo from maximally stable extremal regions. Image Vis. Comput. 2002, 22, 761–767.
  19. Lowe, D.G. Object recognition from local scale invariant features. In Proceedings of the IEEE International Conference on Computer Vision, Kerkyra, Greece, 20–27 September 1999; Volume 2, pp. 1150–1157. Available online: https://ieeexplore.ieee.org/document/790410 (accessed on 20 December 2022).
  20. Bay, H.; Ess, A.; Tuytelaars, T.; Gool, L.V. Speeded-up Robust Features (SURF). J. Comput. Vis. Image Underst. 2008, 110, 346–359. Available online: https://www.tarjomefa.com/wpcontent/uploads/2016/09/5349English.pdf (accessed on 20 December 2022).
  21. Dalal, N.; Triggs, B. Histograms of Oriented Gradients for Human Detection. Available online: http://lear.inrialpes.fr/people/triggs/pubs/Dalal-cvpr05.pdf (accessed on 20 December 2022).
  22. Hellerstein, J.; Naughton, J.; Pfeffer, A. Generalized search trees for database systems. In Proceedings of the 21st VLDB Conference, Zurich, Switzerland, 11–15 September 1995; Available online: https://pages.cs.wisc.edu/~nil/764/Relat/8_vldb95-gist.pdf (accessed on 20 December 2022).
  23. Hu, H.; Wang, Y.; Yang, L.; Komlev, P.; Huang, L.; Chen, X.; Huang, J.; Wu, Y.; Merchant, M.; Sacheti, A. Web Scale Responsive Visual Search at Bing. Available online: https://dl.acm.org/doi/pdf/10.1145/3219819.3219843 (accessed on 20 December 2022).
  24. FindClone Mobile App. Available online: https://findclone.ru/ (accessed on 20 December 2022).
  25. Catboost Library. Available online: https://catboost.ai/en/docs/ (accessed on 20 December 2022).
  26. Collins, B.; Kent, J.L. Facebook, Twitter Remove Disinformation Accounts Targeting Ukrainians. NBC News, 28 February 2022. Available online: https://www.nbcnews.com/tech/internet/facebook-twitter-remove-disinformation-accounts-targeting-ukrainians-rcna17880 (accessed on 20 December 2022).
  27. Harvey, P. EXIF Tool. Available online: https://exiftool.org/ (accessed on 20 December 2022).
  28. FotoForensics. Available online: https://fotoforensics.com/ (accessed on 20 December 2022).
Figure 1. Thumbnails of 24 image test data set.
Figure 2. Total correct image identification score for different machine learning reverse image search engines (theoretical best score is 120).
Figure 3. Correct image identification for different search engines using different types of images.
Figure 4. Change in correct image identification score over time for Google and Bing (old scores from July 2022, new scores from October 2022).
Figure 5. Correct image identification score for different search engines using 2019 Bellingcat images vs. original image data; theoretical maximum score of 50 for Bellingcat and 40 for other search engines. As in all other graphs, Google data is shown in blue, Bing in green, Yandex in purple, and Bellingcat in orange.
Figure 6. Sample EXIF metadata (fields highlighted in red are editable; this figure illustrates an example showing partial contents of the EXIF fields and formatting, it is not intended as a complete list).
Figure 7. Test image #3 (a). ELA analysis for test image #3 (b).
Table 1. Test image data set.
Image Number | Test Image | Image Classification
1 | Olisov Palace, Russia | Cluttered, from 2019
2 | Cebu, Philippines | Cluttered, from 2019
3 | Stock photo, Bloomberg ad | People, from 2019
4 | Sao Paulo, Brazil | Cluttered, from 2019
5 | Amsterdam | Uncluttered, from 2019
6 | Hedgehog in the Fog pub | Uncluttered, from 2019
7 | Press secretary R. Giuliani | People, from 2019
8 | Terrorist S. Dubinsky | People, from 2019
9 | Petco Park, San Diego, CA | Cluttered, original
10 | Vanderbilt Mansion, NY | Uncluttered, original
11 | Frederik's Church, Denmark | Uncluttered, original
12 | Catherine's Palace, Russia | Uncluttered, original
13 | Amalienborg, Denmark | Uncluttered, original
14 | Copenhagen Opera House | Uncluttered, original
15 | Stockholm, Sweden | Cluttered, original
16 | Porvoo, Finland | Cluttered, original
17 | Brest, France | Cluttered, original
18 | Tallinn, Estonia | Uncluttered, original
19 | Copenhagen, Denmark | Cluttered, original
20 | Vlad Bonderenko (fake) | People, stock photo
21 | Kevin Mitnick | People, stock photo
22 | Marcus Hutchins | People, stock photo
23 | Fake (AI-generated) | People, stock photo
24 | Fake (AI-generated) | People, stock photo
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
