Naive Similarity checks between images

This blog post describes my first project experience involving machine learning and image processing.

In the summer of 2016 I was at Yrals Digitals, Bombay. The company automates video creation, primarily for news articles: given a news article, it produces a corresponding video news clip.

The project had the following subparts:

  1. Summarization of the text in existing news articles.

  2. Using clips of popular entities mentioned in the news articles and putting them together with text and image transitions for the final video.

  3. A text-to-speech module for the audio in the news articles.

At the outset, the people at the company, including the six interns, added images to the platform for video creation manually. The final goal was to automate this lengthy process using the resources at hand, thereby significantly speeding up video creation.

I planned the task in the following way:

At first we needed to use the textual information coming in from the news agencies. A reasonable approach was to extract important keywords from the text, removing all the stop words and stemming the rest; I also queried DBpedia and the Google Knowledge Graph for named entity recognition.

import nltk      # requires the punkt, averaged_perceptron_tagger, maxent_ne_chunker and words data
import pymongo   # used later to store entity tags against image links
import spotlight # DBpedia Spotlight client, used for entity lookup

with open('sample.txt', 'r') as f:
    sample = f.read()

sentences = nltk.sent_tokenize(sample)
tokenized_sentences = [nltk.word_tokenize(sentence) for sentence in sentences]
tagged_sentences = [nltk.pos_tag(sentence) for sentence in tokenized_sentences]
chunked_sentences = nltk.ne_chunk_sents(tagged_sentences, binary=True)

# keep nouns, proper nouns, plural nouns and foreign words as candidate keywords
tags = ['NN', 'NNP', 'NNS', 'FW']
keywords = [word for sentence in tagged_sentences
            for word, tag in sentence if tag in tags]
print(sorted(set(keywords)))

def extract_entity_names(t):
    # recursively collect 'NE' subtrees from a chunked sentence tree
    entity_names = []
    if hasattr(t, 'label'):
        if t.label() == 'NE':
            entity_names.append(' '.join(child[0] for child in t))
        else:
            for child in t:
                entity_names.extend(extract_entity_names(child))
    return entity_names

entity_names = []
for tree in chunked_sentences:
    entity_names.extend(extract_entity_names(tree))

named_entities = set(entity_names)
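
The stop-word removal, stemming, and DBpedia query mentioned above are not shown in that snippet. Continuing from the variables defined there, a minimal sketch could look like the following, assuming NLTK's stopword corpus and a running DBpedia Spotlight endpoint (the URL and thresholds here are placeholders, not the exact values I used):

from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
import spotlight

stop_words = set(stopwords.words('english'))
stemmer = PorterStemmer()

# drop stop words and stem the remaining candidate keywords
cleaned = [stemmer.stem(w.lower()) for w in keywords if w.lower() not in stop_words]

# query a DBpedia Spotlight endpoint for named entities in the raw text;
# the endpoint URL, confidence and support values below are placeholders
annotations = spotlight.annotate('http://localhost:2222/rest/annotate',
                                 sample, confidence=0.4, support=20)
entities = {a['surfaceForm']: a['URI'] for a in annotations}
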
  1. The organization stored links to the images that were manually fed into the platform, so there was a good chance those images contained photos of the named entities retrieved from DBpedia and the Google Knowledge Graph. This meant I could build a decent cluster of images sharing named entities, tag each image in the existing database, and retrieve it as a suggestion the next time the same named entity came up in a news article (a minimal sketch of this tag-and-retrieve flow appears after this list).

  2. Another important realization was that news article images featuring named entities would rarely contain a large number of faces, so I could extract those faces from the images and compare them within the subcluster of images obtained via the above method (see the face extraction sketch after this list).

  3. This was the stage of experimentation. I was “naive” to believe I could recognize faces with eigenvectors and PCA, although papers dating way back do suggest the same. By means of PCA one can transform each original image of the training set into a corresponding eigenface.

    “An important feature of PCA is that one can reconstruct any original image from the training set by combining the eigenfaces. Remember that eigenfaces are nothing less than characteristic features of the faces. Therefore one could say that the original face image can be reconstructed from eigenfaces if one adds up all the eigenfaces (features) in the right proportion. Each eigenface represents only certain features of the face, which may or may not be present in the original image.”

    “If the feature is present in the original image to a higher degree, the share of the corresponding eigenface in the “sum” of the eigenfaces should be greater. If, to the contrary, the particular feature is not (or almost not) present in the original image, then the corresponding eigenface should contribute a smaller (or not at all) part to the sum of eigenfaces. So, in order to reconstruct the original image from the eigenfaces, one has to build a kind of weighted sum of all eigenfaces. That is, the reconstructed original image is equal to a sum of all eigenfaces, with each eigenface having a certain weight. This weight specifies, to what degree the specific feature (eigenface) is present in the original image.”

    “If one uses all the eigenfaces extracted from original images, one can reconstruct the original images from the eigenfaces exactly. But one can also use only a part of the eigenfaces. Then the reconstructed image is an approximation of the original image. However, one can ensure that losses due to omitting some of the eigenfaces can be minimized. This happens by choosing only the most important features (eigenfaces). Omission of eigenfaces is necessary due to scarcity of computational resources.”
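
A minimal sketch of the tag-and-retrieve flow from point 1, assuming the image links live in a MongoDB collection (the database, collection, and field names here are hypothetical):

from pymongo import MongoClient

client = MongoClient()                 # assumes a local MongoDB instance
images = client['videodb']['images']   # hypothetical database/collection names

def tag_image(url, entities):
    # attach the named entities found in an article to the stored image link;
    # $addToSet avoids duplicate tags when the same entity recurs
    images.update_one({'url': url},
                      {'$addToSet': {'entities': {'$each': list(entities)}}},
                      upsert=True)

def suggest_images(entity):
    # suggest previously tagged images the next time the entity appears
    return [doc['url'] for doc in images.find({'entities': entity})]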

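And here is a rough sketch of the face extraction and eigenface idea from points 2 and 3, using OpenCV's bundled Haar cascade for detection and scikit-learn's PCA for the eigenface decomposition; treat the libraries and parameters as illustrative rather than exactly what I ran:

import cv2
import numpy as np
from sklearn.decomposition import PCA

# OpenCV ships the Haar cascade XML files; cv2.data.haarcascades points at them
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')

def extract_faces(path, size=(64, 64)):
    # detect faces in a grayscale image and crop them to a fixed shape
    gray = cv2.imread(path, 0)
    rects = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return [cv2.resize(gray[y:y+h, x:x+w], size) for (x, y, w, h) in rects]

image_paths = ['face1.jpg', 'face2.jpg']  # placeholder: the subcluster of images

# one flattened row vector per extracted face
faces = np.array([face.ravel()
                  for path in image_paths
                  for face in extract_faces(path)], dtype=np.float64)

# keep only the most significant eigenfaces (principal components)
pca = PCA(n_components=min(20, len(faces))).fit(faces)
weights = pca.transform(faces)                  # eigenface weights per face
reconstructed = pca.inverse_transform(weights)  # weighted sum of eigenfaces + mean face

# two faces are "similar" when their eigenface weights are close
distance = np.linalg.norm(weights[0] - weights[1])

The inverse_transform call is precisely the “weighted sum of eigenfaces” from the quoted passages: each face is approximated from its top components, and dropping components trades reconstruction accuracy for computational cost.
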
I also extracted feature descriptors, such as SURF and SIFT descriptors, for these images and tried to naively club similar images together. These features were extracted via OpenCV. Shown below are some sample matchings produced by a short snippet of code:

import numpy as np
import cv2
from matplotlib import pyplot as plt

img1 = cv2.imread('st_face.jpg', 0)   # query image, loaded as grayscale
img2 = cv2.imread('st_face2.jpg', 0)  # train image, loaded as grayscale

# Initiate SIFT detector (in xfeatures2d for OpenCV 3.x contrib builds)
sift = cv2.xfeatures2d.SIFT_create()

# find the keypoints and descriptors with SIFT
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# brute-force matcher with default params; k=2 returns the two nearest matches
bf = cv2.BFMatcher()
matches = bf.knnMatch(des1, des2, k=2)

# apply Lowe's ratio test to keep only distinctive matches
good = []
for m, n in matches:
    if m.distance < 0.75 * n.distance:
        good.append([m])

# cv2.drawMatchesKnn expects a list of lists as matches
img3 = cv2.drawMatchesKnn(img1, kp1, img2, kp2, good, None, flags=2)

plt.imshow(img3)
plt.show()
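
A note for anyone running this today: on OpenCV 4.4 and later, where the SIFT patent has expired, the detector lives in the main module as cv2.SIFT_create() instead of cv2.xfeatures2d.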

[Image: Pogba - similarity 1, SIFT keypoint matches between two images of Pogba]

For unsupervised methods, I also used TF-IDF to cluster news articles and reduce the search space for tagging named entities in images; I will leave the details for another blog post.
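
For a taste of that step, here is a bare-bones scikit-learn sketch, assuming a list of raw article strings (the corpus and cluster count below are placeholders):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# placeholder corpus; in practice this was the incoming news articles
articles = ["first sample article text", "second sample article text"]

# TF-IDF turns each article into a weighted term vector
vectorizer = TfidfVectorizer(stop_words='english', max_features=5000)
X = vectorizer.fit_transform(articles)

# cluster the articles so entity tagging only searches within one cluster
km = KMeans(n_clusters=2, random_state=0).fit(X)
print(km.labels_)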