How to use "Match Image" API of Content Moderator?

Category: project oxford

Question

ImasterMaster123 on Tue, 24 May 2016 16:29:25


How does the "Match Image" API of Microsoft Content Moderator work?

Take the following 3 images, added by our users one after the other via our Add Form:

1) https://upload.wikimedia.org/wikipedia/commons/thumb/c/c6/Leuchtturm_in_Westerheversand.jpg/800px-Leuchtturm_in_Westerheversand.jpg

2) https://upload.wikimedia.org/wikipedia/commons/thumb/9/97/Olympiastadion_Berlin_Innenansicht.jpg/800px-Olympiastadion_Berlin_Innenansicht.jpg

3) https://upload.wikimedia.org/wikipedia/commons/thumb/c/c6/Leuchtturm_in_Westerheversand.jpg/320px-Leuchtturm_in_Westerheversand.jpg


We want to check whether each newly posted image is a duplicate of any earlier posted image. Using your "Match Image" service, it should catch that Image 3 is a duplicate of the previously submitted Image 1.

How do I go about it? Do we first call "Add Image", then "Refresh Index", and then "Match Image"? And what are "Cache Image" and "Match Cached Image" for?

Questions:

a) I am already sending these images one by one for Nudity Checking. If I set cache=true at that time, will the image be added to your cache and be ready to use for the "Match Image" API without any index refreshing?

b) Adding an image to your list and then checking if it matches any previous image: would this cost 2 counts or just 1? That is, 1 count for "Add Image", 1 count for "Cache Image", then Refresh Index, and then 1 count for "Match Image". Would it cost us 3 points to match just a single image?

c) Is it possible to get the nudity-check and image-match responses in a single request? That would save time. This would cost us 2 points, right?

d) The "Refresh Index" page says it can take around 5 minutes to get the index updated. But users on my site submit images continuously, so 2 images may be submitted just seconds apart, and we would like duplicate checking to work even for such images. Would this be possible?



Below are the things we tried to make "Match Image" work, but we are lost. Please help!


* Trial A)

I tried adding images to the cache via the "Custom Image List Management > Add Image" link. It returns an error:


- HTTP request

POST https://api.microsoftmoderator.com/Image/Validate/Image/Add HTTP/1.1
Host: api.microsoftmoderator.com
Ocp-Apim-Subscription-Key: ••••••••••••••••••••••••••••••••
Content-Length: 102

{
  "DataRepresentation":"URL",
  "Value":"https://upload.wikimedia.org/wikipedia/commons/thumb/c/c6/Leuchtturm_in_Westerheversand.jpg/800px-Leuchtturm_in_Westerheversand.jpg"
}


- Response Body

Date: Tue, 24 May 2016 15:35:43 GMT
X-Powered-By: ASP.NET
Content-Length: 152
Content-Type: application/json; charset=utf-8

{
  "Status": {
    "Code": 3004,
    "Description": "Error occurred while processing request",
    "Exception": null
  },
  "TrackingId": null,
  "AdditionalInfo": [],
  "ContentId": null
}



* Trial B)

I tried adding images to the cache via the "Image Caching > Cache Image" link and then checking whether the content was cached, but using the cached image returns "Content has expired on the server".


- HTTP Request

POST https://api.microsoftmoderator.com/Image/Cache/ HTTP/1.1
Content-Type: application/json
Host: api.microsoftmoderator.com
Content-Length: 106
Ocp-Apim-Subscription-Key: ••••••••••••••••••••••••••••••••

{
  "DataRepresentation":"URL",
  "Value":"https://upload.wikimedia.org/wikipedia/commons/thumb/c/c6/Leuchtturm_in_Westerheversand.jpg/800px-Leuchtturm_in_Westerheversand.jpg"
}



- Response Content

Date: Tue, 24 May 2016 15:41:03 GMT
X-Powered-By: ASP.NET
Content-Length: 141
Content-Type: application/json; charset=utf-8

{
  "CacheID": "72a6cd4d2ffe492980d321d0f34f84be",
  "TrackingId": "WUS_90939_560b14832703740d8c47b89d_72a6cd4d2ffe492980d321d0f34f84be"
}


As soon as I try to use this cached image via "Image Services > Evaluate Cached Image", it returns an error saying "Content has expired":


- Response status

410 Content has expired on the server.

- Response latency

608 ms

- Response content

Date: Tue, 24 May 2016 15:42:31 GMT
X-Powered-By: ASP.NET
Content-Length: 272
Content-Type: application/json; charset=utf-8

{
  "TrackingId": "WUS_90939_560b14832703740d8c47b89d_547f59be13f8437f940c823286aff1c8",
  "AdvancedInfo": [],
  "Result": false,
  "AdultClassificationScore": 0,
  "IsImageAdultClassified": false,
  "IsImageRacyClassified": false,
  "RacyClassificationScore": 0
}



* Trial C)

I even tried various combinations of "Image Services > Match Image" and "Image Services > Match Cached Image", but absolutely nothing seems to be working for me.

 

Replies

Sudipto Rakshit on Fri, 27 May 2016 16:47:51


I will begin by detailing how the Cache Image and Match Image APIs work:

 

Cache Image:

The intent of this API is to reduce the egress cost of sending the same image to several different APIs. When you call the EvaluateImage API with cacheimage=true, we cache the image that was sent for 50 seconds.

 

While the image is cached, you can call any of the other APIs ("Detect Faces In Cached Image", "Match Cached Image" and "Extract Text from Cached Image") by specifying the CacheId that was returned by the first call. This saves you egress cost, because we do not download the image again.
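Because a CacheId is only usable for 50 seconds, a client can record when it received one and stop reusing it once that window has passed, rather than running into an HTTP 410. Below is a minimal sketch, assuming a hypothetical `CachedImage` record and caller-supplied functions that wrap the "... Cached Image" APIs; it does not talk to the service itself.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional

# Retention period for cached images, as stated in this thread.
CACHE_TTL_SECONDS = 50


@dataclass
class CachedImage:
    """Hypothetical client-side record of a CacheId returned by a cacheimage=true call."""
    cache_id: str
    cached_at: float = field(default_factory=time.monotonic)

    def is_expired(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        # Treat the boundary itself as expired, to stay on the safe side.
        return now - self.cached_at >= CACHE_TTL_SECONDS


def run_cached_calls(entry: CachedImage,
                     calls: Dict[str, Callable[[str], Any]],
                     now: Optional[float] = None) -> Dict[str, Any]:
    """Pass the same CacheId to each follow-up call instead of re-sending the image."""
    if entry.is_expired(now):
        raise TimeoutError(
            f"CacheId {entry.cache_id} is older than {CACHE_TTL_SECONDS}s; re-send the image")
    # Hypothetical wrappers (e.g. around "Match Cached Image") receive only the CacheId.
    return {name: call(entry.cache_id) for name, call in calls.items()}
```

The expiry check runs once before the batch, so a batch of follow-up calls that itself takes several seconds should leave some margin below 50 seconds.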

Match Image:

The intent of this API is to let you match a new image against an existing list of images. The API uses fuzzy matching, so it can match images that have been slightly modified.
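The thread does not say which fuzzy-matching algorithm the service uses. To illustrate the idea only, the sketch below swaps in a well-known technique, an average hash (aHash) compared by Hamming distance, over grayscale grids that are assumed to be already downscaled to the same small size (for example, the 800px and 320px lighthouse thumbnails both reduced to 8x8). The function names and the `max_distance` threshold are illustrative choices, not part of the Content Moderator API.

```python
from typing import List

# Grayscale pixels 0-255, assumed already downscaled to a common small size.
Grid = List[List[int]]


def average_hash(grid: Grid) -> int:
    """One bit per pixel: 1 if the pixel is brighter than the grid's mean, else 0."""
    pixels = [p for row in grid for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def is_fuzzy_match(a: Grid, b: Grid, max_distance: int = 5) -> bool:
    """Treat two images as the same if their hashes differ in few bits."""
    return hamming(average_hash(a), average_hash(b)) <= max_distance


original = [[10, 200, 10, 200], [200, 10, 200, 10],
            [10, 200, 10, 200], [200, 10, 200, 10]]
# Uniformly brightened copy: the mean shifts with the pixels, so the hash is unchanged.
brighter = [[min(p + 10, 255) for p in row] for row in original]
different = [[200] * 4, [200] * 4, [10] * 4, [10] * 4]
```

A uniformly brightened copy hashes identically to the original, while an image with a different layout lands several bits away, which is the kind of tolerance an exact comparison cannot offer.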

 

The Add Image API adds a new image to your custom list. Before adding it, Add Image checks whether the exact same image is already in your list.
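The thread does not say how Add Image detects an exact duplicate. Assuming byte-for-byte identity, the behavior can be sketched with a SHA-256 digest lookup in a hypothetical in-memory `ExactImageList`. Under this assumption, the 800px and 320px versions of the lighthouse image would not count as duplicates, because their bytes differ; that gap is what fuzzy matching covers.

```python
import hashlib
from typing import Dict


class ExactImageList:
    """Hypothetical in-memory stand-in for a custom list with exact-duplicate detection."""

    def __init__(self) -> None:
        self._by_digest: Dict[str, int] = {}
        self._next_id = 1

    def add_image(self, data: bytes) -> dict:
        digest = hashlib.sha256(data).hexdigest()
        if digest in self._by_digest:
            # Identical bytes already stored: report the existing id instead of adding.
            return {"added": False, "image_id": self._by_digest[digest]}
        image_id = self._next_id
        self._next_id += 1
        self._by_digest[digest] = image_id
        return {"added": True, "image_id": image_id}
```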

 

Once you are done adding images to your image list, you need to call the Refresh Index API. Refresh Index starts the process of rebuilding the data structures for your custom image list that enable fuzzy matching.
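Put differently, Match Image searches the list as it was at the last index rebuild, not the list as it is right now. The hypothetical `IndexedImageList` class below models that behavior in memory; the similarity check is passed in by the caller, and in the example a simple filename comparison stands in for real fuzzy matching.

```python
from typing import Callable, List, Optional


class IndexedImageList:
    """Hypothetical model: added images become matchable only after refresh_index()."""

    def __init__(self, similar: Callable[[str, str], bool]) -> None:
        self._similar = similar
        self._items: List[str] = []    # everything added so far
        self._indexed: List[str] = []  # snapshot that match() searches

    def add_image(self, image: str) -> None:
        self._items.append(image)

    def refresh_index(self) -> None:
        # Instantaneous here; the thread says the real rebuild can take around 5 minutes.
        self._indexed = list(self._items)

    def match(self, image: str) -> Optional[str]:
        for candidate in self._indexed:
            if self._similar(candidate, image):
                return candidate
        return None


def same_file_any_width(a: str, b: str) -> bool:
    """Toy similarity: compare thumbnail names after the "<width>px-" prefix."""
    return a.rsplit("px-", 1)[-1] == b.rsplit("px-", 1)[-1]
```

Matching Image 3 right after adding Image 1 finds nothing; the same query succeeds once the index has been refreshed.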

Addressing questions in your email:

a) I am already sending these images one by one for Nudity Checking. If I set cache=true at that time, will the image be added to your cache and be ready to use for the "Match Image" API without any index refreshing?

 

Caching an image only retains it temporarily, to save you egress cost when calling other image capabilities. It does not add the image to your custom list, so the Match Image API cannot match against a cached image.

b) Adding an image to your list and then checking if it matches any previous image: would this cost 2 counts or just 1? That is, 1 count for "Add Image", 1 count for "Cache Image", then Refresh Index, and then 1 count for "Match Image". Would it cost us 3 points to match just a single image?

 

Calls to the Custom Image List Management APIs (i.e., Add Image, Remove Image, Refresh Index) are NOT counted toward your monthly quota; these APIs have a rate limit of 10 calls per second. So, overall, this scenario would be counted as 1 call (i.e., the Match Image call).
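Since the list-management calls are free but limited to 10 per second, a client that adds many images in a burst may want to throttle itself. A minimal sketch, assuming a client-side sliding-window limiter; the `SlidingWindowLimiter` class is hypothetical, and its clock and sleep functions are injectable so it can be tested without real waiting.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most max_calls within any window_seconds-long interval."""

    def __init__(self, max_calls: int = 10, window_seconds: float = 1.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.max_calls = max_calls
        self.window = window_seconds
        self._clock = clock
        self._sleep = sleep
        self._stamps: Deque[float] = deque()

    def acquire(self) -> None:
        """Block until another call is allowed, then record it."""
        while True:
            now = self._clock()
            # Forget calls that have left the window.
            while self._stamps and now - self._stamps[0] >= self.window:
                self._stamps.popleft()
            if len(self._stamps) < self.max_calls:
                self._stamps.append(now)
                return
            # Wait until the oldest recorded call leaves the window.
            self._sleep(self.window - (now - self._stamps[0]))
```

Calling `limiter.acquire()` before each Add Image, Remove Image or Refresh Index request keeps a burst under the stated limit; the limiter does nothing for Match Image, which is governed by the monthly quota instead.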

 

c) Is it possible to get the nudity-check and image-match responses in a single request? That would save time. This would cost us 2 points, right?

 

Yes, you are correct: as of today, these are counted as 2 calls. However, we are working on a more inclusive "Moderation API" that would allow you to make most of these calls together. We will keep you posted on that.
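Until a combined API is available, the two requests can at least be issued concurrently, so the total wait is roughly that of the slower call rather than the sum of both; they are still billed as 2 calls. A minimal sketch using the standard library's `concurrent.futures`, assuming hypothetical caller-supplied `evaluate_image` and `match_image` functions that wrap the two HTTP requests:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict


def moderate_and_match(image_url: str,
                       evaluate_image: Callable[[str], Any],
                       match_image: Callable[[str], Any]) -> Dict[str, Any]:
    """Run the nudity check and the list match in parallel; both are still billed."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        evaluation = pool.submit(evaluate_image, image_url)
        match = pool.submit(match_image, image_url)
        # .result() waits for each call and re-raises any exception from its thread.
        return {"evaluate": evaluation.result(), "match": match.result()}
```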

 

d) The "Refresh Index" page says it can take around 5 minutes to get the index updated. But users on my site submit images continuously, so 2 images may be submitted just seconds apart, and we would like duplicate checking to work even for such images. Would this be possible?

 

The Match Image API works off data structures (which enable fuzzy matching) that need to be rebuilt before it can match newly added images. In this scenario, though, if you call the Add Image API with the same image twice, the second call returns a response saying that the image already exists in your list.

 

Note: Although the Add Image API tells you whether the image already exists in your list, it performs an exact match on the image and does not support fuzzy matching.

Addressing each Trial scenario mentioned in your email:

 

Trial A:

You are correct: the Add Image API throws an exception when called with Content-Type: application/json. We have logged this as an issue and are actively working to resolve it.

 

Meanwhile, the Add Image API works if you pass the raw image as the body and set Content-Type to the image's MIME type (Content-Type: <Image Mime Type>), as documented here:

https://developer.microsoftmoderator.com/docs/services/54f79439e3a97812880825ce/operations/54f79439e3a9780af42ee9ca
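The workaround can be sketched with the standard library alone. This builds, but does not send, a POST whose body is the raw image bytes and whose Content-Type is guessed from the file name. The endpoint is copied from the Trial A request above and may need additional parameters described in the linked documentation; `build_add_image_request` is a hypothetical helper name, and the subscription key is a placeholder.

```python
import mimetypes
import urllib.request

# Endpoint path as it appears in the Trial A request above.
ADD_IMAGE_URL = "https://api.microsoftmoderator.com/Image/Validate/Image/Add"


def build_add_image_request(image_bytes: bytes, filename: str,
                            subscription_key: str) -> urllib.request.Request:
    """Build (but do not send) an Add Image request with the raw image as the body."""
    mime_type, _ = mimetypes.guess_type(filename)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"cannot infer an image MIME type from {filename!r}")
    return urllib.request.Request(
        ADD_IMAGE_URL,
        data=image_bytes,
        method="POST",
        headers={
            "Content-Type": mime_type,  # e.g. image/jpeg instead of application/json
            "Ocp-Apim-Subscription-Key": subscription_key,
        },
    )
```

Sending the request would then be a matter of `urllib.request.urlopen(request)`; the documentation page linked above remains the authority on the exact endpoint and parameters.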

Trial B:

We verified this scenario on our end, and it is working as expected. The image is cached for 50 seconds. We will update the documentation on the portal to reflect this.

 

If you call Evaluate Image In Cache after 50 seconds have elapsed, you get HTTP 410 with the message "Content has expired on the server".

In your example above, the image was cached at Tue, 24 May 2016 15:41:03 GMT and consumed at Tue, 24 May 2016 15:42:31 GMT. That is 88 seconds later, which is more than 50 seconds.
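The same check can be done directly from the two Date response headers shown in Trial B, using only the standard library:

```python
from email.utils import parsedate_to_datetime

CACHE_TTL_SECONDS = 50  # retention period stated above

# Date headers copied from the Cache Image and Evaluate Cached Image responses in Trial B.
cached_at = parsedate_to_datetime("Tue, 24 May 2016 15:41:03 GMT")
used_at = parsedate_to_datetime("Tue, 24 May 2016 15:42:31 GMT")

elapsed = (used_at - cached_at).total_seconds()
print(elapsed, elapsed > CACHE_TTL_SECONDS)  # 88.0 True
```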

 

Please retry this scenario and let us know if you still face issues.

 

Trial C:

Please use the following order of calls:

  1. Call Add Image(s), passing the image in the body with Content-Type set to the image's MIME type.
  2. Call Refresh Index.
  3. Call Match Image to match against the images that were added in step 1.
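The three steps can be sketched against a hypothetical client interface. `ImageListClient`, its method names and the `wait_for_index` hook are illustrative assumptions, not the SDK's API; `FakeClient` only matches exact bytes and exists so the sketch runs offline. A real client would wrap the Add Image, Refresh Index and Match Image requests and wait for the rebuild to finish before step 3.

```python
from typing import Callable, Iterable, List, Protocol


class ImageListClient(Protocol):
    """Hypothetical interface wrapping the Add Image, Refresh Index and Match Image APIs."""

    def add_image(self, image: bytes, content_type: str) -> None: ...

    def refresh_index(self) -> None: ...

    def match_image(self, image: bytes, content_type: str) -> bool: ...


def check_against_list(client: ImageListClient,
                       known_images: Iterable[bytes],
                       new_image: bytes,
                       content_type: str = "image/jpeg",
                       wait_for_index: Callable[[], None] = lambda: None) -> bool:
    # 1. Add the reference images as raw bytes with the image MIME type.
    for image in known_images:
        client.add_image(image, content_type)
    # 2. Start rebuilding the matching index.
    client.refresh_index()
    # The rebuild can take minutes on the service; the caller decides how to wait.
    wait_for_index()
    # 3. Match the new image against the refreshed list.
    return client.match_image(new_image, content_type)


class FakeClient:
    """In-memory stand-in (exact-bytes matching only) so the sketch runs offline."""

    def __init__(self) -> None:
        self._added: List[bytes] = []
        self._index: List[bytes] = []

    def add_image(self, image: bytes, content_type: str) -> None:
        self._added.append(image)

    def refresh_index(self) -> None:
        self._index = list(self._added)

    def match_image(self, image: bytes, content_type: str) -> bool:
        return image in self._index
```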

Our SDK is also published on GitHub: https://github.com/MicrosoftContentModerator/ContentModeratorSDK/tree/master/ContentModeratorSDK.NET. You can refer to the unit tests in the suite as you integrate with the service.