Question

Cmathias on Sun, 25 Sep 2016 02:22:13


I first tried to get this to work with Spark 1.6.2 and Hadoop 2.8. Failing that, I followed this tutorial:

https://blogs.msdn.microsoft.com/arsen/2016/08/05/accessing-azure-data-lake-store-using-webhdfs-with-oauth2-from-spark-2-0-that-is-running-locally/

trying the same with Spark 2.0 and Hadoop 2.8.

I keep getting 401 Unauthorized.

I am using the PublicClient Java code from this project: https://github.com/AzureAD/azure-activedirectory-library-for-java to generate the client ID, refresh token, and access token to put into core-site.xml.

Since all of that was failing, I wanted to go back to "basics" and just curl the public WebHDFS interface using the tokens I received. I get the same result, e.g.

curl -X GET -H "Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJSUzI1NiIsIng1dCI6IlliUkFRUlljRV9tb3RXVkpLSHJ3TEJiZF85cyIsImtpZCI6IlliUkFRUlljRV9tb3RXVkpLSHJ3TEJiZF85cyJ9.eyJhdWQiOiJodHRwczovL2dyYXBoLndpbmRvd3MubmV0IiwiaXNzIjoiaHR0cHM6Ly9zdHMud2luZG93cy5uZXQvOGEzYzNiNTYtZTUwOS00Zjg2LTlmNWMtZTA5YjJmNTY0MjE3LyIsImlhdCI6MTQ3NDc2OTE1MCwibmJmIjoxNDc0NzY5MTUwLCJleHAiOjE0NzQ3NzMwNTAsImFjciI6IjEiLCJhbXIiOlsicHdkIl0sImFwcGlkIjoiOWJhMWE1YzctZjE3YS00ZGU5LWExZjEtNjE3OGM4ZDUxMjIzIiwiYXBwaWRhY3IiOiIwIiwiZV9leHAiOjEwODAwLCJmYW1pbHlfbmFtZSI6IlBpcGVMaW5lIEFjY2VzcyIsImdpdmVuX25hbWUiOiJEYXRhIiwiaXBhZGRyIjoiMTM2LjYyLjMuMjI2IiwibmFtZSI6IkRhdGEgUGlwZUxpbmUgQWNjZXNzIiwib2lkIjoiNDBhYzg3ZTEtODc3OS00ZTkxLTg3YTItZGY3MTdhMDZmNDk5IiwicHVpZCI6IjEwMDM3RkZFOUFBMkVGMDYiLCJzY3AiOiJ1c2VyX2ltcGVyc29uYXRpb24iLCJzdWIiOiJWa3JDYy1sU2VvbGpKdmhwcTM0cmFxelRKdE8zcDEwTXZ0bzI5aEpLWjVZIiwidGlkIjoiOGEzYzNiNTYtZTUwOS00Zjg2LTlmNWMtZTA5YjJmNTY0MjE3IiwidW5pcXVlX25hbWUiOiJ3ei1kcGxhY2Nlc3NAd290Y3d6Lm9ubWljcm9zb2Z0LmNvbSIsInVwbiI6Ind6LWRwbGFjY2Vzc0B3b3Rjd3oub25taWNyb3NvZnQuY29tIiwidmVyIjoiMS4wIn0.eatoOxgQ_baVX0F3QEIN9nKVys8N0rL8rcVFhVwQ50RPdmLC7NxI5S3JUjPetxjn3B3Zm0-ZT8d4dasEWoSHPeqvpH6P6KAwfd1P217fdAlnxqBaY54tulObc9RfQRBEPzm-tmhBfKL5d544Wp-HPHUr4LvLn-mo3387QzH8Fg2NHXj9bYBFPTcTKAtKCtiDe8JqKi3ulxvjA2vyegHEZf6kS4dLVhZO1R7m8_a14GKzG2dp83Lcml3719MaOpNbm2xnCYoJwXrJ5S5sztwcqMG5ZaOR9rnZ4E9JrLmXABvViP_JARHnQDQgrdYMCZE3pj4xZvh354vayzm6LV82Wg" https://[tenantid].azuredatalakestore.net/webhdfs/v1/?op=LISTSTATUS  

{"error":{"code":"AuthenticationFailed","message":"Failed to validate the access token in the 'Authorization' header. Trace: a787b738-a99e-4728-8fda-7ecef785b331 Time: 2016-09-24T19:13:59.4338482-07:00"}}

The error says it failed to validate the access token, yet the token was obtained successfully only moments before, using the proper credentials. Is this actually a failure of access to the folder, or something else? If I wait long enough, I instead get "The access token in the 'Authorization' header is expired".
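Incidentally, this kind of failure can be narrowed down without calling the service at all: a JWT's payload is just base64url-encoded JSON, so the token can be decoded locally to inspect its `aud` (audience) claim — if the audience is not the resource the Data Lake endpoint expects, validation fails with exactly this error. (Decoding the payload of the token pasted above shows an `aud` of https://graph.windows.net.) A small sketch, where `jwt_claims` is a hypothetical helper name, not part of any SDK:

```python
import base64
import json

def jwt_claims(token):
    """Decode the (unverified) claims from a JWT's payload segment."""
    payload = token.split(".")[1]           # format is header.payload.signature
    payload += "=" * (-len(payload) % 4)    # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))

# e.g. jwt_claims(access_token)["aud"] should name the resource
# the endpoint expects, not https://graph.windows.net
```

Note this only decodes the claims for inspection; it does not verify the token's signature.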

Running out of ideas to try. Looking for any input or insight.






Replies

Arsen Vladimirskiy on Mon, 26 Sep 2016 20:29:50


Hi,

When you generated the access token, did you specify the parameter resource=https://management.core.windows.net as part of the token request? The token-issuing request must include this resource for the issued token to be valid for accessing the WebHDFS endpoint.

Could you try following the "manual" approach for obtaining the access and refresh token described here just to see if the token you get this way works for you when you issue the CURL command?

Thanks,
Arsen


Cmathias on Mon, 26 Sep 2016 21:19:52


Thanks so much for the response! Actually, that is the guide I followed, and admittedly I missed that specific detail. Unfortunately, I cannot follow the walkthrough end to end because I have not been granted access to the Azure portal; I have only been issued credentials to access the HDFS store. I tried changing the code in the Java application that generates credentials to use the resource you specified:

Future<AuthenticationResult> future = context.acquireToken(
        "https://management.core.windows.net", CLIENT_ID, username, password, null);

And the new token does not work. However, while using this PublicClient app I realized that my client ID must be wrong, so I probably need to obtain the correct one. I'll work with the team that has portal access to get this value. If this value is not relevant, let me know what else to try.

Also, which value should be placed into core-site.xml: the ClientId GUID, the TenantId GUID, or even result.getUserInfo().getUniqueId()?

Arsen Vladimirskiy on Mon, 26 Sep 2016 23:11:56


For the dfs.webhdfs.oauth2.client.id property in the core-site.xml, you should copy the GUID that corresponds to the application that was created in your Azure Active Directory. This GUID is visible in the "Client ID" field on the "Configure" tab of the application in the classic portal.
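For reference, the related OAuth2 properties in core-site.xml (per Hadoop 2.8's WebHDFS OAuth2 support; every value below is a placeholder to be replaced with your own) look roughly like this:

```xml
<property>
  <name>dfs.webhdfs.oauth2.enabled</name>
  <value>true</value>
</property>
<property>
  <name>dfs.webhdfs.oauth2.access.token.provider</name>
  <value>org.apache.hadoop.hdfs.web.oauth2.ConfRefreshTokenBasedAccessTokenProvider</value>
</property>
<property>
  <name>dfs.webhdfs.oauth2.client.id</name>
  <value>CLIENT-ID-GUID-OF-THE-AAD-APPLICATION</value>
</property>
<property>
  <name>dfs.webhdfs.oauth2.refresh.url</name>
  <value>https://login.microsoftonline.com/TENANT-ID/oauth2/token</value>
</property>
<property>
  <name>dfs.webhdfs.oauth2.refresh.token</name>
  <value>REFRESH-TOKEN-VALUE</value>
</property>
```

So, per the question above: the ClientId GUID goes into dfs.webhdfs.oauth2.client.id, while the TenantId GUID appears only inside the refresh URL.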

This document provides more details on accessing Azure Data Lake Store using the REST API, and the Prerequisites section at the top describes the Azure Active Directory application that must be created to obtain the token: https://azure.microsoft.com/en-us/documentation/articles/data-lake-store-get-started-rest-api/

Eric LaPlatney on Tue, 27 Sep 2016 21:56:37


Hi Arsen- I'm the Azure Administrator working with Cmathias, and I can supply some confirmations here.

Using your walkthrough here:

https://blogs.msdn.microsoft.com/arsen/2016/08/05/accessing-azure-data-lake-store-using-webhdfs-with-oauth2-from-spark-2-0-that-is-running-locally/

I've provided Cmathias with a client ID, refresh URL, and token, and have refreshed the token a time or two as well. I've confirmed that the application has access to the Azure Service Management API, and my own cURL requests receive an identical response from the service.


Is there anything we could conceivably have missed that would cause the token to be malformed or fail validation?

Arsen Vladimirskiy on Wed, 28 Sep 2016 22:25:38


Hi,

When supplying the resource= parameter to get the access token, please make sure to include the trailing slash (i.e., it should be resource=https://management.core.windows.net/).
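For concreteness, a minimal sketch of that token-refresh request using only the Python standard library (TENANT_ID, CLIENT_ID, and REFRESH_TOKEN are placeholders; the network call itself is left commented out):

```python
from urllib import parse, request

TENANT_ID = "<tenant-id>"          # placeholder values to fill in
CLIENT_ID = "<client-id>"
REFRESH_TOKEN = "<refresh-token>"

# Form body for the AAD token endpoint; note the trailing slash
# on the resource value, as discussed above.
body = parse.urlencode({
    "grant_type": "refresh_token",
    "client_id": CLIENT_ID,
    "refresh_token": REFRESH_TOKEN,
    "resource": "https://management.core.windows.net/",
}).encode()

req = request.Request(
    "https://login.microsoftonline.com/%s/oauth2/token" % TENANT_ID,
    data=body,  # supplying data makes this a POST
)
# with request.urlopen(req) as resp:
#     resp.read()  # JSON containing access_token and refresh_token
```

The response JSON's access_token is what goes into the Authorization: Bearer header of the WebHDFS curl command.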