Hive and Windows Auth – the curse of the backslash

Captain Avery: Put down the sword. A sword could kill us all, girl.
Amy:  Yeah. Thanks. That’s actually why I’m pointing it at you.

– from “Doctor Who: The Curse of the Black Spot”

Background

Typically, when you get Hive / Hadoop up and running, everything runs pretty smoothly, especially if you use one of the demo VMs (e.g. I’m currently using the Cloudera QuickStart VM). But in production you will want to secure login access to your environment, so you may have Windows authentication turned on for ssh or PuTTY access to the boxes in your Hadoop cluster. If you do, and you try to start Hive from the terminal, you may end up getting an “Input path does not exist” error:

 

[rory@tardis01 ~]$ hive

Logging initialized using configuration in jar:file:/opt/cloudera/parcels/CDH-4.4.0-1.cdh4.4.0.p0.39/lib/hive/lib/hive-common-0.10.0-cdh4.4.0.jar!/hive-log4j.properties

Hive history file=/tmp/MY_WINDOWS_DOMAIN\rory/hive_job_log_407166ce-26c0-4be4-8bb1-2e7d5979294e_1985097657.txt

hive> show tables;

OK

Failed with exception java.io.IOException:org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/tmp/MY_WINDOWS_DOMAIN\rory/hive_2013-11-04_21-29-21_584_9182521706239231332-1/-local-10000

Time taken: 1.994 seconds

Notice that it’s trying to access a /tmp/MY_WINDOWS_DOMAIN\rory folder, which does not exist. The hive-log4j.properties file points at the /tmp/${user.name} folder, and because we’re using Windows auth to log into the server, ${user.name} translates to MY_WINDOWS_DOMAIN\rory. (With a regular Linux / OS X login there is no domain prefix, so we don’t have this problem.)
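As a rough illustration of that expansion, here is a small shell sketch. The login name is hardcoded as a hypothetical stand-in for what the JVM reports as user.name on the affected box, and the variable names are made up for this example; it only mimics the string substitution and is not how Hive itself resolves the path.

```shell
# Hypothetical stand-in for the value the JVM reports as user.name
# when the login goes through Windows authentication.
login_name='MY_WINDOWS_DOMAIN\rory'

# Mimic the /tmp/${user.name} template by plain string substitution.
default_dir="/tmp/${login_name}"

# Flag login names that carry a DOMAIN\ prefix.
case "$login_name" in
    *\\*) status="contains a backslash" ;;
    *)    status="plain" ;;
esac

printf '%s -> %s (%s)\n' "$login_name" "$default_dir" "$status"
```

Swapping the hardcoded value for a plain name such as rory gives /tmp/rory, which is the kind of path the workaround below points Hive at.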

 

Workaround Solution

To work around this issue, start Hive with the following command, which sends all temp/tmp files to the /tmp/rory folder (instead of the /tmp/MY_WINDOWS_DOMAIN\rory folder):

[rory@tardis01 ~]$ hive -hiveconf hive.log.dir=/tmp/rory -hiveconf hive.querylog.location=/tmp/rory -hiveconf hive.exec.scratchdir=/tmp/rory -hiveconf hive.exec.local.scratchdir=/tmp/rory

Logging initialized using configuration in jar:file:/opt/cloudera/parcels/CDH-4.4.0-1.cdh4.4.0.p0.39/lib/hive/lib/hive-common-0.10.0-cdh4.4.0.jar!/hive-log4j.properties

Hive history file=/tmp/rory/hive_job_log_3752e4b3-bf4c-4c9b-9536-8a73c8b16ced_1030174849.txt

hive> show tables;

OK

table1

table2

tableN

Time taken: 2.102 seconds

hive> 

It may not be the prettiest solution, but you could always set up an alias within your .bashrc / .profile file so that you do not have to type in the additional parameters every time you start Hive.
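One way to do that is with a small shell function instead of a plain alias, so the folder name does not have to be hardcoded. This is only a sketch: the names hive_tmp_dir and hive_local are made up for this example (they are not part of Hive), and it assumes that `id -un` returns the same DOMAIN\user string on your box. The four -hiveconf settings are the ones from the command above.

```shell
# Hypothetical helper: print a backslash-free temp directory for a login name.
# With no argument it falls back to the current login name from `id -un`
# (assumed here to include the DOMAIN\ prefix on the affected box).
hive_tmp_dir() {
    local login_name="${1:-$(id -un)}"
    # Drop everything up to and including the last backslash (the domain prefix).
    printf '/tmp/%s\n' "${login_name##*\\}"
}

# Hypothetical wrapper: start Hive with the same four settings as the command
# above, passing any extra arguments through unchanged.
hive_local() {
    local dir
    dir="$(hive_tmp_dir)"
    hive -hiveconf hive.log.dir="$dir" \
         -hiveconf hive.querylog.location="$dir" \
         -hiveconf hive.exec.scratchdir="$dir" \
         -hiveconf hive.exec.local.scratchdir="$dir" \
         "$@"
}
```

With this in your .bashrc, typing hive_local starts the Hive prompt with all four locations pointed at /tmp/rory for the login shown earlier. A function is used rather than an alias so the short name is worked out at call time; a plain alias with /tmp/rory typed in would do the same job for a single user.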

Enjoy!
