Captain Avery: Put down the sword. A sword could kill us all, girl.
Amy: Yeah. Thanks. That’s actually why I’m pointing it at you.
— from “Doctor Who: The Curse of the Black Spot”
Background
Typically, when you get Hive / Hadoop up and running, everything runs pretty smoothly, especially if you use one of the demo VMs (e.g. I’m currently using the Cloudera QuickStart VM). But in production, where you want to secure login access to your environment, you may have Windows authentication turned on for access to one of the boxes on your Hadoop cluster via ssh or PuTTY. If you do, and you try to connect to Hive from the terminal, you may end up getting an “Input path does not exist” error:
[rory@tardis01 ~]$ hive
Logging initialized using configuration in jar:file:/opt/cloudera/parcels/CDH-4.4.0-1.cdh4.4.0.p0.39/lib/hive/lib/hive-common-0.10.0-cdh4.4.0.jar!/hive-log4j.properties
Hive history file=/tmp/MY_WINDOWS_DOMAIN\rory/hive_job_log_407166ce-26c0-4be4-8bb1-2e7d5979294e_1985097657.txt
hive> show tables;
OK
Failed with exception java.io.IOException:org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/tmp/MY_WINDOWS_DOMAIN\rory/hive_2013-11-04_21-29-21_584_9182521706239231332-1/-local-10000
Time taken: 1.994 seconds
Notice that it’s trying to access a /tmp/MY_WINDOWS_DOMAIN\rory folder, which does not exist. The hive-log4j.properties file uses the /tmp/{user.name} folder, and because we’re using Windows auth to log into the server, {user.name} translates to MY_WINDOWS_DOMAIN\rory (in Linux / OSX we don’t have this problem).
Workaround Solution
To work around this issue, try starting Hive with the following command, which sends all temp/tmp files to the /tmp/rory folder (instead of the /tmp/MY_WINDOWS_DOMAIN\rory folder):
[rory@tardis01 ~]$ hive -hiveconf hive.log.dir=/tmp/rory -hiveconf hive.querylog.location=/tmp/rory -hiveconf hive.exec.scratchdir=/tmp/rory -hiveconf hive.exec.local.scratchdir=/tmp/rory
Logging initialized using configuration in jar:file:/opt/cloudera/parcels/CDH-4.4.0-1.cdh4.4.0.p0.39/lib/hive/lib/hive-common-0.10.0-cdh4.4.0.jar!/hive-log4j.properties
Hive history file=/tmp/rory/hive_job_log_3752e4b3-bf4c-4c9b-9536-8a73c8b16ced_1030174849.txt
hive> show tables;
OK
table1
table2
…
tableN
Time taken: 2.102 seconds
hive>
It may not be the prettiest solution, but you could always set up an alias in your .bashrc / .profile file so that you do not have to type the additional parameters every time you start Hive.
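As a rough sketch of that idea, here is a shell function (which behaves much like an alias, but can take arguments) that you could drop into your .bashrc. The names hive_short_user and hivew are made up for this example, and it assumes that `id -un` may return a domain-qualified name such as MY_WINDOWS_DOMAIN\rory on your box; adjust it if your setup reports the user differently.

```shell
# Hypothetical helper: strip a Windows domain prefix from a login name.
# "${1##*\\}" removes everything up to and including the last backslash,
# so MY_WINDOWS_DOMAIN\rory becomes rory; a plain name is left unchanged.
hive_short_user() {
    printf '%s\n' "${1##*\\}"
}

# Hypothetical wrapper: start Hive with the same four -hiveconf overrides
# used above, all pointing at /tmp/<short user name>.
# Any extra arguments are passed straight through to hive.
hivew() {
    local dir
    dir="/tmp/$(hive_short_user "$(id -un)")"
    hive -hiveconf hive.log.dir="$dir" \
         -hiveconf hive.querylog.location="$dir" \
         -hiveconf hive.exec.scratchdir="$dir" \
         -hiveconf hive.exec.local.scratchdir="$dir" "$@"
}
```

After reloading your shell, typing hivew should start Hive the same way as the long command above. If you would rather keep things simple, hard-coding /tmp/rory in the function works just as well.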
Enjoy!