Today, I was working on the IBM Big Data University course Spark Fundamentals and found that there are some issues with the Data Scientist Workbench (DSWB) site. DSWB's Jupyter Notebook link was not working. I tried to overcome this situation by creating an Apache Spark standalone mode setup on my home Windows 10 PC. This blog post summarizes the steps I performed for the purpose. Please refer to the Wikipedia Apache Spark page ( ) to start learning about it.

Java

The Spark overview page clearly mentions: "It's easy to run locally on one machine - all you need is to have java installed on your system PATH, or the JAVA_HOME environment variable pointing to a Java installation". I used Java JDK version 1.8.0_101 for my setup.

Scala

Apache Spark is written in the Scala programming language and needs it installed on the local PC. I downloaded the Scala 2.12.0 binaries MSI installer from ( ), followed the standard installation prompts, and installed Scala on the default path ( C:\Program Files (x86)\scala ).

winutils

I referred to various sources and found that Spark can run locally, but it needs winutils.exe, which is a component of Hadoop. So what exactly is winutils, and why is it required? On further investigation, I found that, among other things, Spark uses Hadoop, which calls UNIX commands such as chmod to create files and directories. Also, winutils calls are made to read and write files on Windows. In summary, it is required for running shell commands on Windows OS. I am running 64-bit Windows 10 and downloaded winutils.exe from this GitHub URL. I placed the winutils.exe file in a folder.

Spark

I downloaded the latest Spark release, 2.0.2 (Nov 14, 2016), from the official download site ( ). I extracted the files to a folder on the D drive, as my C drive has limited space.

Environment Variables

The following environment variables are set to specify the paths where the various required components are installed:

JAVA_HOME: the folder path of my JDK
HADOOP_HOME: the folder containing the winutils.exe file
SCALA_HOME: the bin folder of the Scala location
SPARK_HOME: the bin folder path of the uncompressed Spark

Following are the values on my desktop: C:\WINDOWS\system32>echo %HADOOP_HOME%

I also added the "D:\spark-2.0.2-bin-hadoop2.7\bin" folder to the PATH environment variable.

I opened a command prompt and ran "spark-shell.cmd". This gives me the flexibility to run Spark from anywhere in a command prompt.
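The environment-variable assignments above can be sketched as a one-time batch script run from a command prompt. This is a minimal sketch assuming my layout; the JDK folder (`C:\Program Files\Java\jdk1.8.0_101`) and the winutils folder (`D:\winutils`) are illustrative placeholders, so substitute your own locations:

```shell
:: One-time setup via setx (persists for NEW command prompts; the current
:: prompt keeps its old values). All paths are examples of my layout.
setx JAVA_HOME "C:\Program Files\Java\jdk1.8.0_101"
setx HADOOP_HOME "D:\winutils"
setx SCALA_HOME "C:\Program Files (x86)\scala\bin"
setx SPARK_HOME "D:\spark-2.0.2-bin-hadoop2.7\bin"
:: Append Spark's bin folder to PATH so spark-shell.cmd resolves anywhere.
setx PATH "%PATH%;D:\spark-2.0.2-bin-hadoop2.7\bin"
```

Note that setx truncates values longer than 1024 characters, so on a machine with an already long PATH it may be safer to edit PATH through the System Properties dialog instead.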
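Before launching Spark, it is worth confirming from a fresh command prompt that the variables took effect and that winutils.exe is actually where HADOOP_HOME says it is. A quick check (assuming the variables are set as described above) might look like:

```shell
:: Echo the variables to confirm they are visible, then check for winutils.
echo %HADOOP_HOME%
echo %SPARK_HOME%
if exist "%HADOOP_HOME%\winutils.exe" (echo winutils.exe found) else (echo winutils.exe MISSING)
```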
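With Spark's bin folder on PATH, the REPL can then be started from any directory, which is the "run from anywhere" flexibility mentioned above:

```shell
:: Launch the Spark shell from an arbitrary directory
:: (works because D:\spark-2.0.2-bin-hadoop2.7\bin is on PATH).
cd %USERPROFILE%
spark-shell.cmd
```

On startup, the Spark 2.x shell reports that a Spark context is available as `sc` and a Spark session as `spark`, which is a convenient sign that the setup is working.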