1. Installed Spark (spark-3.1.1-bin-hadoop2.7)
2. Downloaded the Java JDK to "C:\Program Files\Java\jdk1.8.0_281"
3. Downloaded Anaconda
4. Downloaded winutils.exe to C:\spark\spark-3.1.1-bin-hadoop2.7\bin
5. Set the Windows environment variables to:
SPARK_HOME=C:\spark\spark-3.1.1-bin-hadoop2.7
HADOOP_HOME=C:\spark\spark-3.1.1-bin-hadoop2.7
PYSPARK_DRIVER_PYTHON=jupyter
PYSPARK_DRIVER_PYTHON_OPTS=notebook
JAVA_HOME="C:\Program Files\Java\jdk1.8.0_281"
(I have tried the suggestion of changing JAVA_HOME to C:\Progra~1\Java\jdk1.8.0_281, and that doesn't work either.)
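To rule out the notebook kernel simply not seeing these variables, I can run a quick diagnostic sketch in a cell (the paths and variable names are the ones listed above; os.environ and shutil.which are standard library):

import os
import shutil

# Print the variables as the notebook's Python process sees them.
for name in ("SPARK_HOME", "HADOOP_HOME", "JAVA_HOME"):
    print(name, "=", os.environ.get(name))

# WinError 2 usually means a child executable could not be found,
# so check that java.exe and winutils.exe are actually reachable.
# The quotes around the JAVA_HOME value are stripped for the check.
java_home = (os.environ.get("JAVA_HOME") or "").strip('"')
print("java.exe exists:", os.path.exists(os.path.join(java_home, "bin", "java.exe")))
print("java on PATH:", shutil.which("java"))
print("winutils on PATH:", shutil.which("winutils"))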
I am able to launch Jupyter Notebook.
I then copy and paste this code into a cell:
import findspark
findspark.init()
import pyspark # only run after findspark.init()
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
df = spark.sql('''select 'spark' as hello ''')
df.show()
I then get this error:
FileNotFoundError: [WinError 2] The system cannot find the file specified
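As I understand it, [WinError 2] is raised when Python cannot launch an executable it was asked to run, which in this setup is usually java.exe or spark-submit. I have also tried a variant of the cell that sets JAVA_HOME inside the process and passes the Spark directory to findspark.init explicitly (a sketch using the paths from the steps above, not a confirmed fix):

import os
import findspark

# Make JAVA_HOME visible to the process that launches the JVM;
# the path is the one from step 2 above.
os.environ["JAVA_HOME"] = r"C:\Program Files\Java\jdk1.8.0_281"

# Point findspark at the Spark install directory explicitly instead of
# relying on the SPARK_HOME environment variable being picked up.
findspark.init(r"C:\spark\spark-3.1.1-bin-hadoop2.7")

import pyspark
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
spark.sql("select 'spark' as hello").show()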