Not able to run PySpark in PyCharm correctly

  apache-spark, pycharm, pyspark, python, windows

I have PySpark 3.1.2 and Python 3.8.3 installed on my Windows machine. All the paths are also properly set in the environment variables: SPARK_HOME, HADOOP_HOME and PATH. Still, I am facing the following error when I try to run this code: the system cannot find the file specified.

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, IntegerType
spark = SparkSession.builder.appName('SparkByExamples.com').getOrCreate()
data2 = [("James", "abs"),
     ("Michael", "Rose"),
     ]
schema = StructType([ 
StructField("firstname", StringType(), True), 
StructField("middlename", StringType(), True), 
])
df = spark.createDataFrame(data=data2, schema=schema)
df.printSchema()
df.show(truncate=False)
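
For completeness, this is how I checked the environment from the same interpreter that PyCharm runs the script with (just a small diagnostic sketch; the shutil.which calls are my own addition to see whether a "python3" executable is resolvable at all):

import os
import shutil
import sys

# Which interpreter PyCharm is actually running this script with
print("Interpreter:", sys.executable)

# The environment variables mentioned above
for name in ("SPARK_HOME", "HADOOP_HOME"):
    print(name, "=", os.environ.get(name))

# The executable the Spark worker apparently tries to launch
print("python3 on PATH:", shutil.which("python3"))
print("python on PATH:", shutil.which("python"))

On my machine SPARK_HOME and HADOOP_HOME print the expected folders.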

The error is below.

21/09/01 14:36:19 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.io.IOException: Cannot run program "python3": CreateProcess error=2, The system cannot find the file specified
    at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1128)
    at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1071)
    at org.apache.spark.api.python.PythonWorkerFactory.createSimpleWorker(PythonWorkerFactory.scala:165)
    .....
    .....
21/09/01 14:36:19 ERROR TaskSetManager: Task 0 in stage 0.0 failed 1 times; aborting job
Traceback (most recent call last):
  File "C:/Users/abc123/PycharmProjects/pythonProject/test.py", line 18, in <module>
    df.show(truncate=False)
  File "C:\Spark\spark-3.1.2-bin-hadoop3.2\python\lib\pyspark.zip\pyspark\sql\dataframe.py", line 486, in show
  File "C:\Spark\spark-3.1.2-bin-hadoop3.2\python\lib\py4j-0.10.9-src.zip\py4j\java_gateway.py", line 1304, in __call__
  File "C:\Spark\spark-3.1.2-bin-hadoop3.2\python\lib\pyspark.zip\pyspark\sql\utils.py", line 111, in deco
  File "C:\Spark\spark-3.1.2-bin-hadoop3.2\python\lib\py4j-0.10.9-src.zip\py4j\protocol.py", line 326, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o39.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0) (B****.a*.******.com executor driver): java.io.IOException: Cannot run program "python3": CreateProcess error=2, The system cannot find the file specified
    at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1128)
    at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1071)
....

Up to df.printSchema() everything works fine, but when I try to run actions like df.show() or df.count(), the above errors appear. All the paths are properly set up in my environment variables and Python itself runs properly, but I am still not able to resolve this issue.
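
If it helps narrow things down, the only workaround I can think of is to point Spark explicitly at the Windows interpreter through the PYSPARK_PYTHON / PYSPARK_DRIVER_PYTHON environment variables before the session is created, roughly like the sketch below, but I am not sure whether that is the right fix or whether it just hides a misconfiguration:

import os
import sys

# Point both the driver and the worker processes at the interpreter PyCharm uses
# (on Windows there is usually no "python3.exe", only "python.exe").
os.environ["PYSPARK_PYTHON"] = sys.executable
os.environ["PYSPARK_DRIVER_PYTHON"] = sys.executable

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('SparkByExamples.com').getOrCreate()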
Please guide me in solving the above issue.

Source: Windows Questions
