Saving files from PySpark in Jupyter Notebook (Conda install) fails, but works in the Scala shell

  apache-spark, conda, pyspark, python, windows

This is directly related to this problem: Spark & Scala: saveAsTextFile() exception

I hit the same error when I attempt to save a DataFrame to a CSV from Jupyter Notebook using PySpark. I created a very simple CSV to load and immediately save back (show() displays it in its entirety), but as soon as I attempt the save I get the UnsatisfiedLinkError.
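Roughly what I am running, as a minimal sketch (the paths are just placeholders for my actual files):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("csv-save-test").getOrCreate()

# Loading works, and show() prints the whole file
df = spark.read.csv("C:/tmp/simple.csv", header=True, inferSchema=True)
df.show()

# This is the step that throws the UnsatisfiedLinkError
df.write.csv("C:/tmp/simple_out", header=True)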

I followed every suggestion in the StackOverflow question above, but none of them helped. However, when I load and save the same CSV using spark-shell from CMD, everything works fine.

The PySpark I installed through Anaconda (Python 3.8) also doesn't pick up the HADOOP_HOME environment variable, so I have to set it manually:

import os
os.environ['HADOOP_HOME'] = r"C:\apps\hadoop-2.7.3"
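For completeness, this is the full setup I run at the top of the notebook. The assumption that the bin folder (with winutils.exe and hadoop.dll) also has to be on PATH, and that both variables must be set before the SparkSession's JVM starts, comes from the answers I tried; treat it as a sketch rather than a confirmed fix:

import os

# Assumption: hadoop-2.7.3 is unpacked here and bin\ holds winutils.exe and hadoop.dll
os.environ['HADOOP_HOME'] = r"C:\apps\hadoop-2.7.3"
# Put bin on PATH so the JVM can locate the native libraries
os.environ['PATH'] = os.environ['HADOOP_HOME'] + r"\bin;" + os.environ['PATH']

# Both variables have to be in place before the first SparkSession is
# created, because the JVM inherits the environment at launch
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()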

I have already tried every suggestion I could find on Stack Overflow, and I'm confused as to why it works in spark-shell but not in PySpark in a notebook. I can also run Hadoop from PowerShell without any problem.
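In case it helps diagnose this, here is a quick check I can run from the same notebook to see whether the two native files the error usually points at are where I expect them (the default path is just my install location):

import os

hadoop_home = os.environ.get('HADOOP_HOME', r"C:\apps\hadoop-2.7.3")  # assumed install location
for name in ("winutils.exe", "hadoop.dll"):
    path = os.path.join(hadoop_home, "bin", name)
    print(path, "->", "found" if os.path.exists(path) else "MISSING")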

