Category : apache-spark

I am trying to install Apache Spark on my system; these are the configurations I made, but it shows "The filename, directory name, or volume label syntax is incorrect." Java version: 1.8.0, Python version: 3.11.0, Spark version: 3.2.0, Hadoop winutils version: 3.2. I also set the paths in environment variables: HADOOP_HOME C:\bigdata\hadoop SPARK_HOME C:\bigdata\spark Path %HADOOP_HOME%\bin; ..
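A quick way to see why a missing backslash after the drive letter trips this error: on Windows, `C:bigdata\hadoop` is a *drive-relative* path, not an absolute one, and many tools reject it with exactly this message. A minimal check with Python's `ntpath` (path names mirror the question; the restored backslashes are the assumed intent):

```python
import ntpath

# Without a backslash after "C:", the path is relative to the current
# directory on drive C: -- this drive-relative form is what commonly
# triggers "The filename, directory name, or volume label syntax is
# incorrect." in Windows tooling.
print(ntpath.isabs(r"C:bigdata\hadoop"))    # False (drive-relative)
print(ntpath.isabs(r"C:\bigdata\hadoop"))   # True  (absolute)
```

So the environment variables should carry the fully qualified forms, e.g. `HADOOP_HOME=C:\bigdata\hadoop` and `%HADOOP_HOME%\bin` on the Path.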

Read more

I’ve been trying to read from a .csv file in many ways, using a SparkContext object. I found it possible through the scala.io.Source.fromFile function, but I want to use a Spark object. Every time I run the textFile function of org.apache.spark.SparkContext I get the same error: scala> sparkSession.read.csv("file://C:\Users\184229\Desktop\bigdata.csv") 21/12/29 16:47:32 WARN streaming.FileStreamSink: Error while looking for metadata directory. java.lang.UnsupportedOperationException: Not ..
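Part of the problem is the URI itself: a local-file URI needs an empty authority (three slashes) and forward slashes, so `file://C:\...` is parsed as host "C:" and fails. A small sketch using Python's `pathlib` to build the correct URI for the same path (Spark's `spark.read.csv` accepts this form in Scala and Python alike):

```python
from pathlib import PureWindowsPath

# as_uri() produces the canonical file URI: scheme, empty authority,
# and forward slashes -- file:///C:/..., not file://C:\...
uri = PureWindowsPath(r"C:\Users\184229\Desktop\bigdata.csv").as_uri()
print(uri)  # file:///C:/Users/184229/Desktop/bigdata.csv
```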

Read more

I have a scenario in which we connect Apache Spark to SQL Server, load table data into Spark, and generate Parquet files from it; here is a snippet of my code. val database = "testdb" val jdbcDF = (spark.read.format("jdbc") .option("url", "jdbc:sqlserver://DESKTOP-694SPLH:1433;integratedSecurity=true;databaseName="+database) .option("dbtable", "employee") .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver") .load()) jdbcDF.write.parquet("/tmp/output/people.parquet") now it works fine in ..
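For reference, the connection string in the snippet is a semicolon-separated option list with the database name concatenated on; the same assembly in plain Python (host and options copied from the question):

```python
database = "testdb"

# SQL Server JDBC URLs take the form
#   jdbc:sqlserver://host:port;key=value;key=value;...
# with options separated by semicolons.
url = ("jdbc:sqlserver://DESKTOP-694SPLH:1433;"
       "integratedSecurity=true;"
       f"databaseName={database}")
print(url)
```

Note, as a hedged aside, that `integratedSecurity=true` additionally requires the SQL Server JDBC authentication DLL (e.g. `mssql-jdbc_auth-*.dll`) on the JVM's `java.library.path`; a missing DLL is a common follow-up failure when the same job is moved to another machine.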

Read more

When I execute run-example SparkPi, for example, it works perfectly, but when I run spark-shell, it throws these warnings: WARNING: An illegal reflective access operation has occurred WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/C:/big_data/spark-3.2.0-bin-hadoop3.2-scala2.13/jars/spark-unsafe_2.13-3.2.0.jar) to constructor java.nio.DirectByteBuffer(long,int) WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform WARNING: Use --illegal-access=warn to enable warnings of further ..

Read more

I am trying to install and run PySpark on a Windows 11 machine. I am using lines = spark.readStream.format("socket").option("host", "127.0.0.1").option("port", 5555).load() to connect to a local port; however, what happens is: it connects to the local port; the app on the other end sends data, but it is never received in Spark; Spark exits with Caused by: ..

Read more

When I type pyspark in my cmd prompt or Windows PowerShell I get 'C:\Users\bill\AppData\Local\Programs\Python\Python39' is not recognized as an internal or external command, operable program or batch file. and when I run pyspark in Git Bash it returns set PYSPARK_SUBMIT_ARGS="--name" "PySparkShell" "pyspark-shell" && C:\Users\bill\AppData\Local\Programs\Python\Python39 C:\Users\bill\Spark\spark-3.2.0-bin-hadoop2.7/bin/spark-class: line 96: CMD: bad array subscript However, Python is set ..
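When the launcher scripts cannot invoke the interpreter, one usual remedy is to point Spark at the executable explicitly via `PYSPARK_PYTHON`. A hedged sketch (the `python.exe` suffix is an assumption here -- the truncated path in the error message stops at the directory):

```python
import os

# PYSPARK_PYTHON tells Spark which Python interpreter to launch.
# The full path to python.exe is an assumption; adjust to your install.
os.environ["PYSPARK_PYTHON"] = (
    r"C:\Users\bill\AppData\Local\Programs\Python\Python39\python.exe"
)
print(os.environ["PYSPARK_PYTHON"])
```

Setting the same value as a system environment variable (rather than in-process) is what the cmd and Git Bash launchers would actually pick up.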

Read more

I tried various steps suggested on the internet: tried to download DirectX, downloaded the latest Microsoft Visual C++ Redistributable package, turned on .NET Framework in "Windows Features On and Off", tried to reset the PC. Still it's not working. Any suggestions to resolve this issue would be helpful. Source: Windows..

Read more

Unable to run Spark To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel). 21/11/01 23:10:54 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable java.lang.IllegalAccessError: class org.apache.spark.storage.StorageUtils$ (in unnamed module @0x6a6afff2) cannot access class sun.nio.ch.DirectBuffer (in module java.base) because module java.base does not export sun.nio.ch to unnamed module ..
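This `IllegalAccessError` is the well-known Spark 3.2-on-Java-17 module problem: since JDK 17, `java.base` no longer exports `sun.nio.ch` to unnamed modules, which `StorageUtils` needs. The usual workarounds are running Spark 3.2 on Java 8 or 11, or re-exporting the package. A sketch of the latter via `JAVA_TOOL_OPTIONS` (an environment variable every JVM honors; treat this single flag as a minimal assumption -- some setups need additional `--add-opens` flags):

```python
import os

# --add-exports makes sun.nio.ch visible to Spark's unnamed module so
# StorageUtils$ can reference sun.nio.ch.DirectBuffer; JAVA_TOOL_OPTIONS
# is read by any JVM that Spark subsequently launches.
os.environ["JAVA_TOOL_OPTIONS"] = (
    "--add-exports=java.base/sun.nio.ch=ALL-UNNAMED"
)
print(os.environ["JAVA_TOOL_OPTIONS"])
```

The same flag can instead be passed through `spark.driver.extraJavaOptions` if you prefer to keep it in Spark configuration rather than the environment.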

Read more