I got the idea of using pre-defined functions to do this: calculate "a + b", "c * 5", and "d * 3", then add the results. But this approach seems to generate a lot of code. Is there a better way to do this? By the way, does Apache Arrow (C++ version) use SIMD by default? If ..
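To make the trade-off in the excerpt above concrete, here is a minimal sketch in plain C++ over std::vector, not the Arrow API. It contrasts the step-by-step approach (one pre-defined function per operation, each producing a temporary column) with a single fused loop. The names add_columns, scale_column, eval_stepwise and eval_fused are hypothetical, and double columns of equal length are an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Column = std::vector<double>;  // assumption: columns hold doubles

// Pre-defined building block: element-wise x + y (allocates a temporary).
Column add_columns(const Column& x, const Column& y) {
    assert(x.size() == y.size());
    Column out(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) out[i] = x[i] + y[i];
    return out;
}

// Pre-defined building block: element-wise x * k (allocates a temporary).
Column scale_column(const Column& x, double k) {
    Column out(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) out[i] = x[i] * k;
    return out;
}

// Step-by-step evaluation of (a + b) + c * 5 + d * 3:
// five function calls and five temporary columns.
Column eval_stepwise(const Column& a, const Column& b,
                     const Column& c, const Column& d) {
    Column ab = add_columns(a, b);
    Column c5 = scale_column(c, 5.0);
    Column d3 = scale_column(d, 3.0);
    return add_columns(add_columns(ab, c5), d3);
}

// Fused evaluation: one pass over the data, one output allocation.
Column eval_fused(const Column& a, const Column& b,
                  const Column& c, const Column& d) {
    assert(a.size() == b.size() && a.size() == c.size() && a.size() == d.size());
    Column out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        out[i] = (a[i] + b[i]) + c[i] * 5.0 + d[i] * 3.0;
    }
    return out;
}
```

Both functions compute the same values; the fused version simply avoids the intermediate columns. Whether a compiler vectorizes either loop depends on the build flags, so this sketch says nothing about the SIMD part of the question.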
When reading the same Feather file in Python and in C++, the Python function pyarrow.feather.read_table() performs exceptionally well compared with the API I am using in C++. When I investigated further, I found that the main difference is that in Python the read_table() API uses a flag named memory_map (set to true by default). When I disable ..
Here is a CMake snippet that I use to link to the libpqxx library installed by vcpkg while trying to use the apache-arrow package installed by brew. find_package(libpqxx REQUIRED) target_link_libraries(database PRIVATE libpqxx::pqxx) find_package(Arrow CONFIG REQUIRED PATHS /usr/local/lib/cmake/arrow NO_DEFAULT_PATH) find_package(Parquet CONFIG REQUIRED PATHS /usr/local/lib/cmake/arrow NO_DEFAULT_PATH) target_link_libraries(database PRIVATE arrow_shared parquet_shared) To run it I use: ..
Below is the solution that worked for me, but I am not sure whether it is the best way to do this. I used brew to install it. Unfortunately, vcpkg does not work at the moment. What I don’t like about this solution is that I need to set Parquet_DIR and call find_package(Parquet) separately. set(Parquet_DIR /usr/local/lib/cmake/arrow) find_package(Arrow CONFIG ..
Recently, I have been working on a project in which we read CSV files, convert them to Parquet and Feather files, and write them to disk. Reading Parquet and Feather files is another feature we would like to have in the project. Now, I want to perform each and every process described to ..
I have written code to read the same Parquet file using C++ and using Python. Reading the file takes much less time in Python than in C++, even though C++ code generally executes faster than Python. I have attached the code here: #include <arrow/api.h> #include <parquet/arrow/reader.h> #include ..
I wrote C++ code to read a Feather file and insert the data into an arrow::Table, but it gives a segmentation fault if the file contains any column with the datatype arrow::large_utf8. The segfault occurs for this datatype only; there are no errors for the utf8/int/float datatypes. I think there is something wrong with the Feather API implementation ..
I’m trying to compile the examples under cpp, starting with minimal_build. I don’t have much CMake experience. Must this be run under Docker, or can it just be compiled in a Linux shell? I’m running CentOS 7 on an AWS EC2 instance, and I’ve installed cmake 3.20.2. Executing sudo ./run.sh fails immediately with "cd: /io: No ..
I am trying to read CSV input using Apache Arrow. The example here mentions that the input should be an InputStream; however, in my case I just have a std::vector of unsigned chars. Is it possible to parse this using Apache Arrow? I have checked the I/O interface to see if there is ..
I’m trying to use arrow-cpp to build a Table and then transfer it back to Python. To do that, I need to call arrow::py::import_pyarrow() beforehand, but this causes a SEGFAULT. Can anyone help me find what I did wrong? Here is a minimal example: CMakeLists.txt cmake_minimum_required(VERSION 3.20.0) project(TEST) set(CMAKE_CXX_STANDARD 17) set(Python3_EXECUTABLE "/home/auderson/miniconda3/bin/python3.8") list(APPEND ..