Category: parquet

I’m trying to write Parquet files on Windows using C++. I followed the instructions I found here and chose the "Using conda-forge for build dependencies" and "Building using Visual Studio (MSVC) Solution Files" approaches. In contrast to the article on the page mentioned before, my calls to cmake look like this: cmake .. -G "Visual ..


I am trying to read ORC files in chunks. I have a very large ORC file on disk (say, 100 GB) and very limited memory (e.g., I can buffer at most 1 MB of data in memory). I want to scan the ORC file intelligently: read the footer, get the addresses of the stripes, read the first stripe’s metadata (footer), and apply some filters ..
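The scan described in this excerpt can be sketched roughly as follows. This is only an illustration under stated assumptions: `StripeInfo`, `ByteRange`, `select_stripes`, and `plan_reads` are hypothetical names and not part of the Apache ORC API, the per-stripe minimum and maximum stand in for whatever statistics the footer provides, and compression and stream boundaries inside a stripe are ignored.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the per-stripe metadata found in a file footer.
// These names are illustrative and are not the Apache ORC API.
struct StripeInfo {
    std::uint64_t offset;    // byte offset of the stripe in the file
    std::uint64_t length;    // byte length of the stripe
    std::int64_t min_value;  // column minimum recorded in the statistics
    std::int64_t max_value;  // column maximum recorded in the statistics
};

// A contiguous byte range to read from disk.
struct ByteRange {
    std::uint64_t offset;
    std::uint64_t length;
};

// Return the indices of stripes whose [min_value, max_value] range overlaps
// the filter range [lo, hi]. All other stripes can be skipped unread.
std::vector<std::size_t> select_stripes(const std::vector<StripeInfo>& stripes,
                                        std::int64_t lo, std::int64_t hi) {
    assert(lo <= hi);
    std::vector<std::size_t> selected;
    for (std::size_t i = 0; i < stripes.size(); ++i) {
        if (stripes[i].max_value >= lo && stripes[i].min_value <= hi) {
            selected.push_back(i);
        }
    }
    return selected;
}

// Split one stripe's byte range into reads no larger than buffer_limit, so
// that at most buffer_limit bytes need to be buffered at any time.
std::vector<ByteRange> plan_reads(const StripeInfo& stripe,
                                  std::uint64_t buffer_limit) {
    assert(buffer_limit > 0);
    std::vector<ByteRange> reads;
    std::uint64_t done = 0;
    while (done < stripe.length) {
        const std::uint64_t n = std::min(buffer_limit, stripe.length - done);
        reads.push_back({stripe.offset + done, n});
        done += n;
    }
    return reads;
}
```

With a 1 MB limit, `plan_reads(stripe, 1 << 20)` would yield a sequence of ranges that could be fetched one at a time; whether a partial range can actually be decoded depends on how the real file format lays out and compresses its streams.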


Example: Let’s say a table named user has id, name, email, phone, and is_active as attributes, and thousands of users are stored in this table. I would like to read the details per user.
void ParquetReaderPlus::read_next_row(long row_group_index, long local_row_num) { std::vector<int> columns_to_tabulate(this->total_row); for (int idx = 0; idx < this->total_row; idx++) columns_to_tabulate[idx] = idx; ..
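Since the function in this excerpt takes a row group index and a row number local to that group, one way to address a single user is to translate a global row number into that pair first. The sketch below assumes the number of rows in each row group is already known (for example, read from the file metadata); `locate_row` and `rows_per_group` are hypothetical names and do not come from the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <utility>
#include <vector>

// Map a global row number to (row_group_index, local_row_num).
// rows_per_group[i] is the number of rows stored in row group i.
// Hypothetical helper; the excerpt does not show how the author does this.
std::pair<long, long> locate_row(const std::vector<std::int64_t>& rows_per_group,
                                 std::int64_t global_row) {
    if (global_row < 0) {
        throw std::out_of_range("negative row number");
    }
    std::int64_t remaining = global_row;
    for (std::size_t g = 0; g < rows_per_group.size(); ++g) {
        assert(rows_per_group[g] >= 0);  // row counts are never negative
        if (remaining < rows_per_group[g]) {
            // The requested row lives in this group.
            return {static_cast<long>(g), static_cast<long>(remaining)};
        }
        remaining -= rows_per_group[g];  // skip past this whole group
    }
    throw std::out_of_range("row number past end of file");
}
```

The returned pair could then be passed to a function with the same parameters as `read_next_row`. For many lookups, a precomputed prefix sum with a binary search would avoid the linear walk over row groups.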
