How to enable memory mapping while reading a Feather file in C++

  apache-arrow, c++, feather, pyarrow, python

When reading the same Feather file in Python and in C++, the Python function pyarrow.feather.read_table() performs exceptionally well compared with the API I am using in C++. When I investigated further, I found that the main difference is that read_table() in Python takes a flag named memory_map (set to true by default). When I disable this flag, the C++ API performs slightly better than read_table() in Python. This suggests that C++ is not using memory mapping by default, and I want to use it to improve performance. Please suggest ways to use memory mapping in C++, as I didn't find any relevant APIs in the C++ documentation.
The code I am using is:

#!/usr/bin/env python3
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq
import pyarrow.feather
import time

for i in range(20):
    start_time = time.time()

    table = pyarrow.feather.read_table('data'+str(i + 1)+'.feather')
    end_time = time.time()
    print("Time taken to read file is : ",(end_time - start_time)*1000,"ms")

The C++ code is:

// Note: file_system is defined elsewhere and not shown in this snippet
// (presumably an arrow::fs::LocalFileSystem instance).
void read_feather_to_table(std::string path, std::shared_ptr<arrow::Table> *feather_table) {
    std::shared_ptr<arrow::io::RandomAccessFile> input_file = file_system.OpenInputFile(path).ValueOrDie();
    std::shared_ptr<arrow::ipc::feather::Reader> feather_reader = arrow::ipc::feather::Reader::Open(input_file).ValueOrDie();
    // Time only the Read() call, not the file open.
    auto t1 = std::chrono::high_resolution_clock::now();
    arrow::Status read_status = feather_reader->Read(feather_table);
    auto t2 = std::chrono::high_resolution_clock::now();
    auto ms_int = std::chrono::duration_cast<std::chrono::milliseconds>(t2 - t1);
    std::cout << "Time taken to read file is : " << ms_int.count() << " ms\n";
    // Report a failed read instead of silently discarding the status.
    if (!read_status.ok()) std::cerr << read_status.ToString() << "\n";
}

Source: Windows Questions C++
