Category: apache-arrow

I’m trying to open a Hive-partitioned Parquet dataset, which is essentially a nested directory with many small Parquet fragments at the bottom level. When working with Python, I can just call read_table on the top-level directory, whose name ends in .parquet, and everything is handled automatically. If I’m working with just ..

I am attempting to use the C++ StreamWriter class provided by Apache Arrow. The only example of using StreamWriter uses the low-level Parquet API, i.e.:

    parquet::schema::NodeVector fields;

    fields.push_back(parquet::schema::PrimitiveNode::Make(
        "string_field", parquet::Repetition::OPTIONAL,
        parquet::Type::BYTE_ARRAY, parquet::ConvertedType::UTF8));

    fields.push_back(parquet::schema::PrimitiveNode::Make(
        "char_field", parquet::Repetition::REQUIRED,
        parquet::Type::FIXED_LEN_BYTE_ARRAY, parquet::ConvertedType::NONE, 1));

    auto node = std::static_pointer_cast<parquet::schema::GroupNode>(
        parquet::schema::GroupNode::Make("schema", parquet::Repetition::REQUIRED, fields));

The end result is a std::shared_ptr<parquet::schema::GroupNode> that can then be ..