@norberttech (Member)

Resolves: #1766

Change Log


Added

  • Allow setting a custom compression algorithm per Parquet column

Fixed

Changed

Removed

Deprecated

Security
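The Added entry describes per-column compression. In the Parquet format the codec is recorded per column chunk (`ColumnMetaData.codec`), so different columns in one file can use different codecs. The following is a minimal sketch of the idea, assuming a file-wide default codec plus per-column overrides keyed by column path. `Compression`, `ColumnCompressions`, `withColumn()` and `compressionFor()` are hypothetical names used only for illustration. They are not Flow PHP's actual API, which this PR does not show. The sketch requires PHP 8.1+ for enums and readonly properties.

```php
<?php

declare(strict_types=1);

// HYPOTHETICAL: stand-in for whatever compression enum the library really uses.
enum Compression : string
{
    case UNCOMPRESSED = 'uncompressed';
    case SNAPPY = 'snappy';
    case GZIP = 'gzip';
}

// HYPOTHETICAL: immutable map of column path => codec, with a file-wide default.
final class ColumnCompressions
{
    /** @var array<string, Compression> */
    private array $overrides = [];

    public function __construct(private readonly Compression $default)
    {
    }

    // Returns a copy with an override for one column; the original is left unchanged.
    public function withColumn(string $columnPath, Compression $compression) : self
    {
        $copy = clone $this;
        $copy->overrides[$columnPath] = $compression;

        return $copy;
    }

    // Columns without an explicit override fall back to the default codec.
    public function compressionFor(string $columnPath) : Compression
    {
        return $this->overrides[$columnPath] ?? $this->default;
    }
}

$compressions = (new ColumnCompressions(Compression::SNAPPY))
    ->withColumn('payload', Compression::GZIP);

echo $compressions->compressionFor('payload')->value, PHP_EOL; // gzip
echo $compressions->compressionFor('id')->value, PHP_EOL;      // snappy
```

The immutable `with…` style keeps a shared default configuration from being mutated when one writer needs a different codec for a single column.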

* @return null|array<mixed>|bool|float|int
*/
public function get(Option $option) : bool|int|float|array|ColumnsEncodings|null
public function get(Option $option) : bool|int|float|array|null
Member Author


I'm reverting that change in favor of a more generic approach.

codecov bot commented Jul 14, 2025

Codecov Report

Attention: Patch coverage is 77.41935% with 7 lines in your changes missing coverage. Please review.

Project coverage is 81.78%. Comparing base (894ce19) to head (3a38adf).
Report is 2 commits behind head on 1.x.

✅ All tests successful. No failed tests found.

Additional details and impacted files
@@            Coverage Diff             @@
##              1.x    #1769      +/-   ##
==========================================
- Coverage   81.82%   81.78%   -0.04%     
==========================================
  Files         727      726       -1     
  Lines       20879    20829      -50     
==========================================
- Hits        17084    17036      -48     
+ Misses       3795     3793       -2     
+------------------------------+-----------------------------+
| component                    | coverage Δ                  |
+------------------------------+-----------------------------+
| etl                          | 88.41% <ø> (ø)              |
| cli                          | 85.46% <ø> (ø)              |
| lib-array-dot                | 94.56% <ø> (ø)              |
| lib-azure-sdk                | 61.35% <ø> (ø)              |
| lib-doctrine-dbal-bulk       | 93.88% <ø> (ø)              |
| lib-filesystem               | 78.02% <ø> (ø)              |
| lib-types                    | 53.55% <ø> (ø)              |
| lib-parquet                  | 85.48% <77.41%> (-0.14%) ⬇️ |
| lib-parquet-viewer           | 83.11% <ø> (ø)              |
| lib-snappy                   | 90.23% <ø> (+0.46%) ⬆️      |
| bridge-filesystem-async-aws  | 90.38% <ø> (ø)              |
| bridge-filesystem-azure      | 89.92% <ø> (ø)              |
| bridge-monolog-http          | 97.04% <ø> (ø)              |
| bridge-openapi-specification | 93.16% <ø> (ø)              |
| symfony-http-foundation      | 74.41% <ø> (ø)              |
| adapter-chartjs              | 86.70% <ø> (ø)              |
| adapter-csv                  | 88.85% <ø> (ø)              |
| adapter-doctrine             | 89.89% <ø> (ø)              |
| adapter-elasticsearch        | 97.23% <ø> (ø)              |
| adapter-google-sheet         | 83.87% <ø> (ø)              |
| adapter-http                 | 58.10% <ø> (ø)              |
| adapter-json                 | 87.98% <ø> (ø)              |
| adapter-logger               | 53.84% <ø> (ø)              |
| adapter-meilisearch          | 97.95% <ø> (ø)              |
| adapter-parquet              | 78.92% <ø> (ø)              |
| adapter-text                 | 84.44% <ø> (ø)              |
| adapter-xml                  | 82.73% <ø> (ø)              |
+------------------------------+-----------------------------+

github-actions bot commented Jul 14, 2025

Flow PHP - Benchmarks

Results of the benchmarks from this PR are compared with the results from the 1.x branch.

Extractors
+-----------------------+------------------------+------+-----+-----------------+------------------+-----------------+
| benchmark             | subject                | revs | its | mem_peak        | mode             | rstdev          |
+-----------------------+------------------------+------+-----+-----------------+------------------+-----------------+
| CSVExtractorBench     | bench_extract_10k      | 1    | 3   | 4.871mb -0.03%  | 443.685ms +1.04% | ±0.86% +16.71%  |
| ExcelExtractorBench   | bench_extract_10k_ods  | 1    | 3   | 65.566mb -0.00% | 1.057s -0.94%    | ±0.84% +15.39%  |
| ExcelExtractorBench   | bench_extract_10k_xlsx | 1    | 3   | 67.666mb -0.00% | 1.692s -0.58%    | ±0.98% -45.70%  |
| JsonExtractorBench    | bench_extract_10k      | 1    | 3   | 5.463mb -0.02%  | 1.128s -3.31%    | ±1.88% +42.03%  |
| ParquetExtractorBench | bench_extract_10k      | 1    | 3   | 10.671mb -0.20% | 9.233s -19.67%   | ±0.41% +44.47%  |
| TextExtractorBench    | bench_extract_10k      | 1    | 3   | 4.594mb -0.03%  | 42.032ms -0.03%  | ±0.93% +148.75% |
| XmlExtractorBench     | bench_extract_10k      | 1    | 3   | 4.579mb -0.03%  | 591.227ms -2.01% | ±0.38% -53.03%  |
+-----------------------+------------------------+------+-----+-----------------+------------------+-----------------+
Transformers
+---------------------------------+--------------------------+------+-----+------------------+-----------------+-----------------+
| benchmark                       | subject                  | revs | its | mem_peak         | mode            | rstdev          |
+---------------------------------+--------------------------+------+-----+------------------+-----------------+-----------------+
| RenameEachEntryTransformerBench | bench_transform_10k_rows | 1    | 3   | 18.590mb -0.01%  | 72.368ms -1.79% | ±1.69% +115.23% |
| RenameEntryTransformerBench     | bench_transform_10k_rows | 1    | 3   | 123.328mb -0.00% | 66.541ms -1.05% | ±1.16% +238.12% |
+---------------------------------+--------------------------+------+-----+------------------+-----------------+-----------------+
Loaders
+--------------------+----------------+------+-----+------------------+-----------------+----------------+
| benchmark          | subject        | revs | its | mem_peak         | mode            | rstdev         |
+--------------------+----------------+------+-----+------------------+-----------------+----------------+
| CSVLoaderBench     | bench_load_10k | 1    | 3   | 62.532mb -0.00%  | 82.101ms -4.73% | ±1.78% +98.64% |
| JsonLoaderBench    | bench_load_10k | 1    | 3   | 80.613mb -0.00%  | 99.603ms -4.07% | ±0.78% -16.07% |
| ParquetLoaderBench | bench_load_10k | 1    | 3   | 835.158mb +0.02% | 18.815s -30.65% | ±0.42% +58.78% |
| TextLoaderBench    | bench_load_10k | 1    | 3   | 17.896mb -0.01%  | 28.611ms -3.50% | ±0.94% -11.72% |
+--------------------+----------------+------+-----+------------------+-----------------+----------------+
Building Blocks
+-------------------+----------------------------+------+-----+------------------+------------------+-----------------+
| benchmark         | subject                    | revs | its | mem_peak         | mode             | rstdev          |
+-------------------+----------------------------+------+-----+------------------+------------------+-----------------+
| EntryFactoryBench | bench_entry_factory        | 1    | 3   | 106.010mb -0.00% | 651.154ms -1.07% | ±1.27% +25.73%  |
| EntryFactoryBench | bench_entry_factory        | 1    | 3   | 55.285mb -0.00%  | 323.555ms -5.98% | ±1.34% -13.21%  |
| EntryFactoryBench | bench_entry_factory        | 1    | 3   | 14.871mb -0.01%  | 69.278ms -1.64%  | ±1.83% +92.55%  |
| TypeDetectorBench | bench_type_detector        | 1    | 3   | 42.541mb -0.00%  | 407.314ms -0.58% | ±1.41% +23.99%  |
| TypeDetectorBench | bench_type_detector        | 1    | 3   | 11.598mb -0.01%  | 81.595ms -1.23%  | ±0.12% -92.04%  |
| RowsBench         | bench_chunk_10_on_10k      | 2    | 3   | 93.481mb -0.00%  | 3.160ms -16.57%  | ±2.97% +44.99%  |
| RowsBench         | bench_diff_left_1k_on_10k  | 2    | 3   | 110.852mb -0.00% | 236.915ms -0.98% | ±0.66% +153.05% |
| RowsBench         | bench_diff_right_1k_on_10k | 2    | 3   | 93.572mb -0.00%  | 23.602ms -1.41%  | ±0.94% -50.87%  |
| RowsBench         | bench_drop_1k_on_10k       | 2    | 3   | 94.356mb -0.00%  | 1.252ms -18.40%  | ±2.17% -13.35%  |
| RowsBench         | bench_drop_right_1k_on_10k | 2    | 3   | 94.356mb -0.00%  | 1.372ms -21.01%  | ±0.30% -78.07%  |
| RowsBench         | bench_entries_on_10k       | 2    | 3   | 92.516mb -0.00%  | 3.235ms -5.95%   | ±0.48% -82.23%  |
| RowsBench         | bench_filter_on_10k        | 2    | 3   | 93.045mb -0.00%  | 15.105ms -0.54%  | ±1.85% +44.79%  |
| RowsBench         | bench_find_on_10k          | 2    | 3   | 93.045mb -0.00%  | 15.060ms -1.60%  | ±0.62% +193.70% |
| RowsBench         | bench_find_one_on_10k      | 10   | 3   | 91.734mb -0.00%  | 1.700μs -5.56%   | ±0.00% 0.00%    |
| RowsBench         | bench_first_on_10k         | 10   | 3   | 91.734mb -0.00%  | 0.300μs 0.00%    | ±0.00% 0.00%    |
| RowsBench         | bench_flat_map_on_1k       | 2    | 3   | 100.795mb -0.00% | 14.347ms -1.55%  | ±0.91% -70.65%  |
| RowsBench         | bench_map_on_10k           | 2    | 3   | 130.222mb -0.00% | 67.741ms -2.39%  | ±0.56% -3.48%   |
| RowsBench         | bench_merge_1k_on_10k      | 2    | 3   | 93.565mb -0.00%  | 1.265ms -8.52%   | ±1.45% -45.45%  |
| RowsBench         | bench_partition_by_on_10k  | 2    | 3   | 96.934mb -0.00%  | 60.067ms -3.53%  | ±1.12% -14.29%  |
| RowsBench         | bench_remove_on_10k        | 2    | 3   | 94.618mb -0.00%  | 3.366ms -11.57%  | ±0.13% -96.25%  |
| RowsBench         | bench_sort_asc_on_1k       | 2    | 3   | 92.096mb -0.00%  | 39.360ms -0.44%  | ±1.07% +18.67%  |
| RowsBench         | bench_sort_by_on_1k        | 2    | 3   | 92.096mb -0.00%  | 39.757ms +0.23%  | ±0.47% +103.55% |
| RowsBench         | bench_sort_desc_on_1k      | 2    | 3   | 92.096mb -0.00%  | 39.369ms -0.76%  | ±1.28% -43.62%  |
| RowsBench         | bench_sort_entries_on_1k   | 2    | 3   | 94.177mb -0.00%  | 7.869ms -5.12%   | ±0.61% -76.38%  |
| RowsBench         | bench_sort_on_1k           | 2    | 3   | 91.927mb -0.00%  | 29.532ms +0.24%  | ±1.72% +174.51% |
| RowsBench         | bench_take_1k_on_10k       | 10   | 3   | 91.734mb -0.00%  | 14.376μs 0.00%   | ±1.32% 0.00%    |
| RowsBench         | bench_take_right_1k_on_10k | 10   | 3   | 91.734mb -0.00%  | 15.388μs -5.01%  | ±1.69% +11.75%  |
| RowsBench         | bench_unique_on_1k         | 2    | 3   | 110.853mb -0.00% | 241.237ms +0.80% | ±0.19% -50.06%  |
+-------------------+----------------------------+------+-----+------------------+------------------+-----------------+
Parquet Library
+--------------------+---------------------------------+------+-----+----------------+-------------------+-----------------+
| benchmark          | subject                         | revs | its | mem_peak       | mode              | rstdev          |
+--------------------+---------------------------------+------+-----+----------------+-------------------+-----------------+
| ParquetReaderBench | bench_page_headers              | 1    | 3   | 6.669mb -0.04% | 3.284s -1.07%     | ±0.55% +25.52%  |
| ParquetReaderBench | bench_read_metadata             | 1    | 3   | 5.354mb -0.05% | 18.157ms +0.18%   | ±0.60% +114.96% |
| ParquetReaderBench | bench_read_schema               | 1    | 3   | 5.354mb -0.05% | 18.011ms -0.34%   | ±0.18% -60.35%  |
| ParquetReaderBench | bench_read_values_all_columns   | 1    | 3   | 9.104mb -0.23% | 5.607s -28.54%    | ±0.29% -60.60%  |
| ParquetReaderBench | bench_read_values_single_column | 1    | 3   | 6.401mb -0.33% | 232.478ms -49.40% | ±1.62% +269.75% |
| ParquetReaderBench | bench_read_values_with_limit    | 1    | 3   | 6.932mb -0.48% | 28.615ms -14.33%  | ±0.64% +33.05%  |
| ParquetWriterBench | bench_write_batch               | 1    | 3   | 9.855mb -4.00% | 163.433ms -19.24% | ±0.64% -38.37%  |
| ParquetWriterBench | bench_write_gzip                | 1    | 3   | 9.817mb +0.00% | 177.527ms -2.15%  | ±1.01% +87.42%  |
| ParquetWriterBench | bench_write_row_by_row          | 1    | 3   | 9.855mb -4.00% | 163.649ms -18.34% | ±0.30% -25.81%  |
| ParquetWriterBench | bench_write_snappy              | 1    | 3   | 9.855mb -4.00% | 163.679ms -18.72% | ±0.41% +30.85%  |
| ParquetWriterBench | bench_write_uncompressed        | 1    | 3   | 9.632mb +0.00% | 165.292ms -1.10%  | ±1.21% +106.28% |
+--------------------+---------------------------------+------+-----+----------------+-------------------+-----------------+

norberttech merged commit ae8e1b4 into 1.x on Jul 14, 2025
19 of 21 checks passed
norberttech deleted the 1766-proposal-allow-to-chose-a-specific-compression-for-parquet-columns branch on July 14, 2025 19:28

Development

Successfully merging this pull request may close these issues.

[Proposal]: Allow to chose a specific compression for parquet columns

2 participants