@stloyd
Member

@stloyd stloyd commented Sep 24, 2025

Resolves: #1865

Note: this PR does not change XMLReaderExtractor, since it is already marked as deprecated.

Change Log


Added

  • [XMLParserExtractor] Add support for Schema

Fixed

Changed

Removed

Deprecated

Security

@stloyd stloyd changed the title [XMLParserExtractor] Add support for Schema [XMLParserExtractor] Add support for Schema Sep 24, 2025
@codecov

codecov bot commented Sep 24, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 82.53%. Comparing base (0b878f3) to head (262f5ed).
⚠️ Report is 2 commits behind head on 1.x.

Additional details and impacted files
@@            Coverage Diff             @@
##              1.x    #1866      +/-   ##
==========================================
- Coverage   82.53%   82.53%   -0.01%     
==========================================
  Files         772      772              
  Lines       21805    21808       +3     
==========================================
+ Hits        17997    17999       +2     
- Misses       3808     3809       +1     
+------------------------------+------------------------------+
| Components                   | Coverage Δ                   |
+------------------------------+------------------------------+
| etl                          | 89.25% <ø> (ø)               |
| cli                          | 85.91% <ø> (ø)               |
| lib-array-dot                | 94.56% <ø> (ø)               |
| lib-azure-sdk                | 61.35% <ø> (ø)               |
| lib-doctrine-dbal-bulk       | 95.59% <ø> (ø)               |
| lib-filesystem               | 80.25% <ø> (ø)               |
| lib-types                    | 53.55% <ø> (ø)               |
| lib-parquet                  | 85.50% <ø> (ø)               |
| lib-parquet-viewer           | 83.11% <ø> (ø)               |
| lib-snappy                   | 90.23% <ø> (-0.47%) ⬇️       |
| bridge-filesystem-async-aws  | 90.38% <ø> (ø)               |
| bridge-filesystem-azure      | 89.92% <ø> (ø)               |
| bridge-monolog-http          | 97.04% <ø> (ø)               |
| bridge-openapi-specification | 94.52% <ø> (ø)               |
| symfony-http-foundation      | 74.41% <ø> (ø)               |
| adapter-chartjs              | 86.70% <ø> (ø)               |
| adapter-csv                  | 88.85% <ø> (ø)               |
| adapter-doctrine             | 91.21% <ø> (ø)               |
| adapter-elasticsearch        | 97.23% <ø> (ø)               |
| adapter-google-sheet         | 91.66% <ø> (ø)               |
| adapter-http                 | 58.10% <ø> (ø)               |
| adapter-json                 | 87.98% <ø> (ø)               |
| adapter-logger               | 53.84% <ø> (ø)               |
| adapter-meilisearch          | 97.95% <ø> (ø)               |
| adapter-parquet              | 78.92% <ø> (ø)               |
| adapter-text                 | 84.44% <ø> (ø)               |
| adapter-xml                  | 82.86% <100.00%> (+0.13%) ⬆️ |
+------------------------------+------------------------------+

@github-actions
Contributor

Flow PHP - Benchmarks

Results of the benchmarks from this PR are compared with the results from 1.x branch.

Extractors
+-----------------------+------------------------+------+-----+-----------------+------------------+-----------------+
| benchmark             | subject                | revs | its | mem_peak        | mode             | rstdev          |
+-----------------------+------------------------+------+-----+-----------------+------------------+-----------------+
| CSVExtractorBench     | bench_extract_10k      | 1    | 3   | 4.956mb -0.02%  | 431.939ms -1.49% | ±0.71% -22.20%  |
| ExcelExtractorBench   | bench_extract_10k_ods  | 1    | 3   | 66.215mb -0.00% | 1.085s -1.99%    | ±0.90% -38.94%  |
| ExcelExtractorBench   | bench_extract_10k_xlsx | 1    | 3   | 68.326mb -0.00% | 1.710s -0.03%    | ±0.59% -7.68%   |
| JsonExtractorBench    | bench_extract_10k      | 1    | 3   | 5.492mb +0.01%  | 1.186s -0.47%    | ±0.96% -37.09%  |
| ParquetExtractorBench | bench_extract_10k      | 1    | 3   | 10.788mb -0.26% | 9.584s -18.77%   | ±0.31% -71.63%  |
| TextExtractorBench    | bench_extract_10k      | 1    | 3   | 4.682mb -0.02%  | 61.627ms -0.55%  | ±0.74% +15.44%  |
| XmlExtractorBench     | bench_extract_10k      | 1    | 3   | 4.665mb +0.08%  | 627.911ms +1.34% | ±0.73% +314.97% |
+-----------------------+------------------------+------+-----+-----------------+------------------+-----------------+
Transformers
+---------------------------------+--------------------------+------+-----+------------------+-----------------+-----------------+
| benchmark                       | subject                  | revs | its | mem_peak         | mode            | rstdev          |
+---------------------------------+--------------------------+------+-----+------------------+-----------------+-----------------+
| RenameEachEntryTransformerBench | bench_transform_10k_rows | 1    | 3   | 18.687mb -0.01%  | 73.214ms +0.79% | ±0.43% -26.62%  |
| RenameEntryTransformerBench     | bench_transform_10k_rows | 1    | 3   | 123.490mb -0.00% | 67.938ms +4.05% | ±1.28% +132.01% |
+---------------------------------+--------------------------+------+-----+------------------+-----------------+-----------------+
Loaders
+--------------------+----------------+------+-----+------------------+------------------+----------------+
| benchmark          | subject        | revs | its | mem_peak         | mode             | rstdev         |
+--------------------+----------------+------+-----+------------------+------------------+----------------+
| CSVLoaderBench     | bench_load_10k | 1    | 3   | 62.781mb -0.00%  | 89.975ms -0.73%  | ±0.65% -43.08% |
| JsonLoaderBench    | bench_load_10k | 1    | 3   | 80.702mb -0.00%  | 102.642ms -2.31% | ±0.44% -8.27%  |
| ParquetLoaderBench | bench_load_10k | 1    | 3   | 819.384mb +0.04% | 20.374s -26.84%  | ±0.23% -35.48% |
| TextLoaderBench    | bench_load_10k | 1    | 3   | 17.982mb -0.01%  | 33.922ms -0.06%  | ±0.24% -22.11% |
+--------------------+----------------+------+-----+------------------+------------------+----------------+
Building Blocks
+-------------------+----------------------------+------+-----+------------------+------------------+-----------------+
| benchmark         | subject                    | revs | its | mem_peak         | mode             | rstdev          |
+-------------------+----------------------------+------+-----+------------------+------------------+-----------------+
| TypeDetectorBench | bench_type_detector        | 1    | 3   | 42.608mb -0.00%  | 406.209ms +0.23% | ±0.23% -55.51%  |
| TypeDetectorBench | bench_type_detector        | 1    | 3   | 11.665mb -0.01%  | 82.815ms +1.04%  | ±0.09% -77.07%  |
| EntryFactoryBench | bench_entry_factory        | 1    | 3   | 106.085mb -0.00% | 660.132ms +0.90% | ±1.18% +83.39%  |
| EntryFactoryBench | bench_entry_factory        | 1    | 3   | 55.363mb -0.00%  | 330.759ms -1.15% | ±0.68% -68.11%  |
| EntryFactoryBench | bench_entry_factory        | 1    | 3   | 14.949mb -0.01%  | 72.280ms +0.96%  | ±0.73% -24.93%  |
| RowsBench         | bench_chunk_10_on_10k      | 2    | 3   | 93.553mb -0.00%  | 4.050ms +18.46%  | ±1.27% -63.21%  |
| RowsBench         | bench_diff_left_1k_on_10k  | 2    | 3   | 110.943mb -0.00% | 237.202ms -0.03% | ±0.70% +24.09%  |
| RowsBench         | bench_diff_right_1k_on_10k | 2    | 3   | 93.663mb -0.00%  | 24.098ms +2.41%  | ±1.46% +47.15%  |
| RowsBench         | bench_drop_1k_on_10k       | 2    | 3   | 94.428mb -0.00%  | 1.638ms +19.05%  | ±3.19% +74.19%  |
| RowsBench         | bench_drop_right_1k_on_10k | 2    | 3   | 94.428mb -0.00%  | 1.737ms +19.14%  | ±1.82% +52.18%  |
| RowsBench         | bench_entries_on_10k       | 2    | 3   | 92.589mb -0.00%  | 3.632ms +0.93%   | ±3.45% +8.28%   |
| RowsBench         | bench_filter_on_10k        | 2    | 3   | 93.117mb -0.00%  | 15.821ms +1.33%  | ±2.57% +116.69% |
| RowsBench         | bench_find_on_10k          | 2    | 3   | 93.117mb -0.00%  | 15.765ms +0.51%  | ±1.59% -18.14%  |
| RowsBench         | bench_find_one_on_10k      | 10   | 3   | 91.806mb -0.00%  | 1.706μs 0.00%    | ±2.72% +0.00%   |
| RowsBench         | bench_first_on_10k         | 10   | 3   | 91.806mb -0.00%  | 0.400μs +33.33%  | ±0.00% +0.00%   |
| RowsBench         | bench_flat_map_on_1k       | 2    | 3   | 100.867mb -0.00% | 15.183ms +4.31%  | ±3.07% +10.01%  |
| RowsBench         | bench_map_on_10k           | 2    | 3   | 130.294mb -0.00% | 69.694ms +0.50%  | ±1.39% -55.68%  |
| RowsBench         | bench_merge_1k_on_10k      | 2    | 3   | 93.637mb -0.00%  | 1.578ms -14.29%  | ±1.91% -36.76%  |
| RowsBench         | bench_partition_by_on_10k  | 2    | 3   | 97.025mb -0.00%  | 62.317ms -3.31%  | ±0.67% -62.07%  |
| RowsBench         | bench_remove_on_10k        | 2    | 3   | 94.690mb -0.00%  | 3.882ms -7.45%   | ±1.19% -64.29%  |
| RowsBench         | bench_sort_asc_on_1k       | 2    | 3   | 92.187mb -0.00%  | 40.016ms -2.14%  | ±0.52% -83.20%  |
| RowsBench         | bench_sort_by_on_1k        | 2    | 3   | 92.187mb -0.00%  | 40.538ms -6.20%  | ±1.04% -72.10%  |
| RowsBench         | bench_sort_desc_on_1k      | 2    | 3   | 92.187mb -0.00%  | 40.221ms -3.75%  | ±1.84% -11.09%  |
| RowsBench         | bench_sort_entries_on_1k   | 2    | 3   | 94.249mb -0.00%  | 8.088ms -3.37%   | ±0.50% -80.54%  |
| RowsBench         | bench_sort_on_1k           | 2    | 3   | 91.999mb -0.00%  | 29.228ms -2.14%  | ±0.65% -68.40%  |
| RowsBench         | bench_take_1k_on_10k       | 10   | 3   | 91.806mb -0.00%  | 14.358μs -6.64%  | ±2.03% +151.23% |
| RowsBench         | bench_take_right_1k_on_10k | 10   | 3   | 91.806mb -0.00%  | 18.000μs +3.57%  | ±3.26% +235.06% |
| RowsBench         | bench_unique_on_1k         | 2    | 3   | 110.943mb -0.00% | 240.036ms -0.57% | ±0.24% -76.95%  |
+-------------------+----------------------------+------+-----+------------------+------------------+-----------------+
Parquet Library
+--------------------+---------------------------------+------+-----+------------------+-------------------+-----------------+
| benchmark          | subject                         | revs | its | mem_peak         | mode              | rstdev          |
+--------------------+---------------------------------+------+-----+------------------+-------------------+-----------------+
| ParquetReaderBench | bench_page_headers              | 1    | 3   | 6.989mb -0.02%   | 3.326s -0.71%     | ±0.58% -62.48%  |
| ParquetReaderBench | bench_read_metadata             | 1    | 3   | 5.444mb -0.02%   | 18.288ms -1.70%   | ±0.87% +93.79%  |
| ParquetReaderBench | bench_read_schema               | 1    | 3   | 5.444mb -0.02%   | 18.252ms -3.11%   | ±0.48% -73.26%  |
| ParquetReaderBench | bench_read_values_all_columns   | 1    | 3   | 9.260mb -0.21%   | 5.621s -29.66%    | ±0.86% +177.14% |
| ParquetReaderBench | bench_read_values_single_column | 1    | 3   | 6.491mb -0.31%   | 235.038ms -48.87% | ±0.61% +335.58% |
| ParquetReaderBench | bench_read_values_with_limit    | 1    | 3   | 7.075mb -1.21%   | 29.206ms -16.07%  | ±0.58% -59.84%  |
| ParquetWriterBench | bench_write_batch               | 1    | 3   | 11.878mb -14.76% | 195.706ms -14.27% | ±0.58% -24.48%  |
| ParquetWriterBench | bench_write_gzip                | 1    | 3   | 10.503mb +0.01%  | 218.206ms -1.19%  | ±1.01% +148.96% |
| ParquetWriterBench | bench_write_row_by_row          | 1    | 3   | 11.878mb -14.76% | 193.599ms -14.01% | ±0.36% -79.91%  |
| ParquetWriterBench | bench_write_snappy              | 1    | 3   | 11.878mb -14.76% | 193.855ms -13.90% | ±0.70% -13.19%  |
| ParquetWriterBench | bench_write_uncompressed        | 1    | 3   | 10.124mb +0.01%  | 191.808ms -1.22%  | ±0.65% -16.85%  |
+--------------------+---------------------------------+------+-----+------------------+-------------------+-----------------+

@norberttech norberttech merged commit 0e140a5 into flow-php:1.x Sep 24, 2025
27 checks passed
@stloyd stloyd deleted the xml-schema branch September 24, 2025 18:33
Development

Successfully merging this pull request may close these issues.

[Proposal]: Introduce schema support into XML adapter
