try creation of prosim-db in one single shot #6

@oricdev

Requirements: Issue #1 implemented

Why do this?
As stated before, the intersect process first performs an in-memory MapReduce on data from the data packages (quick), but then performs a second MapReduce against records already present in the Prosim-db, which were created during previous sliced imports. This second MapReduce is very heavy for MongoDB: it is driven from each record held in memory, but it still involves a lot of reading, writing, expanding and indexing in the db.
Hence it would be worth determining the maximum number of products for which a one-shot integration (in-memory MapReduce only) can be performed. This would save a considerable amount of time (no more scheduled tasks twice an hour), and the Prosim-db could possibly be generated from scratch in less than a day instead of several days.
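
To make the "in-memory only" idea concrete, here is a minimal sketch of a map step and a reduce step over products held entirely in memory. It is an illustration, not the actual intersect code: the `id` field, the function names and the "count shared tags" measure are assumptions; only `categories_tags` comes from this issue.

```python
from collections import defaultdict
from itertools import combinations


def map_tags(products):
    """Map step: group product ids by each tag they carry."""
    by_tag = defaultdict(set)
    for product in products:
        for tag in product["categories_tags"]:
            by_tag[tag].add(product["id"])  # "id" is a hypothetical field name
    return by_tag


def reduce_pairs(by_tag):
    """Reduce step: count shared tags for every pair of products that co-occur."""
    shared = defaultdict(int)
    for ids in by_tag.values():
        for a, b in combinations(sorted(ids), 2):
            shared[(a, b)] += 1
    return dict(shared)


# Toy data, invented for the example.
products = [
    {"id": "p1", "categories_tags": ["en:snacks", "en:biscuits"]},
    {"id": "p2", "categories_tags": ["en:snacks", "en:biscuits", "en:chocolate"]},
    {"id": "p3", "categories_tags": ["en:snacks"]},
]
pairs = reduce_pairs(map_tags(products))
```

Nothing here touches the database: every pair is resolved from the in-memory grouping, which is the property the one-shot run would rely on.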

How to proceed?
Only about 20% of the official OFF db has the non-empty tags needed to compare products:
about 110,000 / 550,000 products
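
For a rough sense of scale, the figures above can be turned into a pair count. This is only a back-of-the-envelope sketch: it assumes every product is compared once with every other product and that one record is stored per pair, which may overestimate what the Prosim-db actually stores. `bytes_per_pair` is a hypothetical parameter to be measured on a real import, not a known value.

```python
def pair_count(n_products):
    """Number of unordered product pairs (each pair compared once)."""
    return n_products * (n_products - 1) // 2


def estimated_gigabytes(n_pairs, bytes_per_pair):
    """Rough storage estimate in GB (10**9 bytes), assuming one record per pair."""
    return n_pairs * bytes_per_pair / 1e9


eligible = 110_000  # products with usable tags (from this issue)
total = 550_000     # products in the official OFF db (from this issue)

share = eligible / total      # 0.2, i.e. the 20% quoted above
pairs = pair_count(eligible)  # 6_049_945_000 pairs
```

Calling `estimated_gigabytes(pairs, bytes_per_pair)` with a measured record size gives a figure to compare against the configured maximum db size.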
Check what happens in terms of resources used (memory, disk speed/space, overall behaviour) if we create the Prosim-db in one shot, with the environment set up as follows:

  • feeder_1 has extracted all 110,000 products meeting the non-empty criteria for "nutrition_score_uk" and "categories_tags" => all_products.json
  • copy all_products.json to updated_products.json
  • in preparer/config.xml, set these tags to the following values:
    <width>120000</width>
    <height>120000</height>
    <stats_H_nb_products>nb products extracted in all_products.json</stats_H_nb_products>
    <stats_W_nb_products>nb products extracted in all_products.json</stats_W_nb_products>
  • preparer/progress.xml: clear the values of the tags to start with a new Prosim-db
  • intersect/config.xml: set the max db size to 500 GB
    <max_db_size_gigabytes>500</max_db_size_gigabytes>
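
The setup steps above could be scripted roughly as follows. This is a sketch under assumptions: the helper names are invented, the layout of the XML files (root element, nesting) is not known from this issue so tags are looked up anywhere in the tree, and "clearing" progress.xml is interpreted as emptying the text of every leaf element. Only the tag and file names come from the list above.

```python
import shutil
import xml.etree.ElementTree as ET


def set_tags(xml_path, values):
    """Set the text of each named tag found anywhere in the file.

    Returns the list of tag names that were not found, so that a typo
    or an unexpected file layout does not pass silently.
    """
    tree = ET.parse(xml_path)
    root = tree.getroot()
    missing = []
    for tag, value in values.items():
        elements = list(root.iter(tag))
        if not elements:
            missing.append(tag)
        for element in elements:
            element.text = str(value)
    tree.write(xml_path, encoding="utf-8", xml_declaration=True)
    return missing


def clear_leaf_values(xml_path):
    """Empty the text of every element that has no children (assumed meaning of 'clear')."""
    tree = ET.parse(xml_path)
    for element in tree.getroot().iter():
        if len(element) == 0:
            element.text = None
    tree.write(xml_path, encoding="utf-8", xml_declaration=True)


def copy_products(src, dst):
    """Copy all_products.json to updated_products.json (paths supplied by the caller)."""
    shutil.copyfile(src, dst)
```

A one-shot run would then call `copy_products` on the two JSON files, `set_tags("preparer/config.xml", {"width": 120000, "height": 120000, "stats_H_nb_products": n, "stats_W_nb_products": n})` with `n` the number of products in all_products.json, `clear_leaf_values("preparer/progress.xml")`, and `set_tags("intersect/config.xml", {"max_db_size_gigabytes": 500})`. Backing up the original files first is advisable, since both helpers rewrite them in place.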

    Labels

    easy (easy to deal with), help wanted (extra attention is needed), prio 2 (middle priority)