Replace usage of in_array() in MigrateExecutable::handleMissingSourceRows #5765

Open
mdolnik opened this issue Sep 25, 2023 · 0 comments · May be fixed by #5766

mdolnik commented Sep 25, 2023

Describe the bug
Usage of in_array() in MigrateExecutable::handleMissingSourceRows() is proving to be very inefficient for migrations with a very large number of rows.

To Reproduce
Run any migration with a very large number of rows (e.g. 10,000+).
While the migration itself shows a progress bar and tells you when it's finished, the logic in handleMissingSourceRows() makes the process appear frozen for an indeterminate amount of time.

Actual behavior
Running a migration with many rows (in my case over 300,000 for upgrade_d7_file_private) took roughly 20-30 minutes for the actual migration, but then hung in MigrateExecutable::handleMissingSourceRows() for multiple hours until I had to stop the process manually.

in_array() can be very inefficient because it has to compare the array values one by one until it finds a match. On top of that, the current logic is searching for an array within an array of arrays.
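To make the cost concrete, here is a rough sketch (not the Drush code itself) that counts the comparisons a linear search performs when every ID is looked up in a list of ID arrays. The `fid` key, the 1,000-row data set, and the `linearSearchComparisons()` helper are all made up for illustration, and it assumes the IDs are looked up in the same order they were stored.

```php
<?php

// Hypothetical data: 1,000 source IDs, each stored as an array,
// in a plain list (no meaningful keys).
$allSourceIdValues = [];
for ($fid = 1; $fid <= 1000; $fid++) {
    $allSourceIdValues[] = ['fid' => $fid];
}

/**
 * Hypothetical helper: counts how many elements a linear search visits
 * before it finds $needle. in_array() scans in the same way, using loose
 * (==) comparison by default.
 */
function linearSearchComparisons(array $needle, array $haystack): int
{
    $comparisons = 0;
    foreach ($haystack as $candidate) {
        $comparisons++;
        if ($candidate == $needle) {
            break;
        }
    }
    return $comparisons;
}

// Look up every ID once, as a per-row check would.
$total = 0;
foreach ($allSourceIdValues as $sourceId) {
    $total += linearSearchComparisons($sourceId, $allSourceIdValues);
}

// 1 + 2 + ... + 1000 = 500500 array comparisons for only 1,000 rows.
echo $total, PHP_EOL;
```

The count grows roughly with the square of the row count, so under the same assumptions 300,000 rows would need on the order of 45 billion array comparisons.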

Workaround
Instead of using in_array(), the $allSourceIdValues property should be keyed by a unique ID so that isset() can be used.

A dedicated method that builds the key from the source ID values can be used both when writing to the $allSourceIdValues property in MigrateExecutable::onPrepareRow() and when reading it in handleMissingSourceRows().
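A minimal standalone sketch of that idea follows. It is not the code from the linked commit: the `sourceIdKey()` function, the `fid` key, and the sample rows are hypothetical, and casting the values to strings before serializing is my own assumption, made so that `5` and `'5'` produce the same key (similar to the loose comparison in_array() does by default).

```php
<?php

/**
 * Hypothetical helper: builds a string key from one row's source ID values.
 */
function sourceIdKey(array $sourceIdValues): string
{
    // Assumption: normalize values to strings so 5 and '5' map to one key.
    $normalized = array_map('strval', $sourceIdValues);
    // Sort by ID name so the key does not depend on element order.
    ksort($normalized);
    return serialize($normalized);
}

// Writing (the role onPrepareRow() plays): store each ID under its key.
$allSourceIdValues = [];
foreach ([['fid' => 1], ['fid' => 2], ['fid' => 3]] as $sourceIdValues) {
    $allSourceIdValues[sourceIdKey($sourceIdValues)] = $sourceIdValues;
}

// Reading (the role handleMissingSourceRows() plays): one hash lookup
// per ID map row instead of a scan over every stored ID.
$idMapRows = [['fid' => '2'], ['fid' => '4']];
$missing = [];
foreach ($idMapRows as $sourceId) {
    if (!isset($allSourceIdValues[sourceIdKey($sourceId)])) {
        $missing[] = $sourceId;
    }
}

// Only fid 4 is reported as missing from the source.
var_dump($missing);
```

Each lookup is now a single array key check, so the total work grows linearly with the number of rows rather than quadratically.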

With this change, the post-migration logic in the 300k-row example above finished within a few minutes instead of multiple hours.

mdolnik pushed a commit to mdolnik/drush that referenced this issue Sep 25, 2023
…urceRows().

Adds a new method getSourceIdKey() to serialize the source ID data into a unique key for retrieval later.

Refer to drush-ops#5765