Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add script to add users for bulk import #977

Open
wants to merge 3 commits into
base: feat/bulk-import-base
Choose a base branch
from

Conversation

anku255
Copy link
Contributor

@anku255 anku255 commented Apr 4, 2024

Summary of change

(A few sentences about this PR)

Related issues

Test Plan

(Write your test plan here. If you changed any code, please provide us with clear instructions on how you verified your
changes work. Bonus points for screenshots and videos!)

Documentation changes

(If relevant, please create a PR in our docs repo, or create a checklist here
highlighting the necessary changes)

Checklist for important updates

  • Changelog has been updated
    • If there are any db schema changes, mention those changes clearly
  • coreDriverInterfaceSupported.json file has been updated (if needed)
  • pluginInterfaceSupported.json file has been updated (if needed)
  • Changes to the version if needed
    • In build.gradle
  • If added a new paid feature, edit the getPaidFeatureStats function in FeatureFlag.java file
  • Had installed and ran the pre-commit hook
  • If there are new dependencies that have been added in build.gradle, please make sure to add them
    in implementationDependencies.json.
  • Update function getValidFields in io/supertokens/config/CoreConfig.java if new aliases were added for any core config (similar to the access_token_signing_key_update_interval config alias).
  • Issue this PR against the latest non released version branch.
    • To know which one it is, run find the latest released tag (git tag) in the format vX.Y.Z, and then find the
      latest branch (git branch --all) whose X.Y is greater than the latest released tag.
    • If no such branch exists, then create one from the latest released branch.
  • If added a foreign key constraint on app_id_to_user_id table, make sure to delete from this table when deleting the user as well if deleteUserIdMappingToo is false.

@anku255 anku255 changed the base branch from master to feat/bulk-import-base April 4, 2024 11:20
@anku255 anku255 requested a review from sattvikc April 18, 2024 09:44
1. Ensure you have Node.js (v16 or higher) installed on your system.
2. Open a terminal window and navigate to the directory where the script is located.
3. Run `npm install` to install necessary dependencies.
4. Run the script using the following command:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of CoreAPIURL, use Core ConnectionURI

2. Open a terminal window and navigate to the directory where the script is located.
3. Run `npm install` to install necessary dependencies.
4. Run the script using the following command:

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the core may have an API key as well which needs to be supplied to this program

- Replace `<InputFileName>` with the path to the input JSON file containing user data.
- Optionally, you can specify the paths for the output files:
- `--invalid-schema-file <InvalidSchemaFile>` specifies the path to the file storing users with invalid schema (default is `./usersHavingInvalidSchema.json`).
- `--remaining-users-file <RemainingUsersFile>` specifies the path to the file storing remaining users (default is `./remainingUsers.json`).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remaining-users-file needs more explanation

The input file should be a JSON file with the same format as requested by the `/bulk-import/users` POST API endpoint. An example file named `example_input_file.json` is provided in the same directory.

## Expected Outputs

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about an output whilst the script is running? Sort of like the overall progress.


## Note

The script would re-write the files specified by `--remaining-users-file` and `--invalid-schema-file` options on each run. Ensure to back up these files if needed.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add another note that even if this script has finished, it doesn't mean for sure that there will be no errors or that all the users are imported. Cause the cronjob in the core will have to run and it will take its time.

Ideally, this script should also take that into account, and show that output. For example, once we have called the API for all users in the input json file, then this script should query the core to check how many are processing, and how many have failed, out of the ones that have failed, it should output those in a file (same file as usersHavingInvalidSchema)? and the tell devs what to do.

async function main() {
const { coreAPIUrl, inputFileName, usersHavingInvalidSchemaFileName, remainingUsersFileName } = await parseInputArgs();

const users = await getUsersFromInputFile({ inputFileName });
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldnt this actually pick up from the remainingUsers file if that exists? Cause the input file would have users that have been successfully imported as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise the dev would have to manually copy the contents of the remainingUsers file to their input file on each run, which can be something that they miss.

Comment on lines +134 to +138
while (i < users.length || usersToProcessInBatch.length > 0) {
let remainingBatchSize = usersToProcessInBatch.length > BATCH_SIZE ? 0 : BATCH_SIZE - usersToProcessInBatch.length;
remainingBatchSize = Math.min(remainingBatchSize, users.length - i);

usersToProcessInBatch.push(...users.slice(i, i + remainingBatchSize));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add comments explaining all this with an example. Hard for me to understand the logic here.

Comment on lines +140 to +146
const res = await fetch(`${coreAPIUrl}/bulk-import/users`, {
method: 'POST',
headers: {
'Content-Type': 'application/json',
},
body: JSON.stringify({ users: usersToProcessInBatch }),
});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need API key as well

if (!res.ok && res.status !== 400) {
const text = await res.text();
console.error(`Failed to add users. API response - status: ${res.status} body: ${text}`);
break;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not a good idea to break. Imaging I am running this script for many 100k users, and went away for a coffee. Then after 5 seconds this fails temporarily. Now it will make no progress even if the core is up. Instead, do exponential backoff (upto a few seconds max), and try again.


usersToProcessInBatch.push(...users.slice(i, i + remainingBatchSize));

const res = await fetch(`${coreAPIUrl}/bulk-import/users`, {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to have some wait time each iteration. Otherwise it may breach the late limit of the core! 100 MS wait time.

@anku255 anku255 mentioned this pull request May 29, 2024
6 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants