Add database:import command for non-atomic import #5396
Conversation
Love your implementation. Really concise and clean. It's a good start for now.
Using PUT allows us to overwrite one path at a time. However, consider a case with a large list of small elements (the total list exceeds 1 MB). This would create tons of splits. If we use a multi-path PATCH, we can batch many small paths and have fewer tiny chunks. This is an optional optimization.
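The multi-path PATCH idea could be sketched roughly like this. This is a hypothetical helper, not code from the PR; the batching heuristic and byte budget are illustrative:

```typescript
// Sketch: coalesce many small writes into multi-path PATCH bodies,
// keeping each serialized body under a byte budget (e.g. ~1 MB).
type Batch = Record<string, unknown>;

function batchForPatch(
  entries: Array<[path: string, value: unknown]>,
  maxBodyBytes: number
): Batch[] {
  const batches: Batch[] = [];
  let current: Batch = {};
  let size = 2; // bytes for the enclosing "{}"
  for (const [path, value] of entries) {
    // Rough size of this key/value pair once JSON-encoded (+1 for a comma).
    const entryBytes = Buffer.byteLength(JSON.stringify({ [path]: value })) - 2 + 1;
    if (size + entryBytes > maxBodyBytes && Object.keys(current).length > 0) {
      batches.push(current);
      current = {};
      size = 2;
    }
    current[path] = value;
    size += entryBytes;
  }
  if (Object.keys(current).length > 0) batches.push(current);
  return batches;
}
```

Each batch would then become a single request, e.g. `PATCH https://<instance>.firebaseio.com/list.json` with body `JSON.stringify(batch)`, instead of one tiny PUT per element.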
src/commands/database-import.ts (outdated)

```typescript
const inputString =
  options.data ||
  (await utils.streamToString(infile ? fs.createReadStream(infile) : process.stdin));
```
This allocates memory for the entire file. However, reading firebase-import, I think it also loads the whole file into a JSON object in memory.
Just a note here; I actually don't know if there is a good way to parse JSON in a streaming way, since we don't know whether the JSON is valid until the whole file is parsed. 🤷
For now I have JSONStream parse and stream the top-level objects. We assume that individual objects can fit in memory.
Codecov Report

Additional details and impacted files:

```
@@            Coverage Diff             @@
##           master    #5396      +/-   ##
==========================================
+ Coverage   56.12%   56.25%   +0.13%
==========================================
  Files         317      318       +1
  Lines       21510    21577      +67
  Branches     4391     4397       +6
==========================================
+ Hits        12072    12139      +67
  Misses       8376     8376
  Partials     1062     1062
```

☔ View full report in Codecov by Sentry.
src/database/import.ts (outdated)

```typescript
constructor(private dbUrl: URL, file: string, private chunkSize = MAX_CHUNK_SIZE) {
  let data;
  try {
    data = { json: JSON.parse(file), pathname: dbUrl.pathname };
```
I'd like this command to support at least 100 GB JSON files, so parsing everything into memory at once isn't really practical. https://github.com/FirebaseExtended/firebase-import uses JSONStream. Can we use it here as well?
Done. Although, note that firebase-import still loads the whole file into memory; not sure why. We'll not do that here. JSONStream assumes that the top-level objects can fit into memory but that the entire file should be streamed. I think this is a reasonable, maybe even necessary, assumption.
@fredzqm TBC on doing multi-path PATCH for now. If we do, we'll need to do something else to make sure that each batch of updates also doesn't exceed the request payload size. But it shouldn't be hard.
I would leave whether to use multi-path PATCH in the first version up to you. @tohhsinpei Single-path PUT would deliver a lot of value already, so I'm happy to get that algorithm in and iterate later.
src/commands/database-import.ts (outdated)

```typescript
export const command = new Command("database:import <path> [infile]")
  .description("non-atomically import JSON file to the specified path")
  .option("-f, --force", "pass this option to bypass confirmation prompt")
```
This isn't in the API proposal. I was thinking of:
- If --force-override is added, then proceed.
- Check if the imported path is empty; if yes, then proceed.
- Prompt the user to confirm importing to a non-empty path.

Sadly, it got lost in the discussion. @lepatryk What do you think about the behavior when the imported path isn't empty?
--force just bypasses the `? You are about to import data to https://<instance>.firebaseio.com/. Are you sure? (y/N)` prompt. We still check if the imported path is empty and tell the user to delete the existing data if not.
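The behavior described here can be summarized as a small decision function. This is an illustrative sketch, not the actual firebase-tools code; the names are made up:

```typescript
// Sketch of the confirmation flow: a non-empty target always aborts with
// a message to delete first; --force only skips the confirmation prompt.
type ImportDecision = "proceed" | "prompt" | "abort-nonempty";

function decideImport(force: boolean, pathIsEmpty: boolean): ImportDecision {
  if (!pathIsEmpty) {
    // Non-empty target path: tell the user to delete existing data first.
    return "abort-nonempty";
  }
  // --force bypasses the "Are you sure? (y/N)" prompt; otherwise ask.
  return force ? "proceed" : "prompt";
}
```

Note that under this reading, --force never overrides the emptiness check; it only silences the interactive confirmation.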
src/database/import.ts (outdated)

```typescript
const getJoinedPath = this.joinPath.bind(this);

const readChunks = new stream.Transform({ objectMode: true });
readChunks._transform = function (chunk: { key: string; value: any }, _, done) {
```
Very cool. I am reading its docs: https://nodejs.org/api/stream.html#implementing-a-readable-stream
Thinking about how to control the size of the chunks emitted. Shall we set highWaterMark higher to have a larger chunk size?
Not sure. We're in objectMode, so highWaterMark defaults to 16 (objects, not bytes). It's hard to know how many objects we should buffer.
Can you experiment with a large file and log each request's size? If it ends up sending lots of tiny requests, it would be very slow. We want to make sure the batch size is reasonably large for this command to be efficient and useful.
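One simple way to instrument this, sketched with a hypothetical helper (the `logger.debug` call site is illustrative, not from the PR): measure each serialized request body before sending.

```typescript
// Size of a request body once JSON-encoded, in bytes.
function payloadBytes(body: unknown): number {
  return Buffer.byteLength(JSON.stringify(body), "utf8");
}

// e.g. inside the chunk-writing loop:
//   logger.debug(`PUT ${path}: ${payloadBytes(chunk)} bytes`);

console.log(payloadBytes({ a: 1 })); // 7
```

Logging these per-request sizes on a large import would show directly whether the importer degenerates into many tiny requests.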
src/commands/database-import.ts
Outdated
.before(requireDatabaseInstance) | ||
.before(populateInstanceDetails) | ||
.before(printNoticeIfEmulated, Emulators.DATABASE) | ||
.action(async (path: string, infile, options) => { |
Note: We support a global flag --json, which makes a command output whatever is returned by the action function as JSON. To that end, it would be nice to include a return type for the function in action.
Right now, this is Promise, which is fine. However, think about whether there is anything this should return that would be useful for CI/automated use cases.
Maybe the total number of bytes written. Would be a good follow-up. I'd prefer to get this PR in first if that's ok.
Impl LGTM. Curious about the actual performance of it.
If possible, please minimize `any`s in this code.
@fredzqm I just increased
Yeah, I know. A long list cannot be chunked unless we use
Description
b/261669787
API proposal: go/firebase-database-import
Implementation: parse and stream the top-level objects with JSONStream. Assume individual objects can fit in memory.