
maxByteLength Error Handling #2

Closed · jcapcik opened this issue Jun 26, 2023 · 11 comments

jcapcik commented Jun 26, 2023

First off, thank you @rafasofizada for writing pechkin! I was initially using formidable to handle file uploads, but was growing frustrated with async handling and wasn't able to stream data from the request without using a temporary file. Also note that I'm new to Node, and came from PHP, so I'm mostly fumbling my way around.

My goal with using pechkin was as follows:

  1. Using Express, receive one or more files through a multipart/form-data request.
  2. Stream the request to AWS S3 as multi-part using @aws-sdk/lib-storage.
  3. If the stream fails in any way, but certainly if the file is above a size limit I have set through maxFileByteLength, abort the S3 upload (a rough sketch of this flow follows below).
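
Concretely, that flow would look something like this - just a sketch, not my actual code; I'm assuming the usual Upload API from @aws-sdk/lib-storage, and the bucket name and byte limit are placeholders:

const express = require('express');
const pechkin = require('pechkin');
const { S3Client } = require('@aws-sdk/client-s3');
const { Upload } = require('@aws-sdk/lib-storage');

const app = express();
const s3 = new S3Client({});

app.post('/upload', async (req, res, next) => {
  try {
    const { files } = await pechkin.parseFormData(req, {
      maxFileByteLength: 10 * 1024 * 1024, // placeholder limit
      abortOnFileByteLengthLimit: true,
    });

    for await (const { filename, stream } of files) {
      const upload = new Upload({
        client: s3,
        params: { Bucket: 'my-bucket', Key: filename, Body: stream },
      });

      // If the file stream errors (e.g. the size limit is hit), done() should
      // reject, and the multipart upload should be aborted rather than completed.
      await upload.done();
    }

    res.send('Uploaded');
  } catch (err) {
    next(err);
  }
});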

The problem I'm running into is with using abortOnFileByteLengthLimit: true. If the file limit is reached, the S3 upload will fail (instead of just being truncated but successfully sent), but the error thrown from FileIterator is not caught and seemingly cannot be caught, so my node process crashes:

/[source-path]/node_modules/pechkin/dist/cjs/FileIterator.js:96
                throw new error_1.FieldLimitError("maxFileByteLength", field, maxFileByteLength);
                      ^

FieldLimitError: Exceeded file byte length limit ("maxFileByteLength").
Configuration info: 100
Field: "files[0]"
    at /[source-path]/node_modules/pechkin/dist/cjs/FileIterator.js:97:23
    at processTicksAndRejections (node:internal/process/task_queues:96:5) {
  configurationInfo: 100,
  limitType: 'maxFileByteLength',
  field: 'files[0]'
}
[nodemon] app crashed - waiting for file changes before starting...

Ideally, at least to a layman such as myself, it would be preferable for the Transform stream itself to error when the size limit is reached. Just as a test I tried this inside the _transform function of ByteLengthTruncateStream, and it seemed to work as expected: I'm able to catch the error and continue to iterate through the files.

if (this.readBytes + chunkBuffer.byteLength > this.maxByteLength) {
    return callback(new Error(`ByteLengthTruncateStream: maxByteLength exceeded: ${this.maxByteLength}`));
    ...
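
For reference, a fuller standalone version of that idea would look roughly like this (just a sketch, not Pechkin's actual implementation - the class and field names are made up):

const { Transform } = require('node:stream');

// Sketch only: a pass-through Transform that errors via callback() once a byte limit is exceeded.
class MaxByteLengthStream extends Transform {
  constructor(maxByteLength) {
    super();
    this.maxByteLength = maxByteLength;
    this.readBytes = 0;
  }

  _transform(chunk, _encoding, callback) {
    this.readBytes += chunk.byteLength;

    if (this.readBytes > this.maxByteLength) {
      // Passing an Error to callback() destroys the stream, so any consumer
      // (fs write stream, S3 upload, pipeline()) sees an 'error' event too.
      return callback(new Error(`maxByteLength exceeded: ${this.maxByteLength}`));
    }

    callback(null, chunk);
  }
}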

By looking at the examples and the tests, it seems that throwing an error to abort a stream was not a priority. As it stands, the only way I can use this package with a file size limit is to set abortOnFileByteLengthLimit: false, check byteLength after the S3 upload completes, and if truncated: true, manually delete the file that was just uploaded. I'll also probably require a Content-Length header and validate the size ahead of time, although that only protects us when the request comes from our app - a separate request can be made with a fake content length, and there's no built-in validation in Node for multipart requests.
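
Something along these lines (again only a sketch - I'm assuming awaiting byteLength reports the truncated flag as described above, and the bucket/key are placeholders):

const { S3Client, DeleteObjectCommand } = require('@aws-sdk/client-s3');
const { Upload } = require('@aws-sdk/lib-storage');

const s3 = new S3Client({});

// Sketch of the abortOnFileByteLengthLimit: false fallback described above.
async function uploadWithCleanup(file, key) {
  const upload = new Upload({
    client: s3,
    params: { Bucket: 'my-bucket', Key: key, Body: file.stream },
  });

  // With abortOnFileByteLengthLimit: false the stream just gets truncated,
  // so the upload itself "succeeds" even for oversized files.
  await upload.done();

  // Assumption: awaiting byteLength reports whether the stream was truncated.
  const { truncated } = await file.byteLength;

  if (truncated) {
    // Remove the partial object that was just uploaded.
    await s3.send(new DeleteObjectCommand({ Bucket: 'my-bucket', Key: key }));
    throw new Error(`File "${key}" exceeded the size limit and was removed`);
  }
}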

If you have any thoughts or suggestions, I'm all ears!

@rafasofizada (Owner)

Hey @jcapcik! Thank you for a very well-written issue :) I'm looking into it right now.

Could you please provide a snippet of the code that throws an error, and ideally a minimum reproduction of your issue? That'd greatly help.

jcapcik commented Jun 26, 2023

Sure! I'll work on a simple working example to duplicate the issue.

@rafasofizada (Owner)

Thank you! I knew someone would eventually stumble upon an error case like this – handling errors inside streams is a huge PITA.

@rafasofizada rafasofizada self-assigned this Jun 26, 2023
@rafasofizada (Owner)

@jcapcik nevermind, no need for a snippet. I found the cause and will fix it shortly. It turns out the error was thrown only inside the File.byteLength promise.

You can either:

  1. await file.byteLength in the for-await loop (taken from /examples/basic-fs-temp.js):
for await (const { filename: originalFilename, byteLength, stream, ...file } of files) {
  const newFilename = `${Math.round(Math.random() * 1000)}-${originalFilename}`;
  const dest = path.join(os.tmpdir(), newFilename);

  stream.pipe(fs.createWriteStream(dest));
  /*
  `byteLength` resolves only after the entire `file.stream` has been consumed.
  You should `await byteLength` only AFTER the code that consumes the stream
  (e.g. uploading to AWS S3, loading into memory, etc.)
  */
  const length = await byteLength;
  //^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  // You can either wrap this into a try-catch block, or
  // wrap the outer for-await-of loop into try-catch.

  results.push({ ...file, dest, originalFilename, newFilename, length});
}
  2. Use the workaround you found. I'd advise against throwing errors directly inside _transform(); I've found error throwing and handling inside streams very confusing and complicated. Checking for truncated will probably be less bug-prone.

jcapcik commented Jun 26, 2023

So here's my working example, but please let me know if I missed something.

I created a file testServer.js:

const pechkin = require("pechkin")
const express = require("express")
const os = require("os")
const fs = require("fs")
const path = require("path")
const { pipeline } = require('node:stream/promises')

const app = express()
const port = 3300

const asyncWrapper = (fn) => async (req, res, next) => {
	try {
		await fn(req, res, next)
	} catch (err) {
		next(err)
	}
}

app.post("/",
	asyncWrapper(async (req, res, next) => {
		const { files } = await pechkin.parseFormData(req, {
			maxFileByteLength: 10, // test
			abortOnFileByteLengthLimit: true,
		})

		for await (const { filename, byteLength, stream } of files) {
			// write to a temp file
			await pipeline(stream, fs.createWriteStream(path.join(os.tmpdir(), filename)))
		}

		res.send("File uploaded")
	})
)

app.listen(port, () => {
	console.log(`Example app listening at http://localhost:${port}`)
})

Ran node testServer.js

Then I created a testFile.csv with some dummy data over 10 bytes, and used curl to send it:

curl -X POST http://localhost:3300 -H 'Content-Type: multipart/form-data' -F 'file=@testFile.csv'

And this is the error before the server crashes:

/usr/src/app$ node testServer.js
Example app listening at http://localhost:3300
/usr/src/app/node_modules/pechkin/dist/cjs/FileIterator.js:96
                throw new error_1.FieldLimitError("maxFileByteLength", field, maxFileByteLength);
                      ^

FieldLimitError: Exceeded file byte length limit ("maxFileByteLength").
Configuration info: 10
Field: "file"
    at /usr/src/app/node_modules/pechkin/dist/cjs/FileIterator.js:96:23
    at processTicksAndRejections (node:internal/process/task_queues:96:5) {
  configurationInfo: 10,
  limitType: 'maxFileByteLength',
  field: 'file'
}

jcapcik commented Jun 26, 2023


Whoops @rafasofizada, well I included it anyway :-)

Oh shoot! I apologize - I've been thinking about this wrong. I've been "await"ing whatever is writing to the stream (writing to a file in my example, or uploading to S3). What I didn't realize is that byteLength is a Promise waiting for the byteLengthEvent to resolve, which resolves when the max length is hit or when the stream is done. I'm not sure exactly how it works internally (I'm still really new at JavaScript), but if you await the stream processing first and the file size limit is reached, it will still throw an error - it just can't be caught.

So yeah, changing my example to this, it works great!

...
		for await (const { filename, byteLength, stream } of files) {
			// write to a temp file
			// await pipeline(stream, fs.createWriteStream(path.join(os.tmpdir(), filename)))
			stream.pipe(fs.createWriteStream(path.join(os.tmpdir(), filename)))
			console.log(await byteLength)
		}
... 

jcapcik commented Jun 26, 2023

Dang. Okay, so it doesn't crash Node at least, but I still have the issue that the file gets written anyway, because the transform stream completes - it just truncates the file. So this gets back to my needing to catch the error and remove the file myself, instead of the stream itself erroring.
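
So for now my loop would need to look roughly like this (a sketch based on my example above; the error fields are the ones from the stack trace):

let currentDest;

try {
  for await (const { filename, byteLength, stream } of files) {
    currentDest = path.join(os.tmpdir(), filename);
    stream.pipe(fs.createWriteStream(currentDest));

    // Rejects with FieldLimitError when maxFileByteLength is exceeded...
    await byteLength;
  }
} catch (err) {
  // ...but by then the truncated file has already been written, so remove it manually.
  if (err.limitType === 'maxFileByteLength' && currentDest) {
    await fs.promises.rm(currentDest, { force: true });
  }
  throw err;
}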

It seems like error handling is supported in stream implementations - https://nodejs.org/api/stream.html#transform_transformchunk-encoding-callback

The first argument passed to the callback must be an Error object if an error occurred while processing the input or null otherwise

At least for Writable streams, it's noted that the only reliable way of handling errors is to pass them to the callback:

Errors occurring during the processing of the writable._write(), writable._writev() and writable._final() methods must be propagated by invoking the callback and passing the error as the first argument. Throwing an Error from within these methods or manually emitting an 'error' event results in undefined behavior.

Of course you have far more experience with this than I do, so maybe there are still gotchas to calling the callback with an error.

@rafasofizada (Owner)

You're right. I'll have to change the way errors are handled inside ByteLengthTruncateStream.

This is really a design problem – I explicitly don't throw any errors inside the stream implementation, but rather propagate them up (using EventEmitter) to FileIterator (and its internal processBusboyFileEventPayload() function), so that the for-await-of loop can catch the error and perform all the necessary cleanup. Unfortunately, I didn't think through what happens to the consumer of the file stream (in your case, the AWS S3 library).

I'll fix this ASAP and add as many test cases as I can think of. Thanks a lot for spotting this and researching it thoroughly!

One thing I wanted to ask – how did you discover this library? E.g. what search terms, from which website, etc. I'll be working on actively promoting this library, and knowing how people stumble upon it is very useful to me right now :)

jcapcik commented Jun 26, 2023

Awesome! Hopefully the way I was intending to use it matches how others would want to use it, so this will be a beneficial change.

I ultimately stumbled upon your Reddit post 😄 I'm not sure exactly how I found it, but I was too many hours into researching how best to handle file uploads in Node, more specifically on utilizing streams with async code and limiting file sizes, and getting frustrated in the process. I just thought that this had to have been figured out many times over already.

Ultimately, I could have just gotten by with an existing package that uses temporary files, but the idea of having access to the fields right away without having to process the files, iterating through each file without having to read it in or store it, streaming the file data out somewhere without having to worry about cleanup, and simply doing async things easily within the for-await-of loop - it's just so clean.

I really appreciate your responsiveness on this! For now, I'm just going to not implement a file size limit, and revisit it whenever you make some changes - so don't rush on my part! If anything else comes up while using the package in production I'll let you know.

rafasofizada added a commit that referenced this issue Jul 13, 2023
1. `byteLength` event is not necessary when you have standard
'finish'/'end'/'close'/'error' events + bytes/truncated getters
2. Pre-setting event handlers (especially on 'error') and
wrapping them into a Promise proved to cause lots of
peculiar bugs and race conditions.
@rafasofizada rafasofizada reopened this Jul 13, 2023
@rafasofizada rafasofizada pinned this issue Jul 13, 2023
rafasofizada commented Jul 13, 2023

Hey @jcapcik. Firstly, thanks again for using Pechkin and submitting this bug report. It made me look deep into the codebase and learn lots of stuff about the library that I didn't even know myself :)

The issue is fixed. I've added an edited version of your bug reproduction to the tests, you can find it in /test/length_limit_error.e2e.spec.ts. Also updated the docs and added a changelog.

The actual fix is pretty small: I simply needed to callback() with an error in the ByteLengthTruncateStream._transform() function. But this simple change led me down the path of figuring out fiendish edge cases with the byteLength property and stream error handling in general. I decided to get rid of the byteLength Promise as part of Pechkin.File and not emit any custom events from within the stream.

One should instead wait until the stream finishes processing, listen to the 'finish'/'end'/'error' events, and use the bytesRead, bytesWritten and truncated getters of ByteLengthTruncateStream / Pechkin.File.stream. Otherwise, setting stream event handlers beforehand and wrapping them into a Promise leads to lots of unintuitive behaviour and hard-to-track-down bugs – I've encountered several of them even in my testing code!
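
For illustration, the consumer-side pattern now looks roughly like this (a sketch reusing the temp-file setup from the earlier example, with files coming from parseFormData and the getters described above):

const { pipeline } = require('node:stream/promises');

try {
  for await (const { filename, stream } of files) {
    // pipeline() now rejects with FieldLimitError if maxFileByteLength is exceeded,
    // because the limit errors the file stream itself.
    await pipeline(stream, fs.createWriteStream(path.join(os.tmpdir(), filename)));

    // After the stream has been fully consumed, the getters are available:
    console.log(stream.bytesRead, stream.bytesWritten, stream.truncated);
  }
} catch (err) {
  // Handle the limit error here (e.g. respond with 413 and clean up partial files).
}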

Version 2.0.0 (removing byteLength is unfortunately a breaking change) is published to npm: https://npmjs.com/package/pechkin.

Let me know what you think!

jcapcik commented Jan 23, 2024

@rafasofizada, my sincere apologies that I did not respond sooner. I finally got around to upgrading to v2 to support better error handling when the file size limit is reached, and it worked perfectly! Thanks again for your work on this!
