Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

handle blobs as fields #53

Merged
merged 11 commits into from
Dec 1, 2021
Merged

Conversation

Uzlopak
Copy link
Contributor

@Uzlopak Uzlopak commented Dec 1, 2021

I think this handles the issue with blob uploads correctly:

It doesnt matter we have set a filename or not. We just check if the content is an octet stream. If so we treat it as a file. If not, we treat it as a field.

Also according to the Spec, a filename is not mandatory.
https://datatracker.ietf.org/doc/html/rfc7578#section-4.2

   For form data that represents the content of a file, a name for the
   file SHOULD be supplied as well, by using a "filename" parameter of
   the Content-Disposition header field.  The file name isn't mandatory
   for cases where the file name isn't available or is meaningless or
   private; this might result, for example, when selection or drag-and-
   drop is used or when the form data content is streamed directly from
   a device.

So the check if filename was not undefined was already non spec conform.

You could also create a Blob with mimetype application/octet-stream and upload it, and because we would maybe check for filename,e.g. 'blob' we would not handle the blob correctly. Blob only means, that we have the data in-memory of the browser. For the server it should be actually not be important if an upload was made with a blob or not. It should only look if it is octet-stream or not.

Checklist

@climba03003
Copy link
Member

climba03003 commented Dec 1, 2021

It is the problem is multipart/form-data spec, as it didn't state clearly how to distinguish a field and file.
It is rare but it can

  1. a file without filename
  2. a field without content-type

I would say it is better to make those decision customizable by the consumer.
e.g. multiple flags, or function to determine what they means to be a file.
It is important because we do not know what they needs and the requirement of their server.

I am happy if both option exist.
the flags pass to the default function to determine the file and field.
If they want more, then pass an function to modify it.

@Uzlopak
Copy link
Contributor Author

Uzlopak commented Dec 1, 2021

According to the spec:

https://datatracker.ietf.org/doc/html/rfc7578#section-4.4

   Each part MAY have an (optional) "Content-Type" header field, which
   defaults to "text/plain".  If the contents of a file are to be sent,
   the file data SHOULD be labeled with an appropriate media type, if
   known, or "application/octet-stream".

So it seams, that we must handle all content types which are not text/plain as files?!

@cyantree
Copy link

cyantree commented Dec 1, 2021

Thanks for your research. It's given me more understanding of the matter. If I understand it correctly the spec doesn't allow separating between files and fields - it's all just parts and everything else is up to the developer to decide.

So I think:

  • This package shouldn't make any assumptions regarding which part is a file or a field.
  • It's the developers responsibility and ability to choose.
  • Because for a given route it's clear which parts should be handled in which way, it should be possible to specify this.

Therefore I propose the following:

  • By default this package handles every part as a filestream
  • It's possible to specify per route how every part is handled

That's implemented like this:

  • Remove the config option onFile and introduce a route specific option like onPart() or a hook
  • Remove the separation between files and fields.
  • onPart() will be called for every part and lets the user transform the part according to his wishes
  • This package delivers various utility functions for quickly configuring this

A rough idea:

onPart(part) {
    if (part.name === 'config') {
        return secureJSON((await inlinePart(part, {maxSize: 1024 * 1024 * 1024}).value);
    }

    if (part.name === 'certificate') {
        return inlinePart(part, {maxSize: 10 * 1024 * 1024});
    }

    return dumpPart(part, {dir: 'uploads', maxSize: 1024 * 1024 * 1024});
}

That's a bit like my workaround from the original issue fastify/fastify-multipart#292 (comment)

Or it might be done even more high-level (kind of a webpack loader chain (but run from left to right)).

preParsing: configureParts({
    config: [inlinePart({maxSize: 1024 * 1024 * 1024}), secureJSON],
    certificate: [inlinePart({maxSize: 10 * 1024 * 1024})],
}, [dumpPart({dir: 'uploads', maxSize: 1024 * 1024 * 1024})]);

I think that this would be a huge paradigm shift for this package.

@kibertoad
Copy link
Member

@cyantree See how busboy is currently used: https://github.com/fastify/fastify-multipart/blob/master/index.js

From my understanding, it is created once per request and then discarded. Multer does same thing: https://github.com/expressjs/multer/blob/master/lib/make-middleware.js

So I think that having a config option on Busboy instance, which would be responsible for deciding what is field and what is file is a valid approach. It should fall upon higher-level library (such as multer or fastify-multipart) to pass correct configuration per-route.

@Uzlopak
Copy link
Contributor Author

Uzlopak commented Dec 1, 2021

Actually if no mimetype is set it should be text/plain.

https://datatracker.ietf.org/doc/html/rfc7578#section-4.6

How about a simple regex or function for content type?
Default is: everything is a file except text/plain and no content type means text/plain?

That is imho closes to the rfc

@cyantree
Copy link

cyantree commented Dec 1, 2021

@kibertoad You're absolutely right, sorry for mixing up these two packages. :)
Yes, busboy should only provide handling of parts and fastify-multipart should do the connection and configuration for specific routes.

@kibertoad
Copy link
Member

@Uzlopak What if I send a text/plain file as a file? How should that be handled?

@cyantree
Copy link

cyantree commented Dec 1, 2021

@Uzlopak
I've created a simple HTML form with file upload, text input and textarea input.
The following data will be sent when uploading a .txt file:

------WebKitFormBoundaryhtgx6m6izaBBUPXY
Content-Disposition: form-data; name="x"; filename="test.txt"
Content-Type: text/plain

hello world
------WebKitFormBoundaryhtgx6m6izaBBUPXY
Content-Disposition: form-data; name="txt"

hello world
------WebKitFormBoundaryhtgx6m6izaBBUPXY
Content-Disposition: form-data; name="txt2"

hello world
------WebKitFormBoundaryhtgx6m6izaBBUPXY--

What I take from this:

  • File upload and text fields differ in filename and Content-Type.
  • As discussed earlier filename shouldn't be seen as safe marker to identifying file uploads.
  • Content-Type is also no safe marker as the original issue regards parsing JSON parsing of fields. Therefore in this case it's not possible to know whether a specific part is a file upload (e. g. for storing on the server) or a metadata object (e. g. for further validation)
  • In general Content-Type only tells what a content it but now what it's purpose is.

@Uzlopak
Copy link
Contributor Author

Uzlopak commented Dec 1, 2021

Then it is actually non conform to the spec imho.

Ok, different approach:

We supply per config a function which gets the values of content type, fieldname and filename. Depending on this everybody can determine if it is a file or a field.

In higher levels like fastify you have already schemas. So you could reuse them to configure busboy accordingly

@cyantree
Copy link

cyantree commented Dec 1, 2021

We supply per config a function which gets the values of content type, fieldname and filename. Depending on this everybody can determine if it is a file or a field.

In higher levels like fastify you have already schemas. So you could reuse them to configure busboy accordingly

Yes, that would be fine for me. And when not supplying the function everything gets handled as a file.

Btw: Maybe let's check how other languages do it. I know that PHP also handles files seperately. So they must also use some aspect to mark something as a file, presumable Content-Type and/or filename.

@Uzlopak
Copy link
Contributor Author

Uzlopak commented Dec 1, 2021

Why should everything be by default a file? I kind of prefer the other way round

@cyantree
Copy link

cyantree commented Dec 1, 2021

I've made some tests with PHP:
body.append('json', new Blob(['{"foo":"bar"}'], {type: 'application/json'}), '');
results in error 4 in PHP (UPLOAD_ERR_NO_FILE). So in this case the whole field gets dropped.

body.append('json', new Blob(['{"foo":"bar"}']));
body.append('json', new Blob(['{"foo":"bar"}'], {type: 'application/json'}));
body.append('json', new Blob(['{"foo":"bar"}'], {type: 'application/json'}), 'test.json');

all these three result in file uploads.

So I couldn't manage adding a blob as a field.

@kibertoad
Copy link
Member

Why should everything be by default a file? I kind of prefer the other way round

I think this is how busboy used to work, and what users are going to expect. Experiment done by @cyantree seems to confirm that this is actually a norm.

@cyantree
Copy link

cyantree commented Dec 1, 2021

Why should everything be by default a file? I kind of prefer the other way round

Because without any configuration every field would be loaded into memory which could easily lead to memory and performance issues.


The whole thing is quite confusing and I'm asking myself if this gets out of hand.
Sorry for swaying back and forth, I'm only trying to get a grip on the whole topic.

So maybe let's take a step back and start over:

Finding a sensible default for separating fields from files should be managable and maybe the status quo (before our PRs), already is it. If other languages have a solid way in handling parts, why should this library need some customization options for this?

So maybe the current version of busboy is already good enough and the original issue should be addressed in fastify-multipart? Maybe there the solution is configuring which field (without Content-Type application/json) should be handled as json.

@kibertoad
Copy link
Member

So maybe the current version of busboy is already good enough and the original issue should be addressed in fastify-multipart?

Does busboy currently expose enough configuration for that to be feasible, though?

@Uzlopak
Copy link
Contributor Author

Uzlopak commented Dec 1, 2021

I pushed my proposal. You could supply a isPartAFile as option to busboy. I chose the old behaviour of busboy as fallback if no isPartAFile is supplied as option.

Now everybody can write his own function based on fieldName, contentType and fileName if it is a file or a field.

@cyantree
Copy link

cyantree commented Dec 1, 2021

So maybe the current version of busboy is already good enough and the original issue should be addressed in fastify-multipart?

Does busboy currently expose enough configuration for that to be feasible, though?

Yes, I think so. busboy emits the following events:

boy.emit('file', fieldname, file, filename, encoding, contype)
boy.emit('field', fieldname, buffer, false, truncated, encoding, contype)

They contain pretty much everything.

fastify-multipart already contains a onFile() config that will be called for every file part. So a onField() config might be used for parsing the json data safely when necessary.

@cyantree
Copy link

cyantree commented Dec 1, 2021

I pushed my proposal. You could supply a isPartAFile as option to busboy. I chose the old behaviour of busboy as fallback if no isPartAFile is supplied as option.

Now everybody can write his own function based on fieldName, contentType and fileName if it is a file or a field.

Thanks. Yes, that should do the trick, if exposed by fastify-multipart. Then one could write a filter function specifying which parts should be treated as field and therefore might be parsed as json and handled by the integrated validation.

@Uzlopak
Copy link
Contributor Author

Uzlopak commented Dec 1, 2021

@cyantree
Currently busboy decides what a field is and what a file. So fastify-multipart can not hook into it and say, "hey busboy, you are wrong, this is actually a file/field". But with a isPartAFile you could tell busboy what actually a file is and what a field is and then consume the right event for each part.

@kibertoad
I think with isPartAFile, we would have a necessary minimal change, which gives the power to the userland to configure busboy accordingly and with the default fallback it would be non-breaking. So our busboy would be still a valid drop-in-replacement for legacy busboy.

If you agree with me, I would add the necessary changes to typings and documentation.

Maybe isPartAFile is not the best name? but it is quite descriptive :)

lib/types/multipart.js Outdated Show resolved Hide resolved
@kibertoad
Copy link
Member

kibertoad commented Dec 1, 2021

@Uzlopak Yes, I think this is the exactly right solution.

@Uzlopak
Copy link
Contributor Author

Uzlopak commented Dec 1, 2021

@kibertoad
@cyantree

added documentation. anything missing?

@kibertoad
Copy link
Member

LGTM.
@cyantree What do you think?

Copy link

@cyantree cyantree left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My comments don't affect the overall functionality but only code quality and code coverage.

lib/main.d.ts Outdated Show resolved Hide resolved
test/types-multipart.spec.js Show resolved Hide resolved
@Uzlopak
Copy link
Contributor Author

Uzlopak commented Dec 1, 2021

Done

@Uzlopak Uzlopak merged commit 82a9a50 into fastify:master Dec 1, 2021
@cyantree
Copy link

cyantree commented Dec 1, 2021

Thanks @Uzlopak and @kibertoad for your work and time. 👍

@kibertoad
Copy link
Member

Thank you for for bearing with us through all of this :D.
and also for inspiring this whole commotion in the first place!

@Uzlopak
Copy link
Contributor Author

Uzlopak commented Dec 1, 2021

It was quite a ride and it was good that we worked together to get the best solution possible :)

I really love this hands-on mentality

@kibertoad
Copy link
Member

@Uzlopak I'll work on benchmarks now. Then we can hopefully land all outstanding PRs and publish 1.0.0 :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants