Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fs.read sends callback a buffer with more than what was read #3657

Open
wbt opened this issue Dec 17, 2021 · 3 comments
Open

fs.read sends callback a buffer with more than what was read #3657

wbt opened this issue Dec 17, 2021 · 3 comments
Labels
discuss help wanted needs more info issues that need more info from the author workaround provided

Comments

@wbt
Copy link

wbt commented Dec 17, 2021

Version

17

Platform

Win10 Pro x64

Subsystem

fs

What steps will reproduce the bug?

Summary:

As documented, either fs.read() function takes as a final parameter a callback function which accepts three arguments: (err, bytesRead, buffer). The buffer passed to this callback is of the same length as the buffer passed in, not limited to the number of bytes read.

As a workaround, the first line of the callback can be set to:
const properlyLimitedBuffer = Buffer.from(buffer.buffer, buffer.byteOffset, bytesRead);
and then the rest of the callback function should use properlyLimitedBuffer instead of buffer. This is fairly memory-efficient because Buffer.from() just creates a new view without copying the contents of the underlying memory. However, it appears that something like this line should be in the Node code just prior to calling the callback.

Description:

Steps To Reproduce:

These steps are for use case nodejs/node#1 as described under “Impact” below.

  1. Have the latest version of Node installed. This was tested with v16.10.0 and code inspection of the current master branch suggests the issue continues on any later version.
  2. In a known directory, create the following file called copy.js in a known location:
const fs = require('fs');
/**
* Assumes inputDir & outputDir already exist.
* Yes, there are easier ways to copy file contents.
* You can imagine whatever other kind of processing that adds more value
* that you want, including processing that writes to databases, summarizes text,
* etc.; adding anything else here unnecessarily complicates the demonstration.
*/
function copyDirContents(inputDir, outputDir) {
  const reusableBuffer = Buffer.alloc(16384);
  fs.readdir(inputDir, async (err, files) => {
    for(let filename of files) {
      const inputPath = inputDir+'/'+filename;
      const outputPath = outputDir+'/'+filename;
      await new Promise(function(resolve, reject) {
        fs.open(inputPath, 'r', (err, fd) => {
          //In this oversimplified demo, one read gets the full file.
          fs.read(fd, {buffer: reusableBuffer}, (err, bytesRead, buffer) => {
            //buffer = Buffer.from(buffer.buffer, buffer.byteOffset, bytesRead);
            if(bytesRead > 0) {
              fs.writeFile(outputPath, buffer, (err) => {
                if(err) { reject(err); } else { resolve(); }
              });
            }
          });
        });
      });
    }
  });
}
copyDirContents("./inputs", "./outputs");
  1. Next to that in the same directory, create directories inputs and outputs. In inputs, create the following two files:

Doug.txt:

Hello world! My name is Doug.
I am a fictional character created for the purpose of
this overly simplified demonstration.
My full name is Douglas F. Perjitsy.
My date of birth is June 23, 1979.
My social security number is 345-678-9012.
My primary credit card is a VISA, 4111 1111 1111 1111, expires 04/25, CVV 321.
My primary bank account is a Citibank Citigold Private Client account,
the kind that requires at least a million dollars minimum total balance,
#251616-116272, with username DouglasFranco and password D0ug$password12345.
The short version of my will is that I leave everything to my daughter Christie.
I really appreciate this highly secure information storage service which
automatically periodically reviews content with AI reminders that help me keep
documents like this up to date every few years. 

Eve.txt:

Hi I'm Eve!

  1. Manually audit copy.js in your favorite code editor, ignoring the commented-out line (and whatever you’ve read in this report) to see if you can spot the security issue.
  2. If your answer in the last step was yes, try it again from the perspective of an average Node.js developer who’s not a security expert and consider how that impacts Node’s reputation for security by default. You can also see the fs.write() documentation showing its acceptance of a Buffer object directly.
  3. Run node copy.js.
  4. Inspect outputs/Doug.txt. Notice the large number of null characters at the end of the output file, making it much larger than the corresponding input.
  5. Inspect outputs/Eve.txt, the result of processing Eve’s data, which Eve would presumably have access to.
  6. Notice that moving the declaration of reusableBuffer down at least 2 lines (to make it not actually reusable between files) helps a lot but doesn’t fix the large number of null characters at the end of the output files.
  7. Uncomment the proposed-fix-workaround line and repeat steps 5-7 inclusive.

Note that the coding style in the example is meant to be compact for minimal illustration of this specific issue, not optimal for maintainability etc. (e.g. no error or large-file handling).

Possible fix:

Looking at the source code for Streams, it appears that if (bytesRead !== buf.length) { the code will “shrink to fit”
by using Buffer.allocUnsafeSlow(bytesRead) followed by a copy operation. That looks like the correct behavior that should be copied over into fs.read, e.g. within the callback wrapper just prior to calling the callback. This strategy is slower and uses more memory than the Buffer.from() strategy above but is more resilient to a separate misuse where the same buffer is provided to asynchronous code running in parallel. An example of that case can be derived from the attached example by commenting out lines 15 + 27 (Promise constructor wrapper) and 22 (the most indented line).

For maintainability, instead of copying those lines of code as a potential fix, it would be better to split that off into a separate utility function (e.g. shrinkBufferToFit or even Buffer.shrinkToFit(buffer, offset, bytesRead) if Buffer.from() doesn’t cut it) which gets reused in multiple places.

User Impact

Consider the following four use cases, all using CVSS v3.0’s Network attack vector with Node running as a Web server.

0) File Integrity Check

A user stores a file on the server. The server computes a hash of the file, using Crypto.hash.update(buffer) with each chunk of the file and then hash.digest() to compute a file hash. The documentation for update() indicates it can take a Buffer object but it does not appear to pay attention to or accept length (and possibly offset) parameter values.

Even with one-chunk files, the result mismatches a hash of the same file computed elsewhere. This renders the file integrity check pretty useless, eliminating the security benefits one could get from the check. This occurs even without the programming-error-in-the-name-of-efficiency of reusing a buffer instead of initializing a new zero-filled buffer with Buffer.alloc(). This use case is how the issue was initially found.

1) Confidential Information Processing

In this use case, the server application supports the processing of confidential information for independent users. For this purpose, “processing” might mean something as simple as writing a file to a place where the user can access it later. The server processes a file which fills most of the buffer for Doug and then, reusing the same Buffer, processes a shorter file for Eve. Eve gains access to Doug’s confidential data, except for some initial portion obscured by data Eve provided. Eve could make this data intentionally very short to maximize exploit value, but even with zero technical knowledge or intent to gain unauthorized access, Eve is still gaining unauthorized access to some of Doug’s data.

In this scenario, Doug may have followed all modern best security practices in establishing the existence of his confidential information on the target system when becoming a user, perhaps years earlier (e.g. if the buffer was populated by some occasional or periodically run server-initiated process, example here) and Eve’s compromise of the data did not require any interaction from Doug. One could argue that in certain configurations of the unintentional compromise case, this doesn’t require any interaction from the attacker either.

2) Administrator Executable Privilege

In this use case, the server application allows an administrative user to upload executable code or commands which is/are later executed on the server. Attacker Alice, who has only the minimum level of public or user permissions required to make use of the buffer, fills most or all of the buffer or at least some piece at the end of the buffer with malicious code. Administrator Bob then fills the beginning of the buffer with a short executable that is saved to disk and/or subsequently executed. If the buffer is processed without the end limit at the write/execute stage, Alice’s unauthorized code executes after Bob’s authorized code.

Note that in this scenario, Alice does not strictly have to be intentionally attacking. However, if Alice is not intentionally attacking, the remaining contents of the buffer are likely to be junk that does not execute something interesting but rather throws an error. If the execution is done in the Node server’s process, it could cause the whole Node server process to exit, taking down the site and acting as an effective DoS for legitimate users. This can also be done if the data constitutes commands or input to commands for Node to execute, if the functions being executed make assumptions about when or how they will be called or assumptions about input which are violated by the junk or malicious data, and where the error causes the server process to exit. This seems much more common than the scenario where an application allows an administrator to upload near-arbitrary executable code saved and subsequently run on the system. (This DoS strategy has also been observed in a production application with less-than-stellar code quality.)

3) Software repository

In this use case, the server application allows an administrative user to upload executable code which is made available for user/public download (e.g. sites hosting useful software tools for government officials, medical/human subjects researchers, business users, distribution into the software supply chain etc.). The attack setup is similar to nodejs/node#2, but now the compromised code is running on users’ machines.

Regarding privileges needed to exploit this, note that Web applications could be configured to make publicly available functionality that alters the contents of a reused server-side Buffer. This was observed, for example, in an incorrect usage of the ‘express-fileupload’ module that accessed an underlying ArrayBuffer.

Most relevant first-page results in multiple searches suggested implementations using the whole buffer without any limit based on bytes read, or at least neglected to mention anything about the security practices that Node.js relies on developers to be using. Examples are here.

On the first page of results in Tabnine's search for code snippets using fs.read() I saw just johnsonj561 demonstrating the correct usage. Usage of the entire buffer or with explicit reference to buffer.length seemed to be most common, though a couple examples just checked a subset of initial bytes as needed for their use case, and a couple were local aliases for readFileSync.

Search results also included the official fs documentation which discusses how the callback gets a Buffer object that appears identical in description to what other functions say they take as input. While the official documentation can of course be enhanced to call this out, there's a lot of unofficial documentation not in our control which doesn't, it's not intuitive to have to include that extra step, and if everybody should be using that, why isn't it included in the Node code to support better security by default? Reuse of the buffer would still be supported as seen in the demo (with commented-out line active), unless someone is calling read() from within the callback function with an increasing manually-set length value or after an end-of-file (insecure case) - with only the callback-parameter version of the buffer instead of the one passed in to read.

Search results also included a fair number of examples with readFile() variants which are a higher level, but which can create some unpredictable memory-based limits on the sizes of files that can be handled. This would seem to mean that conversions from use of readFile() to read() (the easiest-to-find chunk-based alternative which doesn’t have the memory usage constraint) might tend to be made under time pressure for fixing a bug that's actively blocking a real use case.

Code impact

This is likely to impact similar other functions such as the multiple versions of filehandle.read and related synchronous/promisified functions, but without having done an in-depth analysis of the source code, I hesitantly assume good coding practices would have that handled already by other affected functions wrapping the affected code.

This was previously confidentially reported as a security issue, observing that CVE-2021-22939 was also considered a security issue even though a developer using a Node.js API differently could have solved it. The response from the Node.js team noted that like many of Node.js's low-level APIs, the fs.read() APIs is modelled after a UNIX API and Unix APIs are not seen as vulnerabilities but rather good models. The buffer is not trimmed in case a user is misusing a pattern of passing it through for reuse again. The Node.js security team thinks that an update to official documentation is all that's needed here, with @mcollina offering to open a PR for doing that.

How often does it reproduce? Is there a required condition?

Always.

What is the expected behavior?

The buffer provided to the callback from read() and its associated functions is limited to what was read.

What do you see instead?

The buffer contains extra information, which may include confidential information of another user or malicious code.
Depending on how the buffer is then used, this could be quite problematic.

Additional information

I still disagree that an update to official documentation is all that's needed and think that this should be fixed in Node code, on a semver-major release if necessary, but the discussion directed that next steps should be public discussion here on the issue tracker. I include the potential impact points above to help inform that discussion.

@DerekNonGeneric DerekNonGeneric transferred this issue from nodejs/node Dec 18, 2021
@DerekNonGeneric
Copy link

Thanks for opening this issue. AFAIK its simply a common misunderstanding of how to execute this code pattern using Node APIs and not an actual bug according to you:

On the first page of results in Tabnine's search for code snippets using fs.read() I saw just johnsonj561 demonstrating the correct usage.

How this is actually supposed to be done w/o running into the issue you describe was not immediately obvious to me, so thanks for the link!

There are a couple of PRs open attempting to modify the documentation affecting this coding pattern (one of them was authored by you, another I reviewed and lost interest in pursuing further):

Do you think that maybe if we got those two merged and a third one to include the proper-use example you provided that it would help clear things up for users of Node.js still using callbacks for filesystem operations?


I should probably mention that nearly all maintainers of Node.js (myself included) are unpaid volunteers doing this in their free time — please try to keep in mind that many of us lack incentive to work for free or do anything at all if not for the sake of wanting to help.

@wbt
Copy link
Author

wbt commented Dec 20, 2021

The docs PR I authored (which won't be merged because one commit comment was a little over the rule and I don't know how to retroactively edit commit comments on GitHub) was about the distinction between .length and .byteLength and does not provide any sort of clarity about the fact that the buffer includes more than what was read. At the time I wrote that PR I still considered the vulnerability information security-sensitive and was keeping those details in a private channel. The other PR arguably clarifies the interpretation of a parameter to lastIndexOf to better match documentation about the corresponding method in String.

This issue would require very clearly stating, in the context of describing the callback/return value of each affected function something like this:

The first line handling this read buffer should be
const properlyLimitedBuffer = Buffer.from(buffer.buffer, buffer.byteOffset, bytesRead);
followed by only using properlyLimitedBuffer, or similar code (e.g. const is not necessary in every context; variable names can be different or even reused to assign to the same buffer variable instead). More information on some possible consequences of not doing that can be found here.

Documentation changes like that should be retroactive for versions from the present all the way back to the introduction of the function.

In my opinion, a line of code like the cited one should be added in the Node code just before the callback. I could make a PR which does that which could be merged into the next semver-major branch, but don't want it set up for rejection based on the concept, and I have not figured out how to do a PR in this project which can be accepted even if approved. I would also tend to defer to @mcollina on a docs update PR as he jumped in pretty quickly offering to do one and might have something in mind (or at least have a better idea what it should look like and how it can get added).

Finally, I agree about the funding/motivation problem in open source, even for highly used projects. I'm formally a maintainer on WinstonJS (covering packages with >25M weekly NPM downloads), but available bandwidth to work on that is limited mostly to the particular purposes I was added for; project maintenance as a whole is a pretty large chunk of work. It surprises me that Node.js doesn't have funding or dedicated pro maintainers as it's the core of an ecosystem way larger than Winston's relatively tiny niche within it, and it has a funding channel set up through the OpenJS Foundation with pricey corporate memberships and big events. However, it doesn't surprise me that much. In any event, thanks to all the contributors for your work and contributions. My investment of all the time and effort associated with reporting on this after the much simpler step of adding the securing workaround to my own code is in a similar spirit, done out of a desire to help secure the software ecosystem more broadly.

@knowhatamine
Copy link

Brilliant thank you!!! Very curious to see if this has been fixed by now. I am working on it :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discuss help wanted needs more info issues that need more info from the author workaround provided
Projects
None yet
Development

No branches or pull requests

3 participants