Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Storage Emulator does not match Google Cloud Storage library API #3953

Closed
kenttregenza opened this issue Dec 16, 2021 · 9 comments · Fixed by #4832 · May be fixed by #4008
Closed

Storage Emulator does not match Google Cloud Storage library API #3953

kenttregenza opened this issue Dec 16, 2021 · 9 comments · Fixed by #4832 · May be fixed by #4008

Comments

@kenttregenza
Copy link

[REQUIRED] Environment info

firebase-tools:
9.23.0

Platform:
Windows 10

[REQUIRED] Test case

Using the Google Cloud Storage Python library which follows the Google Cloud Storage JSON API spec upload a file

blob = blob_from_path(path)
blob.metadata = metadata
blob.upload_from_string(content, content_type, num_retries=3)

[REQUIRED] Steps to reproduce

Any GCS Client library which puts name inside the body of the "file" metadata of the upload and doesn't put name as a query parameter. (Again as per Google Cloud Storage JSON API spec upload a file)

[REQUIRED] Expected behavior

The emulator should get the name of the file from the metadata passed by the client library. Then the file should be uploaded and file metadata returned in a 200 response.

[REQUIRED] Actual behavior

Returns 400 because name is not found in the query (even though name is in the metadata)

The Problem
The problems stems from the emulator/storage/apis/gcloud.js implementation

gcloudStorageAPI.post("/upload/storage/v1/b/:bucketId/o", (req, res) => {
        if (!req.query.name) {
            res.sendStatus(400);
            return;
        }
        let name = req.query.name.toString();
        if (name.startsWith("/")) {
            name = name.slice(1);
        }

The POST handler is checking for a name parameter but the spec says it is optional and may be instead passed via metadata which is in the metadata section of a multipart upload or in the body for a resumable upload.

I just thought it might be worthy of fixing.

@kenttregenza
Copy link
Author

kenttregenza commented Dec 16, 2021

Just an addendum to the above problem.
So I have have modified the code to get a name from request object with a fall back to the metadata (from body in case of resumable or metadatastring in case of multipart)

function getValidName(req, metadata) {
    let name = req.query.name && req.query.name.toString() || metadata && metadata.name;
    if (name && name.startsWith("/")) {
        name = name.slice(1);
    }
    return name    
};

The library is also non-compliant (or perhaps not accommodating) of handling the multipart/related which I imagine would cause a whole bunch of issues for any non-Firebase GCS client library trying to access the emulator using their multipart/related code.

A couple of case in points

Boundaries

The emulator assumes the boundary value from Content-Type doesn't have double quotes around the value... which for a number of libraries (and again spec). These are valid

multipart/related; boundary="===============2559276420576958297=="
multipart/related; boundary================2559276420576958297==
multipart/related; boundary=---------------------------974767299852498929531610575

So this bit of code won't work

const boundary = `--${contentType.split("boundary=")[1]}`;

Blob Headers

Next is how the handler handles the looking for its content-type

if (!blobContentTypeString || !blobContentTypeString.startsWith("Content-Type: ")) {
    res.sendStatus(400);
    return;
}
const blobContentType = blobContentTypeString.slice("Content-Type: ".length);

HTTP Headers are not case sensitive... so the startsWith (nor the slice) can't be used here as below is valid and would never be found

content-type: application/json; charset=UTF-8

Anyway I'm sure Storage Emulator is in early days and it will improve but at this stage it feels that using it with other GCS Client libraries is not looking good.

Perhaps formidable or Multer or Multiparty might be more suitable for the task.

@kenttregenza
Copy link
Author

Just putting some code here until it is fixed. Maybe it'll help... I can't guarantee performance or meeting every spec (multipart is complicated)

gcloudStorageAPI.post("/upload/storage/v1/b/:bucketId/o", (req, res) => {
        
        const getValidName = (req, metadata) => {
            const name = req.query.name && req.query.name.toString()  || metadata && metadata.name;
            if (name && name.startsWith("/")) {
                return name.slice(1);
            }
            return name;
        };

        const contentType = req.header("x-upload-content-type") || req.header("content-type");
        if (!contentType) {
            res.sendStatus(400);
            return;
        }
        
        if (req.query.uploadType == "resumable") {
            req.body.contentType = contentType   // Fix for startUpload ignoring contentType

            const name = getValidName(req, req.body)
            if (!name) {
                res.sendStatus(400);
                return;
            }  
            const upload = storageLayer.startUpload(req.params.bucketId, name, contentType, req.body);
            const emulatorInfo = registry_1.EmulatorRegistry.getInfo(types_1.Emulators.STORAGE);
            if (emulatorInfo == undefined) {
                res.sendStatus(500);
                return;
            }
            const { host, port } = emulatorInfo;
            const uploadUrl = `http://${host}:${port}/upload/storage/v1/b/${upload.bucketId}/o?name=${upload.fileLocation}&uploadType=resumable&upload_id=${upload.uploadId}`;
            res.header("location", uploadUrl).status(200).send();
            return;
        }

        const MULTIPART_RE = /^multipart\/related/i;
        
        if (!contentType.match(MULTIPART_RE)) {
            res.sendStatus(400);
            return;
        }
        
        const BOUNDARY_RE = /;\s*boundary\s*=\s*(?:"([^"]*)"|([^;]*))/i
        const [boundaryMatch, boundaryWithQuotes, boundaryWithoutQuotes] = contentType.match(BOUNDARY_RE) || [];
        if (!boundaryMatch) {
            res.sendStatus(400);
            return;
        }
        const boundary = `--${ (boundaryWithQuotes || boundaryWithoutQuotes).trim() }`;
        
        const extractFiles = (boundary, body) => {
            const SECTIONS_RE = /^(.*?)(\r?\n\r?\n)(.*)$/s;
            const HEADER_LINES_RE = /\r?\n/;
            const TRAILING_CRLF_RE = /(\r?\n)$/;
            const HEADER_KV_RE = /^([^:\s]+)?\s*:\s*?(.*)/;
            
            const files = []
            body.toString("binary").split(boundary).reduce((prev, bodyPart, i)=>{
                const file = { 
                    position: (i*boundary.length) + prev, 
                    size: 0, 
                    start: 0, 
                    end: 0, 
                    buffer: ()=> { return Buffer.from(body.slice(file.start, file.end)); }
                };
                files.push(file);
                const sections = bodyPart.match(SECTIONS_RE);
                if (sections) {
                     // Every file must have headers and body per spec                    
                    const fileHeaders = sections[1];
                    const separator = sections[2];
                    const fileBody = sections[3];
                    const headerEntries = fileHeaders.split(HEADER_LINES_RE).map(rawHeader => rawHeader.match(HEADER_KV_RE)).filter(Boolean);
                    
                    const trailing = fileBody.match(TRAILING_CRLF_RE);

                    file.headers = Object.fromEntries(headerEntries.map(match => [match[1].trim(), match[2].trim()]));
                    file.contentType = Object.entries(file.headers).find(([h,v])=>h.toLowerCase()=="content-type")[1];
                    file.start = file.position + fileHeaders.length + separator.length;
                    file.end = file.start + fileBody.length - (trailing?trailing[1].length:0)
                    file.body = fileBody
                    
                }
                return prev + bodyPart.length;
            }, 0);
            return files.filter(f=>f.headers)
        }

        const files = extractFiles(boundary, req.body)
        const rawMetadata = files[0].body   // files.0 is metadata
        const fileMetadata = JSON.parse(rawMetadata)
        const blobBytes = files[1].buffer(); // files.1 is blob
        const blobContentType = files[1].contentType;
        fileMetadata.contentType = blobContentType   // Fix for one shot upload ignoring blobContentType

        const name = getValidName(req, fileMetadata)
        if (!name) {
            res.sendStatus(400);
            return;
        }  
        const metadata = storageLayer.oneShotUpload(req.params.bucketId, name, blobContentType, fileMetadata, blobBytes);
        if (!metadata) {
            res.sendStatus(400);
            return;
        }
        res.status(200).json(new metadata_1.CloudStorageObjectMetadata(metadata)).send();
        return;
    });

@kenttregenza kenttregenza changed the title Storage Emulator does not match Google Cloud Storage library Spec Storage Emulator does not match Google Cloud Storage library API Dec 22, 2021
@abeisgoat
Copy link
Contributor

So this is actually out of scope for the Cloud Storage for Firebase emulator since our goal is only to cover the Firebase SDKs and the Cloud SDKs which get exposed through firebase-admin-node. We'd love to have complete compatibility with non-JS Cloud SDKs, but as you discovered there's a lot of small edge cases.

It looks like you've done a lot of work on this and we'd happily review a PR with any changes if you had time!

@kenttregenza
Copy link
Author

Hi is there anything else you'd like done here for this? I noted your comment but I think it may fall within the scope of the emulator because the Firebase Admin SDK actually makes use of the default Google Cloud Storage client library to interact with the Firebase Storage emulator ... (at least in the Python stack).

@tonyjhuang
Copy link
Contributor

tonyjhuang commented Mar 14, 2022

Hi @kenttregenza, thanks for the PR and apologies for the lack of communication. I'll be taking over this issue as @abeisgoat has left Firebase. Can you please rebase your PR off master as unfortunately a lot has changed since you originally authored it. Can you also split the change into two separate PRs for us to review: one for the multipart changes and one for the name compatibility. I appreciate your patience and your contribution!

@qhaas
Copy link

qhaas commented Apr 3, 2022

Hopefully there is more movement on this, might resolve googleapis/python-storage issue 752, which I created before finding this issue / PR.

@shaunpersad
Copy link

Wanted to +1 as I'm also encountering the described issues.

@tonyjhuang
Copy link
Contributor

Filed internal bugs to track to 3 feature requests above, b/240630224, b/240637637, b/240627424

@tonyjhuang
Copy link
Contributor

@kenttregenza
Can you share where you see "===============2559276420576958297==" is a valid boundary?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment