Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document how to override file modification times to zero via http.FileSystem wrapper. #31

Open
ZoltanLajosKis opened this issue Aug 26, 2017 · 5 comments

Comments

@ZoltanLajosKis
Copy link

First of all, thanks for this great package. I love the simplicity and clarity of using httpfs as both the input and the output interface.

Currently vfsgen copies the file modification time (ModTime) value from the input httpfs to the output httpfs as is, even if it is set to zero value. When the built-in HTTP file server encounters a zero time in the file system, it omits the Last-Modified header, which disables client-side caching.

This problem is apparent when using go's mapfs and httpfs to provide vfsgen input. The mapfs package always uses zero value as ModTime, so the HTTP file server will never include cache control headers.

To reproduce, generate a file system using the above packages...

package main

import (
	"github.com/shurcooL/vfsgen"
	"golang.org/x/tools/godoc/vfs/httpfs"
	"golang.org/x/tools/godoc/vfs/mapfs"
)

func main() {
    files := map[string]string{"hello.txt": "Hello."}
    mfs := mapfs.New(files)
    hfs := httpfs.New(mfs)
    vfsgen.Generate(hfs, vfsgen.Options{})
}

... serve the generated file system ...

package main

import "net/http"

func main() {
    http.ListenAndServe(":8080", http.FileServer(assets))
}

... and verify that no cache control headers as sent.

curl -v localhost:8080/hello.txt

I would suggest to handle zero value ModTime as a special case in vfsgen. As a simple fix, zero values could replaced with the time.Now() value. This would enable client-side caching, even across server restarts (but not across rebuilds).

As an alternative solution I have created a modified mapfs, go-mapfs, which takes both the file contents and the file modification time as input. This way the true modification time can be preserved and caching will work even across rebuilds.

package main

import (
    "time"

    "github.com/ZoltanLajosKis/go-mapfs"
    "github.com/shurcooL/vfsgen"
    "golang.org/x/tools/godoc/vfs/httpfs"
)

func main() {
    files := mapfs.Files{"hello.txt": {[]byte("Hello."), time.Date(2017, 1, 1, 0, 0, 0, 0, time.UTC)}}
    mfs, _ := mapfs.New(files)
    hfs := httpfs.New(mfs)
    vfsgen.Generate(hfs, vfsgen.Options{})
}
@dmitshur
Copy link
Member

dmitshur commented Aug 28, 2017

Hey @ZoltanLajosKis, thanks for great feedback.

I love the simplicity and clarity of using httpfs as both the input and the output interface.

Glad to hear this package's API is being appreciated for that! I, too, enjoy that very much.

I have two high level answers to your problem that I want to share and ask what you think:

Currently vfsgen copies the file modification time (ModTime) value from the input httpfs to the output httpfs as is, even if it is set to zero value. [...] I would suggest to handle zero value ModTime as a special case in vfsgen. As a simple fix, zero values could replaced with the time.Now() value. This would enable client-side caching, even across server restarts (but not across rebuilds).

First, it's possible to solve the problem without adding a special case to vfsgen, allowing it to continue to be simple and orthogonal (it solves one problem well). It's thanks to the http.FileSystem abstraction itself.

You're saying you want custom behavior for how to handle zero value ModTime. You can create this http.FileSystem implementation wrapper and include whatever custom logic you want. This way, you're welcome to do this at any time, and change it to fit your changing needs, and not affect other people that may not want this behavior. It keeps vfsgen simple and orthogonal.

type modTimeFS struct {
    fs http.FileSystem
}

func (fs modTimeFS) Open(name string) (http.File, error) {
    f, err := fs.fs.Open(name)
    if err != nil {
        return nil, err
    }
    return modTimeFile{f}, nil
}

type modTimeFile struct {
    http.File
}

func (f modTimeFile) Stat() (os.FileInfo, error) {
    fi, err := f.File.Stat()
    if err != nil {
        return nil, err
    }
    if fi.ModTime().IsZero() {
        // ... insert the custom logic you want to deal with zero mod time here
    }
    return fi, nil
}

This is the beauty and power of orthogonal interfaces.

I wanted to point out the above because it's possible. However, I don't neccessary recommend you actually use this, because I have a better suggestion.

I think the better solution here is to avoid dropping information. By using x/tools/godoc/vfs/mapfs as the intermediate filesystem, you're losing the original modify time, and later you're trying to use time.Now() as a replacement estimate. Instead, why not pass though the original modify time of the original files?

I suspect you're using x/tools/godoc/vfs/mapfs for convenience of being able to do a union of multiple files and/or filesystems:

fs := New(map[string]string{
	"foo/bar/three.txt": "a",
	"foo/bar.txt":       "b",
	"top.txt":           "c",
	"other-top.txt":     "d",
})

As an alternative solution I have created a modified mapfs, go-mapfs, which takes both the file contents and the file modification time as input. This way the true modification time can be preserved and caching will work even across rebuilds.

I think that's the way to go.

I've created similar utilities that work with http.FileSystem and don't have the drawback of dropping modtime, see github.com/shurcooL/httpfs collection.

Specifically, see union package.

I use it significantly in all my projects for this purpose. E.g.:

https://github.com/shurcooL/home/blob/e6f2e71819d2db3cd6fb1a781ac880cbfff3ed9b/assets/assets.go#L17-L21

https://github.com/shurcooL/home/blob/e6f2e71819d2db3cd6fb1a781ac880cbfff3ed9b/presentdata/assets.go#L24-L47

What do you think? Do you agree that it's not necessary to modify vfsgen to solve this problem?

@ZoltanLajosKis
Copy link
Author

You're saying you want custom behavior for how to handle zero value ModTime. You can create this http.FileSystem implementation wrapper and include whatever custom logic you want. This way, you're welcome to do this at any time, and change it to fit your changing needs, and not affect other people that may not want this behavior. It keeps vfsgen simple and orthogonal.

I understand your point, and now I agree that ModTime should be opaque for vfsgen. I did not consider the possibility that other people might actually want to rely on this, e.g., to disable HTTP caching (perhaps this commenter meant something like that). It is just inconvenient that while generating the assets code, none of the components' API mention explicitly that using zero ModTime means HTTP caching will be disabled in th end.

I suspect you're using x/tools/godoc/vfs/mapfs for convenience of being able to do a union of multiple files and/or filesystems.

Yes, you got me there. I created go-assets to collect assets from multiple locations and compile them to code with vfsgen. Although I not only collect from the file system, but also from HTTP endpoints; and also extract files selectively from archives. Here is an example usage: https://github.com/ZoltanLajosKis/gmdd/blob/master/generate/assets.go#L12

I've created similar utilities that work with http.FileSystem and don't have the drawback of dropping modtime, see github.com/shurcooL/httpfs collection.

That looks nice. I did consider keeping the File descriptors around instead of pulling everything into memory. But the amount of indirections (vfsgen referring to asset referring to archive referring to http request) seemed too complex for resource management and error propagation. Perhaps it's time to revisit with the http.FileSystem interface in mind? :-)

What do you think? Do you agree that it's not necessary to modify vfsgen to solve this problem?

I agree, no change is necessary. I will leave it up to you whether you think this issue is worth documenting in Usage, around the http.Handle("/assets/", http.FileServer(assets)) example.

@dmitshur
Copy link
Member

I agree, no change is necessary. I will leave it up to you whether you think this issue is worth documenting in Usage, around the http.Handle("/assets/", http.FileServer(assets)) example.

I'm interested in improving documentation. Reopening for that.

However, can you clarify which issue specifically you had in mind?

@dmitshur dmitshur reopened this Aug 29, 2017
@dmitshur
Copy link
Member

dmitshur commented Aug 29, 2017

I'll also mention that in the comparison section, I wrote:

That enables great flexibility through orthogonality, since helpers and wrappers can operate on http.FileSystem without knowing about vfsgen. If you want, you can perform pre-processing, minifying assets, merging folders, filtering out files and otherwise modifying input via generic http.FileSystem middleware.

You said:

Although I not only collect from the file system, but also from HTTP endpoints; and also extract files selectively from archives. Here is an example usage: https://github.com/ZoltanLajosKis/gmdd/blob/master/generate/assets.go#L12

That's really cool. I want to think a bit more about that. I think you're using vfsgen in exactly the way I anticipated, so if it's not scaling well for that kind of use, I want to better understand why.

I did consider keeping the File descriptors around instead of pulling everything into memory. But the amount of indirections (vfsgen referring to asset referring to archive referring to http request) seemed too complex for resource management and error propagation. Perhaps it's time to revisit with the http.FileSystem interface in mind? :-)

Do you mind elaborating a bit on that? What do you mean by revisiting the http.FileSystem interface?

I suspect that some of the challenges you're mentioning are because you're trying to convert from http.FileSystem to vfs.FileSystem. If you used http.FileSystem all the way through, it might work better. Is that not the case?

@ZoltanLajosKis
Copy link
Author

However, can you clarify which issue specifically you had in mind?

Serving vfsgen generated assets with http.FileServer does not automatically mean HTTP caching will happen. It only happens for files which have non-zero ModTimes in input file system to vfsgen. That's not always the case. For example, using mapfs and httpfs from golang.org/x/tools/godoc/vfs to provide the input will result in zero ModTimes.

Do you mind elaborating a bit on that? What do you mean by revisiting the http.FileSystem interface?

I meant that perhaps I should revisit my design and use http.FileSystem as the intermediary interface across processing steps. Tar.gz extraction could be modeled to take a single file from one file system and provide another file system with the archived files in it. A HTTP request could be modeled to provide a file system with a single file in it. Checksum verification could be modeled to take an input file system and provide an output file system with the same files; but return error on Open if there is a checksum mismatch.

I just need to better understand how resource management would look like here. For example, does tar.gz extraction reopen the input file whenever one output file is accessed? Or does it keep the input file descriptor open (and if so, when does it close it?) Or does it read the archive on creation and cache all archived files in memory? Etc.

@dmitshur dmitshur changed the title Special handling of zero value file modification time in httpfs input Document how to override file modification times to zero via http.FileSystem wrapper. Sep 18, 2018
@dmitshur dmitshur mentioned this issue Sep 26, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants