
Decoder seems to be orders of magnitude slower than the standard library's Decoder #491

Open
jgodlew opened this issue Dec 22, 2023 · 0 comments


During an investigation into a performance issue with our site, we narrowed the problem down to our JSON parsing code taking an extremely long time on larger JSON files (multiple minutes to parse a 30MB file). A series of benchmarks shows that `json.NewDecoder(...).Decode(...)` is significantly slower than reading the contents into memory and then calling `json.Unmarshal(...)`. In fact, it appears to be orders of magnitude slower than even the standard library's `json.NewDecoder(...).Decode(...)`.

The following are the results of benchmarking the `Unmarshal` and `Decode` methods of both `encoding/json` and `goccy/go-json` with a typical file we use:

```
$ go test -bench=.
goos: darwin
goarch: arm64
pkg: bench
BenchmarkUnmarshalEncodingJson-10    	     278	   4217963 ns/op
BenchmarkUnmarshalGoJson-10          	     723	   1619416 ns/op
BenchmarkDecodeEncodingJson-10       	     292	   4070073 ns/op
BenchmarkDecodeGoJson-10             	       2	 531405062 ns/op
PASS
ok  	bench	6.261s
```

I've attached the benchmarking code:

```go
package main

import (
	"encoding/json"
	"io"
	"os"
	"testing"

	json2 "github.com/goccy/go-json"
	"github.com/stretchr/testify/assert"
)

type MultiCommitActions struct {
	Action          string `json:"action"`
	FilePath        string `json:"file_path"`
	PreviousPath    string `json:"previous_path,omitempty"`
	Content         string `json:"content"`
	ExecuteFileMode bool   `json:"execute_filemode,omitempty"`
	Encoding        string `json:"encoding,omitempty"`
	LastCommitID    string `json:"last_commit_id,omitempty"`
}

type MultiCommit struct {
	Branch        string `json:"branch"`
	CommitMessage string `json:"commit_message"`

	AuthorName  string `json:"author_name"`
	AuthorEmail string `json:"author_email"`

	StartBranch string `json:"start_branch,omitempty"`
	StartSHA    string `json:"start_sha,omitempty"`

	CreateRef bool `json:"create_ref"`

	Actions []MultiCommitActions `json:"actions"`
}

const jsonFile = "./test.json"

func UnmarshalTest(b *testing.B, unmarshalFn func([]byte, interface{}) error) {
	file, err := os.Open(jsonFile)
	assert.NoError(b, err)
	defer file.Close()
	s := MultiCommit{}
	f, err := io.ReadAll(file)
	assert.NoError(b, err)

	err = unmarshalFn(f, &s)
	assert.NoError(b, err)
}

func BenchmarkUnmarshalEncodingJson(b *testing.B) {
	for i := 0; i < b.N; i++ {
		UnmarshalTest(b, json.Unmarshal)
	}
}

func BenchmarkUnmarshalGoJson(b *testing.B) {
	for i := 0; i < b.N; i++ {
		UnmarshalTest(b, json2.Unmarshal)
	}
}

func DecodeTest(b *testing.B, decodeFn func(io.Reader, interface{}) error) {
	file, err := os.Open(jsonFile)
	assert.NoError(b, err)
	defer file.Close()
	s := MultiCommit{}

	err = decodeFn(file, &s)
	assert.NoError(b, err)
}

func BenchmarkDecodeEncodingJson(b *testing.B) {
	for i := 0; i < b.N; i++ {
		DecodeTest(b, func(reader io.Reader, i interface{}) error {
			return json.NewDecoder(reader).Decode(i)
		})
	}
}

func BenchmarkDecodeGoJson(b *testing.B) {
	for i := 0; i < b.N; i++ {
		DecodeTest(b, func(reader io.Reader, i interface{}) error {
			return json2.NewDecoder(reader).Decode(i)
		})
	}
}
```

I've also attached the test JSON file: test.json

Are there any configurations or settings we should set on the Decoder to fix this performance issue?
