Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Max nesting level for json parser #3311

Open
bazzilio opened this issue Mar 31, 2021 · 9 comments
Open

Max nesting level for json parser #3311

bazzilio opened this issue Mar 31, 2021 · 9 comments
Assignees

Comments

@bazzilio
Copy link

bazzilio commented Mar 31, 2021

Is your feature request related to a problem? Please describe.
I want to have option for the json parser plugin to limit nesting level for the parsing.
My developers send huge metadata json, after parsing it "eats" elasticsearch fields.

curl --progress-bar "http://127.0.0.1:9200/index-logs1/_field_caps?fields=*" | jq '.fields'  | grep -E '^  "' 2>/dev/null | grep metadata | awk -F. '{print NF}' | sort -n | wc -l 
733

curl --progress-bar "http://127.0.0.1:9200/index-logs1/_field_caps?fields=*" | jq '.fields'  | grep -E '^  "' 2>/dev/null | grep context | awk -F. '{print NF}' | sort -n | wc -l 
218

But if i could limit nesting level for parsing, it would dramatically decreased fields count:

curl --progress-bar "http://127.0.0.1:9200/index-logs1/_field_caps?fields=*" | jq '.fields'  | grep -E '^  "' 2>/dev/null | grep metadata | awk -F. 'NF>5 {print NF}' | sort -n | wc -l 
101

curl --progress-bar "http://127.0.0.1:9200/index-logs1/_field_caps?fields=*" | jq '.fields'  | grep -E '^  "' 2>/dev/null | grep context | awk -F. 'NF>5 {print NF}' | sort -n | wc -l 
25

Describe the solution you'd like
Set parameter to json parser section - max_nesting(int)
So the parser would leave unparsed json after the nesting is reacher.

Describe alternatives you've considered

Additional context
As i can see, parameter support with main json ruby libraries:

@bazzilio
Copy link
Author

bazzilio commented Mar 31, 2021

One more question:
is there a way to change DEFAULT_OJ_OPTIONS variable ?
If i correct understang login in sources - looks like oj is the default parser.
But as i see, for parse_io method fluentd uses yajl, so i am confused - which parser is using by default.

@ashie
Copy link
Member

ashie commented Apr 2, 2021

is there a way to change DEFAULT_OJ_OPTIONS variable ?

It seems there is no way to do it (pull request is welcome 😄)

If i correct understang login in sources - looks like oj is the default parser.
But as i see, for parse_io method fluentd uses yajl, so i am confused - which parser is using by default.

It seems that oj is optional, it ensures to use oj if it's available but not required mandatory.
On the other hand yajl is madatory required. If oj isn't installed, fall back to yajl.

gem.add_runtime_dependency("yajl-ruby", ["~> 1.0"])

gem.add_development_dependency("oj", [">= 2.14", "< 4"])

rescue LoadError => ex
name = :yajl
if log
if /\boj\z/ =~ ex.message
log.info "Oj is not installed, and failing back to Yajl for json parser"
else
log.warn ex.message
end
end
retry
end

In addition, there is the following description about yajl in the document of this plugin:

yajl: Mainly for stream parsing

@ashie
Copy link
Member

ashie commented Apr 2, 2021

It seems that oj is optional, it ensures to use oj if it's installed but not required mandatory.
On the other hand yajl is madatory required. If oj isn't installed, fall back to yajl.

However, it surely confusing. Because it's not documented, users can't understand such behavior.
We should update the document: https://github.com/fluent/fluentd-docs-gitbook/blob/1.0/parser/json.md

@ashie
Copy link
Member

ashie commented Apr 5, 2021

@ashie
Copy link
Member

ashie commented Jul 14, 2021

Fixed by #3315
You can use FLUENT_OJ_OPTION_MAX_NESTING for it.

@ashie ashie closed this as completed Jul 14, 2021
@ashie
Copy link
Member

ashie commented Sep 21, 2021

Now I've noticed that Oj.default_options doesn't accept :max_nesting: https://www.rubydoc.info/github/ohler55/oj/Oj.default_options

It's reported at https://app.slack.com/client/T0CSKNZLK/C0CTT63EE/thread/C0CTT63EE-1631532462.067500

We should consider other way to apply it.

@ashie ashie reopened this Sep 21, 2021
@vishalmamidi1
Copy link

Does FLUENT_OJ_OPTION_MAX_NESTING still doesn't work?

@ashie
Copy link
Member

ashie commented Mar 1, 2022

Does FLUENT_OJ_OPTION_MAX_NESTING still doesn't work?

Yes, it doesn't work. Because now I notice that Oj.default_options doesn't support it, I'll remove it.
Instead, I'm considering to add max_nesting parameter to parser_json.

@ashie
Copy link
Member

ashie commented Oct 27, 2022

ashie added a commit that referenced this issue Oct 27, 2022
This option doesn't take effect in actual since the global setting
`Oj.default_options` doesn't accept `max_nesting`. It should be
specified by each instances.

#3311 (comment)

Signed-off-by: Takuro Ashie <ashie@clear-code.com>
ashie added a commit that referenced this issue Oct 27, 2022
This option doesn't take effect in actual since the global setting
`Oj.default_options` doesn't accept `max_nesting`. It should be
specified by each instances.

#3311 (comment)

Signed-off-by: Takuro Ashie <ashie@clear-code.com>
ashie added a commit that referenced this issue Oct 27, 2022
This option doesn't take effect in actual since the global setting
`Oj.default_options` doesn't accept `max_nesting`. It should be
specified by each instances.

#3311 (comment)

Signed-off-by: Takuro Ashie <ashie@clear-code.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Archived in project
Development

No branches or pull requests

3 participants