Add decode_size_limit_bytes option #30
andrewvc wants to merge 1 commit into logstash-plugins:main
Conversation
it "should raise an error if the max bytes are exceeded" do
  expect {
    subject.decode(maximum_payload << "z")
  }.to raise_error(RuntimeError, "input buffer full")
end
Is there a way to provide more context to this exception? "input buffer full" feels a little vague, or am I missing something?
Unfortunately that exception is provided by the FileWatch library, so it'd require a patch there. I should probably wrap and re-raise it.
I just submitted https://github.com/jordansissel/ruby-filewatch/pull/82/files to enable us to catch a more precise exception.
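A minimal sketch of the wrap-and-re-raise idea, assuming the tokenizer still raises a bare RuntimeError with the message "input buffer full" as it does today (the DecodeSizeLimitExceeded class and the parse_line helper are hypothetical, not part of this PR):

# Hypothetical sketch only: wrap FileWatch's generic RuntimeError in a
# codec-specific error class so callers can rescue it precisely.
class DecodeSizeLimitExceeded < StandardError; end

def decode(data)
  @buffer.extract(data).each do |line|
    yield parse_line(line) # parse_line: hypothetical JSON-parsing helper
  end
rescue RuntimeError => e
  # FileWatch raises RuntimeError("input buffer full") when the limit is hit
  raise e unless e.message == "input buffer full"
  raise DecodeSizeLimitExceeded,
        "JSON line exceeded decode_size_limit_bytes (#{@decode_size_limit_bytes} bytes)"
end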
# Maximum number of bytes for a single line before a fatal exception is raised,
# which will stop Logstash.
# The default is 20MB, which is quite large for a JSON document.
config :decode_size_limit_bytes, :validate => :number, :default => 20 * (1024 * 1024) # 20MB
- Is "decode" the right name? The description says this is bytes for a line, but then we call it "decode", which isn't something mentioned elsewhere in the docs.
- Exceeding this will cause a fatal error in Logstash and stop the process? Is this the desired behavior?
If the size limit is exceeded, where do we show that this exception will terminate Logstash? I don't see it when I read through the code.
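For reference, this is how the option would be set from a pipeline config once merged (the tcp input and the 16MB value are just an illustration, not from this PR):

input {
  tcp {
    port => 5044
    codec => json_lines {
      decode_size_limit_bytes => 16777216 # 16MB; the default is 20MB
    }
  }
}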
Test plan

Used a 1-line big JSON file (~1GB), limited the Java heap to 512MB, and processed it with a file input plugin. It goes OOM.

Generate one big JSON file

Use this script to generate it:

require "json"
part = [
{:name => "Jannik", :surname => "Sinner"},
{:name => "Novak", :surname => "Djokovic"},
{:name => "Rafa", :surname => "Nadal"},
{:name => "Roger", :surname => "Federer"},
{:name => "Pete", :surname => "Sampras"},
{:name => "André", :surname => "Agassi"},
{:name => "Rod", :surname => "Laver"},
{:name => "Ivan", :surname => "Lendl"},
{:name => "Bjorn", :surname => "Borg"},
{:name => "John", :surname => "McEnroe"},
{:name => "Jimmy", :surname => "Connors"}
]
json_part = JSON.generate(part)
out_file = File.open("big_single_line.json", "a")
out_file.write "{"
counter = 1
desired_size = 1024 * 1024 * 1024 # target ~1GB of JSON on a single line
actual_size = 0
while actual_size < desired_size do
json_fragment = "\"field_#{counter}\": #{json_part}"
actual_size += json_fragment.size
if actual_size < desired_size
json_fragment += ","
end
counter += 1
out_file.write json_fragment
end
out_file.write "}\r\n"
out_file.flush
puts "Done! output file is #{out_file.size} bytes"
out_file.close

Configure Logstash

Configure this patch PR in the Gemfile, replacing the "logstash-codec-json_lines" entry with:

"logstash-codec-json_lines", :path => "/Users/andrea/workspace/logstash_plugins/logstash-codec-json_lines"

then execute:

bin/logstash-plugin install --no-verify

and run the pipeline.

Result

It fails with the following logs:
The problem with this approach is that the size limit only works once the data has been fully loaded into memory. In this case, as the exception's stacktrace shows, the error is thrown earlier, while the data is still being read from the IO. It can't be caught by this codec because it's raised by the input.
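To make that concrete, a conceptual sketch of the read path (the delimiter, limit, and read loop are simplified assumptions, not the actual input plugin code):

# Conceptual sketch only: the file input tokenizes the stream itself, so
# the "input buffer full" error fires here, in the input's read loop,
# before the codec's decode (and any rescue inside it) is ever reached.
require "filewatch/buftok"

tokenizer = FileWatch::BufferedTokenizer.new("\n", 32 * 1024) # input-side limit

File.open("big_single_line.json") do |io|
  while chunk = io.read(16_384)
    # Raises RuntimeError("input buffer full") once the pending line
    # outgrows the limit, because no "\n" has been seen yet.
    lines = tokenizer.extract(chunk)
    lines.each { |line| } # stand-in for handing each line to the codec
  end
end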
Resolves #29. This is most useful when people accidentally use this codec on something that is not properly newline delimited. That can easily lead to an OOM.
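A spec-style sketch of that scenario, assuming the codec can be instantiated with a params hash the way the plugin's own specs do (the 1KB limit and the payload are arbitrary):

require "logstash/codecs/json_lines"

codec = LogStash::Codecs::JSONLines.new("decode_size_limit_bytes" => 1024)
codec.register

# Data with no newline anywhere just accumulates in the buffer; the new
# option turns that unbounded growth into an error instead of an OOM.
begin
  codec.decode("x" * 10_240) { |event| } # 10KB, no delimiter
rescue RuntimeError => e
  puts e.message # => "input buffer full"
end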
Superseded by #43