fluent-plugin-azure-loganalytics

Azure Log Analytics output plugin for Fluentd. The plugin aggregates semi-structured data in real-time and writes the buffered data via HTTPS request to Azure Log Analytics.

(Image: fluent-plugin-azure-loganalytics overview)

Requirements

fluent-plugin-azure-loganalytics   fluentd       ruby
>= 0.3.0                           >= v0.14.15   >= 2.1
< 0.3.0                            >= v0.12.0    >= 1.9

Installation

Installing gems into system Ruby

$ gem install fluent-plugin-azure-loganalytics

Installing gems into td-agent’s Ruby

If you installed td-agent and want to add this plugin, install it with td-agent-gem. td-agent bundles its own Ruby, so gems must be installed into td-agent’s Ruby, not the system Ruby:

$ /usr/sbin/td-agent-gem install fluent-plugin-azure-loganalytics

See also the Fluentd FAQ entry "I installed td-agent and want to add custom plugins. How do I do it?"

Configuration

Azure Log Analytics

To start using Log Analytics in the Microsoft Operations Management Suite (OMS), you need to create either an OMS workspace using the OMS website or a Log Analytics workspace using your Azure subscription. Workspaces created either way are functionally equivalent. See the Azure documentation on creating a Log Analytics workspace for instructions.

Once you have the workspace, get its Workspace ID and Shared Key (either the Primary Key or the Secondary Key). The Log Analytics HTTP Data Collector API needs both to post data to Log Analytics.
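
To illustrate why both values are needed, here is a minimal Ruby sketch of how a request to the HTTP Data Collector API is typically signed with them. It is not the plugin's own code. The helper name build_signature and the sample ID and key are hypothetical; the string-to-sign layout follows the publicly documented SharedKey scheme for the /api/logs resource.

```ruby
require 'openssl'
require 'base64'
require 'time'

# Hypothetical helper: builds the Authorization header value for one POST.
# customer_id is the Workspace ID, shared_key is the Base64 key from the portal.
def build_signature(customer_id, shared_key, rfc1123_date, content_length)
  string_to_sign = [
    'POST',
    content_length.to_s,
    'application/json',
    "x-ms-date:#{rfc1123_date}",
    '/api/logs'
  ].join("\n")
  decoded_key = Base64.decode64(shared_key)
  digest = OpenSSL::HMAC.digest(OpenSSL::Digest.new('sha256'), decoded_key, string_to_sign)
  "SharedKey #{customer_id}:#{Base64.strict_encode64(digest)}"
end

body = '[{"message":"hello"}]'
date = Time.now.utc.httpdate # RFC 1123 date, sent as the x-ms-date header
# Dummy workspace ID and key, for illustration only.
auth = build_signature('00000000-0000-0000-0000-000000000000',
                       Base64.strict_encode64('not-a-real-key'),
                       date, body.bytesize)
puts auth
```

The resulting value goes into the Authorization header of the POST, which is why a wrong Workspace ID or key shows up as an authentication failure rather than as missing data.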

Fluentd - fluent.conf

<match azure-loganalytics.**>
    @type azure-loganalytics
    customer_id CUSTOMER_ID   # Customer ID aka WorkspaceID String
    shared_key KEY_STRING     # The primary or the secondary Connected Sources client authentication key
    log_type EVENT_TYPE_NAME  # The name of the event type. ex) ApacheAccessLog
    endpoint myendpoint
    add_time_field true
    time_field_name mytime
    time_format %s
    localtime true
    add_tag_field true
    tag_field_name mytag
</match>
  • customer_id (required) - Your Operations Management Suite workspace ID

  • shared_key (required) - The primary or the secondary Connected Sources client authentication key

  • log_type (required) - The name of the event type that is being submitted to Log Analytics. log_type only supports alpha characters

  • endpoint (optional) - Default:'ods.opinsights.azure.com'. The service endpoint. You may want to use this param in case of sovereign cloud that has a different endpoint from the public cloud

  • time_generated_field (optional) - Default:''(empty string). The name of the time generated field. Be careful: the value of this field must strictly follow the ISO 8601 format (YYYY-MM-DDThh:mm:ssZ). See the Log Analytics HTTP Data Collector API documentation for more details

  • azure_resource_id (optional) - Default:''(empty string). The resource ID of the Azure resource the data should be associated with. This populates the _ResourceId property and allows the data to be included in resource-context queries in Azure Log Analytics (Azure Monitor). If this field isn't specified, the data will not be included in resource-context queries. The format is /subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/{resourceProviderNamespace}/{resourceType}/{resourceName}. See the Azure documentation for more detail on the resource ID format.

  • add_time_field (optional) - Default:true. Inserts a time field into each record

  • time_field_name (optional) - Default:time. The name of the inserted time field. Used only when add_time_field is true

  • localtime (optional) - Default:false. The time field is written in UTC (Coordinated Universal Time) by default. Set localtime to true to use local time instead. Valid only when add_time_field is true

  • time_format (optional) - Default:%s. The format of the inserted time field. The default, %s, is Unix epoch time. For a more human-readable value, set it to %FT%T%z, for example. Valid only when add_time_field is true

  • add_tag_field (optional) - Default:false. Inserts a tag field into each record

  • tag_field_name (optional) - Default:tag. The name of the inserted tag field. Used only when add_tag_field is true
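
The effect of the time and tag options can be sketched in Ruby as follows. This is an illustration under assumptions, not the plugin's implementation: the helper name decorate is hypothetical, and its keyword arguments simply mirror the option names and defaults listed above.

```ruby
# Hypothetical helper mirroring the documented options and their defaults.
def decorate(record, tag, epoch,
             add_time_field: true, time_field_name: 'time',
             time_format: '%s', localtime: false,
             add_tag_field: false, tag_field_name: 'tag')
  out = record.dup
  if add_time_field
    t = Time.at(epoch)
    t = localtime ? t.localtime : t.utc # UTC unless localtime is set
    out[time_field_name] = t.strftime(time_format)
  end
  out[tag_field_name] = tag if add_tag_field
  out
end

epoch = 1481347732 # 2016-12-10 05:28:52 UTC

# Defaults: only an epoch "time" field is added.
p decorate({ 'code' => 200 }, 'azure-loganalytics.access', epoch)

# Custom field names and a human-readable format.
p decorate({ 'code' => 200 }, 'azure-loganalytics.access', epoch,
           time_field_name: 'mytime', time_format: '%FT%T%z',
           add_tag_field: true, tag_field_name: 'mytag')
```

With the defaults, the record gains only a time field holding the epoch seconds as a string. The second call yields mytime and mytag fields, matching the names used in the sample configuration.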

Configuration examples

fluent-plugin-azure-loganalytics adds time and tag attributes to each record when add_time_field and add_tag_field are true, respectively. Below are four example configurations: the default configuration, a configuration with all options, one with a typecast filter, and one with CSV input.

(1) Default Configuration (No options)

fluent_1.conf

<source>
    @type tail                         # input plugin
    path /var/log/apache2/access.log   # monitoring file
    pos_file /tmp/fluentd_pos_file     # position file
    format apache                      # format
    tag azure-loganalytics.access      # tag
</source>

<match azure-loganalytics.**>
    @type azure-loganalytics
    customer_id 818f7bbc-8034-4cc3-b97d-f068dd4cd658
    shared_key ppC5500KzCcDsOKwM1yWUvZydCuC3m+ds/2xci0byeQr1G3E0Jkygn1N0Rxx/yVBUrDE2ok3vf4ksCzvBmQXHw==(dummy)
    log_type ApacheAccessLog
</match>

(2) Configuration with All Options

fluent_2.conf

<source>
    @type tail                         # input plugin
    path /var/log/apache2/access.log   # monitoring file
    pos_file /tmp/fluentd_pos_file     # position file
    format apache                      # format
    tag azure-loganalytics.access      # tag
</source>

<match azure-loganalytics.**>
    @type azure-loganalytics
    customer_id 818f7bbc-8034-4cc3-b97d-f068dd4cd658
    shared_key ppC5500KzCcDsOKwM1yWUvZydCuC3m+ds/2xci0byeQr1G3E0Jkygn1N0Rxx/yVBUrDE2ok3vf4ksCzvBmQXHw==(dummy)
    log_type ApacheAccessLog
    azure_resource_id /subscriptions/11111111-1111-1111-1111-111111111111/resourceGroups/otherResourceGroup/providers/Microsoft.Storage/storageAccounts/examplestorage
    add_time_field true
    time_field_name mytime
    time_format %FT%T%z
    localtime true
    add_tag_field true
    tag_field_name mytag
</match>

(3) Configuration with Typecast filter

Add the typecast filter when you want to cast field types. In this example, the code and size fields are cast to integers by the typecast filter.

fluent_typecast.conf

<source>
    @type tail                         # input plugin
    path /var/log/apache2/access.log   # monitoring file
    pos_file /tmp/fluentd_pos_file     # position file
    format apache                      # format
    tag azure-loganalytics.access      # tag
</source>

<filter **>
    @type typecast
    types host:string,user:string,method:string,path:string,referer:string,agent:string,code:integer,size:integer
</filter>

<match azure-loganalytics.**>
    @type azure-loganalytics
    customer_id 818f7bbc-8034-4cc3-b97d-f068dd4cd658
    shared_key ppC5500KzCcDsOKwM1yWUvZydCuC3m+ds/2xci0byeQr1G3E0Jkygn1N0Rxx/yVBUrDE2ok3vf4ksCzvBmQXHw==(dummy)
    log_type ApacheAccessLog
    add_time_field true
    time_field_name mytime
    time_format %FT%T%z
    localtime true
    add_tag_field true
    tag_field_name mytag
</match>

[note] You need to install fluent-plugin-filter_typecast for the sample configuration above.

$ gem install fluent-plugin-filter_typecast
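
As a rough illustration of what the types line in the filter does to a record, here is a small Ruby sketch. It is not the filter's actual code: the CASTS table and the typecast helper are hypothetical names, and only three target types are handled.

```ruby
# Hypothetical cast table covering only the types used in this sketch.
CASTS = {
  'string'  => ->(v) { v.to_s },
  'integer' => ->(v) { v.to_i },
  'float'   => ->(v) { v.to_f }
}.freeze

# types uses the same "field:type,field:type" syntax as the filter config.
def typecast(record, types)
  spec = types.split(',').map { |pair| pair.split(':', 2) }.to_h
  record.each_with_object({}) do |(key, value), out|
    cast = CASTS[spec[key]]
    out[key] = cast ? cast.call(value) : value # unlisted fields pass through
  end
end

parsed = { 'path' => '/test/foo.html', 'code' => '200', 'size' => '323' }
p typecast(parsed, 'path:string,code:integer,size:integer')
```

After the cast, code and size are numbers rather than strings, so they are serialized as JSON numbers when the record is posted.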

(4) Configuration with CSV format as input and specific field type as output

To send logs that use a known delimiter (such as a comma or semicolon) to Log Analytics, you can use Fluentd's csv format together with the keys and types properties. This works with any delimited log; the example here uses a custom Nginx log.

fluent_csv.conf

Suppose your log format is defined as follows in /etc/nginx/conf.d/log.conf:

log_format appcustomlog '"$time_iso8601";"$hostname";$bytes_sent;$request_time;$upstream_response_length;$upstream_response_time;$content_length;"$remote_addr";$status;"$host";"$request";"$http_user_agent"';

And this log format is enabled in /etc/nginx/conf.d/virtualhost.conf:

server {
	...
	access_log /var/log/nginx/access.log appcustomlog;
	...
}

You can use the following configuration for the source to tail the log file and parse it with the proper field types.

<source>
  @type tail
  path /var/log/nginx/access.log
  pos_file /var/log/td-agent/access.log.pos
  tag nginx.accesslog
  format csv
  delimiter ;
  keys time,hostname,bytes_sent,request_time,upstream_response_length,upstream_response_time,content_length,remote_addr,status,host,request,http_user_agent
  types time:time,hostname:string,bytes_sent:float,request_time:float,upstream_response_length:float,upstream_response_time:float,content_length:string,remote_addr:string,status:integer,host:string,request:string,http_user_agent:string
  time_key time
  time_format %FT%T%z
</source>

<match nginx.accesslog>
    @type azure-loganalytics
    customer_id 818f7bbc-8034-4cc3-b97d-f068dd4cd658
    shared_key ppC5500KzCcDsOKwM1yWUvZydCuC3m+ds/2xci0byeQr1G3E0Jkygn1N0Rxx/yVBUrDE2ok3vf4ksCzvBmQXHw==(dummy)
    log_type NginxAccessLog
    time_generated_field time
    time_format %FT%T%z
    add_tag_field true
    tag_field_name mytag
</match>
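
To show how a semicolon-delimited line maps onto keys, here is a minimal Ruby sketch that uses Ruby's standard csv library in place of Fluentd's parser. The KEYS list is an assumption: it names one key per column of the appcustomlog log_format, in the same order. The input is the Nginx sample line from this README.

```ruby
require 'csv'

# One key per column of the appcustomlog log_format, in the same order.
KEYS = %w[time hostname bytes_sent request_time upstream_response_length
          upstream_response_time content_length remote_addr status host
          request http_user_agent].freeze

line = '"2017-12-13T11:31:59+00:00";"nginx0001";21381;0.238;20882;0.178;-;' \
       '"193.192.35.178";200;"mynginx.domain.com";' \
       '"GET /mysite/picture.jpeg HTTP/1.1";' \
       '"Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/63.0.3239.84 Safari/537.36"'

# Quoted fields may contain the delimiter, as the user agent does here.
values = CSV.parse_line(line, col_sep: ';')
record = KEYS.zip(values).to_h

# Cast a few fields, as the types property would.
record['status'] = record['status'].to_i
record['request_time'] = record['request_time'].to_f

puts "#{values.size} columns"
p record
```

Because values are paired with keys by position, the number and order of keys has to match the columns of the log format; otherwise values end up under the wrong names.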

Sample inputs and expected records

The expected output records for the sample inputs look like this:

Sample Input (apache access log)

124.211.152.156 - - [10/Dec/2016:05:28:52 +0000] "GET /test/foo.html HTTP/1.1" 200 323 "-" "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36"

Output Record

The output record for the sample input appears in the Log Analytics portal like this:

(Image: fluent-plugin-azure-loganalytics output image)

Sample Input (nginx custom access log)

"2017-12-13T11:31:59+00:00";"nginx0001";21381;0.238;20882;0.178;-;"193.192.35.178";200;"mynginx.domain.com";"GET /mysite/picture.jpeg HTTP/1.1";"Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/63.0.3239.84 Safari/537.36"

Output Record

Part of the output record for the sample input appears in the Log Analytics portal like this, with fields of type _s (string) or _d (double):

(Image: fluent-plugin-azure-loganalytics output image)

Tests

Running test code (using System rake)

$ git clone https://github.com/yokawasa/fluent-plugin-azure-loganalytics.git
$ cd fluent-plugin-azure-loganalytics

# edit CONFIG params of test/plugin/test_azure_loganalytics.rb
$ vi test/plugin/test_azure_loganalytics.rb

# run test
$ rake test

Running test code (using td-agent's rake)

$ git clone https://github.com/yokawasa/fluent-plugin-azure-loganalytics.git
$ cd fluent-plugin-azure-loganalytics

# edit CONFIG params of test/plugin/test_azure_loganalytics.rb
$ vi test/plugin/test_azure_loganalytics.rb

# run test 
$ /opt/td-agent/embedded/bin/rake test

Creating package, running and testing locally

$ rake build
$ rake install:local

# running fluentd with your fluent.conf
$ fluentd -c fluent.conf -vv &

# send test Apache requests to exercise the plugin (only when the input source is an Apache access log)
$ ab -n 5 -c 2 http://localhost/test/foo.html

Data Limits

As described in the Azure Monitor Data Collector API documentation, data posted to the API is subject to some constraints. The relevant ones are:

  • Max payload size: 30 MB
  • Max field value size: 32 KB
  • Max number of characters in a field name: 500

Note that the plugin checks the max payload size before it posts to the API (>= 0.7.0). However, it does not check the max field value size or the max number of characters in a field name.
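
Since the field-level limits are not checked by the plugin, a pre-flight check along the following lines could be run on records before they reach it. This is a sketch under assumptions, not part of the plugin: the helper name limit_violations is hypothetical, and MB and KB are taken to mean 1024-based units.

```ruby
require 'json'

# Limits from the list above; 1024-based units are an assumption.
MAX_PAYLOAD_BYTES     = 30 * 1024 * 1024
MAX_FIELD_VALUE_BYTES = 32 * 1024
MAX_FIELD_NAME_CHARS  = 500

# Hypothetical helper: returns a list of human-readable limit violations.
def limit_violations(records)
  problems = []
  records.each_with_index do |record, i|
    record.each do |name, value|
      name_len = name.to_s.length
      if name_len > MAX_FIELD_NAME_CHARS
        problems << "record #{i}: field name too long (#{name_len} chars)"
      end
      size = value.to_s.bytesize
      if size > MAX_FIELD_VALUE_BYTES
        problems << "record #{i}: value of '#{name}' too large (#{size} bytes)"
      end
    end
  end
  payload = JSON.generate(records).bytesize
  problems << "payload too large (#{payload} bytes)" if payload > MAX_PAYLOAD_BYTES
  problems
end

puts limit_violations([{ 'message' => 'x' * (32 * 1024 + 1) }])
```

An empty result means none of the three limits is exceeded for that batch.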

Change log

Links

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/yokawasa/fluent-plugin-azure-loganalytics.

Copyright

Copyright: Copyright (c) 2016- Yoichi Kawasaki
License: Apache License, Version 2.0