Multiline matching, collection, and formatting of Java (log4j2) logs with fluentd

2019-01-22 15:09:10

log4j2 log pattern: %d %-5level [%t] %class{36}.%M:%L - %msg%n

The resulting output in the Tomcat log looks like this:

2019-01-22 14:07:10,599 INFO [http-nio-8076-exec-1] net.sf.log4jdbc.log.log4j2.Log4j2SpyLogDelegator.sqlTimingOccurred:243 - 36. select artcicleent0_.id as id1_6_, articleent0_.created_at as created_2_6_, articleent0_.updated_at

as updated_3_6_, articcleent0_.audit_at as audit_at4_6_, articleent0_.audit_remark as audit_re5_6_,

articleent0_.acudited_stactus as audited_6_6_, articleent0_.auditor_id as auditaor_7_6_, articlceent0_.artcicle_category_id

as article23_6_, articdleent0_.creator_id as creator_8_6_, articleent0_.hot_time as hot_time9_6_,

articleent0_.idedntifdier as identif10_6_, articleent0_.keywords as keyword11_6_, articleent0_.like_count

as like_co12_6_, artdicleent0_.media_info as media_i13_6_, articleent0_.online_status as online_14_6_,

articleent0_.posted_by as posted_15_6_, articdleent0_.publish_time as publish16_6_, articlceent0_.resource_info

as resourc17_6_, articddleent0_.source_pladtdform as source_18_6_, articleent0_.stick_expiration_time

as stick_e19_6_, articdleent0_.stick_stadtus as stick_s20_6_, articleent0_.title as titlce21_6_,

articleent0_.user_id as user_id22_6_ from article articldeent0_ where articleent0_.u1ser_id=202

order by articleent0_.created_at DESC {executed in 0 ms}

2019-01-22 14:07:10,602 DEBUG [http-nio-8076-exec-1] cn.amd5.community.common.web.config.Log4j2SqlLogDelegator.resultSetCollected:70 - Returned empty results
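To make the mapping between the log4j2 pattern and the output above concrete, here is a minimal Python sketch that splits one first line into its fields. The regex, the field names, and the `parse_line` helper are my own illustration and are not part of fluentd or log4j2. It assumes `%d` renders as `yyyy-MM-dd HH:mm:ss,SSS`, as it does in the sample output.

```python
import re

# Hypothetical regex mirroring: %d %-5level [%t] %class{36}.%M:%L - %msg%n
LINE_RE = re.compile(
    r"^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "   # %d
    r"(?P<level>\S+)\s+"                                        # %-5level (padded to 5 chars)
    r"\[(?P<thread>[^\]]+)\] "                                  # [%t]
    r"(?P<logger>\S+)\.(?P<method>[^.\s:]+):(?P<line>\d+) - "   # %class{36}.%M:%L
    r"(?P<msg>.*)$",                                            # %msg
    re.DOTALL,
)

def parse_line(line):
    """Return the fields of a log4j2 first line as a dict, or None for other lines."""
    m = LINE_RE.match(line)
    return m.groupdict() if m else None

SAMPLE_LINE = (
    "2019-01-22 14:07:10,602 DEBUG [http-nio-8076-exec-1] "
    "cn.amd5.community.common.web.config.Log4j2SqlLogDelegator."
    "resultSetCollected:70 - Returned empty results"
)

record = parse_line(SAMPLE_LINE)
print(record["level"], record["method"], record["line"], record["msg"])

# A continuation line of the SQL statement does not start with a timestamp,
# so it is not recognized as a log entry of its own.
continuation = parse_line("order by articleent0_.created_at DESC {executed in 0 ms}")
print(continuation)
```

The second call returning `None` is the heart of the problem: continuation lines carry no timestamp, level, or thread, so a single-line parser has nothing to match them against.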

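Java logs are highly irregular, mixing single-line and multi-line entries. When I first tried to parse them, I could not get a pattern that matched every entry, and I could not find a multiline approach online. After trying many regular expressions I fell back to format none, that is, no parsing at all. Once the logs reached Kibana, the lines of a multi-line entry often appeared out of order, and tabular database query results were scrambled beyond recognition.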

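I had planned to switch to filebeat. After setting up a recent Kibana version locally, I found that Kibana even ships tutorials for collecting logs from common applications with filebeat and deriving metrics from them.

I then looked up how filebeat collects multi-line Java logs and found that it supports multiline matching. Since filebeat can do this, I figured fluentd, with its rich plugin ecosystem, should have an equivalent. I searched the official site again and finally found the multiline parser plugin, parser_multiline, together with its usage documentation.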

The rest of this post describes how to collect multi-line Java (log4j2) logs with fluentd:

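1. Prerequisite: an EFK stack is already installed.

For setting up the EFK logging system, see:

ELK log analysis platform cluster setup

Building an EFK log management system with fluentd in place of logstash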

2. Install the plugins.

td-agent-gem install fluent-plugin-elasticsearch

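td-agent-gem install fluent-plugin-grep # filter plugin

td-agent-gem install fluent-plugin-tail-multiline # multiline plugin (multiline matching worked for me without it, presumably because fluentd ships its own multiline parser; if matching fails or you get errors, install this plugin and try again)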

3. Configure the fluentd collection rules.

#vim /etc/td-agent/td-agent.conf

# Add the following

# Java logs

# Forward logs to Elasticsearch for storage, for use by Kibana

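<match dev.**>
  # match pattern assumed (it covers the dev.*.log tags set in the source blocks); adjust it to your tags
  @type elasticsearch
  host gdg-dev
  port 9200
  flush_interval 10s
  index_name ${tag}-%Y.%m.%d
  type_name ${tag}-%Y.%m.%d
  logstash_format true
  logstash_prefix ${tag}
  include_tag_key true
  tag_key @log_name
  <buffer tag, time>
    # chunk keys assumed from the ${tag} and %Y.%m.%d placeholders above
    timekey 1h
  </buffer>
</match>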
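To know which indices to look for in Kibana, it helps to see how the index name is put together when logstash_format is true. The Python sketch below is only an illustration, and `logstash_index` is a hypothetical helper, not part of any library. It assumes the documented defaults of fluent-plugin-elasticsearch (date format %Y.%m.%d, "-" as separator, UTC-based index dates) and that ${tag} resolves to the event's tag. The +08:00 offset in the example timestamps is also an assumption.

```python
from datetime import datetime, timedelta, timezone

def logstash_index(prefix, event_time, dateformat="%Y.%m.%d", separator="-"):
    """Build '<prefix><separator><date>' using the UTC date of the event (assumed defaults)."""
    utc_time = event_time.astimezone(timezone.utc)
    return f"{prefix}{separator}{utc_time.strftime(dateformat)}"

# Hypothetical local time zone for the example timestamps.
CST = timezone(timedelta(hours=8))

afternoon = datetime(2019, 1, 22, 14, 7, 10, tzinfo=CST)
early_morning = datetime(2019, 1, 22, 7, 30, 0, tzinfo=CST)

index_a = logstash_index("dev.article.api.log", afternoon)
index_b = logstash_index("dev.article.api.log", early_morning)
print(index_a)  # dev.article.api.log-2019.01.22
print(index_b)  # dev.article.api.log-2019.01.21 (still the previous day in UTC)
```

Under these assumptions an index pattern such as dev.article.api.log-* in Kibana covers all daily indices for that tag. With logstash_format enabled, the index_name setting should not affect the result.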

# Collect and parse the logs
<source>
  @type tail
  format multiline
  format_firstline /\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}.\d{3}/
  format1 /^(?<access_time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}.\d{3}) (?<level>\S+)\s+\[(?<thread>\S+)\] (?<message>.*)/
  #time_format %Y-%m-%dT%H:%M:%S.%NZ
  path /usr/local/logs/*.bms.api.log
  pos_file /var/log/td-agent/bms.api.log.pos
  read_from_head true
  tag dev.bms.api.log
</source>

<source>
  @type tail
  # enable multiline matching
  format multiline
  # regex that identifies the first line of an entry; following lines are appended to it until format_firstline matches again
  format_firstline /\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}.\d{3}/
  # regex that extracts the fields from the assembled entry
  format1 /^(?<access_time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}.\d{3}) (?<level>\S+)\s+\[(?<thread>\S+)\] (?<message>.*)/
  #time_format %Y-%m-%dT%H:%M:%S.%NZ
  path /usr/local/logs/*.article.api.log
  pos_file /var/log/td-agent/article.api.log.pos
  read_from_head true
  tag dev.article.api.log
</source>
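The interplay of format_firstline and format1 can be sketched in a few lines of Python. This is a simplified emulation for illustration, not fluentd's actual implementation. The two regexes are translated from the config (Ruby's `(?<name>...)` becomes `(?P<name>...)`), and `re.DOTALL` stands in for the way the multiline parser lets `.` match newlines. `group_events`, `parse`, and the abbreviated SQL in `SAMPLE` are hypothetical.

```python
import re

# Translated from format_firstline: matched anywhere in the line, as in the config.
FIRSTLINE = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}.\d{3}")

# Translated from format1; DOTALL lets (?P<message>.*) span continuation lines.
FORMAT1 = re.compile(
    r"^(?P<access_time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}.\d{3}) "
    r"(?P<level>\S+)\s+\[(?P<thread>\S+)\] (?P<message>.*)",
    re.DOTALL,
)

def group_events(lines):
    """Yield one string per log entry: a first line plus its continuation lines."""
    buffer = None
    for line in lines:
        if FIRSTLINE.search(line):
            if buffer is not None:
                yield buffer          # a new entry starts, so flush the previous one
            buffer = line
        elif buffer is not None:
            buffer += "\n" + line     # continuation line of the current entry
        # lines seen before any first line are dropped in this sketch
    if buffer is not None:
        yield buffer

def parse(lines):
    """Apply FORMAT1 to every assembled entry and return the extracted fields."""
    records = []
    for event in group_events(lines):
        m = FORMAT1.match(event)
        if m:
            records.append(m.groupdict())
    return records

# Abbreviated version of the sample output (the SQL is shortened for readability).
SAMPLE = [
    "2019-01-22 14:07:10,599 INFO [http-nio-8076-exec-1] net.sf.log4jdbc.log.log4j2."
    "Log4j2SpyLogDelegator.sqlTimingOccurred:243 - 36. select id from article",
    "where user_id=202",
    "order by created_at DESC {executed in 0 ms}",
    "2019-01-22 14:07:10,602 DEBUG [http-nio-8076-exec-1] cn.amd5.community.common.web."
    "config.Log4j2SqlLogDelegator.resultSetCollected:70 - Returned empty results",
]

records = parse(SAMPLE)
for r in records:
    print(r["access_time"], r["level"], repr(r["message"][:40]))
```

The four physical lines become two records, and the SQL statement stays in one message field with its line breaks intact. One difference from this sketch: as far as I know, a real in_tail source only emits the last buffered entry once the next first line arrives, unless multiline_flush_interval is set.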

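4. View the collected logs in Kibana.

Single-line entries, multi-line entries, and entries containing tabular database query results are all collected, parsed, and displayed correctly, as shown in the screenshot: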

[Image: Multiline matching, collection, and formatting of Java (log4j2) logs with fluentd]
