log4j2 pattern layout: %d %-5level [%t] %class{36}.%M:%L - %msg%n
The resulting Tomcat log output looks like this:
2019-01-22 14:07:10,599 INFO [http-nio-8076-exec-1] net.sf.log4jdbc.log.log4j2.Log4j2SpyLogDelegator.sqlTimingOccurred:243 - 36. select articleent0_.id as id1_6_, articleent0_.created_at as created_2_6_, articleent0_.updated_at
as updated_3_6_, articleent0_.audit_at as audit_at4_6_, articleent0_.audit_remark as audit_re5_6_,
articleent0_.audited_status as audited_6_6_, articleent0_.auditor_id as auditor_7_6_, articleent0_.article_category_id
as article23_6_, articleent0_.creator_id as creator_8_6_, articleent0_.hot_time as hot_time9_6_,
articleent0_.identifier as identif10_6_, articleent0_.keywords as keyword11_6_, articleent0_.like_count
as like_co12_6_, articleent0_.media_info as media_i13_6_, articleent0_.online_status as online_14_6_,
articleent0_.posted_by as posted_15_6_, articleent0_.publish_time as publish16_6_, articleent0_.resource_info
as resourc17_6_, articleent0_.source_platform as source_18_6_, articleent0_.stick_expiration_time
as stick_e19_6_, articleent0_.stick_status as stick_s20_6_, articleent0_.title as title21_6_,
articleent0_.user_id as user_id22_6_ from article articleent0_ where articleent0_.user_id=202
order by articleent0_.created_at DESC {executed in 0 ms}
2019-01-22 14:07:10,602 DEBUG [http-nio-8076-exec-1] cn.amd5.community.common.web.config.Log4j2SqlLogDelegator.resultSetCollected:70 - Returned empty results
Because the Java log output is so irregular (records span both single and multiple lines), I initially could not get the parser to match every entry when formatting the logs, and I found no workable multi-line matching method online either. After trying many regular expressions, I fell back to format none and collected the logs unformatted. When displayed in Kibana, multi-line records frequently arrived out of order, and tables of database query results were scrambled beyond recognition.
I considered switching to Filebeat; after setting up the latest Kibana locally, I found that Kibana even ships tutorials for collecting common application logs with Filebeat and analyzing them with metrics.
Searching for how Filebeat collects multi-line Java logs, I found it has a multi-line matching option. I figured that if Filebeat supports this, fluentd's rich plugin ecosystem surely would too. Sure enough, the official fluentd docs describe a multi-line parser plugin, parser_multiline, along with usage instructions.
Below is how to configure fluentd to collect and parse multi-line Java (log4j2) logs:
1. Prerequisite: an EFK stack is already installed.
For setting up an EFK logging system, refer to:
2. Install the plugins.
td-agent-gem install fluent-plugin-elasticsearch
td-agent-gem install fluent-plugin-grep #filter plugin
td-agent-gem install fluent-plugin-tail-multiline #multi-line matching plugin (multi-line matching worked for me even without this plugin installed; if matching fails or you hit errors, install it and try collecting again)
3. Configure fluentd's collection rules.
#vim /etc/td-agent/td-agent.conf
#add the following content
#Java logs
#forward logs to Elasticsearch for storage, to be queried from Kibana
<match dev.**>
  @type elasticsearch
  host gdg-dev
  port 9200
  flush_interval 10s
  index_name ${tag}-%Y.%m.%d
  type_name ${tag}-%Y.%m.%d
  logstash_format true
  logstash_prefix ${tag}
  include_tag_key true
  tag_key @log_name
  <buffer tag, time>
    timekey 1h
  </buffer>
</match>
#collect and parse the logs
<source>
  @type tail
  format multiline
  format_firstline /\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}.\d{3}/
  format1 /^(?<access_time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}.\d{3}) (?<level>\S+)\s+\[(?<thread>\S+)\] (?<message>.*)/
  #time_format %Y-%m-%dT%H:%M:%S.%NZ
  path /usr/local/logs/*.bms.api.log
  pos_file /var/log/td-agent/bms.api.log.pos
  read_from_head true
  tag dev.bms.api.log
</source>
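The format1 expression can be sanity-checked outside fluentd before deploying. A minimal Python sketch (Python needs (?P<...>) for named groups, but the pattern is otherwise identical) applied to the DEBUG line from the sample output above:

```python
import re

# Same pattern as fluentd's format1, with Python-style named groups
FORMAT1 = re.compile(
    r'^(?P<access_time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}.\d{3}) '
    r'(?P<level>\S+)\s+\[(?P<thread>\S+)\] (?P<message>.*)'
)

line = ('2019-01-22 14:07:10,602 DEBUG [http-nio-8076-exec-1] '
        'cn.amd5.community.common.web.config.Log4j2SqlLogDelegator'
        '.resultSetCollected:70 - Returned empty results')

m = FORMAT1.match(line)
print(m.group('access_time'))  # 2019-01-22 14:07:10,602
print(m.group('level'))        # DEBUG
print(m.group('thread'))       # http-nio-8076-exec-1
```

Note that the unescaped dot before \d{3} is what lets the pattern accept log4j2's comma-separated milliseconds (10:07:10,602).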
<source>
  @type tail
  format multiline #enable multi-line matching
  format_firstline /\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}.\d{3}/ #regex identifying the first line of a record; following lines are appended to the record until format_firstline matches again
  format1 /^(?<access_time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}.\d{3}) (?<level>\S+)\s+\[(?<thread>\S+)\] (?<message>.*)/ #regex that parses the assembled record
  #time_format %Y-%m-%dT%H:%M:%S.%NZ
  path /usr/local/logs/*.article.api.log
  pos_file /var/log/td-agent/article.api.log.pos
  read_from_head true
  tag dev.article.api.log
</source>
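Conceptually, format_firstline drives the record grouping: a new record starts whenever a line matches it, and every non-matching line is appended to the record in progress. A rough Python sketch of that grouping logic (an illustration of the idea, not fluentd's actual implementation):

```python
import re

# The same first-line pattern used in format_firstline above
FIRSTLINE = re.compile(r'\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}.\d{3}')

def group_records(lines):
    """Group raw lines into logical records: a line matching the
    first-line pattern starts a new record, anything else is a
    continuation of the current one."""
    records, current = [], []
    for line in lines:
        if FIRSTLINE.match(line):
            if current:
                records.append('\n'.join(current))
            current = [line]
        elif current:
            current.append(line)
    if current:
        records.append('\n'.join(current))
    return records

raw = [
    '2019-01-22 14:07:10,599 INFO [http-nio-8076-exec-1] ... select ...',
    'as updated_3_6_, ...',
    'order by articleent0_.created_at DESC {executed in 0 ms}',
    '2019-01-22 14:07:10,602 DEBUG [http-nio-8076-exec-1] ... empty results',
]
print(len(group_records(raw)))  # 2
```

Run against the abbreviated sample above, the three-line SQL entry and the single-line DEBUG entry come out as two records, which is exactly what keeps multi-line query logs from being scrambled in Kibana.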
4. View the collected logs in Kibana.
Single-line entries, multi-line entries, and logs containing tables of database query results are all collected, parsed, and displayed correctly, as shown: