Big Data Hands-On Study Notes 7 - Flume Distributed Deployment and Data Collection

Published: 2019-05-13 05:10 | Category: Study Log | Views: 1283 | Comments: 0

The notes below are a brief record; detailed hands-on content is to be added later.

http://flume.apache.org/

[zeal@data1 softwares]$ tar -zxf apache-flume-1.7.0-bin.tar.gz -C /opt/modules/

Copy the flume installation from data1 to data2

Edit the flume configuration file on data2

Copy flume from data2 to data3, then adjust its configuration file


*[Key step] Configure flume on data1*


Clean up the log file format

Convert tabs in the file to commas

[zeal@data1 datas]$ cat SogouQ.reduced|tr "\t" "," > weblog.log

Convert spaces in the file to commas

[zeal@data1 datas]$ cat weblog.log|tr " " "," > weblogs.log
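The two tr passes above can be collapsed into a single command. A minimal local sketch (the sample file and paths are stand-ins for SogouQ.reduced):

```shell
# Translate both tabs and spaces to commas in one pass
# (equivalent to the two tr steps above).
printf 'keyword\t1 2\n' > /tmp/SogouQ.sample      # tiny stand-in input
tr '\t ' ',,' < /tmp/SogouQ.sample > /tmp/weblogs.sample
cat /tmp/weblogs.sample   # keyword,1,2
```

This also avoids the intermediate weblog.log file.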

Send weblogs.log from data1 to data2 and data3


Import flume-src into Eclipse

Develop the flume-ng-hbase-sink module

Export the jar file and replace the corresponding jar under lib


Configure the kafka section of flume-config
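The kafka portion of agent1's config might look like the sketch below. The agent, sink, and channel names here are assumptions; Flume 1.7's KafkaSink takes the `kafka.*`-prefixed properties:

```properties
# Kafka sink on agent1 — illustrative names, adjust to the real config
agent1.sinks.kafkaSink.type = org.apache.flume.sink.kafka.KafkaSink
agent1.sinks.kafkaSink.kafka.bootstrap.servers = data1.zeal.name:9092,data2.zeal.name:9092,data3.zeal.name:9092
agent1.sinks.kafkaSink.kafka.topic = weblogs
agent1.sinks.kafkaSink.kafka.flumeBatchSize = 20
agent1.sinks.kafkaSink.kafka.producer.acks = 1
agent1.sinks.kafkaSink.channel = kafkaC
```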


Develop the application-service simulator weblogs.jar, then upload and distribute it to the servers

Grant execute permission to weblogs.jar

[zeal@data2 tools]$ chmod 777 weblogs.jar

Write the application launcher script

[zeal@data2 datas]$ touch weblog-shell.sh

[zeal@data2 datas]$ vi weblog-shell.sh

#!/bin/bash

echo "start log......"

java -jar /opt/tools/weblogs.jar /opt/datas/weblogs.log /opt/datas/weblog-flume.log

[zeal@data2 datas]$ chmod u+x weblog-shell.sh

[zeal@data2 datas]$ touch weblog-flume.log
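The notes don't show weblogs.jar's internals; its assumed behavior — replaying the cleaned log into weblog-flume.log so that a `tail -F` exec source sees a live stream — can be sketched in shell. Paths here are local stand-ins:

```shell
# Replay lines from a source log into a target log to simulate
# live traffic (assumed behavior of weblogs.jar; paths are stand-ins).
SRC=/tmp/weblogs.log
DST=/tmp/weblog-flume.log
printf 'rec1\nrec2\nrec3\n' > "$SRC"   # sample input for the sketch
: > "$DST"                             # start with an empty target
while IFS= read -r line; do
  echo "$line" >> "$DST"
  sleep 0.1                            # throttle to mimic a live stream
done < "$SRC"
wc -l < "$DST"                         # line count should be 3
```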

Send the script weblog-shell.sh and the file weblog-flume.log to data3

Modify flume-config on data2 and data3

agent2.sources.r1.command = tail -F /opt/datas/weblog-flume.log

agent3.sources.r1.command = tail -F /opt/datas/weblog-flume.log


Write a flume startup script
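A sketch of such a startup script, assuming the install path from the tar step above and per-host agent names (agent1/agent2/agent3); the flags are standard `flume-ng agent` options, but the config file name is an assumption:

```shell
# Write a hypothetical start script; FLUME_HOME and the config file
# name must be matched to the actual deployment.
cat > /tmp/start-flume-agent.sh <<'EOF'
#!/bin/bash
FLUME_HOME=/opt/modules/apache-flume-1.7.0-bin
AGENT_NAME=$1   # agent1 on data1, agent2 on data2, agent3 on data3
nohup "${FLUME_HOME}/bin/flume-ng" agent \
  --conf "${FLUME_HOME}/conf" \
  --conf-file "${FLUME_HOME}/conf/flume-conf.properties" \
  --name "${AGENT_NAME}" \
  -Dflume.root.logger=INFO,console \
  > "/tmp/flume-${AGENT_NAME}.log" 2>&1 &
EOF
chmod u+x /tmp/start-flume-agent.sh
bash -n /tmp/start-flume-agent.sh && echo "syntax ok"
```

On each node it would be invoked as e.g. `./start-flume-agent.sh agent2`, after which that agent tails weblog-flume.log per the exec source command above.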



Delete old topics

[zeal@data3 zookeeper-3.4.5-cdh5.10.0]$ bin/zkCli.sh

[zk: localhost:2181(CONNECTED) 0] ls /

[hbase, hadoop-ha, zeal, admin, zookeeper, consumers, config, controller, yarn-leader-election, brokers, controller_epoch]

[zk: localhost:2181(CONNECTED) 1] ls /brokers

[topics, ids]

[zk: localhost:2181(CONNECTED) 2] ls /brokers/topics

[test]

[zk: localhost:2181(CONNECTED) 3] rmr /brokers/topics/test

[zk: localhost:2181(CONNECTED) 4] ls /brokers/topics

[]

[zk: localhost:2181(CONNECTED) 5] quit


Create the topic weblogs (replicated on two brokers)

[zeal@data2 kafka_2.11-0.9.0.0]$ bin/kafka-topics.sh --create --zookeeper data1.zeal.name:2181,data2.zeal.name:2181,data3.zeal.name:2181 --replication-factor 2 --partitions 1 --topic weblogs



Data flow overview:

flume-agent2/agent3 --> flume1 --> kafka (real-time)

flume-agent2/agent3 --> flume1 --> hbase (offline)
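Both flows share agent1's single source, fanned out to two channels via Flume's replicating channel selector (the default), with one channel feeding the kafka sink and the other the hbase sink. A sketch with assumed names:

```properties
# agent1 fan-out — every name below is illustrative
agent1.sources = avroSrc
agent1.channels = kafkaC hbaseC
agent1.sinks = kafkaSink hbaseSink

# avro source receiving events from agent2/agent3
agent1.sources.avroSrc.type = avro
agent1.sources.avroSrc.bind = data1.zeal.name
agent1.sources.avroSrc.port = 5555

# replicating selector copies each event to both channels
agent1.sources.avroSrc.selector.type = replicating
agent1.sources.avroSrc.channels = kafkaC hbaseC

agent1.sinks.kafkaSink.channel = kafkaC
agent1.sinks.hbaseSink.channel = hbaseC
```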


End-to-end test

1. Start zookeeper

2. Start hdfs

3. Start hbase

4. Start kafka

5. Start flume-agent1

6. Start flume-agent2 and flume-agent3

7. Start weblogs.jar


Reset the hbase table

[zeal@data1 hbase-0.98.6-cdh5.3.0]$ bin/hbase shell

2019-04-25 10:21:03,209 INFO  [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available

HBase Shell; enter 'help<RETURN>' for list of supported commands.

Type "exit<RETURN>" to leave the HBase Shell

Version 0.98.6-cdh5.3.0, rUnknown, Tue Dec 16 19:18:44 PST 2014


hbase(main):001:0> count 'weblogs'

2019-04-25 10:21:24,456 WARN  [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Current count: 1000, row: 706711585607364200:00:081556157660153                         

1360 row(s) in 1.3760 seconds


=> 1360

hbase(main):002:0> truncate 'weblogs'

Truncating 'weblogs' table (it may take a while):

 - Disabling table...

 - Dropping table...

 - Creating table...

0 row(s) in 6.2030 seconds


hbase(main):003:0> quit


Keywords: flume, big data, cluster, kafka