清华开源软件镜像
https://mirrors.tuna.tsinghua.edu.cn/apache/
1. 安装
1) Mac上安装
参考:
https://segmentfault.com/a/1190000016901469
https://www.jianshu.com/p/17676d34dd35
启动位置:
/usr/local/Cellar/apache-flink/1.6.2/libexec/bin
启动与停止:
$ ./start-cluster.sh
$ ./stop-cluster.sh
2)Windows上安装
参考
https://ci.apache.org/projects/flink/flink-docs-stable/start/flink_on_windows.html
启动位置
D:\flink-1.6.2-bin-scala_2\flink-1.6.2\bin
启动
$ start-cluster.bat
# Starting a local cluster with one JobManager process and one TaskManager process.
# You can terminate the processes via CTRL-C in the spawned shell windows.
# Web interface by default on [http://localhost:8081/.]
启动一个job
进入目录,运行
flink.bat run -c wikiedits.WikipediaAnalysis D:\Flink_Project\target\original-wiki-edits-1.0-SNAPSHOT.jar 127.0.0.1 9000
2. 运行第一个job
1) Java程序
- maven 模版
$ mvn archetype:generate \
-DarchetypeGroupId=org.apache.flink \
-DarchetypeArtifactId=flink-quickstart-java \
-DarchetypeVersion=1.6.2
mvn archetype:generate \
-DarchetypeGroupId=org.apache.flink \
-DarchetypeArtifactId=flink-quickstart-java \
-DarchetypeVersion=1.6.1 \
-DgroupId=wiki-edits \
-DartifactId=wiki-edits \
-Dversion=0.1 \
-Dpackage=wikiedits \
-DinteractiveMode=false
- 自定义任务
package FlinkTest;
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;
public class SocketTextStreamWordCount {
public static void main(String[] args) throws Exception {
//参数检查
if (args.length != 2) {
System.err.println("USAGE:\nSocketTextStreamWordCount <hostname> <port>");
return;
}
String hostname = args[0];
Integer port = Integer.parseInt(args[1]);
// set up the streaming execution environment
final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
//获取数据
DataStreamSource<String> stream = env.socketTextStream(hostname, port);
//计数
SingleOutputStreamOperator<Tuple2<String, Integer>> sum = stream.flatMap(new LineSplitter())
.keyBy(0)
.sum(1);
sum.print();
env.execute("Java WordCount from SocketTextStream Example");
}
public static final class LineSplitter implements FlatMapFunction<String, Tuple2<String, Integer>> {
@Override
public void flatMap(String s, Collector<Tuple2<String, Integer>> collector) {
String[] tokens = s.toLowerCase().split("\\W+");
for (String token: tokens) {
if (token.length() > 0) {
collector.collect(new Tuple2<String, Integer>(token, 1));
}
}
}
}
}
打包
进入工程目录(pom.xml所在目录),使用以下命令打包。
$ maven clean package -Dmaven.test.skip=true执行
flink run -c FlinkTest.SocketTextStreamWordCount \
/Users/dodoyuan/IdeaProjects/flink-quickstart/target/\
original-flink-quickstart-1.8-SNAPSHOT.jar 127.0.0.1 9000
完整参考来源:
https://www.jianshu.com/p/17676d34dd35
2) python程序
- 自定义任务
from flink.plan.Environment import get_environment
from flink.functions.GroupReduceFunction import GroupReduceFunction
class Adder(GroupReduceFunction):
def reduce(self, iterator, collector):
count, word = iterator.next()
count += sum([x[0] for x in iterator])
collector.collect((count, word))
env = get_environment()
data = env.from_elements("Who's there?",
"I think I hear them. Stand, ho! Who's there?")
data \
.flat_map(lambda x, c: [(1, word) for word in x.lower().split()]) \
.group_by(1) \
.reduce_group(Adder(), combinable=True) \
.output()
env.execute(local=True)
运行
/usr/local/Cellar/apache-flink/1.6.2/libexec/bin/pyflink.sh wordcount.py
参考:
http://www.willmcginnis.com/2015/11/08/getting-started-with-python-and-apache-flink/-
查看日志文件
其他参考:
https://ci.apache.org/projects/flink/flink-docs-stable/quickstart/java_api_quickstart.html