前言:
Flink最近刚发布了1.11.0版本,由于加了很多新的功能,对sql的支持更加全面,我就迫不及待的在本地运行了个demo,但是运行的时候报错了:
Exception in thread "main" java.lang.IllegalStateException: No operators defined in streaming topology. Cannot execute.
虽然报错,但任务却是正常运行,不过任务却不能指定jobname了。
原因分析
1. 为什么报错
先看下我的代码:
public static void main(String[] args) {
treamExecutionEnvironment streamEnv = StreamExecutionEnvironment.getExecutionEnvironment();
EnvironmentSettings settings = EnvironmentSettings.newInstance()
.inStreamingMode()
.useBlinkPlanner()
.build();
StreamTableEnvironment streamTableEnv = StreamTableEnvironment.create(streamEnv, settings);
streamEnv.setParallelism(1);
streamTableEnv.executeSql("CREATE TABLE source xxxx");
streamTableEnv.executeSql("CREATE TABLE sink xxxx");
streamTableEnv.executeSql("INSERT INTO sink xxxxx FROM source");
streamEnv.execute("FlinkTest");
}
报错代码在 streamEnv.execute()
, 程序找不到算子,所以报错?那问题出在哪?我们先回顾flink1.10.0的版本,看下之前是怎么执行的。之前的版本是通过 sqlUpdate()
方法执行sql的:
public void sqlUpdate(String stmt) {
List<Operation> operations = parser.parse(stmt);
if (operations.size() != 1) {
throw new TableException(UNSUPPORTED_QUERY_IN_SQL_UPDATE_MSG);
}
Operation operation = operations.get(0);
if (operation instanceof ModifyOperation) {
List<ModifyOperation> modifyOperations = Collections.singletonList((ModifyOperation) operation);
// 一直是false
if (isEagerOperationTranslation()) {
translate(modifyOperations);
} else {
// 加到transfomation的list中
buffer(modifyOperations);
}
} else if (operation instanceof CreateTableOperation) {
....
}
}
/**
* Defines the behavior of this {@link TableEnvironment}. If true the queries will
* be translated immediately. If false the {@link ModifyOperation}s will be buffered
* and translated only when {@link #execute(String)} is called.
*
* <p>If the {@link TableEnvironment} works in a lazy manner it is undefined what
* configurations values will be used. It depends on the characteristic of the particular
* parameter. Some might used values current to the time of query construction (e.g. the currentCatalog)
* and some use values from the time when {@link #execute(String)} is called (e.g. timeZone).
*
* @return true if the queries should be translated immediately.
*/
protected boolean isEagerOperationTranslation() {
return false;
}
从isEagerOperationTranslation
方法注释就很清楚的知道了,任务只有在 调用execute(String)
方法的时候才会把算子遍历组装成task,这其实是1.11版本之前flink运行sql任务的逻辑。但是1.11版本后,我们不需要再显示指定 execute(String)
方法执行sql任务了(jar包任务不受影响)。下面我们来看1.11版本的 executeSql方法:
@Override
public TableResult executeSql(String statement) {
List<Operation> operations = parser.parse(statement);
if (operations.size() != 1) {
throw new TableException(UNSUPPORTED_QUERY_IN_EXECUTE_SQL_MSG);
}
return executeOperation(operations.get(0));
}
private TableResult executeOperation(Operation operation) {
if (operation instanceof ModifyOperation) {
//直接执行
return executeInternal(Collections.singletonList((ModifyOperation) operation));
} else //......
}
从1.11版本的代码可以看出,INSERT 语句直接执行,并没有把算子加到transformation的List中,所以当调用 execute(String)
方法时会报错,报错并不影响执行,但是却不能指定jobName了,很多时候jobName 能够反映出 job的业务和功能,不能指定jobname是很多场景所不能接受的。
2. 修改源码增加jobname
首先我们追踪代码到executeInternal,如下:
@Override
public TableResult executeInternal(List<ModifyOperation> operations) {
List<Transformation<?>> transformations = translate(operations);
List<String> sinkIdentifierNames = extractSinkIdentifierNames(operations);
String jobName = "insert-into_" + String.join(",", sinkIdentifierNames);
// 增加配置 job.name指定jobname
String name = tableConfig.getConfiguration().getString("job.name", jobName);
Pipeline pipeline = execEnv.createPipeline(transformations, tableConfig, name);
try {
JobClient jobClient = execEnv.executeAsync(pipeline);
TableSchema.Builder builder = TableSchema.builder();
Object[] affectedRowCounts = new Long[operations.size()];
for (int i = 0; i < operations.size(); ++i) {
// use sink identifier name as field name
builder.field(sinkIdentifierNames.get(i), DataTypes.BIGINT());
affectedRowCounts[i] = -1L;
}
return TableResultImpl.builder()
.jobClient(jobClient)
.resultKind(ResultKind.SUCCESS_WITH_CONTENT)
.tableSchema(builder.build())
.data(Collections.singletonList(Row.of(affectedRowCounts)))
.build();
} catch (Exception e) {
throw new TableException("Failed to execute sql", e);
}
}
从上面不难看出,默认jobname是 insert-into_
+ sink的表名
,正如代码所示,我已经把指定jobname的功能加上了,只需要增加一个job.name的TableConfig即可,然后重新编译flink代码: mvn clean install -DskipTests -Dfast
, 线上环境替换掉 flink-table_2.11-1.11.0.jar
jar包即可,如果是本地Idea运行,把flink编译好就可以了。
主程序修改如下:
public static void main(String[] args) {
treamExecutionEnvironment streamEnv = StreamExecutionEnvironment.getExecutionEnvironment();
EnvironmentSettings settings = EnvironmentSettings.newInstance()
.inStreamingMode()
.useBlinkPlanner()
.build();
StreamTableEnvironment streamTableEnv = StreamTableEnvironment.create(streamEnv, settings);
streamEnv.setParallelism(1);
streamTableEnv.getConfig().getConfiguration().setString("job.name", "OdsCanalFcboxSendIngressStream");
streamTableEnv.executeSql("CREATE TABLE source xxxx");
streamTableEnv.executeSql("CREATE TABLE sink xxxx");
streamTableEnv.executeSql("INSERT INTO sink xxxxx FROM source");
// streamEnv.execute("FlinkTest");
}
打开控制台,发现job名称已经修改生效: