I. Problem Description
Background:
CDH cluster, Spark version 2.4.0
StarRocks version 2.5.5
Using the Spark 2.4.0 client fails with java.lang.NoClassDefFoundError: org/slf4j/Logger.
The jars under StarRocks' lib directory come from Spark 2.4.6,
so I switched to the Spark 2.4.6 client.
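Problem encountered:
The second time the resource is used, the load label fails almost immediately with:
ErrorMsg: type:ETL_SUBMIT_FAIL; msg:Invalid library type: spark
and nothing at all is written under spark_launcher_log. The FE log shows the following: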
2023-06-08 12:11:46,408 INFO (pending_load_task_scheduler_pool-1|56338) [SparkLoadPendingTask.executeTask():117] begin to execute spark pending task. load job id: 3639985
2023-06-08 12:11:46,411 INFO (pending_load_task_scheduler_pool-1|56338) [SparkRepository.initRepository():105] start to init remote repository. local dpp: /data/starrocks-2.3.0/fe/spark-dpp/spark-dpp-1.0.0-jar-with-dependencies.jar
com.starrocks.common.LoadException: Invalid library type: spark
2023-06-08 12:11:46,414 INFO (pending_load_task_scheduler_pool-1|56338) [SparkLoadPendingTask.executeTask():117] begin to execute spark pending task. load job id: 3639985
2023-06-08 12:11:46,414 INFO (pending_load_task_scheduler_pool-1|56338) [SparkRepository.initRepository():105] start to init remote repository. local dpp: /data/starrocks-2.3.0/fe/spark-dpp/spark-dpp-1.0.0-jar-with-dependencies.jar
com.starrocks.common.LoadException: Invalid library type: spark
2023-06-08 12:11:46,416 INFO (pending_load_task_scheduler_pool-1|56338) [SparkLoadPendingTask.executeTask():117] begin to execute spark pending task. load job id: 3639985
2023-06-08 12:11:46,417 INFO (pending_load_task_scheduler_pool-1|56338) [SparkRepository.initRepository():105] start to init remote repository. local dpp: /data/starrocks-2.3.0/fe/spark-dpp/spark-dpp-1.0.0-jar-with-dependencies.jar
com.starrocks.common.LoadException: Invalid library type: spark
2023-06-08 12:11:46,419 INFO (pending_load_task_scheduler_pool-1|56338) [SparkLoadPendingTask.executeTask():117] begin to execute spark pending task. load job id: 3639985
2023-06-08 12:11:46,419 INFO (pending_load_task_scheduler_pool-1|56338) [SparkRepository.initRepository():105] start to init remote repository. local dpp: /data/starrocks-2.3.0/fe/spark-dpp/spark-dpp-1.0.0-jar-with-dependencies.jar
com.starrocks.common.LoadException: Invalid library type: spark
2023-06-08 12:11:46,421 WARN (pending_load_task_scheduler_pool-1|56338) [LoadJob.unprotectedExecuteCancel():589] LOAD_JOB=3639985, transaction_id={62586242}, error_msg={Failed to execute load with error: Invalid library type: spark}
2023-06-08 12:11:46,422 INFO (pending_load_task_scheduler_pool-1|56338) [DatabaseTransactionMgr.abortTransaction():1263] transaction:[TransactionState. txn_id: 62586242, label: label09, db id: 3290466, table id list: 3635988, callback id: 3639985, coordinator: FE: 172.16.10.31, transaction status: ABORTED, error replicas num: 0, replica ids: , prepare time: 1686226306402, commit time: -1, finish time: 1686226306421, total cost: 19ms, reason: Invalid library type: spark] successfully rollback
II. Solution
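A StarRocks committer replied to me on the forum: the problem was how the Spark jars had been packaged.
The Spark jar archive must be named spark-2x.zip
(neither spark.zip nor spark-24.zip works; the name spark-2x.zip is fixed and must be used exactly as-is in the configuration file).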
https://docs.starrocks.io/zh-cn/latest/loading/SparkLoad#配置-spark-客户端
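A minimal sketch of the packaging step, assuming the Spark 2.4.6 client's jars live in `<SPARK_HOME>/jars` and that the FE config entry pointing at the archive is `spark_resource_path` as in the linked docs (check the name for your version). The function name `package_spark_jars` and the directory arguments are hypothetical. The jars are placed at the root of the archive with no `jars/` prefix, which is the layout Spark's `spark.yarn.archive` expects; the only hard requirement from the committer's reply is the file name `spark-2x.zip`.

```python
import os
import zipfile

# Name required by StarRocks per the committer's reply; spark.zip or spark-24.zip
# lead to "Invalid library type: spark".
REQUIRED_NAME = "spark-2x.zip"


def package_spark_jars(spark_home: str, out_dir: str) -> str:
    """Zip every *.jar under <spark_home>/jars into <out_dir>/spark-2x.zip.

    Hypothetical helper for illustration. Jars are stored at the archive
    root (arcname = bare file name), not under a 'jars/' folder.
    """
    jars_dir = os.path.join(spark_home, "jars")
    # Only pick up jar files; skip anything else that happens to be in the folder.
    jar_names = sorted(n for n in os.listdir(jars_dir) if n.endswith(".jar"))
    if not jar_names:
        raise FileNotFoundError(f"no .jar files found in {jars_dir}")

    out_path = os.path.join(out_dir, REQUIRED_NAME)
    with zipfile.ZipFile(out_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name in jar_names:
            zf.write(os.path.join(jars_dir, name), arcname=name)
    return out_path
```

For example, `package_spark_jars("/opt/spark-2.4.6", "/opt/starrocks/fe/spark-deps")` (both paths hypothetical) would produce `/opt/starrocks/fe/spark-deps/spark-2x.zip`, and that full path is what the FE configuration should point to.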