First, you need an existing Kubernetes cluster at version 1.8 or above. Mine runs on CentOS 7.
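To confirm the version requirement is met, a quick check (assuming kubectl already talks to your cluster):
$ kubectl version --short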
Download Spark 2.4.0 - https://www.apache.org/dyn/closer.lua/spark/spark-2.4.0/spark-2.4.0-bin-hadoop2.7.tgz
wget <download-url>
tar -xzvf <tarball-name>
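For example, using the Apache archive (any mirror from the link above works; the exact host may differ):
$ wget https://archive.apache.org/dist/spark/spark-2.4.0/spark-2.4.0-bin-hadoop2.7.tgz
$ tar -xzvf spark-2.4.0-bin-hadoop2.7.tgz
$ cd spark-2.4.0-bin-hadoop2.7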
Then follow the official docs - http://spark.apache.org/docs/2.4.0/running-on-kubernetes.html#running-spark-on-kubernetes - and pay attention to the prerequisites listed there.
Here we will try Spark on k8s in cluster mode.
Building the image:
First,
$ docker login
and sign in with your registered Docker Hub account.
Then, from the directory where Spark was extracted, build and publish the Docker image.
Method 1:
$ ./bin/docker-image-tool.sh -r <repo> -t my-tag build
$ ./bin/docker-image-tool.sh -r <repo> -t my-tag push
For <repo> I used my Docker ID; my-tag is just an arbitrary tag name.
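With my Docker ID and tag, for example, those two commands become (assuming you are logged in as morphtin):
$ ./bin/docker-image-tool.sh -r morphtin -t sparkonk8s build
$ ./bin/docker-image-tool.sh -r morphtin -t sparkonk8s push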
Method 2:
cd /path/to/spark-2.4.0-bin-hadoop2.7
docker build -t <your.image.hub/yourns>/spark:2.4.0 -f kubernetes/dockerfiles/spark/Dockerfile .
docker push <your.image.hub/yourns>/spark:2.4.0
Afterwards you will find several new images in your Docker Hub account.
You can docker pull morphtin/spark:sparkonk8s to grab my image.
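You can also confirm the build locally before relying on Docker Hub:
$ docker images | grep spark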
To launch Spark Pi in cluster mode,
go into the Spark directory and run:
$ bin/spark-submit \
--master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
--deploy-mode cluster \
--name spark-pi \
--class org.apache.spark.examples.SparkPi \
--conf spark.executor.instances=5 \
--conf spark.kubernetes.container.image=<spark-image> \
local:///path/to/examples.jar
You can find the API server URL with the following command:
$ kubectl cluster-info
Kubernetes master is running at http://x.x.x.x:xxxx
Here mine is k8s://http://localhost:8080.
<spark-image> is the image to run; here I used morphtin/spark:sparkonk8s, consistent with what is on Docker Hub.
local:///path/to/examples.jar refers to the path of the jar inside the image; in my case it is /opt/spark/examples/jars/spark-examples_2.11-2.4.0.jar.
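If you are not sure where the jars live inside the image, you can list the directory by overriding the entrypoint (using the image built above):
$ docker run --rm --entrypoint ls morphtin/spark:sparkonk8s /opt/spark/examples/jars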
So my submission looks like this:
bin/spark-submit \
--master k8s://https://ip:port \
--deploy-mode cluster \
--name spark-pi \
--class org.apache.spark.examples.SparkPi \
--conf spark.executor.instances=5 \
--conf spark.kubernetes.container.image=morphtin/spark:sparkonk8s \
local:///opt/spark/examples/jars/spark-examples_2.11-2.4.0.jar
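After submitting, you can watch the driver pod and read the SparkPi result from its logs once it completes (the driver pod name is generated per run, so substitute your own):
$ kubectl get pods
$ kubectl logs <driver-pod-name>
The line Pi is roughly 3.14... should appear near the end of the driver log.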
Spark uses the following URL scheme to allow different strategies for disseminating jars:
file: - Absolute paths and file:/ URIs are served by the driver’s HTTP file server, and every executor pulls the file from the driver HTTP server.
hdfs:, http:, https:, ftp: - these pull down files and JARs from the URI as expected
local: - a URI starting with local:/ is expected to exist as a local file on each worker node. This means that no network IO will be incurred, and works well for large files/JARs that are pushed to each worker, or shared via NFS, GlusterFS, etc.
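As an illustration of the non-local schemes, you could also serve the application jar from a remote location instead of baking it into the image (a hypothetical sketch; the host, class name, and jar are placeholders):
$ bin/spark-submit \
  --master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
  --deploy-mode cluster \
  --name my-app \
  --class com.example.MyApp \
  --conf spark.kubernetes.container.image=morphtin/spark:sparkonk8s \
  https://example.com/jars/my-app.jar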
For running Spark in client mode, see the official documentation.
Cluster Manager Types
The system currently supports several cluster managers:
Standalone – a simple cluster manager included with Spark that makes it easy to set up a cluster.
Apache Mesos – a general cluster manager that can also run Hadoop MapReduce and service applications.
Hadoop YARN – the resource manager in Hadoop 2.
Kubernetes – an open-source system for automating deployment, scaling, and management of containerized applications.
A third-party project (not supported by the Spark project) exists to add support for Nomad as a cluster manager.
Pitfalls you may run into:
Original source: https://blog.csdn.net/ZQZ_QiZheng/article/details/79540487
The examples bundled with Spark are compiled with JDK 1.8; if startup reports Unsupported major.minor version 52.0, switch to a matching JDK;
By default spark-submit loads the cluster configuration from ~/.kube/config, so place your Kubernetes cluster config there;
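If the cluster was installed with kubeadm, the usual way to put the config in place is (paths assume kubeadm defaults):
$ mkdir -p ~/.kube
$ sudo cp /etc/kubernetes/admin.conf ~/.kube/config
$ sudo chown $(id -u):$(id -g) ~/.kube/config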
If the Spark driver fails with Error: Could not find or load main class org.apache.spark.examples.SparkPi,
the path after local:// in your submit arguments must be the path of your own Spark application inside the container;
If the Spark driver throws Caused by: java.net.UnknownHostException: kubernetes.default.svc: Try again, make sure the Kubernetes nodes (kubelets) can reach each other over the network;
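kubernetes.default.svc is resolved by the cluster DNS, so it also helps to confirm the DNS pods are healthy and that resolution works from inside a pod (a generic busybox check, not from the original post):
$ kubectl get pods -n kube-system
$ kubectl run -it --rm dns-test --image=busybox --restart=Never -- nslookup kubernetes.default.svc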
If the Spark driver throws system: serviceaccount: default: default" cannot get pods in the namespace "default", it is a permissions issue; run the following two commands:
kubectl create rolebinding default-view --clusterrole=view --serviceaccount=default:default --namespace=default and
kubectl create rolebinding default-admin --clusterrole=admin --serviceaccount=default:default --namespace=default
and it will work.
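Alternatively, the RBAC section of the official docs creates a dedicated service account for the driver instead of granting extra rights to the default one:
$ kubectl create serviceaccount spark
$ kubectl create clusterrolebinding spark-role --clusterrole=edit --serviceaccount=default:spark --namespace=default
Then add --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark to your spark-submit command.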