1. Important spark-shell parameters (an example invocation follows the list)
1)--master
The master URL for the cluster (e.g. spark://host:port, mesos://host:port, yarn, or local[N]) (Default: local[*])
2)--name
The name of your application
3)--jars
Comma-separated list of jars to include on the driver and executor classpaths
4)--conf
Arbitrary Spark configuration property
5)--driver-memory
Memory for driver (e.g. 1000M, 2G) (Default: 1024M)
6)--driver-java-options
Extra Java options to pass to the driver
7)--driver-library-path
Extra library path entries to pass to the driver
8)--driver-class-path
Extra class path entries to pass to the driver. Note that jars added with --jars are automatically included in the classpath
9)--executor-memory
Memory per executor (e.g. 1000M, 2G) (Default: 1G)
Spark standalone and YARN only:
10)--executor-cores
Number of cores per executor. (Default: 1 in YARN mode, or all available cores on the worker in standalone mode)
Cluster deploy mode only:
11)--driver-cores
Number of cores used by the driver, only in cluster mode (Default: 1)
YARN-only:
12)--queue
The YARN queue to submit to (Default: "default")
13)--num-executors
Number of executors to launch (Default: 2). If dynamic allocation is enabled, the initial number of executors will be at least NUM.
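For example, a launch that combines several of these flags could look like the sketch below (the jar path, queue name, memory sizes and core counts are made-up values for illustration only):
[hadoop@hadoop001 bin]$ ./spark-shell \
  --master yarn \
  --name MyShell \
  --jars /home/hadoop/lib/mysql-connector-java-5.1.27.jar \
  --driver-memory 2G \
  --executor-memory 2G \
  --executor-cores 2 \
  --num-executors 4 \
  --queue default \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer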
2. The spark-shell startup output looks like the following
[hadoop@hadoop001 bin]$ ./spark-shell --master local[2]
18/09/10 22:23:26 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://hadoop001:4040
Spark context available as 'sc' (master = local[2], app id = local-1536589468469).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.3.1
      /_/
Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_45)
Type in expressions to have them evaluated.
Type :help for more information.
scala>
3. The Web UI looks like the following (screenshot omitted)
4. Where spark-shell's application name comes from:
[hadoop@hadoop001 bin]$ vi spark-shell
SPARK_SUBMIT_OPTS="$SPARK_SUBMIT_OPTS -Dscala.usejavacp=true"
function main() {
  if $cygwin; then
    # Workaround for issue involving JLine and Cygwin
    # (see http://sourceforge.net/p/jline/bugs/40/).
    # If you're using the Mintty terminal emulator in Cygwin, may need to set the
    # "Backspace sends ^H" setting in "Keys" section of the Mintty options
    # (see https://github.com/sbt/sbt/issues/562).
    stty -icanon min 1 -echo > /dev/null 2>&1
    export SPARK_SUBMIT_OPTS="$SPARK_SUBMIT_OPTS -Djline.terminal=unix"
    "${SPARK_HOME}"/bin/spark-submit --class org.apache.spark.repl.Main --name "Spark shell" "$@"
    stty icanon echo > /dev/null 2>&1
  else
    export SPARK_SUBMIT_OPTS
    "${SPARK_HOME}"/bin/spark-submit --class org.apache.spark.repl.Main --name "Spark shell" "$@"
  fi
}
From the spark-shell script we can see where spark-shell (and its application name) comes from: under the hood it simply calls spark-submit with --class org.apache.spark.repl.Main and --name "Spark shell" to submit the job.
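If you want to confirm what actually gets launched, one way (a sketch; it assumes your bin/spark-class honors the SPARK_PRINT_LAUNCH_COMMAND environment variable, as Spark 2.x does) is to have the launcher print the final command before the REPL starts:
[hadoop@hadoop001 bin]$ SPARK_PRINT_LAUNCH_COMMAND=1 ./spark-shell --master local[2]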
5. Where spark.app.id comes from:
local mode: the id starts with "local", followed by a timestamp taken at startup
yarn mode: the id starts with "application"
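For example, the id in the startup output above is local-1536589468469. On YARN, the application_... ids can be listed with the yarn CLI (a usage sketch, assuming it is run on a node with the YARN client configured):
[hadoop@hadoop001 bin]$ yarn application -list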
6. java.io.tmpdir matters: its default (/tmp on Linux) is not suitable for production, so it must be changed.
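A minimal sketch of one way to override it at launch time (/data/tmp is a made-up path; use a directory on a disk with enough space, and make sure it exists on every node):
[hadoop@hadoop001 bin]$ ./spark-shell --master local[2] \
  --driver-java-options "-Djava.io.tmpdir=/data/tmp" \
  --conf "spark.executor.extraJavaOptions=-Djava.io.tmpdir=/data/tmp"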
7. Classpath Entries:
SPARK_HOME/conf/*
SPARK_HOME/jars/*.jar
When submitting a job with Spark on YARN, all of these jars get uploaded for the application, which is a lot of overhead; this is worth optimizing (see the sketch below).
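One common optimization (a sketch; the HDFS path and namenode address are made-up examples) is to upload the jars under SPARK_HOME/jars to HDFS once and point applications at them via spark.yarn.jars (or package them as an archive and use spark.yarn.archive), so they are not re-uploaded for every job:
[hadoop@hadoop001 bin]$ hadoop fs -mkdir -p /spark/jars
[hadoop@hadoop001 bin]$ hadoop fs -put $SPARK_HOME/jars/*.jar /spark/jars/
# then add to $SPARK_HOME/conf/spark-defaults.conf:
spark.yarn.jars hdfs://hadoop001:8020/spark/jars/*.jar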
8. In the Spark shell, a special interpreter-aware SparkContext is already created for you, in the variable called sc. Making your own SparkContext will not work.
scala> import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.{SparkConf, SparkContext}
scala> val SparkConf = new SparkConf().setAppName("SparkContextApp").setMaster("local[2]")
SparkConf: org.apache.spark.SparkConf = org.apache.spark.SparkConf@75452aea
scala> val sc = new SparkContext(SparkConf)
org.apache.spark.SparkException: Only one SparkContext may be running in this JVM (see SPARK-2243). To ignore this error, set spark.driver.allowMultipleContexts = true. The currently running SparkContext was created at:
org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:924)
org.apache.spark.repl.Main$.createSparkSession(Main.scala:103)
<init>(<console>:15)
<init>(<console>:43)
<init>(<console>:45)
.<init>(<console>:49)
.<clinit>(<console>)
.$print$lzycompute(<console>:7)
.$print(<console>:6)
$print(<console>)
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
java.lang.reflect.Method.invoke(Method.java:497)
scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:786)
scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1047)
scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:638)
scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:637)
scala.reflect.internal.util.ScalaClassLoader$class.asContext(ScalaClassLoader.scala:31)
scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:19)
at org.apache.spark.SparkContext$$anonfun$assertNoOtherContextIsRunning$2.apply(SparkContext.scala:2456)
at org.apache.spark.SparkContext$$anonfun$assertNoOtherContextIsRunning$2.apply(SparkContext.scala:2452)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.SparkContext$.assertNoOtherContextIsRunning(SparkContext.scala:2452)
at org.apache.spark.SparkContext$.markPartiallyConstructed(SparkContext.scala:2541)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:84)
... 50 elided
So after starting spark-shell, trying to create another SparkContext inside it fails with org.apache.spark.SparkException: Only one SparkContext may be running in this JVM; use the sc (and spark) that the shell has already created instead.