Linux: Ubuntu 16.04
Hadoop: 2.7.1
JDK: 1.8
Spark: 2.4.3
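I. Download the installation package
Pre-built packages are available from the official download page and, for older releases such as 2.4.3, from the release archive:
http://spark.apache.org/downloads.html
https://archive.apache.org/dist/spark/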
hadoop@dblab:/usr/local$ sudo wget https://archive.apache.org/dist/spark/spark-2.4.3/spark-2.4.3-bin-hadoop2.7.tgz
hadoop@dblab:/usr/local$ sudo tar -zxf spark-2.4.3-bin-hadoop2.7.tgz
hadoop@dblab:/usr/local$ sudo mv spark-2.4.3-bin-hadoop2.7 spark
hadoop@dblab:/usr/local$ sudo chown -R hadoop:hadoop spark/
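II. Configure Spark
Create spark-env.sh from its template:
hadoop@dblab:/usr/local/spark$ cp ./conf/spark-env.sh.template ./conf/spark-env.sh
Then add the following line to ./conf/spark-env.sh so that Spark picks up Hadoop's jars and configuration:
export SPARK_DIST_CLASSPATH=$(/usr/local/hadoop/bin/hadoop classpath)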
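# Verify the installation by running the bundled SparkPi example (grep filters out the INFO log lines)
hadoop@dblab:/usr/local/spark$ bin/run-example SparkPi 2>&1 | grep "Pi is roughly"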
Pi is roughly 3.139035695178476
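SparkPi estimates π by Monte Carlo sampling, which is why it prints "roughly": it draws random points in the square [-1, 1] × [-1, 1], counts how many fall inside the unit circle, and multiplies that fraction by 4 (circle area / square area = π/4). The following is a minimal plain-Scala sketch of the same calculation without Spark, so nothing is distributed across partitions; the object name MonteCarloPi, the method estimate and the fixed seed are hypothetical choices for illustration only.

```scala
import scala.util.Random

object MonteCarloPi {
  // Sample n points uniformly in [-1, 1] x [-1, 1] and count the ones
  // that land inside the unit circle x^2 + y^2 <= 1.
  def estimate(n: Int, seed: Long): Double = {
    val rng = new Random(seed) // fixed seed so the result is reproducible
    val inside = (1 to n).count { _ =>
      val x = rng.nextDouble() * 2 - 1
      val y = rng.nextDouble() * 2 - 1
      x * x + y * y <= 1
    }
    // inside / n approximates pi / 4, so scale by 4.
    4.0 * inside / n
  }

  def main(args: Array[String]): Unit =
    println(s"Pi is roughly ${estimate(1000000, 42L)}")
}
```

In spark-shell the same loop can be spread over the cluster by replacing the local range with sc.parallelize(1 to n).map(...).reduce(_ + _), which is essentially the structure of the SparkPi example; more samples give a closer estimate.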
III. Start the Spark shell
hadoop@dblab:/usr/local/spark$ ./bin/spark-shell
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.3
      /_/

Using Scala version 2.11.12 (OpenJDK 64-Bit Server VM, Java 1.8.0_212)
Type in expressions to have them evaluated.
Type :help for more information.
scala>
scala> 8*2+5
res0: Int = 21
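IV. Read files
Start HDFS first (in a separate terminal); it is needed for reading from HDFS in step 2:
hadoop@dblab:/usr/local/hadoop$ ./sbin/start-dfs.sh
1. Read a local file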
scala> val textFile = sc.textFile("file:///usr/local/spark/README.md")
textFile: org.apache.spark.rdd.RDD[String] = file:///usr/local/spark/README.md MapPartitionsRDD[1] at textFile at <console>:24
scala> textFile.first()
res0: String = # Apache Spark
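2. Read a file from HDFS
Upload README.md to the hadoop user's HDFS home directory (/user/hadoop) and check its contents: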
hadoop@dblab:/usr/local/hadoop$ ./bin/hdfs dfs -put /usr/local/spark/README.md .
hadoop@dblab:/usr/local/hadoop$ ./bin/hdfs dfs -cat README.md
scala> val textFile = sc.textFile("hdfs://localhost:9000/user/hadoop/README.md")
textFile: org.apache.spark.rdd.RDD[String] = hdfs://localhost:9000/user/hadoop/README.md MapPartitionsRDD[3] at textFile at <console>:24
scala> textFile.first()
res1: String = # Apache Spark
scala> :quit
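The only difference between the two reads is the URI scheme: file:// points at the local filesystem, while hdfs://localhost:9000 names the NameNode address, which is assumed here to match fs.defaultFS in Hadoop's core-site.xml. A path with no scheme is resolved by Hadoop against fs.defaultFS, and a relative one against the user's HDFS home directory (/user/hadoop here), so with this configuration sc.textFile("README.md") would presumably read the HDFS copy. The following is a small plain-Scala sketch (no Spark needed) that only splits such strings into their parts with java.net.URI; it does not perform Hadoop's actual filesystem lookup, and the names PathSchemes and describe are hypothetical.

```scala
import java.net.URI

object PathSchemes {
  // Classify a Spark/Hadoop-style path by its URI scheme.
  // This only parses the string; Hadoop's FileSystem does the real resolution.
  def describe(path: String): String = {
    val uri = new URI(path)
    Option(uri.getScheme) match {
      case Some("file") => s"local file ${uri.getPath}"
      case Some("hdfs") => s"HDFS file ${uri.getPath} on NameNode ${uri.getHost}:${uri.getPort}"
      case Some(other)  => s"$other file ${uri.getPath}"
      case None         => s"no scheme: resolved against fs.defaultFS (${uri.getPath})"
    }
  }

  def main(args: Array[String]): Unit =
    Seq(
      "file:///usr/local/spark/README.md",
      "hdfs://localhost:9000/user/hadoop/README.md",
      "README.md"
    ).foreach(p => println(describe(p)))
}
```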
Original article by Maggie-Hunter. If you repost it, please credit the source: https://blog.ytso.com/191451.html