Writing WordCount in Spark (in Scala)

This article walks through writing a WordCount program for Spark in Scala, from project setup to running on a cluster. Hopefully it serves as a useful reference; if anything is explained incorrectly or incompletely, please leave a comment. Thank you!

1. Create a Maven project

2. Add the dependencies

<!-- Common build constants -->
    <properties> 
        <maven.compiler.source>1.8</maven.compiler.source> 
        <maven.compiler.target>1.8</maven.compiler.target> 
        <scala.version>2.12.10</scala.version> 
        <spark.version>3.0.0</spark.version> 
        <encoding>UTF-8</encoding> 
    </properties> 
 
    <dependencies> 
        <!-- Scala standard library -->
        <dependency> 
            <groupId>org.scala-lang</groupId> 
            <artifactId>scala-library</artifactId> 
            <version>${scala.version}</version> 
            <!-- provided: not bundled into the jar at package time -->
            <scope>provided</scope> 
        </dependency> 
 
        <dependency> 
            <groupId>org.apache.spark</groupId> 
            <artifactId>spark-core_2.12</artifactId> 
            <version>${spark.version}</version> 
            <!-- provided: not bundled into the jar at package time -->
            <scope>provided</scope> 
        </dependency> 
    </dependencies> 
 
    <build> 
        <pluginManagement> 
            <plugins> 
                <!-- Plugin for compiling Scala -->
                <plugin> 
                    <groupId>net.alchim31.maven</groupId> 
                    <artifactId>scala-maven-plugin</artifactId> 
                    <version>3.2.2</version> 
                </plugin> 
                <!-- Plugin for compiling Java -->
                <plugin> 
                    <groupId>org.apache.maven.plugins</groupId> 
                    <artifactId>maven-compiler-plugin</artifactId> 
                    <version>3.5.1</version> 
                </plugin> 
            </plugins> 
        </pluginManagement> 
        <plugins> 
            <plugin> 
                <groupId>net.alchim31.maven</groupId> 
                <artifactId>scala-maven-plugin</artifactId> 
                <executions> 
                    <execution> 
                        <id>scala-compile-first</id> 
                        <phase>process-resources</phase> 
                        <goals> 
                            <goal>add-source</goal> 
                            <goal>compile</goal> 
                        </goals> 
                    </execution> 
                    <execution> 
                        <id>scala-test-compile</id> 
                        <phase>process-test-resources</phase> 
                        <goals> 
                            <goal>testCompile</goal> 
                        </goals> 
                    </execution> 
                </executions> 
            </plugin> 
 
            <plugin> 
                <groupId>org.apache.maven.plugins</groupId> 
                <artifactId>maven-compiler-plugin</artifactId> 
                <executions> 
                    <execution> 
                        <phase>compile</phase> 
                        <goals> 
                            <goal>compile</goal> 
                        </goals> 
                    </execution> 
                </executions> 
            </plugin> 
 
            <!-- Plugin for building the (shaded) jar -->
            <plugin> 
                <groupId>org.apache.maven.plugins</groupId> 
                <artifactId>maven-shade-plugin</artifactId> 
                <version>2.4.3</version> 
                <executions> 
                    <execution> 
                        <phase>package</phase> 
                        <goals> 
                            <goal>shade</goal> 
                        </goals> 
                        <configuration> 
                            <filters> 
                                <filter> 
                                    <artifact>*:*</artifact> 
                                    <excludes> 
                                        <exclude>META-INF/*.SF</exclude> 
                                        <exclude>META-INF/*.DSA</exclude> 
                                        <exclude>META-INF/*.RSA</exclude> 
                                    </excludes> 
                                </filter> 
                            </filters> 
                        </configuration> 
                    </execution> 
                </executions> 
            </plugin> 
        </plugins> 
    </build> 

3. Write the program

package cn._51doit.day01

import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object WordCount {

  def main(args: Array[String]): Unit = {

    // Create the SparkConf; the master is supplied by spark-submit
    val conf = new SparkConf().setAppName("WordCount")
    // SparkContext is the entry point for creating the initial RDDs
    val sc = new SparkContext(conf)

    // Create an RDD from the input path (args(0))
    val lines: RDD[String] = sc.textFile(args(0))

    // Split each line into words and flatten
    val words: RDD[String] = lines.flatMap(_.split(" "))

    // Pair each word with a count of 1
    val wordAndOne: RDD[(String, Int)] = words.map((_, 1))

    // Aggregate the counts by word
    val reduced: RDD[(String, Int)] = wordAndOne.reduceByKey(_ + _)

    // Sort by count, descending
    val sorted: RDD[(String, Int)] = reduced.sortBy(_._2, false)

    // saveAsTextFile is an action, so it triggers job execution;
    // the result is written to HDFS at args(1)
    sorted.saveAsTextFile(args(1))

    // Release resources
    sc.stop()
  }
}
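As a sanity check on the logic, the same split → count → sort pipeline can be reproduced locally with standard Unix tools on a small sample (the two input lines here are made up for illustration):

```shell
# Split words onto their own lines, count occurrences, sort by count descending
printf 'hello world\nhello spark\n' \
  | tr ' ' '\n' \
  | sort | uniq -c | sort -rn
```

This mirrors what the Spark job does at scale: `tr` plays the role of `flatMap(_.split(" "))`, `uniq -c` of `reduceByKey(_ + _)`, and `sort -rn` of `sortBy(_._2, false)`.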
 

4. Package the jar
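With the scala-maven-plugin and shade plugin configured above, packaging is a single Maven command. A minimal sketch (the jar name `spark-in-action-1.0-SNAPSHOT.jar` is inferred from the submit command in step 6; your artifactId and version may differ):

```shell
# Build the project and produce the shaded jar (skip tests for speed)
mvn clean package -DskipTests

# The runnable jar is written under target/
ls target/spark-in-action-1.0-SNAPSHOT.jar
```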

5. Upload to the cluster
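Any file-copy tool works for getting the jar onto the cluster; a sketch using scp (the host `linux01` and destination `/root/` are taken from the submit command in step 6, and the `root` user is an assumption):

```shell
# Copy the shaded jar to the machine you will submit from
scp target/spark-in-action-1.0-SNAPSHOT.jar root@linux01:/root/
```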

6. Launch the job

/opt/apps/spark-3.0.0-bin-hadoop2.7/bin/spark-submit \
  --master spark://linux01:7077 \
  --executor-memory 1g \
  --total-executor-cores 5 \
  --class cn._51doit.day01.WordCount \
  /root/spark-in-action-1.0-SNAPSHOT.jar \
  hdfs://linux01:9000/sp-data \
  hdfs://linux01:9000/out-spark/out2
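Once the job finishes, the sorted counts can be inspected with the HDFS CLI (the output path matches the second program argument in the submit command above; the `part-*` file naming is Spark's default for saved text output):

```shell
# List the output files and peek at the top of the sorted word counts
hdfs dfs -ls /out-spark/out2
hdfs dfs -cat /out-spark/out2/part-00000 | head
```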

Original article by Maggie-Hunter. If you repost it, please cite the source: https://blog.ytso.com/tech/bigdata/228146.html
