Spark 2.3.1 Usage Tips

January 6, 2022, 22:54 • Big Data, Open Source, R&D Management, Mobile Development, Programming Notes

This post shares a few tips for working with Spark 2.3.1 that I have found practical. Hopefully you will take something useful away from it.

`Spark 2.3.1` usage tips

Inferring column names by reflection when `Spark-SQL` reads a `JSON` file

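import spark.implicits._  // provides the Encoder that .as[StudentInfo] needs

// JSON schema inference reads whole numbers as Long, so age is declared as Long here
case class StudentInfo(id: Long, name: String, age: Long)

val example = spark.read.json("/data/result.json").as[StudentInfo]
example.show()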
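When a field such as `age` should stay an `Int`, schema inference gets in the way, because whole numbers in JSON are inferred as `Long` and the conversion with `.as[...]` can then fail its up-cast check. One possible workaround, sketched below, is to derive the schema from the case class with `Encoders.product` and pass it to the reader. The object name `ReadStudents`, the helper `readStudents`, the sample records and the `local[*]` master are assumptions made for illustration only.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import org.apache.spark.sql.{Dataset, Encoders, SparkSession}

// Defined at the top level so Spark can derive an Encoder for it
case class StudentInfo(id: Long, name: String, age: Int)

object ReadStudents {
  // Reads JSON lines using the schema derived from StudentInfo instead of inferring it
  def readStudents(spark: SparkSession, path: String): Dataset[StudentInfo] = {
    import spark.implicits._
    val schema = Encoders.product[StudentInfo].schema
    spark.read.schema(schema).json(path).as[StudentInfo]
  }

  def main(args: Array[String]): Unit = {
    // local[*] is only for trying the sketch out on one machine
    val spark = SparkSession.builder().appName("read-students").master("local[*]").getOrCreate()

    // Hypothetical sample records, one JSON object per line
    val path = Files.createTempFile("students", ".json")
    val lines = Seq("""{"id":1,"name":"Ann","age":20}""", """{"id":2,"name":"Bob","age":22}""")
    Files.write(path, lines.mkString("\n").getBytes(StandardCharsets.UTF_8))

    readStudents(spark, path.toString).show()
    spark.stop()
  }
}
```

Supplying the schema up front also skips the extra pass over the file that inference would otherwise need.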

Defining the `schema` dynamically

Use this approach when different data sets need different schemas:

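import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// The field names come from a string, so the schema can change with the data
val schemaInfo = "name age"
val fields = schemaInfo.split(" ")
  .map(fieldName => StructField(fieldName, StringType, nullable = true))
val schema = StructType(fields)

// peopleRDD is assumed to be an RDD[String] of space-separated values
val rowRDD = peopleRDD.map(_.split(" "))
  .map(attributes => Row(attributes(0), attributes(1)))

val peopleDF = spark.createDataFrame(rowRDD, schema)

peopleDF.show()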
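The same idea can be pushed a little further when the columns are not all strings. The sketch below assumes a made-up spec format, `name:type` pairs separated by spaces, and a small set of type names (`int`, `long`, `double`, anything else treated as a string). Neither the format nor the names `DynamicSchema`, `typed` and `parse` come from Spark; they are only there to illustrate the approach.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types._

object DynamicSchema {
  // Maps a hypothetical type name to a Spark type and a parser for the raw string value
  private def typed(typeName: String): (DataType, String => Any) = typeName match {
    case "int"    => (IntegerType, (s: String) => s.toInt)
    case "long"   => (LongType, (s: String) => s.toLong)
    case "double" => (DoubleType, (s: String) => s.toDouble)
    case _        => (StringType, (s: String) => s)
  }

  // Turns a spec such as "name:string age:int" into a schema plus one converter per column
  def parse(spec: String): (StructType, Seq[String => Any]) = {
    val parts = spec.split(" ").map { field =>
      val Array(name, typeName) = field.split(":")
      val (dataType, convert) = typed(typeName)
      (StructField(name, dataType, nullable = true), convert)
    }
    (StructType(parts.map(_._1)), parts.map(_._2).toSeq)
  }

  def main(args: Array[String]): Unit = {
    // local[*] is only for trying the sketch out on one machine
    val spark = SparkSession.builder().appName("dynamic-schema").master("local[*]").getOrCreate()
    val (schema, converters) = parse("name:string age:int")

    // Hypothetical input lines with space-separated values
    val peopleRDD = spark.sparkContext.parallelize(Seq("Ann 20", "Bob 22"))
    val rowRDD = peopleRDD.map { line =>
      val values = line.split(" ").zip(converters).map { case (raw, convert) => convert(raw) }
      Row.fromSeq(values)
    }

    val peopleDF = spark.createDataFrame(rowRDD, schema)
    peopleDF.printSchema()
    peopleDF.show()
    spark.stop()
  }
}
```

Each value has to be converted to the type the schema declares, because `createDataFrame` does not cast the contents of a `Row` for you.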

`Spark 2.3.1 on YARN`

`spark-submit` resource-limit parameters not taking effect

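Limits passed to `spark-submit`, such as `--executor-memory 2g`, did not take effect. A colleague had run into the same problem, and the workaround we settled on was to let Spark allocate executors dynamically. The `spark.dynamicAllocation.*` bounds below only apply when `spark.dynamicAllocation.enabled` is `true`, which on YARN also requires the external shuffle service (`spark.shuffle.service.enabled=true`).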

--conf spark.yarn.maxAppAttempts=1 --conf spark.dynamicAllocation.minExecutors=2 --conf spark.dynamicAllocation.maxExecutors=4 --conf spark.dynamicAllocation.initialExecutors=4
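When it is unclear whether a setting actually reached the application, it can help to print what the driver sees. The following is a minimal sketch, not a fix in itself: it looks up the keys from the flags above, plus `spark.executor.memory` and `spark.dynamicAllocation.enabled`, in the running application's `SparkConf`. The object name `ShowEffectiveConf`, the helper `effective` and the `<not set>` placeholder are assumptions for illustration.

```scala
import org.apache.spark.sql.SparkSession

object ShowEffectiveConf {
  // Settings worth checking after submitting with the flags above
  val keys: Seq[String] = Seq(
    "spark.executor.memory",
    "spark.yarn.maxAppAttempts",
    "spark.dynamicAllocation.enabled",
    "spark.dynamicAllocation.minExecutors",
    "spark.dynamicAllocation.maxExecutors",
    "spark.dynamicAllocation.initialExecutors"
  )

  // Returns each key with the value the driver received, or a placeholder if it is absent
  def effective(spark: SparkSession): Seq[(String, String)] = {
    val conf = spark.sparkContext.getConf
    keys.map(key => key -> conf.getOption(key).getOrElse("<not set>"))
  }

  def main(args: Array[String]): Unit = {
    // No master is set here; it is expected to come from spark-submit
    val spark = SparkSession.builder().appName("show-effective-conf").getOrCreate()
    effective(spark).foreach { case (key, value) => println(s"$key = $value") }
    spark.stop()
  }
}
```

A key that prints as `<not set>` was never passed to the driver, which usually points at the submit command or `spark-defaults.conf` rather than at YARN.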

Those are the Spark 2.3.1 tips for now. Some of them are likely to come up in day-to-day work, and I hope they save you some time.

Original article by 1402239773. If you repost it, please credit the source: https://blog.ytso.com/223388.html
