plutolove’s diary

I love three things in this world: the sun, the moon, and you. The sun for the day, the moon for the night, and you forever.

SparkSQL

Spark SQL Execution Process

Take the following code as an example:
val teenagersDF = spark.sql("SELECT name, age FROM people WHERE age BETWEEN 13 AND 19")
// The columns of a row in the result can be accessed by field index
teenagersDF.map(teenager => "Name: " + teenager(0)).show()
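To see the stages such a query passes through, one option is to ask Spark for its plans. The following is a minimal sketch, not the post's own walkthrough: the object name `ExplainExample`, the local master setting, and the sample rows registered as `people` are all assumptions made for illustration. It relies only on `Dataset.explain` and `Dataset.queryExecution`.

```scala
import org.apache.spark.sql.SparkSession

object ExplainExample {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session is enough for inspecting plans.
    val spark = SparkSession.builder()
      .appName("ExplainExample")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample rows standing in for the `people` table.
    Seq(("Michael", 29), ("Andy", 30), ("Justin", 19))
      .toDF("name", "age")
      .createOrReplaceTempView("people")

    val teenagersDF = spark.sql(
      "SELECT name, age FROM people WHERE age BETWEEN 13 AND 19")

    // Prints the parsed, analyzed, and optimized logical plans,
    // followed by the physical plan.
    teenagersDF.explain(true)

    // The same stages are also available as objects.
    val qe = teenagersDF.queryExecution
    println(qe.logical)       // unresolved logical plan from the parser
    println(qe.analyzed)      // logical plan after name and type resolution
    println(qe.optimizedPlan) // logical plan after optimizer rules
    println(qe.executedPlan)  // physical plan that will actually run

    spark.stop()
  }
}
```

Comparing the four printed plans shows, for instance, how the `BETWEEN` predicate is rewritten into a pair of comparisons before execution.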

RTree index speed up Range query in SparkSQL

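This post continues from Add Range Query on Spark DataSet. The previous post only added the basic Range operation; to speed up execution, this post implements an RTree index on top of RDDs to accelerate the Range and Knn operations. All of the code is on Github. An example of the result after implementation: import org.apache.spark.sql.SparkSession import s…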
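The idea behind using bounding rectangles to speed up a range query can be sketched in plain Scala, without Spark. This is not the post's implementation: `Rect`, `Point`, `Node`, and `RangeSketch` are hypothetical names, and the "index" is a single level of groups rather than a real R-tree. It only illustrates the pruning step, where any group whose minimum bounding rectangle (MBR) misses the query window is skipped entirely.

```scala
// Axis-aligned rectangle; also used as a minimum bounding rectangle (MBR).
final case class Rect(minX: Double, minY: Double, maxX: Double, maxY: Double) {
  def intersects(o: Rect): Boolean =
    minX <= o.maxX && o.minX <= maxX && minY <= o.maxY && o.minY <= maxY

  def contains(x: Double, y: Double): Boolean =
    x >= minX && x <= maxX && y >= minY && y <= maxY
}

final case class Point(x: Double, y: Double)

// Hypothetical one-level index entry: a group of points plus its MBR.
final case class Node(mbr: Rect, points: Seq[Point])

object RangeSketch {
  // Smallest rectangle covering all points of a non-empty group.
  def mbrOf(ps: Seq[Point]): Rect =
    Rect(ps.map(_.x).min, ps.map(_.y).min, ps.map(_.x).max, ps.map(_.y).max)

  // Sort by x and cut into fixed-size groups (a simplification of
  // the packing a real R-tree bulk load would do).
  def build(points: Seq[Point], groupSize: Int): Seq[Node] =
    points.sortBy(_.x).grouped(groupSize).map(g => Node(mbrOf(g), g)).toList

  // Skip groups whose MBR misses the query, then test remaining points.
  def rangeQuery(nodes: Seq[Node], q: Rect): Seq[Point] =
    nodes
      .filter(_.mbr.intersects(q))
      .flatMap(_.points.filter(p => q.contains(p.x, p.y)))

  def main(args: Array[String]): Unit = {
    val pts = Seq(Point(1, 1), Point(2, 5), Point(6, 2), Point(7, 7), Point(9, 3))
    val index = build(pts, groupSize = 2)
    println(rangeQuery(index, Rect(0, 0, 6, 6)))
  }
}
```

In a distributed setting the same test could be applied per partition, so that partitions whose bounds miss the query window are never scanned; whether the post does exactly this is not visible from the excerpt.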

Add Range Query on Spark DataSet

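Preparation: First download the code and compile it, then import the compiled code into IDEA. If compilation fails inside IDEA, it is usually because some of the code is generated only at compile time and has to be regenerated after the import; click Generate Sources and Update folders and then recompile. (I am using the Spark-2.1 code.) git c…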