Apache Spark 2.0.0 released, with API updates

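Apache Spark 2.0.0 has been released. Apache Spark is an open-source cluster computing environment similar to Hadoop, but the two differ in ways that make Spark perform better on certain workloads. Specifically, Spark uses in-memory distributed datasets, which let it optimize iterative workloads in addition to supporting interactive queries.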

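This release focuses on API updates, adds support for SQL:2003 and for R UDFs (user-defined functions), and improves performance. 300 developers contributed 2,500 patches.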

API changes in Apache Spark 2.0.0:

Unifying DataFrame and Dataset: In Scala and Java, DataFrame and Dataset have been unified, i.e. DataFrame is just a type alias for Dataset of Row. In Python and R, given the lack of type safety, DataFrame is the main programming interface.
SparkSession: new entry point that replaces the old SQLContext and HiveContext for DataFrame and Dataset APIs. SQLContext and HiveContext are kept for backward compatibility.
A new, streamlined configuration API for SparkSession
Simpler, more performant accumulator API
A new, improved Aggregator API for typed aggregation in Datasets
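The typed Aggregator API is organized around four operations: a zero value for the intermediate buffer, a reduce step that folds one input into a buffer, a merge step that combines buffers from different partitions, and a finish step that produces the result. In Spark these are the zero, reduce, merge and finish methods of the Scala class org.apache.spark.sql.expressions.Aggregator, which also declares buffer and output encoders. The following is a minimal Python sketch of that contract only, assuming plain lists as stand-ins for partitions; AvgBuffer, AverageAggregator and run are hypothetical names and are not part of any Spark API.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Iterable, List


@dataclass(frozen=True)
class AvgBuffer:
    """Intermediate buffer for an average: running total and row count."""
    total: float
    count: int


class AverageAggregator:
    """Hypothetical Python mirror of the zero/reduce/merge/finish contract
    of Spark's Scala Aggregator class; not a Spark API."""

    def zero(self) -> AvgBuffer:
        # Neutral buffer: merging it with any buffer leaves that buffer unchanged.
        return AvgBuffer(0.0, 0)

    def reduce(self, buf: AvgBuffer, value: float) -> AvgBuffer:
        # Fold one input value into a partition-local buffer.
        return AvgBuffer(buf.total + value, buf.count + 1)

    def merge(self, b1: AvgBuffer, b2: AvgBuffer) -> AvgBuffer:
        # Combine buffers produced by different partitions.
        return AvgBuffer(b1.total + b2.total, b1.count + b2.count)

    def finish(self, buf: AvgBuffer) -> float:
        # Turn the final buffer into the output value (NaN when there was no input).
        return buf.total / buf.count if buf.count else float("nan")


def run(agg: AverageAggregator, partitions: Iterable[List[float]]) -> float:
    # Reduce each stand-in "partition" independently, then merge the partial buffers.
    partials = [reduce(agg.reduce, part, agg.zero()) for part in partitions]
    return agg.finish(reduce(agg.merge, partials, agg.zero()))


print(run(AverageAggregator(), [[1.0, 2.0], [3.0], [4.0, 5.0, 6.0]]))  # 3.5
```

Keeping the buffer (sum, count) separate from the output (the average) is what makes partial results mergeable: two averages cannot be combined correctly, but two (sum, count) pairs can.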

SQL changes in Apache Spark 2.0.0:

A native SQL parser that supports both ANSI SQL and HiveQL
Native DDL command implementations
Subquery support, including:

Uncorrelated scalar subqueries
Correlated scalar subqueries
NOT IN predicate subqueries (in WHERE/HAVING clauses)
IN predicate subqueries (in WHERE/HAVING clauses)
(NOT) EXISTS predicate subqueries (in WHERE/HAVING clauses)

View canonicalization support
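The subquery forms listed above are standard SQL. The sketch below runs one query of each kind against an in-memory SQLite database, used here only as a stand-in engine so the example needs nothing beyond the Python standard library; the emp and active_dept tables and their rows are made up for illustration. In Spark 2.0 the same query strings should be accepted by spark.sql(...) once equivalent tables are registered, but that is an assumption not exercised here.

```python
import sqlite3

# In-memory SQLite database used purely as a stand-in SQL engine.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE emp (name TEXT, dept TEXT, salary INTEGER);
CREATE TABLE active_dept (dept TEXT);
INSERT INTO emp VALUES ('a', 'eng', 100), ('b', 'eng', 80), ('c', 'ops', 60), ('d', 'hr', 50);
INSERT INTO active_dept VALUES ('eng'), ('ops');
""")

queries = {
    # Uncorrelated scalar subquery: evaluated once, returns a single value.
    "uncorrelated_scalar":
        "SELECT name FROM emp WHERE salary > (SELECT AVG(salary) FROM emp) ORDER BY name",
    # Correlated scalar subquery: the inner query refers to the outer row (e.dept).
    "correlated_scalar":
        "SELECT name FROM emp e WHERE e.salary = "
        "(SELECT MAX(i.salary) FROM emp i WHERE i.dept = e.dept) ORDER BY name",
    # IN predicate subquery in WHERE.
    "in":
        "SELECT name FROM emp WHERE dept IN (SELECT dept FROM active_dept) ORDER BY name",
    # NOT IN predicate subquery in WHERE.
    "not_in":
        "SELECT name FROM emp WHERE dept NOT IN (SELECT dept FROM active_dept) ORDER BY name",
    # EXISTS / NOT EXISTS predicate subqueries (correlated on dept).
    "exists":
        "SELECT name FROM emp e WHERE EXISTS "
        "(SELECT 1 FROM active_dept d WHERE d.dept = e.dept) ORDER BY name",
    "not_exists":
        "SELECT name FROM emp e WHERE NOT EXISTS "
        "(SELECT 1 FROM active_dept d WHERE d.dept = e.dept) ORDER BY name",
}

results = {key: [row[0] for row in conn.execute(sql)] for key, sql in queries.items()}
for key, names in results.items():
    print(key, names)
```

The average salary is 72.5, so the uncorrelated query returns a and b; the correlated query returns the top earner of each department (a, c, d).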

New features:

Native CSV data source, based on Databricks’ spark-csv module
Off-heap memory management for both caching and runtime execution
Hive style bucketing support
Approximate summary statistics using sketches, including approximate quantile, Bloom filter, and count-min sketch.
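Sketches trade exactness for small, fixed memory. In Spark's Scala/Java API these statistics are reached through df.stat (approxQuantile, bloomFilter, countMinSketch). To show how a count-min sketch works, here is a minimal from-scratch Python version, not Spark's implementation: depth rows of width counters, one hash per row, an update that increments one counter in every row, and an estimate that takes the minimum across rows. The width, depth and SHA-256-based row hashing are arbitrary choices for this example.

```python
import hashlib
from typing import List


class CountMinSketch:
    """Minimal count-min sketch with `depth` rows of `width` counters.

    Estimates never fall below the true count; hash collisions can only inflate them.
    """

    def __init__(self, width: int = 64, depth: int = 4) -> None:
        self.width = width
        self.depth = depth
        self.table: List[List[int]] = [[0] * width for _ in range(depth)]

    def _index(self, item: str, row: int) -> int:
        # Derive a different hash per row by prefixing the row number before hashing.
        digest = hashlib.sha256(f"{row}:{item}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "little") % self.width

    def add(self, item: str, count: int = 1) -> None:
        # Increment exactly one counter in every row.
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item: str) -> int:
        # The smallest counter across rows is the one least inflated by collisions.
        return min(self.table[row][self._index(item, row)] for row in range(self.depth))


sketch = CountMinSketch()
for word in ["spark"] * 5 + ["hive"] * 3 + ["orc"]:
    sketch.add(word)
print(sketch.estimate("spark"), sketch.estimate("hive"), sketch.estimate("parquet"))
```

Memory is width × depth counters regardless of how many distinct items are seen; a wider table lowers the expected overcount, and more rows lower the chance that every row collides.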

Performance improvements:

Substantial (2-10x) speedups for common operators in SQL and DataFrames via a new technique called whole-stage code generation
Improved Parquet scan throughput through vectorization
Improved ORC performance
Many improvements in the Catalyst query optimizer for common workloads
Improved window function performance via native implementations for all window functions
Automatic file coalescing for native data sources
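Whole-stage code generation replaces the operator-at-a-time ("Volcano") iterator model, in which each row is passed through a chain of per-operator calls, with code generated at runtime that fuses an entire stage into a single loop. Spark generates Java code for this; the Python below is only a conceptual contrast of the two shapes for a hypothetical query SELECT SUM(x * 2) FROM t WHERE x % 3 = 0, and it does not reproduce Spark's speedups. All function names are illustrative.

```python
from typing import Callable, Iterable, Iterator


# Operator-at-a-time ("Volcano") style: each operator pulls rows from its child.
def scan(rows: Iterable[int]) -> Iterator[int]:
    for row in rows:
        yield row


def filter_op(child: Iterator[int], pred: Callable[[int], bool]) -> Iterator[int]:
    for row in child:
        if pred(row):
            yield row


def project(child: Iterator[int], fn: Callable[[int], int]) -> Iterator[int]:
    for row in child:
        yield fn(row)


def sum_agg(child: Iterator[int]) -> int:
    total = 0
    for row in child:
        total += row
    return total


def volcano_plan(rows: Iterable[int]) -> int:
    # SELECT SUM(x * 2) FROM t WHERE x % 3 = 0, as a chain of operators.
    return sum_agg(project(filter_op(scan(rows), lambda x: x % 3 == 0), lambda x: x * 2))


def fused_plan(rows: Iterable[int]) -> int:
    # The same stage fused by hand into one loop: no calls between operators per row.
    total = 0
    for x in rows:
        if x % 3 == 0:
            total += x * 2
    return total


print(volcano_plan(range(1000)), fused_plan(range(1000)))  # 333666 333666
```

Both plans compute the same result; the fused form avoids per-row virtual calls and keeps intermediate values in local variables, which is the kind of code whole-stage generation aims to emit.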
