Channel: OSCHINA community latest news

Apache Kudu 0.10.0 Released: a Storage System for Hadoop

Apache Kudu 0.10.0 has been released.

About Apache Kudu

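There were two ways to respond to the trends identified earlier: keep updating the existing Hadoop tools, or design and build a new component. The goals were:

  • High performance for both data scans and random access, simplifying users' complex hybrid architectures;

  • High CPU efficiency, to get the most out of modern processors;

  • High I/O performance, to take full advantage of modern persistent storage media;

  • Support for in-place updates, avoiding extra data processing and data movement.

To meet these goals, we first built prototypes on top of existing open source projects, but we ultimately concluded that major architectural changes were needed. Those changes were significant enough to justify developing an entirely new data storage system. Development began three years ago, and we can now finally share the result of those years of effort: Kudu, a new data storage system.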

Changes in this release:

Incompatible changes and deprecated APIs in 0.10.0

  • Gerrit #3737 The Java client has been repackaged under org.apache.kudu instead of org.kududb. Import statements for Kudu classes must be modified in order to compile against 0.10.0. Wire compatibility is maintained.

  • Gerrit #3055 The Java client’s synchronous API methods now throw KuduException instead of Exception. Existing code that catches Exception should still compile, but introspection of an exception’s message may be impacted. This change was made to allow thrown exceptions to be queried more easily using KuduException.getStatus and calling one of Status’s methods. For example, an operation that tries to delete a table that doesn’t exist would return a Status that returns true when queried on isNotFound().

  • The Java client’s KuduTable.getTabletsLocations set of methods is now deprecated. Additionally, they now take an exclusive end partition key instead of an inclusive key. Applications are encouraged to use the scan tokens API instead of these methods in the future.

  • The C++ API for specifying split points on range-partitioned tables has been improved to make it easier for callers to properly manage the ownership of the provided rows.

    The TableCreator::split_rows API took a vector<const KuduPartialRow*>, which made it very difficult for the calling application to do proper error handling with cleanup when setting the fields of the KuduPartialRow. This API has now been deprecated and replaced by a new method, TableCreator::add_range_split, which allows easier use of smart pointers for safe memory management.

  • The Java client’s internal buffering has been reworked. Previously, the number of buffered write operations was constrained on a per-tablet-server basis. Now, the configured maximum buffer size constrains the total number of buffered operations across all tablet servers in the cluster. This provides a more consistent bound on the memory usage of the client regardless of the size of the cluster to which it is writing.

    This change can negatively affect the write performance of Java clients which rely on buffered writes. Consider using the setMutationBufferSpace API to increase a session’s maximum buffer size if write performance seems to be degraded after upgrading to Kudu 0.10.0.

  • The "remote bootstrap" process used to copy a tablet replica from one host to another has been renamed to "Tablet Copy". This resulted in the renaming of several RPC metrics. Any users previously explicitly fetching or monitoring metrics related to Remote Bootstrap should update their scripts to reflect the new names.

  • The SparkSQL datasource for Kudu no longer supports mode Overwrite. Users should use the new KuduContext.upsertRows method instead. Additionally, inserts using the datasource are now upserts by default. The older behavior can be restored by setting the operation parameter to insert.
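The Java client buffering change listed above is easier to reason about with a little arithmetic. The following is a minimal sketch, not the Kudu client's code: it models only the bound described in the notes (a limit applied per tablet server versus one limit for the whole session). The class name, the method names, and the example numbers are all hypothetical and chosen for illustration.

```java
// Illustrative model only: this does NOT use or reproduce the Kudu client.
// All names and numbers here are assumptions made for the example.
final class BufferBoundSketch {

    /** Worst-case buffered operations when the limit applies to each tablet server separately. */
    static long perServerWorstCase(int limitPerServer, int tabletServers) {
        return (long) limitPerServer * tabletServers;
    }

    /** Worst-case buffered operations when a single limit covers all tablet servers. */
    static long globalWorstCase(int globalLimit) {
        return globalLimit;
    }

    /** Hypothetical helper: the global limit that would allow as many buffered operations as before. */
    static int limitToMatchOldBound(int oldLimitPerServer, int tabletServers) {
        long wanted = perServerWorstCase(oldLimitPerServer, tabletServers);
        // Clamp to the int range so the result can be used as an int-sized setting.
        return (int) Math.min(wanted, Integer.MAX_VALUE);
    }

    public static void main(String[] args) {
        int limit = 1000;   // assumed example value, not a Kudu default
        int servers = 10;   // assumed cluster size
        System.out.println("old worst case: " + perServerWorstCase(limit, servers));
        System.out.println("new worst case: " + globalWorstCase(limit));
        System.out.println("limit matching old bound: " + limitToMatchOldBound(limit, servers));
    }
}
```

Under this model the same configured limit allows fewer buffered operations on a multi-server cluster than it did before, which is one plausible reason for the write slowdown the notes warn about. A larger value passed to setMutationBufferSpace trades client memory for throughput, so it is probably worth measuring rather than simply restoring the old worst case.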

New features

  • Users may now manually manage the partitioning of a range-partitioned table. When a table is created, the user may specify a set of range partitions that do not cover the entire available key space. A user may add or drop range partitions to existing tables.

    This feature can be particularly helpful with time series workloads in which new partitions can be created on an hourly or daily basis. Old partitions may be efficiently dropped if the application does not need to retain historical data past a certain point.

    This feature is considered experimental for the 0.10 release. More details of the new feature can be found in the accompanying blog post.

  • Support for running Kudu clusters with multiple masters has been stabilized. Users may start a cluster with three or five masters to provide fault tolerance despite a failure of one or two masters, respectively.

    Note that certain tools (e.g. ksck) are still lacking complete support for multiple masters. These deficiencies will be addressed in a following release.

  • Kudu now supports the ability to reserve a certain amount of free disk space in each of its configured data directories. If a directory’s free disk space drops to less than the configured minimum, Kudu will stop writing to that directory until space becomes available. If no space is available in any configured directory, Kudu will abort.

    This feature may be configured using the fs_data_dirs_reserved_bytes and fs_wal_dir_reserved_bytes flags.

  • The Spark integration’s KuduContext now supports four new methods for writing to Kudu tables: insertRows, upsertRows, updateRows, and deleteRows. These are now the preferred way to write to Kudu tables from Spark.
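The time series use case for manually managed range partitions, described in the list above, can be sketched as a small planning routine. This is only a model under stated assumptions: it does not call the Kudu client, the DayRange type and both helper methods are hypothetical, and it assumes each daily partition covers a lower-inclusive, upper-exclusive date range. The actual add and drop operations would go through the client's table alteration API, which is not shown here.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// Illustrative model only: plans which daily range partitions to add or drop.
// It does not talk to Kudu; all types and names are assumptions for the example.
final class DailyRangePartitionSketch {

    /** A range partition covering [lowerInclusive, upperExclusive). Hypothetical type. */
    static final class DayRange {
        final LocalDate lowerInclusive;
        final LocalDate upperExclusive;

        DayRange(LocalDate lowerInclusive, LocalDate upperExclusive) {
            this.lowerInclusive = lowerInclusive;
            this.upperExclusive = upperExclusive;
        }

        @Override
        public String toString() {
            return "[" + lowerInclusive + ", " + upperExclusive + ")";
        }
    }

    /** The one-day partition that would be added to hold rows for the given day. */
    static DayRange partitionFor(LocalDate day) {
        return new DayRange(day, day.plusDays(1));
    }

    /** Partitions that lie entirely before the retention cutoff and could be dropped. */
    static List<DayRange> expired(List<DayRange> existing, LocalDate today, int retentionDays) {
        LocalDate cutoff = today.minusDays(retentionDays);
        List<DayRange> result = new ArrayList<>();
        for (DayRange range : existing) {
            // Droppable only if no day in the range is on or after the cutoff.
            if (!range.upperExclusive.isAfter(cutoff)) {
                result.add(range);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<DayRange> partitions = new ArrayList<>();
        LocalDate start = LocalDate.of(2016, 8, 1);
        for (int i = 0; i < 10; i++) {
            partitions.add(partitionFor(start.plusDays(i)));
        }
        // With 7 days of retention on 2016-08-10, the first two daily partitions expire.
        System.out.println(expired(partitions, LocalDate.of(2016, 8, 10), 7));
    }
}
```

Dropping a whole partition in this way is what makes retention cheap compared with deleting old rows one at a time, which matches the notes' remark that old partitions may be efficiently dropped.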

Full release notes: http://kudu.apache.org/releases/0.10.0/docs/release_notes.html
