Channel: OSCHINA community latest news

Pachyderm 1.1 released: a containerized data lake

Pachyderm 1.1 was released back in July. Pachyderm is a containerized data lake that lets you store and analyze data using containers.

This release contains many improvements. The detailed list follows:

Features:

  • Data Provenance, which tracks the flow of data as it’s analyzed

  • FlushCommit, which tracks commits forward to the downstream results computed from them

  • DeleteAll, which restores the cluster to factory settings

  • More featureful data partitioning (map, reduce and global methods)

  • Explicit incrementality

  • Better support for dynamic membership (nodes leaving and entering the cluster)

  • Commit IDs are now present as env vars for jobs

  • Deletes and reads now work during job execution

  • pachctl inspect-* now returns much more information about the inspected objects

  • PipelineInfos now contain a count of job outcomes for the pipeline

  • Fixes to pachyderm and bazil.org/fuse to support writing a larger number of files

  • Jobs now report their end times as well as their start times

  • Jobs have a pulling state for when the container is being pulled

  • Put-file now accepts a -f flag for easier puts

  • Cluster restarts now work, even if kubernetes is restarted as well

  • Support for json and binary delimiters in data chunking

  • Manifests now reference a specific Pachyderm container version, making deployment more bulletproof

  • Readiness checks for pachd, which make deployment more bulletproof

  • Kubernetes jobs are now created in the same namespace pachd is deployed in

  • Support for pipeline DAGs that aren’t transitive reductions.

  • Appending to files now works in jobs; from shell scripts you can use >>

  • Network traffic is reduced with object stores by taking advantage of content addressability

  • Transforms now have a Debug field which turns on debug logging for the job

  • Pachctl can now be installed via Homebrew on macOS or apt on Ubuntu

  • ListJob now orders jobs by creation time

  • OpenShift Origin is now supported as a deployment platform
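
The content-addressability item in the list above can be illustrated with a small sketch. This is not Pachyderm's implementation: the class ContentAddressedStore, its in-memory dictionary, and the choice of SHA-256 are all assumptions made for illustration. The idea it shows is that when an object's key is the hash of its bytes, a writer can tell that identical data is already stored and skip sending it again.

```python
import hashlib


class ContentAddressedStore:
    """Toy in-memory stand-in for an object store (hypothetical, for illustration)."""

    def __init__(self):
        self.objects = {}  # hex digest -> bytes
        self.uploads = 0   # how many times data was actually transferred

    def put(self, data: bytes) -> str:
        # The key is derived from the content itself.
        key = hashlib.sha256(data).hexdigest()
        # Identical content always hashes to the same key, so an existing
        # key means the bytes are already stored and need not be sent again.
        if key not in self.objects:
            self.objects[key] = data
            self.uploads += 1
        return key

    def get(self, key: str) -> bytes:
        return self.objects[key]


store = ContentAddressedStore()
first = store.put(b"block of pipeline output")
second = store.put(b"block of pipeline output")  # duplicate: no second upload
third = store.put(b"a different block")
```

Here the first two calls return the same key and only two uploads are counted for three puts, which is the kind of saving the release note refers to.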

Content:

  • Webscraper example

  • Neural net example with TensorFlow

  • Wordcount example

Bug fixes:

  • False positive on running pipelines

  • Makefile bulletproofing to make sure things are installed when they’re needed

  • Races within the FUSE driver

  • In 1.0 it was possible to get duplicate job IDs; that should be fixed now

  • Pipelines could get stuck in the pulling state after being recreated several times

  • Map jobs no longer return when sharded unless the files are actually empty

  • The FUSE driver could encounter a bounds error during execution; it no longer does

  • Pipelines no longer get stuck in restarting state when the cluster is restarted

  • Failed jobs were being marked failed too early resulting in a race condition

  • Jobs could get stuck in running when they had failed

  • Pachd could panic due to membership changes

  • Starting a commit with a nonexistent parent now errors instead of silently failing

  • Previously pachd nodes would crash when deleting a watched repo

  • Jobs now get recreated if you delete and recreate a pipeline

  • Getting files from nonexistent commits gives a nicer error message

  • RunPipeline would fail to create a new job if the pipeline had already run

  • FUSE no longer chokes if a commit is closed after the mount happened

  • GCE/AWS backends have been made a lot more reliable

Tests:

From 1.0.0 to 1.1.0 we’ve gone from 70 tests to 120, a 71% increase.

