Presto 0.190 Released: Facebook's Open-Source Big Data Query Engine

Source: Reader submission
Author: 王练

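Presto 0.190 has been released. Presto is a data query engine open-sourced by Facebook that supports fast, interactive analysis over more than 250 PB of data, with query speeds comparable to those of commercial data warehouses. The engine is reported to be more than 10 times faster than Hive.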

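Presto can query a range of data stores, including Hive, Cassandra, and even some commercial data storage products. A single Presto query can combine data from multiple sources for unified analysis.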

General Changes

  • Fix correctness issue for array_min() and array_max() when arrays contain NaN.

  • Fix planning failure for queries involving GROUPING that require implicit coercions in expressions containing aggregate functions.

  • Fix potential workload imbalance when using topology-aware scheduling.

  • Fix performance regression for queries containing DISTINCT aggregates over the same column.

  • Fix a memory leak that occurs on workers.

  • Improve error handling when a HAVING clause contains window functions.

  • Avoid unnecessary data redistribution when writing to a target table that has the same partition property as the data being written.

  • Ignore case when sorting the output of SHOW FUNCTIONS.

  • Improve rendering of the BingTile type.

  • The approx_distinct() function now supports a standard error in the range of [0.0040625, 0.26000].

  • Add support for ORDER BY in aggregation functions.

  • Add dictionary processing for joins, which can improve join performance by up to 50%. This optimization can be disabled using the dictionary-processing-joins-enabled config property or the dictionary_processing_join session property.

  • Add support for casting to INTERVAL types.

  • Add ST_Buffer() geospatial function.

  • Allow treating decimal literals as values of the DECIMAL type rather than DOUBLE. This behavior can be enabled by setting the parse-decimal-literals-as-double config property or the parse_decimal_literals_as_double session property to false.

  • Add JMX counter to track the number of submitted queries.
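
The standard error range quoted above for approx_distinct() lines up with the usual HyperLogLog error estimate of 1.04 / sqrt(m), where m is the number of buckets: m = 16 gives 0.26 and m = 65536 gives 0.0040625. The sketch below illustrates that relationship in Python. It assumes power-of-two bucket counts between 2^4 and 2^16; the function names are hypothetical and this is not Presto's actual implementation.

```python
import math

# Assumption: bucket counts are powers of two between 2**4 and 2**16,
# which reproduces the [0.0040625, 0.26] range from the release notes.
MIN_EXPONENT = 4
MAX_EXPONENT = 16


def standard_error(buckets: int) -> float:
    """Usual HyperLogLog standard error estimate for a given bucket count."""
    return 1.04 / math.sqrt(buckets)


def buckets_for_standard_error(max_standard_error: float) -> int:
    """Return the smallest power-of-two bucket count whose estimated
    standard error does not exceed max_standard_error (hypothetical helper)."""
    lowest = standard_error(1 << MAX_EXPONENT)   # 0.0040625
    highest = standard_error(1 << MIN_EXPONENT)  # 0.26
    if not lowest <= max_standard_error <= highest:
        raise ValueError(
            f"standard error must be in [{lowest}, {highest}]"
        )
    for exponent in range(MIN_EXPONENT, MAX_EXPONENT + 1):
        buckets = 1 << exponent
        if standard_error(buckets) <= max_standard_error:
            return buckets
    # Unreachable given the range check above; kept as a safeguard.
    raise ValueError("no bucket count satisfies the requested error")
```

Under these assumptions, a tighter error bound costs more memory: each halving of the standard error requires roughly four times as many buckets.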

Resource Groups Changes

  • Add priority column to the DB resource group selectors.

  • Add exact match source selector to the DB resource group selectors.

CLI Changes

  • Add support for setting client tags.

JDBC Driver Changes

  • Add getPeakMemoryBytes() to QueryStats.

Accumulo Changes

  • Improve table scan parallelism.

Hive Changes

  • Fix query failures for the file-based metastore implementation when partition column values contain a colon.

  • Improve performance for writing to bucketed tables when the data being written is already partitioned appropriately (e.g., the output is from a bucketed join).

  • Add config property hive.max-outstanding-splits-size for the maximum amount of memory used to buffer splits for a single table scan. Additionally, the default value is substantially higher than the previous hard-coded limit, which can prevent certain queries from failing.

Thrift Connector Changes

  • Make Thrift retry configurable.

  • Add JMX counters for Thrift requests.

SPI Changes

  • Remove the RecordSink interface, which was difficult to use correctly and had no advantages over the PageSink interface.

