Apache Mahout 0.8 Released: Machine Learning Library

2013-07-26 00:00:00

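Apache Mahout 0.8 has been released. Apache Mahout is a new open source project from the Apache Software Foundation (ASF) whose main goal is to build scalable machine learning algorithms that developers can use free of charge under the Apache license. The project is in its second year and currently has only one public release. Mahout contains many implementations, including clustering, classification, collaborative filtering (CF), and evolutionary programming. In addition, by building on the Apache Hadoop library, Mahout can scale effectively in the cloud.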

This release is mainly a code cleanup ahead of the 1.0 release. It also includes some new features:

- Numerous performance improvements to Vector and Matrix implementations, APIs and their iterators (see also MAHOUT-1192, MAHOUT-1202)
- Numerous performance improvements to the recommender implementations (see also MAHOUT-1272, MAHOUT-1035, MAHOUT-1042, MAHOUT-1151, MAHOUT-1166, MAHOUT-1167, MAHOUT-1169, MAHOUT-1205, MAHOUT-1264)
- MAHOUT-1088: Support for biased item-based recommender
- MAHOUT-1089: SGD matrix factorization for rating prediction with user and item biases
- MAHOUT-1106: Support for SVD++
- MAHOUT-944: Support for converting one or more Lucene storage indexes to SequenceFiles as well as an upgrade of the supported Lucene version to Lucene 4.3.1.
- MAHOUT-1154 and friends: New streaming k-means implementation that offers on-line (and fast) clustering
- MAHOUT-833: Conversion to SequenceFiles is now Map-Reduce based; 'seqdirectory' can now be run as a MapReduce job.
- MAHOUT-1052: Add an option to MinHashDriver that specifies the dimension of vector to hash (indexes or values).
- MAHOUT-884: Matrix Concat utility, presently only concatenates two matrices.
- MAHOUT-1244: Upgraded to use Lucene 4.3
- MAHOUT-1187: Upgraded to CommonsLang3
- MAHOUT-916: Speed up the Mahout build by making tests run in parallel.
- The usual bug fixes. See JIRA [2] for more information on the 0.8 release.
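
To give a feel for what the MAHOUT-1089 item above refers to, here is a minimal sketch of SGD matrix factorization with user and item biases, using the common formulation in which a rating is predicted as global mean + user bias + item bias + dot product of the user and item factor vectors. This is not Mahout's code or API: the class name `BiasedSgdSketch`, its method names, the learning rate, the regularization weight, and the toy ratings are all assumptions made for illustration.

```java
import java.util.Random;

// Illustrative sketch only; this is NOT Mahout's implementation or API.
final class BiasedSgdSketch {
    private final int numFeatures;
    private final double learningRate; // assumed step size
    private final double lambda;       // assumed L2 regularization weight
    private double globalMean;
    private final double[] userBias;
    private final double[] itemBias;
    private final double[][] userFactors;
    private final double[][] itemFactors;

    BiasedSgdSketch(int numUsers, int numItems, int numFeatures,
                    double learningRate, double lambda, long seed) {
        this.numFeatures = numFeatures;
        this.learningRate = learningRate;
        this.lambda = lambda;
        this.userBias = new double[numUsers];
        this.itemBias = new double[numItems];
        this.userFactors = new double[numUsers][numFeatures];
        this.itemFactors = new double[numItems][numFeatures];
        // Start the latent factors at small random values.
        Random random = new Random(seed);
        for (double[] row : userFactors) {
            for (int k = 0; k < numFeatures; k++) row[k] = 0.1 * random.nextGaussian();
        }
        for (double[] row : itemFactors) {
            for (int k = 0; k < numFeatures; k++) row[k] = 0.1 * random.nextGaussian();
        }
    }

    // Prediction: global mean + user bias + item bias + factor dot product.
    double predict(int user, int item) {
        double sum = globalMean + userBias[user] + itemBias[item];
        for (int k = 0; k < numFeatures; k++) {
            sum += userFactors[user][k] * itemFactors[item][k];
        }
        return sum;
    }

    // Each row of ratings is {userIndex, itemIndex, rating}.
    void train(double[][] ratings, int epochs) {
        double total = 0.0;
        for (double[] r : ratings) total += r[2];
        globalMean = total / ratings.length;
        for (int epoch = 0; epoch < epochs; epoch++) {
            for (double[] r : ratings) {
                int u = (int) r[0];
                int i = (int) r[1];
                double err = r[2] - predict(u, i);
                // Gradient step on the biases, with L2 shrinkage.
                userBias[u] += learningRate * (err - lambda * userBias[u]);
                itemBias[i] += learningRate * (err - lambda * itemBias[i]);
                // Gradient step on the factors, using the pre-update values.
                for (int k = 0; k < numFeatures; k++) {
                    double pu = userFactors[u][k];
                    double qi = itemFactors[i][k];
                    userFactors[u][k] += learningRate * (err * qi - lambda * pu);
                    itemFactors[i][k] += learningRate * (err * pu - lambda * qi);
                }
            }
        }
    }

    // Root mean squared error of the predictions over the given ratings.
    double rmse(double[][] ratings) {
        double squared = 0.0;
        for (double[] r : ratings) {
            double err = r[2] - predict((int) r[0], (int) r[1]);
            squared += err * err;
        }
        return Math.sqrt(squared / ratings.length);
    }

    public static void main(String[] args) {
        // Toy data, invented for this example.
        double[][] ratings = {{0, 0, 5}, {0, 1, 3}, {1, 0, 4}, {1, 2, 1}, {2, 1, 2}, {2, 2, 5}};
        BiasedSgdSketch model = new BiasedSgdSketch(3, 3, 2, 0.05, 0.01, 42L);
        System.out.println("RMSE before training: " + model.rmse(ratings));
        model.train(ratings, 200);
        System.out.println("RMSE after training:  " + model.rmse(ratings));
    }
}
```

The bias terms absorb per-user and per-item rating offsets, so the latent factors only need to model the remaining interaction. For the actual classes and options shipped in 0.8, consult the Mahout source and the JIRA issue itself.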


