Using Elasticsearch as a NoSQL Database

Submitted by oschina on 2014/02/13 08:20 (18 paragraphs; translation completed on 02-20)

Can Elasticsearch be used as a “NoSQL” database? NoSQL means different things in different contexts, and, interestingly, it is not really about SQL. We will start out with a “Maybe!” and look at the properties Elasticsearch has, as well as those it has sacrificed in order to become one of the most flexible, scalable and performant search and analytics engines yet.

What is a NoSQL Database Anyway?

nosql-database.org defines NoSQL as “Next Generation Databases mostly addressing some of the points: being non-relational, distributed, open-source and horizontally scalable.” In other words, it’s not a very precise definition.

It’s not about SQL in particular. For example, Hive’s query language is clearly inspired by SQL. The same is true for Esper’s query language, which operates on streams instead of relations. Also, did you know that PostgreSQL was originally named “Postgres” and used PostQUEL, a descendant of Ingres’s QUEL, as its query language back in the day? While first and foremost an ORDBMS, it now also has many features that make it viable as a schemaless document store.

It’s not about ACID-ity either. HyperDex is one example of a NoSQL-database that aims to provide ACID transactions. MySQL, certainly an SQL-database, has a history of dubious interpretations of what ACID really means.

Relations? While most NoSQL-databases do not support joins in the sense that traditional relational databases do, and leave that as an exercise for the user, some do: RethinkDB, Hive and Pig, to name a few. Neo4j, a graph-oriented database, certainly deals with relations - it’s excellent at traversing relations (i.e. edges) in graphs. Elasticsearch has a concept of “query time” joining with parent/child relations and “index time” joining with nested types.

Distributed? While there are some distributed SQL-databases around, and some projects aiming to be something like a NoSQLite, newer-generation databases tend to be distributed in one way or another.

To summarize the summary, it makes sense neither to define NoSQL precisely nor to simply say that Elasticsearch is a “document store”-type NoSQL-database. At the time of writing, nosql-database.org lists more than 20 of those.

In the next sections, we’ll have a look at some important properties and see how Elasticsearch does or does not implement them.

No Transactions

Lucene, which Elasticsearch is built on, has a notion of transactions. Elasticsearch, on the other hand, does not have transactions in the typical sense. There is no way to roll back a submitted document, and you cannot submit a group of documents and have either all or none of them indexed. What it does have, however, is a write-ahead log (the translog) to ensure the durability of operations without having to do an expensive Lucene commit. You can also specify the consistency level of index operations, in terms of how many shard copies must acknowledge the operation before it returns. This defaults to a quorum, i.e. $\lfloor n/2 \rfloor + 1$, where $n$ is the number of copies of the shard (the primary plus its replicas).
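
The default quorum is a simple majority, ⌊n/2⌋ + 1 out of n shard copies. The minimal Python sketch below just evaluates that formula; the function name `quorum` is hypothetical, n is assumed to count every copy of the shard (primary plus replicas), and the code does not reproduce any special cases a particular Elasticsearch version may apply.

```python
def quorum(copies: int) -> int:
    """Return the smallest majority of `copies` shard copies: floor(n/2) + 1.

    Hypothetical helper for illustration only; `copies` is assumed to count
    the primary plus all of its replicas.
    """
    if copies < 1:
        raise ValueError("a shard always has at least one copy (the primary)")
    return copies // 2 + 1


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        print(f"{n} copies -> quorum of {quorum(n)}")
```

With one primary and two replicas (n = 3), two copies must acknowledge a write; with five copies, three must.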

Changes become visible to search when an index is refreshed. By default this happens once per second, and it happens on a shard-by-shard basis.
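
As a rough mental model of refresh-based visibility, consider the Python sketch below. `ToyShard` and its methods are hypothetical: they only mimic the idea that indexed documents stay invisible to search until their shard refreshes, and that shards refresh independently. They say nothing about how Lucene segments work internally, and they ignore the fact that a real get-by-id request can be served before a refresh.

```python
from __future__ import annotations


class ToyShard:
    """Hypothetical in-memory shard: writes are buffered until refresh()."""

    def __init__(self) -> None:
        self._pending: dict[str, dict] = {}     # indexed, not yet searchable
        self._searchable: dict[str, dict] = {}  # visible to search()

    def index(self, doc_id: str, doc: dict) -> None:
        self._pending[doc_id] = doc

    def refresh(self) -> None:
        # Publish everything buffered since the last refresh.
        self._searchable.update(self._pending)
        self._pending.clear()

    def search(self, field: str, value) -> list[str]:
        # Only refreshed documents are considered.
        return sorted(i for i, d in self._searchable.items() if d.get(field) == value)


# Two shards of the same index refresh independently.
shard_a, shard_b = ToyShard(), ToyShard()
shard_a.index("1", {"status": "new"})
shard_b.index("2", {"status": "new"})
shard_a.refresh()  # only shard A has refreshed so far
hits = shard_a.search("status", "new") + shard_b.search("status", "new")
print(hits)  # ['1']
shard_b.refresh()
hits = shard_a.search("status", "new") + shard_b.search("status", "new")
print(hits)  # ['1', '2']
```

Between the two refreshes, a search across both shards sees only half of the writes, which is the kind of short-lived staleness the text describes.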

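Optimistic concurrency control is done by specifying the version of the document being submitted: if it does not match the version currently stored, the write is rejected with a version conflict.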
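
A small Python sketch of the check-and-reject idea follows. `VersionedStore`, `VersionConflict` and the `expected_version` parameter are hypothetical names, and the assumption that the version simply counts writes is made for illustration only; the point is that a writer holding a stale version is refused instead of silently overwriting newer data.

```python
from __future__ import annotations


class VersionConflict(Exception):
    """Raised when the caller's expected version is stale."""


class VersionedStore:
    """Hypothetical store that bumps a per-document version on every write."""

    def __init__(self) -> None:
        self._docs: dict[str, tuple[int, dict]] = {}

    def get(self, doc_id: str) -> tuple[int, dict]:
        return self._docs[doc_id]

    def index(self, doc_id: str, doc: dict, expected_version: int | None = None) -> int:
        current = self._docs[doc_id][0] if doc_id in self._docs else 0
        # Reject the write if the caller read an older version than what is stored.
        if expected_version is not None and expected_version != current:
            raise VersionConflict(
                f"{doc_id}: expected version {expected_version}, found {current}"
            )
        self._docs[doc_id] = (current + 1, doc)
        return current + 1


store = VersionedStore()
store.index("acct-1", {"balance": 100})  # version 1
v, _ = store.get("acct-1")  # two clients both read version 1
store.index("acct-1", {"balance": 90}, expected_version=v)  # succeeds, now version 2
try:
    store.index("acct-1", {"balance": 80}, expected_version=v)  # stale, rejected
except VersionConflict as err:
    print("conflict:", err)
```

The second client must re-read the document and retry, rather than overwrite the first client's change.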

Elasticsearch is built for speed. Doing distributed transactions is a lot of work. Not providing them makes a lot of things easier. By accepting that what we read can be somewhat stale, and that everyone sees the same timeline, Elasticsearch can serve a lot of things from caches - which is paramount for the mind-boggling performance we love it for.

Schema Flexible

Elasticsearch does not require you to specify a schema upfront. Throw a JSON document at it, and it will make some educated guesses to infer the types of its fields. It does a good job with things like numerics, booleans and timestamps. For strings, it will use the “standard” analyzer, which is usually good enough to get started.
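
To make the “educated guessing” concrete, here is a simplified Python sketch of type inference over a JSON document. The rules and type names are assumptions loosely modeled on the older mapping vocabulary (`string`, `long`, `double`, `boolean`, `date`, with inner objects mapped field by field); Elasticsearch’s real dynamic mapping has more rules and configurable date detection, and this sketch rejects arrays and nulls outright.

```python
from __future__ import annotations

from datetime import datetime


def infer_type(value):
    """Guess a field type from a JSON value (hypothetical, simplified rules)."""
    # bool must be checked before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        try:
            datetime.fromisoformat(value)  # e.g. "2014-02-13T08:20:00"
            return "date"
        except ValueError:
            return "string"  # would be analyzed with the "standard" analyzer
    if isinstance(value, dict):
        return infer_mapping(value)  # inner objects get their own field types
    raise TypeError(f"unsupported JSON value: {value!r}")


def infer_mapping(doc: dict) -> dict:
    return {field: infer_type(value) for field, value in doc.items()}


doc = {
    "title": "Elasticsearch as a NoSQL database",
    "views": 3,
    "rating": 4.5,
    "translated": True,
    "posted": "2014-02-13T08:20:00",
    "author": {"name": "oschina"},
}
print(infer_mapping(doc))
```

A sketch like this also shows why tweaking the schema matters: a string that merely looks like a date, or a number that should be a keyword, will be guessed wrong.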

While it’s arguably “schema free”, in the sense that you don’t have to specify a schema, we like to think of it as “schema flexible” instead. To develop great search and/or analytics, you really need to tweak your schemas. Elasticsearch has an extensive set of powerful tools to help you, like dynamic templates, multi-fields, etc. This is covered in more detail in our article on mapping.

Relations and Constraints

Elasticsearch is a document-oriented database. The entire object graph you want to search needs to be indexed, so your documents must be denormalized before they are indexed. Denormalization increases retrieval performance (since no query-time joining is necessary) and uses more space (because things must be stored several times), but it makes keeping things consistent and up to date more difficult (as any change must be applied to every copy). Denormalized documents are excellent for write-once-read-many workloads, however.

For example, say you have set up a database containing customers, orders and products, and you want to search for orders given the name of a product and of a user. This could be solved by indexing orders together with all the necessary information about the user and the products. Searching is then easy, but what happens when you want to change the name of a product? In a relational design with proper normalization, you would simply update the product and be done; that is what relational databases are really good at. With a denormalized document database, every order containing the product would have to be updated.
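
The trade-off in this example can be sketched in a few lines of Python. The records and the functions `denormalize`, `search` and `rename_product` are hypothetical; plain dictionaries stand in for a normalized store and a search index, and nothing here calls Elasticsearch.

```python
from __future__ import annotations

# Normalized source data (hypothetical example records).
customers = {"c1": {"name": "Alice"}}
products = {"p1": {"name": "Widget"}, "p2": {"name": "Gadget"}}
orders = {
    "o1": {"customer": "c1", "products": ["p1"]},
    "o2": {"customer": "c1", "products": ["p1", "p2"]},
}


def denormalize(order: dict) -> dict:
    """Copy everything a search needs into one self-contained order document."""
    return {
        "customer_name": customers[order["customer"]]["name"],
        "product_ids": list(order["products"]),
        "product_names": [products[p]["name"] for p in order["products"]],
    }


# The "index": one denormalized document per order.
order_docs = {oid: denormalize(o) for oid, o in orders.items()}


def search(product_name: str, customer_name: str) -> list[str]:
    # No join needed: each document already carries both names.
    return sorted(
        oid
        for oid, d in order_docs.items()
        if product_name in d["product_names"] and d["customer_name"] == customer_name
    )


def rename_product(pid: str, new_name: str) -> int:
    """Rename a product; return how many order documents had to be rewritten."""
    products[pid]["name"] = new_name  # the normalized side: a single update
    rewritten = 0
    for doc in order_docs.values():
        if pid in doc["product_ids"]:
            doc["product_names"] = [products[p]["name"] for p in doc["product_ids"]]
            rewritten += 1
    return rewritten


print(search("Widget", "Alice"))  # ['o1', 'o2']
print(rename_product("p1", "Widget Pro"))  # 2
print(search("Widget", "Alice"))  # []
```

The search is a simple scan over self-contained documents, while the rename touches one normalized record but two indexed documents; with real data that second number grows with every order that mentions the product.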

In other words, with document-oriented databases like Elasticsearch, we design our mappings and store our documents so that they are optimized for search and retrieval.

As mentioned in the introduction, Elasticsearch has a concept of “query time” joining with parent/child-relations, and “index time” joining with nested types. We’ll probably cover this in more depth in a future article. In the meantime, we can recommend Martijn van Groningen’s presentation “Document relations with Elasticsearch”.
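
The difference between the two joining styles can be sketched with plain Python data structures. This is a hypothetical model, not how Elasticsearch implements either feature: `parents_with_child` is only loosely analogous to the `has_child` query (children stored as separate documents and joined at query time), and `nested_docs` mimics copying children into their parent at index time.

```python
from __future__ import annotations

# Hypothetical questions (parents) and answers (children).
questions = {"q1": {"title": "How many shards?"}, "q2": {"title": "What is a refresh?"}}
answers = [
    {"id": "a1", "question": "q1", "votes": 5},
    {"id": "a2", "question": "q1", "votes": 1},
    {"id": "a3", "question": "q2", "votes": 0},
]


def parents_with_child(predicate) -> list[str]:
    """Query-time join: children live separately and are joined when searching."""
    matching = {a["question"] for a in answers if predicate(a)}
    return sorted(q for q in questions if q in matching)


# Index-time join: each answer is embedded in its question when indexing.
nested_docs = {
    qid: {**q, "answers": [a for a in answers if a["question"] == qid]}
    for qid, q in questions.items()
}


def nested_match(predicate) -> list[str]:
    """Match parents whose embedded children satisfy the predicate."""
    return sorted(
        qid for qid, d in nested_docs.items() if any(predicate(a) for a in d["answers"])
    )


def well_voted(answer: dict) -> bool:
    return answer["votes"] >= 3


print(parents_with_child(well_voted))  # ['q1']
print(nested_match(well_voted))  # ['q1']
```

Both approaches give the same answer here; the difference shows up on writes. A new answer can simply be appended to `answers`, whereas with the nested layout the whole question document has to be rebuilt and indexed again.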

Most relational databases also let you specify constraints to define what is and isn’t consistent. For example, referential integrity and uniqueness can be enforced, and you can require that the sum of account movements must be positive, and so on. Document-oriented databases tend not to do this, and Elasticsearch is no different.

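All translations in this article are for study and exchange purposes only. When reposting, please credit the translators, the source, and a link to this article.
Our translation work follows the CC license. If our work infringes on your rights, please contact us promptly.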