elasticsearch core component

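1.NRT (Near Real Time)
Elasticsearch is a near real time search platform: there is a slight latency (normally one second) from the time you index a document until the time it becomes searchable.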
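Elasticsearch makes newly indexed documents visible to search through a periodic "refresh", controlled by the `index.refresh_interval` setting (default `1s`). The toy model below is a minimal sketch of that idea only: the class `ToyNrtIndex` and its methods are hypothetical and say nothing about how Lucene segments are actually written.

```python
class ToyNrtIndex:
    """Hypothetical toy model of near-real-time search (not Elasticsearch's code)."""

    def __init__(self):
        self._buffer = []      # indexed, but not yet visible to search
        self._searchable = []  # visible to search after a refresh

    def index(self, doc):
        # Indexing only appends to the in-memory buffer.
        self._buffer.append(doc)

    def refresh(self):
        # Make everything buffered so far searchable, then empty the buffer.
        self._searchable.extend(self._buffer)
        self._buffer.clear()

    def search(self, term):
        # Only refreshed documents can be found.
        return [d for d in self._searchable if term in d]


idx = ToyNrtIndex()
idx.index("hello elasticsearch")
print(idx.search("hello"))  # [] (not refreshed yet)
idx.refresh()
print(idx.search("hello"))  # ['hello elasticsearch']
```

In a real cluster the refresh happens automatically on the interval, which is where the roughly one-second delay comes from.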

2.cluster
A cluster is a collection of one or more nodes (servers) that together hold your entire data and provide federated indexing and search capabilities across all nodes.
A cluster is identified by a unique name which by default is "elasticsearch". This name is important because a node can only be part of a cluster if the node is set up to join the cluster by its name.

3.node
A node is a single server that is part of your cluster, stores your data, and participates in the cluster’s indexing and search capabilities.
Just like a cluster, a node is identified by a name which by default is a random Universally Unique Identifier (UUID) that is assigned to the node at startup.
This name is important for administration purposes where you want to identify which servers in your network correspond to which nodes in your Elasticsearch cluster.
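The two names are set in `elasticsearch.yml` through the `cluster.name` and `node.name` settings. The sketch below models only the naming rule described above: `node_settings` and `can_join` are hypothetical helpers, `uuid.uuid4()` stands in for the name Elasticsearch generates, and real cluster formation also requires the nodes to discover each other over the network.

```python
import uuid

DEFAULT_CLUSTER_NAME = "elasticsearch"  # default cluster.name, per the text above


def node_settings(overrides=None):
    """Return hypothetical effective settings for a node, filling in defaults."""
    settings = {
        "cluster.name": DEFAULT_CLUSTER_NAME,
        # Stand-in for the random UUID-based name assigned at startup.
        "node.name": str(uuid.uuid4()),
    }
    settings.update(overrides or {})
    return settings


def can_join(node, cluster_name):
    # Simplified rule: a node joins only a cluster whose name matches its cluster.name.
    return node["cluster.name"] == cluster_name


prod_node = node_settings({"cluster.name": "logging-prod", "node.name": "es-node-1"})
default_node = node_settings()
print(can_join(prod_node, "logging-prod"))     # True
print(can_join(default_node, "logging-prod"))  # False
```

Giving every cluster an explicit, unique `cluster.name` avoids a freshly started node with the default name joining the wrong cluster.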

4.index
An index is a collection of documents that have somewhat similar characteristics.
An index is identified by a name (that must be all lowercase) and this name is used to refer to the index when performing indexing, search, update, and delete operations against the documents in it.
In a single cluster, you can define as many indexes as you want.
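Besides being lowercase, index names have a few more documented restrictions (no characters such as `\ / * ? " < > | , #` or spaces, no leading `-`, `_` or `+`, not `.` or `..`, at most 255 bytes). The function below is a hypothetical sketch that checks only that subset; the create-index API documentation is the authoritative list.

```python
# Characters listed as forbidden in index names (subset of the documented rules).
FORBIDDEN_CHARS = set('\\/*?"<>| ,#')


def is_valid_index_name(name):
    """Check a subset of Elasticsearch's index-name rules (not exhaustive)."""
    if not name or name in (".", ".."):
        return False
    if name != name.lower():
        return False  # must be all lowercase
    if name[0] in "-_+":
        return False  # may not start with -, _ or +
    if any(ch in FORBIDDEN_CHARS for ch in name):
        return False
    return len(name.encode("utf-8")) <= 255  # at most 255 bytes


for candidate in ["blog", "Blog", "my index", "_private", "logs-2024.01"]:
    print(candidate, is_valid_index_name(candidate))
```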

5.type
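Within an index, you can define one or more types (this applies to Elasticsearch 5.x and earlier; indexes created in 6.x may hold only a single type, and types were removed entirely in 8.0).
A type is a logical category/partition of your index whose semantics are completely up to you. In general, a type is defined for documents that have a set of common fields.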
For example, let’s assume you run a blogging platform and store all your data in a single index. In this index, you may define a type for user data, another type for blog data, and yet another type for comments data.

6.document
A document is a basic unit of information that can be indexed.
For example, you can have a document for a single customer, another document for a single product, and yet another for a single order.
This document is expressed in JSON (JavaScript Object Notation), which is a ubiquitous internet data interchange format.

Within an index/type, you can store as many documents as you want.

Note that although a document physically resides in an index, a document actually must be indexed/assigned to a type inside an index.
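Putting the blogging example together: with types, a document is indexed with a request of the form `PUT /{index}/{type}/{id}` and a JSON body (in 7.x the type segment is replaced by `_doc`). The sketch below only builds such a request and sends nothing; `index_request` and the document fields are hypothetical.

```python
import json


def index_request(index, doc_type, doc_id, document):
    """Build (method, path, body) for indexing one document into index/type.

    Uses the type-era URL layout PUT /{index}/{type}/{id}; nothing is sent.
    """
    path = f"/{index}/{doc_type}/{doc_id}"
    return "PUT", path, json.dumps(document)


method, path, body = index_request(
    "blog", "comment", "1",
    {"post_id": 42, "user": "alice", "text": "Nice article!"},  # hypothetical fields
)
print(method, path)  # PUT /blog/comment/1
print(body)          # {"post_id": 42, "user": "alice", "text": "Nice article!"}
```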

7.shards & replicas
An index can potentially store a large amount of data that can exceed the hardware limits of a single node.
For example, a single index of a billion documents taking up 1TB of disk space may not fit on the disk of a single node or may be too slow to serve search requests from a single node alone.

To solve this problem, Elasticsearch provides the ability to subdivide your index into multiple pieces called shards.
When you create an index, you can simply define the number of shards that you want. Each shard is in itself a fully-functional and independent "index" that can be hosted on any node in the cluster.
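How does a document end up on a particular shard? Elasticsearch's routing documentation describes the formula `shard_num = hash(_routing) % num_primary_shards`, where the routing value defaults to the document's `_id`. The sketch below applies that formula, but uses `zlib.crc32` as a stand-in for the Murmur3 hash Elasticsearch actually uses, so its shard numbers will not match a real cluster. It also hints at why the primary shard count is fixed: changing the divisor remaps documents.

```python
import zlib


def shard_for(routing, number_of_primary_shards):
    """Pick a primary shard as hash(routing) % number_of_primary_shards.

    zlib.crc32 is a stand-in for Elasticsearch's Murmur3-based hash.
    """
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards


doc_ids = ["1", "2", "3", "4", "5", "6"]
print({d: shard_for(d, 5) for d in doc_ids})

# With a different shard count, many documents would map to another shard,
# so existing data could no longer be found where the formula says it is.
moved = sum(shard_for(d, 5) != shard_for(d, 6) for d in doc_ids)
print(moved, "of", len(doc_ids), "documents would move")
```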

Sharding is important for two primary reasons:
1.It allows you to horizontally split/scale your content volume
2.It allows you to distribute and parallelize operations across shards (potentially on multiple nodes) thus increasing performance/throughput
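The second point can be pictured as scatter/gather: the request goes to every shard, each shard returns its own best hits, and the partial results are merged. The sketch below is a toy under stated assumptions: the in-memory `shards` list, the term-count "score", and the helper names are all hypothetical, and threads stand in for shards living on different nodes.

```python
from concurrent.futures import ThreadPoolExecutor
import heapq

# Hypothetical shards, each holding its own slice of the documents.
shards = [
    ["red apple", "green apple apple"],
    ["apple pie", "banana split"],
    ["apple apple apple", "cherry"],
]


def search_shard(docs, term, k):
    # Score = number of times the term occurs (a stand-in for real relevance scoring).
    scored = [(doc.split().count(term), doc) for doc in docs]
    return heapq.nlargest(k, [s for s in scored if s[0] > 0])


def search_all(term, k=2):
    # Scatter: query every shard concurrently. Gather: merge the per-shard top-k.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda docs: search_shard(docs, term, k), shards))
    merged = [hit for part in partials for hit in part]
    return [doc for _, doc in heapq.nlargest(k, merged)]


print(search_all("apple"))  # ['apple apple apple', 'green apple apple']
```

Each shard only needs to return its local top `k`, because the global top `k` must be among them.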

In a network/cloud environment where failures can be expected anytime, it is very useful and highly recommended to have a failover mechanism in case a shard/node somehow goes offline or disappears for whatever reason.
To this end, Elasticsearch allows you to make one or more copies of your index’s shards into what are called replica shards, or replicas for short.

Replication is important for two primary reasons:
1.It provides high availability in case a shard/node fails. For this reason, it is important to note that a replica shard is never allocated on the same node as the original/primary shard that it was copied from.
2.It allows you to scale out your search volume/throughput since searches can be executed on all replicas in parallel.
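The allocation rule in point 1 has a visible consequence: on a single-node cluster, replicas cannot be placed anywhere and stay unassigned (cluster health shows yellow). The allocator below is a hypothetical toy that enforces only the "no two copies of a shard on one node" rule; Elasticsearch's real allocator weighs many more factors.

```python
def allocate(primaries, replicas, nodes):
    """Toy allocator: place each shard copy on a node, never putting two
    copies of the same shard on one node. Returns (placement, unassigned)."""
    placement = {node: [] for node in nodes}
    unassigned = []
    for shard in range(primaries):
        copies = [f"P{shard}"] + [f"R{shard}"] * replicas
        for i, copy in enumerate(copies):
            if i < len(nodes):
                # Offset by shard number so primaries spread across nodes;
                # (shard + i) % len(nodes) is distinct for each copy i.
                placement[nodes[(shard + i) % len(nodes)]].append(copy)
            else:
                unassigned.append(copy)  # no node left that lacks this shard
    return placement, unassigned


print(allocate(5, 1, ["node-1"]))
# ({'node-1': ['P0', 'P1', 'P2', 'P3', 'P4']}, ['R0', 'R1', 'R2', 'R3', 'R4'])
print(allocate(5, 1, ["node-1", "node-2"]))
```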

To summarize, each index can be split into multiple shards.
An index can also be replicated zero (meaning no replicas) or more times.
Once replicated, each index will have primary shards (the original shards that were replicated from) and replica shards (the copies of the primary shards).
The number of shards and replicas can be defined per index at the time the index is created.
After the index is created, you may change the number of replicas dynamically at any time, but you cannot change the number of shards after the fact.
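In REST terms, both counts are passed when the index is created (`PUT /{index}` with a `settings` body), while the replica count can later be changed with `PUT /{index}/_settings`; `number_of_shards` is a static setting and cannot be updated that way. The helpers below are hypothetical and only build the requests without sending them.

```python
import json


def create_index_request(name, shards, replicas):
    # PUT /{name}: shard and replica counts are set at creation time.
    body = {"settings": {"number_of_shards": shards, "number_of_replicas": replicas}}
    return "PUT", f"/{name}", json.dumps(body)


def update_replicas_request(name, replicas):
    # PUT /{name}/_settings: the replica count is dynamic and may change later.
    body = {"index": {"number_of_replicas": replicas}}
    return "PUT", f"/{name}/_settings", json.dumps(body)


print(create_index_request("blog", 3, 1))
print(update_replicas_request("blog", 2))
```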

By default, in Elasticsearch versions before 7.0, each index is allocated 5 primary shards and 1 replica, which means that if you have at least two nodes in your cluster, your index will have 5 primary shards and another 5 replica shards (1 complete replica) for a total of 10 shards per index. Since 7.0, the default is 1 primary shard and 1 replica.
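The total follows from every primary having `replicas` copies, so each shard exists `1 + replicas` times. A one-line sketch of that arithmetic (the function name is hypothetical):

```python
def total_shards(primaries, replicas):
    # Each primary plus its replica copies: primaries * (1 + replicas).
    return primaries * (1 + replicas)


print(total_shards(5, 1))  # 10 (5 primaries, 1 replica)
print(total_shards(1, 1))  # 2 (1 primary, 1 replica, the default since 7.0)
```

The replica shards only get allocated when there are enough nodes; with a single node, the 5 replicas in the first case stay unassigned.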

Note:
Each Elasticsearch shard is a Lucene index.
There is a maximum number of documents you can have in a single Lucene index.
As of LUCENE-5843, the limit is 2,147,483,519 (= Integer.MAX_VALUE - 128) documents. You can monitor shard sizes using the "_cat/shards" API.
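The limit is simple arithmetic on Java's `Integer.MAX_VALUE` (2^31 - 1). As a rough check, a shard's document count, for example from the `docs` column of `GET _cat/shards?v`, can be compared against it; the helper `shard_headroom` below is hypothetical.

```python
JAVA_INT_MAX = 2**31 - 1               # Integer.MAX_VALUE = 2,147,483,647
LUCENE_MAX_DOCS = JAVA_INT_MAX - 128   # per-index limit since LUCENE-5843


def shard_headroom(doc_count):
    """Documents a shard can still accept before reaching the Lucene limit."""
    return LUCENE_MAX_DOCS - doc_count


print(f"{LUCENE_MAX_DOCS:,}")                # 2,147,483,519
print(f"{shard_headroom(2_000_000_000):,}")  # 147,483,519
```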