How to tune for indexing speed
Use bulk requests
Bulk requests will yield much better performance than single-document index requests.
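As a rough illustration, the sketch below builds the newline-delimited JSON body that a bulk request carries: one action line followed by one source line per document. The helper name build_bulk_body and the index name my-index are hypothetical, and sending the body to the cluster is left out.

```python
import json


def build_bulk_body(index_name, documents):
    """Serialize documents into a newline-delimited JSON bulk body."""
    lines = []
    for doc in documents:
        # Action line: index the next document into index_name.
        # No "_id" is given, so Elasticsearch assigns one itself.
        lines.append(json.dumps({"index": {"_index": index_name}}))
        # Source line: the document itself.
        lines.append(json.dumps(doc))
    if not lines:
        return ""
    # A bulk body must end with a newline character.
    return "\n".join(lines) + "\n"


# Hypothetical sample data; one request now carries both documents.
docs = [{"title": "first"}, {"title": "second"}]
print(build_bulk_body("my-index", docs), end="")
```

The right number of documents per request depends on document size and cluster capacity, so the batch size is best found by measurement.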
Use multiple workers/threads to send data to Elasticsearch
The optimal number of workers can be found by progressively increasing it until either I/O or CPU is saturated on the cluster.
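One possible way to send batches from several threads is sketched below with Python's standard ThreadPoolExecutor. The send_batch callable is an assumption: it stands in for whatever function performs one bulk request, and fake_send_batch only counts documents so that the sketch runs without a cluster.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(items, size):
    """Yield consecutive slices of items, each with at most size elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def index_in_parallel(documents, send_batch, workers=4, batch_size=500):
    """Send batches of documents from several threads and total the results.

    send_batch is a hypothetical callable that performs one bulk request
    and returns the number of documents it indexed.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(send_batch, chunked(documents, batch_size))
        return sum(results)


def fake_send_batch(batch):
    # Stand-in for a real bulk request: pretend every document was indexed.
    return len(batch)


docs = [{"n": i} for i in range(2000)]
print(index_in_parallel(docs, fake_send_batch, workers=4, batch_size=500))
```

Rerunning the load with a larger workers value while watching I/O and CPU on the cluster is one way to carry out the tuning described above.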
Increase the refresh interval
The default index.refresh_interval is 1s, which forces Elasticsearch to create a new segment every second.
Increasing this value (to, say, 30s) allows larger segments to flush and decreases future merge pressure.
Disable refresh and replicas for initial loads
If you need to load a large amount of data at once, you should disable refresh by setting index.refresh_interval to -1 and set index.number_of_replicas to 0.
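A minimal sketch of the settings change follows, assuming it is applied through the update index settings endpoint (PUT /<index>/_settings). The helper names and the index name my-index are hypothetical. The values restored afterwards (a 1s refresh interval and one replica) are assumptions that should be replaced with whatever the index normally uses.

```python
import json


def bulk_load_settings():
    """Settings for the initial load: no refresh, no replicas."""
    return {"index": {"refresh_interval": "-1", "number_of_replicas": 0}}


def restored_settings(refresh_interval="1s", replicas=1):
    """Settings to put back after the load (assumed values, adjust as needed)."""
    return {
        "index": {
            "refresh_interval": refresh_interval,
            "number_of_replicas": replicas,
        }
    }


def settings_request(index_name, settings):
    """Return the (method, path, body) of an update index settings request."""
    return "PUT", f"/{index_name}/_settings", json.dumps(settings)


# Before the load, then after it has finished.
print(settings_request("my-index", bulk_load_settings()))
print(settings_request("my-index", restored_settings()))
```

The same request shape also covers a longer interval: passing {"index": {"refresh_interval": "30s"}} raises the refresh interval without disabling it.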
Disable swapping
Give memory to the filesystem cache
The filesystem cache is used to buffer I/O operations.
You should make sure to give at least half the memory of the machine running Elasticsearch to the filesystem cache.
Use auto-generated ids
When indexing a document that has an explicit id, Elasticsearch needs to check whether a document with the same id already exists within the same shard, which is a costly operation and gets even more costly as the index grows.
By using auto-generated ids, Elasticsearch can skip this check, which makes indexing faster.
Use faster hardware
If indexing is I/O bound, you should investigate giving more memory to the filesystem cache (see above) or buying faster drives.
Indexing buffer size
If your node is doing only heavy indexing, be sure indices.memory.index_buffer_size is large enough to give at most 512 MB of indexing buffer per shard doing heavy indexing (beyond that, indexing performance does not typically improve).
The default is 10%, which is often plenty: for example, if you give the JVM 10 GB of memory, it will give 1 GB to the index buffer, which is enough to host two shards that are heavily indexing.
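The arithmetic in this example can be checked with a short calculation. The function names are hypothetical, and the sketch assumes binary units (1 GB = 1024 MB).

```python
MB = 1024 * 1024
GB = 1024 * MB


def index_buffer_bytes(heap_bytes, buffer_percent=10):
    """Size of the index buffer for a given JVM heap, in bytes."""
    # Integer arithmetic avoids floating-point rounding.
    return heap_bytes * buffer_percent // 100


def heavy_shards_supported(heap_bytes, buffer_percent=10, per_shard_bytes=512 * MB):
    """How many heavily indexing shards can each get per_shard_bytes of buffer."""
    return index_buffer_bytes(heap_bytes, buffer_percent) // per_shard_bytes


heap = 10 * GB
print(index_buffer_bytes(heap) // MB)   # 1024 (MB, i.e. 1 GB)
print(heavy_shards_supported(heap))     # 2
```

With more than two heavily indexing shards on a 10 GB heap, the same calculation shows how far buffer_percent would need to rise to keep 512 MB available per shard.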