Elasticsearch: how to tune for indexing speed

How to tune for indexing speed

  1. Use bulk requests

    Bulk requests will yield much better performance than single-document index requests.
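
As a rough illustration, the body of a `_bulk` request is newline-delimited JSON: one action line followed by one document line for each document, ending with a trailing newline. The sketch below builds such a body with the standard library only. The index name `logs` and the sample documents are made up for illustration, and actually sending the body (a POST to `/_bulk` with `Content-Type: application/x-ndjson`) is left out.

```python
import json


def build_bulk_body(index_name, docs):
    """Serialize docs into a newline-delimited JSON body for _bulk."""
    lines = []
    for doc in docs:
        # Action line: index into index_name. No "_id" is given,
        # so Elasticsearch would generate one.
        lines.append(json.dumps({"index": {"_index": index_name}}))
        # Source line: the document itself.
        lines.append(json.dumps(doc))
    # The bulk body has to end with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical sample data.
docs = [{"message": "first"}, {"message": "second"}]
body = build_bulk_body("logs", docs)
print(body, end="")
```

A good batch size depends on document size and cluster capacity, so it is worth measuring a few sizes rather than assuming one.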
  2. Use multiple workers/threads to send data to Elasticsearch

    The right number of workers can be found by progressively increasing it until either I/O or CPU is saturated on the cluster.
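
One way to run such a test is sketched below, assuming a thread pool on the client side. `send_batch` is a hypothetical stand-in that only sleeps; in a real test it would issue one bulk request, and you would watch CPU and I/O on the cluster nodes alongside the client-side rate.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def send_batch(batch):
    """Hypothetical stand-in for one bulk request; returns docs sent."""
    time.sleep(0.01)  # simulate network and indexing latency
    return len(batch)


def measure_throughput(batches, workers):
    """Send all batches with a fixed number of threads; return docs/second."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        sent = sum(pool.map(send_batch, batches))
    elapsed = time.perf_counter() - start
    return sent / elapsed


# 32 made-up batches of 100 documents each.
batches = [[{"n": i}] * 100 for i in range(32)]
for workers in (1, 2, 4, 8):
    rate = measure_throughput(batches, workers)
    print(f"{workers} workers: {rate:.0f} docs/s")
```

Once the measured rate stops improving as workers are added, the cluster (or the client) is likely the bottleneck.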
  3. Increase the refresh interval

    The default index.refresh_interval is 1s, which forces Elasticsearch to create a new segment every second.
    Increasing this value (to, say, 30s) allows larger segments to be flushed and reduces future merge pressure.
  4. Disable refresh and replicas for initial loads

    If you need to load a large amount of data at once, disable refresh by setting index.refresh_interval to -1 and set index.number_of_replicas to 0. Restore both settings to their original values once the load is finished.
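
A minimal sketch of the settings bodies involved, assuming they are sent to the update index settings endpoint (`PUT /<index>/_settings`). The helper names and the values passed to `restore_settings` (`30s`, one replica) are illustrative only; use whatever the index had before the load.

```python
import json


def load_settings():
    """Settings body for the duration of a large initial load."""
    return {"index": {"refresh_interval": "-1", "number_of_replicas": 0}}


def restore_settings(refresh_interval, replicas):
    """Settings body to put back once the load is finished."""
    return {
        "index": {
            "refresh_interval": refresh_interval,
            "number_of_replicas": replicas,
        }
    }


# Each body would be sent with PUT /<index>/_settings.
print(json.dumps(load_settings()))
print(json.dumps(restore_settings("30s", 1)))
```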
  5. Disable swapping

  6. Give memory to the filesystem cache

    The filesystem cache is used to buffer I/O operations.
    Make sure that at least half the memory of the machine running Elasticsearch is left to the filesystem cache.
  7. Use auto-generated ids

    When indexing a document that has an explicit id, Elasticsearch needs to check whether a document with the same id already exists within the same shard. This check is costly and gets even more costly as the index grows.
    With auto-generated ids, Elasticsearch can skip this check, which makes indexing faster.
  8. Use faster hardware

    If indexing is I/O bound, you should investigate giving more memory to the filesystem cache (see above) or buying faster drives.
  9. Indexing buffer size

    If your node is doing only heavy indexing, make sure indices.memory.index_buffer_size is large enough to give at most 512 MB of indexing buffer per shard doing heavy indexing (beyond that, indexing performance does not typically improve).
    The default is 10%, which is often plenty: for example, if you give the JVM 10GB of memory, it will give 1GB to the index buffer, which is enough to host two shards that are heavily indexing.
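
The arithmetic in the example above can be checked with a few lines. This is only a sketch of the calculation: the function names are made up, and 10GB is taken as 10240 MB.

```python
def index_buffer_mb(heap_mb, buffer_percent=10):
    """Size of the indexing buffer in MB for a given JVM heap size."""
    return heap_mb * buffer_percent // 100


def heavy_shards_supported(heap_mb, buffer_percent=10, per_shard_mb=512):
    """How many heavily indexing shards can each get per_shard_mb of buffer."""
    return index_buffer_mb(heap_mb, buffer_percent) // per_shard_mb


heap_mb = 10 * 1024  # 10GB heap
print(index_buffer_mb(heap_mb))         # 1024
print(heavy_shards_supported(heap_mb))  # 2
```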