Thursday, June 7, 2018

Two possible options for optimizing an Elasticsearch 5.x cluster

I've been trying to help my boss manage a small Elasticsearch cluster. Performance has been a nagging issue on it, and a lot of the optimization examples out there are too old for our needs (we're running 5.6.x at the moment).

Less index refreshing

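Elastic suggests raising the "refresh interval" from its default of one second on certain clusters. The trade-off is that newly indexed documents can take up to that long to show up in searches. This forum post told me what to shove into Kibana / curl; I think I went with 15 or 20 seconds. Note that hitting /_settings with no index name applies the change to every index.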
curl -XPUT 'localhost:9200/_settings' -H 'Content-Type: application/json' -d '{
  "index": {
    "refresh_interval": "15s"
  }
}'
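
If you'd rather script this than paste it into a terminal, here is a rough Python sketch of the same call using only the standard library. The function names, the default host, and the idea of scoping it to logstash-* are my own assumptions for illustration, not something from the forum post. It builds the same PUT request as the curl command above, optionally limited to one index pattern.

```python
import json
import urllib.request


def build_refresh_request(interval, index=None, host="http://localhost:9200"):
    """Build (but do not send) a PUT request that sets index.refresh_interval.

    index=None targets /_settings (every index); otherwise only the named
    index or wildcard pattern is changed. Host and names are assumptions.
    """
    path = "/_settings" if index is None else "/{}/_settings".format(index)
    body = json.dumps({"index": {"refresh_interval": interval}}).encode("utf-8")
    return urllib.request.Request(
        host + path,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def apply_refresh_interval(interval, index=None, host="http://localhost:9200"):
    """Send the request and return the raw response text from the cluster."""
    req = build_refresh_request(interval, index=index, host=host)
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")
```

Calling apply_refresh_interval("15s", index="logstash-*") would change only the Logstash indices and leave things like the Kibana index on the default.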

Set up Curator

This awesome Stack Overflow piece helped me wrap my head around how the indexing works. Some other stuff I came across suggests that you can force-merge the older indices down to 1 segment per shard, and gain performance in doing so. Turns out there's a tool to do this!
  1. With Python installed, use a command prompt to run pip install elasticsearch-curator
  2. Optional for Windows: you can go to your Python\Scripts directory, and copy the curator EXEs into the folder where you want to keep them and the config files.
  3. Create the config files for your needs; put them in a subdirectory. There have to be at least two files: a curator.yml file, and a second file that has the actions. Past users of Ansible will know how to lay this out.
  4. Example command to use with a task manager / cron / etc: curator.exe --config config\curator.yml config\actions\actions.yml

config/curator.yml

---
# Remember, leave a key empty if there is no value.
client:
  hosts:
    - localhost
logging:
  loglevel: INFO
  logfile: 'path_to_logfile'
  logformat: default
  blacklist: ['elasticsearch', 'urllib3']

config/actions/actions.yml

---
# Remember, leave a key empty if there is no value.
actions:
  1:
    action: forcemerge
    description: >-
      Perform a forceMerge on selected indices to 'max_num_segments' per shard.
    options:
      max_num_segments: 1
      timeout_override:
      delay: 60
    filters:
    - filtertype: pattern
      kind: prefix
      value: logstash-
    - filtertype: age
      source: name
      direction: older
      timestring: '%Y.%m.%d'
      unit: days
      unit_count: 7
    - filtertype: forcemerged
      max_num_segments: 1
      exclude: True
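
To make the pattern and age filters a bit less magical, here is a simplified Python sketch of what they select. This is not Curator's actual code: the function name is made up, it assumes the date comes right after the prefix, and it skips the forcemerged filter entirely (that one needs segment counts from the live cluster).

```python
from datetime import datetime, timedelta, timezone


def select_old_indices(names, prefix="logstash-", timestring="%Y.%m.%d",
                       unit_count=7, now=None):
    """Return index names that match the prefix and are older than unit_count days.

    Hypothetical helper: age is read from the index name, like
    'source: name' in the action file, not from index metadata.
    """
    if now is None:
        now = datetime.now(timezone.utc).replace(tzinfo=None)
    cutoff = now - timedelta(days=unit_count)
    selected = []
    for name in names:
        # filtertype: pattern, kind: prefix
        if not name.startswith(prefix):
            continue
        # filtertype: age, source: name
        try:
            stamp = datetime.strptime(name[len(prefix):], timestring)
        except ValueError:
            continue  # no parseable date in the name, so skip it
        if stamp < cutoff:
            selected.append(name)
    return selected
```

With the action file above and a run on June 7, an index named logstash-2018.05.30 would be picked up for the force merge, while logstash-2018.06.05 would be left alone until it ages past a week.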