Elasticsearch Cluster Install
Important: If you already have a Logging Made Easy (LME) installation, upgrade it to 2.3.0 before converting that environment to a cluster. Reference the Upgrading LME guide for more information.
This document is the single canonical guide for standing up a multi-node LME Elasticsearch cluster from a fresh source checkout. It covers what is required up front, the commands operators reach for most often, the fresh-clone install walkthrough, and the supporting workflows around Secure Shell (SSH) bootstrapping, optional shared snapshot storage, and advanced operations.
It assumes you have just downloaded the LME source onto the host that will become the cluster master and that you intend to install on your Linux machines.
1. Requirements
Infrastructure
- Three or more Linux servers (Ubuntu 24.04+ or Red Hat Enterprise Linux [RHEL]/Rocky 8+ are the supported baselines)
- One server is designated the master; it runs the full LME stack (Elasticsearch, Kibana, Fleet, Wazuh)
- The remaining servers are child (data) nodes; they run Elasticsearch only
- Minimum recommended per node:
- Four Central Processing Unit (CPU) cores
- 8 GB RAM
- 100 GB free disk
Networking
- All nodes must be able to reach each other on:
9200/tcp(Elasticsearch HyperText Transfer Protocol [HTTP])9300/tcp(Elasticsearch transport)22/tcp(SSH)
- The master must additionally accept inbound traffic on:
5601/tcp(Kibana User Interface [UI])8220/tcp(Fleet enrollment)55000/tcp(Wazuh Application Programming Interface [API], where used)
- All nodes must be reachable by the IP or hostname you place in the cluster inventory; use either all IPs or all consistent hostnames (do not mix)
Software on the Master
The master needs a small set of base tools installed before the LME installer runs:
gitansible(2.9 or newer)jq
Examples:
# Ubuntu / Debian
sudo apt-get update
sudo apt-get install -y ansible jq git
# RHEL 9 / Rocky / Alma (jq via EPEL)
sudo dnf install -y epel-release
sudo dnf install -y ansible-core jq git
LME's ./install.sh --cluster will also install the required Ansible Galaxy
collections (community.general, ansible.posix) on the master.
Access Between Master and Child Nodes
The master must be able to:
- SSH to every child node as a chosen user (referred to here as the cluster user) without an interactive password prompt
- Run
sudoon every child node as that same user without a password prompt (so Ansible'sbecomecan complete).
An overview of the set up procedures are (reference Section 4 for the complete step-by-step guide):
- Generate or identify an SSH key on the master
- Authorize that key on every child node for the cluster user
- Grant the cluster user
NOPASSWDsudo on every child node
Storage
There are two distinct storage requirement tiers. Most installs only need the first.
-
Required for Cluster Install (Always):
- Node-local storage on every node, sized for the data you intend to keep on that node (reference the 100 GB minimum above)
- No shared filesystem is needed to form the cluster or to run normal operations
-
Only Needed for Filesystem Snapshot Backups (Optional):
- A directory that is the same path on every node and shares the same contents (typically a Network File System [NFS] or another shared filesystem)
- The Elasticsearch filesystem (
fs) snapshot repository requires this kind of shared directory, because the snapshot files must be visible from every data node - LME does not require this for installation; it is only needed if you plan
to use the
fssnapshot workflow
NFS is not required to form the cluster. It is a snapshot feature, not an install prerequisite.
2. Cluster Command Cheat Sheet
Run these from the LME repo root on the master node unless otherwise noted:
cd ~/LME
Install
# Full cluster install (master + all child nodes)
./install.sh --cluster
Additional variants (generally not required):
# Verbose Ansible output
./install.sh --cluster --debug
# Only install the master (cluster mode), skip child node deployment
./install.sh --cluster --cluster-master-only
# Only deploy Elasticsearch to child nodes (master already installed)
./install.sh --cluster --cluster-nodes-only
# Use a custom inventory path
./install.sh --cluster --cluster-inventory ansible/inventory/cluster.yml
Cluster Health and Status
Load Elasticsearch credentials into the current shell before any curl
commands below:
# use the -p flag to see the secrets and the -q flag to hide them
source /opt/lme/scripts/extract_secrets.sh -p
curl -sk -u "elastic:$elastic" https://localhost:9200/_cluster/health?pretty
curl -sk -u "elastic:$elastic" https://localhost:9200/_cat/nodes?v
curl -sk -u "elastic:$elastic" https://localhost:9200/_cat/shards?v
curl -sk -u "elastic:$elastic" https://localhost:9200/_cat/indices?v
sudo systemctl status lme
sudo podman ps --format '{{.Names}}\t{{.Status}}'
Check for unassigned shards:
curl -sk -u "elastic:$elastic" "https://localhost:9200/_cat/shards?v&s=state" | grep UNASSIGNED
Password Rotation
# Elastic superuser
ansible-playbook -i ansible/inventory/cluster.yml ansible/change_passwords.yml \
-e lme_user=elastic -e lme_password='YourNewSecurePassword123-'
# Kibana service account
ansible-playbook -i ansible/inventory/cluster.yml ansible/change_passwords.yml \
-e lme_user=kibana_system -e lme_password='YourNewSecurePassword123-'
# Wazuh dashboard / API users (run for each)
ansible-playbook -i ansible/inventory/cluster.yml ansible/change_passwords.yml \
-e lme_user=wazuh -e lme_password='YourNewSecurePassword123-'
ansible-playbook -i ansible/inventory/cluster.yml ansible/change_passwords.yml \
-e lme_user=wazuh_api -e lme_password='YourNewSecurePassword123-'
# Offline mode (skip the Have I Been Pwned check)
ansible-playbook -i ansible/inventory/cluster.yml ansible/change_passwords.yml \
-e lme_user=elastic -e lme_password='YourNewSecurePassword123-' \
-e offline_mode=true
Snapshots
# Default cluster snapshot (register repo, verify, create snapshot)
ansible-playbook -i ansible/inventory/cluster.yml ansible/snapshot_elasticsearch.yml
# Register and verify the repo only, no snapshot
ansible-playbook -i ansible/inventory/cluster.yml ansible/snapshot_elasticsearch.yml \
-e create_snapshot=false
# Custom snapshot name
ansible-playbook -i ansible/inventory/cluster.yml ansible/snapshot_elasticsearch.yml \
-e snapshot_name=before-maintenance
# Shared filesystem snapshot path (see Section 5 for the storage setup)
ansible-playbook -i ansible/inventory/cluster.yml ansible/snapshot_elasticsearch.yml \
-e es_snapshot_fs_location=/usr/share/elasticsearch/snapshots \
-e es_snapshot_repo=lme_nfs_backups
# S3 snapshot repository
ansible-playbook -i ansible/inventory/cluster.yml ansible/snapshot_elasticsearch.yml \
-e es_snapshot_repo_type=s3 \
-e es_s3_bucket=my-lme-snapshots \
-e es_s3_region=us-west-2
# List snapshots in the default repo
curl -sk -u "elastic:$elastic" https://localhost:9200/_snapshot/lme_backups/_all?pretty
Backups
# Cluster-safe backup bundle (snapshot + master backup + recovery manifest)
ansible-playbook -i ansible/inventory/cluster.yml ansible/cluster_backup_lme.yml
# Cluster backup using shared snapshot storage
ansible-playbook -i ansible/inventory/cluster.yml ansible/cluster_backup_lme.yml \
-e es_snapshot_fs_location=/usr/share/elasticsearch/snapshots \
-e es_snapshot_repo=lme_nfs_backups
# Cluster backup with a specific snapshot name
ansible-playbook -i ansible/inventory/cluster.yml ansible/cluster_backup_lme.yml \
-e snapshot_name=before-maintenance
Inspect the most recent local backup bundle on the master:
LATEST_BACKUP=$(ls -1dt /var/lib/containers/storage/backups/* | head -n1)
ls -la "$LATEST_BACKUP"
If shared storage is mounted, exported master recovery bundles live at:
ls -la /mnt/es-snapshots/lme-master-backups/
Restore
# Restore Elasticsearch data from a snapshot (full cluster restore)
ansible-playbook -i ansible/inventory/cluster.yml ansible/restore_elasticsearch_snapshot.yml \
-e snapshot_name=before-maintenance \
-e confirm_full_cluster_restore=true
# Restore a single index into the running cluster
ansible-playbook -i ansible/inventory/cluster.yml ansible/restore_elasticsearch_snapshot.yml \
-e snapshot_name=before-maintenance \
-e restore_mode=live_cluster \
-e restore_indices=lme-recovery-test \
-e include_global_state=false
# Restore master/control-plane state
ansible-playbook ansible/restore_lme_master.yml
# Restore master state from a specific backup
ansible-playbook ansible/restore_lme_master.yml \
-e restore_backup_dir=/mnt/es-snapshots/lme-master-backups/<timestamp>.tar.gz
Upgrade and maintenance
# Regenerate and redistribute cluster certificates
ansible-playbook -i ansible/inventory/cluster.yml ansible/elasticsearch.yml --tags certificates
# Rebuild or re-add a single node
ansible-playbook -i ansible/inventory/cluster.yml ansible/elasticsearch.yml --limit <node_name>
Single-node to cluster conversion
# Recommended wrapper, generates ansible/inventory/cluster.yml for you
bash scripts/convert_to_cluster.sh
Additional variants:
# Use an existing ansible/inventory/cluster.yml
bash scripts/convert_to_cluster.sh --skip-inventory
# Non-interactive wrapper
bash scripts/convert_to_cluster.sh --skip-prompts
# Fallback, direct playbook after creating ansible/inventory/cluster.yml
ansible-playbook -i ansible/inventory/cluster.yml ansible/convert_to_cluster.yml
# Fallback, non-interactive direct playbook
ansible-playbook -i ansible/inventory/cluster.yml ansible/convert_to_cluster.yml \
-e skip_prompts=true
3. Fresh Install Walkthrough
Perform these steps to go from a fresh source download on the master to a healthy cluster.
Step 1. Clone the source on the master.
git clone https://github.com/cisagov/LME.git ~/LME
cd ~/LME
Step 2. Prepare SSH and sudo on every child node.
The master must be able to SSH to each child node as the cluster user, and
that user must have NOPASSWD sudo on the child node. Perform the steps in Section 4 to
set this up before continuing.
When you are done, this should succeed from the master for every child node:
ssh <cluster_user>@<child_node> "sudo -n true && hostname"
Step 3. Create the environment file.
cp config/example.env config/lme-environment.env
# Replace with the master's actual IP (the one child nodes will use to reach it)
sed -i 's/IPVAR=.*/IPVAR=<MASTER_PRIVATE_IP>/' config/lme-environment.env
Step 4. Install Ansible collections.
cd ~/LME/ansible
ansible-galaxy collection install -r requirements.yml
cd ~/LME
This installs community.general and ansible.posix. ./install.sh --cluster
runs this for you, but installing manually first gives you faster feedback if
Galaxy is unreachable.
Step 5. Create the cluster inventory.
Create ansible/inventory/cluster.yml. A working three-node example:
all:
vars:
es_master_host: 10.0.0.4
es_cluster_seed_hosts:
- 10.0.0.4
- 10.0.0.5
- 10.0.0.6
children:
elasticsearch:
hosts:
# Master node, MUST be first (handles cert generation)
es1:
ansible_host: 10.0.0.4
ansible_connection: local
es_node_name: lme-elasticsearch
es_is_initial_master: true
es_publish_host: 10.0.0.4
es2:
ansible_host: 10.0.0.5
ansible_user: ubuntu
es_node_name: es2
es_publish_host: 10.0.0.5
es3:
ansible_host: 10.0.0.6
ansible_user: ubuntu
es_node_name: es3
es_publish_host: 10.0.0.6
Important rules:
- The master (
es1) must be listed first and must useansible_connection: local - Set
ansible_useron each child node to the cluster user from Section 4 - Every
es_publish_hostvalue must appear ines_cluster_seed_hosts - Use either IPs everywhere or hostnames everywhere; do not mix
- In most setups,
ansible_hostandes_publish_hostare the same IP; however, they can differ if you have a separate management network
A helper script, scripts/create_cluster_inventory.sh, can generate this
file interactively.
Step 6. Run the cluster install.
cd ~/LME
./install.sh --cluster
This single command will:
- Validate that the cluster inventory file exists and is well-formed
- SSH-ping every child node via Ansible
- Install Ansible Galaxy collections
- Run
site.ymlon the master in cluster mode - Run
elasticsearch.ymlon every cluster node
Useful flags:
--debug- verbose Ansible output--cluster-master-only- install only the master in cluster mode--cluster-nodes-only- deploy only Elasticsearch to child nodes--cluster-inventory PATH- use a non-default inventory file
--cluster and --offline cannot be combined. Offline cluster installation
is not supported at this time.
Step 7. Verify the cluster.
# use the -p flag to see secrets and -q to hide them
source /opt/lme/scripts/extract_secrets.sh -p
curl -sk -u "elastic:$elastic" https://localhost:9200/_cluster/health?pretty
curl -sk -u "elastic:$elastic" https://localhost:9200/_cat/nodes?v
A healthy cluster reports "status": "green", the expected
"number_of_nodes", and "unassigned_shards": 0.
Access Kibana at https://<MASTER_IP>:5601 and log in as elastic. Retrieve
the password with:
# use the -p flag to see the secrets and the -q flag to hide them
source /opt/lme/scripts/extract_secrets.sh -p && echo "$elastic"
4. SSH Bootstrap for Cluster Nodes
This section walks through the SSH and sudo setup that Section 3 depends on.
Repeat for every child node before running ./install.sh --cluster.
What "Reachable and Loginable from the Master" Means
For each child node, the master must be able to:
- Open an SSH session to the node as the chosen cluster user
- Authenticate with an SSH key (no interactive password prompt)
- Run
sudoon the child node as that same user with no password prompt - Reach the node at the exact IP or hostname listed in
ansible/inventory/cluster.yml
If any of these are not true, ./install.sh --cluster will either prompt
interactively or fail.
Step 4.1. Create or Identify the SSH Key on the Master
If the master already has an SSH key you want to reuse, skip this step. To create a new one, run the following:
ssh-keygen -t ed25519 -N "" -f ~/.ssh/id_ed25519 -q
# Or, for older toolchains:
ssh-keygen -t rsa -b 4096 -N "" -f ~/.ssh/id_rsa -q
Choose one key and use it consistently for all child nodes.
Step 4.2. Authorize the Master's Key on Each Child Node
Pick one of the two paths below. Run the chosen path once per child node from the master.
Path A: ssh-copy-id (recommended)
This is the simplest method when password SSH is temporarily available on the child node, or when you have another way to provide the password:
# Will prompt for the cluster user's password on the child node
ssh-copy-id <cluster_user>@<child_node>
If you have many nodes and want to script this, sshpass can supply the
password (Ubuntu: sudo apt-get install -y sshpass. RHEL via EPEL:
sudo dnf install -y epel-release && sudo dnf install -y sshpass):
sshpass -p '<password>' ssh-copy-id -o StrictHostKeyChecking=no \
<cluster_user>@<child_node>
Path B: Manual authorized_keys Setup (fallback)
Use this when ssh-copy-id is unavailable, when password SSH is disabled, or
when you provision child nodes through cloud-init, a config management tool,
or a Kickstart image.
-
On the master, print the public key you want to authorize:
cat ~/.ssh/id_ed25519.pub# or ~/.ssh/id_rsa.pub -
On the child node, append that exact line to the cluster user's
authorized_keys:sudo mkdir -p /home/<cluster_user>/.sshsudo chmod 700 /home/<cluster_user>/.ssh# Append the public key (paste between the quotes)echo 'ssh-ed25519 AAAA... user@master' | \sudo tee -a /home/<cluster_user>/.ssh/authorized_keyssudo chmod 600 /home/<cluster_user>/.ssh/authorized_keyssudo chown -R <cluster_user>:<cluster_user> /home/<cluster_user>/.ssh -
Confirm that
sshdallows key authentication. On most modern distros, this is the default. If you have hardened the host, ensure/etc/ssh/sshd_config(and any drop-ins) includes:PubkeyAuthentication yesReload SSH after edits:
sudo systemctl reload sshd.
Step 4.3. Grant Passwordless sudo on Each Child Node
Ansible runs as the cluster user and elevates with become. That requires
sudo without a password prompt on every child node.
On the child node, create a sudoers drop-in:
echo '<cluster_user> ALL=(ALL) NOPASSWD: ALL' | \
sudo tee /etc/sudoers.d/<cluster_user>-nopasswd
sudo chmod 440 /etc/sudoers.d/<cluster_user>-nopasswd
sudo visudo -cf /etc/sudoers.d/<cluster_user>-nopasswd
The final visudo -c check should print parsed OK. If it does not, fix
the file before disconnecting your current sudo session or you risk locking
yourself out.
Step 4.4. Verify Connectivity from the Master
For every child node, run these from the master:
ssh <cluster_user>@<child_node> "hostname"
ssh <cluster_user>@<child_node> "sudo -n true && echo SUDO_OK"
Both commands must succeed without any prompt. SUDO_OK must print for the
sudo check.
Step 4.5. Verify Your Inventory Matches Reality
Whatever IP, hostname, and user you used above must also appear in
ansible/inventory/cluster.yml:
ansible_hostmust match the value you SSH'd to.ansible_usermust match the cluster user you SSH'd as.es_publish_hostmust match an entry ines_cluster_seed_hosts.
If the install fails at Gathering facts or ping for a node, the mismatch is almost always in this section or in Step 4.4.
5. Optional: Shared Snapshot Storage (NFS)
You only need this section if you intend to use the Elasticsearch filesystem snapshot repository on a multi-node cluster.
The Elasticsearch fs repository requires that the same directory is
visible at the same path on every data node. NFS is the most common
way to provide that.
5.1. Configure NFS on the Master (server)
Ubuntu / Debian:
sudo apt-get install -y nfs-kernel-server
sudo mkdir -p /srv/es-snapshots
sudo chmod 777 /srv/es-snapshots
RHEL 9 / Rocky / Alma (back the export from /var so it is not on the small
root LV):
sudo dnf install -y nfs-utils
sudo mkdir -p /var/lib/lme/es-snapshots /srv/es-snapshots
sudo chmod 777 /var/lib/lme/es-snapshots /srv/es-snapshots
sudo mount --bind /var/lib/lme/es-snapshots /srv/es-snapshots
echo '/var/lib/lme/es-snapshots /srv/es-snapshots none bind 0 0' | \
sudo tee -a /etc/fstab
sudo chcon -Rt container_file_t /var/lib/lme/es-snapshots /srv/es-snapshots
Export to each node's private IP. For a three-node cluster with IPs
10.0.0.4, 10.0.0.5, 10.0.0.6:
echo '/srv/es-snapshots 10.0.0.4(rw,sync,no_subtree_check,no_root_squash) 10.0.0.5(rw,sync,no_subtree_check,no_root_squash) 10.0.0.6(rw,sync,no_subtree_check,no_root_squash)' \
| sudo tee /etc/exports
sudo exportfs -ra
Start and enable the NFS server:
# Ubuntu / Debian
sudo systemctl enable --now nfs-kernel-server
# RHEL / Rocky / Alma (also open firewalld if active)
sudo systemctl enable --now nfs-server
if systemctl is-active --quiet firewalld; then
sudo firewall-cmd --permanent --add-service=nfs
sudo firewall-cmd --permanent --add-service=mountd
sudo firewall-cmd --permanent --add-service=rpc-bind
sudo firewall-cmd --reload
fi
5.2. Mount the Share on Every Node
On the master, use a bind mount if you don't have a separate NFS server (avoids NFS self-mount hangs):
sudo mkdir -p /mnt/es-snapshots
sudo mount --bind /srv/es-snapshots /mnt/es-snapshots
echo '/srv/es-snapshots /mnt/es-snapshots none bind 0 0' | \
sudo tee -a /etc/fstab
On each child node, install the NFS client and mount the master's export
(replace 10.0.0.4 with your master's private IP):
# Ubuntu / Debian
sudo apt-get install -y nfs-common
# RHEL / Rocky / Alma
sudo dnf install -y nfs-utils
sudo mkdir -p /mnt/es-snapshots
sudo mount -t nfs -o vers=4.1,proto=tcp,hard,timeo=600,retrans=2 \
10.0.0.4:/srv/es-snapshots /mnt/es-snapshots
echo '10.0.0.4:/srv/es-snapshots /mnt/es-snapshots nfs vers=4.1,proto=tcp,hard,timeo=600,retrans=2,_netdev,nofail 0 0' \
| sudo tee -a /etc/fstab
On RHEL-family hosts with SELinux enforcing, also label the client mount so the Elasticsearch container can read and write it:
sudo chmod 777 /mnt/es-snapshots
sudo chcon -Rt container_file_t /mnt/es-snapshots
5.3. Wire Elasticsearch to the Snapshot Path
On every node, add the snapshot path to path.repo in
/opt/lme/config/elasticsearch.yml:
path.repo:
- /usr/share/elasticsearch/backups
- /usr/share/elasticsearch/snapshots
On every node, create a Quadlet drop-in so the Elasticsearch container mounts the shared directory, and then reload and restart Elasticsearch:
sudo mkdir -p /etc/containers/systemd/lme-elasticsearch.container.d/
cat <<'EOF' | sudo tee /etc/containers/systemd/lme-elasticsearch.container.d/nfs-mount.conf
[Container]
Volume=/mnt/es-snapshots:/usr/share/elasticsearch/snapshots
EOF
sudo systemctl daemon-reload
sudo systemctl restart lme-elasticsearch
Once every node has restarted, the shared /mnt/es-snapshots directory is
available inside each Elasticsearch container at
/usr/share/elasticsearch/snapshots.
5.4. Use the Shared Path in Snapshot Playbooks
ansible-playbook -i ansible/inventory/cluster.yml ansible/snapshot_elasticsearch.yml \
-e es_snapshot_fs_location=/usr/share/elasticsearch/snapshots \
-e es_snapshot_repo=lme_nfs_backups
6. Advanced and Specialized Workflows
These workflows are covered in dedicated docs in the repository. They are not required for a first install.
- Cluster child node recovery -
testing/v2/development/CLUSTER_NODE_RECOVERY.md - Cluster backup and recovery model -
ansible/CLUSTER_RECOVERY_README.md - Snapshot reference (single-node and cluster) -
ansible/SNAPSHOT_README.md - Password and secret rotation reference -
ansible/PASSWORD_README.md - Backup operations -
ansible/BACKUP_README.md - Rollback (single-node) -
ansible/ROLLBACK_README.md - Single-node to cluster conversion -
testing/v2/development/converting_to_cluster.md - Azure test cluster automation -
testing/v2/installers/cluster_installer/README.md - Cluster admin command reference -
testing/v2/development/CLUSTER_COMMANDS.md - Original cluster install reference -
testing/v2/development/CLUSTER_INSTALL.md
7. Troubleshooting
Child Node Will Not Join the Cluster
Check Elasticsearch logs on the affected node:
sudo podman logs lme-elasticsearch
Common causes:
- Firewall blocking
9200/tcpor9300/tcp. es_publish_hostnot reachable from other nodes.es_publish_hostnot listed ines_cluster_seed_hosts.- Certificate not yet distributed to that node.
Ansible Cannot Reach a Node
ssh <cluster_user>@<child_node> "hostname"
ssh <cluster_user>@<child_node> "sudo -n true && echo SUDO_OK"
Both must succeed without prompting. If they do not, redo Section 4 for that
node, and then re-run with ./install.sh --cluster --cluster-nodes-only.
Certificate Regeneration
Certificates are generated on the master and distributed to cluster nodes. To regenerate, run:
cd ~/LME
ansible-playbook -i ansible/inventory/cluster.yml ansible/elasticsearch.yml --tags certificates
Unassigned Shards After Install
# use the -p flag to see the secrets and the -q flag to hide them
source /opt/lme/scripts/extract_secrets.sh -p
curl -sk -u "elastic:$elastic" "https://localhost:9200/_cat/shards?v&s=state" | grep UNASSIGNED
This is normal briefly during cluster formation. If shards stay unassigned, inspect cluster allocation:
curl -sk -u "elastic:$elastic" \
https://localhost:9200/_cluster/allocation/explain?pretty