
Elasticsearch: no known master node, scheduling a retry

Another of those things that I managed to fix, so I'm writing it down to make it easier to resolve next time.

This happened with ES 1.7.6 running on an old server. The problem showed up when I tried to run a re-index and saw the following errors in the log:

...
[DEBUG][action.admin.indices.get ] [Toxin] no known master node, scheduling a retry
[WARN ][discovery.zen            ] [Toxin] failed to connect to master [[Chrome][kLruo5pxRs-VLlADmOlB0w][host][inet[/192.168.1.5:9300]]], retrying...

...

I searched for the problem, but found a number of solutions that didn't work for me. They all involved making sure that the cluster's nodes and shards were properly configured, but I'm running a single instance locally to test things, and I never configured shards.
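In that state it also helps to ask the node directly what it thinks is going on. Assuming the default HTTP port on localhost, both of these calls should fail with a MasterNotDiscoveredException while no master is elected:

curl 'http://localhost:9200/_cat/master?v'
curl 'http://localhost:9200/_cluster/health?pretty'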

Turns out that restarting the service produced these logs:

[INFO ][node                     ] [John Jameson] version[1.7.6], pid[22537], build[c730b59/2016-11-18T15:21:16Z]
[INFO ][node                     ] [John Jameson] initializing ...
[INFO ][plugins                  ] [John Jameson] loaded [], sites []
[INFO ][env                      ] [John Jameson] using [1] data paths, mounts [[/ (/dev/sda1)]], net usable_space [17.5gb], net total_space [218.9gb], types [ext4]
[INFO ][node                     ] [John Jameson] initialized
[INFO ][node                     ] [John Jameson] starting ...
[INFO ][transport                ] [John Jameson] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/192.168.43.157:9300]}
[INFO ][discovery                ] [John Jameson] elasticsearch/I5mmHZVxSGmkKi2KgZs4kQ
[INFO ][cluster.service          ] [John Jameson] new_master [John Jameson][I5mmHZVxSGmkKi2KgZs4kQ][Keia][inet[/192.168.43.157:9300]], reason: zen-disco-join (elected_as_master)
[INFO ][http                     ] [John Jameson] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/192.168.43.157:9200]}
[INFO ][node                     ] [John Jameson] started
[INFO ][gateway                  ] [John Jameson] recovered [0] indices into cluster_state
[WARN ][cluster.routing.allocation.decider] [John Jameson] high disk watermark [90%] exceeded on [I5mmHZVxSGmkKi2KgZs4kQ][John Jameson] free: 17.5gb[8%], shards will be relocated away from this node
[INFO ][cluster.routing.allocation.decider] [John Jameson] high disk watermark exceeded on one or more nodes, rerouting shards
...

That was the hint I needed: I was low on disk space on my computer!! So I deleted a bunch of old stuff (and freed up about an extra 50gb!), and that fixed it.
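The warning makes sense in hindsight: the high disk watermark defaults to 90% used, and the log above shows only 8% free. You can confirm with df against the data path (shown here with the Debian/Ubuntu package default; adjust to wherever your path.data points):

df -h /var/lib/elasticsearch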

Turns out that my ES config's disk watermarks prevented the node from working properly when there was too little free space.
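If you can't free disk space right away, the watermarks are dynamic settings in ES 1.x, so a temporary stopgap (not a real fix, and the percentages here are arbitrary) is to raise them via the cluster settings API:

# temporarily raise both watermarks so the allocation decider stops complaining
curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
  "transient": {
    "cluster.routing.allocation.disk.watermark.low": "92%",
    "cluster.routing.allocation.disk.watermark.high": "97%"
  }
}'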
