Discussion:
REBALANCELEADERS is not reliable
Bernd Fehling
2018-11-27 14:12:33 UTC
Hi list,

unfortunately, REBALANCELEADERS is not reliable, and leader
election produces unpredictable results with SolrCloud 6.6.5 and
ZooKeeper 3.4.10.
Seen with 5 shards and 3 replicas per shard.

- CLUSTERSTATUS reports all replicas (core_nodes) as state=active.
- I set the preferredLeader property on other replicas with ADDREPLICAPROP.
- I called REBALANCELEADERS.
- Some leaders changed, some did not.
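For reference, the two Collections API calls in the steps above can be written out as plain URLs. This is a minimal Python sketch, assuming a node at `localhost:8983` (hypothetical) and illustrative collection/shard/replica names. `maxAtOnce` and `maxWaitSeconds` are documented REBALANCELEADERS parameters, but the values used here are arbitrary.

```python
from urllib.parse import urlencode

# Hypothetical node URL; adjust host, port and context path to your cluster.
SOLR_BASE = "http://localhost:8983/solr"


def collections_api_url(action, **params):
    """Build a Collections API URL for the given action and parameters."""
    query = {"action": action, "wt": "json"}
    query.update(params)
    return f"{SOLR_BASE}/admin/collections?{urlencode(query)}"


def preferred_leader_url(collection, shard, replica):
    # ADDREPLICAPROP: mark one replica of a shard as its preferred leader.
    return collections_api_url(
        "ADDREPLICAPROP",
        collection=collection,
        shard=shard,
        replica=replica,
        property="preferredLeader",
        **{"property.value": "true"},
    )


def rebalance_url(collection, max_at_once=1, max_wait_seconds=60):
    # REBALANCELEADERS: ask Solr to move leadership to the preferred leaders.
    # maxAtOnce limits how many leader changes are queued at the same time.
    return collections_api_url(
        "REBALANCELEADERS",
        collection=collection,
        maxAtOnce=max_at_once,
        maxWaitSeconds=max_wait_seconds,
    )
```

Issuing the ADDREPLICAPROP URL for each target replica first and REBALANCELEADERS afterwards mirrors the order of the steps listed above; `maxAtOnce=1` is simply a conservative choice that moves one leader at a time.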

I then tried:
- removing the preferredLeader property from the replicas where the
  rebalance had succeeded. That worked.
- running REBALANCELEADERS again for the remaining shards. No success.
- shutting down nodes to force leadership onto the one replica left
  running. No success.
- REBALANCELEADERS then responds that the replica is inactive!!!
- CLUSTERSTATUS reports that the same replica is active!!!

Also, the replica that refuses to become leader is not in the list at
collections->[collection_name]->leader_elect->shard1..x->election
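Since the replica that refuses to become leader is missing from the `election` znode, one way to spot this condition across all shards is to compare the election children with the replicas CLUSTERSTATUS reports as active. Below is a Python sketch that works on child names you have already fetched from ZooKeeper (for example via the Admin UI's Cloud > Tree view). The node-name format `<sessionId>-<coreNodeName>-n_<sequence>` is an assumption based on how Solr names its ephemeral-sequential election nodes, so check it against your own ZooKeeper tree.

```python
import re

# Assumed election node name: "<sessionId>-<coreNodeName>-n_<10-digit sequence>".
ELECTION_NODE = re.compile(r"^(?P<session>-?\d+)-(?P<core_node>.+)-n_(?P<seq>\d{10})$")


def election_queue(children):
    """Return (core_node, seq) pairs sorted by ZooKeeper sequence number.

    In the usual ZooKeeper election recipe, the lowest sequence is the leader.
    """
    entries = []
    for name in children:
        m = ELECTION_NODE.match(name)
        if m is None:
            continue  # skip names that do not look like election nodes
        entries.append((m.group("core_node"), int(m.group("seq"))))
    return sorted(entries, key=lambda entry: entry[1])


def missing_from_election(children, active_core_nodes):
    """Replicas reported active (e.g. by CLUSTERSTATUS) with no election node."""
    queued = {core_node for core_node, _ in election_queue(children)}
    return sorted(set(active_core_nodes) - queued)
```

A replica returned by `missing_from_election` is "active" as far as the cluster state is concerned but is not taking part in the leader election for its shard, which matches the symptom described above.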

Where does CLUSTERSTATUS get its state info from?

Has anyone else had problems with REBALANCELEADERS?

I noticed that the Reference Guide writes "preferredLeader" (with a capital "L"),
but the Java code uses "preferredleader".
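To see which shards did not rebalance, the current leader and the preferred leader can be read side by side from a CLUSTERSTATUS response. This Python sketch assumes the JSON has already been parsed into a dict with the layout `cluster -> collections -> <name> -> shards -> <shard> -> replicas -> <core_node>`, with `"leader": "true"` on the leader replica; treat that layout as an assumption to verify against your Solr version. Because of the capitalization difference noted above, it lowercases property keys before comparing, so `property.preferredLeader` and `property.preferredleader` are treated the same.

```python
def leader_report(cluster_status, collection):
    """Map each shard to (current leader core_node, preferred leader core_node)."""
    shards = cluster_status["cluster"]["collections"][collection]["shards"]
    report = {}
    for shard_name, shard in shards.items():
        leader = preferred = None
        for core_node, replica in shard["replicas"].items():
            if replica.get("leader") == "true":
                leader = core_node
            # Compare keys case-insensitively ("preferredLeader" vs "preferredleader").
            props = {key.lower(): value for key, value in replica.items()}
            if props.get("property.preferredleader") == "true":
                preferred = core_node
        report[shard_name] = (leader, preferred)
    return report


def unbalanced_shards(cluster_status, collection):
    """Shards whose preferred leader is set but is not the current leader."""
    return sorted(
        shard
        for shard, (leader, preferred) in leader_report(cluster_status, collection).items()
        if preferred is not None and preferred != leader
    )
```

Running `unbalanced_shards` after each REBALANCELEADERS call gives a quick list of the shards where leadership did not move as requested.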

Regards, Bernd
Vadim Ivanov
2018-11-27 14:47:22 UTC
Hi Bernd,
I have tried REBALANCELEADERS with Solr 6.3 and 7.5.
I got very similar results and the impression that it's not reliable :(
--
Br, Vadim
Bernd Fehling
2018-11-28 07:17:00 UTC
Hi Vadim,

thanks for confirming.
So it seems to be a general problem with Solr 6.x and 7.x, and it might
still be present in the most recent versions.

But where should I start debugging this? Is something stored
incorrectly in ZooKeeper, or is the Overseer the problem?

I also read something about a "leader queue" where potential
leaders have to be re-queued, or something similar.

Maybe I should try to reproduce a situation where a "locked" core
is on the Overseer node, then attach a debugger to it and step
through it.
Peeking and poking around, like in the old Commodore 64 days :-)

Regards, Bernd
Aman Tandon
2018-11-29 19:40:36 UTC
Today I deleted the leader replica of one shard in a two-shard
collection. Then the other replica of that shard was elected leader.

After waiting a long time, I tried setting the preferredLeader property
with ADDREPLICAPROP on one of the replicas and then tried FORCELEADER,
but no luck. I also tried REBALANCELEADERS, which didn't help either.
In the end I had to recreate the whole collection.

Not sure what the issue was, but neither FORCELEADER nor REBALANCELEADERS
worked when there was no leader, even though the preferredLeader property
was set.
Aman Tandon
2018-11-29 19:42:07 UTC
++ correction
Today I deleted the leader replica of one shard in a two-shard
collection. Then the other replicas of that shard were NOT elected
leader.
After waiting a long time, I tried setting the preferredLeader property
with ADDREPLICAPROP on one of the replicas and then tried FORCELEADER,
but no luck. I also tried REBALANCELEADERS, which didn't help either.
In the end I had to recreate the whole collection.
Not sure what the issue was, but neither FORCELEADER nor REBALANCELEADERS
worked when there was no leader, even though the preferredLeader property
was set.
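The situation described here is a shard with no leader at all. Spotting such shards in a parsed CLUSTERSTATUS response, and building the FORCELEADER call for them, can be sketched in Python as follows. The node URL is hypothetical, the response layout (`"leader": "true"` on the leader replica) is an assumption to check against your Solr version, and this sketch does not explain why FORCELEADER failed in this case.

```python
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # hypothetical node URL


def leaderless_shards(cluster_status, collection):
    """Shards in a CLUSTERSTATUS-style dict where no replica is marked leader."""
    shards = cluster_status["cluster"]["collections"][collection]["shards"]
    return sorted(
        name
        for name, shard in shards.items()
        if not any(r.get("leader") == "true" for r in shard["replicas"].values())
    )


def force_leader_url(collection, shard):
    # FORCELEADER is a Collections API action intended as a last resort
    # for a shard that has lost its leader.
    query = urlencode({"action": "FORCELEADER", "collection": collection, "shard": shard})
    return f"{SOLR_BASE}/admin/collections?{query}"
```

Checking `leaderless_shards` first confirms that the shard really has no leader before FORCELEADER is tried; if it still has one, REBALANCELEADERS with a preferredLeader property is the relevant call instead.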
Atita Arora
2018-11-29 20:03:25 UTC
I tried that on 7.4 and 7.5 too, and it did not work for me either,
even with the preferredLeader property set as recommended in the
documentation.
I worked around it with a little hack, but this certainly didn't work as expected.
I can provide more details if there's a ticket.
Vadim Ivanov
2018-12-06 15:31:03 UTC
On the solr-dev forum I came across this post:
http://lucene.472066.n3.nabble.com/Rebalance-Leaders-Leader-node-deleted-when-rebalancing-leaders-td4417040.html
Maybe it will shed some light?
--
Vadim
Bernd Fehling
2018-12-07 09:01:15 UTC
Thanks for looking this up.
It could be a hint about where to jump into the code.
I wonder why they rejected a JIRA ticket about this problem?

Regards, Bernd
Vadim Ivanov
2018-12-07 15:13:20 UTC
I'm waiting for 7.6 or 7.5.1 and plan to apply the patch from Endika Posadas to it.
Then I'll test again and hope it helps.
--
Vadim