Discussion:
RecoveryStrategy overseer session expired
Trym R. Møller
2012-07-23 13:58:59 UTC
Hi

I am running SolrCloud. When a Solr instance that holds a replica loses
its ZooKeeper connection, I see the log message below repeatedly and the
shard never recovers. The Solr instance has successfully reconnected to
ZooKeeper, and ZooKeeper is running fine.
I know that the cause is the loss of the ZooKeeper connection and I will
work on that, but I can guarantee that one of my ZooKeepers will go down
at some point (e.g. by a system admin), so I need the recovery to work.
I can see the code has changed recently just in this area.

Does anyone have a hint about what I can do to get more information about this?

Thanks in advance for any comments.

Best regards Trym

SEVERE: Error while trying to recover.
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /overseer/queue/qn-
at org.apache.zookeeper.KeeperException.create(KeeperException.java:118)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:643)
at org.apache.solr.cloud.DistributedQueue.offer(DistributedQueue.java:236)
at org.apache.solr.cloud.ZkController.publish(ZkController.java:745)
at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:288)
at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:210)
Mark Miller
2012-07-23 14:53:32 UTC
Hey - out of the house for a bit so I don't have the issue number, but a few days ago I resolved an issue around the distrib queue using the straight zk client and not the solr zk client.

I'm not 100% sure since I'm out on the street, but I think that will probably solve your issue.

Sent from my iPhone
Mark Miller
2012-07-23 17:38:37 UTC
https://issues.apache.org/jira/browse/SOLR-3647 DistributedQueue should use our Solr zk client rather than the std zk client.

I didn't add a CHANGES entry, but there should be one under bugs - I hadn't fully thought through the problems beyond the fact that it would not deal with connection loss correctly.

Session expiration is even worse though - the std ZooKeeper client is useless after an expiration, and you have to create a new client. Otherwise it will keep throwing a session expired exception every time you try to use it.

The solr zk client handles this transparently and you can keep using the same client instance once it reconnects.

- Mark
- Mark Miller
lucidimagination.com
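
To illustrate the difference described above between a raw client that stays dead after session expiration and a wrapper that can keep being used, here is a minimal, self-contained sketch. It does not use the real ZooKeeper or Solr classes: `RawHandle`, `ReconnectingClient`, and the local `SessionExpiredException` are hypothetical stand-ins, and the retry-on-exception mechanism is an assumption made for brevity, not necessarily how the Solr zk client reconnects internally.

```java
import java.util.ArrayList;
import java.util.List;

public class SessionExpiryDemo {

    /** Hypothetical stand-in for ZooKeeper's session expired exception. */
    static class SessionExpiredException extends Exception {
        SessionExpiredException(String path) {
            super("Session expired for " + path);
        }
    }

    /** Hypothetical raw handle: once expired, every call fails forever. */
    static class RawHandle {
        private final List<String> server; // simulated server-side nodes
        private boolean expired = false;

        RawHandle(List<String> server) {
            this.server = server;
        }

        void expire() {
            expired = true;
        }

        String create(String path) throws SessionExpiredException {
            if (expired) {
                throw new SessionExpiredException(path);
            }
            server.add(path);
            return path;
        }
    }

    /** Hypothetical wrapper: swaps in a fresh handle and retries once. */
    static class ReconnectingClient {
        private final List<String> server;
        private RawHandle handle;
        int reconnects = 0;

        ReconnectingClient(List<String> server) {
            this.server = server;
            this.handle = new RawHandle(server);
        }

        RawHandle currentHandle() {
            return handle;
        }

        String create(String path) throws SessionExpiredException {
            try {
                return handle.create(path);
            } catch (SessionExpiredException e) {
                // The old handle can never recover, so replace it.
                handle = new RawHandle(server);
                reconnects++;
                return handle.create(path);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> server = new ArrayList<>();

        // A raw handle keeps failing on every call after expiration.
        RawHandle raw = new RawHandle(server);
        raw.expire();
        int failures = 0;
        for (int i = 0; i < 3; i++) {
            try {
                raw.create("/overseer/queue/qn-");
            } catch (SessionExpiredException e) {
                failures++;
            }
        }
        System.out.println("raw handle failures: " + failures);

        // The wrapper instance stays usable across the expiration.
        ReconnectingClient client = new ReconnectingClient(server);
        client.currentHandle().expire();
        client.create("/overseer/queue/qn-");
        System.out.println("wrapper reconnects: " + client.reconnects
                + ", nodes created: " + server.size());
    }
}
```

The point of the sketch is only the ownership question: code that holds a raw handle directly (as the queue in the stack trace appears to have done) is stuck with the dead session, while code that goes through a wrapper never sees which underlying handle is in use.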
