Discussion:
Cannot connect to a zookeeper 3.4.6 instance via zkCli.cmd
Adrian Liew
2015-10-01 09:32:22 UTC
Permalink
Hi there,

Currently, I have set up an Azure virtual network to connect my Zookeeper cluster across three Azure VMs, with internal IPs 10.0.0.4, 10.0.0.5 and 10.0.0.6. I have also set up Solr 5.3.0, which runs in SolrCloud mode and connects to all three Zookeepers as an external ensemble.

I am able to connect to 10.0.0.4 and 10.0.0.6 via zkCli.cmd after starting the Zookeeper services. For 10.0.0.5, however, I keep getting the error below even after starting the Zookeeper service.

[inline image of the error; not preserved in the archive]

I have restarted the 10.0.0.5 VM several times and am still unable to connect to Zookeeper via zkCli.cmd. I have checked zoo.cfg (making sure ports, data and logs are all set correctly) and myid to ensure they are configured correctly.

The command line I used to connect to Zookeeper is, for example, zkCli.cmd -server 10.0.0.5:2182.
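
Before reaching for zkCli.cmd, it can help to confirm that a ZooKeeper server process is actually answering on the client port. The sketch below is an illustration only and is not something used in this thread: it sends the four-letter command ruok over a plain TCP socket, to which a running ZooKeeper 3.4.x server normally replies imok. The function name four_letter_word and the 3-second timeout are assumptions made for the example.

```python
import socket


def four_letter_word(host, port, command=b"ruok", timeout=3.0):
    """Send a ZooKeeper four-letter command and return the raw reply as text.

    Hypothetical helper for illustration; raises OSError (including
    socket.timeout) if the port cannot be reached within `timeout` seconds.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command)
        # Signal that nothing more will be sent; the server replies and closes.
        sock.shutdown(socket.SHUT_WR)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("ascii", errors="replace")
```

Calling four_letter_word("10.0.0.5", 2182) from another VM on the virtual network would then separate "nothing is listening or the port is blocked" (an OSError) from "the server answers but the client still cannot get a session".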

Any ideas?

Best regards,

Adrian Liew | Consultant Application Developer
Avanade Malaysia Sdn. Bhd. | Consulting Services
Direct: +(603) 2382 5668
+6010-2288030
Adrian Liew
2015-10-01 10:19:21 UTC
Permalink
Hi all,

The problem below was resolved by setting the server entries in each zoo.cfg to the following:

server.1=10.0.0.4:2888:3888
server.2=10.0.0.5:2888:3888
server.3=10.0.0.6:2888:3888

as opposed to the following:

server.1=10.0.0.4:2888:3888
server.2=10.0.0.5:2889:3889
server.3=10.0.0.6:2890:3890

I am not sure why the above can be an issue (by right it should not be); however, I followed the recommendations in the Zookeeper guide under Running Replicated ZooKeeper (https://zookeeper.apache.org/doc/r3.1.2/zookeeperStarted.html#sc_RunningReplicatedZooKeeper).

Given that I am testing multiple servers in a multiserver environment, it is safe to use 2888:3888 on each server rather than different ports.
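
Each server entry has the form server.N=host:peerPort:electionPort, and every node in the ensemble is expected to carry the same list. A minimal sketch, assuming the zoo.cfg contents have already been read into strings, of how the entries could be parsed and compared across the three VMs; the helper names parse_servers and configs_agree are made up for this example, and the thread does not confirm that a mismatch between files was the actual cause.

```python
import re

# Matches "server.<id>=<host>:<peerPort>:<electionPort>" lines in zoo.cfg.
SERVER_LINE = re.compile(r"^server\.(\d+)=([^:\s]+):(\d+):(\d+)\s*$")


def parse_servers(cfg_text):
    """Return {server_id: (host, peer_port, election_port)} from zoo.cfg text."""
    servers = {}
    for raw in cfg_text.splitlines():
        match = SERVER_LINE.match(raw.strip())
        if match:
            sid, host, peer, election = match.groups()
            servers[int(sid)] = (host, int(peer), int(election))
    return servers


def configs_agree(cfg_texts):
    """True when every node's zoo.cfg lists identical server entries."""
    parsed = [parse_servers(text) for text in cfg_texts]
    return all(p == parsed[0] for p in parsed)


# The working configuration quoted in this message.
working = """\
server.1=10.0.0.4:2888:3888
server.2=10.0.0.5:2888:3888
server.3=10.0.0.6:2888:3888
"""

for sid, (host, peer, election) in sorted(parse_servers(working).items()):
    print(f"server.{sid}: host={host} peer={peer} election={election}")
```

Running configs_agree on the three files would flag the case where one VM still carries an older set of ports than the others.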

Regards,
Adrian

Zheng Lin Edwin Yeo
2015-10-02 03:03:12 UTC
Permalink
Hi Adrian,

What is your system setup like? By right it shouldn't be an issue if we use different ports.

In fact, if the various Zookeeper instances are running on a single machine, they have to be on different ports in order for it to work.


Regards,
Edwin
Adrian Liew
2015-10-02 08:33:38 UTC
Permalink
Hi Edwin,

I have followed the standards recommended by the Zookeeper article. It seems to be working.

Incidentally, I am facing intermittent issues whereby I am unable to connect to the Zookeeper service via Solr's zkcli.bat command, even after setting my ZooKeeper service to start automatically. I have configured nssm (the Non-Sucking Service Manager) to auto-start Solr with a dependency on Zookeeper, to ensure both services are running on startup on each Solr VM.

Here is an example of what I ran to connect to the ZK service:

E:\solr-5.3.0\server\scripts\cloud-scripts>zkcli.bat -z 10.0.0.6:2183 -cmd list
Exception in thread "main" org.apache.solr.common.SolrException: java.util.concurrent.TimeoutException: Could not connect to ZooKeeper 10.0.0.6:2183 within 30000 ms
        at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:181)
        at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:115)
        at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:105)
        at org.apache.solr.cloud.ZkCLI.main(ZkCLI.java:181)
Caused by: java.util.concurrent.TimeoutException: Could not connect to ZooKeeper 10.0.0.6:2183 within 30000 ms
        at org.apache.solr.common.cloud.ConnectionManager.waitForConnected(ConnectionManager.java:208)
        at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:173)
        ... 3 more


Further to this I inspected the output shown in console window by zkServer.cmd:

2015-10-02 08:24:09,305 [myid:3] - WARN [WorkerSender[myid=3]:QuorumCnxManager@382] - Cannot open channel to 2 at election address /10.0.0.5:3888
java.net.SocketTimeoutException: connect timed out
        at java.net.DualStackPlainSocketImpl.waitForConnect(Native Method)
        at java.net.DualStackPlainSocketImpl.socketConnect(Unknown Source)
        at java.net.AbstractPlainSocketImpl.doConnect(Unknown Source)
        at java.net.AbstractPlainSocketImpl.connectToAddress(Unknown Source)
        at java.net.AbstractPlainSocketImpl.connect(Unknown Source)
        at java.net.PlainSocketImpl.connect(Unknown Source)
        at java.net.SocksSocketImpl.connect(Unknown Source)
        at java.net.Socket.connect(Unknown Source)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:341)
        at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:449)
        at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:430)
        at java.lang.Thread.run(Unknown Source)
2015-10-02 08:24:09,305 [myid:3] - INFO [WorkerReceiver[myid=3]:FastLeaderElect***@597] - Notification: 1 (message format version), 3 (n.leader), 0x700000011 (n.zxid), 0x1 (n.round), LOOKING (n.state), 3 (n.sid), 0x7 (n.peerEpoch) LOOKING (my state)

The error reported by zkServer.cmd is: Cannot open channel to 2 at election address /10.0.0.5:3888

Can firewall settings be the issue here? I feel this may be a network issue between the individual Solr VMs. I am using a Windows Server 2012 R2 64-bit environment to run Zookeeper 3.4.6 and Solr 5.3.0.

Currently, I have configured the following in the Firewall Advanced Security settings on each Azure VM (Phoenix-Solr-0, Phoenix-Solr-1, Phoenix-Solr-2):

For allowed inbound connections:

Solr port 8983
ZK1 port 2181
ZK2 port 2888
ZK3 port 3888
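
One way to test whether these rules actually let traffic through is to try a plain TCP connection to each port from another VM. The following is a rough sketch under stated assumptions, not a tool used in this thread: port_open and check_ensemble are hypothetical helper names, and the 3-second timeout is arbitrary.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False


def check_ensemble(hosts, ports, timeout=3.0):
    """Map each (host, port) pair to whether it accepted a TCP connection."""
    return {(h, p): port_open(h, p, timeout) for h in hosts for p in ports}
```

For example, check_ensemble(["10.0.0.4", "10.0.0.5", "10.0.0.6"], [2181, 2888, 3888]) run from one VM reports which ports answer. Two caveats: the commands earlier in this thread target client ports 2182 and 2183, so the port list should match whatever clientPort each zoo.cfg actually sets; and the peer port (2888 here) is normally bound only by the current leader, so an unreachable 2888 on a follower is not by itself a fault.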

Regards,
Adrian

Erick Erickson
2015-10-02 16:22:43 UTC
Permalink
Hmmm, there are usually a couple of ports that each ZK instance needs. Is it possible that you've got more than one process using one of those ports?

By default (I think), Zookeeper uses "peer port + 1000" for its leader election process; see the "Running Replicated Zookeeper" section of:
https://zookeeper.apache.org/doc/r3.3.3/zookeeperStarted.html

I'm not quite clear whether the ZK2 and ZK3 ports above are just meant to indicate a single Zookeeper instance on a node or not, so I thought I'd check.

A firewall problem should fail every time, not intermittently, so I'm puzzled about that....

Best,
Erick
Zheng Lin Edwin Yeo
2015-10-05 02:01:48 UTC
Permalink
Hi Adrian,

It's unlikely to be the firewall settings if it is failing intermittently; it's more likely a network issue.

The error says it's a connection timeout, and since you say it happens only intermittently, I suspect it could be a network issue. Have you checked whether the connections to the various servers are always up?

Regards,
Edwin
Adrian Liew
2015-10-06 09:38:40 UTC
Permalink
Hi Edwin,

Thanks for the reply. Looks like this has been resolved by manually starting the Zookeeper services on each server promptly, so that a server does not time out while trying to heartbeat its peers. Hence, I increased the tickTime value to about 5 minutes, to give a node hosting Zookeeper time to restart and auto-start its service. This case seems fixed, but I will double-check to be sure. I am using nssm (the Non-Sucking Service Manager) to auto-start Zookeeper, and I will need to retest with nssm to make sure the Zookeeper services are up and running.

Regards,
Adrian

Shawn Heisey
2015-10-06 14:16:06 UTC
Permalink
Post by Adrian Liew
Hence, I increased the tickTime value to about 5 minutes to give some time for a node hosting Zookeeper to restart and autostart its service.
That sounds like a very bad idea. A typical tickTime is two *seconds*.
Zookeeper is designed around certain things happening very quickly.

I don't think you can increase that to five *minutes* (multiplying it by
150) without the strong possibility of something going very wrong and
processes hanging for minutes at a time waiting for a timeout that
should happen very quickly.

I am reasonably certain that tickTime is used for zookeeper operation in
several ways, so I believe that this much of an increase will cause
fundamental problems with zookeeper's normal operation. I admit that I
have not looked at the code, so I could be wrong ... but based on the
following information from the Zookeeper docs, I don't think I am wrong:

tickTime

the length of a single tick, which is the basic time unit used by
ZooKeeper, as measured in milliseconds. It is used to regulate
heartbeats, and timeouts. For example, the minimum session timeout will
be two ticks.
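
To put rough numbers on that, here is a small Python sketch of the timeouts that are derived from tickTime. It relies on the documented defaults of 2 ticks for the minimum session timeout and 20 ticks for the maximum. The initLimit=10 and syncLimit=5 values are assumptions taken from the sample zoo.cfg and may not match your file, and derived_timeouts is just an illustrative helper name.

```python
def derived_timeouts(tick_time_ms, init_limit=10, sync_limit=5):
    """Return the timeouts (in ms) that scale with tickTime.

    init_limit and sync_limit are counted in ticks, as in zoo.cfg;
    the defaults here are assumed sample values.
    """
    return {
        "min_session_timeout": 2 * tick_time_ms,   # default: 2 ticks
        "max_session_timeout": 20 * tick_time_ms,  # default: 20 ticks
        "init_limit": init_limit * tick_time_ms,
        "sync_limit": sync_limit * tick_time_ms,
    }


if __name__ == "__main__":
    for tick in (2000, 300000):  # two seconds vs. five minutes
        print(tick, derived_timeouts(tick))
```

With tickTime at 300000 ms, the smallest session timeout a client could negotiate would be ten minutes under these defaults, so a dead client or server could go unnoticed for that long.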

Thanks,
Shawn
Adrian Liew
2015-10-07 03:56:23 UTC
Permalink
Hi Shawn,

Thanks for the reply. I understand your comments and will revert to the defaults. However, I raised this issue because I realized that Zookeeper becomes impatient if it cannot heartbeat its peers in time. So, for example, if one ZK server out of three goes down, that server will stop pinging the other servers, and attempts to connect to its service with zkCli will fail with timeout errors.

Will report back with an update.

Regards,
Adrian

Adrian Liew
2015-10-07 06:52:16 UTC
Permalink
Hi Shawn,

To reiterate, this is the exception I get when I am unable to connect to the Zookeeper service:

E:\solr-5.3.0\server\scripts\cloud-scripts>zkcli.bat -z 10.0.0.4:2181 -cmd list
Exception in thread "main" org.apache.solr.common.SolrException: java.util.concurrent.TimeoutException: Could not connect to ZooKeeper 10.0.0.4:2181 within 30000 ms
    at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:181)
    at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:115)
    at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:105)
    at org.apache.solr.cloud.ZkCLI.main(ZkCLI.java:181)
Caused by: java.util.concurrent.TimeoutException: Could not connect to ZooKeeper 10.0.0.4:2181 within 30000 ms
    at org.apache.solr.common.cloud.ConnectionManager.waitForConnected(ConnectionManager.java:208)
    at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:173)
    ... 3 more

For example, if one of the Zookeeper services goes down for a few minutes, it may be too late to bring that service back into the Zookeeper cluster because of the timeout shown above. In that case, all Zookeeper services need to be restarted at the same time.

Please clarify whether there is a configuration that I missed, whether this is expected behaviour, or whether this is a bug.
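
One way to narrow this down might be to ask each server directly whether its process is up, using Zookeeper's four-letter "ruok" command on the client port. The following is a rough Python sketch, assuming client port 2181 on all three VMs and that the four-letter commands are reachable from where the script runs. four_letter_word and check_ensemble are hypothetical helper names, not part of any Zookeeper or Solr tooling.

```python
import socket


def four_letter_word(host, port, command="ruok", timeout=5.0):
    """Send a Zookeeper four-letter command and return the raw reply text."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def check_ensemble(hosts, port=2181):
    """Return {host: reply or error description} for each server."""
    results = {}
    for host in hosts:
        try:
            results[host] = four_letter_word(host, port)
        except OSError as exc:  # refused, unreachable, or timed out
            results[host] = "unreachable: %s" % exc
    return results


if __name__ == "__main__":
    # IPs from this thread; port 2181 is an assumption.
    replies = check_ensemble(["10.0.0.4", "10.0.0.5", "10.0.0.6"])
    for host, reply in replies.items():
        print(host, reply)
```

A reply of "imok" only says that the process is running and bound to the client port, not that the server has joined the quorum. Passing command="stat" instead should give more detail on whether the server is actually serving requests.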

Regards,
Adrian
