Discussion:
Very high memory and CPU utilization.
Modassar Ather
2015-11-02 06:30:54 UTC
Permalink
Hi,

I have a setup of a 12-shard cluster started with 28 GB of memory each on a
single server. There are no replicas. The size of the index is around 90 GB on
each shard. The Solr version is 5.2.1.
When I query "network se*", the memory utilization goes up to 24-26 GB and
the query takes around 3+ minutes to execute. Also, the CPU utilization goes
up to 400% in a few of the nodes.

Kindly note that the use of the wildcard in the above query cannot be restricted.

Please help me understand why there is so much memory utilization. Please
correct me if I am wrong that it is because of the term expansion of *se**.
Why is the CPU utilization so high, and why is more than one core used? As far
as I understand, querying is single-threaded.

Help me understand the behavior of the query timeout. How is the client
notified about the query timeout?
How can I disable replication (as it is implicitly enabled) permanently? In
our case we are not using it, but we can see warnings related to leader
election.

Thanks,
Modassar
Toke Eskildsen
2015-11-02 08:00:47 UTC
Permalink
Post by Modassar Ather
I have a setup of a 12-shard cluster started with 28 GB of memory each on a
single server. There are no replicas. The size of the index is around 90 GB on
each shard. The Solr version is 5.2.1.
That is 12 machines, running a shard each?

What is the total amount of physical memory on each machine?
Post by Modassar Ather
When I query "network se*", the memory utilization goes upto 24-26 gb and
the query takes around 3+ minutes to execute. Also the CPU utilization goes
upto 400% in few of the nodes.
Well, se* probably expands to a great deal of documents, but a huge bump
in memory utilization and 3 minutes+ sounds strange.

- What are your normal query times?
- How many hits do you get from 'network se*'?
- How many results do you return (the rows-parameter)?
- If you issue a query without wildcards, but with approximately the
same amount of hits as 'network se*', how long does it take?
Post by Modassar Ather
Why is the CPU utilization so high, and why is more than one core used?
As far as I understand, querying is single-threaded.
That is strange, yes. Have you checked the logs to see if something
unexpected is going on while you test?
Post by Modassar Ather
How can I disable replication (as it is implicitly enabled) permanently? In
our case we are not using it, but we can see warnings related to leader
election.
If you are using spinning drives and only have 32GB of RAM in total in
each machine, you are probably struggling just to keep things running.


- Toke Eskildsen, State and University Library, Denmark
Modassar Ather
2015-11-02 09:04:12 UTC
Permalink
Hi Toke,
Thanks for your response. My comments in-line.

That is 12 machines, running a shard each?
No! This is a single big machine with 12 shards on it.

What is the total amount of physical memory on each machine?
Around 370 GB on the single machine.

Well, se* probably expands to a great deal of documents, but a huge bump
in memory utilization and 3 minutes+ sounds strange.

- What are your normal query times?
A few simple queries return within a couple of seconds. But the more
complex queries with proximity and wildcards have taken more than 3-4
minutes, and sometimes queries have timed out too, where the timeout is
set to 5 minutes.
- How many hits do you get from 'network se*'?
More than a million records.
- How many results do you return (the rows-parameter)?
It is the default, 10. Grouping is enabled on a field.
- If you issue a query without wildcards, but with approximately the
same amount of hits as 'network se*', how long does it take?
A query resulting in around half a million records returns within a couple of
seconds.

That is strange, yes. Have you checked the logs to see if something
unexpected is going on while you test?
I have not seen anything in particular. Will try to check again.

If you are using spinning drives and only have 32GB of RAM in total in
each machine, you are probably struggling just to keep things running.
As mentioned above, this is a big machine with 370+ GB of RAM, and Solr (12
nodes total) is assigned 336 GB. The rest is still good enough for other system
activities.

Thanks,
Modassar
Modassar Ather
2015-11-02 09:38:08 UTC
Permalink
Just to add one more point: one external ZooKeeper instance is also
running on this particular machine.

Regards,
Modassar
jim ferenczi
2015-11-02 10:38:17 UTC
Permalink
12 shards with 28 GB for the heap and 90 GB for each index means that you
need at least 336 GB for the heap (assuming you're using all of it, which may
easily be the case considering the way the GC handles memory) and roughly
1 TB for the index. Let's say that you don't need your entire index in RAM;
the problem as I see it is that you don't have enough RAM for your index +
heap. Assuming your machine has 370 GB of RAM, there are only 34 GB left for
your index, and 1 TB / 34 GB means that you can only have about 1/30 of your
entire index in RAM. I would advise you to check the swap activity on the
machine and see if it correlates with the bad performance you're seeing. One
important thing to notice is that a significant part of your index needs to be
in RAM (especially if you're using SSDs) in order to achieve good performance.
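
For reference, one common way to watch swap activity on a Linux box is vmstat
(shown here only as an illustration); non-zero values in the si/so columns
while the query runs mean pages are being swapped in or out:

  vmstat 5    # report memory/swap stats every 5 seconds; si/so = KB swapped in/out per second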



*As mentioned above, this is a big machine with 370+ GB of RAM, and Solr (12
nodes total) is assigned 336 GB. The rest is still good enough for other system
activities.*
The memory remaining after you subtract the heap usage should be reserved for
the index (not only for other system activities).


*Also the CPU utilization goes up to 400% in a few of the nodes:*
You said that only one machine is used, so I assume that 400% CPU is for a
single process (one Solr node), right?
This seems impossible if you are sure that only one query is run at a time
and no indexing is performed. The best thing to do is to dump stack traces
of the Solr nodes during the query and check what the threads are doing.
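
If it helps, a thread dump can usually be taken with the jstack tool that
ships with the JDK (the process id below is just a placeholder):

  jstack -l <solr-pid> > solr-threads.txt    # -l also prints lock/synchronizer information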

Jim
jim ferenczi
2015-11-02 10:39:42 UTC
Permalink
*if it correlates with the bad performance you're seeing. One important
thing to notice is that a significant part of your index needs to be in RAM
(especially if you're using SSDs) in order to achieve good performance.*

Especially if you're not using SSDs, sorry ;)
Modassar Ather
2015-11-02 10:55:19 UTC
Permalink
Thanks Jim for your response.

The memory remaining after you subtract the heap usage should be reserved for
the index (not only for other system activities).
I am not able to get the above point. So when I start Solr with 28 GB of
memory, all activities related to Solr should not go beyond 28 GB, and the
remaining memory will be used for activities other than Solr. Please help me
understand.

*Also the CPU utilization goes up to 400% in a few of the nodes:*
You said that only one machine is used, so I assume that 400% CPU is for a
single process (one Solr node), right?
Yes, you are right that 400% is for a single process.
The disks are SSDs.

Regards,
Modassar
Toke Eskildsen
2015-11-02 11:50:34 UTC
Permalink
Post by jim ferenczi
The memory remaining after you subtract the heap usage should be reserved for
the index (not only for other system activities).
Post by Modassar Ather
I am not able to get the above point. So when I start Solr with 28 GB of
memory, all activities related to Solr should not go beyond 28 GB, and the
remaining memory will be used for activities other than Solr. Please help me
understand.
It is described here:
https://wiki.apache.org/solr/SolrPerformanceProblems#OS_Disk_Cache

I will be quick to add that I do not agree with Shawn (the primary
author of the page) on the stated limits and find that the page in
general ignores that performance requirements differ a great deal.
Nevertheless, it is very true that Solr performance is tied to the
amount of OS disk cache:

You can have a machine with 10TB of RAM, but Solr performance will still
be poor if you use it all for JVMs.

Practically all modern operating systems use free memory for disk cache.
Free memory is the memory not used for JVMs or other programs. It might
be that you have a lot less than 30-40 GB free: if you are on a Linux
server, try calling 'top' and see what it says under 'cached'.
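
As a side note (purely illustrative), the same figure is also visible on
Linux systems of that era with:

  free -m    # the 'cached' column is memory the kernel currently uses for the page cache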

Related: I support jim's suggestion to inspect the swap activity.
In the past we had a problem with a machine that insisted on swapping
excessively, despite high I/O and free memory.
Post by Modassar Ather
The disks are SSDs.
That makes your observations stranger still.


- Toke Eskildsen, State and University Library, Denmark
Modassar Ather
2015-11-02 11:57:27 UTC
Permalink
Post by Toke Eskildsen
Okay. I guess your observation of 400% for a single core is with top and
looking at that core's entry? If so, the 400% can be explained by
excessive garbage collection. You could turn GC logging on to check
that. With a bit of luck, GC would be the cause of the slowdown.

Yes, it is with the top command. I will check GC activity and try to relate
it to the CPU usage.
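
For reference, on the Java 7/8 JVMs that Solr 5.x runs on, GC logging can
typically be switched on with startup flags along these lines (the log path
is only illustrative):

  -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/var/solr/logs/gc.log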

The query q=network se* is quick enough in our system too. It takes around
3-4 seconds for around 8 million records.
The problem is with the same query as a phrase: q="network se*".
Can you please share your experience with such queries, where the wildcard
expansion is huge like in the query above?

I changed my SolrCloud setup from 12 shards to 8 shards and gave each shard
30 GB of RAM on the same machine with the same index size (re-indexed), but
could not see any significant improvement for the given query.

I will check the swap activity.

Also, can you please share your experiences with respect to RAM, GC, Solr
cache setup, etc., as it seems from your comments that the SolrCloud
environment you have is similar to the one I work on?

Regards,
Modassar
Modassar Ather
2015-11-02 12:21:52 UTC
Permalink
I monitored swap activity for the query using vmstat. The *si* and *so*
columns showed 0 until the query completed. Also, top showed 0 against swap.
This means there was no scarcity of physical memory; swap activity does not
seem to be a bottleneck.
Kindly note that I ran this on an 8-node cluster with 30 GB of RAM and 140 GB
of index on each node.

Regards,
Modassar
Toke Eskildsen
2015-11-02 13:17:00 UTC
Permalink
Post by Modassar Ather
The query q=network se* is quick enough in our system too. It takes
around 3-4 seconds for around 8 million records.
The problem is with the same query as a phrase: q="network se*".
I misunderstood your query then. I tried replicating it with
q="der se*"

http://rosalind:52300/solr/collection1/select?q=%22der+se*%22&wt=json&indent=true&facet=false&group=true&group.field=domain

gets expanded to

parsedquery": "(+DisjunctionMaxQuery((content_text:\"kan svane\" |
author:kan svane* | text:\"kan svane\" | title:\"kan svane\" | url:kan
svane* | description:\"kan svane\")) ())/no_coord"

The result was 1,043,258,271 hits in 15,211 ms


Interestingly enough, a search for
q="kan svane*"
resulted in 711 hits in 12,470 ms. Maybe because 'kan' alone matches 1
billion+ documents. On that note,
q=se*
resulted in -951812427 hits in 194,276 ms.

Now this is interesting. The negative number seems to be caused by
grouping, but I finally got the response time up in the minutes. Still
no memory problems though. Hits without grouping were 3,343,154,869.

For comparison,
q=http
resulted in -1527418054 hits in 87,464 ms. Without grouping the hit
count was 7,062,516,538. Twice the hits of 'se*' in half the time.
Post by Modassar Ather
I changed my SolrCloud setup from 12 shards to 8 shards and gave each
shard 30 GB of RAM on the same machine with the same index size
(re-indexed), but could not see any significant improvement for the
given query.
Strange. I would have expected the extra free memory for disk cache to
help performance.
Post by Modassar Ather
Also, can you please share your experiences with respect to RAM, GC,
Solr cache setup, etc., as it seems from your comments that the SolrCloud
environment you have is similar to the one I work on?
There is a short write up at
https://sbdevel.wordpress.com/net-archive-search/

- Toke Eskildsen, State and University Library, Denmark
Toke Eskildsen
2015-11-02 13:59:09 UTC
Permalink
Post by Toke Eskildsen
http://rosalind:52300/solr/collection1/select?q=%22der+se*%22&wt=json&indent=true&facet=false&group=true&group.field=domain
gets expanded to
parsedquery": "(+DisjunctionMaxQuery((content_text:\"kan svane\" |
author:kan svane* | text:\"kan svane\" | title:\"kan svane\" | url:kan
svane* | description:\"kan svane\")) ())/no_coord"
Wrong copy-paste, sorry. The correct expansion of "der se*" is

"rawquerystring": "\"der se*\"",

"querystring": "\"der se*\"",

"parsedquery": "(+DisjunctionMaxQuery((content_text:se | author:der se*
| text:se | title:se | url:der se* | description:se)) ())/no_coord",

"parsedquery_toString": "+(content_text:se | author:der se* | text:se |
title:se | url:der se* | description:se) ()",

"QParser": "ExtendedDismaxQParser",



This supports jim's claim that "foo bar*" is probably not doing what you
(Modassar) think it is doing.
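
(For anyone wanting to reproduce this kind of inspection: the
rawquerystring/parsedquery output above is what Solr returns when debug=query,
or the older debugQuery=true, is added to the request, for example

  .../solr/collection1/select?q=%22network+se*%22&debug=query&wt=json

against the collection in question.)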


- Toke Eskildsen, State and University Library, Denmark
Walter Underwood
2015-11-02 16:47:00 UTC
Permalink
To back up a bit, how many documents are in this 90GB index? You might not need to shard at all.

Why are you sending a query with a trailing wildcard? Are you matching the prefix of words, for query completion? If so, look at the suggester, which is designed to solve exactly that. Or you can use the EdgeNgramFilter to index prefixes. That will make your index larger, but prefix searches will be very fast.

wunder
Walter Underwood
***@wunderwood.org
http://observer.wunderwood.org/ (my blog)
Modassar Ather
2015-11-03 05:39:22 UTC
Permalink
Thanks, Walter, for your response.

It is around 90 GB of index (around 8 million documents) on one shard, and
there are 12 such shards. As per my understanding, sharding is required for
this case. Please help me understand if it is not.

We have requirements where we need to provide full wildcard support to
our users.
I will try using EdgeNgramFilter. Can you please help me understand if
EdgeNgramFilter can be a replacement for wildcards?
There are situations where words may be extended with special
characters, e.g. for se* there can be a match like secondary-school, which
also needs to be considered.

Regards,
Modassar
Walter Underwood
2015-11-03 06:34:34 UTC
Permalink
One rule of thumb for Solr is to shard after you reach 100 million documents. With large documents, you might want to shard sooner.

We are running an unsharded index of 7 million documents (55GB) without problems.

The EdgeNgramFilter generates a set of prefix terms for each term in the document. For the term “secondary”, it would generate:

s
se
sec
seco
secon
second
seconda
secondar
secondary

Obviously, this makes the index larger. But it makes prefix match a simple lookup, without needing wildcards.
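
A minimal index-time analyzer along these lines (the field type name and gram
sizes are only illustrative, not taken from this thread) could look like:

  <fieldType name="text_prefix" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="15"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

Leaving the n-gram filter off the query analyzer keeps the user's term intact,
so a query for se matches the indexed prefix terms directly.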

Again, we can help you more if you describe what you are trying to do.

wunder
Walter Underwood
***@wunderwood.org
http://observer.wunderwood.org/ (my blog)
Toke Eskildsen
2015-11-03 07:45:13 UTC
Permalink
Post by Modassar Ather
It is around 90 GB of index (around 8 million documents) on one shard, and
there are 12 such shards. As per my understanding, sharding is required for
this case. Please help me understand if it is not.
Except for an internal limit of 2 billion documents/shard (or 2 billion
unique values in a field in a single shard), there are no requirements
as such.

Our shards are 900 GB / 200M+ documents each and work well for our use case,
but it all depends on what you are doing. Your heaps are quite large
already, so merging into a single shard would probably require a heap so
large that you would run into trouble with garbage collection.


Your problem seems to be query processing speed. If your machine is not
maxed out by many concurrent requests, sharding should help you there:
As you have noticed, it allows the search to take advantage of multiple
processors.


- Toke Eskildsen, State and University Library, Denmark

jim ferenczi
2015-11-02 12:36:09 UTC
Permalink
*I am not able to get the above point. So when I start Solr with 28 GB of
memory, all activities related to Solr should not go beyond 28 GB, and the
remaining memory will be used for activities other than Solr. Please help me
understand.*

Well, those 28 GB of heap are the memory "reserved" for your Solr
application, though some parts of the index (if not all of it) are retrieved
via mmap (if you use the default MMapDirectory), which does not use the heap
at all. This is a very important part of Lucene/Solr: the heap should be
sized in a way that leaves a significant amount of RAM available for the
index. If not, then you rely on the speed of your disk; if you have SSDs
it's better, but reads are still significantly slower with SSDs than with
direct RAM access. Another thing to keep in mind is that mmap will always
try to put things in RAM, which is why I suspect that your swap activity is
killing your performance.
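
As an illustration only (the value below is made up, not a recommendation for
this cluster), the per-node heap is whatever is passed at startup, e.g.

  bin/solr start -c -m 16g

so lowering -m is the usual lever for handing memory back to the OS page
cache that MMapDirectory relies on.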
jim ferenczi
2015-11-02 12:40:48 UTC
Permalink
Oops, I did not read the thread carefully.
*The problem is with the same query as a phrase: q="network se*".*
I was not aware that you could do that with Solr ;). I would say this is
expected, because in such a case, if the number of expansions for "se*" is
big, then you would have to check the positions for a significant number of
words. I don't know if there is a limit on the number of expansions for a
prefix query contained in a phrase query, but I would look at that first
(limit the number of expansions per prefix search, say to the N most
significant words based on term frequency, for instance).
*I am not able to get the above point. So when I start Solr with 28g RAM,
for all the activities related to Solr it should not go beyond 28g. And the
remaining heap will be used for activities other than Solr. Please help me
understand.*
Well, those 28GB of heap are the memory "reserved" for your Solr
application, but parts of the index (not to say all of it) are read via
mmap (if you use the default MMapDirectory), and that does not use the heap
at all. This is a very important part of Lucene/Solr: the heap should be
sized so that a significant amount of RAM is left available for the index.
If it is not, you rely on the speed of your disk; with SSDs it is better,
but reads are still significantly slower from SSDs than from RAM. Another
thing to keep in mind is that mmap will always try to put things in RAM,
which is why I suspect that your swap activity is killing your performance.
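
As a small illustration of that point (a sketch, not from Modassar's setup;
the index path is a placeholder): with MMapDirectory the index files are
mapped into virtual memory, so the OS page cache, not the Java heap,
decides which parts of the index stay resident in RAM.

    import java.nio.file.Paths;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.MMapDirectory;

    // The index bytes are accessed through memory mapping and never count
    // against -Xmx; only Lucene's own data structures live on the heap.
    Directory dir = new MMapDirectory(Paths.get("/path/to/index"));
    DirectoryReader reader = DirectoryReader.open(dir);
    System.out.println("maxDoc: " + reader.maxDoc());
    reader.close();
    dir.close();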
Post by Modassar Ather
Thanks Jim for your response.
The remaining size after you removed the heap usage should be reserved for
the index (not only the other system activities).
I am not able to get the above point. So when I start Solr with 28g RAM,
for all the activities related to Solr it should not go beyond 28g. And the
remaining heap will be used for activities other than Solr. Please help me
understand.
*Also the CPU utilization goes upto 400% in few of the nodes:*
You said that only machine is used so I assumed that 400% cpu is for a
single process (one solr node), right ?
Yes you are right that 400% is for single process.
The disks are SSDs.
Regards,
Modassar
Modassar Ather
2015-11-02 12:47:04 UTC
Permalink
The problem is with the same query as a phrase: q="network se*".

The trailing period is just the full stop of my sentence; the actual query
is q=field:"network se*"

Best,
Modassar
jim ferenczi
2015-11-02 13:03:43 UTC
Permalink
Well, it seems that q="network se*" works, but not in the way you expect.
Inside a phrase the "*" does not trigger a prefix query; it is treated like
any other character. I suspect that your query is in fact "network se"
(assuming you're using a StandardTokenizer) and that the word "se" is very
common in your documents. That would explain the slow response time. Bottom
line: "network se*" will not trigger a prefix query at all (I may be wrong,
but this is the expected behaviour for Solr up to 4.3).
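
A quick way to confirm what the parser actually builds is to ask Solr to
echo the parsed query. A sketch using SolrJ; the URL, collection and field
names are placeholders:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.response.QueryResponse;

    // With debugQuery=true the response contains the parsed query, so you
    // can see whether "se*" became a prefix query or just the term "se".
    HttpSolrClient client =
        new HttpSolrClient("http://localhost:8983/solr/collection1");
    SolrQuery q = new SolrQuery("field:\"network se*\"");
    q.set("debugQuery", "true");
    q.setRows(0); // only the debug output matters here, not the documents

    QueryResponse rsp = client.query(q);
    System.out.println(rsp.getDebugMap().get("parsedquery"));
    client.close();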
Post by Modassar Ather
The problem is with the same query as phrase. q="network se*".
The last . is fullstops for the sentence and the query is q=field:"network
se*"
Best,
Modassar
Toke Eskildsen
2015-11-02 11:29:01 UTC
Permalink
Post by Modassar Ather
No! This is a single big machine with 12 shards on it.
Around 370 gb on the single machine.
Okay. I guess your observation of 400% for a single node is from top,
looking at that process's entry? If so, the 400% can be explained by
excessive garbage collection. You could turn on GC logging to check
that. With a bit of luck, GC is the cause of the slowdown.
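
If turning on GC logging for the running JVM is inconvenient, a rough check
of GC activity can also be made through the standard management beans. A
sketch; it reports on the JVM it runs in, so for a remote Solr process you
would read the same beans over JMX instead:

    import java.lang.management.GarbageCollectorMXBean;
    import java.lang.management.ManagementFactory;

    // Cumulative GC counts and time; sample before and after the slow
    // query to see how much of the wall-clock time went to collections.
    public class GcCheck {
        public static void main(String[] args) {
            for (GarbageCollectorMXBean gc :
                    ManagementFactory.getGarbageCollectorMXBeans()) {
                System.out.printf("%s: %d collections, %d ms total%n",
                        gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
            }
        }
    }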
Post by Modassar Ather
Few simple queries are returned with in a couple of seconds. But the
more complex queries with proximity and wild cards have taken more
than 3-4 minutes and some times some queries have timed out too where
time out is set to 5 minutes.
The proximity information seems relevant here.
Post by Modassar Ather
- How many results do you return (the rows-parameter)?
It is the default one 10. Grouping is enabled on a field.
If you have group.ngroups=true that would be heavy (and require a lot of
memory), but as your non-wildcard searches with many hits are fast, that
is probably not the problem here.
Post by Modassar Ather
If you are using spinning drives and only have 32GB of RAM in total in
each machine, you are probably struggling just to keep things running.
As mentioned above this is a big machine with 370+ gb of RAM and Solr
(12 nodes total) is assigned 336 GB. The rest is still a good for
other system activities.
Assuming the storage is spinning drives, it is quite a small machine,
measured by cache memory vs. index size: You have 30-40GB free for disk
cache and your index is 1TB, so ~3%. Unless you have a great deal of
stored content, 3% for disk caching means that there will be a high
amount of IO during a search. It works for you when the queries are
simple field:term, but I am not surprised that it doesn't work well in
other cases.

By nature, truncated queries touch a lot of terms, which means a lot of
lookups. I have no in-depth knowledge of how these lookups are performed,
but I guesstimate that they are IO-intensive.


Coincidentally we also run a machine with multiple Solrs, terabytes of
index data and not much memory (< 1%) for disk cache. One difference
being that it is backed by SSDs. I tried doing a few ad-hoc searches
with grouping turned on (search terms are Danish words):

q=ostekiks 38,646 hits, 530 ms.
q=ost* 49,713,655 hits, 2,190 ms.
q=køer mælk 1,232,445 hits, 767 ms.
q=kat mad* 10,926,107 hits, 4,624 ms.
q="kaniner harer"~50 161,009 hits, 726 ms.
q=kantarel 337,279 hits, 455 ms.
q=deres kan* 245,719,036 hits, 13,565 ms.

This was with Solr 4.10. No special garbage collection activity
occurred. Heap usage stayed well below 8GB per Solr, which is the
standard behaviour of our system.

In short, I could not replicate your observed special activity based on
the queries you have described. I have no reason to believe that Solr
5.3 should perform worse in this aspect.

The SSDs are probably part of the explanation, but I suspect we are
missing something else. It should not make a difference (as your
non-truncated queries are fast), but could you try to reduce the slow
request to the simplest possible? No grouping, faceting or other special
processing, just q=network se*


- Toke Eskildsen, State and University Library, Denmark