Discussion:
Very high memory and CPU utilization.
Modassar Ather
2015-11-02 06:30:54 UTC
Permalink
Hi,

I have a setup of a 12-shard cluster started with 28 GB of memory each on a
single server. There are no replicas. The size of the index is around 90 GB on
each shard. The Solr version is 5.2.1.
When I query "network se*", the memory utilization goes up to 24-26 GB and
the query takes around 3+ minutes to execute. Also, the CPU utilization goes
up to 400% in a few of the nodes.

Kindly note that the use of the wildcard in the above query cannot be restricted.

Please help me understand why there is so much memory utilization. Please
correct me if I am wrong that it is because of the term expansion of *se**.
Why is the CPU utilization so high, and why is more than one core used? As far
as I understand, querying is single-threaded.

Help me understand the behavior of the query timeout. How is the client
notified about the query timeout?
How can I disable replication (as it is implicitly enabled) permanently? In
our case we are not using it, but we can see warnings related to leader
election.

Thanks,
Modassar
Toke Eskildsen
2015-11-02 08:00:47 UTC
Permalink
Post by Modassar Ather
I have a setup of a 12-shard cluster started with 28 GB of memory each on a
single server. There are no replicas. The size of the index is around 90 GB on
each shard. The Solr version is 5.2.1.
That is 12 machines, running a shard each?

What is the total amount of physical memory on each machine?
Post by Modassar Ather
When I query "network se*", the memory utilization goes upto 24-26 gb and
the query takes around 3+ minutes to execute. Also the CPU utilization goes
upto 400% in few of the nodes.
Well, se* probably expands to a great deal of documents, but a huge bump
in memory utilization and 3 minutes+ sounds strange.

- What are your normal query times?
- How many hits do you get from 'network se*'?
- How many results do you return (the rows-parameter)?
- If you issue a query without wildcards, but with approximately the
same amount of hits as 'network se*', how long does it take?
Post by Modassar Ather
Why is the CPU utilization so high, and why is more than one core used?
As far as I understand, querying is single-threaded.
That is strange, yes. Have you checked the logs to see if something
unexpected is going on while you test?
Post by Modassar Ather
How can I disable replication (as it is implicitly enabled) permanently? In
our case we are not using it, but we can see warnings related to leader
election.
If you are using spinning drives and only have 32GB of RAM in total in
each machine, you are probably struggling just to keep things running.


- Toke Eskildsen, State and University Library, Denmark
Modassar Ather
2015-11-02 09:04:12 UTC
Permalink
Hi Toke,
Thanks for your response. My comments in-line.

That is 12 machines, running a shard each?
No! This is a single big machine with 12 shards on it.

What is the total amount of physical memory on each machine?
Around 370 GB on the single machine.

Well, se* probably expands to a great deal of documents, but a huge bump
in memory utilization and 3 minutes+ sounds strange.

- What are your normal query times?
A few simple queries return within a couple of seconds. But the more
complex queries with proximity and wildcards have taken more than 3-4
minutes, and sometimes queries have timed out too, where the timeout is
set to 5 minutes.
- How many hits do you get from 'network se*'?
More than a million records.
- How many results do you return (the rows-parameter)?
It is the default, 10. Grouping is enabled on a field.
- If you issue a query without wildcards, but with approximately the
same amount of hits as 'network se*', how long does it take?
A query resulting in around half a million records returns within a couple of
seconds.

That is strange, yes. Have you checked the logs to see if something
unexpected is going on while you test?
I have not seen anything in particular. Will try to check again.

If you are using spinning drives and only have 32GB of RAM in total in
each machine, you are probably struggling just to keep things running.
As mentioned above, this is a big machine with 370+ GB of RAM, and Solr (12
nodes total) is assigned 336 GB. The rest is still good enough for other system
activities.

Thanks,
Modassar
Modassar Ather
2015-11-02 09:38:08 UTC
Permalink
Just to add one more point: one external ZooKeeper instance is also
running on this particular machine.

Regards,
Modassar
jim ferenczi
2015-11-02 10:38:17 UTC
Permalink
12 shards with 28 GB for the heap and 90 GB for each index means that you
need at least 336 GB for the heap (assuming you're using all of it, which may
easily be the case considering the way the GC handles memory) and roughly
1 TB for the index. Let's say that you don't need your entire index in RAM;
the problem as I see it is that you don't have enough RAM for your index +
heap. Assuming your machine has 370 GB of RAM, there are only 34 GB left for
your index, and 1 TB / 34 GB means that you can only have about 1/30 of your
entire index in RAM. I would advise you to check the swap activity on the
machine and see if it correlates with the bad performance you're seeing. One
important thing to notice is that a significant part of your index needs to be
in RAM (especially if you're using SSDs) in order to achieve good performance.
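
For reference, one common way to watch swap activity on a Linux box is vmstat
(shown here only as an illustration); non-zero values in the si/so columns
while the query runs mean pages are being swapped in or out:

  vmstat 5    # report memory/swap stats every 5 seconds; si/so = KB swapped in/out per second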



*As mentioned above, this is a big machine with 370+ GB of RAM, and Solr (12
nodes total) is assigned 336 GB. The rest is still good enough for other system
activities.*
The memory remaining after you subtract the heap usage should be reserved for
the index (not only for other system activities).


*Also the CPU utilization goes up to 400% in a few of the nodes:*
You said that only one machine is used, so I assume that 400% CPU is for a
single process (one Solr node), right?
This seems impossible if you are sure that only one query is run at a time
and no indexing is performed. The best thing to do is to dump stack traces
of the Solr nodes during the query and check what the threads are doing.
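
If it helps, a thread dump can usually be taken with the jstack tool that
ships with the JDK (the process id below is just a placeholder):

  jstack -l <solr-pid> > solr-threads.txt    # -l also prints lock/synchronizer information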

Jim
jim ferenczi
2015-11-02 10:39:42 UTC
Permalink
*if it correlates with the bad performance you're seeing. One important
thing to notice is that a significant part of your index needs to be in RAM
(especially if you're using SSDs) in order to achieve good performance.*

Especially if you're not using SSDs, sorry ;)
Modassar Ather
2015-11-02 10:55:19 UTC
Permalink
Thanks Jim for your response.

The memory remaining after you subtract the heap usage should be reserved for
the index (not only for other system activities).
I am not able to get the above point. So when I start Solr with 28 GB of
memory, all activities related to Solr should not go beyond 28 GB, and the
remaining memory will be used for activities other than Solr. Please help me
understand.

*Also the CPU utilization goes up to 400% in a few of the nodes:*
You said that only one machine is used, so I assume that 400% CPU is for a
single process (one Solr node), right?
Yes, you are right that 400% is for a single process.
The disks are SSDs.

Regards,
Modassar
Toke Eskildsen
2015-11-02 11:50:34 UTC
Permalink
Post by jim ferenczi
The memory remaining after you subtract the heap usage should be reserved for
the index (not only for other system activities).
Post by Modassar Ather
I am not able to get the above point. So when I start Solr with 28 GB of
memory, all activities related to Solr should not go beyond 28 GB, and the
remaining memory will be used for activities other than Solr. Please help me
understand.
It is described here:
https://wiki.apache.org/solr/SolrPerformanceProblems#OS_Disk_Cache

I will be quick to add that I do not agree with Shawn (the primary
author of the page) on the stated limits and find that the page in
general ignores that performance requirements differ a great deal.
Nevertheless, it is very true that Solr performance is tied to the
amount of OS disk cache:

You can have a machine with 10TB of RAM, but Solr performance will still
be poor if you use it all for JVMs.

Practically all modern operating systems use free memory for disk cache.
Free memory is the memory not used for JVMs or other programs. It might
be that you have a lot less than 30-40 GB free: if you are on a Linux
server, try calling 'top' and see what it says under 'cached'.
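
As a side note (purely illustrative), the same figure is also visible on
Linux systems of that era with:

  free -m    # the 'cached' column is memory the kernel currently uses for the page cache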

Related: I support jim's suggestion to inspect the swap activity.
In the past we had a problem with a machine that insisted on swapping
excessively, despite high I/O and free memory.
Post by Modassar Ather
The disks are SSDs.
That makes your observations stranger still.


- Toke Eskildsen, State and University Library, Denmark
Modassar Ather
2015-11-02 11:57:27 UTC
Permalink
Post by Toke Eskildsen
Okay. I guess your observation of 400% for a single core is with top and
looking at that core's entry? If so, the 400% can be explained by
excessive garbage collection. You could turn GC logging on to check
that. With a bit of luck, GC would be the cause of the slowdown.

Yes, it is with the top command. I will check GC activity and try to relate
it to the CPU usage.
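
For reference, on the Java 7/8 JVMs that Solr 5.x runs on, GC logging can
typically be switched on with startup flags along these lines (the log path
is only illustrative):

  -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/var/solr/logs/gc.log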

The query q=network se* is quick enough in our system too. It takes around
3-4 seconds for around 8 million records.
The problem is with the same query as a phrase: q="network se*".
Can you please share your experience with such queries, where the wildcard
expansion is huge like in the query above?

I changed my SolrCloud setup from 12 shards to 8 shards and gave each shard
30 GB of RAM on the same machine with the same index size (re-indexed), but
could not see any significant improvement for the given query.

I will check the swap activity.

Also, can you please share your experiences with respect to RAM, GC, Solr
cache setup, etc., as it seems from your comments that the SolrCloud
environment you have is similar to the one I work on?

Regards,
Modassar
Modassar Ather
2015-11-02 12:21:52 UTC
Permalink
I monitored swap activity for the query using vmstat. The *si* and *so*
columns showed 0 until the query completed. Also, top showed 0 against swap.
This means there was no scarcity of physical memory; swap activity does not
seem to be a bottleneck.
Kindly note that I ran this on an 8-node cluster with 30 GB of RAM and 140 GB
of index on each node.

Regards,
Modassar
Toke Eskildsen
2015-11-02 13:17:00 UTC
Permalink
Post by Modassar Ather
The query q=network se* is quick enough in our system too. It takes
around 3-4 seconds for around 8 million records.
The problem is with the same query as a phrase: q="network se*".
I misunderstood your query then. I tried replicating it with
q="der se*"

http://rosalind:52300/solr/collection1/select?q=%22der+se*%22&wt=json&indent=true&facet=false&group=true&group.field=domain

gets expanded to

parsedquery": "(+DisjunctionMaxQuery((content_text:\"kan svane\" |
author:kan svane* | text:\"kan svane\" | title:\"kan svane\" | url:kan
svane* | description:\"kan svane\")) ())/no_coord"

The result was 1,043,258,271 hits in 15,211 ms


Interestingly enough, a search for
q="kan svane*"
resulted in 711 hits in 12,470 ms. Maybe because 'kan' alone matches 1
billion+ documents. On that note,
q=se*
resulted in -951812427 hits in 194,276 ms.

Now this is interesting. The negative number seems to be caused by
grouping, but I finally got the response time up in the minutes. Still
no memory problems though. Hits without grouping were 3,343,154,869.

For comparison,
q=http
resulted in -1527418054 hits in 87,464 ms. Without grouping the hit
count was 7,062,516,538. Twice the hits of 'se*' in half the time.
Post by Modassar Ather
I changed my SolrCloud setup from 12 shards to 8 shards and gave each
shard 30 GB of RAM on the same machine with the same index size
(re-indexed), but could not see any significant improvement for the
given query.
Strange. I would have expected the extra free memory for disk cache to
help performance.
Post by Modassar Ather
Also, can you please share your experiences with respect to RAM, GC,
Solr cache setup, etc., as it seems from your comments that the SolrCloud
environment you have is similar to the one I work on?
There is a short write up at
https://sbdevel.wordpress.com/net-archive-search/

- Toke Eskildsen, State and University Library, Denmark
Toke Eskildsen
2015-11-02 13:59:09 UTC
Permalink
Post by Toke Eskildsen
http://rosalind:52300/solr/collection1/select?q=%22der+se*%22&wt=json&indent=true&facet=false&group=true&group.field=domain
gets expanded to
parsedquery": "(+DisjunctionMaxQuery((content_text:\"kan svane\" |
author:kan svane* | text:\"kan svane\" | title:\"kan svane\" | url:kan
svane* | description:\"kan svane\")) ())/no_coord"
Wrong copy-paste, sorry. The correct expansion of "der se*" is

"rawquerystring": "\"der se*\"",

"querystring": "\"der se*\"",

"parsedquery": "(+DisjunctionMaxQuery((content_text:se | author:der se*
| text:se | title:se | url:der se* | description:se)) ())/no_coord",

"parsedquery_toString": "+(content_text:se | author:der se* | text:se |
title:se | url:der se* | description:se) ()",

"QParser": "ExtendedDismaxQParser",



This supports jim's claim that "foo bar*" is probably not doing what you
(Modassar) think it is doing.
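
(For anyone wanting to reproduce this kind of inspection: the
rawquerystring/parsedquery output above is what Solr returns when debug=query,
or the older debugQuery=true, is added to the request, for example

  .../solr/collection1/select?q=%22network+se*%22&debug=query&wt=json

against the collection in question.)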


- Toke Eskildsen, State and University Library, Denmark
Walter Underwood
2015-11-02 16:47:00 UTC
Permalink
To back up a bit, how many documents are in this 90GB index? You might not need to shard at all.

Why are you sending a query with a trailing wildcard? Are you matching the prefix of words, for query completion? If so, look at the suggester, which is designed to solve exactly that. Or you can use the EdgeNgramFilter to index prefixes. That will make your index larger, but prefix searches will be very fast.

wunder
Walter Underwood
***@wunderwood.org
http://observer.wunderwood.org/ (my blog)
Modassar Ather
2015-11-03 05:39:22 UTC
Permalink
Thanks, Walter, for your response.

It is around 90 GB of index (around 8 million documents) on one shard, and
there are 12 such shards. As per my understanding, sharding is required for
this case. Please help me understand if it is not.

We have requirements where we need to provide full wildcard support to
our users.
I will try using EdgeNgramFilter. Can you please help me understand if
EdgeNgramFilter can be a replacement for wildcards?
There are situations where words may be extended with special
characters, e.g. for se* there can be a match like secondary-school, which
also needs to be considered.

Regards,
Modassar
Walter Underwood
2015-11-03 06:34:34 UTC
Permalink
One rule of thumb for Solr is to shard after you reach 100 million documents. With large documents, you might want to shard sooner.

We are running an unsharded index of 7 million documents (55GB) without problems.

The EdgeNgramFilter generates a set of prefix terms for each term in the document. For the term “secondary”, it would generate:

s
se
sec
seco
secon
second
seconda
secondar
secondary

Obviously, this makes the index larger. But it makes prefix match a simple lookup, without needing wildcards.
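
A minimal index-time analyzer along these lines (the field type name and gram
sizes are only illustrative, not taken from this thread) could look like:

  <fieldType name="text_prefix" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="15"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

Leaving the n-gram filter off the query analyzer keeps the user's term intact,
so a query for se matches the indexed prefix terms directly.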

Again, we can help you more if you describe what you are trying to do.

wunder
Walter Underwood
***@wunderwood.org
http://observer.wunderwood.org/ (my blog)
Toke Eskildsen
2015-11-03 07:45:13 UTC
Permalink
Post by Modassar Ather
It is around 90 GB of index (around 8 million documents) on one shard, and
there are 12 such shards. As per my understanding, sharding is required for
this case. Please help me understand if it is not.
Except for an internal limit of 2 billion documents/shard (or 2 billion
unique values in a field in a single shard), there are no requirements
as such.

Our shards are 900 GB / 200M+ documents each and work well for our use case,
but it all depends on what you are doing. Your heaps are quite large
already, so merging into a single shard would probably require a heap so
large that you would run into trouble with garbage collection.


Your problem seems to be query processing speed. If your machine is not
maxed out by many concurrent requests, sharding should help you there:
As you have noticed, it allows the search to take advantage of multiple
processors.


- Toke Eskildsen, State and University Library, Denmark

jim ferenczi
2015-11-02 12:36:09 UTC
Permalink
*I am not able to get the above point. So when I start Solr with 28 GB of
memory, all activities related to Solr should not go beyond 28 GB, and the
remaining memory will be used for activities other than Solr. Please help me
understand.*

Well, those 28 GB of heap are the memory "reserved" for your Solr
application, though some parts of the index (if not all of it) are retrieved
via mmap (if you use the default MMapDirectory), which does not use the heap
at all. This is a very important part of Lucene/Solr: the heap should be
sized in a way that leaves a significant amount of RAM available for the
index. If not, then you rely on the speed of your disk; if you have SSDs
it's better, but reads are still significantly slower with SSDs than with
direct RAM access. Another thing to keep in mind is that mmap will always
try to put things in RAM, which is why I suspect that your swap activity is
killing your performance.
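
As an illustration only (the value below is made up, not a recommendation for
this cluster), the per-node heap is whatever is passed at startup, e.g.

  bin/solr start -c -m 16g

so lowering -m is the usual lever for handing memory back to the OS page
cache that MMapDirectory relies on.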
jim ferenczi
2015-11-02 12:40:48 UTC
Permalink
Oops, I did not read the thread carefully.
*The problem is with the same query as a phrase: q="network se*".*
I was not aware that you could do that with Solr ;). I would say this is
expected, because in such a case, if the number of expansions for "se*" is
big, then you would have to check the positions for a significant number of
words. I don't know if there is a limit on the number of expansions for a
prefix query contained in a phrase query, but I would look at that first
(limit the number of expansions per prefix search, say to the N most
significant words based on term frequency, for instance).
*I am not able to get the above point. So when I start Solr with 28g RAM,
for all the activities related to Solr it should not go beyond 28g. And the
remaining heap will be used for activities other than Solr. Please help me
understand.*
Well, those 28GB of heap are the memory "reserved" for your Solr
application, but parts of the index (not to say all of it) are read via
mmap (if you use the default MMapDirectory), and that does not use the heap
at all. This is a very important part of Lucene/Solr: the heap should be
sized so that a significant amount of RAM is left available for the index.
If it is not, you rely on the speed of your disk; with SSDs it is better,
but reads are still significantly slower from SSDs than from RAM. Another
thing to keep in mind is that mmap will always try to put things in RAM,
which is why I suspect that your swap activity is killing your performance.
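
As a small illustration of that point (a sketch, not from Modassar's setup;
the index path is a placeholder): with MMapDirectory the index files are
mapped into virtual memory, so the OS page cache, not the Java heap,
decides which parts of the index stay resident in RAM.

    import java.nio.file.Paths;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.MMapDirectory;

    // The index bytes are accessed through memory mapping and never count
    // against -Xmx; only Lucene's own data structures live on the heap.
    Directory dir = new MMapDirectory(Paths.get("/path/to/index"));
    DirectoryReader reader = DirectoryReader.open(dir);
    System.out.println("maxDoc: " + reader.maxDoc());
    reader.close();
    dir.close();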
Post by Modassar Ather
Thanks Jim for your response.
The remaining size after you removed the heap usage should be reserved for
the index (not only the other system activities).
I am not able to get the above point. So when I start Solr with 28g RAM,
for all the activities related to Solr it should not go beyond 28g. And the
remaining heap will be used for activities other than Solr. Please help me
understand.
*Also the CPU utilization goes upto 400% in few of the nodes:*
You said that only machine is used so I assumed that 400% cpu is for a
single process (one solr node), right ?
Yes you are right that 400% is for single process.
The disks are SSDs.
Regards,
Modassar
Modassar Ather
2015-11-02 12:47:04 UTC
Permalink
The problem is with the same query as a phrase: q="network se*".

The trailing period is just the full stop of my sentence; the actual query
is q=field:"network se*"

Best,
Modassar
jim ferenczi
2015-11-02 13:03:43 UTC
Permalink
Well, it seems that q="network se*" works, but not in the way you expect.
Inside a phrase the "*" does not trigger a prefix query; it is treated like
any other character. I suspect that your query is in fact "network se"
(assuming you're using a StandardTokenizer) and that the word "se" is very
common in your documents. That would explain the slow response time. Bottom
line: "network se*" will not trigger a prefix query at all (I may be wrong,
but this is the expected behaviour for Solr up to 4.3).
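
A quick way to confirm what the parser actually builds is to ask Solr to
echo the parsed query. A sketch using SolrJ; the URL, collection and field
names are placeholders:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.response.QueryResponse;

    // With debugQuery=true the response contains the parsed query, so you
    // can see whether "se*" became a prefix query or just the term "se".
    HttpSolrClient client =
        new HttpSolrClient("http://localhost:8983/solr/collection1");
    SolrQuery q = new SolrQuery("field:\"network se*\"");
    q.set("debugQuery", "true");
    q.setRows(0); // only the debug output matters here, not the documents

    QueryResponse rsp = client.query(q);
    System.out.println(rsp.getDebugMap().get("parsedquery"));
    client.close();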
Post by Modassar Ather
The problem is with the same query as phrase. q="network se*".
The last . is fullstops for the sentence and the query is q=field:"network
se*"
Best,
Modassar
Toke Eskildsen
2015-11-02 11:29:01 UTC
Permalink
Post by Modassar Ather
No! This is a single big machine with 12 shards on it.
Around 370 gb on the single machine.
Okay. I guess your observation of 400% for a single node is from top,
looking at that process's entry? If so, the 400% can be explained by
excessive garbage collection. You could turn on GC logging to check
that. With a bit of luck, GC is the cause of the slowdown.
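
If turning on GC logging for the running JVM is inconvenient, a rough check
of GC activity can also be made through the standard management beans. A
sketch; it reports on the JVM it runs in, so for a remote Solr process you
would read the same beans over JMX instead:

    import java.lang.management.GarbageCollectorMXBean;
    import java.lang.management.ManagementFactory;

    // Cumulative GC counts and time; sample before and after the slow
    // query to see how much of the wall-clock time went to collections.
    public class GcCheck {
        public static void main(String[] args) {
            for (GarbageCollectorMXBean gc :
                    ManagementFactory.getGarbageCollectorMXBeans()) {
                System.out.printf("%s: %d collections, %d ms total%n",
                        gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
            }
        }
    }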
Post by Modassar Ather
Few simple queries are returned with in a couple of seconds. But the
more complex queries with proximity and wild cards have taken more
than 3-4 minutes and some times some queries have timed out too where
time out is set to 5 minutes.
The proximity information seems relevant here.
Post by Modassar Ather
- How many results do you return (the rows-parameter)?
It is the default one 10. Grouping is enabled on a field.
If you have group.ngroups=true that would be heavy (and require a lot of
memory), but as your non-wildcard searches with many hits are fast, that
is probably not the problem here.
Post by Modassar Ather
If you are using spinning drives and only have 32GB of RAM in total in
each machine, you are probably struggling just to keep things running.
As mentioned above this is a big machine with 370+ gb of RAM and Solr
(12 nodes total) is assigned 336 GB. The rest is still a good for
other system activities.
Assuming the storage is spinning drives, it is quite a small machine,
measured by cache memory vs. index size: You have 30-40GB free for disk
cache and your index is 1TB, so ~3%. Unless you have a great deal of
stored content, 3% for disk caching means that there will be a high
amount of IO during a search. It works for you when the queries are
simple field:term, but I am not surprised that it doesn't work well in
other cases.

By nature, truncated queries touch a lot of terms, which means a lot of
lookups. I have no in-depth knowledge of how these lookups are performed,
but I guesstimate that they are IO-intensive.


Coincidentally we also run a machine with multiple Solrs, terabytes of
index data and not much memory (< 1%) for disk cache. One difference
being that it is backed by SSDs. I tried doing a few ad-hoc searches
with grouping turned on (search terms are Danish words):

q=ostekiks 38,646 hits, 530 ms.
q=ost* 49,713,655 hits, 2,190 ms.
q=køer mælk 1,232,445 hits, 767 ms.
q=kat mad* 10,926,107 hits, 4,624 ms.
q="kaniner harer"~50 161,009 hits, 726 ms.
q=kantarel 337,279 hits, 455 ms.
q=deres kan* 245,719,036 hits, 13,565 ms.

This was with Solr 4.10. No special garbage collection activity
occurred. Heap usage stayed well below 8GB per Solr, which is the
standard behaviour of our system.

In short, I could not replicate your observed special activity based on
the queries you have described. I have no reason to believe that Solr
5.3 should perform worse in this aspect.

The SSDs are probably part of the explanation, but I suspect we are
missing something else. It should not make a difference (as your
non-truncated queries are fast), but could you try to reduce the slow
request to the simplest possible? No grouping, faceting or other special
processing, just q=network se*


- Toke Eskildsen, State and University Library, Denmark