Discussion:
Solr uses lots of shared memory!
Markus Jelsma
2017-08-22 13:24:17 UTC
Permalink
Hi,

I have never seen this before: one of our collections has all nodes eating tons of shared memory!

Here's one of the nodes:
10497 solr 20 0 19.439g 4.505g 3.139g S 1.0 57.8 2511:46 java

RSS is roughly equal to heap size + the usual off-heap space + shared memory. Virtual is equal to RSS plus index size on disk. For two other collections, the nodes use shared memory as expected, in the MB range.

How can Solr, this collection, use so much shared memory? Why?

Thanks,
Markus
Shawn Heisey
2017-08-22 15:31:31 UTC
Permalink
Post by Markus Jelsma
I have never seen this before: one of our collections has all nodes eating tons of shared memory!
10497 solr 20 0 19.439g 4.505g 3.139g S 1.0 57.8 2511:46 java
RSS is roughly equal to heap size + the usual off-heap space + shared memory. Virtual is equal to RSS plus index size on disk. For two other collections, the nodes use shared memory as expected, in the MB range.
How can Solr, this collection, use so much shared memory? Why?
I've seen this on my own servers at work, and when I add up a subset of
the memory numbers I can see from the system, it ends up being more
memory than I even have in the server.

I suspect there is something odd going on in how Java reports memory
usage to the OS, or maybe a glitch in how Linux interprets Java's memory
usage.  At some point in the past, numbers were reported correctly.  I
do not know if the change came about because of a Solr upgrade, because
of a Java upgrade, or because of an OS kernel upgrade.  All three were
upgraded between when I know the numbers looked right and when I noticed
they were wrong.

https://www.dropbox.com/s/91uqlrnfghr2heo/solr-memory-sorted-top.png?dl=0

This screenshot shows that Solr is using 17GB of memory, 41.45GB of
memory is being used by the OS disk cache, and 10.23GB of memory is
free.  Add those up, and it comes to 68.68GB ... but the machine only
has 64GB of memory, and that total doesn't include the memory usage of
the other processes seen in the screenshot.  This impossible situation
means that something is being misreported somewhere.  If I deduct that
11GB of SHR from the RES value, then all the numbers work.
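Spelled out: 17 + 41.45 + 10.23 = 68.68GB, about 4.68GB more than the 64GB
installed; subtract the 11GB of SHR and you get 6 + 41.45 + 10.23 = 57.68GB,
which leaves room for the kernel and the other processes in the list.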

The screenshot was taken almost 3 years ago, so I do not know what machine it
came from, and therefore I can't be sure what the actual heap size was. 
I think it was about 6GB -- the difference between RES and SHR.  I have
used a 6GB heap on some of my production servers in the past.  The
server where I got this screenshot was not having any noticeable
performance or memory problems, so I think that I can trust that the
main numbers above the process list (which only come from the OS) are
correct.

Thanks,
Shawn
Markus Jelsma
2017-08-23 13:32:14 UTC
Permalink
I do not think it is a reporting problem. After watching top following a restart of some Solr instances, shared memory dropped back to `normal`, around 350 MB, which I think is still high, but anyway.

Two hours later, the restarted nodes have slowly increased shared memory consumption to about 1500 MB now. I don't understand why shared memory usage should or would increase slowly over time; it makes little sense to me, and I cannot remember Solr doing this in the past ten years.

But it seems to correlate with index size on disk: these main text-search nodes have an index of around 16 GB and up to 3 GB of shared memory after a few days. The log nodes have up to 800 MB of index and 320 MB of shared memory, and the low-latency nodes have four different cores that together make up just over 100 MB of index, with shared memory consumption of just 22 MB, which seems more reasonable for shared memory.

I can also force Solr to 'leak' shared memory just by sending queries to it. My freshly restarted local node used 68 MB of shared memory at startup. Two minutes and 25,000 queries later it was already at 2748 MB! At first there is a very sharp increase to about 2000 MB, then it takes almost two more minutes to reach 2748 MB. I can bring the maximum shared memory usage down to about 1200 MB if I query (via edismax) only on the fields of one language instead of 25 or so. I can decrease it even further if I disable highlighting (huh?) but still query on all fields.
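For anyone who wants to watch it happen, a rough sketch (pid, port, core name and query are placeholders; RES and SHR are columns 6 and 7 in default top output):

# terminal 1: watch RES and SHR of the Solr JVM
while sleep 5; do top -b -n 1 -p <solr-pid> | tail -n 1; done

# terminal 2: fire queries at the node
for i in $(seq 1 25000); do
  curl -s "http://localhost:8983/solr/<core>/select?q=test&hl=true" > /dev/null
done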

* We have tried patching Java's ByteBuffer [1] because it seemed to fit the problem, but it does not fix it.
* We have also removed all our custom plugins, so it has become a vanilla Solr 6.6 with just our stripped-down schema and solrconfig; that does not fix it either.

Why does it slowly increase over time?
Why does it appear to correlate to index size?
Is anyone else seeing this on their 6.6 cloud production or local machines?

Thanks,
Markus

[1]: http://www.evanjones.ca/java-bytebuffer-leak.html

Shawn Heisey
2017-08-23 14:36:45 UTC
Permalink
Post by Markus Jelsma
Why does it slowly increase over time?
Why does it appear to correlate to index size?
Is anyone else seeing this on their 6.6 cloud production or local machines?
More detailed information is included here.  My 6.6 dev install is NOT
having the problem, but a much older version IS.

I grabbed this screenshot only moments ago from a production server
which is exhibiting a large SHR value for the Solr process:

https://www.dropbox.com/s/q79lo2gft9es06u/idxa1-top-big-shr.png?dl=0

This is Solr 4.7.2, with a 10 month uptime for the Solr process, running
with these arguments:

-DSTOP.KEY=REDACTED
-DSTOP.PORT=8078
-Djetty.port=8981
-Dsolr.solr.home=/index/solr4
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.port=8686
-Dcom.sun.management.jmxremote
-XX:+PrintReferenceGC
-XX:+PrintAdaptiveSizePolicy
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-Xloggc:logs/gc.log
-verbose:gc
-XX:+AggressiveOpts
-XX:+UseLargePages
-XX:InitiatingHeapOccupancyPercent=75
-XX:MaxGCPauseMillis=250
-XX:G1HeapRegionSize=8m
-XX:+ParallelRefProcEnabled
-XX:+PerfDisableSharedMem
-XX:+UseG1GC
-Dlog4j.configuration=file:etc/log4j.properties
-Xmx8192M
-Xms4096M

The OS is CentOS 6, with the following Java and kernel:

java version "1.7.0_72"
Java(TM) SE Runtime Environment (build 1.7.0_72-b14)
Java HotSpot(TM) 64-Bit Server VM (build 24.72-b04, mixed mode)

Linux idxa1 2.6.32-431.11.2.el6.centos.plus.x86_64 #1 SMP Tue Mar 25
21:36:54 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

I also just grabbed a screenshot from my dev server, running Ubuntu 14,
Solr 6.6.0, a LOT more index data, and a more recent Java version.  Solr
has an uptime of about one month.  This server was installed with the
service installer script, so it uses bin/solr.  It doesn't seem to have
the same problem:

https://www.dropbox.com/s/85h1weuopa643za/bigindy5-top-small-shr.png?dl=0

java version "1.8.0_144"
Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)

Linux bigindy5 3.13.0-105-generic #152-Ubuntu SMP Fri Dec 2 15:37:11 UTC
2016 x86_64 x86_64 x86_64 GNU/Linux

The arguments for this one are very similar to the production server:

-DSTOP.KEY=solrrocks
-DSTOP.PORT=7982
-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.local.only=false
-Dcom.sun.management.jmxremote.port=18982
-Dcom.sun.management.jmxremote.rmi.port=18982
-Dcom.sun.management.jmxremote.ssl=false
-Djetty.home=/opt/solr6/server
-Djetty.port=8982
-Dlog4j.configuration=file:/index/solr6/log4j.properties
-Dsolr.install.dir=/opt/solr6
-Dsolr.log.dir=/index/solr6/logs
-Dsolr.log.muteconsole
-Dsolr.solr.home=/index/solr6/data
-Duser.timezone=UTC
-XX:+AggressiveOpts
-XX:+ParallelRefProcEnabled
-XX:+PrintGCApplicationStoppedTime
-XX:+PrintGCDateStamps
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-XX:+PrintHeapAtGC
-XX:+PrintTenuringDistribution
-XX:+UseG1GC
-XX:+UseGCLogFileRotation
-XX:+UseLargePages
-XX:G1HeapRegionSize=8m
-XX:GCLogFileSize=20M
-XX:InitiatingHeapOccupancyPercent=75
-XX:MaxGCPauseMillis=250
-XX:NumberOfGCLogFiles=9
-XX:OnOutOfMemoryError=/opt/solr6/bin/oom_solr.sh 8982 /index/solr6/logs
-Xloggc:/index/solr6/logs/solr_gc.log
-Xms28g
-Xmx28g
-Xss256k
-verbose:gc

Neither system has any huge pages allocated in the OS, so I doubt that
the UseLargePages option is actually doing anything.  I've left it there
in case I *do* enable huge pages, so they will automatically get used.
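A quick way to check that, for reference (standard /proc interface; HugePages here means explicitly reserved huge pages, not THP):

grep -i huge /proc/meminfo

If HugePages_Total is 0, nothing is reserved and -XX:+UseLargePages has nothing to use.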

Thanks,
Shawn
Shalin Shekhar Mangar
2017-08-24 03:54:03 UTC
Permalink
Very interesting. Do you have many DocValues fields? Have you always
had them, i.e. did you see this problem before you turned on DocValues?
The DocValues fields are in a separate file and they will be memory
mapped on demand. One thing you can experiment with is the
preload=true option on MMapDirectoryFactory, which will mmap all
index files on startup [1]. Once you do this, and if you still notice
shared memory leakage, then it may be a genuine memory leak that we
should investigate.

[1] - http://lucene.apache.org/solr/guide/6_6/datadir-and-directoryfactory-in-solrconfig.html#DataDirandDirectoryFactoryinSolrConfig-SpecifyingtheDirectoryFactoryForYourIndex
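
For reference, a minimal sketch of what that looks like in solrconfig.xml (the factory class and option name are the ones documented on the guide page above; everything else is illustrative):

<directoryFactory name="DirectoryFactory" class="solr.MMapDirectoryFactory">
  <bool name="preload">true</bool>
</directoryFactory>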

--
Regards,
Shalin Shekhar Mangar.
Markus Jelsma
2017-08-23 15:10:44 UTC
Permalink
I have the problem both in production and locally, with default Solr 6.6 JVM arguments. The environments are:

openjdk version "1.8.0_141"
OpenJDK Runtime Environment (build 1.8.0_141-8u141-b15-1~deb9u1-b15)
OpenJDK 64-Bit Server VM (build 25.141-b15, mixed mode)
Linux idx1 4.9.0-3-amd64 #1 SMP Debian 4.9.30-2+deb9u1 (2017-06-18) x86_64 GNU/Linux

and

openjdk version "1.8.0_131"
OpenJDK Runtime Environment (build 1.8.0_131-8u131-b11-2ubuntu1.17.04.3-b11)
OpenJDK 64-Bit Server VM (build 25.131-b11, mixed mode)
Linux midas 4.10.0-32-generic #36-Ubuntu SMP Tue Aug 8 12:10:06 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

Regarding the node that shows the problem, can you reproduce it locally? Fire it up, put some data in, confirm low shared memory usage, and execute a few thousand queries against it. We immediately see a sharp rise in shared memory, MBs per second, until it reaches some sort of plateau.

Erick Erickson
2017-08-24 01:34:36 UTC
Permalink
I suspect you've already seen this, but top and similar can be
confusing when trying to interpret MMapDirectory. Uwe has an excellent
explication:

http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

Best,
Erick

Markus Jelsma
2017-08-24 09:32:35 UTC
Permalink
Hello Erick,

I know the article; it is about virtual memory. My problem is with shared memory. Correct me if I am wrong, but mmapped files do not occupy shared memory but virtual memory instead. If I am wrong, the article must be rewritten. Our main searchers show very normal numbers for virtual, which is index size plus RSS.

My problem is that RSS is more than four times our heap size. Usually RSS is heap + PermGen/metaspace + code cache + (threads * stack size) + some other stuff, so a 1 GB heap would consume about 1500 MB tops. Our RSS for Solr is 4.3 GB of which 2.85 GB is shared memory.
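As a back-of-the-envelope example of that breakdown (the per-component numbers are illustrative, not measured): 1024 MB heap + ~100 MB metaspace + ~50 MB code cache + 300 threads * 1 MB stack comes to roughly 1475 MB, in line with the 1500 MB estimate, and nowhere near 4.3 GB.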

Let me know what you think.

Thanks,
Markus

Markus Jelsma
2017-08-24 09:49:45 UTC
Permalink
Hello Shalin,

Yes, the main search index has DocValues on just a few fields; they are used for faceting and function queries, and we started using DocValues when 6.0 was released. Most fields are content fields for many languages. I don't think it is DocValues, because the maximum shared memory consumption is reduced by searching on the fields of fewer languages and by disabling highlighting, neither of which uses DocValues.

But I tried the option regardless, also because I didn't know about it. It appears the option does exactly nothing. The first line below is without any configuration for preload, the second is with preload=true, the third is with preload=false:

14220 markus 20 0 14,675g 1,508g 62800 S 1,0 9,6 0:36.98 java
14803 markus 20 0 14,674g 1,537g 63248 S 0,0 9,8 0:34.50 java
15324 markus 20 0 14,674g 1,409g 63152 S 0,0 9,0 0:35.50 java

Please correct my config if I am wrong:

<directoryFactory name="DirectoryFactory" class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}">
<bool name="preload">false</bool>
</directoryFactory>

NRTCachingDirectoryFactory implies MMapDirectory, right?

Thanks,
Markus

Bernd Fehling
2017-08-24 13:39:20 UTC
Permalink
Just an idea, how about taking a dump with jmap and using
MemoryAnalyzerTool to see what is going on?
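For example, something like (pid and output path are placeholders):

jmap -dump:live,format=b,file=/tmp/solr-heap.hprof <solr-pid>

and then open the .hprof file in the Eclipse Memory Analyzer. Keep in mind a heap dump only covers the Java heap, not the mapped or native pages that top counts in SHR.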

Regards
Bernd
Markus Jelsma
2017-08-24 15:19:49 UTC
Permalink
Hello Bernd,

According to the man page, I should get a list of stuff in shared memory if I invoke it with just a PID. That shows a list of libraries that together account for about 25 MB of shared memory usage. According to ps and top, the JVM uses 2800 MB of shared memory (not virtual), which leaves 2775 MB unaccounted for. Any ideas? Can anyone else reproduce it on a freshly restarted node?

Thanks,
Markus


PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
18901 markus 20 0 14,778g 4,965g 2,987g S 891,1 31,7 20:21.63 java

0x000055b9a17f1000 6K /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java
0x00007fdf1d314000 182K /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/libsunec.so
0x00007fdf1e548000 38K /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/libmanagement.so
0x00007fdf1e78e000 94K /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/libnet.so
0x00007fdf1e9a6000 75K /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/libnio.so
0x00007fdf5cd6e000 34K /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/libzip.so
0x00007fdf5cf77000 46K /lib/x86_64-linux-gnu/libnss_files-2.24.so
0x00007fdf5d189000 46K /lib/x86_64-linux-gnu/libnss_nis-2.24.so
0x00007fdf5d395000 90K /lib/x86_64-linux-gnu/libnsl-2.24.so
0x00007fdf5d5ae000 34K /lib/x86_64-linux-gnu/libnss_compat-2.24.so
0x00007fdf5d7b7000 187K /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/libjava.so
0x00007fdf5d9e6000 70K /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/libverify.so
0x00007fdf5dbf8000 30K /lib/x86_64-linux-gnu/librt-2.24.so
0x00007fdf5de00000 90K /lib/x86_64-linux-gnu/libgcc_s.so.1
0x00007fdf5e017000 1063K /lib/x86_64-linux-gnu/libm-2.24.so
0x00007fdf5e320000 1553K /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.22
0x00007fdf5e6a8000 15936K /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so
0x00007fdf5f5ed000 139K /lib/x86_64-linux-gnu/libpthread-2.24.so
0x00007fdf5f80b000 14K /lib/x86_64-linux-gnu/libdl-2.24.so
0x00007fdf5fa0f000 110K /lib/x86_64-linux-gnu/libz.so.1.2.11
0x00007fdf5fc2b000 1813K /lib/x86_64-linux-gnu/libc-2.24.so
0x00007fdf5fff2000 58K /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/jli/libjli.so
0x00007fdf60201000 158K /lib/x86_64-linux-gnu/ld-2.24.so
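
One rough way to account for the remaining shared pages is to sum the shared counters over all mappings in /proc/<pid>/smaps, which also covers memory-mapped files and anonymous mappings that a plain library listing does not (a sketch; field names per the Linux proc man page):

awk '/^Shared_(Clean|Dirty):/ {sum += $2} END {print sum " kB shared"}' /proc/<solr-pid>/smaps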

Kevin Risden
2017-09-02 17:15:38 UTC
Permalink
I haven't looked at reproducing this locally, but since it seems like
there haven't been any new ideas, I decided to share this in case it
helps:

I noticed in Travis CI [1] that they are adding the environment variable
MALLOC_ARENA_MAX=2, so I googled what that configuration did. To my
surprise, I came across a Stack Overflow post [2] about how glibc could
actually be the cause and report memory differently. I then found a
Hadoop issue, HADOOP-7154 [3], about setting this as well to reduce
virtual memory usage. I found some more cases where this has helped as
well: [4], [5], and [6].

[1] https://docs.travis-ci.com/user/build-environment-updates/2017-09-06/#Added
[2] https://stackoverflow.com/questions/10575342/what-would-cause-a-java-process-to-greatly-exceed-the-xmx-or-xss-limit
[3] https://issues.apache.org/jira/browse/HADOOP-7154?focusedCommentId=14505792
[4] https://github.com/cloudfoundry/java-buildpack/issues/320
[5] https://devcenter.heroku.com/articles/tuning-glibc-memory-behavior
[6] https://www.ibm.com/developerworks/community/blogs/kevgrig/entry/linux_glibc_2_10_rhel_6_malloc_may_show_excessive_virtual_memory_usage?lang=en
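
For anyone who wants to try it on a Solr node, a minimal sketch (set it in whatever environment file your install sources at startup, e.g. bin/solr.in.sh for a service install; that location is an assumption):

# cap the number of glibc malloc arenas before the JVM starts
MALLOC_ARENA_MAX=2
export MALLOC_ARENA_MAX

Then restart Solr and watch whether SHR/RES still climb the same way.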
Kevin Risden


On Thu, Aug 24, 2017 at 10:19 AM, Markus Jelsma
Post by Markus Jelsma
Hello Bernd,
According to the man page, i should get a list of stuff in shared memory if i invoke it with just a PID. Which shows a list of libraries that together account for about 25 MB's shared memory usage. Accoring to ps and top, the JVM uses 2800 MB shared memory (not virtual), that leaves 2775 MB unaccounted for. Any ideas? Anyone else to reproduce it on a freshly restarted node?
Thanks,
Markus
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
18901 markus 20 0 14,778g 4,965g 2,987g S 891,1 31,7 20:21.63 java
0x000055b9a17f1000 6K /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java
0x00007fdf1d314000 182K /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/libsunec.so
0x00007fdf1e548000 38K /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/libmanagement.so
0x00007fdf1e78e000 94K /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/libnet.so
0x00007fdf1e9a6000 75K /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/libnio.so
0x00007fdf5cd6e000 34K /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/libzip.so
0x00007fdf5cf77000 46K /lib/x86_64-linux-gnu/libnss_files-2.24.so
0x00007fdf5d189000 46K /lib/x86_64-linux-gnu/libnss_nis-2.24.so
0x00007fdf5d395000 90K /lib/x86_64-linux-gnu/libnsl-2.24.so
0x00007fdf5d5ae000 34K /lib/x86_64-linux-gnu/libnss_compat-2.24.so
0x00007fdf5d7b7000 187K /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/libjava.so
0x00007fdf5d9e6000 70K /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/libverify.so
0x00007fdf5dbf8000 30K /lib/x86_64-linux-gnu/librt-2.24.so
0x00007fdf5de00000 90K /lib/x86_64-linux-gnu/libgcc_s.so.1
0x00007fdf5e017000 1063K /lib/x86_64-linux-gnu/libm-2.24.so
0x00007fdf5e320000 1553K /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.22
0x00007fdf5e6a8000 15936K /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so
0x00007fdf5f5ed000 139K /lib/x86_64-linux-gnu/libpthread-2.24.so
0x00007fdf5f80b000 14K /lib/x86_64-linux-gnu/libdl-2.24.so
0x00007fdf5fa0f000 110K /lib/x86_64-linux-gnu/libz.so.1.2.11
0x00007fdf5fc2b000 1813K /lib/x86_64-linux-gnu/libc-2.24.so
0x00007fdf5fff2000 58K /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/jli/libjli.so
0x00007fdf60201000 158K /lib/x86_64-linux-gnu/ld-2.24.so
-----Original message-----
Sent: Thursday 24th August 2017 15:39
Subject: Re: Solr uses lots of shared memory!
Just an idea, how about taking a dump with jmap and using
MemoryAnalyzerTool to see what is going on?
Regards
Bernd
Post by Markus Jelsma
Hello Shalin,
Yes, the main search index has DocValues on just a few fields, they are used for facetting and function queries, we started using DocValues when 6.0 was released. Most fields are content fields for many languages. I don't think it is going to be DocValues because the max shared memory consumption is reduced my searching on fields fewer languages, and by disabling highlighting, both not using DocValues.
But it tried the option regardless, and because i didn't know about it. But it appears the option does exactly nothing. First is without any configuration for preload, second is with preload=true, third is preload=false
14220 markus 20 0 14,675g 1,508g 62800 S 1,0 9,6 0:36.98 java
14803 markus 20 0 14,674g 1,537g 63248 S 0,0 9,8 0:34.50 java
15324 markus 20 0 14,674g 1,409g 63152 S 0,0 9,0 0:35.50 java
<directoryFactory name="DirectoryFactory" class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}">
<bool name="preload">false</bool>
</directoryFactory>
NRTCachingDirectoryFactory implies MMapDirectory right?
Thanks,
Markus
-----Original message-----
Sent: Thursday 24th August 2017 5:51
Subject: Re: Solr uses lots of shared memory!
Very interesting. Do you have many DocValue fields? Have you always
had them i.e. did you see this problem before you turned on DocValues?
The DocValue fields are in a separate file and they will be memory
mapped on demand. One thing you can experiment with is to use
preload=true option on the MMapDirectoryFactory which will mmap all
index files on startup [1]. Once you do this, and if you still notice
shared memory leakage then it may be a genuine memory leak that we
should investigate.
[1] - http://lucene.apache.org/solr/guide/6_6/datadir-and-directoryfactory-in-solrconfig.html#DataDirandDirectoryFactoryinSolrConfig-SpecifyingtheDirectoryFactoryForYourIndex
On Wed, Aug 23, 2017 at 7:02 PM, Markus Jelsma
Post by Markus Jelsma
I do not think it is a problem of reporting after watching top after restart of some Solr instances, it dropped back to `normal`, around 350 MB, which i think it high to but anyway.
Two hours later, the restarted nodes are slowly increasing shared memory consumption to about 1500 MB now. I don't understand why shared memory usage should/would increase slowly over time, it makes little sense to me and i cannot remember Solr doing this in the past ten years.
But it seems to correlate to index size on disk, these main text search nodes have an index of around 16 GB and up 3 GB of shared memory after a few days. Logs nodes up to 800 MB index size and 320 MB of shared memory, the low latency nodes have four different cores that make up just over 100 MB index size, shared memory consumption is just 22 MB, which seems more reasonable for the case of shared memory.
I can also force Solr to 'leak' shared memory just by sending queries to it. My freshly restarted local node used 68 MB of shared memory at startup. Two minutes and 25,000 queries later it was already at 2748 MB! At first there is a very sharp increase to 2000 MB, then it takes almost two minutes more to climb to 2748 MB. I can decrease the maximum shared memory usage to 1200 MB if I query (via edismax) only on fields of one language instead of 25 or so. I can decrease it even further if I disable highlighting (HUH?) but still query on all fields.
* We have tried patching Java's ByteBuffer [1] because it seemed to fit the problem, but it does not fix it.
* We have also removed all our custom plugins, so it has become a vanilla Solr 6.6 with just our stripped-down schema and solrconfig; that does not fix it either.
Why does it slowly increase over time?
Why does it appear to correlate to index size?
Is anyone else seeing this on their 6.6 cloud production or local machines?
Thanks,
Markus
[1]: http://www.evanjones.ca/java-bytebuffer-leak.html
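One explanation that would fit these observations, as far as I understand the kernel accounting: top's SHR is taken from the third field of /proc/<pid>/statm, which counts resident file-backed pages, so every index page that a query pulls into the page cache through the mmap'd Lucene files also shows up as shared memory of the Solr process. That would explain both the correlation with index size and the slow growth under query load, since querying 25 languages or enabling highlighting touches far more index files than querying one. A small sketch of mine to check the arithmetic, with <pid> as a placeholder:

# Convert the statm page counts to kB; the third field should match top's SHR (sketch)
awk -v page_kb=$(( $(getconf PAGESIZE) / 1024 )) \
    '{ printf "RES %d kB, SHR %d kB\n", $2 * page_kb, $3 * page_kb }' \
    /proc/<pid>/statm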
-----Original message-----
Sent: Tuesday 22nd August 2017 17:32
Subject: Re: Solr uses lots of shared memory!
Post by Markus Jelsma
I have never seen this before, one of our collections, all nodes eating tons of shared memory!
10497 solr 20 0 19.439g 4.505g 3.139g S 1.0 57.8 2511:46 java
RSS is roughly equal to heap size + usual off-heap space + shared memory. Virtual is equal to RSS and index size on disk. For two other collections, the nodes use shared memory as expected, in the MB range.
How can Solr, this collection, use so much shared memory? Why?
I've seen this on my own servers at work, and when I add up a subset of
the memory numbers I can see from the system, it ends up being more
memory than I even have in the server.
I suspect there is something odd going on in how Java reports memory
usage to the OS, or maybe a glitch in how Linux interprets Java's memory
usage. At some point in the past, numbers were reported correctly. I
do not know if the change came about because of a Solr upgrade, because
of a Java upgrade, or because of an OS kernel upgrade. All three were
upgraded between when I know the numbers looked right and when I noticed
they were wrong.
https://www.dropbox.com/s/91uqlrnfghr2heo/solr-memory-sorted-top.png?dl=0
This screenshot shows that Solr is using 17GB of memory, 41.45GB of
memory is being used by the OS disk cache, and 10.23GB of memory is
free. Add those up, and it comes to 68.68GB ... but the machine only
has 64GB of memory, and that total doesn't include the memory usage of
the other processes seen in the screenshot. This impossible situation
means that something is being misreported somewhere. If I deduct that
11GB of SHR from the RES value, then all the numbers work.
The screenshot was almost 3 years ago, so I do not know what machine it
came from, and therefore I can't be sure what the actual heap size was.
I think it was about 6GB -- the difference between RES and SHR. I have
used a 6GB heap on some of my production servers in the past. The
server where I got this screenshot was not having any noticeable
performance or memory problems, so I think that I can trust that the
main numbers above the process list (which only come from the OS) are
correct.
Thanks,
Shawn
--
Regards,
Shalin Shekhar Mangar.
Rick Leir
2017-09-03 15:08:07 UTC
Permalink
Hi all
Malloc has a lock while it is active in the heap. If there is more than one thread, and malloc finds the lock in use, then it avoids waiting on the lock by creating a new 'arena' to hold its heap. My understanding is that a process with multiple threads which are all active users of malloc will eventually have an arena per thread. If you limit the number of arenas, you may suffer delays waiting on locks.

But this needs performance testing. My experience was with C++, not with a JVM. I would be interested to know if setting MALLOC_ARENA_MAX=2 makes a difference to performance.
Cheers -- Rick
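For anyone who wants to try this on a Solr node, a sketch of one way to set it, assuming the stock bin/solr start script and the service-install layout where the include file lives at /etc/default/solr.in.sh (adjust the path for other setups). The variable has to be exported so the child java process inherits it.

# Limit glibc to two malloc arenas for the Solr JVM, then restart (sketch)
echo 'export MALLOC_ARENA_MAX=2' | sudo tee -a /etc/default/solr.in.sh
sudo service solr restart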
Post by Kevin Risden
I haven't looked at reproducing this locally, but since it seems like there haven't been any new ideas, I decided to share this in case it helps.
I noticed in Travis CI [1] they are adding the environment variable MALLOC_ARENA_MAX=2, so I googled what that configuration does. To my surprise, I came across a Stack Overflow post [2] about how glibc could actually be the cause and report memory differently. I then found a Hadoop issue, HADOOP-7154 [3], about setting this as well to reduce virtual memory usage. I found some more cases where this has helped as well [4], [5], and [6].
[1]
https://docs.travis-ci.com/user/build-environment-updates/2017-09-06/#Added
[2]
https://stackoverflow.com/questions/10575342/what-would-cause-a-java-process-to-greatly-exceed-the-xmx-or-xss-limit
[3]
https://issues.apache.org/jira/browse/HADOOP-7154?focusedCommentId=14505792
[4] https://github.com/cloudfoundry/java-buildpack/issues/320
[5] https://devcenter.heroku.com/articles/tuning-glibc-memory-behavior
[6]
https://www.ibm.com/developerworks/community/blogs/kevgrig/entry/linux_glibc_2_10_rhel_6_malloc_may_show_excessive_virtual_memory_usage?lang=en
Kevin Risden
--
Sorry for being brief. Alternate email is rickleir at yahoo dot com
Markus Jelsma
2017-09-04 11:03:57 UTC
Permalink
Hello Kevin, Rick,

These are interesting points indeed. But this thread is about shared memory, not virtual memory.

Any value higher than 0 for MALLOC_ARENA_MAX only reduces virtual memory consumption, from 22 GB to 16 GB. There is no difference in shared memory or resident memory.

Although interesting, it unfortunately does not answer the question.

To answer Rick's question: the difference with MALLOC_ARENA_MAX=2 is less virtual memory, but no change in queries/second.

Thanks,
Markus
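For completeness, a rough sketch of how such a before/after comparison could be captured, once with MALLOC_ARENA_MAX=2 exported and once without; <pid>, the collection name and the query are placeholders, and ApacheBench is assumed to be installed.

# Record VIRT/RES/SHR for the Solr JVM (sketch)
top -b -n 1 -p <pid> | tail -2
# Measure query throughput to see whether the arena limit costs performance (sketch)
ab -n 10000 -c 8 'http://localhost:8983/solr/<collection>/select?q=*:*&wt=json'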

-----Original message-----
Sent: Sunday 3rd September 2017 17:08
Subject: Re: Solr uses lots of shared memory!
Hi all
Malloc has a lock while it is active in the heap. If there is more than one thread, and malloc finds the lock in use, then it avoids waiting on the lock by creating a new 'arena' to hold its heap. My understanding is that a process with multiple threads which are all active users of malloc will eventually have an arena per thread. If you limit the number of arenas, you may suffer delays waiting on locks.
But this needs performance testing. My experience was with C++, not with a JVM. I would be interested to know if setting MALLOC_ARENA_MAX=2 makes a difference to performance.
Cheers -- Rick