Discussion:
SOLR/Tomcat6 keeping references to deleted tlog files
Eric Bus
2013-10-22 08:00:18 UTC
Hi,

I've been running a 3-node SolrCloud setup on SOLR 4.4 for some time. The cloud hosts about 40 small collections that receive updates once a day. The collections use different shard and replication configurations (varying from 2 shards without replication to 2 shards with 3 replicas).

After running Tomcat for a couple of weeks, I notice the number of open files is dramatically increasing. Most of those files are deleted tlog files that SOLR keeps open:

***@node1:/ # lsof -np 16810 | grep deleted | wc -l
36345

Those files are no longer on disk, but SOLR still has a handle open. My disk use is going through the roof. 6GB is currently 'in use' by deleted but still open files. When I restart Tomcat, the space is freed and it starts all over again. All of my nodes experience this behavior.
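
As a sketch of how the handle count and the held disk space could be measured together: on Linux, each entry in /proc/<pid>/fd is a symlink whose target ends in " (deleted)" once the file has been unlinked, and stat-ing the entry still reports the size of the open file. The function name and the "tlog" substring filter below are illustrative assumptions, not anything provided by Solr or lsof.

```python
import os
import sys


def deleted_open_files(fd_dir, name_filter="tlog"):
    """Return (target, size_bytes) for every fd in fd_dir that points at a
    deleted file whose path contains name_filter (hypothetical helper)."""
    results = []
    for entry in os.listdir(fd_dir):
        link = os.path.join(fd_dir, entry)
        try:
            target = os.readlink(link)
        except OSError:
            continue  # fd was closed while scanning, or entry is not a symlink
        if not target.endswith(" (deleted)") or name_filter not in target:
            continue
        try:
            # stat follows the link to the still-open inode on a live /proc entry
            size = os.stat(link).st_size
        except OSError:
            size = 0  # size not available; count the handle anyway
        results.append((target, size))
    return results


if __name__ == "__main__" and len(sys.argv) > 1:
    pid = sys.argv[1]  # e.g. the Tomcat PID passed to lsof above
    files = deleted_open_files(f"/proc/{pid}/fd")
    total = sum(size for _, size in files)
    print(f"{len(files)} deleted tlog files still open, "
          f"{total / 2**20:.1f} MiB held")
```

Run as the user that owns the Tomcat process (or as root), with the PID as the only argument. Logging the two numbers from cron would give a history to compare against the indexing schedule.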

First I thought it had something to do with the lack of commits. But it happens on all my collections, even the ones with fast autoCommit:

<autoCommit>
  <maxDocs>5000</maxDocs>
  <maxTime>120000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

My update process always triggers a commit or rollback and updates are showing up correctly.

I read something about SOLR having TCP connections in CLOSE_WAIT. The only CLOSE_WAIT connections I see are between the nodes, and there are only about 10 of them. Those connections can't be causing 36k open files, right?

Any suggestions/tips? At the moment, I have to restart my leader every couple of weeks and that's not really something I would like to do :)

Best regards,
Eric Bus
Erick Erickson
2013-10-22 09:13:30 UTC
Hmmmm, sounds like you've put some time into sleuthing here, cool!

Do you notice that your open file handles are increasing roughly
linearly with time? Assuming a relatively constant indexing rate, that's
what I'd expect if Solr is just failing to close the tlog somehow.
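
One way to check for roughly linear growth, as a sketch only: sample the count of deleted-but-open handles at intervals (for example with the lsof pipeline from the original post) and fit a straight line to the samples. The function name and the sample layout below are assumptions made for illustration.

```python
def linear_fit(samples):
    """Least-squares fit of handle count against time.

    samples: list of (seconds_since_start, open_handle_count) pairs.
    Returns (slope_per_second, intercept, r_squared).
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_c = sum(c for _, c in samples) / n
    s_tt = sum((t - mean_t) ** 2 for t, _ in samples)
    s_tc = sum((t - mean_t) * (c - mean_c) for t, c in samples)
    s_cc = sum((c - mean_c) ** 2 for _, c in samples)
    if s_tt == 0:
        raise ValueError("samples must span more than one point in time")
    slope = s_tc / s_tt
    intercept = mean_c - slope * mean_t
    # A constant count is fitted exactly by a flat line, so report 1.0 there.
    r_squared = 1.0 if s_cc == 0 else (s_tc ** 2) / (s_tt * s_cc)
    return slope, intercept, r_squared
```

A slope that stays positive with r_squared close to 1 over several days would be consistent with handles leaking at a steady rate; with updates arriving only once a day, a step pattern tied to the daily run is also plausible, so samples should cover more than one update cycle.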

I'm assuming no custom code here, thought I'd check to be sure though.

But what I'd do is wait a few more hours and see if some of the people deep
into SolrCloud answer (Yonik, Shalin, Noble, Mark, etc.). Those folks are
scattered all over the world. Absent a response from them, though, this
sounds like a JIRA in the making to me.

Best,
Erick

P.S.
This is really a bit unrelated, but unless you're only indexing documents
very slowly, your maxDocs setting is rather low, FWIW. This should have no
bearing on the increasing file handles, though; it's just a side comment.