How to size document cache

Discussion:

Matteo Grolla

2013-10-25 10:28:50 UTC

Hi,
I'd really appreciate it if you could give me some help understanding how to tune the document cache.
My thoughts:

min values: max_results * max_concurrent_queries, as stated by http://wiki.apache.org/solr/SolrCaching
how can I estimate max_concurrent_queries?

size: I think there's a tension between dedicating memory to this cache and reducing the Java heap size so the OS can buffer more of the on-disk index
I could probably try increasing this value if I see a strong improvement in the hit ratio (the documents returned are a small subset of all docs)

If I have enough RAM that the whole index fits in memory, can I just ignore this cache (maybe keeping it just above the recommended min values)?

Matteo
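
A minimal sketch of the minimum-size arithmetic from the question above (max_results * max_concurrent_queries). The wiki formula is taken from the post; estimating max_concurrent_queries with Little's Law (in-flight requests = arrival rate * time per request) is an assumption of this sketch, not something the thread or the wiki prescribes. The function names and the sample numbers are hypothetical.

```python
import math


def estimate_concurrent_queries(peak_qps, avg_latency_ms):
    """Assumed estimate via Little's Law: in-flight queries = rate * latency."""
    return math.ceil(peak_qps * avg_latency_ms / 1000)


def min_document_cache_size(max_results, max_concurrent_queries):
    """Lower bound from the post: max_results * max_concurrent_queries."""
    return max_results * max_concurrent_queries


if __name__ == "__main__":
    # Hypothetical load: 200 queries/s at peak, 50 ms average query time,
    # and at most 20 documents returned per query.
    concurrent = estimate_concurrent_queries(peak_qps=200, avg_latency_ms=50)
    print(concurrent)                               # 10
    print(min_document_cache_size(20, concurrent))  # 200
```

Peak rather than average traffic is used here because the bound is meant to cover the worst case; measured values from your own query logs should replace the sample figures.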

Erick Erickson

2013-10-25 13:48:52 UTC

I hadn't thought about it before, but now I'm curious how
MMapDirectoryFactory plays into documentCache. Uwe,
are you listening? :) My _guess_ is that if you're using
MMapDirectoryFactory, the usefulness of the document
cache is lessened, kinda.

Since the documents are coming from essentially
random places in the files, you're probably going to chew
up operating system blocks keeping these around. But that's
probably no worse than chewing up Java memory and
avoids some GC churn.

OTOH, the raw disk data must be decompressed, perhaps
every time it's read, no matter whether the data comes
from the MMap IO buffers or has to be read from disk.

OTOH, unless the docs are really big, this shouldn't matter
much.

Hmmm, I guess "measure and find out" is about all I can
really offer...

Best,
Erick
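
One way to act on "measure and find out" is to compare the document cache hit ratio before and after a size change. The sketch below assumes you can take two snapshots of the cache's cumulative lookup and hit counters (for example from Solr's cache statistics); the dictionary keys and the helper name are hypothetical, and it only does the arithmetic, not the stats retrieval.

```python
def interval_hit_ratio(before, after):
    """Hit ratio for the traffic between two snapshots of cumulative counters.

    Each snapshot is a dict with "lookups" and "hits" (hypothetical keys).
    Returns None when no lookups happened in the interval.
    """
    lookups = after["lookups"] - before["lookups"]
    hits = after["hits"] - before["hits"]
    if lookups <= 0:
        return None
    return hits / lookups


if __name__ == "__main__":
    # Hypothetical counters sampled at the start and end of a test run.
    start = {"lookups": 1000, "hits": 400}
    end = {"lookups": 3000, "hits": 1900}
    print(interval_hit_ratio(start, end))  # 0.75
```

Using the difference between snapshots, rather than the cumulative ratio, keeps traffic from before the configuration change out of the comparison.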

Shawn Heisey

2013-10-25 13:59:21 UTC

Post by Erick Erickson
I hadn't thought about it before, but now I'm curious how
MMapDirectoryFactory plays into documentCache. Uwe,
are you listening? :) My _guess_ is that if you're using
MMapDirectoryFactory, the usefulness of the document
cache is lessened, kinda.
Since the documents are coming from essentially
random places in the files, you're probably going to chew
up op system blocks keeping these around. But that's
probably no worse than chewing up Java memory and
avoids some GC churn.

Solr's caches save CPU cycles as well as disk access. If results can be
returned from a Solr cache, then Solr (and ultimately Lucene) doesn't
have to go rifling through index data to figure out what the results
are. Although this process is greatly sped up when the data is in the
OS disk cache, it still isn't free.

For large-scale caching, the OS is better at the job than Solr and Java.
IMHO, the Solr caches are still important (but can be smaller) because
data that is accessed a LOT will be very readily available.

Thanks,
Shawn