Using RAMDirectoryFactory in Master/Slave setup

Discussion:

nipunb

2011-06-26 21:07:09 UTC

PS: Sorry if this is a repost, I was unable to see my message in the mailing
list - this may have been due to my outgoing email different from the one I
used to subscribe to the list with.

Overview – Trying to evaluate if keeping the index in memory using
RAMDirectoryFactory can help query performance. I am trying to perform the
indexing on the master using solr.StandardDirectoryFactory and make those
indexes accessible to the slave using solr.RAMDirectoryFactory.

Details:
We have set up Solr in a master/slave environment. The index is built on the
master and then replicated to slaves, which are used to serve queries.
The replication is done using the built-in Java replication in Solr.
On the master, in the <indexDefaults> of solrconfig.xml we have
<directoryFactory name="DirectoryFactory"
class="solr.StandardDirectoryFactory"/>

On the slave, I tried to use the following in the <indexDefaults>

<directoryFactory name="DirectoryFactory"
class="solr.RAMDirectoryFactory"/>

My slave shows no data for any queries. In solrconfig.xml it is mentioned
that replication doesn’t work when using RAMDirectoryFactory, however this (
https://issues.apache.org/jira/browse/SOLR-1379) mentions that you can use
it to have the index on disk and then load into memory.

To test the sanity of my set-up, I changed solrconfig.xml on the slave to
the following and replicated:
<directoryFactory name="DirectoryFactory"
class="solr.StandardDirectoryFactory"/>
I was able to see the results.

Shouldn’t RAMDirectoryFactory be used for reading index from disk into
memory?

Any help/pointers in the right direction would be appreciated.

Thanks!

--
View this message in context: http://lucene.472066.n3.nabble.com/Using-RAMDirectoryFactory-in-Master-Slave-setup-tp3111792p3111792.html
Sent from the Solr - User mailing list archive at Nabble.com.

Nipun Bhatia

2011-06-26 12:01:13 UTC

Permalink

Overview: Trying to evaluate if keeping the index in memory using
RAMDirectoryFactory can help query performance. I am trying to perform the
indexing on the master using solr.StandardDirectoryFactory and on the slave
using solr.RAMDirectoryFactory.

Details:
We have set up Solr in a master/slave environment. The index is built on the
master and then replicated to slaves, which are used to serve queries.
The replication is done using the built-in Java replication in Solr.
On the master, in the <indexDefaults> of solrconfig.xml we have
<directoryFactory name="DirectoryFactory"
class="solr.StandardDirectoryFactory"/>

On the slave, I tried to use the following in the <indexDefaults>

<directoryFactory name="DirectoryFactory"
class="solr.RAMDirectoryFactory"/>

My slave shows no data for any queries. In solrconfig.xml it is mentioned
that replication doesn't work when using RAMDirectoryFactory, however this (
https://issues.apache.org/jira/browse/SOLR-1379) mentions that you can use
it to have the index on disk and then load into memory.

To test the sanity of my set-up, I changed solrconfig.xml on the slave to
the following and replicated:
<directoryFactory name="DirectoryFactory"
class="solr.StandardDirectoryFactory"/>
I was able to see the results.

Shouldn't RAMDirectoryFactory be used for reading index from disk into
memory?

Any help/pointers in the right direction would be appreciated.

Thanks!

nipunb

2011-06-27 07:19:46 UTC

Permalink

I found a similar post -
http://lucene.472066.n3.nabble.com/Problems-with-RAMDirectory-in-Solr-td1575223.html
It mentions that Java-based replication might work (this is what I have
used, but it didn't work for me).
More interestingly, it points out that the OS's file system cache may be
able to do this job better.
Has anybody had a chance to compare query performance for
StandardDirectoryFactory vs RAMDirectoryFactory?


Shalin Shekhar Mangar

2011-06-27 08:50:28 UTC

Permalink

Post by nipunb
I found a similar post -
http://lucene.472066.n3.nabble.com/Problems-with-RAMDirectory-in-Solr-td1575223.html
It mentions that Java based replication might work (This is what I have
used, but didn't work for me)

Solr Replication does not work with non-file directory implementations.

--
Regards,
Shalin Shekhar Mangar.

eks dev

2011-06-27 11:29:46 UTC

Permalink

Your best bet is MMapDirectoryFactory; you can come very close to the
performance of the RAMDirectory. Unfortunately, a
Master_on_disk->Slaves_in_ram type of setup is not possible using
Solr.
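As a minimal sketch, switching the slave to a memory-mapped directory would
mirror the directoryFactory lines from the original post. This assumes the
Solr version in use ships solr.MMapDirectoryFactory and that the element
stays wherever the existing configuration already places it:

```xml
<!-- Slave solrconfig.xml: memory-map the on-disk index instead of
     loading it onto the Java heap. The index files stay on disk, so
     file-based replication from the master can still write to them. -->
<directoryFactory name="DirectoryFactory"
                  class="solr.MMapDirectoryFactory"/>
```

The master can keep solr.StandardDirectoryFactory; only the slave's read
path changes.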

We are moving our architecture to Solr at the moment, and this is one
of the missing pieces we have to somehow figure out.

The problem is that MMap works fine on average, but it also has quirks
in the upper quantiles of the response times.

If you are using RAMDirectory, you do not need to be afraid that
occasionally slow IO will kill performance for some of your requests.
This happens with MMap, and not all that rarely, depending on your usage
pattern (high update/commit rate, for example). I repeat, RAMDirectory
is hard to beat when it comes to reducing IO-caused "outliers".

We removed some 90% of the slowest response times by using
RAMDirectory instead of MMap...
Depending on what you want to optimize, MMap can work just fine for
you, and has some nice properties, e.g. you do not need to tune gc()
as much as you do when managing a bigger heap (RAMDirectory...).

But, imo, it would make sense to have some possibility to do it in solr.


nipunb

2011-06-27 18:43:05 UTC

Permalink

Thanks for the pointer to MMapDirectoryFactory.
Not having replication with RAMDirectoryFactory is a deal killer. We don't
want to index on the machines that serve queries.
From what I can gather from reading, MMapDirectory + SSD could be a happy
medium.
I'll try to evaluate these a bit more formally and post to the list.
Thanks for the help again!


Lance Norskog

2011-06-29 04:00:35 UTC

Permalink

Using RAMDirectory really does not help performance. Java garbage
collection has to work around all of the memory taken by the segments.
It works out that Solr works better (for most indexes) without using
the RAMDirectory.


--
Lance Norskog
***@gmail.com

eks dev

2011-06-29 07:35:12 UTC

Permalink

...Using RAMDirectory really does not help performance...

I kind of agree, but in my experience with lucene, there are cases
where RAMDirectory helps a lot, with all its drawbacks (huge heap and
gc() tuning).

We had very good experience with MMAP on average, but moving to
RAMDirectory with properly tuned gc() reduced 95% of "slow performers"
in upper range of response times (e.g. slowest 5% queries). On average
it made practically no difference.
Maybe this is mitigated by better warm-up in Solr than our hand-tuned
warm-up, maybe not; I do not really know.

In MMAP, you need to have really smart warm up (MMAP) to beat IO
quirks, for RAMDir you need to tune gc(), choose your poison :)

I would argue that in some cases it is very hard to tame IO quirks (e.g.
IO is a shared resource, and you never know what is really going on in a
shared app setup!). Then, just look at what happens on a major merge and
all these efforts with the native Linux directory to somehow get a grip on
that... If you have spare RAM, you are probably safer with RAMDirectory.

From the theoretical perspective, in the ideal case, RAM ought to be
faster than disk (and more expensive). If this is not the case, we did
something wrong. I have a feeling that this work Mike is doing with
in-memory Codecs (fst TermDictionary, pulsing codec & co) in Lucene 4,
native directory features ... will make RAMDirectory really obsolete
for production setups.

Cheers,
eks


Toke Eskildsen

2011-06-29 08:55:41 UTC

Permalink

Post by eks dev
In MMAP, you need to have really smart warm up (MMAP) to beat IO
quirks, for RAMDir you need to tune gc(), choose your poison :)

Other alternatives are operating system RAM disks (avoids the GC
problem) and using SSDs (nearly the same performance as RAM).

eks dev

2011-06-29 09:24:13 UTC

Permalink

Sure, SSDs or RAM disks fix these problems with IO.

Anyhow, I can really see no alternative to some in-memory index for
slaves, especially for low-latency master-slave apps (high commit rate
is a problem).

Having the possibility to run slaves in memory that are slurping updates
from the master seems to me like a preferred method (you need no
twiddling with the OS, just CPU and RAM is what you need for your slaves;
run the slave and point it to the master). I assume that update propagation
times could be better by having
some sexy ReadOnlySlaveRAMDirectorySlurpingUpdatesFromTheMaster that
does reload() directly from the master (maybe even uncommitted,
somehow NRT-likish).

Point being, lower update latency than the current 1-5 minutes (wiki
recommended values) is not going to be possible with the current
master-slave solution, due to the nature of it (commit to disk on
master, copy delta to slave disk, reload...). This is a lot of ping
pong... ES and Solandra are by nature better suited if you need update
propagation in the seconds range.
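The commit/copy/reload cycle described above is driven by the slave's
polling configuration. A minimal sketch of the slave side, assuming the
stock ReplicationHandler setup; the host name, port, and one-minute
interval are placeholders for illustration, not values taken from this
thread:

```xml
<!-- Slave solrconfig.xml: poll the master for new index versions. -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <!-- Hypothetical master location; adjust host, port and path. -->
    <str name="masterUrl">http://master-host:8983/solr/replication</str>
    <!-- Poll interval in HH:mm:ss; shorter values mean more frequent
         copy-and-reload cycles on the slave. -->
    <str name="pollInterval">00:01:00</str>
  </lst>
</requestHandler>
```

Each poll that finds a newer index version copies the changed files to the
slave's disk before reopening the index, which is why this path needs a
file-based directory implementation on the slave.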

It is just thinking aloud, and slightly off-topic... solr/lucene as it
is today, rocks anyhow.


eks dev

2011-06-29 09:26:07 UTC

Permalink

Post by Toke Eskildsen

Post by eks dev
In MMAP, you need to have really smart warm up (MMAP) to beat IO
quirks, for RAMDir you need to tune gc(), choose your poison :)

Other alternatives are operating system RAM disks (avoids the GC
problem) and using SSDs (nearly the same performance as RAM).