Discussion:
Using RAMDirectoryFactory in Master/Slave setup
nipunb
2011-06-26 21:07:09 UTC
PS: Sorry if this is a repost. I was unable to see my message on the mailing
list; this may be because my outgoing email address differs from the one I
used to subscribe to the list.

Overview: I am trying to evaluate whether keeping the index in memory using
RAMDirectoryFactory can improve query performance. I am trying to perform the
indexing on the master using solr.StandardDirectoryFactory and make those
indexes accessible to the slave using solr.RAMDirectoryFactory.

Details:
We have set up Solr in a master/slave environment. The index is built on the
master and then replicated to slaves, which are used to serve queries.
The replication is done using Solr's built-in Java replication.
On the master, in the <indexDefaults> of solrconfig.xml we have
<directoryFactory name="DirectoryFactory"
class="solr.StandardDirectoryFactory"/>

On the slave, I tried to use the following in the <indexDefaults>

<directoryFactory name="DirectoryFactory"
class="solr.RAMDirectoryFactory"/>
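For reference, Solr's built-in Java replication is configured through solr.ReplicationHandler in solrconfig.xml on both machines. The following is a minimal sketch in Solr 1.4/3.x syntax. The host name master-host, port 8983, the confFiles list and the 60-second poll interval are placeholder values, not taken from this setup. The slave's handler downloads changed index files into its local index directory, so it relies on a file-based directory on the slave.

```xml
<!-- Master solrconfig.xml: offer a new index version after startup and every commit -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">startup</str>
    <str name="replicateAfter">commit</str>
    <!-- placeholder: config files to ship to the slaves -->
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>

<!-- Slave solrconfig.xml: poll the master and pull changed index files.
     master-host:8983 is a placeholder; pollInterval is HH:mm:ss -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
```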

My slave shows no data for any queries. The comments in solrconfig.xml mention
that replication doesn’t work when using RAMDirectoryFactory; however, SOLR-1379
(https://issues.apache.org/jira/browse/SOLR-1379) mentions that you can use
it to keep the index on disk and then load it into memory.

To sanity-check my setup, I changed solrconfig.xml on the slave to the
following and replicated:
<directoryFactory name="DirectoryFactory"
class="solr.StandardDirectoryFactory"/>
I was able to see the results.

Isn’t RAMDirectoryFactory supposed to read the index from disk into
memory?

Any help/pointers in the right direction would be appreciated.

Thanks!

--
View this message in context: http://lucene.472066.n3.nabble.com/Using-RAMDirectoryFactory-in-Master-Slave-setup-tp3111792p3111792.html
Sent from the Solr - User mailing list archive at Nabble.com.
Nipun Bhatia
2011-06-26 12:01:13 UTC
Overview: I am trying to evaluate whether keeping the index in memory using
RAMDirectoryFactory can improve query performance. I am trying to perform the
indexing on the master using solr.StandardDirectoryFactory and on the slave
using solr.RAMDirectoryFactory.

Details:
We have set up Solr in a master/slave environment. The index is built on the
master and then replicated to slaves, which are used to serve queries.
The replication is done using Solr's built-in Java replication.
On the master, in the <indexDefaults> of solrconfig.xml we have
<directoryFactory name="DirectoryFactory"
class="solr.StandardDirectoryFactory"/>

On the slave, I tried to use the following in the <indexDefaults>

<directoryFactory name="DirectoryFactory"
class="solr.RAMDirectoryFactory"/>

My slave shows no data for any queries. The comments in solrconfig.xml mention
that replication doesn’t work when using RAMDirectoryFactory; however, SOLR-1379
(https://issues.apache.org/jira/browse/SOLR-1379) mentions that you can use
it to keep the index on disk and then load it into memory.

To sanity-check my setup, I changed solrconfig.xml on the slave to the
following and replicated:
<directoryFactory name="DirectoryFactory"
class="solr.StandardDirectoryFactory"/>
I was able to see the results.

Isn’t RAMDirectoryFactory supposed to read the index from disk into
memory?

Any help/pointers in the right direction would be appreciated.

Thanks!
nipunb
2011-06-27 07:19:46 UTC
I found a similar post:
http://lucene.472066.n3.nabble.com/Problems-with-RAMDirectory-in-Solr-td1575223.html
It mentions that Java-based replication might work (this is what I have
used, but it didn't work for me).
More interestingly, it points out that the OS file system cache may be able to
do this job better.
Has anybody had a chance to compare query performance for
StandardDirectoryFactory vs. RAMDirectoryFactory?
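One way to run such a comparison without maintaining two config files is Solr's ${name:default} system-property substitution in solrconfig.xml, so the directory implementation can be chosen when the JVM starts. A minimal sketch; the property name solr.directoryFactory is an assumption chosen for illustration (any name works). As noted in this thread, a RAMDirectoryFactory core will not receive replicated data, so the RAM-based side of the test would have to be indexed locally.

```xml
<!-- Defaults to the file-based factory. Override at startup with, e.g.,
     -Dsolr.directoryFactory=solr.RAMDirectoryFactory
     (hypothetical property name chosen for this sketch) -->
<directoryFactory name="DirectoryFactory"
                  class="${solr.directoryFactory:solr.StandardDirectoryFactory}"/>
```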

Shalin Shekhar Mangar
2011-06-27 08:50:28 UTC
Post by nipunb
I found a similar post -
http://lucene.472066.n3.nabble.com/Problems-with-RAMDirectory-in-Solr-td1575223.html
It mentions that Java based replication might work (This is what I have
used, but didn't work for me)
Solr Replication does not work with non-file directory implementations.
--
Regards,
Shalin Shekhar Mangar.
eks dev
2011-06-27 11:29:46 UTC
Your best bet is MMapDirectoryFactory; you can come very close to the
performance of RAMDirectory. Unfortunately, a Master_on_disk->Slaves_in_ram
type of setup is not possible with Solr.
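On the slave, switching to the memory-mapped directory is a one-line change. A sketch, assuming a Solr release that ships solr.MMapDirectoryFactory and a 64-bit JVM (memory-mapping a large index needs the address space). Because MMapDirectory reads ordinary index files, Java replication keeps working; the OS page cache decides which parts of the index stay in RAM.

```xml
<!-- Slave solrconfig.xml: memory-mapped, file-based index.
     Replicated files land on disk and are paged in by the OS on demand. -->
<directoryFactory name="DirectoryFactory"
                  class="solr.MMapDirectoryFactory"/>
```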

We are moving our architecture to Solr at the moment, and this is one
of the "missing" pieces we still have to figure out.

The problem is that MMap works fine on average, but it also has quirks
in the upper quantiles of response times.

If you are using RAMDirectory, you do not need to worry that
occasionally slow IO will kill performance for some of your requests.
This happens with MMap, and not all that rarely, depending on your usage
pattern (a high update/commit rate, for example). I repeat: RAMDirectory
is unbeatable when it comes to reducing IO-caused "outliers".

We removed some 90% of the slowest response times by using
RAMDirectory instead of MMap...
Depending on what you want to optimize, MMap can work just fine for
you and has some nice properties, e.g. you do not need to tune GC
as much as when you manage a bigger heap (RAMDirectory...).

But, IMO, it would make sense to have some way to do this in Solr.
nipunb
2011-06-27 18:43:05 UTC
Thanks for the pointer to MMapDirectoryFactory.
Not having replication with RAMDirectoryFactory is a deal breaker. We don't
want to index on the machines that serve queries.
From what I can gather from my reading, MMapDirectory + SSD could be a happy
medium.
I'll try to evaluate these a bit more formally and post the results to the list.
Thanks again for the help!

Lance Norskog
2011-06-29 04:00:35 UTC
Using RAMDirectory really does not help performance. Java garbage
collection has to work around all of the memory taken by the segments.
In practice, Solr works better (for most indexes) without
RAMDirectory.
--
Lance Norskog
***@gmail.com
eks dev
2011-06-29 07:35:12 UTC
...Using RAMDirectory really does not help performance...

I kind of agree, but in my experience with Lucene, there are cases
where RAMDirectory helps a lot, despite all its drawbacks (huge heap and
GC tuning).

We had a very good experience with MMap on average, but moving to
RAMDirectory with properly tuned GC removed 95% of the "slow performers"
in the upper range of response times (e.g. the slowest 5% of queries). On average
it made practically no difference.
Maybe this is mitigated by better warm-up in Solr than our hand-tuned
warm-up, maybe not; I do not really know.
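Why a large tail improvement can leave the average almost unchanged can be made precise. A short derivation, assuming n response times sorted as t_1 ≤ t_2 ≤ ... ≤ t_n, treating the "slowest 5%" as the index set T above the empirical 95th percentile, and assuming only the queries in T change, each by Δt_i:

```latex
\bar{t} = \frac{1}{n}\sum_{i=1}^{n} t_i, \qquad
T = \{\, i : i > \lceil 0.95\,n \rceil \,\}, \qquad
|T| = n - \lceil 0.95\,n \rceil \le 0.05\,n

\Delta\bar{t} = \frac{1}{n}\sum_{i \in T} \Delta t_i
\;\Longrightarrow\;
|\Delta\bar{t}| \le \frac{|T|}{n}\,\max_{i \in T}|\Delta t_i| \le 0.05\,\max_{i \in T}|\Delta t_i|
```

In words: a change confined to the slowest 5% moves the mean by at most one twentieth of the largest per-query change, which is why percentiles such as p95 or p99 are the more informative metric when comparing MMap against RAMDirectory.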

With MMap, you need really smart warm-up to beat IO
quirks; with RAMDirectory, you need to tune GC. Choose your poison :)
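On the Solr side, warm-up of a newly opened searcher (for example after a slave installs a replicated index) can be scripted in solrconfig.xml with solr.QuerySenderListener; running representative queries also touches the index files, which can help page a memory-mapped index into the OS cache. A minimal sketch; the query strings and the field names title, price and category are hypothetical placeholders to be replaced with queries typical of real traffic.

```xml
<!-- Inside the <query> section of solrconfig.xml -->
<!-- Runs each time a new searcher is opened, e.g. after a commit
     or after the slave installs a replicated index -->
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <!-- hypothetical warm-up queries: replace with typical production queries -->
    <lst><str name="q">title:solr</str><str name="sort">price asc</str></lst>
    <lst><str name="q">*:*</str><str name="facet">true</str><str name="facet.field">category</str></lst>
  </arr>
</listener>
<!-- Runs once at core startup, when there is no previous searcher to warm from -->
<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">*:*</str></lst>
  </arr>
</listener>
```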

I would argue that in some cases it is very hard to tame IO quirks (e.g. IO is a
shared resource, and you never know what is really going on in a shared app
setup!). Then just look at what happens during a major merge, and all the
efforts with a native Linux directory to somehow get a grip on that...
If you have spare RAM, you are probably safer with RAMDirectory.
From a theoretical perspective, in the ideal case RAM ought to be
faster than disk (and more expensive). If this is not the case, we did
something wrong. I have a feeling that the work Mike is doing on
in-memory codecs (FST term dictionary, pulsing codec & co.) in Lucene 4,
native directory features, ... will make RAMDirectory really obsolete
for production setups.


Cheers,
eks
Toke Eskildsen
2011-06-29 08:55:41 UTC
Post by eks dev
In MMAP, you need to have really smart warm up (MMAP) to beat IO
quirks, for RAMDir you need to tune gc(), choose your poison :)
Other alternatives are operating system RAM disks (which avoid the GC
problem) and SSDs (nearly the same performance as RAM).
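An OS RAM disk also sidesteps the replication limitation mentioned earlier in the thread, because Solr still sees an ordinary file system and can keep a file-based directory factory. A minimal sketch for a slave, assuming a RAM-backed file system (such as a Linux tmpfs mount) already exists at the hypothetical path /mnt/solr-ramdisk. Its contents vanish on reboot, so the slave would have to re-replicate the full index after a restart, and the RAM disk needs headroom for the files downloaded during replication.

```xml
<!-- Slave solrconfig.xml: keep the index on a RAM-backed file system.
     /mnt/solr-ramdisk is a hypothetical mount point. -->
<dataDir>/mnt/solr-ramdisk/solr/data</dataDir>

<!-- Still a file-based factory, so Java replication keeps working -->
<directoryFactory name="DirectoryFactory"
                  class="solr.StandardDirectoryFactory"/>
```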
eks dev
2011-06-29 09:24:13 UTC
Sure, SSDs or RAM disks fix these IO problems.

Anyhow, I really see no alternative to some kind of in-memory index for
slaves, especially for low-latency master-slave apps (a high commit rate
is a problem).

Having the possibility to run slaves in memory that slurp updates
from the master seems to me like the preferred method (you need no
twiddling with the OS; CPU and RAM are all you need for your slaves:
run a slave and point it at the master). I assume that update propagation
times could be better with
some sexy ReadOnlySlaveRAMDirectorySlurpingUpdatesFromTheMaster that
does reload() directly from the master (maybe even uncommitted,
somehow NRT-like).

Point being, lower update latency than the current 1-5 minutes (the wiki's
recommended values) is not going to be possible with the current
master-slave solution, due to its nature (commit to disk on the
master, copy the delta to the slave's disk, reload...). This is a lot of
ping-pong... ES and Solandra are by nature better suited if you need update
propagation in the seconds range.
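The round trip described here can be written as a sum of stages. A rough model, assuming the slave polls the master every P seconds (its pollInterval setting) and that commits happen at random times relative to the poll schedule:

```latex
T_{\text{propagate}} \approx T_{\text{commit}} + T_{\text{wait}} + T_{\text{copy}} + T_{\text{reload}},
\qquad 0 \le T_{\text{wait}} < P,
\qquad \mathbb{E}[T_{\text{wait}}] = \frac{P}{2}
```

Here T_commit is the commit on the master, T_wait the time until the slave's next poll, T_copy the transfer of changed index files, and T_reload opening and warming the new searcher on the slave. Shrinking P only reduces the wait term; the commit, copy and reload stages remain, which is the structural limit described above.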

It is just thinking aloud, and slightly off-topic... Solr/Lucene as it
is today rocks anyhow.