solrj - Batching and Optimistic Concurrency
lstusr 5u93n4
2018-12-03 19:57:35 UTC
Hi All,

I have a scenario where I'm trying to enable batching on the solrj client,
and I'm trying to understand how that interacts with Optimistic Concurrency.

From what I can tell, if I pass a list of SolrInputDocument to my solr
client, and a document somewhere in that list contains a `_version_` field
that would cause the Optimistic Concurrency check to fail:
- all documents in the list before the conflicting doc get saved correctly.
- no documents in the list after the conflicting doc get saved.

What I would really like is to send a list of documents to solr, set the
_version_ on all of these documents to -1 so that they don't save if they
already exist, and have solr save all of the "new" documents in the list.
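For readers unfamiliar with why -1 means "only save if new", here is a minimal sketch of the `_version_` rules as the Solr Reference Guide describes them. It is a plain-Java model for illustration, not Solr code; the class and method names (`VersionCheck`, `wouldAccept`) are hypothetical.

```java
final class VersionCheck {
    /**
     * Models the documented optimistic concurrency rules for one update.
     * requested: the _version_ value sent with the document.
     * existing:  the version of the stored document, or null if none exists.
     */
    static boolean wouldAccept(long requested, Long existing) {
        if (requested > 1) {
            // Must match the stored version exactly.
            return existing != null && existing.longValue() == requested;
        }
        if (requested == 1) {
            // Document must exist; its version is not compared.
            return existing != null;
        }
        if (requested < 0) {
            // Document must not exist yet.
            return existing == null;
        }
        // requested == 0: no concurrency check at all.
        return true;
    }

    public static void main(String[] args) {
        System.out.println(wouldAccept(-1L, null)); // new document: accepted
        System.out.println(wouldAccept(-1L, 42L));  // already exists: rejected
    }
}
```

Under this model, a batch where every document carries `_version_` = -1 would only conflict on IDs that are already indexed, which is the "don't overwrite" behaviour asked about here.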

So three questions related to this:

1) Is Optimistic Concurrency the best mechanism for this, or is there some
other "don't overwrite" flag I can set that would work better?

2) If Optimistic Concurrency is the right way to go, is there a mode that I
can set that would allow ALL non-conflicting documents in a batch to be
saved?

3) If questions 1 or 2 are not possible, I could trap the resulting
RouteException with a 409 code and remove the offending document from the
list. But:
a) can I safely remove ALL documents in the list before the offending
one, assuming they've been saved?
b) is there a better way to get the ID of the offending document besides
parsing the `Error from server at
http://my.solr.instance:8983/solr/test_shard1_replica_n1: version conflict
for doc2` string from the exception?
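If parsing the message does turn out to be necessary, a minimal sketch of the string-parsing fallback from 3b could look like the following. It assumes the message contains `version conflict for <id>` exactly as quoted above and that the ID contains no whitespace; the message format is not a stable API, and `VersionConflictParser` and `extractDocId` are hypothetical names.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class VersionConflictParser {
    // Assumed message shape, taken from the error string quoted in this post:
    // "... version conflict for <id>", possibly followed by more text.
    private static final Pattern CONFLICT =
            Pattern.compile("version conflict for (\\S+)");

    /** Returns the conflicting document ID, or empty if the message does not match. */
    static Optional<String> extractDocId(String message) {
        if (message == null) {
            return Optional.empty();
        }
        Matcher m = CONFLICT.matcher(message);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String msg = "Error from server at "
                + "http://my.solr.instance:8983/solr/test_shard1_replica_n1: "
                + "version conflict for doc2";
        System.out.println(extractDocId(msg).orElse("<no conflict found>"));
    }
}
```

Returning an `Optional` keeps the caller honest about messages that do not match, for example errors unrelated to versioning.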

Thanks!

Kyle
Erick Erickson
2018-12-03 21:04:39 UTC
And I forgot to mention TolerantUpdateProcessor, might be another approach.
You can add, say, a ScriptUpdateProcessor that checks this for you
pretty easily.
Have you looked at the Overwrite=false option (assuming you're not
assigning _version_ yourself)?
Best,
Erick
lstusr 5u93n4
2018-12-04 14:51:15 UTC
Hi Erick,

Looks like TolerantUpdateProcessor is exactly what I need. Thanks!

Kyle.

P.S. I can find the doc for TolerantUpdateProcessorFactory here:
http://lucene.apache.org/solr/7_5_0/solr-core/org/apache/solr/update/processor/TolerantUpdateProcessor.html
but it seems to be missing from the guide at
https://lucene.apache.org/solr/guide/7_5/update-request-processors.html
Not sure if that's something the solr maintainers want to add, I just thought
I'd point it out for future searchers following this thread.