Discussion:
solr optimize command
Wei
2018-11-29 01:22:56 UTC
Hi,

I use the following HTTP request to start Solr index optimization:

http://localhost:8983/solr/<core>/update?skipError=true -F stream.body='
<optimize />'


The request returns status code 200 quickly, but when I look at the Solr
instance I see that the optimization has not actually completed, as there
is still more than one segment. Is the optimize command asynchronous? What
is the best way to verify that the optimize has truly completed?
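
One way to check the segment count directly, sketched in Python. This assumes the core exposes Solr's Luke request handler at `/admin/luke` and that its JSON response reports `segmentCount` under `index`. The base URL, core name, and function names are hypothetical and not taken from this thread.

```python
import json
import urllib.request
from urllib.parse import quote


def luke_url(base_url, core):
    """Build the Luke handler URL for a core (numTerms=0 skips per-field term stats)."""
    return f"{base_url.rstrip('/')}/{quote(core)}/admin/luke?numTerms=0&wt=json"


def segment_count(luke_json_text):
    """Extract the segment count from a Luke JSON response body.

    Assumes the response has the shape {"index": {"segmentCount": N, ...}}.
    """
    data = json.loads(luke_json_text)
    return data["index"]["segmentCount"]


def fetch_segment_count(base_url, core, timeout=30):
    """Query a running Solr instance and return its current segment count."""
    with urllib.request.urlopen(luke_url(base_url, core), timeout=timeout) as resp:
        return segment_count(resp.read().decode("utf-8"))
```

Calling `fetch_segment_count("http://localhost:8983/solr", "mycore")` before and after the optimize request would show whether the index has been merged down to the expected number of segments.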


Thanks,

Wei
Zheng Lin Edwin Yeo
2018-11-29 02:50:05 UTC
Hi,

How big is your index, and do you have enough space on your disk to do
the optimization? You need disk space of at least twice the index size for
the optimization to succeed, and even more if you are still indexing
during the optimization.
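
A rough pre-flight check for this rule might look like the Python sketch below. It reads "twice the disk space" as: free space must be at least the current index size, so that the old and rewritten segments can coexist. That reading, the `factor` parameter, and the function names are assumptions for illustration.

```python
import os
import shutil


def dir_size(path):
    """Total size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def enough_space(index_bytes, free_bytes, factor=2.0):
    """True if free space covers the extra (factor - 1) copies of the index.

    factor=2.0 encodes the "at least twice the index size" rule of thumb;
    raise it if indexing continues during the optimize.
    """
    return free_bytes >= (factor - 1.0) * index_bytes


def can_optimize(index_dir, factor=2.0):
    """Check the disk holding index_dir against the rule of thumb."""
    free = shutil.disk_usage(index_dir).free
    return enough_space(dir_size(index_dir), free, factor)
```

Here `index_dir` would be the core's index directory on disk, whose location depends on the installation.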

Also, which Solr version are you using?

Regards,
Edwin
Walter Underwood
2018-11-29 03:07:12 UTC
Why do you think you need to optimize? Most configurations don’t need that.

And no, there is no synchronous optimize request.

wunder
Walter Underwood
***@wunderwood.org
http://observer.wunderwood.org/ (my blog)
Christopher Schultz
2018-11-29 14:52:14 UTC
Wei,
Post by Wei
Hi,
http://localhost:8983/solr/<core>/update?skipError=true -F
stream.body=' <optimize />'
The request returns status code 200 shortly, but when looking at
the solr instance I noticed that actual optimization has not
completed yet as there are more than 1 segments. Is the optimize
command async? What is the best approach to validate that optimize
is truly completed?
Try this instead:

http://localhost:8983/solr/<core>/update?optimize=true&wait=true

This will wait until the operation has completed. Note that your
client (e.g. curl) may time out after some time, so you'll want to
adjust that timeout to make sure the client doesn't give up before the
optimization operation has completed.
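
As a minimal sketch of the same request from a Python client with a generous timeout: the `optimize=true` and `wait=true` parameters are the ones from the URL above, while the base URL, core name, function names, and the one-hour default timeout are assumptions to adjust for a real index.

```python
import urllib.request
from urllib.parse import urlencode


def optimize_url(base_url, core):
    """Build the update URL that asks Solr to optimize and block until done."""
    params = urlencode({"optimize": "true", "wait": "true"})
    return f"{base_url.rstrip('/')}/{core}/update?{params}"


def run_optimize(base_url, core, timeout_seconds=3600):
    """Send the optimize request and return (HTTP status, response body).

    timeout_seconds is the client-side socket timeout; it should exceed the
    time the optimize is expected to take, or the client gives up early.
    """
    url = optimize_url(base_url, core)
    with urllib.request.urlopen(url, timeout=timeout_seconds) as resp:
        return resp.status, resp.read().decode("utf-8")
```

With curl, the corresponding knob would be its `--max-time` option.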

As others have said, perhaps you don't actually need to optimize anything.

-chris
Shawn Heisey
2018-11-29 22:56:12 UTC
Post by Wei
http://localhost:8983/solr/<core>/update?skipError=true -F stream.body='
<optimize />'
The request returns status code 200 shortly, but when looking at the solr
instance I noticed that actual optimization has not completed yet as there
are more than 1 segments. Is the optimize command async? What is the best
approach to validate that optimize is truly completed?
I do not know how that request can return a 200 before the optimize job
completes.  The "wait" parameters (one of which Christopher mentioned)
should all default to true, and I don't see them on your request.  As
far as I know, the operation is NOT asynchronous.  Are you absolutely
sure that it returned a 200? I'd like to see the actual response to verify.

I hate to assume you're wrong, but I think it's probably more likely
that your HTTP request timed out because of overly aggressive timeout
settings, probably a socket timeout.  If you have definitive proof that
you received the 200 and a normal-looking response, then we'll need to
look deeper.  Do you have the entry in solr.log for the optimize request?

Thanks,
Shawn
Christopher Schultz
2018-11-29 23:41:49 UTC
Shawn,
Post by Shawn Heisey
Post by Wei
I use the following http request to start solr index
http://localhost:8983/solr/<core>/update?skipError=true -F
stream.body=' <optimize />'
The request returns status code 200 shortly, but when looking at
the solr instance I noticed that actual optimization has not
completed yet as there are more than 1 segments. Is the optimize
command async? What is the best approach to validate that
optimize is truly completed?
I do not know how that request can return a 200 before the optimize
job completes. The "wait" parameters (one of which Christopher
mentioned) should all default to true, and I don't see them on your
request. As far as I know, the operation is NOT asynchronous. Are
you absolutely sure that it returned a 200? I'd like to see the
actual response to verify.
I hate to assume you're wrong, but I think it's probably more
likely that your HTTP request timed out because of overly
aggressive timeout settings, probably a socket timeout. If you
have definitive proof that you received the 200 and a
normal-looking response, then we'll need to look deeper. Do you
have the entry in solr.log for the optimize request?
When mine returned (with wait=true as a request parameter), I got a
JSON response telling me how long it took.
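
A sketch of reading that response in Python follows. The actual response is not shown in this thread, so the shape used here (Solr's usual `responseHeader` with `status` and `QTime` in milliseconds) is an assumption, and the function name is hypothetical.

```python
import json


def optimize_summary(body):
    """Summarize a Solr update response body.

    Assumes the body looks like {"responseHeader": {"status": 0, "QTime": N}},
    where status 0 means success and QTime is elapsed milliseconds.
    """
    header = json.loads(body)["responseHeader"]
    return {"ok": header["status"] == 0, "elapsed_ms": header["QTime"]}
```

A `QTime` of only a few milliseconds on a large index would be a hint that the request returned before any real merge work was done.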

-chris
Shawn Heisey
2018-11-29 23:53:39 UTC
Post by Christopher Schultz
When mine returned (with wait=true as a request parameter), I got a
JSON response telling me how long it took.
That's what I would expect.

If you have to explicitly include parameters like "wait" or
"waitSearcher" to make it block until the optimize is done, then in my
mind, that's a bug.  That should be the default setting.  In the 7.5
reference guide, I only see "waitSearcher", and it says the default is true.

Thanks,
Shawn
Erick Erickson
2018-11-30 02:58:35 UTC
Here's the scoop on optimize:
https://lucidworks.com/2017/10/13/segment-merging-deleted-documents-optimize-may-bad/

Note the link to how Solr 7.5 is different.

Best,
Erick
Christopher Schultz
2018-11-30 16:32:52 UTC
Shawn,
Post by Shawn Heisey
Post by Christopher Schultz
When mine returned (with wait=true as a request parameter), I got
a JSON response telling me how long it took.
That's what I would expect.
If you have to explicitly include parameters like "wait" or
"waitSearcher" to make it block until the optimize is done, then in
my mind, that's a bug. That should be the default setting. In the
7.5 reference guide, I only see "waitSearcher", and it says the
default is true.
I didn't test it without that parameter. I used it because it was
suggested to me earlier this week on this list. It may in fact be
optional. I was using Solr 7.4.

-chris
