How to force a wildcard query not to ignore word endings

Discussion:

easy.angel

2010-07-01 14:41:03 UTC

Hello,

I have a problem querying Solr. I indexed a person with 2 fields:

* firstname - Hans
* lastname - Mustermann

and I have a copy field 'text' into which both fields are copied. The 'text'
field is used at query time.

Now, when I search:

han*

I do get Hans Mustermann in the query results. But if I search:

hans*

I receive no results! However, the same query without the wildcard returns
the correct results.

How can I configure Solr to return Hans Mustermann for the query hans* ?

I add this wildcard dynamically (at query time), so I want it in every
query.

Thanks in advance,
Oleg

--
View this message in context: http://lucene.472066.n3.nabble.com/How-to-force-wildcard-query-not-to-ignore-word-endings-tp936322p936322.html
Sent from the Solr - User mailing list archive at Nabble.com.

Ahmet Arslan

2010-07-01 15:03:00 UTC

Post by easy.angel
I have a problem querying Solr. I indexed a person with 2 fields:
* firstname - Hans
* lastname - Mustermann
and I have a copy field 'text' into which both fields are copied. The 'text' field is used at query time.
han*
I do get Hans Mustermann in the query results. But if I search:
hans*
I receive no results! However, the same query without the wildcard returns the correct results.
How can I configure Solr to return Hans Mustermann for the query hans* ?
I add this wildcard dynamically (at query time), so I want it in every query.

The easiest solution is to remove the stem filter from your analysis chain.
Alternatively, add "hans" to your protwords.txt file, so the stemmer won't touch it.

Another solution requires writing custom code: integrate Lucene's AnalyzingQueryParser so that your wildcard queries are analyzed.

http://lucene.apache.org/java/3_0_2/api/contrib-misc/org/apache/lucene/queryParser/analyzing/AnalyzingQueryParser.html
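A sketch of how such custom code might be wired into Solr, assuming you write the plugin yourself: solrconfig.xml can register a query parser plugin under a name, and requests can then select it by that name. The class com.example.solr.AnalyzingQParserPlugin below is hypothetical; it stands for a QParserPlugin you would implement around AnalyzingQueryParser, handing it the text field's query analyzer.

```xml
<!-- solrconfig.xml: register a custom query parser under the name "analyzing".
     com.example.solr.AnalyzingQParserPlugin is a HYPOTHETICAL class, not part
     of Solr. Its jar, plus Lucene's contrib-misc jar (which contains
     AnalyzingQueryParser), must be on Solr's classpath. -->
<queryParser name="analyzing"
             class="com.example.solr.AnalyzingQParserPlugin" />
```

Requests could then use it with defType=analyzing, or per query with local params such as q={!analyzing}hans*. With the stemmer left in the query analyzer, the "hans" in hans* would be stemmed to "han" before the prefix query is built, so it would match the stemmed term in the index.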

By the way, why are you appending * to your queries?

easy.angel

2010-07-01 15:52:42 UTC

Thank you very much for your help and quick answer!

I always add the wildcard because I use Solr for autocomplete: as you type
your query, you see intermediate results. I also found that adding the
wildcard returns better intermediate results; at least, it was the easiest
solution in some cases. I'm not sure whether this can be solved in the Solr
configuration itself (for example with the query analyzer for the text field,
or with the index analyzer).

I think the problem lies in indexing: instead of "hans", Solr indexes the
value "han" (but I'm not sure).

Ahmet Arslan

2010-07-01 16:13:06 UTC

Post by easy.angel
I'm not sure whether this can be solved in the Solr configuration itself
(for example with the query analyzer for the text field, or with the index
analyzer).

Do you have a stemming filter factory in your field type? Remove it from the index and query analyzers of the text field, then restart the core and re-index.

easy.angel

2010-07-01 16:24:39 UTC

I have the standard configuration for the text field type:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />

    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"
            enablePositionIncrements="true" />

    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" catenateAll="0"
            splitOnCaseChange="1" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.SnowballPorterFilterFactory" language="English"
            protected="protwords.txt" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />

    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true" />
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"
            enablePositionIncrements="true" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1"
            catenateWords="0" catenateNumbers="0" catenateAll="0"
            splitOnCaseChange="1" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.SnowballPorterFilterFactory" language="English"
            protected="protwords.txt" />
  </analyzer>
</fieldType>

I will try removing SnowballPorterFilterFactory (is that right?) and then
restart Solr and re-index.

Ahmet Arslan

2010-07-01 19:34:59 UTC

Post by easy.angel
I will try removing SnowballPorterFilterFactory (is that right?) and then
restart Solr and re-index.

Exactly. This will solve your problem.

However, remember that wildcard and prefix queries (*) are not analyzed. For example, HAN* won't return anything, because the indexed terms are lowercased but the wildcard term is not.
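For reference, a sketch of the field type posted earlier in this thread with SnowballPorterFilterFactory removed from both analyzers; every other filter is copied unchanged. Indexed terms then keep their full lowercased form ("hans"), so the unanalyzed prefix hans* can match them.

```xml
<!-- The "text" field type from this thread with SnowballPorterFilterFactory
     removed from both analyzers; all other filters are unchanged.
     Restart Solr and re-index afterwards: documents indexed earlier still
     contain stemmed terms such as "han". -->
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="0" splitOnCaseChange="1" />
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true" />
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="0" catenateNumbers="0"
            catenateAll="0" splitOnCaseChange="1" />
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
</fieldType>
```

The trade-off is that every field of type text loses stemming, so, for example, a query for persons no longer matches a document that contains only person. Listing individual words in protwords.txt instead keeps stemming for everything else.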

easy.angel

2010-07-02 08:50:06 UTC

Thanks! I tested it and it works perfectly.

Post by Ahmet Arslan
However, remember that wildcard and prefix searches (*) are not analyzed.
For example, HAN* won't return anything.

I also lowercase the query dynamically, so it's not a problem for me.
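Solr releases from 3.6 on can do this lowercasing on the server side: a field type may declare a multiterm analyzer, which is applied to wildcard and prefix terms instead of the normal query analyzer. A minimal sketch for the text type, assuming such a Solr version (this element is not available in the releases that were current when this thread was written); the existing index and query analyzers are elided.

```xml
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <!-- existing index and query analyzers go here, unchanged -->

  <!-- Used only for wildcard/prefix terms: keep the term whole and
       lowercase it, so HAN* is searched as han*. No stemming happens here. -->
  <analyzer type="multiterm">
    <tokenizer class="solr.KeywordTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
</fieldType>
```

This only handles case. The stemmer is still not applied to wildcard terms, so hans* would still need the index-side change discussed above.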
