Marian Steinbach
2011-12-05 09:01:40 UTC
Hi!
I am surprised to find an empty string as the most frequent index term in
one of my fields. Until now I didn't even know that empty strings would be
indexed.
Here is the schema.xml excerpt for that field:
<fieldType name="text_terms" class="solr.TextField">
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory" />
<filter class="solr.PatternReplaceFilterFactory" pattern="^[0-9]+$"
replacement="" />
<filter class="solr.LowerCaseFilterFactory" />
<filter class="solr.SynonymFilterFactory" synonyms="synonyms_terms.txt"
ignoreCase="true" />
<filter class="solr.StopFilterFactory" ignoreCase="true"
words="stopwords_terms.txt" />
</analyzer>
</fieldType>
<field name="terms" type="text_terms" indexed="true" stored="false"
multiValued="true"/>
I suspect that PatternReplaceFilterFactory
with pattern="^[0-9]+$" is causing the empty strings: it rewrites a purely
numeric token to "" but does not remove the token itself. I introduced that
filter to keep numeric-only tokens out of the index.
Any hint on how I can get rid of numbers AND empty strings?
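One candidate I have not tested: keep the pattern replacement and add a length filter right after it, so that any token emptied by the pattern is dropped. This sketch assumes solr.LengthFilterFactory (with its min and max parameters) is available in the installed Solr version; max="255" is an arbitrary upper bound chosen only for illustration.

```xml
<fieldType name="text_terms" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory" />
    <!-- Rewrites purely numeric tokens to "", but the (now empty) token remains -->
    <filter class="solr.PatternReplaceFilterFactory" pattern="^[0-9]+$"
            replacement="" />
    <!-- Assumed fix: discard tokens shorter than 1 character, i.e. the emptied ones.
         max="255" is an arbitrary bound for this sketch -->
    <filter class="solr.LengthFilterFactory" min="1" max="255" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms_terms.txt"
            ignoreCase="true" />
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords_terms.txt" />
  </analyzer>
</fieldType>
```

Documents that are already indexed would need to be reindexed before the empty term disappears from the field.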
Thanks!
Marian