Discussion:
Filtering Solr pivot facet values
Arun Rangarajan
2017-12-18 18:59:10 UTC
Solr version: 6.6.0

There are two multi-valued string fields in my schema:
* interests
* hierarchy.

The goal is to run a pivot facet query on both of these fields, but only for
specific values of the `interests` field. This query:

```
/select
?wt=json
&rows=0
&q=interests:(hockey OR soccer)
&facet=true
&facet.pivot=interests,hierarchy
```

selects the correct documents. However, because `interests` is a multi-valued
field, the response contains not only the required counts for the values of
interest (hockey, soccer) but also counts for the other values of `interests`
in the matching documents.

How can I restrict the pivot facet counts to only the values of the `interests`
field specified in the `q` param, i.e. hockey and soccer in this example?
Essentially, is there an equivalent of
https://lucene.apache.org/solr/guide/6_6/faceting.html#Faceting-Limitingfacetwithcertainterms
for pivot facet queries? Or are there alternative formats, like JSON faceting,
that may help here?

(Full disclosure: I also asked this question on StackOverflow and have had no
response so far:
https://stackoverflow.com/questions/47838619/filtering-solr-pivot-facet-values
)

Thanks.
Arun Rangarajan
2017-12-20 20:31:35 UTC
Hello Solr Gurus,

Sorry to bother you again on this. Is there no way in Solr to filter pivot
facets?
[Or did I attract the wrath of the group by posting the question first on
StackOverflow? :-)]

Thanks once again.
Shawn Heisey
2017-12-20 21:07:55 UTC
Post by Arun Rangarajan
Sorry to bother you again on this. Is there no way in Solr to filter pivot
facets?
[Or did I attract the wrath of the group by posting the question first on
StackOverflow? :-)]
StackOverflow and this list are pretty much unaware of each other unless
specific mention is made.  I don't care whether you ask on SO or not, or
which one you ask first.

You haven't provided actual output that you're seeing.  Can you provide
actual response output from your queries and describe what you'd rather
see instead?  With that information, we might be able to offer some ideas.

In general, facets should never count documents that are not in the
search results.

Multi-select faceting offers a way to change that general behavior,
though -- tagging specific fq parameters and asking the facet to exclude
those filters.

https://wiki.apache.org/solr/SimpleFacetParameters#Tagging_and_excluding_Filters

Thanks,
Shawn
Arun Rangarajan
2017-12-20 21:40:53 UTC
Thanks for your reply, Shawn.

I think multi-select faceting does the opposite of what I want. I want the
facet to include the filters.

Example:

The following 8 documents are the only ones in my Solr core:

[
  {"id": "1", "hierarchy": ["1", "16", "169"], "interests": ["soccer", "futbol"]},
  {"id": "2", "hierarchy": ["1", "16", "162"], "interests": ["cricket", "futbol"]},
  {"id": "3", "hierarchy": ["1", "14", "141"], "interests": ["hockey", "soccer"]},
  {"id": "4", "hierarchy": ["1", "16", "162"], "interests": ["hockey", "soccer", "tennis"]},
  {"id": "5", "hierarchy": ["1", "14", "142"], "interests": ["badminton"]},
  {"id": "6", "hierarchy": ["1", "14", "147"], "interests": ["soccer"]},
  {"id": "7", "hierarchy": ["1", "16", "168"], "interests": ["hockey", "soccer", "tennis"]},
  {"id": "8", "hierarchy": ["1", "14", "140"], "interests": ["badminton"]}
]

As you can see, hierarchy and interests are both multi-valued string fields.

I want pivot facet counts for the two fields: hierarchy and interests, but
filtered for only two values of interests field: hockey, soccer.

The query I am running is:

/select
?wt=json
&rows=0
&q=interests:(hockey soccer)
&facet=true
&facet.pivot=hierarchy,interests

This gives the following result for the pivot facets:

"facet_pivot": {
"hierarchy,interests": [
  {
    "field": "hierarchy",
    "value": "1",
    "count": 5,
    "pivot": [
      {"field": "interests", "value": "soccer", "count": 5},
      {"field": "interests", "value": "hockey", "count": 3},
      {"field": "interests", "value": "tennis", "count": 2},
      {"field": "interests", "value": "futbol", "count": 1}
    ]
  },
  {
    "field": "hierarchy",
    "value": "16",
    "count": 3,
    "pivot": [
      {"field": "interests", "value": "soccer", "count": 3},
      {"field": "interests", "value": "hockey", "count": 2},
      {"field": "interests", "value": "tennis", "count": 2},
      {"field": "interests", "value": "futbol", "count": 1}
    ]
  },
  ...
]
}

The counts for hockey and soccer are correct. But I am also getting facet
counts for other values of interests (tennis, futbol, etc.), since those
values occur in the documents that match the query. I understand why this is
happening. This is why I said I want to do something like
https://lucene.apache.org/solr/guide/6_6/faceting.html#Faceting-Limitingfacetwithcertainterms
for facet pivots. Is there a way to do that?
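
If no server-side option turns up, one fallback is to drop the unwanted buckets on the client after the response comes back. The following is only a sketch of that idea in Python, not a Solr feature: `filter_pivot` is a hypothetical helper name, and the input is the two hierarchy buckets from the response above. It also assumes the wanted values were not cut off by the facet limit, so a real request may need a larger `facet.limit`.

```python
def filter_pivot(buckets, field, wanted):
    """Keep only buckets of `field` whose value is in `wanted`; recurse into sub-pivots."""
    kept = []
    for bucket in buckets:
        if bucket["field"] == field and bucket["value"] not in wanted:
            continue  # an unwanted value of the filtered field: drop it
        trimmed = {k: v for k, v in bucket.items() if k != "pivot"}
        if "pivot" in bucket:
            trimmed["pivot"] = filter_pivot(bucket["pivot"], field, wanted)
        kept.append(trimmed)
    return kept


# The two hierarchy buckets shown in the response above.
facet_pivot = [
    {"field": "hierarchy", "value": "1", "count": 5, "pivot": [
        {"field": "interests", "value": "soccer", "count": 5},
        {"field": "interests", "value": "hockey", "count": 3},
        {"field": "interests", "value": "tennis", "count": 2},
        {"field": "interests", "value": "futbol", "count": 1},
    ]},
    {"field": "hierarchy", "value": "16", "count": 3, "pivot": [
        {"field": "interests", "value": "soccer", "count": 3},
        {"field": "interests", "value": "hockey", "count": 2},
        {"field": "interests", "value": "tennis", "count": 2},
        {"field": "interests", "value": "futbol", "count": 1},
    ]},
]

filtered = filter_pivot(facet_pivot, "interests", {"hockey", "soccer"})
for bucket in filtered:
    print(bucket["value"], [(p["value"], p["count"]) for p in bucket["pivot"]])
# 1 [('soccer', 5), ('hockey', 3)]
# 16 [('soccer', 3), ('hockey', 2)]
```

The hierarchy counts are left untouched, since they already reflect only the documents matched by the query.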

Thanks.
Shawn Heisey
2017-12-21 16:17:50 UTC
Post by Arun Rangarajan
I think multi-select faceting does the opposite of what I want. I want the
facet to include the filters.
You don't have any filters to include or exclude. You would need fq
parameters to use multi-select faceting. But as you say, it doesn't do
what you want anyway.

<snip>
Post by Arun Rangarajan
As you can see, hierarchy and interests are both multi-valued string fields.
I want pivot facet counts for the two fields: hierarchy and interests, but
filtered for only two values of interests field: hockey, soccer.
<snip>
Post by Arun Rangarajan
The counts for hockey and soccer are correct. But I am also getting the
facet counts for other values of interests (like tennis, futbol, etc.,)
since these values match the query. I understand why this is happening.
This is why I said I want to do something like
https://lucene.apache.org/solr/guide/6_6/faceting.html#Faceting-Limitingfacetwithcertainterms
for facet pivots. Is there a way to do that?
I see now. It's showing the other values because the fields are
multivalued and the matching documents actually do contain those values,
so Solr is working the way I expected it to, but your data is different
than I was thinking. It's the multivalued aspect that makes this
problematic.

I was not aware that you could limit the terms with field faceting.
Either the syntax to achieve what you want is different than what you
are using, or it just can't be done with pivot faceting at the moment
because there are no options to do it. I'm guessing the latter, but
since I am not familiar with the code, I cannot say for sure. Hopefully
somebody else can speak up with an option, but I'm not expecting that to
happen.
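
For reference, the term-limiting feature in the linked guide section is a `terms` local param on `facet.field`. The Python sketch below only builds the request parameters for a plain (non-pivot) field facet; `terms_facet_params` is a hypothetical helper name, and the terms are assumed to be simple tokens with no commas or quotes that would need escaping. Whether the same local param is honored inside `facet.pivot` is not established in this thread.

```python
from urllib.parse import urlencode


def terms_facet_params(query, field, terms):
    """Build select params for a field facet limited to the given terms."""
    # Local-param form from the guide section: {!terms='a,b'}fieldname
    local = "{!terms='%s'}" % ",".join(terms)
    return [
        ("wt", "json"),
        ("rows", "0"),
        ("q", query),
        ("facet", "true"),
        ("facet.field", local + field),
    ]


params = terms_facet_params(
    "interests:(hockey OR soccer)", "interests", ["hockey", "soccer"]
)
query_string = urlencode(params)  # percent-encodes the braces, quotes and commas
print(query_string)
```

This returns counts for hockey and soccer only, but as a flat facet on `interests`, without the breakdown by `hierarchy`.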

Thanks,
Shawn
Erick Erickson
2017-12-21 17:45:36 UTC
You might be able to do something interesting with the JSON faceting
approach, but I confess I don't know for sure.
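
One shape this could take with the JSON Facet API is a terms facet on `hierarchy` with one nested query facet per wanted `interests` value, so that no other values ever get a bucket. The Python below is a rough sketch that only builds the request body; `nested_interest_facets` is a hypothetical helper name, the values are assumed to need no query escaping, and it has not been run against the data in this thread.

```python
import json


def nested_interest_facets(parent_field, child_field, values):
    """Build a JSON request body: terms facet on the parent, one query facet per value."""
    sub_facets = {
        value: {"type": "query", "q": f"{child_field}:{value}"}
        for value in values
    }
    return {
        "query": f"{child_field}:({' OR '.join(values)})",
        "limit": 0,  # no documents, only facet counts
        "facet": {
            parent_field: {
                "type": "terms",
                "field": parent_field,
                "facet": sub_facets,
            }
        },
    }


body = nested_interest_facets("hierarchy", "interests", ["hockey", "soccer"])
payload = json.dumps(body, indent=2)
print(payload)
```

Each `hierarchy` bucket in the response would then be expected to carry a `hockey` and a `soccer` sub-bucket with its own count.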

Best,
Erick
Emir Arnautović
2017-12-21 20:48:22 UTC
It seems there is something in the latest Solr version that you might be able to use. From the release notes:

“The new facet.matches parameter returns facet buckets only for terms
that match a regular expression.”
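
A rough Python sketch of how that might be used: build an anchored regular expression from the wanted terms and pass it as `facet.matches` next to a field facet. `exact_terms_regex` is a hypothetical helper name. The local check uses Python's `re` module, which is only a stand-in for the regex engine Solr uses, and the terms are assumed to be simple tokens. This presumably needs a newer Solr than the 6.6.0 mentioned at the top of the thread, and whether the parameter also affects `facet.pivot` is not stated in the release note.

```python
import re
from urllib.parse import urlencode


def exact_terms_regex(terms):
    """Anchored alternation that matches exactly one of the given terms."""
    return "^(" + "|".join(re.escape(t) for t in terms) + ")$"


pattern = exact_terms_regex(["hockey", "soccer"])

params = [
    ("wt", "json"),
    ("rows", "0"),
    ("q", "interests:(hockey OR soccer)"),
    ("facet", "true"),
    ("facet.field", "interests"),
    ("facet.matches", pattern),
]
query_string = urlencode(params)

# Local check of which facet values from the example response would pass the filter.
seen_values = ["soccer", "hockey", "tennis", "futbol"]
surviving = [v for v in seen_values if re.search(pattern, v)]
print(pattern, surviving)
# ^(hockey|soccer)$ ['soccer', 'hockey']
```

Anchoring the pattern on both ends keeps it from matching longer terms that merely contain one of the wanted words.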

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/