Solr-ctf-query-parser

Reranks and expands Solr query returns using filtered clickstream data, providing a simple, flexible collaborative filtering framework. Clickthroughfilter-x.x.x.jar runs the filter as a query parser plugin for Solr/Lucene.

The filter samples click data (from a separate clicks core) for items returned by a query and uses it to

1.	reorganise	boost items in proportion to their click traffic (primary items)
2.	extend	inject new items not matching the query, but connected to the query's primary items by click traffic (secondary items)
3.	customise	boost & inject selected item types and boost & inject using selected components of click traffic

These elements can be used separately or in combination for improving search returns, current awareness, identifying related material, making personal recommendations etc. (see example CTF queries below).
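
The idea of a secondary item can be sketched in Java as follows. This is an illustration only and not the plugin's code: the `Click` class and `secondaryItems` method are hypothetical names, and treating a click link in either direction as a connection is an assumption rather than something this README states.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class SecondaryItemsSketch {

    /** Hypothetical click record: referrer docID (may be null) and destination docID. */
    static final class Click {
        final String fromDocId;
        final String toDocId;

        Click(String fromDocId, String toDocId) {
            this.fromDocId = fromDocId;
            this.toDocId = toDocId;
        }
    }

    /**
     * Collects items that are not primary themselves but share a click with a primary item.
     * Assumption: a link counts in both directions (from -> to and to -> from).
     */
    static Set<String> secondaryItems(Set<String> primary, List<Click> clicks) {
        Set<String> secondary = new LinkedHashSet<>();
        for (Click c : clicks) {
            if (c.fromDocId == null || c.toDocId == null) {
                continue; // a click without a referrer connects nothing
            }
            boolean fromPrimary = primary.contains(c.fromDocId);
            boolean toPrimary = primary.contains(c.toDocId);
            if (fromPrimary && !toPrimary) {
                secondary.add(c.toDocId);
            } else if (toPrimary && !fromPrimary) {
                secondary.add(c.fromDocId);
            }
        }
        return secondary;
    }

    public static void main(String[] args) {
        Set<String> primary = Set.of("a", "b");
        List<Click> clicks = List.of(
                new Click("a", "c"),   // c is reached from primary item a
                new Click("d", "b"),   // d leads to primary item b
                new Click("a", "b"),   // both primary, nothing new
                new Click(null, "a"),  // no referrer
                new Click("x", "y"));  // unrelated to the query
        System.out.println(secondaryItems(primary, clicks)); // [c, d]
    }
}
```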

To minimise processing times, the filter acts either on the top n items of a query return ("base=matches": faster, but lower improvement), or looks deeper into the return and draws out the top n items in terms of click traffic ("base=clicks": potentially slower, but with potentially higher improvement).
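
The difference between the two base modes can be sketched as below. This is a simplified model under stated assumptions, not the plugin's implementation: the `Item` class, its fields and the `depth` argument are hypothetical, and the real filter samples click counts from the clicks core.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class BaseSelectionSketch {

    /** Hypothetical stand-in for a returned document: docID, match score, sampled clicks. */
    static final class Item {
        final String docId;
        final double matchScore;
        final int clicks;

        Item(String docId, double matchScore, int clicks) {
            this.docId = docId;
            this.matchScore = matchScore;
            this.clicks = clicks;
        }
    }

    /** base=matches: take the first n items of the return, already ranked by match score. */
    static List<String> topByMatches(List<Item> ranked, int n) {
        List<String> ids = new ArrayList<>();
        for (Item item : ranked.subList(0, Math.min(n, ranked.size()))) {
            ids.add(item.docId);
        }
        return ids;
    }

    /** base=clicks: look at the first `depth` items, then keep the n most clicked. */
    static List<String> topByClicks(List<Item> ranked, int depth, int n) {
        List<Item> window = new ArrayList<>(ranked.subList(0, Math.min(depth, ranked.size())));
        // The sort is stable, so items with equal clicks keep their match order.
        window.sort(Comparator.comparingInt((Item i) -> i.clicks).reversed());
        List<String> ids = new ArrayList<>();
        for (Item item : window.subList(0, Math.min(n, window.size()))) {
            ids.add(item.docId);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Item> ranked = List.of(
                new Item("a", 9.0, 1),
                new Item("b", 8.0, 0),
                new Item("c", 7.0, 12),
                new Item("d", 6.0, 5));
        System.out.println(topByMatches(ranked, 2));   // [a, b]
        System.out.println(topByClicks(ranked, 4, 2)); // [c, d]
    }
}
```

Here "base=clicks" has to inspect more of the return before it can pick its n items, which is where the extra processing time comes from.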

For the most part, the filter can be built straight into many types of complex queries, including in conjunction with other parsers, methods, facets etc. Where you find that a CTF query is not directly compatible with other complex queries (e.g. certain joins & group functions), you can usually find a way round by rearranging your input query or writing your own request handler.

A "/ctf" request handler is provided to help with testing and tuning (see attached solrconfig.xml). This quantifies primary and secondary improvements relative to the unmodified return and provides other useful metrics.

For more detail about the CTF plugin go to https://www.slideshare.net/pontneo/better-search-implementation-of-click-through-filter-as-a-query-parser-plugin-for-apache-solr-lucene

SOME EXAMPLE CTF QUERIES (parameters are described below):

1.	search in text field of your documents core for "melon" at your default CTF settings;
	q={!ctf}text:melon or q={!ctf v=$qq} with qq=text:melon
2.	weighted search for "melon" in title or body fields using only "botanist" click traffic but otherwise default settings;
	q={!ctf ctp="user:botanist" cts="user:botanist" v=$qq} with qq=title:melon^2 body:melon
3.	weighted search for "melon" or "cherry" blending click traffic & publication_date boosting;
	q={!boost b=recip(ms(NOW/HOUR,publication_date),3.16e-11,1,1)}{!ctf cb=5 cx=2 v=$qq} with qq=text:melon^3 text:cherry^2
	or q={!ctf cb=5 cx=2 v=$qq} with qq=({!boost b=recip(ms(NOW/HOUR,publication_date),3.16e-11,1,1)}body:melon^3 {!boost b=recip(ms(NOW/HOUR,publication_date),3.16e-11,1,1)}body:cherry^2) etc
4.	filtered search for "orange" with all returned items in category fruit;
	q={!ctf v=$qq} with qq=text:orange and fq=category:fruit
5.	filtered search for "orange" with just secondary items in category fruit;
	q={!ctf cf="category:fruit" v=$qq} with qq=text:orange
6.	filtered search for "orange" with just primary items in category fruit;
	q={!ctf v=$qq} with qq=+text:orange +category:fruit
7.	search for "lychee" with a high sensitivity to changes over time, recoiling quickly to the unmodified search return;
	q={!ctf cp=5 cd=2 ctp="time_stamp:[NOW-1DAY TO NOW]" v=$qq} with qq=text:lychee
8.	search for "lychee" in title field preserving original sort & including best 10% of secondary items with title word like "lychee";
	q={!ctf reorder=false cf="title:lychee~0.5" cs=0.1 v=$qq} with qq=title:lychee
9.	last week's search for "kiwi" based on the 10 most clicked "kiwi" items, with the best 20% of non-pdf secondary items that are then pushed down the list to help maximise improvement;
	q={!ctf base=clicks cn=10 cz="NOW-7DAY" cs=0.2 cf="-doctype:pdf" cy=0.95 v=$qq} with qq=text:kiwi
10.	top 5 most visited "lemon" recipes by user types 2 & 3 over the last 7 days;
	q={!ctf cn=5 base=clicks restrict=true extend=false ctp="+user_type:(2 3) +time_stamp:[NOW-7DAY/DAY TO NOW]" v=$qq} with qq=recipe:lemon
11.	top 10 recommendations for userID:x, based on userID:x's up to 20 most visited items over the last month and click traffic through those items by any other user with an interest in "fruit" since userID:x's last visit;
	q={!ctf base=clicks only2y=true cn=20 ctp="userID:x AND time_stamp:[NOW-31DAY TO NOW]" cts="-userID:x AND user_interests:fruit AND time_stamp:[NOW-(last_visit)DAY TO NOW]" v=$qq} with qq=docID:* and rows=10
	To remove any recommendations that userID:x has visited before, include in q;
	cf="-({!join from=toDocID to=docID fromIndex=clicks_core}userID:x {!join from=fromDocID to=docID fromIndex=clicks_core}userID:x)"
12.	next item in non-repeating discovery query with a simple fallback to avoid blind alleys;
	q={!ctf only2y=true cts="-sessionID:currentSessionID" v=$qq} with qq=docID:currentDocID^10 categoryID:currentCategoryID and rows=1
13.	related material for a given document with a simple fallback query for where there are few direct connections;
	q={!ctf only2y=true base=clicks v=$qq} OR {!ctf base=clicks v=$qqq} with qq=docID:currentDocID^10 and qqq=categoryID:currentCategoryID
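
Most of the examples above pass the user query through parameter dereferencing (v=$qq). A minimal Java sketch of assembling such a request with the standard library alone is given below. The class and method names are hypothetical, the decision to quote only values that contain characters other than letters, digits, ".", "_" or "-" is an assumption, and values containing a double quote are simply rejected rather than escaped.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class CtfQueryStringSketch {

    /** Builds the q value, e.g. {!ctf cb=5 cx=2 v=$qq}, from CTF local params. */
    static String ctfLocalParams(Map<String, String> ctfParams) {
        StringBuilder sb = new StringBuilder("{!ctf");
        for (Map.Entry<String, String> e : ctfParams.entrySet()) {
            String value = e.getValue();
            if (value.contains("\"")) {
                throw new IllegalArgumentException("double quotes are not handled in this sketch");
            }
            // Quote anything beyond a plain token so spaces and colons stay inside the value.
            boolean plain = value.matches("[A-Za-z0-9._-]+");
            sb.append(' ').append(e.getKey()).append('=');
            sb.append(plain ? value : "\"" + value + "\"");
        }
        return sb.append(" v=$qq}").toString();
    }

    /** URL-encodes request parameters into a query string (key order is preserved). */
    static String toQueryString(Map<String, String> requestParams) {
        StringJoiner joiner = new StringJoiner("&");
        for (Map.Entry<String, String> e : requestParams.entrySet()) {
            joiner.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // Example 2 above: only "botanist" click traffic, otherwise default settings.
        Map<String, String> ctf = new LinkedHashMap<>();
        ctf.put("ctp", "user:botanist");
        ctf.put("cts", "user:botanist");

        Map<String, String> request = new LinkedHashMap<>();
        request.put("q", ctfLocalParams(ctf));
        request.put("qq", "title:melon^2 body:melon");

        System.out.println(request.get("q"));
        System.out.println(toQueryString(request));
    }
}
```

The resulting string would be appended to whatever handler path your own solrconfig.xml exposes; no host, core or path is assumed here.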

REQUIREMENTS:

1.	document core(s) with unique docIDs
2.	a clicks core for storing clicks (see attached schema.xml) with at least the following fields;
	timestamp - timestamp of click
	fromDocID - referrer docID (may be a null value where necessary)
	toDocID - destination docID
	other fields (such as userID, usertype, user_interests, to_posn_in_list, user_query etc) are not required but are of course part of the point of using this plugin (see example queries above)
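
As a rough illustration of the minimum click record, the Java sketch below builds the three required fields as a simple map ready to be sent to the clicks core by whatever indexing client you use. The class and method names are hypothetical, and the placeholder "none" for a missing referrer is an assumption: it should be whatever you configure as click_null_fromID_value.

```java
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;

final class ClickRecordSketch {

    /** Hypothetical placeholder; must match click_null_fromID_value in your configuration. */
    static final String NULL_FROM_ID = "none";

    /** Builds the minimum click document: timestamp, fromDocID and toDocID. */
    static Map<String, String> clickDocument(Instant when, String fromDocId, String toDocId) {
        if (toDocId == null || toDocId.isEmpty()) {
            throw new IllegalArgumentException("toDocID is required");
        }
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("timestamp", when.toString()); // ISO-8601 UTC instant
        doc.put("fromDocID", fromDocId == null ? NULL_FROM_ID : fromDocId);
        doc.put("toDocID", toDocId);
        return doc;
    }

    public static void main(String[] args) {
        // A click that arrived without a referrer, e.g. from a bookmark.
        Map<String, String> doc =
                clickDocument(Instant.parse("2015-07-14T11:32:00Z"), null, "doc42");
        System.out.println(doc);
        // {timestamp=2015-07-14T11:32:00Z, fromDocID=none, toDocID=doc42}
    }
}
```

Optional fields such as userID or user_query would be added to the same map in the same way.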

PARAMETERS IN SOLRCONFIG.XML AND FOR USE IN QUERIES (see attached solrconfig.xml):

ctf settings:
	solr_host_url	root containing your data cores
	document_core_to_query	document core name
	clicks_core_to_query	clicks core name
ctf mappings:
	document_ID_field_name	document core docID field name
	click_fromID_field_name	clicks core fromDocID field name
	click_null_fromID_value	clicks core null fromDocID value (for clicks without a fromID)
	click_toID_field_name	clicks core toDocID field name
	click_time_stamp_field_name	clicks core timestamp field name
ctf parameters:
	base	get primary (1y) items from best query matches or most clicked items (String matches or clicks)
	restrict	show only items with click boosts (String true or false)
	reorder	allow click boosts to affect score and sort (String true or false)
	extend	include secondary (2y) items (String true or false)
	only2y	show only 2y items (String true or false)
	cn	number of 1y items to sample (int)
	cd	average clicks per 1y item (double), controls click sample period
	cp	time integration (int num clicks >0), low values = responsive, high = stable
	cb	click boost = cb.(fn clicks)^cx (double cb values >0)
	cx	click boost = cb.(fn clicks)^cx (double cx values >0)
	ctp	click traffic type for 1y items - a filter query on click traffic (String use ctp="*" for any, ctp="user_type:2", ctp="userID:xxxx", ctp="time_stamp:[NOW-7DAY TO NOW]" or ctp="some function query" etc)
	cts	click traffic type for 2y items - a filter query on click traffic (String as ctp)
	cg	skews 2y scores from popular to specific (double value 0 to 1)
	cs	proportion of 2y items to allow through, lowest & oldest traffic removed first (double value 0 to 1)
	cf	2y item type - a filter query on 2y items (String use cf="*" for any, cf="cat:2", cf="-doc_type:pdf", cf="published:[NOW-31DAY TO NOW]", cf="some function query" etc)
	cy	position 2y items in list (double value between 0 (next to parent) and 1 (own click boost))
	cz	lookback parameter for observing past query returns at a certain time (Solr date use cz="NOW" or e.g. cz="NOW-7DAY" or cz="2015-07-14T11:32:00Z")
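
To show how cb and cx combine, the Java sketch below evaluates click boost = cb.(fn clicks)^cx. The README does not say what fn is, so it is passed in as an argument here; using the identity function or log(1 + clicks) in the example is purely an assumption for illustration, and the class and method names are hypothetical.

```java
import java.util.function.DoubleUnaryOperator;

final class ClickBoostSketch {

    /** click boost = cb * (fn(clicks))^cx, with cb > 0 and cx > 0 as the parameter list requires. */
    static double clickBoost(double cb, double cx, double clicks, DoubleUnaryOperator fn) {
        if (cb <= 0 || cx <= 0) {
            throw new IllegalArgumentException("cb and cx must be > 0");
        }
        return cb * Math.pow(fn.applyAsDouble(clicks), cx);
    }

    public static void main(String[] args) {
        // Assumed fn = identity: 5 * 3^2
        System.out.println(clickBoost(5, 2, 3, DoubleUnaryOperator.identity())); // 45.0

        // Assumed fn = log(1 + clicks), which damps very popular items.
        System.out.println(clickBoost(5, 2, 3, Math::log1p));
    }
}
```

With a fixed fn, raising cb scales every boost by the same factor, while raising cx widens the gap between lightly and heavily clicked items.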

