Inaccurate results when searching two or more keywords. #309

Open
Sianature opened this issue Jan 27, 2024 · 7 comments

@Sianature

Hi all

I am using TNTSearch with a MySQL database. When I search the index for two or more words, TNTSearch returns matches that contain only one of the keywords rather than both. Is there a way to fix this?

Here is how I index my database:

Indexing code:
require 'vendor/autoload.php';
use TeamTNT\TNTSearch\TNTSearch;

$tnt = new TNTSearch;
$tnt->loadConfig([
    'driver'   => 'mysql',
    'host'     => 'localhost',
    'database' => 'products2023',
    'username' => 'root',
    'password' => '',
    'storage'  => 'C:\xampp2023\htdocs\Text search engines\tntsearch\indexes',
]);

$indexer = $tnt->createIndex('products2023.index');
$indexer->query('SELECT p.id, d.volume, ing.chemicals, c.CO_NAME
    FROM products p
    LEFT JOIN dimensions d ON d.prod_id = p.id
    LEFT JOIN ingredients ing ON ing.prod_id = p.id
    LEFT JOIN company c ON c.CO_NR = p.CO_NR
');
$indexer->run();

Searching code:
require 'vendor/autoload.php';
use TeamTNT\TNTSearch\TNTSearch;
include 'db_pdo_connect.php';

$tnt = new TNTSearch;
$tnt->loadConfig([
    'storage' => 'C:\xampp2023\htdocs\Text search engines\tntsearch\indexes',
]);

$tnt->selectIndex("products2023.index");
$res = $tnt->search("dupont pvc", 10);
// Returns products that do not contain 'dupont'

@nticaric
Contributor

In TNTSearch, a multi-word query like this does not imply an AND operation between the words, so the results are not guaranteed to contain both 'dupont' and 'pvc'. Instead, TNTSearch uses the BM25 algorithm to rank results by relevance. Under that ranking, it seems to treat the term 'pvc' as more relevant in the documents it returns.

Also, make sure that the indexer query returns an id column.
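
To make the ranking behaviour described above concrete, here is a toy BM25 scorer in plain PHP. It is only a sketch under stated assumptions: the function name `bm25Scores`, the three sample documents, and the parameter values `k1 = 1.2` and `b = 0.75` are all invented for illustration, and this is not TNTSearch's own implementation.

```php
<?php
// Toy BM25 scorer: illustrative only, not TNTSearch's implementation.
// The documents and the parameter values (k1, b) are made up for this sketch.

function bm25Scores(array $docs, array $queryTerms, float $k1 = 1.2, float $b = 0.75): array
{
    // Lowercase and split each document on whitespace (keys are preserved).
    $tokenized = array_map(
        fn(string $d): array => preg_split('/\s+/', strtolower(trim($d))),
        $docs
    );
    $n = count($tokenized);
    $avgLen = array_sum(array_map('count', $tokenized)) / $n;

    $scores = [];
    foreach ($tokenized as $id => $tokens) {
        $counts = array_count_values($tokens);
        $score = 0.0;
        foreach ($queryTerms as $term) {
            $tf = $counts[$term] ?? 0;
            if ($tf === 0) {
                // A missing term adds nothing, but it does not exclude the document.
                continue;
            }
            // Number of documents that contain the term.
            $df = count(array_filter($tokenized, fn(array $t): bool => in_array($term, $t, true)));
            $idf = log(1 + ($n - $df + 0.5) / ($df + 0.5));
            $norm = $tf + $k1 * (1 - $b + $b * count($tokens) / $avgLen);
            $score += $idf * ($tf * ($k1 + 1)) / $norm;
        }
        $scores[$id] = $score;
    }
    arsort($scores); // highest score first, keys preserved
    return $scores;
}

$docs = [
    1 => 'dupont pvc pipe',
    2 => 'acme pvc pvc sheet',
    3 => 'acme steel pipe',
];
$scores = bm25Scores($docs, ['dupont', 'pvc']);
print_r($scores);
```

In this sketch, document 2 never mentions 'dupont' yet still receives a positive score from 'pvc' alone, which is the kind of match reported in the first post. Only document 3, which contains neither term, scores zero.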

@Sianature
Author

Hi @nticaric

Thanks for your response. How can I enforce an AND operation so that both keywords appear in every match?
As for the id, do I need to return an id for each table, or just one id (the primary key) from the first table?

Thanks

@somegooser

Try using searchBoolean($string) instead!
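
As a rough illustration of why a boolean search helps here, the sketch below contrasts "match all terms" with "match any term" over a tiny inverted index. It is a conceptual example only: `buildInvertedIndex`, `matchAll`, and `matchAny` are hypothetical helpers written for this thread, the documents are invented, and none of it reflects TNTSearch's internals.

```php
<?php
// Conceptual sketch of AND vs. OR matching over a tiny inverted index.
// Hypothetical helpers, not TNTSearch's internals; the documents are invented.

function buildInvertedIndex(array $docs): array
{
    $index = [];
    foreach ($docs as $id => $text) {
        $terms = array_unique(preg_split('/\s+/', strtolower(trim($text))));
        foreach ($terms as $term) {
            $index[$term][] = $id; // posting list: ids of documents containing the term
        }
    }
    return $index;
}

// AND semantics: keep only ids present in every term's posting list.
function matchAll(array $index, array $terms): array
{
    $result = null;
    foreach ($terms as $term) {
        $postings = $index[strtolower($term)] ?? [];
        $result = $result === null
            ? $postings
            : array_values(array_intersect($result, $postings));
    }
    return $result ?? [];
}

// OR semantics: keep ids present in at least one posting list.
function matchAny(array $index, array $terms): array
{
    $result = [];
    foreach ($terms as $term) {
        $result = array_merge($result, $index[strtolower($term)] ?? []);
    }
    $result = array_values(array_unique($result));
    sort($result);
    return $result;
}

$docs = [
    1 => 'dupont pvc pipe',
    2 => 'acme pvc sheet',
    3 => 'dupont nylon rope',
];
$index = buildInvertedIndex($docs);
$both = matchAll($index, ['dupont', 'pvc']);
$either = matchAny($index, ['dupont', 'pvc']);
print_r($both);
print_r($either);
```

With the library itself, the equivalent of the original call would presumably be `$tnt->searchBoolean("dupont pvc", 10)`, on the assumption that a space between terms is treated as AND in boolean mode.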

@Sianature
Author

Thanks, @somegooser. Using searchBoolean solved the problem. However, adding fuzziness broke it again: irrelevant matches reappeared.

@somegooser

Try playing with the parameters, for example enabling 'asYouType', and see whether the results get any better.

@Sianature
Author

@somegooser I tried that but had no luck! Here are my TNTSearch class parameters:
class TNTSearch
{
    public $config;
    public $asYouType = true;
    public $maxDocs = 500;
    public $tokenizer = null;
    public $index = null;
    public $stemmer = null;
    public $fuzziness = true;
    public $fuzzy_prefix_length = 2;
    public $fuzzy_max_expansions = 500;
    public $fuzzy_distance = 3;
    protected $dbh = null;
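
One possible reason for the irrelevant matches is the combination of `fuzzy_distance = 3` with `fuzzy_prefix_length = 2`. The sketch below assumes that fuzzy matching expands a keyword to indexed terms sharing its prefix and lying within the given Levenshtein distance. That mechanism is an assumption for illustration, `fuzzyExpand` is a hypothetical helper rather than a TNTSearch function, and the vocabulary is invented.

```php
<?php
// Hypothetical helper, not TNTSearch's API: expands a keyword to vocabulary
// terms that share its prefix and lie within a maximum Levenshtein distance.
function fuzzyExpand(string $keyword, array $vocabulary, int $prefixLength, int $maxDistance): array
{
    $prefix = substr($keyword, 0, $prefixLength);
    $matches = [];
    foreach ($vocabulary as $term) {
        // Skip terms that do not start with the keyword's prefix.
        if (strncmp($term, $prefix, strlen($prefix)) !== 0) {
            continue;
        }
        // Keep terms within the allowed edit distance.
        if (levenshtein($keyword, $term) <= $maxDistance) {
            $matches[] = $term;
        }
    }
    return $matches;
}

// Invented vocabulary standing in for the terms stored in an index.
$vocabulary = ['dupont', 'dupond', 'duplex', 'duct', 'dust', 'dual', 'durable', 'pvc'];

$strict = fuzzyExpand('dupont', $vocabulary, 2, 1);
$loose = fuzzyExpand('dupont', $vocabulary, 2, 3);
print_r($strict);
print_r($loose);
```

Under these assumptions, a distance of 1 keeps only close spellings such as 'dupond', while a distance of 3 also pulls in unrelated short words such as 'duct' and 'dust'. Lowering the distance or lengthening the prefix would narrow the expansion.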

@nticaric
Contributor

nticaric commented Feb 6, 2024

Can you provide us with a sample of your dataset? From the first post, it seems you are joining other tables while building the index, and you don't specify how you retrieve the documents after the search returns its results.
Usually the ->search() method is enough, and the document you are searching for is among the first 5 results.
