sql - How to improve mysql NATURAL LANGUAGE MODE search query? -


I have this query:

select * from mytable where match (name) against ("apple m1" in natural language mode)

If I search for apple m1, the result lists orange m1 first, while apple m-1 (the value that is actually stored) only shows up in third position or lower, although I assumed it should be first!

My question is: is there a way to fine-tune the MySQL search?

The best way to improve a MySQL natural language mode search is to use boolean full-text searches instead. It works the same way as the natural language mode search, but you can use additional modifiers to fine-tune your results, e.g. by using

> <

These two operators are used to change a word's contribution to the relevance value that is assigned to a row. The > operator increases the contribution and the < operator decreases it.

There is one minor difference: the boolean mode search does not order the rows automatically according to relevance, so you have to order them yourself.

select * from mytable where match (name) against (">apple m1" in boolean mode) order by match (name) against (">apple m1" in boolean mode) desc

And a remark: both versions of the fulltext search will not find m-1 if you match against m1 (even with a minimum word length setting of 2). They look for exact (usually case-insensitive) word matches, not for similar words (unless you use *). They "just" weigh the combination of (exact) words with an algorithm and, if you use them, the modifiers.

Update with additional clarification according to the comments:

If you match against apple m1, it returns all rows that contain (case-insensitive) apple or m1 in any order, e.g. m1 apple, apple m4, apple m-1, and orange m1. It will not find apples m4 or orange m-1, because these are not the same words. In the same way, like '%m-1%' wouldn't find apple m1 either. If you like, you can match against apple* to find apple and apples, but the wildcard only works at the end of a word. *apple* is not possible, you would have to use like '%apple%' then.

These rows are then ordered by a scoring algorithm, which will score words that are less common in your texts higher than common words. And if you add >apple, it will give apple a higher value. The score is just a number, and you can add it to your select, e.g. select ..., match (name) against (">apple m1" in boolean mode) as score, to get a feeling for that.
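To get a feeling for the score, it can be selected and used for the ordering at the same time. This is a minimal sketch that assumes the table and column names from the question (mytable, name) and an existing FULLTEXT index on name:

```sql
-- Assumption: mytable(name) has a fulltext index on name.
-- Show each matching row together with its relevance score,
-- with apple weighted higher than m1.
select name,
       match (name) against ('>apple m1' in boolean mode) as score
from mytable
where match (name) against ('>apple m1' in boolean mode)
order by score desc;
```

Running it once with and once without the > in front of apple should show how the modifier shifts the scores of rows that contain apple.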

There are some other things to consider:

  • Only words that have a minimum length are added to the index. That length is given by innodb_ft_min_token_size for InnoDB or ft_min_word_len for MyISAM. You should set it to e.g. 2 to include m1 (otherwise, that word will not have any effect in your search; since you found orange m1 in your example, I assume you have set it correctly).

  • - is considered a hyphen. So m-1 in your text is split into the two words m and 1 (which may or may not be included, according to your minimum word length setting, so maybe set it to 1). You can change this behaviour by adding - to the character set (see Fine-Tuning MySQL Full-Text Search, the part beginning with "Modify a character set file"), but then you will not find blue-green anymore if you search for blue and/or green.

  • The full text search uses stopwords. These are words that are not included in the index. The list includes a and i, so even with a minimum word length of 1 you would not find them. You can edit that list.
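The settings mentioned in this list can be inspected from a client session. The following is a sketch only: the index name ft_name is a made-up placeholder, and the rebuild step assumes an InnoDB table whose minimum token size was changed in the server configuration followed by a restart (the variable cannot be changed at runtime).

```sql
-- Minimum indexed word length (InnoDB and MyISAM respectively).
show variables like 'innodb_ft_min_token_size';
show variables like 'ft_min_word_len';

-- The default InnoDB stopword list (contains a and i, among others).
select * from information_schema.innodb_ft_default_stopword;

-- After changing the minimum length (or the stopword list), existing
-- fulltext indexes have to be rebuilt to pick up the new setting.
-- ft_name is a hypothetical index name; use the name of your own index.
alter table mytable drop index ft_name;
alter table mytable add fulltext index ft_name (name);
```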

Some ideas for your potential problem with m1/m-1. To adjust them to your exact requirements, you would have to add more information about your searches and your data (and maybe ask a new question), but here are some ideas:

  • You can replace user input that contains a - by including both versions in your search query: once with the -, enclosed in "", and once without. So if the user enters apple m-1, you create the search apple m1 "m-1" (that will work with or without the modified character set, but without the new character set, your minimum word length has to be 1). If the user enters m1, you should detect that and replace it with m1 "m-1" too.

  • Another alternative would be to save an additional column with the clean, hyphenless words, add that column to your full text index, and then match (name, clean_name) against ("m1" ....

  • And you can of course combine and match, e.g. if you detect a product number in your input, you can use where match(...) against(...) or product_id like 'm%1%', or where match(...) against(...) or product_id = 'm-1' or product_id = 'm1', or where match(...) against(...) or name like '%m%1%'. The latter will be a lot slower and contain a lot of noise. And such rows might not score correctly, but they will at least be in the resultset.
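The first two ideas could look roughly like the following. This is a sketch under assumptions, not a tested solution: the column type, the index name ft_name_clean and the use of replace() to strip hyphens are illustrative choices, and clean_name would have to be kept in sync whenever name changes.

```sql
-- Idea 1: search for both spellings, the hyphenated one as a phrase.
select name
from mytable
where match (name) against ('apple m1 "m-1"' in boolean mode);

-- Idea 2: keep a hyphenless copy of the name and index both columns.
-- clean_name and ft_name_clean are hypothetical names.
alter table mytable add column clean_name varchar(255);
update mytable set clean_name = replace(name, '-', '');
alter table mytable add fulltext index ft_name_clean (name, clean_name);

-- The column list in match() has to be the same as in the index.
select name,
       match (name, clean_name) against ('>apple m1' in boolean mode) as score
from mytable
where match (name, clean_name) against ('>apple m1' in boolean mode)
order by score desc;
```

With the second variant, a stored apple m-1 gets the clean_name apple m1, so a search for m1 can match it without any change to the character set.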

But as said, that will depend on your data and your requirements.

