首页 \ 问答 \ SphinxSearch完全匹配排名(SphinxSearch exact match ranking)

SphinxSearch完全匹配排名(SphinxSearch exact match ranking)

我注意到使用Sphinx 2.2.8的排序顺序中有一个奇怪的行为(与v2.3.1-beta相同的结果)。

我希望精确匹配出现在第一个位置(我为此设置了index_exact_words和expan_keywords)。

这对我下面的第一个例子有两行有效。 但是如果我添加更多行,权重会更改,我的完全匹配结果(id = 1)会比其他近似值更低!

例如,索引这两个单词(一些带有形态学libstemmer_fr的法语单词):

source nptest
{
        type                    = pgsql
        sql_host                = localhost
        sql_user                = myuser
        sql_pass                = mypassword
        sql_db                  = mydb
        sql_port                = 5432

        sql_query               = \
                                  SELECT 1, 'chien' AS val \
                                  UNION \
                                  SELECT 2, 'chienne'

        sql_field_string = val
}

index nptest
{
        type                    = plain
        mlock                   = 1
        source                  = nptest
        path                    = /var/lib/sphinx/data/nptest
        morphology              = libstemmer_fr
        index_exact_words       = 1
        expand_keywords         = 1
}

索引后(indexer --rotate nptest):

mysql> SELECT id, val, weight() FROM nptest WHERE match('chien');
+------+---------+----------+
| id   | val     | weight() |
+------+---------+----------+
|    1 | chien   |     1500 |
|    2 | chienne |     1428 |
+------+---------+----------+
2 rows in set (0.00 sec)

“chien”这个词的排名高于“chienne”=>这就是我的预期。

现在我向db添加更多行:

source nptest
{
        type                    = pgsql
        sql_host                = localhost
        sql_user                = myuser
        sql_pass                = mypassword
        sql_db                  = mydb
        sql_port                = 5432

        sql_query               = \
                SELECT 1, 'chien' AS val \
                UNION \
                SELECT 2, 'chienne' \
                UNION \
                SELECT 3, 'un beau chien' \
                UNION \
                SELECT 4, 'chien-loup'

        sql_field_string = val
}


mysql> SELECT id, val, weight() FROM nptest WHERE match('chien');
+------+---------------+----------+
| id   | val           | weight() |
+------+---------------+----------+
|    2 | chienne       |     1402 |
|    1 | chien         |     1373 |
|    3 | un beau chien |     1373 |
|    4 | chien-loup    |     1373 |
+------+---------------+----------+
4 rows in set (0.00 sec)

重新索引后,最高级别现在是“chienne”!

这是正常的行为(如果是这样的原因?)或者它是一个错误? 如果它不是错误,我怎样才能确保完全匹配将获得最高排名?


I noticed a strange behavior in the sort order using Sphinx 2.2.8 (same result with v2.3.1-beta).

I expect exact matching to appear on first position (I set index_exact_words and expan_keywords for that).

That works well on my first example below with two rows. But if I add more rows, weights change and my exact match result (id=1) gets a lower rank than other approximate one!

For example, indexing these 2 words (some french words with morphology libstemmer_fr):

source nptest
{
        type                    = pgsql
        sql_host                = localhost
        sql_user                = myuser
        sql_pass                = mypassword
        sql_db                  = mydb
        sql_port                = 5432

        sql_query               = \
                                  SELECT 1, 'chien' AS val \
                                  UNION \
                                  SELECT 2, 'chienne'

        sql_field_string = val
}

index nptest
{
        type                    = plain
        mlock                   = 1
        source                  = nptest
        path                    = /var/lib/sphinx/data/nptest
        morphology              = libstemmer_fr
        index_exact_words       = 1
        expand_keywords         = 1
}

After indexing (indexer --rotate nptest):

mysql> SELECT id, val, weight() FROM nptest WHERE match('chien');
+------+---------+----------+
| id   | val     | weight() |
+------+---------+----------+
|    1 | chien   |     1500 |
|    2 | chienne |     1428 |
+------+---------+----------+
2 rows in set (0.00 sec)

The word "chien" has a higher rank than "chienne" => that's what I expected.

Now I add more rows to my db:

source nptest
{
        type                    = pgsql
        sql_host                = localhost
        sql_user                = myuser
        sql_pass                = mypassword
        sql_db                  = mydb
        sql_port                = 5432

        sql_query               = \
                SELECT 1, 'chien' AS val \
                UNION \
                SELECT 2, 'chienne' \
                UNION \
                SELECT 3, 'un beau chien' \
                UNION \
                SELECT 4, 'chien-loup'

        sql_field_string = val
}


mysql> SELECT id, val, weight() FROM nptest WHERE match('chien');
+------+---------------+----------+
| id   | val           | weight() |
+------+---------------+----------+
|    2 | chienne       |     1402 |
|    1 | chien         |     1373 |
|    3 | un beau chien |     1373 |
|    4 | chien-loup    |     1373 |
+------+---------------+----------+
4 rows in set (0.00 sec)

After reindexing the highest rank is now on "chienne"!

Is this a normal behaviour (if so why?) or is it a bug? If it is not a bug, how can I ensure that exact matching will get the highest rank ?


原文:https://stackoverflow.com/questions/29073517
更新时间:2022-03-26 15:03

最满意答案

换行:

arrayToFill =[arrayToFill insertObject:dictionary atIndex:i];

[arrayToFill insertObject:dictionary atIndex:i];

您无需再次分配它,只需调用insert方法即可


Change the line:

arrayToFill =[arrayToFill insertObject:dictionary atIndex:i];

to

[arrayToFill insertObject:dictionary atIndex:i];

You don't need to assign it again, just call the insert method

相关问答

更多

相关文章

更多

最新问答

更多
  • 您如何使用git diff文件,并将其应用于同一存储库的副本的本地分支?(How do you take a git diff file, and apply it to a local branch that is a copy of the same repository?)
  • 将长浮点值剪切为2个小数点并复制到字符数组(Cut Long Float Value to 2 decimal points and copy to Character Array)
  • OctoberCMS侧边栏不呈现(OctoberCMS Sidebar not rendering)
  • 页面加载后对象是否有资格进行垃圾回收?(Are objects eligible for garbage collection after the page loads?)
  • codeigniter中的语言不能按预期工作(language in codeigniter doesn' t work as expected)
  • 在计算机拍照在哪里进入
  • 使用cin.get()从c ++中的输入流中丢弃不需要的字符(Using cin.get() to discard unwanted characters from the input stream in c++)
  • No for循环将在for循环中运行。(No for loop will run inside for loop. Testing for primes)
  • 单页应用程序:页面重新加载(Single Page Application: page reload)
  • 在循环中选择具有相似模式的列名称(Selecting Column Name With Similar Pattern in a Loop)
  • System.StackOverflow错误(System.StackOverflow error)
  • KnockoutJS未在嵌套模板上应用beforeRemove和afterAdd(KnockoutJS not applying beforeRemove and afterAdd on nested templates)
  • 散列包括方法和/或嵌套属性(Hash include methods and/or nested attributes)
  • android - 如何避免使用Samsung RFS文件系统延迟/冻结?(android - how to avoid lag/freezes with Samsung RFS filesystem?)
  • TensorFlow:基于索引列表创建新张量(TensorFlow: Create a new tensor based on list of indices)
  • 企业安全培训的各项内容
  • 错误:RPC失败;(error: RPC failed; curl transfer closed with outstanding read data remaining)
  • C#类名中允许哪些字符?(What characters are allowed in C# class name?)
  • NumPy:将int64值存储在np.array中并使用dtype float64并将其转换回整数是否安全?(NumPy: Is it safe to store an int64 value in an np.array with dtype float64 and later convert it back to integer?)
  • 注销后如何隐藏导航portlet?(How to hide navigation portlet after logout?)
  • 将多个行和可变行移动到列(moving multiple and variable rows to columns)
  • 提交表单时忽略基础href,而不使用Javascript(ignore base href when submitting form, without using Javascript)
  • 对setOnInfoWindowClickListener的意图(Intent on setOnInfoWindowClickListener)
  • Angular $资源不会改变方法(Angular $resource doesn't change method)
  • 在Angular 5中不是一个函数(is not a function in Angular 5)
  • 如何配置Composite C1以将.m和桌面作为同一站点提供服务(How to configure Composite C1 to serve .m and desktop as the same site)
  • 不适用:悬停在悬停时:在元素之前[复制](Don't apply :hover when hovering on :before element [duplicate])
  • 常见的python rpc和cli接口(Common python rpc and cli interface)
  • Mysql DB单个字段匹配多个其他字段(Mysql DB single field matching to multiple other fields)
  • 产品页面上的Magento Up出售对齐问题(Magento Up sell alignment issue on the products page)