How to correctly interpret the Solr similarity score?

I am aware that the similarity scores returned by Solr are relevant only for a specific query and that they have only relative meaning. Having said that, is there a way to determine the 'goodness' of a score in a global fashion?

For example: Suppose I run an MLT query and get 5 documents. Each document has a score but the fact is that the document with the highest score is not necessarily the most relevant. I want to be able to specify a threshold score below which I do not even consider the documents.

How can this threshold be determined? Is it only by empirical measurement, or can I say that a similarity score larger than 3 usually indicates a good resemblance in content, while a similarity score smaller than 1 usually means the document is completely irrelevant? Alternatively, can I say that results scoring less than 80% of a document's similarity to itself are irrelevant?
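
The relative cutoff described in the last question can be sketched as follows. This is only an illustration of the arithmetic, not a recommendation: the scores are made-up values, and `filter_by_relative_score`, `self_score`, and `ratio` are hypothetical names that are not part of Solr.

```python
def filter_by_relative_score(hits, self_score, ratio=0.8):
    """Keep (doc_id, score) pairs whose score is at least ratio * self_score."""
    cutoff = ratio * self_score
    return [(doc_id, score) for doc_id, score in hits if score >= cutoff]


# Made-up scores for illustration only; real MLT scores depend on the query.
hits = [("doc-a", 4.6), ("doc-b", 4.1), ("doc-c", 0.9)]
kept = filter_by_relative_score(hits, self_score=5.0)
print(kept)  # [('doc-a', 4.6), ('doc-b', 4.1)]
```

Whether 80% (or any other ratio) is a meaningful cutoff is exactly what the question asks, so the ratio is left as a parameter.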


Source: https://stackoverflow.com/questions/21382258
Last updated: 2022-02-14 14:02

Best answer

You can set upload_to to a function like this:

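from os.path import splitext

# generate_sha1 and get_datetime_now are not defined in this answer;
# they are assumed to be helpers available elsewhere in the project.

def upload_to_id_image(instance, filename):
    # Keep the original extension (with its leading dot), lowercased.
    extension = splitext(filename)[1].lower()
    # generate_sha1 is assumed to return a (salt, hash) pair.
    salt, hashed = generate_sha1(instance.id)
    path = 'profiles/%(id)s_%(date_now)s_' % {
        'id': instance.user.id,
        'date_now': get_datetime_now().date().strftime("%Y%m%d")}
    return '%(path)s%(hash)s%(extension)s' % {
        'path': path,
        'hash': hashed[:16],
        'extension': extension}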

Then change your field definition like this:

photo = ProcessedImageField(upload_to=upload_to_id_image,

You can of course drop the hash, but hashed file names are better for file security.
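
For readers who do not have `generate_sha1` or `get_datetime_now` at hand, here is a minimal self-contained sketch of the same naming scheme. It is an assumption-laden stand-in, not the answer's exact code: the hash is built with the standard library's `hashlib` and a random salt, the date comes from `date.today()`, and the `SimpleNamespace` objects only imitate a model instance so the function can be tried outside Django.

```python
import hashlib
import os
from datetime import date
from os.path import splitext
from types import SimpleNamespace


def upload_to_id_image(instance, filename):
    """Return 'profiles/<user id>_<YYYYMMDD>_<16 hex chars><extension>'."""
    extension = splitext(filename)[1].lower()  # includes the leading dot
    # Stand-in for generate_sha1 (assumption): SHA-1 of a random salt plus the id.
    salt = os.urandom(8).hex()
    hashed = hashlib.sha1((salt + str(instance.id)).encode("utf-8")).hexdigest()
    today = date.today().strftime("%Y%m%d")
    return "profiles/%s_%s_%s%s" % (instance.user.id, today, hashed[:16], extension)


# Fake objects that only mimic the attributes the function reads.
fake_instance = SimpleNamespace(id=7, user=SimpleNamespace(id=42))
example_path = upload_to_id_image(fake_instance, "Photo.JPG")
print(example_path)  # e.g. profiles/42_<YYYYMMDD>_<16 hex chars>.jpg
```

Because the salt is random, two uploads by the same user on the same day still get different file names.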
