首页 \ 问答 \ similar_text如何工作?(How does similar_text work?)

similar_text如何工作?(How does similar_text work?)

我刚刚发现了Similar_text功能,并且正在玩弄,但输出的百分比总是让我感到惊讶。 参见下面的例子。

我试图找到关于PHP所提及的算法的信息: similar_text() 文档

<?php
$p = 0;
similar_text('aaaaaaaaaa', 'aaaaa', $p);
echo $p . "<hr>";
//66.666666666667
//Since 5 out of 10 chars match, I would expect a 50% match

similar_text('aaaaaaaaaaaaaaaaaaaa', 'aaaaa', $p);
echo $p . "<hr>";
//40
//5 out of 20 > not 25% ?

similar_text('aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa', 'aaaaa', $p);
echo $p . "<hr>"; 
//9.5238095238095 
//5 out of 100 > not 5% ?


//Example from PHP.net
//Why is turning the strings around changing the result?

similar_text('PHP IS GREAT', 'WITH MYSQL', $p);
echo $p . "<hr>"; //27.272727272727

similar_text('WITH MYSQL', 'PHP IS GREAT', $p);
echo $p . "<hr>"; //18.181818181818

?>

任何人都可以解释这是如何实际工作的?

更新:

感谢我的意见,我发现百分比实际上是使用类似的特征数* 200 / length1 + lenght 2

Z_DVAL_PP(percent) = sim * 200.0 / (t1_len + t2_len);

所以这就解释了为什么percenatges高于预期。 有一个字符串与95中的5个结果是10,所以我可以使用。

similar_text('aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa', 'aaaaa', $p);
echo $p . "<hr>"; 
//10
//5 out of 95 = 5 * 200 / (5 + 95) = 10

但是我仍然不明白为什么PHP在转动字符串时返回不同的结果。 dfsq提供的JS代码不会这样做。 看看PHP中的源代码,我只能在以下行找到一个区别,但是我不是ac程序员。 一些洞察有什么区别,不胜感激。

在JS中

for (l = 0;(p + l < firstLength) && (q + l < secondLength) && (first.charAt(p + l) === second.charAt(q + l)); l++);

在PHP中:(php_similar_str函数)

for (l = 0; (p + l < end1) && (q + l < end2) && (p[l] == q[l]); l++);

资源:

/* {{{ proto int similar_text(string str1, string str2 [, float percent])
   Calculates the similarity between two strings */
PHP_FUNCTION(similar_text)
{
  char *t1, *t2;
  zval **percent = NULL;
  int ac = ZEND_NUM_ARGS();
  int sim;
  int t1_len, t2_len;

  if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "ss|Z", &t1, &t1_len, &t2, &t2_len, &percent) == FAILURE) {
    return;
  }

  if (ac > 2) {
    convert_to_double_ex(percent);
  }

  if (t1_len + t2_len == 0) {
    if (ac > 2) {
      Z_DVAL_PP(percent) = 0;
    }

    RETURN_LONG(0);
  }

  sim = php_similar_char(t1, t1_len, t2, t2_len);

  if (ac > 2) {
    Z_DVAL_PP(percent) = sim * 200.0 / (t1_len + t2_len);
  }

  RETURN_LONG(sim);
}
/* }}} */ 


/* {{{ php_similar_str
 */
static void php_similar_str(const char *txt1, int len1, const char *txt2, int len2, int *pos1, int *pos2, int *max)
{
  char *p, *q;
  char *end1 = (char *) txt1 + len1;
  char *end2 = (char *) txt2 + len2;
  int l;

  *max = 0;
  for (p = (char *) txt1; p < end1; p++) {
    for (q = (char *) txt2; q < end2; q++) {
      for (l = 0; (p + l < end1) && (q + l < end2) && (p[l] == q[l]); l++);
      if (l > *max) {
        *max = l;
        *pos1 = p - txt1;
        *pos2 = q - txt2;
      }
    }
  }
}
/* }}} */


/* {{{ php_similar_char
 */
static int php_similar_char(const char *txt1, int len1, const char *txt2, int len2)
{
  int sum;
  int pos1, pos2, max;

  php_similar_str(txt1, len1, txt2, len2, &pos1, &pos2, &max);

  if ((sum = max)) {
    if (pos1 && pos2) {
      sum += php_similar_char(txt1, pos1,
                  txt2, pos2);
    }
    if ((pos1 + max < len1) && (pos2 + max < len2)) {
      sum += php_similar_char(txt1 + pos1 + max, len1 - pos1 - max,
                  txt2 + pos2 + max, len2 - pos2 - max);
    }
  }

  return sum;
}
/* }}} */

源码在Javascript中: 类似的文本端口到javascript


I just found the similar_text function and was playing around with it, but the percentage output always suprises me. See the examples below.

I tried to find information on the algorithm used as mentioned on php: similar_text()Docs:

<?php
$p = 0;
similar_text('aaaaaaaaaa', 'aaaaa', $p);
echo $p . "<hr>";
//66.666666666667
//Since 5 out of 10 chars match, I would expect a 50% match

similar_text('aaaaaaaaaaaaaaaaaaaa', 'aaaaa', $p);
echo $p . "<hr>";
//40
//5 out of 20 > not 25% ?

similar_text('aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa', 'aaaaa', $p);
echo $p . "<hr>"; 
//9.5238095238095 
//5 out of 100 > not 5% ?


//Example from PHP.net
//Why is turning the strings around changing the result?

similar_text('PHP IS GREAT', 'WITH MYSQL', $p);
echo $p . "<hr>"; //27.272727272727

similar_text('WITH MYSQL', 'PHP IS GREAT', $p);
echo $p . "<hr>"; //18.181818181818

?>

Can anybody explain how this actually works?

Update:

Thanks to the comments I found that the percentage is actually calculated using the number of similar charactors * 200 / length1 + lenght 2

Z_DVAL_PP(percent) = sim * 200.0 / (t1_len + t2_len);

So that explains why the percenatges are higher then expected. With a string with 5 out of 95 it turns out 10, so that I can use.

similar_text('aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa', 'aaaaa', $p);
echo $p . "<hr>"; 
//10
//5 out of 95 = 5 * 200 / (5 + 95) = 10

But I still cant figure out why PHP returns a different result on turning the strings around. The JS code provided by dfsq doesn't do this. Looking at the source code in PHP I can only find a difference in the following line, but i'm not a c programmer. Some insight in what the difference is, would be appreciated.

In JS:

for (l = 0;(p + l < firstLength) && (q + l < secondLength) && (first.charAt(p + l) === second.charAt(q + l)); l++);

In PHP: (php_similar_str function)

for (l = 0; (p + l < end1) && (q + l < end2) && (p[l] == q[l]); l++);

Source:

/* {{{ proto int similar_text(string str1, string str2 [, float percent])
   Calculates the similarity between two strings */
PHP_FUNCTION(similar_text)
{
  char *t1, *t2;
  zval **percent = NULL;
  int ac = ZEND_NUM_ARGS();
  int sim;
  int t1_len, t2_len;

  if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "ss|Z", &t1, &t1_len, &t2, &t2_len, &percent) == FAILURE) {
    return;
  }

  if (ac > 2) {
    convert_to_double_ex(percent);
  }

  if (t1_len + t2_len == 0) {
    if (ac > 2) {
      Z_DVAL_PP(percent) = 0;
    }

    RETURN_LONG(0);
  }

  sim = php_similar_char(t1, t1_len, t2, t2_len);

  if (ac > 2) {
    Z_DVAL_PP(percent) = sim * 200.0 / (t1_len + t2_len);
  }

  RETURN_LONG(sim);
}
/* }}} */ 


/* {{{ php_similar_str
 */
static void php_similar_str(const char *txt1, int len1, const char *txt2, int len2, int *pos1, int *pos2, int *max)
{
  char *p, *q;
  char *end1 = (char *) txt1 + len1;
  char *end2 = (char *) txt2 + len2;
  int l;

  *max = 0;
  for (p = (char *) txt1; p < end1; p++) {
    for (q = (char *) txt2; q < end2; q++) {
      for (l = 0; (p + l < end1) && (q + l < end2) && (p[l] == q[l]); l++);
      if (l > *max) {
        *max = l;
        *pos1 = p - txt1;
        *pos2 = q - txt2;
      }
    }
  }
}
/* }}} */


/* {{{ php_similar_char
 */
static int php_similar_char(const char *txt1, int len1, const char *txt2, int len2)
{
  int sum;
  int pos1, pos2, max;

  php_similar_str(txt1, len1, txt2, len2, &pos1, &pos2, &max);

  if ((sum = max)) {
    if (pos1 && pos2) {
      sum += php_similar_char(txt1, pos1,
                  txt2, pos2);
    }
    if ((pos1 + max < len1) && (pos2 + max < len2)) {
      sum += php_similar_char(txt1 + pos1 + max, len1 - pos1 - max,
                  txt2 + pos2 + max, len2 - pos2 - max);
    }
  }

  return sum;
}
/* }}} */

Source in Javascript: similar text port to javascript


原文:https://stackoverflow.com/questions/14136349
更新时间:2022-03-06 06:03

最满意答案

要么你作为一个矩阵使用它:

holder<-matrix(0,nrow=3,ncol=3)
for(i in 1:3){
    apple<-c(i+1, i*2, i^3)
    holder[,i]<-apple  # columnwise, that's how sapply does it too
}

或者您使用列表:

holder <- vector('list',3)
for(i in 1:3){
    apple<-c(i+1, i*2, i^3)
    holder[[i]]<-apple
}

或者你只是做R方式:

holder <- sapply(1:3,function(i) c(i+1, i*2,i^3))
holder.list <- sapply(1:3,function(i) c(i+1, i*2,i^3),simplify=FALSE)

旁注:如果你在R中遇到这个非常基本的问题,我强烈建议你浏览一下你在网上找到的任何介绍。 你得到一个列表:

我在哪里可以找到各种实现的有用的R教程?


either you work with it as a matrix :

holder<-matrix(0,nrow=3,ncol=3)
for(i in 1:3){
    apple<-c(i+1, i*2, i^3)
    holder[,i]<-apple  # columnwise, that's how sapply does it too
}

Or you use lists:

holder <- vector('list',3)
for(i in 1:3){
    apple<-c(i+1, i*2, i^3)
    holder[[i]]<-apple
}

Or you just do it the R way :

holder <- sapply(1:3,function(i) c(i+1, i*2,i^3))
holder.list <- sapply(1:3,function(i) c(i+1, i*2,i^3),simplify=FALSE)

On a sidenote : if you struggle with this very basic problem in R, I strongly recommend you to browse through any of the introductions you find on the web. You get a list of them at :

Where can I find useful R tutorials with various implementations?

相关问答

更多

相关文章

更多

最新问答

更多
  • 获取MVC 4使用的DisplayMode后缀(Get the DisplayMode Suffix being used by MVC 4)
  • 如何通过引用返回对象?(How is returning an object by reference possible?)
  • 矩阵如何存储在内存中?(How are matrices stored in memory?)
  • 每个请求的Java新会话?(Java New Session For Each Request?)
  • css:浮动div中重叠的标题h1(css: overlapping headlines h1 in floated divs)
  • 无论图像如何,Caffe预测同一类(Caffe predicts same class regardless of image)
  • xcode语法颜色编码解释?(xcode syntax color coding explained?)
  • 在Access 2010 Runtime中使用Office 2000校对工具(Use Office 2000 proofing tools in Access 2010 Runtime)
  • 从单独的Web主机将图像传输到服务器上(Getting images onto server from separate web host)
  • 从旧版本复制文件并保留它们(旧/新版本)(Copy a file from old revision and keep both of them (old / new revision))
  • 西安哪有PLC可控制编程的培训
  • 在Entity Framework中选择基类(Select base class in Entity Framework)
  • 在Android中出现错误“数据集和渲染器应该不为null,并且应该具有相同数量的系列”(Error “Dataset and renderer should be not null and should have the same number of series” in Android)
  • 电脑二级VF有什么用
  • Datamapper Ruby如何添加Hook方法(Datamapper Ruby How to add Hook Method)
  • 金华英语角.
  • 手机软件如何制作
  • 用于Android webview中图像保存的上下文菜单(Context Menu for Image Saving in an Android webview)
  • 注意:未定义的偏移量:PHP(Notice: Undefined offset: PHP)
  • 如何读R中的大数据集[复制](How to read large dataset in R [duplicate])
  • Unity 5 Heighmap与地形宽度/地形长度的分辨率关系?(Unity 5 Heighmap Resolution relationship to terrain width / terrain length?)
  • 如何通知PipedOutputStream线程写入最后一个字节的PipedInputStream线程?(How to notify PipedInputStream thread that PipedOutputStream thread has written last byte?)
  • python的访问器方法有哪些
  • DeviceNetworkInformation:哪个是哪个?(DeviceNetworkInformation: Which is which?)
  • 在Ruby中对组合进行排序(Sorting a combination in Ruby)
  • 网站开发的流程?
  • 使用Zend Framework 2中的JOIN sql检索数据(Retrieve data using JOIN sql in Zend Framework 2)
  • 条带格式类型格式模式编号无法正常工作(Stripes format type format pattern number not working properly)
  • 透明度错误IE11(Transparency bug IE11)
  • linux的基本操作命令。。。