How does similar_text work?

I just found the similar_text function and was playing around with it, but the percentage output always surprises me. See the examples below.
I tried to find information on the algorithm it uses, as mentioned in the PHP documentation for similar_text():
<?php
$p = 0;
similar_text('aaaaaaaaaa', 'aaaaa', $p);
echo $p . "<hr>";
//66.666666666667
//Since 5 out of 10 chars match, I would expect a 50% match

similar_text('aaaaaaaaaaaaaaaaaaaa', 'aaaaa', $p);
echo $p . "<hr>";
//40
//5 out of 20 > not 25% ?

similar_text('aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa', 'aaaaa', $p);
echo $p . "<hr>"; 
//9.5238095238095 
//5 out of 100 > not 5% ?


//Example from PHP.net
//Why is turning the strings around changing the result?

similar_text('PHP IS GREAT', 'WITH MYSQL', $p);
echo $p . "<hr>"; //27.272727272727

similar_text('WITH MYSQL', 'PHP IS GREAT', $p);
echo $p . "<hr>"; //18.181818181818

?>
 
Can anybody explain how this actually works? 
Update: 
Thanks to the comments, I found that the percentage is actually calculated as the number of similar characters * 200 / (length1 + length2):
Z_DVAL_PP(percent) = sim * 200.0 / (t1_len + t2_len);
 
So that explains why the percentages are higher than expected. Comparing a 95-character string with a 5-character one gives 10, which is a result I can work with.
similar_text('aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa', 'aaaaa', $p);
echo $p . "<hr>"; 
//10
//5 out of 95 = 5 * 200 / (5 + 95) = 10
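
The formula can be checked with a few lines of arithmetic. Below is a minimal sketch in Python (used here only for illustration, not PHP itself); `similarity_percent` is a made-up helper name, and it assumes `sim` is the match count that similar_text returns, which is 5 in every 'aaaaa' example above.

```python
def similarity_percent(sim, len1, len2):
    """Percentage as computed in the C source: sim * 200.0 / (len1 + len2)."""
    if len1 + len2 == 0:
        # The C source sets the percentage to 0 when both strings are empty.
        return 0.0
    return sim * 200.0 / (len1 + len2)


# (sim, length of first string, length of second string) for the examples above
for sim, len1, len2 in [(5, 10, 5), (5, 20, 5), (5, 100, 5), (5, 95, 5)]:
    print(len1, len2, round(similarity_percent(sim, len1, len2), 2))
```

Because the match count is doubled and divided by the combined length, the result is the share of all characters in both strings that take part in a match, not the share of the longer string alone.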
 
But I still can't figure out why PHP returns a different result when the strings are swapped. The JS code provided by dfsq doesn't do this. Looking at the PHP source code, the only difference I can find is in the following line, but I'm not a C programmer. Some insight into what the difference is would be appreciated.
In JS: 
for (l = 0;(p + l < firstLength) && (q + l < secondLength) && (first.charAt(p + l) === second.charAt(q + l)); l++);
 
In PHP: (php_similar_str function) 
for (l = 0; (p + l < end1) && (q + l < end2) && (p[l] == q[l]); l++);
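
The two loop lines appear to express the same comparison: the C version works with pointers (`p[l]` is the character `l` positions past `p`), while the JavaScript version uses integer offsets with `charAt`. What does depend on argument order is the scan around that line. The sketch below is a Python rendering of the scan in php_similar_str, assuming plain ASCII input (PHP compares bytes, Python compares code points); `first_longest_match` is a name made up for this illustration.

```python
def first_longest_match(s1, s2):
    """Return (pos1, pos2, length) of the first longest common substring.

    Mirrors the loops in php_similar_str: the outer loop walks s1, the
    inner loop walks s2, and a match only replaces the current best when
    it is strictly longer, so ties go to the earliest position in s1.
    """
    best = (0, 0, 0)
    for p in range(len(s1)):
        for q in range(len(s2)):
            l = 0
            # Same condition as the C and JS loops, written with indices.
            while p + l < len(s1) and q + l < len(s2) and s1[p + l] == s2[q + l]:
                l += 1
            if l > best[2]:
                best = (p, q, l)
    return best


print(first_longest_match('PHP IS GREAT', 'WITH MYSQL'))  # (1, 3, 1): 'H'
print(first_longest_match('WITH MYSQL', 'PHP IS GREAT'))  # (1, 4, 1): 'I'
```

All common substrings of these two strings have length 1, so the tie is broken by scan order alone: one call anchors on 'H', the other on 'I'.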
 
Source: 
/* {{{ proto int similar_text(string str1, string str2 [, float percent])
   Calculates the similarity between two strings */
PHP_FUNCTION(similar_text)
{
  char *t1, *t2;
  zval **percent = NULL;
  int ac = ZEND_NUM_ARGS();
  int sim;
  int t1_len, t2_len;

  if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "ss|Z", &t1, &t1_len, &t2, &t2_len, &percent) == FAILURE) {
    return;
  }

  if (ac > 2) {
    convert_to_double_ex(percent);
  }

  if (t1_len + t2_len == 0) {
    if (ac > 2) {
      Z_DVAL_PP(percent) = 0;
    }

    RETURN_LONG(0);
  }

  sim = php_similar_char(t1, t1_len, t2, t2_len);

  if (ac > 2) {
    Z_DVAL_PP(percent) = sim * 200.0 / (t1_len + t2_len);
  }

  RETURN_LONG(sim);
}
/* }}} */ 


/* {{{ php_similar_str
 */
static void php_similar_str(const char *txt1, int len1, const char *txt2, int len2, int *pos1, int *pos2, int *max)
{
  char *p, *q;
  char *end1 = (char *) txt1 + len1;
  char *end2 = (char *) txt2 + len2;
  int l;

  *max = 0;
  for (p = (char *) txt1; p < end1; p++) {
    for (q = (char *) txt2; q < end2; q++) {
      for (l = 0; (p + l < end1) && (q + l < end2) && (p[l] == q[l]); l++);
      if (l > *max) {
        *max = l;
        *pos1 = p - txt1;
        *pos2 = q - txt2;
      }
    }
  }
}
/* }}} */


/* {{{ php_similar_char
 */
static int php_similar_char(const char *txt1, int len1, const char *txt2, int len2)
{
  int sum;
  int pos1, pos2, max;

  php_similar_str(txt1, len1, txt2, len2, &pos1, &pos2, &max);

  if ((sum = max)) {
    if (pos1 && pos2) {
      sum += php_similar_char(txt1, pos1,
                  txt2, pos2);
    }
    if ((pos1 + max < len1) && (pos2 + max < len2)) {
      sum += php_similar_char(txt1 + pos1 + max, len1 - pos1 - max,
                  txt2 + pos2 + max, len2 - pos2 - max);
    }
  }

  return sum;
}
/* }}} */
 
Source in JavaScript: similar_text port to JavaScript
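
For experimenting without a PHP build, the C source above can be ported almost line for line. The following Python sketch is one such port, not the PHP implementation itself: the function names are chosen for this example, strings are assumed to be ASCII (PHP works on bytes), and slicing stands in for the pointer arithmetic in the C code.

```python
def similar_str(s1, s2):
    """First longest common substring as (pos1, pos2, length), like php_similar_str."""
    pos1 = pos2 = longest = 0
    for p in range(len(s1)):
        for q in range(len(s2)):
            l = 0
            while p + l < len(s1) and q + l < len(s2) and s1[p + l] == s2[q + l]:
                l += 1
            if l > longest:
                pos1, pos2, longest = p, q, l
    return pos1, pos2, longest


def similar_char(s1, s2):
    """Count matching characters recursively, like php_similar_char."""
    pos1, pos2, longest = similar_str(s1, s2)
    total = longest
    if total:
        # Recurse on the parts to the left of the match, if both exist.
        if pos1 and pos2:
            total += similar_char(s1[:pos1], s2[:pos2])
        # Recurse on the parts to the right of the match, if both exist.
        if pos1 + longest < len(s1) and pos2 + longest < len(s2):
            total += similar_char(s1[pos1 + longest:], s2[pos2 + longest:])
    return total


def similar_text(s1, s2):
    """Return (match count, percentage) following PHP_FUNCTION(similar_text)."""
    if len(s1) + len(s2) == 0:
        return 0, 0.0
    sim = similar_char(s1, s2)
    return sim, sim * 200.0 / (len(s1) + len(s2))


for a, b in [('PHP IS GREAT', 'WITH MYSQL'), ('WITH MYSQL', 'PHP IS GREAT')]:
    sim, pct = similar_text(a, b)
    print(a, '|', b, '->', sim, round(pct, 2))
```

Tracing the two calls suggests where the asymmetry comes from. With 'PHP IS GREAT' first, the scan anchors on 'H', and the recursion to the right then finds the space and 'S', giving 3 matches. With 'WITH MYSQL' first, it anchors on 'I', the right-hand recursion matches 'T' at the very end of the other string, and nothing remains to recurse into, giving 2 matches. Only characters to the left of a match on both sides, or to the right on both sides, are ever compared again, so which match is picked first changes the total.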

Original post: https://stackoverflow.com/questions/14136349

Updated: 2022-03-06 06:03

Best answer

Either you work with it as a matrix:
holder<-matrix(0,nrow=3,ncol=3)
for(i in 1:3){
    apple<-c(i+1, i*2, i^3)
    holder[,i]<-apple  # columnwise, that's how sapply does it too
}
 
Or you use lists: 
holder <- vector('list',3)
for(i in 1:3){
    apple<-c(i+1, i*2, i^3)
    holder[[i]]<-apple
}
 
Or you just do it the R way:
holder <- sapply(1:3,function(i) c(i+1, i*2,i^3))
holder.list <- sapply(1:3,function(i) c(i+1, i*2,i^3),simplify=FALSE)
 
As a side note: if you struggle with this very basic problem in R, I strongly recommend browsing through one of the introductions available on the web. You can find a list of them at:
Where can I find useful R tutorials with various implementations?
