How does similar_text work?
I just found the similar_text function and was playing around with it, but the percentage output always surprises me. See the examples below.
I tried to find information on the algorithm used, as mentioned in the PHP docs for similar_text():

    <?php
    $p = 0;

    // str_repeat('a', n) builds a string of n 'a' characters
    similar_text(str_repeat('a', 10), 'aaaaa', $p);
    echo $p . "<hr>";
    //66.666666666667
    //Since 5 out of 10 chars match, I would expect a 50% match

    similar_text(str_repeat('a', 20), 'aaaaa', $p);
    echo $p . "<hr>";
    //40
    //5 out of 20 > not 25% ?

    similar_text(str_repeat('a', 100), 'aaaaa', $p);
    echo $p . "<hr>";
    //9.5238095238095
    //5 out of 100 > not 5% ?

    //Example from PHP.net
    //Why is turning the strings around changing the result?

    similar_text('PHP IS GREAT', 'WITH MYSQL', $p);
    echo $p . "<hr>"; //27.272727272727

    similar_text('WITH MYSQL', 'PHP IS GREAT', $p);
    echo $p . "<hr>"; //18.181818181818
    ?>
Can anybody explain how this actually works?
Update:
Thanks to the comments, I found that the percentage is actually calculated as the number of similar characters * 200 / (length1 + length2):

    Z_DVAL_PP(percent) = sim * 200.0 / (t1_len + t2_len);
So that explains why the percentages are higher than expected. Comparing a 5-character string with a 95-character string gives 10, which I can work with.

    similar_text(str_repeat('a', 95), 'aaaaa', $p);
    echo $p . "<hr>";
    //10
    //5 out of 95 = 5 * 200 / (5 + 95) = 10
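The arithmetic behind those percentages can be sketched in a few lines of JavaScript. This only mirrors the formula quoted from the C source; the function name `similarityPercent` is made up for illustration, and the count of matching characters (`sim`) is passed in by hand rather than computed.

```javascript
// Hypothetical helper mirroring: percent = sim * 200.0 / (t1_len + t2_len)
function similarityPercent(sim, len1, len2) {
  // The C source reports 0 when both strings are empty.
  if (len1 + len2 === 0) return 0;
  return (sim * 200) / (len1 + len2);
}

// 5 matching characters in each of the examples from the question
console.log(similarityPercent(5, 10, 5));  // about 66.67
console.log(similarityPercent(5, 20, 5));  // 40
console.log(similarityPercent(5, 100, 5)); // about 9.52
console.log(similarityPercent(5, 95, 5));  // 10
```

Because the matching characters are counted once but divided by the combined length of both strings, the factor 200 (rather than 100) makes two identical strings come out at 100%.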
But I still can't figure out why PHP returns a different result when the strings are swapped. The JS code provided by dfsq doesn't do this. Looking at the PHP source code, the only difference I can find is in the following line, but I'm not a C programmer. Some insight into what the difference is would be appreciated.
In JS:
for (l = 0;(p + l < firstLength) && (q + l < secondLength) && (first.charAt(p + l) === second.charAt(q + l)); l++);
In PHP: (php_similar_str function)
for (l = 0; (p + l < end1) && (q + l < end2) && (p[l] == q[l]); l++);
Source:
    /* {{{ proto int similar_text(string str1, string str2 [, float percent])
       Calculates the similarity between two strings */
    PHP_FUNCTION(similar_text)
    {
        char *t1, *t2;
        zval **percent = NULL;
        int ac = ZEND_NUM_ARGS();
        int sim;
        int t1_len, t2_len;

        if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "ss|Z", &t1, &t1_len, &t2, &t2_len, &percent) == FAILURE) {
            return;
        }

        if (ac > 2) {
            convert_to_double_ex(percent);
        }

        if (t1_len + t2_len == 0) {
            if (ac > 2) {
                Z_DVAL_PP(percent) = 0;
            }

            RETURN_LONG(0);
        }

        sim = php_similar_char(t1, t1_len, t2, t2_len);

        if (ac > 2) {
            Z_DVAL_PP(percent) = sim * 200.0 / (t1_len + t2_len);
        }

        RETURN_LONG(sim);
    }
    /* }}} */

    /* {{{ php_similar_str */
    static void php_similar_str(const char *txt1, int len1, const char *txt2, int len2, int *pos1, int *pos2, int *max)
    {
        char *p, *q;
        char *end1 = (char *) txt1 + len1;
        char *end2 = (char *) txt2 + len2;
        int l;

        *max = 0;
        for (p = (char *) txt1; p < end1; p++) {
            for (q = (char *) txt2; q < end2; q++) {
                for (l = 0; (p + l < end1) && (q + l < end2) && (p[l] == q[l]); l++);
                if (l > *max) {
                    *max = l;
                    *pos1 = p - txt1;
                    *pos2 = q - txt2;
                }
            }
        }
    }
    /* }}} */

    /* {{{ php_similar_char */
    static int php_similar_char(const char *txt1, int len1, const char *txt2, int len2)
    {
        int sum;
        int pos1, pos2, max;

        php_similar_str(txt1, len1, txt2, len2, &pos1, &pos2, &max);

        if ((sum = max)) {
            if (pos1 && pos2) {
                sum += php_similar_char(txt1, pos1, txt2, pos2);
            }
            if ((pos1 + max < len1) && (pos2 + max < len2)) {
                sum += php_similar_char(txt1 + pos1 + max, len1 - pos1 - max,
                                        txt2 + pos2 + max, len2 - pos2 - max);
            }
        }

        return sum;
    }
    /* }}} */
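To make the control flow of the C source above easier to follow, here is a rough JavaScript transcription of `php_similar_str` and `php_similar_char`. It is a sketch, not the port by dfsq mentioned in the question: the names `longestCommon` and `similarChars` are invented for illustration, and it compares UTF-16 code units where the C code compares bytes, so it should only be expected to agree with PHP on plain ASCII input. In the C loop, `p[l]` reads the character `l` positions past the pointer `p`, which here becomes `a[p + l]` with `p` as an index.

```javascript
// Find the FIRST longest common substring, scanning `a` in the outer loop
// and `b` in the inner loop (same order as php_similar_str).
function longestCommon(a, b) {
  let max = 0, pos1 = 0, pos2 = 0;
  for (let p = 0; p < a.length; p++) {
    for (let q = 0; q < b.length; q++) {
      let l = 0;
      while (p + l < a.length && q + l < b.length && a[p + l] === b[q + l]) l++;
      // Strictly greater: on a tie, the earliest match found is kept.
      if (l > max) { max = l; pos1 = p; pos2 = q; }
    }
  }
  return { max, pos1, pos2 };
}

// Count matching characters: take the longest common substring, then
// recurse on the parts to its left and to its right (as php_similar_char does).
function similarChars(a, b) {
  const { max, pos1, pos2 } = longestCommon(a, b);
  if (max === 0) return 0;
  let sum = max;
  if (pos1 > 0 && pos2 > 0) {
    sum += similarChars(a.slice(0, pos1), b.slice(0, pos2));
  }
  if (pos1 + max < a.length && pos2 + max < b.length) {
    sum += similarChars(a.slice(pos1 + max), b.slice(pos2 + max));
  }
  return sum;
}

console.log(similarChars('PHP IS GREAT', 'WITH MYSQL')); // 3
console.log(similarChars('WITH MYSQL', 'PHP IS GREAT')); // 2
```

Tracing the two calls suggests where the asymmetry comes from. All common substrings of these two strings have length 1, so the match that is kept is simply the first one found, and that depends on which string drives the outer loop. With 'PHP IS GREAT' first, the initial match is 'H', and the remainders to its right still share a space and an 'S', for a total of 3. With 'WITH MYSQL' first, the initial match is 'I', which leaves 'TH MYSQL' and 'S GREAT'; these then match on 'T' at the very end of the second string, which cuts off everything else, for a total of 2. Plugging 3 and 2 into sim * 200 / (12 + 10) gives the 27.27 and 18.18 from the question.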
Source in JavaScript: similar_text port to JavaScript
Original: https://stackoverflow.com/questions/14136349
Best answer
Either you work with it as a matrix:

    holder <- matrix(0, nrow = 3, ncol = 3)
    for (i in 1:3) {
        apple <- c(i + 1, i * 2, i^3)
        holder[, i] <- apple  # columnwise, that's how sapply does it too
    }

Or you use lists:

    holder <- vector('list', 3)
    for (i in 1:3) {
        apple <- c(i + 1, i * 2, i^3)
        holder[[i]] <- apple
    }

Or you just do it the R way:

    holder <- sapply(1:3, function(i) c(i + 1, i * 2, i^3))
    holder.list <- sapply(1:3, function(i) c(i + 1, i * 2, i^3), simplify = FALSE)

On a side note: if you struggle with this very basic problem in R, I strongly recommend that you browse through any of the introductions you can find on the web. You can find a list of them at:
Where can I find useful R tutorials with various implementations?