Inter-document similarity (cosine similarity)
I am trying to write a program that finds the similarity between two document files. To do this, I am following this link and a posting from
However, the following error is raised:
"list object is not callable"
at the line
test(tf_idf_matrix,count,nltkutil.cosine_distance)
I am using one file as the training set and the other file as the test set. My goal is to use test() to output the cosine similarity between the two documents using tf-idf. My code is as follows:
def test(tdMatrix,count,fsim):
    sims=[]
    sims = np.zeros((len(tdMatrix), count))
    for i in range(len(tdMatrix)):
        for j in range(count):
            doc1 = np.asarray(tdMatrix[tdMatrix[i], :].todense()).reshape(-1)
            doc2 = np.asarray(tdMatrix[tdMatrix[j], :].todense()).reshape(-1)
            sims[i, j] = fsim(doc1, doc2)
    print sims

def main():
    file_set=["corpusA.txt","corpusB.txt"]
    train=[]
    test=[]
    for file1 in file_set:
        s="x"+file1
        preprocess(file1,s)
    count_vectorizer = CountVectorizer()
    m=open("xcorpusA.txt",'r')
    for i in m:
        train.append(i.strip())
    #print doc
    count_vectorizer.fit_transform(train)
    m1=open("xcorpusB.txt",'r')
    for i in m1:
        test.append(i.strip())
    freq_term_matrix = count_vectorizer.transform(test)
    #print freq_term_matrix.todense()
    tfidf = TfidfTransformer(norm="l2")
    tfidf.fit(freq_term_matrix)
    #print "IDF:", tfidf.idf_
    tf_idf_matrix = tfidf.transform(freq_term_matrix)
    print (tf_idf_matrix.toarray())
    count=0
    for i in tf_idf_matrix.toarray():
        for j in i:
            count+=1
        break
    print "Results with Cosine Distance Similarity Measure"
    test(tf_idf_matrix,count,nltkutil.cosine_distance)

if __name__ == "__main__":
    main()
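A likely cause of the error, judging from the code above: main() assigns test=[], which makes test a local list that shadows the test() function, so the later call test(tf_idf_matrix, ...) tries to call a list. Renaming the list avoids the clash. The sketch below shows the idea in plain Python. The names test_docs, pairwise_similarity, and cosine_similarity are hypothetical, and raw term counts stand in for the tf-idf matrix so that the example runs without scikit-learn.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length numeric vectors.

    Returns 0.0 if either vector is all zeros.
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def pairwise_similarity(rows, fsim):
    """Return the len(rows) x len(rows) matrix of fsim(row_i, row_j)."""
    return [[fsim(a, b) for b in rows] for a in rows]


def main():
    # Named test_docs (not test) so the list does not shadow any function.
    test_docs = ["the cat sat", "the cat sat", "dogs bark"]
    vocabulary = sorted({w for d in test_docs for w in d.split()})
    # Assumption: raw term counts used as a stand-in for tf-idf weights.
    rows = [[d.split().count(w) for w in vocabulary] for d in test_docs]
    return pairwise_similarity(rows, cosine_similarity)


if __name__ == "__main__":
    for row in main():
        print(row)
```

With a SciPy sparse matrix such as the one returned by TfidfTransformer, the row count is tdMatrix.shape[0] rather than len(tdMatrix), and a single row is selected with tdMatrix[i, :].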
Source: https://stackoverflow.com/questions/21504793
Best answer
Create a divided panel that has two containers: one container (A) for the dynamically added components and another container for the "Add new" button. Add new components to container A.
The code below illustrates this concept for your situation. Use at your own risk :)
import javax.swing.*;
import java.awt.*;
import java.awt.event.*;

public class Display extends JFrame {
    Box upperBox = new Box(BoxLayout.X_AXIS);
    Box dynamicBox = new Box(BoxLayout.Y_AXIS);
    Box staticBox = new Box(BoxLayout.X_AXIS);

    public Display() {
        super("Test");
        setTitle("Test");
        setSize(800,800);
        setResizable(false);
        setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
        initComponents();
    }

    private void initComponents() {
        //This will be the parent panel for other panels.
        JPanel panel = new JPanel();
        panel.setLayout(new BoxLayout(panel, BoxLayout.Y_AXIS));
        upperBox.add(new JLabel("Resource"));
        upperBox.add(new JComboBox<>(new String[] { "option1", "option2", "option3",}));
        upperBox.add(new JLabel("Something"));
        panel.add(upperBox);
        staticBox.add(new JButton(new AddResourceAction("Add new")));
        panel.add(dynamicBox); //just add this box now, it will be filled later with components
        panel.add(staticBox);
        add(panel);
    }

    class AddResourceAction extends AbstractAction {
        public AddResourceAction(String n) {
            super(n);
        }

        @Override
        public void actionPerformed(ActionEvent e) {
            Box box = new Box(BoxLayout.X_AXIS);
            box.add(new JLabel("Resource"));
            box.add(new JComboBox<>( new String[] { "option1", "option2", "option3",}));
            box.add(new JLabel("Something"));
            dynamicBox.add(box);
            revalidate();
        }
    }

    public static void main(String[] args) {
        /*display panel*/
        SwingUtilities.invokeLater(new Runnable() {
            @Override
            public void run() {
                new Display().setVisible(true);
            }
        });
    }
}