首页 \ 问答 \ 文档间相似性(余弦相似度)(Inter document similarity(cosine similarity))

文档间相似性(余弦相似度)(Inter document similarity(cosine similarity))

我正在尝试编写一个程序来查找文件的2个文件之间的相似性。 出于这个原因,我正在关注链接和发布

但是,出现了一个错误

"list object is not callable"

test(tf_idf_matrix,count,nltkutil.cosine_distance)

线。

我使用一个文件作为列车集和其他文件作为测试集,我的目标是使用test()使用tf-idf输出2个文档之间的余弦相似度。

我的代码如下:

def test(tdMatrix,count,fsim):

    sims=[] 
    sims = np.zeros((len(tdMatrix), count))

    for i in range(len(tdMatrix)):
        for j in range(count):
                doc1 = np.asarray(tdMatrix[tdMatrix[i], :].todense()).reshape(-1)
                doc2 = np.asarray(tdMatrix[tdMatrix[j], :].todense()).reshape(-1)

                sims[i, j] = fsim(doc1, doc2)

            print sims

def main():

    file_set=["corpusA.txt","corpusB.txt"]
    train=[]
    test=[]

    for file1 in file_set:
        s="x"+file1
        preprocess(file1,s)

    count_vectorizer = CountVectorizer()
    m=open("xcorpusA.txt",'r')
    for i in m:
        train.append(i.strip())
    #print doc
    count_vectorizer.fit_transform(train)


    m1=open("xcorpusB.txt",'r')
    for i in m1:
        test.append(i.strip())

    freq_term_matrix = count_vectorizer.transform(test)
    #print freq_term_matrix.todense()

    tfidf = TfidfTransformer(norm="l2")
    tfidf.fit(freq_term_matrix)

    #print "IDF:", tfidf.idf_

    tf_idf_matrix = tfidf.transform(freq_term_matrix)
    print (tf_idf_matrix.toarray())

    count=0

    for i in tf_idf_matrix.toarray():
        for j in i:
            count+=1    
        break

    print "Results with Cosine Distance Similarity Measure"
    test(tf_idf_matrix,count,nltkutil.cosine_distance)


if __name__ == "__main__":
    main()

I am trying to write a program to find the similarity between 2 files of document. For this reason, I am following this link and a posting from

But, an error is shown up which says

"list object is not callable"

at

test(tf_idf_matrix,count,nltkutil.cosine_distance)

line.

I am using one file as train set and other file as test set and my objective is to use the test() to output the cosine similarity between 2 documents using tf-idf.

My code is following:

def test(tdMatrix,count,fsim):

    sims=[] 
    sims = np.zeros((len(tdMatrix), count))

    for i in range(len(tdMatrix)):
        for j in range(count):
                doc1 = np.asarray(tdMatrix[tdMatrix[i], :].todense()).reshape(-1)
                doc2 = np.asarray(tdMatrix[tdMatrix[j], :].todense()).reshape(-1)

                sims[i, j] = fsim(doc1, doc2)

            print sims

def main():

    file_set=["corpusA.txt","corpusB.txt"]
    train=[]
    test=[]

    for file1 in file_set:
        s="x"+file1
        preprocess(file1,s)

    count_vectorizer = CountVectorizer()
    m=open("xcorpusA.txt",'r')
    for i in m:
        train.append(i.strip())
    #print doc
    count_vectorizer.fit_transform(train)


    m1=open("xcorpusB.txt",'r')
    for i in m1:
        test.append(i.strip())

    freq_term_matrix = count_vectorizer.transform(test)
    #print freq_term_matrix.todense()

    tfidf = TfidfTransformer(norm="l2")
    tfidf.fit(freq_term_matrix)

    #print "IDF:", tfidf.idf_

    tf_idf_matrix = tfidf.transform(freq_term_matrix)
    print (tf_idf_matrix.toarray())

    count=0

    for i in tf_idf_matrix.toarray():
        for j in i:
            count+=1    
        break

    print "Results with Cosine Distance Similarity Measure"
    test(tf_idf_matrix,count,nltkutil.cosine_distance)


if __name__ == "__main__":
    main()

原文:https://stackoverflow.com/questions/21504793
更新时间:2022-04-30 13:04

最满意答案

创建一个具有两个容器的分割面板。 一个容器(A)用于动态按钮,另一个容器用于按钮“添加新”。 将新组件添加到A容器中。

找到下面的代码,根据您的情况说明这个概念。 使用风险自负 :)

import javax.swing.*;
import java.awt.*;
import java.awt.event.*;

public class Display 
    extends JFrame
{ 
    Box upperBox   = new Box(BoxLayout.X_AXIS);
    Box dynamicBox = new Box(BoxLayout.Y_AXIS);
    Box staticBox  = new Box(BoxLayout.X_AXIS);

    public Display()
    {
        super("Test");
        setTitle("Test");
        setSize(800,800);
        setResizable(false);

        setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
        initComponents();
    }

    private void initComponents()
    {
        //This will be the parent panel for other panels.
        JPanel panel = new JPanel();
        panel.setLayout(new BoxLayout(panel, BoxLayout.Y_AXIS));

        upperBox.add(new JLabel("Resource"));
        upperBox.add(new JComboBox<>(new String[] { "option1", "option2", "option3",}));
        upperBox.add(new JLabel("Something"));

        panel.add(upperBox);

        staticBox.add(new JButton(new AddResourceAction("Add new")));

        panel.add(dynamicBox); //just add this box now, it will be filled later with components
        panel.add(staticBox);  

        add(panel); 
    }

    class AddResourceAction extends AbstractAction 
    {
        public AddResourceAction(String n)
        {
            super(n);
        }

        @Override
        public void actionPerformed(ActionEvent e) {

            Box box = new Box(BoxLayout.X_AXIS);
            box.add(new JLabel("Resource"));
            box.add(new JComboBox<>(
                        new String[] { "option1", "option2", "option3",}));
            box.add(new JLabel("Something"));

            dynamicBox.add(box);

            revalidate(); 
        }
    }

    public static void main(String[] args) 
    {
        /*display panel*/
        SwingUtilities.invokeLater(new Runnable()
        {
            @Override 
            public void run() 
            {
                new Display().setVisible(true);
            }
        });
    }
}

Create a divided panel, that has two containers. One container (A) for dynamic buttons and another container for the button "Add new". Add new components to the A container.

Find code below that illustrates this concept with your situation. use at your own risk :)

import javax.swing.*;
import java.awt.*;
import java.awt.event.*;

public class Display 
    extends JFrame
{ 
    Box upperBox   = new Box(BoxLayout.X_AXIS);
    Box dynamicBox = new Box(BoxLayout.Y_AXIS);
    Box staticBox  = new Box(BoxLayout.X_AXIS);

    public Display()
    {
        super("Test");
        setTitle("Test");
        setSize(800,800);
        setResizable(false);

        setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
        initComponents();
    }

    private void initComponents()
    {
        //This will be the parent panel for other panels.
        JPanel panel = new JPanel();
        panel.setLayout(new BoxLayout(panel, BoxLayout.Y_AXIS));

        upperBox.add(new JLabel("Resource"));
        upperBox.add(new JComboBox<>(new String[] { "option1", "option2", "option3",}));
        upperBox.add(new JLabel("Something"));

        panel.add(upperBox);

        staticBox.add(new JButton(new AddResourceAction("Add new")));

        panel.add(dynamicBox); //just add this box now, it will be filled later with components
        panel.add(staticBox);  

        add(panel); 
    }

    class AddResourceAction extends AbstractAction 
    {
        public AddResourceAction(String n)
        {
            super(n);
        }

        @Override
        public void actionPerformed(ActionEvent e) {

            Box box = new Box(BoxLayout.X_AXIS);
            box.add(new JLabel("Resource"));
            box.add(new JComboBox<>(
                        new String[] { "option1", "option2", "option3",}));
            box.add(new JLabel("Something"));

            dynamicBox.add(box);

            revalidate(); 
        }
    }

    public static void main(String[] args) 
    {
        /*display panel*/
        SwingUtilities.invokeLater(new Runnable()
        {
            @Override 
            public void run() 
            {
                new Display().setVisible(true);
            }
        });
    }
}

相关问答

更多

相关文章

更多

最新问答

更多
  • 获取MVC 4使用的DisplayMode后缀(Get the DisplayMode Suffix being used by MVC 4)
  • 如何通过引用返回对象?(How is returning an object by reference possible?)
  • 矩阵如何存储在内存中?(How are matrices stored in memory?)
  • 每个请求的Java新会话?(Java New Session For Each Request?)
  • css:浮动div中重叠的标题h1(css: overlapping headlines h1 in floated divs)
  • 无论图像如何,Caffe预测同一类(Caffe predicts same class regardless of image)
  • xcode语法颜色编码解释?(xcode syntax color coding explained?)
  • 在Access 2010 Runtime中使用Office 2000校对工具(Use Office 2000 proofing tools in Access 2010 Runtime)
  • 从单独的Web主机将图像传输到服务器上(Getting images onto server from separate web host)
  • 从旧版本复制文件并保留它们(旧/新版本)(Copy a file from old revision and keep both of them (old / new revision))
  • 西安哪有PLC可控制编程的培训
  • 在Entity Framework中选择基类(Select base class in Entity Framework)
  • 在Android中出现错误“数据集和渲染器应该不为null,并且应该具有相同数量的系列”(Error “Dataset and renderer should be not null and should have the same number of series” in Android)
  • 电脑二级VF有什么用
  • Datamapper Ruby如何添加Hook方法(Datamapper Ruby How to add Hook Method)
  • 金华英语角.
  • 手机软件如何制作
  • 用于Android webview中图像保存的上下文菜单(Context Menu for Image Saving in an Android webview)
  • 注意:未定义的偏移量:PHP(Notice: Undefined offset: PHP)
  • 如何读R中的大数据集[复制](How to read large dataset in R [duplicate])
  • Unity 5 Heighmap与地形宽度/地形长度的分辨率关系?(Unity 5 Heighmap Resolution relationship to terrain width / terrain length?)
  • 如何通知PipedOutputStream线程写入最后一个字节的PipedInputStream线程?(How to notify PipedInputStream thread that PipedOutputStream thread has written last byte?)
  • python的访问器方法有哪些
  • DeviceNetworkInformation:哪个是哪个?(DeviceNetworkInformation: Which is which?)
  • 在Ruby中对组合进行排序(Sorting a combination in Ruby)
  • 网站开发的流程?
  • 使用Zend Framework 2中的JOIN sql检索数据(Retrieve data using JOIN sql in Zend Framework 2)
  • 条带格式类型格式模式编号无法正常工作(Stripes format type format pattern number not working properly)
  • 透明度错误IE11(Transparency bug IE11)
  • linux的基本操作命令。。。