How to convert co-occurrence matrix to sparse matrix

I am just starting to work with sparse matrices, so I'm not really proficient in this topic. My problem is this: I have a simple co-occurrence matrix built from a word list, just a 2-dimensional word-by-word matrix counting how many times a word occurs in the same context. The matrix is quite sparse since the corpus is not that big. I want to convert it to a sparse matrix so that I can handle it better and eventually do some matrix multiplication afterwards. Here is what I have done so far (only the first part; the rest is just output formatting and data cleaning):

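from collections import defaultdict

def matrix(corpus):
    # d[head][tran] counts how often the pair (head, tran) occurs together
    d = defaultdict(lambda: defaultdict(int))
    heads = set()   # words seen in first position (rows)
    trans = set()   # words seen in second position (columns)
    for text in corpus:
        d[text[0]][text[1]] += 1
        heads.add(text[0])
        trans.add(text[1])

    return d, heads, trans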

My idea would be to make a new function:

def matrix_to_sparse(d):
    A = sparse.lil_matrix(d)

Does this make any sense? However, this is not working, and I don't know how to get a sparse matrix out of it. Would I be better off working with numpy arrays? What would be the best way to do this? I want to compare several ways of dealing with matrices.

It would be nice if someone could point me in the right direction.
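
For reference, one possible way to do the conversion is sketched below. It is not taken from the original thread. As far as I know, the SciPy sparse constructors expect a dense array, another sparse matrix, or a shape tuple, so a nested dict keyed by words has to be turned into integer indices first. The helper name `coo_triplets`, the extra `heads` and `trans` arguments to `matrix_to_sparse`, and the alphabetical ordering of rows and columns are assumptions made for illustration.

```python
from collections import defaultdict


def coo_triplets(d, heads, trans):
    """Flatten a dict-of-dicts of counts into row, column and value lists."""
    # Fix an ordering so that every word maps to a stable integer index.
    row_index = {word: i for i, word in enumerate(sorted(heads))}
    col_index = {word: j for j, word in enumerate(sorted(trans))}
    rows, cols, data = [], [], []
    for head, counts in d.items():
        for tran, count in counts.items():
            rows.append(row_index[head])
            cols.append(col_index[tran])
            data.append(count)
    return rows, cols, data, row_index, col_index


def matrix_to_sparse(d, heads, trans):
    """Build a SciPy CSR matrix from the nested count dictionary."""
    # Imported locally so that coo_triplets stays usable without SciPy.
    from scipy import sparse

    rows, cols, data, row_index, col_index = coo_triplets(d, heads, trans)
    shape = (len(row_index), len(col_index))
    # coo_matrix accepts (data, (row, col)); CSR is a good format for products.
    A = sparse.coo_matrix((data, (rows, cols)), shape=shape).tocsr()
    return A, row_index, col_index
```

With the output of `matrix(corpus)` this would be called as `A, row_index, col_index = matrix_to_sparse(d, heads, trans)`. The two index dictionaries map each word back to its row or column, which is needed to interpret the result of any later multiplication such as `A.dot(A.T)`.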


Source: https://stackoverflow.com/questions/15030047
Updated: 2022-03-07 21:03

Best answer

Thanks very much to @Justin XL for this idea. In order to get the template selector to fire again, you have to set it to null and then set it back to the same reference. This is indeed another kluge, but I like it slightly better than my first kluge. No elegance to be found here.

    private void UpdateLayout2(object sender)
    {
        ListBoxItem lbi = UWPUtilities.GetParent<ListBoxItem>(sender as DependencyObject);
        DataTemplateSelector dts = lbi.ContentTemplateSelector;
        lbi.ContentTemplateSelector = null;
        lbi.ContentTemplateSelector = dts;
    }
