Carrot2 document similarity: how are the document columns ordered in the tf-idf matrix?
I'm trying to determine the similarity between two documents using Carrot2. Is it possible to get this similarity directly from the framework?
Additionally, I've been studying the tf-idf matrix and realized that the rows correspond to the stemmed words and the columns to the documents. However, how can I identify which document corresponds to which column?
For example, given a list of documents, will the column order be the order of the documents in the list?
Ex:
List docs = {doc1, doc2, doc3}
and
Column 0 = doc1, Column 1 = doc2
...
Is this correct?
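The matrix layout described in the question can be illustrated without Carrot2. The following is a minimal Python sketch, not Carrot2 code: it builds a small term-document tf-idf matrix whose columns are assigned in the order of the input list, then compares two columns with cosine similarity. The function names `tfidf_matrix` and `cosine_similarity`, the whitespace tokenizer, and the smoothed idf formula are all assumptions made for illustration. Whether Carrot2 itself keeps the list order for its columns is exactly what the question asks and is not settled by this sketch.

```python
import math
from collections import Counter


def tfidf_matrix(docs):
    """Build a term-document matrix: rows are terms, columns are documents.

    Column j always corresponds to docs[j], so column order is list order.
    (Hypothetical helper for illustration; no stemming is performed.)
    """
    tokenized = [doc.lower().split() for doc in docs]
    terms = sorted({term for tokens in tokenized for term in tokens})
    counts = [Counter(tokens) for tokens in tokenized]
    n_docs = len(docs)

    matrix = []
    for term in terms:
        # Document frequency: number of documents containing the term.
        df = sum(1 for c in counts if term in c)
        # Smoothed idf (an assumed variant; other formulas are common).
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        matrix.append([c[term] * idf for c in counts])
    return terms, matrix


def cosine_similarity(matrix, a, b):
    """Cosine similarity between document columns a and b."""
    col_a = [row[a] for row in matrix]
    col_b = [row[b] for row in matrix]
    dot = sum(x * y for x, y in zip(col_a, col_b))
    norm_a = math.sqrt(sum(x * x for x in col_a))
    norm_b = math.sqrt(sum(y * y for y in col_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


docs = [
    "carrot clustering engine",      # column 0
    "carrot clustering framework",   # column 1
    "excel text import",             # column 2
]
terms, matrix = tfidf_matrix(docs)
print(cosine_similarity(matrix, 0, 1))  # documents sharing two terms
print(cosine_similarity(matrix, 0, 2))  # documents sharing no terms
```

In this sketch, columns 0 and 1 share two terms, so their similarity is positive, while columns 0 and 2 share none, so their similarity is zero. Keeping the document list alongside the matrix is what makes the column-to-document mapping recoverable.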
Source: https://stackoverflow.com/questions/27800983
Best answer
Per http://msdn.microsoft.com/en-us/library/office/ff837097%28v=office.15%29.aspx
_wbs.OpenText(Path:=pathTemp,Datatype:=xlDelimited,Other:=True,Otherchar:="*");
Related Q&A
-
One way to do this is to query the spreadsheet: string strConn = "Provider=Microsoft.Jet.OleDb.4.0;data source=C:\\Inetpub\\wwwroot\\CS\\HostData.xls;Extended Properties=Excel 8.0;"; OleDbConnection objConn = new OleDbConnection(strConn); string strSQL = "SELECT * FROM [A1:B439]"; OleDb ...
-
Reading Excel files from C# [2022-07-28]
var fileName = string.Format("{0}\\fileNameHere", Directory.GetCurrentDirectory()); var connectionString = string.Format("Provider=Microsoft.Jet.OLEDB.4.0; data source={0}; Extended Properties=Excel 8.0;", fileName); var adapter = new OleDbDataAdapter("SE ... -
This can be done with the Bytescout Spreadsheet SDK: using System; using System.Collections.Generic; using System.IO; using System.Text; using Bytescout.Spreadsheet; namespace Converting_XLS_to_TXT { class Program { static void Main(string[] args) { // Create new Spreadsheet ...
-
Per http://msdn.microsoft.com/en-us/library/office/ff837097%28v=office.15%29.aspx _wbs.OpenText ...
-
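Because the character you want to use as the text qualifier is ' (single quote), you must pass Excel.XlTextQualifier.xlTextQualifierSingleQuote as the parameter. For more information, see the Excel developer reference. The code below should make your import work. Excel.Application xlApp; Excel.Workbook xlWorkBook; Excel.Worksheet xlWorkSheet; int[,] fieldInfo = new int[4, 2] { { 1, 2 }, { 2, 4 } ...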
-
Reading an Excel File From C# [2022-10-27]
I think your connection string is malformed, and "Could not find installable ISAM" usually indicates exactly that. Try this, which comes from a piece of working code I have: Excel 2007 string connectionString = string.Format("Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};Extended Properties=\"Excel 12.0 Xml;HDR=No;IMEX=1\";", fullPath); Excel 2003 string connect ... -
Although Siddart gave you a link, you can also try the code below. I just added a few fixes to help you get what you need. EDIT2: Sub CopyData() Dim fileDia As FileDialog Dim i As Integer Dim done As Boolean Dim strpathfile As String, filename As String '--> initialize variables here i = 1 done = False Set fileDia = ...
-
Instead of Workbooks.OpenText, use Workbooks.Open
-
Excel OpenText method [2022-12-19]
From the KB article you linked to: One caveat is that if you pass multiple parameters, they need to be passed in reverse order. From MSDN, the parameters to OpenText are: expression.OpenText(Filename, Origin, StartRow, DataType, TextQualifier, ConsecutiveDelimiter, Tab, Semicolon, Comma, Space, Other, OtherChar, FieldInfo, TextVisualLayout, DecimalSeparator, ... -
Close text file when I already use Workbooks.OpenText to open it [2022-04-01]
This also worked for me: Sub Sample() Dim myfile As String myfile = "C:\Delete Me.txt" Workbooks.OpenText fileName:=myfile, _ DataType:=xlDelimited, _ Origin:=xlWindows, _ Other:=True, _ ...