How to create an object which stores mappings from a word in a vocabulary to its index?

I have a tokenized list of words in a vocabulary. (It's been passed through a set, so there are no duplicates.)

My problem

I want to write a function that builds a dictionary mapping each word to its index in the vocabulary.

My attempt

My current method is like so:

mapping = {w: vocabulary.index(w) for w in vocabulary}

This works, but it is far too slow, probably because vocabulary.index(w) is called once for each of thousands of words.
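To get a feel for how the repeated lookups add up, here is a rough sketch that counts how many elements a left-to-right search examines. The helper `linear_index` is hypothetical and only stands in for the kind of scan `list.index` performs; it is not the built-in's actual implementation, and the 1000-word vocabulary is made up for illustration.

```python
def linear_index(items, target):
    """Scan left to right; return (index, number_of_elements_examined)."""
    for steps, item in enumerate(items, start=1):
        if item == target:
            return steps - 1, steps
    raise ValueError(f"{target!r} is not in list")


# Hypothetical vocabulary of 1000 distinct words.
vocabulary = [f"word{i}" for i in range(1000)]

# Looking up every word examines 1 + 2 + ... + n elements in total.
total = sum(linear_index(vocabulary, w)[1] for w in vocabulary)
print(total)  # 500500, i.e. n * (n + 1) / 2 for n = 1000
```

The total grows with the square of the vocabulary size, which would be consistent with the slowdown described above.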

Question

Is there a library I can use that does this more efficiently? Or is there simply a more efficient way to do it?

Thanks.

POSSIBLE SOLUTION 1

Currently, vocabulary.index() is called for every word in 'vocabulary', and each call has to scan 'vocabulary' from the start to find that word's index. As suggested in an answer, one option is to enumerate 'vocabulary' instead. This yields every index in a single pass, like so:

mapping = {w: i for i, w in enumerate(vocabulary)}
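For completeness, the same idea as a small self-contained function. This is a minimal sketch: the name `build_word_index` and the sample words are assumptions for illustration, not part of the original code.

```python
def build_word_index(vocabulary):
    """Map each word to its position in `vocabulary` using a single pass."""
    return {word: i for i, word in enumerate(vocabulary)}


# Hypothetical sample data; any list of unique words would do.
vocabulary = ["apple", "banana", "cherry"]
mapping = build_word_index(vocabulary)

print(mapping["banana"])  # 1

# The list itself still serves as the reverse (index -> word) lookup.
print(vocabulary[mapping["cherry"]])  # cherry
```

Because the words were passed through a set, their order may differ between runs. If the indices need to be reproducible, it is probably worth fixing the order first (for example with sorted(...)) before enumerating.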

Source: https://stackoverflow.com/questions/48842567
Updated: 2021-12-11 13:12

Best answer

For Part A, I used:

svn update --accept mine-full --force

For Part B, I just did:

Step 1:

svn revert $localPath

Step 2:

svn update $localPath

Step 3 (using PowerShell + svn status, but on *nix the same can be done with grep/sed and rm -rf):

svn status $localPath --no-ignore |
    Select-String '^[?I]' |
    ForEach-Object {
        [Regex]::Match($_.Line, '^[^\s]*\s+(.*)$').Groups[1].Value
    } |
    Remove-Item -Recurse -Force -ErrorAction SilentlyContinue
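For readers not on PowerShell, the filtering part of Step 3 might be sketched in Python as follows. This only mirrors the two regular expressions in the pipeline above; the function name `unversioned_paths` and the sample status text are assumptions, and the actual deletion is deliberately left out so the sketch has no side effects.

```python
import re

# Keep lines whose first status column is '?' (unversioned) or 'I' (ignored),
# then capture everything after the status columns and following whitespace.
_STATUS_LINE = re.compile(r"^[?I]\S*\s+(.*)$")


def unversioned_paths(status_output):
    """Return paths reported as unversioned or ignored in `svn status` text."""
    paths = []
    for line in status_output.splitlines():
        match = _STATUS_LINE.match(line)
        if match:
            paths.append(match.group(1))
    return paths


# Hypothetical output of `svn status <path> --no-ignore`.
sample = "?       build/out.log\nI       cache\nM       src/main.py\n"
print(unversioned_paths(sample))  # ['build/out.log', 'cache']
```

In practice the status text would come from running svn itself (for example via subprocess.run), and each returned path would then be removed with os.remove or shutil.rmtree, depending on whether it is a file or a directory.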
