
Converting a list of words into a list of the frequency in which those words appear

I am doing extensive work with a variety of word lists.

Please consider the following question that I have:

docText={"settlement", "new", "beginnings", "wildwood", "settlement", "book",
"excerpt", "agnes", "leffler", "perry", "my", "mother", "junetta", 
"hally", "leffler", "brought", "my", "brother", "frank", "and", "me", 
"to", "edmonton", "from", "monmouth", "illinois", "mrs", "matilda", 
"groff", "accompanied", "us", "her", "husband", "joseph", "groff", 
"my", "father", "george", "leffler", "and", "my", "uncle", "andrew", 
"henderson", "were", "already", "in", "edmonton", "they", "came", 
"in", "1910", "we", "arrived", "july", "1", "1911", "the", "sun", 
"was", "shining", "when", "we", "arrived", "however", "it", "had", 
"been", "raining", "for", "days", "and", "it", "was", "very", 
"muddy", "especially", "around", "the", "cn", "train"}

searchWords={"the","for","my","and","me","and","we"}

In practice, each of these lists is much longer (say, 250 words in searchWords and about 12,000 words in docText).

Right now, I can find the frequency of a given word by doing something like:

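(* Tally the words and sort the {word, count} pairs by descending count *)
docFrequency = Sort[Tally[docText], #1[[2]] > #2[[2]] &];
(* Extract the count for a single word *)
Flatten[Cases[docFrequency, {"settlement", _}]][[2]]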
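For what it's worth, the single-word lookup above can also be done without re-scanning the tally each time. This is only a minimal sketch, assuming Mathematica 10 or later (where Counts and Lookup are available) and that docText is the list defined earlier; the name docCounts and the word "zebra" are purely illustrative:

```mathematica
(* Build a word -> count association once *)
docCounts = Counts[docText];

(* Look up a single word; the third argument is the default for missing words *)
Lookup[docCounts, "settlement", 0]   (* 2 for the sample docText *)
Lookup[docCounts, "zebra", 0]        (* 0, since "zebra" does not occur *)
```

Because the association is built once, each lookup avoids a pattern search over the whole tally, which may matter with a docText of about 12,000 words.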

Where I am getting hung up is in generating specific lists: specifically, converting a list of words into a list of the frequencies with which those words appear. I've tried to do this with Do loops but have hit a wall.

I want to go through docText with searchWords and replace each element of docText with the sheer frequency of its appearance. I.e. since "settlement" appears twice, it would be replaced by 2 in the list, whereas since "my" appears 3 times, it would become 3. The list would then be something like 2,1,1,1,2, and so forth.
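The replacement described here (each word swapped for its own count) might be sketched as follows. This assumes docText is the list defined earlier; the first form assumes Mathematica 10 or later, and the second is a rule-based alternative that should also work in older versions. The names docCounts and countRules are illustrative only:

```mathematica
(* Version 10+: build word -> count once, then look up every word of docText *)
docCounts = Counts[docText];
Lookup[docCounts, docText]        (* starts {2, 1, 1, 1, 2, ...} for the sample *)

(* Older versions: turn the tally into replacement rules and apply them *)
countRules = Dispatch[Rule @@@ Tally[docText]];
docText /. countRules             (* same list of counts *)
```

Neither form needs a Do loop or If; the whole list is transformed in one step.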

I suspect the answer lies somewhere in If[] and Map[]?

This all sounds weird, but I am trying to pre-process a bunch of text to get term frequency information…


Addition for Clarity (I hope):

Here is a better example.

searchWords={"0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "a", "A", "about", 
"above", "across", "after", "again", "against", "all", "almost", 
"alone", "along", "already", "also", "although", "always", "among", 
"an", "and", "another", "any", "anyone", "anything", "anywhere", 
"are", "around", "as", "at", "b", "B", "back", "be", "became", 
"because", "become", "becomes", "been", "before", "behind", "being", 
"between", "both", "but", "by", "c", "C", "can", "cannot", "could", 
"d", "D", "do", "done", "down", "during", "e", "E", "each", "either", 
"enough", "even", "ever", "every", "everyone", "everything", 
"everywhere", "f", "F", "few", "find", "first", "for", "four", 
"from", "full", "further", "g", "G", "get", "give", "go", "h", "H", 
"had", "has", "have", "he", "her", "here", "herself", "him", 
"himself", "his", "how", "however", "i", "I", "if", "in", "interest", 
"into", "is", "it", "its", "itself", "j", "J", "k", "K", "keep", "l", 
"L", "last", "least", "less", "m", "M", "made", "many", "may", "me", 
"might", "more", "most", "mostly", "much", "must", "my", "myself", 
"n", "N", "never", "next", "no", "nobody", "noone", "not", "nothing", 
"now", "nowhere", "o", "O", "of", "off", "often", "on", "once", 
"one", "only", "or", "other", "others", "our", "out", "over", "p", 
"P", "part", "per", "perhaps", "put", "q", "Q", "r", "R", "rather", 
"s", "S", "same", "see", "seem", "seemed", "seeming", "seems", 
"several", "she", "should", "show", "side", "since", "so", "some", 
"someone", "something", "somewhere", "still", "such", "t", "T", 
"take", "than", "that", "the", "their", "them", "then", "there", 
"therefore", "these", "they", "this", "those", "though", "three", 
"through", "thus", "to", "together", "too", "toward", "two", "u", 
"U", "under", "until", "up", "upon", "us", "v", "V", "very", "w", 
"W", "was", "we", "well", "were", "what", "when", "where", "whether", 
"which", "while", "who", "whole", "whose", "why", "will", "with", 
"within", "without", "would", "x", "X", "y", "Y", "yet", "you", 
"your", "yours", "z", "Z"}

These are the automatically generated stopwords from WordData[]. I want to compare these words against docText. Since "settlement" is NOT part of searchWords, it would appear as 0. But since "my" is part of searchWords, it would show up as its count (so I could tell how many times the given word appears).
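One way the clarified goal could be approached, offered as a sketch rather than a definitive answer: count every word in docText, keep only the counts for words that are in searchWords, and let everything else default to 0. It assumes Mathematica 10 or later (Counts, KeyTake, Lookup) and the docText and searchWords lists defined above; stopCounts is an illustrative name:

```mathematica
(* Counts for all words in docText, restricted to the stopwords *)
stopCounts = KeyTake[Counts[docText], DeleteDuplicates[searchWords]];

(* Replace each word of docText by its count, or 0 if it is not a stopword *)
Lookup[stopCounts, docText, 0]    (* "settlement" -> 0, stopwords -> their counts *)

(* The reverse view: how often each stopword occurs in docText *)
Lookup[stopCounts, searchWords, 0]
```

Note that the comparison is case-sensitive, so this relies on docText already being lowercased as in the sample.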

I really do thank you for your help - I'm looking forward to taking some formal courses soon as I'm bumping up against the edge of my ability to really explain what I want to do!


Source: https://stackoverflow.com/questions/8973830
Updated: 2022-01-23 09:01

Best answer


Your binding is wrong. You need the banana-in-a-box binding [(ngModel)]="serial" instead of [ngModel]="serial".

The () in the binding updates the serial model every time the input changes: from the input into the model.

A single [] only binds the value of serial when it is changed in code. That gives one-way binding: from the model into the input.

As you might guess, together as [()] they make two-way binding.

Related Q&A

  • If the id is the same as the name you are passing, you can get the name like this: firechange(event){ if(this.userProfileForm.controls[$event.target.id].valid){ } ...
  • The problem is probably in this code: var isNumberRegex = /^\d$/; if (!isNumberRegex.test(event.key)... Check whether event.key has the appropriate value. After some research, it seems that because the Selenium web driver no longer uses native events, the event.key property will be empty. So the only solution is to obtain the key from the event as follows: String.fromCharCode(event.keyCode); This is how developers have to do it. It has nothing to do with Angular 2. ...
  • First, because all fields will be registered in the formController under the name {{drug.drugName}}. Second, there is no myForm.{{drug.drugName}}.$invalid in the scope. You may have tried myForm[drug.drugName].$invalid, but it won't work anyway; see point 1. I think there is still no proper way to set the name field dynamically in ng-repeat. Instead, you need to create your own directive, which builds the repeated list elements as Str ...
  • Try wrapping the button and the input in a div. Get this (the button) from the button event, use $(this).closest('div') to go to the parent, and get the input from there. $(document).ready(function() { $('.clickme').click(function() { var div = $(this).closest('div'); alert(div.find('input').val()); }); });