
Converting a list of words into a list of the frequency in which those words appear

I am doing extensive work with a variety of word lists. 
Please consider the following question that I have: 
docText={"settlement", "new", "beginnings", "wildwood", "settlement", "book",
"excerpt", "agnes", "leffler", "perry", "my", "mother", "junetta", 
"hally", "leffler", "brought", "my", "brother", "frank", "and", "me", 
"to", "edmonton", "from", "monmouth", "illinois", "mrs", "matilda", 
"groff", "accompanied", "us", "her", "husband", "joseph", "groff", 
"my", "father", "george", "leffler", "and", "my", "uncle", "andrew", 
"henderson", "were", "already", "in", "edmonton", "they", "came", 
"in", "1910", "we", "arrived", "july", "1", "1911", "the", "sun", 
"was", "shining", "when", "we", "arrived", "however", "it", "had", 
"been", "raining", "for", "days", "and", "it", "was", "very", 
"muddy", "especially", "around", "the", "cn", "train"}

searchWords={"the","for","my","and","me","and","we"}
 
Each of these lists is much longer in practice (say, 250 words in searchWords and about 12,000 words in docText).
Right now, I can find the frequency of a given word by doing something like:
docFrequency=Sort[Tally[docText],#1[[2]]>#2[[2]]&];    
Flatten[Cases[docFrequency,{"settlement",_}]][[2]]
 
Where I get stuck is generating specific lists: converting a list of words into a list of the frequencies with which those words appear. I've tried to do this with Do loops but have hit a wall.
I want to go through docText with searchWords and replace each element of docText with the number of times it appears. For example, since "settlement" appears twice, it would be replaced by 2 in the list, whereas since "my" appears 4 times, it would become 4. The list would then be something like 2,1,1,1,2, and so forth.
I suspect the answer lies somewhere in If[] and Map[]?
This may sound odd, but I am trying to pre-process the text to extract term-frequency information.
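One way to sketch the replacement described above, assuming the goal is to turn every element of docText into its own count: build replacement rules from Tally once, then apply them to the whole list. The short list sampleText is a made-up stand-in for docText, and freqRules is only an illustrative name.

```mathematica
(* sampleText is a hypothetical, shortened stand-in for docText *)
sampleText = {"settlement", "new", "beginnings", "wildwood", "settlement",
   "book", "my", "mother", "my", "brother", "and", "me"};

(* Tally gives {word, count} pairs; turn each pair into a rule word -> count *)
freqRules = Dispatch[Rule @@@ Tally[sampleText]];

(* Replace every word by its count in a single pass *)
sampleText /. freqRules
(* {2, 1, 1, 1, 2, 1, 2, 1, 2, 1, 1, 1} *)
```

Dispatch is optional here; it only speeds up rule lookup when the rule list is long, and a plain list of rules should give the same result.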
 
Addition for Clarity (I hope): 
Here is a better example. 
searchWords={"0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "a", "A", "about", 
"above", "across", "after", "again", "against", "all", "almost", 
"alone", "along", "already", "also", "although", "always", "among", 
"an", "and", "another", "any", "anyone", "anything", "anywhere", 
"are", "around", "as", "at", "b", "B", "back", "be", "became", 
"because", "become", "becomes", "been", "before", "behind", "being", 
"between", "both", "but", "by", "c", "C", "can", "cannot", "could", 
"d", "D", "do", "done", "down", "during", "e", "E", "each", "either", 
"enough", "even", "ever", "every", "everyone", "everything", 
"everywhere", "f", "F", "few", "find", "first", "for", "four", 
"from", "full", "further", "g", "G", "get", "give", "go", "h", "H", 
"had", "has", "have", "he", "her", "here", "herself", "him", 
"himself", "his", "how", "however", "i", "I", "if", "in", "interest", 
"into", "is", "it", "its", "itself", "j", "J", "k", "K", "keep", "l", 
"L", "last", "least", "less", "m", "M", "made", "many", "may", "me", 
"might", "more", "most", "mostly", "much", "must", "my", "myself", 
"n", "N", "never", "next", "no", "nobody", "noone", "not", "nothing", 
"now", "nowhere", "o", "O", "of", "off", "often", "on", "once", 
"one", "only", "or", "other", "others", "our", "out", "over", "p", 
"P", "part", "per", "perhaps", "put", "q", "Q", "r", "R", "rather", 
"s", "S", "same", "see", "seem", "seemed", "seeming", "seems", 
"several", "she", "should", "show", "side", "since", "so", "some", 
"someone", "something", "somewhere", "still", "such", "t", "T", 
"take", "than", "that", "the", "their", "them", "then", "there", 
"therefore", "these", "they", "this", "those", "though", "three", 
"through", "thus", "to", "together", "too", "toward", "two", "u", 
"U", "under", "until", "up", "upon", "us", "v", "V", "very", "w", 
"W", "was", "we", "well", "were", "what", "when", "where", "whether", 
"which", "while", "who", "whole", "whose", "why", "will", "with", 
"within", "without", "would", "x", "X", "y", "Y", "yet", "you", 
"your", "yours", "z", "Z"}
 
These are the automatically generated stopwords from WordData[]. I want to compare these words against docText. Since "settlement" is NOT part of searchWords, it would appear as 0. But since "my" is part of searchWords, it would appear as its count (so I can tell how many times the given word appears).
I really do thank you for your help. I'm looking forward to taking some formal courses soon, as I'm bumping up against the limit of my ability to really explain what I want to do!
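A minimal sketch of the clarified requirement, assuming each word of docText should become its count when it is a stop word and 0 otherwise. The names sampleText, sampleStops and stopRules are hypothetical and stand in for docText, searchWords and the derived rules.

```mathematica
(* Hypothetical, shortened stand-ins for docText and searchWords *)
sampleText = {"settlement", "my", "mother", "and", "my", "brother", "and", "me"};
sampleStops = {"my", "and", "me", "the"};

(* Keep tallies only for words that are in the stop list *)
stopRules = Cases[Tally[sampleText],
   {w_, n_} /; MemberQ[sampleStops, w] :> (w -> n)];

(* Stop words become their count; every other string becomes 0 *)
sampleText /. Append[stopRules, _String -> 0]
(* {0, 2, 0, 2, 2, 0, 2, 1} *)
```

If the list should instead follow the order of the stop words, Count[sampleText, #] & /@ sampleStops gives one count per stop word, here {2, 2, 1, 0}.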

Source: https://stackoverflow.com/questions/8973830

Updated: 2022-01-23 09:01

Best answer

Your binding is wrong. You need banana-in-a-box binding, [(ngModel)]="serial", instead of [ngModel]="serial".
The () in the binding updates the serial model every time the input changes: from input to model.
A single [] only binds the value of serial when it is changed in code. This results in one-way binding: from model to input.
As you might guess, together [()] they give two-way binding.
