CountVectorizer reading and writing vocabulary

I am currently working on a fairly trivial sentiment classification program. Everything works well in the training phase. However, I am having trouble using CountVectorizer to test new strings of text that contain unseen words.

For this reason I am trying to write a lookup vocabulary for vectorization in the testing phase. However, I don't know how to create and retrieve the vocabulary object to pass as a parameter.
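For reference, here is a minimal sketch that is not from the original thread. In scikit-learn, a CountVectorizer fitted on the training text already holds the learned vocabulary in its `vocabulary_` attribute. Calling `transform()` on new text with that same fitted object simply drops words it never saw. The toy texts are invented for illustration, and the default analyzer stands in for the question's `split_into_lemmas`.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Toy data, invented for illustration only.
train_texts = ["good movie", "bad movie", "good plot"]
test_texts = ["good acting", "terrible"]

# Fit once, on the training text only, to learn the vocabulary.
vect = CountVectorizer().fit(train_texts)

# The learned vocabulary is a dict mapping each term to its column index;
# the terms are sorted alphabetically when assigned to columns.
print(sorted(vect.vocabulary_))
# → ['bad', 'good', 'movie', 'plot']

# transform() reuses that vocabulary. "acting" and "terrible" were never
# seen in training, so they have no column and are silently ignored.
test_bow = vect.transform(test_texts)
print(test_bow.toarray().tolist())
# → [[0, 1, 0, 0], [0, 0, 0, 0]]
```

The same behaviour applies with a custom `analyzer`, as long as the fitted object is reused rather than refitted on the test text.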

My two methods currently appear as follows:

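from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

# split_into_lemmas is a custom tokenizer function defined elsewhere (not shown)
def trainingVectorTransformation(messages):
    #--> ReviewText to vectors
    vect = CountVectorizer(analyzer=split_into_lemmas).fit(messages['reviewText'])

    messages_bow = vect.transform(messages['reviewText'])

    # get_feature_names_out() replaces get_feature_names(), which was removed in scikit-learn 1.2
    feature_list = vect.get_feature_names_out().tolist()
    #NOT SURE HOW TO CREATE VOCABULARY
    with open("vocab.txt", "w") as text_file:
        text_file.write(str(feature_list))

    tfidf_transformer = TfidfTransformer().fit(messages_bow)

    messages_tfidf = tfidf_transformer.transform(messages_bow)
    return messages_tfidf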

and

def testingVectorTransformation(messages):
    #--> ReviewText to vectors
    #NOT SURE HOW TO READ THE CREATED VOCABULARY AND USE IT APPROPRIATELY
    with open("vocab.txt") as txt:
        vocabulary = txt.read()

    vect = CountVectorizer(analyzer=split_into_lemmas, vocabulary=vocabulary).fit(messages['reviewText'])

    messages_bow = vect.transform(messages['reviewText'])

    tfidf_transformer = TfidfTransformer().fit(messages_bow)

    messages_tfidf = tfidf_transformer.transform(messages_bow)
    return messages_tfidf
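Here is a minimal sketch of the write and read steps in the two functions above, assuming the vocabulary is stored as JSON instead of `str(feature_list)`. Writing `str(...)` and reading it back with `read()` yields one string such as "['bad', 'good']", which is not a usable vocabulary. The `vocabulary` parameter of `CountVectorizer` accepts a term-to-index mapping, so the sketch saves the fitted `vocabulary_` dict and loads it back.

The file name `vocab.json` and the toy texts are illustrative assumptions. In the question's code, pass the same `analyzer=split_into_lemmas` on both sides so the tokens match.

```python
import json

from sklearn.feature_extraction.text import CountVectorizer

train_texts = ["good movie", "bad movie", "good plot"]  # toy data

# Training phase: the learned vocabulary is the fitted attribute
# vocabulary_, a dict mapping each term to its column index.
vect = CountVectorizer().fit(train_texts)
with open("vocab.json", "w", encoding="utf-8") as f:
    # Depending on the scikit-learn version the indices may be NumPy
    # integers, which json cannot serialize, so cast them to int.
    json.dump({term: int(idx) for term, idx in vect.vocabulary_.items()}, f)

# Testing phase: read the mapping back and pass it as `vocabulary`.
with open("vocab.json", encoding="utf-8") as f:
    vocabulary = json.load(f)

test_vect = CountVectorizer(vocabulary=vocabulary)
# With a fixed vocabulary, transform() works without calling fit() first.
test_bow = test_vect.transform(["good acting", "terrible"])
```

Because the column indices come from the saved mapping, the test matrix has the same columns as the training matrix.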

If anyone has any advice on how to properly create and use the vocabulary, I would very much appreciate it.
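An alternative sketch, assuming it is acceptable to pickle the fitted objects. Note that `testingVectorTransformation` above refits `TfidfTransformer` on the test messages, so its IDF weights come from the test data rather than the training data.

This sketch chains the fitted `CountVectorizer` and `TfidfTransformer` with `sklearn.pipeline.make_pipeline` and saves them together. At test time it only calls `transform()`, which keeps both the vocabulary and the IDF weights from training.

The file name and toy texts are illustrative. Pickling a pipeline that uses `split_into_lemmas` requires that function to be importable under the same name when the file is loaded.

```python
import pickle

from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.pipeline import make_pipeline

train_texts = ["good movie", "bad movie", "good plot"]  # toy data
test_texts = ["good acting", "terrible"]

# Training phase: fit counting and TF-IDF weighting on training text only.
vectorizer = make_pipeline(CountVectorizer(), TfidfTransformer())
train_tfidf = vectorizer.fit_transform(train_texts)

# Persist the whole fitted pipeline (vocabulary and IDF weights).
with open("vectorizer.pkl", "wb") as f:
    pickle.dump(vectorizer, f)

# Testing phase: load it back and call transform() only, never fit().
with open("vectorizer.pkl", "rb") as f:
    loaded = pickle.load(f)
test_tfidf = loaded.transform(test_texts)
```

Unseen words in the test text are still ignored. A test document that contains only unseen words becomes an all-zero row.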


Original: https://stackoverflow.com/questions/37484369
Updated: 2023-03-02 16:03

Best answer

Unless your session_destroy() method in JavaScript actually sends a request to the PHP script (or something similar), it looks like you are trying to put PHP code in your JavaScript, which will not work.

Try redirecting the user to something like firstpage.php?reset=1. Inside the PHP script, check for the reset flag and then call session_destroy().
