StringTokenizer in Java

I am using StringTokenizer to tokenize a tagged string in Java. The string is tagged with Stanford's part-of-speech MaxentTagger. For each token, I take one substring to display just the POS tag and another to display just the word.
Here's the text before tagging: 
 
 Man has always had this notion that brave deeds are manifest in physical actions. While it is not entirely erroneous, there doesn't lie the singular path to valor. From of old, it is a sign of strength to fight back a wild animal. It is understandable if fought in defense; however, to go the extra mile and instigate an animal and fight it is the lowest degree of civilization man can exhibit. More so, in this age of reasoning and knowledge. Tradition may call it, but adhering blindly to it is idiocy, be it the famed Jallikattu in Tamil Nadu (The Indian equivalent to the Spanish Bullfighting) or the cock-fights. Pelting stones at a dog and relishing it howl in pain is dreadful. If one only gave as much as a trickle of thought and conscience the issue would surface as deplorable in every aspect. Animals play a part along with us in our ecosystem. And, some animals are dearer: the stray dogs that guard our street, the intelligent crow, the beast of burden and the everyday animals of pasture. Literature has voiced in its own way: In The Lord of the Rings the fellowship treated Bill Ferny's pony with utmost care; in Harry Potter when they didn’t heed Hermione's advice on the treatment of house elves they learned the hard way that it caused their own undoing; and Jack London, writes all about animals.Indeed, Kindness to animals is a virtue.  
 
Here's the POS tagged text: 
 
 Man_NN has_VBZ always_RB had_VBN this_DT notion_NN that_IN brave_VBP deeds_NNS are_VBP manifest_JJ in_IN physical_JJ actions_NNS ._. While_IN it_PRP is_VBZ not_RB entirely_RB erroneous_JJ ,_, there_EX does_VBZ n't_RB lie_VB the_DT singular_JJ path_NN to_TO valor_NN ._. From_IN of_IN old_JJ ,_, it_PRP is_VBZ a_DT sign_NN of_IN strength_NN to_TO fight_VB back_RP a_DT wild_JJ animal_NN ._. It_PRP is_VBZ understandable_JJ if_IN fought_VBN in_IN defense_NN ;_: however_RB ,_, to_TO go_VB the_DT extra_JJ mile_NN and_CC instigate_VB an_DT animal_NN and_CC fight_VB it_PRP is_VBZ the_DT lowest_JJS degree_NN of_IN civilization_NN man_NN can_MD exhibit_VB ._. More_RBR so_RB ,_, in_IN this_DT age_NN of_IN reasoning_NN and_CC knowledge_NN ._. Tradition_NN may_MD call_VB it_PRP ,_, but_CC adhering_JJ blindly_RB to_TO it_PRP is_VBZ idiocy_NN ,_, be_VB it_PRP the_DT famed_JJ Jallikattu_NNP in_IN Tamil_NNP Nadu_NNP -LRB-_-LRB- The_DT Indian_JJ equivalent_NN to_TO the_DT Spanish_JJ Bullfighting_NN -RRB-_-RRB- or_CC the_DT cock-fights_NNS ._. Pelting_VBG stones_NNS at_IN a_DT dog_NN and_CC relishing_VBG it_PRP howl_NN in_IN pain_NN is_VBZ dreadful_JJ ._. If_IN one_CD only_RB gave_VBD as_RB much_JJ as_IN a_DT trickle_VB of_IN thought_NN and_CC conscience_NN the_DT issue_NN would_MD surface_VB as_IN deplorable_JJ in_IN every_DT aspect_NN ._. Animals_NNS play_VBP a_DT part_NN along_IN with_IN us_PRP in_IN our_PRP$ ecosystem_NN ._. And_CC ,_, some_DT animals_NNS are_VBP dearer_RBR :_: the_DT stray_JJ dogs_NNS that_WDT guard_VBP our_PRP$ street_NN ,_, the_DT intelligent_JJ crow_NN ,_, the_DT beast_NN of_IN burden_NN and_CC the_DT everyday_JJ animals_NNS of_IN pasture_NN ._. 
Literature_NN has_VBZ voiced_VBN in_IN its_PRP$ own_JJ way_NN :_: In_IN The_DT Lord_NN of_IN the_DT Rings_NNP the_DT fellowship_NN treated_VBN Bill_NNP Ferny_NNP 's_POS pony_NN with_IN utmost_JJ care_NN ;_: in_IN Harry_NNP Potter_NNP when_WRB they_PRP did_VBD n't_RB heed_VB Hermione_NNP 's_POS advice_NN on_IN the_DT treatment_NN of_IN house_NN elves_NNS they_PRP learned_VBD the_DT hard_JJ way_NN that_IN it_PRP caused_VBD their_PRP$ own_JJ undoing_NN ;_: and_CC Jack_NNP London_NNP ,_, writes_VBZ all_DT about_IN animals_NNS ._. Indeed_RB ,_, Kindness_NN to_TO animals_NNS is_VBZ a_DT virtue_NN ._.  
 
And here's the code that tries to extract the substrings mentioned above:
String line;
StringBuilder sb=new StringBuilder();
try(FileInputStream input = new FileInputStream("E:\\D.txt"))
    {
    int data = input.read();
    while(data != -1)
        {
        sb.append((char)data);
        data = input.read();
        }
    }
catch(FileNotFoundException e)
{
    System.err.println("File Not Found Exception : " + e.getMessage());
}
line=sb.toString();
String line1=line;//Copy for Tagger
line+=" T";       
List<String> sentenceList = new ArrayList<String>();//TAGGED DOCUMENT
MaxentTagger tagger = new MaxentTagger("E:\\Installations\\Java\\Tagger\\english-left3words-distsim.tagger");
String tagged = tagger.tagString(line1);
File file = new File("A.txt");
BufferedWriter output = new BufferedWriter(new FileWriter(file));
output.write(tagged);
output.close();
DocumentPreprocessor dp = new DocumentPreprocessor("C:\\Users\\Admin\\workspace\\Project\\A.txt");
int largest=50;
int m=0;
StringTokenizer st1;
for (List<HasWord> sentence : dp) 
{
   String sentenceString = Sentence.listToString(sentence);
   sentenceList.add(sentenceString.toString());
}
String[][] Gloss=new String[sentenceList.size()][largest];
String[] Adj=new String[largest];
String[] Adv=new String[largest];
String[] Noun=new String[largest];
String[] Verb=new String[largest];
int adj=0,adv=0,noun=0,verb=0;
for(int i=0;i<sentenceList.size();i++)
{
    st1= new StringTokenizer(sentenceList.get(i)," ,(){}[]/.;:&?!");
    m=0;//Count for Gloss 2nd dimension
    //GETTING THE POS's COMPARTMENTALISED
    while(st1.hasMoreTokens())
    {
        String token=st1.nextToken();
        if(token.length()>1)//TO SKIP PAST TOKENS FOR PUNCTUATION MARKS
        {
        System.out.println(token);
        String s=token.substring(token.lastIndexOf("_")+1,token.length());
        System.out.println(s);
        if(s.equals("JJ")||s.equals("JJR")||s.equals("JJS"))
        {
            Adj[adj]=token.substring(0,token.lastIndexOf("_"));
            System.out.println(Adj[adj]);
            adj++;
        }
        if(s.equals("NN")||s.equals("NNS"))
        {
            Noun[noun]=token.substring(0,  token.lastIndexOf("_"));
            System.out.println(Noun[noun]);
            noun++;
        }
        if(s.equals("RB")||s.equals("RBR")||s.equals("RBS"))
        {
            Adv[adv]=token.substring(0,token.lastIndexOf("_"));
            System.out.println(Adv[adv]);
            adv++;
        }
        if(s.equals("VB")||s.equals("VBD")||s.equals("VBG")||s.equals("VBN")||s.equals("VBP")||s.equals("VBZ"))
        {
            Verb[verb]=token.substring(0,token.lastIndexOf("_"));
            System.out.println(Verb[verb]);
            verb++;
        }
        }
    }
    i++;//TO SKIP PAST THE LINES WHERE AN EXTRA UNDERSCORE OCCURS FOR FULLSTOP
 }
 
D.txt contains the plain text.
As for the issue:
Every word gets tokenized at the spaces, except for 'n't_RB', which is tokenized as 'n't' and 'RB' separately.
This is how the output looks: 
Man_NN
NN
Man
has_VBZ 
VBZ
has
always_RB
RB
always
had_VBN
VBN
had
this_DT
DT
notion_NN
NN
notion
that_IN
IN
brave_VBP
VBP
brave
deeds_NNS
NNS
deeds
are_VBP
VBP
are
manifest_JJ
JJ
manifest
in_IN
IN
physical_JJ
JJ
physical
actions_NNS
NNS
actions
While_IN
IN
it_PRP
PRP
is_VBZ
VBZ
is
not_RB
RB
not
entirely_RB
RB
entirely
erroneous_JJ
JJ
erroneous
there_EX
EX
does_VBZ
VBZ
does
n't
n't
RB
RB
 
But if I run just 'there_EX does_VBZ n't_RB lie_VB' through the tokenizer, 'n't_RB' is kept together as one token. When I run the full program I get a StringIndexOutOfBoundsException, which is understandable because there is no '_' in 'n't' or 'RB'. Can anybody look into it? Thank you.
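
The exception itself follows from how `lastIndexOf` behaves: it returns -1 when the token has no underscore, so `token.substring(0, -1)` throws. The sketch below is only a defensive guard, not an explanation of why the token is split in the first place. The class name `TaggedToken` and the method `split` are hypothetical names chosen for illustration and are not part of the code above or of the Stanford tagger.

```java
import java.util.Optional;

public class TaggedToken {
    // Splits "word_TAG" on its last underscore.
    // Returns empty when there is no underscore, or when the word or tag part would be empty.
    public static Optional<String[]> split(String token) {
        int idx = token.lastIndexOf('_');
        if (idx <= 0 || idx == token.length() - 1) {
            return Optional.empty();
        }
        return Optional.of(new String[] { token.substring(0, idx), token.substring(idx + 1) });
    }

    public static void main(String[] args) {
        // Tokens taken from the output shown in the question.
        for (String t : new String[] { "does_VBZ", "n't", "RB", "n't_RB" }) {
            Optional<String[]> parts = split(t);
            if (parts.isPresent()) {
                System.out.println(parts.get()[0] + " -> " + parts.get()[1]);
            } else {
                System.out.println("skipped: " + t);
            }
        }
    }
}
```

With a guard like this, fragments such as 'n't' and 'RB' would be skipped instead of crashing the loop, though the word and its tag would still be lost for that token.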

Original: https://stackoverflow.com/questions/29444966

Updated: 2023-07-28 10:07

Best answer

Instead of looping from A to B and checking each number for being a perfect square, why not loop over the integers from ceil(sqrt(A)) to floor(sqrt(B)) and square each one? That gives you the answer directly.
For example, let's find the square numbers between 1000 and 2000: 
sqrt(1000) = 31.6  -->  32  (need the ceiling here)
sqrt(2000) = 44.7  -->  44  (need the floor here)
 
Therefore, our answer is: 
32² = 1024
33² = 1089
34² = 1156
35² = 1225
36² = 1296
37² = 1369
38² = 1444
39² = 1521
40² = 1600
41² = 1681
42² = 1764
43² = 1849
44² = 1936
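
The approach above can be sketched in Java roughly as follows. This is a minimal illustration, assuming 0 <= A <= B and values small enough that squaring a `long` does not overflow. The class name `PerfectSquares` and the method `between` are hypothetical, and the small correction loops are an added safeguard against floating-point rounding in `Math.sqrt`, not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;

public class PerfectSquares {
    // Returns every perfect square s with a <= s <= b, assuming 0 <= a <= b.
    public static List<Long> between(long a, long b) {
        List<Long> squares = new ArrayList<>();
        long lo = (long) Math.ceil(Math.sqrt((double) a));  // smallest candidate root
        long hi = (long) Math.floor(Math.sqrt((double) b)); // largest candidate root
        // Nudge the bounds in case sqrt rounded across an exact square.
        while (lo > 0 && (lo - 1) * (lo - 1) >= a) lo--;
        while (lo * lo < a) lo++;
        while (hi * hi > b) hi--;
        while ((hi + 1) * (hi + 1) <= b) hi++;
        for (long r = lo; r <= hi; r++) {
            squares.add(r * r);
        }
        return squares;
    }

    public static void main(String[] args) {
        // Same range as the worked example: roots 32 through 44.
        for (long s : between(1000, 2000)) {
            System.out.println(s);
        }
    }
}
```

The loop runs once per square found rather than once per integer in the range, which is where the saving over checking every number from A to B comes from.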
