ClassNotFoundException in Hadoop
Using Hadoop MapReduce, I am writing code to get substrings of different lengths. For example, given the string "ZYXCBA" and length 3, my code has to return all possible substrings of length 3 ("ZYX", "YXC", "XCB", "CBA"), of length 4 ("ZYXC", "YXCB", "XCBA"), and finally of length 5 ("ZYXCB", "YXCBA").
In the map phase I do the following:

key = length of the substrings I want
value = "ZYXCBA"

So the mapper output is:

3,"ZYXCBA"
4,"ZYXCBA"
5,"ZYXCBA"
In the reduce phase I take the string ("ZYXCBA") and the key 3 to get all substrings of length 3; the same happens for keys 4 and 5. The results are collected in an ArrayList.
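The extraction step described here is just a sliding window over the input string. Below is a minimal standalone sketch of that logic in plain Java, independent of Hadoop; the class and method names are illustrative, not taken from the job code:

    import java.util.ArrayList;
    import java.util.List;

    public class SubstringSketch {

        // Returns every substring of s with exactly k characters.
        static List<String> substringsOfLength(String s, int k) {
            List<String> out = new ArrayList<String>();
            // A string of length n has n - k + 1 substrings of length k,
            // starting at positions 0 .. n - k.
            for (int i = 0; i + k <= s.length(); i++) {
                out.add(s.substring(i, i + k)); // end index is exclusive
            }
            return out;
        }

        public static void main(String[] args) {
            System.out.println(substringsOfLength("ZYXCBA", 3)); // [ZYX, YXC, XCB, CBA]
            System.out.println(substringsOfLength("ZYXCBA", 5)); // [ZYXCB, YXCBA]
        }
    }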
I am running my code with the following command:
hduser@Ganesh:~/Documents$ hadoop jar Saishingles.jar hadoopshingles.Saishingles Behara/Shingles/input Behara/Shingles/output
My code is shown below:
package hadoopshingles;

import java.io.IOException;
import java.util.ArrayList;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Saishingles {

    public static class shinglesmapper extends Mapper<Object, Text, IntWritable, Text> {

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            String str = value.toString();
            String[] list = str.split(" ");
            int index = Integer.parseInt(list[0]);
            String val = list[1];
            int length = val.length();
            for (int i = index; i <= length; i++) {
                context.write(new IntWritable(index), new Text(val));
            }
        }
    }

    public static class shinglesreducer extends Reducer<IntWritable, Text, IntWritable, ArrayList<String>> {

        private ArrayList<String> result = new ArrayList<String>();

        public void reduce(IntWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String str = value.toString();
            int newkey = key.get();
            int Tz = str.length() - newkey + 1;
            int position = 0;
            while (position <= Tz) {
                result.add(str.substring(position, position + newkey - 1));
                position = position + 1;
            }
            context.write(new IntWritable(newkey), result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "Saishingles");
        job.setJarByClass(hadoopshingles.Saishingles.class);
        job.setMapperClass(shinglesmapper.class);
        job.setCombinerClass(shinglesreducer.class);
        job.setReducerClass(shinglesreducer.class);
        job.setMapOutputKeyClass(IntWritable.class);
        job.setMapOutputValueClass(Text.class);
        job.setOutputKeyClass(IntWritable.class);
        job.setOutputValueClass(ArrayList.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
It gives the following error:
Exception in thread "main" java.lang.ClassNotFoundException: hadoopshingles.Saishingles at java.net.URLClassLoader$1.run(URLClassLoader.java:366) at java.net.URLClassLoader$1.run(URLClassLoader.java:355) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:354) at java.lang.ClassLoader.loadClass(ClassLoader.java:425) at java.lang.ClassLoader.loadClass(ClassLoader.java:358) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:278) at org.apache.hadoop.util.RunJar.run(RunJar.java:214) at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Please help me, and thank you in advance :)
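[Editor's note] The stack trace comes from org.apache.hadoop.util.RunJar, i.e. the launcher fails before the job even starts, which typically means hadoopshingles/Saishingles.class is not inside Saishingles.jar at the expected path. A quick way to verify the jar contents (assuming the jar sits in the current directory):

    jar tf Saishingles.jar | grep Saishingles

If hadoopshingles/Saishingles.class is not listed, rebuilding the jar so the compiled class lands under the hadoopshingles/ directory should let the job launch; a possible sequence (the build/ output directory is an assumption, not from the original post):

    mkdir -p build
    javac -classpath "$(hadoop classpath)" -d build Saishingles.java
    jar cf Saishingles.jar -C build .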
Source: https://stackoverflow.com/questions/38478737