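Using a custom SdfTextInputFormat in Hadoop streaming

2019-03-28 13:47 | Source: Internet

In an earlier post (Specifying a custom InputFormat Java class in Hadoop streaming, http://www.linuxidc.com/Linux/2012-04/57830.htm) I described how to build a custom InputFormat.

However, simply plugging it into a streaming job never produced the results I expected.

Reading the source code showed why: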

PipeMapper.java:
    if (!this.ignoreKey) {
        write(key);
        clientOut_.write(getInputSeparator());
    }
    write(value);
    clientOut_.write('\n');
    if (skipping) {
        // flush the streams on every record input if running in skip mode
        // so that we don't buffer other records surrounding a bad record.
        clientOut_.flush();
    }
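To make the effect of the ignoreKey branch concrete, here is a minimal, self-contained sketch of what the mapper process would see on its standard input for a single record. It is not Hadoop code: the class name PipeMapperSketch and the method encodeRecord are hypothetical, keys and values are simplified to plain strings, and the separator is passed in explicitly (a tab is assumed here).

```java
// Hypothetical stand-in for the per-record write logic shown above.
// Nothing here is part of the Hadoop API.
class PipeMapperSketch {

    // Builds the text that would be written to the mapper's stdin for one record.
    static String encodeRecord(String key, String value, boolean ignoreKey, String separator) {
        StringBuilder out = new StringBuilder();
        if (!ignoreKey) {
            // Key and separator are emitted only when the key is not ignored.
            out.append(key).append(separator);
        }
        out.append(value).append('\n');
        return out.toString();
    }

    public static void main(String[] args) {
        // Assumed sample record: key "0" (e.g. a byte offset), value "some line".
        String withKey = encodeRecord("0", "some line", false, "\t");
        String valueOnly = encodeRecord("0", "some line", true, "\t");

        // Make the tab and newline visible when printing.
        System.out.println(withKey.replace("\t", "\\t").replace("\n", "\\n"));
        System.out.println(valueOnly.replace("\t", "\\t").replace("\n", "\\n"));
    }
}
```

With ignoreKey set to false the mapper script receives the key, the separator, and then the value on each line, so a script written for value-only input will misparse its records.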

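Streaming passes only the value to the mapper by default when the input format is TextInputFormat. With any other InputFormat, both the key and the value are written, which is why the custom format produced unexpected input for the mapper.

To get value-only input with the custom format, I modified PipeMapper.java and rebuilt streaming.jar. When rebuilding the jar, the only extra step is to specify the main class in MANIFEST.MF.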

MANIFEST.MF:
Manifest-Version: 1.0
Ant-Version: Apache Ant 1.7.1
Created-By: 20.1-b02 (Sun Microsystems Inc.)
Main-Class: org.apache.hadoop.streaming.HadoopStreaming

jar cvfm jar/hadoop-streaming-1.0.0.jar MANIFEST.MF -C classes/ .
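Before deploying the rebuilt jar, it can be worth confirming that the Main-Class entry is parsed as intended. The following is a small sketch using the standard java.util.jar.Manifest class; the class name ManifestCheck and the method mainClassOf are hypothetical, and the manifest text is embedded as a string so the example runs without a jar on disk.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

// Hypothetical helper: parses manifest text and returns its Main-Class value.
class ManifestCheck {

    static String mainClassOf(String manifestText) {
        try {
            Manifest manifest = new Manifest(
                    new ByteArrayInputStream(manifestText.getBytes(StandardCharsets.UTF_8)));
            return manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Same entries as the MANIFEST.MF above; every line must end with a newline.
        String text = "Manifest-Version: 1.0\n"
                + "Ant-Version: Apache Ant 1.7.1\n"
                + "Created-By: 20.1-b02 (Sun Microsystems Inc.)\n"
                + "Main-Class: org.apache.hadoop.streaming.HadoopStreaming\n";
        System.out.println(mainClassOf(text));
    }
}
```

For a jar that already exists on disk, the same attribute can be read with new java.util.jar.JarFile(path).getManifest() instead of parsing the text directly.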

