Unbalanced I/O when MapReduce reads from and writes to HBase

2019-03-28 13:11 | Source: Internet

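Hardware: three nodes (h46, h47, h48), each with 2 CPUs of 4 cores (8 cores in total) and 14 GB of memory.

Software:

All three machines run Hadoop and HBase, and each one acts as DataNode, TaskTracker, RegionServer and HQuorumPeer.

h46 additionally serves as NameNode, JobTracker and HMaster.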

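Problem: while the MapReduce job was running, I watched the disk I/O on every node with iostat 1 | grep sdb. h46 showed a reasonable amount of write I/O, and the write I/O on h48 was also within a reasonable range, but h47 had almost no I/O at all.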

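Investigation: the MapReduce job report contained the following counters

Launched map tasks=207

local map tasks=92

This means that more than half of the map tasks did not read local data. To find out why, I picked a task that processed a fairly large amount of data and looked at how it was actually executed.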
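As a quick check of the two counters above, the share of data-local map tasks can be computed directly. This is only a small sketch; the function and variable names are made up for illustration and are not part of Hadoop.

```python
def data_local_fraction(launched: int, data_local: int) -> float:
    """Fraction of launched map tasks that read their input from the local node."""
    if launched <= 0:
        raise ValueError("launched must be positive")
    if not 0 <= data_local <= launched:
        raise ValueError("data_local must be between 0 and launched")
    return data_local / launched


# Counter values copied from the job report above.
launched_maps = 207
local_maps = 92

fraction = data_local_fraction(launched_maps, local_maps)
remote_maps = launched_maps - local_maps

# 92 of 207 is 4/9, so roughly 44% of the maps were local and 115 were not.
print(f"data-local: {fraction:.1%}, remote: {remote_maps} of {launched_maps}")
```

A value this far below 100% is what prompted the closer look at individual task attempts.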

All Task Attempts

Task Attempts                          Machine
attempt_201212071915_1816_m_000225_0   /default-rack/h47

So this task was handed to h47 for execution, which is fine. But then why does h47 show no I/O?

The input split location explains it:

Input Split Locations

/default-rack/h48

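In other words, the data for this task lives on h48.

Why does this happen? Why does the task have to fetch its data remotely from h48?

A look at the .META. table answers this. Every region is managed by exactly one RegionServer, and which one is recorded in the info:server column of that region's row.

Checking it showed that none of my regions were assigned to the RegionServer on h47, so h47 had no I/O.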
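The check described above amounts to grouping regions by their info:server value. The sketch below does this for a handful of rows; the table name, region names, ports and the host:port value format are invented for illustration and are not taken from the original cluster.

```python
from collections import Counter

# Hypothetical (region name, info:server) pairs, as they might be copied out of
# a scan of .META.; every value here is made up for the example.
meta_rows = [
    ("mytable,,1", "h46:60020"),
    ("mytable,row100,2", "h48:60020"),
    ("mytable,row200,3", "h46:60020"),
    ("mytable,row300,4", "h48:60020"),
]
cluster_hosts = ["h46", "h47", "h48"]


def regions_per_host(rows, hosts):
    """Count regions per host, including hosts that serve no region at all."""
    counts = Counter(server.split(":")[0] for _, server in rows)
    return {host: counts.get(host, 0) for host in hosts}


per_host = regions_per_host(meta_rows, cluster_hosts)
idle_hosts = [host for host, n in per_host.items() if n == 0]

print(per_host)
print("hosts serving no region of this table:", idle_hosts)
```

A host that appears in the idle list serves no region of the table, so it performs no disk I/O for that table, which matches what iostat showed on h47.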

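Cause:

It comes down to how the HMaster assigns regions.

At present regions are assigned dynamically by the master, and it assigns them randomly.

HBase's DefaultLoadBalancer, in turn, balances regions by overall load across the cluster rather than balancing the regions of each table individually, so the regions of any single table can end up unevenly spread across the RegionServers.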

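That produces the situation above. The regions of the HBase table we access are unevenly distributed, yet map tasks are still handed to a TaskTracker whose node does not serve the corresponding region, so local map tasks (92) ends up far below Launched map tasks (207).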
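The difference between balancing the cluster as a whole and balancing each table can be seen in a toy model. The sketch below is not HBase's balancer code; the two table names and the assignment are hypothetical, chosen so that every server holds the same total number of regions while one table has no region on h47.

```python
servers = ["h46", "h47", "h48"]

# Hypothetical assignment: (table, region index) -> server.
assignment = {}
for i, server in enumerate(["h46", "h46", "h46", "h48", "h48", "h48"]):
    assignment[("table_a", i)] = server
for i, server in enumerate(["h46", "h47", "h47", "h47", "h47", "h48"]):
    assignment[("table_b", i)] = server


def count_regions(assignment, servers, table=None):
    """Regions per server, for one table or (table=None) for the whole cluster."""
    counts = {server: 0 for server in servers}
    for (tbl, _), server in assignment.items():
        if table is None or tbl == table:
            counts[server] += 1
    return counts


def spread(counts):
    """Difference between the most and the least loaded server."""
    return max(counts.values()) - min(counts.values())


total = count_regions(assignment, servers)
table_a = count_regions(assignment, servers, table="table_a")

print("cluster-wide:", total, "spread", spread(total))
print("table_a only:", table_a, "spread", spread(table_a))
```

In this model a balancer that only looks at the cluster-wide counts sees nothing to fix, although a job that scans only table_a would find no local regions on h47.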
