Hadoop Custom Partitioner Issue

I am having an issue with custom intermediary keys not ending up in the partition I would expect based on the output of the custom partitioner's "getPartition" method. I can see in my mapper log files that the partitioner produces the expected partition numbers; however, keys with a common partition number sometimes do not end up at the same reducer.

How would keys with a common "getPartition" output end up at different reducers?

I noticed in the mapper log files that, after all "getPartition" calls have been made, there are many calls to the custom intermediary key's "hashCode" and "compareTo" methods. Is the mapper just sorting within each partition, or could this be part of the issue?

I have attached the code for the custom intermediary key and the partitioner. Note: I know that exactly half of the keys have "useBothGUIDFlag" set to true and half have it set to false (which is why I partition these keys into separate halves of the partition space). I also know that keys do not seem to cross over into the other half of the partition space (i.e., "useBothGUIDFlag" keys do not end up in the "!useBothGUIDFlag" partitions and vice versa); rather, they are mixed up within their own half of the partitions.

public class IntermediaryKey implements WritableComparable<IntermediaryKey> {

    public String guid1;
    public String guid2;
    public boolean useBothGUIDFlag;

    @Override
    public int compareTo(IntermediaryKey other) {
        if(useBothGUIDFlag)
        {
            if(other.useBothGUIDFlag)
            {
                return this.hashCode() - other.hashCode();
            }else{
                return 1;
            }
        }else{
            if(!other.useBothGUIDFlag)
            {
                return guid2.compareTo(other.guid2);
            }else{
                return -1;
            }
        }
    }

    @Override
    public int hashCode()
    {
        if(useBothGUIDFlag)
        {
            if(guid1.compareTo(guid2) > 0)
            {
                return (guid2+guid1).hashCode();
            }else{
                return (guid1+guid2).hashCode();
            }
        }else{
            return guid2.hashCode();
        }
    }

    @Override
    public boolean equals(Object otherKey)
    {
        if(otherKey instanceof IntermediaryKey)
        {
            return this.compareTo((IntermediaryKey)otherKey) == 0;
        }
        return false;
    }
}

public static class KeyPartitioner extends Partitioner<IntermediaryKey, PathValue>
{
    @Override
    public int getPartition(IntermediaryKey key, PathValue value, int numReduceTasks) {
        int bothGUIDReducers = numReduceTasks/2;
        if(bothGUIDReducers == 0)
        {
            return 0;
        }

        int keyHashCode = Math.abs(key.hashCode());
        if(key.useBothGUIDFlag)
        {
            return keyHashCode % bothGUIDReducers;
        }else{
            return (bothGUIDReducers + (keyHashCode % (numReduceTasks-bothGUIDReducers)));
        }
    }
}
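
Two integer edge cases in the code above can be checked in isolation. The following is a minimal sketch, not part of the original job: the class name KeyOrderingSketch and its method names are hypothetical, and it only reproduces the arithmetic used in "compareTo" and "getPartition". It does not establish that either case is the cause of the behavior described, only that subtracting hash codes can overflow into the wrong sign and that Math.abs can still return a negative number.

```java
public class KeyOrderingSketch {

    // Same pattern as compareTo above: subtracting two ints can overflow.
    static int compareBySubtraction(int a, int b) {
        return a - b;
    }

    // Overflow-safe comparison provided by the JDK.
    static int compareSafely(int a, int b) {
        return Integer.compare(a, b);
    }

    // Same pattern as getPartition above: Math.abs(Integer.MIN_VALUE) is still negative.
    static int partitionWithAbs(int hash, int reducers) {
        return Math.abs(hash) % reducers;
    }

    // Clearing the sign bit always yields a non-negative value.
    static int partitionWithMask(int hash, int reducers) {
        return (hash & Integer.MAX_VALUE) % reducers;
    }

    public static void main(String[] args) {
        int large = Integer.MAX_VALUE;
        int small = -2;

        // Negative result, which wrongly claims large < small.
        System.out.println(compareBySubtraction(large, small));
        // Positive result, as expected.
        System.out.println(compareSafely(large, small));

        // Negative partition number for the one hash value that Math.abs cannot fix.
        System.out.println(partitionWithAbs(Integer.MIN_VALUE, 3));
        // Non-negative partition number for the same hash value.
        System.out.println(partitionWithMask(Integer.MIN_VALUE, 3));
    }
}
```

An ordering that reports the wrong sign for some pairs is not a consistent total order, which may affect how sorted keys are grouped. Comparing the hash codes with Integer.compare and masking the sign bit before taking the modulus avoid both edge cases.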

Source: https://stackoverflow.com/questions/13076331
Updated: 2023-04-05 07:04

Best answer

Normal comparison operators don't work well with NULL. Both Something = NULL and Something != NULL will return 'unknown', which causes the row to be omitted in the result. Use the special operators IS NULL and IS NOT NULL instead:

SELECT * FROM actions 
WHERE auth_id = 6 
  AND trusts_number = 'N100723' 
  AND fx_actions_id IS NOT NULL

Wikipedia on NULL and its background
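
The rule described in this answer can be modelled outside a database. The following is a minimal sketch in plain Java, assuming "unknown" is represented by a null Boolean; the class name NullComparisonSketch and its helper methods are hypothetical and do not correspond to any database API. It shows why a row is dropped for both = NULL and != NULL, while IS NULL and IS NOT NULL give a definite answer.

```java
public class NullComparisonSketch {

    // SQL-style equality: TRUE, FALSE, or unknown (modelled here as null).
    static Boolean sqlEquals(Object a, Object b) {
        if (a == null || b == null) {
            return null; // comparing with NULL yields unknown
        }
        return a.equals(b);
    }

    // SQL-style NOT: unknown stays unknown.
    static Boolean sqlNot(Boolean value) {
        if (value == null) {
            return null;
        }
        return !value;
    }

    // IS NOT NULL always yields a definite true or false.
    static boolean isNotNull(Object a) {
        return a != null;
    }

    // A WHERE clause keeps a row only when the predicate is TRUE.
    static boolean whereKeeps(Boolean predicate) {
        return Boolean.TRUE.equals(predicate);
    }

    public static void main(String[] args) {
        Integer fxActionsId = null; // a row whose fx_actions_id is NULL

        // fx_actions_id = NULL  -> unknown -> row omitted
        System.out.println(whereKeeps(sqlEquals(fxActionsId, null)));
        // fx_actions_id != NULL -> unknown -> row omitted
        System.out.println(whereKeeps(sqlNot(sqlEquals(fxActionsId, null))));
        // fx_actions_id IS NULL -> true -> row kept
        System.out.println(whereKeeps(!isNotNull(fxActionsId)));
        // fx_actions_id IS NOT NULL -> false -> row omitted
        System.out.println(whereKeeps(isNotNull(fxActionsId)));
    }
}
```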