Hadoop Custom Partitioner Issue
I am having an issue with custom intermediary keys not ending up in the partition I would expect based on the output of my custom partitioner's "getPartition" method. I can see in my mapper log files that the partitioner produces the expected partition numbers; however, sometimes keys with a common partition number do not end up at the same reducer.
How would keys with a common "getPartition" output end up at different reducers?
In the mapper log files I also noticed that, after all "getPartition" calls have been made, many calls are made to the custom intermediary key's "hashCode" and "compareTo" methods. Is the mapper just doing within-partition sorting, or could this be part of the issue?
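A simplified, Hadoop-free model of the within-partition sorting asked about above, assuming (as in Hadoop's map-side output buffer) that each record's partition number is computed once when the record is collected, and that the buffer is later sorted by partition first and by key second. `MapSideSortModel`, `Buffered` and `sortBuffer` are hypothetical names; a `String` key stands in for `IntermediaryKey`, and the counting comparator stands in for its `compareTo`. It needs Java 16+ for the record type and is a sketch, not Hadoop's actual sort code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Simplified model (not Hadoop's actual code) of the map-side sort buffer.
public class MapSideSortModel {

    // A buffered map output record: the partition number fixed at collect
    // time plus the key (a String stands in for IntermediaryKey here).
    public record Buffered(int partition, String key) {}

    // Counts key comparisons, analogous to IntermediaryKey.compareTo calls.
    public static int compareCalls = 0;

    // Sorts by partition number first; keys are compared only within a partition.
    public static List<Buffered> sortBuffer(List<Buffered> buffer) {
        Comparator<Buffered> byKey = (a, b) -> {
            compareCalls++;
            return a.key().compareTo(b.key());
        };
        List<Buffered> sorted = new ArrayList<>(buffer);
        sorted.sort(Comparator.comparingInt(Buffered::partition).thenComparing(byKey));
        return sorted;
    }

    public static void main(String[] args) {
        // Partition numbers are assigned before any sorting happens.
        List<Buffered> buffer = List.of(
                new Buffered(1, "b"), new Buffered(0, "z"),
                new Buffered(1, "a"), new Buffered(0, "c"));
        for (Buffered r : sortBuffer(buffer)) {
            System.out.println(r.partition() + " " + r.key());
        }
        System.out.println("key comparisons: " + compareCalls);
    }
}
```

In this model the key comparisons only reorder records inside a partition; they never change a partition number that was already assigned.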
I have attached code for the custom intermediary key and partitioner. Note: I know that exactly half of the keys have "useBothGUIDFlag" set to true and half have it set to false (which is why I partition these keys into separate halves of the partition space). As far as I can tell, keys do not cross over into the other half of the partition space (i.e., "useBothGUIDFlag" keys do not end up in the "!useBothGUIDFlag" partitions and vice versa); rather, they are mixed up within their own half of the partitions.
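    import org.apache.hadoop.io.WritableComparable;
    import org.apache.hadoop.mapreduce.Partitioner;

    // The write(DataOutput) and readFields(DataInput) methods required by Writable are not shown.
    public class IntermediaryKey implements WritableComparable<IntermediaryKey> {
        public String guid1;
        public String guid2;
        public boolean useBothGUIDFlag;

        // Keys with useBothGUIDFlag sort after keys without it; within each group,
        // both-GUID keys are ordered by hashCode() and the others by guid2.
        @Override
        public int compareTo(IntermediaryKey other) {
            if (useBothGUIDFlag) {
                if (other.useBothGUIDFlag) {
                    return this.hashCode() - other.hashCode();
                } else {
                    return 1;
                }
            } else {
                if (!other.useBothGUIDFlag) {
                    return guid2.compareTo(other.guid2);
                } else {
                    return -1;
                }
            }
        }

        // For both-GUID keys, concatenate the GUIDs in sorted order so that
        // (a, b) and (b, a) hash the same; otherwise hash guid2 alone.
        @Override
        public int hashCode() {
            if (useBothGUIDFlag) {
                if (guid1.compareTo(guid2) > 0) {
                    return (guid2 + guid1).hashCode();
                } else {
                    return (guid1 + guid2).hashCode();
                }
            } else {
                return guid2.hashCode();
            }
        }

        // Equality is defined through compareTo().
        @Override
        public boolean equals(Object otherKey) {
            if (otherKey instanceof IntermediaryKey) {
                return this.compareTo((IntermediaryKey) otherKey) == 0;
            }
            return false;
        }
    }

    // Sends useBothGUIDFlag keys to the first half of the reducers
    // and all other keys to the second half.
    public static class KeyPartitioner extends Partitioner<IntermediaryKey, PathValue> {
        @Override
        public int getPartition(IntermediaryKey key, PathValue value, int numReduceTasks) {
            int bothGUIDReducers = numReduceTasks / 2;
            if (bothGUIDReducers == 0) {
                return 0;
            }
            int keyHashCode = Math.abs(key.hashCode());
            if (key.useBothGUIDFlag) {
                return keyHashCode % bothGUIDReducers;
            } else {
                return (bothGUIDReducers + (keyHashCode % (numReduceTasks - bothGUIDReducers)));
            }
        }
    }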
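A minimal sketch, independent of Hadoop, of two integer pitfalls in the code above that are worth ruling out: `Math.abs(Integer.MIN_VALUE)` is still negative, and `hashCode() - other.hashCode()` can overflow. `PartitionArithmetic` and its methods are hypothetical names; `partition` mirrors the half-and-half split of `KeyPartitioner` but clears the sign bit instead of calling `Math.abs`, which is the approach Hadoop's own `HashPartitioner` takes.

```java
// Hadoop-free sketch of two int pitfalls; all names here are hypothetical.
public class PartitionArithmetic {

    // Clears the sign bit, as Hadoop's HashPartitioner does;
    // unlike Math.abs, the result is never negative.
    public static int nonNegative(int hash) {
        return hash & Integer.MAX_VALUE;
    }

    // Same half-and-half split as KeyPartitioner, with nonNegative()
    // in place of Math.abs().
    public static int partition(int hash, boolean useBothGUIDFlag, int numReduceTasks) {
        int bothGUIDReducers = numReduceTasks / 2;
        if (bothGUIDReducers == 0) {
            return 0;
        }
        int h = nonNegative(hash);
        if (useBothGUIDFlag) {
            return h % bothGUIDReducers;
        }
        return bothGUIDReducers + h % (numReduceTasks - bothGUIDReducers);
    }

    // Overflow-safe ordering of two hash codes.
    public static int compareHashes(int a, int b) {
        return Integer.compare(a, b);
    }

    public static void main(String[] args) {
        System.out.println(Math.abs(Integer.MIN_VALUE));          // -2147483648
        System.out.println(nonNegative(Integer.MIN_VALUE));       // 0
        System.out.println(Integer.MAX_VALUE - (-1));             // -2147483648: subtraction wrapped
        System.out.println(compareHashes(Integer.MAX_VALUE, -1)); // 1
    }
}
```

With subtraction, hash codes of `Integer.MAX_VALUE`, `0` and `-1` give a > b and b > c but a < c, so the ordering is no longer transitive, which a sort relies on; `Integer.compare` keeps it consistent.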
Original: https://stackoverflow.com/questions/13076331
Best answer
Normal comparison operators don't work well with NULL. Both Something = NULL and Something != NULL will return 'unknown', which causes the row to be omitted from the result. Use the special operators IS NULL and IS NOT NULL instead:

    SELECT *
    FROM actions
    WHERE auth_id = 6
      AND trusts_number = 'N100723'
      AND fx_actions_id IS NOT NULL
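A toy Java model of the rule described in the answer above, assuming only standard SQL three-valued logic: a comparison involving NULL evaluates to UNKNOWN, and a WHERE clause keeps a row only when its condition is TRUE. `ThreeValuedLogic`, `Truth` and the sample column values are hypothetical; Java `null` stands in for SQL NULL. It models the semantics and does not talk to a database.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Toy model of SQL three-valued logic in a WHERE clause.
public class ThreeValuedLogic {

    public enum Truth { TRUE, FALSE, UNKNOWN }

    // Models "a = b": UNKNOWN whenever either side is NULL.
    public static Truth eq(Integer a, Integer b) {
        if (a == null || b == null) {
            return Truth.UNKNOWN;
        }
        return a.equals(b) ? Truth.TRUE : Truth.FALSE;
    }

    // Models "a != b": the negation of UNKNOWN is still UNKNOWN.
    public static Truth neq(Integer a, Integer b) {
        Truth t = eq(a, b);
        if (t == Truth.UNKNOWN) {
            return Truth.UNKNOWN;
        }
        return t == Truth.TRUE ? Truth.FALSE : Truth.TRUE;
    }

    // Models "a IS NOT NULL": always TRUE or FALSE, never UNKNOWN.
    public static Truth isNotNull(Integer a) {
        return a != null ? Truth.TRUE : Truth.FALSE;
    }

    // WHERE semantics: keep a value only if the predicate is TRUE.
    public static List<Integer> where(List<Integer> column, Function<Integer, Truth> predicate) {
        List<Integer> kept = new ArrayList<>();
        for (Integer value : column) {
            if (predicate.apply(value) == Truth.TRUE) {
                kept.add(value);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hypothetical fx_actions_id values; null represents SQL NULL.
        List<Integer> fxActionsId = Arrays.asList(10, null, 42);
        System.out.println(where(fxActionsId, v -> neq(v, null))); // []
        System.out.println(where(fxActionsId, v -> eq(v, null)));  // []
        System.out.println(where(fxActionsId, v -> isNotNull(v))); // [10, 42]
    }
}
```

Both `= NULL` and `!= NULL` filter out every row, because neither ever evaluates to TRUE, while `IS NOT NULL` keeps exactly the non-NULL rows.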