Hadoop data flow efficiency for "customers who bought x also bought y"
I am getting started with Hadoop and am building a MapReduce chain for "customers who bought x also bought y", where y is the product purchased most frequently with x. I am looking for advice on making this task more efficient, by which I mean reducing the amount of data shuffled from mapper nodes to reducer nodes. My goal is a little different from other "customers who bought x" scenarios, because I only want to store the single most commonly purchased product for a given product, not a list of the products purchased with it ranked by frequency.
I am following this blog post to guide my approach.
If, as I understand, one of the big performance limiters in Hadoop is shuffling data from the mapper nodes to the reducer nodes, then for every phase of the MapReduce chain I want to keep the amount of shuffled data to a minimum.
Let's say my initial data set is a SQL table, purchases_products, a join table between a purchase and the products that were bought in that purchase. I'll feed the result of this query into my MapReduce operation:

select x.product_id, y.product_id from purchases_products x inner join purchases_products y on x.purchase_id = y.purchase_id and x.product_id != y.product_id

My MapReduce strategy is to map each (product_id_x, product_id_y) row to (product_id_x_product_id_y, 1) and then sum the values in my reduce step. At the end I can split the keys and store the pairs back to a SQL table.

My problem with this operation is that it shuffles a potentially huge number of rows, even though the result set I want to produce has only count(products) rows. Ideally, I'd like a combiner step to reduce the number of rows shuffled to the reducers during this phase, but I don't see a way to do this reliably.

Is this simply a limitation of the task at hand, or are there Hadoop tricks for organizing the workflow that will help me shrink the data shuffle during the second step? Is my worry about shuffle size appropriate in this case, or not?
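To make the combiner idea concrete, here is a minimal sketch in plain Python (not Hadoop code) that simulates the map, combine, and reduce steps described above. Because summing counts is associative and commutative, the same summing logic can serve as both combiner and reducer. The function names and the sample data are hypothetical, and a tuple (x, y) stands in for the concatenated product_id_x_product_id_y key. How much the shuffle shrinks depends entirely on how often the same pair repeats within one mapper's input, and Hadoop treats a combiner as an optional optimization that it may run zero or more times, so the job must stay correct without it.

```python
from collections import Counter


def map_pairs(rows):
    """Naive mapper: emit one ((x, y), 1) record per input row."""
    return [((x, y), 1) for x, y in rows]


def combine(records):
    """Combiner: sum counts per key within a single mapper's output."""
    partial = Counter()
    for key, count in records:
        partial[key] += count
    return list(partial.items())


def reduce_counts(shuffled):
    """Reducer: sum the (possibly pre-combined) counts per key."""
    totals = Counter()
    for key, count in shuffled:
        totals[key] += count
    return totals


def most_frequent_companion(totals):
    """Second stage: keep only the top y for each x (ties keep the first seen)."""
    best = {}
    for (x, y), count in totals.items():
        if x not in best or count > best[x][1]:
            best[x] = (y, count)
    return {x: y for x, (y, _) in best.items()}


# Hypothetical data: two input splits of (product_id_x, product_id_y) rows,
# shaped like the output of the self-join query.
splits = [
    [("a", "b"), ("b", "a"), ("a", "b"), ("b", "a"), ("a", "c"), ("c", "a")],
    [("a", "b"), ("b", "a"), ("b", "c"), ("c", "b")],
]

# Records that would cross the network without and with a combiner.
naive = [rec for rows in splits for rec in map_pairs(rows)]
combined = [rec for rows in splits for rec in combine(map_pairs(rows))]

totals = reduce_counts(combined)
result = most_frequent_companion(totals)

print(len(naive), len(combined))
print(result)
```

In this toy run the combiner only helps where a pair repeats inside one split, which suggests that grouping the input so that identical pairs tend to land in the same split matters as much as the combiner itself.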
Thanks!
Original: https://stackoverflow.com/questions/9774049
Best answer
---UPDATED FROM THE COMMENTS---
As of Symfony 2.1, you must use
{{ app.request.locale }}
or
{{ app.request.getLocale() }}
which returns app.request.locale if it is set and app.request.defaultLocale otherwise.