How do Spark scheduler pools work when running on YARN?
I have a mix of Spark versions (1.6, 2.0, 2.1) all deployed on YARN (Hadoop 2.6.0 / CDH 5.5). I'm trying to guarantee that a certain application will never be starved of resources on our YARN cluster, regardless of what else may be running on there.
I've enabled the shuffle service and set up some Fair Scheduler Pools as described in the Spark docs. I created a separate pool for the high-priority application that I never want to be starved of resources, and gave it a minShare of resources:

    <?xml version="1.0"?>
    <allocations>
      <pool name="default">
        <schedulingMode>FAIR</schedulingMode>
        <weight>1</weight>
        <minShare>0</minShare>
      </pool>
      <pool name="high_priority">
        <schedulingMode>FAIR</schedulingMode>
        <weight>1</weight>
        <minShare>24</minShare>
      </pool>
    </allocations>
When I run a Spark application on our YARN cluster, I can see that the pools I configured are recognized:

    17/04/04 11:38:20 INFO scheduler.FairSchedulableBuilder: Created pool default, schedulingMode: FAIR, minShare: 0, weight: 1
    17/04/04 11:38:20 INFO scheduler.FairSchedulableBuilder: Created pool high_priority, schedulingMode: FAIR, minShare: 24, weight: 1
However, I don't see my application using the new high_priority pool, even though I am setting spark.scheduler.pool in my call to spark-submit. So when the cluster is pegged by regular activity, my high-priority application is not getting the resources it needs:

    17/04/04 11:39:49 INFO cluster.YarnScheduler: Adding task set 0.0 with 1 tasks
    17/04/04 11:39:50 INFO scheduler.FairSchedulableBuilder: Added task set TaskSet_0 tasks to pool default
    17/04/04 11:39:50 INFO spark.ExecutorAllocationManager: Requesting 1 new executor because tasks are backlogged (new desired total will be 1)
    17/04/04 11:40:05 WARN cluster.YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
What am I missing here? My coworkers and I tried enabling preemption in YARN, but that didn't do anything. And then we realized that there is a concept in YARN very similar to Spark scheduler pools called YARN queues. So now we're not sure if the two concepts conflict somehow.
How can we get our high priority pool to work as expected? Is there some kind of conflict between Spark scheduler pools and YARN queues?
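For context, Spark's job-scheduling documentation describes spark.scheduler.pool as a per-thread local property set on the SparkContext, and describes scheduler pools as arbitrating between jobs inside a single application running with spark.scheduler.mode set to FAIR; sharing between separate applications is left to the cluster manager, which on YARN means queues. The log line "Added task set TaskSet_0 tasks to pool default" is consistent with the property not having been set that way. Below is a minimal sketch of the pattern in Python, assuming `sc` is a PySpark SparkContext; the helper name `scheduler_pool` is hypothetical and not part of the original post or of Spark.

```python
from contextlib import contextmanager

POOL_PROPERTY = "spark.scheduler.pool"


@contextmanager
def scheduler_pool(sc, pool_name):
    """Route jobs submitted by the current thread to `pool_name`.

    `sc` is assumed to be a pyspark.SparkContext, or any object that
    exposes setLocalProperty(key, value).
    """
    sc.setLocalProperty(POOL_PROPERTY, pool_name)
    try:
        yield sc
    finally:
        # Assumption: passing None clears the property, as null does in the
        # Scala API, so later jobs from this thread use the default pool.
        sc.setLocalProperty(POOL_PROPERTY, None)


# Hypothetical usage inside a Spark application:
#
#     with scheduler_pool(sc, "high_priority"):
#         sc.parallelize(range(100)).count()
```

Because pools only order jobs within one application, a guarantee against other applications on the cluster would more plausibly come from the YARN side, for example by submitting the application to a dedicated queue with spark-submit's --queue option.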
Source: https://stackoverflow.com/questions/43239921
Best answer
That should be (assuming your string is stored in '$string'):

    my ($var1, $var2) = $string =~ /_(\d+)_(\d+)/s;

The idea is to grab digits until you reach a non-digit character, here '_'. Each capturing group is then assigned to its respective variable.
As mentioned in this question (and in the comments below by Kaoru): \d can indeed match more than 10 different characters if applied to Unicode strings. So you can use instead:

    my ($var1, $var2) = $string =~ /_([0-9]+)_([0-9]+)/s;
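The difference between \d and [0-9] can be checked directly. The sketch below uses Python rather than the answer's Perl, since Python's re module behaves the same way for str patterns (\d matches any Unicode decimal digit). The sample strings and the function name `extract_pair` are made up for illustration.

```python
import re

# [0-9] accepts only the ten ASCII digits.
ASCII_PAIR = re.compile(r"_([0-9]+)_([0-9]+)")
# On str patterns, \d accepts any Unicode decimal digit.
UNICODE_PAIR = re.compile(r"_(\d+)_(\d+)")


def extract_pair(text, pattern=ASCII_PAIR):
    """Return the two underscore-delimited digit runs, or None if absent."""
    match = pattern.search(text)
    return match.groups() if match else None


# Hypothetical inputs: the second uses Arabic-Indic digits in the same layout.
ascii_sample = "report_2017_0404.log"
arabic_indic_sample = "report_\u0662\u0660\u0661\u0667_\u0660\u0664\u0660\u0664.log"

print(extract_pair(ascii_sample))                       # ASCII digits match either way
print(extract_pair(arabic_indic_sample))                # None with [0-9]
print(extract_pair(arabic_indic_sample, UNICODE_PAIR))  # matches with \d
```

If only ASCII digits are acceptable downstream (for example, before numeric conversion in code that expects 0-9), the explicit character class is the safer choice.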