Ruby Thread Pooling - What am I doing wrong?

I have 2.5 million records in the Content table of my Postgres database. I need to go through each of them, perform a number of actions (many of which are slow on their own), and update the record at the end based on what I gathered along the way. That all works; the problem is that it takes bloody forever to run.

I came across a couple of articles about multithreading jobs like this (I have done it before in C, but never in Ruby) and about the pros and cons of using threads in Ruby. Despite the cons, the 2000 threads I can start finish significantly faster than running without threading. But I can only start about 2000 at one time, which stops me from updating all 2.5 million records. Here is the code I had for that:

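threads = []
index = 0

# Starts one new thread per record, so the thread count grows with the table.
Content.all.each do |content|
  threads << Thread.new do
    grab_and_store(content)
  end
  index += 1
  puts index if (index % 100).zero? # progress output every 100 records
end
threads.each(&:join)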
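
A simple way to stay under a thread limit like the one described above is to start threads in fixed-size batches and wait for each batch before starting the next. The following is only a minimal sketch in plain Ruby: `process_in_batches` and the block passed to it are hypothetical stand-ins (the block plays the role of `grab_and_store`), and the batch size of 10 is an arbitrary assumption.

```ruby
# Hypothetical helper: runs `job` for every item, with at most
# `batch_size` threads alive at any one time.
def process_in_batches(items, batch_size:, &job)
  items.each_slice(batch_size) do |batch|
    # One thread per item in this batch only.
    threads = batch.map { |item| Thread.new { job.call(item) } }
    # Wait for the whole batch before starting the next one.
    threads.each(&:join)
  end
end

processed = []
lock = Mutex.new # guards the shared array

process_in_batches(1..25, batch_size: 10) do |n|
  lock.synchronize { processed << n }
end

puts processed.size
```

The drawback is that each batch is only as fast as its slowest job, which is one reason a pool of long-lived workers is usually preferred.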

I also read about thread pooling, where the same threads pick up other jobs once they have completed their original one, but I can't seem to get it to work. Here is the code that I had:

POOL_SIZE = 1000

jobs = Queue.new
Content.all.each{ |x| jobs.push x }

workers = (POOL_SIZE).times.map do
  Thread.new do
    begin
      while x = jobs.pop(true)
        grab_and_store(x)
      end
    rescue ThreadError
    end
  end
end
workers.map(&:join)

When I run this I get an error saying that I can't call .join on nil, which would mean that workers is nil at the end of this. But when I take the code that I based this on (shown below, and source) and run it, it works perfectly. I can't figure out where mine is breaking, or how best to implement the thread pool so that my code stops running out of resources after 2000 threads.
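
For comparison, here is a minimal sketch of a fixed-size pool built on Ruby's `SizedQueue`. It is not the asker's code and not the tutorial's: `run_pool`, `STOP`, and the squaring block are hypothetical names used for illustration, and the pool size of 4 is an assumption. Workers use a blocking `pop` and exit on a sentinel value, and the bounded queue keeps the producer from loading every record into memory at once.

```ruby
POOL_SIZE = 4     # assumed value; tune to the workload
STOP = Object.new # sentinel telling a worker to exit

# Hypothetical helper: runs `job` on every item using a fixed set of threads.
def run_pool(items, pool_size: POOL_SIZE, &job)
  jobs    = SizedQueue.new(pool_size * 2) # push blocks while the queue is full
  results = Queue.new

  workers = Array.new(pool_size) do
    Thread.new do
      # Blocking pop: the worker sleeps until a job or STOP arrives.
      until (item = jobs.pop).equal?(STOP)
        results << job.call(item)
      end
    end
  end

  items.each { |item| jobs << item } # producer feeds the workers
  pool_size.times { jobs << STOP }   # one sentinel per worker
  workers.each(&:join)

  Array.new(results.size) { results.pop }
end

squares = run_pool(1..20) { |n| n * n }
puts squares.sort.first(5).inspect # prints [1, 4, 9, 16, 25]
```

Results arrive in completion order, so the sketch sorts them before printing. A job that raises would kill its worker here, so real code would also need error handling inside the loop.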

Thanks!

P.S. Here's the code from the tutorial I used:

require 'thread'
work_q = Queue.new
(0..50).to_a.each{|x| work_q.push x }
workers = (0...4).map do
  Thread.new do
    begin
      while x = work_q.pop(true)
        50.times{print [128000+x].pack "U*"}
      end
    rescue ThreadError
    end
  end
end; "ok"
workers.map(&:join); "ok"

Update:

Per Anthony's answer, I ended up with the following chunk of code, using the ruby-thread gem he recommended. It runs through the given Content really quickly (it's a sample size of 1000), but when I check the console it appears to have saved only around 20 at most. Here's the code:

pool = Thread.pool(5)

@ids = []
arr = Content.where(needs_update: true)[0...1000]

puts "Starting With Sample 1000"

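index = 0
arr.each do |content|
  # Queue the job; one of the pool's 5 threads picks it up later.
  pool.process do
    grab_and_store(content)
  end
  # Counts jobs queued so far, not jobs finished.
  index += 1
  puts index if (index % 100).zero?
end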

pool.shutdown
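
The post does not say why only about 20 records were saved. One way to investigate is to check whether jobs are failing without any visible error. The sketch below uses plain Ruby threads rather than the ruby-thread gem, and `run_with_error_report` is a hypothetical helper: it rescues exceptions inside each worker and returns them with the item that caused them, so failures show up instead of disappearing.

```ruby
# Hypothetical diagnostic helper: runs `job` on each item and returns
# an array of [item, exception] pairs for the jobs that raised.
# Items must not be nil or false, since nil marks the end of the queue.
def run_with_error_report(items, pool_size: 5, &job)
  jobs   = Queue.new
  errors = Queue.new
  items.each { |item| jobs << item }
  jobs.close # after close, pop returns nil once the queue is drained

  workers = Array.new(pool_size) do
    Thread.new do
      while (item = jobs.pop)
        begin
          job.call(item)
        rescue StandardError => e
          errors << [item, e] # record the failure and keep working
        end
      end
    end
  end
  workers.each(&:join)

  Array.new(errors.size) { errors.pop }
end

failures = run_with_error_report(1..10) do |n|
  raise ArgumentError, "bad record #{n}" if (n % 3).zero?
end

failures.each { |item, e| warn "#{item}: #{e.class}: #{e.message}" }
```

If the report comes back empty, the jobs ran without raising and the cause lies elsewhere.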

Source: https://stackoverflow.com/questions/32593289
Updated: 2023-06-02 19:06

Best answer

You may want to use the GREATEST() function:

SELECT GREATEST(value1, value2, value3) FROM table_a;

But to get absolute maximum from all the rows, you may use:

SELECT GREATEST(MAX(value1), MAX(value2), MAX(value3)) FROM table_a;
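
To make the difference between the two queries concrete, here is a rough Ruby sketch of the same arithmetic on in-memory data. The `rows` array is a hypothetical stand-in for `table_a`, with made-up values. The sketch ignores NULL handling, which differs between database systems.

```ruby
# Hypothetical in-memory stand-in for table_a.
rows = [
  { value1: 3, value2: 9, value3: 4 },
  { value1: 7, value2: 2, value3: 8 },
  { value1: 1, value2: 5, value3: 6 }
]

# Like GREATEST(value1, value2, value3): one result per row.
per_row = rows.map { |row| row.values.max }

# Like GREATEST(MAX(value1), MAX(value2), MAX(value3)): one result overall.
column_maxes = %i[value1 value2 value3].map { |col| rows.map { |row| row[col] }.max }
overall = column_maxes.max

puts per_row.inspect # prints [9, 8, 6]
puts overall         # prints 9
```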
