Unthreaded threads of hobbiton

Update: As of the 1.0.4 release, this method has been removed. It was introduced as a workaround for the thread-unsafe register_status, but it is no longer required, since result caching is thread-safe in this version.

You know the story all too well: in your BackgrounDRb worker, you want to run 10 tasks concurrently using the thread pool, collect the results in an instance variable, and return it. Now, threads are funny little beasts, and the simplest of things can easily get out of hand. For example, one BackgrounDRb user wrote something like this:

pages = Array.new
pages_to_scrape.each do |url|
  thread_pool.defer(url) do |url|
    begin
      # model object performs the scraping
      page = ScrapedPage.new(page.url)
      pages << page
    rescue
      logger.info "page scrape failed"
    end
  end
end
return pages

There are many things wrong with the above code, and one of them is that it modifies a shared variable without acquiring a lock. Remember, anything inside thread_pool.defer runs in a separate thread and hence must be thread-safe.
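To make the locking point concrete, here is a minimal sketch using plain Ruby threads rather than BackgrounDRb's thread pool. The ScrapedPage struct and the scrape_all method are hypothetical stand-ins invented for this illustration. The sketch guards the shared array with a Mutex and waits for every thread to finish before returning the results.

```ruby
# Hypothetical stand-in for the scraping model; it only records the URL.
ScrapedPage = Struct.new(:url)

# Illustrative helper (not part of BackgrounDRb): scrape all URLs in
# parallel and return the collected pages.
def scrape_all(urls)
  pages = []
  lock  = Mutex.new # guards every write to the shared array

  threads = urls.map do |url|
    # Hand the URL to the thread as an argument instead of relying on
    # the closure to capture it.
    Thread.new(url) do |u|
      begin
        page = ScrapedPage.new(u)
        lock.synchronize { pages << page }
      rescue StandardError => e
        warn "page scrape failed for #{u}: #{e.message}"
      end
    end
  end

  threads.each(&:join) # wait for all threads before handing back results
  pages
end

pages = scrape_all(%w[http://example.com/a http://example.com/b])
puts pages.length # prints 2
```

Only the append is wrapped in synchronize, so the slow part (the scrape itself) still runs concurrently.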

Another use case: say you don’t want to spawn gazillions of workers, and would rather process requests from n users within one worker and save the results back using “register_status” with user_id as the identifier. thread_pool.defer is meant for fire-and-forget jobs, and using register_status within the block supplied to thread_pool.defer is dangerous.

Enter thread_pool.fetch_parallely(args, request_proc, response_proc). Let's walk through an example. Inside one of your workers:

  def barbar args
    request_proc = lambda do |foo|
      sleep(2)
      "Hello #{foo}"
    end
 
    callback = lambda do |result|
      register_status(result)
    end
    thread_pool.fetch_parallely(args,request_proc,callback)
  end

The first argument to fetch_parallely is a data argument that is simply passed to request_proc. Note that it is necessary: you should not rely on the closure capturing the local scope, which can be dangerous within threads. The return value of request_proc (the value of its last expression) is passed as the argument to the callback proc once request_proc finishes executing.

The difference here is that although request_proc runs in a new thread, the callback proc is executed in the main thread and hence need not be thread-safe. You can do pretty much anything you want within the callback proc.
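The split between a threaded request proc and a main-thread callback can be sketched in plain Ruby as follows. This is not BackgrounDRb's implementation: the class name ParallelFetchSketch and its run_callbacks method are assumptions made up for illustration, and the real library presumably dispatches callbacks from its own event loop. The idea shown is that worker threads only push finished results onto a thread-safe Queue, while the owning thread pops them off and invokes the callbacks itself.

```ruby
require "thread" # Queue lives here on older Rubies; harmless on newer ones

# Hypothetical illustration of the pattern, not BackgrounDRb code.
class ParallelFetchSketch
  def initialize
    @completed = Queue.new # thread-safe hand-off from workers to owner
    @pending   = 0         # touched only by the owning thread
  end

  # Runs request_proc in a new thread. The data is passed explicitly as
  # a thread argument rather than captured from the enclosing scope.
  def fetch_parallely(args, request_proc, response_proc)
    @pending += 1
    Thread.new(args) do |data|
      outcome = begin
        request_proc.call(data)
      rescue StandardError => e
        e # hand the error to the owning thread instead of losing it
      end
      @completed << [response_proc, outcome]
    end
  end

  # Call from the owning thread: runs each callback there, so callbacks
  # need no locking of their own.
  def run_callbacks
    while @pending > 0
      response_proc, outcome = @completed.pop
      @pending -= 1
      response_proc.call(outcome)
    end
  end
end

collected    = []
request_proc = lambda { |name| "Hello #{name}" }
callback     = lambda { |result| collected << result }

pool = ParallelFetchSketch.new
%w[frodo sam].each { |name| pool.fetch_parallely(name, request_proc, callback) }
pool.run_callbacks
puts collected.sort.inspect # prints ["Hello frodo", "Hello sam"]
```

Because the callbacks append to collected from a single thread, no Mutex is needed around that array, which is the property the post attributes to the callback proc.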

This is available in the git version of BackgrounDRb. Here is the link on how to install the git version of BackgrounDRb:

http://gnufied.org/2008/05/21/bleeding-edge-version-of-backgroundrb-for-better-memory-usage/

3 thoughts on “Unthreaded threads of hobbiton”

  1. Chaz

    Hi!

    I was trying out fetch_parallely earlier today and I noticed that if args is a hash, foo ends up being an array of key/value pair arrays. For example, {:a => :b, :c => :d} turns into [[:a, :b], [:c, :d]].

    Any ideas why that would be happening?

  2. Hemant Post author

    Hmm, true:

    in meta_worker.rb:

      ActiveRecord::Base.verify_active_connections!
    - result = (block_arity == 0 ? task.block.call : task.block.call(*(task.data)))
    + data = (task.data.is_a?(Array)) ? *(task.data) : task.data
    + result = (block_arity == 0 ? task.block.call : task.block.call(data))
    
  3. Ben B

    Instead of forcing the callback proc at the end of the thread's execution, why not just let people call the callback proc when they want to?

    If I understand this correctly, you should be able to call the callback proc multiple times within a thread if you want?
