
Detect if daemon is really running

You know the story: your daemon (Mongrel, Thin, BackgrounDRb) dies unexpectedly, and on restart it complains about an existing pid file (and assumes it is still running). For BackgrounDRb we solved this irksome problem in the following way:

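# PID_FILE is assumed to be defined elsewhere (path to the daemon's pid file).

# Signal 0 is never delivered: Process.kill only checks that the process
# exists and that we are allowed to signal it.
def really_running? pid
  begin
    Process.kill(0, pid)
    true
  rescue Errno::EPERM
    # the process exists but is owned by another user
    true
  rescue Errno::ESRCH
    puts "pid file exists but process doesn't seem to be running, restarting now"
    false
  end
end

def try_restart
  return unless File.exist?(PID_FILE)
  pid = File.read(PID_FILE).strip.to_i
  if really_running? pid
    puts "pid file already exists, exiting..."
    exit(-1)
  end
end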

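Simple: send signal “0” to the pid. Signal 0 is never actually delivered. The kernel only checks whether the process exists. If Process.kill succeeds, the process is alive. If it raises Errno::ESRCH, the pid file is stale and the daemon can be safely restarted. There are other ways to solve this problem, for example grepping the output of “ps aux”.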
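To see the signal-0 check in isolation, here is a minimal standalone sketch. process_alive? is a hypothetical helper mirroring really_running?. The external sleep command is just a convenient child process on a Unix-like system.

```ruby
# Hypothetical helper mirroring really_running?: probe a pid with signal 0.
def process_alive?(pid)
  Process.kill(0, pid)  # nothing is delivered; only existence/permission is checked
  true
rescue Errno::EPERM
  true                  # the process exists but belongs to another user
rescue Errno::ESRCH
  false                 # no such process, so a pid file naming it is stale
end

child = Process.spawn("sleep", "30")
puts process_alive?(child)  # prints true
Process.kill("TERM", child)
Process.wait(child)         # reap the child; an unreaped zombie still "exists"
puts process_alive?(child)  # prints false
```

Note the Process.wait call. A child that has exited but has not been reaped is a zombie, and signal 0 still reports it as existing. Also, after a crash the pid may have been reused by an unrelated process. A positive answer is therefore a strong hint, not proof, that the daemon itself is running.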

Unthreaded threads of hobbiton

Update: With the 1.0.4 release, this method (fetch_parallely, described below) has been removed. It was introduced as a workaround for the thread-unsafe register_status. It is no longer required, since result caching is thread-safe in this version.

You know the story too well: in your BackgrounDRb worker, you want to run 10 tasks concurrently using the thread pool, collect the results in a variable and return it. Now, threads are funny little beasts, and the simplest of things can easily get out of hand. For example, one BackgrounDRb user wrote something like this:

pages = Array.new
pages_to_scrape.each do |url|
  thread_pool.defer(url) do |url|
    begin
      # model object performs the scraping
      page = ScrapedPage.new(page.url)
      pages << page
    rescue
      logger.info "page scrape failed"
    end
  end
end
return pages

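There are many things wrong with the above code. For one, it reads page.url before page has been assigned (it should use url). More importantly, it modifies a shared variable without acquiring a lock. It also returns pages immediately, before the deferred blocks have had a chance to finish. Remember that anything inside thread_pool.defer runs in a separate thread and hence must be thread-safe.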
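One way to fix the shared-variable problem is to guard the array with a Mutex and join every thread before returning. The sketch below does this with plain Ruby threads rather than BackgrounDRb's thread_pool, and it assumes nothing about the plugin's internals. scrape is a hypothetical stand-in for ScrapedPage.new(url).

```ruby
# Hypothetical stand-in for ScrapedPage.new(url).
def scrape(url)
  sleep(0.01)  # pretend to do network I/O
  "contents of #{url}"
end

def scrape_all(urls)
  pages = []
  lock  = Mutex.new
  threads = urls.map do |url|
    # pass url explicitly rather than relying on the closure
    Thread.new(url) do |u|
      begin
        page = scrape(u)
        lock.synchronize { pages << page }  # only one thread appends at a time
      rescue StandardError => e
        warn "page scrape failed: #{e.message}"
      end
    end
  end
  threads.each(&:join)  # wait for every thread before returning the results
  pages
end

p scrape_all(%w[a b c]).sort  # prints ["contents of a", "contents of b", "contents of c"]
```

The results arrive in whatever order the threads finish, hence the sort before printing.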

Another use case: say you don’t want to spawn gazillions of workers. Instead, within one worker, you want to process requests from n users and save the results using “register_status” with user_id as the identifier. thread_pool.defer is meant for fire-and-forget jobs. Calling register_status within the block supplied to thread_pool.defer is dangerous, because register_status is not thread-safe.

Enter thread_pool.fetch_parallely(args, request_proc, response_proc). Let's look at an example. Inside one of your workers:

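  def barbar args
    # runs in a pool thread; args arrives here as foo
    request_proc = lambda do |foo|
      sleep(2)
      "Hello #{foo}"
    end

    # runs in the worker's main thread with request_proc's return value
    callback = lambda do |result|
      register_status(result)
    end
    thread_pool.fetch_parallely(args, request_proc, callback)
  end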

The first argument to fetch_parallely is a data argument that is simply passed to request_proc. Passing it explicitly is necessary. You should not rely on the closure capturing the local scope, because within threads that can be dangerous. When request_proc finishes execution and its result is ready, its return value (the value of its last expression) is passed as an argument to the callback proc.

The difference here is that although request_proc runs within a new thread, the callback proc is executed within the main thread and hence need not be thread-safe. You can do pretty much anything you want within the callback proc.
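The request_proc/callback split can be sketched without BackgrounDRb. Worker threads push results onto a thread-safe Queue, and only the calling thread pops them and runs the callback, so the callback needs no locking. fetch_each_parallely is a hypothetical helper, not the plugin's implementation. Unlike anything the text above describes, this sketch simply blocks the calling thread until all results have arrived.

```ruby
# Hypothetical helper: run request_proc for every item in its own thread,
# then invoke callback for each result on the calling thread only.
def fetch_each_parallely(items, request_proc, callback)
  results = Queue.new  # thread-safe: worker threads push, the caller pops
  threads = items.map do |item|
    Thread.new(item) { |data| results << request_proc.call(data) }
  end
  items.size.times { callback.call(results.pop) }  # callbacks run here
  threads.each(&:join)
end

request_proc = lambda { |foo| sleep(0.1); "Hello #{foo}" }
seen = []
callback = lambda do |result|
  # Safe without a lock: only the calling thread ever runs this lambda.
  seen << [result, Thread.current == Thread.main]
end

fetch_each_parallely(%w[bilbo frodo], request_proc, callback)
p seen.sort  # prints [["Hello bilbo", true], ["Hello frodo", true]]
```

If request_proc raises, its result never reaches the queue and the loop waits forever. Real code would push the exception (or a sentinel value) onto the queue instead.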

This is available in the git version of BackgrounDRb. Here is how to install the git version:

http://gnufied.org/2008/05/21/bleeding-edge-version-of-backgroundrb-for-better-memory-usage/

Bleeding Edge Version of BackgrounDRb for better memory usage

Update: Do not forget to get rid of the autogenerated backgroundrb script and the other files generated during the last run of rake backgroundrb:setup.

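A plain fork() is evil under Ruby. The garbage collector writes mark bits into the objects themselves, so a forked child soon ends up with private copies of most of its parent's memory pages instead of sharing them copy-on-write. Hence there are some issues with the memory usage of BackgrounDRb. I have made some changes to the packet library so that it uses fork and exec rather than just fork, for better memory usage. However, there were quite a few changes, and BackgrounDRb is affected by them. You can try the bleeding edge version as follows. You also need rspec to build the packet gem.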

Clone the packet git repo:

git clone git://github.com/gnufied/packet.git
cd packet;rake gem
cd pkg; sudo gem install --local packet-0.1.6.gem

Go to the vendor/plugins directory of your Rails application and remove or
back up the older version of the backgroundrb plugin, and back up its
related config file as well.

From the vendor/plugins directory:

git clone git://github.com/gnufied/backgroundrb.git
cd RAILS_ROOT
rake backgroundrb:setup
./script/backgroundrb start

Let me know how it goes.
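The fork-and-exec change to packet mentioned above can be illustrated in plain Ruby. This is a sketch, not packet's actual code. The child forks and immediately execs a fresh Ruby interpreter (RbConfig.ruby), so it starts with its own small heap instead of inheriting the parent's. The inline worker script is hypothetical. fork is not available on Windows.

```ruby
require "rbconfig"

# Hypothetical worker body; in practice this would be a worker script file.
worker_code = 'puts "worker #{Process.pid} started"'

pid = fork do
  # exec replaces the forked copy of this process with a fresh interpreter,
  # so the child no longer carries the parent's Ruby heap.
  exec(RbConfig.ruby, "-e", worker_code)
end

Process.wait(pid)
puts "worker exited with status #{$?.exitstatus}"
```

Process.spawn(RbConfig.ruby, "-e", worker_code) would combine both steps. Spelling out fork and exec separately just mirrors the description.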

BackgrounDRb best practices

  • Best place for BackgrounDRb documentation is the README file that comes with the plugin. Read it thoroughly before going anywhere else for documentation.
  • When passing arguments from Rails to BackgrounDRb workers, don’t pass huge ActiveRecord objects. It's asking for trouble. You can easily avoid the problem by passing the ids of AR objects instead.
  • It's always a good idea to run the trunk version rather than older tagged releases.
  • To debug BackgrounDRb problems, it's always a good idea to start bdrb in foreground mode by omitting the ‘start’ argument when starting the bdrb server. After that, fire up the Rails console, invoke bdrb tasks from it and find out what's happening. John Yerhot has posted an excellent write-up about this, here.
  • Whenever you update the plugin code from svn, don’t forget to remove the old backgroundrb script and run:
     rake backgroundrb:setup
  • When deploying the plugin in production, change backgroundrb.yml so that the production environment is loaded in the backgroundrb server. Avoid keeping the backgroundrb.yml file in svn. Instead, have a cap task that generates backgroundrb.yml on the production servers.
  • When you are processing many tasks from Rails, use the built-in thread pool rather than firing up new workers.
  • BackgrounDRb needs Ruby >= 1.8.5
  • When you are starting a worker using
     MiddleMan.new_worker() 

    from Rails with a job_key (you must use unique job keys anyway if you want more than one instance of the same worker running at the same time), you must always access that instance of the worker with the same job key. That is, all MiddleMan methods that invoke a method on that instance of the worker must carry the job_key as a parameter. For example:

       session[:job_key] = MiddleMan.new_worker(:worker => :fibonacci_worker, :job_key => 'the_key', :data => params[:input])
       MiddleMan.send_request(:worker => :fibonacci_worker, :worker_method => :do_work, :data => params[:input],:job_key => session[:job_key])

    If your worker was started with a job_key, omitting the job_key in subsequent calls is an error.
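The advice about passing ids instead of ActiveRecord objects can be illustrated in plain Ruby. For illustration, this sketch assumes that worker arguments are serialized with Marshal on their way to the worker process. Record and TABLE are hypothetical stand-ins for a model and its table.

```ruby
# Hypothetical stand-in for an ActiveRecord model with a large attribute,
# plus a "table" that rows can be looked up in by id.
Record = Struct.new(:id, :body)
TABLE = { 42 => Record.new(42, "x" * 100_000) }

# Assumption: arguments reach the worker in serialized form (Marshal here).
whole_record = Marshal.dump(TABLE[42])
just_the_id  = Marshal.dump(42)
puts "record: #{whole_record.bytesize} bytes, id: #{just_the_id.bytesize} bytes"

# Worker side: receive only the id and load the record itself.
def process_article(id)
  record = TABLE.fetch(id)  # with ActiveRecord this would be a find by id
  record.body.size
end

puts process_article(Marshal.load(just_the_id))  # prints 100000
```

Besides being smaller, the id lets the worker load the current state of the record when it actually runs. A passed-in snapshot may be stale by then.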

BackgrounDRb 1.0 released

Although it's been quite some time since the 1.0 release of BackgrounDRb came out, a belated post mentioning its features is nonetheless welcome.

Although the README document available at http://backgroundrb.rubyforge.org is quite comprehensive and there is precious little I can add, I shall try.

  • BackgrounDRb is a Ruby job server and scheduler. Its main intent is to be
    used with Ruby on Rails applications for offloading long-running tasks. Unlike other libraries, however, BackgrounDRb offers tight integration with Rails. From Rails you can check the status of your workers, pass data to workers and get responses back, register the status of your workers, dynamically start or stop workers, and so on.
  • BackgrounDRb no longer has any DRb under its skin. It's based on the networking library packet (http://packet.googlecode.com).
  • It's stable.
  • It supports thread pools and storing results in MemCache clusters.
  • It comes with its own scheduler, so you don’t need to muck around with crontab anymore.

A quick overview of installation:

  • Get the plugin using:
     piston import http://svn.devjavu.com/backgroundrb/trunk/ backgroundrb
  • Remove or backup older backgroundrb scripts/config files in your rails root directory
  • Run the following command from the root directory of your Rails app:
     rake backgroundrb:setup
  • Have a look at the generated config file, RAILS_ROOT/config/backgroundrb.yml, and see if there is anything you would like to change.
  • Generate a new worker using:
     ./script/generate worker foo
  • Read the detailed documentation about writing workers on http://backgroundrb.rubyforge.org
  • Start your BackgrounDRb server with:
    ./script/backgroundrb start
  • Stop your BackgrounDRb server with:
    ./script/backgroundrb stop

New release of BackgrounDRb available now

Assuming nobody is reading this, I will quietly mention that I released a new version of the BackgrounDRb plugin today.
Check out the full announcement here:

http://rubyforge.org/pipermail/backgroundrb-devel/2007-November/001043.html

Here is a sample worker:

class FooWorker < BackgrounDRb::MetaWorker
  set_worker_name :foo_worker
  attr_accessor :count

  def create
    puts "Starting Foo Worker"
    @count = 0
    # call increment_status every 4 seconds
    add_periodic_timer(4) { increment_status }
  end

  # p_data[:data] is expected to hold { :method => ..., :data => ... }
  def process_request p_data
    user_input = p_data[:data]
    result = self.send(user_input[:method], user_input[:data])
    send_response(p_data, result)
  end

  def increment_status
    puts "Registering status"
    register_status("stuff #{rand(10)}")
  end

  # process_request always passes one argument, so accept (and ignore) it
  def foobar _data = nil
    puts "Invoking foobar at #{Time.now}"
  end

  def add_values user_input
    p user_input
    # WARNING: eval runs arbitrary Ruby code; never pass it untrusted input
    return eval(user_input)
  end
end
 
=begin
  problems, with existing things.
=end
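The sample worker has two risks that are fine in a demo but dangerous elsewhere. It dispatches on a method name taken straight from the request, and it evals user input, so a caller can run arbitrary code. Below is a plain-Ruby sketch of the same dispatch with a whitelist and a parser that only adds integers. FooLogic, ALLOWED and the "1 + 2 + 3" input format are assumptions. The shape of p_data is taken from the sample worker.

```ruby
class FooLogic
  # Only these methods may be invoked through process_request.
  ALLOWED = %i[foobar add_values].freeze

  # p_data mimics the hash handed to process_request: { data: { method:, data: } }
  def process_request(p_data)
    user_input = p_data[:data]
    name = user_input[:method].to_sym
    raise ArgumentError, "unknown method #{name}" unless ALLOWED.include?(name)
    public_send(name, user_input[:data])
  end

  def foobar(_data = nil)
    "Invoking foobar at #{Time.now}"
  end

  # Adds integers from a string such as "1 + 2 + 3" without eval.
  def add_values(expression)
    expression.split("+").map { |term| Integer(term.strip) }.sum
  end
end

logic = FooLogic.new
puts logic.process_request({ data: { method: :add_values, data: "1 + 2 + 3" } })  # prints 6
```

Integer() raises on anything that is not a plain integer, so malformed input fails loudly instead of being executed.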