Archive for the ‘backgroundrb’ Category
Update: With the 1.0.4 release, this method has been removed. It was introduced as a workaround for the thread-unsafe register_status, but it’s no longer required, since result caching is thread-safe in this version.
You know the story too well: in your BackgrounDRb worker, you want to run 10 tasks concurrently using the thread pool, collect the results in an instance variable, and return it. Now, threads are funny little beasts, and the simplest of things can easily get out of hand. For example, one BackgrounDRb user wrote something like this:
    pages = Array.new
    pages_to_scrape.each do |url|
      thread_pool.defer(url) do |url|
        begin
          # model object performs the scraping
          page = ScrapedPage.new(page.url)
          pages << page
        rescue
          logger.info "page scrape failed"
        end
      end
    end
    return pages
There are many things wrong with the above code, and one of them is that it modifies a shared variable without acquiring a lock. Remember, anything inside thread_pool.defer happens in a separate thread and hence must be thread-safe.
Another use case: say you don’t want to spawn gazillions of workers, and would rather process requests from n users within one worker and save the results using “register_status” with user_id as the identifier. thread_pool.defer is for fire-and-forget jobs, and using register_status within the block supplied to thread_pool.defer is dangerous.
Enter thread_pool.fetch_parallely(args,request_proc,response_proc). Let’s walk through an example. Inside one of your workers:
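To make the locking point concrete, here is a minimal sketch of collecting results from several threads under a Mutex. It uses plain Ruby threads rather than thread_pool.defer (which only exists inside a worker), and the URLs and the "scraped:" strings are hypothetical stand-ins for the real scraping call.

```ruby
require "thread"

# Hypothetical input; in the original snippet this would be pages_to_scrape.
urls = %w[http://example.com/a http://example.com/b http://example.com/c]

pages = []
pages_lock = Mutex.new # guards every write to the shared array

threads = urls.map do |url|
  # Pass url in as a thread argument instead of relying on the closure.
  Thread.new(url) do |u|
    page = "scraped:#{u}" # stand-in for the real scraping work
    pages_lock.synchronize { pages << page }
  end
end

# Wait for every thread before reading the results.
threads.each(&:join)
```

Note the join at the end: without it, the array could be read before the threads have finished, which is another of the problems in the snippet above.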
    def barbar args
      request_proc = lambda do |foo|
        sleep(2)
        "Hello #{foo}"
      end

      callback = lambda do |result|
        register_status(result)
      end
      thread_pool.fetch_parallely(args,request_proc,callback)
    end
The first argument to fetch_parallely is a data argument that is simply passed to request_proc. Note that it is necessary: you should not rely on the closure capturing the local scope, which can be dangerous within threads. The return value of request_proc is then passed as an argument to the callback proc once request_proc finishes execution and has a result ready.
The difference here is that, although request_proc runs within a new thread, the callback proc is executed within the main thread and hence need not be thread-safe. You can do pretty much anything you want within the callback proc.
This is available in the git version of BackgrounDRb. Here is the link on how to install the git version of BackgrounDRb:
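The split between "request runs in a thread, callback runs in the main thread" can be sketched in plain Ruby with a Queue. This is not BackgrounDRb's implementation: fetch_in_parallel_sketch is a hypothetical helper, and unlike fetch_parallely it takes a list of inputs so that several requests are visibly in flight at once.

```ruby
require "thread"

# Hypothetical helper, for illustration only.
def fetch_in_parallel_sketch(inputs, request_proc, callback)
  results = Queue.new # thread-safe hand-off between threads

  inputs.each do |input|
    # Each request runs in its own thread and only touches the queue.
    Thread.new(input) { |arg| results << request_proc.call(arg) }
  end

  # The callback is invoked here, in the calling thread, once per result,
  # so it may freely touch unsynchronized state.
  inputs.size.times { callback.call(results.pop) }
end

greetings = []
request_proc = lambda { |name| "Hello #{name}" }
callback = lambda { |result| greetings << result }

fetch_in_parallel_sketch(%w[foo bar baz], request_proc, callback)
```

Because only the calling thread appends to greetings, no lock is needed around it. The sketch does no error handling: if a request raised, the matching pop would never return.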
http://gnufied.org/2008/05/21/bleeding-edge-version-of-backgroundrb-for-better-memory-usage/
Update: Do not forget to get rid of the autogenerated backgroundrb script and other files that were generated during the last run of rake backgroundrb:setup.
A plain fork() is evil under Ruby, and hence there are some issues with the memory usage of BackgrounDRb. I have made some changes to the packet library so that it uses fork and exec, rather than just fork, for better memory usage. However, there were quite a few changes, and BackgrounDRb is affected by them. You can try the bleeding-edge version as follows. You also need rspec for building the packet gem.
Clone the packet git repo:
    git clone git://github.com/gnufied/packet.git
    cd packet; rake gem
    cd pkg; sudo gem install --local packet-0.1.6.gem
Go to your vendor/plugins directory of your rails directory and remove or
backup older version of backgroundrb plugin and backup related config
file as well.
From the vendor/plugins directory:
    git clone git://github.com/gnufied/backgroundrb.git
    cd RAILS_ROOT
    rake backgroundrb:setup
    ./script/backgroundrb start
Let me know how it goes.
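As a rough illustration of the fork-then-exec idea mentioned above (not the packet library's actual code), the sketch below forks a child and immediately replaces it with a fresh Ruby interpreter. The helper name run_in_fresh_interpreter and the global $heavy_parent_state are hypothetical, and fork needs a Unix-like platform.

```ruby
require "rbconfig"

# Hypothetical helper: run a snippet of Ruby in a brand-new interpreter
# and return whatever it prints.
def run_in_fresh_interpreter(code)
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    # exec replaces the forked copy of this process with a new interpreter,
    # so none of the parent's Ruby objects remain in the child.
    exec(RbConfig.ruby, "-e", code, out: writer)
  end
  writer.close
  output = reader.read
  reader.close
  Process.wait(pid)
  output
end

# Stand-in for the large state a loaded Rails app would hold in the parent.
$heavy_parent_state = Array.new(100_000) { |i| "row #{i}" }

child_view = run_in_fresh_interpreter("puts $heavy_parent_state.inspect")
```

With a plain fork the child would see the whole array; after exec the global is simply unset in the child.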
- Best place for BackgrounDRb documentation is the README file that comes with the plugin. Read it thoroughly before going anywhere else for documentation.
- When passing arguments from Rails to BackgrounDRb workers, don’t pass huge ActiveRecord objects. It’s asking for trouble. You can easily circumvent the situation by passing the id of the AR object.
- It’s always a good idea to run the trunk version rather than older tag releases.
- To debug backgroundrb problems, it’s always a good idea to start bdrb in foreground mode by skipping the ’start’ argument while starting the bdrb server. After that, fire up the rails console, try invoking bdrb tasks from it, and find out what’s happening. John Yerhot has posted an excellent write-up about this, here
- Whenever you update the plugin code from svn, don’t forget to remove old backgroundrb script and run :
rake backgroundrb:setup
- When deploying the plugin in production, please change backgroundrb.yml so that the production environment is loaded in the backgroundrb server. You should avoid keeping the backgroundrb.yml file in svn. Rather, you should have a cap task that generates backgroundrb.yml on production servers.
- When you are processing too many tasks from rails, you should use the inbuilt thread pool rather than firing up new workers.
- BackgrounDRb needs Ruby >= 1.8.5
- When you are starting a worker using
MiddleMan.new_worker()
from rails and using a job_key to start the worker (you must use unique job keys anyway if you want more than one instance of the same worker running at the same time), you must always access that worker instance with the same job key. That is, all MiddleMan methods that invoke a method on that worker instance must carry job_key as a parameter. For example:
    session[:job_key] = MiddleMan.new_worker(:worker => :fibonacci_worker, :job_key => 'the_key', :data => params[:input])
    MiddleMan.send_request(:worker => :fibonacci_worker, :worker_method => :do_work, :data => params[:input],:job_key => session[:job_key])
Omitting the job_key in subsequent calls will be an error, if your worker is started with a job_key.
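Going back to the tip about passing ids rather than whole ActiveRecord objects, here is a small sketch of why it matters. It assumes arguments are serialized before they cross the process boundary and uses Marshal purely as an illustration; the hash below is a hypothetical stand-in for a large AR record.

```ruby
# Hypothetical stand-in for a big ActiveRecord object with a large text column.
report = { :id => 42, :title => "Quarterly numbers", :body => "x" * 100_000 }

# Serialized size of the whole object versus just its id.
whole_object_bytes = Marshal.dump(report).bytesize
id_only_bytes = Marshal.dump(report[:id]).bytesize

puts "whole object: #{whole_object_bytes} bytes, id only: #{id_only_bytes} bytes"
```

With only the id passed along, the worker can load a fresh copy of the record itself, which also avoids working on stale data.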
Although it’s been quite some time since the 1.0 release of BackgrounDRb went out into the wild, a belated post mentioning its features is nonetheless welcome.
Although the README document available at http://backgroundrb.rubyforge.org is quite comprehensive and there is precious little I can add, I shall try.
- BackgrounDRb is a Ruby job server and scheduler. Its main intent is to be used with Ruby on Rails applications for offloading long-running tasks. However, unlike other libraries, BackgrounDRb offers tight integration with Rails, and hence you can check the status of your workers, pass data to workers and get a response back, register the status of your workers, dynamically start or stop workers from rails, and stuff like that.
- BackgrounDRb doesn’t have any DRb in its skin now. It’s based on the networking library packet (http://packet.googlecode.com).
- It’s stable.
- It has support for thread_pools, storing of results in MemCache clusters
- It comes with its own scheduler and hence you don’t need to muck around with crontab anymore
A Quick overview of installation:
- Get the plugin using:
piston import http://svn.devjavu.com/backgroundrb/trunk/ backgroundrb
- Remove or backup older backgroundrb scripts/config files in your rails root directory
- Run the following command from the root directory of your rails app:
rake backgroundrb:setup
- Have a look at generated config file, RAILS_ROOT/config/backgroundrb.yml and see if there is anything you would like to change.
- Generate a new worker using :
./script/generate worker foo
- Read the detailed documentation about writing workers and stuff on http://backgroundrb.rubyforge.org
- Start your BackgrounDRb server with:
./script/backgroundrb start
- Stop your BackgrounDRb server with:
./script/backgroundrb stop
Assuming nobody is reading this, I would quietly mention that I released a new version of the BackgrounDRb plugin today.
Check out the full announcement here:
http://rubyforge.org/pipermail/backgroundrb-devel/2007-November/001043.html
Here is a sample worker:
    class FooWorker < BackgrounDRb::MetaWorker
      set_worker_name :foo_worker
      attr_accessor :count

      def create
        puts "Starting Foo Worker"
        @count = 0
        add_periodic_timer(4) { increment_status}
      end

      def process_request p_data
        user_input = p_data[:data]
        result = self.send(user_input[:method],user_input[:data])
        send_response(p_data,result)
      end

      def increment_status
        puts "Registering status"
        register_status("stuff #{rand(10)}")
      end

      def foobar
        puts "Invoking foobar at #{Time.now}"
      end

      def add_values user_input
        p user_input
        return eval(user_input)
      end
    end

    =begin
     problems, with existing things.
    =end
