Archive for the ‘ruby’ Category
Ever since we started scaling out to a multi-machine cluster, our Rails application had been showing one weird problem. We generate stock market technical charts and cache them in memcache (with a 1 hour TTL). During testing we found that if a request got routed to the mongrel cluster running on another machine, the charts didn’t appear. That was weird, because we had the memcache cluster configured properly in the “production.rb” file:
    memcache_options = {
      :c_threshold => 10_000,
      :compression => true,
      :debug => false,
      :namespace => 'foobar',
      :readonly => false,
      :urlencode => false
    }
    CACHE = MemCache.new(memcache_options)
    CACHE.servers = SIMPLE_CONFIG_FILE["memcache_servers"]
Where SIMPLE_CONFIG_FILE['memcache_servers'] contains the list of memcache servers participating in the cluster. After debugging for a few hours (*gasp*) and turning on verbose logging on all participating memcache servers, I found that trusty CacheFu was replacing the CACHE constant with the following code:
    silence_warnings do
      Object.const_set :CACHE, memcache_klass.new(config)
      Object.const_set :SESSION_CACHE, memcache_klass.new(config) if config[:session_servers]
    end

    CACHE.servers = Array(config.delete(:servers))
    SESSION_CACHE.servers = Array(config[:session_servers]) if config[:session_servers]
Now this deal is real (and sucks). After a couple of minutes of hacking, I took the cache_fu config file out of svn and wrote code to generate it on the fly during deployment. Now, pigs can fly.
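Generating the config during deployment could look roughly like the sketch below. This is only an illustration: the helper name `render_cache_config`, the YAML key names and the server addresses are all assumptions, not the actual cache_fu file layout or our deployment code.

```ruby
require "yaml"

# Hypothetical helper: builds the YAML text for a cache config file from the
# list of memcache servers known at deploy time. The key names are
# illustrative and may not match what cache_fu actually reads.
def render_cache_config(servers, namespace: "foobar", ttl: 3600)
  config = {
    "production" => {
      "namespace" => namespace,
      "ttl"       => ttl,      # 1 hour, as used for the charts
      "servers"   => servers   # every machine gets the same full list
    }
  }
  YAML.dump(config)
end

# Made-up addresses, for illustration only.
yaml_text = render_cache_config(["10.0.0.1:11211", "10.0.0.2:11211"])

# A deployment task would then write yaml_text to the config path on each
# machine instead of checking the file into svn.
```

The point is that every machine ends up with the same server list, whichever code path builds the CACHE constant.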
This post is not about pork and coffee. So stay clear if Google has landed you here thinking I am going to describe some sort of recipe for making nice mocha coffee with chunky bacon.
It’s about using a shiny new testing library called Bacon, by Chris. Mocha is, of course, the venerable mocking library for Ruby and Rails. Here is a tiny bit of code that will get you started with Bacon and Mocha:
    require "rubygems"
    require "bacon"
    require "mocha/standalone"
    require "mocha/object"

    class Bacon::Context
      include Mocha::Standalone
      alias_method :old_it,:it
      # wrap every spec with mocha setup, verification and teardown
      def it description,&block
        mocha_setup
        old_it(description,&block)
        mocha_verify
        mocha_teardown
      end
    end
That’s it. Happy baking.
Update: With the 1.0.4 release, this method has been removed. It was introduced as a workaround for the thread-unsafe register_status, but it’s no longer required, since result caching is threadsafe in this version anyway.
You know the story too well: in your BackgrounDRb worker, you want to run 10 tasks concurrently using the thread pool, collect the results in an instance variable and return it. Now, threads are funny little beasts, and the simplest of things can easily get out of hand. For example, one BackgrounDRb user wrote something like this:
    pages = Array.new
    pages_to_scrape.each do |url|
      thread_pool.defer(url) do |url|
        begin
          # model object performs the scraping
          page = ScrapedPage.new(page.url)
          pages << page
        rescue
          logger.info "page scrape failed"
        end
      end
    end
    return pages
There are many things wrong with the above code, and one of them is that it modifies a shared variable without acquiring a lock. Remember, anything inside thread_pool.defer happens in a separate thread and hence should be thread safe.
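As a minimal sketch of what guarding the shared array means, here is the same shape of code with a Mutex. Plain `Thread.new` stands in for thread_pool.defer, and the "scraping" is a made-up string operation, so treat it as an illustration of the locking only, not as BackgrounDRb code.

```ruby
require "thread"

pages = []
lock  = Mutex.new

# Plain threads stand in for thread_pool.defer in this sketch.
threads = %w[a b c d].map do |url|
  Thread.new(url) do |u|
    page = "scraped:#{u}"               # stand-in for the real scraping work
    lock.synchronize { pages << page }  # only one thread appends at a time
  end
end

# Wait for every thread before reading the shared array; returning it
# straight away would hand back a partially filled result.
threads.each(&:join)
```

Note the second fix as well: the results are only read after all threads have finished.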
Another use case: say you don’t want to spawn gazillions of workers, and would rather process requests from n users within one worker and save the results back using “register_status” with user_id as the identifier. thread_pool.defer is for fire-and-forget kinds of jobs, and using register_status within the block supplied to thread_pool.defer is dangerous.
Enter thread_pool.fetch_parallely(args,request_proc,response_proc). Let’s get an example going. Inside one of your workers:
    def barbar args
      request_proc = lambda do |foo|
        sleep(2)
        "Hello #{foo}"
      end

      callback = lambda do |result|
        register_status(result)
      end
      thread_pool.fetch_parallely(args,request_proc,callback)
    end
The first argument to fetch_parallely is a data argument that will simply be passed to request_proc. Note that it is necessary, and you should not rely on the fact that a closure captures the local scope; within threads, that could be dangerous. The last return value of request_proc will be passed as an argument to the callback proc once request_proc finishes execution and is ready with the result.
The difference here is that although request_proc runs within a new thread, the callback proc gets executed within the main thread and hence need not be thread safe. You can do pretty much anything you want within the callback proc.
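The general pattern (work in threads, callbacks in the main thread) can be sketched with a thread-safe Queue. This is not BackgrounDRb’s actual implementation; `fetch_in_thread` is a hypothetical helper made up for this illustration.

```ruby
require "thread"

# Hypothetical helper, not part of BackgrounDRb: runs request_proc in a new
# thread and pushes its return value onto a thread-safe queue.
def fetch_in_thread(arg, request_proc, results)
  Thread.new(arg) { |a| results << request_proc.call(a) }
end

results   = Queue.new
collected = []

request_proc = lambda { |foo| "Hello #{foo}" }
# The callback touches plain, unsynchronized state; that is fine because it
# is only ever invoked from the main thread below.
callback = lambda { |result| collected << result }

workers = %w[alice bob].map { |name| fetch_in_thread(name, request_proc, results) }
workers.each(&:join)

# Drain the queue in the main thread and hand each result to the callback.
callback.call(results.pop) until results.empty?
```

Only the queue is shared between threads, which is why the callback itself does not need a lock.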
This is available in the git version of BackgrounDRb. Here is the link on how to install the git version of BackgrounDRb:
http://gnufied.org/2008/05/21/bleeding-edge-version-of-backgroundrb-for-better-memory-usage/
I thought it would be cool to display real-time streaming stock market ticks in our marketsimplified application.

I needed a wee bit of Flash code that opens an XML socket to our Comet server, accepts data and invokes the corresponding JavaScript function in the browser for displaying streaming data. I took the Flash code from the Juggernaut project and compiled it to a swf. Now we needed a Comet server. The initial version I wrote in Ruby using EventMachine. It was pretty darn good: you could plug in and stream as many types of data as you wanted, you just needed to inherit a class and you were done. But it was slow and was unable to handle the influx of stock market ticks. We had to restart the damn Comet server every day.
So I proceeded to port the Comet server to Scala. Before choosing Scala, I played with Erlang and D. My attempts at learning Erlang and using it for our Comet server were serious. Why I gave up on Erlang was mostly because:
- String handling, or the near absence of it. Yes, sure, one can write a recursive descent parser, but for small string manipulation it’s overkill. Our internal data exchange protocol is line oriented and the parsers that I wrote used string manipulation.
- I dunno if many will agree, but Erlang is hard. Yeah, probably not for simple “hello world” applications, but on the whole you need to turn your head quite a bit. I wasn’t sure if my colleagues would buy into it.
- Even if you ignore the above two criteria, I believe Erlang needs quite a different ecosystem. Our application is already built around open source technologies such as MySQL, MemCache, Rails, Hibernate, Ruby, Python, Nginx and Mongrel, and controlled by YAML files. Hunting for Erlang libraries for mysql, memcache, yaml and json and making them work seemed like too much work.
Anyways, that was my decision, so get over it. I briefly flirted with D. I wasn’t very happy with the libraries, their installation procedure and their API. I started with Scala long ago, playing with it now and then. But after my Erlang tryst fizzled out, I decided to look into Scala seriously. There were no decent network programming libraries for Scala. Sure, I could have used Mina, but I wanted a more Scalasque library that would help me translate my EventMachine code to Scala, and hence I wrote Eventfax. (The code in public svn is a bit stale; our corporate svn has the latest code, which I will publish soon.)
For example, an EchoServer using the Eventfax library will look like:
    import scala.actors._
    import scala.actors.Actor._
    import java.io.IOException

    import eventfax.core._
    import eventfax.protocol._

    // define a reactor core
    class EchoReactor(starterBlock: ReactorCore => Unit) extends ReactorCore(starterBlock) {
      def check_for_actor_messages = {}
    }

    // define a factory class
    object EchoServerFactory extends ConnectionFactory {
      def create_connection(): Connection = {
        new EchoServer()
      }
    }

    class EchoServer extends Connection {
      val buftok: LineProtocol = new LineProtocol("[\n]")
      val masterActor: EchoReactor = reactor.asInstanceOf[EchoReactor]

      def receive_data(p_data: FaxData) = {
        buftok.extract(p_data)
        buftok.foreach(str_data => { dispatch_request(str_data) })
      }

      def dispatch_request(str_data: String) = {
        val t_str = str_data.trim()
        if(!t_str.isEmpty) send_data(str_data+"\n")
      }

      def on_write_error() = {
        try {
          close_connection
        } catch { case ex: IOException => println("error")}
      }

      def unbind() = { }
    }

    object EchoServerRunner {
      def main(args: Array[String]) = {
        new EchoReactor(t_reactor => {
          t_reactor.start_server("localhost",8700,EchoServerFactory)
        })
      }
    }
Also, I had to port our protocol parsing libraries to Scala, which proved trivial. After a week of effort, our watchlist is powered by a Comet server written in Scala. I used Scala Specs for testing and buildr for compiling and packaging.
I am yet to get buildr working properly with Scala Specs, but I will perhaps look into that later. Getting Memcache, MySQL, YAML or JSON working with Scala was trivial, although the JSON parser of Scala is one can full of worms. Every new Scala release seems to introduce new bugs in it. I had to spend quite some time getting around them.
Update: Do not forget to get rid of the autogenerated backgroundrb files and the like that were generated during the last run of rake backgroundrb:setup.
A plain fork() is evil under Ruby, and hence there are some issues with the memory usage of BackgrounDRb. I have made some changes to the packet library so that it uses fork and exec rather than just fork, for better memory usage. However, there were quite a few changes and BackgrounDRb is affected by them. You can try the bleeding edge version as follows. You also need rspec for building the packet gem.
clone the packet git repo:
    git clone git://github.com/gnufied/packet.git
    cd packet; rake gem
    cd pkg; sudo gem install --local packet-0.1.6.gem
Go to the vendor/plugins directory of your rails directory and remove or back up the older version of the backgroundrb plugin, and back up the related config file as well.
from vendor/plugins directory:
    git clone git://github.com/gnufied/backgroundrb.git
    cd RAILS_ROOT
    rake backgroundrb:setup
    ./script/backgroundrb start
Let me know how it goes.
I am no expert in Ruby, but over time I have accumulated some thoughts that may help you in writing better Ruby code.
- Always create a directory hierarchy for your library/application. Such as:
    |__ bin
    |__ lib
    |__ tests
    |__ yaml_specs
- If your project hierarchy is like the above and you are writing a library, not an application, don’t make the mistake of putting all your files straight into the lib directory. Rather, have a setup like:
    Root
    |__ bin
    |__ lib
    |__ lib/packet.rb
    |__ lib/packet/other files go here
And use relative requires in “packet.rb” file, like:
    $:.unshift(File.dirname(__FILE__)) unless
      $:.include?(File.dirname(__FILE__)) || $:.include?(File.expand_path(File.dirname(__FILE__)))

    require "packet/packet_parser"
    require "packet/packet_meta_pimp"
    require "packet/packet_core"
    require "packet/packet_master"
    require "packet/packet_connection"
    require "packet/packet_worker"

    PACKET_APP = File.expand_path('../') unless defined?(PACKET_APP)

    module Packet
      VERSION='0.1.4'
    end
This helps avoid the package name collisions that your users would otherwise report.
- As chris2 once mentioned on #ruby-lang, you shouldn’t be overly clever with test cases. Don’t try to be too DRY in your test cases.
- Write code that can be easily tested. What the fuck does that mean? When I started with Ruby I was doing network programming, and I used to write methods (ughh) that always manipulated state through instance variables. I used either threads or EventMachine. One of the issues with EventMachine is that code written for it usually relies on a state machine, and hence it can be notoriously difficult to unit test, because most of the time your methods work according to the state of instance variables. That was bad. Try to write code in a more functional way, where methods take some parameters and return values based on those arguments. You should minimize methods with side effects as much as possible. This will make your code more readable and easily testable.
- Read the code of some good libraries, such as Ramaze, Rake and the standard library.
- Use FastRi rather than Ri. If possible, generate your own set of documentation using rdoc on the Ruby source code. I spend time looking through methods and classes just for fun. However, I don’t like the default RDoc template; use the Jamis RDoc template if you like the Rails documentation. Often, for gems installed on your machine, you can use gem server or gem_server to view their documentation.
- #ruby-lang on freenode is generally a good place to shoot general Ruby questions. Be polite, don’t repeat yourself and you will get your answers.
- Avoid monkey patching core classes if your code is a library and will be used alongside third-party code.
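To make the advice about functional style a bit more concrete, here is a small sketch of a line-protocol splitter written as a pure function. The name `extract_lines` and the tick strings are made up for this illustration; it is not code from any of the libraries mentioned above.

```ruby
# Pure function: takes the leftover buffer and a newly received chunk, and
# returns the complete lines plus the new leftover. It keeps no state in
# instance variables, so it can be tested with plain arguments.
def extract_lines(buffer, chunk)
  data = buffer + chunk
  *lines, rest = data.split("\n", -1)
  [lines, rest || ""]
end

# The caller owns the state and threads it through explicitly.
lines1, rest1 = extract_lines("", "tick1\nti")
lines2, rest2 = extract_lines(rest1, "ck2\n")
```

A connection class can still keep the leftover buffer in an instance variable, but the parsing logic no longer depends on it, so testing it needs no sockets and no reactor.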
If you are writing an executable application rather than a library, then have a separate file that loads the required libraries and does some basic setup. For example, I have a boot.rb in my Comet server that looks like:
    require "rubygems"
    require "eventmachine"
    require "buftok"
    require "sequel/mysql"
    PUSH_SERVER_PATH = File.expand_path(File.join(File.dirname(__FILE__),'..'))
    ["lib","channels"].each {|x| $:.unshift(File.join(PUSH_SERVER_PATH,x)) }
    require "push_server"
Why? Because such a file comes in handy when you are writing a test_helper for your application. There, you can simply require the above boot.rb, so that you don’t have to copy stuff back and forth when your required libs change.
- Best place for BackgrounDRb documentation is the README file that comes with the plugin. Read it thoroughly before going anywhere else for documentation.
- When passing arguments from Rails to BackgrounDRb workers, don’t pass huge ActiveRecord objects. It’s asking for trouble. You can easily circumvent the situation by passing the ids of AR objects.
- It’s always a good idea to run the trunk version rather than older tagged releases.
- To debug backgroundrb problems, it’s always a good idea to start bdrb in foreground mode by skipping the ‘start’ argument while starting the bdrb server. After that, fire up the rails console, try invoking bdrb tasks from there and find out what’s happening. John Yerhot has posted an excellent write-up about this.
- Whenever you update the plugin code from svn, don’t forget to remove the old backgroundrb script and run:
rake backgroundrb:setup
- When deploying the plugin in production, please change backgroundrb.yml so that the production environment is loaded in the backgroundrb server. You should avoid keeping the backgroundrb.yml file in svn. Rather, you should have a cap task that generates backgroundrb.yml on production servers.
- When you are processing too many tasks from rails, you should use the inbuilt thread pool rather than firing up new workers.
- BackgrounDRb needs Ruby >= 1.8.5
- When you are starting a worker using MiddleMan.new_worker() from rails and using a job_key to start the worker (you must use unique job keys anyway if you want more than one instance of the same worker running at the same time), you must always access that instance of the worker with the same job key. That is, all MiddleMan methods that invoke a method on that instance of the worker must carry job_key as a parameter. For example:
    session[:job_key] = MiddleMan.new_worker(:worker => :fibonacci_worker, :job_key => 'the_key', :data => params[:input])
    MiddleMan.send_request(:worker => :fibonacci_worker, :worker_method => :do_work, :data => params[:input],:job_key => session[:job_key])
Omitting the job_key in subsequent calls will be an error, if your worker is started with a job_key.
You might have heard of Raven. It’s a rake and Rubygems based tool for building and managing Java projects.
Looks like upgrades to rake and rubygems have broken Raven, and my search for “Ruby Raven” led me to this:

Quite amusing.
Update: The official MySQL bindings have been ported to 1.9; look into the comments for details.
For those who want to stay on the edge, here is a modified set of mysql C bindings for Ruby 1.9. It works perfectly well in my small tests.
To compile it:
    $ ruby2 extconf.rb
    $ make
    $ sudo make install
Although it’s been quite some time since the 1.0 release of BackgrounDRb went out into the wild, a belated post mentioning its features is nonetheless welcome.
Although the README document available at http://backgroundrb.rubyforge.org is quite comprehensive and there is precious little I can add, I shall try.
- BackgrounDRb is a Ruby job server and scheduler. Its main intent is to be used with Ruby on Rails applications for offloading long-running tasks. However, unlike other libraries, BackgrounDRb offers tight integration with Rails, and hence you can check the status of your workers, pass data to workers and get responses back, register the status of your workers, dynamically start or stop workers from rails and stuff like that.
- BackgrounDRb doesn’t have any DRb in its skin now. It’s based on the networking library packet (http://packet.googlecode.com).
- It’s stable.
- It has support for thread pools and for storing results in MemCache clusters.
- It comes with its own scheduler, and hence you don’t need to muck around with crontab anymore.
A Quick overview of installation:
- Get the plugin using:
piston import http://svn.devjavu.com/backgroundrb/trunk/ backgroundrb
- Remove or backup older backgroundrb scripts/config files in your rails root directory
- Run following command from root directory of your rails app:
rake backgroundrb:setup
- Have a look at generated config file, RAILS_ROOT/config/backgroundrb.yml and see if there is anything you would like to change.
- Generate a new worker using :
./script/generate worker foo
- Read the detailed documentation about writing workers and stuff on http://backgroundrb.rubyforge.org
- Start your BackgrounDRb server with:
./script/backgroundrb start
- Stop your BackgrounDRb server with:
./script/backgroundrb stop
