A Place for my head

On Ruby, Rails, Concurrency and fiction

Archive for the ‘rails’ Category

Posted in : rails, ruby | 1 comment

Advice floating around the intrawebs about ignoring additional files with "add_exception" may not work, because by the time the "run" hook gets a chance to run, the regexp of files to ignore has already been compiled. The alternative is to use the "initialize" hook, like this:

Autotest.add_hook :initialize do |at|
  at.add_exception(/^(coverage|\.git)/)
end

This was tested with ZenTest 3.11.0; YMMV.
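To see why the hook matters, here is a heavily simplified model of the ordering problem. MiniAutotest is a hypothetical class made up for illustration, only loosely modeled on autotest and not ZenTest's actual code: exception patterns are collected into a list and compiled once into a single Regexp, and anything added after that point is too late. This model raises on a late addition; the real library may instead just ignore it.

```ruby
# Hypothetical model of the ordering problem (not ZenTest's code): exception
# patterns are collected first, then compiled once into a single Regexp.
class MiniAutotest
  def initialize
    @exception_list = []
  end

  def add_exception(regexp)
    # Too late once the combined Regexp exists; this model refuses loudly.
    raise "exceptions already compiled" if defined?(@exceptions)
    @exception_list << regexp
  end

  # Compiles the patterns on first use and caches the result.
  def exceptions
    @exceptions ||= Regexp.union(*@exception_list)
  end

  def ignored?(path)
    !!(exceptions =~ path)
  end
end

at = MiniAutotest.new
at.add_exception(/^(coverage|\.git)/)   # like the :initialize hook: before compilation
puts at.ignored?("coverage/index.html") # prints true
puts at.ignored?("lib/foo.rb")          # prints false
```

Registering in an initialize-style hook works because it runs before the first call to exceptions; a pattern added after that call never reaches the compiled Regexp.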

Posted in : rails, ruby | No comments

Since we started scaling out to a multi-machine cluster, our Rails application has been showing one weird problem. We generate stock-market technical charts and cache them in memcache (with a one-hour TTL). During testing we found that if a request got routed to the mongrel cluster running on another machine, the charts didn't appear. That was odd, because we had the memcache cluster configured properly in "production.rb":

memcache_options = {
  :c_threshold => 10_000,
  :compression => true,
  :debug => false,
  :namespace => 'foobar',
  :readonly => false,
  :urlencode => false
}
CACHE = MemCache.new(memcache_options)
CACHE.servers = SIMPLE_CONFIG_FILE["memcache_servers"]

Here SIMPLE_CONFIG_FILE['memcache_servers'] contains the list of memcache servers participating in the cluster. After a few hours of debugging (*gasp*) and turning on verbose logging on all participating memcache servers, I found that trusty CacheFu was replacing the CACHE constant with the following code:

silence_warnings do
  Object.const_set :CACHE, memcache_klass.new(config)
  Object.const_set :SESSION_CACHE, memcache_klass.new(config) if config[:session_servers]
end

CACHE.servers = Array(config.delete(:servers))
SESSION_CACHE.servers = Array(config[:session_servers]) if config[:session_servers]
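A stripped-down illustration of why our configuration vanished. FakeCache is a hypothetical struct standing in for the MemCache client, and all server addresses are made up: whatever production.rb assigns to CACHE first, a later Object.const_set silently replaces it, so the server list we configured ends up on an object nobody references anymore.

```ruby
# Hypothetical stand-in for the MemCache client.
FakeCache = Struct.new(:owner, :servers)

# What production.rb does: create the client and point it at the cluster.
CACHE = FakeCache.new("production.rb")
CACHE.servers = ["10.0.0.1:11211", "10.0.0.2:11211"]  # made-up addresses
ours = CACHE

# What a plugin loaded later can do. Rails' silence_warnings hides the
# "already initialized constant" warning; nil-ing $VERBOSE does the same here.
old_verbose, $VERBOSE = $VERBOSE, nil
Object.const_set(:CACHE, FakeCache.new("plugin"))
$VERBOSE = old_verbose
CACHE.servers = ["localhost:11211"]  # made up: whatever the plugin's own config says

puts CACHE.owner         # prints plugin
puts CACHE.equal?(ours)  # prints false
```

Because the warning is silenced, nothing in the logs hints that the constant changed hands.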

Now, this deal is real (and it sucks). After a couple of minutes of hacking, I took the cache_fu config file out of svn and wrote code to generate it on the fly during deployment. Now pigs can fly.
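The generator itself isn't shown, so here is a minimal sketch of the idea. It assumes cache_fu reads config/memcached.yml with a "defaults" section plus per-environment sections holding a "servers" list; check that layout against the file your copy of cache_fu ships with. The helper name cache_fu_config and the server addresses are hypothetical; in a real deploy the list would come from SIMPLE_CONFIG_FILE["memcache_servers"] and a Capistrano task would write the result into place.

```ruby
require "yaml"

# Hypothetical helper: build cache_fu's memcached.yml contents from the same
# server list production.rb uses, so both always point at the same cluster.
# The key layout below is an assumption, not taken from cache_fu's docs.
def cache_fu_config(servers, namespace)
  {
    "defaults" => {
      "namespace"   => namespace,
      "compression" => true,
      "c_threshold" => 10_000
    },
    "production" => {
      "servers" => Array(servers)
    }
  }.to_yaml
end

# Made-up addresses; in practice: SIMPLE_CONFIG_FILE["memcache_servers"]
servers = ["10.0.0.1:11211", "10.0.0.2:11211"]
yaml = cache_fu_config(servers, "foobar")

# A deploy task would write this to RAILS_ROOT/config/memcached.yml.
puts yaml
```

Generating the file from a single source of truth means the plugin's CACHE and your own configuration can no longer disagree about the server list.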

Posted in : rails, ruby | 2 comments

This post is not about pork and coffee. So stay clear if Google has landed you here thinking I am going to describe some sort of recipe for making nice mocha coffee with chunky bacon.

It's about using the shiny new testing library Bacon, by Chris. Mocha is, of course, the venerable mocking library for Ruby and Rails. Here is a tiny bit of code that will get you started with Bacon and Mocha:

require "rubygems"
require "bacon"
require "mocha/standalone"
require "mocha/object"

class Bacon::Context
  include Mocha::Standalone
  alias_method :old_it, :it

  # Wrap every spec so Mocha's expectations are set up before it runs,
  # verified after it runs, and always torn down.
  def it(description, &block)
    mocha_setup
    old_it(description, &block)
    mocha_verify
  ensure
    mocha_teardown
  end
end

That's it. Happy baking.

Posted in : backgroundrb, rails, ruby | 3 comments

Update: With the 1.0.4 release, this method has been removed. It was introduced as a workaround for the thread-unsafe register_status, but it's no longer required, since result caching is thread-safe anyway in this version.

You know the story too well: in your BackgrounDRb worker, you want to run 10 tasks concurrently using a thread pool, collect the results in an instance variable and return them. Now, threads are funny little beasts, and the simplest of things can easily get out of hand. For example, one BackgrounDRb user wrote something like this:

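pages = Array.new
pages_to_scrape.each do |url|
  thread_pool.defer(url) do |url|
    begin
      # model object performs the scraping
      # (page is still nil at this point; this should be ScrapedPage.new(url))
      page = ScrapedPage.new(page.url)
      # unsafe: pages is shared by all threads and modified without a lock
      pages << page
    rescue
      logger.info "page scrape failed"
    end
  end
end
# returns right away, before the deferred jobs have finished
return pages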

There are many things wrong with the above code, and one of them is that it modifies a shared variable without acquiring a lock. Remember, anything inside thread_pool.defer happens in a separate thread and hence must be thread-safe.
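One way to fix the shared-variable problem is to guard the append with a Mutex and to wait for every thread before returning. The sketch below uses plain Ruby threads rather than BackgrounDRb's thread_pool, and a hypothetical FakePage struct in place of the ScrapedPage model, so that it runs on its own.

```ruby
require "thread"

# Hypothetical stand-in for the ScrapedPage model: it just remembers its URL.
FakePage = Struct.new(:url)

def scrape_all(urls)
  pages = []
  lock  = Mutex.new

  threads = urls.map do |url|
    # Hand the URL to the thread explicitly instead of relying on the closure.
    Thread.new(url) do |u|
      begin
        page = FakePage.new(u)              # the slow work happens outside the lock
        lock.synchronize { pages << page }  # only the shared append is serialized
      rescue StandardError => e
        warn "page scrape failed: #{e.message}"
      end
    end
  end

  threads.each(&:join)  # wait for every thread before returning the results
  pages
end

pages = scrape_all(%w[http://a.example http://b.example http://c.example])
puts pages.map(&:url).sort.inspect  # prints ["http://a.example", "http://b.example", "http://c.example"]
```

Keeping the critical section down to the single append means the threads still do their real work in parallel.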

Another use case: say you don't want to spawn gazillions of workers, and would rather process requests from n users within one worker and save the results using "register_status" with user_id as the identifier. thread_pool.defer is meant for fire-and-forget jobs, and calling register_status within the block supplied to thread_pool.defer is dangerous.

Enter thread_pool.fetch_parallely(args, request_proc, response_proc). Let's have an example going. Inside one of your workers:

def barbar args
  request_proc = lambda do |foo|
    sleep(2)
    "Hello #{foo}"
  end

  callback = lambda do |result|
    register_status(result)
  end
  thread_pool.fetch_parallely(args, request_proc, callback)
end

The first argument to fetch_parallely is a data argument that is simply passed to request_proc. Note that it's necessary: you should not rely on the fact that a closure captures the local scope, because within threads that can be dangerous. The last value returned by request_proc is passed as an argument to the callback proc once request_proc finishes execution and the result is ready.

The difference here is that, although request_proc runs in a new thread, the callback proc gets executed in the main thread and hence need not be thread-safe. You can do pretty much anything you want within the callback proc.
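To make that threading contract concrete, here is a toy model of the idea, not BackgrounDRb's implementation; MiniPool and process_results are hypothetical names. Each request proc runs in its own thread and receives its data as an explicit argument; results travel back through a thread-safe Queue, and the callbacks run on the calling thread, which plays the role of the worker's main loop.

```ruby
require "thread"

# Toy model (hypothetical, not BackgrounDRb's code) of the fetch_parallely idea:
# request procs run in their own threads, callbacks run on the calling thread.
class MiniPool
  def initialize
    @results = Queue.new  # thread-safe hand-off from worker threads
    @pending = 0          # only ever touched by the calling thread
  end

  def fetch_parallely(args, request_proc, response_proc)
    @pending += 1
    Thread.new(args) do |data|
      # Worker thread: uses only its own argument, touches no shared state.
      result = begin
        request_proc.call(data)
      rescue StandardError => e
        e  # pass the error to the callback instead of losing it
      end
      @results << [response_proc, result]
    end
  end

  # Stand-in for the worker's main loop: run every callback on this thread.
  def process_results
    while @pending > 0
      callback, result = @results.pop  # blocks until some request finishes
      @pending -= 1
      callback.call(result)
    end
  end
end

pool      = MiniPool.new
collected = []  # plain Array, no lock: callbacks never run concurrently
request   = lambda { |name| "Hello #{name}" }
callback  = lambda { |result| collected << result }

%w[alice bob].each { |name| pool.fetch_parallely(name, request, callback) }
pool.process_results
puts collected.sort.inspect  # prints ["Hello alice", "Hello bob"]
```

Because only one thread ever runs the callbacks, collected can be a plain Array; the Queue is the single point where the threads meet.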

This is available in the git version of BackgrounDRb. Here is the link on how to install the git version of BackgrounDRb:

http://gnufied.org/2008/05/21/bleeding-edge-version-of-backgroundrb-for-better-memory-usage/

Update: Do not forget to get rid of the autogenerated backgroundrb scripts, config files and so on that were generated during the last run of rake backgroundrb:setup.

A plain fork() is evil under Ruby, and hence there are some issues with BackgrounDRb's memory usage. I have made some changes to the packet library so that it uses fork and exec rather than just fork, for better memory usage. However, there were quite a lot of changes, and BackgrounDRb is affected by them. You can try the bleeding-edge version as follows. You will also need rspec to build the packet gem.
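For the curious, the difference between the two approaches looks roughly like this in plain Ruby. This is a minimal sketch, not packet's code; the child here just runs a trivial Ruby one-liner. A forked child starts as a copy of the parent's memory, and under Ruby 1.8 the garbage collector writes mark bits into the objects themselves, which defeats copy-on-write and gradually makes the copied pages private. Calling exec right after fork replaces the child with a fresh process that only loads what it needs. (fork is not available on Windows.)

```ruby
require "rbconfig"

# Path to the currently running ruby binary.
ruby = RbConfig.ruby

# Plain fork: the child would keep running with a copy of this process's heap.
# fork + exec: the child immediately replaces itself with a fresh interpreter.
pid = fork do
  exec(ruby, "-e", "puts 'fresh child process'")
end

_, status = Process.wait2(pid)  # reap the child and get its exit status
puts status.success?            # prints true
```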

Clone the packet git repo:

git clone git://github.com/gnufied/packet.git
cd packet;rake gem
cd pkg; sudo gem install --local packet-0.1.6.gem

Go to the vendor/plugins directory of your Rails application and remove or
back up the older version of the backgroundrb plugin, and back up the related
config file as well.

From the vendor/plugins directory:

git clone git://github.com/gnufied/backgroundrb.git
cd RAILS_ROOT
rake backgroundrb:setup
./script/backgroundrb start

Let me know how it goes.

Posted in : rails, rant, ruby | 2 comments

I am no expert in Ruby, but over time I have accumulated some thoughts that may help you write better Ruby code.

  • Always create a directory hierarchy for your library/application, such as:
       |__ bin
       |__ lib
       |__ tests
       |__ yaml_specs
  • If you are writing an executable application rather than a library, have a separate file that loads the required libraries and does some basic setup. For example, I have a boot.rb in my Comet server that looks like this:

    require "rubygems"
    require "eventmachine"
    require "buftok"
    require "sequel/mysql"
    PUSH_SERVER_PATH = File.expand_path(File.join(File.dirname(__FILE__),'..'))
    ["lib","channels"].each {|x| $:.unshift(File.join(PUSH_SERVER_PATH,x)) }
    require "push_server"

    Why? Because such a file comes in handy when you are writing the test_helper for your application. There you can simply require the above boot.rb, so that you don't have to copy stuff back and forth when your required libs change.

  • If your project hierarchy is like the above and you are writing a library rather than an application, don't make the mistake of putting all your files straight into the lib directory. Rather, have a setup like:
      Root
      |__ bin
      |__ lib
      |__ lib/packet.rb
      |__ lib/packet/other files go here

    And use relative requires in the "packet.rb" file, like:

    $:.unshift(File.dirname(__FILE__)) unless
      $:.include?(File.dirname(__FILE__)) || $:.include?(File.expand_path(File.dirname(__FILE__)))
    require "packet/packet_parser"
    require "packet/packet_meta_pimp"
    require "packet/packet_core"
    require "packet/packet_master"
    require "packet/packet_connection"
    require "packet/packet_worker"

    PACKET_APP = File.expand_path('../') unless defined?(PACKET_APP)

    module Packet
      VERSION = '0.1.4'
    end

    This helps you avoid the name collisions that your users would otherwise report.

  • As chris2 once mentioned on #ruby-lang, you shouldn't be overly clever with test cases. Don't try to be too DRY in your tests.
  • Write code that can be easily tested. What the fuck does that mean? When I started with Ruby and was doing network programming, I used to write methods (ughh) that always manipulated state through instance variables. I used either threads or EventMachine. One of the issues with EventMachine is that the code you write usually relies on a state machine, and hence it can be notoriously difficult to unit test, because most of the time your methods behave according to the state of instance variables. That was bad. Try to write code in a more functional way, where methods take some parameters and return values based on those arguments. Minimize methods with side effects as much as possible. This will make your code more readable and easily testable.
  • Read the code of some good libraries, such as Ramaze, Rake and the standard library.
  • Use FastRi rather than ri. If possible, generate your own set of documentation by running rdoc on the Ruby source code. I spend time looking through methods and classes just for fun. However, I don't like the default RDoc template; use Jamis's RDoc template if you like the Rails documentation. For gems installed on your machine, you can use gem server (or gem_server) to view their documentation.
  • #ruby-lang on freenode is generally a good place to ask general Ruby questions. Be polite, don't repeat yourself, and you will get your answers.
  • Avoid monkey patching core classes if your code is a library and will go with third party code.
  • The best place for BackgrounDRb documentation is the README file that comes with the plugin. Read it thoroughly before looking anywhere else.
  • When passing arguments from Rails to BackgrounDRb workers, don't pass huge ActiveRecord objects; it's asking for trouble. You can easily get around this by passing the ids of AR objects instead.
  • It's always a good idea to run the trunk version rather than older tagged releases.
  • To debug BackgrounDRb problems, it's always a good idea to start bdrb in foreground mode by omitting the 'start' argument when starting the bdrb server. After that, fire up the Rails console, try invoking bdrb tasks from it, and find out what's happening. John Yerhot has posted an excellent write-up about this.
  • Whenever you update the plugin code from svn, don't forget to remove the old backgroundrb script and run:
     rake backgroundrb:setup
  • When deploying the plugin in production, change backgroundrb.yml so that the production environment is loaded in the backgroundrb server. You should avoid keeping backgroundrb.yml in svn; instead, have a cap task that generates backgroundrb.yml on the production servers.
  • When you are processing many tasks from Rails, use the built-in thread pool rather than firing up new workers.
  • BackgrounDRb needs Ruby >= 1.8.5
  • When you start a worker from Rails using
     MiddleMan.new_worker()

    with a job_key (you must use unique job keys anyway if you want more than one instance of the same worker running at the same time), you must always access that instance of the worker with the same job key. That is, every MiddleMan call that invokes a method on that instance of the worker must carry the job_key as a parameter. For example:

       session[:job_key] = MiddleMan.new_worker(:worker => :fibonacci_worker, :job_key => 'the_key', :data => params[:input])
       MiddleMan.send_request(:worker => :fibonacci_worker, :worker_method => :do_work, :data => params[:input], :job_key => session[:job_key])

    Omitting the job_key in subsequent calls is an error if your worker was started with a job_key.
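As an illustration of the "write code that can be easily tested" advice, here is a hypothetical line-splitting helper of the kind an EventMachine protocol handler needs (split_lines is a made-up name, not from any library). Instead of appending to an @buffer instance variable inside receive_data and reading it back later, it takes the old buffer and the new data and returns the complete lines plus the leftover partial line, so a test can call it directly without setting up any connection state.

```ruby
# Pure function: everything it needs comes in as arguments and everything
# it produces is returned, so it can be tested without a running reactor.
def split_lines(buffer, data)
  pieces = (buffer + data).split("\n", -1)  # -1 keeps a trailing empty piece
  rest = pieces.pop || ""                   # last piece is an incomplete line
  [pieces, rest]
end

lines, rest = split_lines("GET /", "a\nGET /b\nGET")
p lines  # prints ["GET /a", "GET /b"]
p rest   # prints "GET"
```

In a handler, receive_data would then reduce to something like `lines, @buffer = split_lines(@buffer, data)`, leaving that single assignment as its only side effect.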

Although it's been quite some time since the 1.0 release of BackgrounDRb came out, a belated post mentioning its features is nonetheless welcome.

Although the README document available at http://backgroundrb.rubyforge.org is quite comprehensive and there is precious little I can add, I shall try anyway.

  • BackgrounDRb is a Ruby job server and scheduler. Its main intent is to be
    used with Ruby on Rails applications for offloading long-running tasks. However, unlike other libraries, BackgrounDRb offers tight integration with Rails, so you can check the status of your workers, pass data to workers and get responses back, register the status of your workers, dynamically start or stop workers from Rails, and so on.
  • BackgrounDRb doesn't have any DRb in its skin now. It's based on the networking library packet (http://packet.googlecode.com).
  • It's stable.
  • It has support for thread pools and for storing results in MemCache clusters.
  • It comes with its own scheduler, so you don't need to muck around with crontab anymore.

A quick overview of installation:

  • Get the plugin using:
     piston import http://svn.devjavu.com/backgroundrb/trunk/ backgroundrb
  • Remove or back up older backgroundrb scripts/config files in your Rails root directory
  • Run the following command from the root directory of your Rails app:
     rake backgroundrb:setup
  • Have a look at the generated config file, RAILS_ROOT/config/backgroundrb.yml, and see if there is anything you would like to change.
  • Generate a new worker using:
     ./script/generate worker foo
  • Read the detailed documentation about writing workers and so on at http://backgroundrb.rubyforge.org
  • Start your BackgrounDRb server with:
    ./script/backgroundrb start
  • Stop your BackgrounDRb server with:
    ./script/backgroundrb stop

Posted in : java, rails | No comments

I recently had to write a smallish TCP/IP server in Java. It had to be written in Java because of yet another obsession of the corporate world with Java. The API I had to use was Java-ish, and although the people who wrote it would claim it can be used easily from any language, it was not so. What's more, I had to use that API from Rails, so you can understand my situation.

Well, so I googled and found the book "Java Network Programming, Third Edition".

What crap. My main issues were:

  • I hate a book that pretends its examples are ready to run when they don't run, because they need tinkering. I am all for snippets that demonstrate a thing or two. But when you say an example is ready to run, with imports and everything in place, and it doesn't run on an actual machine, it just freaks me out.
  • Although the book has 776 pages, it's surprisingly free of useful stuff. Seriously, when I compare it with the book by Richard Stevens, this book looks like shit. The author had scope here to cover how to write REALLY scalable networking applications, but he wasted 776 pages saying nothing and perhaps reserved the good stuff for "Advanced Network Programming with Java" (soon after reading the book, I found that there is indeed a book on advanced network programming).

Posted in : rails | No comments

Our Flash guy uses the usual LoadVars for loading external data into a Flash movie and does some funky stuff like plotting nice-looking portfolio charts.

But somehow we saw some issues with loading charts in IE6 running Flash. A quick Ethereal packet sniff showed us that although the client makes
requests with "Accept-Encoding: gzip,deflate", it is not able to decode the web server's gzipped response. And parsing a gzipped response in Flash obviously fails.

This is quite a corner case, I suppose, but I did the following to prevent Content-Encoding for that particular controller.

class OutputCompressionFilter
  def self.filter(controller)
    controller.response.headers['Content-Encoding'] = 'identity'
  end
end

class FlashController < ApplicationController
  no_pref true
  after_filter OutputCompressionFilter
  layout :set_layout
  include REXML
  def index
  end
end