A Place for my head

On Ruby, Rails, Concurrency and fiction

Update: As of the 1.0.4 release, this method has been removed. It was introduced as a workaround for the thread-unsafe register_status, but it is no longer required, since result caching is thread safe in this version.

You know the story too well: in your BackgrounDRb worker, you want to run 10 tasks concurrently using the thread pool, collect the results in an instance variable and return it. Now, threads are funny little beasts and the simplest of things can easily get out of hand. For example, one BackgrounDRb user wrote something like this:

pages = Array.new
pages_to_scrape.each do |url|
  thread_pool.defer(url) do |url|
    begin
      # model object performs the scraping
      page = ScrapedPage.new(page.url)
      pages << page
    rescue
      logger.info "page scrape failed"
    end
  end
end
return pages

There are many things wrong with the above code, and one of them is that it modifies a shared variable without acquiring a lock. Remember, anything inside thread_pool.defer happens in a separate thread and hence should be thread safe.
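
To make the locking point concrete, here is a minimal sketch using plain Ruby threads and a Mutex rather than BackgrounDRb's thread pool. The scrape method is a hypothetical stand-in for the real scraping call, and scrape_all is a name made up for this example. It also waits for every thread before returning, which the snippet above does not do.

```ruby
require "thread"

# Hypothetical stand-in for the real scraping call.
def scrape(url)
  "contents of #{url}"
end

# Scrape every URL in its own thread and collect the results safely.
def scrape_all(urls)
  pages = []
  lock  = Mutex.new
  threads = urls.map do |url|
    Thread.new(url) do |u|
      begin
        page = scrape(u)
        # Only one thread at a time may touch the shared array.
        lock.synchronize { pages << page }
      rescue StandardError => e
        warn "page scrape failed for #{u}: #{e.message}"
      end
    end
  end
  threads.each(&:join) # wait for all threads before returning
  pages
end
```

The order of the returned pages depends on which thread finishes first, so callers should not rely on it.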

Another use case: say you don’t want to spawn gazillions of workers, and would rather process requests from n users within one worker and save the results using “register_status” with user_id as the identifier. thread_pool.defer is for fire-and-forget jobs, and using register_status within the block supplied to thread_pool.defer is dangerous.

Enter thread_pool.fetch_parallely(args,request_proc,response_proc). Let’s walk through an example. Inside one of your workers:

  def barbar args
    request_proc = lambda do |foo|
      sleep(2)
      "Hello #{foo}"
    end
 
    callback = lambda do |result|
      register_status(result)
    end
    thread_pool.fetch_parallely(args,request_proc,callback)
  end

The first argument to fetch_parallely is a data argument that is simply passed to request_proc. Note that it is necessary: you should not rely on the fact that a closure captures the local scope, since within threads that could be dangerous. The last return value of request_proc is passed as an argument to the callback proc once request_proc finishes execution and has a result ready.

The difference here is that, although request_proc runs within a new thread, the callback proc gets executed within the main thread and hence need not be thread safe. You can do pretty much anything you want within the callback proc.
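
The pattern of running the request in a worker thread and the callback on the calling thread can be sketched with a Queue. This is only an illustration of the idea, not BackgrounDRb's actual implementation: fetch_in_parallel is a made-up name, and unlike fetch_parallely it takes a list of items so the example stands on its own.

```ruby
require "thread"

# Hypothetical sketch: run request_proc once per item in its own thread,
# then invoke response_proc for each result on the calling thread.
def fetch_in_parallel(items, request_proc, response_proc)
  results = Queue.new
  items.each do |item|
    # The item is passed in explicitly instead of relying on the closure.
    Thread.new(item) do |arg|
      value = begin
        request_proc.call(arg)
      rescue StandardError => e
        e # hand the error to the callback instead of losing it
      end
      results << value
    end
  end
  # Back on the calling thread: Queue#pop blocks until a result arrives.
  items.size.times { response_proc.call(results.pop) }
end
```

Because only the calling thread ever runs response_proc, the callback can touch ordinary instance variables without a lock.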

This is available in the git version of BackgrounDRb. Here is the link on how to install the git version of BackgrounDRb:

http://gnufied.org/2008/05/21/bleeding-edge-version-of-backgroundrb-for-better-memory-usage/

Posted in : ruby,scala | 4 comments

I thought it would be cool to display real-time streaming stock market ticks in our marketsimplified application.
Small Watchlist

I needed a wee bit of Flash code that opens an XML socket to our Comet server, accepts data and invokes the corresponding JavaScript function in the browser to display the streaming data. I took the Flash code from the Juggernaut project and compiled it to a swf. Now we needed a Comet server. I wrote the initial version in Ruby using EventMachine. It was pretty darn good: you could plug in and stream as many types of data as you wanted, you just needed to inherit from a class and you were done. But it was slow and unable to handle the influx of stock market ticks. We had to restart the damn Comet server every day.

So I proceeded to port the Comet server to Scala. Before choosing Scala, I played with Erlang and D. My attempts at learning Erlang and using it for our Comet server were serious. I gave up on Erlang mostly because of:

  • String handling, or the near absence of it. Yes, sure, one can write a recursive descent parser, but for small string manipulation it’s overkill. Our internal data exchange protocol is line oriented and the parsers that I wrote used string manipulation.
  • I dunno if many will agree, but Erlang is hard. Yeah, probably not for simple “hello world” applications, but on the whole you need to turn your head quite a bit. I wasn’t sure my colleagues would buy into it.
  • Even if you ignore the above two criteria, I believe Erlang needs quite a different ecosystem. Our application is already built around open source technologies such as Mysql, MemCache, Rails, Hibernate, Ruby, Python, Nginx and Mongrel, and is controlled by YAML files. Hunting for Erlang libraries for mysql, memcache, yaml and json and making them work seemed like too much work.

Anyways, that was my decision, so get over it. I briefly flirted with D, but I wasn’t very happy with its libraries, their installation procedure and their API. I started with Scala long ago, playing with it now and then. But after my Erlang tryst fizzled out, I decided to look into Scala seriously. There were no decent network programming libraries for Scala. Sure, I could have used Mina, but I wanted a more Scala-esque library that would help me translate my EventMachine code to Scala, and hence I wrote Eventfax. (The code in the public svn is a bit stale; our corporate svn has the latest code, which I will publish soon.)

For example, an EchoServer using the Eventfax library looks like:

import scala.actors._
import scala.actors.Actor._
import java.io.IOException
 
import eventfax.core._
import eventfax.protocol._
 
// define a reactor core
class EchoReactor(starterBlock: ReactorCore => Unit) extends ReactorCore(starterBlock) {
  def check_for_actor_messages = {}
}
 
// define a factory class
object EchoServerFactory extends ConnectionFactory {
  def create_connection(): Connection = {
    new EchoServer()
  }
}
 
class EchoServer extends Connection {
  val buftok: LineProtocol = new LineProtocol("[\n]")
  val masterActor: EchoReactor = reactor.asInstanceOf[EchoReactor]
  def receive_data(p_data: FaxData) = {
    buftok.extract(p_data)
    buftok.foreach(str_data => {
      dispatch_request(str_data)
    })
  }
 
  def dispatch_request(str_data: String) = {
    val t_str = str_data.trim()
    if(!t_str.isEmpty)
      send_data(str_data+"\n")
  }
 
  def on_write_error() = {
    try { close_connection }
    catch { case ex: IOException => println("error")}
  }
  def unbind() = {
  }
}
 
object EchoServerRunner {
  def main(args: Array[String]) = {
    new EchoReactor(t_reactor => {
      t_reactor.start_server("localhost",8700,EchoServerFactory)
    })
  }
}

Also, I had to port our protocol parsing libraries to Scala, which proved trivial. After a week of effort, our watchlist is powered by a Comet server written in Scala. I used Scala Specs for testing and buildr for compiling and packaging.
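
For readers unfamiliar with line-oriented protocols, the job of a line tokenizer such as the LineProtocol used in the server above can be sketched in a few lines of Ruby. This is an illustration of the technique only, not Eventfax's or buftok's actual code; LineBuffer and the sample tick format are made up for the example.

```ruby
# Hypothetical line tokenizer: buffers partial data from the socket and
# hands back only complete, delimiter-terminated lines.
class LineBuffer
  def initialize(delimiter = "\n")
    @delimiter = delimiter
    @pending = String.new
  end

  # Append a chunk and return every complete line received so far.
  def extract(chunk)
    @pending << chunk
    # The -1 limit keeps a trailing empty string when the chunk ends
    # exactly on a delimiter, so the last element is always the tail.
    lines = @pending.split(@delimiter, -1)
    @pending = lines.pop || String.new
    lines
  end
end
```

A network read can end in the middle of a line, so the incomplete tail is kept until the next chunk completes it.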

I am yet to get buildr working properly with Scala Specs, but I will perhaps look into that later. Getting Memcache, Mysql, YAML or JSON working with Scala was trivial, although the JSON parser of Scala is a can of worms. Every new Scala release seems to introduce new bugs in it. I had to spend quite some time working around them.

Update: Do not forget to get rid of the autogenerated backgroundrb script and related files that were generated during the last run of rake backgroundrb:setup.

A plain fork() is evil under Ruby and hence there are some issues with the memory usage of BackgrounDRb. I have made some changes to the packet library so that it uses fork and exec rather than just fork, for better memory usage. However, there were quite a few changes and BackgrounDRb is affected by them. You can try the bleeding edge version as follows. Note that you also need rspec for building the packet gem.
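
As a rough illustration of the difference, here is a minimal fork and exec sketch in plain Ruby. It is not the packet library's code: spawn_fresh_worker is a hypothetical helper, and the child simply runs a one-line script in a new interpreter. It assumes a platform where fork is available.

```ruby
require "rbconfig"

# Hypothetical helper: start a child with fork + exec, wait for it and
# return its Process::Status.
def spawn_fresh_worker(script)
  pid = fork do
    # exec replaces the forked copy of the parent with a brand new
    # interpreter, so the child does not carry the parent's heap along.
    exec(RbConfig.ruby, "-e", script)
  end
  Process.wait(pid)
  $?
end
```

With a plain fork the child would start life as a copy of the whole parent process; the exec step is what gives it a clean slate.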

Clone the packet git repo:

git clone git://github.com/gnufied/packet.git
cd packet;rake gem
cd pkg; sudo gem install --local packet-0.1.6.gem

Go to the vendor/plugins directory of your Rails application and remove or
back up the older version of the backgroundrb plugin, and back up the related
config file as well.

From the vendor/plugins directory:

git clone git://github.com/gnufied/backgroundrb.git
cd RAILS_ROOT
rake backgroundrb:setup
./script/backgroundrb start

Let me know how it goes.

Posted in : Uncategorized | 4 comments

The newest version of the operating system for human beings is out. Kicking ass and users. Yes, users, you heard it right.
I am not very happy with this release. In fact, I regret that I upgraded to this version. Why?

  • Today every upcoming Linux distro is bundling Firefox3 beta 5 as the default browser, so Ubuntu is not alone there. But in my opinion this is a bad decision. Firefox3 beta 5 is not stable. Most of the extensions aren’t supported yet. You are being hard on users by bundling software that’s still 2 months from final release. My main gripes with Firefox3 beta 5 are:
    • It’s a CPU hog. Even with only one tab open, it consumes considerable CPU cycles. Bad for laptop users. Now before you jump the gun: in my tests, I had a simple page open (no fancy gmail with lots of background processing) and I was not interacting with the firefox window.
    • It hangs and freezes now and then. I have experienced this kind of behavior often: while downloading a page firefox will freeze momentarily and then suddenly come back.
    • Extensions, extensions. Many of my favorite extensions are not available for firefox3.
    • Also, I am using a fully updated version of Hardy, have disabled the Security->Tell Me thingies and have deleted urlclassifier3.sqlite, without much success.
  • I guess all Linux users have seen occasional freezes where they have no option other than to hard reboot the machine. With Hardy and Compiz Fusion, I just saw too many freezes on my notebook. Also, did anyone notice that metacity compositing is buggy? I tried using metacity compositing in place of compiz and had to press ctrl-alt-backspace a couple of times. One sure way to reproduce it is to enable compositing and then disable it. I experienced an X freeze when doing that.
  • Many Emacs users, I think, swap the ctrl and caps keys. I am one of them. After upgrading to Hardy, swapping them through the Gnome keyboard settings works, but whenever I press caps (now ctrl), it still turns on the caps LED on my notebook. A bit annoying. This issue was not there in Gutsy.
  • The shipped Xorg is a CPU hog: even if you are not interacting with your machine, Xorg CPU usage stays around 10-30%. I have an Nvidia graphics card and I have tried using "UseEvents" "on". It doesn’t seem to have any effect.
  • Oh boy, I don’t even know why they shipped this half-hearted PulseAudio integration. To get sound back on my laptop, I had to manually choose ALSA as the Sound Playback device.

This being an LTS edition, I was expecting something solid, but I can confirm with some certainty that the Hardy release is inferior to Gutsy. Thank you Ubuntu for screwing it up.

Posted in : Uncategorized | 2 comments

Chennai.rb has been languishing for a while (strictly my opinion). The first problem is that, like almost all open source groups in India, it’s too business oriented. I have been on the mailing list for a while, attended one of their meetings and didn’t like the atmosphere. Back then, it was too Rails centric.

I dunno if it’s a good idea, but I want to start an alternate Ruby group which is more code centric (like seattle.rb perhaps). We would organize coding sprints, hacking sessions and stuff like that.

Whatever project the core members of the group decide to start, provided the group accepts the general idea of the project, becomes the group’s responsibility rather than a one-man job. It will be the responsibility of the creator of the project to explain its underlying concepts, and it will be maintained by the entire group.

We can have a page like seattle.rb on Rubyforge and collaboratively maintain projects.

Also, perhaps there won’t be any beginner sessions or business discussions, and it should be truly chennai.alt.rb.

If you read this blog and are from Chennai, let me know what you think.

Posted in : laptop,notebook,rant | 3 comments

I am serious. I know perhaps you have read my last blog entry about Dell support, but let me tell you, that got resolved eventually: an onsite tech support guy came and fixed it.

Now, the Acer was/is the first laptop I bought, about one and a half years ago. I had various issues with the notebook. One of them was overheating, another was rebooting for no reason (perhaps because of issue #1). Acer never fixed them. I live in Chennai and their service center (the only one in Chennai) is in an obscure place, where you are supposed to take your notebook and ask them to fix it. They take 4 to 5 days if you are lucky, and after that you are supposed to go there and collect your notebook. Each visit eats up almost half of your day. But anyway, they never actually fixed it.

The last time I took my notebook to them, my warranty period was over and their repair charges were almost the same as the cost of a brand new laptop with almost the same configuration. What’s sad is that it was my first laptop and I bought it with prize money obtained from here.

Needless to say, I gave up hope. But for anyone else out there: please don’t buy Acer laptops.

Posted in : rails,rant,ruby | 2 comments

I am no expert in Ruby, but over time I have accumulated some thoughts that may help you write better Ruby code.

  • Always create a directory hierarchy for your library/application. Such as:
       |__ bin
       |__ lib
       |__ tests
       |__ yaml_specs
  • If you are writing an executable application rather than a library, have a separate file that requires the needed libraries and does some basic setup. For example, I have a boot.rb in my Comet server that looks like:

    require "rubygems"
    require "eventmachine"
    require "buftok"
    require "sequel/mysql"
    PUSH_SERVER_PATH = File.expand_path(File.join(File.dirname(__FILE__),'..'))
    ["lib","channels"].each {|x| $:.unshift(File.join(PUSH_SERVER_PATH,x)) }
    require "push_server"

    Why? Because such a file comes in handy when you are writing a test_helper for your application. There, you can simply require the above boot.rb, so that you don’t have to copy stuff back and forth if your required libs change.

  • If your project hierarchy is like the above and you are writing a library, not an application, don’t make the mistake of putting all your files straight into the lib directory. Rather, have a setup like:
      Root
      |__ bin
      |__ lib
      |__ lib/packet.rb
      |__ lib/packet/other files go here

    And use relative requires in the “packet.rb” file, like:

    $:.unshift(File.dirname(__FILE__)) unless
      $:.include?(File.dirname(__FILE__)) || $:.include?(File.expand_path(File.dirname(__FILE__)))
    require "packet/packet_parser"
    require "packet/packet_meta_pimp"
    require "packet/packet_core"
    require "packet/packet_master"
    require "packet/packet_connection"
    require "packet/packet_worker"
     
    PACKET_APP = File.expand_path'../' unless defined?(PACKET_APP)
     
    module Packet
      VERSION='0.1.4'
    end

    This helps avoid the file name collisions that your users would otherwise report.

  • As chris2 once mentioned on #ruby-lang, you shouldn’t be overly clever with test cases. Don’t try to be too DRY in your test cases.
  • Write code that can be easily tested. What the fuck does that mean? When I started with Ruby I was doing network programming, and I used to write methods (ughh) that always manipulated state through instance variables. I used either threads or EventMachine. One of the issues with EventMachine is that the code written usually relies on a state machine and hence can be notoriously difficult to unit test, because most of the time your methods behave according to the state of instance variables. That was bad. Try to write code in a more functional way, where methods take some parameters and return values based on those arguments. You should minimize methods with side effects as much as possible. This will make your code more readable and easier to test.
  • Read the code of some good libraries, such as Ramaze, Rake and the standard library.
  • Use FastRi rather than Ri. If possible, generate your own set of documentation by running rdoc on the Ruby source code. I spend time looking through methods and classes just for fun. However, I don’t like the default RDoc template; use the Jamis RDoc template if you like the Rails documentation. Often, for gems installed on your machine, you can use gem server or gem_server to view their documentation.
  • #ruby-lang on freenode is generally a good place to ask general Ruby questions. Be polite, don’t repeat yourself and you will get your answers.
  • Avoid monkey patching core classes if your code is a library and will run alongside third party code.
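
The advice above about preferring methods without side effects can be sketched as follows. The tick format ("SYMBOL,PRICE") and the parse_tick name are made up for this example and are not taken from the Comet server.

```ruby
# Hypothetical functional-style parser: everything it needs arrives as an
# argument and everything it produces is the return value, so it can be
# unit tested without setting up a connection object or a state machine.
def parse_tick(line)
  symbol, price = line.strip.split(",")
  { :symbol => symbol, :price => price.to_f }
end
```

A connection's receive_data can then stay a thin wrapper that reads from the socket and delegates to methods like this one.
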
Posted in : Uncategorized | No comments

I am experiencing a weird issue on my notebook running Ubuntu Gutsy. Often, I will lose the wireless connection and network manager will attempt to reconnect, but instead of properly reconnecting to the wireless router it just hangs. What’s more weird is that, after this, I am unable to start any X application, with a few exceptions.

I have reported the bug here. Many have reported similar problems and I am waiting for a fix. However, a temporary workaround seems to be to run

$ network-admin --sm-disable

and reset the network connections. Also, you can try deleting the /tmp/.ICE-Unix directory and files.

  • The best place for BackgrounDRb documentation is the README file that comes with the plugin. Read it thoroughly before going anywhere else for documentation.
  • When passing arguments from Rails to BackgrounDRb workers, don’t pass huge ActiveRecord objects. It’s asking for trouble. You can easily circumvent the situation by passing the ids of AR objects.
  • It’s always a good idea to run the trunk version rather than older tagged releases.
  • To debug backgroundrb problems, it’s always a good idea to start bdrb in foreground mode by skipping the ‘start’ argument while starting the bdrb server. After that, fire up the rails console, try invoking bdrb tasks from it and find out what’s happening. John Yerhot has posted an excellent write-up about this, here.
  • Whenever you update the plugin code from svn, don’t forget to remove the old backgroundrb script and run:
     rake backgroundrb:setup
  • When deploying the plugin in production, please change backgroundrb.yml so that the production environment is loaded in the backgroundrb server. You should avoid keeping the backgroundrb.yml file in svn. Rather, you should have a cap task that generates backgroundrb.yml on production servers.
  • When you are processing too many tasks from rails, you should use the inbuilt thread pool rather than firing up new workers.
  • BackgrounDRb needs Ruby >= 1.8.5.
  • When you are starting a worker using
     MiddleMan.new_worker() 

    from rails and using a job_key to start the worker (you must use unique job keys anyway if you want more than one instance of the same worker running at the same time), you must always access that instance of the worker with the same job key. That is, all MiddleMan methods that invoke a method on that worker instance must carry job_key as a parameter. For example:

       session[:job_key] = MiddleMan.new_worker(:worker => :fibonacci_worker, :job_key => 'the_key', :data => params[:input])
       MiddleMan.send_request(:worker => :fibonacci_worker, :worker_method => :do_work, :data => params[:input],:job_key => session[:job_key])

    Omitting the job_key in subsequent calls is an error if your worker was started with a job_key.
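
To see why passing ids instead of whole records matters, here is a rough sketch that compares serialized sizes. It assumes that worker arguments get serialized on their way to the worker and uses Marshal as a stand-in for that step; FakeReport is a hypothetical Struct playing the role of a large ActiveRecord object.

```ruby
# Hypothetical stand-in for a heavyweight ActiveRecord object.
FakeReport = Struct.new(:id, :title, :body)

# Size in bytes of the serialized form of any object.
def payload_size(obj)
  Marshal.dump(obj).bytesize
end

REPORT = FakeReport.new(42, "Quarterly numbers", "x" * 100_000)

puts "whole record: #{payload_size(REPORT)} bytes"
puts "just the id:  #{payload_size(REPORT.id)} bytes"
```

The worker can then load the record itself from the id it receives, which also guarantees it sees fresh data rather than a stale copy.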

Posted in : Uncategorized | 5 comments

It’s a no-brainer in India that you should buy your favorite notebook from the US if possible. The usual way is: ask your friends, family and distant relatives, beg them, and if they are kind enough, you will get a good laptop at a bargain. But for most of us, that’s not possible.

So, I did a comparison between the cost of the same notebook models in India and the US, and here are the results.

  • Dell: I dunno about you, but I like the XPS M1330. It’s really an awesome notebook. The starting price in the US for this model is $1000 and in India it’s about 52,000 rupees. Assuming $1 = 40 Rs, you have a price difference of about 12,000 rupees, so the notebook is costlier by 300 dollars in India. Not bad, I would say.
  • Apple Macbook: I will leave the Macbook Air for you to figure out. But the simple white Macbook costs around 59,000 Rs in India, while in the US the same model costs 1100 dollars. Again assuming the same conversion rate, the Macbook is costlier by 15,000 rupees, or about 375 dollars, in India. Again, a pretty good deal. The price difference used to be really high for Apple products, but Apple seems to have bridged the gap.
  • Thinkpad T61: Well, Thinkpads are Thinkpads, and the T61 and X61 are generally among the best notebooks out there. But wait a minute: the entry level T61 costs a mind boggling 90,000 rupees (2250 dollars) in India, while in the US it costs 984 dollars. So if you are buying this notebook in India, you are paying 1266 dollars, or about 51,000 rupees, more. Apple is selling the Macbook Air for 96,000 rupees in India and the T61 is 90,000 rupees. Somebody should slap these bastards at Lenovo. It pisses me off badly. I can’t imagine one single reason why Thinkpad models are so overpriced in India.
  • EEE PC: The EEE PC costs around 349 dollars in the US, and in India it was launched at a price of 18,000 rupees. At the same conversion rate, that is a price difference of about 4,000 rupees.
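
The arithmetic behind the comparison above can be reproduced with a few lines of Ruby. The prices and the rate of 40 rupees per dollar are the ones quoted in the list; the constant and method names are made up for this sketch.

```ruby
RUPEES_PER_DOLLAR = 40 # conversion rate assumed in the post

# US price in dollars and Indian price in rupees, as quoted above.
PRICES = {
  "Dell XPS M1330"  => { :usd => 1000, :inr => 52_000 },
  "Macbook (white)" => { :usd => 1100, :inr => 59_000 },
  "Thinkpad T61"    => { :usd => 984,  :inr => 90_000 },
  "EEE PC"          => { :usd => 349,  :inr => 18_000 }
}

# How much more the Indian price is, in rupees and in dollars.
def premium(usd, inr, rate = RUPEES_PER_DOLLAR)
  extra_inr = inr - usd * rate
  { :inr => extra_inr, :usd => extra_inr / rate.to_f }
end

PRICES.each do |model, price|
  extra = premium(price[:usd], price[:inr])
  puts format("%-16s +%d Rs (about $%d)", model, extra[:inr], extra[:usd].round)
end
```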

Well, that’s about it. I hope Lenovo will learn and make Thinkpads cheaper.