The guilty man speaks (of Mac OS X)

I have been a GNU/Linux user for almost 7 years (not dual booting; an exclusive user). Recently I switched jobs (Castle Rock Research, a great place to work, btw) and ended up with a MacBook Pro in the bargain.

You can imagine that as a Linux user I have used literally everything that's out there (GNOME, KDE, Xmonad, Awesome, WindowMaker, StumpWM). My previous notebook ran KDE 4.3.2 and I was pretty happy with it, overall.

Since I started using the MacBook, my initial days have been all pain, toil and fumbling in the dark. Man, oh man. The biggest confusion is that many well-known applications (such as NetBeans) use the Command key in place of the Control key on OS X. But this behavior is not uniform (there is no rule; that's just my flimsy theory from adapting to the OS X environment). So for copy you press Command-C, but for some other shortcut you may still have to use the Control key.

Then there is Emacs. The first question you will ask yourself as an Emacs user on Mac OS X is: what should be the Meta key, Alt or Command? Well, for me, both. Needless to say, Emacs will block some of the global OS X shortcuts that use the Command key once you start using Command as the Meta key in Emacs.

What do I miss most from Linux?

  • apt-get.
  • Having a true POSIX environment that is supported by open source folks. Now, hold on. Don't jump the gun and say OS X is POSIX. I know that already, but as I discovered painfully, Snow Leopard in particular has broken many open source libraries and tools (Emacs 23, rb-gsl, RMagick and many more). It was painful to see Apple fans blaming open source developers for the breakage instead of coming up with a fix.
  • KDE 4.3. I really loved KDE 4.3, especially the multi-screen handling in 4.3.2. I could move a window between screens, move focus between screens, and set a window to start on a particular screen, all without touching the mouse. It's way ahead of what's in Mac OS X (which is essentially click and drag). I miss vertical maximize. I miss the window-specific shortcuts I used to have (and yeah, Quicksilver can fix some of that, but it's nice to have all these features right out of the box).
  • I miss Qt and GTK+. I was decent with GTK+, I think. When fancy took me, I could whip up utility applications in GTK+. I have new ropes to learn now.
  • More importantly, I understood the system inside out (or at least a little more than the basics).

What did I gain?

  • Applications. CRRC uses some specific tools for communication, which simply won’t work on Linux.
  • No need to hibernate, and hence fast resume.
  • Better battery life.

I certainly wouldn’t add “ease of use” or crap like that here. If the applications I need would run on Linux, I would format this thing and go back to Linux.

Constructor memory leaks

Yesterday I fixed a nasty memory leak in our PushServer, which is written in Scala. As happens with all memory leaks, the application was running smoothly in test mode, and just when I deployed it to production it bombed within hours of launching. The problem was with the following bit of code:

class Quote(dataHash: Map[Int, Any], oldQuote: Quote) extends AmtdStockData {
  val symbol: String = dataHash.getOrElse(0, oldQuote.symbol).asInstanceOf[String]
  val bid: Float = dataHash.getOrElse(1, oldQuote.bid).asInstanceOf[Float]
  val ask: Float = ...
}

The code above defines the Quote class. Its constructor takes a data Map and the old quote for the same symbol as arguments. The class populates its fields from the data Map, but if a certain field is not found in the Map, it uses the field's value from oldQuote. The Quote class basically works around a quirk of our data providers, which send only the updated fields for a Level 1 quote.

You have probably guessed the problem by now: the memory held by old quotes is never freed, because every newer quote keeps a reference to the old one.

I can probably say that similar logic wouldn't have created problems in Java or C++, but since constructor arguments are automatically turned into class fields in Scala, we have a problem. The ability to pass arguments while defining a class and use them as a constructor is a life saver: it avoids the placeholder field declarations that are later populated through a constructor or another method call, as done in Java/C++.

But apparently, in this particular case I needed the old ugly way back, and hence the new code is:

class Quote extends AmtdStockData {
  var symbol: String = _
  var bid: Float = _
  var ask: Float = _
  def populateValues(dataHash: Map[Int, Any], oldQuote: Quote) = {
    // oldQuote is now only a method argument, so no field is retained for it
    symbol = dataHash.getOrElse(0, oldQuote.symbol).asInstanceOf[String]
    bid = dataHash.getOrElse(1, oldQuote.bid).asInstanceOf[Float]
    // ...remaining fields follow the same pattern
  }
}

The above class design solves the problem at the expense of using “var”. It's not ideal, and I am scouting for ideas on whether the class can be designed in any other way!

Update: It turns out the compiler usually doesn't emit a field for a constructor parameter unless the parameter is accessed by an instance method or inner class. In the above case, getOrElse creates an anonymous inner class (its default argument is passed by name), so the compiler emits a field for oldQuote and it is never garbage collected.

Autotest and add_exception method

Stuff floating around the intrawebs about adding more files to ignore via “add_exception” may not work, because by the time the “run” hook gets a chance to run, the regexp of files to ignore is already compiled. The alternative is to use the “initialize” hook, like this:

Autotest.add_hook :initialize do |at|
  at.add_exception(/^(coverage|\.git)/)
end

This is with ZenTest 3.11.0, and YMMV.

Amarok2 and redefinition of awesomeness

Disclaimer: Fully aware that this post will bring no change to Amarok or KDE, I am setting out to write it nonetheless. I know I should be more constructive, but this is all I have to spare right now.

With the disclaimer out of the way: I used to be a KDE user and a devoted Amarok user. Even when fancy took me to run GNOME, I kept running Amarok faithfully. Nothing unique; many GNOME users do the same.

Amarok2 has been criticised, and the critics attacked fittingly, starting with a huge “fuck you” in the clouds (the punctuation is mine, the text theirs) and a much politer version (http://amarok.kde.org/en/node/604).

However, I am not entirely convinced that Amarok 1.4 was a sinking boat of feature creep and a junkyard of code. At least it wasn't trying to be a dysfunctional video player like Banshee, right? 😉

Anyway, I care about very few features, and my only complaint in the feature department is that Amarok2 has hidden some quick actions, such as “repeat single track”, inside deep menus. Apart from this snag, Amarok2 and I get along quite well when it comes to features.

Now, coming to the main point.

  • UI: We programmers sometimes get so blinded by our creation that we see nothing past it. The plasma widget in the center is an exact case of that. In a media player, meta information about the music is not more important than the music itself; it is a nice-to-have and shouldn't protrude the way it does in Amarok2. I do like lyrics, album info and other semantic information, but the main thing I do with my media player is drag songs from the collection and drop them in the playlist. Most of the time it's just minimized to the system tray. I do not run a music player to impress chicks. Besides, that widget poses significant challenges to the entire layout. In the beta and RC releases, the damn thing rarely scaled with the rest of the window. In the final version it's better, and yet the screenshot above is living proof of its fragility.
  • Scattered playlist controls: Okay, even Amarok 1.4 wasn't a god at this, but it was better: most of the playlist controls were reasonably close together. Contrast this with Amarok2, where the playlist search is in one corner, play/pause/stop in another, and clear playlist/add to playlist somewhere else. Worse, some of the controls are not on the screen or in the context menu at all, and are only accessible from the main application menu.

  • Amarok bug tracker: You come along as a well-meaning user wanting to submit a bug report for Amarok, and I would be quite surprised if you could find a link to the bug tracker on http://amarok.kde.org/. But then bug tracking has traditionally been a no-joy affair in open source, because of the monstrosity called Bugzilla. In this post (http://aseigo.blogspot.com/2009/01/building-community-around-your-foss.html) Aaron Seigo talks about lowering the barrier to entry. I have worked with Bugzilla in the past, and the user/submitter side of the pain is nothing compared to what the developer goes through. Contrast this with the bliss Rails developers are enjoying (http://rails.lighthouseapp.com/dashboard). Why can't this be improved?

Detect if daemon is really running

You know the story: your daemons die unexpectedly (mongrel, thin, BackgrounDRb), and on restart they complain about an existing pid file (and assume the daemon is still running). For BackgrounDRb we solved this irksome problem in the following way:

def really_running? pid
  begin
    Process.kill(0,pid)
    true
  rescue Errno::ESRCH
    puts "pid file exists but process doesn't seem to be running, restarting now"
    false
  end
end
 
def try_restart
  pid = File.open(PID_FILE, "r") { |pid_handle| pid_handle.gets.to_i }
  if really_running? pid
    puts "pid file already exists, exiting..."
    exit(-1)
  end
end

Simple: try sending signal “0”, and if the process responds to it, it's alive; otherwise the pid file is stale and the daemon can be safely restarted. There are many other ways to solve this problem, for example grepping the output of “ps aux”.
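The “ps” route mentioned above could look something like this, a minimal sketch (the method name `running_via_ps?` is mine, not part of BackgrounDRb):

```ruby
# Hypothetical sketch of the "grep the ps output" alternative: ask ps to print
# just the pid column for this pid; an empty result means the process is gone
# and the pid file can be treated as stale.
def running_via_ps?(pid)
  `ps -p #{pid} -o pid=`.strip == pid.to_s
end
```

Signal 0 remains the cheaper and more portable check; shelling out to ps is mostly useful when you also want to inspect the command line of whatever owns the pid.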

Rails debugging wormhole

Since we started scaling to multi-machine clustering, our Rails application had been showing one weird problem. We generate stock market technical charts and cache them in memcached (with a 1-hour TTL). During testing we found that if a request got routed to the mongrel cluster running on the other machine, the charts didn't appear, which was weird because we had the memcached cluster configured properly in the “production.rb” file:

memcache_options = {
  :c_threshold => 10_000,
  :compression => true,
  :debug => false,
  :namespace => 'foobar',
  :readonly => false,
  :urlencode => false
}
CACHE = MemCache.new(memcache_options)
CACHE.servers = SIMPLE_CONFIG_FILE["memcache_servers"]

Here SIMPLE_CONFIG_FILE['memcache_servers'] contains the list of memcached servers participating in the cluster. After debugging for a few hours (*gasp*) and turning on verbose logging on all participating memcached servers, I found that trusty CacheFu was replacing the CACHE constant with the following code:

silence_warnings do
  Object.const_set :CACHE, memcache_klass.new(config)
  Object.const_set :SESSION_CACHE, memcache_klass.new(config) if config[:session_servers]
end
 
CACHE.servers = Array(config.delete(:servers))
SESSION_CACHE.servers = Array(config[:session_servers]) if config[:session_servers]

Now this deal is real (and it sucks). After a couple of minutes of hacking, I took the cache_fu config file out of svn and wrote code to generate it on the fly during deployment. Now pigs can fly.
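The generation step can be sketched roughly like this (the YAML layout and key names here are an assumption for illustration, not cache_fu's exact schema):

```ruby
require "yaml"
require "fileutils"

# Illustrative deploy-time step: regenerate the cache_fu config from the same
# shared server list every machine uses, instead of keeping a copy in svn.
# In the real setup the server list would come from SIMPLE_CONFIG_FILE.
servers = ["10.0.0.1:11211", "10.0.0.2:11211"]

config = {
  "production" => {
    "namespace" => "foobar",
    "servers"   => servers
  }
}

FileUtils.mkdir_p("config")
File.write("config/memcached.yml", config.to_yaml)
```

Since every mongrel box runs the same deploy script against the same server list, CacheFu's const_set dance ends up pointing at the same cluster everywhere.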

Screencasting from Linux/Ubuntu

Sorry for yet another do-this, do-that screencasting-on-Linux post. Well, most of the tools out there simply don't work (yeah, I have tried plenty of them).

Finally, I have settled on ffmpeg, which works perfectly well for screencasts with both audio and video.

Here are step-by-step instructions on how to make screencasts on Linux (Ubuntu 8.04 in my case, but you get the idea) using ffmpeg.

  • Get the ffmpeg dependencies:
    sudo aptitude install build-essential subversion zlib1g-dev \
      checkinstall libgpac-dev libfaad-dev libfaac-dev liblame-dev \
      libtheora-dev libvorbis-dev gpac
  • Get the X11 dependencies:
    sudo aptitude install libx11-dev xlibs-static-dev \
      x11proto-input-dev
  • Get ffmpeg code from svn, compile and install:
    svn checkout svn://svn.mplayerhq.hu/ffmpeg/trunk ffmpeg
    cd ffmpeg
    ./configure --prefix=/opt/ffmpeg --enable-gpl --enable-postproc \
      --enable-libvorbis --enable-libtheora --disable-debug \
      --enable-libmp3lame --enable-libfaad --enable-libfaac \
      --enable-pthreads --enable-x11grab
    make
    sudo make install
  • Record high quality screencasts with:
    ffmpeg -f oss -i /dev/dsp -f x11grab -s 1024x768 -r ntsc \
      -sameq -i :0.0 foo.avi

Making Ruby Bacon play with Mocha

This post is not about pork and coffee. So steer clear if Google has landed you here thinking I am going to describe some sort of recipe for making nice mocha coffee with chunky bacon.

It's about using a shiny new testing library called Bacon, by Chris. Mocha is, of course, the venerable mocking library for Ruby and Rails. Here is a tiny bit of code that will get you started with Bacon and Mocha:

require "rubygems"
require "bacon"
require "mocha/standalone"
require "mocha/object"
class Bacon::Context
  include Mocha::Standalone
  alias_method :old_it,:it
  def it description,&block
    mocha_setup
    old_it(description,&block)
    mocha_verify
    mocha_teardown
  end
end

That's it. Happy baking.

Unthreaded threads of hobbiton

Update: With the 1.0.4 release, this method has been removed. It was introduced as a workaround for the thread-unsafe register_status, but it's no longer required, since result caching is threadsafe anyway in this version.

You know the story too well: in your BackgrounDRb worker, you want to run 10 tasks concurrently using the thread pool, collect the results in an instance variable and return them. Now, threads are funny little beasts, and the simplest of things can easily get out of hand. For example, one of BackgrounDRb's users wrote something like this:

pages = Array.new
pages_to_scrape.each do |url|
  thread_pool.defer(url) do |url|
    begin
      # model object performs the scraping
      page = ScrapedPage.new(url)
      pages << page
    rescue
      logger.info "page scrape failed"
    end
  end
end
return pages

There are many things wrong with the above code, and one of them is that it modifies a shared variable without acquiring a lock. Remember, anything inside thread_pool.defer happens in a separate thread and hence should be thread safe.
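To make the fix concrete outside of BackgrounDRb, here is the same pattern with plain Ruby threads instead of thread_pool.defer; `scrape` is a stand-in for the ScrapedPage model, and the shared array is guarded by a Mutex:

```ruby
require "thread"

# Stand-in for the real ScrapedPage model object.
def scrape(url)
  "scraped:#{url}"
end

urls  = ["http://a.example", "http://b.example", "http://c.example"]
pages = []
mutex = Mutex.new

threads = urls.map do |url|
  Thread.new do
    page = scrape(url)
    # Array#<< on a shared array is not guaranteed to be atomic across
    # threads, so the mutation happens under a lock.
    mutex.synchronize { pages << page }
  end
end
threads.each(&:join)
```

The lock is held only for the append, so the scraping itself still runs in parallel.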

Another use case would be: say you don't want to spawn gazillions of workers; rather, within one worker you want to process requests from n users and save the results back using “register_status” with user_id as the identifier. thread_pool.defer is for fire-and-forget kinds of jobs, and using register_status within the block supplied to thread_pool.defer is dangerous.

Enter thread_pool.fetch_parallely(args, request_proc, response_proc). Let's have an example going. Inside one of your workers:

  def barbar args
    request_proc = lambda do |foo|
      sleep(2)
      "Hello #{foo}"
    end
 
    callback = lambda do |result|
      register_status(result)
    end
    thread_pool.fetch_parallely(args,request_proc,callback)
  end

The first argument to fetch_parallely is a data argument that is simply passed on to request_proc. Note that this is necessary: you should not rely on the closure capturing the local scope, because within threads that could be dangerous. The return value of request_proc is then passed as an argument to the callback proc, once request_proc finishes execution and is ready with the result.

The difference here is that although request_proc runs in a new thread, the callback proc gets executed in the main thread and hence need not be thread safe. You can do pretty much anything you want within the callback proc.

This is available in the git version of BackgrounDRb. Here is a link on how to install the git version of BackgrounDRb:

http://gnufied.org/2008/05/21/bleeding-edge-version-of-backgroundrb-for-better-memory-usage/

Your Comet Server In Scala

I thought it would be cool to display real-time streaming stock market ticks in our marketsimplified application.
Small Watchlist

I needed a wee bit of Flash code that opens an XML socket to our comet server, accepts data, and invokes the corresponding JavaScript function in the browser to display the streaming data. I took the Flash code from the Juggernaut project and compiled it to a swf. Next, we needed a comet server. The initial version I wrote in Ruby using EventMachine. It was pretty darn good: you could plug in and stream as many types of data as you wanted; you just needed to inherit a class and you were done. But it was slow and unable to handle the influx of stock market ticks. We had to restart the damn comet server every day.

So I proceeded to port the comet server to Scala. Before choosing Scala, I played with Erlang and D. My attempts at learning Erlang and using it for our comet server were serious. I gave up on Erlang mostly because:

  • String handling, and the near absence of it. Yes, sure, one can write a recursive descent parser, but for small string manipulation it's overkill. Our internal data exchange protocol is line oriented, and the parsers that I wrote used string manipulation.
  • I don't know if many will agree, but Erlang is hard. Yeah, probably not for simple “hello world” applications, but on the whole you need to turn your head quite a bit. I wasn't sure my colleagues would buy into it.
  • Even if you ignore the above two criteria, I believe Erlang needs quite a different ecosystem. Our application is already built around open source technologies such as MySQL, memcached, Rails, Hibernate, Ruby, Python, Nginx and Mongrel, and is controlled by YAML files. Hunting down Erlang libraries for MySQL, memcached, YAML and JSON, and making them work, seemed like too much work.

Anyways, that was my decision, so get over it. I briefly flirted with D, but I wasn't very happy with the libraries, their installation procedure, or their APIs. I had started with Scala long ago, playing with it now and then, and after my Erlang tryst fizzled out I decided to look into Scala seriously. There were no decent network programming libraries for Scala. Sure, I could have used Mina, but I wanted a more Scala-esque library that would help me translate my EventMachine code to Scala, and hence I wrote Eventfax. (The code in the public svn is a bit stale; our corporate svn has the latest code, which I will publish soon.)

For example, an EchoServer using the Eventfax library looks like:

import scala.actors._
import scala.actors.Actor._
import java.io.IOException
 
import eventfax.core._
import eventfax.protocol._
 
// define a reactor core
class EchoReactor(starterBlock: ReactorCore => Unit) extends ReactorCore(starterBlock) {
  def check_for_actor_messages = {}
}
 
// define a factory class
object EchoServerFactory extends ConnectionFactory {
  def create_connection(): Connection = {
    new EchoServer()
  }
}
 
class EchoServer extends Connection {
  val buftok: LineProtocol = new LineProtocol("[\n]")
  val masterActor: EchoReactor = reactor.asInstanceOf[EchoReactor]
  def receive_data(p_data: FaxData) = {
    buftok.extract(p_data)
    buftok.foreach(str_data => {
      dispatch_request(str_data)
    })
  }
 
  def dispatch_request(str_data: String) = {
    val t_str = str_data.trim()
    if(!t_str.isEmpty)
      send_data(str_data+"\n")
  }
 
  def on_write_error() = {
    try { close_connection }
    catch { case ex: IOException => println("error")}
  }
  def unbind() = {
  }
}
 
object EchoServerRunner {
  def main(args: Array[String]) = {
    new EchoReactor(t_reactor => {
      t_reactor.start_server("localhost",8700,EchoServerFactory)
    })
  }
}

Also, I had to port our protocol parsing libraries to Scala, which proved trivial. After a week of effort, our watchlist is powered by a comet server written in Scala. I used Scala Specs for testing and buildr for compiling and packaging.

I am yet to get buildr working properly with Scala Specs, but I will perhaps look into that later. Getting memcached, MySQL, YAML or JSON working with Scala was trivial. The JSON parser in Scala's standard library, though, is one can full of worms: every new Scala release seems to introduce new bugs in it, and I had to spend quite some time getting around them.