Parallel threads are still a myth in Ruby


In the beginning there was a giant lock. It wrapped all your requests and let only one run at a time. Then it was removed in Rails 2.2; there was much rejoicing, and some naysayers too. The world moved on, and no one knew how many people were actually using the new feature. A Passenger app instance kept processing only one request at a time anyway. We did our bit of chest thumping and forgot what good old m.on.key said.

But the debate seems to have been revived, with Rubinius planning to remove the GIL, JRuby already running threads in parallel, and finally MRI running native threads. On a more serious note, I have been working with Ruby professionally for more than 4 years, and I am astonished at anyone who is planning to run Ruby in a parallel environment. Igvita brought up the issue in his post (http://www.igvita.com/2008/11/13/concurrency-is-a-myth-in-ruby/), but Yehuda disagrees.

The problem isn’t whether the underlying platform supports parallel native threads; the problem is whether the Ruby ecosystem supports them.

require "thread"
 
class Foo
  def hello
    puts "Original hello"
  end
end
 
threads = []
threads << Thread.new do
  sleep(2)
  Foo.class_eval do
    def hello
      puts "Hello from modified thread #1 "
    end
  end
end
 
threads << Thread.new do
  a = Foo.new()
  a.hello
  sleep(3)
  puts "After sleeping"
  a.hello
  b = Foo.new()
  b.hello
end
 
threads.each { |t| t.join }

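The code above runs as expected: the second thread prints the original hello first and, after sleeping, the modified hello from both the old instance and a new one. But AFAIK the Ruby runtime offers no guarantees about the visibility of modifications to class objects across threads. Let’s take another example: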

def synchronize(*methods)
  options = methods.extract_options!
  unless options.is_a?(Hash) && with = options[:with]
    raise ArgumentError, "Synchronization needs a mutex. Supply an options hash with a :with key as the last argument (e.g. synchronize :hello, :with => :@mutex)."
  end

  methods.each do |method|
    aliased_method, punctuation = method.to_s.sub(/([?!=])$/, ''), $1

    if method_defined?("#{aliased_method}_without_synchronization#{punctuation}")
      raise ArgumentError, "#{method} is already synchronized. Double synchronization is not currently supported."
    end

    module_eval(<<-EOS, __FILE__, __LINE__ + 1)
      def #{aliased_method}_with_synchronization#{punctuation}(*args, &block)
        #{with}.synchronize do
          #{aliased_method}_without_synchronization#{punctuation}(*args, &block)
        end
      end
    EOS

    alias_method_chain method, :synchronization
  end
end

The idea is that you don’t need to open a synchronize block inside your method if you want the entire method body wrapped in one; instead, you can declare the whole contract at class level. Neat!
Except not. How does it work when classes are being reloaded in development mode? Will this metaprogrammatically created synchronization hold? The answer will vary from one Ruby implementation to another. In a parallel environment, it will not.
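
To make the class-level contract concrete without pulling in ActiveSupport, here is a minimal sketch of the same idea in plain Ruby. The names SimpleSynchronize, synchronize_method and Counter are hypothetical, and the wrapper uses define_method instead of alias_method_chain, so treat it as an illustration rather than the Rails implementation.

```ruby
# Hypothetical stand-in for the class-level macro; not the ActiveSupport code.
module SimpleSynchronize
  # Wraps an existing instance method so its body runs while holding the
  # mutex stored in the given instance variable (e.g. :@mutex).
  def synchronize_method(name, mutex_ivar)
    original = instance_method(name)
    define_method(name) do |*args, &block|
      instance_variable_get(mutex_ivar).synchronize do
        original.bind(self).call(*args, &block)
      end
    end
  end
end

class Counter
  extend SimpleSynchronize

  attr_reader :count

  def initialize
    @mutex = Mutex.new
    @count = 0
  end

  def increment
    current = @count
    Thread.pass            # invite a context switch between read and write
    @count = current + 1
  end
  synchronize_method :increment, :@mutex
end

counter = Counter.new
threads = Array.new(10) { Thread.new { 100.times { counter.increment } } }
threads.each(&:join)
puts counter.count
```

Note that the wrapper exists only as a modification to the class object. Every thread has to observe that modification for the lock to be taken at all, which is exactly the guarantee in question.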

The problem I am driving at is that Ruby’s memory model makes no guarantees about class state. It’s a problem no one talks about.

Let’s take another example of “thread safe” code, from the ActiveRecord connection pool (http://github.com/rails/rails/blob/master/activerecord/lib/active_record/connection_adapters/abstract/connection_pool.rb).
Anyone reading the code closely can find a few threading problems in it. For example, “@reserved_connections” seems to be read and written by multiple threads without any synchronization. It may work fine in Ruby 1.8 and 1.9, but the code is definitely not thread safe if multiple threads are allowed to run in parallel. A solution would perhaps be to use a concurrent hash (http://stackoverflow.com/questions/1080993/pure-ruby-concurrent-hash).
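
As a rough sketch of what such a hash could look like, the wrapper below guards every access with a Monitor from the standard library. SynchronizedHash and fetch_or_store are hypothetical names, not part of Ruby or Rails, and the integer keys stand in for whatever per-thread id the pool would use.

```ruby
require "monitor"

# Hypothetical wrapper: every read and write goes through one lock.
class SynchronizedHash
  def initialize
    @hash = {}
    @lock = Monitor.new
  end

  def [](key)
    @lock.synchronize { @hash[key] }
  end

  def []=(key, value)
    @lock.synchronize { @hash[key] = value }
  end

  def delete(key)
    @lock.synchronize { @hash.delete(key) }
  end

  # Check-then-insert under a single lock acquisition, so two threads
  # cannot both decide that the key is missing.
  def fetch_or_store(key)
    @lock.synchronize do
      @hash.fetch(key) { @hash[key] = yield }
    end
  end

  def size
    @lock.synchronize { @hash.size }
  end
end

reserved = SynchronizedHash.new
threads = Array.new(8) do |i|
  # i is a stand-in for a per-thread connection id
  Thread.new { reserved.fetch_or_store(i) { "connection-#{i}" } }
end
threads.each(&:join)
puts reserved.size
```

A single coarse lock is the simplest choice and serializes all access; it trades throughput for being easy to reason about.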

Where are we going with this? In my opinion, Ruby isn’t ready for parallel threads until the runtime makes guarantees about class state and module state, the core threading primitives (Queue, Monitor, MonitorMixin, ConditionVariable) get exhaustively tested, and concurrent collections are added. In my professional experience, I have had several problems with Ruby’s default primitives. I once listened to a talk by ThoughtWorks folks who were running Rails on JRuby for Mingle, with multiple threads in a single VM; their problems were simply too hard to catch and too hard to debug.

In the meantime, the Ruby community should focus on getting evented and cooperative threading right. Fiber is a good start.
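
For readers who have not used Fiber, here is a small sketch of cooperative scheduling on a single thread. The worker names and the naive round-robin loop are made up for illustration; the point is that control changes hands only at explicit Fiber.yield calls, so no locks are needed around the shared log.

```ruby
log = []

# Two workers that share one thread and one array.
workers = %w[a b].map do |name|
  Fiber.new do
    3.times do |step|
      log << "#{name}#{step}"
      Fiber.yield          # hand control back to the scheduler loop
    end
    :done                  # value returned by the final resume
  end
end

# Naive round-robin scheduler: resume each worker in turn and drop the
# ones whose block has finished.
until workers.empty?
  workers.reject! { |fiber| fiber.resume == :done }
end

puts log.inspect
```

The interleaving here is fully deterministic, which is a large part of why this style is easier to debug than preemptive threads.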

Thanks to the folks in #ruby-pro for reviewing the post, especially James Tucker (raggi).
