One of the most common claims made about Ruby over the last few years is that it is a very slow programming language. In this short article, I would like to address Ruby’s performance, explain where this negative opinion came from, and show why it no longer holds.
People who know Ruby have probably often been asked by colleagues without Ruby experience whether it really is slow. Personally, I usually answer that it depends, but that in most cases the claim is no longer true.
What is causing this negative opinion?
The truth is that Ruby before version 1.9 was fairly slow. This was mainly because the team developing Ruby was more focused on extending the language's functionality than on improving its performance. Everything changed when Koichi Sasada wrote a new Ruby interpreter called YARV (Yet Another Ruby VM), which was officially included in release 1.9. The performance gain was significant: some tests showed the new interpreter to be up to four times faster than the previous one. This was a milestone that took Ruby to the next level.
The next significant performance change came in version 2.0, which made the garbage collector friendly to Copy-on-Write (CoW). With CoW, the memory of the parent process is not fully copied during a fork() call but is shared with the child process; new memory is allocated only when one of the processes modifies a shared page.
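The effect of fork() can be sketched in a few lines. This is only an illustration under some assumptions: it needs a platform that supports fork (so not Windows), the helper name fork_and_modify is made up for this example, and the page sharing itself happens in the operating system, so the script can only show the visible result, namely that a write in the child does not reach the parent.

```ruby
# Hypothetical helper: forks a child that overwrites the first element of
# +data+ and reports the value it sees back to the parent through a pipe.
def fork_and_modify(data)
  reader, writer = IO.pipe

  pid = fork do
    reader.close
    data[0] = -1        # the write makes the OS copy the touched page for the child
    writer.puts data[0]
    writer.close
    exit!(0)            # leave the child immediately, skipping at_exit handlers
  end

  writer.close
  child_value = reader.gets.to_i
  reader.close
  Process.wait(pid)
  child_value
end

data = Array.new(100_000) { |i| i }
child_value = fork_and_modify(data)

puts "child saw:  #{child_value}"   # -1
puts "parent has: #{data[0]}"       # 0, the parent's memory is untouched
```

Until the child writes to the array, both processes can read the same physical pages, which is why forking servers benefit when the interpreter avoids touching shared memory unnecessarily.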
Release 2.1 brought the next performance improvement, this time through a new approach to memory management. The developers observed that newly created objects in Ruby code are very often short-lived, and this motivated the change from the existing GC, based on the Mark & Sweep algorithm, to a so-called generational GC.
One day a Rubyist came to Koichi and said, “I understand how to improve CRuby’s performance. We must use a generational garbage collector.”
A generational GC splits the heap into two generations: young and old. New objects are placed in the young generation and stay there for a specific number of GC passes. Objects that survive that number of passes are promoted to the old generation, and most passes then scan only the young generation. Of course, this approach does not release all unused memory on every pass, and it still sometimes requires a full memory scan. However, it significantly reduces the time the GC needs to reclaim memory.
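The split between cheap young-generation passes and full scans can be observed from Ruby itself. The following is a minimal sketch, assuming Ruby 2.1 or newer, where GC.stat exposes the :minor_gc_count and :major_gc_count counters; the helper name gc_runs_during is invented for this example.

```ruby
# Hypothetical helper: counts the minor (young generation only) and major
# (full scan) GC runs that happen while the given block executes.
def gc_runs_during
  before = GC.stat
  yield
  after = GC.stat
  {
    minor: after[:minor_gc_count] - before[:minor_gc_count],
    major: after[:major_gc_count] - before[:major_gc_count]
  }
end

# Create many objects that become garbage right away.
runs = gc_runs_during do
  1_000_000.times { "short-lived string" + "!" }
end

puts "minor GC runs: #{runs[:minor]}, major GC runs: #{runs[:major]}"
```

The exact numbers depend on the Ruby version and GC settings, but for a workload like this one minor runs would normally be expected to dominate.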
Release 2.2 brought further changes to memory management with the implementation of incremental garbage collection. In incremental GC, the memory scanning process is split into smaller steps instead of one long scan. Execution of Ruby code is therefore not stopped for the whole duration of the scan, which makes applications much more responsive. These improvements led the Ruby on Rails team to support only Ruby 2.2+ in Rails 5.
Rails 5.0 will target Ruby 2.2+ exclusively. There are a bunch of optimizations coming in Ruby 2.2 that are going to be very nice, but most importantly for Rails, symbols are going to be garbage collected. This means we can shed a lot of weight related to juggling strings when we accept input from the outside world. It also means that we can convert fully to keyword arguments and all the other good stuff from the latest Ruby.
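The symbol collection mentioned in the quote can be checked with a short experiment. This is a rough sketch, assuming Ruby 2.2 or newer; the helper name symbols_retained_after_gc is made up here, and the exact count may vary slightly because the collector scans the stack conservatively.

```ruby
# Hypothetical helper: creates +count+ symbols from strings at runtime,
# forces a GC run, and returns how far the symbol table grew overall.
def symbols_retained_after_gc(count)
  before = Symbol.all_symbols.size
  count.times { |i| "tmp_symbol_#{i}".to_sym }
  GC.start
  Symbol.all_symbols.size - before
end

created  = 50_000
retained = symbols_retained_after_gc(created)
puts "created #{created} dynamic symbols, symbol table grew by #{retained}"
```

On Ruby 2.1 and earlier the table would grow by the full number of created symbols, which is why converting user input to symbols used to be a memory risk.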
The latest Ruby release is a bundle of improvements to sockets, I/O operations and CGI.escapeHTML, which, for example, have an immediate impact on the time Rails needs to parse erb files.
Reviewing these changes in recent Ruby releases, we can see that performance improvements were often related not to the Ruby interpreter itself but to how Ruby manages memory. Why? Because Ruby is not inherently a slow language. Since the 1.9 release, Ruby's performance has been on the same level as other dynamic programming languages such as Python, Perl or PHP, and the problems it faces are related to memory management and the time the GC needs to free memory.
Let’s run a simple performance test!
To visualize how GC changes across interpreter versions affect performance, the simple test below measures the execution time of a loop that allocates memory.
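    store = Array.new
    start = Time.new

    # Allocate in a tight loop so the garbage collector has work to do.
    20_000_000.times do |count|
      sum = count + Time.now.to_i
      store[count] = sum
    end

    p Time.new - start  # elapsed time in seconds
    p GC.count          # number of GC runs so far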
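A run with the collector switched off can be sketched as follows. This is an assumption about how such a measurement might be set up, not the exact script used for the results: the helper name measure_allocation and its parameters are hypothetical, and the keyword argument requires Ruby 2.0 or newer.

```ruby
require 'benchmark'

# Hypothetical helper: runs the allocation loop and returns
# [elapsed_seconds, gc_runs], optionally with the GC switched off.
def measure_allocation(iterations, gc_enabled: true)
  GC.start                          # start every run from a clean heap
  GC.disable unless gc_enabled
  gc_before = GC.count
  store = []

  elapsed = Benchmark.realtime do
    iterations.times do |count|
      store[count] = count + Time.now.to_i
    end
  end

  [elapsed, GC.count - gc_before]
ensure
  GC.enable                         # never leave the GC disabled afterwards
end

[true, false].each do |enabled|
  seconds, gc_runs = measure_allocation(1_000_000, gc_enabled: enabled)
  puts "GC enabled: #{enabled}  time: #{seconds.round(3)}s  GC runs: #{gc_runs}"
end
```

The iteration count here is deliberately smaller than in the test above, because with the GC disabled memory is never reclaimed and a long loop can use a large amount of RAM.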
To limit the influence of external processes on the results, each test was executed a couple of times. The table below presents the results of our test.
| # | Version | GC enabled | GC disabled | GC count |
From this simple test, we can draw a few interesting conclusions. First and most important, with default settings version 2.3 runs the script about five times faster than version 1.8. This is a very significant number because it shows how much Ruby performance has changed over the last few years. Version 2.0 is about twice as fast as version 1.8, which is probably due to the new virtual machine. Version 2.3 is about three times faster than version 2.0, which in this case is probably due to the changes in memory management. The GC counter also shows how much garbage collection performance has improved. Although the GC ran about 2.5 times more often in version 2.3 than in version 2.0, the total execution time with GC enabled and with GC disabled is almost the same.
So if we migrate our application from version 2.0 to 2.3, will it perform three times better?
Well, it depends. Of course, our simple script does not simulate real application behavior, so each case must be considered individually. However, if the main job of our application is to process enormous amounts of data, causing Ruby to allocate memory often, the performance gain can be significant. On the other hand, if our application allocates memory rarely, the gain can be slight.
I hope that this short introduction has explained how Ruby performance has changed over the last few years and where its bottleneck lies. Of course, performance is a very wide subject, and there are many books that describe it in greater depth. However, the main purpose of this article was to present the performance differences between a version from over ten years ago and the current release, and to show that the negative opinion that Ruby is slow is no longer true. It is also important to know that Ruby is not the right tool for everything, and some tasks are better executed elsewhere on the backend, for example in the database instead of in the application code. This can significantly improve application performance.