Neil's Place

April 4, 2011

Loading XBL Performance

Filed under: Mozilla — enndeakin @ 3:56 pm

A while ago I spent a bit of time looking at the performance of opening a new window. My first few experiments were actually a bit off track, as I realized that I had my XUL cache disabled. This isn’t representative of most users, so it isn’t completely valid for testing real performance, but it did get me thinking a bit.

In case you’re wondering, the XUL cache does two things. First, when a XUL file is loaded, it is parsed into an in-memory form and its scripts are compiled into a bytecode-like form. When that file is needed again, the already parsed and compiled form is used instead of reading it again. Second, the parsed form is saved out to disk in a file. When the browser is restarted, this file is read instead, so the source form is not reparsed each time. This happens for both XUL content and JavaScript: instead of compiling the script into bytecode each time, the bytecode is serialized to disk and reused the next time. This process improves performance significantly at the cost of a couple of megabytes of disk space.
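As a rough illustration of the two levels described above, here is a minimal sketch in Python. It is only an analogy, not the Mozilla implementation: the `ScriptCache` class, its method names and the `demo.fastload` file name are all made up for this example, and Python's own `compile` and `marshal` stand in for the script compiler and the serializer.

```python
import marshal
import os
import tempfile


class ScriptCache:
    """Hypothetical two-level cache: an in-memory dict backed by one file on disk."""

    def __init__(self, path):
        self.path = path
        self.memory = {}        # name -> compiled code object
        self.compile_count = 0  # how many times the source was really compiled
        if os.path.exists(path):
            # Second level: reuse the compiled forms saved by an earlier run.
            # marshal round-trips code objects within the same Python version.
            with open(path, "rb") as f:
                self.memory = marshal.load(f)

    def get(self, name, source):
        # First level: reuse the compiled form if it is already in memory.
        code = self.memory.get(name)
        if code is None:
            code = compile(source, name, "exec")
            self.compile_count += 1
            self.memory[name] = code
        return code

    def save(self):
        # Serialize every compiled script so the next "startup" can skip compiling.
        with open(self.path, "wb") as f:
            marshal.dump(self.memory, f)


path = os.path.join(tempfile.mkdtemp(), "demo.fastload")

first_run = ScriptCache(path)
first_run.get("greet", "result = 6 * 7")   # compiled from source
first_run.get("greet", "result = 6 * 7")   # served from memory
first_run.save()

second_run = ScriptCache(path)             # simulates a restart
code = second_run.get("greet", "result = 6 * 7")
scope = {}
exec(code, scope)
```

In this sketch the first run compiles the script once, and the second run compiles nothing because the serialized form is read back from the file.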

The file the parsed data is stored in, generally called the fastload file, is currently located in the same place as the network cache and can be found with the filename XUL.mfasl. If you want to see how it affects startup time, you can safely delete it and it will be recreated when you start the browser again (although that test will of course be affected by the time it takes to write it out again).

With the cache disabled, none of the above happens and the files must be reread from the source every time. However, when I had the cache disabled, I noticed that a significant amount of time was taken up by compiling the script associated with XBL bindings. Much less time was used with the cache enabled again.

One thing of note here is that, unlike XUL documents, only one part of the cache mechanism is used for XBL. XBL is only cached in memory; it isn’t saved out to the fastload file on disk. This means that this compilation time is incurred on each startup. I decided to investigate what would happen if XBL was also saved into the fastload file.

But first, let’s look at the performance with the current behaviour. Reading XBL generally has three steps. The first is to load and parse the source XML document and convert it into an internal representation. The second is to attach the binding to the document and create the anonymous content. The third and final step is to compile the property and method scripts associated with the binding. Testing shows that the second step takes only 15 percent of the total time, so I’m going to focus on the first and last steps here. (Here I’m only considering the time spent executing within the area of code used to implement XBL.)

This chart shows the time to read a few selected bindings used by the Firefox UI. The first three are some of the pieces that make up the tabbrowser, the fourth is the tree element, the fifth is the dropdown autocomplete popup for the URL address field, and the last is the URL address field itself. This chart only shows a selection of the more complex bindings; in reality about 60 bindings or so are read at startup to create a window.

The chart breaks down the time to load and parse the binding (the blue bar) and the time to compile the properties and method scripts associated with the bindings (the green bar).

The first binding, ‘tabbrowser-tabs’, takes a lot longer to parse than the following two tabbrowser-related bindings. This should be expected since all three bindings are stored in the same XML file. As the document is only loaded and the XML parsed once per file, we expect that most of the parsing time will be for whichever binding is asked for first. The parse time for ‘tabbrowser’ and ‘tabbrowser-tabbox’ is mostly just overhead from having to locate the cached bindings previously read. (Remember that in-memory caching of XBL is performed currently.)
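The parse-once-per-file behaviour can be sketched as follows. This is a simplified illustration in Python, not the actual XBL code: the `BindingDocumentCache` class, the `tabbrowser.xml` key and the namespace-free XML are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified binding file (no namespaces, no content).
SOURCE = """
<bindings>
  <binding id="tabbrowser-tabs"/>
  <binding id="tabbrowser"/>
  <binding id="tabbrowser-tabbox"/>
</bindings>
"""


class BindingDocumentCache:
    """Parses each file at most once; later lookups only search the cached result."""

    def __init__(self, files):
        self.files = files    # file name -> XML source text
        self.parsed = {}      # file name -> {binding id: element}
        self.parse_count = 0  # number of real XML parses performed

    def get_binding(self, file_name, binding_id):
        bindings = self.parsed.get(file_name)
        if bindings is None:
            # Only the first request for a file pays the parsing cost.
            root = ET.fromstring(self.files[file_name])
            self.parse_count += 1
            bindings = {b.get("id"): b for b in root.iter("binding")}
            self.parsed[file_name] = bindings
        # Every request, including later ones, still pays this lookup overhead.
        return bindings[binding_id]


cache = BindingDocumentCache({"tabbrowser.xml": SOURCE})
tabs = cache.get_binding("tabbrowser.xml", "tabbrowser-tabs")      # parses the file
browser = cache.get_binding("tabbrowser.xml", "tabbrowser")        # cached
tabbox = cache.get_binding("tabbrowser.xml", "tabbrowser-tabbox")  # cached
```

Here three bindings are requested but the file is parsed only once, which is why the first binding asked for carries most of the parse time.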

The compile time for the bindings, especially ‘tabbrowser’, correlates with the amount of script used by that binding. The ‘tabbrowser’ binding has a lot of methods, so significant time is spent compiling them.

The tree binding shows that a more complex binding does indeed require notable time for both parsing and script compilation.

As with the tabbrowser, the two urlbar bindings are contained within the same file, so the first one parsed takes the brunt of the parsing time. But notice that ‘urlbar’ also requires significant time to parse. Again, there is an explanation. This last binding inherits from the autocomplete binding, so the time here also includes the time to load and parse the base autocomplete binding.
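To make the inheritance cost concrete, here is a small sketch of how asking for one binding can pull in its base binding too. The `load_chain` function and the two-entry `table` are hypothetical and much simpler than the real bindings; only the idea of following an `extends` link comes from XBL.

```python
def load_chain(binding_id, table):
    """Return the binding ids that must be loaded, derived binding first."""
    loaded = []
    while binding_id is not None and binding_id not in loaded:
        loaded.append(binding_id)
        # Follow the 'extends' link to the base binding, if there is one.
        binding_id = table[binding_id].get("extends")
    return loaded


# Hypothetical, simplified table: 'urlbar' extends 'autocomplete'.
table = {
    "urlbar": {"extends": "autocomplete"},
    "autocomplete": {},
}

chain = load_chain("urlbar", table)
```

With this table, `chain` contains both `urlbar` and `autocomplete`, so any time measured for `urlbar` includes the work done for its base.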

As evidenced, script compilation is a significant part of reading a binding. It is this part that we hope to reduce by fastloading.

I implemented a simple XBL fastloading mechanism to see what would happen. The hope is that we can see faster loading if the parsing and compiling steps are replaced by a single mechanism to read data from the fastload file that is already in a format that is close to the in-memory representation used. We can’t eliminate the time entirely of course, as we still need to read the compiled form, but if the original testing is correct, we should be able to eliminate the time needed to compile the scripts at least. The following chart shows the results.

This chart includes a bar showing the time taken when fastloading the same set of bindings (the orange bar) with the original data for comparison.

We can clearly see that the compilation time is entirely gone. The ‘tabbrowser’ binding shows this most obviously, as over 17 milliseconds is removed from the original time. But all of the other bindings have saved their compilation time as well.

In all, it appears that the parse time is reduced by around 20 to 25 percent. In the two cases where parsing is not done, ‘tabbrowser’ and ‘tabbrowser-tabbox’, there is no difference in parse time. Note that in the implementation I did, the loading for all bindings in a file happens when the first binding for that file is loaded, so the parse time for all three actually occurs during the first ‘tabbrowser-tabs’ binding. The small amount that remains is from the overhead of retrieving a binding from a file in the cache that has already been read. That might be worth investigating as well, since this overhead occurs over 500 times just when starting up Firefox.

This last chart shows the total time taken up by XBL parsing and compilation to load all of the bindings using two tests. The first is the time taken during startup. The second is a test which starts the browser, opens the bookmarks window, the sidebar, a couple of panes in the preferences window, a new blank window and then closes them again. As not all elements and bindings are used in the basic Firefox window, this latter test ensures we read a good selection of the additional bindings.

This chart shows that overall, using XBL fastloading removes the time needed for compilation, but has only a marginal effect on the total parsing time.

Note that this testing is only based on a few basic observations, but other tests I’ve done show that similar results occur at least on Windows and Mac, both with optimized and debug builds. Testing suggests about a 3-5% improvement in startup time once the fastload data is cached.
