So we’ve been using Eideticker to automatically measure startup/pageload times for about a year now on Android, and more recently on FirefoxOS as well (albeit not automatically). This gives us nice and pretty graphs like this:
Ok, so we’re generating numbers and graphing them. That’s great. But what’s really going on behind the scenes? I’m glad you asked. The story is a bit different depending on which platform you’re talking about.
On Android we connect Eideticker to the device’s HDMI out, so we count on a nearly pixel-perfect signal. In practice, it isn’t quite, but it is within a few RGB values that we can easily filter for. This lets us come up with a pretty good mechanism for determining when a page load or app startup is finished: just compare frames, and say we’ve “stopped” when the pixel differences between frames are negligible (previously defined at 2048 pixels, now 4096 — see below). Eideticker’s new frame difference view lets us see how this works. Look at this graph of application startup:
[Link to original]
What’s going on here? Well, we see some huge jumps in the beginning. This represents the animated transitions that Android makes as we transition from the SUTAgent application (don’t ask) to the beginnings of the FirefoxOS browser chrome. You’ll notice though that there’s some more changes that come in around the 3 second mark. This is when the site bookmarks are fully loaded. If you load the original page (link above) and swipe your mouse over the graph, you can see what’s going on for yourself.
This approach is not completely without problems. It turns out that there is sometimes some minor churn in the display even when the app is for all intents and purposes started. For example, sometimes the scrollbar fading out of view can result in a significantish pixel value change, so I recently upped the threshold of pixels that are different from 2048 to 4096. We also recently encountered a silly problem with a random automation app displaying “toasts” which caused results to artificially spike. More tweaking may still be required. However, on the whole I’m pretty happy with this solution. It gives useful, undeniably objective results whose meaning is easy to understand.
So as mentioned previously, we use a camera on FirefoxOS to record output instead of HDMI output. Pretty unsurprisingly, this is much noisier. See this movie of the contacts app starting and note all the random lighting changes, for example:
My experience has been that pixel differences can be so great between visually identical frames on an eideticker capture on these devices that it’s pretty much impossible to settle on when startup is done using the frame difference method. It’s of course possible to detect very large scale changes, but the small scale ones (like the contacts actually appearing in the example above) are very hard to distinguish from random differences in the amount of light absorbed by the camera sensor. Tricks like using median filtering (a.k.a. “blurring”) help a bit, but not much. Take a look at this graph, for example:
[Link to original]
You’ll note that the pixel differences during “static” parts of the capture are highly variable. This is because the pixel difference depends heavily on how “bright” each frame is: parts of the capture which are black (e.g. a contacts icon with a black background) have a much lower difference between them than parts that are bright (e.g. the contacts screen fully loaded).
After a day or so of experimenting and research, I settled on an approach which seems to work pretty reliably. Instead of comparing the frames directly, I measure the entropy of the histogram of colours used in each frame (essentially just an indication of brightness in this case, see this article for more on calculating it), then compare that of each frame with the average of the same measure over 5 previous frames (to account for the fact that two frames may be arbitrarily different, but that is unlikely that a sequence of frames will be). This seems to work much better than frame difference in this environment: although there are plenty of minute differences in light absorption in a capture from this camera, the overall color composition stays mostly the same. See this graph:
[Link to original]
If you look closely, you can see some minor variance in the entropy differences depending on the state of the screen, but it’s not nearly as pronounced as before. In practice, I’ve been able to get extremely consistent numbers with a reasonable “threshold” of “0.05”.
In Eideticker I’ve tried to steer away from using really complicated math or algorithms to measure things, unless all the alternatives fail. In that sense, I really liked the simplicity of “pixel differences” and am not thrilled about having to resort to this: hopefully the concepts in this case (histograms and entropy) are simple enough that most people will be able to understand my methodology, if they care to. Likely I will need to come up with something else for measuring responsiveness and animation smoothness (frames per second), as likely we can’t count on light composition changing the same way for those cases. My initial thought was to use edge detection (which, while somewhat complex to calculate, is at least easy to understand conceptually) but am open to other ideas.
[ For more information on the Eideticker software I’m referring to, see this entry ]
Time for another update on Eideticker. In the last quarter, I’ve been working on two main items:
- Responsiveness tests (Android / FirefoxOS)
- Eideticker for FirefoxOS
The focus of this post is the responsiveness work. I’ll talk about Eideticker for FirefoxOS soon.
So what do I mean by responsiveness? At a high-level, I mean how quickly one sees a response after performing an action on the device. For example, if I perform a swipe gesture to scroll the content down while browsing CNN.com, how long does it take after
I start the gesture for the content to visibly scroll down? If you break it down, there’s a multi-step process that happens behind the scenes after a user action like this:
If anywhere in the steps above, there is a significant delay, the user experience is likely to be bad. Usability research
suggests that any lag that is consistently above 100 milliseconds will lead the user to perceive things as being laggy. To keep our users happy, we need to do our bit to make sure that we respond quickly at all levels that we control (just the application layer on Android, but pretty much everything on FirefoxOS). Even if we can’t complete the work required on our end to completely respond to the user’s desire, we should at least display something to acknowledge that things have changed.
But you can’t improve what you can’t measure. Fortunately, we have the means to do calculate of the time delta between most of the steps above. I learned from Taras Glek this weekend that it should be possible to simulate the actual capacitative touch event on a modern touch screen. We can recognize when the hardware event is available to be consumed by userspace by monitoring the `/dev/input` subsystem. And once the event reaches the application (the Android or FirefoxOS application) there’s no reason we can’t add instrumentation in all sorts of places to track the processing of both the event and the rendering of the response.
My working hypothesis is that it’s application-level latency (i.e. the time between the application receiving the event and being able to act on it) that dominates, so that’s what I decided to measure. This is purely based on intuition and by no means proven, so we should test this (it would certainly be an interesting exercise!). However, even if it turns out that there are significant problems here, we still care about the other bits of the stack — there’s lots of potentially-latency-introducing churn there and the risk of regression in our own code is probably higher than it is elsewhere since it changes so much.
Last year, I wrote up a tool called Orangutan that can directly inject input events into an input device on Android or FirefoxOS. It seemed like a fairly straightforward extension of the tool to output timestamps when these events were registered. It was. Then, by synchronizing the time between the device and the machine doing the capturing, we can then correlate the input timestamps to events. To help visualize what’s going on, I generated this view:
[Link to original]
The X axis in that graph represents time. The Y-axis represents the difference between the frame at that time with the previous one in number of pixels. The “red” represents periods in capture when events are ongoing (we use different colours only to
distinguish distinct events). 1
For a first pass at measuring responsiveness, I decided to measure the time between the first event being initiated and there being a significant frame difference (i.e. an observable response to the action). You can see some preliminary results on the eideticker dashboard:
[Link to original]
1 Incidentally, these “frame difference” graphs are also quite useful for understanding where and how application startup has regressed in Fennec — try opening these two startup views side-by-side (before/after a large regression) and spot the difference:  and )
Last night while I was lying in bed the mystery of my being here, present, again occurred to me. Pondered that a bit upon waking up. Let me formulate two mysteries that, as far as I know, no one has given really satisfactory answers to:
- Why does anything exist at all? And given that things do exist, why should they take the form that they do (planets, suns, nebulae, even life)?
- What accounts for the “subjectivity” of experience? That is, why is life not only here, but (in humanity’s case at least, probably in the case of other higher-order life, and possibly all life) there is a *conscious* experience that goes on with our perceptions of the world? It does not seem necessary for (1), does it?
Perhaps the answer here is just that the way our minds (and hence anything we could form into thought or language) is based on descriptions of the world according to our perception. But (1) and (2) are in a sense, beyond this. I think in the case of (1) it is obvious why. In the case of (2) this might just be a limitation of our language/thought — certainly we can express that someone/something is conscious in a 3rd party sort of way (i.e. “she perceived red”), though this does not (as far as I can tell) express the realness of the experience. It’s a description, not the experience. To really understand experience from a 3rd person perspective (and hence why it exists?), you would need to go outside experience — but description is part of experience! The concept of being outside of it makes no sense.
[ Maybe I am just restating Kant here ]
I was kind of appalled today to see this:
I initially thought this had to be a tall tale told by hippies, but doing a back of the envelope calculation, I realized that such a figure is entirely possible. Assume each packet weighs 0.05 pounds. Typing that into python I get:
>>> 966*(10**6)/0.05<br />
19 billion packets. Seems awfully big. But divide that by, say, 10 million people:
>>> x = 966*(10**6)/0.05<br />
>>> x/10**7<br />
1932 cups. Hmm, still seems big. That’s more than 5 cups a day. But if we say 30 million people are drinking this stuff, we rapidly get to the zone of plausibility.
People, it doesn’t have to be this way. You can have way better coffee that produces zero waste for only marginally more effort. Allow me to present the Will method of coffee production. First off, you use this thing:
I have tried alternatives: french presses, filter coffee, “cowboy” percolators, even “professional” espresso makers. I maintain that the Bialetti filter produces the best cup of coffee: one full cup of espresso goodness. Not too strong, not too weak. Just perfect. Add some milk and you have an amazing café au lait. Of course, part of getting the best cup is using the right beans. If you’re brewing at home, you can afford to go a little fancy. Here’s what I’m currently using:
Yep, that’s right. A slice of Portlandia. Got this bag of espresso from Cafe Myriad, a rather upscale coffee joint. I think it was 15 dollars. A small bag like this is good for 30 cups or so. A keurig k-pack is $17.45 for 24. I’d say I’m still ahead. If you’re on a tighter budget you can get fair trade beans for cheaper ($10 a pound?) from Santropol in Montréal. Or whatever. Even generic stuff is probably fine (though I encourage fair trade if you can possibly afford it).
And what do I do with the waste? The only waste product of the Bialetti filter is coffee grinds. If I happened to live in a borough of Montréal with composting, I could dump it there. Unfortunately I don’t (if you live in NDG, please vote for these people in the upcoming municipal election; municipal composting is part of their platform, amongst other awesomeness) so I have a vermicompost. My morning ritual is dump yesterday’s coffee grinds into this bin:
… and then my numerous worms do the work of turning it into beautiful soil which I use in my balcony garden to grow tomotatos, kale, swiss chard, basil, and oregano.
What I want to emphasize most of all is that my ritual takes very little time. Scraping out and cleaning my Bialetti in the worm compost bin takes around minute. Refilling it with water and coffee takes maybe 30 seconds. Yes, once a year I have to take the worm trailings out of my vermicompost bin. That takes longer (maybe 30 minutes to an hour) but it’s a once a year thing and you avoid having to go to the store to buy fertilizer. Less waste. Way better coffee. Only a marginally more time spent. To me, this is a no-brainer.
I’ve been working on a new, mobile friendly version of Nixi on-and-off for the past year and a bit. I’m not sure when it’s ever going to be finished, so I thought I might as well post the work-in-progress, which has these noteworthy improvements:
- Even faster than before (using the Bootstrap library behind the scenes, no longer using slow canvas library to update map)
- Sexier graphics (thanks to the aforementioned Bootstrap library)
- Now uses client side URLs to keep track of state as you navigate through the site. This allows you to bookmark a favorite spot (e.g. your home) and then go back to it later. For example, this link will give you a list of BIXI docks near Station C, the coworking space I belong to.
If you use BIXI at all, check it out and let me know what you think!
Today I did a quick port of Larry Doolittle’s ntpclient program to Android and FirefoxOS. Basically this lets you easily synchronize your device’s time to that of a central server. Yes, there’s lots and lots of Android “applications” which let you do this, but I wanted to be able to do this from the command line because that’s how I roll. If you’re interested, source and instructions are here:
For those curious, no, I didn’t just do this for fun. For next quarter, we want to write some Eideticker-based responsiveness tests for FirefoxOS and Android. For example, how long does it take from the time you tap on an icon in the homescreen on FirefoxOS to when the application is fully loaded? Or on Android, how long does it take to see a full list of sites in the awesomebar from the time you tap on the URL field and enter your search term?
Because an Eideticker test run involves two different machines (a host machine which controls the device and captures video of it in action, as well as the device itself), we need to use timestamps to really understand when and how events are being sent to the device. To do that reliably, we really need some easy way of synchronizing time between two machines (or at least accounting for the difference in their clocks, which amounts to about the same thing). NTP struck me as being the easiest, most standard way of doing this.
[ For more information on the Eideticker software I’m referring to, see this entry ]
I just put up a proof of concept Eideticker dashboard for FirefoxOS here. Right now it has two days worth of data, manually sampled from an Unagi device running b2g18. Right now there are two tests: one the measures the “speed” of the contacts application scrolling, another that measures the amount of time it takes for the contacts application to be fully loaded.
For those not already familiar with it, Eideticker is a benchmarking suite which captures live video data coming from a device and analyzes it to determine performance. This lets us get data which is more representative of actual user experience (as opposed to an oft artificial benchmark). For example, Eideticker measures contacts startup as taking anywhere between 3.5 seconds and 4.5 seconds, versus than the 0.5 to 1 seconds that the existing datazilla benchmarks show. What accounts for the difference? If you step through an eideticker-captured video, you can see that even though something appears very quickly, not all the contacts are displayed until the 3.5 second mark. There is a gap between an app being reported as “loaded” and it being fully available for use, which we had not been measuring until now.
At this point, I am most interested in hearing from FirefoxOS developers on new tests that would be interesting and useful to track performance of the system on an ongoing basis. I’d obviously prefer to focus on things which have been difficult to measure accurately through other means. My setup is rather fiddly right now, but hopefully soon we can get some useful numbers going on an ongoing basis, as we do already for Firefox for Android.
Okay, remember last time when I said I was going to continue my “sham of a human existence” and not commit to a Zen practice? Well, I came back to the idea sooner than I thought: the experience was just too compelling for me not to do some further exploration. In some strange coincidence, Hacker News had a great thread on meditation just after I wrote my last blog entry, where a few people recommended a book called Mindfulness in Plain English. I figured doing meditation at home didn’t involve any kind of huge commitment (don’t like it? just stop!), so I decided to order it online and give it a try.
Mindfulness in Plain English is really fascinating stuff. It describes how to do a type of Vipassana (insight) meditation, which is practiced with a great deal of ritual in places like Thailand, India, and Sri Lanka. The book however, strips out most of the ritual and just gives you a set of techniques that is quite accessible for a (presumably) western audience. From what I can gather though, it seems like the goal of Vipassana is quite similar to that of Zen (enlightenment; release from attachment and dualism), though the methods and rituals around it are quite different (e.g. there are no koans). Perhaps it’s akin to the difference between GIMP and Photoshop: as those two programs are both aimed at the manipulation of images, both Vipassana and Zen are aimed at the manipulation of the mind. There are differences in the script of how to do so, but the overarching purpose is the same.
Regardless, the portion of the C method that the book describes is almost exactly that which I tried at the Zen workshop: sit still and pay attention to your breathing. There’s a few minor distinctions in terms of the suggested posture (the book recommends either sitting cross legged or in a lotus position vs. the kneeling posture I learnt at the workshop) and the focal point (Mindfulness recommends the tip of the nostrils). But essentially it’s the same stuff. Focus on the breath — counting it if necessary, rince, repeat.
As I mentioned before, this is actually really hard to do properly for being simple in concept. The mind keeps wandering and wandering on all sorts of tangents: plans, daydreams, even thoughts about the meditation itself. Where I found Mindfulness in Plain English helpful was in the advice it gave for dealing with this “monkey mind” phenomenon. The subject is dealt with throughout the book (with two chapters on it and nothing else), but all the advice boils down to “treat it as part of the meditation”. Don’t try to avoid it, just treat it as something to be aware of in the same way as breathing. Then once you have acknowledged it, move the attention back to the breath.
Mindfulness, as far as I can gather, is simply non-judgemental awareness of what we are doing (and what we are supposed to be doing). Every time a distraction is noticed, felt, and understood, you’ve just experienced some approximation of the end goal of the meditation. Like it is with other things (an exercise regimen, learning to play a musical instrument), every small victory should push you further and the path to where you want to go. With enough practice, it might just become part of your day-to-day experience.
Or so I’m told by the book. Up to now, I haven’t enjoyed any longlasting effects from meditation aside from (possibly?) a bit more mental clarity in my day-to-day tasks. But I’ve found the practice to be extremely interesting both from the point of view of understanding my own thought, as well as being rather relaxing in and of itself. So while I’m curious as to what comes next, I am happy enough with things as they are in the present. I’m planning to continue to meditate (20–30 minutes a day, 6 days a week), but also delve a bit deeper into the details and history of Zen and Vipasanna. More updates as appropriate.
Another update on getting Eideticker working with FirefoxOS. Once again this is sort of high-level, looking forward to writing something more in-depth soon now that we have the basics working.
I finally got the last kinks out of the rig I was using to capture live video from FirefoxOS phones using the Point Grey devices last week. In order to make things reasonable I had to write some custom code to isolate the actual device screen from the rest of capture and a few other things. The setup looks interesting (reminds me a bit of something out of the War of the Worlds):
Here’s some example video of a test I wrote up to measure the performance of contacts scrolling performance (measured at a very respectable 44 frames per second, in case you wondering):
Surprisingly enough, I didn’t wind up having to write up any code to compensate for a noisy image. Of course there’s a certain amount of variance in every frame depending on how much light is hitting the camera sensor at any particular moment, but apparently not enough to interfere with getting useful results in the tests I’ve been running.
Likely next step: Create some kind of chassis for mounting both the camera and device on a permanent basis (instead of an adhoc one on my desk) so we can start running these sorts of tests on a daily basis, much like we currently do with Android on the Eideticker Dashboard.
As an aside, I’ve been really impressed with both the Marionette framework and the gaiatests python module that was written up for FirefoxOS. Writing the above test took just 5 minutes — and the code is quite straightforward. Quite the pleasant change from my various efforts in Android automation.
One of my frustrations with the Linux desktop is the lack of an email client that’s in the same league as GMail or Apple’s mail.app. Thunderbird is ok as far as it goes (I use it for my day-to-day Mozilla correspondence) but I miss having a decent conversation view of email (yes, I tried the conversation view extension — while impressive in some ways, it ultimately didn’t work particularly well for me) and the search functionality is rather slow and cumbersome. I’d like to be optimistic about these problems being fixed at some point… but after nearly 2 years of using the product without much visible improvement my expectation of that happening is rather low.
The Yorba non-profit recently started a fundraiser to work on the next edition of Geary, an email client which I hope will fill the niche that I’m talking about. It’s pretty rough around the edges still, but even at this early stage the conversation view is beautiful and more or less exactly what I want. The example of Shotwell (their photo management application) suggests that they know a thing or two about creating robust and useable software, not a common thing in this day and age. In any case, their pitch was compelling enough for me to donate a few dollars to the cause. If you care about having a great email experience that is completely under your control (and not that of an advertising or product company with their own agenda), then maybe you could too?