We developed an internal tool to measure website performance in a more meaningful way - we call it the "crash rating" of a website.
Traditional measures include page load time, bandwidth, total throughput, concurrency level, and so on. These numbers are core server metrics, but they don't easily translate into a website's ability to scale.
Until now, the ability to scale has been vaguely described by the word "scalability". We've decided to quantify this important performance characteristic with one simple number instead of a collection of the measures above.
The conventional approach to measuring scalability is to monitor the server's response time as the load increases; it tries to answer the question "when is the server going to fail?"
In our model, we answer a different set of questions: how often and how much does the website fail?
There are different types of failures. They can be grouped into soft-fails and hard-fails. A soft-fail is when a page takes exceptionally long to load but is still functionally intact. A hard-fail could be a partial or total "crap-out" of a page. Typical error messages are "unable to allocate X amount of memory" from PHP, "OutOfMemoryError" in Java, and "Internal Server Error" or "Unable to Connect" from the web server.
We built a custom testing platform that measures both types of failures, but we focus our analysis on soft-fails. The platform consists of a master controller and an array of satellite servers, also known as load agents. The agents send synchronized server requests, much like a DDoS attack. They also collect the response time and content length of each request for the master to analyze the overall performance.
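The synchronized-burst idea can be sketched in a few lines of Python. This is our illustrative reconstruction, not the tool itself: `fake_request` stands in for a real HTTP GET, and all names are hypothetical.

```python
import threading
import time
import random

def fake_request(url):
    """Stand-in for a real HTTP GET: sleeps a random latency and
    returns (elapsed_seconds, content_length). Hypothetical."""
    latency = random.uniform(0.01, 0.05)
    time.sleep(latency)
    return latency, 1024

def run_agents(url, n_agents):
    """Fire n_agents requests at the same instant and collect
    (response_time, content_length) per agent for the master."""
    barrier = threading.Barrier(n_agents)   # synchronizes the burst
    results = [None] * n_agents

    def agent(i):
        barrier.wait()                      # all agents release together
        start = time.monotonic()
        _, length = fake_request(url)
        results[i] = (time.monotonic() - start, length)

    threads = [threading.Thread(target=agent, args=(i,)) for i in range(n_agents)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

results = run_agents("http://example.test/", 8)
```

The barrier is the key piece: without it, thread start-up jitter would stagger the requests and the "burst" would look more like a trickle.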
The agents do not compare each transaction against an established baseline. Instead, they compare among themselves. If an agent takes longer than the "norm", it's considered an outlier.
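The "norm" comparison might look like the following sketch; the post doesn't define the exact threshold, so the choice of the median and the factor of 2 are our assumptions.

```python
import statistics

def find_outliers(times, factor=2.0):
    """Flag agents whose response time exceeds `factor` times the
    group median. The median-based norm and the factor are our
    assumptions; the real tool's rule is not published."""
    norm = statistics.median(times)
    return [i for i, t in enumerate(times) if t > factor * norm]

# Six agents hit the same page; agent 3 lags far behind the others.
times = [0.21, 0.19, 0.22, 0.95, 0.20, 0.18]
print(find_outliers(times))  # → [3]
```

Comparing agents against each other rather than a fixed baseline means the test adapts to the page: a page that is uniformly slow produces no outliers, while one that degrades unevenly under load does.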
The master controller, a web app, visualizes the agents as a grid of cells. A soft-fail agent is colored in yellow and a hard-fail agent in red. The following chart simulates the look of the visualization tool:
In addition, each time-out is marked with an X. If an agent doesn't receive a response in double the expected time frame, the cell will be filled with 2 X's.
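Putting the coloring and time-out rules together, one cell of the grid could be rendered from a rule like this (a hypothetical sketch; the published rules cover the colors and X marks, the function itself is ours):

```python
def cell_marker(elapsed, expected, is_hard_fail, is_outlier):
    """Map one agent's result to a (color, marks) cell for the grid.
    Colors and X rules follow the post; the mapping is illustrative."""
    color = "red" if is_hard_fail else ("yellow" if is_outlier else "green")
    if elapsed is None or elapsed > 2 * expected:
        return color, "XX"   # no response within double the window
    if elapsed > expected:
        return color, "X"    # a single time-out
    return color, ""

print(cell_marker(1.5, 1.0, False, True))   # a slow outlier: yellow, one X
```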
We define the soft-fail rate as the ratio of outliers to the total agent population. For example, the soft-fail rate of this website is 4/105. A non-optimized WordPress website was clocked at 12/21.
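As a quick sanity check, the two quoted ratios work out as follows (`soft_fail_rate` is our illustrative name):

```python
def soft_fail_rate(n_outliers, n_agents):
    """Soft-fail rate as defined above: outliers over agent population."""
    return n_outliers / n_agents

print(f"{soft_fail_rate(4, 105):.3f}")   # this site: 4/105 ≈ 0.038
print(f"{soft_fail_rate(12, 21):.3f}")   # unoptimized WordPress: 12/21 ≈ 0.571
```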
What's neat about this model is that every target page is challenged against its own performance. Even the fastest page fails. Again, our focus is on scalable performance, i.e. how the website responds to growing traffic. Although the ratio is affected by many factors, a lower fail ratio does correspond to a better-performing system. Adding more hardware resources helps lower the number, but only marginally.
We have benchmarked the performance of some typical open source packages:

| Package | Result |
| --- | --- |
| OpenX Login | crashed after 40 instances |
For legal reasons, we cannot release the tool to the public. You may contact us if you would like to have your website tested. Due to the nature of stress testing, your site might be brought down temporarily.