Wordpress and the Curse of EAV July 5, 2014

We have written a few articles about the performance characteristics of Wordpress (here and here). In short, Wordpress is a memory hog and unless you have plenty of hardware resource, your Wordpress powered website will hit a performance wall the moment you start using it seriously.

So why another article to beat the dead horse? Well, it turns out that when HHVM significantly improved the execution speed of the PHP language altogether, many Wordpress sites got a chance to breath. For a moment, we can look pass the memory overhead and script execution issues. However, HHVM doesn't make MySQL run faster. And the way Wordpress stores and retrieves records in the MySQL database is highly inefficient.

EAV - a curse in disguise?

Entity-Attribute-Value, or EAV, is a database layout. The strengths and shortcomings of EAV are directly linked to the extensibility and inability to achieve good performance of the Wordpress core and many of its extensions. Although EAV sounds like a highly technical term, the idea is rather simple. Instead of storing data like this:

customeridfnamelnameagegender
1TomScott32M
2GeorgeBern50M

, EAV uses a row for each column:

meta_idcustomer_idmeta_keymeta_value
11fnameTom
21lnameScott
31age32
41genderM
52fnameGeorge
62lnameBern
72age50
82genderM

The benefit of EAV is that columns are expressed in rows. A column change in the conventional design is simulated through adding or removing a record. In database operations, changing columns are both expensive, performance-wise, and dangerous. Row-level changes are carried out much faster. If one row is messed up, the damage is contained (supposedly). A dynamic CMS such as Wordpress needs to make structural and functionality changes on the fly. So a column-agnostic design makes sense.

Now let's look at a simple query that finds all the male customers 50 years of age:

select age.customer_id, fname.meta_value as fname, gender.meta_value as gender
from customermeta age, customermeta fname, customermeta gender
where age.meta_key='age' and age.meta_value='50'
and age.customer_id=fname.customer_id
and gender.customer_id=age.customer_id
and gender.meta_key='gender' and gender.meta_value='M'

Basically the above query rebuilds a multi-column table by creating several temporary tables. These sub tables are joined by their common customer_id key.

The non-EAV form allows a much simpler and faster query:

select * from customers where age=50 and gender='M'

Many database operations are considered "expensive". Both table creation and joining tables are expensive operations. They are expensive because the database engine has to scan through more records. If the customer table has N records, the search complexity of the about EAV query would be NxN. This means if you have 100 customers in the database, the query runs potentially 100 times slower. Of course, MySQL will deploy many optimization tactics, but the performance degradation is still at an order of magnitude.

What would a simple join in the "simple form" become in EAV? Here's the join that shows all the customers who have ordered on May 1, 2014:

select * from orders, customers where orders.customer_id=customers.id and order_date='2014-5-1'

EAV form:

select * from customermeta,
(select order_id,
from ordermeta o1, ordermeta o2, customers
where o1.order_id=o2.order_id
and o1.customer_id=customers.ID
and o2.order_date='2014-5-1') t
where customermeta.customer_id=t.ID


That's right, a simple join is expressed as a self-inner join and a nested table join. It's okay if you aren't familiar with SQL syntax, as long as you understand that the above query demands even more resource than the previous one. If there are M orders and N customers, the search complexity is MxNxN.

Another source of inefficiency is that all the records have the same type. Wordpress uses the most generic and forgiving storage type, "LongText" to store numbers, dates, strings and serialized objects. MySQL (or any SQL engine for that matter) cannot optimize for either storage or query execution. Search queries such as range filtering, date sorting, full text search become unbearably slow.

How does Wordpress use EAV?

As if EAV's complexity and lack of data-type optimization weren't bad enough, Wordpress tops it by using EAV abusively and indiscriminately (read: poorly). The table that holds all the blog articles, "posts", also stores images, attachments as well as arbitrary record types from other plugins. The woocommerce plugin, for example, saves a purchase order as a post!

One can imagine the amount of deciphering a programmer has to go through just to maintain this over-generic storage structure.

Using the same table for everything also causes severe performance penalty. For example, when someone is placing an online order, MySQL locks the "posts" table. Meanwhile, another user is viewing some completely unrelated information - an event calendar perhaps, but events are also in the "posts" table. Now this two isolated actions, along with other web requests, interlock each other, causing noticeable delays.

No, caching is not a solution

If Wordpress is used to display blogs and web pages, caching can be an effective tactic to mitigate the underlying data storage issues. However, today's websites tend to be multipurpose. We've seen many times a Wordpress is stretched to full fledged store fronts and management systems. Caching for these use case scenarios does not allow real-time information. There's a common wishful thinking that if a Wordpress implementation is slow, it can be optimized later, and that caching is the go-to method. Realistically though, when a site is CPU bound because of SQL queries, it will only get slower. The Ms and Ns will grow in our previous complexity analysis. When both the customers and orders grow 10 times, we are looking at a complexity growth of 1000 times!

What are the alternatives?

Be specific with features. Wordpress uses EAV because it tries to be everything, and it _can_ be everything but poorly. When a system is built for a specific set of purposes, the data tables can be properly built and correlated (a process called normalization).

Split the tables. If Wordpress must be used, which is often the case when we were consulted to just "optimize" a Wordpress site, we can still bring some senses to the over-generic "posts" table. We rewire the plugins so that they use a different table. This strategy helps with table locking.

Use cEAV. The performance issues of EAV is not unique to Wordpress. There are times when we have to allow user-defined data structure. One example is faceted search on data records with arbitrary attributes. We developed a specialized engine - columnized EAV, or cEAV, to battle typical EAV problems.

Our Services

Targeted Crawlers

Crawlers for content extraction, restoration and competitive intelligence gathering.

Learn More

Gyroscope™ ERP Solutions

Fully integrated enterprise solutions for rapid and steady growth.

Learn More

E-Commerce

Self-updating websites with product catalog and payment processing.

Learn More