Performance tuning PostgreSQL for Drupal 7 simple tests

Testing PostgreSQL and making sure its passing 100% of Drupal 7's simple tests is currently a big task because patches being committed to Drupal's CVS HEAD are only going through MySQL based automated testing leaving opportunity for Drupal to break in PostgreSQL and SQLite. While the solution isn't too far away with boombatower's Project Issue File Review 2.0 about to go into testing.drupal.org with the PostgreSQL support I wrote for it, until this happens, testing PostgreSQL manually is still a must.

Drupal 7 has well over 10,000 unit tests now, which, on un-optimized servers, can be a very slow process. Especially with PostgreSQL. I was running PostgreSQL 8.3 straight from the Ubuntu repositories using sudo apt-get install postgresql-8.3 along with PHP 5.2 and apache2. Firstly, PHP's memory_limit needs to be increased to around 256MB, without this, there is a good change your tests with exceed memory limit and you'll have to start from scratch again. To do so, edit /etc/php5/apache2/php.ini and find something that says: memory_limit=16M and change 16M to 256M - you could try something lower if you're worried like 128M or 64M. Then you'll need to restart Apache: sudo apache2ctl graceful.

Note: If you plan to run the simpletests script over CLI rather than through a browser, then you'll want to edit /etc/php5/cli/php.ini. Anf if you use fastcgi PHP rather than running PHP as a module through Apache, you'll want to edit /etc/php5/cgi/php.ini.

With this setup, my server was running the entire test suite in just over 4 hours. This was not ideal. I knew for a fact that MySQL did these tests a lot faster so it would seem logical to optimize PostgreSQL for this environment.

Drupal's simpletests are designed to test Drupal's functionality however, it doesn't run drupal how it was intended to be use. For example, for suite of tests, Drupal creates a database prefix and installs itself, runs testson these prefixed tables and then drops them once the tests have finished running. There are many test suites in Drupal, so this happens hundreds of times. Drupal doesn't use persistant connections, so for each page request, Drupal makes a connection to the database. Again, hundereds of page requests are made during these simple tests.

Its important to understand how Drupal is being used here, because we're going to set up PostgreSQL in a different way then how you would normally setup a dev or production instance of PostgreSQL.

Connection Settings

Firstly, from my understanding, by default, PostgreSQL uses SSL for every connection. PostgreSQL has great authentication abilities that can be locked down by IP, if you're only allowing connections from the server Drupal's test's are running on by the Drupal user in PostgreSQL, then I don't see any need for a password or SSL to connect and authenticate. This will increase the performance because secure authentication processes are quite slow. To turn off SSL:

sudo vim /etc/postgresql/8.3/main/postgresql.conf
#uncomment and change ssl parameter
ssl=true

Memory Settings

Because database's are so memory intensive, naturally, you may think, the more memory PostgreSQL is allowed, the better it well perform. This is really a myth. The right memory settings are really on the work at hand and resource available.

  • shared_buffers = 64MB

    Shared buffers sets how much shared memory PostgreSQL can use. 32MB is the default but you'll want it high if you want PostgreSQL to perform.

  • max_prepared_transactions = 0

    max_prepared_transactions is a feature Drupal doesn't utilize. Setting this to anything other than 0 would simply be a waste of memory.

  • work_mem = 16MB

    This determines how much memory PostgreSQL should use for SELECT queries (basically). So if a record result set was 20MB, then 16MB would be stored in RAM and 4MB in a temporary file on disk. Ideally, you want work_mem to be able to always store the entire query result in memory for faster access. I wouldn't think Drupal testing would generate results over 16MB.

  • maintenance_work_mem = 16MB

    This is usally set a lot lower, its the memory limit for maintenance work like creating indexes, foreign keys and vacuuming. Because Drupal's simpletests do so much creating and droping tables, and thus creating indexes, setting maintenance_work_mem to the same value as work_mem makes sense.

Other Optimizations

fsync is a writing method that is used by PostgreSQL for data integrity. Should the server crash midway through a write, fsync allows postgreSQL to recover from such a problem. The down side is that postgreSQL must wait for the write overhead to flush before it can return successful. Since we don't really care about the data in Drupal testing, we can turn this feature off and save a lot of time.

fsync = off

After all of that you'll need to restart postgreSQL

sudo /etc/init.d/postgresql-8.3 restart

This may fail at first potentially because you're kernels shmmax parameter isn't set high enough.96MB was enough for me:
# Don't set this higher that you're PC's actual RAM amount.
sudo sysctl -w kernel.shmmax = 100663296

Try restarting again, if it wasn't a syntax error, then everything should be good.

These optimizations got my test from 4 hours to run to 2 and a half hours. I'm sure there are more things todo to make postgreSQL faster. Please feel free to comment below with questions or suggestions.

Best of luck ;)

work_mem is actually used to

work_mem is actually used to manage memory usage during a query, most often with sort operations. the limit is actually per sort, so if you have 4 sort operations, your allowing postgres to try to use 4*work_mem per query. Set that too high, and you might end up causing swapping.

some other things to tune:
* set effective_cache_size to the amount of memory you can reasonably expect in your OS file system cache.
* set checkpoint_segments to something really big, like 300.
* set checkpoint_timeout to something big, like 3000
* set checkpoint_completion_target = .9
* set wal_buffers to like 2MB

There are other things that could be tuned, but it would help to know what type of workload is involved, and what parts specifically take so long (one common case would be around aggregate sql optimized for mysql, which tends to perform poorly on postgres, and vice versa). turning on something like log_min_duration_statement might be a good idea for at least one go around. hth.

You should really consider

You should really consider setting up a connection pool to get rid of the one-connection-per-query overhead, which I trust is huge in your case.

http://www.last.fm/user/Russ/journal/2008/02/21/zd_postgres_connection_p...

its not

its not one-connection-per-query, its on connection per page request, which could have 1000 queries on it. The pgpool stuff looks pretty cool though, so last.fm runs on postgreSQL? awesome.

Also note that you should set

Also note that you should set max_connections as low as possible (e.g. could be as low as 5 if the tests are doing concurrent connections).
The 16MB for maintenance_work_mem may still be a little on the low side. If these databases are being setup and torn down relatively quickly, you should also turn autovacuum off. As Robert says, getting information on the slowest queries would be a great help here. Doing something as simple as ANALYZEing the database before the tests are run, or lowering random_page_cost, may lead to some huge improvements.

We've been running with fsync

We've been running with fsync off in our regression test environment for a while. It made a huge difference - cutting the run time in half. However it bears repeating that you should never *ever* turn this off on a production server.

In a test environment you really want to have your db server setup done via a script because with fsync off, you *will* eventually lose your Postgres instance. Obviously you don't care about the contents of the test databases, but you'll need to do an init_db (pg_createcluster on Debian) and then you'll have to recreate your test users, and re-apply your tuning parameters and connection (pg_hba) permissions. If your data loss occurs 6 months from now, will you remember all the tweaks you applied manually?

Having a DB setup script also makes it easier to replicate you test environment on another server.

I would like to know why do

I would like to know why do you go for PostgresSQL when MySql is very powerful though.

Post new comment

The content of this field is kept private and will not be shown publicly.
  • Allowed HTML tags: <h2><h3><h4><a> <em> <strong> <cite> <code> <ul> <ol> <li> <dl> <dt> <dd> <p> <div> <span>
  • Web page addresses and e-mail addresses turn into links automatically.
  • Lines and paragraphs break automatically.
  • You may post code using <code>...</code> (generic) or <?php ... ?> (highlighted PHP) tags.
  • Glossary terms will be automatically marked with links to their descriptions. If there are certain phrases or sections of text that should be excluded from glossary marking and linking, use the special markup, [no-glossary] ... [/no-glossary]. Additionally, these HTML elements will not be scanned: a, abbr, acronym, code, pre.
  • You may insert videos with [video:URL]

More information about formatting options