宏哥 发布于 2015/12/29 11:01
阅读 142
收藏 1



We use MySQL where I work, and we've looked at PostgreSQL (it's running for one small system, as a test). They both have their pluses and minuses. Note that for the purposes of this discussion I refer to MySQL 5.0.x with InnoDB. While 5.1 may fix some of these things, it's not stable yet.

MySQL - Good Points

Probably MySQL's best point is that it's so common. Since it's part of LAMP, everyone and their brother provides it as an option. It runs on Windows, Linux, and anything else. You can't swing a stick at a web host without having MySQL available. There are TONS of people out there that can give you advice on it.

I like the tools (really, the command line client) better. It feels quite friendly to me. Want to see the tables in a database? SHOW TABLES. Want to see the databases? SHOW DATABASES. Want to see the status of your replication against the master? SHOW SLAVE STATUS. PostgreSQL feels quite a bit like Oracle to me. There is no SHOW TABLES, it's \dt (IIRC). To quit it's not QUIT or EXIT, it's \q.

Replication in MySQL is pretty nice. It's built in (PostgreSQL didn't have built in replication until recently). The last time we spent much time looking (last year), PostgreSQL replication was based on triggers, which we found a little iffy (in theory). MySQL replication has some limitations (basically must use InnoDB, it's statement based not data based (changes in 5.1, I think)), but it's been working great for us. It can do multiple masters, multiple slaves, chains. Again, this is a known quantity, since MySQL is so common.

MySQL - Bad Points

MySQL has some very serious limitations. It is very important that you understand what you're getting into.

The biggest one, the one that's a killer for us, is the inability to add or remove columns or indexes without locking the table. We have tables with tens of millions of records. We can't edit them. To add a new column or index locks the table from reads and writes, which is killer for us. We don't know how long it would take, but it would be hours, minimum. When we HAD to add a new column last year, we did it by making a new table and always doing a join. It was the only way to do things without taking the server out of production (or going through a large mess since we use replication).

MySQL can be dead stupid at times with indexes. It's important to run a DESCRIBE or EXPLAIN on things to see if it's doing something sane. Sometimes we've had to use FORCE INDEX to get good performance. This is made worse by the fact that it can't use multiple indexes (for the most part). If you want it to use your index on the date column and your index on the email column, you need to create an index that is on both columns. This is supposed to be fixed/improved in 5.1 (I think, haven't tested 5.1).

Subqueries can be a big problem too. A query can run fine. A subquery can run fine. But when you get to three levels of queries (or more) MySQL can (and usually does) just give up. So instead of doing things a smart way (even if your subquery is as easy as a constant "SELECT id FROM table WHERE id = '5'), MySQL will just start running that query for every row, killing performance. Again, you have to use EXPLAIN / DESCRIBE.

The error messages in MySQL are worse, which is really a nitpick. If you try to create a table with a foreign key and you get something wrong, it will just throw an error 150. It doesn't tell you what the problem is (usually mismatched column definitions), just error 150. You have to go look the error up just to see that 150 is a foreign key problem. Other error messages are much more helpful.

Then there is the, well, we'll call it buggyness and weirdness. MySQL performance, at least on Windows and Linux would bomb with more than 4 (possibly 8) CPUs. I hear that it's not a problem on Solaris (thanks to Sun's work and the Niagara processors). This is something you must be aware of. I believe indexes must be held in memory for InnoDB. If you want a giant index that you will only use once a week, too bad. This may have changed, or it may be fixed in 5.1, I don't know.

You also run into fun little things. There is a condition we've run into involving subqueries and joins in 5.0 that causes MySQL to return no data. Despite what explain says, what the partial query parts show, etc, you just get zero rows back. You change your code just a little and the bug isn't hit and you get your data. TIMESTAMPs don't record milliseconds, you have to do that by hand with another column. We also once ran into a situation where the date format command could crash MySQL (this was in the 4.0 or 4.1 days). You just have to be aware that weird little things like these crop up. Did I mention that I've heard (from someone I really trust who has had experience with this) that stored procedures / triggers could crash the MySQL server in 5.0? This was earlier in the 5.0 branch and is probably fixed by now... but you need to watch MySQL closely.

This should go without saying, but if you go with MySQL use InnoDB. Depending on your setup that may not be the default engine. Change that. Everything good on MySQL is InnoDB. Replication and transactions both need InnoDB, MyISAM doesn't have them. There are other storage engines, but they are much more specialized.

PostgreSQL - Good and Bad

Now I don't have a ton of experience with PostgreSQL. Like I said we've started to experiment with it. The fact that it doesn't have many of MySQL's limitations is a big plus. Just the ability to add a column on a large table without locking the thing for a huge amount of time would be great for us. PostgreSQL can also use multiple indexes which, again, is a serious plus. The error messages that PostgreSQL returns can be much more informative than MySQL. At times when you screw up a query instead of "this is impossible" or "you can't do that", you get something more akin to "this is impossible because of X". Again, this is just my impression. Basically, PostgreSQL feels more like an open source Oracle (a grown up database) than MySQL (which doesn't feel very Oracle-y).

Now the tools are MUCH less friendly, but if you come from an Oracle (or probably DB2) world you'll be used to it. While they require more cryptic commands (see MySQL tools, above), they do work very well.

There is the replication thing. When we picked PostgreSQL back up this year to play with for a new little system, we discovered that they took one of the formerly external replication systems and it has been put into the tree. This is a very good step, as the replication situation was a big problem for us. Before there were little third party things (based on triggers) that you could take your chances with, and there were paid replication solutions. Having one built in is quite nice.


My advice? While I have less experience with it if I was starting with a new system I think I'd go with PostgreSQL. I've seen enough weirdness in MySQL that I'd be willing to try it. We haven't had any problems with it so far, and I know many people use it. MySQL is improving fast. It's gained replication for a disk based storage engine, stored procedures, better index choosing, and it's become much more strict about not allowing obviously bad data (like the date '0000-00-00') or data that violates a key.

Switching is not a decision to be taken lightly. Just moving the data across will be a big problem if you haven't been very strict about validation in the past.

The most important thing to do though is to just try it. Set-up a test server and a quick codebase hack to get your code running. Put some data in and benchmark it. Maybe it won't be substantially faster than your current setup and it won't be worth looking at.

To Your Update

In regards to the small update you posted, I have two comments. For the BLOBs, are they being stored external to the table or inline? I know Oracle can do that, and I'm guessing PostgreSQL can, but I don't remember about MySQL. That may be a big boon.

As for storing the PDFs in BLOBs in the columns, from what I've been taught that's a little weird and could cause your performance problem. I haven't had a ton of experience in this area.

Have you tried storing the PDFs in one or more other tables and just storing the IDs? That is, normalizing them out? That may help your performance problems.

Those are just stabs in the dark though.