源文地址

作者:Robin Schumacher 和 Gary Carter,EnterpriseDB 公司

原文: Replication is one of the most popular features used in RDBMS’s today. Replication is used for disaster recovery purposes (i.e. backup or warm stand-by servers), reporting systems where query activity is offloaded onto another machine to conserve resources on the transactional server, and scale-out architectures that use sharding or other methods to increase overall query performance and data throughput.

翻译: 今天,“复制”是关系数据库的最受欢迎的功能之一。复制的目的是用于灾难恢复(也就是备份或者叫“热”备用服务器),用于在事务服务器上报告系统,将查询活动转移到其他机器,以节约资源。并且这种分区或其他方法的横向扩展构架将会提升整体的查询性能以及数据的吞吐率。

原文: Replication is not restricted to only the major proprietary databases; open source databases such as MySQL and PostgreSQL also offer replication as a feature. While MySQL has offered built-in replication for a number of years, PostgreSQL replication used to be accomplished via community software that was an add-on to the core Postgres Server. That all changed with the release of version 9.0 of PostgreSQL, which now offers built-in streaming replication that is based on its proven write ahead log technology.

翻译: 复制功能,并不仅仅限制在专利的商业数据库,一些开源的比如:MySQL 和 PostgreSQL 也提供复制的功能。当MySQL已经内建提供复制功能好几年时,PostgreSQL的复制功能还只是社区通过附加组件形式被添加到核心PostgreSQL服务器。但是,直到PostgreSQL9.0版本时,一切都发生改变了,PostgreSQL现在提供内建一种基于WAL(预写日志)技术的流复制功能。

原文: With the two most popular open source databases now providing built-in replication, questions are being asked about how they differ in their replication technologies. What follows is a brief overview of both MySQL and PostgreSQL replication, with a brief compare and contrast of the implementations being performed immediately afterwards.

翻译: 现在,这两个最流行的开源数据库都已经提供了内建的复制功能,那么问题来了:他们两者之间的复制功能有什么区别?下面对MySQL和PostgreSQL作一些简要概述,作一个简单的比较和它们的实现区别之后来简单实践一下。

原文: An Overview of MySQL Replication

Asynchronous replication was introduced into Oracle’s MySQL with version 3.23 and today it remains the primary feature employed by many MySQL users to create scale-out architectures, standby servers, read-only data marts, and more. The various supported MySQL replication topologies include:

• Single master to one slave

• Single master to multiple slaves

• Single master to one slave to one or more slaves

• Circular replication (A to B to C and back to A)

• Master to master

The major replication topology not currently supported in Oracle’s MySQL today is multi-source replication: having one or more master servers feed a single slave.

A graphical view of how MySQL replication functions can be represented as follows:

翻译: MySQL的复制功能概要

异步复制在MySQL的3.23版本时就被引入了。今天,异步复制仍然是许多MySQL用户用于横向扩展、备库、只读数据等等的主要功能。MySQL支持的各种拓扑结构包括:

• 单主,一从

• 单主,多从

• 单主一从到多从

• 环形复制(A - B - C - A)

• 主对主

对于现在的MySQL主要不支持的拓扑结构是“多源复制”,也就是:拥有一个或多个主服务器,但只有一个从库。

MySQL复制功能的图形化表示如下:

原文: Object, data, and security operations run on the master are copied to the master server’s binary log. A user has the option of replicating an entire server, one or more databases, or just selected tables (although filtering by table is only done on the slave). The slave server obtains information from the master’s binary log over the network , copies the commands and/or data, and first applies them to the slave’s relay binary log. That log is then read by another process – the SQL thread – that applies the replicated operations/data to the slave database and its binary log.

翻译: 在主库上,对象、数据和安全的操作全部都会被复制到主库服务器的二进制日志。用户可以选择复制整个服务器、一个或多个数据库、或仅仅选择某些表来进行复制(尽管表过滤只是在从库完成)。从库通过网络传输从主库获取二进制日志信息,复制的命令或数据会首先应用到从库的 relay 二进制日志里。这个 relay 二进制日志,会被其他工具处理 —— SQL 线程,SQL线程会将replay二进制日志里复制的操作命令和数据应用到从库自己的二进制日志里。

原文: Prior to release 5.1, MySQL replication was statement-based, meaning that the actual SQL commands were replicated from the master to one or more slaves. However, certain use cases did not lend themselves to statement-based replication (e.g. non-deterministic function calls) so in MySQL 5.1 row-based replication was introduced. A user now has the option of setting a configuration parameter to use either statement or row-based replication.

翻译: 在5.1版本之前,MySQL 复制是基于语句的,意味着实际的SQL命令是通过复制从主库到一个或多个备库中的。然而,某些使用情况不能让它们基于语句复制(例如,非确定函数调用)。所以,在MySQL 5.1版本提供了基于行复制功能。用户现在可以通过配置相应的参数来决定是使用基于语句还是基于行复制。

原文: The primary bottleneck for busy MySQL replication configurations is the single-threaded nature of its design: replication operations are not multi-threaded at the moment, although MySQL has declared it is coming in a future release. This limitation can cause some slave servers under heavy load to get far behind the master in regards to applying binary log information.

翻译: 在繁忙时的MySQL复制配置里,主要的瓶颈是由于它自身的单线程设计性质:复制操作都不是多线程的,尽管MySQL已经声称将来会将这个多线程特性添加到未来的版本中。这个限制可能导致从库在高负载情况下应用执行二进制日志将会远远落后于主库的日志信息。

原文: Setting up MySQL replication is a fairly painless process. Although various setup procedures exist, in general, the following is a basic outline of how it is done:

• The master and slave servers are identified

• The master server is modified to include a replication security account

• The master server’s MySQL configuration file is modified to enable binary logging. A few other parameters are included as well (e.g. a unique server ID, type of replication such as statement or row-based, etc.)

• The slave server’s MySQL configuration file is modified to include a unique server ID

• The master server is restarted

• The master server’s log file position is recorded

• The master’s data is copied to the slave to initially seed the slave server. This can be done via a cold backup/restore, using the mysqldump utility, locking the master tables and doing a file copy, etc.

• The slave server is restarted

• The MySQL CHANGE MASTER command is executed on the slave server to set the master host name on the slave server as well as other parameters such as the master account username and password, the log file name, and beginning log file position

翻译: 设置MySQL的复制功能是一个非常痛苦的过程。尽管有许多种设置规则存在,但通常,以下是一些通常的步骤:

• 主库和从库都要标识

• 主库添加一个用于安全复制的账号

• 在主库中修改配置文件以开启二进制日志功能,还有一些其他的参数也要开启。 (比如:唯一的服务器ID, 复制的类型,等等)

• 在从库中修改配置文件,添加一个唯一的服务器ID。

• 重启主服务器

• 主库的日志文件位置被记录。

• 主库的数据将被初始化到从库中。这是通过冷备份/恢复完成的,例如使用:mysqldump 工具,锁住主库的表然后完成文件复制,等等。

• 重启从库

• MySQL CHANGE MASTER 命令是在从库服务器上执行设置将主库的主机名以及其他的参数如:主库的用户名,密码,日志文件名以及日志文件起始位置设置到从库。

原文: Once set up, MySQL replication is quite reliable. Being asynchronous in nature, however, there are use cases that could result in data loss between a master and slave. To help combat these situations, MySQL 5.5 introduced semi-synchronous replication where a pending transaction is sent from a master to a slave, but not committed on the slave; it merely ‘lands’ safely on the slave to be run as soon as possible. Once the master is notified that the transaction is safely recorded on the slave, then the transaction is committed on the master.

翻译: 一旦设置完毕,MySQL复制是相当可靠的。因为本质上是异步的,所以有一些使用情况下还是会导致在主库和从库之间丢失数据。为了帮助解决这个问题,MySQL 5.5 提供了“半同步”复制,它会将事务从主库发送到从库,但在从库并未提交(只是记录到relay二进制日志)。它仅仅是尽可能快速地被安全送到从库。一旦主库收到事务已经被安全地记录在从库上,然后主库才会正式提交事务。

原文: In terms of MySQL replication limitations and missing features, besides the already mentioned single threaded nature of the implementation and the inability to perform multi-source replication, other wish-list items include a full synchronous option, conflict detection and resolution, time-delayed replication, changing the binary log to a storage engine, better replication filtering on the master, global statement ID’s, and graphical tools to manage replication functions.

翻译: MySQL在复制功能方面有许多限制以及不足,除了之前已经提过的单线程性质,还有不能进行多源进制,其他的希望加入的特性包括:完全同步选项,冲突检测及解决办法,延时复制,更改二进制日志的存储引擎,在主库上更好的复制过滤功能,全局语句的ID标识以及管理复制函数的图形化工具。

原文: There are third-party providers of MySQL replication solutions that overcome some of the current shortcomings in what is provided out-of-the-box with MySQL. One example is Continuent’s Tungsten product.

For more information about Oracle’s MySQL replication, see: http://dev.mysql.com/doc/refman/5.5/en/replication.html.

翻译: 有许多第三方提供即开即用的MySQL复制解决方案来克服当前MySQL内建复制功能的缺点。一个例子是:Continuent’s Tungsten 的数据库产品。 更多关于MySQL复制功能,请看:MySQL5.5复制

原文: An Overview of PostgreSQL Replication

PostgreSQL replication is based on a mature and long used technology called write ahead log (WAL) archiving. WAL technology has been in use since version 7.1 and has been used in features such as backup and restore and warm standby servers (i.e. slave servers offline kept in synch with the master to step in during crash recovery) for high availability.

PostgreSQL 9.0 introduced significant enhancements producing extremely fast WAL processing that results in near real-time replication and hot standby capabilities for slave servers. The supported PostgreSQL replication topologies include:

• Single master to one slave

• Single master to multiple slaves

A graphical view of how PostgreSQL replication functions can be represented as follows:

翻译: PostgreSQL的复制功能概要

PostgreSQL 复制功能是基于一个成熟并且被长时间使用的技术,叫WAL(预写日志)归档。WAL技术已经在PostgreSQL 7.0版本被使用,并且是用在备份/恢复和热备用服务器的高可用中。(比如,从库离线【注,这里我认为是作者写错了,应该是online在线,而不是offline离线】保持与主库同步,并且在主库崩溃时介入以进行恢复)。

PostgreSQL9.0 版本显著地改进以产生极快的WAL日志处理,结果就是一个几乎是近实时复制并且是双机(主从)热备功能的从库。PostgreSQL支持的复制拓扑结构包括:

• 单主单从

• 单主多从

A graphical view of how PostgreSQL replication functions can be represented as follows:

PostgreSQL复制功能的图形化表示如下:

原文:

All objects and data (including schema) and security operation executed on the master are written to the WAL log directly on the slave machine for safety (avoiding complete data loss in the event of a catastrophic master failure). WAL also ensures that no transaction is committed on the master until a successful write of the WAL log has occurred. No filtering is currently possible (although replication with filtering is possible with the xDB Replication Server from EnterpriseDB) so a complete copy of the master is replicated on the slave.

翻译: 所有的对象和数据(包括模式)和在主库安全的操作都被写到WAL日志会立即安全地同步地从库(完全避免了在主库发生灾难时导致的数据丢失)。WAL也会确保在主库上不会有事务被提交,直到该事务已经成功地写到入WAL日志。当前版本并没有选择性复制功能(尽管选择性复制功能可能在 EnterpriseDB 的 xDB复制服务器提供了),所以主库会完整地被复制到从库。

原文: The slave then applies the WAL log by directly rewriting the raw table data on disk, which is much faster than statement based replication. It is also safer since statements such as:

INSERT INTO table (column) VALUES (SELECT function());

may have unexpected and inconsistent results if the function returns different values on different servers - perhaps because it involves a generated timestamp or uuid.

翻译: 从库然后直接应用磁盘上的WAL日志来重写元数据,这点比基于语句复制更加快速。并且,遇到以下这种SQL时都是安全的:

INSERT INTO table (column) VALUES (SELECT function());

如果这个function函数在主从两个不同的服务器之间产生不同的值,这可能有意想不到和不一致的结果。比如可能它调用一个产生UUID或Timestamp的函数。

原文: The primary limitations of PostgreSQL replication are topology based. It cannot currently do cascading replication or filter tables by rows for replication. Again, these are capabilities available in a separate replication solution from EnterpriseDB called xDB Replication Server.

翻译: PostgreSQL复制的主要限制是它的拓扑结构。它目前不能进行联级复制,也不能通过行来过滤表进行复制。同样,这些功能可以通过 EnterpriseDB 里单独提供一种叫 xDB 复制服务器来解决。

原文: Setting up PostgreSQL replication is very straightforward. WAL logging is always enabled with minimal configuration needed by the user to utilize replication. The basic process to get replication going is:

• The master and slave servers are identified

• The postgresql.conf file on the master is edited to turn on streaming replication

• The pg_hba.conf file on the master is edited in order to let the slave connect

• The recovery.conf and postgresql.conf files on the slave are edited to start up replication and hot standby

• The master is shutdown and the data files are copied to the slave

• The slave is started first

• The master is started

翻译: 设置PostgreSQL复制是非常简单的。开启WAL日志是用户使用复制的最小化配置。使用复制的基本流程如下:

• 主库和从库都要标识

• 编辑在主库的postgresql.conf 配置文件里,以开启流复制

• 编辑在主库的pg_hba.conf 配置文件,以让从库连接到主库

• 编辑在从库的 recovery.conf 配置文件和postgresql.conf 配置文件,以开始复制和热备用

• 关闭主库,并且复制data目录所有文件到从库

• 先启动从库

• 再启动从库

原文: The secret sauce to PostgreSQL 9.0’s extremely reliable WAL based replication is a set of enhancements to efficiently stream very small WAL segments compared to earlier versions. Like MySQL there are cases where data loss could occur – however, depending on how you configure the system, your hardware architecture, and load, its possible the data loss could be as small as a single transaction.

PostgreSQL does not currently have native synchronous replication. However, there are multiple replication options available from other community and third-party software providers. PostgreSQL offers multiple solutions for multi-master replication, including solutions based on two phase commit. Offerings include Bucardo, rubyrep, PgPool and PgPool-II and Tungsten Replicator as well as some proprietary solutions. Another promising approach, implementing eager (synchronous) replication is Postgres-R, however it is still in development. Yet another project implementing synchronous replication is Postgres-XC, which is a shared-nothing, transactional scale-out solution that is still under development.

For more information on PostgreSQL replication see:

PostgreSQL Documentation: http://www.enterprisedb.com/docs/en/9.0/pg/high-availability.html

Bucardo: http://bucardo.org/wiki/Bucardo

PgPool-II: http://pgpool.projects.postgresql.org/

Tungsten Replication: http://www.continuent.com/community/tungsten-replicator

翻译: PostgreSQL9.0非常可靠的秘密武器在于基于WAL日志复制,它是一套增强的高效流,利用非常小的一段WAL来进行早期版本的比较。像MySQL在有些情况下可能会导致数据丢失,然而,这取决于你的系统、你的硬件构架以及负载如何,它丢失的数据可能非常小,如:一个事务的数据。

PostgreSQL目前还没有本地同步复制。然而,有许多种通过社区和第三方软件提供商提供的可用复制方案。PostgreSQL 提供许多“多主复制”的解决方案,包括基于两阶段提交的解决方案。包括:Bucardo, rubyrep, PgPool and PgPool-II and Tungsten Replicator 以及其他一些专有的解决方案。另一种非常有前景的解决办法,实现 饥饿(同步)复制的Postgres-R产品,然而,它目前还在开发中。还有另一个项目实现同步复制的是Postgres-XC,它是一个无共享、事务横向扩展的解决方案,目前也是处于开发中。

更多关于PostgreSQL复制,请看: PostgreSQL 文档: http://www.enterprisedb.com/docs/en/9.0/pg/high-availability.html

Bucardo: http://bucardo.org/wiki/Bucardo

PgPool-II: http://pgpool.projects.postgresql.org/

Tungsten Replication: http://www.continuent.com/community/tungsten-replicator

原文: A Brief Compare and Contrast of MySQL and PostgreSQL Replication

Those wanting to use an open source database for a particular application project that requires replication have two good choices in MySQL and PostgreSQL. But, the question naturally arises, which should be used? Is one just as good as the other?

As demonstrated above, there are both feature and functional differences between how MySQL and PostgreSQL implement replication. However, for many general application use cases, either MySQL or PostgreSQL replication will serve just fine; technically speaking, from a functional and performance perspective, it won’t matter which solution is chosen. That said, there still are some considerations to keep in mind in deciding between the different offerings. Some of these include the following:

• Oracle’s MySQL offers both statement and row-based replication, whereas PostgreSQL only uses the latter based on write ahead log information. There are pro’s and con’s to using statement-based replication, which MySQL has documented here: http://dev.mysql.com/doc/refman/5.5/en/replication-sbr-rbr.html. It is generally acknowledged that row or WAL-based replication is the safest and most reliable form of replication. It does, however, result in larger log files for MySQL than the statement-based option does.

• MySQL currently supports more replication topologies than PostgreSQL (e.g. ring, etc.). However PostgreSQL does have a number of community supported replication offerings that help close this gap (e.g. Bucardo’s master-to-master solution).

• In regard to data loss, MySQL 5.5 offers the semi-synchronous option, which helps minimize the risk of master-slave synchronization problems due to a master server going down. For PostgreSQL, a full synchronous replication option is in development and scheduled for release sometime in 2011.

• As to replication filtering, MySQL provides filtering on the slave server, whereas with PostgreSQL, no filtering is available; in other worlds, the entire database from the master is replicated to the slave. With MySQL, all the information is sent, but then options exist to selectively apply the replicated events on the slave. However, as the MySQL binary log is not used for crash recovery purposes in the same way as PostgreSQL’s WAL is, a user can configure a MySQL master so only certain databases are logged and, in that sense, a filter for the master server is available.

• Both MySQL and PostgreSQL replication are single-threaded at the current time.

• With respect to monitoring replication, MySQL provides a number of SHOW commands to understand the state of replication between a master and slave. To date, PostgreSQL offers functions to compute the differences in log positions between the master and slave servers, but that is all that is currently provided in 9.0.

• For failover and load balancing, the PostgreSQL community provides pgPool, which is middleware that provides connection pooling, load balancing, failover, and more between replicated servers. MySQL 5.5 supports connection pooling in the Enterprise edition, but failover and load balancing must be handled via a third-party product or custom development.

翻译: 简单对比一下MySQL和PostgreSQL复制

那些想为一个需要复制功能的特定应用的项目使用开源数据库的人,MySQL和PostgreSQL是两个很好的选择。但是,问题自然而然产生了,我们应该使用哪个?还是说这两个一样好?

综上所述,MySQL和PostgreSQL都有复制功能,但是有不同的实现。然而,对于一般的应用来说,无论是MySQL还是PostgreSQL复制功能会工作得挺好;从技术上说,以及从功能和性能来看,它不会不管要选哪个数据库。这意味着,在不同产品之间还有一些值得注意的事项,其中包括以下内容:

• MySQL提供基于语句和基于行的复制,而PostgreSQL只有基于WAL日志信息。有赞成也有反对使用基于语句复制的,MySQL有文档介绍:http://dev.mysql.com/doc/refman/5.5/en/replication-sbr-rbr.html 。一般认为,基于行或基于WAL 复制是最安全和最可靠的复制形式。的确如此,然而,这会导致比基于语句复制的形式产生更大的日志文件。

• MySQL目前比PostgreSQL支持更加多的复制拓扑结构(比如:环形等)。然而,PostgreSQL有许多种社区支持的复制选项,这就缩小了这个因拓扑结构种类而导致的距离(比如 Bucardo 的 主-主解决方案)

• 考虑到数据丢失,MySQL 5.5 提供了半同步选项,这有助于减小因主库崩溃而导致主从同步问题的风险。对于PostgreSQL,完全同步复制特性正在开发,并且计划于2011年正式可用。

• 对于过滤复制,MySQL 提供了在从库过滤,而在PostgreSQL,并不能使用过滤,换句话说,就是完整地将数据库从主库复制到从库。对于MySQL,所有信息都会被发送到从库,但如果开启过滤复制,从库会有选择地将事件应用到从库。然而,对于MySQL二进制日志并不是用于灾难恢复的,但PostgreSQL的WAL是可以用于灾难恢复的,用户可以配置MySQL主库指定哪些数据库会被记录到日志,在这种意义上,过滤器对于主库是可用的。

• 目前,MySQL和PostgreSQL复制都是单线程的

• 对于复制的监控,MySQL提供了许多 SHOW 命令去了解主从复制的状态。至今,PostgreSQL提供一些函数去计算主从日志位置的区别,但是当前只是在PostgreSQL9.0版本才提供的。

• 对于故障切换和负载均衡,PostgreSQL社区提供pgPool的中间件,pgPool提供连接池,负载均衡,故障切换和更多种复制形式。MySQL 5.5 在企业版里提供连接池,但是对于故障切换和负载均衡必须通过第三方产品或定制开发。

原文: Conclusions

As was previously stated, for many application use cases, both Oracle’s MySQL and PostgreSQL replication will be an equally good choice. The best way to determine which is right for you is to download both and put each through a comprehensive evaluation.

You can download Oracle’s MySQL at http://www.mysql.com/downloads/, while both community and EnterpriseDB’s offerings of PostgreSQL can be found at: http://www.enterprisedb.com/products/download.do.

By Robin Schumacher and Gary Carter, www.enterprisedb.com

18 Nov 2010

翻译: 结论

正如前面所指出一样,对于许多应用程序的用例,MySQL和PostgreSQL都是非常好的选择。要决定哪一个最适合你,最好的办法就是同时下载它们两个,然后使用进行综合评估。

你可以在 http://www.mysql.com/downloads/ 下载MySQL,PostgreSQL的社区版和EnterpriseDB提供的PostgreSQL版本都可以在这个地址里找到:http://www.enterprisedb.com/products/download.do

作者:Robin Schumacher and Gary Carter, www.enterprisedb.com 18 Nov 2010


注:版权是原作者所有,我只是出于业余爱好进行翻译。

这也是我的处女版翻译文章,有许多不足或表达不清晰的地方,恳请各位指出,我会加以修改,一起为开源、为PostgreSQL作出一份力量。 于2014年12月9号星期二晚,广州