PhxSQL is a high-availability and strong-consistency MySQL cluster built on Paxos and Percona.
Authors: Junchao Chen (junechen@tencent.com), Haochuan Cui (lynncui@tencent.com), Duokai Huang (mariohuang@tencent.com), Ming Chen (mingchen@tencent.com) and Sifan Liu (stephenliu@tencent.com)
#PhxSQL features:
- high resilience to nodes failure and network partition: the cluster works well when more than half of cluster nodes work and are interconnected.
- high availability by automatic failovers.
- guarantee of data consistency among cluster nodes: replacing loss-less semi-sync between MySQL master and MySQL slaves with Paxos, PhxSQL ensures zero-loss binlogs between master and slaves.
- easy deployment and easy maintenance: PhxSQL, powered by in-house implementation of Paxos, has only 4 components including MySQL and doesn't depend on zookeeper or etcd for anything. PhxSql supports automated cluster membership hot reconfiguration.
- complete compliance with MySQL and MySQL client.
This project includes
- Source codes
- Third party submodules
- Binaries in Ubuntu 64bit system.
It depends the other projects which are also published by tencent-wechat( phxpaxos, phxrpc, libco ). You can download or clone them with --recurse-submodule.
phxpaxos: http://github.com/tencent-wechat/phxpaxos
phxrpc: http://github.com/tencent-wechat/phxrpc
libco: http://github.com/tencent-wechat/libco
If you prefer using binaries directly, just skip this part.
- PhxSQL
- phxsqlproxy
- phxbinlogsvr
- percona
- phx_percona
- plugin
- phxsync_phxrpc
- semisync
- third_party
- glog
- leveldb
- protobuf
- phxpaxos
- colib
- phxrpc
- tools
- phxrpc_package_config
| Name | Introduction |
|---|---|
| phxsqlproxy | In charge of access and handle request |
| phxbinlogsvr | In charge of binlog synchronization, mastership and membership management |
| percona | Source code of percona5.6.31-77.0 |
| phx_percona/plugin/phxsync_phxrpc | A plugin which is used for MySQL to post binlog to phxbinlogsvr |
| phx_percona/plugin/semisync | We modified some plugin APIs in MySQL, This is a compatible version of semisync |
| third_party/glog | Directory to store GLOG library |
| third_party/leveldb | Directory to store LevelDB library |
| third_party/protobuf | Directory to store Google Protobuf 3.0+ library |
| third_party/phxpaxos | Directory to store PhxPaxos library |
| third_party/colib | Directory to store Libco library |
| third_party/phxrpc | Directory to store Phxrpc library |
PhxSQL needs 6 third party libs(glog, leveldb, protobuf, phxpaxos, colib, phxrpc). Please install them in phxsql/third_party directory or just link to third_party.
NOTE: Please make sure -fPIC is added while executing configure in GLOG and Protobuf as well as specifying --prefix=/the/current/absolute/path.
For example: ./configure CXXFLAGS=-fPIC --prefix=/home/root/phxsql/third_party/glog.
Then download percona-server-5.6.31-77.0.tar.gz
Move percona-server-5.6\_5.6.31-77.0 to PhxSQL directory, rename or link as 'percona'
(NOTE: Only percona-server-5.6_5.6.31-77.0 is available)
- Execute
./autoinstall.sh && make && make install - Execute 'make package' to generate a tar.gz package so you can transfer to your online hosts.
(NOTE: We will put the binaries in install_package/sbin, configuration files in install_package/tools/etc_template, install scripts in install_package/tools. The 'make package' command will pack 'install_package' into 'phxsql-$version.tar.gz'. Please specify -prefix=/the/path/you/want/to/install while executing ./autoinstall.sh)
PhxSQL need to be run on more than 2 hosts. We suggest N >= 3 and N is an odd number(N means the number os hosts)
-
Transfer phxsql.tar.gz to all of the hosts you want to install. Then do as the following step:
-
Execute
tar -xvf phxsq.tar.gz . -
Enter phxsql/tools, Execute
python install.py --helpto get the help of installation.(For example:
python2.7 install.py -i"your_inner_ip" -p 54321 -g 6000 -y 11111 -P 17000 -a 8001 -f/tmp/data/)
-
-
After executing 'install.py' on all the hosts, Execute './phxbinlogsvr_tools_phxrpc -f InitBinlogSvrMaster -h"ip1,ip2,ip3" -p 17000' in any one hosts. 17000 should be replaced into the port which phxbinlogsvr is listening.
-
The cluster is active while the message shows master initialization is finished.
-
You can execute some SQLs to check the status of cluster through
mysql -uroot -h"your_inner_ip" -P$phxsqlproxy_port
- Enter phxsql/tools/.
- Execute
test_phxsql.sh phxsqlproxy_port ip1 ip2 ip3
PhxSQL have 3 configuration files in total.
NOTE:Modify tools/etc_temlate/my.cnf before installation, Modify etc/my.cnf after installation
| Section name | Key name | comment |
|---|---|---|
| AgentOption | AgentPort | Port for the connection of binlogsvr and MySQL |
| EventDataDir | Directory where to store the binlogsvr data | |
| MaxFileSize | File size per data of phxbinlogsvr, the unit is B | |
| MasterLease | Lease length of master, the unit is second | |
| CheckPointTime | The data before CheckPointTime will be deleted by phxbinlogsvr, but it will not be deleted if some other PhxSQL nodes have not learned yet, the unit is minute | |
| MaxDeleteCheckPointFileNum | The maximum number of files deleted each time by phxbinlogsvr | |
| FollowIP | Enabled if it is a follower node and will learn binlog from this FollowIP, this node will not vote |
|
| PaxosOption | PaxosLogPath | Directory where to store paxos data |
| PaxosPort | Port for paxos to connect each other | |
| PacketMode | The maximum size of paxos log for PhxPaxos,1 means 100M, but the network timeout will be 1 minute, 0 means 50M and network timeout is 2s(changed in dynamic). | |
| Server | IP | IP for phxbinlogsvr to listen |
| Port | Port for phxbinlogsvr to listen | |
| LogFilePath | Directory to store log | |
| LogLevel | Log level of phxbinlogsvr |
| Section name | Key name | comment |
|---|---|---|
| Server | IP | IP for phxsqlproxy to listen |
| Port | Port for phxsqlproxy to listen | |
| LogFilePath | Directory to store log | |
| LogLevel | Log level of phxbinlogsvr | |
| MasterEnableReadPort | Enable readonly-port in master node |
phxsqlproxy is the acces layer os PhxSQL, all requests will be sent to phxsqlproxy and be redirected to MySQL.
It equals the port configured in phxsqlproxy.conf.
Every requests will be sent to a master node to excute through this port.
It equals MasterPort + 1. You can also specify it by setting SlavePort = xxxxx. in phxsqlproxy.conf.
Every requests will be executed on the local MySQL. A master node will make a redirection to another slave nodes if MasterEnableReadPort = 0(this will save the CPU/IO resource for write requests)
- Using
mysql -u$user -h$phxsqlproxyip -P$phxsqlproxyport -p$pwdto connect phxsqlproxy - Execute SQL command.
$phxsqlproxyipcan be any one IP of the clusters and$phxsqlproxyportcan be one ofMasterPortorSlavePort.
PhxSQL provides a tool phxbinlogsvr_tools_phxrpc to help the mangerment of PhxSQL.
PhxSQL cluster needs 1 MySQL admin accounts and 1 synchronization account. The default admin account is (root, "" ), the default synchronization account is ( replica, replica123 ), They can be modified( and only be modifyed ) through phxbinlogsvr_tools_phxrpc. DON'T DO THIS MANUALLY.
Following is some commands you may used frequently.
**Function:**Get the current master info from quorum nodes( IP and timeout ).
Arguments:
- Host: Any one IP of clusters nodes
- Port: Port which phxbinlogsvr is listening. like
17000
phxbinlogsvr_tools -f SetMySqlAdminInfo -h <host> -p <port> -u <admin username> -d <admin pwd> -U <new admin username> -D <new admin pwd>
Function: Set the user and password of admin account.
Arguments:
- Host: Any one IP of clusters nodes
- Port: Port which phxbinlogsvr is listening. like
17000 - Admin username: Current account user( default is
root) - Admin pwd: Current account password( default is
"") - New admin username: New user
- New admin pwd: New password
phxbinlogsvr_tools -f SetMySqlReplicaInfo -h <host> -p <port> -u <admin username> -d <admin pwd> -U <new replica username> -D <new replica pwd>
Function: Set the user and password of synchronization account.
参数:
- Host: Any one IP of clusters nodes
- Port: Port which phxbinlogsvr is listening. like
17000 - Admin username: Current account user( default is
root) - Admin pwd: Current account password( default is
"") - New replica username: New user
- New replica pwd: New password
Function: Membership of this cluster, all IPs and Ports included.
Arguments:
- Host: Any one IP of clusters nodes
- Port: Port which phxbinlogsvr is listening. like
17000
Execute phxbinlogsvr_tools_phxrpc -f RemoveMember -h<host> -p<port> -m <ip_of_nodeA> to delete node A.
Once it is succesfully executed, A will not learn binlog after a small period.
- Execute
phxbinlogsvr_tools -f AddMember -h<host> -p<port> -m <ip_of_nodeA>to add node A into the membership. - Install PhxSQL on A.
- A will begin to learn data after installation is finished.
- Copy a snapshot of MySQL from any other nodes to A.
- Kill phxbinlogsvr and access MySQL through the local port( or socket ). then execute
set super_read_only = 0; set read_only = 0; - Dump the snapshot into MySQL.
- A will begin to work after a while.
Phxbinlogsvr will pull the checkpoint in another node to reboot during reinstallation. It will self-kill after pulling is over(to make sure the consistency). You can reboot phxbinlogsvr after a message like "All sm load state ok, start to exit" appears.
phxbinlogsvr will stop working if a data problem arise in MySQL. We suggest you to check the status of MySQL.
CPU : Intel(R) Xeon(R) CPU E5-2420 0 @ 1.90GHz * 24
Memory : 32G
Disk : SSD Raid10
Master -> Slave : 3 ~ 4ms
Client -> Master : 4ms
sysbench --oltp-tables-count=10 --oltp-table-size=1000000 --num-threads=500 --max-requests=100000 --report-interval=1 --max-time=200
| Client Threads | Clusters | Test sets | |||||
|---|---|---|---|---|---|---|---|
| insert.lua (100% write) | select.lua (0% write) | OLTP.lua (20% write) | |||||
| QPS | Costs | QPS | Costs | QPS | Costs | ||
| 200 | PhxSQL | 5076 | 39.34/56.93 | 46334 | 4.21/5.12 | 25657 | 140.16/186.39 |
| 200 | MySQL semi-sync | 4055 | 49.27/66.64 | 47528 | 4.10/5.00 | 20391 | 176.39/226.76 |
| 500 | PhxSQL | 8260 | 60.41/83.14 | 105928 | 4.58/5.81 | 46543 | 192.93/242.85 |
| 500 | MySQL semi-sync | 7072 | 70.60/91.72 | 121535 | 4.17/5.08 | 33229 | 270.38/345.84 |
NOTE:The 2 costs numbers means average and 95% percentile