作者简介:高文博,OCM,从事ORACLE专业15年,主要服务于各大升级运营商。现任职北京海天起点技术服务股份有限公司山东办事处主任,为山东省内企事业客户提供海量级ORACLE数据库及相关产品的运维支持,参与出版《Oracle云管理平台——企业管理器12c实战指南》一书。
背景:
在本文之前,写过一篇使用了ORACLE GOLDENGATE DIRECT LOAD 数据初始化来进行OGG数据同步的操作文档。在测试过程中没有问题,但在实际生产中,当数据量上升到一定数量级,初始化速度持续下降的,根据测试160万数据到后期每秒初始化仅不到20条,实际测试数据如下:
根据测试数据,1张58万,1张161万的表,实际初始化时间超过15个小时,考虑到上线后如有数据问题等情况需重新初始化,根本不能满足生产要求,因此必须考虑采用其他方式进行初始化。主要思路有3条
-
使用并行进程,采用多个抽取、复制进程,在进程参数里设置filter条件,将大表分散成若干小表进行初始化
-
使用第三方工具进行初始化
-
使用数据库自己的工具将数据导出到文本,再从目标端将文本导入进行初始化
方案测评
首先第一个方案,在其他环境下可以实施,但在当前数据库环境下无法实施,原因主要有两个:
-
当前源端ORACLE数据库,已有这两张表的抽取进程,并且按照要求只能使用当前抽取进程。
-
在测试中发现因为在DJ_SMZXX表中SMZXH主键是varchar2(20)类型,根据主键对数据进行过滤,就必须将varchar2进行隐式转换成number类型。在转换过程中遇到了OGG的bug OGG GoldenGate : Numcnv_getnumeric – Buffer Overflow When Using Filter “Where” in a Table Statement (文档 ID 1376606.1)。也就是说当在当前的ogg版本中,数字转换时最大为19位,而SMZXH是20位。该BUG需要到13.1版本才能fix。
因此,第一个方案被放弃,其他场景下该方案应能对初始化的速度有所提升。
其次第二个方案,尝试了PLSQL DEVELOPER/TOAD/NAVICAT/ORACLE TO MYSQL等第三方工具,能进行同步的工具都要求能同时连接源端和目标端两套库,当前网络环境ORACLE和MYSQL数据库分别处于内外网,需要不同的跳板机,无法在一台机器上进行同步;或者这些工具在进行数据导出导入时效率太慢(PLSQL DEVELOPER);而KETTLE作为ETL的主流工具,没有时间深入研究,该方案也只能放弃。
最后是第三个方案,其实也就是使用ORACLE的UTL_FILE包,将表数据导出为文本文件,再从目标端使用mysql的LOAD DATA包,将文本文件导入,经过初步测试,效率非常高,决定使用该方案。
部署OGG for MYSQL
下载介质并安装
从http://edelivery.oracle.com选择golden gate for non oracle database11.2.0.1,继续后可以选择for mysql 5.x下载。
将介质上传至服务器指定目录,root用户解压,进入ggsci
1 2 3 4 5 6 7 8 9 10 11 12 |
GGSCI (edudb3) 3> create subdirs Creating subdirectories under current directory /u01/ogg Parameter files /u01/ogg/dirprm: already exists Report files /u01/ogg/dirrpt: created Checkpoint files /u01/ogg/dirchk: created Process status files /u01/ogg/dirpcs: created SQL script files /u01/ogg/dirsql: created Database definitions files /u01/ogg/dirdef: created Extract data files /u01/ogg/dirdat: created Temporary files /u01/ogg/dirtmp: created Stdout files /u01/ogg/dirout: created |
配置MYSQL端mgr
1 2 3 4 5 6 |
GGSCI (edudb3) 2> edit params mgr PORT 7809 DYNAMICPORTLIST 7810-7820, 7830 AUTOSTART ER R* AUTORESTART ER R*, RETRIES 4, WAITMINUTES 4 STARTUPVALIDATIONDELAY 5 |
源端(ORACLE)配置
查看MANAGER进程
1 2 3 4 5 6 7 |
GGSCI (edudb1) 19> view params mgr PORT 7809 userid ggate@source,password ggate AUTOSTART ER * AUTORESTART ER * RETRIES 3 PURGEOLDEXTRACTS /u01/app/oracle/product/ggate/dirdat/*; |
查看源端抽取进程
1 2 3 4 5 6 7 |
GGSCI (edudb1) 18> view params extra1 extract extra1 userid ggate@source,password ggate exttrail /u01/app/oracle/product/ggate/dirdat/sm discardfile /u01/app/oracle/product/ggate/dirdat/extdac.dsc,append,megabytes 1000 tranlogoptions altarchivelogdest instance edudb /u01/app/oracle/oradata/edudb/archivelog table smbs.dj_smzxx; |
使用defgen生成表定义文件
异构数据库需使用defgen生成sourcedefs/targetdefs文件
1 2 3 4 |
GGSCI> edit params defgen DEFSFILE /u01/app/oracle/product/ggate/dirdef/mt.def USERID ggate@source,password ggate table smbs.dj_smzxx; |
到$GGATE_HOME下使用defgen生成定义文件
./defgen paramfile ./dirprm/defgen.prm NOEXTATTR –//NOEXTATTR,不同版本ogg,不同表结构,需要此参数,11.2版本以后支持此参数
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 |
*********************************************************************** Oracle GoldenGate Table Definition Generator for Oracle Version 11.2.1.0.3 14400833 OGGCORE_11.2.1.0.3_PLATFORMS_120823.1258 Linux, x64, 64bit (optimized), Oracle 11g on Aug 23 2012 16:58:29 Copyright (C) 1995, 2012, Oracle and/or its affiliates. All rights reserved. Starting at 2018-02-13 15:33:40 *********************************************************************** Operating System Version: Linux Version #1 SMP Thu Nov 7 17:01:44 PST 2013, Release 3.8.13-16.2.1.el6uek.x86_64 Node: edudb1 Machine: x86_64 soft limit hard limit Address Space Size : unlimited unlimited Heap Size : unlimited unlimited File Size : unlimited unlimited CPU Time : unlimited unlimited Process id: 16104 *********************************************************************** ** Running with the following parameters ** *********************************************************************** DEFSFILE /u01/app/oracle/product/ggate/dirdef/mt.def USERID ggate@source,password ***** table test.m1; Retrieving definition for SBMS.DJ_SMZXX Definitions generated for 1 table in /u01/app/oracle/product/ggate/dirdef/smzxx.def |
将生成的定义文件复制到目标端
1 2 3 4 5 6 7 |
[root@edudb1 ~]# scp /u01/app/oracle/product/ggate/dirdef/smzxx.def 192.168.0.103:/u01/dirdef/ The authenticity of host '192.168.0.103 (192.168.0.103)' can't be established. RSA key fingerprint is f5:b2:a0:95:62:1c:58:62:6d:b6:55:c1:ae:26:b3:ec. Are you sure you want to continue connecting (yes/no)? yes Warning: Permanently added '192.168.0.103' (RSA) to the list of known hosts. root@192.168.0.103's password: smzxx.def 100% 787 0.8KB/s 00:00 |
新加一个传送进程
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
GGSCI> add extract MYSQL, exttrailsource /u01/app/oracle/product/ggate/dirdat/sm --这里是对应抽取进程产生的trail文件 EXTRACT added. GGSCI> edit params mysql extract MYSQL SETENV (NLS_LANG =AMERICAN_AMERICA.ZHS16GBK) userid ggate@source,password ggate PASSTHRU RMTHOST 192.168.0.103, MGRPORT 7809 RMTTRAIL /u01/ogg/dirdat/xy DISCARDFILE /u01/app/oracle/product/ggate/dirrpt/mysql.dsc, APPEND, MEGABYTES 1000 table smbs.dj_smzxx; GGSCI (edudb1) 35> ADD RMTTRAIL /u01/ogg/dirdat/xy, EXTRACT MYSQL RMTTRAIL added. |
目标端(MYSQL)配置
查看MYSQL数据库字符集
1 2 3 4 5 6 7 8 9 10 11 12 13 |
mysql> show variables like '%char%'; +--------------------------+----------------------------+ | Variable_name | Value | +--------------------------+----------------------------+ | character_set_client | utf8 | | character_set_connection | utf8 | | character_set_database | gbk | | character_set_filesystem | binary | | character_set_results | utf8 | | character_set_server | utf8 | | character_set_system | utf8 | | character_sets_dir | /usr/share/mysql/charsets/ | +--------------------------+----------------------------+ |
如果字符集不对,导入的中文均未乱码,需要调整字符集。方法如下:
1 |
vi /etc/my.cnf |
在[mysqld]上面加入下面两句话
1 2 3 |
[sql] view plain copy [client] default-character-set=utf8 |
在[mysqld]最下面加入下面几句话
1 2 3 4 |
[sql] view plain copy default-storage-engine=INNODB character-set-server=utf8 collation-server=utf8_general_ci |
然后保存退出
1 |
Service mysqld restart |
登录后运行
1 |
alter database test default character set gbk; --test为数据库名 |
创建表及索引
在正确的字符集下创建支持gbk的innodb表及主键
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 |
CREATE TABLE test.dj_smzxx ( SMZXH VARCHAR(20) NOT NULL , ZJLX_DM CHAR(3) NULL , ZJHM VARCHAR(20) NULL , ZJMC VARCHAR(200) NULL , ZJXX VARCHAR(1000) NULL , YXQ_Q DATE NULL , YXQ_Z DATE NULL , LRR_DM CHAR(11) NOT NULL , LRRQ DATE NOT NULL , XGR_DM CHAR(11) NOT NULL , XGRQ DATE NOT NULL , YXBZ CHAR(1) NOT NULL , SJHM VARCHAR(11) NULL , SMRZXT_DM VARCHAR(10) NULL , SEX VARCHAR(4) NULL , BORN VARCHAR(20) NULL , NATION VARCHAR(10) NULL , POLICE VARCHAR(200) NULL , ADDRESS VARCHAR(200) NULL , YLSJHM VARCHAR(11) NULL ) engine=innodb DEFAULT CHARSET=gbk; ALTER TABLE dj_smzxx ADD PRIMARY KEY(smzxh); |
要创建innodb引擎的表。
要注意ORACLE和MYSQL的对应数据类型。ORACLE中的number对应MYSQL的int,ORACLE中的varchar2对应MYSQL的varchar,建表中需要进行数据类型的转换。
在实际操作中,生产表DJ_NSR_SMZ有一个字段是DJXH NUMBER(20),20位数字,一开始我对应了int,但在mysql中int类型只有10位数字
1 2 3 4 5 6 7 8 9 10 |
TINYINT TINYINT UNSIGNED -128 – 127 0 - 255 SMALLINT SMALLINT UNSIGNED -32768 – 32767 0 - 65535 MEDIUMINT MEDIUMINT UNSIGNED -8388608 – 8388607 0 - 16777215 INT 或 INTEGER INT UNSIGNED 或 INTEGER UNSIGNED -2147483648 - 2147483647 0 - 4294967295 BIGINT -9223372036854775808 - 9223372036854775807 BIGINT UNSIGNED 0 - 18446744073709551615 |
后来使用bigint也报错,是因为bigint默认19位,一位是符号位,只有在创建表时指定UNSIGNED关键字才能达到20位数字。
1 |
create table test(id bigint(20) UNSIGNED not null); |
测试连接mysql数据库
1 2 3 4 |
GGSCI> dblogin sourcedb test@localhost:3306,userid root,password root 2018-02-13 16:47:55 WARNING OGG-00769 MySQL Login failed: . SQL error (2002). Can't connect to local MySQL server through socket '/tmp/mysql.sock' (2). ERROR: Failed to connect to MySQL database engine for HOST localhost, DATABASE test, USER root, PORT 3306. |
没有/tmp/mysql.sock,实际位置在/etc/my.cnf里有,是在/var/lib/mysql/mysql.sock
修改后,本地mysql登录有问题,只能创建一个软连接,可以正常连接数据库了。
1 2 3 4 |
ln -s /var/lib/mysql/mysql.sock /tmp/mysql.sock GGSCI2> dblogin sourcedb test@localhost:3306,userid root,password root Successfully logged into database. |
MYSQL数据库中给root用户赋权
1 2 |
grant insert on test.* to root@'192.168.0.103' identified by 'root'; GRANT ALL PRIVILEGES ON *.* TO 'root'@'192.168.0.103' WITH GRANT OPTION; |
MYSQL端启动Manager
1 2 |
GGSCI > start mgr GGSCI> info all |
配置GLOBAL
1 2 3 4 5 6 7 8 |
GGSCI > edit param ./GLOBALS checkpointtable test.checkpoint GGSCI > dblogin sourcedb test@localhost:3306,userid root,password root Successfully logged into database. GGSCI > add checkpointtable test.checkpoint Successfully created checkpoint table test.checkpoint. |
添加一个复制进程
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
GGSCI >add replicat ORA, exttrail /u01/ogg/dirdat/xy GGSCI > edit params ora replicat ORA SETENV (NLS_LANG =AMERICAN_AMERICA.ZHS16GBK) targetdb test@localhost,userid root,password root sourcedefs /u01/ogg/dirdef/smzxx.def discardfile /u01/ogg/dirrpt/ORA.dsc,purge ASSUMETARGETDEFS APPLYNOOPUPDATES DYNAMICRESOLUTION EOFDELAYCSECS 30 map SMBS.DJ_SMZXX, target test.dj_smzxx; |
开启传送和复制进程
查看TO ORACLE的传送进程
1 2 3 4 5 6 |
GGSCI (edudb1) 85> info pumpa1 EXTRACT PUMPA1 Last Started 2018-02-14 11:39 Status RUNNING Checkpoint Lag 00:00:00 (updated 00:00:00 ago) Log Read Checkpoint File /u01/app/oracle/product/ggate/dirdat/c0000001 2018-02-20 21:32:35.000000 RBA 2431 |
当前trail文件为c000001,是1号文件,RBA2431
查看MYSQL传送进程
1 2 3 4 5 6 7 8 |
Info mysql GGSCI (edudb1) 70> info mysql EXTRACT MYSQL Last Started 2018-02-20 21:00 Status STOPPED Checkpoint Lag 00:00:00 (updated 00:00:58 ago) Log Read Checkpoint File /u01/app/oracle/product/ggate/dirdat/c0000000 First Record RBA 0 |
指定从当前文件及块号读取
Mysql传送进程读的trail文件是0号文件,不满足我们的同步需求,因此,需要指定从1号文件2431块开始读取。
1 2 |
Alter mysql extseqno 1,extrba 2431 EXTRACT altered. |
源端启动传递进程
1 |
GGSCI > start MYSQL |
MYSQL端启动复制进程
1 |
GGSCI >start ORA |
如同步多张表,启动复制进程报错的话,可等同步完再启动。
查看状态
源端
1 2 3 4 5 6 7 |
GGSCI > info all Program Status Group Lag at Chkpt Time Since Chkpt MANAGER RUNNING EXTRACT RUNNING EXTRA1 00:00:00 00:00:01 EXTRACT RUNNING MYSQL 00:00:00 00:00:05 |
目标端
1 2 3 4 5 6 |
GGSCI > info all Program Status Group Lag at Chkpt Time Since Chkpt MANAGER RUNNING REPLICAT RUNNING ORA 00:00:00 00:00:07 |
数据初始化
源端(ORACLE)
使用UTL_FILE包将表数据导出到dj_smzxx.txt。
1 2 3 4 5 6 7 8 9 10 11 12 |
SQLPLUS> create or replace directory utlexportpath as '/u01/'; SQLPLUS> declare outfile utl_file.file_type; begin outfile := utl_file.fopen('UTLEXPORTPATH','dj_smzxx.txt','W'); for rec in (select * from smbs.dj_smzxx) loop utl_file.put_line(outfile, rec.SMZXH||','||rec.ZJLX_DM||','||rec.ZJHM||','||rec.ZJMC||','||rec.ZJXX||','||rec.YXQ_Q||','||rec.YXQ_Z||','||rec.LRR_DM||','||rec.LRRQ||','||rec.XGR_DM||','||rec.XGRQ||','||rec.YXBZ||','||rec.SJHM||','||rec.SMRZXT_DM||','||rec.SEX||','||rec.BORN||','||rec.NATION||','||rec.POLICE||','||rec.ADDRESS||','||rec.YLSJHM); end loop; utl_file.fclose(outfile); end; |
以上操作导出58万数据不到1分钟
传送dj_smzxx.txt到目标端
1 |
Scp dj_smzxx.txt 192.168.0.103:/u01 |
目标端(MYSQL)
1 2 3 4 5 6 7 |
MYSQL> load data infile '/u01/dj_smzxx.txt' ignore into table dj_smzxx fields terminated by',' lines terminated by '\n' (smzxh, zjlx_dm, zjhm, zjmc, zjxx, @c1, @c2, lrr_dm, @c3, xgr_dm, @c4, yxbz, sjhm, smrzxt_dm, sex, born, nation, police, address, ylsjhm) set yxq_q = str_to_date(@c1,'%d-%b-%y'),yxq_z = str_to_date(@c2,'%d-%b-%y'),lrrq = str_to_date(@c3,'%d-%b-%y'),xgrq = str_to_date(@c4,'%d-%b-%y'); |
使用了mysql中的load data工具,其实也就是sqlloader,
运行结果如下:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 |
mysql> select now(); +---------------------+ | now() | +---------------------+ | 2018-03-01 21:41:00 | +---------------------+ 1 row in set (0.00 sec) mysql> load data infile '/u01/dj_smzxx.txt' -> ignore into table dj_smzxx -> fields terminated by',' -> lines terminated by '\n' -> (smzxh, zjlx_dm, zjhm, zjmc, zjxx, @c1, @c2, lrr_dm, @c3, xgr_dm, @c4, yxbz, sjhm, smrzxt_dm, sex, born, nation, police, address, ylsjhm) -> set yxq_q = str_to_date(@c1,'%d-%b-%y'),yxq_z = str_to_date(@c2,'%d-%b-%y'),lrrq = str_to_date(@c3,'%d-%b-%y'),xgrq = str_to_date(@c4,'%d-%b-%y'); Query OK, 529822 rows affected, 25 warnings (37.06 sec) Records: 582822 Deleted: 0 Skipped: 53000 Warnings: 18 mysql> show warnings -> ; +---------+------+--------------------------------------------------------------------------------+ | Level | Code | Message | +---------+------+--------------------------------------------------------------------------------+ | Warning | 1265 | Data truncated for column 'YXBZ' at row 102644 | | Warning | 1265 | Data truncated for column 'SEX' at row 102644 | | Error | 1411 | Incorrect datetime value: 'CHRISTIAN' for function str_to_date | | Warning | 1262 | Row 102644 was truncated; it contained more data than there were input columns | | Warning | 1265 | Data truncated for column 'YXBZ' at row 143587 | | Warning | 1265 | Data truncated for column 'SEX' at row 143587 | | Error | 1411 | Incorrect datetime value: 'FRAN' for function str_to_date | | Warning | 1262 | Row 143587 was truncated; it contained more data than there were input columns | | Warning | 1265 | Data truncated for column 'YXBZ' at row 182324 | | Warning | 1265 | Data truncated for column 'SEX' at row 182324 | | Error | 1411 | Incorrect datetime value: 'Yvon ESNAULT' for function str_to_date | | Warning | 1262 | Row 182324 was truncated; it contained more data than there were input columns | | Warning | 1265 | Data truncated for column 'YXBZ' at row 321736 | | Warning | 1265 | Data truncated for column 'SMRZXT_DM' at row 321736 | | Error | 1411 | Incorrect datetime value: '11111111111' for function str_to_date | | Warning | 1048 | Column 'LRRQ' cannot be null | | Error | 1411 | Incorrect datetime value: '11111111111' for function str_to_date | | Warning | 1048 | Column 'XGRQ' cannot be null | | Warning | 1265 | Data truncated for column 'YXBZ' at row 342113 | | Warning | 1265 | Data truncated for column 'SMRZXT_DM' at row 342113 | | Error | 1411 | Incorrect datetime value: '11111111111' for function str_to_date | | Warning | 1048 | Column 'LRRQ' cannot be null | | Error | 1411 | Incorrect datetime value: '11111111111' for function str_to_date | | Warning | 1048 | Column 'XGRQ' cannot be null | | Warning | 1265 | Data truncated for column 'YLSJHM' at row 561005 | +---------+------+--------------------------------------------------------------------------------+ 25 rows in set (0.00 sec) mysql> select now(); +---------------------+ | now() | +---------------------+ | 2018-03-01 21:41:52 | +---------------------+ 1 row in set (0.00 sec) mysql> select count(*) from dj_smzxx; +----------+ | count(*) | +----------+ | 582822 | +----------+ SQL> select count(*) from smbs.dj_smzxx; COUNT(*) ---------- 582822 |
可以看到,算上显示时间、复制黏贴等一系列时间,总共耗费时间才52秒,效率非常高。但最后输出有18条warning,可实际对比条数没有差别,如对实际数据准确度不放心,可逐条检查。
检查两端OGG进程
源端
1 2 3 4 5 6 7 |
GGSCI (edudb1) 22> info all Program Status Group Lag at Chkpt Time Since Chkpt MANAGER RUNNING EXTRACT RUNNING EXTRA1 00:00:00 00:00:08 EXTRACT RUNNING MYSQL 00:00:00 00:00:09 |
目标端
1 2 3 4 5 6 |
GGSCI (edudb3) 24> info all Program Status Group Lag at Chkpt Time Since Chkpt MANAGER RUNNING REPLICAT RUNNING ORA 00:00:00 00:00:01 |
验证
源端(ORACLE)
1 2 3 4 5 |
SQL> select LRRQ from smbs.dj_smzxx where smzxh='20000000000000000001'; LRRQ --------------- 17-MAY-17 |
目标端(MYSQL)
1 2 3 4 5 6 7 |
mysql> select LRRQ from dj_smzxx where smzxh='20000000000000000001'; +------------+ | LRRQ | +------------+ | 2017-05-17 | +------------+ 1 row in set (0.00 sec) |
更新
1 2 3 4 5 6 7 |
SQL> update smbs.dj_smzxx set lrrq=to_date('2017-07-20 00:00:00','yyyy-mm-dd hh24:mi:ss') where smzxh='20000000000000000001'; 1 row updated. SQL> commit; Commit complete. |
源端(ORACLE)
1 2 3 4 5 |
SQL> select LRRQ from smbs.dj_smzxx where smzxh='20000000000000000001'; LRRQ --------------- 20-JUL-17 |
目标端(MYSQL)
1 2 3 4 5 6 7 |
mysql> select LRRQ from dj_smzxx where smzxh='20000000000000000001'; +------------+ | LRRQ | +------------+ | 2017-07-20 | +------------+ 1 row in set (0.00 sec) |
初始化脚本
未经允许不得转载:Oracle一体机用户组 » 使用OGG从ORACLE到MYSQL的数据同步(二)