Correction strategies v7
Before you begin manual resolution correction, it's important to determine the extent of the inconsistencies that occurred in the publication tables across the primary nodes of the replication system.
The Conflict History tab and the SQL query described in Finding conflicts can help determine the source of an initial conflict.
However, once this conflict occurs, your replication system might have processed and replicated more transactions during that synchronization operation. Some of these later replications might have succeeded, but others might have failed or produced unexpected results because of the prior conflict. With a replication schedule in effect, more synchronization operations can occur, which can create more conflicts.
Therefore, when you discovered that a conflict occurred, we recommended that you stop the publication server. Use the stop option of the Linux scripts or Windows services described in Step 1 of Registering a publication server.
In this way, you can carefully analyze the content of the publication tables in question as well as any pending transactions in the shadow tables. This approach helps you to determine the best course of action without continued updates by the running replication system.
When analyzing your tables, you must determine which publication tables contain inconsistent rows across primary nodes (that is, missing rows on some primary nodes or rows with different column values for the same primary key on different primary nodes).
The general steps to resolving the problem following this analysis are:
Make the needed manual corrections to the rows in the publication tables across all primary nodes to get them into an initial, consistent state so each publication table has the same set of identical rows across primary nodes. This might be to a state before the conflicting transactions occurred, depending on what you determine is the easiest course of action for fully resolving the conflict.
Apply transactions (either from your application or from transaction blocks as defined in Conflict resolution concept for the log-based method) so that all publication tables across all primary nodes are updated consistently according to the desired, expected result.
In the control schema, update certain indicators for the conflicting entries to show that these conflicts are resolved. This update changes the Resolution Status of these entries to Resolved in the Conflict History tab. These entries no longer appear in the SQL query described in Finding conflicts.
Perform these updates to the control schema of the controller database. You can determine the currently designated controller database from the content of the Replication Server configuration file (see xDB replication configuration rile). The publication server ensures that the control schema changes made on the controller database are replicated to the control schemas of all publication databases to maintain metadata consistency across all publication databases.
Resume operating your replication system. Start the publication server and re-create the replication schedule if you were using one.
For accomplishing steps 1 and 2, use some combination of the following methods. The methods you use depends upon the state of your publication tables.
- Manual publication table correction. Use a utility such as
PSQL
orpgAdmin
(Postgres Enterprise Manager Client in Advanced Server) to manually correct the rows in the publication tables across all primary nodes without replicating these changes. Apply these manual corrections in the transaction block described in Conflict resolution concept for the log-based method. - Correction using new transactions. To create new transactions that you allow to replicate to all other primary nodes, rerun your application on one primary node. Use this method after you ensure that all publication tables are in a consistent state across all primary nodes.
In the description of these methods, the following replication environment is used.
- A three-node multi-master replication system is established. The primary node names are
MMRnode_a
(the primary definition node and the controller database),MMRnode_b
, andMMRnode_c
. - The publication is named
emp_pub
and uses the dept and emp tables. - The conflict used to show the conflict resolution methods is a uniqueness conflict occurring on the dept table on primary key column
deptno
on value 50. This conflict is a result of theINSERT
statements shown by the following:
On MMRnode_a
, the following statement is run:
A synchronization replication is then performed.
The following is the content of table dept on MMRnode_a
:
The following is the content of table dept on MMRnode_b
:
The following is the content of table dept on MMRnode_c
:
The Conflict History tab shows the following entry:
The following is the output from the SQL query described in Finding conflicts.
Manual publication table correction
The first step required in all manual conflict resolutions is to ensure all publication tables are consistent across all primary nodes. All corresponding tables must have the same rows with the same column values.
Once this state is achieved, you can then reapply transactions that failed to replicate successfully.
In the example shown in Correction strategies, the inconsistencies are the following:
- Primary nodes
MMRnode_a
andMMRnode_b
each contain a row with primary key value50
, but the other column values in the row are different. - Primary node
MMRnode_c
doesn't have a row with primary key value50
.
Assuming that the correct state of the dept table is the one in MMRnode_b
, you can use one of the following options to correct the state of all primary nodes:
- Manually correct the dept table in
MMRnode_a
andMMRnode_c
. That is, update the row inMMRnode_a
so it has the correct values, and insert the missing row inMMRnode_c
. The dept table on all nodes is then consistent and up to date. - Manually delete the row with primary key value
50
from the table on bothMMRnode_a
andMMRnode_b
. This brings the dept table on all primary nodes back to a prior, consistent state. Then, with the multi-master replication system running, perform the insert transaction again using the correct column values on any one of the primary nodes.
After the publication table rows are corrected, update the appropriate control schema table in the publication database currently designated as the controller database to indicate that the conflict was resolved.
The method outlined by the first bullet point is accomplished as follows.
Step 1
Manually correct the rows in the publication tables with SQL statements incorporated in a transaction block as described in Conflict resolution concept for the log-based method.
On MMRnode_a
, correct the erroneous row by running the following transaction block:
This is shown by the following:
On MMRnode_c
, insert the missing row with the following transaction block:
This is shown by the following:
The dept table on MMRnode_a
and MMRnode_c
now match the content of the table on MMRnode_b
:
Step 2
In the control schema of the publication database currently designated as the controller database, modify the entry in the xdb_conflicts table to indicate the conflict was resolved. Table xdb_conflicts is located in schema _edb_replicator_pub
.
Note
The entries in table xdb_conflicts affect only the data that appears in the Conflict History tab and the SQL query described in Finding conflicts. Changing entries in xdb_conflicts doesn't affect future replication operations but provides a way to keep a record of how past conflicts were resolved.
Note the following regarding the xdb_conflicts table:
A row in the xdb_conflicts table appears as an entry in the Conflict History tab.
The primary key of the xdb_conflicts table is made up of columns
src_db_id
,target_db_id
,src_rrep_sync_id
, andtarget_rrep_sync_id
.Column src_db_id
contains a unique identifier for the primary node in which a transaction occurred that results in a conflict when replicated to the primary node identified bytarget_db_id
.src_rrep_sync_id
is the identifier of the transaction on the source primary node involved in the conflict.target_rrep_sync_id
is the identifier of the transaction on the target primary node involved in the conflict.Note
The
src_rrep_sync_id
andtarget_rrep_sync_id
values are used internally by Replication Server. You don't need them for the manual conflict resolution process.Table xdb_pub_database in the control schema associates the database identifiers
src_db_id
andtarget_db_id
with the primary node attributes such as the database name, IP address, and port.Column
table_id
is the identifier of the publication table where the conflict occurred. Association of thetable_id
value with the publication table attributes, such as its name and schema, is found in each primary node in_edb_replicator_pub.rrep_tables
.Column
pk_value
contains text indicating the primary key value that resulted in the conflict. The text is formatted ascolumn_name=value
. If the primary key is composed of two or more columns, each column and value pair is separated by the keyword AND, such ascolumn_1=value_1 AND column_2=value_2
. This syntax provides the primary key of the row in the publication table designated bytable_id
that resulted in the conflict.Column
resolution_status
indicates the status of the conflict. Possible values areP
(pending) orC
(completed: the conflict was resolved). This status appears in the Resolution Status column of the Conflict History tab.You can use column
win_db_id
to record the database identifier of the primary node that contains the winning (accepted) transaction. This information appears in the Winning DB column of the Conflict History tab.
Pending uniqueness conflict
For this example, you can find the entry for the pending insert/insert conflict on the deptno
primary key value of 50
in xdb_conflicts with the following query:
This entry appears in the Postgres Enterprise Manager Client.
Pending conflict in xdb_conflicts
Change the value in column resolution_status
from P
(pending) to C
(completed) to indicate this conflict was resolved. Change the value in winning_db_id
to 22
to indicate primary node MMRnode_b
contains the winning transaction.
The SQL statement to perform this update for the MMRnode_a
to the MMRnode_b
synchronization conflict is:
When viewed in the Conflict History tab, the entry now shows Resolved
instead of Pending
in the Resolution Status column, and the Winning DB column shows the address of primary node MMRnode_b
.
Correction using new transactions
Another method for bringing all the publication tables to a consistent state is by removing any changes caused by the conflicting transactions and then issuing new, corrected transactions at one primary node. You allow the multi-master replication system to synchronize this node to all other primary nodes.
Referring back to the uniqueness conflict on the dept table, instead of correcting the erroneous row and inserting the row into the primary node where it's missing, as described in Manual publication table correction, you can delete the conflicting rows from all primary nodes and then insert the correct row in one primary node. Then let the multi-master replication system synchronize the correct row to all primary nodes.
Manually delete the inserted row from the publication tables in all primary nodes using the transaction block described in Conflict resolution concept for the log-based method.
On
MMRnode_a
, delete the erroneous row with the following transaction block:On
MMRnode_b
, delete the row even though the transaction created the correct result:On
MMRnode_c
, no changes are required, as the conflicting transaction didn't insert a new row into the table on this node:Rerun the correct transaction on one primary node with the multi-master replication system running. Don't run this in the transaction block described in Conflict resolution concept for the log-based method, as the objective is to synchronize it to all primary nodes.
For this example, execute the correct INSERT statement on
MMRnode_a
:On
MMRnode_a
:Perform synchronization replication.
The same rows now appear in the publication table on all primary nodes.
On
MMRnode_a
;On
MMRnode_b
;On
MMRnode_c
;In the control schema of the publication database currently designated as the controller database, modify the entry in the xdb_conflicts table to indicate the conflict was resolved as in Step 2 of Conflict resolution concept for the log-based method.