Correction strategies v7
Before you begin manual resolution correction, first determine the extent of the inconsistencies that occurred in the publication tables across the primary nodes of the replication system.
The Conflict History tab in the Replication Server console and the SQL query described in Finding conflicts can help determine the source of an initial conflict.
However, once this conflict occurs, your replication system might have processed and replicated more transactions during that synchronization operation. Some of these later replications might have succeeded, but others might have failed or produced unexpected results because of the prior conflict. With a replication schedule in effect, more synchronization operations can occur, which can create more conflicts.
Therefore, when you learn that a conflict occurred, we recommend that you stop the publication server. Use the stop option of the Linux scripts or Windows services described in Registering a publication server.
In this way, you can carefully analyze the content of the publication tables in question as well as any pending transactions in the shadow tables. This approach helps you to determine the best course of action without continued updates by the running replication system.
When analyzing your tables. you must determine the following:
- The publication tables that contain inconsistent rows across primary nodes (that is, missing rows on some primary nodes or rows with different column values for the same primary key on different primary nodes).
- Pending transactions in the shadow tables not applied to the publication tables across all primary nodes. Pending transactions are denoted by a value of
P
in therrep_tx_conflict_status
column of the shadow table. - Transactions on the publication tables that have occurred and are recorded in the shadow tables following the initial conflict. Also determine whether these transactions were applied completely and correctly to the publication tables across all primary nodes. These transactions might not be marked as pending. Instead, their
rrep_tx_conflict_status
column might be set to null, meaning that no conflict was detected during replication or the transaction wasn't replicated yet. You can identify these transactions because they have a laterrrep_tx_timestamp
value than the transactions causing the initial conflict.
The general steps to resolving the problem following this analysis are the following:
Make the needed manual corrections to the rows in the publication tables across all primary nodes to get them into an initial, consistent state so each publication table has the same set of identical rows across primary nodes. This might be to a state before the conflicting transactions occurred, depending on what you determine to be the easiest course of action for fully resolving the conflict.
Apply or reapply transactions (either from your application or from the shadow tables) so that all publication tables across all primary nodes are updated consistently according to the desired, expected result of what was recorded in the shadow tables.
In the shadow tables, update certain indicators for conflicting entries to verify that they were resolved.
In the control schema, update certain indicators for the conflicting entries to show that these conflicts were resolved. This update changes the Resolution Status of these entries to Resolved in the Conflict History tab. These entries no longer appear in the SQL query described in Finding conflicts.
Perform these updates to the control schema of the controller database. You can determine the currently designated controller database from the content of the Replication Server configuration file (see Replication Server configuration file). The publication server ensures that the control schema changes made on the controller database are replicated to the control schemas of all publication databases. This mechanism maintains metadata consistency across all publication databases.
Resume operating your replication system. Start the publication server and recreate the replication schedule if you were using one.
To acheive the first two steps, use some combination of the following methods. The methods you use depend on the state of your publication tables and the extent of pending transactions that need to be applied from the shadow tables.
- Manual publication table correction. Use a utility such as PSQL or pgAdmin (Postgres Enterprise Manager Client in Advanced Server) to manually correct the rows in the publication tables across all primary nodes without replicating these changes. Use the database user with
session_replication_role
set toreplica
for this purpose. - Correction using new transactions. Rerun your application on one primary node to create new transactions that can replicate to all other primary nodes. Use this method after you ensure that all publication tables are in a consistent state across all primary nodes.
- Correction using shadow table transactions. Force the synchronization of transactions already recorded in the shadow tables. Use this method if many shadow table transactions need to be applied and it's simpler to force the synchronization of these transactions rather than reissuing the transactions from your application.
Example replication environment
In the examples that follow, the following replication environment is used.
- A three-node multi-master replication system is established. The primary node names are
MMRnode_a
(the primary definition node and the controller database),MMRnode_b
, andMMRnode_c
. - The publication is named
emp_pub
and uses dept and emp tables. - The conflict used to show the first two conflict resolution methods is a uniqueness conflict occurring on the dept table on primary key column
deptno
on value 50. This conflict resulted from theINSERT
statements shown by the following:
On MMRnode_a
, the following statement is run:
On MMRnode_b
, the following statement is run:
A synchronization replication is then performed.
The following is the content of table dept on MMRnode_a
:
The following is the content of table dept on MMRnode_b
:
The following is the content of table dept on MMRnode_c
:
The Conflict History tab shows the following entry:
The following is the output from the SQL query described in Finding conflicts.
Manual publication table correction
The first step in all manual conflict resolutions is to ensure all publication tables are consistent across all primary nodes. That is, all corresponding tables have the same rows with the same column values.
After that, you can then reapply transactions that failed to replicate successfully.
Using the example from Correction strategies, the inconsistencies are the following:
- Primary nodes
MMRnode_a
andMMRnode_b
each contain a row with primary key value50
, but the other column values in the row are different. - Primary node
MMRnode_c
doesn't have a row with primary key value50
.
Assuming that the correct state of the dept table is the one in MMRnode_b
, the following options are available to correct the state of all primary nodes:
- Manually correct the dept table in
MMRnode_a
andMMRnode_c
. That is, update the row inMMRnode_a
so it has the correct values, and insert the missing row inMMRnode_c
. The dept table on all nodes is now consistent and up to date. - Manually delete the row with primary key value
50
from the table on bothMMRnode_a
andMMRnode_b
. This brings the dept table on all primary nodes back to a prior, consistent state. Then, with the multi-master replication system running, perform the insert transaction again using the correct column values on any one of the primary nodes. - Manually delete the incorrect row with primary key value
50
from the table onMMRnode_a
. Leave the correct row in the table inMMRnode_b
. This simulates the state where the correct transaction was run onMMRnode_b
, is recorded in the shadow table, but has not yet been replicated, and the incorrect transaction was never run onMMRnode_a
. Update the shadow table entry inMMRnode_a
to indicate that it is discarded and to ensure it isn't included in any future synchronizations. Update the metadata for the shadow table entry inMMRnode_b
to force its inclusion in the next synchronization. Perform a synchronization replication so the accepted shadow table entry inMMRnode_b
is replicated toMMRnode_a
andMMRnode_c
.
After the publication table rows are corrected, update the appropriate control schema table in the publication database currently designated as the controller database to indicate that the conflict has been resolved.
Step 1
Manually correct the rows in the publication tables with session_replication_role
set to replica
. On MMRnode_a
, correct the erroneous row:
On MMRnode_c
, insert the missing row:
The dept table on MMRnode_a
and MMRnode_c
now match the content of the table on MMRnode_b
:
Step 2
Update the shadow table entries for the conflicting transactions in the primary nodes to indicate that the conflict was resolved. In each primary node where a transaction occurred that's involved in the conflict, inspect the shadow table for the publication table in question. Shadow tables are located in each primary node in schema _edb_replicator_pub
. Shadow tables follow the naming convention rrst_schema_table
where schema
is the name of the schema containing the publication table and table
is the name of the publication table.
Note the following points regarding shadow tables:
- A row in a shadow table corresponds to an
INSERT, UPDATE
, orDELETE
statement that's applied to the corresponding publication tables in the other primary nodes. A shadow table row doesn't necessarily correspond to the SQL statement issued by the user application. For example, a SQL statement issued by a user application that includes aWHERE
clause using a range such as greater than or less than results in multiple, individual entries in the shadow table for each row in the result set of the application’s SQL statement. - The primary key of a shadow table is a program-generated positive integer in
column rrep_sync_id
. Therrep_sync_id
values are unique among all shadow tables in a given primary node. Therefore, therrep_sync_id
values for conflicting transactions might not have the same value across primary nodes. This depends on how many prior transactions were recorded in the shadow tables of each primary node. - A shadow table entry for a transaction involved in a conflict that wasn't yet resolved contains a value of
P
(pending) in columnrrep_tx_conflict_status
. If a transaction isn't involved in a conflict, this column is set to null. (Most shadow table entries have null in this column.) If a transaction was involved in a conflict that was resolved automatically by the publication server, and this transaction was accepted as correct, this column containsC
(complete/accepted). If a transaction was involved in a conflict that was resolved automatically, and this transaction was deemed incorrect, this column containsD
(discarded).
To find the shadow table entries involved in a conflict, use the Conflict History tab in the Replication Server console or the SQL query described in Finding conflicts and shown by the following output:
You can then query the shadow table in the desired primary node on its rrep_sync_id
value.
The following query is performed on the shadow table for the dept table in MMRnode_a
on rrep_sync_id
value 2 obtained from field src_rrep_sync_id
of RECORD 1 in the preceding output.
A similar query can locate the pending shadow table entry in MMRnode_b
by querying on the key value obtained from field src_rep_sync_id: of RECORD 2:
Note
To be certain no pending transactions are overlooked, examine the shadow tables in all primary nodes that might be involved in the conflict, and search for entries where rrep_tx_conflict_status
is set to P
.
The following shows the rrep_tx_conflict_status
column marked P
(pending) in the Postgres Enterprise Manager Client.
Modify column rrep_tx_conflict_status
by changing the value to D
(discarded) to show that the pending conflict was resolved. A value of D
also ensures that the shadow table entry isn't replicated during any future synchronization replications.
Make this change to the shadow tables in both MMRnode_a
and MMRnode_b
.
Be sure to qualify the row with the correct rrep_sync_id
value if you perform the update using a SQL statement such as in the following:
There's no shadow table entry in MMRnode_c
, since an insert transaction wasn't performed in that primary node by the application.
Step 3
In the control schema of the publication database currently designated as the controller database, modify the entries in the xdb_conflicts table
to indicate the conflict was resolved. Table xdb_conflicts
is located in schema _edb_replicator_pub
.
Note
The entries in table xdb_conflicts affect only the data that appears in the Conflict History tab and the SQL query described in Finding conflicts. Changing entries in xdb_conflicts
has no effect on future replication operations but provides a way to keep a record of how past conflicts were resolved.
Note the following points regarding the xdb_conflicts
table:
- A row in the
xdb_conflicts
table appears as an entry in the Conflict History tab. - The primary key of the
xdb_conflicts table
is made up of columnssrc_db_id
,target_db_id
,src_rrep_sync_id
, andtarget_rrep_sync_id
. Columnsrc_db_id
contains a unique identifier for the primary node in which a transaction occurred that results in a conflict when replicated to the primary node identified bytarget_db_id
.src_rrep_sync_id
is the shadow table identifier of the transaction on the source primary node involved in the conflict, whiletarget_rrep_sync_id
is the shadow table identifier of the transaction on the target primary node involved in the conflict. For uniqueness (insert/insert) conflicts, thetarget_rrep_sync_id
value is always set to0
. For a given uniqueness conflict, there are two entries in thexdb_conflicts table
. Thesrc_rrep_sync_id
value in each of the two entries corresponds to the shadow table identifiers. One is for the shadow table identifier associated with the source primary node. The other is for the shadow table identifier associated with the target primary node. - Table
xdb_pub_database
in the control schema associates the database identifierssrc_db_id
andtarget_db_id
with the primary node attributes such as the database name, IP address, and port. - Column
table_id
is the identifier of the publication table on which the conflict occurred. Association of thetable_id
value with the publication table attributes such as its name, schema, and shadow table is found in each primary node in_edb_replicator_pub.rrep_tables
. - For uniqueness (insert/insert) conflicts only, column
pk_value
contains text indicating the primary key value that resulted in the conflict. The text is formatted ascolumn_name=value
. If the primary key is made up of two or more columns, each column and value pair is separated by the keyword AND, such ascolumn_1=value_1 AND column_2=value_2
. This provides the primary key of the row in the publication table designated bytable_id
that resulted in the conflict. Only uniqueness (insert/insert) conflicts contain thecolumn_name=value
text in thepk_value
column. Thepk_value
column is null for all other conflict types (that is, update/update, delete/update, update/delete, and delete/delete conflicts). - Column
resolution_status
indicates the status of the conflict. Possible values areP
(pending) orC
(completed, that is the conflict was resolved). This status appears in the Resolution Status column of the Conflict History tab. - You can use the column
win_db_id
to record the database identifier of the primary node that contains the winning (accepted) transaction. This information appears in the Winning DB column of the Conflict History tab. - You can use column
win_rrep_sync_id
to record the shadow table identifier of the winning transaction.
The following shows the Conflict History tab prior to updating the xdb_conflicts
table.
You can find the conflict entry for synchronization from MMRnode_a
to MMRnode_b
in xdb_conflicts
with the following query for this example:
You can find the conflict entry for synchronization from MMRnode_b
to MMRnode_a
in xdb_conflicts with the following query for this example:
For uniqueness (insert/insert) conflicts only, you can use the following query to display both of the preceding entries:
These entries appear in the Postgres Enterprise Manager Client.
Change the value in column resolution_status
from P
(pending) to C
(completed) to indicate this conflict was resolved. The value in winning_db_id
changes to 4
to indicate primary node MMRnode_b
contains the winning transaction. The value in winning_rrep_sync_id
changes to the value of rrep_sync_id
for the shadow table entry of the transaction in MMRnode_b
, since this is the one deemed correct.
The SQL statement to perform this update for the MMRnode_a
to the MMRnode_b
synchronization conflict is the following:
The SQL statement to perform this update for the MMRnode_b
to the MMRnode_a
synchronization conflict is the following:
For uniqueness (insert/insert) conflicts only, you can use the following SQL statement to update both of the preceding entries at the same time:
The following are the updated xdb_conflicts entries:
When viewed in the Conflict History tab, the entries now show Resolved instead of Pending in the Resolution Status column, and the Winning DB column shows the address of primary node MMRnode_b
.
Correction using new transactions
You can bring all the publication tables to a consistent state by removing any changes caused by the conflicting transactions. Then issue new, corrected transactions at one primary node, which you allow the multi-master replication system to synchronize to all other primary nodes.
Referring back to the uniqueness conflict on the dept table, instead of correcting the erroneous row and inserting the row into the primary node where it is missing as described in Manual publication table correction, you can delete the conflicting rows from all primary nodes and then insert the correct row in one primary node and let the multi-master replication system synchronize the correct row to all primary nodes.
Step 1
Manually delete the inserted row from the publication tables in all primary nodes with session_replication_role
set to replica
.
On MMRnode_a
, delete the erroneous row:
On MMRnode_b
, delete the row even though the transaction created the correct result:
On MMRnode_c
, no changes are required as the conflicting transaction didn't insert a new row into the table on this node:
Step 2
Rerun the transaction on one primary node with the multi-master replication system running and with session_replication_role
set to the default (origin
).
For this example, the correct INSERT
statement is executed on MMRnode_a
:
Step 3
Perform synchronization replication.
The same rows now appear in the publication table on all primary nodes. On MMRnode_a
:
On MMRnode_b
:
On MMRnode_c
:
Step 4
Update the shadow table entries for the conflicting transactions in the primary nodes to indicate that the conflict was resolved as in Step 2 of Manual publication table correction.
Change the rrep_tx_conflict_status
column from P
(pending) to D
(discarded) on all primary nodes.
Note
The second entry for the accepted transaction you ran in Step 2, where rrep_tx_conflict_status
is set to null, indicates there was no conflict.
There's no shadow table entry in MMRnode_c
, since an insert transaction wasn't performed in that primary node by the application.
Step 5
In the control schema of the publication database currently designated as the controller database, modify the entries in the xdb_conflicts
table to indicate the conflict was resolved as in Step 3 of Manual publication table correction.
Correction using shadow table transactions
You can bring all publication tables to a consistent state is by removing changes caused by the conflicting transactions. You then modify the publication table’s metadata in such a way that the next synchronization results in the replication of transactions already stored in the shadow tables.
Such transactions might not have successfully replicated to all the other primary nodes in a prior synchronization for various reasons.
The following is an example of such a case:
- Applications on two primary nodes insert rows with the same primary key value. This results in a uniqueness conflict when synchronization replication occurs.
- Following the insert on one primary node, the application continues to apply updates to the newly inserted row. These updates are successfully applied to the row on this primary node and are recorded in the shadow table on this node.
- Synchronization replication is performed.
- Since there is a uniqueness conflict, the rows with the conflicting primary key value aren't replicated into the publication tables on the other primary nodes.
- However, the conflicting row on the primary node that wasn't directly updated receives those update transactions by the replication, resulting in possibly inconsistent, updated rows on the two primary nodes.
Two options are:
- Manually insert the missing row into the other primary nodes and manually change the incorrect row.
- Rerun the application to reapply the correct insert and updates.
However, the following option provides a way to reapply the transactions already recorded in the shadow table of the winning primary node.
The example used to illustrate this method is based upon the following transactions on the emp table.
In MMRnode_b
, the following row is inserted:
In MMRnode_c
, the following row is inserted with the same primary key value 9001 in the empno
column:
Synchronization replication is performed. The resulting content of the emp
table is as follows:
On MMRnode_a
the conflicting row has not been replicated:
On MMRnode_b
the conflicting row inserted on this node remains, but is updated with the transactions replicated from MMRnode_c
:
On MMRnode_c
the conflicting row inserted on this node remains along with the updates performed on this node:
In this example, it is assumed that the desired, correct row is on MMRnode_c
.
The following are the steps to reproduce the correct row, currently on MMRnode_c
, to the other primary nodes by synchronizing the shadow table entries that resulted from the original insert and updates to this row on MMRnode_c
.
Step 1
Manually delete the inserted row from the publication tables on all primary nodes except for MMRnode_c
, which has the correct row. Be sure session_replication_role
is set to replica
.
On MMRnode_a
, this row does not exist:
On MMRnode_b
, delete the erroneous row:
On MMRnode_c
, leave the correct, accepted row intact:
Step 2
On the primary nodes containing the conflicting row to discard, mark the shadow table entry for that row as discarded. This mark indicates the conflict on this row was resolved and ensures this shadow table entry isn't replicated in the future.
Change the rrep_tx_conflict_status
column from P
(pending) to D
(discarded) on the losing node, MMRnode_b
.
Step 3
On winning node MMRnode_c
, inspect the shadow table for the emp publication table.
The objective is to use the shadow table entries for the insert and three update transactions that were previously run on this node to replicate to the other primary nodes during the next synchronization.
The left-most columns of the shadow table appear as follows:
Make note of the rrep_sync_id
values for these four entries, which are 1
, 2
, 3
, and 4
in this example.
The following shows the right-most columns of the shadow table from the figure. Note the contents of column rrep_tx_conflict_status
furthest to the right.
Make sure the rrep_tx_conflict_status
column is null for these four entries. In this case, for the insert transaction, you need to change the P
(pending) value to null.
The resulting change for the rrep_tx_conflict_status
column in the shadow table on MMRnode_c
is shown by the following:
Step 4
To replicate these four shadow table entries during the next synchronization, you must add one or more entries to the control schema table _edb_replicator_pub.rrep_MMR_txset
on MMRnode_c
to indicate pending status for synchronization to the target primary nodes (MMRnode_a
and MMRnode_b
) of the four shadow table entries. These shadow table entries are identified by the rrep_sync_id
values of 1
, 2
, 3
, and 4
noted in Step 3.
First, you must identify the pub_id
and target db_id
values to associate with the pending transactions. To do so, invoke the following query, substituting the rrep_sync_id
values for sync_id
in the query:
In this example, there are four values to substitute for sync_id
, which are 1
, 2
, 3
, and 4
.
The results are the following:
The results indicate that the previously executed synchronization that attempted to apply the shadow table transactions identified by the rrep_sync_id
values of 1
, 2
, 3
, and 4
were all for the publication identified by pub_id
of 3
. The target primary nodes were identified by db_id
of 1
(for MMRnode_a
) and db_id
of 4
(for MMRnode_b
).
Thus, you must insert at least two entries into the control schema table _edb_replicator_pub.rrep_MMR_txset
on MMRnode_c
. At least one entry is required for the target db_id
of 1
and at least one entry for the target db_id
of 4
.
Each entry in _edb_replicator_pub.rrep_MMR_txset
consists of a range of rrep_sync_id
values (identified by columns start_rrep_sync_id
and end_rrep_sync_id
). The desired shadow table rrep_sync_id
values happen to be contiguous (1 through 4). Thus a single entry can encompass the four rrep_sync_id
values for a single target database.
In this example, you can add a total of two entries to _edb_replicator_pub.rrep_MMR_txset
, one for each target database.
Note
When multiple, noncontiguous rrep_sync_id
values are required for synchronization (for example, 1
, 2
, 5
, and 6
), multiple entries are required for each target database. The entries specify rrep_sync_id
ranges to collectively cover all of the noncontiguous values but omitting rrep_sync_id
values not included in the synchronization (for example, one entry for 1 through 2 and a second entry for 5 through 6).
Step 5
Insert the entries into the _edb_replicator_pub.rrep_MMR_txset
control schema table identified in the preceding step.
The two INSERT
statements invoked on MMRnode_c
are the following:
A query of the _edb_replicator_pub.rrep_MMR_txset metadata table displays the following:
There are now two new entries with pending status: one for target db_id
1, the other for target db_id
4. Both entries cover the rrep_sync_id
range of 1 through 4.
The two entries with completed status are from the synchronization attempt that first produced the conflict.
Step 6
Perform synchronization replication.
The insert and three update transactions recorded in the rrst_edb_emp
shadow table on MMRnode_c
are replicated to the other primary nodes.
These rows now match the row created by the original transactions on MMRnode_c
:
Step 7
In the control schema of the publication database currently designated as the controller database, modify the entries in the xdb_conflicts table to indicate the conflict was resolved as in Step 3 of Manual publication table correction.
For a uniqueness (insert/insert) conflict only, the following query on the xdb_conflicts
table in the controller database can display the conflicts:
The following SQL statement changes the value in column resolution_status
from P
(pending) to C
(completed) to indicate this conflict was resolved. The value in winning_db_id
changes to 56
to indicate primary node MMRnode_c
contains the winning transaction. The value in winning_rrep_sync_id
is changed to the value of rrep_sync_id
for the shadow table entry of the INSERT
transaction in MMRnode_c
, since this is the correct one.
When viewed in the Conflict History tab, the entry now shows Resolved in the Resolution Status column, and the Winning DB column shows the address of primary node MMRnode_c
.