Condividi tramite


Conflicts in Peer to Peer replication

Recently I have worked on a case where Customer configured the Peer to Peer replication and he started getting conflicts right from the beginning, Conflicts are expected to occur if we are doing changes to same row on multiple peer nodes, but in our case nobody has access to database to write something on second node so we were surprised to see conflicts. On top of it conflict resolution says it is update-update conflict and conflict winner column is no longer exist , Error message is something like this "This row no onger exists in '[dbo].[customers]'". that is surprising to see because there is winner that should have a row and column value too.

Customer was very eager to know how can this happen and even I was curious to repro. So troubleshooting went as following.

So first things first , on the second peer node We need to collect audit traces which will help us to identify where are the INSERTS,UPDATES and DELETES coming from , so We have collected SQL server audit traces and we have noticed an update on second peer node apart from replication store procedure, something like below.

select * from fn_get_audit_file('C:\repl_issue\*.sqlaudit',default,default) where statement like '%update%'

You need to check additional_information column in the output, if only updates are coming from replication then you will see object_name='sp_MSupd_dboCustomers02030113239' but if you see anything apart from this (like below) then that means there is someone or something making changes.

<tsql_stack><framenest_level='2'database_name='repl_db'schema_name='dbo'object_name='update_customers'/></tsql_stack>

So there is definitely update coming from object "update_customers", so first thought came into my mind was whether it is a trigger, when we checked yes there was trigger with that name.

in the initial phase we have asked them to disable trigger , issue surfaced almost 90% but we still noticed one or two conflicts per day, why we are still getting conflicts even after disabling trigger i will explain in my repro (below) but final resolution for this issue is to modify trigger on all peer nodes and add

"NOT for Replication", something like below

 ALTER TRIGGER[dbo].[customer_updates]

ON [dbo].[customers]

AFTER UPDATE

NOT for Replication

AS 

UPDATE customers 

SET

customers.dateupdated=getutcdate() 

FROM 

inserted 

WHERE 

customers.ID=inserted.ID

GO

But Customer wanted to prove that trigger is the cause here and I was also interested so started doing repro. And this post is to give you some insight on how conflict detection and resolution happens.

Demo

use the below script to create database and table

create database p2p

use p2p 

create table customers(CumID nvarchar(10) CONSTRAINT[PK_Customers] PRIMARYKEY,Name nvarchar(50) notnull,datecreated datetimeoffset CONSTRAINT[CreationDate] DEFAULTG ETDATE(),datemodified datetimeoffset CONSTRAINT[modifiedDate] DEFAULTGETDATE())

then take backup and restore on second node and configure peer to peer replication. use the below script to create trigger on both the nodes. after configuring the peer to peer replication , change the publication property "Continue replication after conflict detection" to True, otherwise your distribution agent will be stopped considering this error is critical.

CREATE TRIGGER [dbo].[update_customers]
ON [dbo].[Customers]
AFTER UPDATE
AS
   UPDATE
 Customers
   SET
 Customers.dateModified = getutcdate()
   FROM
 inserted
   WHERE
 Customers.CumID = inserted.CumID

GO

Now lets insert a first record.

insert into customers values(111,'ravi',getutcdate(),getutcdate()) select

select * from customers  and make sure record present in both the nodes

now I want to pause my repro here and explain about the hidden column that gets added to table when you configure peer to peer replication.

$sys_p2p_cd_id is the column name that gets added , and you need to use DAC connection to see that column and there are functions that can be used to convert column values, something like below.

select cumID,name,datecreated,datemodified,sys.fn_replvarbintoint( $sys_p2p_cd_id) as originator_id , sys.fn_replp2pversiontotranid($sys_p2p_cd_id) as tran_id from customers

output will be like this.

 

$sys_p2p_cd_id include originator ID from where that record got originated and transaction ID. and this is the value used to detect the conflicts, according to this MSDB article  https://msdn.microsoft.com/en-us/library/bb934199.aspx "In peer-to-peer replication, conflicts are detected by the stored procedures that apply changes to each node, based on a hidden column in each published table. This hidden column stores an ID that combines an originator IDthat you specify for each node and the version of the row. During synchronization, the Distribution Agent executes procedures for each table. These procedures apply insert, update, and delete operations from other peers. If one of the procedures detects a conflict when it reads the hidden column value, it raises error 22815 that has a severity level of 16:

A conflict of type '%s' was detected at peer %d between peer %d (incoming), transaction id %s and peer %d (on disk), transaction id %s"

Hope now you know how conflicts are detected, The conflict is resolved based on the originator ID value that you specified during configuration: the row that originated at the node with the highest ID wins the conflict.

Now going back to demo. first disable triggers on both the nodes. and update the record and check the originator ID.

in my case , Node1 originator ID=1 and Node 2 originator ID=202. now I updated a row on Node 2 (ID 202)

update customers set name='ravi_1' where CumID=111
select cumID,name,datecreated,datemodified,sys.fn_replvarbintoint( $sys_p2p_cd_id) as originator_id , sys.fn_replp2pversiontotranid($sys_p2p_cd_id) as tran_id from customers

So  now I see that record got updated and we have correct originator ID on both sides. Lets enable the trigger on Node 1 (originator id =1)

after enabling trigger on Node1, update the same row again in Node 2(originator id=202).

update customers set name='ravi_1_tigger_enabled_Node1' where CumID=111
select cumID,name,datecreated,datemodified,sys.fn_replvarbintoint( $sys_p2p_cd_id) as originator_id , sys.fn_replp2pversiontotranid($sys_p2p_cd_id) as tran_id from customers

execute the above select statement on both the nodes and I am sure you will see the difference in originator ID column in both the nodes, something like below.

That confirms that trigger is causing the issue and we have difference in originator ID in both the server, we need to do another update to set this correctly even after disabling trigger on node1, so that is the reason you will see one more conflict even after disabling the triggers.

If you open the conflicts you should see conflict as well. there is lot of backend work done by replication store procedure, I do not want to put that in this article, it will be overhead, I will try to post another article on how replication will send records to another node. Hope this helps someone at least.