Question

Alessandro.Zadro on Thu, 10 Jul 2014 14:55:11


Hello,

We are seeing unexpected uploads from subscribers to the publisher.

The system is a "default settings" merge replication topology: 1 publisher on SQL Server 2008 Enterprise and 24 subscribers on SQL Server 2008 Standard, with filtered partitions and pull subscriptions.

Every 30 minutes each subscriber synchronizes with the publisher.

Normal update activity is 300 rows / 30 minutes. 

Sometimes 1 or 2 subscribers start to upload 200k rows that nobody changed (during the day and during the night, with no customer activity at all). There isn't any client updating those rows.

It seems that replication tables are somehow out of sync or corrupted and the merge agent needs to upload records.

The result is thousands of conflicts at publisher.

The short-term workaround for the conflicts is to run the sp_mergedummyupdate stored procedure against those rows.

The problem disappears after reinitializing the subscriber.

The issue recurs with a suspicious frequency of 14 days (the default snapshot generation frequency).

A suspicious memory dump of the replication merge agent also appears on the subscriber.

WinDbg:

*******************************************************************************
*                                                                             *
*                        Exception Analysis                                   *
*                                                                             *
*******************************************************************************

*** WARNING: Unable to verify timestamp for replmerg.exe
*** ERROR: Module load completed but symbols could not be loaded for replmerg.exe
*** WARNING: Unable to verify timestamp for sqlncli10.dll

FAULTING_IP: 
replrec+6314d
4e6b314d 8b420c          mov     eax,dword ptr [edx+0Ch]

EXCEPTION_RECORD:  ffffffff -- (.exr 0xffffffffffffffff)
ExceptionAddress: 4e6b314d (replrec+0x0006314d)
   ExceptionCode: c0000005 (Access violation)
  ExceptionFlags: 00000000
NumberParameters: 2
   Parameter[0]: 00000000
   Parameter[1]: 0000000c
Attempt to read from address 0000000c

CONTEXT:  00000000 -- (.cxr 0x0;r)
eax=00000000 ebx=00000000 ecx=01d1d8c0 edx=00000020 esi=00000e3c edi=01d1e414
eip=772370f4 esp=01d1e3cc ebp=01d1e438 iopl=0         nv up ei ng nz ac pe cy
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00000297
ntdll!KiFastSystemCallRet:
772370f4 c3              ret

PROCESS_NAME:  replmerg.exe

ERROR_CODE: (NTSTATUS) 0xc0000005 - The instruction at 0x%08lx referenced memory at 0x%08lx. The memory could not be %s.

EXCEPTION_CODE: (NTSTATUS) 0xc0000005 - The instruction at 0x%08lx referenced memory at 0x%08lx. The memory could not be %s.

EXCEPTION_PARAMETER1:  00000000

EXCEPTION_PARAMETER2:  0000000c

READ_ADDRESS:  0000000c 

FOLLOWUP_IP: 
replrec+6314d
4e6b314d 8b420c          mov     eax,dword ptr [edx+0Ch]

NTGLOBALFLAG:  0

APP:  replmerg.exe

ANALYSIS_VERSION: 6.3.9600.17029 (debuggers(dbg).140219-1702) amd64fre

FAULTING_THREAD:  00000fc8

BUGCHECK_STR:  APPLICATION_FAULT_NULL_CLASS_PTR_READ_BEFORE_CALL

PRIMARY_PROBLEM_CLASS:  NULL_CLASS_PTR_READ_BEFORE_CALL

DEFAULT_BUCKET_ID:  NULL_CLASS_PTR_READ_BEFORE_CALL

LAST_CONTROL_TRANSFER:  from 6b0beaba to 4e6b314d

STACK_TEXT:  
WARNING: Stack unwind information not available. Following frames may be wrong.
01d1f7d0 6b0beaba 01040048 0416716c 02eadb6c replrec+0x6314d
01d1fe2c 6b0c0153 00000000 01d1fe5c 01d1fe78 replprov+0x5eaba
01d1fe3c 4e6898be 024b2ff8 01d1fe5c 01d1fe9c replprov+0x60153
01d1fe78 4e6ac0a1 04164424 02eaca2c 00368d6c replrec+0x398be
01d1fee8 4e6af01d 00000001 023a1b68 00000000 replrec+0x5c0a1
01d1ff1c 734429bb 00000000 ee206aa3 00000000 replrec+0x5f01d
01d1ff54 73442a47 00000000 76bcee1c 023a7478 msvcr80!_endthreadex+0x3b
01d1ff5c 76bcee1c 023a7478 01d1ffa8 772537eb msvcr80!_endthreadex+0xc7
01d1ff68 772537eb 023a7478 74e02971 00000000 kernel32!BaseThreadInitThunk+0xe
01d1ffa8 772537be 734429e1 023a7478 00000000 ntdll!__RtlUserThreadStart+0x70
01d1ffc0 00000000 734429e1 023a7478 00000000 ntdll!_RtlUserThreadStart+0x1b


STACK_COMMAND:  ~9s; .ecxr ; kb

SYMBOL_STACK_INDEX:  0

SYMBOL_NAME:  replrec+6314d

FOLLOWUP_NAME:  MachineOwner

MODULE_NAME: replrec

IMAGE_NAME:  replrec.dll

DEBUG_FLR_IMAGE_TIMESTAMP:  4e7aebe9

FAILURE_BUCKET_ID:  NULL_CLASS_PTR_READ_BEFORE_CALL_c0000005_replrec.dll!Unknown

BUCKET_ID:  APPLICATION_FAULT_NULL_CLASS_PTR_READ_BEFORE_CALL_replrec+6314d

ANALYSIS_SOURCE:  UM

FAILURE_ID_HASH_STRING:  um:null_class_ptr_read_before_call_c0000005_replrec.dll!unknown

FAILURE_ID_HASH:  {0fe3e937-3154-07c1-51d1-be50efc87f87}

Followup: MachineOwner
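
For readers less familiar with WinDbg output, here is a minimal sketch of why the analysis classifies this as a NULL class pointer read. It only uses two lines copied from the dump above; the helper name base_register_value is hypothetical and exists purely for illustration. The faulting instruction reads from [edx+0Ch] and the read address is 0000000c, so the base register must have been 0 at the time of the fault. Note that the register values listed under CONTEXT were captured at ntdll!KiFastSystemCallRet, not at the faulting instruction, so they are not expected to match.

```python
import re
from typing import Tuple

# Two lines copied verbatim from the WinDbg output above.
FAULT_LINE = "4e6b314d 8b420c          mov     eax,dword ptr [edx+0Ch]"
READ_LINE = "Attempt to read from address 0000000c"


def base_register_value(fault_line: str, read_line: str) -> Tuple[str, int]:
    """Return (register, value) implied by a faulting [reg+disp] memory read.

    Hypothetical helper: it assumes the operand has the simple form
    [register+HEXh] and that the read address is given in hexadecimal.
    """
    operand = re.search(r"\[(\w+)\+([0-9A-Fa-f]+)h\]", fault_line)
    address = re.search(r"address ([0-9A-Fa-f]+)", read_line)
    if operand is None or address is None:
        raise ValueError("unrecognised dump line format")
    register = operand.group(1)
    displacement = int(operand.group(2), 16)
    fault_address = int(address.group(1), 16)
    # base register = faulting address - displacement
    return register, fault_address - displacement


register, value = base_register_value(FAULT_LINE, READ_LINE)
print(f"{register} = {value:#x}")
```

A base value of 0 is consistent with the NULL_CLASS_PTR_READ_BEFORE_CALL bucket: replrec.dll appears to dereference a field at offset 0x0C of an object pointer that was never set. Without symbols for replrec.dll this says nothing about which object that is.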

Thank you in advance for any suggestion!





Replies

Hilary Cotter on Thu, 10 Jul 2014 15:14:45


Is this publication, or was it previously, part of another merge publication? There is a known issue in that scenario that causes something very similar.

What is going on is most likely triggered by the metadata cleanup process. Using the sp_mergedummyupdate proc is not the best choice here.