Host Level Backup not Working for One VM in MABS

Category: azure backup

Question

Jamie Sandell on Thu, 18 Oct 2018 10:35:44


Hi

We have MABS installed and use it to back up our VMs at the host level.

This is working fine for all VMs except for one.

The error it gives is:

"

  • Affected area: RCT\GBYOR-XXXXX
    Occurred since: 18/10/2018 10:01:33
    Description: The replica of Microsoft Hyper-V RCT\GBYOR-XXXXX on GBYOR-XXXXX.Cluster.domain.com is inconsistent with the protected data source. All protection activities for data source will fail until the replica is synchronized with consistency check. You can recover data from existing recovery points, but new recovery points cannot be created until the replica is consistent. 

    For SharePoint farm, recovery points will continue getting created with the databases that are consistent. To backup inconsistent databases, run a consistency check on the farm. (ID 3106)
    An unexpected error occurred while the job was running. (ID 104 Details: The filename, directory name, or volume label syntax is incorrect (0x8007007B))
    More information
    Recommended action: Retry the operation.
    Synchronize with consistency check.
    Run a synchronization job with consistency check...
    Resolution: To dismiss the alert, click below
    Inactivate

"

I've looked up this error, and the suggestion is to run:

    DiskShadow /L c:\temp\output.txt
    list writers
    exit

and then examine the output.txt file for '\\', which should lead to any invalid image names that need correcting.

I've tried this on the VM and on the Hyper-V host that it runs on, but nothing in either output.txt file needs fixing.
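For anyone repeating this check, the scan of output.txt can be sketched in Python. This is only a heuristic under stated assumptions: the function names and the regular expression are illustrative, not part of DiskShadow, and a doubled backslash is treated as suspicious only when it appears in the middle of a path, because a leading `\\` (as in `\\?\Volume{...}`) is normal in the writer listing.

```python
import re

# Assumption: "\\" is only a problem in the middle of a path. A leading "\\"
# (UNC or \\?\Volume{...} device paths) is skipped by the lookbehind, which
# requires a preceding character that is not whitespace, a quote, "=" or "\".
MID_PATH_DOUBLE_BACKSLASH = re.compile(r"(?<=[^\s\"'=\\])\\\\")


def find_suspect_lines(text):
    """Return (line_number, line) pairs that contain a mid-path doubled backslash."""
    return [
        (number, line.strip())
        for number, line in enumerate(text.splitlines(), start=1)
        if MID_PATH_DOUBLE_BACKSLASH.search(line)
    ]


def read_log(path):
    """Read the log file, honouring a UTF-16 byte order mark if one is present."""
    with open(path, "rb") as handle:
        raw = handle.read()
    if raw.startswith((b"\xff\xfe", b"\xfe\xff")):
        return raw.decode("utf-16")
    return raw.decode("utf-8-sig", errors="replace")
```

Calling `find_suspect_lines(read_log(r"c:\temp\output.txt"))` would return the line numbers worth inspecting by hand; an empty list matches the "nothing to fix" result described above.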

I've tried the following troubleshooting:

  • Restarting the VM
  • Restarting the Hyper-V host
  • Restarting the MABS server
  • Removing the VM from the protection group and deleting the data already held for it in the backup destination, and then adding it back to the protection group

Backups in MABS at the guest level for 'All Volumes' and 'System Protection' work fine for the VM.

Any ideas?

Replies

WyattHavron-MSFT on Fri, 19 Oct 2018 15:23:06


There are a few other things that might cause this issue. I would try these steps next -

  1. Check for space-related issues in the DPM storage pool on the Microsoft Azure Backup Server, and allocate storage as required.
  2. Check the state of the Volume Shadow Copy Service on the protected server. If it is in a disabled state, set it to start manually. Start the service on the server. Then go back to the DPM/Microsoft Azure Backup Server console, and start the sync with the consistency check job.

If those steps do not get it back to consistency (or if you have enough space and shadow copy is already running), I would try a manual consistency check using the steps here: https://technet.microsoft.com/en-us/library/cc161279.aspx

The article is a bit older, but should still be accurate.

Let us know if this does not work!


Jamie Sandell on Mon, 22 Oct 2018 09:04:10


Hi WyattHavron

I've made sure that there are no space-related issues on the DPM storage pool. I disabled protection of the VM, which used Storage Disk 1 on the NAS, removed the current replication data, and enabled protection again for the VM, this time specifying Storage Disk 2 on the NAS.

Tried a consistency check and it failed again, this time with a different error.

"

Type: Replica creation
Status: Failed
Description: The protection agent on GBYOR-VSMABS1.lan.cyclops-electronics.com was temporarily unable to respond because it was in an unexpected state. (ID 60 Details: Internal error code: 0x809909B0)
More information
End time: 22/10/2018 09:47:10
Start time: 22/10/2018 09:09:30
Time elapsed: 00:37:40
Data transferred: 164,872.69 MB
Cluster node GBYOR-SHPV4.lan.cyclops-electronics.com
Source details: RCT\GBYOR-VSFS4
Protection group: Virtual Servers - Storage 3

"

I've seen this article, which lists the above error (https://support.microsoft.com/en-gb/help/971411/protection-jobs-for-various-protected-servers-fail-with-errors-in-syst). It says the cause is "Type: Recovery point Status: Failed Description: The protection agent on Server_name.com was temporarily unable to respond because it was in an unexpected state. (ID 60 Details: Internal error code: 0x809909B0)"

I throttled all machines to 85% as recommended and ran the consistency check again.

This time it failed with its usual error, which is "

Affected area: RCT\GBYOR-VSFS4
Occurred since: 22/10/2018 10:13:25
Description: The replica of Microsoft Hyper-V RCT\GBYOR-VSFS4 on GBYOR-VSFS4.GBYOR-CLUSTER1.lan.cyclops-electronics.com is inconsistent with the protected data source. All protection activities for data source will fail until the replica is synchronized with consistency check. You can recover data from existing recovery points, but new recovery points cannot be created until the replica is consistent. 

For SharePoint farm, recovery points will continue getting created with the databases that are consistent. To backup inconsistent databases, run a consistency check on the farm. (ID 3106)
An unexpected error occurred while the job was running. (ID 104 Details: The filename, directory name, or volume label syntax is incorrect (0x8007007B))
More information
Recommended action: Retry the operation.
Synchronize with consistency check.
Run a synchronization job with consistency check...
Resolution: To dismiss the alert, click below
Inactivate

"

FYI, it does back up some of the VM, because it reports that it transferred 160 GB. Again, OS-level backups of the VM's contents are working; it only fails when backing the VM up at the host level.

'VSSAdmin List Writers' on the affected VM showed no VSS errors.
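A minimal sketch of how that writer check could be automated, assuming the English-language layout of `vssadmin list writers` output (`Writer name:`, `State:` and `Last error:` lines). The helper name `unhealthy_writers` is hypothetical, and the text is expected to be captured separately, for example by redirecting the command to a file.

```python
def unhealthy_writers(vssadmin_output):
    """Return (name, state, last_error) for writers that are not stable and error-free."""
    writers = []
    current = None
    for raw_line in vssadmin_output.splitlines():
        line = raw_line.strip()
        if line.startswith("Writer name:"):
            # A new writer block begins; the name is wrapped in single quotes.
            current = {"name": line.split(":", 1)[1].strip().strip("'")}
            writers.append(current)
        elif current is not None and line.startswith("State:"):
            current["state"] = line.split(":", 1)[1].strip()
        elif current is not None and line.startswith("Last error:"):
            current["error"] = line.split(":", 1)[1].strip()
    return [
        (w["name"], w.get("state", "unknown"), w.get("error", "unknown"))
        for w in writers
        if not w.get("state", "").endswith("Stable") or w.get("error") != "No error"
    ]
```

An empty result corresponds to the "no VSS errors" outcome reported here; anything returned names a writer to look at on that machine.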


Jamie Sandell on Mon, 29 Oct 2018 08:40:23


Bump...anyone?

Jamie Sandell on Mon, 26 Nov 2018 08:43:27


Hi WyattHavron

I realised I hadn't upgraded the DPMRA on the servers after upgrading MABS to v3. I've done this now. When I run a consistency check (as it's showing as being in an inconsistent state), I get this warning:

"

The description for Event ID 902 from source MSDPM cannot be found. Either the component that raises this event is not installed on your local computer or the installation is corrupted. You can install or repair the component on the local computer.

If the event originated on another computer, the display information had to be saved with the event.

The following information was included with the event: 

The job encountered an internal fatal error.  Contact Microsoft Customer Support

Problem Details:
<RecoveredFromFatalError><__System><ID>1059</ID><Seq>13</Seq><TimeCreated>23/11/2018 09:51:05</TimeCreated><Source>TaskExecutor.cs</Source><Line>384</Line><HasError>True</HasError></__System><ExceptionType>System.NullReferenceException</ExceptionType><ExceptionDetails>   at Microsoft.Internal.EnterpriseStorage.Dls.Prm.FindActiveNodeBlock.DetermineIfAnotherPossibleOwnerNodeExists(Message msg)
   at Microsoft.Internal.EnterpriseStorage.Dls.TaskExecutor.Fsm.ConnectionPoint.Execute(Message msg)
   at Microsoft.Internal.EnterpriseStorage.Dls.TaskExecutor.Fsm.Engine.ChangeState(Message msg)
   at Microsoft.Internal.EnterpriseStorage.Dls.TaskExecutor.TaskInstance.ExecuteOnEngineRestart(Message msg, Boolean&amp; isTaskForceFailed)</ExceptionDetails><ExceptionMessage>Object reference not set to an instance of an object.</ExceptionMessage><CustomDetails>&lt;?xml version="1.0" encoding="utf-16"?&gt;
&lt;TaskExecutionContext&gt;
  &lt;PrmWriterId&gt;713b9566-8a2f-4ae6-a628-07a58c0fd0cc&lt;/PrmWriterId&gt;
  &lt;PrmDatasourceId&gt;a19ff99c-8780-4a2c-8733-aa8f97b4b5da&lt;/PrmDatasourceId&gt;
  &lt;PrmPhysicalReplicaId&gt;c9eb2662-02f1-49f0-81aa-6bd8bdef9fdc&lt;/PrmPhysicalReplicaId&gt;
  &lt;PrmReplicaValidity&gt;Invalid&lt;/PrmReplicaValidity&gt;
  &lt;PrmReplicaStatus&gt;Idle&lt;/PrmReplicaStatus&gt;
  &lt;PrmOwnerLockId&gt;00000000-0000-0000-0000-000000000000&lt;/PrmOwnerLockId&gt;
  &lt;PrmActiveNodeName&gt;GBYOR-SHPV4.lan.cyclops-electronics.com&lt;/PrmActiveNodeName&gt;
  &lt;PrmLogicalReplicaId&gt;7b1d088a-77cd-4b81-add4-69ea4301f39f&lt;/PrmLogicalReplicaId&gt;
  &lt;PrmDatasetId&gt;febc94aa-fa5a-4859-9a4d-e8a58b32d3d4&lt;/PrmDatasetId&gt;
  &lt;TEVerb&gt;ValidateFixupReplica&lt;/TEVerb&gt;
  &lt;TETaskXml&gt;&amp;lt;?xml version="1.0" encoding="utf-16"?&amp;gt;
&amp;lt;BackupTask xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" BackupType="Full" Generation="Son" WaitForAllowedConflictingOperationToFinish="true" CheckDataIntegrity="false" IsLastRetryAttempt="true" TaskId="2a672573-1b1a-4a9e-9da4-a95ba98a0c39" AAClsId="da6aa17a-d61c-4e9c-8cea-db25dea52a95" SourceWorkItemId="9dc20d54-c653-40aa-939f-422548c04591" DestinationWorkItemId="645cf9f6-381e-4bed-98c7-b43b01465365" xmlns="http://schemas.microsoft.com/2003/dls/arm/TaskDefinitions.xsd"&amp;gt;
  &amp;lt;CommonConnectionAttributes Compression="true" Encryption="false" Checksum="false" /&amp;gt;
  &amp;lt;ProtectionConfig DataSourceId="a19ff99c-8780-4a2c-8733-aa8f97b4b5da"&amp;gt;
    &amp;lt;HostEndpoint NameOrIP="GBYOR-VSFS4" PrincipalName="GBYOR-VSFS4.GBYOR-CLUSTER1.lan.cyclops-electronics.com" Port="5718" /&amp;gt;
    &amp;lt;ProtectedObject Name="Protection" WriterId="713b9566-8a2f-4ae6-a628-07a58c0fd0cc" LogicalPath="" ComponentName="3173AA0F-E76F-4B78-BD14-B8AE453DEADB" ComponentType="FileGroup" Caption="Th-VM\GBYOR-VSFS4" TimeStamp="0" /&amp;gt;
  &amp;lt;/ProtectionConfig&amp;gt;
  &amp;lt;ReplicaTaskConfig&amp;gt;
    &amp;lt;HostEndpoint NameOrIP="GBYOR-VSMABS1" PrincipalName="GBYOR-VSMABS1.lan.cyclops-electronics.com" Port="5718" /&amp;gt;
    &amp;lt;ForcedHeavyweight&amp;gt;true&amp;lt;/ForcedHeavyweight&amp;gt;
  &amp;lt;/ReplicaTaskConfig&amp;gt;
  &amp;lt;DatasetConfig DatasetId="febc94aa-fa5a-4859-9a4d-e8a58b32d3d4" DependentDatasetId="00000000-0000-0000-0000-000000000000" /&amp;gt;
&amp;lt;/BackupTask&amp;gt;&lt;/TETaskXml&gt;
  &lt;TEErrorDetails&gt;&amp;lt;?xml version="1.0" encoding="utf-16"?&amp;gt;
&amp;lt;q1:ErrorInfo ErrorCode="0" DetailedCode="0" DetailedSource="0" ExceptionDetails="" xmlns:q1="http://schemas.microsoft.com/2003/dls/GenericAgentStatus.xsd" /&amp;gt;&lt;/TEErrorDetails&gt;
  &lt;TELastStateName&gt;BackupRAForRead.ACInquiryPending&lt;/TELastStateName&gt;
&lt;/TaskExecutionContext&gt;</CustomDetails><OccurredDateTime>23/11/2018 09:51:05</OccurredDateTime></RecoveredFromFatalError>


the message resource is present but the message is not found in the string/message table

"

Then an info alert follows it:

"

The description for Event ID 997 from source MSDPM cannot be found. Either the component that raises this event is not installed on your local computer or the installation is corrupted. You can install or repair the component on the local computer.

If the event originated on another computer, the display information had to be saved with the event.

The following information was included with the event: 

A non-fatal failure instance has been detected for process 'msdpm'. This will be reported to Microsoft.  

Problem Details:
<FatalServiceError><__System><ID>19</ID><Seq>14</Seq><TimeCreated>23/11/2018 09:51:05</TimeCreated><Source>DpmThreadPool.cs</Source><Line>163</Line><HasError>True</HasError></__System><ExceptionType>NullReferenceException</ExceptionType><ExceptionMessage>Object reference not set to an instance of an object.</ExceptionMessage><ExceptionDetails>System.NullReferenceException: Object reference not set to an instance of an object.
   at Microsoft.Internal.EnterpriseStorage.Dls.Prm.FindActiveNodeBlock.DetermineIfAnotherPossibleOwnerNodeExists(Message msg)
   at Microsoft.Internal.EnterpriseStorage.Dls.TaskExecutor.Fsm.ConnectionPoint.Execute(Message msg)
   at Microsoft.Internal.EnterpriseStorage.Dls.TaskExecutor.Fsm.Engine.ChangeState(Message msg)
   at Microsoft.Internal.EnterpriseStorage.Dls.TaskExecutor.TaskInstance.ExecuteOnEngineRestart(Message msg, Boolean&amp; isTaskForceFailed)</ExceptionDetails></FatalServiceError>


the message resource is present but the message is not found in the string/message table

"

Any ideas?
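The nested XML in the event details above is escaped more than once (`&lt;` and `&amp;lt;`), which makes it hard to read. A minimal sketch for decoding it, assuming Python's standard `html` module is acceptable for the job; `unescape_fully` is a hypothetical helper name, and repeated decoding could over-decode text that is meant to contain literal entities.

```python
import html


def unescape_fully(text, max_rounds=5):
    """Apply html.unescape until the text stops changing (or max_rounds is hit)."""
    for _ in range(max_rounds):
        decoded = html.unescape(text)
        if decoded == text:
            break
        text = decoded
    return text
```

Pasting the `CustomDetails` content through this helper turns the `TETaskXml` section back into plain XML, which makes fields such as `HostEndpoint` and `PrmActiveNodeName` easier to compare.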

WyattHavron-MSFT on Thu, 29 Nov 2018 19:20:26


Interesting. Event 902 is associated with SPPSVC service start up, but is normally informational, not an error.

The contents of the error would seem to point to a problem with the machine trying to determine if another possible node owner exists and returning a null instead.

If you check in C:\Program Files\Microsoft Azure Recovery Services Agent\Temp, do you see any additional error logs from MABS there? We may get a more useful error there that might help us head in the right direction.
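A minimal sketch for finding the most recent files in that folder, assuming Python is available on the server. The thread does not say which file names or extensions to expect, so this simply lists the newest files; `newest_files` is a hypothetical helper name.

```python
from pathlib import Path


def newest_files(folder, limit=10):
    """Return up to `limit` files directly under `folder`, newest modification time first."""
    files = [p for p in Path(folder).iterdir() if p.is_file()]
    files.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return files[:limit]
```

Calling `newest_files(r"C:\Program Files\Microsoft Azure Recovery Services Agent\Temp")` right after a failed consistency check should surface whichever logs were written during that job.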

Jamie Sandell on Fri, 30 Nov 2018 10:53:32


Hi Wyatt

This issue is now resolved.

The solution was to:

  1. Upgrade MABS from v2 to v3
  2. Move the VM storage to another CSV on another host
  3. Perform a consistency check on the affected VM
     • Which passed successfully
  4. Create a recovery point manually
     • Which passed successfully
  5. Move the VM storage back to its original CSV
  6. Perform steps 3 and 4 again to make sure it was still working