Alan Lorimer on Mon, 20 Jun 2016 15:31:11
I have a number of VMs, all created from the OpenLogic CentOS 6.7 release. I have a classic backup vault configured, which had been running well until recently.
Since June 8th some of my backups have started failing (I believe these are the VMs where the server has been rebooted or waagent restarted).
The portal shows an exclamation mark next to Take Snapshot. Backups always fail around 41-42 minutes after starting. This now affects about 4 out of 6 VMs.
My version of waagent is 2.0.18. I have enabled verbose logging and although there is a lot of chatter, there are no errors being reported. I notice that some automatic updates have been applied, but these don't tie up with the dates of the service failing.
274778 8 -rwxr--r-- 1 root root 5440 Feb 23 23:36 /var/lib/waagent/Microsoft.Azure.RecoveryServices.VMSnapshotLinux-126.96.36.199/setup.py
142700 8 -rwxr--r-- 1 root root 5440 May 24 16:22 /var/lib/waagent/Microsoft.Azure.RecoveryServices.VMSnapshotLinux-188.8.131.52/setup.py
270027 8 -rwxr--r-- 1 root root 5440 Feb 8 23:59 /var/lib/waagent/Microsoft.Azure.RecoveryServices.VMSnapshotLinux-184.108.40.206/setup.py
269919 8 -rwxr--r-- 1 root root 5734 Sep 29 2015 /var/lib/waagent/Microsoft.Azure.RecoveryServices.VMSnapshotLinux-220.127.116.11/setup.py
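For anyone comparing several VMs, a listing like the one above can be reduced to just the extension names and versions. This is only a minimal sketch, assuming the extension directories under /var/lib/waagent follow the name-version pattern visible in the paths above; the function name extension_versions is hypothetical and not part of waagent.

```python
import re

# Assumed layout: /var/lib/waagent/<extension name>-<dotted version>/...
_EXT_DIR = re.compile(
    r"/var/lib/waagent/(?P<name>[A-Za-z][\w.]*?)-(?P<version>\d+(?:\.\d+)*)/"
)


def extension_versions(listing):
    """Return unique (name, version) pairs found in a file listing.

    Pairs are sorted by name, then numerically by version component,
    so that 1.0.10 sorts after 1.0.2.
    """
    found = {(m.group("name"), m.group("version"))
             for m in _EXT_DIR.finditer(listing)}
    return sorted(
        found,
        key=lambda pair: (pair[0], tuple(int(p) for p in pair[1].split("."))),
    )


if __name__ == "__main__":
    import sys

    # Example: pipe the listing in on stdin and print one pair per line.
    for name, version in extension_versions(sys.stdin.read()):
        print(name, version)
```

Running it against the same listing on a working VM and a failing VM makes it easier to see whether they have the same extension versions installed.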
Does anyone have any idea why this might be happening?
Azam Khan - MSFT on Mon, 20 Jun 2016 16:23:18
It is possible that a software firewall is preventing waagent from starting up correctly. In such cases, disabling the software firewall just for the initial startup has been enough for everything to function correctly.
Also try updating to the latest version of the Azure Linux Agent.
Please check this link, which might help: https://acom-swtest-2.azurewebsites.net/en-us/documentation/articles/virtual-machines-linux-update-agent/
Alan Lorimer on Mon, 20 Jun 2016 17:01:36
Thanks for the prompt reply.
This server has no software firewall enabled. I can see in the waagent logs that it seems to poll a couple of files in the storage that contains the server's blob. That is working fine; indeed, it never reports an error.
I am on the latest version of the agent - 2.0.8 - that is available via a yum update. Using the URL you provided I can see that there is a zip file of 2.1.1 - I'm happy to go to this if you think I should but then I'll lose the yum update facility.
I feel that there is something else. How is the snapshot taken by the server and how is it reported back to the Azure fabric?
Azam Khan - MSFT on Thu, 23 Jun 2016 18:52:07
When the Azure Backup service initiates a backup job at the scheduled time, it triggers the backup extension to take a point-in-time snapshot. On Windows VMs, this snapshot is taken in coordination with the Volume Shadow Copy Service (VSS) to get a consistent snapshot of the disks in the virtual machine without having to shut it down. VSS is a Windows service, so on Linux VMs the snapshot is requested through the VMSnapshotLinux extension that waagent manages.
After the snapshot is taken, the data is transferred by the Azure Backup service to the backup vault. To make the backup process more efficient, the service identifies and transfers only the blocks of data that have changed since the last backup.
When the data transfer is complete, the snapshot is removed and a recovery point is created.
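To illustrate the idea of transferring only changed blocks, here is a minimal sketch of one generic way to detect them, by comparing per-block digests against those recorded at the previous backup. This is not a description of how Azure Backup is actually implemented; the block size, the use of SHA-256, and the function names block_digests and changed_blocks are all assumptions made for the example.

```python
import hashlib

# Assumed block size, for illustration only.
BLOCK_SIZE = 4 * 1024 * 1024


def block_digests(data, block_size=BLOCK_SIZE):
    """Return one SHA-256 hex digest per fixed-size block of data."""
    return [
        hashlib.sha256(data[start:start + block_size]).hexdigest()
        for start in range(0, len(data), block_size)
    ]


def changed_blocks(previous_digests, data, block_size=BLOCK_SIZE):
    """Return (index, block bytes) for blocks that are new or differ
    from the digests recorded at the previous backup."""
    changed = []
    for index, digest in enumerate(block_digests(data, block_size)):
        is_new = index >= len(previous_digests)
        if is_new or previous_digests[index] != digest:
            start = index * block_size
            changed.append((index, data[start:start + block_size]))
    return changed


if __name__ == "__main__":
    # Tiny block size so the example is easy to follow.
    old_disk = b"aaaabbbbcccc"
    new_disk = b"aaaaXXXXccccdd"
    previous = block_digests(old_disk, block_size=4)
    for index, block in changed_blocks(previous, new_disk, block_size=4):
        print(index, block)
```

Only the blocks returned by changed_blocks would need to be sent in this scheme; unchanged blocks are already covered by the earlier recovery point.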