Monthly Archives: August 2011

Azure FAQ: How frequently is the clock on my Windows Azure VM synchronized?

Q. How often do Windows Azure VMs synchronize their internal clocks to ensure they are keeping accurate time?

A. This basic question comes up occassionally, usually when there is concern around correlating timestamps across instances, such as for log files or business events.  Over time, like mechanical clocks, computer clocks can drift, with virtual machines (especially when sharing cores) effected even more. (This is not specific to Microsoft technologies; for example, it is apparently an annoying issue on Linux VMs.)

I can’t find any official stats on how much drift happens generally (though some data is out there), but the question at hand is what to do to minimize it. Specifically, on Windows Azure Virtual Machines (VMs) – including Web Role, Worker Role, and VM Role – how is this handled?

According to this Word document – which specifies the “MICROSOFT ONLINE SERVICES USE RIGHTS SUPPLEMENTAL LICENSE TERMS, MICROSOFT WINDOWS SERVER 2008 R2 (FOR USE WITH WINDOWS AZURE)” – the answer is once a week. (Note: the title above includes “Windows Server 2008 R2” – I don’t know for sure if the exact same policies apply to the older Windows Server 2008 SP2, but would guess that they do.)

Here is the full quote, in the context of which services you can expect will be running on your VM in Windows Azure:

Windows Time Service. This service synchronizes with time.windows.com once a week to provide your computer with the correct time. You can turn this feature off or choose your preferred time source within the Date and Time Control Panel applet. The connection uses standard NTP protocol.

So Windows Azure roles use the time service at time.windows.com to keep their local clocks up to snuff.  This service uses the venerable Network Time Protocol (NTP), described most recently in RFC 5905.

UDP Challenges

The documentation around NTP indicates it is based on User Datagram Protocol (UDP). While Windows Azure roles do not currently support you building network services that require UDP endpoints (though you can vote up the feature request here!), the opposite is not true: Windows Azure roles are able to communicate with non-Azure services using UDP, but only within the Azure Data Center. This is how some of the key internet plumbing based on UDP still works, such as the ability to do Domain Name System (DNS) lookups, and – of course – time synchronization via NTP.

This may lead to some confusion since UDP support is currently limited, while NTP being already provided.

The document cited above mentions you can “choose your preferred time source” if you don’t want to use time.windows.com. There are other sources from which you can update the time of a computing using NTP, such as free options from National Institute for Standards and Technology (NIST).

Here are the current NTP Server offerings as seen in the Control Panel on a running Windows Azure Role VM (logged in using Remote Desktop Connection). The list includes time.windows.com and four options from NIST:

Interestingly, when I manually tried changing the time on my Azure role using a Remote Desktop session, any time changes I made were immediately corrected whenever I tried to make changes. Not sure if it was doing an automatic NTP correction after any time change, but my guess is something else was going on since the advertised next time it would sync via NTP did not change based on this.

When choosing a different NTP Server, it did not always succeed (sometimes timing out), but also I did see it succeed, as in the following:

The interesting part of seeing any successful sync with time.nist.gov is that it implies UDP traffic leaving and re-entering the Windows Azure data center. This, in general, is just not allowed – all UDP traffic leaving or entering the data center is blocked (unless you use a VM Role with Windows Azure Connect). To prove this for yourself another way, configure your Azure role VM to use a DNS server which is outside of the Azure data center; all subsequent DNS resolution will fail.

If “weekly” is Not Enough

If the weekly synchronization frequency is somehow inadequate, you could write a Startup Task to adjust the frequency to, say, daily. This can be done via the Windows Registry (full details here including all the registry settings and some tools, plus there is a very focused summary here giving you just the one registry entry to tweak for most cases).

How frequently is too much? Not sure about time.windows.com, but time.nist.gov warns:

All users should ensure that their software NEVER queries a server more frequently than once every 4 seconds. Systems that exceed this rate will be refused service. In extreme cases, systems that exceed this limit may be considered as attempting a denial-of-service attack.

Of further interest, check out the NIST Time Server Status descriptions:

Name IP Address Location Status
time-a.nist.gov 129.6.15.28 NIST, Gaithersburg, Maryland ntp ok, time,daytime busy, not recommended
time-b.nist.gov 129.6.15.29 NIST, Gaithersburg, Maryland ntp ok, time,daytime busy, not recommended
time-nw.nist.gov 131.107.13.100 Microsoft, Redmond, Washington ntp, time ok, daytime busy, not recommended
time.nist.gov 192.43.244.18 NCAR, Boulder, Colorado All services busy, not recommended

They recommend against using any of the servers, at least at the moment I grabbed these Status values from their web site.  I find this amusing since – other than the default time.windows.com – these are the only four servers offered as alternatives in the User Interface of the Control Panel applet. As I mentioned above, sometimes these servers timed out on an on-demand NTP sync request I issued through the applet user interface; this may explain why.

It may be possible to use a commercial NTP service, but I don’t know if the Windows Server 2008 R2 configuration supports it (at least I did not see it in the user interface), and if there was a way to specify it (such as in the registry), I am not sure that the Windows Azure data center will allow the UDP traffic to that third-party host. (They may – I just don’t know. They do appear to allow UDP requests/responses to NIST servers. Not sure if this is a firewall/proxy rule, and if so, is it for NTP, or just NTP to NIST?)

And – for the (good kind of) hacker in you – if you want to play around with accessing an NTP service from code, check out this open source C# code.

Is this useful? Did I leave out something interesting or get something wrong? Please let me know in the comments! Think other people might be interested? Spread the word!

Advertisements

Four 4 tips for developing Windows Services more efficiently

Are you building Windows Services?

I recently did some work with Windows Services, and since it had been rather a long while since I’d done so, I had to recall a couple of tips and tricks from the depths of my memory in order to get my “edit, run, test” cycle to be efficient. The singular challenge for me was quickly getting into a debuggable state with the service. How I did this is described below.

Does Windows Azure support Windows Services?

First, a trivia question…

Trivia Question: Does Windows Azure allow you to deploy your Windows Services as part of your application or cloud-hosted service?

Short Answer: Windows Azure is more than happy to run your Windows Services! While a more native approach is to use a Worker Role, a Windows Service can surely be deployed as well, and there are some very good use cases to recommend them.

More Detailed Answer: One good use case for deploying a Windows Service: you have legacy services and want to use the same binary on-prem and on-azure. Maybe you are doing something fancy with Azure VM Roles. These are valid examples. In general – for something only targetting Azure – a Worker Role will be easier to build and debug. If you are trying to share code across a legacy Windows Service and a shiny new Windows Azure Worker Role, consider following the following good software engineering practice (something you may want to do anyway): factor out the “business logic” into its own class(es) and invoke it with just a few lines of code from either host (or a console app, a Web Service, a unit test (ahem), etc.).

Windows Services != Web Services

Most readers will already understand and realize this, but just to be clear, a Windows Service is not the same as a Web Service. This post is not about Web Services. However, Windows Azure is a full-service platform, so of course has great support for not only Windows Services but also Web Services. Windows Communication Foundation (WCF) is a popular choice for implementing Web Services on Windows Azure, though other libraries work fine too – including in non-.NET languages and platforms like Java.

Now, on to the main topic at hand…

Why is Developing with Windows Services Slower?

Developing with Windows Services is slower than some other types of applications for a couple of reasons:

  • It is harder to stop in the Debugger from Visual Studio. This is because a Windows Service does not want to be started by Visual Studio, but rather by the Service Control Manager (the “scm” for short – pronounced “the scum”). This is an external program.
  • Before being started, Windows Services need to be installed.
  • Before being installed, Windows Services need to be uninstalled (if already installed).

Tip 1: Add Services applet as a shortcut

I find myself using the Services applet frequently to see which Windows Services are running, and to start/stop and other functions. So create a shortcut to it. The name of the Microsoft Management Console snapin is services.msc and you can expect to find it in Windows/System32, such as here: C:\Windows\System32\services.msc

A good use of the Services applet is to find out the Service name of a Windows Service. This is not the same as the Windows Services’s Display name you seen shown in the Name column. For example, see the Windows Time service properties – note that W32Time is the real name of the service:

Tip 2: Use Pre-Build Event in Visual Studio

Visual Studio projects have the ability to run commands for you before and after the regular compilation steps. These are known as Build Events and there are two types: Pre-build events and Post-build events. These Build Events can be accessed from your Project’s properties page, on the Build Events side-tab. Let’s start with the Pre-build event.

Use this event to make sure there are no traces of the Windows Service installed on your computer. Depending on where you install your services from (see Tip 3), you may find that you can’t even recompile your service until you’ve at least stopped it; this smooths out that situation, and goes beyond it to make the usual steps happen faster than you can type.

One way to do this is to write a command file –  undeploy-service.cmd – and invoke it as a Pre-build event as follows:

undeploy-service.cmd

You will need to make sure undeploy-service.cmd is in your path, of course, or else you could invoke it with the path, as in c:\tools\undeploy-service.cmd.

The contents of undeploy-service.cmd can be hard-coded to undeploy the service(s) you are building every time, or you can pass parameters to modularize it. Here, I hard-code for simplicity (and since this is the more common case).

set ServiceName=NameOfMyService
net stop %ServiceName%
C:\WINDOWS\Microsoft.NET\Framework\v4.0.30319\installutil.exe /u %ServiceName%
sc delete %ServiceName%
exit /b 0
Here is what the commands each do:
  1. Set a reusable variable to the name of my service (set ServiceName=NameOfMyService)
  2. Stop it, if it is running (net stop)
  3. Uninstall it (installutil.exe /u)
  4. If the service is still around at this point, ask the SCM to nuke it (sc delete)
  5. Return from this .cmd file with a  success status so that Visual Studio won’t think the Pre-Build event ended with an error (exit /b 0 => that’s a zero on the end)
In practice, you should not need all the horsepower in steps 2, 3, and 4 since each of them does what the prior one does, plus more. They are increasingly powerful. I include them all for completeness and your consideration as to which you’d like to use – depending on how “orderly” you’d like to be.

Tip 3: Use Post-Build Event in Visual Studio

Use this event to install the service and start it up right away. We’ll need another command file – deploy-service.cmd – to invoke as a Post-build event as follows:

deploy-service.cmd $(TargetPath)

What is $(TargetPath) you might wonder. This is a Visual Studio build macro which will be expanded to the full path to the executable – e.g., c:\foo\bin\debug\MyService.exe will be passed into deploy-service.cmd as the first parameter.  This is helpful so that deploy-service.cmd doesn’t need to know where your executable lives. (Visual Studio build macros may also come in handy in your undeploy script from Tip 2.)

Within deploy-service.cmd you can either copy the service executables to another location, or install the service inline. If you copy the service elsewhere, be sure to copy needed dependencies, including debugging support (*.pdb). Here is what deploy-service.cmd might contain:

set ServiceName=NameOfMyService
set ServiceExe=%1
C:\WINDOWS\Microsoft.NET\Framework\v4.0.30319\InstallUtil.exe %ServiceExe%
net start %ServiceName%
Here is what the commands each do:
  1. Set a reusable variable to the name of my service (set ServiceName=NameOfMyService)
  2. Set a reusable variable to the path to the executable (passed in via the expanded $(TargetPath) macro)
  3. Install it (installutil.exe)
  4. Start it (net start)
Note that net start will not be necessary if your Windows Service is designed to start automatically upon installation. That is specified through a simple property if you build with the standard .NET template.

Tip 4: Use System.Diagnostics.Debugger in your code

If you follow Tip 2 when you build, you will have no trouble building. If you follow Tip 3, your code will immediately begin executing, ready for debugging. But how to get it into the debugger? You can manually attach it to a running debug session, such as through Visual Studio’s Debug menu with the Attach to Process… option.

I find it is often more productive to drop a directive right into my code, as in the following:

void Foo()
{
int x = 1;
System.Diagnostics.Debugger.Launch(); // use this…
System.Diagnostics.Debugger.Break();    // … or this — but not both
}

System.Diagnostics.Debugger.Launch will launch into a into debugger session once it hits that line of code and System.Diagnostics.Debugger.Break will break on that line. They are both useful, but you only need one of them – you don’t need them both – I only show both here for illustrative purposes. (I have seen problems with .NET 4.0 when using Break, but not sure if .NET 4.0 or Break is the real culpret. Have not experienced any issues with Launch.)

This is the fastest way I know of to get into a debugging mood when developing Windows Services. Hope it helps!

Quick: How many 9s are in your SLA?

I recently attended an event where one of the speakers was the CTO of a company built on top of Amazon cloud services, the most critical of these being the Simple Storage Service known as Amazon S3.

The S3 service runs “out there” (in the cloud) and provides a scalable repository for applications to store and manage data files. The service can support files of any size, as well as any quantity. So you can put as much stuff up there as you want – and since it is a pay-as-you-go service, you pay for what you use. The S3 service is very popular. An example of a well-known customer, according to Wikipedia, is SmugMug:

Photo hosting service SmugMug has used S3 since April 2006. They experienced a number of initial outages and slowdowns, but after one year they described it as being “considerably more reliable than our own internal storage” and claimed to have saved almost $1 million in storage costs.

Good stuff.

Of course, Amazon isn’t the only cloud vendor with such an offering. Google offers Google Storage, and Microsoft offers Windows Azure Blob Storage; both offer features and capabilities very similar to those of S3. While Amazon was the first to market, all three services are now mature, and all three companies are experts at building internet-scale systems and high-volume data storage platforms.

As I mentioned above, S3 came up during a talk I attended. The speaker – CTO of a company built entirely on Amazon services – twice touted S3’s incredibly strong Service Level Agreement (SLA). He said this was both a competitive differentiator for his company, and also a competitive differentiator for Amazon versus other cloud vendors.

Pause and think for a moment – any idea? – What is the SLA for S3? How about Google Storage? How about Windows Azure Blob Storage?

Before I give away the answer, let me remind you that a Service Level Agreement (SLA) is a written policy offered by the service provider (Amazon, Google, and Microsoft in this case) that describes the level of service being offered, how it is measured, and consequences if it is not met. Usually, the “level of service” part relates to uptime and is measured in “nines” as in 99.9% (“three nines”) and so forth. More nines is better, in general – and wikipedia offers a handy chart translating the number of nines into aggregate downtime/unavailability. (More generally, an SLA also deals with other factors – like refunds to customers if expectations are not met, what speed to expect, limitations, and more. I will focus only on the “nines” here.)

So… back to the question… For S3 and equivalent services from other vendors, how many nines are in the Amazon, Google, and Microsoft SLAs? The speaker at the talk said that S3 had an uptime SLA with 11 9s. Let me say that again – eleven nines – or 99.999999999% uptime. half-an-eye-blinkIf you attempt to look this up in the chart mentioned above, you will find this number is literally “off the chart” – the chart doesn’t go past six nines! But my back-of-the-envelope calculation says it amounts to – on average – less than 32 milliseconds of downtime per year. This is about half what “a blink of your eye” would take – yes, a mere half of an eye-blink. (Which ends with your eyes closed. :-))

This is an impressive number! If only it was true. It turns out the real SLA for Amazon S3 has exactly as many nines as the SLA for Windows Azure Blob Storage and the SLA for Google Storage: they are all 99.9%.

Storage SLAs for Amazon, Google, and Microsoft all have exactly the same number of nines: they are all 99.9%. That’s three nines.

I am not picking on the CTO I heard gushing about the (non-existant) eleven-nines SLA. (In fact, his or her identity is irrelevent to the overall discussion here.) The more interesting part to me is the impressive reality distortion field around Amazon and its platform’s capabilities. The CTO I heard speak got it wrong, but this is not the first time it was misinterpreted as an SLA, and it won’t be the last.

I tracked down the origin of the eleven nines. Amazon CTO Werners Vogels mentions in a blog post that the S3 service is “design[ed]” for “99.999999999% durability” – choosing his words carefully. Consistent with Vogels’ language is the following Amazon FAQ on the same topic:

Q: How durable is Amazon S3? Amazon S3 is designed to provide 99.999999999% durability of objects over a given year. This durability level corresponds to an average annual expected loss of 0.000000001% of objects. For example, if you store 10,000 objects with Amazon S3, you can on average expect to incur a loss of a single object once every 10,000,000 years. In addition, Amazon S3 is designed to sustain the concurrent loss of data in two facilities.

First of all, these mentions are a comment on a blog and an item in an FAQ page; neither is from a company SLA. And second, they both speak to durability of objects – not uptime or availability. And third, also critically, they say “designed” for all those nines – but guarantee nothing of the sort. Even still, it is a bold statement. And good marketing.

It is nice that Amazon can have so much confidence in their S3 design. I did not find a comparable statement about confidence in the design of their compute infrastructure… Reality is that [cloud] services are about more than design and architecture – also about implementation, operations, management, and more. To have any hope, architecture and design need to be solid, of course, but alone they cannot prevent a general service outage which could take your site down with it (and even still lose data occasionally). Some others on the interwebs are skeptical as I am, not just of Amazon, but anyone claiming too many nines.

How about the actual 99.9% “three-nines” SLA? Be careful in your expectations. As a wise man once told me, there’s a reason they are called Service Level Agreements, rather than Service Level Guarantees. There are no guarantees here.

This isn’t to pick on Amazon – other vendors have had – and will have – interruptions in service. For most companies, the cloud will still be the most cost-effective and reliable way to host your applications; few companies can compete with the big platform cloud vendors for expertise, focus, reliability, security, economies-of-scale, and efficiency. It is only a matter of time before you are there. Today, your competitors (known and unknown) are moving there already. As a wise man once told me (citing Crossing the Chasm), the innovators and early adoptors are those companies willing to trade off risk for competitive advantage. You saw it here first: this Internet thing is going to stick around for a while. Yes, and cloud services will just make too much sense to ignore. You will be on the cloud; it is only a matter of where you’ll be on the curve.

Back to all those nines… Of course, Amazon has done nothing wrong here. I see nothing inaccurate or deceptive in their documentation. But those of us in the community need to pay closer attention to what is really being described.  So here’s a small favor I ask of this technology community I am part of: Let’s please do our homework so that when we discuss and compare the cloud platforms – on blogs, when giving talks, or chatting 1:1 – we can at least keep the discussions based on facts.