Azure


Last night I had the privilege of speaking at the Nashua .NET Cloud User Group in Nashua, NH. It was an engaged group to be sure – thanks for all the great questions.

A few followups:

  • Azure VM pricing: the $0.013/hour pricing mentioned for Extra Small instances of the Infrastructure as a Service (IaaS) Virtual Machine is shown here to be a promotional price, with the regular price of $0.02/hour (two cents per hour) kicking in on June 1. The architectures we spoke of in the talk used Platform as a Service (PaaS) Virtual Machines and the pricing for those is very similar, though slightly lower, and is shown here.
  • How many customers does Azure have: here is the 10,000 number that Udai shared, which is from about three years ago when most of the tech world had not yet even heard of Azure. More recently, it was mentioned there are 200,000 Azure customers and it has passed $1 billion in revenue. So, according to those numbers, it appears to have grown 20x in a little less than three years. Additional interesting numbers are mentioned here and here.
  • We focused on use of Cloud Services last night, but we also mentioned Virtual Machines (part of what Microsoft is calling Infrastructure Services, like IaaS) and Web Sites, noting all use different approaches. You can read more about all of them here where you’ll see write-ups for each specific area.
  • I mentioned that Blob Storage is also being used to support the persistent disks on the Infrastructure Services Virtual Machines, enabled in part by a new high-performance network architecture. I wrote about some of this before in a blog post titled Azure Cloud Storage Improvements Hit the Target.
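
To put the hourly VM prices above into monthly terms, here is a rough back-of-the-envelope calculation in Python. It assumes a 30-day month of continuous running (720 hours), which is an assumption for illustration and not something taken from the pricing page; `monthly_cost` is a hypothetical helper name.

```python
# Rough monthly cost of one Extra Small IaaS VM running continuously.
# Assumption: a 30-day month (720 hours); actual billing may differ.
HOURS_PER_MONTH = 24 * 30


def monthly_cost(hourly_rate: float, hours: int = HOURS_PER_MONTH) -> float:
    """Return the cost of running one instance for the given number of hours."""
    return round(hourly_rate * hours, 2)


promo = monthly_cost(0.013)   # promotional hourly price mentioned above
regular = monthly_cost(0.02)  # regular hourly price starting June 1

print(f"promotional: ${promo:.2f}/month")
print(f"regular:     ${regular:.2f}/month")
```

Under those assumptions the difference between the two prices is only a few dollars per month per instance, which is why instance count and hours of use tend to matter more than the hourly rate itself.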

The deck I used follows.

Architecting for the Cloud — NH Azure — 15-Mar-2013 — Bill Wilder (blog.codingoutloud.com)

My book, if you are interested, is described here. And the Boston Azure Cloud User Group can be found here.

Cloud Architecture Patterns book

Last night, Mark Eisenberg and I represented the Windows Azure Cloud Platform in a Clash of the Clouds panel discussion/debate opposite Erik Sebesta and Ed Brennan, who represented the Open Source cloud alternatives. Erik & Ed declared OpenStack to be the strongest of the open source options today, so it became about Azure vs. OpenStack.

While I will not attempt to reproduce the discussion (sorry!, though there are a few photos), I do want to follow up on a few questions that I offered to provide references on. If you have further questions, please feel free to put a comment on this post. Also, at the end of this post, you will find a link to the short “Azure in 3 minutes or less” deck we used to introduce the Windows Azure Cloud Platform at the very beginning (per the ground rules of the panel – we limited the intro to 3 minutes).

  • In response to the question about scalability of Windows Azure Blobs, here is the write-up I referenced on Windows Azure Storage Scalability Targets. Here is an additional (more comparative) discussion (follow links) you may find helpful: Azure Cloud Storage Improvements Hit the Target.
  • In response to the question about pricing, check out the Windows Azure pricing calculator. Note that for the Microsoft Server products (e.g. Windows Server, or SQL Server on Windows Azure SQL Database (offered as a service) or on a Virtual Machine (that you manage)), the cost of the license is baked into the hourly rental cost.
  • In response to the question about the ability to support different types of apps (whether new ones from startups, existing ones from a big company, etc.), see the spectrum of offerings described here: https://www.windowsazure.com/en-us/develop/net/fundamentals/compute/. In a nutshell, Web Sites is for hosting basic, low-scale sites (with a free tier), though these can scale very nicely too; Cloud Services is for building Cloud-Native applications using PaaS (which my book focuses on); Virtual Machines (parallel to what OpenStack offers in terms of managed VMs) is more useful for applications you want to run in the cloud with minimal change; and Virtual Networking allows many options for connecting your data center with a secure private network on Windows Azure, among other options.
  • In response to the question about openness, any programming language or platform can access the Windows Azure services through REST APIs, but here is the list of those with first-class SDKs: http://www.windowsazure.com/en-us/downloads/
  • For any further follow-up questions feel free to leave a COMMENT below and I will update this post.

Windows Azure is not the only full-service, rock-solid cloud platform out there, but I hope you got an appreciation for how it might help you and why you might wish to choose it for your applications and services. If you are interested in learning more about Windows Azure, you may wish to check out the Boston Azure User Group, which has been meeting regularly at NERD since October 2009. Our next meeting is in just a few days: Tuesday May 9.

The SLIDE DECK we used for the 3 minute intro is here:

 

Webinar Registration:

  • Azure Best Practices – How to Successfully Architect Windows Azure Apps for the Cloud @ 1pm ET on 13-March-2013
  • VIEW RECORDING HERE: http://bit.ly/ZzQDDW 

Abstract:

Discover how you can successfully architect Windows Azure-based applications to avoid and mitigate performance and reliability issues with our live webinar.
Microsoft’s Windows Azure cloud offerings provide you with the ability to build and deliver a powerful cloud-based application in a fraction of the time and cost of traditional on-premise approaches. So what’s the problem? Tried-and-true traditional architectural concepts don’t apply when it comes to cloud-native applications. Building cloud-based applications must factor in answers to such questions as:

  • How to scale?
  • How to overcome failure?
  • How to build a manageable system?
  • How to minimize monthly bills from cloud vendors?

During this webinar, we will examine why cloud-based applications must be architected differently from traditional applications, and break down key architectural patterns that truly unlock cloud benefits. Items of discussion include:

  • Architecting for success in the cloud
  • Getting the right architecture and scalability
  • Auto-scaling in Azure and other cloud architecture patterns

If you want to avoid long nights, help-desk calls, frustrated business owners and end-users, then don’t miss this webinar or your chance to learn how to deliver highly-scalable, high-performance cloud applications.

Deck:

Book:

The core ideas were drawn from my Cloud Architecture Patterns (O’Reilly Media, 2012) book:

book-cover-medium.jpg

Hosted by Dell:

image

Windows Azure Storage (WAS)

Brad Calder SOSP talk from http://www.youtube.com/watch?v=QnYdbQO0yj4

Brad Calder delivering SOSP talk

Since its initial release, Windows Azure has offered a storage service known as Windows Azure Storage (WAS). According to the SOSP paper and related talk published by the team (led by Brad Calder), WAS is architected to be a “Highly Available Cloud Storage Service with Strong Consistency.” Part of being highly available is keeping your data safe and accessible. The SOSP paper mentions that the WAS service retains three copies of every stored byte, and (announced a few months before the SOSP paper) another asynchronously geo-replicated trio of copies in another data center hundreds of miles away in the same geo-political region. Six copies in total.

WAS is a broad service, offering not only blob (file) storage, but also a NoSQL store and a reliable queue.

Further, all of these WAS storage offerings are strongly consistent (as opposed to other storage approaches which are sometimes eventually consistent). Again citing the SOSP paper: “Many customers want strong consistency: especially enterprise customers moving their line of business applications to the cloud.” This is because traditional data stores are strongly consistent, and code needs to be specially crafted in order to handle an eventually consistent model. Strong consistency therefore simplifies moving existing code into the cloud.

The points made so far are just to establish some basic properties of this system before jumping into the real purpose of this article: performance at scale. The particular points mentioned (highly available, storage in triplicate and then geo-replicated, strong consistency, and supporting also a NoSQL database and reliable queuing features) were highlighted since they may be considered disadvantages – rich capabilities that may be considered to hamper scalability and performance. Except that they don’t hamper scalability and performance at all. Read on for details.

Performance at Scale

A couple of years ago, Nasuni benchmarked the most important public cloud vendors on how their services performed on cloud file storage at scale (using workloads modeled after those observed from real world business scenarios). Among the public clouds tested were Windows Azure Storage (though only the blob/file storage aspect was considered), Amazon S3 (an eventually consistent file store), and a couple of others.

In the first published result in 2011, Nasuni declared Amazon S3 the overall winner, prevailing over Windows Azure Storage and others, though WAS finished ahead of Amazon in some of the tests. At the time of these tests, WAS was running on its first-generation network architecture and supported capacity as described in the team’s published scalability targets from mid-2010.

In 2012, Microsoft network engineers were busy implementing a new data center network design they are calling Quantum 10 (or Q10 for short). The original network design was hierarchical, but the Q10 design is flat (and uses other improvements like SSD for journaling). The end result of this dramatic redesign is that WAS-based network storage is much faster, more scalable, and as robust as ever. The corresponding Q10 scalability targets were published in November 2012 and show substantial advances.

Q10 was implemented during 2012 and apparently was in place before Nasuni ran its updated benchmarks between November 2012 and January 2013. With its fancy new network design in place, WAS really shined. While the results in 2011 were close, with Amazon S3 being the overall winner, in 2012 the results were a blowout, with Windows Azure Storage being declared the winner, sweeping all other contenders across the three categories.

“This year, our tests revealed that Microsoft Azure Blob Storage has taken a significant step ahead of last year’s leader, Amazon S3, to take the top spot. Across three primary tests (performance, scalability and stability), Microsoft emerged as a top performer in every category.” -Nasuni Report

The Nasuni report goes on to mention that “the technology [Microsoft] are providing to the market is second to none.”

Reliability

One aspect of the report I found very interesting was in the error rates. For several of the vendors (including Amazon, Google, and Azure), Nasuni reported not a single error was detected during 100 million write attempts. And Microsoft stood alone for the read tests: “During read attempts, only Microsoft resulted in no errors.” In my book, I write about the Busy Signal Pattern which is needed whenever transient failures result during attempts to access a cloud service. The scenario described in the book showed the number of retries needed when I uploaded about four million files. Of course, the Busy Signal Pattern will still be needed for storage access and other services – not all transient failures can be eliminated from multitenant cloud services running on commodity hardware served over the public internet – and while this is not a guarantee there won’t be any, it does bode well for improvements in throughput and user experience.
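
As a rough illustration of the retry idea behind handling transient failures, here is a minimal Python sketch. It is not the implementation from the book: `TransientError`, `with_retries`, and `flaky_upload` are hypothetical names, and the exponential backoff with jitter is one common choice among several.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical stand-in for a 'busy signal' from a cloud service."""


def with_retries(operation, max_attempts=5, base_delay=0.1, sleep=time.sleep):
    """Call operation(); on TransientError, wait and try again with backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: let the caller see the failure
            # Exponential backoff plus a little jitter to avoid retry storms.
            delay = base_delay * (2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay / 10))


# Usage: an upload that is "busy" twice before succeeding.
calls = {"count": 0}


def flaky_upload():
    calls["count"] += 1
    if calls["count"] <= 2:
        raise TransientError("server busy")
    return "uploaded"


result = with_retries(flaky_upload, sleep=lambda seconds: None)  # skip real waits
print(result, "after", calls["count"], "attempts")
```

The `sleep` parameter is injectable only so the example can run without real delays; in real code the default `time.sleep` would be used.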

And while it’s always been the case you can trust WAS for HA, these days it is very hard to find any reason – certainly not performance or scalability – to not consider Windows Azure Storage. Further, WAS, S3, and Google Storage all have similar pricing (already low – and trending towards even lower prices) – and Azure, Google, and Amazon have the same SLAs for storage.

References

Note that the Nasuni report was published February 19, 2013 on the Nasuni blog and is available from their web site, though is gated, requiring that you fill out a contact form for access. The link is here: http://www.nasuni.com/blog/193-comparing_cloud_storage_providers_in

Other related articles of interest:

  1. Windows Azure beats the competition in cloud speed test – Oct 7, 2011 – http://yossidahan.wordpress.com/2011/10/07/windows-azure-beats-the-competition-in-cloud-speed-test/
  2. Amazon bests Microsoft, all other contenders in cloud storage test – Dec 12, 2011 -
  3. Only Six Cloud Storage Providers Pass Nasuni Stress Tests for Performance, Stability, Availability and Scalability – Dec 11, 2011 – http://www.nasuni.com/news/press_releases/46-only_six_cloud_storage_providers_pass_nasuni_stress
  4. Cloud computing showdown: Amazon vs. Rackspace (OpenStack) vs. Microsoft vs. Google – Dec 3, 2012 – http://www.networkworld.com/news/2012/120312-argument-cloud-264454.html
  5. Microsoft Azure overtakes Amazon’s cloud in performance test – Feb 19, 2013 – http://www.networkworld.com/news/2013/021913-azure-aws-266831.html?hpg1=bn

On Saturday March 9, 2013, I teamed up with Joan Wortman on a talk at the 19th (!) Boston Code Camp. Some of the patterns I discuss require some different thinking about application architecture, including aspects that impact the user experience (UX). Joan (who is a UX expert) helped provide context around how to deal with some of these UX challenges as they intersect with architecture.

I also hope to see many of the attendees at future Boston Azure meetings (held at the same location as the Boston Code Camp – NERD in Cambridge, MA). Also feel free to post follow-up questions to this post or email me (codingoutloud on gmail) or ask me on twitter where I am @codingoutloud.

Here are a couple of questions that came up in the talk:

  1. How much does the cloud cost? As I mentioned, this is a question that deserves some discussion since it is not as simple as looking at the pricing calculator (which can be found here). Sometimes it will be less costly, sometimes more costly. (I did point out there is a free tier for Windows Azure Web Sites.) One major factor is the cost of resources (which is trending down over time). Another major factor is the impact of reducing resource usage when it is not needed; for example, consider a Line of Business application which is used only during business hours in North America and can be turned off completely (accruing no VM usage charges) during non-business hours/weekends/holidays; as another example, consider that you don’t need to own resources for the “spike” at the Super Bowl (like the Shazam scenario described by Joan) since you can “give it all back” (stop paying) once the rush is over. There are also other considerations when you get into DR and HA and geo-distribution. (I wrote about RPO and RTO terms in the context of Engineering for DR in the Cloud recently.) And still another factor is understanding what you are paying for – don’t forget the Iceberg idea – so do not compare pricing with that of traditional hosting (unless that’s what you really want) since hosting is not cloud computing!
  2. Why can I only access 32 messages at a time from the Windows Azure Storage Queue? This is the same limit when we talk about “peeking” (looking at what’s on the queue without removing it) and retrieving messages for exclusive access. I don’t know why this particular limit was chosen (why not 20? why not 100?) so could only speculate on that. The bottom line is that all messages can be accessed – sometimes requiring more than one call. I wish I had time to probe into the application scenario that would benefit from grabbing so many messages at once, but due to time constraints did not do that. I will answer the question further if I get a follow-up question.
  3. Where can I find the mail app that Joan mentioned? The Mailbox app is for iOS and can be found in your app store or directly on iTunes here: https://itunes.apple.com/us/app/mailbox/id576502633?mt=8 (and there’s a lot of press – such as this story here).
  4. OTHER QUESTIONS? Send ‘em along!
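
On the 32-message question above, the "more than one call" approach can be sketched in Python as follows. This uses an in-memory stand-in rather than the real storage SDK: `FakeQueue`, `get_messages`, and `drain` are hypothetical names for illustration, and the sketch ignores details of the real service such as message visibility timeouts and explicit deletes.

```python
from collections import deque

MAX_BATCH = 32  # per-call limit discussed above


class FakeQueue:
    """In-memory stand-in for a storage queue (hypothetical, for illustration)."""

    def __init__(self, messages):
        self._messages = deque(messages)

    def get_messages(self, count):
        if not 1 <= count <= MAX_BATCH:
            raise ValueError(f"count must be between 1 and {MAX_BATCH}")
        batch = []
        while self._messages and len(batch) < count:
            batch.append(self._messages.popleft())
        return batch


def drain(queue, wanted):
    """Retrieve up to `wanted` messages, using as many calls as needed."""
    received, calls = [], 0
    while len(received) < wanted:
        batch = queue.get_messages(min(MAX_BATCH, wanted - len(received)))
        calls += 1
        if not batch:  # the queue is empty
            break
        received.extend(batch)
    return received, calls


messages, calls = drain(FakeQueue(range(100)), wanted=100)
print(len(messages), "messages in", calls, "calls")
```

The per-call limit changes only how many round trips are made; the caller still ends up with every message it asked for.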

Hope to see you at Boston Azure:

clip_image001_thumb.png

Much of the material for the talk also appears in my book:

Cloud Architecture Patterns book

On Thursday 07-February-2013 I spoke at DevBoston about “How is Architecting for the Cloud Different?”

Here is the abstract:

If my application runs on cloud infrastructure, am I done? Not if you wish to truly take advantage of the cloud. The architecture of a cloud-native application is different than the architecture of a traditional application and this talk will explain why. How to scale? How do I overcome failure? How do I build a system that I can manage? And how can I do all this without a huge monthly bill from my cloud vendor? We will examine key architectural patterns that truly unlock cloud benefits. By the end of the talk you should appreciate how cloud architecture differs from what most of us have become accustomed to with traditional applications. You should also understand how to approach building self-healing distributed applications that automatically overcome hardware failures without downtime (really!), scale like crazy, and allow for flexible cost-optimization.

Here are the slides:

How is Architecting for the Cloud Different — DevBoston — 06-Feb-2013 — Bill Wilder (blog.codingoutloud.com)

Here is the book we gave away copies of (and from which some of the material was drawn):

book-cover-medium.jpg

Ready to learn more about Windows Azure? Come join us at the Boston Azure Cloud User Group!

Boston Azure cloud user group logo

Microsoft released version 4.5 of its popular .NET Framework in August 2012. This framework can be installed independently on any compatible machine (check out the .NET Framework Deployment Guide for Administrators) and (for developers) comes along with Visual Studio 2012.

Windows Azure Web Sites also support .NET 4.5, but what is the easiest way to deploy a .NET 4.5 application to Windows Azure as a Cloud Service? This post shows how easy this is.

Assumption

This post assumes you have updated to the most recent Windows Azure Tools for Visual Studio and the latest SDK for .NET.

For any update to a new operating system or new SDK, consult the Windows Azure Guest OS Releases and SDK Compatibility Matrix to understand which versions of operating systems and Azure SDKs are intended to work together.

You can do this with the Web Platform Installer by installing Windows Azure SDK for .NET (VS 2012) – Latest (best option) – or directly here (2nd option since this link will become out-of-date eventually).

Also pay close attention to the release notes, and don’t forget to Right-Click on your Cloud Service, hit Properties, and take advantage of some of the tooling support for the upgrade:

UpgradeFall2012

Creating New ASP.NET Web Role for .NET 4.5

Assuming you have up-to-date bits, a File | New from Visual Studio 2012 will look something like this:

image

Select a Cloud project template, and (the only current choice) a Windows Azure Cloud Service, and be sure to specify .NET Framework 4.5. Then proceed as normal.

Updating Existing ASP.NET Web Role for .NET 4.5

If you wish to update an existing Web Role (or Worker Role), you need to make a couple of changes in your project.

First, update the Windows Azure Operating System version to use Windows Server 2012. This is done by opening your Cloud project (pageofphotos in the screen shot) and opening ServiceConfiguration.Cloud.cscfg.

image

Change the osFamily setting to be “3” to indicate Windows Server 2012.

   osFamily="3"

As of this writing, the other allowed values for osFamily are “1” and “2” to indicate Windows Server 2008 SP2 and Windows Server 2008 R2 (or R2 SP1) respectively. The up-to-date settings are here.
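
If you have several configuration files to update, the same change can be scripted. Here is a minimal Python sketch that sets osFamily on the root element of a configuration file. The sample XML is a trimmed-down, hypothetical file, and the namespace URI is an assumption that should be checked against your own .cscfg; `set_os_family` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Assumed namespace for .cscfg files; verify against your own file.
NS = "http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceConfiguration"
ET.register_namespace("", NS)  # keep the default namespace when serializing

# A trimmed-down, hypothetical ServiceConfiguration.Cloud.cscfg.
CSCFG = f"""<ServiceConfiguration serviceName="pageofphotos" xmlns="{NS}" osFamily="2" osVersion="*">
  <Role name="WebRole1">
    <Instances count="2" />
  </Role>
</ServiceConfiguration>"""


def set_os_family(cscfg_xml: str, family: str) -> str:
    """Return the configuration XML with osFamily set on the root element."""
    root = ET.fromstring(cscfg_xml)
    root.set("osFamily", family)
    return ET.tostring(root, encoding="unicode")


updated = set_os_family(CSCFG, "3")
print(ET.fromstring(updated).get("osFamily"))
```

For a single project, editing the attribute by hand as described above is simpler; a script like this only pays off across many service configurations.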

Now you are set for your operating system to include .NET 4.5, but none of your Visual Studio projects have yet been updated to take advantage of this. For each project that you intend to update to use .NET 4.5, you need to update the project settings accordingly.

image

First, select the project in the Solution Explorer, right-click on it, and choose Properties from the pop-up menu. That will display the screen shown. Now simply select .NET Framework 4.5 from the available list of Target framework options.

If you open an older solution with the newer Azure tools for Visual Studio, you might see a message something like the following. If that happens, just follow the instructions.

WindowAzureTools-dialog-NeedOct2012ToolsForDotNet45

That’s it!

Now when you deploy your Cloud Service to Windows Azure, your code can take advantage of .NET 4.5 features.

Troubleshooting

Be sure you get all the dependencies correct across projects. In one solution I migrated, some projects needed to stay on .NET 4.0, while those deployed to the Windows Azure Cloud could move to 4.5. If you don’t get this quite right, you may get a compiler warning like the following:

Warning  The referenced project ‘CapsConfig’ is targeting a higher framework version (4.5) than this project’s current target framework version (4.0). This may lead to build failures if types from assemblies outside this project’s target framework are used by any project in the dependency chain.    SomeOtherProjectThatReferencesThisProject

The warning text is self-explanatory: the solution is to not migrate that particular project to .NET 4.5 from .NET 4.0. In my case, I was trying to take advantage of the new WIF features, and this project did not have anything to do with Identity, so there was no problem.
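
The rule behind that warning can be sketched in a few lines of Python: a project should not reference another project that targets a higher framework version. The solution layout below is hypothetical (only the CapsConfig name comes from the warning above), and this is not how Visual Studio itself performs the check.

```python
# Hypothetical solution layout: project -> (target framework, referenced projects).
PROJECTS = {
    "WebRole1": ((4, 5), ["CapsConfig", "Shared"]),
    "CapsConfig": ((4, 5), ["Shared"]),
    "Shared": ((4, 0), []),
    "DesktopTool": ((4, 0), ["CapsConfig"]),  # this one triggers the warning
}


def find_mismatches(projects):
    """List (project, reference) pairs where the reference targets a higher version."""
    problems = []
    for name, (version, references) in projects.items():
        for ref in references:
            ref_version = projects[ref][0]
            if ref_version > version:
                problems.append((name, ref))
    return problems


for project, ref in find_mismatches(PROJECTS):
    print(f"{project} references {ref}, which targets a higher framework version")
```

A 4.5 project referencing a 4.0 project is fine; only the reverse direction is flagged.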

At the December 13, 2012 meeting for Boston Azure Cloud User Group, I gave a short talk on how Digital Certificates work (cryptographically speaking).

The backstory is that Windows Azure uses certificates in a few different ways, and understanding the different types of certificate uses is key to understanding why these different ways of using and deploying certificates are the way they are.

The slide deck is here:

Sorting Out Digital Certificates – 13-Dec-2012 – Bill Wilder – Boston Azure

 

 

Disaster Recovery, or DR, refers to your approach for recovering from an event that results in failure of your software system. Some examples of such events: hurricanes, earthquakes, and fires. The common thread with these events is that they were not your fault and they happened suddenly, usually at the most inconvenient of times.

image of storm clouds

Clouds are not always inviting! Be prepared for storm clouds.

Damage from one of these events might be temporary: a prolonged power outage that is eventually restored. Damage might be permanent: servers immersed in water are unlikely to work after drying out.

Whether a one-person shop with all the customer data on a single laptop, or a large multi-national with its own data centers, any business that uses computers to manage data important to that business needs to consider DR.

The remainder of this article focuses on some useful DR approaches for avoiding loss of business data when engineering applications for the cloud. The detailed examples are specific to the Windows Azure Cloud Platform, but the concepts apply more broadly, such as with Amazon Web Services and other cloud platforms. Notably, this post does not discuss DR approaches as they apply to other parts of infrastructure, such as web server nodes or DNS routing.

Minimize Exposure

Your first line of defense is to minimize exposure. Consider a cloud application with business logic running on many compute nodes.

Terminology note: I will use the definition of node from page 2 of my Cloud Architecture Patterns book (and occasionally in other places in this post I will reference patterns and primers from the book where they add more information):

An application runs on multiple nodes, which have hardware resources. Application logic runs on compute nodes and data is stored on data nodes. There are other types of nodes, but these are the primary ones. A node might be part of a physical server (usually a virtual machine), a physical server, or even a cluster of servers, but the generic term node is useful when the underlying resource doesn’t matter. Usually it doesn’t matter.

In cloud-native Windows Azure applications, these compute nodes are Web Roles and Worker Roles. The thing to realize is that local storage on Web Roles and Worker Roles is not a safe place to keep important data long term. Well before getting to an event significant enough to be characterized as needing DR, small events such as a hard-disk failure can result in the loss of such data.

While not a DR issue per se due to the small scope, these applications should nevertheless apply the Node Failure Pattern (Chapter 10) to deal with this.

But the real solution is to not use local storage on compute nodes to store important business data. This is part of an overall strategy of using stateless nodes to enable your application to scale horizontally, which comes with many important benefits beyond just resilience to failure. Further details are described in the Horizontally Scaling Compute Pattern (Chapter 2).

Leverage Platform Services

In the United States, there are television commercials featuring “The Most Interesting Man in the World” who lives an amazing, fantastical life, and doesn’t always drink beer, but when he does he drinks DOS EQUIS.

image

In the cloud, our compute nodes do not always need to persist data long-term, but when they do, they use cloud platform services.

And the “DOS” in “DOS EQUIS” stands for neither Disk Operating System nor Denial of Service here, but rather is the number two in Spanish. But cloud platform services for data storage do better than dos, they have tres – as in three copies.

Windows Azure Storage and Windows Azure SQL Database both write three copies of each byte onto three independent servers on three independent disks. The hardware is commodity hardware – chosen for high value, not strictly for high availability – so it is expected to fail, and the failures are overcome by keeping multiple copies of every byte. If one of the three instances fails, a new third instance is created by making copies from the other two. The goal state is to continually have three copies of every byte.

Windows Azure Storage is always accessed through a REST interface, either directly or via a specific SDK which uses the REST interface under the hood. For any REST API call that modifies data, the API does not return until all three copies of the bytes are successfully stored.

Windows Azure SQL Database is always accessed through TDS, which is the same TCP-based protocol used by SQL Server. While your application is provided a single connection string, and you create a single TDS connection, behind the scenes there is a three-node cluster. For any operation that modifies data, the operation does not return until at least two copies of the update have been successfully applied on two of the nodes in this cluster; the third node is updated asynchronously.
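
The difference between the two acknowledgment rules just described (all three copies for Storage, two of three for SQL Database) can be illustrated with a toy Python model. This is only a sketch of the counting logic, not how either service is implemented; `replicated_write` is a hypothetical name and the replicas are plain lists.

```python
def replicated_write(replicas, required_acks, value):
    """Write to replicas in order and return once required_acks have succeeded.

    Returns (acknowledged, pending): indexes of the replicas written before
    returning, and indexes of those left to be updated asynchronously.
    """
    acknowledged = []
    for index, replica in enumerate(replicas):
        replica.append(value)
        acknowledged.append(index)
        if len(acknowledged) == required_acks:
            break
    pending = [i for i in range(len(replicas)) if i not in acknowledged]
    return acknowledged, pending


# Storage-style write: all three copies are stored before the call returns.
storage = [[], [], []]
storage_acked, storage_pending = replicated_write(storage, 3, "blob-bytes")
print("storage:", len(storage_acked), "acks,", len(storage_pending), "pending")

# SQL Database-style write: two of three, with the third catching up later.
database = [[], [], []]
db_acked, db_pending = replicated_write(database, 2, "row-update")
print("database:", len(db_acked), "acks,", len(db_pending), "pending")
```

In both cases the caller gets control back only after the data is on more than one node, which is what makes a single disk or server failure survivable.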

So if you have a Web Role or Worker Role in Windows Azure, and that node has to save data, it should use one of the persistent storage mechanisms just mentioned.

What about Windows Azure Virtual Machines?

Windows Azure also has a Virtual Machine node that you can deploy (Windows or Linux flavored), and the hard disks attached to those nodes are persistent, but how can that be? It turns out they are backed by Windows Azure Blob storage, so that doesn’t break the model. These nodes also have some storage that is truly local, which can be used for caching sorts of functions, but any long-term data is persisted to blob storage, even though the disk is indistinguishable from a local disk drive from the point of view of any code running on the virtual machine.

But wait, there’s more!

In addition to this, Windows Azure Storage asynchronously geo-replicates blobs and tables to a sister data center. There are eight Azure data centers, and they are paired as follows: East US-West US, North Central US-South Central US, North Europe-West Europe, and East Asia-Southeast Asia. Note that the pairs are chosen to be in the same geo-political region to simplify regulatory compliance in many cases. So if you save data to a blob in East US, three copies will be synchronously written in East US, then three more copies will be asynchronously written to West US.
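
The pairing described above is easy to capture as a lookup table. Here is a small Python sketch; the pair names come from the paragraph above, while `SISTER` and `replica_locations` are hypothetical names for illustration.

```python
# Data center pairs as listed above.
PAIRS = [
    ("East US", "West US"),
    ("North Central US", "South Central US"),
    ("North Europe", "West Europe"),
    ("East Asia", "Southeast Asia"),
]

# Build a lookup that works in both directions.
SISTER = {}
for first, second in PAIRS:
    SISTER[first] = second
    SISTER[second] = first


def replica_locations(primary):
    """Return where the six copies live: three local, three in the sister."""
    return {primary: 3, SISTER[primary]: 3}


print(replica_locations("East US"))
```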

It is easy to overlook the immense value of having data stored in triplicate and transparently geo-replicated. While the feature comes across rather matter-of-factly, you get incredibly rich DR features without lifting a finger. Don’t let the ease of use mask the great value of this powerful feature.

All of the local and geo-replication mentioned so far happens for free: it is included as part of the listed at-rest storage costs, and no action is needed on your part to enable this capability (though you can turn it off).

Enable More as Needed

All the replication listed above will help DR. If a hardware failure takes out one of your three local copies, the system self-heals – you will never even know most types of failures happen. If a natural disaster takes out a whole data center, Microsoft decides when to reroute DNS traffic for Windows Azure Storage away from the disabled data center and over to its sister data center which has the geo-replicated copies.

Note that the geo-replication is only out-of-the-box today for Windows Azure Storage (and not for queues – just for blobs and tables) and not for SQL Database. However, this can be enabled using the sync service available today – you decide how many copies and to which data centers and at what frequency.

Note that there are additional costs associated with using the sync service for SQL Database, for the sync service itself and for data center egress bandwidth.

Regardless of the mechanism, there is always a time-lag in asynchronous geo-replication, so if a primary data center was lost suddenly, the last few minutes worth of updates may not have been fully replicated. Of course, you could choose to write synchronously to two data centers for super-extra safety, but please consult the Network Latency Primer (Chapter 11) before doing so.
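
To make the time-lag point concrete, here is a minimal Python sketch that estimates which updates would not yet have reached the secondary data center if the primary were lost suddenly. The numbers are invented for illustration, and the fixed `replication_lag` is a simplifying assumption since real lag varies; `lost_updates` is a hypothetical name.

```python
def lost_updates(write_times, failure_time, replication_lag):
    """Return writes made before the failure that had not yet been replicated.

    A write made at time t is assumed to reach the secondary at
    t + replication_lag (a simplification; real lag varies).
    """
    return [
        t for t in write_times
        if t <= failure_time and t + replication_lag > failure_time
    ]


# Times in seconds. Hypothetical numbers: one write per minute for ten minutes,
# a 3-minute replication lag, and a failure at the 10-minute mark.
writes = list(range(0, 601, 60))
lost = lost_updates(writes, failure_time=600, replication_lag=180)
print(len(lost), "of", len(writes), "updates not yet replicated:", lost)
```

The size of that window is essentially the recovery point objective (RPO) you are accepting when you rely on asynchronous replication alone.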

This is all part of the overall Multisite Deployment Pattern (Chapter 15), though servicing a geo-distributed user base is another feature of this architecture pattern, beyond the DR features.

Where’s the Engineering?

The title of this blog post is “Engineering for Disaster Recovery in the Cloud” but where did all the engineering happen?

Much of what you need for DR is handled for you by cloud platform services, but not all of it. From time-to-time we alluded to some design patterns that your applications need to adhere to in order for these platform services to make sense. As one example, if your application is written to assume it is safe to use local storage on your web server as a good long-term home for business data, well… the awesomeness built into cloud platform services isn’t going to help you.

There is an important assumption here if you want to leverage the full set of services available in the cloud: you need to build cloud-native applications. These are cloud applications that are architected to align with the architecture of the cloud.

I wrote an entire book explaining what it means to architect a cloud-native application and detailing specific cloud architecture patterns to enable that, so I won’t attempt to cover it in a blog post, except to point out that many of the architectural approaches of traditional software will not be optimal for applications deployed to the cloud.

Distinguish HE from DR

Finally, we need to distinguish DR from HE – Disaster Recovery from Human Error.

Consider how the DR features built into the cloud will not help with many classes of HE. If you modify or delete data, your changes will dutifully be replicated throughout the system. There is no magic “undo” in the cloud. This is why you usually will still want to take control of making back-ups of certain data.

So backups are still desirable. There are cloud platform services to help you with backups, and some great third-party tools as well. Details on which to choose warrant an entire blog post of their own, but hopefully this post at least clarifies the different needs driven by DR vs. HE.

Is This Enough?

Maybe. It depends on your business needs. If your application is one of those rare applications that needs to be responsive 24×7 without exception, not even for a natural disaster, then no, this is not enough. If your application is a line-of-business application (even an important one), often it can withstand a rare outage under unusual circumstances, so this approach might be fine. Most applications are somewhere in between and you will need to exercise judgement in weighing the business value against the engineering investment and operational cost of a more resilient solution.

And while this post talked about how following some specific cloud architecture patterns to design cloud-native applications provides a great deal of out-of-the-box resilience in DR situations, it did not cover ongoing continuity, such as with computation, or immediate access to data from multiple data centers. If you rely entirely on the cloud platform to preserve your data, you may not have access to it for a while since (as mentioned earlier, and emphasized nicely in Neil’s comment) you don’t control all the failover mechanisms; you will need to wait until Microsoft decides to fail over the DNS for Windows Azure Storage, for example. And remember that background geo-replication does not guarantee zero data loss: some changes may be lost due to the additional latency needed in moving data across data centers, and not all data is geo-replicated (such as queued messages and some other data not discussed).

In ITIL terms, “how much data can I stand to lose” is the recovery point objective (RPO), and “how long can I be down” is the recovery time objective (RTO). The RPO and RTO are useful concepts for modeling DR.
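
To make the two objectives concrete, here is a minimal Python sketch that compares an estimated replication lag and restore time against RPO and RTO targets. Every name and number in it is hypothetical and chosen only for illustration; none of them is a measurement or guarantee of any Windows Azure service.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class DrTargets:
    """Business targets (hypothetical values, set per application)."""
    rpo: timedelta  # most data, measured in time, we can stand to lose
    rto: timedelta  # longest we can stand to be down


def meets_targets(targets, replication_lag, time_to_restore):
    """Compare estimated worst-case figures against the targets.

    replication_lag: how far the secondary copy may trail the primary.
    time_to_restore: how long until the application is serving again.
    """
    return {
        "rpo_met": replication_lag <= targets.rpo,
        "rto_met": time_to_restore <= targets.rto,
    }


# Illustrative numbers only (assumptions, not platform guarantees).
targets = DrTargets(rpo=timedelta(minutes=15), rto=timedelta(hours=4))
result = meets_targets(
    targets,
    replication_lag=timedelta(minutes=5),
    time_to_restore=timedelta(hours=8),
)
print(result)  # {'rpo_met': True, 'rto_met': False}
```

In this made-up case the data-loss target is met but the downtime target is not, which is the kind of gap that waiting on a platform-controlled failover can create.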

So the DR capabilities built into cloud platform services are powerful, but somewhat short of all-encompassing. However, they do offer a toolbox providing you with unprecedented flexibility in making this happen.

Is This Specific to the Cloud?

The underlying need to understand RPO and RTO and use them to model for DR is not specific to the cloud. These are very real issues in on-premises systems as well. The approaches to addressing them may vary, however.

Generally speaking, while the cloud does not excuse you from thinking about these important characteristics, it does provide some handy capabilities that make it easier to overcome some of the more challenging data-loss threats. Hopefully this allows you to sleep better at night.

—-

Bill Wilder is the author of the book Cloud Architecture Patterns – Develop Cloud-Native Applications from O’Reilly. This post complements the content in the book. Feel free to connect with Bill on twitter (@codingoutloud) or leave a comment on this post. (He’s also warming up to Google Plus.)


—-

Recently I encountered a strange error when attempting some storage-related activities using Windows Azure Tools within Visual Studio 2012. When either adding a new storage account or changing Connection String settings I was met with:

The certificate for the given thumbprint could not be loaded from the Current User/Personal certificate store. Please install the certificate.

While I was able to resolve the error, I cannot reproduce it (and gave up trying), but if you face the same problem, hopefully this will help you.

Background

If Visual Studio was looking for a certificate, where was it looking? It turns out that the Windows Azure Tools for Visual Studio store some certificate-related references in a file called Windows Azure Connections.xml in your personal settings area on Windows. This file is created on your behalf once you’ve created any Publish Profiles by Publishing Cloud Services to Windows Azure from Visual Studio.

The file lives here:

%UserProfile%\Documents\Visual Studio 2012\Settings\Windows Azure Connections.xml

On my Windows 8 development machine, this is:

C:\Users\billdev\My Documents\Visual Studio 2012\Settings\Windows Azure Connections.xml

The file contains the credentials you’d previously supplied during publishing and will look something like the following:

<?xml version="1.0"?>
<NamedCredentials xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <Items>
    <NamedCredential>
      <SubscriptionId>12345678-abcd-ae10-abba-01812e1e1000</SubscriptionId>
      <IsImported>false</IsImported>
      <ServiceEndpoint>https://management.core.windows.net/</ServiceEndpoint>
      <CertificateThumbprint>A123B123C12333EC954471ED75C37D59003681F7</CertificateThumbprint>
      <Name>Page of Photos</Name>
    </NamedCredential>
  </Items>
  <LastUsedName>Page of Photos</LastUsedName>
</NamedCredentials>

Note that the NamedCredential XML element item may be repeated.
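
If you want to see at a glance which subscriptions and thumbprints the file references, a short script can list them. The following is a minimal Python sketch, not part of the Windows Azure Tools. It assumes the file has the shape shown above, and the function name list_credentials is made up for illustration. SAMPLE mirrors the example file so the script runs on any machine.

```python
import xml.etree.ElementTree as ET

# Mirrors the example file above; replace with the text of your own file.
SAMPLE = """<?xml version="1.0"?>
<NamedCredentials xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <Items>
    <NamedCredential>
      <SubscriptionId>12345678-abcd-ae10-abba-01812e1e1000</SubscriptionId>
      <IsImported>false</IsImported>
      <ServiceEndpoint>https://management.core.windows.net/</ServiceEndpoint>
      <CertificateThumbprint>A123B123C12333EC954471ED75C37D59003681F7</CertificateThumbprint>
      <Name>Page of Photos</Name>
    </NamedCredential>
  </Items>
  <LastUsedName>Page of Photos</LastUsedName>
</NamedCredentials>
"""


def list_credentials(xml_text):
    """Return one dict per NamedCredential element found in the XML text."""
    root = ET.fromstring(xml_text)
    credentials = []
    # NamedCredential may be repeated, so walk every occurrence.
    for item in root.iter("NamedCredential"):
        credentials.append({
            "name": item.findtext("Name"),
            "subscription_id": item.findtext("SubscriptionId"),
            "thumbprint": item.findtext("CertificateThumbprint"),
        })
    return credentials


for cred in list_credentials(SAMPLE):
    print(cred["name"], cred["subscription_id"], cred["thumbprint"])
```

Each line of output pairs a subscription with the certificate thumbprint that Visual Studio will look for on its behalf.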

Problem

The problem turns out to be that one of the certificates referenced (identified by thumbprint via the CertificateThumbprint XML element) is either not installed properly locally or not installed in the associated Windows Azure Subscription (identified by the SubscriptionId XML element).

Solution

For each certificate referenced by a CertificateThumbprint element (there could be more than one, unlike the simple example shown above):

  1. Make sure the certificate is installed in your local certificate store and contains a Private Key. It can usually be found in the Personal (or “My”) store under the Current User certificates, using the Certificates Snap-in with Microsoft Management Console. (You can also use certmgr.exe or write your own code to dump certificate info.) If the certificate exists in your local certificate store then it is probably fine; it is unlikely to be missing a Private Key, but it is possible.
  2. Make sure the certificate has been uploaded to the Windows Azure Portal for the SubscriptionId referenced within Windows Azure Connections.xml.

That’s it. Should work. Worst case you can delete each element of your Windows Azure Connections.xml profile and start over.
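
When there are several certificates to check, comparing thumbprints by eye gets tedious. Here is a minimal Python sketch of the comparison only. It does not query the Windows certificate store; it assumes you have gathered the installed thumbprints yourself (for example, from the Certificates Snap-in). The function names are made up for illustration. Thumbprints copied out of a dialog may contain spaces or other stray characters, so the sketch strips everything except hex digits before comparing.

```python
import re


def normalize_thumbprint(text):
    """Keep only hex digits and upper-case them."""
    return re.sub(r"[^0-9A-Fa-f]", "", text).upper()


def missing_thumbprints(referenced, installed):
    """Return the referenced thumbprints that have no installed match."""
    installed_set = {normalize_thumbprint(t) for t in installed}
    return [t for t in referenced
            if normalize_thumbprint(t) not in installed_set]


# Thumbprints referenced by Windows Azure Connections.xml. The second
# value is a made-up placeholder, added to show a mismatch.
referenced = [
    "A123B123C12333EC954471ED75C37D59003681F7",
    "0000000000000000000000000000000000000000",
]

# Thumbprints you found locally, pasted as displayed (hypothetical input).
installed = ["a1 23 b1 23 c1 23 33 ec 95 44 71 ed 75 c3 7d 59 00 36 81 f7"]

missing = missing_thumbprints(referenced, installed)
print(missing)  # ['0000000000000000000000000000000000000000']
```

Any thumbprint this reports still needs the first check in the list above. The second check, against the Windows Azure Portal, can be done the same way using the thumbprints shown there.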

Specific Scenarios

These are the two specific scenarios where I saw the problem in case you are interested.

Scenario #1

This scenario failed whether or not a project was open.

  1. Open the Server Explorer window in Visual Studio
  2. Right-click on Windows Azure Storage, choose “Add New Storage Account…”, and the error dialog appears:
    “The certificate for the given thumbprint could not be loaded from the Current User/Personal certificate store. Please install the certificate.”
  3. This message is extra confusing since I don’t think there ought to be any certificates involved here. And no project/solution is open.

Scenario #2

This scenario requires an open Azure project.

  1. Open the UI tool for editing Azure configuration by opening your Cloud Project in Solution Explorer, drilling into Roles, and double-clicking on a Web Role or Worker Role project. The Role configuration editor window opens in Visual Studio.
  2. Choose Settings, then Add Setting (which creates Setting1 of Type=String), change Setting1’s Type to Connection String, and then click the “…” button at far right (to pop up the connection string edit window), and an error dialog appears:
    “The certificate for the given thumbprint could not be loaded from the Current User/Personal certificate store. Please install the certificate.”

Here are the screen shots for the two error dialogs (slightly different).

image
