Category Archives: Programming

Related to some aspect of programming, software development, related tools, or supporting technologies, related standards, etc.

Choosing CORS over JSONP over Inline… and Lessons Learned Using CORS

I recently created a very simple client-only web application (no server-side code) that loads the data it needs dynamically. In order to access data from another storage location (in my case the data came from a Windows Azure Blob), the application needed to make a choice: how to load the data.

It really came down to three choices:

  1. Load the data synchronously as the page loaded using an Inline script tag
  2. Load the data asynchronously as part of initial page load using JSONP
  3. Load the data asynchronously as part of initial page load using CORS

All 3 options effectively work within the Same Origin Policy (SOP) sandbox security measures that browsers implement. If access is not coming from a browser (but from, say, curl or a server application), SOP has no effect. SOP is there to protect end users from web sites that might not behave themselves.

Option 1 would be to have a hardcoded script tag load the data. One disadvantage of this is put perfectly by Douglas Crockford: “A <script src="url"></script> will block the downloading of other page components until the script has been fetched, compiled, and executed.” This means that the page will block while the data is loaded, potentially making the initial load appear a bit more visually chaotic. Also, if this technique is the only mechanism for loading data, then once the page is loaded the data is never refreshed, a potentially severe limitation for some applications. In the very old days, the best we could do was periodically trigger a full-page refresh, but that’s not state-of-the-art in 2014.

Option 2 would be to load the data asynchronously using JSONP. This is a fine solution from a user experience point of view: the page structure is first loaded, then populated once the data arrives. Because the Same Origin Policy blocks cross-domain XMLHttpRequest calls, JSONP works around it by dynamically injecting a script tag, whose response is executable JavaScript that delivers the data.

Option 3 would be to load the data asynchronously using CORS. This offers essentially the identical user experience as option 2, with the client invoking the request using the XMLHttpRequest object in JavaScript.

Options 1 and 2 require that the data be encapsulated in JavaScript code. For option 2 with JSONP the convention is a function (often named callback) that simply returns a JSON object. The client making the call will then need to execute the function to get at the data. Option 1 has slightly more flexibility and could be simply a data structure declared with a known name like var mapData = ... which the client can access directly.
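To make the packaging difference concrete, here is a short sketch of what the fetched script body amounts to in each case. The names follow the conventions mentioned above (`mapData` for option 1, a function named `callback` for option 2); the data itself is illustrative.

```javascript
// Option 1 (inline): the fetched script declares a global with a known name,
// which the page can then read directly.
var mapData = { regions: ["East US", "North Europe"] };

// Option 2 (JSONP): the page defines a function (often named callback) before
// requesting the script; the response wraps the JSON in a call to it.
function callback(json) {
  return json.regions.length;
}

// Evaluating a JSONP response is equivalent to executing this line:
var regionCount = callback({ regions: ["East US", "North Europe"] });
```

Either way, the client only gets at the data by running JavaScript delivered by the server, which is the key contrast with CORS below.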

Option 3 with CORS is able to return the data directly. In that regard it is slightly more efficient since no bubble-wrap is needed – and it is a lot safer since you are not executing a JavaScript function returned by a potentially untrusted server.

JSONP is not based on any official standard, but is common practice. CORS is a standard that is supported in modern browsers and comes with granular access policies. As an example, CORS policies can be set to allow access from a whitelist of domains (such as paying customers), while disallowing from any other domain.

For all three options there needs to be coordination between the client and the server since they need to agree on how the data is packaged for transmission. For CORS, this also requires browser support (see chart below). All options require that JavaScript is enabled in the client browser.

Summarizing CORS, JSONP, Inline

The following summary compares key qualities.

| | Inline JavaScript | JSONP | CORS | Comments |
| --- | --- | --- | --- | --- |
| Synchronous or async | Synchronous | Async | Async | |
| Granular domain-level security | No | No | Yes | In any of the three you could also implement an authorization scheme; this is above and beyond that. |
| Risk of executing untrusted code | No | Yes | No | JSONP requires that you execute a JavaScript function to get at the data; neither of the other two approaches requires that. Extra caution is needed for JSONP data sources outside of your control. |
| Efficiency on the wire | Close | Close | Most efficient | Inline and JSONP both wrap your data in JavaScript constructs, which adds a small amount of overhead. Depending on what you are doing, this could add up, but it is minor. |
| Browser support | Full | Full | Partial | |
| Server support | Full | Full | Partial | Servers need to support the CORS handshake with browsers to (a) deny disallowed domains, and (b) give browsers the information they need to honor restrictions. |
| Supported by a standard | No | No | Yes | |
| Is it the future? | No | No | Yes | Safer. Granular security. Standardized. Most efficient. |

Lessons Learned Using CORS

Yes, my simple one-page map app (described here) ended up using CORS, in large part because it is mature and the browser support (see below) was sufficient.

Reloading Browser Pages: In debugging, CTRL-F5 is your friend in Chrome, Firefox, and IE if you want to clear the cache and reload the page you are on. I did this a lot as I was continually enabling and disabling CORS on the server to test out the effects.

Overriding CORS Logic in Chrome: It turns out that Chrome normally honors all CORS settings. This is what most users will see. Let’s call this “civilian mode” for Chrome. But there’s also a developer mode – which you enable by running Chrome with the chrome.exe --disable-web-security parameter. I was initially confused since it seemed Chrome’s CORS support didn’t work, but of course it did. This is one of the perils of living with a software nerd; my wife had used my computer and changed this setting long ago when she needed to build some CORS features, and I never knew until I ran into the perplexing issue.

Handling CORS Rejection: Your browser may not let your JavaScript code know directly that a remote call was rejected due to a CORS policy. Some browsers silently map a 404 status to 0 when the request is against a CORS-protected resource. You’ll see this mentioned in the code for httpGetString.js (if you look at my sample code).
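Here is a minimal sketch of the idea (this is illustrative, not the actual httpGetString.js code): an XMLHttpRequest wrapper that treats status 0 as a possible CORS rejection rather than a missing resource.

```javascript
// Illustrative sketch: fetch a resource with XMLHttpRequest and surface
// status 0, which in some browsers signals a CORS rejection.
function httpGetString(url, onSuccess, onFailure) {
  var xhr = new XMLHttpRequest();
  xhr.open("GET", url);
  xhr.onreadystatechange = function () {
    if (xhr.readyState !== 4) return; // 4 = request complete
    if (xhr.status === 200) {
      onSuccess(xhr.responseText);
    } else {
      // Status 0 here may mean the request was blocked by a CORS policy,
      // not (only) that the resource was missing.
      onFailure(xhr.status);
    }
  };
  xhr.send();
}
```

So the failure callback, not just the success path, needs to be written with CORS in mind.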

Testing CORS from curl: Helped by a post on StackOverflow, I found it very handy to look at CORS headers from the command line. Note that you need to provide SOME origin in the request for it to be valid CORS, but here’s the command that worked for my cloud-host resource (you should also be able to run this same command):

curl -H "Origin: http://localhost" -H "Access-Control-Request-Method: GET" -H "Access-Control-Request-Headers: X-Requested-With" -X OPTIONS --verbose http://azuremap.blob.core.windows.net/maps/azuremap.geojson

Browser Support for CORS

To understand where CORS support stands with web browsers, the fantastic site http://caniuse.com/cors offers a nice visual showing CORS support across today’s browsers. A corresponding chart for JSONP is not needed since it works within long-standing capabilities.

[Chart: browser support for CORS, from caniuse.com]

Resources

http://en.wikipedia.org/wiki/Same-origin_policy

My simple one-page map app is described here. That page includes a link to a running instance and its source code is easily viewed with View Source.

http://blog.auth0.com/2014/01/27/ten-things-you-should-know-about-tokens-and-cookies/#preflight

Stupid Azure Trick #5 – Got a Penny? Run a Simple Web Site 100% on Blob Storage for a Month – Cost Analysis Provided

Suppose you have a simple static web site you want to publish, but your budget is small. You could do this with Windows Azure Storage as a set of blobs. The “simple static” qualifier rules out ASP.NET and PHP and Node.js – and anything that does server-side processing before serving up a page. But that still leaves a lot of scenarios – and does not preclude the site from being interactive or loading external data using AJAX and behaving like it is dynamic. This one does.

Check out the web site at http://azuremap.blob.core.windows.net/apps/bingmap-geojson-display.html.

[Screenshot: the azuremap site rendering Windows Azure data center regions on a Bing Map]

You may recognize the map from an earlier post that showed how one could visualize Windows Azure Data Center Regions on a map. It should look familiar because this web site uses the exact same underlying GeoJSON data used earlier, except this time the map implementation is completely different. This version has JavaScript code that loads and parses the raw GeoJSON data and renders it dynamically by populating a Bing Maps viewer control (which is also in JavaScript).

But the neat part is there’s only JavaScript behind the scenes. All of the site’s assets are loaded directly from Windows Azure Blob Storage (plus Bing Maps control from an external location).

Here’s the simple breakdown. There is the main HTML page (the URL specifies that directly), and that in turn loads the following four JavaScript files:

  1. http://ecn.dev.virtualearth.net/mapcontrol/mapcontrol.ashx?v=7.0 – version 7.0 of the Bing Map control
  2. httpGetString.js – general purposes data fetcher (used to pull in the GeoJSON data)
  3. geojson-parse.js – application-specific to parse the GeoJSON data
  4. bingmap-geojson-display.js – application-specific logic to put elements from the GeoJSON file onto the Bing Map

I have not tried this to prove the point, but I think that to render on, say, Google Maps, the only JavaScript that would need to change would be bingmap-geojson-display.js (presumably replaced by googlemap-geojson-display.js).

Notice that the GeoJSON data lives in a different Blob Storage Container here:  http://azuremap.blob.core.windows.net/maps/azuremap.geojson. We’ll get into the details in another post, but in order for this to work – in order for …/apps/bingmap-geojson-display.html to directly load a JSON data file from …/maps/azuremap.geojson – we enabled CORS for the Blob Service within the host Windows Azure Storage account.

Costs Analysis

Hosting a very low-cost (and low-complexity) web site as a few blobs is really handy. It is very scalable and robust. Blob Storage costs come from three sources:

  1. cost of data at rest – for this scenario, Block Blobs and Locally Redundant Storage would probably be appropriate, and the cost there is $0.068 per GB / month (details)
  2. storage transactions – $0.005 per 100,000 transactions (details – same as above link, but look lower on the page) – where a storage transaction is (loosely speaking) a file read or write operation
  3. outbound data transfers (data leaving the data center) – first 5 GB / month is free, then there’s a per GB cost (details)

The azuremap web site shown earlier weighs in at under 18 KB and is spread across 5 files (1 .html, 3 .js, 1 .geojson). If we assume a healthy 1000 hits a day on our site, here’s the math.

  • We have around 1000 x 31 = 31,000 visits per month.
  • Cost of data at rest would be 18 KB x $0.068 / GB = effectively $0. Since storage starts at less than 7 cents per GB and our data is 5 orders of magnitude smaller, the cost is too small to meaningfully measure.
  • Storage transactions would be 31,000 x 5 (one per file in our case) x $0.005 / 100,000 = $0.00775, or a little more than 3/4 of a penny in US currency per month, around 9 cents per year, or $1 every 11 years.
  • Outbound data transfer total would be 31,000 x 18 KB = 560 MB, which is around 1/10th of the amount allowed for free, so there’d be no charge for that.

So our monthly bill would be less than 1 penny (less than US$0.01).
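The arithmetic in the bullets above is easy to sanity-check with a few lines of script (the figures are the ones assumed in this post: 1000 hits/day, 5 files, 18 KB per visit):

```javascript
// Sanity-check the monthly cost math from the bullets above.
const visits = 1000 * 31;                        // ~31,000 visits per month
const files = 5;                                 // 1 .html + 3 .js + 1 .geojson
const txDollars = (visits * files / 100000) * 0.005; // $0.005 per 100,000 transactions
const egressGB = (visits * 18) / (1024 * 1024);      // 18 KB transferred per visit

console.log(txDollars); // ≈ 0.00775 – a bit over 3/4 of a penny
console.log(egressGB);  // ≈ 0.53 GB, roughly a tenth of the 5 GB free tier
```

Data at rest is omitted since, at 18 KB against a per-GB price, it rounds to zero.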

This is also a good (though very simple) example of the sort of cost analysis you will need to do when understanding what it takes to create cloud applications or migrate from on-premises to the cloud. The Windows Azure Calculator and information on lower-cost commitment plans may also prove handy.

Alternative Approaches

Of course in this day and age, for a low-cost simple site it is hard to beat Windows Azure Web Sites. There’s an entirely free tier there (details) – allowing you to save yourself nearly a penny every month. That’s pretty good since Benjamin Franklin, one of America’s founding fathers, famously quipped “A penny saved is a penny earned!”

Windows Azure Web Sites also has other features – your site can be in PHP or ASP.NET or Node.js or Python. And you can get continuous deployment from GitHub or Bitbucket or TFS or Dropbox or others. And you get monitoring and other features from the portal. And more.

But at least you know you can host in blob storage if you like.

[This is part of a series of posts on #StupidAzureTricks, explained here.]

Dumping objects one property at a time? A Pretty-Printer for C# objects that’s Good Enough™

Over the years, I’ve written a lot of code that simply dumps out an object’s properties. Sometimes this is for debugging, sometimes it is for output via Console.WriteLine. But a lot of those cases are plain old BORING, and the only reason I end up typing obj.foo, obj.bar, and obj.gizmo is that I was too lazy to figure out how to easily stringify an entire object at a time – so I kept doing it one property (and sub-property (and sub-sub-property ..)) at a time.

I know that ToString() is supposed to help out (in .NET at least), but you probably noticed how uncommon it is for this to be usefully implemented.

There’s a better way.

A Pretty-Printer for C# objects that’s usually Good Enough™

The simple way to dump objects that’s often good enough (but not always good enough) is to use Json.NET’s object serializer.

Add Json.NET using NuGet, then use a code snippet like the following to dump out an object named someObject:

Console.WriteLine(Newtonsoft.Json.JsonConvert.SerializeObject(
    someObject, Newtonsoft.Json.Formatting.Indented));

That’s pretty much it. That’s the whole trick.

Note: You can use Formatting.None instead of Formatting.Indented if you want a more compact output (though harder to read).

Here are a couple of reasons why this isn’t always good enough:

  • You get the WHOLE object graph (no filtering – but see this and this)
  • Fields appear in JSON in the order they appear in the object – you don’t get to change it
  • Not easily massaged (e.g., do you want only a certain number of decimal places?)
  • (Probably more since I just started using this…)

Useful in other languages

This hack applies to any language that supports JSON serializers and formatters. For example, in Python, check out the json module.
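In JavaScript, for instance, the equivalent one-liner is the built-in JSON.stringify, where the third argument controls indentation (the object here is made up for illustration):

```javascript
// JSON.stringify is JavaScript's built-in version of the same trick.
const someObject = { site: "azuremap", files: 5, totalKB: 18 };

// Third argument = indentation; omit it for compact, Formatting.None-style output.
const pretty = JSON.stringify(someObject, null, 2);
console.log(pretty);
```

Note it has the same caveats as Json.NET: you get the whole (serializable) object graph, in declaration order, with no formatting of individual values.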

Examples in C#

Here are a couple of examples using a CORS tool I was fiddling with. In these examples, the serviceProperties object is of type ServiceProperties, a class from the Windows Azure Storage SDK for .NET.

Dump Just CORS:
Newtonsoft.Json.JsonConvert.SerializeObject(
serviceProperties.Cors, Formatting.Indented);

"Cors": {
   "CorsRules": [
      {
         "AllowedOrigins": [
            "*"
         ],
         "ExposedHeaders": [
            "*"
         ],
         "AllowedHeaders": [
            "*"
         ],
         "AllowedMethods": 1,
         "MaxAgeInSeconds": 36000
      }
   ]
}

Dump Entire Properties object:

Newtonsoft.Json.JsonConvert.SerializeObject(serviceProperties, Formatting.Indented);

{
   "Logging": {
      "Version": "1.0",
      "LoggingOperations": 0,
      "RetentionDays": null
   },
   "Metrics": {
      "Version": "1.0",
      "MetricsLevel": 0,
      "RetentionDays": null
   },
   "HourMetrics": {
      "Version": "1.0",
      "MetricsLevel": 0,
      "RetentionDays": null
   },
   "Cors": {
      "CorsRules": [
      {
         "AllowedOrigins": [
            "*"
         ],
         "ExposedHeaders": [
            "*"
         ],
         "AllowedHeaders": [
            "*"
         ],
         "AllowedMethods": 1,
         "MaxAgeInSeconds": 36000
      }
      ]
   },
   "MinuteMetrics": {
      "Version": "1.0",
      "MetricsLevel": 0,
      "RetentionDays": null
   },
   "DefaultServiceVersion": null
}

To use another concrete example, consider a simple program that I wrote a while back called DumpAllWindowsCerts.cs. The program just iterates through the Certificate Store on the current machine and dumps out a bunch of information. It uses Console.WriteLine statements to do this.

To compare the old and new outputs, I jumped to the LAST Console.WriteLine statement in the file and changed it to a JsonConvert.SerializeObject statement. Here’s what happened.

Note that the old Console.WriteLine statement was very limited since the contents of these objects varied a lot, so I had kept it simple (I didn’t know what I wanted, really). But the JSON output is pretty reasonable.

————————————————– Console.WriteLine

OID = Key Usage
OID = Basic Constraints [Critical]
OID = Subject Key Identifier
OID = CRL Distribution Points
...

————————————————– JSON.NET

{
   "KeyUsages": 198,
   "Critical": false,
   "Oid": {
      "Value": "2.5.29.15",
      "FriendlyName": "Key Usage"
   },
   "RawData": "AwIBxg=="
}
{
   "CertificateAuthority": true,
   "HasPathLengthConstraint": false,
   "PathLengthConstraint": 0,
   "Critical": true,
   "Oid": {
      "Value": "2.5.29.19",
      "FriendlyName": "Basic Constraints"
   },
   "RawData": "MAMBAf8="
}
{
   "SubjectKeyIdentifier": "DAED6474149C143CABDD99A9BD5B284D8B3CC9D8",
   "Critical": false,
   "Oid": {
      "Value": "2.5.29.14",
      "FriendlyName": "Subject Key Identifier"
   },
   "RawData": "BBTa7WR0FJwUPKvdmam9WyhNizzJ2A=="
}
{
   "Critical": false,
   "Oid": {
      "Value": "2.5.29.31",
      "FriendlyName": "CRL Distribution Points"
   },
   "RawData": "MDkw...9iamVjdC5jcmw="
}

Talk: Windows Azure Web Sites are PaaS 2.0

Last night I had the chance to speak as part of the “Some prefer PaaS over IaaS clouds” event at the Boston Cloud Services Meetup. Thanks to J Singh for inviting me; I enjoyed speaking with many of the attendees.

Some info:

Also, for those interested, next week I am giving an extended version of this talk where there will be more time (60-75 minutes) – and I promise the demos will not be inhibited by screen resolution problems! This will be at the Boston Azure User Group meeting on Tuesday Jan 21 which will take place at the NERD Center at 1 Memorial Drive in Cambridge, with pizza provided (thanks to Carbonite).

Talk: Make the Cloud Less Cloudy: A Perspective for Software Development Teams: It’s all about Productivity

Today I gave a talk at Better Software Conference East 2013 about how the cloud impacts your development team. The talk was called “Making the Cloud Less Cloudy: A Perspective for Software Development Teams” and was heavy with short demos on making your dev team more productive, then a slightly longer look into how you can evolve your application to fully go cloud-native with some interesting patterns. All the demos showed off the Windows Azure Cloud Platform, though, as I explained, most of the techniques are general and can be used with other platforms such as Amazon Web Services (AWS).

Tweet stream: twitter.com/#bsceadc

http://bsceast.techwell.com/sme-profiles/bill-wilder

http://bsceast.techwell.com/sessions/better-software-conference-east-2013/make-cloud-less-cloudy-perspective-software-developmen

The deck doesn’t mention this explicitly, but all of my demos (and my slide presentation) were done from the cloud! Yes, I was in the room, but my laptop was remotely connected to a Windows Azure Virtual Machine running in Microsoft’s East US Windows Azure data center. It worked flawlessly. 🙂

Here’s the PowerPoint Deck:

Talk: Telemetry: Beyond Logging to Insight

Today I spoke at the NYC Code Camp. My talk was Telemetry: Beyond Logging to Insight and focused on Event Tracing for Windows (ETW), ETW support in .NET 4.5, some .NET 4.5.1 additions, Semantic Logging Application Block (SLAB), Semantic Logging, and a number of other tools and ideas for using logging and other means to generate insight and answer questions. In order to allow this, “logging” needs to be structured, which ETW facilitates. In order for the structured data to make sense, developers need to be disciplined, which the Semantic Logging mindset supports.

The talk abstract and the slide deck used are both included below.

ABSTRACT

What is my application doing? This question can be difficult to answer in distributed environments such as the cloud. Parsing logs doesn’t cut it anymore. We need insight. In this talk we look at current logging approaches, contrast it with Telemetry, mix in the Semantic Logging mindset, and then use some new-fangled tools and techniques (enabled by .NET 4.5) alongside some old-school tools and techniques to see how to apply this goodness in our code. Event Tracing for Windows (ETW), the Semantic Logging Application Block, and several other tools and technologies will play a role.

DECK

Telemetry with Event Tracing for Windows (ETW), EventSource, and Semantic Logging Application Block (SLAB) — NYC CC — 14-September-2013 — Bill Wilder (blog.codingoutloud.com)

Talk (Guest Speaker at BU): Architecting to be Cloud Native – On Windows Azure or Otherwise

Tonight I had the honor of being a guest lecturer at a Boston University graduate cloud computing class – BU MET CS755, Cloud Computing, taught by Dino Konstantopoulos.

The theme of my talk was Architecting to be Cloud Native – On Windows Azure or Otherwise. The slide deck I used is included below.

Night class is tough. Thanks for a warm reception – so congratulations and many thanks to those of you able to stay awake until 9:00 PM (!).

I hope to see all of you at future Boston Azure events – to get announcements, join our Meetup Group. We are also the world’s first/oldest Azure User Group. Here are a couple of upcoming events:

Feel free to reach out with any questions (twitter (@codingoutloud) or  email (codingoutloud at gmail)) — especially if it will be “on the midterm” – and good luck in the cloud!

Bill Wilder

[Image: Cloud Architecture Patterns book cover]

Talk: Azure Best Practices – How to Successfully Architect Windows Azure Apps for the Cloud

Webinar Registration:

  • Azure Best Practices – How to Successfully Architect Windows Azure Apps for the Cloud @ 1pm ET on 13-March-2013
  • VIEW RECORDING HERE: http://bit.ly/ZzQDDW 

Abstract:

Discover how you can successfully architect Windows Azure-based applications to avoid and mitigate performance and reliability issues in our live webinar.

Microsoft’s Windows Azure cloud offerings provide you with the ability to build and deliver a powerful cloud-based application in a fraction of the time and cost of traditional on-premises approaches. So what’s the problem? Tried-and-true traditional architectural concepts don’t apply when it comes to cloud-native applications. Building cloud-based applications must factor in answers to questions such as:

  • How to scale?
  • How to overcome failure?
  • How to build a manageable system?
  • How to minimize monthly bills from cloud vendors?

During this webinar, we will examine why cloud-based applications must be architected differently from that of traditional applications, and break down key architectural patterns that truly unlock cloud benefits. Items of discussion include:

  • Architecting for success in the cloud
  • Getting the right architecture and scalability
  • Auto-scaling in Azure and other cloud architecture patterns

If you want to avoid long nights, help-desk calls, and frustrated business owners and end-users, then don’t miss this webinar or your chance to learn how to deliver highly scalable, high-performance cloud applications.

Deck:

Book:

The core ideas were drawn from my Cloud Architecture Patterns (O’Reilly Media, 2012) book:

[Image: Cloud Architecture Patterns book cover]

Hosted by Dell:

[Image: Dell logo]

Azure Cloud Storage Improvements Hit the Target

Windows Azure Storage (WAS)

Brad Calder SOSP talk from http://www.youtube.com/watch?v=QnYdbQO0yj4

Brad Calder delivering SOSP talk

Since its initial release, Windows Azure has offered a storage service known as Windows Azure Storage (WAS). According to the SOSP paper and related talk published by the team (led by Brad Calder), WAS is architected to be a “Highly Available Cloud Storage Service with Strong Consistency.” Part of being highly available is keeping your data safe and accessible. The SOSP paper mentions that the WAS service retains three copies of every stored byte, plus (announced a few months before the SOSP paper) another asynchronously geo-replicated trio of copies in a data center hundreds of miles away in the same geo-political region. Six copies in total.

WAS is a broad service, offering not only blob (file) storage, but also a NoSQL store and a reliable queue.

Further, all of these WAS storage offerings are strongly consistent (as opposed to other storage approaches which are sometimes eventually consistent). Again citing the SOSP paper: “Many customers want strong consistency: especially enterprise customers moving their line of business applications to the cloud.” This is because traditional data stores are strongly consistent and code needs to be specially crafted in order to handle an eventually consistent model. This simplifies moving existing code into the cloud.

The points made so far are just to establish some basic properties of this system before jumping into the real purpose of this article: performance at scale. The particular points mentioned (highly available, storage in triplicate and then geo-replicated, strong consistency, plus NoSQL database and reliable queuing features) were highlighted since they might be considered disadvantages: rich capabilities that could be expected to hamper scalability and performance. Except that they don’t hamper scalability and performance at all. Read on for details.

Performance at Scale

A couple of years ago, Nasuni benchmarked the most important public cloud vendors on how their services performed on cloud file storage at scale (using workloads modeled after those observed from real world business scenarios). Among the public clouds tested were Windows Azure Storage (though only the blob/file storage aspect was considered), Amazon S3 (an eventually consistent file store), and a couple of others.

In the first published result in 2011, Nasuni declared Amazon S3 the overall winner, prevailing over Windows Azure Storage and others, though WAS finished ahead of Amazon in some of the tests. At the time of these tests, WAS was running on its first-generation network architecture and supported capacity as described in the team’s published scalability targets from mid-2010.

In 2012, Microsoft network engineers were busy implementing a new data center network design they are calling Quantum 10 (or Q10 for short). The original network design was hierarchical, but the Q10 design is flat (and uses other improvements like SSD for journaling). The end result of this dramatic redesign is that WAS-based network storage is much faster, more scalable, and as robust as ever. The corresponding Q10 scalability targets were published in November 2012 and show substantial advances. EDIT: the information on scalability targets and related factors is kept up to date in official documentation here.

Q10 was implemented during 2012 and apparently was in place before Nasuni ran its updated benchmarks between November 2012 and January 2013. With its fancy new network design in place, WAS really shined. While the results in 2011 were close, with Amazon S3 being the overall winner, in 2012 the results were a blowout, with Windows Azure Storage being declared the winner, sweeping all other contenders across the three categories.

“This year, our tests revealed that Microsoft Azure Blob Storage has taken a significant step ahead of last year’s leader, Amazon S3, to take the top spot. Across three primary tests (performance, scalability and stability), Microsoft emerged as a top performer in every category.” – Nasuni report

The Nasuni report goes on to mention that “the technology [Microsoft] are providing to the market is second to none.”

Reliability

One aspect of the report I found very interesting was in the error rates. For several of the vendors (including Amazon, Google, and Azure), Nasuni reported not a single error was detected during 100 million write attempts. And Microsoft stood alone for the read tests: “During read attempts, only Microsoft resulted in no errors.” In my book, I write about the Busy Signal Pattern which is needed whenever transient failures result during attempts to access a cloud service. The scenario described in the book showed the number of retries needed when I uploaded about four million files. Of course, the Busy Signal Pattern will still be needed for storage access and other services – not all transient failures can be eliminated from multitenant cloud services running on commodity hardware served over the public internet – and while this is not a guarantee there won’t be any, it does bode well for improvements in throughput and user experience.

And while it’s always been the case that you can trust WAS for HA, these days it is very hard to find any reason – certainly not performance or scalability – not to consider Windows Azure Storage. Further, WAS, S3, and Google Storage all have similar pricing (already low – and trending toward even lower prices) – and Azure, Google, and Amazon have the same SLAs for storage.

References

Note that the Nasuni report was published February 19, 2013 on the Nasuni blog and is available from their web site, though is gated, requiring that you fill out a contact form for access. The link is here: http://www.nasuni.com/blog/193-comparing_cloud_storage_providers_in

Other related articles of interest:

  1. Windows Azure beats the competition in cloud speed test – Oct 7, 2011 – http://yossidahan.wordpress.com/2011/10/07/windows-azure-beats-the-competition-in-cloud-speed-test/
  2. Amazon bests Microsoft, all other contenders in cloud storage test – Dec 12, 2011 –
  3. Only Six Cloud Storage Providers Pass Nasuni Stress Tests for Performance, Stability, Availability and Scalability – Dec 11, 2011 – http://www.nasuni.com/news/press_releases/46-only_six_cloud_storage_providers_pass_nasuni_stress
  4. Dec 3, 2012 – http://www.networkworld.com/news/2012/120312-argument-cloud-264454.html – Cloud computing showdown: Amazon vs. Rackspace (OpenStack) vs. Microsoft vs. Google
  5. http://www.networkworld.com/news/2013/021913-azure-aws-266831.html?hpg1=bn – Feb 19, 2013 – Microsoft Azure overtakes Amazon’s cloud in performance test

Beyond IaaS for the IT Pro (Part 21 of 31)

[This post is part 21 of the 31 Days of Server (VMs) in the Cloud Series – I contributed the article below, but all others have been contributed by others – please find the index for the whole series by clicking here.]

As technology professionals we need to be careful about how we spend our time. Unless we want short careers, we find time to keep up with at least some new technologies, but there isn’t time in anyone’s day to keep up with every technology. We have to make choices.

For the IT Pro looking at cloud technologies, the IaaS capabilities are a far more obvious area on which to spend time than PaaS capabilities. In this post, we’ll take a peek into PaaS. The goal is to clarify the difference between IaaS and PaaS, understand what PaaS is uniquely good for, and offer some reasons why a busy IT Pro might want to invest some time learning about PaaS.

While the concepts here can apply generally to many platforms – including public and private clouds, Microsoft technologies, and competing solutions – this post focuses on IaaS and PaaS capabilities within the Windows Azure Cloud Platform. Virtual machines and SQL databases are highlighted since these are likely of greatest interest to the IT Pro.

The Options – From Ten Thousand Feet

The NIST Definition of Cloud Computing (SP800-145) defines some terms that are widely used in the industry for classifying cloud computing approaches. One set of definitions delineates Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS). You can read the NIST definitions for more details, but the gist is this:

| Service Model | What You Provide | Target Audience | Control & Flexibility | Expertise Needed | Example |
| --- | --- | --- | --- | --- | --- |
| SaaS | Users | Business users | Low | App usage | Office 365 |
| PaaS | Applications | Developers | Medium | App design and mgmt | Windows Azure Cloud Services |
| IaaS | Virtual machines | IT Pros | High | App design and mgmt + VM/OS mgmt | Windows Azure Virtual Machines (Windows Server, Linux) |

Generally speaking, as we move from SaaS through PaaS to IaaS, we gain more control and flexibility at the expense of more cost and expertise needed due to added complexity. There are always exceptions (perhaps a SaaS solution that requires complex integration with an on-premises solution), but this is good enough to set the stage. Now let’s look at the core differences between PaaS and IaaS as they relate to the IT Pro.

Not All VMs are Created Equal

Even though Windows Azure has vastly more to offer (more on that later), the most obvious front-and-center offering is the humble VM. This is true both for PaaS and IaaS. So what distinguishes the two approaches?

The VMs for PaaS and IaaS behave very differently. The PaaS VM has a couple of behaviors that may surprise you, while the IaaS VM behavior is more familiar. Let’s start with the most far-reaching difference: On a PaaS VM, local storage is not durable.

This has significant implications. Suppose you install software (perhaps a database) on a PaaS VM and it stores some data locally. This will work fine… at least for a short while. At some point, Azure will migrate your application from one node to another… and it will not bring local data with it. Your locally-stored database data, not to mention any custom system tuning you did during installation, are gone. And this is by design. (For a list of scenarios where PaaS VM drive data is destroyed, see the bottom of this document.)

How can this possibly be useful: a VM that doesn’t hold on to its local data…

You might wonder how this can possibly be useful: a VM that doesn’t hold on to its data. The fact of the matter is that it is not very useful for many applications written with conventional (pre-cloud) assumptions (such as guarantees around the durability of data). [PaaS may not be good at running certain applications, but is great at running others. So please keep reading!]

PaaS VM Local Storage

The PaaS VM drives use conventional server hard drives. These can fail, of course, and they are not RAID or high-end drives; this is commodity hardware optimized for high value for the money. And even if drives don’t outright fail, there are scenarios where the Azure operating environment does not guarantee durability of locally stored data (as referenced earlier).
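One common way applications cope with non-durable local storage is to treat the local disk as nothing more than a cache and write through to durable storage. The sketch below illustrates the pattern; the `WriteThroughStore` class and the dicts standing in for the local disk and the blob service are my own illustrative stand-ins, not an Azure API.

```python
# Sketch: treat a PaaS VM's local disk as a volatile cache and write through
# to durable storage, so data survives a node migration. The "blob service"
# here is simulated with a dict; a real app would call a durable store.

class WriteThroughStore:
    def __init__(self):
        self.local_cache = {}    # stands in for the VM's local disk (volatile)
        self.blob_service = {}   # stands in for the durable blob service

    def put(self, key, value):
        self.blob_service[key] = value   # the durable write happens first
        self.local_cache[key] = value    # the local copy is only an optimization

    def get(self, key):
        if key in self.local_cache:      # fast path: local disk
            return self.local_cache[key]
        value = self.blob_service[key]   # cache miss: fetch from durable storage
        self.local_cache[key] = value
        return value

    def simulate_node_migration(self):
        self.local_cache.clear()         # Azure moved the instance; disk is gone

store = WriteThroughStore()
store.put("order:42", {"total": 99.5})
store.simulate_node_migration()          # local disk contents are lost...
print(store.get("order:42"))             # ...but the data is still available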

IaaS VM Local Storage

On the other hand, IaaS VMs do have persistent/durable local drives. This is what makes them so much more convenient to use – and why they have a more familiar feel to IT Pros (and developers). But these drives are not the local server hard drives (other than the D: drive, which is expected to be used only for temporary caching). Instead, they are backed by a high-capacity, highly scalable data storage service known as the Windows Azure Blob service (“blobs” for short), where each blob is roughly equivalent to a file, and each drive attached to the VM is a VHD stored as one of these files. Data stored in blobs is safe from hardware failure: the blob service stores each blob in triplicate (each copy on a different physical node), and then geo-replicates it in the background to a data center in another region, resulting (after a few minutes of latency) in an additional three copies.

IaaS VMs have persistent/durable local storage backed by blobs… this makes them so much more convenient to use – and more familiar to IT Pros

Storing redundant copies of your data offers a RAID-like feel, though it is more cost-efficient at the scale of a data center.

Since the blob service transparently handles storage for IaaS VMs (the operating system drive plus one or more data drives) and is external to any particular VM instance, it is not only a familiar model but also extremely robust and convenient.

Summarizing Some Key Differences

Feature | PaaS VM | IaaS VM
Virtual machine image | Choose from Win 2008 SP2, Win 2008 R2, and Win 2012; there are patch releases within each of these families. | Many to choose from, including images you can create yourself; can be Windows or Linux.
Hard disk persistence | Not durable; data could be lost due to hardware failure or when the VM is moved from one machine to another. | Durable; backed by a blob (blobs are explained above).
Service Level Agreement (SLA) | 99.95% for two or more instances (details); no SLA offered for a single instance. | 99.95% for two or more instances; 99.9% for a single instance (preliminary details).

SLA details for the IaaS VM are preliminary since the service is still in preview as of this writing.

SQL Database Options: PaaS vs IaaS

Windows Azure offers a PaaS database option, formerly called SQL Azure, and today known simply as SQL Database. This is really SQL Server behind the scenes, though it is not exactly the same as SQL Server 2012 (“Denali”).

SQL Database is offered as a service. This means with a few mouse clicks (or a few lines of PowerShell) you can have a database connection string that’s ready to go. Connecting to this database will actually connect you to a 3-node SQL Server cluster behind the scenes, but this is not visible to you; it appears to you to simply be a single-node instance. Three copies of your data are maintained by the cluster (each on different hardware).

Consider the three copies of every byte to be great for High Availability (HA), but they offer no defense against Human Error (HE). If someone drops the CUSTOMER table, that drop will be immediately replicated to all three copies of your data. You still need a backup strategy.

One big benefit of the SQL Database service is that the server is completely managed by Windows Azure… with the flip side of that coin being that an IT Pro simply cannot make any adjustments to the configuration. Note that SQL tuning and database schema design skills have not gone anywhere; this is all just as demanding in the cloud as outside the cloud.

SQL Database Service has a 150 GB Limit

SQL Database has some limitations. The most obvious is that you cannot store more than 150 GB in a single instance. What happens when you have 151 GB? This brings to light another PaaS/IaaS divergence: the IaaS approach is to grow the database (“scale up” or “vertical scaling”), while the PaaS approach is to add additional databases (“scale out” or “horizontal scaling”). The SQL Database service in Windows Azure supports only the horizontal scaling approach: it is up to the application to distribute its data across more than one physical database, an approach commonly known as sharding, where each shard is one physical database. This can be a big change for an application to support, since the database schema needs to be compatible with sharding, which usually means it needs to have been designed with sharding in mind from the start. Further, the application needs to be built to find and connect to the correct shard.
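The shard-routing idea can be sketched in a few lines. The connection strings, server names, and modulo-hashing scheme below are illustrative stand-ins (this is not the Federations API); the point is simply that the application, not the database, decides where each customer's data lives.

```python
# Sketch: routing a query to the correct shard when data is spread across
# several physical databases. Names and the hashing scheme are illustrative.

SHARD_CONNECTION_STRINGS = [
    "Server=shard0.example.net;Database=app0;",
    "Server=shard1.example.net;Database=app1;",
    "Server=shard2.example.net;Database=app2;",
]

def shard_for_customer(customer_id: int) -> str:
    """Map a customer to one physical database by hashing the shard key."""
    index = customer_id % len(SHARD_CONNECTION_STRINGS)
    return SHARD_CONNECTION_STRINGS[index]

# Different customers can land in different physical databases.
print(shard_for_customer(7))   # 7 % 3 == 1, so the shard1 connection string
print(shard_for_customer(9))   # 9 % 3 == 0, so the shard0 connection string
```

A real sharding layer also has to handle repartitioning as data grows, which is exactly the routine work Federations takes off the application's hands.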

For PaaS applications that wish to support sharding, the Federations in SQL Database feature provides robust support for handling most of the routine tasks. Without the kind of support offered by Federations, building a sharding layer can be far more daunting. Federations simplifies connection string management, has smart caching, and offers management features that allow you to repartition your data across SQL Database nodes without experiencing downtime.

The alternative to SQL Database is for you to simply use an IaaS VM to host your own copy of SQL Server. You have full control (you can configure, tune, and manage your own database, unlike with the SQL Database service where these functions are all handled and controlled by Windows Azure). You can grow it beyond 150 GB. It is all yours.

But realize that in the cloud, there are still limitations. All public cloud vendors offer a fixed menu of virtual machine sizes, so you will need to ensure that your self-managed IaaS SQL Server will have enough resources (e.g., RAM) for your largest database.

Any database can outgrow its hardware, whether on the cloud or not.

It is worth pointing out that any database can outgrow its hardware. And the higher end the hardware, the more expensive it becomes from a “capabilities for the money” point of view. At some point you reach the point where (a) you can’t afford sufficiently large hardware, or (b) the needed hardware is so high end that it is not commercially available. This will drive you toward either a sharding architecture or some other approach that makes your very large database small enough to fit in available hardware.

SQL Database Service is Multitenant

Another significant difference between the SQL Database service and a self-hosted IaaS SQL Server is that the SQL Database service is multitenant: your data sits alongside the data of other customers. This is secure – one customer cannot access another customer’s data – but it does present challenges: when one customer’s queries are very heavy, other customers can experience variability in performance as a result. For this reason, the SQL Database service protects itself and other customers by not letting any one customer dominate resources. This is accomplished with “throttling” and can manifest in multiple ways, from a delay in execution to a dropped connection (which the calling application is responsible for reestablishing).

Don’t underestimate the importance of properly handling throttling. Applications need to be written to handle these scenarios in order to function correctly. Throttling can happen even if your application is doing nothing wrong.

Proper handling of throttling requires that application code detect certain types of transient failures and retry. Most existing application code does not do this. Blindly pointing an existing application at a SQL Database instance might seem to work, but the application will occasionally experience odd errors that can be hard to track down or diagnose if it was written (and tested) in an environment where interactions with SQL Server always succeeded.
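The detect-and-retry logic itself is not complicated; the hard part is remembering to wrap every database interaction in it. Here is a minimal retry-with-exponential-backoff sketch, where `TransientError` and `flaky_query` are stand-ins for a real driver's throttling errors and a real query.

```python
import time

# Sketch: retry logic an application needs before it can safely target a
# throttled service. TransientError and flaky_query are illustrative
# stand-ins for real database driver errors and a real query.

class TransientError(Exception):
    """Stand-in for a throttling failure (e.g., a dropped connection)."""

def with_retries(operation, max_attempts=4, base_delay=0.01):
    """Run operation, retrying transient failures with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts - 1:
                raise                                # out of attempts: surface it
            time.sleep(base_delay * (2 ** attempt))  # back off: 0.01s, 0.02s, ...

calls = {"count": 0}

def flaky_query():
    calls["count"] += 1
    if calls["count"] < 3:                  # the first two attempts are throttled
        raise TransientError("connection dropped by the service")
    return ["row1", "row2"]

print(with_retries(flaky_query))            # → ['row1', 'row2']
```

Note that the caller sees only the final successful result; the two throttled attempts are absorbed by the retry wrapper, which is exactly the behavior most pre-cloud code lacks.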

The self-managed IaaS database does not suffer this unpredictability, since you presumably control which applications can connect and can manage resources more directly.

SQL Database Service has Additional Services

The SQL Database service has some easy-to-enable features that may make your life easier. One example is the Data Sync Service, which can be enabled in the Windows Azure Portal. You can easily configure a SQL Database instance to be replicated to one or more other instances in the same or different data centers. This can help with an offsite-backup strategy or with mirroring data globally to reduce latency, and is one area where PaaS shines.


SQL Database Service is SQL Server

Windows Azure today offers the SQL Database service based on SQL Server 2012. If your application (for some reason) needs an older version of SQL Server (perhaps it is a vendor product and you don’t control this), then your hands are tied.

Or perhaps you want another database besides SQL Server. Windows Azure has a partner offering MySQL, and other vendor products will likely be offered over time. NoSQL Databases are also becoming more popular. Windows Azure natively offers the NoSQL Windows Azure Table service, and a few examples of other third-party ones include MongoDB, Couchbase, RavenDB, and Riak. Unless (or until) these are offered as PaaS services through the Windows Azure Store, your only option will be to run them yourself in an IaaS VM.

WazOps Features and Limitations

The main thrust of PaaS is to make operations efficient for applications designed to align with the PaaS approach – for example, applications that can deal with throttling, or with a PaaS VM being migrated and losing all locally stored data. This is all doable, and without degrading the user experience; it just so happens that most applications that exist today (and will still exist tomorrow) don’t work this way.

The PaaS approach can be used to horizontally scale an application very efficiently (whether computational resources running on VMs or database resources sharded with Federations for SQL Database), overcome disruptions due to commodity hardware failures, gracefully handle throttling (whether from SQL Database or other Azure services not discussed), and do so with minimal human interaction. But getting to this point is not automatic.

WazOps – DevOps, Windows Azure style! – is the role that will build out this reality. There are auto-scaling tools – both external services and tools we can run ourselves, like the awesome WASABi auto-scaling application block from Microsoft’s Patterns & Practices group – that can be configured to scale an application on a schedule or based on environmental signals (like CPU spiking in a certain VM).
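The kind of rule such a tool evaluates can be sketched simply: compare average CPU over a window against thresholds and adjust the instance count within bounds. The thresholds, window, and function name below are illustrative assumptions, not WASABi's actual configuration model.

```python
# Sketch: a threshold-based auto-scaling rule of the kind an auto-scaling
# block evaluates. All thresholds and bounds here are illustrative.

def desired_instance_count(current, cpu_samples,
                           scale_up_at=75.0, scale_down_at=25.0,
                           minimum=2, maximum=10):
    """Return the new instance count based on average CPU over a window."""
    avg = sum(cpu_samples) / len(cpu_samples)
    if avg > scale_up_at and current < maximum:
        return current + 1   # CPU is spiking: add an instance
    if avg < scale_down_at and current > minimum:
        return current - 1   # quiet period: shed an instance to save money
    return current           # within the comfort band: no change

print(desired_instance_count(3, [82, 91, 78]))  # → 4
print(desired_instance_count(3, [10, 12, 9]))   # → 2
print(desired_instance_count(2, [10, 12, 9]))   # → 2 (already at the minimum)
```

The minimum of two instances is deliberate: as the SLA table earlier notes, the 99.95% PaaS SLA applies only when two or more instances are running.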

There is also the mundane. How do we script a managed deployment so our application can be upgraded without downtime? Windows Azure PaaS services have features for this, such as the in-place update and the VIP Swap, but we still need to understand them and create a strategy to use them appropriately.

Further, at least some of the same old details remain. For example, it is easy to deploy an SSL certificate to IIS on my PaaS VM… but that certificate will still expire in a year, and someone still needs to know this – and know what to do about it – before it results in someone being called at 2:00 AM on a Sunday.

Should IT Pros Pass on PaaS?

Clearly there are some drawbacks to running on PaaS: most existing applications will not run successfully there without some non-trivial rework, though they would work just fine if deployed to IaaS VMs.

However, that does not mean that PaaS is not useful. It turns out that some of the most reliable, scalable, cost-efficient applications in the world are architected for this sort of PaaS environment. The Bing services behind bing.com take this approach, as only one example. The key here is that these applications are architected assuming a PaaS environment. I don’t use the term “architected” lightly, since architecture dictates the most fundamental assumptions about how an application is put together. Most applications that exist today are not architected with PaaS-compatible assumptions. However, as we move forward, and developer skills catch up with the cloud offerings, we will see more and more applications designed from the outset to be cloud-native; these will be deployed using these PaaS facilities.

A stateless web tier (with no session affinity in the load balancer) is a good example today of an application that could run successfully in a PaaS environment – though I’ll be quick to note that other tiers of that application may not run so well in PaaS. Which brings up an obvious path going forward: hybrid applications that mix PaaS and IaaS. This will be a popular mix in the coming years.

Hybrid Applications

Consider a 3-tier application with a web tier running in IIS, a service tier, and a SQL Server back-end database. If built with conventional approaches, not considering the PaaS cloud, none of these three tiers would be ready for a PaaS environment. So we could deploy all three tiers using IaaS VMs.

As a software maintenance step, it would be reasonable to upgrade the web site (perhaps written in PHP or ASP.NET) to be stateless so that it does not need session affinity (the Windows Azure PaaS Cloud Services load balancer does not support session affinity). These types of changes may be enough to allow the web tier to run more efficiently using PaaS VMs, while still interacting with a service tier and database running on IaaS VMs.
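The essence of that statelessness change is moving session state out of the web-server process into shared storage, so any instance behind the load balancer can serve any request. A minimal sketch, where the `SESSION_CACHE` dict stands in for a shared cache or table service (all names here are illustrative):

```python
import uuid

# Sketch: externalized session state so no load-balancer affinity is needed.
# SESSION_CACHE stands in for a shared store; in a single process it is just
# a dict, but the access pattern is what matters.

SESSION_CACHE = {}   # shared across all web-tier instances in a real deployment

def begin_session(user):
    token = str(uuid.uuid4())            # opaque token returned as a cookie
    SESSION_CACHE[token] = {"user": user, "cart": []}
    return token

def handle_request(token, item):
    """Any web-tier instance can serve this: state lives outside the process."""
    session = SESSION_CACHE[token]       # look up state in the shared store
    session["cart"].append(item)
    return session

token = begin_session("alice")
handle_request(token, "widget")          # this request could hit instance A...
state = handle_request(token, "gadget")  # ...and this one instance B
print(state["cart"])                     # → ['widget', 'gadget']
```

Because no instance holds state in memory, instances can be added, removed, or migrated by the platform without users losing their sessions.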

A future step could upgrade the service tier to handle SQL Database throttling correctly, allowing the SQL Server instance running on an IaaS VM to be migrated to the SQL Database service. This will reduce the number of Windows servers and SQL Servers being managed by the organization (shifting these to Windows Azure), and may also simplify some other tasks (like replicating that data using the Data Sync Service). Each service and VM also has its own direct costs (our monthly bill to Microsoft for the Windows Azure cloud services we consume), which are detailed in the pricing section of the Windows Azure Portal.

Still another future step could be to migrate the middle tier to be stateless – but maybe not. All of these decisions are business decisions; perhaps the cost-benefit is not there. It depends on your application, your business, and the skills and preferences of the IT Pros and developers in the organization.

Conclusions

I’ll summarize here with some of the key take-aways for the IT Pro who is new to PaaS services:

  1. Be aware of the challenges in migrating existing applications onto either PaaS VMs or SQL Database. If the application is not architected with the right assumptions (stateless VMs, SQL operations that may be throttled, the 150 GB limit), it will not work correctly – even though it might seem to work at first. IaaS VMs will often be the better option.
  2. SQL Database does not support all of the features that SQL Server 2012 supports, though it does have some special features of its own: it always runs as a three-node cluster for HA, and it has Federation support.
  3. PaaS is increasingly the right choice for new applications that can be built with PaaS assumptions from the outset. This assumes the team understands PaaS and has learned the needed skills! (I wrote a book – Cloud Architecture Patterns – to illuminate these new skills.)
  4. Pure IaaS and pure PaaS are not the only approaches. Hybrid approaches will be productive.
  5. PaaS will gain momentum long-term due to its economic benefits, since PaaS applications can be cheaper to run and maintain. There are direct costs, which are easy to measure (you get a detailed bill), and indirect/people costs, which are more challenging to measure.
  6. WazOps (DevOps with an Azure spin) will be the role that delivers on the promise of PaaS going forward. Not only will the well-informed WazOps professional help avoid the issues of moving too fast (see the earlier points about not all applications being PaaS-ready), but he or she will also understand the business drivers and economics of investing to move faster where appropriate for your business.

Feedback always welcome and appreciated. Good luck in your cloud journey!

[This post is part 21 of the 31 Days of Server (VMs) in the Cloud Series – please return to the series index by clicking here]