# Azure FAQ: Can I write to the file system on Windows Azure?

[Update 12-Oct-2012] This post only applies to Windows Azure Cloud Services (which have Web Roles and Worker Roles). This post was written a year before Windows Azure Web Sites and Windows Azure Virtual Machines (including Windows and Linux flavors) were announced and does not apply to either of them.

Q. Can I write to the file system from an application running on Windows Azure?

A. The short answer is that, yes, you can. The longer answer involves better approaches to persisting data in Windows Azure, plus a couple of caveats in writing data to (virtual) hard disks attached to the (virtual) machines on which your application is deployed.

Any of your code running in either (a) ASP.NET (e.g., default.aspx or default.aspx.cs) or (b) WebRole.cs/WorkerRole.cs (e.g., methods OnStartup, OnRun, and OnStop which are derived from RoleEntryPoint class) will not have permission to write to the file system. This. is. a. good. thing.®

To be clear, if you have code that currently writes to fixed locations on the file system, you will probably need to change it. For example, your ASP.NET or Role code cannot directly create/write the file c:\foo.txt – the permissions are against you, so Windows will not allow it. (To round out the picture though… You can write to the file system directly if you are running in an elevated Startup Task, but cannot write to it from a limited Startup Task. For more on Startup Tasks and how to configure them see How to Define Startup Tasks for a Role.)

The best option is usually to use one of the cloud-native solutions: use one of the Windows Azure Storage Services or use SQL Azure. These services are all built into Windows Azure for the purpose of supporting scalable, reliable, highly available storage. In practice, this means choosing among Windows Azure Blob storage, Windows Azure Table storage, or SQL Azure.

The second-best option is usually to use a Windows Azure Cloud Drive – which is an abstraction that sits on top of Blob storage (Page blobs, specifically) – and looks and acts a lot like an old-fashioned hard disk. You can access it with a drive letter (though you won’t know the drive letter until deployment time!), it can be mounted by and read from multiple of your role instances, but only one of these at a time will be able to mount it for updating. The Windows Azure Drive feature is really there for backward compatibility – to make it easier to migrate existing applications into the cloud without having to change them. Learn more from Neil Mackenzie’s detailed post on Azure Drives.

The third-best option is usually to use the local hard disk. (And this is what the original FAQ question specifically asked about.) Read on…

### Writing Data to Local Drive from Windows Azure Role

So… Can I write to the hard disk? Yes. And you have a decent amount of disk at your disposal, depending on role size. Using Azure APIs to write to disk on your role is known as writing to Local Storage. You will need to configure some space in Local Storage from your ServiceDefinition.csdef by giving that space (a) a name, (b) a size, and (c) indicating whether the data there should survive basic role recycles (via cleanOnRoleRecycle). Note – cleanOnRoleRecycle does not guarantee your data will survive – it is just a hint to the Fabric Controller that, if it is available, should it leave it around or clean it up.  That limitation is fine for data that is easily recalculated or generated when the role starts up – so there are some good use cases for this data, even for cloud-native applications – think of it as a handy place for a local cache. (Up above I refer to this as the usually being the third-best option. But maybe it is the best option! In some use cases it might be. One good example might be if you were simply exploding a ZIP file that was pulled from blob storage, but there are others too. But let’s get back to Local Storage…)

Here is the snippet from ServiceDefinition.csdef:

... <LocalResources> <LocalStorage name="SomeLocationForCache" cleanOnRoleRecycle="false" sizeInMB="10" /> </LocalResources> ...

You can also use the Windows Azure Tools for Visual Studio user interface to edit these values; double-click on the role you wish to configure from the Roles list in your Windows Azure solution. This is the easiest approach.

Once specified, the named Local Storage area can be written to and read from using code similar to the following:
 // reference Microsoft.WindowsAzure.ServiceRuntime.dll from SDK // (probably in C:\Program Files\Windows Azure SDK\v1.4\ref) const string azureLocalResourceNameFromServiceDefinition = "SomeLocationForCache"; var azureLocalResource = RoleEnvironment.GetLocalResource( azureLocalResourceNameFromServiceDefinition); var filepath = azureLocalResource.RootPath + "myCacheFile.xml"; // build full path to file // the rest of the code is plain old reading and writing of files // using the 'filepath' variable immediately above

### Writing to TEMP Folder from Windows Azure Role

How about writing temporary files? Is that supported? Yes, same as in Windows. For example, in .NET one can get a temporary scratch space and write to it using code similar to the following:
 var filepath = System.IO.Path.GetTempFileName(); System.IO.File.WriteAllText(filepath, "some text");

### Do Not Use Environment.SpecialFolder Locations in Azure

You may also have some existing code which writes files for the currently logged in user. Check the Environment.SpecialFolder Enumeration for the full list, but one examples is Environment.SpecialFolder.ApplicationData. You would access this location with code such as the following:

string filepath = Environment.GetFolderPath( Environment.SpecialFolder.ApplicationData, Environment.SpecialFolderOption.DoNotVerify);

You will find that your ASP.NET code will be able to write to this location, but that is almost certainly not what you want! By default, the user account under which you will be saving this data is one that is generated when your role is deployed – something like RD00155D328831$– not some IPrincipal from your Windows domain. Further, for data you care about, you don’t want to store data it in the local file system in Windows Azure. Better options should be apparent from earlier points made in this article. And, finally, you may prefer the elegance of claims-based federated authentication using the Access Control Service. ### Writing to File System from Windows Service in Windows Azure Role If you want to do something unusual, like write to the file system from outside of Role’s code, there are ways to write to the file system from a Windows Service or a Startup Task (though be sure to run your Startup Task with elevated permissions). Is this useful? Did I leave out something interesting or get something wrong? Please let me know in the comments! Think other people might be interested? Spread the word! Advertisement # Azure FAQ: How much will it cost me to run my application on Windows Azure? Q. How much will it cost me to run my application in the Windows Azure cloud platform? A. The anwer, of course, depends on what you are doing. Official pricing information is available on the Windows Azure Pricing site, and to help you model pricing for your application you can check out the latest Windows Azure Pricing Calculator. Also, the Microsoft Assessment and Planning (MAP) Toolkit is now in beta. Simple cost example: Running One Instance of a Small Compute Role costs 12¢ per hour, which is around$1052 per year. A SQL Azure instance that holds up to 1 GB costs $9.99 per month. If you have Two Small Compute Instances & 1 GB of SQL Azure storage, plus throwing in some bandwidth use, a dash of Content Delivery Network (CDN) use, and your baseline cost might start at around$2,225.

Update 22-June-2011: The pricing calculators may not reflect this interesting development: data transfer into the Azure Data Centers becomes free on July 1, 2011. See: https://blog.codingoutloud.com/2011/06/22/free-data-transfer-into-azure-datacenters-is-a-big-deal/ and http://blogs.msdn.com/b/windowsazure/archive/2011/06/22/announcing-free-ingress-for-all-windows-azure-customers-starting-july-1st-2011.aspx

But it is not always that simple: this is just the simplest, pay-as-you-go model! In the short term, there are many deals, offers, and trials – some free. There are Azure benefits included with MSDN. And long term there are ways to get better rates if you have an Enterprise Agreement with Microsoft, or by selecting a more predictable baseline than pay-as-you-go. See the Windows Azure Pricing site for current options.

Further, when comparing costs with other options, consider a few factors:

• The SQL Azure storage is really a SQL Azure cluster of three instances giving you storage redundancy (3 copies of every byte), high availability (with automagic failover), high performance, and other advanced capabilities.
• Similarly, every byte written to Windows Azure Storage (blobs, tables, and queues) is stored as three copies.
• Running two Small Compute instances of a role comes with a 99.9% uptime Service Level Agreement (SLA), and a 99.95% connectivity SLA. Read more about the Compute, SQL Azure, and other Windows Azure Platform SLAs here.
• Since Windows Azure is Platform as a Service (PaaS), be careful to also consider that you may have fewer hassles and lower engineering and operational costs – these are lower staff-time costs – if you are comparing to an Infrastructure as a Service (IaaS) offering.

While you are at it, consider checking out some of these related third-party offerings:

• CloudValue – A whole company dedicated to understanding and optimizing costs in moving to the cloud. I saw them at TechEd Atlanta in May 2011. They (a) presented a generally useful talk on Cost-Oriented Development (not specific to their technology, though we saw a glimpse of their Visual Studio integrated cost analyzer); and they (b) had a booth so people could check out their CloudValue Online Service Tracking (COST) service which provides ongoing analysis of your costs in the Windows Azure cloud. I am trying out the COST product now that my beta request has been approved!
• CloudTally – A service offering from Red Gate Software – currently in beta, and currently free – will keep an eye on your SQL Azure database instance and based on how much data you have in it over time, it will report your daily storage costs via email. I’ve been using this for a few months. The data isn’t very sophisticated – of the “you spent \$3.21 yesterday” variety – but I think they are considering some enhancements (I even sent them some suggestions).
• Windows Azure Migration Scanner – An open source tool created by Neudesic to help you identify changes your application might require in order to make it ready for Azure. This is not specifically a cost-analysis tool, but is useful from a cost-analysis point of view since it can help you predict operational costs of the Azure-ready version of your application – for example if you will make changes to leverage the reliable queue service in Windows Azure Storage, you will know enough to model this. Read David Pallmann’s introduction to the scanner, where he also mentions some other tools.
• Greybox – While not a core tool for calculating costs, it is a interesting open source utility to help you avoid the “I-deployed-to-Azure-for-testing-purposes-but-forgot-all-about-it” memory lapse. (If deployed, you pay – whether you are using it or not. Like an apartment – you pay for it, even while you are at work – though Azure has awesome capabilities for you to “move out of your cloud apartment” during times when you don’t need it!) You may not need it, but its existance illustrates an important lesson!

Credit: I discovered the new Windows Azure Pricing Calcular from http://twitter.com/#!/dhamdhere/status/73056679599677440.

Is this useful? Did I leave out something interesting or get something wrong? Please let me know in the comments! Think other people might be interested? Spread the word!

# Azure FAQ: How do I run MapReduce on Windows Azure?

Q. Some cloud environments support running MapReduce. Can I do this in Windows Azure?

A. You can run MapReduce in Windows Azure. First we give some pointers, then get into some other options that might even be more useful or powerful, depending on what you are doing.

Summary of most obvious Azure-oriented choices: (1) Apache Hadoop on Azure, (2) LINQ to HPC leveraging Azure, or (3) Daytona Map/Reduce on Azure.

The first approach is to use the open source Apache Hadoop project which implements MapReduce. Details on how to run Hadoop on Azure are available on the Distributed Development Blog. Update 14-Oct-2011: Check out this write-up by Ted Kummert about his keynote at PASS where he discussed deeper Hadoop support for Windows Azure: “Microsoft makes this possible through SQL Server 2012 and through new investments to help customers manage ‘big data’, including an Apache Hadoop-based distribution for Windows Server and Windows Azure and a strategic partnership with Hortonworks. Our announcements today highlight how we enable our customers to take advantage of the cloud to better manage the ‘currency’ of their data.” Also, Avkash Chauhan provides a nice summary of the announcement.

The MapReduce tutorial on the Apache Hadoop project site explains the goal of the project, as followed by detailed steps on how to use the software.

“Hadoop MapReduce is a software framework for easily writing applications which process vast amounts of data (multi-terabyte data-sets) in-parallel on large clusters (thousands of nodes) of commodity hardware in a reliable, fault-tolerant manner.” – from Overview section of Hadoop MapReduce tutorial

Another entrant in this Big Data Analytics space is LINQ to HPC. For more details on LINQ to HPC, check out David Chappell‘s whitepaper called Introducing LINQ to HPC: Processing Big Data on Windows. Chappell explains the value proposition, and also talks about when you might use it versus using SQL Server Parallel Data Warehouse. LINQ to HPC beta 2 is availlable for download.

[Update 19-July-2011: Daytona enters the fray] “Microsoft has developed an iterative MapReduce runtime for Windows Azure, code-named Daytona.” It is available for download as of early July, though has a non-commercial-use-only license attached to it. (credit: saw it on the insideHPC blog)

[Update 19-July-2011: It is now clear that LINQ to HPC (available in beta 2!) is supplanting DryadLINQ.] You may also be interested in checking out DryadLINQ from Microsoft Research. Though not identical to MapReduce, they describe it as “a simple, powerful, and elegant programming environment for writing large-scale data parallel applications running on large PC clusters.” As of this writing it was not licensed for commercial use, but was available under an academic use license. (With the introduction of LINQ to HPC, I can’t tell whether these projects are related, or whether LINQ to HPC is the productized version of DryadLINQ.)

And, finally, I also just read an interesting post called Hadoop is the Answer! What is the Question? by Tim Negris. This brings up some good points about the maturity of Hadoop and other points – if you are thinking about MapReduce, Hadoop, DryadLINQ, or other approaches, give his article a read.