Azure SQL Data Warehouse

Azure SQL Data Warehouse is a cloud-based data warehouse-as-a-service hosted within Microsoft’s Azure platform. It has a massively parallel processing (MPP) shared nothing architecture capable of distributing query computation over a set of compute nodes running Azure SQL Database and uses Azure Storage Blobs as the underlying data storage.[1]

By combining MPP architecture and Azure storage capabilities, SQL Data Warehouse can:

  • Grow or shrink storage independent of compute
  • Grow or shrink compute without moving data
  • Pause compute capacity while keeping data intact
  • Resume compute capacity at a moment's notice

It is designed to handle computational and data intensive queries on datasets exceeding 1TB. This technology is based on Microsoft’s on premise data warehousing product, Parallel Data Warehouse (PDW), which runs on the Analytics Platform System (APS).

Azure SQL Data Warehouse decouples compute and storage enabling compute power to be adjusted independently of storage based on workload requirements any given time.[2]

Allocation of resources to SQL Data Warehouse is measured in Data Warehouse Units (DWUs). DWUs are a measure of underlying resources like CPU, memory, IOPS, which are allocated to your SQL Data Warehouse. Increasing the number of DWUs increases resources and performance. Specifically, DWUs help ensure that:

  • Scale a data warehouse easily, without worrying about the underlying hardware or software.
  • Predict performance improvement for a DWU level before changing the size of a data warehouse.
  • The underlying hardware and software of an instance can change or move without affecting the workload performance.
  • Microsoft can make adjustments to the underlying architecture of the service without affecting the performance of a workload.
  • Microsoft can rapidly improve performance in SQL Data Warehouse, in a way that is scalable and evenly effects the system.

This PaaS also features PolyBase[3] which allows queries to span across both structured and unstructured analytical workloads using Transact-SQL(TSQL). PolyBase[4] is also the optimal method to load data into SQL Data Warehouse[5] as it does so in parallel across the compute nodes.[6]

All data is written geo-redundantly across regions for high availability and disaster recovery. Scaling computation does not require reorganization of the underlying data which allows this operation to take only the time needed to allocate new compute nodes. This data warehousing service also includes the ability to pause a data warehouse instance which also pauses the billing of the compute nodes.

Data Tools

SQL Data Warehouse integrates with data tools such as Azure Machine Learning[7] for advanced analytics, Azure Data Factory for data orchestration, Azure Stream Analytics for real-time analytics, and Power BI[8] for data visualization. Standard query tools like SQL Management Studio (SSMS), SQLCMD, and SQL Server Data Tools can also be used to execute queries.

Use Cases

  • Disaster Recovery[9]
  • Stream Analytics and Machine Learning[10]
  • Enterprise-wide views and Analysis[11]

History

SQL Data Warehouse is generally available as of July 12, 2016. It was in limited public preview on June 25, 2015.[12]

Pricing

Azure SQL Data Warehouse, is an elastic cloud data warehousing service that allows pausing and scaling on demand. It creates a SQL-based view across all your data which enables business insights. Pricing in January 2017 is approximately 1/10th of the cost of traditional appliance solutions.

Further reading

  • Next Generation Data warehouse[13]
  • Implementing a 2016 Data Warehouse[14]
  • 7 Coolest features of Azure SQL and SQL Data Warehouse[15]
  • Statistics in Azure SQL Data Warehouse[16]
  • Taking Azure SQL Data Warehouse for a Test Drive[17]

References

  1. ^ "What is Azure SQL Data Warehouse | Microsoft Azure". azure.microsoft.com. Retrieved 2015-12-09. 
  2. ^ "Cloud Data Warehousing Just Got Easier with Azure SQL Data Warehouse". www.blue-granite.com. Retrieved 2015-12-15. 
  3. ^ "PolyBase". msdn.microsoft.com. Retrieved 2015-12-10. 
  4. ^ DeWitt, David J.; Halverson, Alan; Nehme, Rimma; Shankar, Srinath; Aguilar-Saborit, Josep; Avanes, Artin; Flasza, Miro; Gramling, Jim. "Split query processing in polybase". Proceedings of the 2013 international conference on Management of data - SIGMOD '13. doi:10.1145/2463676.2463709. 
  5. ^ "Loading data with PolyBase in Azure SQL Data Warehouse (Channel 9)". Channel 9. Retrieved 2015-12-11. 
  6. ^ "Load data into SQL Data Warehouse | Microsoft Azure". azure.microsoft.com. Retrieved 2015-12-11. 
  7. ^ "Using Azure Machine Learning with SQL Data Warehouse | Microsoft Azure Blog". azure.microsoft.com. Retrieved 2015-12-09. 
  8. ^ "Using Power BI with SQL Data Warehouse | Microsoft Azure Blog". azure.microsoft.com. Retrieved 2015-12-09. 
  9. ^ "Microsoft Customers". customers.microsoft.com. Retrieved 2017-01-04. 
  10. ^ "Microsoft Customers". customers.microsoft.com. Retrieved 2017-01-04. 
  11. ^ "Microsoft Customers". customers.microsoft.com. Retrieved 2017-01-04. 
  12. ^ "Azure SQL Data Warehouse Leads New Microsoft Data Products -- Visual Studio Magazine". visualstudiomagazine.com. Retrieved 2015-12-15. 
  13. ^ Asmus, Oliver (September 2016). "The Next Generation Data Warehouse". Slalom. 
  14. ^ "Implementing a 2016 Data Warehouse". Dynamics Edge. 
  15. ^ Marcus, Crast. "7 Coolest features of Azure SQL Data Warehouse". 10th Magnitude. 
  16. ^ Stewart, Emma (November 2016). "Statistics in Azure SQL Data Warehouse". adatis. 
  17. ^ Sheldon, Robert (May 2016). "Taking Azure SQL Data Warehouse for a Test Drive". simple-talk.com. Simple Talk. 

External links