James Serra's Blog

Web Name: James Serra's Blog

WebSite: http://www.jamesserra.com


Description:

Microsoft Ignite has always announced many new products and new product features, and this year was no exception. There were many exciting announcements, and below I list the major data platform related ones:

Usage-based optimization with Azure Synapse and Power BI: A new feature that analyzes usage patterns in Power BI and shares the information with Synapse is now available to improve query performance. Synapse automatically creates a materialized view optimized for Power BI users, greatly accelerating query performance. More info.

Apache Spark for Azure Synapse In-cluster Caching and Shuffle Service (preview): Caching and shuffle are two of the components of the Apache Spark infrastructure that have the greatest impact on performance. These new services, which are written from scratch, allow these components to be optimized for modern hardware and operating systems. The service is enabled for Apache Spark pools in Azure Synapse today.

High throughput output from Stream Analytics to Azure Synapse Analytics (general availability): Azure Stream Analytics jobs can now output to an existing SQL pool table in Azure Synapse Analytics and can process throughput rates even higher than 200 MB/s. This supports the most demanding real-time analytics and hot-path data processing needs of Azure customers who depend on their data warehouse for workloads such as reporting and dashboarding.

Azure AD authentication features for Azure SQL DB, Azure Synapse Analytics, and Azure SQL Managed Instance: Three new features using Azure Active Directory (Azure AD) authentication are currently in preview for Azure SQL Database, Azure Synapse Analytics, and Azure SQL Managed Instance. These features help automate user creation using Azure AD applications and allow individual Azure AD guest users to be created in SQL Database, Azure Synapse Analytics, or SQL Managed Instance (a short example appears at the end of this post). First, support is being added for Azure AD user creation in SQL Database and Azure Synapse Analytics on behalf of Azure AD applications. This functionality is useful in Azure AD application automation processes where Azure AD objects are created and maintained in SQL Database and Azure Synapse Analytics without human interaction. Second, users can use cloud groups to manage Directory Readers role assignment in Azure AD. This change will promote ease of use when setting up SQL Database, Azure Synapse Analytics, and SQL Managed Instance to create users. Finally, Azure AD guest users can now be created as database users and set as Azure AD admin without the need to first add them as members of a group created in Azure AD. These features extend existing functionality, remove user limitations, and provide customers with greater ease of use when setting up SQL Database, Azure Synapse Analytics, or SQL Managed Instance.

Announcing Azure SQL Edge now in general availability: Azure SQL Edge, which brings the most secure Microsoft SQL data engine to Internet of Things (IoT) gateways and edge devices, is now available. Optimized for IoT workloads, Azure SQL Edge supports built-in data streaming, storage, and artificial intelligence packed into a small-footprint container that works in connected, disconnected, and semi-connected environments where local storage and analytics are important. You can reduce cost, drive innovation, and improve operational efficiencies by using Azure SQL Edge to stream, store, and analyze IoT data.
You simply need to develop your application once and deploy it anywhere across the edge, your datacenter, and Azure.

Hosting catalog databases for all supported versions of SSRS in Azure SQL Managed Instance: Now in general availability, Azure SQL Managed Instance can host catalog databases for all supported versions of SQL Server Reporting Services (SSRS). Last year, SQL Server 2019 Reporting Services introduced native support for hosting catalog databases in SQL Managed Instance. Now you can also use SQL Managed Instance to host catalog databases for earlier supported versions of SSRS. This is especially useful for fast migration of existing solutions to SQL Managed Instance, without the need to test and adopt the latest version of SSRS. It also helps you get quick business benefits and then modernize further at your own pace. To learn how to configure your SQL Managed Instance to host catalog databases for earlier supported versions of SSRS, visit the Tech Community blog.

Major performance improvements for Azure SQL Managed Instance: We are announcing a set of major performance improvements for Azure SQL Managed Instance, which enable you to migrate your more performance-hungry database workloads to Azure SQL Managed Instance. These improvements include better transaction log write throughput for general purpose and business critical instances and superior data/log IOPS for business critical instances. Increased performance predictability and stability for the general purpose service tier, through better integration and enhanced tempdb performance, are also included. These improvements are automatically enabled for all existing and future Azure SQL Managed Instances at no extra charge, making Azure SQL Managed Instance the best fully managed database platform for your mission-critical workloads. Learn more about the major performance improvements for Azure SQL Managed Instance.

Machine Learning on Azure SQL Managed Instance in preview: Machine Learning Services with support for the R and Python languages now include preview support on Azure SQL Managed Instance. When using Machine Learning Services with R or Python support in Azure SQL Managed Instance, you can run R and Python scripts to do data preparation and general purpose data processing, train machine learning models in the database, and deploy your models and scripts into production in stored procedures. See Azure SQL Managed Instance.

Configurable backup storage redundancy options for Azure SQL Database coming soon: By default, all backups taken on Azure SQL Database are stored on read-access geo-redundant storage (RA-GRS) blobs that are replicated in paired regions, ensuring data is protected in case of planned and unplanned events. A preview of configurable backup storage redundancy options for Azure SQL DB is coming soon to South East Asia, with additional regions expected in the coming months. This feature will provide you with more flexibility and choice to pick between locally redundant storage (LRS), zone-redundant storage (ZRS), or geo-redundant storage (RA-GRS) blobs for your backups. As additional regions are launched in preview, there will be more backup storage redundancy options available to meet your data residency requirements or minimize backup costs. You will be able to easily configure backup storage redundancy in the Azure portal.
For more information, visit our Tech Community blog.

Azure Synapse Link for Azure Cosmos DB: SQL serverless runtime support coming soon: Azure Synapse Link will soon have preview support for querying Azure Cosmos DB data with Synapse SQL serverless. In the coming months, you will be able to run analytical T-SQL queries in place over data in Azure Cosmos DB within seconds. These queries can be used to build rich near real-time dashboards using Power BI integration with Azure Synapse Analytics. This new feature enhances the capability of Azure Synapse Link and will give you a wide range of business intelligence and ad-hoc querying tools via the T-SQL interface. Azure Synapse Link, announced earlier this year, creates a tight integration between Azure Cosmos DB and Azure Synapse Analytics. This gives you direct access to Azure Cosmos DB data from Azure Synapse Analytics to run near real-time business intelligence, analytics, and machine learning pipelines. This integration enables you to build cloud-native HTAP (hybrid transactional/analytical processing) solutions to perform no-ETL (extract, transform, and load) analytical processing on operational data in Azure Cosmos DB, without impacting the performance of your transactional workloads. Azure Synapse Analytics will soon support Azure Synapse Link with Synapse Apache Spark and Synapse SQL serverless.

Serverless offer for Azure Cosmos DB coming soon on all APIs: Azure Cosmos DB serverless is in preview for the Core (SQL) API and will soon be available in preview for all Azure Cosmos DB APIs, including MongoDB, Cassandra, Gremlin, and Table. This offers NoSQL developers a database operations option with zero capacity planning or management required. As a consumption-based option with no minimum, serverless could also significantly lower the entry price for applications. It is ideally suited for small and medium-sized workloads that do not require steady throughput, receive requests sporadically or in short bursts, and have moderate performance requirements. Learn more.

Enhanced management experience for Azure SQL Managed Instance: Management operations are an essential part of Azure SQL Managed Instance, covering create, update, and delete scenarios. Most of the management operations in SQL Managed Instance are long-running but did not have operation progress visible. Through the introduction of a new CRUD API version, the SQL Managed Instance resource is now visible from when the create request is submitted. In addition, the new OPERATIONS API adds the ability to monitor management operations, see operation steps, and take dependent actions based on operation progress. Learn more.

Transactionally consistent database copy for Azure SQL Database Hyperscale tier: Now in preview, the Hyperscale service tier in Azure SQL Database supports generating a transactionally consistent database copy to either the same or a different logical server or region, similar to the database copy already supported in other service tiers. This enables scenarios where Hyperscale databases have to be copied for various reasons, such as development and testing. If a database copy is created within the same region, it is a fast operation regardless of data size, similar to a point-in-time restore in Hyperscale. A cross-region database copy is a slower size-of-data operation. If you require support for database copy in your workflow, you can now take advantage of unique Hyperscale capabilities such as support for very large databases, fast scaling, and multiple replicas.
Learn how to create a database copy and visit our Hyperscale database copy blog for more details.

Hyperspace, an indexing subsystem for Apache Spark™, is now open source: For Microsoft's internal teams and external customers, we store datasets that span from a few GBs to 100s of PBs in our data lake. The scope of analytics on these datasets ranges from traditional batch-style queries (e.g., OLAP) to explorative "finding the needle in a haystack" type of queries (e.g., point-lookups, summarization). Resorting to linear scans of these large datasets with huge clusters for every simple query is prohibitively expensive and not the top choice for many of our customers, who are constantly exploring ways to reduce their operational costs; incurring unchecked expenses is their worst nightmare. Over the years, we have seen a huge demand for bringing indexing capabilities that come de facto in the traditional database systems world into Apache Spark™. Today, we are making this possible by releasing an indexing subsystem for Apache Spark called Hyperspace, the same technology that powers indexing within Azure Synapse Analytics.

New MERGE command for Azure Synapse Analytics: The new MERGE command in Azure Synapse Analytics allows customers to insert, update, or delete data in one table based on the differences discovered in another table. With the MERGE command, the previous multi-step process to synchronize two tables in a dedicated SQL pool is now streamlined into a single step, thus reducing the data operation cost and improving the user experience (a small sketch appears at the end of this post). Learn more.

COPY command now generally available in Azure Synapse Analytics: The COPY command feature in Azure Synapse Analytics provides users a simple, flexible, and fast interface for high-throughput data ingestion for SQL workloads. With this announcement, we've added performance improvements with file splits, ease of use with Parquet auto-schema discovery and complex data type support, and support for the COPY command within our data integration partner ecosystem, including Databricks, Informatica, StreamSets, Panoply, and Talend (a small sketch appears at the end of this post). Learn more.

Column-level encryption for Azure Synapse Analytics: Column-level encryption (CLE) helps customers implement fine-grained protection of sensitive data within a table (server-side encryption). With CLE, customers gain the ability to use different protection keys for columns in tables, with each key having its own access permissions. The data in CLE-enforced columns is encrypted on disk (and remains encrypted in memory) until the DECRYPTBYKEY function is used to decrypt it. This feature applies to dedicated SQL pools in Azure Synapse Analytics. Learn more.

Azure Databricks Delta engine with Photon: Announced the preview of the Photon-powered Delta engine on Azure Databricks, a fast, easy, and collaborative analytics and AI service. Built from scratch in C++ and fully compatible with Spark APIs, Photon is a vectorized query engine that leverages modern CPU architecture along with Delta Lake to enhance Apache Spark 3.0's performance by up to 20x. As organizations worldwide embrace data-driven decision-making, it has become imperative for them to invest in a platform that can quickly analyze massive amounts and types of data. However, this has been a challenge. While storage and network performance have increased 10x, CPU processing speeds have only increased marginally. This leads to the question: if CPUs have become the bottleneck, how can we achieve the next level of performance?
The answer with Photon lies in greater parallelism of CPU processing at both the data level and the instruction level. The Photon-powered Delta engine is a 100% Apache Spark-compatible vectorized query engine designed to take advantage of modern CPU architecture for extremely fast parallel processing of data. Read the blog to learn more.

Azure Arc enabled data services: Announcing Azure data services anywhere, a new capability enabled by Azure Arc that allows customers to run Azure data services on-premises, multi-cloud, and at the edge, using Kubernetes on the hardware of their choice. Azure Arc enabled SQL Managed Instance and Azure Arc enabled PostgreSQL Hyperscale are available in preview now, with other services to come over time. Customers can now deploy a variety of data services with access to the latest innovation, cloud benefits like automation and elastic scale, unified management across all data assets in hybrid environments, and unique Azure security and governance capabilities like Azure Security Center, Policies, and role-based access control for on-premises data workloads. Best of all, customers can use modern cloud billing models on-premises for better cost efficiency. Learn more about Azure Arc enabled data services.

Azure ML Ignite 2020 announcements: Azure Machine Learning Enterprise Edition (preview) capabilities are merging into Basic Edition on September 22, 2020. With these enhanced capabilities included in Basic Edition, the Enterprise Edition of Azure Machine Learning will retire on January 1, 2021. Other announcements include:

- Automatically build and deploy predictive models using the no-code UI or through a code-first notebooks experience.
- The machine learning assisted labeling feature lets users trigger automatic machine learning models to accelerate the labeling task.
- Azure ML operation-level RBAC will allow customers to set custom roles, or reuse pre-built roles, to control specific operations for individual users in a workspace.
- MLflow projects support for simplified job submission from local to cloud.
- MLflow model registry support.
- Model deployment support (take a model from the MLflow model registry and deploy it to ACI or AKS).
- Expanded support for the MLflow experimentation UI.
- A network isolation feature that enables you to access Azure ML over a private IP in your virtual network (VNet).

Azure SQL Database cross-SQL MI distributed Tx preview: Distributed database transactions spanning multiple Azure SQL Managed Instances will be added soon to enable frictionless migration of existing applications, as well as development of modern multi-tenant applications relying on vertically or horizontally partitioned database architectures. By utilizing distributed transactions once public preview is announced, customers will be able to save time when migrating existing applications that require this capability, as it eliminates the need to change application code and to perform extensive testing. If customers develop new applications, they will be able to benefit from partitioning data into multiple databases to overcome current sizing limitations of Azure SQL Managed Instance while utilizing distributed transactions to keep partitioned data in a consistent state. Once the preview is announced, two methods of running distributed transactions will be supported: using the BEGIN DISTRIBUTED TRANSACTION statement from Transact-SQL code and using the TransactionScope class from .NET code.

Power BI Premium will soon be available on a per-user basis. Individuals will be able to upsize their per-seat Professional licenses to Premium.
During the preview period, the upgrade carries no additional cost. See "Answering your questions around the new Power BI Premium per user license".

Power BI Premium is getting faster and is gaining an autoscale capability that, when enabled, allows Microsoft to provision additional v-cores (virtual CPU cores) to the customer's cloud tenant for periods of 24 hours when overloads are detected. The v-cores are automatically removed during idle periods. See "Announcing the Upcoming Evolution of Power BI Premium to enterprise markets and beyond".

Preview of smart narratives, an augmented analytics feature which provides plain-English summarizations of the data in a report, either on a per-visualization or page-wide basis. The narratives automatically update when data is filtered or drilled down upon, and the narratives are editable, both in terms of formatting and for insertion of arbitrary or calculation-driven text. See "Smart Narrative (preview)".

A dedicated Power BI add-in application for Microsoft's Teams collaboration platform, released as a preview. The Teams integration includes the ability to browse reports, dashboards, and workspaces and directly embed links to them in Teams channel chats. It's not just about linking, though, as Teams users can also browse Power BI datasets, either through an alphabetical listing or by reviewing a palette of recommended ones. In both cases, datasets previously marked as Certified or Promoted will be identified as such, and Teams users will have the ability to view their lineage, generate template-based reports on them, or just analyze their data in Excel. See "Announcing: New Power BI experiences in Microsoft Teams".

Through a new preview feature, Power BI asset sensitivity levels, applied as Microsoft Information Protection (MIP) labels, can be applied from within Power BI Desktop, making it unnecessary to apply them to the report and its underlying dataset in the cloud service's user interface. More info.

Microsoft is enhancing its Power Query technology to feature a visual interface rather than relying on the data grid view that has been its hallmark. Essentially, Power Query gets a new Diagram View where all queries in the model, and each of the transformation steps within them, appear together in a flowchart-like representation. This visual presentation indicates the dependencies between queries and the sequence of individual transformations within each. New transforms can be easily added in the Diagram View as well. Visual data preparation is coming soon. It will be available initially in Power BI dataflows only (i.e. online), and not in Power BI Desktop. The capability will come to Desktop within the next calendar year. More info.

Visual anomaly detection (coming soon). More info.

A new visual that can trigger Power Automate processes in a data-driven fashion (also coming soon).

A performance accelerator for Azure Synapse Analytics that automatically creates materialized views to accelerate frequent queries from Power BI. More info.

GA of deployment pipelines (including availability on government clouds) that can move Power BI assets between development, test, and production environments. See "Announcing deployment pipelines General Availability (GA)".

A preview of a data impact analysis feature that can notify users of datasets that may be affected by a change to another dataset upstream. More info.
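To make the Azure AD authentication announcement above more concrete, here is a minimal, hedged T-SQL sketch of creating an Azure AD application and an Azure AD guest user as database users. The names are hypothetical, and you would need to run this while connected to the user database as an Azure AD admin.

```sql
-- Hypothetical names; run against the user database as an Azure AD admin
CREATE USER [MyAutomationApp] FROM EXTERNAL PROVIDER;                -- Azure AD application (service principal)
CREATE USER [guest.user@partnerdomain.com] FROM EXTERNAL PROVIDER;   -- individual Azure AD guest user
ALTER ROLE db_datareader ADD MEMBER [guest.user@partnerdomain.com];  -- grant read access, as an example
```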
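Here is a minimal sketch of the new MERGE command mentioned above, synchronizing a dedicated SQL pool dimension table from a staging table in a single statement. The table and column names are hypothetical.

```sql
-- Hypothetical tables: stg.Customer (source) and dbo.DimCustomer (target) in a dedicated SQL pool
MERGE dbo.DimCustomer AS tgt
USING stg.Customer AS src
    ON tgt.CustomerKey = src.CustomerKey
WHEN MATCHED THEN
    UPDATE SET tgt.Email = src.Email, tgt.City = src.City
WHEN NOT MATCHED BY TARGET THEN
    INSERT (CustomerKey, Email, City) VALUES (src.CustomerKey, src.Email, src.City)
WHEN NOT MATCHED BY SOURCE THEN
    DELETE;
```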
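And a hedged sketch of the COPY command for high-throughput ingestion into a dedicated SQL pool table; the storage account, container, path, and table are placeholders, and the credential clause would depend on how your storage is secured.

```sql
-- Placeholders: replace <account>, <container>, and the path/table with your own values
COPY INTO dbo.StagingSales
FROM 'https://<account>.dfs.core.windows.net/<container>/sales/2020/*.parquet'
WITH (
    FILE_TYPE = 'PARQUET',
    CREDENTIAL = (IDENTITY = 'Managed Identity')   -- or a SAS/storage key credential
);
```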
There are some options in Azure Synapse Analytics that are not obvious that I wanted to point out. While in Azure Synapse Studio, going to the Data hub and clicking the "+" at the top, to the right of "Data", brings up the following choices:

Under "Workspace", choosing "Synapse SQL Database" gives you the ability to create a database in a SQL on-demand pool. A SQL on-demand pool (which you access via an endpoint) can have multiple databases. Note that SQL on-demand has no local storage, so only metadata objects are stored in the databases. After you create a database, if you then go to the Develop hub and create a new SQL script, you can choose to connect to "SQL on-demand", and then the newly created database will be available under "Use database". You can then create views or external tables in that database. When you go to the Data hub, under "Databases" you will see the newly created database (you might need to hit refresh first).

Under "Linked", choosing "Connect to external data" will create a linked service to ADLS Gen2 or Cosmos DB that you can use in the Data hub (under "Linked"). This is the same method as going to the Manage hub, choosing "Linked services", and clicking "New" (but that option has more sources to connect to). These linked services can also be used in Azure Data Factory (see Linked Services in Azure Data Factory).

Under "Linked", choosing "Browse samples" will give you a ton of samples to choose from, broken into four sets:

Datasets: A bunch of datasets that, when added, will show up under Data -> Linked -> Azure Blob Storage. You can then choose an action (via "..." next to any of the containers in the dataset) and choose "New SQL script -> Select TOP 100 rows" to examine the data, or choose "New notebook" to load the data into a Spark dataframe. Any dataset you add is a linked service to files in a blob storage container using SAS authentication. You can also create an external table in a SQL on-demand pool or SQL provisioned pool for each dataset via an action ("..." next to "External tables" under the database, then "New SQL script -> New external table") and then query it or insert the data into a SQL provisioned database.

Notebooks: A bunch of sample Apache Spark notebooks in various languages (PySpark/Scala/Spark.NET C#/SparkSQL) that will show up under Develop -> Notebooks.

SQL scripts: A bunch of sample SQL scripts that will show up under Develop -> SQL scripts.

Pipelines: This is similar to choosing "Create pipeline from template" on the home page of an Azure Data Factory workspace and will create a pipeline that will show up under Orchestrate -> Pipelines.

These samples offer a great way to shortcut the process of building a solution in Synapse.

The public preview version of Azure Synapse Analytics has three compute options and four types of storage that it can access (mentioned in my blog at SQL on-demand in Azure Synapse Analytics). This gives twelve possible combinations for querying data. Not all of these combinations are currently supported, and some have a few quirks, which I list below.

| Compute | Relational Database | ADLS Gen2 | Spark Table | Cosmos DB |
|---|---|---|---|---|
| SQL provisioned pool | Y | Y (*1) | N (*2) | N (*6) |
| SQL on-demand pool | N (*6) | Y | Y (*3) | N (*4) |
| Apache Spark pool | Y | Y | Y | Y (*5) |

*1: This is available via an external table, which uses Polybase technology and does not use push-down queries, so it can be slow. A feature that will be available after Azure Synapse Analytics goes GA, called "fast parquet", will speed up queries over external tables mapped to parquet files (the technology underneath is the same as that being used for SQL on-demand).
*2: This feature will be available soon, but will only support Spark tables created using Parquet as the input format.
*3: This feature only supports Spark tables created using Parquet as the input format (explained here).
*4: This feature is available via Azure Synapse Link and will be available soon.
*5: This feature is available via Azure Synapse Link.
*6: This feature will be available after Azure Synapse Analytics goes GA.

A huge benefit of using the public preview version of Azure Synapse Analytics is the ability to query a file in the data lake by simply right-clicking the file. When you do that, a menu pops up giving you three choices on how to query the data. Currently this works on three different file types: parquet, csv, and JSON (note that you can always query additional file types using a Spark notebook). The three query choices are listed below, with all but one currently supported:

| File type | Preview | New SQL script - Select TOP 100 rows | New notebook |
|---|---|---|---|
| parquet | N (*3) | Y | Y |
| csv | Y | Y (*1) | Y |
| JSON | Y | Y (*2) | Y |

"Preview" opens a pop-up window with the contents of the file; "Select TOP 100 rows" opens a tab with a T-SQL SELECT statement using SQL on-demand that you can run to show the results (see the sketch below); and "New notebook" opens a Spark notebook with PySpark code that you can run to load the data into a Spark in-memory DataFrame and display the rows.

*1: This feature was just added and has a limitation in that the column headers do not show correctly (a feature is being worked on to enable the query to infer the column names if the file has a header row).
*2: Replace varchar(8000) with varchar(MAX) in the SELECT statement if you receive an error when running the query.
*3: A work-around is to right-click the file, choose "New SQL script - Bulk load", click the Continue button, and you will see a "Preview data" option.
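For reference, here is a hedged sketch of the kind of T-SQL that the "Select TOP 100 rows" option generates against SQL on-demand; the storage account, container, and file name are placeholders, and the exact generated script may differ.

```sql
-- Placeholders: replace <account>, <container>, and the file path with your own values
SELECT TOP 100 *
FROM OPENROWSET(
    BULK 'https://<account>.dfs.core.windows.net/<container>/data/sample.parquet',
    FORMAT = 'PARQUET'
) AS [result];
```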
I am starting to see this relatively new phrase, "Data Lakehouse", being used in the data platform world. It's the combination of "Data Lake" and "Data Warehouse". In this post I'll give my thoughts on it, and how the next version of Azure Synapse Analytics that is in public preview fits right in with the Data Lakehouse.

First, I want to clear up a bit of confusion regarding Azure Synapse Analytics. If you log in to the Azure portal and do a search, you will notice that Azure SQL Data Warehouse (SQL DW) is now called Azure Synapse Analytics. This product has been generally available (GA) for a while and is simply a renaming. It contains new features and soon-to-be new features that I blogged about here, but this is separate from other features in a public preview version that I will call "Synapse PP" (see the new docs). I'll call the current GA version "Synapse GA". I blogged about this here.

I like to think of a major feature of a Data Lakehouse as the ability to use T-SQL to query data the same way no matter whether the data is in a data lake (storage such as ADLS Gen2) or a data warehouse (a relational database such as Synapse GA). Several major new Synapse PP features support a Data Lakehouse, especially SQL on-demand and T-SQL over ADLS, which I discuss below.

There are some great benefits to having a data lake that I blogged about here. And there are many reasons to also have a relational data warehouse along with your data lake that I blogged about here. Up until now, the data lake and the relational data warehouse were sort of on their own islands, and you had to jump to a different product to interface with each. For example, your data lake could be in ADLS Gen2 and you could use Azure Databricks to query it with SparkSQL (first setting up a connection to ADLS Gen2), while your relational data warehouse could be in Synapse GA and you would use SSMS to query it with T-SQL. Having to use Azure Data Factory or Power BI means opening up another tab in your browser, logging into those products, and connecting to ADLS Gen2. So it can be time consuming and costly to create an environment to query both, and also confusing to query using two different versions of SQL.

You can now get the best of both worlds all under one roof via Azure Synapse Studio: using the relational data warehouse when you need fast query performance, high user concurrency, enhanced security, or just prefer working in a relational environment; or using the data lake when you have huge data volumes, need instant access to the data via schema-on-read, or need to deal with semi-structured data. You never have to leave your workspace no matter which tool you want to use, and you can use T-SQL whether the data is in the data lake or in the relational data warehouse. And querying a file in the data lake is simply a matter of right-clicking the file (more on this in my next blog).

So at a high level you can view Azure Synapse Studio as supporting business intelligence, data science, T-SQL plus other languages (Python, Scala, C#, SparkSQL), schema flexibility, ACID compliance, any data type, petabytes of data, streaming and batch, with high performance and user concurrency. So as you can see, Synapse PP combines the use of a data lake and a relational database to make it a Data Lakehouse. It is not really a new technology, but just a merging of existing technologies to make it easier to gain insights from data in order to make better business decisions.

With these new features, especially SQL on-demand and T-SQL against ADLS, I am starting to see a few use cases where you may not need a relational database anymore in your modern data warehouse, which goes against my previous way of thinking (see "Is the traditional data warehouse dead?"). I still feel you will want to have a relational database in your modern data warehouse architecture a large majority of the time, but there will be exceptions.

For example, I can land a bunch of parquet files into the data lake and create a T-SQL view on top of that data, where that view is stored in SQL on-demand (see the sketch below). Then I call that view from Power BI, making it appear to the end user that they are using a relational database, while at the same time they are only paying when the query is being run. That leads to compute cost savings as well as not having to copy the data from the data lake to a relational database. This means less time is needed to build a solution, and there is a reduction in the complexity of the solution and therefore additional cost savings. Below are some of the concerns that arise out of skipping the relational database, along with the new features that counter some of those concerns:

Speed: Queries against relational storage will always be faster than against a data lake because of the lack of statistics, query plans, and indexes in the data lake. Counter: DirectParquet, CSV 2.0, query acceleration, predicate pushdown, and SQL on-demand auto-scaling are some of the features that can make queries against ADLS nearly as fast as a relational database. Then there are features like Delta Lake and the ability to use statistics for external tables that can add even more performance. Plus, you can also import the data into Power BI, use Power BI aggregation tables, or import the data into Azure Analysis Services to get even faster performance.

Security: Row-level security (RLS), column-level security, and dynamic data masking are security-related features that are not available in a data lake. Counter: Use RLS in Power BI or RLS on external tables instead of RLS on a database table, which then allows you to use result set caching in Synapse.

Complexity: Schema-on-read (ADLS) is more complex to query than schema-on-write (a relational database). Schema-on-read means the end user must define the metadata, whereas with schema-on-write the metadata is stored along with the data. Then there is the difficulty of querying in a file-based world compared to a relational database world. Counter: Create a SQL relational view on top of files in the data lake so the end user does not have to create the metadata, which will make it appear to the end user that the data is in a relational database. But I still see it being very difficult to manage a solution with just a data lake when you have data from many sources.

Missing features: Auditing, referential integrity, ACID compliance, updating/deleting rows of data, data caching, and full support of T-SQL are not available in a data lake. Counter: Many of these features can be added when using a Delta Lake, but they will not be as easy to implement as in a relational database and you will be locked into using Spark.

Some additional reasons for using a relational database include helping to break up a large problem into smaller pieces (think of the relational database as an additional step in the data journey from the data lake to the end user), and the need to handle slowly changing dimensions (SCD). Using time travel in the Delta Lake or creating daily folders for each table in the data lake are options for handling SCD, but they are much more complex than in a relational database world. I also see some customers who have source systems that are relational databases and, because they are in the early stages of building a modern data warehouse in Azure, will bypass the data lake and copy the relational data right into a relational data warehouse (especially if they have many existing SSIS packages that they are using for their on-prem data warehouse and want to continue using for the short term). They do this for quick wins and eventually will land the data in the data lake.

A strong reason for skipping a relational database is that, for the short term, SQL on-demand is not able to access a relational database, so you can only use this feature against a data lake. So if you want to save costs by using SQL on-demand, it would have to be against data in the data lake. This could be especially true if you have a small data warehouse.

I'm still thinking through this, but these are my initial thoughts, and it will be interesting to have discussions with customers on this topic when Synapse PP goes GA.
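As a concrete illustration of the view-over-parquet pattern described above, here is a minimal, hedged sketch for a SQL on-demand (serverless) database; the database name, storage account, and path are hypothetical.

```sql
-- Hypothetical names and paths; run against the SQL on-demand endpoint
CREATE DATABASE SalesLakehouse;
GO
USE SalesLakehouse;
GO
CREATE VIEW dbo.vwSales AS
SELECT *
FROM OPENROWSET(
    BULK 'https://<account>.dfs.core.windows.net/datalake/sales/*.parquet',
    FORMAT = 'PARQUET'
) AS s;
GO
-- Power BI can now connect to SalesLakehouse and query dbo.vwSales as if it
-- were a relational table, paying only for the data processed per query.
```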
Please comment below on your thoughts!

I'll be presenting and doing a demo of the public preview of Azure Synapse Analytics at my sessions at European Digital Week on 9/25 (session info), SQLBits on 10/3 (session info), PASS Summit on 11/13 (session info), and Big Data Conference Europe on 11/25 (session info).

I had a previous blog comparing Cosmos DB to a relational database (see "Understanding Cosmos DB coming from a relational world"), and one topic that it did not address, which I want to cover now, is how to handle reference tables that are common in the relational database world.

A big difference with Cosmos DB compared to a relational database is that you will create a denormalized data model. Take a person record, for example. You will embed all the information related to a person, such as their contact details and addresses, into a single JSON document. Retrieving a complete person record from the database is now a single read operation against a single container and for a single item. Updating a person record, with their contact details and addresses, is also a single write operation against a single item. By denormalizing data, your application will typically have better read and write performance and allow for a scale-out architecture, since you don't need to join tables. (Side note: "container" is the generic term; depending on the API, a specific term is used, such as "collection" for the SQL API. Think of a container as one or more tables in the relational world. Going a little deeper, think of a container as a group of one or more entities which share the same partition key. A relational table shares a schema, but containers are not bound in that way.)

Embedding data works nicely for many cases, but there are scenarios when denormalizing your data will cause more problems than it is worth. In a document database, you can have information in one document that relates to data in other documents. While there may be some use cases that are better suited to a relational database than to Cosmos DB (see below), in most cases you can handle relationships in Cosmos DB by creating a normalized data model for them, with the tradeoff that it can require more round trips to the server to read data (but improves the efficiency of write operations, since less data is written). In general, use normalized data models to represent one-to-many or many-to-many relationships when related data changes frequently. The key is knowing whether the cost of the updates is greater than the cost of the queries.

When using a normalized data model, your application will need to handle creating the reference document. One way would be to use a change feed that triggers on the creation of a new document; the change feed essentially triggers an Azure function that creates the relationship record.

When using a normalized data model, your application will need to query the multiple documents that need to be joined (costing more money because it will use more request units) and do the joining within the application (i.e. join a main document with documents that contain the reference data), as you cannot do a join between documents in different containers in Cosmos DB (joins between documents within the same container can be done via self-joins).
Since every time you display a document it needs to search the entire container for the name, it would be best to put the other document type (the reference data) in a different container so you can have different partition keys for each document type (read up on how partitioning can make a big impact on performance and cost). Note that partitioning in an RDBMS and partitioning in Cosmos DB are different things: partitioning in Cosmos DB refers to sharding or "horizontal partitioning", where replica sets contain both the data and copies of compute (database) resources operating in a "shared nothing" architecture (i.e. scaled "horizontally", where each compute resource, or server node, operates independently of every other node, but with a programming model transparent to developers). Conversely, what is often referred to as "partitioning" in an RDBMS is purely a separation of data into separate file groups within a shared compute (database) environment. This is also often called "vertical partitioning".

Another option that is a common pattern for NoSQL databases is to create a separate container to satisfy specific queries. For example, you could have a container for products based on category and another container for products based on geography. Both of those containers for my query/app are sourced from my main or source container that is being updated (by the front end, or another app), and the change feed attached to that main container pushes out to the other containers that I use for my queries. This means duplicating data, but storage is cheap and you save costs to retrieve data (think of those extra containers as covering indexes in the relational database world).

Since joining data can involve multiple ways of reading the data, it's important to understand the two ways to read data using the Azure Cosmos DB SQL API:

Point reads: You can do a key/value lookup on a single item ID and partition key. The item ID and partition key combination is the key, and the item itself is the value. For a 1 KB document, point reads typically cost 1 request unit with a latency under 10 ms. Point reads return a single item.

SQL queries: You can query data by writing queries using the Structured Query Language (SQL) as a JSON query language (a small example follows below). Queries always cost at least 2.3 request units and, in general, will have a higher and more variable latency than point reads. Queries can return many items. See "Getting started with SQL queries".

The key in deciding when to use a normalized data model is how frequently the data will change. If the data only changes once a year, it may not be worthwhile to create a reference document; instead, just do an update to all the documents. But be aware that the update has to be done from the client side, spread over the affected documents and done in batches, as one big UPDATE statement does not exist in Cosmos DB. You will need to retrieve the entire document from Cosmos DB, update the property or properties in your application, and then call the 'Replace' method in the Cosmos DB SDK to replace the document in question (see "CosmosDb – Updating a Document (Partially)"). If you are using the SQL API and .NET or Java, you can consider using bulk support (.NET) or the bulk executor (Java). Other ideas involve using the change feed, or, if you really need a level of ACID consistency, you can achieve this using stored procedures with snapshot isolation scoped to a single partition (these are not the same as stored procedures in SQL; rather, they are designed specifically to support multi-document transactions).
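To ground the "SQL queries" option and the earlier point that joins within the same container are really self-joins, here is a hedged Cosmos DB SQL API sketch over a hypothetical people container whose documents embed an addresses array; the container, property names, and partition key value are assumptions.

```sql
-- Hypothetical container "people" whose documents embed an "addresses" array;
-- the JOIN below is an intra-document (self) join, not a join across documents
SELECT p.id, p.name, a.city
FROM people p
JOIN a IN p.addresses
WHERE p.customerId = "customer-123"
```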
Also be aware that, because there is currently no concept of a constraint, foreign key or otherwise, any inter-document relationships that you have are effectively weak links and will not be verified by the database itself. If you want to ensure that the data a document is referring to actually exists, then you need to do this in your application, or through the use of server-side triggers or stored procedures on Azure Cosmos DB.

Avoiding the broader topic of when to use a relational database over a non-relational database, there are a few use cases where a relational database is essential:

The customer experience and comfort zone is with relational databases. It is a reality that relational databases are ahead on the maturity curve with respect to tooling (an example would be foreign-key constraint behavior). However, it should be noted that this is not the same as saying that "more use cases are technically better suited to the relational model". Rather, the barrier to entry in new customer projects tends to be lower because mindshare is greater in the relational space. In these cases, it often isn't worth the effort for companies to upskill.

The system really needs strict ACID semantics across the entire dataset. Sharded/partitioned databases like Cosmos DB will not provide ACID guarantees across the entire set of physical partitions (and likely never will). In reality, however, the set of use cases where this is necessary is quite small. Things like transaction management and other SDK-level features that go along with these aspects come easier in the RDBMS space, but this is really the same as the above point: RDBMSs are ahead on the maturity curve for user-level tooling that helps abstract paradigm-specific concepts, but this does not make the paradigm better suited to a greater number of use cases.

Having a single data store that services both operational and analytical needs with equal utility, including tabular models. This is probably the most powerful argument, and NoSQL engines are likely never going to serve a data structure that coalesces as well into tabular models that produce reports, charts, graphs, etc. But again, history has proven that, at scale, the "one size fits all" approach can have some non-trivial drawbacks. And the new Analytical Store in Cosmos DB is addressing the need to service both operational and analytical needs.

You can create complex hierarchical "relationships" in Cosmos DB, which would have to be modelled in separate tables in an RDBMS. Cosmos DB can't handle them using joins, but again, this is a paradigmatic/semantic difference, not a fundamental flaw in the database model itself. In order to do the equivalent of what one may be trying to achieve in a relational database, you may have to "unlearn what you have learned", but this comes back to your comfort level with an RDBMS, which is not a trivial thing and can be the main and very valid reason for staying with an RDBMS.

In summary, in a NoSQL database like Cosmos DB, most use cases are covered. Some things are a little harder (due to the lack of maturity in tooling), but most things are easier, many things can only be done in NoSQL (i.e. handling millions of transactions per second), and very few things cannot be done in a NoSQL database. Most NoSQL engines are characterized by having a lot more configurability, tunability, and flexibility than an RDBMS.
And in many ways, that is the hardest challenge for newcomers.

The new SQL on-demand (to be renamed SQL serverless) offering within Azure Synapse Analytics opens up a whole new world of getting insights into data immediately, at low cost, using familiar tools and languages. There is no infrastructure to set up or clusters to maintain. It uses a pay-per-query model (you are only charged per query for the data processed by the query; see pricing) and auto-scales depending on the amount of data being queried. See "Quickstart: Use SQL on-demand". A SQL on-demand pool currently can access data from ADLS Gen2, Spark tables, and Cosmos DB using T-SQL. The major use cases:

Basic discovery and exploration: Quickly view the data in various formats (Parquet, CSV, JSON) in your data lake, so you can plan how to extract insights from it.

Logical data warehouse: Provide a relational abstraction on top of raw or disparate data without having to relocate or transform the data, allowing an always up-to-date view of your data. By putting T-SQL views on top of data in your data lake, it appears to the end user that they are querying data in a relational database since they are using T-SQL, blurring the line between a relational database and a data lake.

Data transformation: A simple, scalable, and performant way to transform data in the lake using T-SQL, so it can be fed to BI and other tools, or loaded into a relational data store (Synapse SQL databases, Azure SQL Database, etc.). For example, using the Copy activity in Azure Data Factory you can convert CSV files in the data lake (via T-SQL views in SQL on-demand) to Parquet files in the data lake. See "Azure Synapse Analytics: How serverless is replacing the data warehouse".

Data Engineers can explore the lake, transform and prepare data using this service, and simplify their data transformation pipelines. For more information, check out this tutorial. You could even create a view over data in the data lake and use that to refresh your data in a tabular model.

Data Scientists can quickly reason about the contents and structure of the data in the lake, thanks to features such as OPENROWSET and automatic schema inference.

Data Analysts can explore data and Spark external tables created by Data Scientists or Data Engineers using the familiar T-SQL language or their favorite tools, which can connect to SQL on-demand. So if you create a Spark table, that table will be created as an external table in SQL on-demand so you can query it without having to keep a Spark cluster up and running. Currently this is only supported with Spark tables created using Parquet as the input format.

BI Professionals can quickly create Power BI reports on top of data in the lake and Spark tables. You are connecting to a database that has no data, but rather views or external tables over data in the data lake. Not only can you create reports, but you can use Power BI to explore the data and do basic discovery.

Some common questions and answers about SQL on-demand:

Will SQL on-demand be able to query SQL pool tables? It will be able to in the future.

Is SQL serverless MPP? SQL serverless uses nodes to scale out processing, similar to MPP, although it is a completely different architecture from existing MPP data warehouses. It uses customized SQL engines.

What are the warmup times when submitting a query? For the 95th percentile it is less than 2 seconds. There is a timeout defined after which there is scale-down, so if you don't execute any query targeting storage during that timeout, your resources will be taken from you. Once you connect again and execute a query that targets storage, resources are granted back to your endpoint. A pool of warm clusters is maintained to keep the warmup time quick.

Is there a predicate pushdown concept for SQL on-demand in Azure Synapse? Yes, there is filter pushdown, where SQL on-demand will push down queries from the front-end to back-end nodes. For example, if you query parquet files, parquet metadata is used to target only column groups that contain the values you are looking for. Microsoft is expanding the range of cases in which filter pushdown is supported.

What are the best practices for SQL on-demand? Check out https://docs.microsoft.com/en-us/azure/synapse-analytics/sql/best-practices-sql-on-demand

Why should I choose serverless and have a penalty on the first (warmup) query, instead of using provisioned? It depends on your workload. For constant or high workloads, provisioned might be a better choice, while for sporadic, ad-hoc, or exploratory workloads, serverless might be a better fit. It also brings a difference in charges: with provisioned you are paying for resources, while with serverless you are paying for what you use.

So when would you use provisioned SQL (aka a SQL pool, which is exactly what was in SQL Data Warehouse)? When you need consistent performance, high performance, or have a large number of queries being consistently run. Provisioned SQL may give you better and more predictable performance due to resource reservation. The good thing is that because both use T-SQL, it is easy to transition back and forth between SQL serverless and a SQL pool.

Most customers are starting to use Delta Lake to store their data. When will SQL on-demand support this file format? Microsoft is working on support for Delta Lake; in the following months there will be more details about it and when it will be available.

What is the ADLS Gen2 security? There are different methods to access storage account files using SQL serverless. You can use the Synapse workspace MSI, SAS, or control ACLs at the storage level for a particular AAD account.

Can we read from files stored in Common Data Model (CDM) format within the data lake (ADLS Gen2)? Yes, at this moment there is a way to read CDM data using a SQL (serverless or provisioned) script, which gets metadata from the model.json and creates views over the CDM data (native support in the query engine currently doesn't exist).

Why use external tables over views? Views are faster and have more features, so avoid external tables if possible. In particular, the virtual functions (filepath and filename) are not supported in external tables at this moment, which means users cannot do partition elimination on external tables.

Can you use Power BI and AAS with SQL on-demand? Yes, you can create views on top of your data lake, and use wildcards and the filepath/filename functions to expose partitioning column values in a view (see the sketch at the end of this Q&A). That way, for Power BI and AAS it is just another object (make sure that you do proper data type casting of virtual function results, as stated in the best practices document).

Why use a logical data warehouse instead of just using AAS or Power BI? Hot analytics is one reason, and a logical data warehouse can be used to speed up user request-to-delivery time. Ideally, there would be no ETLs created (it might not be feasible in all cases, so this is just another perspective).

Is there any cost control? Yes, cost control will be available per workspace and time period (daily/weekly/monthly). Cost estimation will come later.

What about CI/CD? When Synapse goes GA there will be improvements overall in the CI/CD area, where customers can create groups/departments and scope the permissions of particular teams to artifacts and capabilities.

Can I use other tools with SQL on-demand? Yes, there is a default endpoint (i.e. workspacejs1966-ondemand.sql.azuresynapse.net) for this service that is provided within every Azure Synapse workspace. Any tool capable of establishing a TDS connection to SQL offerings can connect to and query the Synapse SQL on-demand endpoint. You can connect with Azure Data Studio or SSMS and run ad-hoc queries, or connect with Power BI to gain insights. You can also use Private Link to bring your SQL on-demand endpoint into your managed workspace VNet.
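Referring back to the Power BI/AAS answer above, here is a hedged sketch of a view that exposes a partition folder value through the filepath() virtual function; the storage path, folder layout, and names are hypothetical, and the cast reflects the best-practice advice to type virtual function results explicitly.

```sql
-- Hypothetical partitioned layout: .../sales/year=2019/..., .../sales/year=2020/...
CREATE VIEW dbo.vwSalesPartitioned AS
SELECT
    CAST(r.filepath(1) AS INT) AS sales_year,  -- value matched by the first * wildcard
    r.*
FROM OPENROWSET(
    BULK 'https://<account>.dfs.core.windows.net/datalake/sales/year=*/*.parquet',
    FORMAT = 'PARQUET'
) AS r;
```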
I have written a number of blogs on the topic of business continuity in SQL Database before (HA/DR for Azure SQL Database, Azure SQL Database high availability, Azure SQL Database disaster recovery), but with a number of new features I felt it was time for a new blog on the subject, focusing on disaster recovery rather than high availability.

Business continuity in Azure SQL Database and SQL Managed Instance refers to the mechanisms, policies, and procedures that enable your business to continue operating in the face of disruption, particularly to its computing infrastructure. In most cases, SQL Database and SQL Managed Instance will handle the disruptive events that might happen in the cloud environment and keep your applications and business processes running. From a database perspective, these are the major potential disruption scenarios, or disasters:

- Data corruption or deletion, typically caused by an application bug or human error (a user accidentally deleted or updated a row in a table). Such failures are application-specific and typically cannot be detected by the database service.
- A malicious attacker succeeding in deleting data or dropping a database.
- A datacenter outage or a datacenter temporarily disabled, possibly caused by a natural disaster such as an earthquake. This scenario requires some level of geo-redundancy with application failover to an alternate datacenter.
- Local hardware or software failures affecting the database node, such as a disk-drive failure.
- Upgrade or maintenance errors: unanticipated issues that occur during planned infrastructure maintenance or upgrades may require rapid rollback to a prior database state.

This overview describes the capabilities that SQL Database and SQL Managed Instance provide for business continuity and disaster recovery. To mitigate local hardware and software failures, SQL Database includes a high availability architecture, which guarantees automatic recovery from these failures with up to a 99.995% availability SLA.

SQL Database and SQL Managed Instance also provide several business continuity features that you can use to mitigate various unplanned scenarios:

- Temporal tables enable you to restore row versions from any point in time (a small query sketch appears after the recovery options below).
- To protect your business from data loss, SQL Database and SQL Managed Instance automatically create full database backups weekly, differential database backups every 12 hours, and transaction log backups every 5-10 minutes (see Built-in automated backups). The backups are stored in RA-GRS storage for at least 7 days for all service tiers. All service tiers except Basic support a configurable backup retention period for point-in-time restore, up to 35 days.
- Point-in-time restore enables you to restore a complete database to some point in time within the configured retention period.
- You can restore a deleted database to the point at which it was deleted if the server has not been deleted. For a corrupted database, you can create a new database from a backup on the same server, usually in less than 12 hours unless it is a very large or very active database (see database recovery time). You can also restore a database to another geographic region, called geo-restore, thanks to geo-replicated backups, where a backed-up database is automatically copied to an Azure blob in a different region (note there is a delay between when a backup is taken and when it is geo-replicated to an Azure blob in a different region, so as a result the restored database can be up to one hour behind the original database). This allows you to recover from a geographic disaster when you cannot access your database or backups in the primary region. It creates a new database on any existing server or managed instance, in any Azure region. See Azure SQL Database and Backups.
- Long-term backup retention (LTR) enables you to keep backups for up to 10 years (this is in limited public preview for SQL Managed Instance). LTR allows you to restore an old version of the database by using the Azure portal or Azure PowerShell to satisfy a compliance request or to run an old version of the application.
- Active geo-replication enables you to create readable replicas and manually fail over to any replica in case of a datacenter outage or application upgrade (see the table below to compare with auto-failover groups). If you have an application that must be taken offline because of planned maintenance such as an application upgrade, check out "Manage application upgrades", which describes how to use active geo-replication to enable rolling upgrades of your cloud application to minimize downtime during upgrades and provide a recovery path if something goes wrong.
- Auto-failover groups allow the application to automatically recover in case of a datacenter outage (see the table below to compare with active geo-replication).

Auto-failover groups simplify the deployment and usage of active geo-replication and add the following additional capabilities:

| Feature | Geo-replication | Failover groups |
|---|---|---|
| Automatic failover | No | Yes |
| Fail over multiple databases simultaneously | No | Yes |
| User must update connection string after failover | Yes | No |
| SQL Managed Instance support | No | Yes |
| Can be in same region as primary | Yes | No |
| Multiple replicas | Yes | No |
| Supports read-scale | Yes | Yes |

Some details if you need to recover a database due to the very rare case of an Azure datacenter having an outage:

- One option is to wait for your database to come back online when the datacenter outage is over. This works for applications that can afford to have the database offline. When a datacenter has an outage, you do not know how long the outage might last, so this option only works if you don't need your database for a while.
- Another option is to restore a database on any server in any Azure region using geo-restore, as explained above.
- Finally, you can quickly recover from an outage if you have configured either a geo-secondary using active geo-replication or an auto-failover group for your database or databases, as explained above. Depending on your choice of these technologies, you can use either manual or automatic failover. While failover itself takes only a few seconds, the service will take at least 1 hour to activate it. This is necessary to ensure that the failover is justified by the scale of the outage. Also, the failover may result in small data loss due to the nature of asynchronous replication.
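As promised in the temporal tables bullet above, here is a minimal, hedged sketch of reading row versions as they existed at an earlier point in time; the table is assumed to be a system-versioned temporal table, and the names and timestamp are hypothetical.

```sql
-- Hypothetical system-versioned temporal table dbo.Orders
SELECT OrderId, Status, Amount
FROM dbo.Orders
FOR SYSTEM_TIME AS OF '2020-09-01T00:00:00'
WHERE CustomerId = 42;
```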
That one-hour activation delay is necessary to ensure that the failover is justified by the scale of the outage. Also, because replication is asynchronous, the failover may result in a small amount of data loss.

As you develop your business continuity plan, you need to understand the maximum acceptable time before the application fully recovers after the disruptive event. The time required for the application to fully recover is known as the recovery time objective (RTO). You also need to understand the maximum period of recent data updates (time interval) the application can tolerate losing when recovering from an unplanned disruptive event. The potential data loss is known as the recovery point objective (RPO).

Different recovery methods offer different levels of RPO and RTO. You can choose a specific recovery method, or use a combination of database backups and active geo-replication to achieve full application recovery. The following table compares the RPO and RTO of each recovery option:

Recovery method                                  RTO     RPO
Geo-restore from geo-replicated backups          12 h    1 h
Auto-failover groups                             1 h     5 s
Manual database failover (via unplanned mode)    30 s    5 s

Use active geo-replication or auto-failover groups (rather than relying on geo-restore alone) if your application meets any of these criteria:

- Is mission critical
- Has a service level agreement (SLA) that does not allow for 12 hours or more of downtime
- Downtime may result in financial liability
- Has a high rate of data change and 1 hour of data loss is not acceptable
- The additional cost of active geo-replication is lower than the potential financial liability and associated loss of business

The following sections provide an overview of the steps to recover using database backups, active geo-replication, or auto-failover groups. For detailed steps, including planning requirements, post-recovery steps, and information about how to simulate an outage to perform a disaster recovery drill, see Recover a database in SQL Database from an outage. To prepare for recovery, you should:

- Identify and prepare the target server, including server-level IP firewall rules, logins, and master database level permissions
- Determine how to redirect clients and client applications to the new server
- Document other dependencies, such as auditing settings and alerts

If you are using active geo-replication or auto-failover groups as your recovery mechanism, you can configure an automatic failover policy or use manual unplanned failover. Once initiated, the failover causes the secondary to become the new primary, ready to record new transactions and respond to queries, with minimal data loss for the data not yet replicated. For information on designing the failover process, see Design an application for cloud disaster recovery. When the Azure datacenter comes back online, the old primaries automatically reconnect to the new primary and become secondary databases. If you need to relocate the primary back to the original region, you can initiate a planned failover manually (failback).

If you are using automated backups with geo-redundant storage (enabled by default), you can recover the database using geo-restore (a sketch follows below). Recovery usually takes place within 12 hours, with data loss of up to one hour determined by when the last log backup was taken and replicated. Until the recovery completes, the database is unable to record any transactions or respond to any queries. Note that geo-restore only restores the database to the last available point in time.
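Geo-restore is normally driven from the portal, PowerShell, or the Azure CLI, but here is a rough sketch of the same operation through the Python management SDK (azure-mgmt-sql). The subscription, resource groups, servers, database, and region are placeholders, and exact model field names can vary by SDK version, so treat this as an illustration rather than a drop-in script:

from azure.identity import DefaultAzureCredential
from azure.mgmt.sql import SqlManagementClient
from azure.mgmt.sql.models import Database

SUBSCRIPTION_ID = "<subscription-id>"                    # placeholder
client = SqlManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

# The geo-backed-up (recoverable) database on the original server -- all
# resource names here are placeholders.
recoverable = client.recoverable_databases.get("prod-rg", "prod-server", "mydb")

# Create a new database on a server in another region from that geo-backup
# (create mode "Recovery" is the geo-restore path in the Azure SQL REST API).
poller = client.databases.begin_create_or_update(
    "dr-rg", "dr-server", "mydb",
    Database(location="westus2",
             create_mode="Recovery",
             source_database_id=recoverable.id))
restored = poller.result()
print(restored.name, restored.status)

The restore runs as a long-running operation, which is why the sketch waits on the poller before reporting the new database's status.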
If the datacenter comes back online before you switch your application over to the recovered database, you can cancel the recovery.

After recovery, you must perform the following additional tasks before your users and applications are back up and running:

- Redirect clients and client applications to the new server and restored database
- Ensure appropriate server-level IP firewall rules are in place for users to connect, or use database-level firewalls to enable appropriate rules
- Ensure appropriate logins and master database level permissions are in place (or use contained users)
- Configure auditing, as appropriate
- Configure alerts, as appropriate

If you are using an auto-failover group and connect to the databases using the read-write listener, the redirection after failover will happen automatically and transparently to the application.

Archive tier is now GA: The archive tier provides an ultra-low-cost tier for long-term retention of data while keeping your data available for future analytics needs. Tier your data seamlessly among hot, cool, and archive so all your data stays in one storage account. Lifecycle management policies can be set so files are moved automatically to the archive tier when data access becomes rare. When needed, data in the archive tier can be quickly and easily rehydrated so that the data is available for your analytics workloads. More info.

Immutable storage (preview): Immutable storage provides the capability to store data in a write once, read many (WORM) state. Once data is written, the data becomes non-erasable and non-modifiable, and you can set a retention period so that files can't be deleted until after that period has elapsed. Additionally, legal holds can be placed on data to make that data non-erasable and non-modifiable until the hold is removed. This preview is currently available in Canada Central, Canada East, France Central, and France South. To enroll in the preview, complete this form. More info.

File snapshots (preview): Use file snapshots to take an unlimited number of point-in-time snapshots of your files. These snapshots can be used to revert a file back to that snapshot in the case of accidental or inadvertent updates. Snapshots can also be retained so you can reference the content of a file at that point in time. File snapshots are currently available in preview in Canada Central, Canada East, France Central, and France South. To enroll in the preview, complete this form. More info.

Static website (preview): Use static website to directly host static content from Azure Data Lake Storage, and view that site content from a browser by using the public URL of that website. This preview is currently available in Canada Central, Canada East, France Central, and France South. To enroll in the preview, complete this form. More info.

I discovered a small new feature the other day, but a very useful one. Previously, there was no way to upload files to an ADLS Gen2 storage account via the Azure portal. You had to use Azure Storage Explorer; if you hit the Upload button via Storage Explorer (preview) in the Azure portal, it told you to download Azure Storage Explorer. But now, in the Azure portal, you can upload a file by choosing Containers from the overview blade or choosing Containers under Data Lake Storage, selecting a container, and using the Upload button (an SDK-based alternative is sketched below).
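If you prefer to script the same thing, here is a minimal sketch using the Python storage SDKs, covering both an upload and a move to the archive tier mentioned above. The account, key, container, and path are placeholders:

from azure.storage.filedatalake import DataLakeServiceClient
from azure.storage.blob import BlobServiceClient

ACCOUNT = "mydatalake"              # placeholder storage account
KEY = "<account-key>"               # placeholder key (AAD credentials also work)
CONTAINER = "raw"
PATH = "sales/2020/sales.csv"

# Upload through the Data Lake (dfs) endpoint.
dfs = DataLakeServiceClient(f"https://{ACCOUNT}.dfs.core.windows.net", credential=KEY)
file_client = dfs.get_file_system_client(CONTAINER).get_file_client(PATH)
with open("sales.csv", "rb") as data:
    file_client.upload_data(data, overwrite=True)

# The same file is addressable through the Blob (blob) endpoint, which is where
# the access tier is changed -- here it goes straight to Archive.
blobs = BlobServiceClient(f"https://{ACCOUNT}.blob.core.windows.net", credential=KEY)
blobs.get_blob_client(CONTAINER, PATH).set_standard_blob_tier("Archive")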
Note that you can upload multiple files at once and specify the authentication type, block size, and access tier. You can also change the tier of a file to Hot, Cool, or Archive, create a container, or create a folder/directory (Storage Explorer in the portal does not support changing the tier). Also note in the Containers section that next to each file there is an option to view/edit the file, so you don't have to download it to view it (unless the file size is over 2.1 MB, which is the max supported by the editor). Among the types of files that can be viewed are csv, json, jpg, log, and avro, but not parquet. All files in Storage Explorer in the portal must be downloaded to view.

Power BI has become hugely popular and I see common questions about functionality and features, so I thought I would put some of those questions, with answers, in this blog post:

To improve your Power BI implementation, it is highly recommended to separate datasets from reports by having separate pbix files for each, resulting in a "data workspace" and a "reporting workspace". The data workspace should have permissions so only certain people (e.g. data modelers) can edit the data, while giving broader permissions to people (e.g. report developers) for the reporting workspace. Power BI row-level security will still be invoked for report developers even if they don't have edit rights on the dataset (they just need read and build permissions). Make sure to use auditing to track who is using the workspaces. A few downsides: pinning to a dashboard from Q&A or Quick Insights is no longer a valid option because you can't choose a dashboard in another workspace. Also, if your company has a huge investment in O365 groups, the build permission currently can't be set on them.

Using a dataset for multiple reports is called a "golden dataset" or the hub-and-spoke method. To do this, when creating a new report in Power BI Desktop, rather than importing data into the .pbix file, use "Get Data" to make a live connection to an existing Power BI dataset. It's a good idea to use report pages in the data workspace to describe the datasets. For more info, check out Melissa Coates' blog and video at 5 Tips for Separating Power BI Datasets and Reports [VIDEO].

Should you use change management for Power BI content? Yes! You should use change management to move reports through the dev/test/prod workspace tiers via the new deployment pipelines in Power BI. Use the workspaces to collaborate on Power BI content with your colleagues, while distributing the report to a larger audience by publishing an app. You should also promote and certify your datasets. The reports and datasets should have repeatable test criteria.

When you publish an app, is it always in the same dedicated capacity as the workspace? Wondering if you could have a workspace in dedicated capacity A and publish the app to dedicated capacity B (if you do not want people hammering the workspace and causing performance issues for people using the app). An app is tied to the content stored in a workspace. That content is physically located in Azure storage (data model) and its metadata is stored in Azure SQL; these details are covered in the security whitepaper. In fact, in Power BI Embedded you can use the app GUID and workspace GUID interchangeably. GUIDs are logical content groupings. Capacity backend cores and memory are used to process data in use; this is the only real physical relationship with the capacity. You should look into shared datasets: shared datasets can reside in any workspace, so they can move to any available capacity.
Frontend cores are shared per cluster, so there are no frontend load benefits. Workspace movement between capacities is instantaneous (assuming they are in the same data center).

Can users leverage a dataset from another workspace? Yes, as long as you set proper permissions and the user has proper licensing, you will be able to leverage datasets for report creation and viewing in any workspace. More details at https://docs.microsoft.com/en-us/power-bi/connect-data/service-datasets-share.

In a deployment pipeline, can you have dev/test/prod in workspaces that are in different capacities? Yes, you can change the workspace to a different capacity via settings on the deployment pipeline screen.

How can you know about datasets that you don't have access to? It would make sense to be able to search for a dataset and get a result back that the dataset exists but you need to request permission to use it. This is not supported. It could possibly be a use case for Azure Data Catalog.

What is the difference between Power Query in Power BI Desktop and Power Query Online? The main difference is around available connectors (~125 in Power Query Desktop vs. ~50 in Power Query Online). On the transformations front, we're at full parity on M engine support and almost there for the Power Query UX too, with some notable exceptions (e.g. By Example capabilities). You can also look at the public roadmap to get a sense of the remaining Power Query Online gaps and things planned in the next 6 months: https://docs.microsoft.com/en-us/power-platform-release-plan/2020wave1/cdm-data-integration/power-query-online.

Can you download a report from the Power BI service as a .pbix file? Yes: https://docs.microsoft.com/en-us/power-bi/create-reports/service-export-to-pbix. Note that the report must have been created by using Power BI Desktop and published to the Power BI service, or the .pbix file must have been uploaded to the Power BI service.

Can you recover a deleted report or dashboard? There is a 3rd-party product to recover deleted reports: Power BI Sentinel. For a deleted dashboard: no. A tenant admin can recover deleted workspaces, but not individual artifacts. You can raise a support ticket with Microsoft Support.

How can I get an alert when a premium capacity hits a performance metric (i.e. when the CPU of a premium capacity hits 100% or a report takes more than a minute to execute)? When a Power BI Premium capacity is experiencing extended periods of high resource use that potentially impact reliability, a notification email is automatically sent. Examples of such impacts include extended delays in operations such as opening a report, dataset refresh, and query executions. More info at https://docs.microsoft.com/en-us/power-bi/admin/service-interruption-notifications#capacity-and-reliability-notifications. Coming in July is a public preview of Azure Monitor integration that will allow customers to connect their Power BI environment to pre-configured Azure Log Analytics workspaces. This provides long-term data storage, retention policies, ad-hoc query capability, and the ability to analyze the log data directly from Power BI (see Azure Monitor integration).

Users can cause performance issues by overloading the Power BI service with jobs, writing overly complex queries, creating circular references, and so on, which can consume all of the resources available on the capacity. You need the ability to mitigate significant issues when they occur. The quickest way to mitigate these issues is to restart the capacity. More details at https://docs.microsoft.com/en-us/power-bi/admin/service-admin-premium-restart.

Is it possible to change the region selection of a premium capacity after it is created? Create a second capacity and move workspaces.
Free users won't experience any downtime as long as the tenant has spare v-cores. If creating a second capacity isn't an option, you can temporarily move the content back to shared capacity from Premium. You don't need extra v-cores, but free users will experience some downtime. Then create the premium capacity in the new region and move the workspace from shared to that premium capacity.

How can I see what permissions have been given to an app? Click Update app and go to the Permissions tab to see the permissions previously given to the app. There is not a way to see the permissions programmatically (you can utilize the activity logs to review when people are added; a sketch of pulling activity events with the admin REST API appears at the end of this post).

Check out the Power Platform: 2020 release wave 1 plan (last updated May 14, 2020). The Power Platform release plan (formerly release notes) for the 2020 release wave 1 describes all new features releasing from April 2020 through September 2020 for Power BI, Power Apps, Power Automate, AI Builder, Power Virtual Agents, and Common Data Model and Data Integration. You can browse the release plan online, download it as a PDF file, or view it via the Power BI Release Wave.

I have many reports with the share permissions set in the reports; various users get to see various sub-sets of these reports and a small number of senior managers get to see all reports. That all works very well. If I set up a Power BI app for my reports and share that with all the users, will the share permissions set in the reports be respected by the app, or will the fact that I shared the app with all users override my report permissions, thus making all the reports available to all users of the app? When you publish an app, the security from the reports is not carried over. By default no users will have access to the app, so you will need to grant access.
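On the activity-log point above: here is a rough sketch of pulling one day of Power BI activity events with the admin REST API from Python, assuming a service principal (or other identity) that is allowed to call the read-only admin APIs in your tenant. The tenant id, client id, secret, and dates are placeholders, and the event field names shown are only the common ones:

import msal
import requests

TENANT = "<tenant-id>"                              # placeholder
app = msal.ConfidentialClientApplication(
    client_id="<client-id>",
    client_credential="<client-secret>",
    authority=f"https://login.microsoftonline.com/{TENANT}")
token = app.acquire_token_for_client(
    scopes=["https://analysis.windows.net/powerbi/api/.default"])["access_token"]

# One UTC day of activity events; note the API expects the datetimes wrapped in quotes.
url = ("https://api.powerbi.com/v1.0/myorg/admin/activityevents"
       "?startDateTime='2020-08-13T00:00:00Z'&endDateTime='2020-08-13T23:59:59Z'")
headers = {"Authorization": f"Bearer {token}"}

# Results are paged via a continuation URI; loop until it comes back empty.
while url:
    page = requests.get(url, headers=headers).json()
    for event in page.get("activityEventEntities", []):
        print(event.get("Activity"), event.get("UserId"), event.get("ItemName"))
    url = page.get("continuationUri")

Filtering the printed events on sharing- and app-related activities is one way to review who was granted access and when, even though app permissions themselves are not exposed programmatically.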
