Microsoft Fabric: A detailed fundamentals guide with examples
Microsoft Fabric is an end-to-end analytics and data platform providing a unified solution. It encompasses data ingestion, processing, transformation, real-time event routing, and report building.
Data Factory
You can think of Data Factory as an Extract, Transform, and Load (ETL) tool. Its main items are Dataflow and Data Pipeline.
Synapse Data Warehouse
Provides a familiar transactional data warehouse solution with tables, schemas, views, stored procedures, etc., queryable using Transact-SQL (T-SQL).
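To make this concrete, here is a minimal sketch of querying a warehouse from Python over its SQL (TDS) endpoint with pyodbc. The server address, database, and table below are placeholders for illustration; copy the real connection string from your warehouse settings in Fabric.

```python
import pyodbc

# Connection details are hypothetical -- take yours from the warehouse settings.
conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=your-endpoint.datawarehouse.fabric.microsoft.com;"  # placeholder endpoint
    "Database=SalesWarehouse;"                                   # placeholder warehouse name
    "Authentication=ActiveDirectoryInteractive;"                 # sign in with Microsoft Entra ID
)

cursor = conn.cursor()
# Plain T-SQL works against warehouse tables, views, and stored procedures.
cursor.execute("SELECT TOP 10 OrderId, Amount FROM dbo.Orders ORDER BY Amount DESC")
for row in cursor.fetchall():
    print(row.OrderId, row.Amount)
conn.close()
```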
Synapse Data Engineering
Enables users to design, build, and maintain infrastructure and systems that allow their organizations to collect, store, process, and analyze large volumes of data. It is similar to Databricks or Snowflake. It consists of the following Fabric items: Lakehouse, Notebook, and Spark Job Definition.
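As a small illustration of how these items fit together, here is a hedged sketch of a notebook cell reading a raw file from a Lakehouse and saving it as a Delta table. It assumes the notebook has a default Lakehouse attached; the file path and table name are hypothetical.

```python
# `spark` is pre-created in Fabric notebooks; no session setup is needed.
df = (
    spark.read
    .option("header", "true")
    .csv("Files/raw/orders.csv")   # hypothetical file under the Lakehouse Files section
)

(
    df.write
    .format("delta")               # Lakehouse tables are stored in the Delta format
    .mode("overwrite")
    .saveAsTable("orders")         # appears under Tables/ in the Lakehouse
)
```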
Synapse Data Science
Supports an organization's entire data science workflow, from data exploration, preparation, and cleansing to experimentation, modeling, model scoring, and serving of predictive insights to BI reports.
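Fabric ships with a built-in MLflow tracking experience, so a typical experiment loop looks like standard MLflow code. The sketch below is a minimal, hypothetical example (toy data, made-up experiment name), not a full workflow.

```python
import mlflow
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, random_state=42)  # toy data

mlflow.set_experiment("churn-baseline")        # hypothetical experiment name
with mlflow.start_run():
    model = LogisticRegression(max_iter=200).fit(X, y)
    mlflow.log_metric("train_accuracy", model.score(X, y))
    mlflow.sklearn.log_model(model, "model")   # stored as a run artifact
```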
Synapse Real-time Analytics
Provides a set of tools to ingest, manage, and analyze real-time event data.
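Event data in Fabric lands in a KQL database, queried with the Kusto Query Language (KQL). As a hedged sketch, this is roughly what querying one from Python with the azure-kusto-data client library looks like; the query URI, database, table, and column names are all placeholders.

```python
from azure.kusto.data import KustoClient, KustoConnectionStringBuilder

kcsb = KustoConnectionStringBuilder.with_interactive_login(
    "https://your-eventhouse.kusto.fabric.microsoft.com"  # hypothetical query URI
)
client = KustoClient(kcsb)

# KQL: count recent events per device (table and column names are made up).
query = "DeviceTelemetry | where Timestamp > ago(5m) | summarize count() by DeviceId"
response = client.execute("TelemetryDB", query)           # hypothetical database name

for row in response.primary_results[0]:
    print(row["DeviceId"], row["count_"])                 # summarize count() yields `count_`
```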
Power BI
Is Microsoft’s business intelligence solution that allows users to create reports to present visual insights to business users.
Data Activator
Automatically takes actions (like running a Power Automate flow) when patterns or conditions are detected in changing data, such as data in Power BI.
Microsoft Fabric Environment
- Usually there is one tenant per organisation, and within the tenant you provision one or more capacities.
Capacity
- A capacity is a distinct pool of resources allocated to Microsoft Fabric (pay-as-you-go model)
- How many capacities do you need?
Workspaces
- In a capacity you can have more than one workspace.
- In a workspace you can create Fabric items to collaborate on within a team.
- It is the main way to give access to people/groups in Fabric.
How many Workspaces do you need?
To control the number of workspaces in an ideal way:
- Align your workspaces to personas, e.g. the data engineering team
- Use naming conventions for Fabric items, e.g. dev, test, production
- Use item-level sharing where applicable
Workspaces best practices:
- Give access to Fabric items to groups, not to individuals
- Understand workspace roles, e.g. admin, member, contributor, viewer. More details are in the official documentation.
Getting Data into Fabric
There are many ways to get data into Fabric, e.g. shortcuts, mirroring, pipelines, dataflows, and more. It is important to make the right architectural decision by understanding the differences between these methods, which I will focus on next.
The following diagram presents a high-level overview of the possible ways to get data into Fabric.
Data Ingestion Principles
- Access to out-of-the-box connectors (data pipelines & dataflows)
- Access to Python client libraries for many SaaS products (via Fabric notebooks), which allows high configurability
- Need to handle incremental update logic yourself (see the sketch after this list)
- Not real-time
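To make the incremental-update point concrete, here is a minimal sketch of watermark-based ingestion from a hypothetical SaaS REST API. The endpoint, query parameter, field names, and the file-based watermark store are all assumptions for illustration; adapt the paths to wherever your notebook persists state.

```python
import json
from pathlib import Path

import requests

WATERMARK_FILE = Path("state/orders_watermark.json")  # hypothetical state location

def load_watermark() -> str:
    if WATERMARK_FILE.exists():
        return json.loads(WATERMARK_FILE.read_text())["last_modified"]
    return "1970-01-01T00:00:00Z"  # first run: full load

def save_watermark(value: str) -> None:
    WATERMARK_FILE.parent.mkdir(parents=True, exist_ok=True)
    WATERMARK_FILE.write_text(json.dumps({"last_modified": value}))

watermark = load_watermark()
resp = requests.get(
    "https://api.example.com/v1/orders",      # hypothetical SaaS endpoint
    params={"modified_since": watermark},     # hypothetical filter parameter
    timeout=30,
)
resp.raise_for_status()
records = resp.json()["items"]

if records:
    out = Path("raw/orders_increment.json")   # hypothetical landing file
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(records))
    # Advance the watermark only after the increment has landed.
    save_watermark(max(r["modified_at"] for r in records))
```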
Dataflow Gen 2
- Benefit from 300+ external connectors
- No-code/low-code solution that can do Extract, Transform, Load (ETL)
- Dataflow + data pipeline can enable the ELT pattern: get data from a source and write it somewhere (data pipeline or dataflow), then read it, transform it, and write it to a ‘transformed’ location (dataflow).
- Dataflows can be included in data pipelines, which helps with orchestration, logging, and error handling.
- Difficult to implement data validation
Data Pipeline
Primarily an orchestration tool (do this, then do that); it coordinates other Fabric items such as dataflows and notebooks, and a run can also be triggered programmatically, as sketched below.
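As a hedged sketch, assuming you already have a Microsoft Entra access token and the workspace/pipeline IDs, this is roughly what triggering a run on demand through the Fabric REST API looks like; check the official REST documentation for the current contract.

```python
import requests

WORKSPACE_ID = "00000000-0000-0000-0000-000000000000"  # placeholder
PIPELINE_ID = "11111111-1111-1111-1111-111111111111"   # placeholder
TOKEN = "<Microsoft-Entra-access-token>"               # e.g. acquired via azure-identity

# "Run on demand item job" endpoint for a pipeline item.
url = (
    f"https://api.fabric.microsoft.com/v1/workspaces/{WORKSPACE_ID}"
    f"/items/{PIPELINE_ID}/jobs/instances?jobType=Pipeline"
)
resp = requests.post(url, headers={"Authorization": f"Bearer {TOKEN}"}, timeout=30)
resp.raise_for_status()
print(resp.status_code)  # 202 means the run was accepted and queued
```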
Fabric Notebooks
- General-purpose coding notebook that can be used to bring data into Fabric, by connecting to APIs or by using Python libraries.
- Good for data validation & data quality testing of incoming data (see the sketch after this list)
- Ingesting large datasets with Spark
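Here is a minimal sketch of the validation idea: check incoming data in a notebook and fail fast before it reaches a Lakehouse table. Column names, thresholds, and paths are hypothetical; `spark` is pre-created in Fabric notebooks.

```python
from pyspark.sql import functions as F

df = spark.read.option("header", "true").csv("Files/raw/orders.csv")  # hypothetical path

total = df.count()
null_keys = df.filter(F.col("OrderId").isNull()).count()
duplicates = total - df.dropDuplicates(["OrderId"]).count()

# Fail fast instead of silently loading bad data downstream.
assert null_keys == 0, f"{null_keys} rows have a null OrderId"
assert duplicates / max(total, 1) < 0.01, f"too many duplicate keys: {duplicates}"

df.write.format("delta").mode("append").saveAsTable("orders_validated")
```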
OneLake Shortcuts
A shortcut enables you to create a live link to data stored either in another part of Fabric (internal shortcut) or in external storage locations such as Azure Data Lake Storage (ADLS) or Amazon S3.
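Once created, a shortcut behaves like any other folder in the Lakehouse, so reading it needs no special code. A tiny hedged sketch, assuming a shortcut named s3_landing under the Files section (the name and path are made up):

```python
# No data is copied: Spark reads through the live link to the external store.
df = spark.read.parquet("Files/s3_landing/events/2024/")  # hypothetical shortcut path
df.show(5)
```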
That brings us to the end of this article. At this point you should have a solid understanding of Microsoft Fabric fundamentals.
See you in the next article, about Azure for Data Engineering.