Microsoft Fabric: A detailed fundamentals guide with examples

Rihab Feki
5 min read · Jan 2, 2025


Microsoft Fabric is an end-to-end analytics and data platform that provides a unified solution covering data ingestion, processing, transformation, real-time event routing, and report building.

High-level overview of the Microsoft Fabric architecture

Data Factory

You can think of Data Factory as an Extract, Transform, and Load (ETL) tool. Its main items are Dataflow and Data Pipeline.

Synapse Data Warehouse

Provides a familiar transactional data warehouse solution with tables, schemas, views, stored procedures, etc., queryable using Transact-SQL (T-SQL).
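
Since the warehouse exposes a SQL connection string, any T-SQL client can query it. Below is a minimal sketch from Python using pyodbc, with placeholder server, database, and table names and an interactive Microsoft Entra sign-in:

    import pyodbc

    # Hypothetical SQL endpoint, copied from the warehouse settings in Fabric.
    conn_str = (
        "Driver={ODBC Driver 18 for SQL Server};"
        "Server=your-endpoint.datawarehouse.fabric.microsoft.com;"  # placeholder
        "Database=SalesWarehouse;"                                  # placeholder
        "Authentication=ActiveDirectoryInteractive;"
    )

    with pyodbc.connect(conn_str) as conn:
        cursor = conn.cursor()
        cursor.execute("SELECT TOP 5 * FROM dbo.FactSales")  # hypothetical table
        for row in cursor.fetchall():
            print(row)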

Synapse Data Engineering

Enables users to design, build, and maintain the infrastructure and systems that allow their organizations to collect, store, process, and analyze large volumes of data. It is similar in spirit to Databricks or Snowflake. It consists of the following Fabric items: Lakehouse, Notebook, and Spark Job Definition.
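
To make this concrete, here is a minimal sketch of lakehouse work in a Fabric notebook. It assumes a default lakehouse is attached (so the pre-created spark session and the Files/ area are available) and uses a hypothetical CSV file:

    # Read a raw CSV from the lakehouse Files area (hypothetical path).
    df = spark.read.option("header", True).csv("Files/raw/sales.csv")

    # Persist it as a Delta table in the lakehouse Tables area.
    (df.write
       .format("delta")
       .mode("overwrite")
       .saveAsTable("sales_bronze"))  # hypothetical table name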

Synapse Data Science

Supports an organization's entire data science workflow, from data exploration, preparation, and cleansing to experimentation, modeling, model scoring, and serving of predictive insights to BI reports.
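
The Data Science experience ships with built-in MLflow experiment tracking. A minimal sketch of training and logging a model from a Fabric notebook (the experiment name, dataset, and metric are illustrative):

    import mlflow
    import mlflow.sklearn
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

    mlflow.set_experiment("iris-demo")  # hypothetical experiment name
    with mlflow.start_run():
        model = LogisticRegression(max_iter=200).fit(X_train, y_train)
        mlflow.log_metric("accuracy", model.score(X_test, y_test))
        mlflow.sklearn.log_model(model, "model")  # logged to the tracked run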

Synapse Real-time Analytics

Provides a set of tools to ingest, manage, and analyze real-time event data.
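
Event data landed in a KQL database can also be queried from code. A hedged sketch using the azure-kusto-data Python package, assuming a placeholder query URI (copied from the KQL database details page) and a hypothetical Events table:

    from azure.kusto.data import KustoClient, KustoConnectionStringBuilder

    # Placeholder query URI; authenticates via a prior `az login`.
    kcsb = KustoConnectionStringBuilder.with_az_cli_authentication(
        "https://your-eventhouse.kusto.fabric.microsoft.com"
    )
    client = KustoClient(kcsb)

    # Count events per hour over the last day (hypothetical table/column names).
    query = "Events | where Timestamp > ago(1d) | summarize count() by bin(Timestamp, 1h)"
    response = client.execute("MyEventDb", query)  # placeholder database name
    for row in response.primary_results[0]:
        print(row)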

Power BI

Is Microsoft’s business intelligence solution that allows users to create reports that present visual insights to business users.

Data Activator

Automatically takes actions (such as running a Power Automate flow) when patterns or conditions are detected in changing data, such as data in Power BI.

Microsoft Fabric Environment

Microsoft Fabric Environment structure
  • Usually there is one tenant per organisation, and within the tenant you provision one or more capacities.

Capacity

  • A capacity is a distinct pool of resources allocated to Microsoft Fabric (pay-as-you-go model)
  • How many capacities do you need?

Workspaces

  • In a capacity you can have more than one workspace.
  • In a workspace you can create Fabric items to collaborate on within a team.
  • It is the main way to give access to people/groups in Fabric.

How many Workspaces do you need?

Two example workspace structures (Option 1 and Option 2) are shown in the diagrams.

To keep the number of workspaces under control:

  • Align your workspaces to personas, e.g. the data engineering team
  • Naming conventions for Fabric items, e.g. dev, test, production
  • Use item-level sharing where applicable

Workspace best practices:

  • Giving access to Fabric items to groups, not individuals
  • Understanding workspace roles, e.g. admin, member, contributor, viewer. More details in the official documentation.
Workspace roles per Fabric item

Getting Data into Fabric

There are many ways to get data into Fabric, e.g. shortcuts, mirroring, pipelines, dataflows, and more. It is important to make the right architectural decision by understanding the differences between these methods, which I will focus on next.

The following diagram presents a high-level overview of the possible ways to get data into Fabric.

Data Ingestion Principles

  • Access to out-of-the-box connectors (data pipelines & dataflows)
  • Access to Python client libraries for many SaaS products (via Fabric Notebook), which allows for high configurability.
  • You need to handle the incremental update logic yourself (see the sketch after this list)
  • Not real-time
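
One way to handle that incremental logic is a watermark pattern: remember the highest timestamp already loaded and only append newer rows. A minimal sketch in a notebook, assuming a default lakehouse with an existing, non-empty orders Delta table and a hypothetical staging_orders source:

    from pyspark.sql import functions as F

    # 1. Find the highest timestamp already loaded (the watermark).
    watermark = (spark.table("orders")
                      .agg(F.max("modified_at"))
                      .collect()[0][0])

    # 2. Keep only rows that arrived after the watermark (hypothetical source).
    new_rows = (spark.table("staging_orders")
                     .filter(F.col("modified_at") > F.lit(watermark)))

    # 3. Append just the new rows to the target table.
    new_rows.write.format("delta").mode("append").saveAsTable("orders")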

Dataflow Gen2

  • Benefit from 300+ external connectors
  • No/low-code solution that can do Extract, Transform, Load (ETL)
  • Dataflow + data pipeline can enable the ELT pattern: get data from a source and write it somewhere (data pipeline or dataflow), then read it, transform it, and write it to a ‘transformed’ location (dataflow).
  • Dataflows can be included in data pipelines, which helps with orchestration, logging, and error handling.
  • Difficult to implement data validation

Data Pipeline

Primarily an orchestration tool (do this, then do that).

Source: https://learn.microsoft.com/en-us/fabric/data-factory/tutorial-end-to-end-introduction
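
Pipelines are built in the Fabric UI rather than in code, but a run can also be triggered programmatically through the Fabric REST API's on-demand job endpoint. A hedged sketch (the endpoint shape and IDs are assumptions to verify against the Fabric REST API docs, and token acquisition is simplified to a placeholder):

    import requests

    # Placeholder IDs and a pre-acquired Microsoft Entra bearer token.
    workspace_id = "<workspace-guid>"
    pipeline_id = "<pipeline-item-guid>"
    token = "<bearer-token>"

    # Assumed on-demand job endpoint for running a pipeline.
    url = (f"https://api.fabric.microsoft.com/v1/workspaces/{workspace_id}"
           f"/items/{pipeline_id}/jobs/instances?jobType=Pipeline")

    resp = requests.post(url, headers={"Authorization": f"Bearer {token}"})
    resp.raise_for_status()
    print("Pipeline run accepted:", resp.status_code)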

Fabric Notebooks

  • General-purpose coding notebook which can be used to bring data into Fabric, by connecting to APIs or by using Python libraries (see the sketch below).
  • Good for data validation & data quality testing of incoming data
  • Ingesting large datasets with Spark
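
A minimal sketch of API ingestion with a simple validation step, assuming a Fabric notebook with a default lakehouse attached and a hypothetical REST endpoint:

    import requests
    import pandas as pd

    # Pull records from a hypothetical SaaS API.
    resp = requests.get("https://api.example.com/v1/customers", timeout=30)
    resp.raise_for_status()
    pdf = pd.DataFrame(resp.json())

    # Simple data-quality gate before landing the data.
    if pdf["id"].isna().any():
        raise ValueError("found customer rows without an id")

    # Convert to Spark and persist as a Delta table in the lakehouse.
    (spark.createDataFrame(pdf)
          .write.format("delta")
          .mode("overwrite")
          .saveAsTable("customers_raw"))  # hypothetical table name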

OneLake Shortcuts

A shortcut enables you to create a live link to data stored either in another part of Fabric (internal shortcut) or in external storage locations such as Azure Data Lake Storage (ADLS) or Amazon S3.
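
Once created, a shortcut behaves like any other folder in the lakehouse. For example, assuming a hypothetical shortcut named s3_landing under Files that points at an S3 bucket of Parquet files, a notebook can read it directly:

    # The shortcut path is read exactly like native lakehouse storage.
    df = spark.read.parquet("Files/s3_landing/events/")
    df.show(5)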

This brings us to the end of the article. At this point, you should have a solid understanding of the Microsoft Fabric fundamentals.

See you in the next article about Azure for Data Engineering.
