
Demonstration: How to Integrate AI in Microsoft Fabric

Costa Rica

Last updated: 2025-07-16

Fabric's OneLake data store provides a unified storage solution that supports different data formats and sources. This simplifies data access and management and enables efficient data preparation and model training.

Table of Contents

- Overview
- Demo

Overview

Microsoft Fabric is a comprehensive data analytics platform that brings together various data services to provide an end-to-end solution for data engineering, data science, data warehousing, real-time analytics, and business intelligence. It's designed to simplify the process of working with data and to enable organizations to gain insights more efficiently.

Capabilities Enabled by LLMs:

Document Summarization: LLMs can process and summarize large documents, making it easier to extract key information.

Question Answering: Users can perform Q&A tasks on PDF documents, allowing for interactive data exploration.

Embedding Generation: LLMs can generate embeddings for document chunks, which can be stored in a vector store for efficient search and retrieval.

Demo

Tools in practice:

| Tool | Description | Use case |
| --- | --- | --- |
| LangChain | LangChain is a framework for developing applications powered by language models. It can be used with Azure OpenAI to build applications that require natural language understanding and generation. | Creating complex applications that involve multiple steps or stages of processing, such as preprocessing text data, applying a language model, and postprocessing the results. |
| SynapseML | SynapseML is an open-source library that simplifies the creation of massively scalable machine learning pipelines. It integrates with Azure OpenAI to provide distributed computing capabilities, allowing you to apply large language models at scale. | Applying powerful language models to massive amounts of data, enabling scenarios like batch processing of text data or large-scale text analytics. |


Set Up Your Environment

Register the Resource Provider: Ensure that the microsoft.fabric resource provider is registered in your subscription.
Create a Microsoft Fabric Resource:
- Navigate to the Azure Portal.
- Create a new resource of type Microsoft Fabric.
- Choose the appropriate subscription, resource group, capacity name, region, size, and administrator.
Enable Fabric Capacity in Power BI:
- Go to the Power BI workspace.
- Select the Fabric capacity license and the Fabric resource created in Azure.
Pause Fabric Compute When Not in Use: To save costs, remember to pause the Fabric compute in Azure when you're not using it.
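Pausing can also be scripted instead of done by hand in the portal. The following is a minimal sketch that only builds the request URL. It assumes the Azure Resource Manager `suspend`/`resume` actions on `Microsoft.Fabric/capacities` and the api-version `2023-11-01`; verify both against the current REST reference. The helper name `capacity_action_url` and the sample IDs are hypothetical.

```python
from urllib.parse import quote

# Assumption: ARM api-version for Microsoft.Fabric capacities; confirm it in the REST reference.
API_VERSION = "2023-11-01"


def capacity_action_url(subscription_id: str, resource_group: str,
                        capacity_name: str, action: str) -> str:
    """Build the ARM URL that suspends (pauses) or resumes a Fabric capacity (hypothetical helper)."""
    if action not in ("suspend", "resume"):
        raise ValueError("action must be 'suspend' or 'resume'")
    # Percent-encode each path segment so unusual characters cannot break the URL
    parts = [quote(p, safe="") for p in (subscription_id, resource_group, capacity_name)]
    return (
        "https://management.azure.com/subscriptions/{}/resourceGroups/{}"
        "/providers/Microsoft.Fabric/capacities/{}/{}?api-version={}"
    ).format(*parts, action, API_VERSION)


# To actually pause the capacity, send an authenticated POST to this URL, for example with
# azure-identity and requests (not executed here):
#   token = DefaultAzureCredential().get_token("https://management.azure.com/.default").token
#   requests.post(url, headers={"Authorization": f"Bearer {token}"})
print(capacity_action_url("sub-123", "rg-demo", "fabriccap", "suspend"))
```

Building the URL separately from sending it keeps the part that is easy to get wrong (the resource path) testable without Azure credentials.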

Install Required Libraries

Access Microsoft Fabric:
- Open your web browser and navigate to the Microsoft Fabric portal.
- Sign in with your Azure credentials.
Select Your Workspace: From the Microsoft Fabric home page, select the workspace where you want to configure SynapseML.
Create a New Cluster:
- Within the Data Science component, you should find options to create a new cluster.
- Follow the prompts to configure and create your cluster, specifying the details such as cluster name, region, node size, and node count.
Install SynapseML on Your Cluster: Configure your cluster to include the SynapseML package, then check that it is available and which version is installed:
```
%pip show synapseml
```

Install LangChain and Other Dependencies:

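You can use %pip install to install the necessary packages (the examples below import langchain, langchain_community, and langchain_openai):

%pip install openai langchain langchain_community langchain-openai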

Alternatively, use a Fabric environment configuration and upload a .yml file that lists your dependencies, for example:

dependencies:
  - pip:
      - synapseml==1.0.8
      - langchain==0.3.4
      - langchain_community==0.3.4
      - openai==1.53.0
      - langchain-openai==0.2.4
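After the environment is published, you can check from a notebook that the pinned versions are the ones actually installed. This is a minimal sketch using only the standard library's importlib.metadata; the PINNED dictionary copies the versions from the example above (using the PyPI distribution name langchain-openai), and the function name version_mismatches is hypothetical.

```python
from importlib import metadata
from typing import Dict, Optional

# Versions pinned in the environment file above (PyPI distribution names)
PINNED: Dict[str, str] = {
    "synapseml": "1.0.8",
    "langchain": "0.3.4",
    "langchain-community": "0.3.4",
    "openai": "1.53.0",
    "langchain-openai": "0.2.4",
}


def version_mismatches(pinned: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Return {package: installed_version} for every package that differs from its pin.

    A value of None means the package is not installed at all.
    """
    mismatches: Dict[str, Optional[str]] = {}
    for name, wanted in pinned.items():
        try:
            installed: Optional[str] = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = installed
    return mismatches


# Prints {} when every installed version matches its pin
print(version_mismatches(PINNED))
```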

Configure Azure OpenAI Service

Note: Refer to the full notebook for the complete code.

Set Up API Keys: Make sure you have the API key and endpoint URL for your deployed model, and set them as environment variables:

import os

# Set the API version for the Azure OpenAI service
os.environ["OPENAI_API_VERSION"] = "2023-08-01-preview"

# Set the base URL for the Azure OpenAI service
os.environ["AZURE_OPENAI_ENDPOINT"] = "https://your-resource-name.openai.azure.com"

# Set the API key for Azure OpenAI
os.environ["AZURE_OPENAI_API_KEY"] = "your-api-key"
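Because the snippet above hardcodes placeholder values, it helps to fail early if one of them was not replaced. This is a minimal sketch, not part of any SDK: unset_settings is a hypothetical helper, and treating any value containing "your-" as an unreplaced placeholder is an assumption based on the placeholders used above.

```python
import os
from typing import List, Mapping

REQUIRED_SETTINGS = ("OPENAI_API_VERSION", "AZURE_OPENAI_ENDPOINT", "AZURE_OPENAI_API_KEY")


def unset_settings(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return required settings that are empty or still contain a 'your-' placeholder."""
    return [
        name for name in REQUIRED_SETTINGS
        if not env.get(name) or "your-" in env[name]
    ]


problems = unset_settings()
if problems:
    print("Replace the placeholder values for:", ", ".join(problems))
```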

Initialize the Azure OpenAI Chat Model: Create an instance of the AzureChatOpenAI class using the environment variables set above.

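from langchain_openai import AzureChatOpenAI

# Read the endpoint set above (it is reused later by the SynapseML transformer)
api_base = os.environ["AZURE_OPENAI_ENDPOINT"]

# Create an instance of the Azure OpenAI chat model.
# The API version is read from the OPENAI_API_VERSION environment variable.
llm = AzureChatOpenAI(
   azure_endpoint=api_base,
   azure_deployment="your-deployment-name",  # name of your model deployment
   openai_api_key=os.environ["AZURE_OPENAI_API_KEY"],
   temperature=0.7,
   top_p=0.9,
   verbose=True,
)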

Call the Deployed Model: Use the model to generate text or perform other language model tasks. Here's an example of generating a response based on a prompt:

# Define a prompt
messages = [
   (
       "system",
       "You are a helpful assistant that translates English to French. Translate the user sentence.",
   ),
   ("human", "Hi, how are you?"),
]

# Generate a response from the Azure OpenAI service using the invoke method
ai_msg = llm.invoke(messages)

# Print the text of the response (ai_msg is an AIMessage object)
print(ai_msg.content)

Make sure to replace "your-api-key", "https://your-resource-name.openai.azure.com", and the deployment name with the actual API key, endpoint, and deployment name from your Azure OpenAI instance. This example demonstrates how to configure and use an existing Azure OpenAI instance in Microsoft Fabric.
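The (role, content) tuples passed to llm.invoke are LangChain shorthand: "system" becomes a system message and "human" becomes a user message in the chat request. The helper below is an illustration of that mapping, not LangChain's own code; to_openai_messages and ROLE_MAP are hypothetical names.

```python
from typing import Dict, List, Tuple

# LangChain role shorthands and the chat-API roles they correspond to
ROLE_MAP = {"system": "system", "human": "user", "user": "user", "ai": "assistant", "assistant": "assistant"}


def to_openai_messages(messages: List[Tuple[str, str]]) -> List[Dict[str, str]]:
    """Convert (role, content) tuples into role/content dicts as used by the chat API."""
    return [{"role": ROLE_MAP[role], "content": content} for role, content in messages]


messages = [
    ("system", "You are a helpful assistant that translates English to French. Translate the user sentence."),
    ("human", "Hi, how are you?"),
]
print(to_openai_messages(messages))
```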

Basic Usage of LangChain Transformer

Note: For example, you can automate generating definitions for technology terms with a language model. The LangChain Transformer in SynapseML applies a LangChain chain to every row of a Spark DataFrame: you define a prompt template for what you want to create, link the template to a language model, and then process your data to produce the output. This automates tasks such as defining technology terms or generating other text-based content and makes it easier to integrate AI capabilities into your data workflows.

Prompt Creation: Start by defining a template for the kind of text you want to generate or analyze. For example, you might create a prompt that asks the model to define a specific technology term.

Chain Setup: Then set up a chain that links this prompt to a language model. This chain is responsible for sending the prompt to the model and receiving the generated response.

Transformer Configuration: The LangChain Transformer is configured to use this chain. It specifies how the input data (like a list of technology names) should be processed and what kind of output (like definitions) should be produced.

Data Processing: Finally, apply this setup to a dataset, for example a DataFrame of technology names; the transformer uses the language model to generate a definition for each technology.
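The four stages above can be sketched in plain Python to show how data flows through them. This is a conceptual sketch only, not the SynapseML implementation: fake_model is a hypothetical stand-in for the language model so the sketch runs offline, and plain lists of dictionaries stand in for the Spark DataFrame.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]

# Prompt creation: a template with one placeholder (same wording as the PromptTemplate below)
TEMPLATE = "Define the following word: {technology}"


def make_chain(model: Callable[[str], str]) -> Callable[[str], str]:
    """Chain setup: fill the template, send the prompt to the model, return its answer."""
    def chain(technology: str) -> str:
        return model(TEMPLATE.format(technology=technology))
    return chain


def transform(rows: List[Row], chain: Callable[[str], str],
              input_col: str, output_col: str) -> List[Row]:
    """Transformer + data processing: run the chain on each row's input column."""
    return [{**row, output_col: chain(str(row[input_col]))} for row in rows]


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for the language model."""
    return f"<answer to: {prompt}>"


rows = [{"label": 0, "technology": "docker"}, {"label": 1, "technology": "spark"}]
for row in transform(rows, make_chain(fake_model), "technology", "definition"):
    print(row)
```

The real transformer does the same per-row work, but distributed across the Spark cluster and with the actual model call.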

Create a Prompt Template: Define a prompt template for generating definitions.

from langchain.prompts import PromptTemplate

copy_prompt = PromptTemplate(
    input_variables=["technology"],
    template="Define the following word: {technology}",
)

Set Up an LLMChain: Create an LLMChain with the defined prompt template.

from langchain.chains import LLMChain

chain = LLMChain(llm=llm, prompt=copy_prompt)

Configure LangChain Transformer: Set up the LangChain transformer to execute the processing chain.

# Set up the LangChain transformer to execute the processing chain.
# In SynapseML 1.x the module lives under synapse.ml.services (formerly synapse.ml.cognitive).
from synapse.ml.services.langchain import LangchainTransformer

openai_api_key = os.environ["AZURE_OPENAI_API_KEY"]

transformer = (
   LangchainTransformer()
   .setInputCol("technology")
   .setOutputCol("definition")
   .setChain(chain)
   .setSubscriptionKey(openai_api_key)
   .setUrl(api_base)
)

Create a Test DataFrame: Construct a DataFrame with technology names and apply the transformer to it.

from pyspark.sql import SparkSession

# Get the Spark session (Fabric notebooks already provide one as `spark`)
spark = SparkSession.builder.appName("example").getOrCreate()

# Construct a DataFrame with technology names
df = spark.createDataFrame(
   [
       (0, "docker"), (1, "spark"), (2, "python")
   ],
   ["label", "technology"]
)

# Run the chain on every row: reads the "technology" column, writes the "definition" column
transformed_df = transformer.transform(df)

# Show the generated definitions without truncating long text
transformed_df.show(truncate=False)

Using LangChain for Large Scale Literature Review

Note: For example, you can automate the extraction and summarization of academic papers. The script below uses LangChain to extract content from an online PDF and generate a prompt based on that content, for use by an agent. In programming and artificial intelligence, an agent is a software entity that performs tasks autonomously: it can interact with its environment, make decisions, and execute actions based on predefined rules or learned behavior.
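To make the agent idea concrete, here is a minimal rule-based agent loop in plain Python: it observes its state, decides on the next action with predefined rules, executes the action, and stops when the rules say the task is done. Everything here is hypothetical and illustrative; fetch_text and summarize are stand-ins for real tools such as a PDF loader or a model call, and LangChain agents choose actions with a language model rather than fixed rules.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, str]


# Hypothetical tools: stand-ins for real actions such as loading a PDF or calling a model
def fetch_text(state: State) -> None:
    state["text"] = f"content of {state['link']}"


def summarize(state: State) -> None:
    state["summary"] = state["text"].upper()


TOOLS: Dict[str, Callable[[State], None]] = {"fetch_text": fetch_text, "summarize": summarize}


def decide(state: State) -> str:
    """Predefined rules: choose the next action from what the agent knows so far."""
    if "text" not in state:
        return "fetch_text"
    if "summary" not in state:
        return "summarize"
    return "stop"


def run_agent(link: str, max_steps: int = 5) -> Tuple[List[str], State]:
    """Observe -> decide -> act loop that ends when the rules return 'stop'."""
    state: State = {"link": link}
    actions: List[str] = []
    for _ in range(max_steps):  # cap the number of steps so the loop always terminates
        action = decide(state)
        if action == "stop":
            break
        TOOLS[action](state)  # acting changes the agent's environment (its state)
        actions.append(action)
    return actions, state


print(run_agent("paper.pdf"))
```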

Define Functions for Content Extraction and Prompt Generation: Extract content from PDFs linked in arXiv papers and generate prompts for extracting specific information.

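from langchain_community.document_loaders import OnlinePDFLoader

def paper_content_extraction(inputs: dict) -> dict:
    # Download the PDF behind the arXiv link and split it into chunks
    arxiv_link = inputs["arxiv_link"]
    loader = OnlinePDFLoader(arxiv_link)
    pages = loader.load_and_split()
    # Keep the first two chunks (the start of the paper); also works if there is only one
    return {"paper_content": "".join(page.page_content for page in pages[:2])}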

def prompt_generation(inputs: dict) -> dict:
    # Build a follow-up prompt from the summary produced by the previous step
    output = inputs["Output"]
    prompt = (
        "Find the paper title, authors, and summary in the paper description below, and output them. "
        "After that, use web search to find 3 recent papers by the first author in the author section below "
        "(the first author is the first name before the first comma) and list the paper titles in bullet points: "
        "<Paper Description Start>\n" + output + "\n<Paper Description End>."
    )
    return {"prompt": prompt}

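Create a Sequential Chain for Information Extraction: Set up a chain that extracts the title, authors, and summary of a paper from its arXiv link.

from langchain.chains import LLMChain, SimpleSequentialChain, TransformChain
from langchain.prompts import PromptTemplate

# Step 1: download the PDF and return the text of its first chunks
paper_content_extraction_chain = TransformChain(
    input_variables=["arxiv_link"],
    output_variables=["paper_content"],
    transform=paper_content_extraction,
)

# Step 2: summarize the paper; the template must contain the {paper_content} variable
paper_summarizer_template = """
You are a paper summarizer. Given the paper content, summarize the paper into a short summary,
and extract the authors and paper title from the paper content. Here is the paper content:
{paper_content}
"""
summarize_chain = LLMChain(llm=llm, prompt=PromptTemplate.from_template(paper_summarizer_template))

# Step 3: turn the summary (received under the "Output" key) into a follow-up prompt
prompt_generation_chain = TransformChain(
    input_variables=["Output"], output_variables=["prompt"], transform=prompt_generation
)

chain = SimpleSequentialChain(  # each step's single output feeds the next step's single input
    chains=[paper_content_extraction_chain, summarize_chain, prompt_generation_chain]
)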

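SimpleSequentialChain (imported above) composes steps that each take one input and return one output. The plain-Python sketch below illustrates that contract with functools.reduce; run_sequentially and the three step functions are hypothetical offline stand-ins, not LangChain code.

```python
from functools import reduce
from typing import Callable, Sequence


def run_sequentially(steps: Sequence[Callable[[str], str]], first_input: str) -> str:
    """Pass each step's single output as the next step's single input."""
    return reduce(lambda value, step: step(value), steps, first_input)


# Hypothetical stand-ins for the extraction, summarization, and prompt-generation steps
def extract(link: str) -> str:
    return f"text of {link}"


def summarize(text: str) -> str:
    return f"summary({text})"


def make_prompt(summary: str) -> str:
    return f"<Paper Description Start>\n{summary}\n<Paper Description End>."


print(run_sequentially([extract, summarize, make_prompt], "paper-link"))
```

Because only one value travels between steps, any step that needs several inputs or produces several outputs requires a different chain type (LangChain's SequentialChain supports named multi-key inputs and outputs).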
Machine Learning Integration with Microsoft Fabric

Train and Register Machine Learning Models: Use Microsoft Fabric's native integration with the MLflow framework to log trained machine learning models together with their hyperparameters and evaluation metrics.

import mlflow
from mlflow.models import infer_signature
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Generate synthetic regression data
X, y = make_regression(n_features=4, n_informative=2, random_state=0, shuffle=False)

# Model parameters
params = {"n_estimators": 3, "random_state": 42}

# Model tags for MLflow
model_tags = {
   "project_name": "grocery-forecasting",
   "store_dept": "produce",
   "team": "stores-ml",
   "project_quarter": "Q3-2023"
}

# Log MLflow entities
with mlflow.start_run() as run:
   # Train the model
   model = RandomForestRegressor(**params).fit(X, y)

   # Infer the model signature
   signature = infer_signature(X, model.predict(X))

   # Log parameters, a training metric (R^2 on the training data), and the model
   mlflow.log_params(params)
   mlflow.log_metric("training_r2", model.score(X, y))
   mlflow.sklearn.log_model(model, artifact_path="sklearn-model", signature=signature)

   # Register the model with tags
   model_uri = f"runs:/{run.info.run_id}/sklearn-model"
   model_version = mlflow.register_model(model_uri, "RandomForestRegressionModel", tags=model_tags)

   # Output model registration details
   print(f"Model Name: {model_version.name}")
   print(f"Model Version: {model_version.version}")
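If you want more than one evaluation metric per run, you can compute them yourself and log them as a dictionary. The sketch below computes root mean squared error and R² with the standard library only; regression_metrics is a hypothetical helper, and the mlflow.log_metrics call is shown as a comment because it needs an active run.

```python
import math
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """Root mean squared error and coefficient of determination (R^2)."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)           # total sum of squares
    rmse = math.sqrt(ss_res / n)
    r2 = 1.0 - ss_res / ss_tot if ss_tot else float("nan")  # R^2 is undefined for constant targets
    return {"rmse": float(rmse), "r2": float(r2)}


# Inside the `with mlflow.start_run()` block above you could log them with:
#   mlflow.log_metrics(regression_metrics(y, model.predict(X)))
print(regression_metrics([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))
```

Using len() instead of a truthiness check keeps the helper compatible with NumPy arrays such as y and model.predict(X).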

Compare and Filter Machine Learning Models: Use MLflow to search among multiple models saved within the workspace.

from pprint import pprint
from mlflow.tracking import MlflowClient

client = MlflowClient()
for rm in client.search_registered_models():
   pprint(dict(rm), indent=4)

Refresh Date: 2025-07-16
