Costa Rica
Last updated: 2025-07-16
Fabric's OneLake datastore provides a unified data storage solution that supports different data formats and sources. This simplifies data access and management, enabling efficient data preparation and model training.
Microsoft Fabric is a comprehensive data analytics platform that brings together various data services to provide an end-to-end solution for data engineering, data science, data warehousing, real-time analytics, and business intelligence. It's designed to simplify the process of working with data and to enable organizations to gain insights more efficiently.
Capabilities Enabled by LLMs:
- Document Summarization: LLMs can process and summarize large documents, making it easier to extract key information.
- Question Answering: Users can perform Q&A tasks on PDF documents, allowing for interactive data exploration.
- Embedding Generation: LLMs can generate embeddings for document chunks, which can be stored in a vector store for efficient search and retrieval.
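The embedding and retrieval capability can be illustrated with a small standard-library sketch. It is not how Fabric or Azure OpenAI compute embeddings: `toy_embed` is a hypothetical stand-in that uses word counts instead of a real embedding model, and the in-memory list stands in for a vector store. It only shows the shape of the workflow: chunk a document, embed each chunk, then rank chunks by similarity to a question.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, chunk_size: int = 8, overlap: int = 2) -> list[str]:
    """Split text into chunks of `chunk_size` words that overlap by `overlap` words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    return [
        " ".join(words[start:start + chunk_size])
        for start in range(0, max(len(words) - overlap, 1), step)
    ]


def toy_embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, store: list[tuple[str, Counter]], k: int = 1) -> list[str]:
    """Return the k chunks most similar to the query."""
    query_vector = toy_embed(query)
    ranked = sorted(store, key=lambda item: cosine(query_vector, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


document = (
    "OneLake stores data for every Fabric workload. "
    "SynapseML distributes language model calls across a Spark cluster. "
    "LangChain composes prompts and models into chains."
)

# The "vector store" here is just a list of (chunk, vector) pairs.
store = [(chunk, toy_embed(chunk)) for chunk in chunk_text(document)]
top_chunk = retrieve("Which library composes prompts into chains?", store)[0]
print(top_chunk)
```

In a real pipeline the word-count vectors would be replaced by embeddings from a deployed model, and the list by a proper vector index.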
Tools in practice:
| Tool | Description |
|---|---|
| LangChain | LangChain is a framework for developing applications powered by language models. It can be used with Azure OpenAI to build applications that require natural language understanding and generation. Use Case: Creating complex applications that involve multiple steps or stages of processing, such as preprocessing text data, applying a language model, and postprocessing the results. |
| SynapseML | SynapseML is an open-source library that simplifies the creation of massively scalable machine learning pipelines. It integrates with Azure OpenAI to provide distributed computing capabilities, allowing you to apply large language models at scale. Use Case: Applying powerful language models to massive amounts of data, enabling scenarios like batch processing of text data or large-scale text analytics. |
- Register the Resource Provider: Ensure that the `microsoft.fabric` resource provider is registered in your subscription.
- Create a Microsoft Fabric Resource:
- Enable Fabric Capacity in Power BI:
- Pause Fabric Compute When Not in Use: To save costs, remember to pause the Fabric compute in Azure when you're not using it.
- Access Microsoft Fabric:
  - Open your web browser and navigate to the Microsoft Fabric portal.
  - Sign in with your Azure credentials.
- Select Your Workspace: From the Microsoft Fabric home page, select the workspace where you want to configure SynapseML.
- Create a New Cluster:
- Install SynapseML on Your Cluster: Configure your cluster to include the SynapseML package. You can confirm that it is available with:

  ```python
  %pip show synapseml
  ```

- Install LangChain and Other Dependencies: You can use `%pip install` to install the necessary packages:

  ```python
  %pip install openai langchain_community
  ```

  Alternatively, use the environment configuration with the `.yml` file approach and upload your list of dependencies. For example:

  ```yaml
  dependencies:
    - pip:
        - synapseml==1.0.8
        - langchain==0.3.4
        - langchain_community==0.3.4
        - openai==1.53.0
        - langchain.openai==0.2.4
  ```
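After the environment is published, it can be useful to confirm that the session actually picked up the pinned versions. The following is a minimal sketch using only the standard library; the `PINNED` dictionary and the `check_pins` helper are illustrative names, and `langchain-openai` is written in its hyphenated form on the assumption that the lookup prefers that spelling.

```python
from importlib import metadata

# Versions copied from the dependency list above (names are illustrative keys).
PINNED = {
    "synapseml": "1.0.8",
    "langchain": "0.3.4",
    "langchain_community": "0.3.4",
    "openai": "1.53.0",
    "langchain-openai": "0.2.4",
}


def check_pins(pins: dict[str, str]) -> dict[str, str]:
    """Compare installed package versions against the expected pins."""
    report = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        report[name] = "ok" if installed == wanted else f"mismatch (installed {installed})"
    return report


report = check_pins(PINNED)
for package, status in report.items():
    print(f"{package}: {status}")
```

Any entry reported as `missing` or `mismatch` suggests the environment was not attached to the notebook or a different version was resolved.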
- Set Up API Keys: Ensure you have the API key and endpoint URL for your deployed model. Set these as environment variables:

  ```python
  import os

  # Set the API version for the Azure OpenAI service
  os.environ["OPENAI_API_VERSION"] = "2023-08-01-preview"

  # Set the base URL for the Azure OpenAI service
  os.environ["AZURE_OPENAI_ENDPOINT"] = "https://your-resource-name.openai.azure.com"

  # Set the API key for Azure OpenAI
  os.environ["AZURE_OPENAI_API_KEY"] = "your-api-key"
  ```
- Initialize Azure OpenAI Class: Create an instance of the Azure OpenAI class using the environment variables set above.

  ```python
  import os

  from langchain_openai import AzureChatOpenAI

  # Set the API base URL
  api_base = os.environ["AZURE_OPENAI_ENDPOINT"]

  # Create an instance of the Azure OpenAI class.
  # The endpoint and API version are read from the environment variables.
  llm = AzureChatOpenAI(
      azure_deployment="your-deployment-name",  # placeholder: your model deployment name
      openai_api_key=os.environ["AZURE_OPENAI_API_KEY"],
      temperature=0.7,
      verbose=True,
      top_p=0.9,
  )
  ```
- Call the Deployed Model: Use the Azure OpenAI service to generate text or perform other language model tasks. Here's an example of generating a response based on a prompt:

  ```python
  # Define a prompt
  messages = [
      (
          "system",
          "You are a helpful assistant that translates English to French. Translate the user sentence.",
      ),
      ("human", "Hi, how are you?"),
  ]

  # Generate a response from the Azure OpenAI service using the invoke method
  ai_msg = llm.invoke(messages)

  # Print the response
  print(ai_msg)
  ```
Make sure to replace the placeholder values `your-api-key` and `https://your-resource-name.openai.azure.com`, along with the deployment name and model name, with the actual values from your Azure OpenAI instance. This example demonstrates how to configure and use an existing Azure OpenAI instance in Microsoft Fabric.
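Before creating the client, a quick check that the three environment variables are present can save a confusing authentication error later. This is a minimal sketch: the variable names come from the steps above, while `find_config_problems` and the `https://` check are illustrative assumptions rather than anything the Azure OpenAI SDK requires.

```python
import os
from typing import Mapping

# Environment variables used in the setup steps above.
REQUIRED_SETTINGS = (
    "OPENAI_API_VERSION",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_KEY",
)


def find_config_problems(env: Mapping[str, str]) -> list[str]:
    """Return a list of human-readable problems; an empty list means the settings look usable."""
    problems = [f"{name} is not set" for name in REQUIRED_SETTINGS if not env.get(name)]
    endpoint = env.get("AZURE_OPENAI_ENDPOINT", "")
    # Assumption for this sketch: the endpoint is expected to be an https URL.
    if endpoint and not endpoint.startswith("https://"):
        problems.append("AZURE_OPENAI_ENDPOINT should start with https://")
    return problems


problems = find_config_problems(os.environ)
if problems:
    print("Configuration problems:", "; ".join(problems))
else:
    print("All Azure OpenAI settings are present.")
```

Passing the environment as a parameter keeps the check easy to exercise with a plain dictionary.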
Note
Example: automating the generation of definitions for technology terms with a language model.
The LangChain Transformer makes it easy to use language models for generating and transforming text data. You set up a template for what you want to create, link this template to a language model, and then process your data to produce the desired output. This automates tasks such as defining technology terms or generating other text-based content, and makes it easier to integrate AI capabilities into your data workflows.
- Prompt Creation: Start by defining a template for the kind of text you want to generate or analyze. For example, you might create a prompt that asks the model to define a specific technology term.
- Chain Setup: Then set up a chain that links this prompt to a language model. This chain is responsible for sending the prompt to the model and receiving the generated response.
- Transformer Configuration: The LangChain Transformer is configured to use this chain. It specifies how the input data (like a list of technology names) should be processed and what kind of output (like definitions) should be produced.
- Data Processing: Finally, apply this setup to a dataset, for example a list of technology names in a DataFrame, and the transformer will use the language model to generate definitions for each technology.
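The four stages above can be sketched without any external library to show how the pieces fit together. This is not the LangChain or SynapseML API: `make_chain`, `transform_rows`, and `echo_llm` are hypothetical helpers, and `echo_llm` simply echoes the prompt where a deployed model would generate text.

```python
from typing import Callable

LLM = Callable[[str], str]


def echo_llm(prompt: str) -> str:
    """Hypothetical stand-in for a deployed language model."""
    return f"[model reply to: {prompt}]"


def make_chain(template: str, llm: LLM) -> Callable[[str], str]:
    """Chain setup: fill the prompt template, send it to the model, return the reply."""
    def run(technology: str) -> str:
        return llm(template.format(technology=technology))
    return run


def transform_rows(rows: list[dict], input_col: str, output_col: str,
                   chain: Callable[[str], str]) -> list[dict]:
    """Transformer configuration and data processing: add one output column per row."""
    return [{**row, output_col: chain(row[input_col])} for row in rows]


# Prompt creation
template = "Define the following word: {technology}"

# Chain setup
chain = make_chain(template, echo_llm)

# Data processing over a small table of technology names
rows = [
    {"label": 0, "technology": "docker"},
    {"label": 1, "technology": "spark"},
    {"label": 2, "technology": "python"},
]
result = transform_rows(rows, "technology", "definition", chain)
for row in result:
    print(row["technology"], "->", row["definition"])
```

The real transformer plays the role of `transform_rows`, except that it operates on a Spark DataFrame so the model calls can be distributed across the cluster.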
- Create a Prompt Template: Define a prompt template for generating definitions.

  ```python
  from langchain.prompts import PromptTemplate

  copy_prompt = PromptTemplate(
      input_variables=["technology"],
      template="Define the following word: {technology}",
  )
  ```

- Set Up an LLMChain: Create an LLMChain with the defined prompt template.

  ```python
  from langchain.chains import LLMChain

  chain = LLMChain(llm=llm, prompt=copy_prompt)
  ```
- Configure LangChain Transformer: Set up the LangChain transformer to execute the processing chain.

  ```python
  import os

  # Set up the LangChain transformer to execute the processing chain.
  # SynapseML 1.0 and later expose this class under synapse.ml.services;
  # older releases used synapse.ml.cognitive.langchain instead.
  from synapse.ml.services.langchain import LangchainTransformer

  openai_api_key = os.environ["AZURE_OPENAI_API_KEY"]

  transformer = (
      LangchainTransformer()
      .setInputCol("technology")
      .setOutputCol("definition")
      .setChain(chain)
      .setSubscriptionKey(openai_api_key)
      .setUrl(api_base)
  )
  ```
- Create a Test DataFrame: Construct a DataFrame with technology names.

  ```python
  from pyspark.sql import SparkSession
  from pyspark.sql.functions import udf
  from pyspark.sql.types import StringType

  # Initialize Spark session
  spark = SparkSession.builder.appName("example").getOrCreate()

  # Construct a DataFrame with technology names
  df = spark.createDataFrame(
      [
          (0, "docker"),
          (1, "spark"),
          (2, "python"),
      ],
      ["label", "technology"],
  )

  # Define a simple UDF to transform the technology column
  def transform_technology(tech):
      return tech.upper()

  # Register the UDF
  transform_udf = udf(transform_technology, StringType())

  # Apply the UDF to the DataFrame
  transformed_df = df.withColumn("transformed_technology", transform_udf(df["technology"]))

  # Show the transformed DataFrame
  transformed_df.show()
  ```
Note
Example: automating the extraction and summarization of academic papers with an agent script that uses LangChain to extract content from an online PDF and generate a prompt based on that content.
An agent, in the context of programming and artificial intelligence, is a software entity that performs tasks autonomously. It can interact with its environment, make decisions, and execute actions based on predefined rules or learned behavior.
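To make the definition concrete, here is a minimal rule-based agent loop written with the standard library only. Everything in it is illustrative: `FAKE_PAPERS` is a hypothetical in-memory stand-in for the environment (a real agent would download a PDF), and `summarize` just keeps the first sentence where a language model would be called. The point is the cycle of observing state, deciding on an action, and executing it.

```python
from typing import Callable, Optional

# Hypothetical environment: paper IDs mapped to their text.
FAKE_PAPERS = {
    "paper-1": "Sharded training splits model state across devices. It reduces memory per device.",
}


def fetch(state: dict) -> dict:
    """Action: read the paper text from the environment."""
    return {**state, "text": FAKE_PAPERS.get(state["paper_id"], "")}


def summarize(state: dict) -> dict:
    """Action: keep only the first sentence as a placeholder summary."""
    first_sentence = state["text"].split(". ")[0].rstrip(".") + "."
    return {**state, "summary": first_sentence}


ACTIONS: dict[str, Callable[[dict], dict]] = {"fetch": fetch, "summarize": summarize}


def decide(state: dict) -> Optional[str]:
    """Predefined rules: choose the next action from the current state."""
    if state.get("text") is None:
        return "fetch"
    if state.get("summary") is None:
        return "summarize"
    return None  # nothing left to do


def run_agent(paper_id: str, max_steps: int = 5) -> tuple[dict, list[str]]:
    """Loop: observe the state, decide, act, until done or out of steps."""
    state = {"paper_id": paper_id, "text": None, "summary": None}
    history = []
    for _ in range(max_steps):
        action = decide(state)
        if action is None:
            break
        state = ACTIONS[action](state)
        history.append(action)
    return state, history


final_state, history = run_agent("paper-1")
print(history)
print(final_state["summary"])
```

An LLM-driven agent replaces the fixed rules in `decide` with a model that chooses among tools, but the surrounding loop has the same shape.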
- Define Functions for Content Extraction and Prompt Generation: Extract content from PDFs linked in arXiv papers and generate prompts for extracting specific information.

  ```python
  from langchain_community.document_loaders import OnlinePDFLoader


  def paper_content_extraction(inputs: dict) -> dict:
      arxiv_link = inputs["arxiv_link"]
      loader = OnlinePDFLoader(arxiv_link)
      pages = loader.load_and_split()
      # Use the first two pages; slicing avoids an IndexError on one-page documents.
      return {"paper_content": "".join(page.page_content for page in pages[:2])}


  def prompt_generation(inputs: dict) -> dict:
      output = inputs["Output"]
      prompt = (
          "find the paper title, author, summary in the paper description below, output them. "
          "After that, Use websearch to find out 3 recent papers of the first author in the author section below "
          "(first author is the first name separated by comma) and list the paper titles in bullet points: "
          "<Paper Description Start>\n" + output + "<Paper Description End>."
      )
      return {"prompt": prompt}
  ```
- Create a Sequential Chain for Information Extraction: Set up a chain to extract structured information from an arXiv link.

  ```python
  from langchain.chains import TransformChain, SimpleSequentialChain

  paper_content_extraction_chain = TransformChain(
      input_variables=["arxiv_link"],
      output_variables=["paper_content"],
      transform=paper_content_extraction,
      verbose=False,
  )

  paper_summarizer_template = """
  You are a paper summarizer, given the paper content, it is your job to summarize the paper into a short summary,
  and extract authors and paper title from the paper content.
  """
  ```
- Train and Register Machine Learning Models: Use Microsoft Fabric's native integration with the MLflow framework to log the trained machine learning models, the hyperparameters used, and evaluation metrics.

  ```python
  import mlflow
  from mlflow.models import infer_signature
  from sklearn.datasets import make_regression
  from sklearn.ensemble import RandomForestRegressor

  # Generate synthetic regression data
  X, y = make_regression(n_features=4, n_informative=2, random_state=0, shuffle=False)

  # Model parameters
  params = {"n_estimators": 3, "random_state": 42}

  # Model tags for MLflow
  model_tags = {
      "project_name": "grocery-forecasting",
      "store_dept": "produce",
      "team": "stores-ml",
      "project_quarter": "Q3-2023",
  }

  # Log MLflow entities
  with mlflow.start_run() as run:
      # Train the model
      model = RandomForestRegressor(**params).fit(X, y)

      # Infer the model signature
      signature = infer_signature(X, model.predict(X))

      # Log parameters and the model
      mlflow.log_params(params)
      mlflow.sklearn.log_model(model, artifact_path="sklearn-model", signature=signature)

  # Register the model with tags
  model_uri = f"runs:/{run.info.run_id}/sklearn-model"
  model_version = mlflow.register_model(model_uri, "RandomForestRegressionModel", tags=model_tags)

  # Output model registration details
  print(f"Model Name: {model_version.name}")
  print(f"Model Version: {model_version.version}")
  ```
- Compare and Filter Machine Learning Models: Use MLflow to search among multiple models saved within the workspace.

  ```python
  from pprint import pprint

  from mlflow.tracking import MlflowClient

  client = MlflowClient()
  for rm in client.search_registered_models():
      pprint(dict(rm), indent=4)
  ```
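Once the registered models are listed, a common next step is to narrow them down by tag. The sketch below shows the filtering logic on plain dictionaries so it can run anywhere; the records are hand-written stand-ins rather than real MLflow objects, the second model is invented for illustration, and `filter_by_tags` is a hypothetical helper.

```python
# Hand-written stand-ins for registered model records (not real MLflow entities).
registered_models = [
    {
        "name": "RandomForestRegressionModel",
        "tags": {"project_name": "grocery-forecasting", "team": "stores-ml"},
    },
    {
        # Hypothetical second model, included only to give the filter something to exclude.
        "name": "ExampleOtherModel",
        "tags": {"project_name": "example-project", "team": "stores-ml"},
    },
]


def filter_by_tags(models: list[dict], required_tags: dict[str, str]) -> list[dict]:
    """Keep models whose tags contain every required key with the required value."""
    return [
        model
        for model in models
        if all(model.get("tags", {}).get(key) == value for key, value in required_tags.items())
    ]


matches = filter_by_tags(registered_models, {"project_name": "grocery-forecasting"})
for model in matches:
    print(model["name"])
```

With real results, the same predicate can be applied to the dictionaries produced by `dict(rm)` in the loop above, assuming each one carries a `tags` mapping.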




