Implementation of Microsoft’s Phi-3 with the Hugging Face Transformers Library in Python

Toni Ramchandani · Published in Generative AI · 7 min read · Apr 25, 2024


All About It

Microsoft has introduced the Phi-3 Mini, a compact yet powerful AI model boasting 3.8 billion parameters, marking the first in a trio of scaled-down models the company intends to roll out. Designed to be more manageable in size compared to behemoths like GPT-4, Phi-3 Mini has been optimized for efficiency and is now accessible via Azure, Hugging Face, and Ollama platforms. The upcoming models, named Phi-3 Small and Phi-3 Medium, will feature 7 billion and 14 billion parameters respectively, offering progressively more sophisticated capabilities.

The introduction of Phi-3 Mini follows the December release of Phi-2, which matched the performance of larger models such as Llama 2. Microsoft asserts that Phi-3 not only surpasses its predecessor but also performs comparably to models ten times its size, providing an impressive level of responsiveness from a smaller framework.

Eric Boyd, Microsoft’s Corporate Vice President of the Azure AI Platform, highlighted to The Verge that the Phi-3 Mini performs on par with larger language models like GPT-3.5 but is housed in a more compact form. This smaller size translates into cost savings and enhanced suitability for personal devices like smartphones and laptops, reflecting Microsoft’s strategic emphasis on developing lightweight AI models. Earlier reports from The Information noted Microsoft’s efforts to assemble a team dedicated to this initiative, alongside other specialized projects like Orca-Math, an AI focused on solving mathematical problems.

The competitive landscape includes similar initiatives from other tech giants. Google’s Gemma 2B and 7B models are tailored for tasks such as powering simple chatbots and handling language-related functions, while Anthropic’s Claude 3 Haiku excels at digesting and summarizing complex research papers. Meta’s Llama 3 8B, meanwhile, is geared towards chatbot functionality and coding support.

An interesting aspect of Phi-3’s development is its training regimen, which Boyd likens to educational strategies used in childhood learning. Inspired by the structure and simplicity of children’s books, developers crafted a specialized curriculum to teach the model. When adequate children’s literature proved scarce, the team prompted an LLM to create new “children’s books” from a list of over 3,000 words, effectively generating tailored training content.

Boyd explained that Phi-3 builds upon the foundational knowledge of its predecessors: while Phi-1 was adept at coding tasks, Phi-2 advanced to develop reasoning skills. Phi-3 amalgamates these capabilities, offering improved coding and reasoning prowess. However, Boyd acknowledges that despite its advancements, Phi-3 does not rival the expansive knowledge base of models trained on more extensive datasets like GPT-4, which benefit from broader internet-sourced content.

Phi-3 Mini and its forthcoming variants represent Microsoft’s commitment to providing powerful, yet more accessible AI options. These models are particularly beneficial for companies with smaller internal datasets, offering a cost-effective solution without sacrificing computational power. This strategic move not only enhances Microsoft’s product offerings but also meets the growing demand for efficient, scalable AI tools across various sectors.

Implementation

Let’s demonstrate how to interact with Microsoft’s Phi-3 model, a state-of-the-art language model, using the Hugging Face Transformers library in Python. The walkthrough starts with user authentication against Hugging Face to access the model, then installs the Python packages needed for the latest functionality. The core of the script loads both the tokenizer and the model from the Hugging Face Hub, configured for causal language modeling, which is suitable for generating text from prompts. The setup also prepares the environment to use GPU resources for efficient computation.

The script sets up a simple interaction in which the model is asked “How ai shake hands with quantum computing”. The tokenizer formats this chat history into a structured prompt matching the model’s expected input format and converts it into token IDs the model can process. The model then generates a response, with a temperature parameter controlling the balance between randomness and fidelity in the generated text. Finally, the output tokens are decoded back into human-readable text, showing the model engaging in meaningful dialogue. The example highlights both the practical use of advanced NLP models for interactive text generation and how such models can be integrated into applications that scale across devices, including those with GPU support.

Here’s a detailed breakdown of each part of the script:

Authentication with Hugging Face Hub:

Import the `notebook_login` function from the `huggingface_hub` package and call it to authenticate. This is necessary for accessing models hosted on the Hugging Face platform that may require authorization.

from huggingface_hub import notebook_login
notebook_login()
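
If you run this outside a notebook, `huggingface_hub` also provides a plain `login()` helper; the token below is a placeholder (create a real one under your Hugging Face account settings):

from huggingface_hub import login

# Non-notebook alternative: pass an access token directly.
# "hf_..." is a placeholder; create a real token at
# https://huggingface.co/settings/tokens
login(token="hf_...")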

Installing Necessary Libraries:

Ensure that the latest versions of the `transformers` and `accelerate` libraries are installed. `transformers` is used for loading and interacting with pre-trained models, while `accelerate` helps optimize model performance across different computing environments.

!pip install -U transformers accelerate
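
Optionally, a quick sanity check that the upgraded versions were picked up (restart the runtime first if the libraries were already imported):

import accelerate
import transformers

# Print the installed versions to confirm the upgrade took effect.
print("transformers:", transformers.__version__)
print("accelerate:", accelerate.__version__)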

Loading the Tokenizer and Model:

Use the `transformers` library to load Microsoft’s Phi-3 Mini model for causal language modeling. The code imports the classes for the model and its tokenizer, then loads both via `from_pretrained` with the identifier `"microsoft/Phi-3-mini-128k-instruct"`, ensuring tokenizer and model stay compatible. Two arguments matter here: `torch_dtype="auto"` lets Transformers pick tensor data types for computational efficiency, and `trust_remote_code=True` allows execution of the custom model code shipped with the checkpoint, which requires trusting the model’s source because of the security implications of running remote code.

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the tokenizer and model weights from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-128k-instruct")
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-128k-instruct",
    torch_dtype="auto",      # choose an appropriate tensor dtype automatically
    trust_remote_code=True,  # run the custom model code shipped with the checkpoint
)
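
The load above defaults to CPU. A minimal sketch for moving the model to a GPU when one is available (the generation step later sends inputs to `model.device`, so nothing else needs to change):

import torch

# Use a GPU when available; otherwise fall back to CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)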

Setting Up Chat Interaction:

Create a chat history list in which the user asks “How ai shake hands with quantum computing”.


import torch

# Chat history in the role/content format the chat template expects.
chat = [
    {
        "role": "user",
        "content": "How ai shake hands with quantum computing",
    }
]
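
The same role/content structure extends to multi-turn conversations. In this sketch the earlier exchange is hypothetical, added purely to illustrate how turns accumulate:

# Roles alternate between "user" and "assistant"; the chat template
# applied in the next step handles the turn markers. The first two
# entries here are hypothetical, for illustration only.
multi_turn_chat = [
    {"role": "user", "content": "What is quantum computing?"},
    {"role": "assistant", "content": "Quantum computing processes information with qubits..."},
    {"role": "user", "content": "How ai shake hands with quantum computing"},
]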

Generating the Prompt:

Apply the model’s chat template to turn the chat history into a single prompt string, with `add_generation_prompt=True` appending the marker that cues the model to respond, then tokenize it into input IDs.

prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)

token_ids = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")

token_ids
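
Printing `prompt` shows how the chat template wraps the conversation in Phi-3’s special turn markers; the exact tokens depend on the tokenizer version, but the shape is roughly:

print(prompt)
# Approximate shape (exact markers depend on the tokenizer version):
# <|user|>
# How ai shake hands with quantum computing<|end|>
# <|assistant|>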

Generating a Response:

with torch.no_grad():  # no gradients needed for inference
    output_ids = model.generate(
        token_ids.to(model.device),  # move inputs to the model's device
        do_sample=True,              # sample rather than greedy-decode
        temperature=0.7,             # balance randomness and fidelity
        max_new_tokens=512,
    )

# Decode only the newly generated tokens, slicing off the prompt.
output = tokenizer.decode(output_ids.tolist()[0][token_ids.size(1):], skip_special_tokens=True)

output
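
The same generation can also be expressed through the higher-level `pipeline` API; a minimal sketch reusing the model and tokenizer loaded above:

from transformers import pipeline

# Convenience wrapper around model.generate; the sampling arguments mirror
# the call above, and return_full_text=False drops the prompt from the output.
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
result = pipe(prompt, do_sample=True, temperature=0.7,
              max_new_tokens=512, return_full_text=False)
print(result[0]["generated_text"])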

Output

AI doesn't literally "shake hands" as humans do. However, when referring to "how AI can interact or work alongside quantum computing," it's about integrating AI capabilities with quantum computing's power to enhance performance, solve complex problems, and innovate. Here's a non-literal interpretation of your question:

1. **Optimization Solutions:** Quantum computers can process vast amounts of data at unprecedented speeds. AI algorithms running on these quantum systems can optimize logistics, financial modeling, or complex simulations more efficiently than classical computers.

2. **Enhanced Machine Learning:** By leveraging quantum computing, AI can perform machine learning tasks at a scale and speed not possible with traditional computing. Quantum-enhanced machine learning can lead to breakthroughs in data analysis, pattern recognition, and predictive modeling.

3. **Cryptography:** Quantum computing has the potential to revolutionize cybersecurity, including cryptography. AI can manage and analyze the immense amount of data and cryptographic operations processed by quantum computers to ensure secure communications.

4. **Drug Discovery:** AI, when combined with quantum computing, can dramatically speed up the drug discovery process. Quantum systems can explore chemical and molecular interactions at a quantum level, while AI algorithms can analyze and predict outcomes more efficiently.

5. **Complex System Analysis:** Quantum computing's ability to handle vast combinatorial possibilities complements AI's analytical capabilities. Together, they can analyze complex systems (e.g., climate models, financial systems) in ways previously unfeasible, providing deeper insights and predictions.

In essence, AI and quantum computing are two advanced technologies that, when combined, have the potential to solve problems and perform tasks that are beyond the reach of either alone, symbolically "shaking hands" to form a powerful partnership for innovation and progress.



About Me🚀
Hello! I’m Toni Ramchandani 👋. I’m deeply passionate about all things technology! My journey is about exploring the vast and dynamic world of tech, from cutting-edge innovations to practical business solutions. I believe in the power of technology to transform our lives and work. 🌐

Let’s connect at https://www.linkedin.com/in/toni-ramchandani/ and exchange ideas about the latest tech trends and advancements! 🌟

Engage & Stay Connected 📢
If you find value in my posts, please Clap 👏 | Like 👍 and share 📤 them. Your support inspires me to continue sharing insights and knowledge. Follow me for more updates and let’s explore the fascinating world of technology together! 🛰️

This story is published under Generative AI Publication.

Connect with us on Substack, LinkedIn, and Zeniteq to stay in the loop with the latest AI stories. Let’s shape the future of AI together!
