Generative AI Newsletter — April 2024

Priya Dwivedi · Published in Generative AI · 8 min read · May 8, 2024


Hey Everyone,

Welcome to the April 2024 edition of our Generative AI newsletter! We are super excited to share with you the newest and coolest advancements in the world of Generative AI.

Our mission remains steadfast: to keep you informed, inspired, and engaged with thought-provoking content, insightful analyses, and the most recent breakthroughs in Generative AI.

Recent AI News

Meta releases Llama 3 models, setting a new benchmark for open-source LLMs: Meta has released a new series of large language models called Llama 3, with model sizes of 8 billion and 70 billion parameters and an upcoming 400-billion-parameter version. These models represent a significant advancement in natural language processing capabilities. They were trained on an enormous 15-trillion-token dataset compiled from publicly available sources, using advanced data filtering techniques. The training process was highly compute-intensive, sustaining over 400 TFLOPS per GPU while training across 16,000 GPUs. Under the hood, Llama 3 incorporates innovations like a tokenizer with a 128K-token vocabulary for more efficient encoding, and a Grouped Query Attention mechanism that improves inference efficiency.

The Llama 3 models demonstrate impressive benchmark results:

  • MMLU: 8B model scores 68.4, 70B model achieves 82.0 (projected 85 for 400B)
  • HumanEval: 8B at 62.2, 70B reaches 81.7
  • GSM-8K: 79.6 for 8B, 70B model leads with 93.0
  • MATH: 8B at 30.0, 70B scores 50.4

Notably, Meta has made Llama 3 fully open-source, including model weights, with no access costs. This aligns with their stance on open-source AI driving safer, faster innovation across disciplines. Available on major cloud platforms, Llama 3 could significantly impact the open-source AI landscape with its scale, performance, and potential for multimodality and larger context windows.
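
If you want to try the model hands-on, here is a minimal sketch (our own illustration, not an official Meta example) of loading the 8B Instruct variant with Hugging Face transformers. It assumes `transformers` and `torch` are installed and that you have accepted Meta's license for the gated repo below; the repo id and generation settings should be treated as illustrative.

```python
# Minimal sketch: loading Llama 3 8B Instruct with Hugging Face transformers.
# Assumes the gated "meta-llama/Meta-Llama-3-8B-Instruct" repo is accessible to you.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Summarize the Llama 3 release in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```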

A comparison against other open-source and closed-source models further highlights the strength of Llama 3.

Meta Llama 3

What people are saying

“LLaMA-3 is a prime example of why training a good LLM is almost entirely about data quality” — Cameron R Wolfe

“The upcoming Llama-3-400B+ will mark the watershed moment that the community gains open-weight access to a GPT-4-class model.” — Dr. Jim Fan

Microsoft launches Phi-3 series of models: Microsoft has launched the Phi-3 series, a set of efficient language models designed for mobile devices and PCs. The series includes three sizes: mini (3.8B parameters), small (7B), and medium (14B). These transformer decoder models are trained on a combination of filtered web data and synthetic data, using a two-phase approach to build general knowledge and then specialized skills like logical reasoning. The models reuse the Llama-2 tokenizer for compatibility and focus on robustness, safety, and effective interaction across formats. Highlights of their performance include:

  • Mini model achieves 69% on MMLU and 8.38 on MT-bench, on par with larger models
  • Default 4K context length, expandable to 128K with LongRoPE technology
  • Mini model optimized for mobile, requiring ~1.8GB at 4-bit compression and processing over 12 tokens/sec on iPhone 14
  • Post-training enhancements for domains like math and coding
  • Extended 128K context version of mini model for complex tasks

The entire Phi-3 series is available under an MIT license on the Hugging Face platform, allowing for widespread integration and use.
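
Because the weights are on the Hugging Face Hub, trying the mini model locally takes only a few lines. The sketch below is our own illustration, assuming `transformers` and `torch` are installed; the `microsoft/Phi-3-mini-4k-instruct` repo id is the one Microsoft published, and the prompt and generation settings are purely illustrative.

```python
# Minimal sketch: running Phi-3-mini with a transformers text-generation pipeline.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="microsoft/Phi-3-mini-4k-instruct",
    device_map="auto",
    trust_remote_code=True,  # the model shipped with custom modeling code at release
)

# Rough memory math behind the "~1.8GB at 4-bit compression" figure above:
# 3.8e9 parameters * 0.5 bytes/parameter ≈ 1.9 GB of weights.
prompt = "Explain in two sentences why small language models matter for phones."
print(generator(prompt, max_new_tokens=96)[0]["generated_text"])
```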

Phi-3: Compared to other models

What people are saying

“Phi-3 7B just dropped and beats Llama-3 7B handily. With an MMLU of 75.3, it’s coming close to 70B SOTA models!! I wouldn’t be surprised if we ended up with a 7B model that beats GPT-4 by the end of the year.” — Bindu Reddy

Try Phi-3 on HF →

Apple releases OpenELM, small language models that run on device: Apple has unveiled OpenELM, a family of compact yet efficient language models tailored for on-device applications on mobile devices and computers. Ranging from 270M to 3B parameters, these models leverage a novel “layer-wise scaling” architecture that strategically allocates fewer parameters to the initial transformer layers and gradually increases the parameter count towards the output layers, allocating compute according to the complexity of information each layer handles.
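
The layer-wise scaling idea is easy to picture with a toy calculation. The sketch below is our own illustration, not Apple's code: the head counts and feed-forward multipliers are made-up values, and it simply grows both linearly from the first layer to the last.

```python
# Illustrative sketch of layer-wise scaling: early layers get fewer attention heads
# and narrower feed-forward blocks, later layers get more. Values are made up.
def layerwise_config(num_layers: int, min_heads: int = 4, max_heads: int = 16,
                     min_ffn_mult: float = 1.0, max_ffn_mult: float = 4.0):
    configs = []
    for i in range(num_layers):
        t = i / max(num_layers - 1, 1)  # 0.0 at the first layer, 1.0 at the last
        heads = round(min_heads + t * (max_heads - min_heads))
        ffn_mult = min_ffn_mult + t * (max_ffn_mult - min_ffn_mult)
        configs.append({"layer": i, "heads": heads, "ffn_mult": round(ffn_mult, 2)})
    return configs

# Print the per-layer budget for a toy 8-layer model.
for cfg in layerwise_config(num_layers=8):
    print(cfg)
```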

Key highlights of OpenELM:

  • Trained on 1.8T tokens from datasets like RefinedWeb, deduplicated PILE, RedPajama subset, and Dolma v1.6 subset
  • OpenELM-1.1B outperforms AI2’s OLMo-1B by 2.36% accuracy using half the pre-training tokens
  • On benchmarks: 3B model scored 42.24% on ARC-C, 26.76% on MMLU, 73.28% on HellaSwag
  • Pre-trained and instruction-tuned checkpoints available for all four sizes (270M, 450M, 1.1B, 3B)
  • Open-sourced under permissive “sample code” license, with CoreNet library for reproducibility
  • Inference benchmarked on hardware such as an Intel i9 workstation with an RTX 4090 GPU and an M2 Max MacBook Pro

As a vertically integrated hardware and software company, Apple is uniquely positioned here: open-sourcing OpenELM paves the way for on-device AI assistants and language capabilities without privacy trade-offs, potentially enabling more advanced device-centric AI experiences across Apple’s ecosystem.

What people are saying

“Can’t wait for Apple to step into the LLM arena. They own the hardware in all our pockets. They have to be the one to do this. Fingers crossed that they deliver the ability to run a decent model locally” — Indira Negi

“It seems like everyone is joining the trend of creating compact models, and this launch is another hint towards Apple’s possible advancements in on-device AI, which might be revealed at WWDC” — The AI Edge

A mysterious GPT2-chatbot appears on LMSys and stuns everyone: The AI world was sent into a frenzy when a mysterious model called “gpt2-chatbot” appeared without fanfare on LMSYS Chatbot Arena and proceeded to stun researchers by outperforming OpenAI’s GPT-4 and Anthropic’s Claude Opus in reasoning, coding, and math tasks. This enigmatic challenger solved an International Math Olympiad problem on the first try — a feat only achieved by the top 4 U.S. high school students annually.

It exceeded benchmarks on complex coding prompts, demonstrated iterative dialogue capabilities, self-awareness in refining responses, and even exhibited rule-breaking behavior by solving logic puzzles that historically stumped GPT-4. With no official documentation, intense speculation arose about its origins — some believe it could be an OpenAI release or preview of GPT-5, while others theorize an independent group released it to showcase cutting-edge AI capabilities, akin to the GPT-4chan phenomenon in 2022. But just as mysteriously as it arrived, gpt2-chatbot vanished without a trace, leaving the AI community clamoring for answers about this supremely capable yet enigmatic model.

What people are saying

Most likely explanation for gpt2-chatbot: OpenAI has been working on a more efficient method for fine-tuning language models, and they managed to get GPT-2, a 1.5B parameter model, to perform pretty damn close to GPT-4, which is an order of magnitude larger and more costly to train/run. They’re driving down the cost of operating LLMs by injecting the little models with some fine-tuned steroids. “GPT-5” might have fewer parameters than GPT-4. — AI Breakfast

Try: LMSys ChatBot Arena →

Meta’s game-changing Multi-Token prediction: Meta has proposed a groundbreaking new approach to training large language models called “multi-token prediction.” Instead of the traditional next-token prediction objective, their method trains models to simultaneously predict multiple future tokens at each position in the input sequence. This is achieved through an architecture with a shared trunk that encodes the input context, followed by multiple output heads that independently predict different future tokens in parallel during training. See architecture shown below:

The key benefits of this multi-token prediction approach are:

  • Enhanced sample efficiency and faster inference times, up to 3x speedup particularly for larger models and batch sizes
  • Substantial performance improvements over next-token prediction models on coding tasks and generative benchmarks
  • Scalability benefits become more pronounced as model size increases
  • Robustness of training gains maintained even over multiple epochs

Under the hood, each output head makes its prediction independently based on the shared context representation from the trunk. During training, the model is optimized to predict each future token in parallel across the multiple heads, effectively training it to consider multiple possible future outcomes at each step. At inference time, this allows generating multiple tokens simultaneously for much faster text generation.
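
To make the shared-trunk-plus-heads setup concrete, here is a toy PyTorch sketch of the training objective. It is our own illustration, not Meta's code: the module sizes, the two-layer trunk, and the loss averaging are arbitrary choices, and it only shows how head k at position t is supervised with the token k+1 steps ahead.

```python
# Toy sketch of multi-token prediction: one shared trunk, n_future independent heads.
import torch
import torch.nn as nn

class MultiTokenLM(nn.Module):
    def __init__(self, vocab_size=1000, d_model=128, n_future=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        trunk_layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.trunk = nn.TransformerEncoder(trunk_layer, num_layers=2)  # shared trunk
        # One output head per future offset (t+1, t+2, ..., t+n_future).
        self.heads = nn.ModuleList([nn.Linear(d_model, vocab_size) for _ in range(n_future)])

    def forward(self, tokens):
        mask = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
        hidden = self.trunk(self.embed(tokens), mask=mask)   # causal shared representation
        return [head(hidden) for head in self.heads]          # n_future sets of logits

def multi_token_loss(logits_per_head, tokens):
    # Head k at position t is trained to predict the token at position t + k + 1.
    loss = 0.0
    for k, logits in enumerate(logits_per_head):
        preds = logits[:, : -(k + 1)]      # positions that still have a target k+1 steps ahead
        targets = tokens[:, k + 1:]
        loss = loss + nn.functional.cross_entropy(
            preds.reshape(-1, preds.size(-1)), targets.reshape(-1)
        )
    return loss / len(logits_per_head)

tokens = torch.randint(0, 1000, (2, 16))   # toy batch of token ids
model = MultiTokenLM()
print(multi_token_loss(model(tokens), tokens).item())
```

At inference time the extra heads can either be dropped (falling back to standard next-token decoding) or used to propose several tokens at once, which is where the reported speedups come from.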

Meta’s research shows promising results of this multi-token prediction technique leading to more sample-efficient, higher-performing and faster language models — a potential paradigm shift in large language model training.

Meta’s Multi Token Prediction explained

OpenAI releases Memory feature in ChatGPT: Access to personal memory can make a model more personal and useful. For example, knowledge of someone’s location or preferences can be used to return better responses. However, there is sensitivity involved in collecting and storing personal data.

This new ChatGPT feature lets users enter information they would like the model to use into a Memory tab. The information is entered by the user and can be edited at any time.

Let us know if you get to try this feature!

ChatGPT memory interface


Giggles & Gigabytes: The AI Comedy Corner

Every AI needs to be convinced


“Is it perfect? No. Is it as good as my executive team? No. Is it really, really valuable, so valuable that I talk to ChatGPT every single day? Yes.” — Jeff Maggioncalda, CEO of Coursera

That’s all for now! We hope you found something interesting, or at least vaguely useful, in this newsletter. If you have any questions, ideas for future editions, or just want to chat about the fascinating world of AI, reach out to us in the comments below.

Until next time,

Jim and Priya


CEO of Deep Learning Analytics, an AI/ML consultancy. We build custom models for different use cases. Check us out at: https://deeplearninganalytics.org