

AFM: Arcee Foundation Model
The latest FM is an SLM! The newest foundation model from Arcee AI is a Small Language Model, or SLM.
In fact, it appears to be the beginning of a new chapter: the Arcee Foundation Model family. The first member, AFM-4.5B, is a 4.5-billion-parameter model engineered for “real-world enterprise demands and able to run anywhere, from smartphones to GPU clusters.”
We’re fans of Arcee, as they’ve been ahead of the curve on new trends for years now, as we’ve covered on this channel before, including:
Distributed Fine-Tuning and Training using Spectrum
We’re pumped to talk to Arcee’s CTO Lucas Atkins about the training pipeline for this model, as well as what they have in store for the AFM family of models.
What we know so far from their technical blog post is that the AFM family was born out of working with enterprises and noticing three distinct pain points:
Performance and Size Gaps: Edge-optimized models simply weren’t reliable enough for demanding tasks. Customers needed a model that could run on modest hardware, yet still deliver top-tier accuracy and robustness.
Regulatory and Licensing Friction: The most advanced models from major Chinese AI labs (DeepSeek, Qwen, GLM, MiniCPM) offered impressive results but rarely satisfied Western compliance standards, disqualifying them for regulated industries.
Stagnant Western Alternatives: Models from Meta (Llama) and Mistral, while solid, were quickly becoming outdated. The 3–10B parameter space was primarily served by models a year old or older, outpaced by newer research, data pipelines, and post-training strategies.
Shots fired! First and foremost, we’ll set the context of AFM by recapping and digging into some of these opinionated details.
But more interestingly, how is their technology different? Here are some highlights that we’ll cover related to their data curation and training procedures.
Data: 6.58 trillion tokens of the most relevant, highest-quality data possible.
Compute: Amazon SageMaker HyperPod orchestrating training across 512 NVIDIA H200 GPUs
Post-Training: Fine-tuning, distillation, merging, and alignment techniques were all used. It started with midtraining to give the model “strong early instincts for precision and clarity,” followed by “checkpoint merging, consolidating, and enhancing intermediate models into a cohesive base.” Context window extension was completed using YaRN, a rotary scaling method that retains performance at scale, before MergeKit was used to “refine the long-context foundation” via “layer-wise weighting, residual scaling, and targeted integrations.” Next, supervised fine-tuning was applied, “focusing on instruction clarity, diversity, and alignment,” before reinforcement learning with verifiable reward signals helped the model “prefer factual, high-utility responses.” Finally, “Post-RL merges smoothed out inconsistencies, and we followed with KTO, an alignment method where the model learns directly from trusted reference behavior.”
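Checkpoint merging shows up at several points in that pipeline, so here’s a minimal sketch of what a MergeKit run can look like. The checkpoint paths, merge method, and 50/50 weights below are placeholders we made up for illustration, not Arcee’s actual recipe:

```python
# Minimal, illustrative MergeKit run: average two intermediate checkpoints.
# Paths and weights are placeholders, not Arcee's recipe.
import subprocess

import yaml  # PyYAML ships as a dependency of mergekit

merge_config = {
    "merge_method": "linear",  # MergeKit also supports slerp, ties, dare_ties, ...
    "models": [
        {"model": "./checkpoints/midtrain-run-a", "parameters": {"weight": 0.5}},
        {"model": "./checkpoints/midtrain-run-b", "parameters": {"weight": 0.5}},
    ],
    "dtype": "bfloat16",
}

with open("merge.yml", "w") as f:
    yaml.safe_dump(merge_config, f)

# mergekit-yaml is the CLI entry point installed with `pip install mergekit`
subprocess.run(["mergekit-yaml", "merge.yml", "./merged-checkpoint"], check=True)
```

Swapping merge_method for slerp, ties, or dare_ties is a one-line change, which is part of why merging is such a cheap way to consolidate intermediate checkpoints into a cohesive base.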
What a post-training stack! We have lots of questions 🤓
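One of those questions is about that final KTO step, so here’s a rough sketch of what KTO alignment can look like using Hugging Face TRL’s KTOTrainer. The model id, toy dataset, and hyperparameters are our assumptions, not Arcee’s actual setup:

```python
# Rough, illustrative KTO alignment sketch with Hugging Face TRL.
# Model id, toy dataset, and hyperparameters are assumptions, not Arcee's recipe.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import KTOConfig, KTOTrainer

model_name = "arcee-ai/AFM-4.5B"  # assumed Hugging Face repo id
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# KTO learns from unpaired feedback: each completion is simply labeled
# desirable (True) or undesirable (False) -- no preference pairs needed.
train_dataset = Dataset.from_list([
    {"prompt": "What is YaRN?",
     "completion": "YaRN is a RoPE scaling method for extending context length.",
     "label": True},
    {"prompt": "What is YaRN?",
     "completion": "YaRN is a brand of knitting wool used inside GPUs.",
     "label": False},
])

args = KTOConfig(
    output_dir="./afm-kto",
    beta=0.1,                     # strength of the KL penalty against the reference model
    per_device_train_batch_size=2,
    num_train_epochs=1,
)

trainer = KTOTrainer(
    model=model,
    ref_model=None,               # TRL creates a frozen reference copy when None
    args=args,
    train_dataset=train_dataset,
    processing_class=tokenizer,   # called `tokenizer` in older TRL releases
)
trainer.train()
```

The appeal of KTO is visible right in the dataset format: unlike DPO, it only needs unpaired completions labeled desirable or undesirable, which is much easier to collect from trusted reference behavior.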
Moreover, beyond training, the AFM family is ready for Context Engineering using RAG and Agents, and, true to the Small Language Model form we’ve come to expect from Arcee AI, domain adaptation and customization.
Join us live to learn all about the AFM family of models, and what to expect from the SLM model builders on the LLM Edge at Arcee AI for the rest of the year.
🤓 Who should attend
Engineers and Data Scientists who love training models
AI Engineers and leaders who have to choose models that balance performance and efficiency
Any machine learning nerds out there interested in the state-of-the-art of post-training pipelines!
Speaker Bios
“Dr. Greg” Loughnane is the Co-Founder & CEO of AI Makerspace, where he is an instructor for their AI Engineering Bootcamp. Since 2021, he has built and led industry-leading Machine Learning education programs. Previously, he worked as an AI product manager, a university professor teaching AI, an AI consultant and startup advisor, and an ML researcher. He loves trail running and is based in Dayton, Ohio.
Chris “The Wiz” Alexiuk is the Co-Founder & CTO at AI Makerspace, where he is an instructor for their AI Engineering Bootcamp. During the day, he is also a Developer Advocate at NVIDIA. Previously, he was a Founding Machine Learning Engineer, Data Scientist, and ML curriculum developer and instructor. He’s a YouTube content creator whose motto is “Build, build, build!” He loves Dungeons & Dragons and is based in Toronto, Canada.
Follow AI Makerspace on LinkedIn and YouTube to stay updated about workshops, new courses, and corporate training opportunities.