FinOps Mission Impossible? Azure AI Services Costs + Foundation Model Costs such as GPT-4, Llama 2 or Mistral (I)

Have you noticed that Microsoft is in the middle of a frenetic evolution of its AI services? Older AI services such as LUIS or QnA Maker are now labeled as Classic. Meanwhile, new LLMs (Large Language Models), the ones that talk with you, such as GPT-4 or GPT-3.5 Turbo from OpenAI, are available to customers through a request application form, while others such as Llama 2 from Meta and Mistral from Mistral AI can be used pay-as-you-go (PAYG).

Let's take a holistic approach to see how to start applying FinOps to these AI offerings. But first of all, we need to understand what the Azure AI Services are.

Azure AI Services (Classic Azure Cognitive Services, still alive and kicking)

Azure AI Services provides a huge range of solutions to bring the potential of AI to many business scenarios or to assist citizens in their daily tasks, such as:

AI Search – Provides secure information retrieval at scale over user-owned content in traditional and conversational search applications. Its ability to rapidly find relevant data is essential to the end-user experience and results.

Its most interesting application with LLMs is that it works smoothly with Ada (text-embedding-ada-002), an OpenAI model focused on embeddings (embeddings are numerical representations of concepts that allow computers to understand the relationships between them).
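To illustrate what "understanding relationships" means in practice, here is a minimal Python sketch of cosine similarity, a common way to compare embedding vectors. The three-dimensional vectors and the words attached to them are hypothetical toy values; a real Ada embedding has 1,536 dimensions and would come back from the embeddings API rather than being typed by hand.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical 3-dimensional toy embeddings; real Ada vectors have 1,536 dimensions.
embeddings = {
    "invoice": [0.9, 0.1, 0.0],
    "bill": [0.8, 0.2, 0.1],
    "holiday": [0.0, 0.3, 0.9],
}

query = embeddings["invoice"]
# Rank the other concepts by how close they are to "invoice".
ranked = sorted(
    (word for word in embeddings if word != "invoice"),
    key=lambda w: cosine_similarity(query, embeddings[w]),
    reverse=True,
)
print(ranked)  # ['bill', 'holiday']
```

"bill" ranks first because its vector points in almost the same direction as "invoice", which is exactly the kind of closeness a vector search over embeddings relies on.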

Computer Vision – Use visual data processing to label content (from objects to concepts), extract printed and handwritten text, recognise familiar subjects like brands and landmarks, and moderate content. 

Face API – Embed facial recognition into your apps for a seamless and highly secure user experience. No machine learning expertise is required. Features include face detection, which perceives faces and attributes in an image; person identification, which matches an individual against your private repository of up to 1 million people; and recognition and grouping of similar faces in images.

Speech Services – Transcribe audible speech into readable, searchable text. Add real-time speech translation to your apps and services. Convert text to audio in near real time. Quickly build speech-enabled apps and services using the programming languages you already work with. Customize speech systems to optimize quality for specific scenarios.

Document Intelligence – Accelerate your business processes by automating information extraction. Document Intelligence applies advanced machine learning to accurately extract text, key/value pairs, and tables from documents. With just a few samples, Document Intelligence tailors its understanding to your documents, both on-premises and in the cloud.

Azure AI Bot Service – Provides an integrated development environment for bot building. It integrates with Power Virtual Agents, now Microsoft Copilot Studio, which lets you create powerful AI-powered copilots for a range of requests, from providing simple answers to common questions to resolving issues that require complex conversations.

LUIS and QnA Maker appear in the portal as Classic, and I guess they will be deprecated within the next two years.

PRICING ranges from the F0 (Free) tier to Standard tiers and, depending on the AI service, commitment tiers, as well as PAYG. No discount programs are available yet.
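To make the PAYG versus commitment tier decision concrete, here is a minimal Python sketch, assuming a simplified model in which a commitment tier charges a fixed monthly fee covering a quota of units and bills any usage above that quota at an overage rate. The tier name, fee, quota and rates below are hypothetical placeholders, not real Azure prices; take the real values from the pricing page of the specific service.

```python
from dataclasses import dataclass


@dataclass
class CommitmentTier:
    """Hypothetical commitment tier: fixed monthly fee plus overage billing."""
    name: str
    monthly_fee: float     # paid every month, whether or not the quota is used
    included_units: int    # units (e.g. transactions) covered by the fee
    overage_rate: float    # price per unit consumed above the included quota


def payg_cost(units: int, rate_per_unit: float) -> float:
    """Pay-as-you-go: every unit is billed at the same rate."""
    return units * rate_per_unit


def commitment_cost(units: int, tier: CommitmentTier) -> float:
    """Fixed fee plus overage for any units beyond the quota."""
    overage_units = max(0, units - tier.included_units)
    return tier.monthly_fee + overage_units * tier.overage_rate


def cheapest_option(units: int, payg_rate: float,
                    tiers: list[CommitmentTier]) -> tuple[str, dict[str, float]]:
    """Return the cheapest billing option and the monthly cost of every option."""
    costs = {"PAYG": payg_cost(units, payg_rate)}
    for tier in tiers:
        costs[tier.name] = commitment_cost(units, tier)
    best = min(costs, key=costs.get)
    return best, costs


# Hypothetical prices, for illustration only.
TIERS = [CommitmentTier("Commitment-1M", monthly_fee=700.0,
                        included_units=1_000_000, overage_rate=0.0008)]
PAYG_RATE = 0.001  # hypothetical price per unit

for monthly_units in (300_000, 1_200_000):
    best, costs = cheapest_option(monthly_units, PAYG_RATE, TIERS)
    print(monthly_units, best, {k: round(v, 2) for k, v in costs.items()})
```

With these made-up numbers, PAYG wins at low volume and the commitment tier wins once usage is high enough to amortize the fixed fee; the FinOps work is knowing your real monthly volume before committing.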

Azure OpenAI is part of the Azure AI services listed above. Although it appears in that area of your Azure portal, only the lucky companies and users who are approved after submitting an access request can enjoy this new experience (as of March 2024).

To be honest, this is where the revolution comes. The spring of AI arrives with these foundation models (already pre-trained and fine-tuned).

Basically, you can choose among several models: LLMs such as GPT-3.5 Turbo and GPT-4, embedding models such as Ada, and models that generate images from text such as DALL-E 2 or DALL-E 3.

Azure OpenAI supports many generative AI workloads such as:

  • Generating Natural Language
    • Text completion: generate and edit text
    • Embeddings: search, classify, and compare text
  • Generating Code: generate, edit, and explain code
  • Generating Images: generate and edit images

PRICING ranges from PAYG to PTUs (Provisioned Throughput Units). PAYG is usually billed per 1,000 tokens for LLMs and per 100 images for image models.
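Because PAYG is billed per 1,000 tokens (with prompt and completion tokens priced separately for the GPT chat models) and per 100 images, a first monthly estimate is simple arithmetic. Here is a minimal Python sketch; the workload and the prices are hypothetical placeholders, not real Azure OpenAI rates, and PTUs, which reserve throughput capacity instead of billing per token, are not modelled here.

```python
def token_cost(prompt_tokens: int, completion_tokens: int,
               input_price_per_1k: float, output_price_per_1k: float) -> float:
    """PAYG cost of one request billed per 1,000 tokens (input and output priced separately)."""
    return ((prompt_tokens / 1000) * input_price_per_1k
            + (completion_tokens / 1000) * output_price_per_1k)


def image_cost(images: int, price_per_100_images: float) -> float:
    """PAYG cost of image generation billed per 100 images."""
    return (images / 100) * price_per_100_images


def monthly_llm_cost(requests_per_day: int, prompt_tokens: int, completion_tokens: int,
                     input_price_per_1k: float, output_price_per_1k: float,
                     days: int = 30) -> float:
    """Average request cost multiplied by the monthly request volume."""
    per_request = token_cost(prompt_tokens, completion_tokens,
                             input_price_per_1k, output_price_per_1k)
    return requests_per_day * days * per_request


# Hypothetical workload and prices, NOT real Azure OpenAI rates.
estimate = monthly_llm_cost(requests_per_day=10_000, prompt_tokens=500, completion_tokens=200,
                            input_price_per_1k=0.01, output_price_per_1k=0.03)
print(f"Estimated monthly PAYG cost: {estimate:.2f}")  # Estimated monthly PAYG cost: 3300.00
```

Keeping prompt and completion tokens separate matters: output tokens are typically the more expensive ones, so long answers can dominate the bill even with short prompts.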

If you log in to Azure AI Studio, you can start working with OpenAI models as well as open models such as Llama from Meta and Mistral from Mistral AI (Europe is still alive!!). Keep in mind that Azure AI Studio is still in public preview at the time of writing :-), and Microsoft doesn't recommend it for production workloads.

Right now, if you want to run Llama 2 in production, go to the Azure Marketplace model offering.

Llama 2 and Mistral are as powerful as the OpenAI models, but they are less integrated than the models from Sam Altman's company.

Currently, the only way to deploy Llama-2-70b-chat, for example, is with pay-as-you-go, token-based billing. Moreover, the pay-as-you-go model deployment offering is only available with AI hubs created in the East US 2 and West US 3 regions.

While Llama should be deployed through the Marketplace (the classic method), Microsoft seems to advise using Azure AI Studio to deploy Mistral.

Regarding how you consume and pay for these models in the cloud, you also have several options. Mistral AI offers two categories of models:

  • Premium models: Mistral Large. These models are available with PAYG token-based billing through Models as a Service in the AI Studio model catalog.
  • Open models: Mixtral-8x7B-Instruct-v01, Mixtral-8x7B-v01, Mistral-7B-Instruct-v01, and Mistral-7B-v01. These models are also available in the AI Studio model catalog and can be deployed to dedicated VM instances in your own Azure subscription with Managed Online Endpoints.
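The two options above bill very differently: Models as a Service charges only for the tokens processed, while a managed online endpoint charges for every hour its dedicated VMs run, whatever the traffic. Here is a minimal Python sketch of the break-even point between the two billing models, assuming a single blended price per 1,000 tokens and the 730-hour month commonly used by cloud pricing calculators. The VM rate and token price are hypothetical placeholders, and the comparison ignores storage, networking and the fact that Mistral Large and the open models are different models with different quality.

```python
HOURS_PER_MONTH = 730  # approximation commonly used by cloud pricing calculators


def dedicated_monthly_cost(vm_hourly_rate: float, instances: int = 1,
                           hours: float = HOURS_PER_MONTH) -> float:
    """Managed online endpoint on dedicated VMs: billed per running hour, whatever the traffic."""
    return vm_hourly_rate * instances * hours


def maas_monthly_cost(tokens: int, blended_price_per_1k: float) -> float:
    """Models as a Service: billed only for the tokens actually processed."""
    return (tokens / 1000) * blended_price_per_1k


def break_even_tokens(vm_hourly_rate: float, blended_price_per_1k: float,
                      instances: int = 1) -> float:
    """Monthly token volume above which dedicated VMs become cheaper than PAYG tokens."""
    return dedicated_monthly_cost(vm_hourly_rate, instances) / blended_price_per_1k * 1000


# Hypothetical rates, for illustration only.
VM_HOURLY_RATE = 4.0          # hypothetical GPU VM price per hour
BLENDED_PRICE_PER_1K = 0.01   # hypothetical average price per 1,000 tokens

threshold = break_even_tokens(VM_HOURLY_RATE, BLENDED_PRICE_PER_1K)
print(f"Break-even: about {threshold:,.0f} tokens per month")
for tokens in (50_000_000, 500_000_000):
    payg = maas_monthly_cost(tokens, BLENDED_PRICE_PER_1K)
    vm = dedicated_monthly_cost(VM_HOURLY_RATE)
    cheaper = "MaaS (PAYG)" if payg < vm else "dedicated VM"
    print(f"{tokens:,} tokens -> {cheaper}")
```

Below the break-even volume, an idle dedicated endpoint is money burned; above it, paying per token becomes the expensive choice.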

In the next post we will look at how we pay for tokens and how you can use tools such as Azure AI hubs, the Azure Pricing Calculator, or Azure Cost Management to deal with this mess, plus some other surprises.

Enjoy the journey to the cloud with me… see you soon.
