Token Alchemy: Turning Ideas into Hard Numbers

There’s no shortage of hype around generative AI. From dinner table debates to executive boardrooms, people are abuzz with talk of AI transforming everything, from coding to customer service, risk analysis to recipe generation. Across industries, leaders are feeling the pressure to “do something” with AI. But what exactly?

As businesses look for ways to improve productivity and reduce costs, inference-based solutions can offer a smart entry point into the generative AI era.

As organizations explore modernizing legacy applications, integrating inference-based functionality could be a game changer. But while everyone loves to demo generative AI, few are talking about what it actually costs to run at scale, beyond a vague sense that it looks expensive.

From Cool Demo to Scalable Reality

Let’s talk about the hard part.

Once you move beyond a clever prototype and start considering inference in a production setting, several challenges appear. First, there’s latency and performance: your model needs to return results fast enough for real-world use.

Then there’s infrastructure. Do you run this on CPUs? GPUs? Where? And let’s not forget model size, fine-tuning, and security.

Today, I’d recommend starting with inference before diving into fine-tuning. But one question tends to dominate stakeholder discussions: what does it actually cost to run?

Whether you’re using OpenAI’s GPT models, your own LLaMA instance on Azure, or Hugging Face models via containers, the real question is: what’s the dollar cost per inference? And more importantly, what’s the cost per business transaction?

Three Paths to Inference Deployment

Let’s break down the three most common paths for running inference:

Option 1: API-Based Inference (Token-Based Services)

  • How it works: You consume a model via a managed API (like OpenAI, Azure OpenAI, or Cohere). You pay per token used; a minimal sketch follows this list.
  • Pros: No infrastructure overhead, rapid setup, great for experimentation and burst workloads.
  • Cons: Limited control over performance, latency, and data governance. You’re locked into model choices and pricing.
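
To make the token-billing model concrete, here’s a minimal sketch using the OpenAI Python SDK. The model name is an illustrative assumption; any token-priced chat model behaves the same way, and the usage object returned with each call is exactly what you’d feed into per-request cost tracking.

```python
# Minimal sketch of API-based inference with token accounting.
# Assumes the OpenAI Python SDK and an OPENAI_API_KEY in the
# environment; the model name is illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # assumption: any token-priced chat model
    messages=[{"role": "user", "content": "List three common site-safety violations."}],
)

# Every call reports its own token usage, which is what makes
# per-request cost tracking simple on this path.
usage = response.usage
print(f"input tokens:  {usage.prompt_tokens}")
print(f"output tokens: {usage.completion_tokens}")
print(f"total tokens:  {usage.total_tokens}")
```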

Option 2: Containerized Inference (Self-Hosted Models)

  • How it works: You run models like LLaMA or DeepSeek in your own cloud (or even on-prem) using GPU VMs; a calling sketch follows this list.
  • Pros: Full control over the model and tuning, consistent performance, and easier cost predictability at scale.
  • Cons: High setup complexity, need for ML engineering expertise, GPU cost volatility, and you carry the burden of uptime and scaling.
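
Self-hosting sounds more exotic than it is at the calling layer. Serving frameworks like vLLM expose an OpenAI-compatible endpoint, so the client code looks almost identical to the API path. The base URL and model name below are assumptions for illustration.

```python
# Minimal sketch of calling a self-hosted model through an
# OpenAI-compatible endpoint (vLLM and similar servers expose one).
# The base URL and model name are illustrative assumptions.
from openai import OpenAI

client = OpenAI(
    base_url="http://gpu-vm.internal:8000/v1",  # hypothetical GPU VM endpoint
    api_key="not-needed-for-local",             # self-hosted servers often ignore this
)

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # assumption: an 8B-class model
    messages=[{"role": "user", "content": "List three common site-safety violations."}],
)
print(response.choices[0].message.content)
```

The cost difference is in where the meter runs: here you pay for the VM by the hour whether or not it’s busy, which is why utilization drives cost predictability on this path.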

Option 3: Hybrid Model (Burstable Inference)

  • How it works: You run a base level of dedicated GPU capacity and burst to an API when demand spikes.
  • Pros: Balances cost and performance, reduces latency under load, and provides fallback capacity.
  • Cons: Requires orchestration logic and potentially dual billing models, with added complexity to monitor; a minimal routing sketch follows this list.
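
What does that orchestration logic look like? Here’s a minimal routing sketch under stated assumptions; the endpoint URLs, the queue-depth check, and the threshold are all hypothetical placeholders, not a production pattern.

```python
# Minimal sketch of burst routing for a hybrid deployment.
# Endpoint URLs, the queue-depth metric, and the threshold are
# all hypothetical assumptions.
import requests

DEDICATED_URL = "https://gpu-pool.internal/v1/generate"  # hypothetical base capacity
FALLBACK_URL = "https://api.example.com/v1/generate"     # hypothetical burst API
MAX_QUEUE_DEPTH = 8  # assumption: burst once the GPU pool backs up

def current_queue_depth() -> int:
    """Placeholder: in practice, read this from your GPU pool's metrics."""
    return 3

def route_inference(prompt: str) -> str:
    """Use dedicated GPUs while there is headroom, otherwise burst to the API."""
    url = DEDICATED_URL if current_queue_depth() < MAX_QUEUE_DEPTH else FALLBACK_URL
    response = requests.post(url, json={"prompt": prompt}, timeout=30)
    response.raise_for_status()
    return response.json()["text"]
```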

What’s the Cost Per Business Transaction?

This is where it gets real.

An API request or a GPU inference run is not a business outcome. To justify the investment to leadership, you need to tie this back to actual workflows.

Use Case: Construction Site Safety Inspection with AI

Here’s a process you could automate with generative AI (sketched in code after the list):

  1. A construction site photo is uploaded.
  2. A 10MB safety policy document is ingested.
  3. The model identifies any safety violations in the image by comparing it against the policy.
  4. A risk register is generated with identified issues and proposed mitigations.
  5. Tasks are created for the site manager to resolve each issue.
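
As a shape-of-the-solution sketch, the whole pipeline collapses into a couple of functions. Everything here is a stub or assumption: the analysis call, the risk schema, and the task-system integration are placeholders, not a real implementation.

```python
# Stub sketch of the five-step inspection pipeline above.
# The analysis call and the task system are placeholders.
from dataclasses import dataclass

@dataclass
class RiskItem:
    violation: str
    mitigation: str

def analyze(photo: bytes, policy: str) -> list[RiskItem]:
    """Placeholder for one multimodal inference: photo vs. policy (steps 1-4)."""
    return [RiskItem("Missing guardrail on scaffold", "Install guardrail before next shift")]

def create_task(assignee: str, body: str) -> None:
    """Placeholder for your ticketing integration (step 5)."""
    print(f"[task -> {assignee}] {body}")

def inspect_site(photo: bytes, policy: str) -> list[RiskItem]:
    risks = analyze(photo, policy)
    for item in risks:
        create_task("site-manager", f"{item.violation}. Mitigation: {item.mitigation}")
    return risks
```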

As a ballpark estimate, let’s say a site inspection costs an average of $250 and takes about 3 hours per visit.

What would it look like if you could automate most of this, run it daily across every construction site, and only send a human when a high-risk site is identified?

Token Math: Estimating Inference Cost with Real Data

Let’s get into the numbers. A quick and dirty way to estimate inference costs is what I call “token math.”

Assumptions:

  • Policy document: ~10MB file → ~40,000–50,000 tokens of relevant extracted text
  • Photo (analyzed for context): ~500–1,000 tokens
  • Prompt: 100–300 tokens
  • Output (structured data, tasks, risk register): 500–2,000 tokens

That gives us a total token count per job of ~41,000–53,000, depending on prompt structure and policy complexity.

Now let’s look at the cost (a worked sketch follows this list):

  • Containerized GPU run (e.g., LLaMA 8B on Azure low-end GPU VM): ≈ $0.72 per scan at the high end of token usage
  • API-based inference with a similar model: ≈ $0.05 per scan
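
Here’s the same token math as a script you can re-run with your own numbers. The blended per-million-token price is an assumption chosen to land near the API figure above; swap in your provider’s actual rates.

```python
# "Token math" for the inspection scenario. The per-token price
# is an illustrative assumption, not any provider's price sheet.
TOKEN_RANGES = {
    "policy": (40_000, 50_000),
    "photo":  (500, 1_000),
    "prompt": (100, 300),
    "output": (500, 2_000),
}
PRICE_PER_MTOK = 1.00  # assumed blended $/1M tokens (input + output)

low = sum(lo for lo, _ in TOKEN_RANGES.values())
high = sum(hi for _, hi in TOKEN_RANGES.values())

def scan_cost(tokens: int) -> float:
    """Dollar cost of one scan at the assumed blended rate."""
    return tokens * PRICE_PER_MTOK / 1_000_000

print(f"tokens per scan: {low:,}-{high:,}")                              # ~41,100-53,300
print(f"cost per scan:   ${scan_cost(low):.3f}-${scan_cost(high):.3f}")  # ~$0.04-$0.05
```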

This doesn’t include supporting cloud infrastructure (storage, networking, orchestration), but those are relatively predictable costs that most teams already model.

So even at the higher end, $0.72 vs. $250 per inspection? That’s an eye-popping reduction. Even if you only automated part of the process and cut site visits in half, the ROI becomes clear.

What’s the Takeaway?

As you consider deploying generative AI in production, especially for inference-heavy use cases, the deployment model you choose has a dramatic impact on cost and flexibility.

  • APIs are great for speed and scale
  • Containers give you control and cost predictability
  • Hybrid models offer balance, if you’re ready for the complexity

But no matter the tech stack, the business case is won or lost on how clearly you map tokens to transactions, and dollars to outcomes.

Links to head over to if you want to read more:

Hugging Face Inference Endpoints

If you’re interested in responsible AI, read Responsible AI: Ethical policies and practices | Microsoft AI

If you want to build workflow solutions and inject GenAI, check out Copilot Studio: https://www.microsoft.com/microsoft-copilot/microsoft-copilot-studio


Demystifying the ISV Partner Channel

Why Partner Ecosystems Matter for ISVs

For Independent Software Vendors (ISVs), scaling beyond a direct sales model is no longer a luxury; it’s a necessity. In today’s competitive SaaS landscape, partner ecosystems serve as powerful growth engines that enable global reach, faster customer onboarding, deeper market penetration, and scalable service delivery.

Whether you’re just starting to explore indirect channels or refining a mature partner strategy, one truth remains: not all partners are created equal.

Navigating the partner ecosystem can feel like alphabet soup—GSI, NSI, SI, ISV, VAR, LAR, MSP. Each partner type brings different strengths, incentives, and go-to-market (GTM) models. Without a clear understanding of their roles, ISV leaders risk misalignment, lost revenue, or poor partner engagement. This guide breaks it all down.

One Size Doesn’t Fit All

Many ISVs step into the partner world assuming more partners mean more revenue; in practice, more partners often just means more busy work. Quality trumps quantity. A GSI won’t solve the same problems as a VAR. An MSP isn’t going to co-develop your product like an OEM might. Clarity is key.

This post helps you:

  • Understand the major partner types
  • Recognize what each brings to the table
  • Identify which ones are right for your strategy

Let’s look at the partner landscape.


Putting It All Together: Strategy Meets Partner Type (with a Microsoft Ecosystem Lens)

Microsoft’s partner ecosystem is one of the most mature and structured in the software industry, making it a powerful lens through which to understand how different partner types contribute to an ISV’s go-to-market success.

Each partner type aligns to specific business goals within Microsoft’s Cloud Partner Program and Azure Marketplace. Here’s how it typically breaks down:

| Partner Type | Ideal For | Example Microsoft-Aligned ISV Scenarios |
| --- | --- | --- |
| Global System Integrator (GSI) | Enterprise deals, global delivery | Dynamics 365 or Power Platform transformation via Accenture or TCS |
| National System Integrator (NSI) | Regional growth, regulated industries | Slalom delivering Azure data solutions to U.S. healthcare orgs |
| System Integrator (SI) | Mid-market access; products that need services to succeed (ideally industry-focused, though often not) | Nimble, flexible SIs offering everything from Microsoft Teams integrations to custom software and agentic solutions |
| Independent Software Vendor (ISV) | Product innovation and ecosystem plays | DocuSign integrating with Microsoft 365 or Power Automate connectors |
| Value Added Reseller (VAR) | Mid-market sales & deployment | CDW bundling Microsoft 365 with cybersecurity solutions |
| Large Account Reseller (LAR) | Licensing scale & procurement | SoftwareONE reselling Microsoft 365 or Azure consumption SKUs |
| Distributor | Reach and partner enablement | TD Synnex recruiting resellers for Azure and Defender bundles |
| Managed Service Provider (MSP) | Operational management & retention | Rackspace offering managed Azure and Microsoft 365 environments |

In the Microsoft world, co-sell readiness and marketplace presence are essential. ISVs looking to succeed here often:

  • Register as co-sell ready in Partner Center
  • List solutions in Microsoft AppSource or Azure Marketplace
  • Enable their partners through Solution Workspace and PDM (Partner Development Manager) relationships

Mapping your partner mix to Microsoft’s ecosystem can provide both strategic leverage and operational scale. Understanding how these partner types fit into Microsoft’s tiered programs, incentives, and co-sell motions can significantly accelerate your ISV growth.


Real-World Applications: How ISVs Use a Partner Mix

Below are some simple examples:

1. ServiceNow

  • Works with GSIs, NSIs, and SIs (e.g., Deloitte, Accenture) for enterprise transformation
  • Builds with ISVs for App Store extensions
  • Leverages SIs and VARs for implementation

2. Okta

  • Partners with SIs and MSPs for identity management rollouts
  • Collaborates with ISVs (e.g., Zoom, Slack) for SSO integrations
  • Uses distributors to reach smaller resellers

3. Atlassian

  • Strong ecosystem of marketplace ISVs
  • Engages SIs for Jira Service Management implementations
  • Scales globally with distributors like Arrow

Know Your Partners, Know Your Growth

As your ISV scales, your partner strategy must evolve. Understanding the strengths and roles of each partner type ensures you can:

  • Design effective co-sell and delivery models
  • Fill ecosystem gaps with purpose
  • Accelerate growth without adding internal overhead

Next steps:

  • Map your ideal partner profile by business goal
  • Evaluate your current ecosystem for coverage and alignment
  • Explore partner enablement and co-sell plays

Excited for the Power Platform Community Conference!

I’m thrilled to be attending the Power Platform Community Conference on September 18th! 🎉 This event is a fantastic opportunity to connect with like-minded professionals, learn about the latest innovations in the Power Platform, and dive deep into the tools that help shape so many incredible solutions. I can’t wait to see what new ideas and best practices I’ll be able to bring back to my work after attending.

If you’re planning on joining or just curious, check out the event here: Power Platform Community Conference 2024.

Looking forward to sharing what I learn with all of you!

Blog refresh time!

I’m starting again, again. I’ve just moved hosting providers (again) and I’m also starting afresh. Over the coming month I’ll be drafting and publishing some new content!

What am I looking to share? At this stage I want to share content about BBQ’ing, tech, and life in general.