On-device or cloud AI: a decision rule

Apple's Foundation Models framework changes the on-device AI calculus. Here is a practical decision framework for choosing platform-native versus cloud LLMs.

Apple's Foundation Models framework update now supports image input alongside text, and Xcode ships with agentic coding workflows that go well beyond autocomplete. For teams building on Apple platforms, the question is no longer whether on-device AI is viable. It is when to use it instead of a cloud LLM, and how to restructure engineering teams around agents that write code.

We have been advising clients through this shift across our Remote Development Teams and Fractional CTO engagements. The pattern we see is that most teams default to cloud LLMs by habit, then quietly burn margin on inference costs and latency that an on-device model would have handled. Others over-rotate to on-device and ship features that need a frontier model to be useful. Both are avoidable.

What actually changed with Apple's Foundation Models

The framework exposes Apple's roughly 3B-parameter on-device model directly to developers, with structured output via Swift macros, tool calling, and now multimodal image input. It runs on Apple Silicon with no network round trip, no per-token billing, and no data leaving the device. Apple's developer documentation is worth reading end to end before scoping anything.

The model is small. It is not GPT-4-class and will not pretend to be. What it is good at: classification, summarisation, structured extraction, intent parsing, and constrained generation inside a known domain. Tasks that used to require a cloud call for a 200-token response now run locally in under a second on recent hardware.

The Xcode side is equally significant. Agentic coding workflows mean the IDE can plan multi-file changes, execute them, run tests, and iterate. This is not Copilot. It is closer to having a junior engineer who never gets tired, but who also never asks for clarification, even when it should.

A decision framework for on-device AI versus cloud

Here is the rule we apply with clients. Choose on-device when at least three of the following are true:

The task fits in a constrained domain with a clear schema for the output.
Latency under one second matters to the user experience.
The data is sensitive enough that sending it to a third party creates compliance friction.
Inference volume is high enough that cloud costs would exceed roughly five cents per active user per month.
The feature must work offline or in low-connectivity environments.

Choose cloud when any of these apply:

The task requires reasoning across long contexts, broad world knowledge, or creative generation.
You need the most capable available model and are willing to pay for it.
Output quality variance directly affects revenue, for example sales copy or legal drafts.
You need to swap models frequently as the frontier moves.
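
The two checklists above can be sketched as a small scoring function. This is a minimal illustration rather than a definitive implementation: the Feature fields, the function names, and the "hybrid" and "undecided" outcomes are assumptions introduced here. In particular, the rule as stated does not say what to do when three on-device criteria hold and a cloud criterion also applies, so this sketch treats that overlap as a hybrid candidate.

```python
from dataclasses import dataclass

# Threshold taken from the rule above: roughly five cents per active user per month.
COST_THRESHOLD_USD = 0.05


@dataclass
class Feature:
    """Hypothetical description of one AI feature; field names are illustrative."""
    # On-device criteria
    constrained_schema: bool
    needs_subsecond_latency: bool
    sensitive_data: bool
    cloud_cost_per_user_month_usd: float
    must_work_offline: bool
    # Cloud criteria
    needs_long_context_or_world_knowledge: bool = False
    needs_most_capable_model: bool = False
    quality_variance_affects_revenue: bool = False
    needs_frequent_model_swaps: bool = False


def on_device_score(f: Feature) -> int:
    """Count how many of the five on-device criteria hold."""
    return sum([
        f.constrained_schema,
        f.needs_subsecond_latency,
        f.sensitive_data,
        f.cloud_cost_per_user_month_usd > COST_THRESHOLD_USD,
        f.must_work_offline,
    ])


def cloud_triggered(f: Feature) -> bool:
    """True when any one of the four cloud criteria applies."""
    return any([
        f.needs_long_context_or_world_knowledge,
        f.needs_most_capable_model,
        f.quality_variance_affects_revenue,
        f.needs_frequent_model_swaps,
    ])


def choose_target(f: Feature) -> str:
    on_device = on_device_score(f) >= 3
    cloud = cloud_triggered(f)
    if on_device and cloud:
        return "hybrid"      # assumption: overlap is treated as a hybrid candidate
    if on_device:
        return "on-device"
    if cloud:
        return "cloud"
    return "undecided"       # assumption: the rule gives no default for this case


# Example: a high-volume classifier over sensitive notes.
note_classifier = Feature(
    constrained_schema=True,
    needs_subsecond_latency=True,
    sensitive_data=True,
    cloud_cost_per_user_month_usd=0.12,
    must_work_offline=False,
)
print(choose_target(note_classifier))
```

Writing the rule down this way forces a team to answer each criterion explicitly for every feature, which is most of the value of the audit.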

The interesting middle case is hybrid. Route on-device first, then escalate to cloud when the local model's confidence is low. We have shipped this pattern for a healthcare client where the on-device model handled 80% of clinical note classification, with cloud reserved for ambiguous cases flagged by the local model. Inference spend dropped by roughly 70% and median latency improved.
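
A minimal sketch of that routing pattern follows. It assumes each model is wrapped in a function that returns a label and a confidence score between 0 and 1. How that confidence is obtained is left open here, and nothing in this sketch is a real framework API. The names route, fake_local, and fake_cloud, and the 0.8 threshold, are illustrative and not taken from the client project.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# A classifier takes text and returns (label, confidence in the range 0.0 to 1.0).
Classifier = Callable[[str], Tuple[str, float]]


@dataclass
class Result:
    label: str
    confidence: float
    source: str  # "on-device" or "cloud"


def route(text: str, local: Classifier, cloud: Classifier,
          threshold: float = 0.8) -> Result:
    """Try the local model first; escalate to cloud only when confidence is low."""
    label, confidence = local(text)
    if confidence >= threshold:
        return Result(label, confidence, "on-device")
    # Ambiguous case flagged by the local model: pay for the cloud call.
    label, confidence = cloud(text)
    return Result(label, confidence, "cloud")


# Stand-in classifiers for demonstration only.
def fake_local(text: str) -> Tuple[str, float]:
    # Pretends short notes are easy and long notes are ambiguous.
    return ("routine", 0.95) if len(text) < 40 else ("routine", 0.40)


def fake_cloud(text: str) -> Tuple[str, float]:
    return ("needs-review", 0.90)


short_note = "BP normal, follow up in 6 months"
long_note = "Patient reports intermittent symptoms with unclear onset and mixed history"

print(route(short_note, fake_local, fake_cloud).source)
print(route(long_note, fake_local, fake_cloud).source)
```

The threshold is the main tuning knob: raising it sends more traffic to the cloud and costs more, while lowering it keeps more traffic local at the risk of accepting weaker answers. Logging the source of each result makes the on-device share measurable.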

What agentic coding does to team structure

The Xcode agentic tooling raises a harder question than which model to use. It changes what a senior engineer's day looks like.

If an agent can take a Linear ticket, write the implementation across five files, run tests, and open a PR, then the bottleneck moves from typing speed to review quality, architectural judgement, and taste. The engineers who thrive are the ones who can specify problems precisely and evaluate generated code quickly. The engineers who struggle are the ones whose value was in the mechanical production of correct code.

This has three practical implications for engineering leaders.

Hiring shifts toward senior

The productivity floor for a competent senior engineer with agentic tooling is now substantially higher than it was twelve months ago. The productivity ceiling for a junior is largely unchanged, because juniors lack the judgement to catch the agent's mistakes. We are seeing clients consolidate around smaller, more senior teams. This is consistent with what we see across our 500+ engineer network. The seniors who pair agentic tooling with strong system design are pulling away from the field.

Code review becomes the work

When agents produce most of the line-by-line code, the review process becomes the primary place where engineering quality is established. Teams that still treat review as a rubber stamp will ship more bugs, not fewer. The teams winning here are investing in review tooling, automated checks, and explicit standards for what an agent is allowed to merge unsupervised.

Technical due diligence gets harder

For investors and acquirers, the agentic shift complicates code quality assessment. A codebase can look clean and conventional while being substantially agent-generated, with all the subtle architectural drift that implies. Our Technical Due Diligence practice now explicitly assesses agent usage patterns, review discipline, and the team's ability to maintain code they did not write line by line. This is a real diligence risk that did not exist two years ago.

What to do in the next quarter

If you ship on Apple platforms, audit your current cloud LLM usage against the decision rule above. There is almost certainly a feature you are paying cloud inference for that should run on device. Start with the highest-volume, lowest-complexity task.

If you are scaling your engineering team, reconsider the senior-to-junior ratio in light of agentic tooling. The math has shifted. A Fractional CTO engagement is often the fastest way to pressure-test team structure against where the tooling is actually going, not where it was when you last hired.

If you are evaluating a target for acquisition, ask explicitly about agent usage in the codebase. The answer tells you something important about technical debt that will not show up in a standard code scan.

The teams that treat Apple's Foundation Models framework as just another SDK will get marginal benefit. The teams that rebuild their assumptions about where intelligence lives, on device or in the cloud, and how engineers spend their time, will compound the advantage over the next two years.
