One traditional reason enterprises targeted 40% utilization was to cover DR/failover: one region could take on 100% of another's traffic and still have 20% headroom.
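The arithmetic behind that 40% figure can be sketched in a few lines of Python. This assumes two equally sized regions carrying equal load, which the comment does not state; the function names are made up for illustration.

```python
def failover_utilization(steady_state: float) -> float:
    """Utilization of the surviving region once it absorbs all traffic
    from one equally sized, equally loaded peer (an assumption)."""
    return 2 * steady_state


def failover_headroom(steady_state: float) -> float:
    """Fraction of the surviving region's capacity still free after failover."""
    return 1.0 - failover_utilization(steady_state)


if __name__ == "__main__":
    target = 0.40  # steady-state utilization target from the comment
    print(f"after failover: {failover_utilization(target):.0%} used, "
          f"{failover_headroom(target):.0%} headroom")
```

Under those assumptions, a region running at 40% ends up at 80% after a failover, which is where the 20% headroom comes from.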
I'm curious about the granularity of contracts around granting/selling excess capacity. Are they short term? Can the owner evict those workloads (with a penalty)?
Good point - people do set capacity aside, reserving it for later.
But our utilisation measurements capture waste within a user’s allocation. It’s waste in what users are actually requesting and running, not reserved idle capacity.
For now we sit only on the prediction/intelligence layer; we don’t do any scheduling. We don’t grant or sell capacity, we just tell the scheduler (and user) what a job actually needs.
This is a cool idea. I know from snooping on submit scripts and node utilization on the HPC cluster I use at my institution that most submissions leave some compute on the table (and many of them are egregiously bad). I'd probably vote in favor of sending every submitted sbatch script through an LLM (at least for everyone else; I'd prefer tuning my own usage myself :) ).
Presumably the underlying model here is also an LLM? To what degree is it "fine-tuned", or is it just given a set of tools to build a good picture of cluster usage?
Nope :) the core model isn’t an LLM. It’s a custom architecture built from the ground up. We natively accept multimodal inputs such as source code, submission scripts and hardware topologies. The LLMs in the post are the baselines we beat.
This is also why fine-tuning matters for us. We train a cluster-specific model that gets better as more jobs run on your cluster, because the same code behaves differently on different topologies. An LLM reasons about code and scripts in a vacuum, with no native sense of how your nodes actually perform.
> Datacenters run at roughly 30% to 40% effective utilisation
I wonder what is stopping datacenters from passing this benefit on to customers by launching better-tuned plans, like the t series EC2 instances on AWS.
https://www.linkedin.com/posts/rahmi-pruitt-a1bb4a127_agentn...
I feel like it’s probably just complexity.
Different workloads benefit from specific types of optimisations.
I’ll send you an email, good luck with the book!