<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Inference on Peter Grimshaw&#39;s Site</title>
    <link>https://pagrim.github.io/tags/inference/</link>
    <description>Recent content in Inference on Peter Grimshaw&#39;s Site</description>
    <generator>Hugo -- gohugo.io</generator>
    <language>en-gb</language>
    <lastBuildDate>Mon, 29 Sep 2025 08:05:01 +0100</lastBuildDate>
    <atom:link href="https://pagrim.github.io/tags/inference/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>ML Inference with BentoML</title>
      <link>https://pagrim.github.io/post/bentoml-chronos/</link>
      <pubDate>Mon, 29 Sep 2025 08:05:01 +0100</pubDate>
      <guid>https://pagrim.github.io/post/bentoml-chronos/</guid>
      <description>When it comes to deploying machine learning models into production, there’s no shortage of tools available. I’ve been exploring the landscape of ML inference frameworks, trying to understand the trade-offs and strengths of different options. I spent a bit of time investigating BentoML a while back, and really liked its user-friendly design and focus on model serving.&#xA;How widely used is BentoML? For fun—and a bit of insight—I compared three well-known ML serving tools using Google Trends: BentoML, NVIDIA’s Triton Inference Server, and KServe.</description>
    </item>
  </channel>
</rss>
