BentoML vs Cortex - ML Serving Showdown

To find the best model serving tool, compare open-source MLOps platforms BentoML and Cortex.

Update 2022-02-02 - bought by Databricks

Consider migrating away from Cortex: it was bought by Databricks and is currently only maintained, not actively developed. There are several alternatives. Without a cluster, you can deploy directly to EC2 with AWS CLI scripts, or use Fargate. If you are willing to maintain a cluster, Terraform is a popular choice for managing it. For deploying, consider using e.g. the Kubernetes Python client.
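As a rough illustration of the last option, here is a minimal sketch of creating a Deployment with the official `kubernetes` Python package. The service name, image, replica count, and port are hypothetical placeholders, and the sketch assumes a working kubeconfig on the machine running it.

```python
from typing import Any, Dict


def build_deployment(name: str, image: str, replicas: int = 2, port: int = 8080) -> Dict[str, Any]:
    """Return an apps/v1 Deployment manifest as a plain dict."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            # The selector must match the pod template labels below.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,  # hypothetical model-serving image
                            "ports": [{"containerPort": port}],
                        }
                    ]
                },
            },
        },
    }


def apply_deployment(manifest: Dict[str, Any], namespace: str = "default") -> None:
    """Create the Deployment in the cluster selected by the local kubeconfig."""
    # Imported lazily so the manifest builder works without the package installed.
    from kubernetes import client, config

    config.load_kube_config()
    client.AppsV1Api().create_namespaced_deployment(namespace=namespace, body=manifest)
```

Calling `apply_deployment(build_deployment("model-api", "registry.example.com/model-api:latest"))` would then create the Deployment. Keeping the manifest builder separate from the API call makes it easy to inspect or unit test the manifest before touching a cluster.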

Original Post:

Do you need a simple way to train and host your machine learning models in the cloud? Here is my experience with Cortex Labs’s Cortex. My view on BentoML is based on a cursory overview of their documentation. Also check out the AWS App Runner option at the end.


  • both deploy and serve models via API
  • both support major ML frameworks (TensorFlow, PyTorch)
  • both have good documentation
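To make "serving a model via API" concrete, here is a minimal sketch that uses only the Python standard library. It does not use the BentoML or Cortex APIs; the `predict` function, the `features` field, and the port are hypothetical stand-ins for a real model and its request schema.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import List, Tuple


def predict(features: List[float]) -> float:
    """Stand-in for a real model: returns the mean of the input features."""
    return sum(features) / len(features)


def handle_predict(body: bytes) -> Tuple[int, bytes]:
    """Turn a JSON request body into an (HTTP status, JSON response body) pair."""
    try:
        features = json.loads(body)["features"]
        result = predict([float(x) for x in features])
    except (ValueError, KeyError, TypeError, ZeroDivisionError):
        # Malformed JSON, missing field, non-numeric values, or an empty list.
        return 400, json.dumps({"error": "expected a JSON object with a list of numbers in 'features'"}).encode()
    return 200, json.dumps({"prediction": result}).encode()


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        status, payload = handle_predict(self.rfile.read(length))
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port: int = 8080) -> None:
    """Block forever, answering prediction requests on the given port."""
    HTTPServer(("0.0.0.0", port), PredictHandler).serve_forever()
```

Both tools wrap this kind of request handling for you and add packaging, deployment, and scaling on top, which is where they differ.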


Language:
  • BentoML: fully Python (easier to modify?)
  • Cortex: Go & Python wrapper (Updated)

Deployment:
  • BentoML: delegated to other tools: Docker, Kubernetes, AWS Lambda, SageMaker, GCP, Azure ML and more
  • Cortex: currently works only with local Docker and AWS EKS, GCP (Updated)

Service configuration:
  • BentoML: from Python
  • Cortex: from Python (Updated)

Service packaging and distribution:
  • BentoML: can be packaged and saved via a Python command to a management repository with a web dashboard, or to PyPI
  • Cortex: packaging only via Docker images, without explicit support

Horizontal scaling:
  • BentoML: configured separately in other clustering tools; working on an opinionated Kubernetes deployment
  • Cortex: configurable in Cortex; may be less flexible (private cloud deploy may require custom scripts)

User interface:
  • BentoML: CLI, Web UI
  • Cortex: CLI

Metrics:
  • BentoML: Prometheus metrics
  • Cortex: Prometheus metrics (Updated)

API Auto-Docs:
  • BentoML: Swagger/OpenAPI
  • Cortex: N/A

User support:
  • BentoML: responsive unpaid Slack channel, but Slack is not the best tool for support
  • Cortex: very responsive Gitter and now Slack

Suggest anything else?

My Experience with Cortex

Here is a blog post on Cortex use at GLAMI. It is a somewhat outdated take, as Cortex now has its own wrapper. Consider using this Cortex client for Python, a Python wrapper around the Cortex CLI that we use at GLAMI for MLOps. It has a couple of extra features that keep us using it for now. I used Cortex to deploy small multi-modal transformer models, but we used it for other deployments as well.



AWS App Runner

If you need super simple deployment of CPU-only applications with auto-scaling, consider using AWS App Runner. You just fill in your source code repository, and your app gets hosted in the selected region on your domain with automatic certificate renewal.
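The same setup can also be scripted. Below is a minimal sketch using the `apprunner` client from `boto3`. The service name, repository URL, connection ARN, build and start commands, and region are hypothetical placeholders; it assumes a source code connection to the repository already exists and that the app is a Python service listening on the given port.

```python
from typing import Any, Dict


def build_app_runner_request(
    service_name: str,
    repo_url: str,
    connection_arn: str,
    branch: str = "main",
    port: int = 8080,
) -> Dict[str, Any]:
    """Build keyword arguments for the App Runner create_service call."""
    return {
        "ServiceName": service_name,
        "SourceConfiguration": {
            # ARN of an existing App Runner connection to the code repository.
            "AuthenticationConfiguration": {"ConnectionArn": connection_arn},
            # Redeploy automatically on every push to the branch.
            "AutoDeploymentsEnabled": True,
            "CodeRepository": {
                "RepositoryUrl": repo_url,
                "SourceCodeVersion": {"Type": "BRANCH", "Value": branch},
                "CodeConfiguration": {
                    "ConfigurationSource": "API",
                    "CodeConfigurationValues": {
                        "Runtime": "PYTHON_3",
                        # Hypothetical commands; adjust to the actual app.
                        "BuildCommand": "pip install -r requirements.txt",
                        "StartCommand": "python app.py",
                        "Port": str(port),  # the API expects the port as a string
                    },
                },
            },
        },
    }


def create_service(request: Dict[str, Any], region: str = "eu-west-1") -> str:
    """Create the service and return the URL App Runner assigns to it."""
    # Imported lazily so the request builder works without boto3 installed.
    import boto3

    client = boto3.client("apprunner", region_name=region)
    response = client.create_service(**request)
    return response["Service"]["ServiceUrl"]
```

This sketch only creates the service on the default App Runner URL; linking a custom domain is a separate step that is not shown here.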

Need More Flexibility? Helm Could Help

If you need more flexibility and you have a dedicated DevOps person, consider using Helm. Helm is more complex to use, but it is still simpler than using Kubernetes directly and has some similarities with Cortex.

Created on 11 May 2020.
Copyright © Vaclav Kosar. All rights reserved. Not investment, financial, medical, or any other advice. No guarantee of information accuracy.