Overview 🌙

“If you can’t measure it, you can’t improve it.” Peter Drucker

Why Luna ML

ML model development is an iterative process of improving performance by changing

  1. Data (quality and quantity)

  2. Model architectures

  3. Hyperparameters

How do you define the performance of your model? Metrics such as accuracy, log loss, AUC, and F1 score may define it. But performance often cannot be captured by a single metric; it depends on your problem and target application. For instance, QPS (queries per second) or the size of the model after compression can be part of the performance.

performance = func(metric1, metric2, …)
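As a minimal sketch of the formula above, a performance definition could be a weighted sum of several metrics. The metric names, the weights, and the `performance` function are hypothetical illustrations, not part of Luna ML.

```python
from typing import Mapping

# Hypothetical weights: a positive weight rewards a metric, a negative one penalizes it.
WEIGHTS = {"accuracy": 1.0, "log_loss": -0.5, "model_size_mb": -0.001}


def performance(metrics: Mapping[str, float],
                weights: Mapping[str, float] = WEIGHTS) -> float:
    """Combine several metrics into one number as a weighted sum.

    Raises KeyError if a weighted metric is missing from `metrics`.
    """
    return sum(weight * metrics[name] for name, weight in weights.items())


if __name__ == "__main__":
    print(performance({"accuracy": 0.9, "log_loss": 0.3, "model_size_mb": 120.0}))
```

Changing the weights, or adding a metric such as QPS, changes the definition of performance without touching the models themselves.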

Furthermore, the definition of performance will probably evolve over time as metrics or data change. That evolution needs to be managed and communicated to the team efficiently and in real time. Otherwise, you can’t really tell whether a model has improved, or by how much. And evaluating a model against the latest (or a selected) performance definition shouldn’t be complicated, so that you can iterate on your model faster.

This is what Luna ML does. Luna ML allows you to define performance via evaluators and a scorer, automatically evaluates your models when they are submitted, and displays results you can share.

How it works

Your team can create a Luna ML project for each problem to solve, then define its evaluators and scorer. These are essentially container images that write the desired outputs to the desired mount point, so you can define them in any language, with any libraries (e.g. PyTorch, TensorFlow) and any environments (e.g. OpenAI Gym).
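To make the idea of an evaluator concrete, here is a minimal sketch of a script that could serve as a container's entry point. It assumes the model is mounted at one directory and results are written to another; the `evaluate` function, the `results.json` file name, and the toy metric are hypothetical, and the actual Luna ML input and output contract may differ.

```python
import json
import tempfile
from pathlib import Path


def evaluate(model_dir: Path, output_dir: Path) -> dict:
    """Toy evaluator: report the total size of the submitted model files."""
    size_bytes = sum(p.stat().st_size for p in model_dir.rglob("*") if p.is_file())
    results = {"model_size_bytes": size_bytes}
    # Write the results where the platform is assumed to collect them.
    output_dir.mkdir(parents=True, exist_ok=True)
    (output_dir / "results.json").write_text(json.dumps(results))
    return results


if __name__ == "__main__":
    # Local demo with temporary directories standing in for the mount points.
    with tempfile.TemporaryDirectory() as tmp:
        model_dir = Path(tmp) / "model"
        model_dir.mkdir()
        (model_dir / "weights.bin").write_bytes(b"\x00" * 16)
        print(evaluate(model_dir, Path(tmp) / "output"))
```

Because the only contract in this sketch is "read from one directory, write to another", the same evaluator could be rewritten in any language without changing how it is invoked.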

Then, machine learning engineers on your team can submit a model and get it evaluated. Luna ML manages all project configuration, including model submission, through a connected GitHub repository.

A model is submitted by opening a Pull Request, and the Luna GitHub Action automatically validates that the model can be evaluated in the project. Once the review is done and the Pull Request is merged, the Luna GitHub Action automatically syncs the changes with the Luna ML server, and evaluation is triggered automatically. Once a new model or a new revision of a model is ready, you can repeat this process.

From the Luna ML UI, you can see the evaluation results produced by the evaluators and the scores produced by the scorer, as well as a Leaderboard of all submitted models in your project.

Since the project (including evaluators and scorer) and the models are version controlled, and evaluation runs in container images on the server side, Luna ML provides

  • Reproducible evaluation environment

  • Immutable, sharable evaluation results

  • Leaderboard, score trends over time
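As an illustration of what a leaderboard computes, the sketch below ranks submissions by score. The `Submission` type and the rule of keeping only the best revision per model are assumptions made for this example, not a description of how Luna ML ranks models.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Submission:
    model: str      # model name
    revision: str   # e.g. a Git commit or tag (hypothetical field)
    score: float    # value produced by the scorer; higher is better here


def leaderboard(submissions: Iterable[Submission]) -> List[Submission]:
    """Keep the best-scoring revision of each model, ranked by score descending."""
    best: Dict[str, Submission] = {}
    for sub in submissions:
        if sub.model not in best or sub.score > best[sub.model].score:
            best[sub.model] = sub
    return sorted(best.values(), key=lambda s: s.score, reverse=True)


if __name__ == "__main__":
    board = leaderboard([
        Submission("model-a", "rev1", 0.81),
        Submission("model-a", "rev2", 0.86),
        Submission("model-b", "rev1", 0.84),
    ])
    for rank, sub in enumerate(board, start=1):
        print(rank, sub.model, sub.revision, sub.score)
```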


What are the differences between Luna ML and ML experiment tracking software, like MLflow?

Luna ML executes evaluations on the server side to generate metrics, while MLflow collects metrics that were generated on the client.

Therefore, Luna ML can re-evaluate models with a few clicks when data or evaluation criteria change, while re-evaluation with MLflow depends entirely on the users.

| | Luna ML | Experiment tracking software |
|---|---|---|
| | Pre-trained models and user-defined evaluators | Collected metrics during training, validating |
| | Evaluation results (video, text, data, …) | |
| How it works | Server evaluates submitted trained model | Tracking client sends metrics to server |
| | Compare between different model architectures, revisions | Compare between experiments |
| When data changes | Server re-evaluates all models | Manually re-evaluate all models to generate new metrics |

What are the differences between Luna ML and Kaggle Leaderboard?

Luna ML evaluates the model on the server side to generate evaluation results and scores, while on Kaggle (https://kaggle.com) you usually submit a file of prediction results to get scores.

Luna ML is also designed with MLOps in mind. Version control and the submission process are handled through GitHub and Pull Requests, so it is easier to integrate with the other components of the ML pipeline.

| | Luna ML | Kaggle leaderboard |
|---|---|---|
| | SaaS, Private deployment | |
| | Pre-trained model, function, etc | File with a set of predictions (known as “Submission file”) |
| Submission process | Create a Pull Request on the connected repository | Upload from the browser |
| Version control | Submission is version controlled through Git | Submission is version controlled internally in Kaggle |

What are the differences between Luna ML and PapersWithCode?

PapersWithCode (https://paperswithcode.com/) is for benchmarking the latest papers. It is great for comparing papers against the same benchmark. But it is not designed for your work environment, especially when your evaluation is constantly changing.

| | Luna ML | PapersWithCode |
|---|---|---|
| | SaaS, Private deployment | |
| | Automated evaluation on submitted model | Manually or through the Client API insert pre-generated evaluation metrics |
| When data changes | Server re-evaluates all models | Doesn’t really assume this case |

Can I change evaluation criteria later?

Yes. This happens all the time. You may have new evaluation datasets, a new simulation environment, or new metrics you’d like to include when evaluating your models’ performance.

You can add a new evaluator in your project with new datasets and a new simulation environment, and your models can be evaluated by both existing and new evaluators.
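The sketch below illustrates this under simplifying assumptions: evaluators are modeled as plain Python callables rather than container images, and `evaluate_all`, the evaluator names, and the toy metrics are hypothetical. After a new evaluator is registered, every model gets results from both the existing and the new evaluator.

```python
from typing import Callable, Dict, Iterable

# An evaluator is modeled as a function from a model name to a metric value.
Evaluator = Callable[[str], float]


def evaluate_all(models: Iterable[str],
                 evaluators: Dict[str, Evaluator]) -> Dict[str, Dict[str, float]]:
    """Run every registered evaluator against every model."""
    return {m: {name: run(m) for name, run in evaluators.items()} for m in models}


models = ["model-a", "model-b"]

# Toy evaluators standing in for real evaluation logic.
evaluators: Dict[str, Evaluator] = {"holdout_v1": lambda model: float(len(model))}
before = evaluate_all(models, evaluators)

# Adding an evaluator later: existing models are evaluated by both.
evaluators["holdout_v2"] = lambda model: float(model.count("-"))
after = evaluate_all(models, evaluators)
```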