Kanban

A dataset kanban board visualizes the progress of your machine learning dataset through the stages of its lifecycle.

Scenario

Find it in Dataset -> Scenario

This page lets you search the data in this dataset using search ontologies.

When building a machine learning model, it is common to split your data into three subsets:

Training set: The majority of your data (typically 60-80%) that is used to train your model. Your model learns from the training data by finding patterns and correlations to understand the relationship between the input and output.
Validation set: A portion of your data (typically 10-20%) that is used to evaluate your model during training. After each epoch or iteration of training, you test your model on the validation set to check its performance. You then tweak or retrain your model as needed to improve validation accuracy.
Test set: A portion of your data (typically 10-20%) that is held out and not used during training at all. It is used only at the very end, to test your final trained model. Performance on the test set indicates how well your model generalizes to new data.
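
As a minimal sketch of the three-way split described above, the following Python snippet shuffles a list of items and slices it into training, validation and test subsets. The function name `split_dataset`, the 70/15/15 default fractions and the fixed seed are assumptions chosen for illustration; they are not part of this product and are not the ratios it applies.

```python
import random


def split_dataset(items, train_frac=0.7, val_frac=0.15, seed=0):
    """Shuffle items and split them into (train, val, test) lists.

    Hypothetical helper for illustration only. The test set receives
    whatever remains after the training and validation slices.
    """
    if train_frac <= 0 or val_frac < 0 or train_frac + val_frac >= 1:
        raise ValueError("fractions must be positive and leave room for a test set")

    shuffled = list(items)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)

    n = len(shuffled)
    n_train = round(n * train_frac)
    n_val = round(n * val_frac)

    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


train, val, test = split_dataset(range(100))
print(len(train), len(val), len(test))
```

Shuffling before slicing avoids a biased split when the data is stored in some order (for example, sorted by class or capture date). Each item lands in exactly one subset, so the test set stays unseen during training.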

The exact split between training, validation and test sets depends on the size of the dataset and the task; the percentages above are typical ranges.

Find it in Dataset -> Overall. This page shows summary statistics about the dataset.

