How to Label Data for Machine Learning in Python | ActiveState


Data labeling in Machine Learning (ML) is the process of assigning labels to subsets of data based on their characteristics. Data labeling takes unlabeled datasets and augments each piece of data with informative labels or tags.

Most commonly, data is annotated with a text label. However, there are many use cases for labeling data with other types of labels. Labels provide context for data ranging from images to audio recordings to x-rays, and more.

Data Labeling Procedure

While data has traditionally been labeled manually, the process is slow and resource-intensive. Instead, ML models or algorithms can be used to automatically label data by first training them on a subset of data that has been labeled manually.
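As a rough illustration of this idea, the sketch below trains a deliberately simple nearest-centroid classifier on a handful of hand-labeled points and then uses it to label new ones. The measurements, the label names, and the helpers `train_centroids` and `auto_label` are all made up for this example; a real project would likely use a proper ML model.

```python
from math import dist
from statistics import mean


def train_centroids(labeled):
    """Compute one centroid per label from (features, label) pairs."""
    grouped = {}
    for features, label in labeled:
        grouped.setdefault(label, []).append(features)
    # Average each feature dimension across the examples of a label.
    return {
        label: tuple(mean(dim) for dim in zip(*rows))
        for label, rows in grouped.items()
    }


def auto_label(centroids, features):
    """Return the label whose centroid is closest to the given features."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


# Hypothetical manually labeled subset: (height_cm, weight_kg) -> label
manual = [
    ((20.0, 4.0), "cat"),
    ((25.0, 5.0), "cat"),
    ((60.0, 30.0), "dog"),
    ((70.0, 35.0), "dog"),
]
centroids = train_centroids(manual)

# Unlabeled data that the trained model labels automatically
unlabeled = [(22.0, 4.5), (65.0, 32.0)]
labels = [auto_label(centroids, point) for point in unlabeled]
print(labels)
```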


One way to automate data labeling is to use a workflow that can identify when the labeling model has higher or lower confidence in its results, and pass the data to humans for labeling when confidence is low. The new human-generated labels can then be fed back to the labeling model so that it can learn from them and improve its ability to automatically label the next set of data.
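A minimal sketch of that routing step, assuming the model reports a confidence score between 0 and 1 for each prediction. The function name `route_predictions`, the 0.8 threshold, and the sample item ids are illustrative choices, not part of any particular tool.

```python
def route_predictions(predictions, threshold=0.8):
    """Split predictions into auto-accepted labels and a human review queue.

    `predictions` maps an item id to a (label, confidence) pair.
    """
    auto_labeled, needs_review = {}, []
    for item_id, (label, confidence) in predictions.items():
        if confidence >= threshold:
            auto_labeled[item_id] = label
        else:
            needs_review.append(item_id)
    return auto_labeled, needs_review


# Hypothetical model output
predictions = {
    "img_001": ("cat", 0.97),
    "img_002": ("dog", 0.55),
    "img_003": ("cat", 0.81),
}
auto_labeled, needs_review = route_predictions(predictions)

# Labels supplied by a human annotator for the low-confidence items
human_labels = {"img_002": "cat"}

# Human-labeled items become new training examples for the next round
training_additions = [(item_id, human_labels[item_id]) for item_id in needs_review]
print(auto_labeled, needs_review, training_additions)
```

Raising the threshold sends more items to human reviewers, while lowering it trusts the model more often, so the value is usually tuned against how costly a wrong label is.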

Over time, the model will label more and more data automatically, and the process will speed up. Even so, data labeling is often a slow and repetitive task, and various tools have been developed to streamline it.

How to Use Label Studio to Automatically Label Data

One automated labeling tool is Label Studio, an open source Python tool that lets you label various data types including text, images, audio, video, and time series.

1. To install Label Studio, open a command window or terminal, and enter:

pip install -U label-studio

or

python -m pip install -U label-studio
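To confirm that the installation succeeded, one option is to ask Python for the installed package version. This is a small sketch using the standard library's `importlib.metadata`; the helper name `installed_version` is made up for this example.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


found = installed_version("label-studio")
print(found if found else "label-studio is not installed in this environment")
```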

2. To create a labeling project, run the following command:

label-studio init <project_name>

Once the project has been created, you’ll receive a message stating:

Label Studio has been successfully initialized. Check project states in ./<project_name> Start the server: label-studio start ./<project_name>

3. To start the project, run the following command:

label-studio start ./<project_name>

or

label-studio start <project_name>

The project will automatically load in your web browser at

4. Click on the Import button to import your data from various sources.


Once the data is imported, you can scroll down the page and preview it.
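One of the formats Label Studio can import is a JSON file containing a list of tasks. The sketch below prepares such a file, assuming plain-text tasks stored under a `text` key inside a `data` object. The file name `tasks.json`, the helper `write_tasks`, and the sample sentences are made up for illustration, and the exact schema your version expects is worth checking against the Label Studio documentation.

```python
import json
import tempfile
from pathlib import Path


def write_tasks(texts, path):
    """Write one task per text snippet to a JSON file and return the task list."""
    tasks = [{"data": {"text": text}} for text in texts]
    Path(path).write_text(json.dumps(tasks, indent=2), encoding="utf-8")
    return tasks


# Hypothetical sample data; the file goes to the system temp directory
samples = ["The battery lasts all day.", "The screen cracked within a week."]
tasks_path = Path(tempfile.gettempdir()) / "tasks.json"
tasks = write_tasks(samples, tasks_path)
print(f"Wrote {len(tasks)} tasks to {tasks_path}")
```

The resulting file can then be selected from the Import dialog described in step 4.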

5. In the menu, click on Settings to proceed.


You can now choose from the many options to finish the setup for your specific project.
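Label Studio describes its labeling interface with an XML-style configuration. As a hedged sketch, the snippet below generates a simple text classification config with Python's standard library. The tag names `View`, `Text`, `Choices`, and `Choice` reflect Label Studio's configuration format as commonly documented, while the helper `build_choice_config`, the control name `sentiment`, and the label values are assumptions chosen for this example.

```python
import xml.etree.ElementTree as ET


def build_choice_config(labels, field="text"):
    """Build a labeling config that shows a text field and a set of choices."""
    view = ET.Element("View")
    # "$<field>" refers to the matching key in each imported task's data.
    ET.SubElement(view, "Text", name="text", value=f"${field}")
    choices = ET.SubElement(view, "Choices", name="sentiment", toName="text")
    for label in labels:
        ET.SubElement(choices, "Choice", value=label)
    return ET.tostring(view, encoding="unicode")


config_xml = build_choice_config(["Positive", "Negative", "Neutral"])
print(config_xml)
```

The printed XML can be pasted into the labeling configuration area of the Settings page and adjusted from there.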


