My pet project is about food recognition. More info here.
Scrapy produced a folder with images and a .csv file with rows like this:
```
Apple Cake,"Some apple cake description...",https://www.some-recipes-website.ru/binfiles/images/20200109/m12b509e.jpg,"[{'url': 'https://www.some-recipes-website.ru/binfiles/images/20200109/m12b509e.jpg', 'path': 'full/ae00a78059ad08506aa4767ed925bef5dccabf63.jpg', 'checksum': '55088c744a564af5ed8d4e5ea6478d20', 'status': 'downloaded'}]"
```
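For reference, a minimal sketch of reading that CSV and mapping each recipe to its downloaded image might look like this (the `items.csv` file name and the four-column layout are my assumptions; the last column is a Python literal rather than JSON, so `ast.literal_eval` can parse it):

```python
import ast
import csv

# Minimal sketch: map each scraped recipe to the local path of its downloaded image.
# 'items.csv' and the exact column order are assumptions based on the row above.
with open('items.csv', newline='', encoding='utf-8') as f:
    for name, description, image_url, images in csv.reader(f):
        for item in ast.literal_eval(images):     # the column is a Python literal, not JSON
            if item['status'] == 'downloaded':
                print(name, '->', item['path'])   # e.g. Apple Cake -> full/ae00a780....jpg
```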
Now, I needed to create .csv files like this:
```
pic-037.jpg,80,20,500,120,risotto
pic-025.jpg,520,250,1152,953,risotto
pic-with-nothing.jpg,,,,,
pic-004.jpg,0,0,1600,1113,beans
...
```
To do that, I googled something like "best machine learning label tools 2022" and found Label Studio. I followed these steps from their docs:
```bash
python3 -m venv env
source env/bin/activate
python -m pip install label-studio
```
But I couldn't launch label-studio until I did some of these things:
```bash
pip install wheel
pip install spacy
pip install cymem
brew install postgresql
```
(link1, link2 might be helpful)
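With those in place, Label Studio finally started: the launch command from the docs is `label-studio start`, and the web UI then runs at http://localhost:8080 by default.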
I didn't have any problems importing my data into Label Studio. The only setup I had to do was add my labeling interface code:
```xml
<View>
  <Image name="image" value="$image"/>
  <Choices name="choice" toName="image" showInLine="true">
    <Choice value="Salad" background="blue"/>
    <Choice value="Soup" background="green"/>
    <Choice value="Pastry" background="orange"/>
    <Choice value="Nothing" background="orange"/>
  </Choices>
  <RectangleLabels name="label" toName="image">
    <Label value="Salad" background="green"/>
    <Label value="Soup" background="blue"/>
    <Label value="Pastry" background="orange"/>
    <Label value="Nothing" background="black"/>
  </RectangleLabels>
</View>
```
After that, the labeling interface looked like this:
I used the "Nothing" label for confusing images that I decided to exclude from the dataset:
After I finished labeling a portion of the images, I exported them in .csv format and got a file with rows like this:
```
/data/upload/1/83d8ce57-7478f053119ca2a85c4932870ef3e1833eb3eeb5.jpg,14,Pastry,"[{""x"": 13.157894736842104, ""y"": 6.5625, ""width"": 41.578947368421055, ""height"": 74.375, ""rotation"": 0, ""rectanglelabels"": [""Pastry""], ""original_width"": 570, ""original_height"": 320}]",1,20,2022-06-06T07:11:56.377261Z,2022-06-10T21:02:35.758255Z,1209.599
```
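The long quoted field turns out to be plain JSON once the csv module undoes the doubled quotes, so it can be parsed directly. A quick check, using the values from the row above:

```python
import json

# The bounding-box column from the row above, as the csv module would hand it back
boxes = json.loads('[{"x": 13.157894736842104, "y": 6.5625, "width": 41.578947368421055, '
                   '"height": 74.375, "rotation": 0, "rectanglelabels": ["Pastry"], '
                   '"original_width": 570, "original_height": 320}]')
print(boxes[0]['rectanglelabels'], boxes[0]['x'], boxes[0]['y'])
```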
I was surprised when I saw the x, y, width, and height values. Then I read in the docs that "Image annotations exported in JSON format use percentages of overall image size, not pixels, to describe the size and location of the bounding boxes."
I wrote a small Python script to check the exported regions:
```python
from PIL import Image

img = Image.open('../../images/full/7478f053119ca2a85c4932870ef3e1833eb3eeb5.jpg')

# Values taken straight from the exported annotation (percentages of image size)
x = 13.157894736842104
y = 6.5625
width = 41.578947368421055
height = 74.375
original_width = 570
original_height = 320

# Convert percentages to pixels
pixel_x = x / 100.0 * original_width
pixel_y = y / 100.0 * original_height
pixel_width = width / 100.0 * original_width
pixel_height = height / 100.0 * original_height

# PIL's crop() expects a (left, upper, right, lower) box
left = pixel_x
upper = pixel_y
right = pixel_x + pixel_width
lower = pixel_y + pixel_height
box = (left, upper, right, lower)

region = img.crop(box)
region.show()
```
When I ran the script, it showed me the correct region cropped out of the original image:
So now I understood how to get pixel annotations when I need them.
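To tie this back to the target format from the beginning of the post, a rough sketch of the whole conversion could look like this. The export.csv / annotations.csv file names, the column positions, the header row, and the way image names are taken from the upload path are all assumptions on my part, not a final pipeline:

```python
import csv
import json
import os

# Rough sketch: convert the Label Studio export into path,x_min,y_min,x_max,y_max,label rows.
# File names and column positions (0 = image path, 3 = bounding boxes) are assumptions.
with open('export.csv', newline='', encoding='utf-8') as src, \
     open('annotations.csv', 'w', newline='', encoding='utf-8') as dst:
    reader = csv.reader(src)
    writer = csv.writer(dst)
    next(reader)  # skip the header row, if the export has one
    for row in reader:
        image_path, boxes_json = row[0], row[3]
        # The uploaded name carries a Label Studio prefix (e.g. 83d8ce57-...),
        # so it may still need to be mapped back to the original file name.
        file_name = os.path.basename(image_path)
        boxes = json.loads(boxes_json) if boxes_json else []
        if not boxes:
            writer.writerow([file_name, '', '', '', '', ''])  # image with nothing to label
            continue
        for b in boxes:
            # Percentages of the original image size -> pixel corner coordinates
            x_min = b['x'] / 100.0 * b['original_width']
            y_min = b['y'] / 100.0 * b['original_height']
            x_max = x_min + b['width'] / 100.0 * b['original_width']
            y_max = y_min + b['height'] / 100.0 * b['original_height']
            writer.writerow([file_name, round(x_min), round(y_min),
                             round(x_max), round(y_max), b['rectanglelabels'][0]])
```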
The next post is going to be about trying to feed the dataset to RetinaNet. I am going to use only the 48 images scraped so far, just to see what format of input it really needs.