Data 08 00112
Data Descriptor
RipSetCocoaCNCH12: Labeled Dataset for Ripeness Stage
Detection, Semantic and Instance Segmentation of Cocoa Pods
Juan Felipe Restrepo-Arias *, María Isabel Salinas-Agudelo, María Isabel Hernandez-Pérez,
Alejandro Marulanda-Tobón and María Camila Giraldo-Carvajal
Abstract: Fruit counting and ripeness detection are computer vision applications that have gained
strength in recent years due to the advancement of new algorithms, especially those based on
artificial neural networks (ANNs), better known as deep learning. In agriculture, those algorithms
capable of fruit counting, including information about their ripeness, are mainly applied to make
production forecasts or plan different activities such as fertilization or crop harvest. This paper
presents the RipSetCocoaCNCH12 dataset of cocoa pods labeled at four different ripeness stages:
stage 1 (0–2 months), stage 2 (2–4 months), stage 3 (4–6 months), and harvest stage (>6 months). An
additional class was also included for pods aborted by plants in the early stage of development. A
total of 4116 images were labeled to train algorithms that mainly perform semantic and instance
segmentation. The labeling was carried out with CVAT (Computer Vision Annotation Tool). The
dataset, therefore, includes labeling in two formats: COCO 1.0 and segmentation mask 1.1. The
images were taken with different mobile devices (smartphones), in field conditions, during the
harvest season at different times of the day, which could allow the algorithms to be trained with data
that includes many variations in lighting, colors, textures, and sizes of the cocoa pods. As far as we
know, this is the first openly available dataset for cocoa pod detection with semantic segmentation
for five classes, 4116 images, and 7917 instances, comprising RGB images and two different formats
for labels. With the publication of this dataset, we expect that researchers in smart farming, especially
in cocoa cultivation, can benefit from the quantity and variety of images it contains.
Keywords: cocoa pods detection; ripeness stage detection; semantic segmentation; smart farming
Citation: Restrepo-Arias, J.F.; Salinas-Agudelo, M.I.; Hernandez-Pérez, M.I.; Marulanda-Tobón, A.;
Giraldo-Carvajal, M.C. RipSetCocoaCNCH12: Labeled Dataset for Ripeness Stage Detection,
Semantic and Instance Segmentation of Cocoa Pods. Data 2023, 8, 112.
https://doi.org/10.3390/data8060112
2. RipSetCocoaCNCH12 Dataset
2.1. Description
CACAO CNCH12, developed by “Compañía Nacional de Chocolates”, is the cocoa
variety in the dataset. The images were collected at the “Compañía Nacional de Choco-
lates” farm, located in the municipality of Támesis, department of Antioquia—Colombia
(5°43′02″ N–75°41′25″ W). The average height above sea level in the farm is approximately
1100 m. The dataset was created between 1 December 2022 and 17 February 2023, the
primary cocoa harvest season in the study area.
The average ripening period for a cocoa pod typically spans six to seven months,
although slight variations may occur based on the specific agronomic and climatic condi-
tions of the crop. The ripeness stages were defined in ranges of two months due to the
key physical and chemical differences of the cocoa pods according to the agronomists
of the “Compañía Nacional de Chocolates” company. The stages are defined based on
the duration in months, starting from pollination of the flowers to the optimal time for
harvesting the pod. The sequential progression of cocoa pods during the ripening process is shown in Figure 1.
Figure 1. Ripeness process in a sequence of cocoa pods.
The images of cocoa pods were divided into five classes (Table 1). Four classes correspond
to the ripeness stage in months: Class 1 (0–2 months), Class 2 (2–4 months), Class 3
(4–6 months), and Class 4 (>6 months) (Figure 2). A fifth class, known as “abortions”,
does not fall under any of the ripeness stages
(Class A). Abortions are cocoa pods that start their growth process but die from various
The dataset contains two folders: the first contains the annotations in COCO 1.0 format,
and the second contains the images in segmentation mask 1.1 format. In each of these
folders, the images are divided into subfolders named with the main class they contain; an
image can contain several instances of different classes, but the images in each folder are
dominated by one of the classes. The distribution of instances in each folder can be seen
below in Figure 4.
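As a minimal sketch of how the per-class instance counts behind such a distribution could be tallied from a COCO-style annotation file, the following Python snippet counts annotations per category. The toy annotation content, the category names, and the function name `count_instances_per_class` are illustrative assumptions and are not taken from the dataset files; only the standard COCO keys `categories`, `annotations`, `id`, `name`, and `category_id` are relied upon.

```python
import json
from collections import Counter


def count_instances_per_class(coco: dict) -> dict:
    """Count annotated instances per category name in a COCO-style dict."""
    id_to_name = {cat["id"]: cat["name"] for cat in coco["categories"]}
    counts = Counter(id_to_name[ann["category_id"]] for ann in coco["annotations"])
    # Report every category, including those with zero instances.
    return {name: counts.get(name, 0) for name in id_to_name.values()}


# Toy annotation content (hypothetical; not taken from the dataset files).
toy_coco = json.loads("""
{
  "images": [{"id": 1, "file_name": "example_1.jpg", "width": 3000, "height": 3000}],
  "categories": [
    {"id": 1, "name": "C1"}, {"id": 2, "name": "C2"}, {"id": 3, "name": "C3"},
    {"id": 4, "name": "C4"}, {"id": 5, "name": "A"}
  ],
  "annotations": [
    {"id": 1, "image_id": 1, "category_id": 3},
    {"id": 2, "image_id": 1, "category_id": 3},
    {"id": 3, "image_id": 1, "category_id": 4}
  ]
}
""")

print(count_instances_per_class(toy_coco))
# {'C1': 0, 'C2': 0, 'C3': 2, 'C4': 1, 'A': 0}
```

For a real annotation file, the same function could be applied to the result of `json.load` on the opened file; the file name and location would depend on how the dataset is unpacked.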
Figure 4. Distribution of the instances for each image folder (y-axes differ between the frames).
2.2. Quantitative Measure to Differentiate Cocoa Classes
The ripening process of fruit involves a sequence of physiological changes to become
ready for consumption or processing. The fruit grows, accumulating essential nutrients and
water, while noticeable transformations in color, texture, and composition signify its ripeness.
A widely used way to measure the state of maturity of a fruit quantitatively at different
stages is to calculate the internal sugar content by measuring Brix degrees [12–15]. To have
a quantitative measure that would confirm the difference between ripeness stages, the Brix
degrees were measured in more than 35 cocoa pods for each class in the four ripeness stages
(C1 to C4). The results are presented in Table 2.
Table 2. Number of samples and average Brix degrees for the ripeness stages.
Class    Number of samples    Average Brix degrees
C1       39                   5.3
C2       45                   6.6
C3       38                   8.7
C4       40                   16.6
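To make the separation between stages in Table 2 easier to read, the short Python sketch below computes the increase in average Brix degrees between consecutive classes and a sample-weighted mean over all measured pods. The values are copied from Table 2; the variable names and the choice of a weighted mean as a summary are illustrative assumptions, not part of the original analysis.

```python
# Values copied from Table 2: class -> (number of samples, average Brix degrees).
brix_table = {
    "C1": (39, 5.3),
    "C2": (45, 6.6),
    "C3": (38, 8.7),
    "C4": (40, 16.6),
}

classes = list(brix_table)

# Increase in average Brix degrees between consecutive ripeness stages.
increments = {
    f"{prev}->{curr}": round(brix_table[curr][1] - brix_table[prev][1], 1)
    for prev, curr in zip(classes, classes[1:])
}

# Sample-weighted mean over all measured pods (illustrative summary only).
total_samples = sum(n for n, _ in brix_table.values())
weighted_mean = sum(n * b for n, b in brix_table.values()) / total_samples

print(increments)      # {'C1->C2': 1.3, 'C2->C3': 2.1, 'C3->C4': 7.9}
print(total_samples)   # 162
print(round(weighted_mean, 2))
```

On these figures, the increase from C3 to C4 (7.9 Brix degrees) is considerably larger than the increases between the earlier stages.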
Figure 5. Dataset examples of the ripeness stages: (a) Class 1; (b) Class 2; (c) Class 3; (d) Class 4.
Table 4 below shows a summary of the RipSetCocoaCNCH12 dataset.
Item                     Description
Field of application     Object detection, smart farming
Data acquisition         Smartphone devices
Method of annotation     Manually with CVAT (Computer Vision Annotation Tool)
Number of classes        5: stage 1 (0–2 months), stage 2 (2–4 months), stage 3 (4–6 months), for harvest (>6 months), and abortions
Number of images         4116
Number of instances      7917
Data collected by        Authors of this paper
Years of collection      2022–2023
Vertical resolution      96 dpi
Horizontal resolution    96 dpi
Dataset size             27 GB
Image format             .JPG
Image size               3000 × 3000 px
Annotation formats       COCO 1.0 and segmentation mask 1.1
3. Methods
Nowadays, smartphones have become ubiquitous. In even the most remote rural
areas, smartphones have become the main communication technology due to their low
costs and portability. These devices can also give farmers the ability to collect image data.
Therefore, in this work, the images were captured with smartphones to have a dataset as
similar as possible to real conditions.
Table 5. Technical specifications of the smartphone cameras used to capture the dataset images.
Motorola G9 plus: [...] sensor, f/2.2 aperture. Macro: 2 MP sensor and f/2.2 aperture. Depth: 2 MP sensor and f/2.2 aperture.
The strategy for capturing images involved zigzag paths in the field, enabling access
to each crop tree. During each pass, a person took images of a single class to allow easier
classification in the folders.
Between one and four images of each cocoa pod were taken from different angles to
obtain as many samples as possible (Figure 6).
The images were taken between 8:00 a.m. and 4:00 p.m. First, the size format for the
capture was adjusted on all smartphones to a 1:1 ratio, and then resizing was applied to
them using a script in the Python language with Pillow (Python Imaging Library), giving
them a final size of 3000 × 3000 px. The original images had sizes in the range from
3072 × 3072 to 4096 × 4096 px.
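The resizing step described above can be sketched with Pillow as follows. This is a minimal illustration under stated assumptions and not the authors' script: the function name `resize_square_image`, the LANCZOS resampling filter, and the synthetic test image are all hypothetical choices, since the source does not specify them.

```python
import tempfile
from pathlib import Path

from PIL import Image

TARGET_SIZE = (3000, 3000)  # final size reported in the text


def resize_square_image(src: Path, dst: Path, size=TARGET_SIZE) -> None:
    """Resize a 1:1 image to `size` and save it as JPEG."""
    with Image.open(src) as img:
        if img.width != img.height:
            raise ValueError(f"{src} is not square: {img.size}")
        # LANCZOS is an assumed resampling filter; the source does not name one.
        img.resize(size, Image.LANCZOS).save(dst, format="JPEG")


# Self-contained demonstration on a synthetic 3072 x 3072 image.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "original.jpg"
    dst = Path(tmp) / "resized.jpg"
    Image.new("RGB", (3072, 3072), color=(90, 140, 60)).save(src, format="JPEG")
    resize_square_image(src, dst)
    with Image.open(dst) as out:
        result_size = out.size

print(result_size)  # (3000, 3000)
```

Because the captures were already set to a 1:1 ratio, a direct resize to 3000 × 3000 px does not distort the pods; the square check in the sketch guards against images that were not captured in that format.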
Figure 6. Image capture process for one cocoa pod from different angles.
3.3. Data Annotation
The tool used for labeling images was CVAT (Computer Vision Annotation Tool) [16],
which allows for different techniques. The technique used for this work was polygon
labeling to obtain a semantic segmentation of the classes (Figure 8).
The dataset contains labels in two alternative formats: (1) COCO 1.0, which has files
in the format (*.json) for detection using bounding boxes and polygons, and (2) segmen-
tation mask 1.1, which contains separate folders for semantic segmentation and instance
segmentation. Examples of these masks can be seen in Figures 9 and 10.
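To illustrate how the polygon and bounding-box representations in the COCO format relate, the sketch below derives a bounding box and an area from a polygon. It assumes the usual COCO conventions, in which a polygon is a flat list `[x1, y1, x2, y2, ...]` and a bounding box is `[x, y, width, height]`; the sample annotation and the helper names `polygon_to_bbox` and `polygon_area` are hypothetical and are not taken from the dataset.

```python
def polygon_to_bbox(polygon):
    """Return [x_min, y_min, width, height] for a flat [x1, y1, x2, y2, ...] list."""
    xs, ys = polygon[0::2], polygon[1::2]
    return [min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)]


def polygon_area(polygon):
    """Polygon area in square pixels using the shoelace formula."""
    xs, ys = polygon[0::2], polygon[1::2]
    n = len(xs)
    twice_area = sum(
        xs[i] * ys[(i + 1) % n] - xs[(i + 1) % n] * ys[i] for i in range(n)
    )
    return abs(twice_area) / 2


# Hypothetical annotation: a rectangular outline standing in for a pod contour.
annotation = {
    "category_id": 4,
    "segmentation": [[100, 200, 400, 200, 400, 800, 100, 800]],
}
outline = annotation["segmentation"][0]

print(polygon_to_bbox(outline))  # [100, 200, 300, 600]
print(polygon_area(outline))     # 180000.0
```

Real pod contours contain many more vertices, but the same two functions apply unchanged to any simple (non-self-intersecting) polygon.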
4. Limitations
The RipSetCocoaCNCH12 dataset does not include classes of cocoa pods to discard.
In future work, diseased and rotten pods may be included. Additionally, more data should
be collected on other cocoa varieties.
References
1. Bosompem, M. Potential challenges to precision agriculture technologies development in Ghana: Scientists’ and cocoa extension
agents’ perspectives. Precis. Agric. 2021, 22, 1578–1600. [CrossRef]
2. Bueno, G.E.; Valenzuela, K.A.; Arboleda, E.R. Maturity classification of cacao through spectrogram and convolutional neural
network. J. Teknol. Sist. Komput. 2020, 8, 228–233. [CrossRef]
3. Quezada-Ramón, L.A.; Quevedo-Guerrero, J.N.; García-Batista, R.M. Determinación del efecto del grado de madurez de las
mazorcas en la producción y la calidad sensorial de (Theobroma cacao L.). Rev. Científica Agroecosistemas 2017, 5, 36–46. Available
online: http://aes.ucf.edu.cu/index.php/aes/index (accessed on 12 May 2023).
4. Galindo, J.A.M.; Rosal, J.E.C.; Villaverde, J.F. Ripeness Classification of Cacao Using Cepstral-Based Statistical Features and
Support Vector Machine. In Proceedings of the 2022 IEEE International Conference on Artificial Intelligence in Engineering and
Technology (IICAIET), Kota Kinabalu, Malaysia, 13–15 September 2022; pp. 1–5. [CrossRef]
5. Gallego, A.M.; Zambrano, R.A.; Zuluaga, M.; Rodríguez, A.V.C.; Cortés, M.S.C.; Vergel, A.P.R.; Valencia, J.W.A. Analysis of fruit
ripening in Theobroma cacao pod husk based on untargeted metabolomics. Phytochemistry 2022, 203, 113412. [CrossRef] [PubMed]
6. Lockman, N.A.; Hashim, N.; Onwude, D.I. Laser-Based imaging for Cocoa Pods Maturity Detection. Food Bioprocess Technol. 2019,
12, 1928–1937. [CrossRef]
7. Veites-Campos, S.A.; Betancour, R.R.; González-Pérez, M. Identification of Cocoa Pods with Image Processing and Artificial
Neural Networks. Int. J. Adv. Eng. Manag. Sci. 2018, 4, 510–518. [CrossRef]
8. Heredia-Gómez, J.F.; Rueda-Gómez, J.P.; Talero-Sarmiento, L.H.; Ramírez-Acuña, J.S.; Coronado-Silva, R.A. Cocoa pods ripeness
estimation, using convolutional neural networks in an embedded system. Rev. Colomb. Comput. 2020, 21, 42–55. [CrossRef]
9. Baculio, N.G.; Barbosa, J.B. An Objective Classification Approach of Cacao Pods using Local Binary Pattern Features and Artificial Neu-
ral Network Architecture (ANN). Indian J. Sci. Technol. 2022, 15, 495–504. Available online: https://indjst.org/articles/an-objective-
classification-approach-of-cacao-pods-using-local-binary-pattern-features-and-artificial-neural-network-architecture-ann (accessed
on 1 March 2023). [CrossRef]
10. Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; The MIT Press: Cambridge, MA, USA, 2016.
11. Ayikpa, K.J.; Mamadou, D.; Ballo, A.B.; Yao, K.; Gouton, P.; Adou, K.J. CocoaMFDB: A dataset of cocoa pod maturity and families
in an uncontrolled environment in Côte d’Ivoire. Data Brief 2023, 48, 109196. Available online: https://linkinghub.elsevier.com/
retrieve/pii/S2352340923003153 (accessed on 1 March 2023). [CrossRef] [PubMed]
12. Pérez, V.O.; Álvarez-Barreto, C.I.; Matallana, L.G.; Acuña, J.R.; Echeverri, L.F.; Imbachí, L.C. Effect of Prolonged Fermentations of
Coffee Mucilage with Different Stages of Maturity on the Quality and Chemical Composition of the Bean. Fermentation 2022, 8, 519.
[CrossRef]
13. Darbellay, C.; Luisier, J.-L.; Villettaz, J.-C.; Azodanlou, R. Changes in flavour and texture during the ripening of strawberries. Eur.
Food Res. Technol. 2003, 218, 167–172. [CrossRef]
14. Chassagne-Berces, S.; Fonseca, F.; Citeau, M.; Marin, M. Freezing protocol effect on quality properties of fruit tissue according to
the fruit, the variety and the stage of maturity. LWT 2010, 43, 1441–1449. [CrossRef]
15. Teka, T.A. Analysis of the effect of maturity stage on the postharvest biochemical quality characteristics of tomato (Lycopersicon
esculentum Mill.) fruit. Int. Res. J. Pharm. Appl. Sci. 2013, 3, 180–186. Available online: www.irjpas.com (accessed on 1 March 2023).
16. CVAT. Available online: https://www.cvat.ai/ (accessed on 21 February 2023).