Data Descriptor
RipSetCocoaCNCH12: Labeled Dataset for Ripeness Stage
Detection, Semantic and Instance Segmentation of Cocoa Pods
Juan Felipe Restrepo-Arias *, María Isabel Salinas-Agudelo, María Isabel Hernandez-Pérez,
Alejandro Marulanda-Tobón and María Camila Giraldo-Carvajal

Escuela de Ciencias Aplicadas e Ingeniería, Universidad EAFIT, Medellín 050022, Colombia;

[email protected] (M.I.S.-A.); [email protected] (M.I.H.-P.); [email protected] (A.M.-T.);
[email protected] (M.C.G.-C.)
* Correspondence: [email protected]

Abstract: Fruit counting and ripeness detection are computer vision applications that have gained
strength in recent years due to the advancement of new algorithms, especially those based on
artificial neural networks (ANNs), better known as deep learning. In agriculture, those algorithms
capable of fruit counting, including information about their ripeness, are mainly applied to make
production forecasts or plan different activities such as fertilization or crop harvest. This paper
presents the RipSetCocoaCNCH12 dataset of cocoa pods labeled at four different ripeness stages:
stage 1 (0–2 months), stage 2 (2–4 months), stage 3 (4–6 months), and harvest stage (>6 months). An
additional class was also included for pods aborted by plants in the early stage of development. A
total of 4116 images were labeled to train algorithms that mainly perform semantic and instance
segmentation. The labeling was carried out with CVAT (Computer Vision Annotation Tool). The
dataset, therefore, includes labeling in two formats: COCO 1.0 and segmentation mask 1.1. The
images were taken with different mobile devices (smartphones), in field conditions, during the
Academic Editor: Juan-Carlos
Received: 31 May 2023
Revised: 9 June 2023
Accepted: 12 June 2023
Published: 18 June 2023

1. Introduction

The application of precision agriculture strategies in cocoa crops continues to encounter
various challenges that need to be addressed. These challenges primarily involve
issues related to the poor quality of existing data and the acquisition of new data necessary
for the application of advanced precision agriculture techniques [1].
One of the main challenges is to identify different stages of ripeness of the cocoa pods
since this type of crop has a wide number of varieties, and all of them can show different
textures and color characteristics in their maturation process [2].
Detecting ripeness stages in cocoa pods is critical in determining two relevant factors
• laser techniques with backscattered images [6].

However, these techniques are impractical under real field conditions, since capturing
sound data, laser images, spectrometry, or biochemical markers requires expensive
devices that are not within the reach of farmers.
On the other hand, artificial intelligence techniques based on artificial neural networks
(ANNs), better known as deep learning, are increasingly used [7–9].
The precision and robustness of deep learning models depend on the quality and
quantity of the training data, which must capture the variability of the phenomenon
under study [10].
Moreover, the increasing prevalence of smartphones among farmers for their daily
activities simplifies the process of capturing images, eliminating the necessity of investing
in costly equipment and specialized management for data capture.
Unfortunately, the community engaged in applied research using deep learning techniques
to detect ripeness stages in cocoa pods faces a scarcity of image datasets for most
varieties. In addition, the available public datasets offer only a limited number of images
for training deep learning models [8,11].
To help the community that performs applied research for developing deep learning
solutions to detect ripeness stages in cocoa pods, we propose the RipSetCocoaCNCH12
dataset, which consists of 4116 images taken with different types of smartphones and
labeled for semantic and instance segmentation. Having several stages of ripeness will
allow researchers to train machine learning algorithms that go beyond the usual two
classes, mature and immature. These features will allow the scientific community
interested in these applications to train more robust and accurate deep learning models.
The RipSetCocoaCNCH12 dataset will be important for the training of machine learning
algorithms that seek to detect different ripeness stages in cocoa crops of the CNCH12
variety and to make inventories of pods.

2. RipSetCocoaCNCH12 Dataset
2.1. Description
CACAO CNCH12, developed by “Compañía Nacional de Chocolates”, is the cocoa
variety in the dataset. The images were collected at the “Compañía Nacional de Chocolates”
farm, located in the municipality of Támesis, department of Antioquia, Colombia
(5°43′02″ N, 75°41′25″ W). The average height above sea level in the farm is approximately
1100 m. The dataset was created between 1 December 2022 and 17 February 2023, the
primary cocoa harvest season in the study area.
The average ripening period for a cocoa pod typically spans six to seven months,
although slight variations may occur based on the specific agronomic and climatic conditions
of the crop. The ripeness stages were defined in ranges of two months due to the
key physical and chemical differences of the cocoa pods according to the agronomists
of the “Compañía Nacional de Chocolates” company. The stages are defined based on
the duration in months, starting from pollination of the flowers to the optimal time for
from 0 to 6 months, is illustrated in Figure 1.

Figure 1. Ripeness process in a sequence of cocoa pods.

The images of cocoa pods were divided into five classes (Table 1). They were divided
into four classes according to their ripeness stage in months: Class 1 (0–2 months), Class 2
(2–4 months), Class 3 (4–6 months), and Class 4 (>6 months) (Figure 2). Additionally, there
is a fifth class known as “abortions” that does not fall under any of the ripeness stages
(Class A). Abortions are cocoa pods that start their growth process but die from various
causes associated with attacks by pests or diseases or even due to physiological problems
of the plant (Figure 3).

Table 1. Number and names of instances per class.

Figure 3. Examples of several types of abortions (CA).


The dataset contains two folders: the first contains the annotations in COCO 1.0 format,
and the second contains the images in segmentation mask 1.1 format. In each of these
folders, the images are divided into subfolders named with the main class they contain; an
image can contain several instances of different classes, but the images in each folder are
dominated by one of the classes. The distribution of instances in each folder can be seen
below in Figure 4.
Figure 4. Distribution of the instances in each image folder (y-axes differ between the frames).
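
A per-class instance count like the one summarized in Figure 4 can be recomputed from the COCO 1.0 annotations with a few lines of Python. This is only a minimal sketch: it relies on the standard COCO fields `categories` and `annotations`, while the function names, the file path in the usage note, and any class names are assumptions for illustration, not names confirmed by the dataset.

```python
import json
from collections import Counter


def load_coco(path: str) -> dict:
    """Read a COCO-style annotation file into a dictionary."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def count_instances_per_class(coco: dict) -> Counter:
    """Count annotated instances per category name."""
    # COCO annotations refer to categories by numeric id.
    id_to_name = {c["id"]: c["name"] for c in coco["categories"]}
    return Counter(id_to_name[a["category_id"]] for a in coco["annotations"])
```

Assuming one of the class subfolders holds an annotation file at a path such as `annotations/instances_default.json` (a hypothetical location), `count_instances_per_class(load_coco(path))` would return the number of instances of each class found in that subfolder.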

2.2. Quantitative Measure to Differentiate Cocoa Classes
The ripening process of fruit involves a sequence of physiological changes to become
ready for consumption or processing. The fruit grows, accumulating essential nutrients
and water, while noticeable transformations in color, texture, and composition signify its
ripeness.
A widely used way to measure the state of maturity of a fruit quantitatively at different
stages is to calculate the internal sugar content by measuring Brix degrees [12–15]. To
have a quantitative measure that would confirm the difference between ripeness stages,
the Brix degrees were measured in more than 35 cocoa pods for each class in the four
ripeness stages (C1 to C4). The results are presented in Table 2.

Table 2. Number of samples and average Brix degrees for the ripeness stages.

Class   Number of Samples Selected to Measure Brix Degrees   Average Brix Degrees Measured µj (°Bx)
C1      39                                                   5.3
C2      45                                                   6.6
C3      38                                                   8.7
C4      40                                                   16.6
Data 2023, 8, 112 5 of 10

An ANOVA test was performed to check for a significant difference between the
different classes, according to their measure of Brix degrees. The results can be seen below
in Table 3.
Null hypothesis: all µj are equal
Alternative hypothesis: the µj are not all equal
According to the F statistic and the p-value, the null hypothesis is rejected. Therefore,
there is a significant difference in Brix degrees among classes, which supports
dividing cocoa pods into the four proposed classes for the stages of ripeness.
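
As a minimal sketch of the computation behind such a test, the one-way ANOVA F statistic can be written in plain Python as shown here. The function name `one_way_anova_f` is illustrative and is not taken from the authors' analysis. The individual Brix readings are not reproduced in this paper, so the function takes the per-class measurements as input. If all the samples in Table 2 were used (39 + 45 + 38 + 40 = 162 pods in 4 classes), the degrees of freedom would be 3 and 158.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    `groups` is a list of lists, one list of measurements per class.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of the class means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the measurements around their own class mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

The p-value is then read from the F distribution with these degrees of freedom, for example with a statistics package.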
Every image is 3000 × 3000 px in JPEG format, with 8 bits. The image files were named
with the date and time of capture. Figure 5 is an example of the images corresponding to
the four ripeness stages.

Figure 5. Dataset examples of the ripeness stages: (a) Class 1; (b) Class 2; (c) Class 3; (d) Class 4.

Table 4 below shows a summary of the RipSetCocoaCNCH12 dataset.

Table 4. The RipSetCocoaCNCH12 specifications.

Item                    Description
Field of application    Object detection, smart farming
Data acquisition        Smartphone devices
Method of annotation    Manually with CVAT (Computer Vision Annotation Tool)
Number of classes       5: stage 1 (0–2 months), stage 2 (2–4 months), stage 3 (4–6 months), for harvest (>6 months), and abortions
Number of images        4116
Number of instances     7917
Data collected by       Authors of this paper
Years of collection     2022–2023
Vertical resolution     96 dpi
Horizontal resolution   96 dpi
Dataset size            27 GB
Image format            .JPG
Image size              3000 × 3000 px
Annotation formats      COCO 1.0 and segmentation mask 1.1

3. Methods
Nowadays, smartphones have become ubiquitous. Even in the most remote rural
areas, they have become the main communication technology due to their low
cost and portability. These devices also give farmers the ability to collect image data.
Therefore, in this work, the images were captured with smartphones so that the dataset
resembles real conditions as closely as possible.

3.1. Image Data Acquisition

Five devices from some of the leading manufacturers were selected for this work.
Using multiple devices ensures significant variability in the types of images captured
and enriches the dataset. The technical specifications of the smartphones used can be
seen below in Table 5.

Table 5. Technical specifications of the smartphone cameras used to capture the dataset images.

Smartphone               Camera Specifications
Samsung Galaxy A01       Dual rear camera consisting of a 13-megapixel f/2.0 main sensor and a 2-megapixel f/2.4 depth sensor.
Samsung Galaxy Note 10   Triple camera composed of an ultra-wide angle: 16 MP, f/2.2, 123°; a wide angle: 12 MP, AF, f/1.5–2.4; and a telephoto camera: 12 MP, f/2.1.
iPhone SE 2020           Single camera. 12 MP wide-angle camera, f/1.8 aperture.
[...]                    Dual camera. 16 MP main camera with f/1.8 aperture. 8 MP secondary super-wide-angle camera with f/2.4 aperture.
Motorola G9 plus         Quadruple camera. Main camera: 64 MP sensor, f/1.8 aperture and phase detection focus. Ultra-wide angle: 8 MP sensor, f/2.2 aperture. Macro: 2 MP sensor and f/2.2 aperture. Depth: 2 MP sensor and f/2.2 aperture.

The strategy for capturing images involved zigzag paths in the field enabling access
to each crop tree. During each pass, a person took images of a single class to allow easier
classification in the folders.
Between one and four images of each cocoa pod were taken from different angles to
obtain as many samples as possible (Figure 6).
The images were taken between 8:00 a.m. and 4:00 p.m. First, the size format for the
capture was adjusted on all smartphones to a 1:1 ratio, and then resizing was applied to
them using a script in the Python language with Pillow (Python Imaging Library), giving
them a final size of 3000 × 3000 px. The original images had sizes in the range from
3072 × 3072 to 4096 × 4096 px.
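
The resizing step could look roughly like the following sketch. Only the use of Python with Pillow and the 3000 × 3000 px target size come from the text. The function names, the folder layout, the LANCZOS resampling filter, and the JPEG quality setting are assumptions made for illustration and do not describe the authors' actual script.

```python
from pathlib import Path


def target_path(src: Path, src_root: Path, dst_root: Path) -> Path:
    """Mirror src (located under src_root) into dst_root, keeping subfolders."""
    return dst_root / src.relative_to(src_root)


def resize_folder(src_root: Path, dst_root: Path, size: int = 3000) -> int:
    """Resize every .jpg under src_root to size x size px; return the count."""
    # Pillow is imported here so that target_path stays usable without it.
    from PIL import Image

    count = 0
    for src in sorted(src_root.rglob("*.jpg")):
        dst = target_path(src, src_root, dst_root)
        dst.parent.mkdir(parents=True, exist_ok=True)
        with Image.open(src) as img:
            # Captures are assumed to be square already (1:1), so no cropping.
            resized = img.resize((size, size), Image.LANCZOS)
            resized.save(dst, "JPEG", quality=95)  # quality is an assumption
        count += 1
    return count
```

Because the captures were already taken at a 1:1 ratio, the sketch only downsamples and never crops. Note that `rglob("*.jpg")` is case-sensitive on some systems, so files ending in `.JPG` would need a second pattern.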

Figure 6. Image capture process of a cocoa pod from different angles.

3.2. Brix Degrees Data Acquisition

Some pods were selected to measure the Brix degrees of the internal sugar content, as
mentioned in Section 2.2. First, the pods chosen for sampling were perforated with a drill.
Then, a sample was extracted and placed in a handheld refractometer, and
finally, the data were recorded manually. Images of this process can be seen below in Figure 7.
Figure 7. Process of sampling to measure Brix degrees: (a) perforation of the cocoa pod with a drill,
(b) sample in a handheld refractometer, (c) reading of the Brix degree measurement, and
(d) measurement recording.

3.3. Data Annotation

The tool used for labeling images was CVAT (Computer Vision Annotation Tool) [16],

The dataset contains labels in two alternative formats: (1) COCO 1.0, which has files
in the format (*.json) for detection using bounding boxes and polygons, and (2) segmentation
mask 1.1, which contains separate folders for semantic segmentation and instance
segmentation. Examples of these masks can be seen in Figures 9 and 10.
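
For the segmentation mask 1.1 annotations, a CVAT export typically includes a `labelmap.txt` file that maps each label to the RGB color used in the mask images, with one `label:r,g,b::` entry per line. Assuming the dataset folders follow that layout (the text does not confirm it), the color of each class could be read with a sketch like the one below. The function name `parse_labelmap` is illustrative.

```python
def parse_labelmap(text: str) -> dict:
    """Map each label name to its (r, g, b) mask color.

    Expects lines of the form `label:r,g,b::`; lines starting with `#`
    and blank lines are skipped.
    """
    colors = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Only the first two fields (label and color) are used here.
        name, rgb = line.split(":")[:2]
        colors[name] = tuple(int(v) for v in rgb.split(","))
    return colors
```

With this mapping, the pixels of a semantic mask can be matched to a class by comparing their color with the entry for that class.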

4. Limitations
The RipSetCocoaCNCH12 dataset does not include classes of cocoa pods to discard.
In future work, diseased and rotten pods may be included. Additionally, more data should
be collected on other cocoa varieties.

Author Contributions: Conceptualization, J.F.R.-A., M.I.H.-P., and A.M.-T.; methodology, J.F.R.-A.
and M.I.S.-A.; software, J.F.R.-A.; validation, J.F.R.-A., M.I.H.-P., and A.M.-T.; formal analysis, M.I.S.-A.
and J.F.R.-A.; data curation, M.I.S.-A. and M.C.G.-C.; writing—original draft preparation, J.F.R.-A.;
writing—review and editing, J.F.R.-A., M.I.H.-P., and A.M.-T.; project administration, J.F.R.-A. All
authors have read and agreed to the published version of the manuscript.
Funding: This research was funded by Universidad EAFIT, project No. 819422.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: The data presented in this study are openly available in Zenodo at (accessed on 24 May 2023).
Acknowledgments: We want to thank the “Compañía Nacional de Chocolates” company for
providing access to the farm “La Granja” in the municipality of Támesis to take the images for this
work. Thanks for their support and for allowing us to use their facilities. Special thanks to the
BIOSUROESTE organization.
Conflicts of Interest: The authors declare no conflict of interest.

