
APPRENTICE: Towards a Cobot Helper in Assembly Lines

Yunus Terzioğlu (1), Özgür Aslan (2), Burak Bolat (2), Batuhan Bal (2), Tuğba Tümer (2),
Fatih Can Kurnaz (2), Sinan Kalkan (2), Erol Şahin (2)

(1) Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA
(2) KOVAN Research Lab., ROMER (Center for Robotics and AI Technologies) and Dept. of Computer Eng., Middle East Technical University, Turkey. {aslan.ozgur, burak.bolat, balbatuhan, tugba.tumer, fatih.kurnaz, skalkan, erol}@metu.edu.tr

Abstract— In this paper, we describe a vision of human-robot collaboration on assembly lines, where a collaborative robotic manipulator, a.k.a. cobot, operates as a work-mate for the worker. Specifically, the cobot, referred to as the "apprentice" since it lacks the ability to replace the worker, can only aid the worker by observing the worker's status during the assembly process and by handing over (and stowing away when done) the tools and the parts. Towards this end, we first describe the vision, then outline the challenges involved in developing an "apprentice" cobot, and share the current state of the work done.

I. INTRODUCTION

Collaborative robotic manipulators (a.k.a. cobots), designed to work safely alongside humans, are envisioned to take industrial automation to the next level and to increase the efficiency of the human worker. With the worldwide cobot market projected to grow five-fold between 2020 and 2025 [1], cobots are expected to take part in frequently changing tasks, mainly in small and medium-sized businesses. This brings many challenges pertaining to (i) how easily such robots can be used and how much programming and design they require, (ii) how user-friendly and helpful they are, and (iii) how appealing they are as co-workers.

II. THE APPRENTICE VISION FOR HUMAN-ROBOT COLLABORATION

The ÇIRAK and its successor KALFA (Apprentice and Journeyman in Turkish) projects are based on the observation that human workers are superior to cobots in tasks involving manipulation. Carrying out the versatile manipulation tasks of an assembly-line worker at the same speed and finesse is likely to remain beyond the reach of cobots in the near future. In our projects, we propose to develop technologies towards an "apprentice cobot", an under-skilled robotic helper that can track the state of the worker and of the assembly process, and hand over (and stow away when done) the tools and parts needed for the current state of the process. Such a system would require only limited manipulative capabilities, provided that it is coupled with a cognitively intelligent interaction with the human worker.

The "apprentice cobot" vision can be seen in its trailer video (Figure 1), where a human assembles an IKEA chair with a cobot as his apprentice. In the snapshots of this video, the apprentice cobot is: (a) waiting attentively, with a breathing animation (visible in the video but not captured in the snapshot), for the start of the assembly; (b) handing over the screwdriver to the worker; (c) attentively following the worker's assembly through mutual gaze; (d) leaning back and waving (visible in the video but not captured in the snapshot) to refuse to stow away the screwdriver, since the worker still has work to do with it; (e) reaching out to take the screwdriver once the current job involving it is completed; and (f) stowing away the screwdriver.

The realization of such an apprentice cobot requires:

• improved human-robot interaction skills through the use of non-verbal behaviors,
• perception abilities to track (1) human body pose and gaze direction and (2) tools and parts in the workspace,
• awareness of the status of the assembly process,
• the ability to discover assembly sequences,
• real-time motion planning in free space, and
• guarantees on safety.

In the rest of the paper, we discuss the challenges towards implementing these capabilities and briefly report our results.

Fig. 1. Snapshots from the apprentice trailer video (at https://youtu.be/CsV363jeuJs). See the text for description.

III. HUMAN-ROBOT INTERACTION USING NON-VERBAL BEHAVIORS

Drawing on the character animation principles Appeal, Arcing, and Secondary Action, we designed a set of social cues for a commercially popular cobot platform, a UR5 robot arm (Universal Robots, Odense, Denmark) equipped with a 2F-140 two-finger gripper (Robotiq, Lévis, Canada) (see Figure 1). These cues include giving the robot a head-on-neck look by augmenting its appearance and implementing gaze and posture cues (Appeal), generating smooth motion trajectories for the arm (Arcing), and introducing breathing motions during its idle operation (Secondary Action).

In the ÇIRAK project, it was shown that applying some of Disney's animation principles to a cobot improves the quality of human-robot interaction (HRI) [2]. The KALFA project will advance this proof-of-concept work in three directions to develop a full non-verbal communication ability in cobots: (1) after evaluating how all of Disney's animation principles can be applied to improve the quality of HRI, these principles will be formally defined and integrated into cobots as parameterized "HRI filters"; (2) methods for detecting the non-verbal communication cues of workers in assembly scenarios will be developed; and (3) non-verbal communication cues from workers will be associated with HRI filters in order to increase the harmony between the cobot and the worker. The effects of these methods on the quality of HRI will be measured in human-robot experiments.
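As a simplified illustration of how such animation principles can be packaged as parameterized motion filters, the Python sketch below overlays a low-amplitude sinusoidal "breathing" offset on an idle joint configuration (Secondary Action) and shapes point-to-point motions with a minimum-jerk profile (Arcing). The function names, joint values, and parameters are illustrative assumptions, not the project's actual implementation.

```python
import numpy as np

def minimum_jerk(q_start, q_goal, duration, t):
    """Minimum-jerk interpolation between two joint configurations.

    Produces the smooth, 'arced' profile associated with the Arcing
    principle: zero velocity and acceleration at both endpoints.
    """
    s = np.clip(t / duration, 0.0, 1.0)
    blend = 10 * s**3 - 15 * s**4 + 6 * s**5
    return np.asarray(q_start) + blend * (np.asarray(q_goal) - np.asarray(q_start))

def breathing_offset(t, amplitude=0.02, period=4.0, joint_mask=(0, 0, 1, 1, 0, 0)):
    """Low-amplitude sinusoidal joint offset (rad) applied to selected joints
    while the robot is idle, imitating breathing (Secondary Action)."""
    return amplitude * np.sin(2 * np.pi * t / period) * np.asarray(joint_mask, dtype=float)

# Illustrative usage: stream an idle pose with breathing at 25 Hz, and sample
# a smooth 3-second reach towards a (made-up) handover configuration.
q_idle = np.array([0.0, -1.57, 1.57, -1.57, -1.57, 0.0])   # a UR5-like home pose
q_handover = np.array([0.4, -1.2, 1.3, -1.7, -1.57, 0.0])  # hypothetical target
for step in range(100):
    t = step / 25.0
    q_cmd = q_idle + breathing_offset(t)
    # q_cmd would be sent to the robot's joint controller here.
q_mid_reach = minimum_jerk(q_idle, q_handover, duration=3.0, t=1.5)
```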

A. Related work

Nonverbal communication, which allows the transfer of information via social cues such as facial expressions, gestures, and body language, is essential for human-robot interaction [3]. Its importance has been identified in many studies. For example, Salem et al. stated that, regardless of gesture congruence, arm gestures lead to a more sympathetic, lively, active, and engaged interpretation of a robot [4]. Moreover, extroverted and abrupt gestures increase engagement and human awareness [5], [6]. Arm gestures have also been investigated with respect to synchronization congruence in human-robot interactions. Shen et al. reported that adapting robot velocity to the interacting participants increases gesture recognition, task performance, and social interaction [7]. Other studies found that participants synchronize their gesture frequency, but not their phase, to the robot's gestures [8], [9]. Gaze has a major effect in kinesics: Stanton and Stevens revealed the impact of gaze to be task-dependent [10]. Proxemics studies the usage of the interaction space; interactions at closer distances than in the human-human case have been observed [11], [12], [13]. Time perception and the manipulation of perceived time were studied by Komatsu and Yamada [14]. Song and Yamada analyzed color, sound, vibration, and their combinations in social robotics [15].

IV. PERCEPTION

A robotic co-worker needs to be able to perceive humans, their gaze (and intention), object parts, tools, and other utilities in the environment. This generally requires locating such objects in the camera frames by placing bounding boxes around them using deep object detectors. However, this is often insufficient, since robotic control takes place in 3D. Therefore, the perception of humans, objects, parts, tools, etc. ultimately needs to provide 3D information in the robot's coordinate frame.

Perception incurs many challenges: (i) The obtained 3D information needs to be very precise, since otherwise the assembly will fail. (ii) Perception should be robust to clutter, disorganized environments, and illumination conditions (e.g. the challenging conditions in Figure 2). (iii) Perception should be aware of ambiguous cases and should be explainable.
In the project, we constructed a perception pipeline for detecting tools, objects, humans, and their gazes with the help of deep learning applied to RGB-D video streams. Although existing object detection, pose estimation, human detection, and gaze estimation networks are very capable, they are not as robust as advertised on benchmarks; therefore, they often require tuning or combination with disambiguating sources of information, e.g. task knowledge, other types of contextual information, or temporal consistency. In addition, the available learning-based methods have been trained on datasets that do not match the characteristics of the working environments we consider, and therefore the collection of labeled datasets is often necessary. For instance, in our project we curated a tool detection dataset [16] specifically purposed for detecting tools in human-robot collaboration settings.
Fig. 2. An important problem in robotic assembly is perceiving the environment. However, common workplaces are unstructured and pose challenges to existing perception methods [16].
V. ASSEMBLY

Obtaining a precise assembly plan is a labor-intensive, tedious task. Despite efforts in the literature on learning how to combine parts to assemble the final object in 3D [17], [18], [19], these efforts are limited to toy settings (in terms of objects and environments), and crucial issues remain open: (i) it is still an open problem to assemble an object by looking at, e.g., the IKEA assembly manual; (ii) learning to do the assembly with a human co-worker and/or other robots has not been addressed; and (iii) detecting whether there has been an error in the assembly, predicting the error, isolating its source, and rectifying it are highly necessary for the widespread applicability and acceptance of such robots.

In our project, we are focusing our efforts along two directions:

Automating assembly plan creation using Deep Reinforcement Learning: In the ÇIRAK project, a precise assembly plan was manually prepared for step-by-step execution of actions. This plan included the parts, the tools, and details such as which tools should be used on which parts at which step of the assembly sequence. In the KALFA project, we propose to learn the assembly plan using Deep Reinforcement Learning by interacting with the parts and the tools within the cobot's simulation environment, thus facilitating the use of cobots in assembly scenarios by people with little technical skill.
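A minimal sketch of such a formulation is given below; it is our own toy illustration (tabular Q-learning over a hand-written precedence table) rather than the project's deep RL setup, but it shows the intended structure: the state is the set of attached parts, an action attaches a part, and only admissible attachments (which in the real setting would be judged by the physics simulation) are rewarded, so a learned policy implicitly encodes a feasible assembly sequence.

```python
import random

class AssemblyPlanEnv:
    """Toy assembly-sequencing environment (illustrative only).

    precedence[p] lists the parts that must already be attached before
    part p can be attached; in the real setting this feedback would come
    from the simulator rather than a hand-written table.
    """

    def __init__(self, parts, precedence):
        self.parts = parts
        self.precedence = precedence

    def reset(self):
        self.assembled = set()
        return frozenset(self.assembled)

    def step(self, part):
        done = False
        if part in self.assembled or not self.precedence[part] <= self.assembled:
            reward = -1.0                      # invalid or premature attachment
        else:
            self.assembled.add(part)
            reward = 1.0
            done = self.assembled == set(self.parts)
        return frozenset(self.assembled), reward, done

# Tabular Q-learning over the toy environment (epsilon-greedy exploration).
env = AssemblyPlanEnv(parts=["leg", "seat", "back"],
                      precedence={"leg": set(), "seat": {"leg"}, "back": {"seat"}})
Q = {}
for episode in range(500):
    state, done = env.reset(), False
    while not done:
        if random.random() < 0.2:
            action = random.choice(env.parts)
        else:
            action = max(env.parts, key=lambda a: Q.get((state, a), 0.0))
        next_state, reward, done = env.step(action)
        best_next = max(Q.get((next_state, a), 0.0) for a in env.parts)
        Q[(state, action)] = Q.get((state, action), 0.0) + 0.1 * (
            reward + 0.9 * best_next - Q.get((state, action), 0.0))
```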
Determining the sources of errors using a causality model: The ÇIRAK project and similar studies can detect an error during the assembly by comparing the current state of the assembly with the previously defined steps of a plan. However, they cannot detect the source of the errors or determine the steps that should be taken in order not to repeat them. The KALFA project proposes to learn a causal model in the simulation environment from the interplay between parts, tools, factors, and assembly stages, and to use this causal model to determine the sources of errors when an anomaly is detected.
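As a rough illustration of how such a causal model could be queried once learned, the sketch below ranks candidate root causes of an observed anomaly by walking the ancestors of the anomalous variable in a causal graph. The graph, variable names, and edge strengths here are hand-written placeholders; in the project they would be learned from simulated interactions.

```python
from collections import defaultdict

# Illustrative causal graph: an edge u -> v means "u can causally influence v".
edges = {
    "wrong_tool": ["loose_screw"],
    "loose_screw": ["leg_wobbly"],
    "misaligned_seat": ["leg_wobbly"],
}
strength = {("wrong_tool", "loose_screw"): 0.8,
            ("loose_screw", "leg_wobbly"): 0.7,
            ("misaligned_seat", "leg_wobbly"): 0.3}

parents = defaultdict(list)
for cause, effects in edges.items():
    for effect in effects:
        parents[effect].append(cause)

def rank_sources(anomaly):
    """Score each ancestor of the anomaly by the product of edge strengths
    along its path, a crude proxy for 'most likely root cause'."""
    scores = {}
    stack = [(anomaly, 1.0)]
    while stack:
        node, score = stack.pop()
        for cause in parents[node]:
            s = score * strength[(cause, node)]
            scores[cause] = max(scores.get(cause, 0.0), s)
            stack.append((cause, s))
    return sorted(scores.items(), key=lambda kv: -kv[1])

print(rank_sources("leg_wobbly"))
# [('loose_screw', 0.7), ('wrong_tool', 0.56), ('misaligned_seat', 0.3)]
```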
A. Related work

An important problem in robotic assembly is the precise generation of the assembly plan, which necessitates 3D perception of the object parts and of how those parts should be manipulated to perform the assembly. Advances in machine learning have paved the way for addressing this tedious task by directly learning the assembly from the 3D models of the parts and the target object, e.g. for the furniture assembly problem [17], [18], [19]. For example, Li et al. [17] proposed two network modules to extract information from the image of the assembled furniture and the part point clouds. Moreover, Huang et al. [18] introduced a dynamic graph learning framework to make predictions only from the part point clouds. In a similar line of work, Li et al. [19] use learned relationships to predict the position and the scale of parts instead of their poses (position and orientation).

For an assembly task, another important challenge is precise manipulation, which is hard to achieve using standard controllers. To this end, the trending approach is to use Reinforcement Learning (RL) to learn adaptive control strategies. Due to the complexity of the assembly task, RL algorithms can get stuck at local minima and yield sub-optimal controllers. To overcome this issue, recent studies [20], [21] propose guiding RL with additional information. For example, Thomas et al. [21] use CAD data to extract a geometric motion plan and a reward function that tracks the motion plan, while Luo et al. [20] utilize the force information obtained through the robot's interaction with the environment.
VI. SAFETY AND MOTION PLANNING

The cobot needs to perceive the occupied space (which varies due to the movement of the worker, the parts, and the tools) and should plan and adjust its motions in real time. This challenge was not addressed within our projects.

In order to ensure the safety of the human worker, we implemented a ROS watchdog node placed between the motion control node and the UR5, which checks the desired pose and velocity commands against predefined constraints to ensure that the cobot's operation remains within a desired volume excluding the worker's body (but not the arms) and under predefined velocity thresholds.
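The sketch below illustrates the kind of check the watchdog performs. It is plain Python rather than the actual ROS node, and the workspace box and speed limit are illustrative values; in the deployed system these constraints are configuration data and the node forwards only commands that pass the checks.

```python
import numpy as np

# Illustrative limits: an axis-aligned workspace box (meters, robot base frame)
# and a Cartesian speed threshold for collaborative operation.
WORKSPACE_MIN = np.array([-0.20, -0.60, 0.05])
WORKSPACE_MAX = np.array([0.80, 0.60, 0.90])
MAX_SPEED = 0.25  # m/s

def command_is_safe(target_position, velocity):
    """Return True if the commanded TCP target stays inside the allowed
    volume and the commanded Cartesian speed is below the threshold."""
    p = np.asarray(target_position)
    inside = np.all(p >= WORKSPACE_MIN) and np.all(p <= WORKSPACE_MAX)
    slow_enough = np.linalg.norm(velocity) <= MAX_SPEED
    return inside and slow_enough

# The watchdog forwards a command only when it passes; otherwise the command
# is dropped (or replaced by a protective stop) before reaching the UR5 driver.
if not command_is_safe([0.45, 0.10, 0.30], [0.05, 0.00, 0.02]):
    pass  # issue a protective stop instead of forwarding the command
```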
VII. CONCLUSION

In this paper, we presented the "apprentice cobot" vision for human-robot collaboration, described the challenges involved in accomplishing it, and shared our current results and future research plans.

ACKNOWLEDGMENT

This work is partially supported by TÜBİTAK through the projects "ÇIRAK: Compliant Robot Manipulator Support for Assembly Workers in Factories" (117E002) and "KALFA: New Methods for Assembly Scenarios with Collaborative Robots" (120E269).

REFERENCES

[1] Statista.com. Size of the global market for collaborative robots from 2017 to 2025. [Online]. Available: https://www.statista.com/statistics/748234/global-market-size-collaborative-robots/
[2] Y. Terzioğlu, B. Mutlu, and E. Şahin, "Designing social cues for collaborative robots: The role of gaze and breathing in human-robot collaboration," in Proceedings of the 2020 ACM/IEEE International Conference on Human-Robot Interaction, 2020, pp. 343–357.
[3] S. Saunderson and G. Nejat, "How robots influence humans: A survey of nonverbal communication in social human–robot interaction," International Journal of Social Robotics, vol. 11, no. 4, pp. 575–608, 2019.
[4] M. Salem, K. Rohlfing, S. Kopp, and F. Joublin, "A friendly gesture: Investigating the effect of multimodal robot behavior in human-robot interaction," in 2011 RO-MAN. IEEE, 2011, pp. 247–252.
[5] M. Salem, F. Eyssel, K. Rohlfing, S. Kopp, and F. Joublin, "To err is human(-like): Effects of robot gesture on perceived anthropomorphism and likability," International Journal of Social Robotics, vol. 5, no. 3, pp. 313–323, 2013.
[6] L. D. Riek, T.-C. Rabinowitch, P. Bremner, A. G. Pipe, M. Fraser, and P. Robinson, "Cooperative gestures: Effective signaling for humanoid robots," in 2010 5th ACM/IEEE International Conference on Human-Robot Interaction (HRI). IEEE, 2010, pp. 61–68.
[7] Q. Shen, K. Dautenhahn, J. Saunders, and H. Kose, "Can real-time, adaptive human–robot motor coordination improve humans' overall perception of a robot?" IEEE Transactions on Autonomous Mental Development, vol. 7, no. 1, pp. 52–64, 2015.
[8] T. Lorenz, A. Mörtl, and S. Hirche, "Movement synchronization fails during non-adaptive human-robot interaction," in 2013 8th ACM/IEEE International Conference on Human-Robot Interaction (HRI). IEEE, 2013, pp. 189–190.
[9] E. Ansermin, G. Mostafaoui, X. Sargentini, and P. Gaussier, "Unintentional entrainment effect in a context of human robot interaction: An experimental study," in 2017 26th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN). IEEE, 2017, pp. 1108–1114.
[10] K. L. Koay, D. S. Syrdal, M. Ashgari-Oskoei, M. L. Walters, and K. Dautenhahn, "Social roles and baseline proxemic preferences for a domestic service robot," International Journal of Social Robotics, vol. 6, no. 4, pp. 469–488, 2014.
[11] M. Walters, K. Dautenhahn, R. Boekhorst, K. Koay, C. Kaouri, S. Woods, C. Nehaniv, D. Lee, and I. Werry, "The influence of subjects' personality traits on personal spatial zones in a human-robot interaction experiment," 2005, pp. 347–352.
[12] D. Shi, E. G. Collins Jr, B. Goldiez, A. Donate, X. Liu, and D. Dunlap, "Human-aware robot motion planning with velocity constraints," in 2008 International Symposium on Collaborative Technologies and Systems. IEEE, 2008, pp. 490–497.
[13] K. L. Koay, D. S. Syrdal, M. Ashgari-Oskoei, M. L. Walters, and K. Dautenhahn, "Social roles and baseline proxemic preferences for a domestic service robot," International Journal of Social Robotics, vol. 6, no. 4, pp. 469–488, 2014.
[14] T. Komatsu and S. Yamada, "Exploring auditory information to change users' perception of time passing as shorter," in Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, ser. CHI '20. New York, NY, USA: Association for Computing Machinery, 2020, pp. 1–12. [Online]. Available: https://doi.org/10.1145/3313831.3376157
[15] S. Song and S. Yamada, "Expressing emotions through color, sound, and vibration with an appearance-constrained social robot," in Proceedings of the 2017 ACM/IEEE International Conference on Human-Robot Interaction, ser. HRI '17. New York, NY, USA: Association for Computing Machinery, 2017, pp. 2–11. [Online]. Available: https://doi.org/10.1145/2909824.3020239
[16] F. C. Kurnaz, B. Hocaoglu, M. K. Yılmaz, İ. Sülo, and S. Kalkan, "ALET (Automated Labeling of Equipment and Tools): A dataset for tool detection and human worker safety detection," in European Conference on Computer Vision. Springer, 2020, pp. 371–386.
[17] Y. Li, K. Mo, L. Shao, M. Sung, and L. Guibas, "Learning 3D part assembly from a single image," 2020.
[18] J. Huang, G. Zhan, Q. Fan, K. Mo, L. Shao, B. Chen, L. Guibas, and H. Dong, "Generative 3D part assembly via dynamic graph learning," 2020.
[19] J. Li, C. Niu, and K. Xu, "Learning part generation and assembly for structure-aware shape synthesis," 2020.
[20] J. Luo, E. Solowjow, C. Wen, J. A. Ojea, A. M. Agogino, A. Tamar, and P. Abbeel, "Reinforcement learning on variable impedance controller for high-precision robotic assembly," 2019.
[21] G. Thomas, M. Chien, A. Tamar, J. A. Ojea, and P. Abbeel, "Learning robotic assembly from CAD," 2018.
