Showing 1–17 of 17 results for author: Uehara, K

  1. arXiv:2405.07060

    cs.RO

    Memory-Maze: Scenario Driven Benchmark and Visual Language Navigation Model for Guiding Blind People

    Authors: Masaki Kuribayashi, Kohei Uehara, Allan Wang, Daisuke Sato, Simon Chu, Shigeo Morishima

    Abstract: Visual Language Navigation (VLN) powered navigation robots have the potential to guide blind people by understanding and executing route instructions provided by sighted passersby. This capability allows robots to operate in environments that are often unknown a priori. Existing VLN models are insufficient for the scenario of navigation guidance for blind people, as they need to understand routes…

    Submitted 11 May, 2024; originally announced May 2024.

  2. arXiv:2401.10005

    cs.CV cs.CL

    Advancing Large Multi-modal Models with Explicit Chain-of-Reasoning and Visual Question Generation

    Authors: Kohei Uehara, Nabarun Goswami, Hanqin Wang, Toshiaki Baba, Kohtaro Tanaka, Tomohiro Hashimoto, Kai Wang, Rei Ito, Takagi Naoya, Ryo Umagami, Yingyi Wen, Tanachai Anakewat, Tatsuya Harada

    Abstract: The increasing demand for intelligent systems capable of interpreting and reasoning about visual content requires the development of large Vision-and-Language Models (VLMs) that are not only accurate but also have explicit reasoning capabilities. This paper presents a novel approach to develop a VLM with the ability to conduct explicit reasoning based on visual content and textual instructions. We…

    Submitted 17 July, 2024; v1 submitted 18 January, 2024; originally announced January 2024.

  3. Scrappy: SeCure Rate Assuring Protocol with PrivacY

    Authors: Kosei Akama, Yoshimichi Nakatsuka, Masaaki Sato, Keisuke Uehara

    Abstract: Preventing abusive activities caused by adversaries accessing online services at a rate exceeding that expected by websites has become an ever-increasing problem. CAPTCHAs and SMS authentication are widely used to provide a solution by implementing rate limiting, although they are becoming less effective, and some are considered privacy-invasive. In light of this, many studies have proposed better…

    Submitted 1 December, 2023; originally announced December 2023.

    Journal ref: Network and Distributed System Security (NDSS) Symposium 2024

  4. arXiv:2210.05879

    cs.CV

    Learning by Asking Questions for Knowledge-based Novel Object Recognition

    Authors: Kohei Uehara, Tatsuya Harada

    Abstract: In real-world object recognition, there are numerous object classes to be recognized. Conventional image recognition based on supervised learning can only recognize object classes that exist in the training data, and thus has limited applicability in the real world. On the other hand, humans can recognize novel objects by asking questions and acquiring knowledge about them. Inspired by this, we st…

    Submitted 11 October, 2022; originally announced October 2022.

  5. arXiv:2203.07890

    cs.CV cs.CL

    K-VQG: Knowledge-aware Visual Question Generation for Common-sense Acquisition

    Authors: Kohei Uehara, Tatsuya Harada

    Abstract: Visual Question Generation (VQG) is a task to generate questions from images. When humans ask questions about an image, their goal is often to acquire some new knowledge. However, existing studies on VQG have mainly addressed question generation from answers or question categories, overlooking the objectives of knowledge acquisition. To introduce a knowledge acquisition perspective into VQG, we co…

    Submitted 15 March, 2022; originally announced March 2022.

  6. ViNTER: Image Narrative Generation with Emotion-Arc-Aware Transformer

    Authors: Kohei Uehara, Yusuke Mori, Yusuke Mukuta, Tatsuya Harada

    Abstract: Image narrative generation is a task to create a story from an image with a subjective viewpoint. Given the importance of the subjective feelings of writers, readers, and characters in storytelling, an image narrative generation method should consider human emotion. In this study, we propose a novel method of image narrative generation called ViNTER (Visual Narrative Transformer with Emotion arc R…

    Submitted 7 April, 2022; v1 submitted 15 February, 2022; originally announced February 2022.

  7. arXiv:2012.02346

    cs.CV cs.GR cs.LG

    ChartPointFlow for Topology-Aware 3D Point Cloud Generation

    Authors: Takumi Kimura, Takashi Matsubara, Kuniaki Uehara

    Abstract: A point cloud serves as a representation of the surface of a three-dimensional (3D) shape. Deep generative models have been adapted to model their variations typically using a map from a ball-like set of latent variables. However, previous approaches did not pay much attention to the topological structure of a point cloud, despite that a continuous map cannot express the varying numbers of holes a…

    Submitted 7 August, 2021; v1 submitted 3 December, 2020; originally announced December 2020.

    Comments: Accepted to ACM International Conference on Multimedia (ACMMM2021) as an oral presentation

    Journal ref: ACM International Conference on Multimedia (ACMMM2021)

  8. arXiv:1911.10354

    cs.CV cs.CL cs.LG

    Unsupervised Keyword Extraction for Full-sentence VQA

    Authors: Kohei Uehara, Tatsuya Harada

    Abstract: In the majority of the existing Visual Question Answering (VQA) research, the answers consist of short, often single words, as per instructions given to the annotators during dataset construction. This study envisions a VQA task for natural situations, where the answers are more likely to be sentences rather than single words. To bridge the gap between this natural VQA and existing VQA approaches,…

    Submitted 12 October, 2020; v1 submitted 23 November, 2019; originally announced November 2019.

    Comments: EMNLP 2020 workshop: NLP Beyond Text (NLPBT)

  9. arXiv:1905.02442

    cs.CV

    Interactive Video Retrieval with Dialog

    Authors: Sho Maeoki, Kohei Uehara, Tatsuya Harada

    Abstract: Now that everyone can easily record videos, the quantity of which is continuously increasing, research on methods for improved video retrieval is important in the contemporary world. In cases where target videos are to be identified within a large collection gathered by individuals, the appropriate information must be obtained to retrieve the correct video within a large number of similar items in…

    Submitted 7 May, 2019; originally announced May 2019.

  10. arXiv:1904.08504

    cs.CV cs.CL cs.LG cs.MM stat.ML

    Exploring Uncertainty Measures for Image-Caption Embedding-and-Retrieval Task

    Authors: Kenta Hama, Takashi Matsubara, Kuniaki Uehara, Jianfei Cai

    Abstract: With the wide development of black-box machine learning algorithms, particularly deep neural network (DNN), the practical demand for the reliability assessment is rapidly rising. On the basis of the concept that 'Bayesian deep learning knows what it does not know,' the uncertainty of DNN outputs has been investigated as a reliability measure for the classification and regression tasks. However, in…

    Submitted 9 April, 2019; originally announced April 2019.

  11. Data Augmentation using Random Image Cropping and Patching for Deep CNNs

    Authors: Ryo Takahashi, Takashi Matsubara, Kuniaki Uehara

    Abstract: Deep convolutional neural networks (CNNs) have achieved remarkable results in image processing tasks. However, their high expression ability risks overfitting. Consequently, data augmentation techniques have been proposed to prevent overfitting while enriching datasets. Recent CNN architectures with more parameters are rendering traditional data augmentation techniques insufficient. In this study,…

    Submitted 27 August, 2019; v1 submitted 22 November, 2018; originally announced November 2018.

    Comments: accepted version, 16 pages

    Journal ref: IEEE Transactions on Circuits and Systems for Video Technology, 2019

  12. arXiv:1808.02996

    cs.CV

    Object Detection in Satellite Imagery using 2-Step Convolutional Neural Networks

    Authors: Hiroki Miyamoto, Kazuki Uehara, Masahiro Murakawa, Hidenori Sakanashi, Hirokazu Nosato, Toru Kouyama, Ryosuke Nakamura

    Abstract: This paper presents an efficient object detection method from satellite imagery. Among a number of machine learning algorithms, we proposed a combination of two convolutional neural networks (CNN) aimed at high precision and high recall, respectively. We validated our models using golf courses as target objects. The proposed deep learning method demonstrated higher accuracy than previous object id…

    Submitted 8 August, 2018; originally announced August 2018.

    Comments: 4 pages, 5 figures

  13. arXiv:1808.01821

    cs.CV

    Visual Question Generation for Class Acquisition of Unknown Objects

    Authors: Kohei Uehara, Antonio Tejero-De-Pablos, Yoshitaka Ushiku, Tatsuya Harada

    Abstract: Traditional image recognition methods only consider objects belonging to already learned classes. However, since training a recognition model with every object class in the world is unfeasible, a way of getting information on unknown objects (i.e., objects whose class has not been learned) is necessary. A way for an image recognition system to learn new classes could be asking a human about object…

    Submitted 6 August, 2018; originally announced August 2018.

  14. arXiv:1807.05800

    cs.LG cs.CV stat.ML

    Deep Generative Model using Unregularized Score for Anomaly Detection with Heterogeneous Complexity

    Authors: Takashi Matsubara, Kenta Hama, Ryosuke Tachibana, Kuniaki Uehara

    Abstract: Accurate and automated detection of anomalous samples in a natural image dataset can be accomplished with a probabilistic model for end-to-end modeling of images. Such images have heterogeneous complexity, however, and a probabilistic model overlooks simply shaped objects with small anomalies. This is because the probabilistic model assigns undesirably lower likelihoods to complexly shaped objects…

    Submitted 4 September, 2018; v1 submitted 16 July, 2018; originally announced July 2018.

    Comments: An extended version of a manuscript in Proc. of The 2018 International Joint Conference on Neural Networks (IJCNN2018)

  15. Deep Neural Generative Model of Functional MRI Images for Psychiatric Disorder Diagnosis

    Authors: Takashi Matsubara, Tetsuo Tashiro, Kuniaki Uehara

    Abstract: Accurate diagnosis of psychiatric disorders plays a critical role in improving the quality of life for patients and potentially supports the development of new treatments. Many studies have been conducted on machine learning techniques that seek brain imaging data for specific biomarkers of disorders. These studies have encountered the following dilemma: A direct classification overfits to a small…

    Submitted 11 April, 2019; v1 submitted 18 December, 2017; originally announced December 2017.

    Comments: accepted version, 12 pages

    Journal ref: IEEE Transactions on Biomedical Engineering, 2019

  16. arXiv:1707.09099

    cs.CV

    Object Detection of Satellite Images Using Multi-Channel Higher-order Local Autocorrelation

    Authors: Kazuki Uehara, Hidenori Sakanashi, Hirokazu Nosato, Masahiro Murakawa, Hiroki Miyamoto, Ryosuke Nakamura

    Abstract: The Earth observation satellites have been monitoring the earth's surface for a long time, and the images taken by the satellites contain large amounts of valuable data. However, it is extremely hard work to manually analyze such huge data. Thus, a method of automatic object detection is needed for satellite images to facilitate efficient data analyses. This paper describes a new image feature ext…

    Submitted 27 July, 2017; originally announced July 2017.

    Comments: 6 pages, 2 column, 7 figures, Accepted by IEEE International Conference on Systems, Man, and Cybernetics (SMC) 2017

  17. A Novel Weight-Shared Multi-Stage CNN for Scale Robustness

    Authors: Ryo Takahashi, Takashi Matsubara, Kuniaki Uehara

    Abstract: Convolutional neural networks (CNNs) have demonstrated remarkable results in image classification for benchmark tasks and practical applications. The CNNs with deeper architectures have achieved even higher performance recently thanks to their robustness to the parallel shift of objects in images as well as their numerous parameters and the resulting high expression ability. However, CNNs have a l…

    Submitted 11 April, 2019; v1 submitted 12 February, 2017; originally announced February 2017.

    Comments: accepted version, 13 pages

    Journal ref: IEEE Transactions on Circuits and Systems for Video Technology, vol. 29, no. 4, 2019, pp. 1090-1101