Search | arXiv e-print repository

MRC-LSTM: A Hybrid Approach of Multi-scale Residual CNN and LSTM to Predict Bitcoin Price

Authors: Qiutong Guo, Shun Lei, Qing Ye, Zhiyang Fang

Abstract: Bitcoin, one of the major cryptocurrencies, presents great opportunities and challenges with its tremendous potential returns accompanying high risks. The high volatility of Bitcoin and the complex factors affecting it make the study of effective price forecasting methods of great practical importance to financial investors and researchers worldwide. In this paper, we propose a novel approach called MRC-LSTM, which combines a Multi-scale Residual Convolutional neural network (MRC) and a Long Short-Term Memory (LSTM) to implement Bitcoin closing price prediction. Specifically, the Multi-scale residual module is based on one-dimensional convolution, which is not only capable of adaptively detecting features of different time scales in multivariate time series, but also enables the fusion of these features. LSTM has the ability to learn long-term dependencies in series, and is widely used in financial time series forecasting. By mixing these two methods, the model is able to obtain highly expressive features and efficiently learn trends and interactions of multivariate time series. In the study, the impact of external factors such as macroeconomic variables and investor attention on the Bitcoin price is considered in addition to the trading information of the Bitcoin market. We performed experiments to predict the daily closing price of Bitcoin (USD), and the experimental results show that MRC-LSTM significantly outperforms a variety of other network structures. Furthermore, we conduct additional experiments on two other cryptocurrencies, Ethereum and Litecoin, to further confirm the effectiveness of the MRC-LSTM in short-term forecasting for multivariate time series of cryptocurrencies.

Submitted 3 May, 2021; originally announced May 2021.

arXiv:2105.00693 [pdf, other]

Heart-Darts: Classification of Heartbeats Using Differentiable Architecture Search

Authors: Jindi Lv, Qing Ye, Yanan Sun, Juan Zhao, Jiancheng Lv

Abstract: Arrhythmia is a cardiovascular disease that manifests as irregular heartbeats. In arrhythmia detection, the electrocardiogram (ECG) signal is an important diagnostic technique. However, manually evaluating ECG signals is a complicated and time-consuming task. With the application of convolutional neural networks (CNNs), the evaluation process has been accelerated and the performance is improved. It is noteworthy that the performance of CNNs heavily depends on their architecture design, which is a complex process grounded in expert experience and trial-and-error. In this paper, we propose a novel approach, Heart-Darts, to efficiently classify the ECG signals by automatically designing the CNN model with the differentiable architecture search (i.e., Darts, a cell-based neural architecture search method). Specifically, we initially search a cell architecture by Darts and then customize a novel CNN model for ECG classification based on the obtained cells. To investigate the efficiency of the proposed method, we evaluate the constructed model on the MIT-BIH arrhythmia database. Additionally, the extensibility of the proposed CNN model is validated on two other new databases. Extensive experimental results demonstrate that the proposed method outperforms several state-of-the-art CNN models in ECG classification in terms of both performance and generalization capability.

Submitted 3 May, 2021; originally announced May 2021.

arXiv:2104.14506 [pdf, other]

Explainable AI For COVID-19 CT Classifiers: An Initial Comparison Study

Authors: Qinghao Ye, Jun Xia, Guang Yang

Abstract: Artificial Intelligence (AI) has made leapfrogs in development across all the industrial sectors especially when deep learning has been introduced. Deep learning helps to learn the behaviour of an entity through methods of recognising and interpreting patterns. Despite its limitless potential, the mystery is how deep learning algorithms make a decision in the first place. Explainable AI (XAI) is the key to unlocking AI and the black-box for deep learning. XAI is an AI model that is programmed to explain its goals, logic, and decision making so that the end users can understand. The end users can be domain experts, regulatory agencies, managers and executive board members, data scientists, users that use AI, with or without awareness, or someone who is affected by the decisions of an AI model. Chest CT has emerged as a valuable tool for the clinical diagnosis and treatment management of the lung diseases associated with COVID-19. AI can support rapid evaluation of CT scans to differentiate COVID-19 findings from other lung diseases. However, how these AI tools or deep learning algorithms reach such a decision and which are the most influential features derived from these neural networks with typically deep layers are not clear. The aim of this study is to propose and develop XAI strategies for COVID-19 classification models with an investigation of comparison. The results demonstrate promising quantification and qualitative visualisations that can further enhance the clinician's understanding and decision making with more granular information from the results given by the learned XAI models.

Submitted 25 April, 2021; originally announced April 2021.

Comments: 6 pages, 4 figures, IEEE CBMS 2021

MSC Class: 68T01

arXiv:2104.08840 [pdf, other]

On the Influence of Masking Policies in Intermediate Pre-training

Authors: Qinyuan Ye, Belinda Z. Li, Sinong Wang, Benjamin Bolte, Hao Ma, Wen-tau Yih, Xiang Ren, Madian Khabsa

Abstract: Current NLP models are predominantly trained through a two-stage "pre-train then fine-tune" pipeline. Prior work has shown that inserting an intermediate pre-training stage, using heuristic masking policies for masked language modeling (MLM), can significantly improve final performance. However, it is still unclear (1) in what cases such intermediate pre-training is helpful, (2) whether hand-crafted heuristic objectives are optimal for a given task, and (3) whether a masking policy designed for one task is generalizable beyond that task. In this paper, we perform a large-scale empirical study to investigate the effect of various masking policies in intermediate pre-training with nine selected tasks across three categories. Crucially, we introduce methods to automate the discovery of optimal masking policies via direct supervision or meta-learning. We conclude that the success of intermediate pre-training is dependent on appropriate pre-train corpus, selection of output format (i.e., masked spans or full sentence), and clear understanding of the role that MLM plays for the downstream task. In addition, we find our learned masking policies outperform the heuristic of masking named entities on TriviaQA, and policies learned from one task can positively transfer to other tasks in certain cases, inviting future research in this direction.

Submitted 30 September, 2021; v1 submitted 18 April, 2021; originally announced April 2021.

Comments: Accepted to EMNLP 2021. Camera-ready version

arXiv:2104.08835 [pdf, other]

CrossFit: A Few-shot Learning Challenge for Cross-task Generalization in NLP

Authors: Qinyuan Ye, Bill Yuchen Lin, Xiang Ren

Abstract: Humans can learn a new language task efficiently with only a few examples, by leveraging their knowledge obtained when learning prior tasks. In this paper, we explore whether and how such cross-task generalization ability can be acquired, and further applied to build better few-shot learners across diverse NLP tasks. We introduce CrossFit, a problem setup for studying cross-task generalization ability, which standardizes seen/unseen task partitions, data access during different learning stages, and the evaluation protocols. To instantiate different seen/unseen task partitions in CrossFit and facilitate in-depth analysis, we present the NLP Few-shot Gym, a repository of 160 diverse few-shot NLP tasks created from open-access NLP datasets and converted to a unified text-to-text format. Our analysis reveals that the few-shot learning ability on unseen tasks can be improved via an upstream learning stage using a set of seen tasks. We also observe that the selection of upstream learning tasks can significantly influence few-shot performance on unseen tasks, calling for further analysis on task similarity and transferability.

Submitted 30 September, 2021; v1 submitted 18 April, 2021; originally announced April 2021.

Comments: Accepted to EMNLP 2021. Camera-ready version. Code: https://github.com/INK-USC/CrossFit

arXiv:2104.03775 [pdf, other]

Geometry-based Distance Decomposition for Monocular 3D Object Detection

Authors: Xuepeng Shi, Qi Ye, Xiaozhi Chen, Chuangrong Chen, Zhixiang Chen, Tae-Kyun Kim

Abstract: Monocular 3D object detection is of great significance for autonomous driving but remains challenging. The core challenge is to predict the distance of objects in the absence of explicit depth information. Unlike regressing the distance as a single variable in most existing methods, we propose a novel geometry-based distance decomposition to recover the distance by its factors. The decomposition factors the distance of objects into the most representative and stable variables, i.e. the physical height and the projected visual height in the image plane. Moreover, the decomposition maintains the self-consistency between the two heights, leading to robust distance prediction when both predicted heights are inaccurate. The decomposition also enables us to trace the causes of the distance uncertainty for different scenarios. Such decomposition makes the distance prediction interpretable, accurate, and robust. Our method directly predicts 3D bounding boxes from RGB images with a compact architecture, making the training and inference simple and efficient. The experimental results show that our method achieves the state-of-the-art performance on the monocular 3D Object Detection and Birds Eye View tasks of the KITTI dataset, and can generalize to images with different camera intrinsics.

Submitted 29 June, 2022; v1 submitted 8 April, 2021; originally announced April 2021.

Comments: Accepted to ICCV 2021. Code: https://github.com/Rock-100/MonoDet
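
The abstract above names two factors for distance: the physical height and the projected visual height. A minimal sketch of how such a decomposition could look, assuming the standard pinhole-camera relation Z = f * H / h (the abstract does not state the formula, so this form, the function name, and all numbers are illustrative assumptions, not the authors' implementation):

```python
def distance_from_heights(focal_px: float,
                          physical_height_m: float,
                          visual_height_px: float) -> float:
    """Pinhole-camera relation Z = f * H / h (assumed form, see lead-in).

    focal_px          -- focal length in pixels (hypothetical value below)
    physical_height_m -- object height in metres
    visual_height_px  -- projected object height in the image, in pixels
    """
    if visual_height_px <= 0:
        raise ValueError("projected height must be positive")
    return focal_px * physical_height_m / visual_height_px


# A 1.5 m tall object that appears 35 px tall with a 700 px focal length.
z = distance_from_heights(700.0, 1.5, 35.0)

# If both heights are off by the same relative error (here +10%),
# the ratio H / h, and therefore the distance, is unchanged.
z_scaled = distance_from_heights(700.0, 1.5 * 1.1, 35.0 * 1.1)
```

Under this assumed relation, errors that scale both heights by the same factor cancel in the ratio, which is one way to read the abstract's remark about self-consistency between the two heights.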

arXiv:2104.02324 [pdf, other]

Multiple instance active learning for object detection

Authors: Tianning Yuan, Fang Wan, Mengying Fu, Jianzhuang Liu, Songcen Xu, Xiangyang Ji, Qixiang Ye

Abstract: Despite the substantial progress of active learning for image recognition, an instance-level active learning method specific to object detection is still lacking. In this paper, we propose Multiple Instance Active Object Detection (MI-AOD), to select the most informative images for detector training by observing instance-level uncertainty. MI-AOD defines an instance uncertainty learning module, which leverages the discrepancy of two adversarial instance classifiers trained on the labeled set to predict instance uncertainty of the unlabeled set. MI-AOD treats unlabeled images as instance bags and feature anchors in images as instances, and estimates the image uncertainty by re-weighting instances in a multiple instance learning (MIL) fashion. Iterative instance uncertainty learning and re-weighting facilitate suppressing noisy instances, toward bridging the gap between instance uncertainty and image-level uncertainty. Experiments validate that MI-AOD sets a solid baseline for instance-level active learning. On commonly used object detection datasets, MI-AOD outperforms state-of-the-art methods with significant margins, particularly when the labeled sets are small. Code is available at https://github.com/yuantn/MI-AOD.

Submitted 6 April, 2021; originally announced April 2021.

Comments: 10 pages, 7 figures, 5 tables. Code is available at https://github.com/yuantn/MI-AOD
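
The two ideas in the MI-AOD abstract, classifier discrepancy as instance uncertainty and re-weighting instances into an image-level score, can be sketched roughly as follows. This is only an illustration: the absolute-difference discrepancy, the weighted average, and the function names are assumptions made here, since the abstract does not give the actual formulation.

```python
from typing import List


def instance_uncertainty(probs_a: List[float], probs_b: List[float]) -> List[float]:
    """Per-instance uncertainty as the absolute disagreement between the
    predictions of two classifiers (an assumed discrepancy measure)."""
    return [abs(a - b) for a, b in zip(probs_a, probs_b)]


def image_uncertainty(uncertainties: List[float], weights: List[float]) -> float:
    """Image-level uncertainty as a weighted average of instance
    uncertainties (an assumed re-weighting scheme)."""
    total = sum(weights)
    if total == 0:
        return 0.0
    return sum(u * w for u, w in zip(uncertainties, weights)) / total


# Three instances (anchors) in one unlabeled image; all values are made up.
u = instance_uncertainty([0.9, 0.2, 0.6], [0.8, 0.7, 0.6])
score = image_uncertainty(u, [1.0, 3.0, 1.0])
```

In an active learning loop, images with the highest such score would be the candidates sent for labeling; down-weighting an instance reduces the influence of a noisy anchor on that score.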

arXiv:2104.02281 [pdf, other]

Learnable Expansion-and-Compression Network for Few-shot Class-Incremental Learning

Authors: Boyu Yang, Mingbao Lin, Binghao Liu, Mengying Fu, Chang Liu, Rongrong Ji, Qixiang Ye

Abstract: Few-shot class-incremental learning (FSCIL), which aims to continuously expand a model's representation capacity under limited supervision, is an important yet challenging problem. On the one hand, when fitting new tasks (novel classes), features trained on old tasks (old classes) could significantly drift, causing catastrophic forgetting. On the other hand, training a large number of model parameters with few-shot novel-class examples leads to model over-fitting. In this paper, we propose a learnable expansion-and-compression network (LEC-Net), with the aim to simultaneously solve catastrophic forgetting and model over-fitting problems in a unified framework. By tentatively expanding network nodes, LEC-Net enlarges the representation capacity of features, alleviating feature drift of the old network from the perspective of model regularization. By compressing the expanded network nodes, LEC-Net pursues a minimal increase of model parameters, alleviating over-fitting of the expanded network from the perspective of compact representation. Experiments on the CUB/CIFAR-100 datasets show that LEC-Net improves the baseline by 5~7% while outperforming the state-of-the-art by 5~6%. LEC-Net also demonstrates the potential to be a general incremental learning approach with dynamic model expansion capability.

Submitted 6 April, 2021; originally announced April 2021.

arXiv:2103.16800 [pdf, ps, other]

Optimal Retirement Time and Consumption with the Variation in Habitual Persistence

Authors: Lin He, Zongxia Liang, Yilun Song, Qi Ye

Abstract: In this paper, we study the individual's optimal retirement time and optimal consumption under habitual persistence. Because the individual feels equally satisfied with a lower habitual level and is more reluctant to change the habitual level after retirement, we assume that both the level and the sensitivity of the habitual consumption decline at the time of retirement. We establish the concise form of the habitual evolutions, and obtain the optimal retirement time and consumption policy based on martingale and duality methods. The optimal consumption experiences a sharp decline at retirement, but the excess consumption rises because of the reduced sensitivity of the habitual level. This result helps explain the "retirement consumption puzzle". In particular, the optimal retirement and consumption policies are balanced between the wealth effect and the habitual effect. Larger wealth increases consumption, and larger growth inertia (sensitivity) of the habitual level decreases consumption and brings forward the retirement time.

Submitted 31 March, 2021; originally announced March 2021.

MSC Class: 91G05; 91G10; 91B05; 91B06

arXiv:2103.14862 [pdf, other]

TS-CAM: Token Semantic Coupled Attention Map for Weakly Supervised Object Localization

Authors: Wei Gao, Fang Wan, Xingjia Pan, Zhiliang Peng, Qi Tian, Zhenjun Han, Bolei Zhou, Qixiang Ye

Abstract: Weakly supervised object localization (WSOL) is a challenging problem in which only image category labels are given but object localization models must be learned. Optimizing a convolutional neural network (CNN) for classification tends to activate local discriminative regions while ignoring complete object extent, causing the partial activation issue. In this paper, we argue that partial activation is caused by the intrinsic characteristics of CNN, where the convolution operations produce local receptive fields and have difficulty capturing long-range feature dependency among pixels. We introduce the token semantic coupled attention map (TS-CAM) to take full advantage of the self-attention mechanism in visual transformer for long-range dependency extraction. TS-CAM first splits an image into a sequence of patch tokens for spatial embedding, which produce attention maps of long-range visual dependency to avoid partial activation. TS-CAM then re-allocates category-related semantics for patch tokens, enabling each of them to be aware of object categories. TS-CAM finally couples the patch tokens with the semantic-agnostic attention map to achieve semantic-aware localization. Experiments on the ILSVRC/CUB-200-2011 datasets show that TS-CAM outperforms its CNN-CAM counterparts by 7.1%/27.1% for WSOL, achieving state-of-the-art performance.

Submitted 3 August, 2021; v1 submitted 27 March, 2021; originally announced March 2021.

Comments: Accepted by ICCV2021 (poster)

arXiv:2103.10415 [pdf, other]

Refining Language Models with Compositional Explanations

Authors: Huihan Yao, Ying Chen, Qinyuan Ye, Xisen Jin, Xiang Ren

Abstract: Pre-trained language models have been successful on text classification tasks, but are prone to learning spurious correlations from biased datasets, and are thus vulnerable when making inferences in a new domain. Prior work reveals such spurious patterns via post-hoc explanation algorithms which compute the importance of input features. Further, the model is regularized to align the importance scores with human knowledge, so that the unintended model behaviors are eliminated. However, such a regularization technique lacks flexibility and coverage, since only importance scores towards a pre-defined list of features are adjusted, while more complex human knowledge such as feature interaction and pattern generalization can hardly be incorporated. In this work, we propose to refine a learned language model for a target domain by collecting human-provided compositional explanations regarding observed biases. By parsing these explanations into executable logic rules, the human-specified refinement advice from a small set of explanations can be generalized to more training examples. We additionally introduce a regularization term allowing adjustments for both importance and interaction of features to better rectify model behavior. We demonstrate the effectiveness of the proposed approach on two text classification tasks by showing improved performance in target domain as well as improved model fairness after refinement.

Submitted 31 December, 2021; v1 submitted 18 March, 2021; originally announced March 2021.

Comments: Accepted to NeurIPS 2021. Camera-ready version. Code: https://github.com/INK-USC/expl-refinement

arXiv:2103.05293 [pdf, other]

Decentralized Circle Formation Control for Fish-like Robots in the Real-world via Reinforcement Learning

Authors: Tianhao Zhang, Yueheng Li, Shuai Li, Qiwei Ye, Chen Wang, Guangming Xie

Abstract: In this paper, the circle formation control problem is addressed for a group of cooperative underactuated fish-like robots involving unknown nonlinear dynamics and disturbances. Based on the reinforcement learning and cognitive consistency theory, we propose a decentralized controller without the knowledge of the dynamics of the fish-like robots. The proposed controller can be transferred from simulation to reality. It is only trained in our established simulation environment, and the trained controller can be deployed to real robots without any manual tuning. Simulation results confirm that the proposed model-free robust formation control method is scalable with respect to the group size of the robots and outperforms other representative RL algorithms. Several experiments in the real world verify the effectiveness of our RL-based approach for circle formation control.

Submitted 9 March, 2021; originally announced March 2021.

Comments: to be published in ICRA2021

MSC Class: 68T40

ACM Class: I.2.9

arXiv:2103.04612 [pdf, other]

Beyond Max-Margin: Class Margin Equilibrium for Few-shot Object Detection

Authors: Bohao Li, Boyu Yang, Chang Liu, Feng Liu, Rongrong Ji, Qixiang Ye

Abstract: Few-shot object detection has made substantial progress by representing novel class objects using the feature representation learned upon a set of base class objects. However, an implicit contradiction between novel class classification and representation is unfortunately ignored. On the one hand, to achieve accurate novel class classification, the distributions of any two base classes must be far away from each other (max-margin). On the other hand, to precisely represent novel classes, the distributions of base classes should be close to each other to reduce the intra-class distance of novel classes (min-margin). In this paper, we propose a class margin equilibrium (CME) approach, with the aim to optimize both feature space partition and novel class reconstruction in a systematic way. CME first converts the few-shot detection problem to the few-shot classification problem by using a fully connected layer to decouple localization features. CME then reserves adequate margin space for novel classes by introducing simple-yet-effective class margin loss during feature learning. Finally, CME pursues margin equilibrium by disturbing the features of novel class instances in an adversarial min-max fashion. Experiments on Pascal VOC and MS-COCO datasets show that CME significantly improves upon two baseline detectors (up to $3\sim 5\%$ on average), achieving state-of-the-art performance. Code is available at https://github.com/Bohao-Lee/CME.

Submitted 31 May, 2021; v1 submitted 8 March, 2021; originally announced March 2021.

Comments: This paper has been modified by the author due to errors

arXiv:2102.01998 [pdf, other]

Unbox the Black-box for the Medical Explainable AI via Multi-modal and Multi-centre Data Fusion: A Mini-Review, Two Showcases and Beyond

Authors: Guang Yang, Qinghao Ye, Jun Xia

Abstract: Explainable Artificial Intelligence (XAI) is an emerging research topic of machine learning aimed at unboxing how AI systems' black-box choices are made. This research field inspects the measures and models involved in decision-making and seeks solutions to explain them explicitly. Many of the machine learning algorithms cannot manifest how and why a decision has been cast. This is particularly true of the most popular deep neural network approaches currently in use. Consequently, our confidence in AI systems can be hindered by the lack of explainability in these black-box models. The XAI becomes more and more crucial for deep learning powered applications, especially for medical and healthcare studies, although in general these deep neural networks can return an arresting dividend in performance. The insufficient explainability and transparency in most existing AI systems can be one of the major reasons that successful implementation and integration of AI tools into routine clinical practice are uncommon. In this study, we first surveyed the current progress of XAI and in particular its advances in healthcare applications. We then introduced our solutions for XAI leveraging multi-modal and multi-centre data fusion, and subsequently validated them in two showcases following real clinical scenarios. Comprehensive quantitative and qualitative analyses can prove the efficacy of our proposed XAI solutions, from which we can envisage successful applications in a broader range of clinical questions.

Submitted 3 February, 2021; originally announced February 2021.

Comments: 68 pages, 19 figures, submitted to the Information Fusion journal

arXiv:2101.07985 [pdf, other]

Network Pruning using Adaptive Exemplar Filters

Authors: Mingbao Lin, Rongrong Ji, Shaojie Li, Yan Wang, Yongjian Wu, Feiyue Huang, Qixiang Ye

Abstract: Popular network pruning algorithms reduce redundant information by optimizing hand-crafted models, and may lead to suboptimal performance and long filter-selection times. We innovatively introduce adaptive exemplar filters to simplify the algorithm design, resulting in an automatic and efficient pruning approach called EPruner. Inspired by the face recognition community, we use a message passing algorithm, Affinity Propagation, on the weight matrices to obtain an adaptive number of exemplars, which then act as the preserved filters. EPruner breaks the dependency on the training data in determining the "important" filters and allows the CPU implementation in seconds, an order of magnitude faster than GPU based SOTAs. Moreover, we show that the weights of exemplars provide a better initialization for the fine-tuning. On VGGNet-16, EPruner achieves a 76.34%-FLOPs reduction by removing 88.80% parameters, with 0.06% accuracy improvement on CIFAR-10. In ResNet-152, EPruner achieves a 65.12%-FLOPs reduction by removing 64.18% parameters, with only 0.71% top-5 accuracy loss on ILSVRC-2012. Our code is available at https://github.com/lmbxmu/EPruner.

Submitted 26 May, 2021; v1 submitted 20 January, 2021; originally announced January 2021.

Comments: Accepted by IEEE Transactions on Neural Networks and Learning Systems (IEEE TNNLS)

arXiv:2101.00420 [pdf, other]

Learning to Generate Task-Specific Adapters from Task Description

Authors: Qinyuan Ye, Xiang Ren

Abstract: Pre-trained text-to-text transformers such as BART have achieved impressive performance across a range of NLP tasks. A recent study further shows that they can learn to generalize to novel tasks, by including task descriptions as part of the source sequence and training the model with (source, target) examples. At test time, these fine-tuned models can make inferences on new tasks using the new task descriptions as part of the input. However, this approach has potential limitations, as the model learns to solve individual (source, target) examples (i.e., at the instance level), instead of learning to solve tasks by taking all examples within a task as a whole (i.e., at the task level). To this end, we introduce Hypter, a framework that improves a text-to-text transformer's generalization ability to unseen tasks by training a hypernetwork to generate task-specific, light-weight adapters from task descriptions. Experiments on the ZEST dataset and a synthetic SQuAD dataset demonstrate that Hypter improves upon fine-tuning baselines. Notably, when using BART-Large as the main network, Hypter brings 11.3% comparative improvement on the ZEST dataset.

Submitted 15 June, 2021; v1 submitted 2 January, 2021; originally announced January 2021.

Comments: Accepted to ACL 2021. Camera-ready version. Code: https://github.com/INK-USC/hypter

arXiv:2012.15856 [pdf, other]

Studying Strategically: Learning to Mask for Closed-book QA

Authors: Qinyuan Ye, Belinda Z. Li, Sinong Wang, Benjamin Bolte, Hao Ma, Wen-tau Yih, Xiang Ren, Madian Khabsa

Abstract: Closed-book question-answering (QA) is a challenging task that requires a model to directly answer questions without access to external knowledge. It has been shown that directly fine-tuning pre-trained language models with (question, answer) examples yields surprisingly competitive performance, which is further improved upon through adding an intermediate pre-training stage between general pre-training and fine-tuning. Prior work used a heuristic during this intermediate stage, whereby named entities and dates are masked, and the model is trained to recover these tokens. In this paper, we aim to learn the optimal masking strategy for the intermediate pre-training stage. We first train our masking policy to extract spans that are likely to be tested, using supervision from the downstream task itself, then deploy the learned policy during intermediate pre-training. Thus, our policy packs task-relevant knowledge into the parameters of a language model. Our approach is particularly effective on TriviaQA, outperforming strong heuristics when used to pre-train BART.

Submitted 1 January, 2021; v1 submitted 31 December, 2020; originally announced December 2020.

arXiv:2012.12356 [pdf, other]

Unbiased Subdata Selection for Fair Classification: A Unified Framework and Scalable Algorithms

Authors: Qing Ye, Weijun Xie

Abstract: As an important problem in modern data analytics, classification has found a variety of applications in different domains. Unlike conventional classification approaches, fair classification concerns unintentional biases against sensitive features (e.g., gender, race). Due to the high nonconvexity of fairness measures, existing methods are often unable to model exact fairness, which can cause inferior fair classification outcomes. This paper fills the gap by developing a novel unified framework to jointly optimize accuracy and fairness. The proposed framework is versatile: it can precisely incorporate different fairness measures studied in the literature and is applicable to many classifiers, including deep classification models. Specifically, in this paper, we first prove Fisher consistency of the proposed framework. We then show that many classification models within this framework can be recast as mixed-integer convex programs, which can be solved effectively by off-the-shelf solvers when the instance sizes are moderate and can be used as benchmarks to compare the efficiency of approximation algorithms. We prove that in the proposed framework, when the classification outcomes are known, the resulting problem, termed "unbiased subdata selection," is strongly polynomial-solvable and can be used to enhance classification fairness by selecting more representative data points. This motivates us to develop an iterative refining strategy (IRS) to solve large-scale instances, where we improve the classification accuracy and conduct the unbiased subdata selection in an alternating fashion. We study the convergence property of IRS and derive its approximation bound. More broadly, this framework can be leveraged to improve classification models with unbalanced data by taking the F1 score into consideration.

Submitted 24 December, 2020; v1 submitted 22 December, 2020; originally announced December 2020.

Comments: 42 pages, 4 Figures

arXiv:2012.09410 [pdf, ps, other]

doi 10.1016/j.optlaseng.2021.106808

Robust Phase Retrieval with Green Noise Binary Masks

Authors: Qiuliang Ye, Yuk-Hee Chan, Michael G. Somekh, Daniel P. K. Lun

Abstract: Phase retrieval with pre-defined optical masks can provide extra constraints and thus achieve improved performance. Recent progress in optimization theory demonstrates the superiority of random masks in phase retrieval algorithms. However, traditional approaches focus only on the randomness of the masks and ignore their non-bandlimited nature. When these masks are used in the reconstruction process for phase retrieval, their high-frequency part is often removed, which leads to degraded performance. Based on the concept of digital halftoning, this paper proposes a green noise binary masking scheme that can greatly reduce the high-frequency content of the masks while fulfilling the randomness requirement. The experimental results show that the proposed green noise binary masking scheme outperforms the traditional ones when used in a DMD-based coded diffraction pattern phase retrieval system.

Submitted 14 September, 2021; v1 submitted 17 December, 2020; originally announced December 2020.

arXiv:2012.03149 [pdf, other]

Adaptive Weighted Discriminator for Training Generative Adversarial Networks

Authors: Vasily Zadorozhnyy, Qiang Cheng, Qiang Ye

Abstract: Generative adversarial network (GAN) has become one of the most important neural network models for classical unsupervised machine learning. A variety of discriminator loss functions have been developed to train GAN's discriminators and they all have a common structure: a sum of real and fake losses that only depends on the actual and generated data respectively. One challenge associated with an equally weighted sum of two losses is that the training may benefit one loss but harm the other, which we show causes instability and mode collapse. In this paper, we introduce a new family of discriminator loss functions that adopts a weighted sum of real and fake parts, which we call adaptive weighted loss functions or aw-loss functions. Using the gradients of the real and fake parts of the loss, we can adaptively choose weights to train a discriminator in the direction that benefits the GAN's stability. Our method can be potentially applied to any discriminator model with a loss that is a sum of the real and fake parts. Experiments validated the effectiveness of our loss functions on an unconditional image generation task, improving the baseline results by a significant margin on CIFAR-10, STL-10, and CIFAR-100 datasets in Inception Scores and FID.

Submitted 27 May, 2021; v1 submitted 5 December, 2020; originally announced December 2020.

Comments: 16 pages total, 7 figures, 6 tables and 2 algorithms

arXiv:2012.02188 [pdf, other]

Stochastic Gradient Descent with Nonlinear Conjugate Gradient-Style Adaptive Momentum

Authors: Bao Wang, Qiang Ye

Abstract: Momentum plays a crucial role in stochastic gradient-based optimization algorithms for accelerating or improving the training of deep neural networks (DNNs). In deep learning practice, the momentum is usually weighted by a well-calibrated constant. However, tuning hyperparameters for momentum can be a significant computational burden. In this paper, we propose a novel adaptive momentum for improving DNN training; this adaptive momentum, with no momentum-related hyperparameter required, is motivated by the nonlinear conjugate gradient (NCG) method. Stochastic gradient descent (SGD) with this new adaptive momentum eliminates the need for momentum hyperparameter calibration, allows a significantly larger learning rate, accelerates DNN training, and improves the final accuracy and robustness of the trained DNNs. For instance, SGD with this adaptive momentum reduces classification errors for training ResNet110 for CIFAR10 and CIFAR100 from $5.25\%$ to $4.64\%$ and $23.75\%$ to $20.03\%$, respectively. Furthermore, SGD with the new adaptive momentum also benefits adversarial training and improves adversarial robustness of the trained DNNs.

Submitted 3 December, 2020; originally announced December 2020.

Comments: 17 pages, 2 figures

MSC Class: 65K10

ACM Class: G.1; I.2

arXiv:2012.00396 [pdf, ps, other]

doi 10.1016/j.dam.2021.11.016

A bridge between the minimal doubly resolving set problem in (folded) hypercubes and the coin weighing problem

Authors: Changhong Lu, Qingjie Ye

Abstract: In this paper, we consider the minimal doubly resolving set problem in Hamming graphs, hypercubes and folded hypercubes. We prove that the minimal doubly resolving set problem in hypercubes is equivalent to the coin weighing problem. Then we answer an open question on the minimal doubly resolving set problem in hypercubes. We disprove a conjecture on the metric dimension problem in folded hypercubes and give some asymptotic results for the metric dimension and the minimal doubly resolving set problems in Hamming graphs and folded hypercubes by establishing connections between these problems. Using Lindström's method for the coin weighing problem, we give an efficient algorithm for the minimal doubly resolving set problem in hypercubes and report some new upper bounds. We also prove that the minimal doubly resolving set problem is NP-hard even when restricted to split graphs, bipartite graphs and co-bipartite graphs.

Submitted 5 December, 2021; v1 submitted 1 December, 2020; originally announced December 2020.

Journal ref: Discrete Appl. Math. 309 (2022) 147-159
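
For readers unfamiliar with the problem in the entry above, the following is a small brute-force sketch, not the paper's algorithm (which the abstract says is based on Lindström's method). It assumes the standard definition: a vertex set S doubly resolves a graph if, for every pair of distinct vertices u and v, some x and y in S satisfy d(u,x) - d(u,y) ≠ d(v,x) - d(v,y). In a hypercube, d is the Hamming distance. The function names are hypothetical, and the search is only feasible for very small n.

```python
from itertools import combinations


def hamming(u, v):
    """Hamming distance between two hypercube vertices encoded as integers."""
    return bin(u ^ v).count("1")


def is_doubly_resolving(subset, n):
    """Check whether `subset` doubly resolves the n-dimensional hypercube.

    Equivalent test: the vectors of distance differences relative to the
    first element of `subset` must be pairwise distinct over all vertices.
    """
    if len(subset) < 2:
        return False
    base = subset[0]
    seen = set()
    for u in range(1 << n):
        signature = tuple(hamming(u, s) - hamming(u, base) for s in subset[1:])
        if signature in seen:
            return False
        seen.add(signature)
    return True


def min_doubly_resolving_size(n):
    """Smallest size of a doubly resolving set of the n-cube, by exhaustive search."""
    vertices = range(1 << n)
    for k in range(2, (1 << n) + 1):
        for subset in combinations(vertices, k):
            if is_doubly_resolving(subset, n):
                return k
    return None
```

The exhaustive search grows very quickly with n, which is one reason an efficient construction such as the one the abstract describes is of interest.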

arXiv:2011.10380 [pdf, other]

doi 10.3847/1538-3881/abc3bc

Establishing Earth's Minimoon Population through Characterization of Asteroid 2020 CD$_3$

Authors: Grigori Fedorets, Marco Micheli, Robert Jedicke, Shantanu P. Naidu, Davide Farnocchia, Mikael Granvik, Nicholas Moskovitz, Megan E. Schwamb, Robert Weryk, Kacper Wierzchoś, Eric Christensen, Theodore Pruyne, William F. Bottke, Quanzhi Ye, Richard Wainscoat, Maxime Devogèle, Laura E. Buchanan, Anlaug Amanda Djupvik, Daniel M. Faes, Dora Föhring, Joel Roediger, Tom Seccull, Adam B. Smith

Abstract: We report on our detailed characterization of Earth's second known temporary natural satellite, or minimoon, asteroid 2020 CD3. An artificial origin can be ruled out based on its area-to-mass ratio and broadband photometry, which suggest that it is a silicate asteroid belonging to the S or V complex in asteroid taxonomy. The discovery of 2020 CD3 allows for the first time a comparison between known minimoons and theoretical models of their expected physical and dynamical properties. The estimated diameter of $1.2^{+0.4}_{-0.2}$ m and geocentric capture approximately a decade after the first known minimoon, 2006 RH120, are in agreement with theoretical predictions. The capture duration of 2020 CD3 of at least 2.7 yr is unexpectedly long compared to the simulation average, but it is in agreement with simulated minimoons that have close lunar encounters, providing additional support for the orbital models. 2020 CD3's atypical rotation period, significantly longer than theoretical predictions, suggests that our understanding of meter-scale asteroids needs revision. More discoveries and a detailed characterization of the population can be expected with the forthcoming Vera C. Rubin Observatory Legacy Survey of Space and Time.

Submitted 20 November, 2020; originally announced November 2020.

Comments: 22 pages, 6 Figures, 5 Tables, to appear in the Astronomical Journal

arXiv:2011.10184 [pdf, other]

A Deep Search for Emission From "Rock Comet" (3200) Phaethon At 1 AU

Authors: Quanzhi Ye, Matthew M. Knight, Michael S. P. Kelley, Nicholas A. Moskovitz, Annika Gustafsson, David Schleicher

Abstract: We present a deep imaging and spectroscopic search for emission from (3200) Phaethon, a large near-Earth asteroid that appears to be the parent of the strong Geminid meteoroid stream, using the 4.3 m Lowell Discovery Telescope. Observations were conducted on 2017 December 14-18 when Phaethon passed only 0.07 au from the Earth. We determine the $3σ$ upper level of dust and CN production rates to be 0.007-0.2 $\mathrm{kg~s^{-1}}$ and $2.3\times10^{22}~\mathrm{molecule~s^{-1}}$ through narrowband imaging. A search in broadband images taken through the SDSS $r'$ filter shows no 100-m-class fragments in Phaethon's vicinity. A deeper, but star-contaminated search also shows no sign of fragments down to 15 m. Optical spectroscopy of Phaethon and comet C/2017 O1 (ASASSN) as comparison confirms the absence of cometary emission lines from Phaethon and yields $3σ$ upper levels of CN, C$_2$ and C$_3$ of $\sim10^{24}$-$10^{25} \mathrm{molecule~s^{-1}}$, 2 orders of magnitude higher than the CN constraint placed by narrowband imaging, due to the much narrower on-sky aperture of the spectrographic slit. We show that narrowband imaging could provide an efficient way to look for weak gas emission from near-extinct bodies near the Earth, though these observations require careful interpretation. Assuming Phaethon's behavior is unchanged, our analysis shows that the DESTINY$^+$ mission, currently planning to explore Phaethon in 2026, may not be able to directly detect a gas coma.

Submitted 19 November, 2020; originally announced November 2020.

Comments: PSJ in press

arXiv:2011.09781 [pdf, other]

Towards Spatio-Temporal Video Scene Text Detection via Temporal Clustering

Authors: Yuanqiang Cai, Chang Liu, Weiqiang Wang, Qixiang Ye

Abstract: With only bounding-box annotations in the spatial domain, existing video scene text detection (VSTD) benchmarks lack the temporal relation of text instances among video frames, which hinders the development of video text-related applications. In this paper, we systematically introduce a new large-scale benchmark, named STVText4, a well-designed spatial-temporal detection metric (STDM), and a novel clustering-based baseline method, referred to as Temporal Clustering (TC). STVText4 opens a challenging yet promising direction of VSTD, termed ST-VSTD, which targets simultaneously detecting video scene texts in both spatial and temporal domains. STVText4 contains more than 1.4 million text instances from 161,347 video frames of 106 videos, where each instance is annotated with not only a spatial bounding box and temporal range but also four intrinsic attributes, including legibility, density, scale, and lifecycle, to facilitate the community. With continuous propagation of identical texts in the video sequence, TC can accurately output the spatial quadrilateral and temporal range of the texts, which sets a strong baseline for ST-VSTD. Experiments demonstrate the efficacy of our method and the great academic and practical value of STVText4. The dataset and code will be available soon.

Submitted 19 November, 2020; originally announced November 2020.

arXiv:2011.03972 [pdf, other]

doi 10.1109/TIP.2021.3078079

Adaptive Linear Span Network for Object Skeleton Detection

Authors: Chang Liu, Yunjie Tian, Jianbin Jiao, Qixiang Ye

Abstract: Conventional networks for object skeleton detection are usually hand-crafted. Although effective, they require intensive prior knowledge to configure representative features for objects in different scale granularity. In this paper, we propose adaptive linear span network (AdaLSN), driven by neural architecture search (NAS), to automatically configure and integrate scale-aware features for object skeleton detection. AdaLSN is formulated with the theory of linear span, which provides one of the earliest explanations for multi-scale deep feature fusion. AdaLSN is materialized by defining a mixed unit-pyramid search space, which goes beyond many existing search spaces using unit-level or pyramid-level features. Within the mixed space, we apply genetic architecture search to jointly optimize unit-level operations and pyramid-level connections for adaptive feature space expansion. AdaLSN substantiates its versatility by achieving a significantly better accuracy and latency trade-off compared with state-of-the-art methods. It also demonstrates general applicability to image-to-mask tasks such as edge detection and road extraction. Code is available at https://github.com/sunsmarterjie/SDL-Skeleton.

Submitted 8 November, 2020; originally announced November 2020.

Comments: 13 pages, 9 figures

arXiv:2011.00875 [pdf, other]

doi 10.1103/PhysRevLett.126.185501

The dynamic nature of high pressure ice VII

Authors: Qi-Jun Ye, Lin Zhuang, Xin-Zheng Li

Abstract: Starting from Shannon's definition of dynamic entropy, we proposed a simple theory to describe the transition between different rare-event-related dynamic states in condensed matter, and used it to investigate high pressure ice VII. Instead of thermodynamic intensive quantities such as the temperature and pressure, a dynamic intensive quantity named dynamic field is taken as the controlling variable for the transition. Based on the dynamic entropy versus dynamic field curve, two dynamic states corresponding to ice VII and dynamic ice VII were discriminated rigorously in a purely dynamic view. Their microscopic differences were assigned to the dynamic patterns of proton transfer. This study puts a similar dynamical theory used in earlier studies of glass models on a simple and more fundamental basis, which could be applied to describe the dynamic states of realistic and more condensed matter systems.

Submitted 17 November, 2020; v1 submitted 2 November, 2020; originally announced November 2020.

Comments: 5 pages, 4 figures

Journal ref: Phys. Rev. Lett. 126, 185501 (2021)

arXiv:2010.04801 [pdf, other]

Semi-Automated Protocol Disambiguation and Code Generation

Authors: Jane Yen, Tamás Lévai, Qinyuan Ye, Xiang Ren, Ramesh Govindan, Barath Raghavan

Abstract: For decades, Internet protocols have been specified using natural language. Given the ambiguity inherent in such text, it is not surprising that protocol implementations have long exhibited bugs. In this paper, we apply natural language processing (NLP) to effect semi-automated generation of protocol implementations from specification text. Our system, SAGE, can uncover ambiguous or under-specified sentences in specifications; once these are clarified by the spec author, SAGE can generate protocol code automatically. Using SAGE, we discover 5 instances of ambiguity and 6 instances of under-specification in the ICMP RFC; after clarification, SAGE is able to automatically generate code that interoperates perfectly with Linux implementations. We show that SAGE generalizes to BFD, IGMP, and NTP. We also find that SAGE supports many of the conceptual components found in key protocols, suggesting that, with some additional machinery, SAGE may be able to generalize to TCP and BGP.

Submitted 1 February, 2021; v1 submitted 9 October, 2020; originally announced October 2020.

arXiv:2009.10942 [pdf, other]

Exploring global diverse attention via pairwise temporal relation for video summarization

Authors: Ping Li, Qinghao Ye, Luming Zhang, Li Yuan, Xianghua Xu, Ling Shao

Abstract: Video summarization is an effective way to facilitate video searching and browsing. Most existing systems employ encoder-decoder based recurrent neural networks, which fail to explicitly diversify the system-generated summary frames while requiring intensive computations. In this paper, we propose an efficient convolutional neural network architecture for video SUMmarization via Global Diverse Attention called SUM-GDA, which adapts the attention mechanism in a global perspective to consider pairwise temporal relations of video frames. In particular, the GDA module has two advantages: 1) it models the relations within paired frames as well as the relations among all pairs, thus capturing the global attention across all frames of one video; 2) it reflects the importance of each frame to the whole video, leading to diverse attention on these frames. Thus, SUM-GDA is beneficial for generating diverse frames to form a satisfactory video summary. Extensive experiments on three data sets, i.e., SumMe, TVSum, and VTW, have demonstrated that SUM-GDA and its extension outperform other competing state-of-the-art methods with remarkable improvements. In addition, the proposed models can be run in parallel with significantly lower computational cost, which helps deployment in highly demanding applications.

Submitted 23 September, 2020; originally announced September 2020.

Comments: 12 pages, 8 figures

Journal ref: Pattern Recognition, 2020

arXiv:2009.07506 [pdf, other]

The 1st Tiny Object Detection Challenge: Methods and Results

Authors: Xuehui Yu, Zhenjun Han, Yuqi Gong, Nan Jiang, Jian Zhao, Qixiang Ye, Jie Chen, Yuan Feng, Bin Zhang, Xiaodi Wang, Ying Xin, Jingwei Liu, Mingyuan Mao, Sheng Xu, Baochang Zhang, Shumin Han, Cheng Gao, Wei Tang, Lizuo Jin, Mingbo Hong, Yuchao Yang, Shuiwang Li, Huan Luo, Qijun Zhao, Humphrey Shi

Abstract: The 1st Tiny Object Detection (TOD) Challenge aims to encourage research in developing novel and accurate methods for tiny object detection in images which have wide views, with a current focus on tiny person detection. The TinyPerson dataset was used for the TOD Challenge and is publicly released. It has 1610 images and 72651 box-level annotations. Around 36 participating teams from around the globe competed in the 1st TOD Challenge. In this paper, we provide a brief summary of the 1st TOD Challenge including brief introductions to the top three methods. The submission leaderboard will be reopened for researchers who are interested in the TOD challenge. The benchmark dataset and other information can be found at: https://github.com/ucas-vg/TinyBenchmark.

Submitted 6 October, 2020; v1 submitted 16 September, 2020; originally announced September 2020.

Comments: ECCV2020 Workshop on Real-world Computer Vision from Inputs with Limited Quality (RLQ) and Tiny Object Detection Challenge

arXiv:2009.05780 [pdf, other]

EdgeLoc: An Edge-IoT Framework for Robust Indoor Localization Using Capsule Networks

Authors: Qianwen Ye, Xiaochen Fan, Gengfa Fang, Hongxia Bie, Chaocan Xiang, Xudong Song, Xiangjian He

Abstract: With the unprecedented demand for location-based services in indoor scenarios, wireless indoor localization has become essential for mobile users. While GPS is not available in indoor spaces, WiFi RSS fingerprinting has become popular owing to its ubiquitous accessibility. However, achieving robust and efficient indoor localization faces two major challenges. First, the localization accuracy can be degraded by random signal fluctuations, which affect conventional localization algorithms that simply learn handcrafted features from raw fingerprint data. Second, mobile users are sensitive to the localization delay, but conventional indoor localization algorithms are computation-intensive and time-consuming. In this paper, we propose EdgeLoc, an edge-IoT framework for efficient and robust indoor localization using capsule networks. We develop a deep learning model with the CapsNet to efficiently extract hierarchical information from WiFi fingerprint data, thereby significantly improving the localization accuracy. Moreover, we implement an edge-computing prototype system to achieve a nearly real-time localization process, by providing mobile users with the deep-learning model that has been well-trained by the edge server. We conduct a real-world field experimental study with over 33,600 data points and an extensive synthetic experiment with the open dataset, and the experimental results validate the effectiveness of EdgeLoc. The best trade-off of the EdgeLoc system achieves 98.5% localization accuracy within an average positioning time of only 2.31 ms in the field experiment.

Submitted 12 September, 2020; originally announced September 2020.

Comments: 11 pages, 12 figures

arXiv:2009.03816 [pdf, other]

PSO-PS: Parameter Synchronization with Particle Swarm Optimization for Distributed Training of Deep Neural Networks

Authors: Qing Ye, Yuxuan Han, Yanan Sun, Jiancheng Lv

Abstract: Parameter updating is an important stage in parallelism-based distributed deep learning. Synchronous methods are widely used in the distributed training of Deep Neural Networks (DNNs). To reduce the communication and synchronization overhead of synchronous methods, decreasing the synchronization frequency (e.g., every $n$ mini-batches) is a straightforward approach. However, it often suffers from poor convergence. In this paper, we propose a new algorithm that integrates Particle Swarm Optimization (PSO) into the distributed training process of DNNs to automatically compute new parameters. In the proposed algorithm, a computing worker is encoded by a particle, and the weights of DNNs and the training loss are modeled by the particle attributes. At each synchronization stage, the weights are updated by PSO from the sub-weights gathered from all workers, instead of averaging the weights or the gradients. To verify the performance of the proposed algorithm, experiments are performed on two commonly used image classification benchmarks, MNIST and CIFAR10, and compared with peer competitors at multiple different synchronization configurations. The experimental results demonstrate the competitiveness of the proposed algorithm.

Submitted 6 September, 2020; originally announced September 2020.

Comments: 7 pages

Journal ref: IJCNN2020
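
To make the idea in the entry above concrete, here is a minimal sketch of a textbook particle swarm update applied at a synchronization step, with one particle per worker. It is not the PSO-PS algorithm itself: the abstract gives no update rule, so the velocity formula, the coefficients `inertia`, `c1` and `c2`, and the names `Particle` and `pso_synchronize` are all assumptions for illustration.

```python
import random


class Particle:
    """One worker: current weights, velocity, and the best weights seen so far."""

    def __init__(self, weights):
        self.weights = list(weights)
        self.velocity = [0.0] * len(self.weights)
        self.best_weights = list(self.weights)
        self.best_loss = float("inf")


def pso_synchronize(particles, losses, rng, inertia=0.5, c1=1.5, c2=1.5):
    """Run one generic PSO step over all workers (assumed coefficients).

    `losses[i]` is the training loss worker i reports for its current weights.
    Returns the swarm-best weights and loss after updating personal bests.
    """
    # Update each worker's personal best from its reported loss.
    for p, loss in zip(particles, losses):
        if loss < p.best_loss:
            p.best_loss = loss
            p.best_weights = list(p.weights)

    # The swarm best is the best personal best across workers.
    leader = min(particles, key=lambda p: p.best_loss)
    global_best = list(leader.best_weights)
    global_loss = leader.best_loss

    # Standard velocity and position update, applied per coordinate.
    for p in particles:
        for i in range(len(p.weights)):
            r1, r2 = rng.random(), rng.random()
            p.velocity[i] = (
                inertia * p.velocity[i]
                + c1 * r1 * (p.best_weights[i] - p.weights[i])
                + c2 * r2 * (global_best[i] - p.weights[i])
            )
            p.weights[i] += p.velocity[i]
    return global_best, global_loss
```

In a real distributed setting the loss values would come from each worker's local training between synchronization points; here they are simply passed in by the caller.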

arXiv:2009.02701 [pdf, other]

HPSGD: Hierarchical Parallel SGD With Stale Gradients Featuring

Authors: Yuhao Zhou, Qing Ye, Hailun Zhang, Jiancheng Lv

Abstract: While distributed training significantly speeds up the training process of the deep neural network (DNN), the utilization of the cluster is relatively low due to the time-consuming data synchronization between workers. To alleviate this problem, a novel Hierarchical Parallel SGD (HPSGD) strategy is proposed based on the observation that the data synchronization phase can be run in parallel with the local training phase (i.e., feed-forward and back-propagation). Furthermore, an improved model updating method is utilized to remedy the introduced stale gradients problem, which commits updates to the replica (i.e., a temporary model that has the same parameters as the global model) and then merges the average changes to the global model. Extensive experiments are conducted to demonstrate that the proposed HPSGD approach substantially boosts the distributed DNN training, reduces the disturbance of the stale gradients, and achieves better accuracy within a fixed wall-clock time.

Submitted 28 November, 2020; v1 submitted 6 September, 2020; originally announced September 2020.

Comments: 12 pages, 10 figures, ICONIP2020 under review

arXiv:2009.01489 [pdf, other]

HACCLE: Metaprogramming for Secure Multi-Party Computation -- Extended Version

Authors: Yuyan Bao, Kirshanthan Sundararajah, Raghav Malik, Qianchuan Ye, Christopher Wagner, Nouraldin Jaber, Fei Wang, Mohammad Hassan Ameri, Donghang Lu, Alexander Seto, Benjamin Delaware, Roopsha Samanta, Aniket Kate, Christina Garman, Jeremiah Blocki, Pierre-David Letourneau, Benoit Meister, Jonathan Springer, Tiark Rompf, Milind Kulkarni

Abstract: Cryptographic techniques have the potential to enable distrusting parties to collaborate in fundamentally new ways, but their practical implementation poses numerous challenges. An important class of such cryptographic techniques is known as Secure Multi-Party Computation (MPC). Developing Secure MPC applications in realistic scenarios requires extensive knowledge spanning multiple areas of cryptography and systems. And while the steps to arrive at a solution for a particular application are often straightforward, it remains difficult to make the implementation efficient, and tedious to apply those same steps to a slightly different application from scratch. Hence, it is an important problem to design platforms for implementing Secure MPC applications with minimum effort and using techniques accessible to non-experts in cryptography. In this paper, we present the HACCLE (High Assurance Compositional Cryptography: Languages and Environments) toolchain, specifically targeted to MPC applications. HACCLE contains an embedded domain-specific language Harpoon, for software developers without cryptographic expertise to write MPC-based programs, and uses Lightweight Modular Staging (LMS) for code generation. Harpoon programs are compiled into acyclic circuits represented in HACCLE's Intermediate Representation (HIR) that serves as an abstraction over different cryptographic protocols such as secret sharing, homomorphic encryption, or garbled circuits. Implementations of different cryptographic protocols serve as different backends of our toolchain. The extensible design of HIR allows cryptographic experts to plug in new primitives and protocols to realize computation. And the use of standard metaprogramming techniques lowers the development effort significantly.

Submitted 30 September, 2021; v1 submitted 3 September, 2020; originally announced September 2020.

arXiv:2008.07366 [pdf]

Using LDA and LSTM Models to Study Public Opinions and Critical Groups Towards Congestion Pricing in New York City through 2007 to 2019

Authors: Qian Ye, Xiaohong Chen, Onur Kalan, Kaan Ozbay

Abstract: This study explores how people's views of and responses to the NYC congestion pricing proposals evolve over time. To understand these responses, Twitter data is collected and analyzed. Critical groups in the recurrent process are detected by statistically analyzing the active users and the most mentioned accounts, and the trends of people's attitudes and concerns over the years are identified with text mining and hybrid Natural Language Processing techniques, including LDA topic modeling and LSTM sentiment classification. The result shows that multiple interest groups were involved and played crucial roles during the proposal, especially the Mayor and Governor, the MTA, and outer-borough representatives. The public's focus of concern shifted from the plan details to the wider sustainability and fairness of the city. Furthermore, the plan's approval relies on several elements: the joint agreement reached in the political process, strong real-world motivation, a scheme based on balancing multiple interests, and groups' awareness of tolling's benefits and necessity.

Submitted 31 July, 2020; originally announced August 2020.

arXiv:2008.03898 [pdf, other]

Prototype Mixture Models for Few-shot Semantic Segmentation

Authors: Boyu Yang, Chang Liu, Bohao Li, Jianbin Jiao, Qixiang Ye

Abstract: Few-shot segmentation is challenging because objects within the support and query images could significantly differ in appearance and pose. Using a single prototype acquired directly from the support image to segment the query image causes semantic ambiguity. In this paper, we propose prototype mixture models (PMMs), which correlate diverse image regions with multiple prototypes to enforce the prototype-based semantic representation. Estimated by an Expectation-Maximization algorithm, PMMs incorporate rich channel-wise and spatial semantics from limited support images. Utilized as representations as well as classifiers, PMMs fully leverage the semantics to activate objects in the query image while depressing background regions in a duplex manner. Extensive experiments on Pascal VOC and MS-COCO datasets show that PMMs significantly improve upon the state of the art. Particularly, PMMs improve 5-shot segmentation performance on MS-COCO by up to 5.82\% with only a moderate cost for model size and inference speed.

Submitted 1 September, 2020; v1 submitted 10 August, 2020; originally announced August 2020.

arXiv:2008.01928 [pdf, other]

Component Divide-and-Conquer for Real-World Image Super-Resolution

Authors: Pengxu Wei, Ziwei Xie, Hannan Lu, Zongyuan Zhan, Qixiang Ye, Wangmeng Zuo, Liang Lin

Abstract: In this paper, we present a large-scale Diverse Real-world image Super-Resolution dataset, i.e., DRealSR, as well as a divide-and-conquer Super-Resolution (SR) network, exploring the utility of guiding SR model with low-level image components. DRealSR establishes a new SR benchmark with diverse real-world degradation processes, mitigating the limitations of conventional simulated image degradation. In general, the targets of SR vary with image regions with different low-level image components, e.g., smoothness preserving for flat regions, sharpening for edges, and detail enhancing for textures. Learning an SR model with a conventional pixel-wise loss is usually dominated by flat regions and edges, and fails to infer realistic details of complex textures. We propose a Component Divide-and-Conquer (CDC) model and a Gradient-Weighted (GW) loss for SR. Our CDC parses an image with three components, employs three Component-Attentive Blocks (CABs) to learn attentive masks and intermediate SR predictions with an intermediate supervision learning strategy, and trains an SR model following a divide-and-conquer learning principle. Our GW loss also provides a feasible way to balance the difficulties of image components for SR. Extensive experiments validate the superior performance of our CDC and the challenging aspects of our DRealSR dataset related to diverse real-world scenarios. Our dataset and codes are publicly available at https://github.com/xiezw5/Component-Divide-and-Conquer-for-Real-World-Image-Super-Resolution

Submitted 5 August, 2020; originally announced August 2020.

Journal ref: European Conference on Computer Vision (ECCV), 2020

arXiv:2007.13264 [pdf, other]

Learning Task-oriented Disentangled Representations for Unsupervised Domain Adaptation

Authors: Pingyang Dai, Peixian Chen, Qiong Wu, Xiaopeng Hong, Qixiang Ye, Qi Tian, Rongrong Ji

Abstract: Unsupervised domain adaptation (UDA) aims to address the domain-shift problem between a labeled source domain and an unlabeled target domain. Many efforts have been made to address the mismatch between the distributions of training and testing data, but unfortunately, they ignore the task-oriented information across domains and are inflexible to perform well in complicated open-set scenarios. Many efforts have been made to eliminate the mismatch between the distributions of training and testing data by learning domain-invariant representations. However, the learned representations are usually not task-oriented, i.e., being class-discriminative and domain-transferable simultaneously. This drawback limits the flexibility of UDA in complicated open-set tasks where no labels are shared between domains. In this paper, we break the concept of task-orientation into task-relevance and task-irrelevance, and propose a dynamic task-oriented disentangling network (DTDN) to learn disentangled representations in an end-to-end fashion for UDA. The dynamic disentangling network effectively disentangles data representations into two components: the task-relevant ones embedding critical information associated with the task across domains, and the task-irrelevant ones with the remaining non-transferable or disturbing information. These two components are regularized by a group of task-specific objective functions across domains. Such regularization explicitly encourages disentangling and avoids the use of generative models or decoders. Experiments in complicated, open-set scenarios (retrieval tasks) and empirical benchmarks (classification tasks) demonstrate that the proposed method captures rich disentangled information and achieves superior performance.

Submitted 26 July, 2020; originally announced July 2020.

Comments: 9 pages, 6 figures

arXiv:2007.11831 [pdf, other]

DBS: Dynamic Batch Size For Distributed Deep Neural Network Training

Authors: Qing Ye, Yuhao Zhou, Mingjia Shi, Yanan Sun, Jiancheng Lv

Abstract: Synchronous strategies with data parallelism, such as Synchronous Stochastic Gradient Descent (S-SGD) and the model averaging methods, are widely utilized in distributed training of Deep Neural Networks (DNNs), largely owing to their easy implementation yet promising performance. Particularly, each worker of the cluster hosts a copy of the DNN and an evenly divided share of the dataset with a fixed mini-batch size, to keep the training of DNNs convergent. In these strategies, workers with different computational capabilities need to wait for each other because of the synchronization and delays in network transmission, which inevitably results in the high-performance workers wasting computation. Consequently, the utilization of the cluster is relatively low. To alleviate this issue, we propose the Dynamic Batch Size (DBS) strategy for the distributed training of DNNs. Specifically, the performance of each worker is first evaluated based on the previous epoch, and then the batch size and dataset partition are dynamically adjusted in consideration of the current performance of the worker, thereby improving the utilization of the cluster. To verify the effectiveness of the proposed strategy, extensive experiments have been conducted, and the experimental results indicate that the proposed strategy can fully utilize the performance of the cluster, reduce the training time, and remain robust to disturbance by irrelevant tasks. Furthermore, rigorous theoretical analysis has also been provided to prove the convergence of the proposed strategy.

Submitted 3 November, 2022; v1 submitted 23 July, 2020; originally announced July 2020.

Comments: The latest version of this article has been accepted by IEEE TETCI
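
The rebalancing idea in the DBS abstract (measure each worker in the previous epoch, then adjust its batch size) can be sketched as a proportional split of a fixed global batch. This is an illustrative assumption rather than the paper's rule: the abstract does not state the allocation formula, and the function name `rebalance_batches` and its parameters are hypothetical.

```python
def rebalance_batches(global_batch, epoch_times, samples_done):
    """Split a fixed global batch across workers in proportion to throughput.

    epoch_times[i]  -- seconds worker i needed in the previous epoch
    samples_done[i] -- samples worker i processed in that epoch
    Proportional allocation is an assumption made for this sketch.
    """
    throughput = [n / t for n, t in zip(samples_done, epoch_times)]
    total = sum(throughput)
    # Floor each share first so the sizes never exceed the global batch.
    sizes = [int(global_batch * s / total) for s in throughput]
    # Hand the rounding remainder to the fastest workers, one sample each.
    remainder = global_batch - sum(sizes)
    fastest_first = sorted(range(len(sizes)),
                           key=lambda i: throughput[i], reverse=True)
    for i in fastest_first[:remainder]:
        sizes[i] += 1
    return sizes
```

For example, with a global batch of 256 and one worker twice as fast as the other, the faster worker receives roughly two thirds of the batch, so both finish a step at about the same time.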

arXiv:2007.07373 [pdf]

GJ 3470 c: A Saturn-like Exoplanet Candidate in the Habitable Zone of GJ 3470

Authors: Phillip Scott, Bradley Walter, Quanzhi Ye, David Mitchell, Leo Heiland, Xing Gao, Alejandro Palado, Burkhonov Otabek, Jesus Delgado Casal, Colin Hill, Alberto Garcia, Kevin B. Alton, Yenal Ogmen, Vikrant Kumar Agnihotri, Alberto Caballero

Abstract: We report the discovery of a new exoplanet candidate orbiting the star GJ 3470. A total of three transits were detected by OKSky Observatory: the first one on December 23, 2019, the second one on February 27, 2020, and the third one on May 3, 2020. We estimate an average transit depth of 0.84 percent and duration of 1 hour and 2 minutes. Based on this parameter, we calculate a radius of 9.2 Earth radii, which would correspond to the size of a Saturn-like exoplanet. We also estimate an orbital period of 66 days that places the exoplanet inside the habitable zone, near the orbital distance at Earth's equivalent radiation. Another twelve potential transits that do not belong to GJ 3470 b are also reported. Although our candidate for GJ 3470 c still has to be confirmed by the scientific community, the discovery represents a turning point in exoplanet research for being the first candidate discovered through an international project managed by amateur astronomers.

Submitted 14 July, 2020; originally announced July 2020.

arXiv:2007.04940 [pdf, other]

The Phong Surface: Efficient 3D Model Fitting using Lifted Optimization

Authors: Jingjing Shen, Thomas J. Cashman, Qi Ye, Tim Hutton, Toby Sharp, Federica Bogo, Andrew William Fitzgibbon, Jamie Shotton

Abstract: Realtime perceptual and interaction capabilities in mixed reality require a range of 3D tracking problems to be solved at low latency on resource-constrained hardware such as head-mounted devices. Indeed, for devices such as HoloLens 2 where the CPU and GPU are left available for applications, multiple tracking subsystems are required to run on a continuous, real-time basis while sharing a single Digital Signal Processor. To solve model-fitting problems for HoloLens 2 hand tracking, where the computational budget is approximately 100 times smaller than an iPhone 7, we introduce a new surface model: the `Phong surface'. Using ideas from computer graphics, the Phong surface describes the same 3D shape as a triangulated mesh model, but with continuous surface normals which enable the use of lifting-based optimization, providing significant efficiency gains over ICP-based methods. We show that Phong surfaces retain the convergence benefits of smoother surface models, while triangle meshes do not.

Submitted 9 July, 2020; originally announced July 2020.

Journal ref: ECCV2020

arXiv:2007.03154 [pdf, other]

Discretization-Aware Architecture Search

Authors: Yunjie Tian, Chang Liu, Lingxi Xie, Jianbin Jiao, Qixiang Ye

Abstract: The search cost of neural architecture search (NAS) has been largely reduced by weight-sharing methods. These methods optimize a super-network with all possible edges and operations, and determine the optimal sub-network by discretization, i.e., pruning off weak candidates. The discretization process, performed on either operations or edges, incurs significant inaccuracy and thus the quality of the final architecture is not guaranteed. This paper presents discretization-aware architecture search (DA$^2$S), with the core idea being adding a loss term to push the super-network towards the configuration of desired topology, so that the accuracy loss brought by discretization is largely alleviated. Experiments on standard image classification benchmarks demonstrate the superiority of our approach, in particular, under imbalanced target network configurations that were not studied before.

Submitted 6 July, 2020; originally announced July 2020.

Comments: 14 pages, 7 figures

arXiv:2007.02577 [pdf, other]

Progressive Cluster Purification for Unsupervised Feature Learning

Authors: Yifei Zhang, Chang Liu, Yu Zhou, Wei Wang, Weiping Wang, Qixiang Ye

Abstract: In unsupervised feature learning, sample specificity based methods ignore the inter-class information, which deteriorates the discriminative capability of representation models. Clustering based methods are error-prone when exploring the complete class boundary information due to the inevitable class-inconsistent samples in each cluster. In this work, we propose a novel clustering based method, which, by iteratively excluding class-inconsistent samples during progressive cluster formation, alleviates the impact of noise samples in a simple-yet-effective manner. Our approach, referred to as Progressive Cluster Purification (PCP), implements progressive clustering by gradually reducing the number of clusters during training, while the sizes of clusters continuously expand consistently with the growth of model representation capability. With a well-designed cluster purification mechanism, it further purifies clusters by filtering noise samples, which facilitates the subsequent feature learning by utilizing the refined clusters as pseudo-labels. Experiments on commonly used benchmarks demonstrate that the proposed PCP improves the baseline method by significant margins. Our code will be available at https://github.com/zhangyifei0115/PCP.

Submitted 15 July, 2020; v1 submitted 6 July, 2020; originally announced July 2020.

Comments: 8 pages, 5 figures

arXiv:2007.01546 [pdf, other]

Multiple Expert Brainstorming for Domain Adaptive Person Re-identification

Authors: Yunpeng Zhai, Qixiang Ye, Shijian Lu, Mengxi Jia, Rongrong Ji, Yonghong Tian

Abstract: Often the best performing deep neural models are ensembles of multiple base-level networks; nevertheless, ensemble learning with respect to domain adaptive person re-ID remains unexplored. In this paper, we propose a multiple expert brainstorming network (MEB-Net) for domain adaptive person re-ID, opening up a promising direction for the model ensemble problem under unsupervised conditions. MEB-Net adopts a mutual learning strategy, where multiple networks with different architectures are pre-trained within a source domain as expert models equipped with specific features and knowledge, while the adaptation is then accomplished through brainstorming (mutual learning) among expert models. MEB-Net accommodates the heterogeneity of experts learned with different architectures and enhances the discrimination capability of the adapted re-ID model by introducing a regularization scheme about the authority of experts. Extensive experiments on large-scale datasets (Market-1501 and DukeMTMC-reID) demonstrate the superior performance of MEB-Net over the state of the art.

Submitted 13 July, 2020; v1 submitted 3 July, 2020; originally announced July 2020.

Comments: Accepted by ECCV'20

arXiv:2007.01368 [pdf, other]

Recovery of Returning Halley-Type Comet 12P/Pons-Brooks With the Lowell Discovery Telescope

Authors: Quanzhi Ye, Tony L. Farnham, Matthew M. Knight, Carrie E. Holt, Lori M. Feaga

Abstract: We report the recovery of returning Halley-type comet 12P/Pons-Brooks using the 4.3 m Lowell Discovery Telescope, at a heliocentric distance of 11.89 au. Comparative analysis with a dust model suggests that the comet may have been active since $\sim30$ au from the Sun. We derive a nucleus radius of $17\pm6$ km from the nucleus photometry, though this number is likely an overestimation due to the contamination from dust and gas. Continuing monitoring is encouraged in anticipation of the comet's forthcoming perihelion in 2024 April.

Submitted 2 July, 2020; originally announced July 2020.

Comments: Submitted to RNAAS

arXiv:2007.01015 [pdf, ps, other]

doi 10.1088/1674-4527/20/11/188

Luminosity of radio pulsar and its new emission death line

Authors: Q. D. Wu, Q. J. Zhi, C. M. Zhang, D. H. Wang, C. Q. Ye

Abstract: We investigated the pulsar radio luminosity ($L$), emission efficiency (ratio of radio luminosity to its spin-down power $\dot{E}$), and death line in the diagram of magnetic field (B) versus spin period (P), and found that the dependence of pulsar radio luminosity on its spin-down power ($L-\dot{E}$) is very weak, shown as $L\sim\dot{E}^{0.06}$, which implies an equivalent inverse correlation between emission efficiency and spin-down power as $\xi\sim \dot{E}^{-0.94}$. Furthermore, we examined the distributions of radio luminosity of millisecond and normal pulsars, and found that, for similar spin-down powers, the radio luminosity of millisecond pulsars is about one order of magnitude lower than that of the normal pulsars. The analysis of pulsar radio flux suggests that these correlations are not due to a selection effect, but are intrinsic to the pulsar radio emission physics. Their radio emission may be dominated by different radiation mechanisms. The cut-off phenomenon of currently observed radio pulsars in the B-P diagram is usually referred to as the "pulsar death line", which corresponds to $\dot{E}\approx 10^{30}$ erg/s and is obtained from the cut-off voltage of the electron acceleration gap in the polar cap model of pulsars proposed by Ruderman and Sutherland. Observationally, this death line can be inferred from the actual observed pulsar flux $S\approx 1$ mJy at a distance of 1 kpc, together with the maximum radio emission efficiency of 1\%. At present, the actual observed pulsar flux can reach 0.01 mJy with the FAST telescope, which lowers the observational limit of pulsar spin-down power to $\dot{E}\approx 10^{28}$ erg/s. This means that the new death line is shifted downward by two orders of magnitude, and might be more appropriately referred to as the "observational limit-line"; accordingly, the pulsar theoretical model for the cut-off voltage of the gap should be heavily modified.

Submitted 2 July, 2020; originally announced July 2020.
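
The two scalings quoted in the pulsar abstract above follow from short arithmetic, sketched here. The first line uses only the abstract's definition of efficiency as the ratio of radio luminosity to spin-down power. The second assumes, as is common for pseudo-luminosities but not stated in the abstract, that $L \propto S d^2$ at fixed distance $d$ and fixed maximum efficiency $\xi_{\max}$.

```latex
% Efficiency scaling from the fitted luminosity relation
\xi \equiv \frac{L}{\dot{E}}, \qquad
L \sim \dot{E}^{0.06}
\;\Longrightarrow\;
\xi \sim \dot{E}^{\,0.06-1} = \dot{E}^{-0.94}

% Observational limit (assumption: L \propto S d^2, with d and \xi_{\max} fixed)
\dot{E}_{\min} = \frac{L_{\min}}{\xi_{\max}} \propto \frac{S_{\min}\, d^{2}}{\xi_{\max}},
\qquad
\frac{\dot{E}_{\min}^{\rm new}}{\dot{E}_{\min}^{\rm old}}
= \frac{0.01\ \mathrm{mJy}}{1\ \mathrm{mJy}} = 10^{-2}
\;\Longrightarrow\;
\dot{E}_{\min}^{\rm new} \approx 10^{30} \times 10^{-2} = 10^{28}\ \mathrm{erg\,s^{-1}}
```

Under that assumption, a hundredfold gain in flux sensitivity moves the limit down by exactly the two orders of magnitude the abstract reports.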

arXiv:2006.14863 [pdf, other]

Domain Contrast for Domain Adaptive Object Detection

Authors: Feng Liu, Xiaoxong Zhang, Fang Wan, Xiangyang Ji, Qixiang Ye

Abstract: We present Domain Contrast (DC), a simple yet effective approach inspired by contrastive learning for training domain adaptive detectors. DC is deduced from the error bound minimization perspective of a transferred model, and is implemented with cross-domain contrast loss which is plug-and-play. By minimizing cross-domain contrast loss, DC guarantees the transferability of detectors while naturally alleviating the class imbalance issue in the target domain. DC can be applied at either image level or region level, consistently improving detectors' transferability and discriminability. Extensive experiments on commonly used benchmarks show that DC improves the baseline and state-of-the-art by significant margins, while demonstrating great potential for large domain divergence.

Submitted 26 June, 2020; originally announced June 2020.

arXiv:2006.12708 [pdf, other]

iffDetector: Inference-aware Feature Filtering for Object Detection

Authors: Mingyuan Mao, Yuxin Tian, Baochang Zhang, Qixiang Ye, Wanquan Liu, Guodong Guo, David Doermann

Abstract: Modern CNN-based object detectors focus on feature configuration during training but often ignore feature optimization during inference. In this paper, we propose a new feature optimization approach to enhance features and suppress background noise in both the training and inference stages. We introduce a generic Inference-aware Feature Filtering (IFF) module that can easily be combined with modern detectors, resulting in our iffDetector. Unlike conventional open-loop feature calculation approaches without feedback, the IFF module performs closed-loop optimization by leveraging high-level semantics to enhance the convolutional features. By applying Fourier transform analysis, we demonstrate that the IFF module acts as a negative feedback that theoretically guarantees the stability of feature learning. IFF can be fused with CNN-based object detectors in a plug-and-play manner with negligible computational cost overhead. Experiments on the PASCAL VOC and MS COCO datasets demonstrate that our iffDetector consistently outperforms state-of-the-art methods by significant margins. The test code and model are anonymously available at https://github.com/anonymous2020new/iffDetector.

Submitted 22 June, 2020; originally announced June 2020.

Comments: 14 pages, 6 figures

arXiv:2006.11476 [pdf, other]

Video Playback Rate Perception for Self-supervised Spatio-Temporal Representation Learning

Authors: Yuan Yao, Chang Liu, Dezhao Luo, Yu Zhou, Qixiang Ye

Abstract: In self-supervised spatio-temporal representation learning, the temporal resolution and long-short term characteristics are not yet fully explored, which limits representation capabilities of learned models. In this paper, we propose a novel self-supervised method, referred to as video Playback Rate Perception (PRP), to learn spatio-temporal representation in a simple-yet-effective way. PRP is rooted in a dilated sampling strategy, which produces self-supervision signals about video playback rates for representation model learning. PRP is implemented with a feature encoder, a classification module, and a reconstructing decoder, to achieve spatio-temporal semantic retention in a collaborative discrimination-generation manner. The discriminative perception model follows a feature encoder to prefer perceiving low temporal resolution and long-term representation by classifying fast-forward rates. The generative perception model acts as a feature decoder to focus on comprehending high temporal resolution and short-term representation by introducing a motion-attention mechanism. PRP is applied on typical video target tasks including action recognition and video retrieval. Experiments show that PRP outperforms state-of-the-art self-supervised models by significant margins. Code is available at github.com/yuanyao366/PRP

Submitted 19 June, 2020; originally announced June 2020.

Comments: CVPR 2020
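
The dilated sampling strategy named in the PRP abstract can be illustrated by picking every `rate`-th frame of a video and using the rate as a classification label. This is a minimal sketch under assumptions: the candidate rate set, the function names `dilated_sample` and `make_sample`, and the labelling scheme are hypothetical and are not taken from the paper or its repository.

```python
CANDIDATE_RATES = (1, 2, 4, 8)  # hypothetical set of playback rates


def dilated_sample(num_frames, clip_len, rate, start=0):
    """Return the frame indices of a clip that keeps every `rate`-th frame."""
    indices = [start + i * rate for i in range(clip_len)]
    if indices[-1] >= num_frames:
        raise ValueError("clip does not fit inside the video at this rate")
    return indices


def make_sample(num_frames, clip_len, rate, start=0):
    """Pair a dilated clip with its self-supervision label (index of the rate)."""
    label = CANDIDATE_RATES.index(rate)
    return dilated_sample(num_frames, clip_len, rate, start), label
```

A larger rate spans a longer stretch of the video at lower temporal resolution, which is the signal a playback-rate classifier would be asked to recognise.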

arXiv:2006.09142 [pdf, other]

Cogradient Descent for Bilinear Optimization

Authors: Li'an Zhuo, Baochang Zhang, Linlin Yang, Hanlin Chen, Qixiang Ye, David Doermann, Guodong Guo, Rongrong Ji

Abstract: Conventional learning methods simplify the bilinear model by regarding two intrinsically coupled factors independently, which degrades the optimization procedure. One reason lies in the insufficient training due to the asynchronous gradient descent, which results in vanishing gradients for the coupled variables. In this paper, we introduce a Cogradient Descent algorithm (CoGD) to address the bilinear problem, based on a theoretical framework to coordinate the gradient of hidden variables via a projection function. We solve one variable by considering its coupling relationship with the other, leading to a synchronous gradient descent to facilitate the optimization procedure. Our algorithm is applied to solve problems with one variable under the sparsity constraint, which is widely used in the learning paradigm. We validate our CoGD considering an extensive set of applications including image reconstruction, inpainting, and network pruning. Experiments show that it improves the state-of-the-art by a significant margin.

Submitted 16 June, 2020; originally announced June 2020.

Comments: 9 pages, 6 figures

Showing 251–300 of 475 results for author: Ye, Q