Jie Yang

San Francisco Bay Area
842 followers 500+ connections

About

I love finding critical problems, designing innovative solutions, initiating projects…

Experience

  • Cybever

    Sunnyvale, California, United States

  • -

    Mountain View, California, United States

  • -

    Mountain View, CA

  • -

    Mountain View, CA

  • -

    Santa Clara, CA

  • -

    San Francisco, CA

  • -

    Delft, South Holland, Netherlands

  • -

    Sichuan University, Chengdu, Sichuan, China

  • -

    Sichuan University, Chengdu, Sichuan, China

Education

  • Stanford University

    GPA: 4.0

    STATS 202 - Data Mining and Analysis (Prof. Susan Holmes)
    STATS 315A - Modern Applied Statistics: Learning (Prof. Trevor Hastie)
    STATS 315B - Modern Applied Statistics: Data Mining (Prof. Jerome Friedman)

    CS224W - Social and Information Network Analysis (Prof. Jure Leskovec)

  • Specialization: Internet and Web Technology, Parallel and Distributed Systems
    Supervisors: Prof. Maarten van Steen, Dr. Guillaume Pierre
    Thesis: Data Clustering for Autonomic Application Replication

    Master Project: automatically partitioning and replicating databases for web applications based on cluster analysis and machine learning methods. It is part of the Globule project, a P2P content delivery network.

    Course Projects:
    * Operating System: programming to upgrade the kernel, memory management and file system of the Minix operating system.
    * Web-based Knowledge Representation: using XML, XSL, RDF and OWL to build a semantics-based website.
    * Object-Oriented Programming: using Java, JSP and design patterns to build an online programming contest and judging system.
    * Machine Learning: using machine learning methods to filter out spam e-mails.
    * Multimedia Authoring: using Prolog, VRML and JavaScript to build a 3D Virtual Community.

  • Activities and Societies: Miracle Studio, ACM/ICPC

    Supervisor: Prof. Bingfa Li
    Thesis: Wavelet Transform in Image Analysis Applications

    Trained extensively in programming and algorithm design for the ACM/ICPC international programming contest.

    Founded a developer club (Miracle Studio) and built a few websites from scratch.

Publications

  • IM-RAG: Multi-Round Retrieval-Augmented Generation Through Learning Inner Monologues

    SIGIR

    Although the Retrieval-Augmented Generation (RAG) paradigms can use external knowledge to enhance and ground the outputs of Large Language Models (LLMs) to mitigate generative hallucinations and static knowledge base problems, they still suffer from limited flexibility in adopting Information Retrieval (IR) systems with varying capabilities, constrained interpretability during the multi-round retrieval process, and a lack of end-to-end optimization. To address these challenges, we propose a novel LLM-centric approach, IM-RAG, that integrates IR systems with LLMs to support multi-round RAG through learning Inner Monologues (IM, i.e., the human inner voice that narrates one's thoughts). During the IM process, the LLM serves as the core reasoning model (i.e., Reasoner) to either propose queries to collect more information via the Retriever or to provide a final answer based on the conversational context. We also introduce a Refiner that improves the outputs from the Retriever, effectively bridging the gap between the Reasoner and IR modules with varying capabilities and fostering multi-round communications. The entire IM process is optimized via Reinforcement Learning (RL) where a Progress Tracker is incorporated to provide mid-step rewards, and the answer prediction is further separately optimized via Supervised Fine-Tuning (SFT). We conduct extensive experiments with the HotPotQA dataset, a popular benchmark for retrieval-based, multi-step question-answering. The results show that our approach achieves state-of-the-art (SOTA) performance while providing high flexibility in integrating IR modules as well as strong interpretability exhibited in the learned inner monologues.

  • Tackling Vision Language Tasks Through Learning Inner Monologues

    AAAI

    Visual language tasks require AI models to comprehend and reason with both visual and textual content. Driven by the power of Large Language Models (LLMs), two prominent methods have emerged: (1) the hybrid integration between LLMs and Vision-Language Models (VLMs), where visual inputs are firstly converted into language descriptions by VLMs, serving as inputs for LLMs to generate final answer(s); (2) visual feature alignment in language space, where visual inputs are encoded as embeddings and projected to LLMs' language space via further supervised fine-tuning. The first approach provides light training costs and interpretability but is hard to be optimized in an end-to-end fashion. The second approach presents decent performance, but feature alignment usually requires large amounts of training data and lacks interpretability. To tackle this dilemma, we propose a novel approach, Inner Monologue Multi-Modal Optimization (IMMO), to solve complex vision language problems by simulating inner monologue processes, a cognitive process in which an individual engages in silent verbal communication with themselves. We enable LLMs and VLMs to interact through natural language conversation and propose to use a two-stage training process to learn how to do the inner monologue (self-asking questions and answering questions). IMMO is evaluated on two popular tasks and the results suggest by emulating the cognitive phenomenon of internal dialogue, our approach can enhance reasoning and explanation abilities, contributing to the more effective fusion of vision and language models. More importantly, instead of using predefined human-crafted monologues, IMMO learns this process within the deep learning models, promising wider applicability to many different AI problems beyond vision language tasks.

  • Higher Layers Need More LoRA Experts

    arXiv

    Parameter-efficient tuning (PEFT) techniques like low-rank adaptation (LoRA) offer training efficiency on Large Language Models, but their impact on model performance remains limited. Recent efforts integrate LoRA and Mixture-of-Experts (MoE) to improve the performance of PEFT methods. Despite promising results, research on improving the efficiency of LoRA with MoE is still in its early stages. Recent studies have shown that experts in the MoE architecture have different strengths and also exhibit some redundancy. Does this statement also apply to parameter-efficient MoE? In this paper, we introduce a novel parameter-efficient MoE method, MoE-LoRA with Layer-wise Expert Allocation (MoLA), for Transformer-based models, where each model layer has the flexibility to employ a varying number of LoRA experts. We investigate several architectures with varying layer-wise expert configurations. Experiments on six well-known NLP and commonsense QA benchmarks demonstrate that MoLA achieves equal or superior performance compared to all baselines. We find that allocating more LoRA experts to higher layers further enhances the effectiveness of models with a certain number of experts in total. With much fewer parameters, this allocation strategy outperforms the setting with the same number of experts in every layer. This work can be widely used as a plug-and-play parameter-efficient tuning approach for various applications. The code is available at https://github.com/GCYZSL/MoLA.

  • Evaluation and Enhancement of Semantic Grounding in Large Vision-Language Models

    AAAI-ReLM Workshop

    Large Vision-Language Models (LVLMs) offer remarkable benefits for a variety of vision-language tasks. However, a challenge hindering their application in real-world scenarios, particularly regarding safety, robustness, and reliability, is their constrained semantic grounding ability, which pertains to connecting language to the physical-world entities or concepts referenced in images. Therefore, a crucial need arises for a comprehensive study to assess the semantic grounding ability of widely used LVLMs. Despite the significance, sufficient investigation in this direction is currently lacking. Our work bridges this gap by designing a pipeline for generating large-scale evaluation datasets covering fine-grained semantic information, such as color, number, material, etc., along with a thorough assessment of seven popular LVLMs’ semantic grounding ability. Results highlight prevalent misgrounding across various aspects and degrees. To address this issue, we propose a data-centric enhancement method that aims to improve LVLMs’ semantic grounding ability through multimodal instruction tuning on fine-grained conversations. Experiments on enhanced LVLMs demonstrate notable improvements in addressing misgrounding issues.

  • Learning Inner Monologue and Its Utilization in Vision-Language Challenges

    NeurIPS Workshop: Socially Responsible Language Modelling Research (SoLaR)

    Inner monologue is an essential phenomenon for reasoning and insight mining in human cognition. In this work, we propose a novel approach for AI systems to simulate inner monologue. Specifically, we consider the communications between components in an LLM-centric system as inner monologues, and demonstrate inner monologue reasoning ability can be learned by supervised learning and reinforcement learning, and then be utilized to solve different complex vision-language problems in different domains. Driven by the power of Large Language Models (LLMs), two prominent methods for vision-language tasks have emerged: (1) the hybrid integration between LLMs and Vision-Language Models (VLMs), where visual inputs are firstly converted into language descriptions by VLMs, serving as inputs for LLMs to generate final answer(s); (2) visual feature alignment in language space, where visual inputs are encoded as embeddings and projected to LLMs' language space via further supervised fine-tuning. The first approach provides light training costs and interpretability but is hard to be optimized in an end-to-end fashion. The second approach presents decent performance, but feature alignment usually requires large amounts of training data and lacks interpretability. With inner monologue simulation, our approach achieves competitive performance with less training data and promising interpretability when compared with state-of-the-art models on two popular tasks.

  • Insight miner: A time series analysis dataset for cross-domain alignment with natural language

    NeurIPS AI for Science Workshop

    Time-series data is essential in various science and industry domains, like environmental analysis, agriculture, transportation, and finance. Researchers need to use their domain knowledge to conduct insight mining from time-series data to study scientific topics. However, this process is time-consuming and highly depends on expert knowledge. This paper proposes a large-scale multimodal model (LMM), Insight Miner, to generate decent and comprehensive time-series descriptions with domain-specific knowledge. To introduce rich time-series insights to Insight Miner, we propose a time-series analysis dataset, TS-Insights, composed of time series and textual insight pairs. In the TS-Insights dataset, we include 100k time series windows sampled from 20 forecasting datasets spanning a wide variety of domains and granularities. Through a meticulous combination of heuristics and statistical tools, we preprocess each raw time series window and use GPT-4 to generate a coherent trend description based on the extracted features. After training with the TS-Insights dataset via instruct tuning, the Insight Miner model performs better in generating time series descriptions and insights compared with state-of-the-art multimodality models, such as LLaVA and GPT-4. Our findings suggest a promising direction of leveraging LMMs for time series analysis and potentially offering avenues for efficient insight mining in scientific domains.

  • Evaluation and mitigation of agnosia in multimodal large language models

    arXiv

    While Multimodal Large Language Models (MLLMs) are widely used for a variety of vision-language tasks, one observation is that they sometimes misinterpret visual inputs or fail to follow textual instructions even in straightforward cases, leading to irrelevant responses, mistakes, and ungrounded claims. This observation is analogous to a phenomenon in neuropsychology known as Agnosia, an inability to correctly process sensory modalities and recognize things (e.g., objects, colors, relations). In our study, we adapt this similar concept to define "agnosia in MLLMs", and our goal is to comprehensively evaluate and mitigate such agnosia in MLLMs. Inspired by the diagnosis and treatment process in neuropsychology, we propose a novel framework EMMA (Evaluation and Mitigation of Multimodal Agnosia). In EMMA, we develop an evaluation module that automatically creates fine-grained and diverse visual question answering examples to assess the extent of agnosia in MLLMs comprehensively. We also develop a mitigation module to reduce agnosia in MLLMs through multimodal instruction tuning on fine-grained conversations. To verify the effectiveness of our framework, we evaluate and analyze agnosia in seven state-of-the-art MLLMs using 9K test samples. The results reveal that most of them exhibit agnosia across various aspects and degrees. We further develop a fine-grained instruction set and tune MLLMs to mitigate agnosia, which led to notable improvement in accuracy.

  • LOWA: Localize Objects in the Wild with Attributes

    NeurIPS Workshop on robustness of zero/few-shot learning in foundation models

    Existing open-vocabulary object detectors can struggle with uncommon or fine-grained classes, as the model and users may have different understandings of object names. Incorporating attributes such as color, shape, and size can help to reduce this inconsistency and make interactive detection more convenient and flexible. Motivated by this, we present LOWA, a new method for localizing objects with attributes effectively in the wild. To train LOWA, we propose a multi-step vision-language training strategy to learn object detection and recognition with class names as well as attribute information, which empowers users to flexibly customize text queries and extend to fine-grained detection with attribute and object information for a wider range of applications. LOWA is built on top of a two-tower vision-language architecture and consists of a standard vision transformer as the image encoder and a similar transformer as the text encoder. To learn the alignment between visual and text inputs at the instance level, we train LOWA with three training steps: object-level training, attribute-aware learning, and free-text joint training of objects and attributes. This training strategy first ensures correct object detection, then incorporates instance-level attribute information, and finally balances the object class and attribute sensitivity. We evaluate our model performance of attribute classification and attribute localization on the Open-Vocabulary Attribute Detection (OVAD) benchmark and the Visual Attributes in the Wild (VAW) dataset, and experiments indicate strong zero-shot performance.

  • Hierarchical label propagation and discovery for machine generated email

    WSDM

    Machine-generated documents such as email or dynamic web pages are single instantiations of a pre-defined structural template. As such, they can be viewed as a hierarchy of template and document specific content. This hierarchical template representation has several important advantages for document clustering and classification. First, templates capture common topics among the documents, while filtering out the potentially noisy variabilities such as personal information. Second, template representations scale far better than document representations since a single template captures numerous documents. Finally, since templates group together structurally similar documents, they can propagate properties between all the documents that match the template. In this paper, we use these advantages for document classification by formulating an efficient and effective hierarchical label propagation and discovery algorithm. The labels are propagated first over a template graph (constructed based on either term-based or topic-based similarities), and then to the matching documents. We evaluate the performance of the proposed algorithm using a large donated email corpus and show that the resulting template graph is significantly more compact than the corresponding document graph and the hierarchical label propagation is both efficient and effective in increasing the coverage of the baseline document classification algorithm. We demonstrate that the template label propagation achieves more than 91% precision and 93% recall, while increasing the label coverage by more than 11%.

  • Annotating needles in the haystack without looking: Product information extraction from emails

    SIGKDD

    Business-to-consumer (B2C) emails are usually generated by filling structured user data (e.g., purchase, event) into templates. Extracting structured data from B2C emails allows users to track important information on various devices.

    However, it also poses several challenges, due to the requirement of short response time for massive data volume, the diversity and complexity of templates, and the privacy and legal constraints. Most notably, email data is legally protected content, which means no one except the receiver can review the messages or derived information.

    In this paper we first introduce a system which can extract structured information automatically without requiring human review of any personal content. Then we focus on how to annotate product names from the extracted texts, which is one of the most difficult problems in the system. Neither general learning methods, such as binary classifiers, nor more specific structure learning methods, such as Conditional Random Field (CRF), can solve this problem well.

    To accomplish this task, we propose a hybrid approach, which basically trains a CRF model using the labels predicted by binary classifiers (weak learners). However, the performance of weak learners can be low, therefore we use Expectation Maximization (EM) algorithm on CRF to remove the noise and improve the accuracy, without the need to label and inspect specific emails. In our experiments, the EM-CRF model can significantly improve the product name annotations over the weak learners and plain CRFs.

  • Online Modeling of Proactive Moderation System for Auction Fraud Detection

    WWW

    We consider the problem of building online machine-learned models for detecting auction fraud on e-commerce web sites. Since the emergence of the world wide web, online shopping and auctions have gained more and more popularity. While people are enjoying the benefits of online trading, criminals are also taking advantage of it to conduct fraudulent activities against honest parties to obtain illegal profit. Hence proactive fraud-detection moderation systems are commonly applied in practice to detect and prevent such illegal and fraudulent activities. Machine-learned models, especially those that are learned online, are able to catch fraud more efficiently and quickly than human-tuned rule-based systems. In this paper, we propose an online probit model framework which takes feature selection, coefficient bounds from human knowledge and multiple instance learning into account simultaneously. By empirical experiments we show that this model can potentially detect more fraud and significantly reduce customer complaints compared to several baseline models and the human-tuned rule-based system.

    Other authors
    • L. Zhang, J. Yang, B. Tseng
  • Vote Calibration in Community Question-Answering Systems

    SIGIR

    User votes are important signals in community question-answering (CQA) systems. Many features of typical CQA systems, e.g. the best answer to a question, status of a user, are dependent on ratings or votes cast by the community. In a popular CQA site, Yahoo! Answers, users vote for the best answers to their questions and can also thumb up or down each individual answer. Prior work has shown that these votes provide useful predictors for content quality and user expertise, where each vote is usually assumed to carry the same weight as others. In this paper, we analyze a set of possible factors that indicate bias in user voting behavior -- these factors encompass different gaming behavior, as well as other eccentricities, e.g., votes to show appreciation of answerers. These observations suggest that votes need to be calibrated before being used to identify good answers or experts. To address this problem, we propose a general machine learning framework to calibrate such votes. Through extensive experiments based on an editorially judged CQA dataset, we show that our supervised learning method of content-agnostic vote calibration can significantly improve the performance of answer ranking and expert ranking.

  • FastInf: A Fast Algorithm to Infer Social Networks from Cascades

    Stanford CS224W

    The structure of a social network, i.e., the edges among its nodes, is the fundamental information we need for future studies. However, it may not be easily observed in many cases, such as infection, shopping recommendation, etc. Therefore it is an interesting and important problem to infer the networks from the observed data, e.g., information propagation cascades.
    Previous work has focused on accuracy and developed several complex models to solve this problem. However, in the real world those solutions are neither efficient nor scalable enough to be used in a big data environment. For example, people generate billions of records every day on the biggest social media websites such as Twitter and Facebook. We develop a very efficient and scalable algorithm, FastInf, to infer the hidden networks from cascades. Although it is much faster than other algorithms, its accuracy stays at least close to the state of the art, and in some cases is even better. We also provide a Map-Reduce implementation of this algorithm, which makes it feasible in a real-world industry environment.

  • A Machine-Learned Proactive Moderation System for Auction Fraud Detection

    CIKM [Poster]

    Online auction and shopping are gaining popularity with the growth of web-based eCommerce. Criminals are also taking advantage of these opportunities to conduct fraudulent activities against honest parties with the purpose of deception and illegal profit. In practice, proactive moderation systems are deployed to detect suspicious events for further inspection by human experts. Motivated by real-world applications in commercial auction sites in Asia, we develop various advanced machine learning techniques in the proactive moderation system. Our proposed system is formulated as optimizing bounded generalized linear models in multi-instance learning problems, with intrinsic bias in selective labeling and massive unlabeled samples. In both offline evaluations and online bucket tests, the proposed system significantly outperforms the rule-based system on various metrics, including area under ROC (AUC), loss rate of labeled frauds and customer complaints. We also show that the metrics of loss rates are more effective than AUC in our cases.

  • User Reputation in a Comment Rating Environment

    KDD

    Reputable users are valuable assets of a web site. We focus on user reputation in a comment rating environment, where users make comments about content items and rate the comments of one another. Intuitively, a reputable user posts high quality comments and is highly rated by the user community. To our surprise, we find that the quality of a comment judged editorially is almost uncorrelated with the ratings that it receives, but can be predicted using standard text features, achieving accuracy as high as the agreement between two editors! However, extracting a pure reputation signal from ratings is difficult because of data sparseness and several confounding factors in users' voting behavior. To address these issues, we propose a novel bias-smoothed tensor model and empirically show that our model significantly outperforms a number of alternatives based on Yahoo! News, Yahoo! Buzz and Epinions datasets.

    Other authors
    • Bee-Chung Chen, Jian Guo, Belle Tseng and Jie Yang
  • Personalization of tagging systems

    Information Processing & Management

    Social media systems have encouraged end user participation in the Internet, for the purpose of storing and distributing Internet content, sharing opinions and maintaining relationships. Collaborative tagging allows users to annotate the resulting user-generated content, and enables effective retrieval of otherwise uncategorised data. However, compared to professional web content production, collaborative tagging systems face the challenge that end-users assign tags in an uncontrolled manner, resulting in unsystematic and inconsistent metadata.

    This paper introduces a framework for the personalization of social media systems. We pinpoint three tasks that would benefit from personalization: collaborative tagging, collaborative browsing and collaborative search. We propose a ranking model for each task that integrates the individual user’s tagging history in the recommendation of tags and content, to align its suggestions to the individual user preferences. We demonstrate on two real data sets that for all three tasks, the personalized ranking should take into account both the user’s own preference and the opinion of others.

    Other authors
    • J. Wang, M. Clements, J. Yang, A. P. de Vries, and M. Reinders
  • Buddycast: an Operational Peer-to-Peer Epidemic Protocol Stack

    Proc. of the 14th Annual Conf. of the Advanced School for Computing and Imaging

    Peer-to-peer (P2P) networks have developed into popular media to share and seek information. Given the large amount of information available, it is of great interest to design a distributed recommender system, to personalize the information seeking in these networks. This paper describes a distributed collaborative filtering framework. In this framework, we introduce an item ranking model for collaborative filtering, inspired by the Probability Ranking Principle (PRP) of information retrieval. However, the probability estimation for the item ranking requires a centralized database to store user preferences. Within a P2P network such a centralized database is not readily available. To overcome this problem, we developed a novel preference exchange algorithm called BuddyCast, based on the epidemic protocol. Under this overlay network, the distributed item ranking is realized by fully decomposing the computation loads of the model and preference data into the entire network. The formal derivation and analysis in this paper help us to outline and motivate possible future directions of research in P2P Recommender Systems.
    The distributed recommendation framework described in this paper has been implemented in our Open Source P2P file sharing software Tribler (tribler.org).

    Other authors
    • J.A. Pouwelse, J. Yang, M. Meulpolder, D.H.J. Epema, H.J. Sips
  • An Epidemic-based P2P Recommender System

    Workshop on Large Scale Distributed Systems for IR in SIGIR

    Peer-to-peer (P2P) networks have developed into popular media to share and seek information. Given the large amount of information available, it is of great interest to design a distributed recommender system, to personalize the information seeking in these networks. This paper describes a distributed collaborative filtering framework. In this framework, we introduce an item ranking model for collaborative filtering, inspired by the Probability Ranking Principle (PRP) of information retrieval. However, the probability estimation for the item ranking requires a centralized database to store user preferences. Within a P2P network such a centralized database is not readily available. To overcome this problem, we developed a novel preference exchange algorithm called BuddyCast, based on the epidemic protocol. Under this overlay network, the distributed item ranking is realized by fully decomposing the computation loads of the model and preference data into the entire network. The formal derivation and analysis in this paper help us to outline and motivate possible future directions of research in P2P Recommender Systems. The distributed recommendation framework described in this paper has been implemented in our Open Source P2P file sharing software Tribler (tribler.org).

    Other authors
    • J. Yang, J. Wang, M. Clements, J. Pouwelse, A. P. de Vries, M. Reinders
  • Tribler: A social-based peer-to-peer system

    IPTPS 06; Concurrency and Computation: Practice and Experience

    Most current peer-to-peer (P2P) file-sharing systems treat their users as anonymous, unrelated entities, and completely disregard any social relationships between them. However, social phenomena such as friendship and the existence of communities of users with similar tastes or interests may well be exploited in such systems in order to increase their usability and performance. In this paper we present a novel social-based P2P file-sharing paradigm that exploits social phenomena by maintaining social networks and using these in content discovery, content recommendation, and downloading. Based on this paradigm's main concepts such as taste buddies and friends, we have designed and implemented the Tribler P2P file-sharing system as a set of extensions to BitTorrent. We present and discuss the design of Tribler, and we show evidence that Tribler enables fast content discovery and recommendation at a low additional overhead, and a significant improvement in download performance.

    Other authors
    • J. Pouwelse, P. Garbacki, J. Wang, A. Bakker, J. Yang, A. Iosup, D. Epema, M. Reinders, M. van Steen, and H. J

Patents

  • Crop yield prediction at field-level and pixel-level

    Issued US PCT/US2019/056882

    Implementations relate to crop yield prediction at the field- and pixel-level. In various implementations, a first temporal sequence of high-elevation digital images may be obtained that capture a first geographic area and are acquired over a first predetermined time interval while the first geographic area includes a particular crop. A first plurality of other data points may also be obtained that influence a ground truth crop yield of the first geographic area after the first predetermined time interval. The first plurality of other data points may be grouped into temporal chunks corresponding temporally with respective images of the first temporal sequence. The first temporal sequence and the temporal chunks of the first plurality of other data points may be applied, e.g., iteratively, as input across a machine learning model to estimate a crop yield of the first geographic area at the end of the first predetermined time interval.

  • Detection and replacement of transient obstructions from high elevation digital images

    Issued US US20190392596A1

    Implementations relate to detecting/replacing transient obstructions from high-elevation digital images. A digital image of a geographic area includes pixels that align spatially with respective geographic units of the geographic area. Analysis of the digital image may uncover obscured pixel(s) that align spatially with geographic unit(s) of the geographic area that are obscured by transient obstruction(s). Domain fingerprint(s) of the obscured geographic unit(s) may be determined across pixels of a corpus of digital images that align spatially with the one or more obscured geographic units. Unobscured pixel(s) of the same/different digital image may be identified that align spatially with unobscured geographic unit(s) of the geographic area. The unobscured geographic unit(s) also may have domain fingerprint(s) that match the domain fingerprint(s) of the obscured geographic unit(s). Replacement pixel data may be calculated based on the unobscured pixels and used to generate a transient-obstruction-free version of the digital image.

  • Crop Boundary Detection in Images

    Issued US PCT/US2019/013708

    In embodiments, obtaining a plurality of image sets associated with a geographical region and a time period, wherein each image set of the plurality of image sets comprises multi-spectral and time series images that depict a respective particular portion of the geographical region during the time period, and predicting presence of a crop at particular locations within the particular portion of the geographical region associated with an image set of the plurality of image sets. Determining crop boundary locations within the particular portion of the geographical region based on the predicted presence of the crop at the particular locations, and generating a crop indicative image comprising at least one image of the multi-spectral and time series images of the image set overlaid with indication of crop areas, wherein the crop areas are defined by the determined crop boundary locations.

  • Crop type classification in images

    Issued US US20190228224A1

    In embodiments, obtaining a plurality of image sets associated with a geographical region and a time period, wherein each image set of the plurality of image sets comprises multi-spectral and time series images that depict a respective particular portion of the geographical region during the time period, and predicting one or more crop types growing in each of particular locations within the particular portion of the geographical region associated with an image set of the plurality of image sets. Determining a crop type classification for each of the particular locations based on the predicted one or more crop types for the respective particular locations, and generating a crop indicative image comprising at least one image of the multi-spectral and time series images of the image set overlaid with indications of the crop type classification determined for the respective particular locations.

  • Generating and applying event data extraction templates

    Issued US US 9652530 B1

    Methods and apparatus are described herein for generating and applying event data extraction templates. In various implementations, a set of structural paths may be identified from a corpus of communications. A first structural path of the set of structural paths, associated with a first segment of text, may be classified as transient in response to a determination that a frequency of occurrences of the first segment of text across the corpus satisfies a criterion. Event heuristics may be applied to the communications of the corpus. A determination may be made, based on the applying, that the communications of the corpus are event-related. An event data type may be assigned to the transient structural path based on the applying. An event data extraction template may be generated to extract, from one or more subsequent communications, one or more event-related segments of text associated with the transient structural path.

  • Identifying phishing communications using templates

    Issued US US 9596265 B2

    Methods, apparatus, systems, and computer-readable media are provided for determining whether communications are attempts at phishing. In various implementations, a potentially-deceptive communication may be matched to one or more templates of a plurality of templates. Each template may represent content shared among a cluster of communications sent by a trustworthy entity. In various implementations, it may be determined that an address associated with the communication is not affiliated with one or more trustworthy entities associated with the one or more matched templates. In various implementations, the communication may be classified as a phishing attempt based on the determining.

  • Generating and applying data extraction templates

    Issued US US 9563689 B1

    Methods, apparatus, and computer-readable media are provided for generating and applying data extraction templates. In various implementations, a corpus of structured communications such as emails may be grouped into clusters based on one or more similarities between the structured communications. A set of structural paths may be identified from structured communications of a particular cluster. One or more structural paths of the set may be classified as transient wherein a count of occurrences of one or more associated segments of text across the particular cluster satisfies a criterion. One or more transient paths may be assigned a semantic data type and/or a confidentiality designation based on various signals. A data extraction template may be generated to extract, from subsequent structured communications, segments of text associated with transient (and in some cases, non-confidential) structural paths.

  • Classifying documents by cluster

    Issued US US 20160314184 A1

    Methods, apparatus, systems, and computer-readable media are provided for classifying, or “labeling,” documents such as emails en masse based on association with a cluster/template. In various implementations, a corpus of documents may be grouped into a plurality of disjoint clusters of documents based on one or more shared content attributes. A classification distribution associated with a first cluster of the plurality of clusters may be determined based on classifications assigned to individual documents of the first cluster. A classification distribution associated with a second cluster of the plurality of clusters may then be determined based at least in part on the classification distribution associated with the first cluster and a relationship between the first and second clusters.

  • Information redaction from document data

    Issued US US 20160110352 A1

    Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for redacting data from a document collection generated for a set of documents that include personal information. The redaction of the data is based in part on a comparison of the document collection to a set of personal documents of users for which the users have provided explicit approval to use in the processing of the document collection.

  • Online Techniques for Selling Group Combo Coupons

    Issued US PCT/US2012/056322

    Techniques for providing group discounts are described. A group discount package is configured by associating a plurality of different items with the package, associating a discount price with each item, and associating a threshold value with at least one item. One or more actions that have corresponding threshold values may also be associated with the package. The group discount package may be offered by enabling users to request to purchase items associated with the package. Each user may request to purchase one or more of the items associated with the package at the associated discount price. Furthermore, the users may be enabled to perform any actions associated with the package. A deal with the package is confirmed when each associated threshold value is met.


Courses

  • Data Mining and Analysis

    STATS 202

  • Modern Applied Statistics: Data Mining

    STATS 315B

  • Modern Applied Statistics: Learning

    STATS 315A

  • Social and Information Network Analysis

    CS224W

Languages

  • Chinese

    Native or bilingual proficiency

  • English

    Full professional proficiency

Recommendations received

3 people have recommended Jie
