Haishan Liu

San Francisco Bay Area
1K followers 500+ connections

Experience

  • SmartNews

    Palo Alto, California, United States

  • -

    United States

  • -

  • -

    San Francisco Bay Area

  • -

    San Francisco Bay Area

  • -

    San Francisco Bay Area

Education

Publications

  • Generating supplemental content information using virtual profiles

    ACM RecSys 2013

    We describe a hybrid recommendation system at LinkedIn that seeks to optimally extract relevant information pertaining to items to be recommended. By extending the notion of an item profile, we propose the concept of a "virtual profile" that augments the content of the item with a rich set of features inherited from members who have already shown explicit interest in it. Unlike item-based collaborative filtering, we focus on discovering the characteristic descriptors that underlie the item-user association. Such information is used as supplemental features in a content-based filtering system. The main objective of virtual profiles is to provide a means to tap into rich content information from one type of entity and propagate the features extracted from it to other affiliated entities that may suffer from relative data scarcity. We empirically evaluate the proposed method on a real-world community recommendation problem at LinkedIn. The results show that virtual profiles outperform a collaborative-filtering-based approach ("users who like this also like that"). In particular, the improvement is more significant for new users with only limited connections, demonstrating the capability of the method to address the cold-start problem in pure collaborative filtering systems.

  • Breaking the Deadlock: Simultaneously Discovering Attribute Matching and Cluster Matching with Multi-Objective Metaheuristics

    Journal on Data Semantics: Volume 1, Issue 2 (2012), Page 133-145, DOI: 10.1007/s13740-012-0010-0

    In this paper, we present a data mining approach to address challenges in the matching of heterogeneous datasets. In particular, we propose solutions to two problems that arise in integrating information from different results of scientific research. The first problem, attribute matching, involves discovery of correspondences among distinct numeric features (attributes) that are used to characterize datasets that have been collected and analyzed in different research labs. The second problem, cluster matching, involves discovery of matchings between patterns (clusters) across datasets. We treat both of these problems together as a multi-objective optimization problem. A multi-objective simulated annealing algorithm is described to find the optimal solution and compared with the genetic algorithm. The utility of this approach is demonstrated in a series of experiments using synthetic and realistic datasets that are designed to simulate heterogeneous data from different sources.

  • A Hypergraph-based Method for Discovering Semantically Associated Itemsets

    In Proceedings of the 11th IEEE International Conference on Data Mining (ICDM)

    In this paper, we address an interesting data mining problem of finding semantically associated itemsets, i.e., items connected via indirect links. We propose a novel method for discovering semantically associated itemsets based on a hypergraph representation of the database. We describe two similarity measures to compute the strength of associations between items. Specifically, we introduce the average commute time similarity, $\mathbf{s_{CT}}$, based on the random walk model on the hypergraph, and the inner-product similarity, $\mathbf{s_{L+}}$, based on the Moore-Penrose pseudoinverse of the hypergraph Laplacian matrix. Given semantically associated 2-itemsets generated by these measures, we design a hypergraph expansion method with two search strategies, namely, the clique and connected component search, to generate $k$-itemsets ($k>2$). We show that the proposed method is indeed capable of capturing semantically associated itemsets through experiments performed on three datasets ranging from low to high dimensionality. The semantically associated itemsets discovered in our experiments are promising for providing valuable insights into the interrelationships between medical concepts and other domain-specific concepts.

  • Breaking the Deadlock: Simultaneously Discovering Attribute Matching and Cluster Matching with Multi-Objective Simulated Annealing

    In Proceedings of the International Conference on Ontologies, Databases and Application of SEmantics (ODBASE 2011). LNCS 7045, pp. 698-715

    In this paper, we present a data mining approach to challenges in the matching and integration of heterogeneous datasets. In particular, we propose solutions to two problems that arise in combining information from different results of scientific research. The first problem, attribute matching, involves discovery of correspondences among distinct numeric-typed summary features (“attributes”) that are used to characterize datasets that have been collected and analyzed in different research labs. The second problem, cluster matching, involves discovery of matchings between patterns across datasets. We treat both of these problems together as a multi-objective optimization problem. A multi-objective simulated annealing algorithm is described to find the optimal solution. The utility of this approach is demonstrated in a series of experiments using synthetic and realistic datasets that are designed to simulate heterogeneous data from different sources.

  • Ontology-based Mining of Brainwaves: A Sequence Similarity Technique for Mapping Alternative Descriptions of Patterns in Event Related Potentials (ERP) Data

    In Proceedings of the 14th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2010). LNCS 6119, pp. 43-54

  • Towards Semantic Data Mining

    In Doctoral Consortium of the 9th International Semantic Web Conference (ISWC)

    Incorporating domain knowledge is one of the most challenging problems in data mining. Semantic Web technologies promise to offer solutions for formally capturing domain knowledge and using it efficiently. We call data mining technology that is powered by the Semantic Web and capable of systematically incorporating domain knowledge "semantic data mining". In this paper, we first survey a body of representative academic work that explores ontology-based optimization for various data mining applications. Then we identify semantic annotation as a crucial step towards semantic data mining, since it brings meaning to data. Finally, we propose a learning-based semantic search algorithm for annotating (semi-)structured data.

  • An Exploration of Understanding Heterogeneity through Data Mining

    In Proceedings of KDD'08 Workshop on Mining Multiple Information Sources (MMIS 2008). pp. 18-25


Patents
