Huiyang Zhou

Raleigh, North Carolina, United States

About

Huiyang Zhou received the bachelor's degree in electrical engineering from Xian Jiaotong…

Publications

  • yaSpMV: Yet Another SpMV Framework on GPUs

    PPoPP'14

    SpMV (sparse matrix-vector multiplication) is a key linear algebra algorithm and is widely used in many important application domains. As a result, numerous attempts have been made to optimize SpMV on GPUs to leverage their massive computational throughput. Although previous work has shown impressive progress, load imbalance and high memory bandwidth requirements remain the critical performance bottlenecks for SpMV. In this paper, we present novel solutions to these problems. First, we devise a new SpMV format, called blocked compressed common coordinate (BCCOO), which uses bit flags to store the row indices in a blocked common coordinate (COO) format so as to alleviate the bandwidth problem. We further improve this format by partitioning the matrix into vertical slices to enhance cache hit rates when accessing the vector to be multiplied. Second, we revisit the segmented scan approach for SpMV to address the load imbalance problem. We propose a highly efficient matrix-based segmented sum/scan for SpMV and further improve it by eliminating global synchronization. Third, we introduce an auto-tuning framework that chooses optimization parameters based on the characteristics of the input sparse matrix and the target hardware platform. Our experimental results on GTX680 and GTX480 GPUs show that the proposed framework achieves significant performance improvements over the vendor-tuned CUSPARSE V5.0 (up to 229% and 65% on average on GTX680 GPUs; up to 150% and 42% on average on GTX480 GPUs) and over several recently proposed schemes (e.g., up to 195% and 70% on average over clSpMV on GTX680 GPUs; up to 162% and 40% on average over clSpMV on GTX480 GPUs).

  • Time-Ordered Event Traces: A New Debugging Primitive for Concurrency Bugs

    International Parallel and Distributed Processing Symposium (IPDPS-XXV)

  • Combining Local and Global History for High Performance Data Prefetching

    Journal of Instruction-Level Parallelism (JILP)

  • Anomaly-Based Bug Prediction, Isolation, and Validation: An Automated Approach for Software Debugging

    14th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-XIV)

  • Unified Architectural Support for Soft-Error Protection or Software-Bug Detection

    International Conference on Parallel Architectures and Compilation Techniques (PACT’07)

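The core idea in the yaSpMV abstract, replacing explicit COO row indices with one flag bit per nonzero and then computing each row's result with a segmented sum, can be illustrated with a small sequential Python sketch. This is not the authors' implementation: it omits blocking, vertical slicing, the matrix-based parallel segmented scan, and auto-tuning. It also assumes a hypothetical flag convention (1 marks the last nonzero of a row) and a matrix with no empty rows, so that row ids are recoverable from the flags alone. The function names `encode_row_flags`, `segmented_inclusive_scan`, and `spmv_flags` are illustrative only.

```python
from typing import List


def encode_row_flags(rows: List[int], num_rows: int) -> List[int]:
    """Replace explicit COO row indices with one flag per nonzero.

    Hypothetical simplification: entries must be sorted by row and every
    row must hold at least one nonzero. flag[i] == 1 marks the last
    nonzero of its row (the end of a segment).
    """
    if rows != sorted(rows) or sorted(set(rows)) != list(range(num_rows)):
        raise ValueError("sketch requires row-sorted entries and no empty rows")
    n = len(rows)
    return [1 if i == n - 1 or rows[i + 1] != rows[i] else 0 for i in range(n)]


def segmented_inclusive_scan(values: List[float], end_flags: List[int]) -> List[float]:
    """Inclusive prefix sum that restarts after every segment-end flag."""
    out: List[float] = []
    running = 0.0
    for i, v in enumerate(values):
        if i > 0 and end_flags[i - 1]:  # previous entry closed a row
            running = 0.0
        running += v
        out.append(running)
    return out


def spmv_flags(flags: List[int], cols: List[int], vals: List[float],
               x: List[float]) -> List[float]:
    """y = A @ x using flag-encoded rows and a segmented sum (sequential)."""
    products = [v * x[c] for c, v in zip(cols, vals)]
    scanned = segmented_inclusive_scan(products, flags)
    # The running sum at each row's last nonzero is that row's dot product.
    return [s for s, f in zip(scanned, flags) if f]


if __name__ == "__main__":
    # A = [[10, 0, 2], [0, 3, 0], [4, 0, 5]] in COO form, sorted by row.
    rows = [0, 0, 1, 2, 2]
    cols = [0, 2, 1, 0, 2]
    vals = [10.0, 2.0, 3.0, 4.0, 5.0]
    flags = encode_row_flags(rows, 3)                      # [0, 1, 1, 0, 1]
    print(spmv_flags(flags, cols, vals, [1.0, 2.0, 3.0]))  # [16.0, 6.0, 19.0]
```

For readability the sketch keeps the flags in a Python list; a real format would pack them so that each nonzero costs one bit instead of a full integer row index (commonly 32 bits), which is the bandwidth saving the abstract refers to. On a GPU, the scan would be split across threads rather than run as a single loop.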
