Mehrzad Samadi

San Jose, California, United States
8K followers 500+ connections

About

NVIDIA Parabricks provides high-performance GPU-based software solutions for the analysis…

Experience

  • NVIDIA
  • -

    San Jose, California, United States

  • -

    Ann Arbor

  • -

    Ann Arbor

  • -

    Ann Arbor, MI

  • -

    Redmond

  • -

    Redmond, WA

  • -

    Tehran, Iran

Education

  • University of Michigan
  • -

    Advisor: Prof. Ali Afzali-Kusha
    Thesis title: Dynamic power management and voltage/frequency scaling in nano-scale devices

  • -

Publications

  • APOGEE: Adaptive Prefetching On GPUs for Energy Efficiency

    PACT 2013

    Modern graphics processing units (GPUs) combine large amounts of parallel hardware with fast context switching among thousands of active threads to achieve high performance. However, such designs do not translate well to mobile environments where power constraints often limit the amount of hardware. In this work, we investigate the use of prefetching as a means to increase the energy efficiency of GPUs. Classically, CPU prefetching results in higher performance but worse energy efficiency due to unnecessary data being brought on chip. Our approach, called APOGEE, uses an adaptive mechanism to dynamically detect and adapt to the memory access patterns found in both graphics and scientific applications that are run on modern GPUs to achieve prefetching efficiencies of over 90%. Rather than examining threads in isolation, APOGEE uses adjacent threads to more efficiently identify address patterns and dynamically adapt the timeliness of prefetching. The net effect of APOGEE is that fewer thread contexts are necessary to hide memory latency and thus sustain performance. This reduction in thread contexts and related hardware translates to simplification of hardware and leads to a reduction in power. For Graphics and GPGPU applications, APOGEE enables an 8X reduction in multithreading hardware, while providing a performance benefit of 19%. This translates to a 52% increase in performance per watt over systems with high multi-threading and 33% over existing GPU prefetching techniques.

Courses

  • Advanced Compilers

    583

  • Introduction to Artificial Intelligence

    492

  • Linear Programming

    510

  • Microarchitecture

    573

  • Parallel Computer Architecture

    570

Languages

  • Persian

    Native or bilingual proficiency

  • French

    Elementary proficiency

  • English

    Full professional proficiency
