Ehsan Saboori (PhD, Eng)

Toronto, Ontario, Canada

About

I am a Co-Founder and CTO at Deeplite Inc., where I lead core technology development, encompassing novel neural network compression, design space exploration methods, ultra-low-bit quantization, and efficient runtime and compiler approaches for on-device AI inference.

Deeplite is dedicated to the fundamental advancement of deep learning systems. We are tackling inference optimization of deep neural networks. Our solution leverages state-of-the-art technologies from elite universities to make deep neural networks faster, smaller, and more energy-efficient, from cloud to edge computing.

Experience

  • Deeplite

    Montreal, Canada Area

  • -

    Montreal, Quebec, Canada

  • -

    Montreal, Canada Area

  • -

    Montreal, Canada Area

  • -

    Greater New York City Area

  • -

    Vancouver, Canada Area

  • -

    Montreal, Canada Area

  • -

    Iran, Mashhad

  • -

    Iran, Mashhad

Education

  • -

    Activities and Societies: Research Assistance

    Hybrid prototyping of multicore embedded systems

  • -

  • -

  • -

Publications

  • Fast and cycle-accurate simulation of multi-threaded applications on SMP architectures using hybrid prototyping

    Proceedings of the Eleventh IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis

    This paper presents a fast and cycle accurate simulation environment for early power-performance analysis of multithreaded applications targeted to symmetric multiprocessing embedded architectures. Our simulation environment leverages the hybrid prototyping technique, where a lightweight emulation kernel performs logical simulation of multiple identical cores on top of a single physical instance of a core. The technique does not require a detailed timing model of the core hardware because the application threads execute directly on the target core. Previous work on hybrid prototyping supported modeling of only statically scheduled threads, thereby severely limiting its modeling capabilities. In this work, we describe the modeling of dynamic RTOS scheduler as well as hardware interrupts on top of the emulation kernel, in order to support the simulation of unmodified multi-threaded applications. Our experimental results demonstrate the high accuracy, simulation speed and scalability of our hybrid prototyping-based simulation models.

  • Rapid design space exploration of multi-clock domain MPSoCs with hybrid prototyping

    Canadian Conference on Electrical and Computer Engineering (CCECE)

    This paper presents novel techniques of using hybrid prototyping for early power-performance analysis of MPSoC designs with multiple clock domains. The fundamental idea of hybrid prototyping is to simulate a design with multiple cores by creating an emulation kernel in software on top of a single physical instance of the core. However, so far hybrid prototyping has been limited to homogeneous multicores running at the same clock frequency. Moreover, hybrid prototyping has not yet been demonstrated for efficient design space exploration. Our work focuses on enhancing the capabilities of hybrid prototyping, such that it can be applied to realistic multi-clock MPSoC designs as well as to perform early power-performance evaluation of MPSoC designs. Our experiments using industrial-strength applications such as JPEG, MP3, and Packet Processing demonstrate the high accuracy of our hybrid prototypes, and over two orders of magnitude improvement over software simulation speed. We also demonstrate that exploring over 150 design options using hybrid prototyping can be done with high reliability in the order of minutes compared to multiple days using conventional FPGA prototyping.

  • DRAC: A Dynamically Reconfigurable Active L1 Cache Model for Hybrid Prototyping of Multicore Embedded Systems

    Rapid System Prototyping (RSP), 2014 International

    The paper presents a novel dynamically reconfigurable active L1 cache model for hybrid prototyping, called DRAC. The hybrid prototyping technique simulates a multicore system using an emulation kernel on top of a single physical instance of a core. We extend hybrid prototyping by supporting memory hierarchy modeling with DRAC. The presented cache model is a standalone cycle-accurate model that is customized for multicore emulation. DRAC's run-time configurability enables the embedded system designer to simulate and explore different multicore design options without the need for full FPGA prototyping. Our experimental results show timing/energy estimation for different task mappings and different cache sizes of a JPEG encoder benchmark. We have observed 2.78% average error and 5.06% worst case error when DRAC is used as a standalone cache model in a single core design. We also observed 100% relative accuracy and less than 13% absolute worst case error in timing estimation when DRAC is used for hybrid prototyping of multicore designs.

  • Hybrid Prototyping of multicore embedded systems

    Design, Automation & Test in Europe Conference & Exhibition (DATE), 2013

    This paper presents a novel modeling technique for multicore embedded systems, called Hybrid Prototyping. The fundamental idea is to simulate a design with multiple cores by creating an emulation kernel in software on top of a single physical instance of the core. The emulation kernel switches between tasks mapped to different cores and manages the logical simulation times of the individual cores. As a result, we can achieve fast and cycle-accurate simulation of symmetric multicore designs, thereby overcoming the accuracy concerns of virtual prototyping and the scalability issues of physical prototyping. Our experiments with industrial multicore designs show that the simulation time with hybrid prototyping grows only linearly with the number of cores and the inter-core communication traffic, while providing 100% cycle accuracy.

  • Analyzing the Dual-Path Peer-to-Peer Anonymous Approach

    arXiv preprint arXiv:1208.3022

    Dual-Path is an anonymous peer-to-peer approach that provides requester anonymity. It provides anonymity between a requester and a provider in peer-to-peer networks with trusted servers called supernodes, so that the provider cannot identify the requester and no other peer can identify the two communicating parties with certainty. Dual-Path establishes two paths for transmitting data, called the Request path and the Response path. The first is used for requesting data and the second for sending the requested data to the requester. Because the Dual-Path approach is similar to the Crowds approach, this article compares the reliability and performance of Dual-Path and Crowds. For this purpose, a simulator is developed and several scenarios are defined to compare Dual-Path and Crowds in different situations. Chapters 2 and 3 briefly describe the Dual-Path and Crowds approaches. Chapter 4 describes the simulator. Chapter 5 explains the scenarios used to compare performance. Chapter 6 compares reliability, and Chapter 7 concludes.

  • Data Selection for Semi-Supervised Learning

    arXiv preprint arXiv:1208.1315

    The real challenge in pattern recognition and machine learning is to train a discriminator using labeled data and use it to classify future data as accurately as possible. However, most real-world problems involve large amounts of data, and labeling them is cumbersome or even impossible. Semi-supervised learning is one approach to overcoming this problem. It uses only a small set of labeled data together with the large remaining set of unlabeled data to train the discriminator. In semi-supervised learning, which data points are labeled matters a great deal, and effectiveness changes depending on their position. In this paper, we propose an evolutionary approach based on an Artificial Immune System (AIS) to determine which data points are best to label in order to obtain high-quality labeled data. The experimental results demonstrate the effectiveness of this algorithm in finding these data points.

  • Fast feature reduction in intrusion detection datasets

    MIPRO, 2012 Proceedings of the 35th International Convention

    In most intrusion detection systems (IDS), a system tries to learn the characteristics of different types of attacks by analyzing packets sent or received in the network. These packets have many features, but not all of them need to be analyzed to detect a specific type of attack. Detection speed and computational cost are also vital, because the datasets in these problems are usually very large. In this paper we propose a very simple and fast feature selection method to eliminate features that carry no helpful information. Omitting redundant features results in faster learning. We compared our proposed method with three of the most successful similarity-based feature selection algorithms: Correlation Coefficient, Least Square Regression Error, and Maximal Information Compression Index. We then used the features recommended by each of these algorithms in two popular classifiers, Bayes and KNN, to measure the quality of the recommendations. Experimental results show that although the proposed method does not outperform the evaluated algorithms by a large margin in accuracy, it is far superior in computational cost.

  • Anonymous Communication in Peer-to-Peer Networks for providing more Privacy and Security

    International Journal of Modeling and Optimization

    One of the most important issues in peer-to-peer networks is anonymity. The main anonymity concern for peer-to-peer users is that their identities and actions can be revealed by other members. Many approaches have been proposed to provide anonymous peer-to-peer communications. An intruder can obtain information about the content of the data and the sender's and receiver's identities. Anonymous approaches are designed with the following three goals: to protect the identity of the provider, to protect the identity of the requester, and to protect the contents of the data transferred between them. This article presents a new peer-to-peer approach to achieve anonymity between a requester and a provider in peer-to-peer networks with trusted servers called supernodes, so that the provider will not be able to identify the requester and no other peers can identify the two communicating parties with certainty. This article shows that the proposed algorithm improves reliability and security. This algorithm, based on onion routing and randomization, protects transferred data against traffic analysis attacks. The ultimate goal of this anonymous communications algorithm is to allow a requester to communicate with a provider in such a manner that nobody can determine the requester's identity or the content of the transferred data.

    Other authors
    • Shahriar Mohammadi
  • Improving the K-means algorithm using improved downhill simplex search

    Software Technology and Engineering (ICSTE), 2010 2nd International Conference on

    The k-means algorithm is one of the best-known and most popular clustering algorithms. K-means seeks an optimal partition of the data by minimizing the sum of squared error with an iterative optimization procedure, which belongs to the category of hill climbing algorithms. Hill climbing searches are known for converging to local optima. Since k-means can converge to a local optimum, different initial points generally lead to different converged centroids, which makes it important to start with a reasonable initial partition in order to achieve high-quality clustering solutions. However, in theory, there exist no efficient and universal methods for determining such initial partitions. In this paper we try to find an optimum initial partitioning for the k-means algorithm. To achieve this goal, we propose a new improved version of downhill simplex search, use it to find an optimal clustering result, and then compare this algorithm with the Genetic Algorithm (GA) based, Genetic K-Means (GKM), Improved Genetic K-Means (IGKM), and k-means algorithms.

  • Automatic firewall rules generator for anomaly detection systems with Apriori algorithm

    2010 3rd International Conference on Advanced Computer Theory and Engineering (ICACTE)

    Network intrusion detection systems have become a crucial issue for computer systems security infrastructures. Different methods and algorithms have been developed and proposed in recent years to improve intrusion detection systems. The most important issue in current systems is that they are poor at detecting novel anomaly attacks. These attacks are any action that deviates significantly from normal behaviour and is therefore considered an intrusion. This paper proposes a model based on data mining techniques to address this problem. The Apriori algorithm is used to predict novel attacks and generate real-time rules for the firewall. The Apriori algorithm extracts interesting correlation relationships among large sets of data items. This paper illustrates how to use the Apriori algorithm in intrusion detection systems to create an automatic firewall rules generator that detects novel anomaly attacks. Apriori is the best-known algorithm for mining association rules. This is an innovative way to find association rules on a large scale.

  • Dual-Path Peer-to-Peer Anonymous Approach

    Parallel and Distributed Systems (ICPADS), 2010 IEEE 16th International Conference on

    Many approaches have been proposed to provide anonymous peer-to-peer communications. Data sent via peer-to-peer communications is vulnerable to traffic analysis. Traffic analysis is the process of intercepting and analysing messages in order to deduce information from patterns in communications. An intruder can obtain information about the content of the data and the requester's and provider's identities. Anonymous approaches are designed with the following three goals: to protect the identity of the provider, to protect the identity of the requester, and to protect the contents of the data transferred between them. This article presents a new peer-to-peer approach to achieve anonymity between a requester and a provider in a hybrid peer-to-peer information-sharing environment with trusted servers called supernodes, so that the provider will not be able to identify the requester and no other peers can identify the two communicating parties with certainty. This article shows that the proposed approach improves reliability and security. This approach, based on onion routing and randomization, protects transferred data against traffic analysis attacks. The ultimate goal of this anonymous communications approach is to allow a requester to communicate with a provider in such a manner that nobody can determine the requester’s identity or the content of the transferred data.

  • A new scheduling algorithm for server farms load balancing

    Industrial and Information Systems (IIS), 2010 2nd International Conference on

    This paper describes a new scheduling algorithm to distribute jobs in server farm systems. The proposed algorithm overcomes the starvation caused by SRPT (Shortest Remaining Processing Time). This algorithm is used for process scheduling in operating systems. The algorithm was developed for dispatcher scheduling. It is a non-preemptive discipline, similar to SRPT, in which the priority of each job depends on its estimated run time and on the amount of time it has spent waiting. Tasks in the servers are served in order of priority to optimize the system response time. The experiments show that the mean turnaround time is reduced in the server farm system.

  • A real-time nonlinear robust controller for magnetic levitation systems

    2008 3rd IEEE Conference on Industrial Electronics and Applications

    This paper first introduces a nonlinear robust controller for maglev systems. This controller, which mainly employs sliding mode concepts, ensures global stability of the closed-loop system. A theorem is established in the paper to prove the overall system stability. To show how well the proposed controller performs, we apply it to the maglev system. Furthermore, it is shown that the proposed controller is robust against plant parameter uncertainties. The experimental results clearly confirm the effectiveness of the designed controller.

  • An evolutionary gait generator with online parameter adjustment for humanoid robots

    2008 IEEE/ACS International Conference on Computer Systems and Applications

    This article proposes a new hybrid methodology, together with an associated series of experiments employing this methodology, for an evolutionary gait generator that uses trigonometric truncated Fourier series formulations with coefficients optimized by a Genetic Algorithm. The Fourier series is used to model joint angle trajectories of a simulated humanoid robot with 25 degrees of freedom. The humanoid robot in this study learns to imitate human walking behavior on flat terrain in a dynamically simulated environment. The simulation results show the robustness of the developed walking behaviors even at extremely high and low speeds, given an appropriate frequency. A number of range limitations were applied to the genetic algorithm used in this research to reduce the learning period to less than 48 hours. The research seeks to improve upon previous work on evolutionary gait generation in robots with lower degrees of freedom. In addition, the proposed solution adopts a hybrid approach, thereby avoiding the long learning curves and the unstable and slow gaits associated with evolutionary approaches.

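The gait generator in the last publication above models each joint angle trajectory as a truncated Fourier series whose coefficients are tuned by a genetic algorithm. As a rough illustration only, the Python sketch below evaluates one such series. The function name `joint_angle`, the parameter names, the exact series form (a constant offset plus sine and cosine harmonics), and all coefficient values are assumptions made for this example, not details taken from the paper.

```python
import math


def joint_angle(t, offset, sin_coeffs, cos_coeffs, omega):
    """Evaluate a truncated Fourier series at time t.

    Hypothetical form (an assumption, not taken from the paper):
        angle(t) = offset + sum_n [a_n*sin(n*omega*t) + b_n*cos(n*omega*t)]
    """
    angle = offset
    # Harmonic index n starts at 1; coefficients are paired per harmonic.
    for n, (a_n, b_n) in enumerate(zip(sin_coeffs, cos_coeffs), start=1):
        angle += a_n * math.sin(n * omega * t) + b_n * math.cos(n * omega * t)
    return angle


# Illustrative values only; a genetic algorithm would search over these.
period = 1.0                      # seconds per gait cycle (assumed)
omega = 2.0 * math.pi / period    # fundamental angular frequency
sin_coeffs = [0.5, 0.2]
cos_coeffs = [0.3, 0.1]

start = joint_angle(0.0, 0.1, sin_coeffs, cos_coeffs, omega)
one_cycle_later = joint_angle(period, 0.1, sin_coeffs, cos_coeffs, omega)
```

Because every harmonic repeats with the fundamental period, the resulting trajectory is periodic, which suits a cyclic motion such as walking. In this sketch, changing `omega` changes the step frequency without touching the coefficients.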

Projects

  • Management Information System (MIS)

    Designed and developed an MIS for an educational institute with more than seven thousand students offering several courses in foreign languages. This web-based system includes several customized panels for administrators, teachers, students, and other roles.

  • Legacy ERP System Migration

    Redesigned and developed parts of a legacy Enterprise Resource Planning (ERP) system for a trading company with warehouses and points of sale across the country. The project required as-is analysis of the current business processes, management of user requirements, and migration planning.

  • Controller Software for a Humanoid Robot

    Designed a genetic algorithm that teaches a simulated humanoid robot to walk. The robot was able to walk at a decent speed after some generations. The simulation of the physical world was performed in the open-source framework SimSpark. This project required teamwork and scientific research alongside C++ programming of the controller.

Honors & Awards

  • 2nd place in Humanoid Robotic Contest, IranOpen Robotic Contest

    IranOpen Organization

  • 9th place, RoboCup Contest, Atlanta, US

    RoboCup Organization

  • 11th place in the 32nd ACM International Collegiate Programming Contest

    ACM/ICPC

Languages

  • Persian

    Native or bilingual proficiency

  • English

    Full professional proficiency
