research

Advances in several branches of technology (data sensing, communication, computation, and storage) are driving an era of unprecedented innovation in information retrieval. Individuals, businesses, governments, and society as a whole now have access to enormous collections of data, which lets them build their own analytics.
“You can’t manage what you don’t measure,” a statement attributed to W. Edwards Deming, captures why the recent explosion of data in industry sectors such as finance, pharmaceuticals, healthcare, and telecommunications matters, and why that data demands efficient processing and management. As more clients, consumer devices, and industrial equipment are connected through the Internet of Things, the resulting tsunami of data and the opportunity to make decisions faster will render today’s predominant computing platforms, from portable embedded devices to data center servers, inefficient and obsolete. Yet while demand for computational resources continues to grow, the semiconductor industry has reached physical scaling limits. Shrinking transistors no longer yields a proportional reduction in power per transistor, as it did under Dennard scaling. Physical design constraints such as power and density have therefore become the dominant factors limiting microprocessor scaling.
Moore's Law scaling continues to provide more transistors with each new processor generation, and in recent generations those transistors have been devoted primarily to more cores. However, the growth in transistor count is not matched by growth in the chip's power and thermal budget. We are therefore entering what some have labeled the ‘dark silicon era’: we can fabricate more transistors than we can afford to power on at once. The growing number of cores integrated on a single chip also makes power management and temperature control harder, and both directly affect system reliability. Meanwhile, continued device miniaturization makes nanoscale circuits more expensive to produce, less reliable, and less precise in their behavior. We thus face a growing efficiency problem: designing computing systems that make use of all the available silicon real estate.
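The dark-silicon argument can be made concrete with the classic first-order scaling model, in which dynamic power is proportional to N·C·Vdd²·f. The Python sketch below is an idealized illustration, not a measurement. Its assumptions are stated loudly: a per-generation linear scale factor s = 1.4, a supply voltage that stays flat once Dennard scaling ends, leakage ignored, and the function name is hypothetical. Under these assumptions, the fraction of a fixed-power chip that can be active at once roughly halves every generation (by 1/s² ≈ 1/1.96).

```python
def active_fraction(generations: int, s: float = 1.4, dennard: bool = False) -> list[float]:
    """Fraction of a chip's transistors that can switch at full frequency under a
    fixed power budget, for generations 0..generations (generation 0 = fully powered).

    Idealized first-order model (an assumption, leakage ignored): each generation
    shrinks linear dimensions by 1/s, so transistor count grows by s**2,
    per-transistor capacitance shrinks by 1/s, and frequency grows by s.
    Supply voltage shrinks by 1/s only while Dennard scaling holds.
    """
    fractions = []
    for g in range(generations + 1):
        transistors = s ** (2 * g)
        capacitance = s ** -g
        frequency = s ** g
        # Post-Dennard assumption: Vdd stays roughly flat, so Vdd^2 does not shrink.
        vdd_squared = s ** (-2 * g) if dennard else 1.0
        # Dynamic power ~ N * C * Vdd^2 * f, normalized to generation 0.
        total_power = transistors * capacitance * vdd_squared * frequency
        # With the budget fixed at the generation-0 power, only 1/total_power can be on.
        fractions.append(min(1.0, 1.0 / total_power))
    return fractions


for g, f in enumerate(active_fraction(4)):
    print(f"generation {g}: {f:.0%} of the chip can be active at once")
```

Passing `dennard=True` keeps the fraction at 1.0 in every generation, which is why dark silicon only emerged once voltage scaling stalled.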
The US microprocessor industry is now at a crossroads. As long as the industry continues to scale performance with each generation, it continues to drive this critically important technology domain. When performance scaling stops, microprocessors become a generic commodity rather than a technology driver or enabler. Because modern processors are constrained most heavily by power, and sometimes by energy, performance scaling no longer follows naturally from increased transistor counts. Instead, total performance is maximized by maximizing performance/Watt.
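The last step of this argument can be written as a one-line identity. Let P denote the power a chip actually draws and P_max its power (thermal) budget; both symbols are introduced here for illustration:

```latex
\text{Performance} = \frac{\text{Performance}}{\text{Watt}} \times P, \qquad P \le P_{\max}
```

Once the chip already operates at its budget (P = P_max), the only remaining lever for higher performance is higher performance per watt.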
Heterogeneous architectures promise to push the envelope of power efficiency further by letting general-purpose processors approach the efficiency of customized cores. By enabling more diverse designs, including designs that are customized dynamically, we can push the efficiency envelope even further.
Therefore, in my research I am interested in addressing this inefficiency problem by developing dynamic heterogeneous architectures that tackle dark and unreliable silicon, thereby increasing chip performance, power and energy efficiency, and lifetime reliability. The overarching theme of my research is Green and Sustainable Computing, which targets using computing resources in an energy-efficient and environmentally responsible manner. The common thread of my work is to better architect computing systems so as to improve energy/performance proportionality. In particular, my research has focused on performance-, power-, temperature-, and reliability-aware design optimization of memory subsystems and processor architectures, addressing each of these challenges in multicore and manycore embedded and high-performance computing systems. In the long term, the major challenge remains that the underlying implementation technology shapes processor architecture and circuit design. My primary research interests therefore lie at the intersection of run-time systems, computer architecture, embedded systems, and VLSI design. In particular, I target the design challenges of deploying new technologies in today's complex multiprocessor systems-on-chip (MPSoCs), and of optimizing and customizing computing platforms for emerging applications in domains such as healthcare, telecommunications, and finance that demand high-performance yet energy-efficient big data computation. This necessitates forming strong collaborative bonds with other disciplines, such as computer vision, machine learning, data mining, biomedical engineering, and telecommunications, and undertaking multidisciplinary research projects.
The broader impact of my research spans a wide range of industry and academic segments, from Google or Facebook data centers down to Apple smartphones, all of which strive for reliable and energy-efficient computing. This research benefits not only the embedded systems and computer architecture communities but also motivates sustained cross-disciplinary investigation that could become a catalyst for setting standards in other disciplines as well.


Heterogeneous Chip Architectures for Energy-Efficient Big Data Computing

Emerging big data analytics applications require a significant amount of server computational power. The costs of building and running a computing server to process big data, and the capacity to which we can scale it, are driven in large part by those computational resources. However, big data applications share many characteristics that are fundamentally different from those of traditional desktop, parallel, and scale-out applications. Big data analytics applications rely heavily on specific deep machine learning and data mining algorithms and run a complex, deep software stack with various components (e.g., Hadoop, Spark, MPI, HBase, Impala, MySQL, Hive, Shark, Apache, and MongoDB) that are bound together by a runtime software system and interact significantly with I/O and the OS, exhibiting high computational, memory, I/O, and control intensity. Current server designs, based on commodity homogeneous processors, will not be the most efficient in terms of performance/watt for this emerging class of applications. In other domains, heterogeneous architectures have emerged as a promising way to enhance energy efficiency by allowing each application to run on a core that matches its resource needs more closely than a one-size-fits-all core. A heterogeneous architecture integrates cores with various microarchitectures and accelerators to provide more opportunities for efficient workload mapping. Effective workload mapping plays a crucial role in exploiting heterogeneity: it finds an architecture with just enough resources to match the workload's needs and thereby maximizes energy efficiency.
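The "just enough resources" mapping criterion can be sketched as a simple selection rule: among the core types that meet a workload's performance requirement, pick the one with the highest performance per watt. The Python below is a minimal illustration under that assumption, not the mapping scheme used in this work. The class and function names, core labels, and throughput and power figures are all hypothetical, and a real mapper would predict these values from runtime profiling rather than take them as given.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CoreProfile:
    core: str           # hypothetical core-type label, e.g. "big" or "little"
    throughput: float   # predicted or measured throughput for this workload (ops/s)
    power: float        # average power while running the workload (W)

    @property
    def perf_per_watt(self) -> float:
        # ops/s divided by W is ops per joule
        return self.throughput / self.power


def map_workload(profiles: list[CoreProfile], min_throughput: float) -> Optional[CoreProfile]:
    """Return the most energy-efficient core type that still meets the
    workload's throughput requirement, or None if no core type can."""
    feasible = [p for p in profiles if p.throughput >= min_throughput]
    if not feasible:
        return None
    return max(feasible, key=lambda p: p.perf_per_watt)


profiles = [
    CoreProfile("big", throughput=10.0, power=5.0),    # 2.0 ops/J
    CoreProfile("little", throughput=4.0, power=1.0),  # 4.0 ops/J
]
print(map_workload(profiles, min_throughput=3.0).core)  # the little core is enough
print(map_workload(profiles, min_throughput=6.0).core)  # only the big core meets the target
```

The feasibility filter comes before the efficiency ranking so that a very efficient but too-slow core is never chosen.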


Heterogeneous Ultra Low Power Accelerator for Wearable Biomedical Computing (NSF CSR-CNS, 2015-2018)

With the rapid advances in small, low-cost wearable computing technologies, there is a tremendous opportunity to develop personal health monitoring devices capable of continuous, vigilant monitoring of physiological signals. Wearable biomedical devices have the potential to reduce the morbidity, mortality, and economic cost associated with many chronic diseases by enabling early intervention and preventing costly hospitalizations. These low-power systems must process and interpret vast amounts of data quickly and accurately, and generate smart alarms only when warranted. The objective of this project is to build the foundation for the next generation of heterogeneous biomedical signal processing platforms that can meet current and future energy-efficiency requirements and computational demands. The PIs begin by characterizing emerging biomedical signal and imaging applications on off-the-shelf embedded low-power multicore CPU, GPU, and FPGA platforms to understand accurately the trade-offs these platforms offer and the bottlenecks they face. Based on these results, the PIs will design and architect a domain-specific manycore accelerator in hardware and integrate it with an off-the-shelf embedded processor, so that together they meet the performance, scalability, programmability, and power-efficiency requirements of these applications. The PIs will implement the proposed heterogeneous architecture in hardware and evaluate its performance and power efficiency with a number of real-life biomedical workloads, including seizure detection, handheld ultrasound spectral Doppler and imaging, a tongue drive assistive device, and a prosthetic hand control interface.
The proposed interdisciplinary research effort could inspire and enable new approaches to healthcare monitoring and can significantly impact several fields, including human-centered cyber-physical systems, cybersecurity, mobile communications, bioinformatics, and applications that require high-performance, energy-efficient embedded computing on data from different sensors. The proposed benchmarks, characterization results, and software-hardware computing framework will be freely shared and broadly disseminated among colleagues in related disciplines.


Logical Vanishability through Hybrid Spin Transfer Torque-CMOS Technology to Enhance Chip Security (DARPA, 2015-2017)

Integrated circuits (ICs) are at the core of every modern computing system deployed in industry sectors such as finance, pharmaceuticals, IT, automotive, smart electric power grids, aerospace, defense, and consumer electronics, and their security and trustworthiness underpin the security of the entire system. Notwithstanding the central importance of IC security and trustworthiness, a horizontal IC supply chain, which involves several steps performed at multiple locations by different providers and integrates IPs from several vendors, has become prevalent due to the confluence of increasingly complex supply chains, time-to-market pressure, and cost pressure. This trend therefore poses significant challenges to hardware security assurance, including design cloning, overproduction, and reverse engineering. With detailed knowledge of the design implementation at the physical level, an untrusted foundry may overproduce the design without the design house's permission. After the design is released to the market, it can also be subject to non-invasive reverse engineering, such as side-channel attacks that extract secret information during operation, or to invasive reverse engineering that recovers the detailed design implementation. To prevent design cloning and overproduction, impede circuit reverse engineering and counterfeiting, and protect confidential data and proprietary/classified intellectual property, this research introduces the concept of vanishable design through a novel hybrid logic design. We introduce a logical vanishing design based on hardware reconfiguration and transformation that employs the highly promising Spin Transfer Torque (STT) magnetic technology to build look-up table (LUT) logic components.
The STT reconfigurable design is similar in functionality to an FPGA, but it runs at significantly higher speed (GHz frequencies), with near-zero leakage power, high thermal stability, tight integration with CMOS, and greater resistance to various physical attacks, and it is overall competitive with custom CMOS design in performance and energy efficiency. Taking design constraints such as performance and power into account, our proposed design flow integrates STT and CMOS technologies so that the final design implementation is hidden from every untrusted party involved in the IC supply chain. The design implementation is completed only when the reconfigurable STT units are programmed in the design house. As a result, an untrusted foundry is unable to clone or overproduce the design. Furthermore, the design effectively withstands destructive reverse engineering attacks and non-invasive side-channel attacks.
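The security argument rests on one property of LUTs: the fabricated hardware is generic, and the logic function it implements is fixed only by configuration bits loaded later, here in the design house. The Python sketch below models that property alone. The class and method names are hypothetical, and it does not model STT device behavior, timing, side-channel resistance, or the project's actual design flow.

```python
from typing import Optional


class LookUpTable:
    """A k-input LUT whose logic function is defined only by 2**k configuration
    bits. In this sketch the bits are loaded after fabrication, so the
    fabricated structure alone does not reveal which function it implements."""

    def __init__(self, k: int):
        self.k = k
        self.bits: Optional[list[int]] = None  # unprogrammed: function undefined

    def program(self, bits: list[int]) -> None:
        if len(bits) != 2 ** self.k or any(b not in (0, 1) for b in bits):
            raise ValueError(f"expected {2 ** self.k} configuration bits of 0/1")
        self.bits = list(bits)

    def evaluate(self, *inputs: int) -> int:
        if self.bits is None:
            raise RuntimeError("LUT is not programmed; its function is unknown")
        if len(inputs) != self.k:
            raise ValueError(f"expected {self.k} inputs")
        index = 0
        for bit in inputs:  # the first input is the most significant address bit
            index = (index << 1) | (bit & 1)
        return self.bits[index]


# The same fabricated 2-input LUT can become any of 2**(2**2) = 16 functions.
lut = LookUpTable(2)
lut.program([0, 1, 1, 0])  # truth table for XOR, indexed by (a, b) = 00, 01, 10, 11
print([lut.evaluate(a, b) for a in (0, 1) for b in (0, 1)])
```

An untrusted party holding only the unprogrammed LUT faces every possible truth table at once. Across a whole design, that ambiguity is what the vanishable-design concept relies on.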


A Novel Biomechatronic Interface Based on Wearable Dynamic Imaging Sensors (NSF CPS, 2013-2018)

The problem of controlling biomechatronic systems, such as multiarticulating prosthetic hands, involves unique challenges in the science and engineering of Cyber-Physical Systems (CPS). It requires integrating computational systems that recognize human functional activity and intent with the control of prosthetic devices that interact with the physical world. Research on this problem has been limited by the difficulty of noninvasively acquiring robust biosignals that allow intuitive and reliable control of multiple degrees of freedom (DoF). The objective of this research is to investigate a new sensing paradigm based on ultrasonic imaging of dynamic muscle activity. The synergistic research plan will integrate novel imaging technologies, new computational methods for activity recognition and learning, and high-performance embedded computing to enable robust and intuitive control of dexterous prosthetic hands with multiple DoF. The interdisciplinary research team involves collaboration among biomedical engineers, electrical engineers, and computer scientists. The specific aims are to: (1) research and develop spatio-temporal image analysis and pattern recognition algorithms to learn and predict different dexterous tasks based on sonographic patterns of muscle activity; (2) develop a wearable image-based biosignal sensing system by integrating multiple ultrasound imaging sensors with a low-power heterogeneous multicore embedded processor; and (3) perform experiments to evaluate the real-time control of a prosthetic hand. The proposed research methods are broadly applicable to assistive technologies in which physical systems, computational frameworks, and low-power embedded computing serve to augment human activities or to replace lost functionality.
The research will advance CPS science and engineering by integrating portable sensors for image-based sensing of complex adaptive physical phenomena, such as dynamic neuromuscular activity, with sophisticated real-time image understanding algorithms that interpret these phenomena on low-power, high-performance embedded systems. The technological advances would enable practical wearable image-based biosensing, with applications in healthcare, and the computational methods would be broadly applicable to problems involving activity recognition from spatiotemporal image data, such as surveillance. This research will have societal impact and will train students in interdisciplinary methods relevant to CPS. About 1.6 million Americans live with amputations that significantly affect activities of daily living. The proposed project has the long-term potential to significantly improve the functionality of upper extremity prostheses, improve the quality of life of amputees, and increase the acceptance of prosthetic limbs. This research could also facilitate intelligent assistive devices for more targeted neurorehabilitation of stroke victims. This project will provide immersive interdisciplinary CPS-relevant training for graduate and undergraduate students, integrating computational methods with imaging, processor architectures, human functional activity, and artificial devices to solve challenging public health problems. A strong emphasis will be placed on involving undergraduate students in research as part of structured programs at our institution. The research team will involve students with disabilities in research activities by leveraging an ongoing NSF-funded project. Bioengineering training activities will be part of a newly developed undergraduate curriculum and a graduate curriculum under development.
The synergistic research plan has been designed to advance CPS science and engineering through the development of new computational methods for dynamic activity recognition and learning from image sequences, development of novel wearable imaging technologies including high-performance embedded computing, and real-time control of a physical system. The specific aims are to: (1) Research and develop spatio-temporal image analysis and pattern recognition algorithms to learn and predict different dexterous tasks based on sonographic patterns of muscle activity. The first aim has three subtasks designed to collect, analyze and understand image sequences associated with functional tasks. (2) Develop a wearable image-based biosignal sensing system by integrating multiple ultrasound imaging sensors with a low-power heterogeneous multicore embedded processor. The second aim has two subtasks designed to integrate wearable imaging sensors with a real-time computational platform. (3) Perform experiments to evaluate the real-time control of a prosthetic hand. The third aim will integrate the wearable image acquisition system developed in Aim 2, and the image understanding algorithms developed in Aim 1, for real-time evaluation of the control of a prosthetic hand interacting with a virtual reality environment. Successful completion of these aims will result in a real-time system that acquires image data from complex neuromuscular activity, decodes activity intent from spatiotemporal image data using computational algorithms, and controls a prosthetic limb in a virtual reality environment in real time. Once developed and validated, this system can be the starting point for developing a new class of sophisticated control algorithms for intuitive control of advanced prosthetic limbs, new assistive technologies for neurorehabilitation, and wearable real-time imaging systems for smart health applications.
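The step of decoding activity intent from spatiotemporal image data can be illustrated with a deliberately simple stand-in. Each image sequence is summarized by where the tissue moved (the mean absolute frame-to-frame difference per pixel), and new sequences are classified with a nearest-centroid rule. This is not the project's algorithm. The frame format, task labels, and function names are hypothetical, and a practical system would use far richer spatiotemporal features and learned models.

```python
import math

Frame = list[float]  # a flattened grayscale ultrasound frame (hypothetical format)


def activity_map(frames: list[Frame]) -> list[float]:
    """Per-pixel mean absolute difference between consecutive frames:
    a crude summary of where tissue moved during the sequence."""
    n = len(frames[0])
    diffs = [0.0] * n
    for prev, cur in zip(frames, frames[1:]):
        for i in range(n):
            diffs[i] += abs(cur[i] - prev[i])
    steps = max(1, len(frames) - 1)
    return [d / steps for d in diffs]


def train_centroids(examples: dict[str, list[list[Frame]]]) -> dict[str, list[float]]:
    """Average the activity maps of all training sequences for each task label."""
    centroids = {}
    for label, sequences in examples.items():
        maps = [activity_map(seq) for seq in sequences]
        centroids[label] = [sum(col) / len(maps) for col in zip(*maps)]
    return centroids


def classify(frames: list[Frame], centroids: dict[str, list[float]]) -> str:
    """Return the label whose centroid is closest (Euclidean) to the sequence's activity map."""
    feature = activity_map(frames)
    return min(centroids, key=lambda label: math.dist(feature, centroids[label]))


# Toy 2-pixel frames: "grip" moves pixel 0, "pinch" moves pixel 1 (hypothetical labels).
training = {
    "grip": [[[0.0, 0.0], [1.0, 0.0], [0.0, 0.0]]],
    "pinch": [[[0.0, 0.0], [0.0, 1.0], [0.0, 0.0]]],
}
centroids = train_centroids(training)
print(classify([[0.0, 0.0], [0.9, 0.1]], centroids))
```

The inner loops are exactly the data-parallel, per-pixel work that a low-power heterogeneous embedded processor would be expected to accelerate in a wearable version.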


Enhancing the Security on Embedded Automotive Systems (General Motors, 2013-2016)

Today’s automobiles are monitored and controlled by a vast array of computers, sensors, and embedded systems. With each vehicle generation, human-to-mechanical interfaces are supplanted by sensors and embedded systems that, in some cases, still give drivers the perception that they are controlling mechanical systems when, in fact, computers and embedded systems are controlling the vehicle. Critical automotive functions, including braking, acceleration, gear shifting, lights, wipers, airbag deployment, and in some cases steering, are controlled by these embedded systems. These embedded systems and microcontrollers communicate over shared bus structures that let them exchange messages without a central host computer. Modern vehicles also come with cellular data links, Internet-based ignition, Bluetooth radios, and other external or remote interfaces. The combination of these critical systems with remote accessibility creates vulnerabilities and risks to passenger safety and to manufacturers' reputations. With the average vehicle weighing more than 4000 pounds, these vulnerabilities create the potential for cyber attackers to remotely control objects that carry lethal amounts of kinetic energy. Several university research groups have recently demonstrated that a cyber attacker can maliciously control a wide range of automotive functions while the vehicle completely ignores driver input. Because the automotive industry has not designed its vehicles with the security of their internal electronics in mind, manufacturers also lack the ability to assess the impact that the broad array of new components has on the overall security of the vehicle. In this project we explore the security of automotive infotainment hardware that supports third-party apps and connects smartphones to the internal automotive networks and hardware.
Our goal is to understand the existing attack surface and to develop hardware defenses that provide improved isolation between these apps and safety-critical automotive hardware.
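One common form of isolation is an allow-list gateway between the infotainment domain and the safety-critical in-vehicle network. The Python sketch below illustrates that idea only; it is not the defense developed in this project, and the class names and message identifiers are hypothetical. It assumes classic CAN frames, which carry an 11-bit standard identifier and at most 8 data bytes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CanFrame:
    arbitration_id: int  # 11-bit standard identifier (classic CAN)
    data: bytes          # classic CAN carries at most 8 data bytes


class InfotainmentGateway:
    """Forward frames from the infotainment domain to the vehicle bus only if
    they are well formed and their identifier is explicitly allow-listed."""

    MAX_STANDARD_ID = 0x7FF
    MAX_DATA_BYTES = 8

    def __init__(self, allowed_ids: set[int]):
        self.allowed_ids = frozenset(allowed_ids)
        self.dropped = 0  # count of frames blocked at the boundary

    def forward(self, frame: CanFrame) -> bool:
        well_formed = (0 <= frame.arbitration_id <= self.MAX_STANDARD_ID
                       and len(frame.data) <= self.MAX_DATA_BYTES)
        if well_formed and frame.arbitration_id in self.allowed_ids:
            return True
        self.dropped += 1
        return False


# Hypothetical IDs: 0x3E0 = a benign media-status message, 0x0A0 = a safety-critical one.
gateway = InfotainmentGateway(allowed_ids={0x3E0})
print(gateway.forward(CanFrame(0x3E0, b"\x01")))      # allowed
print(gateway.forward(CanFrame(0x0A0, b"\xff" * 8)))  # blocked: not allow-listed
```

Default-deny is the key design choice: an app that learns a new safety-critical identifier still cannot reach it unless the identifier was deliberately allow-listed. Implementing the same check in hardware keeps it outside the reach of compromised infotainment software.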


Dynamically Heterogeneous Cores Through 3D Resource Pooling (NSF CI Fellow, 2010-2012)

3D die stacking is a recent technological development that makes it possible to build chip multiprocessors from multiple layers of active silicon bonded with low-latency, high-bandwidth, and very dense vertical interconnects. 3D die stacking provides very fast communication, with latencies as low as a few picoseconds, between processing elements residing on different layers of the chip. The fast communication network in a 3D stacked design, together with the expanded geometry, provides an opportunity to dynamically share on-core resources among different cores. In a 2D processor, resources available to one core (such as the register file, instruction queue, load/store queue, and reorder buffer) are too distant ever to be useful to another core; with a 3D architecture, we can dynamically “pool” the resources that are performance bottlenecks for the particular thread running on a particular core. In this research, I introduced a dynamically heterogeneous processor architecture that leverages 3D stacking technology. Unlike prior work in the 2D plane, the extra dimension makes it possible to share resources at a fine granularity between vertically stacked cores. As a result, each core can grow or shrink its resources as needed by the code running on it. This architecture therefore enables runtime customization of cores at a fine granularity, efficient execution at both high and low levels of thread-level parallelism, fine-grain thermal management, and fine-grain reconfiguration around faults. With this architecture, we achieve performance gains of up to 2X, depending on the number of executing threads, along with a significant advantage in energy efficiency.
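The ability of each core to grow or shrink its resources can be illustrated with a simple allocation policy for one pooled structure, such as reorder-buffer entries shared by vertically stacked cores. The policy below is an assumption for illustration, not the one used in this work. Each core keeps a guaranteed minimum, and the remaining entries are handed out one at a time, round-robin, to cores whose demand is still unmet (a water-filling style, max-min fair allocation). The function name, entry counts, and demands are hypothetical.

```python
def pool_entries(demands: list[int], per_core: int, min_share: int) -> tuple[list[int], int]:
    """Split a pooled structure (per_core entries per layer) across len(demands) stacked cores.

    Every core keeps at least min_share entries; spare entries go one at a time,
    round-robin, to cores whose demand still exceeds their current allocation.
    Returns the per-core allocation and the number of entries left unassigned.
    """
    if not 0 <= min_share <= per_core:
        raise ValueError("min_share must be between 0 and per_core")
    n = len(demands)
    alloc = [min_share] * n
    spare = (per_core - min_share) * n
    while spare > 0:
        hungry = [i for i in range(n) if demands[i] > alloc[i]]
        if not hungry:
            break  # every core is satisfied; remaining entries stay unassigned
        for i in hungry:
            if spare == 0:
                break
            alloc[i] += 1  # never exceeds demand: demands[i] > alloc[i] held at round start
            spare -= 1
    return alloc, spare


# Hypothetical: four stacked cores, 64 reorder-buffer entries per layer, 16 guaranteed each.
# Two busy threads grow to 112 entries each while two low-activity cores shrink to 16.
print(pool_entries([200, 200, 10, 10], per_core=64, min_share=16))
```

In hardware, such a policy would be applied per structure and re-evaluated periodically as thread behavior changes. Entries left unassigned are candidates for the fine-grain thermal and power management the architecture enables.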

Volgenau School of Engineering, George Mason University