The Computer Engineering Group (CEG) at the Institute of Computer Engineering at Ruprecht-Karls University of Heidelberg focuses on improving the performance, energy efficiency and usability of heterogeneous computing systems. The group’s expertise covers parallel computer architecture, high-performance computing, high-performance analytics and reconfigurable logic. In recent years, the group’s research has been driven by application-specific computing, with projects including specialized communication models and methods for thread-parallel processors (Mantaro project), data acquisition for high-energy physics under hard real-time constraints (CERN ATLAS collaboration), deep learning techniques for resource-constrained embedded systems (DeepChip project), advanced compilation techniques for multi-GPU systems (Mekong project), and data analytics using columnar in-memory database systems (Graphite project).
The group has gained international notice for designing the first communication model that is fully in line with a thread-collaborative processor’s execution model (GGAS), yielding substantial savings for clustered GPUs in terms of energy and time. Other efforts on GPU computing include compilation techniques for automated partitioning and resource aggregation, and performance and power modeling of scalable heterogeneous computing systems. The group leads the GPU Education and Research Center at Heidelberg University, sponsored by NVIDIA. The group has a rich history in reconfigurable logic for specialization, including interconnection networks and memory architectures. Within the DeepChip project, the group collaborates with Graz University of Technology and the Materials Center Leoben to optimize deep learning for resource-constrained embedded platforms, and to automate the use of hybrid processors (such as ARM combined with FPGA) for such tasks. Similarly, the collaboration with CERN explores real-time data acquisition using commodity technology. Related efforts include GPU-accelerated ray tracing for real-time treatment planning and optimizing graph processing for high-performance analytics by extending columnar in-memory databases.
- 04/2017: Congrats to Matthias for getting his paper accepted at the GRADES workshop at SIGMOD 2017!
- 04/2017: Congrats to Benjamin, whose paper is a best paper finalist at ISC2017: http://insidehpc.com/2017/04/isc-2017-announces-finalists-hans-meuer-award
- 04/2017: Invited talk at Saarland University: "Life after Dennard and its implications: it’s about time for energy"
- 03/2017: One talk (S7300 Managed Communication for Multi-GPU Systems) and two posters (Mantaro: Managed Communication for Multi-GPU Systems & GPU Mekong: Simplified Multi-GPU Programming using Automated Partitioning) accepted at NVIDIA's GTC 2017
- 03/2017: Congrats to Benjamin for getting a best paper award for his IPDPS2017 contribution! http://www.ipdps.org/ipdps2017/2017_advance_program.html#thursday
- 02/2017: Invited talk at Technical University of Dresden: "Don't trust anyone over thirty: GPUs as general-purpose processors in their teenage decade"
- 02/2017: Submission deadline for special issue of CCPE journal on heterogeneous and unconventional cluster architectures and applications: http://www.hucaa-workshop.org/ccpe2017
- 02/2017: Paper on Message Passing Relaxations for SIMT processors accepted at IPDPS 2017. Congrats to Benjamin!
- 01/2017: Two papers accepted at HiPINEB 2017 workshop, in conjunction with HPCA 2017. Congrats to Felix, Steffen and former visiting researchers Francisco and Juan!
- 12/2016: Invited talk at SAP, Walldorf: "Growing up: GPUs as general-purpose processors in their teenage decade"
- 12/2016: Invited talk at University of Lübeck: "Life after Dennard and its implications: it’s about time for energy"
- 12/2016: Research project "DeepChip", a collaboration with Graz University of Technology, is funded and starting 12/2016!
- 11/2016: Invited guest lecture at Stanford on GP-GPU (hosted by Christos Kozyrakis & Heiner Litz)
- 05/2016: Invited talk at NVIDIA Research, Santa Clara: "Talkative GPUs: towards efficient communication in terms of performance, energy and usability"
- 05/2016: Invited talk at Technical University of Munich: "Life after Dennard and its implications: it’s about time for energy"
- 04/2016: Congratulations to Felix Zahn for his Carl-Zeiss Scholarship!
- 02/2016: Hosting a workshop on "Energy-Proportional Networks" at the Institute of Computer Engineering
- 01/2016: Invited talk at Engineering Mathematics and Computing Lab: "Life after Dennard and its implications: it’s about time for energy"
- 10/2015: Invited talk at Computer Vision Forum Heidelberg: "Energy-Efficient Computations on Heterogeneous Processor Architectures" http://www.bv-forum.de
- 09/2015: Invited talk at HiPEAC Computing Week: "Unexplored energy aspects of scalable heterogeneous computing systems" http://www.hipeac.net/csw/2015/milano
- 09/2015: HUCAA workshop in conjunction with IEEE CLUSTER, Chicago, IL
- 07/2015: An interview with HPCWire about the upcoming session on On-chip and Off-chip Interconnection Networks for Future HPC Systems at ISC2015
- 03/2015: Congrats to Benjamin (Klenk) for getting his paper "Analyzing Communication Models for Distributed Thread-Collaborative Processors in Terms of Energy and Time" accepted at ISPASS2015, and in addition for receiving an ISPASS student travel grant!
- 03/2015: Congrats to Javier (Pradesa), external collaborator, for getting his paper "On the Design of a New Dynamic Credit-Based End-to-End Flow-Control Mechanism for HPC Clusters" accepted at PARCO Journal!
- 03/2015: Call for papers for HUCAA2015 workshop is announced!
- 03/2015: Invited visiting professor at Technical University of Graz, Austria
- 02/2015: Guest editor of a special issue of Wiley's Concurrency and Computation: Practice and Experience (CCPE)
Main Research Projects
- Mantaro is an adaptive communication architecture that focuses on data movement optimizations for heterogeneous environments with specialized ISAs. It is aware of heterogeneity, specialized execution models and non-uniform memory hierarchies. It aims to adapt automatically during execution to provide optimal communication paths for multi-GPU systems, so that it can optimally support various workloads, including emerging ones with highly irregular, unstructured and dynamically changing communication patterns. Mantaro builds on top of specialized communication models like GGAS (among others) and combines them behind suitable user-level communication abstractions. Since 2014.
- Mekong (formerly GCUDA): the main objective of (GPU) Mekong is to provide a simplified path to scale out the execution of GPU programs from one GPU to almost any number, independent of whether the GPUs are located within one host or distributed at the cloud or cluster level. Unlike existing solutions, this work proposes to maintain the GPU’s native programming model, which relies on a bulk-synchronous, thread-collective execution; that is, no hybrid solutions like OpenCL/CUDA programs combined with message passing are required. As a result, we can maintain the simplicity and efficiency of GPU computing in the scale-out case, together with high productivity and performance. GCUDA received funding from Google in the form of a research award. Mekong has just been granted additional BMBF funding. Since 2014. Read more: http://sites.google.com/site/gpumekong
- Deep learning on resource-constrained systems (DeepChip): many processes require evaluation of complex numerical functions close to the machine or structure of interest, to avoid the effort of data transfer or to enable short reaction times. Although the computing performance of embedded platforms is increasing, it is often significantly lower than the requirements of state-of-the-art algorithms. With the advent of Deep Neural Networks (DNN), the achievable classification performance has been pushed to new levels. The high cost of execution, however, renders them unusable for many real-world applications. A possible approach is the use of hybrid processors (ARM+FPGA or similar), but this raises the question of how to auto-generate optimized DNN classifier implementations. In the DeepChip project, we tackle this problem by optimizing deep models in terms of sparsity, asynchrony and reduced precision, and by extending machine learning languages with a hybrid back-end that is responsible for HDL code generation, automated partitioning and integration. This research line was initiated during the research stay as visiting professor at TU Graz. A DACH project has been funded and started in December 2016; it is a collaboration with the group of Franz Pernkopf from Technical University of Graz and Manfred Mücke from Materials Center Leoben. Since 2015.
- Integrated Power Models: a fundamental understanding of power consumption is essential to design and operate computing systems. Interconnection networks in particular are a neglected topic in the area of power modeling. In this project we collaborate with colleagues from the University of Castilla-La Mancha (Spain) to explore such aspects using simulations and to derive suitable models that help to understand power and energy consumption and to drive optimizations. An integrated power and performance model for scalable, heterogeneous computing clusters, which covers processors, memory and network, enables improved predictability of power consumption and a characterization of data movement costs in terms of energy and time. Since 2015.
- Graphite: while common approaches try to optimally support graph computations by dedicated software stacks (e.g. graph database management systems), in this work we explore how existing columnar databases can be extended to optimally support graph queries. Direct advantages include reduced data movement; in addition, other aspects like attributes, updates, concurrency and NUMA effects can be addressed much better. This is a joint project with SAP and the Innovation Lab (Berlin). Since 2015.
- Data acquisition for high-energy physics experiments: for the ATLAS high-energy physics experiment at CERN we are contributing to the data acquisition system, in particular the data collection manager. In this project, a commodity Ethernet network is used and upper-level software layers like the data collection manager guarantee minimal collection latencies by traffic shaping techniques. In addition, a complete data-flow messaging library is designed and optimized for this special application. This is a collaboration with colleagues from CERN and University of Castilla-La Mancha, Spain, and is currently being extended by system modeling and data compression techniques. Since 2013.
We gratefully acknowledge the generous support that we are receiving. Current sponsors include BMBF, DFG, Google, NVIDIA, Xilinx, Micron, HiPEAC, Carl-Zeiss Stiftung, and the German Excellence Initiative.
- JProf. Dr. Holger Fröning
Associate Professor for Computer Engineering (Juniorprofessur für Technische Informatik)
Office: B6,29, Mannheim, Room B2.21
Office hours: by appointment
- Benjamin Klenk - B2.22 - +49-621-181-2656
- Alexander Matz - B2.22 - +49-621-181-2656
- Felix Zahn - B2.20 - +49-621-181-2696
- Günther Schindler - B2.20 - +49-621-181-2696
- Lorenz Braun - B2.20 - +49-621-181-2696
External PhD Students
- Matthias Hauck, SAP
- Alejandro Santos, CERN, Geneva, Switzerland
Research Assistants and Graduate Students - Room B2.18
- Steffen Lammel
- Klaus Neumann
- Julian Schwing
- Andreas Melzer
- Dominik Michels
- Dennis Rieber
- Himanshu Tiwari
- Armin Schaeffer
- Kazem Shekofteh, PhD student at Ferdowsi University of Mashhad, Mashhad, Iran, 11/2016-05/2017
Internships and research stays of group members
- Holger Fröning, Nvidia Research, Santa Clara, CA, US, 05-10/2016
- Benjamin Klenk, Nvidia Research, Santa Clara, CA, US, 02-07/2016
- Felix Zahn, University of Castilla-La Mancha, Spain, 10/2015
- Holger Fröning, Technical University of Graz, Austria, 03/2015
- Benjamin Klenk, Nvidia Research, Santa Clara, CA, US, 03-06/2015
- Alexander Matz, Intel Labs, Hillsboro, OR, US, 01-05/2015
Former members & visitors (date of leave/visit)
- Artur Kühlwein (MSc student, 2017)
- Dominik Sterk (MSc student, 2016)
- Christoph Klein (MSc student, 2016), first appointment as PhD student at Heidelberg University
- Daniel Schlegel (MSc student, 2016), first appointment at EVS Broadcast Equipment
- Benjamin Baumann (MSc student, 2016)
- Eugen Rusakov (MSc student, 2016), first appointment as PhD student at TU Dortmund
- Julian Romera (MSc student, 2016)
- Lena Oden (PhD, defended 04/2015), first appointment as Postdoc at Argonne National Labs, Chicago, IL, US
- Francisco Andujar (visiting PhD student, Universidad Castilla-La Mancha, Spain, 2014)
- Pedro Garcia (visiting scientist, Universidad Castilla-La Mancha, Spain, 2013)
- Hector Montaner (PhD student, defended 04/2013 with highest honors, Technical University of Valencia, Spain, 2013)
- Jesus Escudero Sahuquillo (visiting scientist, Universidad Castilla-La Mancha, Spain, 2012)
- Manuel Dewald (MSc student, 2013), first appointment at SAP
- Elena Kuss (MSc student, 2012), first appointment as PhD student at University of Mannheim