PerfBound: Conserving Energy with Bounded Overheads in On/Off-Based HPC Interconnects
Energy and power are key challenges in high-performance computing. System energy efficiency must be significantly improved, and this requires greater efficiency in all subcomponents. An important target of optimization is the interconnect, since network links are always on, consuming power even during idle periods. A large number of HPC machines have a primary interconnect based on Ethernet (about 40 percent of TOP500 machines), which, since 2010, has included support for saving power via Energy Efficient Ethernet (EEE). Nevertheless, HPC interconnects are unlikely to use these energy-saving modes unless the performance overhead is known and small. This paper presents PerfBound, a self-contained technique to manage on/off-based networks such as EEE, minimizing interconnect link energy consumption subject to a bound on the performance degradation. PerfBound requires no changes to applications, uses only local information already available at switches and NICs, introduces no additional communication messages, and is compatible with multi-hop networks. PerfBound is evaluated using traces from a production supercomputer. For twelve out of fourteen applications, PerfBound achieves high energy savings, up to 70 percent for only 1 percent performance degradation. This paper also presents DynamicFastwake, which extends PerfBound to exploit multiple low-power states.
DynamicFastwake achieves an energy-delay product 10 percent lower than the original PerfBound technique.
This research was supported by the European Union's 7th Framework Programme [FP7/2007-2013] under the Mont-Blanc-3 (FP7-ICT-671697) and EUROSERVER (FP7-ICT-610456) projects, the Ministry of Economy and Competitiveness of Spain (TIN2012-34557 and TIN2015-65316), Generalitat de Catalunya (FI-AGAUR 2012 FI B 00644, 2014-SGR-1051 and 2014-SGR-1272), the European Union's Horizon2020 research and innovation programme under the HiPEAC-3 Network of Excellence (ICT-287759), and the Severo Ochoa Program (SEV-2011-00067) of the Spanish Government.
Peer Reviewed
Postprint (author's final draft)
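As a rough illustration of how the abstract's figures combine, the sketch below computes an energy-delay product (EDP) relative to an always-on baseline. It assumes EDP is simply energy multiplied by execution time and that the "up to 70 percent" saving and the 1 percent degradation apply to the same run; the function name `relative_edp` and its parameters are hypothetical and do not come from the paper.

```python
def relative_edp(energy_saving: float, slowdown: float) -> float:
    """Return EDP relative to an always-on baseline whose EDP is 1.0.

    energy_saving: fraction of link energy saved (0.70 means 70 percent).
    slowdown: fractional increase in execution time (0.01 means 1 percent).
    Assumption: EDP = energy * delay, both normalised to the baseline.
    """
    if not 0.0 <= energy_saving <= 1.0:
        raise ValueError("energy_saving must be in [0, 1]")
    if slowdown < 0.0:
        raise ValueError("slowdown must be non-negative")
    relative_energy = 1.0 - energy_saving
    relative_delay = 1.0 + slowdown
    return relative_energy * relative_delay


if __name__ == "__main__":
    # Best case quoted in the abstract: 70 percent saving, 1 percent slowdown.
    print(f"{relative_edp(0.70, 0.01):.3f}")
```

Under these assumptions the best case corresponds to a link EDP of roughly 0.3 times the always-on baseline; actual values depend on the application and on how the paper defines the metric.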