CHIME

Project Description

Overview

CHIME (Canadian Hydrogen Intensity Mapping Experiment) is a collaboration between the University of Toronto, McGill University, the University of British Columbia, and the Dominion Radio Astrophysical Observatory.  It is an innovative telescope whose unique half-cylinder reflector design and very wide frequency range make it ideally suited to map the universe in record time, detect fast radio bursts (FRBs), and monitor pulsars.  The project leaders took advantage of emerging consumer technology to reduce costs and maximize computing power in the limited space available.  The site, in a protected area of British Columbia, was chosen to minimize interference from man-made radio emissions.  For the same reason, the 256 compute nodes that process the massive flow of frequency data had to be housed in two specially shielded shipping containers not far from the antenna arrays.  This supercomputer, the X-engine, spatially correlates 130 billion bits of frequency data per second in real time.

Design Challenges

These location and design constraints introduced several challenges that off-the-shelf computer systems could not meet.  Because the containers had to be sealed against EMI leakage, traditional cooling with outside air could not be used.  Instead, each of the 1,024 high-end GPUs and 256 CPUs is liquid cooled.  Each rack of ten compute nodes has its own liquid loop pump and manifolds, ultimately exchanging heat with the ambient air outside.  Besides the additional liquid cooling components, each node had to house four GPUs, a quad-port 10GbE network card, and a 1600 W Titanium-efficiency power supply.  The chassis depth was also limited by the space available within the shielded containers, so that each node could still be accessed individually for repairs.
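The fleet figures above are internally consistent: 256 nodes with four GPUs each gives the 1,024 GPUs quoted.  The minimal Python sketch below reproduces that arithmetic.  It assumes one CPU per node (inferred from 256 CPUs across 256 nodes), treats the rack count as a lower bound assuming no rack holds more than ten nodes, and reports only the sum of PSU ratings, which is an upper bound on capacity and not a measured load.  The names NodeSpec and fleet_totals are hypothetical, chosen for illustration.

```python
# Hypothetical sanity check of the CHIME X-engine fleet figures quoted above.
# All inputs come from the text; per-node CPU count and rack sizing are assumptions.
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeSpec:
    gpus: int = 4          # four GPUs per node
    cpus: int = 1          # assumed: 256 CPUs spread across 256 nodes
    psu_watts: int = 1600  # rated output of each node's power supply


def fleet_totals(nodes: int, spec: NodeSpec, nodes_per_rack: int = 10) -> dict:
    """Aggregate component counts for a fleet of identical nodes."""
    return {
        "gpus": nodes * spec.gpus,
        "cpus": nodes * spec.cpus,
        # Fewest racks possible if no rack holds more than nodes_per_rack nodes.
        "racks_min": math.ceil(nodes / nodes_per_rack),
        # Sum of PSU ratings in kW: an upper bound, not measured power draw.
        "psu_capacity_kw": nodes * spec.psu_watts / 1000,
    }


totals = fleet_totals(256, NodeSpec())
print(totals)  # {'gpus': 1024, 'cpus': 256, 'racks_min': 26, 'psu_capacity_kw': 409.6}
```

Keeping the per-node specification in one frozen record makes it easy to rerun the same check for a different node count without changing the arithmetic.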

The CHIME project leaders approached General Technics with this design challenge and, with our assistance, met their goals with enormous success.  The CHIME telescope continues to make remarkable scientific discoveries and has since been expanded with another 128 compute nodes dedicated to detecting fast radio bursts.  All of us at General Technics are proud to have contributed to this achievement.

GT's Role

Design of Custom Chassis

The CHIME project required a custom 4U rackmount ATX chassis with eight expansion slots and a full front-panel intake grille.  The chassis also needed to be quite short (no more than 15 inches deep) so that it could be removed within the confined space of the container.  Another unique feature, unlikely to be found on any other computer chassis, is its drainage spout, which contains and redirects leaked coolant to minimize damage to other nodes.  The power supply and system board are also raised 3/4 inch above the chassis floor so that they sit clear of any coolant that pools there in the rare case of a leak.

Although the critical components are liquid cooled, the circulating coolant is neither actively refrigerated nor cooled by air passing through the chassis, as it would be in a typical gaming rig.  Instead, the coolant is cooled passively outside the container, while the remaining components, such as the system board, memory, network card, and power supply, are still air cooled.  Because the containers are sealed, that internal air cannot be exchanged with fresh outside air; instead it recirculates and is cooled by a radiator and three large 120 mm fans at the front intake panel of each chassis.  To ease installation and removal, each chassis is mounted on ball-bearing slide rails, and the coolant hoses terminate in quick-disconnect fittings on the rear panel.

System Design

Testing showed that liquid cooling was more effective and less expensive than a chilled-air solution, and the lower operating temperatures left headroom for overclocking and extended component life.  For cost, availability, and customizability, the early designs centered on high-end consumer GPUs, water blocks, and tubing.  As shown in the renderings we generated, these water blocks allowed four AMD R9 Fury X cards (normally dual-slot in width) to fit alongside the network card.  An ASRock X99 OC Formula enthusiast-class board was chosen for its full 40-lane PCIe support, its ability to route additional power to the extra GPUs, and its conformal coating, which adds an extra layer of protection against moisture.

GT also assisted the CHIME team with a cost-benefit analysis comparing consumer- and enthusiast-class parts and cooling with their server-class and professional counterparts.  We determined that server-class components and a professional cooling solution would be better suited to the project’s goals.  Their greater reliability, scalability, and robustness would extend component lifetime under 24/7 operation in harsh environmental conditions.

Bringing People Together

GT also played a role in connecting the CHIME project leaders with AMD.  We arranged a key meeting between the three teams at SC15 in Austin, Texas, to discuss the project.  This connection to the GPU manufacturer was extremely helpful in advancing the project’s goals.  AMD brought a lot to the table: it discussed the additional benefits of its server-class products, shared details about a then-unreleased dual-Fiji GPU card that would ultimately prove to be a better overall solution, and established a direct channel for sourcing the large quantity of cards the project required, along with cross-promotional opportunities to offset their cost.  The meeting also gave the CHIME team access to custom BIOS and drivers that unlocked features of the GPU and optimized it for their code.  When the first set of prototypes was built, our relationship with AMD helped us secure advance samples of those dual-GPU cards for testing months ahead of the card’s official release date.

While at SC15, GT and the CHIME team also met with CoolIT Systems, a manufacturer of enterprise-class liquid cooling systems.  There, CoolIT demonstrated the ability of its DCLC product to cool a full rack of servers more efficiently than traditional HVAC methods.  GT went on to collaborate with CoolIT Systems to design and build the cooling systems used in the CHIME test nodes.

Fulfillment and Project Expansion

After the initial prototype order, GT provided the fully integrated test nodes for the project.  We later fabricated and delivered 266 of the custom chassis we designed.  These were installed on site at the telescope, where they now house the largest interferometric X-engine correlator of its kind in the world.

Later, when the CHIME project was expanded to include research into fast radio bursts, we modified the design and built the chassis for an additional 128 compute nodes.  These are housed in a third sealed container on the site.  The FRB team has since made significant contributions to astrophysics, including the discovery of the second repeating FRB ever found.

As a company, General Technics was thrilled to work with the University of Toronto and McGill University on this project.  It aligns directly with our goal of delivering the latest available technology for the advancement of science and education, a goal that is not bound by any political border.  It has given us a great opportunity to demonstrate our skills in design and technical support, and in connecting our customers with our manufacturers.  To us, CHIME represents the kind of project that will lead our business in the direction we most want it to go.

Chassis Features

  • Eight expansion slots (non-standard for ATX) to support more double-width cards
  • Liquid cooled with a unique drainage spout
  • Raised mounting of the power supply and system board
  • Large triple-fan radiator on the intake with a full front-panel grille
  • Extra-short chassis depth and a quick-release lid for easy access