Academic and government research centers are, to date, the exclusive domains of high-performance computing (HPC) because of supercomputers’ ability to perform sophisticated mathematical modeling. Opportunities for research are expanding, as is evidenced by the advent of "P4" (predictive, preventive, personalized, participatory) medicine. P4 medicine will require HPC analysis of each individual’s genome, bringing about a sea change in medicine, including patient care and the application of medical technology. For example, the Ohio State University (OSU) Medical Center is engaged in researching and applying P4 medicine.
HPCs were introduced in 1976, when the first Cray-1 supercomputer was installed at Los Alamos National Laboratory. Designed by Seymour Cray, regarded as the "father of supercomputing," the Cray-1 ran at 160 megaFLOPS, or 160 million floating-point operations per second (FLOPS). Last year, Cray Inc. installed the world’s fastest supercomputer at Oak Ridge National Laboratory (ORNL). Named Jaguar, the XT5 system has a speed of 1.8 PetaFLOPs, or 1,800,000,000,000,000 floating-point operations per second. This surpassed the IBM "Roadrunner" system, installed in 2008, which was the first computer to pass the PetaFLOP barrier.
Academic institutions normally do not house the fastest HPCs; these generally are housed in government research facilities, such as ORNL. Nevertheless, according to a www.top500.org list of the 500 fastest HPCs in the world, 77 sites are at academic institutions. The University of Tennessee, No. 4, has a Cray XT5 system with a speed of 831 TeraFLOPs, which is housed in the same data center at ORNL that houses the Jaguar system. The University of Alaska, No. 435, uses a Cray XT5 platform with a speed of 26.31 TeraFLOPs.
A Dramatic Difference
Requests for increased processing capability come from various academic programs, each with requirements for the size or scale of the system. Pinning down these requirements is essential before design can begin. The planning team must consider the fact that an HPC facility will require significant increases in power and cooling capacity compared with a typical data center. These massively parallel networks of specialized servers have a load density between 700 and 1,650 watts per square foot (W/sf), while most data centers have a load density in the range of 100 to 225 W/sf.
To deal effectively with these issues, a university’s HPC planning team should include representatives from the academic programs that will use the HPC facility, as well as representatives from facilities and IT. There also likely will be representatives from the university’s architectural planning committee, especially if the site is to be on a campus that requires a traditional architectural design. The university’s project manager should be a person with decision-making authority, knowledge of the budget, and familiarity with design and construction disciplines.
Academic institutions typically hire an architect to form a project A/E team. However, in technical projects of this nature, the architecture must respond to the technical requirements of the space. Moreover, in the case of an HPC facility, the electrical and mechanical trades represent roughly 75 to 80 percent of the construction cost. Therefore, it is advantageous to hire an engineer as the lead member of an E/A team with responsibility for subcontracting the architect. In particular, the E/A team should have experience designing HPC spaces, which narrows the field significantly. The E/A team will include civil and structural engineers, and IT design firms. If the site is to be LEED-certified, the team will need a LEED consultant, as well as a commissioning agent.
Technical Requirements Drive Design
The size and capacity of any HPC facility must respond to the technical requirements of the system above all else. These can be identified by working with the HPC system vendor. In particular, the power and cooling requirements are likely to exceed those of any other facility on campus; therefore, an institution must plan for a much larger infrastructure area than for any other facility. Power densities of this nature drive the size of the infrastructure areas, and it is not uncommon for those areas to be significantly larger than the HPC space itself. In fact, they can be as much as four to six times the size of the HPC space, depending on the required capacity and redundancy of support systems.
For example, a hypothetical entry-level HPC is a 24.2 TeraFLOP (24.2 trillion FLOPS) system composed of two Cray XT6 cabinets, requiring roughly 70 square feet of raised floor. This system will require 90 kW of power, and will reject 307,000 BTUs/hr (26 tons) of heat to water. A robust system, a 700-TeraFLOP (700 trillion FLOPS) computer cluster using the same Cray XT6 platform, will fill two rows of 20 cabinets each. At an average of 35 square feet per cabinet, these 40 cabinets will require 1,400 square feet of raised floor space. This system will require 1,800 kW of power, and will reject 6.14 million BTUs/hr (512 tons) of heat to water.
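The figures in this example can be checked with a short calculation. The Python sketch below is illustrative only: it assumes that all electrical power drawn by the cabinets ends up as heat, and it uses the standard conversion factors of roughly 3,412 BTU/hr per kW and 12,000 BTU/hr per ton of cooling, neither of which is stated in the article. The function and variable names are hypothetical.

```python
# Conversion factors: standard engineering values, assumed here
# (the article quotes only the resulting BTU/hr and tonnage).
BTU_PER_HR_PER_KW = 3412.14    # 1 kW is about 3,412 BTU/hr
BTU_PER_HR_PER_TON = 12000.0   # 1 ton of cooling = 12,000 BTU/hr

SF_PER_CABINET = 35.0          # average raised-floor area per cabinet (from the text)


def heat_rejection(power_kw):
    """Return (BTU/hr, tons), assuming all input power becomes heat."""
    btu_per_hr = power_kw * BTU_PER_HR_PER_KW
    return btu_per_hr, btu_per_hr / BTU_PER_HR_PER_TON


def load_density_w_per_sf(power_kw, floor_area_sf):
    """Return electrical load density in watts per square foot."""
    return power_kw * 1000.0 / floor_area_sf


# The two example systems described in the text.
systems = {
    "entry-level": {"power_kw": 90.0, "cabinets": 2},
    "robust": {"power_kw": 1800.0, "cabinets": 40},
}

for name, spec in systems.items():
    area_sf = spec["cabinets"] * SF_PER_CABINET
    btu_per_hr, tons = heat_rejection(spec["power_kw"])
    density = load_density_w_per_sf(spec["power_kw"], area_sf)
    print(f"{name}: {area_sf:.0f} sf, {btu_per_hr:,.0f} BTU/hr, "
          f"{tons:.0f} tons, {density:.0f} W/sf")
```

Under these assumptions, both systems work out to about 1,290 W/sf, which falls inside the 700 to 1,650 W/sf range cited for HPC spaces.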
The spaces quoted above are for the computational elements only; data storage devices that can keep pace with the HPC’s high-speed communications also will be required. It is not unusual for the data storage element to match, or exceed, the size of the computational element. The good news: The power density for data storage is much lower than for the HPC.
An important issue is the expected availability of the system: Will it be available strictly during business hours, or 24x7, 365 days a year? Will it be used strictly to support the institution’s own academic research, or will the institution lease time to local businesses, other academic institutions or third-party research entities? Users’ requirements for availability will play a large role in decisions about the redundancy levels of support systems.
The Challenge of Siting
Choosing a site for an HPC facility is challenging. Although a planning team’s first thought might be to place this equipment in an existing data center space, it is more appropriate to have a dedicated space built with this type of equipment in mind because of the structural, power and cooling requirements.
Many factors affect site selection. An appropriate site for an HPC is close to primary power lines and fiber-optic telecommunication networks, with allowance for any setback or landscaping requirement. The site also needs to be large enough to accommodate what typically is a single-story structure. A prominent site may be advantageous if, for example, the building will carry the name of a major donor.
Adjacency requirements may trump all other issues; for example, if a facility will be used to support P4 translational medicine, it may need to be attached to the medical center.
In any case, adequate space will have to be provided for construction staging, equipment storage and construction trailers.
Today’s computers generally take power at 120 volts, but HPCs require power at 480 volts. Each cabinet requires a 100-amp, 3-phase, 480-volt feed. The electrical load of an HPC may seem daunting, but it may force a university to rethink how it handles data facilities. In fact, an HPC center could prompt a university to improve energy efficiency.
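To put the per-cabinet feed in perspective, the apparent power available from a 3-phase circuit is the square root of 3 times the line-to-line voltage times the current. The Python sketch below applies that standard formula to the 100-amp, 480-volt feed and compares it with the 45 kW per cabinet implied by the entry-level example (90 kW across two cabinets). The unity power factor is an assumption made for illustration, not a vendor figure, and the function names are hypothetical.

```python
import math


def three_phase_kva(volts_line_to_line, amps):
    """Apparent power of a balanced 3-phase feed, in kVA."""
    return math.sqrt(3) * volts_line_to_line * amps / 1000.0


def line_current_amps(load_kw, volts_line_to_line, power_factor=1.0):
    """Line current drawn by a balanced 3-phase load.

    power_factor=1.0 is an assumption for illustration only.
    """
    return load_kw * 1000.0 / (math.sqrt(3) * volts_line_to_line * power_factor)


feed_kva = three_phase_kva(480, 100)              # capacity of one cabinet feed
cabinet_kw = 90.0 / 2                             # entry-level example: 90 kW, 2 cabinets
cabinet_amps = line_current_amps(cabinet_kw, 480)

print(f"Feed capacity: {feed_kva:.1f} kVA")
print(f"Cabinet load: {cabinet_kw:.0f} kW, about {cabinet_amps:.0f} A per phase")
```

Under these assumptions, each feed can carry roughly 83 kVA, while a 45 kW cabinet draws on the order of 54 amps per phase. Actual feeder sizing depends on the vendor's specifications and the applicable electrical code.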
Because 7x24 uptime generally is not a requirement for academic research, it may not be essential to provide redundant power and an uninterruptible power supply (UPS) system. Typically, the data storage equipment is protected. However, if a university leases space on the HPC to external entities, the university may have to guarantee 7x24 uptime.
The telecommunications requirements of an HPC can be supported with a 10-gigabit Ethernet backbone, or a protocol called InfiniBand, which uses a switched fabric topology rather than a hierarchical switched network. Either system will provide a more robust communications network, and will require more fiber, than typically is deployed. These telecommunications protocols enable the many cabinets in an HPC to operate as a single computer.
Raising the Bar on Cooling
In the examples above, the entry-level HPC will reject 26 tons of heat, and the robust system will reject 512 tons of heat. The massive heat generated in a small area by these systems, and the fact that they reject heat to water, will require an upgrade to the conventional air-cooling system in the form of packaged chillers and chilled-water piping. Chillers are an expensive first-cost element, but they are more cost-efficient to operate than competing cooling technologies.
With a packaged chilled-water system removing the heat of the HPC, the net result will be a reduced heat load on the air-side system in the HPC space. Using chilled water also makes it possible to install a fluid cooler that would enable free cooling in the cooler seasons, depending on location.
Meanwhile, Cray and other manufacturers are experimenting with high-temperature HPC products, which also will reduce cooling requirements. For example, the Cray XT6 has a maximum inlet temperature of 32°C (89.6°F), and Cray is attempting to raise this to 50°C (122°F); as a result, this system may require little or no cooling.
Return on Investment
An HPC facility represents a major capital investment. It also represents an enormous potential return on investment in the form of faculty recruitment and funding for scientific research. An institution can consider a number of ways to manage the capital costs. For example, it might consider purchasing used HPC equipment from government entities, which typically turn over equipment every three years. It also might design the system and facilities so it can lease time to other research institutions or industrial corporations.
Some colleges and universities have taken another route, establishing or joining public/private supercomputing partnerships. For example, OSU Medical Center uses the supercomputing services of the Ohio Supercomputer Center (OSC) for its P4 research program. Another example of a successful partnership: the University of Tennessee leveraged its relationship as site manager of the HPC center at ORNL to house its Cray XT5 system there.
With more institutions looking to expand research capabilities, HPC facilities will become more prevalent. The process of building or expanding an HPC center requires a level of technical expertise not usually essential for a standard mission-critical facility. The challenge is to design systems that will be efficient and flexible well into the future.