In February 2018, Russian nuclear scientists at the Federal Nuclear Centre were arrested for using the facility's supercomputer resources to mine the cryptocurrency Bitcoin.
Historically, HPC security breaches like this have been few and far between; however, recent trends are increasing the vulnerabilities and threats faced by HPC systems.
Compute clusters have long enjoyed a degree of security through obscurity thanks to their idiosyncratic architectures: unusual CPU architectures and interconnects on the hardware side, and often home-grown applications running on Unix-like operating systems on the software side. In addition, the reward for compromising a cluster wasn't all that great. Data generated by atomic weapons research or pharmaceutical modelling is a valuable target, but the output of meteorological institutes, astrophysics laboratories and other mathematical research is far less so.
Increased risks
But all of this is changing. Most clusters no longer use obscure processors like IBM POWER or Oracle SPARC (93.2% of systems in the latest Top500 list use the x86 architecture), and according to the same list the share using Ethernet as their primary interconnect passed 50% in November 2018[1]. Linux is no longer a niche operating system that isn't worth attacking, and the number of reported vulnerabilities has risen accordingly. In 2018, more vulnerabilities were reported in Red Hat products than in Microsoft products.
Figure 1: Total Number Of Vulnerabilities Of Top 50 Products By Vendor[2]
This is not evidence that Red Hat is any less secure than Windows, but it does show that more effort is being put into compromising Linux security.
The rewards for security breaches are also rising. Companies are increasingly using HPC clusters for Computer-Aided Design (CAD) and Computational Fluid Dynamics (CFD), the results of which are crucial to their competitive advantage. If a competitor were to acquire the designs for a new vehicle chassis or mobile phone, the damage could be enormous, so improving the security of these systems should be of paramount importance to commercial enterprises, as it already is to many.
Traditional approach to security
Traditionally, the approach to securing a cluster has been to adopt a Science DMZ[3]: a computer subnetwork structured to be secure, but without the performance limits that would otherwise result from passing data through a firewall. The cluster is placed in its own logically separate location with its own storage and networking, and is configured with a single access point to the wider network, making access easy to control.
For larger configurations, data transfer nodes (DTNs) are used to move input and output files to and from the system as required. This protects the network performance of the cluster from traditional security measures such as firewalls and inline intrusion detection systems. The Science DMZ was designed for research institutions that encourage collaboration and external use of their resources, so its blueprint also includes a WAN connection to the cluster. For enterprises or other institutions looking for greater security, this can be removed at the expense of restricting access to users within the trusted area of the organisation's network.
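To make the single-access-point idea concrete, the border rules for such an enclave might look something like the sketch below. This is only an illustration using iptables on a gateway; the subnets, host addresses and GridFTP port range are assumptions for the example, not values from any particular deployment.

    # Sketch of border rules for a Science-DMZ-style enclave (addresses are illustrative):
    #   10.10.0.0/16 - trusted campus/corporate network
    #   10.20.0.10   - cluster login node
    #   10.20.0.20   - data transfer node (DTN)
    iptables -A FORWARD -p tcp --dport 22 -s 10.10.0.0/16 -d 10.20.0.10 -j ACCEPT    # SSH to the login node from the trusted subnet only
    iptables -A FORWARD -p tcp --dport 2811 -d 10.20.0.20 -j ACCEPT                  # GridFTP control channel to the DTN
    iptables -A FORWARD -p tcp --dport 50000:51000 -d 10.20.0.20 -j ACCEPT           # GridFTP data channels (assumed port range)
    iptables -A FORWARD -m state --state ESTABLISHED,RELATED -j ACCEPT               # allow return traffic
    iptables -P FORWARD DROP                                                         # default deny everything else

The point is not the specific tool but the shape of the policy: high-volume transfers go through the DTN with simple, stateless-friendly rules, while everything else is denied by default.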
Emergence of the cloud in HPC
Unfortunately, the hole in the Science DMZ approach appears when you want to enable some form of cloud bursting. This is a growing trend in HPC[4] and, when bursting to a public cloud provider, requires some form of connection between the cluster and the public Internet. That works fine if your cluster is already open for 'collaboration', but less so if you are trying to keep it all to yourself.
Cloud providers are very keen to reassure users of the absolute security of their platforms, but as the recent Cloud Hopper[5] revelations have shown, some clouds are more secure than others. A potential mitigation is to configure your scheduler with separate queues and keep sensitive workloads on premises, but even that still requires at least a public route to the cluster manager.
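To illustrate the separate-queue idea, here is a minimal sketch of what it might look like in Slurm, one common HPC scheduler. The node names, counts and resume/suspend scripts are assumptions for the example; the general pattern is an on-premise partition for sensitive work and a cloud partition whose nodes are created on demand.

    # slurm.conf fragment (illustrative only)
    NodeName=onprem[001-064] CPUs=32 State=UNKNOWN       # fixed on-premise nodes
    NodeName=cloud[001-032]  CPUs=32 State=CLOUD         # nodes provisioned in the public cloud on demand
    PartitionName=secure Nodes=onprem[001-064] Default=YES MaxTime=INFINITE State=UP
    PartitionName=burst  Nodes=cloud[001-032]  Default=NO  MaxTime=24:00:00 State=UP
    # Power-saving hooks that create and destroy the cloud instances (hypothetical scripts)
    ResumeProgram=/opt/cluster/bin/cloud_resume.sh
    SuspendProgram=/opt/cluster/bin/cloud_suspend.sh
    SuspendTime=600

Sensitive jobs are then submitted with, for example, "sbatch --partition=secure job.sh" and never leave the on-premise hardware, while less sensitive work can be pointed at the burst partition.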
However, it's worth highlighting that we shouldn't distrust the public cloud. The US Department of Defense is looking to start using it (if its JEDI project ever reaches a conclusion[6]), and because of the level of trust required and how damaging a breach would be to a cloud provider's reputation, providers take pains to ensure their platforms are as secure as possible. A cloud provider's security ends at the edge of its cloud, though, so the key to a secure hybrid cloud deployment is the correct configuration of the link between your trusted network and the cloud provider's.
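In practice that link is usually a site-to-site VPN or a dedicated interconnect offered by the provider. As one hedged illustration of the principle, a WireGuard tunnel on the on-premise gateway might look like the following; the addresses, endpoint name and routed prefix are assumptions, and provider-managed IPsec gateways or private circuits are equally valid choices.

    # /etc/wireguard/wg0.conf on the on-premise gateway (illustrative values)
    [Interface]
    Address = 172.31.255.1/30                 # tunnel address (assumed)
    ListenPort = 51820
    PrivateKey = <on-prem-private-key>

    [Peer]
    PublicKey = <cloud-gateway-public-key>
    Endpoint = vpn.example-cloud.net:51820    # assumed cloud-side endpoint
    AllowedIPs = 10.30.0.0/16                 # only the cloud cluster subnet is routed over the tunnel
    PersistentKeepalive = 25

The detail that matters is the last one: route only the cloud cluster's subnet over the tunnel, so a compromise on the cloud side does not automatically expose the rest of your trusted network.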
An organisation needs to ask itself the following questions. Is all-out performance the key, and can data confidentiality, integrity and availability be sacrificed? Do you need your results to remain completely secure, and are you willing to add some wall time to your jobs to achieve that? Or can you not reach the performance you need within your budget on premises, meaning it is time to consider a hybrid cloud deployment?
The important step is to assess your organisation's risk appetite for the compute cluster and balance it against the fundamental reality of cost. The tide is turning in HPC security, with increased vulnerabilities in operating systems and the emergence of cloud bursting, so making the right choice is essential to protect an organisation's investment in HPC.
If you are currently agonising over any of these concerns, then please reach out to OCF for our advice.
[1] https://www.top500.org/statistics/list/
[2] https://www.cvedetails.com/top-50-products.php?year=2018
[3] https://www.researchgate.net/publication/307843116_The_Science_DMZ_A_Network_Design_Pattern_for_Data-Intensive_Science
[4] https://www.ocf.co.uk/blog/cloud-bursting-part-1/
[5] https://www.theregister.co.uk/2019/06/26/china_apt10_hpe_ibm_dxc_hacked/
[6] https://www.datacenterdynamics.com/news/oracle-loses-dod-court-challenge-10bn-pentagon-cloud-contract-go-aws-or-microsoft/