LUMI, Europe’s most powerful supercomputer (lumi-supercomputer.eu)
118 points by Sami_Lehtinen on June 13, 2022 | 95 comments


>"In addition, the waste heat produced by LUMI will be utilised in the district heating network of Kajaani, which means that its overall carbon footprint is negative. The waste heat produced by LUMI will provide 20 percent of Kajaani’s annual demand for district heat"

Pretty cool honestly. Reminds me of the datacenter that Microsoft built in a harbor to cool with the surrounding seawater.


There are now several datacenters in Finland that link into local district heating.

Microsoft recently announced that they will build a similar data center in Finland too: https://www.fortum.com/media/2022/03/fortum-and-microsoft-an...


Their definition of negative carbon footprint is broken. Unless there is something in the computer that permanently binds carbon from the atmosphere.


They run the computer carbon neutral and then use the excess heat to replace carbon-derived heat. Hence, there is less carbon emission while the computer is running. Their definition is not broken, just based on comparison to the initial state (which emits CO2) and not to a zero-emission state. Like Google's supposed carbon negativity by reducing the emissions of other people's cow sheds.


Decreasing carbon emissions is not "carbon negative" in my world. A "carbon negative" machine would be something that e.g. plants trees.


Negative carbon emissions are those that remove carbon from the atmosphere. This project doesn’t remove carbon from the atmosphere, it’s preventing more carbon from being released into the atmosphere which is usually called carbon neutral.


Not when it is replacing gas heat, as it likely is!


That would make it neutral, not negative, and only if it were powered fully by renewables which were constructed and operated in a fully carbon-neutral way.


It could be that electrical district heating has a lower carbon footprint than gas heating (if the electricity is sufficiently low-carbon).

Still doesn't make it "carbon-negative," just kills two birds with one stone by using the same energy both for heating and computing.


I see what you're saying with this, but the comparison isn't a world where electricity is used for the heating, it's one where gas is (gas heating is almost always more efficient than electric heating, since you lose a lot of heat at the generator). So either the heat is radiated out into the atmosphere uselessly, and some amount of natural gas is burned, or the heat is used and 20% less natural gas is burned. That's a carbon footprint of -20%.


Cloud and Heat (https://www.cloudandheat.com/hardware/) offers liquid cooling systems that purport to supply waste hot water on the town/small-city scale.


Www.qscale.com recuperates heat for 100+ football fields of greenhouses in North America. And FYI, a properly designed liquid cooling facility is cheaper and more efficient than an air-cooled one. Hopefully everything will switch to liquid cooling, just like car engines are all liquid cooled!


I hope no datacenters these days are built on the idea of just running cooling with straight electricity (i.e. no cooling water) and shifting the heat straight out to the air (no waste heat recovery). Even in the late '90s that sounds like a poor design.


That's how all of Google's datacenters are built, in my understanding. Water cooling is very expensive compared to air cooling, and only used for their supercomputer-esque applications like TPU pods. I don't know about waste heat recovery, but I don't think they use that either.


"A case in point is our technologically advanced, first-of-its-kind cooling system that uses seawater from the Bay of Finland, which reduces energy use." https://www.google.com/about/datacenters/locations/hamina/


Depending on what you define as water cooling, Google most definitely uses water cooling in all their datacenters.

https://www.datacenterknowledge.com/archives/2012/10/17/how-...

https://arstechnica.com/tech-policy/2012/03/google-flushes-h...


You don't need to water-cool machines; you can use cold water to chill air.

Heating a community of houses rather than emitting into the air seems like it should be a requirement for building any power hungry industry.


Using the water directly is a lot better for your efficiency, because the computers are quite a bit hotter than the air that cools them, so you get higher-quality heat from there.


Yes, but that requires special gear I assume, which might drive up the cost. Just using cold water for a cheaper heat exchanger (than some pumped medium) seems like the simple solution if you have unlimited cool water.


There is also immersion cooling. The liquid is not water, and it seems to be pretty efficient:

https://submer.com/immersion-cooling/


Exceedingly efficient (PUEs of 1.0X; PUE is total facility power divided by IT power, so that means almost no overhead) vs cold-plate liquid cooled or air cooled. The tradeoff is that mineral oil is annoying (messy, especially if leaked, but even with routine maintenance) and fluorinated fluids are bad for the environment (high GWP, tend to evaporate) and crazy expensive. In either case, the fluids tend to have weird effects on plastics and other components, so you have to spend a good amount of time testing your components and ensuring that someone doesn't switch components on your motherboards without you knowing, lest it not play well.


At the low cost of a nitrogen blanket for fire safety, something like n-pentane should be suitable for immersion nucleate-boiling cooling, with the added benefit of not needing a heat exchanger between the immersion pool and a possible outside-air heat rejection condenser.

The solvent effects are relatively well understood for these medium-light aliphatic hydrocarbons, due care can ensure they aren't particularly toxic, and a nitrogen blanket for the rooms takes care of the inherent flammability issue associated with what's basically boiling gasoline on the CPU.


I don't know why you're being downvoted. Something like a third of GHG emissions come from heating; if you can extract the heat from a datacentre, you're displacing something like gas. It is something to hope for.


That's also in Finland. The district heating infrastructure is already in place, so if you're producing heat, it's not hard to push steam into a nearby pipe and make an easy PR statement about sustainability.


Though couldn't the district then save money by either reducing their own infra or eliminating it entirely?


The computer as a whole has an entry (#3) in the top500 list, and the CPU-only part of the computer has another entry (#84). The whole computer does about 150 PFlop/s, and the CPU-only part about 6 PFlop/s. So 96% of the computing power comes from the GPU cards.

https://www.top500.org/lists/top500/list/2022/06/


This is impressive, but not exceedingly so.

A single NVIDIA H100 GPU gets 1 PFlop/s [1] for single float (32 bit) precision. There's an asterisk which says

   Shown with sparsity. Specifications are one-half lower without sparsity
I have to admit that I don't understand what this means, but the apparent takeaway is that the "non-sparse" speed is 0.5 PFlop/s.

So this supercomputer has a speed equal to about 300 H100 GPUs. Not bad, but not something where one would start worrying about renewable electricity or not.

[1] https://www.nvidia.com/en-us/data-center/h100/


TF32 is not IEEE-754 float32; it is a reduced-precision format designed for machine learning use cases. The correct spec-sheet number for FP32 (and FP64, which is the relevant precision here) throughput on H100 is more like 60 TFLOP/s, so your number is off by roughly an order of magnitude.
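(To fill in the parent's sparsity question: the asterisk refers to NVIDIA's 2:4 structured sparsity, where the hardware skips computation on half the values when they are zero, doubling the quoted throughput. And redoing the arithmetic with the ~60 TFLOP/s dense figure: 150 PFlop/s ÷ 0.06 PFlop/s ≈ 2,500 H100s, not ~300.)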


> LUMI gets all of its electricity from 100 percent renewable hydroelectric power.

100% pure greenwashing bullshit at a guess. Power usage has gone up in Finland, and that extra power does not come from hydroelectricity, because all[1] the hydroelectric power is already used domestically. Instead the extra power usually comes from non-renewable sources.

I am guessing they signed an agreement with a hydroelectric producer to buy “hydroelectric” electricity, fungibly delivered over the transmission network. Even if they got the electricity delivered directly from hydro, that would leave a network shortfall to be filled by non-renewables.

Avoiding aircon is good, and the heat reuse is good since it presumably reduces non-renewable resource use by domestic heating.

[1] https://www.stat.fi/til/salatuo/2020/salatuo_2020_2021-11-02...


If every large producer insists on hydro/green energy, the price of hydro will go up encouraging investment in more green energy.

It's like arguing planting a tree just denied someone else the opportunity to plant a tree in that square foot of dirt.

I am sure they are paying a premium.

Here in NZ we have an aluminium smelter that is directly connected to a huge hydro dam, rather than the grid, so they can claim it is powered by green energy regardless of how you look at it.


> If every large producer insists on hydro/green energy, the price of hydro will go up encouraging investment in more green energy.

I don’t think that is realistic, and your word “every” is doing some very heavy lifting to give an overly simplistic answer. Renewable electricity is over 50% in NZ[0]. From a pure economic argument, green demand would need to approach green supply before hydro could gain much of an excess price. Paradoxically, it seems plausible to me that electricity generation companies have enough moat to be able to just squeeze profits out, and not increase green generation at all (regardless of your hypothetical financial incentives). Most renewables require huge capital investments, and a few percent extra profit can easily not move the profitability needle enough to change a project from a no to a go.

I would like to see a good economic and ecological comparison of the regulatory/tax choices the government could make, because I am somewhat suspicious about the recent electric-car subsidy. I have an acquaintance who has been part of a very well planned ($x million invested) solar project for NZ with an expected capital cost in the hundreds of millions, and anecdotally the government doesn’t seem to give a shit about it. Maybe they have the wrong political connections? Hydro is the same as a battery, so solar appears to be very sensible in NZ.

> aluminium smelter [claims] it is powered by green energy

I already answered that one in the comment you are replying to. More specifically: our NZ government has decided for decades to give Tiwai Smelter cheap electricity contracts (maybe for the sake of 2500 jobs), and finally decided to pull the plug on them. NZ consumers have had to buy marginally expensive and mostly non-renewable electricity over many decades due to that government choice. I haven’t looked at the country economics for the smelter, but I strongly suspect our government has been making uneconomic political decisions that have cost the country dearly. I presume they will put in a link from Manapouri to Benmore (connect to the Benmore to Wellington DC link, which isn’t 100% capacity constrained?[3]). I would also guess that the government was politically pressured to have more green generation for NZ, and shutting down Tiwai finally made political sense on the world stage to achieve “more” renewable generation.

Who wants another 1992 due to mis-planning for our security of supply? 1992 was perhaps historical blindness given that: “the power crises of the early 50s when serious shortfalls in supply created shutdowns and blackouts throughout the country. These were a regular occurrence in the years between 1954 and 1957”[1]. Although I guess I would need to see a past analysis of the risks versus the costs to judge whether the 1992 economic GDP reduction was just an acceptable cost versus the costs of upgrading generation. Governments generally are motivated by quick wins, and are sometimes not motivated to use regulation to encourage long term risk reduction against fat-tail events.

[0] https://www.mbie.govt.nz/assets/Data-Files/Energy/nz-energy-...

[1] https://www.engineeringnz.org/programmes/heritage/heritage-r...

[2] 2020: “Transpower has indicated it will cost $600 million to put the transmission structures in place to take the power from Southland to Auckland. That work was due to be completed over the next five to seven years.” The majority of Auckland electricity consumption is residential, the remainder of usage mostly commercial, and Glenbrook mill is less than 10%. https://www.stuff.co.nz/national/122113863/the-power-game-wh...

[3] https://www.transpower.co.nz/power-system-live-data


An interesting analysis. Feel free to reach out to me by email (my address is on my profile).

Employment is certainly part of the reason the smelter remains open, but as far as costing the country money since it was built, keep in mind the dam was originally built for the purpose, at a time when the country desperately needed hard currency. We have the dam because of the smelter.


> keep in mind the dam was originally built for the purpose, at a time when the country desperately needed hard currency

Yeah - 1971. Paid for by the government I think.

And $600 million to connect it now into the grid is not chump change so there is that. But the transmission line from Manapouri to Tiwai can’t have been cheap either. Then again, it isn’t clear how much of that modern $600 million is to handle natural growth in Auckland (the DC link has already been upgraded).

Meanwhile I’m guessing NZ imported oil during the oil crisis to run power stations up North!! I am guessing because the linked spreadsheet only goes back to 1974. The question is would NZ have been economically better off without the Smelter? Should it have been shut down long ago? What is the total ecological cost? E.g. https://www.rnz.co.nz/news/national/436877/smelter-stockpile...

Was Tiwai “justified” this century because of the sunk cost fallacy?


It's complicated. The site is a former paper mill, and the power comes from nearby hydro plants in a very concrete sense. The plants also deliver power to the grid. Things are changing so fast that it's difficult to tell where the marginal energy will come from next winter. The share of renewables and nuclear is rising, Finland uses very little fossil fuels for electricity outside cogeneration plants, and Russian energy imports have already stopped or are about to stop.


> and the power comes from nearby hydro plants in a very concrete sense

Unless the grid network connection is constrained, it certainly doesn’t “come” from nearby hydro because 1kWh is mostly fungible. Transmission costs are likely low. That is the point - although I admit marginal economics are confusing and most people don’t try to understand it (Disclaimer: I don’t have a good grasp of it either).

The grid connection is not likely to be constrained, because that would likely mean no redundant power. I assume even government data centers like reliable power supply: commercial data centres usually plan to be located where there is power redundancy for availability uptime. Although constraints are often only in one direction, so hard to predict just from a redundancy argument. Details of the network matter, however I am not motivated to find that exact information.

A hypothetical analogy would be to think of a lake in Lakeland. Currently the lake water is supplied 100 units by a clean green river and 100 units by a desalination plant (producing equivalently clean water, but using wads of dirty fossil fueled power to produce). 200 units of water is currently taken out of the lake by the residents of Lakeland. A data center is added that takes 50 units of water from the lake, and they sign up with the local Lakeland government that contractually agrees that the water the data centre gets from the lake is clean green water from the river. The desalination plant now needs to produce 150 units of water. Analogy: the lake is the power grid, the river is hydropower generation, and the desalination plant is fossil fuel power generation. You are saying the data center is getting water from the river (because contracts), and I am saying the extra marginal demand causes greenwashed fossil fuel generation.

> Finland uses very little fossil fuels for electricity

But nearly 100% of added power usage is supplied by non-renewable resources. That is the core of the likely greenwashing, and the point I am making about knowing where 1kWh of extra generation comes from, if 1kWh of load is marginally added.

I know I am repeating what I said, but that is because you are not arguing against the points I made.

There is a sort of argument that Finland will have 100% renewable power in the future. But if Finland has cross-connections to countries with dirty power, then the marginal argument still holds, just as though the lake in Lakeland is connected with lakes in other countries (European grid?)

To be called green:

1: add new green generation that wouldn’t otherwise be added. BUT if solar and wind generation is being installed at the maximum rate around the world due to manufacturing capacity limits, then you can’t claim “your” green generation unless you also increase those manufacturing capacity limits. There’s a parallel to the lake analogy.

2: or reduce power usage that wouldn’t otherwise have been reduced. Usually hard to actually account for (the counterfactual baseline for the reduction is difficult to get right).

3: AND while doing the above, don’t spend a bunch of uneconomic money, because spending money is usually wasteful and indirectly generates carbon.

It is hard to do the above. However it is almost free to just label your power as “green”. Most projects choose a green label. Very very few new projects marginally reduce carbon dioxide - because new projects are usually marginal increases in power usage.


> But nearly 100% of added power usage is supplied by non-renewable resources.

I would have agreed with that last year, or at least a decade ago. The role of fossil fuels in power generation has been diminishing, and now everything is weird with the lack of imports from Russia and with the new nuclear plant getting operational. Nobody really knows what will happen in the winter.

My impression is that added power is usually supplied by hydro and/or imports. Fossil fuels are almost exclusively used in cogeneration plants, which scale more by the demand for heat than by the demand for power. There are some plants burning coal and gas that are used in cold winter days, but they are normally not competitive enough to run.

When it's a rainy year in Norway, their exports are constrained by transmission capacity. It would not be possible to use the hydro exported to Finland anywhere else. In a dry year, any power used in Finland could plausibly increase the demand for fossil fuels elsewhere.


> My impression is that added power is usually supplied by hydro

Most hydro is already consumed. Where do you think this extra power comes from?

> When it's a rainy year in Norway, their exports are constrained by transmission capacity

Yeah, at times there may be water spilled/wasted instead of used for generation, just like excess wind can make electricity prices zero/negative in Germany or Texas. But I am guessing it is a small percentage of generation time (single digit I would guess) so it doesn’t affect my argument.


Norwegian hydro follows an annual cycle. They get a certain volume of rain and meltwater each year, and they have to use it before the next cycle, or it goes to waste. The terrain is not suitable for the kind of massive reservoirs you may see in the US or Russia. There is often something like 5-10 TWh/year excess power they have a hard time selling.

The big issue is that electric markets are not particularly flexible. It's no longer the 20th century, when people regularly burned fuels in easily adjustable plants in order to generate power. There is nuclear, which provides inflexible base capacity. There is cogeneration, where the demand for heat is seasonal and a corresponding amount of power is generated regardless of whether someone is willing to buy it. And then there are solar and wind, which fluctuate wildly and unpredictably and produce an ever increasing share of total power.

Hydro is primarily used for filling the gaps. And if it was a rainy year and/or a mild winter, the demand for hydro, in places where it can be exported to, may not be high enough.


Almost all new energy in Finland comes from carbon-neutral sources (Olkiluoto 3, wind and solar).




And it looks like the programming environment is running on that hobbyist OS kernel, Linux, that will come to nothing ;-) https://rocmdocs.amd.com/en/latest/index.html


> 1998: Many major companies such as IBM, Compaq and Oracle announce their support for Linux.

> 2000: Dell announces that it is now the No. 2 provider of Linux-based systems worldwide and the first major manufacturer to offer Linux across its full product line

https://en.wikipedia.org/wiki/History_of_Linux

2019 => https://newsroom.ibm.com/2019-07-09-IBM-Closes-Landmark-Acqu...

Linux never had any issues becoming yet another UNIX clone, which is why its strong points are CLI and headless applications.

I bet LUMI researchers won't be doing data analysis on their laptops from Linux distributions.


>I bet LUMI researchers won't be doing data analysis on their laptops from Linux distributions.

I did just that when I used one of LUMI's predecessors. Linux desktops weren't uncommon in my corner of academia.


While at CERN and Fermilab, everyone was mostly using Windows 2000 and OS X on their desktops, and many were doing their papers in FrameMaker instead of LaTeX.

The creation of the Scientific Linux distribution hardly changed this during the times I was there, and during my last visit for the Alumni Network creation event, it seems to have hardly changed.

This is where I am coming from.


AFAIK ROCm only provides fine-tuned libs/drivers running on Linux. At least that was how it was provided years ago when I used it: it shipped with a kind of soft vendor lock-in where only the main distros were supported. But if they now provide it with some tuned HPC OS, that would be a pretty good idea.


Lumi is the Finnish word for snow in case anyone's wondering.


Good to know the inspiration was not this: https://www.collinsdictionary.com/dictionary/spanish-english...


It's so hard to name anything without it being rude in some language.

See also: Mitsubishi Pajero, Mazda Laputa.


Is there a cognate in an Indo-European neighboring language, perhaps by loanword?


https://en.m.wiktionary.org/wiki/lumi

Doesn't look like it, at a quick glance.


Holy cow, I didn't realize that was searchable, or I'd have done that. Thank you.


Haha, I just googled "lumi etymology", I think.


True: Google is right under our noses.


A quick way of determining this with Finnish: if it is 1) a nature word and 2) follows a simple phonetic structure where each syllable is CV or CVV (C = consonant, V = vowel), then it might be from proto-Uralic, and so not a result of language contact but a "really old" Finnish word, and it may be cognate with words in other Uralic languages, e.g. Hungarian or one of several tiny languages in Russia. E.g. puu (= tree) and vuori (= mountain) are also proto-Uralic. Other nature words like järvi (= lake), which don't follow the pattern, might be proto-Finnic but have come from neighbours (for järvi it is apparently from Baltic/proto-Slavic).
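For the curious, the heuristic is mechanical enough to sketch in code. A rough greedy check in C (ASCII vowels only, so it ignores ä/ö; purely illustrative):

  #include <stdbool.h>
  #include <string.h>

  /* Finnish vowels, minus ä/ö to keep the sketch ASCII-only. */
  static bool is_vowel(char c) {
      return c != '\0' && strchr("aeiouy", c) != NULL;
  }

  /* Does the word parse as a sequence of CV or CVV syllables? */
  static bool cv_cvv(const char *w) {
      while (*w) {
          if (is_vowel(*w)) return false;  /* syllable starts with C */
          w++;
          if (!is_vowel(*w)) return false; /* then one vowel... */
          w++;
          if (is_vowel(*w)) w++;           /* ...optionally a second */
      }
      return true;
  }

  /* cv_cvv("lumi") and cv_cvv("puu") are true; "järvi" fails. */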


Totally makes sense (I did lots of Germanic linguistics in my case, so I definitely understand the patterns you're laying out).

Thank you.


I really love supercomputing but I worry whether, with a machine like this one, we get the right balance between spending on software optimization Vs spending on hardware. It used to be the case that fast hardware made sense because it was cheaper than optimising hundreds of applications but these days with unforgiving GPU architectures the penalty for poor optimisation is so high...


I think supercomputer users and customers are on the better side of this. The time availability is rationed and likely somewhat fought for. So there is a drive to get as much done in the allocated time, or to do it in as little time as possible.

I think waste is much more likely in startup and private sector. Where scaling up and out is easy and "cheap"...


I wonder if anyone on HN could tell us how well optimised the code is on these? I imagine the simulations are complicated enough without someone going in and adding some performance optimisation.


I worked at a supercomputing facility for a few years. The codes are typically decades old, maintained by hundreds of people over the years. By and large, they understand their performance profiles, and are working to squeeze as much out of the code as they can.

In addition, the performance engineers tend to be employed by the facilities, not the computational scientists. They're the ones who do a bunch of legwork of profiling the existing code on their new platform, and figuring out how to squeeze any machine-specific performance out of the code.

A lot of these codes are time-marching PDE solvers that do a bunch of matrix math to advance the simulation, so the kernel of the code is responsible for a vast majority of the time spent during a job. So it's not necessarily a huge chunk of code that needs to be tuned to wring better performance out of the machine.

The parallel communication they do is also to an API, not an ABI - the supercomputing vendors drop in the optimizations in the build of the library for their machine, to take advantage of network-specific optimizations for various communications patterns. If you express your code in the most-specific function (doing a collective all-to-all explicitly, say, rather than building your own all-to-all out of the point-to-point primitive) the MPI build can insert optimized code for those cases.
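To make that concrete, here's a minimal MPI sketch in C (the buffer size and contents are made up for illustration). The single collective call is the part a vendor's MPI build can swap out for a network-optimized implementation, which a hand-rolled loop of point-to-point sends would prevent:

  #include <mpi.h>
  #include <stdlib.h>

  int main(int argc, char **argv) {
      MPI_Init(&argc, &argv);

      int rank, nprocs;
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

      /* One block of doubles destined for every other rank. */
      const int block = 1024;
      double *sendbuf = malloc((size_t)nprocs * block * sizeof(double));
      double *recvbuf = malloc((size_t)nprocs * block * sizeof(double));
      for (int i = 0; i < nprocs * block; i++) sendbuf[i] = rank;

      /* Expressing the intent as a single collective lets the
         machine's MPI library substitute a topology-aware
         all-to-all; a hand-rolled loop of MPI_Send/MPI_Recv
         pairs gives it no such opportunity. */
      MPI_Alltoall(sendbuf, block, MPI_DOUBLE,
                   recvbuf, block, MPI_DOUBLE, MPI_COMM_WORLD);

      free(sendbuf);
      free(recvbuf);
      MPI_Finalize();
      return 0;
  }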

There's some misalignment because the facility will be in the top 500 for a few years, while the code lives on and on and on. If your supercomputer architecture is really out of left field (https://en.wikipedia.org/wiki/Roadrunner_(supercomputer)) it's not going to be super worth it for people to try to run on it without porting support from the facility.


I am not familiar with that particular one but I have used other supercomputers and those people are not waiting for better hardware, they are trying to squeeze the best performance they can out of what they have.

The end result mostly depends on the balance between scientists and engineers in the development team; it oscillates between "this is Python because the scientists working on the code only know that, but we are using MPI to at least use several cores" and "we have a direct line with the hardware vendors to help us write the best software possible for this thing".


It varies quite a lot depending on the exact project and how much is expected to be purely waiting on one big compute job to finish.

For something like climate simulations where a project is running big long jobs repeatedly I imagine they spend quite a bit of time on making it fast.

For something like detector development, where you run the hardware simulation production once and then spend three years trying to find the best way to reconstruct events, less effort is put into making it fast. Saving two months on a six-month job you run once isn't worth it if you have to spend more than a few weeks optimising it, and as these types of jobs need to write a lot to disk, there's a limit to how much you'll get from optimising the hot loop.


Interesting to see Ceph mixed into the storage options.

Lustre still king of the hill though.


My assumption is that Ceph is just there for easy/cheap block storage, while Lustre is doing the majority of the heavy lifting for the "supercomputing." Ceph file storage performance is abysmal, so it doesn't make sense to try and offer it for everything.


Its peak flops performance seems on par with DOE's Summit and 15% of Frontier, according to the top 500 supercomputer list: https://www.top500.org/lists/top500/2022/06/.


The full article title is "LUMI, Europe’s most powerful supercomputer, is solving global challenges and promoting a green transformation". It's an irrational response, but I find it difficult to trust an article about a computer "solving global challenges" and "promoting a green transformation". The English usage is wrong and that makes me wonder whether the content of the article is correct.


What is a good benchmark today for supercomputers? TFLOPS don’t seem to be a good measure, because it’s relatively easy to deploy tens of thousands of servers. Is it the latency of the interconnect? Or the bandwidth? Or something entirely different?


Hard/impossible to answer. Every workload is different. It can be mostly compute bound (Linpack), communication bound, a mix, very latency sensitive, etc. Imho, it just depends on the workload, and we should probably use multiple metrics instead of just Linpack peak TFLOPS.


That's like asking what the benchmark for an engine is. It all depends on what you're trying to do with it. There's no single metric to compare a diesel semi truck engine to a 2 stroke golf cart. You need multiple measures and the importance of each is dependent on your workload.


They have never used raw flops to measure supercomputer performance.

It's GFLOPS in HPLinpack (solving a dense system of linear equations).


They've also partnered with the RIKEN Center for Computational Science (developer of Fugaku, the fastest supercomputer on Earth). Quite impressive and interesting at the same time, as they use very different architectures.

https://www.r-ccs.riken.jp/en/outreach/topics/20220518-1/ https://top500.org/lists/hpcg/2022/06/


Technically, according to Top500, they're not the fastest anymore.

The #1 right now is held by Frontier over at the Oak Ridge National Laboratory, with Rmax 1.102 EFlop/s.

That doesn't mean that it's not an extremely capable system, though. (And Fugaku's performance seems not that well reflected in the HPL benchmark.)


Thanks for clarifying. It's also interesting to see the presence of ARM64 and Power architecture in top 10. Also the prominence of AMD EPYC chips.


This looks like a CIA briefing: A very powerful supercomputer. - But what can it do? - I tell you, it's very powerful.


Using AMD GPUs.

How popular are they compared to Nvidia for HPC?


NVIDIA has a significantly larger market share for HPC [1] (select accelerator for category).

[1] https://top500.org/statistics/list/


That's not my take-away from the chart, especially if you normalize by performance share. "Other" is the clear winner, and AMD has slightly more performance share than NVIDIA.


Good point. I was looking at the "Family" aggregation which doesn't list AMD in the performance share chart, which was a bit misleading.


AMD hardware is much more cost efficient on software that has not yet been developed. If you are planning on not using GPUs, definitely go with AMD.


CSC <3


The most powerful computer is the one that can launch nuclear weapons. "Shall we play a game?"


A supercomputer comparable to a mid-size hyperscaler DC. (And no, it doesn't have a uniquely good interconnect; it's broadly on par with the HPC GPU instances available from AWS and Azure.)


Hard no. Amazon EFA can barely come close to a dated HPC interconnect from the lower part of the top500 (when it comes to running code that does use the network, e.g. molecular dynamics or CFD). Azure does offer Cray XC or CS (https://azure.microsoft.com/en-us/solutions/high-performance...), which can/will be set up as proper HPC machines with fast interconnects, but I doubt these can be readily rented in the 100s-of-PFlops size.

Check these talks from the recent ISC EXACOMM workshop if you want to see why HPC machines and HPC computing are an entirely different league compared to traditional data center computing: https://www.youtube.com/watch?v=9PPGvqvWW8s&list=WL&index=9&... https://www.youtube.com/watch?v=q4LkF33YMJ4&list=WL&index=7


Nope.

It has Slingshot-11 [1] as the interconnect, with a raw speed of 200 Gb/s per link, plus caching and other heavy optimizations.

It is not only the GPU instances but the way they are interconnected. This machine even has containers available for use. [2]

It is more open.

[1] - https://www.nextplatform.com/2022/01/31/crays-slingshot-inte...

[2] - https://www.lumi-supercomputer.eu/may-we-introduce-lumi/


What is the difference between this and a DC, and why don't DCs appear in supercomputer rankings?


Because if you tried to run the supercomputer benchmark on a DC, you'd get a low score, and you can't easily make up for that by adding more computers to a DC. To win the supercomputer benchmarks, you need low-latency, high bandwidth networks that allow all the worker nodes in the computer to communicate calculation results. Different real jobs that run on supercomputers have different communications needs but none of them really scale well enough to be economic to run on datacenter style machines.

What's interesting is that over time, the datacenter folks ended up adding supercomputers to their datacenters, with very large and fast database/blob storage/data warehousing systems connected up to "ML supercomputers" (like supercomputers, but typically only doing single-precision floating point). The two work well together so long as you scale the bandwidth between them. At the end of the day, any interesting data center has obscenely complex networking technology.

For example, TPUs are PCI-attached devices in Google data centers; they plug into server machines just like GPUs. The TPUs have their own networking between TPUs, which allows them to move important data, like gradients, between TPUs as needed to do gradient descent and other operations, but the hosts that the TPUs are plugged into have their own networks. The TPUs form a mesh (the latest TPUs form a 3D mesh, physically implemented through a complex optical switch), while the hosts are attached to multiple switches which themselves form complex graphs of networking elements. When running ML, part of your job might be using the host CPU to read in training data and transform it, keeping the network busy, keeping some remote disk servers busy, while pushing the transformed data into the TPUs, which then communicate internal data between themselves and other TPUs over an entirely distinct network. Crazy stuff.


Generally speaking a DC is designed for doing a bunch of different things that have less punishing interconnect needs, whereas supercomputers are designed for doing fewer things with higher interconnect needs. Datacenters often look like rows upon rows of racks with weaker interconnects between them, whereas supercomputers are much more tightly bound and built to work together.


Entries #13, #36, #37, #38, #39 on the current list are Azure clusters. #52 is an EC2 cluster.


Highly optimized one-time bare-metal runs, i.e., not a regular run using vCPUs and VMs like it would be for 99% of clients running HPC workloads on these platforms. Only to pop up in the Top500 charts and market it. That’s all.


Optimization for different workloads; scheduling is per workload, not renting per machine; and they have submitted benchmarks to the TOP500 list.

Why don't DCs appear? Because they have not submitted benchmarks and power measurements.


> scheduling is per workload

This is really the key: a supercomputer has the (software) facilities that make it possible to launch one coordinated job that runs across all nodes. A data centre is just a bunch of computers placed next to each other, with no affordances to coordinate things across them.

At one point in time the hardware differences were much greater between the two, but the fundamental distinction, where a supercomputer really is concerned with having the ability to be "one" computer, remains.
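As a rough illustration of what that looks like from the user's side, here is a hypothetical Slurm batch script (LUMI does use Slurm, but the partition name, sizes and binary here are invented):

  #!/bin/bash
  #SBATCH --job-name=cfd-run
  #SBATCH --partition=standard    # hypothetical partition name
  #SBATCH --nodes=512             # one job spanning 512 nodes
  #SBATCH --ntasks-per-node=128
  #SBATCH --time=24:00:00

  # The scheduler grants one coordinated allocation across all 512
  # nodes; srun then launches one MPI rank per task on each of them.
  srun ./solver input.nml

A generic data centre gives you nothing equivalent: you get individual machines or VMs, and cross-machine coordination is your application's problem.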


Indeed, and that’s why it is called a “supercomputer”.


I think the main difference is that on a supercomputer you generally run one task at a time, while in a DC you have computers that do different, unrelated things.

The rest kind of follows from that, like how a supercomputer that consists of multiple computers needs a fast, low-latency interconnect between them to coordinate and exchange results, while computers in a DC care a lot less about each other.

On the other hand, the distinction is fluid. Google could call the indexers that power their search engine a supercomputer, but they prefer to talk about datacenters.


Not so much "generally" as having the ability to do it, but it is true that a supercomputer is managed like it is one big thing that has one job queue it tries to optimise.


A cloud datacenter is about 50x larger than this, for starters.



