Let me start by explaining what we built: a computer. And, since we at NordAxon are generally inspired by the Hitchhiker's Guide to the Galaxy, we had to name our power computer after one of the characters.
So who is Marvin? For those of you who have not read the books, here comes a tl;dr of the character Marvin the Paranoid Android. In the books, Marvin is one of many failed prototypes of the Sirius Cybernetics Corporation, created to prototype artificial intelligence with a human personality. Unfortunately, because of his massive brain capacity (allegedly, Marvin is 50 000 times more intelligent than an average human), he is instead afflicted with severe depression and boredom. No task is too complex for Marvin, no question too deep!
And why did NordAxon need to build a computer? We needed a local power machine because we sometimes handle sensitive data in our projects, data that we cannot or are not allowed to upload to the cloud. This means that we cannot depend on cloud services for training machine learning models, and so we needed our own machine. But why get an already built machine when it is so much more fun to hand-pick all the individual parts and build it yourself? So, that is exactly what we did! And Marvin the Paranoid Android, having such computing capacity and being an all-knowing character, made us realize that his name would fit our new power machine perfectly. We imagine that every time we use him to train one of our models, he will become a little bit happier, and maybe he can eventually leave his depression behind?
Introducing Marvin our Super Computer
How does one go about building one’s own machine? We will try to explain this in a way that is readable both for the novice and for the more technical reader, so hang in there. Those of you who want something even lighter than this article can tune in to our (not at all corny) video about the build, found here!
There are in general 8 parts (components) that constitute a computer:
- GPU (graphics processing unit): this one makes it possible for you to watch videos, play games and train machine learning models on your computer.
- CPU (central processing unit): this can be compared to the project leader of the computer. The CPU performs calculations and delegates tasks to other components such as the GPU, using the random access memory (RAM) as its working space.
- CPU cooler: this component is pretty straightforward. It keeps the CPU at low temperatures so that it does not overheat and break when working.
- Motherboard: it is via the motherboard that the components communicate with each other. We could compare it to the nerve cells in the body, through which information flows with tasks such as “lift finger”.
- RAM (random access memory): this is where the CPU stores the temporary variables that it needs in order to send tasks to other parts of the computer.
- Storage memory: here we have our operating system, together with all of our files and programs.
- Power supply: this is what makes sure that all components have the power they need to work, because we all know that nothing fun can happen with no power!
- Case: where we store all of our components, preferably with a nice window to the inside so that we can see and enjoy all the cool components.
Don’t worry, I will present some details around the first 4 components and what types of parts we chose and why, starting with the star of the show of course — our GPU.
Image Credit: Gigabyte
Since we work with machine learning (ML) and thus spend a lot of time training ML models, we needed to start our build from a powerful GPU. GPUs can run many computations in parallel, which cuts the training time of machine learning models: the better the GPU, the shorter the training time. For those of you that are not that tech-savvy, it could cut training time from days to hours or less (depending on some surrounding variables as well, such as RAM or CPU capability, or training hyperparameters). So, let’s look closer at our choice of GPU.
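To give a feeling for why those surrounding variables matter, here is a minimal back-of-the-envelope sketch using Amdahl's law. This is our own illustration, not something measured on Marvin: the function names and all the numbers (a 48 hour baseline, 95% of the work being parallelizable, a 50x faster parallel part) are made-up assumptions.

```python
def amdahl_speedup(parallel_fraction: float, parallel_speedup: float) -> float:
    """Overall speedup when only part of a job gets faster (Amdahl's law)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if parallel_speedup <= 0:
        raise ValueError("parallel_speedup must be positive")
    # The serial part (data loading, bookkeeping, ...) is not helped by the GPU.
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / parallel_speedup)


def estimated_hours(baseline_hours: float,
                    parallel_fraction: float,
                    parallel_speedup: float) -> float:
    """Estimated training time after speeding up the parallel part."""
    return baseline_hours / amdahl_speedup(parallel_fraction, parallel_speedup)


# Hypothetical example: a 48 hour training run where 95% of the work
# can be parallelized, and the GPU runs that part 50 times faster.
hours = estimated_hours(48.0, 0.95, 50.0)
print(f"Estimated training time: {hours:.1f} hours")
```

With these made-up numbers, two days shrink to a bit over three hours, and the remaining 5% of serial work is what keeps the time from dropping further. That is why the CPU and RAM still matter even with a strong GPU.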
Today, the world-leading company when it comes to general purpose GPUs (i.e. GPUs that can be used for other things than strictly graphics processing, such as training machine learning models) is NVIDIA. During fall 2020 they released one of their latest series of GPUs, called the GeForce RTX 30 series, which has proven to be immensely popular. Fortunately for us at NordAxon, we managed to get a hold of the best GPU from this series: the NVIDIA GeForce RTX 3090 with 24 GB memory, more specifically the Gigabyte RTX 3090 24GB TURBO.
“But wait, I thought you said NVIDIA and then you wrote Gigabyte?” Yes, yes, we know. It is a bit confusing. Essentially, NVIDIA designs the hardware architecture of graphics cards, and then lets other companies produce them with their own additions, such as cooling mechanisms. So, we bought the Gigabyte version with TURBO cooling. Was this Gigabyte Turbo version the only RTX GPU we could get a hold of? Yes. This series has, as we mentioned, been very popular and getting a hold of any GPU in the series is a challenge. They sell out within minutes as soon as batches are released.
So what does this TURBO cooling mean? Well, essentially it is a blower-style cooler: a single fan pushes air across the GPU and out through the back of the case, instead of spreading the warm air inside the case. Now, one of the reasons that we could get a hold of this GPU is that the turbo cooling method is perceived as too weak for a GPU this powerful, meaning that the GPU might overheat before we reach its full potential, so people have been very hesitant to buy this expensive GPU. However, the turbo cooling method is essential if we want to add more GPUs into Marvin in the future, as we otherwise would spew warm air from one GPU to the next, overheating them anyway.
Now, is the 3090 the best GPU on the market for training machine learning models? No, of course there are better GPUs, such as the NVIDIA A100 Tensor Core, but for a small startup of 6 employees there is no chance that we could afford one of those. Is the 3090 the best GPU within a reasonable price range for a small startup? Yes! Does Marvin sound like a jet plane due to the loud turbo cooling mechanism? Indeed.
So, once we had finally gotten a hold of the GPU, we created the rest of the build around it. The second component we decided on was the CPU.
Image Credit: Intel
So for our CPU we chose an Intel Core i9-7900X 3.3 GHz 10-Core Processor. What does all of this mean? Well, the first part, Intel Core i9-7900X, is the name of the CPU, but it is mostly what comes after that is interesting: 3.3 GHz and 10 cores. The 3.3 GHz is the clock speed, which means that each core ticks 3.3 billion times per second. It is a rough measure of how fast the CPU can compute or assign tasks, and the more, the better! The CPU having 10 cores essentially means that we have 10 small CPUs in our big CPU, which also enables parallelized computations. The more, the better applies here as well.
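For the curious, the arithmetic behind those two numbers can be sketched in a few lines. This is only an illustration: the helper name `cycles_per_second` is our own, and a clock cycle is not the same thing as a finished operation, so treat the result as a rough upper bound on ticks and not as a benchmark.

```python
BASE_CLOCK_GHZ = 3.3  # clock speed from the CPU name
CORES = 10            # core count from the CPU name


def cycles_per_second(clock_ghz: float, cores: int = 1) -> float:
    """Clock cycles per second, summed over the given number of cores."""
    return clock_ghz * 1e9 * cores


per_core = cycles_per_second(BASE_CLOCK_GHZ)
total = cycles_per_second(BASE_CLOCK_GHZ, CORES)

print(f"One core:  {per_core:.2e} cycles per second")
print(f"All cores: {total:.2e} cycles per second")
```

The second number is only reached when the work can actually be split across all 10 cores, which is exactly what parallelized computations are about.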
And yes, for those of you that keep up to date, we know that this is an old Intel CPU, released already in June 2017. But you know what they say; oldie but goldie.
Jokes aside, this CPU choice was a tradeoff. As we wanted to keep the possibility to add more GPUs to our build later on, we were constrained to a choice of motherboards. Given the motherboard that we chose, we were even more constrained regarding CPU, and finally, we went with this one. You have to start somewhere, right?
Now, in every build there is going to be a bottleneck. Taking into account that we chose a CPU with “only” 10 cores, this might be it. Have we noticed it bottlenecking us so far? Nope. But technically it could bottleneck us in the future, when we train bigger models with more data.
Image Credit: Corsair
Now for cooling, we realized that an all-in-one (AIO) CPU cooling system probably would do for our CPU, despite its reputation for overheating. An AIO system also looks neat and is easy to install (foreshadowing…). An AIO consists of a pump, two tubes, one radiator and normally two fans mounted on the radiator. We decided on the Corsair H115i PRO 55.4 CFM Liquid CPU Cooler, with subtle RGB on the pump. We know, we know, it is so nerdy with RGB lights in a computer. But we are nerdy at NordAxon... And this cooler gives us the possibility to play around with color schemes during our after works!
Now, an AIO cooler works as follows: we place the pump directly over the CPU. Between the pump and the CPU there is something called thermal paste. This paste will conduct heat away from the CPU to the pump. In the pump, and in the entire AIO cooling system, there is a cooling liquid that moves around. The liquid is heated by the CPU, then pumped away into the radiator. The radiator divides the liquid into small sections so that it can be efficiently cooled with fans, and the fans blow air on the radiator.
Installing this AIO cooler in the case proved to be a challenge, though. We tried installing it top-mounted, as recommended, but the case turned out to be just a tiny, tiny bit too short for that and the RAM sticks ended up being in the way. So, there went our air flow planning right out the window and we had to mount the radiator and fans at the front of the case, right behind our case fans. So, the radiator got a double set of fans!
We have not entirely figured out how we want our air flow to be configured for the maximum cooling capacity and the minimum dust intake in the case. Yet another thing for us to ponder during our after works…
Image Credit: ASUS
Given the budget we had for Marvin and the potential that we see in him in the near future, we chose the ASUS Pro WS X299 Sage II Workstation Motherboard. Simply put, this motherboard gave us a lot of value for the money and it allows us to add GPUs when needed. It even has fancy-sounding things such as PCIe x16 slots, quad-channel memory support and overclocking possibilities. To not make this article unbearable, we will not go into the details of the motherboard more than this, but we will mention that we are fairly happy with this choice!
So, how did the actual building go? Well, a bit up and down, but everything worked out nicely in the end! As mentioned, the AIO radiator and fans were tested in several positions before finally being front-mounted behind these fancy RGB case fans:
A real-life GIF showing the RGB case fans of Marvin! Are you impressed yet?
Our data scientists Alexander Hagelborn and Isabella Gagner gladly spent an entire Saturday building Marvin and configuring his software exactly the way they wanted him to work.
Some photos from the day! Both of us used ESD bracelets — since the room has a wall-to-wall carpet we were worried we might kill components with electrostatic discharge and thus used these bracelets to ground ourselves.
And now, as the finale, we will present the different components as seen in the fully built Marvin! The parts that are visible are shown with colored squares and the corresponding part name in the same color. And yes, for those of you that know a thing or two about AIOs: we will mount the fans and radiator higher, so that any air in the cooling loop collects in the radiator and not in the pump.
Do you have any questions about the build or about the performance of Marvin? Let us know by commenting here or contacting us! Thank you for taking the time to read about our Marvin adventure. We hope you will stay tuned for our next blog posts, which will feature models built and trained on Marvin!