I'm definitely not at the stage where I can say anything definitive about performance. For a SIMD-intensive task, a preliminary run on a 3.3 GHz eMAG core gets about 400 megabytes/second of JSON parsing, compared with 2.2 gigabytes/second on a 4.0 GHz Skylake. This may come down to having a maximum of 2 x 128-bit NEON operations per cycle vs 3 x 256-bit AVX2 operations per cycle, as well as some clock speed differences. I will eventually post about this at branchfree.org when I don my Nomex long johns (required whenever doing any benchmark that might imply to anyone that any processor runs faster or slower than any other processor).
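A rough back-of-envelope check on that explanation. This sketch assumes peak vector throughput is simply (ops per cycle) x (vector width) x (clock) and that the parser is entirely SIMD-bound; neither is guaranteed for real code. Under those assumptions, width and clock alone predict a gap of about 3.6x, against the roughly 5.5x observed, leaving a factor of about 1.5x to other effects.

```python
# Back-of-envelope peak SIMD throughput comparison.
# Assumption: peak = ops/cycle * vector bits * clock; real code rarely hits this.

def peak_simd_bits_per_sec(ops_per_cycle: int, vector_bits: int, clock_ghz: float) -> float:
    """Theoretical peak number of vector bits processed per second."""
    return ops_per_cycle * vector_bits * clock_ghz * 1e9

# Figures quoted in the comment above.
emag = peak_simd_bits_per_sec(2, 128, 3.3)     # 2 x 128-bit NEON at 3.3 GHz
skylake = peak_simd_bits_per_sec(3, 256, 4.0)  # 3 x 256-bit AVX2 at 4.0 GHz

peak_ratio = skylake / emag        # gap predicted by width + clock alone
observed_ratio = 2200 / 400        # 2.2 GB/s vs 400 MB/s of JSON parsing
unexplained = observed_ratio / peak_ratio

print(f"peak SIMD ratio:  {peak_ratio:.2f}x")
print(f"observed ratio:   {observed_ratio:.2f}x")
print(f"left unexplained: {unexplained:.2f}x")
```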
Just ran Daniel's tests on our 16-core LX2160, which is still clocked at only 1.9 GHz (it will be 2.2 GHz in final production). These are our numbers:
binarytrees: 37s
mandelbrot: 13.6s
fasta: 0.9s
I will give your benchmark a run once I have some time.
Whoa, thanks. 368, huh? That's pretty impressive given the 1.9 GHz clock speed... By coincidence, the 'twitter.json' file was the one I was using as my slender data point.
Do note that at this stage the benchmark is so preliminary as to be largely meaningless; I haven't really done much more than eyeball the results. But as a preliminary data point, that seems to be a stronger showing clock-for-clock than the eMAG one.
Out of personal interest, I did some very unscientific benchmarking by timing a build of the Yocto core-image-sato distro for the BeagleBone. All source code was downloaded beforehand, so downloading does not factor into the results. All the machines were running Ubuntu 18.04 Server except the A10, which runs Ubuntu 18.04 Desktop.
Here are the results:
34m9.347s EPYC 7401P 24C/48T 2.2GHz (Packet c2.medium.x86 bare metal server)
75m14.661s eMAG 32C/32T 3.3GHz (Packet c2.large.arm bare metal server)
96m31.901s i5-8259U 4C/8T 2.30-3.80GHz (Intel NUC8I5, NVMe SSD)
139m52.184s ThunderX 96C/96T 2.0GHz (Packet c1.large.arm bare metal server)
194m52.745s A10-6800K 4C/4T 4.1GHz (Old self-built desktop, slow-ish SSD)
535m52.642s Celeron N3150 4C/4T 1.60-2.08GHz (Gigabyte Brix, SSD)
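To put a number on the eMAG vs i5 gap discussed next, the times above can be converted to seconds and compared. A minimal sketch: it only computes wall-clock ratios and ignores the very different core counts, clocks, and storage of these machines. With these figures the eMAG comes out roughly 1.28x faster than the i5 and about 2.2x slower than the EPYC.

```python
import re

# Wall-clock times for the Yocto core-image-sato build, copied from the list above.
BUILD_TIMES = {
    "EPYC 7401P": "34m9.347s",
    "eMAG": "75m14.661s",
    "i5-8259U": "96m31.901s",
    "ThunderX": "139m52.184s",
    "A10-6800K": "194m52.745s",
    "Celeron N3150": "535m52.642s",
}

def to_seconds(t: str) -> float:
    """Convert a `time`-style duration such as '34m9.347s' to seconds."""
    m = re.fullmatch(r"(\d+)m([\d.]+)s", t)
    if m is None:
        raise ValueError(f"unrecognised duration: {t!r}")
    return int(m.group(1)) * 60 + float(m.group(2))

seconds = {name: to_seconds(t) for name, t in BUILD_TIMES.items()}
fastest = min(seconds.values())

# Slowdown relative to the fastest machine (1.00x = fastest).
for name, s in sorted(seconds.items(), key=lambda kv: kv[1]):
    print(f"{name:<14} {s:9.1f} s  {s / fastest:5.2f}x")

# How much faster the eMAG finished than the i5.
emag_vs_i5 = seconds["i5-8259U"] / seconds["eMAG"]
print(f"eMAG finished {emag_vs_i5:.2f}x faster than the i5")
```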
I assume the eMAG results will improve somewhat once its support matures, but its lead over the i5 is disappointingly small. Both ARM machines performed reasonably well when all cores were used, but their relatively weak per-core performance showed whenever utilization fell. Still, although the ThunderX was slow, looking at 96 cores in htop felt pretty good...
That case surprised me. I just built an Intel based Linux system using that exact case; it's a "be quiet!" Pure Base 600 Black [1]. They "debranded" the case in the photo with a low-effort edit.
It's well engineered and the tooling and finish are very high quality. Everything snicks together with great precision.
I would not build a high-end gaming machine with it. The motherboard tray's cable routing cutouts are not well positioned if you have a large, high-feature motherboard. I knew this, but since I used an mATX board I didn't care. Also, airflow is limited by the solid front panel, as is always the case with 'quiet' designs.
The supplied fans are excellent but only one front case fan is supplied; I knew this and obtained a second "be quiet!" 140mm fan for my build.
I have mixed feelings about the gauge of steel. The quiet-focused cases I've been using for many years typically rely in part on heavy steel. The steel in this case is comparatively thin, similar to what you get with OEM machines from Dell or HP. On one hand I miss the rigidity of prior cases; on the other, I've been surprised at how happy I am with the reduction in weight.
The size is perfect. It's a little larger in every dimension than a traditional mid tower, which makes assembly and changes easier.
This is a bit of a boutique product; Amazon doesn't have this exact model, and shipment through Newegg took extra time because it shipped from the manufacturer's US warehouse.
I would buy it again.
I did a double take when I saw the Arm workstation, but I suppose it's not really surprising. The market is full of windowed, LED-riddled gaming cases on one hand and low-end 'value' stuff on the other. There are few quality, 'grown-up' looking cases available. I had 'workstation' in mind when I went hunting for a case, and I imagine that's what Ampere was thinking as well; we made thoughtful choices and ended up in the same place.
Rack-based equipment is generally designed with less vertical height and strict front-to-back airflow. It is also often designed for relatively high inlet temperatures, so it needs high airflow rates. These factors combine to make it very loud and often crazy inefficient; read that as consuming significant power just to cool the equipment.
Mini/mid/full tower cases are generally designed to take advantage of heat wanting to rise, so the intakes are often large and low (140mm is not unusual), with the exhaust at the top rear. Even 200mm isn't unusual for the exhaust. Air-moving efficiency increases quickly with fan size. 1U fans often spin at 15k rpm and make more noise and vibration than anything else. Desktop fans often run at 1200 rpm or lower and take just a few watts to dump substantial heat.
As an example, the Fractal Design Mini C (a small, quiet, under-$100 case) has room for 2 x 140mm fans in the lower front and 2 x 140mm on top. It's a smaller case, so there's only room for a 120mm fan in the rear. With $80-ish for the case and a few extra fans (Fractal Design's fans aren't bad, but not quite class-leading), you can easily dump a few hundred watts quietly.
Finding a rack-mount case that can move as much air as quietly is challenging, and when it is possible at all, it's often prohibitively expensive and/or crazy loud. The last time I built one to house a single-socket motherboard and a GPU (much like a desktop), it included 4 Delta fans loud enough that I needed ear protection to be in the same room.
I don't see your point. Maybe it's true for off-the-shelf stuff, but I'd assume that a lot of YC readers build their own systems.
I just built a rack for my desk. I haven't really put it under load, but I have yet to see the GPU, power supply, or case fan spin up. For now I have only one case fan. The only fan that spins, silently, is the CPU fan. The only thing you can hear is the 10 TB HDD.
Speaking of Xeon, you could probably put together a much cheaper workstation with an older Xeon than with the eMAG. I just don’t see myself picking this up at its current price point.
Without knowing what the perf is like, it is really difficult to say. 32 cores @ 2.8-3.3 GHz sounds like quite a lot; I don't think getting a 32-core Xeon would be anywhere near as cheap as this.
Is there anything similar but more hobbyist-priced? DIY would be fine, but none of the motherboards I know of have RAM slots or other desktop-like amenities.
Semihalf is maintaining the tree and patches, but I believe the original work to support using a GPU on the platform was done by Linaro. I could be misremembering, but I believe that is how things happened.
We are waiting for the final release of the EDK2 tree from NXP. I will refrain from promising anything until that is integrated.
The cores are not overclockable. 2.2 GHz will be the limit.
> The 10-100G ports are no extra cost. They are available on the SoC and brought out on the COM Express pins, so we wanted them available for those who could use them. Additionally, these are raw SERDES lanes, so in theory they could be attached to a PCIe expansion cage.
Looks like anyone who makes these 16-core A72 chips is making them for networking (this NXP one; Mellanox BlueField is a somewhat similar idea, I think). So you can't avoid paying for the NIC that's already on the chip…
With Talos II, you'd get a workstation with only 4 cores at this price. Blackbird board + 8-core CPU bundle is a bit better, but now you're limited by the small form factor.
Those 4 POWER9 cores (16 threads with 4-way SMT) will obliterate this ARM. Depending on the benchmark, this 32-core ARM box does 1/2 the single-thread performance and about 2.5x the multithreaded performance of my venerable Sandy Bridge i7 desktop (so it gets 2.5x the MT performance with 8x as many physical cores). That is a hardware class from 2011, which you can usually pick up for under $250 on eBay as second-hand corporate desktops. The POWER9 in the Talos II, on the other hand, can go toe-to-toe with Intel's Skylake in single-thread and especially multithreaded use cases.
The reason to buy a workstation like this is not performance, but being able to develop ARM software natively on local hardware. That's the only real killer feature it has over Intel/AMD/IBM processors, but it's a legitimate reason to buy one given the availability of ARM servers at cloud providers.
Daniel Lemire's single-threaded Mandelbrot benchmark: 15s on the Ampere eMAG (Skylark), 24s on a 4GHz Skylake. (It also wins in bitset_count.) It's definitely faster than 1/2 of a Sandy Bridge.
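The parenthetical above is a per-core calculation. A minimal sketch, taking the quoted 2.5x multithreaded figure and the 32 vs 4 physical cores (the 4 is implied by the stated 8x) at face value and ignoring Hyper-Threading on the i7: each eMAG core delivers roughly 0.31 of a Sandy Bridge core's multithreaded throughput, lower than the quoted 0.5x single-thread figure, which could reflect SMT on the i7 or imperfect scaling across 32 cores (the comment doesn't say which).

```python
def per_core_throughput(mt_speedup: float, core_ratio: float) -> float:
    """Multithreaded throughput per physical core, relative to the baseline core."""
    return mt_speedup / core_ratio

# Figures from the comment: ~2.5x the MT performance with 8x the physical cores
# (32 eMAG cores vs 4 Sandy Bridge i7 cores; SMT ignored).
emag_vs_sandy_bridge = per_core_throughput(2.5, 32 / 4)
print(f"per-core MT throughput: {emag_vs_sandy_bridge:.4f} of a Sandy Bridge core")
```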
eMAG will have a disadvantage in SIMD, but for normal workloads (make -j32 on a huge project :D) it should be plenty fast.
I don't think the Mandelbrot benchmark is single-threaded. If it were, this processor would be much faster than Skylake in computational instructions per clock, which would be a huge game changer (and it clearly isn't). That 25s vs 18s figure makes a lot more sense if the benchmark is multithreaded, since Skylake is around 1.7x to 2x faster than Sandy Bridge.
Moreover, the benchmark source code [0] clearly uses OpenMP to parallelize the benchmark tasks.
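A quick way to see why a single-threaded reading looks implausible. This sketch assumes the 15 s vs 24 s Mandelbrot times quoted earlier were both single-threaded, that the eMAG ran at its 3.3 GHz peak and the Skylake at 4 GHz, and that run time scales as 1 / (work per cycle x clock). Under those assumptions the eMAG would be doing roughly 1.94x Skylake's work per cycle.

```python
def per_clock_advantage(time_a: float, time_b: float, ghz_a: float, ghz_b: float) -> float:
    """How much more work per cycle machine A does than machine B,
    assuming run time scales as 1 / (work_per_cycle * clock)."""
    speedup = time_b / time_a          # wall-clock speedup of A over B
    return speedup * (ghz_b / ghz_a)   # factor out the clock-speed difference

# eMAG: 15 s at 3.3 GHz (assumed peak clock); Skylake: 24 s at 4.0 GHz.
ipc_ratio = per_clock_advantage(15.0, 24.0, 3.3, 4.0)
print(f"implied per-clock advantage of eMAG: {ipc_ratio:.2f}x")
```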
You can't really benchmark anything on FreeBSD X-CURRENT. There is a ton of debug code there that slows down the whole system, including its libc. Be warned: it's a very unfair comparison!
Me too! I do have remote access to a POWER9 machine, but that machine seems unhealthy, since its results are completely abysmal. As I don't know its configuration, the cause could be anything. I also have access to a POWER8 machine that I know very well, so I benchmarked that instead. Ubuntu 18.04 LTS, gcc 7.3.0:
long lived tree of depth 21 check: 4194303
real 18.00
user 17.97
sys 0.03
pixels[15476] = 7
real 7.61
user 7.61
sys 0.00
tccgcggatttaccatcctc
ctgatgttaattctctgtggtcagatacagaccaaaaac
real 1.18
user 1.18
sys 0.00
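For a side-by-side view, the POWER8 "real" times above can be set against the LX2160 numbers posted earlier. This sketch assumes the three outputs are binarytrees, mandelbrot, and fasta in that order (inferred from the output text) and reports raw wall-clock ratios only; the POWER8 box's core count and clock aren't given, so nothing here is normalized.

```python
# Wall-clock seconds from this thread: LX2160 at 1.9 GHz (earlier comment)
# and the POWER8 box above (the "real" lines, in output order).
LX2160 = {"binarytrees": 37.0, "mandelbrot": 13.6, "fasta": 0.9}
POWER8 = {"binarytrees": 18.00, "mandelbrot": 7.61, "fasta": 1.18}

ratios = {name: LX2160[name] / POWER8[name] for name in LX2160}
for name, r in ratios.items():
    faster = "POWER8" if r > 1 else "LX2160"
    print(f"{name:<12} LX2160/POWER8 time ratio {r:.2f} ({faster} faster)")
```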
The 32-core ARM is 30% faster than an 8-core Ryzen on Geekbench [0]. It's not exactly slow, but it also has 4 times as many cores, which is pretty depressing.
At that price I don't think punters are the target demographic.
If your build target is an Arm environment (industrial, embedded?) or you're a university teaching or researching the Arm ISA, then I imagine it would be a pretty nice bit of kit.
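The "4 times as many cores" point can be made concrete. A minimal sketch, assuming the 30% refers to the Geekbench multi-core score and that the score can be split evenly across physical cores (SMT on the Ryzen is ignored): each eMAG core contributes roughly a third of what a Ryzen core does.

```python
# Geekbench figures as described in the comment: ~1.3x the multi-core score
# of an 8-core Ryzen, using 32 cores. Even split across cores is an assumption.
mt_ratio = 1.30
core_ratio = 32 / 8

per_core = mt_ratio / core_ratio
print(f"eMAG per-core MT score relative to Ryzen: {per_core:.3f}")
```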