                            BUILDKERNEL.TXT

    DragonFlyBSD buildworld and buildkernel NO_MODULES tests on various bits
    of hardware, plus additional concurrency tests.

    See buildkernel.old.txt for the 'old' version of this file.  This version
    starts fresh to give us a new baseline but necessarily does not include
    all the CPUs the old version had.

    Measuring performance in modern times requires a somewhat different take.
    Modern CPUs naturally turbo or overclock, pull varying amounts of peak
    power from the wall, and AMD CPUs in particular are sensitive to the
    speed of the memory fabric.  In order to supply appropriate context,
    DDR4 frequency and wattage at the wall are included.

    These tests are done just with hardware I have on-hand, configured in
    various different ways.  It is by no means comprehensive but the numbers
    are very meaningful if you have an eye towards understanding
    performance/power.  Some of these machines, particularly the Zen-based
    parts, can operate at extreme efficiencies with minor BIOS tweaking.

    Tests: (N=32, N=64 on 32-thread boxes and N=128 on 64-thread boxes,
            and N=256 on 128-thread boxes)

        cpdup /usr/src /tmp/src
        cd /tmp/src

        time make -j N buildworld >& /tmp/bw.out

            Modest concurrency, longer build time at load

            NOTE: set WORLD_ALTCOMPILER=gcc47 in /etc/make.conf to avoid
                  building gcc50 so all results are on an even keel.

        time make -j N buildkernel NO_MODULES=TRUE >& /tmp/bk.out

            Higher concurrency (except for depend at beginning and link
            at end), shorter build times under load.

        time comp32x

            This is a test script which concurrently executes 64 copies of
            another script which compiles a file 200 times serially.  Each
            copy is locked to a cpu thread.  So basically the same source
            file is compiled (to a different .o target in /tmp) 12800 times
            in this test.

            It fully loads the machine for the duration of the test and
            tends to also life-test the operating system's own concurrency
            for the related shared resources (such as the compiler binary,
            path lookups, etc).

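    The Timing values recorded in this file are the output of the csh
    "time" built-in used in the commands above: user CPU seconds, system
    CPU seconds, elapsed wall-clock time, and CPU utilization as a percent.
    The following is a minimal Python sketch, not part of the test
    procedure, showing how such a line can be read and how the percentage
    relates to the other three fields.  The function names are made up for
    illustration, and the sample line is the first buildworld result
    recorded in this file.

```python
import re

def parse_time(line):
    """Parse the first four fields of a csh-style `time` output line."""
    m = re.match(r"\s*([\d.]+)u\s+([\d.]+)s\s+([\d:.]+)\s+([\d.]+)%", line)
    if m is None:
        raise ValueError(f"not a time line: {line!r}")
    # Elapsed is printed as [[H:]M:]S.ss; fold the parts into seconds.
    elapsed = 0.0
    for part in m.group(3).split(":"):
        elapsed = elapsed * 60 + float(part)
    return {
        "user": float(m.group(1)),
        "sys": float(m.group(2)),
        "elapsed": elapsed,
        "pct": float(m.group(4)),
    }

def cpu_percent(t):
    """CPU utilization implied by (user + sys) / elapsed, in percent."""
    return 100.0 * (t["user"] + t["sys"]) / t["elapsed"]

sample = "6011.630u 2329.279s 5:10.64 2685.0%"
t = parse_time(sample)
print(f"elapsed={t['elapsed']:.2f}s implied={cpu_percent(t):.1f}% reported={t['pct']}%")
```

    The implied and reported percentages agree to within rounding.  A value
    well above 100% simply means many hardware threads were busy at once,
    so it is a rough indicator of how much concurrency the build achieved.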
(buildworld 19-Sep-2018 gcc8)

CPU             Cores  Idle Full MemFreq     Timing
--------------- ------ ---- ---- ----------- ------------------------------------
TR2 2990WX (1)  32/64  65W  250W 2666C14ECC  6011.630u 2329.279s  5:10.64 2685.0%
TR2 2990WX (1)  32/64  85W  250W 3000        9205.971u 3567.632s  6:22.89 3336.1%
Zen 2700X (2)   8/16   40W  115W 2133-ECC    5725.612u 1591.252s  9:46.19 1248.2%
e5-2620v4 x 2   16/32  100W 175W 2133-ECC    9661.961u 2620.193s  9:59.62 2048.3%
Zen 2600        6/12   53W  130W 2133-ECC    5074.088u 1378.119s 10:48.68  994.6%
Zen 2400G       4/8    28W  80W  3000        5436.010u 1396.686s 15:56.51  714.3%
Zen 3500U       4/8    ---  ---  ---         7412.187u 1072.496s 19:49.89  713.0%
i5-6500         4/4    40W  ---W ?           4475.738u 1565.652s 28:17.13  355.9%
i5-7200U(laptp) 2/4    3W   20W  ?           5728.982u 1713.134s 32:54.55  376.9%
i3-4130         2/4    35W  56W  (DDR3)      6946.033u 2133.954s 42:18.03  357.7%
i5-5200U(BRIX)  2/4    9W   22W  2133        9073.578u 2816.897s 54:40.37  362.4%

(buildworld 15-Jun-2019 gcc8)

CPU             Cores  Idle Full MemFreq     Timing
--------------- ------ ---- ---- ----------- ------------------------------------
TR2 2990WX (1)  32/64  65W  250W 2400C14ECC  6139.443u 1682.077s  4:51.58 2682.4%
Zen 2700X       8/16   40W  ---- 2666C14     4852.887u  803.640s  8:00.75 1176.6%
Zen 2700X PBO   8/16   40W  195W 2666C14     4819.923u  765.378s  7:40.60 1212.6%
Zen 2700X PBO   8/16   40W  195W 2666C15     4834.637u  823.245s  7:44.94 1216.9%
Zen 2700X 3.6   8/16   40W  144W 2666C14     5111.739u  838.797s  8:07.81 1219.8%

(buildworld 11-Jul-2019 gcc8)

Zen 3600X       6/12   40W  125W 2933C16     3614.134u  575.678s  7:09.98  974.4%
Zen 3900X       12/24  46W  205W 3000C15     3587.725u  660.518s  4:28.75 1580.7%
Zen 3900X (5)   12/24  46W  150W 3000C15     3786.311u  662.455s  4:33.85 1624.5%
Zen 3900X (5)   12/24  46W  130W 3000C15     4058.894u  687.264s  4:47.56 1650.4%

(buildworld 12-Feb-2020)

CPU             Cores  Idle Full MemFreq     Timing
--------------- ------ ---- ---- ----------- ------------------------------------
TR3 3990X       64/128 100W 410W 2666C20ECC  3612.585u 2170.080s  2:59.97 3213.1%

(buildworld 28-Jul-2020)

i5-1035G1(lap)  4/8    ?    ?    ?           8724.921u 1265.721s 22:22.21  744.3%
4700U(lap)      8/8    ?    ?    ?           3870.234u 1254.148s 12:24.89  687.9%
4800H           8/16   ?    ?    ?           4538.554u  696.939s  6:56.02 1258.4%

(buildworld 22-Mar-2020)

Zen 3900X       12/24  ---  ---- 2133ECC     3531.192u  572.466s  5:20.24 1281.4%
Zen 5900X       12/24  ---  ---- 2666ECC     3228.211u  572.559s  4:34.34 1385.4%

(buildworld 30-May-2022)

Zen 5600G       6/12   25W  ---  2666        3643.955u  632.772s  7:50.27  909.4%

(buildworld 05-Jan-2023)

Zen 7900X       12/24  ---  ---- DDR5/64G    2803.383u  607.868s  4:11.14 1358.3%
Zen 7950X3D     16/32  ---  ---- DDR5/128G   2423.654u  452.168s  3:19.29 1443.0%


(buildkernel 19-Sep-2018 gcc8)
(20-Oct-2018 for 2666C14ECC test)

CPU             Cores  Idle Full MemFreq     Timing
--------------- ------ ---- ---- ----------- ------------------------------------
TR2 2990WX (1)  32/64  65W  250W 2666C14ECC   647.831u  109.343s  0:45.54 1662.6%
TR2 2990WX (1)  32/64  85W  250W 3000         806.621u  197.263s  0:51.80 1937.9%
Zen 2700X (2)   8/16   40W  115W 2133-ECC     569.925u   98.118s  1:15.57  883.9%
e5-2620v4 x 2   16/32  100W 175W 2133-ECC     754.642u  135.770s  1:23.71 1063.6%
Zen 2600        6/12   53W  130W 2133-ECC     502.161u   87.920s  1:22.61  714.2%
Zen 2400G       4/8    35W  80W  3000         527.056u   92.081s  1:51.38  555.8%
Zen 3500U       4/8    ---  ---  ---          730.303u   67.417s  2:12.37  602.6%
i5-6500         4/4    40W  ---W ?            359.409u   82.585s  2:29.03  296.5%
i5-7200U(laptp) 2/4    3W   20W  ?            578.944u  105.356s  3:27.98  329.0%
i3-4130         2/4    35W  56W  (DDR3)       569.165u  103.885s  4:11.75  267.3%
i5-5200U(BRIX)  2/4    9W   22W  2133         752.936u  137.936s  5:05.00  292.0%

(buildkernel 15-Jun-2019 gcc8)

CPU             Cores  Idle Full MemFreq     Timing
--------------- ------ ---- ---- ----------- ------------------------------------
TR2 2990WX (1)  32/64  65W  250W 2400C14ECC   637.602u   73.099s  0:48.56 1463.5%
Zen 2700X (2)   8/16   40W  ---- 2666C14      474.562u   48.915s  1:05.61  797.8%

(buildkernel 11-Jul-2019 gcc8)

Zen 3600X       6/12   40W  125W 2666C15/ECC  367.475u   34.788s  0:55.33  727.0%
Zen 3600X       6/12   40W  125W 2933C16      365.356u   35.650s  0:55.44  723.3%
Zen 3600X       6/12   40W  125W 3000C15      356.894u   32.793s  0:53.63  726.6%
Zen 3900X (5)   12/24  46W  205W 3000C15      368.736u   37.648s  0:39.68 1024.1%
Zen 3900X (5)   12/24  46W  140W 3000C15      400.252u   35.650s  0:44.09  988.6%
Zen 3900X (6)   12/24  46W  150W 3000C15      368.940u   38.703s  0:41.31  986.7%
TR2 2990WX (1)  32/64  65W  250W 2400C14ECC   651.720u   75.272s  0:40.39 1799.9%

(buildkernel 12-Feb-2020)

CPU             Cores  Idle Full MemFreq     Timing
--------------- ------ ---- ---- ----------- ------------------------------------
TR3 3990X       64/128 100W 410W 2666C20ECC   412.269u   70.163s  0:30.24 1595.3%

(buildkernel 28-Jul-2020)

i5-1035G1(lap)  4/8    ?    ?    ?            904.424u   81.832s  2:26.97  671.0%
4700U(lap)      8/8    ?    ?    ?            385.315u   86.409s  1:30.98  518.4%
4800H(lap)      8/16   ?    ?    ?            464.901u   51.436s  0:55.24  934.7%

(buildkernel 22-Mar-2020)

Zen 3900X       12/24  ---  ---- 2666ECC      389.074u   38.302s  0:41.39 1032.5%
Zen 3900X       12/24  ---  ---- 2933ECC      383.440u   35.953s  0:40.97 1023.6%
Zen 5900X       12/24  ---  ---- 2666ECC      354.309u   38.563s  0:34.03 1154.4%
Zen 5900X       12/24  ---  ---- 2933ECC      355.146u   34.045s  0:33.75 1153.1%

(buildkernel 06-Nov-2021)

Zen 3550H       4/8    ---  ---  ---          656.105u   66.461s  2:00.21  601.0%
Zen 5600G       6/12   25W  ---  2666         383.584u   43.684s  0:54.74  780.5%

(buildkernel 05-Jan-2023)

Zen 7900X       12/24  ---  ---- DDR5/64GB    234.508u   26.611s  0:24.16 1080.7%
Zen 7950X3D     16/32  ---  ---- DDR5/128G    264.692u   29.230s  0:24.82 1184.2%


(comp32x 19-Sep-2018 gcc8)
(20-Oct-2018 for 2666C14ECC test)

CPU             Cores  Idle Full MemFreq     Timing
--------------- ------ ---- ---- ----------- ------------------------------------
TR2 2990WX (1)  32/64  65W  250W 2666C14ECC  3573.039u  914.592s  1:13.36 6117.2%
TR2 2990WX (1)  32/64  85W  250W 3000(x4)    3789.265u 1133.939s  1:26.30 5704.7%
Opt6168x4 (4)   48/48  ?    1kW  (DDR3)      3793.625u  916.108s  1:47.07 4398.7%
e5-2620v4 x 2   16/32  100W 175W 2133-ECC    4004.389u  955.936s  2:47.80 2956.0%
Zen 2700X (2)   8/16   40W  115W 2133-ECC    3050.261u  705.751s  3:57.02 1584.6%
Zen 2600        6/12   53W  130W 2400-ECC    2671.949u  514.877s  4:57.91 1069.7%
Zen 2600        6/12   53W  130W 2133-ECC    2661.470u  579.514s  5:02.18 1072.5%
Zen 2400G       4/8    35W  80W  3000        2724.998u  552.647s  6:54.33  791.0%
i5-6500         4/4    40W  ---W ?           1564.546u  540.124s  8:53.57  394.4%
i3-4130         2/4    35W  56W  (DDR3)      2928.072u  614.627s 14:58.50  394.2%
i5-7200U(laptp) 2/4    3W   20W  ?           3060.005u  756.862s 16:00.64  397.3%
i5-5200U(BRIX)  2/4    9W   22W  2133        3932.795u  865.197s 20:12.45  395.7%

(11-Jul-2019)

Zen 3600X       6/12   40W  125W 2933        1937.696u  250.559s  3:19.30 1097.9%
Zen 3900X       12/24  46W  205W 3000C15     1923.960u  264.267s  1:46.42 2056.2%
Zen 3900X (5)   12/24  46W  130W 3000C15     2150.183u  274.704s  1:56.88 2074.6%
Zen 3900X (5)   12/24  46W  150W 3000C15     2049.746u  251.210s  1:51.75 2059.0%
Zen 3900X (5)   12/24  46W  90W  3000C15     3076.203u  359.454s  2:42.37 2115.9%

(12-Feb-2020)

TR3 3990X       64/128 100W 410W 2666C20ECC  1556.588u  321.780s  0:30.45 6168.6%

NOTE(1) - Power envelope explicitly limited to 250W via XFR2/PPT

NOTE(2) - Power envelope explicitly limited to 115W via XFR2/PPT

NOTE(4) - Keep in mind total time is per core, whereas on the TR2 and
          the dual xeon and the rest it's per-thread.  Normalizing the
          results, the TR2 is 6.6x more power efficient than the older
          quad opteron.  It has 1.6x the computing power on an absolute,
          whole system basis, using 1/4 the power.

NOTE(5) - The power envelope for the 3900X is explicitly limited to
          various values via PPT limiting in CBS/NBIO.  Note just how
          little it loses even when we shed 75W off of its power
          envelope.  This chip is insane.

NOTE(6) - Fixed 4.1 GHz VCORE 1.2345V.  Note low power consumption, it's
          still only 150W.

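    NOTE(5) can be made a little more concrete by multiplying the full-load
    wattage by the elapsed time for each 3900X comp32x run dated
    11-Jul-2019.  The Python sketch below is only an estimate: it assumes
    the "Full" wattage was drawn at the wall for the entire run, which the
    file does not state, and the helper names are made up for
    illustration.  The wattage and elapsed values are copied from the
    comp32x results above.

```python
def to_seconds(elapsed):
    """Convert an elapsed time printed as [[H:]M:]S.ss into seconds."""
    total = 0.0
    for part in elapsed.split(":"):
        total = total * 60 + float(part)
    return total

# (full-load watts, comp32x elapsed) from the 11-Jul-2019 Zen 3900X rows.
RUNS = [(205, "1:46.42"), (150, "1:51.75"), (130, "1:56.88"), (90, "2:42.37")]

def energy_rows(runs):
    """Return (watts, seconds, watt_seconds, slowdown_vs_first_run) per run."""
    base = to_seconds(runs[0][1])
    rows = []
    for watts, elapsed in runs:
        secs = to_seconds(elapsed)
        # Assumption: the machine draws `watts` for the whole run.
        rows.append((watts, secs, watts * secs, secs / base - 1.0))
    return rows

for watts, secs, ws, slow in energy_rows(RUNS):
    print(f"{watts:3d}W  {secs:7.2f}s  {ws:8.0f} Ws  {slow:+6.1%}")
```

    Under that assumption the 130W run takes about 10% longer than the
    205W run while the watts x seconds product drops by about 30%, which
    is the trade-off NOTE(5) is pointing at.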
RYZEN 3900X MEMORY SPEED TESTS WITH ECC

    faultzf 24  (all-cores zero-fill VM fault rate test)

    2133/ECC    @ 1.2V      2.7-2.8 MFault/s
    2400/ECC    @ 1.35V     3.2-3.3 MFault/s
    2666C15/ECC @ 1.35V     3.5-3.6 MFault/s


THREADRIPPER 2990WX POWER ENVELOPE TESTS with comp32x

(comp32x 19-Sep-2018 gcc8)

CPU             Cores  Idle Full MemFreq     Timing
--------------- ------ ---- ---- ----------- ------------------------------------
 23876 Ws  (3)         65W  127W 2133(x8)    9228.356u 2194.057s  3:07.81 6081.8%
>18240 Ws< (3)  <---   65W  160W 2133(x8)    5188.827u 1475.815s  1:53.93 5849.7%
 20492 Ws  (3)         65W  188W 2133(x4)    4431.668u 1539.502s  1:49.39 5458.5%
 21830 Ws  (3)         85W  185W 3000(x4)    5569.357u 1530.499s  1:57.95 6019.3%
 18816 Ws  (3)         65W  192W 2133(x8)    4059.762u 1393.962s  1:37.99 5565.5%
 19740 Ws  (3)         65W  210W 2133(x8)    3817.086u 1377.514s  1:33.77 5539.7%
 21500 Ws  (3)         85W  250W 3000(x4)    3789.265u 1133.939s  1:26.30 5704.7%

NOTE(3) - Additional tests with different power envelopes.  NON-ECC
          memory.  Maximum efficiency point is somewhere between 160W
          and 192W with all 8 dimm slots populated running at 2133 MHz.
          Best in this test is 18240 Watt-seconds @ 160W.

          Populating all 8 slots confers around a 10% advantage over
          just populating 4 slots in this test.


EFFECT OF RETPOLINE ON KERNEL GCC-8 -mindirect-branch=thunk-inline
16-May-2019

    We now build the kernel with retpoline, which protects against return
    stack buffer Spectre attacks.
    On modern CPUs

Xeon e5-2620v4 x 2
time make -j 32 nativekernel    (all tmpfs)

BEFORE  1718.550u 323.429s 2:26.26 1396.1%  9594+723k 200902+0io 4884pf+0w
BEFORE  1716.093u 339.106s 2:29.23 1377.1%  9564+720k 199780+0io 4818pf+0w
BEFORE  1725.135u 341.668s 2:29.26 1384.6%  9555+720k 199780+0io 4818pf+0w

AFTER   1720.271u 329.492s 2:28.27 1382.4%  9578+721k 200842+0io 4870pf+0w
AFTER   1736.268u 344.874s 2:30.90 1379.1%  9555+720k 199720+0io 4804pf+0w
AFTER   1726.056u 348.324s 2:31.14 1372.4%  9543+719k 199720+0io 4804pf+0w

Haswell i3-4130
time make -j 8 nativekernel     (all tmpfs)

BEFORE  1370.400u 315.587s 7:38.53  367.6%  9702+730k 201502+0io 5396pf+0w
BEFORE  1372.673u 324.329s 7:36.68  371.5%  9686+729k 199692+0io 4804pf+0w
BEFORE  1372.179u 323.946s 7:36.46  371.5%  9677+729k 199692+0io 4804pf+0w

AFTER   1383.109u 317.079s 7:40.24  369.4%  9691+730k 201500+0io 5382pf+0w
AFTER   1379.950u 334.446s 7:41.16  371.7%  9682+729k 199690+0io 4804pf+0w
AFTER   1375.151u 333.069s 7:42.39  369.4%  9665+728k 199692+0io 4804pf+0w

Ryzen 2400G
time make -j 16 nativekernel    (all tmpfs)

BEFORE  1338.987u 258.311s 4:03.43  656.1%  9785+736k 201506+0io 5420pf+0w
BEFORE  1345.617u 270.968s 4:06.20  656.6%  9762+734k 199702+0io 4804pf+0w
BEFORE  1345.131u 273.166s 4:06.99  655.2%  9754+733k 199704+0io 4804pf+0w

AFTER   1341.246u 261.299s 4:02.51  660.8%  9786+736k 201518+0io 5398pf+0w
AFTER   1340.052u 275.687s 4:08.17  651.0%  9736+732k 199704+0io 4804pf+0w
AFTER   1346.427u 276.280s 4:06.70  657.7%  9735+732k 199704+0io 4804pf+0w
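
    One way to summarize the retpoline BEFORE/AFTER runs is to compare the
    mean elapsed time per machine.  The Python sketch below does that with
    the elapsed values copied from the runs above.  It is an illustration
    only: the function names are made up, and with three runs per
    configuration the result is a rough indication rather than a
    statistically solid figure.

```python
from statistics import mean

def to_seconds(elapsed):
    """Convert an elapsed time printed as [[H:]M:]S.ss into seconds."""
    total = 0.0
    for part in elapsed.split(":"):
        total = total * 60 + float(part)
    return total

# Elapsed times of the nativekernel builds: (BEFORE runs, AFTER runs).
ELAPSED = {
    "Xeon e5-2620v4 x 2": (["2:26.26", "2:29.23", "2:29.26"],
                           ["2:28.27", "2:30.90", "2:31.14"]),
    "Haswell i3-4130":    (["7:38.53", "7:36.68", "7:36.46"],
                           ["7:40.24", "7:41.16", "7:42.39"]),
    "Ryzen 2400G":        (["4:03.43", "4:06.20", "4:06.99"],
                           ["4:02.51", "4:08.17", "4:06.70"]),
}

def overhead(before, after):
    """Relative change in mean elapsed time, AFTER versus BEFORE."""
    b = mean(to_seconds(x) for x in before)
    a = mean(to_seconds(x) for x in after)
    return (a - b) / b

for machine, (before, after) in ELAPSED.items():
    print(f"{machine:20s} {overhead(before, after):+.2%}")
```

    On these numbers the mean build time changes by roughly 0.1% to 1.3%
    depending on the machine, which is comparable to the spread between
    individual runs of the same configuration.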