DRAGONFLY NVME BENCHMARKS
2 x XEON 2620v4 (16 cores / 32 threads @ 2.1 GHz, turbo 3.0 GHz)
16 July 2016
Matthew Dillon

nvme0: Model SAMSUNG_MZVPV128HDGM-00000 BaseSerial S1XVNYAGA03031 nscount=1
nvme0: NVME Version 1.1 maxqe=16384 caps=00f000203c013fff
nvme0: mapped 9 MSIX IRQs
nvme0: Request 64/32 queues, Returns 8/8 queues, rw-sep map (8, 8)
nvme1: Model SAMSUNG_MZVPV128HDGM-00000 BaseSerial S1XVNYAGA02988 nscount=1
nvme1: NVME Version 1.1 maxqe=16384 caps=00f000203c013fff
nvme1: mapped 9 MSIX IRQs
nvme1: Request 64/32 queues, Returns 8/8 queues, rw-sep map (8, 8)
nvme2: Model INTEL_SSDPEDMW400G4 BaseSerial CVCQ535100LC400AGN nscount=1
nvme2: NVME Version 1.0 maxqe=4096 caps=0000002028010fff
nvme2: mapped 32 MSIX IRQs
nvme2: Request 64/32 queues, Returns 31/31 queues, rw-sep map (31, 31)

NOTES ON IOSTAT AND SYSTAT -pv 1 OUTPUT:

* The iostat output shows the block size, tps (IOPS), and throughput for
  each device under test.  'id' is the total-system idle percentage and is
  correct; I just noticed that it isn't reporting user 'us' time correctly.
  The idle time is verified by the systat -pv 1 output.  Each line is one
  second.

* The systat -pv 1 output breaks down system resources by cpu:

  timer    Timer interrupts/sec (lapic timer).

  ipi      IPIs/sec (general IPIs and invltlb/invlpg IPIs).  Most of the IPI
           traffic will be for pmap invalidations and wakeup()s.

  extint   External interrupts (from the NVMe and AHCI SSDs).  The NVMe
           devices use MSI-X, the AHCI devices use MSI.  Substantially all
           the extints you see are from the NVMe devices.  As you can see,
           we distribute interrupts between cpus fairly well.

  smpcol   SMP collisions (with -pv 1 this is per-second).  Any value less
           than 10000 or so is good.  The collision count can build very
           quickly since spin-locks increment the per-cpu value on each
           loop.

  label    Approximate reason for the SMP collisions, typically a lock name,
           spin-lock name, or procedure name.  (Heuristic: accurate if you
           stare at it for a few seconds, but not so much in a single
           snapshot.)

* The NVMe device spec supports up to 65535 queues and up to 65535 entries
  per queue, but NVMe devices are not likely to implement these limits.  The
  Samsungs only implement 8 queues, the Intel only 31 (not counting the
  admin queue).  Pretty stupid actually, since there's no need to use the
  interrupt mask registers with MSI-X.  But these are first-generation NVMe
  devices.

  The devices have generous queue entry limits (maxqe: 16384 for the
  Samsungs, 4096 for the Intel, per queue).  However, the driver I wrote for
  DFly currently sets an arbitrary 256-entry limit per queue.  This is more
  than enough, and the limit really has to be imposed in order to
  pre-allocate (for performance reasons) all necessary DMA descriptor
  buffers for the highest-possible transfer size per entry.  Otherwise the
  kernel would have to allocate an insane amount of ram that would mostly go
  unused.

  DragonFly itself implements fine-grained, per-cpu MSI-X support and thus
  has no significant MSI-X limitations.  My preference would be one read
  queue and one write queue per cpu per NVMe device, and it is really
  unfortunate that these first-generation devices do not have a generous
  number of queues or MSI-X vectors to make that possible.  For the Samsungs
  my DFly driver implements a combined R/W queue for each group of 4 cpu
  threads (8 queues, 32 cpu threads total).  For the Intel the driver is
  able to assign a unique queue to all but two cpus, with cpu #31 and cpu #0
  sharing a queue.
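
  For illustration only, here is a minimal sketch of that kind of
  cpu-to-queue assignment.  This is not the actual DFly nvme driver code;
  the function name and layout are made up, and only the queue counts come
  from the probe output above.

	/*
	 * Illustrative sketch, not the actual driver code.  With 32 cpus:
	 * 8 queues (Samsung) maps each group of 4 cpu threads onto one
	 * combined R/W queue; 31 queues (Intel) gives every cpu its own
	 * queue except cpu 0 and cpu 31, which share queue 0.
	 */
	static int
	cpu_to_ioqueue(int cpuid, int ncpus, int nqueues)
	{
		if (nqueues >= ncpus)		/* one queue per cpu */
			return (cpuid);
		if (ncpus % nqueues == 0)	/* 32 cpus / 8 queues -> groups of 4 */
			return (cpuid / (ncpus / nqueues));
		return (cpuid % nqueues);	/* 32 cpus, 31 queues -> cpu0 & cpu31 share */
	}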

* WARNING!  The performance of the Intel NVMe SSD is destroyed in
  larger-block tests when I use 65536-byte blocks, so for now I'm using
  32768 bytes for all large-block tests.  I consider this a serious bug in
  the Intel NVMe SSD chipset and hardware; it certainly isn't a bug in
  DragonFly.  The Samsungs don't have this problem.

LARGE BLOCK TEST BEFORE PHYSIO CHANGES (RESULTS ~SAME AFTER PHYSIO CHANGES TOO)

This is running two randreads per NVMe card x 3 cards, 32KB block size,
random seek, 32 threads each (64 threads per card, 192 threads total).  All
tests are on a 16GB partition via the raw device.  The partition is
completely filled with /dev/urandom data.

Example test line using randread from /usr/src/sys/test/sysperf (two of
these are run for each of nvme0, nvme1, and nvme2; a rough sketch of what
each randread process does follows the systat output below):

    randread /dev/nvme0s1b 4096 95 32

Aggregate throughput is around 5.0-5.1 GBytes/sec with the three NVMe
devices (NVMe driver), and 6.5 GBytes/sec if I throw in another four SATA
devices (AHCI driver).

 tty           nvme0              nvme1              nvme2            cpu
 tin tout  KB/t   tps    MB/s  KB/t   tps    MB/s  KB/t   tps    MB/s  us ni sy in id
   0   83 32.00 55135 1722.96 32.00 53221 1663.16 32.00 65507 2047.03   0  0 27  1 72
   0   79 32.00 55234 1726.07 32.00 53455 1670.51 32.00 64968 2030.23   0  0 26  1 73
   0   81 32.00 46728 1460.29 32.00 53323 1666.36 32.00 65009 2031.52   0  0 24  1 75
   0   79 32.00 47748 1492.08 32.00 52949 1654.67 32.00 64619 2019.33   0  0 23  1 76
   0   80 32.00 37377 1168.03 32.00 53044 1657.61 32.00 65646 2051.42   0  0 20  1 78
   0   79 32.00 48556 1517.37 32.00 53126 1660.18 32.00 65465 2045.71   0  0 25  1 74

        timer      ipi  extint  user%  sys%  intr%  idle%  smpcol  label
total          1439669  160903                               62132
cpu0      290    51350      15    0.7   20.8    1.7   76.9    1890  Xrelpbuf
cpu1      283    51882    5927    0.0   23.9    0.8   75.4    1934  Xrelpbuf
cpu2      283    59501    8848    0.0   26.9    0.8   72.3    2138  Xgetpbuf_kva
cpu3      282    44157    4956    0.0   23.9    1.5   74.6    1463  X_vm_page_queue
cpu4      283    62401    6184    0.8   31.6    2.3   65.4    2160  Xrelpbuf
cpu5      282    47571    4961    0.0   26.9    0.0   73.1    1758  Xgetpbuf_kva
cpu6      283    38836    5262    0.0   18.5    0.0   81.5    1551  X_vm_page_queue
cpu7      283    43718    5596    0.0   19.2    3.1   77.7    1324  X_vm_page_queue
cpu8      282    47511    6324    0.0   20.8    3.1   76.1    1945  Xrelpbuf
cpu9      282    44625    5106    0.0   22.3    0.0   77.7    2151  X_vm_page_queue
cpu10     282    50254    5860    0.0   23.1    1.5   75.4    2041  Xrelpbuf
cpu11     283    49734    7519    0.8   23.1    1.5   74.6    2200  Xrelpbuf
cpu12     283    50408    6840    0.0   16.9    1.5   81.5    2523  Xrelpbuf
cpu13     283    53601    9031    0.8   26.2    1.5   71.5    2154  X_vm_page_queue
cpu14     282    53460    7443    0.0   27.7    4.6   67.7    2520  X_vm_page_queue
cpu15     282    51619    6735    0.0   25.4    0.8   73.8    2035  Xrelpbuf
cpu16     282    49195   10131    0.0   22.3    0.8   76.9    1534  X_vm_page_queue
cpu17     283    48078    4358    0.8   14.6    0.8   83.8    1320  Xrelpbuf
cpu18     282    32368      27    0.0   18.5    0.0   81.5    1336  Xrelpbuf
cpu19     283    40742    4077    0.0   18.5    0.0   81.5    1477  Xrelpbuf
cpu20     283    44458    4131    0.0   26.2    0.0   73.8    1794  Xrelpbuf
cpu21     283    33963    3107    0.0   22.3    0.8   76.9    1208  Xgetpbuf_kva
cpu22     283    37800    3056    0.0   16.9    0.0   83.1    1676  Xrelpbuf
cpu23     284    37727    3050    0.0   24.6    0.0   75.4    1637  Xgetpbuf_kva
cpu24     284    35968    2057    0.0   13.9    0.0   86.1    1698  Xgetpbuf_kva
cpu25     284    46486    6153    0.0   20.8    0.0   79.2    2975  Xgetpbuf_kva
cpu26     282    39188    5121    2.3   21.6    0.8   75.4    2291  Xgetpbuf_kva
cpu27     284    34978    3112    0.0   18.5    0.8   80.8    2109  X_vm_page_queue
cpu28     282    34981    4503    0.8   13.9    2.3   83.1    2064  Xgetpbuf_kva
cpu29     283    36798    3089    1.5   20.8    0.8   76.9    2408  Xrelpbuf
cpu30     283    38774    4151    1.5   22.3    2.3   73.8    2612  Xgetpbuf_kva
cpu31     283    47537    4173    1.5   26.2    0.0   72.3    2206  Xgetpbuf_kva
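
As promised above, here is a rough sketch of the per-process access
pattern.  This is not the actual /usr/src/sys/test/sysperf randread source;
the worker function and its arguments are illustrative only, and I am
assuming the third argument (95) is a percentage-of-device limit and the
fourth (32) the process/thread count.  The idea is simply: open the raw
partition and loop forever issuing block-sized pread()s at random
block-aligned offsets within that limited portion of the device.

	/* Illustrative sketch only, not the real randread source. */
	#include <fcntl.h>
	#include <stdlib.h>
	#include <unistd.h>

	static void
	worker(const char *path, size_t blksize, off_t devsize, int limitpct)
	{
		char *buf;
		int fd = open(path, O_RDONLY);
		off_t nblocks = (devsize / 100 * limitpct) / blksize;

		/* page-aligned buffer for raw-device I/O */
		if (fd < 0 || posix_memalign((void **)&buf, 4096, blksize) != 0)
			return;
		for (;;) {
			off_t blkno = arc4random() % nblocks;
			pread(fd, buf, blksize, blkno * (off_t)blksize);
		}
	}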

SMALL BLOCK RANDREAD TEST BEFORE PHYSIO CHANGES

This is the 4K block size randread test.  Individually the cards can do
around 200K IOPS.  Currently all three together perform more poorly, only
160K IOPS in aggregate.  Roughly speaking the problem is associated with
the IPI rate, which tends to max out at around 150K IPIs/sec per cpu.
These IPIs are due to kernel_pmap manipulation related to the buffer cache;
pbuf lock contention also contributes.  As you can see, the cpus are fully
engaged (0% idle).

 tty           nvme0             nvme1             nvme2            cpu
 tin tout  KB/t   tps   MB/s  KB/t   tps   MB/s  KB/t   tps   MB/s  us ni sy in id
   0   74  4.00 46882 183.13  4.00 54310 212.14  4.00 56358 220.15   0  0 96  4  0
   0   73  4.00 47400 185.16  4.00 53260 208.04  4.00 56695 221.44   0  0 96  4  0
   0   72  4.00 48413 189.11  4.00 54059 211.18  4.00 54986 214.78   0  0 96  3  0
   0   69  4.00 47826 186.82  4.00 54065 211.19  4.00 55118 215.30   0  0 96  3  0

        timer      ipi  extint  user%  sys%  intr%  idle%  smpcol  label
total          4405203  139569                             1834874
cpu0      282   156766      10    1.5  94.7    3.8    0.0    50330  Xrelpbuf
cpu1      276   152035    4954    0.0  96.2    3.8    0.0    50976  Xgetpbuf_kva
cpu2      277   149827    7917    1.5  90.0    8.5    0.0    45292  Xnvqlk
cpu3      276   147195    5634    0.0  93.9    6.1    0.0    45478  Xrelpbuf
cpu4      276   146153    5970    2.3  95.4    2.3    0.0    46169  Xgetpbuf_kva
cpu5 ... cpu31

SMALL BLOCK RANDREAD TEST AFTER PHYSIO CHANGES

This is a 4K block size test after changes to the pbuf system used by
physio.  Most telling of these changes is that tests on each card no longer
interfere with each other, the IPI rate is way, way down, lock contention
is also way, way down, and the system is able to achieve these results
while maintaining 75% idle.  (A generic illustration of this kind of change
follows the systat output below.)

 tty            nvme0               nvme1               nvme2            cpu
 tin tout  KB/t    tps    MB/s  KB/t    tps    MB/s  KB/t    tps    MB/s  us ni sy in id
   0   69  4.00 274582 1072.58  4.00 274437 1072.01  4.00 381999 1492.17   1  0 19  4 76
   0   57  4.00 274659 1072.89  4.00 274340 1071.64  4.00 382228 1493.06   1  0 20  3 76
   0   65  4.00 274597 1072.65  4.00 274436 1072.01  4.00 380048 1484.54   1  0 18  3 78
   0   65  4.00 274373 1071.77  4.00 273993 1070.29  4.00 380542 1486.46   1  0 18  3 77
   0   59  4.00 274157 1070.93  4.00 274310 1071.53  4.00 380431 1486.04   1  0 18  4 77
   0   58  4.00 275083 1074.54  4.00 274157 1070.93  4.00 381172 1488.93   1  0 20  3 76

        timer      ipi  extint  user%  sys%  intr%  idle%  smpcol  label
total           151375  940226                                5034
cpu0      239    23672       8    0.8  10.7    1.6   86.9      573  Xnvqlk
cpu1      232     4933   37915    0.8  27.7    2.3   69.2       85  Xnvqlk
cpu2      232      393   51882    2.3  23.1    6.2   68.5      411  Xnvqlk
cpu3      232     4338   30340    2.3  13.8    3.8   80.0       18  Xgetpbuf_mem
cpu4      232     4930   40823    1.5  17.7    2.3   78.5       99  Xnvqlk
cpu5      232       59   40732    0.8  21.5    5.4   72.3       85  Xrelpbuf
cpu6      232     4989   33267    0.0  18.5    4.6   76.9       94  Xnvqlk
cpu7      232      103   45010    2.3  26.9    4.6   66.2      132  Xnvqlk
cpu8      232       25   27719    1.5  16.9    6.9   74.6       48  Xrelpbuf
cpu9      232        0   30153    0.0  11.5    1.5   86.9       27  Xgetpbuf_mem
cpu10     232     9887   37865    0.8  20.0    3.1   76.2      166  Xnvqlk
cpu11     232       86   41407    1.5  20.8    7.7   70.0      103  Xnvqlk
cpu12     232     4856   26126    0.8  13.1    2.3   83.8       15  Xrelpbuf
cpu13     232      303   38798    2.3  18.5    3.8   75.4      323  Xnvqlk
cpu14     233      170   34764    0.0  20.0    1.5   78.5      187  Xnvqlk
cpu15     232        0   31869    4.6  16.2    5.4   73.8       23  Xrelpbuf
cpu16     232      550   56215    2.3  30.0    2.3   65.4      575  Xnvqlk
cpu17     232     9143   21787    0.8  16.9    3.8   78.5       68  Xrelpbuf
cpu18     233    24311       0    0.8   9.2    0.0   90.0      550  Xnvqlk
cpu19     233      938   24132    0.8  16.9    2.3   80.0       29  Xnvqlk
cpu20     232     6374   12117    1.5   6.9    1.5   90.0      163  Xnvqlk
cpu21     232     2909   18207    0.0  10.8    1.5   87.7       86  Xnvqlk
cpu22     234     4073   18072    0.0  12.3    3.1   84.6       88  Xnvqlk
cpu23     232     5066   18194    0.8  14.6    0.0   84.6      148  Xnvqlk
cpu24     232        2   34357    1.5  16.9    3.8   77.7       27  Xrelpbuf
cpu25     232     4850   36118    0.0  20.0    3.8   76.2       20  Xrelpbuf
cpu26     232     4344   24183    0.8  15.4    0.8   83.1      189  Xnvqlk
cpu27     232     9331   24196    0.0  11.5    0.8   87.7      138  Xnvqlk
cpu28     232        0   30248    0.8  16.9    3.1   79.2       31  Xgetpbuf_mem
cpu29     232    11313   24211    0.8  18.5    5.4   75.4      310  Xnvqlk
cpu30     232     6779   19382    0.0  16.2    1.5   82.3      203  Xnvqlk
cpu31     232     2648   30129    0.8  17.7    3.1   78.5       20  Xrelpbuf
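
The illustration promised above.  This is NOT the actual DragonFly
pbuf/physio change; it is a generic userland sketch of the underlying
technique (a per-thread cache in front of a single locked pool), which is
the flavor of thing that turns heavy Xgetpbuf/Xrelpbuf collisions into the
mostly-quiet numbers shown above.  It does not model the kernel_pmap/IPI
side of the problem.

	/*
	 * Generic illustration only, not kernel code.  Every get/rel
	 * against a single locked pool serializes all cpus; a per-thread
	 * free list keeps the common case entirely local.
	 */
	#include <pthread.h>
	#include <stddef.h>

	struct pbuf { struct pbuf *next; };

	static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;
	static struct pbuf *pool_free;			/* shared pool (lock required) */
	static __thread struct pbuf *local_free;	/* per-thread cache (lock-free) */

	static struct pbuf *
	getpbuf(void)
	{
		struct pbuf *bp = local_free;

		if (bp) {				/* fast path: no shared state touched */
			local_free = bp->next;
			return (bp);
		}
		pthread_mutex_lock(&pool_lock);		/* slow path: hit the shared pool */
		if ((bp = pool_free) != NULL)
			pool_free = bp->next;
		pthread_mutex_unlock(&pool_lock);
		return (bp);				/* NULL: caller must wait and retry */
	}

	static void
	relpbuf(struct pbuf *bp)
	{
		bp->next = local_free;			/* release to the local cache */
		local_free = bp;
	}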

SMALL BLOCK RANDREAD TEST AFTER PHYSIO CHANGES 3 x NVMe + 4 x SATA

In this experiment I had three NVMe drives and four SATA drives, achieving
1.05M IOPS with random 4K reads, which left the machine 63% idle.  I need
to buy some more NVMe drives; I think this machine should be capable of
2M IOPS+ before it runs out of cpu.

I did have to make one change, which was to increase the number of
preallocated kernel pbufs from 256 to 512, since this test runs 320 user
process threads.  Without that change the IOPS rate becomes limited by
available pbufs in the kernel.  (A quick sanity check on these numbers
follows the systat output below.)

 tty          da0            da1            da2            da3            nvme0              nvme1              nvme2            cpu
 tin tout  KB/t   tps   MB/s  KB/t   tps   MB/s  KB/t   tps   MB/s  KB/t   tps   MB/s  KB/t    tps    MB/s  KB/t    tps    MB/s  KB/t    tps    MB/s  us ni sy in id
   0   73  4.00 57746 225.57  4.00 77116 301.23  4.00 29216 114.12  4.00 29486 115.18  4.00 274191 1071.05  4.00 274273 1071.38  4.00 380702 1487.08   2  0 31  6 62
   0   66  4.00 57763 225.64  4.00 76130 297.38  4.00 28992 113.25  4.00 29605 115.65  4.00 274923 1073.91  4.00 273980 1070.23  4.00 380305 1485.53   2  0 29  6 64
   0   65  4.00 57464 224.47  4.00 75386 294.48  4.00 28987 113.23  4.00 29475 115.14  4.00 274611 1072.71  4.00 273802 1069.53  4.00 379282 1481.51   2  0 29  5 65
   0   65  4.00 57989 226.52  4.00 74467 290.89  4.00 28832 112.63  4.00 29317 114.52  4.00 274584 1072.56  4.00 274162 1070.95  4.00 379796 1483.53   2  0 29  6 63
   0   66  4.00 57558 224.84  4.00 74587 291.36  4.00 28781 112.43  4.00 29491 115.20  4.00 273922 1070.00  4.00 273702 1069.14  4.00 379750 1483.33   2  0 31  6 61
   0   65  4.00 57662 225.24  4.00 75338 294.29  4.00 28795 112.48  4.00 29495 115.22  4.00 274470 1072.14  4.00 273672 1069.02  4.00 379529 1482.51   2  0 29  5 64

        timer      ipi  extint  user%  sys%  intr%  idle%  smpcol  label
total           990162  982494                              353883
cpu0    40413    87057   30428    3.8  40.0    4.6   51.5    24988  Xahcicam
cpu1      282   155197   39109    2.3  63.1   11.5   23.1    15485  Xahcicam
cpu2      282     4543   48734    4.6  30.0    4.6   60.8     2783  Xrelpbuf
cpu3      282   165138    8269    0.0  40.0   32.3   27.7    56868  Xahcipo
cpu4      282     1161   50861    0.8  31.5    5.4   62.3      212  Xnvqlk
cpu5      282     5746   48141    1.5  33.8    5.4   59.2     1255  Xahcicam
cpu6      282    26873   32241    0.8  27.7    6.9   64.6    15695  Xahcicam
cpu7      282    33200   25926    0.8  27.7    3.8   67.7    16462  Xahcicam
cpu8      282    10319   56108    2.3  29.2    8.5   60.0     1136  Xnvqlk
cpu9      282    10113   30148    0.8  21.5    4.6   73.1     4188  Xahcicam
cpu10     282     5200   42295    0.0  25.4    3.1   71.5      820  Xrelpbuf
cpu11     282     5701   38645    3.1  25.4    7.7   63.8     2433  Xgetpbuf_mem
cpu12     282     3154   52138    0.8  29.2    6.2   63.8      358  Xnvqlk
cpu13     282    13507   40507    3.1  31.5    3.8   61.5     1611  Xahcicam
cpu14     283    15422   33537    0.8  29.2    1.5   68.5     7283  Xahcicam
cpu15     282     9317   36373    0.8  30.8   10.0   58.5     2681  Xahcicam
cpu16     282     3391   43111    0.0  39.2    6.9   53.8     1582  Xahcicam
cpu17     282    32044   31724    1.5  36.2    3.8   58.5    16379  Xahcicam
cpu18     282    52803      26    0.0  30.0    0.0   70.0    37945  Xahcicam
cpu19     282    44982       0    0.0  14.6    0.0   85.4    26327  Xahcicam
cpu20     283    13102   29949    2.3  23.8    3.8   70.0     4404  Xahcicam
cpu21     282    37836   23980    2.3  26.2    4.6   66.9    18587  Xahcicam
cpu22     282    38797   17959    2.3  26.9    1.5   69.2    23421  Xahcicam
cpu23     282    33839   17971    0.8  26.9    3.8   68.5    21431  Xahcicam
cpu24     282    35791   12059    0.0  24.6    2.3   73.1     2292  Xahcicam
cpu25     284    18204   30009    1.5  26.9    2.3   69.2     3199  Xahcicam
cpu26     282    26933   23878    2.3  27.7    3.1   66.9    13126  Xahcicam
cpu27     283    18406   30063    2.3  30.8    1.5   65.4     5294  Xahcicam
cpu28     282    28561   15075    1.5  28.5    1.5   68.5     8788  Xahcicam
cpu29     283    10091   30173    1.5  23.8    2.3   72.3     2494  Xahcicam
cpu30     283    21472   35899    0.8  20.0    6.2   73.1     3660  Xahcicam
cpu31     283    22262   27158    0.8  34.6    5.4   59.2    10696  Xahcicam
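
The sanity check promised above.  One plausible breakdown of the 320
threads (an assumption on my part, since the split is not given): 64 reader
threads per NVMe device and 32 per SATA device.

    3 x 64 (NVMe) + 4 x 32 (SATA) = 192 + 128 = 320 reader threads

Each in-flight physio request holds one pbuf, so a default pool of 256
preallocated pbufs caps concurrency below the number of readers; raising it
to 512 leaves comfortable headroom.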

LARGE BLOCK RANDREAD TEST AFTER PHYSIO CHANGES 3 x NVMe + 4 x SATA

This test uses a 32KB block size, in deference to the Intel NVMe card which
has *HORRIBLE* performance with 64KB blocks (literally only 300 MBytes/sec
with 64KB blocks, versus 2 GBytes/sec with 32KB blocks).

In this test I am clearing 6.5 GBytes/sec using 32 user process threads per
device, except for the Intel (nvme2) where I use 64 user process threads.
(Since the Intel has a nearly 1:1 queue mapping with 31 queues, I need to
queue at least two read requests per queue to maximize performance.)

 tty          da0            da1           da2           da3            nvme0              nvme1              nvme2            cpu
 tin tout  KB/t   tps   MB/s  KB/t   tps   MB/s  KB/t  tps   MB/s  KB/t  tps   MB/s  KB/t   tps    MB/s  KB/t   tps    MB/s  KB/t   tps    MB/s  us ni sy in id
   0   71 32.00 16115 503.58 32.00 13834 432.30 32.00 6394 199.83 32.00 8560 267.51 32.00 55535 1735.47 32.00 53310 1665.97 32.00 64949 2029.63   0  0  9  2 89
   0   64 32.00 16086 502.68 32.00 13791 430.97 32.00 6394 199.80 32.00 8570 267.82 32.00 47155 1473.58 32.00 52977 1655.53 32.00 65181 2036.88   1  0  8  2 90
   0   57 32.00 16089 502.78 32.00 13839 432.47 32.00 6313 197.27 32.00 8520 266.26 32.00 47577 1486.78 32.00 53338 1666.80 32.00 65401 2043.78   0  0  8  2 90
   0   57 32.00 16057 501.78 32.00 13793 431.04 32.00 6318 197.43 32.00 8509 265.89 32.00 47063 1470.73 32.00 53184 1662.00 32.00 64865 2027.04   0  0  9  1 89
   0   62 32.00 16096 502.99 32.00 13838 432.44 32.00 6376 199.24 32.00 8490 265.32 32.00 47469 1483.39 32.00 53098 1659.32 32.00 65148 2035.88   0  0  8  2 90
   0   60 32.00 16095 502.97 32.00 13804 431.38 32.00 6399 199.95 32.00 8553 267.29 32.00 55278 1727.44 32.00 53178 1661.82 32.00 65247 2038.98   1  0  9  2 88

        timer      ipi  extint  user%  sys%  intr%  idle%  smpcol  label
total           215970  256043                               13758
cpu0      288    19934   32356    0.0  17.1    8.5   74.4       33  Xahcicam
cpu1    40621    29759   59232    0.0  35.7   11.6   52.7        0
cpu2      281     1503    7375    0.8  10.1    2.3   86.8       13  Xahcicam
cpu3    40622    90981       0    0.0   9.3   14.0   76.7    13147  Xahcipo
cpu4      281      976    9015    0.8  14.0    2.3   82.9        2  Xahcicam
cpu5      281     2529    7268    0.8   7.8    0.0   91.5        2  Xahcicam
cpu6      281      293    9673    0.0  14.0    0.8   85.3        1  Xahcicam
cpu7      281     3729    7292    0.0   7.8    0.8   91.5       25  Xahcicam
cpu8      281       15   15663    0.8  22.5    0.8   76.0        8  Xahcipo
cpu9      281     3042    2039    0.0   6.2    0.8   93.0       12  Xahcicam
cpu10     281     1454    6127    0.8   3.9    2.3   93.0        2  Xahcicam
cpu11     282      276    9367    0.0   9.3    0.8   89.9        1  Xrelpbuf
cpu12     281     1249    8713    0.0   9.3    1.6   89.1        0
cpu13     282    10141    8406    0.8  12.4    0.8   86.0        1  Xgetpbuf_mem
cpu14     282     2331    7778    0.0   7.0    1.6   91.5        3  Xahcicam
cpu15     283     2060    6671    2.3   8.5    1.5   87.6        1  Xrelpbuf
cpu16     282     2490    3669    1.6   4.7    0.8   93.0       17  Xahcicam
cpu17     281     2317   12099    0.0  12.4    1.6   86.0        1  Xgetpbuf_mem
cpu18     282     3717       0    0.0   3.9    0.0   96.1       39  Xahcicam
cpu19     282     4217    1037    0.0   7.7    0.0   92.3       52  Xrelpbuf
cpu20     281     1510    4079    0.0  10.1    0.8   89.1       11  Xahcicam
cpu21     281     3983    2075    0.0   4.7    0.0   95.3       43  Xahcicam
cpu22     281     1926    4218    0.0  11.6    0.0   88.4       12  Xahcicam
cpu23     281     2261    3051    0.8   6.2    0.0   93.0       13  Xahcicam
cpu24     281     3873    4067    0.0   6.2    0.0   93.8        2  Xrelpbuf
cpu25     281     5080    2073    0.0   8.5    0.0   91.5        6  Xnvqlk
cpu26     281     2089    3093    0.0   5.4    0.8   93.8        6  Xrelpbuf
cpu27     281     3569    3057    0.8   6.2    0.8   92.2        6  Xahcicam
cpu28     281     1776    2105    0.8   3.1    0.0   96.1      277  Xahcicam
cpu29     281     2450    4125    0.0   3.9    0.8   95.3       10  Xahcicam
cpu30     281     1689    4118    0.0   6.2    1.6   92.2       11  Xahcicam
cpu31     281     2751    6202    0.8   9.3    0.8   89.2        1  Xnvqlk
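
Two quick checks on the numbers above.  With 64 reader threads against
nvme2's 31 I/O queues, each queue carries roughly 64 / 31 ~= 2 commands in
flight, which is the reasoning for doubling the thread count on the Intel.
And summing the MB/s columns of the first iostat sample:

    504 + 432 + 200 + 268 + 1735 + 1666 + 2030 ~= 6835 MBytes/sec

i.e. about 6.8 GBytes/sec for that sample, consistent with the 6.5+
GBytes/sec aggregate figure quoted above.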