From: John Hubbard <jhubbard@nvidia.com>
To: Tom Talpey <tom@talpey.com>, <john.hubbard@gmail.com>,
	<linux-mm@kvack.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	LKML <linux-kernel@vger.kernel.org>,
	linux-rdma <linux-rdma@vger.kernel.org>,
	<linux-fsdevel@vger.kernel.org>
Subject: Re: [PATCH v2 0/6] RFC: gup+dma: tracking dma-pinned pages
Date: Wed, 21 Nov 2018 14:06:34 -0800	[thread overview]
Message-ID: <c1ba07d6-ebfa-ddb9-c25e-e5c1bfbecf74@nvidia.com> (raw)
In-Reply-To: <5159e02f-17f8-df8b-600c-1b09356e46a9@talpey.com>

On 11/21/18 8:49 AM, Tom Talpey wrote:
> On 11/21/2018 1:09 AM, John Hubbard wrote:
>> On 11/19/18 10:57 AM, Tom Talpey wrote:
>>> ~14000 4KB read IOPS is really, really low for an NVMe disk.
>>
>> Yes, but Jan Kara's original fio config file is *intended* to highlight
>> the get_user_pages()/put_user_pages() changes. It was *not* intended to get
>> maximum performance, as you can see from the numjobs and direct I/O parameters:
>>
>> cat fio.conf
>> [reader]
>> direct=1
>> ioengine=libaio
>> blocksize=4096
>> size=1g
>> numjobs=1
>> rw=read
>> iodepth=64
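
To spell out what this workload is meant to exercise: every direct-I/O read
pins the user buffer with gup, the device DMAs into it, and then the pages
get released again. Here is a rough sketch of that per-request pattern, using
the ~4.20-era get_user_pages_fast() signature and the put_user_pages()
placeholder from patch 2/6 of this series; the function name and error
handling below are purely illustrative, not lifted from any real caller:

#include <linux/mm.h>

/* Illustrative only: roughly what the direct I/O path does per request. */
static int pin_dma_release(unsigned long uaddr, int nr_pages,
                           struct page **pages)
{
        int pinned;

        /*
         * Pin the user pages backing the read buffer; write=1 because the
         * device is going to write into them.
         */
        pinned = get_user_pages_fast(uaddr, nr_pages, 1, pages);
        if (pinned <= 0)
                return pinned ? pinned : -EFAULT;

        /* ... submit the bio; the NVMe device DMAs into the pinned pages ... */

        /*
         * Baseline kernel: each page would be dropped with put_page().
         * Patched kernel: put_user_pages() drops them and also clears the
         * dma-pinned tracking state that this series adds.
         */
        put_user_pages(pages, pinned);

        return pinned;
}

The per-I/O cost difference between those two release paths is what the
baseline vs. patched comparison is supposed to isolate.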
> 
> To be clear - I used those identical parameters, on my lower-spec
> machine, and got 400,000 4KB read IOPS. Those results are nearly 30x
> higher than yours!

OK, then something really is wrong here...

> 
>> So I'm thinking that this is not a "tainted" test, but rather that we're
>> constraining things quite a lot with these choices. It's hard to find a test
>> config that supports clear decisions, but so far I'm not seeing anything that
>> says "this is so bad that we can't afford to fix the brokenness." I think.
> 
> I'm not suggesting we tune the benchmark; I'm suggesting that the results
> on your system are not meaningful, since they are orders of magnitude too
> low. And without meaningful data it's impossible to see the performance
> impact of the change...
> 
>>> Can you confirm what type of hardware you're running this test on?
>>> CPU, memory speed and capacity, and NVMe device especially?
>>>
>>> Tom.
>>
>> Yes, it's a nice new system; I don't expect any strange perf problems:
>>
>> CPU: Intel(R) Core(TM) i7-7800X CPU @ 3.50GHz
>>      (Intel X299 chipset)
>> Block device: nvme-Samsung_SSD_970_EVO_250GB
>> DRAM: 32 GB
> 
> The Samsung 970 EVO 250GB is spec'd to deliver 200,000 random read IOPS
> with a 4KB QD32 workload:
> 
> 
> https://www.samsung.com/us/computing/memory-storage/solid-state-drives/ssd-970-evo-nvme-m-2-250gb-mz-v7e250bw/#specs
> 
> And the i7-7800X is a 6-core processor (12 hyperthreads).
> 
>> So, here's a comparison using 20 threads, direct IO, for the baseline vs.
>> patched kernel (below). Highlights:
>>
>>     -- IOPS are similar, around 60k.
>>     -- BW gets worse, dropping from 290 to 220 MB/s.
>>     -- CPU is well under 100%.
>>     -- latency is incredibly long, but...20 threads.
>>
>> Baseline:
>>
>> $ ./run.sh
>> fio configuration:
>> [reader]
>> ioengine=libaio
>> blocksize=4096
>> size=1g
>> rw=read
>> group_reporting
>> iodepth=256
>> direct=1
>> numjobs=20
> 
> Ouch - 20 threads issuing 256 I/Os each!? Of course latency skyrockets.
> That's going to cause tremendous queuing and context switching, far
> outside of the get_user_pages() change.
> 
> But even so, it only brings IOPS to 74.2K, which is still far short of
> the device's 200K spec.
> 
> Comparing anyway:
> 
> 
>> Patched:
>>
>> -------- Running fio:
>> reader: (g=0): rw=read, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=256
>> ...
>> fio-3.3
>> Starting 20 processes
>> Jobs: 13 (f=8): [_(1),R(1),_(1),f(1),R(2),_(1),f(2),_(1),R(1),f(1),R(1),f(1),R(1),_(2),R(1),_(1),R(1)][97.9%][r=229MiB/s,w=0KiB/s][r=58.5k,w=0 IOPS][eta 00m:02s]
>> reader: (groupid=0, jobs=20): err= 0: pid=2104: Tue Nov 20 22:01:58 2018
>>     read: IOPS=56.8k, BW=222MiB/s (232MB/s)(20.0GiB/92385msec)
>> ...
>> Thoughts?
> 
> Concern - the 74.2K IOPS unpatched drops to 56.8K patched!

ACK. :)
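
For what it's worth, the patched numbers above are at least internally
consistent. Back-of-envelope only, from the fio parameters and the reported
IOPS:

   outstanding I/Os = numjobs * iodepth  = 20 * 256       = 5120
   bandwidth        = IOPS * blocksize   = 56.8k * 4 KiB ~= 232 MB/s (222 MiB/s)
   mean latency    ~= outstanding / IOPS = 5120 / 56.8k  ~= 90 ms (Little's law)

So the frightening latencies are exactly what this queue depth produces; the
thing that actually needs explaining is the IOPS drop itself.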

> 
> What I'd really like to see is a return to the original fio parameters
> (1 thread, iodepth=64), aiming for a result at least close to the spec'd
> 200K IOPS of the NVMe device. There seems to be something wrong with your
> setup currently.

I'll dig into what has gone wrong with the test. I see fio putting its data
files in the right place, so the obvious explanation, "using the wrong drive",
is (probably) not it, even though it really feels like that sort of thing.
We'll see.
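
One more back-of-envelope number, as a hint about where to look (assuming the
original single-job, iodepth=64 run and the device's 200K spec):

   expected:  64 in flight / 200k IOPS ~= 0.3 ms per 4 KB read
   observed:  64 in flight /  14k IOPS ~= 4.6 ms per 4 KB read
   but also:   1 in flight /  14k IOPS ~= 70 us per 4 KB read

That last line is suspiciously close to a plausible QD1 latency for this class
of NVMe device, so one guess is that the 64 I/Os are not actually being kept
in flight on my setup, rather than each read genuinely taking ~4.6 ms. I'll
check that first.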

> 
> Then, of course, rerun with the patched get_user_pages() and compare
> whichever of IOPS or CPU% changes, and by how much.
> 
> If these are within a few percent, I agree it's good to go. If it's
> roughly 25% like the result just above, that's a rocky road.
> 
> I can try this after the holiday on some basic hardware and might
> be able to scrounge up better. Can you post that github link?
> 

Here:

   git@github.com:johnhubbard/linux (branch: gup_dma_testing)
   (or https://github.com/johnhubbard/linux for anonymous cloning)


-- 
thanks,
John Hubbard
NVIDIA

Thread overview: 26+ messages
2018-11-10  8:50 [PATCH v2 0/6] RFC: gup+dma: tracking dma-pinned pages john.hubbard
2018-11-10  8:50 ` [PATCH v2 1/6] mm/gup: finish consolidating error handling john.hubbard
2018-11-12 15:41   ` Keith Busch
2018-11-12 16:14     ` Dan Williams
2018-11-15  0:45       ` John Hubbard
2018-11-10  8:50 ` [PATCH v2 2/6] mm: introduce put_user_page*(), placeholder versions john.hubbard
2018-11-11 14:10   ` Mike Rapoport
2018-11-10  8:50 ` [PATCH v2 3/6] infiniband/mm: convert put_page() to put_user_page*() john.hubbard
2018-11-10  8:50 ` [PATCH v2 4/6] mm: introduce page->dma_pinned_flags, _count john.hubbard
2018-11-10  8:50 ` [PATCH v2 5/6] mm: introduce zone_gup_lock, for dma-pinned pages john.hubbard
2018-11-10  8:50 ` [PATCH v2 6/6] mm: track gup pages with page->dma_pinned_* fields john.hubbard
2018-11-12 13:58   ` Jan Kara
2018-11-15  6:28   ` [LKP] [mm] 0e9755bfa2: kernel_BUG_at_include/linux/mm.h kernel test robot
2018-11-19 18:57 ` [PATCH v2 0/6] RFC: gup+dma: tracking dma-pinned pages Tom Talpey
2018-11-21  6:09   ` John Hubbard
2018-11-21 16:49     ` Tom Talpey
2018-11-21 22:06       ` John Hubbard [this message]
2018-11-28  1:21         ` Tom Talpey
2018-11-28  2:52           ` John Hubbard
2018-11-28 13:59             ` Tom Talpey
2018-11-30  1:39               ` John Hubbard
2018-11-30  2:18                 ` Tom Talpey
2018-11-30  2:21                   ` John Hubbard
2018-11-30  2:30                     ` Tom Talpey
2018-11-30  3:00                       ` John Hubbard
2018-11-30  3:14                         ` Tom Talpey
