From: Tom Talpey <tom@talpey.com>
To: John Hubbard <jhubbard@nvidia.com>,
	john.hubbard@gmail.com, linux-mm@kvack.org
Cc: Andrew Morton <akpm@linux-foundation.org>,
	LKML <linux-kernel@vger.kernel.org>,
	linux-rdma <linux-rdma@vger.kernel.org>,
	linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH v2 0/6] RFC: gup+dma: tracking dma-pinned pages
Date: Tue, 27 Nov 2018 20:21:51 -0500	[thread overview]
Message-ID: <15e4a0c0-cadd-e549-962f-8d9aa9fc033a@talpey.com> (raw)
In-Reply-To: <c1ba07d6-ebfa-ddb9-c25e-e5c1bfbecf74@nvidia.com>

On 11/21/2018 5:06 PM, John Hubbard wrote:
> On 11/21/18 8:49 AM, Tom Talpey wrote:
>> On 11/21/2018 1:09 AM, John Hubbard wrote:
>>> On 11/19/18 10:57 AM, Tom Talpey wrote:
>>>> ~14000 4KB read IOPS is really, really low for an NVMe disk.
>>>
>>> Yes, but Jan Kara's original config file for fio is *intended* to highlight
>>> the get_user_pages/put_user_pages changes. It was *not* intended to get max
>>> performance, as you can see from the numjobs and direct I/O parameters:
>>>
>>> cat fio.conf
>>> [reader]
>>> direct=1
>>> ioengine=libaio
>>> blocksize=4096
>>> size=1g
>>> numjobs=1
>>> rw=read
>>> iodepth=64
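
For anyone following along: the call-site pattern this direct-I/O config is
meant to stress is the pin/DMA/release cycle that the series changes. Below is
a minimal sketch of that cycle, using the put_user_page() call introduced as a
placeholder in patch 2 of the series. The helper name is made up purely for
illustration; the real path lives in the block layer's iov_iter/bio code.

/*
 * Sketch of the pin -> DMA -> release cycle that each direct-I/O read in
 * the fio job goes through.  Illustrative only; the real code path is in
 * the block layer (iov_iter / bio), not a standalone helper like this.
 */
#include <linux/mm.h>
#include <linux/slab.h>

static int dio_read_pin_release_sketch(unsigned long user_addr, int nr_pages)
{
	struct page **pages;
	int pinned, i;

	pages = kcalloc(nr_pages, sizeof(*pages), GFP_KERNEL);
	if (!pages)
		return -ENOMEM;

	/* Pin the user buffer; write=1 because the device DMAs into it. */
	pinned = get_user_pages_fast(user_addr, nr_pages, 1, pages);
	if (pinned < 0) {
		kfree(pages);
		return pinned;
	}

	/* ... hand the pages to the NVMe device and wait for the DMA ... */

	/*
	 * Release.  Before the series this was put_page(pages[i]); the
	 * series converts such call sites to put_user_page(), whose extra
	 * dma-pinned tracking is what these fio runs are trying to measure.
	 */
	for (i = 0; i < pinned; i++)
		put_user_page(pages[i]);

	kfree(pages);
	return 0;
}

With blocksize=4096, each in-flight read pins a single page (on a 4KB-page
system), so the put_user_page() cost shows up once per I/O completion.
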
>>
>> To be clear - I used those identical parameters, on my lower-spec
>> machine, and got 400,000 4KB read IOPS. Those results are nearly 30x
>> higher than yours!
> 
> OK, then something really is wrong here...
> 
>>
>>> So I'm thinking that this is not a "tainted" test, but rather that we're
>>> constraining things a lot with these choices. It's hard to find a test
>>> config that supports a clear decision, but so far I'm not seeing anything
>>> that says "this is so bad that we can't afford to fix the brokenness." I think.
>>
>> I'm not suggesting we tune the benchmark; I'm suggesting the results
>> on your system are not meaningful, since they are orders of magnitude
>> too low. And without meaningful data it's impossible to see the performance
>> impact of the change...
>>
>>>> Can you confirm what type of hardware you're running this test on?
>>>> CPU, memory speed and capacity, and NVMe device especially?
>>>>
>>>> Tom.
>>>
>>> Yes, it's a nice new system; I don't expect any strange perf problems:
>>>
>>> CPU: Intel(R) Core(TM) i7-7800X CPU @ 3.50GHz
>>>       (Intel X299 chipset)
>>> Block device: nvme-Samsung_SSD_970_EVO_250GB
>>> DRAM: 32 GB
>>
>> The Samsung 970 EVO 250GB is spec'd to yield 200,000 random read IOPS
>> with a 4KB QD32 workload:
>>
>>
>> https://www.samsung.com/us/computing/memory-storage/solid-state-drives/ssd-970-evo-nvme-m-2-250gb-mz-v7e250bw/#specs
>>
>> And the i7-7800X is a 6-core processor (12 hyperthreads).
>>
>>> So, here's a comparison using 20 threads, direct IO, for the baseline vs.
>>> patched kernel (below). Highlights:
>>>
>>>      -- IOPS are similar, around 60k.
>>>      -- BW gets worse, dropping from 290 to 220 MB/s.
>>>      -- CPU is well under 100%.
>>>      -- latency is incredibly long, but...20 threads.
>>>
>>> Baseline:
>>>
>>> $ ./run.sh
>>> fio configuration:
>>> [reader]
>>> ioengine=libaio
>>> blocksize=4096
>>> size=1g
>>> rw=read
>>> group_reporting
>>> iodepth=256
>>> direct=1
>>> numjobs=20
>>
>> Ouch - 20 threads issuing 256 I/Os each (5,120 requests outstanding)!?
>> Of course latency skyrockets. That's going to cause tremendous queuing
>> and context switching, far outside of the get_user_pages() change.
>>
>> But even so, it only brings IOPS to 74.2K, which is still far short of
>> the device's 200K spec.
>>
>> Comparing anyway:
>>
>>
>>> Patched:
>>>
>>> -------- Running fio:
>>> reader: (g=0): rw=read, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=256
>>> ...
>>> fio-3.3
>>> Starting 20 processes
>>> Jobs: 13 (f=8): [_(1),R(1),_(1),f(1),R(2),_(1),f(2),_(1),R(1),f(1),R(1),f(1),R(1),_(2),R(1),_(1),R(1)][97.9%][r=229MiB/s,w=0KiB/s][r=58.5k,w=0 IOPS][eta 00m:02s]
>>> reader: (groupid=0, jobs=20): err= 0: pid=2104: Tue Nov 20 22:01:58 2018
>>>      read: IOPS=56.8k, BW=222MiB/s (232MB/s)(20.0GiB/92385msec)
>>> ...
>>> Thoughts?
>>
>> Concern - the 74.2K IOPS unpatched drops to 56.8K patched!
> 
> ACK. :)
> 
>>
>> What I'd really like to see is to go back to the original fio parameters
>> (1 thread, iodepth 64) and try to get a result that comes at least close
>> to the spec'd 200K IOPS of the NVMe device. There seems to be something
>> wrong with your setup currently.
> 
> I'll dig into what has gone wrong with the test. I see fio putting data files
> in the right place, so the obvious "using the wrong drive" is (probably)
> not it. Even though it really feels like that sort of thing. We'll see.
> 
>>
>> Then, of course, gather the result with the patched get_user_pages, and
>> compare whichever of IOPS or CPU% changes, and by how much.
>>
>> If these are within a few percent, I agree it's good to go. If it's
>> roughly 25% like the result just above, that's a rocky road.
>>
>> I can try this after the holiday on some basic hardware, and might
>> be able to scrounge up something better. Can you post that github link?
>>
> 
> Here:
> 
>     git@github.com:johnhubbard/linux (branch: gup_dma_testing)

I'm very limited on hardware here this week and have not been able
to try testing with the patched kernel.

I was able to compare my earlier quick test on a Bionic 4.15 kernel
(400K IOPS) against a similar 4.20-rc3 kernel, and the rate dropped to
~375K IOPS, which I found somewhat troubling. But it was only a quick
test, and without your change.

Say, that branch reports it has not had a commit since June 30. Is that
the right one? What about gup_dma_for_lpc_2018?

Tom.

Thread overview: 26+ messages
2018-11-10  8:50 [PATCH v2 0/6] RFC: gup+dma: tracking dma-pinned pages john.hubbard
2018-11-10  8:50 ` [PATCH v2 1/6] mm/gup: finish consolidating error handling john.hubbard
2018-11-12 15:41   ` Keith Busch
2018-11-12 16:14     ` Dan Williams
2018-11-15  0:45       ` John Hubbard
2018-11-10  8:50 ` [PATCH v2 2/6] mm: introduce put_user_page*(), placeholder versions john.hubbard
2018-11-11 14:10   ` Mike Rapoport
2018-11-10  8:50 ` [PATCH v2 3/6] infiniband/mm: convert put_page() to put_user_page*() john.hubbard
2018-11-10  8:50 ` [PATCH v2 4/6] mm: introduce page->dma_pinned_flags, _count john.hubbard
2018-11-10  8:50 ` [PATCH v2 5/6] mm: introduce zone_gup_lock, for dma-pinned pages john.hubbard
2018-11-10  8:50 ` [PATCH v2 6/6] mm: track gup pages with page->dma_pinned_* fields john.hubbard
2018-11-12 13:58   ` Jan Kara
2018-11-15  6:28   ` [LKP] [mm] 0e9755bfa2: kernel_BUG_at_include/linux/mm.h kernel test robot
2018-11-19 18:57 ` [PATCH v2 0/6] RFC: gup+dma: tracking dma-pinned pages Tom Talpey
2018-11-21  6:09   ` John Hubbard
2018-11-21 16:49     ` Tom Talpey
2018-11-21 22:06       ` John Hubbard
2018-11-28  1:21         ` Tom Talpey [this message]
2018-11-28  2:52           ` John Hubbard
2018-11-28 13:59             ` Tom Talpey
2018-11-30  1:39               ` John Hubbard
2018-11-30  2:18                 ` Tom Talpey
2018-11-30  2:21                   ` John Hubbard
2018-11-30  2:30                     ` Tom Talpey
2018-11-30  3:00                       ` John Hubbard
2018-11-30  3:14                         ` Tom Talpey
