Date: Fri, 4 Aug 2017 17:17:40 +0900
From: Minchan Kim
To: Ross Zwisler
Cc: Jens Axboe, Jerome Marchand, linux-nvdimm@lists.01.org, Dave Chinner,
	linux-kernel@vger.kernel.org, Matthew Wilcox, Christoph Hellwig,
	Jan Kara, Andrew Morton, "karam . lee", seungho1.park@lge.com,
	Nitin Gupta
Subject: Re: [PATCH 0/3] remove rw_page() from brd, pmem and btt
Message-ID: <20170804081740.GA2083@bbox>
References: <20170728165604.10455-1-ross.zwisler@linux.intel.com>
	<20170728173143.GE15980@bombadil.infradead.org>
	<20170802221359.GA20666@linux.intel.com>
	<20170803001315.GF32020@bbox>
	<20170803211335.GA1260@linux.intel.com>
	<20170804035441.GA305@bbox>
In-Reply-To: <20170804035441.GA305@bbox>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Disposition: inline

On Fri, Aug 04, 2017 at 12:54:41PM +0900, Minchan Kim wrote:
> On Thu, Aug 03, 2017 at 03:13:35PM -0600, Ross Zwisler wrote:
> > On Thu, Aug 03, 2017 at 09:13:15AM +0900, Minchan Kim wrote:
> > > Hi Ross,
> > >
> > > On Wed, Aug 02, 2017 at 04:13:59PM -0600, Ross Zwisler wrote:
> > > > On Fri, Jul 28, 2017 at 10:31:43AM -0700, Matthew Wilcox wrote:
> > > > > On Fri, Jul 28, 2017 at 10:56:01AM -0600, Ross Zwisler wrote:
> > > > > > Dan Williams and Christoph Hellwig have recently expressed doubt
> > > > > > about whether the rw_page() interface made sense for synchronous
> > > > > > memory drivers [1][2].  It's unclear whether this interface has
> > > > > > any performance benefit for these drivers, but as we continue to
> > > > > > fix bugs it is clear that it does have a maintenance burden.  This
> > > > > > series removes the rw_page() implementations in brd, pmem and btt
> > > > > > to relieve this burden.
> > > > >
> > > > > Why don't you measure whether it has performance benefits?  I don't
> > > > > understand why zram would see performance benefits and not other
> > > > > drivers.  If it's going to be removed, then the whole interface
> > > > > should be removed, not just have the implementations removed from
> > > > > some drivers.
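[For readers who don't live in the block layer: the hook being argued over is
the ->rw_page() entry in struct block_device_operations, which lets a driver
service a single page synchronously without building a bio.  The sketch below
shows roughly how the swap path reaches it, assuming the ~v4.13 block layer;
the helper name sketch_read_page() and its trimmed-down body are illustrative,
not the exact mainline bdev_read_page().]

#include <linux/blkdev.h>
#include <linux/genhd.h>

/*
 * The hook itself, as it looked around v4.13:
 *
 *     int (*rw_page)(struct block_device *bdev, sector_t sector,
 *                    struct page *page, bool is_write);
 *
 * Simplified read-side caller; error handling and request_queue
 * refcounting are dropped for brevity.
 */
static int sketch_read_page(struct block_device *bdev, sector_t sector,
			    struct page *page)
{
	const struct block_device_operations *ops = bdev->bd_disk->fops;

	if (!ops->rw_page)
		return -EOPNOTSUPP;	/* caller falls back to a normal bio */

	/* No bio, no request: the driver moves the page synchronously. */
	return ops->rw_page(bdev, sector + get_start_sect(bdev), page, false);
}

The argument in this thread is whether skipping the bio path like this buys
pmem/btt enough to justify keeping a second I/O path in every driver.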
> > > >
> > > > Okay, I've run a bunch of performance tests with the PMEM and with
> > > > BTT entry points for rw_page() in a swap workload, and in all cases I
> > > > do see an improvement over the code when rw_page() is removed.  Here
> > > > are the results from my random lab box:
> > > >
> > > > Average latency of swap_writepage()
> > > > +------+------------+---------+-------------+
> > > > |      | no rw_page | rw_page | Improvement |
> > > > +-------------------------------------------+
> > > > | PMEM |   5.0 us   | 4.7 us  |      6%     |
> > > > +-------------------------------------------+
> > > > | BTT  |   6.8 us   | 6.1 us  |     10%     |
> > > > +------+------------+---------+-------------+
> > > >
> > > > Average latency of swap_readpage()
> > > > +------+------------+---------+-------------+
> > > > |      | no rw_page | rw_page | Improvement |
> > > > +-------------------------------------------+
> > > > | PMEM |   3.3 us   | 2.9 us  |     12%     |
> > > > +-------------------------------------------+
> > > > | BTT  |   3.7 us   | 3.4 us  |      8%     |
> > > > +------+------------+---------+-------------+
> > > >
> > > > The workload was pmbench, a memory benchmark, run on a system where I
> > > > had severely restricted the amount of memory in the system with the
> > > > 'mem' kernel command line parameter.  The benchmark was set up to test
> > > > more memory than I allowed the OS to have so it spilled over into swap.
> > > >
> > > > The PMEM or BTT device was set up as my swap device, and during the
> > > > test I got a few hundred thousand samples of each of swap_writepage()
> > > > and swap_readpage().  The PMEM/BTT device was just memory reserved with
> > > > the memmap kernel command line parameter.
> > > >
> > > > Thanks, Matthew, for asking for performance data.  It looks like
> > > > removing this code would have been a mistake.
> > >
> > > At Christoph Hellwig's suggestion, I made a quick patch which does IO
> > > without dynamic bio allocation for swap IO.  It's not a formal patch
> > > worth sending to mainline yet, but I believe it's enough to test the
> > > improvement.
> > >
> > > Could you test the patchset on pmem and btt without rw_page?
> > >
> > > For the patch to work, block drivers need to declare that they are
> > > synchronous IO devices via BDI_CAP_SYNC, but if that's hard you can just
> > > force every swap IO down the (sis->flags & SWP_SYNC_IO) path by removing
> > > the "if (!(sis->flags & SWP_SYNC_IO))" check in swap_[read|write]page.
> > >
> > > The patchset is based on 4.13-rc3.
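[The quoted idea above amounts to building the bio on the stack for devices
that complete I/O synchronously, so the swap path never waits on the bio
mempool.  A minimal sketch of that pattern against the ~v4.13 bio API follows;
it is not the actual patchset, and BDI_CAP_SYNC / SWP_SYNC_IO come from that
unposted RFC rather than mainline.]

#include <linux/bio.h>
#include <linux/blkdev.h>

/*
 * On-stack bio for a synchronous swap device: no bio_alloc(), no
 * mempool pressure, no completion callback -- just submit and wait.
 * Field names follow ~v4.13 (bio->bi_bdev was later replaced by
 * bio_set_dev()).
 */
static int sync_swap_rw(struct block_device *bdev, sector_t sector,
			struct page *page, bool is_write)
{
	struct bio bio;
	struct bio_vec bvec;

	bio_init(&bio, &bvec, 1);	/* bio and its single bvec live on the stack */
	bio.bi_bdev = bdev;
	bio.bi_iter.bi_sector = sector;
	bio_add_page(&bio, page, PAGE_SIZE, 0);
	bio_set_op_attrs(&bio, is_write ? REQ_OP_WRITE : REQ_OP_READ, REQ_SYNC);

	/* Waiting inline is only sane because the device (pmem, zram, btt)
	 * completes synchronously -- the property BDI_CAP_SYNC advertises
	 * in the RFC. */
	return submit_bio_wait(&bio);
}

Unlike rw_page(), this still goes through the generic submit_bio() machinery,
so the per-driver hook disappears but some generic setup cost remains; that
difference is roughly what the follow-up numbers below are probing.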
> >
> > Thanks for the patch, here are the updated results from my test box:
> >
> > Average latency of swap_writepage()
> > +------+------------+---------+---------+
> > |      | no rw_page | minchan | rw_page |
> > +----------------------------------------
> > | PMEM |   5.0 us   | 4.98 us |  4.7 us |
> > +----------------------------------------
> > | BTT  |   6.8 us   | 6.3 us  |  6.1 us |
> > +------+------------+---------+---------+
> >
> > Average latency of swap_readpage()
> > +------+------------+---------+---------+
> > |      | no rw_page | minchan | rw_page |
> > +----------------------------------------
> > | PMEM |   3.3 us   | 3.27 us |  2.9 us |
> > +----------------------------------------
> > | BTT  |   3.7 us   | 3.44 us |  3.4 us |
> > +------+------------+---------+---------+
> >
> > I've added another digit of precision in some cases to help differentiate
> > the various results.
> >
> > In all cases your patches did perform better than with the regularly
> > allocated BIO, but in all cases the rw_page() path was still the fastest,
> > even if only marginally.
>
> Thanks for the testing.  Are your numbers within the noise level?
>
> I can't understand why PMEM doesn't see much gain while BTT is a significant
> win (8%).  I guess the no-rw_page BTT runs had more chances to wait on
> dynamic bio allocation, and both my patch and rw_page reduced that
> significantly.  However, in the no-rw_page PMEM run there weren't many waits
> on bio allocation because the device is so fast, so the difference comes
> purely from the number of instructions executed.  At a quick glance, bio
> init/submit is not trivial, so I do understand where the 12% improvement
> comes from, but I'm not sure it is a really big difference in practice
> against the cost of the maintenance burden.

I tested pmbench 10 times on my local machine (4 cores) with zram as the swap
device.  On my machine the on-stack bio is even faster than rw_page.
Unbelievable.  I guess it's really hard to get a stable result under severe
memory pressure; the difference is probably within the noise level (see the
stddev below).  So I think it's hard to conclude that rw_page is far faster
than the on-stack bio.

rw_page
        avg     5.54 us
        stddev  8.89%
        max     6.02 us
        min     4.20 us

on-stack bio
        avg     5.27 us
        stddev  13.03%
        max     5.96 us
        min     3.55 us
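[Taking the stddev figures as a fraction of the averages, one standard
deviation around each mean works out roughly to:

    rw_page:       5.54 us * (1 +/- 0.0889)  ~=  5.05 .. 6.03 us
    on-stack bio:  5.27 us * (1 +/- 0.1303)  ~=  4.58 .. 5.96 us

The two ranges overlap almost completely, which is consistent with the
"within the noise" reading above.  The per-run samples aren't in the mail, so
this is only a rough sanity check, not a significance test.]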