From: Dan Williams
Date: Fri, 18 May 2018 15:10:04 -0700
Subject: Re: dm-writecache
References: <20180308145153.GB23262@infradead.org>
            <20180518154454.GA4902@redhat.com>
To: Mikulas Patocka
Cc: Christoph Hellwig, device-mapper development,
    "Alasdair G. Kergon", Mike Snitzer, linux-nvdimm

On Fri, May 18, 2018 at 3:00 PM, Mikulas Patocka wrote:
>
>
> On Fri, 18 May 2018, Dan Williams wrote:
>
>> >> ...and I wonder what the benefit is of the 16-byte case? I would
>> >> assume the bulk of the benefit is limited to the 4 and 8 byte copy
>> >> cases.
>> >
>> > dm-writecache uses 16-byte writes frequently, so it is needed for that.
>> >
>> > If we split a 16-byte write into two 8-byte writes, it would degrade
>> > performance on architectures where memcpy_flushcache needs to flush the
>> > cache.
>>
>> My question was: how measurable is it to special-case 16-byte
>> transfers? I know Ingo is going to ask this question, so it would
>> speed things along if this patch included performance benefit numbers
>> for each special case in the changelog.
>
> I tested it some time ago - and the movnti instruction has 2% better
> throughput than the existing memcpy_flushcache function.
>
> It does one 16-byte write for every sector written and one 8-byte
> write for every sector clean-up, so the overhead is measurable.

Awesome, please include those measured numbers in the changelog for the
next spin of the patch.
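[Editor's note: for context, the special-casing discussed in this thread
looks roughly like the sketch below, for x86-64: compile-time-constant
4/8/16-byte copies become movnti non-temporal stores, and everything else
falls through to the out-of-line copy. This is a reconstruction for
illustration, not a quote of the patch Mikulas posted; the fallback name
__memcpy_flushcache and the exact set of special-cased sizes are assumptions
based on the x86 memcpy_flushcache code of that era.]

#include <linux/types.h>
#include <linux/compiler.h>

/* Out-of-line general-purpose copy; flushes caches where required. */
void __memcpy_flushcache(void *dst, const void *src, size_t cnt);

static __always_inline void memcpy_flushcache(void *dst, const void *src,
					      size_t cnt)
{
	/* Only sizes known at compile time take the inline fast path. */
	if (__builtin_constant_p(cnt)) {
		switch (cnt) {
		case 4:	/* single 32-bit non-temporal store */
			asm ("movntil %1, %0"
			     : "=m" (*(u32 *)dst)
			     : "r" (*(const u32 *)src));
			return;
		case 8:	/* single 64-bit non-temporal store */
			asm ("movntiq %1, %0"
			     : "=m" (*(u64 *)dst)
			     : "r" (*(const u64 *)src));
			return;
		case 16:
			/*
			 * The 16-byte case dm-writecache hits on every
			 * sector write: two back-to-back 64-bit movnti
			 * stores instead of the general-purpose copy.
			 */
			asm ("movntiq %1, %0"
			     : "=m" (*(u64 *)dst)
			     : "r" (*(const u64 *)src));
			asm ("movntiq %1, %0"
			     : "=m" (*(u64 *)((char *)dst + 8))
			     : "r" (*(const u64 *)((const char *)src + 8)));
			return;
		}
	}
	__memcpy_flushcache(dst, src, cnt);
}

[The point of movnti here is that a non-temporal store bypasses the CPU
cache entirely, so the data heads straight toward memory and persistence
needs only a later sfence rather than an explicit cache-line flush -- which
is why a one- or two-instruction special case can beat the general
memcpy_flushcache path for these tiny, frequent writes.]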