Date: Thu, 1 Mar 2018 16:20:38 -0700
From: Jason Gunthorpe
To: Stephen Bates
Cc: Logan Gunthorpe, Sagi Grimberg, linux-kernel@vger.kernel.org,
    linux-pci@vger.kernel.org, linux-nvme@lists.infradead.org,
    linux-rdma@vger.kernel.org, linux-nvdimm@lists.01.org,
    linux-block@vger.kernel.org, Christoph Hellwig, Jens Axboe,
    Keith Busch, Bjorn Helgaas, Max Gurtovoy, Dan Williams,
    Jérôme Glisse, Benjamin Herrenschmidt, Alex Williamson, Steve Wise
Subject: Re: [PATCH v2 10/10] nvmet: Optionally use PCI P2P memory
Message-ID: <20180301232038.GO19007@ziepe.ca>
References: <20180228234006.21093-1-logang@deltatee.com>
 <20180228234006.21093-11-logang@deltatee.com>
 <749e3752-4349-0bdf-5243-3d510c2b26db@grimberg.me>
 <40d69074-31a8-d06a-ade9-90de7712c553@deltatee.com>
 <5649098f-b775-815b-8b9a-f34628873ff4@grimberg.me>
 <20180301184249.GI19007@ziepe.ca>
 <20180301224540.GL19007@ziepe.ca>
 <77591162-4CCD-446E-A27C-1CDB4996ACB7@raithlin.com>
In-Reply-To: <77591162-4CCD-446E-A27C-1CDB4996ACB7@raithlin.com>

On Thu, Mar 01, 2018 at 11:00:51PM +0000, Stephen Bates wrote:
> > Seems like a very subtle and hard to debug performance trap to leave
> > for the users, and pretty much the only reason to use P2P is
> > performance... So why have such a dangerous interface?
>
> P2P is about offloading the memory and PCI subsystem of the host CPU
> and this is achieved no matter which p2p_dev is used.

No, locality matters. If you have a bunch of NICs and a bunch of drives,
and the allocator chooses to put all the P2P memory on a single drive,
your performance will suck horribly even if all the traffic is offloaded.

Performance will also suck if there are speed differences between the
PCI-E devices. E.g., a bunch of Gen 3 x8 NVMe cards paired with a Gen 4
x16 NIC will not reach full performance unless the p2p buffers are
properly balanced across all the cards.

This is why putting everything into one big pool and pretending any P2P
memory is interchangeable with any other makes zero sense to me; that is
not how the system actually works. Proper locality matters a lot.

Jason
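P.S. Purely to illustrate the kind of balancing I mean (this is not code
from Logan's series; the struct, the distance metric, and all the names
below are invented for the example), a locality-aware selection would
look roughly like this standalone user-space sketch:

        /*
         * Illustrative sketch only: pick a p2p memory provider for a
         * NIC/NVMe pair by preferring the provider closest to both
         * endpoints, instead of handing out buffers from one global
         * pool. Real kernel code would derive the distance from the
         * actual PCI topology; here a shared switch port id stands in
         * for it.
         */
        #include <stdio.h>
        #include <limits.h>

        struct p2p_provider {
                const char *name;
                int upstream_port;      /* hypothetical topology hint */
                unsigned long free;     /* bytes of p2p memory left */
        };

        /* Smaller is better: 0 when under the same switch port. */
        static int distance(const struct p2p_provider *p, int client_port)
        {
                return p->upstream_port == client_port ? 0 : 2;
        }

        static struct p2p_provider *
        pick_provider(struct p2p_provider *provs, int n,
                      int nic_port, int nvme_port, unsigned long need)
        {
                struct p2p_provider *best = NULL;
                int best_dist = INT_MAX;

                for (int i = 0; i < n; i++) {
                        int d = distance(&provs[i], nic_port) +
                                distance(&provs[i], nvme_port);

                        if (provs[i].free < need)
                                continue;
                        /*
                         * Prefer the closest provider; break ties toward
                         * the one with more free memory so the buffers
                         * spread across cards instead of piling onto one.
                         */
                        if (d < best_dist ||
                            (d == best_dist && best &&
                             provs[i].free > best->free)) {
                                best = &provs[i];
                                best_dist = d;
                        }
                }
                return best;
        }

        int main(void)
        {
                struct p2p_provider provs[] = {
                        { "cmb-nvme0", 1, 1 << 20 },
                        { "cmb-nvme1", 2, 8 << 20 },
                };
                /*
                 * NIC and target NVMe both behind port 2: cmb-nvme1 wins,
                 * while a single-pool allocator might keep handing out
                 * cmb-nvme0 regardless of where the traffic flows.
                 */
                struct p2p_provider *p = pick_provider(provs, 2, 2, 2, 4096);

                printf("chose %s\n", p ? p->name : "none");
                return 0;
        }

The point of the tie-break is the balancing argument above: even between
equally "close" providers, spreading allocations by remaining capacity
keeps one card's link from becoming the bottleneck for every transfer.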