Subject: Re: [RFC 6/8] nvmet: Be careful about using iomem accesses when dealing with p2pmem
To: Jason Gunthorpe
References: <1490911959-5146-1-git-send-email-logang@deltatee.com>
 <1490911959-5146-7-git-send-email-logang@deltatee.com>
 <080b68b4-eba3-861c-4f29-5d829425b5e7@grimberg.me>
 <20170404154629.GA13552@obsidianresearch.com>
Cc: Logan Gunthorpe, Christoph Hellwig, "James E.J. Bottomley",
 "Martin K. Petersen", Jens Axboe, Steve Wise, Stephen Bates,
 Max Gurtovoy, Dan Williams, Keith Busch, linux-pci@vger.kernel.org,
 linux-scsi@vger.kernel.org, linux-nvme@lists.infradead.org,
 linux-rdma@vger.kernel.org, linux-nvdimm@ml01.01.org,
 linux-kernel@vger.kernel.org
From: Sagi Grimberg
Message-ID: <4df229d8-8124-664a-9bc4-6401bc034be1@grimberg.me>
Date: Thu, 6 Apr 2017 08:33:38 +0300
In-Reply-To: <20170404154629.GA13552@obsidianresearch.com>

>> Note that the nvme completion queues are still on the host memory, so
>> this means we have lost the ordering between data and completions as
>> they go to different pcie targets.
>
> Hmm, in this simple up/down case with a switch, I think it might
> actually be OK.
>
> Transactions might not complete at the NVMe device before the CPU
> processes the RDMA completion, however due to the PCI-E ordering rules
> new TLPs directed to the NVMe will complete after the RDMA TLPs and
> thus observe the new data. (eg order preserving)
>
> It would be very hard to use P2P if fabric ordering is not preserved..

I think it can still race if the p2p device is connected to the switch
with more than a single port.

Say it's connected via 2 legs, the BAR is accessed from leg A and the
data from the disk comes via leg B. In this case, the data is heading
towards the p2p device via leg B (which might be congested), the
completion goes directly to the RC, and then the host issues a read from
the BAR via leg A. I don't understand what can guarantee ordering here.

Stephen told me that this still guarantees ordering, but I honestly
can't understand how. Perhaps someone can explain it to me in a simple
way that I can understand.
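
To make the two-leg scenario concrete, here is a toy timing model. It is
only a sketch and does not model the actual PCIe ordering rules: it
assumes that nothing orders traffic on leg B against traffic on leg A,
which is exactly the point in question. The function name and the
latency values are made up for illustration.

```python
# Toy model of the two-leg race. All latencies are arbitrary time units
# and purely hypothetical; no real hardware behaviour is implied.

def host_read_sees_new_data(leg_b_delay, completion_delay, leg_a_delay):
    """Return True if the host's BAR read observes the disk's data.

    leg_b_delay:      time for the disk's data write to reach the p2p
                      device via leg B (large if leg B is congested)
    completion_delay: time for the NVMe completion to reach the host
                      through the RC
    leg_a_delay:      time for the host's read to reach the p2p device
                      via leg A
    """
    # Assumption: the two legs are independent paths with no ordering
    # enforced between them.
    data_lands_at = leg_b_delay
    read_arrives_at = completion_delay + leg_a_delay
    return data_lands_at <= read_arrives_at


if __name__ == "__main__":
    # Uncongested leg B: the data lands before the host's read arrives.
    print(host_read_sees_new_data(leg_b_delay=1, completion_delay=2, leg_a_delay=1))
    # Congested leg B: the read overtakes the data and sees stale contents.
    print(host_read_sees_new_data(leg_b_delay=10, completion_delay=2, leg_a_delay=1))
```

If some ordering rule does tie the completion to the data write even
across the two legs, then the independence assumption in this model is
wrong, and that is the part I would like explained.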