From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path: <linux-nvdimm-bounces@lists.01.org>
Subject: Re: [PATCH v2 07/10] nvme-pci: Use PCI p2pmem subsystem to manage the CMB
From: Sinan Kaya <okaya@codeaurora.org>
To: Logan Gunthorpe <logang@deltatee.com>, Keith Busch, Oliver
Cc: Jens Axboe, Benjamin Herrenschmidt, Alex Williamson, Jérôme Glisse,
 Jason Gunthorpe, Bjorn Helgaas, Max Gurtovoy, Christoph Hellwig,
 linux-nvdimm@lists.01.org, linux-rdma@vger.kernel.org,
 linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-nvme@lists.infradead.org, linux-block@vger.kernel.org
Date: Mon, 5 Mar 2018 13:02:57 -0500
MIME-Version: 1.0
In-Reply-To: <3f56c76d-6a5c-7c2f-5442-c9209749b598@deltatee.com>
References: <20180228234006.21093-1-logang@deltatee.com>
 <20180228234006.21093-8-logang@deltatee.com>
 <20180305160004.GA30975@localhost.localdomain>
 <3f56c76d-6a5c-7c2f-5442-c9209749b598@deltatee.com>
Content-Type: text/plain; charset=utf-8

On 3/5/2018 12:10 PM, Logan Gunthorpe wrote:
>
>
> On 05/03/18 09:00 AM, Keith Busch wrote:
>> On Mon, Mar 05, 2018 at 12:33:29PM +1100, Oliver wrote:
>>> On Thu, Mar 1, 2018 at 10:40 AM, Logan Gunthorpe <logang@deltatee.com> wrote:
>>>> @@ -429,10 +429,7 @@ static void __nvme_submit_cmd(struct nvme_queue *nvmeq,
>>>>  {
>>>>         u16 tail = nvmeq->sq_tail;
>>>
>>>> -       if (nvmeq->sq_cmds_io)
>>>> -               memcpy_toio(&nvmeq->sq_cmds_io[tail], cmd, sizeof(*cmd));
>>>> -       else
>>>> -               memcpy(&nvmeq->sq_cmds[tail], cmd, sizeof(*cmd));
>>>> +       memcpy(&nvmeq->sq_cmds[tail], cmd, sizeof(*cmd));
>>>
>>> Hmm, how safe is replacing memcpy_toio() with regular memcpy()? On PPC
>>> the _toio() variant enforces alignment, does the copy with 4-byte
>>> stores, and has a full barrier after the copy. In comparison, our
>>> regular memcpy() does none of those things and may use unaligned and
>>> vector load/stores. For normal (cacheable) memory that is perfectly
>>> fine, but they can cause alignment faults when targeted at MMIO
>>> (cache-inhibited) memory.
>>>
>>> I think in this particular case it might be ok since we know SQEs are
>>> aligned to 64-byte boundaries and the copy is too small to use our
>>> vectorised memcpy(). I'll assume we don't need explicit ordering
>>> between writes of SQEs since the existing code doesn't seem to care
>>> unless the doorbell is being rung, so you're probably fine there too.
>>> That said, I still think this is a little bit sketchy, and at the very
>>> least you should add a comment explaining what's going on when the CMB
>>> is being used. If someone more familiar with the NVMe driver could
>>> chime in, I would appreciate it.
>>
>> I may not be understanding the concern, but I'll give it a shot.
>>
>> You're right, the start of any SQE is always 64-byte aligned, so that
>> should satisfy the alignment requirements.
>>
>> The order when writing multiple/successive SQEs in a submission queue
>> does matter, and this is currently serialized through the q_lock.
>>
>> The order in which the bytes of a single SQE are written doesn't really
>> matter, as long as the entire SQE is written into the CMB prior to
>> writing that SQ's doorbell register.
>>
>> The doorbell register is written immediately after copying a command
>> entry into the submission queue (ignoring "shadow buffer" features),
>> so doorbell writes to submitted commands are 1:1.
>>
>> If CMB SQE and DB ordering is not enforced by the memcpy, then we do
>> need a barrier after the SQE's memcpy and before the doorbell's writel.
>
>
> Thanks for the information, Keith.
>
> Adding to this: regular memcpy generally also enforces alignment, as
> unaligned access to regular memory is typically costly in some way on
> most arches. The generic memcpy_toio also does not have any barrier, as
> it is just a call to memcpy. Arm64 also does not appear to have a barrier
> in its implementation, and in the short survey I did I could not find
> any implementation with a barrier. I also did not find a ppc
> implementation in the tree, but it would be weird for it to add a
> barrier when other arches do not appear to need one.
>
> We've been operating on the assumption that memory mapped by
> devm_memremap_pages() can be treated as regular memory. This is
> emphasized by the fact that it does not return an __iomem pointer. If
> this assumption does not hold for an arch, then we cannot support P2P
> DMA without an overhaul of many kernel interfaces, or without creating
> other backend interfaces into the drivers which take different data
> types (i.e. we'd have to bypass the entire block layer when trying to
> write data in p2pmem to an nvme device). This is very undesirable.
>

writel has a barrier inside on ARM64:

https://elixir.bootlin.com/linux/latest/source/arch/arm64/include/asm/io.h#L143

Why do you need another barrier?

ACCESSING DEVICES
-----------------

Many devices can be memory mapped, and so appear to the CPU as if they're just
a set of memory locations.  To control such a device, the driver usually has to
make the right memory accesses in exactly the right order.

However, having a clever CPU or a clever compiler creates a potential problem
in that the carefully sequenced accesses in the driver code won't reach the
device in the requisite order if the CPU or the compiler thinks it is more
efficient to reorder, combine or merge accesses - something that would cause
the device to malfunction.
Inside of the Linux kernel, I/O should be done through the appropriate accessor
routines - such as inb() or writel() - which know how to make such accesses
appropriately sequential.

> Logan

-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.