From: Tom Talpey <tom@talpey.com>
To: Jason Gunthorpe <jgg@nvidia.com>,
	Chuck Lever III <chuck.lever@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>, Leon Romanovsky <leon@kernel.org>,
	Doug Ledford <dledford@redhat.com>,
	Leon Romanovsky <leonro@nvidia.com>,
	Adit Ranadive <aditr@vmware.com>,
	Anna Schumaker <anna.schumaker@netapp.com>,
	Ariel Elior <aelior@marvell.com>,
	Avihai Horon <avihaih@nvidia.com>,
	Bart Van Assche <bvanassche@acm.org>,
	Bernard Metzler <bmt@zurich.ibm.com>,
	"David S. Miller" <davem@davemloft.net>,
	Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>,
	Devesh Sharma <devesh.sharma@broadcom.com>,
	Faisal Latif <faisal.latif@intel.com>,
	Jack Wang <jinpu.wang@ionos.com>,
	Jakub Kicinski <kuba@kernel.org>,
	Bruce Fields <bfields@fieldses.org>, Jens Axboe <axboe@fb.com>,
	Karsten Graul <kgraul@linux.ibm.com>,
	Keith Busch <kbusch@kernel.org>, Lijun Ou <oulijun@huawei.com>,
	CIFS <linux-cifs@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Linux NFS Mailing List <linux-nfs@vger.kernel.org>,
	"linux-nvme@lists.infradead.org" <linux-nvme@lists.infradead.org>,
	linux-rdma <linux-rdma@vger.kernel.org>,
	"linux-s390@vger.kernel.org" <linux-s390@vger.kernel.org>,
	Max Gurtovoy <maxg@mellanox.com>,
	Max Gurtovoy <mgurtovoy@nvidia.com>,
	"Md. Haris Iqbal" <haris.iqbal@ionos.com>,
	Michael Guralnik <michaelgur@nvidia.com>,
	Michal Kalderon <mkalderon@marvell.com>,
	Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>,
	Naresh Kumar PBS <nareshkumar.pbs@broadcom.com>,
	Linux-Net <netdev@vger.kernel.org>,
	Potnuri Bharat Teja <bharat@chelsio.com>,
	"rds-devel@oss.oracle.com" <rds-devel@oss.oracle.com>,
	Sagi Grimberg <sagi@grimberg.me>,
	"samba-technical@lists.samba.org"
	<samba-technical@lists.samba.org>,
	Santosh Shilimkar <santosh.shilimkar@oracle.com>,
	Selvin Xavier <selvin.xavier@broadcom.com>,
	Shiraz Saleem <shiraz.saleem@intel.com>,
	Somnath Kotur <somnath.kotur@broadcom.com>,
	Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>,
	Steve French <sfrench@samba.org>,
	Trond Myklebust <trond.myklebust@hammerspace.com>,
	VMware PV-Drivers <pv-drivers@vmware.com>,
	Weihang Li <liweihang@huawei.com>,
	Yishai Hadas <yishaih@nvidia.com>,
	Zhu Yanjun <zyjzyj2000@gmail.com>
Subject: Re: [PATCH rdma-next 00/10] Enable relaxed ordering for ULPs
Date: Fri, 9 Apr 2021 10:26:21 -0400
Message-ID: <aeb7334b-edc0-78c2-4adb-92d4a994210d@talpey.com>
In-Reply-To: <20210406114952.GH7405@nvidia.com>

On 4/6/2021 7:49 AM, Jason Gunthorpe wrote:
> On Mon, Apr 05, 2021 at 11:42:31PM +0000, Chuck Lever III wrote:
>   
>> We need to get a better idea what correctness testing has been done,
>> and whether positive correctness testing results can be replicated
>> on a variety of platforms.
> 
> RO has been rolling out slowly on mlx5 over a few years and storage
> ULPs are the last to change. eg the mlx5 ethernet driver has had RO
> turned on for a long time, userspace HPC applications have been using
> it for a while now too.

I'd love to see RO used more; it was always something the RDMA
specs supported and were carefully architected for. My only concern
is that it's difficult to get right, especially when the platforms
have been running strictly ordered for so long. The ULPs need
testing, and a lot of it.

> We know there are platforms with broken RO implementations (like
> Haswell) but the kernel is supposed to globally turn off RO on all
> those cases. I'd be a bit surprised if we discover any more from this
> series.
> 
> On the other hand there are platforms that get huge speed ups from
> turning this on, AMD is one example, there are a bunch in the ARM
> world too.

My belief is that the biggest risk is from situations where completions
are batched, and therefore polling is used to detect them without
interrupts (which explicitly). The RO pipeline will completely reorder
DMA writes, and consumers which infer ordering from memory contents may
break. This can even apply within the provider code, which may attempt
to poll WR and CQ structures, and be tripped up.
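
As a rough illustration of the polling hazard described above, here is
a toy Python model (not real verbs or driver code). The names RxSlot,
poll_slot and observe_midway are invented for this sketch, as is the
assumption that the device writes a payload word followed by a "valid"
flag. It only shows why a consumer that trusts the flag can read stale
data once the two writes are allowed to land out of order.

```python
from dataclasses import dataclass
from typing import Optional

STALE = 0xDEADBEEF  # leftover contents of the receive buffer
FRESH = 0x12345678  # data the (simulated) device is delivering


@dataclass
class RxSlot:
    """Hypothetical receive slot in host memory."""
    payload: int = STALE
    valid: int = 0  # the consumer polls this word


def poll_slot(slot: RxSlot) -> Optional[int]:
    """Consumer that infers readiness purely from memory contents."""
    if not slot.valid:
        return None
    return slot.payload


def observe_midway(reordered: bool) -> Optional[int]:
    """Issue payload-then-flag writes and poll after only one has landed."""
    slot = RxSlot()
    writes = [("payload", FRESH), ("valid", 1)]  # issue order
    if reordered:
        writes.reverse()  # model: the flag write overtakes the payload
    first, second = writes
    setattr(slot, first[0], first[1])    # first write lands
    seen = poll_slot(slot)               # consumer polls in between
    setattr(slot, second[0], second[1])  # second write lands
    return seen


if __name__ == "__main__":
    print("in order: ", observe_midway(reordered=False))
    print("reordered:", hex(observe_midway(reordered=True)))
```

In the in-order case the poll sees nothing until the flag lands. In
the reordered case the flag is already set while the payload word still
holds the old buffer contents, so the consumer returns stale data.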

The Mellanox adapter itself has historically had strict in-order DMA
semantics, and while it's great to relax that, changing it by default
for all consumers is something to consider very cautiously.

> Still, obviously people should test on the platforms they have.

Yes, and "test" should be taken seriously, with a focus on ULP data
integrity. Speedups will mean nothing if the data is ever damaged.

Tom.
