All of lore.kernel.org
 help / color / mirror / Atom feed
From: Tom Talpey <tom@talpey.com>
To: "yangx.jy@fujitsu.com" <yangx.jy@fujitsu.com>
Cc: "linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>,
	"yanjun.zhu@linux.dev" <yanjun.zhu@linux.dev>,
	"rpearsonhpe@gmail.com" <rpearsonhpe@gmail.com>,
	"jgg@nvidia.com" <jgg@nvidia.com>,
	"y-goto@fujitsu.com" <y-goto@fujitsu.com>,
	"lizhijian@fujitsu.com" <lizhijian@fujitsu.com>,
	"tomasz.gromadzki@intel.com" <tomasz.gromadzki@intel.com>
Subject: Re: [RFC PATCH 2/2] RDMA/rxe: Add RDMA Atomic Write operation
Date: Fri, 31 Dec 2021 10:09:47 -0500	[thread overview]
Message-ID: <0a226c28-9565-55cf-7680-432b28a02cfb@talpey.com> (raw)
In-Reply-To: <61CEBF4E.1090308@fujitsu.com>

On 12/31/2021 3:29 AM, yangx.jy@fujitsu.com wrote:
> On 2021/12/31 5:39, Tom Talpey wrote:
>>
>> On 12/30/2021 7:14 AM, Xiao Yang wrote:
>>> This patch implements RDMA Atomic Write operation for RC service.
>>>
>>> Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com>
>>> ---
>>>    drivers/infiniband/sw/rxe/rxe_comp.c   |  4 +++
>>>    drivers/infiniband/sw/rxe/rxe_opcode.c | 18 ++++++++++
>>>    drivers/infiniband/sw/rxe/rxe_opcode.h |  3 ++
>>>    drivers/infiniband/sw/rxe/rxe_qp.c     |  3 +-
>>>    drivers/infiniband/sw/rxe/rxe_req.c    | 10 ++++--
>>>    drivers/infiniband/sw/rxe/rxe_resp.c   | 49 +++++++++++++++++++++-----
>>>    include/rdma/ib_pack.h                 |  2 ++
>>>    include/rdma/ib_verbs.h                |  2 ++
>>>    include/uapi/rdma/ib_user_verbs.h      |  2 ++
>>>    9 files changed, 81 insertions(+), 12 deletions(-)
>>>
>>
>> <snip>
>>
>>> +/* Guarantee atomicity of atomic write operations at the machine
>>> level. */
>>> +static DEFINE_SPINLOCK(atomic_write_ops_lock);
>>> +
>>> +static enum resp_states process_atomic_write(struct rxe_qp *qp,
>>> +                         struct rxe_pkt_info *pkt)
>>> +{
>>> +    u64 *src, *dst;
>>> +    struct rxe_mr *mr = qp->resp.mr;
>>> +
>>> +    src = payload_addr(pkt);
>>> +
>>> +    dst = iova_to_vaddr(mr, qp->resp.va + qp->resp.offset,
>>> sizeof(u64));
>>> +    if (!dst || (uintptr_t)dst & 7)
>>> +        return RESPST_ERR_MISALIGNED_ATOMIC;
>>> +
>>> +    spin_lock_bh(&atomic_write_ops_lock);
>>> +    *dst = *src;
>>> +    spin_unlock_bh(&atomic_write_ops_lock);
>>
>> Spinlock protection is insufficient! Using a spinlock protects only
>> the atomicity of the store operation with respect to another remote
>> atomic_write operation. But the semantics of RDMA_ATOMIC_WRITE go much
>> further. The operation requires a fully atomic bus transaction, across
>> the memory complex and with respect to CPU, PCI, and other sources.
>> And this guarantee needs to apply to all architectures, including ones
>> with noncoherent caches (software consistency).
>>
>> Because RXE is a software provider, I believe the most natural approach
>> here is to use an atomic64_set(dst, *src). But I'm not certain this
>> is supported on all the target architectures, therefore it may require
>> some additional failure checks, such as stricter alignment than testing
>> (dst & 7), or returning a failure if atomic64 operations are not
>> available. I especially think the ARM and PPC platforms will need
>> an expert review.
> Hi Tom,
> 
> Thanks a lot for the detailed suggestion.
> 
> I will try to use atomic64_set() and add additional failure checks.
> By the way, process_atomic() uses the same method(spinlock + assignment
> expression),
> so do you think we also need to update it by using atomic64_set()?

It is *criticial* for you to understand that the ATOMIC_WRITE has a
much stronger semantic than ordinary RDMA atomics.

The ordinary atomics are only atomic from the perspective of a single
connection and a single adapter. And the result is not placed in memory
atomically, only its execution in the adapter is processed that way.
And the final result is only consistently visible to the application
after a completion event is delivered. This rather huge compromise is
because when atomics were first designed, there were no PCI primitives
which supported atomics.

An RDMA atomic_write operation is atomic all the way to memory. It
must to be implemented in a platform operation which is similarly
atomic. If implemented by a PCIe adapter, it would use the newer
PCIe atomics to perform a locked read-modify-write. If implemented
in software as you are doing, it must use a similar local r-m-w
instruction, or the platform equivalent if necessary.

Tom.

  reply	other threads:[~2021-12-31 15:09 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-12-30 12:14 [RFC PATCH 0/2] RDMA/rxe: Add RDMA Atomic Write operation Xiao Yang
2021-12-30 12:14 ` [RFC PATCH 1/2] RDMA/rxe: Rename send_atomic_ack() and atomic member of struct resp_res Xiao Yang
2021-12-30 12:14 ` [RFC PATCH 2/2] RDMA/rxe: Add RDMA Atomic Write operation Xiao Yang
2021-12-30 21:39   ` Tom Talpey
2021-12-31  8:29     ` yangx.jy
2021-12-31 15:09       ` Tom Talpey [this message]
     [not found]         ` <61D563B4.2070106@fujitsu.com>
2022-01-07 15:50           ` Tom Talpey
2022-01-07 17:11             ` Jason Gunthorpe
2022-01-12  9:24             ` yangx.jy
2022-01-05 23:53     ` Jason Gunthorpe
2022-01-06 10:52       ` yangx.jy
2022-01-06 13:00         ` Jason Gunthorpe
2022-01-07  2:15           ` yangx.jy
2022-01-07 12:22             ` Jason Gunthorpe
2022-01-07 15:38               ` Tom Talpey
2022-01-07 19:28                 ` Jason Gunthorpe
2022-01-07 20:11                   ` Tom Talpey
2021-12-31  3:01   ` lizhijian
2021-12-31  6:02     ` yangx.jy
2021-12-30 19:21 ` [RFC PATCH 0/2] " Gromadzki, Tomasz
2021-12-30 21:42   ` Tom Talpey
2021-12-31  6:30     ` yangx.jy
2022-01-04  9:28       ` yangx.jy
2022-01-04 15:17         ` Tom Talpey
2022-01-05  1:00           ` yangx.jy
2022-01-06  0:01             ` Jason Gunthorpe
2022-01-06  1:54               ` yangx.jy
2022-01-10 15:42                 ` Jason Gunthorpe
2022-01-11  2:34                   ` yangx.jy
2022-01-11 23:29                     ` Jason Gunthorpe
2022-02-11 13:18           ` Gromadzki, Tomasz
2022-02-17  3:50 ` yangx.jy
2022-02-19 10:37   ` Leon Romanovsky

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=0a226c28-9565-55cf-7680-432b28a02cfb@talpey.com \
    --to=tom@talpey.com \
    --cc=jgg@nvidia.com \
    --cc=linux-rdma@vger.kernel.org \
    --cc=lizhijian@fujitsu.com \
    --cc=rpearsonhpe@gmail.com \
    --cc=tomasz.gromadzki@intel.com \
    --cc=y-goto@fujitsu.com \
    --cc=yangx.jy@fujitsu.com \
    --cc=yanjun.zhu@linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.