linux-nfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Olga Kornievskaia <aglo@umich.edu>
To: Chuck Lever <chuck.lever@oracle.com>
Cc: linux-rdma <linux-rdma@vger.kernel.org>,
	Linux NFS Mailing List <linux-nfs@vger.kernel.org>
Subject: Re: [PATCH RFC] svcrdma: Ignore source port when computing DRC hash
Date: Thu, 6 Jun 2019 14:13:36 -0400	[thread overview]
Message-ID: <CAN-5tyEUHrDkj7MKfYeN5LsFwZEtaLsHYMX20UQMShHtQa-QsA@mail.gmail.com> (raw)
In-Reply-To: <DD7B8184-4124-4307-BD7F-98F6231361DF@oracle.com>

On Wed, Jun 5, 2019 at 1:28 PM Chuck Lever <chuck.lever@oracle.com> wrote:
>
>
>
> > On Jun 5, 2019, at 11:57 AM, Olga Kornievskaia <aglo@umich.edu> wrote:
> >
> > On Wed, Jun 5, 2019 at 8:15 AM Chuck Lever <chuck.lever@oracle.com> wrote:
> >>
> >> The DRC is not working at all after an RPC/RDMA transport reconnect.
> >> The problem is that the new connection uses a different source port,
> >> which defeats DRC hash.
> >>
> >> An NFS/RDMA client's source port is meaningless for RDMA transports.
> >> The transport layer typically sets the source port value on the
> >> connection to a random ephemeral port. The server already ignores it
> >> for the "secure port" check. See commit 16e4d93f6de7 ("NFSD: Ignore
> >> client's source port on RDMA transports").
> >>
> >> I'm not sure why I never noticed this before.
> >
> > Hi Chuck,
> >
> > I have a question: is the reason for choosing this fix as oppose to
> > fixing the client because it's server's responsibility to design a DRC
> > differently for the NFSoRDMA?
>
> The server DRC implementation is not specified in any standard.
> The server implementation can use whatever works the best. The
> current Linux DRC implementation is useless for NFS/RDMA, because
> the clients have to disconnect before they send retransmissions.
>
> If someone knows a way that a client side RDMA consumer can specify
> the source port that is presented to a server, then we can make
> that change too.

Ok I see you point about this being difficult on the client. Various
implementations take different approaches even on RoCE itself:
1. software RoCE does
        /* pick a source UDP port number for this QP based on
         * the source QPN. this spreads traffic for different QPs
         * across different NIC RX queues (while using a single
         * flow for a given QP to maintain packet order).
         * the port number must be in the Dynamic Ports range
         * (0xc000 - 0xffff).
         */
        qp->src_port = RXE_ROCE_V2_SPORT +
                (hash_32_generic(qp_num(qp), 14) & 0x3fff);

When I allow the connection to be re-established I can confirm that a
new UDP source port is gotten).

2. bnxt_re (broadband net extreme) seems to always use the same source port:
qp->qp1_hdr.udp.sport = htons(0x8CD1);

3. mlx4 seems to always use the same source port:
sqp->ud_header.udp.sport = htons(MLX4_ROCEV2_QP1_SPORT);

4. mlx5 also seems to be pick the same port:
return cpu_to_be16(MLX5_CAP_ROCE(dev->mdev, r_roce_min_src_udp_port));

I have a CX5 card and I see that the source port is always 49513. So
if you use Mellanox cards or this other card, DNC implementations are
safe?

I also see that there is nothing in the verbs API thru which we
interact with the RDMA drivers will allow us to set the port. Unless
we can ask the linux implementation to augment some structures to
allow us to set and query that port or is that unreasonable because
it's not in the standard API.

>
>
> >> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> >> Cc: stable@vger.kernel.org
> >> ---
> >> net/sunrpc/xprtrdma/svc_rdma_transport.c |    7 ++++++-
> >> 1 file changed, 6 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
> >> index 027a3b0..1b3700b 100644
> >> --- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
> >> +++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
> >> @@ -211,9 +211,14 @@ static void handle_connect_req(struct rdma_cm_id *new_cma_id,
> >>        /* Save client advertised inbound read limit for use later in accept. */
> >>        newxprt->sc_ord = param->initiator_depth;
> >>
> >> -       /* Set the local and remote addresses in the transport */
> >>        sa = (struct sockaddr *)&newxprt->sc_cm_id->route.addr.dst_addr;
> >>        svc_xprt_set_remote(&newxprt->sc_xprt, sa, svc_addr_len(sa));
> >> +       /* The remote port is arbitrary and not under the control of the
> >> +        * ULP. Set it to a fixed value so that the DRC continues to work
> >> +        * after a reconnect.
> >> +        */
> >> +       rpc_set_port((struct sockaddr *)&newxprt->sc_xprt.xpt_remote, 0);
> >> +
> >>        sa = (struct sockaddr *)&newxprt->sc_cm_id->route.addr.src_addr;
> >>        svc_xprt_set_local(&newxprt->sc_xprt, sa, svc_addr_len(sa));
> >>
> >>
>
> --
> Chuck Lever
>
>
>

  reply	other threads:[~2019-06-06 18:13 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-06-05 12:15 [PATCH RFC] svcrdma: Ignore source port when computing DRC hash Chuck Lever
2019-06-05 15:57 ` Olga Kornievskaia
2019-06-05 17:28   ` Chuck Lever
2019-06-06 18:13     ` Olga Kornievskaia [this message]
2019-06-06 18:33       ` Chuck Lever
2019-06-07 15:43         ` Olga Kornievskaia
2019-06-10 14:38           ` Tom Talpey
2019-06-05 16:43 ` Tom Talpey
2019-06-05 17:25   ` Chuck Lever
2019-06-10 14:50     ` Tom Talpey
2019-06-10 17:50       ` Chuck Lever
2019-06-10 19:14         ` Tom Talpey
2019-06-10 21:57           ` Chuck Lever
2019-06-10 22:13             ` Tom Talpey
2019-06-11  0:07               ` Tom Talpey
2019-06-11 14:25                 ` Chuck Lever
2019-06-11 14:23               ` Chuck Lever
2019-06-06 13:08 ` Sasha Levin
2019-06-06 13:24   ` Chuck Lever

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAN-5tyEUHrDkj7MKfYeN5LsFwZEtaLsHYMX20UQMShHtQa-QsA@mail.gmail.com \
    --to=aglo@umich.edu \
    --cc=chuck.lever@oracle.com \
    --cc=linux-nfs@vger.kernel.org \
    --cc=linux-rdma@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).