linux-rdma.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Erez Shitrit <erezsh@dev.mellanox.co.il>
To: Dennis Dalessandro <dennis.dalessandro@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>,
	Doug Ledford <dledford@redhat.com>,
	Mike Marciniszyn <mike.marcinisyzn@intel.com>,
	RDMA mailing list <linux-rdma@vger.kernel.org>,
	Sadanand Warrier <sadanand.warrier@intel.com>,
	Kaike Wan <kaike.wan@intel.com>
Subject: Re: [PATCH for-next 07/16] IB/ipoib: Increase ipoib Datagram mode MTU's upper limit
Date: Wed, 19 Feb 2020 13:01:30 +0200	[thread overview]
Message-ID: <CAAk-MO-NZbL8WjUB+eupK_0uYAJXEyvKjEVX8Bo5DzkExNnweg@mail.gmail.com> (raw)
In-Reply-To: <20200210131850.87776.40842.stgit@awfm-01.aw.intel.com>

On Mon, Feb 10, 2020 at 3:18 PM Dennis Dalessandro
<dennis.dalessandro@intel.com> wrote:
>
> From: Sadanand Warrier <sadanand.warrier@intel.com>
>
> Currently the ipoib UD mtu is restricted to 4K bytes. Remove this
> limitation so that the IPOIB module can potentially use an MTU (in UD
> mode) that is bounded by the MTU of the underlying device. A field is
> added to the ib_port_attr structure to indicate the maximum physical
> MTU the underlying device supports.
>
> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
> Reviewed-by: Mike Marciniszyn <mike.marcinisyzn@intel.com>
> Signed-off-by: Sadanand Warrier <sadanand.warrier@intel.com>
> Signed-off-by: Kaike Wan <kaike.wan@intel.com>
> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
> ---
>  drivers/infiniband/hw/hfi1/qp.c                |   18 +------
>  drivers/infiniband/hw/hfi1/verbs.c             |    2 +
>  drivers/infiniband/ulp/ipoib/ipoib_main.c      |    5 ++
>  drivers/infiniband/ulp/ipoib/ipoib_multicast.c |   11 ++--
>  include/rdma/ib_verbs.h                        |   60 ++++++++++++++++++++++++
>  include/rdma/opa_port_info.h                   |   10 ----
>  6 files changed, 74 insertions(+), 32 deletions(-)
>
> diff --git a/drivers/infiniband/hw/hfi1/qp.c b/drivers/infiniband/hw/hfi1/qp.c
> index f8e733a..0c2ae9f 100644
> --- a/drivers/infiniband/hw/hfi1/qp.c
> +++ b/drivers/infiniband/hw/hfi1/qp.c
> @@ -1,5 +1,5 @@
>  /*
> - * Copyright(c) 2015 - 2019 Intel Corporation.
> + * Copyright(c) 2015 - 2020 Intel Corporation.
>   *
>   * This file is provided under a dual BSD/GPLv2 license.  When using or
>   * redistributing this file, you may do so under either license.
> @@ -186,15 +186,6 @@ static void flush_iowait(struct rvt_qp *qp)
>         write_sequnlock_irqrestore(lock, flags);
>  }
>
> -static inline int opa_mtu_enum_to_int(int mtu)
> -{
> -       switch (mtu) {
> -       case OPA_MTU_8192:  return 8192;
> -       case OPA_MTU_10240: return 10240;
> -       default:            return -1;
> -       }
> -}
> -
>  /**
>   * This function is what we would push to the core layer if we wanted to be a
>   * "first class citizen".  Instead we hide this here and rely on Verbs ULPs
> @@ -202,15 +193,10 @@ static inline int opa_mtu_enum_to_int(int mtu)
>   */
>  static inline int verbs_mtu_enum_to_int(struct ib_device *dev, enum ib_mtu mtu)
>  {
> -       int val;
> -
>         /* Constraining 10KB packets to 8KB packets */
>         if (mtu == (enum ib_mtu)OPA_MTU_10240)
>                 mtu = OPA_MTU_8192;
> -       val = opa_mtu_enum_to_int((int)mtu);
> -       if (val > 0)
> -               return val;
> -       return ib_mtu_enum_to_int(mtu);
> +       return opa_mtu_enum_to_int((enum opa_mtu)mtu);
>  }
>
>  int hfi1_check_modify_qp(struct rvt_qp *qp, struct ib_qp_attr *attr,
> diff --git a/drivers/infiniband/hw/hfi1/verbs.c b/drivers/infiniband/hw/hfi1/verbs.c
> index 8061e93..04f0206 100644
> --- a/drivers/infiniband/hw/hfi1/verbs.c
> +++ b/drivers/infiniband/hw/hfi1/verbs.c
> @@ -1437,6 +1437,8 @@ static int query_port(struct rvt_dev_info *rdi, u8 port_num,
>                                       4096 : hfi1_max_mtu), IB_MTU_4096);
>         props->active_mtu = !valid_ib_mtu(ppd->ibmtu) ? props->max_mtu :
>                 mtu_to_enum(ppd->ibmtu, IB_MTU_4096);
> +       props->phys_mtu = HFI1_CAP_IS_KSET(AIP) ? hfi1_max_mtu :
> +                               ib_mtu_enum_to_int(props->max_mtu);
>
>         return 0;
>  }
> diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> index e5f438a..5c1cf68 100644
> --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
> +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> @@ -1862,7 +1862,10 @@ static int ipoib_parent_init(struct net_device *ndev)
>                         priv->port);
>                 return result;
>         }
> -       priv->max_ib_mtu = ib_mtu_enum_to_int(attr.max_mtu);
> +       if (rdma_core_cap_opa_port(priv->ca, priv->port))

Why not to use something like you did in rdma_mtu_enum_to_int, in
order to hide all the "_opa_" stuff from IPoIB ?

> +               priv->max_ib_mtu = attr.phys_mtu;
> +       else
> +               priv->max_ib_mtu = ib_mtu_enum_to_int(attr.max_mtu);
>
>         result = ib_query_pkey(priv->ca, priv->port, 0, &priv->pkey);
>         if (result) {
> diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
> index b9e9562..7166ee9b 100644
> --- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
> +++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
> @@ -218,6 +218,7 @@ static int ipoib_mcast_join_finish(struct ipoib_mcast *mcast,
>         struct rdma_ah_attr av;
>         int ret;
>         int set_qkey = 0;
> +       int mtu;
>
>         mcast->mcmember = *mcmember;
>
> @@ -240,13 +241,11 @@ static int ipoib_mcast_join_finish(struct ipoib_mcast *mcast,
>                 priv->broadcast->mcmember.flow_label = mcmember->flow_label;
>                 priv->broadcast->mcmember.hop_limit = mcmember->hop_limit;
>                 /* assume if the admin and the mcast are the same both can be changed */
> +               mtu = rdma_mtu_enum_to_int(priv->ca,  priv->port,
> +                                          priv->broadcast->mcmember.mtu);
>                 if (priv->mcast_mtu == priv->admin_mtu)
> -                       priv->admin_mtu =
> -                       priv->mcast_mtu =
> -                       IPOIB_UD_MTU(ib_mtu_enum_to_int(priv->broadcast->mcmember.mtu));
> -               else
> -                       priv->mcast_mtu =
> -                       IPOIB_UD_MTU(ib_mtu_enum_to_int(priv->broadcast->mcmember.mtu));
> +                       priv->admin_mtu = IPOIB_UD_MTU(mtu);
> +               priv->mcast_mtu = IPOIB_UD_MTU(mtu);
>
>                 priv->qkey = be32_to_cpu(priv->broadcast->mcmember.qkey);
>                 spin_unlock_irq(&priv->lock);
> diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
> index 8b805ab3..6ab343f 100644
> --- a/include/rdma/ib_verbs.h
> +++ b/include/rdma/ib_verbs.h
> @@ -462,6 +462,11 @@ enum ib_mtu {
>         IB_MTU_4096 = 5
>  };
>
> +enum opa_mtu {
> +       OPA_MTU_8192 = 6,
> +       OPA_MTU_10240 = 7
> +};
> +
>  static inline int ib_mtu_enum_to_int(enum ib_mtu mtu)
>  {
>         switch (mtu) {
> @@ -488,6 +493,28 @@ static inline enum ib_mtu ib_mtu_int_to_enum(int mtu)
>                 return IB_MTU_256;
>  }
>
> +static inline int opa_mtu_enum_to_int(enum opa_mtu mtu)
> +{
> +       switch (mtu) {
> +       case OPA_MTU_8192:
> +               return 8192;
> +       case OPA_MTU_10240:
> +               return 10240;
> +       default:
> +               return(ib_mtu_enum_to_int((enum ib_mtu)mtu));
> +       }
> +}
> +
> +static inline enum opa_mtu opa_mtu_int_to_enum(int mtu)
> +{
> +       if (mtu >= 10240)
> +               return OPA_MTU_10240;
> +       else if (mtu >= 8192)
> +               return OPA_MTU_8192;
> +       else
> +               return ((enum opa_mtu)ib_mtu_int_to_enum(mtu));
> +}
> +
>  enum ib_port_state {
>         IB_PORT_NOP             = 0,
>         IB_PORT_DOWN            = 1,
> @@ -651,6 +678,7 @@ struct ib_port_attr {
>         enum ib_port_state      state;
>         enum ib_mtu             max_mtu;
>         enum ib_mtu             active_mtu;
> +       u32                     phys_mtu;
>         int                     gid_tbl_len;
>         unsigned int            ip_gids:1;
>         /* This is the value from PortInfo CapabilityMask, defined by IBA */
> @@ -3356,6 +3384,38 @@ static inline unsigned int rdma_find_pg_bit(unsigned long addr,
>         return __fls(pgsz);
>  }
>
> +/**
> + * rdma_core_cap_opa_port - Return whether the RDMA Port is OPA or not.
> + * @device: Device
> + * @port_num: 1 based Port number
> + *
> + * Return true if port is an Intel OPA port , false if not
> + */
> +static inline bool rdma_core_cap_opa_port(struct ib_device *device,
> +                                         u32 port_num)
> +{
> +       return (device->port_data[port_num].immutable.core_cap_flags &
> +               RDMA_CORE_PORT_INTEL_OPA) == RDMA_CORE_PORT_INTEL_OPA;
> +}
> +
> +/**
> + * rdma_mtu_enum_to_int - Return the mtu of the port as an integer value.
> + * @device: Device
> + * @port_num: Port number
> + * @mtu: enum value of MTU
> + *
> + * Return the MTU size supported by the port as an integer value. Will return
> + * -1 if enum value of mtu is not supported.
> + */
> +static inline int rdma_mtu_enum_to_int(struct ib_device *device, u8 port,
> +                                      int mtu)
> +{
> +       if (rdma_core_cap_opa_port(device, port))
> +               return opa_mtu_enum_to_int((enum opa_mtu)mtu);
> +       else
> +               return ib_mtu_enum_to_int((enum ib_mtu)mtu);
> +}
> +
>  int ib_set_vf_link_state(struct ib_device *device, int vf, u8 port,
>                          int state);
>  int ib_get_vf_config(struct ib_device *device, int vf, u8 port,
> diff --git a/include/rdma/opa_port_info.h b/include/rdma/opa_port_info.h
> index bdbfe25..0d9e6d7 100644
> --- a/include/rdma/opa_port_info.h
> +++ b/include/rdma/opa_port_info.h
> @@ -1,5 +1,5 @@
>  /*
> - * Copyright (c) 2014-2017 Intel Corporation.  All rights reserved.
> + * Copyright (c) 2014-2020 Intel Corporation.  All rights reserved.
>   *
>   * This software is available to you under a choice of one of two
>   * licenses.  You may choose to be licensed under the terms of the GNU
> @@ -139,14 +139,6 @@
>  #define OPA_CAP_MASK3_IsVLMarkerSupported         (1 << 1)
>  #define OPA_CAP_MASK3_IsVLrSupported              (1 << 0)
>
> -/**
> - * new MTU values
> - */
> -enum {
> -       OPA_MTU_8192  = 6,
> -       OPA_MTU_10240 = 7,
> -};
> -
>  enum {
>         OPA_PORT_PHYS_CONF_DISCONNECTED = 0,
>         OPA_PORT_PHYS_CONF_STANDARD     = 1,
>

  reply	other threads:[~2020-02-19 11:01 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-02-10 13:18 [PATCH for-next 00/16] New hfi1 feature: Accelerated IP Dennis Dalessandro
2020-02-10 13:18 ` [PATCH for-next 01/16] IB/hfi1: Add accelerated IP capability bit Dennis Dalessandro
2020-02-10 13:18 ` [PATCH for-next 02/16] IB/hfi1: Add functions to transmit datagram ipoib packets Dennis Dalessandro
2020-02-10 13:18 ` [PATCH for-next 03/16] IB/hfi1: Add the transmit side of a datagram ipoib RDMA netdev Dennis Dalessandro
2020-02-10 13:18 ` [PATCH for-next 04/16] IB/hfi1: Remove module parameter for KDETH qpns Dennis Dalessandro
2020-02-10 13:18 ` [PATCH for-next 05/16] IB/{rdmavt, hfi1}: Implement creation of accelerated UD QPs Dennis Dalessandro
2020-02-10 13:18 ` [PATCH for-next 06/16] IB/hfi1: RSM rules for AIP Dennis Dalessandro
2020-02-10 13:18 ` [PATCH for-next 07/16] IB/ipoib: Increase ipoib Datagram mode MTU's upper limit Dennis Dalessandro
2020-02-19 11:01   ` Erez Shitrit [this message]
2020-02-21 19:40     ` Dennis Dalessandro
2020-02-10 13:18 ` [PATCH for-next 08/16] IB/hfi1: Rename num_vnic_contexts as num_netdev_contexts Dennis Dalessandro
2020-02-10 13:19 ` [PATCH for-next 09/16] IB/hfi1: Add functions to receive accelerated ipoib packets Dennis Dalessandro
2020-02-10 13:19 ` [PATCH for-next 10/16] IB/hfi1: Add interrupt handler functions for accelerated ipoib Dennis Dalessandro
2020-02-10 13:19 ` [PATCH for-next 11/16] IB/hfi1: Add rx functions for dummy netdev Dennis Dalessandro
2020-02-10 13:19 ` [PATCH for-next 12/16] IB/hfi1: Activate the " Dennis Dalessandro
2020-02-10 13:19 ` [PATCH for-next 13/16] IB/{hfi1, ipoib, rdma}: Broadcast ping sent packets which exceeded mtu size Dennis Dalessandro
2020-02-19  0:42   ` Jason Gunthorpe
2020-02-21 19:40     ` Dennis Dalessandro
2020-02-21 23:32       ` Jason Gunthorpe
2020-03-20 13:53         ` Dennis Dalessandro
2020-02-19 13:41   ` Erez Shitrit
2020-02-21 19:40     ` Dennis Dalessandro
2020-02-10 13:19 ` [PATCH for-next 14/16] IB/hfi1: Add packet histogram trace event Dennis Dalessandro
2020-02-10 13:19 ` [PATCH for-next 15/16] IB/ipoib: Add capability to switch between datagram and connected mode Dennis Dalessandro
2020-02-10 13:20 ` [PATCH for-next 16/16] IB/hfi1: Enable the transmit side of the datagram ipoib netdev Dennis Dalessandro
2020-02-10 13:31 ` [PATCH for-next 00/16] New hfi1 feature: Accelerated IP Jason Gunthorpe
2020-02-10 17:36   ` Dennis Dalessandro
2020-02-10 18:32     ` Jason Gunthorpe
2020-02-11 21:58       ` Dennis Dalessandro

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAAk-MO-NZbL8WjUB+eupK_0uYAJXEyvKjEVX8Bo5DzkExNnweg@mail.gmail.com \
    --to=erezsh@dev.mellanox.co.il \
    --cc=dennis.dalessandro@intel.com \
    --cc=dledford@redhat.com \
    --cc=jgg@ziepe.ca \
    --cc=kaike.wan@intel.com \
    --cc=linux-rdma@vger.kernel.org \
    --cc=mike.marcinisyzn@intel.com \
    --cc=sadanand.warrier@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).