linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Roman Penyaev <roman.penyaev@profitbricks.com>
To: Sagi Grimberg <sagi@grimberg.me>
Cc: linux-block@vger.kernel.org, linux-rdma@vger.kernel.org,
	Jens Axboe <axboe@kernel.dk>,
	Christoph Hellwig <hch@infradead.org>,
	Bart Van Assche <bart.vanassche@sandisk.com>,
	Or Gerlitz <ogerlitz@mellanox.com>,
	Danil Kipnis <danil.kipnis@profitbricks.com>,
	Jack Wang <jinpu.wang@profitbricks.com>
Subject: Re: [PATCH 03/24] ibtrs: core: lib functions shared between client and server modules
Date: Tue, 6 Feb 2018 13:01:23 +0100	[thread overview]
Message-ID: <CAJrWOzC7f8T3gLAyywx2yY-UFrN+SO1Eb99jtRnB58dr7XdXxg@mail.gmail.com> (raw)
In-Reply-To: <b706bb5f-b317-fc3a-8ada-066a2a26e9ce@grimberg.me>

Hi Sagi,

On Mon, Feb 5, 2018 at 11:52 AM, Sagi Grimberg <sagi@grimberg.me> wrote:
> Hi Roman,
>
> Here are some comments below.
>
>> +int ibtrs_post_recv_empty(struct ibtrs_con *con, struct ib_cqe *cqe)
>> +{
>> +       struct ib_recv_wr wr, *bad_wr;
>> +
>> +       wr.next    = NULL;
>> +       wr.wr_cqe  = cqe;
>> +       wr.sg_list = NULL;
>> +       wr.num_sge = 0;
>> +
>> +       return ib_post_recv(con->qp, &wr, &bad_wr);
>> +}
>> +EXPORT_SYMBOL_GPL(ibtrs_post_recv_empty);
>
>
> What is this designed to do?

Each IO completion (response from server to client) is an immediate
message with no data inside.  Using IMM field we are able to find IO
in the inflight table and complete it with an error code if any.

>> +int ibtrs_iu_post_rdma_write_imm(struct ibtrs_con *con, struct ibtrs_iu
>> *iu,
>> +                                struct ib_sge *sge, unsigned int num_sge,
>> +                                u32 rkey, u64 rdma_addr, u32 imm_data,
>> +                                enum ib_send_flags flags)
>> +{
>> +       struct ib_send_wr *bad_wr;
>> +       struct ib_rdma_wr wr;
>> +       int i;
>> +
>> +       wr.wr.next        = NULL;
>> +       wr.wr.wr_cqe      = &iu->cqe;
>> +       wr.wr.sg_list     = sge;
>> +       wr.wr.num_sge     = num_sge;
>> +       wr.rkey           = rkey;
>> +       wr.remote_addr    = rdma_addr;
>> +       wr.wr.opcode      = IB_WR_RDMA_WRITE_WITH_IMM;
>> +       wr.wr.ex.imm_data = cpu_to_be32(imm_data);
>> +       wr.wr.send_flags  = flags;
>> +
>> +       /*
>> +        * If one of the sges has 0 size, the operation will fail with an
>> +        * length error
>> +        */
>> +       for (i = 0; i < num_sge; i++)
>> +               if (WARN_ON(sge[i].length == 0))
>> +                       return -EINVAL;
>> +
>> +       return ib_post_send(con->qp, &wr.wr, &bad_wr);
>> +}
>> +EXPORT_SYMBOL_GPL(ibtrs_iu_post_rdma_write_imm);
>> +
>> +int ibtrs_post_rdma_write_imm_empty(struct ibtrs_con *con, struct ib_cqe
>> *cqe,
>> +                                   u32 imm_data, enum ib_send_flags
>> flags)
>> +{
>> +       struct ib_send_wr wr, *bad_wr;
>> +
>> +       memset(&wr, 0, sizeof(wr));
>> +       wr.wr_cqe       = cqe;
>> +       wr.send_flags   = flags;
>> +       wr.opcode       = IB_WR_RDMA_WRITE_WITH_IMM;
>> +       wr.ex.imm_data  = cpu_to_be32(imm_data);
>> +
>> +       return ib_post_send(con->qp, &wr, &bad_wr);
>> +}
>> +EXPORT_SYMBOL_GPL(ibtrs_post_rdma_write_imm_empty);
>
>
> Christoph did a great job adding a generic rdma rw API, please
> reuse it, if you rely on needed functionality that does not exist
> there, please enhance it instead of open-coding a new rdma engine
> library.

Good to know, thanks.

>> +static int ibtrs_ib_dev_init(struct ibtrs_ib_dev *d, struct ib_device
>> *dev)
>> +{
>> +       int err;
>> +
>> +       d->pd = ib_alloc_pd(dev, IB_PD_UNSAFE_GLOBAL_RKEY);
>> +       if (IS_ERR(d->pd))
>> +               return PTR_ERR(d->pd);
>> +       d->dev = dev;
>> +       d->lkey = d->pd->local_dma_lkey;
>> +       d->rkey = d->pd->unsafe_global_rkey;
>> +
>> +       err = ibtrs_query_device(d);
>> +       if (unlikely(err))
>> +               ib_dealloc_pd(d->pd);
>> +
>> +       return err;
>> +}
>
>
> I must say that this makes me frustrated.. We stopped doing these
> sort of things long time ago. No way we can even consider accepting
> the unsafe use of the global rkey exposing the entire memory space for
> remote access permissions.
>
> Sorry for being blunt, but this protocol design which makes a concious
> decision to expose unconditionally is broken by definition.

I suppose we can also afford the same trick which nvme does: provide
register_always module argument, can we?  That can be also interesting
to measure the performance difference.

When I did nvme testing with register_always=true/false I saw the
difference.  Would be nice to measure ibtrs with register_always=true
once ibtrs supports that.

>> +struct ibtrs_ib_dev *ibtrs_ib_dev_find_get(struct rdma_cm_id *cm_id)
>> +{
>> +       struct ibtrs_ib_dev *dev;
>> +       int err;
>> +
>> +       mutex_lock(&device_list_mutex);
>> +       list_for_each_entry(dev, &device_list, entry) {
>> +               if (dev->dev->node_guid == cm_id->device->node_guid &&
>> +                   kref_get_unless_zero(&dev->ref))
>> +                       goto out_unlock;
>> +       }
>> +       dev = kzalloc(sizeof(*dev), GFP_KERNEL);
>> +       if (unlikely(!dev))
>> +               goto out_err;
>> +
>> +       kref_init(&dev->ref);
>> +       err = ibtrs_ib_dev_init(dev, cm_id->device);
>> +       if (unlikely(err))
>> +               goto out_free;
>> +       list_add(&dev->entry, &device_list);
>> +out_unlock:
>> +       mutex_unlock(&device_list_mutex);
>> +
>> +       return dev;
>> +
>> +out_free:
>> +       kfree(dev);
>> +out_err:
>> +       mutex_unlock(&device_list_mutex);
>> +
>> +       return NULL;
>> +}
>> +EXPORT_SYMBOL_GPL(ibtrs_ib_dev_find_get);
>
>
> Is it time to make this a common helper in rdma_cm?

True, can be also the patch for nvme.

>> +static void schedule_hb(struct ibtrs_sess *sess)
>> +{
>> +       queue_delayed_work(sess->hb_wq, &sess->hb_dwork,
>> +                          msecs_to_jiffies(sess->hb_interval_ms));
>> +}
>
>
> What does hb stand for?

Just heartbeats :)

>
>> +void ibtrs_send_hb_ack(struct ibtrs_sess *sess)
>> +{
>> +       struct ibtrs_con *usr_con = sess->con[0];
>> +       u32 imm;
>> +       int err;
>> +
>> +       imm = ibtrs_to_imm(IBTRS_HB_ACK_IMM, 0);
>> +       err = ibtrs_post_rdma_write_imm_empty(usr_con, sess->hb_cqe,
>> +                                             imm, IB_SEND_SIGNALED);
>> +       if (unlikely(err)) {
>> +               sess->hb_err_handler(usr_con, err);
>> +               return;
>> +       }
>> +}
>> +EXPORT_SYMBOL_GPL(ibtrs_send_hb_ack);
>
>
> What is this?
>
> What is all this hb stuff?

Heartbeats acknowledgements.

>> +static int ibtrs_str_ipv4_to_sockaddr(const char *addr, size_t len,
>> +                                     short port, struct sockaddr *dst)
>> +{
>> +       struct sockaddr_in *dst_sin = (struct sockaddr_in *)dst;
>> +       int ret;
>> +
>> +       ret = in4_pton(addr, len, (u8 *)&dst_sin->sin_addr.s_addr,
>> +                      '\0', NULL);
>> +       if (ret == 0)
>> +               return -EINVAL;
>> +
>> +       dst_sin->sin_family = AF_INET;
>> +       dst_sin->sin_port = htons(port);
>> +
>> +       return 0;
>> +}
>> +
>> +static int ibtrs_str_ipv6_to_sockaddr(const char *addr, size_t len,
>> +                                     short port, struct sockaddr *dst)
>> +{
>> +       struct sockaddr_in6 *dst_sin6 = (struct sockaddr_in6 *)dst;
>> +       int ret;
>> +
>> +       ret = in6_pton(addr, len, dst_sin6->sin6_addr.s6_addr,
>> +                      '\0', NULL);
>> +       if (ret != 1)
>> +               return -EINVAL;
>> +
>> +       dst_sin6->sin6_family = AF_INET6;
>> +       dst_sin6->sin6_port = htons(port);
>> +
>> +       return 0;
>> +}
>
>
> We already added helpers for this in net utils, you don't need to
> code it again.

Nice.  Will reuse then.

>> +
>> +static int ibtrs_str_gid_to_sockaddr(const char *addr, size_t len,
>> +                                    short port, struct sockaddr *dst)
>> +{
>> +       struct sockaddr_ib *dst_ib = (struct sockaddr_ib *)dst;
>> +       int ret;
>> +
>> +       /* We can use some of the I6 functions since GID is a valid
>> +        * IPv6 address format
>> +        */
>> +       ret = in6_pton(addr, len, dst_ib->sib_addr.sib_raw, '\0', NULL);
>> +       if (ret == 0)
>> +               return -EINVAL;
>> +
>> +       dst_ib->sib_family = AF_IB;
>> +       /*
>> +        * Use the same TCP server port number as the IB service ID
>> +        * on the IB port space range
>> +        */
>> +       dst_ib->sib_sid = cpu_to_be64(RDMA_IB_IP_PS_IB | port);
>> +       dst_ib->sib_sid_mask = cpu_to_be64(0xffffffffffffffffULL);
>> +       dst_ib->sib_pkey = cpu_to_be16(0xffff);
>> +
>> +       return 0;
>> +}
>
>
> Would be a nice addition to net utils.

Got it.

--
Roman

  reply	other threads:[~2018-02-06 12:01 UTC|newest]

Thread overview: 79+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-02-02 14:08 [PATCH 00/24] InfiniBand Transport (IBTRS) and Network Block Device (IBNBD) Roman Pen
2018-02-02 14:08 ` [PATCH 01/24] ibtrs: public interface header to establish RDMA connections Roman Pen
2018-02-02 14:08 ` [PATCH 02/24] ibtrs: private headers with IBTRS protocol structs and helpers Roman Pen
2018-02-02 14:08 ` [PATCH 03/24] ibtrs: core: lib functions shared between client and server modules Roman Pen
2018-02-05 10:52   ` Sagi Grimberg
2018-02-06 12:01     ` Roman Penyaev [this message]
2018-02-06 16:10       ` Jason Gunthorpe
2018-02-07 10:34         ` Roman Penyaev
2018-02-02 14:08 ` [PATCH 04/24] ibtrs: client: private header with client structs and functions Roman Pen
2018-02-05 10:59   ` Sagi Grimberg
2018-02-06 12:23     ` Roman Penyaev
2018-02-02 14:08 ` [PATCH 05/24] ibtrs: client: main functionality Roman Pen
2018-02-02 16:54   ` Bart Van Assche
2018-02-05 13:27     ` Roman Penyaev
2018-02-05 14:14       ` Sagi Grimberg
2018-02-05 17:05         ` Roman Penyaev
2018-02-05 11:19   ` Sagi Grimberg
2018-02-05 14:19     ` Roman Penyaev
2018-02-05 16:24       ` Bart Van Assche
2018-02-02 14:08 ` [PATCH 06/24] ibtrs: client: statistics functions Roman Pen
2018-02-02 14:08 ` [PATCH 07/24] ibtrs: client: sysfs interface functions Roman Pen
2018-02-05 11:20   ` Sagi Grimberg
2018-02-06 12:28     ` Roman Penyaev
2018-02-02 14:08 ` [PATCH 08/24] ibtrs: server: private header with server structs and functions Roman Pen
2018-02-02 14:08 ` [PATCH 09/24] ibtrs: server: main functionality Roman Pen
2018-02-05 11:29   ` Sagi Grimberg
2018-02-06 12:46     ` Roman Penyaev
2018-02-02 14:08 ` [PATCH 10/24] ibtrs: server: statistics functions Roman Pen
2018-02-02 14:08 ` [PATCH 11/24] ibtrs: server: sysfs interface functions Roman Pen
2018-02-02 14:08 ` [PATCH 12/24] ibtrs: include client and server modules into kernel compilation Roman Pen
2018-02-02 14:08 ` [PATCH 13/24] ibtrs: a bit of documentation Roman Pen
2018-02-02 14:08 ` [PATCH 14/24] ibnbd: private headers with IBNBD protocol structs and helpers Roman Pen
2018-02-02 14:08 ` [PATCH 15/24] ibnbd: client: private header with client structs and functions Roman Pen
2018-02-02 14:08 ` [PATCH 16/24] ibnbd: client: main functionality Roman Pen
2018-02-02 15:11   ` Jens Axboe
2018-02-05 12:54     ` Roman Penyaev
2018-02-02 14:08 ` [PATCH 17/24] ibnbd: client: sysfs interface functions Roman Pen
2018-02-02 14:08 ` [PATCH 18/24] ibnbd: server: private header with server structs and functions Roman Pen
2018-02-02 14:08 ` [PATCH 19/24] ibnbd: server: main functionality Roman Pen
2018-02-02 14:09 ` [PATCH 20/24] ibnbd: server: functionality for IO submission to file or block dev Roman Pen
2018-02-02 14:09 ` [PATCH 21/24] ibnbd: server: sysfs interface functions Roman Pen
2018-02-02 14:09 ` [PATCH 22/24] ibnbd: include client and server modules into kernel compilation Roman Pen
2018-02-02 14:09 ` [PATCH 23/24] ibnbd: a bit of documentation Roman Pen
2018-02-02 15:55   ` Bart Van Assche
2018-02-05 13:03     ` Roman Penyaev
2018-02-05 14:16       ` Sagi Grimberg
2018-02-02 14:09 ` [PATCH 24/24] MAINTAINERS: Add maintainer for IBNBD/IBTRS modules Roman Pen
2018-02-02 16:07 ` [PATCH 00/24] InfiniBand Transport (IBTRS) and Network Block Device (IBNBD) Bart Van Assche
2018-02-02 16:40   ` Doug Ledford
2018-02-05  8:45     ` Jinpu Wang
2018-06-04 12:14     ` Danil Kipnis
2018-02-02 17:05 ` Bart Van Assche
2018-02-05  8:56   ` Jinpu Wang
2018-02-05 11:36     ` Sagi Grimberg
2018-02-05 13:38       ` Danil Kipnis
2018-02-05 14:17         ` Sagi Grimberg
2018-02-05 16:40           ` Danil Kipnis
2018-02-05 18:38             ` Bart Van Assche
2018-02-06  9:44               ` Danil Kipnis
2018-02-06 15:35                 ` Bart Van Assche
2018-02-05 16:16     ` Bart Van Assche
2018-02-05 16:36       ` Jinpu Wang
2018-02-07 16:35       ` Christopher Lameter
2018-02-07 17:18         ` Roman Penyaev
2018-02-07 17:32           ` Bart Van Assche
2018-02-08 17:38             ` Danil Kipnis
2018-02-08 18:09               ` Bart Van Assche
2018-06-04 12:27                 ` Danil Kipnis
2018-02-05 12:16 ` Sagi Grimberg
2018-02-05 12:30   ` Sagi Grimberg
2018-02-07 13:06     ` Roman Penyaev
2018-02-05 16:58   ` Bart Van Assche
2018-02-05 17:16     ` Roman Penyaev
2018-02-05 17:20       ` Bart Van Assche
2018-02-06 11:47         ` Roman Penyaev
2018-02-06 13:12   ` Roman Penyaev
2018-02-06 16:01     ` Bart Van Assche
2018-02-07 12:57       ` Roman Penyaev
2018-02-07 16:35         ` Bart Van Assche

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAJrWOzC7f8T3gLAyywx2yY-UFrN+SO1Eb99jtRnB58dr7XdXxg@mail.gmail.com \
    --to=roman.penyaev@profitbricks.com \
    --cc=axboe@kernel.dk \
    --cc=bart.vanassche@sandisk.com \
    --cc=danil.kipnis@profitbricks.com \
    --cc=hch@infradead.org \
    --cc=jinpu.wang@profitbricks.com \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=ogerlitz@mellanox.com \
    --cc=sagi@grimberg.me \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).