Linux-RDMA Archive on lore.kernel.org
 help / color / Atom feed
From: Danil Kipnis <danil.kipnis@cloud.ionos.com>
To: Bart Van Assche <bvanassche@acm.org>
Cc: Jack Wang <jinpuwang@gmail.com>,
	linux-block@vger.kernel.org, linux-rdma@vger.kernel.org,
	Jens Axboe <axboe@kernel.dk>,
	Christoph Hellwig <hch@infradead.org>,
	Sagi Grimberg <sagi@grimberg.me>,
	Jason Gunthorpe <jgg@mellanox.com>,
	Doug Ledford <dledford@redhat.com>,
	rpenyaev@suse.de, Jack Wang <jinpu.wang@cloud.ionos.com>,
	Roman Pen <r.peniaev@gmail.com>
Subject: Re: [PATCH v4 17/25] ibnbd: client: main functionality
Date: Tue, 17 Sep 2019 13:39:43 +0200
Message-ID: <CAHg0HuwFTVsCNHbiXW20P6hQ3c-P_p5tB6dYKtOW=_euWEvLnA@mail.gmail.com> (raw)
In-Reply-To: <5c5ff7df-2cce-ec26-7893-55911e4d8595@acm.org>

On Mon, Sep 16, 2019 at 6:46 PM Bart Van Assche <bvanassche@acm.org> wrote:
>
> On 9/16/19 7:17 AM, Danil Kipnis wrote:
> > On Sat, Sep 14, 2019 at 1:46 AM Bart Van Assche <bvanassche@acm.org> wrote:
> >> On 6/20/19 8:03 AM, Jack Wang wrote:
> >>> +/*
> >>> + * This is for closing devices when unloading the module:
> >>> + * we might be closing a lot (>256) of devices in parallel
> >>> + * and it is better not to use the system_wq.
> >>> + */
> >>> +static struct workqueue_struct *unload_wq;
> >>
> >> I think that a better motivation is needed for the introduction of a new
> >> workqueue.
>  >
> > We didn't want to pollute the system workqueue when unmapping a big
> > number of devices at once in parallel. Will reiterate on it.
>
> There are multiple system workqueues. From <linux/workqueue.h>:
>
> extern struct workqueue_struct *system_wq;
> extern struct workqueue_struct *system_highpri_wq;
> extern struct workqueue_struct *system_long_wq;
> extern struct workqueue_struct *system_unbound_wq;
> extern struct workqueue_struct *system_freezable_wq;
> extern struct workqueue_struct *system_power_efficient_wq;
> extern struct workqueue_struct *system_freezable_power_efficient_wq;
>
> Has it been considered to use e.g. system_long_wq?
Will try to switch to system_long_wq, I do agree that a new wq for
just closing devices does make an impression of an overreaction.

>
> >> A more general question is why ibnbd needs its own queue management
> >> while no other block driver needs this?
> >
> > Each IBNBD device promises to have a queue_depth (of say 512) on each
> > of its num_cpus hardware queues. In fact we can only process a
> > queue_depth inflights at once on the whole ibtrs session connecting a
> > given client with a given server. Those 512 inflights (corresponding
> > to the number of buffers reserved by the server for this particular
> > client) have to be shared among all the devices mapped on this
> > session. This leads to the situation, that we receive more requests
> > than we can process at the moment. So we need to stop queues and start
> > them again later in some fair fashion.
>
> Can a single CPU really sustain a queue depth of 512 commands? Is it
> really necessary to have one hardware queue per CPU or is e.g. four
> queues per NUMA node sufficient? Has it been considered to send the
> number of hardware queues that the initiator wants to use and also the
> command depth per queue during login to the target side? That would
> allow the target side to allocate an independent set of buffers for each
> initiator hardware queue and would allow to remove the queue management
> at the initiator side. This might even yield better performance.
We needed a way which would allow us to address one particular
requirement: we'd like to be able to "enforce" that a response to an
IO would be processed on the same CPU the IO was originally submitted
on. In order to be able to do so we establish one rdma connection per
cpu, each having a separate cq_vector. The administrator can then
assign the corresponding IRQs to distinct CPUs. The server always
replies to an IO on the same connection he received the request on. If
the administrator did configure the /proc/irq/y/smp_affinity
accordingly, the response sent by the server will generate interrupt
on the same cpu, the IO was originally submitted on. The administrator
can configure IRQs differently, for example assign a given irq
(<->cq_vector) to a range of cpus belonging to a numa node, or
whatever assignment is best for his use-case.
Our transport module IBTRS establishes number of cpus connections
between a client and a server. The user of the transport module (i.e.
IBNBD) has no knowledge about the rdma connections, it only has a
pointer to an abstract "session", which connects  him somehow to a
remote host. IBNBD as a user of IBTRS creates block devices and uses a
given "session" to send IOs from all the block devices it created for
that session. That means IBNBD is limited in maximum number of his
inflights toward a given remote host by the capability of the
corresponding "session". So it needs to share the resources provided
by the session (in our current model those resources are in fact some
pre registered buffers on server side) among his devices.
It is possible to extend the IBTRS API so that the user (IBNBD) could
specify how many connections he wants to have on the session to be
established. It is also possible to extend the ibtrs_clt_get_tag API
(this is to get a send "permit") with a parameter specifying the
connection, the future IO is to be send on.
We now might have to change our communication model in IBTRS a bit in
order to fix the potential security problem raised during the recent
RDMA MC: https://etherpad.net/p/LPC2019_RDMA.

>
> >>> +static void msg_conf(void *priv, int errno)
> >>> +{
> >>> +     struct ibnbd_iu *iu = (struct ibnbd_iu *)priv;
> >>
> >> The kernel code I'm familiar with does not cast void pointers explicitly
> >> into another type. Please follow that convention and leave the cast out
> >> from the above and also from similar statements.
> > msg_conf() is a callback which IBNBD passes down with a request to
> > IBTRS when calling ibtrs_clt_request(). msg_conf() is called when a
> > request is completed with a pointer to a struct defined in IBNBD. So
> > IBTRS as transport doesn't know what's inside the private pointer
> > which IBNBD passed down with the request, it's opaque, since struct
> > ibnbd_iu is not visible in IBTRS. I will try to find how others avoid
> > a cast in similar situations.
>
> Are you aware that the C language can cast a void pointer into a
> non-void pointer implicitly, that means, without having to use a cast?
Oh, I misunderstood your original comment: you suggest to just remove
the explicit (struct ibnbd_iu *) and similar casts from void pointers.
I think an explicit cast makes it easier for readers to follow the
code. But it does say "Casting the return value which is a void
pointer is redundant." at least in the "Allocating Memory" section of
https://www.kernel.org/doc/html/v4.10/process/coding-style.html and
seems others don't do that, at least not when declaring a variable.
Will drop those casts.

  reply index

Thread overview: 123+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20190620150337.7847-1-jinpuwang@gmail.com>
2019-06-20 15:03 ` [PATCH v4 01/25] sysfs: export sysfs_remove_file_self() Jack Wang
2019-09-23 17:21   ` Bart Van Assche
2019-09-25  9:30     ` Danil Kipnis
2019-07-09  9:55 ` [PATCH v4 00/25] InfiniBand Transport (IBTRS) and Network Block Device (IBNBD) Danil Kipnis
2019-07-09 11:00   ` Leon Romanovsky
2019-07-09 11:17     ` Greg KH
2019-07-09 11:57       ` Jinpu Wang
2019-07-09 13:32       ` Leon Romanovsky
2019-07-09 15:39       ` Bart Van Assche
2019-07-09 11:37     ` Jinpu Wang
2019-07-09 12:06       ` Jason Gunthorpe
2019-07-09 13:15         ` Jinpu Wang
2019-07-09 13:19           ` Jason Gunthorpe
2019-07-09 14:17             ` Jinpu Wang
2019-07-09 21:27             ` Sagi Grimberg
2019-07-19 13:12               ` Danil Kipnis
2019-07-10 14:55     ` Danil Kipnis
2019-07-09 12:04   ` Jason Gunthorpe
2019-07-09 19:45   ` Sagi Grimberg
2019-07-10 13:55     ` Jason Gunthorpe
2019-07-10 16:25       ` Sagi Grimberg
2019-07-10 17:25         ` Jason Gunthorpe
2019-07-10 19:11           ` Sagi Grimberg
2019-07-11  7:27             ` Danil Kipnis
2019-07-11  8:54     ` Danil Kipnis
2019-07-12  0:22       ` Sagi Grimberg
2019-07-12  7:57         ` Jinpu Wang
2019-07-12 19:40           ` Sagi Grimberg
2019-07-15 11:21             ` Jinpu Wang
2019-07-12 10:58         ` Danil Kipnis
     [not found] ` <20190620150337.7847-26-jinpuwang@gmail.com>
2019-07-09 15:10   ` [PATCH v4 25/25] MAINTAINERS: Add maintainer for IBNBD/IBTRS modules Leon Romanovsky
2019-07-09 15:18     ` Jinpu Wang
2019-07-09 15:51       ` Leon Romanovsky
2019-09-13 23:56   ` Bart Van Assche
2019-09-19 10:30     ` Jinpu Wang
     [not found] ` <20190620150337.7847-16-jinpuwang@gmail.com>
2019-09-13 22:10   ` [PATCH v4 15/25] ibnbd: private headers with IBNBD protocol structs and helpers Bart Van Assche
2019-09-15 14:30     ` Jinpu Wang
2019-09-16  5:27       ` Leon Romanovsky
2019-09-16 13:45         ` Bart Van Assche
2019-09-17 15:41           ` Leon Romanovsky
2019-09-17 15:52             ` Jinpu Wang
2019-09-16  7:08       ` Danil Kipnis
2019-09-16 14:57       ` Jinpu Wang
2019-09-16 17:25         ` Bart Van Assche
2019-09-17 12:27           ` Jinpu Wang
2019-09-16 15:39       ` Jinpu Wang
2019-09-18 15:26         ` Bart Van Assche
2019-09-18 16:11           ` Jinpu Wang
     [not found] ` <20190620150337.7847-17-jinpuwang@gmail.com>
2019-09-13 22:25   ` [PATCH v4 16/25] ibnbd: client: private header with client structs and functions Bart Van Assche
2019-09-17 16:36     ` Jinpu Wang
2019-09-25 23:43       ` Danil Kipnis
2019-09-26 10:00         ` Jinpu Wang
     [not found] ` <20190620150337.7847-18-jinpuwang@gmail.com>
2019-09-13 23:46   ` [PATCH v4 17/25] ibnbd: client: main functionality Bart Van Assche
2019-09-16 14:17     ` Danil Kipnis
2019-09-16 16:46       ` Bart Van Assche
2019-09-17 11:39         ` Danil Kipnis [this message]
2019-09-18  7:14           ` Danil Kipnis
2019-09-18 15:47             ` Bart Van Assche
2019-09-20  8:29               ` Danil Kipnis
2019-09-25 22:26               ` Danil Kipnis
2019-09-26  9:55                 ` Roman Penyaev
2019-09-26 15:01                   ` Bart Van Assche
2019-09-27  8:52                     ` Roman Penyaev
2019-09-27  9:32                       ` Danil Kipnis
2019-09-27 12:18                         ` Danil Kipnis
2019-09-27 16:37                       ` Bart Van Assche
2019-09-27 16:50                         ` Roman Penyaev
2019-09-27 17:16                           ` Bart Van Assche
2019-09-17 13:09     ` Jinpu Wang
2019-09-17 16:46       ` Bart Van Assche
2019-09-18 12:02         ` Jinpu Wang
2019-09-18 16:05     ` Jinpu Wang
2019-09-14  0:00   ` Bart Van Assche
     [not found] ` <20190620150337.7847-25-jinpuwang@gmail.com>
2019-09-13 23:58   ` [PATCH v4 24/25] ibnbd: a bit of documentation Bart Van Assche
2019-09-18 12:22     ` Jinpu Wang
     [not found] ` <20190620150337.7847-19-jinpuwang@gmail.com>
2019-09-18 16:28   ` [PATCH v4 18/25] ibnbd: client: sysfs interface functions Bart Van Assche
2019-09-19 15:55     ` Jinpu Wang
     [not found] ` <20190620150337.7847-21-jinpuwang@gmail.com>
2019-09-18 17:41   ` [PATCH v4 20/25] ibnbd: server: main functionality Bart Van Assche
2019-09-20  7:36     ` Danil Kipnis
2019-09-20 15:42       ` Bart Van Assche
2019-09-23 15:19         ` Danil Kipnis
     [not found] ` <20190620150337.7847-22-jinpuwang@gmail.com>
2019-09-18 21:46   ` [PATCH v4 21/25] ibnbd: server: functionality for IO submission to file or block dev Bart Van Assche
2019-09-26 14:04     ` Jinpu Wang
2019-09-26 15:11       ` Bart Van Assche
2019-09-26 15:25         ` Danil Kipnis
2019-09-26 15:29           ` Bart Van Assche
2019-09-26 15:38             ` Danil Kipnis
2019-09-26 15:42               ` Jinpu Wang
     [not found] ` <20190620150337.7847-3-jinpuwang@gmail.com>
2019-09-23 17:44   ` [PATCH v4 02/25] ibtrs: public interface header to establish RDMA connections Bart Van Assche
2019-09-25 10:20     ` Danil Kipnis
2019-09-25 15:38       ` Bart Van Assche
     [not found] ` <20190620150337.7847-7-jinpuwang@gmail.com>
2019-09-23 21:51   ` [PATCH v4 06/25] ibtrs: client: main functionality Bart Van Assche
2019-09-25 17:36     ` Danil Kipnis
2019-09-25 18:55       ` Bart Van Assche
2019-09-25 20:50         ` Danil Kipnis
2019-09-25 21:08           ` Bart Van Assche
2019-09-25 21:16             ` Bart Van Assche
2019-09-25 22:53             ` Danil Kipnis
2019-09-25 23:21               ` Bart Van Assche
2019-09-26  9:16                 ` Danil Kipnis
     [not found] ` <20190620150337.7847-4-jinpuwang@gmail.com>
2019-09-23 22:50   ` [PATCH v4 03/25] ibtrs: private headers with IBTRS protocol structs and helpers Bart Van Assche
2019-09-25 21:45     ` Danil Kipnis
2019-09-25 21:57       ` Bart Van Assche
2019-09-27  8:56     ` Jinpu Wang
     [not found] ` <20190620150337.7847-5-jinpuwang@gmail.com>
2019-09-23 23:03   ` [PATCH v4 04/25] ibtrs: core: lib functions shared between client and server modules Bart Van Assche
2019-09-27 10:13     ` Jinpu Wang
     [not found] ` <20190620150337.7847-6-jinpuwang@gmail.com>
2019-09-23 23:05   ` [PATCH v4 05/25] ibtrs: client: private header with client structs and functions Bart Van Assche
2019-09-27 10:18     ` Jinpu Wang
     [not found] ` <20190620150337.7847-8-jinpuwang@gmail.com>
2019-09-23 23:15   ` [PATCH v4 07/25] ibtrs: client: statistics functions Bart Van Assche
2019-09-27 12:00     ` Jinpu Wang
     [not found] ` <20190620150337.7847-10-jinpuwang@gmail.com>
2019-09-23 23:21   ` [PATCH v4 09/25] ibtrs: server: private header with server structs and functions Bart Van Assche
2019-09-27 12:04     ` Jinpu Wang
     [not found] ` <20190620150337.7847-11-jinpuwang@gmail.com>
2019-09-23 23:49   ` [PATCH v4 10/25] ibtrs: server: main functionality Bart Van Assche
2019-09-27 15:03     ` Jinpu Wang
2019-09-27 15:11       ` Bart Van Assche
2019-09-27 15:19         ` Jinpu Wang
     [not found] ` <20190620150337.7847-12-jinpuwang@gmail.com>
2019-09-23 23:56   ` [PATCH v4 11/25] ibtrs: server: statistics functions Bart Van Assche
2019-10-02 15:15     ` Jinpu Wang
2019-10-02 15:42       ` Leon Romanovsky
2019-10-02 15:45         ` Jinpu Wang
2019-10-02 16:00           ` Leon Romanovsky
     [not found] ` <20190620150337.7847-13-jinpuwang@gmail.com>
2019-09-24  0:00   ` [PATCH v4 12/25] ibtrs: server: sysfs interface functions Bart Van Assche
2019-10-02 15:11     ` Jinpu Wang

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAHg0HuwFTVsCNHbiXW20P6hQ3c-P_p5tB6dYKtOW=_euWEvLnA@mail.gmail.com' \
    --to=danil.kipnis@cloud.ionos.com \
    --cc=axboe@kernel.dk \
    --cc=bvanassche@acm.org \
    --cc=dledford@redhat.com \
    --cc=hch@infradead.org \
    --cc=jgg@mellanox.com \
    --cc=jinpu.wang@cloud.ionos.com \
    --cc=jinpuwang@gmail.com \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=r.peniaev@gmail.com \
    --cc=rpenyaev@suse.de \
    --cc=sagi@grimberg.me \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-RDMA Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-rdma/0 linux-rdma/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-rdma linux-rdma/ https://lore.kernel.org/linux-rdma \
		linux-rdma@vger.kernel.org linux-rdma@archiver.kernel.org
	public-inbox-index linux-rdma

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-rdma


AGPL code for this site: git clone https://public-inbox.org/ public-inbox