From: Sagi Grimberg <email@example.com>
To: Danil Kipnis <firstname.lastname@example.org>
Cc: Jack Wang <email@example.com>, firstname.lastname@example.org,
	email@example.com, firstname.lastname@example.org,
	Christoph Hellwig <email@example.com>, firstname.lastname@example.org,
	email@example.com, firstname.lastname@example.org,
	Roman Pen <email@example.com>, firstname.lastname@example.org
Subject: Re: [PATCH v4 00/25] InfiniBand Transport (IBTRS) and Network Block Device (IBNBD)
Date: Thu, 11 Jul 2019 17:22:26 -0700
Message-ID: <email@example.com>
In-Reply-To: <CAHg0HuxZvXH899=M4vC7BTH-bP2J35aTwsGhiGoC8AamD8gOyA@mail.gmail.com>

>> My main issues which were raised before are:
>> - IMO there isn't any justification for this ibtrs layering separation
>>   given that the only user of this is your ibnbd. Unless you are
>>   trying to submit another consumer, you should avoid adding another
>>   subsystem that is not really general purpose.
>
> We designed ibtrs not only with IBNBD in mind but also as the
> transport layer for a distributed SDS. We'd like to be able to do what
> ceph is capable of (automatic up/down scaling of the storage cluster,
> automatic recovery) but using in-kernel rdma-based IO transport
> drivers, thin-provisioned volume managers, etc. to keep the highest
> possible performance.

Sounds lovely, but it is still very much bound to your ibnbd. And that
part is not included in the patch set, so I still don't see why this
should be considered a "generic" transport subsystem (it clearly isn't).

> All in all ibtrs is a library to establish a "fat", multipath,
> auto-reconnecting connection between two hosts on top of rdma,
> optimized for transport of IO traffic.

It also dictates a wire protocol, which makes it useless to pretty much
any other consumer. Personally, I don't see how this library would ever
be used outside of your ibnbd.

>> - ibtrs in general is using almost no infrastructure from the existing
>>   kernel subsystems. Examples are:
>>   - tag allocation mechanism (which I'm not clear why it's needed)
>
> As you correctly noticed, our client manages the buffers allocated and
> registered by the server on connection establishment. Our tags are
> just a mechanism to take and release those buffers for incoming
> requests on the client side. Since the buffers allocated by the server
> are shared between all the devices mapped from that server and all
> their HW queues (each having num_cpus of them), the mechanism behind
> get_tag/put_tag also takes care of fairness.

We have infrastructure for this: sbitmaps.
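To make that concrete, here is a rough, hypothetical sketch of a
per-session buffer pool built on sbitmap_queue (the ibtrs_* names are
invented and the init parameters are only a guess). sbitmap_queue
already provides per-cpu allocation hints and batched wakeups across
its wait queues, which covers much of the fairness your
get_tag/put_tag path implements by hand; waiting when the pool is
empty stays with the caller, as in blk-mq.

#include <linux/sbitmap.h>

/* Hypothetical per-session pool of server buffer slots (names invented). */
struct ibtrs_buf_pool {
	struct sbitmap_queue slots;	/* one bit per pre-registered server buffer */
};

static int ibtrs_buf_pool_init(struct ibtrs_buf_pool *p, unsigned int queue_depth)
{
	/* shift == -1 lets sbitmap pick a sensible per-word size */
	return sbitmap_queue_init_node(&p->slots, queue_depth, -1, false,
				       GFP_KERNEL, NUMA_NO_NODE);
}

/* Take a buffer slot; returns the slot number, or -1 if the shared pool is exhausted */
static int ibtrs_buf_get(struct ibtrs_buf_pool *p)
{
	return __sbitmap_queue_get(&p->slots);
}

/* Release the slot and wake up anyone waiting on a full pool */
static void ibtrs_buf_put(struct ibtrs_buf_pool *p, int slot)
{
	/* the cpu argument is only an allocation hint */
	sbitmap_queue_clear(&p->slots, slot, raw_smp_processor_id());
}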
>> - rdma rw abstraction similar to what we have in the core
>
> On the one hand we have only a single IO-related function:
> ibtrs_clt_request(READ/WRITE, session, ...), which executes an rdma
> write with imm, or requests an rdma write with imm to be executed by
> the server.

Surely you can enhance the rw API to have imm support?

> On the other hand we provide an abstraction to establish and manage
> what we call a "session", which consists of multiple paths (to do
> failover and multipath with different policies), where each path
> consists of num_cpu rdma connections.

That's fine, but it doesn't mean that it also needs to re-write
infrastructure that we already have.

> Once you have established a session you can add or remove paths from
> it on the fly. In case the connection to the server is lost, the
> client periodically attempts to reconnect automatically. On the
> server side you just get sg-lists with a direction, READ or WRITE, as
> requested by the client. We designed this interface not only as the
> minimum required to build a block device on top of rdma but also with
> a distributed raid in mind.

I suggest you take a look at the rw API and use that in your transport.

>> Another question, from what I understand from the code, the client
>> always rdma_writes data on writes (with imm) from a remote pool of
>> server buffers dedicated to it. Essentially all writes are immediate
>> (no rdma reads ever). How is that different than using send wrs to a
>> set of pre-posted recv buffers (like all others are doing)? Is it
>> faster?
>
> At the very beginning of the project we did some measurements and saw
> that it is faster. I'm not sure if this is still true.

It's not significantly faster (I can't imagine why it would be). What
could make a difference is probably the fact that you never do rdma
reads for I/O writes, which might be better. Also perhaps the fact
that you normally don't wait for send completions before completing
I/O (which is broken), and the fact that you batch recv operations. I
would be interested to understand what indeed makes ibnbd run faster,
though.

>> Also, given that the server pre-allocates a substantial amount of
>> memory for each connection, are the requirements on the server side
>> documented? Usually kernel implementations (especially upstream ones)
>> will avoid imposing such large longstanding memory requirements on
>> the system by default. I don't have a firm stand on this, but wanted
>> to highlight this as you are sending this for upstream inclusion.
>
> We definitely need to stress that somewhere. We will include it in the
> readme and add it to the cover letter next time. Our memory management
> is indeed basically absent in favor of performance: the server
> reserves queue_depth buffers of, say, 512K each. Each buffer is used
> by the client for a single IO only, no matter how big the request is.
> So if the client only issues 4K IOs, we waste 508*queue_depth K of
> memory. We were aiming for the lowest possible latency from the
> beginning. It is probably possible to implement some clever allocator
> on the server side which wouldn't affect the performance a lot.

Or you can fall back to rdma_read like the rest of the ulps.
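For illustration only, a rough sketch of what pulling WRITE payloads
with the core rw API (include/rdma/rw.h) could look like on the server
side, roughly the way nvmet-rdma does it. The ibtrs_srv_* names and
fields are invented, and the remote_addr/rkey would have to come from
the client's request describing its buffer:

#include <rdma/rw.h>

struct ibtrs_srv_io {
	struct rdma_rw_ctx	rw;
	struct ib_cqe		read_done;	/* .done set by the caller */
	struct scatterlist	*sg;
	u32			sg_cnt;
};

static int ibtrs_srv_pull_write_data(struct ibtrs_srv_io *io, struct ib_qp *qp,
				     u8 port_num, u64 remote_addr, u32 rkey)
{
	int ret;

	/* Map the local sg list and build the RDMA READ work requests */
	ret = rdma_rw_ctx_init(&io->rw, qp, port_num, io->sg, io->sg_cnt, 0,
			       remote_addr, rkey, DMA_FROM_DEVICE);
	if (ret < 0)
		return ret;

	/* io->read_done.done() is invoked once the last READ completes */
	ret = rdma_rw_ctx_post(&io->rw, qp, port_num, &io->read_done, NULL);
	if (ret)
		rdma_rw_ctx_destroy(&io->rw, qp, port_num, io->sg, io->sg_cnt,
				    DMA_FROM_DEVICE);
	return ret;
}

The ctx decides between plain READs and MR-based work requests
depending on the device, so the ulp doesn't have to care, and the
server no longer needs to pin a fixed pool of maximum-sized buffers
per connection.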