From: Danil Kipnis
Date: Wed, 10 Jul 2019 16:55:24 +0200
Subject: Re: [PATCH v4 00/25] InfiniBand Transport (IBTRS) and Network Block Device (IBNBD)
To: Leon Romanovsky
Cc: Jack Wang, linux-block@vger.kernel.org, linux-rdma@vger.kernel.org,
 axboe@kernel.dk, Christoph Hellwig, Sagi Grimberg, bvanassche@acm.org,
 jgg@mellanox.com, dledford@redhat.com, Roman Pen, gregkh@linuxfoundation.org
In-Reply-To: <20190709110036.GQ7034@mtr-leonro.mtl.com>
X-Mailing-List: linux-block@vger.kernel.org

Hi Leon,

thanks for the feedback!

On Tue, Jul 9, 2019 at 1:00 PM Leon Romanovsky wrote:
>
> On Tue, Jul 09, 2019 at 11:55:03AM +0200, Danil Kipnis wrote:
> > Hallo Doug, Hallo Jason, Hallo Jens, Hallo Greg,
> >
> > Could you please provide some feedback to the IBNBD driver and the
> > IBTRS library?
> > So far we have addressed all the requests provided by the community
> > and continue to keep our code up to date with the upstream kernel,
> > while maintaining an extra compatibility layer for older kernels in
> > our out-of-tree repository.
> > I understand that SRP and NVMEoF, which are in the kernel already, do
> > provide equivalent functionality for the majority of the use cases.
> > IBNBD on the other hand is showing higher performance and, more
> > importantly, includes IBTRS - a general-purpose library to establish
> > connections and transport BIO-like read/write sg-lists over RDMA -
> > while SRP is targeting SCSI and NVMEoF is addressing NVME. While I
> > believe IBNBD does meet the kernel coding standards, it doesn't have
> > a lot of users, while SRP and NVMEoF are widely accepted. Do you
> > think it would make sense for us to rework our patchset and try
> > pushing it into the staging tree first, so that we can prove IBNBD is
> > well maintained and beneficial for the eco-system, and find a proper
> > location for it within the block/rdma subsystems? This would make it
> > easier for people to try it out and would also be a huge step for us
> > in terms of maintenance effort.
> > The names IBNBD and IBTRS are in fact misleading. IBTRS sits on top
> > of RDMA and is not bound to IB (we will evaluate IBTRS with RoCE in
> > the near future). Do you think it would make sense to rename the
> > driver to RNBD/RTRS?
>
> It is better to avoid the "staging" tree, because it will lack the
> attention of relevant people and your efforts will be lost once you try
> to move out of staging. We all remember Lustre and don't want to see
> that again.
>
> Back then, you were asked to provide support for the claim of
> performance superiority.

I have only theories of why IBNBD is showing better numbers than NVMEoF:

1. The way we utilize the MQ framework in IBNBD.
   We promise to have queue_depth (say 512) requests on each of the
   num_cpus hardware queues of each device, but in fact we only have
   queue_depth for the whole "session" toward a given server. The moment
   we have queue_depth inflights we need to stop the queue (on a device
   on a cpu) on which we get more requests, and start it again after
   some requests have completed. We maintain per-cpu lists of stopped HW
   queues and a bitmap showing which lists are not empty, etc., and wake
   the queues up in a round-robin fashion to avoid starvation of any
   device.

2. We only do RDMA writes with immediate data.
   A server reserves queue_depth buffers of max_io_size for a given
   client. The client manages those itself. The client uses the imm
   field to tell the server which buffer has been written (and where),
   and the server uses the imm field to send back the errno. If our
   max_io_size is 64K, queue_depth is 512 and the client only ever
   issues 4K IOs, then 60K*512 of memory is wasted. On the other hand,
   we do no buffer allocation/registration in the IO path on the server
   side. The server sends the rdma addresses and keys of those
   preregistered buffers on connection establishment and
   deallocates/unregisters them when a session is closed. That's for
   writes. For reads, the client registers the user buffers (using fast
   registration) and sends the addresses and keys to the server (with an
   rdma write with imm). The server rdma-writes into those buffers. The
   client does the unregistering/invalidation and completes the request.

> Can you please share any numbers with us?

Apart from github
(https://github.com/ionos-enterprise/ibnbd/tree/master/performance/v4-v5.2-rc3),
the performance results for v5.2-rc3 on two different systems can be
accessed under dcd.ionos.com/ibnbd-performance-report. The page allows
one to filter out the test scenarios of interest for comparison.

> Thanks