linux-block.vger.kernel.org archive mirror
From: Boaz Harrosh <ooo@electrozaur.com>
To: Christoph Hellwig <hch@lst.de>, Douglas Gilbert <dgilbert@interlog.com>
Cc: axboe@kernel.dk, martin.petersen@oracle.com,
	Johannes Thumshirn <jthumshirn@suse.de>,
	Benjamin Block <bblock@linux.vnet.ibm.com>,
	linux-scsi@vger.kernel.org, linux-block@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: remove exofs, the T10 OSD code and block/scsi bidi support V3
Date: Sun, 23 Dec 2018 14:57:07 +0200
Message-ID: <406d1a96-2a97-2e35-e52e-22525555fc09@electrozaur.com>
In-Reply-To: <20181220072656.GA10011@lst.de>

On 20/12/18 09:26, Christoph Hellwig wrote:
> On Wed, Dec 19, 2018 at 09:01:53PM -0500, Douglas Gilbert wrote:
>>>   1) reduce the size of every kernel with block layer support, and
>>>      even more for every kernel with scsi support
>>
>> By proposing the removal of bidi support from the block layer, it isn't
>> just the SCSI subsystem that will be impacted. Those NVMe documents
>> that you referred me to earlier in the year, in the command tables
>> in 1.3c and earlier you have noticed the 2 bit direction field and
>> what 11b means? Even if there aren't any bidi NVMe commands *** yet,
>> the fact that NVMe's 64 byte command format has provision for 4
>> (not 2) independent data transfers (data + meta, for each direction).
>> Surely NVMe will sooner or later take advantage of those ... a
>> command like READ GATHERED comes to mind.
> 
> NVMe on the other hand does have support for separate read and write
> buffers as in the current SCSI bidi support, as it encodes the data
> transfers in that SQE.  So IFF NVMe does bidi commands it would have
> to use a single buffer for data in/out, 

There is no such thing as a "buffer". There is first a bio and, after
virtual-to-IOMMU mapping, a scatter-gather list. All of these are currently
governed by a struct request.
A request, a bio, and an sgl each have a single direction, and all the APIs
expect a single direction.

All BIDI did was to say: let's not change any API or structure, but just
use two of them at the same time.

The only ones any the wiser are the very high-level user and the very
low-level HW driver, like iSCSI. None of the middleware was ever touched.

From the point of view of a bidi target, say an OSD, the whole stream looks
like a single "buffer" on the wire, where some of it is read and some of it
is written to.

> which can be easily done

?? Did you try? It will take much more than an additional pointer, sir.

> in the block layer without the current bidi support that chains
> two struct request instances for data in and data out.
> 

That was the whole trick: not changing a single API or structure,
just having two of the same thing, which we already know how to handle.

>>>   2) reduce the size of the critical struct request structure by
>>>      128 bits, thus reducing the memory used by every blk-mq driver
>>>      significantly, never mind the cache effects
>>
>> Hmm, one pointer (that is null in the non-bidi case) should be enough,
>> that's 64 or 32 bits.
> 
> Due to the way we use request chaining we need two fields at the
> moment.  ->special and ->next_rq.  

No! ->special has nothing to do with bidi. ->special is a field to be
used by LLDs only, and is not to be touched by the block layer, the
transports, or high-level users.

Request has a single ->next_rq for bidi, and it could be eliminated by
sharing space with the elevator info. Do you want a patch?

(So in effect it could take 0 bytes, and yes, a little bit of code.)

> If we'd refactor the whole thing
> for the basically non-existent user we could indeed probably get it
> down to a single pointer. 
> 
>> While on the subject of bidi, the order of transfers: is the data-out
>> (to the target) always before the data-in or is it the target device
>> that decides (depending on the semantics of the command) who is first?
> 
> The way I read SAM data needs to be transferred to the device for
> processing first, then the processing occurs and then it is transferred
> out, so the order seems fixed.
> 

Not sure what "SAM" above refers to. But for most of the BIDI commands I
know, OSD and otherwise, the order is command specific, and many times it
is done in parallel:
read some bits, then write some bits, rinse and repeat ...

(You see, in SCSI the whole OUT buffer is part of the actual CDB, so in
 effect any READ is a BIDI. The novelty here is the variable sizes and the
 SW stack memory targets for the different operations.)

>>
>> Doug Gilbert
>>
>>  *** there could already be vendor specific bidi NVMe commands out
>>     there (ditto for SCSI)
> 
> For NVMe they'd need to transfer data in and out in the same buffer
> to sort work, and even then only if we don't happen to be bounce
> buffering using swiotlb, or using a network transport.  Similarly for
> SCSI only iSCSI at the moment supports bidi CDBs, so we could have
> applications using vendor specific bidi commands on iSCSI, which
> is exactly what we're trying to find out, but it is a bit of a very
> niche use case.
> 

Again, bidi works NOW. I have yet to see the big gain of throwing it
out.

Jai Maa
Boaz



Thread overview: 23+ messages
2018-11-11 13:32 remove exofs, the T10 OSD code and block/scsi bidi support V3 Christoph Hellwig
2018-11-11 13:32 ` [PATCH 1/8] fs: remove exofs Christoph Hellwig
2018-11-11 13:32 ` [PATCH 2/8] scsi: remove the SCSI OSD library Christoph Hellwig
2018-11-11 13:32 ` [PATCH 3/8] scsi: remove bidirectional command support Christoph Hellwig
2018-11-11 13:32 ` [PATCH 4/8] scsi: stop setting up request->special Christoph Hellwig
2018-11-11 13:32 ` [PATCH 5/8] bsg: refactor bsg_ioctl Christoph Hellwig
2018-11-13 14:05   ` Benjamin Block
2018-11-19 12:29     ` Avri Altman
2018-11-11 13:32 ` [PATCH 6/8] bsg-lib: handle bidi requests without block layer help Christoph Hellwig
2018-11-13 14:35   ` Benjamin Block
2018-11-14 15:48     ` Christoph Hellwig
2018-11-14 16:44       ` Benjamin Block
2018-11-19 12:31   ` Avri Altman
2018-11-11 13:32 ` [PATCH 7/8] block: remove req->special Christoph Hellwig
2018-11-11 13:32 ` [PATCH 8/8] block: remove bidi support Christoph Hellwig
2018-11-11 16:35 ` remove exofs, the T10 OSD code and block/scsi bidi support V3 Douglas Gilbert
2018-11-26 17:11 ` Boaz Harrosh
2018-12-19 14:43   ` Christoph Hellwig
2018-12-20  2:01     ` Douglas Gilbert
2018-12-20  7:26       ` Christoph Hellwig
2018-12-23 12:57         ` Boaz Harrosh [this message]
2018-12-20 16:08       ` Elliott, Robert (Persistent Memory)
2018-12-23 12:17     ` Boaz Harrosh
