linux-kernel.vger.kernel.org archive mirror
From: Hannes Reinecke <hare@suse.de>
To: John Garry <john.garry@huawei.com>,
	"axboe@kernel.dk" <axboe@kernel.dk>,
	"jejb@linux.ibm.com" <jejb@linux.ibm.com>,
	"martin.petersen@oracle.com" <martin.petersen@oracle.com>
Cc: "linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>,
	"ming.lei@redhat.com" <ming.lei@redhat.com>,
	"hare@suse.com" <hare@suse.com>,
	"bvanassche@acm.org" <bvanassche@acm.org>,
	"chenxiang (M)" <chenxiang66@hisilicon.com>
Subject: Re: [PATCH RFC 3/5] blk-mq: Facilitate a shared tags per tagset
Date: Wed, 13 Nov 2019 16:38:22 +0100
Message-ID: <02056612-a958-7b05-3c54-bb2fa69bc493@suse.de>
In-Reply-To: <2cbf591c-8284-8499-7804-e7078cf274d2@huawei.com>

On 11/13/19 3:57 PM, John Garry wrote:
> On 13/11/2019 14:06, Hannes Reinecke wrote:
>> On 11/13/19 2:36 PM, John Garry wrote:
>>> Some SCSI HBAs (such as HPSA, megaraid, mpt3sas, hisi_sas_v3 ..) support
>>> multiple reply queues with single hostwide tags.
>>>
>>> In addition, these drivers want to use interrupt assignment in
>>> pci_alloc_irq_vectors(PCI_IRQ_AFFINITY). However, as discussed in [0],
>>> CPU hotplug may cause in-flight IO completion to not be serviced when an
>>> interrupt is shut down.
>>>
>>> To solve that problem, Ming's patchset to drain hctx's should ensure no
>>> IOs are missed in-flight [1].
>>>
>>> However, to take advantage of that patchset, we need to map the HBA HW
>>> queues to blk mq hctx's; to do that, we need to expose the HBA HW
>>> queues.
>>>
>>> In making that transition, the per-SCSI command request tags are no
>>> longer unique per Scsi host - they are just unique per hctx. As such, the
>>> HBA LLDD would have to generate this tag internally, which has a certain
>>> performance overhead.
>>>
>>> However another problem is that blk mq assumes the host may accept
>>> (Scsi_host.can_queue * #hw queue) commands. In [2], we removed the Scsi
>>> host busy counter, which would stop the LLDD being sent more than
>>> .can_queue commands; however, we should still ensure that the block layer
>>> does not issue more than .can_queue commands to the Scsi host.
>>>
>>> To solve this problem, introduce a shared tags per blk_mq_tag_set, which
>>> may be requested when allocating the tagset.
>>>
>>> New flag BLK_MQ_F_TAG_HCTX_SHARED should be set when requesting the
>>> tagset.
>>>
>>> This is based on work originally from Ming Lei in [3].
>>>
>>> [0] https://lore.kernel.org/linux-block/alpine.DEB.2.21.1904051331270.1802@nanos.tec.linutronix.de/
>>> [1] https://lore.kernel.org/linux-block/20191014015043.25029-1-ming.lei@redhat.com/
>>> [2] https://lore.kernel.org/linux-scsi/20191025065855.6309-1-ming.lei@redhat.com/
>>> [3] https://lore.kernel.org/linux-block/20190531022801.10003-1-ming.lei@redhat.com/
>>>
>>> Signed-off-by: John Garry <john.garry@huawei.com>
>>> ---
>>>   block/blk-core.c       |  1 +
>>>   block/blk-flush.c      |  2 +
>>>   block/blk-mq-debugfs.c |  2 +-
>>>   block/blk-mq-tag.c     | 85 ++++++++++++++++++++++++++++++++++++++++++
>>>   block/blk-mq-tag.h     |  1 +
>>>   block/blk-mq.c         | 61 +++++++++++++++++++++++++-----
>>>   block/blk-mq.h         |  9 +++++
>>>   include/linux/blk-mq.h |  3 ++
>>>   include/linux/blkdev.h |  1 +
>>>   9 files changed, 155 insertions(+), 10 deletions(-)
>>>
>> [ .. ]
>>> @@ -396,15 +398,17 @@ static struct request *blk_mq_get_request(struct request_queue *q,
>>>          blk_mq_tag_busy(data->hctx);
>>>      }
>>>
>>> -    tag = blk_mq_get_tag(data);
>>> -    if (tag == BLK_MQ_TAG_FAIL) {
>>> -        if (clear_ctx_on_error)
>>> -            data->ctx = NULL;
>>> -        blk_queue_exit(q);
>>> -        return NULL;
>>> +    if (data->hctx->shared_tags) {
>>> +        shared_tag = blk_mq_get_shared_tag(data);
>>> +        if (shared_tag == BLK_MQ_TAG_FAIL)
>>> +            goto err_shared_tag;
>>>      }
>>>
>>> -    rq = blk_mq_rq_ctx_init(data, tag, data->cmd_flags, alloc_time_ns);
>>> +    tag = blk_mq_get_tag(data);
>>> +    if (tag == BLK_MQ_TAG_FAIL)
>>> +        goto err_tag;
>>> +
>>> +    rq = blk_mq_rq_ctx_init(data, tag, shared_tag, data->cmd_flags, alloc_time_ns);
>>>      if (!op_is_flush(data->cmd_flags)) {
>>>          rq->elv.icq = NULL;
>>>          if (e && e->type->ops.prepare_request) {
> 
> Hi Hannes,
> 
>> Why do you need to keep a parallel tag accounting between 'normal' and
>> 'shared' tags here?
>> Isn't it sufficient to get a shared tag only, and use that in lieu of the
>> 'real' one?
> 
> In theory, yes. Just the 'shared' tag should be adequate.
> 
> A problem I see with this approach is that we lose the identity of which
> tags are allocated for each hctx. As an example for this, consider
> blk_mq_queue_tag_busy_iter(), which iterates the bits for each hctx.
> Now, if you're just using shared tags only, that wouldn't work.
> 
> Consider blk_mq_can_queue() as another example - this tells us if a hctx
> has any bits unset, whereas with shared tags only it would tell us whether
> any bits are unset across all queues, and this change in semantics could
> break things. At a glance, function __blk_mq_tag_idle() also looks
> problematic.
> 
> And this is where it becomes messy to implement.
> 
Oh, my. Indeed, that's correct.

But then we don't really care _which_ shared tag is assigned; so
wouldn't we be better off by just having an atomic counter here?
Cache locality will be blown anyway ...
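
To illustrate the atomic-counter idea, here is a minimal userspace sketch
in C11. It is not code from this patchset: struct host_budget,
host_budget_get() and host_budget_put() are hypothetical names, the
can_queue field merely stands in for Scsi_Host.can_queue, and C11 atomics
stand in for whatever kernel atomic helpers would really be used.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical host-wide budget; can_queue stands in for Scsi_Host.can_queue. */
struct host_budget {
	atomic_int in_flight;	/* commands currently issued to the host */
	int can_queue;		/* host-wide limit on in-flight commands */
};

static void host_budget_init(struct host_budget *b, int can_queue)
{
	atomic_init(&b->in_flight, 0);
	b->can_queue = can_queue;
}

/* Try to reserve one slot; returns false when the host is already full. */
static bool host_budget_get(struct host_budget *b)
{
	int cur = atomic_load_explicit(&b->in_flight, memory_order_relaxed);

	while (cur < b->can_queue) {
		/* On failure, cur is refreshed with the current counter value. */
		if (atomic_compare_exchange_weak_explicit(&b->in_flight,
							  &cur, cur + 1,
							  memory_order_acquire,
							  memory_order_relaxed))
			return true;
	}
	return false;
}

/* Release a slot previously reserved with host_budget_get(). */
static void host_budget_put(struct host_budget *b)
{
	atomic_fetch_sub_explicit(&b->in_flight, 1, memory_order_release);
}
```

In such a scheme the per-hctx tag would still be allocated as before, with
host_budget_get() called first and host_budget_put() called on completion
or on the error path. Note that a counter only bounds the number of
in-flight commands; it does not hand out a host-unique tag value.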

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      Teamlead Storage & Networking
hare@suse.de			                  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 247165 (AG München), GF: Felix Imendörffer
