From: Kashyap Desai <kashyap.desai@broadcom.com>
To: John Garry <john.garry@huawei.com>
Cc: axboe@kernel.dk, jejb@linux.ibm.com, martin.petersen@oracle.com,
	ming.lei@redhat.com, bvanassche@acm.org, hare@suse.de,
	don.brace@microsemi.com, Sumit Saxena <sumit.saxena@broadcom.com>,
	hch@infradead.org,
	Shivasharan Srikanteshwara 
	<shivasharan.srikanteshwara@broadcom.com>,
	"chenxiang (M)" <chenxiang66@hisilicon.com>,
	linux-block@vger.kernel.org, linux-scsi@vger.kernel.org,
	esc.storagedev@microsemi.com, Hannes Reinecke <hare@suse.com>
Subject: RE: [PATCH RFC v6 08/10] megaraid_sas: switch fusion adapters to MQ
Date: Wed, 29 Apr 2020 21:20:49 +0530
Message-ID: <29f8062c1fccace73c45252073232917@mail.gmail.com>
In-Reply-To: <ca799ed3-0b18-2aa5-7d75-6eac5b0e6e7b@huawei.com>

> -----Original Message-----
> From: John Garry [mailto:john.garry@huawei.com]
> Sent: Wednesday, April 29, 2020 5:00 PM
> To: Kashyap Desai <kashyap.desai@broadcom.com>
> Cc: axboe@kernel.dk; jejb@linux.ibm.com; martin.petersen@oracle.com;
> ming.lei@redhat.com; bvanassche@acm.org; hare@suse.de;
> don.brace@microsemi.com; Sumit Saxena <sumit.saxena@broadcom.com>;
> hch@infradead.org; Shivasharan Srikanteshwara
> <shivasharan.srikanteshwara@broadcom.com>; chenxiang (M)
> <chenxiang66@hisilicon.com>; linux-block@vger.kernel.org; linux-
> scsi@vger.kernel.org; esc.storagedev@microsemi.com; Hannes Reinecke
> <hare@suse.com>
> Subject: Re: [PATCH RFC v6 08/10] megaraid_sas: switch fusion adapters to
> MQ
>
> On 28/04/2020 16:55, John Garry wrote:
> > Hi Kashyap,
> >
> >> I am using <mq-deadline>, which is the MQ version of the SQ deadline
> >> scheduler. For now we can just consider the case without a scheduler.
> >>
> >> I have some more findings. Maybe someone from the upstream community
> >> can connect the dots.
> >>
> >> #1. hctx_may_queue() throttles IO at the hctx level. This is
> >> effectively per-sdev throttling for the megaraid_sas driver, because
> >> it creates only one context - hctx0 - for each SCSI device.
> >>
> >> If the driver uses only one h/w queue, active_queues will always be
> >> steady.
> >> In my test there were 64 threads, so active_queues=64.
> >
> > So I figure that the 64 threads come from having 64 disks.
> >
> >> Even though the <fio> threads are shared among the allowed cpumask
> >> (the cpus_allowed_policy=shared option in fio), active_queues will
> >> always be 64 because we have only one h/w queue.
> >> All the logical CPUs are mapped to one h/w queue, so a thread moving
> >> from one CPU to another will not change active_queues per hctx.
> >>
> >> With this RFC, active_queues is now per hctx, and there are multiple
> >> hctx, but the tags are shared.
> >
> > Right, so we need a policy to divide up the shared tags across request
> > queues, based on the principle of fairness.
> >
>
> Hi Kashyap,
>
> How about this, which counts requests per request_queue for the shared
> sbitmap instead of per hctx (and, as such, does not need to aggregate
> them in hctx_may_queue()):

Hi John,

I had to add additional changes on top of your patch below:
active_queues should also be managed differently for the shared tag map
case. I added an extra flag to the queue_flags interface, which is per
request queue.

The combination of your patch and the one below resolves the fairness
issues. I see somewhat better results using "--cpus_allowed_policy=split",
but --cpus_allowed_policy=shared still has some issue, most likely to do
with fairness. If fairness is not managed properly, there is high
contention in the wait/wakeup handling of sbitmap.
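
As a side note on what "fair share" means here: hctx_may_queue() computes
a per-queue depth cap with a round-up division. Below is a minimal
standalone sketch of that arithmetic, with assumed numbers (a shared tag
depth of 4096, close to our can_queue, and the 64 active queues from my
test) - not taken from either patch:

#include <stdio.h>

int main(void)
{
	/* Assumed values, for illustration only */
	unsigned int sb_depth = 4096, users = 64;
	/* Same round-up division as hctx_may_queue() */
	unsigned int depth = (sb_depth + users - 1) / users;

	if (depth < 4)	/* "Allow at least some tags" */
		depth = 4;
	printf("fair share per queue: %u tags\n", depth);	/* prints 64 */
	return 0;
}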


===== Additional patch =====

diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 3719e1a..95bcb47 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -23,10 +23,20 @@
  */
 bool __blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx)
 {
-       if (!test_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state) &&
-           !test_and_set_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state))
-               atomic_inc(&hctx->tags->active_queues);
+       struct blk_mq_tags *tags = hctx->tags;
+       struct request_queue *q = hctx->queue;
+       struct sbitmap_queue *bt;

+       if (blk_mq_is_sbitmap_shared(q->tag_set)) {
+               bt = tags->bitmap_tags;
+               if (!test_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags) &&
+                   !test_and_set_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags))
+                       atomic_inc(&bt->active_queues_per_host);
+       } else {
+               if (!test_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state) &&
+                   !test_and_set_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state))
+                       atomic_inc(&hctx->tags->active_queues);
+       }
        return true;
 }

@@ -47,12 +57,20 @@ void blk_mq_tag_wakeup_all(struct blk_mq_tags *tags, bool include_reserve)
 void __blk_mq_tag_idle(struct blk_mq_hw_ctx *hctx)
 {
        struct blk_mq_tags *tags = hctx->tags;
+       struct sbitmap_queue *bt;
+       struct request_queue *q = hctx->queue;

-       if (!test_and_clear_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state))
-               return;
-
-       atomic_dec(&tags->active_queues);
+       if (blk_mq_is_sbitmap_shared(q->tag_set)) {
+               bt = tags->bitmap_tags;
+               if (!test_and_clear_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags))
+                       return;
+               atomic_dec(&bt->active_queues_per_host);
+       } else {
+               if (!test_and_clear_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state))
+                       return;
+
+               atomic_dec(&tags->active_queues);
+       }
        blk_mq_tag_wakeup_all(tags, false);
 }

@@ -65,12 +83,13 @@ static inline bool hctx_may_queue(struct blk_mq_alloc_data *data,
 {
        struct blk_mq_hw_ctx *hctx = data->hctx;
        struct request_queue *q = data->q;
+       struct blk_mq_tags *tags = hctx->tags;
        unsigned int depth, users;

        if (!hctx || !(hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED))
                return true;
-       if (!test_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state))
-               return true;
+       //if (!test_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state))
+       //      return true;

        /*
         * Don't try dividing an ant
@@ -78,7 +97,11 @@ static inline bool hctx_may_queue(struct blk_mq_alloc_data *data,
        if (bt->sb.depth == 1)
                return true;

-       users = atomic_read(&hctx->tags->active_queues);
+       if (blk_mq_is_sbitmap_shared(q->tag_set)) {
+               bt = tags->bitmap_tags;
+               users = atomic_read(&bt->active_queues_per_host);
+       } else
+               users = atomic_read(&hctx->tags->active_queues);
        if (!users)
                return true;

diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 2bc9998..7049800 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -613,6 +613,7 @@ struct request_queue {
 #define QUEUE_FLAG_PCI_P2PDMA  25      /* device supports PCI p2p requests */
 #define QUEUE_FLAG_ZONE_RESETALL 26    /* supports Zone Reset All */
 #define QUEUE_FLAG_RQ_ALLOC_TIME 27    /* record rq->alloc_time_ns */
+#define QUEUE_FLAG_HCTX_ACTIVE 28      /* at least one hctx is active */

 #define QUEUE_FLAG_MQ_DEFAULT  ((1 << QUEUE_FLAG_IO_STAT) |            \
                                 (1 << QUEUE_FLAG_SAME_COMP))
diff --git a/include/linux/sbitmap.h b/include/linux/sbitmap.h
index e40d019..fb44925 100644
--- a/include/linux/sbitmap.h
+++ b/include/linux/sbitmap.h
@@ -139,6 +139,8 @@ struct sbitmap_queue {
         * sbitmap_queue_get_shallow() or __sbitmap_queue_get_shallow().
         */
        unsigned int min_shallow_depth;
+
+       atomic_t active_queues_per_host;
 };

 /**
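
For completeness, the patch above relies on the blk_mq_is_sbitmap_shared()
helper from your shared-sbitmap RFC series. A minimal sketch of the
definition I am assuming here (this is only my reading of the RFC, using
the BLK_MQ_F_TAG_HCTX_SHARED flag mentioned later in this thread, so treat
it as illustrative rather than the actual patch):

static inline bool blk_mq_is_sbitmap_shared(struct blk_mq_tag_set *tag_set)
{
	/* A tag set with a hostwide shared sbitmap carries this flag */
	return tag_set->flags & BLK_MQ_F_TAG_HCTX_SHARED;
}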

>
> ---->8
>
> diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
> index ba68d934e3ca..0c8adecba219 100644
> --- a/block/blk-mq-tag.c
> +++ b/block/blk-mq-tag.c
> @@ -60,9 +60,11 @@ void __blk_mq_tag_idle(struct blk_mq_hw_ctx *hctx)
>    * For shared tag users, we track the number of currently active users
>    * and attempt to provide a fair share of the tag depth for each of
> them.
>    */
> -static inline bool hctx_may_queue(struct blk_mq_hw_ctx *hctx,
> +static inline bool hctx_may_queue(struct blk_mq_alloc_data *data,
>   				  struct sbitmap_queue *bt)
>   {
> +	struct blk_mq_hw_ctx *hctx = data->hctx;
> +	struct request_queue *q = data->q;
>   	unsigned int depth, users;
>
>   	if (!hctx || !(hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED))
> @@ -84,14 +86,14 @@ static inline bool hctx_may_queue(struct blk_mq_hw_ctx *hctx,
>   	 * Allow at least some tags
>   	 */
>   	depth = max((bt->sb.depth + users - 1) / users, 4U);
> -	return atomic_read(&hctx->nr_active) < depth;
> +	return __blk_mq_active_requests(hctx, q) < depth;
>   }
>
>   static int __blk_mq_get_tag(struct blk_mq_alloc_data *data,
>   			    struct sbitmap_queue *bt)
>   {
>   	if (!(data->flags & BLK_MQ_REQ_INTERNAL) &&
> -	    !hctx_may_queue(data->hctx, bt))
> +	    !hctx_may_queue(data, bt))
>   		return -1;
>   	if (data->shallow_depth)
>   		return __sbitmap_queue_get_shallow(bt, data->shallow_depth);
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 7e446f946e73..5bbd95e01f08 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -282,7 +282,7 @@ static struct request *blk_mq_rq_ctx_init(struct blk_mq_alloc_data *data,
>   	} else {
>   		if (data->hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED) {
>   			rq_flags = RQF_MQ_INFLIGHT;
> -			atomic_inc(&data->hctx->nr_active);
> +			__blk_mq_inc_active_requests(data->hctx, data->q);
>   		}
>   		rq->tag = tag;
>   		rq->internal_tag = -1;
> @@ -502,7 +502,7 @@ void blk_mq_free_request(struct request *rq)
>
>   	ctx->rq_completed[rq_is_sync(rq)]++;
>   	if (rq->rq_flags & RQF_MQ_INFLIGHT)
> -		atomic_dec(&hctx->nr_active);
> +		__blk_mq_dec_active_requests(hctx, q);
>
>   	if (unlikely(laptop_mode && !blk_rq_is_passthrough(rq)))
>   		laptop_io_completion(q->backing_dev_info);
> @@ -1048,7 +1048,7 @@ bool blk_mq_get_driver_tag(struct request *rq)
>   	if (rq->tag >= 0) {
>   		if (shared) {
>   			rq->rq_flags |= RQF_MQ_INFLIGHT;
> -			atomic_inc(&data.hctx->nr_active);
> +			__blk_mq_inc_active_requests(rq->mq_hctx, rq->q);
>   		}
>   		data.hctx->tags->rqs[rq->tag] = rq;
>   	}
> diff --git a/block/blk-mq.h b/block/blk-mq.h
> index dde2d29f0ce5..5ab1566c4b7d 100644
> --- a/block/blk-mq.h
> +++ b/block/blk-mq.h
> @@ -202,6 +202,29 @@ static inline bool blk_mq_get_dispatch_budget(struct blk_mq_hw_ctx *hctx)
>   	return true;
>   }
>
> +static inline void __blk_mq_inc_active_requests(struct blk_mq_hw_ctx *hctx, struct request_queue *q)
> +{
> +	if (blk_mq_is_sbitmap_shared(q->tag_set))
> +		atomic_inc(&q->nr_active_requests);
> +	else
> +		atomic_inc(&hctx->nr_active);
> +}
> +
> +static inline void __blk_mq_dec_active_requests(struct blk_mq_hw_ctx *hctx, struct request_queue *q)
> +{
> +	if (blk_mq_is_sbitmap_shared(q->tag_set))
> +		atomic_dec(&q->nr_active_requests);
> +	else
> +		atomic_dec(&hctx->nr_active);
> +}
> +
> +static inline int __blk_mq_active_requests(struct blk_mq_hw_ctx *hctx, struct request_queue *q)
> +{
> +	if (blk_mq_is_sbitmap_shared(q->tag_set))
> +		return atomic_read(&q->nr_active_requests);
> +	return atomic_read(&hctx->nr_active);
> +}
> +
>   static inline void __blk_mq_put_driver_tag(struct blk_mq_hw_ctx *hctx,
>   					   struct request *rq)
>   {
> @@ -210,7 +233,7 @@ static inline void __blk_mq_put_driver_tag(struct blk_mq_hw_ctx *hctx,
>
>   	if (rq->rq_flags & RQF_MQ_INFLIGHT) {
>   		rq->rq_flags &= ~RQF_MQ_INFLIGHT;
> -		atomic_dec(&hctx->nr_active);
> +		__blk_mq_dec_active_requests(hctx, rq->q);
>   	}
>   }
>
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 32868fbedc9e..5f2955872234 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -579,6 +579,8 @@ struct request_queue {
>
>   	size_t			cmd_size;
>
> +	atomic_t nr_active_requests;
> +
>   	struct work_struct	release_work;
>
>   #define BLK_MAX_WRITE_HINTS	5
>
> >> This can create unwanted throttling and eventually more lock
> >> contention in sbitmap.
> >> I added the below patch and things improved a bit, but it is not a
> >> complete fix.
> >>
> >> diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
> >> index 586c9d6..c708fbc 100644
> >> --- a/block/blk-mq-tag.c
> >> +++ b/block/blk-mq-tag.c
> >> @@ -60,10 +60,12 @@ void __blk_mq_tag_idle(struct blk_mq_hw_ctx *hctx)
> >>    * For shared tag users, we track the number of currently active users
> >>    * and attempt to provide a fair share of the tag depth for each of them.
> >>    */
> >> -static inline bool hctx_may_queue(struct blk_mq_hw_ctx *hctx,
> >> +static inline bool hctx_may_queue(struct request_queue *q,
> >> +                                 struct blk_mq_hw_ctx *hctx,
> >>                                    struct sbitmap_queue *bt)
> >>   {
> >> -       unsigned int depth, users;
> >> +       unsigned int depth, users, i, outstanding = 0;
> >> +       struct blk_mq_hw_ctx *hctx_iter;
> >>
> >>          if (!hctx || !(hctx->flags & BLK_MQ_F_TAG_SHARED))
> >>                  return true;
> >> @@ -84,14 +86,18 @@ static inline bool hctx_may_queue(struct blk_mq_hw_ctx *hctx,
> >>           * Allow at least some tags
> >>           */
> >>          depth = max((bt->sb.depth + users - 1) / users, 4U);
> >> -       return atomic_read(&hctx->nr_active) < depth;
> >> +
> >> +       queue_for_each_hw_ctx(q, hctx_iter, i)
> >> +               outstanding += atomic_read(&hctx_iter->nr_active);
> >> +
> >
> > OK,  I think that we need to find a cleaner way to do this.
> >
> >> +       return outstanding < depth;
> >>   }
> >>
> >>   static int __blk_mq_get_tag(struct blk_mq_alloc_data *data,
> >>                              struct sbitmap_queue *bt)
> >>   {
> >>          if (!(data->flags & BLK_MQ_REQ_INTERNAL) &&
> >> -           !hctx_may_queue(data->hctx, bt))
> >>
> >>
> >> #2 - In the completion path, the SCSI module calls
> >> blk_mq_run_hw_queues() upon IO completion.  I am not sure if it is
> >> good to run all the h/w queues, or if running just the h/w queue of
> >> the current reference is good enough.
> >> The below patch helped to reduce contention in hctx_lock().
> >>
> >> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> >> index 610ee41..f72de2a 100644
> >> --- a/drivers/scsi/scsi_lib.c
> >> +++ b/drivers/scsi/scsi_lib.c
> >> @@ -572,6 +572,7 @@ static bool scsi_end_request(struct request *req, blk_status_t error,
> >>          struct scsi_cmnd *cmd = blk_mq_rq_to_pdu(req);
> >>          struct scsi_device *sdev = cmd->device;
> >>          struct request_queue *q = sdev->request_queue;
> >> +       struct blk_mq_hw_ctx *mq_hctx = req->mq_hctx;
> >>
> >>          if (blk_update_request(req, error, bytes))
> >>                  return true;
> >> @@ -613,7 +614,7 @@ static bool scsi_end_request(struct request *req, blk_status_t error,
> >>              !list_empty(&sdev->host->starved_list))
> >>                  kblockd_schedule_work(&sdev->requeue_work);
> >>          else
> >> -               blk_mq_run_hw_queues(q, true);
> >> +               blk_mq_run_hw_queue(mq_hctx, true);
> >
> > Not sure on this. But, indeed, I found that running all queues added
> > lots of extra load when enabling the deadline scheduler.
> >
> >>
> >>          percpu_ref_put(&q->q_usage_counter);
> >>          return false;
> >>
> >> #3 - __blk_mq_tag_idle() calls blk_mq_tag_wakeup_all(), which may not
> >> be optimal for a shared queue.
> >> There is a penalty if we call __sbq_wake_up() frequently. In the
> >> nr_hw_queue = 1 case, things are better because there is one hctx, and
> >> hctx->state avoids multiple calls.
> >> If blk_mq_tag_wakeup_all() is called from the hctx0 context, there is
> >> no need to call it from hctx1, hctx2, etc.
> >>
> >> I have added below patch in my testing.
> >>
> >> diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
> >> index 586c9d6..5b331e5 100644
> >> --- a/block/blk-mq-tag.c
> >> +++ b/block/blk-mq-tag.c
> >> @@ -53,7 +53,9 @@ void __blk_mq_tag_idle(struct blk_mq_hw_ctx *hctx)
> >>
> >>          atomic_dec(&tags->active_queues);
> >>
> >> -       blk_mq_tag_wakeup_all(tags, false);
> >> +       /* TBD - Do this only for hctx->flags & BLK_MQ_F_TAG_HCTX_SHARED */
> >> +       if (hctx->queue_num == 0)
> >> +               blk_mq_tag_wakeup_all(tags, false);
> >
> > ok, I see. But, again, I think we need to find a cleaner way to do this.
> >
> >>   }
> >>
> >>   /*
> >>
> >>
> >> With all of the above-mentioned changes, I see performance improved
> >> from 2.2M IOPS to 2.7M on the same workload and profile.
> >
> > But still short of nr_hw_queue = 1, which got 3.1M IOPS, as below,
> > right?
> >
> > Thanks,
> > John
> >
> >>
> >>>
> >>>>
> >>>> Old driver, which has nr_hw_queue = 1, and I issue IOs from <fio> at
> >>>> queue depth = 128. We get 3.1M IOPS in this config. This eventually
> >>>> exhausts the host can_queue.
> >>>
> >>> So I think I need to find a point where we start to get throttled.
> >>>
> >>>> Note - Very low contention in sbitmap_get()
> >>>>
> >>>> -   23.58%     0.25%  fio              [kernel.vmlinux]
> >>>> [k] blk_mq_make_request
> >>>>      - 23.33% blk_mq_make_request
> >>>>         - 21.68% blk_mq_get_request
> >>>>            - 20.19% blk_mq_get_tag
> >>>>               + 10.08% prepare_to_wait_exclusive
> >>>>               + 4.51% io_schedule
> >>>>               - 3.59% __sbitmap_queue_get
> >>>>                  - 2.82% sbitmap_get
> >>>>                       0.86% __sbitmap_get_word
> >>>>                       0.75% _raw_spin_lock_irqsave
> >>>>                       0.55% _raw_spin_unlock_irqrestore
> >>>>
> >>>> Driver with the RFC, which has nr_hw_queue = N, and I issue IOs from
> >>>> <fio> at queue depth = 128. We get 2.3M IOPS in this config. This
> >>>> eventually exhausts the host can_queue.
> >>>> Note - Very high contention in sbitmap_get()
> >>>>
> >>>> -   42.39%     0.12%  fio              [kernel.vmlinux]
> >>>> [k] generic_make_request
> >>>>      - 42.27% generic_make_request
> >>>>         - 41.00% blk_mq_make_request
> >>>>            - 38.28% blk_mq_get_request
> >>>>               - 33.76% blk_mq_get_tag
> >>>>                  - 30.25% __sbitmap_queue_get
> >>>>                     - 29.90% sbitmap_get
> >>>>                        + 9.06% _raw_spin_lock_irqsave
> >>>>                        + 7.94% _raw_spin_unlock_irqrestore
> >>>>                        + 3.86% __sbitmap_get_word
> >>>>                        + 1.78% call_function_single_interrupt
> >>>>                        + 0.67% ret_from_intr
> >>>>                  + 1.69% io_schedule
> >>>>                    0.59% prepare_to_wait_exclusive
> >>>>                    0.55% __blk_mq_get_tag
> >>>>
> >>>> In this particular case, I observed alloc_hint = zeros, which means
> >>>> sbitmap_get() is not able to find free tags from the hint. That may
> >>>> lead to contention.
> >>>> This condition does not happen with the nr_hw_queue=1 (without RFC)
> >>>> driver.
> >>>>
> >>>> alloc_hint=
> >>>> {663, 2425, 3060, 54, 3149, 4319, 4175, 4867, 543, 2481, 0, 4779,
> >>>> 377, ***0***, 2010, 0, 909, 3350, 1546, 2179, 2875, 659, 3902,
> >>>> 2224, 3212, 836, 1892, 1669, 2420, 3415, 1904, 512, 3027, 4810,
> >>>> 2845, 4690, 712, 3105, 0, 0, 0, 3268, 4915, 3897, 1349, 547, 4,
> >>>> 733, 1765, 2068, 979, 51, 880, 0, 370, 3520, 2877, 4097, 418, 4501,
> >>>> 3717, 2893, 604, 508, 759, 3329, 4038, 4829, 715, 842, 1443, 556}
> >>>>
> >>>> Driver with the RFC, which has nr_hw_queue = N, and I issue IOs from
> >>>> <fio> at queue depth = 32. We get 3.1M IOPS in this config. This
> >>>> workload does *not* exhaust the host can_queue.
> >>>
> >>> Please ensure .host_tagset is set whenever nr_hw_queue = N. This is
> >>> as per the RFC, and I don't think you modified it from the RFC for
> >>> your test.
> >>> But I just wanted to mention that to be crystal clear.
> >>
> >> Yes, I have two separate driver copies: one with the RFC change and
> >> another without the RFC.
> >>>
> >>>>
> >>>> -    5.07%     0.14%  fio              [kernel.vmlinux]  [k]
> >>>> generic_make_request
> >>>>      - 4.93% generic_make_request
> >>>>         - 3.61% blk_mq_make_request
> >>>>            - 2.04% blk_mq_get_request
> >>>>               - 1.08% blk_mq_get_tag
> >>>>                  - 0.70% __sbitmap_queue_get
> >>>>                       0.67% sbitmap_get
> >>>>
> >>>> In summary, the RFC has a performance bottleneck in sbitmap_get()
> >>>> when the outstanding IO per shost is about to exhaust the tags.
> >>>> The driver also works without this RFC with nr_hw_queue = 1, but
> >>>> that case is managed very well.
> >>>> I am not sure why it happens only with a shared host tag.
> >>>> Theoretically all the hctx share the same bitmap tags, which is the
> >>>> same as nr_hw_queue=1, so why is the contention only visible in the
> >>>> shared host tag case?
> >>>
> >>> Let me check this.
> >>>
> >>>>
> >>>> If you want to reproduce this issue, maybe you have to reduce
> >>>> can_queue in the hisi_sas driver.
> >>>>
> >>>
> >>> Thanks,
> >>> John
> >
