From: Shaohua Li <shaohua.li@intel.com>
To: Jens Axboe <jaxboe@fusionio.com>
Cc: "Shi, Alex" <alex.shi@intel.com>,
	"James.Bottomley@hansenpartnership.com" 
	<James.Bottomley@hansenpartnership.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: Performance drop on SCSI hard disk
Date: Fri, 13 May 2011 08:48:24 +0800
Message-ID: <1305247704.2373.32.camel@sli10-conroe>
In-Reply-To: <4DCC4340.6000407@fusionio.com>

On Fri, 2011-05-13 at 04:29 +0800, Jens Axboe wrote:
> On 2011-05-10 08:40, Alex,Shi wrote:
> > commit c21e6beba8835d09bb80e34961 removed the REENTER flag and changed
> > scsi_run_queue() to punt all requests on starved_list devices to
> > kblockd. Yes, as Jens mentioned, the performance on slow SCSI disks is
> > hurt here.  :) (Intel SSDs aren't affected.)
> > 
> > In our testing on a 12-disk SAS JBOD, fio write with the sync ioengine
> > drops about 30~40% in throughput, and fio randread/randwrite with the
> > aio ioengine drop about 20%/50%. The fio mmap test is hurt as well.
> > 
> > With the following debug patch, the performance is fully recovered
> > in our testing. But without the REENTER flag, in some corner cases,
> > such as a device being blocked and then unblocked repeatedly,
> > __blk_run_queue() may recursively call scsi_run_queue() and cause a
> > kernel stack overflow.
> > I don't know the details of the block device driver; I'm just
> > wondering why SCSI needs the REENTER flag here. :)
> 
> This is a problem and we should do something about it for 2.6.39. I knew
> that there would be cases where the async offload would cause a
> performance degradation, but not to the extent that you are reporting.
> You must be hitting the pathological case.
Async offload is expected to increase context switches, but the real
root cause of the issue is fairness. Please see my previous email.

> I can think of two scenarios where it could potentially recurse:
> 
> - request_fn enter, end up requeuing IO. Run queue at the end. Rinse,
>   repeat.
> - Running starved list from request_fn, two (or more) devices could
>   alternately recurse.
> 
> The first case should be fairly easy to handle. The second one is
> already handled by the local list splice.
This isn't true as far as I can see: if you unlock host_lock in
scsi_run_queue(), other CPUs can add sdevs to the starved device list
again. In the recursive call of scsi_run_queue(), the starved device
list might not be empty, so the local list splice doesn't help.
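
For reference, here is a trimmed-down sketch of what the
scsi_run_queue() loop looks like in this kernel (the busy and
single-LUN checks are omitted, so this is not the exact source),
showing where the window opens:

static void scsi_run_queue(struct request_queue *q)
{
	struct scsi_device *sdev = q->queuedata;
	struct Scsi_Host *shost = sdev->host;
	LIST_HEAD(starved_list);
	unsigned long flags;

	spin_lock_irqsave(shost->host_lock, flags);
	/* The splice empties starved_list atomically, but only once. */
	list_splice_init(&shost->starved_list, &starved_list);

	while (!list_empty(&starved_list)) {
		sdev = list_entry(starved_list.next,
				  struct scsi_device, starved_entry);
		list_del_init(&sdev->starved_entry);
		spin_unlock_irqrestore(shost->host_lock, flags);

		/*
		 * host_lock is dropped here, so another CPU can refill
		 * shost->starved_list.  __blk_run_queue() can then call
		 * scsi_request_fn() -> scsi_run_queue() again, which
		 * splices the refilled list: that is the recursion.
		 */
		__blk_run_queue(sdev->request_queue);

		spin_lock_irqsave(shost->host_lock, flags);
	}
	spin_unlock_irqrestore(shost->host_lock, flags);

	blk_run_queue(q);
}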

> 
> Looking at the code, is this a real scenario? The only potential
> recursion I see is:
> 
> scsi_request_fn()
>         scsi_dispatch_cmd()
>                 scsi_queue_insert()
>                         __scsi_queue_insert()
>                                 scsi_run_queue()
> 
> Why are we even re-running the queue immediately on a BUSY condition?
> It should only be needed if we have zero pending commands from this
> particular queue, and for that case an async run is just fine since
> it's a rare condition (or performance would suck already).
> 
> And it should only really be needed for the 'q' being passed in, not the
> others. Something like the below.
> 
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 0bac91e..0b01c1f 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -74,7 +74,7 @@ struct kmem_cache *scsi_sdb_cache;
>   */
>  #define SCSI_QUEUE_DELAY	3
>  
> -static void scsi_run_queue(struct request_queue *q);
> +static void scsi_run_queue_async(struct request_queue *q);
>  
>  /*
>   * Function:	scsi_unprep_request()
> @@ -161,7 +161,7 @@ static int __scsi_queue_insert(struct scsi_cmnd *cmd, int reason, int unbusy)
>  	blk_requeue_request(q, cmd->request);
>  	spin_unlock_irqrestore(q->queue_lock, flags);
>  
> -	scsi_run_queue(q);
> +	scsi_run_queue_async(q);
So you could still recursively run into the starved list. Do you want
to put the whole __scsi_run_queue into the workqueue?
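
Something like the sketch below is what I have in mind. It is only a
sketch under assumptions: the requeue_work field in struct scsi_device
and the handler name are invented for illustration, it assumes the
two-argument kblockd_schedule_work() of this kernel, and it needs an
INIT_WORK(&sdev->requeue_work, scsi_requeue_run_queue) at scsi_device
allocation time:

/* Hypothetical work handler; requeue_work is an assumed new field. */
static void scsi_requeue_run_queue(struct work_struct *work)
{
	struct scsi_device *sdev =
		container_of(work, struct scsi_device, requeue_work);

	scsi_run_queue(sdev->request_queue);
}

static void scsi_run_queue_async(struct request_queue *q)
{
	struct scsi_device *sdev = q->queuedata;

	/*
	 * Defer the whole run, including the starved-list walk, to
	 * kblockd.  The queue then always runs from process context on
	 * a fresh stack, so scsi_request_fn() cannot recurse back into
	 * scsi_run_queue() here.
	 */
	kblockd_schedule_work(q, &sdev->requeue_work);
}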

Thanks,
Shaohua

