All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Elliott, Robert (Server Storage)" <Elliott@hp.com>
To: Jens Axboe <axboe@kernel.dk>,
	"dgilbert@interlog.com" <dgilbert@interlog.com>,
	Christoph Hellwig <hch@infradead.org>,
	"James Bottomley" <James.Bottomley@HansenPartnership.com>,
	Bart Van Assche <bvanassche@fusionio.com>,
	Benjamin LaHaise <bcrl@kvack.org>,
	"linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: RE: scsi-mq V2
Date: Thu, 10 Jul 2014 00:53:36 +0000	[thread overview]
Message-ID: <94D0CD8314A33A4D9D801C0FE68B402958B9628B@G9W0745.americas.hpqcorp.net> (raw)
In-Reply-To: <53BD9A24.7010203@kernel.dk>



> -----Original Message-----
> From: Jens Axboe [mailto:axboe@kernel.dk]
> Sent: Wednesday, 09 July, 2014 2:38 PM
> To: dgilbert@interlog.com; Christoph Hellwig; James Bottomley; Bart Van
> Assche; Elliott, Robert (Server Storage); linux-scsi@vger.kernel.org; linux-
> kernel@vger.kernel.org
> Subject: Re: scsi-mq V2
> 
> On 2014-07-09 18:39, Douglas Gilbert wrote:
> > On 14-07-08 10:48 AM, Christoph Hellwig wrote:
> >> On Wed, Jun 25, 2014 at 06:51:47PM +0200, Christoph Hellwig wrote:
> >>> Changes from V1:
> >>>   - rebased on top of the core-for-3.17 branch, most notable the
> >>>     scsi logging changes
> >>>   - fixed handling of cmd_list to prevent crashes for some heavy
> >>>     workloads
> >>>   - fixed incorrect handling of !target->can_queue
> >>>   - avoid scheduling a workqueue on I/O completions when no queues
> >>>     are congested
> >>>
> >>> In addition to the patches in this thread there also is a git
> >>> available at:
> >>>
> >>>     git://git.infradead.org/users/hch/scsi.git scsi-mq.2
> >>
> >>
> >> I've pushed out a new scsi-mq.3 branch, which has been rebased on the
> >> latest core-for-3.17 tree + the "RFC: clean up command setup" series
> >> from June 29th.  Robert Elliot found a problem with not fully zeroed
> >> out UNMAP CDBs, which is fixed by the saner discard handling in that
> >> series.
> >>
> >> There is a new patch to factor the code from the above series for
> >> blk-mq use, which I've attached below.  Besides that the only changes
> >> are minor merge fixups in the main blk-mq usage patch.
> >
> > Be warned: both Rob Elliott and I can easily break
> > the scsi-mq.3 branch. It seems as though a regression
> > has slipped in. I notice that Christoph has added a
> > new branch called "scsi-mq.3-no-rebase".
> 
> Rob/Doug, those issues look very much like problems in the aio code. Can
> either/both of you try with:
> 
> f8567a3845ac05bb28f3c1b478ef752762bd39ef
> edfbbf388f293d70bf4b7c0bc38774d05e6f711a
> 
> reverted (in that order) and see if that changes anything.
> 
> 
> --
> Jens Axboe

scsi-mq.3-no-rebase, which has all the scsi updates from scsi-mq.3
but is based on 3.16.0-rc2 rather than 3.16.0-rc4, works fine:
* ^C exits fio cleanly with scsi_debug devices
* ^C exits fio cleanly with mpt3sas devices
* fio hits 1M IOPS with 16 hpsa devices
* fio hits 700K IOPS with 6 mpt3sas devices
* 38 device test to mpt3sas, hpsa, and scsi_debug devices runs OK


With:
* scsi-mq-3, which is based on 3.16.0-rc4
* [PATCH] x86-64: fix vDSO build from https://lkml.org/lkml/2014/7/3/738
* those two aio patches reverted

the problem still occurs - fio results in low or 0 IOPS, with perf top 
reporting unusual amounts of time spent in do_io_submit and io_submit.

perf top:
 14.38%  [kernel]              [k] do_io_submit
 13.71%  libaio.so.1.0.1       [.] io_submit
 13.32%  [kernel]              [k] system_call
 11.60%  [kernel]              [k] system_call_after_swapgs
  8.88%  [kernel]              [k] lookup_ioctx
  8.78%  [kernel]              [k] copy_user_generic_string
  7.78%  [kernel]              [k] io_submit_one
  5.97%  [kernel]              [k] blk_flush_plug_list
  2.73%  fio                   [.] fio_libaio_commit
  2.70%  [kernel]              [k] sysret_check
  2.68%  [kernel]              [k] blk_finish_plug
  1.98%  [kernel]              [k] blk_start_plug
  1.17%  [kernel]              [k] SyS_io_submit
  1.17%  [kernel]              [k] __get_user_4
  0.99%  fio                   [.] io_submit@plt
  0.85%  [kernel]              [k] _copy_from_user
  0.79%  [kernel]              [k] system_call_fastpath

Repeating some of last night's investigation details for the lists:

ftrace of one of the CPUs for all functions shows these 
are repeatedly being called:
 
           <...>-34508 [004] ....  6360.790714: io_submit_one <-do_io_submit
           <...>-34508 [004] ....  6360.790714: blk_finish_plug <-do_io_submit
           <...>-34508 [004] ....  6360.790714: blk_flush_plug_list <-blk_finish_plug
           <...>-34508 [004] ....  6360.790714: SyS_io_submit <-system_call_fastpath
           <...>-34508 [004] ....  6360.790715: do_io_submit <-SyS_io_submit
           <...>-34508 [004] ....  6360.790715: lookup_ioctx <-do_io_submit
           <...>-34508 [004] ....  6360.790715: blk_start_plug <-do_io_submit
           <...>-34508 [004] ....  6360.790715: io_submit_one <-do_io_submit
           <...>-34508 [004] ....  6360.790715: blk_finish_plug <-do_io_submit
           <...>-34508 [004] ....  6360.790715: blk_flush_plug_list <-blk_finish_plug
           <...>-34508 [004] ....  6360.790715: SyS_io_submit <-system_call_fastpath
           <...>-34508 [004] ....  6360.790715: do_io_submit <-SyS_io_submit
           <...>-34508 [004] ....  6360.790715: lookup_ioctx <-do_io_submit
           <...>-34508 [004] ....  6360.790716: blk_start_plug <-do_io_submit
           <...>-34508 [004] ....  6360.790716: io_submit_one <-do_io_submit
           <...>-34508 [004] ....  6360.790716: blk_finish_plug <-do_io_submit
           <...>-34508 [004] ....  6360.790716: blk_flush_plug_list <-blk_finish_plug
           <...>-34508 [004] ....  6360.790716: SyS_io_submit <-system_call_fastpath
           <...>-34508 [004] ....  6360.790716: do_io_submit <-SyS_io_submit
           <...>-34508 [004] ....  6360.790716: lookup_ioctx <-do_io_submit
           <...>-34508 [004] ....  6360.790716: blk_start_plug <-do_io_submit
           <...>-34508 [004] ....  6360.790717: io_submit_one <-do_io_submit
           <...>-34508 [004] ....  6360.790717: blk_finish_plug <-do_io_submit
           <...>-34508 [004] ....  6360.790717: blk_flush_plug_list <-blk_finish_plug
           <...>-34508 [004] ....  6360.790717: SyS_io_submit <-system_call_fastpath
           <...>-34508 [004] ....  6360.790717: do_io_submit <-SyS_io_submit

fs/aio.c do_io_submit is apparently completing (many times) - it's not
stuck in the for loop:
        blk_start_plug(&plug);

        /*
         * AKPM: should this return a partial result if some of the IOs were
         * successfully submitted?
         */
        for (i=0; i<nr; i++) {
                struct iocb __user *user_iocb;
                struct iocb tmp;

                if (unlikely(__get_user(user_iocb, iocbpp + i))) {
                        ret = -EFAULT;
                        break;
                }

                if (unlikely(copy_from_user(&tmp, user_iocb, sizeof(tmp)))) {
                        ret = -EFAULT;
                        break;
                }

                ret = io_submit_one(ctx, user_iocb, &tmp, compat);
                if (ret)
                        break;
        }
        blk_finish_plug(&plug);


fs/aio.c io_submit_one is not getting to fget, which is traceable:
        /* enforce forwards compatibility on users */
        if (unlikely(iocb->aio_reserved1 || iocb->aio_reserved2)) {
                pr_debug("EINVAL: reserve field set\n");
                return -EINVAL;
        }

        /* prevent overflows */
        if (unlikely(
            (iocb->aio_buf != (unsigned long)iocb->aio_buf) ||
            (iocb->aio_nbytes != (size_t)iocb->aio_nbytes) ||
            ((ssize_t)iocb->aio_nbytes < 0)
           )) {
                pr_debug("EINVAL: io_submit: overflow check\n");
                return -EINVAL;
        }

        req = aio_get_req(ctx);
        if (unlikely(!req))
                return -EAGAIN;

        req->ki_filp = fget(iocb->aio_fildes);

I don't have that file compiled with -DDEBUG so the pr_debug
prints are unavailable.  The -EAGAIN seems most likely to lead 
to a hang like this.

aio_get_req is not getting to kmem_cache_alloc, which is
traceable:
        if (!get_reqs_available(ctx))
                return NULL;

        req = kmem_cache_alloc(kiocb_cachep, GFP_KERNEL|__GFP_ZERO);

get_reqs_available is probably returning false because not
enough reqs are available compared to req_batch:

        struct kioctx_cpu *kcpu;
        bool ret = false;

        preempt_disable();
        kcpu = this_cpu_ptr(ctx->cpu);

        if (!kcpu->reqs_available) {
                int old, avail = atomic_read(&ctx->reqs_available);

                do {
                        if (avail < ctx->req_batch)
                                goto out;

                        old = avail;
                        avail = atomic_cmpxchg(&ctx->reqs_available,
                                               avail, avail - ctx->req_batch);
                } while (avail != old);

                kcpu->reqs_available += ctx->req_batch;
        }

        ret = true;
        kcpu->reqs_available--;
out:
        preempt_enable();
        return ret;


---
Rob Elliott    HP Server Storage




  reply	other threads:[~2014-07-10  0:56 UTC|newest]

Thread overview: 92+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-06-25 16:51 scsi-mq V2 Christoph Hellwig
2014-06-25 16:51 ` [PATCH 01/14] sd: don't use rq->cmd_len before setting it up Christoph Hellwig
2014-07-09 11:12   ` Hannes Reinecke
2014-07-09 11:12     ` Hannes Reinecke
2014-07-09 15:03     ` Christoph Hellwig
2014-06-25 16:51 ` [PATCH 02/14] scsi: split __scsi_queue_insert Christoph Hellwig
2014-07-09 11:12   ` Hannes Reinecke
2014-06-25 16:51 ` [PATCH 03/14] scsi: centralize command re-queueing in scsi_dispatch_fn Christoph Hellwig
2014-07-08 20:51   ` Elliott, Robert (Server Storage)
2014-07-09  6:40     ` Christoph Hellwig
2014-07-09 11:13   ` Hannes Reinecke
2014-06-25 16:51 ` [PATCH 04/14] scsi: set ->scsi_done before calling scsi_dispatch_cmd Christoph Hellwig
2014-07-09 11:14   ` Hannes Reinecke
2014-06-25 16:51 ` [PATCH 05/14] scsi: push host_lock down into scsi_{host,target}_queue_ready Christoph Hellwig
2014-07-09 11:14   ` Hannes Reinecke
2014-06-25 16:51 ` [PATCH 06/14] scsi: convert target_busy to an atomic_t Christoph Hellwig
2014-07-09 11:15   ` Hannes Reinecke
2014-07-09 11:15     ` Hannes Reinecke
2014-06-25 16:51 ` [PATCH 07/14] scsi: convert host_busy to atomic_t Christoph Hellwig
2014-07-09 11:15   ` Hannes Reinecke
2014-06-25 16:51 ` [PATCH 08/14] scsi: convert device_busy " Christoph Hellwig
2014-07-09 11:16   ` Hannes Reinecke
2014-07-09 16:49   ` James Bottomley
2014-07-10  6:01     ` Christoph Hellwig
2014-06-25 16:51 ` [PATCH 09/14] scsi: fix the {host,target,device}_blocked counter mess Christoph Hellwig
2014-07-09 11:12   ` Hannes Reinecke
2014-07-10  6:06     ` Christoph Hellwig
2014-06-25 16:51 ` [PATCH 10/14] scsi: only maintain target_blocked if the driver has a target queue limit Christoph Hellwig
2014-07-09 11:19   ` Hannes Reinecke
2014-07-09 15:05     ` Christoph Hellwig
2014-06-25 16:51 ` [PATCH 11/14] scsi: unwind blk_end_request_all and blk_end_request_err calls Christoph Hellwig
2014-07-09 11:20   ` Hannes Reinecke
2014-06-25 16:51 ` [PATCH 12/14] scatterlist: allow chaining to preallocated chunks Christoph Hellwig
2014-07-09 11:21   ` Hannes Reinecke
2014-06-25 16:52 ` [PATCH 13/14] scsi: add support for a blk-mq based I/O path Christoph Hellwig
2014-07-09 11:25   ` Hannes Reinecke
2014-07-16 11:13   ` Mike Christie
2014-07-16 11:16     ` Christoph Hellwig
2014-06-25 16:52 ` [PATCH 14/14] fnic: reject device resets without assigned tags for the blk-mq case Christoph Hellwig
2014-07-09 11:27   ` Hannes Reinecke
2014-06-26  4:50 ` scsi-mq V2 Jens Axboe
2014-06-26 22:07   ` Elliott, Robert (Server Storage)
2014-06-27 14:42     ` Bart Van Assche
2014-06-30 15:20   ` Jens Axboe
2014-06-30 15:25     ` Christoph Hellwig
2014-06-30 15:54       ` Martin K. Petersen
2014-07-08 14:48 ` Christoph Hellwig
2014-07-09 16:39   ` Douglas Gilbert
2014-07-09 19:38     ` Jens Axboe
2014-07-10  0:53       ` Elliott, Robert (Server Storage) [this message]
2014-07-10  6:20         ` Christoph Hellwig
2014-07-10 13:36           ` Benjamin LaHaise
2014-07-10 13:39             ` Jens Axboe
2014-07-10 13:44               ` Benjamin LaHaise
2014-07-10 13:48                 ` Jens Axboe
2014-07-10 13:50                   ` Benjamin LaHaise
2014-07-10 13:52                     ` Jens Axboe
2014-07-10 13:50             ` Christoph Hellwig
2014-07-10 13:52               ` Jens Axboe
2014-07-10 14:36                 ` Elliott, Robert (Server Storage)
2014-07-10 14:45                   ` Benjamin LaHaise
2014-07-10 15:11                     ` Jeff Moyer
2014-07-10 15:11                       ` Jeff Moyer
2014-07-10 19:59                       ` Jens Axboe
2014-07-10 19:59                         ` Jens Axboe
2014-07-10 20:05                         ` Jeff Moyer
2014-07-10 20:05                           ` Jeff Moyer
2014-07-10 20:06                           ` Jens Axboe
2014-07-10 20:06                             ` Jens Axboe
2014-07-10 15:51           ` Elliott, Robert (Server Storage)
2014-07-10 16:04             ` Christoph Hellwig
2014-07-10 16:14               ` Christoph Hellwig
2014-07-10 18:49                 ` Elliott, Robert (Server Storage)
2014-07-10 19:14                   ` Jeff Moyer
2014-07-10 19:14                     ` Jeff Moyer
2014-07-10 19:36                     ` Jeff Moyer
2014-07-10 19:36                       ` Jeff Moyer
2014-07-10 21:10                     ` Elliott, Robert (Server Storage)
2014-07-11  6:02                       ` Elliott, Robert (Server Storage)
2014-07-11  6:14                         ` Christoph Hellwig
2014-07-11 14:33                           ` Elliott, Robert (Server Storage)
2014-07-11 14:55                             ` Benjamin LaHaise
2014-07-12 21:50                               ` Elliott, Robert (Server Storage)
2014-07-12 23:20                                 ` Elliott, Robert (Server Storage)
2014-07-13 17:15                                   ` Elliott, Robert (Server Storage)
2014-07-14 17:15                                     ` Benjamin LaHaise
2014-07-14  9:13   ` Sagi Grimberg
2014-08-21 12:32     ` Performance degradation in IO writes vs. reads (was scsi-mq V2) Sagi Grimberg
2014-08-21 12:32       ` Sagi Grimberg
2014-08-21 13:03       ` Christoph Hellwig
2014-08-21 14:02         ` Sagi Grimberg
2014-08-24 16:41           ` Sagi Grimberg

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=94D0CD8314A33A4D9D801C0FE68B402958B9628B@G9W0745.americas.hpqcorp.net \
    --to=elliott@hp.com \
    --cc=James.Bottomley@HansenPartnership.com \
    --cc=axboe@kernel.dk \
    --cc=bcrl@kvack.org \
    --cc=bvanassche@fusionio.com \
    --cc=dgilbert@interlog.com \
    --cc=hch@infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-scsi@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.