All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ming Lei <ming.lei@redhat.com>
To: Guenter Roeck <linux@roeck-us.net>
Cc: James Bottomley <James.Bottomley@hansenpartnership.com>,
	linux-scsi@vger.kernel.org,
	"Martin K . Petersen" <martin.petersen@oracle.com>,
	linux-block@vger.kernel.org, Christoph Hellwig <hch@lst.de>,
	Bart Van Assche <bvanassche@acm.org>,
	"Ewan D . Milne" <emilne@redhat.com>,
	Hannes Reinecke <hare@suse.com>
Subject: Re: [PATCH V4 3/3] scsi: core: avoid to pre-allocate big chunk for sg list
Date: Tue, 4 Jun 2019 14:55:59 +0800	[thread overview]
Message-ID: <20190604065558.GA6059@ming.t460p> (raw)
In-Reply-To: <cdf94e43-79a7-078e-676d-dfc736eec286@roeck-us.net>

On Mon, Jun 03, 2019 at 08:49:10PM -0700, Guenter Roeck wrote:
> On 6/3/19 6:00 PM, Ming Lei wrote:
> > On Mon, Jun 03, 2019 at 01:44:22PM -0700, Guenter Roeck wrote:
> > > On Sun, Apr 28, 2019 at 03:39:32PM +0800, Ming Lei wrote:
> > > > Now scsi_mq_setup_tags() pre-allocates a big buffer for IO sg list,
> > > > and the buffer size is scsi_mq_sgl_size() which depends on smaller
> > > > value between shost->sg_tablesize and SG_CHUNK_SIZE.
> > > > 
> > > > Modern HBA's DMA is often capable of deadling with very big segment
> > > > number, so scsi_mq_sgl_size() is often big. Suppose the max sg number
> > > > of SG_CHUNK_SIZE is taken, scsi_mq_sgl_size() will be 4KB.
> > > > 
> > > > Then if one HBA has lots of queues, and each hw queue's depth is
> > > > high, pre-allocation for sg list can consume huge memory.
> > > > For example of lpfc, nr_hw_queues can be 70, each queue's depth
> > > > can be 3781, so the pre-allocation for data sg list is 70*3781*2k
> > > > =517MB for single HBA.
> > > > 
> > > > There is Red Hat internal report that scsi_debug based tests can't
> > > > be run any more since legacy io path is killed because too big
> > > > pre-allocation.
> > > > 
> > > > So switch to runtime allocation for sg list, meantime pre-allocate 2
> > > > inline sg entries. This way has been applied to NVMe PCI for a while,
> > > > so it should be fine for SCSI too. Also runtime sg entries allocation
> > > > has verified and run always in the original legacy io path.
> > > > 
> > > > Not see performance effect in my big BS test on scsi_debug.
> > > > 
> > > 
> > > This patch causes a variety of boot failures in -next. Typical failure
> > > pattern is scsi hangs or failure to find a root file system. For example,
> > > on alpha, trying to boot from usb:
> > 
> > I guess it is because alpha doesn't support sg chaining, and
> > CONFIG_ARCH_NO_SG_CHAIN is enabled. ARCHs not supporting sg chaining
> > can only be arm, alpha and parisc.
> > 
> 
> I don't think it is that simple. I do see the problem on x86 (32 and 64 bit)
> sparc, ppc, and m68k as well, and possibly others (I didn't check all because
> -next is in terrible shape right now). Error log is always a bit different
> but similar.
> 
> On sparc:
> 
> scsi host0: Data transfer overflow.
> scsi host0: cur_residue[0] tot_residue[-181604017] len[8192]
> scsi host0: DMA length is zero!
> scsi host0: cur adr[f000f000] len[00000000]
> scsi host0: Data transfer overflow.
> scsi host0: cur_residue[0] tot_residue[-181604017] len[8192]
> scsi host0: DMA length is zero!
> 
> On ppc:
> 
> scsi host0: DMA length is zero!
> scsi host0: cur adr[0fd21000] len[00000000]
> scsi host0: Aborting command [(ptrval):28]
> scsi host0: Current command [(ptrval):28]
> scsi host0:  Active command [(ptrval):28]
> 
> On x86, x86_64 (after reverting a different crash-causing patch):
> 
> [   20.226809] scsi host0: DMA length is zero!
> [   20.227459] scsi host0: cur adr[00000000] len[00000000]
> [   50.588814] scsi host0: Aborting command [(____ptrval____):28]
> [   50.589210] scsi host0: Current command [(____ptrval____):28]
> [   50.589447] scsi host0:  Active command [(____ptrval____):28]
> [   50.589674] scsi host0: Dumping command log
> 
> On m68k there is a crash.
> 
> Unable to handle kernel NULL pointer dereference at virtual address (ptrval)
> Oops: 00000000
> Modules linked in:
> PC: [<00203a9e>] esp_maybe_execute_command+0x31e/0x46c
> SR: 2704  SP: (ptrval)  a2: 07c1ea20
> d0: 00000002    d1: 00000400    d2: 00000000    d3: 00002000
> d4: 00000000    d5: 00000000    a0: 07db753c    a1: 00000000
> Process kworker/0:0H (pid: 4, task=(ptrval))
> Frame format=7 eff addr=00000020 ssw=0505 faddr=00000020
> wb 1 stat/addr/data: 0000 00000000 00000000
> wb 2 stat/addr/data: 0000 00000000 00000000
> wb 3 stat/addr/data: 0000 00000020 00000000
> push data: 00000000 00000000 00000000 00000000
> Stack from 07c2be08:
> ...
> Call Trace: [<00002000>] _start+0x0/0x8
>  [<001f65ca>] scsi_mq_done+0x0/0x2c
>  [<001887fe>] blk_mq_get_driver_tag+0x0/0xba
>  [<00188800>] blk_mq_get_driver_tag+0x2/0xba
>  [<00025aec>] create_worker+0x0/0x14e
>  [<00203f64>] esp_queuecommand+0x9c/0xa2
>  [<00203f3a>] esp_queuecommand+0x72/0xa2
>  [<001f7fd4>] scsi_queue_rq+0x54e/0x5ba
>  [<00188b4a>] blk_mq_dispatch_rq_list+0x292/0x38a
>  [<00188656>] blk_mq_dequeue_from_ctx+0x0/0x1a8
>  [<001888b8>] blk_mq_dispatch_rq_list+0x0/0x38a
>  [<00025aec>] create_worker+0x0/0x14e
> ...
> 
> All those problems are fixed by reverting this patch.

All above problem is esp_scsi specific, and esp_scsi does not support
sg chain.

I will try to figure out one patch to cover all.

esp_map_dma():
        if (esp->flags & ESP_FLAG_NO_DMA_MAP) {
                /*
                 * For pseudo DMA and PIO we need the virtual address instead of
                 * a dma address, so perform an identity mapping.
                 */
                spriv->num_sg = scsi_sg_count(cmd);
                for (i = 0; i < spriv->num_sg; i++) {
                        sg[i].dma_address = (uintptr_t)sg_virt(&sg[i]);
                        total += sg_dma_len(&sg[i]);
                }
        } else {
                spriv->num_sg = scsi_dma_map(cmd);
                for (i = 0; i < spriv->num_sg; i++)
                        total += sg_dma_len(&sg[i]);
        }

esp_advance_dma():
	p->cur_sg++;

esp_msgin_process():
	spriv->cur_sg--;

Thanks,
Ming

  parent reply	other threads:[~2019-06-04  6:56 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-04-28  7:39 [PATCH V4 0/3] scsi: core: avoid big pre-allocation for sg list Ming Lei
2019-04-28  7:39 ` Ming Lei
2019-04-28  7:39 ` [PATCH V4 1/3] lib/sg_pool.c: improve APIs for allocating sg pool Ming Lei
2019-04-28  7:39   ` Ming Lei
2019-04-28 12:04   ` Christoph Hellwig
2019-04-28 12:04     ` Christoph Hellwig
2019-04-28  7:39 ` [PATCH V4 2/3] scsi: core: avoid to pre-allocate big chunk for protection meta data Ming Lei
2019-04-29 18:15   ` Bart Van Assche
2019-04-28  7:39 ` [PATCH V4 3/3] scsi: core: avoid to pre-allocate big chunk for sg list Ming Lei
2019-04-29 18:16   ` Bart Van Assche
2019-06-03 20:44   ` Guenter Roeck
2019-06-04  1:00     ` Ming Lei
2019-06-04  3:49       ` Guenter Roeck
2019-06-04  4:10         ` Ming Lei
2019-06-04 14:51           ` Guenter Roeck
2019-06-04  6:55         ` Ming Lei [this message]
2019-05-05  1:10 ` [PATCH V4 0/3] scsi: core: avoid big pre-allocation " Ming Lei
2019-05-05  1:10   ` Ming Lei
2019-05-14  2:06 ` Martin K. Petersen
2019-05-14  2:06   ` Martin K. Petersen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190604065558.GA6059@ming.t460p \
    --to=ming.lei@redhat.com \
    --cc=James.Bottomley@hansenpartnership.com \
    --cc=bvanassche@acm.org \
    --cc=emilne@redhat.com \
    --cc=hare@suse.com \
    --cc=hch@lst.de \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-scsi@vger.kernel.org \
    --cc=linux@roeck-us.net \
    --cc=martin.petersen@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.