All of lore.kernel.org
 help / color / mirror / Atom feed
From: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
To: James.Bottomley@HansenPartnership.com
Cc: fujita.tomonori@lab.ntt.co.jp, jens.axboe@oracle.com,
	rdreier@cisco.com, bharrosh@panasas.com,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	chris.mason@oracle.com, david@fromorbit.com, hch@infradead.org,
	akpm@linux-foundation.org, jack@suse.cz,
	yanmin_zhang@linux.intel.com, linux-scsi@vger.kernel.org
Subject: Re: [PATCH 03/13] scsi: unify allocation of scsi command and sense buffer
Date: Wed, 27 May 2009 00:31:43 +0900	[thread overview]
Message-ID: <20090527003152X.fujita.tomonori@lab.ntt.co.jp> (raw)
In-Reply-To: <1243349222.2815.22.camel@localhost.localdomain>

On Tue, 26 May 2009 09:47:02 -0500
James Bottomley <James.Bottomley@HansenPartnership.com> wrote:

> On Tue, 2009-05-26 at 16:38 +0900, FUJITA Tomonori wrote:
> > On Tue, 26 May 2009 09:32:29 +0200
> > Jens Axboe <jens.axboe@oracle.com> wrote:
> > 
> > > On Tue, May 26 2009, FUJITA Tomonori wrote:
> > > > On Tue, 26 May 2009 08:29:53 +0200
> > > > Jens Axboe <jens.axboe@oracle.com> wrote:
> > > > 
> > > > > On Tue, May 26 2009, FUJITA Tomonori wrote:
> > > > > > On Mon, 25 May 2009 18:45:25 -0700
> > > > > > Roland Dreier <rdreier@cisco.com> wrote:
> > > > > > 
> > > > > > >  > Ideally there should be a MACRO that is defined to WORD_SIZE on cache-coherent
> > > > > > >  > ARCHs and to SMP_CACHE_BYTES on none-cache-coherent systems and use that size
> > > > > > >  > at the __align() attribute. (So only stupid ARCHES get hurt)
> > > > > > > 
> > > > > > > this seems to come up repeatedly -- I had a proposal a _long_ time ago
> > > > > > > that never quite got merged, cf http://lwn.net/Articles/2265/ and
> > > > > > > http://lwn.net/Articles/2269/ -- from 2002 (!?).  The idea is to go a
> > > > > > 
> > > > > > Yeah, I think that Benjamin did last time:
> > > > > > 
> > > > > > http://www.mail-archive.com/linux-scsi@vger.kernel.org/msg12632.html
> > > > > > 
> > > > > > IIRC, James didn't like it so I wrote the current code. I didn't see
> > > > > > any big performance difference with scsi_debug:
> > > > > > 
> > > > > > http://marc.info/?l=linux-scsi&m=120038907123706&w=2
> > > > > > 
> > > > > > Jens, you see the performance difference due to this unification?
> > > > > 
> > > > > Yes, it's definitely a worth while optimization. The problem isn't as
> > > > > such this specific allocation, it's the total number of allocations we
> > > > > do for a piece of IO. This sense buffer one is just one of many, I'm
> > > > > continually working to reduce them. If we get rid of this one and add
> > > > > the ->alloc_cmd() stuff, we can kill one more. The bio path already lost
> > > > > one. So in the IO stack, we went from 6 allocations to 3 for a piece of
> > > > > IO. And then it starts to add up. Even at just 30-50k iops, that's more
> > > > > than 1% of time in the testing I did.
> > > > 
> > > > I see, thanks. Hmm, possibly slab becomes slower. ;)
> > > > 
> > > > Then I think that we need something like the ->alloc_cmd()
> > > > method. Let's ask James. 
> > > > 
> > > > I don't think that it's just about simply adding the hook; there are
> > > > some issues that we need to think about. Though Boaz worries too much
> > > > a bit, I think.
> > > > 
> > > > I'm not sure about this patch if we add ->alloc_cmd(). I doubt that
> > > > there are any llds don't use ->alloc_cmd() worry about the overhead of
> > > > the separated sense buffer allocation. If a lld doesn't define the own
> > > > alloc_cmd, then I think it's fine to use the generic command
> > > > allocator that does the separate sense buffer allocation.
> > > 
> > > I think we should do the two things seperately. If we can safely inline
> > > the sense buffer in the command by doing the right alignment, then lets
> > > do that. The ->alloc_cmd() approach will be easier to do with an inline
> > > sense buffer.
> > 
> > James rejected this in the past. Let's wait for his verdict.
> 
> OK, so the reason for the original problems where the sense buffer was
> inlined with the scsi_command was that we need to DMA to the sense
> buffer but not to the command.  Plus the command is in fairly constant
> use so we get cacheline interference unless they're always in separate
> caches.  This necessitates opening up a hole in the command to achieve
> this (you can separate to the next cache line if you can guarantee that
> the command always begins on a cacheline.  If not, it has to be
> 2*cacheline).  The L1 cacheline can be up to 128 bytes on some
> architectures, so we'd need to know the waste of space is worth it in
> terms of speed.  The other problem is that the entire command now has to
> be allocated in DMAable memory, which restricts the allocation on some
> systems.

Yeah, I think that there are good reasons why we shouldn't inline the
sense buffer. As I already wrote, seems that the DMA requirement
wasn't properly understood; it's not about the alignment.


> > Yeah, we can inline the sense buffer but as we discussed in the past
> > several times, there are some good reasons that we should not do so, I
> > think.
> 
> There are several other approaches:
> 
>      1. Keep the sense buffer packed in the command but disallow DMA to
>         it, which fixes all the alignment problems.  Then we supply a
>         set of rotating DMA buffers to drivers which need to do the DMA
>         (which isn't the majority).

Can we just fix some drivers not to do the DMA with the sense buffer in
scsi_cmnd? IIRC, there are only five or six drivers that do such.

  parent reply	other threads:[~2009-05-26 15:33 UTC|newest]

Thread overview: 61+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-05-25  7:30 [PATCH 0/12] Per-bdi writeback flusher threads #5 Jens Axboe
2009-05-25  7:30 ` [PATCH 01/13] libata: get rid of ATA_MAX_QUEUE loop in ata_qc_complete_multiple() Jens Axboe
2009-05-25  7:30 ` [PATCH 01/12] ntfs: remove old debug check for dirty data in ntfs_put_super() Jens Axboe
2009-05-25  7:30 ` [PATCH 02/13] block: add static rq allocation cache Jens Axboe
2009-05-25  7:30 ` [PATCH 02/12] btrfs: properly register fs backing device Jens Axboe
2009-05-25  7:30 ` [PATCH 03/13] scsi: unify allocation of scsi command and sense buffer Jens Axboe
2009-05-25  7:41   ` Christoph Hellwig
2009-05-25  7:46     ` Jens Axboe
2009-05-25  7:50       ` Christoph Hellwig
2009-05-25  7:54         ` Jens Axboe
2009-05-25 10:33         ` Boaz Harrosh
2009-05-25 10:42           ` Christoph Hellwig
2009-05-25 10:49             ` Jens Axboe
2009-05-26  4:36         ` FUJITA Tomonori
2009-05-26  5:08           ` FUJITA Tomonori
2009-05-25  8:15   ` Pekka Enberg
2009-05-25  8:15     ` Pekka Enberg
2009-05-25 11:32     ` Nick Piggin
2009-05-25  9:28   ` Boaz Harrosh
2009-05-26  1:45     ` Roland Dreier
2009-05-26  4:36       ` FUJITA Tomonori
2009-05-26  6:29         ` Jens Axboe
2009-05-26  7:25           ` FUJITA Tomonori
2009-05-26  7:32             ` Jens Axboe
2009-05-26  7:38               ` FUJITA Tomonori
2009-05-26 14:47                 ` James Bottomley
2009-05-26 15:13                   ` Matthew Wilcox
2009-05-26 15:31                   ` FUJITA Tomonori [this message]
2009-05-26 16:05                     ` Boaz Harrosh
2009-05-27  1:36                       ` FUJITA Tomonori
2009-05-27  7:54                         ` Boaz Harrosh
2009-05-27  8:26                           ` FUJITA Tomonori
2009-05-27  9:11                             ` Boaz Harrosh
2009-05-26 16:12                   ` Boaz Harrosh
2009-05-26 16:28                     ` Boaz Harrosh
2009-05-26  7:56               ` FUJITA Tomonori
2009-05-26  5:23     ` FUJITA Tomonori
2009-05-25  7:30 ` [PATCH 03/12] writeback: move dirty inodes from super_block to backing_dev_info Jens Axboe
2009-05-25  7:30 ` [PATCH 04/13] scsi: get rid of lock in __scsi_put_command() Jens Axboe
2009-05-25  7:30 ` [PATCH 04/12] writeback: switch to per-bdi threads for flushing data Jens Axboe
2009-05-25  7:30 ` [PATCH 05/13] aio: mostly crap Jens Axboe
2009-05-25  9:09   ` Jan Kara
2009-05-25  7:30 ` [PATCH 05/12] writeback: get rid of pdflush completely Jens Axboe
2009-05-25  7:30 ` [PATCH 06/13] block: move elevator ops into the queue Jens Axboe
2009-05-25  7:30 ` [PATCH 06/12] writeback: separate the flushing state/task from the bdi Jens Axboe
2009-05-25  7:30 ` [PATCH 07/13] block: avoid indirect calls to enter cfq io scheduler Jens Axboe
2009-05-26  9:02   ` Nikanth K
2009-05-26  9:02     ` Nikanth K
2009-05-25  7:30 ` [PATCH 07/12] writeback: support > 1 flusher thread per bdi Jens Axboe
2009-05-25  7:30 ` [PATCH 08/13] block: change the tag sync vs async restriction logic Jens Axboe
2009-05-25  7:30 ` [PATCH 08/12] writeback: include default_backing_dev_info in writeback Jens Axboe
2009-05-25  7:31 ` [PATCH 09/13] libata: switch to using block layer tagging support Jens Axboe
2009-05-25  7:31 ` [PATCH 09/12] writeback: allow sleepy exit of default writeback task Jens Axboe
2009-05-25  7:31 ` [PATCH 10/13] block: add function for waiting for a specific free tag Jens Axboe
2009-05-25  7:31 ` [PATCH 10/12] writeback: add some debug inode list counters to bdi stats Jens Axboe
2009-05-25  7:31 ` [PATCH 11/13] block: disallow merging of read-ahead bits into normal request Jens Axboe
2009-05-25  7:31 ` [PATCH 11/12] writeback: add name to backing_dev_info Jens Axboe
2009-05-25  7:31 ` [PATCH 12/13] block: first cut at implementing a NAPI approach for block devices Jens Axboe
2009-05-25  7:31 ` [PATCH 12/12] writeback: check for registered bdi in flusher add and inode dirty Jens Axboe
2009-05-25  7:31 ` [PATCH 13/13] block: unlocked completion test patch Jens Axboe
2009-05-25  7:33 ` [PATCH 0/12] Per-bdi writeback flusher threads #5 Jens Axboe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20090527003152X.fujita.tomonori@lab.ntt.co.jp \
    --to=fujita.tomonori@lab.ntt.co.jp \
    --cc=James.Bottomley@HansenPartnership.com \
    --cc=akpm@linux-foundation.org \
    --cc=bharrosh@panasas.com \
    --cc=chris.mason@oracle.com \
    --cc=david@fromorbit.com \
    --cc=hch@infradead.org \
    --cc=jack@suse.cz \
    --cc=jens.axboe@oracle.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-scsi@vger.kernel.org \
    --cc=rdreier@cisco.com \
    --cc=yanmin_zhang@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.