Message ID | 20060801030443.GA2221@gondor.apana.org.au |
---|---|
State | New, archived |
Headers | show |
Series |
|
Related | show |
On Tue, 1 Aug 2006 13:04:43 +1000 Herbert Xu <herbert@gondor.apana.org.au> wrote: > > There is a bug in jbd with slab debugging enabled where it was submitting > a bh obtained via jbd_rep_kmalloc which crossed a page boundary. A lot > of time was spent on tracking this down because the symptoms were far off > from where the problem was. > > This patch adds a sanity check to submit_bh so we can immediately spot > anyone doing similar things in future. Seems sane. > Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> > > While you're at it, could you fix that jbd bug for us :) Could we have a more detailed description? - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
On Mon, Jul 31, 2006 at 09:04:18PM -0700, Andrew Morton wrote: > > Could we have a more detailed description? Sure, this particular instance is in journal_write_metadata_buffer where the bh may be constructed from kmalloc memory (search for the call to jbd_rep_kmalloc). Because the memory returned by kmalloc may straddle a page (when slab debugging is enabled that is), this causes a broken bh to be injected into submit_bh. Cheers,
On Tue, 1 Aug 2006 15:02:59 +1000 Herbert Xu <herbert@gondor.apana.org.au> wrote: > On Mon, Jul 31, 2006 at 09:04:18PM -0700, Andrew Morton wrote: > > > > Could we have a more detailed description? > > Sure, this particular instance is in journal_write_metadata_buffer > where the bh may be constructed from kmalloc memory (search for the > call to jbd_rep_kmalloc). Because the memory returned by kmalloc > may straddle a page (when slab debugging is enabled that is), this > causes a broken bh to be injected into submit_bh. > Crap, that's hard to fix. Am I allowed to blame submit_bh()? ;) uhm, we don't want to lose kmalloc redzoning, so I guess we need to create on-demand ext3-private slab caches for 1024, 2048, and 4096 bytes. With the appropriate slab flags to defeat the redzoning. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
On Tue, Aug 01 2006, Herbert Xu wrote: > Hi Andrew: > > [BLOCK] bh: Ensure bh fits within a page > > There is a bug in jbd with slab debugging enabled where it was submitting > a bh obtained via jbd_rep_kmalloc which crossed a page boundary. A lot > of time was spent on tracking this down because the symptoms were far off > from where the problem was. > > This patch adds a sanity check to submit_bh so we can immediately spot > anyone doing similar things in future. > > Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> > > While you're at it, could you fix that jbd bug for us :) > > Cheers, > -- > Visit Openswan at http://www.openswan.org/ > Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> > Home Page: http://gondor.apana.org.au/~herbert/ > PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt > -- > diff --git a/fs/buffer.c b/fs/buffer.c > index 71649ef..b998f08 100644 > --- a/fs/buffer.c > +++ b/fs/buffer.c > @@ -2790,6 +2790,7 @@ int submit_bh(int rw, struct buffer_head > BUG_ON(!buffer_locked(bh)); > BUG_ON(!buffer_mapped(bh)); > BUG_ON(!bh->b_end_io); > + WARN_ON(bh_offset(bh) + bh->b_size > PAGE_SIZE); That looks really dangerous, I'd prefer that to be a BUG_ON() as well to prevent nastiness further down.
On Mon, Jul 31, 2006 at 10:54:54PM -0700, Andrew Morton wrote: > > Crap, that's hard to fix. Am I allowed to blame submit_bh()? ;) Sure you're allowed to do anything :) > uhm, we don't want to lose kmalloc redzoning, so I guess we need to create > on-demand ext3-private slab caches for 1024, 2048, and 4096 bytes. With > the appropriate slab flags to defeat the redzoning. Either that or we should fix redzoning so that it actually preserves alignment, rather than turning off debugging whenever we want alignment. Basically it means that we need to use twice the amount of memory for each object (so 2K for each 1K object). Is this acceptable for debugging? Cheers,
On Tue, 1 Aug 2006 21:06:58 +1000 Herbert Xu <herbert@gondor.apana.org.au> wrote: > On Mon, Jul 31, 2006 at 10:54:54PM -0700, Andrew Morton wrote: > > > > Crap, that's hard to fix. Am I allowed to blame submit_bh()? ;) > > Sure you're allowed to do anything :) > > > uhm, we don't want to lose kmalloc redzoning, so I guess we need to create > > on-demand ext3-private slab caches for 1024, 2048, and 4096 bytes. With > > the appropriate slab flags to defeat the redzoning. > > Either that or we should fix redzoning so that it actually preserves > alignment, rather than turning off debugging whenever we want alignment. > Basically it means that we need to use twice the amount of memory for > each object (so 2K for each 1K object). Is this acceptable for debugging? It doesn't sound like a good idea. At present CONFIG_DEBUG_SLAB is acceptable for production use (although it makes kernel profiles look pretty sad). Doubling the size of the slabs would rather alter that. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
On Mon, 31 Jul 2006, Andrew Morton wrote: > > Sure, this particular instance is in journal_write_metadata_buffer > > where the bh may be constructed from kmalloc memory (search for the > > call to jbd_rep_kmalloc). Because the memory returned by kmalloc > > may straddle a page (when slab debugging is enabled that is), this > > causes a broken bh to be injected into submit_bh. > > > > Crap, that's hard to fix. Am I allowed to blame submit_bh()? ;) > > uhm, we don't want to lose kmalloc redzoning, so I guess we need to create > on-demand ext3-private slab caches for 1024, 2048, and 4096 bytes. With > the appropriate slab flags to defeat the redzoning. The slab allocator gives no guarantee that a structure is not straddling a page boundary regardless of debug or not. It may just happen that the objects are arranged if kmem_cache_cretae() is called with certain parameters. Another arch with other cacheline alignment and another page size may arrange the objects differently. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Hi, On 8/1/06, Christoph Lameter <clameter@sgi.com> wrote: > The slab allocator gives no guarantee that a structure is not straddling a > page boundary regardless of debug or not. It may just happen that the > objects are arranged if kmem_cache_cretae() is called with certain > parameters. Another arch with other cacheline alignment and another page > size may arrange the objects differently. Indeed. You could try to force zero-order page allocations for a cache, but we are probably better of using the page allocator directly or crafting a special purpose allocator for the block layer. Pekka - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
On Tue, 2006-08-01 at 10:36 -0700, Christoph Lameter wrote: > On Mon, 31 Jul 2006, Andrew Morton wrote: > > > > Sure, this particular instance is in journal_write_metadata_buffer > > > where the bh may be constructed from kmalloc memory (search for the > > > call to jbd_rep_kmalloc). Because the memory returned by kmalloc > > > may straddle a page (when slab debugging is enabled that is), this > > > causes a broken bh to be injected into submit_bh. > > > > > > > Crap, that's hard to fix. Am I allowed to blame submit_bh()? ;) > > > > uhm, we don't want to lose kmalloc redzoning, so I guess we need to create > > on-demand ext3-private slab caches for 1024, 2048, and 4096 bytes. With > > the appropriate slab flags to defeat the redzoning. > > The slab allocator gives no guarantee that a structure is not straddling a > page boundary regardless of debug or not. It may just happen that the > objects are arranged if kmem_cache_cretae() is called with certain > parameters. Another arch with other cacheline alignment and another page > size may arrange the objects differently. If you set the alignment for ext3 the same as the size (ie 1024, 2048, 4096 for the above respectively) then wouldn't that guarantee not straddling a page? -- Steve - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
On Tue, 1 Aug 2006, Steven Rostedt wrote: > If you set the alignment for ext3 the same as the size (ie 1024, 2048, > 4096 for the above respectively) then wouldn't that guarantee not > straddling a page? Yes. But then that number must always be a fraction of pagesize. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Herbert Xu <herbert@gondor.apana.org.au> writes: > -- > diff --git a/fs/buffer.c b/fs/buffer.c > index 71649ef..b998f08 100644 > --- a/fs/buffer.c > +++ b/fs/buffer.c > @@ -2790,6 +2790,7 @@ int submit_bh(int rw, struct buffer_head > BUG_ON(!buffer_locked(bh)); > BUG_ON(!buffer_mapped(bh)); > BUG_ON(!bh->b_end_io); > + WARN_ON(bh_offset(bh) + bh->b_size > PAGE_SIZE); What happens when someone implements direct large page IO? -Andi - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
On Tue, 2006-08-01 at 12:10 -0700, Christoph Lameter wrote: > On Tue, 1 Aug 2006, Steven Rostedt wrote: > > > If you set the alignment for ext3 the same as the size (ie 1024, 2048, > > 4096 for the above respectively) then wouldn't that guarantee not > > straddling a page? > > Yes. But then that number must always be a fraction of pagesize. > understood, as is 1024, 2048, and 4096 are. Well, if pagesize is 4096 is 4096 really a fraction of 4096? :) Also, isn't all sizes for kmalloc that are under pagesize a fraction of the page size? Or more correctly, a power of 2? -- Steve - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
On Tue, 1 Aug 2006, Steven Rostedt wrote: > > Yes. But then that number must always be a fraction of pagesize. > > > > understood, as is 1024, 2048, and 4096 are. Well, if pagesize is 4096 > is 4096 really a fraction of 4096? :) > > Also, isn't all sizes for kmalloc that are under pagesize a fraction of > the page size? Or more correctly, a power of 2? Well the size of the object and the alignment must be a fraction of the pagesize. If you get page aligned pages then I wonder why use the slab allocator? The page allocator will be much better suited. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
On Tue, Aug 01, 2006 at 09:33:54PM +0200, Andi Kleen wrote: > > diff --git a/fs/buffer.c b/fs/buffer.c > > index 71649ef..b998f08 100644 > > --- a/fs/buffer.c > > +++ b/fs/buffer.c > > @@ -2790,6 +2790,7 @@ int submit_bh(int rw, struct buffer_head > > BUG_ON(!buffer_locked(bh)); > > BUG_ON(!buffer_mapped(bh)); > > BUG_ON(!bh->b_end_io); > > + WARN_ON(bh_offset(bh) + bh->b_size > PAGE_SIZE); > > What happens when someone implements direct large page IO? Then they'll need to change submit_bh to generate more than one bvec. At that time they can remove this warning :) Cheers,
Jens Axboe <axboe@suse.de> wrote: > > That looks really dangerous, I'd prefer that to be a BUG_ON() as well to > prevent nastiness further down. OK, I used a WARN_ON mainly because ext3 has been doing this for years without killing anyone until now :) [BLOCK] bh: Ensure bh fits within a page There is a bug in jbd with slab debugging enabled where it was submitting a bh obtained via jbd_rep_kmalloc which crossed a page boundary. A lot of time was spent on tracking this down because the symptoms were far off from where the problem was. This patch adds a sanity check to submit_bh so we can immediately spot anyone doing similar things in future. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cheers,
On Wed, Aug 02, 2006 at 09:30:51AM +1000, Herbert Xu wrote: > > OK, I used a WARN_ON mainly because ext3 has been doing this for years > without killing anyone until now :) > > [BLOCK] bh: Ensure bh fits within a page Actually, the other reason is that we can't BUG_ON until ext3 is fixed to not do this. So I suppose we should keep the WARN_ON until that happens. Cheers,
On Wed, 02 Aug 2006 09:30:51 +1000 Herbert Xu <herbert@gondor.apana.org.au> wrote: > Jens Axboe <axboe@suse.de> wrote: > > > > That looks really dangerous, I'd prefer that to be a BUG_ON() as well to > > prevent nastiness further down. > > OK, I used a WARN_ON mainly because ext3 has been doing this for years > without killing anyone until now :) I can't apply either of these patches, because we _know_ it'll trigger. Once the JBD fix is in hand, we can go with a WARN_ON_ONCE and if that is all clear, then we can make it a BUG_ON(). - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
On Wed, Aug 02 2006, Herbert Xu wrote: > On Wed, Aug 02, 2006 at 09:30:51AM +1000, Herbert Xu wrote: > > > > OK, I used a WARN_ON mainly because ext3 has been doing this for years > > without killing anyone until now :) > > > > [BLOCK] bh: Ensure bh fits within a page > > Actually, the other reason is that we can't BUG_ON until ext3 is fixed > to not do this. So I suppose we should keep the WARN_ON until that > happens. Yep, the ext3 bits need to go in first.
On Wed, Aug 02 2006, Herbert Xu wrote: > On Tue, Aug 01, 2006 at 09:33:54PM +0200, Andi Kleen wrote: > > > diff --git a/fs/buffer.c b/fs/buffer.c > > > index 71649ef..b998f08 100644 > > > --- a/fs/buffer.c > > > +++ b/fs/buffer.c > > > @@ -2790,6 +2790,7 @@ int submit_bh(int rw, struct buffer_head > > > BUG_ON(!buffer_locked(bh)); > > > BUG_ON(!buffer_mapped(bh)); > > > BUG_ON(!bh->b_end_io); > > > + WARN_ON(bh_offset(bh) + bh->b_size > PAGE_SIZE); > > > > What happens when someone implements direct large page IO? > > Then they'll need to change submit_bh to generate more than one bvec. > At that time they can remove this warning :) Or just use a proper interface, no new code should touch submit_bh().
On Wed, 2006-08-02 at 09:30 +1000, Herbert Xu wrote: > Jens Axboe <axboe@suse.de> wrote: > > > > That looks really dangerous, I'd prefer that to be a BUG_ON() as well to > > prevent nastiness further down. > > OK, I used a WARN_ON mainly because ext3 has been doing this for years > without killing anyone until now :) > > [BLOCK] bh: Ensure bh fits within a page > > There is a bug in jbd with slab debugging enabled where it was submitting > a bh obtained via jbd_rep_kmalloc which crossed a page boundary. A lot > of time was spent on tracking this down because the symptoms were far off > from where the problem was. > > This patch adds a sanity check to submit_bh so we can immediately spot > anyone doing similar things in future. > > Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> > > Cheers, Hi, I am working on a fix to address this problem. Do you have any testcases to reproduce the problem ? I have been running bunch of tests with CONFIG_DEBUG_SLAB with WARN_ON() and I haven't seen it yet :( Wondering, if there is any special thing I need to do to reproduce the problem ? Please let me know. Thanks, Badari - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
On Thu, Aug 17, 2006 at 04:14:16PM -0700, Badari Pulavarty wrote: > > Wondering, if there is any special thing I need to do to reproduce > the problem ? Please let me know. I must say that I didn't try to reproduce the problem. However, here is the bugzilla entry for it. It apparently shows up during installation: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=200127 Cheers,
diff --git a/fs/buffer.c b/fs/buffer.c index 71649ef..b998f08 100644 --- a/fs/buffer.c +++ b/fs/buffer.c @@ -2790,6 +2790,7 @@ int submit_bh(int rw, struct buffer_head BUG_ON(!buffer_locked(bh)); BUG_ON(!buffer_mapped(bh)); BUG_ON(!bh->b_end_io); + WARN_ON(bh_offset(bh) + bh->b_size > PAGE_SIZE); if (buffer_ordered(bh) && (rw == WRITE)) rw = WRITE_BARRIER;
Hi Andrew: [BLOCK] bh: Ensure bh fits within a page There is a bug in jbd with slab debugging enabled where it was submitting a bh obtained via jbd_rep_kmalloc which crossed a page boundary. A lot of time was spent on tracking this down because the symptoms were far off from where the problem was. This patch adds a sanity check to submit_bh so we can immediately spot anyone doing similar things in future. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> While you're at it, could you fix that jbd bug for us :) Cheers,