* ext4fs error "ext4_mb_generate_buddy:741:group 16, 8160 clusters in bitmap, 4064 in gd" (with repro)
@ 2012-08-09 10:00 ` Paolo Bonzini
From: Paolo Bonzini @ 2012-08-09 10:00 UTC
  To: tytso, Linux Kernel Mailing List, linux-ext4

Here is how to reproduce it.  It happens during fstrim.  I found other
occurrences of the error in the mailing list, but they were not related
to trim so they may be something different.

modprobe scsi_debug dev_size_mb=256 lbpws=1
dd if=/dev/zero of=/dev/sdb bs=1M      
fdisk /dev/sdb
 >> create a new partition accepting all defaults
fdisk -lu /dev/sdb|tail -1
 >> should show: /dev/sdb1     57      524285      262114+  83  Linux

mkfs.ext4 /dev/sdb1
mkdir test
mount /dev/sdb1 test
fstrim ./test

Here is the output in dmesg:

[140934.644166] EXT4-fs (sdb1): mounted filesystem with ordered data mode. Opts: (null)
[140941.562060] EXT4-fs error (device sdb1): ext4_mb_generate_buddy:741: group 16, 8160 clusters in bitmap, 4064 in gd
[140941.603066] EXT4-fs error (device sdb1): ext4_mb_generate_buddy:741: group 25, 8192 clusters in bitmap, 7934 in gd
[140941.613060] EXT4-fs error (device sdb1): ext4_mb_generate_buddy:741: group 27, 8192 clusters in bitmap, 7934 in gd
[140941.634074] EXT4-fs error (device sdb1): ext4_mb_generate_buddy:741: group 31, 8192 clusters in bitmap, 8159 in gd

Hope this helps,

Paolo
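
The size of each mismatch in the dmesg output above is easier to compare
when the two counts are subtracted. The following is a minimal sketch,
assuming the error lines keep exactly the format shown in the report; the
function name buddy_mismatch is made up for illustration and is not part
of any ext4 tooling.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log lines on standard input and, for each
# ext4_mb_generate_buddy error, print the group number, both free-cluster
# counts, and their difference (bitmap count minus group-descriptor count).
buddy_mismatch() {
    sed -n 's/.*ext4_mb_generate_buddy:[0-9]*: group \([0-9]*\), \([0-9]*\) clusters in bitmap, \([0-9]*\) in gd.*/\1 \2 \3/p' |
    while read -r group bitmap gd; do
        echo "group $group: bitmap=$bitmap gd=$gd diff=$((bitmap - gd))"
    done
}

# Usage with the four error lines from the report.
buddy_mismatch <<'EOF'
[140941.562060] EXT4-fs error (device sdb1): ext4_mb_generate_buddy:741: group 16, 8160 clusters in bitmap, 4064 in gd
[140941.603066] EXT4-fs error (device sdb1): ext4_mb_generate_buddy:741: group 25, 8192 clusters in bitmap, 7934 in gd
[140941.613060] EXT4-fs error (device sdb1): ext4_mb_generate_buddy:741: group 27, 8192 clusters in bitmap, 7934 in gd
[140941.634074] EXT4-fs error (device sdb1): ext4_mb_generate_buddy:741: group 31, 8192 clusters in bitmap, 8159 in gd
EOF
```

On a live system the same function could be fed from the kernel log, for
example with "dmesg | buddy_mismatch". For the lines above, the differences
are 4096, 258, 258 and 33 clusters.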

* Re: ext4fs error "ext4_mb_generate_buddy:741:group 16, 8160 clusters in bitmap, 4064 in gd" (with repro)
  2012-08-09 10:00 ` Paolo Bonzini
@ 2012-08-09 17:06   ` Theodore Ts'o
From: Theodore Ts'o @ 2012-08-09 17:06 UTC
  To: Lukas Czerner
  Cc: Paolo Bonzini,
	Linux Kernel Mailing List, linux-ext4

On Thu, Aug 09, 2012 at 12:00:09PM +0200, Paolo Bonzini wrote:
> Here is how to reproduce it.  It happens during fstrim.  I found other
> occurrences of the error in the mailing list, but they were not related
> to trim so they may be something different.
> 
> modprobe scsi_debug dev_size_mb=256 lbpws=1
> dd if=/dev/zero of=/dev/sdb bs=1M      
> fdisk /dev/sdb
>  >> create a new partition accepting all defaults
> fdisk -lu /dev/sdb|tail -1
>  >> should show: /dev/sdb1     57      524285      262114+  83  Linux
> 
> mkfs.ext4 /dev/sdb1
> mkdir test
> mount /dev/sdb1 test
> fstrim ./test

I can confirm that this accurately reproduces file system corruption
using a 3.5 kernel.  It looks like some block allocation bitmap blocks
are getting trimmed when they shouldn't be.  Lukas, can you take a
look at this?

					- Ted

* Re: ext4fs error "ext4_mb_generate_buddy:741:group 16, 8160 clusters in bitmap, 4064 in gd" (with repro)
  2012-08-09 17:06   ` Theodore Ts'o
@ 2012-08-15  9:17     ` Lukáš Czerner
From: Lukáš Czerner @ 2012-08-15  9:17 UTC
  To: Theodore Ts'o
  Cc: Lukas Czerner, Paolo Bonzini,
	Linux Kernel Mailing List, linux-ext4

On Thu, 9 Aug 2012, Theodore Ts'o wrote:

> Date: Thu, 9 Aug 2012 13:06:40 -0400
> From: Theodore Ts'o <tytso@mit.edu>
> To: Lukas Czerner <lczerner@redhat.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>,
>     "Linux Kernel Mailing List"
>     <linux-kernel@vger.kernel.org>, linux-ext4@vger.kernel.org
> Subject: Re: ext4fs error
>     "ext4_mb_generate_buddy:741:group 16, 8160 clusters in bitmap, 4064 in gd"
>      (with repro)
> 
> On Thu, Aug 09, 2012 at 12:00:09PM +0200, Paolo Bonzini wrote:
> > Here is how to reproduce it.  It happens during fstrim.  I found other
> > occurrences of the error in the mailing list, but they were not related
> > to trim so they may be something different.
> > 
> > modprobe scsi_debug dev_size_mb=256 lbpws=1
> > dd if=/dev/zero of=/dev/sdb bs=1M      
> > fdisk /dev/sdb
> >  >> create a new partition accepting all defaults
> > fdisk -lu /dev/sdb|tail -1
> >  >> should show: /dev/sdb1     57      524285      262114+  83  Linux
> > 
> > mkfs.ext4 /dev/sdb1
> > mkdir test
> > mount /dev/sdb1 test
> > fstrim ./test
> 
> I can confirm that this accurately reproduces file system corruption
> using a 3.5 kernel.  It looks like some block allocation bitmap blocks
> are getting trimmed when they shouldn't be.  Lukas, can you take a
> look at this?
> 
> 					- Ted

Hi Ted,

sorry for the delay, I've just got back from my vacation. I'll take
a look at it.

Thanks!
-Lukas

* Re: ext4fs error "ext4_mb_generate_buddy:741:group 16, 8160 clusters in bitmap, 4064 in gd" (with repro)
  2012-08-15  9:17     ` Lukáš Czerner
@ 2012-08-16 14:28       ` Lukáš Czerner
From: Lukáš Czerner @ 2012-08-16 14:28 UTC
  To: Lukáš Czerner
  Cc: Theodore Ts'o, Paolo Bonzini,
	Linux Kernel Mailing List, linux-ext4

On Wed, 15 Aug 2012, Lukáš Czerner wrote:

> Date: Wed, 15 Aug 2012 11:17:57 +0200 (CEST)
> From: Lukáš Czerner <lczerner@redhat.com>
> To: Theodore Ts'o <tytso@mit.edu>
> Cc: Lukas Czerner <lczerner@redhat.com>, Paolo Bonzini <pbonzini@redhat.com>,
>     "Linux Kernel Mailing List"
>     <linux-kernel@vger.kernel.org>, linux-ext4@vger.kernel.org
> Subject: Re: ext4fs error
>     "ext4_mb_generate_buddy:741:group 16, 8160 clusters in bitmap, 4064 in gd"
>      (with repro)
> 
> On Thu, 9 Aug 2012, Theodore Ts'o wrote:
> 
> > Date: Thu, 9 Aug 2012 13:06:40 -0400
> > From: Theodore Ts'o <tytso@mit.edu>
> > To: Lukas Czerner <lczerner@redhat.com>
> > Cc: Paolo Bonzini <pbonzini@redhat.com>,
> >     "Linux Kernel Mailing List"
> >     <linux-kernel@vger.kernel.org>, linux-ext4@vger.kernel.org
> > Subject: Re: ext4fs error
> >     "ext4_mb_generate_buddy:741:group 16, 8160 clusters in bitmap, 4064 in gd"
> >      (with repro)
> > 
> > On Thu, Aug 09, 2012 at 12:00:09PM +0200, Paolo Bonzini wrote:
> > > Here is how to reproduce it.  It happens during fstrim.  I found other
> > > occurrences of the error in the mailing list, but they were not related
> > > to trim so they may be something different.
> > > 
> > > modprobe scsi_debug dev_size_mb=256 lbpws=1
> > > dd if=/dev/zero of=/dev/sdb bs=1M      
> > > fdisk /dev/sdb
> > >  >> create a new partition accepting all defaults
> > > fdisk -lu /dev/sdb|tail -1
> > >  >> should show: /dev/sdb1     57      524285      262114+  83  Linux
> > > 
> > > mkfs.ext4 /dev/sdb1
> > > mkdir test
> > > mount /dev/sdb1 test
> > > fstrim ./test
> > 
> > I can confirm that this accurately reproduces file system corruption
> > using a 3.5 kernel.  It looks like some block allocation bitmap blocks
> > are getting trimmed when they shouldn't be.  Lukas, can you take a
> > look at this?
> > 
> > 					- Ted
> 
> Hi Ted,
> 
> sorry for the delay, I've just got back from my vacation. I'll take
> a look at it.
> 
> Thanks!
> -Lukas

This does not seem to be an ext4 problem. The ext4 code seems unable to
actually discard blocks which are allocated. Moreover, I was not able
to reproduce the problem on a loop device with the same settings as
the reported scsi_debug device (a file system with a 1024-byte block
size on a 256MB image residing on a file system with a 1024-byte block
size).

After a little bit of tracing with systemtap and blktrace, ext4
does not seem to be doing anything wrong, and yet part of the
block bitmap gets trimmed. This led me to the scsi_debug driver itself,
and indeed it seems that there is an off-by-one bug in
unmap_region() which is causing the problem.

Here is the patch which fixes the problem for me; I'll resend a
proper patch in a bit.

diff --git a/drivers/scsi/scsi_debug.c b/drivers/scsi/scsi_debug.c
index 182d5a5..f4cc413 100644
--- a/drivers/scsi/scsi_debug.c
+++ b/drivers/scsi/scsi_debug.c
@@ -2054,7 +2054,7 @@ static void unmap_region(sector_t lba, unsigned int len)
 		block = lba + alignment;
 		rem = do_div(block, granularity);
 
-		if (rem == 0 && lba + granularity <= end && block < map_size) {
+		if (rem == 0 && lba + granularity < end && block < map_size) {
 			clear_bit(block, map_storep);
 			if (scsi_debug_lbprz)
 				memset(fake_storep +


Thanks!
-Lukas
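
The one-character change in the patch above only matters for a chunk that
ends exactly at "end". The following is a toy model of that comparison, not
the driver's code: it assumes that "end" is the exclusive bound lba + len
(the patch context does not show how it is computed) and it ignores the
alignment offset that the real function adds before dividing. The function
name chunks_passing is made up for illustration.

```shell
#!/bin/sh
# Toy model of the bounds check only. Prints the starting LBAs of
# granularity-aligned chunks in [start, start+len) that pass the check.
#   op = le : original test, lba + granularity <= end
#   op = lt : patched test,  lba + granularity <  end
chunks_passing() {
    op=$1
    start=$2
    len=$3
    granularity=$4
    end=$((start + len))    # assumption: exclusive end of the request
    lba=$start
    out=""
    while [ "$lba" -lt "$end" ]; do
        # Simplification: the real code tests (lba + alignment) % granularity.
        if [ $((lba % granularity)) -eq 0 ]; then
            if [ "$op" = le ]; then
                [ $((lba + granularity)) -le "$end" ] && out="$out $lba"
            else
                [ $((lba + granularity)) -lt "$end" ] && out="$out $lba"
            fi
        fi
        lba=$((lba + 1))
    done
    echo "${out# }"
}

chunks_passing le 0 16 8    # prints "0 8"
chunks_passing lt 0 16 8    # prints "0"
```

In this model the patched comparison leaves out the last chunk of a request
whose length is a multiple of the granularity. The sketch only shows what
the comparison does; it does not establish which form is correct for the
driver, which depends on the surrounding code.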

* Re: ext4fs error "ext4_mb_generate_buddy:741:group 16, 8160 clusters in bitmap, 4064 in gd" (with repro)
  2012-08-16 14:28       ` Lukáš Czerner
@ 2012-08-16 20:00         ` Theodore Ts'o
From: Theodore Ts'o @ 2012-08-16 20:00 UTC
  To: Lukáš Czerner
  Cc: Paolo Bonzini,
	Linux Kernel Mailing List, linux-ext4

On Thu, Aug 16, 2012 at 04:28:07PM +0200, Lukáš Czerner wrote:
> 
> After a little bit of tracing with the systemtap and blktrace ext4
> does not seem to be doing anything wrong and yet we get part of the
> block bitmap trimmed. This led me to the scsi_debug driver itself
> and indeed it seems that we have off-by-one bug there in the
> unmap_region() which is causing the problem.

Thanks for finding this --- I was getting scared that ext4 users were
losing data in production.  It's good to know it was just a bug in the
scsi_debug driver.

						- Ted
