can't recover ext4 on lvm from ext4_mb_generate_buddy:739: group 1687, 32254 clusters in bitmap, 32258 in gd

* can't recover ext4 on lvm from ext4_mb_generate_buddy:739: group 1687, 32254 clusters in bitmap, 32258 in gd
@ 2012-01-05 10:37 Sander Eikelenboom
  2012-01-05 13:21 ` Sander Eikelenboom
  0 siblings, 1 reply; 26+ messages in thread
From: Sander Eikelenboom @ 2012-01-05 10:37 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: linux-ext4, linux-kernel


I'm having some trouble with an ext4 filesystem on LVM; it seems bricked, and fsck doesn't find or correct the problem.

Steps:
1) fsck -v -p -f the filesystem
2) mount the filesystem
3) try to copy a file
4) the filesystem is remounted RO on error (see below)
5) fsck again; the journal is recovered, no other errors are reported
6) back to 1)


I think the way I bricked it is:
- make an LVM snapshot of that LVM logical volume
- mount that LVM snapshot RO
- try to copy a file from the RO-mounted snapshot to a different dir on the logical volume the snapshot was taken from
- it fails and I can't recover (see above)


Is there a way to recover from this?



[  220.748928] EXT4-fs error (device dm-2): ext4_mb_generate_buddy:739: group 1687, 32254 clusters in bitmap, 32258 in gd
[  220.749415] Aborting journal on device dm-2-8.
[  220.771633] EXT4-fs error (device dm-2): ext4_journal_start_sb:327: Detected aborted journal
[  220.772593] EXT4-fs (dm-2): Remounting filesystem read-only
[  220.792455] EXT4-fs (dm-2): Remounting filesystem read-only
[  220.805118] EXT4-fs (dm-2): ext4_da_writepages: jbd2_start: 9680 pages, ino 4079617; err -30
serveerstertje:/mnt/xen_images/domains/production# cd /
serveerstertje:/# umount /mnt/xen_images/
serveerstertje:/# fsck -f -v -p /dev/serveerstertje/xen_images
fsck from util-linux-ng 2.17.2
/dev/mapper/serveerstertje-xen_images: recovering journal

     277 inodes used (0.00%)
       5 non-contiguous files (1.8%)
       0 non-contiguous directories (0.0%)
         # of inodes with ind/dind/tind blocks: 41/41/3
         Extent depth histogram: 69/28/2
51890920 blocks used (79.18%)
       0 bad blocks
      41 large files

     199 regular files
      53 directories
       0 character device files
       0 block device files
       0 fifos
       0 links
      16 symbolic links (16 fast symbolic links)
       0 sockets
--------
     268 files
serveerstertje:/#
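
The EXT4-fs error line in the log above carries three numbers that are worth recording each time the problem shows up: the block group, the free-cluster count derived from the bitmap, and the count stored in the group descriptor ("gd"). The following is a minimal sketch for extracting them, assuming the message wording matches the line shown in this report; other kernel versions may word it differently, and the function name `parse_buddy_error` is made up for this illustration.

```python
import re

# Pattern for the message format seen in this report's log; this wording is
# an assumption and may differ between kernel versions.
BUDDY_ERROR = re.compile(
    r"EXT4-fs error \(device (?P<device>[^)]+)\): "
    r"ext4_mb_generate_buddy:\d+: group (?P<group>\d+), "
    r"(?P<bitmap>\d+) clusters in bitmap, (?P<gd>\d+) in gd"
)


def parse_buddy_error(line):
    """Return device, group and both free-cluster counts, or None if no match."""
    m = BUDDY_ERROR.search(line)
    if m is None:
        return None
    return {
        "device": m.group("device"),
        "group": int(m.group("group")),
        "bitmap": int(m.group("bitmap")),
        "gd": int(m.group("gd")),
    }


# The line below is copied from the kernel log in this report.
line = (
    "[  220.748928] EXT4-fs error (device dm-2): ext4_mb_generate_buddy:739: "
    "group 1687, 32254 clusters in bitmap, 32258 in gd"
)
info = parse_buddy_error(line)
print(info)
print("gd minus bitmap:", info["gd"] - info["bitmap"])
```

Running this over the kernel log after each attempt makes it easy to see whether the group number and the two counts are stable across reproductions.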




System:
- Kernel 3.2.0
- Debian Squeeze with:
ii  e2fslibs                              1.41.12-4stable1                     ext2/ext3/ext4 file system libraries
ii  e2fsprogs                             1.41.12-4stable1                     ext2/ext3/ext4 file system utilities


* Re: can't recover ext4 on lvm from ext4_mb_generate_buddy:739: group 1687, 32254 clusters in bitmap, 32258 in gd
  2012-01-05 10:37 can't recover ext4 on lvm from ext4_mb_generate_buddy:739: group 1687, 32254 clusters in bitmap, 32258 in gd Sander Eikelenboom
@ 2012-01-05 13:21 ` Sander Eikelenboom
  2012-01-05 14:45     ` Theodore Tso
  0 siblings, 1 reply; 26+ messages in thread
From: Sander Eikelenboom @ 2012-01-05 13:21 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: linux-ext4, linux-kernel

Hmm, the problem seems to go away after reverting from a 3.2.0 to a 3.1.5 kernel; I can now copy the files after the fsck without the filesystem being remounted read-only due to the error.

--
Sander



* Re: can't recover ext4 on lvm from ext4_mb_generate_buddy:739: group 1687, 32254 clusters in bitmap, 32258 in gd
  2012-01-05 13:21 ` Sander Eikelenboom
@ 2012-01-05 14:45     ` Theodore Tso
  0 siblings, 0 replies; 26+ messages in thread
From: Theodore Tso @ 2012-01-05 14:45 UTC (permalink / raw)
  To: Sander Eikelenboom; +Cc: Theodore Tso, linux-ext4, linux-kernel


On Jan 5, 2012, at 8:21 AM, Sander Eikelenboom wrote:

> Hmm it seems to be over by reverting from a 3.2.0 to a 3.1.5 kernel, i now can copy the files after the fsck without it being remounted-ro due to the error.

Hmm… So the question is whether this is caused by changes to ext4 or to the device-mapper / LVM layer.

The error which ext4 is reporting is that a block bitmap appears to be corrupted; the block group descriptors are reporting that there are 32258 free blocks, while only 32254 free blocks are found in the block bitmap.  Since one or the other must be wrong, and continuing could potentially cause data loss, the file system gets remounted read-only.
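
The consistency check described above can be illustrated with a small sketch: count the clear bits in a group's block bitmap and compare the result with the free count stored in the group descriptor. This is not the kernel's code. It assumes that a set bit marks an in-use cluster and that bit 0 is the least significant bit of byte 0; the function names and the 16-cluster toy bitmap are invented for the example.

```python
def count_free(bitmap: bytes, clusters_in_group: int) -> int:
    """Count clear (free) bits among the first clusters_in_group bits.

    Assumption: a set bit means "in use", and bit i lives in byte i // 8
    at position i % 8 (least significant bit first).
    """
    free = 0
    for i in range(clusters_in_group):
        byte = bitmap[i // 8]
        if not (byte >> (i % 8)) & 1:
            free += 1
    return free


def check_group(bitmap: bytes, clusters_in_group: int, gd_free: int):
    """Return (consistent, free_found) for one block group."""
    found = count_free(bitmap, clusters_in_group)
    return found == gd_free, found


# Toy group of 16 clusters: 0x0f marks 4 clusters used, 0x01 marks 1 more.
toy_bitmap = b"\x0f\x01"
ok, found = check_group(toy_bitmap, 16, gd_free=13)
print("consistent:", ok, "free in bitmap:", found, "free in gd:", 13)
```

When the two counts disagree, as in the toy call above, there is no way to tell from the counts alone which side is wrong, which is why stopping writes is the safe reaction.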

What's funny is that fsck didn't report anything wrong.   That implies that the LVM volume is returning different block contents, at least under some circumstances.

Hmm… can you try reproducing this?   What happens if you now reboot into 3.2?   Does the file system still get remounted read-only?    Can you try running dumpe2fs on the file system before and after running e2fsck, and when you try to reproduce it, can you make a special note of the EXT4-fs error message:

[  220.748928] EXT4-fs error (device dm-2): ext4_mb_generate_buddy:739: group 1687, 32254 clusters in bitmap, 32258 in gd

Do the numbers stay the same each time you reproduce the problem?   And are there any changes in the output of dumpe2fs?   (Run diff; it will probably be a very tiny difference.)
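
The before/after comparison suggested above can also be scripted. The sketch below assumes the two dumpe2fs outputs have already been captured as text (for example by redirecting dumpe2fs output to a file before and after the e2fsck run) and reports only the lines that changed. The function name `changed_lines` is hypothetical, and the sample strings are placeholders, not real dumpe2fs output.

```python
import difflib


def changed_lines(before_text: str, after_text: str):
    """Return only the added/removed lines between two text dumps."""
    diff = difflib.unified_diff(
        before_text.splitlines(),
        after_text.splitlines(),
        fromfile="before-e2fsck",
        tofile="after-e2fsck",
        lineterm="",
        n=0,  # no context lines; only the differences themselves
    )
    return [
        d for d in diff
        if d[:1] in "+-" and not d.startswith(("+++", "---"))
    ]


# Placeholder text standing in for two captured dumps.
before = "alpha\nbeta 1\ngamma"
after = "alpha\nbeta 2\ngamma"
changed = changed_lines(before, after)
for entry in changed:
    print(entry)
```

An empty result would mean e2fsck left the dumped metadata untouched between the two runs.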

Also, what devices underlie the LVM?   Are you using an MD device?   Or is the 200T volume spread out across multiple hard drives directly (i.e., no RAID)?

-- Ted



* Re: can't recover ext4 on lvm from ext4_mb_generate_buddy:739: group 1687, 32254 clusters in bitmap, 32258 in gd
  2012-01-05 14:45     ` Theodore Tso
@ 2012-01-05 14:52       ` Sander Eikelenboom
  -1 siblings, 0 replies; 26+ messages in thread
From: Sander Eikelenboom @ 2012-01-05 14:52 UTC (permalink / raw)
  To: Theodore Tso; +Cc: linux-ext4, linux-kernel


Yes well, since it worked again, I continued reshuffling the LVM layout as planned for today, but this specific LV is still there, so I will try booting 3.2 again to see if I can reproduce the problem and do the things you asked for.
No MD is used; it's one SATA disk containing 2 partitions (boot and an LVM PV). One VG holds that PV, and the VG is split into multiple LVs.




-- 
Best regards,
 Sander                            mailto:linux@eikelenboom.it


* Re: can't recover ext4 on lvm from ext4_mb_generate_buddy:739: group 1687, 32254 clusters in bitmap, 32258 in gd
  2012-01-05 14:45     ` Theodore Tso
@ 2012-01-05 15:46       ` Sander Eikelenboom
  -1 siblings, 0 replies; 26+ messages in thread
From: Sander Eikelenboom @ 2012-01-05 15:46 UTC (permalink / raw)
  To: Theodore Tso; +Cc: linux-ext4, linux-kernel

Thursday, January 5, 2012, 3:45:01 PM, you wrote:


> On Jan 5, 2012, at 8:21 AM, Sander Eikelenboom wrote:

>> Hmm it seems to be over by reverting from a 3.2.0 to a 3.1.5 kernel, i now can copy the files after the fsck without it being remounted-ro due to the error.

> Hmm…  So the question is whether this is caused by changes to ext4 or in the device-mapper / LVM.

> The error which ext4 is reporting is that a block bitmap appears to be corrupted; the block group descriptors are reporting that there are 32258 free blocks, while only 32254 free blocks are found in the block bitmap.  Since one or the other is must be wrong, and continuing could potentially cause data loss, the file system gets mounted remounted read-only.

> What's funny is that fsck didn't report anything wrong.   That implies that the LVM volume is returning different block contents, at least under some circumstances.

> Hmm…. can you try reproducing this?   What happens if you now reboot into 3.2?   Do you still get the file system getting remounted read-only?    Can you try running dumpe2fs on the file system before and after running e2fsck, and when you try to reproduce it, can you make a special note of the EXT4-fs error message:

> [  220.748928] EXT4-fs error (device dm-2): ext4_mb_generate_buddy:739: group 1687, 32254 clusters in bitmap, 32258 in gd

> Do the numbers stay the same each time you reproduce the problem?   And are there any changes in the output of dumpe2fs (run diff; it will probably be a very tiny difference).

> Also, what are the devices underlying the LVM?   Are you using a MD device?   Or is the 200T volume spread out across multiple hard drives directly (i.e., no RAID)?

> -- Ted

Hmm, it seems I can't reproduce it :-(
Not under 3.2.0, and not while copying from a RO snapshot of the same LV.

At least I know the steps to take when encountering a potential filesystem bug in the future.

--
Sander







-- 
Best regards,
 Sander                            mailto:linux@eikelenboom.it


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: can't recover ext4 on lvm from ext4_mb_generate_buddy:739: group 1687, 32254 clusters in bitmap, 32258 in gd
       [not found]     ` <4910694144.20120105171428@eikelenboom.it>
@ 2012-01-05 18:15       ` Ted Ts'o
  2012-01-05 20:04           ` Sander Eikelenboom
                           ` (5 more replies)
  0 siblings, 6 replies; 26+ messages in thread
From: Ted Ts'o @ 2012-01-05 18:15 UTC (permalink / raw)
  To: Sander Eikelenboom; +Cc: linux-ext4, linux-kernel, dm-devel

On Thu, Jan 05, 2012 at 05:14:28PM +0100, Sander Eikelenboom wrote:
> 
> OK, spoke too soon; I have been able to trigger it again:
> - copying files from LV to the same LV without the snapshot went OK
> - copying from the RO snapshot of a LV to the same LV gave the error while copying the file again:

OK.  Originally, you said you did this:

1) fsck -v -p -f the filesystem
2) mount the filesystem
3) Try to copy a file
4) filesystem will be mounted RO on error  (see below)
5) fsck again, journal will be recovered, no other errors
6) start at 1)
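
For reference, one round of the cycle listed above can be written down as a dry-run command list. This is only a sketch: nothing is executed, the device and mount point are the ones used in this thread, and the file names passed to cp are hypothetical placeholders.

```python
import shlex


def repro_cycle(device, mountpoint, src, dst):
    """Return the commands for one fsck/mount/copy/fsck round as argv lists."""
    return [
        ["fsck", "-v", "-p", "-f", device],   # step 1: forced check
        ["mount", device, mountpoint],        # step 2
        ["cp", src, dst],                     # step 3: copy that triggers the error
        ["umount", mountpoint],               # after the read-only remount (step 4)
        ["fsck", "-v", "-p", "-f", device],   # step 5: journal gets recovered
    ]


commands = repro_cycle(
    "/dev/serveerstertje/xen_images",
    "/mnt/xen_images",
    "/mnt/xen_images/source.img",   # hypothetical file name
    "/mnt/xen_images/copy.img",     # hypothetical file name
)
for argv in commands:
    print(shlex.join(argv))
```

Keeping the steps as data makes it easy to note, for each round, whether a snapshot existed and which kernel was running.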

Was this with a read-only snapshot always being in existence
through all of these five steps?  When was the RO snapshot created?

If a RO snapshot has to be there in order for this to happen, then
this is almost certainly a device-mapper regression.  (dm-devel folks,
this is a problem which apparently occurred when the user went from
v3.1.5 to v3.2, so this looks like a 3.2 regression.)

						- Ted


> 
> [ 2357.655783] EXT4-fs error (device dm-2): ext4_mb_generate_buddy:739: group 1861, 32254 clusters in bitmap, 32258 in gd
> [ 2357.656056] Aborting journal on device dm-2-8.
> [ 2357.718473] EXT4-fs (dm-2): Remounting filesystem read-only
> [ 2357.736680] EXT4-fs error (device dm-2) in ext4_da_write_end:2532: IO failure
> [ 2357.738328] EXT4-fs (dm-2): ext4_da_writepages: jbd2_start: 7615 pages, ino 4079617; err -30
> [ 2716.125010] EXT4-fs error (device dm-2): ext4_put_super:818: Couldn't clean up the journal
> 
> 
> Attached are 5x output from dumpe2fs
> - dumpe2fs-xen_images-3.2.0                           Made just after boot
> - dumpe2fs-xen_images-3.2.0-afterfsck                 Made after doing a fsck -v -p -f on the unmounted LV
> - dumpe2fs-xen_images-3.2.0-aftererror                Made after the error occurred on the mounted LV
> - dumpe2fs-xen_images-3.2.0-aftererror-afterfsck      Made after the error occurred, and after a subsequent fsck -v -p -f on the unmounted LV
> - dumpe2fs-xen_images-3.1.5                           Made after booting into 3.1.5 after all of the above
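
Comparing two of the dumps listed above comes down to a plain text diff. A minimal sketch using Python's difflib follows; the function name is illustrative and the two toy dump excerpts are invented for the example, not taken from the attachments.

```python
import difflib


def changed_lines(before, after, before_name, after_name):
    """Return only the added/removed lines between two dumpe2fs text dumps."""
    diff = difflib.unified_diff(
        before.splitlines(), after.splitlines(),
        fromfile=before_name, tofile=after_name,
        n=0, lineterm="",
    )
    return [
        line for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]


# Invented excerpts standing in for two real dumps.
dump_afterfsck = "Group 0: toy\n  100 free blocks\nGroup 1: toy\n  200 free blocks\n"
dump_aftererror = "Group 0: toy\n  100 free blocks\nGroup 1: toy\n  196 free blocks\n"

delta = changed_lines(
    dump_afterfsck, dump_aftererror,
    "dumpe2fs-xen_images-3.2.0-afterfsck",
    "dumpe2fs-xen_images-3.2.0-aftererror",
)
print("\n".join(delta))
```

On real dumps the result should be short if only a few group counters changed between the two points in time.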
> 
> Oh yes, I also did a badblocks scan to rule that out, and it seems the numbers stay the same.
> e2fsck 1.41.12 (17-May-2010) (from debian squeeze)
> 
> --
> Sander
> 
> 
> 
> 
> 
> 
> 
> -- 
> Best regards,
>  Sander                            mailto:linux@eikelenboom.it







^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: can't recover ext4 on lvm from ext4_mb_generate_buddy:739: group 1687, 32254 clusters in bitmap, 32258 in gd
  2012-01-05 18:15       ` Ted Ts'o
@ 2012-01-05 20:04           ` Sander Eikelenboom
  2012-01-05 20:45           ` Sander Eikelenboom
                             ` (4 subsequent siblings)
  5 siblings, 0 replies; 26+ messages in thread
From: Sander Eikelenboom @ 2012-01-05 20:04 UTC (permalink / raw)
  To: Ted Ts'o; +Cc: linux-ext4, linux-kernel, dm-devel

Thursday, January 5, 2012, 7:15:35 PM, you wrote:

> On Thu, Jan 05, 2012 at 05:14:28PM +0100, Sander Eikelenboom wrote:
>> 
>> OK, spoke too soon; I have been able to trigger it again:
>> - copying files from LV to the same LV without the snapshot went OK
>> - copying from the RO snapshot of a LV to the same LV gave the error while copying the file again:

> OK.  Originally, you said you did this:

> 1) fsck -v -p -f the filesystem
> 2) mount the filesystem
> 3) Try to copy a file
> 4) filesystem will be mounted RO on error  (see below)
> 5) fsck again, journal will be recovered, no other errors
> 6) start at 1)

> Was this with a read-only snapshot always being in existence
> through all of these five steps?  When was the RO snapshot created?

> If a RO snapshot has to be there in order for this to happen, then
> this is almost certainly a device-mapper regression.  (dm-devel folks,
> this is a problem which apparently occurred when the user went from
> v3.1.5 to v3.2, so this looks like a 3.2 regression.)

>                                                 - Ted


Well, it seems to consist of 2 issues when booted with a 3.2.0 kernel:

1) - It only seems to trigger with a snapshot of the LV present
   - Just tested whether the snapshot being mounted RO really matters; it doesn't.
   - It can also be triggered if the snapshot is mounted RW
   - It can also be triggered when the snapshot is not mounted at all (by just copying some files on the filesystem itself)

   So that seems to be a device-mapper issue

2) BUT:
   after the error triggered by 1:
   - After removing the snapshot with lvremove,
   - umounting the filesystem on the LV
   - fscking the filesystem without errors (apart from the journal recovery)
   - rebooting the machine again with the 3.2.0 kernel
   - mounting the filesystem on the LV
   - removing the partially copied files
   - trying to copy files from the filesystem on the LV to the same filesystem, without a snapshot of the LV present
   - it fails with the exact same error, and the filesystem is remounted RO.

   then
   - umounting the filesystem on the LV
   - fscking the filesystem without errors (apart from the journal recovery)
   - rebooting the machine with a 3.1.5 kernel
   - mounting the filesystem on the LV
   - removing the partially copied files
   - trying to copy files from the filesystem on the LV to the same filesystem, without a snapshot of the LV present
   - no problems, files copied OK

   then
   - rebooting into 3.2.0 again
   - mounting the filesystem on the LV
   - removing the completely copied files
   - trying to copy files from the filesystem on the LV to the same filesystem, without a snapshot of the LV present
   - no problems, files copied OK


   SO
   - it keeps on failing on 3.2.0, even when the snapshot is gone and the system is rebooted; once 3.1.5 has been booted, everything seems to be OK again ... even under 3.2.0
   - that seems more like a filesystem thing?
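
The observations above can be summarised as a small table of trials, which makes it easier to see which conditions line up with the failure. This is only a bookkeeping sketch: the field names are made up, and each row is my reading of one of the test sequences described in this message.

```python
from typing import NamedTuple


class Trial(NamedTuple):
    kernel: str
    snapshot_present: bool
    booted_3_1_5_since_error: bool  # has 3.1.5 run since the last failure?
    failed: bool


TRIALS = [
    Trial("3.2.0", True, False, True),    # issue 1: snapshot present
    Trial("3.2.0", False, False, True),   # issue 2: snapshot removed, still fails
    Trial("3.1.5", False, True, False),   # copy works under 3.1.5
    Trial("3.2.0", False, True, False),   # back on 3.2.0 after 3.1.5: works
]


def failing(trials):
    """Return the trials that ended with the read-only remount."""
    return [t for t in trials if t.failed]


def snapshot_always_present_on_failure(trials):
    """True only if every failing trial had a snapshot of the LV present."""
    return all(t.snapshot_present for t in failing(trials))


print(sorted({t.kernel for t in failing(TRIALS)}))
print(snapshot_always_present_on_failure(TRIALS))
```

Read this way, every failure happened on 3.2.0, but a snapshot was not present in all failing cases, which is why the second issue does not look like a pure device-mapper problem.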


I double-checked and performed all these steps again.

--
Sander

















-- 
Best regards,
 Sander                            mailto:linux@eikelenboom.it


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: can't recover ext4 on lvm from ext4_mb_generate_buddy:739: group 1687, 32254 clusters in bitmap, 32258 in gd
  2012-01-05 18:15       ` Ted Ts'o
@ 2012-01-05 20:45           ` Sander Eikelenboom
  2012-01-05 20:45           ` Sander Eikelenboom
                             ` (4 subsequent siblings)
  5 siblings, 0 replies; 26+ messages in thread
From: Sander Eikelenboom @ 2012-01-05 20:45 UTC (permalink / raw)
  To: Ted Ts'o; +Cc: linux-ext4, linux-kernel, dm-devel


Thursday, January 5, 2012, 7:15:35 PM, you wrote:

> On Thu, Jan 05, 2012 at 05:14:28PM +0100, Sander Eikelenboom wrote:
>> 
>> OK, spoke too soon; I have been able to trigger it again:
>> - copying files from LV to the same LV without the snapshot went OK
>> - copying from the RO snapshot of a LV to the same LV gave the error while copying the file again:

> OK.  Originally, you said you did this:

> 1) fsck -v -p -f the filesystem
> 2) mount the filesystem
> 3) Try to copy a file
> 4) filesystem will be mounted RO on error  (see below)
> 5) fsck again, journal will be recovered, no other errors
> 6) start at 1)

> Was this with a read-only snapshot always being in existence
> through all of these five steps?  When was the RO snapshot created?

> If a RO snapshot has to be there in order for this to happen, then
> this is almost certainly a device-mapper regression.  (dm-devel folks,
> this is a problem which apparently occurred when the user went from
> v3.1.5 to v3.2, so this looks like a 3.2 regression.)

>                                                 - Ted


I also found some older information that might be related: http://answers.softpicks.net/answers/topic/2-6-28-ext4-xen-and-lvm-volume-becomes-ro-after-snapshot-1610734-1.htm
I'm also running under Xen (host only, no guests started); I will try bare metal as well to see whether that plays a role.

--
Sander


>> 
>> [ 2357.655783] EXT4-fs error (device dm-2): ext4_mb_generate_buddy:739: group 1861, 32254 clusters in bitmap, 32258 in gd
>> [ 2357.656056] Aborting journal on device dm-2-8.
>> [ 2357.718473] EXT4-fs (dm-2): Remounting filesystem read-only
>> [ 2357.736680] EXT4-fs error (device dm-2) in ext4_da_write_end:2532: IO failure
>> [ 2357.738328] EXT4-fs (dm-2): ext4_da_writepages: jbd2_start: 7615 pages, ino 4079617; err -30
>> [ 2716.125010] EXT4-fs error (device dm-2): ext4_put_super:818: Couldn't clean up the journal
>> 
>> 
>> Attached are the outputs from dumpe2fs:
>> - dumpe2fs-xen_images-3.2.0                           Made just after boot
>> - dumpe2fs-xen_images-3.2.0-afterfsck                 Made after doing a fsck -v -p -f on the unmounted LV
>> - dumpe2fs-xen_images-3.2.0-aftererror                Made after the error occurred on the mounted LV
>> - dumpe2fs-xen_images-3.2.0-aftererror-afterfsck      Made after the error occurred, and after a subsequent fsck -v -p -f on the unmounted LV
>> - dumpe2fs-xen_images-3.1.5                           Made after booting into 3.1.5 after all of the above
>> 
>> Oh yes, I also did a badblocks scan to rule that out, and it seems the numbers stay the same.
>> e2fsck 1.41.12 (17-May-2010) (from Debian Squeeze)
>> 
>> --
>> Sander
>> 
>> 
>> 
>> >> 
>> >> --
>> >> Sander
>> >> 
>> >> 
>> >> This is a forwarded message
>> >> From: Sander Eikelenboom <linux@eikelenboom.it>
>> >> To: "Theodore Ts'o" <tytso@mit.edu>
>> >> Date: Thursday, January 5, 2012, 11:37:59 AM
>> >> Subject: can't recover ext4 on lvm from ext4_mb_generate_buddy:739: group 1687, 32254 clusters in bitmap, 32258 in gd
>> >> 
>> >> ===8<==============Original message text===============
>> >> 
>> >> I'm having some trouble with an ext4 filesystem on LVM; it seems bricked, and fsck doesn't seem to find and correct the problem.
>> >> 
>> >> Steps:
>> >> 1) fsck -v -p -f the filesystem
>> >> 2) mount the filesystem
>> >> 3) Try to copy a file
>> >> 4) filesystem will be mounted RO on error  (see below)
>> >> 5) fsck again, journal will be recovered, no other errors
>> >> 6) start at 1)
>> >> 
>> >> 
>> >> I think the way I bricked it is:
>> >> - make an LVM snapshot of that LVM logical volume
>> >> - mount that LVM snapshot read-only
>> >> - try to copy a file from that mounted RO snapshot to a different directory on the logical volume the snapshot was taken from
>> >> - it fails and I can't recover (see above)
>> >> 
>> >> 
>> >> Is there a way to recover from this ?
>> >> 
>> >> 
>> >> 
>> >> [  220.748928] EXT4-fs error (device dm-2): ext4_mb_generate_buddy:739: group 1687, 32254 clusters in bitmap, 32258 in gd
>> >> [  220.749415] Aborting journal on device dm-2-8.
>> >> [  220.771633] EXT4-fs error (device dm-2): ext4_journal_start_sb:327: Detected aborted journal
>> >> [  220.772593] EXT4-fs (dm-2): Remounting filesystem read-only
>> >> [  220.792455] EXT4-fs (dm-2): Remounting filesystem read-only
>> >> [  220.805118] EXT4-fs (dm-2): ext4_da_writepages: jbd2_start: 9680 pages, ino 4079617; err -30
>> >> serveerstertje:/mnt/xen_images/domains/production# cd /
>> >> serveerstertje:/# umount /mnt/xen_images/
>> >> serveerstertje:/# fsck -f -v -p /dev/serveerstertje/xen_images
>> >> fsck from util-linux-ng 2.17.2
>> >> /dev/mapper/serveerstertje-xen_images: recovering journal
>> >> 
>> >>     277 inodes used (0.00%)
>> >>       5 non-contiguous files (1.8%)
>> >>       0 non-contiguous directories (0.0%)
>> >>         # of inodes with ind/dind/tind blocks: 41/41/3
>> >>         Extent depth histogram: 69/28/2
>> >> 51890920 blocks used (79.18%)
>> >>       0 bad blocks
>> >>      41 large files
>> >> 
>> >>     199 regular files
>> >>      53 directories
>> >>       0 character device files
>> >>       0 block device files
>> >>       0 fifos
>> >>       0 links
>> >>      16 symbolic links (16 fast symbolic links)
>> >>       0 sockets
>> >> --------
>> >>     268 files
>> >> serveerstertje:/#
>> >> 
>> >> 
>> >> 
>> >> 
>> >> System:
>> >> - Kernel 3.2.0
>> >> - Debian Squeeze with:
>> >> ii  e2fslibs                              1.41.12-4stable1                     ext2/ext3/ext4 file system libraries
>> >> ii  e2fsprogs                             1.41.12-4stable1                     ext2/ext3/ext4 file system utilities
>> >> 
>> >> ===8<===========End of original message text===========
>> >> 
>> >> 
>> >> 
>> >> -- 
>> >> Best regards,
>> >> Sander                            mailto:linux@eikelenboom.it<Message01.eml>
>> 
>> 
>> 
>> 
>> -- 
>> Best regards,
>>  Sander                            mailto:linux@eikelenboom.it









-- 
Best regards,
 Sander                            mailto:linux@eikelenboom.it


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: can't recover ext4 on lvm from ext4_mb_generate_buddy:739: group 1687, 32254 clusters in bitmap, 32258 in gd
  2012-01-05 18:15       ` Ted Ts'o
@ 2012-01-05 21:31           ` Sander Eikelenboom
  2012-01-05 20:45           ` Sander Eikelenboom
                             ` (4 subsequent siblings)
  5 siblings, 0 replies; 26+ messages in thread
From: Sander Eikelenboom @ 2012-01-05 21:31 UTC (permalink / raw)
  To: Ted Ts'o; +Cc: linux-ext4, linux-kernel, dm-devel

Hello Ted,

Thursday, January 5, 2012, 7:15:35 PM, you wrote:

> On Thu, Jan 05, 2012 at 05:14:28PM +0100, Sander Eikelenboom wrote:
>> 
>> OK, spoke too soon; I have been able to trigger it again:
>> - copying files from the LV to the same LV without the snapshot went OK
>> - copying from the RO snapshot of an LV to the same LV gave the error while copying the file again:

> OK.  Originally, you said you did this:

> 1) fsck -v -p -f the filesystem
> 2) mount the filesystem
> 3) Try to copy a file
> 4) filesystem will be mounted RO on error  (see below)
> 5) fsck again, journal will be recovered, no other errors
> 6) start at 1)

> Was this with a read-only snapshot always being in existence
> through all of these five steps?  When was the RO snapshot created?

> If a RO snapshot has to be there in order for this to happen, then
> this is almost certainly a device-mapper regression.  (dm-devel folks,
> this is a problem which apparently occurred when the user went from
> v3.1.5 to v3.2, so this looks like a 3.2 regression.)

>                                                 - Ted


OK, Xen is out of the equation: it also happens on bare metal.
The last time, under both Xen and bare metal, I got a slightly different error (a different group number):

[  823.782633] EXT4-fs error (device dm-2): ext4_mb_generate_buddy:739: group 1865, 32254 clusters in bitmap, 32258 in gd
[  823.788129] Aborting journal on device dm-2-8.
[  823.852443] EXT4-fs (dm-2): Remounting filesystem read-only
[  823.857956] EXT4-fs error (device dm-2) in ext4_da_write_end:2532: IO failure
[  823.858646] EXT4-fs (dm-2): ext4_da_writepages: jbd2_start: 12288 pages, ino 4079617; err -30
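
For anyone comparing these reports: the group number changes (1687, 1861, 1865), but the two counts are the same each time. A minimal Python sketch for pulling the numbers out of such a line, assuming the message wording stays exactly as in the logs in this thread (parse_buddy_error is a made-up helper name, not part of any tool):

```python
import re

# Pattern for the ext4_mb_generate_buddy message as it appears in this thread.
# Assumption: the wording is exactly "group N, X clusters in bitmap, Y in gd".
BUDDY_RE = re.compile(
    r"ext4_mb_generate_buddy:\d+: group (\d+), "
    r"(\d+) clusters in bitmap, (\d+) in gd"
)


def parse_buddy_error(line):
    """Return (group, bitmap_count, gd_count, gd_count - bitmap_count), or None."""
    match = BUDDY_RE.search(line)
    if match is None:
        return None
    group, bitmap_count, gd_count = (int(value) for value in match.groups())
    return group, bitmap_count, gd_count, gd_count - bitmap_count


# The log line from this message, split only to keep the source line short.
line = (
    "[  823.782633] EXT4-fs error (device dm-2): ext4_mb_generate_buddy:739: "
    "group 1865, 32254 clusters in bitmap, 32258 in gd"
)
print(parse_buddy_error(line))
```

For the line above this gives a difference of 4 clusters between the group descriptor and the bitmap, the same as in the two earlier reports.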



>> 
>> [ 2357.655783] EXT4-fs error (device dm-2): ext4_mb_generate_buddy:739: group 1861, 32254 clusters in bitmap, 32258 in gd
>> [ 2357.656056] Aborting journal on device dm-2-8.
>> [ 2357.718473] EXT4-fs (dm-2): Remounting filesystem read-only
>> [ 2357.736680] EXT4-fs error (device dm-2) in ext4_da_write_end:2532: IO failure
>> [ 2357.738328] EXT4-fs (dm-2): ext4_da_writepages: jbd2_start: 7615 pages, ino 4079617; err -30
>> [ 2716.125010] EXT4-fs error (device dm-2): ext4_put_super:818: Couldn't clean up the journal
>> 
>> 
>> Attached are the outputs from dumpe2fs:
>> - dumpe2fs-xen_images-3.2.0                           Made just after boot
>> - dumpe2fs-xen_images-3.2.0-afterfsck                 Made after doing a fsck -v -p -f on the unmounted LV
>> - dumpe2fs-xen_images-3.2.0-aftererror                Made after the error occurred on the mounted LV
>> - dumpe2fs-xen_images-3.2.0-aftererror-afterfsck      Made after the error occurred, and after a subsequent fsck -v -p -f on the unmounted LV
>> - dumpe2fs-xen_images-3.1.5                           Made after booting into 3.1.5 after all of the above
>> 
>> Oh yes, I also did a badblocks scan to rule that out, and it seems the numbers stay the same.
>> e2fsck 1.41.12 (17-May-2010) (from Debian Squeeze)
>> 
>> --
>> Sander
>> 
>> 
>> 
>> >> 
>> >> --
>> >> Sander
>> >> 
>> >> 
>> >> This is a forwarded message
>> >> From: Sander Eikelenboom <linux@eikelenboom.it>
>> >> To: "Theodore Ts'o" <tytso@mit.edu>
>> >> Date: Thursday, January 5, 2012, 11:37:59 AM
>> >> Subject: can't recover ext4 on lvm from ext4_mb_generate_buddy:739: group 1687, 32254 clusters in bitmap, 32258 in gd
>> >> 
>> >> ===8<==============Original message text===============
>> >> 
>> >> I'm having some trouble with an ext4 filesystem on LVM; it seems bricked, and fsck doesn't seem to find and correct the problem.
>> >> 
>> >> Steps:
>> >> 1) fsck -v -p -f the filesystem
>> >> 2) mount the filesystem
>> >> 3) Try to copy a file
>> >> 4) filesystem will be mounted RO on error  (see below)
>> >> 5) fsck again, journal will be recovered, no other errors
>> >> 6) start at 1)
>> >> 
>> >> 
>> >> I think the way I bricked it is:
>> >> - make an LVM snapshot of that LVM logical volume
>> >> - mount that LVM snapshot read-only
>> >> - try to copy a file from that mounted RO snapshot to a different directory on the logical volume the snapshot was taken from
>> >> - it fails and I can't recover (see above)
>> >> 
>> >> 
>> >> Is there a way to recover from this ?
>> >> 
>> >> 
>> >> 
>> >> [  220.748928] EXT4-fs error (device dm-2): ext4_mb_generate_buddy:739: group 1687, 32254 clusters in bitmap, 32258 in gd
>> >> [  220.749415] Aborting journal on device dm-2-8.
>> >> [  220.771633] EXT4-fs error (device dm-2): ext4_journal_start_sb:327: Detected aborted journal
>> >> [  220.772593] EXT4-fs (dm-2): Remounting filesystem read-only
>> >> [  220.792455] EXT4-fs (dm-2): Remounting filesystem read-only
>> >> [  220.805118] EXT4-fs (dm-2): ext4_da_writepages: jbd2_start: 9680 pages, ino 4079617; err -30
>> >> serveerstertje:/mnt/xen_images/domains/production# cd /
>> >> serveerstertje:/# umount /mnt/xen_images/
>> >> serveerstertje:/# fsck -f -v -p /dev/serveerstertje/xen_images
>> >> fsck from util-linux-ng 2.17.2
>> >> /dev/mapper/serveerstertje-xen_images: recovering journal
>> >> 
>> >>     277 inodes used (0.00%)
>> >>       5 non-contiguous files (1.8%)
>> >>       0 non-contiguous directories (0.0%)
>> >>         # of inodes with ind/dind/tind blocks: 41/41/3
>> >>         Extent depth histogram: 69/28/2
>> >> 51890920 blocks used (79.18%)
>> >>       0 bad blocks
>> >>      41 large files
>> >> 
>> >>     199 regular files
>> >>      53 directories
>> >>       0 character device files
>> >>       0 block device files
>> >>       0 fifos
>> >>       0 links
>> >>      16 symbolic links (16 fast symbolic links)
>> >>       0 sockets
>> >> --------
>> >>     268 files
>> >> serveerstertje:/#
>> >> 
>> >> 
>> >> 
>> >> 
>> >> System:
>> >> - Kernel 3.2.0
>> >> - Debian Squeeze with:
>> >> ii  e2fslibs                              1.41.12-4stable1                     ext2/ext3/ext4 file system libraries
>> >> ii  e2fsprogs                             1.41.12-4stable1                     ext2/ext3/ext4 file system utilities
>> >> 
>> >> ===8<===========End of original message text===========
>> >> 
>> >> 
>> >> 
>> >> -- 
>> >> Best regards,
>> >> Sander                            mailto:linux@eikelenboom.it<Message01.eml>
>> 
>> 
>> 
>> 
>> -- 
>> Best regards,
>>  Sander                            mailto:linux@eikelenboom.it









-- 
Best regards,
 Sander                            mailto:linux@eikelenboom.it


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: can't recover ext4 on lvm from ext4_mb_generate_buddy:739: group 1687, 32254 clusters in bitmap, 32258 in gd
  2012-01-05 18:15       ` Ted Ts'o
@ 2012-01-05 22:43           ` Sander Eikelenboom
  2012-01-05 20:45           ` Sander Eikelenboom
                             ` (4 subsequent siblings)
  5 siblings, 0 replies; 26+ messages in thread
From: Sander Eikelenboom @ 2012-01-05 22:43 UTC (permalink / raw)
  To: Ted Ts'o; +Cc: linux-ext4, linux-kernel, dm-devel

Hello Ted,

Thursday, January 5, 2012, 7:15:35 PM, you wrote:

> On Thu, Jan 05, 2012 at 05:14:28PM +0100, Sander Eikelenboom wrote:
>> 
>> OK, spoke too soon; I have been able to trigger it again:
>> - copying files from the LV to the same LV without the snapshot went OK
>> - copying from the RO snapshot of an LV to the same LV gave the error while copying the file again:

> OK.  Originally, you said you did this:

> 1) fsck -v -p -f the filesystem
> 2) mount the filesystem
> 3) Try to copy a file
> 4) filesystem will be mounted RO on error  (see below)
> 5) fsck again, journal will be recovered, no other errors
> 6) start at 1)

> Was this with a read-only snapshot always being in existence
> through all of these five steps?  When was the RO snapshot created?

> If a RO snapshot has to be there in order for this to happen, then
> this is almost certainly a device-mapper regression.  (dm-devel folks,
> this is a problem which apparently occurred when the user went from
> v3.1.5 to v3.2, so this looks like a 3.2 regression.)

>                                                 - Ted

I tried to bisect, but every kernel in between seems to have broken drivers for some devices, so it doesn't even boot.
That was quite a frustrating and disappointing experience.
So it's back to 3.1.5 to continue with what I was actually trying to do; I'll try later to see whether it's still reproducible with another disk layout.

Thanks for your effort so far.

--
Sander

>> 
>> [ 2357.655783] EXT4-fs error (device dm-2): ext4_mb_generate_buddy:739: group 1861, 32254 clusters in bitmap, 32258 in gd
>> [ 2357.656056] Aborting journal on device dm-2-8.
>> [ 2357.718473] EXT4-fs (dm-2): Remounting filesystem read-only
>> [ 2357.736680] EXT4-fs error (device dm-2) in ext4_da_write_end:2532: IO failure
>> [ 2357.738328] EXT4-fs (dm-2): ext4_da_writepages: jbd2_start: 7615 pages, ino 4079617; err -30
>> [ 2716.125010] EXT4-fs error (device dm-2): ext4_put_super:818: Couldn't clean up the journal
>> 
>> 
>> Attached are the outputs from dumpe2fs:
>> - dumpe2fs-xen_images-3.2.0                           Made just after boot
>> - dumpe2fs-xen_images-3.2.0-afterfsck                 Made after doing a fsck -v -p -f on the unmounted LV
>> - dumpe2fs-xen_images-3.2.0-aftererror                Made after the error occurred on the mounted LV
>> - dumpe2fs-xen_images-3.2.0-aftererror-afterfsck      Made after the error occurred, and after a subsequent fsck -v -p -f on the unmounted LV
>> - dumpe2fs-xen_images-3.1.5                           Made after booting into 3.1.5 after all of the above
>> 
>> Oh yes, I also did a badblocks scan to rule that out, and it seems the numbers stay the same.
>> e2fsck 1.41.12 (17-May-2010) (from Debian Squeeze)
>> 
>> --
>> Sander
>> 
>> 
>> 
>> >> 
>> >> --
>> >> Sander
>> >> 
>> >> 
>> >> This is a forwarded message
>> >> From: Sander Eikelenboom <linux@eikelenboom.it>
>> >> To: "Theodore Ts'o" <tytso@mit.edu>
>> >> Date: Thursday, January 5, 2012, 11:37:59 AM
>> >> Subject: can't recover ext4 on lvm from ext4_mb_generate_buddy:739: group 1687, 32254 clusters in bitmap, 32258 in gd
>> >> 
>> >> ===8<==============Original message text===============
>> >> 
>> >> I'm having some trouble with an ext4 filesystem on LVM; it seems bricked, and fsck doesn't seem to find and correct the problem.
>> >> 
>> >> Steps:
>> >> 1) fsck -v -p -f the filesystem
>> >> 2) mount the filesystem
>> >> 3) Try to copy a file
>> >> 4) filesystem will be mounted RO on error  (see below)
>> >> 5) fsck again, journal will be recovered, no other errors
>> >> 6) start at 1)
>> >> 
>> >> 
>> >> I think the way I bricked it is:
>> >> - make an LVM snapshot of that LVM logical volume
>> >> - mount that LVM snapshot read-only
>> >> - try to copy a file from that mounted RO snapshot to a different directory on the logical volume the snapshot was taken from
>> >> - it fails and I can't recover (see above)
>> >> 
>> >> 
>> >> Is there a way to recover from this ?
>> >> 
>> >> 
>> >> 
>> >> [  220.748928] EXT4-fs error (device dm-2): ext4_mb_generate_buddy:739: group 1687, 32254 clusters in bitmap, 32258 in gd
>> >> [  220.749415] Aborting journal on device dm-2-8.
>> >> [  220.771633] EXT4-fs error (device dm-2): ext4_journal_start_sb:327: Detected aborted journal
>> >> [  220.772593] EXT4-fs (dm-2): Remounting filesystem read-only
>> >> [  220.792455] EXT4-fs (dm-2): Remounting filesystem read-only
>> >> [  220.805118] EXT4-fs (dm-2): ext4_da_writepages: jbd2_start: 9680 pages, ino 4079617; err -30
>> >> serveerstertje:/mnt/xen_images/domains/production# cd /
>> >> serveerstertje:/# umount /mnt/xen_images/
>> >> serveerstertje:/# fsck -f -v -p /dev/serveerstertje/xen_images
>> >> fsck from util-linux-ng 2.17.2
>> >> /dev/mapper/serveerstertje-xen_images: recovering journal
>> >> 
>> >>     277 inodes used (0.00%)
>> >>       5 non-contiguous files (1.8%)
>> >>       0 non-contiguous directories (0.0%)
>> >>         # of inodes with ind/dind/tind blocks: 41/41/3
>> >>         Extent depth histogram: 69/28/2
>> >> 51890920 blocks used (79.18%)
>> >>       0 bad blocks
>> >>      41 large files
>> >> 
>> >>     199 regular files
>> >>      53 directories
>> >>       0 character device files
>> >>       0 block device files
>> >>       0 fifos
>> >>       0 links
>> >>      16 symbolic links (16 fast symbolic links)
>> >>       0 sockets
>> >> --------
>> >>     268 files
>> >> serveerstertje:/#
>> >> 
>> >> 
>> >> 
>> >> 
>> >> System:
>> >> - Kernel 3.2.0
>> >> - Debian Squeeze with:
>> >> ii  e2fslibs                              1.41.12-4stable1                     ext2/ext3/ext4 file system libraries
>> >> ii  e2fsprogs                             1.41.12-4stable1                     ext2/ext3/ext4 file system utilities
>> >> 
>> >> ===8<===========End of original message text===========
>> >> 
>> >> 
>> >> 
>> >> -- 
>> >> Best regards,
>> >> Sander                            mailto:linux@eikelenboom.it<Message01.eml>
>> 
>> 
>> 
>> 
>> -- 
>> Best regards,
>>  Sander                            mailto:linux@eikelenboom.it









-- 
Best regards,
 Sander                            mailto:linux@eikelenboom.it


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: can't recover ext4 on lvm from ext4_mb_generate_buddy:739: group 1687, 32254 clusters in bitmap, 32258 in gd
@ 2012-01-05 22:43           ` Sander Eikelenboom
  0 siblings, 0 replies; 26+ messages in thread
From: Sander Eikelenboom @ 2012-01-05 22:43 UTC (permalink / raw)
  To: Ted Ts'o; +Cc: linux-ext4, linux-kernel, dm-devel

Hello Ted,

Thursday, January 5, 2012, 7:15:35 PM, you wrote:

> On Thu, Jan 05, 2012 at 05:14:28PM +0100, Sander Eikelenboom wrote:
>> 
>> OK spoke too soon, i have been able to trigger it again:
>> - copying files from LV to the same LV without the snapshot went OK
>> - copying from the RO snapshot of a LV to the same LV gave the error while copying the file again:

> OK.  Originally, you said you did this:

> 1) fsck -v -p -f the filesystem
> 2) mount the filesystem
> 3) Try to copy a file
> 4) filesystem will be mounted RO on error  (see below)
> 5) fsck again, journal will be recovered, no other errors
> 6) start at 1)

> Was this with with a read-only snapshot always being in existence
> through all of these five steps?  When was the RO snapshot created?

> If a RO snapshot has to be there in order for this to happen, then
> this is almost certainly a device-mapper regression.  (dm-devel folks,
> this is a problem which apparently occurred when the user went from
> v3.1.5 to v3.2, so this looks likes 3.2 regression.)

>                                                 - Ted

I tried to bisect, but every kernel in between seems to have the drivers for some devices f*cked up, so it doesn't even boot.
That was quite a frustrating and disappointing experience.
So it's back to 3.1.5 to continue with what I was actually trying to do; I'll try later whether it's still reproducible with another disk layout.

Thx for your effort so far.

--
Sander

>> 
>> [ 2357.655783] EXT4-fs error (device dm-2): ext4_mb_generate_buddy:739: group 1861, 32254 clusters in bitmap, 32258 in gd
>> [ 2357.656056] Aborting journal on device dm-2-8.
>> [ 2357.718473] EXT4-fs (dm-2): Remounting filesystem read-only
>> [ 2357.736680] EXT4-fs error (device dm-2) in ext4_da_write_end:2532: IO failure
>> [ 2357.738328] EXT4-fs (dm-2): ext4_da_writepages: jbd2_start: 7615 pages, ino 4079617; err -30
>> [ 2716.125010] EXT4-fs error (device dm-2): ext4_put_super:818: Couldn't clean up the journal
>> 
>> 
>> Attached are 4x output from dumpe2fs
>> - dumpe2fs-xen_images-3.2.0                           Made just after boot
>> - dumpe2fs-xen_images-3.2.0-afterfsck                 Made after doing a fsck -v -p -f on the unmounted LV
>> - dumpe2fs-xen_images-3.2.0-aftererror                Made after the error occured on the mounted LV
>> - dumpe2fs-xen_images-3.2.0-aftererror-afterfsck      Made after the error occured, and after a subsequent fsck -v -p -f on the unmounted LV
>> - dumpe2fs-xen_images-3.1.5                           Made after booting into 3.1.5 after all of the above
>> 
>> Oh yes also did a badblock scan to rule that out, and it seems the numbers stay the same.
>> e2fsck 1.41.12 (17-May-2010) (from debian squeeze)
>> 
>> --
>> Sander
>> 
>> 
>> 
>> >> 
>> >> --
>> >> Sander
>> >> 
>> >> 
>> >> This is a forwarded message
>> >> From: Sander Eikelenboom <linux@eikelenboom.it>
>> >> To: "Theodore Ts'o" <tytso@mit.edu>
>> >> Date: Thursday, January 5, 2012, 11:37:59 AM
>> >> Subject: can't recover ext4 on lvm from ext4_mb_generate_buddy:739: group 1687, 32254 clusters in bitmap, 32258 in gd
>> >> 
>> >> ===8<==============Original message text===============
>> >> 
>> >> I'm having some troubles with a ext4 filesystem on LVM, it seems bricked and fsck doesn't seem to find and correct the problem.
>> >> 
>> >> Steps:
>> >> 1) fsck -v -p -f the filesystem
>> >> 2) mount the filesystem
>> >> 3) Try to copy a file
>> >> 4) filesystem will be mounted RO on error  (see below)
>> >> 5) fsck again, journal will be recovered, no other errors
>> >> 6) start at 1)
>> >> 
>> >> 
>> >> I think the way i bricked it is:
>> >> - make a lvm snapshot from that lvm logical disk
>> >> - mount that lvm snapshot as RO
>> >> - try to copy a file from that mounted RO snapshot to a diffrent dir on the lvm logical disk the snapshot is from.
>> >> - it fails and i can't recover (see above)
>> >> 
>> >> 
>> >> Is there a way to recover from this ?
>> >> 
>> >> 
>> >> 
>> >> [  220.748928] EXT4-fs error (device dm-2): ext4_mb_generate_buddy:739: group 1687, 32254 clusters in bitmap, 32258 in gd
>> >> [  220.749415] Aborting journal on device dm-2-8.
>> >> [  220.771633] EXT4-fs error (device dm-2): ext4_journal_start_sb:327: Detected aborted journal
>> >> [  220.772593] EXT4-fs (dm-2): Remounting filesystem read-only
>> >> [  220.792455] EXT4-fs (dm-2): Remounting filesystem read-only
>> >> [  220.805118] EXT4-fs (dm-2): ext4_da_writepages: jbd2_start: 9680 pages, ino 4079617; err -30
>> >> serveerstertje:/mnt/xen_images/domains/production# cd /
>> >> serveerstertje:/# umount /mnt/xen_images/
>> >> serveerstertje:/# fsck -f -v -p /dev/serveerstertje/xen_images
>> >> fsck from util-linux-ng 2.17.2
>> >> /dev/mapper/serveerstertje-xen_images: recovering journal
>> >> 
>> >>     277 inodes used (0.00%)
>> >>       5 non-contiguous files (1.8%)
>> >>       0 non-contiguous directories (0.0%)
>> >>         # of inodes with ind/dind/tind blocks: 41/41/3
>> >>         Extent depth histogram: 69/28/2
>> >> 51890920 blocks used (79.18%)
>> >>       0 bad blocks
>> >>      41 large files
>> >> 
>> >>     199 regular files
>> >>      53 directories
>> >>       0 character device files
>> >>       0 block device files
>> >>       0 fifos
>> >>       0 links
>> >>      16 symbolic links (16 fast symbolic links)
>> >>       0 sockets
>> >> --------
>> >>     268 files
>> >> serveerstertje:/#
>> >> 
>> >> 
>> >> 
>> >> 
>> >> System:
>> >> - Kernel 3.2.0
>> >> - Debian Squeeze with:
>> >> ii  e2fslibs                              1.41.12-4stable1                     ext2/ext3/ext4 file system libraries
>> >> ii  e2fsprogs                             1.41.12-4stable1                     ext2/ext3/ext4 file system utilities
>> >> 
>> >> ===8<===========End of original message text===========
>> >> 
>> >> 
>> >> 
>> >> -- 
>> >> Best regards,
>> >> Sander                            mailto:linux@eikelenboom.it<Message01.eml>
>> 
>> 
>> 
>> 
>> -- 
>> Best regards,
>>  Sander                            mailto:linux@eikelenboom.it









-- 
Best regards,
 Sander                            mailto:linux@eikelenboom.it


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [dm-devel] can't recover ext4 on lvm from ext4_mb_generate_buddy:739: group 1687, 32254 clusters in bitmap, 32258 in gd
  2012-01-05 18:15       ` Ted Ts'o
                           ` (3 preceding siblings ...)
  2012-01-05 22:43           ` Sander Eikelenboom
@ 2012-01-06 16:40         ` Mikulas Patocka
  2012-01-28  4:53           ` WIMPy
  2012-04-12  6:45           ` Landry Minoza
  5 siblings, 1 reply; 26+ messages in thread
From: Mikulas Patocka @ 2012-01-06 16:40 UTC (permalink / raw)
  To: device-mapper development; +Cc: Sander Eikelenboom, linux-ext4, linux-kernel



On Thu, 5 Jan 2012, Ted Ts'o wrote:

> On Thu, Jan 05, 2012 at 05:14:28PM +0100, Sander Eikelenboom wrote:
> > 
> > OK spoke too soon, i have been able to trigger it again:
> > - copying files from LV to the same LV without the snapshot went OK
> > - copying from the RO snapshot of a LV to the same LV gave the error while copying the file again:
> 
> OK.  Originally, you said you did this:
> 
> 1) fsck -v -p -f the filesystem
> 2) mount the filesystem
> 3) Try to copy a file
> 4) filesystem will be mounted RO on error  (see below)
> 5) fsck again, journal will be recovered, no other errors
> 6) start at 1)
> 
> Was this with with a read-only snapshot always being in existence
> through all of these five steps?  When was the RO snapshot created?
> 
> If a RO snapshot has to be there in order for this to happen, then
> this is almost certainly a device-mapper regression.  (dm-devel folks,

The existence of a snapshot changes I/O completion times significantly, so 
it may be a race condition in ext4 that gets triggered which changed 
timings.

Mikulas

> this is a problem which apparently occurred when the user went from
> v3.1.5 to v3.2, so this looks likes 3.2 regression.)
> 
> 						- Ted

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [dm-devel] can't recover ext4 on lvm from ext4_mb_generate_buddy:739: group 1687, 32254 clusters in bitmap, 32258 in gd
  2012-01-06 16:40         ` [dm-devel] " Mikulas Patocka
@ 2012-01-28  4:53           ` WIMPy
  2012-01-28  8:14             ` WIMPy
  0 siblings, 1 reply; 26+ messages in thread
From: WIMPy @ 2012-01-28  4:53 UTC (permalink / raw)
  To: linux-ext4

Mikulas Patocka <mpatocka <at> redhat.com> writes:

> The existence of a snapshot changes I/O completion times significantly, so 
> it may be a race condition in ext4 that gets triggered which changed 
> timings.

The idea that timing might cause issues on a FS is disturbing.

> > this is a problem which apparently occurred when the user went from
> > v3.1.5 to v3.2, so this looks likes 3.2 regression.)

I am on 3.2.0 as well.

It happened for me on a freshly created FS, made with
"mke2fs -j -O sparse_super -O dir_index -O extents -O filetype -O uninit_bg"
and mounted with no additional options. The first time, I got an
"EXT4-fs error (device md127): ext4_mb_generate_buddy:739: group 28671, 32765 
clusters in bitmap, 32766 in gd"
after writing about 3TB of data.
I do not have RO snapshots like the OP, but my md sits on top of luks containers. 
So we do have the device mapper in common.
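
For anyone comparing the reports in this thread, here is a minimal sketch that
pulls the group number and the two counts out of such a message. As far as I
understand it, the two numbers are the free-cluster count derived from the
block bitmap and the one recorded in the group descriptor; treat that reading
as an assumption. The function name parse_buddy_error and the "delta" field
are made up for illustration.

```python
import re

# Pattern for the ext4_mb_generate_buddy message as quoted in this thread.
BUDDY_ERROR = re.compile(
    r"EXT4-fs error \(device (?P<device>[^)]+)\): "
    r"ext4_mb_generate_buddy:(?P<line>\d+): "
    r"group (?P<group>\d+), "
    r"(?P<bitmap>\d+) clusters in bitmap, "
    r"(?P<gd>\d+) in gd"
)


def parse_buddy_error(text):
    """Return details of the first ext4_mb_generate_buddy error in text, or None."""
    # Mail clients wrap long log lines, so collapse whitespace before matching.
    flat = " ".join(text.split())
    match = BUDDY_ERROR.search(flat)
    if match is None:
        return None
    info = {
        "device": match["device"],
        "group": int(match["group"]),
        "bitmap": int(match["bitmap"]),
        "gd": int(match["gd"]),
    }
    # Hypothetical helper field: how far the two counts disagree.
    info["delta"] = info["gd"] - info["bitmap"]
    return info


report = parse_buddy_error(
    "EXT4-fs error (device md127): ext4_mb_generate_buddy:739: group 28671, 32765 \n"
    "clusters in bitmap, 32766 in gd"
)
print(report)
```

The same function also accepts a line with a dmesg timestamp in front, since
it searches rather than matching from the start of the string.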

Just for the record: unlike the contents, the hardware is not new and did not 
have any known issues.

  Greetings,
    WIMPy

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [dm-devel] can't recover ext4 on lvm from ext4_mb_generate_buddy:739: group 1687, 32254 clusters in bitmap, 32258 in gd
  2012-01-28  4:53           ` WIMPy
@ 2012-01-28  8:14             ` WIMPy
  2012-01-28  8:34               ` Andreas Dilger
  0 siblings, 1 reply; 26+ messages in thread
From: WIMPy @ 2012-01-28  8:14 UTC (permalink / raw)
  To: linux-ext4

Update:

> > > this is a problem which apparently occurred when the user went from
> > > v3.1.5 to v3.2, so this looks likes 3.2 regression.)
> 
> I am on 3.2.0 as well.

I didn't spot anything obvious in the logs.
 
> It happened for me on a freshly created FS.
> "mke2fs -j -O sparse_super -O dir_index -O extents -O filetype -O uninit_bg"
> mounted with no additional options for the first time I got an
> "EXT4-fs error (device md127): ext4_mb_generate_buddy:739: group 28671, 32765 
> clusters in bitmap, 32766 in gd"
> after writing about 3TB of data.
> I do not have RO snapshots as the OP, but my md sits on to of luks containers.
> So we do have the device mapper in common.

After I did an fsck and tried to continue, I didn't get that far.
After another 200GB or so it happened again.
And now it's reproducible:
I can run fsck and then try to continue (using rsync). But as soon as writing 
starts, the process hangs for a long time. At least one minute, probably longer.
Then the ext4_mb_generate_buddy comes again.

I upgraded e2fsprogs from 1.41.14 to 1.42 and the kernel to 3.2.2.
No difference.
That FS is unusable.

> Just for the records: Unlike the contents, the hardware is not new and did not
> have any known issues.
> 
>   Greetings,
>     WIMPy
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo <at> vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 





^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [dm-devel] can't recover ext4 on lvm from ext4_mb_generate_buddy:739: group 1687, 32254 clusters in bitmap, 32258 in gd
  2012-01-28  8:14             ` WIMPy
@ 2012-01-28  8:34               ` Andreas Dilger
  2012-01-28 15:31                 ` WIMPy
  0 siblings, 1 reply; 26+ messages in thread
From: Andreas Dilger @ 2012-01-28  8:34 UTC (permalink / raw)
  To: WIMPy; +Cc: linux-ext4

Could you please try to bisect the problem, if it is reproducible?

I was looking for a change which I thought might be responsible (removal of block bitmap initialization when inodes are first allocated from an uninitialized inode table) but I couldn't see it in the git log, so maybe that change has not landed yet.

I don't have any other ideas of which recent patches might be responsible at this point. 

Cheers, Andreas

On 2012-01-28, at 1:14, WIMPy <WIMPy@yeti.dk> wrote:

> Update:
> 
>>>>> this is a problem which apparently occurred when the user went from
>>>> v3.1.5 to v3.2, so this looks likes 3.2 regression.)
>> 
>> I am on 3.2.0 as well.
> 
> I didn't spot anything obvious in the logs.
> 
>> It happened for me on a freshly created FS.
>> "mke2fs -j -O sparse_super -O dir_index -O extents -O filetype -O uninit_bg"
>> mounted with no additional options for the first time I got an
>> "EXT4-fs error (device md127): ext4_mb_generate_buddy:739: group 28671, 32765 
>> clusters in bitmap, 32766 in gd"
>> after writing about 3TB of data.
>> I do not have RO snapshots as the OP, but my md sits on to of luks 
> containers. 
>> So we do have the device mapper in common.
> 
> After I did an fsck and tried to continue, I didn't get that far.
> After another 200GB or so it happened again.
> And now it's reproducible:
> I can run fsck and then try to continue (using rsync). But as soon as writing 
> starts, the process hangs for a long time. At least one minute, probably longer.
> Then the ext4_mb_generate_buddy comes again.
> 
> I upgraded e2fstools from 1.41.14 to 1.42 and the kernel to 3.2.2.
> No difference.
> That FS is unusable.
> 
>> Just for the records: Unlike the contents, the hardware is not new and did 
> not 
>> have any known issues.
>> 
>>  Greetings,
>>    WIMPy
>> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [dm-devel] can't recover ext4 on lvm from ext4_mb_generate_buddy:739: group 1687, 32254 clusters in bitmap, 32258 in gd
  2012-01-28  8:34               ` Andreas Dilger
@ 2012-01-28 15:31                 ` WIMPy
  2012-01-28 21:04                   ` WIMPy
  0 siblings, 1 reply; 26+ messages in thread
From: WIMPy @ 2012-01-28 15:31 UTC (permalink / raw)
  To: linux-ext4

Andreas Dilger <adilger <at> dilger.ca> writes:

> 
> Could you please try to bisect the problem, if it is reproducible?

If you or someone else has an idea how to do so, I will try to collect more 
information.
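
For readers unfamiliar with the term: bisecting is a binary search over the
commits between a known-good and a known-bad kernel, which "git bisect"
automates on the real history. The sketch below shows only the search logic.
It assumes a simple linear list of commits, and the commit names and the
fake_test predicate are placeholders, not anything from the kernel tree.

```python
def first_bad(commits, is_bad):
    """Binary search for the first bad commit.

    commits is ordered oldest to newest, commits[0] is known good and
    commits[-1] is known bad. is_bad(commit) stands for "build, boot and
    run the reproducer on this commit".
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid      # the regression is at mid or earlier
        else:
            good = mid     # the regression is after mid
    return commits[bad]


# Hypothetical history of 16 placeholder commits.
history = ["c%02d" % i for i in range(16)]
tested = []


def fake_test(commit):
    """Pretend that commit c11 introduced the problem."""
    tested.append(commit)
    return commit >= "c11"


culprit = first_bad(history, fake_test)
print(culprit, "found after", len(tested), "tests")
```

The point of the exercise is that the number of kernels to build and test
grows with the logarithm of the number of commits, not with the number itself.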

There is actually an important bit I forgot to mention in the last message: 
after I get the error and umount the FS, I get lots of journal commit I/O 
errors, but no indication as to what fails or why.

> I was looking for a change which I thought might be responsible (removal of
> block bitmap initialization when inodes are first allocated from an
> uninitialized inode table) but I couldn't see it in the git log, so maybe
> that change has not landed yet.
> 
> I don't have any other ideas of which recent patches might be responsible at
> this point.

As there was a mention at the beginning that this may have happened after an 
upgrade from 3.1.5 to 3.2, I will build a 3.1.5 and see if that really makes a 
difference.

> On 2012-01-28, at 1:14, WIMPy <WIMPy <at> yeti.dk> wrote:
> 
> > Update:
> > 
> >>>>> this is a problem which apparently occurred when the user went from
> >>>> v3.1.5 to v3.2, so this looks likes 3.2 regression.)
> >> 
> >> I am on 3.2.0 as well.
> > 
> > I didn't spot anything obvious in the logs.
> > 
> >> It happened for me on a freshly created FS.
> >> "mke2fs -j -O sparse_super -O dir_index -O extents -O filetype -O uninit_bg"
> >> mounted with no additional options for the first time I got an
> >> "EXT4-fs error (device md127): ext4_mb_generate_buddy:739: group 28671, 32765
> >> clusters in bitmap, 32766 in gd"
> >> after writing about 3TB of data.
> >> I do not have RO snapshots as the OP, but my md sits on to of luks containers.
> >> So we do have the device mapper in common.
> > 
> > After I did an fsck and tried to continue, I didn't get that far.
> > After another 200GB or so it happened again.
> > And now it's reproducible:
> > I can run fsck and then try to continue (using rsync). But as soon as writing
> > starts, the process hangs for a long time. At least one minute, probably longer.
> > Then the ext4_mb_generate_buddy comes again.
> > 
> > I upgraded e2fstools from 1.41.14 to 1.42 and the kernel to 3.2.2.
> > No difference.
> > That FS is unusable.
> > 
> >> Just for the records: Unlike the contents, the hardware is not new and did not
> >> have any known issues.
> >> 
> >>  Greetings,
> >>    WIMPy



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [dm-devel] can't recover ext4 on lvm from ext4_mb_generate_buddy:739: group 1687, 32254 clusters in bitmap, 32258 in gd
  2012-01-28 15:31                 ` WIMPy
@ 2012-01-28 21:04                   ` WIMPy
  2012-02-03  5:30                     ` WIMPy
  0 siblings, 1 reply; 26+ messages in thread
From: WIMPy @ 2012-01-28 21:04 UTC (permalink / raw)
  To: linux-ext4

... and another update:

> As there was a mention at the beginning that this may have happened after an
> upgrade from 3.1.5 to 3.2, I will build a 3.1.5 and see if that really makes a
> difference.

Yes it does.
3.1.5 has been working for 4.5 hours now, continuing from the point where 3.2 
and 3.2.2 reproducibly barfed.
I see some changes to ext4 on January 9 and 10, but nothing thereafter, so I'm 
not sure if it's worth trying something like 3.3-rc1.
The bad thing is that 3.2 had been working for about 20 hours before it failed, 
so it's not a quick test.

> > >>>>> this is a problem which apparently occurred when the user went from
> > >>>> v3.1.5 to v3.2, so this looks likes 3.2 regression.)

> > >> It happened for me on a freshly created FS.
> > >> "mke2fs -j -O sparse_super -O dir_index -O extents -O filetype -O uninit_bg"
> > >> mounted with no additional options for the first time I got an
> > >> "EXT4-fs error (device md127): ext4_mb_generate_buddy:739: group 28671, 32765
> > >> clusters in bitmap, 32766 in gd"
> > >> after writing about 3TB of data.
> > >> I do not have RO snapshots as the OP, but my md sits on top of luks containers.
> > >> So we do have the device mapper in common.
> > > 
> > > After I did an fsck and tried to continue, I didn't get that far.
> > > After another 200GB or so it happened again.
> > > And now it's reproducible:
> > > I can run fsck and then try to continue (using rsync). But as soon as writing
> > > starts, the process hangs for a long time. At least one minute, probably longer.
> > > Then the ext4_mb_generate_buddy comes again.

  Greetings,
    WIMPy



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [dm-devel] can't recover ext4 on lvm from ext4_mb_generate_buddy:739: group 1687, 32254 clusters in bitmap, 32258 in gd
  2012-01-28 21:04                   ` WIMPy
@ 2012-02-03  5:30                     ` WIMPy
  0 siblings, 0 replies; 26+ messages in thread
From: WIMPy @ 2012-02-03  5:30 UTC (permalink / raw)
  To: linux-ext4

WIMPy <WIMPy <at> yeti.dk> writes:

> ... and another update:

I don't know what the cause is, but I think I've got the trigger.
Those errors appeared when using rsync on a directory containing a file that 
was being written to (extended) while the rsync was running. That seems to be 
a situation where rsync causes a lot of stress. It certainly takes a hell of a 
lot of time.
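
To make that access pattern concrete, here is a rough sketch of the workload
only: one thread keeps extending a file while the main thread copies it. It
is not claimed to trigger the ext4 error. shutil.copyfile is a stand-in for
rsync, and the file name growing.bin, the chunk sizes and the temporary
directories are arbitrary assumptions; to exercise a particular filesystem
you would pass directories that live on it instead.

```python
import os
import shutil
import tempfile
import threading


def grow_file(path, chunks, chunk_size, started):
    """Append chunks to path, syncing each one so the file keeps growing on disk."""
    with open(path, "ab") as f:
        for _ in range(chunks):
            f.write(b"\0" * chunk_size)
            f.flush()
            os.fsync(f.fileno())
            started.set()  # signal that at least one chunk is on disk


def copy_while_growing(src_dir, dst_dir, chunks=64, chunk_size=4096):
    """Copy src_dir/growing.bin into dst_dir while another thread extends it."""
    src = os.path.join(src_dir, "growing.bin")
    dst = os.path.join(dst_dir, "growing.bin")
    started = threading.Event()
    writer = threading.Thread(
        target=grow_file, args=(src, chunks, chunk_size, started)
    )
    writer.start()
    started.wait()             # the source now exists and is non-empty
    shutil.copyfile(src, dst)  # stand-in for rsync: copies up to the current EOF
    writer.join()
    return os.path.getsize(src), os.path.getsize(dst)


with tempfile.TemporaryDirectory() as src_dir, \
        tempfile.TemporaryDirectory() as dst_dir:
    src_size, dst_size = copy_while_growing(src_dir, dst_dir)
    print("source grew to", src_size, "bytes; copy captured", dst_size, "bytes")
```

How much of the file the copy captures depends on timing, which is the whole
point: the reader is racing a writer on the same file.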

I suspect one of the ext4-related commits from Jan 9th/10th. From the log I 
guess they should still exist in 3.3-rc1. I'm currently testing that, but 
unfortunately that might take some time.

And a short repeat: I'm using an md, but no lvm.

  Greetings,
    WIMPy

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: can't recover ext4 on lvm from ext4_mb_generate_buddy:739: group 1687, 32254 clusters in bitmap, 32258 in gd
  2012-01-05 18:15       ` Ted Ts'o
@ 2012-04-12  6:45           ` Landry Minoza
  2012-01-05 20:45           ` Sander Eikelenboom
                             ` (4 subsequent siblings)
  5 siblings, 0 replies; 26+ messages in thread
From: Landry Minoza @ 2012-04-12  6:45 UTC (permalink / raw)
  To: Ted Ts'o, Sander Eikelenboom, linux-ext4, linux-kernel, dm-devel

On Thu, Jan 5, 2012 at 7:15 PM, Ted Ts'o <tytso@mit.edu> wrote:
> On Thu, Jan 05, 2012 at 05:14:28PM +0100, Sander Eikelenboom wrote:
>>
>> OK spoke too soon, i have been able to trigger it again:
>> - copying files from LV to the same LV without the snapshot went OK
>> - copying from the RO snapshot of a LV to the same LV gave the error while copying the file again:
>
> OK.  Originally, you said you did this:
>
> 1) fsck -v -p -f the filesystem
> 2) mount the filesystem
> 3) Try to copy a file
> 4) filesystem will be mounted RO on error  (see below)
> 5) fsck again, journal will be recovered, no other errors
> 6) start at 1)
>
> Was this with with a read-only snapshot always being in existence
> through all of these five steps?  When was the RO snapshot created?
>
> If a RO snapshot has to be there in order for this to happen, then
> this is almost certainly a device-mapper regression.  (dm-devel folks,
> this is a problem which apparently occurred when the user went from
> v3.1.5 to v3.2, so this looks likes 3.2 regression.)
>

If it can help, I had exactly the same behaviour: the filesystem was
remounted read-only with the same messages in dmesg, and I had to fsck it
with a 3.1 kernel, after I resized my ext4/lvm root fs.

I used kernel 3.3-rc6 from debian experimental amd64.
root fs remounted read-only with the same errors in dmesg after:
lvresize -L +5G /dev/mapper/perceval_vg1-root
resize2fs /dev/mapper/perceval_vg1-root

Rebooting on a 3.3 or 3.2 kernel didn't help. I also tried booting 3.0
and 2.6.x from rescue CDs without success (fsck OK, mounting without
problems, but the fs was remounted ro as soon as I booted a 3.2 or 3.3
kernel).
I had to install a 3.1 kernel and boot on it to be able to finally reboot on 3.3.

I use a single harddrive without any sort of raid and with one lvm pv
and one vg:
sudo fdisk -l /dev/sda

Disk /dev/sda: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders, total 976773168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0xb0000000

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1              63      273104      136521   de  Dell Utility
/dev/sda2       205073105   205265884       96390   83  Linux
/dev/sda3   *      273105   205073104   102400000    7  HPFS/NTFS/exFAT
/dev/sda4       205265885   976773167   385753641+  8e  Linux LVM

Partition table entries are not in disk order

sudo pvs
  PV         VG           Fmt  Attr PSize   PFree
  /dev/sda4  perceval_vg1 lvm2 a--  367,88g 187,35g

sudo vgs
  VG           #PV #LV #SN Attr   VSize   VFree
  perceval_vg1   1   3   0 wz--n- 367,88g 187,35g

sudo lvs
  LV   VG           Attr   LSize   Origin Snap%  Move Log Copy%  Convert
  home perceval_vg1 -wi-ao 151,75g
  root perceval_vg1 -wi-ao  24,97g
  swap perceval_vg1 -wi-ao   3,82g


I can post more information if it helps.

-- 
Landry MINOZA
landry.minoza@gmail.com

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2012-04-12  6:45 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-01-05 10:37 can't recover ext4 on lvm from ext4_mb_generate_buddy:739: group 1687, 32254 clusters in bitmap, 32258 in gd Sander Eikelenboom
2012-01-05 13:21 ` Sander Eikelenboom
2012-01-05 14:45   ` Theodore Tso
2012-01-05 14:45     ` Theodore Tso
2012-01-05 14:52     ` Sander Eikelenboom
2012-01-05 14:52       ` Sander Eikelenboom
2012-01-05 15:46     ` Sander Eikelenboom
2012-01-05 15:46       ` Sander Eikelenboom
     [not found]     ` <4910694144.20120105171428@eikelenboom.it>
2012-01-05 18:15       ` Ted Ts'o
2012-01-05 20:04         ` Sander Eikelenboom
2012-01-05 20:04           ` Sander Eikelenboom
2012-01-05 20:45         ` Sander Eikelenboom
2012-01-05 20:45           ` Sander Eikelenboom
2012-01-05 21:31         ` Sander Eikelenboom
2012-01-05 21:31           ` Sander Eikelenboom
2012-01-05 22:43         ` Sander Eikelenboom
2012-01-05 22:43           ` Sander Eikelenboom
2012-01-06 16:40         ` [dm-devel] " Mikulas Patocka
2012-01-28  4:53           ` WIMPy
2012-01-28  8:14             ` WIMPy
2012-01-28  8:34               ` Andreas Dilger
2012-01-28 15:31                 ` WIMPy
2012-01-28 21:04                   ` WIMPy
2012-02-03  5:30                     ` WIMPy
2012-04-12  6:45         ` Landry Minoza
2012-04-12  6:45           ` Landry Minoza

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.