* kernel 3.3.4 damages filesystem (?)
@ 2012-05-07 10:46 Helmut Hullen
  2012-05-07 10:58 ` Fajar A. Nugraha
                   ` (3 more replies)
  0 siblings, 4 replies; 54+ messages in thread
From: Helmut Hullen @ 2012-05-07 10:46 UTC (permalink / raw)
  To: linux-btrfs

Hello,

"never change a running system" ...

For some months I have been running btrfs under kernel 3.2.5 and 3.2.9
without problems.

Yesterday I compiled kernel 3.3.4, and this morning I started the
machine with this kernel. There may be some ugly problems.

Copying something into the btrfs "directory" worked well for some files,
and then I got error messages (I did not copy them; something about an
"IO error" under Samba).

Rebooting the machine with kernel 3.2.9 worked, and copying 1 file worked,
but copying more than this one file didn't work. And I can't delete this
file.

That doesn't please me - copying more than 4 TBytes wastes time and
money.

=========== configuration =================

/dev/sdc1 on /srv/MM type btrfs (rw,noatime)

/dev/sdc: SAMSUNG HD204UI: 25 °C
/dev/sdf: WDC WD30EZRX-00MMMB0: 30 °C
/dev/sdi: WDC WD30EZRX-00MMMB0: 29 °C

Data, RAID0: total=5.29TB, used=4.29TB
System, RAID1: total=8.00MB, used=352.00KB
System: total=4.00MB, used=0.00
Metadata, RAID1: total=149.00GB, used=5.00GB

Label: 'MMedia'  uuid: 9adfdc84-0fbe-431b-bcb1-cabb6a915e91
	Total devices 3 FS bytes used 4.29TB
	devid    3 size 2.73TB used 1.98TB path /dev/sdi1
	devid    2 size 2.73TB used 1.94TB path /dev/sdf1
	devid    1 size 1.82TB used 1.63TB path /dev/sdc1

Btrfs Btrfs v0.19

=================== boot messages, kernel related ==============

[boot with kernel 3.3.4]
May  7 06:55:26 Arktur kernel: ata5: exception Emask 0x10 SAct 0x0 SErr 0x10000 action 0xe frozen
May  7 06:55:26 Arktur kernel: ata5: SError: { PHYRdyChg }
May  7 06:55:26 Arktur kernel: ata5: hard resetting link
May  7 06:55:31 Arktur kernel: ata5: COMRESET failed (errno=-19)
May  7 06:55:31 Arktur kernel: ata5: reset failed (errno=-19), retrying in 6 secs
May  7 06:55:36 Arktur kernel: ata5: hard resetting link
May  7 06:55:38 Arktur kernel: ata5: COMRESET failed (errno=-19)
May  7 06:55:38 Arktur kernel: ata5: reset failed (errno=-19), retrying in 9 secs
May  7 06:55:46 Arktur kernel: ata5: hard resetting link
May  7 06:55:47 Arktur kernel: ata5: COMRESET failed (errno=-19)
May  7 06:55:47 Arktur kernel: ata5: reset failed (errno=-19), retrying in 34 secs
May  7 06:56:21 Arktur kernel: ata5: hard resetting link
May  7 06:56:22 Arktur kernel: ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
May  7 06:56:22 Arktur kernel: ata5.00: configured for UDMA/100
May  7 06:56:22 Arktur kernel: ata5: EH complete
May  7 07:12:07 Arktur kernel: ata5.00: exception Emask 0x10 SAct 0x0 SErr 0x10000 action 0xe frozen
May  7 07:12:07 Arktur kernel: ata5: SError: { PHYRdyChg }
May  7 07:12:07 Arktur kernel: ata5.00: failed command: WRITE DMA EXT
May  7 07:12:07 Arktur kernel: ata5.00: cmd 35/00:00:00:62:50/00:04:5e:00:00/e0 tag 0 dma 524288 out
May  7 07:12:07 Arktur kernel:          res d8/d8:d8:d8:d8:d8/d8:d8:d8:d8:d8/d8 Emask 0x12 (ATA bus error)
May  7 07:12:07 Arktur kernel: ata5.00: status: { Busy }
May  7 07:12:07 Arktur kernel: ata5.00: error: { ICRC UNC IDNF }
May  7 07:12:07 Arktur kernel: ata5: hard resetting link
May  7 07:12:13 Arktur kernel: ata5: link is slow to respond, please be patient (ready=-19)
May  7 07:12:15 Arktur kernel: ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
May  7 07:12:15 Arktur kernel: ata5.00: failed to IDENTIFY (I/O error, err_mask=0x100)
May  7 07:12:15 Arktur kernel: ata5.00: revalidation failed (errno=-5)
May  7 07:12:20 Arktur kernel: ata5: hard resetting link
May  7 07:12:20 Arktur kernel: ata5: COMRESET failed (errno=-19)
May  7 07:12:20 Arktur kernel: ata5: reset failed (errno=-19), retrying in 10 secs
May  7 07:12:30 Arktur kernel: ata5: hard resetting link
May  7 07:12:30 Arktur kernel: ata5: COMRESET failed (errno=-19)
May  7 07:12:30 Arktur kernel: ata5: reset failed (errno=-19), retrying in 10 secs
May  7 07:12:40 Arktur kernel: ata5: hard resetting link
May  7 07:12:42 Arktur kernel: ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
May  7 07:12:43 Arktur kernel: ata5.00: configured for UDMA/100
May  7 07:12:43 Arktur kernel: ata5: EH complete
May  7 07:12:43 Arktur kernel: ata5.00: exception Emask 0x10 SAct 0x0 SErr 0x10000 action 0xe frozen
May  7 07:12:43 Arktur kernel: ata5: SError: { PHYRdyChg }
May  7 07:12:43 Arktur kernel: ata5.00: failed command: WRITE DMA EXT
May  7 07:12:43 Arktur kernel: ata5.00: cmd 35/00:00:00:72:50/00:04:5e:00:00/e0 tag 0 dma 524288 out
May  7 07:12:43 Arktur kernel:          res d0/d0:d0:d0:d0:d0/d0:d0:d0:d0:d0/d0 Emask 0x12 (ATA bus error)
May  7 07:12:43 Arktur kernel: ata5.00: status: { Busy }
May  7 07:12:43 Arktur kernel: ata5.00: error: { ICRC UNC IDNF }
May  7 07:12:43 Arktur kernel: ata5: hard resetting link
May  7 07:12:49 Arktur kernel: ata5: link is slow to respond, please be patient (ready=-19)
May  7 07:12:50 Arktur kernel: ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
May  7 07:12:51 Arktur kernel: ata5.00: configured for UDMA/100
May  7 07:12:51 Arktur kernel: ata5: EH complete
May  7 07:12:51 Arktur kernel: ata5: exception Emask 0x10 SAct 0x0 SErr 0x10000 action 0xe frozen
May  7 07:12:51 Arktur kernel: ata5: SError: { PHYRdyChg }
May  7 07:12:51 Arktur kernel: ata5: hard resetting link
May  7 07:12:54 Arktur kernel: ata5: COMRESET failed (errno=-19)
May  7 07:12:54 Arktur kernel: ata5: reset failed (errno=-19), retrying in 7 secs
May  7 07:13:01 Arktur kernel: ata5: hard resetting link
May  7 07:13:04 Arktur kernel: ata5: COMRESET failed (errno=-19)
May  7 07:13:04 Arktur kernel: ata5: reset failed (errno=-19), retrying in 7 secs
May  7 07:13:11 Arktur kernel: ata5: hard resetting link
May  7 07:13:14 Arktur kernel: ata5: COMRESET failed (errno=-19)
May  7 07:13:14 Arktur kernel: ata5: reset failed (errno=-19), retrying in 33 secs
May  7 07:13:46 Arktur kernel: ata5: hard resetting link
May  7 07:13:47 Arktur kernel: ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
May  7 07:13:47 Arktur kernel: ata5.00: failed to read native max address (err_mask=0x100)
May  7 07:13:47 Arktur kernel: ata5.00: HPA support seems broken, skipping HPA handling
May  7 07:13:47 Arktur kernel: ata5.00: revalidation failed (errno=-5)
May  7 07:13:52 Arktur kernel: ata5: hard resetting link
May  7 07:13:53 Arktur kernel: ata5: COMRESET failed (errno=-19)
May  7 07:13:53 Arktur kernel: ata5: reset failed (errno=-19), retrying in 9 secs
May  7 07:14:02 Arktur kernel: ata5: hard resetting link
May  7 07:14:05 Arktur kernel: ata5: COMRESET failed (errno=-19)
May  7 07:14:05 Arktur kernel: ata5: reset failed (errno=-19), retrying in 8 secs
May  7 07:14:12 Arktur kernel: ata5: hard resetting link
May  7 07:14:14 Arktur kernel: ata5: COMRESET failed (errno=-19)
May  7 07:14:14 Arktur kernel: ata5: reset failed (errno=-19), retrying in 33 secs
May  7 07:14:47 Arktur kernel: ata5: hard resetting link
May  7 07:14:47 Arktur kernel: ata5: COMRESET failed (errno=-19)
May  7 07:14:47 Arktur kernel: ata5: reset failed, giving up
May  7 07:14:47 Arktur kernel: ata5.00: disabled
May  7 07:14:47 Arktur kernel: ata5: exception Emask 0x10 SAct 0x0 SErr 0x10000 action 0xe frozen t4
May  7 07:14:47 Arktur kernel: ata5: SError: { PHYRdyChg }
May  7 07:14:47 Arktur kernel: ata5: hard resetting link
May  7 07:14:47 Arktur kernel: sd 5:0:0:0: rejecting I/O to offline device
May  7 07:14:47 Arktur kernel: sd 5:0:0:0: [sdf] killing request
May  7 07:14:47 Arktur kernel: sd 5:0:0:0: [sdf] Unhandled error code
May  7 07:14:47 Arktur kernel: sd 5:0:0:0: [sdf]  Result: hostbyte=0x01 driverbyte=0x00
May  7 07:14:47 Arktur kernel: sd 5:0:0:0: [sdf] CDB: cdb[0]=0x28: 28 00 d0 d1 07 20 00 00 08 00
May  7 07:14:47 Arktur kernel: end_request: I/O error, dev sdf, sector 3503359776
May  7 07:14:48 Arktur kernel: sd 5:0:0:0: rejecting I/O to offline device
May  7 07:14:48 Arktur kernel: end_request: I/O error, dev sdf, sector 0
May  7 07:14:48 Arktur kernel: sd 5:0:0:0: rejecting I/O to offline device
May  7 07:14:49 Arktur kernel: lost page write due to I/O error on sdf1
May  7 07:14:49 Arktur kernel: sd 5:0:0:0: rejecting I/O to offline device
May  7 07:14:49 Arktur kernel: lost page write due to I/O error on sdf1
May  7 07:14:49 Arktur kernel: sd 5:0:0:0: rejecting I/O to offline device
May  7 07:14:49 Arktur kernel: lost page write due to I/O error on sdf1
May  7 07:14:50 Arktur kernel: sd 5:0:0:0: rejecting I/O to offline device
May  7 07:14:54 Arktur kernel: ata5: link is slow to respond, please be patient (ready=-19)
May  7 07:14:57 Arktur kernel: ata5: COMRESET failed (errno=-16)
May  7 07:14:57 Arktur kernel: ata5: hard resetting link
May  7 07:15:01 Arktur kernel: sd 5:0:0:0: rejecting I/O to offline device
May  7 07:15:03 Arktur kernel: ata5: link is slow to respond, please be patient (ready=-19)
May  7 07:15:07 Arktur kernel: sd 5:0:0:0: rejecting I/O to offline device
May  7 07:15:07 Arktur kernel: ata5: COMRESET failed (errno=-19)
May  7 07:15:07 Arktur kernel: ata5: reset failed (errno=-19), retrying in 1 secs
May  7 07:15:07 Arktur kernel: sd 5:0:0:0: rejecting I/O to offline device
May  7 07:15:07 Arktur kernel: ata5: hard resetting link
May  7 07:15:07 Arktur kernel: sd 5:0:0:0: rejecting I/O to offline device
May  7 07:15:12 Arktur kernel: ata5: COMRESET failed (errno=-19)
May  7 07:15:12 Arktur kernel: ata5: reset failed (errno=-19), retrying in 31 secs
May  7 07:15:19 Arktur kernel: sd 5:0:0:0: rejecting I/O to offline device
May  7 07:15:19 Arktur kernel: end_request: I/O error, dev sdf, sector 0
May  7 07:15:19 Arktur kernel: sd 5:0:0:0: rejecting I/O to offline device
May  7 07:15:19 Arktur kernel: lost page write due to I/O error on sdf1
May  7 07:15:19 Arktur kernel: sd 5:0:0:0: rejecting I/O to offline device
May  7 07:15:19 Arktur kernel: lost page write due to I/O error on sdf1
May  7 07:15:19 Arktur kernel: sd 5:0:0:0: rejecting I/O to offline device
May  7 07:15:19 Arktur kernel: lost page write due to I/O error on sdf1
May  7 07:15:22 Arktur kernel: sd 5:0:0:0: rejecting I/O to offline device
May  7 07:15:42 Arktur kernel: ata5: hard resetting link
May  7 07:15:44 Arktur kernel: ata5: COMRESET failed (errno=-19)
May  7 07:15:44 Arktur kernel: ata5: reset failed, giving up
May  7 07:15:44 Arktur kernel: ata5: exception Emask 0x10 SAct 0x0 SErr 0x10000 action 0xe frozen t3
May  7 07:15:44 Arktur kernel: ata5: SError: { PHYRdyChg }
May  7 07:15:44 Arktur kernel: ata5: hard resetting link
May  7 07:15:44 Arktur kernel: ata5: COMRESET failed (errno=-19)
May  7 07:15:44 Arktur kernel: ata5: reset failed (errno=-19), retrying in 10 secs
May  7 07:15:49 Arktur kernel: sd 5:0:0:0: rejecting I/O to offline device
May  7 07:15:50 Arktur kernel: end_request: I/O error, dev sdf, sector 0
May  7 07:15:50 Arktur kernel: sd 5:0:0:0: rejecting I/O to offline device
May  7 07:15:50 Arktur kernel: lost page write due to I/O error on sdf1
May  7 07:15:50 Arktur kernel: sd 5:0:0:0: rejecting I/O to offline device
May  7 07:15:50 Arktur kernel: lost page write due to I/O error on sdf1
May  7 07:15:50 Arktur kernel: sd 5:0:0:0: rejecting I/O to offline device
May  7 07:15:50 Arktur kernel: lost page write due to I/O error on sdf1
May  7 07:15:54 Arktur kernel: ata5: hard resetting link
May  7 07:15:55 Arktur kernel: sd 5:0:0:0: rejecting I/O to offline device
May  7 07:15:59 Arktur kernel: ata5: link is slow to respond, please be patient (ready=-19)
May  7 07:16:04 Arktur kernel: ata5: COMRESET failed (errno=-16)
May  7 07:16:04 Arktur kernel: ata5: hard resetting link
May  7 07:16:05 Arktur kernel: ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
May  7 07:16:05 Arktur kernel: ata5.00: ATA-8: WDC WD30EZRX-00MMMB0, 80.00A80, max UDMA/133
May  7 07:16:05 Arktur kernel: ata5.00: 5860533168 sectors, multi 0: LBA48 NCQ (depth 0/32)
May  7 07:16:05 Arktur kernel: ata5.00: configured for UDMA/100
May  7 07:16:05 Arktur kernel: ata5: EH complete
May  7 07:16:05 Arktur kernel: ata5.00: detaching (SCSI 5:0:0:0)
May  7 07:16:05 Arktur kernel: sd 5:0:0:0: [sdf] Synchronizing SCSI cache
May  7 07:16:20 Arktur kernel: end_request: I/O error, dev sdf, sector 0
May  7 07:16:20 Arktur kernel: lost page write due to I/O error on sdf1
May  7 07:22:05 Arktur kernel: sd 5:0:0:0: timing out command, waited 360s
May  7 07:28:05 Arktur kernel: sd 5:0:0:0: timing out command, waited 360s
May  7 07:34:05 Arktur kernel: sd 5:0:0:0: timing out command, waited 360s
May  7 07:34:05 Arktur kernel: sd 5:0:0:0: [sdf]  Result: hostbyte=0x00 driverbyte=0x00
May  7 07:34:05 Arktur kernel: sd 5:0:0:0: [sdf] Stopping disk
May  7 07:37:05 Arktur kernel: sd 5:0:0:0: timing out command, waited 180s
May  7 07:37:05 Arktur kernel: sd 5:0:0:0: [sdf] START_STOP FAILED
May  7 07:37:05 Arktur kernel: sd 5:0:0:0: [sdf]  Result: hostbyte=0x00 driverbyte=0x00
May  7 07:37:06 Arktur kernel: ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
May  7 07:37:07 Arktur kernel: ata5.00: configured for UDMA/100
May  7 07:37:07 Arktur kernel: scsi 5:0:0:0: Direct-Access     ATA      WDC WD30EZRX-00M 80.0 PQ: 0 ANSI: 5

May  7 10:47:22 Arktur kernel: lost page write due to I/O error on sdf1

May  7 11:11:21 Arktur kernel: lost page write due to I/O error on sdf1
May  7 11:12:07 Arktur kernel: lost page write due to I/O error on sdf1

[reboot with kernel 3.2.9]

May  7 11:15:25 Arktur kernel: ata5.00: configured for UDMA/100
May  7 11:15:25 Arktur kernel: scsi 5:0:0:0: Direct-Access     ATA      WDC WD30EZRX-00M 80.0 PQ: 0 ANSI: 5
May  7 11:15:26 Arktur kernel: sd 5:0:0:0: [sdf] 5860533168 512-byte logical blocks: (3.00 TB/2.72 TiB)
May  7 11:15:26 Arktur kernel: sd 5:0:0:0: [sdf] 4096-byte physical blocks
May  7 11:15:26 Arktur kernel: sd 5:0:0:0: [sdf] Write Protect is off
May  7 11:15:26 Arktur kernel: sd 5:0:0:0: [sdf] Mode Sense: 00 3a 00 00
May  7 11:15:26 Arktur kernel: sd 5:0:0:0: [sdf] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
May  7 11:15:26 Arktur kernel:  sdf: sdf1
May  7 11:15:26 Arktur kernel: sd 5:0:0:0: [sdf] Attached SCSI disk

=============  dmesg output ===============

btrfs: free space inode generation (0) did not match free space cache generation (36740)
btrfs: space cache generation (36727) does not match inode (36747)
btrfs: failed to load free space cache for block group 9193084223488
ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata5.00: BMDMA2 stat 0x80d0009
ata5.00: failed command: READ DMA EXT
ata5.00: cmd 25/00:80:00:b3:d7/00:00:02:01:00/e0 tag 0 dma 65536 in
         res 51/40:6f:08:b3:d7/00:00:02:01:00/f0 Emask 0x9 (media error)
ata5.00: status: { DRDY ERR }
ata5.00: error: { UNC }
ata5.00: configured for UDMA/100
ata5: EH complete
ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata5.00: BMDMA2 stat 0x80d0009
ata5.00: failed command: READ DMA EXT
ata5.00: cmd 25/00:80:00:b3:d7/00:00:02:01:00/e0 tag 0 dma 65536 in
         res 51/40:6f:08:b3:d7/00:00:02:01:00/f0 Emask 0x9 (media error)
ata5.00: status: { DRDY ERR }
ata5.00: error: { UNC }
ata5.00: configured for UDMA/100
ata5: EH complete
ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata5.00: BMDMA2 stat 0x80d0009
ata5.00: failed command: READ DMA EXT
ata5.00: cmd 25/00:80:00:b3:d7/00:00:02:01:00/e0 tag 0 dma 65536 in
         res 51/40:6f:08:b3:d7/00:00:02:01:00/f0 Emask 0x9 (media error)
ata5.00: status: { DRDY ERR }
ata5.00: error: { UNC }
ata5.00: configured for UDMA/100
ata5: EH complete
ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata5.00: BMDMA2 stat 0x80d0009
ata5.00: failed command: READ DMA EXT
ata5.00: cmd 25/00:80:00:b3:d7/00:00:02:01:00/e0 tag 0 dma 65536 in
         res 51/40:6f:08:b3:d7/00:00:02:01:00/f0 Emask 0x9 (media error)
ata5.00: status: { DRDY ERR }
ata5.00: error: { UNC }
ata5.00: configured for UDMA/100
ata5: EH complete
ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata5.00: BMDMA2 stat 0x80d0009
ata5.00: failed command: READ DMA EXT
ata5.00: cmd 25/00:80:00:b3:d7/00:00:02:01:00/e0 tag 0 dma 65536 in
         res 51/40:6f:08:b3:d7/00:00:02:01:00/f0 Emask 0x9 (media error)
ata5.00: status: { DRDY ERR }
ata5.00: error: { UNC }
ata5.00: configured for UDMA/100
ata5: EH complete
ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata5.00: BMDMA2 stat 0x80d0009
ata5.00: failed command: READ DMA EXT
ata5.00: cmd 25/00:80:00:b3:d7/00:00:02:01:00/e0 tag 0 dma 65536 in
         res 51/40:6f:08:b3:d7/00:00:02:01:00/f0 Emask 0x9 (media error)
ata5.00: status: { DRDY ERR }
ata5.00: error: { UNC }
ata5.00: configured for UDMA/100
sd 5:0:0:0: [sdf] Unhandled sense code
sd 5:0:0:0: [sdf]  Result: hostbyte=0x00 driverbyte=0x08
sd 5:0:0:0: [sdf]  Sense Key : 0x3 [current] [descriptor]
Descriptor sense data with sense descriptors (in hex):
        72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 01
        02 d7 b3 08
sd 5:0:0:0: [sdf]  ASC=0x11 ASCQ=0x4
sd 5:0:0:0: [sdf] CDB: cdb[0]=0x88: 88 00 00 00 00 01 02 d7 b3 00 00 00 00 80 00 00
end_request: I/O error, dev sdf, sector 4342657800
ata5: EH complete
ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata5.00: BMDMA2 stat 0x80d0009
ata5.00: failed command: READ DMA EXT
ata5.00: cmd 25/00:08:08:b3:d7/00:00:02:01:00/e0 tag 0 dma 4096 in
         res 51/40:08:08:b3:d7/00:00:02:01:00/f0 Emask 0x9 (media error)
ata5.00: status: { DRDY ERR }
ata5.00: error: { UNC }
ata5.00: configured for UDMA/100
ata5: EH complete
ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata5.00: BMDMA2 stat 0x80d0009
ata5.00: failed command: READ DMA EXT
ata5.00: cmd 25/00:08:08:b3:d7/00:00:02:01:00/e0 tag 0 dma 4096 in
         res 51/40:08:08:b3:d7/00:00:02:01:00/f0 Emask 0x9 (media error)
ata5.00: status: { DRDY ERR }
ata5.00: error: { UNC }
ata5.00: configured for UDMA/100
ata5: EH complete
ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata5.00: BMDMA2 stat 0x80d0009
ata5.00: failed command: READ DMA EXT
ata5.00: cmd 25/00:08:08:b3:d7/00:00:02:01:00/e0 tag 0 dma 4096 in
         res 51/40:08:08:b3:d7/00:00:02:01:00/f0 Emask 0x9 (media error)
ata5.00: status: { DRDY ERR }
ata5.00: error: { UNC }
ata5.00: configured for UDMA/100
ata5: EH complete
ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata5.00: BMDMA2 stat 0x80d0009
ata5.00: failed command: READ DMA EXT
ata5.00: cmd 25/00:08:08:b3:d7/00:00:02:01:00/e0 tag 0 dma 4096 in
         res 51/40:08:08:b3:d7/00:00:02:01:00/f0 Emask 0x9 (media error)
ata5.00: status: { DRDY ERR }
ata5.00: error: { UNC }
ata5.00: configured for UDMA/100
ata5: EH complete
ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata5.00: BMDMA2 stat 0x80d0009
ata5.00: failed command: READ DMA EXT
ata5.00: cmd 25/00:08:08:b3:d7/00:00:02:01:00/e0 tag 0 dma 4096 in
         res 51/40:08:08:b3:d7/00:00:02:01:00/f0 Emask 0x9 (media error)
ata5.00: status: { DRDY ERR }
ata5.00: error: { UNC }
ata5.00: configured for UDMA/100
ata5: EH complete
ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata5.00: BMDMA2 stat 0x80d0009
ata5.00: failed command: READ DMA EXT
ata5.00: cmd 25/00:08:08:b3:d7/00:00:02:01:00/e0 tag 0 dma 4096 in
         res 51/40:08:08:b3:d7/00:00:02:01:00/f0 Emask 0x9 (media error)
ata5.00: status: { DRDY ERR }
ata5.00: error: { UNC }
ata5.00: configured for UDMA/100
sd 5:0:0:0: [sdf] Unhandled sense code
sd 5:0:0:0: [sdf]  Result: hostbyte=0x00 driverbyte=0x08
sd 5:0:0:0: [sdf]  Sense Key : 0x3 [current] [descriptor]
Descriptor sense data with sense descriptors (in hex):
        72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 01
        02 d7 b3 08
sd 5:0:0:0: [sdf]  ASC=0x11 ASCQ=0x4
sd 5:0:0:0: [sdf] CDB: cdb[0]=0x88: 88 00 00 00 00 01 02 d7 b3 08 00 00 00 08 00 00
end_request: I/O error, dev sdf, sector 4342657800
ata5: EH complete
btrfs: error reading free space cache
BUG: unable to handle kernel NULL pointer dereference at 00000001
IP: [<c1295c36>] io_ctl_drop_pages+0x26/0x50
*pdpt = 0000000029712001 *pde = 0000000000000000
Oops: 0002 [#1]
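
As a cross-check on the logs above, the "CDB:" lines and the "end_request: I/O error ... sector N" lines can be compared directly: for a SCSI READ(10) (opcode 0x28) the starting LBA is the big-endian value in bytes 2-5, and for READ(16) (opcode 0x88) it is in bytes 2-9. The following is a minimal sketch, not part of the original report; the helper name decode_read_cdb is made up for illustration, and the three hex strings are copied from the log lines above.

```python
def decode_read_cdb(cdb_hex):
    """Return (starting LBA, block count) for a READ(10) or READ(16) CDB.

    cdb_hex is the space-separated hex dump printed after "CDB:" in the log.
    """
    cdb = bytes.fromhex(cdb_hex)
    opcode = cdb[0]
    if opcode == 0x28 and len(cdb) == 10:
        # READ(10): 4-byte LBA in bytes 2-5, 2-byte transfer length in bytes 7-8
        lba = int.from_bytes(cdb[2:6], "big")
        blocks = int.from_bytes(cdb[7:9], "big")
    elif opcode == 0x88 and len(cdb) == 16:
        # READ(16): 8-byte LBA in bytes 2-9, 4-byte transfer length in bytes 10-13
        lba = int.from_bytes(cdb[2:10], "big")
        blocks = int.from_bytes(cdb[10:14], "big")
    else:
        raise ValueError("only READ(10) and READ(16) are handled in this sketch")
    return lba, blocks


# CDBs copied from the kernel messages above
print(decode_read_cdb("28 00 d0 d1 07 20 00 00 08 00"))                    # (3503359776, 8)
print(decode_read_cdb("88 00 00 00 00 01 02 d7 b3 00 00 00 00 80 00 00"))  # (4342657792, 128)
print(decode_read_cdb("88 00 00 00 00 01 02 d7 b3 08 00 00 00 08 00 00"))  # (4342657800, 8)
```

The decoded LBAs agree with the sector numbers the kernel reports (3503359776 and 4342657800). The 128-block read starting at 4342657792 covers the failing sector 4342657800, and the 8-block retry starts exactly there.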

============== syslogd output ======================
============== kernel 3.2.9 after the 3.3.4 try ====

Message from syslogd@Arktur at Mon May    7  11:21:55 2012
Arktur kernel: Oops: 0002 [#1]
Message from syslogd@Arktur at Mon May    7  11:21:56 2012
es existieren nur
Arktur kernel:   Process flush-btrfs-l  (pid:   51 ti=e9f12000)
ò at Mon May
ò at Mon May
ti=e9f12000 task=f6882a50 task. 11:21:56 2012   .. . 11:21:56 2012   ...

Message from syslogd@Arktur at Mon May    7  11:21:56 2012   ...
Arktur kernel: Code: c3 8d 74 26 00 55 89 e5 56 53 3e 8d 74 26 00 89 c6 e8 2f ff
ff ff 8b 5e 1c 85 db 7e 30 31 db 8d b6 00 00 00 00 8b 46 0c 8b 04 98 <80> 60 01
fe 8b 46 0c 8b 04 98 e8 1b 96 df ff 8b 46 0c 8b 04 98

Message from syslogd@Arktur at Mon May    7  11:21:56 2012   ...
Arktur kernel:   EIP:   [<c1295c36>]   io_ctl_drop_pages+0x26/0x50 SS:ESP 0068:e9fl396

Message from syslogd@Arktur at Mon May 7 11:21:56 2012
Arktur kernel: CR2: 0000000000000001
Arktur:- #

==========================================================

The 3 btrfs disks are connected via a SiI 3114 SATA-PCI-Controller.
Only 1 of the 3 disks seems to be damaged.

==========================================================

Can I repair the system? Or do I have to copy it to a set of other disks?

Best regards!
Helmut
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: kernel 3.3.4 damages filesystem (?)
  2012-05-07 10:46 kernel 3.3.4 damages filesystem (?) Helmut Hullen
@ 2012-05-07 10:58 ` Fajar A. Nugraha
  2012-05-07 12:06   ` Helmut Hullen
  2012-05-07 10:59 ` Hugo Mills
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 54+ messages in thread
From: Fajar A. Nugraha @ 2012-05-07 10:58 UTC (permalink / raw)
  To: helmut; +Cc: linux-btrfs

On Mon, May 7, 2012 at 5:46 PM, Helmut Hullen <Hullen@t-online.de> wrote:

> For some months I have been running btrfs under kernel 3.2.5 and 3.2.9, without
> problems.
>
> Yesterday I compiled kernel 3.3.4, and this morning I started the
> machine with this kernel. There may be some ugly problems.


> Data, RAID0: total=5.29TB, used=4.29TB

RAID0? Yikes!

> System, RAID1: total=8.00MB, used=352.00KB
> System: total=4.00MB, used=0.00
> Metadata, RAID1: total=149.00GB, used=5.00GB
>
> Label: 'MMedia'  uuid: 9adfdc84-0fbe-431b-bcb1-cabb6a915e91
>         Total devices 3 FS bytes used 4.29TB
>         devid    3 size 2.73TB used 1.98TB path /dev/sdi1
>         devid    2 size 2.73TB used 1.94TB path /dev/sdf1
>         devid    1 size 1.82TB used 1.63TB path /dev/sdc1
>


> May  7 06:55:26 Arktur kernel: ata5: exception Emask 0x10 SAct 0x0 SErr 0x10000 action 0xe frozen
> May  7 06:55:26 Arktur kernel: ata5: SError: { PHYRdyChg }
> May  7 06:55:26 Arktur kernel: ata5: hard resetting link
> May  7 06:55:31 Arktur kernel: ata5: COMRESET failed (errno=-19)
> May  7 06:55:31 Arktur kernel: ata5: reset failed (errno=-19), retrying in 6 secs


> May  7 07:15:19 Arktur kernel: sd 5:0:0:0: rejecting I/O to offline device
> May  7 07:15:19 Arktur kernel: end_request: I/O error, dev sdf, sector 0
> May  7 07:15:19 Arktur kernel: sd 5:0:0:0: rejecting I/O to offline device
> May  7 07:15:19 Arktur kernel: lost page write due to I/O error on sdf1


That looks like a bad disk to me, and it shouldn't be related to the
kernel version you use.

Your best chance might be:
- unmount the fs
- get another disk to replace /dev/sdf, copy the content over with
dd_rescue. ATA resets can be a PITA, so you might be better off
moving the failed disk to a USB external adapter and doing some creative
combination of plug-unplug and selectively skipping bad sectors manually
(by passing "-s" to dd_rescue).
- reboot, with the bad disk unplugged
- (optional) run "btrfs scrub start" (you might need to build
btrfs-progs manually from git source), or simply read the entire fs
(e.g. using tar to /dev/null, or whatever). It should check the
checksum of all files and print out which files are damaged (either in
stdout or syslog).
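
The "simply read the entire fs" idea in the last step can be sketched in a few lines of Python instead of tar. This is only an illustration under stated assumptions: the function name find_unreadable is made up, and it relies on the kernel returning a read error (typically EIO) to user space when a block cannot be read or fails its checksum, which then surfaces in Python as OSError.

```python
import os


def find_unreadable(root, chunk_size=1 << 20):
    """Read every regular file under root; return [(path, errno), ...] for failures."""
    bad = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks, sockets, device nodes and other non-regular files
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                with open(path, "rb") as fh:
                    # Read and discard the data; only the error status matters
                    while fh.read(chunk_size):
                        pass
            except OSError as exc:
                bad.append((path, exc.errno))
    return bad


# Example call (mount point taken from the report above):
#   for path, err in find_unreadable("/srv/MM"):
#       print(err, path)
```

Files that do not appear in the returned list were read completely without an I/O error, which is the "at least you know which files are NOT damaged" part.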

I don't think there's anything you can do to recover the damaged files
(other than restore from backup), but at least you know which files
are NOT damaged.

-- 
Fajar

* Re: kernel 3.3.4 damages filesystem (?)
  2012-05-07 10:46 kernel 3.3.4 damages filesystem (?) Helmut Hullen
  2012-05-07 10:58 ` Fajar A. Nugraha
@ 2012-05-07 10:59 ` Hugo Mills
  2012-05-07 12:15   ` Helmut Hullen
  2012-05-07 13:34   ` Helmut Hullen
  2012-05-07 12:53 ` Liu Bo
  2012-05-09 17:32 ` Duncan
  3 siblings, 2 replies; 54+ messages in thread
From: Hugo Mills @ 2012-05-07 10:59 UTC (permalink / raw)
  To: helmut; +Cc: linux-btrfs


On Mon, May 07, 2012 at 12:46:00PM +0200, Helmut Hullen wrote:
> Hello,
> 
> "never change a running system" ...
> 
> For some months I have been running btrfs under kernel 3.2.5 and 3.2.9, without
> problems.
> 
> Yesterday I compiled kernel 3.3.4, and this morning I started the  
> machine with this kernel. There may be some ugly problems.
> 
> Copying something into the btrfs "directory" worked well for some files,  
> and then I got error messages (I've not copied them, something with "IO  
> error" under Samba).
> 
> Rebooting the machine with kernel 3.2.9 worked, copying 1 file worked,  
> but copying more than this file didn't work. And I can't delete this  
> file.
> 
> That doesn't please me - copying more than 4 TBytes wastes time and  
> money.
> 
> =========== configuration =================
> 
> /dev/sdc1 on /srv/MM type btrfs (rw,noatime)
> 
> /dev/sdc: SAMSUNG HD204UI: 25 °C
> /dev/sdf: WDC WD30EZRX-00MMMB0: 30 °C
> /dev/sdi: WDC WD30EZRX-00MMMB0: 29 °C
> 
> Data, RAID0: total=5.29TB, used=4.29TB
> System, RAID1: total=8.00MB, used=352.00KB
> System: total=4.00MB, used=0.00
> Metadata, RAID1: total=149.00GB, used=5.00GB
> 
> Label: 'MMedia'  uuid: 9adfdc84-0fbe-431b-bcb1-cabb6a915e91
> 	Total devices 3 FS bytes used 4.29TB
> 	devid    3 size 2.73TB used 1.98TB path /dev/sdi1
> 	devid    2 size 2.73TB used 1.94TB path /dev/sdf1
> 	devid    1 size 1.82TB used 1.63TB path /dev/sdc1
> 
> Btrfs Btrfs v0.19
> 
> =================== boot messages, kernel related ==============
> 
> [boot with kernel 3.3.4]
> May  7 06:55:26 Arktur kernel: ata5: exception Emask 0x10 SAct 0x0 SErr 0x10000 action 0xe frozen
> May  7 06:55:26 Arktur kernel: ata5: SError: { PHYRdyChg }
> May  7 06:55:26 Arktur kernel: ata5: hard resetting link
> May  7 06:55:31 Arktur kernel: ata5: COMRESET failed (errno=-19)
> May  7 06:55:31 Arktur kernel: ata5: reset failed (errno=-19), retrying in 6 secs
> May  7 06:55:36 Arktur kernel: ata5: hard resetting link
> May  7 06:55:38 Arktur kernel: ata5: COMRESET failed (errno=-19)
> May  7 06:55:38 Arktur kernel: ata5: reset failed (errno=-19), retrying in 9 secs
> May  7 06:55:46 Arktur kernel: ata5: hard resetting link
> May  7 06:55:47 Arktur kernel: ata5: COMRESET failed (errno=-19)
> May  7 06:55:47 Arktur kernel: ata5: reset failed (errno=-19), retrying in 34 secs
> May  7 06:56:21 Arktur kernel: ata5: hard resetting link
> May  7 06:56:22 Arktur kernel: ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
> May  7 06:56:22 Arktur kernel: ata5.00: configured for UDMA/100
> May  7 06:56:22 Arktur kernel: ata5: EH complete
> May  7 07:12:07 Arktur kernel: ata5.00: exception Emask 0x10 SAct 0x0 SErr 0x10000 action 0xe frozen
> May  7 07:12:07 Arktur kernel: ata5: SError: { PHYRdyChg }
> May  7 07:12:07 Arktur kernel: ata5.00: failed command: WRITE DMA EXT
> May  7 07:12:07 Arktur kernel: ata5.00: cmd 35/00:00:00:62:50/00:04:5e:00:00/e0 tag 0 dma 524288 out
> May  7 07:12:07 Arktur kernel:          res d8/d8:d8:d8:d8:d8/d8:d8:d8:d8:d8/d8 Emask 0x12 (ATA bus error)
> May  7 07:12:07 Arktur kernel: ata5.00: status: { Busy }
> May  7 07:12:07 Arktur kernel: ata5.00: error: { ICRC UNC IDNF }

   This is a hardware error. You have a device that's either dead or
dying. (Given the number of errors, probably already dead).
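
For reference, the negative errno values in the quoted log (errno=-19, -5 and -16 appear in the full report) can be looked up with Python's standard errno module. A minimal sketch, assuming Linux errno numbering; the helper name describe_errno is made up for illustration.

```python
import errno
import os


def describe_errno(code):
    """Map a kernel-style negative errno (e.g. -19) to 'NAME: message'."""
    number = abs(code)
    name = errno.errorcode.get(number, "UNKNOWN")
    return "{}: {}".format(name, os.strerror(number))


# Values taken from the COMRESET / revalidation messages in the report
for code in (-19, -5, -16):
    print(code, describe_errno(code))
```

On Linux these resolve to ENODEV (no such device), EIO (I/O error) and EBUSY (device or resource busy), which fits a drive that keeps dropping off the SATA link.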

> May  7 07:12:07 Arktur kernel: ata5: hard resetting link
> ==========================================================
> 
> The 3 btrfs disks are connected via a SiI 3114 SATA-PCI-Controller.
> Only 1 of the 3 disks seems to be damaged.
> 
> ==========================================================
> 
> Can I repair the system? Or do I have to copy it to a set of other disks?

   If you have RAID-1 or RAID-10 on both data and metadata, then you
_should_ in theory just be able to remove the dead disk (physically),
then btrfs dev add a new one, btrfs dev del missing, and balance.
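
   The sequence above can be written out as a dry-run plan. This is a sketch under several assumptions, not a tested recovery procedure: /dev/sdX1 is a hypothetical placeholder for the new disk, the initial "mount -o degraded" step is added here because the filesystem has to be mounted with one device missing, and the subcommands are spelled as in current btrfs-progs (older releases may spell the balance step differently). It only applies when both data and metadata are redundant, which is not the case for the RAID0 data in this report. The script prints the commands and runs nothing.

```python
def replacement_plan(surviving_dev, new_dev, mountpoint):
    """Return the commands, in order, as argument lists. Nothing is executed."""
    return [
        # Mount with the dead disk absent (assumption added for this sketch)
        ["mount", "-o", "degraded", surviving_dev, mountpoint],
        # "btrfs dev add a new one"
        ["btrfs", "device", "add", new_dev, mountpoint],
        # "btrfs dev del missing"
        ["btrfs", "device", "delete", "missing", mountpoint],
        # "and balance"
        ["btrfs", "balance", "start", mountpoint],
    ]


# /dev/sdc1 and /srv/MM come from the report; /dev/sdX1 is a placeholder
for cmd in replacement_plan("/dev/sdc1", "/dev/sdX1", "/srv/MM"):
    print(" ".join(cmd))
```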

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
                        --- argc, argv, argh! ---                        


* Re: kernel 3.3.4 damages filesystem (?)
  2012-05-07 10:58 ` Fajar A. Nugraha
@ 2012-05-07 12:06   ` Helmut Hullen
  0 siblings, 0 replies; 54+ messages in thread
From: Helmut Hullen @ 2012-05-07 12:06 UTC (permalink / raw)
  To: linux-btrfs

Hello, Fajar,

You wrote on 07.05.12:

>> For some months I have been running btrfs under kernel 3.2.5 and 3.2.9, without
>> problems.
>>
>> Yesterday I compiled kernel 3.3.4, and this morning I started the
>> machine with this kernel. There may be some ugly problems.


>> Data, RAID0: total=5.29TB, used=4.29TB

> RAID0? Yikes!

Why not?
Do you know the price of one 3-TByte disk?
In this case the data can be reproduced.

[...]

>> May  7 07:15:19 Arktur kernel: sd 5:0:0:0: rejecting I/O to offline device
>> May  7 07:15:19 Arktur kernel: end_request: I/O error, dev sdf, sector 0
>> May  7 07:15:19 Arktur kernel: sd 5:0:0:0: rejecting I/O to offline device
>> May  7 07:15:19 Arktur kernel: lost page write due to I/O error on sdf1


> That looks like a bad disk to me, and it shouldn't be related to the
> kernel version you use.

But why does it happen just when I change the kernel?
(Yes - I know: Murphy works reliably ...)

> Your best chance might be:
> - unmount the fs
> - get another disk to replace /dev/sdf, copy the content over with
> dd_rescue. ATA resets can be a PITA, so you might be better off
> moving the failed disk to a USB external adapter, and doing some
> creative combination of plug-unplug and selectively skipping bad
> sectors manually (by passing "-s" to dd_rescue).

Hmmm - I'll give it a try ...


> - reboot, with the bad disk unplugged
> - (optional) run "btrfs scrub start" on the mounted filesystem (you
> might need to build btrfs-progs manually from git source).

Last time I tried this command (some months ago) it produced a
completely unusable system of disks/partitions ...


> or simply read the entire fs
> (e.g. using tar to /dev/null, or whatever). It should check the
> checksum of all files and print out which files are damaged (either
> in stdout or syslog).

And that's the other try - I had to use it for another disk (also WD,
but only 2 TByte - I could watch how it died ...).
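
A minimal sketch of the "simply read the entire fs" check quoted above, in Python instead of tar. It assumes that btrfs answers a read with an I/O error when a data checksum does not match and no good copy exists; the function name find_unreadable and the path in the usage comment are illustrative only and are not part of btrfs-progs.

```python
import os


def find_unreadable(root, bufsize=1 << 20):
    """Read every regular file under root and report those that fail.

    Returns a list of (path, errno) pairs for files whose contents could
    not be read completely.
    """
    damaged = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks, sockets, device nodes and the like.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                with open(path, "rb") as fh:
                    # Discard the data; only the read errors matter here.
                    while fh.read(bufsize):
                        pass
            except OSError as exc:
                damaged.append((path, exc.errno))
    return damaged


# Usage (hypothetical mount point):
#     for path, err in find_unreadable("/srv/MM"):
#         print(err, path)
```

Kernel messages about the failed reads would still end up in syslog; the list returned here only names the files affected.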

Best regards!
Helmut
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: kernel 3.3.4 damages filesystem (?)
  2012-05-07 10:59 ` Hugo Mills
@ 2012-05-07 12:15   ` Helmut Hullen
  2012-05-07 13:34   ` Helmut Hullen
  1 sibling, 0 replies; 54+ messages in thread
From: Helmut Hullen @ 2012-05-07 12:15 UTC (permalink / raw)
  To: linux-btrfs

Hello, Hugo,

You wrote on 07.05.12:

>> Yesterday I compiled kernel 3.3.4, and this morning I started the
>> machine with this kernel. There may be some ugly problems.
>>
>> Copying something into the btrfs "directory" worked well for some
>> files, and then I got error messages (I've not copied them,
>> something with "IO error" under Samba).

[...]

>> Data, RAID0: total=5.29TB, used=4.29TB
>> System, RAID1: total=8.00MB, used=352.00KB
>> System: total=4.00MB, used=0.00
>> Metadata, RAID1: total=149.00GB, used=5.00GB

>>
>> Label: 'MMedia'  uuid: 9adfdc84-0fbe-431b-bcb1-cabb6a915e91
>> 	Total devices 3 FS bytes used 4.29TB
>> 	devid    3 size 2.73TB used 1.98TB path /dev/sdi1
>> 	devid    2 size 2.73TB used 1.94TB path /dev/sdf1
>> 	devid    1 size 1.82TB used 1.63TB path /dev/sdc1
>>
>> Btrfs Btrfs v0.19
>>
>> =================== boot messages, kernel related ==============
>>
>> [boot with kernel 3.3.4]
>> May  7 06:55:26 Arktur kernel: ata5: exception Emask 0x10 SAct 0x0
>> SErr 0x10000 action 0xe frozen
>> May  7 06:55:26 Arktur kernel: ata5: SError: { PHYRdyChg }
>> May  7 06:55:26 Arktur kernel: ata5: hard resetting link

>    This is a hardware error. You have a device that's either dead or
> dying. (Given the number of errors, probably already dead).

It seems to be undecided which status it has ...

>> Can I repair the system? Or have I to copy it to a set of other
>> disks?

>    If you have RAID-1 or RAID-10 on both data and metadata, then you
> _should_ in theory just be able to remove the dead disk (physically),
> then btrfs dev add a new one, btrfs dev del missing, and balance.


I haven't - I have a kind of copy/backup in the neighbourhood.

Best regards!
Helmut


* Re: kernel 3.3.4 damages filesystem (?)
  2012-05-07 10:46 kernel 3.3.4 damages filesystem (?) Helmut Hullen
  2012-05-07 10:58 ` Fajar A. Nugraha
  2012-05-07 10:59 ` Hugo Mills
@ 2012-05-07 12:53 ` Liu Bo
  2012-05-09 17:32 ` Duncan
  3 siblings, 0 replies; 54+ messages in thread
From: Liu Bo @ 2012-05-07 12:53 UTC (permalink / raw)
  To: helmut; +Cc: linux-btrfs

On 05/07/2012 06:46 PM, Helmut Hullen wrote:

> btrfs: error reading free space cache
> BUG: unable to handle kernel NULL pointer dereference at 00000001
> IP: [<c1295c36>] io_ctl_drop_pages+0x26/0x50
> *pdpt = 0000000029712001 *pde = 0000000000000000
> Oops: 0002 [#1]



Could you please try this and show us the results?

diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index 202008e..ae514ad 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -296,7 +296,9 @@ static void io_ctl_free(struct io_ctl *io_ctl)
 static void io_ctl_unmap_page(struct io_ctl *io_ctl)
 {
 	if (io_ctl->cur) {
-		kunmap(io_ctl->page);
+		WARN_ON(!io_ctl->page);
+		if (io_ctl->page)
+			kunmap(io_ctl->page);
 		io_ctl->cur = NULL;
 		io_ctl->orig = NULL;
 	}


* Re: kernel 3.3.4 damages filesystem (?)
  2012-05-07 10:59 ` Hugo Mills
  2012-05-07 12:15   ` Helmut Hullen
@ 2012-05-07 13:34   ` Helmut Hullen
  2012-05-07 14:05     ` Hugo Mills
  1 sibling, 1 reply; 54+ messages in thread
From: Helmut Hullen @ 2012-05-07 13:34 UTC (permalink / raw)
  To: linux-btrfs

Hello, Hugo,

You wrote on 07.05.12:

>> =================== boot messages, kernel related ==============
>>
>> [boot with kernel 3.3.4]
>> May  7 06:55:26 Arktur kernel: ata5: exception Emask 0x10 SAct 0x0
>> SErr 0x10000 action 0xe frozen
>> May  7 06:55:26 Arktur kernel: ata5: SError: { PHYRdyChg }
>> May  7 06:55:26 Arktur kernel: ata5: hard resetting link

[...]

>    This is a hardware error. You have a device that's either dead or
> dying. (Given the number of errors, probably already dead).

It's dead - R.I.P.

I've tried it with a SATA-USB-adapter - that adapter produces dmesg  
lines when connecting or disconnecting.

And this special drive doesn't tell anything now. Shit.

Best regards!
Helmut


* Re: kernel 3.3.4 damages filesystem (?)
  2012-05-07 13:34   ` Helmut Hullen
@ 2012-05-07 14:05     ` Hugo Mills
  2012-05-07 16:36       ` Helmut Hullen
  0 siblings, 1 reply; 54+ messages in thread
From: Hugo Mills @ 2012-05-07 14:05 UTC (permalink / raw)
  To: helmut; +Cc: linux-btrfs


On Mon, May 07, 2012 at 03:34:00PM +0200, Helmut Hullen wrote:
> Hello, Hugo,
> 
> You wrote on 07.05.12:
> 
> >> =================== boot messages, kernel related ==============
> >>
> >> [boot with kernel 3.3.4]
> >> May  7 06:55:26 Arktur kernel: ata5: exception Emask 0x10 SAct 0x0
> >> SErr 0x10000 action 0xe frozen
> >> May  7 06:55:26 Arktur kernel: ata5: SError: { PHYRdyChg }
> >> May  7 06:55:26 Arktur kernel: ata5: hard resetting link
> 
> [...]
> 
> >    This is a hardware error. You have a device that's either dead or
> > dying. (Given the number of errors, probably already dead).
> 
> It's dead - R.I.P.
> 
> I've tried it with a SATA-USB-adapter - that adapter produces dmesg  
> lines when connecting or disconnecting.
> 
> And this special drive doesn't tell anything now. Shit.

   Sorry to be the bearer of bad news. I don't think we can point the
finger at btrfs here.

   It looks like you've lost most of your data -- losing a RAID-0
stripe across the whole FS isn't likely to have left much of it
intact. If you've got the space (or the money to get it), mkfs.btrfs
-m raid1 -d raid1 would have saved you here.

[ Incidentally, thinking about it, the failure coming at a kernel
   upgrade could well be down to the additional stress of the
   power-down/reboot finally pushing a bad drive over the edge. ]

   In sympathy,
   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
    --- But somewhere along the line, it seems / That pimp became ---    
                       cool,  and punk mainstream.                       


* Re: kernel 3.3.4 damages filesystem (?)
  2012-05-07 14:05     ` Hugo Mills
@ 2012-05-07 16:36       ` Helmut Hullen
  2012-05-07 17:13         ` Felix Blanke
  0 siblings, 1 reply; 54+ messages in thread
From: Helmut Hullen @ 2012-05-07 16:36 UTC (permalink / raw)
  To: linux-btrfs

Hello, Hugo,

You wrote on 07.05.12:

>> It's dead - R.I.P.

>    Sorry to be the bearer of bad news. I don't think we can point the
> finger at btrfs here.

a) you know what to do with the bearer?
b) I like such errors - completely independent, but simultaneously.

>    It looks like you've lost most of your data -- losing a RAID-0
> stripe across the whole FS isn't likely to have left much of it
> intact.

I'm just going back to ext4 - then one broken disk doesn't disturb the  
contents of the other disks.

The data is not very valuable - DVB video mpegs. Most of the files are  
repeated on and on.

> If you've got the space (or the money to get it), mkfs.btrfs
> -m raid1 -d raid1 would have saved you here.

About 400 ... 500 Euro for backing up videos? Not necessary.

(No: I don't count the minutes and hours working with the system ...)

> [ Incidentally, thinking about it, the failure coming at a kernel
>    upgrade could well be down to the additional stress of the
>    power-down/reboot finally pushing a bad drive over the edge. ]

Just now it's again an "open system"; I had to wobble the cables too ...

Maybe the SATA-PCI-controller needs to be replaced too ...

Best regards!
Helmut


* Re: kernel 3.3.4 damages filesystem (?)
  2012-05-07 16:36       ` Helmut Hullen
@ 2012-05-07 17:13         ` Felix Blanke
  2012-05-07 17:52           ` Helmut Hullen
  0 siblings, 1 reply; 54+ messages in thread
From: Felix Blanke @ 2012-05-07 17:13 UTC (permalink / raw)
  To: helmut; +Cc: Helmut Hullen, linux-btrfs

On 5/7/12 6:36 PM, Helmut Hullen wrote:
> Hello, Hugo,
>
> You wrote on 07.05.12:
>
>>> It's dead - R.I.P.
>
>>     Sorry to be the bearer of bad news. I don't think we can point the
>> finger at btrfs here.
>
> a) you know what to do with the bearer?
> b) I like such errors - completely independent, but simultaneously.
>
>>     It looks like you've lost most of your data -- losing a RAID-0
>> stripe across the whole FS isn't likely to have left much of it
>> intact.
>
> I'm just going back to ext4 - then one broken disk doesn't disturb the
> contents of the other disks.

?! If you use raid0, one broken disk will always disturb the contents of
the other disks; that is what raid0 does, no matter what filesystem you
use. You could easily use btrfs with the "normal" or raid1 mode. Btrfs is
still in development and often you can blame it for a corrupt
filesystem, but in this case it's simply "raid0 -> 1 disk dies -> data
is gone".

>
> The data is not very valuable - DVB video mpegs. Most of the files are
> repeated on and on.
>
>> If you've got the space (or the money to get it), mkfs.btrfs
>> -m raid1 -d raid1 would have saved you here.
>
> About 400 ... 500 Euro for backing up videos? Not necessary.
>
> (No: I don't count the minutes and hours working with the system ...)

>
>> [ Incidentally, thinking about it, the failure coming at a kernel
>>     upgrade could well be down to the additional stress of the
>>     power-down/reboot finally pushing a bad drive over the edge. ]
>
> Just now it's again an "open system"; I had to wobble the cables too ...
>
> Maybe the SATA-PCI-controller needs to be replaced too ...
>
> Best regards!
> Helmut


* Re: kernel 3.3.4 damages filesystem (?)
  2012-05-07 17:13         ` Felix Blanke
@ 2012-05-07 17:52           ` Helmut Hullen
  2012-05-07 18:00             ` Hugo Mills
  2012-05-07 19:30             ` kernel 3.3.4 damages filesystem (?) Daniel Lee
  0 siblings, 2 replies; 54+ messages in thread
From: Helmut Hullen @ 2012-05-07 17:52 UTC (permalink / raw)
  To: linux-btrfs

Hello, Felix,

You wrote on 07.05.12:

>> I'm just going back to ext4 - then one broken disk doesn't disturb
>> the contents of the other disks.

> ?! If you use raid0 one broken disk will always disturb the contents
> of the other disks, that is what raid0 does, no matter what
> filesystem you use.

Yes - I know. But btrfs promises that I can add bigger disks and delete
smaller disks "on the fly". For something like a video collection which
will grow on and on, that is an interesting feature. And such a (big)
collection doesn't need a "grandfather-father-son" backup; it's not
critical data.

With a file system like ext2/3/4 I can work with several directories  
which are mounted together, but (as said before) one broken disk doesn't  
disturb the others.

Best regards!
Helmut


* Re: kernel 3.3.4 damages filesystem (?)
  2012-05-07 17:52           ` Helmut Hullen
@ 2012-05-07 18:00             ` Hugo Mills
  2012-05-07 18:25               ` Helmut Hullen
  2012-05-09 14:25               ` Helmut Hullen
  2012-05-07 19:30             ` kernel 3.3.4 damages filesystem (?) Daniel Lee
  1 sibling, 2 replies; 54+ messages in thread
From: Hugo Mills @ 2012-05-07 18:00 UTC (permalink / raw)
  To: helmut; +Cc: linux-btrfs


On Mon, May 07, 2012 at 07:52:00PM +0200, Helmut Hullen wrote:
> Hello, Felix,
> 
> You wrote on 07.05.12:
> 
> >> I'm just going back to ext4 - then one broken disk doesn't disturb
> >> the contents of the other disks.
> 
> > ?! If you use raid0 one broken disk will always disturb the contents
> > of the other disks, that is what raid0 does, no matter what
> > filesystem you use.
> 
> Yes - I know. But btrfs promises that I can add bigger disks and delete
> smaller disks "on the fly". For something like a video collection which
> will grow on and on, that is an interesting feature. And such a (big)
> collection doesn't need a "grandfather-father-son" backup; it's not
> critical data.
> 
> With a file system like ext2/3/4 I can work with several directories  
> which are mounted together, but (as said before) one broken disk doesn't  
> disturb the others.

   mkfs.btrfs -m raid1 -d single should give you that.

   There may be a kernel patch you need to stop it doing the silly
single → raid0 "upgrade" automatically, as well.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
                       ---   __(_'>  Squeak!   ---                       


* Re: kernel 3.3.4 damages filesystem (?)
  2012-05-07 18:00             ` Hugo Mills
@ 2012-05-07 18:25               ` Helmut Hullen
  2012-05-07 18:44                 ` Hugo Mills
  2012-05-09 14:25               ` Helmut Hullen
  1 sibling, 1 reply; 54+ messages in thread
From: Helmut Hullen @ 2012-05-07 18:25 UTC (permalink / raw)
  To: linux-btrfs

Hello, Hugo,

You wrote on 07.05.12:

>> With a file system like ext2/3/4 I can work with several directories
>> which are mounted together, but (as said before) one broken disk
>> doesn't disturb the others.

>    mkfs.btrfs -m raid1 -d single should give you that.

What's the difference to

     mkfs.btrfs -m raid1 -d raid0

(what I have used the last time)?

Best regards!
Helmut


* Re: kernel 3.3.4 damages filesystem (?)
  2012-05-07 18:25               ` Helmut Hullen
@ 2012-05-07 18:44                 ` Hugo Mills
  2012-05-09 13:04                   ` failed disk (was: kernel 3.3.4 damages filesystem (?)) Helmut Hullen
  0 siblings, 1 reply; 54+ messages in thread
From: Hugo Mills @ 2012-05-07 18:44 UTC (permalink / raw)
  To: helmut; +Cc: linux-btrfs


On Mon, May 07, 2012 at 08:25:00PM +0200, Helmut Hullen wrote:
> Hello, Hugo,
> 
> You wrote on 07.05.12:
> 
> >> With a file system like ext2/3/4 I can work with several directories
> >> which are mounted together, but (as said before) one broken disk
> >> doesn't disturb the others.
> 
> >    mkfs.btrfs -m raid1 -d single should give you that.
> 
> What's the difference to
> 
>      mkfs.btrfs -m raid1 -d raid0

 - RAID-0 stripes each piece of data across all the disks.
 - single puts data on one disk at a time.

   So, on three disks (each disk running horizontally), the FS will
allocate block groups this way for RAID-0:

Disk 1:   | A1 | B1 | C1 |...
Disk 2:   | A2 | B2 | C2 |...
Disk 3:   | A3 | B3 | C3 |...

where each chunk, e.g. A2, is 1G in size. Then data is striped across
all of the An chunks (a single block group of size 3G) in 64k
sub-stripes, until block group A is filled up, and then it'll move on
to another block group.

   For "single" allocation on the same disks, you will instead get:

Disk 1:  | A  | D  | G  |...
Disk 2:  | B  | E  | H  |...
Disk 3:  | C  | F  | I  |...

where, again, each chunk is 1G in size. Data written to the FS will
live in one of the chunks, overflowing to some other chunk when
there's no more space.

   With large files, you've still got a chance that (some of) the data
from the file will be on more than one disk, but it's a much much
better situation than you'd have with RAID-0.
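
The difference between the two layouts can be sketched with a toy model in Python. It is only an illustration under stated assumptions: files are laid out back to back, consecutive 64k sub-stripes (RAID-0) or 1G chunks ("single") rotate over the disks exactly as in the diagrams above, and metadata stays readable. The real allocator is free to place chunks differently, and the helper names are made up for this example.

```python
STRIPE = 64 * 1024      # 64k sub-stripe, as described above
CHUNK = 1024 ** 3       # 1G chunk, as described above
MIB = 1024 ** 2


def disks_touched(offset, length, unit, ndisks):
    """Return the set of disks holding any byte of [offset, offset + length).

    Consecutive units of size `unit` are assumed to rotate over the disks:
    unit=STRIPE models RAID-0, unit=CHUNK models "single".
    """
    first = offset // unit
    last = (offset + length - 1) // unit
    count = min(last - first + 1, ndisks)
    return {(first + i) % ndisks for i in range(count)}


def readable_files(sizes, failed_disk, unit, ndisks=3):
    """Count files that have no data at all on the failed disk."""
    offset = 0
    readable = 0
    for size in sizes:
        if failed_disk not in disks_touched(offset, size, unit, ndisks):
            readable += 1
        offset += size
    return readable


files = [256 * MIB] * 12    # twelve 256 MiB recordings, 3 GiB in total
print("raid0 :", readable_files(files, failed_disk=1, unit=STRIPE), "of", len(files))
print("single:", readable_files(files, failed_disk=1, unit=CHUNK), "of", len(files))
```

With these numbers the model leaves no file intact under RAID-0 and eight of twelve intact under "single", since only the four files living in the chunk on the failed disk are hit.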

   Of course, you still need RAID-1 metadata, so that when a disk does
go bang, you still have all the filesystem structures you need to read
the remaining data. :)

   In fact, this is probably a good argument for having the option to
put back the old allocator algorithm, which would have ensured that
the first disk would fill up completely first before it touched the
next one...

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
          --- ...  one ping(1) to rule them all, and in the ---          
                         darkness bind(2) them.                          


* Re: kernel 3.3.4 damages filesystem (?)
  2012-05-07 17:52           ` Helmut Hullen
  2012-05-07 18:00             ` Hugo Mills
@ 2012-05-07 19:30             ` Daniel Lee
  2012-05-07 20:21               ` Helmut Hullen
  1 sibling, 1 reply; 54+ messages in thread
From: Daniel Lee @ 2012-05-07 19:30 UTC (permalink / raw)
  To: linux-btrfs

On 05/07/2012 10:52 AM, Helmut Hullen wrote:
> Hello, Felix,
>
> You wrote on 07.05.12:
>
>>> I'm just going back to ext4 - then one broken disk doesn't disturb
>>> the contents of the other disks.
>
>> ?! If you use raid0 one broken disk will always disturb the contents
>> of the other disks, that is what raid0 does, no matter what
>> filesystem you use.
>
> Yes - I know. But btrfs promises that I can add bigger disks and delete
> smaller disks "on the fly". For something like a video collection which
> will grow on and on, that is an interesting feature. And such a (big)
> collection doesn't need a "grandfather-father-son" backup; it's not
> critical data.
>
> With a file system like ext2/3/4 I can work with several directories
> which are mounted together, but (as said before) one broken disk doesn't
> disturb the others.
>

How can you do that with ext2/3/4? If you mean create several different 
filesystems and mount them separately then that's very different from 
your current situation. What you did in this case is comparable to 
creating a raid0 array out of your disks. I don't see how an ext 
filesystem is going to work any better if one of the disks drops out 
than with a btrfs filesystem. Using -d single isn't going to be of much 
use in this case either because that's like spanning an LVM volume over 
several disks and then putting ext over that, it's pretty 
nondeterministic how much you'll actually save should a large chunk of 
the filesystem suddenly disappear.

It sounds like what you're thinking of is creating several separate ext 
filesystems and then just mounting them separately. There's nothing 
inherently special about doing this with ext, you can do the same 
thing with btrfs and it would amount to about the same level of 
protection (potentially more if you consider [meta]data checksums 
important but potentially less if you feel that ext is more robust for 
whatever reason).

If you want to survive losing a single disk without the (absolute) fear 
of the whole filesystem breaking you have to have some sort of 
redundancy either by separating filesystems or using some version of 
raid other than raid0. I suppose the volume management of btrfs is sort 
of confusing at the moment but when btrfs promises you can remove disks 
"on the fly" it doesn't mean you can just unplug disks from a raid0 
without telling btrfs to put that data elsewhere first.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: kernel 3.3.4 damages filesystem (?)
  2012-05-07 19:30             ` kernel 3.3.4 damages filesystem (?) Daniel Lee
@ 2012-05-07 20:21               ` Helmut Hullen
  2012-05-07 20:51                 ` Daniel Lee
  2012-05-07 22:07                 ` Martin Steigerwald
  0 siblings, 2 replies; 54+ messages in thread
From: Helmut Hullen @ 2012-05-07 20:21 UTC (permalink / raw)
  To: linux-btrfs

Hello, Daniel,

You wrote on 07.05.12:

>> Yes - I know. But btrfs promises that I can add bigger disks and
>> delete smaller disks "on the fly". For something like a video
>> collection which will grow on and on, that is an interesting feature.
>> And such a (big) collection doesn't need a "grandfather-father-son"
>> backup; it's not critical data.
>>
>> With a file system like ext2/3/4 I can work with several directories
>> which are mounted together, but (as said before) one broken disk
>> doesn't disturb the others.

> How can you do that with ext2/3/4? If you mean create several
> different filesystems and mount them separately then that's very
> different from your current situation. What you did in this case is
> comparable to creating a raid0 array out of your disks. I don't see
> how an ext filesystem is going to work any better if one of the disks
> drops out than with a btrfs filesystem.

  mkfs.btrfs  -m raid1 -d raid0

with 3 disks gives me a "cluster" which looks like 1 disk/partition/ 
directory.
If one disk fails nothing is usable.

(Yes - I've read Hugo's explanation of "-d single", I'll try this way)

With ext2/3/4 I mount 2 disks/partitions into the first disk. If one
disk fails, the contents of the 2 other disks are still readable.

> It sounds like what you're thinking of is creating several separate
> ext filesystems and then just mounting them separately.

Yes - that's the old way. It's reliable but "ugly".

> There's nothing inherently special about doing this with ext, you can
> do the same thing with btrfs and it would amount to about the same
> level of protection (potentially more if you consider [meta]data
> checksums important but potentially less if you feel that ext is more
> robust for whatever reason).

No - as just mentioned: there's a big difference when one disk fails.

> If you want to survive losing a single disk without the (absolute)
> fear of the whole filesystem breaking you have to have some sort of
> redundancy either by separating filesystems or using some version of
> raid other than raid0.

No - for some years I have used a kind of outsourced backup. A copy of
all data is on a bundle of disks somewhere in the neighbourhood. As
mentioned: the data isn't business critical, it's just "nice to have".
It's not worth something like raid1 or so (with twice the cost of a
non-raid solution).

> I suppose the volume management of btrfs is
> sort of confusing at the moment but when btrfs promises you can
> remove disks "on the fly" it doesn't mean you can just unplug disks
> from a raid0 without telling btrfs to put that data elsewhere first.

No - it's not confusing. It only needs a kind of recipe and much time:

        btrfs device add ...
        btrfs filesystem balance ... (perhaps not necessary)
        btrfs device delete ...
        btrfs filesystem balance ... (perhaps not necessary)

No intellectual challenge.
And completely different to "hot pluggable".
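
The recipe above can be written down as a small dry-run helper in Python. This is a sketch, not a tested procedure: it only builds and prints the command lines in the order given above and runs nothing. The function name and the device /dev/sdX1 are placeholders; /srv/MM and /dev/sdc1 are taken from the configuration posted at the start of the thread, and the command spellings follow the recipe (other btrfs-progs versions may spell the balance step differently).

```python
def swap_device_commands(mountpoint, new_dev, old_dev, rebalance=False):
    """Return the command lines for replacing old_dev by new_dev, in order.

    The balance steps are optional, matching the "perhaps not necessary"
    remarks in the recipe.
    """
    cmds = [["btrfs", "device", "add", new_dev, mountpoint]]
    if rebalance:
        cmds.append(["btrfs", "filesystem", "balance", mountpoint])
    cmds.append(["btrfs", "device", "delete", old_dev, mountpoint])
    if rebalance:
        cmds.append(["btrfs", "filesystem", "balance", mountpoint])
    return cmds


# Dry run: print the commands instead of executing them.
for argv in swap_device_commands("/srv/MM", "/dev/sdX1", "/dev/sdc1"):
    print(" ".join(argv))
```

All of these steps work on a mounted filesystem whose devices are still present and readable, which is the point of the "on the fly" promise and also why it does not help once a disk has already died.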

Best regards!
Helmut


* Re: kernel 3.3.4 damages filesystem (?)
  2012-05-07 20:21               ` Helmut Hullen
@ 2012-05-07 20:51                 ` Daniel Lee
  2012-05-07 21:17                   ` Helmut Hullen
  2012-05-07 22:07                 ` Martin Steigerwald
  1 sibling, 1 reply; 54+ messages in thread
From: Daniel Lee @ 2012-05-07 20:51 UTC (permalink / raw)
  To: linux-btrfs

On 05/07/2012 01:21 PM, Helmut Hullen wrote:
> Hello, Daniel,
>
> You wrote on 07.05.12:
>
>>> Yes - I know. But btrfs promises that I can add bigger disks and
>>> delete smaller disks "on the fly". For something like a video
>>> collection which will grow on and on, that is an interesting feature.
>>> And such a (big) collection doesn't need a "grandfather-father-son"
>>> backup; it's not critical data.
>>>
>>> With a file system like ext2/3/4 I can work with several directories
>>> which are mounted together, but (as said before) one broken disk
>>> doesn't disturb the others.
>
>> How can you do that with ext2/3/4? If you mean create several
>> different filesystems and mount them separately then that's very
>> different from your current situation. What you did in this case is
>> comparable to creating a raid0 array out of your disks. I don't see
>> how an ext filesystem is going to work any better if one of the disks
>> drops out than with a btrfs filesystem.
>
>    mkfs.btrfs  -m raid1 -d raid0
>
> with 3 disks gives me a "cluster" which looks like 1 disk/partition/
> directory.
> If one disk fails nothing is usable.

How is that different from putting ext on top of a raid0?

>
> (Yes - I've read Hugo's explanation of "-d single", I'll try this way)
>
> With ext2/3/4 I mount 2 disks/partitions into the first disk. If one
> disk fails the contents of the 2 other disks is still readable,

There is nothing that prevents you from using this strategy with btrfs.

>
>> It sounds like what you're thinking of is creating several separate
>> ext filesystems and then just mounting them separately.
>
> Yes - that's the old way. It's reliable but "ugly".
>
>> There's nothing inherently special about doing this with ext, you can
>> do the same thing with btrfs and it would amount to about the same
>> level of protection (potentially more if you consider [meta]data
>> checksums important but potentially less if you feel that ext is more
>> robust for whatever reason).
>
> No - as just mentioned: there's a big difference when one disk fails.

No there isn't.

>
>> If you want to survive losing a single disk without the (absolute)
>> fear of the whole filesystem breaking you have to have some sort of
>> redundancy either by separating filesystems or using some version of
>> raid other than raid0.
>
> No - since some years I use a kind of outsourced backup. A copy of all
> data is on a bundle of disks somewhere in the neighbourhood. As
> mentioned: the data isn't business critical, it's just "nice to have".
> It's not worth something like raid1 or so (with twice the costs of a non
> raid solution).
>
>> I suppose the volume management of btrfs is
>> sort of confusing at the moment but when btrfs promises you can
>> remove disks "on the fly" it doesn't mean you can just unplug disks
>> from a raid0 without telling btrfs to put that data elsewhere first.
>
> No - it's not confusing. It only needs a kind of recipe and much time:
>
>          btrfs device add ...
>          btrfs filesystem balance ... (perhaps no necessary)
>          btrfs device delete ...
>          btrfs filesystem balance ... (perhaps not necessary)
>
> No intellectual challenge.
> And completely different to "hot pluggable".

This is no different to any raid0 or spanning disk setup that allows 
growing/shrinking of the array.



* Re: kernel 3.3.4 damages filesystem (?)
  2012-05-07 20:51                 ` Daniel Lee
@ 2012-05-07 21:17                   ` Helmut Hullen
  2012-05-07 21:27                     ` cwillu
  0 siblings, 1 reply; 54+ messages in thread
From: Helmut Hullen @ 2012-05-07 21:17 UTC (permalink / raw)
  To: linux-btrfs

Hello, Daniel,

You wrote on 07.05.12:

>>    mkfs.btrfs  -m raid1 -d raid0
>>
>> with 3 disks gives me a "cluster" which looks like 1 disk/partition/
>> directory.
>> If one disk fails nothing is usable.

> How is that different from putting ext on top of a raid0?

Classic raid0 doesn't allow deleting/removing disks from a cluster.

>> With ext2/3/4 I mount 2 disks/partitions into the first disk. If one
>> disk fails the contents of the 2 other disks is still readable,

> There is nothing that prevents you from using this strategy with
> btrfs.

How?
I've tried many installations of btrfs, sometimes 1 disk failed, and  
then the data on all other disks was inaccessible.

Best regards!
Helmut


* Re: kernel 3.3.4 damages filesystem (?)
  2012-05-07 21:17                   ` Helmut Hullen
@ 2012-05-07 21:27                     ` cwillu
  0 siblings, 0 replies; 54+ messages in thread
From: cwillu @ 2012-05-07 21:27 UTC (permalink / raw)
  To: helmut; +Cc: linux-btrfs

On Mon, May 7, 2012 at 3:17 PM, Helmut Hullen <Hullen@t-online.de> wrote:
> Hello, Daniel,
>
> You wrote on 07.05.12:
>
>>>    mkfs.btrfs  -m raid1 -d raid0
>>>
>>> with 3 disks gives me a "cluster" which looks like 1 disk/partition/
>>> directory.
>>> If one disk fails nothing is usable.
>
>> How is that different from putting ext on top of a raid0?
>
> Classic raid0 doesn't allow deleting/removing disks from a cluster.
>
>>> With ext2/3/4 I mount 2 disks/partitions into the first disk. If one
>>> disk fails the contents of the 2 other disks is still readable,
>
>> There is nothing that prevents you from using this strategy with
>> btrfs.
>
> How?
> I've tried many installations of btrfs, sometimes 1 disk failed, and
> then the data on all other disks was inaccessible.

"With ext2/3/4 I mount 2 disks/partitions into the first disk. If one
disk fails the contents of the 2 other disks is still readable,"

There's nothing stopping you from using 3 btrfs filesystems mounted in
the same way as you would 3 ext4 filesystems.

* Re: kernel 3.3.4 damages filesystem (?)
  2012-05-07 20:21               ` Helmut Hullen
  2012-05-07 20:51                 ` Daniel Lee
@ 2012-05-07 22:07                 ` Martin Steigerwald
  2012-05-08  7:39                   ` Helmut Hullen
  1 sibling, 1 reply; 54+ messages in thread
From: Martin Steigerwald @ 2012-05-07 22:07 UTC (permalink / raw)
  To: linux-btrfs, helmut

On Monday, 7 May 2012, Helmut Hullen wrote:
> > If you want to survive losing a single disk without the (absolute)
> > fear of the whole filesystem breaking you have to have some sort of
> > redundancy either by separating filesystems or using some version of
> > raid other than raid0.
> 
> No - since some years I use a kind of outsourced backup. A copy of
> all   data is on a bundle of disks somewhere in the neighbourhood. As
> mentioned: the data isn't business critical, it's just "nice to
> have". It's not worth something like raid1 or so (with twice the costs
> of a non raid solution).

That's not true when you use BTRFS RAID1 with three disks. BTRFS will only
store each chunk on two different drives then, not on all three. So it is
not twice the cost but, given all three drives have the same capacity,
about one and a half times the cost.
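
A rough model of the capacity arithmetic, as a Python sketch. It assumes only what is said above, namely that each chunk is stored on exactly two different drives, and it ignores metadata overhead and chunk granularity. Under that assumption usable space is limited both by half of the raw total and by how much the other drives can mirror for the largest one. The function name is made up for this example.

```python
def raid1_usable(sizes):
    """Usable space when every chunk is kept on two different devices."""
    total = sum(sizes)
    return min(total / 2, total - max(sizes))


print(raid1_usable([3.0, 3.0, 3.0]))        # three equal 3 TB drives
print(raid1_usable([2.73, 2.73, 1.82]))     # drive sizes from this thread, in TB
```

In this model three equal 3 TB drives hold 4.5 TB, that is one and a half drives' worth of data, and the three drives from this thread would hold roughly 3.64 TB instead of the 4.29 TB currently in use.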

Consider the time to recover the files from the outsourced backup. Maybe it 
does make up the money you would have to spend for one additional 
harddisk.

Anyway, I agree with the others responding to your post that this one 
harddisk died and I do not see a kernel version related issue. Any striped 
RAID 0 would have failed in that case.

And you can use three BTRFS filesystems the same way as three Ext4 
filesystems if you prefer such a setup if the time spent for restoring the 
backup does not make up the cost for one additional disk for you.

-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: kernel 3.3.4 damages filesystem (?)
  2012-05-07 22:07                 ` Martin Steigerwald
@ 2012-05-08  7:39                   ` Helmut Hullen
  2012-05-08  7:44                     ` Fajar A. Nugraha
  0 siblings, 1 reply; 54+ messages in thread
From: Helmut Hullen @ 2012-05-08  7:39 UTC (permalink / raw)
  To: linux-btrfs

Hello, Martin,

You wrote on 08.05.12:

>> No - for some years I have used a kind of outsourced backup. A copy of
>> all data is on a bundle of disks somewhere in the neighbourhood.
>> As mentioned: the data isn't business critical, it's just "nice to
>> have". It's not worth something like raid1 or so (with twice the
>> costs of a non raid solution).

> That's not true when you use BTRFS RAID1 with three disks. BTRFS will
> then store each chunk on only two different drives, not on all three.
> So it is not twice the cost but, given that all three drives have the
> same capacity, about one and a half times the cost.

> Consider the time to recover the files from the outsourced backup.
> Maybe it does make up the money you would have to spend for one
> additional harddisk.

I have considered it, many times. And the result is unchanged: no RAID1.  
It doesn't replace a real backup.

> Anyway, I agree with the others responding to your post that this one
> harddisk died and I do not see a kernel version related issue. Any
> striped RAID 0 would have failed in that case.

Yes - I wrote yesterday that the disk is dead. One of three disks.
I'm in the process of restoring the three disks from backup.

> And you can use three BTRFS filesystems the same way as three Ext4
> filesystems if you prefer such a setup if the time spent for
> restoring the backup does not make up the cost for one additional
> disk for you.

But where's the gain? If a disk fails I have a lot of tools for  
repairing an ext2/3/4 system.

Best regards!
Helmut

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: kernel 3.3.4 damages filesystem (?)
  2012-05-08  7:39                   ` Helmut Hullen
@ 2012-05-08  7:44                     ` Fajar A. Nugraha
  2012-05-08 10:00                       ` Helmut Hullen
  0 siblings, 1 reply; 54+ messages in thread
From: Fajar A. Nugraha @ 2012-05-08  7:44 UTC (permalink / raw)
  To: helmut; +Cc: linux-btrfs

On Tue, May 8, 2012 at 2:39 PM, Helmut Hullen <Hullen@t-online.de> wrote:

>> And you can use three BTRFS filesystems the same way as three Ext4
>> filesystems if you prefer such a setup if the time spent for
>> restoring the backup does not make up the cost for one additional
>> disk for you.
>
> But where's the gain? If a disk fails I have a lot of tools for
> repairing an ext2/3/4 system.

It won't work if you use it in RAID0 (e.g. with LVM spanning three
disks, then use ext4 on top of the LV). Which is basically the same
thing that you did (using btrfs in raid0 mode).

As others said, if your only concern is "if a disk is dead, I want to
be able to access data on other disks", then simply use btrfs as three
different fs, mounted on three directories.

btrfs will shine when:
- you need checksum and self-healing in raid10 mode
- you have lots of small files
- you have highly compressible content
- you need snapshot/clone feature

Since you don't need any of these, IMHO it's actually better if you just use ext4.

-- 
Fajar

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: kernel 3.3.4 damages filesystem (?)
  2012-05-08  7:44                     ` Fajar A. Nugraha
@ 2012-05-08 10:00                       ` Helmut Hullen
  2012-05-08 10:41                         ` Clemens Eisserer
  2012-05-08 21:42                         ` Hubert Kario
  0 siblings, 2 replies; 54+ messages in thread
From: Helmut Hullen @ 2012-05-08 10:00 UTC (permalink / raw)
  To: linux-btrfs

Hello, Fajar,

You wrote on 08.05.12:

>>> And you can use three BTRFS filesystems the same way as three Ext4
>>> filesystems if you prefer such a setup if the time spent for
>>> restoring the backup does not make up the cost for one additional
>>> disk for you.
>>
>> But where's the gain? If a disk fails I have a lot of tools for
>> repairing an ext2/3/4 system.

> It won't work if you use it in RAID0 (e.g. with LVM spanning three
> disks, then use ext4 on top of the LV).

But when I use ext2/3/4 I neither need RAID0 nor do I need LVM.

> As others said, if your only concern is "if a disk is dead, I want to
> be able to access data on other disks", then simply use btrfs as
> three different fs, mounted on three directories.

But then I don't particularly need btrfs.

> btrfs will shine when:
> - you need checksum and self-healing in raid10 mode
> - you have lots of small files
> - you have highly compressible content
> - you need snapshot/clone feature

For my video collection (mpeg2) nothing fits ...

The only advantage I see with btrfs is

        adding a bigger disk
        deleting/removing a smaller disk

with really simple commands.

Best regards!
Helmut

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: kernel 3.3.4 damages filesystem (?)
  2012-05-08 10:00                       ` Helmut Hullen
@ 2012-05-08 10:41                         ` Clemens Eisserer
  2012-05-08 13:13                           ` Helmut Hullen
  2012-05-08 21:42                         ` Hubert Kario
  1 sibling, 1 reply; 54+ messages in thread
From: Clemens Eisserer @ 2012-05-08 10:41 UTC (permalink / raw)
  To: helmut, linux-btrfs

Hi Helmut,

>> But where's the gain? If a disk fails I have a lot of tools for
>> repairing an ext2/3/4 system.

Nope, when a disk in your ext4 raid0 array fails, you are just as doomed.


> But when I use ext2/3/4 I neither need RAID0 nor do I need LVM.

You can use btrfs, without using its raid capabilities.
Face it, you used an experimental filesystem and you configured it the
wrong way.
Btrfs is not the one to blame here.

- Clemens

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: kernel 3.3.4 damages filesystem (?)
  2012-05-08 10:41                         ` Clemens Eisserer
@ 2012-05-08 13:13                           ` Helmut Hullen
  2012-05-08 13:44                             ` Felix Blanke
  0 siblings, 1 reply; 54+ messages in thread
From: Helmut Hullen @ 2012-05-08 13:13 UTC (permalink / raw)
  To: linux-btrfs

Hello, Clemens,

You wrote on 08.05.12:

>>> But where's the gain? If a disk fails I have a lot of tools for
>>> repairing an ext2/3/4 system.

> Nope, when a disk in your ext4 raid0 array fails, you are just as
> doomed.

Why should I use RAID0 with a bundle of ext2/3/4? Mounting on/in the  
directory tree does the job.

Best regards!
Helmut

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: kernel 3.3.4 damages filesystem (?)
  2012-05-08 13:13                           ` Helmut Hullen
@ 2012-05-08 13:44                             ` Felix Blanke
  2012-05-08 13:52                               ` Hugo Mills
  2012-05-08 16:53                               ` Helmut Hullen
  0 siblings, 2 replies; 54+ messages in thread
From: Felix Blanke @ 2012-05-08 13:44 UTC (permalink / raw)
  To: helmut; +Cc: Helmut Hullen, linux-btrfs

On 5/8/12 3:13 PM, Helmut Hullen wrote:
> Hello, Clemens,
>
> You wrote on 08.05.12:
>
>>>> But where's the gain? If a disk fails I have a lot of tools for
>>>> repairing an ext2/3/4 system.
>
>> Nope, when a disk in your ext4 raid0 array fails, you are just as
>> doomed.
>
> Why should I use RAID0 with a bundle of ext2/3/4? Mounting on/in the
> directory tree does the job.

Nobody told you that you should do it. What EVERYBODY here is telling
you: The problem you have right now would be the same damn problem, no
matter what fs you use. Every fs will be unusable if you lose one
disk in a raid0 setup. That's all we have been trying to tell you for the
last 15 mails :)

If you don't see any benefit in using btrfs then simply don't use it :)
Again: You misconfigured your fs if you never wanted to use raid0. Don't
blame the fs, blame yourself.

>
> Best regards!
> Helmut

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: kernel 3.3.4 damages filesystem (?)
  2012-05-08 13:44                             ` Felix Blanke
@ 2012-05-08 13:52                               ` Hugo Mills
  2012-05-08 16:53                               ` Helmut Hullen
  1 sibling, 0 replies; 54+ messages in thread
From: Hugo Mills @ 2012-05-08 13:52 UTC (permalink / raw)
  To: Felix Blanke; +Cc: helmut, Helmut Hullen, linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 1137 bytes --]

On Tue, May 08, 2012 at 03:44:12PM +0200, Felix Blanke wrote:
> On 5/8/12 3:13 PM, Helmut Hullen wrote:
> >Hello, Clemens,
> >
> >You wrote on 08.05.12:
> >
> >>>>But where's the gain? If a disk fails I have a lot of tools for
> >>>>repairing an ext2/3/4 system.
> >
> >>Nope, when a disk in your ext4 raid0 array fails, you are just as
> >>doomed.
> >
> >Why should I use RAID0 with a bundle of ext2/3/4? Mounting on/in the
> >directory tree does the job.
> 
> Nobody told you that you should do it. What EVERYBODY here is
> telling you: The problem you have right now would be the same damn
> problem, no matter what fs you use. Every fs will be unusable
> if you lose one disk in a raid0 setup. That's all we have been trying
> to tell you for the last 15 mails :)

   I think he's got the point by now. Can we stop this thread now,
please? It doesn't seem to be serving any further purpose.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
             --- No names... I want to remain anomalous. ---             

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 190 bytes --]

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: kernel 3.3.4 damages filesystem (?)
  2012-05-08 13:44                             ` Felix Blanke
  2012-05-08 13:52                               ` Hugo Mills
@ 2012-05-08 16:53                               ` Helmut Hullen
  2012-05-08 17:24                                 ` Felix Blanke
  1 sibling, 1 reply; 54+ messages in thread
From: Helmut Hullen @ 2012-05-08 16:53 UTC (permalink / raw)
  To: linux-btrfs

Hello, Felix,

You wrote on 08.05.12:

>> Why should I use RAID0 with a bundle of ext2/3/4? Mounting on/in the
>> directory tree does the job.

> Nobody told you that you should do it. What EVERYBODY here is telling
> you: The problem you have right now would be the same damn problem,
> no matter what fs you use. Every fs will be unusable if you
> lose one disk in a raid0 setup. That's all we have been trying to tell
> you for the last 15 mails :)

> If you don't see any benefit in using btrfs then simply don't use it

I still hope for a benefit when I use btrfs.

As I've written many times: I want a system for my video collection  
which allows

        adding a bigger disk
        deleting/removing a smaller disk

with simple commands.

btrfs seems to be able to do that (and I have tested this job many  
times). But with my configuration "mkfs.btrfs -m raid1 -d raid0" I've  
(again) seen that all data vanishes when 1 disk fails.

I'll try Hugo's proposal "mkfs.btrfs -m raid1 -d single".
And I hope that it doesn't make all disks unreadable when 1 disk fails.

Best regards!
Helmut

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: kernel 3.3.4 damages filesystem (?)
  2012-05-08 16:53                               ` Helmut Hullen
@ 2012-05-08 17:24                                 ` Felix Blanke
  2012-05-08 18:29                                   ` Helmut Hullen
  0 siblings, 1 reply; 54+ messages in thread
From: Felix Blanke @ 2012-05-08 17:24 UTC (permalink / raw)
  To: helmut; +Cc: Helmut Hullen, linux-btrfs

On 5/8/12 6:53 PM, Helmut Hullen wrote:
 > Hello, Felix,
 >
 > You wrote on 08.05.12:
 >
 >>> Why should I use RAID0 with a bundle of ext2/3/4? Mounting on/in the
 >>> directory tree does the job.
 >
 >> Nobody told you that you should do it. What EVERYBODY here is telling
 >> you: The problem you have right now would be the same damn problem,
 >> no matter what fs you use. Every fs will be unusable if you
 >> lose one disk in a raid0 setup. That's all we have been trying to tell
 >> you for the last 15 mails :)
 >
 >> If you don't see any benefit in using btrfs then simply don't use it
 >
 > I still hope for a benefit when I use btrfs.
 >
 > As I've written many times: I want a system for my video collection
 > which allows
 >
 >          adding a bigger disk
 >          deleting/removing a smaller disk
 >
 > with simple commands.
 >
 > btrfs seems to be able to do that (and I have tested this job many
 > times). But with my configuration "mkfs.btrfs -m raid1 -d raid0" I've
 > (again) seen that all data vanishes when 1 disk fails.
 >
 > I'll try Hugo's proposal "mkfs.btrfs -m raid1 -d single".
 > And I hope that it doesn't make all disks unreadable when 1 disk fails.

Maybe you should inform yourself about the different raid levels before
you use them?

http://en.wikipedia.org/wiki/Standard_RAID_levels#RAID_0

Raid0 will always be that way: One disk dies, filesystem is gone.
That's more or less the definition of raid0 :)
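
To make the raid0 point concrete, here is a minimal Python sketch of striping. It is only a model and not btrfs code: the 64 KiB stripe length and the function name `raid0_devices_touched` are assumptions chosen for illustration. It maps a byte range to the set of disks that hold part of it.

```python
STRIPE_LEN = 64 * 1024  # assumed stripe length in bytes (illustrative only)


def raid0_devices_touched(offset, length, num_devices, stripe_len=STRIPE_LEN):
    """Return the set of device indexes holding bytes [offset, offset + length)."""
    if length <= 0:
        return set()
    first_stripe = offset // stripe_len
    last_stripe = (offset + length - 1) // stripe_len
    touched = set()
    for stripe in range(first_stripe, last_stripe + 1):
        # Stripes are dealt out round-robin across the devices.
        touched.add(stripe % num_devices)
        if len(touched) == num_devices:
            break  # every device is already involved
    return touched
```

In this model a 4 KiB file sits on one disk, but a 1 MiB file on three disks already has pieces on all three, so losing any single disk damages it. Video files are far larger than that.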

@"-d single"

Is it really possible to remove a disk from btrfs (created with -d 
single) without losing the data on that disk? Is there a way to tell 
balance to copy all the data from this disk to the other disks (ofc if 
there is enough free space on them)?

 >
 > Best regards!
 > Helmut

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: kernel 3.3.4 damages filesystem (?)
  2012-05-08 17:24                                 ` Felix Blanke
@ 2012-05-08 18:29                                   ` Helmut Hullen
  2012-05-08 18:41                                     ` Felix Blanke
  0 siblings, 1 reply; 54+ messages in thread
From: Helmut Hullen @ 2012-05-08 18:29 UTC (permalink / raw)
  To: linux-btrfs

Hello, Felix,

You wrote on 08.05.12:

>> As I've written many times: I want a system for my video collection
>> which allows
>>
>>          adding a bigger disk
>>          deleting/removing a smaller disk
>>
>> with simple commands.
>>
>> btrfs seems to be able to do that (and I have tested this job many
>> times). But with my configuration "mkfs.btrfs -m raid1 -d raid0"
>> I've (again) seen that all data vanishes when 1 disk fails.
>>
>> I'll try Hugo's proposal "mkfs.btrfs -m raid1 -d single".
>> And I hope that it doesn't make all disks unreadable when 1 disk
>> fails.

[...]

> @"-d single"

> Is it really possible to remove a disk from btrfs (created with -d
> single) without losing the data on that disk?

When the system is configured with

        mkfs.btrfs -m raid1 -d raid0

then the way shown above is possible; it works (now) as expected.
Ok - it needs some time.

And I have already said on this mailing list that I'll try the option
"-d single".

> Is there a way to tell
> balance to copy all the data from this disk to the other disks (ofc
> if there is enough free space on them)?

As I've written some hours ago: I run

        btrfs fi balance ...

after adding and after deleting a disk. Maybe it's not necessary.  
Especially it seems not to be necessary after adding a disk.

Best regards!
Helmut

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: kernel 3.3.4 damages filesystem (?)
  2012-05-08 18:29                                   ` Helmut Hullen
@ 2012-05-08 18:41                                     ` Felix Blanke
  2012-05-08 19:12                                       ` David Sterba
  2012-05-08 19:34                                       ` Helmut Hullen
  0 siblings, 2 replies; 54+ messages in thread
From: Felix Blanke @ 2012-05-08 18:41 UTC (permalink / raw)
  To: helmut; +Cc: Helmut Hullen, linux-btrfs

On 5/8/12 8:29 PM, Helmut Hullen wrote:
> Hello, Felix,
>
> You wrote on 08.05.12:
>
>>> As I've written many times: I want a system for my video collection
>>> which allows
>>>
>>>           adding a bigger disk
>>>           deleting/removing a smaller disk
>>>
>>> with simple commands.
>>>
>>> btrfs seems to be able to do that (and I have tested this job many
>>> times). But with my configuration "mkfs.btrfs -m raid1 -d raid0"
>>> I've (again) seen that all data vanishes when 1 disk fails.
>>>
>>> I'll try Hugo's proposal "mkfs.btrfs -m raid1 -d single".
>>> And I hope that it doesn't make all disks unreadable when 1 disk
>>> fails.
>
> [...]
>
>> @"-d single"
>
>> Is it really possible to remove a disk from btrfs (created with -d
>> single) without losing the data on that disk?
>
> When the system is configured with
>
>          mkfs.btrfs -m raid1 -d raid0
>
> then the way shown above is possible; it works (now) as expected.
> Ok - it needs some time.
>
> And I have already said on this mailing list that I'll try the option
> "-d single".
>
>> Is there a way to tell
>> balance to copy all the data from this disk to the other disks (ofc
>> if there is enough free space on them)?
>
> As I've written some hours ago: I run
>
>          btrfs fi balance ...
>
> after adding and after deleting a disk. Maybe it's not necessary.
> Especially it seems not to be necessary after adding a disk.

What are the steps you're doing?! If this is really possible then there
must be some sort of command that tells btrfs "Hey, I want to remove this
disk from the fs, please copy all data to the other disks and then
remove the disk". Is there such a command? I haven't heard of one, but
that would be interesting.

Otherwise, if you remove a disk from a raid0 (it doesn't matter if you have
2 or 5 or x disks in the fs, btrfs should stripe across all disks), your
fs should be broken.

>
> Best regards!
> Helmut

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: kernel 3.3.4 damages filesystem (?)
  2012-05-08 18:41                                     ` Felix Blanke
@ 2012-05-08 19:12                                       ` David Sterba
  2012-05-08 19:34                                       ` Helmut Hullen
  1 sibling, 0 replies; 54+ messages in thread
From: David Sterba @ 2012-05-08 19:12 UTC (permalink / raw)
  To: Felix Blanke; +Cc: helmut, Helmut Hullen, linux-btrfs

On Tue, May 08, 2012 at 08:41:47PM +0200, Felix Blanke wrote:
> >As I've written some hours ago: I run
> >
> >         btrfs fi balance ...
> >
> >after adding and after deleting a disk. Maybe it's not necessary.
> >Especially it seems not to be necessary after adding a disk.
> 
> What are the steps you're doing?! If this is really possible then there must
> be some sort of command that tells btrfs "Hey, I want to remove this disk from
> the fs, please copy all data to the other disks and then remove the disk".
> Is there such a command? Haven't heard of one, but that would be
> interesting.

The 'btrfs device delete' command does what you described. It is a pretty
basic command, so I'm not sure whether I missed something in this
thread.

> Otherwise if you remove a disk from a raid0 (doesn't matter if you have 2 or
> 5 or x disks in the fs, btrfs should stripe across all disks) your fs should
> be broken.

All data from the device being removed are relocated to the rest of the
device group.
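
As a rough illustration of what an orderly removal has to achieve, here is a small Python model. It is a sketch under stated assumptions, not the kernel's relocation code: space is counted in whole chunks, the function name `device_delete` is made up for this example, and the rule "move each chunk to the remaining device with the most free space" is an assumed policy.

```python
def device_delete(sizes, used, victim):
    """Model an orderly removal: move every chunk off `victim` first.

    `sizes` and `used` map devid -> number of chunks. Returns the new
    `used` mapping without the victim, or raises ValueError when the
    remaining devices cannot hold the victim's chunks.
    """
    remaining = {d: used[d] for d in sizes if d != victim}
    for _ in range(used[victim]):
        # Assumed policy: most free space first, lowest devid on a tie.
        target = max(remaining, key=lambda d: (sizes[d] - remaining[d], -d))
        if sizes[target] - remaining[target] < 1:
            raise ValueError("not enough free space to relocate all chunks")
        remaining[target] += 1
    return remaining
```

The point of the model is that the data is copied away while the old device is still readable. When a disk dies without warning, that copy step can no longer happen.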


david

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: kernel 3.3.4 damages filesystem (?)
  2012-05-08 18:41                                     ` Felix Blanke
  2012-05-08 19:12                                       ` David Sterba
@ 2012-05-08 19:34                                       ` Helmut Hullen
  2012-05-08 20:02                                         ` Hugo Mills
  1 sibling, 1 reply; 54+ messages in thread
From: Helmut Hullen @ 2012-05-08 19:34 UTC (permalink / raw)
  To: linux-btrfs

Hello, Felix,

You wrote on 08.05.12:

>>>>           adding a bigger disk
>>>>           deleting/removing a smaller disk
>>>>
>>>> with simple commands.

[...]

>>> Is it really possible to remove a disk from btrfs (created with -d
>>> single) without losing the data on that disk?
>>
>> When the system is configured with
>>
>>          mkfs.btrfs -m raid1 -d raid0
>>
>> then the way shown above is possible; it works (now) as expected.
>> Ok - it needs some time.

[...]

> What are the steps you're doing?! If this is really possible then
> there must be some sort of command that tells btrfs "Hey, I want to
> remove this disk from the fs, please copy all data to the other disks
> and then remove the disk". Is there such a command? Haven't heard of
> one, but that would be interesting.

        btrfs device add /dev/$newdisk ...
        (btrfs fi balance ...)
        btrfs device delete /dev/$olddisk ...
        (btrfs fi balance ...)

I've described these simple steps many times on this mailing list.

For some kernel versions now (at least since kernel 3.2.x) it seems to
work without problems; "btrfs-progs" package from 2011-10-30.


> Otherwise if you remove a disk from a raid0 (doesn't matter if you
> have 2 or 5 or x disks in the fs, btrfs should stripe across all
> disks) your fs should be broken.


Not with btrfs ... there it works even with

  mkfs.btrfs -m raid1 -d raid0 ...

Best regards!
Helmut

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: kernel 3.3.4 damages filesystem (?)
  2012-05-08 19:34                                       ` Helmut Hullen
@ 2012-05-08 20:02                                         ` Hugo Mills
  2012-05-08 20:19                                           ` Helmut Hullen
  0 siblings, 1 reply; 54+ messages in thread
From: Hugo Mills @ 2012-05-08 20:02 UTC (permalink / raw)
  To: helmut; +Cc: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 2002 bytes --]

On Tue, May 08, 2012 at 09:34:00PM +0200, Helmut Hullen wrote:
> Hello, Felix,
> 
> You wrote on 08.05.12:
> 
> >>>>           adding a bigger disk
> >>>>           deleting/removing a smaller disk
> >>>>
> >>>> with simple commands.
> 
> [...]
> 
> >>> Is it really possible to remove a disk from btrfs (created with -d
> >>> single) without losing the data on that disk?
> >>
> >> When the system is configured with
> >>
> >>          mkfs.btrfs -m raid1 -d raid0
> >>
> >> then the way shown above is possible; it works (now) as expected.
> >> Ok - it needs some time.
> 
> [...]
> 
> > What are the steps you're doing?! If this is really possible then
> > there must be some sort of command that tells btrfs "Hey, I want to
> > remove this disk from the fs, please copy all data to the other disks
> > and then remove the disk". Is there such a command? Haven't heard of
> > one, but that would be interesting.
> 
>         btrfs device add /dev/$newdisk ...
>         (btrfs fi balance ...)
>         btrfs device delete /dev/$olddisk ...
>         (btrfs fi balance ...)
> 
> I've described these simple steps many times on this mailing list.
> 
> For some kernel versions now (at least since kernel 3.2.x) it seems to
> work without problems; "btrfs-progs" package from 2011-10-30.
> 
> 
> > Otherwise if you remove a disk from a raid0 (doesn't matter if you
> > have 2 or 5 or x disks in the fs, btrfs should stripe across all
> > disks) your fs should be broken.
> 
> 
> Not with btrfs ... there it works even with
> 
>   mkfs.btrfs -m raid1 -d raid0 ...

   There is a big difference between "orderly and planned removal of a
hard disk", and "disk goes away with no warning". This is essentially
the difference you've been talking about at cross-purposes all day.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
                 --- My karma has run over my dogma. ---                 

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 190 bytes --]

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: kernel 3.3.4 damages filesystem (?)
  2012-05-08 20:02                                         ` Hugo Mills
@ 2012-05-08 20:19                                           ` Helmut Hullen
  2012-05-08 20:56                                             ` Roman Mamedov
  0 siblings, 1 reply; 54+ messages in thread
From: Helmut Hullen @ 2012-05-08 20:19 UTC (permalink / raw)
  To: linux-btrfs

Hello, Hugo,

You wrote on 08.05.12:

>>> Otherwise if you remove a disk from a raid0 (doesn't matter if you
>>> have 2 or 5 or x disks in the fs, btrfs should stripe across all
>>> disks) your fs should be broken.

>> Not with btrfs ... there it works even with
>>
>>   mkfs.btrfs -m raid1 -d raid0 ...

>    There is a big difference between "orderly and planned removal of
> a hard disk", and "disk goes away with no warning".

And I know the difference ...

When I first called for help I was looking for the fault somewhere other
than in "disk is dead".

> This is essentially the difference you've been talking about at cross-
> purposes all day.

What I still hope for (maybe it's impossible): when 1 disk/partition fails,
the contents of the other disks are "somehow" restorable. And not
irrecoverable.

Best regards!
Helmut

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: kernel 3.3.4 damages filesystem (?)
  2012-05-08 20:19                                           ` Helmut Hullen
@ 2012-05-08 20:56                                             ` Roman Mamedov
  2012-05-09 14:46                                               ` Kaspar Schleiser
  0 siblings, 1 reply; 54+ messages in thread
From: Roman Mamedov @ 2012-05-08 20:56 UTC (permalink / raw)
  To: helmut; +Cc: Hullen, linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 895 bytes --]

On 08 May 2012 22:19:00 +0200
Hullen@t-online.de (Helmut Hullen) wrote:

> What I still hope for (maybe it's impossible): when 1 disk/partition fails,
> the contents of the other disks are "somehow" restorable. And not
> irrecoverable.

You should look for file/directory-level tree merging, e.g. this FUSE based
virtual FS: 

  https://romanrm.ru/en/mhddfs

Or various other unionfs'es, some of which are kernel-based.

Regarding btrfs, AFAIK even the "-d single" setup suggested above works not
"per file" but per allocation extent, so in case of one disk failure you will
lose random *parts* (extents) of random files, which in effect could mean that
no file in your whole file system remains undamaged.

-- 
With respect,
Roman

~~~~~~~~~~~~~~~~~~~~~~~~~~~
"Stallman had a printer,
with code he could not see.
So he began to tinker,
and set the software free."

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: kernel 3.3.4 damages filesystem (?)
  2012-05-08 10:00                       ` Helmut Hullen
  2012-05-08 10:41                         ` Clemens Eisserer
@ 2012-05-08 21:42                         ` Hubert Kario
  1 sibling, 0 replies; 54+ messages in thread
From: Hubert Kario @ 2012-05-08 21:42 UTC (permalink / raw)
  To: helmut; +Cc: linux-btrfs

On Tuesday 08 of May 2012 12:00:00 Helmut Hullen wrote:
> Hello, Fajar,
>
> You wrote on 08.05.12:
> >>> And you can use three BTRFS filesystems the same way as three Ext4
> >>> filesystems if you prefer such a setup if the time spent for
> >>> restoring the backup does not make up the cost for one additional
> >>> disk for you.
> >>
> >> But where's the gain? If a disk fails I have a lot of tools for
> >> repairing an ext2/3/4 system.
> >
> > It won't work if you use it in RAID0 (e.g. with LVM spanning three
> > disks, then use ext4 on top of the LV).
>
> But when I use ext2/3/4 I neither need RAID0 nor do I need LVM.
>
> > As others said, if your only concern is "if a disk is dead, I want to
> > be able to access data on other disks", then simply use btrfs as
> > three different fs, mounted on three directories.
>
> But then I don't particularly need btrfs.
>
> > btrfs will shine when:
> > - you need checksum and self-healing in raid10 mode
> > - you have lots of small files
> > - you have highly compressible content
> > - you need snapshot/clone feature
>
> For my video collection (mpeg2) nothing fits ...
>
> The only advantage I see with btrfs is
>
>         adding a bigger disk
>         deleting/removing a smaller disk
>
> with really simple commands.

Playing the Devil's advocate here (not that I don't use The Other Linux FS ;)

I don't see btrfs commands much different from
pvcreate /dev/new-disk
vgextend videos-volume-42 /dev/new-disk
pvmove /dev/old-disk /dev/new-disk
vgreduce videos-volume-42 /dev/old-disk
resize2fs /dev/videos-volume-42/logical-volume

Unlike with shrinking, there's really no place for error. Messing up those
commands will give quite clear error messages and definitely won't destroy
data (unless a hardware error occurs). And the FS on the LV is online all
the time, just like with btrfs.

The only difference is that with btrfs you can both extend and shrink the FS
online, with ext2/3/4 you can only extend online...

Regards,
-- 
Hubert Kario
QBS - Quality Business Software
02-656 Warszawa, ul. Ksawerów 30/85
tel. +48 (22) 646-61-51, 646-74-24
www.qbs.com.pl

^ permalink raw reply	[flat|nested] 54+ messages in thread

* failed disk (was: kernel 3.3.4 damages filesystem (?))
  2012-05-07 18:44                 ` Hugo Mills
@ 2012-05-09 13:04                   ` Helmut Hullen
  2012-05-09 13:19                     ` Hugo Mills
  0 siblings, 1 reply; 54+ messages in thread
From: Helmut Hullen @ 2012-05-09 13:04 UTC (permalink / raw)
  To: linux-btrfs

Hello, Hugo,

You wrote on 07.05.12:

>>>    mkfs.btrfs -m raid1 -d single should give you that.

>> What's the difference to
>>
>>      mkfs.btrfs -m raid1 -d raid0

>  - RAID-0 stripes each piece of data across all the disks.
>  - single puts data on one disk at a time.

[...]


>    In fact, this is probably a good argument for having the option to
> put back the old allocator algorithm, which would have ensured that
> the first disk would fill up completely first before it touched the
> next one...

The current version seems to alternate between the disks:

Copying about 160 GiByte shows

Label: none  uuid: fd0596c6-d819-42cd-bb4a-420c38d2a60b
	Total devices 2 FS bytes used 155.64GB
	devid    2 size 136.73GB used 114.00GB path /dev/sdl1
	devid    1 size 68.37GB used 45.04GB path /dev/sdk1

Btrfs Btrfs v0.19

------------------------

Watching the usage showed that both disks fill up nearly
simultaneously.

That would be more difficult to restore ...

Best regards!
Helmut

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: failed disk (was: kernel 3.3.4 damages filesystem (?))
  2012-05-09 13:04                   ` failed disk (was: kernel 3.3.4 damages filesystem (?)) Helmut Hullen
@ 2012-05-09 13:19                     ` Hugo Mills
  0 siblings, 0 replies; 54+ messages in thread
From: Hugo Mills @ 2012-05-09 13:19 UTC (permalink / raw)
  To: helmut; +Cc: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 1616 bytes --]

On Wed, May 09, 2012 at 03:04:00PM +0200, Helmut Hullen wrote:
> Hello, Hugo,
> 
> You wrote on 07.05.12:
> 
> >>>    mkfs.btrfs -m raid1 -d single should give you that.
> 
> >> What's the difference to
> >>
> >>      mkfs.btrfs -m raid1 -d raid0
> 
> >  - RAID-0 stripes each piece of data across all the disks.
> >  - single puts data on one disk at a time.
> 
> [...]
> 
> 
> >    In fact, this is probably a good argument for having the option to
> > put back the old allocator algorithm, which would have ensured that
> > the first disk would fill up completely first before it touched the
> > next one...
> 
> The actual version seems to oscillate from disk to disk:

   Yes, specifically, when it's asked for n chunks to make up a block
group, the current allocator will pick the n disks with the most free
space on them. The original allocator would pick the disks with the
smallest devid (which is probably optimal for your use case -- hence
my comment above).
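
As a rough illustration of the two policies described above, here is a
minimal Python sketch. It is a toy model and not the kernel's allocator:
it assumes one-unit chunks, ignores metadata, and the names
pick_most_free, pick_lowest_devid and fill are made up for this example.
The device sizes are rounded from the two-disk output quoted above.

```python
def pick_most_free(free, n):
    """Current policy as described above: the n devices with the most free space.

    Ties are broken by lowest devid (an assumption of this sketch).
    """
    return sorted(free, key=lambda devid: (-free[devid], devid))[:n]


def pick_lowest_devid(free, n):
    """Original policy as described above: the n devices with the smallest devid."""
    return sorted(free)[:n]


def fill(sizes, n, chunks, pick):
    """Allocate `chunks` block groups of n one-unit chunks each.

    Returns the number of units used on every device.
    """
    free = dict(sizes)
    used = {devid: 0 for devid in sizes}
    for _ in range(chunks):
        # Only devices that still have room for one more chunk are candidates.
        candidates = {d: f for d, f in free.items() if f >= 1}
        if len(candidates) < n:
            break  # not enough devices left to build a block group
        for devid in pick(candidates, n):
            free[devid] -= 1
            used[devid] += 1
    return used


# devid -> size in whole units (roughly GiB), "single" data means n = 1
sizes = {1: 68, 2: 136}
print("most free:   ", fill(sizes, 1, 155, pick_most_free))
print("lowest devid:", fill(sizes, 1, 155, pick_lowest_devid))
```

In this model the most-free policy fills the larger disk until both disks
have the same amount of free space and then alternates between them, which
resembles the near-simultaneous filling reported above. The lowest-devid
policy fills devid 1 completely before touching devid 2.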

> Watching the amount showed that both disks are filled nearly  
> simultaneously.
> 
> That would be more difficult to restore ...

   If your files are small compared to the block group size (1GiB in
this case), then the odds of a file spanning block groups are small.
With files similar in size to, or larger than, a chunk, you will be
far more likely to lose some part of the file when a disk goes away.
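
To put a rough number on "small compared to the block group size", here
is a back-of-the-envelope Python sketch. It assumes a file is stored
contiguously and starts at a uniformly random offset within a 1GiB chunk.
That is a simplification for illustration only and not how btrfs actually
places extents; span_probability is a name invented for this example.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def span_probability(file_size, chunk_size=GIB):
    """Chance that a contiguous file crosses a chunk boundary.

    Assumes the start offset is uniformly distributed within a chunk
    (a simplification, not btrfs's real allocation behaviour).
    """
    if file_size <= 0:
        return 0.0
    return min(1.0, file_size / chunk_size)


for label, size in [("4 MiB", 4 * MIB), ("700 MiB", 700 * MIB), ("4 GiB", 4 * GIB)]:
    print(f"{label:>8}: {span_probability(size):.3f}")
```

Under these assumptions a file much smaller than a chunk almost never
spans two chunks, while a file as large as a chunk or larger always does.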

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
       --- Great oxymorons of the world, no. 6: Mature Student ---       

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 190 bytes --]

^ permalink raw reply	[flat|nested] 54+ messages in thread

* failed disk (was: kernel 3.3.4 damages filesystem (?))
  2012-05-07 18:00             ` Hugo Mills
  2012-05-07 18:25               ` Helmut Hullen
@ 2012-05-09 14:25               ` Helmut Hullen
  2012-05-09 14:37                 ` Hugo Mills
  1 sibling, 1 reply; 54+ messages in thread
From: Helmut Hullen @ 2012-05-09 14:25 UTC (permalink / raw)
  To: linux-btrfs

Hello, Hugo,

You wrote on 07.05.12:

[...]

>> With a file system like ext2/3/4 I can work with several directories
>> which are mounted together, but (as said before) one broken disk
>> doesn't disturb the others.

>    mkfs.btrfs -m raid1 -d single should give you that.

Just a small bug, perhaps:

I created a filesystem with

        mkfs.btrfs -m raid1 -d single /dev/sdl1
        mount /dev/sdl1 /mnt/Scsi
        btrfs device add /dev/sdk1 /mnt/Scsi
        btrfs device add /dev/sdm1 /mnt/Scsi
        (filling with data)

and

        btrfs fi df /mnt/Scsi

now reports

Data, RAID0: total=183.18GB, used=76.60GB
Data: total=80.01GB, used=79.83GB
System, DUP: total=8.00MB, used=32.00KB
System: total=4.00MB, used=0.00
Metadata, DUP: total=1.00GB, used=192.74MB
Metadata: total=8.00MB, used=0.00

--------------------------------------

"Data, RAID0" confuses me (not very much ...), and the profile requested
for metadata (RAID1) is not shown.


Best regards!
Helmut

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: failed disk (was: kernel 3.3.4 damages filesystem (?))
  2012-05-09 14:25               ` Helmut Hullen
@ 2012-05-09 14:37                 ` Hugo Mills
  2012-05-09 15:14                   ` failed disk Helmut Hullen
                                     ` (2 more replies)
  0 siblings, 3 replies; 54+ messages in thread
From: Hugo Mills @ 2012-05-09 14:37 UTC (permalink / raw)
  To: helmut; +Cc: linux-btrfs, Ilya Dryomov

[-- Attachment #1: Type: text/plain, Size: 1940 bytes --]

On Wed, May 09, 2012 at 04:25:00PM +0200, Helmut Hullen wrote:
> You wrote on 07.05.12:
> 
> [...]
> 
> >> With a file system like ext2/3/4 I can work with several directories
> >> which are mounted together, but (as said before) one broken disk
> >> doesn't disturb the others.
> 
> >    mkfs.btrfs -m raid1 -d single should give you that.
> 
> Just a small bug, perhaps:
> 
> created a system with
> 
>         mkfs.btrfs -m raid1 -d single /dev/sdl1
>         mount /dev/sdl1 /mnt/Scsi
>         btrfs device add /dev/sdk1 /mnt/Scsi
>         btrfs device add /dev/sdm1 /mnt/Scsi
>         (filling with data)
> 
> and
> 
>         btrfs fi df /mnt/Scsi
> 
> now tells
> 
> Data, RAID0: total=183.18GB, used=76.60GB
> Data: total=80.01GB, used=79.83GB
> System, DUP: total=8.00MB, used=32.00KB
> System: total=4.00MB, used=0.00
> Metadata, DUP: total=1.00GB, used=192.74MB
> Metadata: total=8.00MB, used=0.00
> 
> --------------------------------------
> 
> "Data, RAID0" confuses me (not very much ...), and the system for  
> metadata (RAID1) is not told.

   DUP is two copies of each block, but it allows the two copies to
live on the same device. It's done this because you started with a
single device, and you can't do RAID-1 on one device. The first bit of
metadata you write to it should automatically upgrade the DUP chunk to
RAID-1.

   As to the spurious "upgrade" of single to RAID-0, I thought Ilya
had stopped it doing that. What kernel version are you running?
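
For anyone who wants to spot such mixed profiles mechanically, here is a
small Python sketch that parses "btrfs fi df" output in the format quoted
above. The regular expression is written against that quoted sample only,
and other btrfs-progs versions may print a different format. Treating a
line without a profile name as "single" is an assumption of this sketch,
and the helper names are made up.

```python
import re
from collections import defaultdict

# Sample copied from the report quoted above.
SAMPLE = """\
Data, RAID0: total=183.18GB, used=76.60GB
Data: total=80.01GB, used=79.83GB
System, DUP: total=8.00MB, used=32.00KB
System: total=4.00MB, used=0.00
Metadata, DUP: total=1.00GB, used=192.74MB
Metadata: total=8.00MB, used=0.00
"""

LINE = re.compile(
    r"^(?P<type>\w+)(?:, (?P<profile>\w+))?: "
    r"total=(?P<total>\S+), used=(?P<used>\S+)$"
)


def profiles_by_type(text):
    """Map each block group type (Data, System, Metadata) to its profiles."""
    result = defaultdict(list)
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match:
            # Assumption: no profile name on the line means "single".
            result[match["type"]].append(match["profile"] or "single")
    return dict(result)


def mixed_profiles(text):
    """Return only the types that are stored with more than one profile."""
    return {t: p for t, p in profiles_by_type(text).items() if len(p) > 1}


print(mixed_profiles(SAMPLE))
```

For the sample above every type shows up with two profiles, which is the
mix that prompted the question.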

   Out of interest, why did you do the device adds separately, instead
of just this?

# mkfs.btrfs -m raid1 -d single /dev/sdl1 /dev/sdk1 /dev/sdm1

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- Comic Sans goes into a bar,  and the barman says, "We don't ---   
                         serve your type here."                          

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 190 bytes --]

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: kernel 3.3.4 damages filesystem (?)
  2012-05-08 20:56                                             ` Roman Mamedov
@ 2012-05-09 14:46                                               ` Kaspar Schleiser
  2012-05-10 10:40                                                 ` Martin Steigerwald
  0 siblings, 1 reply; 54+ messages in thread
From: Kaspar Schleiser @ 2012-05-09 14:46 UTC (permalink / raw)
  To: linux-btrfs

Hi,

On 05/08/2012 10:56 PM, Roman Mamedov wrote:
> Regarding btrfs, AFAIK even "btrfs -d single" suggested above works not "per
> file", but per allocation extent, so in case of one disk failure you will lose
> random *parts* (extents) of random files, which in effect could mean no file
> in your whole file system will remain undamaged.
Maybe we should evaluate the possibility of such a "one file goes on 
one disk" feature.

Helmut Hullen has the use case: Many disks, totally non-critical but 
nice-to-have data. If one disk dies, some *files* should be lost, not some 
*random parts of all files*.

This could be accomplished by some userspace-tool that moves stuff 
around, combined with "file pinning"-support, that lets the user make 
sure a specific file is on a specific disk.
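
To make the difference concrete, here is a toy Python comparison of the
two placement styles when one disk fails. Everything in it is
hypothetical: the file names, the extent counts and the round-robin
placement are stand-ins chosen for illustration, and no such "file
pinning" interface exists in btrfs as far as this thread says.

```python
def place_per_extent(files, n_disks):
    """Spread every extent round-robin over all disks.

    A stand-in for per-extent allocation, not btrfs's real policy.
    """
    placement, next_disk = {}, 0
    for name, extents in files.items():
        placement[name] = [(next_disk + i) % n_disks for i in range(extents)]
        next_disk += extents
    return placement


def place_per_file(files, n_disks):
    """Keep all extents of a file on one disk, chosen round-robin per file."""
    return {
        name: [index % n_disks] * extents
        for index, (name, extents) in enumerate(files.items())
    }


def damaged_files(placement, failed_disk):
    """Names of files that have at least one extent on the failed disk."""
    return sorted(name for name, disks in placement.items() if failed_disk in disks)


# Hypothetical files: name -> number of extents
files = {"a.avi": 4, "b.avi": 3, "c.avi": 5, "d.avi": 3}

print("per extent:", damaged_files(place_per_extent(files, 3), failed_disk=0))
print("per file:  ", damaged_files(place_per_file(files, 3), failed_disk=0))
```

With per-extent placement every file in this example has a piece on the
failed disk, while per-file placement loses only the files that lived on
that disk.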

Cheers
Kaspar




^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: failed disk
  2012-05-09 14:37                 ` Hugo Mills
@ 2012-05-09 15:14                   ` Helmut Hullen
  2012-05-09 15:33                     ` Hugo Mills
  2012-05-09 16:13                   ` failed disk (was: kernel 3.3.4 damages filesystem (?)) Ilya Dryomov
  2012-05-10  2:49                   ` failed disk Helmut Hullen
  2 siblings, 1 reply; 54+ messages in thread
From: Helmut Hullen @ 2012-05-09 15:14 UTC (permalink / raw)
  To: linux-btrfs

Hello, Hugo,

You wrote on 09.05.12:

>>>    mkfs.btrfs -m raid1 -d single should give you that.

>> Just a small bug, perhaps:
>>
>> created a system with
>>
>>         mkfs.btrfs -m raid1 -d single /dev/sdl1
>>         mount /dev/sdl1 /mnt/Scsi
>>         btrfs device add /dev/sdk1 /mnt/Scsi
>>         btrfs device add /dev/sdm1 /mnt/Scsi
>>         (filling with data)
>>
>> and
>>
>>         btrfs fi df /mnt/Scsi
>>
>> now tells
>>
>> Data, RAID0: total=183.18GB, used=76.60GB
>> Data: total=80.01GB, used=79.83GB
>> System, DUP: total=8.00MB, used=32.00KB
>> System: total=4.00MB, used=0.00
>> Metadata, DUP: total=1.00GB, used=192.74MB
>> Metadata: total=8.00MB, used=0.00
>>
>> --------------------------------------
>>
>> "Data, RAID0" confuses me (not very much ...), and the system for
>> metadata (RAID1) is not told.

>    DUP is two copies of each block, but it allows the two copies to
> live on the same device. It's done this because you started with a
> single device, and you can't do RAID-1 on one device. The first bit
> of metadata you write to it should automatically upgrade the DUP
> chunk to RAID-1.

Ok.

Sounds familiar - did you explain that to me many months ago?

>    As to the spurious "upgrade" of single to RAID-0, I thought Ilya
> had stopped it doing that. What kernel version are you running?

3.2.9, self made.
I could test the message with 3.3.4, but not today (if it's only an  
interpretation of always the same data).

>    Out of interest, why did you do the device adds separately,
> instead of just this?

a) making the first 2 devices: I have tested both versions (one line  
with 2 devices or 2 lines with 1 device); no big difference.

But I had tested the option "-L" (labelling) too, and that makes shit  
for the oneliner: both devices get the same label, and then "findfs"  
finds none of them.

The really safe way would be: deleting this option for the "mkfs.btrfs"  
command and only using

        btrfs fi label <device> [<newlabel>]

b) third device: that's my usual test:
        make a cluster of 2 devices
        fill them with data
        add a third device
        delete the smallest device

Best regards!
Helmut

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: failed disk
  2012-05-09 15:14                   ` failed disk Helmut Hullen
@ 2012-05-09 15:33                     ` Hugo Mills
  2012-05-09 18:49                       ` Helmut Hullen
  0 siblings, 1 reply; 54+ messages in thread
From: Hugo Mills @ 2012-05-09 15:33 UTC (permalink / raw)
  To: helmut; +Cc: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 2620 bytes --]

On Wed, May 09, 2012 at 05:14:00PM +0200, Helmut Hullen wrote:
> Hello, Hugo,
> 
> You wrote on 09.05.12:
> 
> >    DUP is two copies of each block, but it allows the two copies to
> > live on the same device. It's done this because you started with a
> > single device, and you can't do RAID-1 on one device. The first bit
> > of metadata you write to it should automatically upgrade the DUP
> > chunk to RAID-1.
> 
> Ok.
> 
> Sounds familiar - have you explained that to me many months ago?

   Probably. I tend to explain this kind of thing a lot to people.

> >    As to the spurious "upgrade" of single to RAID-0, I thought Ilya
> > had stopped it doing that. What kernel version are you running?
> 
> 3.2.9, self made.

   OK, I'm pretty sure that's too old -- it will "upgrade" single to
RAID-0. You can probably turn it back to "single" using balance
filters:

# btrfs fi balance -dconvert=single /mountpoint

(You may want to write at least a little data to the FS first --
balance has some slightly odd behaviour on empty filesystems).

> I could test the message with 3.3.4, but not today (if it's only an  
> interpretation of always the same data).
> 
> >    Out of interest, why did you do the device adds separately,
> > instead of just this?
> 
> a) making the first 2 devices: I have tested both versions (one line  
> with 2 devices or 2 lines with 1 device); no big difference.
> 
> But I had tested the option "-L" (labelling) too, and that makes shit  
> for the oneliner: both devices get the same label, and then "findfs"  
> finds none of them.

   Umm... Yes, of course both devices will get the same label --
you're labelling the filesystem, not the devices. (Didn't we have this
argument some time ago?).

   I don't know what "findfs" is doing, that it can't find the
filesystem by label: you may need to run "sync" after mkfs, possibly.

> The really safe way would be: deleting this option for the "mkfs.btrfs"  
> command and only using
> 
>         btrfs fi label <device> [<newlabel>]

   ... except that it'd have to take a filesystem as parameter, not a
device (see above).

> b) third device: that's my usual test:
>         make a cluster of 2 devices
>         fill them with data
>         add a third device
>         delete the smallest device

   What are you testing? And by "delete" do you mean "btrfs dev
delete" or "pull the cable out"?

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
           --- Quidquid latine dictum sit,  altum videtur. ---           

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 190 bytes --]

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: failed disk (was: kernel 3.3.4 damages filesystem (?))
  2012-05-09 14:37                 ` Hugo Mills
  2012-05-09 15:14                   ` failed disk Helmut Hullen
@ 2012-05-09 16:13                   ` Ilya Dryomov
  2012-05-10  2:49                   ` failed disk Helmut Hullen
  2 siblings, 0 replies; 54+ messages in thread
From: Ilya Dryomov @ 2012-05-09 16:13 UTC (permalink / raw)
  To: Hugo Mills, helmut, linux-btrfs

On Wed, May 09, 2012 at 03:37:35PM +0100, Hugo Mills wrote:
> On Wed, May 09, 2012 at 04:25:00PM +0200, Helmut Hullen wrote:
> > You wrote on 07.05.12:
> > 
> > [...]
> > 
> > >> With a file system like ext2/3/4 I can work with several directories
> > >> which are mounted together, but (as said before) one broken disk
> > >> doesn't disturb the others.
> > 
> > >    mkfs.btrfs -m raid1 -d single should give you that.
> > 
> > Just a small bug, perhaps:
> > 
> > created a system with
> > 
> >         mkfs.btrfs -m raid1 -d single /dev/sdl1
> >         mount /dev/sdl1 /mnt/Scsi
> >         btrfs device add /dev/sdk1 /mnt/Scsi
> >         btrfs device add /dev/sdm1 /mnt/Scsi
> >         (filling with data)
> > 
> > and
> > 
> >         btrfs fi df /mnt/Scsi
> > 
> > now tells
> > 
> > Data, RAID0: total=183.18GB, used=76.60GB
> > Data: total=80.01GB, used=79.83GB
> > System, DUP: total=8.00MB, used=32.00KB
> > System: total=4.00MB, used=0.00
> > Metadata, DUP: total=1.00GB, used=192.74MB
> > Metadata: total=8.00MB, used=0.00
> > 
> > --------------------------------------
> > 
> > "Data, RAID0" confuses me (not very much ...), and the system for  
> > metadata (RAID1) is not told.
> 
>    DUP is two copies of each block, but it allows the two copies to
> live on the same device. It's done this because you started with a
> single device, and you can't do RAID-1 on one device. The first bit of

What Hugo said.  Newer mkfs.btrfs will error out if you try to do this.

> metadata you write to it should automatically upgrade the DUP chunk to
> RAID-1.

We don't "upgrade" chunks in place, only during balance.

> 
>    As to the spurious "upgrade" of single to RAID-0, I thought Ilya
> had stopped it doing that. What kernel version are you running?

I did, but again, we were doing it only as part of balance, not as part
of normal operation.

Helmut, do you have any additional data points - the output of btrfs fi
df right after you created FS or somewhere in the middle of filling it ?

Also could you please paste the output of btrfs fi show and tell us what
kernel version you are running ?

Thanks,

		Ilya

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: kernel 3.3.4 damages filesystem (?)
  2012-05-07 10:46 kernel 3.3.4 damages filesystem (?) Helmut Hullen
                   ` (2 preceding siblings ...)
  2012-05-07 12:53 ` Liu Bo
@ 2012-05-09 17:32 ` Duncan
  2012-05-09 18:06   ` Atila
  3 siblings, 1 reply; 54+ messages in thread
From: Duncan @ 2012-05-09 17:32 UTC (permalink / raw)
  To: linux-btrfs

Helmut Hullen posted on Mon, 07 May 2012 12:46:00 +0200 as excerpted:

> The 3 btrfs disks are connected via a SiI 3114 SATA-PCI-Controller.
> Only 1 of the 3 disks seems to be damaged.

I don't plan to rehash the raid0/single discussion here, but here's some 
perhaps useful additional information on that hardware:


For some years I've been running that same hardware, SiI 3114 SATA PCI, 
on an old dual-socket 3-digit Opteron system, running for some years now 
dual dual-core Opteron 290s (the highest they went, 2.8 GHz, 4 cores in 
two sockets).  However, I *WAS* running them in RAID-1, 4-disk md RAID-1, 
to be exact (with reiserfs, FWIW).


What's VERY interesting is that I've just returned from being offline for 
several days due to severe disk-I/O hardware issues of my own -- again, 
on that Sil-SATA 3114.

Most of the time I was getting full system crashes, but perhaps 25-33% of 
the time it didn't fully crash the system, simply error out with an 
eventual ATA reset.  When the system didn't crash immediately, most of 
the time (about 80% I'd say) the reset would be good and I'd be back up, 
but sometimes it'd repeatedly reset, occasionally not ever becoming 
usable again.

As the drives are all the same quite old Seagate 300 gig drives, at about 
half their rated SMART operating hours but I think well beyond the 5 year 
warranty, I originally thought I'd just learned my lesson on the don't 
use all the same model or you're risking them all going out at once rule, 
but I bought a new drive (half-TB seagate 2.5" drive, I've been thinking 
about going 2.5" for awhile now and this was the chance, I'll RAID it 
later with at least one more, preferably a different run at least if not 
a different model) and have been SLOWLY, PAINFULLY, RESETTINGLY copying 
stuff over from one or another of the four RAID-1 drives.

The reset problem, however, hasn't gone away, tho it's rather reduced on 
the newer hardware.

I also happened to have a 4-3.5-in-3-5.25-slot drive enclosure that 
seemed to be making the problem worse, as when I first tried the new 2.5 
inch retrofitted into it, the reset problem was as bad with it as with 
the old drives, but when I ran it "loose", just cabled into the mobo and 
power-supply directly, resets went down significantly but did NOT go away.


So... I've now concluded that I need a new controller and will probably 
buy one in a day or two.

Meanwhile, I THOUGHT it was "just me" with the SIL-SATA controller, until 
I happened to see the same hardware mentioned on this thread.


Now, I'm beginning to suspect that there's some new kernel DMA or storage 
or perhaps xorg/mesa (AMD AGPGART, after all, handling the DMA using half 
the aperture; if either the graphics or storage try writing to the wrong 
half...) problem that stressed what was already aging hardware, 
triggering the problem.  It's worth noting that I tried running an older 
kernel and rebuilding (on Gentoo) most of 
X/mesa/anything-else-I-could-think-might-be-related between older 
versions that WERE working fine 
before and newer versions, and reverting to older didn't help, so it's 
apparently NOT a direct software-only-bug.  However, what I'm wondering 
now is whether as I said, software upgrades added stress to already aging 
hardware, such that it tipped it over the edge, and by the time I tried 
reverting, I'd already had enough crashes and etc that my entire system 
was unstable, and reverting to older software didn't help because now the 
hardware was unstable as well.

I'd still chalk it up to simply failing hardware, except that it's a 
rather interesting coincidence that both you and I had our SIL-SATA 
3114s go bad at very close to the same time.


Meanwhile, I did recently see an interesting kernel commit, either late 
3.4-rc5+ or early 3.4-rc6+.  I don't want to try to track it down and 
lose this post to a crash on a less than stable system, but it did 
mention that AMD AGPGARTs sometimes poked holes in memory allocations and 
the commit was to try to allow for that.  I'm not sure how long the bad 
code had been in the kernel, but if it was introduced at say the 3.2 or 
3.3 kernel, it could be that is what first started triggering the lockups 
that led to more and more system instability, until now I've bought a 
new drive and it looks like I'm going to need to replace the onboard 
SIL-SATA.

So, some questions:

* Do you run OpenGL/Mesa at all on that system, possibly with an OpenGL 
compositing window manager?

* If so, how new is your mesa and xorg-server, and what is your video 
card/driver?

* Do you run quite new kernels, say 3.3/3.4?

* What libffi and cairo? (I did notice reverting libffi seemed to lessen 
the crashing a bit, especially with firefox on my bank's SSL site, which 
was where the problem first became ugly for me as I kept crashing trying 
to get in to pay bills, etc, but I'm not positive that's related, or it 
might be that likely otherwise separate bug's crashes advanced the ATA-
resets issue too.)

* Perhaps most critically, is your system an old AMD with the AGPGART?

* Also, amd64/x86_64, x86 (32), or?

FWIW, amd64, KDE 4.8 here with kwin OpenGL compositing, generally leading 
edge mesa/xorg.  I run git kernels so am on pre-release 3.4 now, and was 
pre-release 3.3 before that, when the problem perhaps started.  (It 
seemed to get worse so I can't say for sure when it went from normal to 
getting gradually worse, but for sure it wasn't back in the 3.2 era as I 
was stable and happy back then.)  Radeon hd4650 card, freedomware drivers.

If any of that, especially the AGPGART, sounds familiar, we may have a 
hardware-burner bug that caught us both.  If you're running a bit older 
versions of all that stuff or no compositing/opengl, and have say an 
nVidia card and no AMD AGPGART, it's probably simply coincidence.  But if 
it's not, and we can catch and get this fixed before the folks running 
older software as well upgrade and start burning their SIL-SATAs...

(FWIW, I hadn't yet upgraded to btrfs at all when the trouble started 
happening here, tho I was looking at it, thus my being on the list.  I 
didn't trust the two-way-only btrfs raid1 mode on my older disks and was 
waiting on N-way raid1 mode, roadmapped for after raid-5/6 mode, which is 
now roadmapped for 3.5...  But with a new disk, eventually to add another 
for raid, I don't have that problem now, so with the upgrade I'm trying 
btrfs dual-metadata single-data on a few working partitions now, backup's 
still reiserfs, tho.)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: kernel 3.3.4 damages filesystem (?)
  2012-05-09 17:32 ` Duncan
@ 2012-05-09 18:06   ` Atila
  0 siblings, 0 replies; 54+ messages in thread
From: Atila @ 2012-05-09 18:06 UTC (permalink / raw)
  To: linux-btrfs

I don't know if this is related or not, but I updated two different 
computers to Ubuntu 12, which uses kernel 3.2, and on both I had the 
same problem: using btrfs with compress-force=lzo, after some IO stress 
the filesystem became unusable, some sort of busy.
I'm using kernel 3.0 right now, with no such problem.

On 09-05-2012 14:32, Duncan wrote:
> Helmut Hullen posted on Mon, 07 May 2012 12:46:00 +0200 as excerpted:
>
>> The 3 btrfs disks are connected via a SiI 3114 SATA-PCI-Controller.
>> Only 1 of the 3 disks seems to be damaged.
> I don't plan to rehash the raid0/single discussion here, but here's some
> perhaps useful additional information on that hardware:
>
>
> For some years I've been running that same hardware, SiI 3114 SATA PCI,
> on an old dual-socket 3-digit Opteron system, running for some years now
> dual dual-core Opteron 290s (the highest they went, 2.8 GHz, 4 cores in
> two sockets).  However, I *WAS* running them in RAID-1, 4-disk md RAID-1,
> to be exact (with reiserfs, FWIW).
>
>
> What's VERY interesting is that I've just returned from being offline for
> several days due to severe disk-I/O hardware issues of my own -- again,
> on that Sil-SATA 3114.
>
> Most of the time I was getting full system crashes, but perhaps 25-33% of
> the time it didn't fully crash the system, simply error out with an
> eventual ATA reset.  When the system didn't crash immediately, most of
> the time (about 80% I'd say) the reset would be good and I'd be back up,
> but sometimes it'd repeatedly reset, occasionally not ever becoming
> usable again.
>
> As the drives are all the same quite old Seagate 300 gig drives, at about
> half their rated SMART operating hours but I think well beyond the 5 year
> warrantee, I originally thought I'd just learned my lesson on the don't
> use all the same model or you're risking them all going out at once rule,
> but I bought a new drive (half-TB seagate 2.5" drive, I've been thinking
> about going 2.5" for awhile now and this was the chance, I'll RAID it
> later with at least one more, preferably a different run at least if not
> a different model) and have been SLOWLY, PAINFULLY, RESETTINGLY copying
> stuff over from one or another of the four RAID-1 drives.
>
> The reset problem, however, hasn't gone away, tho it's rather reduced on
> the newer hardware.
>
> I also happened to have a 4-3.5-in-3-5.25-slot drive enclosure that
> seemed to be making the problem worse, as when I first tried the new 2.5
> inch retrofitted into it, the reset problem was as bad with it as with
> the old drives, but when I ran it "lose", just cabled into the mobo and
> power-supply directly, resets went down significantly but did NOT go away.
>
>
> So... I've now concluded that I need a new controller and will probably
> buy one in a day or two.
>
> Meanwhile, I THOUGHT it was "just me" with the SIL-SATA controller, until
> I happened to see the same hardware mentioned on this thread.
>
>
> Now, I'm beginning to suspect that there's some new kernel DMA or storage
> or perhaps xorg/mesa (AMD AGPGART, after all, handling the DMA using half
> the aperture. if either the graphics or storage try writing to the wrong
> half...) problem that stressed what was already aging hardware,
> triggering the problem.  It's worth noting that I tried running an older
> kernel and rebuilding (on Gentoo) most of X/mesa/anything-else-I-could-
> think-might-be-related between older versions that WERE working find
> before and newer versions, and reverting to older didn't help, so it's
> apparently NOT a direct software-only-bug.  However, what I'm wondering
> now is whether as I said, software upgrades added stress to already aging
> hardware, such that it tipped it over the edge, and by the time I tried
> reverting, I'd already had enough crashes and etc that my entire system
> was unstable, and reverting to older software didn't help because now the
> hardware was unstable as well.
>
> I'd still chalk it up to simply failing hardware, except that it's a
> rather interesting coincidence that both you and I had their SIL-SATA
> 3114s go bad at very close to the same time.
>
>
> Meanwhile, I did recently see an interesting kernel commit, either late
> 3.4-rc5+ or early 3.4-rc6+.  I don't want to try to track it down and
> lose this post to a crash on a less than stable system, but it did
> mention that AMD AGPGARTs sometimes poked holes in memory allocations and
> the commit was to try to allow for that.  I'm not sure how long the bad
> code had been in the kernel, but if it was introduced at say the 3.2 or
> 3.3 kernel, it could be that is what first started triggering the lockups
> that lead to more and more system instability, until now I've bought a
> new drive and it looks like I'm going to need to replace the onboard SIL-
> SATA.
>
> So, some questions:
>
> * Do you run OpenGL/Mesa at all on that system, possibly with an OpenGL
> compositing window manager?
>
> * If so, how new is your mesa and xorg-server, and what is your video
> card/driver?
>
> * Do you run quite new kernels, say 3.3/3.4?
>
> * What libffi and cairo? (I did notice reverting libffi seemed to lessen
> the crashing a bit, especially with firefox on my bank's SSL site, which
> was where the problem first became ugly for me as I kept crashing trying
> to get in to pay bills, etc, but I'm not positive that's related, or it
> might be that likely otherwise separate bug's crashes advanced the ATA-
> resets issue too.)
>
> * Perhaps most critically, is your system an old AMD with the AGPGART?
>
> * Also, amd64/x86_64, x86 (32), or?
>
> FWIW, amd64, KDE 4.8 here with kwin OpenGL compositing, generally leading
> edge mesa/xorg.  I run git kernels so am on pre-release 3.4 now, and was
> pre-release 3.3 before that, when the problem perhaps started.  (It
> seemed to get worse so I can't say for sure when it went from normal to
> getting gradually worse, but for sure it wasn't back in the 3.2 era as I
> was stable and happy back then.)  Radeon hd4650 card, freedomware drivers.
>
> If any of that, especially the AGPGART, sounds familiar, we may have a
> hardware-burner bug that caught us both.  If you're running a bit older
> versions of all that stuff or no compositing/opengl, and have say an
> nVidia card and no AMD AGPGART, it's probably simply coincidence.  But if
> it's not, and we can catch and get this fixed before the folks running
> older software as well upgrade and start burning their SIL-SATAs...
>
> (FWIW, I hadn't yet upgraded to btrfs at all when the trouble started
> happening here, tho I was looking at it, thus my being on the list.  I
> didn't trust the two-way-only btrfs raid1 mode on my older disks and was
> waiting on N-way raid1 mode, roadmapped for after raid-5/6 mode, which is
> now roadmapped for 3.5...  But with a new disk, eventually to add another
> for raid, I don't have that problem now, so with the upgrade I'm trying
> btrfs dual-metadata single-data on a few working partitions now, backup's
> still reiserfs, tho.)
>


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: failed disk
  2012-05-09 15:33                     ` Hugo Mills
@ 2012-05-09 18:49                       ` Helmut Hullen
  0 siblings, 0 replies; 54+ messages in thread
From: Helmut Hullen @ 2012-05-09 18:49 UTC (permalink / raw)
  To: linux-btrfs

Hello, Hugo,

You wrote on 09.05.12:

>>>    As to the spurious "upgrade" of single to RAID-0, I thought Ilya
>>> had stopped it doing that. What kernel version are you running?

>> 3.2.9, self made.

>    OK, I'm pretty sure that's too old -- it will "upgrade" single to
> RAID-0. You can probably turn it back to "single" using balance
> filters:

> # btrfs fi balance -dconvert=single /mountpoint

> (You may want to write at least a little data to the FS first --
> balance has some slightly odd behaviour on empty filesystems).

"manana" ... the system is currently running "balance" after "device
delete", and that may still take 4 to 5 hours.

>>>    Out of interest, why did you do the device adds separately,
>>> instead of just this?

>> a) making the first 2 devices: I have tested both versions (one line
>> with 2 devices or 2 lines with 1 device); no big difference.
>>
>> But I had tested the option "-L" (labelling) too, and that makes
>> shit for the oneliner: both devices get the same label, and then
>> "findfs" finds none of them.

>    Umm... Yes, of course both devices will get the same label --
> you're labelling the filesystem, not the devices. (Didn't we have
> this argument some time ago?).

Not about that special case (and that led me to misinterpret the error
...).

>    I don't know what "findfs" is doing, that it can't find the
> filesystem by label: you may need to run "sync" after mkfs, possibly.

No - "findfs" works quite simply: if it finds exactly 1 matching label,
it reports the partition.
If it finds more than one, or none, it reports nothing.
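
The lookup rule described above can be modelled in a few lines. This is only a toy model of the behaviour as stated in this message, not the actual findfs implementation; the function name `find_by_label`, the device paths and the label "Scsi" are made up for illustration.

```python
from typing import Dict, Optional


def find_by_label(labels: Dict[str, str], wanted: str) -> Optional[str]:
    """Return the one device whose label equals `wanted`.

    `labels` maps a device path to its label (hypothetical input; the real
    tool scans the block devices itself). Returns None when no device or
    more than one device carries the label, as described in the message.
    """
    matches = [dev for dev, label in labels.items() if label == wanted]
    return matches[0] if len(matches) == 1 else None


# Both members of a two-device btrfs carry the same filesystem label,
# so under this rule the lookup reports nothing.
print(find_by_label({"/dev/sdb1": "Scsi", "/dev/sdc1": "Scsi"}, "Scsi"))
# A label that occurs exactly once is resolved to its device.
print(find_by_label({"/dev/sdb1": "Scsi", "/dev/sdd1": "Other"}, "Scsi"))
```

Under this model the one-line mkfs with "-L" would always produce the "nothing found" case, because the label belongs to the filesystem and therefore shows up on every member device.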

>> b) third device: that's my usual test:
>>         make a cluster of 2 devices
>>         fill them with data
>>         add a third device
>>         delete the smallest device

>    What are you testing? And by "delete" do you mean "btrfs dev
> delete" or "pull the cable out"?

First, a purely software delete. Tomorrow I'll reboot the system and look at
the results with

        btrfs fi show

It should show only 2 devices (that's the part which seems to have worked as
described at least since kernel 3.2).

By the way: it seems to be necessary to run

        btrfs fi balance ...

after "btrfs device add ..." and after "btrfs device delete ...".
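
The test sequence described in this message can be written down as an ordered command list. The sketch below only builds the command lines and does not run anything; the helper name `build_plan`, the device paths, the sizes and the mountpoint are assumptions for illustration, and the extra balance steps reflect the observation above rather than a documented requirement.

```python
from typing import Dict, List


def build_plan(initial: List[str], extra: str,
               sizes: Dict[str, int], mountpoint: str) -> List[str]:
    """Return the command lines for the add/delete test, without running them.

    `initial` holds the two starting devices, `extra` the device added later,
    and `sizes` maps every device to its size (any unit). The smallest device
    is the one deleted at the end.
    """
    first, second = initial          # the test starts with exactly 2 devices
    smallest = min(sizes, key=sizes.get)
    return [
        f"mkfs.btrfs {first} {second}",
        f"mount {first} {mountpoint}",
        # (fill the filesystem with data at this point)
        f"btrfs device add {extra} {mountpoint}",
        f"btrfs fi balance {mountpoint}",
        f"btrfs device delete {smallest} {mountpoint}",
        f"btrfs fi balance {mountpoint}",
        "btrfs fi show",
    ]


# Hypothetical devices and sizes (GB), printed for review only.
plan = build_plan(
    ["/dev/sdb1", "/dev/sdc1"], "/dev/sdd1",
    {"/dev/sdb1": 80, "/dev/sdc1": 180, "/dev/sdd1": 250},
    "/mnt/Scsi",
)
print("\n".join(plan))
```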

Best regards!
Helmut

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: failed disk
  2012-05-09 14:37                 ` Hugo Mills
  2012-05-09 15:14                   ` failed disk Helmut Hullen
  2012-05-09 16:13                   ` failed disk (was: kernel 3.3.4 damages filesystem (?)) Ilya Dryomov
@ 2012-05-10  2:49                   ` Helmut Hullen
  2 siblings, 0 replies; 54+ messages in thread
From: Helmut Hullen @ 2012-05-10  2:49 UTC (permalink / raw)
  To: linux-btrfs

Hello, Hugo,

You wrote on 09.05.12:

>>         btrfs fi df /mnt/Scsi
>>
>> now tells
>>
>> Data, RAID0: total=183.18GB, used=76.60GB
>> Data: total=80.01GB, used=79.83GB
>> System, DUP: total=8.00MB, used=32.00KB
>> System: total=4.00MB, used=0.00
>> Metadata, DUP: total=1.00GB, used=192.74MB
>> Metadata: total=8.00MB, used=0.00
>>
>> --------------------------------------
>>
>> "Data, RAID0" confuses me (not very much ...), and the profile used
>> for metadata (RAID1) is not shown.

>    DUP is two copies of each block, but it allows the two copies to
> live on the same device. It's done this because you started with a
> single device, and you can't do RAID-1 on one device. The first bit
> of metadata you write to it should automatically upgrade the DUP
> chunk to RAID-1.

It has done so - OK. Adding and removing disks/partitions works as
expected.
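
The difference between DUP and RAID-1 quoted above comes down to where the two copies of a block are allowed to live. The following is a toy model of that rule only, not btrfs code; the function names and device names are invented for illustration.

```python
from typing import List


def place_two_copies(profile: str, devices: List[str]) -> List[str]:
    """Return the devices that hold the two copies of one block."""
    if profile == "DUP":
        # Both copies may live on the same device.
        return [devices[0], devices[0]]
    if profile == "RAID1":
        if len(devices) < 2:
            raise ValueError("RAID1 needs at least two devices")
        # The two copies must live on different devices.
        return [devices[0], devices[1]]
    raise ValueError(f"unknown profile: {profile}")


def survives_loss(copies: List[str], lost: str) -> bool:
    """True if at least one copy is on a device other than `lost`."""
    return any(dev != lost for dev in copies)


dup = place_two_copies("DUP", ["sdb1"])
raid1 = place_two_copies("RAID1", ["sdb1", "sdc1"])
print(survives_loss(dup, "sdb1"), survives_loss(raid1, "sdb1"))
```

In this model DUP protects against a bad block but not against losing the whole device, which is why a filesystem created on a single device can only start with DUP.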

Best regards!
Helmut

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: kernel 3.3.4 damages filesystem (?)
  2012-05-09 14:46                                               ` Kaspar Schleiser
@ 2012-05-10 10:40                                                 ` Martin Steigerwald
  2012-05-10 11:55                                                   ` feature request (was: kernel 3.3.4 damages filesystem (?)) Helmut Hullen
  2012-05-10 19:43                                                   ` kernel 3.3.4 damages filesystem (?) Hubert Kario
  0 siblings, 2 replies; 54+ messages in thread
From: Martin Steigerwald @ 2012-05-10 10:40 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Kaspar Schleiser

On Wednesday, 9 May 2012, Kaspar Schleiser wrote:
> Hi,
> 
> On 05/08/2012 10:56 PM, Roman Mamedov wrote:
> > Regarding btrfs, AFAIK even "btrfs -d single" suggested above works
> > not "per file", but per allocation extent, so in case of one disk
> > failure you will lose random *parts* (extents) of random files,
> > which in effect could mean no file in your whole file system will
> > remain undamaged.
> 
> Maybe we should evaluate the possibility of such a "one file gets on
> one disk" feature.
> 
> Helmut Hullen has the use case: Many disks, totally non-critical but
> nice-to-have data. If one disk dies, some *files* should be lost, not some
> *random parts of all files*.
> 
> This could be accomplished by some userspace-tool that moves stuff
> around, combined with "file pinning"-support, that lets the user make
> sure a specific file is on a specific disk.

Yeah, basically I think that's the whole point Helmut is trying to make.

I am not sure whether that should be in userspace. It could be just an 
allocation mode like "raid0" or "single". Such as "single" as in one file 
is really on one disk and that's it.
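
To make the difference concrete, here is a small simulation of the two placement ideas discussed above. It is a deliberately simplified sketch: extents are spread round-robin as a stand-in for "wherever the allocator puts them", the per-file mode is the hypothetical feature being requested, and all names and file sizes are invented. It does not model real btrfs chunk allocation.

```python
from typing import Dict, List


def place_per_extent(files: Dict[str, int], n_disks: int) -> Dict[str, List[int]]:
    """Spread extents over the disks round-robin, ignoring file boundaries."""
    layout: Dict[str, List[int]] = {}
    next_extent = 0
    for name, n_extents in files.items():
        layout[name] = [(next_extent + i) % n_disks for i in range(n_extents)]
        next_extent += n_extents
    return layout


def place_per_file(files: Dict[str, int], n_disks: int) -> Dict[str, List[int]]:
    """Keep all extents of one file on one disk (the requested, hypothetical mode)."""
    return {
        name: [index % n_disks] * n_extents
        for index, (name, n_extents) in enumerate(files.items())
    }


def intact_files(layout: Dict[str, List[int]], failed_disk: int) -> List[str]:
    """Files that have no extent on the failed disk."""
    return sorted(name for name, disks in layout.items() if failed_disk not in disks)


# File name -> number of extents (made-up values), on 3 disks; disk 0 fails.
files = {"a.avi": 4, "b.avi": 3, "c.avi": 5, "d.avi": 3}
print(intact_files(place_per_extent(files, 3), failed_disk=0))
print(intact_files(place_per_file(files, 3), failed_disk=0))
```

With these inputs every file has at least one extent on the failed disk in the per-extent layout, while the per-file layout loses only the files that were placed on that disk.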

-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7

^ permalink raw reply	[flat|nested] 54+ messages in thread

* feature request (was: kernel 3.3.4 damages filesystem (?))
  2012-05-10 10:40                                                 ` Martin Steigerwald
@ 2012-05-10 11:55                                                   ` Helmut Hullen
  2012-05-10 19:43                                                   ` kernel 3.3.4 damages filesystem (?) Hubert Kario
  1 sibling, 0 replies; 54+ messages in thread
From: Helmut Hullen @ 2012-05-10 11:55 UTC (permalink / raw)
  To: linux-btrfs

Hello, Martin,

You wrote on 10.05.12:

[...]

>> Maybe we should evaluate the possibility of such a "one file gets
>> on one disk" feature.
>>
>> Helmut Hullen has the use case: Many disks, totally non-critical but
>> nice-to-have data. If one disk dies, some *files* should be lost, not
>> some *random parts of all files*.
>>
>> This could be accomplished by some userspace-tool that moves stuff
>> around, combined with "file pinning"-support, that lets the user
>> make sure a specific file is on a specific disk.

> Yeah, basically I think that's the whole point Helmut is trying to
> make.

Yes - that's the feature which I miss ...

> I am not sure whether that should be in userspace. It could be just
> an allocation mode like "raid0" or "single". Such as "single" as in
> one file is really on one disk and that's it.

What I'm dreaming of:

I have a bundle/cluster of (e.g.) 3 disks. When I remove 1 disk
(accidentally/planned/because of disk failure), I'd be very pleased
if the contents of the other disks were (mostly) still readable.

It's no fun restoring Terabytes ...

Yes - I know: that's no backup, that doesn't replace a backup.

Best regards!
Helmut

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: kernel 3.3.4 damages filesystem (?)
  2012-05-10 10:40                                                 ` Martin Steigerwald
  2012-05-10 11:55                                                   ` feature request (was: kernel 3.3.4 damages filesystem (?)) Helmut Hullen
@ 2012-05-10 19:43                                                   ` Hubert Kario
  2012-05-10 20:15                                                     ` Hugo Mills
  1 sibling, 1 reply; 54+ messages in thread
From: Hubert Kario @ 2012-05-10 19:43 UTC (permalink / raw)
  To: Martin Steigerwald; +Cc: linux-btrfs, Kaspar Schleiser

On Thursday 10 of May 2012 12:40:49 Martin Steigerwald wrote:
> On Wednesday, 9 May 2012, Kaspar Schleiser wrote:
> > Hi,
> > 
> > On 05/08/2012 10:56 PM, Roman Mamedov wrote:
> > > Regarding btrfs, AFAIK even "btrfs -d single" suggested above works
> > > not "per file", but per allocation extent, so in case of one disk
> > > failure you will lose random *parts* (extents) of random files,
> > > which in effect could mean no file in your whole file system will
> > > remain undamaged.
> > 
> > Maybe we should evaluate the possibility of such a "one file gets on
> > one disk" feature.
> > 
> > Helmut Hullen has the use case: Many disks, totally non-critical but
> > nice-to-have data. If one disk dies, some *files* should be lost, not
> > some *random parts of all files*.
> > 
> > This could be accomplished by some userspace-tool that moves stuff
> > around, combined with "file pinning"-support, that lets the user make
> > sure a specific file is on a specific disk.
> 
> Yeah, basically I think that's the whole point Helmut is trying to make.
> 
> I am not sure whether that should be in userspace. It could be just an
> allocation mode like "raid0" or "single". Such as "single" as in one file
> is really on one disk and that's it.

I was thinking that "linear" would be a good name for the old-style allocator.

Regards
-- 
Hubert Kario
QBS - Quality Business Software
02-656 Warszawa, ul. Ksawerów 30/85
tel. +48 (22) 646-61-51, 646-74-24
www.qbs.com.pl
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: kernel 3.3.4 damages filesystem (?)
  2012-05-10 19:43                                                   ` kernel 3.3.4 damages filesystem (?) Hubert Kario
@ 2012-05-10 20:15                                                     ` Hugo Mills
  2012-05-10 20:23                                                       ` Hubert Kario
  0 siblings, 1 reply; 54+ messages in thread
From: Hugo Mills @ 2012-05-10 20:15 UTC (permalink / raw)
  To: Hubert Kario; +Cc: Martin Steigerwald, linux-btrfs, Kaspar Schleiser

[-- Attachment #1: Type: text/plain, Size: 2257 bytes --]

On Thu, May 10, 2012 at 09:43:58PM +0200, Hubert Kario wrote:
> On Thursday 10 of May 2012 12:40:49 Martin Steigerwald wrote:
> > On Wednesday, 9 May 2012, Kaspar Schleiser wrote:
> > > Hi,
> > > 
> > > On 05/08/2012 10:56 PM, Roman Mamedov wrote:
> > > > Regarding btrfs, AFAIK even "btrfs -d single" suggested above works
> > > > not "per file", but per allocation extent, so in case of one disk
> > > > failure you will lose random *parts* (extents) of random files,
> > > > which in effect could mean no file in your whole file system will
> > > > remain undamaged.
> > > 
> > > Maybe we should evaluate the possibility of such a "one file gets on
> > > one disk" feature.
> > > 
> > > Helmut Hullen has the use case: Many disks, totally non-critical but
> > > nice-to-have data. If one disk dies, some *files* should be lost, not some
> > > *random parts of all files*.
> > > 
> > > This could be accomplished by some userspace-tool that moves stuff
> > > around, combined with "file pinning"-support, that lets the user make
> > > sure a specific file is on a specific disk.
> > 
> > Yeah, basically I think that's the whole point Helmut is trying to make.
> > 
> > I am not sure whether that should be in userspace. It could be just an
> > allocation mode like "raid0" or "single". Such as "single" as in one file
> > is really on one disk and that's it.
> 
> I was thinking that "linear" would be a good name for the old-style allocator.

   Please do distinguish between the replication level (e.g. "single",
"RAID-1") and the allocator algorithm. These are distinct. Also, note
that both of those work on the scale of chunks/block groups. There is
a further consideration, which is the allocation of file data to block
groups, which is a whole different thing again (and not something I
know a great deal about), but which will also affect the desired
outcome quite a lot.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- Anyone who claims their cryptographic protocol is secure is ---   
         either a genius or a fool.  Given the genius/fool ratio         
                 for our species,  the odds aren't good.                 

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 190 bytes --]

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: kernel 3.3.4 damages filesystem (?)
  2012-05-10 20:15                                                     ` Hugo Mills
@ 2012-05-10 20:23                                                       ` Hubert Kario
  0 siblings, 0 replies; 54+ messages in thread
From: Hubert Kario @ 2012-05-10 20:23 UTC (permalink / raw)
  To: Hugo Mills; +Cc: Martin Steigerwald, linux-btrfs, Kaspar Schleiser

On Thursday 10 of May 2012 21:15:30 Hugo Mills wrote:
> On Thu, May 10, 2012 at 09:43:58PM +0200, Hubert Kario wrote:
> > On Thursday 10 of May 2012 12:40:49 Martin Steigerwald wrote:
> > > On Wednesday, 9 May 2012, Kaspar Schleiser wrote:
> > > > Hi,
> > > > 
> > > > On 05/08/2012 10:56 PM, Roman Mamedov wrote:
> > > > > Regarding btrfs, AFAIK even "btrfs -d single" suggested above
> > > > > works not "per file", but per allocation extent, so in case of
> > > > > one disk failure you will lose random *parts* (extents) of
> > > > > random files, which in effect could mean no file in your whole
> > > > > file system will remain undamaged.
> > > > 
> > > > Maybe we should evaluate the possibility of such a "one file gets
> > > > on one disk" feature.
> > > > 
> > > > Helmut Hullen has the use case: Many disks, totally non-critical
> > > > but nice-to-have data. If one disk dies, some *files* should be
> > > > lost, not some *random parts of all files*.
> > > > 
> > > > This could be accomplished by some userspace-tool that moves
> > > > stuff around, combined with "file pinning"-support, that lets the
> > > > user make sure a specific file is on a specific disk.
> > > 
> > > Yeah, basically I think that's the whole point Helmut is trying to
> > > make.
> > > 
> > > I am not sure whether that should be in userspace. It could be just
> > > an allocation mode like "raid0" or "single". Such as "single" as in
> > > one file is really on one disk and that's it.
> > 
> > I was thinking that "linear" would be a good name for the old-style
> > allocator.
> 
>    Please do distinguish between the replication level (e.g. "single",
> "RAID-1") and the allocator algorithm. These are distinct. Also, note
> that both of those work on the scale of chunks/block groups. There is
> a further consideration, which is the allocation of file data to block
> groups, which is a whole different thing again (and not something I
> know a great deal about), but which will also affect the desired
> outcome quite a lot.

Yes, I know about that.

I was thinking more along the lines of "how to quickly restore the
availability of the old allocator".

Regards,
-- 
Hubert Kario
QBS - Quality Business Software
02-656 Warszawa, ul. Ksawerów 30/85
tel. +48 (22) 646-61-51, 646-74-24
www.qbs.com.pl

^ permalink raw reply	[flat|nested] 54+ messages in thread

end of thread, other threads:[~2012-05-10 20:23 UTC | newest]

Thread overview: 54+ messages
2012-05-07 10:46 kernel 3.3.4 damages filesystem (?) Helmut Hullen
2012-05-07 10:58 ` Fajar A. Nugraha
2012-05-07 12:06   ` Helmut Hullen
2012-05-07 10:59 ` Hugo Mills
2012-05-07 12:15   ` Helmut Hullen
2012-05-07 13:34   ` Helmut Hullen
2012-05-07 14:05     ` Hugo Mills
2012-05-07 16:36       ` Helmut Hullen
2012-05-07 17:13         ` Felix Blanke
2012-05-07 17:52           ` Helmut Hullen
2012-05-07 18:00             ` Hugo Mills
2012-05-07 18:25               ` Helmut Hullen
2012-05-07 18:44                 ` Hugo Mills
2012-05-09 13:04                   ` failed disk (was: kernel 3.3.4 damages filesystem (?)) Helmut Hullen
2012-05-09 13:19                     ` Hugo Mills
2012-05-09 14:25               ` Helmut Hullen
2012-05-09 14:37                 ` Hugo Mills
2012-05-09 15:14                   ` failed disk Helmut Hullen
2012-05-09 15:33                     ` Hugo Mills
2012-05-09 18:49                       ` Helmut Hullen
2012-05-09 16:13                   ` failed disk (was: kernel 3.3.4 damages filesystem (?)) Ilya Dryomov
2012-05-10  2:49                   ` failed disk Helmut Hullen
2012-05-07 19:30             ` kernel 3.3.4 damages filesystem (?) Daniel Lee
2012-05-07 20:21               ` Helmut Hullen
2012-05-07 20:51                 ` Daniel Lee
2012-05-07 21:17                   ` Helmut Hullen
2012-05-07 21:27                     ` cwillu
2012-05-07 22:07                 ` Martin Steigerwald
2012-05-08  7:39                   ` Helmut Hullen
2012-05-08  7:44                     ` Fajar A. Nugraha
2012-05-08 10:00                       ` Helmut Hullen
2012-05-08 10:41                         ` Clemens Eisserer
2012-05-08 13:13                           ` Helmut Hullen
2012-05-08 13:44                             ` Felix Blanke
2012-05-08 13:52                               ` Hugo Mills
2012-05-08 16:53                               ` Helmut Hullen
2012-05-08 17:24                                 ` Felix Blanke
2012-05-08 18:29                                   ` Helmut Hullen
2012-05-08 18:41                                     ` Felix Blanke
2012-05-08 19:12                                       ` David Sterba
2012-05-08 19:34                                       ` Helmut Hullen
2012-05-08 20:02                                         ` Hugo Mills
2012-05-08 20:19                                           ` Helmut Hullen
2012-05-08 20:56                                             ` Roman Mamedov
2012-05-09 14:46                                               ` Kaspar Schleiser
2012-05-10 10:40                                                 ` Martin Steigerwald
2012-05-10 11:55                                                   ` feature request (was: kernel 3.3.4 damages filesystem (?)) Helmut Hullen
2012-05-10 19:43                                                   ` kernel 3.3.4 damages filesystem (?) Hubert Kario
2012-05-10 20:15                                                     ` Hugo Mills
2012-05-10 20:23                                                       ` Hubert Kario
2012-05-08 21:42                         ` Hubert Kario
2012-05-07 12:53 ` Liu Bo
2012-05-09 17:32 ` Duncan
2012-05-09 18:06   ` Atila
