* [linux-lvm] pvresize will cause a meta-data corruption with error message "Error writing device at 4096 length 512"
@ 2019-09-11  9:17 Gang He
  2019-09-11 10:01 ` Ilia Zykov
                   ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Gang He @ 2019-09-11  9:17 UTC (permalink / raw)
To: LVM general discussion and development

Hello List,

Our user encountered a metadata corruption problem when running the pvresize command after upgrading LVM2 from v2.02.120 to v2.02.180.

The details are as below. We have the following environment:
- Storage: HP XP7 (SAN) - LUNs are presented to ESX via RDM
- VMware ESXi 6.5
- SLES 12 SP4 guest

The resize happened this way (it has been our standard procedure for years). However, this was our first resize after upgrading from SLES 12 SP3 to SLES 12 SP4; until this upgrade, we never had a problem like this:
- split continuous access on the storage box, resize the LUN on the XP7
- recreate ca on the XP7
- scan on ESX
- rescan-scsi-bus.sh -s on the SLES VM
- pvresize (at this step the error happened)

huns1vdb01:~ # pvresize /dev/disk/by-id/scsi-360060e80072a660000302a6600003274
  Error writing device /dev/sdaf at 4096 length 512.
  Failed to write mda header to /dev/sdaf fd -1
  Failed to update old PV extension headers in VG vghundbhulv_ar.
  Error writing device /dev/disk/by-id/scsi-360060e80072a660000302a66000031ec at 4096 length 512.
  Failed to write mda header to /dev/disk/by-id/scsi-360060e80072a660000302a66000031ec fd -1
  Failed to update old PV extension headers in VG vghundbhulk_ar.
  VG info not found after rescan of vghundbhulv_r2
  VG info not found after rescan of vghundbhula_r1
  VG info not found after rescan of vghundbhuco_ar
  Error writing device /dev/disk/by-id/scsi-360060e80072a660000302a66000031e8 at 4096 length 512.
  Failed to write mda header to /dev/disk/by-id/scsi-360060e80072a660000302a66000031e8 fd -1
  Failed to update old PV extension headers in VG vghundbhula_ar.
  VG info not found after rescan of vghundbhuco_r2
  Error writing device /dev/disk/by-id/scsi-360060e80072a660000302a660000300b at 4096 length 512.
  Failed to write mda header to /dev/disk/by-id/scsi-360060e80072a660000302a660000300b fd -1
  Failed to update old PV extension headers in VG vghundbhunrm02_r2.

Any idea about this bug?

Thanks a lot.
Gang

^ permalink raw reply	[flat|nested] 16+ messages in thread
* Re: [linux-lvm] pvresize will cause a meta-data corruption with error message "Error writing device at 4096 length 512"
  2019-09-11  9:17 [linux-lvm] pvresize will cause a meta-data corruption with error message "Error writing device at 4096 length 512" Gang He
@ 2019-09-11 10:01 ` Ilia Zykov
  2019-09-11 10:03 ` Ilia Zykov
  2019-10-11  8:11 ` Heming Zhao
  2 siblings, 0 replies; 16+ messages in thread
From: Ilia Zykov @ 2019-09-11 10:01 UTC (permalink / raw)
To: LVM general discussion and development, Gang He

[-- Attachment #1: Type: text/plain, Size: 2589 bytes --]

Maybe this?

Please note that this problem can also happen in other cases, such as
mixing disks with different block sizes (e.g. SCSI disks with a 512-byte
block size and s390x DASDs with a 4096-byte block size).

https://www.redhat.com/archives/linux-lvm/2019-February/msg00018.html

On 11.09.2019 12:17, Gang He wrote:
> Hello List,
>
> Our user encountered a meta-data corruption problem, when run pvresize command after upgrading to LVM2 v2.02.180 from v2.02.120.
>
> The details are as below,
> we have following environment:
> - Storage: HP XP7 (SAN) - LUN's are presented to ESX via RDM
> - VMWare ESXi 6.5
> - SLES 12 SP 4 Guest
> [...]
> huns1vdb01:~ # pvresize /dev/disk/by-id/scsi-360060e80072a660000302a6600003274
> Error writing device /dev/sdaf at 4096 length 512.
> Failed to write mda header to /dev/sdaf fd -1
> [...]
>
> Any idea for this bug?
>
> Thanks a lot.
> Gang

[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 3703 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread
* Re: [linux-lvm] pvresize will cause a meta-data corruption with error message "Error writing device at 4096 length 512"
  2019-09-11  9:17 [linux-lvm] pvresize will cause a meta-data corruption with error message "Error writing device at 4096 length 512" Gang He
  2019-09-11 10:01 ` Ilia Zykov
@ 2019-09-11 10:03 ` Ilia Zykov
  2019-09-11 10:10   ` Ingo Franzki
  2019-10-11  8:11 ` Heming Zhao
  2 siblings, 1 reply; 16+ messages in thread
From: Ilia Zykov @ 2019-09-11 10:03 UTC (permalink / raw)
To: LVM general discussion and development, Gang He

[-- Attachment #1: Type: text/plain, Size: 2587 bytes --]

Maybe this?

Please note that this problem can also happen in other cases, such as
mixing disks with different block sizes (e.g. SCSI disks with a 512-byte
block size and s390x DASDs with a 4096-byte block size).

https://www.redhat.com/archives/linux-lvm/2019-February/msg00018.html

On 11.09.2019 12:17, Gang He wrote:
> Hello List,
>
> Our user encountered a meta-data corruption problem, when run pvresize command after upgrading to LVM2 v2.02.180 from v2.02.120.
> [...]
>
> Any idea for this bug?
>
> Thanks a lot.
> Gang

[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 3695 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread
* Re: [linux-lvm] pvresize will cause a meta-data corruption with error message "Error writing device at 4096 length 512"
  2019-09-11 10:03 ` Ilia Zykov
@ 2019-09-11 10:10 ` Ingo Franzki
  2019-09-11 10:20   ` Gang He
  0 siblings, 1 reply; 16+ messages in thread
From: Ingo Franzki @ 2019-09-11 10:10 UTC (permalink / raw)
To: LVM general discussion and development, Ilia Zykov, Gang He

On 11.09.2019 12:03, Ilia Zykov wrote:
> Maybe this?
>
> Please note that this problem can also happen in other cases, such as
> mixing disks with different block sizes (e.g. SCSI disks with 512 bytes
> and s390x-DASDs with 4096 block size).
>
> https://www.redhat.com/archives/linux-lvm/2019-February/msg00018.html

And the fix for this is already available upstream (thanks, David!):
https://sourceware.org/git/?p=lvm2.git;a=commit;h=0404539edb25e4a9d3456bb3e6b402aa2767af6b

> On 11.09.2019 12:17, Gang He wrote:
>> Hello List,
>>
>> Our user encountered a meta-data corruption problem, when run pvresize command after upgrading to LVM2 v2.02.180 from v2.02.120.
>> [...]

-- 
Ingo Franzki
eMail: ifranzki@linux.ibm.com
Tel: ++49 (0)7031-16-4648
Fax: ++49 (0)7031-16-3456
Linux on IBM Z Development, Schoenaicher Str. 220, 71032 Boeblingen, Germany

IBM Deutschland Research & Development GmbH / Vorsitzender des Aufsichtsrats: Matthias Hartmann
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294
IBM DATA Privacy Statement: https://www.ibm.com/privacy/us/en/

^ permalink raw reply	[flat|nested] 16+ messages in thread
* Re: [linux-lvm] pvresize will cause a meta-data corruption with error message "Error writing device at 4096 length 512"
  2019-09-11 10:10 ` Ingo Franzki
@ 2019-09-11 10:20 ` Gang He
  0 siblings, 0 replies; 16+ messages in thread
From: Gang He @ 2019-09-11 10:20 UTC (permalink / raw)
To: Ingo Franzki, Ilia Zykov, LVM general discussion and development

Hi Ingo and Ilia,

Thanks for your help.

> -----Original Message-----
> From: Ingo Franzki [mailto:ifranzki@linux.ibm.com]
> Sent: September 11, 2019 18:11
> To: Ilia Zykov <mail@izyk.ru>; LVM general discussion and development
> <linux-lvm@redhat.com>; Gang He <GHe@suse.com>
> Subject: Re: [linux-lvm] pvresize will cause a meta-data corruption with error
> message "Error writing device at 4096 length 512"
>
> On 11.09.2019 12:03, Ilia Zykov wrote:
> > Maybe this?
> >
> > Please note that this problem can also happen in other cases, such as
> > mixing disks with different block sizes (e.g. SCSI disks with 512
> > bytes and s390x-DASDs with 4096 block size).
> >
> > https://www.redhat.com/archives/linux-lvm/2019-February/msg00018.html
>
> And the fix for this is already available upstream (Thanks David!):
> https://sourceware.org/git/?p=lvm2.git;a=commit;h=0404539edb25e4a9d3456bb3e6b402aa2767af6b

Can this commit fix the problem thoroughly? Do we need any other patches on top of v2.02.180?

Thanks
Gang

> [...]

^ permalink raw reply	[flat|nested] 16+ messages in thread
* Re: [linux-lvm] pvresize will cause a meta-data corruption with error message "Error writing device at 4096 length 512"
  2019-09-11  9:17 [linux-lvm] pvresize will cause a meta-data corruption with error message "Error writing device at 4096 length 512" Gang He
  2019-09-11 10:01 ` Ilia Zykov
  2019-09-11 10:03 ` Ilia Zykov
@ 2019-10-11  8:11 ` Heming Zhao
  2019-10-11  9:22   ` Heming Zhao
  2019-10-11 15:14   ` David Teigland
  2 siblings, 2 replies; 16+ messages in thread
From: Heming Zhao @ 2019-10-11 8:11 UTC (permalink / raw)
To: LVM general discussion and development, Gang He

Hello list,

I have analyzed this issue for some days. It looks like a new bug.

Trigger steps: the user executed pvresize to enlarge the PV. After the
command execution, one disk's LVM metadata had been overwritten with
another disk's LVM metadata.

In one log (from the pvresize execution), seven disks hit read/write failures:
```
scsi-360060e80072a670000302a670000fc68
scsi-360060e80072a670000302a670000fc67
scsi-360060e80072a670000302a670000fc66
scsi-360060e80072a660000302a660000f74c
scsi-360060e80072a660000302a660000f74a
scsi-360060e80072a660000302a660000f749
scsi-360060e80072a660000302a660000f748   (has fc68's metadata)
```
The f748 metadata was overwritten with fc68's. The error logs for fc67, fc66, f74c, f74a, f749 and f748 are the same:
```
#toollib.c:4377       Processing PVs in VG vgpocdbcdb1_r1
#locking/locking.c:331   Dropping cache for vgpocdbcdb1_r1.
#misc/lvm-flock.c:202    Locking /run/lvm/lock/V_vgpocdbcdb1_r1 WB
#misc/lvm-flock.c:100    _do_flock /run/lvm/lock/V_vgpocdbcdb1_r1:aux WB
#misc/lvm-flock.c:100    _do_flock /run/lvm/lock/V_vgpocdbcdb1_r1 WB
#misc/lvm-flock.c:47     _undo_flock /run/lvm/lock/V_vgpocdbcdb1_r1:aux
#metadata/metadata.c:3778   Reading VG vgpocdbcdb1_r1 tTwjvG-xxxx-FA0cJj
#metadata/metadata.c:3874   Rescanning devices for vgpocdbcdb1_r1
#cache/lvmcache.c:751    lvmcache has no info for vgname "vgpocdbcdb1_r1" with VGID tTwjvGfl1zsU6gODANVsela1siFA0cJj.
#label/label.c:629       Scanning 1 devices for VG info
#label/label.c:665       Scanning submitted 1 reads
#label/label.c:674       Scan failed to read /dev/disk/by-id/scsi-360060e80072a670000302a670000fc67 error 0.
#device/bcache.c:189     WRITE last fd 36 last_offset 4608 last_sector_size 512
#device/bcache.c:244     Limit write at 0 len 131072 to len 4608
#label/label.c:764       Scanned devices: read errors 1 process errors 0 failed 1
#cache/lvmcache.c:751    lvmcache has no info for vgname "vgpocdbcdb1_r1" with VGID tTwjvGfl1zsU6gODANVsela1siFA0cJj.
#cache/lvmcache.c:1410   VG info not found after rescan of vgpocdbcdb1_r1
#cache/lvmcache.c:751    lvmcache has no info for vgname "vgpocdbcdb1_r1" with VGID tTwjvGfl1zsU6gODANVsela1siFA0cJj.
#metadata/metadata.c:3884   Cache did not find fmt for vgname vgpocdbcdb1_r1
#metadata/metadata.c:3885   <backtrace>
#metadata/metadata.c:4518   <backtrace>
```
The fc68 error log is in the subsection <1> below. From all the log files, the user's disks show three classes of issues:

1> The disk (fc68) has an old LVM extension header. This triggers an LVM write to update the PV header (metadata area). Related log:
```
#format_text/text_label.c:423   /dev/disk/by-id/scsi-360060e80072a670000302a670000fc68: PV header extension version 1 found
... ...
#metadata/metadata.c:2842   PV /dev/disk/by-id/scsi-360060e80072a670000302a670000fc68 has old extension header, updating to newest version.
```
On the user's machine this write failed; the PV header data (first 4K) stayed in bcache (on the cache->errored list) and was then written (by bcache_flush) to another disk (f748). Related error log:
```
#format_text/format-text.c:1470   Creating metadata area on /dev/disk/by-id/scsi-360060e80072a670000302a670000fc68 at sector 8 size 2040 sectors
#device/bcache.c:189     WRITE last fd 36 last_offset 4608 last_sector_size 512
#device/bcache.c:244     Limit write at 0 len 131072 to len 4608
#label/label.c:1333      Error writing device /dev/disk/by-id/scsi-360060e80072a670000302a670000fc68 at 4096 length 512.
#format_text/format-text.c:407   Failed to write mda header to /dev/disk/by-id/scsi-360060e80072a670000302a670000fc68 fd -1
```
Related code path:
```
pvresize
  process_each_pv
    _process_pvs_in_vgs
      vg_read
        vg_read_internal
          _vg_read
            _vg_update_old_pv_ext_if_needed
              +-> pv_needs_rewrite is 1
              |   set vg->pv_write_list
              +-> vg_write

vg_write
  pv_write                 // vg->pv_write_list is not empty
    pv->fmt->ops->pv_write
      _text_pv_write
        lvmcache_foreach_mda(info, _write_single_mda, &baton)
          _raw_write_mda_header

static int _raw_write_mda_header()
{
    ... ...
    dev_set_last_byte(dev, start_byte + MDA_HEADER_SIZE);
    if (!dev_write_bytes(dev, start_byte, MDA_HEADER_SIZE, mdah)) {
        dev_unset_last_byte(dev);   // zhm: useless, fd = -1 now!!
        return 0;
    }
    dev_unset_last_byte(dev);
    return 1;
}
```
If dev_write_bytes fails, bcache never clears last_byte. The fd is closed at the same time, but cache->errored still holds the errored fd's data. When lvm later opens a new disk, that disk may reuse the old, errored fd number, and the errored data is written out when lvm later calls bcache_flush.

2> Duplicated PV header. As described in <1>, fc68's metadata was written onto f748. This is caused by the lvm bug described in <1>.

3> Device not correct. I don't know why the disk scsi-360060e80072a670000302a670000fc68 has the wrong metadata below:

pre_pvr/scsi-360060e80072a670000302a670000fc68
(please also read the comments in the metadata area below.)
```
vgpocdbcdb1_r2 {
    id = "PWd17E-xxx-oANHbq"
    seqno = 20
    format = "lvm2"
    status = ["RESIZEABLE", "READ", "WRITE"]
    flags = []
    extent_size = 65536
    max_lv = 0
    max_pv = 0
    metadata_copies = 0

    physical_volumes {

        pv0 {
            id = "3KTOW5-xxxx-8g0Rf2"
            device = "/dev/disk/by-id/scsi-360060e80072a660000302a660000f768"
                     Wrong!!                                            ^^^^
                     I don't know why there is f768, please ask the customer
            status = ["ALLOCATABLE"]
            flags = []
            dev_size = 860160
            pe_start = 2048
            pe_count = 13
        }
    }
```
fc68 => f768: the 'c' (b1100) changed to '7' (b0111). Maybe bits flipped on disk, maybe lvm has a bug. I don't know and have no idea.
Thanks
zhm

On 9/11/19 5:17 PM, Gang He wrote:
> Hello List,
>
> Our user encountered a meta-data corruption problem, when run pvresize command after upgrading to LVM2 v2.02.180 from v2.02.120.
> [...]
> huns1vdb01:~ # pvresize /dev/disk/by-id/scsi-360060e80072a660000302a6600003274

^ permalink raw reply	[flat|nested] 16+ messages in thread
* Re: [linux-lvm] pvresize will cause a meta-data corruption with error message "Error writing device at 4096 length 512"
  2019-10-11  8:11 ` Heming Zhao
@ 2019-10-11  9:22 ` Heming Zhao
  2019-10-11 10:38   ` Zdenek Kabelac
  0 siblings, 1 reply; 16+ messages in thread
From: Heming Zhao @ 2019-10-11 9:22 UTC (permalink / raw)
To: LVM general discussion and development, Gang He

Only one thing still confuses me. On a read/write error, lvm calls
bcache_invalidate_fd & _scan_dev_close to close the fd. So the first
successful read after the last failure (i.e. f747, which follows f748)
should end up with fc68's fd. That would cause f747's metadata to be
overwritten, not f748's.

The sequence of disk scanning:
```
scsi-360060e80072a670000302a670000fc69                        <=== successful
scsi-360060e80072a670000302a670000fc68                        <=== first failed
scsi-360060e80072a670000302a670000fc67
scsi-360060e80072a670000302a670000fc66
scsi-360060e80072a660000302a660000f74c
scsi-360060e80072a660000302a660000f74a
scsi-360060e80072a660000302a660000f749
scsi-360060e80072a660000302a660000f748  (has fc68 metadata)   <=== last failed
scsi-360060e80072a660000302a660000f747                        <=== first successful read following the last failure
```
I hope my explanation is clear.

On 10/11/19 4:11 PM, Heming Zhao wrote:
> Hello list,
>
> I analyze this issue for some days. It looks a new bug.
>
> trigger steps:
> user execute pvresize to enlarge the pv.
> After the command execution, one disk lvm metadata was overwrite by another disk lvm metadata.
>
> once log (execute pvresize cmd), there are 7 disk occur read/write failed:
> ```
> scsi-360060e80072a670000302a670000fc68
> scsi-360060e80072a670000302a670000fc67
> scsi-360060e80072a670000302a670000fc66
> scsi-360060e80072a660000302a660000f74c
> scsi-360060e80072a660000302a660000f74a
> scsi-360060e80072a660000302a660000f749
> scsi-360060e80072a660000302a660000f748 (has fc68 metadata)
> ```
> the f748 metadata was overwritten by fc68.
> [...]

^ permalink raw reply	[flat|nested] 16+ messages in thread
* Re: [linux-lvm] pvresize will cause a meta-data corruption with error message "Error writing device at 4096 length 512"
  2019-10-11  9:22 ` Heming Zhao
@ 2019-10-11 10:38 ` Zdenek Kabelac
  2019-10-11 11:50   ` Heming Zhao
  0 siblings, 1 reply; 16+ messages in thread
From: Zdenek Kabelac @ 2019-10-11 10:38 UTC (permalink / raw)
To: LVM general discussion and development, Heming Zhao, Gang He

On 11. 10. 19 at 11:22, Heming Zhao wrote:
> Only one thing I am confusion all the time.
> When read/write error, lvm will call bcache_invalidate_fd & _scan_dev_close to close fd.
> So the first successfully read (i.e.: f747), which following f748 finally has fc68's fd.
> This will cause f747 metadata overwrite not f748.

Hi

Have you considered checking a newer version of lvm2?

Regards

Zdenek

^ permalink raw reply	[flat|nested] 16+ messages in thread
* Re: [linux-lvm] pvresize will cause a meta-data corruption with error message "Error writing device at 4096 length 512"
  2019-10-11 10:38 ` Zdenek Kabelac
@ 2019-10-11 11:50 ` Heming Zhao
  0 siblings, 0 replies; 16+ messages in thread
From: Heming Zhao @ 2019-10-11 11:50 UTC (permalink / raw)
To: Zdenek Kabelac, LVM general discussion and development, Gang He

For _raw_write_mda_header(), the latest code is the same as in stable-2.02.
And the usage below is wrong and should be fixed:
```
if (!dev_write_bytes(mdac->area.dev, write1_start, (size_t)write1_size,
                     write_buf)) {
    ... ...
    dev_unset_last_byte(mdac->area.dev);  /* <==== invalid code: by this
                                             time the fd has been released */
    ... ...
}
```
This issue only happened on our customer's machine, after updating lvm2
from 2.02.120 (no bcache code) to 2.02.180 (contains bcache).

Thanks
zhm

On 10/11/19 6:38 PM, Zdenek Kabelac wrote:
> On 11. 10. 19 at 11:22, Heming Zhao wrote:
>> Only one thing I am confusion all the time.
>> When read/write error, lvm will call bcache_invalidate_fd & _scan_dev_close to close fd.
>> So the first successfully read (i.e.: f747), which following f748 finally has fc68's fd.
>> This will cause f747 metadata overwrite not f748.
>
> Hi
>
> Have you considered checking newer version of lvm2?
>
> Regards
>
> Zdenek

^ permalink raw reply	[flat|nested] 16+ messages in thread
* Re: [linux-lvm] pvresize will cause a meta-data corruption with error message "Error writing device at 4096 length 512"
  2019-10-11  8:11 ` Heming Zhao
  2019-10-11  9:22 ` Heming Zhao
@ 2019-10-11 15:14 ` David Teigland
  2019-10-12  3:23   ` Gang He
  2019-10-12  6:34   ` Heming Zhao
  1 sibling, 2 replies; 16+ messages in thread
From: David Teigland @ 2019-10-11 15:14 UTC (permalink / raw)
To: Heming Zhao; +Cc: Gang He, linux-lvm

On Fri, Oct 11, 2019 at 08:11:29AM +0000, Heming Zhao wrote:

> I analyze this issue for some days. It looks a new bug.

Yes, thanks for the thorough analysis.

> In user machine, this write action was failed, the PV header data (first
> 4K) save in bcache (cache->errored list), and then write (by
> bcache_flush) to another disk (f748).

It looks like we need to get rid of cache->errored completely.

> If dev_write_bytes failed, the bcache never clean last_byte. and the fd
> is closed at same time, but cache->errored still have errored fd's data.
> later lvm open new disk, the fd may reuse the old-errored fd number,
> error data will be written when later lvm call bcache_flush.

That's a bad bug.

> 2> duplicated pv header.
> as <1> description, fc68 metadata was overwritten to f748.
> this cause by lvm bug (I said in <1>).
>
> 3> device not correct
> I don't know why the disk scsi-360060e80072a670000302a670000fc68 has below wrong metadata:
>
> pre_pvr/scsi-360060e80072a670000302a670000fc68
> ```
> [...]
> pv0 {
>     id = "3KTOW5-xxxx-8g0Rf2"
>     device = "/dev/disk/by-id/scsi-360060e80072a660000302a660000f768"
>              Wrong!!
> [...]
> ```
> fc68 => f768 the 'c' (b1100) change to '7' (b0111).
> maybe disk bit overturn, maybe lvm has bug. I don't know & have no idea.

Is scsi-360060e80072a660000302a660000f768 the correct device for PVID
3KTOW5...?  If so, then it's consistent.  If not, then I suspect this is
a result of duplicating the PVID on multiple devices above.

> On 9/11/19 5:17 PM, Gang He wrote:
> > Hello List,
> >
> > Our user encountered a meta-data corruption problem, when run pvresize command after upgrading to LVM2 v2.02.180 from v2.02.120.
> > [...]
> > huns1vdb01:~ # pvresize /dev/disk/by-id/scsi-360060e80072a660000302a6600003274

^ permalink raw reply	[flat|nested] 16+ messages in thread
* Re: [linux-lvm] pvresize will cause a meta-data corruption with error message "Error writing device at 4096 length 512" 2019-10-11 15:14 ` David Teigland @ 2019-10-12 3:23 ` Gang He 2019-10-12 6:34 ` Heming Zhao 1 sibling, 0 replies; 16+ messages in thread From: Gang He @ 2019-10-12 3:23 UTC (permalink / raw) To: David Teigland, Heming Zhao; +Cc: linux-lvm Hello David, Based on the information from Heming, do you think this is a new bug? Or we can fix it with the existing patches. Now, the user want to restore the LVM2 meta-data back to the original status, do you have any suggestions? Thanks Gang > -----Original Message----- > From: David Teigland [mailto:teigland@redhat.com] > Sent: 2019��10��11�� 23:14 > To: Heming Zhao <heming.zhao@suse.com> > Cc: linux-lvm@redhat.com; Gang He <GHe@suse.com> > Subject: Re: [linux-lvm] pvresize will cause a meta-data corruption with error > message "Error writing device at 4096 length 512" > > On Fri, Oct 11, 2019 at 08:11:29AM +0000, Heming Zhao wrote: > > > I analyze this issue for some days. It looks a new bug. > > Yes, thanks for the thorough analysis. > > > In user machine, this write action was failed, the PV header data > > (first > > 4K) save in bcache (cache->errored list), and then write (by > > bcache_flush) to another disk (f748). > > It looks like we need to get rid of cache->errored completely. > > > If dev_write_bytes failed, the bcache never clean last_byte. and the > > fd is closed at same time, but cache->errored still have errored fd's data. > > later lvm open new disk, the fd may reuse the old-errored fd number, > > error data will be written when later lvm call bcache_flush. > > That's a bad bug. > > > 2> duplicated pv header. > > as <1> description, fc68 metadata was overwritten to f748. > > this cause by lvm bug (I said in <1>). 
> > > > 3> device not correct > > I don't know why the disk > scsi-360060e80072a670000302a670000fc68 has below wrong metadata: > > > > pre_pvr/scsi-360060e80072a670000302a670000fc68 > > (please also read the comments in below metadata area.) ``` > > vgpocdbcdb1_r2 { > > id = "PWd17E-xxx-oANHbq" > > seqno = 20 > > format = "lvm2" > > status = ["RESIZEABLE", "READ", "WRITE"] > > flags = [] > > extent_size = 65536 > > max_lv = 0 > > max_pv = 0 > > metadata_copies = 0 > > > > physical_volumes { > > > > pv0 { > > id = "3KTOW5-xxxx-8g0Rf2" > > device = > "/dev/disk/by-id/scsi-360060e80072a660000302a660000f768" > > > Wrong!! ^^^^^ > > I don't know why there is f768, please ask > customer > > status = ["ALLOCATABLE"] > > flags = [] > > dev_size = 860160 > > pe_start = 2048 > > pe_count = 13 > > } > > } > > ``` > > fc68 => f768 the 'c' (b1100) change to '7' (b0111). > > maybe disk bit overturn, maybe lvm has bug. I don't know & have no > idea. > > Is scsi-360060e80072a660000302a660000f768 the correct device for PVID > 3KTOW5...? If so, then it's consistent. If not, then I suspect this is a result of > duplicating the PVID on multiple devices above. > > > > On 9/11/19 5:17 PM, Gang He wrote: > > > Hello List, > > > > > > Our user encountered a meta-data corruption problem, when run > pvresize command after upgrading to LVM2 v2.02.180 from v2.02.120. 
> > > > > > The details are as below, > > > we have following environment: > > > - Storage: HP XP7 (SAN) - LUN's are presented to ESX via RDM > > > - VMWare ESXi 6.5 > > > - SLES 12 SP 4 Guest > > > > > > Resize happened this way (is our standard way since years) - however > > > - this is our first resize after upgrading SLES 12 SP3 to SLES 12 SP4 - until > this upgrade, we never had a problem like this: > > > - split continous access on storage box, resize lun on XP7 > > > - recreate ca on XP7 > > > - scan on ESX > > > - rescan-scsi-bus.sh -s on SLES VM > > > - pvresize ( at this step the error happened) > > > > > > huns1vdb01:~ # pvresize > > > /dev/disk/by-id/scsi-360060e80072a660000302a6600003274 > > > > _______________________________________________ > > linux-lvm mailing list > > linux-lvm@redhat.com > > https://www.redhat.com/mailman/listinfo/linux-lvm > > read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/ ^ permalink raw reply [flat|nested] 16+ messages in thread
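The fd-reuse failure mode discussed in this thread (an errored block keyed by a closed fd number getting flushed to whichever device later reuses that number) can be modeled in a few lines. This is an illustrative Python sketch, not LVM code; all names are hypothetical:

```python
# Minimal model: a write cache that keys errored blocks only by fd number.
# When the fd is closed and the number is reused by another device,
# a later flush sends the stale block to the wrong device.

class FdKeyedCache:
    def __init__(self):
        self.errored = {}   # fd -> block parked after a failed write
        self.devices = {}   # fd -> device currently holding that fd number

    def open_dev(self, name):
        fd = min(set(range(100)) - set(self.devices))  # lowest free number, like POSIX open()
        self.devices[fd] = name
        return fd

    def close_dev(self, fd):
        del self.devices[fd]          # bug: self.errored[fd] is NOT dropped here

    def write(self, fd, block, ok=True):
        if not ok:
            self.errored[fd] = block  # parked for retry, keyed only by fd

    def flush(self):
        # Retry errored blocks; if the fd number was recycled,
        # the block lands on whatever device owns the fd now.
        written = [(self.devices[fd], block) for fd, block in self.errored.items()]
        self.errored.clear()
        return written

cache = FdKeyedCache()
fd = cache.open_dev("fc68")
cache.write(fd, "fc68-pv-header", ok=False)  # write fails; block parked under fd
cache.close_dev(fd)                          # fd number freed, parked block forgotten
fd2 = cache.open_dev("f747")                 # new device reuses the same fd number
corrupted = cache.flush()                    # stale fc68 header goes to f747
```

The sketch only shows why fd numbers make a poor cache key once close/reopen enters the picture; the real bcache is of course far more involved.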
* Re: [linux-lvm] pvresize will cause a meta-data corruption with error message "Error writing device at 4096 length 512" 2019-10-11 15:14 ` David Teigland 2019-10-12 3:23 ` Gang He @ 2019-10-12 6:34 ` Heming Zhao 2019-10-12 7:11 ` Heming Zhao 1 sibling, 1 reply; 16+ messages in thread From: Heming Zhao @ 2019-10-12 6:34 UTC (permalink / raw) To: David Teigland; +Cc: Gang He, linux-lvm Hello David, Thank you for your reply. For these days analysis code, I found below codes can be enhanced. (code changes base on git master branch.) --------------- commit 3768196011fb01e4016510bfab9eef0c7bdc04f5 (HEAD -> master) Author: Zhao Heming <heming.zhao@suse.com> Date: Sat Oct 12 14:28:06 2019 +0800 fix typo in lib/cache/lvmcache.c enhance error handling in bcache fix constant var 'error' in _scan_list fix gcc warning in _lvconvert_split_cache_single Signed-off-by: Zhao Heming <heming.zhao@suse.com> diff --git a/lib/cache/lvmcache.c b/lib/cache/lvmcache.c index f6e792459b..499f9437cb 100644 --- a/lib/cache/lvmcache.c +++ b/lib/cache/lvmcache.c @@ -939,7 +939,7 @@ int lvmcache_label_rescan_vg_rw(struct cmd_context *cmd, const char *vgname, con * incorrectly placed PVs should have been moved from the orphan vginfo * onto their correct vginfo's, and the orphan vginfo should (in theory) * represent only real orphan PVs. (Note: if lvmcache_label_scan is run - * after vg_read udpates to lvmcache state, then the lvmcache will be + * after vg_read updates to lvmcache state, then the lvmcache will be * incorrect again, so do not run lvmcache_label_scan during the * processing phase.) * diff --git a/lib/device/bcache.c b/lib/device/bcache.c index d100419770..cfe01bac2f 100644 --- a/lib/device/bcache.c +++ b/lib/device/bcache.c @@ -292,6 +292,10 @@ static bool _async_issue(struct io_engine *ioe, enum dir d, int fd, } while (r == -EAGAIN); if (r < 0) { + ((struct block *)context)->error = r; + log_warn("io_submit <%c> off %llu bytes %llu return %d:%s", + (d == DIR_READ) ? 
'R' : 'W', (long long unsigned)offset, + (long long unsigned)nbytes, r, strerror(-r)); _cb_free(e->cbs, cb); return false; } @@ -842,7 +846,7 @@ static void _complete_io(void *context, int err) if (b->error) { dm_list_add(&cache->errored, &b->list); - + log_warn("fd: %d error: %d", b->fd, err); } else { _clear_flags(b, BF_DIRTY); _link_block(b); @@ -869,8 +873,7 @@ static void _issue_low_level(struct block *b, enum dir d) dm_list_move(&cache->io_pending, &b->list); if (!cache->engine->issue(cache->engine, d, b->fd, sb, se, b->data, b)) { - /* FIXME: if io_submit() set an errno, return that instead of EIO? */ - _complete_io(b, -EIO); + _complete_io(b, b->error); return; } } diff --git a/lib/label/label.c b/lib/label/label.c index dc4d32d151..60ad387219 100644 --- a/lib/label/label.c +++ b/lib/label/label.c @@ -647,7 +647,6 @@ static int _scan_list(struct cmd_context *cmd, struct dev_filter *f, int submit_count; int scan_failed; int is_lvm_device; - int error; int ret; dm_list_init(&wait_devs); @@ -694,12 +693,12 @@ static int _scan_list(struct cmd_context *cmd, struct dev_filter *f, dm_list_iterate_items_safe(devl, devl2, &wait_devs) { bb = NULL; - error = 0; scan_failed = 0; is_lvm_device = 0; if (!bcache_get(scan_bcache, devl->dev->bcache_fd, 0, 0, &bb)) { - log_debug_devs("Scan failed to read %s error %d.", dev_name(devl->dev), error); + log_debug_devs("Scan failed to read %s error %d.", + dev_name(devl->dev), bb ? 
bb->error : 0); scan_failed = 1; scan_read_errors++; scan_failed_count++; diff --git a/tools/lvconvert.c b/tools/lvconvert.c index 60ab956614..4939e5ec7d 100644 --- a/tools/lvconvert.c +++ b/tools/lvconvert.c @@ -4676,7 +4676,7 @@ static int _lvconvert_split_cache_single(struct cmd_context *cmd, struct logical_volume *lv_main = NULL; struct logical_volume *lv_fast = NULL; struct lv_segment *seg; - int ret; + int ret = 0; if (lv_is_writecache(lv)) { lv_main = lv; --- Thanks zhm On 10/11/19 11:14 PM, David Teigland wrote: > On Fri, Oct 11, 2019 at 08:11:29AM +0000, Heming Zhao wrote: > >> I analyze this issue for some days. It looks a new bug. > > Yes, thanks for the thorough analysis. > >> In user machine, this write action was failed, the PV header data (first >> 4K) save in bcache (cache->errored list), and then write (by >> bcache_flush) to another disk (f748). > > It looks like we need to get rid of cache->errored completely. > >> If dev_write_bytes failed, the bcache never clean last_byte. and the fd >> is closed at same time, but cache->errored still have errored fd's data. >> later lvm open new disk, the fd may reuse the old-errored fd number, >> error data will be written when later lvm call bcache_flush. > > That's a bad bug. > >> 2> duplicated pv header. >> as <1> description, fc68 metadata was overwritten to f748. >> this cause by lvm bug (I said in <1>). >> >> 3> device not correct >> I don't know why the disk scsi-360060e80072a670000302a670000fc68 has below wrong metadata: >> >> pre_pvr/scsi-360060e80072a670000302a670000fc68 >> (please also read the comments in below metadata area.) >> ``` >> vgpocdbcdb1_r2 { >> id = "PWd17E-xxx-oANHbq" >> seqno = 20 >> format = "lvm2" >> status = ["RESIZEABLE", "READ", "WRITE"] >> flags = [] >> extent_size = 65536 >> max_lv = 0 >> max_pv = 0 >> metadata_copies = 0 >> >> physical_volumes { >> >> pv0 { >> id = "3KTOW5-xxxx-8g0Rf2" >> device = "/dev/disk/by-id/scsi-360060e80072a660000302a660000f768" >> Wrong!! 
^^^^^ >> I don't know why there is f768, please ask customer >> status = ["ALLOCATABLE"] >> flags = [] >> dev_size = 860160 >> pe_start = 2048 >> pe_count = 13 >> } >> } >> ``` >> fc68 => f768 the 'c' (b1100) change to '7' (b0111). >> maybe disk bit overturn, maybe lvm has bug. I don't know & have no idea. > > Is scsi-360060e80072a660000302a660000f768 the correct device for > PVID 3KTOW5...? If so, then it's consistent. If not, then I suspect > this is a result of duplicating the PVID on multiple devices above. > > >> On 9/11/19 5:17 PM, Gang He wrote: >>> Hello List, >>> >>> Our user encountered a meta-data corruption problem, when run pvresize command after upgrading to LVM2 v2.02.180 from v2.02.120. >>> >>> The details are as below, >>> we have following environment: >>> - Storage: HP XP7 (SAN) - LUN's are presented to ESX via RDM >>> - VMWare ESXi 6.5 >>> - SLES 12 SP 4 Guest >>> >>> Resize happened this way (is our standard way since years) - however - this is our first resize after upgrading SLES 12 SP3 to SLES 12 SP4 - until this upgrade, we >>> never had a problem like this: >>> - split continous access on storage box, resize lun on XP7 >>> - recreate ca on XP7 >>> - scan on ESX >>> - rescan-scsi-bus.sh -s on SLES VM >>> - pvresize ( at this step the error happened) >>> >>> huns1vdb01:~ # pvresize /dev/disk/by-id/scsi-360060e80072a660000302a6600003274 >> >> _______________________________________________ >> linux-lvm mailing list >> linux-lvm@redhat.com >> https://www.redhat.com/mailman/listinfo/linux-lvm >> read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/ > ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [linux-lvm] pvresize will cause a meta-data corruption with error message "Error writing device at 4096 length 512" 2019-10-12 6:34 ` Heming Zhao @ 2019-10-12 7:11 ` Heming Zhao 2019-10-14 3:07 ` Heming Zhao 2019-10-14 3:13 ` Heming Zhao 0 siblings, 2 replies; 16+ messages in thread From: Heming Zhao @ 2019-10-12 7:11 UTC (permalink / raw) To: David Teigland; +Cc: Gang He, linux-lvm Hello List & David, Below patch for fix incorrect calling dev_unset_last_byte. ------------ commit 89cfffeffb7499d8f51112f58c381007aebc372d (HEAD -> master) Author: Zhao Heming <heming.zhao@suse.com> Date: Sat Oct 12 15:04:42 2019 +0800 When dev_write_bytes error, this function will release fd. It makes caller can't reset bcache last_byte by dev_unset_last_byte. Signed-off-by: Zhao Heming <heming.zhao@suse.com> diff --git a/.gitignore b/.gitignore index 7ebb8bb3be..cfd5bee1c4 100644 --- a/.gitignore +++ b/.gitignore @@ -30,7 +30,7 @@ make.tmpl /config.log /config.status /configure.scan -/cscope.out +/cscope.* /html/ /reports/ /tags diff --git a/lib/format_text/format-text.c b/lib/format_text/format-text.c index 6ec47bfcef..fd65f50f5f 100644 --- a/lib/format_text/format-text.c +++ b/lib/format_text/format-text.c @@ -277,8 +277,7 @@ static int _raw_write_mda_header(const struct format_type *fmt, dev_set_last_byte(dev, start_byte + MDA_HEADER_SIZE); if (!dev_write_bytes(dev, start_byte, MDA_HEADER_SIZE, mdah)) { - dev_unset_last_byte(dev); - log_error("Failed to write mda header to %s fd %d", dev_name(dev), dev->bcache_fd); + log_error("Failed to write mda header to %s", dev_name(dev)); return 0; } dev_unset_last_byte(dev); @@ -988,8 +987,7 @@ static int _vg_write_raw(struct format_instance *fid, struct volume_group *vg, (unsigned long long)write2_size); if (!dev_write_bytes(mdac->area.dev, write1_start, (size_t)write1_size, write_buf)) { - log_error("Failed to write metadata to %s fd %d", devname, mdac->area.dev->bcache_fd); - dev_unset_last_byte(mdac->area.dev); + log_error("Failed 
to write metadata to %s", devname); goto out; } @@ -1001,8 +999,7 @@ static int _vg_write_raw(struct format_instance *fid, struct volume_group *vg, if (!dev_write_bytes(mdac->area.dev, write2_start, write2_size, write_buf + new_size - new_wrap)) { - log_error("Failed to write metadata wrap to %s fd %d", devname, mdac->area.dev->bcache_fd); - dev_unset_last_byte(mdac->area.dev); + log_error("Failed to write metadata wrap to %s", devname); goto out; } } @@ -1019,7 +1016,7 @@ static int _vg_write_raw(struct format_instance *fid, struct volume_group *vg, r = 1; - out: +out: if (!r) { free(fidtc->write_buf); fidtc->write_buf = NULL; diff --git a/lib/label/label.c b/lib/label/label.c index 60ad387219..f4787b18cb 100644 --- a/lib/label/label.c +++ b/lib/label/label.c @@ -218,7 +218,7 @@ int label_write(struct device *dev, struct label *label) if (!dev_write_bytes(dev, offset, LABEL_SIZE, buf)) { log_debug_devs("Failed to write label to %s", dev_name(dev)); - r = 0; + return 0; } dev_unset_last_byte(dev); @@ -1415,7 +1415,8 @@ bool dev_write_bytes(struct device *dev, uint64_t start, size_t len, void *data) if (!scan_bcache) { /* Should not happen */ - log_error("dev_write bcache not set up %s", dev_name(dev)); + log_error("dev_write bcache not set up %s fd %d", dev_name(dev), + dev->bcache_fd); return false; } @@ -1434,21 +1435,25 @@ bool dev_write_bytes(struct device *dev, uint64_t start, size_t len, void *data) dev->flags |= DEV_BCACHE_WRITE; if (!label_scan_open(dev)) { log_error("Error opening device %s for writing at %llu length %u.", - dev_name(dev), (unsigned long long)start, (uint32_t)len); + dev_name(dev), (unsigned long long)start, (uint32_t)len); return false; } } if (!bcache_write_bytes(scan_bcache, dev->bcache_fd, start, len, data)) { - log_error("Error writing device %s at %llu length %u.", - dev_name(dev), (unsigned long long)start, (uint32_t)len); + log_error("Error writing device %s at %llu length %u fd %d.", + dev_name(dev), (unsigned long long)start, 
(uint32_t)len, + dev->bcache_fd); + dev_unset_last_byte(mdac->area.dev); label_scan_invalidate(dev); return false; } if (!bcache_flush(scan_bcache)) { - log_error("Error writing device %s at %llu length %u.", - dev_name(dev), (unsigned long long)start, (uint32_t)len); + log_error("Error writing device %s at %llu length %u fd %d.", + dev_name(dev), (unsigned long long)start, (uint32_t)len, + dev->bcache_fd); + dev_unset_last_byte(mdac->area.dev); label_scan_invalidate(dev); return false; } diff --git a/lib/metadata/mirror.c b/lib/metadata/mirror.c index 75dc18c113..c8280f9c47 100644 --- a/lib/metadata/mirror.c +++ b/lib/metadata/mirror.c @@ -266,7 +266,6 @@ static int _write_log_header(struct cmd_context *cmd, struct logical_volume *lv) dev_set_last_byte(dev, sizeof(log_header)); if (!dev_write_bytes(dev, UINT64_C(0), sizeof(log_header), &log_header)) { - dev_unset_last_byte(dev); log_error("Failed to write log header to %s.", name); return 0; } --- Thanks zhm On 10/12/19 2:34 PM, Heming Zhao wrote: > Hello David, > > Thank you for your reply. > > For these days analysis code, I found below codes can be enhanced. > (code changes base on git master branch.) 
> > --------------- > commit 3768196011fb01e4016510bfab9eef0c7bdc04f5 (HEAD -> master) > Author: Zhao Heming <heming.zhao@suse.com> > Date: Sat Oct 12 14:28:06 2019 +0800 > > fix typo in lib/cache/lvmcache.c > enhance error handling in bcache > fix constant var 'error' in _scan_list > fix gcc warning in _lvconvert_split_cache_single > > Signed-off-by: Zhao Heming <heming.zhao@suse.com> > > diff --git a/lib/cache/lvmcache.c b/lib/cache/lvmcache.c > index f6e792459b..499f9437cb 100644 > --- a/lib/cache/lvmcache.c > +++ b/lib/cache/lvmcache.c > @@ -939,7 +939,7 @@ int lvmcache_label_rescan_vg_rw(struct cmd_context *cmd, const char *vgname, con > * incorrectly placed PVs should have been moved from the orphan vginfo > * onto their correct vginfo's, and the orphan vginfo should (in theory) > * represent only real orphan PVs. (Note: if lvmcache_label_scan is run > - * after vg_read udpates to lvmcache state, then the lvmcache will be > + * after vg_read updates to lvmcache state, then the lvmcache will be > * incorrect again, so do not run lvmcache_label_scan during the > * processing phase.) > * > diff --git a/lib/device/bcache.c b/lib/device/bcache.c > index d100419770..cfe01bac2f 100644 > --- a/lib/device/bcache.c > +++ b/lib/device/bcache.c > @@ -292,6 +292,10 @@ static bool _async_issue(struct io_engine *ioe, enum dir d, int fd, > } while (r == -EAGAIN); > > if (r < 0) { > + ((struct block *)context)->error = r; > + log_warn("io_submit <%c> off %llu bytes %llu return %d:%s", > + (d == DIR_READ) ? 
'R' : 'W', (long long unsigned)offset, > + (long long unsigned)nbytes, r, strerror(-r)); > _cb_free(e->cbs, cb); > return false; > } > @@ -842,7 +846,7 @@ static void _complete_io(void *context, int err) > > if (b->error) { > dm_list_add(&cache->errored, &b->list); > - > + log_warn("fd: %d error: %d", b->fd, err); > } else { > _clear_flags(b, BF_DIRTY); > _link_block(b); > @@ -869,8 +873,7 @@ static void _issue_low_level(struct block *b, enum dir d) > dm_list_move(&cache->io_pending, &b->list); > > if (!cache->engine->issue(cache->engine, d, b->fd, sb, se, b->data, b)) { > - /* FIXME: if io_submit() set an errno, return that instead of EIO? */ > - _complete_io(b, -EIO); > + _complete_io(b, b->error); > return; > } > } > diff --git a/lib/label/label.c b/lib/label/label.c > index dc4d32d151..60ad387219 100644 > --- a/lib/label/label.c > +++ b/lib/label/label.c > @@ -647,7 +647,6 @@ static int _scan_list(struct cmd_context *cmd, struct dev_filter *f, > int submit_count; > int scan_failed; > int is_lvm_device; > - int error; > int ret; > > dm_list_init(&wait_devs); > @@ -694,12 +693,12 @@ static int _scan_list(struct cmd_context *cmd, struct dev_filter *f, > > dm_list_iterate_items_safe(devl, devl2, &wait_devs) { > bb = NULL; > - error = 0; > scan_failed = 0; > is_lvm_device = 0; > > if (!bcache_get(scan_bcache, devl->dev->bcache_fd, 0, 0, &bb)) { > - log_debug_devs("Scan failed to read %s error %d.", dev_name(devl->dev), error); > + log_debug_devs("Scan failed to read %s error %d.", > + dev_name(devl->dev), bb ? 
bb->error : 0); > scan_failed = 1; > scan_read_errors++; > scan_failed_count++; > diff --git a/tools/lvconvert.c b/tools/lvconvert.c > index 60ab956614..4939e5ec7d 100644 > --- a/tools/lvconvert.c > +++ b/tools/lvconvert.c > @@ -4676,7 +4676,7 @@ static int _lvconvert_split_cache_single(struct cmd_context *cmd, > struct logical_volume *lv_main = NULL; > struct logical_volume *lv_fast = NULL; > struct lv_segment *seg; > - int ret; > + int ret = 0; > > if (lv_is_writecache(lv)) { > lv_main = lv; > > --- > Thanks > zhm > > On 10/11/19 11:14 PM, David Teigland wrote: >> On Fri, Oct 11, 2019 at 08:11:29AM +0000, Heming Zhao wrote: >> >>> I analyze this issue for some days. It looks a new bug. >> >> Yes, thanks for the thorough analysis. >> >>> In user machine, this write action was failed, the PV header data (first >>> 4K) save in bcache (cache->errored list), and then write (by >>> bcache_flush) to another disk (f748). >> >> It looks like we need to get rid of cache->errored completely. >> >>> If dev_write_bytes failed, the bcache never clean last_byte. and the fd >>> is closed at same time, but cache->errored still have errored fd's data. >>> later lvm open new disk, the fd may reuse the old-errored fd number, >>> error data will be written when later lvm call bcache_flush. >> >> That's a bad bug. >> >>> 2> duplicated pv header. >>> as <1> description, fc68 metadata was overwritten to f748. >>> this cause by lvm bug (I said in <1>). >>> >>> 3> device not correct >>> I don't know why the disk scsi-360060e80072a670000302a670000fc68 has below wrong metadata: >>> >>> pre_pvr/scsi-360060e80072a670000302a670000fc68 >>> (please also read the comments in below metadata area.) 
>>> ``` >>> vgpocdbcdb1_r2 { >>> id = "PWd17E-xxx-oANHbq" >>> seqno = 20 >>> format = "lvm2" >>> status = ["RESIZEABLE", "READ", "WRITE"] >>> flags = [] >>> extent_size = 65536 >>> max_lv = 0 >>> max_pv = 0 >>> metadata_copies = 0 >>> >>> physical_volumes { >>> >>> pv0 { >>> id = "3KTOW5-xxxx-8g0Rf2" >>> device = "/dev/disk/by-id/scsi-360060e80072a660000302a660000f768" >>> Wrong!! ^^^^^ >>> I don't know why there is f768, please ask customer >>> status = ["ALLOCATABLE"] >>> flags = [] >>> dev_size = 860160 >>> pe_start = 2048 >>> pe_count = 13 >>> } >>> } >>> ``` >>> fc68 => f768 the 'c' (b1100) change to '7' (b0111). >>> maybe disk bit overturn, maybe lvm has bug. I don't know & have no idea. >> >> Is scsi-360060e80072a660000302a660000f768 the correct device for >> PVID 3KTOW5...? If so, then it's consistent. If not, then I suspect >> this is a result of duplicating the PVID on multiple devices above. >> >> >>> On 9/11/19 5:17 PM, Gang He wrote: >>>> Hello List, >>>> >>>> Our user encountered a meta-data corruption problem, when run pvresize command after upgrading to LVM2 v2.02.180 from v2.02.120. 
>>>> >>>> The details are as below, >>>> we have following environment: >>>> - Storage: HP XP7 (SAN) - LUN's are presented to ESX via RDM >>>> - VMWare ESXi 6.5 >>>> - SLES 12 SP 4 Guest >>>> >>>> Resize happened this way (is our standard way since years) - however - this is our first resize after upgrading SLES 12 SP3 to SLES 12 SP4 - until this upgrade, we >>>> never had a problem like this: >>>> - split continous access on storage box, resize lun on XP7 >>>> - recreate ca on XP7 >>>> - scan on ESX >>>> - rescan-scsi-bus.sh -s on SLES VM >>>> - pvresize ( at this step the error happened) >>>> >>>> huns1vdb01:~ # pvresize /dev/disk/by-id/scsi-360060e80072a660000302a6600003274 >>> >>> _______________________________________________ >>> linux-lvm mailing list >>> linux-lvm@redhat.com >>> https://www.redhat.com/mailman/listinfo/linux-lvm >>> read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/ >> > > _______________________________________________ > linux-lvm mailing list > linux-lvm@redhat.com > https://www.redhat.com/mailman/listinfo/linux-lvm > read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/ > ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [linux-lvm] pvresize will cause a meta-data corruption with error message "Error writing device at 4096 length 512" 2019-10-12 7:11 ` Heming Zhao @ 2019-10-14 3:07 ` Heming Zhao 2019-10-14 3:13 ` Heming Zhao 1 sibling, 0 replies; 16+ messages in thread From: Heming Zhao @ 2019-10-14 3:07 UTC (permalink / raw) To: David Teigland; +Cc: Gang He, linux-lvm I'm very sorry for the last mail. There are compiling errors in patch. (with commit 89cfffeffb7499d8f51112f58c381007aebc372d) I resend this patch. see below: ``` commit 7032c9c0bfe3c1fcbbb6e4e036ffe69a02aaa440 Author: Zhao Heming <heming.zhao@suse.com> Date: Mon Oct 14 10:55:54 2019 +0800 makes caller can't reset bcache last_byte by dev_unset_last_byte Signed-off-by: Zhao Heming <heming.zhao@suse.com> diff --git a/.gitignore b/.gitignore index 7ebb8bb3be..cfd5bee1c4 100644 --- a/.gitignore +++ b/.gitignore @@ -30,7 +30,7 @@ make.tmpl /config.log /config.status /configure.scan -/cscope.out +/cscope.* /html/ /reports/ /tags diff --git a/lib/format_text/format-text.c b/lib/format_text/format-text.c index 6ec47bfcef..fd65f50f5f 100644 --- a/lib/format_text/format-text.c +++ b/lib/format_text/format-text.c @@ -277,8 +277,7 @@ static int _raw_write_mda_header(const struct format_type *fmt, dev_set_last_byte(dev, start_byte + MDA_HEADER_SIZE); if (!dev_write_bytes(dev, start_byte, MDA_HEADER_SIZE, mdah)) { - dev_unset_last_byte(dev); - log_error("Failed to write mda header to %s fd %d", dev_name(dev), dev->bcache_fd); + log_error("Failed to write mda header to %s", dev_name(dev)); return 0; } dev_unset_last_byte(dev); @@ -988,8 +987,7 @@ static int _vg_write_raw(struct format_instance *fid, struct volume_group *vg, (unsigned long long)write2_size); if (!dev_write_bytes(mdac->area.dev, write1_start, (size_t)write1_size, write_buf)) { - log_error("Failed to write metadata to %s fd %d", devname, mdac->area.dev->bcache_fd); - dev_unset_last_byte(mdac->area.dev); + log_error("Failed to write metadata to %s", devname); goto 
out; } @@ -1001,8 +999,7 @@ static int _vg_write_raw(struct format_instance *fid, struct volume_group *vg, if (!dev_write_bytes(mdac->area.dev, write2_start, write2_size, write_buf + new_size - new_wrap)) { - log_error("Failed to write metadata wrap to %s fd %d", devname, mdac->area.dev->bcache_fd); - dev_unset_last_byte(mdac->area.dev); + log_error("Failed to write metadata wrap to %s", devname); goto out; } } @@ -1019,7 +1016,7 @@ static int _vg_write_raw(struct format_instance *fid, struct volume_group *vg, r = 1; - out: +out: if (!r) { free(fidtc->write_buf); fidtc->write_buf = NULL; diff --git a/lib/label/label.c b/lib/label/label.c index 60ad387219..64a8f14150 100644 --- a/lib/label/label.c +++ b/lib/label/label.c @@ -218,7 +218,7 @@ int label_write(struct device *dev, struct label *label) if (!dev_write_bytes(dev, offset, LABEL_SIZE, buf)) { log_debug_devs("Failed to write label to %s", dev_name(dev)); - r = 0; + return 0; } dev_unset_last_byte(dev); @@ -1415,7 +1415,8 @@ bool dev_write_bytes(struct device *dev, uint64_t start, size_t len, void *data) if (!scan_bcache) { /* Should not happen */ - log_error("dev_write bcache not set up %s", dev_name(dev)); + log_error("dev_write bcache not set up %s fd %d", dev_name(dev), + dev->bcache_fd); return false; } @@ -1434,21 +1435,25 @@ bool dev_write_bytes(struct device *dev, uint64_t start, size_t len, void *data) dev->flags |= DEV_BCACHE_WRITE; if (!label_scan_open(dev)) { log_error("Error opening device %s for writing at %llu length %u.", - dev_name(dev), (unsigned long long)start, (uint32_t)len); + dev_name(dev), (unsigned long long)start, (uint32_t)len); return false; } } if (!bcache_write_bytes(scan_bcache, dev->bcache_fd, start, len, data)) { - log_error("Error writing device %s at %llu length %u.", - dev_name(dev), (unsigned long long)start, (uint32_t)len); + log_error("Error writing device %s at %llu length %u fd %d.", + dev_name(dev), (unsigned long long)start, (uint32_t)len, + dev->bcache_fd); + 
dev_unset_last_byte(dev); label_scan_invalidate(dev); return false; } if (!bcache_flush(scan_bcache)) { - log_error("Error writing device %s at %llu length %u.", - dev_name(dev), (unsigned long long)start, (uint32_t)len); + log_error("Error writing device %s at %llu length %u fd %d.", + dev_name(dev), (unsigned long long)start, (uint32_t)len, + dev->bcache_fd); + dev_unset_last_byte(dev); label_scan_invalidate(dev); return false; } diff --git a/lib/metadata/mirror.c b/lib/metadata/mirror.c index 75dc18c113..c8280f9c47 100644 --- a/lib/metadata/mirror.c +++ b/lib/metadata/mirror.c @@ -266,7 +266,6 @@ static int _write_log_header(struct cmd_context *cmd, struct logical_volume *lv) dev_set_last_byte(dev, sizeof(log_header)); if (!dev_write_bytes(dev, UINT64_C(0), sizeof(log_header), &log_header)) { - dev_unset_last_byte(dev); log_error("Failed to write log header to %s.", name); return 0; } ``` On 10/12/19 3:11 PM, Heming Zhao wrote: > Below patch for fix incorrect calling dev_unset_last_byte. ^ permalink raw reply related [flat|nested] 16+ messages in thread
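The structural point of this resend — dev_write_bytes unsets last_byte on its own failure paths, so callers no longer need to pair every early return with dev_unset_last_byte — is the usual "clean up your own state before returning failure" pattern. A hypothetical Python model (not the LVM C API; names are illustrative):

```python
class Dev:
    def __init__(self):
        self.last_byte = None  # models bcache's per-device last_byte limit

def dev_write_bytes(dev, data, backend_write):
    # On failure the writer resets last_byte itself (it is also the place
    # that invalidates/closes the device), so no caller can forget to.
    if not backend_write(data):
        dev.last_byte = None
        return False
    return True

def write_mda_header(dev, header, backend_write):
    dev.last_byte = len(header)              # models dev_set_last_byte(...)
    if not dev_write_bytes(dev, header, backend_write):
        return 0                             # no cleanup needed on this path anymore
    dev.last_byte = None                     # models dev_unset_last_byte(...) on success
    return 1
```

Either way last_byte ends up cleared; concentrating the failure-path cleanup in the writer removes a whole class of "forgot to unset before return" bugs in the callers.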
* Re: [linux-lvm] pvresize will cause a meta-data corruption with error message "Error writing device at 4096 length 512" 2019-10-12 7:11 ` Heming Zhao @ 2019-10-14 3:13 ` Heming Zhao 2019-10-16 8:50 ` Heming Zhao 1 sibling, 1 reply; 16+ messages in thread From: Heming Zhao @ 2019-10-14 3:13 UTC (permalink / raw) To: David Teigland; +Cc: Gang He, linux-lvm The issue in bcache_flush is related to cache->errored. My fix is below; I believe there may be a better solution than mine. Solution: keep cache->errored, but use this list only to hold errored data, and never resend that data. bcache_flush then checks cache->errored: when the errored list is not empty, bcache_flush returns false, which triggers the caller/upper layer to do the cleanup. ``` commit 17e959c0ba58edc67b6caa7669444ecffa40a16f (HEAD -> master) Author: Zhao Heming <heming.zhao@suse.com> Date: Mon Oct 14 10:57:54 2019 +0800 The fd in cache->errored may already be closed before calling bcache_flush, so bcache_flush shouldn't rewrite data in cache->errored. The current solution is to return an error to the caller when cache->errored is not empty; the caller should do all the cleanup.
Signed-off-by: Zhao Heming <heming.zhao@suse.com> diff --git a/lib/device/bcache.c b/lib/device/bcache.c index cfe01bac2f..2eb3f0ee34 100644 --- a/lib/device/bcache.c +++ b/lib/device/bcache.c @@ -897,16 +897,20 @@ static bool _wait_io(struct bcache *cache) * High level IO handling *--------------------------------------------------------------*/ -static void _wait_all(struct bcache *cache) +static bool _wait_all(struct bcache *cache) { + bool ret = true; while (!dm_list_empty(&cache->io_pending)) - _wait_io(cache); + ret = _wait_io(cache); + return ret; } -static void _wait_specific(struct block *b) +static bool _wait_specific(struct block *b) { + bool ret = true; while (_test_flags(b, BF_IO_PENDING)) - _wait_io(b->cache); + ret = _wait_io(b->cache); + return ret; } static unsigned _writeback(struct bcache *cache, unsigned count) @@ -1262,10 +1266,7 @@ void bcache_put(struct block *b) bool bcache_flush(struct bcache *cache) { - // Only dirty data is on the errored list, since bad read blocks get - // recycled straight away. So we put these back on the dirty list, and - // try and rewrite everything. - dm_list_splice(&cache->dirty, &cache->errored); + bool ret = true; while (!dm_list_empty(&cache->dirty)) { struct block *b = dm_list_item(_list_pop(&cache->dirty), struct block); @@ -1275,11 +1276,18 @@ bool bcache_flush(struct bcache *cache) } _issue_write(b); + if (b->error) ret = false; } - _wait_all(cache); + ret = _wait_all(cache); - return dm_list_empty(&cache->errored); + // merge the errored list to dirty, return false to trigger caller to + // clean them. + if (!dm_list_empty(&cache->errored)) { + dm_list_splice(&cache->dirty, &cache->errored); + ret = false; + } + return ret; } //---------------------------------------------------------------- ``` ^ permalink raw reply related [flat|nested] 16+ messages in thread
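The behavioral change in this patch — stop splicing cache->errored back onto the dirty list for a blind rewrite, and instead fail the flush so the caller cleans up — can be sketched abstractly. This is a hypothetical Python model, not the bcache API:

```python
def flush_old(dirty, errored, write):
    # Old behavior: errored blocks are spliced back onto dirty and rewritten,
    # even though their fd may by now belong to a different device.
    dirty.extend(errored)
    errored.clear()
    for b in list(dirty):
        dirty.remove(b)
        if not write(b):
            errored.append(b)
    return not errored

def flush_new(dirty, errored, write):
    # Patched behavior: never resend errored blocks; report failure so the
    # caller can invalidate state instead of silently rewriting stale data.
    for b in list(dirty):
        dirty.remove(b)
        if not write(b):
            errored.append(b)
    return not errored
```

The key difference: with a block already parked on the errored list, the old flush reissues it (possibly onto the wrong device), while the new flush leaves it alone and returns false.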
* Re: [linux-lvm] pvresize will cause a meta-data corruption with error message "Error writing device at 4096 length 512" 2019-10-14 3:13 ` Heming Zhao @ 2019-10-16 8:50 ` Heming Zhao 0 siblings, 0 replies; 16+ messages in thread From: Heming Zhao @ 2019-10-16 8:50 UTC (permalink / raw) To: David Teigland; +Cc: Gang He, linux-lvm Hello list, here is the root cause: the customer ran "pvresize -vvvvvv -dddddd /dev/disk/by-id/scsi-360060e80072a660000302a660000f111", and the disk scsi-360060e80072a660000302a660000f747 was found overwritten. From our user's log, the fc68 disk was opened as fd 18, and its buffer (fc68's PV header) was written 8 times: ``` /dev/disk/by-id/scsi-360060e80072a670000302a670000fc68 <== first error /dev/disk/by-id/scsi-360060e80072a670000302a670000fc67 <== 2nd write of fc68's pv header, failed /dev/disk/by-id/scsi-360060e80072a670000302a670000fc66 <== 3rd write, failed. /dev/disk/by-id/scsi-360060e80072a660000302a660000fa08-part1 /dev/disk/by-id/scsi-360060e80072a660000302a660000f74c /dev/disk/by-id/scsi-360060e80072a660000302a660000f74a /dev/disk/by-id/scsi-360060e80072a660000302a660000f749 /dev/disk/by-id/scsi-360060e80072a660000302a660000f747 <=== last error, but fc68's data was written successfully. ``` When the last error happened on f747, the write to fd 18 succeeded: ``` #label/label.c:629 Scanning 1 devices for VG info #label/label.c:665 Scanning submitted 1 reads #device/bcache.c:1063 zhm _lookup_or_read_block 1063 flags: 0x0 b: 0x555e778b26a0 #label/label.c:674 Scan failed to read /dev/disk/by-id/scsi-360060e80072a660000302a660000f747 error 0. #device/bcache.c:1363 zhm bcache_invalidate_fd 1363 cache->dirty fd: 18 #device/bcache.c:1372 zhm bcache_invalidate_fd 1372 cache->clean fd: 18 #label/label.c:764 Scanned devices: read errors 1 process errors 0 failed 1 ``` Above, the line '#device/bcache.c:1363' triggers the write action, which succeeds.
Another important point: when the first write on fd 18 failed, the data stayed in cache->errored; it was not saved on the dirty or clean lists. So the first fc68 write error was not cleaned up by the call to bcache_invalidate_fd. Then the 6 read errors below happened (the errors for fc67, fc66, fa08-part1, f74c, f74a and f749 are all the same): ``` #toollib.c:4377 Processing PVs in VG vgautdbtest02_db03 #locking/locking.c:331 Dropping cache for vgautdbtest02_db03. #misc/lvm-flock.c:202 Locking /run/lvm/lock/V_vgautdbtest02_db03 WB #misc/lvm-flock.c:100 _do_flock /run/lvm/lock/V_vgautdbtest02_db03:aux WB #misc/lvm-flock.c:100 _do_flock /run/lvm/lock/V_vgautdbtest02_db03 WB #misc/lvm-flock.c:47 _undo_flock /run/lvm/lock/V_vgautdbtest02_db03:aux #metadata/metadata.c:3778 Reading VG vgautdbtest02_db03 fgItp1-1M8x-6qVT-84ND-Fpwa-mFYf-or2sow #metadata/metadata.c:3874 Rescanning devices for vgautdbtest02_db03 #cache/lvmcache.c:751 lvmcache has no info for vgname "vgautdbtest02_db03" with VGID fgItp11M8x6qVT84NDFpwamFYfor2sow. #device/bcache.c:1372 zhm bcache_invalidate_fd 1372 cache->clean fd: 49 #label/label.c:629 Scanning 1 devices for VG info #label/label.c:665 Scanning submitted 1 reads #device/bcache.c:1063 zhm _lookup_or_read_block 1063 flags: 0x0 b: 0x555e778b26a0 #label/label.c:674 Scan failed to read /dev/disk/by-id/scsi-360060e80072a660000302a660000f749 error 0. #device/bcache.c:1363 zhm bcache_invalidate_fd 1363 cache->dirty fd: 18 #device/bcache.c:337 zhm _async_wait 337 ev->res: 18446744073709551495 errno: 0 #device/bcache.c:875 zhm _complete_io 875 fd: 18 error: -121 #label/label.c:764 Scanned devices: read errors 1 process errors 0 failed 1 ``` Why is disk f749's fd 49 here, yet f749 ends up on fd 18? Through the following steps, f749 got fd 18, which was fc68's previously closed fd.
```
lvmcache_label_rescan_vg
 label_scan_devs_rw
 {
    // invalidate any existing bcache data & close the current fd
    dm_list_iterate_items(devl, devs) {
        if (_in_bcache(devl->dev)) {
            bcache_invalidate_fd(scan_bcache, devl->dev->bcache_fd); <== invalidate current bcache
            _scan_dev_close(devl->dev); <=== close current fd 49
        }
    }
    _scan_list(cmd, f, devs, NULL); <== will open a new fd (reusing the just-closed, errored fd 18)
 }
```

In _scan_list, the disk is opened with the last failed fd, 18. This fd 18 is then used to search the hash table, which returns the last errored bcache data (containing fc68's PV metadata). Through the following flow, the block moves from cache->errored into the cache->dirty list:

```
bcache_get
 _lookup_or_read_block
 {
    b = _hash_lookup() <=== fetches the just-closed, errored fd 18
    if (b) {
        ... ...
        _unlink_block(b); <=== unlink from cache->errored
        ... ...
    }
    if (b) {
        _link_block(b); <== the stale fd 18 block has BF_DIRTY, so it moves to cache->dirty
    }
 }
```

With fd 18's data on cache->dirty, calling bcache_invalidate_fd triggers _issue_write(), which writes this PV header out to another disk. If the write fails (as this log shows happened 7 times), the PV header data is first put on cache->errored, then moved back to cache->dirty by the next bcache_get. This block does not go away until a write succeeds. The line "#device/bcache.c:1063" shows that bcache always fetches/reuses the last errored fd 18 (first opened for fc68). The writes attempted via fc67, fc66, fa08-part1, f74c, f74a and f749 all failed; on the 8th attempt (via disk f747) the write succeeded, so the LVM metadata of disk f747 was changed.

As for the write error -121 (EREMOTEIO), I need more time to dig into it. The kernel reported the following when LVM hit the write error:

```
kernel: [ 714.139133] print_req_error: critical target error, dev sddz, sector 2048
kernel: [ 714.139144] print_req_error: critical target error, dev dm-54, sector 2048
```

At last, please ignore my previous patches; they are not complete. The function bcache_invalidate_fd should be fixed.
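To make the hazard concrete, here is a minimal standalone C sketch of the pattern described above: a cache indexed by fd keeps a block for a failed write, the fd is closed, the kernel hands the same fd number out for a different disk, and a lookup by fd then returns the stale block. The names (cache_put, cache_lookup, cache_invalidate) are illustrative only, not LVM's real bcache API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_FD 64

/* One cached block per fd; 'valid' stands in for membership in
 * the dirty/clean/errored lists of the real bcache. */
struct block { char data[16]; int valid; };
static struct block table[MAX_FD];

/* Cache a block's data under its fd. */
static void cache_put(int fd, const char *data)
{
	memset(table[fd].data, 0, sizeof(table[fd].data));
	strncpy(table[fd].data, data, sizeof(table[fd].data) - 1);
	table[fd].valid = 1;
}

/* Look up a block by fd. Nothing here can tell whether the fd still
 * refers to the same file it did when the block was cached -- that is
 * exactly the reuse hazard in this report. */
static struct block *cache_lookup(int fd)
{
	return table[fd].valid ? &table[fd] : NULL;
}

static void cache_invalidate(int fd)
{
	table[fd].valid = 0;
}
```

If the invalidate path skips blocks on an errored list (as described above), cache_invalidate is never reached for them, and the next open() that reuses the fd sees fc68's stale PV header.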
I have sent my latest fix to our customer and am waiting for feedback.

Thanks,
zhm

On 10/14/19 11:13 AM, Heming Zhao wrote:
> The issue in bcache_flush is related to cache->errored.
>
> Here is my fix. I believe there is a better solution than mine.
>
> Solution:
> Keep cache->errored, but use this list only to hold error data,
> and never resend the error data.
> bcache_flush checks cache->errored; when the errored list is not empty,
> bcache_flush returns false, which triggers the caller to do the cleanup.
>
> ```
> commit 17e959c0ba58edc67b6caa7669444ecffa40a16f (HEAD -> master)
> Author: Zhao Heming <heming.zhao@suse.com>
> Date:   Mon Oct 14 10:57:54 2019 +0800
>
>     The fd in cache->errored may already be closed before calling bcache_flush,
>     so bcache_flush shouldn't rewrite data in cache->errored. Currently
>     solution is return error to caller when cache->errored is not empty, and
>     caller should do all the clean jobs.
>
>     Signed-off-by: Zhao Heming <heming.zhao@suse.com>
>
> diff --git a/lib/device/bcache.c b/lib/device/bcache.c
> index cfe01bac2f..2eb3f0ee34 100644
> --- a/lib/device/bcache.c
> +++ b/lib/device/bcache.c
> @@ -897,16 +897,20 @@ static bool _wait_io(struct bcache *cache)
>   * High level IO handling
>   *--------------------------------------------------------------*/
>
> -static void _wait_all(struct bcache *cache)
> +static bool _wait_all(struct bcache *cache)
>  {
> +	bool ret = true;
>  	while (!dm_list_empty(&cache->io_pending))
> -		_wait_io(cache);
> +		ret = _wait_io(cache);
> +	return ret;
>  }
>
> -static void _wait_specific(struct block *b)
> +static bool _wait_specific(struct block *b)
>  {
> +	bool ret = true;
>  	while (_test_flags(b, BF_IO_PENDING))
> -		_wait_io(b->cache);
> +		ret = _wait_io(b->cache);
> +	return ret;
>  }
>
>  static unsigned _writeback(struct bcache *cache, unsigned count)
> @@ -1262,10 +1266,7 @@ void bcache_put(struct block *b)
>
>  bool bcache_flush(struct bcache *cache)
>  {
> -	// Only dirty data is on the errored list, since bad read blocks get
> -	// recycled straight away.  So we put these back on the dirty list, and
> -	// try and rewrite everything.
> -	dm_list_splice(&cache->dirty, &cache->errored);
> +	bool ret = true;
>
>  	while (!dm_list_empty(&cache->dirty)) {
>  		struct block *b = dm_list_item(_list_pop(&cache->dirty), struct block);
> @@ -1275,11 +1276,18 @@ bool bcache_flush(struct bcache *cache)
>  		}
>
>  		_issue_write(b);
> +		if (b->error) ret = false;
>  	}
>
> -	_wait_all(cache);
> +	ret = _wait_all(cache);
>
> -	return dm_list_empty(&cache->errored);
> +	// merge the errored list to dirty, return false to trigger caller to
> +	// clean them.
> +	if (!dm_list_empty(&cache->errored)) {
> +		dm_list_splice(&cache->dirty, &cache->errored);
> +		ret = false;
> +	}
> +	return ret;
> }
>
> //----------------------------------------------------------------
> ```
>
>
> _______________________________________________
> linux-lvm mailing list
> linux-lvm@redhat.com
> https://www.redhat.com/mailman/listinfo/linux-lvm
> read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/
>

^ permalink raw reply	[flat|nested] 16+ messages in thread
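The contract the quoted patch proposes (never resend cache->errored; report failure and let the caller clean up) can be modeled with a small standalone toy. This is a sketch of the proposed semantics only, not LVM's real bcache: toy_cache, toy_flush, toy_drop_errored, and write_ok are all invented names, and write_ok simulates the device by failing writes for odd block ids.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_BLOCKS 8

/* Toy model: block ids on a dirty list; failed writes land on an
 * errored list that is never retried by flush itself. */
struct toy_cache {
	int dirty[MAX_BLOCKS];   int n_dirty;
	int errored[MAX_BLOCKS]; int n_errored;
};

/* Simulated device: even block ids write successfully, odd ids fail. */
static bool write_ok(int id) { return (id % 2) == 0; }

/* Proposed contract: try each dirty block once; park failures on the
 * errored list; do NOT splice errored back into dirty; return false
 * so the caller knows it must clean up. */
static bool toy_flush(struct toy_cache *c)
{
	int i;
	for (i = 0; i < c->n_dirty; i++) {
		if (!write_ok(c->dirty[i]))
			c->errored[c->n_errored++] = c->dirty[i];
	}
	c->n_dirty = 0;
	return c->n_errored == 0;
}

/* Caller-side cleanup: discard errored blocks so they can never be
 * rewritten through a reused fd to the wrong disk. */
static void toy_drop_errored(struct toy_cache *c)
{
	c->n_errored = 0;
}
```

Under the old behavior modeled here, flush would splice errored back into dirty and retry, which is what let fc68's header chase the reused fd across seven disks; the caller-driven cleanup avoids that retry loop.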
end of thread, other threads:[~2019-10-16  8:50 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-09-11  9:17 [linux-lvm] pvresize will cause a meta-data corruption with error message "Error writing device at 4096 length 512" Gang He
2019-09-11 10:01 ` Ilia Zykov
2019-09-11 10:03 ` Ilia Zykov
2019-09-11 10:10 ` Ingo Franzki
2019-09-11 10:20 ` Gang He
2019-10-11  8:11 ` Heming Zhao
2019-10-11  9:22 ` Heming Zhao
2019-10-11 10:38 ` Zdenek Kabelac
2019-10-11 11:50 ` Heming Zhao
2019-10-11 15:14 ` David Teigland
2019-10-12  3:23 ` Gang He
2019-10-12  6:34 ` Heming Zhao
2019-10-12  7:11 ` Heming Zhao
2019-10-14  3:07 ` Heming Zhao
2019-10-14  3:13 ` Heming Zhao
2019-10-16  8:50 ` Heming Zhao