linux-lvm.redhat.com archive mirror
* [linux-lvm] pvresize will cause a meta-data corruption with error message "Error writing device at 4096 length 512"
@ 2019-09-11  9:17 Gang He
  2019-09-11 10:01 ` Ilia Zykov
                   ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Gang He @ 2019-09-11  9:17 UTC (permalink / raw)
  To: LVM general discussion and development

Hello List,

One of our users encountered a metadata corruption problem when running the pvresize command after upgrading to LVM2 v2.02.180 from v2.02.120.

The details are below. The environment is:
- Storage: HP XP7 (SAN) - LUNs are presented to ESX via RDM
- VMware ESXi 6.5
- SLES 12 SP4 guest

The resize was done as follows (this has been our standard procedure for years). However, this is our first resize after upgrading from SLES 12 SP3 to SLES 12 SP4; until this upgrade, we never had a problem like this:
- split continuous access on the storage box, resize the LUN on the XP7
- recreate CA on the XP7
- scan on ESX
- rescan-scsi-bus.sh -s on the SLES VM
- pvresize (the error happened at this step)

huns1vdb01:~ # pvresize /dev/disk/by-id/scsi-360060e80072a660000302a6600003274
 Error writing device /dev/sdaf at 4096 length 512.
 Failed to write mda header to /dev/sdaf fd -1
 Failed to update old PV extension headers in VG vghundbhulv_ar.
 Error writing device /dev/disk/by-id/scsi-360060e80072a660000302a66000031ec at 4096 length 512.
 Failed to write mda header to /dev/disk/by-id/scsi-360060e80072a660000302a66000031ec fd -1
 Failed to update old PV extension headers in VG vghundbhulk_ar.
 VG info not found after rescan of vghundbhulv_r2
 VG info not found after rescan of vghundbhula_r1
 VG info not found after rescan of vghundbhuco_ar
 Error writing device /dev/disk/by-id/scsi-360060e80072a660000302a66000031e8 at 4096 length 512.
 Failed to write mda header to /dev/disk/by-id/scsi-360060e80072a660000302a66000031e8 fd -1
 Failed to update old PV extension headers in VG vghundbhula_ar.
 VG info not found after rescan of vghundbhuco_r2
 Error writing device /dev/disk/by-id/scsi-360060e80072a660000302a660000300b at 4096 length 512.
 Failed to write mda header to /dev/disk/by-id/scsi-360060e80072a660000302a660000300b fd -1
 Failed to update old PV extension headers in VG vghundbhunrm02_r2.

Any ideas about this bug?

Thanks a lot.
Gang


* Re: [linux-lvm] pvresize will cause a meta-data corruption with error message "Error writing device at 4096 length 512"
  2019-09-11  9:17 [linux-lvm] pvresize will cause a meta-data corruption with error message "Error writing device at 4096 length 512" Gang He
@ 2019-09-11 10:01 ` Ilia Zykov
  2019-09-11 10:03 ` Ilia Zykov
  2019-10-11  8:11 ` Heming Zhao
  2 siblings, 0 replies; 16+ messages in thread
From: Ilia Zykov @ 2019-09-11 10:01 UTC (permalink / raw)
  To: LVM general discussion and development, Gang He

Maybe this?

Please note that this problem can also happen in other cases, such as
mixing disks with different block sizes (e.g. SCSI disks with 512 bytes
and s390x-DASDs with 4096 block size).

https://www.redhat.com/archives/linux-lvm/2019-February/msg00018.html




* Re: [linux-lvm] pvresize will cause a meta-data corruption with error message "Error writing device at 4096 length 512"
  2019-09-11  9:17 [linux-lvm] pvresize will cause a meta-data corruption with error message "Error writing device at 4096 length 512" Gang He
  2019-09-11 10:01 ` Ilia Zykov
@ 2019-09-11 10:03 ` Ilia Zykov
  2019-09-11 10:10   ` Ingo Franzki
  2019-10-11  8:11 ` Heming Zhao
  2 siblings, 1 reply; 16+ messages in thread
From: Ilia Zykov @ 2019-09-11 10:03 UTC (permalink / raw)
  To: LVM general discussion and development, Gang He

Maybe this?

Please note that this problem can also happen in other cases, such as
mixing disks with different block sizes (e.g. SCSI disks with 512 bytes
and s390x-DASDs with 4096 block size).

https://www.redhat.com/archives/linux-lvm/2019-February/msg00018.html



* Re: [linux-lvm] pvresize will cause a meta-data corruption with error message "Error writing device at 4096 length 512"
  2019-09-11 10:03 ` Ilia Zykov
@ 2019-09-11 10:10   ` Ingo Franzki
  2019-09-11 10:20     ` Gang He
  0 siblings, 1 reply; 16+ messages in thread
From: Ingo Franzki @ 2019-09-11 10:10 UTC (permalink / raw)
  To: LVM general discussion and development, Ilia Zykov, Gang He

On 11.09.2019 12:03, Ilia Zykov wrote:
> Maybe this?
> 
> Please note that this problem can also happen in other cases, such as
> mixing disks with different block sizes (e.g. SCSI disks with 512 bytes
> and s390x-DASDs with 4096 block size).
> 
> https://www.redhat.com/archives/linux-lvm/2019-February/msg00018.html

And the fix for this is already available upstream (Thanks David!):
https://sourceware.org/git/?p=lvm2.git;a=commit;h=0404539edb25e4a9d3456bb3e6b402aa2767af6b

-- 
Ingo Franzki
eMail: ifranzki@linux.ibm.com  
Tel: ++49 (0)7031-16-4648
Fax: ++49 (0)7031-16-3456
Linux on IBM Z Development, Schoenaicher Str. 220, 71032 Boeblingen, Germany


* Re: [linux-lvm] pvresize will cause a meta-data corruption with error message "Error writing device at 4096 length 512"
  2019-09-11 10:10   ` Ingo Franzki
@ 2019-09-11 10:20     ` Gang He
  0 siblings, 0 replies; 16+ messages in thread
From: Gang He @ 2019-09-11 10:20 UTC (permalink / raw)
  To: Ingo Franzki, Ilia Zykov, LVM general discussion and development

Hi Ingo and Ilia,

Thanks for your help.

> -----Original Message-----
> From: Ingo Franzki [mailto:ifranzki@linux.ibm.com]
> Sent: September 11, 2019 18:11
> To: Ilia Zykov <mail@izyk.ru>; LVM general discussion and development
> <linux-lvm@redhat.com>; Gang He <GHe@suse.com>
> Subject: Re: [linux-lvm] pvresize will cause a meta-data corruption with error
> message "Error writing device at 4096 length 512"
> 
> On 11.09.2019 12:03, Ilia Zykov wrote:
> > Maybe this?
> >
> > Please note that this problem can also happen in other cases, such as
> > mixing disks with different block sizes (e.g. SCSI disks with 512
> > bytes and s390x-DASDs with 4096 block size).
> >
> > https://www.redhat.com/archives/linux-lvm/2019-February/msg00018.html
> 
> And the fix for this is already available upstream (Thanks David!):
> https://sourceware.org/git/?p=lvm2.git;a=commit;h=0404539edb25e4a9d3456bb3e6b402aa2767af6b
Does this commit fix the problem completely? Do we need any other patches on top of v2.02.180?

Thanks
Gang


* Re: [linux-lvm] pvresize will cause a meta-data corruption with error message "Error writing device at 4096 length 512"
  2019-09-11  9:17 [linux-lvm] pvresize will cause a meta-data corruption with error message "Error writing device at 4096 length 512" Gang He
  2019-09-11 10:01 ` Ilia Zykov
  2019-09-11 10:03 ` Ilia Zykov
@ 2019-10-11  8:11 ` Heming Zhao
  2019-10-11  9:22   ` Heming Zhao
  2019-10-11 15:14   ` David Teigland
  2 siblings, 2 replies; 16+ messages in thread
From: Heming Zhao @ 2019-10-11  8:11 UTC (permalink / raw)
  To: LVM general discussion and development, Gang He

Hello list,

I have been analyzing this issue for some days. It looks like a new bug.

Trigger steps:
The user runs pvresize to enlarge the PV.
After the command runs, the LVM metadata on one disk is overwritten by the LVM metadata of another disk.

In one log (from running the pvresize command), 7 disks show read/write failures:
```
scsi-360060e80072a670000302a670000fc68
scsi-360060e80072a670000302a670000fc67
scsi-360060e80072a670000302a670000fc66
scsi-360060e80072a660000302a660000f74c
scsi-360060e80072a660000302a660000f74a
scsi-360060e80072a660000302a660000f749
scsi-360060e80072a660000302a660000f748 (has fc68 metadata)
```
The f748 metadata was overwritten by the fc68 metadata.


The error logs for fc67, fc66, f74c, f74a, f749, and f748 are the same:
```
#toollib.c:4377          Processing PVs in VG vgpocdbcdb1_r1
#locking/locking.c:331           Dropping cache for vgpocdbcdb1_r1.
#misc/lvm-flock.c:202         Locking /run/lvm/lock/V_vgpocdbcdb1_r1 WB
#misc/lvm-flock.c:100           _do_flock /run/lvm/lock/V_vgpocdbcdb1_r1:aux WB
#misc/lvm-flock.c:100           _do_flock /run/lvm/lock/V_vgpocdbcdb1_r1 WB
#misc/lvm-flock.c:47            _undo_flock /run/lvm/lock/V_vgpocdbcdb1_r1:aux
#metadata/metadata.c:3778        Reading VG vgpocdbcdb1_r1 tTwjvG-xxxx-FA0cJj
#metadata/metadata.c:3874          Rescanning devices for vgpocdbcdb1_r1
#cache/lvmcache.c:751           lvmcache has no info for vgname "vgpocdbcdb1_r1" with VGID tTwjvGfl1zsU6gODANVsela1siFA0cJj.
#label/label.c:629           Scanning 1 devices for VG info
#label/label.c:665           Scanning submitted 1 reads
#label/label.c:674           Scan failed to read /dev/disk/by-id/scsi-360060e80072a670000302a670000fc67 error 0.
#device/bcache.c:189     WRITE last fd 36 last_offset 4608 last_sector_size 512
#device/bcache.c:244           Limit write at 0 len 131072 to len 4608
#label/label.c:764           Scanned devices: read errors 1 process errors 0 failed 1
#cache/lvmcache.c:751           lvmcache has no info for vgname "vgpocdbcdb1_r1" with VGID tTwjvGfl1zsU6gODANVsela1siFA0cJj.
#cache/lvmcache.c:1410    VG info not found after rescan of vgpocdbcdb1_r1
#cache/lvmcache.c:751           lvmcache has no info for vgname "vgpocdbcdb1_r1" with VGID tTwjvGfl1zsU6gODANVsela1siFA0cJj.
#metadata/metadata.c:3884          Cache did not find fmt for vgname vgpocdbcdb1_r1
#metadata/metadata.c:3885          <backtrace>
#metadata/metadata.c:4518          <backtrace>
```

The fc68 error log is in subsection <1> below.


From all the log files, the user's disks show 3 classes of issues:

1> The disk (fc68) has an old LVM extension header.

This triggers an LVM write to update the PV header (metadata area).

Related log:
```
#format_text/text_label.c:423           /dev/disk/by-id/scsi-360060e80072a670000302a670000fc68: PV header extension version 1 found
... ...
#metadata/metadata.c:2842          PV /dev/disk/by-id/scsi-360060e80072a670000302a670000fc68 has old extension header, updating to newest version.
```

On the user's machine, this write failed. The PV header data (first 4K) was kept in bcache (on the cache->errored list) and was then written (by bcache_flush) to another disk (f748).

Related error log:
```
#format_text/format-text.c:1470          Creating metadata area on /dev/disk/by-id/scsi-360060e80072a670000302a670000fc68 at sector 8 size 2040 sectors
#device/bcache.c:189     WRITE last fd 36 last_offset 4608 last_sector_size 512
#device/bcache.c:244           Limit write at 0 len 131072 to len 4608
#label/label.c:1333    Error writing device /dev/disk/by-id/scsi-360060e80072a670000302a670000fc68 at 4096 length 512.
#format_text/format-text.c:407     Failed to write mda header to /dev/disk/by-id/scsi-360060e80072a670000302a670000fc68 fd -1
```

Related code:
pvresize
  process_each_pv
   _process_pvs_in_vgs
    vg_read
     vg_read_internal
      _vg_read
       _vg_update_old_pv_ext_if_needed
        +-> pv_needs_rewrite is 1
        |    set vg->pv_write_list
        +-> vg_write
    ||
    \/
vg_write
  pv_write //vg->pv_write_list is not empty.
   pv->fmt->ops->pv_write
    _text_pv_write
     lvmcache_foreach_mda(info, _write_single_mda, &baton)
      _raw_write_mda_header

static int  _raw_write_mda_header ()
{
     ... ...
     dev_set_last_byte(dev, start_byte + MDA_HEADER_SIZE);

     if (!dev_write_bytes(dev, start_byte, MDA_HEADER_SIZE, mdah)) {
         dev_unset_last_byte(dev); //zhm: useless, fd = -1 now!!
         return 0;
     }
     dev_unset_last_byte(dev);

     return 1;
}

If dev_write_bytes fails, bcache never clears last_byte. The fd is closed at the same time, but cache->errored still holds the data for the errored fd. When LVM later opens a new disk, that disk may reuse the old errored fd number, and the stale data will be written to it when LVM later calls bcache_flush.

2> Duplicated PV header.
    As described in <1>, the fc68 metadata was written over f748.
    This is caused by the LVM bug described in <1>.

3> Incorrect device.
    I don't know why the disk scsi-360060e80072a670000302a670000fc68 has the wrong metadata below:

pre_pvr/scsi-360060e80072a670000302a670000fc68
(Please also read the comments in the metadata area below.)
```
     vgpocdbcdb1_r2 {
         id = "PWd17E-xxx-oANHbq"
         seqno = 20
         format = "lvm2"
         status = ["RESIZEABLE", "READ", "WRITE"]
         flags = []
         extent_size = 65536
         max_lv = 0
         max_pv = 0
         metadata_copies = 0
         
         physical_volumes {
             
             pv0 {
                 id = "3KTOW5-xxxx-8g0Rf2"
                 device = "/dev/disk/by-id/scsi-360060e80072a660000302a660000f768"
                                                                     Wrong!! ^^^^^
                          I don't know why there is f768, please ask customer
                 status = ["ALLOCATABLE"]
                 flags = []
                 dev_size = 860160
                 pe_start = 2048
                 pe_count = 13
             }
         }
```
    fc68 => f768: the 'c' (b1100) changed to '7' (b0111).
    Maybe a bit flipped on the disk, or maybe LVM has a bug. I don't know.

Thanks
zhm


* Re: [linux-lvm] pvresize will cause a meta-data corruption with error message "Error writing device at 4096 length 512"
  2019-10-11  8:11 ` Heming Zhao
@ 2019-10-11  9:22   ` Heming Zhao
  2019-10-11 10:38     ` Zdenek Kabelac
  2019-10-11 15:14   ` David Teigland
  1 sibling, 1 reply; 16+ messages in thread
From: Heming Zhao @ 2019-10-11  9:22 UTC (permalink / raw)
  To: LVM general discussion and development, Gang He

There is one thing that still confuses me.
On a read/write error, LVM calls bcache_invalidate_fd & _scan_dev_close to close the fd.
So the first device read successfully after f748 (i.e. f747) should end up with fc68's fd.
That would cause the f747 metadata to be overwritten, not f748's.

The disk scanning sequence:
```
  scsi-360060e80072a670000302a670000fc69 <=== successful
  scsi-360060e80072a670000302a670000fc68 <=== first failed
  scsi-360060e80072a670000302a670000fc67
  scsi-360060e80072a670000302a670000fc66
  scsi-360060e80072a660000302a660000f74c
  scsi-360060e80072a660000302a660000f74a
  scsi-360060e80072a660000302a660000f749
  scsi-360060e80072a660000302a660000f748 (has fc68 metadata) <=== last failed
  scsi-360060e80072a660000302a660000f747 <=== first successfully read following last failed
```

I hope that makes sense.



* Re: [linux-lvm] pvresize will cause a meta-data corruption with error message "Error writing device at 4096 length 512"
  2019-10-11  9:22   ` Heming Zhao
@ 2019-10-11 10:38     ` Zdenek Kabelac
  2019-10-11 11:50       ` Heming Zhao
  0 siblings, 1 reply; 16+ messages in thread
From: Zdenek Kabelac @ 2019-10-11 10:38 UTC (permalink / raw)
  To: LVM general discussion and development, Heming Zhao, Gang He

On 11. 10. 19 at 11:22, Heming Zhao wrote:
> Only one thing I am confusion all the time.
> When read/write error, lvm will call bcache_invalidate_fd & _scan_dev_close to close fd.
> So the first successfully read (i.e.: f747), which following f748 finally has fc68's fd.
> This will cause f747 metadata overwrite not f748.
> 
>

Hi

Have you considered checking a newer version of lvm2?

Regards

Zdenek


* Re: [linux-lvm] pvresize will cause a meta-data corruption with error message "Error writing device at 4096 length 512"
  2019-10-11 10:38     ` Zdenek Kabelac
@ 2019-10-11 11:50       ` Heming Zhao
  0 siblings, 0 replies; 16+ messages in thread
From: Heming Zhao @ 2019-10-11 11:50 UTC (permalink / raw)
  To: Zdenek Kabelac, LVM general discussion and development, Gang He

For _raw_write_mda_header(), the latest code is the same as stable-2.02.
The usage below is wrong and should be fixed.
```
     if (!dev_write_bytes(mdac->area.dev, write1_start, (size_t)write1_size, write_buf)) {
         ... ...
         dev_unset_last_byte(mdac->area.dev); <==== invalid code, this time fd had been released.
         ... ...
     }
```

This issue only happened on our customer's machine, when updating lvm2 from
2.02.120 (no bcache code) to 2.02.180 (contains bcache).

Thanks
zhm


* Re: [linux-lvm] pvresize will cause a meta-data corruption with error message "Error writing device at 4096 length 512"
  2019-10-11  8:11 ` Heming Zhao
  2019-10-11  9:22   ` Heming Zhao
@ 2019-10-11 15:14   ` David Teigland
  2019-10-12  3:23     ` Gang He
  2019-10-12  6:34     ` Heming Zhao
  1 sibling, 2 replies; 16+ messages in thread
From: David Teigland @ 2019-10-11 15:14 UTC (permalink / raw)
  To: Heming Zhao; +Cc: Gang He, linux-lvm

On Fri, Oct 11, 2019 at 08:11:29AM +0000, Heming Zhao wrote:

> I analyze this issue for some days. It looks a new bug.

Yes, thanks for the thorough analysis.

> In user machine, this write action was failed, the PV header data (first
> 4K) save in bcache (cache->errored list), and then write (by
> bcache_flush) to another disk (f748).

It looks like we need to get rid of cache->errored completely.

> If dev_write_bytes failed, the bcache never clean last_byte. and the fd
> is closed at same time, but cache->errored still have errored fd's data.
> later lvm open new disk, the fd may reuse the old-errored fd number,
> error data will be written when later lvm call bcache_flush.

That's a bad bug.

> 2> duplicated pv header.
>     as <1> description, fc68 metadata was overwritten to f748.
>     this cause by lvm bug (I said in <1>).
> 
> 3> device not correct
>     I don't know why the disk scsi-360060e80072a670000302a670000fc68 has below wrong metadata:
> 
> pre_pvr/scsi-360060e80072a670000302a670000fc68
> (please also read the comments in below metadata area.)
> ```
>      vgpocdbcdb1_r2 {
>          id = "PWd17E-xxx-oANHbq"
>          seqno = 20
>          format = "lvm2"
>          status = ["RESIZEABLE", "READ", "WRITE"]
>          flags = []
>          extent_size = 65536
>          max_lv = 0
>          max_pv = 0
>          metadata_copies = 0
>          
>          physical_volumes {
>              
>              pv0 {
>                  id = "3KTOW5-xxxx-8g0Rf2"
>                  device = "/dev/disk/by-id/scsi-360060e80072a660000302a660000f768"
>                                                                      Wrong!! ^^^^^
>                           I don't know why there is f768, please ask customer
>                  status = ["ALLOCATABLE"]
>                  flags = []
>                  dev_size = 860160
>                  pe_start = 2048
>                  pe_count = 13
>              }
>          }
> ```
>     fc68 => f768  the 'c' (b1100) change to '7' (b0111).
>     maybe disk bit overturn, maybe lvm has bug. I don't know & have no idea.

Is scsi-360060e80072a660000302a660000f768 the correct device for
PVID 3KTOW5...?  If so, then it's consistent.  If not, then I suspect
this is a result of duplicating the PVID on multiple devices above.
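
As a quick arithmetic check of the fc68 => f768 observation quoted above: XOR-ing the two suffixes shows which bits differ. The sketch below is illustrative only; the function names are invented here and it says nothing about how the value ended up on disk.

```c
#include <assert.h>
#include <stdint.h>

/* Count set bits with a portable loop (no compiler builtins). */
static unsigned count_bits(uint32_t v)
{
	unsigned n = 0;

	while (v) {
		v &= v - 1;     /* clear the lowest set bit */
		n++;
	}
	return n;
}

/* Number of bit positions in which a and b differ. */
unsigned differing_bits(uint32_t a, uint32_t b)
{
	return count_bits(a ^ b);
}

/* Self-check using the two suffixes from the thread. */
void check_suffixes(void)
{
	assert((0xfc68u ^ 0xf768u) == 0x0b00u);
	assert(differing_bits(0xfc68u, 0xf768u) == 3);
}
```

Three bits differ ('c' ^ '7' = 0b1011), so a single flipped bit on its own would not turn fc68 into f768.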


> On 9/11/19 5:17 PM, Gang He wrote:
> > Hello List,
> > 
> > Our user encountered a metadata corruption problem when running the pvresize command after upgrading to LVM2 v2.02.180 from v2.02.120.
> > 
> > The details are as below,
> > we have following environment:
> > - Storage: HP XP7 (SAN) - LUN's are presented to ESX via RDM
> > - VMWare ESXi 6.5
> > - SLES 12 SP 4 Guest
> > 
> > The resize was done this way (our standard procedure for years); however, this is our first resize after upgrading SLES 12 SP3 to SLES 12 SP4. Until this upgrade, we
> > never had a problem like this:
> > - split continuous access on storage box, resize lun on XP7
> > - recreate ca on XP7
> > - scan on ESX
> > - rescan-scsi-bus.sh -s on SLES VM
> > - pvresize  ( at this step the error happened)
> > 
> > huns1vdb01:~ # pvresize /dev/disk/by-id/scsi-360060e80072a660000302a6600003274
> 
> _______________________________________________
> linux-lvm mailing list
> linux-lvm@redhat.com
> https://www.redhat.com/mailman/listinfo/linux-lvm
> read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


* Re: [linux-lvm] pvresize will cause a meta-data corruption with error message "Error writing device at 4096 length 512"
  2019-10-11 15:14   ` David Teigland
@ 2019-10-12  3:23     ` Gang He
  2019-10-12  6:34     ` Heming Zhao
  1 sibling, 0 replies; 16+ messages in thread
From: Gang He @ 2019-10-12  3:23 UTC (permalink / raw)
  To: David Teigland, Heming Zhao; +Cc: linux-lvm

Hello David,

Based on the information from Heming, do you think this is a new bug, or can we fix it with the existing patches?
The user now wants to restore the LVM2 metadata to its original state. Do you have any suggestions?

Thanks
Gang


* Re: [linux-lvm] pvresize will cause a meta-data corruption with error message "Error writing device at 4096 length 512"
  2019-10-11 15:14   ` David Teigland
  2019-10-12  3:23     ` Gang He
@ 2019-10-12  6:34     ` Heming Zhao
  2019-10-12  7:11       ` Heming Zhao
  1 sibling, 1 reply; 16+ messages in thread
From: Heming Zhao @ 2019-10-12  6:34 UTC (permalink / raw)
  To: David Teigland; +Cc: Gang He, linux-lvm

Hello David,

Thank you for your reply.

While analyzing the code over the past few days, I found that the code below can be improved.
(The changes are based on the git master branch.)

---------------
commit 3768196011fb01e4016510bfab9eef0c7bdc04f5 (HEAD -> master)
Author: Zhao Heming <heming.zhao@suse.com>
Date:   Sat Oct 12 14:28:06 2019 +0800

     fix typo in lib/cache/lvmcache.c
     enhance error handling in bcache
     fix constant var 'error' in _scan_list
     fix gcc warning in _lvconvert_split_cache_single
     
     Signed-off-by: Zhao Heming <heming.zhao@suse.com>

diff --git a/lib/cache/lvmcache.c b/lib/cache/lvmcache.c
index f6e792459b..499f9437cb 100644
--- a/lib/cache/lvmcache.c
+++ b/lib/cache/lvmcache.c
@@ -939,7 +939,7 @@ int lvmcache_label_rescan_vg_rw(struct cmd_context *cmd, const char *vgname, con
   * incorrectly placed PVs should have been moved from the orphan vginfo
   * onto their correct vginfo's, and the orphan vginfo should (in theory)
   * represent only real orphan PVs.  (Note: if lvmcache_label_scan is run
- * after vg_read udpates to lvmcache state, then the lvmcache will be
+ * after vg_read updates to lvmcache state, then the lvmcache will be
   * incorrect again, so do not run lvmcache_label_scan during the
   * processing phase.)
   *
diff --git a/lib/device/bcache.c b/lib/device/bcache.c
index d100419770..cfe01bac2f 100644
--- a/lib/device/bcache.c
+++ b/lib/device/bcache.c
@@ -292,6 +292,10 @@ static bool _async_issue(struct io_engine *ioe, enum dir d, int fd,
         } while (r == -EAGAIN);
  
         if (r < 0) {
+               ((struct block *)context)->error = r;
+               log_warn("io_submit <%c> off %llu bytes %llu return %d:%s",
+                               (d == DIR_READ) ? 'R' : 'W', (long long unsigned)offset,
+                               (long long unsigned)nbytes, r, strerror(-r));
                 _cb_free(e->cbs, cb);
                 return false;
         }
@@ -842,7 +846,7 @@ static void _complete_io(void *context, int err)
  
         if (b->error) {
                 dm_list_add(&cache->errored, &b->list);
-
+               log_warn("fd: %d error: %d", b->fd, err);
         } else {
                 _clear_flags(b, BF_DIRTY);
                 _link_block(b);
@@ -869,8 +873,7 @@ static void _issue_low_level(struct block *b, enum dir d)
         dm_list_move(&cache->io_pending, &b->list);
  
         if (!cache->engine->issue(cache->engine, d, b->fd, sb, se, b->data, b)) {
-               /* FIXME: if io_submit() set an errno, return that instead of EIO? */
-               _complete_io(b, -EIO);
+               _complete_io(b, b->error);
                 return;
         }
  }
diff --git a/lib/label/label.c b/lib/label/label.c
index dc4d32d151..60ad387219 100644
--- a/lib/label/label.c
+++ b/lib/label/label.c
@@ -647,7 +647,6 @@ static int _scan_list(struct cmd_context *cmd, struct dev_filter *f,
         int submit_count;
         int scan_failed;
         int is_lvm_device;
-       int error;
         int ret;
  
         dm_list_init(&wait_devs);
@@ -694,12 +693,12 @@ static int _scan_list(struct cmd_context *cmd, struct dev_filter *f,
  
         dm_list_iterate_items_safe(devl, devl2, &wait_devs) {
                 bb = NULL;
-               error = 0;
                 scan_failed = 0;
                 is_lvm_device = 0;
  
                 if (!bcache_get(scan_bcache, devl->dev->bcache_fd, 0, 0, &bb)) {
-                       log_debug_devs("Scan failed to read %s error %d.", dev_name(devl->dev), error);
+                       log_debug_devs("Scan failed to read %s error %d.",
+                                                       dev_name(devl->dev), bb ? bb->error : 0);
                         scan_failed = 1;
                         scan_read_errors++;
                         scan_failed_count++;
diff --git a/tools/lvconvert.c b/tools/lvconvert.c
index 60ab956614..4939e5ec7d 100644
--- a/tools/lvconvert.c
+++ b/tools/lvconvert.c
@@ -4676,7 +4676,7 @@ static int _lvconvert_split_cache_single(struct cmd_context *cmd,
         struct logical_volume *lv_main = NULL;
         struct logical_volume *lv_fast = NULL;
         struct lv_segment *seg;
-       int ret;
+       int ret = 0;
  
         if (lv_is_writecache(lv)) {
                 lv_main = lv;

---
Thanks
zhm


* Re: [linux-lvm] pvresize will cause a meta-data corruption with error message "Error writing device at 4096 length 512"
  2019-10-12  6:34     ` Heming Zhao
@ 2019-10-12  7:11       ` Heming Zhao
  2019-10-14  3:07         ` Heming Zhao
  2019-10-14  3:13         ` Heming Zhao
  0 siblings, 2 replies; 16+ messages in thread
From: Heming Zhao @ 2019-10-12  7:11 UTC (permalink / raw)
  To: David Teigland; +Cc: Gang He, linux-lvm

Hello List & David,

The patch below fixes the incorrect calls to dev_unset_last_byte.

------------
commit 89cfffeffb7499d8f51112f58c381007aebc372d (HEAD -> master)
Author: Zhao Heming <heming.zhao@suse.com>
Date:   Sat Oct 12 15:04:42 2019 +0800

     When dev_write_bytes fails, it releases the fd, so the caller
     can no longer reset the bcache last_byte with dev_unset_last_byte.
     
     Signed-off-by: Zhao Heming <heming.zhao@suse.com>

diff --git a/.gitignore b/.gitignore
index 7ebb8bb3be..cfd5bee1c4 100644
--- a/.gitignore
+++ b/.gitignore
@@ -30,7 +30,7 @@ make.tmpl
  /config.log
  /config.status
  /configure.scan
-/cscope.out
+/cscope.*
  /html/
  /reports/
  /tags
diff --git a/lib/format_text/format-text.c b/lib/format_text/format-text.c
index 6ec47bfcef..fd65f50f5f 100644
--- a/lib/format_text/format-text.c
+++ b/lib/format_text/format-text.c
@@ -277,8 +277,7 @@ static int _raw_write_mda_header(const struct format_type *fmt,
         dev_set_last_byte(dev, start_byte + MDA_HEADER_SIZE);
  
         if (!dev_write_bytes(dev, start_byte, MDA_HEADER_SIZE, mdah)) {
-               dev_unset_last_byte(dev);
-               log_error("Failed to write mda header to %s fd %d", dev_name(dev), dev->bcache_fd);
+               log_error("Failed to write mda header to %s", dev_name(dev));
                 return 0;
         }
         dev_unset_last_byte(dev);
@@ -988,8 +987,7 @@ static int _vg_write_raw(struct format_instance *fid, struct volume_group *vg,
                            (unsigned long long)write2_size);
  
         if (!dev_write_bytes(mdac->area.dev, write1_start, (size_t)write1_size, write_buf)) {
-               log_error("Failed to write metadata to %s fd %d", devname, mdac->area.dev->bcache_fd);
-               dev_unset_last_byte(mdac->area.dev);
+               log_error("Failed to write metadata to %s", devname);
                 goto out;
         }

@@ -1001,8 +999,7 @@ static int _vg_write_raw(struct format_instance *fid, struct volume_group *vg,
  
                 if (!dev_write_bytes(mdac->area.dev, write2_start, write2_size,
                                      write_buf + new_size - new_wrap)) {
-                       log_error("Failed to write metadata wrap to %s fd %d", devname, mdac->area.dev->bcache_fd);
-                       dev_unset_last_byte(mdac->area.dev);
+                       log_error("Failed to write metadata wrap to %s", devname);
                         goto out;
                 }
         }
@@ -1019,7 +1016,7 @@ static int _vg_write_raw(struct format_instance *fid, struct volume_group *vg,
  
         r = 1;
  
-      out:
+out:
         if (!r) {
                 free(fidtc->write_buf);
                 fidtc->write_buf = NULL;
diff --git a/lib/label/label.c b/lib/label/label.c
index 60ad387219..f4787b18cb 100644
--- a/lib/label/label.c
+++ b/lib/label/label.c
@@ -218,7 +218,7 @@ int label_write(struct device *dev, struct label *label)
  
         if (!dev_write_bytes(dev, offset, LABEL_SIZE, buf)) {
                 log_debug_devs("Failed to write label to %s", dev_name(dev));
-               r = 0;
+               return 0;
         }
  
         dev_unset_last_byte(dev);
@@ -1415,7 +1415,8 @@ bool dev_write_bytes(struct device *dev, uint64_t start, size_t len, void *data)
  
         if (!scan_bcache) {
                 /* Should not happen */
-               log_error("dev_write bcache not set up %s", dev_name(dev));
+               log_error("dev_write bcache not set up %s fd %d", dev_name(dev),
+                               dev->bcache_fd);
                 return false;
         }
  
@@ -1434,21 +1435,25 @@ bool dev_write_bytes(struct device *dev, uint64_t start, size_t len, void *data)
                 dev->flags |= DEV_BCACHE_WRITE;
                 if (!label_scan_open(dev)) {
                         log_error("Error opening device %s for writing at %llu length %u.",
-                                 dev_name(dev), (unsigned long long)start, (uint32_t)len);
+                                       dev_name(dev), (unsigned long long)start, (uint32_t)len);
                         return false;
                 }
         }
  
         if (!bcache_write_bytes(scan_bcache, dev->bcache_fd, start, len, data)) {
-               log_error("Error writing device %s at %llu length %u.",
-                         dev_name(dev), (unsigned long long)start, (uint32_t)len);
+               log_error("Error writing device %s at %llu length %u fd %d.",
+                         dev_name(dev), (unsigned long long)start, (uint32_t)len,
+                         dev->bcache_fd);
+               dev_unset_last_byte(mdac->area.dev);
                 label_scan_invalidate(dev);
                 return false;
         }
  
         if (!bcache_flush(scan_bcache)) {
-               log_error("Error writing device %s at %llu length %u.",
-                         dev_name(dev), (unsigned long long)start, (uint32_t)len);
+               log_error("Error writing device %s at %llu length %u fd %d.",
+                         dev_name(dev), (unsigned long long)start, (uint32_t)len,
+                         dev->bcache_fd);
+               dev_unset_last_byte(mdac->area.dev);
                 label_scan_invalidate(dev);
                 return false;
         }
diff --git a/lib/metadata/mirror.c b/lib/metadata/mirror.c
index 75dc18c113..c8280f9c47 100644
--- a/lib/metadata/mirror.c
+++ b/lib/metadata/mirror.c
@@ -266,7 +266,6 @@ static int _write_log_header(struct cmd_context *cmd, struct logical_volume *lv)
         dev_set_last_byte(dev, sizeof(log_header));
  
         if (!dev_write_bytes(dev, UINT64_C(0), sizeof(log_header), &log_header)) {
-               dev_unset_last_byte(dev);
                 log_error("Failed to write log header to %s.", name);
                 return 0;
         }

---
Thanks
zhm
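
The ordering problem described in the commit message can be sketched with a toy model. Everything here is hypothetical: the struct layouts, the toy_* names, the fd number 7 and the offset are stand-ins chosen for illustration and are not taken from the LVM source. The only assumption carried over from the commit message is that a failed write releases the fd before the caller gets a chance to unset last_byte.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-ins; none of these names or layouts come from LVM. */
struct toy_cache {
	int      last_byte_fd;  /* fd the limit applies to, -1 if unset */
	uint64_t last_byte;
};

struct toy_dev {
	int fd;                 /* -1 once the device has been invalidated */
};

static void toy_set_last_byte(struct toy_cache *c, struct toy_dev *d, uint64_t off)
{
	assert(d->fd >= 0);
	c->last_byte_fd = d->fd;
	c->last_byte = off;
}

/* Clears the limit only if it belongs to the fd the device holds now. */
static void toy_unset_last_byte(struct toy_cache *c, struct toy_dev *d)
{
	if (d->fd >= 0 && c->last_byte_fd == d->fd) {
		c->last_byte_fd = -1;
		c->last_byte = 0;
	}
}

/* A write that always fails; on failure the device's fd is dropped. */
static int toy_write_fails(struct toy_cache *c, struct toy_dev *d, int unset_first)
{
	if (unset_first)
		toy_unset_last_byte(c, d);  /* clean up while the fd is valid */
	d->fd = -1;
	return 0;
}

/* Returns 1 if a last_byte limit is still registered after the failure. */
int limit_is_stale(int unset_inside_write)
{
	struct toy_cache c = { -1, 0 };
	struct toy_dev d = { 7 };           /* arbitrary fd number */

	toy_set_last_byte(&c, &d, 4096 + 512);
	if (!toy_write_fails(&c, &d, unset_inside_write))
		toy_unset_last_byte(&c, &d);    /* caller-side cleanup */
	return c.last_byte_fd != -1;
}
```

In this model limit_is_stale(0) returns 1, because the caller-side cleanup runs after the fd is gone, while limit_is_stale(1) returns 0, because the cleanup happens inside the failing write.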


* Re: [linux-lvm] pvresize will cause a meta-data corruption with error message "Error writing device at 4096 length 512"
  2019-10-12  7:11       ` Heming Zhao
@ 2019-10-14  3:07         ` Heming Zhao
  2019-10-14  3:13         ` Heming Zhao
  1 sibling, 0 replies; 16+ messages in thread
From: Heming Zhao @ 2019-10-14  3:07 UTC (permalink / raw)
  To: David Teigland; +Cc: Gang He, linux-lvm

I'm very sorry about the last mail. The patch in it
(commit 89cfffeffb7499d8f51112f58c381007aebc372d) has compile errors.
I am resending the patch; see below:

```
commit 7032c9c0bfe3c1fcbbb6e4e036ffe69a02aaa440
Author: Zhao Heming <heming.zhao@suse.com>
Date:   Mon Oct 14 10:55:54 2019 +0800

     A failed dev_write_bytes leaves the caller unable to reset bcache last_byte via dev_unset_last_byte
     
     Signed-off-by: Zhao Heming <heming.zhao@suse.com>

diff --git a/.gitignore b/.gitignore
index 7ebb8bb3be..cfd5bee1c4 100644
--- a/.gitignore
+++ b/.gitignore
@@ -30,7 +30,7 @@ make.tmpl
  /config.log
  /config.status
  /configure.scan
-/cscope.out
+/cscope.*
  /html/
  /reports/
  /tags
diff --git a/lib/format_text/format-text.c b/lib/format_text/format-text.c
index 6ec47bfcef..fd65f50f5f 100644
--- a/lib/format_text/format-text.c
+++ b/lib/format_text/format-text.c
@@ -277,8 +277,7 @@ static int _raw_write_mda_header(const struct format_type *fmt,
         dev_set_last_byte(dev, start_byte + MDA_HEADER_SIZE);
  
         if (!dev_write_bytes(dev, start_byte, MDA_HEADER_SIZE, mdah)) {
-               dev_unset_last_byte(dev);
-               log_error("Failed to write mda header to %s fd %d", dev_name(dev), dev->bcache_fd);
+               log_error("Failed to write mda header to %s", dev_name(dev));
                 return 0;
         }
         dev_unset_last_byte(dev);
@@ -988,8 +987,7 @@ static int _vg_write_raw(struct format_instance *fid, struct volume_group *vg,
                            (unsigned long long)write2_size);
  
         if (!dev_write_bytes(mdac->area.dev, write1_start, (size_t)write1_size, write_buf)) {
-               log_error("Failed to write metadata to %s fd %d", devname, mdac->area.dev->bcache_fd);
-               dev_unset_last_byte(mdac->area.dev);
+               log_error("Failed to write metadata to %s", devname);
                 goto out;
         }
  
@@ -1001,8 +999,7 @@ static int _vg_write_raw(struct format_instance *fid, struct volume_group *vg,
  
                 if (!dev_write_bytes(mdac->area.dev, write2_start, write2_size,
                                      write_buf + new_size - new_wrap)) {
-                       log_error("Failed to write metadata wrap to %s fd %d", devname, mdac->area.dev->bcache_fd);
-                       dev_unset_last_byte(mdac->area.dev);
+                       log_error("Failed to write metadata wrap to %s", devname);
                         goto out;
                 }
         }
@@ -1019,7 +1016,7 @@ static int _vg_write_raw(struct format_instance *fid, struct volume_group *vg,
  
         r = 1;
  
-      out:
+out:
         if (!r) {
                 free(fidtc->write_buf);
                 fidtc->write_buf = NULL;
diff --git a/lib/label/label.c b/lib/label/label.c
index 60ad387219..64a8f14150 100644
--- a/lib/label/label.c
+++ b/lib/label/label.c
@@ -218,7 +218,7 @@ int label_write(struct device *dev, struct label *label)
  
         if (!dev_write_bytes(dev, offset, LABEL_SIZE, buf)) {
                 log_debug_devs("Failed to write label to %s", dev_name(dev));
-               r = 0;
+               return 0;
         }
  
         dev_unset_last_byte(dev);
@@ -1415,7 +1415,8 @@ bool dev_write_bytes(struct device *dev, uint64_t start, size_t len, void *data)
  
         if (!scan_bcache) {
                 /* Should not happen */
-               log_error("dev_write bcache not set up %s", dev_name(dev));
+               log_error("dev_write bcache not set up %s fd %d", dev_name(dev),
+                               dev->bcache_fd);
                 return false;
         }
  
@@ -1434,21 +1435,25 @@ bool dev_write_bytes(struct device *dev, uint64_t start, size_t len, void *data)
                 dev->flags |= DEV_BCACHE_WRITE;
                 if (!label_scan_open(dev)) {
                         log_error("Error opening device %s for writing at %llu length %u.",
-                                 dev_name(dev), (unsigned long long)start, (uint32_t)len);
+                                       dev_name(dev), (unsigned long long)start, (uint32_t)len);
                         return false;
                 }
         }
  
         if (!bcache_write_bytes(scan_bcache, dev->bcache_fd, start, len, data)) {
-               log_error("Error writing device %s at %llu length %u.",
-                         dev_name(dev), (unsigned long long)start, (uint32_t)len);
+               log_error("Error writing device %s at %llu length %u fd %d.",
+                         dev_name(dev), (unsigned long long)start, (uint32_t)len,
+                         dev->bcache_fd);
+               dev_unset_last_byte(dev);
                 label_scan_invalidate(dev);
                 return false;
         }
  
         if (!bcache_flush(scan_bcache)) {
-               log_error("Error writing device %s at %llu length %u.",
-                         dev_name(dev), (unsigned long long)start, (uint32_t)len);
+               log_error("Error writing device %s at %llu length %u fd %d.",
+                         dev_name(dev), (unsigned long long)start, (uint32_t)len,
+                         dev->bcache_fd);
+               dev_unset_last_byte(dev);
                 label_scan_invalidate(dev);
                 return false;
         }
diff --git a/lib/metadata/mirror.c b/lib/metadata/mirror.c
index 75dc18c113..c8280f9c47 100644
--- a/lib/metadata/mirror.c
+++ b/lib/metadata/mirror.c
@@ -266,7 +266,6 @@ static int _write_log_header(struct cmd_context *cmd, struct logical_volume *lv)
         dev_set_last_byte(dev, sizeof(log_header));
  
         if (!dev_write_bytes(dev, UINT64_C(0), sizeof(log_header), &log_header)) {
-               dev_unset_last_byte(dev);
                 log_error("Failed to write log header to %s.", name);
                 return 0;
         }
```

On 10/12/19 3:11 PM, Heming Zhao wrote:
> Below patch for fix incorrect calling dev_unset_last_byte.

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [linux-lvm] pvresize will cause a meta-data corruption with error message "Error writing device at 4096 length 512"
  2019-10-12  7:11       ` Heming Zhao
  2019-10-14  3:07         ` Heming Zhao
@ 2019-10-14  3:13         ` Heming Zhao
  2019-10-16  8:50           ` Heming Zhao
  1 sibling, 1 reply; 16+ messages in thread
From: Heming Zhao @ 2019-10-14  3:13 UTC (permalink / raw)
  To: David Teigland; +Cc: Gang He, linux-lvm

The issue in bcache_flush is related to cache->errored.

Here is my fix. I believe there should be a better solution than mine.

Solution:
Keep cache->errored, but use this list only to hold the errored data;
the errored data is never resent.
bcache_flush checks cache->errored, and when the errored list is not empty
it returns false, which triggers the caller/upper layer to do the cleanup.

```
commit 17e959c0ba58edc67b6caa7669444ecffa40a16f (HEAD -> master)
Author: Zhao Heming <heming.zhao@suse.com>
Date:   Mon Oct 14 10:57:54 2019 +0800

     The fd in cache->errored may already be closed before calling bcache_flush,
     so bcache_flush shouldn't rewrite data in cache->errored. Currently
     solution is return error to caller when cache->errored is not empty, and
     caller should do all the clean jobs.
     
     Signed-off-by: Zhao Heming <heming.zhao@suse.com>

diff --git a/lib/device/bcache.c b/lib/device/bcache.c
index cfe01bac2f..2eb3f0ee34 100644
--- a/lib/device/bcache.c
+++ b/lib/device/bcache.c
@@ -897,16 +897,20 @@ static bool _wait_io(struct bcache *cache)
   * High level IO handling
   *--------------------------------------------------------------*/
  
-static void _wait_all(struct bcache *cache)
+static bool _wait_all(struct bcache *cache)
  {
+       bool ret = true;
         while (!dm_list_empty(&cache->io_pending))
-               _wait_io(cache);
+               ret = _wait_io(cache);
+       return ret;
  }
  
-static void _wait_specific(struct block *b)
+static bool _wait_specific(struct block *b)
  {
+       bool ret = true;
         while (_test_flags(b, BF_IO_PENDING))
-               _wait_io(b->cache);
+               ret = _wait_io(b->cache);
+       return ret;
  }
  
  static unsigned _writeback(struct bcache *cache, unsigned count)
@@ -1262,10 +1266,7 @@ void bcache_put(struct block *b)
  
  bool bcache_flush(struct bcache *cache)
  {
-       // Only dirty data is on the errored list, since bad read blocks get
-       // recycled straight away.  So we put these back on the dirty list, and
-       // try and rewrite everything.
-       dm_list_splice(&cache->dirty, &cache->errored);
+       bool ret = true;
  
         while (!dm_list_empty(&cache->dirty)) {
                 struct block *b = dm_list_item(_list_pop(&cache->dirty), struct block);
@@ -1275,11 +1276,18 @@ bool bcache_flush(struct bcache *cache)
                 }
  
                 _issue_write(b);
+               if (b->error) ret = false;
         }
  
-       _wait_all(cache);
+       ret = _wait_all(cache);
  
-       return dm_list_empty(&cache->errored);
+       // merge the errored list to dirty, return false to trigger caller to
+       // clean them.
+       if (!dm_list_empty(&cache->errored)) {
+               dm_list_splice(&cache->dirty, &cache->errored);
+               ret = false;
+       }
+       return ret;
  }
  
  //----------------------------------------------------------------
```

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [linux-lvm] pvresize will cause a meta-data corruption with error message "Error writing device at 4096 length 512"
  2019-10-14  3:13         ` Heming Zhao
@ 2019-10-16  8:50           ` Heming Zhao
  0 siblings, 0 replies; 16+ messages in thread
From: Heming Zhao @ 2019-10-16  8:50 UTC (permalink / raw)
  To: David Teigland; +Cc: Gang He, linux-lvm

Hello list,

The cause: the customer ran "pvresize -vvvvvv -dddddd /dev/disk/by-id/scsi-360060e80072a660000302a660000f111", and afterwards disk scsi-360060e80072a660000302a660000f747 was found to be overwritten.

From our user's log: the fc68 disk was opened as fd 18, and its related buffer (fc68's PV header) was written 8 times:
```
/dev/disk/by-id/scsi-360060e80072a670000302a670000fc68 <== first error
/dev/disk/by-id/scsi-360060e80072a670000302a670000fc67 <== 2nd write of fc68's PV header, failed
/dev/disk/by-id/scsi-360060e80072a670000302a670000fc66 <== 3rd write, failed
/dev/disk/by-id/scsi-360060e80072a660000302a660000fa08-part1
/dev/disk/by-id/scsi-360060e80072a660000302a660000f74c
/dev/disk/by-id/scsi-360060e80072a660000302a660000f74a
/dev/disk/by-id/scsi-360060e80072a660000302a660000f749
/dev/disk/by-id/scsi-360060e80072a660000302a660000f747 <=== last error, but fc68's data was written successfully
```

When the last error occurred on f747, the write to fd 18 succeeded:
```
#label/label.c:629           Scanning 1 devices for VG info
#label/label.c:665           Scanning submitted 1 reads
#device/bcache.c:1063    zhm _lookup_or_read_block 1063 flags: 0x0 b: 0x555e778b26a0
#label/label.c:674           Scan failed to read /dev/disk/by-id/scsi-360060e80072a660000302a660000f747 error 0.
#device/bcache.c:1363    zhm bcache_invalidate_fd 1363 cache->dirty fd: 18
#device/bcache.c:1372    zhm bcache_invalidate_fd 1372 cache->clean fd: 18
#label/label.c:764           Scanned devices: read errors 1 process errors 0 failed 1
```
In the log above, the line '#device/bcache.c:1363' triggers the write, which succeeds.

Another important point: after the first write error on fd 18, the data was on cache->errored and was not on the dirty or clean list. So the bcache_invalidate_fd call that followed the first fc68 write error did not clean up this data.

Then the following 6 read errors happened (the errors for fc67, fc66, fa08-part1, f74c, f74a and f749 are the same):
```
#toollib.c:4377          Processing PVs in VG vgautdbtest02_db03
#locking/locking.c:331           Dropping cache for vgautdbtest02_db03.
#misc/lvm-flock.c:202         Locking /run/lvm/lock/V_vgautdbtest02_db03 WB
#misc/lvm-flock.c:100           _do_flock /run/lvm/lock/V_vgautdbtest02_db03:aux WB
#misc/lvm-flock.c:100           _do_flock /run/lvm/lock/V_vgautdbtest02_db03 WB
#misc/lvm-flock.c:47            _undo_flock /run/lvm/lock/V_vgautdbtest02_db03:aux
#metadata/metadata.c:3778        Reading VG vgautdbtest02_db03 fgItp1-1M8x-6qVT-84ND-Fpwa-mFYf-or2sow
#metadata/metadata.c:3874          Rescanning devices for vgautdbtest02_db03
#cache/lvmcache.c:751           lvmcache has no info for vgname "vgautdbtest02_db03" with VGID fgItp11M8x6qVT84NDFpwamFYfor2sow.
#device/bcache.c:1372    zhm bcache_invalidate_fd 1372 cache->clean fd: 49
#label/label.c:629           Scanning 1 devices for VG info
#label/label.c:665           Scanning submitted 1 reads
#device/bcache.c:1063    zhm _lookup_or_read_block 1063 flags: 0x0 b: 0x555e778b26a0
#label/label.c:674           Scan failed to read /dev/disk/by-id/scsi-360060e80072a660000302a660000f749 error 0.
#device/bcache.c:1363    zhm bcache_invalidate_fd 1363 cache->dirty fd: 18
#device/bcache.c:337     zhm _async_wait 337 ev->res: 18446744073709551495 errno: 0
#device/bcache.c:875     zhm _complete_io 875 fd: 18 error: -121
#label/label.c:764           Scanned devices: read errors 1 process errors 0 failed 1
```

Why is disk f749's fd 49 at first, but 18 in the end?

Through the following steps, f749 got fd 18, which had been fc68's fd before it was closed.
```
lvmcache_label_rescan_vg
  label_scan_devs_rw
  {
      // drop the existing bcache blocks and close the current fd
      dm_list_iterate_items(devl, devs) {
         if (_in_bcache(devl->dev)) {
             bcache_invalidate_fd(scan_bcache, devl->dev->bcache_fd); <== invalidate current bcache
             _scan_dev_close(devl->dev); <=== closes current fd 49
         }
      }

      _scan_list(cmd, f, devs, NULL); <== opens a new fd (reuses the closed, errored fd 18)
  }
```
In _scan_list, the disk is opened and receives the previously failed fd 18. This fd 18 is then used to search the hash table, which returns the earlier errored bcache data (containing the PV metadata).
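
The fd reuse itself is standard POSIX behaviour: open() returns the lowest-numbered descriptor that is not currently open, so a number freed by close() is handed out again. A minimal sketch of that effect, not LVM code; the helper name fd_number_is_reused and the paths used to call it are only illustrative:

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Open path_a, close it, then open path_b.
 * Returns 1 if path_b received the same fd number path_a had,
 * 0 if it did not, and -1 if either open() failed.
 * Assumes no other thread opens or closes descriptors in between.
 */
int fd_number_is_reused(const char *path_a, const char *path_b)
{
	int fd_a, fd_b, reused;

	assert(path_a != NULL && path_b != NULL);

	fd_a = open(path_a, O_RDONLY);
	if (fd_a < 0)
		return -1;
	close(fd_a);		/* fd_a's number is free again */

	fd_b = open(path_b, O_RDONLY);	/* gets the lowest free number */
	if (fd_b < 0)
		return -1;

	reused = (fd_b == fd_a);
	close(fd_b);
	return reused;
}
```

Calling fd_number_is_reused("/dev/null", "/dev/zero") in a single-threaded program should return 1: two different files, one fd number. Any cache keyed only by the fd number cannot tell them apart.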

With the following flow, the block moves from cache->errored to the cache->dirty list.
```
bcache_get
  _lookup_or_read_block
  {
    b = _hash_lookup() <=== finds the block of the just-closed, errored fd 18
    if (b) {
      ... ...
      _unlink_block(b); <=== unlink from cache->errored
      ... ...
    }
    if (b) {
      _link_block(b); <== the stale fd 18 block has BF_DIRTY set, so it is moved to cache->dirty
    }
  }
```

With fd 18's data on cache->dirty, calling bcache_invalidate_fd triggers _issue_write(), which writes this PV header out to a different disk. If the write fails (the log shows 7 such failures), the PV header data is first saved on cache->errored and then moved back to cache->dirty by the next bcache_get. The error does not go away until a write succeeds.

The line "#device/bcache.c:1063" shows that bcache always fetches/reuses the block of the last errored fd 18 (first opened for fc68).
fc67, fc66, fa08-part1, f74c, f74a and f749 tried to write to disk but failed; the 8th write (for disk f747) succeeded, so the LVM metadata on disk f747 was changed.
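
The sequence above can be reduced to a toy model. This is only a sketch under simplifying assumptions (a single-entry cache keyed by fd number, devices as in-memory buffers); none of the toy_* names exist in LVM, and the real bcache keeps separate dirty/errored lists rather than one flag:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_MAX_FDS 4
#define TOY_BLOCK   16

struct toy_dev {			/* a "disk": one block of data */
	char data[TOY_BLOCK];
	bool fail_writes;		/* simulate the I/O error */
};

struct toy_block {			/* cached block, keyed only by fd */
	bool valid;
	bool dirty;
	int fd;
	char data[TOY_BLOCK];
};

struct toy_cache {
	struct toy_dev *fd_table[TOY_MAX_FDS];	/* fd -> open device */
	struct toy_block block;
};

/* Write the cached block through its fd; on failure it stays cached. */
static bool toy_writeback(struct toy_cache *c)
{
	struct toy_block *b = &c->block;
	struct toy_dev *d;

	if (!b->valid || !b->dirty)
		return true;
	assert(b->fd >= 0 && b->fd < TOY_MAX_FDS);
	d = c->fd_table[b->fd];
	if (!d || d->fail_writes)
		return false;		/* block is kept, still dirty */
	memcpy(d->data, b->data, TOY_BLOCK);
	b->dirty = false;
	return true;
}

/* Returns true if device B ends up holding device A's header. */
bool toy_stale_block_hits_wrong_device(void)
{
	struct toy_dev dev_a = { "old header A", true };
	struct toy_dev dev_b = { "old header B", false };
	struct toy_cache c;
	const int fd = 2;

	memset(&c, 0, sizeof(c));
	c.fd_table[fd] = &dev_a;	/* A is opened and gets this fd */
	c.block.valid = true;
	c.block.dirty = true;
	c.block.fd = fd;
	strcpy(c.block.data, "new header A");

	if (toy_writeback(&c))		/* write to A fails, block survives */
		return false;

	c.fd_table[fd] = &dev_b;	/* A closed, B opened: same fd number */

	if (!toy_writeback(&c))		/* stale block is written to B */
		return false;
	return strcmp(dev_b.data, "new header A") == 0;
}
```

In this model toy_stale_block_hits_wrong_device() returns true: the block that failed to reach A is written to B as soon as B inherits the fd number. Dropping the failed block when the fd is closed, or keying the cache on something other than the bare fd number, would break this chain.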


As for the write error -121, I need time to dig into it. The kernel reported the messages below when the LVM write failed:
```
kernel: [ 714.139133] print_req_error: critical target error, dev sddz, sector 2048
kernel: [ 714.139144] print_req_error: critical target error, dev dm-54, sector 2048
```

Finally, please ignore my previous patches; they are not complete. The function bcache_invalidate_fd should be fixed.
I have sent my latest fix to our customer and am waiting for feedback.

Thanks,
zhm


^ permalink raw reply	[flat|nested] 16+ messages in thread

Thread overview: 16+ messages
2019-09-11  9:17 [linux-lvm] pvresize will cause a meta-data corruption with error message "Error writing device at 4096 length 512" Gang He
2019-09-11 10:01 ` Ilia Zykov
2019-09-11 10:03 ` Ilia Zykov
2019-09-11 10:10   ` Ingo Franzki
2019-09-11 10:20     ` Gang He
2019-10-11  8:11 ` Heming Zhao
2019-10-11  9:22   ` Heming Zhao
2019-10-11 10:38     ` Zdenek Kabelac
2019-10-11 11:50       ` Heming Zhao
2019-10-11 15:14   ` David Teigland
2019-10-12  3:23     ` Gang He
2019-10-12  6:34     ` Heming Zhao
2019-10-12  7:11       ` Heming Zhao
2019-10-14  3:07         ` Heming Zhao
2019-10-14  3:13         ` Heming Zhao
2019-10-16  8:50           ` Heming Zhao
