stable.vger.kernel.org archive mirror
* stable 4.19.80 and 4.19.81: BAD! BarrierAck seen in drbd
From: Wolfgang Walter @ 2019-11-08 16:38 UTC
  To: drbd-dev; +Cc: stable

Hello,

Starting with 4.19.80, we have twice seen this message from drbd (on the primary):

[335776.845165] drbd fastexport: BAD! BarrierAck #24834877 received, expected #24834876!
[335776.845272] drbd fastexport: peer( Secondary -> Unknown ) conn( Connected -> ProtocolError ) pdsk( UpToDate -> DUnknown ) 
[335776.845359] drbd fastexport: ack_receiver terminated
[335776.845361] drbd fastexport: Terminating drbd_a_fastexpo
[335776.845452] block drbd131: new current UUID E5F844267DB80A79:9823CD86802A0EC5:D37ECA3CF80CEEE3:D37DCA3CF80CEEE3
[335776.845790] block drbd132: new current UUID B243958413F455E9:E107A4DF8E316F71:F05164996BFE9341:F05064996BFE9341
[335776.846148] block drbd133: new current UUID 6E661C621E6C22E5:3271D34B1D818E09:6BA25176D9D79A5F:6BA15176D9D79A5F
[335776.895617] drbd fastexport: Connection closed
[335776.895764] drbd fastexport: conn( ProtocolError -> Unconnected ) 
[335776.895767] drbd fastexport: receiver terminated
[335776.895768] drbd fastexport: Restarting receiver thread
[335776.895769] drbd fastexport: receiver (re)started
[335776.895783] drbd fastexport: conn( Unconnected -> WFConnection ) 
[335782.092297] drbd fastexport: Handshake successful: Agreed network protocol version 101
[335782.092301] drbd fastexport: Feature flags enabled on protocol level: 0x7 TRIM THIN_RESYNC WRITE_SAME.
[335782.092489] drbd fastexport: Peer authenticated using 32 bytes HMAC
[335782.092577] drbd fastexport: conn( WFConnection -> WFReportParams ) 
[335782.092587] drbd fastexport: Starting ack_recv thread (from drbd_r_fastexpo [1925])
[335782.142516] block drbd131: drbd_sync_handshake:
[335782.142522] block drbd131: self E5F844267DB80A79:9823CD86802A0EC5:D37ECA3CF80CEEE3:D37DCA3CF80CEEE3 bits:8180 flags:0
[335782.142527] block drbd131: peer 9823CD86802A0EC4:0000000000000000:D37ECA3CF80CEEE2:D37DCA3CF80CEEE3 bits:0 flags:0
[335782.142530] block drbd131: uuid_compare()=1 by rule 70
[335782.142539] block drbd131: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> Consistent ) 
[335782.153913] block drbd131: send bitmap stats [Bytes(packets)]: plain 0(0), RLE 438(1), total 438; compression: 100.0%
[335782.178564] block drbd132: drbd_sync_handshake:
[335782.178570] block drbd132: self B243958413F455E9:E107A4DF8E316F71:F05164996BFE9341:F05064996BFE9341 bits:16198 flags:0
[335782.178574] block drbd132: peer E107A4DF8E316F70:0000000000000000:F05164996BFE9340:F05064996BFE9341 bits:0 flags:0
[335782.178578] block drbd132: uuid_compare()=1 by rule 70
[335782.178587] block drbd132: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> Consistent ) 
[335782.191705] block drbd132: send bitmap stats [Bytes(packets)]: plain 0(0), RLE 721(1), total 721; compression: 100.0%
[335782.206444] block drbd133: drbd_sync_handshake:
[335782.206447] block drbd133: self 6E661C621E6C22E5:3271D34B1D818E09:6BA25176D9D79A5F:6BA15176D9D79A5F bits:7775 flags:0
[335782.206450] block drbd133: peer 3271D34B1D818E08:0000000000000000:6BA25176D9D79A5E:6BA15176D9D79A5F bits:0 flags:0
[335782.206452] block drbd133: uuid_compare()=1 by rule 70
[335782.206459] block drbd133: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> Consistent ) 
[335782.206734] block drbd131: receive bitmap stats [Bytes(packets)]: plain 0(0), RLE 438(1), total 438; compression: 100.0%
[335782.206738] block drbd131: helper command: /sbin/drbdadm before-resync-source minor-131
[335782.214351] block drbd131: helper command: /sbin/drbdadm before-resync-source minor-131 exit code 0 (0x0)
[335782.214376] block drbd131: conn( WFBitMapS -> SyncSource ) pdsk( Consistent -> Inconsistent ) 
[335782.214395] block drbd131: Began resync as SyncSource (will sync 33104 KB [8276 bits set]).
[335782.214746] block drbd132: receive bitmap stats [Bytes(packets)]: plain 0(0), RLE 721(1), total 721; compression: 100.0%
[335782.214749] block drbd132: helper command: /sbin/drbdadm before-resync-source minor-132
[335782.217368] block drbd132: helper command: /sbin/drbdadm before-resync-source minor-132 exit code 0 (0x0)
[335782.217379] block drbd132: conn( WFBitMapS -> SyncSource ) pdsk( Consistent -> Inconsistent ) 
[335782.217391] block drbd132: Began resync as SyncSource (will sync 65132 KB [16283 bits set]).
[335782.218079] block drbd133: send bitmap stats [Bytes(packets)]: plain 0(0), RLE 290(1), total 290; compression: 100.0%
[335782.218106] block drbd131: updated sync UUID E5F844267DB80A79:9824CD86802A0EC5:9823CD86802A0EC5:D37ECA3CF80CEEE3
[335782.218450] block drbd132: updated sync UUID B243958413F455E9:E108A4DF8E316F71:E107A4DF8E316F71:F05164996BFE9341
[335782.225868] block drbd133: receive bitmap stats [Bytes(packets)]: plain 0(0), RLE 290(1), total 290; compression: 100.0%
[335782.225873] block drbd133: helper command: /sbin/drbdadm before-resync-source minor-133
[335782.227420] block drbd133: helper command: /sbin/drbdadm before-resync-source minor-133 exit code 0 (0x0)
[335782.227438] block drbd133: conn( WFBitMapS -> SyncSource ) pdsk( Consistent -> Inconsistent ) 
[335782.227461] block drbd133: Began resync as SyncSource (will sync 31456 KB [7864 bits set]).
[335782.227508] block drbd133: updated sync UUID 6E661C621E6C22E5:3272D34B1D818E09:3271D34B1D818E09:6BA25176D9D79A5F
[335791.633014] block drbd133: Resync done (total 9 sec; paused 0 sec; 3492 K/sec)
[335791.633020] block drbd133: 0 % had equal checksums, eliminated: 92K; transferred 31364K total 31456K
[335791.633026] block drbd133: updated UUIDs 6E661C621E6C22E5:0000000000000000:3272D34B1D818E09:3271D34B1D818E09
[335791.633036] block drbd133: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate ) 
[335791.847326] block drbd131: Resync done (total 9 sec; paused 0 sec; 3676 K/sec)
[335791.847329] block drbd131: 0 % had equal checksums, eliminated: 152K; transferred 32952K total 33104K
[335791.847332] block drbd131: updated UUIDs E5F844267DB80A79:0000000000000000:9824CD86802A0EC5:9823CD86802A0EC5
[335791.847336] block drbd131: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate ) 
[335795.577446] block drbd132: Resync done (total 13 sec; paused 0 sec; 5008 K/sec)
[335795.577450] block drbd132: 0 % had equal checksums, eliminated: 32K; transferred 65100K total 65132K
[335795.577456] block drbd132: updated UUIDs B243958413F455E9:0000000000000000:E108A4DF8E316F71:E107A4DF8E316F71
[335795.577465] block drbd132: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )
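The resync figures in the log above are internally consistent. A minimal sketch of the arithmetic, assuming DRBD's usual bitmap granularity of 4 KiB per bit; the dictionary layout and names are only for illustration, the numbers are copied from the log:

```python
# Assumption: one bitmap bit covers 4 KiB of out-of-sync data.
BITMAP_BIT_KB = 4

# Values copied from the "Began resync" and "Resync done" lines above.
resyncs = {
    "drbd131": {"bits": 8276, "will_sync_kb": 33104,
                "eliminated_kb": 152, "transferred_kb": 32952},
    "drbd132": {"bits": 16283, "will_sync_kb": 65132,
                "eliminated_kb": 32, "transferred_kb": 65100},
    "drbd133": {"bits": 7864, "will_sync_kb": 31456,
                "eliminated_kb": 92, "transferred_kb": 31364},
}

def check(entry):
    """Return True if bit count and byte counts agree for one device."""
    bits_match = entry["bits"] * BITMAP_BIT_KB == entry["will_sync_kb"]
    totals_match = (entry["eliminated_kb"] + entry["transferred_kb"]
                    == entry["will_sync_kb"])
    return bits_match and totals_match

all_consistent = all(check(e) for e in resyncs.values())
for name, entry in resyncs.items():
    print(name, "consistent" if check(entry) else "MISMATCH")
```

So each volume resynced exactly the blocks marked in its bitmap; the log gives no sign that the resync itself went wrong.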



Both times the received and expected BarrierAck numbers differed by 1.
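To watch for further occurrences, the message can be picked out of a saved kernel log. A minimal sketch, assuming the message format is exactly as in the log above; the function name and the sample list are hypothetical:

```python
import re

# Matches the "BAD! BarrierAck" line in the format shown in the log above.
BARRIER_RE = re.compile(
    r"drbd (?P<resource>\S+): BAD! BarrierAck #(?P<received>\d+) received, "
    r"expected #(?P<expected>\d+)!"
)

def find_barrier_mismatches(lines):
    """Yield (resource, received, expected, delta) for each mismatch line."""
    for line in lines:
        m = BARRIER_RE.search(line)
        if m:
            received = int(m.group("received"))
            expected = int(m.group("expected"))
            yield m.group("resource"), received, expected, received - expected

# Two lines copied from the log above; only the first one should match.
sample = [
    "[335776.845165] drbd fastexport: BAD! BarrierAck #24834877 received, expected #24834876!",
    "[335776.845359] drbd fastexport: ack_receiver terminated",
]
mismatches = list(find_barrier_mismatches(sample))
for resource, received, expected, delta in mismatches:
    print(f"{resource}: received {received}, expected {expected}, delta {delta:+d}")
```

In practice the lines would come from a file or from dmesg output instead of the sample list.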

We didn't use 4.19.79. We never saw this message with 4.19.78 or earlier.

After the first time this happened we saw filesystem corruption.


Setup:

The drbd resource fastexport consists of three volumes: 1, 2 and 3.

Each volume is backed by a logical volume

	drbd_fastexport01
	drbd_fastexport02
	drbd_fastexport03
	
of an LVM volume group
	drbddisks_fast

These logical volumes are each on their own physical device (each a raid1).

These raid1 arrays are built from SSDs attached to an mpt3sas-driven controller.


(Both sides are set up like that.)


Regards,
-- 
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts


* Re: stable 4.19.80 and 4.19.81: BAD! BarrierAck seen in drbd
From: Greg KH @ 2019-11-08 17:41 UTC
  To: Wolfgang Walter; +Cc: drbd-dev, stable

Any chance you can run 'git bisect' between the two kernels (bad and
good) and find the offending commit?

Also, does 5.3.9 work for you?

thanks,

greg k-h


* Re: stable 4.19.80 and 4.19.81: BAD! BarrierAck seen in drbd
From: Wolfgang Walter @ 2019-11-08 19:30 UTC
  To: Greg KH; +Cc: drbd-dev, stable

On Friday, 8 November 2019, 18:41:18, Greg KH wrote:
> Any chance you can run 'git bisect' between the two kernels (bad and
> good) and find the offending commit?

Thanks for your quick answer!

I have already considered a bisect, but it would take rather a long time. So
far the error has happened only once with 4.19.80 and once with 4.19.81.

I booted 4.19.81 on 2019-10-31 and the error happened on 2019-11-04. drbd
recovered and I did not reboot the machine (I checked the filesystem; no error
was found). As of today the error has not happened again.

So bisecting is rather difficult. I had hoped the drbd people could explain
exactly what this error means.
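A rough sketch of why a bisect would take so long here. Only the two dates come from this mail; the commit count and the safety factor are hypothetical placeholders, not real numbers for the 4.19.78..4.19.80 range:

```python
import math
from datetime import date

def bisect_estimate(n_commits, days_per_step):
    """Return (steps, total_days) for a bisect over n_commits commits.

    Bisection needs about log2(n_commits) test kernels, and each one must
    run long enough to be judged good or bad.
    """
    steps = math.ceil(math.log2(n_commits)) if n_commits > 1 else 0
    return steps, steps * days_per_step

# From this mail: 4.19.81 was booted on 2019-10-31, error seen on 2019-11-04.
days_until_error = (date(2019, 11, 4) - date(2019, 10, 31)).days

# Hypothetical: wait three times that long before calling a kernel "good",
# since a single observation says little about how often the error occurs.
safety_factor = 3
days_per_step = days_until_error * safety_factor

# Hypothetical commit count for the range to be bisected.
n_commits = 200

steps, total_days = bisect_estimate(n_commits, days_per_step)
print(f"{steps} steps, about {total_days} days")
```

Even with these optimistic assumptions the bisect would run for months on a production fileserver.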

> 
> Also, does 5.3.9 work for you?

I haven't tried it yet. It is our fileserver, so it is usually the last
machine I move to a new longterm kernel, and I also avoid all stable kernels
in between (for that machine).

We use longterm kernels here and usually try to switch to the next longterm 
version as soon as available, starting with our routers. Then we switch one 
server after the other.

To avoid too many surprises we regularly run the newest stable kernels on some
of our machines. But never on our fileserver :-).

But 5.3 was not that bad, so I will try 5.3 next week and keep it as long as 
it does not show other problems.

Regards,
-- 
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts
