* another scrub bug? blocked for > 10240.948831 secs
@ 2017-04-15 21:34 Peter Maloney
  2017-04-17 13:18 ` Sage Weil
  0 siblings, 1 reply; 10+ messages in thread
From: Peter Maloney @ 2017-04-15 21:34 UTC (permalink / raw)
  To: ceph-devel

Is this another scrub bug? Something just like this (1 or 2 requests
blocked forever until osd restart) happened about 5 times so far, each
time during recovery or some other thing I did myself to trigger it,
probably involving snapshots. This time I noticed that it says scrub in
the log. One other time it made a client block, but it didn't seem to this
time. I didn't have the same issue in 10.2.3, but I don't know if I
generated the same load or whatever causes it back then.

ceph version 10.2.5 (c461ee19ecbc0c5c330aca20f7392c9a00730367)

If you want me to try 10.2.6 or 7 instead, I can do that, but no
guarantee I can reproduce it any time soon.


>  42392 GB used, 24643 GB / 67035 GB avail; 15917 kB/s rd, 147 MB/s wr,
> 1483 op/s
> 2017-04-15 03:53:57.301902 osd.5 10.3.0.132:6813/1085915 1991 :
> cluster [WRN] 1 slow requests, 1 included below; oldest blocked for >
> 5.372629 secs
> 2017-04-15 03:53:57.301905 osd.5 10.3.0.132:6813/1085915 1992 :
> cluster [WRN] slow request 5.372629 seconds old, received at
> 2017-04-15 03:53:51.929240: replica scrub(pg:
> 4.25,from:0'0,to:73551'5179474,epoch:73551,start:4:a4537100:::rbd_data.4bf687238e1f29.000000000001e5dc:0,end:4:a453818a:::rbd_data.4bf687238e1f29.0000000000017d8b:db18,chunky:1,deep:0,seed:4294967295,version:6)
> currently reached_pg
> 2017-04-15 03:53:57.312641 mon.0 10.3.0.131:6789/0 158090 : cluster
> [INF] pgmap v14652123: 896 pgs: 2 active+clean+scrubbing+deep, 5
> active+clean+scrubbing, 889 active+clean; 17900 GB data, 42392 GB
> used, 24643 GB / 67035 GB avail; 22124 kB/s rd, 191 MB/s wr, 2422 op/s
> ...
> 2017-04-15 03:53:57.419047 osd.8 10.3.0.133:6814/1124407 1725 :
> cluster [WRN] 1 slow requests, 1 included below; oldest blocked for >
> 5.489743 secs
> 2017-04-15 03:53:57.419052 osd.8 10.3.0.133:6814/1124407 1726 :
> cluster [WRN] slow request 5.489743 seconds old, received at
> 2017-04-15 03:53:51.929266: replica scrub(pg:
> 4.25,from:0'0,to:73551'5179474,epoch:73551,start:4:a4537100:::rbd_data.4bf687238e1f29.000000000001e5dc:0,end:4:a453818a:::rbd_data.4bf687238e1f29.0000000000017d8b:db18,chunky:1,deep:0,seed:4294967295,version:6)
> currently reached_pg
> ...
> 2017-04-15 06:44:32.969476 mon.0 10.3.0.131:6789/0 168432 : cluster
> [INF] pgmap v14662280: 896 pgs: 5 active+clean+scrubbing, 891
> active+clean; 18011 GB data, 42703 GB used, 24332 GB / 67035 GB avail; 2512 kB/s rd, 12321 kB/s wr, 1599 op/s
> 2017-04-15 06:44:32.878155 osd.8 10.3.0.133:6814/1124407 1747 :
> cluster [WRN] 1 slow requests, 1 included below; oldest blocked for >
> 10240.948831 secs
> 2017-04-15 06:44:32.878159 osd.8 10.3.0.133:6814/1124407 1748 :
> cluster [WRN] slow request 10240.948831 seconds old, received at
> 2017-04-15 03:53:51.929266: replica scrub(pg: 4.25,from:0'0,
> to:73551'5179474,epoch:73551,start:4:a4537100:::rbd_data.4bf687238e1f29.000000000001e5dc:0,end:4:a453818a:::rbd_data.4bf687238e1f29.0000000000017d8b:db18,chunky:1,deep:0,seed:4294967295,version:6) currently reached_pg
> 2017-04-15 06:44:33.984306 mon.0 10.3.0.131:6789/0 168433 : cluster
> [INF] pgmap v14662281: 896 pgs: 5 active+clean+scrubbing, 891
> active+clean; 18011 GB data, 42703 GB used, 24332 GB / 67035 GB avail; 11675 kB/s rd, 29068 kB/s wr, 1847 op/s



* Re: another scrub bug? blocked for > 10240.948831 secs
  2017-04-15 21:34 another scrub bug? blocked for > 10240.948831 secs Peter Maloney
@ 2017-04-17 13:18 ` Sage Weil
  2017-04-20  5:58   ` Peter Maloney
  0 siblings, 1 reply; 10+ messages in thread
From: Sage Weil @ 2017-04-17 13:18 UTC (permalink / raw)
  To: Peter Maloney; +Cc: ceph-devel

On Sat, 15 Apr 2017, Peter Maloney wrote:
> Is this another scrub bug? Something just like this (1 or 2 requests
> blocked forever until osd restart) happened about 5 times so far, each
> time during recovery or some other thing I did myself to trigger it,
> probably involving snapshots. This time I noticed that it says scrub in
> the log. One other time it made a client block, but it didn't seem to this
> time. I didn't have the same issue in 10.2.3, but I don't know if I
> generated the same load or whatever causes it back then.
> 
> ceph version 10.2.5 (c461ee19ecbc0c5c330aca20f7392c9a00730367)
> 
> If you want me to try 10.2.6 or 7 instead, I can do that, but no
> guarantee I can reproduce it any time soon.

There have been lots of scrub-related patches since 10.2.5, but I don't 
see one that would explain this.  I'm guessing there is a scrub waitlist 
bug that we aren't turning up in qa because our thrashing tests are 
triggering lots of other actions in sequence (peering from up/down osds 
and balancing) and those probably have the effect of 
clearing the issue.

Next time you see it, can you capture the output of 'ceph daemon osd.NNN 
ops' so we can see what steps the request went through?  Also, any 
additional or more specific clues as to what might have triggered it would 
help.
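
For example, something along these lines should do it ('ceph pg map' will
show the acting set; osd.5 and osd.8 are just the ids from your log, and the
daemon command has to run on the host where each osd lives):

  ceph pg map 4.25                         # shows the up/acting osds for the pg
  ceph daemon osd.5 ops > osd.5.ops.json   # on the node hosting osd.5
  ceph daemon osd.8 ops > osd.8.ops.json   # on the node hosting osd.8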

Thanks!
sage



> 
> >  42392 GB used, 24643 GB / 67035 GB avail; 15917 kB/s rd, 147 MB/s wr,
> > 1483 op/s
> > 2017-04-15 03:53:57.301902 osd.5 10.3.0.132:6813/1085915 1991 :
> > cluster [WRN] 1 slow requests, 1 included below; oldest blocked for >
> > 5.372629 secs
> > 2017-04-15 03:53:57.301905 osd.5 10.3.0.132:6813/1085915 1992 :
> > cluster [WRN] slow request 5.372629 seconds old, received at
> > 2017-04-15 03:53:51.929240: replica scrub(pg:
> > 4.25,from:0'0,to:73551'5179474,epoch:73551,start:4:a4537100:::rbd_data.4bf687238e1f29.000000000001e5dc:0,end:4:a453818a:::rbd_data.4bf687238e1f29.0000000000017d8b:db18,chunky:1,deep:0,seed:4294967295,version:6)
> > currently reached_pg
> > 2017-04-15 03:53:57.312641 mon.0 10.3.0.131:6789/0 158090 : cluster
> > [INF] pgmap v14652123: 896 pgs: 2 active+clean+scrubbing+deep, 5
> > active+clean+scrubbing, 889 active+clean; 17900 GB data, 42392 GB
> > used, 24643 GB / 67035 GB avail; 22124 kB/s rd, 191 MB/s wr, 2422 op/s
> > ...
> > 2017-04-15 03:53:57.419047 osd.8 10.3.0.133:6814/1124407 1725 :
> > cluster [WRN] 1 slow requests, 1 included below; oldest blocked for >
> > 5.489743 secs
> > 2017-04-15 03:53:57.419052 osd.8 10.3.0.133:6814/1124407 1726 :
> > cluster [WRN] slow request 5.489743 seconds old, received at
> > 2017-04-15 03:53:51.929266: replica scrub(pg:
> > 4.25,from:0'0,to:73551'5179474,epoch:73551,start:4:a4537100:::rbd_data.4bf687238e1f29.000000000001e5dc:0,end:4:a453818a:::rbd_data.4bf687238e1f29.0000000000017d8b:db18,chunky:1,deep:0,seed:4294967295,version:6)
> > currently reached_pg
> > ...
> > 2017-04-15 06:44:32.969476 mon.0 10.3.0.131:6789/0 168432 : cluster
> > [INF] pgmap v14662280: 896 pgs: 5 active+clean+scrubbing, 891
> > active+clean; 18011 GB data, 42703 GB used, 24332 GB / 67035 GB avail; 2512 kB/s rd, 12321 kB/s wr, 1599 op/s
> > 2017-04-15 06:44:32.878155 osd.8 10.3.0.133:6814/1124407 1747 :
> > cluster [WRN] 1 slow requests, 1 included below; oldest blocked for >
> > 10240.948831 secs
> > 2017-04-15 06:44:32.878159 osd.8 10.3.0.133:6814/1124407 1748 :
> > cluster [WRN] slow request 10240.948831 seconds old, received at
> > 2017-04-15 03:53:51.929266: replica scrub(pg: 4.25,from:0'0,
> > to:73551'5179474,epoch:73551,start:4:a4537100:::rbd_data.4bf687238e1f29.000000000001e5dc:0,end:4:a453818a:::rbd_data.4bf687238e1f29.0000000000017d8b:db18,chunky:1,deep:0,seed:4294967295,version:6) currently reached_pg
> > 2017-04-15 06:44:33.984306 mon.0 10.3.0.131:6789/0 168433 : cluster
> > [INF] pgmap v14662281: 896 pgs: 5 active+clean+scrubbing, 891
> > active+clean; 18011 GB data, 42703 GB used, 24332 GB / 67035 GB avail; 11675 kB/s rd, 29068 kB/s wr, 1847 op/s
> 


* Re: another scrub bug? blocked for > 10240.948831 secs
  2017-04-17 13:18 ` Sage Weil
@ 2017-04-20  5:58   ` Peter Maloney
  2017-04-20 14:23     ` Sage Weil
  0 siblings, 1 reply; 10+ messages in thread
From: Peter Maloney @ 2017-04-20  5:58 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel

On 04/17/17 15:18, Sage Weil wrote:
> On Sat, 15 Apr 2017, Peter Maloney wrote:
>> Is this another scrub bug? Something just like this (1 or 2 requests
>> blocked forever until osd restart) happened about 5 times so far, each
>> time during recovery or some other thing I did myself to trigger it,
>> probably involving snapshots. This time I noticed that it says scrub in
>> the log. One other time it made a client block, but it didn't seem to this
>> time. I didn't have the same issue in 10.2.3, but I don't know if I
>> generated the same load or whatever causes it back then.
>>
>> ceph version 10.2.5 (c461ee19ecbc0c5c330aca20f7392c9a00730367)
>>
>> If you want me to try 10.2.6 or 7 instead, I can do that, but no
>> guarantee I can reproduce it any time soon.
> There have been lots of scrub-related patches since 10.2.5, but I don't 
> see one that would explain this.  I'm guessing there is a scrub waitlist 
> bug that we aren't turning up in qa because our thrashing tests are 
> triggering lots of other actions in sequence (peering from up/down osds 
> and balancing) and those probably have the effect of 
> clearing the issue.
>
> Next time you see it, can you capture the output of 'ceph daemon osd.NNN 
> ops' so we can see what steps the request went through?  Also, any 
> additional or more specific clues as to what might have triggered it would 
> help.
>
> Thanks!
> sage
>


Here we go...

> 2017-04-20 05:03:58.522624 osd.20 10.3.0.132:6824/1088346 2078 :
> cluster [WRN] slow request 20480.953923 seconds old, received at
> 2017-04-19 23:22:37.568657: replica scrub(pg:
> 4.2d,from:0'0,to:79873'5433886,epoch:79873,start:4:b46a9ab4:::rbd_data.43e4a9238e1f29.00000000000019fe:e438,end:4:b46a9ab4:::rbd_data.43e4a9238e1f29.00000000000019fe:e43a,chunky:1,deep:1,seed:4294967295,version:6)
> currently reached_pg
>
>
> root@ceph3:/var/log/ceph # ceph pg dump | grep -E "^pg_stat|^4.2d"
> dumped all in format plain
> pg_stat objects mip     degr    misp    unf     bytes   log    
> disklog state   state_stamp     v       reported        up     
> up_primary      acting  acting_primary  last_scrub      scrub_stamp 
> last_deep_scrub deep_scrub_stamp
> 4.2d    3494    0       0       0       0       13061467136    
> 3012    3012    active+clean+scrubbing+deep     2017-04-19
> 23:11:22.763008      80142'5447712   80142:3027638   [14,7,20]  
> 14       [14,7,20]       14      76449'5396453   2017-04-18
> 17:30:01.419498      73551'5298960   2017-04-15 03:23:48.821848
>
> ceph daemon osd.14 ops > longscrubbug.14.log # ceph1
> ceph daemon osd.7 ops > longscrubbug.7.log # ceph3
> ceph daemon osd.20 ops > longscrubbug.20.log # ceph2
>
>
>
> peter@peter:~/tmp/longscrubbug $ cat longscrubbug.14.log
> {
>     "ops": [],
>     "num_ops": 0
> }
>
> peter@peter:~/tmp/longscrubbug $ cat longscrubbug.7.log
> {
>     "ops": [],
>     "num_ops": 0
> }
>
> peter@peter:~/tmp/longscrubbug $ cat longscrubbug.20.log
> {
>     "ops": [
>         {
>             "description": "replica scrub(pg:
> 4.2d,from:0'0,to:79873'5433886,epoch:79873,start:4:b46a9ab4:::rbd_data.43e4a9238e1f29.00000000000019fe:e438,end:4:b46a9ab4:::rbd_data.43e4a9238e1f29.00000000000019fe:e43a,chunky:1,deep:1,seed:4294967295,version:6)",
>             "initiated_at": "2017-04-19 23:22:37.568657",
>             "age": 30563.826540,
>             "duration": 30563.826803,
>             "type_data": [
>                 "reached pg",
>                 [
>                     {
>                         "time": "2017-04-19 23:22:37.568657",
>                         "event": "initiated"
>                     },
>                     {
>                         "time": "2017-04-19 23:22:37.568769",
>                         "event": "queued_for_pg"
>                     },
>                     {
>                         "time": "2017-04-19 23:22:37.568814",
>                         "event": "reached_pg"
>                     }
>                 ]
>             ]
>         }
>     ],
>     "num_ops": 1
> }
>

to me that looks kinda non-useful... anything else you need? (I think
I'll leave this one alone for a while in case you reply soon and want
something)



* Re: another scrub bug? blocked for > 10240.948831 secs
  2017-04-20  5:58   ` Peter Maloney
@ 2017-04-20 14:23     ` Sage Weil
  2017-04-20 16:05       ` Peter Maloney
  0 siblings, 1 reply; 10+ messages in thread
From: Sage Weil @ 2017-04-20 14:23 UTC (permalink / raw)
  To: Peter Maloney; +Cc: ceph-devel

On Thu, 20 Apr 2017, Peter Maloney wrote:
> On 04/17/17 15:18, Sage Weil wrote:
> > On Sat, 15 Apr 2017, Peter Maloney wrote:
> >> Is this another scrub bug? Something just like this (1 or 2 requests
> >> blocked forever until osd restart) happened about 5 times so far, each
> >> time during recovery or some other thing I did myself to trigger it,
> >> probably involving snapshots. This time I noticed that it says scrub in
> >> the log. One other time it made a client block, but it didn't seem to this
> >> time. I didn't have the same issue in 10.2.3, but I don't know if I
> >> generated the same load or whatever causes it back then.
> >>
> >> ceph version 10.2.5 (c461ee19ecbc0c5c330aca20f7392c9a00730367)
> >>
> >> If you want me to try 10.2.6 or 7 instead, I can do that, but no
> >> guarantee I can reproduce it any time soon.
> > There have been lots of scrub-related patches since 10.2.5, but I don't 
> > see one that would explain this.  I'm guessing there is a scrub waitlist 
> > bug that we aren't turning up in qa because our thrashing tests are 
> > triggering lots of other actions in sequence (peering from up/down osds 
> > and balancing) and those probably have the effect of 
> > clearing the issue.
> >
> > Next time you see it, can you capture the output of 'ceph daemon osd.NNN 
> > ops' so we can see what steps the request went through?  Also, any 
> > additional or more specific clues as to what might have triggered it would 
> > help.
> >
> > Thanks!
> > sage
> >
> 
> 
> Here we go...
> 
> > 2017-04-20 05:03:58.522624 osd.20 10.3.0.132:6824/1088346 2078 :
> > cluster [WRN] slow request 20480.953923 seconds old, received at
> > 2017-04-19 23:22:37.568657: replica scrub(pg:
> > 4.2d,from:0'0,to:79873'5433886,epoch:79873,start:4:b46a9ab4:::rbd_data.43e4a9238e1f29.00000000000019fe:e438,end:4:b46a9ab4:::rbd_data.43e4a9238e1f29.00000000000019fe:e43a,chunky:1,deep:1,seed:4294967295,version:6)
> > currently reached_pg
> >
> >
> > root@ceph3:/var/log/ceph # ceph pg dump | grep -E "^pg_stat|^4.2d"
> > dumped all in format plain
> > pg_stat objects mip     degr    misp    unf     bytes   log    
> > disklog state   state_stamp     v       reported        up     
> > up_primary      acting  acting_primary  last_scrub      scrub_stamp 
> > last_deep_scrub deep_scrub_stamp
> > 4.2d    3494    0       0       0       0       13061467136    
> > 3012    3012    active+clean+scrubbing+deep     2017-04-19
> > 23:11:22.763008      80142'5447712   80142:3027638   [14,7,20]  
> > 14       [14,7,20]       14      76449'5396453   2017-04-18
> > 17:30:01.419498      73551'5298960   2017-04-15 03:23:48.821848
> >
> > ceph daemon osd.14 ops > longscrubbug.14.log # ceph1
> > ceph daemon osd.7 ops > longscrubbug.7.log # ceph3
> > ceph daemon osd.20 ops > longscrubbug.20.log # ceph2
> >
> >
> >
> > peter@peter:~/tmp/longscrubbug $ cat longscrubbug.14.log
> > {
> >     "ops": [],
> >     "num_ops": 0
> > }
> >
> > peter@peter:~/tmp/longscrubbug $ cat longscrubbug.7.log
> > {
> >     "ops": [],
> >     "num_ops": 0
> > }
> >
> > peter@peter:~/tmp/longscrubbug $ cat longscrubbug.20.log
> > {
> >     "ops": [
> >         {
> >             "description": "replica scrub(pg:
> > 4.2d,from:0'0,to:79873'5433886,epoch:79873,start:4:b46a9ab4:::rbd_data.43e4a9238e1f29.00000000000019fe:e438,end:4:b46a9ab4:::rbd_data.43e4a9238e1f29.00000000000019fe:e43a,chunky:1,deep:1,seed:4294967295,version:6)",
> >             "initiated_at": "2017-04-19 23:22:37.568657",
> >             "age": 30563.826540,
> >             "duration": 30563.826803,
> >             "type_data": [
> >                 "reached pg",
> >                 [
> >                     {
> >                         "time": "2017-04-19 23:22:37.568657",
> >                         "event": "initiated"
> >                     },
> >                     {
> >                         "time": "2017-04-19 23:22:37.568769",
> >                         "event": "queued_for_pg"
> >                     },
> >                     {
> >                         "time": "2017-04-19 23:22:37.568814",
> >                         "event": "reached_pg"
> >                     }
> >                 ]
> >             ]
> >         }
> >     ],
> >     "num_ops": 1
> > }
> >
> 
> to me that looks kinda non-useful... anything else you need? (I think
> I'll leave this one alone for a while in case you reply soon and want
> something)

Hmm.  Yeah, not terribly helpful.  Can you attach a 'ceph osd dump -f 
json-pretty' for epoch 79870 through 79880, as well as the latest?
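
Something like this should grab them (the output file names are just a
suggestion):

  for e in $(seq 79870 79880); do
      ceph osd dump $e -f json-pretty > osdmap.$e.json
  done
  ceph osd dump -f json-pretty > osdmap.latest.json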

My best guess is this is an interaction with snap trimming; the osdmap 
should indicate if a snap was deleted right around then.

Thanks!
sage


* Re: another scrub bug? blocked for > 10240.948831 secs
  2017-04-20 14:23     ` Sage Weil
@ 2017-04-20 16:05       ` Peter Maloney
  2017-04-20 16:19         ` Sage Weil
  0 siblings, 1 reply; 10+ messages in thread
From: Peter Maloney @ 2017-04-20 16:05 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel

On 04/20/17 16:23, Sage Weil wrote:
> On Thu, 20 Apr 2017, Peter Maloney wrote:
>> On 04/17/17 15:18, Sage Weil wrote:
>>>
>>> Next time you see it, can you capture the output of 'ceph daemon osd.NNN 
>>> ops' so we can see what steps the request went through?  Also, any 
>>> additional or more specific clues as to what might have triggered it would 
>>> help.
>>>
>>> Thanks!
>>> sage
>>>
>>
>> Here we go...
>>
>>> ...
>>>
>>> {
>>>     "ops": [
>>>         {
>>>             "description": "replica scrub(pg:
>>> 4.2d,from:0'0,to:79873'5433886,epoch:79873,start:4:b46a9ab4:::rbd_data.43e4a9238e1f29.00000000000019fe:e438,end:4:b46a9ab4:::rbd_data.43e4a9238e1f29.00000000000019fe:e43a,chunky:1,deep:1,seed:4294967295,version:6)",
>>>             "initiated_at": "2017-04-19 23:22:37.568657",
>>>             "age": 30563.826540,
>>>             "duration": 30563.826803,
>>>             "type_data": [
>>>                 "reached pg",
>>>                 [
>>>                     {
>>>                         "time": "2017-04-19 23:22:37.568657",
>>>                         "event": "initiated"
>>>                     },
>>>                     {
>>>                         "time": "2017-04-19 23:22:37.568769",
>>>                         "event": "queued_for_pg"
>>>                     },
>>>                     {
>>>                         "time": "2017-04-19 23:22:37.568814",
>>>                         "event": "reached_pg"
>>>                     }
>>>                 ]
>>>             ]
>>>         }
>>>     ],
>>>     "num_ops": 1
>>> }
>>>
>> to me that looks kinda non-useful... anything else you need? (I think
>> I'll leave this one alone for a while in case you reply soon and want
>> something)
> Hmm.  Yeah, not terribly helpful.  Can you attach a 'ceph osd dump -f 
> json-pretty' for epoch 79870 through 79880, as well as the latest?
>
> My best guess is this is an interaction with snap trimming; the osdmap 
> should indicate if a snap was deleted right around then.
>
> Thanks!
> sage

I restarted some osds, which cleared the previous block... but here is a
new one.

> 2017-04-20 17:13:20.650929 osd.9 10.3.0.132:6807/1084881 4776 :
> cluster [WRN] 1 slow requests, 1 included below; oldest blocked for >
> 2560.473108 secs
> 2017-04-20 17:13:20.650933 osd.9 10.3.0.132:6807/1084881 4777 :
> cluster [WRN] slow request 2560.473108 seconds old, received at
> 2017-04-20 16:30:40.177787: replica scrub(pg:
> 4.27,from:0'0,to:83264'5292979,epoch:83264,start:4:e45da22f:::rbd_data.4d3c7e2ae8944a.000000000000388e:0,end:4:e45da2cc:::rbd_data.46820b238e1f29.0000000000008bf4:e5c4,chunky:1,deep:0,seed:4294967295,version:6)
> currently reached_pg
>


And some osd dump stuff... which seems all the same. Which fields are
you looking for? And are you sure you want the epoch around that time,
not 2560 seconds before?

> for n in {83210..83290}; do
>     echo $n
>     ceph osd dump 83264 -f json-pretty | jq ".pools | .[] |
> select(.pool == 4)" | grep snap
> done

> 83210
>   "snap_mode": "selfmanaged",
>   "snap_seq": 58820,
>   "snap_epoch": 83260,
>   "pool_snaps": [],
>   "removed_snaps":
> "[1~e4d3,e4d5~1,e4d7~1,e4d9~1,e4db~1,e4dd~1,e4df~1,e4e1~1,e4e3~1,e4e5~1,e4e7~1,e4e9~1,e4eb~1,e4ed~1,e4ef~1,e4f1~1,e4f3~1,e4f5~1,e4f7~1,e4f9~1,e4fb~1,e4fd~1,e4ff~1,e501~1,e503~1,e505~1,e507~1,e509~1,e50b~1,e50d~1,e50f~1,e511~1,e513~1,e515~1,e517~1,e519~1,e51b~1,e51d~1,e51f~1,e521~1,e523~1,e525~1,e527~1,e529~1,e52b~1,e52d~1,e52f~1,e531~1,e533~1,e535~1,e537~1,e539~1,e53b~1,e53d~1,e53f~1,e541~1,e543~1,e545~1,e547~1,e549~1,e54b~1,e54d~1,e54f~1,e551~1,e553~1,e555~1,e557~1,e559~1,e55b~1,e55d~1,e55f~1,e561~1,e563~1,e565~1,e567~1,e569~1,e56b~1,e56d~1,e56f~1,e571~1,e573~1,e575~1,e577~1,e579~1,e57b~1,e57d~1,e57f~1,e581~1,e583~1,e585~1,e587~1,e589~1,e58b~1,e58d~1,e58f~1,e591~1,e593~1,e595~1,e597~1,e599~1,e59b~1,e59d~1,e59f~1,e5a1~1,e5a3~1,e5a5~1,e5a7~1,e5a9~1,e5ab~1,e5ad~1,e5af~1,e5b1~1,e5b3~1,e5b5~
 1,e5b7~1,e5b9~1,e5bb~1,e5bd~1,e5bf~1,e5c1~1,e5c3~1]",
...
> 83290
>   "snap_mode": "selfmanaged",
>   "snap_seq": 58820,
>   "snap_epoch": 83260,
>   "pool_snaps": [],
>   "removed_snaps":
> "[1~e4d3,e4d5~1,e4d7~1,e4d9~1,e4db~1,e4dd~1,e4df~1,e4e1~1,e4e3~1,e4e5~1,e4e7~1,e4e9~1,e4eb~1,e4ed~1,e4ef~1,e4f1~1,e4f3~1,e4f5~1,e4f7~1,e4f9~1,e4fb~1,e4fd~1,e4ff~1,e501~1,e503~1,e505~1,e507~1,e509~1,e50b~1,e50d~1,e50f~1,e511~1,e513~1,e515~1,e517~1,e519~1,e51b~1,e51d~1,e51f~1,e521~1,e523~1,e525~1,e527~1,e529~1,e52b~1,e52d~1,e52f~1,e531~1,e533~1,e535~1,e537~1,e539~1,e53b~1,e53d~1,e53f~1,e541~1,e543~1,e545~1,e547~1,e549~1,e54b~1,e54d~1,e54f~1,e551~1,e553~1,e555~1,e557~1,e559~1,e55b~1,e55d~1,e55f~1,e561~1,e563~1,e565~1,e567~1,e569~1,e56b~1,e56d~1,e56f~1,e571~1,e573~1,e575~1,e577~1,e579~1,e57b~1,e57d~1,e57f~1,e581~1,e583~1,e585~1,e587~1,e589~1,e58b~1,e58d~1,e58f~1,e591~1,e593~1,e595~1,e597~1,e599~1,e59b~1,e59d~1,e59f~1,e5a1~1,e5a3~1,e5a5~1,e5a7~1,e5a9~1,e5ab~1,e5ad~1,e5af~1,e5b1~1,e5b3~1,e5b5~
 1,e5b7~1,e5b9~1,e5bb~1,e5bd~1,e5bf~1,e5c1~1,e5c3~1]",
(always says the same thing in this huge range)



* Re: another scrub bug? blocked for > 10240.948831 secs
  2017-04-20 16:05       ` Peter Maloney
@ 2017-04-20 16:19         ` Sage Weil
  2017-04-20 18:58           ` Peter Maloney
  0 siblings, 1 reply; 10+ messages in thread
From: Sage Weil @ 2017-04-20 16:19 UTC (permalink / raw)
  To: Peter Maloney; +Cc: ceph-devel

On Thu, 20 Apr 2017, Peter Maloney wrote:
> On 04/20/17 16:23, Sage Weil wrote:
> > On Thu, 20 Apr 2017, Peter Maloney wrote:
> >> On 04/17/17 15:18, Sage Weil wrote:
> >>>
> >>> Next time you see it, can you capture the output of 'ceph daemon osd.NNN 
> >>> ops' so we can see what steps the request went through?  Also, any 
> >>> additional or more specific clues as to what might have triggered it would 
> >>> help.
> >>>
> >>> Thanks!
> >>> sage
> >>>
> >>
> >> Here we go...
> >>
> >>> ...
> >>>
> >>> {
> >>>     "ops": [
> >>>         {
> >>>             "description": "replica scrub(pg:
> >>> 4.2d,from:0'0,to:79873'5433886,epoch:79873,start:4:b46a9ab4:::rbd_data.43e4a9238e1f29.00000000000019fe:e438,end:4:b46a9ab4:::rbd_data.43e4a9238e1f29.00000000000019fe:e43a,chunky:1,deep:1,seed:4294967295,version:6)",
> >>>             "initiated_at": "2017-04-19 23:22:37.568657",
> >>>             "age": 30563.826540,
> >>>             "duration": 30563.826803,
> >>>             "type_data": [
> >>>                 "reached pg",
> >>>                 [
> >>>                     {
> >>>                         "time": "2017-04-19 23:22:37.568657",
> >>>                         "event": "initiated"
> >>>                     },
> >>>                     {
> >>>                         "time": "2017-04-19 23:22:37.568769",
> >>>                         "event": "queued_for_pg"
> >>>                     },
> >>>                     {
> >>>                         "time": "2017-04-19 23:22:37.568814",
> >>>                         "event": "reached_pg"
> >>>                     }
> >>>                 ]
> >>>             ]
> >>>         }
> >>>     ],
> >>>     "num_ops": 1
> >>> }
> >>>
> >> to me that looks kinda non-useful... anything else you need? (I think
> >> I'll leave this one alone for a while in case you reply soon and want
> >> something)
> > Hmm.  Yeah, not terribly helpful.  Can you attach a 'ceph osd dump -f 
> > json-pretty' for epoch 79870 through 79880, as well as the latest?
> >
> > My best guess is this is an interaction with snap trimming; the osdmap 
> > should indicate if a snap was deleted right around then.
> >
> > Thanks!
> > sage
> 
> I restarted some osds, which cleared the previous block... but here is a
> new one.
> 
> > 2017-04-20 17:13:20.650929 osd.9 10.3.0.132:6807/1084881 4776 :
> > cluster [WRN] 1 slow requests, 1 included below; oldest blocked for >
> > 2560.473108 secs
> > 2017-04-20 17:13:20.650933 osd.9 10.3.0.132:6807/1084881 4777 :
> > cluster [WRN] slow request 2560.473108 seconds old, received at
> > 2017-04-20 16:30:40.177787: replica scrub(pg:
> > 4.27,from:0'0,to:83264'5292979,epoch:83264,start:4:e45da22f:::rbd_data.4d3c7e2ae8944a.000000000000388e:0,end:4:e45da2cc:::rbd_data.46820b238e1f29.0000000000008bf4:e5c4,chunky:1,deep:0,seed:4294967295,version:6)
> > currently reached_pg
> >
> 
> 
> And some osd dump stuff... which seems all the same. Which fields are
> you looking for? And are you sure you want the epoch around that time,
> not 2560 seconds before?

The epoch number came from the scrub op that got blocked ("epoch:83264" 
from the op string), so it should be right around then... but I guess the 
underlying first question is whether any snap deletion happened anywhere 
around this period (2560+ sec before the warning, or around the time the 
op was sent in epoch 83264).  (And yeah, removed_snaps is the field that 
matters here!)
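
If attaching full dumps is a pain, even just spotting where that field
changes would answer it; roughly something like this (the epoch range is
just an example around 83264, and the jq filter is the one from your loop):

  for n in $(seq 83230 83290); do
      ceph osd dump $n -f json-pretty | \
          jq -r '.pools[] | select(.pool == 4) | .removed_snaps' > rs.$n
  done
  # report every epoch whose removed_snaps differs from the previous one
  for n in $(seq 83231 83290); do
      cmp -s rs.$((n-1)) rs.$n || echo "removed_snaps changed in epoch $n"
  done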

Thanks!
sage


> > for n in {83210..83290}; do
> >     echo $n
> >     ceph osd dump 83264 -f json-pretty | jq ".pools | .[] |
> > select(.pool == 4)" | grep snap
> > done
> 
> > 83210
> >   "snap_mode": "selfmanaged",
> >   "snap_seq": 58820,
> >   "snap_epoch": 83260,
> >   "pool_snaps": [],
> >   "removed_snaps":
> > "[1~e4d3,e4d5~1,e4d7~1,e4d9~1,e4db~1,e4dd~1,e4df~1,e4e1~1,e4e3~1,e4e5~1,e4e7~1,e4e9~1,e4eb~1,e4ed~1,e4ef~1,e4f1~1,e4f3~1,e4f5~1,e4f7~1,e4f9~1,e4fb~1,e4fd~1,e4ff~1,e501~1,e503~1,e505~1,e507~1,e509~1,e50b~1,e50d~1,e50f~1,e511~1,e513~1,e515~1,e517~1,e519~1,e51b~1,e51d~1,e51f~1,e521~1,e523~1,e525~1,e527~1,e529~1,e52b~1,e52d~1,e52f~1,e531~1,e533~1,e535~1,e537~1,e539~1,e53b~1,e53d~1,e53f~1,e541~1,e543~1,e545~1,e547~1,e549~1,e54b~1,e54d~1,e54f~1,e551~1,e553~1,e555~1,e557~1,e559~1,e55b~1,e55d~1,e55f~1,e561~1,e563~1,e565~1,e567~1,e569~1,e56b~1,e56d~1,e56f~1,e571~1,e573~1,e575~1,e577~1,e579~1,e57b~1,e57d~1,e57f~1,e581~1,e583~1,e585~1,e587~1,e589~1,e58b~1,e58d~1,e58f~1,e591~1,e593~1,e595~1,e597~1,e599~1,e59b~1,e59d~1,e59f~1,e5a1~1,e5a3~1,e5a5~1,e5a7~1,e5a9~1,e5ab~1,e5ad~1,e5af~1,e5b1~1,e5b3~1,e5b
 5~1,e5b7~1,e5b9~1,e5bb~1,e5bd~1,e5bf~1,e5c1~1,e5c3~1]",
> ...
> > 83290
> >   "snap_mode": "selfmanaged",
> >   "snap_seq": 58820,
> >   "snap_epoch": 83260,
> >   "pool_snaps": [],
> >   "removed_snaps":
> > "[1~e4d3,e4d5~1,e4d7~1,e4d9~1,e4db~1,e4dd~1,e4df~1,e4e1~1,e4e3~1,e4e5~1,e4e7~1,e4e9~1,e4eb~1,e4ed~1,e4ef~1,e4f1~1,e4f3~1,e4f5~1,e4f7~1,e4f9~1,e4fb~1,e4fd~1,e4ff~1,e501~1,e503~1,e505~1,e507~1,e509~1,e50b~1,e50d~1,e50f~1,e511~1,e513~1,e515~1,e517~1,e519~1,e51b~1,e51d~1,e51f~1,e521~1,e523~1,e525~1,e527~1,e529~1,e52b~1,e52d~1,e52f~1,e531~1,e533~1,e535~1,e537~1,e539~1,e53b~1,e53d~1,e53f~1,e541~1,e543~1,e545~1,e547~1,e549~1,e54b~1,e54d~1,e54f~1,e551~1,e553~1,e555~1,e557~1,e559~1,e55b~1,e55d~1,e55f~1,e561~1,e563~1,e565~1,e567~1,e569~1,e56b~1,e56d~1,e56f~1,e571~1,e573~1,e575~1,e577~1,e579~1,e57b~1,e57d~1,e57f~1,e581~1,e583~1,e585~1,e587~1,e589~1,e58b~1,e58d~1,e58f~1,e591~1,e593~1,e595~1,e597~1,e599~1,e59b~1,e59d~1,e59f~1,e5a1~1,e5a3~1,e5a5~1,e5a7~1,e5a9~1,e5ab~1,e5ad~1,e5af~1,e5b1~1,e5b3~1,e5b
 5~1,e5b7~1,e5b9~1,e5bb~1,e5bd~1,e5bf~1,e5c1~1,e5c3~1]",
> (always says the same thing in this huge range)
> 
> 


* Re: another scrub bug? blocked for > 10240.948831 secs
  2017-04-20 16:19         ` Sage Weil
@ 2017-04-20 18:58           ` Peter Maloney
  2017-04-28  7:24             ` Peter Maloney
  0 siblings, 1 reply; 10+ messages in thread
From: Peter Maloney @ 2017-04-20 18:58 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel

On 04/20/17 18:19, Sage Weil wrote:
> On Thu, 20 Apr 2017, Peter Maloney wrote:
>> On 04/20/17 16:23, Sage Weil wrote:
>>> On Thu, 20 Apr 2017, Peter Maloney wrote:
>>>> On 04/17/17 15:18, Sage Weil wrote:
>>>>> Next time you see it, can you capture the output of 'ceph daemon osd.NNN 
>>>>> ops' so we can see what steps the request went through?  Also, any 
>>>>> additional or more specific clues as to what might have triggered it would 
>>>>> help.
>>>>>
>>>>> Thanks!
>>>>> sage
>>>>>
>>>> Here we go...
>>>>
>>>>> ...
>>>>>
>>>>> {
>>>>>     "ops": [
>>>>>         {
>>>>>             "description": "replica scrub(pg:
>>>>> 4.2d,from:0'0,to:79873'5433886,epoch:79873,start:4:b46a9ab4:::rbd_data.43e4a9238e1f29.00000000000019fe:e438,end:4:b46a9ab4:::rbd_data.43e4a9238e1f29.00000000000019fe:e43a,chunky:1,deep:1,seed:4294967295,version:6)",
>>>>>             "initiated_at": "2017-04-19 23:22:37.568657",
>>>>>             "age": 30563.826540,
>>>>>             "duration": 30563.826803,
>>>>>             "type_data": [
>>>>>                 "reached pg",
>>>>>                 [
>>>>>                     {
>>>>>                         "time": "2017-04-19 23:22:37.568657",
>>>>>                         "event": "initiated"
>>>>>                     },
>>>>>                     {
>>>>>                         "time": "2017-04-19 23:22:37.568769",
>>>>>                         "event": "queued_for_pg"
>>>>>                     },
>>>>>                     {
>>>>>                         "time": "2017-04-19 23:22:37.568814",
>>>>>                         "event": "reached_pg"
>>>>>                     }
>>>>>                 ]
>>>>>             ]
>>>>>         }
>>>>>     ],
>>>>>     "num_ops": 1
>>>>> }
>>>>>
>>>> to me that looks kinda non-useful... anything else you need? (I think
>>>> I'll leave this one alone for a while in case you reply soon and want
>>>> something)
>>> Hmm.  Yeah, not terribly helpful.  Can you attach a 'ceph osd dump -f 
>>> json-pretty' for epoch 79870 through 79880, as well as the latest?
>>>
>>> My best guess is this is an interaction with snap trimming; the osdmap 
>>> should indicate if a snap was deleted right around then.
>>>
>>> Thanks!
>>> sage
>> I restarted some osds, which cleared the previous block... but here is a
>> new one.
>>
>>> 2017-04-20 17:13:20.650929 osd.9 10.3.0.132:6807/1084881 4776 :
>>> cluster [WRN] 1 slow requests, 1 included below; oldest blocked for >
>>> 2560.473108 secs
>>> 2017-04-20 17:13:20.650933 osd.9 10.3.0.132:6807/1084881 4777 :
>>> cluster [WRN] slow request 2560.473108 seconds old, received at
>>> 2017-04-20 16:30:40.177787: replica scrub(pg:
>>> 4.27,from:0'0,to:83264'5292979,epoch:83264,start:4:e45da22f:::rbd_data.4d3c7e2ae8944a.000000000000388e:0,end:4:e45da2cc:::rbd_data.46820b238e1f29.0000000000008bf4:e5c4,chunky:1,deep:0,seed:4294967295,version:6)
>>> currently reached_pg
>>>
>>
>> And some osd dump stuff... which seems all the same. Which fields are
>> you looking for? And are you sure you want the epoch around that time,
>> not 2560 seconds before?
> The epoch number came from the scrub op that got blocked ("epoch:83264" 
> from the op string), so it should be right around then...
I understand that, so I used a different range in this one since it's a
new blocked request. (I made it automatic so a large range is easy, and
made it huge since it wasn't any different when I looked manually.)

And I see that the log always has the same epoch for the request... not
the current epoch. So I guess it's not 2560 seconds before that. And I
found the problem... my script didn't use $n in the actual dump, so it
repeated the same dump ;)

So here's a proper dump. I hope you find the problem.

> oldx=
> for n in {83244..83290}; do
>     x=$(ceph osd dump "$n" -f json-pretty | jq ".pools | .[] |
> select(.pool == 4)" | grep snap)
>     if [ "$x" != "$oldx" ]; then
>         echo $n
>         echo "$x"
>     fi
>     oldx="$x"
> done

> 83244
>   "snap_mode": "selfmanaged",
>   "snap_seq": 58814,
>   "snap_epoch": 83244,
>   "pool_snaps": [],
>   "removed_snaps":
> "[1~e4cd,e4cf~1,e4d1~1,e4d3~1,e4d5~1,e4d7~1,e4d9~1,e4db~1,e4dd~1,e4df~1,e4e1~1,e4e3~1,e4e5~1,e4e7~1,e4e9~1,e4eb~1,e4ed~1,e4ef~1,e4f1~1,e4f3~1,e4f5~1,e4f7~1,e4f9~1,e4fb~1,e4fd~1,e4ff~1,e501~1,e503~1,e505~1,e507~1,e509~1,e50b~1,e50d~1,e50f~1,e511~1,e513~1,e515~1,e517~1,e519~1,e51b~1,e51d~1,e51f~1,e521~1,e523~1,e525~1,e527~1,e529~1,e52b~1,e52d~1,e52f~1,e531~1,e533~1,e535~1,e537~1,e539~1,e53b~1,e53d~1,e53f~1,e541~1,e543~1,e545~1,e547~1,e549~1,e54b~1,e54d~1,e54f~1,e551~1,e553~1,e555~1,e557~1,e559~1,e55b~1,e55d~1,e55f~1,e561~1,e563~1,e565~1,e567~1,e569~1,e56b~1,e56d~1,e56f~1,e571~1,e573~1,e575~1,e577~1,e579~1,e57b~1,e57d~1,e57f~1,e581~1,e583~1,e585~1,e587~1,e589~1,e58b~1,e58d~1,e58f~1,e591~1,e593~1,e595~1,e597~1,e599~1,e59b~1,e59d~1,e59f~1,e5a1~1,e5a3~1,e5a5~1,e5a7~1,e5a9~1,e5ab~1,e5ad~1,e5af~
 1,e5b1~1,e5b3~1,e5b5~1,e5b7~1,e5b9~1,e5bb~1,e5bd~1]",
> 83253
>   "snap_mode": "selfmanaged",
>   "snap_seq": 58815,
>   "snap_epoch": 83253,
>   "pool_snaps": [],
>   "removed_snaps":
> "[1~e4cf,e4d1~1,e4d3~1,e4d5~1,e4d7~1,e4d9~1,e4db~1,e4dd~1,e4df~1,e4e1~1,e4e3~1,e4e5~1,e4e7~1,e4e9~1,e4eb~1,e4ed~1,e4ef~1,e4f1~1,e4f3~1,e4f5~1,e4f7~1,e4f9~1,e4fb~1,e4fd~1,e4ff~1,e501~1,e503~1,e505~1,e507~1,e509~1,e50b~1,e50d~1,e50f~1,e511~1,e513~1,e515~1,e517~1,e519~1,e51b~1,e51d~1,e51f~1,e521~1,e523~1,e525~1,e527~1,e529~1,e52b~1,e52d~1,e52f~1,e531~1,e533~1,e535~1,e537~1,e539~1,e53b~1,e53d~1,e53f~1,e541~1,e543~1,e545~1,e547~1,e549~1,e54b~1,e54d~1,e54f~1,e551~1,e553~1,e555~1,e557~1,e559~1,e55b~1,e55d~1,e55f~1,e561~1,e563~1,e565~1,e567~1,e569~1,e56b~1,e56d~1,e56f~1,e571~1,e573~1,e575~1,e577~1,e579~1,e57b~1,e57d~1,e57f~1,e581~1,e583~1,e585~1,e587~1,e589~1,e58b~1,e58d~1,e58f~1,e591~1,e593~1,e595~1,e597~1,e599~1,e59b~1,e59d~1,e59f~1,e5a1~1,e5a3~1,e5a5~1,e5a7~1,e5a9~1,e5ab~1,e5ad~1,e5af~1,e5b1~
 1,e5b3~1,e5b5~1,e5b7~1,e5b9~1,e5bb~1,e5bd~1,e5bf~1]",
> 83254
>   "snap_mode": "selfmanaged",
>   "snap_seq": 58816,
>   "snap_epoch": 83254,
>   "pool_snaps": [],
>   "removed_snaps":
> "[1~e4cf,e4d1~1,e4d3~1,e4d5~1,e4d7~1,e4d9~1,e4db~1,e4dd~1,e4df~1,e4e1~1,e4e3~1,e4e5~1,e4e7~1,e4e9~1,e4eb~1,e4ed~1,e4ef~1,e4f1~1,e4f3~1,e4f5~1,e4f7~1,e4f9~1,e4fb~1,e4fd~1,e4ff~1,e501~1,e503~1,e505~1,e507~1,e509~1,e50b~1,e50d~1,e50f~1,e511~1,e513~1,e515~1,e517~1,e519~1,e51b~1,e51d~1,e51f~1,e521~1,e523~1,e525~1,e527~1,e529~1,e52b~1,e52d~1,e52f~1,e531~1,e533~1,e535~1,e537~1,e539~1,e53b~1,e53d~1,e53f~1,e541~1,e543~1,e545~1,e547~1,e549~1,e54b~1,e54d~1,e54f~1,e551~1,e553~1,e555~1,e557~1,e559~1,e55b~1,e55d~1,e55f~1,e561~1,e563~1,e565~1,e567~1,e569~1,e56b~1,e56d~1,e56f~1,e571~1,e573~1,e575~1,e577~1,e579~1,e57b~1,e57d~1,e57f~1,e581~1,e583~1,e585~1,e587~1,e589~1,e58b~1,e58d~1,e58f~1,e591~1,e593~1,e595~1,e597~1,e599~1,e59b~1,e59d~1,e59f~1,e5a1~1,e5a3~1,e5a5~1,e5a7~1,e5a9~1,e5ab~1,e5ad~1,e5af~1,e5b1~
 1,e5b3~1,e5b5~1,e5b7~1,e5b9~1,e5bb~1,e5bd~1,e5bf~1]",
> 83257
>   "snap_mode": "selfmanaged",
>   "snap_seq": 58817,
>   "snap_epoch": 83257,
>   "pool_snaps": [],
>   "removed_snaps":
> "[1~e4d1,e4d3~1,e4d5~1,e4d7~1,e4d9~1,e4db~1,e4dd~1,e4df~1,e4e1~1,e4e3~1,e4e5~1,e4e7~1,e4e9~1,e4eb~1,e4ed~1,e4ef~1,e4f1~1,e4f3~1,e4f5~1,e4f7~1,e4f9~1,e4fb~1,e4fd~1,e4ff~1,e501~1,e503~1,e505~1,e507~1,e509~1,e50b~1,e50d~1,e50f~1,e511~1,e513~1,e515~1,e517~1,e519~1,e51b~1,e51d~1,e51f~1,e521~1,e523~1,e525~1,e527~1,e529~1,e52b~1,e52d~1,e52f~1,e531~1,e533~1,e535~1,e537~1,e539~1,e53b~1,e53d~1,e53f~1,e541~1,e543~1,e545~1,e547~1,e549~1,e54b~1,e54d~1,e54f~1,e551~1,e553~1,e555~1,e557~1,e559~1,e55b~1,e55d~1,e55f~1,e561~1,e563~1,e565~1,e567~1,e569~1,e56b~1,e56d~1,e56f~1,e571~1,e573~1,e575~1,e577~1,e579~1,e57b~1,e57d~1,e57f~1,e581~1,e583~1,e585~1,e587~1,e589~1,e58b~1,e58d~1,e58f~1,e591~1,e593~1,e595~1,e597~1,e599~1,e59b~1,e59d~1,e59f~1,e5a1~1,e5a3~1,e5a5~1,e5a7~1,e5a9~1,e5ab~1,e5ad~1,e5af~1,e5b1~1,e5b3~
 1,e5b5~1,e5b7~1,e5b9~1,e5bb~1,e5bd~1,e5bf~1,e5c1~1]",
> 83258
>   "snap_mode": "selfmanaged",
>   "snap_seq": 58818,
>   "snap_epoch": 83258,
>   "pool_snaps": [],
>   "removed_snaps":
> "[1~e4d1,e4d3~1,e4d5~1,e4d7~1,e4d9~1,e4db~1,e4dd~1,e4df~1,e4e1~1,e4e3~1,e4e5~1,e4e7~1,e4e9~1,e4eb~1,e4ed~1,e4ef~1,e4f1~1,e4f3~1,e4f5~1,e4f7~1,e4f9~1,e4fb~1,e4fd~1,e4ff~1,e501~1,e503~1,e505~1,e507~1,e509~1,e50b~1,e50d~1,e50f~1,e511~1,e513~1,e515~1,e517~1,e519~1,e51b~1,e51d~1,e51f~1,e521~1,e523~1,e525~1,e527~1,e529~1,e52b~1,e52d~1,e52f~1,e531~1,e533~1,e535~1,e537~1,e539~1,e53b~1,e53d~1,e53f~1,e541~1,e543~1,e545~1,e547~1,e549~1,e54b~1,e54d~1,e54f~1,e551~1,e553~1,e555~1,e557~1,e559~1,e55b~1,e55d~1,e55f~1,e561~1,e563~1,e565~1,e567~1,e569~1,e56b~1,e56d~1,e56f~1,e571~1,e573~1,e575~1,e577~1,e579~1,e57b~1,e57d~1,e57f~1,e581~1,e583~1,e585~1,e587~1,e589~1,e58b~1,e58d~1,e58f~1,e591~1,e593~1,e595~1,e597~1,e599~1,e59b~1,e59d~1,e59f~1,e5a1~1,e5a3~1,e5a5~1,e5a7~1,e5a9~1,e5ab~1,e5ad~1,e5af~1,e5b1~1,e5b3~
 1,e5b5~1,e5b7~1,e5b9~1,e5bb~1,e5bd~1,e5bf~1,e5c1~1]",
> 83259
>   "snap_mode": "selfmanaged",
>   "snap_seq": 58819,
>   "snap_epoch": 83259,
>   "pool_snaps": [],
>   "removed_snaps":
> "[1~e4d3,e4d5~1,e4d7~1,e4d9~1,e4db~1,e4dd~1,e4df~1,e4e1~1,e4e3~1,e4e5~1,e4e7~1,e4e9~1,e4eb~1,e4ed~1,e4ef~1,e4f1~1,e4f3~1,e4f5~1,e4f7~1,e4f9~1,e4fb~1,e4fd~1,e4ff~1,e501~1,e503~1,e505~1,e507~1,e509~1,e50b~1,e50d~1,e50f~1,e511~1,e513~1,e515~1,e517~1,e519~1,e51b~1,e51d~1,e51f~1,e521~1,e523~1,e525~1,e527~1,e529~1,e52b~1,e52d~1,e52f~1,e531~1,e533~1,e535~1,e537~1,e539~1,e53b~1,e53d~1,e53f~1,e541~1,e543~1,e545~1,e547~1,e549~1,e54b~1,e54d~1,e54f~1,e551~1,e553~1,e555~1,e557~1,e559~1,e55b~1,e55d~1,e55f~1,e561~1,e563~1,e565~1,e567~1,e569~1,e56b~1,e56d~1,e56f~1,e571~1,e573~1,e575~1,e577~1,e579~1,e57b~1,e57d~1,e57f~1,e581~1,e583~1,e585~1,e587~1,e589~1,e58b~1,e58d~1,e58f~1,e591~1,e593~1,e595~1,e597~1,e599~1,e59b~1,e59d~1,e59f~1,e5a1~1,e5a3~1,e5a5~1,e5a7~1,e5a9~1,e5ab~1,e5ad~1,e5af~1,e5b1~1,e5b3~1,e5b5~
 1,e5b7~1,e5b9~1,e5bb~1,e5bd~1,e5bf~1,e5c1~1,e5c3~1]",
> 83260
>   "snap_mode": "selfmanaged",
>   "snap_seq": 58820,
>   "snap_epoch": 83260,
>   "pool_snaps": [],
>   "removed_snaps":
> "[1~e4d3,e4d5~1,e4d7~1,e4d9~1,e4db~1,e4dd~1,e4df~1,e4e1~1,e4e3~1,e4e5~1,e4e7~1,e4e9~1,e4eb~1,e4ed~1,e4ef~1,e4f1~1,e4f3~1,e4f5~1,e4f7~1,e4f9~1,e4fb~1,e4fd~1,e4ff~1,e501~1,e503~1,e505~1,e507~1,e509~1,e50b~1,e50d~1,e50f~1,e511~1,e513~1,e515~1,e517~1,e519~1,e51b~1,e51d~1,e51f~1,e521~1,e523~1,e525~1,e527~1,e529~1,e52b~1,e52d~1,e52f~1,e531~1,e533~1,e535~1,e537~1,e539~1,e53b~1,e53d~1,e53f~1,e541~1,e543~1,e545~1,e547~1,e549~1,e54b~1,e54d~1,e54f~1,e551~1,e553~1,e555~1,e557~1,e559~1,e55b~1,e55d~1,e55f~1,e561~1,e563~1,e565~1,e567~1,e569~1,e56b~1,e56d~1,e56f~1,e571~1,e573~1,e575~1,e577~1,e579~1,e57b~1,e57d~1,e57f~1,e581~1,e583~1,e585~1,e587~1,e589~1,e58b~1,e58d~1,e58f~1,e591~1,e593~1,e595~1,e597~1,e599~1,e59b~1,e59d~1,e59f~1,e5a1~1,e5a3~1,e5a5~1,e5a7~1,e5a9~1,e5ab~1,e5ad~1,e5af~1,e5b1~1,e5b3~1,e5b5~
 1,e5b7~1,e5b9~1,e5bb~1,e5bd~1,e5bf~1,e5c1~1,e5c3~1]",
> 83269
>   "snap_mode": "selfmanaged",
>   "snap_seq": 58821,
>   "snap_epoch": 83269,
>   "pool_snaps": [],
>   "removed_snaps":
> "[1~e4d3,e4d5~1,e4d7~1,e4d9~1,e4db~1,e4dd~1,e4df~1,e4e1~1,e4e3~1,e4e5~1,e4e7~1,e4e9~1,e4eb~1,e4ed~1,e4ef~1,e4f1~1,e4f3~1,e4f5~1,e4f7~1,e4f9~1,e4fb~1,e4fd~1,e4ff~1,e501~1,e503~1,e505~1,e507~1,e509~1,e50b~1,e50d~1,e50f~1,e511~1,e513~1,e515~1,e517~1,e519~1,e51b~1,e51d~1,e51f~1,e521~1,e523~1,e525~1,e527~1,e529~1,e52b~1,e52d~1,e52f~1,e531~1,e533~1,e535~1,e537~1,e539~1,e53b~1,e53d~1,e53f~1,e541~1,e543~1,e545~1,e547~1,e549~1,e54b~1,e54d~1,e54f~1,e551~1,e553~1,e555~1,e557~1,e559~1,e55b~1,e55d~1,e55f~1,e561~1,e563~1,e565~1,e567~1,e569~1,e56b~1,e56d~1,e56f~1,e571~1,e573~1,e575~1,e577~1,e579~1,e57b~1,e57d~1,e57f~1,e581~1,e583~1,e585~1,e587~1,e589~1,e58b~1,e58d~1,e58f~1,e591~1,e593~1,e595~1,e597~1,e599~1,e59b~1,e59d~1,e59f~1,e5a1~1,e5a3~1,e5a5~1,e5a7~1,e5a9~1,e5ab~1,e5ad~1,e5af~1,e5b1~1,e5b3~1,e5b5~
 1,e5b7~1,e5b9~1,e5bb~1,e5bd~1,e5bf~1,e5c1~1,e5c3~1]",
> 83270
>   "snap_mode": "selfmanaged",
>   "snap_seq": 58822,
>   "snap_epoch": 83270,
>   "pool_snaps": [],
>   "removed_snaps":
> "[1~e4d3,e4d5~1,e4d7~1,e4d9~1,e4db~1,e4dd~1,e4df~1,e4e1~1,e4e3~1,e4e5~1,e4e7~1,e4e9~1,e4eb~1,e4ed~1,e4ef~1,e4f1~1,e4f3~1,e4f5~1,e4f7~1,e4f9~1,e4fb~1,e4fd~1,e4ff~1,e501~1,e503~1,e505~1,e507~1,e509~1,e50b~1,e50d~1,e50f~1,e511~1,e513~1,e515~1,e517~1,e519~1,e51b~1,e51d~1,e51f~1,e521~1,e523~1,e525~1,e527~1,e529~1,e52b~1,e52d~1,e52f~1,e531~1,e533~1,e535~1,e537~1,e539~1,e53b~1,e53d~1,e53f~1,e541~1,e543~1,e545~1,e547~1,e549~1,e54b~1,e54d~1,e54f~1,e551~1,e553~1,e555~1,e557~1,e559~1,e55b~1,e55d~1,e55f~1,e561~1,e563~1,e565~1,e567~1,e569~1,e56b~1,e56d~1,e56f~1,e571~1,e573~1,e575~1,e577~1,e579~1,e57b~1,e57d~1,e57f~1,e581~1,e583~1,e585~1,e587~1,e589~1,e58b~1,e58d~1,e58f~1,e591~1,e593~1,e595~1,e597~1,e599~1,e59b~1,e59d~1,e59f~1,e5a1~1,e5a3~1,e5a5~1,e5a7~1,e5a9~1,e5ab~1,e5ad~1,e5af~1,e5b1~1,e5b3~1,e5b5~
 1,e5b7~1,e5b9~1,e5bb~1,e5bd~1,e5bf~1,e5c1~1,e5c3~1]",
> 83271
>   "snap_mode": "selfmanaged",
>   "snap_seq": 58823,
>   "snap_epoch": 83271,
>   "pool_snaps": [],
>   "removed_snaps":
> "[1~e4d3,e4d5~1,e4d7~1,e4d9~1,e4db~1,e4dd~1,e4df~1,e4e1~1,e4e3~1,e4e5~1,e4e7~1,e4e9~1,e4eb~1,e4ed~1,e4ef~1,e4f1~1,e4f3~1,e4f5~1,e4f7~1,e4f9~1,e4fb~1,e4fd~1,e4ff~1,e501~1,e503~1,e505~1,e507~1,e509~1,e50b~1,e50d~1,e50f~1,e511~1,e513~1,e515~1,e517~1,e519~1,e51b~1,e51d~1,e51f~1,e521~1,e523~1,e525~1,e527~1,e529~1,e52b~1,e52d~1,e52f~1,e531~1,e533~1,e535~1,e537~1,e539~1,e53b~1,e53d~1,e53f~1,e541~1,e543~1,e545~1,e547~1,e549~1,e54b~1,e54d~1,e54f~1,e551~1,e553~1,e555~1,e557~1,e559~1,e55b~1,e55d~1,e55f~1,e561~1,e563~1,e565~1,e567~1,e569~1,e56b~1,e56d~1,e56f~1,e571~1,e573~1,e575~1,e577~1,e579~1,e57b~1,e57d~1,e57f~1,e581~1,e583~1,e585~1,e587~1,e589~1,e58b~1,e58d~1,e58f~1,e591~1,e593~1,e595~1,e597~1,e599~1,e59b~1,e59d~1,e59f~1,e5a1~1,e5a3~1,e5a5~1,e5a7~1,e5a9~1,e5ab~1,e5ad~1,e5af~1,e5b1~1,e5b3~1,e5b5~
 1,e5b7~1,e5b9~1,e5bb~1,e5bd~1,e5bf~1,e5c1~1,e5c3~1]",
> 83276
>   "snap_mode": "selfmanaged",
>   "snap_seq": 58824,
>   "snap_epoch": 83276,
>   "pool_snaps": [],
>   "removed_snaps":
> "[1~e4d3,e4d5~1,e4d7~1,e4d9~1,e4db~1,e4dd~1,e4df~1,e4e1~1,e4e3~1,e4e5~1,e4e7~1,e4e9~1,e4eb~1,e4ed~1,e4ef~1,e4f1~1,e4f3~1,e4f5~1,e4f7~1,e4f9~1,e4fb~1,e4fd~1,e4ff~1,e501~1,e503~1,e505~1,e507~1,e509~1,e50b~1,e50d~1,e50f~1,e511~1,e513~1,e515~1,e517~1,e519~1,e51b~1,e51d~1,e51f~1,e521~1,e523~1,e525~1,e527~1,e529~1,e52b~1,e52d~1,e52f~1,e531~1,e533~1,e535~1,e537~1,e539~1,e53b~1,e53d~1,e53f~1,e541~1,e543~1,e545~1,e547~1,e549~1,e54b~1,e54d~1,e54f~1,e551~1,e553~1,e555~1,e557~1,e559~1,e55b~1,e55d~1,e55f~1,e561~1,e563~1,e565~1,e567~1,e569~1,e56b~1,e56d~1,e56f~1,e571~1,e573~1,e575~1,e577~1,e579~1,e57b~1,e57d~1,e57f~1,e581~1,e583~1,e585~1,e587~1,e589~1,e58b~1,e58d~1,e58f~1,e591~1,e593~1,e595~1,e597~1,e599~1,e59b~1,e59d~1,e59f~1,e5a1~1,e5a3~1,e5a5~1,e5a7~1,e5a9~1,e5ab~1,e5ad~1,e5af~1,e5b1~1,e5b3~1,e5b5~
 1,e5b7~1,e5b9~1,e5bb~1,e5bd~1,e5bf~1,e5c1~1,e5c3~1]",
> 83277
>   "snap_mode": "selfmanaged",
>   "snap_seq": 58825,
>   "snap_epoch": 83277,
>   "pool_snaps": [],
>   "removed_snaps":
> "[1~e4d3,e4d5~1,e4d7~1,e4d9~1,e4db~1,e4dd~1,e4df~1,e4e1~1,e4e3~1,e4e5~1,e4e7~1,e4e9~1,e4eb~1,e4ed~1,e4ef~1,e4f1~1,e4f3~1,e4f5~1,e4f7~1,e4f9~1,e4fb~1,e4fd~1,e4ff~1,e501~1,e503~1,e505~1,e507~1,e509~1,e50b~1,e50d~1,e50f~1,e511~1,e513~1,e515~1,e517~1,e519~1,e51b~1,e51d~1,e51f~1,e521~1,e523~1,e525~1,e527~1,e529~1,e52b~1,e52d~1,e52f~1,e531~1,e533~1,e535~1,e537~1,e539~1,e53b~1,e53d~1,e53f~1,e541~1,e543~1,e545~1,e547~1,e549~1,e54b~1,e54d~1,e54f~1,e551~1,e553~1,e555~1,e557~1,e559~1,e55b~1,e55d~1,e55f~1,e561~3,e565~1,e567~1,e569~1,e56b~1,e56d~1,e56f~1,e571~1,e573~1,e575~1,e577~1,e579~1,e57b~1,e57d~1,e57f~1,e581~1,e583~1,e585~1,e587~1,e589~1,e58b~1,e58d~1,e58f~1,e591~1,e593~1,e595~1,e597~1,e599~1,e59b~1,e59d~1,e59f~1,e5a1~1,e5a3~1,e5a5~1,e5a7~1,e5a9~1,e5ab~1,e5ad~1,e5af~1,e5b1~1,e5b3~1,e5b5~1,e5b7~
 1,e5b9~1,e5bb~1,e5bd~1,e5bf~1,e5c1~1,e5c3~1,e5c9~1]",
> 83278
>   "snap_mode": "selfmanaged",
>   "snap_seq": 58826,
>   "snap_epoch": 83278,
>   "pool_snaps": [],
>   "removed_snaps":
> "[1~e4d3,e4d5~1,e4d7~1,e4d9~1,e4db~1,e4dd~1,e4df~1,e4e1~1,e4e3~1,e4e5~1,e4e7~1,e4e9~1,e4eb~1,e4ed~1,e4ef~1,e4f1~1,e4f3~1,e4f5~1,e4f7~1,e4f9~1,e4fb~1,e4fd~1,e4ff~1,e501~1,e503~1,e505~1,e507~1,e509~1,e50b~1,e50d~1,e50f~1,e511~1,e513~1,e515~1,e517~1,e519~1,e51b~1,e51d~1,e51f~1,e521~1,e523~1,e525~1,e527~1,e529~1,e52b~1,e52d~1,e52f~1,e531~1,e533~1,e535~1,e537~1,e539~1,e53b~1,e53d~1,e53f~1,e541~1,e543~1,e545~1,e547~1,e549~1,e54b~1,e54d~1,e54f~1,e551~1,e553~1,e555~1,e557~1,e559~1,e55b~1,e55d~1,e55f~1,e561~3,e565~1,e567~1,e569~1,e56b~1,e56d~1,e56f~1,e571~1,e573~1,e575~1,e577~1,e579~1,e57b~1,e57d~1,e57f~1,e581~1,e583~1,e585~1,e587~1,e589~1,e58b~1,e58d~1,e58f~1,e591~1,e593~1,e595~1,e597~1,e599~1,e59b~1,e59d~1,e59f~1,e5a1~1,e5a3~1,e5a5~1,e5a7~1,e5a9~1,e5ab~1,e5ad~1,e5af~1,e5b1~1,e5b3~1,e5b5~1,e5b7~
 1,e5b9~1,e5bb~1,e5bd~1,e5bf~1,e5c1~1,e5c3~1,e5c5~1,e5c9~2]",
> 83279
>   "snap_mode": "selfmanaged",
>   "snap_seq": 58827,
>   "snap_epoch": 83279,
>   "pool_snaps": [],
>   "removed_snaps":
> "[1~e4d3,e4d5~1,e4d7~1,e4d9~1,e4db~1,e4dd~1,e4df~1,e4e1~1,e4e3~1,e4e5~1,e4e7~1,e4e9~1,e4eb~1,e4ed~1,e4ef~1,e4f1~1,e4f3~1,e4f5~1,e4f7~1,e4f9~1,e4fb~1,e4fd~1,e4ff~1,e501~1,e503~1,e505~1,e507~1,e509~1,e50b~1,e50d~1,e50f~1,e511~1,e513~1,e515~1,e517~1,e519~1,e51b~1,e51d~1,e51f~1,e521~1,e523~1,e525~1,e527~1,e529~1,e52b~1,e52d~1,e52f~1,e531~1,e533~1,e535~1,e537~1,e539~1,e53b~1,e53d~1,e53f~1,e541~1,e543~1,e545~1,e547~1,e549~1,e54b~1,e54d~1,e54f~1,e551~1,e553~1,e555~1,e557~1,e559~1,e55b~1,e55d~1,e55f~1,e561~3,e565~1,e567~1,e569~1,e56b~1,e56d~1,e56f~1,e571~1,e573~1,e575~1,e577~1,e579~1,e57b~1,e57d~1,e57f~1,e581~1,e583~1,e585~1,e587~1,e589~1,e58b~1,e58d~1,e58f~1,e591~1,e593~1,e595~1,e597~1,e599~1,e59b~1,e59d~1,e59f~1,e5a1~1,e5a3~1,e5a5~1,e5a7~1,e5a9~1,e5ab~1,e5ad~1,e5af~1,e5b1~1,e5b3~1,e5b5~1,e5b7~
 1,e5b9~1,e5bb~1,e5bd~1,e5bf~1,e5c1~1,e5c3~1,e5c5~2,e5c9~3]",
> 83280
>   "snap_mode": "selfmanaged",
>   "snap_seq": 58828,
>   "snap_epoch": 83280,
>   "pool_snaps": [],
>   "removed_snaps":
> "[1~e4d3,e4d5~1,e4d7~1,e4d9~1,e4db~1,e4dd~1,e4df~1,e4e1~1,e4e3~1,e4e5~1,e4e7~1,e4e9~1,e4eb~1,e4ed~1,e4ef~1,e4f1~1,e4f3~1,e4f5~1,e4f7~1,e4f9~1,e4fb~1,e4fd~1,e4ff~1,e501~1,e503~1,e505~1,e507~1,e509~1,e50b~1,e50d~1,e50f~1,e511~1,e513~1,e515~1,e517~1,e519~1,e51b~1,e51d~1,e51f~1,e521~1,e523~1,e525~1,e527~1,e529~1,e52b~1,e52d~1,e52f~1,e531~1,e533~1,e535~1,e537~1,e539~1,e53b~1,e53d~1,e53f~1,e541~1,e543~1,e545~1,e547~1,e549~1,e54b~1,e54d~1,e54f~1,e551~1,e553~1,e555~1,e557~1,e559~1,e55b~1,e55d~1,e55f~1,e561~3,e565~1,e567~1,e569~1,e56b~1,e56d~1,e56f~1,e571~1,e573~1,e575~1,e577~1,e579~1,e57b~1,e57d~1,e57f~1,e581~1,e583~1,e585~1,e587~1,e589~1,e58b~1,e58d~1,e58f~1,e591~1,e593~1,e595~1,e597~1,e599~1,e59b~1,e59d~1,e59f~1,e5a1~1,e5a3~1,e5a5~1,e5a7~1,e5a9~1,e5ab~1,e5ad~1,e5af~1,e5b1~1,e5b3~1,e5b5~1,e5b7~
 1,e5b9~1,e5bb~1,e5bd~1,e5bf~1,e5c1~1,e5c3~1,e5c5~3,e5c9~4]",
> 83281
>   "snap_mode": "selfmanaged",
>   "snap_seq": 58829,
>   "snap_epoch": 83281,
>   "pool_snaps": [],
>   "removed_snaps":
> "[1~e4d3,e4d5~1,e4d7~1,e4d9~1,e4db~1,e4dd~1,e4df~1,e4e1~1,e4e3~1,e4e5~1,e4e7~1,e4e9~1,e4eb~1,e4ed~1,e4ef~1,e4f1~1,e4f3~1,e4f5~1,e4f7~1,e4f9~1,e4fb~1,e4fd~1,e4ff~1,e501~1,e503~1,e505~1,e507~1,e509~1,e50b~1,e50d~1,e50f~1,e511~1,e513~1,e515~1,e517~1,e519~1,e51b~1,e51d~1,e51f~1,e521~1,e523~1,e525~1,e527~1,e529~1,e52b~1,e52d~1,e52f~1,e531~1,e533~1,e535~1,e537~1,e539~1,e53b~1,e53d~1,e53f~1,e541~1,e543~1,e545~1,e547~1,e549~1,e54b~1,e54d~1,e54f~1,e551~1,e553~1,e555~1,e557~1,e559~1,e55b~1,e55d~1,e55f~1,e561~3,e565~1,e567~1,e569~1,e56b~1,e56d~1,e56f~1,e571~1,e573~1,e575~1,e577~1,e579~1,e57b~1,e57d~1,e57f~1,e581~1,e583~1,e585~1,e587~1,e589~1,e58b~1,e58d~1,e58f~1,e591~1,e593~1,e595~1,e597~1,e599~1,e59b~1,e59d~1,e59f~1,e5a1~1,e5a3~1,e5a5~1,e5a7~1,e5a9~1,e5ab~1,e5ad~1,e5af~1,e5b1~1,e5b3~1,e5b5~1,e5b7~
 1,e5b9~1,e5bb~1,e5bd~1,e5bf~1,e5c1~1,e5c3~1,e5c5~9]",
> 83282
>   "snap_mode": "selfmanaged",
>   "snap_seq": 58830,
>   "snap_epoch": 83282,
>   "pool_snaps": [],
>   "removed_snaps":
> "[1~e4d3,e4d5~1,e4d7~1,e4d9~1,e4db~1,e4dd~1,e4df~1,e4e1~1,e4e3~1,e4e5~1,e4e7~1,e4e9~1,e4eb~1,e4ed~1,e4ef~1,e4f1~1,e4f3~1,e4f5~1,e4f7~1,e4f9~1,e4fb~1,e4fd~1,e4ff~1,e501~1,e503~1,e505~1,e507~1,e509~1,e50b~1,e50d~1,e50f~1,e511~1,e513~1,e515~1,e517~1,e519~1,e51b~1,e51d~1,e51f~1,e521~1,e523~1,e525~1,e527~1,e529~1,e52b~1,e52d~1,e52f~1,e531~1,e533~1,e535~1,e537~1,e539~1,e53b~1,e53d~1,e53f~1,e541~1,e543~1,e545~1,e547~1,e549~1,e54b~1,e54d~1,e54f~1,e551~1,e553~1,e555~1,e557~1,e559~1,e55b~1,e55d~1,e55f~1,e561~3,e565~1,e567~1,e569~1,e56b~1,e56d~1,e56f~1,e571~1,e573~1,e575~1,e577~1,e579~1,e57b~1,e57d~1,e57f~1,e581~1,e583~1,e585~1,e587~1,e589~1,e58b~1,e58d~1,e58f~1,e591~1,e593~1,e595~1,e597~1,e599~1,e59b~1,e59d~1,e59f~1,e5a1~1,e5a3~1,e5a5~1,e5a7~1,e5a9~1,e5ab~1,e5ad~1,e5af~1,e5b1~1,e5b3~1,e5b5~1,e5b7~
 1,e5b9~1,e5bb~1,e5bd~1,e5bf~1,e5c1~1,e5c3~1,e5c5~9]",
> 83283
>   "snap_mode": "selfmanaged",
>   "snap_seq": 58831,
>   "snap_epoch": 83283,
>   "pool_snaps": [],
>   "removed_snaps":
> "[1~e4d3,e4d5~1,e4d7~1,e4d9~1,e4db~1,e4dd~1,e4df~1,e4e1~1,e4e3~1,e4e5~1,e4e7~1,e4e9~1,e4eb~1,e4ed~1,e4ef~1,e4f1~1,e4f3~1,e4f5~1,e4f7~1,e4f9~1,e4fb~1,e4fd~1,e4ff~1,e501~1,e503~1,e505~1,e507~1,e509~1,e50b~1,e50d~1,e50f~1,e511~1,e513~1,e515~1,e517~1,e519~1,e51b~1,e51d~1,e51f~1,e521~1,e523~1,e525~1,e527~1,e529~1,e52b~1,e52d~1,e52f~1,e531~1,e533~1,e535~1,e537~1,e539~1,e53b~1,e53d~1,e53f~1,e541~1,e543~1,e545~1,e547~1,e549~1,e54b~1,e54d~1,e54f~1,e551~1,e553~1,e555~1,e557~1,e559~1,e55b~1,e55d~1,e55f~1,e561~3,e565~1,e567~1,e569~1,e56b~1,e56d~1,e56f~1,e571~1,e573~1,e575~1,e577~1,e579~1,e57b~1,e57d~1,e57f~1,e581~1,e583~1,e585~1,e587~1,e589~1,e58b~1,e58d~1,e58f~1,e591~1,e593~1,e595~1,e597~1,e599~1,e59b~1,e59d~1,e59f~1,e5a1~1,e5a3~1,e5a5~1,e5a7~1,e5a9~1,e5ab~1,e5ad~1,e5af~1,e5b1~1,e5b3~1,e5b5~1,e5b7~
 1,e5b9~1,e5bb~1,e5bd~1,e5bf~1,e5c1~1,e5c3~1,e5c5~9]",
> 83284
>   "snap_mode": "selfmanaged",
>   "snap_seq": 58832,
>   "snap_epoch": 83284,
>   "pool_snaps": [],
>   "removed_snaps":
> "[1~e4d3,e4d5~1,e4d7~1,e4d9~1,e4db~1,e4dd~1,e4df~1,e4e1~1,e4e3~1,e4e5~1,e4e7~1,e4e9~1,e4eb~1,e4ed~1,e4ef~1,e4f1~1,e4f3~1,e4f5~1,e4f7~1,e4f9~1,e4fb~1,e4fd~1,e4ff~1,e501~1,e503~1,e505~1,e507~1,e509~1,e50b~1,e50d~1,e50f~1,e511~1,e513~1,e515~1,e517~1,e519~1,e51b~1,e51d~1,e51f~1,e521~1,e523~1,e525~1,e527~1,e529~1,e52b~1,e52d~1,e52f~1,e531~1,e533~1,e535~1,e537~1,e539~1,e53b~1,e53d~1,e53f~1,e541~1,e543~1,e545~1,e547~1,e549~1,e54b~1,e54d~1,e54f~1,e551~1,e553~1,e555~1,e557~1,e559~1,e55b~1,e55d~1,e55f~1,e561~3,e565~1,e567~1,e569~1,e56b~1,e56d~1,e56f~1,e571~1,e573~1,e575~1,e577~1,e579~1,e57b~1,e57d~1,e57f~1,e581~1,e583~1,e585~1,e587~1,e589~1,e58b~1,e58d~1,e58f~1,e591~1,e593~1,e595~1,e597~1,e599~1,e59b~1,e59d~1,e59f~1,e5a1~1,e5a3~1,e5a5~1,e5a7~1,e5a9~1,e5ab~1,e5ad~1,e5af~1,e5b1~1,e5b3~1,e5b5~1,e5b7~
 1,e5b9~1,e5bb~1,e5bd~1,e5bf~1,e5c1~1,e5c3~1,e5c5~9]",
> 83287
>   "snap_mode": "selfmanaged",
>   "snap_seq": 58833,
>   "snap_epoch": 83287,
>   "pool_snaps": [],
>   "removed_snaps":
> "[1~e4d3,e4d5~1,e4d7~1,e4d9~1,e4db~1,e4dd~1,e4df~1,e4e1~1,e4e3~1,e4e5~1,e4e7~1,e4e9~1,e4eb~1,e4ed~1,e4ef~1,e4f1~1,e4f3~1,e4f5~1,e4f7~1,e4f9~1,e4fb~1,e4fd~1,e4ff~1,e501~1,e503~1,e505~1,e507~1,e509~1,e50b~1,e50d~1,e50f~1,e511~1,e513~1,e515~1,e517~1,e519~1,e51b~1,e51d~1,e51f~1,e521~1,e523~1,e525~1,e527~1,e529~1,e52b~1,e52d~1,e52f~1,e531~1,e533~1,e535~1,e537~1,e539~1,e53b~1,e53d~1,e53f~1,e541~1,e543~1,e545~1,e547~1,e549~1,e54b~1,e54d~1,e54f~1,e551~1,e553~1,e555~1,e557~1,e559~1,e55b~1,e55d~1,e55f~1,e561~3,e565~1,e567~1,e569~1,e56b~1,e56d~1,e56f~1,e571~1,e573~1,e575~1,e577~1,e579~1,e57b~1,e57d~1,e57f~1,e581~1,e583~1,e585~1,e587~1,e589~1,e58b~1,e58d~1,e58f~1,e591~1,e593~1,e595~1,e597~1,e599~1,e59b~1,e59d~1,e59f~1,e5a1~1,e5a3~1,e5a5~1,e5a7~1,e5a9~1,e5ab~1,e5ad~1,e5af~1,e5b1~1,e5b3~1,e5b5~1,e5b7~
 1,e5b9~1,e5bb~1,e5bd~1,e5bf~1,e5c1~1,e5c3~1,e5c5~9]",
> 83288
>   "snap_mode": "selfmanaged",
>   "snap_seq": 58834,
>   "snap_epoch": 83288,
>   "pool_snaps": [],
>   "removed_snaps":
> "[1~e4d3,e4d5~1,e4d7~1,e4d9~1,e4db~1,e4dd~1,e4df~1,e4e1~1,e4e3~1,e4e5~1,e4e7~1,e4e9~1,e4eb~1,e4ed~1,e4ef~1,e4f1~1,e4f3~1,e4f5~1,e4f7~1,e4f9~1,e4fb~1,e4fd~1,e4ff~1,e501~1,e503~1,e505~1,e507~1,e509~1,e50b~1,e50d~1,e50f~1,e511~1,e513~1,e515~1,e517~1,e519~1,e51b~1,e51d~1,e51f~1,e521~1,e523~1,e525~1,e527~1,e529~1,e52b~1,e52d~1,e52f~1,e531~1,e533~1,e535~1,e537~1,e539~1,e53b~1,e53d~1,e53f~1,e541~1,e543~1,e545~1,e547~1,e549~1,e54b~1,e54d~1,e54f~1,e551~1,e553~1,e555~1,e557~1,e559~1,e55b~1,e55d~1,e55f~1,e561~3,e565~1,e567~1,e569~1,e56b~1,e56d~1,e56f~1,e571~1,e573~1,e575~1,e577~1,e579~1,e57b~1,e57d~1,e57f~1,e581~1,e583~1,e585~1,e587~1,e589~1,e58b~1,e58d~1,e58f~1,e591~1,e593~1,e595~1,e597~1,e599~1,e59b~1,e59d~1,e59f~1,e5a1~1,e5a3~1,e5a5~1,e5a7~1,e5a9~1,e5ab~1,e5ad~1,e5af~1,e5b1~1,e5b3~1,e5b5~1,e5b7~
 1,e5b9~1,e5bb~1,e5bd~1,e5bf~1,e5c1~1,e5c3~1,e5c5~9]",
> 83289
>   "snap_mode": "selfmanaged",
>   "snap_seq": 58835,
>   "snap_epoch": 83289,
>   "pool_snaps": [],
>   "removed_snaps":
> "[1~e4d5,e4d7~1,e4d9~1,e4db~1,e4dd~1,e4df~1,e4e1~1,e4e3~1,e4e5~1,e4e7~1,e4e9~1,e4eb~1,e4ed~1,e4ef~1,e4f1~1,e4f3~1,e4f5~1,e4f7~1,e4f9~1,e4fb~1,e4fd~1,e4ff~1,e501~1,e503~1,e505~1,e507~1,e509~1,e50b~1,e50d~1,e50f~1,e511~1,e513~1,e515~1,e517~1,e519~1,e51b~1,e51d~1,e51f~1,e521~1,e523~1,e525~1,e527~1,e529~1,e52b~1,e52d~1,e52f~1,e531~1,e533~1,e535~1,e537~1,e539~1,e53b~1,e53d~1,e53f~1,e541~1,e543~1,e545~1,e547~1,e549~1,e54b~1,e54d~1,e54f~1,e551~1,e553~1,e555~1,e557~1,e559~1,e55b~1,e55d~1,e55f~1,e561~3,e565~1,e567~1,e569~1,e56b~1,e56d~1,e56f~1,e571~1,e573~1,e575~1,e577~1,e579~1,e57b~1,e57d~1,e57f~1,e581~1,e583~1,e585~1,e587~1,e589~1,e58b~1,e58d~1,e58f~1,e591~1,e593~1,e595~1,e597~1,e599~1,e59b~1,e59d~1,e59f~1,e5a1~1,e5a3~1,e5a5~1,e5a7~1,e5a9~1,e5ab~1,e5ad~1,e5af~1,e5b1~1,e5b3~1,e5b5~1,e5b7~1,e5b9~
 1,e5bb~1,e5bd~1,e5bf~1,e5c1~1,e5c3~1,e5c5~9,e5d3~1]",
> 83290
>   "snap_mode": "selfmanaged",
>   "snap_seq": 58836,
>   "snap_epoch": 83290,
>   "pool_snaps": [],
>   "removed_snaps":
> "[1~e4d5,e4d7~1,e4d9~1,e4db~1,e4dd~1,e4df~1,e4e1~1,e4e3~1,e4e5~1,e4e7~1,e4e9~1,e4eb~1,e4ed~1,e4ef~1,e4f1~1,e4f3~1,e4f5~1,e4f7~1,e4f9~1,e4fb~1,e4fd~1,e4ff~1,e501~1,e503~1,e505~1,e507~1,e509~1,e50b~1,e50d~1,e50f~1,e511~1,e513~1,e515~1,e517~1,e519~1,e51b~1,e51d~1,e51f~1,e521~1,e523~1,e525~1,e527~1,e529~1,e52b~1,e52d~1,e52f~1,e531~1,e533~1,e535~1,e537~1,e539~1,e53b~1,e53d~1,e53f~1,e541~1,e543~1,e545~1,e547~1,e549~1,e54b~1,e54d~1,e54f~1,e551~1,e553~1,e555~1,e557~1,e559~1,e55b~1,e55d~1,e55f~1,e561~3,e565~1,e567~1,e569~1,e56b~1,e56d~1,e56f~1,e571~1,e573~1,e575~1,e577~1,e579~1,e57b~1,e57d~1,e57f~1,e581~1,e583~1,e585~1,e587~1,e589~1,e58b~1,e58d~1,e58f~1,e591~1,e593~1,e595~1,e597~1,e599~1,e59b~1,e59d~1,e59f~1,e5a1~1,e5a3~1,e5a5~1,e5a7~1,e5a9~1,e5ab~1,e5ad~1,e5af~1,e5b1~1,e5b3~1,e5b5~1,e5b7~1,e5b9~
 1,e5bb~1,e5bd~1,e5bf~1,e5c1~1,e5c3~1,e5c5~9,e5d3~1]",


>  but I guess the 
> underlying first question is whether any snap deletion happened anywhere 
> around this period (2560+ sec before the warning, or around the time the 
> op was sent in epoch 83264).  
A snapshot-based backup job ran at 12:00 CEST and took until 18:25
CEST to finish, which overlaps that period; it creates and removes 120
snapshots spread throughout the process.
> (And yeah, removed_snaps is the field that 
> matters here!)
>
> Thanks!
> sage




* Re: another scrub bug? blocked for > 10240.948831 secs
  2017-04-20 18:58           ` Peter Maloney
@ 2017-04-28  7:24             ` Peter Maloney
  2017-05-23  9:29               ` Peter Maloney
  0 siblings, 1 reply; 10+ messages in thread
From: Peter Maloney @ 2017-04-28  7:24 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel

On 04/20/17 20:58, Peter Maloney wrote:
> On 04/20/17 18:19, Sage Weil wrote:
>>  but I guess the 
>> underlying first question is whether any snap deletion happened anywhere 
>> around this period (2560+ sec before the warning, or around the time the 
>> op was sent in epoch 83264).  
> A snapshot-based backup job ran at 12:00 CEST and took until 18:25
> CEST to finish, which overlaps that window; it creates and removes 120
> snapshots spread throughout the process.
>> (And yeah, removed_snaps is the field that 
>> matters here!)
>>
>> Thanks!
>> sage
This still happens in 10.2.7

> 2017-04-28 04:41:59.343443 osd.9 10.3.0.132:6808/2704 18 : cluster
> [WRN] slow request 10.040822 seconds old, received at 2017-04-28
> 04:41:49.302552: replica scrub(pg:
> 4.145,from:0'0,to:93267'6832180,epoch:93267,start:4:a2d2c99e:::rbd_data.4bf687238e1f29.000000000000f7a3:0,end:4:a2d2dcd6:::rbd_data.46820b238e1f29.000000000000bfbc:f25e,chunky:1,deep:0,seed:4294967295,version:6)
> currently reached_pg
> ...
> 2017-04-28 06:07:09.975902 osd.9 10.3.0.132:6808/2704 36 : cluster
> [WRN] slow request 5120.673291 seconds old, received at 2017-04-28
> 04:41:49.302552: replica scrub(pg: 4.145,from:0'0,to:93
> 267'6832180,epoch:93267,start:4:a2d2c99e:::rbd_data.4bf687238e1f29.000000000000f7a3:0,end:4:a2d2dcd6:::rbd_data.46820b238e1f29.000000000000bfbc:f25e,chunky:1,deep:0,seed:4294967295,version:
and there are snaps created and removed around that time.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: another scrub bug? blocked for > 10240.948831 secs
  2017-04-28  7:24             ` Peter Maloney
@ 2017-05-23  9:29               ` Peter Maloney
  2017-06-15 12:49                 ` Peter Maloney
  0 siblings, 1 reply; 10+ messages in thread
From: Peter Maloney @ 2017-05-23  9:29 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel

On 04/28/17 09:24, Peter Maloney wrote:
> On 04/20/17 20:58, Peter Maloney wrote:
>> On 04/20/17 18:19, Sage Weil wrote:
>>>  but I guess the 
>>> underlying first question is whether any snap deletion happened anywhere 
>>> around this period (2560+ sec before the warning, or around the time the 
>>> op was sent in epoch 83264).  
>> A snapshot-based backup job ran at 12:00 CEST and took until 18:25
>> CEST to finish, which overlaps that window; it creates and removes 120
>> snapshots spread throughout the process.
>>> (And yeah, removed_snaps is the field that 
>>> matters here!)
>>>
>>> Thanks!
>>> sage
> This still happens in 10.2.7
>
>> 2017-04-28 04:41:59.343443 osd.9 10.3.0.132:6808/2704 18 : cluster
>> [WRN] slow request 10.040822 seconds old, received at 2017-04-28
>> 04:41:49.302552: replica scrub(pg:
>> 4.145,from:0'0,to:93267'6832180,epoch:93267,start:4:a2d2c99e:::rbd_data.4bf687238e1f29.000000000000f7a3:0,end:4:a2d2dcd6:::rbd_data.46820b238e1f29.000000000000bfbc:f25e,chunky:1,deep:0,seed:4294967295,version:6)
>> currently reached_pg
>> ...
>> 2017-04-28 06:07:09.975902 osd.9 10.3.0.132:6808/2704 36 : cluster
>> [WRN] slow request 5120.673291 seconds old, received at 2017-04-28
>> 04:41:49.302552: replica scrub(pg: 4.145,from:0'0,to:93
>> 267'6832180,epoch:93267,start:4:a2d2c99e:::rbd_data.4bf687238e1f29.000000000000f7a3:0,end:4:a2d2dcd6:::rbd_data.46820b238e1f29.000000000000bfbc:f25e,chunky:1,deep:0,seed:4294967295,version:
> and there are snaps created and removed around that time.

So I changed some settings a long time ago for unrelated reasons, and
now the problem is far more rare (it has only happened once since,
though that time far more than one request was blocked).

Here are the old settings:

> osd deep scrub stride = 524288  # 512 KiB
> osd scrub chunk min = 1
> osd scrub chunk max = 1
> osd scrub sleep = 0.5
And the new:
> osd deep scrub stride = 4194304  # 4 MiB
> osd scrub chunk min = 20
> osd scrub chunk max = 25
> osd scrub sleep = 4
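
(In case anyone wants to try the same thing: these go in the [osd]
section of ceph.conf, and I believe they can also be injected at
runtime with something roughly like the line below -- syntax from
memory, and some of them may only take full effect after an osd
restart:)

ceph tell osd.* injectargs '--osd_deep_scrub_stride 4194304 --osd_scrub_chunk_min 20 --osd_scrub_chunk_max 25 --osd_scrub_sleep 4'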



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: another scrub bug? blocked for > 10240.948831 secs
  2017-05-23  9:29               ` Peter Maloney
@ 2017-06-15 12:49                 ` Peter Maloney
  0 siblings, 0 replies; 10+ messages in thread
From: Peter Maloney @ 2017-06-15 12:49 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel

On 05/23/17 11:29, Peter Maloney wrote:
> On 04/28/17 09:24, Peter Maloney wrote:
>>
>> This still happens in 10.2.7
>>
>>> 2017-04-28 04:41:59.343443 osd.9 10.3.0.132:6808/2704 18 : cluster
>>> [WRN] slow request 10.040822 seconds old, received at 2017-04-28
>>> 04:41:49.302552: replica scrub(pg:
>>> 4.145,from:0'0,to:93267'6832180,epoch:93267,start:4:a2d2c99e:::rbd_data.4bf687238e1f29.000000000000f7a3:0,end:4:a2d2dcd6:::rbd_data.46820b238e1f29.000000000000bfbc:f25e,chunky:1,deep:0,seed:4294967295,version:6)
>>> currently reached_pg
>>> ...
>>> 2017-04-28 06:07:09.975902 osd.9 10.3.0.132:6808/2704 36 : cluster
>>> [WRN] slow request 5120.673291 seconds old, received at 2017-04-28
>>> 04:41:49.302552: replica scrub(pg: 4.145,from:0'0,to:93
>>> 267'6832180,epoch:93267,start:4:a2d2c99e:::rbd_data.4bf687238e1f29.000000000000f7a3:0,end:4:a2d2dcd6:::rbd_data.46820b238e1f29.000000000000bfbc:f25e,chunky:1,deep:0,seed:4294967295,version:
>> and there are snaps created and removed around that time.
> So I changed some settings a long time ago for unrelated reasons, and
> now the problem is far more rare (it has only happened once since,
> though that time far more than one request was blocked).
>
> Here are the old settings:
>
>> osd deep scrub stride = 524288  # 512 KiB
>> osd scrub chunk min = 1
>> osd scrub chunk max = 1
>> osd scrub sleep = 0.5
> And the new:
>> osd deep scrub stride = 4194304  # 4 MiB
>> osd scrub chunk min = 20
>> osd scrub chunk max = 25
>> osd scrub sleep = 4

So, it happened again today. This time the log said "currently waiting
for scrub" instead, and a client was affected... the VM would run for a
short time, then hang, then need a kill -9, and run for a short time
again after being restarted. Restarting the osd which had the slow
request in its log (which cancels the scrub) made the VM work again
without restarting the VM itself. So now we're making a watchdog script
to restart osds that log that message, roughly along the lines of the
sketch below.
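
Something like this, just to give the idea -- the log path, the match
string, and the systemd unit name are assumptions here, and the real
script will need more care (log rotation and so on):

#!/usr/bin/env python
# Rough sketch of the watchdog idea, not the final script: scan each
# osd log for new "waiting for scrub" slow-request lines and restart
# that osd.
import glob
import re
import subprocess
import time

PATTERN = re.compile(r"slow request .* currently waiting for scrub")
CHECK_INTERVAL = 60   # seconds between scans
offsets = {}          # log path -> byte offset already examined

def scan_once():
    for path in glob.glob("/var/log/ceph/ceph-osd.*.log"):
        osd_id = path.split("ceph-osd.")[1].rsplit(".log", 1)[0]
        with open(path) as f:
            f.seek(offsets.get(path, 0))
            new_text = f.read()
            offsets[path] = f.tell()
        if PATTERN.search(new_text):
            # Restarting the osd cancels the stuck scrub and unblocks
            # the client, as described above.
            subprocess.call(["systemctl", "restart", "ceph-osd@" + osd_id])

while True:
    scan_once()
    time.sleep(CHECK_INTERVAL)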

Here are some log snippets:
> 2017-06-15 08:29:40.953017 7f31e2294700  0 log_channel(cluster) log
> [WRN] : 1 slow requests, 1 included below; oldest blocked for >
> 10.956689 secs
> 2017-06-15 08:29:40.953030 7f31e2294700  0 log_channel(cluster) log
> [WRN] : slow request 10.956689 seconds old, received at 2017-06-15
> 08:29:29.995548: osd_op(client.5057555.0:64922641 4.a565eedb
> rbd_data.4bf687238e1f29.000000000000b1dc [set-alloc-hint object_size
> 4194304 write_size 4194304,writefull 0~4194304] snapc 11e5f=[11e5f]
> ack+ondisk+write+known_if_redirected e123792) currently waiting for scrub

> 2017-06-15 12:50:25.100735 7f31e2294700  0 log_channel(cluster) log
> [WRN] : 4 slow requests, 1 included below; oldest blocked for >
> 15655.105139 secs
> 2017-06-15 12:50:25.100745 7f31e2294700  0 log_channel(cluster) log
> [WRN] : slow request 10240.544155 seconds old, received at 2017-06-15
> 09:59:44.556532: osd_op(client.8047070.0:182487 4.5
> 734eedb rbd_data.4bf687238e1f29.000000000000370b [set-alloc-hint
> object_size 4194304 write_size 4194304,writefull 0~4194304] snapc
> 11e5f=[11e5f] ack+ondisk+write+known_if_redirected e123820
> ) currently waiting for scrub
(and snaps are created and removed on all images daily)


Why should anything ever wait for a scrub? Shouldn't a scrub just read
the data asynchronously, and if a client changes the data during the
scrub, skip the overwritten part (unless it's in a snap, in which case
scrub that copy when the write is done)?

And shouldn't it have timed out after 4 hours? What does the config
option "mon_scrub_timeout" (default "300") do? It's not documented here
http://docs.ceph.com/docs/master/rados/configuration/mon-config-ref/
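
(For reference, the scrub-related options a running osd actually has
can be listed from its admin socket, e.g. -- osd.9 here is just an
example:)

ceph daemon osd.9 config show | grep scrub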

-- 

--------------------------------------------
Peter Maloney
Brockmann Consult
Max-Planck-Str. 2
21502 Geesthacht
Germany
Tel: +49 4152 889 300
Fax: +49 4152 889 333
E-mail: peter.maloney@brockmann-consult.de
Internet: http://www.brockmann-consult.de
--------------------------------------------


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2017-06-15 12:49 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-15 21:34 another scrub bug? blocked for > 10240.948831 secs Peter Maloney
2017-04-17 13:18 ` Sage Weil
2017-04-20  5:58   ` Peter Maloney
2017-04-20 14:23     ` Sage Weil
2017-04-20 16:05       ` Peter Maloney
2017-04-20 16:19         ` Sage Weil
2017-04-20 18:58           ` Peter Maloney
2017-04-28  7:24             ` Peter Maloney
2017-05-23  9:29               ` Peter Maloney
2017-06-15 12:49                 ` Peter Maloney
