* "md/raid10:md5: sdf: redirecting sector 2979126480 to another mirror"
@ 2021-01-06 23:27 Andy Smith
  2021-01-11 17:27 ` Andy Smith
  2021-01-12  0:36 ` Guoqing Jiang
  0 siblings, 2 replies; 5+ messages in thread
From: Andy Smith @ 2021-01-06 23:27 UTC (permalink / raw)
  To: linux-raid

Hi,

"md/raid10:md5: sdf: redirecting sector 2979126480 to another
mirror"

I've actually been seeing these messages very occasionally across
many machines for years, but I have never found anything wrong, so
investigating it kept dropping to the bottom of my TODO list. In the
past I have even done a full scrub and check after seeing this and
found no issue.

Having just seen one of them again now, and having some spare time I
tried to look into it.

So, this message comes from here:

    https://github.com/torvalds/linux/blob/v5.8/drivers/md/raid10.c#L1188

but under what circumstances does it actually happen?

This time, as with the other times, I cannot see any indication of
read error (i.e. no logs of that) and no problems apparent in SMART
data.

err_rdev there can only be set inside the block above that starts
with:


    if (r10_bio->devs[slot].rdev) {
        /*
         * This is an error retry, but we cannot
         * safely dereference the rdev in the r10_bio,
         * we must use the one in conf.

…but why is this an error retry? Nothing was logged so how do I find
out what the error was?
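
For completeness, my reading of the relevant part of
raid10_read_request (abridged from memory of the 5.x source, so the
details may be slightly off) is roughly:

    if (r10_bio->devs[slot].rdev) {
            /* looks like an error retry: note the rdev we previously
             * read from, using the copy in conf as the comment says */
            disk = r10_bio->devs[slot].devnum;
            err_rdev = rcu_dereference(conf->mirrors[disk].rdev);
            ...
    }
    ...
    rdev = read_balance(conf, r10_bio, &max_sectors);
    ...
    if (err_rdev)
            pr_err_ratelimited("md/raid10:%s: %s: redirecting sector %llu to another mirror\n",
                               mdname(mddev), bdevname(rdev->bdev, b),
                               (unsigned long long)r10_bio->sector);

so as far as I can tell the message should only ever be printed when
err_rdev got set in that block, i.e. when the request looks like a
retry of a previously failed read.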

Cheers,
Andy


* Re: "md/raid10:md5: sdf: redirecting sector 2979126480 to another mirror"
  2021-01-06 23:27 "md/raid10:md5: sdf: redirecting sector 2979126480 to another mirror" Andy Smith
@ 2021-01-11 17:27 ` Andy Smith
  2021-01-12  0:36 ` Guoqing Jiang
  1 sibling, 0 replies; 5+ messages in thread
From: Andy Smith @ 2021-01-11 17:27 UTC (permalink / raw)
  To: linux-raid

On Wed, Jan 06, 2021 at 11:27:16PM +0000, Andy Smith wrote:
> "md/raid10:md5: sdf: redirecting sector 2979126480 to another
> mirror"

[…]

> So, this message comes from here:
> 
>     https://github.com/torvalds/linux/blob/v5.8/drivers/md/raid10.c#L1188
> 
> but under what circumstances does it actually happen?

I managed to obtain a stack trace with "perf":

# Line 77 of this function in the raid10 module is where the
# "redirecting sector" message comes from on my kernel, the stock
# Debian buster kernel.
$ sudo perf probe -s ./linux-source-4.19/ -m raid10 --add 'raid10_read_request:77'
Added new events:
  probe:raid10_read_request (on raid10_read_request:77 in raid10)
  probe:raid10_read_request_1 (on raid10_read_request:77 in raid10)
  probe:raid10_read_request_2 (on raid10_read_request:77 in raid10)
  probe:raid10_read_request_3 (on raid10_read_request:77 in raid10)
  probe:raid10_read_request_4 (on raid10_read_request:77 in raid10)

You can now use it in all perf tools, such as:

        perf record -e probe:raid10_read_request_4 -aR sleep 1

# In another window start up a heavy continuous read load on
# /dev/md3.
$ sudo perf record -e probe:raid10_read_request -gaR sleep 120

# In syslog:
Jan 11 17:10:38 hostname kernel: [1318771.689507] md/raid10:md3: nvme1n1p5: redirecting sector 656970992 to another mirror

# "perf record" finishes:
[ perf record: Woken up 2 times to write data ]
[ perf record: Captured and wrote 0.757 MB perf.data (2 samples) ]
$ sudo perf report --stdio
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 2  of event 'probe:raid10_read_request'
# Event count (approx.): 2
#
# Children      Self  Trace output
# ........  ........  ..................
#
   100.00%   100.00%  (ffffffffc0127e42)
            |
            ---__libc_read
               entry_SYSCALL_64_after_hwframe
               do_syscall_64
               ksys_read
               vfs_read
               new_sync_read
               generic_file_read_iter
               ondemand_readahead
               __do_page_cache_readahead
               read_pages
               mpage_readpages
               submit_bio
               generic_make_request
               md_make_request
               md_handle_request
               raid10_make_request
               raid10_read_request

So I still don't know why this is considered an error and worth
logging about, but at least I don't see any obvious error paths
there.

I will continue to dig into it ("perf" is all new to me), but if
anyone happens to know why it does this, please do put me out of my
misery!

BTW this is a different host to the one I previously saw it on. As I
say I have seen this message occasionally for years now, on multiple
machines and multiple versions of Debian.

Cheers,
Andy


* Re: "md/raid10:md5: sdf: redirecting sector 2979126480 to another mirror"
  2021-01-06 23:27 "md/raid10:md5: sdf: redirecting sector 2979126480 to another mirror" Andy Smith
  2021-01-11 17:27 ` Andy Smith
@ 2021-01-12  0:36 ` Guoqing Jiang
  2021-01-12  2:03   ` Andy Smith
  1 sibling, 1 reply; 5+ messages in thread
From: Guoqing Jiang @ 2021-01-12  0:36 UTC (permalink / raw)
  To: linux-raid

Hi,

On 1/7/21 00:27, Andy Smith wrote:
> Hi,
> 
> "md/raid10:md5: sdf: redirecting sector 2979126480 to another
> mirror"
> 
> I've actually been seeing these messages very occasionally across
> many machines for years, but I have never found anything wrong, so
> investigating it kept dropping to the bottom of my TODO list. In the
> past I have even done a full scrub and check after seeing this and
> found no issue.
> 
> Having just seen one of them again now, and having some spare time I
> tried to look into it.
> 
> So, this message comes from here:
> 
>      https://github.com/torvalds/linux/blob/v5.8/drivers/md/raid10.c#L1188
> 
> but under what circumstances does it actually happen?
> 
> This time, as with the other times, I cannot see any indication of
> read error (i.e. no logs of that) and no problems apparent in SMART
> data.
> 
> err_rdev there can only be set inside the block above that starts
> with:
> 
> 
>      if (r10_bio->devs[slot].rdev) {
>          /*
>           * This is an error retry, but we cannot
>           * safely dereference the rdev in the r10_bio,
>           * we must use the one in conf.
> 
> …but why is this an error retry? Nothing was logged so how do I find
> out what the error was?
> 

This is because handle_read_error also calls raid10_read_request, pls 
see commit 545250f2480 ("md/raid10: simplify handle_read_error()").

I guess it is better to distinguish the caller so that the normal read
path does not print the message too (if that is the problem).

--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1115,7 +1115,7 @@ static void regular_request_wait(struct mddev *mddev, struct r10conf *conf,
 }
 
 static void raid10_read_request(struct mddev *mddev, struct bio *bio,
-                               struct r10bio *r10_bio)
+                               struct r10bio *r10_bio, bool read_err)
 {
        struct r10conf *conf = mddev->private;
        struct bio *read_bio;
@@ -1128,7 +1128,7 @@ static void raid10_read_request(struct mddev *mddev, struct bio *bio,
        struct md_rdev *err_rdev = NULL;
        gfp_t gfp = GFP_NOIO;
 
-       if (slot >= 0 && r10_bio->devs[slot].rdev) {
+       if (read_err && slot >= 0 && r10_bio->devs[slot].rdev) {
                /*
                 * This is an error retry, but we cannot
                 * safely dereference the rdev in the r10_bio,
@@ -1495,7 +1495,7 @@ static void __make_request(struct mddev *mddev, struct bio *bio, int sectors)
        memset(r10_bio->devs, 0, sizeof(r10_bio->devs[0]) * conf->copies);
 
        if (bio_data_dir(bio) == READ)
-               raid10_read_request(mddev, bio, r10_bio);
+               raid10_read_request(mddev, bio, r10_bio, false);
        else
                raid10_write_request(mddev, bio, r10_bio);
 }
@@ -2586,7 +2586,7 @@ static void handle_read_error(struct mddev *mddev, struct r10bio *r10_bio)
        rdev_dec_pending(rdev, mddev);
        allow_barrier(conf);
        r10_bio->state = 0;
-       raid10_read_request(mddev, r10_bio->master_bio, r10_bio);
+       raid10_read_request(mddev, r10_bio->master_bio, r10_bio, true);
 }

Thanks,
Guoqing


* Re: "md/raid10:md5: sdf: redirecting sector 2979126480 to another mirror"
  2021-01-12  0:36 ` Guoqing Jiang
@ 2021-01-12  2:03   ` Andy Smith
  2021-01-12 11:30     ` Guoqing Jiang
  0 siblings, 1 reply; 5+ messages in thread
From: Andy Smith @ 2021-01-12  2:03 UTC (permalink / raw)
  To: linux-raid

Hi Guoqing,

Thanks for following up on this. I have a couple of questions.

On Tue, Jan 12, 2021 at 01:36:55AM +0100, Guoqing Jiang wrote:
> On 1/7/21 00:27, Andy Smith wrote:
> >err_rdev there can only be set inside the block above that starts
> >with:
> >
> >     if (r10_bio->devs[slot].rdev) {
> >         /*
> >          * This is an error retry, but we cannot
> >          * safely dereference the rdev in the r10_bio,
> >          * we must use the one in conf.
> >
> >…but why is this an error retry? Nothing was logged so how do I find
> >out what the error was?
> 
> This is because handle_read_error also calls raid10_read_request, pls see
> commit 545250f2480 ("md/raid10: simplify handle_read_error()").

So if I understand you correctly, this is a consequence of
raid10_read_request being reworked so that it can be called by
handle_read_error, but in my case it is not being called by
handle_read_error but by raid10_make_request, and it is incorrectly
going down an error path and reporting a redirected read?

From my stack trace it seemed that it was just
raid10.c:__make_request that was calling raid10_read_request but I
could not see where in __make_request the r10_bio->devs[slot].rdev
was being set, enabling the above test to succeed. All I could see
was a memset to 0.

I understand that your patch makes it so this test can no longer
succeed when being called by __make_request, meaning that aside from
not logging about a redirected read it will also not do the
rcu_read_lock() / rcu_dereference() / rcu_read_unlock() that's in
that if block. Is that a significant amount of work that is being
needlessly done right now or is it trivial?

I'm trying to understand how big of a problem this is, beyond some
spurious logging.

Right now when it is logging about redirecting a read, does that
mean that it isn't actually redirecting a read? That is, when it
says:

Jan 11 17:10:40 hostname kernel: [1318773.480077] md/raid10:md3: nvme1n1p5: redirecting sector 699122984 to another mirror

in the absence of any other error logging it is in fact its first
try at reading and it will really be using device nvme1n1p5 (rdev)
to satisfy that?

I suppose I am also confused why this happens so rarely. I can only
encourage it to happen a couple of times by putting the array under
very heavy read load, and it only seems to happen with pretty high
performing arrays (all SSD). But perhaps that is the result of the
rate-limited logging with pr_err_ratelimited() causing me to only
see very few of the actual events.
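
(For what it's worth, pr_err_ratelimited() ends up in
printk_ratelimited(), which as far as I can see in
include/linux/printk.h is roughly:

    #define printk_ratelimited(fmt, ...)                              \
    ({                                                                \
            static DEFINE_RATELIMIT_STATE(_rs,                        \
                                          DEFAULT_RATELIMIT_INTERVAL, \
                                          DEFAULT_RATELIMIT_BURST);   \
                                                                      \
            if (__ratelimit(&_rs))                                    \
                    printk(fmt, ##__VA_ARGS__);                       \
    })

with defaults of, I believe, a burst of 10 messages per 5-second
interval per call site, so if these redirects were happening in
bursts then most of them would indeed never make it to the log.)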

Thanks again,
Andy


* Re: "md/raid10:md5: sdf: redirecting sector 2979126480 to another mirror"
  2021-01-12  2:03   ` Andy Smith
@ 2021-01-12 11:30     ` Guoqing Jiang
  0 siblings, 0 replies; 5+ messages in thread
From: Guoqing Jiang @ 2021-01-12 11:30 UTC (permalink / raw)
  To: linux-raid

Hi Andy,

On 1/12/21 03:03, Andy Smith wrote:
> Hi Guoqing,
> 
> Thanks for following up on this. I have a couple of questions.
> 
> On Tue, Jan 12, 2021 at 01:36:55AM +0100, Guoqing Jiang wrote:
>> On 1/7/21 00:27, Andy Smith wrote:
>>> err_rdev there can only be set inside the block above that starts
>>> with:
>>>
>>>      if (r10_bio->devs[slot].rdev) {
>>>          /*
>>>           * This is an error retry, but we cannot
>>>           * safely dereference the rdev in the r10_bio,
>>>           * we must use the one in conf.
>>>
>>> …but why is this an error retry? Nothing was logged so how do I find
>>> out what the error was?
>>
>> This is because handle_read_error also calls raid10_read_request, pls see
>> commit 545250f2480 ("md/raid10: simplify handle_read_error()").
> 
> So if I understand you correctly, this is a consequence of
> raid10_read_request being reworked so that it can be called by
> handle_read_error, but in my case it is not being called by
> handle_read_error but by raid10_make_request, and it is incorrectly
> going down an error path and reporting a redirected read?

Yes, that is my guess too, if the message appears but there is no read
issue from the component device.

> 
>  From my stack trace it seemed that it was just
> raid10.c:__make_request that was calling raid10_read_request but I
> could not see where in __make_request the r10_bio->devs[slot].rdev
> was being set, enabling the above test to succeed. All I could see
> was a memset to 0.

IIUC, the rdev is set in raid10_run rather than before dispatching the IO request.

> 
> I understand that your patch makes it so this test can no longer
> succeed when being called by __make_request, meaning that aside from
> not logging about a redirected read it will also not do the
> rcu_read_lock() / rcu_dereference() / rcu_read_unlock() that's in
> that if block. Is that a significant amount of work that is being
> needlessly done right now or is it trivial?

I think checking whether raid10_read_request is called from the read
error path is enough.

> 
> I'm trying to understand how big of a problem this is, beyond some
> spurious logging.
> 
> Right now when it is logging about redirecting a read, does that
> mean that it isn't actually redirecting a read? That is, when it
> says:
> 
> Jan 11 17:10:40 hostname kernel: [1318773.480077] md/raid10:md3: nvme1n1p5: redirecting sector 699122984 to another mirror
> 
> in the absence of any other error logging it is in fact its first
> try at reading and it will really be using device nvme1n1p5 (rdev)
> to satisfy that?
> 
> I suppose I am also confused why this happens so rarely. I can only
> encourage it to happen a couple of times by putting the array under
> very heavy read load, and it only seems to happen with pretty high
> performing arrays (all SSD). But perhaps that is the result of the
> rate-limited logging with pr_err_ratelimited() causing me to only
> see very few of the actual events.

If the message ("redirecting sector ...") does not come from the
handle read error path, then I suppose the message is a false alarm.


Thanks,
Guoqing

