linux-raid.vger.kernel.org archive mirror
From: Yu Kuai <yukuai1@huaweicloud.com>
To: Guoqing Jiang <guoqing.jiang@linux.dev>,
	Marc Smith <msmith626@gmail.com>
Cc: Donald Buczek <buczek@molgen.mpg.de>, Song Liu <song@kernel.org>,
	linux-raid@vger.kernel.org,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	it+raid@molgen.mpg.de, "yukuai (C)" <yukuai3@huawei.com>
Subject: Re: md_raid: mdX_raid6 looping after sync_action "check" to "idle" transition
Date: Wed, 15 Mar 2023 17:53:35 +0800	[thread overview]
Message-ID: <2cc75a4b-2df5-e8f0-cc01-f07210ba580f@huaweicloud.com> (raw)
In-Reply-To: <9dc19483-de0f-e8c6-bf18-10c33d0a35fd@linux.dev>

Hi,

On 2023/03/15 17:30, Guoqing Jiang wrote:
> 
>> Just borrow this thread to discuss, I think this commit might have
>> problem in some corner cases:
>>
>> t1:                t2:
>> action_store
>>  mddev_lock
>>   if (mddev->sync_thread)
>>    mddev_unlock
>>    md_unregister_thread
>>                 md_check_recovery
>>                  set_bit(MD_RECOVERY_RUNNING, &mddev->recovery)
>>                  queue_work(md_misc_wq, &mddev->del_work)
>>    mddev_lock_nointr
>>    md_reap_sync_thread
>>    // clear running
>>  mddev_unlock
>>
>> t3:
>> md_start_sync
>> // running is not set
> 
> What does 'running' mean? MD_RECOVERY_RUNNING?
> 
>> Our test reported a problem that could, in theory, be caused by this,
>> but we can't be sure for now...
> 
> I guess you tried to describe a race between
> 
> action_store -> md_register_thread
> 
> and
> 
> md_start_sync -> md_register_thread
> 
> Didn't you already fix them in the series?
> 
> [PATCH -next 0/5] md: fix uaf for sync_thread
> 
> Sorry, I didn't follow the problem or your series; I might try your
> test with the latest mainline kernel if the test is available somewhere.
> 
>> We thought about how to fix this, instead of calling
>> md_register_thread() here to wait for sync_thread to be done
>> synchronously,
> 
> IMO, md_register_thread just creates and wakes a thread; I'm not sure
> why it would wait for sync_thread.
> 
>> we do this asynchronously, like md_set_readonly() and do_md_stop()
>> do.
> 
> Still, I don't have a clear picture of the problem, so I can't judge it.
> 

Sorry that I didn't explain the problem clearly. Let me first explain
the problem we hit:

1) raid10d is waiting for sync_thread to stop:
   raid10d
    md_unregister_thread
     kthread_stop

2) sync_thread is waiting for io to finish:
   md_do_sync
    wait_event(... atomic_read(&mddev->recovery_active) == 0)

3) io is waiting for raid10d to finish (an online crash dump found 2 ios
on conf->retry_list)

Additional information from online crash:
mddev->recovery = 29  // DONE, RUNNING, INTR are set

PID: 138293  TASK: ffff0000de89a900  CPU: 7   COMMAND: "md0_resync"
  #0 [ffffa00107c178a0] __switch_to at ffffa0010001d75c
  #1 [ffffa00107c178d0] __schedule at ffffa001017c7f14
  #2 [ffffa00107c179f0] schedule at ffffa001017c880c
  #3 [ffffa00107c17a20] md_do_sync at ffffa0010129cdb4
  #4 [ffffa00107c17d50] md_thread at ffffa00101290d9c
  #5 [ffffa00107c17e50] kthread at ffffa00100187a74

PID: 138294  TASK: ffff0000eba13d80  CPU: 5   COMMAND: "md0_resync"
  #0 [ffffa00107e47a60] __switch_to at ffffa0010001d75c
  #1 [ffffa00107e47a90] __schedule at ffffa001017c7f14
  #2 [ffffa00107e47bb0] schedule at ffffa001017c880c
  #3 [ffffa00107e47be0] schedule_timeout at ffffa001017d1298
  #4 [ffffa00107e47d50] md_thread at ffffa00101290ee8
  #5 [ffffa00107e47e50] kthread at ffffa00100187a74
// there are two sync_threads for md0

I believe the root cause is that two sync_threads exist for the same
mddev, and here is how I think that is possible:

t1:			t2:
action_store
  mddev_lock
   if (mddev->sync_thread)
    mddev_unlock
    md_unregister_thread
    // first sync_thread is done
			md_check_recovery
			 set_bit(MD_RECOVERY_RUNNING, &mddev->recovery)
			 queue_work(md_misc_wq, &mddev->del_work)
    mddev_lock_nointr
    md_reap_sync_thread
    // MD_RECOVERY_RUNNING is cleared
  mddev_unlock

t3:
md_start_sync
// second sync_thread is registered

t4:
md_check_recovery
  queue_work(md_misc_wq, &mddev->del_work)
  // MD_RECOVERY_RUNNING  is not set, a new sync_thread can be started

This is just a guess; I can't reproduce the problem yet. Please let me
know if you have any questions.

Thanks,
Kuai


Thread overview: 49+ messages
2020-11-28 12:25 md_raid: mdX_raid6 looping after sync_action "check" to "idle" transition Donald Buczek
2020-11-30  2:06 ` Guoqing Jiang
2020-12-01  9:29   ` Donald Buczek
2020-12-02 17:28     ` Donald Buczek
2020-12-03  1:55       ` Guoqing Jiang
2020-12-03 11:42         ` Donald Buczek
2020-12-21 12:33           ` Donald Buczek
2021-01-19 11:30             ` Donald Buczek
2021-01-20 16:33               ` Guoqing Jiang
2021-01-23 13:04                 ` Donald Buczek
2021-01-25  8:54                   ` Donald Buczek
2021-01-25 21:32                     ` Donald Buczek
2021-01-26  0:44                       ` Guoqing Jiang
2021-01-26  9:50                         ` Donald Buczek
2021-01-26 11:14                           ` Guoqing Jiang
2021-01-26 12:58                             ` Donald Buczek
2021-01-26 14:06                               ` Guoqing Jiang
2021-01-26 16:05                                 ` Donald Buczek
2021-02-02 15:42                                   ` Guoqing Jiang
2021-02-08 11:38                                     ` Donald Buczek
2021-02-08 14:53                                       ` Guoqing Jiang
2021-02-08 18:41                                         ` Donald Buczek
2021-02-09  0:46                                           ` Guoqing Jiang
2021-02-09  9:24                                             ` Donald Buczek
2023-03-14 13:25                                             ` Marc Smith
2023-03-14 13:55                                               ` Guoqing Jiang
2023-03-14 14:45                                                 ` Marc Smith
2023-03-16 15:25                                                   ` Marc Smith
2023-03-29  0:01                                                     ` Song Liu
2023-08-22 21:16                                                       ` Dragan Stancevic
2023-08-23  1:22                                                         ` Yu Kuai
2023-08-23 15:33                                                           ` Dragan Stancevic
2023-08-24  1:18                                                             ` Yu Kuai
2023-08-28 20:32                                                               ` Dragan Stancevic
2023-08-30  1:36                                                                 ` Yu Kuai
2023-09-05  3:50                                                                   ` Yu Kuai
2023-09-05 13:54                                                                     ` Dragan Stancevic
2023-09-13  9:08                                                                       ` Donald Buczek
2023-09-13 14:16                                                                         ` Dragan Stancevic
2023-09-14  6:03                                                                           ` Donald Buczek
2023-09-17  8:55                                                                             ` Donald Buczek
2023-09-24 14:35                                                                               ` Donald Buczek
2023-09-25  1:11                                                                                 ` Yu Kuai
2023-09-25  9:11                                                                                   ` Donald Buczek
2023-09-25  9:32                                                                                     ` Yu Kuai
2023-03-15  3:02                                                 ` Yu Kuai
2023-03-15  9:30                                                   ` Guoqing Jiang
2023-03-15  9:53                                                     ` Yu Kuai [this message]
2023-03-15  7:52                                               ` Donald Buczek
