linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC] bloody odd logics in md_exit()
@ 2018-09-29  3:33 Al Viro
  2018-09-29 23:04 ` NeilBrown
  0 siblings, 1 reply; 5+ messages in thread
From: Al Viro @ 2018-09-29  3:33 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-fsdevel

	Rationale in e2f23b606b94 (md: avoid oops on unload if some
process is in poll or select) is very odd.  Waitqueue code _does_
provide a way to remove all listeners from a waitqueue - it's simply
wake_up_all().  Once the wakeup callback has been executed (and it
runs in context of wake_up_all() caller), we don't *care* if md.o
is still there - all waiters are gone from the queue and the callback
(pollwake() and friends) doesn't reinsert them.

	Why do we need the entire md_unloading logics?  Just do
remove_proc_entry() (which will wait for any in-progress call of
->poll()) and then do plain wake_up_all().  Nothing new could get
added there once remove_proc_entry() is done and we don't need to
wake the waiters one-by-one, let alone play with exponential
backoffs for them to be gone from the queue...

	And while we are at it, procfs file_operations do not need
->owner - it's never looked at by anybody.

	Looks like the following would do just as well as the variant
currently in mainline and it's considerably simpler...  What am I
missing here?

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 63ceabb4e020..42ea44a809bd 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -7953,14 +7953,11 @@ static int md_seq_open(struct inode *inode, struct file *file)
 	return error;
 }
 
-static int md_unloading;
 static __poll_t mdstat_poll(struct file *filp, poll_table *wait)
 {
 	struct seq_file *seq = filp->private_data;
 	__poll_t mask;
 
-	if (md_unloading)
-		return EPOLLIN|EPOLLRDNORM|EPOLLERR|EPOLLPRI;
 	poll_wait(filp, &md_event_waiters, wait);
 
 	/* always allow read */
@@ -7972,7 +7969,6 @@ static __poll_t mdstat_poll(struct file *filp, poll_table *wait)
 }
 
 static const struct file_operations md_seq_fops = {
-	.owner		= THIS_MODULE,
 	.open           = md_seq_open,
 	.read           = seq_read,
 	.llseek         = seq_lseek,
@@ -9382,7 +9378,6 @@ static __exit void md_exit(void)
 {
 	struct mddev *mddev;
 	struct list_head *tmp;
-	int delay = 1;
 
 	blk_unregister_region(MKDEV(MD_MAJOR,0), 512);
 	blk_unregister_region(MKDEV(mdp_major,0), 1U << MINORBITS);
@@ -9392,17 +9387,8 @@ static __exit void md_exit(void)
 	unregister_reboot_notifier(&md_notifier);
 	unregister_sysctl_table(raid_table_header);
 
-	/* We cannot unload the modules while some process is
-	 * waiting for us in select() or poll() - wake them up
-	 */
-	md_unloading = 1;
-	while (waitqueue_active(&md_event_waiters)) {
-		/* not safe to leave yet */
-		wake_up(&md_event_waiters);
-		msleep(delay);
-		delay += delay;
-	}
 	remove_proc_entry("mdstat", NULL);
+	wake_up_all(&md_event_waiters);
 
 	for_each_mddev(mddev, tmp) {
 		export_array(mddev);

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [RFC] bloody odd logics in md_exit()
  2018-09-29  3:33 [RFC] bloody odd logics in md_exit() Al Viro
@ 2018-09-29 23:04 ` NeilBrown
  2018-09-30  2:00   ` Al Viro
  2018-09-30  2:13   ` Matthew Wilcox
  0 siblings, 2 replies; 5+ messages in thread
From: NeilBrown @ 2018-09-29 23:04 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel

[-- Attachment #1: Type: text/plain, Size: 3962 bytes --]

On Sat, Sep 29 2018, Al Viro wrote:

> 	Rationale in e2f23b606b94 (md: avoid oops on unload if some
> process is in poll or select) is very odd.  Waitqueue code _does_
> provide a way to remove all listeners from a waitqueue - it's simply
> wake_up_all().  Once the wakeup callback has been executed (and it
> runs in context of wake_up_all() caller), we don't *care* if md.o
> is still there - all waiters are gone from the queue and the callback
> (pollwake() and friends) doesn't reinsert them.
>
> 	Why do we need the entire md_unloading logics?  Just do
> remove_proc_entry() (which will wait for any in-progress call of
> ->poll()) and then do plain wake_up_all().  Nothing new could get
> added there once remove_proc_entry() is done and we don't need to
> wake the waiters one-by-one, let alone play with exponential
> backoffs for them to be gone from the queue...
>
> 	And while we are at it, procfs file_operations do not need
> ->owner - it's never looked at by anybody.
>
> 	Looks like the following would do just as well as the variant
> currently in mainline and it's considerably simpler...  What am I
> missing here?

Hi Al,
 I don't think wake_up_all() does remove anything from the queue.
 It simply wakes up the various processes that are waiting.
 They remain on the queue until they call remove_wait_queue(), which
 could be delayed arbitrarily.
 If it was delayed until after the module was unloaded and
 "md_event_waiters" no longer existed, the unlink attempt would cause an
 invalid memory access.

 I think your approach for simplify the code would only work if
 md_event_waiters could be moved out of the module, or if some global
 wait_queue could be used instead.
 Maybe we could use bit_waitqueue(NULL,0) (rather an ugly hack).
 Maybe we could export a general-purpose waitqueue for modules to use.
 Maybe procfs could export something??

 I agree that we can remove md_unloading, by moving remove_proc_entry()
 before the wakeup.  I'm not yet convinced that we can remove the wakeup
 loop.
 
 Or am I missing something else here?

Thanks,
NeilBrown


>
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 63ceabb4e020..42ea44a809bd 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -7953,14 +7953,11 @@ static int md_seq_open(struct inode *inode, struct file *file)
>  	return error;
>  }
>  
> -static int md_unloading;
>  static __poll_t mdstat_poll(struct file *filp, poll_table *wait)
>  {
>  	struct seq_file *seq = filp->private_data;
>  	__poll_t mask;
>  
> -	if (md_unloading)
> -		return EPOLLIN|EPOLLRDNORM|EPOLLERR|EPOLLPRI;
>  	poll_wait(filp, &md_event_waiters, wait);
>  
>  	/* always allow read */
> @@ -7972,7 +7969,6 @@ static __poll_t mdstat_poll(struct file *filp, poll_table *wait)
>  }
>  
>  static const struct file_operations md_seq_fops = {
> -	.owner		= THIS_MODULE,
>  	.open           = md_seq_open,
>  	.read           = seq_read,
>  	.llseek         = seq_lseek,
> @@ -9382,7 +9378,6 @@ static __exit void md_exit(void)
>  {
>  	struct mddev *mddev;
>  	struct list_head *tmp;
> -	int delay = 1;
>  
>  	blk_unregister_region(MKDEV(MD_MAJOR,0), 512);
>  	blk_unregister_region(MKDEV(mdp_major,0), 1U << MINORBITS);
> @@ -9392,17 +9387,8 @@ static __exit void md_exit(void)
>  	unregister_reboot_notifier(&md_notifier);
>  	unregister_sysctl_table(raid_table_header);
>  
> -	/* We cannot unload the modules while some process is
> -	 * waiting for us in select() or poll() - wake them up
> -	 */
> -	md_unloading = 1;
> -	while (waitqueue_active(&md_event_waiters)) {
> -		/* not safe to leave yet */
> -		wake_up(&md_event_waiters);
> -		msleep(delay);
> -		delay += delay;
> -	}
>  	remove_proc_entry("mdstat", NULL);
> +	wake_up_all(&md_event_waiters);
>  
>  	for_each_mddev(mddev, tmp) {
>  		export_array(mddev);

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC] bloody odd logics in md_exit()
  2018-09-29 23:04 ` NeilBrown
@ 2018-09-30  2:00   ` Al Viro
  2018-09-30  2:13   ` Matthew Wilcox
  1 sibling, 0 replies; 5+ messages in thread
From: Al Viro @ 2018-09-30  2:00 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-fsdevel

On Sun, Sep 30, 2018 at 09:04:11AM +1000, NeilBrown wrote:

> Hi Al,
>  I don't think wake_up_all() does remove anything from the queue.
>  It simply wakes up the various processes that are waiting.
>  They remain on the queue until they call remove_wait_queue(), which
>  could be delayed arbitrarily.
>  If it was delayed until after the module was unloaded and
>  "md_event_waiters" no longer existed, the unlink attempt would cause an
>  invalid memory access.
> 
>  I think your approach for simplify the code would only work if
>  md_event_waiters could be moved out of the module, or if some global
>  wait_queue could be used instead.
>  Maybe we could use bit_waitqueue(NULL,0) (rather an ugly hack).
>  Maybe we could export a general-purpose waitqueue for modules to use.
>  Maybe procfs could export something??
> 
>  I agree that we can remove md_unloading, by moving remove_proc_entry()
>  before the wakeup.  I'm not yet convinced that we can remove the wakeup
>  loop.
>  
>  Or am I missing something else here?

You are not, unfortunately; I plead a bad braino.  remove_wait_queue() will,
indeed, happen only when we get around to poll_freewait().  Bugger...
That's a problem, and I'm not at all sure that it's only md that needs to
be concerned with it.  And it's not just procfs either - debugfs users have
exact same problem if they have ->poll() in their file_operations...

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC] bloody odd logics in md_exit()
  2018-09-29 23:04 ` NeilBrown
  2018-09-30  2:00   ` Al Viro
@ 2018-09-30  2:13   ` Matthew Wilcox
  2018-09-30  3:17     ` Al Viro
  1 sibling, 1 reply; 5+ messages in thread
From: Matthew Wilcox @ 2018-09-30  2:13 UTC (permalink / raw)
  To: NeilBrown; +Cc: Al Viro, linux-fsdevel

On Sun, Sep 30, 2018 at 09:04:11AM +1000, NeilBrown wrote:
> On Sat, Sep 29 2018, Al Viro wrote:
> > 	Rationale in e2f23b606b94 (md: avoid oops on unload if some
> > process is in poll or select) is very odd.  Waitqueue code _does_
> > provide a way to remove all listeners from a waitqueue - it's simply
> > wake_up_all().  Once the wakeup callback has been executed (and it
> > runs in context of wake_up_all() caller), we don't *care* if md.o
> > is still there - all waiters are gone from the queue and the callback
> > (pollwake() and friends) doesn't reinsert them.
>
>  I don't think wake_up_all() does remove anything from the queue.
>  It simply wakes up the various processes that are waiting.
>  They remain on the queue until they call remove_wait_queue(), which
>  could be delayed arbitrarily.
>  If it was delayed until after the module was unloaded and
>  "md_event_waiters" no longer existed, the unlink attempt would cause an
>  invalid memory access.

init_wait_entry() initialises wq_entry->func to autoremove_wake_function
which calls list_del_init() when it's called from __wake_up_common().
If we look at the AIO path, it sets ->func to aio_poll_wake() which
also calls list_del_init().  So I think Al is right, but I haven't
looked at _every_ code path.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC] bloody odd logics in md_exit()
  2018-09-30  2:13   ` Matthew Wilcox
@ 2018-09-30  3:17     ` Al Viro
  0 siblings, 0 replies; 5+ messages in thread
From: Al Viro @ 2018-09-30  3:17 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: NeilBrown, linux-fsdevel

On Sat, Sep 29, 2018 at 07:13:56PM -0700, Matthew Wilcox wrote:
> On Sun, Sep 30, 2018 at 09:04:11AM +1000, NeilBrown wrote:
> > On Sat, Sep 29 2018, Al Viro wrote:
> > > 	Rationale in e2f23b606b94 (md: avoid oops on unload if some
> > > process is in poll or select) is very odd.  Waitqueue code _does_
> > > provide a way to remove all listeners from a waitqueue - it's simply
> > > wake_up_all().  Once the wakeup callback has been executed (and it
> > > runs in context of wake_up_all() caller), we don't *care* if md.o
> > > is still there - all waiters are gone from the queue and the callback
> > > (pollwake() and friends) doesn't reinsert them.
> >
> >  I don't think wake_up_all() does remove anything from the queue.
> >  It simply wakes up the various processes that are waiting.
> >  They remain on the queue until they call remove_wait_queue(), which
> >  could be delayed arbitrarily.
> >  If it was delayed until after the module was unloaded and
> >  "md_event_waiters" no longer existed, the unlink attempt would cause an
> >  invalid memory access.
> 
> init_wait_entry() initialises wq_entry->func to autoremove_wake_function
> which calls list_del_init() when it's called from __wake_up_common().
> If we look at the AIO path, it sets ->func to aio_poll_wake() which
> also calls list_del_init().  So I think Al is right, but I haven't
> looked at _every_ code path.

Alas, no - poll_wait() does
        init_waitqueue_func_entry(&entry->wait, pollwake);
and pollwake() does not do autoremove.  Worse, the way wakeups are done in
actual drivers, we _can't_ do autoremove there - it is possible to get
several non-specific wakeups for the same file before we finally get the
condition select(2) is waiting for.

Look, for example, at drivers/char/virtio_console.c; the same queue is used
for poll, for read waiting for data and for write waiting for possiblity to
send.  Worse, there's no obvious way to tell which conditions change in given
wakeup source.

And yeah, we do have broken stuff - there's four hundred or so instances, so
I didn't get anywhere near complete audit, but we do have some buggered ones.
Both on rmmod and, at least in one case, on pcmcia card removal...

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2018-09-30  9:49 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-09-29  3:33 [RFC] bloody odd logics in md_exit() Al Viro
2018-09-29 23:04 ` NeilBrown
2018-09-30  2:00   ` Al Viro
2018-09-30  2:13   ` Matthew Wilcox
2018-09-30  3:17     ` Al Viro

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).