From: "Paul E. McKenney" <paulmck@linux.ibm.com>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: aarcange@redhat.com, akpm@linux-foundation.org,
	christian@brauner.io, davem@davemloft.net, ebiederm@xmission.com,
	elena.reshetova@intel.com, guro@fb.com, hch@infradead.org,
	james.bottomley@hansenpartnership.com, jasowang@redhat.com,
	jglisse@redhat.com, keescook@chromium.org, ldv@altlinux.org,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	linux-parisc@vger.kernel.org, luto@amacapital.net,
	mhocko@suse.com, mingo@kernel.org, namit@vmware.com,
	peterz@infradead.org, syzkaller-bugs@googlegroups.com,
	viro@zeniv.linux.org.uk, wad@chromium.org
Subject: Re: RFC: call_rcu_outstanding (was Re: WARNING in __mmdrop)
Date: Sun, 21 Jul 2019 06:17:25 -0700
Message-ID: <20190721131725.GR14271@linux.ibm.com>
In-Reply-To: <20190721081933-mutt-send-email-mst@kernel.org>

On Sun, Jul 21, 2019 at 08:28:05AM -0400, Michael S. Tsirkin wrote:
> Hi Paul, others,
>
> So it seems that vhost needs to call kfree_rcu from an ioctl. My worry
> is what happens if userspace starts cycling through lots of these
> ioctls.  Given we actually use rcu as an optimization, we could just
> disable the optimization temporarily - but the question would be how to
> detect an excessive rate without working too hard :) .
>
> I guess we could define as excessive any rate where callback is
> outstanding at the time when new structure is allocated.  I have very
> little understanding of rcu internals - so I wanted to check that the
> following more or less implements this heuristic before I spend time
> actually testing it.
>
> Could others pls take a look and let me know?

These look good as a way of seeing if there are any outstanding callbacks,
but in the case of Tree RCU, call_rcu_outstanding() would almost never
return false on a busy system.

Here are some alternatives:

o	RCU uses some pieces of Rao Shoaib's kfree_rcu() patches.
	The idea is to make kfree_rcu() locally buffer requests into
	batches of (say) 1,000, but processing smaller batches when RCU
	is idle, or when some smallish amount of time has passed with
	no more kfree_rcu() requests from that CPU.  RCU then takes in
	the batch using not call_rcu(), but rather queue_rcu_work().
	The resulting batch of kfree() calls would therefore execute in
	workqueue context rather than in softirq context, which should
	be much easier on the system.

	In theory, this would allow people to use kfree_rcu() without
	worrying quite so much about overload.  It would also not be
	that hard to implement.

o	Subsystems vulnerable to user-induced kfree_rcu() flooding use
	call_rcu() instead of kfree_rcu().  Keep a count of the number
	of things waiting for a grace period, and when this gets too
	large, disable the optimization.  It will then drain down, at
	which point the optimization can be re-enabled.

	But please note that callbacks are -not- guaranteed to run on
	the CPU that queued them.  So yes, you would need a per-CPU
	counter, but you would need to periodically sum it up to check
	against the global state.  Or keep track of the CPU that
	did the call_rcu() so that you can atomically decrement in
	the callback the same counter that was atomically incremented
	just before the call_rcu().  Or any number of other approaches.

Also, the overhead is important.  For example, as far as I know,
current RCU gracefully handles close(open(...)) in a tight userspace
loop.  But there might be trouble due to tight userspace loops around
lighter-weight operations.

So an important question is "Just how fast is your ioctl?"  If it takes
(say) 100 microseconds to execute, there should be absolutely no problem.
On the other hand, if it can execute in 50 nanoseconds, this very likely
does need serious attention.

Other thoughts?

							Thanx, Paul

> Thanks!
>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
>
>
> diff --git a/kernel/rcu/tiny.c b/kernel/rcu/tiny.c
> index 477b4eb44af5..067909521d72 100644
> --- a/kernel/rcu/tiny.c
> +++ b/kernel/rcu/tiny.c
> @@ -125,6 +125,25 @@ void synchronize_rcu(void)
>  }
>  EXPORT_SYMBOL_GPL(synchronize_rcu);
>
> +/*
> + * Helpful for rate-limiting kfree_rcu/call_rcu callbacks.
> + */
> +bool call_rcu_outstanding(void)
> +{
> +	unsigned long flags;
> +	struct rcu_data *rdp;
> +	bool outstanding;
> +
> +	local_irq_save(flags);
> +	rdp = this_cpu_ptr(&rcu_data);
> +	outstanding = rcu_segcblist_empty(&rdp->cblist);
> +	outstanding = rcu_ctrlblk.donetail != rcu_ctrlblk.curtail;
> +	local_irq_restore(flags);
> +
> +	return outstanding;
> +}
> +EXPORT_SYMBOL_GPL(call_rcu_outstanding);
> +
>  /*
>   * Post an RCU callback to be invoked after the end of an RCU grace
>   * period.  But since we have but one CPU, that would be after any
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index a14e5fbbea46..d4b9d61e637d 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -2482,6 +2482,24 @@ static void rcu_leak_callback(struct rcu_head *rhp)
>  {
>  }
>
> +/*
> + * Helpful for rate-limiting kfree_rcu/call_rcu callbacks.
> + */
> +bool call_rcu_outstanding(void)
> +{
> +	unsigned long flags;
> +	struct rcu_data *rdp;
> +	bool outstanding;
> +
> +	local_irq_save(flags);
> +	rdp = this_cpu_ptr(&rcu_data);
> +	outstanding = rcu_segcblist_empty(&rdp->cblist);
> +	local_irq_restore(flags);
> +
> +	return outstanding;
> +}
> +EXPORT_SYMBOL_GPL(call_rcu_outstanding);
> +
>  /*
>   * Helper function for call_rcu() and friends.  The cpu argument will
>   * normally be -1, indicating "currently running CPU".  It may specify
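
As a rough illustration of the second alternative in the reply above (count
the things waiting for a grace period and disable the optimization when the
count gets too large), here is a minimal userspace sketch in C.  It is not
kernel code and not anything proposed in the thread: fake_call_rcu(),
fake_grace_period(), release_object() and MAX_INFLIGHT are hypothetical
names invented for this example, a plain linked list stands in for RCU's
callback queue, and a single global atomic counter is assumed in place of
the per-CPU schemes the message mentions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical limit on objects waiting for a (simulated) grace period. */
#define MAX_INFLIGHT 4L

/* One queued deferred free; a stand-in for an rcu_head plus its object. */
struct deferred {
	struct deferred *next;
	void *obj;
};

/* Incremented just before queuing, decremented when the callback runs. */
static atomic_long inflight;

/* Stand-in for RCU's callback list (single-threaded in this sketch). */
static struct deferred *pending;

/* Stand-in for call_rcu(): queue obj to be freed later. */
static bool fake_call_rcu(void *obj)
{
	struct deferred *d = malloc(sizeof(*d));

	if (!d)
		return false;
	d->obj = obj;
	d->next = pending;
	pending = d;
	return true;
}

/* Stand-in for the end of a grace period: run every queued callback.
 * Returns the number of callbacks that ran. */
static long fake_grace_period(void)
{
	long n = 0;

	while (pending) {
		struct deferred *d = pending;
		long prev;

		pending = d->next;
		free(d->obj);
		free(d);
		/* The "callback" drops the same counter the queuer raised. */
		prev = atomic_fetch_sub(&inflight, 1);
		assert(prev > 0);
		(void)prev;
		n++;
	}
	return n;
}

/* Returns true if the free was deferred (optimization enabled), false if
 * the backlog was too large and the object was freed on the slow path. */
static bool release_object(void *obj)
{
	if (atomic_fetch_add(&inflight, 1) >= MAX_INFLIGHT ||
	    !fake_call_rcu(obj)) {
		atomic_fetch_sub(&inflight, 1);
		free(obj);	/* slow path: optimization disabled */
		return false;
	}
	return true;
}
```

In this sketch the slow path simply frees the object at once.  In a real
subsystem the fallback would have to be whatever the non-optimized path
already does to make the free safe, which the thread does not specify.  Once
the simulated grace period drains the backlog, the counter drops back below
the limit and the deferred path is used again.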