* Re: [ovs-discuss] Double free in recent kernels after memleak fix
@ 2020-08-07 20:47 ` Joel Fernandes
2020-08-07 20:49 ` Joel Fernandes
2020-08-07 22:20 ` Paul E. McKenney
0 siblings, 2 replies; 14+ messages in thread
From: Joel Fernandes @ 2020-08-07 20:47 UTC
To: Johan Knöös
Cc: Gregory Rose, bugs, Tonghao Zhang, Netdev,
Uladzislau Rezki (Sony),
Paul E. McKenney, rcu
Hi,
Adding more of us who work on RCU as well. Johan, from another team at
Google, discovered a likely issue in Open vSwitch; details below:
On Fri, Aug 7, 2020 at 11:32 AM Johan Knöös <jknoos@google.com> wrote:
>
> On Tue, Aug 4, 2020 at 8:52 AM Gregory Rose <gvrose8192@gmail.com> wrote:
> >
> >
> >
> > On 8/3/2020 12:01 PM, Johan Knöös via discuss wrote:
> > > Hi Open vSwitch contributors,
> > >
> > > We have found openvswitch is causing double-freeing of memory. The
> > > issue was not present in kernel version 5.5.17 but is present in
> > > 5.6.14 and newer kernels.
> > >
> > > After reverting the RCU commits below for debugging, enabling
> > > slub_debug, lockdep, and KASAN, we see the warnings at the end of this
> > > email in the kernel log (the last one shows the double-free). When I
> > > revert 50b0e61b32ee890a75b4377d5fbe770a86d6a4c1 ("net: openvswitch:
> > > fix possible memleak on destroy flow-table"), the symptoms disappear.
> > > While I have a reliable way to reproduce the issue, I unfortunately
> > > don't yet have a process that's amenable to sharing. Please take a
> > > look.
> > >
> > > 189a6883dcf7 rcu: Remove kfree_call_rcu_nobatch()
> > > 77a40f97030b rcu: Remove kfree_rcu() special casing and lazy-callback handling
> > > e99637becb2e rcu: Add support for debug_objects debugging for kfree_rcu()
> > > 0392bebebf26 rcu: Add multiple in-flight batches of kfree_rcu() work
> > > 569d767087ef rcu: Make kfree_rcu() use a non-atomic ->monitor_todo
> > > a35d16905efc rcu: Add basic support for kfree_rcu() batching
> > >
Note that these reverts were made only so that the same code was being
compared: he was testing two different kernel versions, and only one of
them had this set of commits, so I asked him to revert them. There is no
known bug in the reverted code itself, but somehow these patches do make
the issue harder for him to reproduce.
> > > Thanks,
> > > Johan Knöös
> >
> > Let's add the author of the patch you reverted and the Linux netdev
> > mailing list.
> >
> > - Greg
>
> I found we also sometimes get warnings from
> https://elixir.bootlin.com/linux/v5.5.17/source/kernel/rcu/tree.c#L2239
> under similar conditions even on kernel 5.5.17, which I believe may be
> related. However, it's much rarer and I don't have a reliable way of
> reproducing it. Perhaps 50b0e61b32ee890a75b4377d5fbe770a86d6a4c1 only
> increases the frequency of a pre-existing bug.
This is interesting, because kbuild recently warned me [1] about it as
well, although in that case I was intentionally modifying the segcblist
code. I plan to debug it next week, but IMHO the warning itself is
unlikely to be caused by my patch, since it is somewhat orthogonal to
what I changed.
[1] https://lore.kernel.org/lkml/20200720005334.GC19262@shao2-debian/
But then again, I have not heard reports of this warning firing. Paul,
has this come to your radar recently?
Thanks,
- Joel
* Re: [ovs-discuss] Double free in recent kernels after memleak fix
2020-08-07 20:47 ` [ovs-discuss] Double free in recent kernels after memleak fix Joel Fernandes
@ 2020-08-07 20:49 ` Joel Fernandes
2020-08-07 22:20 ` Paul E. McKenney
1 sibling, 0 replies; 14+ messages in thread
From: Joel Fernandes @ 2020-08-07 20:49 UTC
To: Johan Knöös
Cc: Gregory Rose, bugs, Tonghao Zhang, Netdev,
Uladzislau Rezki (Sony),
Paul E. McKenney, rcu
On Fri, Aug 7, 2020 at 4:47 PM Joel Fernandes <joel@joelfernandes.org> wrote:
>
> Note that these reverts were only for testing the same code, because
> he was testing 2 different kernel versions. One of them did not have
> this set. So I asked him to revert. There's no known bug in the
> reverted code itself. But somehow these patches do make it harder for
> him to reproduce the issue.
The reason for this is likely that the additional kfree_rcu() batching
makes the crash occur less often.
thanks,
- Joel
* Re: [ovs-discuss] Double free in recent kernels after memleak fix
2020-08-07 20:47 ` [ovs-discuss] Double free in recent kernels after memleak fix Joel Fernandes
2020-08-07 20:49 ` Joel Fernandes
@ 2020-08-07 22:20 ` Paul E. McKenney
2020-08-07 23:05 ` Johan Knöös
2020-08-10 20:08 ` Joel Fernandes
1 sibling, 2 replies; 14+ messages in thread
From: Paul E. McKenney @ 2020-08-07 22:20 UTC
To: Joel Fernandes
Cc: Johan Knöös, Gregory Rose, bugs, Tonghao Zhang, Netdev,
Uladzislau Rezki (Sony),
rcu
On Fri, Aug 07, 2020 at 04:47:56PM -0400, Joel Fernandes wrote:
> Hi,
> Adding more of us working on RCU as well. Johan from another team at
> Google discovered a likely issue in openswitch, details below:
>
> On Fri, Aug 7, 2020 at 11:32 AM Johan Knöös <jknoos@google.com> wrote:
>
> Note that these reverts were only for testing the same code, because
> he was testing 2 different kernel versions. One of them did not have
> this set. So I asked him to revert. There's no known bug in the
> reverted code itself. But somehow these patches do make it harder for
> him to reproduce the issue.
Perhaps they adjust timing?
> >
> > I found we also sometimes get warnings from
> > https://elixir.bootlin.com/linux/v5.5.17/source/kernel/rcu/tree.c#L2239
> > under similar conditions even on kernel 5.5.17, which I believe may be
> > related. However, it's much rarer and I don't have a reliable way of
> > reproducing it. Perhaps 50b0e61b32ee890a75b4377d5fbe770a86d6a4c1 only
> > increases the frequency of a pre-existing bug.
>
> This is interesting, because I saw kbuild warn me recently [1] about
> it as well. Though, I was actually intentionally messing with the
> segcblist. I plan to debug it next week, but the warning itself is
> unlikely to be caused by my patch IMHO (since it is slightly
> orthogonal to what I changed).
>
> [1] https://lore.kernel.org/lkml/20200720005334.GC19262@shao2-debian/
>
> But then again, I have not heard reports of this warning firing. Paul,
> has this come to your radar recently?
I have not seen any recent WARNs in rcu_do_batch(). I am guessing that
this is one of the last two in that function?
If so, have you tried using CONFIG_DEBUG_OBJECTS_RCU_HEAD=y? That Kconfig
option is designed to help locate double frees via RCU.
Thanx, Paul
* Re: [ovs-discuss] Double free in recent kernels after memleak fix
2020-08-07 22:20 ` Paul E. McKenney
@ 2020-08-07 23:05 ` Johan Knöös
2020-08-08 11:44 ` Uladzislau Rezki
2020-08-10 20:08 ` Joel Fernandes
1 sibling, 1 reply; 14+ messages in thread
From: Johan Knöös @ 2020-08-07 23:05 UTC
To: paulmck
Cc: Joel Fernandes, Gregory Rose, bugs, Tonghao Zhang, Netdev,
Uladzislau Rezki (Sony),
rcu
On Fri, Aug 7, 2020 at 3:20 PM Paul E. McKenney <paulmck@kernel.org> wrote:
>
> On Fri, Aug 07, 2020 at 04:47:56PM -0400, Joel Fernandes wrote:
> >
> > Note that these reverts were only for testing the same code, because
> > he was testing 2 different kernel versions. One of them did not have
> > this set. So I asked him to revert. There's no known bug in the
> > reverted code itself. But somehow these patches do make it harder for
> > him to reproduce the issue.
I'm not certain whether the frequency of the issue changes with and
without these commits on 5.6.14, but at least the symptoms change. To
clarify, this is what I've observed with different kernels:
* 5.6.14: "kernel BUG at mm/slub.c:304!". Easily reproducible.
* 5.6.14 with the above RCU commits reverted: the warnings reported in
  my original email. Easily reproducible.
* 5.6.14 with the above RCU commits reverted and
  50b0e61b32ee890a75b4377d5fbe770a86d6a4c1 reverted: no warnings
  observed (the frequency might be the same as on 5.5.17).
* 5.5.17: warning at kernel/rcu/tree.c#L2239. Difficult to reproduce.
  Possibly a different root cause.
> Perhaps they adjust timing?
* Re: [ovs-discuss] Double free in recent kernels after memleak fix
2020-08-07 23:05 ` Johan Knöös
@ 2020-08-08 11:44 ` Uladzislau Rezki
0 siblings, 0 replies; 14+ messages in thread
From: Uladzislau Rezki @ 2020-08-08 11:44 UTC
To: Johan Knöös
Cc: paulmck, Joel Fernandes, Gregory Rose, bugs, Tonghao Zhang,
Netdev, Uladzislau Rezki (Sony),
rcu
> On Fri, Aug 7, 2020 at 3:20 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> > On Fri, Aug 07, 2020 at 04:47:56PM -0400, Joel Fernandes wrote:
> > >
> > > Note that these reverts were only for testing the same code, because
> > > he was testing 2 different kernel versions. One of them did not have
> > > this set. So I asked him to revert. There's no known bug in the
> > > reverted code itself. But somehow these patches do make it harder for
> > > him to reproduce the issue.
>
> I'm not certain the frequency of the issue changes with and without
> these commits on 5.6.14, but at least the symptoms/definition of the
> issue changes. To clarify, this is what I've observed with different
> kernels:
> * 5.6.14: "kernel BUG at mm/slub.c:304!". Easily reproducible.
> * 5.6.14 with the above RCU commits reverted: the warnings reported in
> my original email. Easily reproducible.
> * 5.6.14 with the above RCU commits reverted and
> 50b0e61b32ee890a75b4377d5fbe770a86d6a4c1 reverted: no warnings
> observed (the frequency might be the same as on 5.5.17).
> * 5.5.17: warning at kernel/rcu/tree.c#L2239. Difficult to reproduce.
> Maybe a different root cause.
>
If you can reproduce it, maybe enabling CONFIG_KASAN will detect something?
It can detect out-of-bounds and use-after-free bugs.
Thanks.
--
Vlad Rezki
* Re: [ovs-discuss] Double free in recent kernels after memleak fix
2020-08-07 22:20 ` Paul E. McKenney
2020-08-07 23:05 ` Johan Knöös
@ 2020-08-10 20:08 ` Joel Fernandes
2020-08-10 20:28 ` Paul E. McKenney
1 sibling, 1 reply; 14+ messages in thread
From: Joel Fernandes @ 2020-08-10 20:08 UTC
To: Paul E. McKenney
Cc: Johan Knöös, Gregory Rose, bugs, Tonghao Zhang, Netdev,
Uladzislau Rezki (Sony),
rcu
On Fri, Aug 07, 2020 at 03:20:15PM -0700, Paul E. McKenney wrote:
> On Fri, Aug 07, 2020 at 04:47:56PM -0400, Joel Fernandes wrote:
> >
> > Note that these reverts were only for testing the same code, because
> > he was testing 2 different kernel versions. One of them did not have
> > this set. So I asked him to revert. There's no known bug in the
> > reverted code itself. But somehow these patches do make it harder for
> > him to reproduce the issue.
>
> Perhaps they adjust timing?
Yes, that could be it. In my testing (which is unrelated to OVS), the issue
happens only with the TREE02 rcutorture scenario; I can reproduce the issue
in [1] just by booting TREE02.
I could have broken something in my segcblist count patch; any hints
would be great. I'll dig into it more as well.
> >
> > But then again, I have not heard reports of this warning firing. Paul,
> > has this come to your radar recently?
>
> I have not seen any recent WARNs in rcu_do_batch(). I am guessing that
> this is one of the last two in that function?
>
> If so, have you tried using CONFIG_DEBUG_OBJECTS_RCU_HEAD=y? That Kconfig
> option is designed to help locate double frees via RCU.
True, and kfree_rcu() also supports this. Johan, did you get a
chance to try this out in your failure scenario?
thanks,
- Joel
[1] https://lore.kernel.org/lkml/20200720005334.GC19262@shao2-debian/
* Re: [ovs-discuss] Double free in recent kernels after memleak fix
2020-08-10 20:08 ` Joel Fernandes
@ 2020-08-10 20:28 ` Paul E. McKenney
2020-08-11 1:14 ` Tonghao Zhang
0 siblings, 1 reply; 14+ messages in thread
From: Paul E. McKenney @ 2020-08-10 20:28 UTC
To: Joel Fernandes
Cc: Johan Knöös, Gregory Rose, bugs, Tonghao Zhang, Netdev,
Uladzislau Rezki (Sony),
rcu
On Mon, Aug 10, 2020 at 04:08:59PM -0400, Joel Fernandes wrote:
> On Fri, Aug 07, 2020 at 03:20:15PM -0700, Paul E. McKenney wrote:
> > On Fri, Aug 07, 2020 at 04:47:56PM -0400, Joel Fernandes wrote:
> > >
> > > Note that these reverts were only for testing the same code, because
> > > he was testing 2 different kernel versions. One of them did not have
> > > this set. So I asked him to revert. There's no known bug in the
> > > reverted code itself. But somehow these patches do make it harder for
> > > him to reproduce the issue.
> >
> > Perhaps they adjust timing?
>
> Yes that could be it. In my testing (which is unrelated to OVS), the issue
> happens only with TREE02. I can reproduce the issue in [1] on just boot-up of
> TREE02.
>
> I could have screwed up something in my segcblist count patch, any hints
> would be great. I'll dig more into it as well.
Has anyone taken a close look at 50b0e61b32ee ("net: openvswitch: fix
possible memleak on destroy flow-table") commit? Maybe it avoided the
memleak so thoroughly that it did a double free?
Thanx, Paul
> > > But then again, I have not heard reports of this warning firing. Paul,
> > > has this come to your radar recently?
> >
> > I have not seen any recent WARNs in rcu_do_batch(). I am guessing that
> > this is one of the last two in that function?
> >
> > If so, have you tried using CONFIG_DEBUG_OBJECTS_RCU_HEAD=y? That Kconfig
> > option is designed to help locate double frees via RCU.
>
> Yes true, kfree_rcu() also has support for this. Jonathan, did you get a
> chance to try this out in your failure scenario?
>
> thanks,
>
> - Joel
>
> [1] https://lore.kernel.org/lkml/20200720005334.GC19262@shao2-debian/
* Re: [ovs-discuss] Double free in recent kernels after memleak fix
2020-08-10 20:28 ` Paul E. McKenney
@ 2020-08-11 1:14 ` Tonghao Zhang
2020-08-11 2:24 ` Cong Wang
0 siblings, 1 reply; 14+ messages in thread
From: Tonghao Zhang @ 2020-08-11 1:14 UTC
To: paulmck
Cc: Joel Fernandes, Johan Knöös, Gregory Rose, bugs,
Netdev, Uladzislau Rezki (Sony),
rcu
On Tue, Aug 11, 2020 at 4:28 AM Paul E. McKenney <paulmck@kernel.org> wrote:
>
> On Mon, Aug 10, 2020 at 04:08:59PM -0400, Joel Fernandes wrote:
> > On Fri, Aug 07, 2020 at 03:20:15PM -0700, Paul E. McKenney wrote:
> > > On Fri, Aug 07, 2020 at 04:47:56PM -0400, Joel Fernandes wrote:
> > > >
> > > > Note that these reverts were only for testing the same code, because
> > > > he was testing 2 different kernel versions. One of them did not have
> > > > this set. So I asked him to revert. There's no known bug in the
> > > > reverted code itself. But somehow these patches do make it harder for
> > > > him to reproduce the issue.
> > >
> > > Perhaps they adjust timing?
> >
> > Yes that could be it. In my testing (which is unrelated to OVS), the issue
> > happens only with TREE02. I can reproduce the issue in [1] on just boot-up of
> > TREE02.
> >
> > I could have screwed up something in my segcblist count patch, any hints
> > would be great. I'll dig more into it as well.
>
> Has anyone taken a close look at 50b0e61b32ee ("net: openvswitch: fix
> possible memleak on destroy flow-table") commit? Maybe it avoided the
> memleak so thoroughly that it did a double free?
Hi all, I sent a patch to fix this. With it, the RCU warnings disappear.
I can't reproduce the double-free issue, but I guess this patch may
address it as well.
http://patchwork.ozlabs.org/project/netdev/patch/20200811011001.75690-1-xiangxia.m.yue@gmail.com/
> Thanx, Paul
>
> > > > But then again, I have not heard reports of this warning firing. Paul,
> > > > has this come to your radar recently?
> > >
> > > I have not seen any recent WARNs in rcu_do_batch(). I am guessing that
> > > this is one of the last two in that function?
> > >
> > > If so, have you tried using CONFIG_DEBUG_OBJECTS_RCU_HEAD=y? That Kconfig
> > > option is designed to help locate double frees via RCU.
> >
> > Yes true, kfree_rcu() also has support for this. Jonathan, did you get a
> > chance to try this out in your failure scenario?
> >
> > thanks,
> >
> > - Joel
> >
> > [1] https://lore.kernel.org/lkml/20200720005334.GC19262@shao2-debian/
--
Best regards, Tonghao
* Re: [ovs-discuss] Double free in recent kernels after memleak fix
2020-08-11 1:14 ` Tonghao Zhang
@ 2020-08-11 2:24 ` Cong Wang
2020-08-11 3:26 ` Tonghao Zhang
0 siblings, 1 reply; 14+ messages in thread
From: Cong Wang @ 2020-08-11 2:24 UTC
To: Tonghao Zhang
Cc: Paul E . McKenney, Joel Fernandes, Johan Knöös,
Gregory Rose, bugs, Netdev, Uladzislau Rezki (Sony),
rcu
On Mon, Aug 10, 2020 at 6:16 PM Tonghao Zhang <xiangxia.m.yue@gmail.com> wrote:
> Hi all, I send a patch to fix this. The rcu warnings disappear. I
> don't reproduce the double free issue.
> But I guess this patch may address this issue.
>
> http://patchwork.ozlabs.org/project/netdev/patch/20200811011001.75690-1-xiangxia.m.yue@gmail.com/
I don't see how your patch addresses the double-free, as we still
free the mask array twice after your patch: once in tbl_mask_array_realloc()
and once in ovs_flow_tbl_destroy().
Have you tried my patch, which is supposed to address this double-free?
It simply skips the reallocation, as it makes no sense to trigger a
reallocation when destroying the table.
Thanks.
* Re: [ovs-discuss] Double free in recent kernels after memleak fix
2020-08-11 2:24 ` Cong Wang
@ 2020-08-11 3:26 ` Tonghao Zhang
2020-08-11 4:07 ` Cong Wang
0 siblings, 1 reply; 14+ messages in thread
From: Tonghao Zhang @ 2020-08-11 3:26 UTC
To: Cong Wang
Cc: Paul E . McKenney, Joel Fernandes, Johan Knöös,
Gregory Rose, bugs, Netdev, Uladzislau Rezki (Sony),
rcu
On Tue, Aug 11, 2020 at 10:24 AM Cong Wang <xiyou.wangcong@gmail.com> wrote:
>
> On Mon, Aug 10, 2020 at 6:16 PM Tonghao Zhang <xiangxia.m.yue@gmail.com> wrote:
> > Hi all, I send a patch to fix this. The rcu warnings disappear. I
> > don't reproduce the double free issue.
> > But I guess this patch may address this issue.
> >
> > http://patchwork.ozlabs.org/project/netdev/patch/20200811011001.75690-1-xiangxia.m.yue@gmail.com/
>
> I don't see how your patch address the double-free, as we still
> free mask array twice after your patch: once in tbl_mask_array_realloc()
> and once in ovs_flow_tbl_destroy().
Hi Cong.
Before my patch, we used ovsl_dereference() (i.e.,
rcu_dereference_protected()) in the RCU callback:
  ovs_flow_tbl_destroy
    -> table_instance_destroy
      -> table_instance_flow_free
        -> flow_mask_remove
             ASSERT_OVSL (will print a warning)
          -> tbl_mask_array_del_mask
               ovsl_dereference (RCU usage warning)
So we should invoke table_instance_destroy() and the other helpers
under ovs_lock to avoid both the ASSERT_OVSL warning and the RCU usage
warning.
With this patch, we reallocate the mask_array under ovs_lock and only
free it in the RCU callback. Without it, we both reallocate and free it
in the RCU callback.
I think this patch may fix it.
> Have you tried my patch which is supposed to address this double-free?
I can't reproduce it, but your patch does not avoid the RCU usage
warning or the ASSERT_OVSL warning.
> It simply skips the reallocation as it makes no sense to trigger reallocation
> when destroying it.
>
> Thanks.
--
Best regards, Tonghao
* Re: [ovs-discuss] Double free in recent kernels after memleak fix
2020-08-11 3:26 ` Tonghao Zhang
@ 2020-08-11 4:07 ` Cong Wang
2020-08-11 5:58 ` Tonghao Zhang
0 siblings, 1 reply; 14+ messages in thread
From: Cong Wang @ 2020-08-11 4:07 UTC
To: Tonghao Zhang
Cc: Paul E . McKenney, Joel Fernandes, Johan Knöös,
Gregory Rose, bugs, Netdev, Uladzislau Rezki (Sony),
rcu
On Mon, Aug 10, 2020 at 8:27 PM Tonghao Zhang <xiangxia.m.yue@gmail.com> wrote:
>
> On Tue, Aug 11, 2020 at 10:24 AM Cong Wang <xiyou.wangcong@gmail.com> wrote:
> >
> > On Mon, Aug 10, 2020 at 6:16 PM Tonghao Zhang <xiangxia.m.yue@gmail.com> wrote:
> > > Hi all, I sent a patch to fix this. With it, the RCU warnings disappear. I
> > > couldn't reproduce the double-free issue,
> > > but I guess this patch may address it.
> > >
> > > http://patchwork.ozlabs.org/project/netdev/patch/20200811011001.75690-1-xiangxia.m.yue@gmail.com/
> >
> > I don't see how your patch addresses the double-free, as we still
> > free the mask array twice after your patch: once in tbl_mask_array_realloc()
> > and once in ovs_flow_tbl_destroy().
> Hi Cong.
> Before my patch, we used ovsl_dereference
> (rcu_dereference_protected) in the RCU callback:
> ovs_flow_tbl_destroy
>   ->table_instance_destroy
>     ->table_instance_flow_free
>       ->flow_mask_remove
>         ASSERT_OVSL (will print a warning)
>         ->tbl_mask_array_del_mask
>           ovsl_dereference (RCU usage warning)
>
I understand how your patch addresses the RCU annotation issue,
which is different from double-free.
> So we should invoke table_instance_destroy and the others under
> ovs_lock to avoid the ASSERT_OVSL and RCU usage warnings.
Of course... I never doubted it.
> With this patch, we reallocate the mask_array under ovs_lock and free
> it in the RCU callback. Without it, we both reallocate and free it in the
> RCU callback.
> I think this patch may fix it.
Does it matter in which context tbl_mask_array_realloc() is called?
Even with ovs_lock, we can still double free:
ovs_lock()
    tbl_mask_array_realloc()
        => call_rcu(&old->rcu, mask_array_rcu_cb);
ovs_unlock()
...
ovs_flow_tbl_destroy()
    => call_rcu(&old->rcu, mask_array_rcu_cb);
So still twice, right? To fix the double-free, we have to eliminate one
of them, don't we? ;)
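Cong's concern can be made concrete with a toy deferred-free queue: if the destroy path uses a mask-array pointer that was read before tbl_mask_array_realloc() ran, the same object is handed to call_rcu() twice and would be freed twice. This is a userspace model only; `call_rcu_sim()` and its `queued` flag (loosely analogous to what RCU debug-objects checking tracks for an rcu_head) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct mask_array_sim {
    int max;
    bool queued;  /* hypothetical: already handed to call_rcu_sim() */
};

struct table_sim {
    struct mask_array_sim *mask_array;
};

#define MAX_PENDING 8
static struct mask_array_sim *pending[MAX_PENDING];
static int npending;
static int double_call_rcu;

static struct mask_array_sim *mask_array_alloc_sim(int max)
{
    struct mask_array_sim *ma = calloc(1, sizeof(*ma));
    assert(ma != NULL);
    ma->max = max;
    return ma;
}

/* Stand-in for call_rcu(&ma->rcu, mask_array_rcu_cb): defer the free. */
static void call_rcu_sim(struct mask_array_sim *ma)
{
    if (ma->queued) {
        double_call_rcu++;  /* in the kernel this object would be queued (and freed) twice */
        return;
    }
    ma->queued = true;
    assert(npending < MAX_PENDING);
    pending[npending++] = ma;
}

/* Stand-in for a grace period ending: run every queued callback. */
static void run_rcu_callbacks_sim(void)
{
    for (int i = 0; i < npending; i++)
        free(pending[i]);
    npending = 0;
}

/* Shrink: publish a smaller array and defer-free the old one. */
static void tbl_mask_array_realloc_sim(struct table_sim *tbl, int max)
{
    struct mask_array_sim *old = tbl->mask_array;
    tbl->mask_array = mask_array_alloc_sim(max);
    call_rcu_sim(old);
}

/* Destroy path that frees a pointer cached before the realloc. */
static void destroy_with_stale_pointer_sim(struct table_sim *tbl)
{
    struct mask_array_sim *old = tbl->mask_array;  /* cached copy */
    tbl_mask_array_realloc_sim(tbl, 1);            /* queues old: first time */
    call_rcu_sim(old);                             /* queues old: second time */
}
```

The guard in `call_rcu_sim()` exists only so the hazard is observable without actually freeing twice; whether the real code hits this case depends on whether the destroy path caches the pointer or refetches it.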
>
> > Have you tried my patch which is supposed to address this double-free?
> I couldn't reproduce it, but your patch does not avoid the RCU usage warning
> and ASSERT_OVSL.
Sure, I never intended to fix anything but the double-free. The $subject is
about the double-free, I double-checked. ;)
Thanks.
* Re: [ovs-discuss] Double free in recent kernels after memleak fix
2020-08-11 4:07 ` Cong Wang
@ 2020-08-11 5:58 ` Tonghao Zhang
2020-08-11 18:28 ` Johan Knöös
2020-08-12 0:43 ` Cong Wang
0 siblings, 2 replies; 14+ messages in thread
From: Tonghao Zhang @ 2020-08-11 5:58 UTC
To: Cong Wang
Cc: Paul E . McKenney, Joel Fernandes, Johan Knöös,
Gregory Rose, bugs, Netdev, Uladzislau Rezki (Sony),
rcu
On Tue, Aug 11, 2020 at 12:08 PM Cong Wang <xiyou.wangcong@gmail.com> wrote:
>
> On Mon, Aug 10, 2020 at 8:27 PM Tonghao Zhang <xiangxia.m.yue@gmail.com> wrote:
> >
> > On Tue, Aug 11, 2020 at 10:24 AM Cong Wang <xiyou.wangcong@gmail.com> wrote:
> > >
> > > On Mon, Aug 10, 2020 at 6:16 PM Tonghao Zhang <xiangxia.m.yue@gmail.com> wrote:
> > > > Hi all, I sent a patch to fix this. With it, the RCU warnings disappear. I
> > > > couldn't reproduce the double-free issue,
> > > > but I guess this patch may address it.
> > > >
> > > > http://patchwork.ozlabs.org/project/netdev/patch/20200811011001.75690-1-xiangxia.m.yue@gmail.com/
> > >
> > > I don't see how your patch addresses the double-free, as we still
> > > free the mask array twice after your patch: once in tbl_mask_array_realloc()
> > > and once in ovs_flow_tbl_destroy().
> > Hi Cong.
> > Before my patch, we used ovsl_dereference
> > (rcu_dereference_protected) in the RCU callback:
> > ovs_flow_tbl_destroy
> >   ->table_instance_destroy
> >     ->table_instance_flow_free
> >       ->flow_mask_remove
> >         ASSERT_OVSL (will print a warning)
> >         ->tbl_mask_array_del_mask
> >           ovsl_dereference (RCU usage warning)
> >
>
> I understand how your patch addresses the RCU annotation issue,
> which is different from double-free.
>
>
> > So we should invoke table_instance_destroy and the others under
> > ovs_lock to avoid the ASSERT_OVSL and RCU usage warnings.
>
> Of course... I never doubted it.
>
>
> > With this patch, we reallocate the mask_array under ovs_lock and free
> > it in the RCU callback. Without it, we both reallocate and free it in the
> > RCU callback.
> > I think this patch may fix it.
>
> Does it matter in which context tbl_mask_array_realloc() is called?
> Even with ovs_lock, we can still double free:
>
> ovs_lock()
>     tbl_mask_array_realloc()
>         => call_rcu(&old->rcu, mask_array_rcu_cb);
> ovs_unlock()
> ...
> ovs_flow_tbl_destroy()
>     => call_rcu(&old->rcu, mask_array_rcu_cb);
>
> So still twice, right? To fix the double-free, we have to eliminate one
> of them, don't we? ;)
No.
Without my patch, in the RCU callback:
ovs_flow_tbl_destroy
  ->call_rcu(&ma->rcu, mask_array_rcu_cb);
  ->table_instance_destroy
    ->tbl_mask_array_realloc (shrink the mask array if necessary)
      ->call_rcu(&old->rcu, mask_array_rcu_cb);
With the patch:
ovs_lock
  table_instance_flow_flush (free the flows)
  tbl_mask_array_realloc (shrink the mask array if necessary; this frees the
    old mask_array via mask_array_rcu_cb and publishes the new mask_array
    with rcu_assign_pointer())
ovs_unlock
In the RCU callback:
ovs_flow_tbl_destroy
  call_rcu(&ma->rcu, mask_array_rcu_cb); (this is the new mask_array)
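Tonghao's argument hinges on ordering: once the shrink has run under ovs_lock and published a new array, the destroy path refetches the pointer from the table and queues that new array, so the two callbacks target different objects. A userspace model of the patched ordering, with hypothetical `_sim` names standing in for the kernel functions, that checks no object is queued twice:

```c
#include <assert.h>
#include <stdlib.h>

struct mask_array_sim {
    int max;
};

/* Stands in for the flow table reached via &dp->table. */
struct table_sim {
    struct mask_array_sim *mask_array;
};

#define MAX_PENDING 8
static struct mask_array_sim *pending[MAX_PENDING];
static int npending;

static struct mask_array_sim *mask_array_alloc_sim(int max)
{
    struct mask_array_sim *ma = malloc(sizeof(*ma));
    assert(ma != NULL);
    ma->max = max;
    return ma;
}

/* Stand-in for call_rcu(&ma->rcu, mask_array_rcu_cb). */
static void call_rcu_sim(struct mask_array_sim *ma)
{
    assert(npending < MAX_PENDING);
    pending[npending++] = ma;
}

/* Number of queued callback pairs that target the same object. */
static int duplicate_callbacks(void)
{
    int dups = 0;
    for (int i = 0; i < npending; i++)
        for (int j = i + 1; j < npending; j++)
            if (pending[i] == pending[j])
                dups++;
    return dups;
}

static void run_rcu_callbacks_sim(void)
{
    for (int i = 0; i < npending; i++)
        free(pending[i]);
    npending = 0;
}

/* Under ovs_lock: publish a smaller array (rcu_assign_pointer() in the
 * kernel) and defer-free the old one. */
static void tbl_mask_array_realloc_sim(struct table_sim *tbl, int max)
{
    struct mask_array_sim *old = tbl->mask_array;
    tbl->mask_array = mask_array_alloc_sim(max);
    call_rcu_sim(old);
}

/* Later destroy path: refetch the pointer from the table, so it sees the
 * array published by the realloc rather than a stale copy. */
static void ovs_flow_tbl_destroy_sim(struct table_sim *tbl)
{
    struct mask_array_sim *ma = tbl->mask_array;
    tbl->mask_array = NULL;
    call_rcu_sim(ma);
}
```

The key line is the refetch in `ovs_flow_tbl_destroy_sim()`; Cong's final reply in this thread reaches the same conclusion about &dp->table.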
>
> >
> > > Have you tried my patch which is supposed to address this double-free?
> > I couldn't reproduce it, but your patch does not avoid the RCU usage warning
> > and ASSERT_OVSL.
>
> Sure, I never intended to fix anything but the double-free. The $subject is
> about the double-free, I double-checked. ;)
>
> Thanks.
--
Best regards, Tonghao
* Re: [ovs-discuss] Double free in recent kernels after memleak fix
2020-08-11 5:58 ` Tonghao Zhang
@ 2020-08-11 18:28 ` Johan Knöös
2020-08-12 0:43 ` Cong Wang
1 sibling, 0 replies; 14+ messages in thread
From: Johan Knöös @ 2020-08-11 18:28 UTC
To: Tonghao Zhang
Cc: Cong Wang, Paul E . McKenney, Joel Fernandes, Gregory Rose, bugs,
Netdev, Uladzislau Rezki (Sony),
rcu
On Mon, Aug 10, 2020 at 10:59 PM Tonghao Zhang <xiangxia.m.yue@gmail.com> wrote:
>
> On Tue, Aug 11, 2020 at 12:08 PM Cong Wang <xiyou.wangcong@gmail.com> wrote:
> >
> > On Mon, Aug 10, 2020 at 8:27 PM Tonghao Zhang <xiangxia.m.yue@gmail.com> wrote:
> > >
> > > On Tue, Aug 11, 2020 at 10:24 AM Cong Wang <xiyou.wangcong@gmail.com> wrote:
> > > >
> > > > On Mon, Aug 10, 2020 at 6:16 PM Tonghao Zhang <xiangxia.m.yue@gmail.com> wrote:
> > > > > Hi all, I sent a patch to fix this. With it, the RCU warnings disappear. I
> > > > > couldn't reproduce the double-free issue,
> > > > > but I guess this patch may address it.
> > > > >
> > > > > http://patchwork.ozlabs.org/project/netdev/patch/20200811011001.75690-1-xiangxia.m.yue@gmail.com/
> > > >
> > > > I don't see how your patch addresses the double-free, as we still
> > > > free the mask array twice after your patch: once in tbl_mask_array_realloc()
> > > > and once in ovs_flow_tbl_destroy().
> > > Hi Cong.
> > > Before my patch, we used ovsl_dereference
> > > (rcu_dereference_protected) in the RCU callback:
> > > ovs_flow_tbl_destroy
> > >   ->table_instance_destroy
> > >     ->table_instance_flow_free
> > >       ->flow_mask_remove
> > >         ASSERT_OVSL (will print a warning)
> > >         ->tbl_mask_array_del_mask
> > >           ovsl_dereference (RCU usage warning)
> > >
> >
> > I understand how your patch addresses the RCU annotation issue,
> > which is different from double-free.
> >
> >
> > > So we should invoke table_instance_destroy and the others under
> > > ovs_lock to avoid the ASSERT_OVSL and RCU usage warnings.
> >
> > Of course... I never doubted it.
> >
> >
> > > With this patch, we reallocate the mask_array under ovs_lock and free
> > > it in the RCU callback. Without it, we both reallocate and free it in the
> > > RCU callback.
> > > I think this patch may fix it.
> >
> > Does it matter in which context tbl_mask_array_realloc() is called?
> > Even with ovs_lock, we can still double free:
> >
> > ovs_lock()
> >     tbl_mask_array_realloc()
> >         => call_rcu(&old->rcu, mask_array_rcu_cb);
> > ovs_unlock()
> > ...
> > ovs_flow_tbl_destroy()
> >     => call_rcu(&old->rcu, mask_array_rcu_cb);
> >
> > So still twice, right? To fix the double-free, we have to eliminate one
> > of them, don't we? ;)
> No.
> Without my patch, in the RCU callback:
> ovs_flow_tbl_destroy
>   ->call_rcu(&ma->rcu, mask_array_rcu_cb);
>   ->table_instance_destroy
>     ->tbl_mask_array_realloc (shrink the mask array if necessary)
>       ->call_rcu(&old->rcu, mask_array_rcu_cb);
>
> With the patch:
> ovs_lock
>   table_instance_flow_flush (free the flows)
>   tbl_mask_array_realloc (shrink the mask array if necessary; this frees the
>     old mask_array via mask_array_rcu_cb and publishes the new mask_array
>     with rcu_assign_pointer())
> ovs_unlock
>
> In the RCU callback:
> ovs_flow_tbl_destroy
>   call_rcu(&ma->rcu, mask_array_rcu_cb); (this is the new mask_array)
>
> >
> > >
> > > > Have you tried my patch which is supposed to address this double-free?
> > > I couldn't reproduce it, but your patch does not avoid the RCU usage warning
> > > and ASSERT_OVSL.
> >
> > Sure, I never intended to fix anything but the double-free. The $subject is
> > about the double-free, I double-checked. ;)
> >
> > Thanks.
>
>
>
> --
> Best regards, Tonghao
Cong and Tonghao, thanks for your patches. I cannot reproduce the
double-free with either of them, and the "suspicious RCU usage" and the
ASSERT_OVSL warnings are also gone with Tonghao's patch.
Tonghao, from your sequence above it looks like it should fix the
https://elixir.bootlin.com/linux/v5.5.17/source/kernel/rcu/tree.c#L2239
warning, correct?
* Re: [ovs-discuss] Double free in recent kernels after memleak fix
2020-08-11 5:58 ` Tonghao Zhang
2020-08-11 18:28 ` Johan Knöös
@ 2020-08-12 0:43 ` Cong Wang
1 sibling, 0 replies; 14+ messages in thread
From: Cong Wang @ 2020-08-12 0:43 UTC
To: Tonghao Zhang
Cc: Paul E . McKenney, Joel Fernandes, Johan Knöös,
Gregory Rose, bugs, Netdev, Uladzislau Rezki (Sony),
rcu
On Mon, Aug 10, 2020 at 10:59 PM Tonghao Zhang <xiangxia.m.yue@gmail.com> wrote:
>
> On Tue, Aug 11, 2020 at 12:08 PM Cong Wang <xiyou.wangcong@gmail.com> wrote:
> >
> > On Mon, Aug 10, 2020 at 8:27 PM Tonghao Zhang <xiangxia.m.yue@gmail.com> wrote:
> > >
> > > On Tue, Aug 11, 2020 at 10:24 AM Cong Wang <xiyou.wangcong@gmail.com> wrote:
> > > >
> > > > On Mon, Aug 10, 2020 at 6:16 PM Tonghao Zhang <xiangxia.m.yue@gmail.com> wrote:
> > > > > Hi all, I sent a patch to fix this. With it, the RCU warnings disappear. I
> > > > > couldn't reproduce the double-free issue,
> > > > > but I guess this patch may address it.
> > > > >
> > > > > http://patchwork.ozlabs.org/project/netdev/patch/20200811011001.75690-1-xiangxia.m.yue@gmail.com/
> > > >
> > > > I don't see how your patch addresses the double-free, as we still
> > > > free the mask array twice after your patch: once in tbl_mask_array_realloc()
> > > > and once in ovs_flow_tbl_destroy().
> > > Hi Cong.
> > > Before my patch, we used ovsl_dereference
> > > (rcu_dereference_protected) in the RCU callback:
> > > ovs_flow_tbl_destroy
> > >   ->table_instance_destroy
> > >     ->table_instance_flow_free
> > >       ->flow_mask_remove
> > >         ASSERT_OVSL (will print a warning)
> > >         ->tbl_mask_array_del_mask
> > >           ovsl_dereference (RCU usage warning)
> > >
> >
> > I understand how your patch addresses the RCU annotation issue,
> > which is different from double-free.
> >
> >
> > > So we should invoke table_instance_destroy and the others under
> > > ovs_lock to avoid the ASSERT_OVSL and RCU usage warnings.
> >
> > Of course... I never doubted it.
> >
> >
> > > With this patch, we reallocate the mask_array under ovs_lock and free
> > > it in the RCU callback. Without it, we both reallocate and free it in the
> > > RCU callback.
> > > I think this patch may fix it.
> >
> > Does it matter in which context tbl_mask_array_realloc() is called?
> > Even with ovs_lock, we can still double free:
> >
> > ovs_lock()
> >     tbl_mask_array_realloc()
> >         => call_rcu(&old->rcu, mask_array_rcu_cb);
> > ovs_unlock()
> > ...
> > ovs_flow_tbl_destroy()
> >     => call_rcu(&old->rcu, mask_array_rcu_cb);
> >
> > So still twice, right? To fix the double-free, we have to eliminate one
> > of them, don't we? ;)
> No.
> Without my patch, in the RCU callback:
> ovs_flow_tbl_destroy
>   ->call_rcu(&ma->rcu, mask_array_rcu_cb);
>   ->table_instance_destroy
>     ->tbl_mask_array_realloc (shrink the mask array if necessary)
>       ->call_rcu(&old->rcu, mask_array_rcu_cb);
>
> With the patch:
> ovs_lock
>   table_instance_flow_flush (free the flows)
>   tbl_mask_array_realloc (shrink the mask array if necessary; this frees the
>     old mask_array via mask_array_rcu_cb and publishes the new mask_array
>     with rcu_assign_pointer())
> ovs_unlock
>
> In the RCU callback:
> ovs_flow_tbl_destroy
>   call_rcu(&ma->rcu, mask_array_rcu_cb); (this is the new mask_array)
Ah, I see. I thought the mask array was cached in the caller and passed along;
it is in fact refetched via &dp->table.
Thanks.
Thread overview: 14+ messages
[not found] <CA+Sh73MJhqs7PBk6OV2AhzVjYvE1foUQUnwP5DwWR44LHZRZ9w@mail.gmail.com>
[not found] ` <58be64c5-9ae4-95ff-629e-f55e47ff020b@gmail.com>
[not found] ` <CA+Sh73NeNr+UNZYDfD1nHUXCY-P8mT1vJdm0cEY4MPwo_0PtzQ@mail.gmail.com>
2020-08-07 20:47 ` [ovs-discuss] Double free in recent kernels after memleak fix Joel Fernandes
2020-08-07 20:49 ` Joel Fernandes
2020-08-07 22:20 ` Paul E. McKenney
2020-08-07 23:05 ` Johan Knöös
2020-08-08 11:44 ` Uladzislau Rezki
2020-08-10 20:08 ` Joel Fernandes
2020-08-10 20:28 ` Paul E. McKenney
2020-08-11 1:14 ` Tonghao Zhang
2020-08-11 2:24 ` Cong Wang
2020-08-11 3:26 ` Tonghao Zhang
2020-08-11 4:07 ` Cong Wang
2020-08-11 5:58 ` Tonghao Zhang
2020-08-11 18:28 ` Johan Knöös
2020-08-12 0:43 ` Cong Wang