linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] net: check negative value for signed refcnt
@ 2019-01-31 13:20 alexandre.besnard
  2019-01-31 13:49 ` Kirill Tkhai
  0 siblings, 1 reply; 7+ messages in thread
From: alexandre.besnard @ 2019-01-31 13:20 UTC (permalink / raw)
  To: davem, ktkhai, ecree, jiri, petrm, alexander.h.duyck,
	amritha.nambiar, lirongqing
  Cc: netdev, linux-kernel, Alexandre Besnard

From: Alexandre Besnard <alexandre.besnard@softathome.com>

Device remaining references counter is get as a signed integer.

When unregistering network devices, the loop waiting for this counter
to decrement tests the 0 strict equality. Thus if an error occurs and
two references are given back by a protocol, we are stuck in the loop
forever, with a -1 value.

Robustness is added by checking a negative value: the device is then
considered free of references, and a warning is issued (it should not
happen, one should check that behavior)

Signed-off-by: Alexandre Besnard <alexandre.besnard@softathome.com>
---
 net/core/dev.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/net/core/dev.c b/net/core/dev.c
index ddc551f..e4190ae 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -8687,6 +8687,11 @@ static void netdev_wait_allrefs(struct net_device *dev)
 	refcnt = netdev_refcnt_read(dev);

 	while (refcnt != 0) {
+		if (refcnt < 0) {
+			pr_warn("Device %s refcnt negative: device considered free, but it should not happen\n",
+				dev->name);
+			break;
+		}
 		if (time_after(jiffies, rebroadcast_time + 1 * HZ)) {
 			rtnl_lock();

--
2.7.4


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] net: check negative value for signed refcnt
  2019-01-31 13:20 [PATCH] net: check negative value for signed refcnt alexandre.besnard
@ 2019-01-31 13:49 ` Kirill Tkhai
  2019-01-31 15:14   ` Alexandre BESNARD
  2019-01-31 15:15   ` Eric Dumazet
  0 siblings, 2 replies; 7+ messages in thread
From: Kirill Tkhai @ 2019-01-31 13:49 UTC (permalink / raw)
  To: alexandre.besnard, davem, ecree, jiri, petrm, alexander.h.duyck,
	amritha.nambiar, lirongqing
  Cc: netdev, linux-kernel

Hi, Alexandre,

On 31.01.2019 16:20, alexandre.besnard@softathome.com wrote:
> From: Alexandre Besnard <alexandre.besnard@softathome.com>
> 
> Device remaining references counter is get as a signed integer.
> 
> When unregistering network devices, the loop waiting for this counter
> to decrement tests the 0 strict equality. Thus if an error occurs and
> two references are given back by a protocol, we are stuck in the loop
> forever, with a -1 value.
> 
> Robustness is added by checking a negative value: the device is then
> considered free of references, and a warning is issued (it should not
> happen, one should check that behavior)
> 
> Signed-off-by: Alexandre Besnard <alexandre.besnard@softathome.com>
> ---
>  net/core/dev.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/net/core/dev.c b/net/core/dev.c
> index ddc551f..e4190ae 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -8687,6 +8687,11 @@ static void netdev_wait_allrefs(struct net_device *dev)
>  	refcnt = netdev_refcnt_read(dev);
> 
>  	while (refcnt != 0) {
> +		if (refcnt < 0) {
> +			pr_warn("Device %s refcnt negative: device considered free, but it should not happen\n",
> +				dev->name);
> +			break;
> +		}

1)I don't think this is a good approach. Negative value does not guarantee
there is just a double put of device reference. Negative value is an indicator
something goes wrong, and we definitely should not free device memory in
this case.

2)Not related to your patch -- it looks like we have problem in existing
code with this netdev_refcnt_read(). It does not imply a memory ordering
or some guarantees about reading percpu values. For example, in generic
code struct percpu_ref switches a counter into atomic mode before it checks
for the last reference. But there is nothing in netdev_refcnt_read().



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] net: check negative value for signed refcnt
  2019-01-31 13:49 ` Kirill Tkhai
@ 2019-01-31 15:14   ` Alexandre BESNARD
  2019-01-31 15:31     ` Kirill Tkhai
  2019-01-31 15:15   ` Eric Dumazet
  1 sibling, 1 reply; 7+ messages in thread
From: Alexandre BESNARD @ 2019-01-31 15:14 UTC (permalink / raw)
  To: Kirill Tkhai
  Cc: davem, amritha nambiar, lirongqing, netdev, linux-kernel,
	alexander h duyck, jiri, petrm, ecree

Hi Kirill, and thanks for your time,

On 31 Jan 19 14:49, Kirill Tkhai ktkhai@virtuozzo.com wrote :

> Hi, Alexandre,

> On 31.01.2019 16:20, alexandre.besnard@softathome.com wrote:
> > From: Alexandre Besnard <alexandre.besnard@softathome.com>

> > Device remaining references counter is get as a signed integer.

> > When unregistering network devices, the loop waiting for this counter
> > to decrement tests the 0 strict equality. Thus if an error occurs and
> > two references are given back by a protocol, we are stuck in the loop
> > forever, with a -1 value.

> > Robustness is added by checking a negative value: the device is then
> > considered free of references, and a warning is issued (it should not
> > happen, one should check that behavior)

> > Signed-off-by: Alexandre Besnard <alexandre.besnard@softathome.com>
> > ---
> > net/core/dev.c | 5 +++++
> > 1 file changed, 5 insertions(+)

> > diff --git a/net/core/dev.c b/net/core/dev.c
> > index ddc551f..e4190ae 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -8687,6 +8687,11 @@ static void netdev_wait_allrefs(struct net_device *dev)
> > refcnt = netdev_refcnt_read(dev);

> > while (refcnt != 0) {
> > + if (refcnt < 0) {
> > + pr_warn("Device %s refcnt negative: device considered free, but it should not
> > happen\n",
> > + dev->name);
> > + break;
> > + }

> 1)I don't think this is a good approach. Negative value does not guarantee
> there is just a double put of device reference. Negative value is an indicator
> something goes wrong, and we definitely should not free device memory in
> this case.

> 2)Not related to your patch -- it looks like we have problem in existing
> code with this netdev_refcnt_read(). It does not imply a memory ordering
> or some guarantees about reading percpu values. For example, in generic
> code struct percpu_ref switches a counter into atomic mode before it checks
> for the last reference. But there is nothing in netdev_refcnt_read().

I agree with you, as it is not a full fix for a bad behavior of the refcnt: many
wrong things could happen here, and that's why I added a warning (short of a
more critical flag I could think of).

However, I think this is a good approach as a global workaround for any critical
situation caused by a negative refcnt, acting as a failsafe. What I try to avoid
here is not the bug, but a situation such as a deadlock keeping a system from
powering off, or way worse in the system life.
On the other hand, I can't think of a critical situation caused by freeing
the device memory. Processes or even systems may crash in some cases, but it
should be an expected behavior in such a case IMHO.

Actually, I think that with the current implementation, most of the systems
locked in the problem are powered off.

Do you think of any issue beyond this behavior ? 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] net: check negative value for signed refcnt
  2019-01-31 13:49 ` Kirill Tkhai
  2019-01-31 15:14   ` Alexandre BESNARD
@ 2019-01-31 15:15   ` Eric Dumazet
  2019-01-31 15:21     ` Eric Dumazet
  1 sibling, 1 reply; 7+ messages in thread
From: Eric Dumazet @ 2019-01-31 15:15 UTC (permalink / raw)
  To: Kirill Tkhai, alexandre.besnard, davem, ecree, jiri, petrm,
	alexander.h.duyck, amritha.nambiar, lirongqing
  Cc: netdev, linux-kernel



On 01/31/2019 05:49 AM, Kirill Tkhai wrote:
> 
> 2)Not related to your patch -- it looks like we have problem in existing
> code with this netdev_refcnt_read(). It does not imply a memory ordering
> or some guarantees about reading percpu values. For example, in generic
> code struct percpu_ref switches a counter into atomic mode before it checks
> for the last reference. But there is nothing in netdev_refcnt_read().

Well, if we read an old value here, after a full and expensive synchronize_net(),
then we would have lot more problems than simply having a second round in
netdev_wait_allrefs()
 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] net: check negative value for signed refcnt
  2019-01-31 15:15   ` Eric Dumazet
@ 2019-01-31 15:21     ` Eric Dumazet
  2019-01-31 15:34       ` Kirill Tkhai
  0 siblings, 1 reply; 7+ messages in thread
From: Eric Dumazet @ 2019-01-31 15:21 UTC (permalink / raw)
  To: Kirill Tkhai, alexandre.besnard, davem, ecree, jiri, petrm,
	alexander.h.duyck, amritha.nambiar, lirongqing
  Cc: netdev, linux-kernel



On 01/31/2019 07:15 AM, Eric Dumazet wrote:
> 
> 
> On 01/31/2019 05:49 AM, Kirill Tkhai wrote:
>>
>> 2)Not related to your patch -- it looks like we have problem in existing
>> code with this netdev_refcnt_read(). It does not imply a memory ordering
>> or some guarantees about reading percpu values. For example, in generic
>> code struct percpu_ref switches a counter into atomic mode before it checks
>> for the last reference. But there is nothing in netdev_refcnt_read().
> 
> Well, if we read an old value here, after a full and expensive synchronize_net(),
> then we would have lot more problems than simply having a second round in
> netdev_wait_allrefs()
>  
> 

percpu_ref was added more recently than the netdev_refcnt stuff, and is
interesting for users wanting a synchronous wait for the refcnt reaching 0.

netdev_wait_allrefs() was designed to be asynchronous, so that we at least release
RTNL (and current cpu) when something is wrong and a device can not be dismantled.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] net: check negative value for signed refcnt
  2019-01-31 15:14   ` Alexandre BESNARD
@ 2019-01-31 15:31     ` Kirill Tkhai
  0 siblings, 0 replies; 7+ messages in thread
From: Kirill Tkhai @ 2019-01-31 15:31 UTC (permalink / raw)
  To: Alexandre BESNARD
  Cc: davem, amritha nambiar, lirongqing, netdev, linux-kernel,
	alexander h duyck, jiri, petrm, ecree

On 31.01.2019 18:14, Alexandre BESNARD wrote:
> Hi Kirill, and thanks for your time,
> 
> On 31 Jan 19 14:49, Kirill Tkhai ktkhai@virtuozzo.com wrote :
> 
>> Hi, Alexandre,
> 
>> On 31.01.2019 16:20, alexandre.besnard@softathome.com wrote:
>>> From: Alexandre Besnard <alexandre.besnard@softathome.com>
> 
>>> Device remaining references counter is get as a signed integer.
> 
>>> When unregistering network devices, the loop waiting for this counter
>>> to decrement tests the 0 strict equality. Thus if an error occurs and
>>> two references are given back by a protocol, we are stuck in the loop
>>> forever, with a -1 value.
> 
>>> Robustness is added by checking a negative value: the device is then
>>> considered free of references, and a warning is issued (it should not
>>> happen, one should check that behavior)
> 
>>> Signed-off-by: Alexandre Besnard <alexandre.besnard@softathome.com>
>>> ---
>>> net/core/dev.c | 5 +++++
>>> 1 file changed, 5 insertions(+)
> 
>>> diff --git a/net/core/dev.c b/net/core/dev.c
>>> index ddc551f..e4190ae 100644
>>> --- a/net/core/dev.c
>>> +++ b/net/core/dev.c
>>> @@ -8687,6 +8687,11 @@ static void netdev_wait_allrefs(struct net_device *dev)
>>> refcnt = netdev_refcnt_read(dev);
> 
>>> while (refcnt != 0) {
>>> + if (refcnt < 0) {
>>> + pr_warn("Device %s refcnt negative: device considered free, but it should not
>>> happen\n",
>>> + dev->name);
>>> + break;
>>> + }
> 
>> 1)I don't think this is a good approach. Negative value does not guarantee
>> there is just a double put of device reference. Negative value is an indicator
>> something goes wrong, and we definitely should not free device memory in
>> this case.
> 
>> 2)Not related to your patch -- it looks like we have problem in existing
>> code with this netdev_refcnt_read(). It does not imply a memory ordering
>> or some guarantees about reading percpu values. For example, in generic
>> code struct percpu_ref switches a counter into atomic mode before it checks
>> for the last reference. But there is nothing in netdev_refcnt_read().
> 
> I agree with you, as it is not a full fix for a bad behavior of the refcnt: many
> wrong things could happen here, and that's why I added a warning (short of a
> more critical flag I could think of).
> 
> However, I think this is a good approach as a global workaround for any critical
> situation caused by a negative refcnt, acting as a failsafe. What I try to avoid
> here is not the bug, but a situation such as a deadlock keeping a system from
> powering off, or way worse in the system life.
> On the other hand, I can't think of a critical situation caused by freeing
> the device memory. Processes or even systems may crash in some cases, but it
> should be an expected behavior in such a case IMHO.
>
> Actually, I think that with the current implementation, most of the systems
> locked in the problem are powered off.
> 
> Do you think of any issue beyond this behavior ? 

The problem is that network devices are destroyed not only during reboot
of the system. For example, this happens every time network namespace dies.
And nobody wants to have network device memory is released in some undefined
condition, while system is continuing to run.

I see two approaches there. First one is to crash the system in case of
refcounter is not released for a long time (which is user-defined parameter).
The second one is to set the counter to INT_MAX/2 or something like this,
and never release this memory (we do especially this in our virtuozzo 7 kernel).

Maybe there are more solutions and another people will say about them.

Kirill




^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] net: check negative value for signed refcnt
  2019-01-31 15:21     ` Eric Dumazet
@ 2019-01-31 15:34       ` Kirill Tkhai
  0 siblings, 0 replies; 7+ messages in thread
From: Kirill Tkhai @ 2019-01-31 15:34 UTC (permalink / raw)
  To: Eric Dumazet, alexandre.besnard, davem, ecree, jiri, petrm,
	alexander.h.duyck, amritha.nambiar, lirongqing
  Cc: netdev, linux-kernel

On 31.01.2019 18:21, Eric Dumazet wrote:
> 
> 
> On 01/31/2019 07:15 AM, Eric Dumazet wrote:
>>
>>
>> On 01/31/2019 05:49 AM, Kirill Tkhai wrote:
>>>
>>> 2)Not related to your patch -- it looks like we have problem in existing
>>> code with this netdev_refcnt_read(). It does not imply a memory ordering
>>> or some guarantees about reading percpu values. For example, in generic
>>> code struct percpu_ref switches a counter into atomic mode before it checks
>>> for the last reference. But there is nothing in netdev_refcnt_read().
>>
>> Well, if we read an old value here, after a full and expensive synchronize_net(),
>> then we would have lot more problems than simply having a second round in
>> netdev_wait_allrefs()
>>  
>>
> 
> percpu_ref was added more recently than the netdev_refcnt stuff, and is
> interesting for users wanting a synchronous wait for the refcnt reaching 0.
> 
> netdev_wait_allrefs() was designed to be asynchronous, so that we at least release
> RTNL (and current cpu) when something is wrong and a device can not be dismantled.

Yeah, they are different, and I think we can't add more synchronize_rcu()-dependent
synchronizations in this code, since network namespaces are already destroyed very slow.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2019-01-31 15:34 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-31 13:20 [PATCH] net: check negative value for signed refcnt alexandre.besnard
2019-01-31 13:49 ` Kirill Tkhai
2019-01-31 15:14   ` Alexandre BESNARD
2019-01-31 15:31     ` Kirill Tkhai
2019-01-31 15:15   ` Eric Dumazet
2019-01-31 15:21     ` Eric Dumazet
2019-01-31 15:34       ` Kirill Tkhai

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).