All of lore.kernel.org
 help / color / mirror / Atom feed
* [RHEL6.6 PATCH] netpoll: Close race condition between poll_one_napi and napi_disable
@ 2015-09-22 18:00 Neil Horman
  2015-09-22 19:01 ` Neil Horman
  0 siblings, 1 reply; 10+ messages in thread
From: Neil Horman @ 2015-09-22 18:00 UTC (permalink / raw)
  To: netdev

Drivers might call napi_disable while not holding the napi instance poll_lock.
In those instances, its possible for a race condition to exist between
poll_one_napi and napi_disable.  That is to say, poll_one_napi only tests the
NAPI_STATE_SCHED bit to see if there is work to do during a poll, and as such
the following may happen:

CPU0				CPU1
ndo_tx_timeout			napi_poll_dev
 napi_disable			 poll_one_napi
  test_and_set_bit (ret 0)
				  test_bit (ret 1)
   reset adapter		   napi_poll_routine

If the adapter gets a tx timeout without a napi instance scheduled, its possible
for the adapter to think it has exclusive access to the hardware  (as the napi
instance is now scheduled via the napi_disable call), while the netpoll code
thinks there is simply work to do.  The result is parallel hardware access
leading to corrupt data structures in the driver, and a crash.

Additionaly, there is another, more critical race between netpoll and
napi_disable.  The disabled napi state is actually identical to the scheduled
state for a given napi instance.  The implication being that, if a napi instance
is disabled, a netconsole instance would see the napi state of the device as
having been scheduled, and poll it, likely while the driver was dong something
requiring exclusive access.  In the case above, its fairly clear that not having
the rings in a state ready to be polled will cause any number of crashes.

The fix should be pretty easy.  netpoll uses its own bit to indicate that that
the napi instance is in a state of being serviced by netpoll (NAPI_STATE_NPSVC).
We can just gate disabling on that bit as well as the sched bit.  That should
prevent netpoll from conducting a napi poll if we convert its set bit to a
test_and_set_bit operation to provide mutual exclusion

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: jmaxwell@redhat.com
Tested-by: jmaxwell@redhat.com
---
 include/linux/netdevice.h |  2 ++
 net/core/dev.c            |  2 ++
 net/core/netpoll.c        | 12 ++++++++++--
 3 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index b791405..48becac 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -507,6 +507,8 @@ static inline void napi_enable(struct napi_struct *n)
 	BUG_ON(!test_bit(NAPI_STATE_SCHED, &n->state));
 	smp_mb__before_atomic();
 	clear_bit(NAPI_STATE_SCHED, &n->state);
+	clear_bit(NAPI_STATE_NPSVC, &n->state);
+
 }
 
 #ifdef CONFIG_SMP
diff --git a/net/core/dev.c b/net/core/dev.c
index ee0d628..b6b01bf 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4723,6 +4723,8 @@ void napi_disable(struct napi_struct *n)
 
 	while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
 		msleep(1);
+	while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
+		msleep(1);
 
 	hrtimer_cancel(&n->timer);
 
diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index 6aa3db8..91cf217 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -142,7 +142,7 @@ static void queue_process(struct work_struct *work)
  */
 static int poll_one_napi(struct napi_struct *napi, int budget)
 {
-	int work;
+	int work = 0;
 
 	/* net_rx_action's ->poll() invocations and our's are
 	 * synchronized by this test which is only made while
@@ -151,7 +151,14 @@ static int poll_one_napi(struct napi_struct *napi, int budget)
 	if (!test_bit(NAPI_STATE_SCHED, &napi->state))
 		return budget;
 
-	set_bit(NAPI_STATE_NPSVC, &napi->state);
+	/*
+ 	 * If we set this bit but see that it has already been set, 
+ 	 * that indicates that napi has been disabled and we need
+ 	 * to abort this operation
+ 	 */
+
+	if(test_and_set_bit(NAPI_STATE_NPSVC, &napi->state))
+		goto out;
 
 	work = napi->poll(napi, budget);
 	WARN_ONCE(work > budget, "%pF exceeded budget in poll\n", napi->poll);
@@ -159,6 +166,7 @@ static int poll_one_napi(struct napi_struct *napi, int budget)
 
 	clear_bit(NAPI_STATE_NPSVC, &napi->state);
 
+out:
 	return budget - work;
 }
 
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [RHEL6.6 PATCH] netpoll: Close race condition between poll_one_napi and napi_disable
  2015-09-22 18:00 [RHEL6.6 PATCH] netpoll: Close race condition between poll_one_napi and napi_disable Neil Horman
@ 2015-09-22 19:01 ` Neil Horman
  2015-09-22 19:46   ` Alexander Duyck
                     ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Neil Horman @ 2015-09-22 19:01 UTC (permalink / raw)
  To: Neil Horman; +Cc: netdev

On Tue, Sep 22, 2015 at 02:00:23PM -0400, Neil Horman wrote:
> Drivers might call napi_disable while not holding the napi instance poll_lock.
> In those instances, its possible for a race condition to exist between
> poll_one_napi and napi_disable.  That is to say, poll_one_napi only tests the
> NAPI_STATE_SCHED bit to see if there is work to do during a poll, and as such
> the following may happen:
> 
> CPU0				CPU1
> ndo_tx_timeout			napi_poll_dev
>  napi_disable			 poll_one_napi
>   test_and_set_bit (ret 0)
> 				  test_bit (ret 1)
>    reset adapter		   napi_poll_routine
> 
> If the adapter gets a tx timeout without a napi instance scheduled, its possible
> for the adapter to think it has exclusive access to the hardware  (as the napi
> instance is now scheduled via the napi_disable call), while the netpoll code
> thinks there is simply work to do.  The result is parallel hardware access
> leading to corrupt data structures in the driver, and a crash.
> 
> Additionaly, there is another, more critical race between netpoll and
> napi_disable.  The disabled napi state is actually identical to the scheduled
> state for a given napi instance.  The implication being that, if a napi instance
> is disabled, a netconsole instance would see the napi state of the device as
> having been scheduled, and poll it, likely while the driver was dong something
> requiring exclusive access.  In the case above, its fairly clear that not having
> the rings in a state ready to be polled will cause any number of crashes.
> 
> The fix should be pretty easy.  netpoll uses its own bit to indicate that that
> the napi instance is in a state of being serviced by netpoll (NAPI_STATE_NPSVC).
> We can just gate disabling on that bit as well as the sched bit.  That should
> prevent netpoll from conducting a napi poll if we convert its set bit to a
> test_and_set_bit operation to provide mutual exclusion
> 
> Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> CC: "David S. Miller" <davem@davemloft.net>
> CC: jmaxwell@redhat.com
> Tested-by: jmaxwell@redhat.com
> ---
>  include/linux/netdevice.h |  2 ++
>  net/core/dev.c            |  2 ++
>  net/core/netpoll.c        | 12 ++++++++++--
>  3 files changed, 14 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index b791405..48becac 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -507,6 +507,8 @@ static inline void napi_enable(struct napi_struct *n)
>  	BUG_ON(!test_bit(NAPI_STATE_SCHED, &n->state));
>  	smp_mb__before_atomic();
>  	clear_bit(NAPI_STATE_SCHED, &n->state);
> +	clear_bit(NAPI_STATE_NPSVC, &n->state);
> +
>  }
>  
>  #ifdef CONFIG_SMP
> diff --git a/net/core/dev.c b/net/core/dev.c
> index ee0d628..b6b01bf 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -4723,6 +4723,8 @@ void napi_disable(struct napi_struct *n)
>  
>  	while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
>  		msleep(1);
> +	while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
> +		msleep(1);
>  
>  	hrtimer_cancel(&n->timer);
>  
> diff --git a/net/core/netpoll.c b/net/core/netpoll.c
> index 6aa3db8..91cf217 100644
> --- a/net/core/netpoll.c
> +++ b/net/core/netpoll.c
> @@ -142,7 +142,7 @@ static void queue_process(struct work_struct *work)
>   */
>  static int poll_one_napi(struct napi_struct *napi, int budget)
>  {
> -	int work;
> +	int work = 0;
>  
>  	/* net_rx_action's ->poll() invocations and our's are
>  	 * synchronized by this test which is only made while
> @@ -151,7 +151,14 @@ static int poll_one_napi(struct napi_struct *napi, int budget)
>  	if (!test_bit(NAPI_STATE_SCHED, &napi->state))
>  		return budget;
>  
> -	set_bit(NAPI_STATE_NPSVC, &napi->state);
> +	/*
> + 	 * If we set this bit but see that it has already been set, 
> + 	 * that indicates that napi has been disabled and we need
> + 	 * to abort this operation
> + 	 */
> +
> +	if(test_and_set_bit(NAPI_STATE_NPSVC, &napi->state))
> +		goto out;
>  
>  	work = napi->poll(napi, budget);
>  	WARN_ONCE(work > budget, "%pF exceeded budget in poll\n", napi->poll);
> @@ -159,6 +166,7 @@ static int poll_one_napi(struct napi_struct *napi, int budget)
>  
>  	clear_bit(NAPI_STATE_NPSVC, &napi->state);
>  
> +out:
>  	return budget - work;
>  }
>  
> -- 
> 2.1.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
Shoot, I forgot to change my subject prefix, sorry about that. Dave this applies
to net-next, shall I resubmit with a proper prefix, or are you good with it as
is?
Neil

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RHEL6.6 PATCH] netpoll: Close race condition between poll_one_napi and napi_disable
  2015-09-22 19:01 ` Neil Horman
@ 2015-09-22 19:46   ` Alexander Duyck
  2015-09-22 19:57     ` Neil Horman
  2015-09-22 20:01   ` [PATCH v2] " Neil Horman
  2015-09-23 18:57   ` [PATCH v3] " Neil Horman
  2 siblings, 1 reply; 10+ messages in thread
From: Alexander Duyck @ 2015-09-22 19:46 UTC (permalink / raw)
  To: Neil Horman, Neil Horman; +Cc: netdev

On 09/22/2015 12:01 PM, Neil Horman wrote:
> On Tue, Sep 22, 2015 at 02:00:23PM -0400, Neil Horman wrote:
>> Drivers might call napi_disable while not holding the napi instance poll_lock.
>> In those instances, its possible for a race condition to exist between
>> poll_one_napi and napi_disable.  That is to say, poll_one_napi only tests the
>> NAPI_STATE_SCHED bit to see if there is work to do during a poll, and as such
>> the following may happen:
>>
>> CPU0				CPU1
>> ndo_tx_timeout			napi_poll_dev
>>   napi_disable			 poll_one_napi
>>    test_and_set_bit (ret 0)
>> 				  test_bit (ret 1)
>>     reset adapter		   napi_poll_routine
>>
>> If the adapter gets a tx timeout without a napi instance scheduled, its possible
>> for the adapter to think it has exclusive access to the hardware  (as the napi
>> instance is now scheduled via the napi_disable call), while the netpoll code
>> thinks there is simply work to do.  The result is parallel hardware access
>> leading to corrupt data structures in the driver, and a crash.
>>
>> Additionaly, there is another, more critical race between netpoll and
>> napi_disable.  The disabled napi state is actually identical to the scheduled
>> state for a given napi instance.  The implication being that, if a napi instance
>> is disabled, a netconsole instance would see the napi state of the device as
>> having been scheduled, and poll it, likely while the driver was dong something
>> requiring exclusive access.  In the case above, its fairly clear that not having
>> the rings in a state ready to be polled will cause any number of crashes.
>>
>> The fix should be pretty easy.  netpoll uses its own bit to indicate that that
>> the napi instance is in a state of being serviced by netpoll (NAPI_STATE_NPSVC).
>> We can just gate disabling on that bit as well as the sched bit.  That should
>> prevent netpoll from conducting a napi poll if we convert its set bit to a
>> test_and_set_bit operation to provide mutual exclusion
>>
>> Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
>> CC: "David S. Miller" <davem@davemloft.net>
>> CC: jmaxwell@redhat.com
>> Tested-by: jmaxwell@redhat.com
>> ---
>>   include/linux/netdevice.h |  2 ++
>>   net/core/dev.c            |  2 ++
>>   net/core/netpoll.c        | 12 ++++++++++--
>>   3 files changed, 14 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>> index b791405..48becac 100644
>> --- a/include/linux/netdevice.h
>> +++ b/include/linux/netdevice.h
>> @@ -507,6 +507,8 @@ static inline void napi_enable(struct napi_struct *n)
>>   	BUG_ON(!test_bit(NAPI_STATE_SCHED, &n->state));
>>   	smp_mb__before_atomic();
>>   	clear_bit(NAPI_STATE_SCHED, &n->state);
>> +	clear_bit(NAPI_STATE_NPSVC, &n->state);
>> +
>>   }
>>   
>>   #ifdef CONFIG_SMP
>> diff --git a/net/core/dev.c b/net/core/dev.c
>> index ee0d628..b6b01bf 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
>> @@ -4723,6 +4723,8 @@ void napi_disable(struct napi_struct *n)
>>   
>>   	while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
>>   		msleep(1);
>> +	while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
>> +		msleep(1);
>>   
>>   	hrtimer_cancel(&n->timer);
>>   
>> diff --git a/net/core/netpoll.c b/net/core/netpoll.c
>> index 6aa3db8..91cf217 100644
>> --- a/net/core/netpoll.c
>> +++ b/net/core/netpoll.c
>> @@ -142,7 +142,7 @@ static void queue_process(struct work_struct *work)
>>    */
>>   static int poll_one_napi(struct napi_struct *napi, int budget)
>>   {
>> -	int work;
>> +	int work = 0;
>>   
>>   	/* net_rx_action's ->poll() invocations and our's are
>>   	 * synchronized by this test which is only made while
>> @@ -151,7 +151,14 @@ static int poll_one_napi(struct napi_struct *napi, int budget)
>>   	if (!test_bit(NAPI_STATE_SCHED, &napi->state))
>>   		return budget;
>>   
>> -	set_bit(NAPI_STATE_NPSVC, &napi->state);
>> +	/*
>> + 	 * If we set this bit but see that it has already been set,
>> + 	 * that indicates that napi has been disabled and we need
>> + 	 * to abort this operation
>> + 	 */
>> +
>> +	if(test_and_set_bit(NAPI_STATE_NPSVC, &napi->state))
>> +		goto out;
>>   
>>   	work = napi->poll(napi, budget);
>>   	WARN_ONCE(work > budget, "%pF exceeded budget in poll\n", napi->poll);
>> @@ -159,6 +166,7 @@ static int poll_one_napi(struct napi_struct *napi, int budget)
>>   
>>   	clear_bit(NAPI_STATE_NPSVC, &napi->state);
>>   
>> +out:
>>   	return budget - work;
>>   }
>>   
>> -- 
>> 2.1.0
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
> Shoot, I forgot to change my subject prefix, sorry about that. Dave this applies
> to net-next, shall I resubmit with a proper prefix, or are you good with it as
> is?

It looks like this patch introduces some white-space errors as well.  
The comment block has one trailing white space, and 4 spaces before 
tabs.  You might want to resubmit with that fixed.

- Alex

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RHEL6.6 PATCH] netpoll: Close race condition between poll_one_napi and napi_disable
  2015-09-22 19:46   ` Alexander Duyck
@ 2015-09-22 19:57     ` Neil Horman
  0 siblings, 0 replies; 10+ messages in thread
From: Neil Horman @ 2015-09-22 19:57 UTC (permalink / raw)
  To: Alexander Duyck; +Cc: Neil Horman, netdev

On Tue, Sep 22, 2015 at 12:46:13PM -0700, Alexander Duyck wrote:
> On 09/22/2015 12:01 PM, Neil Horman wrote:
> >On Tue, Sep 22, 2015 at 02:00:23PM -0400, Neil Horman wrote:
> >>Drivers might call napi_disable while not holding the napi instance poll_lock.
> >>In those instances, its possible for a race condition to exist between
> >>poll_one_napi and napi_disable.  That is to say, poll_one_napi only tests the
> >>NAPI_STATE_SCHED bit to see if there is work to do during a poll, and as such
> >>the following may happen:
> >>
> >>CPU0				CPU1
> >>ndo_tx_timeout			napi_poll_dev
> >>  napi_disable			 poll_one_napi
> >>   test_and_set_bit (ret 0)
> >>				  test_bit (ret 1)
> >>    reset adapter		   napi_poll_routine
> >>
> >>If the adapter gets a tx timeout without a napi instance scheduled, its possible
> >>for the adapter to think it has exclusive access to the hardware  (as the napi
> >>instance is now scheduled via the napi_disable call), while the netpoll code
> >>thinks there is simply work to do.  The result is parallel hardware access
> >>leading to corrupt data structures in the driver, and a crash.
> >>
> >>Additionaly, there is another, more critical race between netpoll and
> >>napi_disable.  The disabled napi state is actually identical to the scheduled
> >>state for a given napi instance.  The implication being that, if a napi instance
> >>is disabled, a netconsole instance would see the napi state of the device as
> >>having been scheduled, and poll it, likely while the driver was dong something
> >>requiring exclusive access.  In the case above, its fairly clear that not having
> >>the rings in a state ready to be polled will cause any number of crashes.
> >>
> >>The fix should be pretty easy.  netpoll uses its own bit to indicate that that
> >>the napi instance is in a state of being serviced by netpoll (NAPI_STATE_NPSVC).
> >>We can just gate disabling on that bit as well as the sched bit.  That should
> >>prevent netpoll from conducting a napi poll if we convert its set bit to a
> >>test_and_set_bit operation to provide mutual exclusion
> >>
> >>Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> >>CC: "David S. Miller" <davem@davemloft.net>
> >>CC: jmaxwell@redhat.com
> >>Tested-by: jmaxwell@redhat.com
> >>---
> >>  include/linux/netdevice.h |  2 ++
> >>  net/core/dev.c            |  2 ++
> >>  net/core/netpoll.c        | 12 ++++++++++--
> >>  3 files changed, 14 insertions(+), 2 deletions(-)
> >>
> >>diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> >>index b791405..48becac 100644
> >>--- a/include/linux/netdevice.h
> >>+++ b/include/linux/netdevice.h
> >>@@ -507,6 +507,8 @@ static inline void napi_enable(struct napi_struct *n)
> >>  	BUG_ON(!test_bit(NAPI_STATE_SCHED, &n->state));
> >>  	smp_mb__before_atomic();
> >>  	clear_bit(NAPI_STATE_SCHED, &n->state);
> >>+	clear_bit(NAPI_STATE_NPSVC, &n->state);
> >>+
> >>  }
> >>  #ifdef CONFIG_SMP
> >>diff --git a/net/core/dev.c b/net/core/dev.c
> >>index ee0d628..b6b01bf 100644
> >>--- a/net/core/dev.c
> >>+++ b/net/core/dev.c
> >>@@ -4723,6 +4723,8 @@ void napi_disable(struct napi_struct *n)
> >>  	while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
> >>  		msleep(1);
> >>+	while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
> >>+		msleep(1);
> >>  	hrtimer_cancel(&n->timer);
> >>diff --git a/net/core/netpoll.c b/net/core/netpoll.c
> >>index 6aa3db8..91cf217 100644
> >>--- a/net/core/netpoll.c
> >>+++ b/net/core/netpoll.c
> >>@@ -142,7 +142,7 @@ static void queue_process(struct work_struct *work)
> >>   */
> >>  static int poll_one_napi(struct napi_struct *napi, int budget)
> >>  {
> >>-	int work;
> >>+	int work = 0;
> >>  	/* net_rx_action's ->poll() invocations and our's are
> >>  	 * synchronized by this test which is only made while
> >>@@ -151,7 +151,14 @@ static int poll_one_napi(struct napi_struct *napi, int budget)
> >>  	if (!test_bit(NAPI_STATE_SCHED, &napi->state))
> >>  		return budget;
> >>-	set_bit(NAPI_STATE_NPSVC, &napi->state);
> >>+	/*
> >>+ 	 * If we set this bit but see that it has already been set,
> >>+ 	 * that indicates that napi has been disabled and we need
> >>+ 	 * to abort this operation
> >>+ 	 */
> >>+
> >>+	if(test_and_set_bit(NAPI_STATE_NPSVC, &napi->state))
> >>+		goto out;
> >>  	work = napi->poll(napi, budget);
> >>  	WARN_ONCE(work > budget, "%pF exceeded budget in poll\n", napi->poll);
> >>@@ -159,6 +166,7 @@ static int poll_one_napi(struct napi_struct *napi, int budget)
> >>  	clear_bit(NAPI_STATE_NPSVC, &napi->state);
> >>+out:
> >>  	return budget - work;
> >>  }
> >>-- 
> >>2.1.0
> >>
> >>--
> >>To unsubscribe from this list: send the line "unsubscribe netdev" in
> >>the body of a message to majordomo@vger.kernel.org
> >>More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>
> >Shoot, I forgot to change my subject prefix, sorry about that. Dave this applies
> >to net-next, shall I resubmit with a proper prefix, or are you good with it as
> >is?
> 
> It looks like this patch introduces some white-space errors as well.  The
> comment block has one trailing white space, and 4 spaces before tabs.  You
> might want to resubmit with that fixed.
> 
 I don't think so.  Its got one trailing space that I see, you're right about
that, but the only leading white space is directly in front of the astersks,
which is necessecary for alignment, its identical to other simmilar comments in
the file. 

No matter though, I'll just repost it at this point
Neil

> - Alex

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v2] netpoll: Close race condition between poll_one_napi and napi_disable
  2015-09-22 19:01 ` Neil Horman
  2015-09-22 19:46   ` Alexander Duyck
@ 2015-09-22 20:01   ` Neil Horman
  2015-09-22 22:09     ` Stephen Hemminger
  2015-09-23 18:03     ` David Miller
  2015-09-23 18:57   ` [PATCH v3] " Neil Horman
  2 siblings, 2 replies; 10+ messages in thread
From: Neil Horman @ 2015-09-22 20:01 UTC (permalink / raw)
  To: netdev

Drivers might call napi_disable while not holding the napi instance poll_lock.
In those instances, its possible for a race condition to exist between
poll_one_napi and napi_disable.  That is to say, poll_one_napi only tests the
NAPI_STATE_SCHED bit to see if there is work to do during a poll, and as such
the following may happen:

CPU0				CPU1
ndo_tx_timeout			napi_poll_dev
 napi_disable			 poll_one_napi
  test_and_set_bit (ret 0)
				  test_bit (ret 1)
   reset adapter		   napi_poll_routine

If the adapter gets a tx timeout without a napi instance scheduled, its possible
for the adapter to think it has exclusive access to the hardware  (as the napi
instance is now scheduled via the napi_disable call), while the netpoll code
thinks there is simply work to do.  The result is parallel hardware access
leading to corrupt data structures in the driver, and a crash.

Additionaly, there is another, more critical race between netpoll and
napi_disable.  The disabled napi state is actually identical to the scheduled
state for a given napi instance.  The implication being that, if a napi instance
is disabled, a netconsole instance would see the napi state of the device as
having been scheduled, and poll it, likely while the driver was dong something
requiring exclusive access.  In the case above, its fairly clear that not having
the rings in a state ready to be polled will cause any number of crashes.

The fix should be pretty easy.  netpoll uses its own bit to indicate that that
the napi instance is in a state of being serviced by netpoll (NAPI_STATE_NPSVC).
We can just gate disabling on that bit as well as the sched bit.  That should
prevent netpoll from conducting a napi poll if we convert its set bit to a
test_and_set_bit operation to provide mutual exclusion

Change notes:
V2)
	Remove a trailing whtiespace
	Resubmit with proper subject prefix

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: jmaxwell@redhat.com
Tested-by: jmaxwell@redhat.com
---
 include/linux/netdevice.h |  2 ++
 net/core/dev.c            |  2 ++
 net/core/netpoll.c        | 12 ++++++++++--
 3 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index b791405..48becac 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -507,6 +507,8 @@ static inline void napi_enable(struct napi_struct *n)
 	BUG_ON(!test_bit(NAPI_STATE_SCHED, &n->state));
 	smp_mb__before_atomic();
 	clear_bit(NAPI_STATE_SCHED, &n->state);
+	clear_bit(NAPI_STATE_NPSVC, &n->state);
+
 }
 
 #ifdef CONFIG_SMP
diff --git a/net/core/dev.c b/net/core/dev.c
index ee0d628..b6b01bf 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4723,6 +4723,8 @@ void napi_disable(struct napi_struct *n)
 
 	while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
 		msleep(1);
+	while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
+		msleep(1);
 
 	hrtimer_cancel(&n->timer);
 
diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index 6aa3db8..9312b66 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -142,7 +142,7 @@ static void queue_process(struct work_struct *work)
  */
 static int poll_one_napi(struct napi_struct *napi, int budget)
 {
-	int work;
+	int work = 0;
 
 	/* net_rx_action's ->poll() invocations and our's are
 	 * synchronized by this test which is only made while
@@ -151,7 +151,14 @@ static int poll_one_napi(struct napi_struct *napi, int budget)
 	if (!test_bit(NAPI_STATE_SCHED, &napi->state))
 		return budget;
 
-	set_bit(NAPI_STATE_NPSVC, &napi->state);
+	/*
+	 * If we set this bit but see that it has already been set,
+	 * that indicates that napi has been disabled and we need
+	 * to abort this operation
+ 	 */
+
+	if(test_and_set_bit(NAPI_STATE_NPSVC, &napi->state))
+		goto out;
 
 	work = napi->poll(napi, budget);
 	WARN_ONCE(work > budget, "%pF exceeded budget in poll\n", napi->poll);
@@ -159,6 +166,7 @@ static int poll_one_napi(struct napi_struct *napi, int budget)
 
 	clear_bit(NAPI_STATE_NPSVC, &napi->state);
 
+out:
 	return budget - work;
 }
 
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH v2] netpoll: Close race condition between poll_one_napi and napi_disable
  2015-09-22 20:01   ` [PATCH v2] " Neil Horman
@ 2015-09-22 22:09     ` Stephen Hemminger
  2015-09-23 10:39       ` Neil Horman
  2015-09-23 18:03     ` David Miller
  1 sibling, 1 reply; 10+ messages in thread
From: Stephen Hemminger @ 2015-09-22 22:09 UTC (permalink / raw)
  To: Neil Horman; +Cc: netdev

On Tue, 22 Sep 2015 16:01:36 -0400
Neil Horman <nhorman@redhat.com> wrote:

> +	clear_bit(NAPI_STATE_NPSVC, &n->state);
> +
>  }
why introduce extra line here?

> +	/*
> +	 * If we set this bit but see that it has already been set,
> +	 * that indicates that napi has been disabled and we need
> +	 * to abort this operation
> + 	 */
> +
> +	if(test_and_set_bit(NAPI_STATE_NPSVC, &napi->state))

And why introduce line after comment before code.
My preference is to have comment as close to code as possible.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2] netpoll: Close race condition between poll_one_napi and napi_disable
  2015-09-22 22:09     ` Stephen Hemminger
@ 2015-09-23 10:39       ` Neil Horman
  0 siblings, 0 replies; 10+ messages in thread
From: Neil Horman @ 2015-09-23 10:39 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Neil Horman, netdev

On Tue, Sep 22, 2015 at 03:09:20PM -0700, Stephen Hemminger wrote:
> On Tue, 22 Sep 2015 16:01:36 -0400
> Neil Horman <nhorman@redhat.com> wrote:
> 
> > +	clear_bit(NAPI_STATE_NPSVC, &n->state);
> > +
> >  }
> why introduce extra line here?
> 
> > +	/*
> > +	 * If we set this bit but see that it has already been set,
> > +	 * that indicates that napi has been disabled and we need
> > +	 * to abort this operation
> > + 	 */
> > +
> > +	if(test_and_set_bit(NAPI_STATE_NPSVC, &napi->state))
> 
> And why introduce line after comment before code.
> My preference is to have comment as close to code as possible.

Mine is to add spaces, as I feel the code is more readable if its a bit more
spread out.

> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2] netpoll: Close race condition between poll_one_napi and napi_disable
  2015-09-22 20:01   ` [PATCH v2] " Neil Horman
  2015-09-22 22:09     ` Stephen Hemminger
@ 2015-09-23 18:03     ` David Miller
  1 sibling, 0 replies; 10+ messages in thread
From: David Miller @ 2015-09-23 18:03 UTC (permalink / raw)
  To: nhorman; +Cc: netdev

From: Neil Horman <nhorman@redhat.com>
Date: Tue, 22 Sep 2015 16:01:36 -0400

> -	set_bit(NAPI_STATE_NPSVC, &napi->state);
> +	/*
> +	 * If we set this bit but see that it has already been set,
> +	 * that indicates that napi has been disabled and we need
> +	 * to abort this operation
> + 	 */
> +
> +	if(test_and_set_bit(NAPI_STATE_NPSVC, &napi->state))
> +		goto out;
>  

Networking comments should be styled:

	/* Like
	 * this.
	 */

Also you need a space after the "if" and before the openning parenthesis.

And like Stephen I think the empty line between the comment and the
if() statement is superfluous and eats up precious vertical space on
the screen :-)

Thanks.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v3] netpoll: Close race condition between poll_one_napi and napi_disable
  2015-09-22 19:01 ` Neil Horman
  2015-09-22 19:46   ` Alexander Duyck
  2015-09-22 20:01   ` [PATCH v2] " Neil Horman
@ 2015-09-23 18:57   ` Neil Horman
  2015-09-23 21:33     ` David Miller
  2 siblings, 1 reply; 10+ messages in thread
From: Neil Horman @ 2015-09-23 18:57 UTC (permalink / raw)
  To: netdev

Drivers might call napi_disable while not holding the napi instance poll_lock.
In those instances, its possible for a race condition to exist between
poll_one_napi and napi_disable.  That is to say, poll_one_napi only tests the
NAPI_STATE_SCHED bit to see if there is work to do during a poll, and as such
the following may happen:

CPU0				CPU1
ndo_tx_timeout			napi_poll_dev
 napi_disable			 poll_one_napi
  test_and_set_bit (ret 0)
				  test_bit (ret 1)
   reset adapter		   napi_poll_routine

If the adapter gets a tx timeout without a napi instance scheduled, its possible
for the adapter to think it has exclusive access to the hardware  (as the napi
instance is now scheduled via the napi_disable call), while the netpoll code
thinks there is simply work to do.  The result is parallel hardware access
leading to corrupt data structures in the driver, and a crash.

Additionaly, there is another, more critical race between netpoll and
napi_disable.  The disabled napi state is actually identical to the scheduled
state for a given napi instance.  The implication being that, if a napi instance
is disabled, a netconsole instance would see the napi state of the device as
having been scheduled, and poll it, likely while the driver was dong something
requiring exclusive access.  In the case above, its fairly clear that not having
the rings in a state ready to be polled will cause any number of crashes.

The fix should be pretty easy.  netpoll uses its own bit to indicate that that
the napi instance is in a state of being serviced by netpoll (NAPI_STATE_NPSVC).
We can just gate disabling on that bit as well as the sched bit.  That should
prevent netpoll from conducting a napi poll if we convert its set bit to a
test_and_set_bit operation to provide mutual exclusion

Change notes:
V2)
	Remove a trailing whtiespace
	Resubmit with proper subject prefix

V3)
	Clean up spacing nits

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: jmaxwell@redhat.com
Tested-by: jmaxwell@redhat.com
---
 include/linux/netdevice.h |  1 +
 net/core/dev.c            |  2 ++
 net/core/netpoll.c        | 11 +++++++++--
 3 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index b791405..d2ffeaf 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -507,6 +507,7 @@ static inline void napi_enable(struct napi_struct *n)
 	BUG_ON(!test_bit(NAPI_STATE_SCHED, &n->state));
 	smp_mb__before_atomic();
 	clear_bit(NAPI_STATE_SCHED, &n->state);
+	clear_bit(NAPI_STATE_NPSVC, &n->state);
 }
 
 #ifdef CONFIG_SMP
diff --git a/net/core/dev.c b/net/core/dev.c
index ee0d628..b6b01bf 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4723,6 +4723,8 @@ void napi_disable(struct napi_struct *n)
 
 	while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
 		msleep(1);
+	while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
+		msleep(1);
 
 	hrtimer_cancel(&n->timer);
 
diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index 6aa3db8..1734f50 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -142,7 +142,7 @@ static void queue_process(struct work_struct *work)
  */
 static int poll_one_napi(struct napi_struct *napi, int budget)
 {
-	int work;
+	int work = 0;
 
 	/* net_rx_action's ->poll() invocations and our's are
 	 * synchronized by this test which is only made while
@@ -151,7 +151,13 @@ static int poll_one_napi(struct napi_struct *napi, int budget)
 	if (!test_bit(NAPI_STATE_SCHED, &napi->state))
 		return budget;
 
-	set_bit(NAPI_STATE_NPSVC, &napi->state);
+	/* If we set this bit but see that it has already been set,
+	 * that indicates that napi has been disabled and we need
+	 * to abort this operation
+ 	 */
+
+	if (test_and_set_bit(NAPI_STATE_NPSVC, &napi->state))
+		goto out;
 
 	work = napi->poll(napi, budget);
 	WARN_ONCE(work > budget, "%pF exceeded budget in poll\n", napi->poll);
@@ -159,6 +165,7 @@ static int poll_one_napi(struct napi_struct *napi, int budget)
 
 	clear_bit(NAPI_STATE_NPSVC, &napi->state);
 
+out:
 	return budget - work;
 }
 
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH v3] netpoll: Close race condition between poll_one_napi and napi_disable
  2015-09-23 18:57   ` [PATCH v3] " Neil Horman
@ 2015-09-23 21:33     ` David Miller
  0 siblings, 0 replies; 10+ messages in thread
From: David Miller @ 2015-09-23 21:33 UTC (permalink / raw)
  To: nhorman; +Cc: netdev

From: Neil Horman <nhorman@redhat.com>
Date: Wed, 23 Sep 2015 14:57:58 -0400

> Drivers might call napi_disable while not holding the napi instance poll_lock.
> In those instances, its possible for a race condition to exist between
> poll_one_napi and napi_disable.  That is to say, poll_one_napi only tests the
> NAPI_STATE_SCHED bit to see if there is work to do during a poll, and as such
> the following may happen:
> 
> CPU0				CPU1
> ndo_tx_timeout			napi_poll_dev
>  napi_disable			 poll_one_napi
>   test_and_set_bit (ret 0)
> 				  test_bit (ret 1)
>    reset adapter		   napi_poll_routine
> 
> If the adapter gets a tx timeout without a napi instance scheduled, its possible
> for the adapter to think it has exclusive access to the hardware  (as the napi
> instance is now scheduled via the napi_disable call), while the netpoll code
> thinks there is simply work to do.  The result is parallel hardware access
> leading to corrupt data structures in the driver, and a crash.
> 
> Additionaly, there is another, more critical race between netpoll and
> napi_disable.  The disabled napi state is actually identical to the scheduled
> state for a given napi instance.  The implication being that, if a napi instance
> is disabled, a netconsole instance would see the napi state of the device as
> having been scheduled, and poll it, likely while the driver was dong something
> requiring exclusive access.  In the case above, its fairly clear that not having
> the rings in a state ready to be polled will cause any number of crashes.
> 
> The fix should be pretty easy.  netpoll uses its own bit to indicate that that
> the napi instance is in a state of being serviced by netpoll (NAPI_STATE_NPSVC).
> We can just gate disabling on that bit as well as the sched bit.  That should
> prevent netpoll from conducting a napi poll if we convert its set bit to a
> test_and_set_bit operation to provide mutual exclusion
> 
> Change notes:
> V2)
> 	Remove a trailing whtiespace
> 	Resubmit with proper subject prefix
> 
> V3)
> 	Clean up spacing nits
> 
> Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> CC: "David S. Miller" <davem@davemloft.net>
> CC: jmaxwell@redhat.com
> Tested-by: jmaxwell@redhat.com

Applied, thanks Neil.

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2015-09-23 21:33 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-09-22 18:00 [RHEL6.6 PATCH] netpoll: Close race condition between poll_one_napi and napi_disable Neil Horman
2015-09-22 19:01 ` Neil Horman
2015-09-22 19:46   ` Alexander Duyck
2015-09-22 19:57     ` Neil Horman
2015-09-22 20:01   ` [PATCH v2] " Neil Horman
2015-09-22 22:09     ` Stephen Hemminger
2015-09-23 10:39       ` Neil Horman
2015-09-23 18:03     ` David Miller
2015-09-23 18:57   ` [PATCH v3] " Neil Horman
2015-09-23 21:33     ` David Miller

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.