netdev.vger.kernel.org archive mirror
From: Eugene Shatokhin <eugene.shatokhin-irhHPgl+04UvJsYlp49lxw@public.gmane.org>
To: Oliver Neukum <oneukum-IBi9RG/b67k@public.gmane.org>
Cc: netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-usb-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	LKML <linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
Subject: Re: Several races in "usbnet" module (kernel 4.1.x)
Date: Fri, 24 Jul 2015 20:38:13 +0300
Message-ID: <55B27805.90601@rosalab.ru>
In-Reply-To: <1437480243.3823.5.camel-IBi9RG/b67k@public.gmane.org>

21.07.2015 15:04, Oliver Neukum wrote:
> On Mon, 2015-07-20 at 21:13 +0300, Eugene Shatokhin wrote:
>> Hi,
>>
>> I have recently found several data races in "usbnet" module, checked on
>> vanilla kernel 4.1.0 on x86_64. The races do actually happen, I have
>> confirmed it by adding delays and using hardware breakpoints to detect
>> the conflicting memory accesses (with RaceHound tool,
>> https://github.com/winnukem/racehound).
>>
>> I have not analyzed yet how harmful these races are (if they are), but
>> it is better to report them anyway, I think.
>>
>> Everything was checked using YOTA 4G LTE Modem that works via "usbnet"
>> and "cdc_ether" kernel modules.
>> --------------------------
>>
>> [Race #1]
>>
>> Race on skb_queue ('next' pointer) between usbnet_stop() and rx_complete().
>>
>> Reproduced that by unplugging the device while the system was
>> downloading a large file from the Net.
>>
>> Here is part of the call stack with the code where the changes to the
>> queue happen:
>>
>> #0 __skb_unlink (skbuff.h:1517)	
>> 	prev->next = next;
>> #1 defer_bh (usbnet.c:430)
>> 	spin_lock_irqsave(&list->lock, flags);
>> 	old_state = entry->state;
>> 	entry->state = state;
>> 	__skb_unlink(skb, list);
>> 	spin_unlock(&list->lock);
>> 	spin_lock(&dev->done.lock);
>> 	__skb_queue_tail(&dev->done, skb);
>> 	if (dev->done.qlen == 1)
>> 		tasklet_schedule(&dev->bh);
>> 	spin_unlock_irqrestore(&dev->done.lock, flags);
>> #2 rx_complete (usbnet.c:640)
>> 	state = defer_bh(dev, skb, &dev->rxq, state);
>>
>> At the same time, the following code repeatedly checks if the queue is
>> empty and reads the same values concurrently with the above changes:
>>
>> #0  usbnet_terminate_urbs (usbnet.c:765)
>> 	/* maybe wait for deletions to finish. */
>> 	while (!skb_queue_empty(&dev->rxq)
>> 		&& !skb_queue_empty(&dev->txq)
>> 		&& !skb_queue_empty(&dev->done)) {
>> 			schedule_timeout(msecs_to_jiffies(UNLINK_TIMEOUT_MS));
>> 			set_current_state(TASK_UNINTERRUPTIBLE);
>> 			netif_dbg(dev, ifdown, dev->net,
>> 				  "waited for %d urb completions\n", temp);
>> 	}
>> #1  usbnet_stop (usbnet.c:806)
>> 	if (!(info->flags & FLAG_AVOID_UNLINK_URBS))
>> 		usbnet_terminate_urbs(dev);
>>
>> For example, it is possible that the skb is removed from dev->rxq by
>> __skb_unlink() before the check "!skb_queue_empty(&dev->rxq)" in
>> usbnet_terminate_urbs() is made. It is also possible in this case that
>> the skb is added to dev->done queue after "!skb_queue_empty(&dev->done)"
>> is checked. So usbnet_terminate_urbs() may stop waiting and return while
>> dev->done queue still has an item.
>
> Hi,
>
> your analysis is correct and it looks like in addition to your proposed
> fix locking needs to be simplified and a common lock to be taken.
> Suggestions?

Just an idea, I haven't tested it.

How about moving the operations with dev->done under &list->lock in 
defer_bh, while keeping dev->done.lock too and changing 
usbnet_terminate_urbs() as described below?

Like this:
@@ -428,12 +428,12 @@ static enum skb_state defer_bh(struct usbnet *dev, struct sk_buff *skb,
  	old_state = entry->state;
  	entry->state = state;
  	__skb_unlink(skb, list);
-	spin_unlock(&list->lock);
  	spin_lock(&dev->done.lock);
  	__skb_queue_tail(&dev->done, skb);
  	if (dev->done.qlen == 1)
  		tasklet_schedule(&dev->bh);
-	spin_unlock_irqrestore(&dev->done.lock, flags);
+	spin_unlock(&dev->done.lock);
+	spin_unlock_irqrestore(&list->lock, flags);
  	return old_state;
  }
-------------------

usbnet_terminate_urbs() can then be changed as follows:

@@ -749,6 +749,20 @@ EXPORT_SYMBOL_GPL(usbnet_unlink_rx_urbs);

  /*-------------------------------------------------------------------------*/

+static void wait_skb_queue_empty(struct sk_buff_head *q)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&q->lock, flags);
+	while (!skb_queue_empty(q)) {
+		spin_unlock_irqrestore(&q->lock, flags);
+		schedule_timeout(msecs_to_jiffies(UNLINK_TIMEOUT_MS));
+		set_current_state(TASK_UNINTERRUPTIBLE);
+		spin_lock_irqsave(&q->lock, flags);
+	}
+	spin_unlock_irqrestore(&q->lock, flags);
+}
+
  // precondition: never called in_interrupt
  static void usbnet_terminate_urbs(struct usbnet *dev)
  {
@@ -762,14 +776,11 @@ static void usbnet_terminate_urbs(struct usbnet *dev)
  		unlink_urbs(dev, &dev->rxq);

  	/* maybe wait for deletions to finish. */
-	while (!skb_queue_empty(&dev->rxq)
-		&& !skb_queue_empty(&dev->txq)
-		&& !skb_queue_empty(&dev->done)) {
-			schedule_timeout(msecs_to_jiffies(UNLINK_TIMEOUT_MS));
-			set_current_state(TASK_UNINTERRUPTIBLE);
-			netif_dbg(dev, ifdown, dev->net,
-				  "waited for %d urb completions\n", temp);
-	}
+	wait_skb_queue_empty(&dev->rxq);
+	wait_skb_queue_empty(&dev->txq);
+	wait_skb_queue_empty(&dev->done);
+	netif_dbg(dev, ifdown, dev->net,
+		  "waited for %d urb completions\n", temp);
  	set_current_state(TASK_RUNNING);
  	remove_wait_queue(&dev->wait, &wait);
  }
-------------------

This way, when usbnet_terminate_urbs() finds dev->rxq or dev->txq empty, 
the skbs from these queues, if there were any, have already been queued 
to dev->done.
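
The ordering argument above can be modelled in user space. This is only a rough sketch, assuming pthread mutexes stand in for the spinlocks and plain counters for the skb queues; model_queue, model_defer, model_queue_empty and model_run are made-up names, not usbnet code:

```c
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct sk_buff_head: a lock plus a length. */
struct model_queue {
	pthread_mutex_t lock;
	int qlen;
};

struct model_ctx {
	struct model_queue rxq;
	struct model_queue done;
	int n;
};

static void model_queue_init(struct model_queue *q, int qlen)
{
	pthread_mutex_init(&q->lock, NULL);
	q->qlen = qlen;
}

/*
 * Models the reordered defer_bh(): the item is added to 'done'
 * before the lock of the source queue is released.
 */
static void model_defer(struct model_queue *src, struct model_queue *done)
{
	pthread_mutex_lock(&src->lock);
	src->qlen--;
	pthread_mutex_lock(&done->lock);
	done->qlen++;
	pthread_mutex_unlock(&done->lock);
	pthread_mutex_unlock(&src->lock);
}

/* Models one locked emptiness check, as in wait_skb_queue_empty(). */
static bool model_queue_empty(struct model_queue *q)
{
	bool empty;

	pthread_mutex_lock(&q->lock);
	empty = (q->qlen == 0);
	pthread_mutex_unlock(&q->lock);
	return empty;
}

/* Plays the role of the completion handlers calling defer_bh(). */
static void *model_completion_thread(void *arg)
{
	struct model_ctx *ctx = arg;

	for (int i = 0; i < ctx->n; i++)
		model_defer(&ctx->rxq, &ctx->done);
	return NULL;
}

/*
 * Moves n items from rxq to done in a second thread, waits until rxq
 * is seen empty under its lock, then returns the length of 'done'
 * observed at that moment. Nothing drains 'done' in this model.
 */
int model_run(int n)
{
	struct model_ctx ctx;
	pthread_t tid;
	int seen;

	ctx.n = n;
	model_queue_init(&ctx.rxq, n);
	model_queue_init(&ctx.done, 0);
	if (pthread_create(&tid, NULL, model_completion_thread, &ctx) != 0)
		return -1;

	while (!model_queue_empty(&ctx.rxq))
		sched_yield();

	pthread_mutex_lock(&ctx.done.lock);
	seen = ctx.done.qlen;
	pthread_mutex_unlock(&ctx.done.lock);

	pthread_join(tid, NULL);
	return seen;
}
```

With the nested locking, model_run(n) should always return n. If model_defer() released src->lock before taking done->lock, as the current defer_bh() does, the waiter could see rxq empty while the last item is not yet in 'done', and the returned value could fall short of n.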

At first glance, moving the code under list->lock in defer_bh() 
should not produce deadlocks. Still, I suppose it is better to use 
lockdep to be sure.
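
A possible debug configuration for such a check, assuming a kernel built from source with the usual lock-debugging options enabled; any lockdep report about the nested list->lock / dev->done.lock ordering would then show up in dmesg:

```
CONFIG_PROVE_LOCKING=y
CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_LOCK_ALLOC=y
```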

Regards,
Eugene

