All of lore.kernel.org
 help / color / mirror / Atom feed
From: Stephen Hemminger <stephen@networkplumber.org>
To: "André Pribil" <a.pribil@beck-ipc.com>
Cc: "netdev@vger.kernel.org" <netdev@vger.kernel.org>
Subject: Re: Deadlock with restart_syscall()
Date: Fri, 27 Jul 2018 08:53:51 -0700	[thread overview]
Message-ID: <20180727085351.36210a12@xeon-e3> (raw)
In-Reply-To: <681500CE65202E47A192754B01DAB4673BE3D87D8D@SDE12.beckipc.net>

On Mon, 16 Jul 2018 09:31:06 +0200
André Pribil <a.pribil@beck-ipc.com> wrote:

> Hello,
> 
> I'm using kernel 4.14.52-rt34 on a single core ARM system and I'm seeing a 
> deadlock inside the kernel when two RT processes make calls in the right 
> temporal distance. The first process is trying to bring the Ethernet interface 
> up, with the SIOCGIFFLAGS ioctl(). The second process is checking the Ethernet 
> carrier, speed and duplex status, by reading e.g. "/sys/class/net/eth1/speed".
> 
> The first process finally gets to phy_poll_reset() in 
> drivers/net/phy/phy_device.c, where it calls msleep(50). 
> It never returns from the sleep.
> 
> The second process gets to speed_show() in net/core/net-sysfs.c. It tries to get
> the RTNL lock with rtnl_trylock(), but fails and calls restart_syscall(). 
> This happens over and over again.
> 
> It seems like the first process in no longer scheduled and cannot release the
> RTNL lock, while the second process is busy restarting the syscall. The first 
> process has a higher RT priority than the second process.
>                                                          
> Just for testing I've added the TIF_NEED_RESCHED flag to the restart_syscall() 
> function and I did not see the deadlock again with this change.
> 
> static inline int restart_syscall(void)
> {
> 	set_tsk_thread_flag(current, TIF_SIGPENDING | TIF_NEED_RESCHED);
> 	return -ERESTARTNOINTR;
> }
> 
> As a second test I released the RTNL lock while calling msleep() in 
> phy_poll_reset(). This also made the problem disappear.
> 
> I've found this thread, where a similar issue with restart_syscall() has been 
> reported:
> https://www.spinics.net/lists/netdev/msg415144.html
> 
> Any ideas how to fix this issue?
> 
> Andre   

Don't do control operations from RT processes!
There can be cases of priority inversion where RT process is waiting for
something that requires a kthread to complete the operation.

  parent reply	other threads:[~2018-07-27 17:16 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-07-16  7:31 Deadlock with restart_syscall() André Pribil
2018-07-27  9:31 ` André Pribil
2018-07-27 15:53 ` Stephen Hemminger [this message]
2018-07-30  8:08   ` André Pribil

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180727085351.36210a12@xeon-e3 \
    --to=stephen@networkplumber.org \
    --cc=a.pribil@beck-ipc.com \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.