* Deadlock with restart_syscall()
@ 2018-07-16  7:31 André Pribil
  2018-07-27  9:31 ` André Pribil
  2018-07-27 15:53 ` Stephen Hemminger
  0 siblings, 2 replies; 4+ messages in thread
From: André Pribil @ 2018-07-16  7:31 UTC (permalink / raw)
  To: netdev

Hello,

I'm using kernel 4.14.52-rt34 on a single-core ARM system and I'm seeing a 
deadlock inside the kernel when two RT processes make their calls with just 
the right relative timing. The first process is trying to bring the Ethernet 
interface up with the SIOCGIFFLAGS ioctl(). The second process is checking the 
Ethernet carrier, speed and duplex status by reading e.g. 
"/sys/class/net/eth1/speed".

The first process finally gets to phy_poll_reset() in 
drivers/net/phy/phy_device.c, where it calls msleep(50). 
It never returns from the sleep.

The second process gets to speed_show() in net/core/net-sysfs.c. It tries to get
the RTNL lock with rtnl_trylock(), but fails and calls restart_syscall(). 
This happens over and over again.
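
The retry behaviour described above can be modelled in user space. The sketch
below is only an illustration under stated assumptions: an atomic_flag stands
in for the RTNL mutex, DEMO_ERESTART stands in for -ERESTARTNOINTR, a plain
loop stands in for the kernel re-entering the syscall, and all demo_* names
are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define DEMO_ERESTART (-1)	/* hypothetical stand-in for -ERESTARTNOINTR */

/* Stand-in for the RTNL mutex: flag set means "held". */
static atomic_flag demo_rtnl = ATOMIC_FLAG_INIT;

/* Returns true if the lock was free and is now held by the caller. */
static bool demo_rtnl_trylock(void)
{
	return !atomic_flag_test_and_set(&demo_rtnl);
}

static void demo_rtnl_unlock(void)
{
	atomic_flag_clear(&demo_rtnl);
}

/* One pass through a speed_show()-like handler: try the lock, else restart. */
static int demo_speed_show(int *speed_out)
{
	if (!demo_rtnl_trylock())
		return DEMO_ERESTART;
	*speed_out = 100;	/* placeholder value, not read from hardware */
	demo_rtnl_unlock();
	return 0;
}

/*
 * Models the syscall being re-entered after each restart request.
 * Returns the number of attempts made, capped at max_attempts.
 */
static int demo_read_with_restart(int max_attempts, int *speed_out)
{
	int attempts = 0;

	while (attempts < max_attempts) {
		attempts++;
		if (demo_speed_show(speed_out) == 0)
			break;
	}
	return attempts;
}
```

If the lock holder never gets CPU time, every attempt fails and the loop uses
up its whole attempt budget without making progress. The cap exists only in
this model; the restart path described above has none.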

It seems that the first process is no longer scheduled and so cannot release 
the RTNL lock, while the second process is busy restarting the syscall. The 
first process has a higher RT priority than the second process.
                                                         
Just for testing, I added the TIF_NEED_RESCHED flag to the restart_syscall() 
function, and I did not see the deadlock again with this change.

static inline int restart_syscall(void)
{
	set_tsk_thread_flag(current, TIF_SIGPENDING | TIF_NEED_RESCHED);
	return -ERESTARTNOINTR;
}
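
One detail worth checking in the test change above: set_tsk_thread_flag()
takes a single bit number, not a mask, so OR-ing two TIF_* constants selects
whichever one bit that combined number happens to be rather than setting both
flags. The user-space sketch below illustrates the difference. The demo_*
names and the bit numbers 0 and 1 are assumptions for illustration only; the
real TIF_* values are architecture-specific.

```c
/* Illustrative bit numbers only (assumption); real values depend on the arch. */
enum {
	DEMO_TIF_SIGPENDING   = 0,
	DEMO_TIF_NEED_RESCHED = 1
};

/* Models a set_bit()-style helper: sets exactly one bit, chosen by index. */
static void demo_set_flag(unsigned long *flags, int bit)
{
	*flags |= 1UL << bit;
}

/* OR-ing two bit indices produces a single index, so only one bit is set. */
static unsigned long demo_set_ored(void)
{
	unsigned long flags = 0;

	demo_set_flag(&flags, DEMO_TIF_SIGPENDING | DEMO_TIF_NEED_RESCHED);
	return flags;
}

/* Setting each flag with its own call sets both bits. */
static unsigned long demo_set_separately(void)
{
	unsigned long flags = 0;

	demo_set_flag(&flags, DEMO_TIF_SIGPENDING);
	demo_set_flag(&flags, DEMO_TIF_NEED_RESCHED);
	return flags;
}
```

With these assumed values, the OR-ed call sets only the bit at index 1, while
the two separate calls set both bits. In kernel code the equivalent would be
two set_tsk_thread_flag() calls, one per flag.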

As a second test, I released the RTNL lock around the msleep() call in 
phy_poll_reset(). This also made the problem disappear.

I've found this thread, where a similar issue with restart_syscall() has been 
reported:
https://www.spinics.net/lists/netdev/msg415144.html

Any ideas how to fix this issue?

Andre   

* RE: Deadlock with restart_syscall()
  2018-07-16  7:31 Deadlock with restart_syscall() André Pribil
@ 2018-07-27  9:31 ` André Pribil
  2018-07-27 15:53 ` Stephen Hemminger
  1 sibling, 0 replies; 4+ messages in thread
From: André Pribil @ 2018-07-27  9:31 UTC (permalink / raw)
  To: André Pribil, netdev

> I've found this thread, where a similar issue with restart_syscall()
> has been reported:
> https://www.spinics.net/lists/netdev/msg415144.html

Found another old report about restart_syscall() producing a dead loop:
https://lists.gt.net/linux/kernel/2371438

I do not agree with the conclusion there that user space is to blame 
for this. I also see no ugly priority games in my scenario.

Does no one want to comment on this?

Andre

* Re: Deadlock with restart_syscall()
  2018-07-16  7:31 Deadlock with restart_syscall() André Pribil
  2018-07-27  9:31 ` André Pribil
@ 2018-07-27 15:53 ` Stephen Hemminger
  2018-07-30  8:08   ` André Pribil
  1 sibling, 1 reply; 4+ messages in thread
From: Stephen Hemminger @ 2018-07-27 15:53 UTC (permalink / raw)
  To: André Pribil; +Cc: netdev

On Mon, 16 Jul 2018 09:31:06 +0200
André Pribil <a.pribil@beck-ipc.com> wrote:

> Hello,
> 
> I'm using kernel 4.14.52-rt34 on a single core ARM system and I'm seeing a 
> deadlock inside the kernel when two RT processes make calls in the right 
> temporal distance. The first process is trying to bring the Ethernet interface 
> up, with the SIOCGIFFLAGS ioctl(). The second process is checking the Ethernet 
> carrier, speed and duplex status, by reading e.g. "/sys/class/net/eth1/speed".
> 
> The first process finally gets to phy_poll_reset() in 
> drivers/net/phy/phy_device.c, where it calls msleep(50). 
> It never returns from the sleep.
> 
> The second process gets to speed_show() in net/core/net-sysfs.c. It tries to get
> the RTNL lock with rtnl_trylock(), but fails and calls restart_syscall(). 
> This happens over and over again.
> 
> It seems like the first process is no longer scheduled and cannot release the
> RTNL lock, while the second process is busy restarting the syscall. The first 
> process has a higher RT priority than the second process.
>                                                          
> Just for testing I've added the TIF_NEED_RESCHED flag to the restart_syscall() 
> function and I did not see the deadlock again with this change.
> 
> static inline int restart_syscall(void)
> {
> 	set_tsk_thread_flag(current, TIF_SIGPENDING | TIF_NEED_RESCHED);
> 	return -ERESTARTNOINTR;
> }
> 
> As a second test I released the RTNL lock while calling msleep() in 
> phy_poll_reset(). This also made the problem disappear.
> 
> I've found this thread, where a similar issue with restart_syscall() has been 
> reported:
> https://www.spinics.net/lists/netdev/msg415144.html
> 
> Any ideas how to fix this issue?
> 
> Andre   

Don't do control operations from RT processes!
There can be cases of priority inversion where an RT process is waiting for
something that requires a kthread to complete the operation.

* RE: Deadlock with restart_syscall()
  2018-07-27 15:53 ` Stephen Hemminger
@ 2018-07-30  8:08   ` André Pribil
  0 siblings, 0 replies; 4+ messages in thread
From: André Pribil @ 2018-07-30  8:08 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev

> Don't do control operations from RT processes!
> There can be cases of priority inversion where RT process is waiting
> for something that requires a kthread to complete the operation.

I have RT throttling activated, and I thought it would kick in and give 
the non-RT threads some time to execute.
Therefore, I would expect this control operation to terminate sooner or later, 
even if a kthread is involved and a priority inversion occurs, 
but this does not seem to be the case.
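
As background for the throttling expectation above: RT throttling is
configured through /proc/sys/kernel/sched_rt_period_us and
/proc/sys/kernel/sched_rt_runtime_us. The sketch below computes how much of
each period is left for non-RT tasks. It is a minimal illustration with
hypothetical demo_* helper names, and it does not establish whether
throttling takes effect in the scenario reported here.

```c
#include <stdio.h>

/* Read one integer from a file; returns 0 on success, -1 on failure. */
static int demo_read_long(const char *path, long *out)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = (fscanf(f, "%ld", out) == 1);
	fclose(f);
	return ok ? 0 : -1;
}

/*
 * Microseconds per period that RT tasks may not use.
 * Returns 0 when throttling is disabled (runtime < 0) or when the
 * runtime covers the whole period.
 */
static long demo_non_rt_budget_us(long runtime_us, long period_us)
{
	if (runtime_us < 0 || runtime_us >= period_us)
		return 0;
	return period_us - runtime_us;
}

/* Budget for the running system, or -1 if the settings cannot be read. */
static long demo_current_budget_us(void)
{
	long runtime_us, period_us;

	if (demo_read_long("/proc/sys/kernel/sched_rt_runtime_us", &runtime_us))
		return -1;
	if (demo_read_long("/proc/sys/kernel/sched_rt_period_us", &period_us))
		return -1;
	return demo_non_rt_budget_us(runtime_us, period_us);
}
```

With the usual defaults of 950000 and 1000000, 50000 us per period remain for
non-RT tasks; a runtime of -1 disables throttling.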

Andre
