* RT kernel on Acer laptop unreliable
@ 2017-07-15 17:43 Jacek Konieczny
  2017-07-17  6:21 ` Stéphane Ancelot
  2017-08-07 13:19 ` Sebastian Andrzej Siewior
  0 siblings, 2 replies; 19+ messages in thread
From: Jacek Konieczny @ 2017-07-15 17:43 UTC (permalink / raw)
  To: linux-rt-users

Hello,

I need low and stable latency for audio processing. I use Guitarix,
Ardour and other jack-based applications. For anything else I would use
the regular kernel package from my distribution (PLD Linux), but its
latency spikes are unacceptable in some cases. That is why I have been
trying to use an RT kernel for a few months now…

First I added the RT patch to the PLD Linux kernel package. I tried
both the 4.4 and 4.9 kernels. Neither of them worked well. Then I
manually compiled a few vanilla 4.9.x kernels, with just the most recent
RT patch and a configuration made from scratch (so as not to copy any
PLD mistakes). Still no luck.

Most of the information I could find online is quite outdated or
incomplete, so I have very little idea what the proper RT kernel
configuration is or how to debug it.

When the RT kernel works, it provides quite acceptable latency: I
could get down to 8ms in jack with no significant xruns. Unfortunately,
none of the RT kernels I tried was reliable. Under light load the
system would sometimes run for a few hours, but often it would slow
down and crash much earlier. Under heavy load, e.g. when playing a
demanding game, the system would usually crash quite quickly.
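Scheduling (wake-up) latency, which limits how small the jack buffer can be made without xruns, is usually measured with cyclictest from the rt-tests suite. The following is only a minimal sketch of the same idea, not cyclictest itself: it sleeps to absolute deadlines with clock_nanosleep() and records how late each wake-up was. The function name and its parameters are hypothetical, and a real measurement would also run under SCHED_FIFO with locked memory, which this sketch leaves out.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Convert a timespec to nanoseconds. */
static int64_t ts_to_ns(const struct timespec *ts)
{
    return (int64_t)ts->tv_sec * 1000000000LL + ts->tv_nsec;
}

/* Hypothetical helper: wake up `loops` times at absolute deadlines spaced
 * `interval_ns` apart and return the worst observed wake-up delay in ns,
 * or -1 if a clock call fails. No SCHED_FIFO or mlockall() here. */
int64_t max_wakeup_latency_ns(int loops, long interval_ns)
{
    struct timespec next, now;
    int64_t worst = 0;

    if (clock_gettime(CLOCK_MONOTONIC, &next) != 0)
        return -1;

    for (int i = 0; i < loops; i++) {
        /* Advance the absolute deadline, keeping tv_nsec normalised. */
        next.tv_nsec += interval_ns;
        while (next.tv_nsec >= 1000000000L) {
            next.tv_nsec -= 1000000000L;
            next.tv_sec++;
        }
        /* TIMER_ABSTIME avoids drift from accumulated relative sleeps;
         * clock_nanosleep() returns 0 or an error number. */
        if (clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL) != 0)
            return -1;
        if (clock_gettime(CLOCK_MONOTONIC, &now) != 0)
            return -1;

        int64_t late = ts_to_ns(&now) - ts_to_ns(&next);
        if (late > worst)
            worst = late;
    }
    return worst;
}
```

For example, max_wakeup_latency_ns(10000, 1000000L) samples once per millisecond for ten seconds; running it under chrt -f while the system is loaded, on both the RT and the non-RT kernel, would show how the worst case differs.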

Sometimes it would just lock up hard with no warning and nothing would
work, not even magic sysrq.
Other times it would gradually slow down until it was unusable.
Sometimes I could see some 'BUG' in dmesg, but I was rarely able to
restart the system cleanly.
Sometimes only some subsystems would fail while the rest of the system
still seemed to work. It could be sound, mouse, keyboard or network that
stopped working.

System logs would contain some kernel BUGs/WARNINGs, but they often
looked generic and did not point to a specific problem (at least not to me).

Today I tried kernel 4.9.37 with the 4.9.35-rt25 patch. It failed
again; here are the kernel error logs:

https://gist.github.com/Jajcus/494b79062b537269b49265ff3c50ee78

I have no idea how to properly debug the problem, or even what data I
should collect to prepare a reasonable bug report.

I have already upgraded the laptop firmware to the latest available
version, and I have also played with some BIOS setup options.

One thing that looks suspicious is the amount of time spent handling
IRQ 16, whose handlers are 'idma64, i801_smbus, i2c_designware'. Also,
the i801-smbus and i2c-designware-platform drivers were the first to
fail when loaded. The regular kernel (without PREEMPT_RT_FULL) seems to
generate a much lower number of those interrupts.

However, the system would crash even with the i2c modules blacklisted
and not loaded. Getting rid of idma64 did not help much either: IRQ 16
no longer fired, but the system would slow down and crash anyway.
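Comparing how often IRQ 16 fires on the RT and non-RT kernels is easier with a number than by eye. The sketch below sums the per-CPU counters from one line of /proc/interrupts. It assumes the usual layout (an "N:" label, then one decimal count per CPU, then the chip and handler names), that the caller knows the CPU count from the header line, and the function name is hypothetical. Sampling the line twice a few seconds apart and taking the difference gives an interrupt rate.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: given one line of /proc/interrupts, e.g.
 *   " 16:   1203   877   IO-APIC  16-fasteoi   idma64.0, i801_smbus"
 * return the sum of the first `ncpus` per-CPU counters, or -1 if the line
 * has no "label:" prefix. Parsing stops early at the first non-numeric
 * field, i.e. the chip name column. */
long long sum_irq_counts(const char *line, int ncpus)
{
    const char *p = strchr(line, ':');
    long long total = 0;

    if (p == NULL)
        return -1;
    p++; /* skip the ':' after the IRQ number */

    for (int cpu = 0; cpu < ncpus; cpu++) {
        char *end;

        while (isspace((unsigned char)*p))
            p++;
        if (!isdigit((unsigned char)*p))
            break; /* reached the chip name column */
        total += strtoll(p, &end, 10);
        p = end;
    }
    return total;
}
```

Passing the CPU count explicitly avoids mistaking a chip field that happens to start with a digit (such as "16-fasteoi") for a counter.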


Any idea what the problem is?
Any hints how to debug it?


Greets,
Jacek


* Re: RT kernel on Acer laptop unreliable
  2017-07-15 17:43 RT kernel on Acer laptop unreliable Jacek Konieczny
@ 2017-07-17  6:21 ` Stéphane Ancelot
  2017-07-17  6:31   ` Piotr Gregor
  2017-08-07 13:19 ` Sebastian Andrzej Siewior
  1 sibling, 1 reply; 19+ messages in thread
From: Stéphane Ancelot @ 2017-07-17  6:21 UTC (permalink / raw)
  To: Jacek Konieczny, linux-rt-users

Hi,

If you don't manage to achieve realtime, you have some wrong options in
your kernel settings.

Please share your kernel config.

Regards,

S.Ancelot



* RE: RT kernel on Acer laptop unreliable
  2017-07-17  6:21 ` Stéphane Ancelot
@ 2017-07-17  6:31   ` Piotr Gregor
  2017-07-17 15:06     ` Jacek Konieczny
  0 siblings, 1 reply; 19+ messages in thread
From: Piotr Gregor @ 2017-07-17  6:31 UTC (permalink / raw)
  To: Stéphane Ancelot, Jacek Konieczny, linux-rt-users

Hi Jacek,

Just to clarify: does the laptop work fine with another, non-RT-patched kernel?
If yes, couldn't you copy the existing config from /boot and run make oldconfig based on that?

cd linux-4.4.32
cp /boot/config-$(uname -r) .config
make oldconfig

Then enable CONFIG_PREEMPT_RT_FULL and configure the rest if needed.
Simply enabling PREEMPT_RT_FULL is enough to get you started.
You may need some tricks to get a GUI with reasonable graphics,
but this should give you a runnable, stable system.
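A quick way to double-check the result of make oldconfig is to look for the option in the generated .config, since a symbol whose dependencies are not met will not end up enabled. A minimal sketch in C follows; the function name is hypothetical and it assumes the whole .config has already been read into a string. It looks for the exact line "CONFIG_PREEMPT_RT_FULL=y" as written by Kconfig.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: return true if `config` (the full text of a kernel
 * .config) contains the line "<name>=y". A "# <name> is not set" line,
 * an "=m" value or a missing symbol all count as false. */
bool config_option_is_y(const char *config, const char *name)
{
    size_t name_len = strlen(name);
    const char *line = config;

    while (line != NULL && *line != '\0') {
        const char *eol = strchr(line, '\n');
        size_t len = eol ? (size_t)(eol - line) : strlen(line);

        /* Match exactly "<name>=y" so that CONFIG_PREEMPT does not match
         * CONFIG_PREEMPT_RT_FULL=y, and vice versa. */
        if (len == name_len + 2 &&
            strncmp(line, name, name_len) == 0 &&
            line[name_len] == '=' && line[name_len + 1] == 'y')
            return true;

        line = eol ? eol + 1 : NULL;
    }
    return false;
}
```

From the shell, grep -x 'CONFIG_PREEMPT_RT_FULL=y' .config performs the same whole-line check.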

I have had some luck with this approach on a couple of machines so far.

cheers,
Piotr


* Re: RT kernel on Acer laptop unreliable
  2017-07-17  6:31   ` Piotr Gregor
@ 2017-07-17 15:06     ` Jacek Konieczny
  2017-07-17 17:59       ` Piotr Gregor
  0 siblings, 1 reply; 19+ messages in thread
From: Jacek Konieczny @ 2017-07-17 15:06 UTC (permalink / raw)
  To: linux-rt-users; +Cc: Stéphane Ancelot, Piotr Gregor

On 2017-07-17 08:21, Stéphane Ancelot wrote:
> If you don't manage to achieve realtime, you have some wrong options in
> your kernel settings .

I do achieve real-time (stable, low-enough latency), but it doesn't last.


On 2017-07-17 08:31, Piotr Gregor wrote:
> Just to clarify: does the laptop works fine with other, non-rt-patched kernel?

Yes.

> If yes, couldn't you copy existing config from /boot and make oldconfig based on that? 

I have not done exactly that. I added the RT patch and the
CONFIG_PREEMPT_RT_FULL option (with dependencies) to the package source
from which the original kernel was built.

I can try 'make oldconfig' with a working non-RT kernel config next
time, but I won't be able to do that within the next week.

> You may have to do some tricks to get GUI with reasonable graphics though
> but this should give you runnable, stable system.

I use GUI applications that rely on working OpenGL. Graphics is
'reasonable' as long as the system is stable, but sooner or later it
locks up or crashes.

Here are two of the kernel configs I have tried:

https://gist.github.com/Jajcus/9af258ae55555afb126537e42d697d76


I suspect something firmware- or hardware-related triggers the problem.


BIOS Information
        Vendor: Insyde Corp.
        Version: V1.25
        Release Date: 03/03/2017

System Information
        Manufacturer: Acer
        Product Name: Aspire E5-575
        Version: V1.25

Base Board Information
        Manufacturer: Acer
        Product Name: Ironman_SK
        Version: V1.25

CPU: Intel(R) Core(TM) i5-6267U CPU @ 2.90GHz

lspci:
00:00.0 Host bridge: Intel Corporation Skylake Host Bridge/DRAM Registers (rev 09)
00:02.0 VGA compatible controller: Intel Corporation Iris Graphics 550 (rev 0a)
00:14.0 USB controller: Intel Corporation Sunrise Point-LP USB 3.0 xHCI Controller (rev 21)
00:14.2 Signal processing controller: Intel Corporation Sunrise Point-LP Thermal subsystem (rev 21)
00:15.0 Signal processing controller: Intel Corporation Sunrise Point-LP Serial IO I2C Controller #0 (rev 21)
00:16.0 Communication controller: Intel Corporation Sunrise Point-LP CSME HECI #1 (rev 21)
00:17.0 SATA controller: Intel Corporation Sunrise Point-LP SATA Controller [AHCI mode] (rev 21)
00:1d.0 PCI bridge: Intel Corporation Sunrise Point-LP PCI Express Root Port (rev f1)
00:1d.2 PCI bridge: Intel Corporation Device 9d1a (rev f1)
00:1d.3 PCI bridge: Intel Corporation Device 9d1b (rev f1)
00:1f.0 ISA bridge: Intel Corporation Sunrise Point-LP LPC Controller (rev 21)
00:1f.2 Memory controller: Intel Corporation Sunrise Point-LP PMC (rev 21)
00:1f.3 Audio device: Intel Corporation Sunrise Point-LP HD Audio (rev 21)
00:1f.4 SMBus: Intel Corporation Sunrise Point-LP SMBus (rev 21)
02:00.0 Network controller: Qualcomm Atheros QCA9377 802.11ac Wireless Network Adapter (rev 31)
03:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTL8411B PCI Express Card Reader (rev 01)
03:00.1 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 12)


Jacek


* Re: RT kernel on Acer laptop unreliable
  2017-07-17 15:06     ` Jacek Konieczny
@ 2017-07-17 17:59       ` Piotr Gregor
  0 siblings, 0 replies; 19+ messages in thread
From: Piotr Gregor @ 2017-07-17 17:59 UTC (permalink / raw)
  To: Jacek Konieczny; +Cc: linux-rt-users, Stéphane Ancelot, Piotr Gregor

Hi Jacek,

I have a script that installs an RT-patched kernel and thought you might
give it a go.

https://github.com/spinlockirqsave/scripts/blob/master/rt.sh

The script can install several kernel versions, though some of them may
require updating the URL used to download the kernel or patch, as these
links change every now and then. These days I am using this script to
install linux-4.4.70 with the patch-4.4.70-rt83 patch on a couple of
machines, with success. Please use the default option in Step 3 to
install the former, and Substep 6.3 in Step 6 to copy the existing
kernel config.

cheers,
Piotr


* Re: RT kernel on Acer laptop unreliable
  2017-07-15 17:43 RT kernel on Acer laptop unreliable Jacek Konieczny
  2017-07-17  6:21 ` Stéphane Ancelot
@ 2017-08-07 13:19 ` Sebastian Andrzej Siewior
  2017-08-12 10:07   ` Jacek Konieczny
  2017-08-14 18:03   ` Jacek Konieczny
  1 sibling, 2 replies; 19+ messages in thread
From: Sebastian Andrzej Siewior @ 2017-08-07 13:19 UTC (permalink / raw)
  To: Jacek Konieczny; +Cc: linux-rt-users

On 2017-07-15 19:43:11 [+0200], Jacek Konieczny wrote:
> Hello,
Hi,

> Most of the information I could find online is quite outdated or
> incomplete, so I have really little idea what the proper configuration
> of the RT kernel is or how to debug it.

Usually people take their local distro's config (make localyesconfig),
apply the RT patch, enable PREEMPT_RT_FULL (via make oldconfig) and tweak
the config to whatever they think is best for them.

> Sometimes it would just lock up hard with no warning and nothing would
> work – not even magic sysrq.
> Other times it would gradually slow down until it is not usable at all.
> Sometimes I would be able to see some 'BUG' in dmesg, but rarely I would
> be able to restart the system cleanly.
> Sometimes only some subsystems would fail, while otherwise the system
> still seems to work. It could be sound, mouse, keyboard or network that
> doesn't work.
> 
> System logs would contain some kernel BUGs/WARNINGs, but they would
> often look generic and would not point to a specific problem (not for me).

One would need a BUG/WARNING report of some kind as a starting point.

> Today I have tried kernel 4.9.37 with the 4.9.35-rt25 patch. It failed
> again, here are the kernel error logs:
> 
> https://gist.github.com/Jajcus/494b79062b537269b49265ff3c50ee78

Is this reproducible, or do you see something different each time?

> I have no idea how to properly debug the problem, even what data should
> I collect to prepare a reasonable bug report.

This is probably -EDEADLK coming from task_blocks_on_rt_mutex(). I
suspect that the following patch

diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index 78a6c4a223c1..59430ede6e89 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -524,6 +524,7 @@ static int rt_mutex_adjust_prio_chain(struct task_struct *task,
 		}
 		put_task_struct(task);
 
+		pr_err("EDEADLK #1\n");
 		return -EDEADLK;
 	}
 
@@ -639,6 +640,7 @@ static int rt_mutex_adjust_prio_chain(struct task_struct *task,
 		debug_rt_mutex_deadlock(chwalk, orig_waiter, lock);
 		raw_spin_unlock(&lock->wait_lock);
 		ret = -EDEADLK;
+		pr_err("EDEADLK #2\n");
 		goto out_unlock_pi;
 	}
 
@@ -1081,6 +1083,8 @@ static void  noinline __sched rt_spin_lock_slowlock(struct rt_mutex *lock,
 	raw_spin_unlock(&self->pi_lock);
 
 	ret = task_blocks_on_rt_mutex(lock, &waiter, self, RT_MUTEX_MIN_CHAINWALK);
+	if (ret )
+		pr_err("Crashing soon on %d (%p %p)\n", ret, rt_mutex_owner(lock), self);
 	BUG_ON(ret);
 
 	for (;;) {

will print "EDEADLK #2". We got rid of two instances of this error
before v4.9 went into maintenance mode.
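To make the -EDEADLK case a little more concrete: when a task blocks on an rt_mutex, the kernel walks the chain "lock, its owner, the lock that owner is blocked on, ..." to propagate priority inheritance, and reports a deadlock if the walk leads back to the original lock or to the task that started it (the "#2" site in the patch above), or if the chain gets longer than the max_lock_depth limit (the "#1" site). The sketch below is a heavily simplified, hypothetical user-space model of that idea only; the struct names, fields and depth constant are illustrative, and the real rt_mutex_adjust_prio_chain() also handles locking, requeueing and priority updates. It is not the kernel code.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified model of rt_mutex chain walking (not kernel code). */
struct model_lock;

struct model_task {
    struct model_lock *blocked_on; /* lock this task waits for, or NULL */
};

struct model_lock {
    struct model_task *owner;      /* current owner, or NULL if free */
};

/* Illustrative limit, loosely mirroring the kernel's max_lock_depth. */
#define MODEL_MAX_DEPTH 1024

/* Would `top` blocking on `orig` close a cycle? Returns -EDEADLK if the
 * chain leads back to `orig` or `top` (cf. "EDEADLK #2"), -EDEADLK if the
 * chain exceeds the depth limit (cf. "EDEADLK #1"), and 0 otherwise. */
int model_check_chain(struct model_task *top, struct model_lock *orig)
{
    struct model_lock *lock = orig;

    for (int depth = 1; lock != NULL && lock->owner != NULL; depth++) {
        if (depth > MODEL_MAX_DEPTH)
            return -EDEADLK;            /* chain too long */
        if (lock->owner == top)
            return -EDEADLK;            /* the chain came back to us */
        lock = lock->owner->blocked_on; /* follow owner -> next lock */
        if (lock == orig)
            return -EDEADLK;            /* back at the original lock */
    }
    return 0; /* chain ended at a running owner or a free lock */
}
```

In this model, a task that holds lock b and waits for lock a, owned by the current task, makes the current task's attempt to take b return -EDEADLK. On a PREEMPT_RT spinlock there is no way to back out of that, which is why rt_spin_lock_slowlock() simply does BUG_ON(ret).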

> Any idea what is the problem?
> Any hints how to debug it?

The patch should confirm the origin of the error code, not the
reason. The backtrace comes from networking, so with networking disabled
you should not run into this particular problem.
One thing you could try is to see whether the latest v4.11-based RT
kernel works more reliably.

> Greets,
> Jacek

Sebastian


* Re: RT kernel on Acer laptop unreliable
  2017-08-07 13:19 ` Sebastian Andrzej Siewior
@ 2017-08-12 10:07   ` Jacek Konieczny
  2017-08-14 18:03   ` Jacek Konieczny
  1 sibling, 0 replies; 19+ messages in thread
From: Jacek Konieczny @ 2017-08-12 10:07 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior; +Cc: linux-rt-users

Hi,

>> Most of the information I could find online is quite outdated or
>> incomplete, so I have really little idea what the proper configuration
>> of the RT kernel is or how to debug it.
> 
> usually people take their local distro's config (make localyesconfig),
> patch RT, enable PREEMPT-FULL (via make oldconfig) and tweak the config
> in what they think is best for them.

That was the first thing that I tried.


>> I have no idea how to properly debug the problem, even what data should
>> I collect to prepare a reasonable bug report.
> 
> This is probably -EDEADLK coming from task_blocks_on_rt_mutex(). I
> suspect that the following patch
> 
> [patch snipped]
> 
> will return "EDEADLK #2". And we got rid of two instances of this error
> before v4.9 went into maintain mode.

Got it! I had to extract this from EFI pstore, as the disk was already dead.

<3>[  917.051362] EDEADLK #2
<3>[  917.051364] Crashing soon on -35 (ffff8e3906642000 ffff8e38f73d6000)
<4>[  917.051390] ------------[ cut here ]------------
<2>[  917.051390] kernel BUG at kernel/locking/rtmutex.c:1088!
<4>[  917.051391] invalid opcode: 0000 [#1] PREEMPT SMP
<4>[  917.051408] Modules linked in: snd_seq_dummy snd_seq fuse
ip6t_REJECT nf_reject_ipv6 ip6t_rpfilter xt_tcpudp ipt_REJECT
nf_reject_ipv4 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute
bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6
ip6table_mangle ip6table_raw ip6table_security iptable_nat
nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack
iptable_mangle iptable_raw iptable_security ebtable_filter ebtables
ip6table_filter ip6_tables iptable_filter bnep msr joydev acer_wmi
sparse_keymap coretemp hwmon intel_rapl efi_pstore snd_hda_codec_hdmi
intel_powerclamp intel_cstate snd_soc_skl snd_hda_codec_realtek
snd_soc_skl_ipc intel_uncore snd_hda_codec_generic snd_soc_sst_ipc
snd_soc_sst_dsp intel_rapl_perf snd_hda_ext_core snd_soc_sst_match
snd_soc_core snd_compress
<4>[  917.051429]  ac97_bus psmouse snd_pcm_dmaengine pcspkr
snd_hda_intel efivars snd_hda_codec snd_hda_core input_leds uvcvideo
r8169 btusb videobuf2_vmalloc mii videobuf2_memops btrtl videobuf2_v4l2
btbcm snd_usb_audio videobuf2_core btintel videodev snd_usbmidi_lib
snd_hwdep bluetooth media snd_rawmidi snd_seq_device snd_pcm mei_me mei
shpchp dell_smo8800 wmi pinctrl_sunrisepoint pinctrl_intel
intel_lpss_acpi idma64 evdev battery fjes tpm_crb intel_lpss_pci
acpi_pad intel_lpss ac intel_pch_thermal thermal sch_fq_codel ip_tables
x_tables ext4 crc16 jbd2 fscrypto mbcache dm_crypt algif_skcipher af_alg
sr_mod cdrom sd_mod hid_generic usbhid i915 rtsx_pci_sdmmc mmc_core
crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_gtt ahci
i2c_algo_bit aesni_intel libahci drm_kms_helper aes_x86_64
glue_helper syscopyarea lrw sysfillrect libata gf128mul xhci_pci ablk_helper
sysimgblt cryptd fb_sys_fops xhci_hcd drm serio_raw scsi_mod i2c_hid
rtsx_pci usbcore hid i2c_core video button dm_mirror dm_region_hash
dm_log rpcsec_gss_krb5 auth_rpcgss sunrpc snd_hrtimer snd_timer snd
soundcore dm_cache_smq dm_cache dm_persistent_data libcrc32c
crc32c_generic crc32c_intel dm_bufio dm_bio_prison dm_mod efivarfs autofs4
<4>[  917.051442] CPU: 0 PID: 1213 Comm: zabbix_agentd Not tainted 4.9.37-rt25-1 #1
<4>[  917.051443] Hardware name: Acer Aspire E5-575/Ironman_SK  , BIOS V1.25 03/03/2017
<4>[  917.051443] task: ffff8e38f73d6000 task.stack: ffff97dc022b4000
<4>[  917.051447] RIP: 0010:[<ffffffffb8656ad2>]  [<ffffffffb8656ad2>] rt_spin_lock_slowlock+0x362/0x3e0
<4>[  917.051448] RSP: 0018:ffff97dc022b7c10  EFLAGS: 00010082
<4>[  917.051449] RAX: 0000000000000038 RBX: ffff97dc022b7c30 RCX: 0000000000000000
<4>[  917.051449] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000001
<4>[  917.051450] RBP: ffff97dc022b7cd0 R08: 0000000000000000 R09: 0000000000000038
<4>[  917.051450] R10: 0000000000000008 R11: 000000000002b23c R12: ffff8e38f73d6000
<4>[  917.051451] R13: 0000000000000246 R14: ffff8e391000cdd8 R15: ffff8e38f73d6000
<4>[  917.051452] FS:  00007f4f233d4780(0000) GS:ffff8e3910000000(0000) knlGS:0000000000000000
<4>[  917.051452] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[  917.051453] CR2: 00007f4f22050f38 CR3: 00000002368ef000 CR4: 00000000003406f0
<4>[  917.051453] Stack:
<4>[  917.051456]  ffffffffb8586dd5 ffffffffb8cd4880 00ff8e3905410000 ffff8e38f73d6890
<4>[  917.051457]  0000000000000001 0000000000000000 0000000000000000 0000000000000001
<4>[  917.051458]  0000000000000000 0000000000000000 ffff8e38f73d6000 ffff8e391000cdd8
<4>[  917.051459] Call Trace:
<4>[  917.051461]  [<ffffffffb8586dd5>] ? ip_local_out+0x35/0x40
<4>[  917.051464]  [<ffffffffb8659220>] rt_spin_lock__no_mg+0x10/0x20
<4>[  917.051466]  [<ffffffffb806b1e6>] do_current_softirqs+0x116/0x370
<4>[  917.051468]  [<ffffffffb806b49b>] __local_bh_enable+0x5b/0x80
<4>[  917.051472]  [<ffffffffb85a5c6f>] tcp_v4_send_reset+0x3df/0x530
<4>[  917.051475]  [<ffffffffb859d400>] ? tcp_rcv_state_process+0x280/0xda0
<4>[  917.051481]  [<ffffffffb8090b57>] ? migrate_enable+0x1e7/0x360
<4>[  917.051483]  [<ffffffffb85a5f33>] tcp_v4_do_rcv+0x73/0x210
<4>[  917.051487]  [<ffffffffb852147b>] __release_sock+0x6b/0x110
<4>[  917.051489]  [<ffffffffb8521555>] release_sock+0x35/0xa0
<4>[  917.051493]  [<ffffffffb85be516>] inet_shutdown+0x86/0x100
<4>[  917.051494]  [<ffffffffb851e704>] SyS_shutdown+0x84/0x90
<4>[  917.051495]  [<ffffffffb8002ddf>] do_syscall_64+0x7f/0x190
<4>[  917.051496]  [<ffffffffb8659723>] entry_SYSCALL64_slow_path+0x25/0x25
<4>[  917.051508] Code: ff e9 27 fe ff ff e8 1e 42 a7 ff e9 2f fe ff ff 0f 0b 49 8b 56 18 4c 89 e1 89 c6 48 c7 c7 20 56 9b b8 48 83 e2 fe e8 42 c1 b2 ff <0f> 0b 31 d2 b9 01 00 00 00 4c 89 e6 4c 89 f7 e8 5a 30 a6 ff 85
<1>[  917.051509] RIP  [<ffffffffb8656ad2>] rt_spin_lock_slowlock+0x362/0x3e0
<4>[  917.051510]  RSP <ffff97dc022b7c10>
<4>[  917.058528] ---[ end trace 0000000000000002 ]---

That was on 4.9.37-rt25-1
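The -35 in the "Crashing soon on -35" line is a negative errno value, which is how kernel functions report errors. On x86 (and other architectures using the generic errno numbering) 35 is EDEADLK, which matches the "EDEADLK #2" line just before it. A small user-space sketch for decoding such values follows; the function names are hypothetical, only a few codes are mapped for illustration, and the numbers come from the C library's errno.h for the architecture it is built on.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Largest errno the kernel returns as a negative code (its MAX_ERRNO). */
#define KERNEL_MAX_ERRNO 4095

/* Map a few errno numbers to their symbolic names; illustrative only. */
static const char *errno_name(int err)
{
    switch (err) {
    case EDEADLK: return "EDEADLK";
    case EINTR:   return "EINTR";
    case EAGAIN:  return "EAGAIN";
    case ENOMEM:  return "ENOMEM";
    default:      return NULL;
    }
}

/* Hypothetical helper: format a kernel-style return value such as -35 as
 * "-EDEADLK (<strerror text>)" into buf. */
void describe_kernel_ret(int ret, char *buf, size_t len)
{
    if (ret >= 0) {
        snprintf(buf, len, "%d (not an error)", ret);
        return;
    }
    if (ret < -KERNEL_MAX_ERRNO) {
        snprintf(buf, len, "%d (not an errno value)", ret);
        return;
    }

    const char *name = errno_name(-ret);
    if (name != NULL)
        snprintf(buf, len, "-%s (%s)", name, strerror(-ret));
    else
        snprintf(buf, len, "%d (%s)", ret, strerror(-ret));
}
```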

> 
>> Any idea what is the problem?
>> Any hints how to debug it?
> 
> The patch should confirm the origin of the return error code, not the
> reason.

So we have the origin confirmed. How can we find the reason?

> The backtrace comes from networking so with networking disabled,
> it should not get into this particular problem.

Unfortunately, I need the network here.

> One thing you could try, is to see if the latest v4.11 based RT kernel
> works more reliable.

I will compile it now and see.

Thanks.

Jacek


* Re: RT kernel on Acer laptop unreliable
  2017-08-07 13:19 ` Sebastian Andrzej Siewior
  2017-08-12 10:07   ` Jacek Konieczny
@ 2017-08-14 18:03   ` Jacek Konieczny
  2017-08-18 12:44     ` Sebastian Andrzej Siewior
  1 sibling, 1 reply; 19+ messages in thread
From: Jacek Konieczny @ 2017-08-14 18:03 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior; +Cc: linux-rt-users

On 2017-08-07 15:19, Sebastian Andrzej Siewior wrote:
> One thing you could try, is to see if the latest v4.11 based RT kernel
> works more reliable.

I tried 4.11.12-rt9 (with your patch applied, just in case). It worked
fine for a day and then failed like the older kernels:

Aug 13 21:26:34 lolek kernel: EDEADLK #2
Aug 13 21:26:34 lolek kernel: Crashing soon on -35 (ffff8f3f8674df40 ffff8f3f76a11fc0)
Aug 13 21:26:34 lolek kernel: ------------[ cut here ]------------
Aug 13 21:26:34 lolek kernel: kernel BUG at kernel/locking/rtmutex.c:1077!
Aug 13 21:26:34 lolek kernel: invalid opcode: 0000 [#1] PREEMPT SMP
Aug 13 21:26:34 lolek kernel: Modules linked in: snd_seq_dummy snd_seq
cts nfsv4 dns_resolver nfs lockd grace fscache fuse ip6t_REJECT
nf_reject_ipv6 ip6t_rpfilter xt_tcpudp ipt_REJECT nf_reject_ipv4
xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc
ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6
ip6table_mangle ip6table_raw ip6table_security iptable_nat
nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack
iptable_mangle iptable_raw iptable_security ebtable_filter ebtables
ip6table_filter ip6_tables iptable_filter msr bnep joydev acer_wmi
sparse_keymap coretemp hwmon intel_rapl intel_powerclamp intel_cstate
intel_uncore intel_rapl_perf efi_pstore psmouse pcspkr snd_soc_skl
uvcvideo efivars snd_soc_skl_ipc snd_hda_codec_hdmi videobuf2_vmalloc
snd_soc_sst_ipc videobuf2_memops
Aug 13 21:26:34 lolek kernel:  snd_soc_sst_dsp videobuf2_v4l2
snd_hda_ext_core snd_hda_codec_realtek videobuf2_core snd_soc_sst_match
snd_hda_codec_generic videodev input_leds snd_soc_core btusb btrtl media
btbcm snd_compress btintel ac97_bus snd_pcm_dmaengine snd_usb_audio
bluetooth snd_usbmidi_lib snd_hda_intel snd_rawmidi snd_hda_codec
snd_seq_device snd_hda_core snd_hwdep r8169 snd_pcm mii mei_me mei
shpchp dell_smo8800 thermal wmi pinctrl_sunrisepoint pinctrl_intel
intel_lpss_acpi idma64 battery tpm_crb evdev intel_lpss_pci acpi_pad ac
intel_lpss intel_pch_thermal sch_fq_codel ip_tables x_tables ext4 crc16
jbd2 fscrypto mbcache dm_crypt algif_skcipher af_alg hid_generic usbhid
sr_mod cdrom sd_mod rtsx_pci_sdmmc i915 mmc_core crct10dif_pclmul
crc32_pclmul intel_gtt ghash_clmulni_intel ahci i2c_algo_bit libahci
Aug 13 21:26:34 lolek kernel:  drm_kms_helper libata aesni_intel
syscopyarea sysfillrect aes_x86_64 sysimgblt crypto_simd xhci_pci
fb_sys_fops cryptd scsi_mod glue_helper xhci_hcd drm serio_raw rtsx_pci
i2c_hid usbcore hid i2c_core video button dm_mirror dm_region_hash
dm_log rpcsec_gss_krb5 auth_rpcgss sunrpc snd_hrtimer snd_timer snd
soundcore dm_cache_smq dm_cache dm_persistent_data libcrc32c
crc32c_generic crc32c_intel dm_bufio dm_bio_prison dm_mod efivarfs autofs4
Aug 13 21:26:34 lolek kernel: CPU: 3 PID: 1312 Comm: zabbix_agentd Not tainted 4.11.12-rt9-1 #2
Aug 13 21:26:34 lolek kernel: Hardware name: Acer Aspire E5-575/Ironman_SK  , BIOS V1.25 03/03/2017
Aug 13 21:26:34 lolek kernel: task: ffff8f3f76a11fc0 task.stack: ffff90d7022dc000
Aug 13 21:26:34 lolek kernel: RIP: 0010:rt_spin_lock_slowlock_locked+0x2cf/0x300
Aug 13 21:26:34 lolek kernel: RSP: 0018:ffff90d7022dfbe8 EFLAGS: 00010082
Aug 13 21:26:34 lolek kernel: RAX: 0000000000000038 RBX: ffff8f3f76a11fc0 RCX: 0000000000000000
Aug 13 21:26:34 lolek kernel: RDX: 0000000000000000 RSI: 0000000000000038 RDI: 0000000000000001
Aug 13 21:26:34 lolek kernel: RBP: ffff90d7022dfc28 R08: 0000000000000000 R09: 0000000000000038
Aug 13 21:26:34 lolek kernel: R10: 0000000000000008 R11: 000000000001ac04 R12: ffff8f3f76a11fc0
Aug 13 21:26:34 lolek kernel: R13: ffff90d7022dfc38 R14: 0000000000000246 R15: ffff8f3f9158ce18
Aug 13 21:26:34 lolek kernel: FS:  00007f4858271780(0000) GS:ffff8f3f91580000(0000) knlGS:0000000000000000
Aug 13 21:26:34 lolek kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 13 21:26:34 lolek kernel: CR2: 00007f4856eedf38 CR3: 0000000236a1a000 CR4: 00000000003406e0
Aug 13 21:26:34 lolek kernel: Call Trace:
Aug 13 21:26:34 lolek kernel:  rt_spin_lock_slowlock+0x67/0xa0
Aug 13 21:26:34 lolek kernel:  rt_spin_lock+0x1a/0x20
Aug 13 21:26:34 lolek kernel:  do_current_softirqs+0x116/0x390
Aug 13 21:26:34 lolek kernel:  __local_bh_enable+0x5b/0x80
Aug 13 21:26:34 lolek kernel:  tcp_v4_send_reset+0x412/0x590
Aug 13 21:26:34 lolek kernel:  ? tcp_rcv_state_process+0x28a/0xda0
Aug 13 21:26:34 lolek kernel:  ? preempt_count_sub+0xa1/0x100
Aug 13 21:26:34 lolek kernel:  tcp_v4_do_rcv+0x73/0x200
Aug 13 21:26:34 lolek kernel:  ? tcp_v4_do_rcv+0x73/0x200
Aug 13 21:26:34 lolek kernel:  __release_sock+0x6b/0x110
Aug 13 21:26:34 lolek kernel:  release_sock+0x35/0xa0
Aug 13 21:26:34 lolek kernel:  inet_shutdown+0x86/0x100
Aug 13 21:26:34 lolek kernel:  SyS_shutdown+0x84/0x90
Aug 13 21:26:34 lolek kernel:  do_syscall_64+0x7f/0x190
Aug 13 21:26:34 lolek kernel:  entry_SYSCALL64_slow_path+0x25/0x25
Aug 13 21:26:34 lolek kernel: RIP: 0033:0x7f4856c38b57
Aug 13 21:26:34 lolek kernel: RSP: 002b:00007fffe1110558 EFLAGS: 00000202 ORIG_RAX: 0000000000000030
Aug 13 21:26:34 lolek kernel: RAX: ffffffffffffffda RBX: 00007fffe11105e0 RCX: 00007f4856c38b57
Aug 13 21:26:34 lolek kernel: RDX: 00000000ffffffff RSI: 0000000000000002 RDI: 0000000000000007
Aug 13 21:26:34 lolek kernel: RBP: 00000000006671f8 R08: 000000000157edc0 R09: 0000000000000000
Aug 13 21:26:34 lolek kernel: R10: 000000000158e130 R11: 0000000000000202 R12: 000000000044afb0
Aug 13 21:26:34 lolek kernel: R13: 00000000006607dc R14: 00000000006615e0 R15: 00007fffe11105a0
Aug 13 21:26:34 lolek kernel: Code: 89 e6 e8 85 2b a3 ff 85 c0 75 bc e9 89 fd ff ff 0f 0b 49 8b 57 18 4c 89 e1 89 c6 48 c7 c7 00 1c 9c b8 48 83 e2 fe e8 69 cf af ff <0f> 0b 49 8b 47 10 4c 3b 78 38 75 16 49 39 c5 75 85 0f 0b 0f 0b
Aug 13 21:26:34 lolek kernel: RIP: rt_spin_lock_slowlock_locked+0x2cf/0x300 RSP: ffff90d7022dfbe8
Aug 13 21:26:34 lolek kernel: ---[ end trace 0000000000000002 ]---

Jacek

* Re: RT kernel on Acer laptop unreliable
  2017-08-14 18:03   ` Jacek Konieczny
@ 2017-08-18 12:44     ` Sebastian Andrzej Siewior
  2017-08-20 14:48       ` Jacek Konieczny
  0 siblings, 1 reply; 19+ messages in thread
From: Sebastian Andrzej Siewior @ 2017-08-18 12:44 UTC
  To: Jacek Konieczny; +Cc: linux-rt-users

On 2017-08-14 20:03:26 [+0200], Jacek Konieczny wrote:
> On 2017-08-07 15:19, Sebastian Andrzej Siewior wrote:
> > One thing you could try is to see whether the latest v4.11-based RT kernel
> > works more reliably.
> 
> I tried 4.11.12-rt9 (with your patch applied, just in case). It worked
> fine for a day and then failed like the older kernels:

I see. Could you try enabling lockdep and see if it yells in dmesg?
The lockdep options would be:
	CONFIG_DEBUG_RT_MUTEXES=y
	CONFIG_PROVE_LOCKING=y
	CONFIG_DEBUG_ATOMIC_SLEEP=y

> Jacek

Sebastian

* Re: RT kernel on Acer laptop unreliable
  2017-08-18 12:44     ` Sebastian Andrzej Siewior
@ 2017-08-20 14:48       ` Jacek Konieczny
  2017-08-21 13:12         ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 19+ messages in thread
From: Jacek Konieczny @ 2017-08-20 14:48 UTC
  To: Sebastian Andrzej Siewior; +Cc: linux-rt-users

On 2017-08-18 14:44, Sebastian Andrzej Siewior wrote:
> On 2017-08-14 20:03:26 [+0200], Jacek Konieczny wrote:
>> On 2017-08-07 15:19, Sebastian Andrzej Siewior wrote:
>>> One thing you could try is to see whether the latest v4.11-based RT kernel
>>> works more reliably.
>>
>> I tried 4.11.12-rt9 (with your patch applied, just in case). It worked
>> fine for a day and then failed like the older kernels:
> 
> I see. Could you try enabling lockdep and see if it yells in dmesg?
> The lockdep options would be:
> 	CONFIG_DEBUG_RT_MUTEXES=y
> 	CONFIG_PROVE_LOCKING=y
> 	CONFIG_DEBUG_ATOMIC_SLEEP=y

Sure. I have recompiled the kernel with those settings, and got this:


======================================================
[ INFO: possible circular locking dependency detected ]
4.11.12-rt9-1 #3 Not tainted
-------------------------------------------------------
zabbix_agentd/1299 is trying to acquire lock:
 (&per_cpu(local_softirq_locks[i], __cpu).lock){+.+...}, at: [<ffffffff8a06e1ed>] do_current_softirqs+0x14d/0x670

                              but task is already holding lock:
 ((tcp_sk_lock).lock){+.+...}, at: [<ffffffff8a65eab1>] tcp_v4_send_reset+0x3b1/0x7d0

                              which lock already depends on the new lock.

                              the existing dependency chain (in reverse order) is:

                              -> #1 ((tcp_sk_lock).lock){+.+...}:
       lock_acquire+0xb7/0x250
       rt_spin_lock+0x4b/0x60
       tcp_v4_send_reset+0x3b1/0x7d0
       tcp_v4_rcv+0x7c0/0xfb0
       ip_local_deliver_finish+0xe4/0x3d0
       ip_local_deliver+0x1a7/0x220
       ip_rcv_finish+0x222/0x6e0
       ip_rcv+0x3aa/0x540
       __netif_receive_skb_core+0x790/0xdc0
       __netif_receive_skb+0x1d/0x60
       process_backlog+0x9f/0x270
       net_rx_action+0x389/0x6c0
       do_current_softirqs+0x22e/0x670
       __local_bh_enable+0x5b/0x80
       ip_finish_output2+0x2aa/0x5e0
       ip_finish_output+0x229/0x320
       ip_output+0x182/0x260
       ip_local_out+0x39/0x70
       ip_queue_xmit+0x1e8/0x5e0
       tcp_transmit_skb+0x4ce/0x9e0
       tcp_connect+0x658/0x9f0
       tcp_v4_connect+0x56e/0x5b0
       __inet_stream_connect+0xb7/0x320
       inet_stream_connect+0x3b/0x60
       SyS_connect+0xe1/0x120
       do_syscall_64+0x7f/0x210
       return_from_SYSCALL_64+0x0/0x7a

                              -> #0 (&per_cpu(local_softirq_locks[i], __cpu).lock){+.+...}:
       __lock_acquire+0x1b84/0x1d30
       lock_acquire+0xb7/0x250
       rt_spin_lock+0x4b/0x60
       do_current_softirqs+0x14d/0x670
       __local_bh_enable+0x5b/0x80
       tcp_v4_send_reset+0x48b/0x7d0
       tcp_v4_do_rcv+0x73/0x200
       __release_sock+0x86/0x160
       release_sock+0x35/0xc0
       inet_shutdown+0x86/0x100
       SyS_shutdown+0x84/0x90
       do_syscall_64+0x7f/0x210
       return_from_SYSCALL_64+0x0/0x7a

                              other info that might help us debug this:
 Possible unsafe locking scenario:
       CPU0                    CPU1
       ----                    ----
  lock((tcp_sk_lock).lock);
                               lock(&per_cpu(local_softirq_locks[i], __cpu).lock);
                               lock((tcp_sk_lock).lock);
  lock(&per_cpu(local_softirq_locks[i], __cpu).lock);

                               *** DEADLOCK ***
3 locks held by zabbix_agentd/1299:
 #0:  (sk_lock-AF_INET){+.+.+.}, at: [<ffffffff8a67d45b>] inet_shutdown+0x3b/0x100
 #1:  (rcu_read_lock){......}, at: [<ffffffff8a65e834>] tcp_v4_send_reset+0x134/0x7d0
 #2:  ((tcp_sk_lock).lock){+.+...}, at: [<ffffffff8a65eab1>] tcp_v4_send_reset+0x3b1/0x7d0

                              stack backtrace:
CPU: 1 PID: 1299 Comm: zabbix_agentd Not tainted 4.11.12-rt9-1 #3
Hardware name: Acer Aspire E5-575/Ironman_SK  , BIOS V1.25 03/03/2017
Call Trace:
 dump_stack+0x68/0x92
 print_circular_bug+0x1f6/0x300
 __lock_acquire+0x1b84/0x1d30
 ? preempt_count_sub+0xa1/0x100
 lock_acquire+0xb7/0x250
 ? lock_acquire+0xb7/0x250
 ? do_current_softirqs+0x14d/0x670
 rt_spin_lock+0x4b/0x60
 ? do_current_softirqs+0x14d/0x670
 do_current_softirqs+0x14d/0x670
 ? __local_bh_enable+0x23/0x80
 __local_bh_enable+0x5b/0x80
 tcp_v4_send_reset+0x48b/0x7d0
 ? tcp_rcv_state_process+0x28c/0xf20
 tcp_v4_do_rcv+0x73/0x200
 ? tcp_v4_do_rcv+0x73/0x200
 __release_sock+0x86/0x160
 release_sock+0x35/0xc0
 inet_shutdown+0x86/0x100
 SyS_shutdown+0x84/0x90
 do_syscall_64+0x7f/0x210
 entry_SYSCALL64_slow_path+0x25/0x25
RIP: 0033:0x7fbaa1dbbb57
RSP: 002b:00007ffd750de028 EFLAGS: 00000202 ORIG_RAX: 0000000000000030
RAX: ffffffffffffffda RBX: 00007ffd750de0b0 RCX: 00007fbaa1dbbb57
RDX: 00000000ffffffff RSI: 0000000000000002 RDI: 0000000000000007
RBP: 00000000006671f8 R08: 0000000000995de0 R09: 0000000000000000
R10: 00007fbaa206ebe8 R11: 0000000000000202 R12: 000000000044afb0
R13: 00000000006607dc R14: 00000000006615e0 R15: 00007ffd750de070

Does this help?

The system has not crashed yet; I may catch something more.

Jacek

* Re: RT kernel on Acer laptop unreliable
  2017-08-20 14:48       ` Jacek Konieczny
@ 2017-08-21 13:12         ` Sebastian Andrzej Siewior
  2017-08-23 17:04           ` Jacek Konieczny
  0 siblings, 1 reply; 19+ messages in thread
From: Sebastian Andrzej Siewior @ 2017-08-21 13:12 UTC
  To: Jacek Konieczny; +Cc: linux-rt-users

On 2017-08-20 16:48:24 [+0200], Jacek Konieczny wrote:
> Does this help?

yup, the following patch should make it go away.

diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 55c5f068d986..8bbb8f967614 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -713,8 +713,8 @@ static void tcp_v4_send_reset(const struct sock *sk, struct sk_buff *skb)
 
 	arg.tos = ip_hdr(skb)->tos;
 	arg.uid = sock_net_uid(net, sk && sk_fullsock(sk) ? sk : NULL);
-	local_lock(tcp_sk_lock);
 	local_bh_disable();
+	local_lock(tcp_sk_lock);
 	ip_send_unicast_reply(*this_cpu_ptr(net->ipv4.tcp_sk),
 			      skb, &TCP_SKB_CB(skb)->header.h4.opt,
 			      ip_hdr(skb)->saddr, ip_hdr(skb)->daddr,
@@ -722,8 +722,8 @@ static void tcp_v4_send_reset(const struct sock *sk, struct sk_buff *skb)
 
 	__TCP_INC_STATS(net, TCP_MIB_OUTSEGS);
 	__TCP_INC_STATS(net, TCP_MIB_OUTRSTS);
-	local_bh_enable();
 	local_unlock(tcp_sk_lock);
+	local_bh_enable();
 
 #ifdef CONFIG_TCP_MD5SIG
 out:
@@ -801,16 +801,16 @@ static void tcp_v4_send_ack(const struct sock *sk,
 		arg.bound_dev_if = oif;
 	arg.tos = tos;
 	arg.uid = sock_net_uid(net, sk_fullsock(sk) ? sk : NULL);
-	local_lock(tcp_sk_lock);
 	local_bh_disable();
+	local_lock(tcp_sk_lock);
 	ip_send_unicast_reply(*this_cpu_ptr(net->ipv4.tcp_sk),
 			      skb, &TCP_SKB_CB(skb)->header.h4.opt,
 			      ip_hdr(skb)->saddr, ip_hdr(skb)->daddr,
 			      &arg, arg.iov[0].iov_len);
 
 	__TCP_INC_STATS(net, TCP_MIB_OUTSEGS);
-	local_bh_enable();
 	local_unlock(tcp_sk_lock);
+	local_bh_enable();
 }
 
 static void tcp_v4_timewait_ack(struct sock *sk, struct sk_buff *skb)


> The system has not crashed yet; I may catch something more.

okay.

> Jacek

Sebastian

* Re: RT kernel on Acer laptop unreliable
  2017-08-21 13:12         ` Sebastian Andrzej Siewior
@ 2017-08-23 17:04           ` Jacek Konieczny
  2017-09-05  9:17             ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 19+ messages in thread
From: Jacek Konieczny @ 2017-08-23 17:04 UTC
  To: Sebastian Andrzej Siewior; +Cc: linux-rt-users

On 2017-08-21 15:12, Sebastian Andrzej Siewior wrote:
> On 2017-08-20 16:48:24 [+0200], Jacek Konieczny wrote:
>> Does this help?
> 
> yup, the following patch should make it go away.
> 
> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
[...]

I have applied the patch and have had no crashes since then, but another
problem was detected:


======================================================
[ INFO: possible circular locking dependency detected ]
4.11.12-rt9-1 #4 Not tainted
-------------------------------------------------------
irq/132-ath10k_/1008 is trying to acquire lock:
 ((icmp_sk_lock).lock){+.+...}, at: [<ffffffff92677264>] icmp_send+0x144/0x7a0

                              but task is already holding lock:
 ((xt_write_lock).lock){+.+...}, at: [<ffffffffc059b256>] ipt_do_table+0xd6/0x710 [ip_tables]

                              which lock already depends on the new lock.

                              the existing dependency chain (in reverse order) is:

                              -> #1 ((xt_write_lock).lock){+.+...}:
       lock_acquire+0xb7/0x250
       rt_spin_lock+0x4b/0x60
       ipt_do_table+0xd6/0x710 [ip_tables]
       iptable_raw_hook+0x27/0x60 [iptable_raw]
       nf_hook_slow+0x2c/0xf0
       __ip_local_out+0x134/0x2e0
       ip_local_out+0x1c/0x70
       ip_send_skb+0x19/0x40
       ip_push_pending_frames+0x33/0x40
       icmp_push_reply+0xf5/0x130
       icmp_send+0x754/0x7a0
       __udp4_lib_rcv+0x927/0xc00
       udp_rcv+0x1a/0x20
       ip_local_deliver_finish+0xe4/0x3d0
       ip_local_deliver+0x1a7/0x220
       ip_rcv_finish+0x222/0x6e0
       ip_rcv+0x3aa/0x540
       __netif_receive_skb_core+0x790/0xdc0
       __netif_receive_skb+0x1d/0x60
       process_backlog+0x9f/0x270
       net_rx_action+0x389/0x6c0
       do_current_softirqs+0x22e/0x670
       __local_bh_enable+0x5b/0x80
       ip_finish_output2+0x2aa/0x5e0
       ip_finish_output+0x229/0x320
       ip_output+0x182/0x260
       ip_local_out+0x39/0x70
       ip_send_skb+0x19/0x40
       udp_send_skb+0x14d/0x280
       udp_sendmsg+0x364/0xb30
       inet_sendmsg+0x4a/0x1c0
       sock_sendmsg+0x38/0x50
       SyS_sendto+0x10a/0x190
       do_syscall_64+0x7f/0x210
       return_from_SYSCALL_64+0x0/0x7a

                              -> #0 ((icmp_sk_lock).lock){+.+...}:
       __lock_acquire+0x1b84/0x1d30
       lock_acquire+0xb7/0x250
       rt_spin_lock+0x4b/0x60
       icmp_send+0x144/0x7a0
       nf_send_unreach+0x97/0xa00 [nf_reject_ipv4]
       reject_tg+0x2b/0x96 [ipt_REJECT]
       ipt_do_table+0x353/0x710 [ip_tables]
       iptable_filter_hook+0x27/0x57 [iptable_filter]
       nf_hook_slow+0x2c/0xf0
       ip_local_deliver+0xfb/0x220
       ip_rcv_finish+0x222/0x6e0
       ip_rcv+0x3aa/0x540
       __netif_receive_skb_core+0x790/0xdc0
       __netif_receive_skb+0x1d/0x60
       netif_receive_skb_internal+0x7c/0x200
       napi_gro_receive+0x16b/0x210
       ieee80211_deliver_skb+0xe9/0x220 [mac80211]
       ieee80211_rx_handlers+0x1df8/0x2700 [mac80211]
       ieee80211_invoke_rx_handlers+0x155/0x750 [mac80211]
       ieee80211_prepare_and_rx_handle+0x586/0xb00 [mac80211]
       ieee80211_rx_napi+0x35a/0xc10 [mac80211]
       ath10k_htt_rx_h_deliver+0xae/0x120 [ath10k_core]
       ath10k_htt_rx_handle_amsdu+0x7c0/0x910 [ath10k_core]
       ath10k_htt_txrx_compl_task+0x61e/0x790 [ath10k_core]
       ath10k_pci_napi_poll+0x4a/0xf0 [ath10k_pci]
       net_rx_action+0x389/0x6c0
       do_current_softirqs+0x22e/0x670
       __local_bh_enable+0x5b/0x80
       irq_forced_thread_fn+0x55/0x70
       irq_thread+0x15e/0x200
       kthread+0x114/0x150
       ret_from_fork+0x27/0x40

                              other info that might help us debug this:
 Possible unsafe locking scenario:
       CPU0                    CPU1
       ----                    ----
  lock((xt_write_lock).lock);
                               lock((icmp_sk_lock).lock);
                               lock((xt_write_lock).lock);
  lock((icmp_sk_lock).lock);

                               *** DEADLOCK ***
6 locks held by irq/132-ath10k_/1008:
 #0:  (&per_cpu(local_softirq_locks[i], __cpu).lock){+.+...}, at: [<ffffffff9206e1ed>] do_current_softirqs+0x14d/0x670
 #1:  (rcu_read_lock){......}, at: [<ffffffffc0b15762>] ieee80211_rx_napi+0xb2/0xc10 [mac80211]
 #2:  (&local->rx_path_lock){+.+...}, at: [<ffffffffc0b11a60>] ieee80211_rx_handlers+0x40/0x2700 [mac80211]
 #3:  (rcu_read_lock){......}, at: [<ffffffff925e138e>] netif_receive_skb_internal+0x2e/0x200
 #4:  (rcu_read_lock){......}, at: [<ffffffff92636356>] ip_local_deliver+0x66/0x220
 #5:  ((xt_write_lock).lock){+.+...}, at: [<ffffffffc059b256>] ipt_do_table+0xd6/0x710 [ip_tables]

                              stack backtrace:
CPU: 0 PID: 1008 Comm: irq/132-ath10k_ Not tainted 4.11.12-rt9-1 #4
Hardware name: Acer Aspire E5-575/Ironman_SK  , BIOS V1.25 03/03/2017
Call Trace:
 dump_stack+0x68/0x92
 print_circular_bug+0x1f6/0x300
 __lock_acquire+0x1b84/0x1d30
 ? __this_cpu_preempt_check+0x13/0x20
 lock_acquire+0xb7/0x250
 ? lock_acquire+0xb7/0x250
 ? icmp_send+0x144/0x7a0
 rt_spin_lock+0x4b/0x60
 ? icmp_send+0x144/0x7a0
 icmp_send+0x144/0x7a0
 ? __this_cpu_preempt_check+0x13/0x20
 ? trace_hardirqs_on_caller+0xef/0x210
 ? preempt_count_sub+0xa1/0x100
 ? lock_acquire+0xb7/0x250
 ? ipt_do_table+0xd6/0x710 [ip_tables]
 nf_send_unreach+0x97/0xa00 [nf_reject_ipv4]
 reject_tg+0x2b/0x96 [ipt_REJECT]
 ipt_do_table+0x353/0x710 [ip_tables]
 iptable_filter_hook+0x27/0x57 [iptable_filter]
 nf_hook_slow+0x2c/0xf0
 ip_local_deliver+0xfb/0x220
 ? inet_del_offload+0x40/0x40
 ip_rcv_finish+0x222/0x6e0
 ip_rcv+0x3aa/0x540
 ? ip_local_deliver_finish+0x3d0/0x3d0
 __netif_receive_skb_core+0x790/0xdc0
 ? lock_acquire+0xb7/0x250
 ? __this_cpu_preempt_check+0x13/0x20
 ? netif_receive_skb_internal+0x2e/0x200
 __netif_receive_skb+0x1d/0x60
 netif_receive_skb_internal+0x7c/0x200
 napi_gro_receive+0x16b/0x210
 ieee80211_deliver_skb+0xe9/0x220 [mac80211]
 ieee80211_rx_handlers+0x1df8/0x2700 [mac80211]
 ieee80211_invoke_rx_handlers+0x155/0x750 [mac80211]
 ? __lock_acquire+0x516/0x1d30
 ieee80211_prepare_and_rx_handle+0x586/0xb00 [mac80211]
 ? sta_info_hash_lookup+0x13a/0x250 [mac80211]
 ieee80211_rx_napi+0x35a/0xc10 [mac80211]
 ? ath10k_htt_rx_h_mpdu+0x40d/0x840 [ath10k_core]
 ath10k_htt_rx_h_deliver+0xae/0x120 [ath10k_core]
 ? ath10k_htt_rx_h_ppdu+0xbe/0x2c0 [ath10k_core]
 ath10k_htt_rx_handle_amsdu+0x7c0/0x910 [ath10k_core]
 ? __lock_acquire+0x516/0x1d30
 ath10k_htt_txrx_compl_task+0x61e/0x790 [ath10k_core]
 ? _raw_spin_unlock_irqrestore+0x80/0x90
 ? __this_cpu_preempt_check+0x13/0x20
 ? __local_bh_enable+0x31/0x80
 ? __this_cpu_preempt_check+0x13/0x20
 ? trace_hardirqs_on_caller+0xef/0x210
 ? trace_hardirqs_on+0xd/0x10
 ? __local_bh_enable+0x31/0x80
 ? ath10k_ce_per_engine_service+0xc3/0xe0 [ath10k_pci]
 ath10k_pci_napi_poll+0x4a/0xf0 [ath10k_pci]
 net_rx_action+0x389/0x6c0
 ? trace_hardirqs_on_caller+0xef/0x210
 ? preempt_count_sub+0xa1/0x100
 do_current_softirqs+0x22e/0x670
 __local_bh_enable+0x5b/0x80
 irq_forced_thread_fn+0x55/0x70
 ? irq_thread+0xb7/0x200
 irq_thread+0x15e/0x200
 ? irq_finalize_oneshot.part.2+0xe0/0xe0
 ? wake_threads_waitq+0x30/0x30
 kthread+0x114/0x150
 ? irq_thread_dtor+0xc0/0xc0
 ? kthread_create_on_node+0x70/0x70
 ret_from_fork+0x27/0x40


Jacek

* Re: RT kernel on Acer laptop unreliable
  2017-08-23 17:04           ` Jacek Konieczny
@ 2017-09-05  9:17             ` Sebastian Andrzej Siewior
  2017-09-05 19:37               ` Jacek Konieczny
  2017-09-22  7:40               ` Sebastian Andrzej Siewior
  0 siblings, 2 replies; 19+ messages in thread
From: Sebastian Andrzej Siewior @ 2017-09-05  9:17 UTC
  To: Jacek Konieczny; +Cc: linux-rt-users

On 2017-08-23 19:04:20 [+0200], Jacek Konieczny wrote:
> On 2017-08-21 15:12, Sebastian Andrzej Siewior wrote:
> > On 2017-08-20 16:48:24 [+0200], Jacek Konieczny wrote:
> >> Does this help?
> > 
> > yup, the following patch should make it go away.
> > 
> > diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> [...]
> 
> I have applied the patch and have had no crashes since then, but another
> problem was detected:
I'm sorry, I missed that. What about this on top?

diff --git a/include/linux/locallock.h b/include/linux/locallock.h
index eeb1a66df402..298afcd8e219 100644
--- a/include/linux/locallock.h
+++ b/include/linux/locallock.h
@@ -61,6 +61,9 @@ static inline int __local_trylock(struct local_irq_lock *lv)
 		lv->owner = current;
 		lv->nestcnt = 1;
 		return 1;
+	} else if (lv->owner == current) {
+		lv->nestcnt++;
+		return 1;
 	}
 	return 0;
 }
diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index 8ea63314f196..169b27596bc7 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -218,12 +218,16 @@ static inline struct sock *icmp_xmit_lock(struct net *net)
 {
 	struct sock *sk;
 
+	if (!local_trylock(icmp_sk_lock))
+		return NULL;
+
 	sk = icmp_sk(net);
 
 	if (unlikely(!spin_trylock(&sk->sk_lock.slock))) {
 		/* This can happen if the output path signals a
 		 * dst_link_failure() for an outgoing ICMP packet.
 		 */
+		local_unlock(icmp_sk_lock);
 		return NULL;
 	}
 	return sk;
@@ -232,6 +236,7 @@ static inline struct sock *icmp_xmit_lock(struct net *net)
 static inline void icmp_xmit_unlock(struct sock *sk)
 {
 	spin_unlock(&sk->sk_lock.slock);
+	local_unlock(icmp_sk_lock);
 }
 
 int sysctl_icmp_msgs_per_sec __read_mostly = 1000;
@@ -421,7 +426,6 @@ static void icmp_reply(struct icmp_bxm *icmp_param, struct sk_buff *skb)
 
 	/* Needed by both icmp_global_allow and icmp_xmit_lock */
 	local_bh_disable();
-	local_lock(icmp_sk_lock);
 
 	/* global icmp_msgs_per_sec */
 	if (!icmpv4_global_allow(net, type, code))
@@ -466,7 +470,6 @@ static void icmp_reply(struct icmp_bxm *icmp_param, struct sk_buff *skb)
 out_unlock:
 	icmp_xmit_unlock(sk);
 out_bh_enable:
-	local_unlock(icmp_sk_lock);
 	local_bh_enable();
 }
 
@@ -679,7 +682,6 @@ void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info)
 
 	/* Needed by both icmp_global_allow and icmp_xmit_lock */
 	local_bh_disable();
-	local_lock(icmp_sk_lock);
 
 	/* Check global sysctl_icmp_msgs_per_sec ratelimit, unless
 	 * incoming dev is loopback.  If outgoing dev change to not be
@@ -768,7 +770,6 @@ void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info)
 out_unlock:
 	icmp_xmit_unlock(sk);
 out_bh_enable:
-	local_unlock(icmp_sk_lock);
 	local_bh_enable();
 out:;
 }
> 
> Jacek

Sebastian

* Re: RT kernel on Acer laptop unreliable
  2017-09-05  9:17             ` Sebastian Andrzej Siewior
@ 2017-09-05 19:37               ` Jacek Konieczny
  2017-09-10 12:49                 ` Jacek Konieczny
  2017-09-22  7:40               ` Sebastian Andrzej Siewior
  1 sibling, 1 reply; 19+ messages in thread
From: Jacek Konieczny @ 2017-09-05 19:37 UTC
  To: Sebastian Andrzej Siewior; +Cc: linux-rt-users

On 2017-09-05 11:17, Sebastian Andrzej Siewior wrote:
> On 2017-08-23 19:04:20 [+0200], Jacek Konieczny wrote:
>> I have applied the patch and have had no crashes since then, but another
>> problem was detected:
> I'm sorry, I missed that.

No problem. I was away on holiday anyway.

> What about this on top?

That would not apply on top of my previous source, so I took
patch-4.11.12-rt13.patch and applied it over that. I should have some
results in a few days.

Jacek

* Re: RT kernel on Acer laptop unreliable
  2017-09-05 19:37               ` Jacek Konieczny
@ 2017-09-10 12:49                 ` Jacek Konieczny
  2017-09-11  7:17                   ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 19+ messages in thread
From: Jacek Konieczny @ 2017-09-10 12:49 UTC
  To: Sebastian Andrzej Siewior; +Cc: linux-rt-users

On 2017-09-05 21:37, Jacek Konieczny wrote:
> On 2017-09-05 11:17, Sebastian Andrzej Siewior wrote:
>> On 2017-08-23 19:04:20 [+0200], Jacek Konieczny wrote:
>>> I have applied the patch and have had no crashes since then, but another
>>> problem was detected:
>> I'm sorry, I missed that.
> 
> No problem. I was away on holiday anyway.
> 
>> What about this on top?
> 
> That would not apply on top of my previous source, so I took
> patch-4.11.12-rt13.patch and applied it over that. I should have some
> results in a few days.

I have been running it for a few days with no problems. It seems
this has fixed the locking problem for me.


Jacek

* Re: RT kernel on Acer laptop unreliable
  2017-09-10 12:49                 ` Jacek Konieczny
@ 2017-09-11  7:17                   ` Sebastian Andrzej Siewior
  2017-09-14 19:11                     ` Jacek Konieczny
  0 siblings, 1 reply; 19+ messages in thread
From: Sebastian Andrzej Siewior @ 2017-09-11  7:17 UTC
  To: Jacek Konieczny; +Cc: linux-rt-users

On 2017-09-10 14:49:24 [+0200], Jacek Konieczny wrote:
> I have been running it for a few days with no problems. It seems
> this has fixed the locking problem for me.

Okay. Can you try without lockdep, please?

> 
> Jacek

Sebastian

* Re: RT kernel on Acer laptop unreliable
  2017-09-11  7:17                   ` Sebastian Andrzej Siewior
@ 2017-09-14 19:11                     ` Jacek Konieczny
  2017-09-21 12:57                       ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 19+ messages in thread
From: Jacek Konieczny @ 2017-09-14 19:11 UTC
  To: Sebastian Andrzej Siewior; +Cc: linux-rt-users

On 2017-09-11 09:17, Sebastian Andrzej Siewior wrote:
> On 2017-09-10 14:49:24 [+0200], Jacek Konieczny wrote:
>> I have been running it for a few days with no problems. It seems
>> this has fixed the locking problem for me.
> 
> Okay. Can you try without lockdep, please?

I have been using it for a few days now: still no crashes or lockups
here, and performance seems significantly better without lockdep
(a subjective impression; I have not measured it).

Thanks for the patches!

Jacek

* Re: RT kernel on Acer laptop unreliable
  2017-09-14 19:11                     ` Jacek Konieczny
@ 2017-09-21 12:57                       ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 19+ messages in thread
From: Sebastian Andrzej Siewior @ 2017-09-21 12:57 UTC
  To: Jacek Konieczny; +Cc: linux-rt-users

On 2017-09-14 21:11:24 [+0200], Jacek Konieczny wrote:
> On 2017-09-11 09:17, Sebastian Andrzej Siewior wrote:
> > On 2017-09-10 14:49:24 [+0200], Jacek Konieczny wrote:
> >> I have been running it for a few days with no problems. It seems
> >> this has fixed the locking problem for me.
> > 
> > Okay. Can you try without lockdep, please?
> 
> I have been using it for a few days now: still no crashes or lockups
> here, and performance seems significantly better without lockdep
> (a subjective impression; I have not measured it).

yes, without lockdep the performance is way better. But it is needed for
debugging :)

> Thanks for the patches!

Thanks for the confirmation that it is working now.

> Jacek

Sebastian

* Re: RT kernel on Acer laptop unreliable
  2017-09-05  9:17             ` Sebastian Andrzej Siewior
  2017-09-05 19:37               ` Jacek Konieczny
@ 2017-09-22  7:40               ` Sebastian Andrzej Siewior
  1 sibling, 0 replies; 19+ messages in thread
From: Sebastian Andrzej Siewior @ 2017-09-22  7:40 UTC
  To: Jacek Konieczny; +Cc: linux-rt-users

On 2017-09-05 11:17:13 [+0200], To Jacek Konieczny wrote:
> I'm sorry, I missed that. What about this on top?
> 
> diff --git a/include/linux/locallock.h b/include/linux/locallock.h
> index eeb1a66df402..298afcd8e219 100644
> --- a/include/linux/locallock.h
> +++ b/include/linux/locallock.h
> @@ -61,6 +61,9 @@ static inline int __local_trylock(struct local_irq_lock *lv)
>  		lv->owner = current;
>  		lv->nestcnt = 1;
>  		return 1;
> +	} else if (lv->owner == current) {
> +		lv->nestcnt++;
> +		return 1;
>  	}
>  	return 0;
>  }

I am going to split that part out into a separate patch and also add
the !RT part to it.

--- a/include/linux/locallock.h
+++ b/include/linux/locallock.h
@@ -68,6 +68,9 @@ static inline int __local_trylock(struct
 		lv->owner = current;
 		lv->nestcnt = 1;
 		return 1;
+	} else if (lv->owner == current) {
+		lv->nestcnt++;
+		return 1;
 	}
 	return 0;
 }
@@ -238,6 +241,12 @@ static inline int __local_unlock_irqrest
 
 static inline void local_irq_lock_init(int lvar) { }
 
+#define local_trylock(lvar)					\
+	({							\
+		preempt_disable();				\
+		1;						\
+	})
+
 #define local_lock(lvar)			preempt_disable()
 #define local_unlock(lvar)			preempt_enable()
 #define local_lock_irq(lvar)			local_irq_disable()

Thread overview: 19+ messages
2017-07-15 17:43 RT kernel on Acer laptop unreliable Jacek Konieczny
2017-07-17  6:21 ` Stéphane Ancelot
2017-07-17  6:31   ` Piotr Gregor
2017-07-17 15:06     ` Jacek Konieczny
2017-07-17 17:59       ` Piotr Gregor
2017-08-07 13:19 ` Sebastian Andrzej Siewior
2017-08-12 10:07   ` Jacek Konieczny
2017-08-14 18:03   ` Jacek Konieczny
2017-08-18 12:44     ` Sebastian Andrzej Siewior
2017-08-20 14:48       ` Jacek Konieczny
2017-08-21 13:12         ` Sebastian Andrzej Siewior
2017-08-23 17:04           ` Jacek Konieczny
2017-09-05  9:17             ` Sebastian Andrzej Siewior
2017-09-05 19:37               ` Jacek Konieczny
2017-09-10 12:49                 ` Jacek Konieczny
2017-09-11  7:17                   ` Sebastian Andrzej Siewior
2017-09-14 19:11                     ` Jacek Konieczny
2017-09-21 12:57                       ` Sebastian Andrzej Siewior
2017-09-22  7:40               ` Sebastian Andrzej Siewior