linux-kernel.vger.kernel.org archive mirror
* [ANNOUNCE] 4.1.3-rt3
From: Sebastian Andrzej Siewior @ 2015-07-25 10:32 UTC
  To: linux-rt-users; +Cc: LKML, Thomas Gleixner, rostedt, John Kacur

Dear RT folks!

I'm pleased to announce the v4.1.3-rt3 patch set.
Changes since v4.1.3-rt2:

- fix compile of locktorture. Patch by Wolfgang M. Reimer.

- fix compile of pid_namespace without lockdep on ARM. Patch by Grygorii
  Strashko.

- The annoying "cpufreq_stat_notifier_trans: No policy found" is finally
  gone. 

- xor / raid_pq
  The max latency will increase into the ms range if the raid6_pq module
  is loaded. This should not matter under normal circumstances because
  that module should only be loaded at boot time if required (and not
  while a -RT task is active in production). It might also get loaded
  manually at run-time.
  Dropping the preempt_disable() might cause different benchmark results
  for the individual implementations. People who don't care (load it at
  run-time) don't need to load it at all. People who care (load it at
  boot time) would prefer to stick with the best implementation.
  Therefore I think it is enough to document this (don't load it at
  run-time if you don't need it) and I cross it off my list. Patches are
  welcome if someone needs / has an improvement.
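
As an illustration of the "don't load it at run-time" advice, here is a
minimal user-space sketch in C that checks whether a module name shows
up in text formatted like /proc/modules (the module name is the first
whitespace-separated field of each line). The helper names
module_listed() and module_loaded_in_file() are hypothetical and not
part of the patch set; the 64 KiB buffer is an arbitrary assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Return 1 if `name` is the first field of any line in `modules_text`,
 * which is expected to look like the contents of /proc/modules, else 0.
 * Hypothetical helper for illustration only.
 */
int module_listed(const char *modules_text, const char *name)
{
	const char *line = modules_text;
	size_t len;

	assert(modules_text != NULL && name != NULL);
	len = strlen(name);

	while (*line) {
		const char *eol = strchr(line, '\n');

		/* Match the whole first field, not just a prefix of it. */
		if (strncmp(line, name, len) == 0 &&
		    (line[len] == ' ' || line[len] == '\t'))
			return 1;
		if (!eol)
			break;
		line = eol + 1;
	}
	return 0;
}

/*
 * Read a /proc/modules-style file and run the check on it.
 * Returns 1 or 0, or -1 if the file cannot be opened.
 * Only the first 64 KiB are inspected (assumption for this sketch).
 */
int module_loaded_in_file(const char *path, const char *name)
{
	static char buf[1 << 16];
	size_t n;
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	n = fread(buf, 1, sizeof(buf) - 1, f);
	fclose(f);
	buf[n] = '\0';
	return module_listed(buf, name);
}
```

A start-up script for an -RT workload could call
module_loaded_in_file("/proc/modules", "raid6_pq") and warn if the
result is 1.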

Known issues:
 
- bcache is disabled.

- CPU hotplug works in general. However, Steven's test script usually
  deadlocks on the second invocation.

You can get this release via the git tree at:

    git://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git linux-4.1.y-rt
    git://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git linux-4.1.y-rt-rebase
    git://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git linux-4.1.y-rt-queue

The delta patch against 4.1.3-rt2 is appended below and can be found here:

    https://www.kernel.org/pub/linux/kernel/projects/rt/4.1/incr/patch-4.1.3-rt2-rt3.patch.xz

The RT patch against 4.1.3 can be found here:

    https://www.kernel.org/pub/linux/kernel/projects/rt/4.1/patch-4.1.3-rt3.patch.xz

The split quilt queue is available at:

    https://www.kernel.org/pub/linux/kernel/projects/rt/4.1/patches-4.1.3-rt3.tar.xz

Sebastian

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 8ae655c364f4..ce1d93e93d1a 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -64,12 +64,6 @@ static inline bool has_target(void)
 	return cpufreq_driver->target_index || cpufreq_driver->target;
 }
 
-/*
- * rwsem to guarantee that cpufreq driver module doesn't unload during critical
- * sections
- */
-static DECLARE_RWSEM(cpufreq_rwsem);
-
 /* internal prototypes */
 static int __cpufreq_governor(struct cpufreq_policy *policy,
 		unsigned int event);
@@ -215,9 +209,6 @@ struct cpufreq_policy *cpufreq_cpu_get(unsigned int cpu)
 	if (cpu >= nr_cpu_ids)
 		return NULL;
 
-	if (!down_read_trylock(&cpufreq_rwsem))
-		return NULL;
-
 	/* get the cpufreq driver */
 	read_lock_irqsave(&cpufreq_driver_lock, flags);
 
@@ -230,9 +221,6 @@ struct cpufreq_policy *cpufreq_cpu_get(unsigned int cpu)
 
 	read_unlock_irqrestore(&cpufreq_driver_lock, flags);
 
-	if (!policy)
-		up_read(&cpufreq_rwsem);
-
 	return policy;
 }
 EXPORT_SYMBOL_GPL(cpufreq_cpu_get);
@@ -240,7 +228,6 @@ EXPORT_SYMBOL_GPL(cpufreq_cpu_get);
 void cpufreq_cpu_put(struct cpufreq_policy *policy)
 {
 	kobject_put(&policy->kobj);
-	up_read(&cpufreq_rwsem);
 }
 EXPORT_SYMBOL_GPL(cpufreq_cpu_put);
 
@@ -765,9 +752,6 @@ static ssize_t show(struct kobject *kobj, struct attribute *attr, char *buf)
 	struct freq_attr *fattr = to_attr(attr);
 	ssize_t ret;
 
-	if (!down_read_trylock(&cpufreq_rwsem))
-		return -EINVAL;
-
 	down_read(&policy->rwsem);
 
 	if (fattr->show)
@@ -776,7 +760,6 @@ static ssize_t show(struct kobject *kobj, struct attribute *attr, char *buf)
 		ret = -EIO;
 
 	up_read(&policy->rwsem);
-	up_read(&cpufreq_rwsem);
 
 	return ret;
 }
@@ -793,9 +776,6 @@ static ssize_t store(struct kobject *kobj, struct attribute *attr,
 	if (!cpu_online(policy->cpu))
 		goto unlock;
 
-	if (!down_read_trylock(&cpufreq_rwsem))
-		goto unlock;
-
 	down_write(&policy->rwsem);
 
 	if (fattr->store)
@@ -804,8 +784,6 @@ static ssize_t store(struct kobject *kobj, struct attribute *attr,
 		ret = -EIO;
 
 	up_write(&policy->rwsem);
-
-	up_read(&cpufreq_rwsem);
 unlock:
 	put_online_cpus();
 
@@ -1117,16 +1095,12 @@ static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif)
 	if (unlikely(policy))
 		return 0;
 
-	if (!down_read_trylock(&cpufreq_rwsem))
-		return 0;
-
 	/* Check if this cpu was hot-unplugged earlier and has siblings */
 	read_lock_irqsave(&cpufreq_driver_lock, flags);
 	for_each_policy(policy) {
 		if (cpumask_test_cpu(cpu, policy->related_cpus)) {
 			read_unlock_irqrestore(&cpufreq_driver_lock, flags);
 			ret = cpufreq_add_policy_cpu(policy, cpu, dev);
-			up_read(&cpufreq_rwsem);
 			return ret;
 		}
 	}
@@ -1269,8 +1243,6 @@ static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif)
 
 	kobject_uevent(&policy->kobj, KOBJ_ADD);
 
-	up_read(&cpufreq_rwsem);
-
 	/* Callback for handling stuff after policy is ready */
 	if (cpufreq_driver->ready)
 		cpufreq_driver->ready(policy);
@@ -1304,8 +1276,6 @@ static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif)
 	cpufreq_policy_free(policy);
 
 nomem_out:
-	up_read(&cpufreq_rwsem);
-
 	return ret;
 }
 
@@ -2499,19 +2469,20 @@ int cpufreq_unregister_driver(struct cpufreq_driver *driver)
 
 	pr_debug("unregistering driver %s\n", driver->name);
 
+	/* Protect against concurrent cpu hotplug */
+	get_online_cpus();
 	subsys_interface_unregister(&cpufreq_interface);
 	if (cpufreq_boost_supported())
 		cpufreq_sysfs_remove_file(&boost.attr);
 
 	unregister_hotcpu_notifier(&cpufreq_cpu_notifier);
 
-	down_write(&cpufreq_rwsem);
 	write_lock_irqsave(&cpufreq_driver_lock, flags);
 
 	cpufreq_driver = NULL;
 
 	write_unlock_irqrestore(&cpufreq_driver_lock, flags);
-	up_write(&cpufreq_rwsem);
+	put_online_cpus();
 
 	return 0;
 }
diff --git a/include/linux/pid.h b/include/linux/pid.h
index 23705a53abba..2cc64b779f03 100644
--- a/include/linux/pid.h
+++ b/include/linux/pid.h
@@ -2,6 +2,7 @@
 #define _LINUX_PID_H
 
 #include <linux/rcupdate.h>
+#include <linux/atomic.h>
 
 enum pid_type
 {
diff --git a/kernel/locking/locktorture.c b/kernel/locking/locktorture.c
index ec8cce259779..aa60d919e336 100644
--- a/kernel/locking/locktorture.c
+++ b/kernel/locking/locktorture.c
@@ -24,7 +24,6 @@
 #include <linux/module.h>
 #include <linux/kthread.h>
 #include <linux/spinlock.h>
-#include <linux/rwlock.h>
 #include <linux/mutex.h>
 #include <linux/rwsem.h>
 #include <linux/smp.h>
diff --git a/localversion-rt b/localversion-rt
index c3054d08a112..1445cd65885c 100644
--- a/localversion-rt
+++ b/localversion-rt
@@ -1 +1 @@
--rt2
+-rt3

* Re: [ANNOUNCE] 4.1.3-rt3 - xmit queue timeout, oops, rcu stalls
From: Fernando Lopez-Lezcano @ 2015-08-06 17:50 UTC
  To: Sebastian Andrzej Siewior, linux-rt-users
  Cc: nando, LKML, Thomas Gleixner, rostedt, John Kacur

On 07/25/2015 03:32 AM, Sebastian Andrzej Siewior wrote:
> Dear RT folks!
>
> I'm pleased to announce the v4.1.3-rt3 patch set.
...

I've had a few hangs with nothing left behind to debug... but today I
found this:

(NOTE: I'm attaching a file with the details, as I don't know if my mailer
will mangle these lines)

----
Aug  5 10:46:18 localhost kernel: [ 2343.673560] WARNING: CPU: 3 PID: 43 at net/sched/sch_generic.c:303 dev_watchdog+0x26f/0x280()
Aug  5 10:46:18 localhost kernel: [ 2343.673561] NETDEV WATCHDOG: eth1 (e1000e): transmit queue 0 timed out
----

and then:

----
Aug  5 10:46:18 localhost kernel: [ 2343.673679] e1000e 0000:04:00.0 eth1: Reset adapter unexpectedly
Aug  5 10:46:30 localhost kernel: [ 2355.706987] ata5.00: exception Emask 0x40 SAct 0x0 SErr 0x80800 action 0x6 frozen
Aug  5 10:46:30 localhost kernel: [ 2355.706990] ata5: SError: { HostInt 10B8B }
Aug  5 10:46:30 localhost kernel: [ 2355.707003] ata5.00: cmd a0/00:00:00:08:00/00:00:00:00:00/a0 tag 0 pio 16392 in
Aug  5 10:46:30 localhost kernel: [ 2355.707003]          Get event status notification 4a 01 00 00 10 00 00 00 08 00res 40/00:03:00:00:00/00:00:00:00:00/a0 Emask 0x44 (timeout)
Aug  5 10:46:30 localhost kernel: [ 2355.707005] ata5.00: status: { DRDY }
Aug  5 10:46:30 localhost kernel: [ 2355.707007] ata5: hard resetting link
----

same one but later in the log:

----
Aug  5 10:46:18 localhost kernel: WARNING: CPU: 3 PID: 43 at net/sched/sch_generic.c:303 dev_watchdog+0x26f/0x280()
Aug  5 10:46:18 localhost kernel: NETDEV WATCHDOG: eth1 (e1000e): transmit queue 0 timed out
----

Things apparently keep working and then:

----
Aug  5 11:58:36 localhost kernel: [ 6678.122596] Network Receive[2409]: segfault at 28 ip 0000003c4c293ca9 sp 00007fb6f64dbb58 error 6 in libc-2.18.so[3c4c200000+1b4000]
Aug  5 11:58:36 localhost kernel: Network Receive[2409]: segfault at 28 ip 0000003c4c293ca9 sp 00007fb6f64dbb58 error 6 in libc-2.18.so[3c4c200000+1b4000]
Aug  5 11:58:36 localhost kernel: timekeeping watchdog: Marking clocksource 'tsc' as unstable, because the skew is too large:
Aug  5 11:58:36 localhost kernel: 	'hpet' wd_now: 47ebf654 wd_last: c0debfe6 mask: ffffffff
Aug  5 11:58:36 localhost kernel: 	'tsc' cs_now: 154f6e564f7d cs_last: 7784d315c59 mask: ffffffffffffffff
Aug  5 11:58:36 localhost systemd: Starting dnf makecache...
Aug  5 11:58:36 localhost kernel: [ 6678.123233] timekeeping watchdog: Marking clocksource 'tsc' as unstable, because the skew is too large:
Aug  5 11:58:36 localhost kernel: [ 6678.123237] 	'hpet' wd_now: 47ebf654 wd_last: c0debfe6 mask: ffffffff
Aug  5 11:58:36 localhost kernel: [ 6678.123238] 	'tsc' cs_now: 154f6e564f7d cs_last: 7784d315c59 mask: ffffffffffffffff
Aug  5 11:58:36 localhost kernel: [ 6678.146207] Switched to clocksource hpet
Aug  5 11:58:36 localhost kernel: Switched to clocksource hpet
Aug  5 11:58:36 localhost kernel: [ 6678.150087] BUG: unable to handle kernel NULL pointer dereference at 0000000000000ea0
Aug  5 11:58:36 localhost kernel: [ 6678.150097] IP: [<ffffffffa05d922e>] nfs40_discover_server_trunking+0x5e/0x110 [nfsv4]
Aug  5 11:58:36 localhost kernel: [ 6678.150098] PGD 7f3c83067 PUD 7f46fb067 PMD 0
Aug  5 11:58:36 localhost kernel: [ 6678.150099] Oops: 0000 [#1] PREEMPT SMP
----
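
For reading the clocksource watchdog lines in the log excerpt just
before this point: as far as I understand the watchdog, it compares how
far each counter advanced between two readings, using a masked
difference so that a wrapped counter still yields a sensible delta. The
sketch below is a user-space illustration of that arithmetic, not the
kernel code itself; the function name masked_delta() is hypothetical.
Note that 'hpet' wd_now is smaller than wd_last, so the 32-bit counter
wrapped between the two readings.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Masked difference of two counter readings. Unsigned subtraction
 * wraps modulo 2^64, and the mask trims the result to the counter
 * width, so a single wrap-around between readings is tolerated.
 */
uint64_t masked_delta(uint64_t now, uint64_t last, uint64_t mask)
{
	/* A zero mask would always yield 0 and hide any delta. */
	assert(mask != 0);
	return (now - last) & mask;
}
```

With the values from the log, masked_delta(0x47ebf654, 0xc0debfe6,
0xffffffff) gives 0x870d366e HPET ticks and masked_delta(0x154f6e564f7d,
0x7784d315c59, 0xffffffffffffffff) gives 0xdd72124f324 TSC cycles.
Turning these into time intervals needs the counter frequencies, which
the log does not show.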

And eventually (later) get a ton of these:

----
Aug  5 11:59:36 localhost kernel: [ 6738.107181] INFO: rcu_preempt detected stalls on CPUs/tasks: {} (detected by 3, t=60002 jiffies, g=37092, c=37091, q=0)
Aug  5 11:59:36 localhost kernel: [ 6738.107183] All QSes seen, last rcu_preempt kthread activity 1 (4301410925-4301410924), jiffies_till_next_fqs=3, root ->qsmask 0x0
----

So something is left in a not good state...

-- Fernando

[-- Attachment #2: messages.gz --]
[-- Type: application/x-gzip, Size: 8179 bytes --]

* RE: [ANNOUNCE] 4.1.3-rt3 - xmit queue timeout, oops, rcu stalls
From: John Dulaney @ 2015-08-06 22:19 UTC
  To: Fernando Lopez-Lezcano, Sebastian Andrzej Siewior, linux-rt-users
  Cc: LKML, Thomas Gleixner, rostedt, John Kacur

----------------------------------------
> Subject: Re: [ANNOUNCE] 4.1.3-rt3 - xmit queue timeout, oops, rcu stalls
> To: bigeasy@linutronix.de; linux-rt-users@vger.kernel.org
> CC: nando@ccrma.Stanford.EDU; linux-kernel@vger.kernel.org; tglx@linutronix.de; rostedt@goodmis.org; jkacur@redhat.com
> From: nando@ccrma.Stanford.EDU
> Date: Thu, 6 Aug 2015 10:50:22 -0700
>
> On 07/25/2015 03:32 AM, Sebastian Andrzej Siewior wrote:
>> Dear RT folks!
>>
>> I'm pleased to announce the v4.1.3-rt3 patch set.
> ...
>
> I've had a few hangs with nothing left behind to debug... but today I
> find this:
>
> (NOTE: I'm attaching a file with the details, I don't know if my mailer
> will mangle these lines)
>
> ----
> Aug 5 10:46:18 localhost kernel: [ 2343.673560] WARNING: CPU: 3 PID: 43
> at net/sched/sch_generic.c:303 dev_watchdog+0x26f/0x280()
> Aug 5 10:46:18 localhost kernel: [ 2343.673561] NETDEV WATCHDOG: eth1
> (e1000e): transmit queue 0 timed out
> ----
>
> and then:
>
> ----
> Aug 5 10:46:18 localhost kernel: [ 2343.673679] e1000e 0000:04:00.0
> eth1: Reset adapter unexpectedly
> Aug 5 10:46:30 localhost kernel: [ 2355.706987] ata5.00: exception
> Emask 0x40 SAct 0x0 SErr 0x80800 action 0x6 frozen
> Aug 5 10:46:30 localhost kernel: [ 2355.706990] ata5: SError: { HostInt
> 10B8B }
> Aug 5 10:46:30 localhost kernel: [ 2355.707003] ata5.00: cmd
> a0/00:00:00:08:00/00:00:00:00:00/a0 tag 0 pio 16392 in
> Aug 5 10:46:30 localhost kernel: [ 2355.707003] Get event
> status notification 4a 01 00 00 10 00 00 00 08 00res
> 40/00:03:00:00:00/00:00:00:00:00/a0 Emask 0x44 (timeout)
> Aug 5 10:46:30 localhost kernel: [ 2355.707005] ata5.00: status: { DRDY }
> Aug 5 10:46:30 localhost kernel: [ 2355.707007] ata5: hard resetting link
> ----
>
> same one but later in the log:
>
> ----
> Aug 5 10:46:18 localhost kernel: WARNING: CPU: 3 PID: 43 at
> net/sched/sch_generic.c:303 dev_watchdog+0x26f/0x280()
> Aug 5 10:46:18 localhost kernel: NETDEV WATCHDOG: eth1 (e1000e):
> transmit queue 0 timed out
> ----
>
> Things apparently keep working and then:
>
> ----
> Aug 5 11:58:36 localhost kernel: [ 6678.122596] Network Receive[2409]:
> segfault at 28 ip 0000003c4c293ca9 sp 00007fb6f64dbb58 error 6 in
> libc-2.18.so[3c4c200000+1b4000]
> Aug 5 11:58:36 localhost kernel: Network Receive[2409]: segfault at 28
> ip 0000003c4c293ca9 sp 00007fb6f64dbb58 error 6 in
> libc-2.18.so[3c4c200000+1b4000]
> Aug 5 11:58:36 localhost kernel: timekeeping watchdog: Marking
> clocksource 'tsc' as unstable, because the skew is too large:
> Aug 5 11:58:36 localhost kernel: 'hpet' wd_now: 47ebf654 wd_last:
> c0debfe6 mask: ffffffff
> Aug 5 11:58:36 localhost kernel: 'tsc' cs_now: 154f6e564f7d cs_last:
> 7784d315c59 mask: ffffffffffffffff
> Aug 5 11:58:36 localhost systemd: Starting dnf makecache...
> Aug 5 11:58:36 localhost kernel: [ 6678.123233] timekeeping watchdog:
> Marking clocksource 'tsc' as unstable, because the skew is too large:
> Aug 5 11:58:36 localhost kernel: [ 6678.123237] 'hpet' wd_now:
> 47ebf654 wd_last: c0debfe6 mask: ffffffff
> Aug 5 11:58:36 localhost kernel: [ 6678.123238] 'tsc' cs_now:
> 154f6e564f7d cs_last: 7784d315c59 mask: ffffffffffffffff
> Aug 5 11:58:36 localhost kernel: [ 6678.146207] Switched to clocksource
> hpet
> Aug 5 11:58:36 localhost kernel: Switched to clocksource hpet
> Aug 5 11:58:36 localhost kernel: [ 6678.150087] BUG: unable to handle
> kernel NULL pointer dereference at 0000000000000ea0
> Aug 5 11:58:36 localhost kernel: [ 6678.150097] IP:
> [<ffffffffa05d922e>] nfs40_discover_server_trunking+0x5e/0x110 [nfsv4]
> Aug 5 11:58:36 localhost kernel: [ 6678.150098] PGD 7f3c83067 PUD
> 7f46fb067 PMD 0
> Aug 5 11:58:36 localhost kernel: [ 6678.150099] Oops: 0000 [#1] PREEMPT
> SMP
> ----
>
> And eventually (later) get a ton of these:
>
> ----
> Aug 5 11:59:36 localhost kernel: [ 6738.107181] INFO: rcu_preempt
> detected stalls on CPUs/tasks: {} (detected by 3, t=60002 jiffies,
> g=37092, c=37091, q=0)
> Aug 5 11:59:36 localhost kernel: [ 6738.107183] All QSes seen, last
> rcu_preempt kthread activity 1 (4301410925-4301410924),
> jiffies_till_next_fqs=3, root ->qsmask 0x0
> ----
>
> So something is left in a not good state...
>
> -- Fernando

Do you still have your box set up to capture a vmcore?  Also, is this my latest
build?  I've been having issues with LUKS.

If you do still have your system set up to capture a vmcore, maybe set:

    kernel.panic_on_oops = 1

in your /etc/sysctl.conf and then reboot into this kernel.
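
To confirm after the reboot that the setting took effect, the value can
be read back from /proc/sys/kernel/panic_on_oops. A minimal sketch in
C, assuming procfs is mounted at /proc; the helper names
parse_sysctl_int() and read_panic_on_oops() are hypothetical:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Parse the text of an integer sysctl file such as "1\n".
 * Returns the value, or -1 if the text does not start with a number.
 */
int parse_sysctl_int(const char *text)
{
	char *end;
	long v;

	assert(text != NULL);
	v = strtol(text, &end, 10);
	if (end == text)
		return -1;
	return (int)v;
}

/* Read kernel.panic_on_oops via procfs; returns its value or -1 on error. */
int read_panic_on_oops(void)
{
	char buf[32];
	FILE *f = fopen("/proc/sys/kernel/panic_on_oops", "r");

	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return parse_sysctl_int(buf);
}
```

If read_panic_on_oops() returns 1, the next oops should turn into a
panic, which is what a crash-dump setup needs in order to capture a
vmcore.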

John.
 		 	   		  

* Re: [ANNOUNCE] 4.1.3-rt3 - xmit queue timeout, oops, rcu stalls
From: Sebastian Andrzej Siewior @ 2015-08-16 11:23 UTC
  To: Fernando Lopez-Lezcano
  Cc: linux-rt-users, LKML, Thomas Gleixner, rostedt, John Kacur

* Fernando Lopez-Lezcano | 2015-08-06 10:50:22 [-0700]:

>I've had a few hangs with nothing left behind to debug... but today I
>find this:
>
>----
>Aug  5 10:46:18 localhost kernel: [ 2343.673560] WARNING: CPU: 3 PID:
>43 at net/sched/sch_generic.c:303 dev_watchdog+0x26f/0x280()
>Aug  5 10:46:18 localhost kernel: [ 2343.673561] NETDEV WATCHDOG:
>eth1 (e1000e): transmit queue 0 timed out
>----

Your network controller did not manage to send TX packets.
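
For context, the warning is emitted by the netdev watchdog
(dev_watchdog() in net/sched/sch_generic.c). Roughly, it fires when a
TX queue is stopped and more than the device's watchdog timeout has
passed since the last transmit was started. The following user-space
sketch in C only illustrates that wrap-safe time comparison; the names
ticks_after() and tx_queue_timed_out() are hypothetical stand-ins, the
kernel itself works on jiffies with time_after().

```c
#include <assert.h>

/*
 * Wrap-safe "a is after b" for a free-running tick counter: the
 * unsigned difference is reinterpreted as signed, so the comparison
 * stays correct across a counter wrap-around.
 */
int ticks_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/*
 * Return 1 if a stopped TX queue has been idle for longer than the
 * watchdog timeout, else 0. All times are in ticks.
 */
int tx_queue_timed_out(unsigned long now, unsigned long trans_start,
		       unsigned long watchdog_timeo, int queue_stopped)
{
	assert(queue_stopped == 0 || queue_stopped == 1);
	return queue_stopped &&
	       ticks_after(now, trans_start + watchdog_timeo);
}
```

A queue that is not stopped never counts as timed out here, however
old trans_start is; only a stopped queue that stays stopped past the
timeout does.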

>and then:
>
>----
>Aug  5 10:46:18 localhost kernel: [ 2343.673679] e1000e 0000:04:00.0
>eth1: Reset adapter unexpectedly

This is the consequence of the former problem.

>Aug  5 10:46:30 localhost kernel: [ 2355.706987] ata5.00: exception
>Emask 0x40 SAct 0x0 SErr 0x80800 action 0x6 frozen
>Aug  5 10:46:30 localhost kernel: [ 2355.706990] ata5: SError: {
>HostInt 10B8B }
>Aug  5 10:46:30 localhost kernel: [ 2355.707003] ata5.00: cmd
>a0/00:00:00:08:00/00:00:00:00:00/a0 tag 0 pio 16392 in
>Aug  5 10:46:30 localhost kernel: [ 2355.707003]          Get event
>status notification 4a 01 00 00 10 00 00 00 08 00res
>40/00:03:00:00:00/00:00:00:00:00/a0 Emask 0x44 (timeout)
>Aug  5 10:46:30 localhost kernel: [ 2355.707005] ata5.00: status: { DRDY }
>Aug  5 10:46:30 localhost kernel: [ 2355.707007] ata5: hard resetting link

And now ata5 (hard disk?) suddenly got another problem and the link gets
reset.

>----
>Aug  5 10:46:18 localhost kernel: WARNING: CPU: 3 PID: 43 at
>net/sched/sch_generic.c:303 dev_watchdog+0x26f/0x280()
>Aug  5 10:46:18 localhost kernel: NETDEV WATCHDOG: eth1 (e1000e):
>transmit queue 0 timed out

Ethernet is still not working.

>Aug  5 11:58:36 localhost kernel: [ 6678.122596] Network
>Receive[2409]: segfault at 28 ip 0000003c4c293ca9 sp 00007fb6f64dbb58
>error 6 in libc-2.18.so[3c4c200000+1b4000]
>Aug  5 11:58:36 localhost kernel: Network Receive[2409]: segfault at
>28 ip 0000003c4c293ca9 sp 00007fb6f64dbb58 error 6 in
>libc-2.18.so[3c4c200000+1b4000]

And now we have a segfault in libc. Your box is kind of falling apart.

>And eventually (later) get a ton of these:
>
>----
>Aug  5 11:59:36 localhost kernel: [ 6738.107181] INFO: rcu_preempt
>detected stalls on CPUs/tasks: {} (detected by 3, t=60002 jiffies,
>g=37092, c=37091, q=0)
>Aug  5 11:59:36 localhost kernel: [ 6738.107183] All QSes seen, last
>rcu_preempt kthread activity 1 (4301410925-4301410924),
>jiffies_till_next_fqs=3, root ->qsmask 0x0

One CPU hangs and does not make any progress.

>
>So something is left in a not good state...

Can you reproduce this and, if so, both with and without -RT? There is
nothing in the log that would indicate a -RT bug.

>-- Fernando

Sebastian
