All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: schedule under irqs_disabled in SLUB problem
       [not found] <CADF-jezvVP2O++FR2KiRSSSJF7oObjy8LSP3-yj1HCmxyTzB_Q@mail.gmail.com>
@ 2017-11-02 16:50 ` Sebastian Andrzej Siewior
  2017-11-02 20:55   ` Grygorii Strashko
       [not found]   ` <CADF-jexLs9vRuiuoRmcA+0L6Mp-XxW75okheWV+ipGf1b_Ua1w@mail.gmail.com>
  0 siblings, 2 replies; 19+ messages in thread
From: Sebastian Andrzej Siewior @ 2017-11-02 16:50 UTC (permalink / raw)
  To: Pavel V. Panteleev; +Cc: linux-rt-users, tglx

On 2017-11-01 12:41:33 [+0300], Pavel V. Panteleev wrote:
> Hello! I have a problem on kernel 3.14.79-rt85 and I don't know, if it's
> solved on current kernel or, may be, it's our additions problem. Could you
> help me, please?

will do my best.

> Kernel is broken in migrate_enable():
>         WARN_ON_ONCE(p->migrate_disable <= 0);
> 
> I have a trace, where I see, that it happened, because few
> migrate_disable() were called under irqs_disabled (migrate_disable counter
> wasn't changed). But after schedule() call irqs were enabled and the
> following migrate_disable() and migrate_enable() changed migrate_disable
> counter. I suppose, that problem is schedule() call under irqs_disabled.

that is basically what happens if the counter goes negative.

…
> As I see, in allocate_slab() kernel could be under irqs_disabled. And irqs
> would be enabled in case of SYSTEM_RUNNING (why only in case of
> SYSTEM_RUNNING?). But in our case system isn't running yet (it's always
> before /sbin/init), so irqs wouldn't be enabled:

ideally the interrupts would be always enabled because they have to on
-RT but we can't do so early in the boot process.

>         enableirqs = (flags & __GFP_WAIT) != 0;
>         #ifdef CONFIG_PREEMPT_RT_FULL
>         enableirqs |= system_state == SYSTEM_RUNNING;
>         #endif
>         if (enableirqs)
>                 local_irq_enable();
> P. S.
> Today I reproduced this on 4.9.47-rt37. Try to reproduce on x86.

In newer -RT we have
	if (system_state > SYSTEM_BOOTING)
instead but this won't work even in v4.9 because it does not have
SYSTEM_SCHEDULING yet. So v4.11/v4.13 should not be affected.

If that is true, then something like

diff --git a/init/main.c b/init/main.c
index 6470deef01c9..362761effeac 100644
--- a/init/main.c
+++ b/init/main.c
@@ -379,6 +379,7 @@ static void __init setup_command_line(char *command_line)
  */
 
 static __initdata DECLARE_COMPLETION(kthreadd_done);
+extern bool slab_do_irq_on;
 
 static noinline void __ref rest_init(void)
 {
@@ -396,6 +397,7 @@ static noinline void __ref rest_init(void)
 	rcu_read_lock();
 	kthreadd_task = find_task_by_pid_ns(pid, &init_pid_ns);
 	rcu_read_unlock();
+	slab_do_irq_on = true;
 	complete(&kthreadd_done);
 
 	/*
diff --git a/mm/slub.c b/mm/slub.c
index 67eb368b9314..22ec805130e5 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1527,6 +1527,7 @@ static inline bool shuffle_freelist(struct kmem_cache *s, struct page *page)
 	return false;
 }
 #endif /* CONFIG_SLAB_FREELIST_RANDOM */
+bool slab_do_irq_on;
 
 static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
 {
@@ -1543,7 +1544,7 @@ static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
 	if (gfpflags_allow_blocking(flags))
 		enableirqs = true;
 #ifdef CONFIG_PREEMPT_RT_FULL
-	if (system_state == SYSTEM_RUNNING)
+	if (slab_do_irq_on)
 		enableirqs = true;
 #endif
 	if (enableirqs)

might just work. Can you test this please?

Sebastian

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: schedule under irqs_disabled in SLUB problem
  2017-11-02 16:50 ` schedule under irqs_disabled in SLUB problem Sebastian Andrzej Siewior
@ 2017-11-02 20:55   ` Grygorii Strashko
       [not found]   ` <CADF-jexLs9vRuiuoRmcA+0L6Mp-XxW75okheWV+ipGf1b_Ua1w@mail.gmail.com>
  1 sibling, 0 replies; 19+ messages in thread
From: Grygorii Strashko @ 2017-11-02 20:55 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior, Pavel V. Panteleev; +Cc: linux-rt-users, tglx



On 11/02/2017 11:50 AM, Sebastian Andrzej Siewior wrote:
> On 2017-11-01 12:41:33 [+0300], Pavel V. Panteleev wrote:
>> Hello! I have a problem on kernel 3.14.79-rt85 and I don't know, if it's
>> solved on current kernel or, may be, it's our additions problem. Could you
>> help me, please?
> 
> will do my best.
> 
>> Kernel is broken in migrate_enable():
>>          WARN_ON_ONCE(p->migrate_disable <= 0);
>>
>> I have a trace, where I see, that it happened, because few
>> migrate_disable() were called under irqs_disabled (migrate_disable counter
>> wasn't changed). But after schedule() call irqs were enabled and the
>> following migrate_disable() and migrate_enable() changed migrate_disable
>> counter. I suppose, that problem is schedule() call under irqs_disabled.
> 
> that is basically what happens if the counter goes negative.
> 
> …
>> As I see, in allocate_slab() kernel could be under irqs_disabled. And irqs
>> would be enabled in case of SYSTEM_RUNNING (why only in case of
>> SYSTEM_RUNNING?). But in our case system isn't running yet (it's always
>> before /sbin/init), so irqs wouldn't be enabled:
> 
> ideally the interrupts would be always enabled because they have to on
> -RT but we can't do so early in the boot process.
> 
>>          enableirqs = (flags & __GFP_WAIT) != 0;
>>          #ifdef CONFIG_PREEMPT_RT_FULL
>>          enableirqs |= system_state == SYSTEM_RUNNING;
>>          #endif
>>          if (enableirqs)
>>                  local_irq_enable();
> …
>> P. S.
>> Today I reproduced this on 4.9.47-rt37. Try to reproduce on x86.
> 
> In newer -RT we have
> 	if (system_state > SYSTEM_BOOTING)
> instead but this won't work even in v4.9 because it does not have
> SYSTEM_SCHEDULING yet. So v4.11/v4.13 should not be affected.
> 
> If that is true, then something like
> 
> diff --git a/init/main.c b/init/main.c
> index 6470deef01c9..362761effeac 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -379,6 +379,7 @@ static void __init setup_command_line(char *command_line)
>    */
>   
>   static __initdata DECLARE_COMPLETION(kthreadd_done);
> +extern bool slab_do_irq_on;
>   
>   static noinline void __ref rest_init(void)
>   {
> @@ -396,6 +397,7 @@ static noinline void __ref rest_init(void)
>   	rcu_read_lock();
>   	kthreadd_task = find_task_by_pid_ns(pid, &init_pid_ns);
>   	rcu_read_unlock();
> +	slab_do_irq_on = true;
>   	complete(&kthreadd_done);
>   
>   	/*
> diff --git a/mm/slub.c b/mm/slub.c
> index 67eb368b9314..22ec805130e5 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -1527,6 +1527,7 @@ static inline bool shuffle_freelist(struct kmem_cache *s, struct page *page)
>   	return false;
>   }
>   #endif /* CONFIG_SLAB_FREELIST_RANDOM */
> +bool slab_do_irq_on;
>   
>   static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
>   {
> @@ -1543,7 +1544,7 @@ static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
>   	if (gfpflags_allow_blocking(flags))
>   		enableirqs = true;
>   #ifdef CONFIG_PREEMPT_RT_FULL
> -	if (system_state == SYSTEM_RUNNING)
> +	if (slab_do_irq_on)
>   		enableirqs = true;
>   #endif
>   	if (enableirqs)
> 
> might just work. Can you test this please?

I've used to see the same and created patch for 4.9 (should work for 4.1)
which does mostly the same things, but it also enables __might_sleep() during boot.
Of course, it's possible that I've selected wrong place to set SYSTEM_BOOTING_LATE
and it's more correct to do it in rest_init().

NOTE: As Sebastian mentioned below patch is not relevant for newer Kernel versions.


>From a9600a6a18ffe285bccd299985ad57450e4aa1b4 Mon Sep 17 00:00:00 2001
From: Grygorii Strashko <grygorii.strashko@ti.com>
Date: Fri, 9 Oct 2015 08:52:16 -0500
Subject: [DEBUG PATCH] core: introduce SYSTEM_BOOTING_LATE state

This patch introduces new SYSTEM_BOOTING_LATE system state
which will be set right after scheduler is fully operational
during the boot.

And reuse this state in update ___might_sleep() to allow catching
"BUG: sleeping function called from invalid context" early during
the boot which is very helping for debugging.

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
---
 include/linux/kernel.h | 1 +
 init/main.c            | 1 +
 kernel/sched/core.c    | 4 +++-
 mm/slub.c              | 3 +--
 4 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 7894d55..5dfed24 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -488,6 +488,7 @@ extern bool early_boot_irqs_disabled;
 /* Values used for system_state */
 extern enum system_states {
 	SYSTEM_BOOTING,
+	SYSTEM_BOOTING_LATE,
 	SYSTEM_RUNNING,
 	SYSTEM_HALT,
 	SYSTEM_POWER_OFF,
diff --git a/init/main.c b/init/main.c
index a4a61e7..e5e5906 100644
--- a/init/main.c
+++ b/init/main.c
@@ -989,6 +989,7 @@ static noinline void __init kernel_init_freeable(void)
 
 	/* Now the scheduler is fully set up and can do blocking allocations */
 	gfp_allowed_mask = __GFP_BITS_MASK;
+	system_state = SYSTEM_BOOTING_LATE;
 
 	/*
 	 * init can allocate pages on any node
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 127aaf4..04c1805 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7977,7 +7977,9 @@ void ___might_sleep(const char *file, int line, int preempt_offset)
 	rcu_sleep_check(); /* WARN_ON_ONCE() by default, no rate limit reqd. */
 	if ((preempt_count_equals(preempt_offset) && !irqs_disabled() &&
 	     !is_idle_task(current)) ||
-		 (system_state != SYSTEM_RUNNING && system_state != SYSTEM_SUSPEND) || oops_in_progress)
+		 (system_state != SYSTEM_RUNNING &&
+		  system_state != SYSTEM_SUSPEND &&
+		  system_state != SYSTEM_BOOTING_LATE) || oops_in_progress)
 		return;
 	if (time_before(jiffies, prev_jiffy + HZ) && prev_jiffy)
 		return;
diff --git a/mm/slub.c b/mm/slub.c
index 6d72b7f..993c89b 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1540,8 +1540,7 @@ static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
 	if (gfpflags_allow_blocking(flags))
 		enableirqs = true;
 #ifdef CONFIG_PREEMPT_RT_FULL
-	if (system_state == SYSTEM_RUNNING)
-		enableirqs = true;
+	enableirqs = system_state == SYSTEM_RUNNING || system_state == SYSTEM_BOOTING_LATE;
 #endif
 	if (enableirqs)
 		local_irq_enable();
-- 
2.10.5



-- 
regards,
-grygorii

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: schedule under irqs_disabled in SLUB problem
       [not found]   ` <CADF-jexLs9vRuiuoRmcA+0L6Mp-XxW75okheWV+ipGf1b_Ua1w@mail.gmail.com>
@ 2017-11-03 10:23     ` Pavel V. Panteleev
  2017-11-07  9:00       ` Pavel V. Panteleev
                         ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Pavel V. Panteleev @ 2017-11-03 10:23 UTC (permalink / raw)
  To: linux-rt-users

I'll try Sebastian's patch on the next week. Thanks!

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: schedule under irqs_disabled in SLUB problem
  2017-11-03 10:23     ` Pavel V. Panteleev
@ 2017-11-07  9:00       ` Pavel V. Panteleev
  2017-11-07  9:14       ` Pavel V. Panteleev
  2017-11-07  9:47       ` Pavel V. Panteleev
  2 siblings, 0 replies; 19+ messages in thread
From: Pavel V. Panteleev @ 2017-11-07  9:00 UTC (permalink / raw)
  To: linux-rt-users

Thanks, it works.

On 03.11.2017 13:23, Pavel V. Panteleev wrote:
> I'll try Sebastian's patch on the next week. Thanks!


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: schedule under irqs_disabled in SLUB problem
  2017-11-03 10:23     ` Pavel V. Panteleev
  2017-11-07  9:00       ` Pavel V. Panteleev
@ 2017-11-07  9:14       ` Pavel V. Panteleev
  2017-11-07  9:47       ` Pavel V. Panteleev
  2 siblings, 0 replies; 19+ messages in thread
From: Pavel V. Panteleev @ 2017-11-07  9:14 UTC (permalink / raw)
  To: linux-rt-users

Thanks, it works.

On 03.11.2017 13:23, Pavel V. Panteleev wrote:
> I'll try Sebastian's patch on the next week. Thanks!


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: schedule under irqs_disabled in SLUB problem
  2017-11-03 10:23     ` Pavel V. Panteleev
  2017-11-07  9:00       ` Pavel V. Panteleev
  2017-11-07  9:14       ` Pavel V. Panteleev
@ 2017-11-07  9:47       ` Pavel V. Panteleev
  2017-11-16 16:08         ` Sebastian Andrzej Siewior
  2 siblings, 1 reply; 19+ messages in thread
From: Pavel V. Panteleev @ 2017-11-07  9:47 UTC (permalink / raw)
  To: linux-rt-users

Thanks, it works.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: schedule under irqs_disabled in SLUB problem
  2017-11-07  9:47       ` Pavel V. Panteleev
@ 2017-11-16 16:08         ` Sebastian Andrzej Siewior
  2017-11-16 16:39           ` Pavel V. Panteleev
  2017-11-17 17:38           ` Julia Cartwright
  0 siblings, 2 replies; 19+ messages in thread
From: Sebastian Andrzej Siewior @ 2017-11-16 16:08 UTC (permalink / raw)
  To: Pavel V. Panteleev; +Cc: linux-rt-users, Julia Cartwright, Steven Rostedt

+ Steven & Julia

On 2017-11-07 12:47:27 [+0300], Pavel V. Panteleev wrote:
> Thanks, it works.

Okay, good to hear.

Steven + Julia:
We need to decide what are going to do about this stable-wise. The bug
was reported against 3.14.79-rt85 and the devel tree is not affected*.
The thread starts at 
  https://www.spinics.net/lists/linux-rt-users/msg17560.html

[*] probably since v4.11.5-rt1 once
    "sched/core: Enable might_sleep() and smp_processor_id() checks
    early" and its prerequisites appeared in the queue.

Sebastian

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: schedule under irqs_disabled in SLUB problem
  2017-11-16 16:08         ` Sebastian Andrzej Siewior
@ 2017-11-16 16:39           ` Pavel V. Panteleev
  2017-11-17 17:38           ` Julia Cartwright
  1 sibling, 0 replies; 19+ messages in thread
From: Pavel V. Panteleev @ 2017-11-16 16:39 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-rt-users, Julia Cartwright, Steven Rostedt

On 16.11.2017 19:08, Sebastian Andrzej Siewior wrote:
> + Steven & Julia
>
> On 2017-11-07 12:47:27 [+0300], Pavel V. Panteleev wrote:
>> Thanks, it works.
> Okay, good to hear.
>
> Steven + Julia:
> We need to decide what are going to do about this stable-wise. The bug
> was reported against 3.14.79-rt85 and the devel tree is not affected*.
> The thread starts at
>    https://www.spinics.net/lists/linux-rt-users/msg17560.html
>
> [*] probably since v4.11.5-rt1 once
>      "sched/core: Enable might_sleep() and smp_processor_id() checks
>      early" and its prerequisites appeared in the queue.
>
> Sebastian
>
>
kernel 4.9 has the same problem.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: schedule under irqs_disabled in SLUB problem
  2017-11-16 16:08         ` Sebastian Andrzej Siewior
  2017-11-16 16:39           ` Pavel V. Panteleev
@ 2017-11-17 17:38           ` Julia Cartwright
  2017-11-24  6:39             ` Sam Kappen
  2017-11-24  9:35             ` [PATCH] mm/slub: enable IRQs once scheduling is working Sebastian Andrzej Siewior
  1 sibling, 2 replies; 19+ messages in thread
From: Julia Cartwright @ 2017-11-17 17:38 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Pavel V. Panteleev, linux-rt-users, Steven Rostedt

On Thu, Nov 16, 2017 at 05:08:37PM +0100, Sebastian Andrzej Siewior wrote:
> + Steven & Julia
>
> On 2017-11-07 12:47:27 [+0300], Pavel V. Panteleev wrote:
> > Thanks, it works.
>
> Okay, good to hear.
>
> Steven + Julia:
> We need to decide what are going to do about this stable-wise. The bug
> was reported against 3.14.79-rt85 and the devel tree is not affected*.
> The thread starts at
>   https://www.spinics.net/lists/linux-rt-users/msg17560.html

Your proposed patch seems reasonable to me to pull back into the
relevant releases.  Can you send a proper patch against the latest
affected tree (4.9?) and the stable team will pull it back?  It looks
like it will need some minor massaging on it's way back, but that
shouldn't be a problem.

Thanks,
   Julia

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: schedule under irqs_disabled in SLUB problem
  2017-11-17 17:38           ` Julia Cartwright
@ 2017-11-24  6:39             ` Sam Kappen
  2017-11-24  9:37               ` Sebastian Andrzej Siewior
  2017-11-24  9:35             ` [PATCH] mm/slub: enable IRQs once scheduling is working Sebastian Andrzej Siewior
  1 sibling, 1 reply; 19+ messages in thread
From: Sam Kappen @ 2017-11-24  6:39 UTC (permalink / raw)
  To: linux-rt-users

Hi,

I am also faces a similar kind of issue on X86 target, while testing
3.10.105-rt119.
The issue is seen during boot-up when USB/SCSI enumeration starts.

Below is the log from my console

scsi 0:0:0:0: Direct-Access     Linux    scsi_debug       0004 PQ: 0 ANSI: 5
------------[ cut here ]------------
------------[ cut here ]------------
WARNING: at kernel/sched/core.c:3052 migrate_disable+0xee/0x100()
Modules linked in:
CPU: 3 PID: 7 Comm: kworker/u16:0 Not tainted 3.10.107-rt120+ #2
Hardware name: Intel Corporation S1200RP_SE/S1200RP_SE, BIOS
S1200RP.86B.02.02.0005.102320140911 10/23/2014
Workqueue: events_unbound async_run_entry_fn
 0000000000000000 ffff880244927338 ffffffff8168b2f0 0000000000000000
 0000000000000009 ffff880244927370 ffffffff8105ef8c ffff8802448fb540
 0000000000000025 0000000000000004 0000000000000025 ffffffff81d9810c
Call Trace:
 [<ffffffff8168b2f0>] dump_stack+0x4f/0x65
 [<ffffffff8105ef8c>] warn_slowpath_common+0x5c/0xa0
 [<ffffffff8105f085>] warn_slowpath_null+0x15/0x20
 [<ffffffff8109355e>] migrate_disable+0xee/0x100
 [<ffffffff810600af>] call_console_drivers.constprop.14+0x4f/0xd0
 [<ffffffff81061241>] console_unlock+0x2a1/0x470
 [<ffffffff810616e2>] vprintk_emit+0x2d2/0x550
 [<ffffffff8168eb49>] ? _raw_spin_unlock_irqrestore+0x19/0x50
 [<ffffffff810936ce>] ? migrate_enable+0x15e/0x1f0
 [<ffffffff816892d3>] printk+0x4a/0x52
 [<ffffffff810936ce>] ? migrate_enable+0x15e/0x1f0
 [<ffffffff8105ef5a>] warn_slowpath_common+0x2a/0xa0
 [<ffffffff8105f085>] warn_slowpath_null+0x15/0x20
 [<ffffffff810936ce>] migrate_enable+0x15e/0x1f0
 [<ffffffff810fce40>] get_page_from_freelist+0x630/0xb90
 [<ffffffff8168e32a>] ? rt_spin_lock_slowlock+0x2ca/0x310
 [<ffffffff810fe36d>] __alloc_pages_nodemask+0x13d/0x9e0
 [<ffffffff810fce72>] ? get_page_from_freelist+0x662/0xb90
 [<ffffffff81133dd0>] alloc_pages_current+0xb0/0x150
 [<ffffffff81138e05>] new_slab+0x2b5/0x380
 [<ffffffff8113b67a>] __slab_alloc.isra.18+0x58a/0x670
 [<ffffffff813d3f40>] ? scsi_pool_alloc_command+0x20/0x70
 [<ffffffff81133dd0>] ? alloc_pages_current+0xb0/0x150
 [<ffffffff8113b956>] kmem_cache_alloc+0xd6/0x100
 [<ffffffff813d3f40>] ? scsi_pool_alloc_command+0x20/0x70
 [<ffffffff813d3f40>] scsi_pool_alloc_command+0x20/0x70
 [<ffffffff813d492e>] scsi_host_alloc_command.isra.1+0x1e/0x80
 [<ffffffff813d49b0>] __scsi_get_command+0x20/0xc0
 [<ffffffff813d4a83>] scsi_get_command+0x33/0xc0
 [<ffffffff813dad1a>] scsi_get_cmd_from_req+0x4a/0x60
 [<ffffffff813db6cb>] scsi_setup_blk_pc_cmnd+0x2b/0xf0
 [<ffffffff813db8fc>] scsi_prep_fn+0x3c/0x50
 [<ffffffff812c9ef3>] blk_peek_request+0xf3/0x1c0
 [<ffffffff813db960>] scsi_request_fn+0x50/0x570
 [<ffffffff812c6c6e>] __blk_run_queue+0x2e/0x40
 [<ffffffff812cdde0>] blk_execute_rq_nowait+0x70/0x100
 [<ffffffff812cdef8>] blk_execute_rq+0x88/0xe0
sd 0:0:0:0: Attached scsi generic sg0 type 0
 [<ffffffff812ca040>] ? blk_rq_bio_prep+0x60/0xc0
 [<ffffffff812cdcf0>] ? blk_rq_map_kern+0xf0/0x170
 [<ffffffff812c86c0>] ? blk_get_request+0x60/0xe0
 [<ffffffff813da050>] scsi_execute+0xf0/0x150
 [<ffffffff813da182>] scsi_execute_req_flags+0x82/0xf0
 [<ffffffff8145d87f>] read_capacity_16+0xcf/0x520
 [<ffffffff8145e060>] sd_revalidate_disk+0x350/0x1bd0
 [<ffffffff8145f9a4>] sd_probe_async+0xc4/0x1d0
 [<ffffffff8108e7c2>] async_run_entry_fn+0x32/0x130
 [<ffffffff8107f5a5>] process_one_work+0x145/0x420
 [<ffffffff81080903>] worker_thread+0x163/0x470
 [<ffffffff8168d91c>] ? preempt_schedule+0x4c/0x70
 [<ffffffff810807a0>] ? manage_workers.isra.7+0x2d0/0x2d0
 [<ffffffff8108735f>] kthread+0xbf/0xd0
 [<ffffffff810872a0>] ? kthread_worker_fn+0x1a0/0x1a0
 [<ffffffff8168f6be>] ret_from_fork+0x4e/0x80
 [<ffffffff810872a0>] ? kthread_worker_fn+0x1a0/0x1a0
---[ end trace 0000000000000001 ]---
------------[ cut here ]------------
WARNING: at kernel/sched/core.c:3087 migrate_enable+0x15e/0x1f0()
Modules linked in:
CPU: 3 PID: 7 Comm: kworker/u16:0 Tainted: G        W    3.10.1


Test case to reproduce:
1. Enable PXE boot and mount file-system on USB stick
2. Continuously reboot the system with USB stick connected
3. We generally see the issue after every 3 to 5 hours.


On looking at the issue it is identified that there is some piece of
code someplace that
calls migrate_disable() with interrupts off, enables interrupts, then calls
migrate_enable().

On instrumentation it is observed that for some SCSI layer calls(calls from
get_requests) the above condition is not evaluated to true hence reaches at
buffered_rmqueue with irqs in disabled state.

>From the below call chain

buffered_rmqueue-> local_spin_lock_irqsave -> local_lock_irqsave -> spin_lock
->rt_spin_lock -> rt_spin_lock_fastlock -> rt_spin_lock_slowlock

In a normal case, when it enters rt_spin_lock_slowlock with irqs_disabled, the
same is returned in below case,

        if (__try_to_take_rt_mutex(lock, self, NULL, STEAL_LATERAL)) {
                raw_spin_unlock(&lock->wait_lock);
                return;
        }

But in the some case above condition is meet true and the control reaches below
in same function,

        pi_lock(&self->pi_lock);
        self->saved_state = self->state;
        __set_current_state(TASK_UNINTERRUPTIBLE);
        pi_unlock(&self->pi_lock);


pi_lock & pi_unlock disables and enables the irqs respectively, so in this
special case the irq state is not retained while exiting rt_spin_lock_slowlock
function and this results in the crash!

Could you please help to resolve the issue.

Regards,
Sam

On Fri, Nov 17, 2017 at 11:08 PM, Julia Cartwright <julia@ni.com> wrote:
> On Thu, Nov 16, 2017 at 05:08:37PM +0100, Sebastian Andrzej Siewior wrote:
>> + Steven & Julia
>>
>> On 2017-11-07 12:47:27 [+0300], Pavel V. Panteleev wrote:
>> > Thanks, it works.
>>
>> Okay, good to hear.
>>
>> Steven + Julia:
>> We need to decide what are going to do about this stable-wise. The bug
>> was reported against 3.14.79-rt85 and the devel tree is not affected*.
>> The thread starts at
>>   https://www.spinics.net/lists/linux-rt-users/msg17560.html
>
> Your proposed patch seems reasonable to me to pull back into the
> relevant releases.  Can you send a proper patch against the latest
> affected tree (4.9?) and the stable team will pull it back?  It looks
> like it will need some minor massaging on it's way back, but that
> shouldn't be a problem.
>
> Thanks,
>    Julia
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH] mm/slub: enable IRQs once scheduling is working
  2017-11-17 17:38           ` Julia Cartwright
  2017-11-24  6:39             ` Sam Kappen
@ 2017-11-24  9:35             ` Sebastian Andrzej Siewior
  1 sibling, 0 replies; 19+ messages in thread
From: Sebastian Andrzej Siewior @ 2017-11-24  9:35 UTC (permalink / raw)
  To: Julia Cartwright; +Cc: Pavel V. Panteleev, linux-rt-users, Steven Rostedt

Pavel V. Panteleev reported that a WARN_ON_ONCE(p->migrate_disable <= 0)
triggered in 3.14.79-rt85 during boot from radix_tree_insert().
The problem is that radix_tree_insert() allocates memory and SLUB does
not enable interrupts in allocate_slab().
They can't be be enabled unconditionally because early in the boot
process they have to remain off. Enabling them at the "SYSTEM_RUNNING"
state is too late. The earliest point is once we have scheduling
working that is once the kthread is running.

In newer -RT (v4.11+) there is the check
	if (system_state > SYSTEM_BOOTING)

to enable interrupts in the allocator which is what this patch intends:
Once scheduling is working enable interrupts in the allocator path so we
can take sleeping locks.

Reported-by: "Pavel V. Panteleev" <panteleev_p@mcst.ru>
Tested-by: "Pavel V. Panteleev" <panteleev_p@mcst.ru>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 init/main.c | 2 ++
 mm/slub.c   | 3 ++-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/init/main.c b/init/main.c
index 6470deef01c9..362761effeac 100644
--- a/init/main.c
+++ b/init/main.c
@@ -379,6 +379,7 @@ static void __init setup_command_line(char *command_line)
  */
 
 static __initdata DECLARE_COMPLETION(kthreadd_done);
+extern bool slab_do_irq_on;
 
 static noinline void __ref rest_init(void)
 {
@@ -396,6 +397,7 @@ static noinline void __ref rest_init(void)
 	rcu_read_lock();
 	kthreadd_task = find_task_by_pid_ns(pid, &init_pid_ns);
 	rcu_read_unlock();
+	slab_do_irq_on = true;
 	complete(&kthreadd_done);
 
 	/*
diff --git a/mm/slub.c b/mm/slub.c
index 67eb368b9314..22ec805130e5 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1527,6 +1527,7 @@ static inline bool shuffle_freelist(struct kmem_cache *s, struct page *page)
 	return false;
 }
 #endif /* CONFIG_SLAB_FREELIST_RANDOM */
+bool slab_do_irq_on;
 
 static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
 {
@@ -1543,7 +1544,7 @@ static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
 	if (gfpflags_allow_blocking(flags))
 		enableirqs = true;
 #ifdef CONFIG_PREEMPT_RT_FULL
-	if (system_state == SYSTEM_RUNNING)
+	if (slab_do_irq_on)
 		enableirqs = true;
 #endif
 	if (enableirqs)
-- 
2.15.0


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: schedule under irqs_disabled in SLUB problem
  2017-11-24  6:39             ` Sam Kappen
@ 2017-11-24  9:37               ` Sebastian Andrzej Siewior
  2017-11-27  6:46                 ` Sam Kappen
  0 siblings, 1 reply; 19+ messages in thread
From: Sebastian Andrzej Siewior @ 2017-11-24  9:37 UTC (permalink / raw)
  To: Sam Kappen; +Cc: linux-rt-users

On 2017-11-24 12:09:16 [+0530], Sam Kappen wrote:
> Hi,
Hi,

> I am also faces a similar kind of issue on X86 target, while testing
> 3.10.105-rt119.
> The issue is seen during boot-up when USB/SCSI enumeration starts.
> 
> Below is the log from my console

Can you try if the patch I posted solves that? From the callchain it
looks like the same thing.

Sebastian

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: schedule under irqs_disabled in SLUB problem
  2017-11-24  9:37               ` Sebastian Andrzej Siewior
@ 2017-11-27  6:46                 ` Sam Kappen
  2017-12-04  9:59                   ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 19+ messages in thread
From: Sam Kappen @ 2017-11-27  6:46 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior; +Cc: linux-rt-users

Hi,

Many thanks for your kind response.
I will put it for long run test and update.

Could you please look at my below queries?

1.)
I had derived and tried a patch based on the below analysis.
( I referred below open source commit, to derive on this patch.
https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git/commit/?h=v4.9.47-rt37-rebase&id=7a347757f027190c95a363a491c18156a926a370
)

In some cases pi_lock in rt_spin_lock_slowlock does not retain the
irqs state while exiting function, this causes
issue in migrate_disable() + enable as they are not symmetrical in
regard to the status of interrupts.
To fix pi_lock & pi_unlock in rt_spin_lock_slowlock, it has been
modified to retain irq state by using
raw_spin_lock and raw_spin_unlock and also modified wait_lock in
rt_spin_lock_slowlock with raw_spin_lock_irqsave & *_restore.



kernel/rtmutex.c | 25 ++++++++++++-------------
 1 file changed, 12 insertions(+), 13 deletions(-)

diff --git a/kernel/rtmutex.c b/kernel/rtmutex.c
index 7cf4b8b..9c67d80 100644
--- a/kernel/rtmutex.c
+++ b/kernel/rtmutex.c
@@ -1191,8 +1191,6 @@ static int adaptive_wait(struct rt_mutex *lock,
 }
 #endif

-# define pi_lock(lock) raw_spin_lock_irq(lock)
-# define pi_unlock(lock) raw_spin_unlock_irq(lock)

 /*
  * Slow path lock function spin_lock style: this variant is very
@@ -1206,14 +1204,15 @@ static void  noinline __sched
rt_spin_lock_slowlock(struct rt_mutex *lock)
  struct task_struct *lock_owner, *self = current;
  struct rt_mutex_waiter waiter, *top_waiter;
  int ret;
+ unsigned long flags;

  rt_mutex_init_waiter(&waiter, true);

- raw_spin_lock(&lock->wait_lock);
+ raw_spin_lock_irqsave(&lock->wait_lock, flags);
  init_lists(lock);

  if (__try_to_take_rt_mutex(lock, self, NULL, STEAL_LATERAL)) {
- raw_spin_unlock(&lock->wait_lock);
+ raw_spin_unlock_irqrestore(&lock->wait_lock, flags);
  return;
  }

@@ -1225,10 +1224,10 @@ static void  noinline __sched
rt_spin_lock_slowlock(struct rt_mutex *lock)
  * as well. We are serialized via pi_lock against wakeups. See
  * try_to_wake_up().
  */
- pi_lock(&self->pi_lock);
+ raw_spin_lock(&self->pi_lock);
  self->saved_state = self->state;
  __set_current_state(TASK_UNINTERRUPTIBLE);
- pi_unlock(&self->pi_lock);
+ raw_spin_unlock(&self->pi_lock);

  ret = task_blocks_on_rt_mutex(lock, &waiter, self, RT_MUTEX_MIN_CHAINWALK);
  BUG_ON(ret);
@@ -1241,18 +1240,18 @@ static void  noinline __sched
rt_spin_lock_slowlock(struct rt_mutex *lock)
  top_waiter = rt_mutex_top_waiter(lock);
  lock_owner = rt_mutex_owner(lock);

- raw_spin_unlock(&lock->wait_lock);
+ raw_spin_unlock_irqrestore(&lock->wait_lock, flags);

  debug_rt_mutex_print_deadlock(&waiter);

  if (top_waiter != &waiter || adaptive_wait(lock, lock_owner))
  schedule_rt_mutex(lock);

- raw_spin_lock(&lock->wait_lock);
+ raw_spin_lock_irqsave(&lock->wait_lock, flags);

- pi_lock(&self->pi_lock);
+ raw_spin_lock(&self->pi_lock);
  __set_current_state(TASK_UNINTERRUPTIBLE);
- pi_unlock(&self->pi_lock);
+ raw_spin_unlock(&self->pi_lock);
  }

  /*
@@ -1262,10 +1261,10 @@ static void  noinline __sched
rt_spin_lock_slowlock(struct rt_mutex *lock)
  * happened while we were blocked. Clear saved_state so
  * try_to_wakeup() does not get confused.
  */
- pi_lock(&self->pi_lock);
+ raw_spin_lock(&self->pi_lock);
  __set_current_state(self->saved_state);
  self->saved_state = TASK_RUNNING;
- pi_unlock(&self->pi_lock);
+ raw_spin_unlock(&self->pi_lock);

  /*
  * try_to_take_rt_mutex() sets the waiter bit
@@ -1276,7 +1275,7 @@ static void  noinline __sched
rt_spin_lock_slowlock(struct rt_mutex *lock)
  BUG_ON(rt_mutex_has_waiters(lock) && &waiter == rt_mutex_top_waiter(lock));
  BUG_ON(!plist_node_empty(&waiter.list_entry));

- raw_spin_unlock(&lock->wait_lock);
+ raw_spin_unlock_irqrestore(&lock->wait_lock, flags);

  debug_rt_mutex_free_waiter(&waiter);
 }
-- 
2.7.4

We were testing above patch on multiple targets we could experience
some stuck issue on some remote target after 2 days. I am not
sure what really happens there, may be the issue when try for
scheduling with irq in disabled state.
The systems I have tested found to be worked 7 days after that I
stopped the test.


2.) With your patch during the slab allocations irqs will be in enabled state.
So if we enable irqs in early stage will there be any side effects? I
am sorry if my question doesn't seem
to be logical.



Regards,
Sam






On Fri, Nov 24, 2017 at 3:07 PM, Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
> On 2017-11-24 12:09:16 [+0530], Sam Kappen wrote:
>> Hi,
> Hi,
>
>> I am also faces a similar kind of issue on X86 target, while testing
>> 3.10.105-rt119.
>> The issue is seen during boot-up when USB/SCSI enumeration starts.
>>
>> Below is the log from my console
>
> Can you try if the patch I posted solves that? From the callchain it
> looks like the same thing.
>
> Sebastian

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: schedule under irqs_disabled in SLUB problem
  2017-11-27  6:46                 ` Sam Kappen
@ 2017-12-04  9:59                   ` Sebastian Andrzej Siewior
  2017-12-05 16:31                     ` Sam Kappen
  0 siblings, 1 reply; 19+ messages in thread
From: Sebastian Andrzej Siewior @ 2017-12-04  9:59 UTC (permalink / raw)
  To: Sam Kappen; +Cc: linux-rt-users

On 2017-11-27 12:16:36 [+0530], Sam Kappen wrote:
> Hi,
Hi,

> 1.)
> I had derived and tried a patch based on the below analysis.
> ( I referred below open source commit, to derive on this patch.
> https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git/commit/?h=v4.9.47-rt37-rebase&id=7a347757f027190c95a363a491c18156a926a370
> )
> 
> In some cases pi_lock in rt_spin_lock_slowlock does not retain the
> irqs state while exiting function, this causes
> issue in migrate_disable() + enable as they are not symmetrical in
> regard to the status of interrupts.
> To fix pi_lock & pi_unlock in rt_spin_lock_slowlock, it has been
> modified to retain irq state by using
> raw_spin_lock and raw_spin_unlock and also modified wait_lock in
> rt_spin_lock_slowlock with raw_spin_lock_irqsave & *_restore.

Can you provide more informations on this? Like a stack strace that
shows that this happens and when it happens? It should not happen.

…
> We were testing above patch on multiple targets we could experience
> some stuck issue on some remote target after 2 days. I am not
> sure what really happens there, may be the issue when try for
> scheduling with irq in disabled state.
> The systems I have tested found to be worked 7 days after that I
> stopped the test.

Which patch? The patch I've sent and ask you for testing or the patch
you had in this email?

> 
> 2.) With your patch during the slab allocations irqs will be in enabled state.
> So if we enable irqs in early stage will there be any side effects? I
> am sorry if my question doesn't seem
> to be logical.

You must not enable the interrupts too early. At the time of scheduling
it is okay.

> Regards,
> Sam

Sebastian

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: schedule under irqs_disabled in SLUB problem
  2017-12-04  9:59                   ` Sebastian Andrzej Siewior
@ 2017-12-05 16:31                     ` Sam Kappen
  2017-12-12 10:18                       ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 19+ messages in thread
From: Sam Kappen @ 2017-12-05 16:31 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior; +Cc: linux-rt-users

Hi,

Thanks for looking at my queries. Please see my answers inline.

 1.)
> I had derived and tried a patch based on the below analysis.
> ( I referred below open source commit, to derive on this patch.
> https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git/commit/?h=v4.9.47-rt37-rebase&id=7a347757f027190c95a363a491c18156a926a370
> )
>
> In some cases pi_lock in rt_spin_lock_slowlock does not retain the
> irqs state while exiting function, this causes
> issue in migrate_disable() + enable as they are not symmetrical in
> regard to the status of interrupts.
> To fix pi_lock & pi_unlock in rt_spin_lock_slowlock, it has been
> modified to retain irq state by using
> raw_spin_lock and raw_spin_unlock and also modified wait_lock in
> rt_spin_lock_slowlock with raw_spin_lock_irqsave & *_restore.

Can you provide more informations on this? Like a stack strace that
shows that this happens and when it happens? It should not happen.

As we were experiencing a panic issue with in  3 to 6 hours during the
test(Test is continues soft reboot of the system
as mentioned in previous mail). With the help of instrument code we
have been tracked it down to the function rt_spin_lock_slowlock() in
rtmutex.c.
We see this issue when there is a state change for irqs from disabled
to enabled. During slab allocations for SCSI on bootup
the irqs are found to be in disabled state since the system state is
not yet in "RUNNING".

So we have added instrument code throughout the call trace and
confirmed culprit as pi_lock()/pi_unlock for changing the irqs state.
Basically it happens when it acquires the lock with irqs in disabled state.


I guess below scenario is happens when issue hits.

It looks like during normal cases with irqs in disabled state from the
function rt_spin_lock_slowlock(),
It gets mutex lock in its first try and  takes first return path so it
 need not have to take pi_lock/unlock. But in some special case (error
case)
mutex  lock is not available(I am not sure why this happens? ) and go
further retry hence it acquires pi_lock/unlock
then into panic.


I am providing below some stack traces which we have seen during the
test. All relevant debug configs were enabled while testing.

scsi 0:0:0:0: Direct-Access     Linux    scsi_debug       0004 PQ: 0 ANSI: 5
mm/mempolicy.c alloc_pages_current 2067 irq disabled!!!   ==> debug
print added by me
mm/mempolicy.c alloc_pages_current 2067 irq disabled!!!   ==> debug
print added by me
mm/mempolicy.c alloc_pages_current 2067 irq disabled!!!   ==> debug
print added by me
mm/mempolicy.c alloc_pages_current 2067 irq disabled!!!   ==> debug
print added by me
------------[ cut here ]------------
------------[ cut here ]------------
WARNING: at kernel/sched/core.c:3052 migrate_disable+0x10b/0x120()
Modules linked in:
CPU: 1 PID: 7 Comm: kworker/u8:0 Not tainted 3.10.107-rt120+ #49
Hardware name: To be filled by O.E.M. To be filled by
O.E.M./WADE-8078, BIOS R1.00.E0 07/07/2014
Workqueue: events_unbound async_run_entry_fn
 0000000000000000 ffff880159ee72e8 ffffffff816b617c 0000000000000000
 0000000000000009 ffff880159ee7328 ffffffff8105fc8b ffff880159ee7348
 ffff880159ea3540 0000000000000038 0000000000000001 0000000000000004
Call Trace:
 [<ffffffff816b617c>] dump_stack+0x4f/0x65
 [<ffffffff8105fc8b>] warn_slowpath_common+0x6b/0xa0
 [<ffffffff8105fcd5>] warn_slowpath_null+0x15/0x20
 [<ffffffff8109585b>] migrate_disable+0x10b/0x120
 [<ffffffff81060c45>] call_console_drivers.constprop.20+0x65/0x100
 [<ffffffff81061da8>] console_unlock+0x398/0x3d0
 [<ffffffff81062303>] vprintk_emit+0x2b3/0x500
 [<ffffffff810b9526>] ? __try_to_take_rt_mutex+0x146/0x190
 [<ffffffff8109569c>] ? migrate_enable+0x14c/0x200
 [<ffffffff816b17e5>] printk+0x48/0x4a
 [<ffffffff8109569c>] ? migrate_enable+0x14c/0x200
 [<ffffffff8105fc59>] warn_slowpath_common+0x39/0xa0
 [<ffffffff8105fcd5>] warn_slowpath_null+0x15/0x20
 [<ffffffff8109569c>] migrate_enable+0x14c/0x200
 [<ffffffff81100fb1>] get_page_from_freelist+0x9a1/0xbc0
 [<ffffffff812fc3d9>] ? number.isra.1+0x329/0x360
 [<ffffffff81101f89>] __alloc_pages_nodemask+0x179/0xa50
 [<ffffffff816ba6b3>] ? _raw_spin_unlock+0x13/0x40
 [<ffffffff8109de14>] ? update_curr+0xa4/0xf0
 [<ffffffff81138ab1>] alloc_pages_current+0x101/0x1f0
 [<ffffffff8113cf95>] new_slab+0x265/0x310
 [<ffffffff816b386e>] __slab_alloc.isra.62+0x4e0/0x6ca
 [<ffffffff816ba744>] ? _raw_spin_unlock_irq+0x14/0x40
 [<ffffffff810fbd0a>] ? mempool_alloc_slab+0x3a/0x70
 [<ffffffff816ba6b3>] ? _raw_spin_unlock+0x13/0x40
 [<ffffffff8113f5d0>] kmem_cache_alloc+0x170/0x190
 [<ffffffff810fbd0a>] ? mempool_alloc_slab+0x3a/0x70
 [<ffffffff810fbd0a>] mempool_alloc_slab+0x3a/0x70
 [<ffffffff810fc0be>] mempool_alloc+0xae/0x210
 [<ffffffff81063f85>] ? unpin_current_cpu+0x15/0x70
 [<ffffffff812d5ce8>] get_request+0x3a8/0x7c0
 [<ffffffff81089c30>] ? __init_waitqueue_head+0x60/0x60
 [<ffffffff812d619a>] blk_get_request+0x9a/0x140
 [<ffffffff8113fb68>] ? kmem_cache_free+0x188/0x1a0
 [<ffffffff813ef02a>] scsi_execute+0x4a/0x170
 [<ffffffff813ef226>] scsi_execute_req_flags+0xd6/0x190
 [<ffffffff81478349>] read_capacity_16+0xb9/0x550
 [<ffffffff81478fc8>] sd_revalidate_disk+0x4c8/0x1c90
 [<ffffffff8147a855>] sd_probe_async+0xc5/0x1d0
 [<ffffffff81090a96>] async_run_entry_fn+0x36/0x130
 [<ffffffff81081047>] process_one_work+0x147/0x3d0
 [<ffffffff810824e1>] worker_thread+0x161/0x3d0
 [<ffffffff81082380>] ? manage_workers.isra.33+0x2f0/0x2f0
 [<ffffffff81088f75>] kthread+0xc5/0xd0
 [<ffffffff81088eb0>] ? __init_kthread_worker+0x60/0x60
 [<ffffffff816bb37e>] ret_from_fork+0x4e/0x80
 [<ffffffff81088eb0>] ? __init_kthread_worker+0x60/0x60
---[ end trace 0000000000000001 ]---
------------[ cut here ]------------
WARNING: at kernel/sched/core.c:3087 migrate_enable+0x14c/0x200()
Modules linked in:
CPU: 1 PID: 7 Comm: kworker/u8:0 Tainted: G        W    3.10.107-rt120+ #49
Hardware name: To be filled by O.E.M. To be filled by
O.E.M./WADE-8078, BIOS R1.00.E0 07/07/2014
Workqueue: events_unbound async_run_entry_fn
 0000000000000000 ffff880159ee7228 ffffffff816b617c 0000000000000000
 0000000000000009 ffff880159ee7268 ffffffff8105fc8b ffff880159ee7258
 ffff880159ea3540 ffffffff81e1a9e8 ffffffff81e1a840 0000000000000000
Call Trace:
 [<ffffffff816b617c>] dump_stack+0x4f/0x65
 [<ffffffff8105fc8b>] warn_slowpath_common+0x6b/0xa0
 [<ffffffff8105fcd5>] warn_slowpath_null+0x15/0x20
 [<ffffffff8109569c>] migrate_enable+0x14c/0x200
 [<ffffffff8138ae56>] serial8250_poll+0xd6/0x120
 [<ffffffff813878dd>] uartdrv_console_write+0xdd/0x330
 [<ffffffff81060cad>] call_console_drivers.constprop.20+0xcd/0x100
 [<ffffffff81061da8>] console_unlock+0x398/0x3d0
 [<ffffffff81062303>] vprintk_emit+0x2b3/0x500
 [<ffffffff810b9526>] ? __try_to_take_rt_mutex+0x146/0x190
sd 0:0:0:0: Attached scsi generic sg0 type 0
 [<ffffffff8109569c>] ? migrate_enable+0x14c/0x200
 [<ffffffff816b17e5>] printk+0x48/0x4a
 [<ffffffff8109569c>] ? migrate_enable+0x14c/0x200
 [<ffffffff8105fc59>] warn_slowpath_common+0x39/0xa0
 [<ffffffff8105fcd5>] warn_slowpath_null+0x15/0x20
 [<ffffffff8109569c>] migrate_enable+0x14c/0x200
 [<ffffffff81100fb1>] get_page_from_freelist+0x9a1/0xbc0
 [<ffffffff812fc3d9>] ? number.isra.1+0x329/0x360
 [<ffffffff81101f89>] __alloc_pages_nodemask+0x179/0xa50
 [<ffffffff816ba6b3>] ? _raw_spin_unlock+0x13/0x40
 [<ffffffff8109de14>] ? update_curr+0xa4/0xf0
 [<ffffffff81138ab1>] alloc_pages_current+0x101/0x1f0
 [<ffffffff8113cf95>] new_slab+0x265/0x310
 [<ffffffff816b386e>] __slab_alloc.isra.62+0x4e0/0x6ca
 [<ffffffff816ba744>] ? _raw_spin_unlock_irq+0x14/0x40
 [<ffffffff810fbd0a>] ? mempool_alloc_slab+0x3a/0x70
 [<ffffffff816ba6b3>] ? _raw_spin_unlock+0x13/0x40
 [<ffffffff8113f5d0>] kmem_cache_alloc+0x170/0x190
 [<ffffffff810fbd0a>] ? mempool_alloc_slab+0x3a/0x70
 [<ffffffff810fbd0a>] mempool_alloc_slab+0x3a/0x70
 [<ffffffff810fc0be>] mempool_alloc+0xae/0x210
 [<ffffffff81063f85>] ? unpin_current_cpu+0x15/0x70
 [<ffffffff812d5ce8>] get_request+0x3a8/0x7c0
 [<ffffffff81089c30>] ? __init_waitqueue_head+0x60/0x60
 [<ffffffff812d619a>] blk_get_request+0x9a/0x140
 [<ffffffff8113fb68>] ? kmem_cache_free+0x188/0x1a0
 [<ffffffff813ef02a>] scsi_execute+0x4a/0x170
 [<ffffffff813ef226>] scsi_execute_req_flags+0xd6/0x190
 [<ffffffff81478349>] read_capacity_16+0xb9/0x550
 [<ffffffff81478fc8>] sd_revalidate_disk+0x4c8/0x
------------------------------------------------------------------------------------
scsi0 : scsi_debug, version 1.82 [20100324], dev_size_mb=8, opts=0x0
scsi 0:0:0:0: Direct-Access     Linux    scsi_debug       0004 PQ: 0 ANSI: 5
kernel/rtmutex.c rt_spin_lock_slowlock 1266 enterirq 1 exitirq 0!!!==>
debug print added by me
XXX: After local_spin_lock_irqsave enterirqs 1 exitirqs 0!!!    ==>
debug print added by me
------------[ cut here ]------------
WARNING: at kernel/sched/core.c:3052 migrate_disable+0x10b/0x120()
Modules linked in:
CPU: 0 PID: 7 Comm: kworker/u8:0 Not tainted 3.10.107-rt120+ #106
Hardware name: To be filled by O.E.M. To be filled by
O.E.M./WADE-8078, BIOS R1.00.E0 07/07/2014
Workqueue: events_unbound async_run_entry_fn
 0000000000000000 ffff880159ee73f8 ffffffff816b65bc 0000000000000000
 0000000000000009 ffff880159ee7438 ffffffff8105fc8b ffff880159ee7438
 ffff880159ea3540 0000000000000044 0000000000000001 0000000000000004
Call Trace:
 [<ffffffff816b65bc>] dump_stack+0x4f/0x65
 [<ffffffff8105fc8b>] warn_slowpath_common+0x6b/0xa0
 [<ffffffff8105fcd5>] warn_slowpath_null+0x15/0x20
 [<ffffffff8109585b>] migrate_disable+0x10b/0x120
 [<ffffffff81060c45>] call_console_drivers.constprop.20+0x65/0x100
 [<ffffffff81061da8>] console_unlock+0x398/0x3d0
 [<ffffffff81062303>] vprintk_emit+0x2b3/0x500
 [<ffffffff816b1c25>] printk+0x48/0x4a
 [<ffffffff81100f85>] get_page_from_freelist+0x985/0xc70
 [<ffffffff81102029>] __alloc_pages_nodemask+0x179/0xa50
 [<ffffffff81100b7a>] ? get_page_from_freelist+0x57a/0xc70
 [<ffffffff81063f85>] ? unpin_current_cpu+0x15/0x70
 [<ffffffff81138b2c>] alloc_pages_current+0xdc/0x1b0
 [<ffffffff8113d015>] new_slab+0x285/0x370
 [<ffffffff816b3cae>] __slab_alloc.isra.62+0x4e0/0x6ca
 [<ffffffff810fbcf8>] ? mempool_alloc_slab+0x28/0x60
 [<ffffffff8113f752>] kmem_cache_alloc+0x1a2/0x1e0
 [<ffffffff810fbcf8>] ? mempool_alloc_slab+0x28/0x60
 [<ffffffff810fbcf8>] mempool_alloc_slab+0x28/0x60
 [<ffffffff810fc0ae>] mempool_alloc+0xae/0x210
 [<ffffffff81063f85>] ? unpin_current_cpu+0x15/0x70
 [<ffffffff812d60f8>] get_request+0x3a8/0x7c0
 [<ffffffff81089c30>] ? __init_waitqueue_head+0x60/0x60
 [<ffffffff812d65aa>] blk_get_request+0x9a/0x140
 [<ffffffff813ef46a>] scsi_execute+0x4a/0x170
 [<ffffffff813ef666>] scsi_execute_req_flags+0xd6/0x190
 [<ffffffff81479008>] sd_revalidate_disk+0xc8/0x1c90
 [<ffffffff816ba561>] ? rt_spin_lock_slowlock+0x291/0x340
 [<ffffffff8147ac95>] sd_probe_async+0xc5/0x1d0
 [<ffffffff81090a96>] async_run_entry_fn+0x36/0x130
 [<ffffffff81081047>] process_one_work+0x147/0x3d0
 [<ffffffff810824e1>] worker_thread+0x161/0x3d0
 [<ffffffff81082380>] ? manage_workers.isra.33+0x2f0/0x2f0
 [<ffffffff81088f75>] kthread+0xc5/0xd0
 [<ffffffff81088eb0>] ? __init_kthread_worker+0x60/0x60
 [<ffffffff816bb8fe>] ret_from_fork+0x4e/0x80
 [<ffffffff81088eb0>] ? __init_kthread_worker+0x60/0x60
---[ end trace 0000000000000001 ]---
------------[ cut here ]------------
WARNING: at kernel/sched/core.c:3087 migrate_enable+0x14c/0x200()
Modules linked in:
CPU: 0 PID: 7 Comm: kworker/u8:0 Tainted: G        W    3.10.107-rt120+ #106
Hardware name: To be filled by O.E.M. To be filled by
O.E.M./WADE-8078, BIOS R1.00.E0 07/07/2014
Workqueue: events_unbound async_run_entry_fn
 0000000000000000 ffff880159ee7338 ffffffff816b65bc 0000000000000000
 0000000000000009 ffff880159ee7378 ffffffff8105fc8b ffff880159ee7368
 ffff880159ea3540 ffffffff81e1aa28 ffffffff81e1a880 0000000000000000
Call Trace:
 [<ffffffff816b65bc>] dump_stack+0x4f/0x65
 [<ffffffff8105fc8b>] warn_slowpath_common+0x6b/0xa0
 [<ffffffff8105fcd5>] warn_slowpath_null+0x15/0x20
 [<ffffffff8109569c>] migrate_enable+0x14c/0x200
 [<ffffffff8138b266>] serial8250_poll+0xd6/0x120
 [<ffffffff81387ced>] uartdrv_console_write+0xdd/0x330
 [<ffffffff81060cad>] call_console_drivers.constprop.20+0xcd/0x100
 [<ffffffff81061da8>] console_unlock+0x398/0x3d0
 [<ffffffff81062303>] vprintk_emit+0x2b3/0x500
 [<ffffffff816b1c25>] printk+0x48/0x4a
 [<ffffffff81100f85>] get_page_from_freelist+0x985/0xc70
 [<ffffffff81102029>] __alloc_pages_nodemask+0x179/0xa50
 [<ffffffff81100b7a>] ? get_page_from_freelist+0x57a/0xc70
 [<ffffffff81063f85>] ? unpin_current_cpu+0x15/0x70
 [<ffffffff81138b2c>] alloc_pages_current+0xdc/0x1b0
 [<ffffffff8113d015>] new_slab+0x285/0x370
 [<ffffffff816b3cae>] __slab_alloc.isra.62+0x4e0/0x6ca
 [<ffffffff810fbcf8>] ? mempool_alloc_slab+0x28/0x60
 [<ffffffff8113f752>] kmem_cache_alloc+0x1a2/0x1e0
 [<ffffffff810fbcf8>] ? mempool_alloc_slab+0x28/0x60
 [<ffffffff810fbcf8>] mempool_alloc_slab+0x28/0x60
 [<ffffffff810fc0ae>] mempool_alloc+0xae/0x210
 [<ffffffff81063f85>] ? unpin_current_cpu+0x15/0x70
 [<ffffffff812d60f8>] get_request+0x3a8/0x7c0
 [<ffffffff81089c30>] ? __init_waitqueue_head+0x60/0x60
 [<ffffffff812d65aa>] blk_get_request+0x9a/0x140
 [<ffffffff813ef46a>] scsi_execute+0x4a/0x170
 [<ffffffff813ef666>] scsi_execute_req_flags+0xd6/0x190
 [<ffffffff81479008>] sd_revalidate_disk+0xc8/0x1c90
 [<ffffffff816ba561>] ? rt_spin_lock_slowlock+0x291/0x340
 [<ffffffff8147ac95>] sd_probe_async+0xc5/0x1d0
 [<ffffffff81090a96>] async_run_entry_fn+0x36/0x130
 [<ffffffff81081047>] process_one_work+0x147/0x3d0
 [<ffffffff810824e1>] worker_thread+0x161/0x3d0
 [<ffffffff81082380>] ? manage_workers.isra.33+0x2f0/0x2f0
 [<ffffffff81088f75>] kthread+0xc5/0xd0
 [<ffffffff81088eb0>] ? __init_kthread_worker+0x60/0x60
 [<ffffffff816bb8fe>] ret_from_fork+0x4e/0x80
 [<ffffffff81088eb0>] ? __init_kthread_worker+0x60/0x60
---[ end trace 0000000000000002 ]---
XXX: __rmqueue enterirqs 1 exitirqs 0!!!             ==> debug print added by me
XXX: __mod_zone_freepage_state enterirq 1 exitirq 0!!!    ==> debug
print added by me
------------[ cut here ]------------
sd 0:0:0:0: Attached scsi generic sg0 type 0
ahci 0000:00:13.0: controller can't do DEVSLP, turning off
ahci 0000:00:13.0: AHCI 0001.0300 32 slots 2 ports 3 Gbps 0x0 impl SATA mode
ahci 0000:00:13.0: flags: 64bit ncq pm led clo pio slum part deso
scsi1 : ahci
----------------------------------------------------------------------------------------------

WARNING: at kernel/sched/core.c:3052 migrate_disable+0x10b/0x120()
Modules linked in:
CPU: 1 PID: 7 Comm: kworker/u8:0 Not tainted 3.10.107-rt120+ #49
Hardware name: To be filled by O.E.M. To be filled by
O.E.M./WADE-8078, BIOS R1.00.E0 07/07/2014
Workqueue: events_unbound async_run_entry_fn
 0000000000000000 ffff880159ee72e8 ffffffff816b617c 0000000000000000
 0000000000000009 ffff880159ee7328 ffffffff8105fc8b ffff880159ee7348
 ffff880159ea3540 0000000000000038 0000000000000001 0000000000000004
Call Trace:
 [<ffffffff816b617c>] dump_stack+0x4f/0x65
 [<ffffffff8105fc8b>] warn_slowpath_common+0x6b/0xa0
 [<ffffffff8105fcd5>] warn_slowpath_null+0x15/0x20
 [<ffffffff8109585b>] migrate_disable+0x10b/0x120
 [<ffffffff81060c45>] call_console_drivers.constprop.20+0x65/0x100
 [<ffffffff81061da8>] console_unlock+0x398/0x3d0
 [<ffffffff81062303>] vprintk_emit+0x2b3/0x500
 [<ffffffff810b9526>] ? __try_to_take_rt_mutex+0x146/0x190
 [<ffffffff8109569c>] ? migrate_enable+0x14c/0x200
 [<ffffffff816b17e5>] printk+0x48/0x4a
 [<ffffffff8109569c>] ? migrate_enable+0x14c/0x200
 [<ffffffff8105fc59>] warn_slowpath_common+0x39/0xa0
 [<ffffffff8105fcd5>] warn_slowpath_null+0x15/0x20
 [<ffffffff8109569c>] migrate_enable+0x14c/0x200
 [<ffffffff81100fb1>] get_page_from_freelist+0x9a1/0xbc0
 [<ffffffff812fc3d9>] ? number.isra.1+0x329/0x360
 [<ffffffff81101f89>] __alloc_pages_nodemask+0x179/0xa50
 [<ffffffff816ba6b3>] ? _raw_spin_unlock+0x13/0x40
 [<ffffffff8109de14>] ? update_curr+0xa4/0xf0
 [<ffffffff81138ab1>] alloc_pages_current+0x101/0x1f0
 [<ffffffff8113cf95>] new_slab+0x265/0x310
 [<ffffffff816b386e>] __slab_alloc.isra.62+0x4e0/0x6ca
 [<ffffffff816ba744>] ? _raw_spin_unlock_irq+0x14/0x40
 [<ffffffff810fbd0a>] ? mempool_alloc_slab+0x3a/0x70
 [<ffffffff816ba6b3>] ? _raw_spin_unlock+0x13/0x40
 [<ffffffff8113f5d0>] kmem_cache_alloc+0x170/0x190
 [<ffffffff810fbd0a>] ? mempool_alloc_slab+0x3a/0x70
 [<ffffffff810fbd0a>] mempool_alloc_slab+0x3a/0x70
 [<ffffffff810fc0be>] mempool_alloc+0xae/0x210
 [<ffffffff81063f85>] ? unpin_current_cpu+0x15/0x70
 [<ffffffff812d5ce8>] get_request+0x3a8/0x7c0
 [<ffffffff81089c30>] ? __init_waitqueue_head+0x60/0x60
 [<ffffffff812d619a>] blk_get_request+0x9a/0x140
 [<ffffffff8113fb68>] ? kmem_cache_free+0x188/0x1a0
 [<ffffffff813ef02a>] scsi_execute+0x4a/0x170
 [<ffffffff813ef226>] scsi_execute_req_flags+0xd6/0x190
 [<ffffffff81478349>] read_capacity_16+0xb9/0x550
 [<ffffffff81478fc8>] sd_revalidate_disk+0x4c8/0x1c90
 [<ffffffff8147a855>] sd_probe_async+0xc5/0x1d0
 [<ffffffff81090a96>] async_run_entry_fn+0x36/0x130
 [<ffffffff81081047>] process_one_work+0x147/0x3d0
 [<ffffffff810824e1>] worker_thread+0x161/0x3d0
 [<ffffffff81082380>] ? manage_workers.isra.33+0x2f0/0x2f0
 [<ffffffff81088f75>] kthread+0xc5/0xd0
 [<ffffffff81088eb0>] ? __init_kthread_worker+0x60/0x60
 [<ffffffff816bb37e>] ret_from_fork+0x4e/0x80
 [<ffffffff81088eb0>] ? __init_kthread_worker+0x60/0x60
---[ end trace 0000000000000001 ]---
------------[ cut here ]------------
WARNING: at kernel/sched/core.c:3087 migrate_enable+0x14c/0x200()
Modules linked in:

…
> We were testing above patch on multiple targets we could experience
> some stuck issue on some remote target after 2 days. I am not
> sure what really happens there, may be the issue when try for
> scheduling with irq in disabled state.
> The systems I have tested found to be worked 7 days after that I
> stopped the test.

Which patch? The patch I've sent and ask you for testing or the patch
you had in this email?

Patch I had in this mail.

>
> 2.) With your patch during the slab allocations irqs will be in enabled state.
> So if we enable irqs in early stage will there be any side effects? I
> am sorry if my question doesn't seem
> to be logical.

You must not enable the interrupts too early. At the time of scheduling
it is okay.

Thanks. I have been testing your patch, I will update once I finish the long
run test.

Regards,
Sam

On Mon, Dec 4, 2017 at 3:29 PM, Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
> On 2017-11-27 12:16:36 [+0530], Sam Kappen wrote:
>> Hi,
> Hi,
>
>> 1.)
>> I had derived and tried a patch based on the below analysis.
>> ( I referred below open source commit, to derive on this patch.
>> https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git/commit/?h=v4.9.47-rt37-rebase&id=7a347757f027190c95a363a491c18156a926a370
>> )
>>
>> In some cases pi_lock in rt_spin_lock_slowlock does not retain the
>> irqs state while exiting function, this causes
>> issue in migrate_disable() + enable as they are not symmetrical in
>> regard to the status of interrupts.
>> To fix pi_lock & pi_unlock in rt_spin_lock_slowlock, it has been
>> modified to retain irq state by using
>> raw_spin_lock and raw_spin_unlock and also modified wait_lock in
>> rt_spin_lock_slowlock with raw_spin_lock_irqsave & *_restore.
>
> Can you provide more informations on this? Like a stack strace that
> shows that this happens and when it happens? It should not happen.
>
> …
>> We were testing above patch on multiple targets we could experience
>> some stuck issue on some remote target after 2 days. I am not
>> sure what really happens there, may be the issue when try for
>> scheduling with irq in disabled state.
>> The systems I have tested found to be worked 7 days after that I
>> stopped the test.
>
> Which patch? The patch I've sent and ask you for testing or the patch
> you had in this email?
>
>>
>> 2.) With your patch during the slab allocations irqs will be in enabled state.
>> So if we enable irqs in early stage will there be any side effects? I
>> am sorry if my question doesn't seem
>> to be logical.
>
> You must not enable the interrupts too early. At the time of scheduling
> it is okay.
>
>> Regards,
>> Sam
>
> Sebastian

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: schedule under irqs_disabled in SLUB problem
  2017-12-05 16:31                     ` Sam Kappen
@ 2017-12-12 10:18                       ` Sebastian Andrzej Siewior
  2018-03-05  8:47                         ` Sam Kappen
  0 siblings, 1 reply; 19+ messages in thread
From: Sebastian Andrzej Siewior @ 2017-12-12 10:18 UTC (permalink / raw)
  To: Sam Kappen; +Cc: linux-rt-users

On 2017-12-05 22:01:19 [+0530], Sam Kappen wrote:
> Hi,
Hi,

> Thanks for looking at my queries. Please see my answers inline.
please don't top-post. Please use a client which adds proper indention
while quoting the email.

>  1.)
> > I had derived and tried a patch based on the below analysis.
> > ( I referred below open source commit, to derive on this patch.
> > https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git/commit/?h=v4.9.47-rt37-rebase&id=7a347757f027190c95a363a491c18156a926a370
> > )
> >
> We see this issue when there is a state change for irqs from disabled
> to enabled. During slab allocations for SCSI on bootup
> the irqs are found to be in disabled state since the system state is
> not yet in "RUNNING".
> 
> So we have added instrument code throughout the call trace and
> confirmed culprit as pi_lock()/pi_unlock for changing the irqs state.
> Basically it happens when it acquires the lock with irqs in disabled state.

but by pi_lock/pi_unlock you don't mean the futex operation, do you?

based on the fact that the system is not in state "running" yet and this
trace here:
> ------------[ cut here ]------------
> WARNING: at kernel/sched/core.c:3052 migrate_disable+0x10b/0x120()
> Modules linked in:
> CPU: 1 PID: 7 Comm: kworker/u8:0 Not tainted 3.10.107-rt120+ #49
> Hardware name: To be filled by O.E.M. To be filled by
> Call Trace:
>  [<ffffffff8105fcd5>] warn_slowpath_null+0x15/0x20
>  [<ffffffff8109569c>] migrate_enable+0x14c/0x200
>  [<ffffffff81100fb1>] get_page_from_freelist+0x9a1/0xbc0
>  [<ffffffff81101f89>] __alloc_pages_nodemask+0x179/0xa50
>  [<ffffffff81138ab1>] alloc_pages_current+0x101/0x1f0
>  [<ffffffff8113cf95>] new_slab+0x265/0x310
>  [<ffffffff816b386e>] __slab_alloc.isra.62+0x4e0/0x6ca
>  [<ffffffff8113f5d0>] kmem_cache_alloc+0x170/0x190
>  [<ffffffff810fbd0a>] mempool_alloc_slab+0x3a/0x70
>  [<ffffffff810fc0be>] mempool_alloc+0xae/0x210
>  [<ffffffff812d5ce8>] get_request+0x3a8/0x7c0
>  [<ffffffff812d619a>] blk_get_request+0x9a/0x140
>  [<ffffffff813ef02a>] scsi_execute+0x4a/0x170
> ---[ end trace 0000000000000001 ]---

I would that this is the same issue and the patch I posted should help.

> > 2.) With your patch during the slab allocations irqs will be in enabled state.
> 
> Thanks. I have been testing your patch, I will update once I finish the long
> run test.

Okay, so a note to myself, there is nothing outstanding for me to do so
far.

> Regards,
> Sam

Sebastian

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: schedule under irqs_disabled in SLUB problem
  2017-12-12 10:18                       ` Sebastian Andrzej Siewior
@ 2018-03-05  8:47                         ` Sam Kappen
  2018-03-05 17:40                           ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 19+ messages in thread
From: Sam Kappen @ 2018-03-05  8:47 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior; +Cc: linux-rt-users

On Tue, Dec 12, 2017 at 3:48 PM, Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
> On 2017-12-05 22:01:19 [+0530], Sam Kappen wrote:
>> Hi,
> Hi,
>
>> Thanks for looking at my queries. Please see my answers inline.
> please don't top-post. Please use a client which adds proper indention
> while quoting the email.
>
>>  1.)
>> > I had derived and tried a patch based on the below analysis.
>> > ( I referred below open source commit, to derive on this patch.
>> > https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git/commit/?h=v4.9.47-rt37-rebase&id=7a347757f027190c95a363a491c18156a926a370
>> > )
>> >
>> We see this issue when there is a state change for irqs from disabled
>> to enabled. During slab allocations for SCSI on bootup
>> the irqs are found to be in disabled state since the system state is
>> not yet in "RUNNING".
>>
>> So we have added instrument code throughout the call trace and
>> confirmed culprit as pi_lock()/pi_unlock for changing the irqs state.
>> Basically it happens when it acquires the lock with irqs in disabled state.
>
> but by pi_lock/pi_unlock you don't mean the futex operation, do you?
>
> based on the fact that the system is not in state "running" yet and this
> trace here:
>> ------------[ cut here ]------------
>> WARNING: at kernel/sched/core.c:3052 migrate_disable+0x10b/0x120()
>> Modules linked in:
>> CPU: 1 PID: 7 Comm: kworker/u8:0 Not tainted 3.10.107-rt120+ #49
>> Hardware name: To be filled by O.E.M. To be filled by
> …
>> Call Trace:
> …
>>  [<ffffffff8105fcd5>] warn_slowpath_null+0x15/0x20
>>  [<ffffffff8109569c>] migrate_enable+0x14c/0x200
>>  [<ffffffff81100fb1>] get_page_from_freelist+0x9a1/0xbc0
>>  [<ffffffff81101f89>] __alloc_pages_nodemask+0x179/0xa50
>>  [<ffffffff81138ab1>] alloc_pages_current+0x101/0x1f0
>>  [<ffffffff8113cf95>] new_slab+0x265/0x310
>>  [<ffffffff816b386e>] __slab_alloc.isra.62+0x4e0/0x6ca
>>  [<ffffffff8113f5d0>] kmem_cache_alloc+0x170/0x190
>>  [<ffffffff810fbd0a>] mempool_alloc_slab+0x3a/0x70
>>  [<ffffffff810fc0be>] mempool_alloc+0xae/0x210
>>  [<ffffffff812d5ce8>] get_request+0x3a8/0x7c0
>>  [<ffffffff812d619a>] blk_get_request+0x9a/0x140
>>  [<ffffffff813ef02a>] scsi_execute+0x4a/0x170
> …
>> ---[ end trace 0000000000000001 ]---
>
> I would that this is the same issue and the patch I posted should help.
>
>> > 2.) With your patch during the slab allocations irqs will be in enabled state.
>>
>> Thanks. I have been testing your patch, I will update once I finish the long
>> run test.
>
> Okay, so a note to myself, there is nothing outstanding for me to do so
> far.
>We have tested it for nearly a month and issue is not reproducible with your patch. Many thanks.

>> Regards,
>> Sam
>
> Sebastian

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: schedule under irqs_disabled in SLUB problem
  2018-03-05  8:47                         ` Sam Kappen
@ 2018-03-05 17:40                           ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 19+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-03-05 17:40 UTC (permalink / raw)
  To: Sam Kappen; +Cc: linux-rt-users

On 2018-03-05 14:17:29 [+0530], Sam Kappen wrote:
> On Tue, Dec 12, 2017 at 3:48 PM, Sebastian Andrzej Siewior
> <bigeasy@linutronix.de> wrote:

The whole email looks like quote of my email with no additional content.
Did you press the send button too soon or did you hide the new content
somewhere where I could not find it :)?

Sebastian

^ permalink raw reply	[flat|nested] 19+ messages in thread

* schedule under irqs_disabled in SLUB problem
@ 2017-11-01 11:31 Pavel V. Panteleev
  0 siblings, 0 replies; 19+ messages in thread
From: Pavel V. Panteleev @ 2017-11-01 11:31 UTC (permalink / raw)
  To: linux-rt-users

Hello! I have a problem on kernel 3.14.79-rt85 and I don't know, if it's 
solved on current kernel or, may be, it's our additions problem. Could 
you help me, please?

Kernel is broken in migrate_enable():
         WARN_ON_ONCE(p->migrate_disable <= 0);

I have a trace, where I see, that it happened, because few 
migrate_disable() were called under irqs_disabled (migrate_disable 
counter wasn't changed). But after schedule() call irqs were enabled and 
the following migrate_disable() and migrate_enable() changed 
migrate_disable counter. I suppose, that problem is schedule() call 
under irqs_disabled. There is a stack:
PROCESS: kworker/u8:4, PID: 744, CPU: 3, state: R oncpu (0x0), flags: 
0x4208060
  ---------------------------------------------------------------------
IP (hex) FILENAME PROCEDURE
---------------------------------------------------------------------
e200020e0700 <kernel> __schedule+0x1b60/0x1f90
e200020e0da8 <kernel> schedule+0x278/0x2e8
e200020e8a68 <kernel> rt_spin_lock_slowlock+0x740/0x15c8
e200020f5088 <kernel> rt_spin_lock+0x3b8/0x408
e20000578290 <kernel> get_page_from_freelist+0x1528/0x1b48
e20000579b28 <kernel> __alloc_pages_nodemask+0x270/0x1b90
e200006ac9d8 <kernel> alloc_pages_current+0x638/0xad0
e200006b24d8 <kernel> allocate_slab+0x210/0xa70
e200006b9468 <kernel> __slab_alloc+0x1300/0x1d80
e200006ba900 <kernel> kmem_cache_alloc+0xa18/0xa78
e2000101ed78 <kernel> radix_tree_node_alloc+0x280/0x478
e2000101f2a8 <kernel> radix_tree_extend+0x248/0x588
e2000101f6c0 <kernel> radix_tree_insert+0xd8/0x1258
e20000550e10 <kernel> page_cache_tree_insert+0xd0/0x498
e200005513e8 <kernel> add_to_page_cache_locked+0x210/0x7b0
e2000055d7f8 <kernel> __read_cache_page+0x160/0x7e8
e2000055df00 <kernel> do_read_cache_page+0x80/0xab8
e2000055e9c8 <kernel> read_cache_page+0x90/0xa8
e20000fdf110 <kernel> read_dev_sector+0xa8/0x180
e20000fe0178 <kernel> parse_extended+0x198/0x8f0
e20000fe1288 <kernel> msdos_partition+0x7a8/0xed8
e20000fdf790 <kernel> check_partition+0x508/0x870
e20000fddfe8 <kernel> rescan_partitions+0x3a8/0x1128
e20000818328 <kernel> __blkdev_get+0xcb8/0xe48
e20000818d20 <kernel> blkdev_get+0x868/0xf90
e20000fcd100 <kernel> register_disk+0x890/0xad0
e20000fcda08 <kernel> add_disk+0x6c8/0xc98
e20001801ab0 <kernel> sd_probe_async+0x328/0x648
e200002935a8 <kernel> async_run_entry_fn+0x100/0x598
e2000023fd28 <kernel> process_one_work+0x5a0/0x1940
e20000241508 <kernel> worker_thread+0x440/0xff8
e2000026c1a8 <kernel> kthread+0x4e8/0x5a0

As I see, in allocate_slab() kernel could be under irqs_disabled. And 
irqs would be enabled in case of SYSTEM_RUNNING (why only in case of 
SYSTEM_RUNNING?). But in our case system isn't running yet (it's always 
before /sbin/init), so irqs wouldn't be enabled:

         enableirqs = (flags & __GFP_WAIT) != 0;
         #ifdef CONFIG_PREEMPT_RT_FULL
         enableirqs |= system_state == SYSTEM_RUNNING;
         #endif
         if (enableirqs)
                 local_irq_enable();

So we call schedule() under irqs_disabled in 
...->get_page_from_freelist->buffered_rmqueue->local_spin_lock_irqsave->local_lock_irqsave->__local_lock_irqsave->__local_lock_irq->spin_lock_irqsave->spin_lock->rt_spin_lock->rt_spin_lock_slowlock->schedule_rt_mutex.

P. S.
Today I reproduced this on 4.9.47-rt37. Try to reproduce on x86.


Best regards,
Pavel V. Panteleev

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2018-03-05 17:40 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <CADF-jezvVP2O++FR2KiRSSSJF7oObjy8LSP3-yj1HCmxyTzB_Q@mail.gmail.com>
2017-11-02 16:50 ` schedule under irqs_disabled in SLUB problem Sebastian Andrzej Siewior
2017-11-02 20:55   ` Grygorii Strashko
     [not found]   ` <CADF-jexLs9vRuiuoRmcA+0L6Mp-XxW75okheWV+ipGf1b_Ua1w@mail.gmail.com>
2017-11-03 10:23     ` Pavel V. Panteleev
2017-11-07  9:00       ` Pavel V. Panteleev
2017-11-07  9:14       ` Pavel V. Panteleev
2017-11-07  9:47       ` Pavel V. Panteleev
2017-11-16 16:08         ` Sebastian Andrzej Siewior
2017-11-16 16:39           ` Pavel V. Panteleev
2017-11-17 17:38           ` Julia Cartwright
2017-11-24  6:39             ` Sam Kappen
2017-11-24  9:37               ` Sebastian Andrzej Siewior
2017-11-27  6:46                 ` Sam Kappen
2017-12-04  9:59                   ` Sebastian Andrzej Siewior
2017-12-05 16:31                     ` Sam Kappen
2017-12-12 10:18                       ` Sebastian Andrzej Siewior
2018-03-05  8:47                         ` Sam Kappen
2018-03-05 17:40                           ` Sebastian Andrzej Siewior
2017-11-24  9:35             ` [PATCH] mm/slub: enable IRQs once scheduling is working Sebastian Andrzej Siewior
2017-11-01 11:31 schedule under irqs_disabled in SLUB problem Pavel V. Panteleev

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.