linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* Sleeping BUG in khugepaged for i586
@ 2017-06-03 19:24 Larry Finger
  2017-06-05 21:44 ` Andrew Morton
  0 siblings, 1 reply; 25+ messages in thread
From: Larry Finger @ 2017-06-03 19:24 UTC (permalink / raw)
  To: LKML, linux-mm; +Cc: Andrew Morton

I recently turned on locking diagnostics for a Dell Latitude D600 laptop, which 
requires a 32-bit kernel. In the log I found the following:

BUG: sleeping function called from invalid context at mm/khugepaged.c:655
in_atomic(): 1, irqs_disabled(): 0, pid: 20, name: khugepaged
1 lock held by khugepaged/20:
  #0:  (&mm->mmap_sem){++++++}, at: [<c03d6609>] 
collapse_huge_page.isra.47+0x439/0x1240
CPU: 0 PID: 20 Comm: khugepaged Tainted: G        W 
4.12.0-rc1-wl-12125-g952a068 #80
Hardware name: Dell Computer Corporation Latitude D600 
/03U652, BIOS A05 05/29/2003
Call Trace:
  dump_stack+0x76/0xb2
  ___might_sleep+0x174/0x230
  collapse_huge_page.isra.47+0xacf/0x1240
  khugepaged_scan_mm_slot+0x41e/0xc00
  ? _raw_spin_lock+0x46/0x50
  khugepaged+0x277/0x4f0
  ? prepare_to_wait_event+0xe0/0xe0
  kthread+0xeb/0x120
  ? khugepaged_scan_mm_slot+0xc00/0xc00
  ? kthread_create_on_node+0x30/0x30
  ret_from_fork+0x21/0x30

I have no idea when this problem was introduced. Of course, I will test any 
proposed fixes.

Thanks,

Larry

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Sleeping BUG in khugepaged for i586
  2017-06-03 19:24 Sleeping BUG in khugepaged for i586 Larry Finger
@ 2017-06-05 21:44 ` Andrew Morton
  2017-06-06 14:02   ` Vlastimil Babka
  0 siblings, 1 reply; 25+ messages in thread
From: Andrew Morton @ 2017-06-05 21:44 UTC (permalink / raw)
  To: Larry Finger; +Cc: LKML, linux-mm

On Sat, 3 Jun 2017 14:24:26 -0500 Larry Finger <Larry.Finger@lwfinger.net> wrote:

> I recently turned on locking diagnostics for a Dell Latitude D600 laptop, which 
> requires a 32-bit kernel. In the log I found the following:
> 
> BUG: sleeping function called from invalid context at mm/khugepaged.c:655
> in_atomic(): 1, irqs_disabled(): 0, pid: 20, name: khugepaged
> 1 lock held by khugepaged/20:
>   #0:  (&mm->mmap_sem){++++++}, at: [<c03d6609>] 
> collapse_huge_page.isra.47+0x439/0x1240
> CPU: 0 PID: 20 Comm: khugepaged Tainted: G        W 
> 4.12.0-rc1-wl-12125-g952a068 #80
> Hardware name: Dell Computer Corporation Latitude D600 
> /03U652, BIOS A05 05/29/2003
> Call Trace:
>   dump_stack+0x76/0xb2
>   ___might_sleep+0x174/0x230
>   collapse_huge_page.isra.47+0xacf/0x1240
>   khugepaged_scan_mm_slot+0x41e/0xc00
>   ? _raw_spin_lock+0x46/0x50
>   khugepaged+0x277/0x4f0
>   ? prepare_to_wait_event+0xe0/0xe0
>   kthread+0xeb/0x120
>   ? khugepaged_scan_mm_slot+0xc00/0xc00
>   ? kthread_create_on_node+0x30/0x30
>   ret_from_fork+0x21/0x30
> 
> I have no idea when this problem was introduced. Of course, I will test any 
> proposed fixes.
> 

Odd.  There's nothing wrong with cond_resched() while holding mmap_sem.
It looks like khugepaged forgot to do a spin_unlock somewhere and we
leaked a preempt_count.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Sleeping BUG in khugepaged for i586
  2017-06-05 21:44 ` Andrew Morton
@ 2017-06-06 14:02   ` Vlastimil Babka
  2017-06-06 15:01     ` Larry Finger
  0 siblings, 1 reply; 25+ messages in thread
From: Vlastimil Babka @ 2017-06-06 14:02 UTC (permalink / raw)
  To: Andrew Morton, Larry Finger; +Cc: LKML, linux-mm

On 06/05/2017 11:44 PM, Andrew Morton wrote:
> On Sat, 3 Jun 2017 14:24:26 -0500 Larry Finger <Larry.Finger@lwfinger.net> wrote:
> 
>> I recently turned on locking diagnostics for a Dell Latitude D600 laptop, which 
>> requires a 32-bit kernel. In the log I found the following:
>>
>> BUG: sleeping function called from invalid context at mm/khugepaged.c:655
>> in_atomic(): 1, irqs_disabled(): 0, pid: 20, name: khugepaged
>> 1 lock held by khugepaged/20:
>>   #0:  (&mm->mmap_sem){++++++}, at: [<c03d6609>] 
>> collapse_huge_page.isra.47+0x439/0x1240
>> CPU: 0 PID: 20 Comm: khugepaged Tainted: G        W 

W means thre was WARN earler. Could be related... Got logs?

>> 4.12.0-rc1-wl-12125-g952a068 #80

What is "wl-12125-g952a068"? What patches on top of mainline?

>> Hardware name: Dell Computer Corporation Latitude D600 
>> /03U652, BIOS A05 05/29/2003
>> Call Trace:
>>   dump_stack+0x76/0xb2
>>   ___might_sleep+0x174/0x230
>>   collapse_huge_page.isra.47+0xacf/0x1240
>>   khugepaged_scan_mm_slot+0x41e/0xc00
>>   ? _raw_spin_lock+0x46/0x50
>>   khugepaged+0x277/0x4f0
>>   ? prepare_to_wait_event+0xe0/0xe0
>>   kthread+0xeb/0x120
>>   ? khugepaged_scan_mm_slot+0xc00/0xc00
>>   ? kthread_create_on_node+0x30/0x30
>>   ret_from_fork+0x21/0x30
>>
>> I have no idea when this problem was introduced. Of course, I will test any 
>> proposed fixes.
>>
> 
> Odd.  There's nothing wrong with cond_resched() while holding mmap_sem.
> It looks like khugepaged forgot to do a spin_unlock somewhere and we
> leaked a preempt_count.

Hmm I'd expect such spin lock to be reported together with mmap_sem in
the debugging "locks held" message?

> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Sleeping BUG in khugepaged for i586
  2017-06-06 14:02   ` Vlastimil Babka
@ 2017-06-06 15:01     ` Larry Finger
  2017-06-07  7:19       ` Vlastimil Babka
  0 siblings, 1 reply; 25+ messages in thread
From: Larry Finger @ 2017-06-06 15:01 UTC (permalink / raw)
  To: Vlastimil Babka, Andrew Morton; +Cc: LKML, linux-mm

On 06/06/2017 09:02 AM, Vlastimil Babka wrote:
> On 06/05/2017 11:44 PM, Andrew Morton wrote:
>> On Sat, 3 Jun 2017 14:24:26 -0500 Larry Finger <Larry.Finger@lwfinger.net> wrote:
>>
>>> I recently turned on locking diagnostics for a Dell Latitude D600 laptop, which
>>> requires a 32-bit kernel. In the log I found the following:
>>>
>>> BUG: sleeping function called from invalid context at mm/khugepaged.c:655
>>> in_atomic(): 1, irqs_disabled(): 0, pid: 20, name: khugepaged
>>> 1 lock held by khugepaged/20:
>>>    #0:  (&mm->mmap_sem){++++++}, at: [<c03d6609>]
>>> collapse_huge_page.isra.47+0x439/0x1240
>>> CPU: 0 PID: 20 Comm: khugepaged Tainted: G        W
> 
> W means thre was WARN earler. Could be related... Got logs?

When I grabbed a splat, I got the last one in my log. The first one shows "Not 
tainted".

> 
>>> 4.12.0-rc1-wl-12125-g952a068 #80
> 
> What is "wl-12125-g952a068"? What patches on top of mainline?

I found this while chasing a problem with one of the wireless drivers. For that 
reason I use Kalle Valo's wireless-testing-next, which happens to be the only 
kernel tree I have on this laptop. I'm reasonably certain that the extra updates 
are not the cause of the problem as the first one appears before any of the 
wireless drivers are loaded, but I will pull a clean copy of mainline to test 
that assumption.

>>> Hardware name: Dell Computer Corporation Latitude D600
>>> /03U652, BIOS A05 05/29/2003
>>> Call Trace:
>>>    dump_stack+0x76/0xb2
>>>    ___might_sleep+0x174/0x230
>>>    collapse_huge_page.isra.47+0xacf/0x1240
>>>    khugepaged_scan_mm_slot+0x41e/0xc00
>>>    ? _raw_spin_lock+0x46/0x50
>>>    khugepaged+0x277/0x4f0
>>>    ? prepare_to_wait_event+0xe0/0xe0
>>>    kthread+0xeb/0x120
>>>    ? khugepaged_scan_mm_slot+0xc00/0xc00
>>>    ? kthread_create_on_node+0x30/0x30
>>>    ret_from_fork+0x21/0x30
>>>
>>> I have no idea when this problem was introduced. Of course, I will test any
>>> proposed fixes.
>>>
>>
>> Odd.  There's nothing wrong with cond_resched() while holding mmap_sem.
>> It looks like khugepaged forgot to do a spin_unlock somewhere and we
>> leaked a preempt_count.
> 
> Hmm I'd expect such spin lock to be reported together with mmap_sem in
> the debugging "locks held" message?

My bisection of the problem is about half done. My latest good version is commit 
7b8cd33 and the latest bad one is 2ea659a. Only about 7 steps to go.

Larry


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Sleeping BUG in khugepaged for i586
  2017-06-06 15:01     ` Larry Finger
@ 2017-06-07  7:19       ` Vlastimil Babka
  2017-06-07 20:56         ` David Rientjes
  0 siblings, 1 reply; 25+ messages in thread
From: Vlastimil Babka @ 2017-06-07  7:19 UTC (permalink / raw)
  To: Larry Finger, Andrew Morton; +Cc: LKML, linux-mm, David Rientjes

On 06/06/2017 05:01 PM, Larry Finger wrote:
> On 06/06/2017 09:02 AM, Vlastimil Babka wrote:
>> On 06/05/2017 11:44 PM, Andrew Morton wrote:
>>> On Sat, 3 Jun 2017 14:24:26 -0500 Larry Finger <Larry.Finger@lwfinger.net> wrote:
>>>
>>>> I recently turned on locking diagnostics for a Dell Latitude D600 laptop, which
>>>> requires a 32-bit kernel. In the log I found the following:
>>>>
>>>> BUG: sleeping function called from invalid context at mm/khugepaged.c:655
>>>> in_atomic(): 1, irqs_disabled(): 0, pid: 20, name: khugepaged
>>>> 1 lock held by khugepaged/20:
>>>>    #0:  (&mm->mmap_sem){++++++}, at: [<c03d6609>]
>>>> collapse_huge_page.isra.47+0x439/0x1240
>>>> CPU: 0 PID: 20 Comm: khugepaged Tainted: G        W
>>
>> W means thre was WARN earler. Could be related... Got logs?
> 
> When I grabbed a splat, I got the last one in my log. The first one shows "Not 
> tainted".
> 
>>
>>>> 4.12.0-rc1-wl-12125-g952a068 #80
>>
>> What is "wl-12125-g952a068"? What patches on top of mainline?
> 
> I found this while chasing a problem with one of the wireless drivers. For that 
> reason I use Kalle Valo's wireless-testing-next, which happens to be the only 
> kernel tree I have on this laptop. I'm reasonably certain that the extra updates 
> are not the cause of the problem as the first one appears before any of the 
> wireless drivers are loaded, but I will pull a clean copy of mainline to test 
> that assumption.
> 
>>>> Hardware name: Dell Computer Corporation Latitude D600
>>>> /03U652, BIOS A05 05/29/2003
>>>> Call Trace:
>>>>    dump_stack+0x76/0xb2
>>>>    ___might_sleep+0x174/0x230
>>>>    collapse_huge_page.isra.47+0xacf/0x1240
>>>>    khugepaged_scan_mm_slot+0x41e/0xc00
>>>>    ? _raw_spin_lock+0x46/0x50
>>>>    khugepaged+0x277/0x4f0
>>>>    ? prepare_to_wait_event+0xe0/0xe0
>>>>    kthread+0xeb/0x120
>>>>    ? khugepaged_scan_mm_slot+0xc00/0xc00
>>>>    ? kthread_create_on_node+0x30/0x30
>>>>    ret_from_fork+0x21/0x30
>>>>
>>>> I have no idea when this problem was introduced. Of course, I will test any
>>>> proposed fixes.
>>>>
>>>
>>> Odd.  There's nothing wrong with cond_resched() while holding mmap_sem.
>>> It looks like khugepaged forgot to do a spin_unlock somewhere and we
>>> leaked a preempt_count.
>>
>> Hmm I'd expect such spin lock to be reported together with mmap_sem in
>> the debugging "locks held" message?
> 
> My bisection of the problem is about half done. My latest good version is commit 
> 7b8cd33 and the latest bad one is 2ea659a. Only about 7 steps to go.

Hmm, your bisection will most likely just find commit 338a16ba15495
which added the cond_resched() at mm/khugepaged.c:655. CCing David who
added it.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Sleeping BUG in khugepaged for i586
  2017-06-07  7:19       ` Vlastimil Babka
@ 2017-06-07 20:56         ` David Rientjes
  2017-06-08 14:48           ` Michal Hocko
  2017-06-08 15:29           ` Larry Finger
  0 siblings, 2 replies; 25+ messages in thread
From: David Rientjes @ 2017-06-07 20:56 UTC (permalink / raw)
  To: Vlastimil Babka; +Cc: Larry Finger, Andrew Morton, LKML, linux-mm

On Wed, 7 Jun 2017, Vlastimil Babka wrote:

> >> Hmm I'd expect such spin lock to be reported together with mmap_sem in
> >> the debugging "locks held" message?
> > 
> > My bisection of the problem is about half done. My latest good version is commit 
> > 7b8cd33 and the latest bad one is 2ea659a. Only about 7 steps to go.
> 
> Hmm, your bisection will most likely just find commit 338a16ba15495
> which added the cond_resched() at mm/khugepaged.c:655. CCing David who
> added it.
> 

I agree it's probably going to bisect to 338a16ba15495 since it's the 
cond_resched() at the line number reported, but I think there must be 
something else going on.  I think the list of locks held by khugepaged is 
correct because it matches with the implementation.  The preempt_count(), 
as suggested by Andrew, does not.  If this is reproducible, I'd like to 
know what preempt_count() is.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Sleeping BUG in khugepaged for i586
  2017-06-07 20:56         ` David Rientjes
@ 2017-06-08 14:48           ` Michal Hocko
  2017-06-08 17:05             ` Matthew Wilcox
                               ` (2 more replies)
  2017-06-08 15:29           ` Larry Finger
  1 sibling, 3 replies; 25+ messages in thread
From: Michal Hocko @ 2017-06-08 14:48 UTC (permalink / raw)
  To: David Rientjes
  Cc: Vlastimil Babka, Larry Finger, Andrew Morton, LKML, linux-mm

On Wed 07-06-17 13:56:01, David Rientjes wrote:
> On Wed, 7 Jun 2017, Vlastimil Babka wrote:
> 
> > >> Hmm I'd expect such spin lock to be reported together with mmap_sem in
> > >> the debugging "locks held" message?
> > > 
> > > My bisection of the problem is about half done. My latest good version is commit 
> > > 7b8cd33 and the latest bad one is 2ea659a. Only about 7 steps to go.
> > 
> > Hmm, your bisection will most likely just find commit 338a16ba15495
> > which added the cond_resched() at mm/khugepaged.c:655. CCing David who
> > added it.
> > 
> 
> I agree it's probably going to bisect to 338a16ba15495 since it's the 
> cond_resched() at the line number reported, but I think there must be 
> something else going on.  I think the list of locks held by khugepaged is 
> correct because it matches with the implementation.  The preempt_count(), 
> as suggested by Andrew, does not.  If this is reproducible, I'd like to 
> know what preempt_count() is.

collapse_huge_page
  pte_offset_map
    kmap_atomic
      kmap_atomic_prot
        preempt_disable
  __collapse_huge_page_copy
  pte_unmap
    kunmap_atomic
      __kunmap_atomic
        preempt_enable

I suspect, so cond_resched seems indeed inappropriate on 32b systems.

-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Sleeping BUG in khugepaged for i586
  2017-06-07 20:56         ` David Rientjes
  2017-06-08 14:48           ` Michal Hocko
@ 2017-06-08 15:29           ` Larry Finger
  1 sibling, 0 replies; 25+ messages in thread
From: Larry Finger @ 2017-06-08 15:29 UTC (permalink / raw)
  To: David Rientjes, Vlastimil Babka; +Cc: Andrew Morton, LKML, linux-mm

On 06/07/2017 03:56 PM, David Rientjes wrote:
> On Wed, 7 Jun 2017, Vlastimil Babka wrote:
> 
>>>> Hmm I'd expect such spin lock to be reported together with mmap_sem in
>>>> the debugging "locks held" message?
>>>
>>> My bisection of the problem is about half done. My latest good version is commit
>>> 7b8cd33 and the latest bad one is 2ea659a. Only about 7 steps to go.
>>
>> Hmm, your bisection will most likely just find commit 338a16ba15495
>> which added the cond_resched() at mm/khugepaged.c:655. CCing David who
>> added it.
>>
> 
> I agree it's probably going to bisect to 338a16ba15495 since it's the
> cond_resched() at the line number reported, but I think there must be
> something else going on.  I think the list of locks held by khugepaged is
> correct because it matches with the implementation.  The preempt_count(),
> as suggested by Andrew, does not.  If this is reproducible, I'd like to
> know what preempt_count() is.
> 

The BUG output is reproducible. By the time the box finishes booting, there are 
at least 2 of them logged. My bisection shows that commit 338a16ba15495 is the 
bad one. I added a pr_info() to output the value of preempt_count() just before 
the cond_resched() statement. The count was always 1 whether the BUG was 
triggered or not.

If there are other things you would like logged at that point, or any other 
diagnostics, please let me know.

Larry

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Sleeping BUG in khugepaged for i586
  2017-06-08 14:48           ` Michal Hocko
@ 2017-06-08 17:05             ` Matthew Wilcox
  2017-06-08 20:18               ` Michal Hocko
  2017-06-15  1:12             ` David Rientjes
  2017-06-23 12:08             ` Michal Hocko
  2 siblings, 1 reply; 25+ messages in thread
From: Matthew Wilcox @ 2017-06-08 17:05 UTC (permalink / raw)
  To: Michal Hocko
  Cc: David Rientjes, Vlastimil Babka, Larry Finger, Andrew Morton,
	LKML, linux-mm

On Thu, Jun 08, 2017 at 04:48:31PM +0200, Michal Hocko wrote:
> On Wed 07-06-17 13:56:01, David Rientjes wrote:
> > I agree it's probably going to bisect to 338a16ba15495 since it's the 
> > cond_resched() at the line number reported, but I think there must be 
> > something else going on.  I think the list of locks held by khugepaged is 
> > correct because it matches with the implementation.  The preempt_count(), 
> > as suggested by Andrew, does not.  If this is reproducible, I'd like to 
> > know what preempt_count() is.
> 
> collapse_huge_page
>   pte_offset_map
>     kmap_atomic
>       kmap_atomic_prot
>         preempt_disable
>   __collapse_huge_page_copy
>   pte_unmap
>     kunmap_atomic
>       __kunmap_atomic
>         preempt_enable
> 
> I suspect, so cond_resched seems indeed inappropriate on 32b systems.

Then why doesn't it trigger on 64-bit systems too?

#ifndef ARCH_HAS_KMAP
...
static inline void *kmap_atomic(struct page *page)
{
        preempt_disable();
        pagefault_disable();
        return page_address(page);
}
#define kmap_atomic_prot(page, prot)    kmap_atomic(page)


... oh, wait, I see.  Because pte_offset_map() doesn't call kmap_atomic()
on 64-bit.  Indeed, it doesn't necessarily call kmap_atomic() on 32-bit
either; only with CONFIG_HIGHPTE enabled.  How much of a performance
penalty would it be to call kmap_atomic() unconditionally on 64 bit to
make sure that this kind of problem doesn't show on 32-bit systems only?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Sleeping BUG in khugepaged for i586
  2017-06-08 17:05             ` Matthew Wilcox
@ 2017-06-08 20:18               ` Michal Hocko
  2017-06-08 20:30                 ` Michal Hocko
  0 siblings, 1 reply; 25+ messages in thread
From: Michal Hocko @ 2017-06-08 20:18 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: David Rientjes, Vlastimil Babka, Larry Finger, Andrew Morton,
	LKML, linux-mm

On Thu 08-06-17 10:05:57, Matthew Wilcox wrote:
> On Thu, Jun 08, 2017 at 04:48:31PM +0200, Michal Hocko wrote:
> > On Wed 07-06-17 13:56:01, David Rientjes wrote:
> > > I agree it's probably going to bisect to 338a16ba15495 since it's the 
> > > cond_resched() at the line number reported, but I think there must be 
> > > something else going on.  I think the list of locks held by khugepaged is 
> > > correct because it matches with the implementation.  The preempt_count(), 
> > > as suggested by Andrew, does not.  If this is reproducible, I'd like to 
> > > know what preempt_count() is.
> > 
> > collapse_huge_page
> >   pte_offset_map
> >     kmap_atomic
> >       kmap_atomic_prot
> >         preempt_disable
> >   __collapse_huge_page_copy
> >   pte_unmap
> >     kunmap_atomic
> >       __kunmap_atomic
> >         preempt_enable
> > 
> > I suspect, so cond_resched seems indeed inappropriate on 32b systems.
> 
> Then why doesn't it trigger on 64-bit systems too?
> 
> #ifndef ARCH_HAS_KMAP
> ...
> static inline void *kmap_atomic(struct page *page)
> {
>         preempt_disable();
>         pagefault_disable();
>         return page_address(page);
> }
> #define kmap_atomic_prot(page, prot)    kmap_atomic(page)
> 
> 
> ... oh, wait, I see.  Because pte_offset_map() doesn't call kmap_atomic()
> on 64-bit.  Indeed, it doesn't necessarily call kmap_atomic() on 32-bit
> either; only with CONFIG_HIGHPTE enabled.  How much of a performance
> penalty would it be to call kmap_atomic() unconditionally on 64 bit to
> make sure that this kind of problem doesn't show on 32-bit systems only?

I am not sure I understand why would we map those pages in 64b systems?
We can access them directly.

-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Sleeping BUG in khugepaged for i586
  2017-06-08 20:18               ` Michal Hocko
@ 2017-06-08 20:30                 ` Michal Hocko
  2017-06-09  6:48                   ` Vlastimil Babka
  2017-06-09 22:38                   ` David Rientjes
  0 siblings, 2 replies; 25+ messages in thread
From: Michal Hocko @ 2017-06-08 20:30 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: David Rientjes, Vlastimil Babka, Larry Finger, Andrew Morton,
	LKML, linux-mm

On Thu 08-06-17 22:18:22, Michal Hocko wrote:
> On Thu 08-06-17 10:05:57, Matthew Wilcox wrote:
> > On Thu, Jun 08, 2017 at 04:48:31PM +0200, Michal Hocko wrote:
> > > On Wed 07-06-17 13:56:01, David Rientjes wrote:
> > > > I agree it's probably going to bisect to 338a16ba15495 since it's the 
> > > > cond_resched() at the line number reported, but I think there must be 
> > > > something else going on.  I think the list of locks held by khugepaged is 
> > > > correct because it matches with the implementation.  The preempt_count(), 
> > > > as suggested by Andrew, does not.  If this is reproducible, I'd like to 
> > > > know what preempt_count() is.
> > > 
> > > collapse_huge_page
> > >   pte_offset_map
> > >     kmap_atomic
> > >       kmap_atomic_prot
> > >         preempt_disable
> > >   __collapse_huge_page_copy
> > >   pte_unmap
> > >     kunmap_atomic
> > >       __kunmap_atomic
> > >         preempt_enable
> > > 
> > > I suspect, so cond_resched seems indeed inappropriate on 32b systems.
> > 
> > Then why doesn't it trigger on 64-bit systems too?
> > 
> > #ifndef ARCH_HAS_KMAP
> > ...
> > static inline void *kmap_atomic(struct page *page)
> > {
> >         preempt_disable();
> >         pagefault_disable();
> >         return page_address(page);
> > }
> > #define kmap_atomic_prot(page, prot)    kmap_atomic(page)
> > 
> > 
> > ... oh, wait, I see.  Because pte_offset_map() doesn't call kmap_atomic()
> > on 64-bit.  Indeed, it doesn't necessarily call kmap_atomic() on 32-bit
> > either; only with CONFIG_HIGHPTE enabled.  How much of a performance
> > penalty would it be to call kmap_atomic() unconditionally on 64 bit to
> > make sure that this kind of problem doesn't show on 32-bit systems only?
> 
> I am not sure I understand why would we map those pages in 64b systems?
> We can access them directly.

But I guess you are primary after syncing the preemptive mode for 64 and
32b systems, right? I agree that having a different model is more than
unfortunate because 32b gets much less testing coverage and so a risk of
introducing a new bug is just a matter of time. Maybe we should make
pte_offset_map disable preemption and currently noop pte_unmap to
preempt_enable. The overhead should be pretty marginal on x86_64 but not
all arches have per-cpu preempt count. So I am not sure we really want
to add this to just for the debugging purposes...

I would just pull the cond_resched out of __collapse_huge_page_copy
right after pte_unmap. But I am not really sure why this cond_resched is
really needed because the changelog of the patch which adds is is quite
terse on details.
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Sleeping BUG in khugepaged for i586
  2017-06-08 20:30                 ` Michal Hocko
@ 2017-06-09  6:48                   ` Vlastimil Babka
  2017-06-09  7:43                     ` Michal Hocko
  2017-06-09 14:28                     ` Larry Finger
  2017-06-09 22:38                   ` David Rientjes
  1 sibling, 2 replies; 25+ messages in thread
From: Vlastimil Babka @ 2017-06-09  6:48 UTC (permalink / raw)
  To: Michal Hocko, Matthew Wilcox
  Cc: David Rientjes, Larry Finger, Andrew Morton, LKML, linux-mm

On 06/08/2017 10:30 PM, Michal Hocko wrote:
> But I guess you are primary after syncing the preemptive mode for 64 and
> 32b systems, right? I agree that having a different model is more than
> unfortunate because 32b gets much less testing coverage and so a risk of
> introducing a new bug is just a matter of time. Maybe we should make
> pte_offset_map disable preemption and currently noop pte_unmap to
> preempt_enable. The overhead should be pretty marginal on x86_64 but not
> all arches have per-cpu preempt count. So I am not sure we really want
> to add this to just for the debugging purposes...

I think adding that overhead for everyone would be unfortunate. It would
be acceptable, if it was done only for the config option that enables
the might_sleep() checks (CONFIG_DEBUG_ATOMIC_SLEEP?)

Vlastimil

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Sleeping BUG in khugepaged for i586
  2017-06-09  6:48                   ` Vlastimil Babka
@ 2017-06-09  7:43                     ` Michal Hocko
  2017-06-09 14:28                     ` Larry Finger
  1 sibling, 0 replies; 25+ messages in thread
From: Michal Hocko @ 2017-06-09  7:43 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Matthew Wilcox, David Rientjes, Larry Finger, Andrew Morton,
	LKML, linux-mm

On Fri 09-06-17 08:48:58, Vlastimil Babka wrote:
> On 06/08/2017 10:30 PM, Michal Hocko wrote:
> > But I guess you are primary after syncing the preemptive mode for 64 and
> > 32b systems, right? I agree that having a different model is more than
> > unfortunate because 32b gets much less testing coverage and so a risk of
> > introducing a new bug is just a matter of time. Maybe we should make
> > pte_offset_map disable preemption and currently noop pte_unmap to
> > preempt_enable. The overhead should be pretty marginal on x86_64 but not
> > all arches have per-cpu preempt count. So I am not sure we really want
> > to add this to just for the debugging purposes...
> 
> I think adding that overhead for everyone would be unfortunate. It would
> be acceptable, if it was done only for the config option that enables
> the might_sleep() checks (CONFIG_DEBUG_ATOMIC_SLEEP?)

That is certainly possible. But is it worth it?
arch/alpha/include/asm/pgtable.h:#define pte_offset_map(dir,addr)	pte_offset_kernel((dir),(addr))
arch/arc/include/asm/pgtable.h:#define pte_offset_map(dir, addr)		pte_offset(dir, addr)
arch/arm/include/asm/pgtable.h:#define pte_offset_map(pmd,addr)	(__pte_map(pmd) + pte_index(addr))
arch/arm64/include/asm/pgtable.h:#define pte_offset_map(dir,addr)	pte_offset_kernel((dir), (addr))
arch/arm64/include/asm/pgtable.h:#define pte_offset_map_nested(dir,addr)	pte_offset_kernel((dir), (addr))
arch/cris/include/asm/pgtable.h:#define pte_offset_map(dir, address) \
arch/frv/include/asm/pgtable.h:#define pte_offset_map(dir, address) \
arch/frv/include/asm/pgtable.h:#define pte_offset_map(dir, address) \
arch/hexagon/include/asm/pgtable.h:#define pte_offset_map(dir, address)                                    \
arch/hexagon/include/asm/pgtable.h:#define pte_offset_map_nested(pmd, addr) pte_offset_map(pmd, addr)
arch/ia64/include/asm/pgtable.h:#define pte_offset_map(dir,addr)	pte_offset_kernel(dir, addr)
arch/m32r/include/asm/pgtable.h:#define pte_offset_map(dir, address)	\
arch/m68k/include/asm/mcf_pgtable.h:#define pte_offset_map(pmdp, addr) ((pte_t *)__pmd_page(*pmdp) + \
arch/m68k/include/asm/motorola_pgtable.h:#define pte_offset_map(pmdp,address) ((pte_t *)__pmd_page(*pmdp) + (((address) >> PAGE_SHIFT) & (PTRS_PER_PTE - 1)))
arch/m68k/include/asm/sun3_pgtable.h:#define pte_offset_map(pmd, address) ((pte_t *)page_address(pmd_page(*pmd)) + pte_index(address))
arch/metag/include/asm/pgtable.h:#define pte_offset_map(dir, address)		pte_offset_kernel(dir, address)
arch/metag/include/asm/pgtable.h:#define pte_offset_map_nested(dir, address)	pte_offset_kernel(dir, address)
arch/microblaze/include/asm/pgtable.h:#define pte_offset_map(dir, addr)		\
arch/mips/include/asm/pgtable-32.h:#define pte_offset_map(dir, address)					\
arch/mips/include/asm/pgtable-64.h:#define pte_offset_map(dir, address)					\
arch/mn10300/include/asm/pgtable.h:#define pte_offset_map(dir, address) \
arch/nios2/include/asm/pgtable.h:#define pte_offset_map(dir, addr)			\
arch/openrisc/include/asm/pgtable.h:#define pte_offset_map(dir, address)	        \
arch/openrisc/include/asm/pgtable.h:#define pte_offset_map_nested(dir, address)     \
arch/parisc/include/asm/pgtable.h:#define pte_offset_map(pmd, address) pte_offset_kernel(pmd, address)
arch/powerpc/include/asm/book3s/32/pgtable.h:#define pte_offset_map(dir, addr)		\
arch/powerpc/include/asm/book3s/64/pgtable.h:#define pte_offset_map(dir,addr)	pte_offset_kernel((dir), (addr))
arch/powerpc/include/asm/nohash/32/pgtable.h:#define pte_offset_map(dir, addr)		\
arch/powerpc/include/asm/nohash/64/pgtable.h:#define pte_offset_map(dir,addr)	pte_offset_kernel((dir), (addr))
arch/s390/include/asm/pgtable.h:#define pte_offset_map(pmd, address) pte_offset_kernel(pmd, address)
arch/score/include/asm/pgtable.h:#define pte_offset_map(dir, address)	\
arch/sh/include/asm/pgtable_32.h:#define pte_offset_map(dir, address)		pte_offset_kernel(dir, address)
arch/sh/include/asm/pgtable_64.h:#define pte_offset_map(dir,addr)	pte_offset_kernel(dir, addr)
arch/sparc/include/asm/pgtable_32.h:#define pte_offset_map(d, a)		pte_offset_kernel(d,a)
arch/sparc/include/asm/pgtable_64.h:#define pte_offset_map			pte_index
arch/tile/include/asm/pgtable.h:#define pte_offset_map(dir, address) pte_offset_kernel(dir, address)
arch/um/include/asm/pgtable.h:#define pte_offset_map(dir, address) \
arch/unicore32/include/asm/pgtable.h:#define pte_offset_map(dir, addr)	(pmd_page_vaddr(*(dir)) \
arch/x86/include/asm/pgtable_32.h:#define pte_offset_map(dir, address)					\
arch/x86/include/asm/pgtable_32.h:#define pte_offset_map(dir, address)					\
arch/x86/include/asm/pgtable_64.h:#define pte_offset_map(dir, address) pte_offset_kernel((dir), (address))
arch/xtensa/include/asm/pgtable.h:#define pte_offset_map(dir,addr)	pte_offset_kernel((dir),(addr))
include/linux/mm.h:#define pte_offset_map_lock(mm, pmd, address, ptlp)	\
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Sleeping BUG in khugepaged for i586
  2017-06-09  6:48                   ` Vlastimil Babka
  2017-06-09  7:43                     ` Michal Hocko
@ 2017-06-09 14:28                     ` Larry Finger
  1 sibling, 0 replies; 25+ messages in thread
From: Larry Finger @ 2017-06-09 14:28 UTC (permalink / raw)
  To: Vlastimil Babka, Michal Hocko, Matthew Wilcox
  Cc: David Rientjes, Andrew Morton, LKML, linux-mm

On 06/09/2017 01:48 AM, Vlastimil Babka wrote:
> On 06/08/2017 10:30 PM, Michal Hocko wrote:
>> But I guess you are primary after syncing the preemptive mode for 64 and
>> 32b systems, right? I agree that having a different model is more than
>> unfortunate because 32b gets much less testing coverage and so a risk of
>> introducing a new bug is just a matter of time. Maybe we should make
>> pte_offset_map disable preemption and currently noop pte_unmap to
>> preempt_enable. The overhead should be pretty marginal on x86_64 but not
>> all arches have per-cpu preempt count. So I am not sure we really want
>> to add this to just for the debugging purposes...
> 
> I think adding that overhead for everyone would be unfortunate. It would
> be acceptable, if it was done only for the config option that enables
> the might_sleep() checks (CONFIG_DEBUG_ATOMIC_SLEEP?)

As a "heads up", I will not be available for any testing from June 10 through 
June 17.

Larry


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Sleeping BUG in khugepaged for i586
  2017-06-08 20:30                 ` Michal Hocko
  2017-06-09  6:48                   ` Vlastimil Babka
@ 2017-06-09 22:38                   ` David Rientjes
  2017-06-10  8:09                     ` Michal Hocko
  1 sibling, 1 reply; 25+ messages in thread
From: David Rientjes @ 2017-06-09 22:38 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Matthew Wilcox, Vlastimil Babka, Larry Finger, Andrew Morton,
	LKML, linux-mm

On Thu, 8 Jun 2017, Michal Hocko wrote:

> I would just pull the cond_resched out of __collapse_huge_page_copy
> right after pte_unmap. But I am not really sure why this cond_resched is
> really needed because the changelog of the patch which adds is is quite
> terse on details.

I'm not sure what could possibly be added to the changelog.  We have 
encountered need_resched warnings during the iteration.  We fix these 
because need_resched warnings suppress future warnings of the same type 
for issues that are more important.

I can fix the i386 issue but removing the cond_resched() entirely isn't 
really suitable.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Sleeping BUG in khugepaged for i586
  2017-06-09 22:38                   ` David Rientjes
@ 2017-06-10  8:09                     ` Michal Hocko
  2017-06-11 23:28                       ` David Rientjes
  0 siblings, 1 reply; 25+ messages in thread
From: Michal Hocko @ 2017-06-10  8:09 UTC (permalink / raw)
  To: David Rientjes
  Cc: Matthew Wilcox, Vlastimil Babka, Larry Finger, Andrew Morton,
	LKML, linux-mm

On Fri 09-06-17 15:38:44, David Rientjes wrote:
> On Thu, 8 Jun 2017, Michal Hocko wrote:
> 
> > I would just pull the cond_resched out of __collapse_huge_page_copy
> > right after pte_unmap. But I am not really sure why this cond_resched is
> > really needed because the changelog of the patch which adds is is quite
> > terse on details.
> 
> I'm not sure what could possibly be added to the changelog.  We have 
> encountered need_resched warnings during the iteration.

Well, the part the changelog is not really clear about is whether the
HPAGE_PMD_NR loops itself is the source of the stall. This would be
quite surprising because doing 512 iterations taking up to 20+s sounds
way to much. So is it possible that we are missing a cond_resched
somewhere up the __collapse_huge_page_copy call path? Or do we really do
something stupidly expensive here?

> We fix these 
> because need_resched warnings suppress future warnings of the same type 
> for issues that are more important.

Sure thing. I do care about soft lockups as well.

> I can fix the i386 issue but removing the cond_resched() entirely isn't 
> really suitable.

I am not calling for a complete removal. I just do not yet see what is
the source of the long processing of the the loop.
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Sleeping BUG in khugepaged for i586
  2017-06-10  8:09                     ` Michal Hocko
@ 2017-06-11 23:28                       ` David Rientjes
  2017-06-12  6:29                         ` Michal Hocko
  0 siblings, 1 reply; 25+ messages in thread
From: David Rientjes @ 2017-06-11 23:28 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Matthew Wilcox, Vlastimil Babka, Larry Finger, Andrew Morton,
	LKML, linux-mm

On Sat, 10 Jun 2017, Michal Hocko wrote:

> > > I would just pull the cond_resched out of __collapse_huge_page_copy
> > > right after pte_unmap. But I am not really sure why this cond_resched is
> > > really needed because the changelog of the patch which adds is is quite
> > > terse on details.
> > 
> > I'm not sure what could possibly be added to the changelog.  We have 
> > encountered need_resched warnings during the iteration.
> 
> Well, the part the changelog is not really clear about is whether the
> HPAGE_PMD_NR loops itself is the source of the stall. This would be
> quite surprising because doing 512 iterations taking up to 20+s sounds
> way to much.

I have no idea where you come up with 20+ seconds.

These are not soft lockups, these are need_resched warnings.  We monitor 
how long need_resched has been set and when a thread takes an excessive 
amount of time to reschedule after it has been set.  A loop of 512 pages 
with ptl contention and doing {clear,copy}_user_highpage() shows that 
need_resched can sit without scheduling for an excessive amount of time.

> So is it possible that we are missing a cond_resched
> somewhere up the __collapse_huge_page_copy call path?

No.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Sleeping BUG in khugepaged for i586
  2017-06-11 23:28                       ` David Rientjes
@ 2017-06-12  6:29                         ` Michal Hocko
  2017-06-15  0:28                           ` David Rientjes
  0 siblings, 1 reply; 25+ messages in thread
From: Michal Hocko @ 2017-06-12  6:29 UTC (permalink / raw)
  To: David Rientjes
  Cc: Matthew Wilcox, Vlastimil Babka, Larry Finger, Andrew Morton,
	LKML, linux-mm

On Sun 11-06-17 16:28:11, David Rientjes wrote:
> On Sat, 10 Jun 2017, Michal Hocko wrote:
> 
> > > > I would just pull the cond_resched out of __collapse_huge_page_copy
> > > > right after pte_unmap. But I am not really sure why this cond_resched is
> > > > really needed because the changelog of the patch which adds is is quite
> > > > terse on details.
> > > 
> > > I'm not sure what could possibly be added to the changelog.  We have 
> > > encountered need_resched warnings during the iteration.
> > 
> > Well, the part the changelog is not really clear about is whether the
> > HPAGE_PMD_NR loops itself is the source of the stall. This would be
> > quite surprising because doing 512 iterations taking up to 20+s sounds
> > way to much.
> 
> I have no idea where you come up with 20+ seconds.

OK, I misread your report as a soft lockup.

> These are not soft lockups, these are need_resched warnings.  We monitor 
> how long need_resched has been set and when a thread takes an excessive 
> amount of time to reschedule after it has been set.  A loop of 512 pages 
> with ptl contention and doing {clear,copy}_user_highpage() shows that 
> need_resched can sit without scheduling for an excessive amount of time.

How much is excessive here?
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Sleeping BUG in khugepaged for i586
  2017-06-12  6:29                         ` Michal Hocko
@ 2017-06-15  0:28                           ` David Rientjes
  0 siblings, 0 replies; 25+ messages in thread
From: David Rientjes @ 2017-06-15  0:28 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Matthew Wilcox, Vlastimil Babka, Larry Finger, Andrew Morton,
	LKML, linux-mm

On Mon, 12 Jun 2017, Michal Hocko wrote:

> > These are not soft lockups, these are need_resched warnings.  We monitor 
> > how long need_resched has been set and when a thread takes an excessive 
> > amount of time to reschedule after it has been set.  A loop of 512 pages 
> > with ptl contention and doing {clear,copy}_user_highpage() shows that 
> > need_resched can sit without scheduling for an excessive amount of time.
> 
> How much is excessive here?

We monitor anything that holds the cpu for more than 1/20th of a second, 
but this specific occurrence has been observed for ~1/8th.  The majority 
of mm/ is quite good in this regard.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Sleeping BUG in khugepaged for i586
  2017-06-08 14:48           ` Michal Hocko
  2017-06-08 17:05             ` Matthew Wilcox
@ 2017-06-15  1:12             ` David Rientjes
  2017-06-15  8:32               ` Michal Hocko
  2017-06-23 12:08             ` Michal Hocko
  2 siblings, 1 reply; 25+ messages in thread
From: David Rientjes @ 2017-06-15  1:12 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Vlastimil Babka, Larry Finger, Andrew Morton, LKML, linux-mm

On Thu, 8 Jun 2017, Michal Hocko wrote:

> collapse_huge_page
>   pte_offset_map
>     kmap_atomic
>       kmap_atomic_prot
>         preempt_disable
>   __collapse_huge_page_copy
>   pte_unmap
>     kunmap_atomic
>       __kunmap_atomic
>         preempt_enable
> 
> I suspect, so cond_resched seems indeed inappropriate on 32b systems.
> 

Seems to be an issue for i386 and arm with ARM_LPAE.  I'm slightly 
surprised we can get away with __collapse_huge_page_swapin() for 
VM_FAULT_RETRY, unless that hasn't been encountered yet.  I think the 
cond_resched() in __collapse_huge_page_copy() could be done only for 
!in_atomic() if we choose.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Sleeping BUG in khugepaged for i586
  2017-06-15  1:12             ` David Rientjes
@ 2017-06-15  8:32               ` Michal Hocko
  0 siblings, 0 replies; 25+ messages in thread
From: Michal Hocko @ 2017-06-15  8:32 UTC (permalink / raw)
  To: David Rientjes
  Cc: Vlastimil Babka, Larry Finger, Andrew Morton, LKML, linux-mm

On Wed 14-06-17 18:12:06, David Rientjes wrote:
> On Thu, 8 Jun 2017, Michal Hocko wrote:
> 
> > collapse_huge_page
> >   pte_offset_map
> >     kmap_atomic
> >       kmap_atomic_prot
> >         preempt_disable
> >   __collapse_huge_page_copy
> >   pte_unmap
> >     kunmap_atomic
> >       __kunmap_atomic
> >         preempt_enable
> > 
> > I suspect, so cond_resched seems indeed inappropriate on 32b systems.
> > 
> 
> Seems to be an issue for i386 and arm with ARM_LPAE.  I'm slightly 
> surprised we can get away with __collapse_huge_page_swapin() for 
> VM_FAULT_RETRY, unless that hasn't been encountered yet.

I do not see what you mean here or how is it related.
__collapse_huge_page_swapin is called outside of
pte_offset_map/pte_unmap section

> I think the cond_resched() in __collapse_huge_page_copy() could be
> done only for !in_atomic() if we choose.

in_atomic() depends on having PREEMPT_COUNT enabled to work properly AFAIR.
I haven't double checked and something might have changed since I've
looked the last time.

-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Sleeping BUG in khugepaged for i586
  2017-06-08 14:48           ` Michal Hocko
  2017-06-08 17:05             ` Matthew Wilcox
  2017-06-15  1:12             ` David Rientjes
@ 2017-06-23 12:08             ` Michal Hocko
  2017-06-23 13:13               ` Vlastimil Babka
  2 siblings, 1 reply; 25+ messages in thread
From: Michal Hocko @ 2017-06-23 12:08 UTC (permalink / raw)
  To: David Rientjes, Andrew Morton
  Cc: Vlastimil Babka, Larry Finger, LKML, linux-mm

On Thu 08-06-17 16:48:31, Michal Hocko wrote:
> On Wed 07-06-17 13:56:01, David Rientjes wrote:
> > On Wed, 7 Jun 2017, Vlastimil Babka wrote:
> > 
> > > >> Hmm I'd expect such spin lock to be reported together with mmap_sem in
> > > >> the debugging "locks held" message?
> > > > 
> > > > My bisection of the problem is about half done. My latest good version is commit 
> > > > 7b8cd33 and the latest bad one is 2ea659a. Only about 7 steps to go.
> > > 
> > > Hmm, your bisection will most likely just find commit 338a16ba15495
> > > which added the cond_resched() at mm/khugepaged.c:655. CCing David who
> > > added it.
> > > 
> > 
> > I agree it's probably going to bisect to 338a16ba15495 since it's the 
> > cond_resched() at the line number reported, but I think there must be 
> > something else going on.  I think the list of locks held by khugepaged is 
> > correct because it matches with the implementation.  The preempt_count(), 
> > as suggested by Andrew, does not.  If this is reproducible, I'd like to 
> > know what preempt_count() is.
> 
> collapse_huge_page
>   pte_offset_map
>     kmap_atomic
>       kmap_atomic_prot
>         preempt_disable
>   __collapse_huge_page_copy
>   pte_unmap
>     kunmap_atomic
>       __kunmap_atomic
>         preempt_enable
> 
> I suspect, so cond_resched seems indeed inappropriate on 32b systems.

The code still seems to be in the mmotm tree. Are there any plans to fix
this or drop the patch?
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Sleeping BUG in khugepaged for i586
  2017-06-23 12:08             ` Michal Hocko
@ 2017-06-23 13:13               ` Vlastimil Babka
  2017-06-23 13:25                 ` Michal Hocko
  0 siblings, 1 reply; 25+ messages in thread
From: Vlastimil Babka @ 2017-06-23 13:13 UTC (permalink / raw)
  To: Michal Hocko, David Rientjes, Andrew Morton; +Cc: Larry Finger, LKML, linux-mm

On 06/23/2017 02:08 PM, Michal Hocko wrote:
> On Thu 08-06-17 16:48:31, Michal Hocko wrote:
>> On Wed 07-06-17 13:56:01, David Rientjes wrote:
>>
>> I suspect, so cond_resched seems indeed inappropriate on 32b systems.
> 
> The code still seems to be in the mmotm tree.

Even mainline at this point - 338a16ba1549

> Are there any plans to fix
> this or drop the patch?

https://lkml.kernel.org/r/alpine.DEB.2.10.1706191341550.97821@chino.kir.corp.google.com

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Sleeping BUG in khugepaged for i586
  2017-06-23 13:13               ` Vlastimil Babka
@ 2017-06-23 13:25                 ` Michal Hocko
  2017-06-23 15:28                   ` Larry Finger
  0 siblings, 1 reply; 25+ messages in thread
From: Michal Hocko @ 2017-06-23 13:25 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: David Rientjes, Andrew Morton, Larry Finger, LKML, linux-mm

On Fri 23-06-17 15:13:45, Vlastimil Babka wrote:
> On 06/23/2017 02:08 PM, Michal Hocko wrote:
> > On Thu 08-06-17 16:48:31, Michal Hocko wrote:
> >> On Wed 07-06-17 13:56:01, David Rientjes wrote:
> >>
> >> I suspect, so cond_resched seems indeed inappropriate on 32b systems.
> > 
> > The code still seems to be in the mmotm tree.
> 
> Even mainline at this point - 338a16ba1549
> 
> > Are there any plans to fix
> > this or drop the patch?
> 
> https://lkml.kernel.org/r/alpine.DEB.2.10.1706191341550.97821@chino.kir.corp.google.com

Ahh, I have missed that. Thanks!

-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Sleeping BUG in khugepaged for i586
  2017-06-23 13:25                 ` Michal Hocko
@ 2017-06-23 15:28                   ` Larry Finger
  0 siblings, 0 replies; 25+ messages in thread
From: Larry Finger @ 2017-06-23 15:28 UTC (permalink / raw)
  To: Michal Hocko, Vlastimil Babka
  Cc: David Rientjes, Andrew Morton, LKML, linux-mm

On 06/23/2017 08:25 AM, Michal Hocko wrote:
> On Fri 23-06-17 15:13:45, Vlastimil Babka wrote:
>> On 06/23/2017 02:08 PM, Michal Hocko wrote:
>>> On Thu 08-06-17 16:48:31, Michal Hocko wrote:
>>>> On Wed 07-06-17 13:56:01, David Rientjes wrote:
>>>>
>>>> I suspect, so cond_resched seems indeed inappropriate on 32b systems.
>>>
>>> The code still seems to be in the mmotm tree.
>>
>> Even mainline at this point - 338a16ba1549
>>
>>> Are there any plans to fix
>>> this or drop the patch?
>>
>> https://lkml.kernel.org/r/alpine.DEB.2.10.1706191341550.97821@chino.kir.corp.google.com
> 
> Ahh, I have missed that. Thanks!

I also missed that patch. Applying it to my box fixes the scheduling while 
atomic splats and no downside has been detected.

You may add "Reported-and-tested-by: Larry Finger <Larry.Finger@lwfinger.net>".

Thanks for everyone's efforts in fixing this problem.

Larry


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2017-06-23 15:28 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-06-03 19:24 Sleeping BUG in khugepaged for i586 Larry Finger
2017-06-05 21:44 ` Andrew Morton
2017-06-06 14:02   ` Vlastimil Babka
2017-06-06 15:01     ` Larry Finger
2017-06-07  7:19       ` Vlastimil Babka
2017-06-07 20:56         ` David Rientjes
2017-06-08 14:48           ` Michal Hocko
2017-06-08 17:05             ` Matthew Wilcox
2017-06-08 20:18               ` Michal Hocko
2017-06-08 20:30                 ` Michal Hocko
2017-06-09  6:48                   ` Vlastimil Babka
2017-06-09  7:43                     ` Michal Hocko
2017-06-09 14:28                     ` Larry Finger
2017-06-09 22:38                   ` David Rientjes
2017-06-10  8:09                     ` Michal Hocko
2017-06-11 23:28                       ` David Rientjes
2017-06-12  6:29                         ` Michal Hocko
2017-06-15  0:28                           ` David Rientjes
2017-06-15  1:12             ` David Rientjes
2017-06-15  8:32               ` Michal Hocko
2017-06-23 12:08             ` Michal Hocko
2017-06-23 13:13               ` Vlastimil Babka
2017-06-23 13:25                 ` Michal Hocko
2017-06-23 15:28                   ` Larry Finger
2017-06-08 15:29           ` Larry Finger

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).