* kmemleak: Unable to handle kernel paging request
From: Denis Kirjanov @ 2014-06-11 12:13 UTC
  To: linux-kernel; +Cc: linux-mm

Hi,

I got a trace while running 3.15.0-08556-gdfb9454:

[  104.534026] Unable to handle kernel paging request for data at address 0xc00000007f000000
[  104.534197] Faulting instruction address: 0xc00000000019cb50
[  104.534204] Oops: Kernel access of bad area, sig: 11 [#1]
[  104.534891] PREEMPT SMP NR_CPUS=4 NUMA PowerMac
[  104.535550] Modules linked in: ipv6 bonding snd_aoa_codec_onyx snd_aoa snd soundcore uninorth_agp unix
[  104.536940] CPU: 1 PID: 1241 Comm: kmemleak Not tainted 3.15.0-08556-gdfb9454 #49
[  104.537881] task: c0000001652cc000 ti: c000000165570000 task.ti: c000000165570000
[  104.538819] NIP: c00000000019cb50 LR: c00000000019cb48 CTR: c0000000000d3e60
[  104.539706] REGS: c000000165573740 TRAP: 0300   Not tainted (3.15.0-08556-gdfb9454)
[  104.540676] MSR: 9000000000009032 <SF,HV,EE,ME,IR,DR,RI>  CR: 48502542  XER: 00000000
[  104.541887] DAR: c00000007f000000 DSISR: 40010000 SOFTE: 0
GPR00: c00000000019cb48 c0000001655739c0 c0000000012f1858 0000000000000000
GPR04: c00000007f001000 c00000017a024450 0000000000000000 0000000000000000
GPR08: 0000000000000573 c000000165573d98 0000000000000001 0000000000000000
GPR12: 0000000048500544 c00000000ffff400 c0000000000aa5e0 c000000166d43680
GPR16: 0000000000000000 0000000000000000 0000000000000000 c000000002081858
GPR20: 0000000000000100 c0000000020a1858 c0000000011b8ee0 c000000080000000
GPR24: c0000000011c0a28 c0000000011c0b80 c000000000a0ab88 c0000000011c0a28
GPR28: c00000017a024450 c00000007f000ff9 c00000007f000000 c00000017a024450
[  104.550169] NIP [c00000000019cb50] .scan_block+0x70/0x170
[  104.550852] LR [c00000000019cb48] .scan_block+0x68/0x170
[  104.554164] Call Trace:
[  104.557023] [c0000001655739c0] [c00000000019cb48] .scan_block+0x68/0x170 (unreliable)
[  104.560578] [c000000165573a70] [c00000000019ce68] .scan_gray_list+0x218/0x270
[  104.564030] [c000000165573b30] [c00000000019d240] .kmemleak_scan+0x380/0x740
[  104.567491] [c000000165573c20] [c00000000019d67c] .kmemleak_scan_thread+0x7c/0x120
[  104.571017] [c000000165573cb0] [c0000000000aa6e4] .kthread+0x104/0x130
[  104.574365] [c000000165573e30] [c00000000000a428] .ret_from_kernel_thread+0x58/0xb0
[  104.577772] Instruction dump:
[  104.580634] 2e260000 409c00b8 3f62ffed 3f42ff72 3b7bf1d0 7cbc2b78 3b5a9330 3b3b0158
[  104.584274] 409200f0 4bffff2d 2fa30000 409e0090 <e87e0000> 38800001 4bfff279 7c7f1b79
[  104.587973] ---[ end trace 7905ecd9245ab244 ]---

[  104.593854] note: kmemleak[1241] exited with preempt_count 1
[  104.597142] BUG: sleeping function called from invalid context at kernel/nsproxy.c:205
[  104.600734] in_atomic(): 1, irqs_disabled(): 1, pid: 1241, name: kmemleak
[  104.604230] INFO: lockdep is turned off.
[  104.607417] irq event stamp: 7910916
[  104.610567] hardirqs last  enabled at (7910915): [<c0000000007cfe24>] ._raw_spin_unlock_irqrestore+0x94/0xc0
[  104.614561] hardirqs last disabled at (7910916): [<c0000000007cef64>] ._raw_spin_lock_irqsave+0x34/0xd0
[  104.618507] softirqs last  enabled at (7910912): [<c000000000085a60>] .__do_softirq+0x300/0x3e0
[  104.622411] softirqs last disabled at (7910893): [<c0000000000860a4>] .irq_exit+0x144/0x160
[  104.626295] Preemption disabled at:[<          (null)>]           (null)

[  104.633121] CPU: 1 PID: 1241 Comm: kmemleak Tainted: G      D 3.15.0-08556-gdfb9454 #49
[  104.637132] Call Trace:
[  104.640453] [c000000165573230] [c000000000016728] .show_stack+0x78/0x1e0 (unreliable)
[  104.644535] [c000000165573300] [c0000000007d6590] .dump_stack+0x9c/0x108
[  104.648492] [c000000165573390] [c0000000000b6870] .__might_sleep+0x160/0x1d0
[  104.652511] [c000000165573410] [c0000000000afc34] .switch_task_namespaces+0x34/0xc0
[  104.656597] [c0000001655734a0] [c000000000082cc4] .do_exit+0x334/0xa80
[  104.660586] [c000000165573590] [c00000000001e424] .die+0x2f4/0x460
[  104.664570] [c000000165573650] [c00000000003e1e4] .bad_page_fault+0xd4/0x130
[  104.668680] [c0000001655736d0] [c000000000009584] handle_page_fault+0x2c/0x30
[  104.672793] --- Exception: 300 at .scan_block+0x70/0x170
    LR = .scan_block+0x68/0x170
[  104.680471] [c000000165573a70] [c00000000019ce68] .scan_gray_list+0x218/0x270
[  104.684682] [c000000165573b30] [c00000000019d240] .kmemleak_scan+0x380/0x740
[  104.688924] [c000000165573c20] [c00000000019d67c] .kmemleak_scan_thread+0x7c/0x120
[  104.693260] [c000000165573cb0] [c0000000000aa6e4] .kthread+0x104/0x130
[  104.697503] [c000000165573e30] [c00000000000a428] .ret_from_kernel_thread+0x58/0xb0

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

* Re: kmemleak: Unable to handle kernel paging request
From: Catalin Marinas @ 2014-06-11 17:38 UTC
  To: Denis Kirjanov; +Cc: linux-kernel, linux-mm

On Wed, Jun 11, 2014 at 04:13:07PM +0400, Denis Kirjanov wrote:
> I got a trace while running 3.15.0-08556-gdfb9454:
> 
> [  104.534026] Unable to handle kernel paging request for data at
> address 0xc00000007f000000

Were there any kmemleak messages prior to this, like "kmemleak
disabled"? There could be a race when kmemleak is disabled because of
some fatal (for kmemleak) error while the scanning is taking place
(which needs some more thinking to fix properly).

Thanks.

-- 
Catalin

* Re: kmemleak: Unable to handle kernel paging request
From: Denis Kirjanov @ 2014-06-11 20:04 UTC
  To: Catalin Marinas; +Cc: linux-kernel, linux-mm

On 6/11/14, Catalin Marinas <catalin.marinas@arm.com> wrote:
> On Wed, Jun 11, 2014 at 04:13:07PM +0400, Denis Kirjanov wrote:
>> I got a trace while running 3.15.0-08556-gdfb9454:
>>
>> [  104.534026] Unable to handle kernel paging request for data at
>> address 0xc00000007f000000
>
> Were there any kmemleak messages prior to this, like "kmemleak
> disabled"? There could be a race when kmemleak is disabled because of
> some fatal (for kmemleak) error while the scanning is taking place
> (which needs some more thinking to fix properly).

No. I checked for the similar problem and didn't find anything relevant.
I'll try to bisect it.

> Thanks.
>
> --
> Catalin
>

* Re: kmemleak: Unable to handle kernel paging request
From: Catalin Marinas @ 2014-06-11 22:00 UTC
  To: Denis Kirjanov; +Cc: linux-kernel, linux-mm

On 11 Jun 2014, at 21:04, Denis Kirjanov <kda@linux-powerpc.org> wrote:
> On 6/11/14, Catalin Marinas <catalin.marinas@arm.com> wrote:
>> On Wed, Jun 11, 2014 at 04:13:07PM +0400, Denis Kirjanov wrote:
>>> I got a trace while running 3.15.0-08556-gdfb9454:
>>> 
>>> [  104.534026] Unable to handle kernel paging request for data at
>>> address 0xc00000007f000000
>> 
>> Were there any kmemleak messages prior to this, like "kmemleak
>> disabled"? There could be a race when kmemleak is disabled because of
>> some fatal (for kmemleak) error while the scanning is taking place
>> (which needs some more thinking to fix properly).
> 
> No. I checked for the similar problem and didn't find anything relevant.
> I'll try to bisect it.

Does this happen soon after boot? I guess it’s the first scan
(scheduled at around 1min after boot). Something seems to be telling
kmemleak that there is a valid memory block at 0xc00000007f000000.

Catalin
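
For readers unfamiliar with the scanner: the trace shows the fault inside
scan_block(), with DAR equal to GPR30 (0xc00000007f000000), which is
consistent with the scanner loading the first word of a block it believed
was valid. A minimal userspace sketch of that kind of conservative,
word-by-word scan follows. It is not the kernel's code; struct
tracked_object and scan_block_sketch() are hypothetical names used only
for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one allocation tracked by a leak detector. */
struct tracked_object {
	uintptr_t start;   /* first address of the allocation */
	size_t size;       /* length in bytes */
	int referenced;    /* set once a pointer into it has been seen */
};

/*
 * Walk a block one pointer-sized word at a time and mark every tracked
 * object that some word points into. Returns the number of matching words.
 * The load of words[i] is the step that would fault if the block were not
 * actually mapped.
 */
size_t scan_block_sketch(const uintptr_t *words, size_t nwords,
			 struct tracked_object *objs, size_t nobjs)
{
	size_t hits = 0;

	assert(words != NULL || nwords == 0);

	for (size_t i = 0; i < nwords; i++) {
		uintptr_t value = words[i];

		for (size_t j = 0; j < nobjs; j++) {
			/* Range test: start <= value < start + size. */
			if (value >= objs[j].start &&
			    value - objs[j].start < objs[j].size) {
				objs[j].referenced = 1;
				hits++;
				break;
			}
		}
	}
	return hits;
}
```

The scanner has no way to validate the block itself; it trusts whoever
registered the range, which is why a stale or bogus registration shows up
as a paging fault rather than as a kmemleak warning.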

* Re: kmemleak: Unable to handle kernel paging request
From: Denis Kirjanov @ 2014-06-12  7:39 UTC
  To: Catalin Marinas; +Cc: linux-kernel, linux-mm

On 6/12/14, Catalin Marinas <catalin.marinas@arm.com> wrote:
> On 11 Jun 2014, at 21:04, Denis Kirjanov <kda@linux-powerpc.org> wrote:
>> On 6/11/14, Catalin Marinas <catalin.marinas@arm.com> wrote:
>>> On Wed, Jun 11, 2014 at 04:13:07PM +0400, Denis Kirjanov wrote:
>>>> I got a trace while running 3.15.0-08556-gdfb9454:
>>>>
>>>> [  104.534026] Unable to handle kernel paging request for data at
>>>> address 0xc00000007f000000
>>>
>>> Were there any kmemleak messages prior to this, like "kmemleak
>>> disabled"? There could be a race when kmemleak is disabled because of
>>> some fatal (for kmemleak) error while the scanning is taking place
>>> (which needs some more thinking to fix properly).
>>
>> No. I checked for the similar problem and didn't find anything relevant.
>> I'll try to bisect it.
>
> Does this happen soon after boot? I guess it’s the first scan
> (scheduled at around 1min after boot). Something seems to be telling
> kmemleak that there is a valid memory block at 0xc00000007f000000.

Yeah, it happens after a while with a booted system so that's the
first kmemleak scan.

> Catalin

* Re: kmemleak: Unable to handle kernel paging request
From: Denis Kirjanov @ 2014-06-12 12:00 UTC
  To: Catalin Marinas; +Cc: linux-kernel, linux-mm, Naoya Horiguchi

On 6/12/14, Denis Kirjanov <kda@linux-powerpc.org> wrote:
> On 6/12/14, Catalin Marinas <catalin.marinas@arm.com> wrote:
>> On 11 Jun 2014, at 21:04, Denis Kirjanov <kda@linux-powerpc.org> wrote:
>>> On 6/11/14, Catalin Marinas <catalin.marinas@arm.com> wrote:
>>>> On Wed, Jun 11, 2014 at 04:13:07PM +0400, Denis Kirjanov wrote:
>>>>> I got a trace while running 3.15.0-08556-gdfb9454:
>>>>>
>>>>> [  104.534026] Unable to handle kernel paging request for data at
>>>>> address 0xc00000007f000000
>>>>
>>>> Were there any kmemleak messages prior to this, like "kmemleak
>>>> disabled"? There could be a race when kmemleak is disabled because of
>>>> some fatal (for kmemleak) error while the scanning is taking place
>>>> (which needs some more thinking to fix properly).
>>>
>>> No. I checked for the similar problem and didn't find anything relevant.
>>> I'll try to bisect it.
>>
>> Does this happen soon after boot? I guess it’s the first scan
>> (scheduled at around 1min after boot). Something seems to be telling
>> kmemleak that there is a valid memory block at 0xc00000007f000000.
>
> Yeah, it happens after a while with a booted system so that's the
> first kmemleak scan.
>
>> Catalin
>

I've bisected to this commit: d4c54919ed86302094c0ca7d48a8cbd4ee753e92
"mm: add !pte_present() check on existing hugetlb_entry callbacks".
Reverting the commit fixes the issue

* Re: kmemleak: Unable to handle kernel paging request
From: Naoya Horiguchi @ 2014-06-12 13:29 UTC
  To: Denis Kirjanov; +Cc: Catalin Marinas, linux-kernel, linux-mm, Andrew Morton

Hi Denis,

On Thu, Jun 12, 2014 at 04:00:57PM +0400, Denis Kirjanov wrote:
> On 6/12/14, Denis Kirjanov <kda@linux-powerpc.org> wrote:
> > On 6/12/14, Catalin Marinas <catalin.marinas@arm.com> wrote:
> >> On 11 Jun 2014, at 21:04, Denis Kirjanov <kda@linux-powerpc.org> wrote:
> >>> On 6/11/14, Catalin Marinas <catalin.marinas@arm.com> wrote:
> >>>> On Wed, Jun 11, 2014 at 04:13:07PM +0400, Denis Kirjanov wrote:
> >>>>> I got a trace while running 3.15.0-08556-gdfb9454:
> >>>>>
> >>>>> [  104.534026] Unable to handle kernel paging request for data at
> >>>>> address 0xc00000007f000000
> >>>>
> >>>> Were there any kmemleak messages prior to this, like "kmemleak
> >>>> disabled"? There could be a race when kmemleak is disabled because of
> >>>> some fatal (for kmemleak) error while the scanning is taking place
> >>>> (which needs some more thinking to fix properly).
> >>>
> >>> No. I checked for the similar problem and didn't find anything relevant.
> >>> I'll try to bisect it.
> >>
> >> Does this happen soon after boot? I guess it’s the first scan
> >> (scheduled at around 1min after boot). Something seems to be telling
> >> kmemleak that there is a valid memory block at 0xc00000007f000000.
> >
> > Yeah, it happens after a while with a booted system so that's the
> > first kmemleak scan.
> >
> >> Catalin
> >
> 
> I've bisected to this commit: d4c54919ed86302094c0ca7d48a8cbd4ee753e92
> "mm: add !pte_present() check on existing hugetlb_entry callbacks".
> Reverting the commit fixes the issue

Thanks for the effort of bisecting.
I guess this bug happens because the pte_none() check was removed by this
commit, so could you check whether the following patch fixes the problem?

I don't know much about kmemleak's internals, so I'm not sure how this bug
affected kmemleak. I'd appreciate it if you could add a comment about that
to the patch description.

Thanks,
Naoya Horiguchi
---
Date: Thu, 12 Jun 2014 08:56:27 -0400
Subject: [PATCH] mm: revoke pte_none() check for hugetlb_entry() callbacks

commit: d4c54919ed86302094c0ca7d48a8cbd4ee753e92 ("mm: add !pte_present()
check on existing hugetlb_entry callbacks") removed pte_none() check in
a ->hugetlb_entry() handler, which unexpectedly broke other features like
kmemleak.

pte_none() check should be done in common page walk code, because we do
so for normal pages and page walk might want to handle holes with
->pte_hole() callback.

Reported-by: Denis Kirjanov <kda@linux-powerpc.org>
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
---
 mm/pagewalk.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/mm/pagewalk.c b/mm/pagewalk.c
index 2beeabf502c5..0618657285c4 100644
--- a/mm/pagewalk.c
+++ b/mm/pagewalk.c
@@ -118,6 +118,13 @@ static int walk_hugetlb_range(struct vm_area_struct *vma,
 	do {
 		next = hugetlb_entry_end(h, addr, end);
 		pte = huge_pte_offset(walk->mm, addr & hmask);
+		if (huge_pte_none(*pte)) {
+			if (walk->pte_hole)
+				err = walk->pte_hole(addr, next, walk);
+			if (err)
+				break;
+			continue;
+		}
 		if (pte && walk->hugetlb_entry)
 			err = walk->hugetlb_entry(pte, hmask, addr, next, walk);
 		if (err)
-- 
1.9.3
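
One observation on the hunk above: huge_pte_none(*pte) is evaluated before
the existing "pte &&" test, so a NULL return from huge_pte_offset() would
be dereferenced. A minimal userspace sketch of the intended control flow,
with the NULL test first, might look like the following. Everything here
is a simplified model: walk_sketch, walk_counts, count_entry(),
count_hole() and walk_range_sketch() are hypothetical names, and treating
a missing entry as a hole is an assumption rather than something the
patch states.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callbacks, loosely modeled on the page-walk interface. */
struct walk_sketch {
	int (*entry)(unsigned long pte, unsigned long addr,
		     unsigned long next, void *priv);
	int (*hole)(unsigned long addr, unsigned long next, void *priv);
	void *priv;
};

/* Example private data: counts how often each callback ran. */
struct walk_counts {
	int entries;
	int holes;
};

int count_entry(unsigned long pte, unsigned long addr,
		unsigned long next, void *priv)
{
	(void)pte; (void)addr; (void)next;
	((struct walk_counts *)priv)->entries++;
	return 0;
}

int count_hole(unsigned long addr, unsigned long next, void *priv)
{
	(void)addr; (void)next;
	((struct walk_counts *)priv)->holes++;
	return 0;
}

/*
 * slots[i] is NULL when no table entry exists for that range (assumed to
 * model a NULL return from the lookup) and points at 0 for an empty entry.
 * Returns 0 on success or the first non-zero callback result.
 */
int walk_range_sketch(const unsigned long *const *slots, size_t nslots,
		      unsigned long step, const struct walk_sketch *walk)
{
	assert(walk != NULL);

	for (size_t i = 0; i < nslots; i++) {
		unsigned long addr = (unsigned long)i * step;
		unsigned long next = addr + step;
		const unsigned long *pte = slots[i];
		int err = 0;

		if (!pte || *pte == 0) {
			/* NULL is tested before the dereference. */
			if (walk->hole)
				err = walk->hole(addr, next, walk->priv);
		} else if (walk->entry) {
			err = walk->entry(*pte, addr, next, walk->priv);
		}
		if (err)
			return err;
	}
	return 0;
}
```

With this ordering the entry callback only ever sees a populated entry,
and callers that care about gaps get them through the hole callback.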

* Re: kmemleak: Unable to handle kernel paging request
From: Catalin Marinas @ 2014-06-12 14:39 UTC
  To: Denis Kirjanov; +Cc: linux-kernel, linux-mm, Naoya Horiguchi

On Thu, Jun 12, 2014 at 01:00:57PM +0100, Denis Kirjanov wrote:
> On 6/12/14, Denis Kirjanov <kda@linux-powerpc.org> wrote:
> > On 6/12/14, Catalin Marinas <catalin.marinas@arm.com> wrote:
> >> On 11 Jun 2014, at 21:04, Denis Kirjanov <kda@linux-powerpc.org> wrote:
> >>> On 6/11/14, Catalin Marinas <catalin.marinas@arm.com> wrote:
> >>>> On Wed, Jun 11, 2014 at 04:13:07PM +0400, Denis Kirjanov wrote:
> >>>>> I got a trace while running 3.15.0-08556-gdfb9454:
> >>>>>
> >>>>> [  104.534026] Unable to handle kernel paging request for data at
> >>>>> address 0xc00000007f000000
> >>>>
> >>>> Were there any kmemleak messages prior to this, like "kmemleak
> >>>> disabled"? There could be a race when kmemleak is disabled because of
> >>>> some fatal (for kmemleak) error while the scanning is taking place
> >>>> (which needs some more thinking to fix properly).
> >>>
> >>> No. I checked for the similar problem and didn't find anything relevant.
> >>> I'll try to bisect it.
> >>
> >> Does this happen soon after boot? I guess it’s the first scan
> >> (scheduled at around 1min after boot). Something seems to be telling
> >> kmemleak that there is a valid memory block at 0xc00000007f000000.
> >
> > Yeah, it happens after a while with a booted system so that's the
> > first kmemleak scan.
> >
> 
> I've bisected to this commit: d4c54919ed86302094c0ca7d48a8cbd4ee753e92
> "mm: add !pte_present() check on existing hugetlb_entry callbacks".
> Reverting the commit fixes the issue

I can't figure out how this causes the problem, but I have more questions.
Is the faulting address always 0xc00000007f000000 in all crashes? If so, you
could comment out start_scan_thread() in kmemleak_late_init() to prevent
the scanning thread from starting. Once booted, you can run:

  echo dump=0xc00000007f000000 > /sys/kernel/debug/kmemleak

and check the dmesg for what kmemleak knows about that address, when it
was allocated and whether it should be mapped or not.

-- 
Catalin

* Re: kmemleak: Unable to handle kernel paging request
       [not found]           ` <5399ab3b.4825e00a.60fd.5014SMTPIN_ADDED_BROKEN@mx.google.com>
@ 2014-06-13  6:39               ` Denis Kirjanov
  0 siblings, 0 replies; 32+ messages in thread
From: Denis Kirjanov @ 2014-06-13  6:39 UTC (permalink / raw)
  To: Naoya Horiguchi; +Cc: Catalin Marinas, linux-kernel, linux-mm, Andrew Morton

On 6/12/14, Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> wrote:
> Hi Denis,
>
> On Thu, Jun 12, 2014 at 04:00:57PM +0400, Denis Kirjanov wrote:
>> On 6/12/14, Denis Kirjanov <kda@linux-powerpc.org> wrote:
>> > On 6/12/14, Catalin Marinas <catalin.marinas@arm.com> wrote:
>> >> On 11 Jun 2014, at 21:04, Denis Kirjanov <kda@linux-powerpc.org>
>> >> wrote:
>> >>> On 6/11/14, Catalin Marinas <catalin.marinas@arm.com> wrote:
>> >>>> On Wed, Jun 11, 2014 at 04:13:07PM +0400, Denis Kirjanov wrote:
>> >>>>> I got a trace while running 3.15.0-08556-gdfb9454:
>> >>>>>
>> >>>>> [  104.534026] Unable to handle kernel paging request for data at
>> >>>>> address 0xc00000007f000000
>> >>>>
>> >>>> Were there any kmemleak messages prior to this, like "kmemleak
>> >>>> disabled"? There could be a race when kmemleak is disabled because
>> >>>> of
>> >>>> some fatal (for kmemleak) error while the scanning is taking place
>> >>>> (which needs some more thinking to fix properly).
>> >>>
>> >>> No. I checked for the similar problem and didn't find anything
>> >>> relevant.
>> >>> I'll try to bisect it.
>> >>
>> >> Does this happen soon after boot? I guess it’s the first scan
>> >> (scheduled at around 1min after boot). Something seems to be telling
>> >> kmemleak that there is a valid memory block at 0xc00000007f000000.
>> >
>> > Yeah, it happens after a while with a booted system so that's the
>> > first kmemleak scan.
>> >
>> >> Catalin
>> >
>>
>> I've bisected to this commit: d4c54919ed86302094c0ca7d48a8cbd4ee753e92
>> "mm: add !pte_present() check on existing hugetlb_entry callbacks".
>> Reverting the commit fixes the issue
>
> Thanks for the effort of bisecting.
> I guess this bug happens because the pte_none() check was removed in that
> commit, so could you check whether the following patch fixes the problem?
>
> I don't know much about kmemleak's internals, so I'm not sure how this bug
> affected kmemleak. I'd appreciate it if you could add a comment about that
> to the patch description.
>
> Thanks,
> Naoya Horiguchi
> ---
> Date: Thu, 12 Jun 2014 08:56:27 -0400
> Subject: [PATCH] mm: revoke pte_none() check for hugetlb_entry() callbacks
>
> commit: d4c54919ed86302094c0ca7d48a8cbd4ee753e92 ("mm: add !pte_present()
> check on existing hugetlb_entry callbacks") removed pte_none() check in
> a ->hugetlb_entry() handler, which unexpectedly broke other features like
> kmemleak.
>
> pte_none() check should be done in common page walk code, because we do
> so for normal pages and page walk might want to handle holes with
> ->pte_hole() callback.
>
> Reported-by: Denis Kirjanov <kda@linux-powerpc.org>
> Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> ---
>  mm/pagewalk.c | 7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/mm/pagewalk.c b/mm/pagewalk.c
> index 2beeabf502c5..0618657285c4 100644
> --- a/mm/pagewalk.c
> +++ b/mm/pagewalk.c
> @@ -118,6 +118,13 @@ static int walk_hugetlb_range(struct vm_area_struct
> *vma,
>  	do {
>  		next = hugetlb_entry_end(h, addr, end);
>  		pte = huge_pte_offset(walk->mm, addr & hmask);
> +		if (huge_pte_none(*pte)) {
> +			if (walk->pte_hole)
> +				err = walk->pte_hole(addr, next, walk);
> +			if (err)
> +				break;
> +			continue;
> +		}
>  		if (pte && walk->hugetlb_entry)
>  			err = walk->hugetlb_entry(pte, hmask, addr, next, walk);
>  		if (err)
> --
> 1.9.3

Nope, unfortunately I still see the issue :/

>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: kmemleak: Unable to handle kernel paging request
  2014-06-12 14:39             ` Catalin Marinas
@ 2014-06-13  7:12               ` Denis Kirjanov
  -1 siblings, 0 replies; 32+ messages in thread
From: Denis Kirjanov @ 2014-06-13  7:12 UTC (permalink / raw)
  To: Catalin Marinas; +Cc: linux-kernel, linux-mm, Naoya Horiguchi

On 6/12/14, Catalin Marinas <catalin.marinas@arm.com> wrote:
> On Thu, Jun 12, 2014 at 01:00:57PM +0100, Denis Kirjanov wrote:
>> On 6/12/14, Denis Kirjanov <kda@linux-powerpc.org> wrote:
>> > On 6/12/14, Catalin Marinas <catalin.marinas@arm.com> wrote:
>> >> On 11 Jun 2014, at 21:04, Denis Kirjanov <kda@linux-powerpc.org>
>> >> wrote:
>> >>> On 6/11/14, Catalin Marinas <catalin.marinas@arm.com> wrote:
>> >>>> On Wed, Jun 11, 2014 at 04:13:07PM +0400, Denis Kirjanov wrote:
>> >>>>> I got a trace while running 3.15.0-08556-gdfb9454:
>> >>>>>
>> >>>>> [  104.534026] Unable to handle kernel paging request for data at
>> >>>>> address 0xc00000007f000000
>> >>>>
>> >>>> Were there any kmemleak messages prior to this, like "kmemleak
>> >>>> disabled"? There could be a race when kmemleak is disabled because
>> >>>> of
>> >>>> some fatal (for kmemleak) error while the scanning is taking place
>> >>>> (which needs some more thinking to fix properly).
>> >>>
>> >>> No. I checked for the similar problem and didn't find anything
>> >>> relevant.
>> >>> I'll try to bisect it.
>> >>
>> >> Does this happen soon after boot? I guess it’s the first scan
>> >> (scheduled at around 1min after boot). Something seems to be telling
>> >> kmemleak that there is a valid memory block at 0xc00000007f000000.
>> >
>> > Yeah, it happens after a while with a booted system so that's the
>> > first kmemleak scan.
>> >
>>
>> I've bisected to this commit: d4c54919ed86302094c0ca7d48a8cbd4ee753e92
>> "mm: add !pte_present() check on existing hugetlb_entry callbacks".
>> Reverting the commit fixes the issue
>
> I can't figure how this causes the problem but I have more questions. Is
> 0xc00000007f000000 address always the same in all crashes? If yes, you
> could comment out start_scan_thread() in kmemleak_late_init() to avoid
> the scanning thread starting. Once booted, you can run:
>
>   echo dump=0xc00000007f000000 > /sys/kernel/debug/kmemleak
>
> and check the dmesg for what kmemleak knows about that address, when it
> was allocated and whether it should be mapped or not.

The address is always the same.

[  179.466239] kmemleak: Object 0xc00000007f000000 (size 16777216):
[  179.466503] kmemleak:   comm "swapper/0", pid 0, jiffies 4294892300
[  179.466508] kmemleak:   min_count = 0
[  179.466512] kmemleak:   count = 0
[  179.466517] kmemleak:   flags = 0x1
[  179.466522] kmemleak:   checksum = 0
[  179.466526] kmemleak:   backtrace:
[  179.466531]      [<c000000000afc3dc>] .memblock_alloc_range_nid+0x68/0x88
[  179.466544]      [<c000000000afc444>] .memblock_alloc_base+0x20/0x58
[  179.466553]      [<c000000000ae96cc>] .alloc_dart_table+0x5c/0xb0
[  179.466561]      [<c000000000aea300>] .pmac_probe+0x38/0xa0
[  179.466569]      [<000000000002166c>] 0x2166c
[  179.466579]      [<0000000000ae0e68>] 0xae0e68
[  179.466587]      [<0000000000009bc4>] 0x9bc4


> --
> Catalin
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: kmemleak: Unable to handle kernel paging request
  2014-06-13  7:12               ` Denis Kirjanov
  (?)
@ 2014-06-13  8:56                 ` Catalin Marinas
  -1 siblings, 0 replies; 32+ messages in thread
From: Catalin Marinas @ 2014-06-13  8:56 UTC (permalink / raw)
  To: Denis Kirjanov
  Cc: linux-kernel, linux-mm, Naoya Horiguchi, linuxppc-dev,
	Benjamin Herrenschmidt, Paul Mackerras

On Fri, Jun 13, 2014 at 08:12:08AM +0100, Denis Kirjanov wrote:
> On 6/12/14, Catalin Marinas <catalin.marinas@arm.com> wrote:
> > On Thu, Jun 12, 2014 at 01:00:57PM +0100, Denis Kirjanov wrote:
> >> On 6/12/14, Denis Kirjanov <kda@linux-powerpc.org> wrote:
> >> > On 6/12/14, Catalin Marinas <catalin.marinas@arm.com> wrote:
> >> >> On 11 Jun 2014, at 21:04, Denis Kirjanov <kda@linux-powerpc.org>
> >> >> wrote:
> >> >>> On 6/11/14, Catalin Marinas <catalin.marinas@arm.com> wrote:
> >> >>>> On Wed, Jun 11, 2014 at 04:13:07PM +0400, Denis Kirjanov wrote:
> >> >>>>> I got a trace while running 3.15.0-08556-gdfb9454:
> >> >>>>>
> >> >>>>> [  104.534026] Unable to handle kernel paging request for data at
> >> >>>>> address 0xc00000007f000000
> >> >>>>
> >> >>>> Were there any kmemleak messages prior to this, like "kmemleak
> >> >>>> disabled"? There could be a race when kmemleak is disabled because
> >> >>>> of
> >> >>>> some fatal (for kmemleak) error while the scanning is taking place
> >> >>>> (which needs some more thinking to fix properly).
> >> >>>
> >> >>> No. I checked for the similar problem and didn't find anything
> >> >>> relevant.
> >> >>> I'll try to bisect it.
> >> >>
> >> >> Does this happen soon after boot? I guess it’s the first scan
> >> >> (scheduled at around 1min after boot). Something seems to be telling
> >> >> kmemleak that there is a valid memory block at 0xc00000007f000000.
> >> >
> >> > Yeah, it happens after a while with a booted system so that's the
> >> > first kmemleak scan.
> >>
> >> I've bisected to this commit: d4c54919ed86302094c0ca7d48a8cbd4ee753e92
> >> "mm: add !pte_present() check on existing hugetlb_entry callbacks".
> >> Reverting the commit fixes the issue
> >
> > I can't figure how this causes the problem but I have more questions. Is
> > 0xc00000007f000000 address always the same in all crashes? If yes, you
> > could comment out start_scan_thread() in kmemleak_late_init() to avoid
> > the scanning thread starting. Once booted, you can run:
> >
> >   echo dump=0xc00000007f000000 > /sys/kernel/debug/kmemleak
> >
> > and check the dmesg for what kmemleak knows about that address, when it
> > was allocated and whether it should be mapped or not.
> 
> The address is always the same.
> 
> [  179.466239] kmemleak: Object 0xc00000007f000000 (size 16777216):
> [  179.466503] kmemleak:   comm "swapper/0", pid 0, jiffies 4294892300
> [  179.466508] kmemleak:   min_count = 0
> [  179.466512] kmemleak:   count = 0
> [  179.466517] kmemleak:   flags = 0x1
> [  179.466522] kmemleak:   checksum = 0
> [  179.466526] kmemleak:   backtrace:
> [  179.466531]      [<c000000000afc3dc>] .memblock_alloc_range_nid+0x68/0x88
> [  179.466544]      [<c000000000afc444>] .memblock_alloc_base+0x20/0x58
> [  179.466553]      [<c000000000ae96cc>] .alloc_dart_table+0x5c/0xb0
> [  179.466561]      [<c000000000aea300>] .pmac_probe+0x38/0xa0
> [  179.466569]      [<000000000002166c>] 0x2166c
> [  179.466579]      [<0000000000ae0e68>] 0xae0e68
> [  179.466587]      [<0000000000009bc4>] 0x9bc4

OK, so that's the DART table allocated via alloc_dart_table(). Is
dart_tablebase removed from the kernel linear mapping after allocation?
If that's the case, we need to tell kmemleak to ignore this block (see
patch below, untested). But I still can't explain how commit
d4c54919ed863020 causes this issue.

(also cc'ing the powerpc list and maintainers)
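
A note on the numbers in the dump above, offered as a hedged sanity check
rather than anything taken from the kernel source: the object size 16777216
is 1UL<<24, and the address is what one would get if the
memblock_alloc_base(1UL<<24, 1UL<<24, 0x80000000L) call in alloc_dart_table()
returned the highest 16MB-aligned block below 2GB and __va() then added the
linear-map offset. The helper names and the 0xc000000000000000 offset below
are assumptions made for this illustration; a real allocator may pick a
different block if that one is already in use.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: linear-map offset for this ppc64 configuration. */
#define LINEAR_MAP_BASE 0xc000000000000000ULL

/*
 * Highest align-aligned start such that [start, start + size) fits below
 * limit. Models a top-down allocator in otherwise free memory (hypothetical
 * helper, not a kernel function). align must be a power of two.
 */
static uint64_t highest_aligned_below(uint64_t size, uint64_t align,
				      uint64_t limit)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	assert(size <= limit);
	return (limit - size) & ~(align - 1);
}

/* Model of __va() for the linear mapping: a constant offset. */
static uint64_t linear_va(uint64_t phys)
{
	return phys + LINEAR_MAP_BASE;
}
```

With size = align = 1<<24 and limit = 0x80000000, highest_aligned_below()
gives 0x7f000000, and linear_va() of that gives 0xc00000007f000000, which
matches both the faulting address and the object kmemleak reports.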

---------------8<--------------------------

From 09a7f1c97166c7bdca7ca4e8a4ff2774f3706ea3 Mon Sep 17 00:00:00 2001
From: Catalin Marinas <catalin.marinas@arm.com>
Date: Fri, 13 Jun 2014 09:44:21 +0100
Subject: [PATCH] powerpc/kmemleak: Do not scan the DART table

The DART table allocation is registered to kmemleak via the
memblock_alloc_base() call. However, the DART table is later unmapped
and dart_tablebase VA no longer accessible. This patch tells kmemleak
not to scan this block and avoid an unhandled paging request.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
---
 arch/powerpc/sysdev/dart_iommu.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/powerpc/sysdev/dart_iommu.c b/arch/powerpc/sysdev/dart_iommu.c
index 62c47bb76517..9e5353ff6d1b 100644
--- a/arch/powerpc/sysdev/dart_iommu.c
+++ b/arch/powerpc/sysdev/dart_iommu.c
@@ -476,6 +476,11 @@ void __init alloc_dart_table(void)
 	 */
 	dart_tablebase = (unsigned long)
 		__va(memblock_alloc_base(1UL<<24, 1UL<<24, 0x80000000L));
+	/*
+	 * The DART space is later unmapped from the kernel linear mapping and
+	 * accessing dart_tablebase during kmemleak scanning will fault.
+	 */
+	kmemleak_no_scan((void *)dart_tablebase);
 
 	printk(KERN_INFO "DART table allocated at: %lx\n", dart_tablebase);
 }
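
For readers unfamiliar with what kmemleak_no_scan() buys here, the idea can
be sketched in userspace as below. This is only a model under stated
assumptions: struct tracked_block, mark_no_scan() and scan_for_pointer() are
hypothetical names invented for the illustration and are not kmemleak's
internal data structures or functions. The point it shows is that a flagged
block stays tracked but is skipped before any load from it, so an unmapped
region is never touched by the scanner.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a tracked allocation; not the kernel's struct. */
struct tracked_block {
	const uintptr_t *start;	/* first word of the block */
	size_t nwords;		/* block length in pointer-sized words */
	bool no_scan;		/* set when the block must never be read */
};

/* Model of the no-scan request: flag the block, keep it tracked. */
static void mark_no_scan(struct tracked_block *b)
{
	b->no_scan = true;
}

/*
 * Count the words in all blocks that equal target. Blocks flagged no_scan
 * are skipped before any load from them, which is what would avoid the
 * fault for a region that is no longer mapped.
 */
static size_t scan_for_pointer(const struct tracked_block *blocks, size_t n,
			       uintptr_t target)
{
	size_t hits = 0;

	assert(blocks != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (blocks[i].no_scan)
			continue;
		for (size_t w = 0; w < blocks[i].nwords; w++)
			if (blocks[i].start[w] == target)
				hits++;
	}
	return hits;
}
```

In this model, marking a block with mark_no_scan() changes only which memory
the scanner reads; the block itself remains in the tracked set.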

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: kmemleak: Unable to handle kernel paging request
  2014-06-13  8:56                 ` Catalin Marinas
  (?)
@ 2014-06-13 10:26                   ` Denis Kirjanov
  -1 siblings, 0 replies; 32+ messages in thread
From: Denis Kirjanov @ 2014-06-13 10:26 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: linux-kernel, linux-mm, Naoya Horiguchi, linuxppc-dev,
	Benjamin Herrenschmidt, Paul Mackerras

On 6/13/14, Catalin Marinas <catalin.marinas@arm.com> wrote:
> On Fri, Jun 13, 2014 at 08:12:08AM +0100, Denis Kirjanov wrote:
>> On 6/12/14, Catalin Marinas <catalin.marinas@arm.com> wrote:
>> > On Thu, Jun 12, 2014 at 01:00:57PM +0100, Denis Kirjanov wrote:
>> >> On 6/12/14, Denis Kirjanov <kda@linux-powerpc.org> wrote:
>> >> > On 6/12/14, Catalin Marinas <catalin.marinas@arm.com> wrote:
>> >> >> On 11 Jun 2014, at 21:04, Denis Kirjanov <kda@linux-powerpc.org>
>> >> >> wrote:
>> >> >>> On 6/11/14, Catalin Marinas <catalin.marinas@arm.com> wrote:
>> >> >>>> On Wed, Jun 11, 2014 at 04:13:07PM +0400, Denis Kirjanov wrote:
>> >> >>>>> I got a trace while running 3.15.0-08556-gdfb9454:
>> >> >>>>>
>> >> >>>>> [  104.534026] Unable to handle kernel paging request for data
>> >> >>>>> at
>> >> >>>>> address 0xc00000007f000000
>> >> >>>>
>> >> >>>> Were there any kmemleak messages prior to this, like "kmemleak
>> >> >>>> disabled"? There could be a race when kmemleak is disabled
>> >> >>>> because
>> >> >>>> of
>> >> >>>> some fatal (for kmemleak) error while the scanning is taking
>> >> >>>> place
>> >> >>>> (which needs some more thinking to fix properly).
>> >> >>>
>> >> >>> No. I checked for the similar problem and didn't find anything
>> >> >>> relevant.
>> >> >>> I'll try to bisect it.
>> >> >>
>> >> >> Does this happen soon after boot? I guess it’s the first scan
>> >> >> (scheduled at around 1min after boot). Something seems to be
>> >> >> telling
>> >> >> kmemleak that there is a valid memory block at 0xc00000007f000000.
>> >> >
>> >> > Yeah, it happens after a while with a booted system so that's the
>> >> > first kmemleak scan.
>> >>
>> >> I've bisected to this commit: d4c54919ed86302094c0ca7d48a8cbd4ee753e92
>> >> "mm: add !pte_present() check on existing hugetlb_entry callbacks".
>> >> Reverting the commit fixes the issue
>> >
>> > I can't figure how this causes the problem but I have more questions.
>> > Is
>> > 0xc00000007f000000 address always the same in all crashes? If yes, you
>> > could comment out start_scan_thread() in kmemleak_late_init() to avoid
>> > the scanning thread starting. Once booted, you can run:
>> >
>> >   echo dump=0xc00000007f000000 > /sys/kernel/debug/kmemleak
>> >
>> > and check the dmesg for what kmemleak knows about that address, when it
>> > was allocated and whether it should be mapped or not.
>>
>> The address is always the same.
>>
>> [  179.466239] kmemleak: Object 0xc00000007f000000 (size 16777216):
>> [  179.466503] kmemleak:   comm "swapper/0", pid 0, jiffies 4294892300
>> [  179.466508] kmemleak:   min_count = 0
>> [  179.466512] kmemleak:   count = 0
>> [  179.466517] kmemleak:   flags = 0x1
>> [  179.466522] kmemleak:   checksum = 0
>> [  179.466526] kmemleak:   backtrace:
>> [  179.466531]      [<c000000000afc3dc>] .memblock_alloc_range_nid+0x68/0x88
>> [  179.466544]      [<c000000000afc444>] .memblock_alloc_base+0x20/0x58
>> [  179.466553]      [<c000000000ae96cc>] .alloc_dart_table+0x5c/0xb0
>> [  179.466561]      [<c000000000aea300>] .pmac_probe+0x38/0xa0
>> [  179.466569]      [<000000000002166c>] 0x2166c
>> [  179.466579]      [<0000000000ae0e68>] 0xae0e68
>> [  179.466587]      [<0000000000009bc4>] 0x9bc4
>
> OK, so that's the DART table allocated via alloc_dart_table(). Is
> dart_tablebase removed from the kernel linear mapping after allocation?
> If that's the case, we need to tell kmemleak to ignore this block (see
> patch below, untested). But I still can't explain how commit
> d4c54919ed863020 causes this issue.
>
> (also cc'ing the powerpc list and maintainers)

Ok, your patch fixes the oops.

Ben, can you shed some light on this issue?

Thanks!
> ---------------8<--------------------------
>
> From 09a7f1c97166c7bdca7ca4e8a4ff2774f3706ea3 Mon Sep 17 00:00:00 2001
> From: Catalin Marinas <catalin.marinas@arm.com>
> Date: Fri, 13 Jun 2014 09:44:21 +0100
> Subject: [PATCH] powerpc/kmemleak: Do not scan the DART table
>
> The DART table allocation is registered to kmemleak via the
> memblock_alloc_base() call. However, the DART table is later unmapped
> and dart_tablebase VA no longer accessible. This patch tells kmemleak
> not to scan this block and avoid an unhandled paging request.
>
> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Paul Mackerras <paulus@samba.org>
> ---
>  arch/powerpc/sysdev/dart_iommu.c | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/arch/powerpc/sysdev/dart_iommu.c b/arch/powerpc/sysdev/dart_iommu.c
> index 62c47bb76517..9e5353ff6d1b 100644
> --- a/arch/powerpc/sysdev/dart_iommu.c
> +++ b/arch/powerpc/sysdev/dart_iommu.c
> @@ -476,6 +476,11 @@ void __init alloc_dart_table(void)
>  	 */
>  	dart_tablebase = (unsigned long)
>  		__va(memblock_alloc_base(1UL<<24, 1UL<<24, 0x80000000L));
> +	/*
> +	 * The DART space is later unmapped from the kernel linear mapping and
> +	 * accessing dart_tablebase during kmemleak scanning will fault.
> +	 */
> +	kmemleak_no_scan((void *)dart_tablebase);
>
>  	printk(KERN_INFO "DART table allocated at: %lx\n", dart_tablebase);
>  }
>
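
The numbers in the kmemleak dump quoted above can be checked against the
allocation call shown in the patch context,
memblock_alloc_base(1UL<<24, 1UL<<24, 0x80000000L). The following is a
minimal user-space sketch, not kernel code. PAGE_OFFSET_ASSUMED is an
assumption (the usual ppc64 linear-map base, which this thread does not
state), and the helper names linear_va_to_pa() and fits_allocation() are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: ppc64 linear-map base; not stated anywhere in the thread. */
#define PAGE_OFFSET_ASSUMED 0xc000000000000000ULL

/* Values taken from the patch context and the kmemleak dump. */
#define DART_SIZE  (1ULL << 24)   /* size and alignment arguments */
#define DART_LIMIT 0x80000000ULL  /* maximum physical address argument */

/* Strip the assumed linear-map offset to recover the physical address. */
static uint64_t linear_va_to_pa(uint64_t va)
{
	assert(va >= PAGE_OFFSET_ASSUMED);
	return va - PAGE_OFFSET_ASSUMED;
}

/* Return 1 if [pa, pa + size) is size-aligned and ends at or below limit. */
static int fits_allocation(uint64_t pa, uint64_t size, uint64_t limit)
{
	return (pa % size) == 0 && pa + size <= limit;
}
```

Under that assumption, the reported object address 0xc00000007f000000
corresponds to physical 0x7f000000, which is 16 MiB aligned and ends
exactly at the 0x80000000 limit, and the reported size 16777216 equals
1UL<<24. So the object kmemleak tracks is the whole DART table, and the
faulting address is its first byte.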

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: kmemleak: Unable to handle kernel paging request
  2014-06-13  8:56                 ` Catalin Marinas
@ 2014-06-13 21:44                   ` Benjamin Herrenschmidt
  -1 siblings, 0 replies; 32+ messages in thread
From: Benjamin Herrenschmidt @ 2014-06-13 21:44 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Denis Kirjanov, linux-kernel, linux-mm, Naoya Horiguchi,
	linuxppc-dev, Paul Mackerras

On Fri, 2014-06-13 at 09:56 +0100, Catalin Marinas wrote:

> OK, so that's the DART table allocated via alloc_dart_table(). Is
> dart_tablebase removed from the kernel linear mapping after allocation?

Yes.

> If that's the case, we need to tell kmemleak to ignore this block (see
> patch below, untested). But I still can't explain how commit
> d4c54919ed863020 causes this issue.
> 
> (also cc'ing the powerpc list and maintainers)

We remove the DART from the linear mapping because it has to be mapped
non-cacheable and having it in the linear mapping would cause cache
paradoxes. We also can't just change the caching attributes in the
linear mapping because we use 16M pages for it and 970 CPUs don't
support cache-inhibited 16M pages :-( And due to the MMU segmentation
model, we also can't mix & match page sizes in that area.

So we just unmap it, and ioremap it elsewhere.

Cheers,
Ben.

> ---------------8<--------------------------
> 
> From 09a7f1c97166c7bdca7ca4e8a4ff2774f3706ea3 Mon Sep 17 00:00:00 2001
> From: Catalin Marinas <catalin.marinas@arm.com>
> Date: Fri, 13 Jun 2014 09:44:21 +0100
> Subject: [PATCH] powerpc/kmemleak: Do not scan the DART table
> 
> The DART table allocation is registered to kmemleak via the
> memblock_alloc_base() call. However, the DART table is later unmapped
> and dart_tablebase VA no longer accessible. This patch tells kmemleak
> not to scan this block and avoid an unhandled paging request.
> 
> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Paul Mackerras <paulus@samba.org>
> ---
>  arch/powerpc/sysdev/dart_iommu.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/arch/powerpc/sysdev/dart_iommu.c b/arch/powerpc/sysdev/dart_iommu.c
> index 62c47bb76517..9e5353ff6d1b 100644
> --- a/arch/powerpc/sysdev/dart_iommu.c
> +++ b/arch/powerpc/sysdev/dart_iommu.c
> @@ -476,6 +476,11 @@ void __init alloc_dart_table(void)
>  	 */
>  	dart_tablebase = (unsigned long)
>  		__va(memblock_alloc_base(1UL<<24, 1UL<<24, 0x80000000L));
> +	/*
> +	 * The DART space is later unmapped from the kernel linear mapping and
> +	 * accessing dart_tablebase during kmemleak scanning will fault.
> +	 */
> +	kmemleak_no_scan((void *)dart_tablebase);
>  
>  	printk(KERN_INFO "DART table allocated at: %lx\n", dart_tablebase);
>  }
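
The effect of the kmemleak_no_scan() annotation in the patch above can be
modelled in user space. This is only a sketch of the idea, not kmemleak's
implementation: the struct, the "mapped" field and the model_* functions
are hypothetical, and the flag values are illustrative (bit 0 was chosen
to match the "flags = 0x1" seen in the dump earlier in the thread).

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative flag bits; the names only loosely mirror kmemleak's. */
#define OBJ_ALLOCATED (1u << 0)
#define OBJ_NO_SCAN   (1u << 2)

struct tracked_object {
	const char *name;
	unsigned int flags;
	int mapped;	/* model only: 0 means a read would fault */
};

/* Model of the annotation: mark the object so the scanner skips it. */
static void model_no_scan(struct tracked_object *obj)
{
	assert(obj != NULL);
	obj->flags |= OBJ_NO_SCAN;
}

/*
 * Walk the objects as a scan pass would and count how many of them
 * the scanner would read while they are not mapped.
 */
static size_t model_scan_faults(const struct tracked_object *objs, size_t n)
{
	size_t faults = 0;

	for (size_t i = 0; i < n; i++) {
		if (!(objs[i].flags & OBJ_ALLOCATED))
			continue;	/* not a live object */
		if (objs[i].flags & OBJ_NO_SCAN)
			continue;	/* annotated: never dereferenced */
		if (!objs[i].mapped)
			faults++;	/* the scanner would oops here */
	}
	return faults;
}
```

With one ordinary mapped object and one unmapped "DART table" object,
model_scan_faults() reports one fault before model_no_scan() is applied
to the table and none afterwards. The object stays tracked; it is only
its contents that are no longer read.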



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: kmemleak: Unable to handle kernel paging request
  2014-06-13 21:44                   ` Benjamin Herrenschmidt
  (?)
@ 2014-06-14 12:05                     ` Catalin Marinas
  -1 siblings, 0 replies; 32+ messages in thread
From: Catalin Marinas @ 2014-06-14 12:05 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Denis Kirjanov, linux-kernel, linux-mm, Naoya Horiguchi,
	linuxppc-dev, Paul Mackerras

On 13 Jun 2014, at 22:44, Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
> On Fri, 2014-06-13 at 09:56 +0100, Catalin Marinas wrote:
> 
>> OK, so that's the DART table allocated via alloc_dart_table(). Is
>> dart_tablebase removed from the kernel linear mapping after allocation?
> 
> Yes.
> 
>> If that's the case, we need to tell kmemleak to ignore this block (see
>> patch below, untested). But I still can't explain how commit
>> d4c54919ed863020 causes this issue.
>> 
>> (also cc'ing the powerpc list and maintainers)
> 
> We remove the DART from the linear mapping because it has to be mapped
> non-cacheable and having it in the linear mapping would cause cache
> paradoxes. We also can't just change the caching attributes in the
> linear mapping because we use 16M pages for it and 970 CPUs don't
> support cache-inhibited 16M pages :-( And due to the MMU segmentation
> model, we also can't mix & match page sizes in that area.
> 
> So we just unmap it, and ioremap it elsewhere.

OK, thanks for the explanation. So the kmemleak annotation makes sense.

Would you please take the patch I sent earlier (I guess with Denis’
Tested-by)? I can send it separately if that is more convenient.

Thanks,

Catalin

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: kmemleak: Unable to handle kernel paging request
  2014-06-13 10:26                   ` Denis Kirjanov
@ 2014-06-16  2:40                     ` Michael Ellerman
  -1 siblings, 0 replies; 32+ messages in thread
From: Michael Ellerman @ 2014-06-16  2:40 UTC (permalink / raw)
  To: Denis Kirjanov
  Cc: Catalin Marinas, linux-kernel, linux-mm, Paul Mackerras,
	Naoya Horiguchi, linuxppc-dev

On Fri, 2014-06-13 at 14:26 +0400, Denis Kirjanov wrote:
> On 6/13/14, Catalin Marinas <catalin.marinas@arm.com> wrote:
> > On Fri, Jun 13, 2014 at 08:12:08AM +0100, Denis Kirjanov wrote:
> >> On 6/12/14, Catalin Marinas <catalin.marinas@arm.com> wrote:
> >> > On Thu, Jun 12, 2014 at 01:00:57PM +0100, Denis Kirjanov wrote:
> >> >> On 6/12/14, Denis Kirjanov <kda@linux-powerpc.org> wrote:
> >> >> > On 6/12/14, Catalin Marinas <catalin.marinas@arm.com> wrote:
> >> >> >> On 11 Jun 2014, at 21:04, Denis Kirjanov <kda@linux-powerpc.org>
> >> >> >> wrote:
> >> >> >>> On 6/11/14, Catalin Marinas <catalin.marinas@arm.com> wrote:
> >> >> >>>> On Wed, Jun 11, 2014 at 04:13:07PM +0400, Denis Kirjanov wrote:
> >> >> >>>>> I got a trace while running 3.15.0-08556-gdfb9454:
> >> >> >>>>>
> >> >> >>>>> [  104.534026] Unable to handle kernel paging request for data at
> >> >> >>>>> address 0xc00000007f000000
> >> >> >>>>
> >> >> >>>> Were there any kmemleak messages prior to this, like "kmemleak
> >> >> >>>> disabled"? There could be a race when kmemleak is disabled
> >> >> >>>> because of some fatal (for kmemleak) error while the scanning
> >> >> >>>> is taking place (which needs some more thinking to fix properly).
> >> >> >>>
> >> >> >>> No. I checked for the similar problem and didn't find anything
> >> >> >>> relevant.
> >> >> >>> I'll try to bisect it.
> >> >> >>
> >> >> >> Does this happen soon after boot? I guess it’s the first scan
> >> >> >> (scheduled at around 1min after boot). Something seems to be
> >> >> >> telling kmemleak that there is a valid memory block at
> >> >> >> 0xc00000007f000000.
> >> >> >
> >> >> > Yeah, it happens after a while with a booted system so that's the
> >> >> > first kmemleak scan.
> >> >>
> >> >> I've bisected to this commit: d4c54919ed86302094c0ca7d48a8cbd4ee753e92
> >> >> "mm: add !pte_present() check on existing hugetlb_entry callbacks".
> >> >> Reverting the commit fixes the issue.
> >> >
> >> > I can't figure how this causes the problem but I have more questions.
> >> > Is the 0xc00000007f000000 address always the same in all crashes? If
> >> > yes, you could comment out start_scan_thread() in kmemleak_late_init()
> >> > to avoid the scanning thread starting. Once booted, you can run:
> >> >
> >> >   echo dump=0xc00000007f000000 > /sys/kernel/debug/kmemleak
> >> >
> >> > and check the dmesg for what kmemleak knows about that address, when it
> >> > was allocated and whether it should be mapped or not.
> >>
> >> The address is always the same.
> >>
> >> [  179.466239] kmemleak: Object 0xc00000007f000000 (size 16777216):
> >> [  179.466503] kmemleak:   comm "swapper/0", pid 0, jiffies 4294892300
> >> [  179.466508] kmemleak:   min_count = 0
> >> [  179.466512] kmemleak:   count = 0
> >> [  179.466517] kmemleak:   flags = 0x1
> >> [  179.466522] kmemleak:   checksum = 0
> >> [  179.466526] kmemleak:   backtrace:
> >> [  179.466531]      [<c000000000afc3dc>] .memblock_alloc_range_nid+0x68/0x88
> >> [  179.466544]      [<c000000000afc444>] .memblock_alloc_base+0x20/0x58
> >> [  179.466553]      [<c000000000ae96cc>] .alloc_dart_table+0x5c/0xb0
> >> [  179.466561]      [<c000000000aea300>] .pmac_probe+0x38/0xa0
> >> [  179.466569]      [<000000000002166c>] 0x2166c
> >> [  179.466579]      [<0000000000ae0e68>] 0xae0e68
> >> [  179.466587]      [<0000000000009bc4>] 0x9bc4
> >
> > OK, so that's the DART table allocated via alloc_dart_table(). Is
> > dart_tablebase removed from the kernel linear mapping after allocation?
> > If that's the case, we need to tell kmemleak to ignore this block (see
> > patch below, untested). But I still can't explain how commit
> > d4c54919ed863020 causes this issue.
> >
> > (also cc'ing the powerpc list and maintainers)
> 
> Ok, your patch fixes the oops.
> 
> Ben, can you shed some light on this issue?

(I'm not Ben)

Yes, the memory for dart_tablebase is removed from the linear mapping. In fact
it's never mapped, see htab_initialize().

I don't easily see how commit d4c54919ed8 could have exposed this, but I don't
know enough of the kmemleak internals to say for sure.

cheers
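
To illustrate what "never mapped" means for a region that contains the
DART table, here is a user-space sketch of splitting a region around a
hole so that only the pieces on either side get mapped. It is an
assumption about the general technique only; the htab_initialize() code
is not quoted in this thread and may differ. struct range and
ranges_to_map() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

struct range { uint64_t start, end; };	/* half-open: [start, end) */

/*
 * Split [base, base + size) around the hole [hole, hole + hole_size).
 * Writes up to two ranges to out[] and returns how many were written.
 * If the hole does not start inside the region, the region is returned
 * whole.
 */
static int ranges_to_map(uint64_t base, uint64_t size,
			 uint64_t hole, uint64_t hole_size,
			 struct range out[2])
{
	uint64_t end = base + size;
	uint64_t hole_end = hole + hole_size;
	int n = 0;

	assert(size > 0 && hole_size > 0);

	if (hole < base || hole >= end) {
		out[n++] = (struct range){ base, end };
		return n;
	}
	if (base != hole)	/* piece below the hole */
		out[n++] = (struct range){ base, hole };
	if (end > hole_end)	/* piece above the hole */
		out[n++] = (struct range){ hole_end, end };
	return n;
}
```

For a region starting at 0xc000000000000000 with an assumed size of
2 GiB and the 16 MiB table at 0xc00000007f000000, only the single range
below the table is produced, so any access to the table through the
linear mapping faults, which is what the kmemleak scan ran into.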



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: kmemleak: Unable to handle kernel paging request
@ 2014-06-16  2:40                     ` Michael Ellerman
  0 siblings, 0 replies; 32+ messages in thread
From: Michael Ellerman @ 2014-06-16  2:40 UTC (permalink / raw)
  To: Denis Kirjanov
  Cc: Catalin Marinas, linux-kernel, linux-mm, Paul Mackerras,
	Naoya Horiguchi, linuxppc-dev

On Fri, 2014-06-13 at 14:26 +0400, Denis Kirjanov wrote:
> On 6/13/14, Catalin Marinas <catalin.marinas@arm.com> wrote:
> > On Fri, Jun 13, 2014 at 08:12:08AM +0100, Denis Kirjanov wrote:
> >> On 6/12/14, Catalin Marinas <catalin.marinas@arm.com> wrote:
> >> > On Thu, Jun 12, 2014 at 01:00:57PM +0100, Denis Kirjanov wrote:
> >> >> On 6/12/14, Denis Kirjanov <kda@linux-powerpc.org> wrote:
> >> >> > On 6/12/14, Catalin Marinas <catalin.marinas@arm.com> wrote:
> >> >> >> On 11 Jun 2014, at 21:04, Denis Kirjanov <kda@linux-powerpc.org>
> >> >> >> wrote:
> >> >> >>> On 6/11/14, Catalin Marinas <catalin.marinas@arm.com> wrote:
> >> >> >>>> On Wed, Jun 11, 2014 at 04:13:07PM +0400, Denis Kirjanov wrote:
> >> >> >>>>> I got a trace while running 3.15.0-08556-gdfb9454:
> >> >> >>>>>
> >> >> >>>>> [  104.534026] Unable to handle kernel paging request for data
> >> >> >>>>> at
> >> >> >>>>> address 0xc00000007f000000
> >> >> >>>>
> >> >> >>>> Were there any kmemleak messages prior to this, like "kmemleak
> >> >> >>>> disabled"? There could be a race when kmemleak is disabled
> >> >> >>>> because
> >> >> >>>> of
> >> >> >>>> some fatal (for kmemleak) error while the scanning is taking
> >> >> >>>> place
> >> >> >>>> (which needs some more thinking to fix properly).
> >> >> >>>
> >> >> >>> No. I checked for the similar problem and didn't find anything
> >> >> >>> relevant.
> >> >> >>> I'll try to bisect it.
> >> >> >>
> >> >> >> Does this happen soon after boot? I guess ita??s the first scan
> >> >> >> (scheduled at around 1min after boot). Something seems to be
> >> >> >> telling
> >> >> >> kmemleak that there is a valid memory block at 0xc00000007f000000.
> >> >> >
> >> >> > Yeah, it happens after a while with a booted system so that's the
> >> >> > first kmemleak scan.
> >> >>
> >> >> I've bisected to this commit: d4c54919ed86302094c0ca7d48a8cbd4ee753e92
> >> >> "mm: add !pte_present() check on existing hugetlb_entry callbacks".
> >> >> Reverting the commit fixes the issue
> >> >
> >> > I can't figure how this causes the problem but I have more questions.
> >> > Is
> >> > 0xc00000007f000000 address always the same in all crashes? If yes, you
> >> > could comment out start_scan_thread() in kmemleak_late_init() to avoid
> >> > the scanning thread starting. Once booted, you can run:
> >> >
> >> >   echo dump=0xc00000007f000000 > /sys/kernel/debug/kmemleak
> >> >
> >> > and check the dmesg for what kmemleak knows about that address, when it
> >> > was allocated and whether it should be mapped or not.
> >>
> >> The address is always the same.
> >>
> >> [  179.466239] kmemleak: Object 0xc00000007f000000 (size 16777216):
> >> [  179.466503] kmemleak:   comm "swapper/0", pid 0, jiffies 4294892300
> >> [  179.466508] kmemleak:   min_count = 0
> >> [  179.466512] kmemleak:   count = 0
> >> [  179.466517] kmemleak:   flags = 0x1
> >> [  179.466522] kmemleak:   checksum = 0
> >> [  179.466526] kmemleak:   backtrace:
> >> [  179.466531]      [<c000000000afc3dc>] .memblock_alloc_range_nid+0x68/0x88
> >> [  179.466544]      [<c000000000afc444>] .memblock_alloc_base+0x20/0x58
> >> [  179.466553]      [<c000000000ae96cc>] .alloc_dart_table+0x5c/0xb0
> >> [  179.466561]      [<c000000000aea300>] .pmac_probe+0x38/0xa0
> >> [  179.466569]      [<000000000002166c>] 0x2166c
> >> [  179.466579]      [<0000000000ae0e68>] 0xae0e68
> >> [  179.466587]      [<0000000000009bc4>] 0x9bc4
> >
> > OK, so that's the DART table allocated via alloc_dart_table(). Is
> > dart_tablebase removed from the kernel linear mapping after allocation?
> > If that's the case, we need to tell kmemleak to ignore this block (see
> > patch below, untested). But I still can't explain how commit
> > d4c54919ed863020 causes this issue.
> >
> > (also cc'ing the powerpc list and maintainers)
> 
> Ok, your patch fixes the oops.
> 
> Ben, can you shed some light on this issue?

(I'm not Ben)

Yes, the memory for dart_tablebase is removed from the linear mapping. In fact
it's never mapped, see htab_initialize().

I don't easily see how commit d4c54919ed8 could have exposed this, but I don't
know enough of the kmemleak internals to say for sure.
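
To illustrate the failure mode discussed above, here is a minimal userspace
sketch, not kernel code. It models a scanner that walks every tracked block
and would fault on one that is tracked but not mapped, unless the block's
owner has asked for it to be skipped. The struct and both function names are
hypothetical and chosen only for this example. In the kernel the matching
request would presumably be a call such as kmemleak_no_scan() on the DART
table, but that is an assumption, since the patch itself is not quoted in
this message.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one object tracked by a leak scanner. */
struct tracked_block {
	uint64_t start;   /* virtual address of the block */
	size_t size;      /* length in bytes */
	bool mapped;      /* false if the block is not in the linear mapping */
	bool no_scan;     /* true once the owner asked the scanner to skip it */
};

/* Mark the block starting at 'start' as not to be scanned; true if found. */
bool mark_no_scan(struct tracked_block *blocks, size_t n, uint64_t start)
{
	for (size_t i = 0; i < n; i++) {
		if (blocks[i].start == start) {
			blocks[i].no_scan = true;
			return true;
		}
	}
	return false;
}

/*
 * Walk all blocks the way a scan pass would. Returns the number of blocks
 * scanned, or -1 if the pass would have touched an unmapped block (the
 * model's equivalent of the paging-request oops).
 */
int scan_all(const struct tracked_block *blocks, size_t n)
{
	int scanned = 0;

	for (size_t i = 0; i < n; i++) {
		if (blocks[i].no_scan)
			continue;
		if (!blocks[i].mapped)
			return -1;
		scanned++;
	}
	return scanned;
}
```

With one mapped block and one unmapped 16 MB block at 0xc00000007f000000,
scan_all() returns -1 until mark_no_scan() is called for that address, after
which it returns 1.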

cheers




Thread overview: 32+ messages
2014-06-11 12:13 kmemleak: Unable to handle kernel paging request Denis Kirjanov
2014-06-11 12:13 ` Denis Kirjanov
2014-06-11 17:38 ` Catalin Marinas
2014-06-11 17:38   ` Catalin Marinas
2014-06-11 20:04   ` Denis Kirjanov
2014-06-11 20:04     ` Denis Kirjanov
2014-06-11 22:00     ` Catalin Marinas
2014-06-11 22:00       ` Catalin Marinas
2014-06-12  7:39       ` Denis Kirjanov
2014-06-12  7:39         ` Denis Kirjanov
2014-06-12 12:00         ` Denis Kirjanov
2014-06-12 12:00           ` Denis Kirjanov
2014-06-12 13:29           ` Naoya Horiguchi
2014-06-12 14:39           ` Catalin Marinas
2014-06-12 14:39             ` Catalin Marinas
2014-06-13  7:12             ` Denis Kirjanov
2014-06-13  7:12               ` Denis Kirjanov
2014-06-13  8:56               ` Catalin Marinas
2014-06-13  8:56                 ` Catalin Marinas
2014-06-13  8:56                 ` Catalin Marinas
2014-06-13 10:26                 ` Denis Kirjanov
2014-06-13 10:26                   ` Denis Kirjanov
2014-06-13 10:26                   ` Denis Kirjanov
2014-06-16  2:40                   ` Michael Ellerman
2014-06-16  2:40                     ` Michael Ellerman
2014-06-13 21:44                 ` Benjamin Herrenschmidt
2014-06-13 21:44                   ` Benjamin Herrenschmidt
2014-06-14 12:05                   ` Catalin Marinas
2014-06-14 12:05                     ` Catalin Marinas
2014-06-14 12:05                     ` Catalin Marinas
     [not found]           ` <5399ab3b.4825e00a.60fd.5014SMTPIN_ADDED_BROKEN@mx.google.com>
2014-06-13  6:39             ` Denis Kirjanov
2014-06-13  6:39               ` Denis Kirjanov
