linux-rdma.vger.kernel.org archive mirror
* slab leak on rxe
@ 2020-02-11  7:09 Frank Huang
  2020-02-11  7:17 ` Frank Huang
  0 siblings, 1 reply; 5+ messages in thread
From: Frank Huang @ 2020-02-11  7:09 UTC
  To: linux-rdma

Hi all,

When I use an older version of rdma_rxe (kernel 4.14.97), there is a
slab leak of a qp. Is it fixed in the newest version? I searched the
commit history on kernel.org but did not find a matching issue.


Feb 11 14:17:31 57c4c63f-e817-4e22-aec9-72bc376b757c kernel:
=============================================================================
Feb 11 14:17:31 57c4c63f-e817-4e22-aec9-72bc376b757c kernel: BUG
rxe-qp (Tainted: G           OE  ): Objects remaining in rxe-qp on
__kmem_cache_shutdown()
Feb 11 14:17:31 57c4c63f-e817-4e22-aec9-72bc376b757c kernel:
-----------------------------------------------------------------------------
Feb 11 14:17:31 57c4c63f-e817-4e22-aec9-72bc376b757c kernel: Disabling
lock debugging due to kernel taint
Feb 11 14:17:31 57c4c63f-e817-4e22-aec9-72bc376b757c kernel: INFO:
Slab 0xfffff4c4b027a000 objects=16 used=1 fp=0xffff96f3c9e83f00
flags=0x17ffffc0008100
Feb 11 14:17:31 57c4c63f-e817-4e22-aec9-72bc376b757c kernel: CPU: 27
PID: 25588 Comm: rmmod Tainted: G    B      OE
4.14.97-.el7.centos.x86_64 #1
Feb 11 14:17:31 57c4c63f-e817-4e22-aec9-72bc376b757c kernel: Hardware
name: 80010056, BIOS 4.1.15 03/28/2017
Feb 11 14:17:31 57c4c63f-e817-4e22-aec9-72bc376b757c kernel: Call Trace:
Feb 11 14:17:31 57c4c63f-e817-4e22-aec9-72bc376b757c kernel:
dump_stack+0x5a/0x7b
Feb 11 14:17:31 57c4c63f-e817-4e22-aec9-72bc376b757c kernel:  slab_err+0xb4/0xe0
Feb 11 14:17:31 57c4c63f-e817-4e22-aec9-72bc376b757c kernel:  ?
calibrate_delay+0x138/0x5f0
Feb 11 14:17:31 57c4c63f-e817-4e22-aec9-72bc376b757c kernel:  ?
on_each_cpu_mask+0x27/0x60
Feb 11 14:17:31 57c4c63f-e817-4e22-aec9-72bc376b757c kernel:  ?
on_each_cpu_cond+0xaf/0x140
Feb 11 14:17:31 57c4c63f-e817-4e22-aec9-72bc376b757c kernel:  ?
__kmalloc+0x179/0x200
Feb 11 14:17:31 57c4c63f-e817-4e22-aec9-72bc376b757c kernel:  ?
__kmem_cache_shutdown+0x194/0x3d0
Feb 11 14:17:31 57c4c63f-e817-4e22-aec9-72bc376b757c kernel:
__kmem_cache_shutdown+0x1b4/0x3d0
Feb 11 14:17:31 57c4c63f-e817-4e22-aec9-72bc376b757c kernel:
shutdown_cache+0x13/0x1b0
Feb 11 14:17:31 57c4c63f-e817-4e22-aec9-72bc376b757c kernel:
kmem_cache_destroy+0x1e4/0x220
Feb 11 14:17:31 57c4c63f-e817-4e22-aec9-72bc376b757c kernel:
rxe_cache_clean+0x41/0x60 [rdma_rxe]
Feb 11 14:17:31 57c4c63f-e817-4e22-aec9-72bc376b757c kernel:
rxe_module_exit+0xf/0x68 [rdma_rxe]
Feb 11 14:17:31 57c4c63f-e817-4e22-aec9-72bc376b757c kernel:
SyS_delete_module+0x175/0x270
Feb 11 14:17:31 57c4c63f-e817-4e22-aec9-72bc376b757c kernel:
do_syscall_64+0x74/0x190
Feb 11 14:17:31 57c4c63f-e817-4e22-aec9-72bc376b757c kernel:
entry_SYSCALL_64_after_hwframe+0x3d/0xa2
Feb 11 14:17:31 57c4c63f-e817-4e22-aec9-72bc376b757c kernel: RIP:
0033:0x7ff146d3f517
Feb 11 14:17:31 57c4c63f-e817-4e22-aec9-72bc376b757c kernel: RSP:
002b:00007ffd4b5c1598 EFLAGS: 00000202 ORIG_RAX: 00000000000000b0
Feb 11 14:17:31 57c4c63f-e817-4e22-aec9-72bc376b757c kernel: RAX:
ffffffffffffffda RBX: 0000000000d78280 RCX: 00007ff146d3f517
Feb 11 14:17:31 57c4c63f-e817-4e22-aec9-72bc376b757c kernel: RDX:
00007ff146db3ca0 RSI: 0000000000000800 RDI: 0000000000d782e8
Feb 11 14:17:31 57c4c63f-e817-4e22-aec9-72bc376b757c kernel: RBP:
0000000000000000 R08: 00007ff147008060 R09: 00007ff146db3ca0
Feb 11 14:17:31 57c4c63f-e817-4e22-aec9-72bc376b757c kernel: R10:
00007ffd4b5c1020 R11: 0000000000000202 R12: 00007ffd4b5c36ca
Feb 11 14:17:31 57c4c63f-e817-4e22-aec9-72bc376b757c kernel: R13:
0000000000000000 R14: 0000000000d78280 R15: 0000000000d78010
Feb 11 14:17:31 57c4c63f-e817-4e22-aec9-72bc376b757c kernel: INFO:
Object 0xffff96f3c9e84ec0 @offset=20160
Feb 11 14:17:31 57c4c63f-e817-4e22-aec9-72bc376b757c kernel:
kmem_cache_destroy rxe-qp: Slab cache still has objects
Feb 11 14:17:31 57c4c63f-e817-4e22-aec9-72bc376b757c kernel: CPU: 27
PID: 25588 Comm: rmmod Tainted: G    B      OE
4.14.97-.el7.centos.x86_64 #1
Feb 11 14:17:31 57c4c63f-e817-4e22-aec9-72bc376b757c kernel: Hardware
name: 80010056, BIOS 4.1.15 03/28/2017
Feb 11 14:17:31 57c4c63f-e817-4e22-aec9-72bc376b757c kernel: Call Trace:
Feb 11 14:17:31 57c4c63f-e817-4e22-aec9-72bc376b757c kernel:
dump_stack+0x5a/0x7b
Feb 11 14:17:31 57c4c63f-e817-4e22-aec9-72bc376b757c kernel:
kmem_cache_destroy+0x203/0x220
Feb 11 14:17:31 57c4c63f-e817-4e22-aec9-72bc376b757c kernel:
rxe_cache_clean+0x41/0x60 [rdma_rxe]
Feb 11 14:17:31 57c4c63f-e817-4e22-aec9-72bc376b757c kernel:
rxe_module_exit+0xf/0x68 [rdma_rxe]
Feb 11 14:17:31 57c4c63f-e817-4e22-aec9-72bc376b757c kernel:
SyS_delete_module+0x175/0x270
Feb 11 14:17:31 57c4c63f-e817-4e22-aec9-72bc376b757c kernel:
do_syscall_64+0x74/0x190
Feb 11 14:17:31 57c4c63f-e817-4e22-aec9-72bc376b757c kernel:
entry_SYSCALL_64_after_hwframe+0x3d/0xa2
Feb 11 14:17:31 57c4c63f-e817-4e22-aec9-72bc376b757c kernel: RIP:
0033:0x7ff146d3f517
Feb 11 14:17:31 57c4c63f-e817-4e22-aec9-72bc376b757c kernel: RSP:
002b:00007ffd4b5c1598 EFLAGS: 00000202 ORIG_RAX: 00000000000000b0
Feb 11 14:17:31 57c4c63f-e817-4e22-aec9-72bc376b757c kernel: RAX:
ffffffffffffffda RBX: 0000000000d78280 RCX: 00007ff146d3f517
Feb 11 14:17:31 57c4c63f-e817-4e22-aec9-72bc376b757c kernel: RDX:
00007ff146db3ca0 RSI: 0000000000000800 RDI: 0000000000d782e8
Feb 11 14:17:31 57c4c63f-e817-4e22-aec9-72bc376b757c kernel: RBP:
0000000000000000 R08: 00007ff147008060 R09: 00007ff146db3ca0
Feb 11 14:17:31 57c4c63f-e817-4e22-aec9-72bc376b757c kernel: R10:
00007ffd4b5c1020 R11: 0000000000000202 R12: 00007ffd4b5c36ca
Feb 11 14:17:31 57c4c63f-e817-4e22-aec9-72bc376b757c kernel: R13:
0000000000000000 R14: 0000000000d78280 R15: 0000000000d78010


* Re: slab leak on rxe
  2020-02-11  7:09 slab leak on rxe Frank Huang
@ 2020-02-11  7:17 ` Frank Huang
  2020-02-11  7:41   ` Zhu Yanjun
  0 siblings, 1 reply; 5+ messages in thread
From: Frank Huang @ 2020-02-11  7:17 UTC
  To: linux-rdma

Re-posting the log; sorry for the formatting.

Feb 11 14:17:31  kernel: =============================================================================
Feb 11 14:17:31  kernel: BUG rxe-qp (Tainted: G           OE  ): Objects remaining in rxe-qp on __kmem_cache_shutdown()
Feb 11 14:17:31  kernel: -----------------------------------------------------------------------------
Feb 11 14:17:31  kernel: Disabling lock debugging due to kernel taint
Feb 11 14:17:31  kernel: INFO: Slab 0xfffff4c4b027a000 objects=16 used=1 fp=0xffff96f3c9e83f00 flags=0x17ffffc0008100
Feb 11 14:17:31  kernel: CPU: 27 PID: 25588 Comm: rmmod Tainted: G    B      OE   4.14.97-.el7.centos.x86_64 #1
Feb 11 14:17:31  kernel: Hardware name: /80010056, BIOS 4.1.15 03/28/2017
Feb 11 14:17:31  kernel: Call Trace:
Feb 11 14:17:31  kernel:  dump_stack+0x5a/0x7b
Feb 11 14:17:31  kernel:  slab_err+0xb4/0xe0
Feb 11 14:17:31  kernel:  ? calibrate_delay+0x138/0x5f0
Feb 11 14:17:31  kernel:  ? on_each_cpu_mask+0x27/0x60
Feb 11 14:17:31  kernel:  ? on_each_cpu_cond+0xaf/0x140
Feb 11 14:17:31  kernel:  ? __kmalloc+0x179/0x200
Feb 11 14:17:31  kernel:  ? __kmem_cache_shutdown+0x194/0x3d0
Feb 11 14:17:31  kernel:  __kmem_cache_shutdown+0x1b4/0x3d0
Feb 11 14:17:31  kernel:  shutdown_cache+0x13/0x1b0
Feb 11 14:17:31  kernel:  kmem_cache_destroy+0x1e4/0x220
Feb 11 14:17:31  kernel:  rxe_cache_clean+0x41/0x60 [rdma_rxe]
Feb 11 14:17:31  kernel:  rxe_module_exit+0xf/0x68 [rdma_rxe]
Feb 11 14:17:31  kernel:  SyS_delete_module+0x175/0x270
Feb 11 14:17:31  kernel:  do_syscall_64+0x74/0x190
Feb 11 14:17:31  kernel:  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
Feb 11 14:17:31  kernel: RIP: 0033:0x7ff146d3f517
Feb 11 14:17:31  kernel: RSP: 002b:00007ffd4b5c1598 EFLAGS: 00000202 ORIG_RAX: 00000000000000b0
Feb 11 14:17:31  kernel: RAX: ffffffffffffffda RBX: 0000000000d78280 RCX: 00007ff146d3f517
Feb 11 14:17:31  kernel: RDX: 00007ff146db3ca0 RSI: 0000000000000800 RDI: 0000000000d782e8
Feb 11 14:17:31  kernel: RBP: 0000000000000000 R08: 00007ff147008060 R09: 00007ff146db3ca0
Feb 11 14:17:31  kernel: R10: 00007ffd4b5c1020 R11: 0000000000000202 R12: 00007ffd4b5c36ca
Feb 11 14:17:31  kernel: R13: 0000000000000000 R14: 0000000000d78280 R15: 0000000000d78010
Feb 11 14:17:31  kernel: INFO: Object 0xffff96f3c9e84ec0 @offset=20160
Feb 11 14:17:31  kernel: kmem_cache_destroy rxe-qp: Slab cache still has objects
Feb 11 14:17:31  kernel: CPU: 27 PID: 25588 Comm: rmmod Tainted: G    B      OE   4.14.97-.el7.centos.x86_64 #1
Feb 11 14:17:31  kernel: Hardware name: /80010056, BIOS 4.1.15 03/28/2017
Feb 11 14:17:31  kernel: Call Trace:
Feb 11 14:17:31  kernel:  dump_stack+0x5a/0x7b
Feb 11 14:17:31  kernel:  kmem_cache_destroy+0x203/0x220
Feb 11 14:17:31  kernel:  rxe_cache_clean+0x41/0x60 [rdma_rxe]
Feb 11 14:17:31  kernel:  rxe_module_exit+0xf/0x68 [rdma_rxe]
Feb 11 14:17:31  kernel:  SyS_delete_module+0x175/0x270
Feb 11 14:17:31  kernel:  do_syscall_64+0x74/0x190
Feb 11 14:17:31  kernel:  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
Feb 11 14:17:31  kernel: RIP: 0033:0x7ff146d3f517
Feb 11 14:17:31  kernel: RSP: 002b:00007ffd4b5c1598 EFLAGS: 00000202 ORIG_RAX: 00000000000000b0
Feb 11 14:17:31  kernel: RAX: ffffffffffffffda RBX: 0000000000d78280 RCX: 00007ff146d3f517
Feb 11 14:17:31  kernel: RDX: 00007ff146db3ca0 RSI: 0000000000000800 RDI: 0000000000d782e8
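
For anyone triaging a similar report, the key fields in the log (cache
name, slab address, object counts) can be pulled out of a saved log
mechanically. Below is a minimal sketch in Python, assuming the kernel
log has been saved as plain text. The function name
summarize_slab_leak and both regular expressions are illustrative
assumptions, not part of any kernel or rdma tool.

```python
import re

# Matches the slab report line, e.g.
# "INFO: Slab 0x... objects=16 used=1 fp=0x... flags=0x..."
SLAB_RE = re.compile(r"INFO: Slab (0x[0-9a-f]+) objects=(\d+) used=(\d+)")
# Matches the header naming the cache that still holds objects.
CACHE_RE = re.compile(r"BUG (\S+) \(.*?\): Objects remaining in")


def summarize_slab_leak(log_text):
    """Return (cache_name, [(slab_addr, objects, used), ...])."""
    # Collapse all whitespace so fields that a mail client wrapped
    # onto the next line still match.
    flat = " ".join(log_text.split())
    cache = CACHE_RE.search(flat)
    slabs = [(m.group(1), int(m.group(2)), int(m.group(3)))
             for m in SLAB_RE.finditer(flat)]
    return (cache.group(1) if cache else None, slabs)


# Sample lines copied from the log in this message (wrapped form).
sample = """\
Feb 11 14:17:31  kernel: BUG rxe-qp (Tainted: G           OE  ):
Objects remaining in rxe-qp on __kmem_cache_shutdown()
Feb 11 14:17:31  kernel: INFO: Slab 0xfffff4c4b027a000 objects=16
used=1 fp=0xffff96f3c9e83f00 flags=0x17ffffc0008100
"""

name, slabs = summarize_slab_leak(sample)
for addr, objects, used in slabs:
    print(f"{name}: slab {addr} has {used} of {objects} objects still in use")
```

For the log above this reports one object still allocated in the
rxe-qp cache at the time of rmmod.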



* Re: slab leak on rxe
  2020-02-11  7:17 ` Frank Huang
@ 2020-02-11  7:41   ` Zhu Yanjun
  2020-02-11  8:33     ` Frank Huang
  0 siblings, 1 reply; 5+ messages in thread
From: Zhu Yanjun @ 2020-02-11  7:41 UTC
  To: Frank Huang; +Cc: linux-rdma

Can this bug be reproduced?

Zhu Yanjun



* Re: slab leak on rxe
  2020-02-11  7:41   ` Zhu Yanjun
@ 2020-02-11  8:33     ` Frank Huang
  2020-02-12 13:07       ` Zhu Yanjun
  0 siblings, 1 reply; 5+ messages in thread
From: Frank Huang @ 2020-02-11  8:33 UTC
  To: Zhu Yanjun; +Cc: linux-rdma

This is the first time I have hit this bug, and I have not found what
triggers it yet.

In some situations we kill the process with kill -9. Could that cause it?

Before this happened, there were some error reports:

Feb 11 04:24:55  kernel: rdma_rxe: no qp matches qpn 0x31f5
Feb 11 04:24:55  kernel: rdma_rxe: no qp matches qpn 0x31f5
Feb 11 04:24:55  kernel: rdma_rxe: no qp matches qpn 0x31f5
Feb 11 04:24:55  kernel: rdma_rxe: no qp matches qpn 0x31f5
Feb 11 04:24:55  kernel: rdma_rxe: no qp matches qpn 0x31f5
Feb 11 04:24:55  kernel: rdma_rxe: no qp matches qpn 0x31f5
Feb 11 04:24:55  kernel: rdma_rxe: no qp matches qpn 0x31f5
Feb 11 04:24:55  kernel: rdma_rxe: no qp matches qpn 0x31f5
Feb 11 04:24:55  kernel: rdma_rxe: no qp matches qpn 0x31f5
Feb 11 04:24:55  kernel: rdma_rxe: no qp matches qpn 0x31f5
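
To check whether these messages all refer to the same queue pair, the
log can be tallied per QPN. This is only a sketch in Python, assuming
the messages keep the "rdma_rxe: no qp matches qpn 0x..." form shown
above. The helper name count_unmatched_qpns is a hypothetical name
chosen for illustration.

```python
import re
from collections import Counter

# Message format as it appears in the log lines above.
QPN_RE = re.compile(r"rdma_rxe: no qp matches qpn (0x[0-9a-fA-F]+)")


def count_unmatched_qpns(lines):
    """Count 'no qp matches qpn' messages, keyed by the numeric QPN."""
    counts = Counter()
    for line in lines:
        m = QPN_RE.search(line)
        if m:
            counts[int(m.group(1), 16)] += 1
    return counts


# The ten identical lines from this message.
sample = ["Feb 11 04:24:55  kernel: rdma_rxe: no qp matches qpn 0x31f5"] * 10

counts = count_unmatched_qpns(sample)
for qpn, n in counts.most_common():
    print(f"qpn {qpn:#x}: {n} messages")
```

Run over a full log, this would show whether a single QPN or many are
involved. It says nothing by itself about whether that QPN belongs to
the leaked object.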

On Tue, Feb 11, 2020 at 3:42 PM Zhu Yanjun <zyjzyj2000@gmail.com> wrote:
>
> Can this bug be reproduced?
>
> Zhu Yanjun
>
> On Tue, Feb 11, 2020 at 3:32 PM Frank Huang <tigerinxm@gmail.com> wrote:
> >
> > Re-post the log , sorry for the format.
> >
> > Feb 11 14:17:31  kernel:
> > =============================================================================
> > Feb 11 14:17:31  kernel: BUG rxe-qp (Tainted: G           OE  ):
> > Objects remaining in rxe-qp on __kmem_cache_shutdown()
> > Feb 11 14:17:31  kernel:
> > -----------------------------------------------------------------------------
> > Feb 11 14:17:31  kernel: Disabling lock debugging due to kernel taint
> > Feb 11 14:17:31  kernel: INFO: Slab 0xfffff4c4b027a000 objects=16
> > used=1 fp=0xffff96f3c9e83f00 flags=0x17ffffc0008100
> > Feb 11 14:17:31  kernel: CPU: 27 PID: 25588 Comm: rmmod Tainted: G
> > B      OE   4.14.97-.el7.centos.x86_64 #1
> > Feb 11 14:17:31  kernel: Hardware name: /80010056, BIOS 4.1.15 03/28/2017
> > Feb 11 14:17:31  kernel: Call Trace:
> > Feb 11 14:17:31  kernel:  dump_stack+0x5a/0x7b
> > Feb 11 14:17:31  kernel:  slab_err+0xb4/0xe0
> > Feb 11 14:17:31  kernel:  ? calibrate_delay+0x138/0x5f0
> > Feb 11 14:17:31  kernel:  ? on_each_cpu_mask+0x27/0x60
> > Feb 11 14:17:31  kernel:  ? on_each_cpu_cond+0xaf/0x140
> > Feb 11 14:17:31  kernel:  ? __kmalloc+0x179/0x200
> > Feb 11 14:17:31  kernel:  ? __kmem_cache_shutdown+0x194/0x3d0
> > Feb 11 14:17:31  kernel:  __kmem_cache_shutdown+0x1b4/0x3d0
> > Feb 11 14:17:31  kernel:  shutdown_cache+0x13/0x1b0
> > Feb 11 14:17:31  kernel:  kmem_cache_destroy+0x1e4/0x220
> > Feb 11 14:17:31  kernel:  rxe_cache_clean+0x41/0x60 [rdma_rxe]
> > Feb 11 14:17:31  kernel:  rxe_module_exit+0xf/0x68 [rdma_rxe]
> > Feb 11 14:17:31  kernel:  SyS_delete_module+0x175/0x270
> > Feb 11 14:17:31  kernel:  do_syscall_64+0x74/0x190
> > Feb 11 14:17:31  kernel:  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> > Feb 11 14:17:31  kernel: RIP: 0033:0x7ff146d3f517
> > Feb 11 14:17:31  kernel: RSP: 002b:00007ffd4b5c1598 EFLAGS: 00000202
> > ORIG_RAX: 00000000000000b0
> > Feb 11 14:17:31  kernel: RAX: ffffffffffffffda RBX: 0000000000d78280
> > RCX: 00007ff146d3f517
> > Feb 11 14:17:31  kernel: RDX: 00007ff146db3ca0 RSI: 0000000000000800
> > RDI: 0000000000d782e8
> > Feb 11 14:17:31  kernel: RBP: 0000000000000000 R08: 00007ff147008060
> > R09: 00007ff146db3ca0
> > Feb 11 14:17:31  kernel: R10: 00007ffd4b5c1020 R11: 0000000000000202
> > R12: 00007ffd4b5c36ca
> > Feb 11 14:17:31  kernel: R13: 0000000000000000 R14: 0000000000d78280
> > R15: 0000000000d78010
> > Feb 11 14:17:31  kernel: INFO: Object 0xffff96f3c9e84ec0 @offset=20160
> > Feb 11 14:17:31  kernel: kmem_cache_destroy rxe-qp: Slab cache still has objects
> > Feb 11 14:17:31  kernel: CPU: 27 PID: 25588 Comm: rmmod Tainted: G
> > B      OE   4.14.97-.el7.centos.x86_64 #1
> > Feb 11 14:17:31  kernel: Hardware name: /80010056, BIOS 4.1.15 03/28/2017
> > Feb 11 14:17:31  kernel: Call Trace:
> > Feb 11 14:17:31  kernel:  dump_stack+0x5a/0x7b
> > Feb 11 14:17:31  kernel:  kmem_cache_destroy+0x203/0x220
> > Feb 11 14:17:31  kernel:  rxe_cache_clean+0x41/0x60 [rdma_rxe]
> > Feb 11 14:17:31  kernel:  rxe_module_exit+0xf/0x68 [rdma_rxe]
> > Feb 11 14:17:31  kernel:  SyS_delete_module+0x175/0x270
> > Feb 11 14:17:31  kernel:  do_syscall_64+0x74/0x190
> > Feb 11 14:17:31  kernel:  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> > Feb 11 14:17:31  kernel: RIP: 0033:0x7ff146d3f517
> > Feb 11 14:17:31  kernel: RSP: 002b:00007ffd4b5c1598 EFLAGS: 00000202
> > ORIG_RAX: 00000000000000b0
> > Feb 11 14:17:31  kernel: RAX: ffffffffffffffda RBX: 0000000000d78280
> > RCX: 00007ff146d3f517
> > Feb 11 14:17:31  kernel: RDX: 00007ff146db3ca0 RSI: 0000000000000800
> > RDI: 0000000000d782e8
> >


* Re: slab leak on rxe
  2020-02-11  8:33     ` Frank Huang
@ 2020-02-12 13:07       ` Zhu Yanjun
  0 siblings, 0 replies; 5+ messages in thread
From: Zhu Yanjun @ 2020-02-12 13:07 UTC (permalink / raw)
  To: Frank Huang; +Cc: linux-rdma

In kernel 4.14.97, the function rxe_cache_clean does not exist.
It was introduced by the following commit:
"
commit 6db21d8986e14e2e86573a3b055b05296188bd2c
Author: Yuval Shaia <yuval.shaia@oracle.com>
Date:   Sun Dec 9 15:53:49 2018 +0200

    IB/rxe: Fix incorrect cache cleanup in error flow

    Array iterator stays at the same slot, fix it.

    Fixes: 8700e3e7c485 ("Soft RoCE driver")
    Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
    Reviewed-by: Bart Van Assche <bvanassche@acm.org>
    Reviewed-by: Zhu Yanjun <yanjun.zhu@oracle.com>
    Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
    Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
"
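
The commit message quoted above says only that the array iterator stays at
the same slot during error cleanup. As a rough illustration of that class of
bug, here is a minimal userspace sketch, not the actual kernel patch. All
names (type_table, init_caches, unwind_caches, live_caches) are hypothetical,
and malloc()/free() stand in for cache creation and destruction. The point is
that the unwind loop re-derives the entry pointer from the index on every
pass, so each slot that was set up gets released exactly once.

```c
#include <stddef.h>
#include <stdlib.h>

#define NUM_TYPES 4

/* Hypothetical stand-in for a per-object-type descriptor. */
struct type_info {
	void *cache;	/* stand-in for a cache handle */
};

static struct type_info type_table[NUM_TYPES];

/*
 * Release the caches in slots [0, count), newest first.
 * The entry pointer is recomputed from i on every iteration. If it were
 * computed once outside the loop, the loop would keep touching one slot
 * and leave the others allocated.
 */
static void unwind_caches(int count)
{
	int i = count;

	while (--i >= 0) {
		struct type_info *type = &type_table[i];

		free(type->cache);
		type->cache = NULL;
	}
}

/*
 * Set up one cache per slot. fail_at simulates an allocation failure at
 * that slot (pass -1 for no failure). Returns 0 on success, or -1 after
 * unwinding everything that was set up before the failure.
 */
static int init_caches(int fail_at)
{
	int i;

	for (i = 0; i < NUM_TYPES; i++) {
		struct type_info *type = &type_table[i];

		type->cache = (i == fail_at) ? NULL : malloc(64);
		if (!type->cache) {
			unwind_caches(i);
			return -1;
		}
	}
	return 0;
}

/* Count the slots that still hold a cache; 0 means nothing leaked. */
static int live_caches(void)
{
	int i, n = 0;

	for (i = 0; i < NUM_TYPES; i++)
		if (type_table[i].cache)
			n++;
	return n;
}
```

Calling init_caches(2) and then live_caches() is one way to check that a
failure partway through leaves no slot allocated.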

On Tue, Feb 11, 2020 at 4:33 PM Frank Huang <tigerinxm@gmail.com> wrote:
>
> This is the first time I have hit this bug, and I haven't found what triggers it yet.
>
> In some situations we kill the process with kill -9. Could that cause it?
>
> Before this happens, some errors are reported:
>
> Feb 11 04:24:55  kernel: rdma_rxe: no qp matches qpn 0x31f5
> Feb 11 04:24:55  kernel: rdma_rxe: no qp matches qpn 0x31f5
> Feb 11 04:24:55  kernel: rdma_rxe: no qp matches qpn 0x31f5
> Feb 11 04:24:55  kernel: rdma_rxe: no qp matches qpn 0x31f5
> Feb 11 04:24:55  kernel: rdma_rxe: no qp matches qpn 0x31f5
> Feb 11 04:24:55  kernel: rdma_rxe: no qp matches qpn 0x31f5
> Feb 11 04:24:55  kernel: rdma_rxe: no qp matches qpn 0x31f5
> Feb 11 04:24:55  kernel: rdma_rxe: no qp matches qpn 0x31f5
> Feb 11 04:24:55  kernel: rdma_rxe: no qp matches qpn 0x31f5
> Feb 11 04:24:55  kernel: rdma_rxe: no qp matches qpn 0x31f5
>
> On Tue, Feb 11, 2020 at 3:42 PM Zhu Yanjun <zyjzyj2000@gmail.com> wrote:
> >
> > Can this bug be reproduced?
> >
> > Zhu Yanjun
> >
> > On Tue, Feb 11, 2020 at 3:32 PM Frank Huang <tigerinxm@gmail.com> wrote:
> > >
> > > Re-posting the log; sorry for the formatting.
> > >
> > > Feb 11 14:17:31  kernel:
> > > =============================================================================
> > > Feb 11 14:17:31  kernel: BUG rxe-qp (Tainted: G           OE  ):
> > > Objects remaining in rxe-qp on __kmem_cache_shutdown()
> > > Feb 11 14:17:31  kernel:
> > > -----------------------------------------------------------------------------
> > > Feb 11 14:17:31  kernel: Disabling lock debugging due to kernel taint
> > > Feb 11 14:17:31  kernel: INFO: Slab 0xfffff4c4b027a000 objects=16
> > > used=1 fp=0xffff96f3c9e83f00 flags=0x17ffffc0008100
> > > Feb 11 14:17:31  kernel: CPU: 27 PID: 25588 Comm: rmmod Tainted: G
> > > B      OE   4.14.97-.el7.centos.x86_64 #1
> > > Feb 11 14:17:31  kernel: Hardware name: /80010056, BIOS 4.1.15 03/28/2017
> > > Feb 11 14:17:31  kernel: Call Trace:
> > > Feb 11 14:17:31  kernel:  dump_stack+0x5a/0x7b
> > > Feb 11 14:17:31  kernel:  slab_err+0xb4/0xe0
> > > Feb 11 14:17:31  kernel:  ? calibrate_delay+0x138/0x5f0
> > > Feb 11 14:17:31  kernel:  ? on_each_cpu_mask+0x27/0x60
> > > Feb 11 14:17:31  kernel:  ? on_each_cpu_cond+0xaf/0x140
> > > Feb 11 14:17:31  kernel:  ? __kmalloc+0x179/0x200
> > > Feb 11 14:17:31  kernel:  ? __kmem_cache_shutdown+0x194/0x3d0
> > > Feb 11 14:17:31  kernel:  __kmem_cache_shutdown+0x1b4/0x3d0
> > > Feb 11 14:17:31  kernel:  shutdown_cache+0x13/0x1b0
> > > Feb 11 14:17:31  kernel:  kmem_cache_destroy+0x1e4/0x220
> > > Feb 11 14:17:31  kernel:  rxe_cache_clean+0x41/0x60 [rdma_rxe]
> > > Feb 11 14:17:31  kernel:  rxe_module_exit+0xf/0x68 [rdma_rxe]
> > > Feb 11 14:17:31  kernel:  SyS_delete_module+0x175/0x270
> > > Feb 11 14:17:31  kernel:  do_syscall_64+0x74/0x190
> > > Feb 11 14:17:31  kernel:  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> > > Feb 11 14:17:31  kernel: RIP: 0033:0x7ff146d3f517
> > > Feb 11 14:17:31  kernel: RSP: 002b:00007ffd4b5c1598 EFLAGS: 00000202
> > > ORIG_RAX: 00000000000000b0
> > > Feb 11 14:17:31  kernel: RAX: ffffffffffffffda RBX: 0000000000d78280
> > > RCX: 00007ff146d3f517
> > > Feb 11 14:17:31  kernel: RDX: 00007ff146db3ca0 RSI: 0000000000000800
> > > RDI: 0000000000d782e8
> > > Feb 11 14:17:31  kernel: RBP: 0000000000000000 R08: 00007ff147008060
> > > R09: 00007ff146db3ca0
> > > Feb 11 14:17:31  kernel: R10: 00007ffd4b5c1020 R11: 0000000000000202
> > > R12: 00007ffd4b5c36ca
> > > Feb 11 14:17:31  kernel: R13: 0000000000000000 R14: 0000000000d78280
> > > R15: 0000000000d78010
> > > Feb 11 14:17:31  kernel: INFO: Object 0xffff96f3c9e84ec0 @offset=20160
> > > Feb 11 14:17:31  kernel: kmem_cache_destroy rxe-qp: Slab cache still has objects
> > > Feb 11 14:17:31  kernel: CPU: 27 PID: 25588 Comm: rmmod Tainted: G
> > > B      OE   4.14.97-.el7.centos.x86_64 #1
> > > Feb 11 14:17:31  kernel: Hardware name: /80010056, BIOS 4.1.15 03/28/2017
> > > Feb 11 14:17:31  kernel: Call Trace:
> > > Feb 11 14:17:31  kernel:  dump_stack+0x5a/0x7b
> > > Feb 11 14:17:31  kernel:  kmem_cache_destroy+0x203/0x220
> > > Feb 11 14:17:31  kernel:  rxe_cache_clean+0x41/0x60 [rdma_rxe]
> > > Feb 11 14:17:31  kernel:  rxe_module_exit+0xf/0x68 [rdma_rxe]
> > > Feb 11 14:17:31  kernel:  SyS_delete_module+0x175/0x270
> > > Feb 11 14:17:31  kernel:  do_syscall_64+0x74/0x190
> > > Feb 11 14:17:31  kernel:  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> > > Feb 11 14:17:31  kernel: RIP: 0033:0x7ff146d3f517
> > > Feb 11 14:17:31  kernel: RSP: 002b:00007ffd4b5c1598 EFLAGS: 00000202
> > > ORIG_RAX: 00000000000000b0
> > > Feb 11 14:17:31  kernel: RAX: ffffffffffffffda RBX: 0000000000d78280
> > > RCX: 00007ff146d3f517
> > > Feb 11 14:17:31  kernel: RDX: 00007ff146db3ca0 RSI: 0000000000000800
> > > RDI: 0000000000d782e8


Thread overview: 5+ messages
2020-02-11  7:09 slab leak on rxe Frank Huang
2020-02-11  7:17 ` Frank Huang
2020-02-11  7:41   ` Zhu Yanjun
2020-02-11  8:33     ` Frank Huang
2020-02-12 13:07       ` Zhu Yanjun
