Linux-RDMA Archive on lore.kernel.org
* qedr memory leak report
@ 2019-08-30 18:03 Chuck Lever
  2019-08-30 18:27 ` Chuck Lever
  0 siblings, 1 reply; 9+ messages in thread
From: Chuck Lever @ 2019-08-30 18:03 UTC (permalink / raw)
  To: Michal Kalderon; +Cc: linux-rdma

Hi Michal-

In the middle of some other testing, I got this kmemleak report
while testing with FastLinq cards in iWARP mode:

unreferenced object 0xffff888458923340 (size 32):
  comm "mount.nfs", pid 2294, jiffies 4298338848 (age 1144.337s)
  hex dump (first 32 bytes):
    20 1d 69 63 88 88 ff ff 20 1d 69 63 88 88 ff ff   .ic.... .ic....
    00 60 7a 69 84 88 ff ff 00 60 82 f9 00 00 00 00  .`zi.....`......
  backtrace:
    [<000000000df5bfed>] __kmalloc+0x128/0x176
    [<0000000020724641>] qedr_alloc_pbl_tbl.constprop.44+0x3c/0x121 [qedr]
    [<00000000a361c591>] init_mr_info.constprop.41+0xaf/0x21f [qedr]
    [<00000000e8049714>] qedr_alloc_mr+0x95/0x2c1 [qedr]
    [<000000000e6102bc>] ib_alloc_mr_user+0x31/0x96 [ib_core]
    [<00000000d254a9fb>] frwr_init_mr+0x23/0x121 [rpcrdma]
    [<00000000a0364e35>] rpcrdma_mrs_create+0x45/0xea [rpcrdma]
    [<00000000fd6bf282>] rpcrdma_buffer_create+0x9e/0x1c9 [rpcrdma]
    [<00000000be3a1eba>] xprt_setup_rdma+0x109/0x279 [rpcrdma]
    [<00000000b736b88f>] xprt_create_transport+0x39/0x19a [sunrpc]
    [<000000001024e4dc>] rpc_create+0x118/0x1ab [sunrpc]
    [<00000000cca43a49>] nfs_create_rpc_client+0xf8/0x15f [nfs]
    [<00000000073c962c>] nfs_init_client+0x1a/0x3b [nfs]
    [<00000000b03964c4>] nfs_init_server+0xc1/0x212 [nfs]
    [<000000001c71f609>] nfs_create_server+0x74/0x1a4 [nfs]
    [<000000004dc919a1>] nfs3_create_server+0xb/0x25 [nfsv3]

It's repeated many times.

The workload was an unremarkable software build and regression test
suite on an NFSv3 mount with RDMA.


--
Chuck Lever

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: qedr memory leak report
  2019-08-30 18:03 qedr memory leak report Chuck Lever
@ 2019-08-30 18:27 ` Chuck Lever
  2019-08-31  7:30   ` Leon Romanovsky
  2019-09-02  7:53   ` [EXT] " Michal Kalderon
  0 siblings, 2 replies; 9+ messages in thread
From: Chuck Lever @ 2019-08-30 18:27 UTC (permalink / raw)
  To: Michal Kalderon; +Cc: linux-rdma


> On Aug 30, 2019, at 2:03 PM, Chuck Lever <chuck.lever@oracle.com> wrote:
> 
> Hi Michal-
> 
> In the middle of some other testing, I got this kmemleak report
> while testing with FastLinq cards in iWARP mode:
> 
> unreferenced object 0xffff888458923340 (size 32):
>  comm "mount.nfs", pid 2294, jiffies 4298338848 (age 1144.337s)
>  hex dump (first 32 bytes):
>    20 1d 69 63 88 88 ff ff 20 1d 69 63 88 88 ff ff   .ic.... .ic....
>    00 60 7a 69 84 88 ff ff 00 60 82 f9 00 00 00 00  .`zi.....`......
>  backtrace:
>    [<000000000df5bfed>] __kmalloc+0x128/0x176
>    [<0000000020724641>] qedr_alloc_pbl_tbl.constprop.44+0x3c/0x121 [qedr]
>    [<00000000a361c591>] init_mr_info.constprop.41+0xaf/0x21f [qedr]
>    [<00000000e8049714>] qedr_alloc_mr+0x95/0x2c1 [qedr]
>    [<000000000e6102bc>] ib_alloc_mr_user+0x31/0x96 [ib_core]
>    [<00000000d254a9fb>] frwr_init_mr+0x23/0x121 [rpcrdma]
>    [<00000000a0364e35>] rpcrdma_mrs_create+0x45/0xea [rpcrdma]
>    [<00000000fd6bf282>] rpcrdma_buffer_create+0x9e/0x1c9 [rpcrdma]
>    [<00000000be3a1eba>] xprt_setup_rdma+0x109/0x279 [rpcrdma]
>    [<00000000b736b88f>] xprt_create_transport+0x39/0x19a [sunrpc]
>    [<000000001024e4dc>] rpc_create+0x118/0x1ab [sunrpc]
>    [<00000000cca43a49>] nfs_create_rpc_client+0xf8/0x15f [nfs]
>    [<00000000073c962c>] nfs_init_client+0x1a/0x3b [nfs]
>    [<00000000b03964c4>] nfs_init_server+0xc1/0x212 [nfs]
>    [<000000001c71f609>] nfs_create_server+0x74/0x1a4 [nfs]
>    [<000000004dc919a1>] nfs3_create_server+0xb/0x25 [nfsv3]
> 
> It's repeated many times.
> 
> The workload was an unremarkable software build and regression test
> suite on an NFSv3 mount with RDMA.

Also seeing one of these per NFS mount:

unreferenced object 0xffff888869f39b40 (size 64):
  comm "kworker/u28:0", pid 17569, jiffies 4299267916 (age 1592.907s)
  hex dump (first 32 bytes):
    00 80 53 6d 88 88 ff ff 00 00 00 00 00 00 00 00  ..Sm............
    00 48 e2 66 84 88 ff ff 00 00 00 00 00 00 00 00  .H.f............
  backtrace:
    [<0000000063e652dd>] kmem_cache_alloc_trace+0xed/0x133
    [<0000000083b1e912>] qedr_iw_connect+0xf9/0x3c8 [qedr]
    [<00000000553be951>] iw_cm_connect+0xd0/0x157 [iw_cm]
    [<00000000b086730c>] rdma_connect+0x54e/0x5b0 [rdma_cm]
    [<00000000d8af3cf2>] rpcrdma_ep_connect+0x22b/0x360 [rpcrdma]
    [<000000006a413c8d>] xprt_rdma_connect_worker+0x24/0x88 [rpcrdma]
    [<000000001c5b049a>] process_one_work+0x196/0x2c6
    [<000000007e3403ba>] worker_thread+0x1ad/0x261
    [<000000001daaa973>] kthread+0xf4/0xf9
    [<0000000014987b31>] ret_from_fork+0x24/0x30

Looks like this one is not being freed:

514         ep = kzalloc(sizeof(*ep), GFP_KERNEL);
515         if (!ep)
516                 return -ENOMEM;


--
Chuck Lever





* Re: qedr memory leak report
  2019-08-30 18:27 ` Chuck Lever
@ 2019-08-31  7:30   ` Leon Romanovsky
  2019-08-31 14:33     ` Doug Ledford
  2019-09-02  7:53   ` [EXT] " Michal Kalderon
  1 sibling, 1 reply; 9+ messages in thread
From: Leon Romanovsky @ 2019-08-31  7:30 UTC (permalink / raw)
  To: Doug Ledford; +Cc: Chuck Lever, Michal Kalderon, linux-rdma

Doug,

I think this can be counted as a good example of why allowing memory
leaks in drivers (HNS) is not such a great idea.

Thanks

On Fri, Aug 30, 2019 at 02:27:49PM -0400, Chuck Lever wrote:
>
> > On Aug 30, 2019, at 2:03 PM, Chuck Lever <chuck.lever@oracle.com> wrote:
> >
> > Hi Michal-
> >
> > In the middle of some other testing, I got this kmemleak report
> > while testing with FastLinq cards in iWARP mode:
> >
> > unreferenced object 0xffff888458923340 (size 32):
> >  comm "mount.nfs", pid 2294, jiffies 4298338848 (age 1144.337s)
> >  hex dump (first 32 bytes):
> >    20 1d 69 63 88 88 ff ff 20 1d 69 63 88 88 ff ff   .ic.... .ic....
> >    00 60 7a 69 84 88 ff ff 00 60 82 f9 00 00 00 00  .`zi.....`......
> >  backtrace:
> >    [<000000000df5bfed>] __kmalloc+0x128/0x176
> >    [<0000000020724641>] qedr_alloc_pbl_tbl.constprop.44+0x3c/0x121 [qedr]
> >    [<00000000a361c591>] init_mr_info.constprop.41+0xaf/0x21f [qedr]
> >    [<00000000e8049714>] qedr_alloc_mr+0x95/0x2c1 [qedr]
> >    [<000000000e6102bc>] ib_alloc_mr_user+0x31/0x96 [ib_core]
> >    [<00000000d254a9fb>] frwr_init_mr+0x23/0x121 [rpcrdma]
> >    [<00000000a0364e35>] rpcrdma_mrs_create+0x45/0xea [rpcrdma]
> >    [<00000000fd6bf282>] rpcrdma_buffer_create+0x9e/0x1c9 [rpcrdma]
> >    [<00000000be3a1eba>] xprt_setup_rdma+0x109/0x279 [rpcrdma]
> >    [<00000000b736b88f>] xprt_create_transport+0x39/0x19a [sunrpc]
> >    [<000000001024e4dc>] rpc_create+0x118/0x1ab [sunrpc]
> >    [<00000000cca43a49>] nfs_create_rpc_client+0xf8/0x15f [nfs]
> >    [<00000000073c962c>] nfs_init_client+0x1a/0x3b [nfs]
> >    [<00000000b03964c4>] nfs_init_server+0xc1/0x212 [nfs]
> >    [<000000001c71f609>] nfs_create_server+0x74/0x1a4 [nfs]
> >    [<000000004dc919a1>] nfs3_create_server+0xb/0x25 [nfsv3]
> >
> > It's repeated many times.
> >
> > The workload was an unremarkable software build and regression test
> > suite on an NFSv3 mount with RDMA.
>
> Also seeing one of these per NFS mount:
>
> unreferenced object 0xffff888869f39b40 (size 64):
>   comm "kworker/u28:0", pid 17569, jiffies 4299267916 (age 1592.907s)
>   hex dump (first 32 bytes):
>     00 80 53 6d 88 88 ff ff 00 00 00 00 00 00 00 00  ..Sm............
>     00 48 e2 66 84 88 ff ff 00 00 00 00 00 00 00 00  .H.f............
>   backtrace:
>     [<0000000063e652dd>] kmem_cache_alloc_trace+0xed/0x133
>     [<0000000083b1e912>] qedr_iw_connect+0xf9/0x3c8 [qedr]
>     [<00000000553be951>] iw_cm_connect+0xd0/0x157 [iw_cm]
>     [<00000000b086730c>] rdma_connect+0x54e/0x5b0 [rdma_cm]
>     [<00000000d8af3cf2>] rpcrdma_ep_connect+0x22b/0x360 [rpcrdma]
>     [<000000006a413c8d>] xprt_rdma_connect_worker+0x24/0x88 [rpcrdma]
>     [<000000001c5b049a>] process_one_work+0x196/0x2c6
>     [<000000007e3403ba>] worker_thread+0x1ad/0x261
>     [<000000001daaa973>] kthread+0xf4/0xf9
>     [<0000000014987b31>] ret_from_fork+0x24/0x30
>
> Looks like this one is not being freed:
>
> 514         ep = kzalloc(sizeof(*ep), GFP_KERNEL);
> 515         if (!ep)
> 516                 return -ENOMEM;
>
>
> --
> Chuck Lever
>
>
>


* Re: qedr memory leak report
  2019-08-31  7:30   ` Leon Romanovsky
@ 2019-08-31 14:33     ` Doug Ledford
  2019-08-31 15:19       ` Leon Romanovsky
  0 siblings, 1 reply; 9+ messages in thread
From: Doug Ledford @ 2019-08-31 14:33 UTC (permalink / raw)
  To: Leon Romanovsky; +Cc: Chuck Lever, Michal Kalderon, linux-rdma


On Sat, 2019-08-31 at 10:30 +0300, Leon Romanovsky wrote:
> Doug,
> 
> I think that it can be counted as good example why allowing memory
> leaks in drivers (HNS) is not so great idea.

Crashing the machine is worse.

> Thanks
> 
> On Fri, Aug 30, 2019 at 02:27:49PM -0400, Chuck Lever wrote:
> > > On Aug 30, 2019, at 2:03 PM, Chuck Lever <chuck.lever@oracle.com>
> > > wrote:
> > > 
> > > Hi Michal-
> > > 
> > > In the middle of some other testing, I got this kmemleak report
> > > while testing with FastLinq cards in iWARP mode:
> > > 
> > > unreferenced object 0xffff888458923340 (size 32):
> > >  comm "mount.nfs", pid 2294, jiffies 4298338848 (age 1144.337s)
> > >  hex dump (first 32 bytes):
> > >    20 1d 69 63 88 88 ff ff 20 1d 69 63 88 88 ff ff   .ic.... .ic....
> > >    00 60 7a 69 84 88 ff ff 00 60 82 f9 00 00 00 00  .`zi.....`......
> > >  backtrace:
> > >    [<000000000df5bfed>] __kmalloc+0x128/0x176
> > >    [<0000000020724641>] qedr_alloc_pbl_tbl.constprop.44+0x3c/0x121 [qedr]
> > >    [<00000000a361c591>] init_mr_info.constprop.41+0xaf/0x21f [qedr]
> > >    [<00000000e8049714>] qedr_alloc_mr+0x95/0x2c1 [qedr]
> > >    [<000000000e6102bc>] ib_alloc_mr_user+0x31/0x96 [ib_core]
> > >    [<00000000d254a9fb>] frwr_init_mr+0x23/0x121 [rpcrdma]
> > >    [<00000000a0364e35>] rpcrdma_mrs_create+0x45/0xea [rpcrdma]
> > >    [<00000000fd6bf282>] rpcrdma_buffer_create+0x9e/0x1c9 [rpcrdma]
> > >    [<00000000be3a1eba>] xprt_setup_rdma+0x109/0x279 [rpcrdma]
> > >    [<00000000b736b88f>] xprt_create_transport+0x39/0x19a [sunrpc]
> > >    [<000000001024e4dc>] rpc_create+0x118/0x1ab [sunrpc]
> > >    [<00000000cca43a49>] nfs_create_rpc_client+0xf8/0x15f [nfs]
> > >    [<00000000073c962c>] nfs_init_client+0x1a/0x3b [nfs]
> > >    [<00000000b03964c4>] nfs_init_server+0xc1/0x212 [nfs]
> > >    [<000000001c71f609>] nfs_create_server+0x74/0x1a4 [nfs]
> > >    [<000000004dc919a1>] nfs3_create_server+0xb/0x25 [nfsv3]
> > > 
> > > It's repeated many times.
> > > 
> > > The workload was an unremarkable software build and regression test
> > > suite on an NFSv3 mount with RDMA.
> > 
> > Also seeing one of these per NFS mount:
> > 
> > unreferenced object 0xffff888869f39b40 (size 64):
> >   comm "kworker/u28:0", pid 17569, jiffies 4299267916 (age 1592.907s)
> >   hex dump (first 32 bytes):
> >     00 80 53 6d 88 88 ff ff 00 00 00 00 00 00 00 00  ..Sm............
> >     00 48 e2 66 84 88 ff ff 00 00 00 00 00 00 00 00  .H.f............
> >   backtrace:
> >     [<0000000063e652dd>] kmem_cache_alloc_trace+0xed/0x133
> >     [<0000000083b1e912>] qedr_iw_connect+0xf9/0x3c8 [qedr]
> >     [<00000000553be951>] iw_cm_connect+0xd0/0x157 [iw_cm]
> >     [<00000000b086730c>] rdma_connect+0x54e/0x5b0 [rdma_cm]
> >     [<00000000d8af3cf2>] rpcrdma_ep_connect+0x22b/0x360 [rpcrdma]
> >     [<000000006a413c8d>] xprt_rdma_connect_worker+0x24/0x88 [rpcrdma]
> >     [<000000001c5b049a>] process_one_work+0x196/0x2c6
> >     [<000000007e3403ba>] worker_thread+0x1ad/0x261
> >     [<000000001daaa973>] kthread+0xf4/0xf9
> >     [<0000000014987b31>] ret_from_fork+0x24/0x30
> > 
> > Looks like this one is not being freed:
> > 
> > 514         ep = kzalloc(sizeof(*ep), GFP_KERNEL);
> > 515         if (!ep)
> > 516                 return -ENOMEM;
> > 
> > 
> > --
> > Chuck Lever
> > 
> > 
> > 

-- 
Doug Ledford <dledford@redhat.com>
    GPG KeyID: B826A3330E572FDD
    Fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD



* Re: qedr memory leak report
  2019-08-31 14:33     ` Doug Ledford
@ 2019-08-31 15:19       ` Leon Romanovsky
  2019-08-31 17:17         ` Doug Ledford
  0 siblings, 1 reply; 9+ messages in thread
From: Leon Romanovsky @ 2019-08-31 15:19 UTC (permalink / raw)
  To: Doug Ledford; +Cc: Chuck Lever, Michal Kalderon, linux-rdma

On Sat, Aug 31, 2019 at 10:33:13AM -0400, Doug Ledford wrote:
> On Sat, 2019-08-31 at 10:30 +0300, Leon Romanovsky wrote:
> > Doug,
> >
> > I think that it can be counted as good example why allowing memory
> > leaks in drivers (HNS) is not so great idea.
>
> Crashing the machine is worse.

The problem with that is you are "punishing" the whole subsystem
because of some piece of crap hardware which users can't buy anyway.
If HNS wants to have memory leaks, they need to do it outside of
the upstream kernel.

In general, if users buy shitty hardware, they need to be ready
for kernel panics too. That's how it works with faulty DRAM, where
the kernel doesn't hide such failures, so I don't see any rationale
for inventing something special for ib_device.

Thanks


* Re: qedr memory leak report
  2019-08-31 15:19       ` Leon Romanovsky
@ 2019-08-31 17:17         ` Doug Ledford
  2019-08-31 18:55           ` Leon Romanovsky
  0 siblings, 1 reply; 9+ messages in thread
From: Doug Ledford @ 2019-08-31 17:17 UTC (permalink / raw)
  To: Leon Romanovsky; +Cc: Chuck Lever, Michal Kalderon, linux-rdma


On Sat, 2019-08-31 at 18:19 +0300, Leon Romanovsky wrote:
> On Sat, Aug 31, 2019 at 10:33:13AM -0400, Doug Ledford wrote:
> > On Sat, 2019-08-31 at 10:30 +0300, Leon Romanovsky wrote:
> > > Doug,
> > > 
> > > I think that it can be counted as good example why allowing memory
> > > leaks in drivers (HNS) is not so great idea.
> > 
> > Crashing the machine is worse.
> 
> The problem with it that you are "punishing" whole subsystem
> because of some piece of crap which anyway users can't buy.

No I'm not.  The patch in question was in the hns driver and only leaked
resources assigned to the hns card when the hns card timed out in
freeing those resources.  That doesn't punish the entire subsystem, it
only punishes the users of that card, and then only if the card has
flaked out.

> If HNS wants to have memory leaks, they need to do it outside
> of upstream kernel.

Nope.

> In general, if users buy shitty hardware, they need to be ready
> to have kernel panics too. It works with faulty DRAM where kernel
> doesn't hide such failures, so don't see any rationale to invent
> something special for ib_device.

What you are advocating for is not "shitty DRAM crashing the machine",
you are advocating for "having ECC DRAM and then intentionally turning
the ECC off and then crashing the machine".  Please repeat after me: WE
DON'T CRASH MACHINES.  PERIOD.  If it is avoidable, we avoid it.  That's
why BUG_ONs have to go and why they piss Linus off so much.  If you
crash the machine, people are left scratching their head and asking why.
If you don't crash the machine, they have a chance to debug the issue
and resolve it.  The entire idea that you are advocating for crashing
the machine as being preferable to leaking a few resources is ludicrous.
WE DON'T CRASH MACHINES.  PERIOD.  Please repeat that until it fully
sinks in.

> Thanks

-- 
Doug Ledford <dledford@redhat.com>
    GPG KeyID: B826A3330E572FDD
    Fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD



* Re: qedr memory leak report
  2019-08-31 17:17         ` Doug Ledford
@ 2019-08-31 18:55           ` Leon Romanovsky
  0 siblings, 0 replies; 9+ messages in thread
From: Leon Romanovsky @ 2019-08-31 18:55 UTC (permalink / raw)
  To: Doug Ledford; +Cc: Chuck Lever, Michal Kalderon, linux-rdma

On Sat, Aug 31, 2019 at 01:17:05PM -0400, Doug Ledford wrote:
> On Sat, 2019-08-31 at 18:19 +0300, Leon Romanovsky wrote:
> > On Sat, Aug 31, 2019 at 10:33:13AM -0400, Doug Ledford wrote:
> > > On Sat, 2019-08-31 at 10:30 +0300, Leon Romanovsky wrote:
> > > > Doug,
> > > >
> > > > I think that it can be counted as good example why allowing memory
> > > > leaks in drivers (HNS) is not so great idea.
> > >
> > > Crashing the machine is worse.
> >
> > The problem with it that you are "punishing" whole subsystem
> > because of some piece of crap which anyway users can't buy.
>
> No I'm not.  The patch in question was in the hns driver and only leaked
> resources assigned to the hns card when the hns card timed out in
> freeing those resources.  That doesn't punish the entire subsystem, it
> only punishes the users of that card, and then only if the card has
> flaked out.

Unfortunately, you are.

Our model is based on the fact that destroy operations can't fail and
that all allocations performed by IB/core should be released right
after the call to the relevant destroy callback. The fact that you are
allowing one driver to fail in destroy means that you will need to give
every driver the chance to return errors and skip freeing resources.

>
> > If HNS wants to have memory leaks, they need to do it outside
> > of upstream kernel.
>
> Nope.
>
> > In general, if users buy shitty hardware, they need to be ready
> > to have kernel panics too. It works with faulty DRAM where kernel
> > doesn't hide such failures, so don't see any rationale to invent
> > something special for ib_device.
>
> What you are advocating for is not "shitty DRAM crashing the machine",
> you are advocating for "having ECC DRAM and then intentionally turning
> the ECC off and then crashing the machine".  Please repeat after me: WE
> DONT CRASH MACHINES.  PERIOD.  If it is avoidable, we avoid it.  That's
> why BUG_ONs have to go and why they piss Linus off so much.  If you
> crash the machine, people are left scratching their head and asking why.
> If you don't crash the machine, they have a chance to debug the issue
> and resolve it.  The entire idea that you are advocating for crashing
> the machine as being preferable to leaking a few resources is ludicrous.
> WE DONT CRASH MACHINES.  PERIOD.  Please repeat that until it fully
> sinks in.

I'm not advocating for that, and I don't buy the explanation that
freeing the memory will cause the machine to crash; in the end, freed
memory just means the user no longer has access to such a bad resource.

Thanks

>
> > Thanks
>
> --
> Doug Ledford <dledford@redhat.com>
>     GPG KeyID: B826A3330E572FDD
>     Fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD




* RE: [EXT] Re: qedr memory leak report
  2019-08-30 18:27 ` Chuck Lever
  2019-08-31  7:30   ` Leon Romanovsky
@ 2019-09-02  7:53   ` " Michal Kalderon
  2019-09-03 12:53     ` Chuck Lever
  1 sibling, 1 reply; 9+ messages in thread
From: Michal Kalderon @ 2019-09-02  7:53 UTC (permalink / raw)
  To: Chuck Lever, Michal Kalderon; +Cc: linux-rdma

> From: Chuck Lever <chuck.lever@oracle.com>
> Sent: Friday, August 30, 2019 9:28 PM
> 
> External Email
> 
> ----------------------------------------------------------------------
> 
> > On Aug 30, 2019, at 2:03 PM, Chuck Lever <chuck.lever@oracle.com>
> wrote:
> >
> > Hi Michal-
> >
> > In the middle of some other testing, I got this kmemleak report while
> > testing with FastLinq cards in iWARP mode:
> >
> > unreferenced object 0xffff888458923340 (size 32):
> >  comm "mount.nfs", pid 2294, jiffies 4298338848 (age 1144.337s)
> >  hex dump (first 32 bytes):
> >    20 1d 69 63 88 88 ff ff 20 1d 69 63 88 88 ff ff   .ic.... .ic....
> >    00 60 7a 69 84 88 ff ff 00 60 82 f9 00 00 00 00  .`zi.....`......
> >  backtrace:
> >    [<000000000df5bfed>] __kmalloc+0x128/0x176
> >    [<0000000020724641>] qedr_alloc_pbl_tbl.constprop.44+0x3c/0x121 [qedr]
> >    [<00000000a361c591>] init_mr_info.constprop.41+0xaf/0x21f [qedr]
> >    [<00000000e8049714>] qedr_alloc_mr+0x95/0x2c1 [qedr]
> >    [<000000000e6102bc>] ib_alloc_mr_user+0x31/0x96 [ib_core]
> >    [<00000000d254a9fb>] frwr_init_mr+0x23/0x121 [rpcrdma]
> >    [<00000000a0364e35>] rpcrdma_mrs_create+0x45/0xea [rpcrdma]
> >    [<00000000fd6bf282>] rpcrdma_buffer_create+0x9e/0x1c9 [rpcrdma]
> >    [<00000000be3a1eba>] xprt_setup_rdma+0x109/0x279 [rpcrdma]
> >    [<00000000b736b88f>] xprt_create_transport+0x39/0x19a [sunrpc]
> >    [<000000001024e4dc>] rpc_create+0x118/0x1ab [sunrpc]
> >    [<00000000cca43a49>] nfs_create_rpc_client+0xf8/0x15f [nfs]
> >    [<00000000073c962c>] nfs_init_client+0x1a/0x3b [nfs]
> >    [<00000000b03964c4>] nfs_init_server+0xc1/0x212 [nfs]
> >    [<000000001c71f609>] nfs_create_server+0x74/0x1a4 [nfs]
> >    [<000000004dc919a1>] nfs3_create_server+0xb/0x25 [nfsv3]
> >
> > It's repeated many times.
> >
> > The workload was an unremarkable software build and regression test
> > suite on an NFSv3 mount with RDMA.
> 
> Also seeing one of these per NFS mount:
> 
> unreferenced object 0xffff888869f39b40 (size 64):
>   comm "kworker/u28:0", pid 17569, jiffies 4299267916 (age 1592.907s)
>   hex dump (first 32 bytes):
>     00 80 53 6d 88 88 ff ff 00 00 00 00 00 00 00 00  ..Sm............
>     00 48 e2 66 84 88 ff ff 00 00 00 00 00 00 00 00  .H.f............
>   backtrace:
>     [<0000000063e652dd>] kmem_cache_alloc_trace+0xed/0x133
>     [<0000000083b1e912>] qedr_iw_connect+0xf9/0x3c8 [qedr]
>     [<00000000553be951>] iw_cm_connect+0xd0/0x157 [iw_cm]
>     [<00000000b086730c>] rdma_connect+0x54e/0x5b0 [rdma_cm]
>     [<00000000d8af3cf2>] rpcrdma_ep_connect+0x22b/0x360 [rpcrdma]
>     [<000000006a413c8d>] xprt_rdma_connect_worker+0x24/0x88 [rpcrdma]
>     [<000000001c5b049a>] process_one_work+0x196/0x2c6
>     [<000000007e3403ba>] worker_thread+0x1ad/0x261
>     [<000000001daaa973>] kthread+0xf4/0xf9
>     [<0000000014987b31>] ret_from_fork+0x24/0x30
> 
> Looks like this one is not being freed:
> 
> 514         ep = kzalloc(sizeof(*ep), GFP_KERNEL);
> 515         if (!ep)
> 516                 return -ENOMEM;
> 
> 
Thanks Chuck! I'll take care of this. Is there an easy repro for getting the leak?
Thanks,
Michal

> --
> Chuck Lever
> 
> 



* Re: [EXT] Re: qedr memory leak report
  2019-09-02  7:53   ` [EXT] " Michal Kalderon
@ 2019-09-03 12:53     ` Chuck Lever
  0 siblings, 0 replies; 9+ messages in thread
From: Chuck Lever @ 2019-09-03 12:53 UTC (permalink / raw)
  To: Michal Kalderon; +Cc: Michal Kalderon, linux-rdma


On Sep 2, 2019, at 3:53 AM, Michal Kalderon <mkalderon@marvell.com> wrote:

>> From: Chuck Lever <chuck.lever@oracle.com>
>> Sent: Friday, August 30, 2019 9:28 PM
>> 
>> External Email
>> 
>> ----------------------------------------------------------------------
>> 
>>> On Aug 30, 2019, at 2:03 PM, Chuck Lever <chuck.lever@oracle.com>
>> wrote:
>>> 
>>> Hi Michal-
>>> 
>>> In the middle of some other testing, I got this kmemleak report while
>>> testing with FastLinq cards in iWARP mode:
>>> 
>>> unreferenced object 0xffff888458923340 (size 32):
>>> comm "mount.nfs", pid 2294, jiffies 4298338848 (age 1144.337s)
>>> hex dump (first 32 bytes):
>>>   20 1d 69 63 88 88 ff ff 20 1d 69 63 88 88 ff ff   .ic.... .ic....
>>>   00 60 7a 69 84 88 ff ff 00 60 82 f9 00 00 00 00  .`zi.....`......
>>> backtrace:
>>>   [<000000000df5bfed>] __kmalloc+0x128/0x176
>>>   [<0000000020724641>] qedr_alloc_pbl_tbl.constprop.44+0x3c/0x121 [qedr]
>>>   [<00000000a361c591>] init_mr_info.constprop.41+0xaf/0x21f [qedr]
>>>   [<00000000e8049714>] qedr_alloc_mr+0x95/0x2c1 [qedr]
>>>   [<000000000e6102bc>] ib_alloc_mr_user+0x31/0x96 [ib_core]
>>>   [<00000000d254a9fb>] frwr_init_mr+0x23/0x121 [rpcrdma]
>>>   [<00000000a0364e35>] rpcrdma_mrs_create+0x45/0xea [rpcrdma]
>>>   [<00000000fd6bf282>] rpcrdma_buffer_create+0x9e/0x1c9 [rpcrdma]
>>>   [<00000000be3a1eba>] xprt_setup_rdma+0x109/0x279 [rpcrdma]
>>>   [<00000000b736b88f>] xprt_create_transport+0x39/0x19a [sunrpc]
>>>   [<000000001024e4dc>] rpc_create+0x118/0x1ab [sunrpc]
>>>   [<00000000cca43a49>] nfs_create_rpc_client+0xf8/0x15f [nfs]
>>>   [<00000000073c962c>] nfs_init_client+0x1a/0x3b [nfs]
>>>   [<00000000b03964c4>] nfs_init_server+0xc1/0x212 [nfs]
>>>   [<000000001c71f609>] nfs_create_server+0x74/0x1a4 [nfs]
>>>   [<000000004dc919a1>] nfs3_create_server+0xb/0x25 [nfsv3]
>>> 
>>> It's repeated many times.
>>> 
>>> The workload was an unremarkable software build and regression test
>>> suite on an NFSv3 mount with RDMA.
>> 
>> Also seeing one of these per NFS mount:
>> 
>> unreferenced object 0xffff888869f39b40 (size 64):
>>  comm "kworker/u28:0", pid 17569, jiffies 4299267916 (age 1592.907s)
>>  hex dump (first 32 bytes):
>>    00 80 53 6d 88 88 ff ff 00 00 00 00 00 00 00 00  ..Sm............
>>    00 48 e2 66 84 88 ff ff 00 00 00 00 00 00 00 00  .H.f............
>>  backtrace:
>>    [<0000000063e652dd>] kmem_cache_alloc_trace+0xed/0x133
>>    [<0000000083b1e912>] qedr_iw_connect+0xf9/0x3c8 [qedr]
>>    [<00000000553be951>] iw_cm_connect+0xd0/0x157 [iw_cm]
>>    [<00000000b086730c>] rdma_connect+0x54e/0x5b0 [rdma_cm]
>>    [<00000000d8af3cf2>] rpcrdma_ep_connect+0x22b/0x360 [rpcrdma]
>>    [<000000006a413c8d>] xprt_rdma_connect_worker+0x24/0x88 [rpcrdma]
>>    [<000000001c5b049a>] process_one_work+0x196/0x2c6
>>    [<000000007e3403ba>] worker_thread+0x1ad/0x261
>>    [<000000001daaa973>] kthread+0xf4/0xf9
>>    [<0000000014987b31>] ret_from_fork+0x24/0x30
>> 
>> Looks like this one is not being freed:
>> 
>> 514         ep = kzalloc(sizeof(*ep), GFP_KERNEL);
>> 515         if (!ep)
>> 516                 return -ENOMEM;
>> 
>> 
> Thanks Chuck! I'll take care of this. Is there an easy repro for getting the leak ?

Nothing special is necessary. Enable kmemleak detection, then run any NFS/RDMA workload that does some I/O, unmount, and wait a few minutes for the kmemleak laundromat thread to run.



