netdev.vger.kernel.org archive mirror
* [PATCH 1/1] net: rds: fix memory leak in rds_ib_flush_mr_pool
@ 2019-06-06  8:00 Zhu Yanjun
  2019-06-06 15:57 ` santosh.shilimkar
  2019-06-06 17:33 ` David Miller
  0 siblings, 2 replies; 3+ messages in thread
From: Zhu Yanjun @ 2019-06-06  8:00 UTC
  To: santosh.shilimkar, davem, netdev, linux-rdma, rds-devel

The problem occurs after the following tests have run for several hours.

Server:
    rds-stress -r 1.1.1.16 -D 1M
Client:
    rds-stress -r 1.1.1.14 -s 1.1.1.16 -D 1M -T 30

The rds-stress output then shows no traffic:

"
Starting up....
tsks   tx/s   rx/s  tx+rx K/s    mbi K/s    mbo K/s tx us/c   rtt us cpu %
  1      0      0       0.00       0.00       0.00    0.00 0.00 -1.00
  1      0      0       0.00       0.00       0.00    0.00 0.00 -1.00
  1      0      0       0.00       0.00       0.00    0.00 0.00 -1.00
  1      0      0       0.00       0.00       0.00    0.00 0.00 -1.00
"
The vmcore shows that clean_list is NULL.

From the source code, rds_mr_flushd calls rds_ib_mr_pool_flush_worker.
Then rds_ib_mr_pool_flush_worker calls
"
 rds_ib_flush_mr_pool(pool, 0, NULL);
"
Then in function
"
int rds_ib_flush_mr_pool(struct rds_ib_mr_pool *pool,
                         int free_all, struct rds_ib_mr **ibmr_ret)
"
ibmr_ret is NULL.

In the source code,
"
...
list_to_llist_nodes(pool, &unmap_list, &clean_nodes, &clean_tail);
if (ibmr_ret)
        *ibmr_ret = llist_entry(clean_nodes, struct rds_ib_mr, llnode);

/* more than one entry in llist nodes */
if (clean_nodes->next)
        llist_add_batch(clean_nodes->next, clean_tail, &pool->clean_list);
...
"
When ibmr_ret is NULL, the llist_entry assignment is skipped, yet
clean_nodes->next rather than clean_nodes is still added to clean_list.
The head node clean_nodes is therefore discarded and cannot be used again.
Because the workqueue runs periodically, more and more nodes are
discarded, until clean_list is finally NULL.
At that point the problem above occurs.
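
The discard described above can be reproduced in a small user-space model.
The sketch below is not the kernel code: struct node, count_nodes(),
add_batch() and flush_prepatch() are hypothetical stand-ins for
struct llist_node, the pool's clean_list handling and the pre-patch branch
of rds_ib_flush_mr_pool, and the atomic operations of llist_add_batch are
left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct llist_node. */
struct node {
	struct node *next;
};

/* Number of nodes reachable from head. */
static size_t count_nodes(const struct node *head)
{
	size_t n = 0;

	for (; head; head = head->next)
		n++;
	return n;
}

/* Push the chain first..last onto *list; a non-atomic model of llist_add_batch. */
static void add_batch(struct node *first, struct node *last, struct node **list)
{
	assert(first && last);
	last->next = *list;
	*list = first;
}

/*
 * Model of the pre-patch logic. The head of the chain is skipped even when
 * ret is NULL, so in that case nothing references it afterwards.
 */
static void flush_prepatch(struct node *clean_nodes, struct node *clean_tail,
			   struct node **clean_list, struct node **ret)
{
	if (ret)
		*ret = clean_nodes;

	/* more than one entry in the chain */
	if (clean_nodes->next)
		add_batch(clean_nodes->next, clean_tail, clean_list);
}
```

In this model, flushing a three-node chain with ret == NULL leaves only two
nodes on clean_list, and flushing a one-node chain leaves none.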

Fixes: 1bc144b62524 ("net, rds, Replace xlist in net/rds/xlist.h with llist")
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
---
 net/rds/ib_rdma.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/net/rds/ib_rdma.c b/net/rds/ib_rdma.c
index d664e9a..0b347f4 100644
--- a/net/rds/ib_rdma.c
+++ b/net/rds/ib_rdma.c
@@ -428,12 +428,14 @@ int rds_ib_flush_mr_pool(struct rds_ib_mr_pool *pool,
 		wait_clean_list_grace();
 
 		list_to_llist_nodes(pool, &unmap_list, &clean_nodes, &clean_tail);
-		if (ibmr_ret)
+		if (ibmr_ret) {
 			*ibmr_ret = llist_entry(clean_nodes, struct rds_ib_mr, llnode);
-
+			clean_nodes = clean_nodes->next;
+		}
 		/* more than one entry in llist nodes */
-		if (clean_nodes->next)
-			llist_add_batch(clean_nodes->next, clean_tail, &pool->clean_list);
+		if (clean_nodes)
+			llist_add_batch(clean_nodes, clean_tail,
+					&pool->clean_list);
 
 	}
 
-- 
2.7.4
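
For readers who want to check the patched control flow outside the kernel,
here is a minimal user-space sketch of it. It is an illustration under
stated assumptions, not the driver code: struct node, count_nodes(),
add_batch() and flush_patched() are hypothetical names, and add_batch() is
a non-atomic model of llist_add_batch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct llist_node. */
struct node {
	struct node *next;
};

/* Number of nodes reachable from head. */
static size_t count_nodes(const struct node *head)
{
	size_t n = 0;

	for (; head; head = head->next)
		n++;
	return n;
}

/* Push the chain first..last onto *list; a non-atomic model of llist_add_batch. */
static void add_batch(struct node *first, struct node *last, struct node **list)
{
	assert(first && last);
	last->next = *list;
	*list = first;
}

/*
 * Model of the patched logic: the head is skipped only when it is handed
 * to the caller through ret; otherwise the whole chain goes back on the list.
 */
static void flush_patched(struct node *clean_nodes, struct node *clean_tail,
			  struct node **clean_list, struct node **ret)
{
	if (ret) {
		*ret = clean_nodes;
		clean_nodes = clean_nodes->next;
	}

	/* anything left after the optional hand-off */
	if (clean_nodes)
		add_batch(clean_nodes, clean_tail, clean_list);
}
```

With ret == NULL every node of the chain ends up on clean_list; with a
non-NULL ret the caller receives the head and the remaining nodes are added.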



* Re: [PATCH 1/1] net: rds: fix memory leak in rds_ib_flush_mr_pool
  2019-06-06  8:00 [PATCH 1/1] net: rds: fix memory leak in rds_ib_flush_mr_pool Zhu Yanjun
@ 2019-06-06 15:57 ` santosh.shilimkar
  2019-06-06 17:33 ` David Miller
  1 sibling, 0 replies; 3+ messages in thread
From: santosh.shilimkar @ 2019-06-06 15:57 UTC
  To: Zhu Yanjun, davem, netdev, linux-rdma, rds-devel

On 6/6/19 1:00 AM, Zhu Yanjun wrote:
> When the following tests last for several hours, the problem will occur.
 ...
> Fixes: 1bc144b62524 ("net, rds, Replace xlist in net/rds/xlist.h with llist")
> Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
> ---
Thanks.
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>


* Re: [PATCH 1/1] net: rds: fix memory leak in rds_ib_flush_mr_pool
  2019-06-06  8:00 [PATCH 1/1] net: rds: fix memory leak in rds_ib_flush_mr_pool Zhu Yanjun
  2019-06-06 15:57 ` santosh.shilimkar
@ 2019-06-06 17:33 ` David Miller
  1 sibling, 0 replies; 3+ messages in thread
From: David Miller @ 2019-06-06 17:33 UTC
  To: yanjun.zhu; +Cc: santosh.shilimkar, netdev, linux-rdma, rds-devel

From: Zhu Yanjun <yanjun.zhu@oracle.com>
Date: Thu,  6 Jun 2019 04:00:03 -0400

> When the following tests last for several hours, the problem will occur.
 ...
> When ibmr_ret is NULL, llist_entry is not executed. clean_nodes->next
> instead of clean_nodes is added in clean_list.
> So clean_nodes is discarded. It can not be used again.
> The workqueue is executed periodically. So more and more clean_nodes are
> discarded. Finally the clean_list is NULL.
> Then this problem will occur.
> 
> Fixes: 1bc144b62524 ("net, rds, Replace xlist in net/rds/xlist.h with llist")
> Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>

Applied and queued up for -stable.
