From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Zhu Yanjun, Santosh Shilimkar, "David S. Miller"
Subject: [PATCH 4.4 232/241] net: rds: fix memory leak in rds_ib_flush_mr_pool
Date: Sun, 9 Jun 2019 18:42:54 +0200
Message-Id: <20190609164155.436277890@linuxfoundation.org>
In-Reply-To: <20190609164147.729157653@linuxfoundation.org>
References: <20190609164147.729157653@linuxfoundation.org>

From: Zhu Yanjun

[ Upstream commit 85cb928787eab6a2f4ca9d2a798b6f3bed53ced1 ]

When the following tests run for several hours, the problem occurs.

Server:
    rds-stress -r 1.1.1.16 -D 1M

Client:
    rds-stress -r 1.1.1.14 -s 1.1.1.16 -D 1M -T 30

The following output appears:
"
Starting up....
tsks   tx/s   rx/s  tx+rx K/s    mbi K/s    mbo K/s tx us/c   rtt us cpu %
  1      0      0       0.00       0.00       0.00    0.00     0.00 -1.00
  1      0      0       0.00       0.00       0.00    0.00     0.00 -1.00
  1      0      0       0.00       0.00       0.00    0.00     0.00 -1.00
  1      0      0       0.00       0.00       0.00    0.00     0.00 -1.00
"
From vmcore, we can find that clean_list is NULL.

From the source code, rds_mr_flushd calls rds_ib_mr_pool_flush_worker.
Then rds_ib_mr_pool_flush_worker calls
"
 rds_ib_flush_mr_pool(pool, 0, NULL);
"
Then in function
"
int rds_ib_flush_mr_pool(struct rds_ib_mr_pool *pool,
			 int free_all, struct rds_ib_mr **ibmr_ret)
"
ibmr_ret is NULL.

In the source code,
"
...
list_to_llist_nodes(pool, &unmap_list, &clean_nodes, &clean_tail);
if (ibmr_ret)
	*ibmr_ret = llist_entry(clean_nodes, struct rds_ib_mr, llnode);

/* more than one entry in llist nodes */
if (clean_nodes->next)
	llist_add_batch(clean_nodes->next, clean_tail, &pool->clean_list);
...
" When ibmr_ret is NULL, llist_entry is not executed. clean_nodes->next instead of clean_nodes is added in clean_list. So clean_nodes is discarded. It can not be used again. The workqueue is executed periodically. So more and more clean_nodes are discarded. Finally the clean_list is NULL. Then this problem will occur. Fixes: 1bc144b62524 ("net, rds, Replace xlist in net/rds/xlist.h with llist") Signed-off-by: Zhu Yanjun Acked-by: Santosh Shilimkar Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/rds/ib_rdma.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) --- a/net/rds/ib_rdma.c +++ b/net/rds/ib_rdma.c @@ -725,12 +725,14 @@ static int rds_ib_flush_mr_pool(struct r wait_clean_list_grace(); list_to_llist_nodes(pool, &unmap_list, &clean_nodes, &clean_tail); - if (ibmr_ret) + if (ibmr_ret) { *ibmr_ret = llist_entry(clean_nodes, struct rds_ib_mr, llnode); - + clean_nodes = clean_nodes->next; + } /* more than one entry in llist nodes */ - if (clean_nodes->next) - llist_add_batch(clean_nodes->next, clean_tail, &pool->clean_list); + if (clean_nodes) + llist_add_batch(clean_nodes, clean_tail, + &pool->clean_list); }