From: Sagi Grimberg <sagi@grimberg.me>
To: Coly Li <colyli@suse.de>,
	philipp.reisner@linbit.com, linux-nvme@lists.infradead.org,
	linux-block@vger.kernel.org, linux-bcache@vger.kernel.org,
	hch@lst.de
Cc: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>,
	Hannes Reinecke <hare@suse.de>, Jan Kara <jack@suse.com>,
	Jens Axboe <axboe@kernel.dk>,
	Mikhail Skorzhinskii <mskorzhinskiy@solarflare.com>,
	Vlastimil Babka <vbabka@suse.com>,
	stable@vger.kernel.org
Subject: Re: [PATCH 1/2] nvme-tcp: use sendpage_ok() to check page for kernel_sendpage()
Date: Mon, 27 Jul 2020 10:25:29 -0700
Message-ID: <f6cf7563-9d8c-baa8-e8e7-e41f9b13e787@grimberg.me>
In-Reply-To: <20200726135224.107516-1-colyli@suse.de>



On 7/26/20 6:52 AM, Coly Li wrote:
> Currently nvme_tcp_try_send_data() doesn't use kernel_sendpage() to
> send slab pages. But pages allocated by __get_free_pages() without
> __GFP_COMP, which also have a refcount of 0, are still sent to the
> remote end by kernel_sendpage(), which is problematic.
> 
> When bcache uses a remote NVMe SSD via nvme-over-tcp as its cache
> device, writing metadata such as cache_set->disk_buckets to the remote
> SSD may trigger a kernel panic due to the above problem, because the
> metadata pages for cache_set->disk_buckets are allocated by
> __get_free_pages() without __GFP_COMP.
> 
> This problem should be fixed both in the upper layer driver (bcache) and
> in the nvme-over-tcp code. This patch fixes the nvme-over-tcp code by
> checking whether the page refcount is 0; if it is, kernel_sendpage() is
> not used and sock_no_sendpage() is called to send the page into the
> network stack.
> 
> This check is done by the macro sendpage_ok() in this patch, which is
> defined in include/linux/net.h as
> 	(!PageSlab(page) && page_count(page) >= 1)
> If sendpage_ok() returns false, sock_no_sendpage() will handle the page
> instead of kernel_sendpage().
> 
> The code comment in this patch is copied and modified from drbd, where
> a similar problem was already solved by Philipp Reisner. It is the best
> code comment on this issue, better than my own version.
> 
> Signed-off-by: Coly Li <colyli@suse.de>
> Cc: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Hannes Reinecke <hare@suse.de>
> Cc: Jan Kara <jack@suse.com>
> Cc: Jens Axboe <axboe@kernel.dk>
> Cc: Mikhail Skorzhinskii <mskorzhinskiy@solarflare.com>
> Cc: Philipp Reisner <philipp.reisner@linbit.com>
> Cc: Sagi Grimberg <sagi@grimberg.me>
> Cc: Vlastimil Babka <vbabka@suse.com>
> Cc: stable@vger.kernel.org
> ---
> Changelog:
> v3: introduce a more common name sendpage_ok() for the open coded check
> v2: fix typo in patch subject.
> v1: the initial version.
> 
>   drivers/nvme/host/tcp.c | 13 +++++++++++--
>   include/linux/net.h     |  2 ++
>   2 files changed, 13 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
> index 79ef2b8e2b3c..f9952f6d94b9 100644
> --- a/drivers/nvme/host/tcp.c
> +++ b/drivers/nvme/host/tcp.c
> @@ -887,8 +887,17 @@ static int nvme_tcp_try_send_data(struct nvme_tcp_request *req)
>   		else
>   			flags |= MSG_MORE | MSG_SENDPAGE_NOTLAST;
>   
> -		/* can't zcopy slab pages */
> -		if (unlikely(PageSlab(page))) {
> +		/*
> +		 * e.g. XFS meta- & log-data is in slab pages, or bcache meta
> +		 * data pages, or other high order pages allocated by
> +		 * __get_free_pages() without __GFP_COMP, which have a page_count
> +		 * of 0 and/or have PageSlab() set. We cannot use send_page for
> +		 * those, as that does get_page(); put_page(); and would cause
> +		 * either a VM_BUG directly, or __page_cache_release a page that
> +		 * would actually still be referenced by someone, leading to some
> +		 * obscure delayed Oops somewhere else.
> +		 */

I was hoping that this comment would move to the helper as well.

I agree with Christoph's comment as well.
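
For illustration only, a minimal userspace sketch of what a helper that
carries the comment itself might look like. This is not kernel code:
struct mock_page, mock_sendpage_ok(), enum send_path and pick_send_path()
are hypothetical stand-ins invented for this sketch. Only the condition
(!PageSlab(page) && page_count(page) >= 1) is taken from the patch
description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the kernel's struct page. */
struct mock_page {
	bool slab;	/* models PageSlab(page) */
	int refcount;	/* models page_count(page) */
};

/*
 * Slab pages, and pages with a refcount of 0 (e.g. pages from a high
 * order allocation made without __GFP_COMP), must not take the
 * zero-copy path: that path takes and drops a page reference, which
 * is not safe for such pages. With the rationale kept here, callers
 * would not need to repeat it.
 */
static inline bool mock_sendpage_ok(const struct mock_page *page)
{
	return !page->slab && page->refcount >= 1;
}

/* Which send path a caller would pick for a given page. */
enum send_path {
	SEND_ZEROCOPY,	/* models kernel_sendpage() */
	SEND_COPY,	/* models sock_no_sendpage() */
};

static enum send_path pick_send_path(const struct mock_page *page)
{
	assert(page != NULL);
	return mock_sendpage_ok(page) ? SEND_ZEROCOPY : SEND_COPY;
}
```

In this model a page with refcount 1 and no slab flag takes the
zero-copy path, while a slab page or a page with refcount 0 falls back
to the copying path.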

Thread overview: 5+ messages
2020-07-26 13:52 [PATCH 1/2] nvme-tcp: use sendpage_ok() to check page for kernel_sendpage() Coly Li
2020-07-26 13:52 ` [PATCH 2/2] drbd: code cleanup by using " Coly Li
2020-07-26 15:07 ` [PATCH 1/2] nvme-tcp: use " Christoph Hellwig
2020-07-27 17:25 ` Sagi Grimberg [this message]
2020-07-28 12:42   ` Coly Li
