linux-bcache.vger.kernel.org archive mirror
* [PATCH 1/2] nvme-tcp: don't use sendpage for pages not taking reference counter
@ 2020-07-10 13:26 Coly Li
  2020-07-10 13:26 ` [PATCH 2/2] bcache: allocate meta data pages as compound pages Coly Li
  2020-07-13 12:30 ` [PATCH 1/2] nvme-tcp: don't use sendpage for pages not taking reference counter Coly Li
  0 siblings, 2 replies; 4+ messages in thread
From: Coly Li @ 2020-07-10 13:26 UTC
  To: linux-block, linux-nvme, linux-bcache
  Cc: Coly Li, Chaitanya Kulkarni, Christoph Hellwig, Hannes Reinecke,
	Jan Kara, Jens Axboe, Mikhail Skorzhinskii, Philipp Reisner,
	Sagi Grimberg, Vlastimil Babka, stable

Currently nvme_tcp_try_send_data() avoids kernel_sendpage() for slab
pages. But pages allocated by __get_free_pages() without __GFP_COMP,
which also have a refcount of 0, are still sent to the remote end by
kernel_sendpage(). This is problematic.

When bcache uses a remote NVMe SSD via nvme-over-tcp as its cache
device, writing meta data, e.g. cache_set->disk_buckets, to the remote
SSD may trigger a kernel panic due to the above problem, because the
meta data pages for cache_set->disk_buckets are allocated by
__get_free_pages() without __GFP_COMP.

This problem should be fixed both in the upper layer driver (bcache) and
in the nvme-over-tcp code. This patch fixes the nvme-over-tcp code by
checking whether the page refcount is 0; if it is, kernel_sendpage() is
not used and sock_no_sendpage() is called instead to send the page into
the network stack.

The code comment in this patch is copied and modified from drbd, where
a similar problem was already solved by Philipp Reisner. It is the best
code comment I have seen on this issue, my own version included.

Signed-off-by: Coly Li <colyli@suse.de>
Cc: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Jan Kara <jack@suse.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Mikhail Skorzhinskii <mskorzhinskiy@solarflare.com>
Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Vlastimil Babka <vbabka@suse.com>
Cc: stable@vger.kernel.org
---
 drivers/nvme/host/tcp.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 79ef2b8e2b3c..faa71db7522a 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -887,8 +887,17 @@ static int nvme_tcp_try_send_data(struct nvme_tcp_request *req)
 		else
 			flags |= MSG_MORE | MSG_SENDPAGE_NOTLAST;
 
-		/* can't zcopy slab pages */
-		if (unlikely(PageSlab(page))) {
+		/*
+		 * e.g. XFS meta- & log-data is in slab pages, or bcache meta
+		 * data pages, or other high order pages allocated by
+		 * __get_free_pages() without __GFP_COMP, which have a page_count
+		 * of 0 and/or have PageSlab() set. We cannot use send_page for
+		 * those, as that does get_page(); put_page(); and would cause
+		 * either a VM_BUG directly, or __page_cache_release a page that
+		 * would actually still be referenced by someone, leading to some
+		 * obscure delayed Oops somewhere else.
+		 */
+		if (unlikely(PageSlab(page) || page_count(page) < 1)) {
 			ret = sock_no_sendpage(queue->sock, page, offset, len,
 					flags);
 		} else {
-- 
2.26.2
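
For readers without a kernel tree at hand, the condition added in the hunk
above can be modelled in userspace. This is only a sketch: struct mock_page,
enum send_path and choose_send_path() are hypothetical names that do not
exist in the kernel, and the two fields merely stand in for PageSlab(page)
and page_count(page).

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two properties of struct page that the
 * patched condition reads. Not a kernel type. */
struct mock_page {
	bool slab;	/* models PageSlab(page) */
	int refcount;	/* models page_count(page) */
};

/* Which send routine the patched code would pick. */
enum send_path {
	SEND_COPY,	/* sock_no_sendpage(): data is copied */
	SEND_ZEROCOPY,	/* kernel_sendpage(): does get_page()/put_page() */
};

/* Mirrors the patched test: PageSlab(page) || page_count(page) < 1.
 * Pages that are slab-backed or carry no reference must not be handed
 * to the zero-copy path. */
static enum send_path choose_send_path(const struct mock_page *page)
{
	if (page->slab || page->refcount < 1)
		return SEND_COPY;
	return SEND_ZEROCOPY;
}
```

Under this model, a page with refcount 0 and no slab flag (the bcache meta
data case described in the commit message) is routed to the copying path,
while an ordinary referenced page still takes the zero-copy path.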


Thread overview: 4+ messages
2020-07-10 13:26 [PATCH 1/2] nvme-tcp: don't use sendpage for pages not taking reference counter Coly Li
2020-07-10 13:26 ` [PATCH 2/2] bcache: allocate meta data pages as compound pages Coly Li
2020-07-13 12:30 ` [PATCH 1/2] nvme-tcp: don't use sendpage for pages not taking reference counter Coly Li
2020-07-14  7:19   ` Christoph Hellwig