linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Jens Axboe <axboe@kernel.dk>,
	Christoph Hellwig <hch@lst.de>,
	"Martin K. Petersen" <martin.petersen@oracle.com>
Subject: [PATCH 4.19 13/46] scsi: sd: use mempool for discard special page
Date: Fri, 28 Dec 2018 12:52:07 +0100	[thread overview]
Message-ID: <20181228113125.558041477@linuxfoundation.org> (raw)
In-Reply-To: <20181228113124.971620049@linuxfoundation.org>

4.19-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Jens Axboe <axboe@kernel.dk>

commit 61cce6f6eeced5ddd9cac55e807fe28b4f18c1ba upstream.

When boxes are run near (or to) OOM, we have a problem with the discard
page allocation in sd. If we fail allocating the special page, we return
busy, and it'll get retried. But since ordering is honored for dispatch
requests, we can keep retrying this same IO and failing. Behind that IO
could be requests that want to free memory, but they never get the
chance. This means you get repeated spews of traces like this:

[1201401.625972] Call Trace:
[1201401.631748]  dump_stack+0x4d/0x65
[1201401.639445]  warn_alloc+0xec/0x190
[1201401.647335]  __alloc_pages_slowpath+0xe84/0xf30
[1201401.657722]  ? get_page_from_freelist+0x11b/0xb10
[1201401.668475]  ? __alloc_pages_slowpath+0x2e/0xf30
[1201401.679054]  __alloc_pages_nodemask+0x1f9/0x210
[1201401.689424]  alloc_pages_current+0x8c/0x110
[1201401.699025]  sd_setup_write_same16_cmnd+0x51/0x150
[1201401.709987]  sd_init_command+0x49c/0xb70
[1201401.719029]  scsi_setup_cmnd+0x9c/0x160
[1201401.727877]  scsi_queue_rq+0x4d9/0x610
[1201401.736535]  blk_mq_dispatch_rq_list+0x19a/0x360
[1201401.747113]  blk_mq_sched_dispatch_requests+0xff/0x190
[1201401.758844]  __blk_mq_run_hw_queue+0x95/0xa0
[1201401.768653]  blk_mq_run_work_fn+0x2c/0x30
[1201401.777886]  process_one_work+0x14b/0x400
[1201401.787119]  worker_thread+0x4b/0x470
[1201401.795586]  kthread+0x110/0x150
[1201401.803089]  ? rescuer_thread+0x320/0x320
[1201401.812322]  ? kthread_park+0x90/0x90
[1201401.820787]  ? do_syscall_64+0x53/0x150
[1201401.829635]  ret_from_fork+0x29/0x40

Ensure that the discard page allocation has a mempool backing, so we
know we can make progress.

Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/scsi/sd.c |   23 +++++++++++++++++++----
 1 file changed, 19 insertions(+), 4 deletions(-)

--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -132,6 +132,7 @@ static DEFINE_MUTEX(sd_ref_mutex);
 
 static struct kmem_cache *sd_cdb_cache;
 static mempool_t *sd_cdb_pool;
+static mempool_t *sd_page_pool;
 
 static const char *sd_cache_types[] = {
 	"write through", "none", "write back",
@@ -758,9 +759,10 @@ static int sd_setup_unmap_cmnd(struct sc
 	unsigned int data_len = 24;
 	char *buf;
 
-	rq->special_vec.bv_page = alloc_page(GFP_ATOMIC | __GFP_ZERO);
+	rq->special_vec.bv_page = mempool_alloc(sd_page_pool, GFP_ATOMIC);
 	if (!rq->special_vec.bv_page)
 		return BLKPREP_DEFER;
+	clear_highpage(rq->special_vec.bv_page);
 	rq->special_vec.bv_offset = 0;
 	rq->special_vec.bv_len = data_len;
 	rq->rq_flags |= RQF_SPECIAL_PAYLOAD;
@@ -791,9 +793,10 @@ static int sd_setup_write_same16_cmnd(st
 	u32 nr_sectors = blk_rq_sectors(rq) >> (ilog2(sdp->sector_size) - 9);
 	u32 data_len = sdp->sector_size;
 
-	rq->special_vec.bv_page = alloc_page(GFP_ATOMIC | __GFP_ZERO);
+	rq->special_vec.bv_page = mempool_alloc(sd_page_pool, GFP_ATOMIC);
 	if (!rq->special_vec.bv_page)
 		return BLKPREP_DEFER;
+	clear_highpage(rq->special_vec.bv_page);
 	rq->special_vec.bv_offset = 0;
 	rq->special_vec.bv_len = data_len;
 	rq->rq_flags |= RQF_SPECIAL_PAYLOAD;
@@ -821,9 +824,10 @@ static int sd_setup_write_same10_cmnd(st
 	u32 nr_sectors = blk_rq_sectors(rq) >> (ilog2(sdp->sector_size) - 9);
 	u32 data_len = sdp->sector_size;
 
-	rq->special_vec.bv_page = alloc_page(GFP_ATOMIC | __GFP_ZERO);
+	rq->special_vec.bv_page = mempool_alloc(sd_page_pool, GFP_ATOMIC);
 	if (!rq->special_vec.bv_page)
 		return BLKPREP_DEFER;
+	clear_highpage(rq->special_vec.bv_page);
 	rq->special_vec.bv_offset = 0;
 	rq->special_vec.bv_len = data_len;
 	rq->rq_flags |= RQF_SPECIAL_PAYLOAD;
@@ -1287,7 +1291,7 @@ static void sd_uninit_command(struct scs
 	u8 *cmnd;
 
 	if (rq->rq_flags & RQF_SPECIAL_PAYLOAD)
-		__free_page(rq->special_vec.bv_page);
+		mempool_free(rq->special_vec.bv_page, sd_page_pool);
 
 	if (SCpnt->cmnd != scsi_req(rq)->cmd) {
 		cmnd = SCpnt->cmnd;
@@ -3635,6 +3639,13 @@ static int __init init_sd(void)
 		goto err_out_cache;
 	}
 
+	sd_page_pool = mempool_create_page_pool(SD_MEMPOOL_SIZE, 0);
+	if (!sd_page_pool) {
+		printk(KERN_ERR "sd: can't init discard page pool\n");
+		err = -ENOMEM;
+		goto err_out_ppool;
+	}
+
 	err = scsi_register_driver(&sd_template.gendrv);
 	if (err)
 		goto err_out_driver;
@@ -3642,6 +3653,9 @@ static int __init init_sd(void)
 	return 0;
 
 err_out_driver:
+	mempool_destroy(sd_page_pool);
+
+err_out_ppool:
 	mempool_destroy(sd_cdb_pool);
 
 err_out_cache:
@@ -3668,6 +3682,7 @@ static void __exit exit_sd(void)
 
 	scsi_unregister_driver(&sd_template.gendrv);
 	mempool_destroy(sd_cdb_pool);
+	mempool_destroy(sd_page_pool);
 	kmem_cache_destroy(sd_cdb_cache);
 
 	class_unregister(&sd_disk_class);



  parent reply	other threads:[~2018-12-28 11:53 UTC|newest]

Thread overview: 55+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-12-28 11:51 [PATCH 4.19 00/46] 4.19.13-stable review Greg Kroah-Hartman
2018-12-28 11:51 ` [PATCH 4.19 01/46] iomap: Revert "fs/iomap.c: get/put the page in iomap_page_create/release()" Greg Kroah-Hartman
2018-12-28 11:51 ` [PATCH 4.19 02/46] Revert "vfs: Allow userns root to call mknod on owned filesystems." Greg Kroah-Hartman
2018-12-28 11:51 ` [PATCH 4.19 03/46] USB: hso: Fix OOB memory access in hso_probe/hso_get_config_data Greg Kroah-Hartman
2018-12-28 11:51 ` [PATCH 4.19 04/46] xhci: Dont prevent USB2 bus suspend in state check intended for USB3 only Greg Kroah-Hartman
2018-12-28 11:51 ` [PATCH 4.19 05/46] USB: xhci: fix broken_suspend placement in struct xchi_hcd Greg Kroah-Hartman
2018-12-28 11:52 ` [PATCH 4.19 06/46] USB: serial: option: add GosunCn ZTE WeLink ME3630 Greg Kroah-Hartman
2018-12-28 11:52 ` [PATCH 4.19 07/46] USB: serial: option: add HP lt4132 Greg Kroah-Hartman
2018-12-28 11:52 ` [PATCH 4.19 08/46] USB: serial: option: add Simcom SIM7500/SIM7600 (MBIM mode) Greg Kroah-Hartman
2018-12-28 11:52 ` [PATCH 4.19 09/46] USB: serial: option: add Fibocom NL668 series Greg Kroah-Hartman
2018-12-28 11:52 ` [PATCH 4.19 10/46] USB: serial: option: add Telit LN940 series Greg Kroah-Hartman
2018-12-28 11:52 ` [PATCH 4.19 11/46] ubifs: Handle re-linking of inodes correctly while recovery Greg Kroah-Hartman
2018-12-28 11:52 ` [PATCH 4.19 12/46] scsi: t10-pi: Return correct ref tag when queue has no integrity profile Greg Kroah-Hartman
2018-12-28 11:52 ` Greg Kroah-Hartman [this message]
2018-12-28 11:52 ` [PATCH 4.19 14/46] mmc: core: Reset HPI enabled state during re-init and in case of errors Greg Kroah-Hartman
2018-12-28 11:52 ` [PATCH 4.19 15/46] mmc: core: Allow BKOPS and CACHE ctrl even if no HPI support Greg Kroah-Hartman
2018-12-28 11:52 ` [PATCH 4.19 16/46] mmc: core: Use a minimum 1600ms timeout when enabling CACHE ctrl Greg Kroah-Hartman
2018-12-28 11:52 ` [PATCH 4.19 17/46] mmc: omap_hsmmc: fix DMA API warning Greg Kroah-Hartman
2018-12-28 11:52 ` [PATCH 4.19 18/46] gpio: max7301: fix driver for use with CONFIG_VMAP_STACK Greg Kroah-Hartman
2018-12-28 11:52 ` [PATCH 4.19 19/46] gpiolib-acpi: Only defer request_irq for GpioInt ACPI event handlers Greg Kroah-Hartman
2018-12-28 11:52 ` [PATCH 4.19 20/46] posix-timers: Fix division by zero bug Greg Kroah-Hartman
2018-12-28 11:52 ` [PATCH 4.19 21/46] KVM: X86: Fix NULL deref in vcpu_scan_ioapic Greg Kroah-Hartman
2018-12-28 11:52 ` [PATCH 4.19 22/46] kvm: x86: Add AMDs EX_CFG to the list of ignored MSRs Greg Kroah-Hartman
2018-12-28 11:52 ` [PATCH 4.19 23/46] KVM: Fix UAF in nested posted interrupt processing Greg Kroah-Hartman
2018-12-28 11:52 ` [PATCH 4.19 24/46] Drivers: hv: vmbus: Return -EINVAL for the sys files for unopened channels Greg Kroah-Hartman
2018-12-28 11:52 ` [PATCH 4.19 25/46] futex: Cure exit race Greg Kroah-Hartman
2018-12-28 11:52 ` [PATCH 4.19 26/46] x86/mtrr: Dont copy uninitialized gentry fields back to userspace Greg Kroah-Hartman
2018-12-28 11:52 ` [PATCH 4.19 27/46] x86/mm: Fix decoy address handling vs 32-bit builds Greg Kroah-Hartman
2018-12-28 11:52 ` [PATCH 4.19 28/46] x86/vdso: Pass --eh-frame-hdr to the linker Greg Kroah-Hartman
2018-12-28 11:52 ` [PATCH 4.19 29/46] x86/intel_rdt: Ensure a CPU remains online for the regions pseudo-locking sequence Greg Kroah-Hartman
2018-12-28 11:52 ` [PATCH 4.19 30/46] panic: avoid deadlocks in re-entrant console drivers Greg Kroah-Hartman
2018-12-28 11:52 ` [PATCH 4.19 31/46] mm: add mm_pxd_folded checks to pgtable_bytes accounting functions Greg Kroah-Hartman
2018-12-28 11:52 ` [PATCH 4.19 32/46] mm: make the __PAGETABLE_PxD_FOLDED defines non-empty Greg Kroah-Hartman
2018-12-28 11:52 ` [PATCH 4.19 33/46] mm: introduce mm_[p4d|pud|pmd]_folded Greg Kroah-Hartman
2018-12-28 11:52 ` [PATCH 4.19 34/46] xfrm_user: fix freeing of xfrm states on acquire Greg Kroah-Hartman
2018-12-28 11:52 ` [PATCH 4.19 35/46] rtlwifi: Fix leak of skb when processing C2H_BT_INFO Greg Kroah-Hartman
2018-12-28 11:52 ` [PATCH 4.19 36/46] iwlwifi: mvm: dont send GEO_TX_POWER_LIMIT to old firmwares Greg Kroah-Hartman
2018-12-28 11:52 ` [PATCH 4.19 37/46] Revert "mwifiex: restructure rx_reorder_tbl_lock usage" Greg Kroah-Hartman
2018-12-28 11:52 ` [PATCH 4.19 38/46] iwlwifi: add new cards for 9560, 9462, 9461 and killer series Greg Kroah-Hartman
2018-12-28 11:52 ` [PATCH 4.19 39/46] media: ov5640: Fix set format regression Greg Kroah-Hartman
2018-12-28 11:52 ` [PATCH 4.19 40/46] mm, memory_hotplug: initialize struct pages for the full memory section Greg Kroah-Hartman
2018-12-28 11:52 ` [PATCH 4.19 41/46] mm: thp: fix flags for pmd migration when split Greg Kroah-Hartman
2018-12-28 11:52 ` [PATCH 4.19 42/46] mm, page_alloc: fix has_unmovable_pages for HugePages Greg Kroah-Hartman
2018-12-28 11:52 ` [PATCH 4.19 43/46] mm: dont miss the last page because of round-off error Greg Kroah-Hartman
2018-12-28 11:52 ` [PATCH 4.19 44/46] Input: elantech - disable elan-i2c for P52 and P72 Greg Kroah-Hartman
2018-12-28 11:52 ` [PATCH 4.19 45/46] proc/sysctl: dont return ENOMEM on lookup when a table is unregistering Greg Kroah-Hartman
2018-12-28 11:52 ` [PATCH 4.19 46/46] drm/ioctl: Fix Spectre v1 vulnerabilities Greg Kroah-Hartman
2018-12-28 17:54 ` [PATCH 4.19 00/46] 4.19.13-stable review Dan Rue
2018-12-29 12:20   ` Greg Kroah-Hartman
2018-12-28 20:08 ` shuah
2018-12-29  9:55   ` Greg Kroah-Hartman
2018-12-28 21:29 ` Guenter Roeck
2018-12-29 12:20   ` Greg Kroah-Hartman
2018-12-29  9:01 ` Harsh Shandilya
2018-12-29 12:20   ` Greg Kroah-Hartman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20181228113125.558041477@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=axboe@kernel.dk \
    --cc=hch@lst.de \
    --cc=linux-kernel@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).