linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Christoph Hellwig <hch@lst.de>,
	"Martin K. Petersen" <martin.petersen@oracle.com>,
	Damien Le Moal <damien.lemoal@wdc.com>,
	Jens Axboe <axboe@kernel.dk>
Subject: [PATCH 5.2 65/66] sd_zbc: Fix report zones buffer allocation
Date: Fri, 26 Jul 2019 17:25:04 +0200	[thread overview]
Message-ID: <20190726152308.684968854@linuxfoundation.org> (raw)
In-Reply-To: <20190726152301.936055394@linuxfoundation.org>

From: Damien Le Moal <damien.lemoal@wdc.com>

commit b091ac616846a1da75b1f2566b41255ce7f0e0a6 upstream.

During disk scan and revalidation done with sd_revalidate(), the zones
of a zoned disk are checked using the helper function
blk_revalidate_disk_zones() if a configuration change is detected
(change in the number of zones or zone size). The function
blk_revalidate_disk_zones() issues report_zones calls that are very
large, that is, to obtain zone information for all zones of the disk
with a single command. The size of the report zones command buffer
necessary for such large request generally is lower than the disk
max_hw_sectors and KMALLOC_MAX_SIZE (4MB) and succeeds on boot (no
memory fragmentation), but often fail at run time (e.g. hot-plug
event). This causes the disk revalidation to fail and the disk
capacity to be changed to 0.

This problem can be avoided by using vmalloc() instead of kmalloc() for
the buffer allocation. To limit the amount of memory to be allocated,
this patch also introduces the arbitrary SD_ZBC_REPORT_MAX_ZONES
maximum number of zones to report with a single report zones command.
This limit may be lowered further to satisfy the disk max_hw_sectors
limit. Finally, to ensure that the vmalloc-ed buffer can always be
mapped in a request, the buffer size is further limited to at most
queue_max_segments() pages, allowing successful mapping of the buffer
even in the worst case scenario where none of the buffer pages are
contiguous.

Fixes: 515ce6061312 ("scsi: sd_zbc: Fix sd_zbc_report_zones() buffer allocation")
Fixes: e76239a3748c ("block: add a report_zones method")
Cc: stable@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


---
 drivers/scsi/sd_zbc.c |  104 ++++++++++++++++++++++++++++++++++++--------------
 1 file changed, 75 insertions(+), 29 deletions(-)

--- a/drivers/scsi/sd_zbc.c
+++ b/drivers/scsi/sd_zbc.c
@@ -9,6 +9,8 @@
  */
 
 #include <linux/blkdev.h>
+#include <linux/vmalloc.h>
+#include <linux/sched/mm.h>
 
 #include <asm/unaligned.h>
 
@@ -50,7 +52,7 @@ static void sd_zbc_parse_report(struct s
 /**
  * sd_zbc_do_report_zones - Issue a REPORT ZONES scsi command.
  * @sdkp: The target disk
- * @buf: Buffer to use for the reply
+ * @buf: vmalloc-ed buffer to use for the reply
  * @buflen: the buffer size
  * @lba: Start LBA of the report
  * @partial: Do partial report
@@ -79,7 +81,6 @@ static int sd_zbc_do_report_zones(struct
 	put_unaligned_be32(buflen, &cmd[10]);
 	if (partial)
 		cmd[14] = ZBC_REPORT_ZONE_PARTIAL;
-	memset(buf, 0, buflen);
 
 	result = scsi_execute_req(sdp, cmd, DMA_FROM_DEVICE,
 				  buf, buflen, &sshdr,
@@ -103,6 +104,53 @@ static int sd_zbc_do_report_zones(struct
 	return 0;
 }
 
+/*
+ * Maximum number of zones to get with one report zones command.
+ */
+#define SD_ZBC_REPORT_MAX_ZONES		8192U
+
+/**
+ * Allocate a buffer for report zones reply.
+ * @sdkp: The target disk
+ * @nr_zones: Maximum number of zones to report
+ * @buflen: Size of the buffer allocated
+ *
+ * Try to allocate a reply buffer for the number of requested zones.
+ * The size of the buffer allocated may be smaller than requested to
+ * satify the device constraint (max_hw_sectors, max_segments, etc).
+ *
+ * Return the address of the allocated buffer and update @buflen with
+ * the size of the allocated buffer.
+ */
+static void *sd_zbc_alloc_report_buffer(struct scsi_disk *sdkp,
+					unsigned int nr_zones, size_t *buflen)
+{
+	struct request_queue *q = sdkp->disk->queue;
+	size_t bufsize;
+	void *buf;
+
+	/*
+	 * Report zone buffer size should be at most 64B times the number of
+	 * zones requested plus the 64B reply header, but should be at least
+	 * SECTOR_SIZE for ATA devices.
+	 * Make sure that this size does not exceed the hardware capabilities.
+	 * Furthermore, since the report zone command cannot be split, make
+	 * sure that the allocated buffer can always be mapped by limiting the
+	 * number of pages allocated to the HBA max segments limit.
+	 */
+	nr_zones = min(nr_zones, SD_ZBC_REPORT_MAX_ZONES);
+	bufsize = roundup((nr_zones + 1) * 64, 512);
+	bufsize = min_t(size_t, bufsize,
+			queue_max_hw_sectors(q) << SECTOR_SHIFT);
+	bufsize = min_t(size_t, bufsize, queue_max_segments(q) << PAGE_SHIFT);
+
+	buf = vzalloc(bufsize);
+	if (buf)
+		*buflen = bufsize;
+
+	return buf;
+}
+
 /**
  * sd_zbc_report_zones - Disk report zones operation.
  * @disk: The target disk
@@ -118,30 +166,23 @@ int sd_zbc_report_zones(struct gendisk *
 			gfp_t gfp_mask)
 {
 	struct scsi_disk *sdkp = scsi_disk(disk);
-	unsigned int i, buflen, nrz = *nr_zones;
+	unsigned int i, nrz = *nr_zones;
 	unsigned char *buf;
-	size_t offset = 0;
+	size_t buflen = 0, offset = 0;
 	int ret = 0;
 
 	if (!sd_is_zoned(sdkp))
 		/* Not a zoned device */
 		return -EOPNOTSUPP;
 
-	/*
-	 * Get a reply buffer for the number of requested zones plus a header,
-	 * without exceeding the device maximum command size. For ATA disks,
-	 * buffers must be aligned to 512B.
-	 */
-	buflen = min(queue_max_hw_sectors(disk->queue) << 9,
-		     roundup((nrz + 1) * 64, 512));
-	buf = kmalloc(buflen, gfp_mask);
+	buf = sd_zbc_alloc_report_buffer(sdkp, nrz, &buflen);
 	if (!buf)
 		return -ENOMEM;
 
 	ret = sd_zbc_do_report_zones(sdkp, buf, buflen,
 			sectors_to_logical(sdkp->device, sector), true);
 	if (ret)
-		goto out_free_buf;
+		goto out;
 
 	nrz = min(nrz, get_unaligned_be32(&buf[0]) / 64);
 	for (i = 0; i < nrz; i++) {
@@ -152,8 +193,8 @@ int sd_zbc_report_zones(struct gendisk *
 
 	*nr_zones = nrz;
 
-out_free_buf:
-	kfree(buf);
+out:
+	kvfree(buf);
 
 	return ret;
 }
@@ -287,8 +328,6 @@ static int sd_zbc_check_zoned_characteri
 	return 0;
 }
 
-#define SD_ZBC_BUF_SIZE 131072U
-
 /**
  * sd_zbc_check_zones - Check the device capacity and zone sizes
  * @sdkp: Target disk
@@ -304,22 +343,28 @@ static int sd_zbc_check_zoned_characteri
  */
 static int sd_zbc_check_zones(struct scsi_disk *sdkp, u32 *zblocks)
 {
+	size_t bufsize, buflen;
+	unsigned int noio_flag;
 	u64 zone_blocks = 0;
 	sector_t max_lba, block = 0;
 	unsigned char *buf;
 	unsigned char *rec;
-	unsigned int buf_len;
-	unsigned int list_length;
 	int ret;
 	u8 same;
 
+	/* Do all memory allocations as if GFP_NOIO was specified */
+	noio_flag = memalloc_noio_save();
+
 	/* Get a buffer */
-	buf = kmalloc(SD_ZBC_BUF_SIZE, GFP_KERNEL);
-	if (!buf)
-		return -ENOMEM;
+	buf = sd_zbc_alloc_report_buffer(sdkp, SD_ZBC_REPORT_MAX_ZONES,
+					 &bufsize);
+	if (!buf) {
+		ret = -ENOMEM;
+		goto out;
+	}
 
 	/* Do a report zone to get max_lba and the same field */
-	ret = sd_zbc_do_report_zones(sdkp, buf, SD_ZBC_BUF_SIZE, 0, false);
+	ret = sd_zbc_do_report_zones(sdkp, buf, bufsize, 0, false);
 	if (ret)
 		goto out_free;
 
@@ -355,12 +400,12 @@ static int sd_zbc_check_zones(struct scs
 	do {
 
 		/* Parse REPORT ZONES header */
-		list_length = get_unaligned_be32(&buf[0]) + 64;
+		buflen = min_t(size_t, get_unaligned_be32(&buf[0]) + 64,
+			       bufsize);
 		rec = buf + 64;
-		buf_len = min(list_length, SD_ZBC_BUF_SIZE);
 
 		/* Parse zone descriptors */
-		while (rec < buf + buf_len) {
+		while (rec < buf + buflen) {
 			u64 this_zone_blocks = get_unaligned_be64(&rec[8]);
 
 			if (zone_blocks == 0) {
@@ -376,8 +421,8 @@ static int sd_zbc_check_zones(struct scs
 		}
 
 		if (block < sdkp->capacity) {
-			ret = sd_zbc_do_report_zones(sdkp, buf, SD_ZBC_BUF_SIZE,
-						     block, true);
+			ret = sd_zbc_do_report_zones(sdkp, buf, bufsize, block,
+						     true);
 			if (ret)
 				goto out_free;
 		}
@@ -408,7 +453,8 @@ out:
 	}
 
 out_free:
-	kfree(buf);
+	memalloc_noio_restore(noio_flag);
+	kvfree(buf);
 
 	return ret;
 }



  parent reply	other threads:[~2019-07-26 15:28 UTC|newest]

Thread overview: 78+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-07-26 15:23 [PATCH 5.2 00/66] 5.2.4-stable review Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.2 01/66] bnx2x: Prevent load reordering in tx completion processing Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.2 02/66] caif-hsi: fix possible deadlock in cfhsi_exit_module() Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.2 03/66] hv_netvsc: Fix extra rcu_read_unlock in netvsc_recv_callback() Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.2 04/66] igmp: fix memory leak in igmpv3_del_delrec() Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.2 05/66] ipv4: dont set IPv6 only flags to IPv4 addresses Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.2 06/66] ipv6: rt6_check should return NULL if from is NULL Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.2 07/66] ipv6: Unlink sibling route in case of failure Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.2 08/66] net: bcmgenet: use promisc for unsupported filters Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.2 09/66] net: dsa: mv88e6xxx: wait after reset deactivation Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.2 10/66] net: make skb_dst_force return true when dst is refcounted Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.2 11/66] net: neigh: fix multiple neigh timer scheduling Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.2 12/66] net: openvswitch: fix csum updates for MPLS actions Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.2 13/66] net: phy: sfp: hwmon: Fix scaling of RX power Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.2 14/66] net_sched: unset TCQ_F_CAN_BYPASS when adding filters Greg Kroah-Hartman
2019-07-27 21:24   ` Sasha Levin
2019-07-28  6:21     ` Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.2 15/66] net: stmmac: Re-work the queue selection for TSO packets Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.2 16/66] net/tls: make sure offload also gets the keys wiped Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.2 17/66] nfc: fix potential illegal memory access Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.2 18/66] r8169: fix issue with confused RX unit after PHY power-down on RTL8411b Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.2 19/66] rxrpc: Fix send on a connected, but unbound socket Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.2 20/66] sctp: fix error handling on stream scheduler initialization Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.2 21/66] sctp: not bind the socket in sctp_connect Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.2 22/66] sky2: Disable MSI on ASUS P6T Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.2 23/66] tcp: be more careful in tcp_fragment() Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.2 24/66] tcp: fix tcp_set_congestion_control() use from bpf hook Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.2 25/66] tcp: Reset bytes_acked and bytes_received when disconnecting Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.2 26/66] vrf: make sure skb->data contains ip header to make routing Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.2 27/66] net/mlx5e: IPoIB, Add error path in mlx5_rdma_setup_rn Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.2 28/66] net: bridge: mcast: fix stale nsrcs pointer in igmp3/mld2 report handling Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.2 29/66] net: bridge: mcast: fix stale ipv6 hdr pointer when handling v6 query Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.2 30/66] net: bridge: dont cache ether dest pointer on input Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.2 31/66] net: bridge: stp: dont cache eth dest pointer before skb pull Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.2 32/66] macsec: fix use-after-free of skb during RX Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.2 33/66] macsec: fix checksumming after decryption Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.2 34/66] netrom: fix a memory leak in nr_rx_frame() Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.2 35/66] netrom: hold sock when setting skb->destructor Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.2 36/66] selftests: txring_overwrite: fix incorrect test of mmap() return value Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.2 37/66] net/tls: fix poll ignoring partially copied records Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.2 38/66] net/tls: reject offload of TLS 1.3 Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.2 39/66] net/mlx5e: Fix port tunnel GRE entropy control Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.2 40/66] net/mlx5e: Rx, Fix checksum calculation for new hardware Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.2 41/66] net/mlx5e: Fix return value from timeout recover function Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.2 42/66] net/mlx5e: Fix error flow in tx reporter diagnose Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.2 43/66] bnxt_en: Fix VNIC accounting when enabling aRFS on 57500 chips Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.2 44/66] mlxsw: spectrum_dcb: Configure DSCP map as the last rule is removed Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.2 45/66] net/mlx5: E-Switch, Fix default encap mode Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.2 46/66] mlxsw: spectrum: Do not process learned records with a dummy FID Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.2 47/66] dma-buf: balance refcount inbalance Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.2 48/66] dma-buf: Discard old fence_excl on retrying get_fences_rcu for realloc Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.2 49/66] Revert "gpio/spi: Fix spi-gpio regression on active high CS" Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.2 50/66] gpiolib: of: fix a memory leak in of_gpio_flags_quirks() Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.2 51/66] gpio: davinci: silence error prints in case of EPROBE_DEFER Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.2 52/66] MIPS: lb60: Fix pin mappings Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.2 53/66] perf script: Assume native_arch for pipe mode Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.2 54/66] perf/core: Fix exclusive events grouping Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.2 55/66] perf/core: Fix race between close() and fork() Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.2 56/66] ext4: dont allow any modifications to an immutable file Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.2 57/66] ext4: enforce the immutable flag on open files Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.2 58/66] mm: add filemap_fdatawait_range_keep_errors() Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.2 59/66] jbd2: introduce jbd2_inode dirty range scoping Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.2 60/66] ext4: use " Greg Kroah-Hartman
2019-07-26 15:25 ` [PATCH 5.2 61/66] ext4: allow directory holes Greg Kroah-Hartman
2019-07-26 15:25 ` [PATCH 5.2 62/66] KVM: nVMX: do not use dangling shadow VMCS after guest reset Greg Kroah-Hartman
2019-07-26 15:25 ` [PATCH 5.2 63/66] KVM: nVMX: Clear pending KVM_REQ_GET_VMCS12_PAGES when leaving nested Greg Kroah-Hartman
2019-07-26 15:25 ` [PATCH 5.2 64/66] Revert "kvm: x86: Use task structs fpu field for user" Greg Kroah-Hartman
2019-07-26 15:25 ` Greg Kroah-Hartman [this message]
2019-07-26 15:25 ` [PATCH 5.2 66/66] block: Limit zone array allocation size Greg Kroah-Hartman
2019-07-27  2:14 ` [PATCH 5.2 00/66] 5.2.4-stable review kernelci.org bot
2019-07-27  2:33 ` shuah
2019-07-27 10:50   ` Greg Kroah-Hartman
2019-07-27  5:35 ` Naresh Kamboju
2019-07-27 10:49   ` Greg Kroah-Hartman
2019-07-27 16:07 ` Guenter Roeck
2019-07-28  6:22   ` Greg Kroah-Hartman
2019-07-29  9:03 ` Jon Hunter
2019-07-29 15:12   ` Greg Kroah-Hartman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190726152308.684968854@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=axboe@kernel.dk \
    --cc=damien.lemoal@wdc.com \
    --cc=hch@lst.de \
    --cc=linux-kernel@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).