* [PATCH 0/3] Fix zone revalidation memory allocation failures @ 2019-06-25 2:46 Damien Le Moal 2019-06-25 2:46 ` [PATCH 1/3] block: Allow mapping of vmalloc-ed buffers Damien Le Moal ` (2 more replies) 0 siblings, 3 replies; 14+ messages in thread From: Damien Le Moal @ 2019-06-25 2:46 UTC (permalink / raw) To: linux-scsi, Martin K . Petersen, linux-block, Jens Axboe Cc: Christoph Hellwig, Bart Van Assche This series addresses a reccuring problem with zone revalidation failures observed during extensive testing with memory constrained system and device hot-plugging. The problem source is failure to allocate large memory areas with alloc_pages() or kmalloc() in blk_revalidate_disk_zones() to store the disk array of zones (struct blk_zone) or in sd_zbc_report_zones() for the report zones command reply buffer. The solution proposed here is to: 1) limit the number of zones to be reported with a single report zones command execution, and 2) Use vmalloc to allocate large-ish arrays and buffers in place of alloc_pages() and kmalloc(). With these changes, tests do not show any zone revalidation failures while not impacting the time taken for a disk initial zone inspection during device scan. Damien Le Moal (3): block: Allow mapping of vmalloc-ed buffers sd_zbc: Fix report zones buffer allocation block: Limit zone array allocation size block/bio.c | 8 ++++- block/blk-zoned.c | 29 +++++++--------- drivers/scsi/sd_zbc.c | 79 +++++++++++++++++++++++++++++++----------- include/linux/blkdev.h | 5 +++ 4 files changed, 84 insertions(+), 37 deletions(-) -- 2.21.0 ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH 1/3] block: Allow mapping of vmalloc-ed buffers 2019-06-25 2:46 [PATCH 0/3] Fix zone revalidation memory allocation failures Damien Le Moal @ 2019-06-25 2:46 ` Damien Le Moal 2019-06-25 3:24 ` Chaitanya Kulkarni ` (2 more replies) 2019-06-25 2:46 ` [PATCH 2/3] sd_zbc: Fix report zones buffer allocation Damien Le Moal 2019-06-25 2:46 ` [PATCH 3/3] block: Limit zone array allocation size Damien Le Moal 2 siblings, 3 replies; 14+ messages in thread From: Damien Le Moal @ 2019-06-25 2:46 UTC (permalink / raw) To: linux-scsi, Martin K . Petersen, linux-block, Jens Axboe Cc: Christoph Hellwig, Bart Van Assche To allow the SCSI subsystem scsi_execute_req() function to issue requests using large buffers that are better allocated with vmalloc() rather than kmalloc(), modify bio_map_kern() to allow passing a buffer allocated with the vmalloc() function. To do so, simply test the buffer address using is_vmalloc_addr() and use vmalloc_to_page() instead of virt_to_page() to obtain the pages of vmalloc-ed buffers. Fixes: 515ce6061312 ("scsi: sd_zbc: Fix sd_zbc_report_zones() buffer allocation") Fixes: e76239a3748c ("block: add a report_zones method") Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> --- block/bio.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/block/bio.c b/block/bio.c index ce797d73bb43..05afcaf655f3 100644 --- a/block/bio.c +++ b/block/bio.c @@ -1501,6 +1501,8 @@ struct bio *bio_map_kern(struct request_queue *q, void *data, unsigned int len, unsigned long end = (kaddr + len + PAGE_SIZE - 1) >> PAGE_SHIFT; unsigned long start = kaddr >> PAGE_SHIFT; const int nr_pages = end - start; + bool is_vmalloc = is_vmalloc_addr(data); + struct page *page; int offset, i; struct bio *bio; @@ -1518,7 +1520,11 @@ struct bio *bio_map_kern(struct request_queue *q, void *data, unsigned int len, if (bytes > len) bytes = len; - if (bio_add_pc_page(q, bio, virt_to_page(data), bytes, + if (is_vmalloc) + page = vmalloc_to_page(data); + else + page = virt_to_page(data); + if (bio_add_pc_page(q, bio, page, bytes, offset) < bytes) { /* we don't support partial mappings */ bio_put(bio); -- 2.21.0 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH 1/3] block: Allow mapping of vmalloc-ed buffers 2019-06-25 2:46 ` [PATCH 1/3] block: Allow mapping of vmalloc-ed buffers Damien Le Moal @ 2019-06-25 3:24 ` Chaitanya Kulkarni 2019-06-25 15:59 ` Bart Van Assche 2019-06-25 15:59 ` Bart Van Assche 2019-06-25 16:22 ` Christoph Hellwig 2 siblings, 1 reply; 14+ messages in thread From: Chaitanya Kulkarni @ 2019-06-25 3:24 UTC (permalink / raw) To: Damien Le Moal, linux-scsi, Martin K . Petersen, linux-block, Jens Axboe Cc: Christoph Hellwig, Bart Van Assche Looks good with one nit, can be done at the time of applying patch. Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> On 6/24/19 7:46 PM, Damien Le Moal wrote: > To allow the SCSI subsystem scsi_execute_req() function to issue > requests using large buffers that are better allocated with vmalloc() > rather than kmalloc(), modify bio_map_kern() to allow passing a buffer > allocated with the vmalloc() function. To do so, simply test the buffer > address using is_vmalloc_addr() and use vmalloc_to_page() instead of > virt_to_page() to obtain the pages of vmalloc-ed buffers. > > Fixes: 515ce6061312 ("scsi: sd_zbc: Fix sd_zbc_report_zones() buffer allocation") > Fixes: e76239a3748c ("block: add a report_zones method") > Cc: stable@vger.kernel.org > Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> > --- > block/bio.c | 8 +++++++- > 1 file changed, 7 insertions(+), 1 deletion(-) > > diff --git a/block/bio.c b/block/bio.c > index ce797d73bb43..05afcaf655f3 100644 > --- a/block/bio.c > +++ b/block/bio.c > @@ -1501,6 +1501,8 @@ struct bio *bio_map_kern(struct request_queue *q, void *data, unsigned int len, > unsigned long end = (kaddr + len + PAGE_SIZE - 1) >> PAGE_SHIFT; > unsigned long start = kaddr >> PAGE_SHIFT; > const int nr_pages = end - start; > + bool is_vmalloc = is_vmalloc_addr(data); > + struct page *page; > int offset, i; > struct bio *bio; > > @@ -1518,7 +1520,11 @@ struct bio *bio_map_kern(struct request_queue *q, void *data, unsigned int len, > if (bytes > len) > bytes = len; > > - if (bio_add_pc_page(q, bio, virt_to_page(data), bytes, > + if (is_vmalloc) nit:- Can we use is_vmalloc_addr() call directly so that "if (is_vmalloc)" -> "if (is_vmalloc_addr(data))" and remove is_vmalloc variable. > + page = vmalloc_to_page(data); > + else > + page = virt_to_page(data); > + if (bio_add_pc_page(q, bio, page, bytes, > offset) < bytes) { > /* we don't support partial mappings */ > bio_put(bio); > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 1/3] block: Allow mapping of vmalloc-ed buffers 2019-06-25 3:24 ` Chaitanya Kulkarni @ 2019-06-25 15:59 ` Bart Van Assche 2019-06-25 16:06 ` Chaitanya Kulkarni 0 siblings, 1 reply; 14+ messages in thread From: Bart Van Assche @ 2019-06-25 15:59 UTC (permalink / raw) To: Chaitanya Kulkarni, Damien Le Moal, linux-scsi, Martin K . Petersen, linux-block, Jens Axboe Cc: Christoph Hellwig On 6/24/19 8:24 PM, Chaitanya Kulkarni wrote: > nit:- Can we use is_vmalloc_addr() call directly so that > "if (is_vmalloc)" -> "if (is_vmalloc_addr(data))" and remove is_vmalloc > variable. That would change a single call of is_vmalloc_addr() into multiple? Bart. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 1/3] block: Allow mapping of vmalloc-ed buffers 2019-06-25 15:59 ` Bart Van Assche @ 2019-06-25 16:06 ` Chaitanya Kulkarni 2019-06-26 1:38 ` Damien Le Moal 0 siblings, 1 reply; 14+ messages in thread From: Chaitanya Kulkarni @ 2019-06-25 16:06 UTC (permalink / raw) To: Bart Van Assche, Damien Le Moal, linux-scsi, Martin K . Petersen, linux-block, Jens Axboe Cc: Christoph Hellwig On 06/25/2019 08:59 AM, Bart Van Assche wrote: > On 6/24/19 8:24 PM, Chaitanya Kulkarni wrote: >> nit:- Can we use is_vmalloc_addr() call directly so that >> "if (is_vmalloc)" -> "if (is_vmalloc_addr(data))" and remove is_vmalloc >> variable. > That would change a single call of is_vmalloc_addr() into multiple? Well is_vmalloc_addr() it is an in-line helper with address comparison. is it too expensive to have such a comparison in the loop ? > > Bart. > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 1/3] block: Allow mapping of vmalloc-ed buffers 2019-06-25 16:06 ` Chaitanya Kulkarni @ 2019-06-26 1:38 ` Damien Le Moal 2019-06-26 4:19 ` Chaitanya Kulkarni 0 siblings, 1 reply; 14+ messages in thread From: Damien Le Moal @ 2019-06-26 1:38 UTC (permalink / raw) To: Chaitanya Kulkarni, Bart Van Assche, linux-scsi, Martin K . Petersen, linux-block, Jens Axboe Cc: Christoph Hellwig On 2019/06/26 1:06, Chaitanya Kulkarni wrote: > On 06/25/2019 08:59 AM, Bart Van Assche wrote: >> On 6/24/19 8:24 PM, Chaitanya Kulkarni wrote: >>> nit:- Can we use is_vmalloc_addr() call directly so that >>> "if (is_vmalloc)" -> "if (is_vmalloc_addr(data))" and remove is_vmalloc >>> variable. >> That would change a single call of is_vmalloc_addr() into multiple? > > Well is_vmalloc_addr() it is an in-line helper with address comparison. > > is it too expensive to have such a comparison in the loop ? Probably not, but I do not see the point in calling it for every page either since the cost of the additional bool on the stack is likely also very low. -- Damien Le Moal Western Digital Research ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 1/3] block: Allow mapping of vmalloc-ed buffers 2019-06-26 1:38 ` Damien Le Moal @ 2019-06-26 4:19 ` Chaitanya Kulkarni 0 siblings, 0 replies; 14+ messages in thread From: Chaitanya Kulkarni @ 2019-06-26 4:19 UTC (permalink / raw) To: Damien Le Moal, Bart Van Assche, linux-scsi, Martin K . Petersen, linux-block, Jens Axboe Cc: Christoph Hellwig On 6/25/19 6:38 PM, Damien Le Moal wrote: > On 2019/06/26 1:06, Chaitanya Kulkarni wrote: >> On 06/25/2019 08:59 AM, Bart Van Assche wrote: >>> On 6/24/19 8:24 PM, Chaitanya Kulkarni wrote: >>>> nit:- Can we use is_vmalloc_addr() call directly so that >>>> "if (is_vmalloc)" -> "if (is_vmalloc_addr(data))" and remove is_vmalloc >>>> variable. >>> That would change a single call of is_vmalloc_addr() into multiple? >> >> Well is_vmalloc_addr() it is an in-line helper with address comparison. >> >> is it too expensive to have such a comparison in the loop ? > > Probably not, but I do not see the point in calling it for every page either > since the cost of the additional bool on the stack is likely also very low. > Okay. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 1/3] block: Allow mapping of vmalloc-ed buffers 2019-06-25 2:46 ` [PATCH 1/3] block: Allow mapping of vmalloc-ed buffers Damien Le Moal 2019-06-25 3:24 ` Chaitanya Kulkarni @ 2019-06-25 15:59 ` Bart Van Assche 2019-06-25 16:22 ` Christoph Hellwig 2 siblings, 0 replies; 14+ messages in thread From: Bart Van Assche @ 2019-06-25 15:59 UTC (permalink / raw) To: Damien Le Moal, linux-scsi, Martin K . Petersen, linux-block, Jens Axboe Cc: Christoph Hellwig On 6/24/19 7:46 PM, Damien Le Moal wrote: > To allow the SCSI subsystem scsi_execute_req() function to issue > requests using large buffers that are better allocated with vmalloc() > rather than kmalloc(), modify bio_map_kern() to allow passing a buffer > allocated with the vmalloc() function. To do so, simply test the buffer > address using is_vmalloc_addr() and use vmalloc_to_page() instead of > virt_to_page() to obtain the pages of vmalloc-ed buffers. > > Fixes: 515ce6061312 ("scsi: sd_zbc: Fix sd_zbc_report_zones() buffer allocation") > Fixes: e76239a3748c ("block: add a report_zones method") > Cc: stable@vger.kernel.org > Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> > --- > block/bio.c | 8 +++++++- > 1 file changed, 7 insertions(+), 1 deletion(-) > > diff --git a/block/bio.c b/block/bio.c > index ce797d73bb43..05afcaf655f3 100644 > --- a/block/bio.c > +++ b/block/bio.c > @@ -1501,6 +1501,8 @@ struct bio *bio_map_kern(struct request_queue *q, void *data, unsigned int len, > unsigned long end = (kaddr + len + PAGE_SIZE - 1) >> PAGE_SHIFT; > unsigned long start = kaddr >> PAGE_SHIFT; > const int nr_pages = end - start; > + bool is_vmalloc = is_vmalloc_addr(data); > + struct page *page; > int offset, i; > struct bio *bio; > > @@ -1518,7 +1520,11 @@ struct bio *bio_map_kern(struct request_queue *q, void *data, unsigned int len, > if (bytes > len) > bytes = len; > > - if (bio_add_pc_page(q, bio, virt_to_page(data), bytes, > + if (is_vmalloc) > + page = vmalloc_to_page(data); > + else > + page = virt_to_page(data); > + if (bio_add_pc_page(q, bio, page, bytes, > offset) < bytes) { > /* we don't support partial mappings */ > bio_put(bio); Reviewed-by: Bart Van Assche <bvanassche@acm.org> ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 1/3] block: Allow mapping of vmalloc-ed buffers 2019-06-25 2:46 ` [PATCH 1/3] block: Allow mapping of vmalloc-ed buffers Damien Le Moal 2019-06-25 3:24 ` Chaitanya Kulkarni 2019-06-25 15:59 ` Bart Van Assche @ 2019-06-25 16:22 ` Christoph Hellwig 2 siblings, 0 replies; 14+ messages in thread From: Christoph Hellwig @ 2019-06-25 16:22 UTC (permalink / raw) To: Damien Le Moal Cc: linux-scsi, Martin K . Petersen, linux-block, Jens Axboe, Christoph Hellwig, Bart Van Assche On Tue, Jun 25, 2019 at 11:46:23AM +0900, Damien Le Moal wrote: > To allow the SCSI subsystem scsi_execute_req() function to issue > requests using large buffers that are better allocated with vmalloc() > rather than kmalloc(), modify bio_map_kern() to allow passing a buffer > allocated with the vmalloc() function. To do so, simply test the buffer > address using is_vmalloc_addr() and use vmalloc_to_page() instead of > virt_to_page() to obtain the pages of vmalloc-ed buffers. This is broken on architectures with VIVT caches. You need to flush and invalidate the caches based on the virtual address on those before performing DMA operations. ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH 2/3] sd_zbc: Fix report zones buffer allocation 2019-06-25 2:46 [PATCH 0/3] Fix zone revalidation memory allocation failures Damien Le Moal 2019-06-25 2:46 ` [PATCH 1/3] block: Allow mapping of vmalloc-ed buffers Damien Le Moal @ 2019-06-25 2:46 ` Damien Le Moal 2019-06-25 10:37 ` kbuild test robot 2019-06-25 16:07 ` Bart Van Assche 2019-06-25 2:46 ` [PATCH 3/3] block: Limit zone array allocation size Damien Le Moal 2 siblings, 2 replies; 14+ messages in thread From: Damien Le Moal @ 2019-06-25 2:46 UTC (permalink / raw) To: linux-scsi, Martin K . Petersen, linux-block, Jens Axboe Cc: Christoph Hellwig, Bart Van Assche During disk scan and revalidation done with sd_revalidate(), the zones of a zoned disk are checked using the helper function blk_revalidate_disk_zones() if a configuration change is detected (change in the number of zones or zone size). The function blk_revalidate_disk_zones() issues report_zones calls that are very large, that is, to obtain zone information for all zones of the disk with a single command. The size of the report zones command buffer necessary for such large request generally is lower than the disk max_hw_sectors and KMALLOC_MAX_SIZE (4MB) and succeeds on boot (no memory fragmentation), but often fail at run time (e.g. hot-plug event). This causes the disk revalidation to fail and the disk capacity to be changed to 0. This problem can be avoided by using vmalloc() instead of kmalloc() for the buffer allocation. To limit the amount of memory to be allocated, this patch also introduces the arbitrary SD_ZBC_REPORT_MAX_ZONES maximum number of zones to report with a single report zones command. This limit may be lowered further to satisfy the disk max_hw_sectors limit. Finally, to ensure that the vmalloc-ed buffer can always be mapped in a request, the buffer size is further limited to at most queue_max_segments() pages, allowing successful mapping of the buffer even in the worst case scenario where none of the buffer pages are contiguous. Fixes: 515ce6061312 ("scsi: sd_zbc: Fix sd_zbc_report_zones() buffer allocation") Fixes: e76239a3748c ("block: add a report_zones method") Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> --- drivers/scsi/sd_zbc.c | 79 ++++++++++++++++++++++++++++++++----------- 1 file changed, 59 insertions(+), 20 deletions(-) diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c index 7334024b64f1..b12444d6fc95 100644 --- a/drivers/scsi/sd_zbc.c +++ b/drivers/scsi/sd_zbc.c @@ -103,6 +103,48 @@ static int sd_zbc_do_report_zones(struct scsi_disk *sdkp, unsigned char *buf, return 0; } +/* + * Maximum number of zones to get with one report zones command. + */ +#define SD_ZBC_REPORT_MAX_ZONES 8192U + +/** + * Allocate a buffer for report zones reply. + * @disk: The target disk + * @nr_zones: Maximum number of zones to report + * @buflen: Size of the buffer allocated + * @gfp_mask: Memory allocation mask + * + */ +static void *sd_zbc_alloc_report_buffer(struct request_queue *q, + unsigned int nr_zones, size_t *buflen, + gfp_t gfp_mask) +{ + size_t bufsize; + void *buf; + + /* + * Report zone buffer size should be at most 64B times the number of + * zones requested plus the 64B reply header, but should be at least + * SECTOR_SIZE for ATA devices. + * Make sure that this size does not exceed the hardware capabilities. + * Furthermore, since the report zone command cannot be split, make + * sure that the allocated buffer can always be mapped by limiting the + * number of pages allocated to the HBA max segments limit. + */ + nr_zones = min(nr_zones, SD_ZBC_REPORT_MAX_ZONES); + bufsize = roundup((nr_zones + 1) * 64, 512); + bufsize = min_t(size_t, bufsize, + queue_max_hw_sectors(q) << SECTOR_SHIFT); + bufsize = min_t(size_t, bufsize, queue_max_segments(q) << PAGE_SHIFT); + + buf = __vmalloc(bufsize, gfp_mask, PAGE_KERNEL); + if (buf) + *buflen = bufsize; + + return buf; +} + /** * sd_zbc_report_zones - Disk report zones operation. * @disk: The target disk @@ -118,9 +160,9 @@ int sd_zbc_report_zones(struct gendisk *disk, sector_t sector, gfp_t gfp_mask) { struct scsi_disk *sdkp = scsi_disk(disk); - unsigned int i, buflen, nrz = *nr_zones; + unsigned int i, nrz = *nr_zones; unsigned char *buf; - size_t offset = 0; + size_t buflen = 0, offset = 0; int ret = 0; if (!sd_is_zoned(sdkp)) @@ -132,16 +174,14 @@ int sd_zbc_report_zones(struct gendisk *disk, sector_t sector, * without exceeding the device maximum command size. For ATA disks, * buffers must be aligned to 512B. */ - buflen = min(queue_max_hw_sectors(disk->queue) << 9, - roundup((nrz + 1) * 64, 512)); - buf = kmalloc(buflen, gfp_mask); + buf = sd_zbc_alloc_report_buffer(disk->queue, nrz, &buflen, gfp_mask); if (!buf) return -ENOMEM; ret = sd_zbc_do_report_zones(sdkp, buf, buflen, sectors_to_logical(sdkp->device, sector), true); if (ret) - goto out_free_buf; + goto out; nrz = min(nrz, get_unaligned_be32(&buf[0]) / 64); for (i = 0; i < nrz; i++) { @@ -152,8 +192,8 @@ int sd_zbc_report_zones(struct gendisk *disk, sector_t sector, *nr_zones = nrz; -out_free_buf: - kfree(buf); +out: + kvfree(buf); return ret; } @@ -287,8 +327,6 @@ static int sd_zbc_check_zoned_characteristics(struct scsi_disk *sdkp, return 0; } -#define SD_ZBC_BUF_SIZE 131072U - /** * sd_zbc_check_zones - Check the device capacity and zone sizes * @sdkp: Target disk @@ -304,22 +342,23 @@ static int sd_zbc_check_zoned_characteristics(struct scsi_disk *sdkp, */ static int sd_zbc_check_zones(struct scsi_disk *sdkp, u32 *zblocks) { + size_t bufsize, buflen; u64 zone_blocks = 0; sector_t max_lba, block = 0; unsigned char *buf; unsigned char *rec; - unsigned int buf_len; - unsigned int list_length; int ret; u8 same; /* Get a buffer */ - buf = kmalloc(SD_ZBC_BUF_SIZE, GFP_KERNEL); + buf = sd_zbc_alloc_report_buffer(sdkp->disk->queue, + SD_ZBC_REPORT_MAX_ZONES, + &bufsize, GFP_NOIO); if (!buf) return -ENOMEM; /* Do a report zone to get max_lba and the same field */ - ret = sd_zbc_do_report_zones(sdkp, buf, SD_ZBC_BUF_SIZE, 0, false); + ret = sd_zbc_do_report_zones(sdkp, buf, bufsize, 0, false); if (ret) goto out_free; @@ -355,12 +394,12 @@ static int sd_zbc_check_zones(struct scsi_disk *sdkp, u32 *zblocks) do { /* Parse REPORT ZONES header */ - list_length = get_unaligned_be32(&buf[0]) + 64; + buflen = min_t(size_t, get_unaligned_be32(&buf[0]) + 64, + bufsize); rec = buf + 64; - buf_len = min(list_length, SD_ZBC_BUF_SIZE); /* Parse zone descriptors */ - while (rec < buf + buf_len) { + while (rec < buf + buflen) { u64 this_zone_blocks = get_unaligned_be64(&rec[8]); if (zone_blocks == 0) { @@ -376,8 +415,8 @@ static int sd_zbc_check_zones(struct scsi_disk *sdkp, u32 *zblocks) } if (block < sdkp->capacity) { - ret = sd_zbc_do_report_zones(sdkp, buf, SD_ZBC_BUF_SIZE, - block, true); + ret = sd_zbc_do_report_zones(sdkp, buf, bufsize, block, + true); if (ret) goto out_free; } @@ -408,7 +447,7 @@ static int sd_zbc_check_zones(struct scsi_disk *sdkp, u32 *zblocks) } out_free: - kfree(buf); + kvfree(buf); return ret; } -- 2.21.0 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH 2/3] sd_zbc: Fix report zones buffer allocation 2019-06-25 2:46 ` [PATCH 2/3] sd_zbc: Fix report zones buffer allocation Damien Le Moal @ 2019-06-25 10:37 ` kbuild test robot 2019-06-25 16:07 ` Bart Van Assche 1 sibling, 0 replies; 14+ messages in thread From: kbuild test robot @ 2019-06-25 10:37 UTC (permalink / raw) To: Damien Le Moal Cc: kbuild-all, linux-scsi, Martin K . Petersen, linux-block, Jens Axboe, Christoph Hellwig, Bart Van Assche [-- Attachment #1: Type: text/plain, Size: 3132 bytes --] Hi Damien, I love your patch! Yet something to improve: [auto build test ERROR on block/for-next] [also build test ERROR on v5.2-rc6 next-20190621] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Damien-Le-Moal/Fix-zone-revalidation-memory-allocation-failures/20190625-175053 base: https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git for-next config: sparc64-allyesconfig (attached as .config) compiler: sparc64-linux-gcc (GCC) 7.4.0 reproduce: wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # save the attached .config to linux build tree GCC_VERSION=7.4.0 make.cross ARCH=sparc64 If you fix the issue, kindly add following tag Reported-by: kbuild test robot <lkp@intel.com> All error/warnings (new ones prefixed by >>): drivers/scsi/sd_zbc.c: In function 'sd_zbc_alloc_report_buffer': >> drivers/scsi/sd_zbc.c:141:8: error: implicit declaration of function '__vmalloc'; did you mean '__kmalloc'? [-Werror=implicit-function-declaration] buf = __vmalloc(bufsize, gfp_mask, PAGE_KERNEL); ^~~~~~~~~ __kmalloc >> drivers/scsi/sd_zbc.c:141:6: warning: assignment makes pointer from integer without a cast [-Wint-conversion] buf = __vmalloc(bufsize, gfp_mask, PAGE_KERNEL); ^ cc1: some warnings being treated as errors vim +141 drivers/scsi/sd_zbc.c 110 111 /** 112 * Allocate a buffer for report zones reply. 113 * @disk: The target disk 114 * @nr_zones: Maximum number of zones to report 115 * @buflen: Size of the buffer allocated 116 * @gfp_mask: Memory allocation mask 117 * 118 */ 119 static void *sd_zbc_alloc_report_buffer(struct request_queue *q, 120 unsigned int nr_zones, size_t *buflen, 121 gfp_t gfp_mask) 122 { 123 size_t bufsize; 124 void *buf; 125 126 /* 127 * Report zone buffer size should be at most 64B times the number of 128 * zones requested plus the 64B reply header, but should be at least 129 * SECTOR_SIZE for ATA devices. 130 * Make sure that this size does not exceed the hardware capabilities. 131 * Furthermore, since the report zone command cannot be split, make 132 * sure that the allocated buffer can always be mapped by limiting the 133 * number of pages allocated to the HBA max segments limit. 134 */ 135 nr_zones = min(nr_zones, SD_ZBC_REPORT_MAX_ZONES); 136 bufsize = roundup((nr_zones + 1) * 64, 512); 137 bufsize = min_t(size_t, bufsize, 138 queue_max_hw_sectors(q) << SECTOR_SHIFT); 139 bufsize = min_t(size_t, bufsize, queue_max_segments(q) << PAGE_SHIFT); 140 > 141 buf = __vmalloc(bufsize, gfp_mask, PAGE_KERNEL); 142 if (buf) 143 *buflen = bufsize; 144 145 return buf; 146 } 147 --- 0-DAY kernel test infrastructure Open Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation [-- Attachment #2: .config.gz --] [-- Type: application/gzip, Size: 58537 bytes --] ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/3] sd_zbc: Fix report zones buffer allocation 2019-06-25 2:46 ` [PATCH 2/3] sd_zbc: Fix report zones buffer allocation Damien Le Moal 2019-06-25 10:37 ` kbuild test robot @ 2019-06-25 16:07 ` Bart Van Assche 1 sibling, 0 replies; 14+ messages in thread From: Bart Van Assche @ 2019-06-25 16:07 UTC (permalink / raw) To: Damien Le Moal, linux-scsi, Martin K . Petersen, linux-block, Jens Axboe Cc: Christoph Hellwig On 6/24/19 7:46 PM, Damien Le Moal wrote: > During disk scan and revalidation done with sd_revalidate(), the zones > of a zoned disk are checked using the helper function > blk_revalidate_disk_zones() if a configuration change is detected > (change in the number of zones or zone size). The function > blk_revalidate_disk_zones() issues report_zones calls that are very > large, that is, to obtain zone information for all zones of the disk > with a single command. The size of the report zones command buffer > necessary for such large request generally is lower than the disk > max_hw_sectors and KMALLOC_MAX_SIZE (4MB) and succeeds on boot (no > memory fragmentation), but often fail at run time (e.g. hot-plug > event). This causes the disk revalidation to fail and the disk > capacity to be changed to 0. > > This problem can be avoided by using vmalloc() instead of kmalloc() for > the buffer allocation. To limit the amount of memory to be allocated, > this patch also introduces the arbitrary SD_ZBC_REPORT_MAX_ZONES > maximum number of zones to report with a single report zones command. > This limit may be lowered further to satisfy the disk max_hw_sectors > limit. Finally, to ensure that the vmalloc-ed buffer can always be > mapped in a request, the buffer size is further limited to at most > queue_max_segments() pages, allowing successful mapping of the buffer > even in the worst case scenario where none of the buffer pages are > contiguous. Reviewed-by: Bart Van Assche <bvanassche@acm.org> ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH 3/3] block: Limit zone array allocation size 2019-06-25 2:46 [PATCH 0/3] Fix zone revalidation memory allocation failures Damien Le Moal 2019-06-25 2:46 ` [PATCH 1/3] block: Allow mapping of vmalloc-ed buffers Damien Le Moal 2019-06-25 2:46 ` [PATCH 2/3] sd_zbc: Fix report zones buffer allocation Damien Le Moal @ 2019-06-25 2:46 ` Damien Le Moal 2019-06-25 16:10 ` Bart Van Assche 2 siblings, 1 reply; 14+ messages in thread From: Damien Le Moal @ 2019-06-25 2:46 UTC (permalink / raw) To: linux-scsi, Martin K . Petersen, linux-block, Jens Axboe Cc: Christoph Hellwig, Bart Van Assche Limit the size of the struct blk_zone array used in blk_revalidate_disk_zones() to avoid memory allocation failures leading to disk revalidation failure. Further reduce the likelyhood of these failures by using kvmalloc() instead of directly allocating contiguous pages. Fixes: 515ce6061312 ("scsi: sd_zbc: Fix sd_zbc_report_zones() buffer allocation") Fixes: e76239a3748c ("block: add a report_zones method") Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> --- block/blk-zoned.c | 29 +++++++++++++---------------- include/linux/blkdev.h | 5 +++++ 2 files changed, 18 insertions(+), 16 deletions(-) diff --git a/block/blk-zoned.c b/block/blk-zoned.c index ae7e91bd0618..26f878b9b5f5 100644 --- a/block/blk-zoned.c +++ b/block/blk-zoned.c @@ -373,22 +373,20 @@ static inline unsigned long *blk_alloc_zone_bitmap(int node, * Allocate an array of struct blk_zone to get nr_zones zone information. * The allocated array may be smaller than nr_zones. */ -static struct blk_zone *blk_alloc_zones(int node, unsigned int *nr_zones) +static struct blk_zone *blk_alloc_zones(unsigned int *nr_zones) { - size_t size = *nr_zones * sizeof(struct blk_zone); - struct page *page; - int order; - - for (order = get_order(size); order >= 0; order--) { - page = alloc_pages_node(node, GFP_NOIO | __GFP_ZERO, order); - if (page) { - *nr_zones = min_t(unsigned int, *nr_zones, - (PAGE_SIZE << order) / sizeof(struct blk_zone)); - return page_address(page); - } + struct blk_zone *zones; + size_t nrz = min(*nr_zones, BLK_ZONED_REPORT_MAX_ZONES); + + zones = kvcalloc(nrz, sizeof(struct blk_zone), GFP_NOIO); + if (!zones) { + *nr_zones = 0; + return NULL; } - return NULL; + *nr_zones = nrz; + + return zones; } void blk_queue_free_zone_bitmaps(struct request_queue *q) @@ -443,7 +441,7 @@ int blk_revalidate_disk_zones(struct gendisk *disk) /* Get zone information and initialize seq_zones_bitmap */ rep_nr_zones = nr_zones; - zones = blk_alloc_zones(q->node, &rep_nr_zones); + zones = blk_alloc_zones(&rep_nr_zones); if (!zones) goto out; @@ -480,8 +478,7 @@ int blk_revalidate_disk_zones(struct gendisk *disk) blk_mq_unfreeze_queue(q); out: - free_pages((unsigned long)zones, - get_order(rep_nr_zones * sizeof(struct blk_zone))); + kvfree(zones); kfree(seq_zones_wlock); kfree(seq_zones_bitmap); diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 592669bcc536..f7faac856017 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -344,6 +344,11 @@ struct queue_limits { #ifdef CONFIG_BLK_DEV_ZONED +/* + * Maximum number of zones to report with a single report zones command. + */ +#define BLK_ZONED_REPORT_MAX_ZONES 8192U + extern unsigned int blkdev_nr_zones(struct block_device *bdev); extern int blkdev_report_zones(struct block_device *bdev, sector_t sector, struct blk_zone *zones, -- 2.21.0 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH 3/3] block: Limit zone array allocation size 2019-06-25 2:46 ` [PATCH 3/3] block: Limit zone array allocation size Damien Le Moal @ 2019-06-25 16:10 ` Bart Van Assche 0 siblings, 0 replies; 14+ messages in thread From: Bart Van Assche @ 2019-06-25 16:10 UTC (permalink / raw) To: Damien Le Moal, linux-scsi, Martin K . Petersen, linux-block, Jens Axboe Cc: Christoph Hellwig On 6/24/19 7:46 PM, Damien Le Moal wrote: > Limit the size of the struct blk_zone array used in > blk_revalidate_disk_zones() to avoid memory allocation failures leading > to disk revalidation failure. Further reduce the likelyhood of these > failures by using kvmalloc() instead of directly allocating contiguous > pages. Reviewed-by: Bart Van Assche <bvanassche@acm.org> ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2019-06-26 4:19 UTC | newest] Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2019-06-25 2:46 [PATCH 0/3] Fix zone revalidation memory allocation failures Damien Le Moal 2019-06-25 2:46 ` [PATCH 1/3] block: Allow mapping of vmalloc-ed buffers Damien Le Moal 2019-06-25 3:24 ` Chaitanya Kulkarni 2019-06-25 15:59 ` Bart Van Assche 2019-06-25 16:06 ` Chaitanya Kulkarni 2019-06-26 1:38 ` Damien Le Moal 2019-06-26 4:19 ` Chaitanya Kulkarni 2019-06-25 15:59 ` Bart Van Assche 2019-06-25 16:22 ` Christoph Hellwig 2019-06-25 2:46 ` [PATCH 2/3] sd_zbc: Fix report zones buffer allocation Damien Le Moal 2019-06-25 10:37 ` kbuild test robot 2019-06-25 16:07 ` Bart Van Assche 2019-06-25 2:46 ` [PATCH 3/3] block: Limit zone array allocation size Damien Le Moal 2019-06-25 16:10 ` Bart Van Assche
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.