* [PATCH] fix unbalanced page refcounting in bio_map_user_iov @ 2017-09-22 5:18 Vitaly Mayatskikh 2017-09-22 5:24 ` Vitaly Mayatskikh 2017-09-23 16:39 ` Al Viro 0 siblings, 2 replies; 9+ messages in thread From: Vitaly Mayatskikh @ 2017-09-22 5:18 UTC (permalink / raw) To: linux-block, linux-kernel; +Cc: Jens Axboe bio_map_user_iov and bio_unmap_user do unbalanced pages refcounting if IO vector has small consecutive buffers belonging to the same page. bio_add_pc_page merges them into one, but the page reference is never dropped. Signed-off-by: Vitaly Mayatskikh <v.mayatskih@gmail.com> diff --git a/block/bio.c b/block/bio.c index b38e962fa83e..10cd3b6bed27 100644 --- a/block/bio.c +++ b/block/bio.c @@ -1383,6 +1383,7 @@ struct bio *bio_map_user_iov(struct request_queue *q, offset = offset_in_page(uaddr); for (j = cur_page; j < page_limit; j++) { unsigned int bytes = PAGE_SIZE - offset; + unsigned short prev_bi_vcnt = bio->bi_vcnt; if (len <= 0) break; @@ -1397,6 +1398,13 @@ struct bio *bio_map_user_iov(struct request_queue *q, bytes) break; + /* + * check if vector was merged with previous + * drop page reference if needed + */ + if (bio->bi_vcnt == prev_bi_vcnt) + put_page(pages[j]); + len -= bytes; offset = 0; } -- wbr, Vitaly ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH] fix unbalanced page refcounting in bio_map_user_iov 2017-09-22 5:18 [PATCH] fix unbalanced page refcounting in bio_map_user_iov Vitaly Mayatskikh @ 2017-09-22 5:24 ` Vitaly Mayatskikh 2017-09-23 16:39 ` Al Viro 1 sibling, 0 replies; 9+ messages in thread From: Vitaly Mayatskikh @ 2017-09-22 5:24 UTC (permalink / raw) To: linux-block, linux-kernel; +Cc: Jens Axboe Reproducer (needs SCSI disk): #include <stdio.h> #include <stdlib.h> #include <unistd.h> #include <sys/ioctl.h> #include <fcntl.h> #include <errno.h> #include <malloc.h> #include <string.h> #include <scsi/sg.h> #define NR_IOS 10000 #define NR_IOVECS 8 #define SG_IO 0x2285 int main(int argc, char *argv[]) { int fd, i, j; unsigned char *buf, *ptr, cdb[10]; sg_io_hdr_t io_hdr; sg_iovec_t iovec[NR_IOVECS]; if (argc < 2) { printf("Run: %s </dev/sdX>\n", argv[0]); exit(1); } buf = ptr = memalign(4096, NR_IOS * NR_IOVECS * 512); if (!buf) { printf("can't alloc memory\n"); exit(1); } fd = open(argv[1], 0); if (fd < 0) { printf("open %s failed: %d (%s)\n", argv[1], errno, strerror(errno)); exit(1); } io_hdr.interface_id = 'S'; io_hdr.cmd_len = sizeof(cdb); io_hdr.cmdp = cdb; io_hdr.dxfer_direction = SG_DXFER_FROM_DEV; io_hdr.dxfer_len = 512 * NR_IOVECS; io_hdr.dxferp = iovec; io_hdr.iovec_count = NR_IOVECS; cdb[0] = 0x28; // READ10 cdb[8] = NR_IOVECS; // sectors for (j = 0; j < NR_IOS; j++, ptr += 512) { for (i = 0; i < NR_IOVECS; i++) { iovec[i].iov_base = ptr; iovec[i].iov_len = 512; } if (ioctl(fd, SG_IO, &io_hdr)) { printf("IOCTL failed: %d (%s)\n", errno, strerror(errno)); exit(1); } } free(buf); close(fd); return 0; } # free -m total used free shared buff/cache available Mem: 3827 46 3601 0 178 3568 Swap: 0 0 0 # ./sgio-leak /dev/sdd # free -m total used free shared buff/cache available Mem: 3827 85 3562 0 178 3529 Swap: 0 0 0 [root@node-A ~]# free -m total used free shared buff/cache available Mem: 3827 85 3628 0 113 3561 Swap: 0 0 0 # ./sgio-leak /dev/sdd # free -m total used free shared buff/cache available Mem: 3827 124 3589 0 113 3523 Swap: 0 0 0 -- wbr, Vitaly ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] fix unbalanced page refcounting in bio_map_user_iov 2017-09-22 5:18 [PATCH] fix unbalanced page refcounting in bio_map_user_iov Vitaly Mayatskikh 2017-09-22 5:24 ` Vitaly Mayatskikh @ 2017-09-23 16:39 ` Al Viro 2017-09-23 16:55 ` Al Viro 1 sibling, 1 reply; 9+ messages in thread From: Al Viro @ 2017-09-23 16:39 UTC (permalink / raw) To: Vitaly Mayatskikh; +Cc: linux-block, linux-kernel, Jens Axboe On Fri, Sep 22, 2017 at 01:18:39AM -0400, Vitaly Mayatskikh wrote: > bio_map_user_iov and bio_unmap_user do unbalanced pages refcounting if > IO vector has small consecutive buffers belonging to the same page. > bio_add_pc_page merges them into one, but the page reference is never > dropped. > > Signed-off-by: Vitaly Mayatskikh <v.mayatskih@gmail.com> > > diff --git a/block/bio.c b/block/bio.c > index b38e962fa83e..10cd3b6bed27 100644 > --- a/block/bio.c > +++ b/block/bio.c > @@ -1383,6 +1383,7 @@ struct bio *bio_map_user_iov(struct request_queue *q, > offset = offset_in_page(uaddr); > for (j = cur_page; j < page_limit; j++) { > unsigned int bytes = PAGE_SIZE - offset; > + unsigned short prev_bi_vcnt = bio->bi_vcnt; > > if (len <= 0) > break; > @@ -1397,6 +1398,13 @@ struct bio *bio_map_user_iov(struct request_queue *q, > bytes) > break; > > + /* > + * check if vector was merged with previous > + * drop page reference if needed > + */ > + if (bio->bi_vcnt == prev_bi_vcnt) > + put_page(pages[j]); > + Except that now you've got double-puts on failure exits ;-/ ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] fix unbalanced page refcounting in bio_map_user_iov 2017-09-23 16:39 ` Al Viro @ 2017-09-23 16:55 ` Al Viro 2017-09-23 17:19 ` Al Viro 0 siblings, 1 reply; 9+ messages in thread From: Al Viro @ 2017-09-23 16:55 UTC (permalink / raw) To: Vitaly Mayatskikh; +Cc: linux-block, linux-kernel, Jens Axboe On Sat, Sep 23, 2017 at 05:39:28PM +0100, Al Viro wrote: > On Fri, Sep 22, 2017 at 01:18:39AM -0400, Vitaly Mayatskikh wrote: > > bio_map_user_iov and bio_unmap_user do unbalanced pages refcounting if > > IO vector has small consecutive buffers belonging to the same page. > > bio_add_pc_page merges them into one, but the page reference is never > > dropped. > > > > Signed-off-by: Vitaly Mayatskikh <v.mayatskih@gmail.com> > > > > diff --git a/block/bio.c b/block/bio.c > > index b38e962fa83e..10cd3b6bed27 100644 > > --- a/block/bio.c > > +++ b/block/bio.c > > @@ -1383,6 +1383,7 @@ struct bio *bio_map_user_iov(struct request_queue *q, > > offset = offset_in_page(uaddr); > > for (j = cur_page; j < page_limit; j++) { > > unsigned int bytes = PAGE_SIZE - offset; > > + unsigned short prev_bi_vcnt = bio->bi_vcnt; > > > > if (len <= 0) > > break; > > @@ -1397,6 +1398,13 @@ struct bio *bio_map_user_iov(struct request_queue *q, > > bytes) > > break; > > > > + /* > > + * check if vector was merged with previous > > + * drop page reference if needed > > + */ > > + if (bio->bi_vcnt == prev_bi_vcnt) > > + put_page(pages[j]); > > + > > Except that now you've got double-puts on failure exits ;-/ IOW, the loop on failure exit should go through the bio, like __bio_unmap_user() does. We *also* need to put everything left unused in pages[], but only from the last iteration through iov_for_each(). Frankly, I would prefer to reuse the pages[], rather than append to it on each iteration. Used iov_iter_get_pages_alloc(), actually. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] fix unbalanced page refcounting in bio_map_user_iov 2017-09-23 16:55 ` Al Viro @ 2017-09-23 17:19 ` Al Viro 2017-09-23 20:33 ` Al Viro 0 siblings, 1 reply; 9+ messages in thread From: Al Viro @ 2017-09-23 17:19 UTC (permalink / raw) To: Vitaly Mayatskikh; +Cc: linux-block, linux-kernel, Jens Axboe On Sat, Sep 23, 2017 at 05:55:37PM +0100, Al Viro wrote: > IOW, the loop on failure exit should go through the bio, like __bio_unmap_user() > does. We *also* need to put everything left unused in pages[], but only from the > last iteration through iov_for_each(). > > Frankly, I would prefer to reuse the pages[], rather than append to it on each > iteration. Used iov_iter_get_pages_alloc(), actually. Something like completely untested diff below, perhaps... diff --git a/block/bio.c b/block/bio.c index b38e962fa83e..b5fe23597b41 100644 --- a/block/bio.c +++ b/block/bio.c @@ -1323,94 +1323,60 @@ struct bio *bio_map_user_iov(struct request_queue *q, const struct iov_iter *iter, gfp_t gfp_mask) { - int j; - int nr_pages = 0; - struct page **pages; struct bio *bio; - int cur_page = 0; - int ret, offset; + struct bio_vec *bvec; struct iov_iter i; - struct iovec iov; - - iov_for_each(iov, i, *iter) { - unsigned long uaddr = (unsigned long) iov.iov_base; - unsigned long len = iov.iov_len; - unsigned long end = (uaddr + len + PAGE_SIZE - 1) >> PAGE_SHIFT; - unsigned long start = uaddr >> PAGE_SHIFT; - - /* - * Overflow, abort - */ - if (end < start) - return ERR_PTR(-EINVAL); - - nr_pages += end - start; - /* - * buffer must be aligned to at least logical block size for now - */ - if (uaddr & queue_dma_alignment(q)) - return ERR_PTR(-EINVAL); - } + int ret, j; - if (!nr_pages) + if (!iov_iter_count(iter)) return ERR_PTR(-EINVAL); - bio = bio_kmalloc(gfp_mask, nr_pages); + bio = bio_kmalloc(gfp_mask, iov_iter_npages(iter, BIO_MAX_PAGES)); if (!bio) return ERR_PTR(-ENOMEM); - ret = -ENOMEM; - pages = kcalloc(nr_pages, sizeof(struct page *), gfp_mask); - if (!pages) - goto out; + i = *iter; + while (iov_iter_count(&i)) { + struct page **pages; + size_t offs; + ssize_t bytes; + int npages, j; - iov_for_each(iov, i, *iter) { - unsigned long uaddr = (unsigned long) iov.iov_base; - unsigned long len = iov.iov_len; - unsigned long end = (uaddr + len + PAGE_SIZE - 1) >> PAGE_SHIFT; - unsigned long start = uaddr >> PAGE_SHIFT; - const int local_nr_pages = end - start; - const int page_limit = cur_page + local_nr_pages; - - ret = get_user_pages_fast(uaddr, local_nr_pages, - (iter->type & WRITE) != WRITE, - &pages[cur_page]); - if (ret < local_nr_pages) { - ret = -EFAULT; + bytes = iov_iter_get_pages_alloc(&i, &pages, LONG_MAX, &offs); + if (bytes <= 0) { + ret = bytes ? bytes : -EFAULT; goto out_unmap; } - offset = offset_in_page(uaddr); - for (j = cur_page; j < page_limit; j++) { - unsigned int bytes = PAGE_SIZE - offset; + npages = DIV_ROUND_UP(offs + bytes, PAGE_SIZE); - if (len <= 0) - break; + if (unlikely(offs & queue_dma_alignment(q))) { + ret = -EINVAL; + j = 0; + } else { + for (j = 0; j < npages; j++) { + struct page *page = pages[j]; + unsigned n = PAGE_SIZE - offs; + unsigned prev_bi_vcnt = bio->bi_vcnt; + + if (!bio_add_pc_page(q, bio, page, n, offs)) { + iov_iter_truncate(&i, 0); + break; + } + + if (bio->bi_vcnt == prev_bi_vcnt) + put_page(page); - if (bytes > len) - bytes = len; - - /* - * sorry... - */ - if (bio_add_pc_page(q, bio, pages[j], bytes, offset) < - bytes) - break; - - len -= bytes; - offset = 0; + iov_iter_advance(&i, n); + bytes -= n; + offs = 0; + } } - - cur_page = j; - /* - * release the pages we didn't map into the bio, if any - */ - while (j < page_limit) + while (j < npages) put_page(pages[j++]); + kvfree(pages); } - kfree(pages); - bio_set_flag(bio, BIO_USER_MAPPED); /* @@ -1423,13 +1389,9 @@ struct bio *bio_map_user_iov(struct request_queue *q, return bio; out_unmap: - for (j = 0; j < nr_pages; j++) { - if (!pages[j]) - break; - put_page(pages[j]); + bio_for_each_segment_all(bvec, bio, j) { + put_page(bvec->bv_page); } - out: - kfree(pages); bio_put(bio); return ERR_PTR(ret); } ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH] fix unbalanced page refcounting in bio_map_user_iov 2017-09-23 17:19 ` Al Viro @ 2017-09-23 20:33 ` Al Viro 2017-09-24 14:27 ` Al Viro 0 siblings, 1 reply; 9+ messages in thread From: Al Viro @ 2017-09-23 20:33 UTC (permalink / raw) To: Vitaly Mayatskikh; +Cc: linux-block, linux-kernel, Jens Axboe On Sat, Sep 23, 2017 at 06:19:26PM +0100, Al Viro wrote: > On Sat, Sep 23, 2017 at 05:55:37PM +0100, Al Viro wrote: > > > IOW, the loop on failure exit should go through the bio, like __bio_unmap_user() > > does. We *also* need to put everything left unused in pages[], but only from the > > last iteration through iov_for_each(). > > > > Frankly, I would prefer to reuse the pages[], rather than append to it on each > > iteration. Used iov_iter_get_pages_alloc(), actually. > > Something like completely untested diff below, perhaps... > + unsigned n = PAGE_SIZE - offs; > + unsigned prev_bi_vcnt = bio->bi_vcnt; Sorry, that should've been followed by if (n > bytes) n = bytes; Anyway, a carved-up variant is in vfs.git#work.iov_iter. It still needs review and testing; the patch Vitaly has posted in this thread plus 6 followups, hopefully more readable than aggregate diff. Comments? ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] fix unbalanced page refcounting in bio_map_user_iov 2017-09-23 20:33 ` Al Viro @ 2017-09-24 14:27 ` Al Viro 2017-09-24 17:15 ` Al Viro 2017-09-25 1:48 ` Vitaly Mayatskikh 0 siblings, 2 replies; 9+ messages in thread From: Al Viro @ 2017-09-24 14:27 UTC (permalink / raw) To: linux-block Cc: linux-kernel, Jens Axboe, Christoph Hellwig, Vitaly Mayatskikh On Sat, Sep 23, 2017 at 09:33:23PM +0100, Al Viro wrote: > On Sat, Sep 23, 2017 at 06:19:26PM +0100, Al Viro wrote: > > On Sat, Sep 23, 2017 at 05:55:37PM +0100, Al Viro wrote: > > > > > IOW, the loop on failure exit should go through the bio, like __bio_unmap_user() > > > does. We *also* need to put everything left unused in pages[], but only from the > > > last iteration through iov_for_each(). > > > > > > Frankly, I would prefer to reuse the pages[], rather than append to it on each > > > iteration. Used iov_iter_get_pages_alloc(), actually. > > > > Something like completely untested diff below, perhaps... > > > + unsigned n = PAGE_SIZE - offs; > > + unsigned prev_bi_vcnt = bio->bi_vcnt; > > Sorry, that should've been followed by > if (n > bytes) > n = bytes; > > Anyway, a carved-up variant is in vfs.git#work.iov_iter. It still needs > review and testing; the patch Vitaly has posted in this thread plus 6 > followups, hopefully more readable than aggregate diff. > > Comments? BTW, there's something fishy in bio_copy_user_iov(). If the area we'd asked for had been too large for a single bio, we are going to create a bio and have bio_add_pc_page() eventually fill it up to limit. Then we return into __blk_rq_map_user_iov(), advance iter and call bio_copy_user_iov() again. Fine, but... now we might have non-zero iter->iov_offset. And this bmd->is_our_pages = map_data ? 0 : 1; memcpy(bmd->iov, iter->iov, sizeof(struct iovec) * iter->nr_segs); iov_iter_init(&bmd->iter, iter->type, bmd->iov, iter->nr_segs, iter->count); does not even look at iter->iov_offset. As the result, when it gets to bio_uncopy_user(), we copy the data from each bio into the *beginning* of the user area, overwriting that from the other bio. At the very least, we need bmd->iter = *iter; bmd->iter.iov = bmd->iov; instead of that iov_iter_init() in there. I'm not sure how far back does it go; looks like "block: support large requests in blk_rq_map_user_iov" is the earliest possible point, but it might need more digging to make sure. v4.5+, if that's when the problems began... Anyway, I'd added the obvious fix to #work.iov_iter, reordered it and force-pushed the result. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] fix unbalanced page refcounting in bio_map_user_iov 2017-09-24 14:27 ` Al Viro @ 2017-09-24 17:15 ` Al Viro 2017-09-25 1:48 ` Vitaly Mayatskikh 1 sibling, 0 replies; 9+ messages in thread From: Al Viro @ 2017-09-24 17:15 UTC (permalink / raw) To: linux-block Cc: linux-kernel, Jens Axboe, Christoph Hellwig, Vitaly Mayatskikh On Sun, Sep 24, 2017 at 03:27:39PM +0100, Al Viro wrote: > At the very least, we need bmd->iter = *iter; bmd->iter.iov = bmd->iov; > instead of that iov_iter_init() in there. I'm not sure how far back does > it go; looks like "block: support large requests in blk_rq_map_user_iov" > is the earliest possible point, but it might need more digging to make > sure. v4.5+, if that's when the problems began... > > Anyway, I'd added the obvious fix to #work.iov_iter, reordered it and > force-pushed the result. While we are at it, calculation of nr_pages in bio_copy_user_iov() is bloody odd - why, in the name of everything unholy, does it care about the iovec boundaries in there? We are copying data anyway; why does allocation of bio care about the fragmentation of the other end of copying? Shouldn't it be simply max(DIV_ROUND_UP(offset + len, PAGE_SIZE), BIO_MAX_PAGES)? ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] fix unbalanced page refcounting in bio_map_user_iov 2017-09-24 14:27 ` Al Viro 2017-09-24 17:15 ` Al Viro @ 2017-09-25 1:48 ` Vitaly Mayatskikh 1 sibling, 0 replies; 9+ messages in thread From: Vitaly Mayatskikh @ 2017-09-25 1:48 UTC (permalink / raw) To: Al Viro; +Cc: linux-block, linux-kernel, Jens Axboe, Christoph Hellwig On Sun, 24 Sep 2017 10:27:39 -0400, Al Viro wrote: > BTW, there's something fishy in bio_copy_user_iov(). If the area we'd asked for > had been too large for a single bio, we are going to create a bio and have > bio_add_pc_page() eventually fill it up to limit. Then we return into > __blk_rq_map_user_iov(), advance iter and call bio_copy_user_iov() again. > Fine, but... now we might have non-zero iter->iov_offset. And this > bmd->is_our_pages = map_data ? 0 : 1; > memcpy(bmd->iov, iter->iov, sizeof(struct iovec) * iter->nr_segs); > iov_iter_init(&bmd->iter, iter->type, bmd->iov, > iter->nr_segs, iter->count); > does not even look at iter->iov_offset. As the result, when it gets to > bio_uncopy_user(), we copy the data from each bio into the *beginning* of > the user area, overwriting that from the other bio. Yeah, something is wrong with bio_copy_user_iov. Our datapath hangs when IO flows through unmodified SG (it forces bio_copy if iov_count is set). I did not look at details, but same IO pattern and memory layout work well with bio_map (module refcount problem). > Anyway, I'd added the obvious fix to #work.iov_iter, reordered it and > force-pushed the result. I'll give it a try, thanks! -- wbr, Vitaly ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2017-09-25 1:48 UTC | newest] Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2017-09-22 5:18 [PATCH] fix unbalanced page refcounting in bio_map_user_iov Vitaly Mayatskikh 2017-09-22 5:24 ` Vitaly Mayatskikh 2017-09-23 16:39 ` Al Viro 2017-09-23 16:55 ` Al Viro 2017-09-23 17:19 ` Al Viro 2017-09-23 20:33 ` Al Viro 2017-09-24 14:27 ` Al Viro 2017-09-24 17:15 ` Al Viro 2017-09-25 1:48 ` Vitaly Mayatskikh
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.