* avoid calling nth_page in the block I/O path

From: Christoph Hellwig
Date: 2019-04-08 10:46 UTC
To: Jens Axboe
Cc: Ming Lei, linux-block

Hi Jens,

you complained about the overhead of calling nth_page in the multipage
bio_vec enabled I/O path a while ago.  While I can't really reproduce
the numbers on my (slower) hardware, we could avoid the nth_page calls
pretty easily with a few tweaks.  Can you take a look at this series?
* [PATCH 1/5] block: rewrite blk_bvec_map_sg to avoid a nth_page call

From: Christoph Hellwig
Date: 2019-04-08 10:46 UTC
To: Jens Axboe
Cc: Ming Lei, linux-block

The offset in scatterlists is allowed to be larger than the page size,
so don't go to great lengths to avoid that case, and simplify the
arithmetic.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/blk-merge.c | 21 ++++++---------------
 1 file changed, 6 insertions(+), 15 deletions(-)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index 8f96d683b577..52dbdd089fdf 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -467,26 +467,17 @@ static unsigned blk_bvec_map_sg(struct request_queue *q,
 		struct scatterlist **sg)
 {
 	unsigned nbytes = bvec->bv_len;
-	unsigned nsegs = 0, total = 0, offset = 0;
+	unsigned nsegs = 0, total = 0;
 
 	while (nbytes > 0) {
-		unsigned seg_size;
-		struct page *pg;
-		unsigned idx;
+		unsigned offset = bvec->bv_offset + total;
+		unsigned len = min(get_max_segment_size(q, offset), nbytes);
 
 		*sg = blk_next_sg(sg, sglist);
+		sg_set_page(*sg, bvec->bv_page, len, offset);
 
-		seg_size = get_max_segment_size(q, bvec->bv_offset + total);
-		seg_size = min(nbytes, seg_size);
-
-		offset = (total + bvec->bv_offset) % PAGE_SIZE;
-		idx = (total + bvec->bv_offset) / PAGE_SIZE;
-		pg = bvec_nth_page(bvec->bv_page, idx);
-
-		sg_set_page(*sg, pg, seg_size, offset);
-
-		total += seg_size;
-		nbytes -= seg_size;
+		total += len;
+		nbytes -= len;
 		nsegs++;
 	}
-- 
2.20.1
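The effect of the rewrite is easiest to see outside the kernel. Below is a minimal, hypothetical user-space sketch (not kernel code) of the segment-splitting arithmetic the new loop performs; the fixed 4 KiB boundary stands in for whatever get_max_segment_size() would return for a real queue:

```c
#include <assert.h>

/*
 * Hypothetical stand-in for the queue's segment limits: assume segments
 * must not cross a 4 KiB boundary.
 */
#define SEG_BOUNDARY 4096u

static unsigned int max_segment_size(unsigned int offset)
{
	return SEG_BOUNDARY - (offset & (SEG_BOUNDARY - 1));
}

/*
 * Count the scatterlist segments the rewritten loop would emit for a
 * single bvec, using only (bv_offset + total) arithmetic -- no page
 * index, no modulo, no nth_page lookup.
 */
static unsigned int map_bvec(unsigned int bv_offset, unsigned int bv_len)
{
	unsigned int nbytes = bv_len, total = 0, nsegs = 0;

	while (nbytes > 0) {
		unsigned int offset = bv_offset + total;
		unsigned int len = max_segment_size(offset);

		if (len > nbytes)
			len = nbytes;
		/* here the kernel would do sg_set_page(..., len, offset) */
		total += len;
		nbytes -= len;
		nsegs++;
	}
	return nsegs;
}
```

In this model an 8 KiB bvec starting at offset 0 splits into two segments, while the same bvec starting 512 bytes into the page splits into three. Note that the offset handed to sg_set_page() can now exceed PAGE_SIZE, which is exactly the property the commit message says scatterlists already allow.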
* Re: [PATCH 1/5] block: rewrite blk_bvec_map_sg to avoid a nth_page call

From: Johannes Thumshirn
Date: 2019-04-08 14:03 UTC
To: Christoph Hellwig
Cc: Jens Axboe, Ming Lei, linux-block

Looks good,
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
-- 
Johannes Thumshirn                            SUSE Labs Filesystems
jthumshirn@suse.de                            +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
* Re: [PATCH 1/5] block: rewrite blk_bvec_map_sg to avoid a nth_page call

From: Bart Van Assche
Date: 2019-04-08 22:04 UTC
To: Christoph Hellwig, Jens Axboe
Cc: Ming Lei, linux-block

On Mon, 2019-04-08 at 12:46 +0200, Christoph Hellwig wrote:
> The offset in scatterlists is allowed to be larger than the page size,
> so don't go to great lengths to avoid that case, and simplify the
> arithmetic.

Reviewed-by: Bart Van Assche <bvanassche@acm.org>
* Re: [PATCH 1/5] block: rewrite blk_bvec_map_sg to avoid a nth_page call

From: Ming Lei
Date: 2019-04-08 22:51 UTC
To: Christoph Hellwig
Cc: Jens Axboe, linux-block

On Mon, Apr 08, 2019 at 12:46:37PM +0200, Christoph Hellwig wrote:
> The offset in scatterlists is allowed to be larger than the page size,
> so don't go to great lengths to avoid that case, and simplify the
> arithmetic.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---

[...]

Nice cleanup & simplification:

Reviewed-by: Ming Lei <ming.lei@redhat.com>

Thanks,
Ming
* Re: [PATCH 1/5] block: rewrite blk_bvec_map_sg to avoid a nth_page call

From: Guenter Roeck
Date: 2019-04-15 19:44 UTC
To: Christoph Hellwig
Cc: Jens Axboe, Ming Lei, linux-block

On Mon, Apr 08, 2019 at 12:46:37PM +0200, Christoph Hellwig wrote:
> The offset in scatterlists is allowed to be larger than the page size,
> so don't go to great lengths to avoid that case, and simplify the
> arithmetic.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
> Reviewed-by: Bart Van Assche <bvanassche@acm.org>
> Reviewed-by: Ming Lei <ming.lei@redhat.com>
> ---

This patch causes crashes with various boot tests. Most sparc tests
crash, as well as several arm tests. Bisect results in both cases point
to this patch.

On sparc:

Run /sbin/init as init process
kernel BUG at arch/sparc/lib/bitext.c:112!
              \|/ ____ \|/
              "@'/ ,. \`@"
              /_| \__/  |_\
                 \__U_/
lock_torture_wr(23): Kernel bad trap [#1]
CPU: 0 PID: 23 Comm: lock_torture_wr Not tainted 5.1.0-rc4-next-20190415 #1
PSR: 04401fc0 PC: f04a4e28 NPC: f04a4e2c Y: 00000000    Not tainted
PC: <bit_map_clear+0xe8/0xec>
%G: f000caec 00010003  f05db308 000000fd  00000000 000000fd  f514e000 f062be28
%O: 0000002a f05a41e0  00000070 00000001  f05dd564 f05dd504  f514f9a0 f04a4e20
RPC: <bit_map_clear+0xe0/0xec>
%L: 04001fc1 f00130c4  f00130c8 00000002  00000000 00000004  f514e000 00000547
%I: f500b990 0000003a  00000012 f513e570  00010000 f500b990  f514fa00 f001316c
Disabling lock debugging due to kernel taint
Caller[f001316c]: sbus_iommu_unmap_sg+0x34/0x60
Caller[f02d6170]: scsi_dma_unmap+0xac/0xd0
Caller[f02e2424]: esp_cmd_is_done+0x198/0x1a8
Caller[f02e4150]: scsi_esp_intr+0x191c/0x21e0
Caller[f0055438]: __handle_irq_event_percpu+0x8c/0x128
Caller[f00554e0]: handle_irq_event_percpu+0xc/0x4c
Caller[f0055550]: handle_irq_event+0x30/0x64
Caller[f0058b10]: handle_level_irq+0xb4/0x17c
Caller[f0054c38]: generic_handle_irq+0x30/0x40
Caller[f0009a4c]: handler_irq+0x3c/0x78
Caller[f0007b68]: patch_handler_irq+0x0/0x24
Caller[f004e054]: lock_torture_writer+0xc0/0x1fc
Caller[f003ca18]: kthread+0xd8/0x110
Caller[f00082f0]: ret_from_kernel_thread+0xc/0x38
Caller[00000000]:   (null)
Instruction DUMP: 113c1690 7fed90ae 901221e0 <91d02005> 9de3bfa0 92102000 9406a01f 90100019 9532a005
Kernel panic - not syncing: Aiee, killing interrupt handler!

On arm:

[   13.836353] Run /sbin/init as init process
[   13.937406] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[   13.937736] CPU: 0 PID: 1 Comm: init Not tainted 5.1.0-rc4-next-20190415 #1
[   13.937895] Hardware name: ARM-Versatile Express
[   13.938472] [<c0312a74>] (unwind_backtrace) from [<c030d350>] (show_stack+0x10/0x14)
[   13.938671] [<c030d350>] (show_stack) from [<c1097184>] (dump_stack+0xb4/0xc8)
[   13.938834] [<c1097184>] (dump_stack) from [<c03474c4>] (panic+0x110/0x328)
[   13.939008] [<c03474c4>] (panic) from [<c034c360>] (do_exit+0xbf8/0xc04)
[   13.939161] [<c034c360>] (do_exit) from [<c034d3c0>] (do_group_exit+0x3c/0xbc)
[   13.939323] [<c034d3c0>] (do_group_exit) from [<c0359460>] (get_signal+0x13c/0x9c4)
[   13.939491] [<c0359460>] (get_signal) from [<c030c760>] (do_work_pending+0x198/0x5f0)
[   13.939660] [<c030c760>] (do_work_pending) from [<c030106c>] (slow_work_pending+0xc/0x20)
[   13.939862] Exception stack(0xc703ffb0 to 0xc703fff8)
[   13.940102] ffa0:                                     b6f0e578 00000016 00000015 00000004
[   13.940372] ffc0: b6f0e060 b6f0e578 b6ede000 b6ede360 b6ef5dc0 b6ede95c b6ede2c0 becb6efc
[   13.940613] ffe0: b6f0e060 becb6ec0 b6edf628 b6ee9c00 60000010 ffffffff

Reverting the patch isn't possible since there are subsequent changes
affecting the code, and the image no longer builds after the revert.

Guenter

---
bisect results:

# bad: [f9221a7a1014d8a047b277a73289678646ddc110] Add linux-next specific files for 20190415
# good: [15ade5d2e7775667cf191cf2f94327a4889f8b9d] Linux 5.1-rc4
git bisect start 'HEAD' 'v5.1-rc4'
# good: [34ad076e8efd33658e4367df70a92058de7a870d] Merge remote-tracking branch 'crypto/master'
git bisect good 34ad076e8efd33658e4367df70a92058de7a870d
# bad: [5b51a36f96d773866e1a2165bacfcde58464eb1d] Merge remote-tracking branch 'devicetree/for-next'
git bisect bad 5b51a36f96d773866e1a2165bacfcde58464eb1d
# good: [457109829f4ee4107e8c7108237afba21fabbb5e] Merge branch 'drm-next-5.2' of git://people.freedesktop.org/~agd5f/linux into drm-next
git bisect good 457109829f4ee4107e8c7108237afba21fabbb5e
# good: [c4af5502948132ca3ce48d047188f8fc2f484dd7] Merge remote-tracking branch 'sound-asoc/for-next'
git bisect good c4af5502948132ca3ce48d047188f8fc2f484dd7
# bad: [722312b700e5fe2e9b16547fc2cb0dbbfb3e345c] Merge remote-tracking branch 'battery/for-next'
git bisect bad 722312b700e5fe2e9b16547fc2cb0dbbfb3e345c
# bad: [da55bae775eb580cd7f1b74850e55cf8880e7984] Merge branch 'for-5.2/block' into for-next
git bisect bad da55bae775eb580cd7f1b74850e55cf8880e7984
# good: [1e815b33c5ccd3936b71292b5ffb84e97e1df9e0] block: sed-opal: fix typos and formatting
git bisect good 1e815b33c5ccd3936b71292b5ffb84e97e1df9e0
# good: [06bda9d56ba3c0ba5ecb41f27da93a417aa442d1] Merge branch 'for-5.2/block' into for-next
git bisect good 06bda9d56ba3c0ba5ecb41f27da93a417aa442d1
# bad: [14eacf12dbc75352fa746dfd9e24de3170ba5ff5] block: don't allow multiple bio_iov_iter_get_pages calls per bio
git bisect bad 14eacf12dbc75352fa746dfd9e24de3170ba5ff5
# good: [ae50640bebc48f1fc0092f16ea004c7c4d12c985] md: use correct type in super_1_sync
git bisect good ae50640bebc48f1fc0092f16ea004c7c4d12c985
# good: [efcd487c69b9d968552a6bf80e7839c4f28b419d] md: add __acquires/__releases annotations to handle_active_stripes
git bisect good efcd487c69b9d968552a6bf80e7839c4f28b419d
# bad: [8a96a0e408102fb7aa73d8aa0b5e2219cfd51e55] block: rewrite blk_bvec_map_sg to avoid a nth_page call
git bisect bad 8a96a0e408102fb7aa73d8aa0b5e2219cfd51e55
# good: [22391ac30ab9cc2ba610bf7ea2244840b83c8017] Merge branch 'md-next' of https://github.com/liu-song-6/linux into for-5.2/block
git bisect good 22391ac30ab9cc2ba610bf7ea2244840b83c8017
# first bad commit: [8a96a0e408102fb7aa73d8aa0b5e2219cfd51e55] block: rewrite blk_bvec_map_sg to avoid a nth_page call
* Re: [PATCH 1/5] block: rewrite blk_bvec_map_sg to avoid a nth_page call

From: Christoph Hellwig
Date: 2019-04-15 20:52 UTC
To: Guenter Roeck
Cc: Christoph Hellwig, Jens Axboe, Ming Lei, linux-block

On Mon, Apr 15, 2019 at 12:44:35PM -0700, Guenter Roeck wrote:
> This patch causes crashes with various boot tests. Most sparc tests
> crash, as well as several arm tests. Bisect results in both cases point
> to this patch.

That just means we trigger an existing bug more easily now.  I'll see
if I can help with the issues.
* Re: [PATCH 1/5] block: rewrite blk_bvec_map_sg to avoid a nth_page call

From: Guenter Roeck
Date: 2019-04-15 21:07 UTC
To: Christoph Hellwig
Cc: Jens Axboe, Ming Lei, linux-block

On Mon, Apr 15, 2019 at 10:52:42PM +0200, Christoph Hellwig wrote:
> On Mon, Apr 15, 2019 at 12:44:35PM -0700, Guenter Roeck wrote:
> > This patch causes crashes with various boot tests. Most sparc tests
> > crash, as well as several arm tests. Bisect results in both cases
> > point to this patch.
>
> That just means we trigger an existing bug more easily now.  I'll see
> if I can help with the issues.

Code which previously worked reliably no longer does. I would be quite
hesitant to call this "trigger an existing bug more easily". "Regression"
seems to be a more appropriate term - even more so as it seems to cause
'init' crashes, at least on arm.

Guenter
* Re: [PATCH 1/5] block: rewrite blk_bvec_map_sg to avoid a nth_page call

From: Christoph Hellwig
Date: 2019-04-16 6:33 UTC
To: Guenter Roeck
Cc: Christoph Hellwig, Jens Axboe, Ming Lei, linux-block

On Mon, Apr 15, 2019 at 02:07:31PM -0700, Guenter Roeck wrote:
> On Mon, Apr 15, 2019 at 10:52:42PM +0200, Christoph Hellwig wrote:
> > On Mon, Apr 15, 2019 at 12:44:35PM -0700, Guenter Roeck wrote:
> > > This patch causes crashes with various boot tests. Most sparc tests
> > > crash, as well as several arm tests. Bisect results in both cases
> > > point to this patch.
> >
> > That just means we trigger an existing bug more easily now.  I'll see
> > if I can help with the issues.
>
> Code which previously worked reliably no longer does. I would be quite
> hesitant to call this "trigger an existing bug more easily". "Regression"
> seems to be a more appropriate term - even more so as it seems to cause
> 'init' crashes, at least on arm.

Well, we have these sgls in the wild already, it just is that they
are fairly rare.  For a related fix on a mainstream platform see here
for example:

https://lore.kernel.org/patchwork/patch/1050367/

Below is a rework of the sparc32 iommu code that should avoid your
reported problem.  Please send any other reports to me as well.

diff --git a/arch/sparc/mm/iommu.c b/arch/sparc/mm/iommu.c
index e8d5d73ca40d..93c2fc440cb0 100644
--- a/arch/sparc/mm/iommu.c
+++ b/arch/sparc/mm/iommu.c
@@ -175,16 +175,38 @@ static void iommu_flush_iotlb(iopte_t *iopte, unsigned int niopte)
 	}
 }
 
-static u32 iommu_get_one(struct device *dev, struct page *page, int npages)
+static u32 __sbus_iommu_map_page(struct device *dev, struct page *page, unsigned offset,
+		unsigned len, bool need_flush)
 {
 	struct iommu_struct *iommu = dev->archdata.iommu;
+	phys_addr_t paddr = page_to_phys(page) + offset, p;
+	unsigned long pfn = __phys_to_pfn(paddr);
+	unsigned long off = (unsigned long)paddr & ~PAGE_MASK;
+	unsigned long npages = (off + len + PAGE_SIZE - 1) >> PAGE_SHIFT;
 	int ioptex;
 	iopte_t *iopte, *iopte0;
 	unsigned int busa, busa0;
 	int i;
 
+	/* XXX So what is maxphys for us and how do drivers know it? */
+	if (!len || len > 256 * 1024)
+		return DMA_MAPPING_ERROR;
+
+	/*
+	 * We expect unmapped highmem pages to be not in the cache.
+	 * XXX Is this a good assumption?
+	 * XXX What if someone else unmaps it here and races us?
+	 */
+	if (need_flush && !PageHighMem(page)) {
+		for (p = paddr & PAGE_MASK; p < paddr + len; p += PAGE_SIZE) {
+			unsigned long vaddr = (unsigned long)phys_to_virt(p);
+
+			flush_page_for_dma(vaddr);
+		}
+	}
+
 	/* page color = pfn of page */
-	ioptex = bit_map_string_get(&iommu->usemap, npages, page_to_pfn(page));
+	ioptex = bit_map_string_get(&iommu->usemap, npages, pfn);
 	if (ioptex < 0)
 		panic("iommu out");
 	busa0 = iommu->start + (ioptex << PAGE_SHIFT);
@@ -193,11 +215,11 @@ static u32 iommu_get_one(struct device *dev, struct page *page, int npages)
 	busa = busa0;
 	iopte = iopte0;
 	for (i = 0; i < npages; i++) {
-		iopte_val(*iopte) = MKIOPTE(page_to_pfn(page), IOPERM);
+		iopte_val(*iopte) = MKIOPTE(pfn, IOPERM);
 		iommu_invalidate_page(iommu->regs, busa);
 		busa += PAGE_SIZE;
 		iopte++;
-		page++;
+		pfn++;
 	}
 
 	iommu_flush_iotlb(iopte0, npages);
@@ -205,99 +227,62 @@ static u32 iommu_get_one(struct device *dev, struct page *page, int npages)
 	return busa0;
 }
 
-static dma_addr_t __sbus_iommu_map_page(struct device *dev, struct page *page,
-		unsigned long offset, size_t len)
-{
-	void *vaddr = page_address(page) + offset;
-	unsigned long off = (unsigned long)vaddr & ~PAGE_MASK;
-	unsigned long npages = (off + len + PAGE_SIZE - 1) >> PAGE_SHIFT;
-
-	/* XXX So what is maxphys for us and how do drivers know it? */
-	if (!len || len > 256 * 1024)
-		return DMA_MAPPING_ERROR;
-	return iommu_get_one(dev, virt_to_page(vaddr), npages) + off;
-}
-
 static dma_addr_t sbus_iommu_map_page_gflush(struct device *dev,
 		struct page *page, unsigned long offset, size_t len,
 		enum dma_data_direction dir, unsigned long attrs)
 {
 	flush_page_for_dma(0);
-	return __sbus_iommu_map_page(dev, page, offset, len);
+	return __sbus_iommu_map_page(dev, page, offset, len, false);
 }
 
 static dma_addr_t sbus_iommu_map_page_pflush(struct device *dev,
 		struct page *page, unsigned long offset, size_t len,
 		enum dma_data_direction dir, unsigned long attrs)
 {
-	void *vaddr = page_address(page) + offset;
-	unsigned long p = ((unsigned long)vaddr) & PAGE_MASK;
-
-	while (p < (unsigned long)vaddr + len) {
-		flush_page_for_dma(p);
-		p += PAGE_SIZE;
-	}
-
-	return __sbus_iommu_map_page(dev, page, offset, len);
+	return __sbus_iommu_map_page(dev, page, offset, len, true);
 }
 
-static int sbus_iommu_map_sg_gflush(struct device *dev, struct scatterlist *sgl,
-		int nents, enum dma_data_direction dir, unsigned long attrs)
+static int __sbus_iommu_map_sg(struct device *dev, struct scatterlist *sgl,
+		int nents, enum dma_data_direction dir, unsigned long attrs,
+		bool need_flush)
 {
 	struct scatterlist *sg;
-	int i, n;
-
-	flush_page_for_dma(0);
+	int i;
 
 	for_each_sg(sgl, sg, nents, i) {
-		n = (sg->length + sg->offset + PAGE_SIZE-1) >> PAGE_SHIFT;
-		sg->dma_address = iommu_get_one(dev, sg_page(sg), n) + sg->offset;
+		sg->dma_address = __sbus_iommu_map_page(dev, sg_page(sg),
+				sg->offset, sg->length, need_flush);
+		if (sg->dma_address == DMA_MAPPING_ERROR)
+			return 0;
 		sg->dma_length = sg->length;
 	}
 
 	return nents;
 }
 
-static int sbus_iommu_map_sg_pflush(struct device *dev, struct scatterlist *sgl,
+static int sbus_iommu_map_sg_gflush(struct device *dev, struct scatterlist *sgl,
 		int nents, enum dma_data_direction dir, unsigned long attrs)
 {
-	unsigned long page, oldpage = 0;
-	struct scatterlist *sg;
-	int i, j, n;
-
-	for_each_sg(sgl, sg, nents, j) {
-		n = (sg->length + sg->offset + PAGE_SIZE-1) >> PAGE_SHIFT;
-
-		/*
-		 * We expect unmapped highmem pages to be not in the cache.
-		 * XXX Is this a good assumption?
-		 * XXX What if someone else unmaps it here and races us?
-		 */
-		if ((page = (unsigned long) page_address(sg_page(sg))) != 0) {
-			for (i = 0; i < n; i++) {
-				if (page != oldpage) {	/* Already flushed? */
-					flush_page_for_dma(page);
-					oldpage = page;
-				}
-				page += PAGE_SIZE;
-			}
-		}
-
-		sg->dma_address = iommu_get_one(dev, sg_page(sg), n) + sg->offset;
-		sg->dma_length = sg->length;
-	}
+	flush_page_for_dma(0);
+	return __sbus_iommu_map_sg(dev, sgl, nents, dir, attrs, false);
+}
 
-	return nents;
+static int sbus_iommu_map_sg_pflush(struct device *dev, struct scatterlist *sgl,
+		int nents, enum dma_data_direction dir, unsigned long attrs)
+{
+	return __sbus_iommu_map_sg(dev, sgl, nents, dir, attrs, true);
 }
 
-static void iommu_release_one(struct device *dev, u32 busa, int npages)
+static void __sbus_iommu_unmap_page(struct device *dev, dma_addr_t dma_addr,
+		size_t len)
 {
 	struct iommu_struct *iommu = dev->archdata.iommu;
-	int ioptex;
-	int i;
+	unsigned busa, npages, ioptex, i;
 
+	busa = dma_addr & PAGE_MASK;
 	BUG_ON(busa < iommu->start);
 	ioptex = (busa - iommu->start) >> PAGE_SHIFT;
+	npages = ((dma_addr & ~PAGE_MASK) + len + PAGE_SIZE-1) >> PAGE_SHIFT;
 	for (i = 0; i < npages; i++) {
 		iopte_val(iommu->page_table[ioptex + i]) = 0;
 		iommu_invalidate_page(iommu->regs, busa);
@@ -309,22 +294,17 @@ static void iommu_release_one(struct device *dev, u32 busa, int npages)
 static void sbus_iommu_unmap_page(struct device *dev, dma_addr_t dma_addr,
 		size_t len, enum dma_data_direction dir, unsigned long attrs)
 {
-	unsigned long off = dma_addr & ~PAGE_MASK;
-	int npages;
-
-	npages = (off + len + PAGE_SIZE-1) >> PAGE_SHIFT;
-	iommu_release_one(dev, dma_addr & PAGE_MASK, npages);
+	__sbus_iommu_unmap_page(dev, dma_addr, len);
 }
 
 static void sbus_iommu_unmap_sg(struct device *dev, struct scatterlist *sgl,
 		int nents, enum dma_data_direction dir, unsigned long attrs)
 {
 	struct scatterlist *sg;
-	int i, n;
+	int i;
 
 	for_each_sg(sgl, sg, nents, i) {
-		n = (sg->length + sg->offset + PAGE_SIZE-1) >> PAGE_SHIFT;
-		iommu_release_one(dev, sg->dma_address & PAGE_MASK, n);
+		__sbus_iommu_unmap_page(dev, sg->dma_address, sg->length);
 		sg->dma_address = 0x21212121;
 	}
 }
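The core of the rework above is the switch from page-pointer arithmetic to pfn arithmetic. The fragment below is a stand-alone user-space restatement (with a hard-coded 4 KiB page size, purely for illustration) of how the reworked __sbus_iommu_map_page() derives the starting pfn, the sub-page offset, and the iopte count from a page's physical address plus an offset that may now be larger than PAGE_SIZE:

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)
#define PAGE_MASK  (~(PAGE_SIZE - 1))

struct span {
	unsigned long first_pfn;	/* pfn the mapping starts in */
	unsigned long off;		/* offset within that page */
	unsigned long npages;		/* ioptes needed to cover it */
};

/* Mirrors the paddr/pfn/off/npages arithmetic in __sbus_iommu_map_page(). */
static struct span span_for(uint64_t page_phys, unsigned long offset,
			    unsigned long len)
{
	uint64_t paddr = page_phys + offset;
	struct span s;

	s.first_pfn = (unsigned long)(paddr >> PAGE_SHIFT);
	s.off = (unsigned long)(paddr & ~PAGE_MASK);
	s.npages = (s.off + len + PAGE_SIZE - 1) >> PAGE_SHIFT;
	return s;
}
```

With offset = 6000 the mapping starts one full page past sg_page(): the pfn advances and the residual offset wraps to 1904. The pre-rework helpers instead recomputed page counts from sg->offset and sg->length in several places; once bvec offsets larger than PAGE_SIZE started reaching this code, those computations could disagree between map and unmap, which appears consistent with the bit_map_clear() BUG reported earlier in the thread.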
* Re: [PATCH 1/5] block: rewrite blk_bvec_map_sg to avoid a nth_page call

From: Guenter Roeck
Date: 2019-04-16 14:09 UTC
To: Christoph Hellwig
Cc: Jens Axboe, Ming Lei, linux-block

On 4/15/19 11:33 PM, Christoph Hellwig wrote:
> On Mon, Apr 15, 2019 at 02:07:31PM -0700, Guenter Roeck wrote:
>> On Mon, Apr 15, 2019 at 10:52:42PM +0200, Christoph Hellwig wrote:
>>> On Mon, Apr 15, 2019 at 12:44:35PM -0700, Guenter Roeck wrote:
>>>> This patch causes crashes with various boot tests. Most sparc tests
>>>> crash, as well as several arm tests. Bisect results in both cases
>>>> point to this patch.
>>>
>>> That just means we trigger an existing bug more easily now.  I'll see
>>> if I can help with the issues.
>>
>> Code which previously worked reliably no longer does. I would be quite
>> hesitant to call this "trigger an existing bug more easily". "Regression"
>> seems to be a more appropriate term - even more so as it seems to cause
>> 'init' crashes, at least on arm.
>
> Well, we have these sgls in the wild already, it just is that they
> are fairly rare.  For a related fix on a mainstream platform see here
> for example:
>
> https://lore.kernel.org/patchwork/patch/1050367/
>
> Below is a rework of the sparc32 iommu code that should avoid your
> reported problem.  Please send any other reports to me as well.

The code below on top of next-20190415 results in

esp ffd398e4: esp0: regs[(ptrval):(ptrval)] irq[2]
esp ffd398e4: esp0: is a FAS100A, 40 MHz (ccf=0), SCSI ID 7
scsi host0: esp
scsi 0:0:0:0: Direct-Access     QEMU     QEMU HARDDISK    2.5+ PQ: 0 ANSI: 5
scsi target0:0:0: Beginning Domain Validation
scsi target0:0:0: Domain Validation Initial Inquiry Failed
scsi target0:0:0: Ending Domain Validation
scsi host0: scsi scan: INQUIRY result too short (5), using 36
scsi 0:0:2:0: Direct-Access                                    PQ: 0 ANSI: 0
scsi target0:0:2: Beginning Domain Validation
scsi target0:0:2: Domain Validation Initial Inquiry Failed
scsi target0:0:2: Ending Domain Validation
...
sd 0:0:2:0: Device offlined - not ready after error recovery
sd 0:0:2:0: rejecting I/O to offline device

and sometimes:

...
sd 0:0:0:4: [sde] Asking for cache data failed
sd 0:0:0:4: [sde] Assuming drive cache: write through
sd 0:0:0:4: rejecting I/O to offline device
sd 0:0:0:5: Device offlined - not ready after error recovery
Unable to handle kernel NULL pointer dereference
tsk->{mm,active_mm}->context = ffffffff
tsk->{mm,active_mm}->pgd = fc000000
              \|/ ____ \|/
              "@'/ ,. \`@"
              /_| \__/  |_\
                 \__U_/
kworker/u2:6(90): Oops [#1]
CPU: 0 PID: 90 Comm: kworker/u2:6 Not tainted 5.1.0-rc4-next-20190415-00001-g328cdb292761 #1
Workqueue: events_unbound async_run_entry_fn
PSR: 409010c5 PC: f02d2904 NPC: f02d2908 Y: 0000000c    Not tainted
PC: <__scsi_execute+0xc/0x18c>
%G: f5334608 00000000  00000000 00000002  00041200 00000008  f5386000 00000002
%O: f5334608 00000000  f5387c90 f50d3fb0  0000000b f5334a4c  f5387b98 f02d2a38
RPC: <__scsi_execute+0x140/0x18c>
%L: 00000000 00000000  f51b1800 00000001  00000002 00000000  f5386000 f05f9070
%I: 00000200 f5387ca0  00000002 00000000  00000000 00000000  f5387bf8 f02e92a4
Disabling lock debugging due to kernel taint
Caller[f02e92a4]: sd_revalidate_disk+0xe8/0x1f5c
Caller[f02eb418]: sd_probe+0x2b0/0x3f0
Caller[f02bbc98]: really_probe+0x1bc/0x2e8
Caller[f02bc10c]: __driver_attach_async_helper+0x48/0x58
Caller[f003f534]: async_run_entry_fn+0x38/0x124
Caller[f00373bc]: process_one_work+0x168/0x390
Caller[f0037728]: worker_thread+0x144/0x504
Caller[f003c90c]: kthread+0xd8/0x110
Caller[f00082f0]: ret_from_kernel_thread+0xc/0x38
Caller[00000000]:   (null)
Instruction DUMP: 9de3bfa0 b41ea001 80a0001a <d0062004> 92603fff a4100018 e207a06c e007a074 94102008

Guenter

> diff --git a/arch/sparc/mm/iommu.c b/arch/sparc/mm/iommu.c
> index e8d5d73ca40d..93c2fc440cb0 100644
> --- a/arch/sparc/mm/iommu.c
> +++ b/arch/sparc/mm/iommu.c
[...]
* Re: [PATCH 1/5] block: rewrite blk_bvec_map_sg to avoid a nth_page call 2019-04-16 6:33 ` Christoph Hellwig 2019-04-16 14:09 ` Guenter Roeck @ 2019-04-16 17:08 ` Guenter Roeck 2019-04-16 17:10 ` Christoph Hellwig 1 sibling, 1 reply; 29+ messages in thread From: Guenter Roeck @ 2019-04-16 17:08 UTC (permalink / raw) To: Christoph Hellwig; +Cc: Jens Axboe, Ming Lei, linux-block On Tue, Apr 16, 2019 at 08:33:56AM +0200, Christoph Hellwig wrote: > On Mon, Apr 15, 2019 at 02:07:31PM -0700, Guenter Roeck wrote: > > On Mon, Apr 15, 2019 at 10:52:42PM +0200, Christoph Hellwig wrote: > > > On Mon, Apr 15, 2019 at 12:44:35PM -0700, Guenter Roeck wrote: > > > > This patch causes crashes with various boot tests. Most sparc tests crash, as > > > > well as several arm tests. Bisect results in both cases point to this patch. > > > > > > That just means we trigger an existing bug more easily now. I'll see > > > if I can help with the issues. > > > > Code which previously worked reliably no longer does. I would be quite > > hesitant to call this "trigger an existing bug more easily". "Regression" > > seems to be a more appropriate term - even more so as it seems to cause > > 'init' crashes, at least on arm. > > Well, we have these sgls in the wild already, it just is that they That is beside the point. Your code changes an internal API to be more stringent and less forgiving. This causes failures, presumably because callers of that API took advantage (on purpose or not) of it. When changing an API, you are responsible for both ends. You cannot claim that the callers of that API are buggy. Taking advantage of a forgiving API is not a bug. If you change an API, and that change causes a failure, that is a regression, not a bug on the side of the caller. On top of that, an API change causing roughly 4% of my boot tests to fail is a serious regression. Those boot tests don't really do anything besides trying to boot the system. 
If 4% of those tests fail, I don't even want to know what else is going to fail when your patch (or patch series) hits mainline. Your patch should be reverted until that is resolved. If making the API more stringent / less forgiving indeed makes sense and improves code quality and/or performance, the very least would be to change the code to still accept what it used to accept before but generate a traceback. That would let people fix the calling code without making systems unusable. This is even more true with failures like the one I observed on arm, where your patch causes init to crash without clear indication of the root cause of that crash. Guenter ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 1/5] block: rewrite blk_bvec_map_sg to avoid a nth_page call 2019-04-16 17:08 ` Guenter Roeck @ 2019-04-16 17:10 ` Christoph Hellwig 2019-04-16 17:51 ` Guenter Roeck 0 siblings, 1 reply; 29+ messages in thread From: Christoph Hellwig @ 2019-04-16 17:10 UTC (permalink / raw) To: Guenter Roeck; +Cc: Christoph Hellwig, Jens Axboe, Ming Lei, linux-block On Tue, Apr 16, 2019 at 10:08:47AM -0700, Guenter Roeck wrote: > That is beside the point. Your code changes an internal API to be more > stringent and less forgiving. This causes failures, presumably because > callers of that API took advantage (on purpose or not) of it. > When changing an API, you are responsible for both ends. You cannot claim > that the callers of that API are buggy. Taking advantage of a forgiving > API is not a bug. If you change an API, and that change causes a failure, > that is a regression, not a bug on the side of the caller. As said, I offered to fix these, even if this isn't my fault. I'm also still waiting for the other reports. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 1/5] block: rewrite blk_bvec_map_sg to avoid a nth_page call 2019-04-16 17:10 ` Christoph Hellwig @ 2019-04-16 17:51 ` Guenter Roeck 2019-04-17 5:27 ` Christoph Hellwig 0 siblings, 1 reply; 29+ messages in thread From: Guenter Roeck @ 2019-04-16 17:51 UTC (permalink / raw) To: Christoph Hellwig; +Cc: Jens Axboe, Ming Lei, linux-block On Tue, Apr 16, 2019 at 07:10:11PM +0200, Christoph Hellwig wrote: > On Tue, Apr 16, 2019 at 10:08:47AM -0700, Guenter Roeck wrote: > > That is beside the point. Your code changes an internal API to be more > > stringent and less forgiving. This causes failures, presumably because > > callers of that API took advantage (on purpose or not) of it. > > When changing an API, you are responsible for both ends. You cannot claim > > that the callers of that API are buggy. Taking advantage of a forgiving > > API is not a bug. If you change an API, and that change causes a failure, > > that is a regression, not a bug on the side of the caller. > > As said, I offered to fix these, even if this isn't my fault. I'm also "even if this isn't my fault" Here is where we disagree. You introduced the change, you are responsible for its impact, on both ends. > still waiting for the other reports. I reported everything I know. To summarize, the following tests are confirmed to fail due to this patch. 
arm:vexpress-a9:multi_v7_defconfig:nolocktests:sd:mem128:vexpress-v2p-ca9:rootfs arm:vexpress-a15:multi_v7_defconfig:nolocktests:sd:mem128:vexpress-v2p-ca15-tc1:rootfs arm:vexpress-a15-a7:multi_v7_defconfig:nolocktests:sd:mem256:vexpress-v2p-ca15_a7:rootfs sparc32:SPARCClassic:nosmp:scsi:hd sparc32:SPARCbook:nosmp:scsi:cd sparc32:SS-5:nosmp:scsi:hd sparc32:SS-10:nosmp:scsi:cd sparc32:SS-600MP:nosmp:scsi:hd sparc32:Voyager:nosmp:noapc:scsi:hd sparc32:SS-4:smp:scsi:hd sparc32:SS-5:smp:scsi:cd sparc32:SS-20:smp:scsi:hd sparc32:SS-600MP:smp:scsi:hd sparc32:Voyager:smp:noapc:scsi:hd Detailed logs are available at https://kerneltests.org/builders, and the test scripts are published at https://github.com/groeck/linux-build-test. Guenter ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 1/5] block: rewrite blk_bvec_map_sg to avoid a nth_page call 2019-04-16 17:51 ` Guenter Roeck @ 2019-04-17 5:27 ` Christoph Hellwig 2019-04-17 13:42 ` Guenter Roeck 2019-04-17 21:59 ` Guenter Roeck 0 siblings, 2 replies; 29+ messages in thread From: Christoph Hellwig @ 2019-04-17 5:27 UTC (permalink / raw) To: Guenter Roeck; +Cc: Christoph Hellwig, Jens Axboe, Ming Lei, linux-block Now that I've fixed the sparc32 iommu code in another thread: can you send me your rootfs and qemu arm command line for the failing one? I have a hard time parsing your buildbot output. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 1/5] block: rewrite blk_bvec_map_sg to avoid a nth_page call 2019-04-17 5:27 ` Christoph Hellwig @ 2019-04-17 13:42 ` Guenter Roeck 2019-04-17 21:59 ` Guenter Roeck 1 sibling, 0 replies; 29+ messages in thread From: Guenter Roeck @ 2019-04-17 13:42 UTC (permalink / raw) To: Christoph Hellwig; +Cc: Jens Axboe, Ming Lei, linux-block On 4/16/19 10:27 PM, Christoph Hellwig wrote: > Now that I've fixed the sparc32 iommu code in another thread: can > you send me your rootfs and qemu arm command line for the failing > one? I have a hard time parsing your buildbot output. > Example command line: qemu-system-arm -M vexpress-a9 -kernel arch/arm/boot/zImage -no-reboot \ -snapshot -m 128 \ -drive file=rootfs-armv5.ext2,format=raw,if=sd \ --append 'panic=-1 root=/dev/mmcblk0 rootwait console=ttyAMA0,115200' \ -dtb arch/arm/boot/dts/vexpress-v2p-ca9.dtb \ -nographic -monitor null -serial stdio Root file system is available from https://github.com/groeck/linux-build-test/blob/master/rootfs/arm/rootfs-armv5.ext2.gz Configuration is multi_v7_defconfig; I confirmed that this configuration is sufficient to see the failure. Guenter ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 1/5] block: rewrite blk_bvec_map_sg to avoid a nth_page call 2019-04-17 5:27 ` Christoph Hellwig 2019-04-17 13:42 ` Guenter Roeck @ 2019-04-17 21:59 ` Guenter Roeck 2019-04-19 2:27 ` Ming Lei 1 sibling, 1 reply; 29+ messages in thread From: Guenter Roeck @ 2019-04-17 21:59 UTC (permalink / raw) To: Christoph Hellwig; +Cc: Jens Axboe, Ming Lei, linux-block On Wed, Apr 17, 2019 at 07:27:24AM +0200, Christoph Hellwig wrote: > Now that I've fixed the sparc32 iommu code in another thread: can > you send me your rootfs and qemu arm command line for the failing > one? I have a hard time parsing your buildbot output. FWIW: mmc_blk_data_prep() calls blk_rq_map_sg() with a large offset value. The old code translated this into: blk_bvec_map_sg(q=c77a0000 len=13824 offset=18944) sg_set_page(sg=c6015000 p=c7efd180 l=13824 o=2560) The new code leaves offset unchanged: blk_bvec_map_sg(q=c76c0528 len=13824 offset=18944) sg_set_page(sg=c6035000 p=c7f2af00 l=13824 o=18944) Traceback: [<c065e3d4>] (blk_rq_map_sg) from [<c0ca1444>] (mmc_blk_data_prep+0x1b0/0x2c8) [<c0ca1444>] (mmc_blk_data_prep) from [<c0ca15ac>] (mmc_blk_rw_rq_prep+0x50/0x178) [<c0ca15ac>] (mmc_blk_rw_rq_prep) from [<c0ca48bc>] (mmc_blk_mq_issue_rq+0x290/0x878) [<c0ca48bc>] (mmc_blk_mq_issue_rq) from [<c0ca52e4>] (mmc_mq_queue_rq+0x128/0x234) [<c0ca52e4>] (mmc_mq_queue_rq) from [<c066350c>] (blk_mq_dispatch_rq_list+0xc8/0x5e8) [<c066350c>] (blk_mq_dispatch_rq_list) from [<c06681a8>] (blk_mq_do_dispatch_sched+0x60/0xfc) [<c06681a8>] (blk_mq_do_dispatch_sched) from [<c06688b8>] (blk_mq_sched_dispatch_requests+0x134/0x1b0) [<c06688b8>] (blk_mq_sched_dispatch_requests) from [<c0661f08>] (__blk_mq_run_hw_queue+0xa4/0x138) [<c0661f08>] (__blk_mq_run_hw_queue) from [<c03622a0>] (process_one_work+0x218/0x510) [<c03622a0>] (process_one_work) from [<c0363230>] (worker_thread+0x44/0x5bc) This results in bad data transfers, which ultimately causes the crash. 
Guenter ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 1/5] block: rewrite blk_bvec_map_sg to avoid a nth_page call 2019-04-17 21:59 ` Guenter Roeck @ 2019-04-19 2:27 ` Ming Lei 2019-04-19 2:36 ` Ming Lei 0 siblings, 1 reply; 29+ messages in thread From: Ming Lei @ 2019-04-19 2:27 UTC (permalink / raw) To: Guenter Roeck; +Cc: Christoph Hellwig, Jens Axboe, Ming Lei, linux-block On Thu, Apr 18, 2019 at 5:59 AM Guenter Roeck <linux@roeck-us.net> wrote: > > On Wed, Apr 17, 2019 at 07:27:24AM +0200, Christoph Hellwig wrote: > > Now that I've fixed the sparc32 iommu code in another thread: can > > you send me your rootfs and qemu arm command line for the failing > > one? I have a hard time parsing your buildbot output. > > FWIW: mmc_blk_data_prep() calls blk_rq_map_sg() with a large offset value. > The old code translated this into: > > blk_bvec_map_sg(q=c77a0000 len=13824 offset=18944) > sg_set_page(sg=c6015000 p=c7efd180 l=13824 o=2560) > > The new code leaves offset unchanged: > > blk_bvec_map_sg(q=c76c0528 len=13824 offset=18944) > sg_set_page(sg=c6035000 p=c7f2af00 l=13824 o=18944) > > Traceback: > > [<c065e3d4>] (blk_rq_map_sg) from [<c0ca1444>] (mmc_blk_data_prep+0x1b0/0x2c8) > [<c0ca1444>] (mmc_blk_data_prep) from [<c0ca15ac>] (mmc_blk_rw_rq_prep+0x50/0x178) > [<c0ca15ac>] (mmc_blk_rw_rq_prep) from [<c0ca48bc>] (mmc_blk_mq_issue_rq+0x290/0x878) > [<c0ca48bc>] (mmc_blk_mq_issue_rq) from [<c0ca52e4>] (mmc_mq_queue_rq+0x128/0x234) > [<c0ca52e4>] (mmc_mq_queue_rq) from [<c066350c>] (blk_mq_dispatch_rq_list+0xc8/0x5e8) > [<c066350c>] (blk_mq_dispatch_rq_list) from [<c06681a8>] (blk_mq_do_dispatch_sched+0x60/0xfc) > [<c06681a8>] (blk_mq_do_dispatch_sched) from [<c06688b8>] (blk_mq_sched_dispatch_requests+0x134/0x1b0) > [<c06688b8>] (blk_mq_sched_dispatch_requests) from [<c0661f08>] (__blk_mq_run_hw_queue+0xa4/0x138) > [<c0661f08>] (__blk_mq_run_hw_queue) from [<c03622a0>] (process_one_work+0x218/0x510) > [<c03622a0>] (process_one_work) from [<c0363230>] (worker_thread+0x44/0x5bc) > > This results in bad data 
transfers, which ultimately causes the crash. There are several bugs related to kmap(sg_page(sg)), such as: sdhci_kmap_atomic() tmio_mmc_kmap_atomic() wbsd_map_sg() Thanks, Ming Lei ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 1/5] block: rewrite blk_bvec_map_sg to avoid a nth_page call 2019-04-19 2:27 ` Ming Lei @ 2019-04-19 2:36 ` Ming Lei 0 siblings, 0 replies; 29+ messages in thread From: Ming Lei @ 2019-04-19 2:36 UTC (permalink / raw) To: Guenter Roeck, linux-mmc, Ulf Hansson Cc: Christoph Hellwig, Jens Axboe, Ming Lei, linux-block On Fri, Apr 19, 2019 at 10:27 AM Ming Lei <tom.leiming@gmail.com> wrote: > > On Thu, Apr 18, 2019 at 5:59 AM Guenter Roeck <linux@roeck-us.net> wrote: > > > > On Wed, Apr 17, 2019 at 07:27:24AM +0200, Christoph Hellwig wrote: > > > Now that I've fixed the sparc32 iommu code in another thread: can > > > you send me your rootfs and qemu arm command line for the failing > > > one? I have a hard time parsing your buildbot output. > > > > FWIW: mmc_blk_data_prep() calls blk_rq_map_sg() with a large offset value. > > The old code translated this into: > > > > blk_bvec_map_sg(q=c77a0000 len=13824 offset=18944) > > sg_set_page(sg=c6015000 p=c7efd180 l=13824 o=2560) > > > > The new code leaves offset unchanged: > > > > blk_bvec_map_sg(q=c76c0528 len=13824 offset=18944) > > sg_set_page(sg=c6035000 p=c7f2af00 l=13824 o=18944) > > > > Traceback: > > > > [<c065e3d4>] (blk_rq_map_sg) from [<c0ca1444>] (mmc_blk_data_prep+0x1b0/0x2c8) > > [<c0ca1444>] (mmc_blk_data_prep) from [<c0ca15ac>] (mmc_blk_rw_rq_prep+0x50/0x178) > > [<c0ca15ac>] (mmc_blk_rw_rq_prep) from [<c0ca48bc>] (mmc_blk_mq_issue_rq+0x290/0x878) > > [<c0ca48bc>] (mmc_blk_mq_issue_rq) from [<c0ca52e4>] (mmc_mq_queue_rq+0x128/0x234) > > [<c0ca52e4>] (mmc_mq_queue_rq) from [<c066350c>] (blk_mq_dispatch_rq_list+0xc8/0x5e8) > > [<c066350c>] (blk_mq_dispatch_rq_list) from [<c06681a8>] (blk_mq_do_dispatch_sched+0x60/0xfc) > > [<c06681a8>] (blk_mq_do_dispatch_sched) from [<c06688b8>] (blk_mq_sched_dispatch_requests+0x134/0x1b0) > > [<c06688b8>] (blk_mq_sched_dispatch_requests) from [<c0661f08>] (__blk_mq_run_hw_queue+0xa4/0x138) > > [<c0661f08>] (__blk_mq_run_hw_queue) from [<c03622a0>] 
(process_one_work+0x218/0x510) > > [<c03622a0>] (process_one_work) from [<c0363230>] (worker_thread+0x44/0x5bc) > > > > This results in bad data transfers, which ultimately causes the crash. > There are several bugs related to kmap(sg_page(sg)), such as: > sdhci_kmap_atomic() > tmio_mmc_kmap_atomic() > wbsd_map_sg() Cc mmc mailing list: Looks like there are more such bad uses: au1xmmc_send_pio() au1xmmc_receive_pio() mmc_spi_data_do() sdricoh_request() However, it seems tifm_sd.c notices this issue, see tifm_sd_transfer_data(). Thanks, Ming Lei ^ permalink raw reply [flat|nested] 29+ messages in thread
* [PATCH 2/5] block: refactor __bio_iov_bvec_add_pages 2019-04-08 10:46 avoid calling nth_page in the block I/O path Christoph Hellwig 2019-04-08 10:46 ` [PATCH 1/5] block: rewrite blk_bvec_map_sg to avoid a nth_page call Christoph Hellwig @ 2019-04-08 10:46 ` Christoph Hellwig 2019-04-08 11:07 ` Johannes Thumshirn 2019-04-08 22:06 ` Bart Van Assche 2019-04-08 10:46 ` [PATCH 3/5] block: don't allow multiple bio_iov_iter_get_pages calls per bio Christoph Hellwig ` (3 subsequent siblings) 5 siblings, 2 replies; 29+ messages in thread From: Christoph Hellwig @ 2019-04-08 10:46 UTC (permalink / raw) To: Jens Axboe; +Cc: Ming Lei, linux-block Return early on error, and add an unlikely annotation for that case. Signed-off-by: Christoph Hellwig <hch@lst.de> --- block/bio.c | 19 +++++++++---------- 1 file changed, 9 insertions(+), 10 deletions(-) diff --git a/block/bio.c b/block/bio.c index c2592c5d70b9..ad346082a971 100644 --- a/block/bio.c +++ b/block/bio.c @@ -873,20 +873,19 @@ static int __bio_iov_bvec_add_pages(struct bio *bio, struct iov_iter *iter) len = min_t(size_t, bv->bv_len - iter->iov_offset, iter->count); size = bio_add_page(bio, bv->bv_page, len, bv->bv_offset + iter->iov_offset); - if (size == len) { - if (!bio_flagged(bio, BIO_NO_PAGE_REF)) { - struct page *page; - int i; + if (unlikely(size != len)) + return -EINVAL; - mp_bvec_for_each_page(page, bv, i) - get_page(page); - } + if (!bio_flagged(bio, BIO_NO_PAGE_REF)) { + struct page *page; + int i; - iov_iter_advance(iter, size); - return 0; + mp_bvec_for_each_page(page, bv, i) + get_page(page); } - return -EINVAL; + iov_iter_advance(iter, size); + return 0; } #define PAGE_PTRS_PER_BVEC (sizeof(struct bio_vec) / sizeof(struct page *)) -- 2.20.1 ^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: [PATCH 2/5] block: refactor __bio_iov_bvec_add_pages 2019-04-08 10:46 ` [PATCH 2/5] block: refactor __bio_iov_bvec_add_pages Christoph Hellwig @ 2019-04-08 11:07 ` Johannes Thumshirn 2019-04-08 22:06 ` Bart Van Assche 1 sibling, 0 replies; 29+ messages in thread From: Johannes Thumshirn @ 2019-04-08 11:07 UTC (permalink / raw) To: Christoph Hellwig; +Cc: Jens Axboe, Ming Lei, linux-block Looks good, Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> -- Johannes Thumshirn SUSE Labs Filesystems jthumshirn@suse.de +49 911 74053 689 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: Felix Imendörffer, Mary Higgins, Sri Rasiah HRB 21284 (AG Nürnberg) Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850 ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 2/5] block: refactor __bio_iov_bvec_add_pages 2019-04-08 10:46 ` [PATCH 2/5] block: refactor __bio_iov_bvec_add_pages Christoph Hellwig 2019-04-08 11:07 ` Johannes Thumshirn @ 2019-04-08 22:06 ` Bart Van Assche 1 sibling, 0 replies; 29+ messages in thread From: Bart Van Assche @ 2019-04-08 22:06 UTC (permalink / raw) To: Christoph Hellwig, Jens Axboe; +Cc: Ming Lei, linux-block On Mon, 2019-04-08 at 12:46 +0200, Christoph Hellwig wrote: > Return early on error, and add an unlikely annotation for that case. Reviewed-by: Bart Van Assche <bvanassche@acm.org> ^ permalink raw reply [flat|nested] 29+ messages in thread
* [PATCH 3/5] block: don't allow multiple bio_iov_iter_get_pages calls per bio 2019-04-08 10:46 avoid calling nth_page in the block I/O path Christoph Hellwig 2019-04-08 10:46 ` [PATCH 1/5] block: rewrite blk_bvec_map_sg to avoid a nth_page call Christoph Hellwig 2019-04-08 10:46 ` [PATCH 2/5] block: refactor __bio_iov_bvec_add_pages Christoph Hellwig @ 2019-04-08 10:46 ` Christoph Hellwig 2019-04-08 11:13 ` Johannes Thumshirn 2019-04-08 22:17 ` Bart Van Assche 2019-04-08 10:46 ` [PATCH 4/5] block: change how we get page references in bio_iov_iter_get_pages Christoph Hellwig ` (2 subsequent siblings) 5 siblings, 2 replies; 29+ messages in thread From: Christoph Hellwig @ 2019-04-08 10:46 UTC (permalink / raw) To: Jens Axboe; +Cc: Ming Lei, linux-block No caller uses bio_iov_iter_get_pages multiple times on a given bio, and that functionality isn't all that useful. Removing it will make some future changes a little easier and also simplifies the function a bit. Signed-off-by: Christoph Hellwig <hch@lst.de> --- block/bio.c | 15 ++++++--------- 1 file changed, 6 insertions(+), 9 deletions(-) diff --git a/block/bio.c b/block/bio.c index ad346082a971..2fa624db21c7 100644 --- a/block/bio.c +++ b/block/bio.c @@ -958,7 +958,10 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter) int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter) { const bool is_bvec = iov_iter_is_bvec(iter); - unsigned short orig_vcnt = bio->bi_vcnt; + int ret = -EFAULT; + + if (WARN_ON_ONCE(bio->bi_vcnt)) + return -EINVAL; /* * If this is a BVEC iter, then the pages are kernel pages. Don't @@ -968,19 +971,13 @@ int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter) bio_set_flag(bio, BIO_NO_PAGE_REF); do { - int ret; - if (is_bvec) ret = __bio_iov_bvec_add_pages(bio, iter); else ret = __bio_iov_iter_get_pages(bio, iter); + } while (!ret && iov_iter_count(iter) && !bio_full(bio)); - if (unlikely(ret)) - return bio->bi_vcnt > orig_vcnt ? 
0 : ret; - - } while (iov_iter_count(iter) && !bio_full(bio)); - - return 0; + return bio->bi_vcnt ? 0 : ret; } static void submit_bio_wait_endio(struct bio *bio) -- 2.20.1 ^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: [PATCH 3/5] block: don't allow multiple bio_iov_iter_get_pages calls per bio 2019-04-08 10:46 ` [PATCH 3/5] block: don't allow multiple bio_iov_iter_get_pages calls per bio Christoph Hellwig @ 2019-04-08 11:13 ` Johannes Thumshirn 2019-04-08 22:17 ` Bart Van Assche 1 sibling, 0 replies; 29+ messages in thread From: Johannes Thumshirn @ 2019-04-08 11:13 UTC (permalink / raw) To: Christoph Hellwig; +Cc: Jens Axboe, Ming Lei, linux-block Looks good, Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> -- Johannes Thumshirn SUSE Labs Filesystems jthumshirn@suse.de +49 911 74053 689 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: Felix Imendörffer, Mary Higgins, Sri Rasiah HRB 21284 (AG Nürnberg) Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850 ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 3/5] block: don't allow multiple bio_iov_iter_get_pages calls per bio 2019-04-08 10:46 ` [PATCH 3/5] block: don't allow multiple bio_iov_iter_get_pages calls per bio Christoph Hellwig 2019-04-08 11:13 ` Johannes Thumshirn @ 2019-04-08 22:17 ` Bart Van Assche 2019-04-09 10:05 ` Christoph Hellwig 1 sibling, 1 reply; 29+ messages in thread From: Bart Van Assche @ 2019-04-08 22:17 UTC (permalink / raw) To: Christoph Hellwig, Jens Axboe; +Cc: Ming Lei, linux-block On Mon, 2019-04-08 at 12:46 +0200, Christoph Hellwig wrote: > No caller uses bio_iov_iter_get_pages multiple times on a given bio, > and that functionality isn't all that useful. Removing it will make > some future changes a little easier and also simplifies the function > a bit. > > Signed-off-by: Christoph Hellwig <hch@lst.de> > --- > block/bio.c | 15 ++++++--------- > 1 file changed, 6 insertions(+), 9 deletions(-) > > diff --git a/block/bio.c b/block/bio.c > index ad346082a971..2fa624db21c7 100644 > --- a/block/bio.c > +++ b/block/bio.c > @@ -958,7 +958,10 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter) > int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter) > { > const bool is_bvec = iov_iter_is_bvec(iter); > - unsigned short orig_vcnt = bio->bi_vcnt; > + int ret = -EFAULT; Is the value -EFAULT used anywhere? In other words, can " = -EFAULT" be left out? Otherwise this patch looks good to me. Thanks, Bart. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 3/5] block: don't allow multiple bio_iov_iter_get_pages calls per bio 2019-04-08 22:17 ` Bart Van Assche @ 2019-04-09 10:05 ` Christoph Hellwig 0 siblings, 0 replies; 29+ messages in thread From: Christoph Hellwig @ 2019-04-09 10:05 UTC (permalink / raw) To: Bart Van Assche; +Cc: Christoph Hellwig, Jens Axboe, Ming Lei, linux-block On Mon, Apr 08, 2019 at 03:17:35PM -0700, Bart Van Assche wrote: > On Mon, 2019-04-08 at 12:46 +0200, Christoph Hellwig wrote: > > No caller uses bio_iov_iter_get_pages multiple times on a given bio, > > and that functionality isn't all that useful. Removing it will make > > some future changes a little easier and also simplifies the function > > a bit. > > > > Signed-off-by: Christoph Hellwig <hch@lst.de> > > --- > > block/bio.c | 15 ++++++--------- > > 1 file changed, 6 insertions(+), 9 deletions(-) > > > > diff --git a/block/bio.c b/block/bio.c > > index ad346082a971..2fa624db21c7 100644 > > --- a/block/bio.c > > +++ b/block/bio.c > > @@ -958,7 +958,10 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter) > > int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter) > > { > > const bool is_bvec = iov_iter_is_bvec(iter); > > - unsigned short orig_vcnt = bio->bi_vcnt; > > + int ret = -EFAULT; > > Is the value -EFAULT used anywhere? In other words, can " = -EFAULT" be left > out? Otherwise this patch looks good to me. True, it could be dropped. The assignment is a leftover from an earlier version of the patch. ^ permalink raw reply [flat|nested] 29+ messages in thread
* [PATCH 4/5] block: change how we get page references in bio_iov_iter_get_pages 2019-04-08 10:46 avoid calling nth_page in the block I/O path Christoph Hellwig ` (2 preceding siblings ...) 2019-04-08 10:46 ` [PATCH 3/5] block: don't allow multiple bio_iov_iter_get_pages calls per bio Christoph Hellwig @ 2019-04-08 10:46 ` Christoph Hellwig 2019-04-08 10:46 ` [PATCH 5/5] block: only allow contiguous page structs in a bio_vec Christoph Hellwig 2019-04-09 16:15 ` avoid calling nth_page in the block I/O path Jens Axboe 5 siblings, 0 replies; 29+ messages in thread From: Christoph Hellwig @ 2019-04-08 10:46 UTC (permalink / raw) To: Jens Axboe; +Cc: Ming Lei, linux-block Instead of needing a special macro to iterate over all pages in a bvec, just do a second pass over the whole bio. This also matches what we do on the release side. The release side helper is moved up to where we need the get helper to clearly express the symmetry. Signed-off-by: Christoph Hellwig <hch@lst.de> --- block/bio.c | 51 ++++++++++++++++++++++---------------- include/linux/bvec.h | 5 ----- 2 files changed, 25 insertions(+), 31 deletions(-) diff --git a/block/bio.c b/block/bio.c index 2fa624db21c7..3f78c114b5e9 100644 --- a/block/bio.c +++ b/block/bio.c @@ -861,6 +861,26 @@ int bio_add_page(struct bio *bio, struct page *page, } EXPORT_SYMBOL(bio_add_page); +static void bio_get_pages(struct bio *bio) +{ + struct bvec_iter_all iter_all; + struct bio_vec *bvec; + int i; + + bio_for_each_segment_all(bvec, bio, i, iter_all) + get_page(bvec->bv_page); +} + +static void bio_release_pages(struct bio *bio) +{ + struct bvec_iter_all iter_all; + struct bio_vec *bvec; + int i; + + bio_for_each_segment_all(bvec, bio, i, iter_all) + put_page(bvec->bv_page); +} + static int __bio_iov_bvec_add_pages(struct bio *bio, struct iov_iter *iter) { const struct bio_vec *bv = iter->bvec; @@ -875,15 +895,6 @@ static int __bio_iov_bvec_add_pages(struct bio *bio, struct iov_iter *iter) bv->bv_offset + 
iter->iov_offset); if (unlikely(size != len)) return -EINVAL; - - if (!bio_flagged(bio, BIO_NO_PAGE_REF)) { - struct page *page; - int i; - - mp_bvec_for_each_page(page, bv, i) - get_page(page); - } - iov_iter_advance(iter, size); return 0; } @@ -963,13 +974,6 @@ int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter) if (WARN_ON_ONCE(bio->bi_vcnt)) return -EINVAL; - /* - * If this is a BVEC iter, then the pages are kernel pages. Don't - * release them on IO completion, if the caller asked us to. - */ - if (is_bvec && iov_iter_bvec_no_ref(iter)) - bio_set_flag(bio, BIO_NO_PAGE_REF); - do { if (is_bvec) ret = __bio_iov_bvec_add_pages(bio, iter); @@ -977,6 +981,11 @@ int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter) ret = __bio_iov_iter_get_pages(bio, iter); } while (!ret && iov_iter_count(iter) && !bio_full(bio)); + if (iov_iter_bvec_no_ref(iter)) + bio_set_flag(bio, BIO_NO_PAGE_REF); + else + bio_get_pages(bio); + return bio->bi_vcnt ? 0 : ret; } @@ -1670,16 +1679,6 @@ void bio_set_pages_dirty(struct bio *bio) } } -static void bio_release_pages(struct bio *bio) -{ - struct bio_vec *bvec; - int i; - struct bvec_iter_all iter_all; - - bio_for_each_segment_all(bvec, bio, i, iter_all) - put_page(bvec->bv_page); -} - /* * bio_check_pages_dirty() will check that all the BIO's pages are still dirty. * If they are, then fine. If, however, some pages are clean then they must diff --git a/include/linux/bvec.h b/include/linux/bvec.h index f6275c4da13a..307bbda62b7b 100644 --- a/include/linux/bvec.h +++ b/include/linux/bvec.h @@ -189,9 +189,4 @@ static inline void mp_bvec_last_segment(const struct bio_vec *bvec, } } -#define mp_bvec_for_each_page(pg, bv, i) \ - for (i = (bv)->bv_offset / PAGE_SIZE; \ - (i <= (((bv)->bv_offset + (bv)->bv_len - 1) / PAGE_SIZE)) && \ - (pg = bvec_nth_page((bv)->bv_page, i)); i += 1) - #endif /* __LINUX_BVEC_ITER_H */ -- 2.20.1 ^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH 5/5] block: only allow contiguous page structs in a bio_vec 2019-04-08 10:46 avoid calling nth_page in the block I/O path Christoph Hellwig ` (3 preceding siblings ...) 2019-04-08 10:46 ` [PATCH 4/5] block: change how we get page references in bio_iov_iter_get_pages Christoph Hellwig @ 2019-04-08 10:46 ` Christoph Hellwig 2019-04-09 16:15 ` avoid calling nth_page in the block I/O path Jens Axboe 5 siblings, 0 replies; 29+ messages in thread From: Christoph Hellwig @ 2019-04-08 10:46 UTC (permalink / raw) To: Jens Axboe; +Cc: Ming Lei, linux-block We currently have to call nth_page when iterating over pages inside a bio_vec. Jens complained a while ago that this is fairly expensive. To mitigate this we can check that the actual page structures are contiguous when adding them to the bio, and just do pointer arithmetic later on. Signed-off-by: Christoph Hellwig <hch@lst.de> --- block/bio.c | 9 +++++++-- include/linux/bvec.h | 13 ++++--------- 2 files changed, 11 insertions(+), 11 deletions(-) diff --git a/block/bio.c b/block/bio.c index 3f78c114b5e9..110292627575 100644 --- a/block/bio.c +++ b/block/bio.c @@ -659,8 +659,13 @@ static inline bool page_is_mergeable(const struct bio_vec *bv, return false; if (xen_domain() && !xen_biovec_phys_mergeable(bv, page)) return false; - if (same_page && (vec_end_addr & PAGE_MASK) != page_addr) - return false; + + if ((vec_end_addr & PAGE_MASK) != page_addr) { + if (same_page) + return false; + if (pfn_to_page(PFN_DOWN(vec_end_addr)) + 1 != page) + return false; + } return true; } diff --git a/include/linux/bvec.h b/include/linux/bvec.h index 307bbda62b7b..44b0f4684190 100644 --- a/include/linux/bvec.h +++ b/include/linux/bvec.h @@ -51,11 +51,6 @@ struct bvec_iter_all { unsigned done; }; -static inline struct page *bvec_nth_page(struct page *page, int idx) -{ - return idx == 0 ? 
page : nth_page(page, idx); -} - /* * various member access, note that bio_data should of course not be used * on highmem page vectors @@ -92,8 +87,8 @@ static inline struct page *bvec_nth_page(struct page *page, int idx) PAGE_SIZE - bvec_iter_offset((bvec), (iter))) #define bvec_iter_page(bvec, iter) \ - bvec_nth_page(mp_bvec_iter_page((bvec), (iter)), \ - mp_bvec_iter_page_idx((bvec), (iter))) + (mp_bvec_iter_page((bvec), (iter)) + \ + mp_bvec_iter_page_idx((bvec), (iter))) #define bvec_iter_bvec(bvec, iter) \ ((struct bio_vec) { \ @@ -157,7 +152,7 @@ static inline void mp_bvec_next_segment(const struct bio_vec *bvec, struct bio_vec *bv = &iter_all->bv; if (bv->bv_page) { - bv->bv_page = nth_page(bv->bv_page, 1); + bv->bv_page++; bv->bv_offset = 0; } else { bv->bv_page = bvec->bv_page; @@ -177,7 +172,7 @@ static inline void mp_bvec_last_segment(const struct bio_vec *bvec, unsigned total = bvec->bv_offset + bvec->bv_len; unsigned last_page = (total - 1) / PAGE_SIZE; - seg->bv_page = bvec_nth_page(bvec->bv_page, last_page); + seg->bv_page = bvec->bv_page + last_page; /* the whole segment is inside the last page */ if (bvec->bv_offset >= last_page * PAGE_SIZE) { -- 2.20.1 ^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: avoid calling nth_page in the block I/O path 2019-04-08 10:46 avoid calling nth_page in the block I/O path Christoph Hellwig ` (4 preceding siblings ...) 2019-04-08 10:46 ` [PATCH 5/5] block: only allow contiguous page structs in a bio_vec Christoph Hellwig @ 2019-04-09 16:15 ` Jens Axboe 5 siblings, 0 replies; 29+ messages in thread From: Jens Axboe @ 2019-04-09 16:15 UTC (permalink / raw) To: Christoph Hellwig; +Cc: Ming Lei, linux-block On 4/8/19 4:46 AM, Christoph Hellwig wrote: > Hi Jens, > > you complained about the overhead of calling nth_page in the multipage > bio_vec enabled I/O path a while ago. While I can't really reproduce > the numbers on my (slower) hardware we could avoid the nth_page calls > pretty easily with a few tweaks. Can you take a look at this series? While I haven't had a chance to bench this yet, I like this series just as an improvement/cleanup. -- Jens Axboe ^ permalink raw reply [flat|nested] 29+ messages in thread
* avoid calling nth_page in the block I/O path v2
@ 2019-04-11 6:23 Christoph Hellwig
  2019-04-11 6:23 ` [PATCH 1/5] block: rewrite blk_bvec_map_sg to avoid a nth_page call Christoph Hellwig
  0 siblings, 1 reply; 29+ messages in thread

From: Christoph Hellwig @ 2019-04-11 6:23 UTC (permalink / raw)
To: Jens Axboe; +Cc: Ming Lei, linux-block

Hi Jens,

you complained about the overhead of calling nth_page in the multipage
bio_vec enabled I/O path a while ago.  While I can't really reproduce
the numbers on my (slower) hardware we could avoid the nth_page calls
pretty easily with a few tweaks.  Can you take a look at this series?

Changes since v1:
 - remove a spurious variable initialization

^ permalink raw reply	[flat|nested] 29+ messages in thread
* [PATCH 1/5] block: rewrite blk_bvec_map_sg to avoid a nth_page call
  2019-04-11 6:23 avoid calling nth_page in the block I/O path v2 Christoph Hellwig
@ 2019-04-11 6:23 ` Christoph Hellwig
  0 siblings, 0 replies; 29+ messages in thread

From: Christoph Hellwig @ 2019-04-11 6:23 UTC (permalink / raw)
To: Jens Axboe; +Cc: Ming Lei, linux-block, Bart Van Assche, Johannes Thumshirn

The offset in scatterlists is allowed to be larger than the page size,
so don't go to great length to avoid that case and simplify the
arithmetics.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
---
 block/blk-merge.c | 21 ++++++---------------
 1 file changed, 6 insertions(+), 15 deletions(-)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index 895795cdb145..247b17f2a0f6 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -469,26 +469,17 @@ static unsigned blk_bvec_map_sg(struct request_queue *q,
 		struct scatterlist **sg)
 {
 	unsigned nbytes = bvec->bv_len;
-	unsigned nsegs = 0, total = 0, offset = 0;
+	unsigned nsegs = 0, total = 0;
 
 	while (nbytes > 0) {
-		unsigned seg_size;
-		struct page *pg;
-		unsigned idx;
+		unsigned offset = bvec->bv_offset + total;
+		unsigned len = min(get_max_segment_size(q, offset), nbytes);
 
 		*sg = blk_next_sg(sg, sglist);
+		sg_set_page(*sg, bvec->bv_page, len, offset);
 
-		seg_size = get_max_segment_size(q, bvec->bv_offset + total);
-		seg_size = min(nbytes, seg_size);
-
-		offset = (total + bvec->bv_offset) % PAGE_SIZE;
-		idx = (total + bvec->bv_offset) / PAGE_SIZE;
-		pg = bvec_nth_page(bvec->bv_page, idx);
-
-		sg_set_page(*sg, pg, seg_size, offset);
-
-		total += seg_size;
-		nbytes -= seg_size;
+		total += len;
+		nbytes -= len;
 		nsegs++;
 	}
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 29+ messages in thread
end of thread, other threads:[~2019-04-19 2:36 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-04-08 10:46 avoid calling nth_page in the block I/O path Christoph Hellwig
2019-04-08 10:46 ` [PATCH 1/5] block: rewrite blk_bvec_map_sg to avoid a nth_page call Christoph Hellwig
2019-04-08 14:03   ` Johannes Thumshirn
2019-04-08 22:04   ` Bart Van Assche
2019-04-08 22:51   ` Ming Lei
2019-04-15 19:44   ` Guenter Roeck
2019-04-15 20:52     ` Christoph Hellwig
2019-04-15 21:07       ` Guenter Roeck
2019-04-16  6:33         ` Christoph Hellwig
2019-04-16 14:09           ` Guenter Roeck
2019-04-16 17:08           ` Guenter Roeck
2019-04-16 17:10             ` Christoph Hellwig
2019-04-16 17:51               ` Guenter Roeck
2019-04-17  5:27                 ` Christoph Hellwig
2019-04-17 13:42                   ` Guenter Roeck
2019-04-17 21:59                   ` Guenter Roeck
2019-04-19  2:27                     ` Ming Lei
2019-04-19  2:36                       ` Ming Lei
2019-04-08 10:46 ` [PATCH 2/5] block: refactor __bio_iov_bvec_add_pages Christoph Hellwig
2019-04-08 11:07   ` Johannes Thumshirn
2019-04-08 22:06   ` Bart Van Assche
2019-04-08 10:46 ` [PATCH 3/5] block: don't allow multiple bio_iov_iter_get_pages calls per bio Christoph Hellwig
2019-04-08 11:13   ` Johannes Thumshirn
2019-04-08 22:17   ` Bart Van Assche
2019-04-09 10:05     ` Christoph Hellwig
2019-04-08 10:46 ` [PATCH 4/5] block: change how we get page references in bio_iov_iter_get_pages Christoph Hellwig
2019-04-08 10:46 ` [PATCH 5/5] block: only allow contiguous page structs in a bio_vec Christoph Hellwig
2019-04-09 16:15 ` avoid calling nth_page in the block I/O path Jens Axboe
2019-04-11 6:23 avoid calling nth_page in the block I/O path v2 Christoph Hellwig
2019-04-11 6:23 ` [PATCH 1/5] block: rewrite blk_bvec_map_sg to avoid a nth_page call Christoph Hellwig