* Any bio_clone_slow() implementation which doesn't share bi_io_vec?

From: Qu Wenruo
Date: 2021-11-23  6:44 UTC
To: Linux FS Devel, dm-devel
Cc: linux-btrfs

Hi,

Although some out-of-date comments mention other bio_clone_*() variants,
there isn't really any bio clone variant other than __bio_clone_fast(),
which shares bi_io_vec with the source bio.

This limit means we can't free the source bio before the cloned one.

Is there any bio_clone variant which does a deep clone, including
bi_io_vec?

The background story is a little long, with btrfs involved.

Unlike dm/md-raid, which use bio_split(), btrfs splits its I/O before
even creating a bio. Every time btrfs wants to create a bio, it
calculates the boundary first, and thus only creates bios which never
cross the various RAID boundaries.

But this is not really the common practice, so I'd like to at least
change the timing of the bio split so that it looks more like dm/md
RAID.

That's why the bio_clone question comes up: there are still corner
cases where we don't want to fail the whole large bio if only one
stripe failed (mostly for read bios, where we want to salvage as much
data as possible).

Thus the regular bio_split() + bio_chain() solution is not that good
here.

Any idea why no bio_clone_slow() or bio_split_slow() is provided in the
block layer?

Or is bio_split() + bio_chain() really the only recommended solution?

Thanks,
Qu
* Re: Any bio_clone_slow() implementation which doesn't share bi_io_vec?

From: Christoph Hellwig
Date: 2021-11-23  7:43 UTC
To: Qu Wenruo
Cc: Linux FS Devel, dm-devel, linux-btrfs

On Tue, Nov 23, 2021 at 02:44:32PM +0800, Qu Wenruo wrote:
> Is there any bio_clone variant which does a deep clone, including
> bi_io_vec?

There is no use case for that, unless the actual data changes like in
the bounce buffering code.

> Thus the regular bio_split() + bio_chain() solution is not that good
> here.
>
> Any idea why no bio_clone_slow() or bio_split_slow() is provided in
> the block layer?
>
> Or is bio_split() + bio_chain() really the only recommended solution?

You can use bio_split without bio_chain. You just need your own
bi_end_io handler that first performs the action you want and then
contains code equivalent to __bio_chain_endio. As a bonus you can
point bi_private to whatever you want; it does not have to be the
parent bio, just something that allows you to find it.
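The suggested pattern can be sketched in userspace. The mock below shows a per-split end_io handler that first performs its own action (recording a failed stripe so a read can be salvaged) and then does the __bio_chain_endio()-equivalent step of completing the parent, with bi_private pointing at a private context rather than the parent bio. Every struct and function name here is a hypothetical stand-in, not the real block-layer API.

```c
#include <assert.h>
#include <stdlib.h>

struct mock_bio {
	int status;                         /* 0 == OK, like blk_status_t */
	void *bi_private;
	void (*bi_end_io)(struct mock_bio *);
	int remaining;                      /* outstanding completions */
};

struct stripe_ctx {
	struct mock_bio *parent;            /* found via the context... */
	int stripe_nr;                      /* ...plus per-stripe state */
	int *failed_stripes;
};

/* "completion": drop one reference, fire end_io on the last drop */
static void mock_endio(struct mock_bio *bio)
{
	if (--bio->remaining == 0 && bio->bi_end_io)
		bio->bi_end_io(bio);
}

/* the custom per-split bi_end_io handler */
static void split_end_io(struct mock_bio *split)
{
	struct stripe_ctx *ctx = split->bi_private;
	struct mock_bio *parent = ctx->parent;

	/* 1) our own action: record the failure, don't fail the parent */
	if (split->status)
		(*ctx->failed_stripes)++;

	free(ctx);
	free(split);

	/* 2) the __bio_chain_endio() equivalent: complete the parent */
	mock_endio(parent);
}

static void parent_done(struct mock_bio *parent)
{
	parent->status = 0;                 /* salvaged: parent succeeds */
}

/* drive two splits, one failing; returns the failed-stripe count */
static int run_demo(void)
{
	struct mock_bio parent = { .remaining = 2, .bi_end_io = parent_done };
	int failed = 0;

	for (int i = 0; i < 2; i++) {
		struct mock_bio *split = calloc(1, sizeof(*split));
		struct stripe_ctx *ctx = malloc(sizeof(*ctx));

		ctx->parent = &parent;
		ctx->stripe_nr = i;
		ctx->failed_stripes = &failed;
		split->bi_private = ctx;
		split->bi_end_io = split_end_io;
		split->status = (i == 1);   /* pretend stripe 1 failed */
		split->remaining = 1;
		mock_endio(split);          /* simulate I/O completion */
	}
	return failed;
}
```

The parent completes only after both splits have run their handlers, and the failed stripe is recorded without failing the whole I/O.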
* Re: Any bio_clone_slow() implementation which doesn't share bi_io_vec?

From: Qu Wenruo
Date: 2021-11-23  8:10 UTC
To: Christoph Hellwig
Cc: Linux FS Devel, dm-devel, linux-btrfs

On 2021/11/23 15:43, Christoph Hellwig wrote:
> You can use bio_split without bio_chain. You just need your own
> bi_end_io handler that first performs the action you want and then
> contains code equivalent to __bio_chain_endio. As a bonus you can
> point bi_private to whatever you want; it does not have to be the
> parent bio, just something that allows you to find it.

Going without bio_chain() sounds pretty good, as we can still utilize
bi_end_io and bi_private.

But this also means we're now responsible for not releasing the source
bio before the clones, since it owns the real bi_io_vec.

Let me explore this, and hopefully align btrfs with the dm/md code more.

Thanks,
Qu
* Re: Any bio_clone_slow() implementation which doesn't share bi_io_vec?

From: Christoph Hellwig
Date: 2021-11-23  8:13 UTC
To: Qu Wenruo
Cc: Christoph Hellwig, Linux FS Devel, dm-devel, linux-btrfs

On Tue, Nov 23, 2021 at 04:10:35PM +0800, Qu Wenruo wrote:
> But this also means we're now responsible for not releasing the source
> bio before the clones, since it owns the real bi_io_vec.

Just call bio_inc_remaining before submitting the cloned bio, and then
call bio_endio on the root bio every time a clone completes.
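The accounting described here behaves like a reference count on the root bio: bump a counter before each cloned submission, and each clone's completion calls bio_endio() on the root, which only truly completes on the final drop. Below is a userspace mock of that pattern with hypothetical names; in the kernel the counter is the bio's __bi_remaining field.

```c
#include <assert.h>
#include <stdatomic.h>

struct mock_root {
	atomic_int remaining;   /* starts at 1 for the submitter's ref */
	int completed;          /* set once, when the last ref drops   */
};

/* equivalent of bio_inc_remaining() on the root bio */
static void mock_inc_remaining(struct mock_root *root)
{
	atomic_fetch_add(&root->remaining, 1);
}

/* equivalent of bio_endio() on the root bio */
static void mock_endio(struct mock_root *root)
{
	if (atomic_fetch_sub(&root->remaining, 1) == 1)
		root->completed = 1;
}

/* submit n clones, complete them, then drop the submitter's own ref */
static int run_clones(int n)
{
	struct mock_root root = { .remaining = 1, .completed = 0 };

	for (int i = 0; i < n; i++)
		mock_inc_remaining(&root);  /* before submitting clone i */

	for (int i = 0; i < n; i++)
		mock_endio(&root);          /* clone i completes */

	assert(!root.completed);            /* submitter still holds one */
	mock_endio(&root);                  /* done issuing: drop it */
	return root.completed;
}
```

Because the root holds a reference per in-flight clone, its bi_io_vec stays alive until the last clone has completed, which is exactly the property the deep clone was meant to provide.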
* Re: Any bio_clone_slow() implementation which doesn't share bi_io_vec?

From: Qu Wenruo
Date: 2021-11-23 11:09 UTC
To: Christoph Hellwig
Cc: Linux FS Devel, dm-devel, linux-btrfs

On 2021/11/23 16:13, Christoph Hellwig wrote:
> Just call bio_inc_remaining before submitting the cloned bio, and then
> call bio_endio on the root bio every time a clone completes.

Yeah, that sounds pretty good for regular usage.

But there is another very tricky case involved.

btrfs supports zoned devices, so we have special call sites to switch
between bio_add_page() and bio_add_zone_append_page().

But a zoned write cannot be split, nor is there an easy way to directly
convert a regular bio into a bio with zone-append pages.

Currently, if we go the slow path by allocating a new bio, adding pages
from the original bio and advancing the original bio, we're able to do
the conversion from a regular bio to a zone-append bio.

Any idea on this corner case?

Thanks,
Qu
* Re: Any bio_clone_slow() implementation which doesn't share bi_io_vec?

From: Johannes Thumshirn
Date: 2021-11-23 11:39 UTC
To: Qu Wenruo, hch
Cc: Linux FS Devel, dm-devel, linux-btrfs

On 23/11/2021 12:09, Qu Wenruo wrote:
> But a zoned write cannot be split, nor is there an easy way to
> directly convert a regular bio into a bio with zone-append pages.
>
> Any idea on this corner case?

I think we have to differentiate two cases here: a "regular"
REQ_OP_ZONE_APPEND bio and a RAID-stripe REQ_OP_ZONE_APPEND bio.

The first one (i.e. the regular REQ_OP_ZONE_APPEND bio) can't be split,
because we cannot guarantee the order in which the device writes the
data to disk.

The RAID-stripe bio we can split into the two (or more) parts that will
end up on _different_ devices. All we need to do is a) ensure each part
doesn't cross the device's zone-append limit and b) clamp all
bi_iter.bi_sector down to the start of the target zone, i.e. stick to
the rules of REQ_OP_ZONE_APPEND.
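The two rules above are simple arithmetic on each split fragment: cap the length at the device's zone-append limit, and point the sector at the start of the target zone (with zone append, the device reports the actual write position on completion). A userspace sketch, with plain numbers standing in for the bio fields and hypothetical names throughout:

```c
#include <assert.h>

typedef unsigned long long sector_t;

struct zone_append_frag {
	sector_t sector;    /* must be the zone start when submitted */
	sector_t nr_sects;  /* must not exceed the append limit      */
};

/* zone_size and max_append are per-device values, both in sectors;
 * assumed here to describe uniformly sized zones. */
static struct zone_append_frag
make_frag(sector_t target, sector_t len,
	  sector_t zone_size, sector_t max_append)
{
	struct zone_append_frag f;

	/* rule b): clamp down to the start of the target zone */
	f.sector = target - (target % zone_size);

	/* rule a): never exceed the device's zone-append limit */
	f.nr_sects = len > max_append ? max_append : len;
	return f;
}
```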
* Re: Any bio_clone_slow() implementation which doesn't share bi_io_vec?

From: hch
Date: 2021-11-23 14:28 UTC
To: Johannes Thumshirn
Cc: Qu Wenruo, hch, Linux FS Devel, dm-devel, linux-btrfs

On Tue, Nov 23, 2021 at 11:39:11AM +0000, Johannes Thumshirn wrote:
> All we need to do is a) ensure each part doesn't cross the device's
> zone-append limit and b) clamp all bi_iter.bi_sector down to the start
> of the target zone, i.e. stick to the rules of REQ_OP_ZONE_APPEND.

Exactly. A stacking driver must never split a REQ_OP_ZONE_APPEND bio.
But the file system itself can of course split it, as long as each
split-off bio has its own bi_end_io handler to record where it has been
written to.
* Re: Any bio_clone_slow() implementation which doesn't share bi_io_vec?

From: Qu Wenruo
Date: 2021-11-23 23:07 UTC
To: hch, Johannes Thumshirn
Cc: Linux FS Devel, dm-devel, linux-btrfs

On 2021/11/23 22:28, hch@infradead.org wrote:
> On Tue, Nov 23, 2021 at 11:39:11AM +0000, Johannes Thumshirn wrote:
>> The first one (i.e. the regular REQ_OP_ZONE_APPEND bio) can't be
>> split, because we cannot guarantee the order in which the device
>> writes the data to disk.

That's correct.

But if we want to move all bio splitting into the chunk layer, we want
an initial bio without any limitation, and then use that bio to create
the real REQ_OP_ZONE_APPEND bios with the proper size limitations.

> Exactly. A stacking driver must never split a REQ_OP_ZONE_APPEND bio.
> But the file system itself can of course split it, as long as each
> split-off bio has its own bi_end_io handler to record where it has
> been written to.

This makes me wonder: can we really forget the zone thing for the
initial bio, so that we just create a plain bio without any special
limitation and let every split condition be handled in the lower layer?
Including the RAID stripe boundary, zone limitations, etc. (Yeah, it's
still not a pure stacking driver, but it's more stacking-driver-like.)

In that case, the missing piece seems to be a way to convert a split
plain bio into a REQ_OP_ZONE_APPEND bio.

Can this be done without slow bvec copying?

Thanks,
Qu
* Re: Any bio_clone_slow() implementation which doesn't share bi_io_vec?

From: hch
Date: 2021-11-24  6:09 UTC
To: Qu Wenruo
Cc: hch, Johannes Thumshirn, Linux FS Devel, dm-devel, linux-btrfs

On Wed, Nov 24, 2021 at 07:07:18AM +0800, Qu Wenruo wrote:
> In that case, the missing piece seems to be a way to convert a split
> plain bio into a REQ_OP_ZONE_APPEND bio.
>
> Can this be done without slow bvec copying?

Yes. I have a WIP stacking driver that converts writes to zone appends,
and it does just that:

	sector_t orig_sector = bio->bi_iter.bi_sector;
	unsigned int bio_flags = bio->bi_opf & ~REQ_OP_MASK;

	...

	clone = bio_clone_fast(bio, GFP_NOIO, &bdev->write_bio_set);

	...

	clone->bi_opf = REQ_OP_ZONE_APPEND | REQ_NOMERGE | bio_flags;
	bio_set_dev(clone, dev->lower_bdev);
	clone->bi_iter.bi_sector = zone_sector;
	trace_block_bio_remap(clone, disk_devt(disk), orig_sector);
* Re: Any bio_clone_slow() implementation which doesn't share bi_io_vec?

From: Qu Wenruo
Date: 2021-11-24  6:18 UTC
To: hch
Cc: Johannes Thumshirn, Linux FS Devel, dm-devel, linux-btrfs

On 2021/11/24 14:09, hch@infradead.org wrote:
> Yes. I have a WIP stacking driver that converts writes to zone
> appends, and it does just that:
>
> 	clone->bi_opf = REQ_OP_ZONE_APPEND | REQ_NOMERGE | bio_flags;

Just so simple? Then that's super awesome.

But I'm a little concerned about the bio_add_hw_page() call in
bio_add_zone_append_page(). It's not exactly the same as bio_add_page().

Does it mean that, as long as our split bio doesn't exceed the zone
limits, we can do the conversion without any further problem?

Thanks,
Qu
* Re: Any bio_clone_slow() implementation which doesn't share bi_io_vec?

From: hch
Date: 2021-11-24  7:02 UTC
To: Qu Wenruo
Cc: hch, Johannes Thumshirn, Linux FS Devel, dm-devel, linux-btrfs

On Wed, Nov 24, 2021 at 02:18:00PM +0800, Qu Wenruo wrote:
> Does it mean that, as long as our split bio doesn't exceed the zone
> limits, we can do the conversion without any further problem?

You need to look at the limits when splitting. I have modified
blk_bio_segment_split and exported it to deal with that. Let me split
those changes out cleanly and push out a branch.
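Splitting against a hardware limit is, at its core, chopping the range into limit-sized pieces; the actual segment and page-boundary handling lives in the block layer's split code (blk_bio_segment_split). A trivial userspace sketch of just the size arithmetic, with hypothetical names:

```c
#include <assert.h>

typedef unsigned long long sector_t;

/* How many bios a write of 'len' sectors needs on a device whose
 * zone-append limit is 'max_append' sectors.  This is only the size
 * arithmetic; real splitting must also respect segment counts and
 * page boundaries, which is why the kernel helper is used instead. */
static int append_split_count(sector_t len, sector_t max_append)
{
	int n = 0;

	while (len) {
		sector_t chunk = len < max_append ? len : max_append;

		len -= chunk;
		n++;
	}
	return n;
}
```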
* Re: Any bio_clone_slow() implementation which doesn't share bi_io_vec? 2021-11-24 7:02 ` [dm-devel] " hch @ 2021-11-24 7:22 ` hch -1 siblings, 0 replies; 30+ messages in thread From: hch @ 2021-11-24 7:22 UTC (permalink / raw) To: Qu Wenruo; +Cc: hch, Johannes Thumshirn, Linux FS Devel, dm-devel, linux-btrfs On Tue, Nov 23, 2021 at 11:02:52PM -0800, hch@infradead.org wrote: > > Does it mean as long as our splitted bio doesn't exceed zone limit, we > > can do the convert without any further problem? > > You need to look at the limits when splitting. I have modified > blk_bio_segment_split and exported it to deal with that. Let me split > those changes out cleanly and push out a branch. Here: http://git.infradead.org/users/hch/block.git/shortlog/refs/heads/zone-append-split ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Any bio_clone_slow() implementation which doesn't share bi_io_vec? 2021-11-23 23:07 ` [dm-devel] " Qu Wenruo @ 2021-11-24 7:25 ` Naohiro Aota -1 siblings, 0 replies; 30+ messages in thread From: Naohiro Aota @ 2021-11-24 7:25 UTC (permalink / raw) To: Qu Wenruo; +Cc: hch, Johannes Thumshirn, Linux FS Devel, dm-devel, linux-btrfs On Wed, Nov 24, 2021 at 07:07:18AM +0800, Qu Wenruo wrote: > > > On 2021/11/23 22:28, hch@infradead.org wrote: > > On Tue, Nov 23, 2021 at 11:39:11AM +0000, Johannes Thumshirn wrote: > > > I think we have to differentiate two cases here: > > > A "regular" REQ_OP_ZONE_APPEND bio and a RAID stripe REQ_OP_ZONE_APPEND > > > bio. The 1st one (i.e. the regular REQ_OP_ZONE_APPEND bio) can't be split > > > because we cannot guarantee the order the device writes the data to disk. > > That's correct. > > But if we want to move all bio split into chunk layer, we want a initial > bio without any limitation, and then using that bio to create real > REQ_OP_ZONE_APPEND bios with proper size limitations. > > > > For the RAID stripe bio we can split it into the two (or more) parts that > > > will end up on _different_ devices. All we need to do is a) ensure it > > > doesn't cross the device's zone append limit and b) clamp all > > > bi_iter.bi_sector down to the start of the target zone, a.k.a sticking to > > > the rules of REQ_OP_ZONE_APPEND. > > > > Exactly. A stacking driver must never split a REQ_OP_ZONE_APPEND bio. > > But the file system itself can of course split it as long as each split > > off bio has it's own bi_end_io handler to record where it has been > > written to. > > > > This makes me wonder, can we really forget the zone thing for the > initial bio so we just create a plain bio without any special > limitation, and let every split condition be handled in the lower layer? > > Including raid stripe boundary, zone limitations etc. What really matters is to ensure the "one bio (for real zoned device) == one ordered extent" rule. 
When a device rewrites a ZONE_APPEND bio's sector address, we rewrite the ordered extent's logical address accordingly in the end_io process. For this rewriting to work, one extent must be composed of one contiguous bio. So, if we can split an ordered extent during the bio splitting process, that will be fine. It is also fine if we can split an ordered extent in the end_io process. But I think that is difficult, because someone may already be waiting on the ordered extent, and splitting it at that point will break some assumptions in the code. > (yeah, it's still not pure stacking driver, but it's more > stacking-driver like). > > In that case, the missing piece seems to be a way to convert a splitted > plain bio into a REQ_OP_ZONE_APPEND bio. > > Can this be done without slow bvec copying? > > Thanks, > Qu ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Any bio_clone_slow() implementation which doesn't share bi_io_vec? 2021-11-24 7:25 ` [dm-devel] " Naohiro Aota @ 2021-11-24 7:39 ` Qu Wenruo -1 siblings, 0 replies; 30+ messages in thread From: Qu Wenruo @ 2021-11-24 7:39 UTC (permalink / raw) To: Naohiro Aota Cc: hch, Johannes Thumshirn, Linux FS Devel, dm-devel, linux-btrfs On 2021/11/24 15:25, Naohiro Aota wrote: > On Wed, Nov 24, 2021 at 07:07:18AM +0800, Qu Wenruo wrote: >> >> >> On 2021/11/23 22:28, hch@infradead.org wrote: >>> On Tue, Nov 23, 2021 at 11:39:11AM +0000, Johannes Thumshirn wrote: >>>> I think we have to differentiate two cases here: >>>> A "regular" REQ_OP_ZONE_APPEND bio and a RAID stripe REQ_OP_ZONE_APPEND >>>> bio. The 1st one (i.e. the regular REQ_OP_ZONE_APPEND bio) can't be split >>>> because we cannot guarantee the order the device writes the data to disk. >> >> That's correct. >> >> But if we want to move all bio split into chunk layer, we want a initial >> bio without any limitation, and then using that bio to create real >> REQ_OP_ZONE_APPEND bios with proper size limitations. >> >>>> For the RAID stripe bio we can split it into the two (or more) parts that >>>> will end up on _different_ devices. All we need to do is a) ensure it >>>> doesn't cross the device's zone append limit and b) clamp all >>>> bi_iter.bi_sector down to the start of the target zone, a.k.a sticking to >>>> the rules of REQ_OP_ZONE_APPEND. >>> >>> Exactly. A stacking driver must never split a REQ_OP_ZONE_APPEND bio. >>> But the file system itself can of course split it as long as each split >>> off bio has it's own bi_end_io handler to record where it has been >>> written to. >>> >> >> This makes me wonder, can we really forget the zone thing for the >> initial bio so we just create a plain bio without any special >> limitation, and let every split condition be handled in the lower layer? >> >> Including raid stripe boundary, zone limitations etc. 
> > What really matters is to ensure the "one bio (for real zoned device) > == one ordered extent" rule. When a device rewrites ZONE_APPEND bio's > sector address, we rewrite the ordered extent's logical address > accordingly in the end_io process. For ensuring the rewriting works, > one extent must be composed with one contiguous bio. > > So, if we can split an ordered extent at the bio splitting process, > that will be fine. Or, it is also fine if we can split an ordered > extent at end_bio process. But, I think it is difficult because > someone can be already waiting for the ordered extent, and splitting > it at that point will break some assumptions in the code. OK, I see the problem now. It's extract_ordered_extent() that relies on the zone append bio to split the ordered extents, not the other way around. Thus it will still be more complex than I thought to split bios in the chunk layer. I'll leave the zoned part untouched for now, until I have a better solution. Thanks, Qu > >> (yeah, it's still not pure stacking driver, but it's more >> stacking-driver like). >> >> In that case, the missing piece seems to be a way to convert a splitted >> plain bio into a REQ_OP_ZONE_APPEND bio. >> >> Can this be done without slow bvec copying? >> >> Thanks, >> Qu ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [dm-devel] Any bio_clone_slow() implementation which doesn't share bi_io_vec? 2021-11-23 8:13 ` [dm-devel] " Christoph Hellwig @ 2021-11-26 12:33 ` Qu Wenruo -1 siblings, 0 replies; 30+ messages in thread From: Qu Wenruo @ 2021-11-26 12:33 UTC (permalink / raw) To: Christoph Hellwig; +Cc: Linux FS Devel, dm-devel, linux-btrfs On 2021/11/23 16:13, Christoph Hellwig wrote: > On Tue, Nov 23, 2021 at 04:10:35PM +0800, Qu Wenruo wrote: >> Without bio_chain() sounds pretty good, as we can still utilize >> bi_end_io and bi_private. >> >> But this also means, we're now responsible not to release the source bio >> since it has the real bi_io_vec. > > Just call bio_inc_remaining before submitting the cloned bio, and then > call bio_endio on the root bio every time a clone completes. > Does this also mean we need to save bi_iter before bio submission, so that at endio time we can iterate over only the split part of the bio? Or do we have to go back to the bio_chain() method and rely on the parent bio to iterate all the bvecs... Thanks, Qu ^ permalink raw reply [flat|nested] 30+ messages in thread