* Behavior of btrfs_io_geometry() @ 2021-05-20 6:11 Naohiro Aota 2021-05-20 6:58 ` Nikolay Borisov 0 siblings, 1 reply; 3+ messages in thread From: Naohiro Aota @ 2021-05-20 6:11 UTC (permalink / raw) To: linux-btrfs I have a few questions about btrfs_io_geometry()'s behavior. While I'm testing zoned btrfs on ZNS drive with 2GB zone size, I hit the following ASSERT in btrfs_submit_direct() by running fstests btrfs/017. static blk_qc_t btrfs_submit_direct(struct inode *inode, struct iomap *iomap, struct bio *dio_bio, loff_t file_offset) { ... start_sector = dio_bio->bi_iter.bi_sector; submit_len = dio_bio->bi_iter.bi_size; do { logical = start_sector << 9; em = btrfs_get_chunk_map(fs_info, logical, submit_len); ... ret = btrfs_get_io_geometry(fs_info, em, btrfs_op(dio_bio), logical, submit_len, &geom); ... ASSERT(geom.len <= INT_MAX); clone_len = min_t(int, submit_len, geom.len); ... bio = btrfs_bio_clone_partial(dio_bio, clone_offset, clone_len); On zoned btrfs, we create a SINGLE block group whose size is equal to the device zone size, so we have a 2 GB SINGLE block group on a 2 GB zone size drive. Then, on a SINGLE single block group, btrfs_io_geometry() returns the remaining length from @logical to the end of the block group regardless of the @len argument. Thus, with @logical == 0, we get geom->len = 2 GB, which is larger than INT_MAX, hitting the ASSERT. I'm confusing because I'm not sure what the ASSERT wants to do. It might want to guard btrfs_bio_clone_partial() (and bio_trim()) below? But, since bio_trim() takes sector-based values, and the passed "clone_offset" and "clone_len" is byte-based, we can technically allow larger bytes than INT_MAX. (well, we might never build such large bio, though). And, it looks meaningless to check geom->len here. Since, it can be much larger than bio size on a SINGLE block group. So, in summary, below are my questions. 1. About btrfs_bio_clone_partial() 1.1 What is the meaning of geom->len? - Length from @logical to the stripe boundary? or - Length [logical, logical+len] can go without crossing the boundary? 1.2 @len is currently unused in btrfs_bio_clone_partial(), is this correct? 1.3 What should we fill into geom->len when the block group is SINGLE profile? 2. About the ASSERT 2.1 Shouldn't we check submit_len (= bio's length) instead of geom->len ? 2.2 Can't it be larger than INT_MAX? e.g., INT_MAX << SECTOR_SHIFT? 3. About btrfs_bio_clone_partial() 3.1 We can change "int" to "u64" maybe? ^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: Behavior of btrfs_io_geometry() 2021-05-20 6:11 Behavior of btrfs_io_geometry() Naohiro Aota @ 2021-05-20 6:58 ` Nikolay Borisov 2021-05-20 7:25 ` Naohiro Aota 0 siblings, 1 reply; 3+ messages in thread From: Nikolay Borisov @ 2021-05-20 6:58 UTC (permalink / raw) To: Naohiro Aota, linux-btrfs On 20.05.21 г. 9:11, Naohiro Aota wrote: > I have a few questions about btrfs_io_geometry()'s behavior. > > While I'm testing zoned btrfs on ZNS drive with 2GB zone size, I hit > the following ASSERT in btrfs_submit_direct() by running fstests > btrfs/017. > > static blk_qc_t btrfs_submit_direct(struct inode *inode, struct iomap *iomap, > struct bio *dio_bio, loff_t file_offset) > { > ... > start_sector = dio_bio->bi_iter.bi_sector; > submit_len = dio_bio->bi_iter.bi_size; > > do { > logical = start_sector << 9; > em = btrfs_get_chunk_map(fs_info, logical, submit_len); > ... > ret = btrfs_get_io_geometry(fs_info, em, btrfs_op(dio_bio), > logical, submit_len, &geom); > ... > ASSERT(geom.len <= INT_MAX); > > clone_len = min_t(int, submit_len, geom.len); > ... > bio = btrfs_bio_clone_partial(dio_bio, clone_offset, clone_len); > > > On zoned btrfs, we create a SINGLE block group whose size is equal to > the device zone size, so we have a 2 GB SINGLE block group on a 2 GB > zone size drive. Then, on a SINGLE single block group, > btrfs_io_geometry() returns the remaining length from @logical to the > end of the block group regardless of the @len argument. Thus, with > @logical == 0, we get geom->len = 2 GB, which is larger than INT_MAX, > hitting the ASSERT. > > I'm confusing because I'm not sure what the ASSERT wants to do. It > might want to guard btrfs_bio_clone_partial() (and bio_trim()) below? > But, since bio_trim() takes sector-based values, and the passed > "clone_offset" and "clone_len" is byte-based, we can technically allow > larger bytes than INT_MAX. (well, we might never build such large bio, > though). And, it looks meaningless to check geom->len here. Since, it > can be much larger than bio size on a SINGLE block group. > > So, in summary, below are my questions. > > 1. About btrfs_bio_clone_partial() > 1.1 What is the meaning of geom->len? > - Length from @logical to the stripe boundary? or > - Length [logical, logical+len] can go without crossing the boundary? So think of it as splitting a block group into 64k chunks (those are the stripes) then finding in which such 64k chunk/stripe_nr logical falls within, so len would be the length of an io you could do without crossing this 64k boundary. This process is described in https://github.com/btrfs/btrfs-dev-docs/blob/master/bio-submission.txt under section 'Bio mapping'. With this in mind I'd say it's [logical, logical+len] without crossing a 64k boundary. !!!HOWEVER!!!! This applies to bg types other than SINGLE , that's the if (map->type & BTRFS_BLOCK_GROUP_PROFILE_MASK) check in the function. We do this because for raid levels we'd like the io granularity to be 64k, whereas for SINGLE we simply issue a single write, i.e len is calculated as: len = em->len - offset; which is why you are getting such the size of the block group. > 1.2 @len is currently unused in btrfs_bio_clone_partial(), is this correct? It's used to calculate @clone_len parameter so it's indirectly involved in btrfs_bio_clone_partial. > 1.3 What should we fill into geom->len when the block group is SINGLE profile? See above. > > 2. About the ASSERT > 2.1 Shouldn't we check submit_len (= bio's length) instead of geom->len ? The assert got added quite a while back so I'm not entirely sure it's pertinent. > 2.2 Can't it be larger than INT_MAX? e.g., INT_MAX << SECTOR_SHIFT? > > 3. About btrfs_bio_clone_partial() > 3.1 We can change "int" to "u64" maybe? Can we though? So if size in btrfs_bio_clone_partial is switched to u64 then in that fuction we do size >> 9 potentially having a value of at most 55 bits, passing it to bio_trim, which in turn takes an int. So you'd be truncating values at the time you call bio_trim, no ? ^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: Behavior of btrfs_io_geometry() 2021-05-20 6:58 ` Nikolay Borisov @ 2021-05-20 7:25 ` Naohiro Aota 0 siblings, 0 replies; 3+ messages in thread From: Naohiro Aota @ 2021-05-20 7:25 UTC (permalink / raw) To: Nikolay Borisov; +Cc: linux-btrfs Thanks for the comments. On Thu, May 20, 2021 at 09:58:28AM +0300, Nikolay Borisov wrote: > > > On 20.05.21 г. 9:11, Naohiro Aota wrote: > > I have a few questions about btrfs_io_geometry()'s behavior. > > > > While I'm testing zoned btrfs on ZNS drive with 2GB zone size, I hit > > the following ASSERT in btrfs_submit_direct() by running fstests > > btrfs/017. > > > > static blk_qc_t btrfs_submit_direct(struct inode *inode, struct iomap *iomap, > > struct bio *dio_bio, loff_t file_offset) > > { > > ... > > start_sector = dio_bio->bi_iter.bi_sector; > > submit_len = dio_bio->bi_iter.bi_size; > > > > do { > > logical = start_sector << 9; > > em = btrfs_get_chunk_map(fs_info, logical, submit_len); > > ... > > ret = btrfs_get_io_geometry(fs_info, em, btrfs_op(dio_bio), > > logical, submit_len, &geom); > > ... > > ASSERT(geom.len <= INT_MAX); > > > > clone_len = min_t(int, submit_len, geom.len); > > ... > > bio = btrfs_bio_clone_partial(dio_bio, clone_offset, clone_len); > > > > > > On zoned btrfs, we create a SINGLE block group whose size is equal to > > the device zone size, so we have a 2 GB SINGLE block group on a 2 GB > > zone size drive. Then, on a SINGLE single block group, > > btrfs_io_geometry() returns the remaining length from @logical to the > > end of the block group regardless of the @len argument. Thus, with > > @logical == 0, we get geom->len = 2 GB, which is larger than INT_MAX, > > hitting the ASSERT. > > > > I'm confusing because I'm not sure what the ASSERT wants to do. It > > might want to guard btrfs_bio_clone_partial() (and bio_trim()) below? > > But, since bio_trim() takes sector-based values, and the passed > > "clone_offset" and "clone_len" is byte-based, we can technically allow > > larger bytes than INT_MAX. (well, we might never build such large bio, > > though). And, it looks meaningless to check geom->len here. Since, it > > can be much larger than bio size on a SINGLE block group. > > > > So, in summary, below are my questions. > > > > 1. About btrfs_bio_clone_partial() > > 1.1 What is the meaning of geom->len? > > - Length from @logical to the stripe boundary? or > > - Length [logical, logical+len] can go without crossing the boundary? > > So think of it as splitting a block group into 64k chunks (those are the > stripes) then finding in which such 64k chunk/stripe_nr logical falls > within, so len would be the length of an io you could do without > crossing this 64k boundary. This process is described in > https://github.com/btrfs/btrfs-dev-docs/blob/master/bio-submission.txt > under section 'Bio mapping'. With this in mind I'd say it's [logical, > logical+len] without crossing a 64k boundary. !!!HOWEVER!!!! This > applies to bg types other than SINGLE , that's the > if (map->type & BTRFS_BLOCK_GROUP_PROFILE_MASK) check in the function. > We do this because for raid levels we'd like the io granularity to be > 64k, whereas for SINGLE we simply issue a single write, i.e len is > calculated as: > > len = em->len - offset; > > which is why you are getting such the size of the block group. OK, it's the current behavior. So, we should expect geom->len vary the meaning depending on the block group profile type? > > 1.2 @len is currently unused in btrfs_bio_clone_partial(), is this correct? > > It's used to calculate @clone_len parameter so it's indirectly involved > in btrfs_bio_clone_partial. Oops, I made a mistake here. I meant btrfs_get_io_geometry() here. The same for the bullet "1." above. In btrfs_get_io_geometry(), we don't use @len. > > 1.3 What should we fill into geom->len when the block group is SINGLE profile? > > See above. > > > > > 2. About the ASSERT > > 2.1 Shouldn't we check submit_len (= bio's length) instead of geom->len ? > > The assert got added quite a while back so I'm not entirely sure it's > pertinent. > > > 2.2 Can't it be larger than INT_MAX? e.g., INT_MAX << SECTOR_SHIFT? > > > > 3. About btrfs_bio_clone_partial() > > 3.1 We can change "int" to "u64" maybe? > > Can we though? So if size in btrfs_bio_clone_partial is switched to u64 > then in that fuction we do size >> 9 potentially having a value of at > most 55 bits, passing it to bio_trim, which in turn takes an int. So > you'd be truncating values at the time you call bio_trim, no ? Yes, but we can still safely allow the value up to "INT_MAX << 9", can't we? We may need to add another ASSERT here in this case... ^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2021-05-20 7:25 UTC | newest] Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2021-05-20 6:11 Behavior of btrfs_io_geometry() Naohiro Aota 2021-05-20 6:58 ` Nikolay Borisov 2021-05-20 7:25 ` Naohiro Aota
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.