* Behavior of btrfs_io_geometry()
@ 2021-05-20 6:11 Naohiro Aota
2021-05-20 6:58 ` Nikolay Borisov
0 siblings, 1 reply; 3+ messages in thread
From: Naohiro Aota @ 2021-05-20 6:11 UTC (permalink / raw)
To: linux-btrfs
I have a few questions about btrfs_io_geometry()'s behavior.
While I'm testing zoned btrfs on ZNS drive with 2GB zone size, I hit
the following ASSERT in btrfs_submit_direct() by running fstests
btrfs/017.
static blk_qc_t btrfs_submit_direct(struct inode *inode, struct iomap *iomap,
struct bio *dio_bio, loff_t file_offset)
{
...
start_sector = dio_bio->bi_iter.bi_sector;
submit_len = dio_bio->bi_iter.bi_size;
do {
logical = start_sector << 9;
em = btrfs_get_chunk_map(fs_info, logical, submit_len);
...
ret = btrfs_get_io_geometry(fs_info, em, btrfs_op(dio_bio),
logical, submit_len, &geom);
...
ASSERT(geom.len <= INT_MAX);
clone_len = min_t(int, submit_len, geom.len);
...
bio = btrfs_bio_clone_partial(dio_bio, clone_offset, clone_len);
On zoned btrfs, we create a SINGLE block group whose size is equal to
the device zone size, so we have a 2 GB SINGLE block group on a 2 GB
zone size drive. Then, on a SINGLE single block group,
btrfs_io_geometry() returns the remaining length from @logical to the
end of the block group regardless of the @len argument. Thus, with
@logical == 0, we get geom->len = 2 GB, which is larger than INT_MAX,
hitting the ASSERT.
I'm confusing because I'm not sure what the ASSERT wants to do. It
might want to guard btrfs_bio_clone_partial() (and bio_trim()) below?
But, since bio_trim() takes sector-based values, and the passed
"clone_offset" and "clone_len" is byte-based, we can technically allow
larger bytes than INT_MAX. (well, we might never build such large bio,
though). And, it looks meaningless to check geom->len here. Since, it
can be much larger than bio size on a SINGLE block group.
So, in summary, below are my questions.
1. About btrfs_bio_clone_partial()
1.1 What is the meaning of geom->len?
- Length from @logical to the stripe boundary? or
- Length [logical, logical+len] can go without crossing the boundary?
1.2 @len is currently unused in btrfs_bio_clone_partial(), is this correct?
1.3 What should we fill into geom->len when the block group is SINGLE profile?
2. About the ASSERT
2.1 Shouldn't we check submit_len (= bio's length) instead of geom->len ?
2.2 Can't it be larger than INT_MAX? e.g., INT_MAX << SECTOR_SHIFT?
3. About btrfs_bio_clone_partial()
3.1 We can change "int" to "u64" maybe?
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: Behavior of btrfs_io_geometry()
2021-05-20 6:11 Behavior of btrfs_io_geometry() Naohiro Aota
@ 2021-05-20 6:58 ` Nikolay Borisov
2021-05-20 7:25 ` Naohiro Aota
0 siblings, 1 reply; 3+ messages in thread
From: Nikolay Borisov @ 2021-05-20 6:58 UTC (permalink / raw)
To: Naohiro Aota, linux-btrfs
On 20.05.21 г. 9:11, Naohiro Aota wrote:
> I have a few questions about btrfs_io_geometry()'s behavior.
>
> While I'm testing zoned btrfs on ZNS drive with 2GB zone size, I hit
> the following ASSERT in btrfs_submit_direct() by running fstests
> btrfs/017.
>
> static blk_qc_t btrfs_submit_direct(struct inode *inode, struct iomap *iomap,
> struct bio *dio_bio, loff_t file_offset)
> {
> ...
> start_sector = dio_bio->bi_iter.bi_sector;
> submit_len = dio_bio->bi_iter.bi_size;
>
> do {
> logical = start_sector << 9;
> em = btrfs_get_chunk_map(fs_info, logical, submit_len);
> ...
> ret = btrfs_get_io_geometry(fs_info, em, btrfs_op(dio_bio),
> logical, submit_len, &geom);
> ...
> ASSERT(geom.len <= INT_MAX);
>
> clone_len = min_t(int, submit_len, geom.len);
> ...
> bio = btrfs_bio_clone_partial(dio_bio, clone_offset, clone_len);
>
>
> On zoned btrfs, we create a SINGLE block group whose size is equal to
> the device zone size, so we have a 2 GB SINGLE block group on a 2 GB
> zone size drive. Then, on a SINGLE single block group,
> btrfs_io_geometry() returns the remaining length from @logical to the
> end of the block group regardless of the @len argument. Thus, with
> @logical == 0, we get geom->len = 2 GB, which is larger than INT_MAX,
> hitting the ASSERT.
>
> I'm confusing because I'm not sure what the ASSERT wants to do. It
> might want to guard btrfs_bio_clone_partial() (and bio_trim()) below?
> But, since bio_trim() takes sector-based values, and the passed
> "clone_offset" and "clone_len" is byte-based, we can technically allow
> larger bytes than INT_MAX. (well, we might never build such large bio,
> though). And, it looks meaningless to check geom->len here. Since, it
> can be much larger than bio size on a SINGLE block group.
>
> So, in summary, below are my questions.
>
> 1. About btrfs_bio_clone_partial()
> 1.1 What is the meaning of geom->len?
> - Length from @logical to the stripe boundary? or
> - Length [logical, logical+len] can go without crossing the boundary?
So think of it as splitting a block group into 64k chunks (those are the
stripes) then finding in which such 64k chunk/stripe_nr logical falls
within, so len would be the length of an io you could do without
crossing this 64k boundary. This process is described in
https://github.com/btrfs/btrfs-dev-docs/blob/master/bio-submission.txt
under section 'Bio mapping'. With this in mind I'd say it's [logical,
logical+len] without crossing a 64k boundary. !!!HOWEVER!!!! This
applies to bg types other than SINGLE , that's the
if (map->type & BTRFS_BLOCK_GROUP_PROFILE_MASK) check in the function.
We do this because for raid levels we'd like the io granularity to be
64k, whereas for SINGLE we simply issue a single write, i.e len is
calculated as:
len = em->len - offset;
which is why you are getting such the size of the block group.
> 1.2 @len is currently unused in btrfs_bio_clone_partial(), is this correct?
It's used to calculate @clone_len parameter so it's indirectly involved
in btrfs_bio_clone_partial.
> 1.3 What should we fill into geom->len when the block group is SINGLE profile?
See above.
>
> 2. About the ASSERT
> 2.1 Shouldn't we check submit_len (= bio's length) instead of geom->len ?
The assert got added quite a while back so I'm not entirely sure it's
pertinent.
> 2.2 Can't it be larger than INT_MAX? e.g., INT_MAX << SECTOR_SHIFT?
>
> 3. About btrfs_bio_clone_partial()
> 3.1 We can change "int" to "u64" maybe?
Can we though? So if size in btrfs_bio_clone_partial is switched to u64
then in that fuction we do size >> 9 potentially having a value of at
most 55 bits, passing it to bio_trim, which in turn takes an int. So
you'd be truncating values at the time you call bio_trim, no ?
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: Behavior of btrfs_io_geometry()
2021-05-20 6:58 ` Nikolay Borisov
@ 2021-05-20 7:25 ` Naohiro Aota
0 siblings, 0 replies; 3+ messages in thread
From: Naohiro Aota @ 2021-05-20 7:25 UTC (permalink / raw)
To: Nikolay Borisov; +Cc: linux-btrfs
Thanks for the comments.
On Thu, May 20, 2021 at 09:58:28AM +0300, Nikolay Borisov wrote:
>
>
> On 20.05.21 г. 9:11, Naohiro Aota wrote:
> > I have a few questions about btrfs_io_geometry()'s behavior.
> >
> > While I'm testing zoned btrfs on ZNS drive with 2GB zone size, I hit
> > the following ASSERT in btrfs_submit_direct() by running fstests
> > btrfs/017.
> >
> > static blk_qc_t btrfs_submit_direct(struct inode *inode, struct iomap *iomap,
> > struct bio *dio_bio, loff_t file_offset)
> > {
> > ...
> > start_sector = dio_bio->bi_iter.bi_sector;
> > submit_len = dio_bio->bi_iter.bi_size;
> >
> > do {
> > logical = start_sector << 9;
> > em = btrfs_get_chunk_map(fs_info, logical, submit_len);
> > ...
> > ret = btrfs_get_io_geometry(fs_info, em, btrfs_op(dio_bio),
> > logical, submit_len, &geom);
> > ...
> > ASSERT(geom.len <= INT_MAX);
> >
> > clone_len = min_t(int, submit_len, geom.len);
> > ...
> > bio = btrfs_bio_clone_partial(dio_bio, clone_offset, clone_len);
> >
> >
> > On zoned btrfs, we create a SINGLE block group whose size is equal to
> > the device zone size, so we have a 2 GB SINGLE block group on a 2 GB
> > zone size drive. Then, on a SINGLE single block group,
> > btrfs_io_geometry() returns the remaining length from @logical to the
> > end of the block group regardless of the @len argument. Thus, with
> > @logical == 0, we get geom->len = 2 GB, which is larger than INT_MAX,
> > hitting the ASSERT.
> >
> > I'm confusing because I'm not sure what the ASSERT wants to do. It
> > might want to guard btrfs_bio_clone_partial() (and bio_trim()) below?
> > But, since bio_trim() takes sector-based values, and the passed
> > "clone_offset" and "clone_len" is byte-based, we can technically allow
> > larger bytes than INT_MAX. (well, we might never build such large bio,
> > though). And, it looks meaningless to check geom->len here. Since, it
> > can be much larger than bio size on a SINGLE block group.
> >
> > So, in summary, below are my questions.
> >
> > 1. About btrfs_bio_clone_partial()
> > 1.1 What is the meaning of geom->len?
> > - Length from @logical to the stripe boundary? or
> > - Length [logical, logical+len] can go without crossing the boundary?
>
> So think of it as splitting a block group into 64k chunks (those are the
> stripes) then finding in which such 64k chunk/stripe_nr logical falls
> within, so len would be the length of an io you could do without
> crossing this 64k boundary. This process is described in
> https://github.com/btrfs/btrfs-dev-docs/blob/master/bio-submission.txt
> under section 'Bio mapping'. With this in mind I'd say it's [logical,
> logical+len] without crossing a 64k boundary. !!!HOWEVER!!!! This
> applies to bg types other than SINGLE , that's the
> if (map->type & BTRFS_BLOCK_GROUP_PROFILE_MASK) check in the function.
> We do this because for raid levels we'd like the io granularity to be
> 64k, whereas for SINGLE we simply issue a single write, i.e len is
> calculated as:
>
> len = em->len - offset;
>
> which is why you are getting such the size of the block group.
OK, it's the current behavior. So, we should expect geom->len vary
the meaning depending on the block group profile type?
> > 1.2 @len is currently unused in btrfs_bio_clone_partial(), is this correct?
>
> It's used to calculate @clone_len parameter so it's indirectly involved
> in btrfs_bio_clone_partial.
Oops, I made a mistake here. I meant btrfs_get_io_geometry() here. The
same for the bullet "1." above. In btrfs_get_io_geometry(), we don't
use @len.
> > 1.3 What should we fill into geom->len when the block group is SINGLE profile?
>
> See above.
>
> >
> > 2. About the ASSERT
> > 2.1 Shouldn't we check submit_len (= bio's length) instead of geom->len ?
>
> The assert got added quite a while back so I'm not entirely sure it's
> pertinent.
>
> > 2.2 Can't it be larger than INT_MAX? e.g., INT_MAX << SECTOR_SHIFT?
> >
> > 3. About btrfs_bio_clone_partial()
> > 3.1 We can change "int" to "u64" maybe?
>
> Can we though? So if size in btrfs_bio_clone_partial is switched to u64
> then in that fuction we do size >> 9 potentially having a value of at
> most 55 bits, passing it to bio_trim, which in turn takes an int. So
> you'd be truncating values at the time you call bio_trim, no ?
Yes, but we can still safely allow the value up to "INT_MAX << 9",
can't we? We may need to add another ASSERT here in this case...
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2021-05-20 7:25 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-20 6:11 Behavior of btrfs_io_geometry() Naohiro Aota
2021-05-20 6:58 ` Nikolay Borisov
2021-05-20 7:25 ` Naohiro Aota
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.