All of lore.kernel.org
 help / color / mirror / Atom feed
* Behavior of btrfs_io_geometry()
@ 2021-05-20  6:11 Naohiro Aota
  2021-05-20  6:58 ` Nikolay Borisov
  0 siblings, 1 reply; 3+ messages in thread
From: Naohiro Aota @ 2021-05-20  6:11 UTC (permalink / raw)
  To: linux-btrfs

I have a few questions about btrfs_io_geometry()'s behavior. 

While I'm testing zoned btrfs on ZNS drive with 2GB zone size, I hit
the following ASSERT in btrfs_submit_direct() by running fstests
btrfs/017.

static blk_qc_t btrfs_submit_direct(struct inode *inode, struct iomap *iomap,
		struct bio *dio_bio, loff_t file_offset)
{
...
	start_sector = dio_bio->bi_iter.bi_sector;
	submit_len = dio_bio->bi_iter.bi_size;

	do {
		logical = start_sector << 9;
		em = btrfs_get_chunk_map(fs_info, logical, submit_len);
...
		ret = btrfs_get_io_geometry(fs_info, em, btrfs_op(dio_bio),
					    logical, submit_len, &geom);
...
		ASSERT(geom.len <= INT_MAX);

		clone_len = min_t(int, submit_len, geom.len);
...
		bio = btrfs_bio_clone_partial(dio_bio, clone_offset, clone_len);


On zoned btrfs, we create a SINGLE block group whose size is equal to
the device zone size, so we have a 2 GB SINGLE block group on a 2 GB
zone size drive. Then, on a SINGLE single block group,
btrfs_io_geometry() returns the remaining length from @logical to the
end of the block group regardless of the @len argument. Thus, with
@logical == 0, we get geom->len = 2 GB, which is larger than INT_MAX,
hitting the ASSERT.

I'm confusing because I'm not sure what the ASSERT wants to do. It
might want to guard btrfs_bio_clone_partial() (and bio_trim()) below?
But, since bio_trim() takes sector-based values, and the passed
"clone_offset" and "clone_len" is byte-based, we can technically allow
larger bytes than INT_MAX. (well, we might never build such large bio,
though). And, it looks meaningless to check geom->len here. Since, it
can be much larger than bio size on a SINGLE block group.

So, in summary, below are my questions.

1. About btrfs_bio_clone_partial()
  1.1 What is the meaning of geom->len?
      - Length from @logical to the stripe boundary? or
      - Length [logical, logical+len] can go without crossing the boundary?
  1.2 @len is currently unused in btrfs_bio_clone_partial(), is this correct?
  1.3 What should we fill into geom->len when the block group is SINGLE profile?

2. About the ASSERT
  2.1 Shouldn't we check submit_len (= bio's length) instead of geom->len ?
  2.2 Can't it be larger than INT_MAX? e.g., INT_MAX << SECTOR_SHIFT?

3. About btrfs_bio_clone_partial()
  3.1 We can change "int" to "u64" maybe?


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: Behavior of btrfs_io_geometry()
  2021-05-20  6:11 Behavior of btrfs_io_geometry() Naohiro Aota
@ 2021-05-20  6:58 ` Nikolay Borisov
  2021-05-20  7:25   ` Naohiro Aota
  0 siblings, 1 reply; 3+ messages in thread
From: Nikolay Borisov @ 2021-05-20  6:58 UTC (permalink / raw)
  To: Naohiro Aota, linux-btrfs



On 20.05.21 г. 9:11, Naohiro Aota wrote:
> I have a few questions about btrfs_io_geometry()'s behavior. 
> 
> While I'm testing zoned btrfs on ZNS drive with 2GB zone size, I hit
> the following ASSERT in btrfs_submit_direct() by running fstests
> btrfs/017.
> 
> static blk_qc_t btrfs_submit_direct(struct inode *inode, struct iomap *iomap,
> 		struct bio *dio_bio, loff_t file_offset)
> {
> ...
> 	start_sector = dio_bio->bi_iter.bi_sector;
> 	submit_len = dio_bio->bi_iter.bi_size;
> 
> 	do {
> 		logical = start_sector << 9;
> 		em = btrfs_get_chunk_map(fs_info, logical, submit_len);
> ...
> 		ret = btrfs_get_io_geometry(fs_info, em, btrfs_op(dio_bio),
> 					    logical, submit_len, &geom);
> ...
> 		ASSERT(geom.len <= INT_MAX);
> 
> 		clone_len = min_t(int, submit_len, geom.len);
> ...
> 		bio = btrfs_bio_clone_partial(dio_bio, clone_offset, clone_len);
> 
> 
> On zoned btrfs, we create a SINGLE block group whose size is equal to
> the device zone size, so we have a 2 GB SINGLE block group on a 2 GB
> zone size drive. Then, on a SINGLE single block group,
> btrfs_io_geometry() returns the remaining length from @logical to the
> end of the block group regardless of the @len argument. Thus, with
> @logical == 0, we get geom->len = 2 GB, which is larger than INT_MAX,
> hitting the ASSERT.
> 
> I'm confusing because I'm not sure what the ASSERT wants to do. It
> might want to guard btrfs_bio_clone_partial() (and bio_trim()) below?
> But, since bio_trim() takes sector-based values, and the passed
> "clone_offset" and "clone_len" is byte-based, we can technically allow
> larger bytes than INT_MAX. (well, we might never build such large bio,
> though). And, it looks meaningless to check geom->len here. Since, it
> can be much larger than bio size on a SINGLE block group.
> 
> So, in summary, below are my questions.
> 
> 1. About btrfs_bio_clone_partial()
>   1.1 What is the meaning of geom->len?
>       - Length from @logical to the stripe boundary? or
>       - Length [logical, logical+len] can go without crossing the boundary?

So think of it as splitting a block group into 64k chunks (those are the
stripes) then finding in which such 64k chunk/stripe_nr logical falls
within, so len would be the length of an io you could do without
crossing this 64k boundary. This process is described in
https://github.com/btrfs/btrfs-dev-docs/blob/master/bio-submission.txt
under section 'Bio mapping'. With this in mind I'd say it's  [logical,
logical+len] without crossing a 64k boundary. !!!HOWEVER!!!!  This
applies to bg types other than SINGLE , that's the
if (map->type & BTRFS_BLOCK_GROUP_PROFILE_MASK) check in the function.
We do this because for raid levels we'd like the io granularity to be
64k, whereas for SINGLE we simply issue a single write, i.e len is
calculated as:

len = em->len - offset;

which is why you are getting such the size of the block group.



>   1.2 @len is currently unused in btrfs_bio_clone_partial(), is this correct?

It's used to calculate @clone_len parameter so it's indirectly involved
in btrfs_bio_clone_partial.

>   1.3 What should we fill into geom->len when the block group is SINGLE profile?

See above.

> 
> 2. About the ASSERT
>   2.1 Shouldn't we check submit_len (= bio's length) instead of geom->len ?

The assert got added quite a while back so I'm not entirely sure it's
pertinent.

>   2.2 Can't it be larger than INT_MAX? e.g., INT_MAX << SECTOR_SHIFT?
> 
> 3. About btrfs_bio_clone_partial()
>   3.1 We can change "int" to "u64" maybe?

Can we though? So if size in btrfs_bio_clone_partial is switched to u64
then in that fuction we do size >> 9 potentially having a value of at
most 55 bits, passing it to bio_trim, which in turn takes an int. So
you'd be truncating values at the time you call bio_trim, no ?

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: Behavior of btrfs_io_geometry()
  2021-05-20  6:58 ` Nikolay Borisov
@ 2021-05-20  7:25   ` Naohiro Aota
  0 siblings, 0 replies; 3+ messages in thread
From: Naohiro Aota @ 2021-05-20  7:25 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: linux-btrfs

Thanks for the comments.

On Thu, May 20, 2021 at 09:58:28AM +0300, Nikolay Borisov wrote:
> 
> 
> On 20.05.21 г. 9:11, Naohiro Aota wrote:
> > I have a few questions about btrfs_io_geometry()'s behavior. 
> > 
> > While I'm testing zoned btrfs on ZNS drive with 2GB zone size, I hit
> > the following ASSERT in btrfs_submit_direct() by running fstests
> > btrfs/017.
> > 
> > static blk_qc_t btrfs_submit_direct(struct inode *inode, struct iomap *iomap,
> > 		struct bio *dio_bio, loff_t file_offset)
> > {
> > ...
> > 	start_sector = dio_bio->bi_iter.bi_sector;
> > 	submit_len = dio_bio->bi_iter.bi_size;
> > 
> > 	do {
> > 		logical = start_sector << 9;
> > 		em = btrfs_get_chunk_map(fs_info, logical, submit_len);
> > ...
> > 		ret = btrfs_get_io_geometry(fs_info, em, btrfs_op(dio_bio),
> > 					    logical, submit_len, &geom);
> > ...
> > 		ASSERT(geom.len <= INT_MAX);
> > 
> > 		clone_len = min_t(int, submit_len, geom.len);
> > ...
> > 		bio = btrfs_bio_clone_partial(dio_bio, clone_offset, clone_len);
> > 
> > 
> > On zoned btrfs, we create a SINGLE block group whose size is equal to
> > the device zone size, so we have a 2 GB SINGLE block group on a 2 GB
> > zone size drive. Then, on a SINGLE single block group,
> > btrfs_io_geometry() returns the remaining length from @logical to the
> > end of the block group regardless of the @len argument. Thus, with
> > @logical == 0, we get geom->len = 2 GB, which is larger than INT_MAX,
> > hitting the ASSERT.
> > 
> > I'm confusing because I'm not sure what the ASSERT wants to do. It
> > might want to guard btrfs_bio_clone_partial() (and bio_trim()) below?
> > But, since bio_trim() takes sector-based values, and the passed
> > "clone_offset" and "clone_len" is byte-based, we can technically allow
> > larger bytes than INT_MAX. (well, we might never build such large bio,
> > though). And, it looks meaningless to check geom->len here. Since, it
> > can be much larger than bio size on a SINGLE block group.
> > 
> > So, in summary, below are my questions.
> > 
> > 1. About btrfs_bio_clone_partial()
> >   1.1 What is the meaning of geom->len?
> >       - Length from @logical to the stripe boundary? or
> >       - Length [logical, logical+len] can go without crossing the boundary?
> 
> So think of it as splitting a block group into 64k chunks (those are the
> stripes) then finding in which such 64k chunk/stripe_nr logical falls
> within, so len would be the length of an io you could do without
> crossing this 64k boundary. This process is described in
> https://github.com/btrfs/btrfs-dev-docs/blob/master/bio-submission.txt
> under section 'Bio mapping'. With this in mind I'd say it's  [logical,
> logical+len] without crossing a 64k boundary. !!!HOWEVER!!!!  This
> applies to bg types other than SINGLE , that's the
> if (map->type & BTRFS_BLOCK_GROUP_PROFILE_MASK) check in the function.
> We do this because for raid levels we'd like the io granularity to be
> 64k, whereas for SINGLE we simply issue a single write, i.e len is
> calculated as:
> 
> len = em->len - offset;
> 
> which is why you are getting such the size of the block group.

OK, it's the current behavior. So, we should expect geom->len vary
the meaning depending on the block group profile type?

> >   1.2 @len is currently unused in btrfs_bio_clone_partial(), is this correct?
> 
> It's used to calculate @clone_len parameter so it's indirectly involved
> in btrfs_bio_clone_partial.

Oops, I made a mistake here. I meant btrfs_get_io_geometry() here. The
same for the bullet "1." above. In btrfs_get_io_geometry(), we don't
use @len.

> >   1.3 What should we fill into geom->len when the block group is SINGLE profile?
> 
> See above.
> 
> > 
> > 2. About the ASSERT
> >   2.1 Shouldn't we check submit_len (= bio's length) instead of geom->len ?
> 
> The assert got added quite a while back so I'm not entirely sure it's
> pertinent.
> 
> >   2.2 Can't it be larger than INT_MAX? e.g., INT_MAX << SECTOR_SHIFT?
> > 
> > 3. About btrfs_bio_clone_partial()
> >   3.1 We can change "int" to "u64" maybe?
> 
> Can we though? So if size in btrfs_bio_clone_partial is switched to u64
> then in that fuction we do size >> 9 potentially having a value of at
> most 55 bits, passing it to bio_trim, which in turn takes an int. So
> you'd be truncating values at the time you call bio_trim, no ?

Yes, but we can still safely allow the value up to "INT_MAX << 9",
can't we? We may need to add another ASSERT here in this case...

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2021-05-20  7:25 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-20  6:11 Behavior of btrfs_io_geometry() Naohiro Aota
2021-05-20  6:58 ` Nikolay Borisov
2021-05-20  7:25   ` Naohiro Aota

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.