linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* block: wrong return value by bio_end_sector?
@ 2022-09-30 15:59 Paolo Valente
  2022-09-30 16:35 ` Jens Axboe
  2022-10-01  0:50 ` Damien Le Moal
  0 siblings, 2 replies; 6+ messages in thread
From: Paolo Valente @ 2022-09-30 15:59 UTC (permalink / raw)
  To: Arie van der Hoeven, Tyler Erickson, Muhammad Ahmad, linux-block,
	linux-kernel, Jan Kara, andrea.righi, glen.valante, axboe,
	Michael English, Andrew Ring, Varun Boddu, Damien Le Moal

Hi Jens, Damien, all other possibly interested people,
this is to raise attention on a mistake that has emerged in a
thread on a bfq extension for multi-actuary drives [1].

The mistake is apparently in the macro bio_end_sector (defined in
include/linux/bio.h), which seems to be translated (incorrectly) as
sector+size, and not as sector+size-1.

For your convenience, I'm pasting a detailed description of the
problem, by Tyler (description taken from the above thread [1]).

The drive reports the actuator ranges as a starting LBA and a count of LBAs for the range.
If the code reading the reported values simply does startingLBA + range, this is an incorrect ending LBA for that actuator. This is because LBAs are zero indexed and this simple addition is not taking that into account.
The proper way to get the endingLBA is startingLBA + range - 1 to get the last LBA value for where to issue a final IO read/write to account for LBA values starting at zero rather than one.

Here is an example from the output in SeaChest/openSeaChest:
====Concurrent Positioning Ranges====

Range#     #Elements            Lowest LBA          # of LBAs      
  0            1                                               0           17578328064
  1            1                         17578328064           17578328064

If using the incorrect formula to get the final LBA for actuator 0, you would get 17578328064, but this is the starting LBA reported by the drive for actuator 1.
So to be consistent for all ranges, the final LBA for a given actuator should be calculated as starting LBA + range - 1.

I had reached out to Seagate's T10 and T13 representatives for clarification and verification and this is most likely what is causing the error is a missing - 1 somewhere after getting the information reported by the device. They agreed that the reporting from the drive and the SCSI to ATA translation is correct.

I'm not sure where this is being read and calculated, but it is not an error in the low-level libata or sd level of the kernel. It may be in bfq, or it may be in some other place after the sd layer. I know there were some additions to read this and report it up the stack, but I did not think those were wrong as they seemed to pass the drive reported information up the stack.

Jens, Damien, can you shed a light on this?

Thanks,
Paolo

[1] https://www.spinics.net/lists/kernel/msg4507408.html

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: block: wrong return value by bio_end_sector?
  2022-09-30 15:59 block: wrong return value by bio_end_sector? Paolo Valente
@ 2022-09-30 16:35 ` Jens Axboe
  2022-10-01  0:50 ` Damien Le Moal
  1 sibling, 0 replies; 6+ messages in thread
From: Jens Axboe @ 2022-09-30 16:35 UTC (permalink / raw)
  To: Paolo Valente, Arie van der Hoeven, Tyler Erickson,
	Muhammad Ahmad, linux-block, linux-kernel, Jan Kara,
	andrea.righi, glen.valante, Michael English, Andrew Ring,
	Varun Boddu, Damien Le Moal

On 9/30/22 9:59 AM, Paolo Valente wrote:
> Hi Jens, Damien, all other possibly interested people,
> this is to raise attention on a mistake that has emerged in a
> thread on a bfq extension for multi-actuary drives [1].
> 
> The mistake is apparently in the macro bio_end_sector (defined in
> include/linux/bio.h), which seems to be translated (incorrectly) as
> sector+size, and not as sector+size-1.
> 
> For your convenience, I'm pasting a detailed description of the
> problem, by Tyler (description taken from the above thread [1]).

I'm a little confused - currently it returns non-inclusive end, the
proposed change just make it inclusive. In general in the kernel the
former is used, and this one follows that.

-- 
Jens Axboe



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: block: wrong return value by bio_end_sector?
  2022-09-30 15:59 block: wrong return value by bio_end_sector? Paolo Valente
  2022-09-30 16:35 ` Jens Axboe
@ 2022-10-01  0:50 ` Damien Le Moal
  2022-10-02 16:20   ` Paolo Valente
  1 sibling, 1 reply; 6+ messages in thread
From: Damien Le Moal @ 2022-10-01  0:50 UTC (permalink / raw)
  To: Paolo Valente, Arie van der Hoeven, Tyler Erickson,
	Muhammad Ahmad, linux-block, linux-kernel, Jan Kara,
	andrea.righi, glen.valante, axboe, Michael English, Andrew Ring,
	Varun Boddu, Damien Le Moal

On 10/1/22 00:59, Paolo Valente wrote:
> Hi Jens, Damien, all other possibly interested people, this is to raise
> attention on a mistake that has emerged in a thread on a bfq extension
> for multi-actuary drives [1].
> 
> The mistake is apparently in the macro bio_end_sector (defined in 
> include/linux/bio.h), which seems to be translated (incorrectly) as 
> sector+size, and not as sector+size-1.

This has been like this for a long time, I think.

> 
> For your convenience, I'm pasting a detailed description of the 
> problem, by Tyler (description taken from the above thread [1]).
> 
> The drive reports the actuator ranges as a starting LBA and a count of
> LBAs for the range. If the code reading the reported values simply does
> startingLBA + range, this is an incorrect ending LBA for that actuator.

Well, yes. LBA 0 + drive capacity is also an incorrect LBA. If the code
assumes that it is, you have a classic off-by-one bug.

> This is because LBAs are zero indexed and this simple addition is not
> taking that into account. The proper way to get the endingLBA is
> startingLBA + range - 1 to get the last LBA value for where to issue a
> final IO read/write to account for LBA values starting at zero rather
> than one.

Yes. And ? Where is the issue ?

> 
> Here is an example from the output in SeaChest/openSeaChest: 
> ====Concurrent Positioning Ranges====
> 
> Range#     #Elements            Lowest LBA          # of LBAs 0
> 1                                               0
> 17578328064 1            1                         17578328064
> 17578328064
> 
> If using the incorrect formula to get the final LBA for actuator 0, you
> would get 17578328064, but this is the starting LBA reported by the
> drive for actuator 1. So to be consistent for all ranges, the final LBA
> for a given actuator should be calculated as starting LBA + range - 1.
> 
> I had reached out to Seagate's T10 and T13 representatives for
> clarification and verification and this is most likely what is causing
> the error is a missing - 1 somewhere after getting the information
> reported by the device. They agreed that the reporting from the drive
> and the SCSI to ATA translation is correct.
> 
> I'm not sure where this is being read and calculated, but it is not an
> error in the low-level libata or sd level of the kernel. It may be in
> bfq, or it may be in some other place after the sd layer. I know there
> were some additions to read this and report it up the stack, but I did
> not think those were wrong as they seemed to pass the drive reported
> information up the stack.
> 
> Jens, Damien, can you shed a light on this?

I am not clear on what the problem is exactly. This all sound like a
simple off-by-one issue if bfq support code. No ?

> 
> Thanks, Paolo
> 
> [1] https://www.spinics.net/lists/kernel/msg4507408.html

-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: block: wrong return value by bio_end_sector?
  2022-10-01  0:50 ` Damien Le Moal
@ 2022-10-02 16:20   ` Paolo Valente
  2022-10-02 22:33     ` Damien Le Moal
  0 siblings, 1 reply; 6+ messages in thread
From: Paolo Valente @ 2022-10-02 16:20 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: Arie van der Hoeven, Tyler Erickson, Muhammad Ahmad, linux-block,
	linux-kernel, Jan Kara, andrea.righi, glen.valante, axboe,
	Michael English, Andrew Ring, Varun Boddu, Damien Le Moal



> Il giorno 1 ott 2022, alle ore 02:50, Damien Le Moal <damien.lemoal@opensource.wdc.com> ha scritto:
> 
> On 10/1/22 00:59, Paolo Valente wrote:
>> Hi Jens, Damien, all other possibly interested people, this is to raise
>> attention on a mistake that has emerged in a thread on a bfq extension
>> for multi-actuary drives [1].
>> 
>> The mistake is apparently in the macro bio_end_sector (defined in 
>> include/linux/bio.h), which seems to be translated (incorrectly) as 
>> sector+size, and not as sector+size-1.
> 
> This has been like this for a long time, I think.
> 
>> 
>> For your convenience, I'm pasting a detailed description of the 
>> problem, by Tyler (description taken from the above thread [1]).
>> 
>> The drive reports the actuator ranges as a starting LBA and a count of
>> LBAs for the range. If the code reading the reported values simply does
>> startingLBA + range, this is an incorrect ending LBA for that actuator.
> 
> Well, yes. LBA 0 + drive capacity is also an incorrect LBA. If the code
> assumes that it is, you have a classic off-by-one bug.
> 
>> This is because LBAs are zero indexed and this simple addition is not
>> taking that into account. The proper way to get the endingLBA is
>> startingLBA + range - 1 to get the last LBA value for where to issue a
>> final IO read/write to account for LBA values starting at zero rather
>> than one.
> 
> Yes. And ? Where is the issue ?
> 
>> 
>> Here is an example from the output in SeaChest/openSeaChest: 
>> ====Concurrent Positioning Ranges====
>> 
>> Range#     #Elements            Lowest LBA          # of LBAs 0
>> 1                                               0
>> 17578328064 1            1                         17578328064
>> 17578328064
>> 
>> If using the incorrect formula to get the final LBA for actuator 0, you
>> would get 17578328064, but this is the starting LBA reported by the
>> drive for actuator 1. So to be consistent for all ranges, the final LBA
>> for a given actuator should be calculated as starting LBA + range - 1.
>> 
>> I had reached out to Seagate's T10 and T13 representatives for
>> clarification and verification and this is most likely what is causing
>> the error is a missing - 1 somewhere after getting the information
>> reported by the device. They agreed that the reporting from the drive
>> and the SCSI to ATA translation is correct.
>> 
>> I'm not sure where this is being read and calculated, but it is not an
>> error in the low-level libata or sd level of the kernel. It may be in
>> bfq, or it may be in some other place after the sd layer. I know there
>> were some additions to read this and report it up the stack, but I did
>> not think those were wrong as they seemed to pass the drive reported
>> information up the stack.
>> 
>> Jens, Damien, can you shed a light on this?
> 
> I am not clear on what the problem is exactly. This all sound like a
> simple off-by-one issue if bfq support code. No ?

Yes, it's (also) in bfq code now; no, it's not only in bfq code. 

The involved bfq code is a replica of the following original function
(living in block/blk-iaranges.c):

static struct blk_independent_access_range *
disk_find_ia_range(struct blk_independent_access_ranges *iars,
		  sector_t sector)
{
	struct blk_independent_access_range *iar;
	int i;

	for (i = 0; i < iars->nr_ia_ranges; i++) {
		iar = &iars->ia_range[i];
		if (sector >= iar->sector &&
		    sector < iar->sector + iar->nr_sectors)
			return iar;
	}

	return NULL;
}

bfq's replica simply contains also this warning, right before the return NULL:

	WARN_ONCE(true,
		  "bfq_actuator_index: bio sector out of ranges: end=%llu\n",
		  end);

So, if this warning is triggered, then the sector does not belong to
any range.

That warning gets actually triggered [1], for a sector number that is
larger than the largest extreme (iar->sector + iar->nr_sectors) in the
iar data structure.  The offending value resulted to be simply equal
to the largest possible value accepted by the iar data structure, plus
one.

So, yes, this is an off-by-one error.  More precisely, there is a
mismatch between the above code (or the values stored the iar data
structure) and the value of bio_end_sector (the latter is passed as
input to the above function).  bio_end_sector does not seem to give
the end sector of a request (equal to begin+size-1), as apparently
expected by the above code, but the sector after the last one (namely,
begin+size).

Should I express an opinion on where the error is, I would say that
the mistake is in bio_end_sector.  But I could be totally wrong, as
I'm not a expert of either of the involved parts.  In addition,
bio_end_sector is already in use, with its current formula, by other
parts of the block layer.  If you guys (Damien, Jens, Tyler?, ...)
give us some guidance on what to fix exactly, we will be happy to make
a fix.

Thanks,
Paolo



> 
>> 
>> Thanks, Paolo
>> 
>> [1] https://www.spinics.net/lists/kernel/msg4507408.html
> 
> -- 
> Damien Le Moal
> Western Digital Research


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: block: wrong return value by bio_end_sector?
  2022-10-02 16:20   ` Paolo Valente
@ 2022-10-02 22:33     ` Damien Le Moal
  2022-10-04  8:46       ` Paolo Valente
  0 siblings, 1 reply; 6+ messages in thread
From: Damien Le Moal @ 2022-10-02 22:33 UTC (permalink / raw)
  To: Paolo Valente, Damien Le Moal
  Cc: Arie van der Hoeven, Tyler Erickson, Muhammad Ahmad, linux-block,
	linux-kernel, Jan Kara, andrea.righi, glen.valante, axboe,
	Michael English, Andrew Ring, Varun Boddu

On 2022/10/03 1:20, Paolo Valente wrote:
> 
> 
>> Il giorno 1 ott 2022, alle ore 02:50, Damien Le Moal <damien.lemoal@opensource.wdc.com> ha scritto:
>>
>> On 10/1/22 00:59, Paolo Valente wrote:
>>> Hi Jens, Damien, all other possibly interested people, this is to raise
>>> attention on a mistake that has emerged in a thread on a bfq extension
>>> for multi-actuary drives [1].
>>>
>>> The mistake is apparently in the macro bio_end_sector (defined in 
>>> include/linux/bio.h), which seems to be translated (incorrectly) as 
>>> sector+size, and not as sector+size-1.
>>
>> This has been like this for a long time, I think.
>>
>>>
>>> For your convenience, I'm pasting a detailed description of the 
>>> problem, by Tyler (description taken from the above thread [1]).
>>>
>>> The drive reports the actuator ranges as a starting LBA and a count of
>>> LBAs for the range. If the code reading the reported values simply does
>>> startingLBA + range, this is an incorrect ending LBA for that actuator.
>>
>> Well, yes. LBA 0 + drive capacity is also an incorrect LBA. If the code
>> assumes that it is, you have a classic off-by-one bug.
>>
>>> This is because LBAs are zero indexed and this simple addition is not
>>> taking that into account. The proper way to get the endingLBA is
>>> startingLBA + range - 1 to get the last LBA value for where to issue a
>>> final IO read/write to account for LBA values starting at zero rather
>>> than one.
>>
>> Yes. And ? Where is the issue ?
>>
>>>
>>> Here is an example from the output in SeaChest/openSeaChest: 
>>> ====Concurrent Positioning Ranges====
>>>
>>> Range#     #Elements            Lowest LBA          # of LBAs 0
>>> 1                                               0
>>> 17578328064 1            1                         17578328064
>>> 17578328064
>>>
>>> If using the incorrect formula to get the final LBA for actuator 0, you
>>> would get 17578328064, but this is the starting LBA reported by the
>>> drive for actuator 1. So to be consistent for all ranges, the final LBA
>>> for a given actuator should be calculated as starting LBA + range - 1.
>>>
>>> I had reached out to Seagate's T10 and T13 representatives for
>>> clarification and verification and this is most likely what is causing
>>> the error is a missing - 1 somewhere after getting the information
>>> reported by the device. They agreed that the reporting from the drive
>>> and the SCSI to ATA translation is correct.
>>>
>>> I'm not sure where this is being read and calculated, but it is not an
>>> error in the low-level libata or sd level of the kernel. It may be in
>>> bfq, or it may be in some other place after the sd layer. I know there
>>> were some additions to read this and report it up the stack, but I did
>>> not think those were wrong as they seemed to pass the drive reported
>>> information up the stack.
>>>
>>> Jens, Damien, can you shed a light on this?
>>
>> I am not clear on what the problem is exactly. This all sound like a
>> simple off-by-one issue if bfq support code. No ?
> 
> Yes, it's (also) in bfq code now; no, it's not only in bfq code. 
> 
> The involved bfq code is a replica of the following original function
> (living in block/blk-iaranges.c):
> 
> static struct blk_independent_access_range *
> disk_find_ia_range(struct blk_independent_access_ranges *iars,
> 		  sector_t sector)
> {
> 	struct blk_independent_access_range *iar;
> 	int i;
> 
> 	for (i = 0; i < iars->nr_ia_ranges; i++) {
> 		iar = &iars->ia_range[i];
> 		if (sector >= iar->sector &&
> 		    sector < iar->sector + iar->nr_sectors)
> 			return iar;
> 	}
> 
> 	return NULL;
> }
> 
> bfq's replica simply contains also this warning, right before the return NULL:
> 
> 	WARN_ONCE(true,
> 		  "bfq_actuator_index: bio sector out of ranges: end=%llu\n",
> 		  end);
> 
> So, if this warning is triggered, then the sector does not belong to
> any range.
> 
> That warning gets actually triggered [1], for a sector number that is
> larger than the largest extreme (iar->sector + iar->nr_sectors) in the
> iar data structure.  The offending value resulted to be simply equal
> to the largest possible value accepted by the iar data structure, plus
> one.
> 
> So, yes, this is an off-by-one error.  More precisely, there is a
> mismatch between the above code (or the values stored the iar data
> structure) and the value of bio_end_sector (the latter is passed as
> input to the above function).  bio_end_sector does not seem to give
> the end sector of a request (equal to begin+size-1), as apparently
> expected by the above code, but the sector after the last one (namely,
> begin+size).
> 
> Should I express an opinion on where the error is, I would say that
> the mistake is in bio_end_sector.  But I could be totally wrong, as
> I'm not a expert of either of the involved parts.  In addition,
> bio_end_sector is already in use, with its current formula, by other
> parts of the block layer.  If you guys (Damien, Jens, Tyler?, ...)
> give us some guidance on what to fix exactly, we will be happy to make
> a fix.

I do not think there is any error/problem anywhere with bio_end_sector(). But
indeed, the name is slightly misleading as it doe not return the last sector
processed by the bio but the next one. The reason is that with this
implementation, bio_end_sector() always gives you the first sector for the next
bio with a multi-bio request.

When you need the last sector processed by the bio, you need to use
bio_end_sector(bio) - 1.

So to find the access range that served a completed bio, you simply need to call:

disk_find_ia_range(iars, bio_end_sector(bio) - 1);

You probably could add a couple of helper functions for getting an access range
from a bio sector or end sector. Something like:

static inline struct blk_independent_access_range *
__bio_sector_access_range(struct bio *bio, sector_t sector)
{
	struct gendisk *disk = bio->bi_bdev->bd_disk;

	if (!disk->ia_ranges)
		return NULL;

	iar = disk_find_ia_range(disk->ia_ranges, sector);
	WARN_ONCE(!iar,
		  "bfq_actuator_index: bio sector out of ranges: end=%llu\n",
		  end);

	return iar;
}

static inline struct blk_independent_access_range *
bio_sector_access_range(struct bio *bio)
{
	return __bio_sector_access_range(bio, bio_sector(bio));
}

static inline struct blk_independent_access_range *
bio_end_sector_access_range(struct bio *bio)
{
	return __bio_sector_access_range(bio, bio_end_sector(bio) - 1);
}

or similar. Then your code will be simpler and much less worries about of-by-one
bugs.

> 
> Thanks,
> Paolo
> 
> 
> 
>>
>>>
>>> Thanks, Paolo
>>>
>>> [1] https://www.spinics.net/lists/kernel/msg4507408.html
>>
>> -- 
>> Damien Le Moal
>> Western Digital Research
> 
> 

-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: block: wrong return value by bio_end_sector?
  2022-10-02 22:33     ` Damien Le Moal
@ 2022-10-04  8:46       ` Paolo Valente
  0 siblings, 0 replies; 6+ messages in thread
From: Paolo Valente @ 2022-10-04  8:46 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: Damien Le Moal, Arie van der Hoeven, Tyler Erickson,
	Muhammad Ahmad, linux-block, linux-kernel, Jan Kara,
	andrea.righi, glen.valante, axboe, Michael English, Andrew Ring,
	Varun Boddu



> Il giorno 3 ott 2022, alle ore 00:33, Damien Le Moal <Damien.LeMoal@wdc.com> ha scritto:
> 
> On 2022/10/03 1:20, Paolo Valente wrote:
>> 
>> 
>>> Il giorno 1 ott 2022, alle ore 02:50, Damien Le Moal <damien.lemoal@opensource.wdc.com> ha scritto:
>>> 
>>> On 10/1/22 00:59, Paolo Valente wrote:
>>>> Hi Jens, Damien, all other possibly interested people, this is to raise
>>>> attention on a mistake that has emerged in a thread on a bfq extension
>>>> for multi-actuary drives [1].
>>>> 
>>>> The mistake is apparently in the macro bio_end_sector (defined in 
>>>> include/linux/bio.h), which seems to be translated (incorrectly) as 
>>>> sector+size, and not as sector+size-1.
>>> 
>>> This has been like this for a long time, I think.
>>> 
>>>> 
>>>> For your convenience, I'm pasting a detailed description of the 
>>>> problem, by Tyler (description taken from the above thread [1]).
>>>> 
>>>> The drive reports the actuator ranges as a starting LBA and a count of
>>>> LBAs for the range. If the code reading the reported values simply does
>>>> startingLBA + range, this is an incorrect ending LBA for that actuator.
>>> 
>>> Well, yes. LBA 0 + drive capacity is also an incorrect LBA. If the code
>>> assumes that it is, you have a classic off-by-one bug.
>>> 
>>>> This is because LBAs are zero indexed and this simple addition is not
>>>> taking that into account. The proper way to get the endingLBA is
>>>> startingLBA + range - 1 to get the last LBA value for where to issue a
>>>> final IO read/write to account for LBA values starting at zero rather
>>>> than one.
>>> 
>>> Yes. And ? Where is the issue ?
>>> 
>>>> 
>>>> Here is an example from the output in SeaChest/openSeaChest: 
>>>> ====Concurrent Positioning Ranges====
>>>> 
>>>> Range#     #Elements            Lowest LBA          # of LBAs 0
>>>> 1                                               0
>>>> 17578328064 1            1                         17578328064
>>>> 17578328064
>>>> 
>>>> If using the incorrect formula to get the final LBA for actuator 0, you
>>>> would get 17578328064, but this is the starting LBA reported by the
>>>> drive for actuator 1. So to be consistent for all ranges, the final LBA
>>>> for a given actuator should be calculated as starting LBA + range - 1.
>>>> 
>>>> I had reached out to Seagate's T10 and T13 representatives for
>>>> clarification and verification and this is most likely what is causing
>>>> the error is a missing - 1 somewhere after getting the information
>>>> reported by the device. They agreed that the reporting from the drive
>>>> and the SCSI to ATA translation is correct.
>>>> 
>>>> I'm not sure where this is being read and calculated, but it is not an
>>>> error in the low-level libata or sd level of the kernel. It may be in
>>>> bfq, or it may be in some other place after the sd layer. I know there
>>>> were some additions to read this and report it up the stack, but I did
>>>> not think those were wrong as they seemed to pass the drive reported
>>>> information up the stack.
>>>> 
>>>> Jens, Damien, can you shed a light on this?
>>> 
>>> I am not clear on what the problem is exactly. This all sound like a
>>> simple off-by-one issue if bfq support code. No ?
>> 
>> Yes, it's (also) in bfq code now; no, it's not only in bfq code. 
>> 
>> The involved bfq code is a replica of the following original function
>> (living in block/blk-iaranges.c):
>> 
>> static struct blk_independent_access_range *
>> disk_find_ia_range(struct blk_independent_access_ranges *iars,
>> 		  sector_t sector)
>> {
>> 	struct blk_independent_access_range *iar;
>> 	int i;
>> 
>> 	for (i = 0; i < iars->nr_ia_ranges; i++) {
>> 		iar = &iars->ia_range[i];
>> 		if (sector >= iar->sector &&
>> 		    sector < iar->sector + iar->nr_sectors)
>> 			return iar;
>> 	}
>> 
>> 	return NULL;
>> }
>> 
>> bfq's replica simply contains also this warning, right before the return NULL:
>> 
>> 	WARN_ONCE(true,
>> 		  "bfq_actuator_index: bio sector out of ranges: end=%llu\n",
>> 		  end);
>> 
>> So, if this warning is triggered, then the sector does not belong to
>> any range.
>> 
>> That warning gets actually triggered [1], for a sector number that is
>> larger than the largest extreme (iar->sector + iar->nr_sectors) in the
>> iar data structure.  The offending value resulted to be simply equal
>> to the largest possible value accepted by the iar data structure, plus
>> one.
>> 
>> So, yes, this is an off-by-one error.  More precisely, there is a
>> mismatch between the above code (or the values stored the iar data
>> structure) and the value of bio_end_sector (the latter is passed as
>> input to the above function).  bio_end_sector does not seem to give
>> the end sector of a request (equal to begin+size-1), as apparently
>> expected by the above code, but the sector after the last one (namely,
>> begin+size).
>> 
>> Should I express an opinion on where the error is, I would say that
>> the mistake is in bio_end_sector.  But I could be totally wrong, as
>> I'm not a expert of either of the involved parts.  In addition,
>> bio_end_sector is already in use, with its current formula, by other
>> parts of the block layer.  If you guys (Damien, Jens, Tyler?, ...)
>> give us some guidance on what to fix exactly, we will be happy to make
>> a fix.
> 
> I do not think there is any error/problem anywhere with bio_end_sector(). But
> indeed, the name is slightly misleading as it doe not return the last sector
> processed by the bio but the next one. The reason is that with this
> implementation, bio_end_sector() always gives you the first sector for the next
> bio with a multi-bio request.
> 
> When you need the last sector processed by the bio, you need to use
> bio_end_sector(bio) - 1.
> 
> So to find the access range that served a completed bio, you simply need to call:
> 
> disk_find_ia_range(iars, bio_end_sector(bio) - 1);
> 

Thank you very much, I got it now.

> You probably could add a couple of helper functions for getting an access range
> from a bio sector or end sector. Something like:
> 
> static inline struct blk_independent_access_range *
> __bio_sector_access_range(struct bio *bio, sector_t sector)
> {
> 	struct gendisk *disk = bio->bi_bdev->bd_disk;
> 
> 	if (!disk->ia_ranges)
> 		return NULL;
> 
> 	iar = disk_find_ia_range(disk->ia_ranges, sector);
> 	WARN_ONCE(!iar,
> 		  "bfq_actuator_index: bio sector out of ranges: end=%llu\n",
> 		  end);
> 
> 	return iar;
> }
> 
> static inline struct blk_independent_access_range *
> bio_sector_access_range(struct bio *bio)
> {
> 	return __bio_sector_access_range(bio, bio_sector(bio));
> }
> 
> static inline struct blk_independent_access_range *
> bio_end_sector_access_range(struct bio *bio)
> {
> 	return __bio_sector_access_range(bio, bio_end_sector(bio) - 1);
> }
> 
> or similar. Then your code will be simpler and much less worries about of-by-one
> bugs.
> 

For the moment, bio_end_sector is invoked in only one place (for the
range-computation stuff).  So, for simplicity, I'd go for just
appending a -1 after such invocation.  To give you credits, I'm adding
you in a reviewed-by tag.  If anything of this is wrong, I'm willing
to change it.  I'll send a V2 soon.

Thanks,
Paolo

>> 
>> Thanks,
>> Paolo
>> 
>> 
>> 
>>> 
>>>> 
>>>> Thanks, Paolo
>>>> 
>>>> [1] https://www.spinics.net/lists/kernel/msg4507408.html
>>> 
>>> -- 
>>> Damien Le Moal
>>> Western Digital Research
>> 
>> 
> 
> -- 
> Damien Le Moal
> Western Digital Research


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2022-10-04  8:46 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-09-30 15:59 block: wrong return value by bio_end_sector? Paolo Valente
2022-09-30 16:35 ` Jens Axboe
2022-10-01  0:50 ` Damien Le Moal
2022-10-02 16:20   ` Paolo Valente
2022-10-02 22:33     ` Damien Le Moal
2022-10-04  8:46       ` Paolo Valente

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).