* [PATCH 0/1] t10-pi bio split fix
@ 2021-12-20 13:44 Alexey Lyashkov
2021-12-20 13:44 ` [PATCH 1/1] t10-pi: fix intergrity iterator Alexey Lyashkov
2021-12-20 16:29 ` [PATCH 0/1] t10-pi bio split fix Martin K. Petersen
0 siblings, 2 replies; 8+ messages in thread
From: Alexey Lyashkov @ 2021-12-20 13:44 UTC (permalink / raw)
To: Jens Axboe, Bart Van Assche, fio
Cc: Damien Le Moal, Dmitry Fomichev, Alexey Lyashkov
Hello All,
During HPe testing Lustre team discovered bio incorrectly splits with
integrity info attached.
Some research pointed me to the two fixes related to my problem,
first was
commit f36ea50ca0043e7b1204feaf1d2ba6bd68c08d36
Author: Wen Xiong <wenxiong@xxxxxxxxxxxxxxxxxx>
Date: Wed May 10 08:54:11 2017 -0500
blk-mq: NVMe 512B/4K+T10 DIF/DIX format returns I/O error on dd with split op
When formatting NVMe to 512B/4K + T10 DIf/DIX, dd with split op returns
"Input/output error". Looks block layer split the bio after calling
bio_integrity_prep(bio). This patch fixes the issue.
and second was
commit e4dc9a4c31fe10d1751c542702afc85be8a5c56a
Author: Israel Rukshin <israelr@xxxxxxxxxxxx>
Date: Wed Dec 11 17:36:02 2019 +0200
scsi: target/iblock: Fix protection error with blocks greater than 512B
…
Block io trace pointed me to the three functions called in my case, it is
t10_pi_generate, bio_integrity_advance, t10_pi_type1_prepare.
Looking in code - t10_pi_generate generate a ref_tag based on “virtual” block
number (512b base), and t10_pi_type1_prepare - converts this data to the the
real ref_tag (aka device block number), but sometimes it’s don’t mapped.
Looking to the
void bio_integrity_advance(struct bio *bio, unsigned int bytes_done)
{
struct bio_integrity_payload *bip = bio_integrity(bio);
struct blk_integrity *bi = blk_get_integrity(bio->bi_disk);
unsigned bytes = bio_integrity_bytes(bi, bytes_done >> 9);
bip->bip_iter.bi_sector += bytes_done >> 9;
bvec_iter_advance(bip->bip_vec, &bip->bip_iter, bytes);
}
it have shit an iterator in the 512b block size base it looks right, but wait…
static blk_status_t t10_pi_generate(struct blk_integrity_iter *iter,
csum_fn *fn, enum t10_dif_type type)
{
unsigned int i;
for (i = 0 ; i < iter->data_size ; i += iter->interval) {
struct t10_pi_tuple *pi = iter->prot_buf;
iter->seed++; <<<
}
return BLK_STS_OK;
}
t10_pi_generate / t10_pi_type1_prepare have just a increment by “1” for
the integrity internal which is 4k in my case,
so any bio_integrity_advance call will be move an iterator outside of
generated sequence and t10_pi_type1_prepare can’t be found a good virtual
sector for the mapping.
Changing an increment by “1” to be related to the real integrity size
solve a problem completely.
Attached patch passed my own testing on raid0 with 4k block size.
Alexey Lyashkov (1):
t10-pi: fix intergrity iterator
block/t10-pi.c | 14 ++++++++++----
1 file changed, 10 insertions(+), 4 deletions(-)
--
2.27.0
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH 1/1] t10-pi: fix intergrity iterator
2021-12-20 13:44 [PATCH 0/1] t10-pi bio split fix Alexey Lyashkov
@ 2021-12-20 13:44 ` Alexey Lyashkov
2021-12-20 16:29 ` [PATCH 0/1] t10-pi bio split fix Martin K. Petersen
1 sibling, 0 replies; 8+ messages in thread
From: Alexey Lyashkov @ 2021-12-20 13:44 UTC (permalink / raw)
To: Jens Axboe, Bart Van Assche, fio
Cc: Damien Le Moal, Dmitry Fomichev, Alexey Lyashkov
Integrity iterator should counted in the SECTOR_SIZE units,
not in the interity intervals.
It make bio_integrity_advance to be happy on bio_split.
Signed-off-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
---
block/t10-pi.c | 14 ++++++++++----
1 file changed, 10 insertions(+), 4 deletions(-)
diff --git a/block/t10-pi.c b/block/t10-pi.c
index 25a52a2a09a8..bbdecead214e 100644
--- a/block/t10-pi.c
+++ b/block/t10-pi.c
@@ -31,6 +31,7 @@ static blk_status_t t10_pi_generate(struct blk_integrity_iter *iter,
csum_fn *fn, enum t10_dif_type type)
{
unsigned int i;
+ unsigned int pi_size = iter->interval >> SECTOR_SHIFT;
for (i = 0 ; i < iter->data_size ; i += iter->interval) {
struct t10_pi_tuple *pi = iter->prot_buf;
@@ -45,7 +46,7 @@ static blk_status_t t10_pi_generate(struct blk_integrity_iter *iter,
iter->data_buf += iter->interval;
iter->prot_buf += sizeof(struct t10_pi_tuple);
- iter->seed++;
+ iter->seed+= pi_size;
}
return BLK_STS_OK;
@@ -55,6 +56,7 @@ static blk_status_t t10_pi_verify(struct blk_integrity_iter *iter,
csum_fn *fn, enum t10_dif_type type)
{
unsigned int i;
+ unsigned int pi_size = iter->interval >> SECTOR_SHIFT;
BUG_ON(type == T10_PI_TYPE0_PROTECTION);
@@ -94,7 +96,7 @@ static blk_status_t t10_pi_verify(struct blk_integrity_iter *iter,
next:
iter->data_buf += iter->interval;
iter->prot_buf += sizeof(struct t10_pi_tuple);
- iter->seed++;
+ iter->seed += pi_size;
}
return BLK_STS_OK;
@@ -135,6 +137,7 @@ static void t10_pi_type1_prepare(struct request *rq)
const int tuple_sz = rq->q->integrity.tuple_size;
u32 ref_tag = t10_pi_ref_tag(rq);
struct bio *bio;
+ unsigned int pi_size = 1 << (rq->q->integrity.interval_exp - SECTOR_SHIFT);
__rq_for_each_bio(bio, rq) {
struct bio_integrity_payload *bip = bio_integrity(bio);
@@ -156,7 +159,8 @@ static void t10_pi_type1_prepare(struct request *rq)
if (be32_to_cpu(pi->ref_tag) == virt)
pi->ref_tag = cpu_to_be32(ref_tag);
- virt++;
+
+ virt += pi_size;
ref_tag++;
p += tuple_sz;
}
@@ -185,6 +189,7 @@ static void t10_pi_type1_complete(struct request *rq, unsigned int nr_bytes)
const int tuple_sz = rq->q->integrity.tuple_size;
u32 ref_tag = t10_pi_ref_tag(rq);
struct bio *bio;
+ unsigned int pi_size = 1 << (rq->q->integrity.interval_exp - SECTOR_SHIFT);
__rq_for_each_bio(bio, rq) {
struct bio_integrity_payload *bip = bio_integrity(bio);
@@ -202,7 +207,8 @@ static void t10_pi_type1_complete(struct request *rq, unsigned int nr_bytes)
if (be32_to_cpu(pi->ref_tag) == ref_tag)
pi->ref_tag = cpu_to_be32(virt);
- virt++;
+
+ virt+= pi_size;
ref_tag++;
intervals--;
p += tuple_sz;
--
2.27.0
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH 0/1] t10-pi bio split fix
2021-12-20 13:44 [PATCH 0/1] t10-pi bio split fix Alexey Lyashkov
2021-12-20 13:44 ` [PATCH 1/1] t10-pi: fix intergrity iterator Alexey Lyashkov
@ 2021-12-20 16:29 ` Martin K. Petersen
2021-12-24 12:16 ` Lyashkov, Alexey
1 sibling, 1 reply; 8+ messages in thread
From: Martin K. Petersen @ 2021-12-20 16:29 UTC (permalink / raw)
To: Alexey Lyashkov
Cc: Jens Axboe, Bart Van Assche, fio, Damien Le Moal, Dmitry Fomichev
Alexey,
> t10_pi_generate / t10_pi_type1_prepare have just a increment by “1” for
> the integrity internal which is 4k in my case,
> so any bio_integrity_advance call will be move an iterator outside of
> generated sequence and t10_pi_type1_prepare can’t be found a good virtual
> sector for the mapping.
> Changing an increment by “1” to be related to the real integrity size
> solve a problem completely.
By definition the T10 PI reference tag is incremented by one per
interval (typically the logical block size). If you implement it by a
different value than one then it is no longer valid protection
information.
Seems like the splitting logic is broken somehow although I haven't seen
any failures with 4K on SCSI. What does your storage stack look like?
--
Martin K. Petersen Oracle Linux Engineering
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 0/1] t10-pi bio split fix
2021-12-20 16:29 ` [PATCH 0/1] t10-pi bio split fix Martin K. Petersen
@ 2021-12-24 12:16 ` Lyashkov, Alexey
2021-12-30 2:23 ` Damien Le Moal
0 siblings, 1 reply; 8+ messages in thread
From: Lyashkov, Alexey @ 2021-12-24 12:16 UTC (permalink / raw)
To: Martin K. Petersen
Cc: Jens Axboe, Bart Van Assche, fio, Damien Le Moal, Dmitry Fomichev
Martin,
Sorry about delay.
I don't agree with you about T10 PI reference tag in current code.
t10_pi_generate works with virtual block numbers and virtual reference tags.
Virtual tag mapped into real tag later in the
static void t10_pi_type1_prepare(struct request *rq)
...
if (be32_to_cpu(pi->ref_tag) == virt)
pi->ref_tag = cpu_to_be32(ref_tag);
...
So, we need just a pair between these functions to have a good mapping and good real reference tag
Once t10_pi_generate have shift a "virtual" ref tag for 4 it make a bio_integrity_advance to be happy.
And t10_pi_type1_prepare also happy but it need to be shift with 4 as similar to the generate function.
This patch tested with software raid (raid 1 / raid 6) over over NMVe devices with 4k block size.
In lustre case it caused a bio integrity prepare called before bio_submit so integrity will be splits before sends to the nvme devices.
Without patch it caused an T10 write errors for each write over 4k, with patch - no errors.
Alex
On 20/12/2021, 19:29, "Martin K. Petersen" <martin.petersen@oracle.com> wrote:
Alexey,
> t10_pi_generate / t10_pi_type1_prepare have just a increment by “1” for
> the integrity internal which is 4k in my case,
> so any bio_integrity_advance call will be move an iterator outside of
> generated sequence and t10_pi_type1_prepare can’t be found a good virtual
> sector for the mapping.
> Changing an increment by “1” to be related to the real integrity size
> solve a problem completely.
By definition the T10 PI reference tag is incremented by one per
interval (typically the logical block size). If you implement it by a
different value than one then it is no longer valid protection
information.
Seems like the splitting logic is broken somehow although I haven't seen
any failures with 4K on SCSI. What does your storage stack look like?
--
Martin K. Petersen Oracle Linux Engineering
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 0/1] t10-pi bio split fix
2021-12-24 12:16 ` Lyashkov, Alexey
@ 2021-12-30 2:23 ` Damien Le Moal
2022-01-13 11:13 ` Lyashkov, Alexey
0 siblings, 1 reply; 8+ messages in thread
From: Damien Le Moal @ 2021-12-30 2:23 UTC (permalink / raw)
To: Lyashkov, Alexey, Martin K. Petersen
Cc: Jens Axboe, Bart Van Assche, Dmitry Fomichev, linux-block, linux-scsi
On 12/24/21 21:16, Lyashkov, Alexey wrote:
This thread should really be addressed to linux-block@vger.kernel.org
and linux-scsi@vger.kernel.org.
Added to cc here.
> Martin,
>
> Sorry about delay.
>
> I don't agree with you about T10 PI reference tag in current code.
> t10_pi_generate works with virtual block numbers and virtual reference tags.
> Virtual tag mapped into real tag later in the
> static void t10_pi_type1_prepare(struct request *rq)
> ...
> if (be32_to_cpu(pi->ref_tag) == virt)
> pi->ref_tag = cpu_to_be32(ref_tag);
>
> ...
> So, we need just a pair between these functions to have a good mapping and good real reference tag
> Once t10_pi_generate have shift a "virtual" ref tag for 4 it make a bio_integrity_advance to be happy.
> And t10_pi_type1_prepare also happy but it need to be shift with 4 as similar to the generate function.
>
> This patch tested with software raid (raid 1 / raid 6) over over NMVe devices with 4k block size.
> In lustre case it caused a bio integrity prepare called before bio_submit so integrity will be splits before sends to the nvme devices.
> Without patch it caused an T10 write errors for each write over 4k, with patch - no errors.
>
> Alex
>
> On 20/12/2021, 19:29, "Martin K. Petersen" <martin.petersen@oracle.com> wrote:
>
>
> Alexey,
>
> > t10_pi_generate / t10_pi_type1_prepare have just a increment by “1” for
> > the integrity internal which is 4k in my case,
> > so any bio_integrity_advance call will be move an iterator outside of
> > generated sequence and t10_pi_type1_prepare can’t be found a good virtual
> > sector for the mapping.
> > Changing an increment by “1” to be related to the real integrity size
> > solve a problem completely.
>
> By definition the T10 PI reference tag is incremented by one per
> interval (typically the logical block size). If you implement it by a
> different value than one then it is no longer valid protection
> information.
>
> Seems like the splitting logic is broken somehow although I haven't seen
> any failures with 4K on SCSI. What does your storage stack look like?
>
> --
> Martin K. Petersen Oracle Linux Engineering
>
--
Damien Le Moal
Western Digital Research
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 0/1] t10-pi bio split fix
2021-12-30 2:23 ` Damien Le Moal
@ 2022-01-13 11:13 ` Lyashkov, Alexey
2022-01-13 19:59 ` Martin K. Petersen
0 siblings, 1 reply; 8+ messages in thread
From: Lyashkov, Alexey @ 2022-01-13 11:13 UTC (permalink / raw)
To: Damien Le Moal, Martin K. Petersen
Cc: Jens Axboe, Bart Van Assche, Dmitry Fomichev, linux-block, linux-scsi
Ping for review.
On 30/12/2021, 05:24, "Damien Le Moal" <damien.lemoal@opensource.wdc.com> wrote:
On 12/24/21 21:16, Lyashkov, Alexey wrote:
This thread should really be addressed to linux-block@vger.kernel.org
and linux-scsi@vger.kernel.org.
Added to cc here.
> Martin,
>
> Sorry about delay.
>
> I don't agree with you about T10 PI reference tag in current code.
> t10_pi_generate works with virtual block numbers and virtual reference tags.
> Virtual tag mapped into real tag later in the
> static void t10_pi_type1_prepare(struct request *rq)
> ...
> if (be32_to_cpu(pi->ref_tag) == virt)
> pi->ref_tag = cpu_to_be32(ref_tag);
>
> ...
> So, we need just a pair between these functions to have a good mapping and good real reference tag
> Once t10_pi_generate have shift a "virtual" ref tag for 4 it make a bio_integrity_advance to be happy.
> And t10_pi_type1_prepare also happy but it need to be shift with 4 as similar to the generate function.
>
> This patch tested with software raid (raid 1 / raid 6) over over NMVe devices with 4k block size.
> In lustre case it caused a bio integrity prepare called before bio_submit so integrity will be splits before sends to the nvme devices.
> Without patch it caused an T10 write errors for each write over 4k, with patch - no errors.
>
> Alex
>
> On 20/12/2021, 19:29, "Martin K. Petersen" <martin.petersen@oracle.com> wrote:
>
>
> Alexey,
>
> > t10_pi_generate / t10_pi_type1_prepare have just a increment by “1” for
> > the integrity internal which is 4k in my case,
> > so any bio_integrity_advance call will be move an iterator outside of
> > generated sequence and t10_pi_type1_prepare can’t be found a good virtual
> > sector for the mapping.
> > Changing an increment by “1” to be related to the real integrity size
> > solve a problem completely.
>
> By definition the T10 PI reference tag is incremented by one per
> interval (typically the logical block size). If you implement it by a
> different value than one then it is no longer valid protection
> information.
>
> Seems like the splitting logic is broken somehow although I haven't seen
> any failures with 4K on SCSI. What does your storage stack look like?
>
> --
> Martin K. Petersen Oracle Linux Engineering
>
--
Damien Le Moal
Western Digital Research
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 0/1] t10-pi bio split fix
2022-01-13 11:13 ` Lyashkov, Alexey
@ 2022-01-13 19:59 ` Martin K. Petersen
2022-01-14 12:25 ` Lyashkov, Alexey
0 siblings, 1 reply; 8+ messages in thread
From: Martin K. Petersen @ 2022-01-13 19:59 UTC (permalink / raw)
To: Lyashkov, Alexey
Cc: Damien Le Moal, Martin K. Petersen, Jens Axboe, Bart Van Assche,
Dmitry Fomichev, linux-block, linux-scsi
Alexey,
> Ping for review.
Can you please try the following patch?
--
Martin K. Petersen Oracle Linux Engineering
diff --git a/block/bio-integrity.c b/block/bio-integrity.c
index c0eb901315f9..fa5bc5b13c6a 100644
--- a/block/bio-integrity.c
+++ b/block/bio-integrity.c
@@ -387,7 +387,7 @@ void bio_integrity_advance(struct bio *bio, unsigned int bytes_done)
struct blk_integrity *bi = blk_get_integrity(bio->bi_disk);
unsigned bytes = bio_integrity_bytes(bi, bytes_done >> 9);
- bip->bip_iter.bi_sector += bytes_done >> 9;
+ bip->bip_iter.bi_sector += bio_integrity_intervals(bi, bytes_done >> 9);
bvec_iter_advance(bip->bip_vec, &bip->bip_iter, bytes);
}
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH 0/1] t10-pi bio split fix
2022-01-13 19:59 ` Martin K. Petersen
@ 2022-01-14 12:25 ` Lyashkov, Alexey
0 siblings, 0 replies; 8+ messages in thread
From: Lyashkov, Alexey @ 2022-01-14 12:25 UTC (permalink / raw)
To: Martin K. Petersen
Cc: Damien Le Moal, Jens Axboe, Bart Van Assche, Dmitry Fomichev,
linux-block, linux-scsi
Martin,
I will test shortly. But it mean bi_sector will be in the integrity interval units while
any sectors expected to be in the SECTOR_SIZE units and it confuse while debug.
Alex
On 13/01/2022, 22:59, "Martin K. Petersen" <martin.petersen@oracle.com> wrote:
Alexey,
> Ping for review.
Can you please try the following patch?
--
Martin K. Petersen Oracle Linux Engineering
diff --git a/block/bio-integrity.c b/block/bio-integrity.c
index c0eb901315f9..fa5bc5b13c6a 100644
--- a/block/bio-integrity.c
+++ b/block/bio-integrity.c
@@ -387,7 +387,7 @@ void bio_integrity_advance(struct bio *bio, unsigned int bytes_done)
struct blk_integrity *bi = blk_get_integrity(bio->bi_disk);
unsigned bytes = bio_integrity_bytes(bi, bytes_done >> 9);
- bip->bip_iter.bi_sector += bytes_done >> 9;
+ bip->bip_iter.bi_sector += bio_integrity_intervals(bi, bytes_done >> 9);
bvec_iter_advance(bip->bip_vec, &bip->bip_iter, bytes);
}
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2022-01-14 12:25 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-12-20 13:44 [PATCH 0/1] t10-pi bio split fix Alexey Lyashkov
2021-12-20 13:44 ` [PATCH 1/1] t10-pi: fix intergrity iterator Alexey Lyashkov
2021-12-20 16:29 ` [PATCH 0/1] t10-pi bio split fix Martin K. Petersen
2021-12-24 12:16 ` Lyashkov, Alexey
2021-12-30 2:23 ` Damien Le Moal
2022-01-13 11:13 ` Lyashkov, Alexey
2022-01-13 19:59 ` Martin K. Petersen
2022-01-14 12:25 ` Lyashkov, Alexey
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.