All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/1] t10-pi bio split fix
@ 2021-12-20 13:44 Alexey Lyashkov
  2021-12-20 13:44 ` [PATCH 1/1] t10-pi: fix intergrity iterator Alexey Lyashkov
  2021-12-20 16:29 ` [PATCH 0/1] t10-pi bio split fix Martin K. Petersen
  0 siblings, 2 replies; 8+ messages in thread
From: Alexey Lyashkov @ 2021-12-20 13:44 UTC (permalink / raw)
  To: Jens Axboe, Bart Van Assche, fio
  Cc: Damien Le Moal, Dmitry Fomichev, Alexey Lyashkov

Hello All,

During HPe testing Lustre team discovered bio incorrectly splits with
integrity info attached.

Some research pointed me to the two fixes related to my problem,
first was 
commit f36ea50ca0043e7b1204feaf1d2ba6bd68c08d36
Author: Wen Xiong <wenxiong@xxxxxxxxxxxxxxxxxx>
Date:   Wed May 10 08:54:11 2017 -0500

    blk-mq: NVMe 512B/4K+T10 DIF/DIX format returns I/O error on dd with split op

    When formatting NVMe to 512B/4K + T10 DIf/DIX, dd with split op returns
    "Input/output error". Looks block layer split the bio after calling
    bio_integrity_prep(bio). This patch fixes the issue.

and second was
commit e4dc9a4c31fe10d1751c542702afc85be8a5c56a
Author: Israel Rukshin <israelr@xxxxxxxxxxxx>
Date:   Wed Dec 11 17:36:02 2019 +0200

    scsi: target/iblock: Fix protection error with blocks greater than 512B

…
Block io trace pointed me to the three functions called in my case, it is
t10_pi_generate, bio_integrity_advance, t10_pi_type1_prepare.

Looking in code - t10_pi_generate generate a ref_tag based on “virtual” block 
number (512b base), and t10_pi_type1_prepare - converts this data to the the 
real ref_tag (aka device block number), but sometimes it’s don’t mapped.
Looking to the 
void bio_integrity_advance(struct bio *bio, unsigned int bytes_done)
{
        struct bio_integrity_payload *bip = bio_integrity(bio);
        struct blk_integrity *bi = blk_get_integrity(bio->bi_disk);
        unsigned bytes = bio_integrity_bytes(bi, bytes_done >> 9);
        bip->bip_iter.bi_sector += bytes_done >> 9;
        bvec_iter_advance(bip->bip_vec, &bip->bip_iter, bytes);
}

it have shit an iterator in the 512b block size base it looks right, but wait… 
static blk_status_t t10_pi_generate(struct blk_integrity_iter *iter,
                csum_fn *fn, enum t10_dif_type type)
{
        unsigned int i;

        for (i = 0 ; i < iter->data_size ; i += iter->interval) {
                struct t10_pi_tuple *pi = iter->prot_buf;


                iter->seed++; <<<
        }

        return BLK_STS_OK;
}

t10_pi_generate / t10_pi_type1_prepare have just a increment by “1” for 
the integrity internal which is 4k in my case,
so any bio_integrity_advance call will be move an iterator outside of
generated sequence and t10_pi_type1_prepare can’t be found a good virtual
sector for the mapping.
Changing an increment by “1” to be related to the real integrity size 
solve a problem completely.
Attached patch passed my own testing on raid0 with 4k block size. 

Alexey Lyashkov (1):
  t10-pi: fix intergrity iterator

 block/t10-pi.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

-- 
2.27.0


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 1/1] t10-pi: fix intergrity iterator
  2021-12-20 13:44 [PATCH 0/1] t10-pi bio split fix Alexey Lyashkov
@ 2021-12-20 13:44 ` Alexey Lyashkov
  2021-12-20 16:29 ` [PATCH 0/1] t10-pi bio split fix Martin K. Petersen
  1 sibling, 0 replies; 8+ messages in thread
From: Alexey Lyashkov @ 2021-12-20 13:44 UTC (permalink / raw)
  To: Jens Axboe, Bart Van Assche, fio
  Cc: Damien Le Moal, Dmitry Fomichev, Alexey Lyashkov

Integrity iterator should counted in the SECTOR_SIZE units,
not in the interity intervals.
It make bio_integrity_advance to be happy on bio_split.

Signed-off-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
---
 block/t10-pi.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/block/t10-pi.c b/block/t10-pi.c
index 25a52a2a09a8..bbdecead214e 100644
--- a/block/t10-pi.c
+++ b/block/t10-pi.c
@@ -31,6 +31,7 @@ static blk_status_t t10_pi_generate(struct blk_integrity_iter *iter,
 		csum_fn *fn, enum t10_dif_type type)
 {
 	unsigned int i;
+	unsigned int pi_size = iter->interval >> SECTOR_SHIFT;
 
 	for (i = 0 ; i < iter->data_size ; i += iter->interval) {
 		struct t10_pi_tuple *pi = iter->prot_buf;
@@ -45,7 +46,7 @@ static blk_status_t t10_pi_generate(struct blk_integrity_iter *iter,
 
 		iter->data_buf += iter->interval;
 		iter->prot_buf += sizeof(struct t10_pi_tuple);
-		iter->seed++;
+		iter->seed+= pi_size;
 	}
 
 	return BLK_STS_OK;
@@ -55,6 +56,7 @@ static blk_status_t t10_pi_verify(struct blk_integrity_iter *iter,
 		csum_fn *fn, enum t10_dif_type type)
 {
 	unsigned int i;
+	unsigned int pi_size = iter->interval >> SECTOR_SHIFT;
 
 	BUG_ON(type == T10_PI_TYPE0_PROTECTION);
 
@@ -94,7 +96,7 @@ static blk_status_t t10_pi_verify(struct blk_integrity_iter *iter,
 next:
 		iter->data_buf += iter->interval;
 		iter->prot_buf += sizeof(struct t10_pi_tuple);
-		iter->seed++;
+		iter->seed += pi_size;
 	}
 
 	return BLK_STS_OK;
@@ -135,6 +137,7 @@ static void t10_pi_type1_prepare(struct request *rq)
 	const int tuple_sz = rq->q->integrity.tuple_size;
 	u32 ref_tag = t10_pi_ref_tag(rq);
 	struct bio *bio;
+	unsigned int pi_size = 1 << (rq->q->integrity.interval_exp - SECTOR_SHIFT);
 
 	__rq_for_each_bio(bio, rq) {
 		struct bio_integrity_payload *bip = bio_integrity(bio);
@@ -156,7 +159,8 @@ static void t10_pi_type1_prepare(struct request *rq)
 
 				if (be32_to_cpu(pi->ref_tag) == virt)
 					pi->ref_tag = cpu_to_be32(ref_tag);
-				virt++;
+
+				virt += pi_size;
 				ref_tag++;
 				p += tuple_sz;
 			}
@@ -185,6 +189,7 @@ static void t10_pi_type1_complete(struct request *rq, unsigned int nr_bytes)
 	const int tuple_sz = rq->q->integrity.tuple_size;
 	u32 ref_tag = t10_pi_ref_tag(rq);
 	struct bio *bio;
+	unsigned int pi_size = 1 << (rq->q->integrity.interval_exp - SECTOR_SHIFT);
 
 	__rq_for_each_bio(bio, rq) {
 		struct bio_integrity_payload *bip = bio_integrity(bio);
@@ -202,7 +207,8 @@ static void t10_pi_type1_complete(struct request *rq, unsigned int nr_bytes)
 
 				if (be32_to_cpu(pi->ref_tag) == ref_tag)
 					pi->ref_tag = cpu_to_be32(virt);
-				virt++;
+
+				virt+= pi_size;
 				ref_tag++;
 				intervals--;
 				p += tuple_sz;
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH 0/1] t10-pi bio split fix
  2021-12-20 13:44 [PATCH 0/1] t10-pi bio split fix Alexey Lyashkov
  2021-12-20 13:44 ` [PATCH 1/1] t10-pi: fix intergrity iterator Alexey Lyashkov
@ 2021-12-20 16:29 ` Martin K. Petersen
  2021-12-24 12:16   ` Lyashkov, Alexey
  1 sibling, 1 reply; 8+ messages in thread
From: Martin K. Petersen @ 2021-12-20 16:29 UTC (permalink / raw)
  To: Alexey Lyashkov
  Cc: Jens Axboe, Bart Van Assche, fio, Damien Le Moal, Dmitry Fomichev


Alexey,

> t10_pi_generate / t10_pi_type1_prepare have just a increment by “1” for 
> the integrity internal which is 4k in my case,
> so any bio_integrity_advance call will be move an iterator outside of
> generated sequence and t10_pi_type1_prepare can’t be found a good virtual
> sector for the mapping.
> Changing an increment by “1” to be related to the real integrity size 
> solve a problem completely.

By definition the T10 PI reference tag is incremented by one per
interval (typically the logical block size). If you implement it by a
different value than one then it is no longer valid protection
information.

Seems like the splitting logic is broken somehow although I haven't seen
any failures with 4K on SCSI. What does your storage stack look like?

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 0/1] t10-pi bio split fix
  2021-12-20 16:29 ` [PATCH 0/1] t10-pi bio split fix Martin K. Petersen
@ 2021-12-24 12:16   ` Lyashkov, Alexey
  2021-12-30  2:23     ` Damien Le Moal
  0 siblings, 1 reply; 8+ messages in thread
From: Lyashkov, Alexey @ 2021-12-24 12:16 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Jens Axboe, Bart Van Assche, fio, Damien Le Moal, Dmitry Fomichev

Martin,

Sorry about delay.

I don't agree with you about T10 PI reference tag in current code.
t10_pi_generate works with virtual block numbers and virtual reference tags.
Virtual tag mapped into real tag later in the 
static void t10_pi_type1_prepare(struct request *rq)
...
                                if (be32_to_cpu(pi->ref_tag) == virt)
                                        pi->ref_tag = cpu_to_be32(ref_tag);

...
So, we need just a pair between these functions to have a good mapping and good real reference tag 
Once t10_pi_generate have shift a "virtual" ref tag for 4 it make a bio_integrity_advance to be happy.
And t10_pi_type1_prepare also happy but it need to be shift with 4 as similar to the generate function.

This patch tested with software raid (raid 1 / raid 6) over over NMVe devices with 4k block size.
In lustre case it caused a bio integrity prepare called before bio_submit so integrity will be splits before sends to the nvme devices.
Without patch it caused an T10 write errors for each write over 4k, with patch - no errors.

Alex

On 20/12/2021, 19:29, "Martin K. Petersen" <martin.petersen@oracle.com> wrote:


    Alexey,

    > t10_pi_generate / t10_pi_type1_prepare have just a increment by “1” for 
    > the integrity internal which is 4k in my case,
    > so any bio_integrity_advance call will be move an iterator outside of
    > generated sequence and t10_pi_type1_prepare can’t be found a good virtual
    > sector for the mapping.
    > Changing an increment by “1” to be related to the real integrity size 
    > solve a problem completely.

    By definition the T10 PI reference tag is incremented by one per
    interval (typically the logical block size). If you implement it by a
    different value than one then it is no longer valid protection
    information.

    Seems like the splitting logic is broken somehow although I haven't seen
    any failures with 4K on SCSI. What does your storage stack look like?

    -- 
    Martin K. Petersen	Oracle Linux Engineering


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 0/1] t10-pi bio split fix
  2021-12-24 12:16   ` Lyashkov, Alexey
@ 2021-12-30  2:23     ` Damien Le Moal
  2022-01-13 11:13       ` Lyashkov, Alexey
  0 siblings, 1 reply; 8+ messages in thread
From: Damien Le Moal @ 2021-12-30  2:23 UTC (permalink / raw)
  To: Lyashkov, Alexey, Martin K. Petersen
  Cc: Jens Axboe, Bart Van Assche, Dmitry Fomichev, linux-block, linux-scsi

On 12/24/21 21:16, Lyashkov, Alexey wrote:

This thread should really be addressed to linux-block@vger.kernel.org
and linux-scsi@vger.kernel.org.

Added to cc here.


> Martin,
> 
> Sorry about delay.
> 
> I don't agree with you about T10 PI reference tag in current code.
> t10_pi_generate works with virtual block numbers and virtual reference tags.
> Virtual tag mapped into real tag later in the 
> static void t10_pi_type1_prepare(struct request *rq)
> ...
>                                 if (be32_to_cpu(pi->ref_tag) == virt)
>                                         pi->ref_tag = cpu_to_be32(ref_tag);
> 
> ...
> So, we need just a pair between these functions to have a good mapping and good real reference tag 
> Once t10_pi_generate have shift a "virtual" ref tag for 4 it make a bio_integrity_advance to be happy.
> And t10_pi_type1_prepare also happy but it need to be shift with 4 as similar to the generate function.
> 
> This patch tested with software raid (raid 1 / raid 6) over over NMVe devices with 4k block size.
> In lustre case it caused a bio integrity prepare called before bio_submit so integrity will be splits before sends to the nvme devices.
> Without patch it caused an T10 write errors for each write over 4k, with patch - no errors.
> 
> Alex
> 
> On 20/12/2021, 19:29, "Martin K. Petersen" <martin.petersen@oracle.com> wrote:
> 
> 
>     Alexey,
> 
>     > t10_pi_generate / t10_pi_type1_prepare have just a increment by “1” for 
>     > the integrity internal which is 4k in my case,
>     > so any bio_integrity_advance call will be move an iterator outside of
>     > generated sequence and t10_pi_type1_prepare can’t be found a good virtual
>     > sector for the mapping.
>     > Changing an increment by “1” to be related to the real integrity size 
>     > solve a problem completely.
> 
>     By definition the T10 PI reference tag is incremented by one per
>     interval (typically the logical block size). If you implement it by a
>     different value than one then it is no longer valid protection
>     information.
> 
>     Seems like the splitting logic is broken somehow although I haven't seen
>     any failures with 4K on SCSI. What does your storage stack look like?
> 
>     -- 
>     Martin K. Petersen	Oracle Linux Engineering
> 


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 0/1] t10-pi bio split fix
  2021-12-30  2:23     ` Damien Le Moal
@ 2022-01-13 11:13       ` Lyashkov, Alexey
  2022-01-13 19:59         ` Martin K. Petersen
  0 siblings, 1 reply; 8+ messages in thread
From: Lyashkov, Alexey @ 2022-01-13 11:13 UTC (permalink / raw)
  To: Damien Le Moal, Martin K. Petersen
  Cc: Jens Axboe, Bart Van Assche, Dmitry Fomichev, linux-block, linux-scsi

Ping for review.

On 30/12/2021, 05:24, "Damien Le Moal" <damien.lemoal@opensource.wdc.com> wrote:

    On 12/24/21 21:16, Lyashkov, Alexey wrote:

    This thread should really be addressed to linux-block@vger.kernel.org
    and linux-scsi@vger.kernel.org.

    Added to cc here.


    > Martin,
    > 
    > Sorry about delay.
    > 
    > I don't agree with you about T10 PI reference tag in current code.
    > t10_pi_generate works with virtual block numbers and virtual reference tags.
    > Virtual tag mapped into real tag later in the 
    > static void t10_pi_type1_prepare(struct request *rq)
    > ...
    >                                 if (be32_to_cpu(pi->ref_tag) == virt)
    >                                         pi->ref_tag = cpu_to_be32(ref_tag);
    > 
    > ...
    > So, we need just a pair between these functions to have a good mapping and good real reference tag 
    > Once t10_pi_generate have shift a "virtual" ref tag for 4 it make a bio_integrity_advance to be happy.
    > And t10_pi_type1_prepare also happy but it need to be shift with 4 as similar to the generate function.
    > 
    > This patch tested with software raid (raid 1 / raid 6) over over NMVe devices with 4k block size.
    > In lustre case it caused a bio integrity prepare called before bio_submit so integrity will be splits before sends to the nvme devices.
    > Without patch it caused an T10 write errors for each write over 4k, with patch - no errors.
    > 
    > Alex
    > 
    > On 20/12/2021, 19:29, "Martin K. Petersen" <martin.petersen@oracle.com> wrote:
    > 
    > 
    >     Alexey,
    > 
    >     > t10_pi_generate / t10_pi_type1_prepare have just a increment by “1” for 
    >     > the integrity internal which is 4k in my case,
    >     > so any bio_integrity_advance call will be move an iterator outside of
    >     > generated sequence and t10_pi_type1_prepare can’t be found a good virtual
    >     > sector for the mapping.
    >     > Changing an increment by “1” to be related to the real integrity size 
    >     > solve a problem completely.
    > 
    >     By definition the T10 PI reference tag is incremented by one per
    >     interval (typically the logical block size). If you implement it by a
    >     different value than one then it is no longer valid protection
    >     information.
    > 
    >     Seems like the splitting logic is broken somehow although I haven't seen
    >     any failures with 4K on SCSI. What does your storage stack look like?
    > 
    >     -- 
    >     Martin K. Petersen	Oracle Linux Engineering
    > 


    -- 
    Damien Le Moal
    Western Digital Research


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 0/1] t10-pi bio split fix
  2022-01-13 11:13       ` Lyashkov, Alexey
@ 2022-01-13 19:59         ` Martin K. Petersen
  2022-01-14 12:25           ` Lyashkov, Alexey
  0 siblings, 1 reply; 8+ messages in thread
From: Martin K. Petersen @ 2022-01-13 19:59 UTC (permalink / raw)
  To: Lyashkov, Alexey
  Cc: Damien Le Moal, Martin K. Petersen, Jens Axboe, Bart Van Assche,
	Dmitry Fomichev, linux-block, linux-scsi


Alexey,

> Ping for review.

Can you please try the following patch?

-- 
Martin K. Petersen	Oracle Linux Engineering

diff --git a/block/bio-integrity.c b/block/bio-integrity.c
index c0eb901315f9..fa5bc5b13c6a 100644
--- a/block/bio-integrity.c
+++ b/block/bio-integrity.c
@@ -387,7 +387,7 @@ void bio_integrity_advance(struct bio *bio, unsigned int bytes_done)
 	struct blk_integrity *bi = blk_get_integrity(bio->bi_disk);
 	unsigned bytes = bio_integrity_bytes(bi, bytes_done >> 9);
 
-	bip->bip_iter.bi_sector += bytes_done >> 9;
+	bip->bip_iter.bi_sector += bio_integrity_intervals(bi, bytes_done >> 9);
 	bvec_iter_advance(bip->bip_vec, &bip->bip_iter, bytes);
 }
 

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH 0/1] t10-pi bio split fix
  2022-01-13 19:59         ` Martin K. Petersen
@ 2022-01-14 12:25           ` Lyashkov, Alexey
  0 siblings, 0 replies; 8+ messages in thread
From: Lyashkov, Alexey @ 2022-01-14 12:25 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Damien Le Moal, Jens Axboe, Bart Van Assche, Dmitry Fomichev,
	linux-block, linux-scsi

Martin,

I will test shortly. But it mean bi_sector will be in the integrity interval units while 
any sectors expected to be in the SECTOR_SIZE units and it confuse while debug.

Alex

On 13/01/2022, 22:59, "Martin K. Petersen" <martin.petersen@oracle.com> wrote:


    Alexey,

    > Ping for review.

    Can you please try the following patch?

    -- 
    Martin K. Petersen	Oracle Linux Engineering

    diff --git a/block/bio-integrity.c b/block/bio-integrity.c
    index c0eb901315f9..fa5bc5b13c6a 100644
    --- a/block/bio-integrity.c
    +++ b/block/bio-integrity.c
    @@ -387,7 +387,7 @@ void bio_integrity_advance(struct bio *bio, unsigned int bytes_done)
     	struct blk_integrity *bi = blk_get_integrity(bio->bi_disk);
     	unsigned bytes = bio_integrity_bytes(bi, bytes_done >> 9);

    -	bip->bip_iter.bi_sector += bytes_done >> 9;
    +	bip->bip_iter.bi_sector += bio_integrity_intervals(bi, bytes_done >> 9);
     	bvec_iter_advance(bip->bip_vec, &bip->bip_iter, bytes);
     }



^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2022-01-14 12:25 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-12-20 13:44 [PATCH 0/1] t10-pi bio split fix Alexey Lyashkov
2021-12-20 13:44 ` [PATCH 1/1] t10-pi: fix intergrity iterator Alexey Lyashkov
2021-12-20 16:29 ` [PATCH 0/1] t10-pi bio split fix Martin K. Petersen
2021-12-24 12:16   ` Lyashkov, Alexey
2021-12-30  2:23     ` Damien Le Moal
2022-01-13 11:13       ` Lyashkov, Alexey
2022-01-13 19:59         ` Martin K. Petersen
2022-01-14 12:25           ` Lyashkov, Alexey

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.