All of lore.kernel.org
* Re: [PATCH 0/1] t10-pi bio split fix
@ 2022-02-02 17:42 Ivanov, Dmitry (HPC)
  2022-02-04  3:44 ` Martin K. Petersen
  0 siblings, 1 reply; 15+ messages in thread
From: Ivanov, Dmitry (HPC) @ 2022-02-02 17:42 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Damien Le Moal, Jens Axboe, Bart Van Assche, Lyashkov, Alexey,
	Dmitry Fomichev, linux-block, linux-scsi

On 14/01/2022, 12:25 +0000, "Martin K. Petersen" <martin.petersen@xxxxxxxxxx> wrote:

> Can you please try the following patch?
> 
> Martin K. Petersen	Oracle Linux Engineering
> 
> diff --git a/block/bio-integrity.c b/block/bio-integrity.c
> index c0eb901315f9..fa5bc5b13c6a 100644
> --- a/block/bio-integrity.c
> +++ b/block/bio-integrity.c
> @@ -387,7 +387,7 @@ void bio_integrity_advance(struct bio *bio, unsigned int bytes_done)
> 	struct blk_integrity *bi = blk_get_integrity(bio->bi_disk);
> 	unsigned bytes = bio_integrity_bytes(bi, bytes_done >> 9);
> 
> -	bip->bip_iter.bi_sector += bytes_done >> 9;
> +	bip->bip_iter.bi_sector += bio_integrity_intervals(bi, bytes_done >> 9);
> 	bvec_iter_advance(bip->bip_vec, &bip->bip_iter, bytes);
> }

I encountered the same issue as Alexey when writing to DPS Type 2 formatted NVMe devices with 8-byte metadata from a stacked driver, whenever a bio was split in the md layer.
Virtual ref_tags were improperly converted to real ref_tags by the block integrity prepare_fn().
Martin's patch resolves this issue.

Martin, could you please move this patch forward?

My only concern is the dm-crypt target, which operates on bip_iter directly with code copy-pasted from bio_integrity_advance:

static int dm_crypt_integrity_io_alloc(struct dm_crypt_io *io, struct bio *bio)
{
	struct bio_integrity_payload *bip;
	unsigned int tag_len;
	int ret;

	if (!bio_sectors(bio) || !io->cc->on_disk_tag_size)
		return 0;

	bip = bio_integrity_alloc(bio, GFP_NOIO, 1);
	if (IS_ERR(bip))
		return PTR_ERR(bip);

	tag_len = io->cc->on_disk_tag_size * (bio_sectors(bio) >> io->cc->sector_shift);

	bip->bip_iter.bi_size = tag_len;
	bip->bip_iter.bi_sector = io->cc->start + io->sector;
               ^^^

	ret = bio_integrity_add_page(bio, virt_to_page(io->integrity_metadata),
				     tag_len, offset_in_page(io->integrity_metadata));
...
}
I am not sure whether there are other places in drivers where the bip iterator is advanced explicitly, i.e. without calling bio_integrity_advance()/bio_advance().

--
Dmitry Ivanov
Hewlett Packard Enterprise - HPC AI Labs

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/1] t10-pi bio split fix
  2022-02-02 17:42 [PATCH 0/1] t10-pi bio split fix Ivanov, Dmitry (HPC)
@ 2022-02-04  3:44 ` Martin K. Petersen
  2022-02-04  7:43     ` [dm-devel] " Milan Broz
  0 siblings, 1 reply; 15+ messages in thread
From: Martin K. Petersen @ 2022-02-04  3:44 UTC (permalink / raw)
  To: Ivanov, Dmitry (HPC)
  Cc: Martin K. Petersen, Damien Le Moal, Jens Axboe, Bart Van Assche,
	Lyashkov, Alexey, Dmitry Fomichev, linux-block, linux-scsi,
	Milan Broz, Mike Snitzer


Dmitry,

> My only concern is the dm-crypt target, which operates on bip_iter directly
> with code copy-pasted from bio_integrity_advance:
>
> static int dm_crypt_integrity_io_alloc(struct dm_crypt_io *io, struct bio *bio)
> {
> 	struct bio_integrity_payload *bip;
> 	unsigned int tag_len;
> 	int ret;
>
> 	if (!bio_sectors(bio) || !io->cc->on_disk_tag_size)
> 		return 0;
>
> 	bip = bio_integrity_alloc(bio, GFP_NOIO, 1);
> 	if (IS_ERR(bip))
> 		return PTR_ERR(bip);
>
> 	tag_len = io->cc->on_disk_tag_size * (bio_sectors(bio) >> io->cc->sector_shift);
>
> 	bip->bip_iter.bi_size = tag_len;
> 	bip->bip_iter.bi_sector = io->cc->start + io->sector;
>                ^^^
>
> 	ret = bio_integrity_add_page(bio, virt_to_page(io->integrity_metadata),
> 				     tag_len, offset_in_page(io->integrity_metadata));
> ...
> }

I copied Milan and Mike, who are more familiar with the dm-crypt internals.

-- 
Martin K. Petersen	Oracle Linux Engineering

* Re: [PATCH 0/1] t10-pi bio split fix
  2022-02-04  3:44 ` Martin K. Petersen
@ 2022-02-04  7:43     ` Milan Broz
  0 siblings, 0 replies; 15+ messages in thread
From: Milan Broz @ 2022-02-04  7:43 UTC (permalink / raw)
  To: Martin K. Petersen, Ivanov, Dmitry (HPC)
  Cc: Damien Le Moal, Jens Axboe, Bart Van Assche, Lyashkov, Alexey,
	Dmitry Fomichev, linux-block, linux-scsi, Mike Snitzer,
	Mikulas Patocka, device-mapper development

On 04/02/2022 04:44, Martin K. Petersen wrote:
> 
> Dmitry,
> 
>> My only concern is the dm-crypt target, which operates on bip_iter directly
>> with code copy-pasted from bio_integrity_advance:
>>
>> static int dm_crypt_integrity_io_alloc(struct dm_crypt_io *io, struct bio *bio)
>> {
>> 	struct bio_integrity_payload *bip;
>> 	unsigned int tag_len;
>> 	int ret;
>>
>> 	if (!bio_sectors(bio) || !io->cc->on_disk_tag_size)
>> 		return 0;
>>
>> 	bip = bio_integrity_alloc(bio, GFP_NOIO, 1);
>> 	if (IS_ERR(bip))
>> 		return PTR_ERR(bip);
>>
>> 	tag_len = io->cc->on_disk_tag_size * (bio_sectors(bio) >> io->cc->sector_shift);
>>
>> 	bip->bip_iter.bi_size = tag_len;
>> 	bip->bip_iter.bi_sector = io->cc->start + io->sector;
>>                 ^^^
>>
>> 	ret = bio_integrity_add_page(bio, virt_to_page(io->integrity_metadata),
>> 				     tag_len, offset_in_page(io->integrity_metadata));
>> ...
>> }
> 
> I copied Milan and Mike, who are more familiar with the dm-crypt internals.

Hi,

What is the problem you are trying to fix here?
Even after reading the linux-block posts, I do not understand the context...

Anyway, cc'ing Mikulas and dm-devel, as dm-integrity/dm-crypt is
the major user of bio_integrity here.

If you touch the code, please make sure you run the cryptsetup testsuite
with the integrity tests.
(IOW integritysetup tests and LUKS2 with authenticated encryption
that uses dm-crypt over dm-integrity.)

All we need is that dm-integrity can process bio integrity data
directly. (I know some people do not like it, but this was
the most "elegant" solution here.)

Here dm-crypt uses the data for authenticated encryption (an additional
auth tag in a bio field), so because dm-crypt owns the bio, the integrity
data must be allocated in dm-crypt (stacked over dm-integrity).

Milan

* Re: [PATCH 0/1] t10-pi bio split fix
  2022-02-04  7:43     ` [dm-devel] " Milan Broz
@ 2022-02-04  7:58       ` Lyashkov, Alexey
  -1 siblings, 0 replies; 15+ messages in thread
From: Lyashkov, Alexey @ 2022-02-04  7:58 UTC (permalink / raw)
  To: Milan Broz, Martin K. Petersen, Ivanov, Dmitry (HPC)
  Cc: Damien Le Moal, Jens Axboe, Bart Van Assche, Dmitry Fomichev,
	linux-block, linux-scsi, Mike Snitzer, Mikulas Patocka,
	device-mapper development

Milan,

Dmitry and I are trying to fix a problem with splitting a BIO that has integrity data attached.
In this case the integrity data is generated/attached before bio_submit over a RAID device (an md stack in my case), so the bio is subject to bio_integrity_advance.
But bio_integrity_advance splits the integrity data incorrectly, because it assumes the integrity iterator advances in 512-byte blocks:
( bip->bip_iter.bi_sector += bytes_done >> 9 )
whereas the T10 generate function (t10_pi_generate) increments the iterator by 1 for each integrity interval:
(        for (i = 0 ; i < iter->data_size ; i += iter->interval) {
                iter->seed++;
        }
)
That is fine for SCSI with 512-byte blocks, but wrong for NVMe with a 4k block size.

So two solutions exist:
1) My solution: fix t10_pi_generate / t10_pi_verify / t10_pi_type1_prepare to increment the iterator by a value derived from the real integrity block size in 512-byte blocks.
2) Martin's solution: change the bio_integrity_advance function so the iterator shifts in integrity-interval units instead of 512-byte blocks.

Alex

On 04/02/2022, 10:44, "Milan Broz" <gmazyland@gmail.com> wrote:

    On 04/02/2022 04:44, Martin K. Petersen wrote:
    > 
    > Dmitry,
    > 
    >> My only concern is the dm-crypt target, which operates on bip_iter directly
    >> with code copy-pasted from bio_integrity_advance:
    >>
    >> static int dm_crypt_integrity_io_alloc(struct dm_crypt_io *io, struct bio *bio)
    >> {
    >> 	struct bio_integrity_payload *bip;
    >> 	unsigned int tag_len;
    >> 	int ret;
    >>
    >> 	if (!bio_sectors(bio) || !io->cc->on_disk_tag_size)
    >> 		return 0;
    >>
    >> 	bip = bio_integrity_alloc(bio, GFP_NOIO, 1);
    >> 	if (IS_ERR(bip))
    >> 		return PTR_ERR(bip);
    >>
    >> 	tag_len = io->cc->on_disk_tag_size * (bio_sectors(bio) >> io->cc->sector_shift);
    >>
    >> 	bip->bip_iter.bi_size = tag_len;
    >> 	bip->bip_iter.bi_sector = io->cc->start + io->sector;
    >>                 ^^^
    >>
    >> 	ret = bio_integrity_add_page(bio, virt_to_page(io->integrity_metadata),
    >> 				     tag_len, offset_in_page(io->integrity_metadata));
    >> ...
    >> }
    > 
    > I copied Milan and Mike, who are more familiar with the dm-crypt internals.

    Hi,

    What's the problem here you are trying to fix?
    Even if I read linux-block posts, I do not understand the context...

    Anyway, cc to Mikulas and dm-devel, as dm-integrity/dm-crypt is
    the major user of bio_integrity here.

    If you touch the code, please be sure you run cryptsetup testsuite
    with the integrity tests.
    (IOW integritysetup tests and LUKS2 with authenticated encryption
    that uses dm-crypt over dm-integrity.)

    All we need is that dm-integrity can process bio integrity data
    directly. (I know some people do not like it, but this was
    the most "elegant" solution here.)

    Here dm-crypt uses the data for authenticated encryption (additional
    auth tag in bio field), so because dm-crypt owns bio, integrity data
    must be allocated in dm-crypt (stacked over dm-integrity).

    Milan


* RE: [PATCH 0/1] t10-pi bio split fix
  2022-02-04  7:43     ` [dm-devel] " Milan Broz
@ 2022-02-05  3:03       ` Ivanov, Dmitry (HPC)
  -1 siblings, 0 replies; 15+ messages in thread
From: Ivanov, Dmitry (HPC) @ 2022-02-05  3:03 UTC (permalink / raw)
  To: Milan Broz
  Cc: Damien Le Moal, Martin K. Petersen, Jens Axboe, Bart Van Assche,
	Lyashkov, Alexey, Dmitry Fomichev, linux-block, linux-scsi,
	Mike Snitzer, Mikulas Patocka, device-mapper development

On 04/02/2022 08:43 +0100, Milan Broz wrote:

> What is the problem you are trying to fix here?
> Even after reading the linux-block posts, I do not understand the context..

1. Let me demonstrate this issue with the following example.

Given a driver which maps a bio to underlying devices with a 4096-byte logical block size using the following mapping:
bio: sector:0, size:32, dir:WRITE
Virtual LBA     Physical LBA
0               0
1               1
2               0
3               1

The Type 1 or 2 integrity is generated in bio_integrity_prep() by generate_fn as:
Virtual LBA     Virtual ref_tags
0               0
1               1
2               2
3               3

According to the mapping, bio_split() would split it at sector 16. That advances bip_iter.bi_sector (aka the seed, see bip_get_seed()) by 16.
Virtual LBA     Physical LBA    seed
Split bio
0               0               0
1               1
Updated bio: sector:16, size:16
2               0               16
3               1               +1

Remapped updated bio: sector:0
On submission we expect the ref_tags to be remapped to the actual physical start sector in the integrity prepare_fn and incremented by one per block of data:
Virtual LBA     Physical LBA    seed    virtual tags    ref_tags
2               0               2       2               16
3               1                +1      3               17

But with the current code we get a wrong seed (0+16) and no ref_tags remapping in the block integrity prepare_fn:
Virtual LBA     Physical LBA    seed    virtual tags    ref_tags
2               0               16      2               2
3               1                +1      3               3
This IO would be failed by the NVMe controller with a Reference Tag Check Error (84h) because the first reference tag (2) is not equal to the start LBA (16).

Martin's patch resolves this issue by advancing the seed (bip_iter.bi_sector) by the number of integrity intervals (2), so the tag remapping takes place in prepare_fn:
                                if (be32_to_cpu(pi->ref_tag) == virt)
                                        pi->ref_tag = cpu_to_be32(ref_tag);

2. Browsing the code, I found a snippet that advances the integrity seed directly, without calling bio_advance()->bio_integrity_advance().
I hope it does not introduce the above issue; please advise.

Dmitry

* Re: [PATCH 0/1] t10-pi bio split fix
  2022-01-13 19:59         ` Martin K. Petersen
@ 2022-01-14 12:25           ` Lyashkov, Alexey
  0 siblings, 0 replies; 15+ messages in thread
From: Lyashkov, Alexey @ 2022-01-14 12:25 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Damien Le Moal, Jens Axboe, Bart Van Assche, Dmitry Fomichev,
	linux-block, linux-scsi

Martin,

I will test shortly. But it means bi_sector will be in integrity-interval units, while any sector is
expected to be in SECTOR_SIZE units, which is confusing when debugging.

Alex

On 13/01/2022, 22:59, "Martin K. Petersen" <martin.petersen@oracle.com> wrote:


    Alexey,

    > Ping for review.

    Can you please try the following patch?

    -- 
    Martin K. Petersen	Oracle Linux Engineering

    diff --git a/block/bio-integrity.c b/block/bio-integrity.c
    index c0eb901315f9..fa5bc5b13c6a 100644
    --- a/block/bio-integrity.c
    +++ b/block/bio-integrity.c
    @@ -387,7 +387,7 @@ void bio_integrity_advance(struct bio *bio, unsigned int bytes_done)
     	struct blk_integrity *bi = blk_get_integrity(bio->bi_disk);
     	unsigned bytes = bio_integrity_bytes(bi, bytes_done >> 9);

    -	bip->bip_iter.bi_sector += bytes_done >> 9;
    +	bip->bip_iter.bi_sector += bio_integrity_intervals(bi, bytes_done >> 9);
     	bvec_iter_advance(bip->bip_vec, &bip->bip_iter, bytes);
     }



* Re: [PATCH 0/1] t10-pi bio split fix
  2022-01-13 11:13       ` Lyashkov, Alexey
@ 2022-01-13 19:59         ` Martin K. Petersen
  2022-01-14 12:25           ` Lyashkov, Alexey
  0 siblings, 1 reply; 15+ messages in thread
From: Martin K. Petersen @ 2022-01-13 19:59 UTC (permalink / raw)
  To: Lyashkov, Alexey
  Cc: Damien Le Moal, Martin K. Petersen, Jens Axboe, Bart Van Assche,
	Dmitry Fomichev, linux-block, linux-scsi


Alexey,

> Ping for review.

Can you please try the following patch?

-- 
Martin K. Petersen	Oracle Linux Engineering

diff --git a/block/bio-integrity.c b/block/bio-integrity.c
index c0eb901315f9..fa5bc5b13c6a 100644
--- a/block/bio-integrity.c
+++ b/block/bio-integrity.c
@@ -387,7 +387,7 @@ void bio_integrity_advance(struct bio *bio, unsigned int bytes_done)
 	struct blk_integrity *bi = blk_get_integrity(bio->bi_disk);
 	unsigned bytes = bio_integrity_bytes(bi, bytes_done >> 9);
 
-	bip->bip_iter.bi_sector += bytes_done >> 9;
+	bip->bip_iter.bi_sector += bio_integrity_intervals(bi, bytes_done >> 9);
 	bvec_iter_advance(bip->bip_vec, &bip->bip_iter, bytes);
 }
 

* Re: [PATCH 0/1] t10-pi bio split fix
  2021-12-30  2:23     ` Damien Le Moal
@ 2022-01-13 11:13       ` Lyashkov, Alexey
  2022-01-13 19:59         ` Martin K. Petersen
  0 siblings, 1 reply; 15+ messages in thread
From: Lyashkov, Alexey @ 2022-01-13 11:13 UTC (permalink / raw)
  To: Damien Le Moal, Martin K. Petersen
  Cc: Jens Axboe, Bart Van Assche, Dmitry Fomichev, linux-block, linux-scsi

Ping for review.

On 30/12/2021, 05:24, "Damien Le Moal" <damien.lemoal@opensource.wdc.com> wrote:

    On 12/24/21 21:16, Lyashkov, Alexey wrote:

    This thread should really be addressed to linux-block@vger.kernel.org
    and linux-scsi@vger.kernel.org.

    Added to cc here.


    > Martin,
    > 
    > Sorry about delay.
    > 
    > I don't agree with you about T10 PI reference tag in current code.
    > t10_pi_generate works with virtual block numbers and virtual reference tags.
    > Virtual tag mapped into real tag later in the 
    > static void t10_pi_type1_prepare(struct request *rq)
    > ...
    >                                 if (be32_to_cpu(pi->ref_tag) == virt)
    >                                         pi->ref_tag = cpu_to_be32(ref_tag);
    > 
    > ...
    > So we just need these two functions to agree in order to get a good mapping and a good real reference tag.
    > Once t10_pi_generate shifts the "virtual" ref tag by 4, bio_integrity_advance is happy.
    > And t10_pi_type1_prepare is also happy, but it needs to shift by 4 in the same way as the generate function.
    > 
    > This patch was tested with software RAID (RAID 1 / RAID 6) over NVMe devices with a 4k block size.
    > In the Lustre case, bio integrity prepare is called before bio_submit, so the integrity data is split before being sent to the NVMe devices.
    > Without the patch, every write over 4k caused T10 write errors; with the patch, no errors.
    > 
    > Alex
    > 
    > On 20/12/2021, 19:29, "Martin K. Petersen" <martin.petersen@oracle.com> wrote:
    > 
    > 
    >     Alexey,
    > 
    >     > t10_pi_generate / t10_pi_type1_prepare just increment by “1” per
    >     > integrity interval, which is 4k in my case,
    >     > so any bio_integrity_advance call will move the iterator outside of the
    >     > generated sequence and t10_pi_type1_prepare cannot find a good virtual
    >     > sector for the mapping.
    >     > Changing the increment from “1” to a value related to the real integrity
    >     > block size solves the problem completely.
    > 
    >     By definition the T10 PI reference tag is incremented by one per
    >     interval (typically the logical block size). If you implement it by a
    >     different value than one then it is no longer valid protection
    >     information.
    > 
    >     Seems like the splitting logic is broken somehow although I haven't seen
    >     any failures with 4K on SCSI. What does your storage stack look like?
    > 
    >     -- 
    >     Martin K. Petersen	Oracle Linux Engineering
    > 


    -- 
    Damien Le Moal
    Western Digital Research


* Re: [PATCH 0/1] t10-pi bio split fix
  2021-12-24 12:16   ` Lyashkov, Alexey
@ 2021-12-30  2:23     ` Damien Le Moal
  2022-01-13 11:13       ` Lyashkov, Alexey
  0 siblings, 1 reply; 15+ messages in thread
From: Damien Le Moal @ 2021-12-30  2:23 UTC (permalink / raw)
  To: Lyashkov, Alexey, Martin K. Petersen
  Cc: Jens Axboe, Bart Van Assche, Dmitry Fomichev, linux-block, linux-scsi

On 12/24/21 21:16, Lyashkov, Alexey wrote:

This thread should really be addressed to linux-block@vger.kernel.org
and linux-scsi@vger.kernel.org.

Added to cc here.


> Martin,
> 
> Sorry about delay.
> 
> I don't agree with you about T10 PI reference tag in current code.
> t10_pi_generate works with virtual block numbers and virtual reference tags.
> Virtual tag mapped into real tag later in the 
> static void t10_pi_type1_prepare(struct request *rq)
> ...
>                                 if (be32_to_cpu(pi->ref_tag) == virt)
>                                         pi->ref_tag = cpu_to_be32(ref_tag);
> 
> ...
> So we just need these two functions to agree in order to get a good mapping and a good real reference tag.
> Once t10_pi_generate shifts the "virtual" ref tag by 4, bio_integrity_advance is happy.
> And t10_pi_type1_prepare is also happy, but it needs to shift by 4 in the same way as the generate function.
> 
> This patch was tested with software RAID (RAID 1 / RAID 6) over NVMe devices with a 4k block size.
> In the Lustre case, bio integrity prepare is called before bio_submit, so the integrity data is split before being sent to the NVMe devices.
> Without the patch, every write over 4k caused T10 write errors; with the patch, no errors.
> 
> Alex
> 
> On 20/12/2021, 19:29, "Martin K. Petersen" <martin.petersen@oracle.com> wrote:
> 
> 
>     Alexey,
> 
>     > t10_pi_generate / t10_pi_type1_prepare just increment by “1” per
>     > integrity interval, which is 4k in my case,
>     > so any bio_integrity_advance call will move the iterator outside of the
>     > generated sequence and t10_pi_type1_prepare cannot find a good virtual
>     > sector for the mapping.
>     > Changing the increment from “1” to a value related to the real integrity
>     > block size solves the problem completely.
> 
>     By definition the T10 PI reference tag is incremented by one per
>     interval (typically the logical block size). If you implement it by a
>     different value than one then it is no longer valid protection
>     information.
> 
>     Seems like the splitting logic is broken somehow although I haven't seen
>     any failures with 4K on SCSI. What does your storage stack look like?
> 
>     -- 
>     Martin K. Petersen	Oracle Linux Engineering
> 


-- 
Damien Le Moal
Western Digital Research

* Re: [PATCH 0/1] t10-pi bio split fix
  2021-12-20 16:29 ` Martin K. Petersen
@ 2021-12-24 12:16   ` Lyashkov, Alexey
  2021-12-30  2:23     ` Damien Le Moal
  0 siblings, 1 reply; 15+ messages in thread
From: Lyashkov, Alexey @ 2021-12-24 12:16 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Jens Axboe, Bart Van Assche, fio, Damien Le Moal, Dmitry Fomichev

Martin,

Sorry about delay.

I don't agree with you about the T10 PI reference tag in the current code.
t10_pi_generate works with virtual block numbers and virtual reference tags.
The virtual tag is mapped to the real tag later, in
static void t10_pi_type1_prepare(struct request *rq)
...
                                if (be32_to_cpu(pi->ref_tag) == virt)
                                        pi->ref_tag = cpu_to_be32(ref_tag);

...
So we just need these two functions to agree in order to get a good mapping and a good real reference tag.
Once t10_pi_generate shifts the "virtual" ref tag by 4, bio_integrity_advance is happy.
And t10_pi_type1_prepare is also happy, but it needs to shift by 4 in the same way as the generate function.

This patch was tested with software RAID (RAID 1 / RAID 6) over NVMe devices with a 4k block size.
In the Lustre case, bio integrity prepare is called before bio_submit, so the integrity data is split before being sent to the NVMe devices.
Without the patch, every write over 4k caused T10 write errors; with the patch, no errors.

Alex

On 20/12/2021, 19:29, "Martin K. Petersen" <martin.petersen@oracle.com> wrote:


    Alexey,

    > t10_pi_generate / t10_pi_type1_prepare just increment by “1” per
    > integrity interval, which is 4k in my case,
    > so any bio_integrity_advance call will move the iterator outside of the
    > generated sequence and t10_pi_type1_prepare cannot find a good virtual
    > sector for the mapping.
    > Changing the increment from “1” to a value related to the real integrity
    > block size solves the problem completely.

    By definition the T10 PI reference tag is incremented by one per
    interval (typically the logical block size). If you implement it by a
    different value than one then it is no longer valid protection
    information.

    Seems like the splitting logic is broken somehow although I haven't seen
    any failures with 4K on SCSI. What does your storage stack look like?

    -- 
    Martin K. Petersen	Oracle Linux Engineering



* Re: [PATCH 0/1] t10-pi bio split fix
  2021-12-20 13:44 Alexey Lyashkov
@ 2021-12-20 16:29 ` Martin K. Petersen
  2021-12-24 12:16   ` Lyashkov, Alexey
  0 siblings, 1 reply; 15+ messages in thread
From: Martin K. Petersen @ 2021-12-20 16:29 UTC (permalink / raw)
  To: Alexey Lyashkov
  Cc: Jens Axboe, Bart Van Assche, fio, Damien Le Moal, Dmitry Fomichev


Alexey,

> t10_pi_generate / t10_pi_type1_prepare just increment by “1” per
> integrity interval, which is 4k in my case,
> so any bio_integrity_advance call moves the iterator outside of the
> generated sequence, and t10_pi_type1_prepare cannot find a matching virtual
> sector for the mapping.
> Changing the increment of “1” to be related to the real integrity interval size
> solves the problem completely.

By definition the T10 PI reference tag is incremented by one per
interval (typically the logical block size). If you implement it by a
different value than one then it is no longer valid protection
information.

Seems like the splitting logic is broken somehow although I haven't seen
any failures with 4K on SCSI. What does your storage stack look like?

-- 
Martin K. Petersen	Oracle Linux Engineering


* [PATCH 0/1] t10-pi bio split fix
@ 2021-12-20 13:44 Alexey Lyashkov
  2021-12-20 16:29 ` Martin K. Petersen
  0 siblings, 1 reply; 15+ messages in thread
From: Alexey Lyashkov @ 2021-12-20 13:44 UTC (permalink / raw)
  To: Jens Axboe, Bart Van Assche, fio
  Cc: Damien Le Moal, Dmitry Fomichev, Alexey Lyashkov

Hello All,

During HPE testing, the Lustre team discovered that bios are split incorrectly
when integrity info is attached.

Some research pointed me to two fixes related to my problem.
The first was
commit f36ea50ca0043e7b1204feaf1d2ba6bd68c08d36
Author: Wen Xiong <wenxiong@xxxxxxxxxxxxxxxxxx>
Date:   Wed May 10 08:54:11 2017 -0500

    blk-mq: NVMe 512B/4K+T10 DIF/DIX format returns I/O error on dd with split op

    When formatting NVMe to 512B/4K + T10 DIf/DIX, dd with split op returns
    "Input/output error". Looks block layer split the bio after calling
    bio_integrity_prep(bio). This patch fixes the issue.

and the second was
commit e4dc9a4c31fe10d1751c542702afc85be8a5c56a
Author: Israel Rukshin <israelr@xxxxxxxxxxxx>
Date:   Wed Dec 11 17:36:02 2019 +0200

    scsi: target/iblock: Fix protection error with blocks greater than 512B

…
A block I/O trace pointed me to three functions called in my case:
t10_pi_generate, bio_integrity_advance, and t10_pi_type1_prepare.

Looking at the code: t10_pi_generate generates a ref_tag based on the “virtual” block
number (512b based), and t10_pi_type1_prepare converts this data to the
real ref_tag (aka the device block number), but sometimes it does not get mapped.
Looking at
void bio_integrity_advance(struct bio *bio, unsigned int bytes_done)
{
        struct bio_integrity_payload *bip = bio_integrity(bio);
        struct blk_integrity *bi = blk_get_integrity(bio->bi_disk);
        unsigned bytes = bio_integrity_bytes(bi, bytes_done >> 9);

        bip->bip_iter.bi_sector += bytes_done >> 9;
        bvec_iter_advance(bip->bip_vec, &bip->bip_iter, bytes);
}

it shifts the iterator in 512b sector units, which looks right, but wait…
static blk_status_t t10_pi_generate(struct blk_integrity_iter *iter,
                csum_fn *fn, enum t10_dif_type type)
{
        unsigned int i;

        for (i = 0 ; i < iter->data_size ; i += iter->interval) {
                struct t10_pi_tuple *pi = iter->prot_buf;

                iter->seed++; <<<
        }

        return BLK_STS_OK;
}

t10_pi_generate / t10_pi_type1_prepare just increment by “1” per
integrity interval, which is 4k in my case,
so any bio_integrity_advance call moves the iterator outside of the
generated sequence, and t10_pi_type1_prepare cannot find a matching virtual
sector for the mapping.
Changing the increment of “1” to be related to the real integrity interval size
solves the problem completely.
The attached patch passed my own testing on raid0 with a 4k block size.

Alexey Lyashkov (1):
  t10-pi: fix integrity iterator

 block/t10-pi.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

-- 
2.27.0



end of thread, other threads:[~2022-02-07  9:34 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-02 17:42 [PATCH 0/1] t10-pi bio split fix Ivanov, Dmitry (HPC)
2022-02-04  3:44 ` Martin K. Petersen
2022-02-04  7:43   ` Milan Broz
2022-02-04  7:43     ` [dm-devel] " Milan Broz
2022-02-04  7:58     ` Lyashkov, Alexey
2022-02-04  7:58       ` [dm-devel] " Lyashkov, Alexey
2022-02-05  3:03     ` Ivanov, Dmitry (HPC)
2022-02-05  3:03       ` [dm-devel] " Ivanov, Dmitry (HPC)
  -- strict thread matches above, loose matches on Subject: below --
2021-12-20 13:44 Alexey Lyashkov
2021-12-20 16:29 ` Martin K. Petersen
2021-12-24 12:16   ` Lyashkov, Alexey
2021-12-30  2:23     ` Damien Le Moal
2022-01-13 11:13       ` Lyashkov, Alexey
2022-01-13 19:59         ` Martin K. Petersen
2022-01-14 12:25           ` Lyashkov, Alexey
