* Block Integrity Rq Count Question
@ 2017-12-08 22:04 Jeffrey Lien
2017-12-08 22:19 ` Keith Busch
2017-12-12 8:39 ` Christoph Hellwig
0 siblings, 2 replies; 7+ messages in thread
From: Jeffrey Lien @ 2017-12-08 22:04 UTC
I've noticed an issue when creating an ext3/4 filesystem on an NVMe device with RHEL 7.3 and 7.4, and I would like to understand how this is supposed to work or whether there's a bug in the driver code.
When running mkfs on an NVMe device using lbaf 1 or 3 (i.e. using metadata), the call to blk_rq_count_integrity_sg in nvme_map_data returns 2, which causes nvme_map_data to goto out_unmap, and ultimately the request fails. In blk_rq_count_integrity_sg, the check "if (!BIOVEC_PHYS_MERGEABLE(ivprv, iv))" is true, so a second segment is counted. This seems like it could happen regularly, so my question is: why does the nvme driver's map data function only ever expect 1 segment?
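To illustrate the counting described above, here is a minimal user-space sketch, not the kernel code. The names `struct meta_vec` and `count_meta_segments` are hypothetical, and the model only looks at whether each chunk starts where the previous one ended; the kernel's real mergeability test and queue limits (segment size, boundary masks) are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one integrity bio_vec: a physical address range. */
struct meta_vec {
    uint64_t phys; /* physical start address of the chunk */
    uint32_t len;  /* length of the chunk in bytes */
};

/*
 * Count segments in an array of metadata chunks. A new segment starts at
 * the first chunk and whenever a chunk does not begin exactly where the
 * previous chunk ended (a simplified model of "not physically mergeable").
 */
unsigned int count_meta_segments(const struct meta_vec *v, size_t n)
{
    unsigned int segments = 0;
    size_t i;

    assert(v != NULL || n == 0);

    for (i = 0; i < n; i++) {
        if (i == 0 || v[i - 1].phys + v[i - 1].len != v[i].phys)
            segments++;
    }
    return segments;
}
```

In this model, two 512-byte chunks at 0x1000 and 0x1200 count as 1 segment, while chunks at 0x1000 and 0x8000 count as 2, which corresponds to the case where nvme_map_data bails out.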
Jeff Lien
Linux Device Driver Development
Device Host Apps and Drivers
jeff.lien at wdc.com
o: 507-322-2416 (ext. 23-2416)
m: 507-273-9124
* Block Integrity Rq Count Question
2017-12-08 22:04 Block Integrity Rq Count Question Jeffrey Lien
@ 2017-12-08 22:19 ` Keith Busch
2017-12-11 17:54 ` Jeffrey Lien
2017-12-12 8:39 ` Christoph Hellwig
1 sibling, 1 reply; 7+ messages in thread
From: Keith Busch @ 2017-12-08 22:19 UTC
On Fri, Dec 08, 2017 at 10:04:47PM +0000, Jeffrey Lien wrote:
> I've noticed an issue when trying to create an ext3/4 filesystem on
> nvme device with RHEL 7.3 and 7.4 and would like to understand how
> it's supposed to work or if there's a bug in the driver code.
>
> When doing the mkfs command on an nvme device using lbaf of 1 or 3 (ie
> using metadata), the call to blk_rq_count_integrity_sg in
> nvme_map_data returns 2 causing the nvme_map_data function to goto
> out_unmap and ultimately the request fails. In the
> blk_rq_count_integrity_sg function, the check "if
> (!BIOVEC_PHYS_MERGEABLE(ivprv, iv))" is true causing a 2nd segment to
> be added. This seems like it could happen regularly so my question is
> why the nvme driver map data function is only ever expecting 1
> segment?
The definition of the NVMe MPTR field says it has to be "a contiguous
physical buffer". It's not physically contiguous if you have two segments.
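A rough user-space sketch of that constraint, assuming MPTR is used as a plain pointer to one buffer as quoted above. `struct meta_seg` and `meta_to_mptr` are hypothetical names for illustration and do not exist in the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one physically contiguous metadata segment. */
struct meta_seg {
    uint64_t phys; /* physical start address */
    uint32_t len;  /* length in bytes */
};

/*
 * A single pointer field can describe exactly one contiguous buffer.
 * Returns 0 and stores the address in *mptr when there is exactly one
 * segment; returns -1 and leaves *mptr untouched otherwise.
 */
int meta_to_mptr(const struct meta_seg *segs, size_t nsegs, uint64_t *mptr)
{
    assert(mptr != NULL);

    if (segs == NULL || nsegs != 1)
        return -1;

    *mptr = segs[0].phys;
    return 0;
}
```

With two segments there is no single address that covers the whole payload, so the only options in this model are to fail the request or to make sure the metadata never arrives split in the first place.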
* Block Integrity Rq Count Question
2017-12-08 22:19 ` Keith Busch
@ 2017-12-11 17:54 ` Jeffrey Lien
2017-12-12 0:00 ` Keith Busch
0 siblings, 1 reply; 7+ messages in thread
From: Jeffrey Lien @ 2017-12-11 17:54 UTC
Keith,
Your comment below makes sense, but I still have a question. Where is the buffer behind the metadata pointer allocated? I can't find where that happens in the nvme driver, so does it happen in the block layer? And how do we control whether or not it is a single contiguous buffer?
Jeff Lien
-----Original Message-----
From: Keith Busch [mailto:keith.busch@intel.com]
Sent: Friday, December 8, 2017 4:20 PM
To: Jeffrey Lien
Cc: linux-nvme at lists.infradead.org; Christoph Hellwig; David Darrington
Subject: Re: Block Integrity Rq Count Question
The definition of the NVMe MPTR field says it has to be "a contiguous physical buffer". It's not physically contiguous if you've two segments.
* Block Integrity Rq Count Question
2017-12-11 17:54 ` Jeffrey Lien
@ 2017-12-12 0:00 ` Keith Busch
0 siblings, 0 replies; 7+ messages in thread
From: Keith Busch @ 2017-12-12 0:00 UTC
On Mon, Dec 11, 2017 at 05:54:35PM +0000, Jeffrey Lien wrote:
> Keith,
> Your comment below makes sense, but I still have a question. Where in the driver (or maybe it's block layer) is metadata pointer allocated? I can't find where that happens in the nvme driver so does this happen in the block layer? And how do we control whether or not it's 1 contiguous buffer or not?
The function 'nvme_init_integrity' sets up our metadata
profile and limits the metadata payload to a single segment with
'blk_queue_max_integrity_segments(disk->queue, 1)', so the fact
that you're getting multiple segments in your payload suggests some
inappropriate merging is going on in the block layer. That said,
blk_integrity_merge_{bio,rq} look like they are doing the right thing.
* Block Integrity Rq Count Question
2017-12-08 22:04 Block Integrity Rq Count Question Jeffrey Lien
2017-12-08 22:19 ` Keith Busch
@ 2017-12-12 8:39 ` Christoph Hellwig
2017-12-12 14:34 ` Jeffrey Lien
2017-12-12 21:17 ` Jeffrey Lien
1 sibling, 2 replies; 7+ messages in thread
From: Christoph Hellwig @ 2017-12-12 8:39 UTC
On Fri, Dec 08, 2017 at 10:04:47PM +0000, Jeffrey Lien wrote:
> I've noticed an issue when trying to create an ext3/4 filesystem on nvme device with RHEL 7.3 and 7.4 and would like to understand how it's supposed to work or if there's a bug in the driver code.
Can you reproduce this on Linux 4.14 / Linux 4.15-rc, please? The
code in bio_integrity_prep should allocate the right number of
bio_vec entries based on what the device asks for, and for NVMe that's
always 1 in Linux, as we don't support SGLs for the metadata transfer.
* Block Integrity Rq Count Question
2017-12-12 8:39 ` Christoph Hellwig
@ 2017-12-12 14:34 ` Jeffrey Lien
2017-12-12 21:17 ` Jeffrey Lien
1 sibling, 0 replies; 7+ messages in thread
From: Jeffrey Lien @ 2017-12-12 14:34 UTC
I'll see if I can reproduce this on 4.14 or 4.15-rc and let you know, hopefully later today.
Jeff Lien
-----Original Message-----
From: Christoph Hellwig [mailto:hch@lst.de]
Sent: Tuesday, December 12, 2017 2:40 AM
To: Jeffrey Lien
Cc: linux-nvme at lists.infradead.org; Christoph Hellwig; David Darrington; Martin K. Petersen
Subject: Re: Block Integrity Rq Count Question
On Fri, Dec 08, 2017 at 10:04:47PM +0000, Jeffrey Lien wrote:
> I've noticed an issue when trying to create an ext3/4 filesystem on nvme device with RHEL 7.3 and 7.4 and would like to understand how it's supposed to work or if there's a bug in the driver code.
Can you reproduce this on Linux 4.14 / Linux 4.15-rc, please? The
code in bio_integrity_prep should allocate the right number of bio_vec entries based on what the device asks for, and for NVMe that's always 1 in Linux as we don't support SGLs for the metadata transfer.
* Block Integrity Rq Count Question
2017-12-12 8:39 ` Christoph Hellwig
2017-12-12 14:34 ` Jeffrey Lien
@ 2017-12-12 21:17 ` Jeffrey Lien
1 sibling, 0 replies; 7+ messages in thread
From: Jeffrey Lien @ 2017-12-12 21:17 UTC
Christoph,
The problem is resolved in 4.15.0-rc1. I'm able to run the mkfs command, with the blk_rq_count_integrity_sg call in nvme_map_data returning 1 segment for all requests. Is there a specific block layer patch I can reference and recommend that Red Hat pull into their RHEL 7.3 and 7.4 releases?
Jeff Lien