* Block Integrity Rq Count Question
@ 2017-12-08 22:04 Jeffrey Lien
  2017-12-08 22:19 ` Keith Busch
  2017-12-12  8:39 ` Christoph Hellwig
  0 siblings, 2 replies; 7+ messages in thread
From: Jeffrey Lien @ 2017-12-08 22:04 UTC (permalink / raw)


I've noticed an issue when trying to create an ext3/4 filesystem on an nvme device with RHEL 7.3 and 7.4, and would like to understand how it's supposed to work or whether there's a bug in the driver code.

When running the mkfs command on an nvme device using an lbaf of 1 or 3 (i.e. using metadata), the call to blk_rq_count_integrity_sg in nvme_map_data returns 2, causing nvme_map_data to goto out_unmap and ultimately the request to fail.  In blk_rq_count_integrity_sg, the check "if (!BIOVEC_PHYS_MERGEABLE(ivprv, iv))" is true, causing a 2nd segment to be added.  This seems like it could happen regularly, so my question is: why does the nvme driver's map data function only ever expect 1 segment?
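
For context, the code path looks roughly like this (paraphrased from the 4.x-era drivers/nvme/host/pci.c and include/linux/bio.h, not the verbatim source, so variable names and surrounding details may differ from your tree):

        /* Integrity-segment check in nvme_map_data(): anything other than a
         * single metadata segment fails the request. */
        if (blk_integrity_rq(req)) {
                if (blk_rq_count_integrity_sg(q, req->bio) != 1)
                        goto out_unmap;

                sg_init_table(&iod->meta_sg, 1);
                if (blk_rq_map_integrity_sg(q, req->bio, &iod->meta_sg) != 1)
                        goto out_unmap;
        }

        /* The merge test that decides whether two integrity bio_vecs count as
         * one segment (older include/linux/bio.h, approximately): */
        #define BIOVEC_PHYS_MERGEABLE(vec1, vec2) \
                ((bvec_to_phys((vec1)) + (vec1)->bv_len) == bvec_to_phys((vec2)))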

Jeff Lien
Linux Device Driver Development
Device Host Apps and Drivers
jeff.lien at wdc.com
o: 507-322-2416 (ext. 23-2416)
m: 507-273-9124


* Block Integrity Rq Count Question
  2017-12-08 22:04 Block Integrity Rq Count Question Jeffrey Lien
@ 2017-12-08 22:19 ` Keith Busch
  2017-12-11 17:54   ` Jeffrey Lien
  2017-12-12  8:39 ` Christoph Hellwig
  1 sibling, 1 reply; 7+ messages in thread
From: Keith Busch @ 2017-12-08 22:19 UTC (permalink / raw)


On Fri, Dec 08, 2017 at 10:04:47PM +0000, Jeffrey Lien wrote:
> I've noticed an issue when trying to create an ext3/4 filesystem on
> nvme device with RHEL 7.3 and 7.4 and would like to understand how
> it's supposed to work or if there's a bug in the driver code.  
> 
> When doing the mkfs command on an nvme device using lbaf of 1 or 3 (ie
> using metadata), the call to blk_rq_count_integrity_sg in
> nvme_map_data returns 2 causing the nvme_map_data function to goto
> out_unmap and ultimately the request fails.  In the
> blk_rq_count_integrity_sg function, the check "if
> (!BIOVEC_PHYS_MERGEABLE(ivprv, iv))" is true causing a 2nd segment to
> be added.  This seems like it could happen regularly so my question is
> why the nvme driver map data function is only ever expecting 1
> segment?

The definition of the NVMe MPTR field says it has to be "a contiguous
physical buffer". It's not physically contiguous if you've two segments.
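
To illustrate why: in the I/O command the metadata pointer is a single 64-bit address, so there is nowhere to describe a second segment. A rough sketch of the command layout (cf. include/linux/nvme.h; field names are approximate and some fields are elided):

        struct nvme_rw_command {
                __u8                    opcode;
                __u8                    flags;
                __u16                   command_id;
                __le32                  nsid;
                __u64                   rsvd2;
                __le64                  metadata;  /* MPTR: one contiguous buffer */
                union nvme_data_ptr     dptr;      /* data pointer (PRPs or SGL) */
                __le64                  slba;
                __le16                  length;
                /* control, dsmgmt, reftag, apptag, appmask ... */
        };

        /* nvme_map_data() fills it from a single scatterlist entry: */
        cmnd->rw.metadata = cpu_to_le64(sg_dma_address(&iod->meta_sg));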


* Block Integrity Rq Count Question
  2017-12-08 22:19 ` Keith Busch
@ 2017-12-11 17:54   ` Jeffrey Lien
  2017-12-12  0:00     ` Keith Busch
  0 siblings, 1 reply; 7+ messages in thread
From: Jeffrey Lien @ 2017-12-11 17:54 UTC (permalink / raw)


Keith,
Your comment below makes sense, but I still have a question.  Where in the driver (or maybe it's the block layer) is the metadata pointer allocated?  I can't find where that happens in the nvme driver, so does it happen in the block layer?  And how do we control whether or not it's 1 contiguous buffer?


Jeff Lien

-----Original Message-----
From: Keith Busch [mailto:keith.busch@intel.com] 
Sent: Friday, December 8, 2017 4:20 PM
To: Jeffrey Lien
Cc: linux-nvme at lists.infradead.org; Christoph Hellwig; David Darrington
Subject: Re: Block Integrity Rq Count Question

On Fri, Dec 08, 2017 at 10:04:47PM +0000, Jeffrey Lien wrote:
> I've noticed an issue when trying to create an ext3/4 filesystem on 
> nvme device with RHEL 7.3 and 7.4 and would like to understand how 
> it's supposed to work or if there's a bug in the driver code.
> 
> When doing the mkfs command on an nvme device using lbaf of 1 or 3 (ie 
> using metadata), the call to blk_rq_count_integrity_sg in 
> nvme_map_data returns 2 causing the nvme_map_data function to goto 
> out_unmap and ultimately the request fails.  In the 
> blk_rq_count_integrity_sg function, the check "if 
> (!BIOVEC_PHYS_MERGEABLE(ivprv, iv))" is true causing a 2nd segment to 
> be added.  This seems like it could happen regularly so my question is 
> why the nvme driver map data function is only ever expecting 1 
> segment?

The definition of the NVMe MPTR field says it has to be "a contiguous physical buffer". It's not physically contiguous if you've two segments.


* Block Integrity Rq Count Question
  2017-12-11 17:54   ` Jeffrey Lien
@ 2017-12-12  0:00     ` Keith Busch
  0 siblings, 0 replies; 7+ messages in thread
From: Keith Busch @ 2017-12-12  0:00 UTC (permalink / raw)


On Mon, Dec 11, 2017 at 05:54:35PM +0000, Jeffrey Lien wrote:
> Keith,
> Your comment below makes sense, but I still have a question.   Where in the driver (or maybe it's block layer) is metadata pointer allocated?   I can't find where that happens in the nvme driver so does this happen in the block layer?   And how do we control whether or not it's 1 contiguous buffer or not?

The function 'nvme_init_integrity' sets up our metadata
profile and requests single metadata payload segments with
'blk_queue_max_integrity_segments(disk->queue, 1)', so the fact
that you're getting multiple segments in your payload suggests some
inappropriate merging is going on in the block layer. It looks like
blk_integrity_merge_{bio,rq} are doing the right thing.
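
Roughly, that setup looks like the following (a condensed sketch of nvme_init_integrity(); the exact signature and the T10 PI profile selection vary between kernel versions):

        static void nvme_init_integrity(struct nvme_ns *ns)
        {
                struct blk_integrity integrity = { };

                /* T10 PI profile selection based on ns->pi_type elided */
                integrity.tuple_size = ns->ms;  /* metadata bytes per logical block */
                blk_integrity_register(ns->disk, &integrity);

                /* the limit nvme_map_data() relies on: one integrity segment */
                blk_queue_max_integrity_segments(ns->disk->queue, 1);
        }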


* Block Integrity Rq Count Question
  2017-12-08 22:04 Block Integrity Rq Count Question Jeffrey Lien
  2017-12-08 22:19 ` Keith Busch
@ 2017-12-12  8:39 ` Christoph Hellwig
  2017-12-12 14:34   ` Jeffrey Lien
  2017-12-12 21:17   ` Jeffrey Lien
  1 sibling, 2 replies; 7+ messages in thread
From: Christoph Hellwig @ 2017-12-12  8:39 UTC (permalink / raw)


On Fri, Dec 08, 2017 at 10:04:47PM +0000, Jeffrey Lien wrote:
> I've noticed an issue when trying to create an ext3/4 filesystem on nvme device with RHEL 7.3 and 7.4 and would like to understand how it's supposed to work or if there's a bug in the driver code.  

Can you reproduce this on Linux 4.14 / Linux 4.15-rc, please?  The
code in bio_integrity_prep should allocate the right number of
bio_vec entries based on what the device asks for, and for NVMe that's
always 1 in Linux as we don't support SGLs for the metadata transfer.
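
For anyone following along, the mechanics are roughly as follows (a condensed sketch of bio_integrity_prep() in block/bio-integrity.c; error handling and the exact size arithmetic are elided, and names are approximate):

        bool bio_integrity_prep(struct bio *bio)
        {
                struct blk_integrity *bi = blk_get_integrity(bio->bi_disk);
                struct bio_integrity_payload *bip;
                unsigned int len, nr_pages;
                unsigned long start, end;
                void *buf;

                /* one virtually contiguous buffer for the whole PI payload */
                len = bio_integrity_bytes(bi, bio_sectors(bio));
                buf = kmalloc(len, GFP_NOIO);

                /* size the payload by how many pages that buffer spans */
                end = (((unsigned long) buf) + len + PAGE_SIZE - 1) >> PAGE_SHIFT;
                start = ((unsigned long) buf) >> PAGE_SHIFT;
                nr_pages = end - start;

                bip = bio_integrity_alloc(bio, GFP_NOIO, nr_pages);
                bip->bip_flags |= BIP_BLOCK_INTEGRITY;  /* buffer owned by block layer */

                /* attach the buffer page by page */
                while (len > 0) {
                        unsigned int bytes = min_t(unsigned int, len,
                                        PAGE_SIZE - offset_in_page(buf));

                        bio_integrity_add_page(bio, virt_to_page(buf), bytes,
                                        offset_in_page(buf));
                        buf += bytes;
                        len -= bytes;
                }

                /* PI generation for writes etc. elided */
                return true;
        }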

> 
> When doing the mkfs command on an nvme device using lbaf of 1 or 3 (ie using metadata), the call to blk_rq_count_integrity_sg in nvme_map_data returns 2 causing the nvme_map_data function to goto out_unmap and ultimately the request fails.  In the blk_rq_count_integrity_sg function, the check "if (!BIOVEC_PHYS_MERGEABLE(ivprv, iv))" is true causing a 2nd segment to be added.  This seems like it could happen regularly so my question is why the nvme driver map data function is only ever expecting 1 segment?
> 
> Jeff Lien
> Linux Device Driver Development
> Device Host Apps and Drivers
> jeff.lien at wdc.com
> o: 507-322-2416 (ext. 23-2416)
> m: 507-273-9124
> 
> 
> 
> 
> 
---end quoted text---


* Block Integrity Rq Count Question
  2017-12-12  8:39 ` Christoph Hellwig
@ 2017-12-12 14:34   ` Jeffrey Lien
  2017-12-12 21:17   ` Jeffrey Lien
  1 sibling, 0 replies; 7+ messages in thread
From: Jeffrey Lien @ 2017-12-12 14:34 UTC (permalink / raw)


I'll see if I can reproduce this on 4.14 or 4.15-rc and let you know - hopefully later today.


Jeff Lien

-----Original Message-----
From: Christoph Hellwig [mailto:hch@lst.de] 
Sent: Tuesday, December 12, 2017 2:40 AM
To: Jeffrey Lien
Cc: linux-nvme at lists.infradead.org; Christoph Hellwig; David Darrington; Martin K. Petersen
Subject: Re: Block Integrity Rq Count Question

On Fri, Dec 08, 2017 at 10:04:47PM +0000, Jeffrey Lien wrote:
> I've noticed an issue when trying to create an ext3/4 filesystem on nvme device with RHEL 7.3 and 7.4 and would like to understand how it's supposed to work or if there's a bug in the driver code.  

Can you reproduce this on Linux 4.14 / Linux 4.15-rc, please?  The
code in bio_integrity_prep should allocate the right number of bio_vec entries based on what the device asks for, and for NVMe that's always 1 in Linux as we don't support SGLs for the metadata transfer.

> 
> When doing the mkfs command on an nvme device using lbaf of 1 or 3 (ie using metadata), the call to blk_rq_count_integrity_sg in nvme_map_data returns 2 causing the nvme_map_data function to goto out_unmap and ultimately the request fails.  In the blk_rq_count_integrity_sg function, the check "if (!BIOVEC_PHYS_MERGEABLE(ivprv, iv))" is true causing a 2nd segment to be added.  This seems like it could happen regularly so my question is why the nvme driver map data function is only ever expecting 1 segment?
> 
> Jeff Lien
> Linux Device Driver Development
> Device Host Apps and Drivers
> jeff.lien at wdc.com
> o: 507-322-2416 (ext. 23-2416)
> m: 507-273-9124
> 
> 
> 
> 
> 
---end quoted text---


* Block Integrity Rq Count Question
  2017-12-12  8:39 ` Christoph Hellwig
  2017-12-12 14:34   ` Jeffrey Lien
@ 2017-12-12 21:17   ` Jeffrey Lien
  1 sibling, 0 replies; 7+ messages in thread
From: Jeffrey Lien @ 2017-12-12 21:17 UTC (permalink / raw)


Christoph,
The problem is resolved in 4.15.0-rc1.  I'm able to run the mkfs command with nvme_map_data returning 1 segment for all requests.  Is there a specific block layer patch I can reference and recommend that Red Hat pull into their 7.3 and 7.4 releases?


Jeff Lien

-----Original Message-----
From: Christoph Hellwig [mailto:hch@lst.de] 
Sent: Tuesday, December 12, 2017 2:40 AM
To: Jeffrey Lien
Cc: linux-nvme at lists.infradead.org; Christoph Hellwig; David Darrington; Martin K. Petersen
Subject: Re: Block Integrity Rq Count Question

On Fri, Dec 08, 2017 at 10:04:47PM +0000, Jeffrey Lien wrote:
> I've noticed an issue when trying to create an ext3/4 filesystem on nvme device with RHEL 7.3 and 7.4 and would like to understand how it's supposed to work or if there's a bug in the driver code.  

Can you reproduce this on Linux 4.14 / Linux 4.15-rc, please?  The
code in bio_integrity_prep should allocate the right number of bio_vec entries based on what the device asks for, and for NVMe that's always 1 in Linux as we don't support SGLs for the metadata transfer.

> 
> When doing the mkfs command on an nvme device using lbaf of 1 or 3 (ie using metadata), the call to blk_rq_count_integrity_sg in nvme_map_data returns 2 causing the nvme_map_data function to goto out_unmap and ultimately the request fails.  In the blk_rq_count_integrity_sg function, the check "if (!BIOVEC_PHYS_MERGEABLE(ivprv, iv))" is true causing a 2nd segment to be added.  This seems like it could happen regularly so my question is why the nvme driver map data function is only ever expecting 1 segment?
> 
> Jeff Lien
> Linux Device Driver Development
> Device Host Apps and Drivers
> jeff.lien at wdc.com
> o: 507-322-2416 (ext. 23-2416)
> m: 507-273-9124
> 
> 
> 
> 
> 
---end quoted text---


Thread overview: 7+ messages
2017-12-08 22:04 Block Integrity Rq Count Question Jeffrey Lien
2017-12-08 22:19 ` Keith Busch
2017-12-11 17:54   ` Jeffrey Lien
2017-12-12  0:00     ` Keith Busch
2017-12-12  8:39 ` Christoph Hellwig
2017-12-12 14:34   ` Jeffrey Lien
2017-12-12 21:17   ` Jeffrey Lien
