* Weirdness with discard cmd and get log pages
@ 2016-10-13 15:46 Nisha Miller
  2016-10-13 16:15 ` Keith Busch
  0 siblings, 1 reply; 9+ messages in thread
From: Nisha Miller @ 2016-10-13 15:46 UTC (permalink / raw)


Hi,

We are running the stock nvme driver on CentOS 7.2 with kernel 3.19.8,
and we are seeing some weirdness with get log pages and the discard
command. In the driver, both of these calls set up an SG list for DMA.

In the case of the discard command, we are able to fetch the first two
ranges (i.e., 16 bytes/range x 2) correctly using DMA, but any ranges
after that contain junk data. For example, if we try to fetch 4
ranges, ranges 1-2 are valid but ranges 3-4 contain junk.
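
For reference, each DSM range descriptor in that buffer is 16 bytes, laid
out per the NVMe spec. A minimal C view of the layout (field names here
are illustrative, not taken from the driver):

  #include <stdint.h>

  /* One Dataset Management (DSM) range descriptor, per the NVMe spec:
   * 16 bytes, packed back to back in the command's data buffer. */
  struct dsm_range {
          uint32_t cattr; /* context attributes */
          uint32_t nlb;   /* length in logical blocks */
          uint64_t slba;  /* starting LBA */
  };

  /* Four ranges therefore occupy 64 contiguous bytes. */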

In the case of get log pages, it's even worse. Much of the data we DMA
to the driver/host is junk, and what is junk changes based on how much
data is DMA'd.

Note that DMA for read/write works perfectly.

In our FW, when we fetch data using DMA, we always fetch in multiples
of 32 bytes due to alignment constraints.

Is there something we are missing here that could be causing these problems?

TIA
Nisha Miller

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Weirdness with discard cmd and get log pages
  2016-10-13 15:46 Weirdness with discard cmd and get log pages Nisha Miller
@ 2016-10-13 16:15 ` Keith Busch
  2016-10-13 18:18   ` Nisha Miller
  0 siblings, 1 reply; 9+ messages in thread
From: Keith Busch @ 2016-10-13 16:15 UTC (permalink / raw)


On Thu, Oct 13, 2016 at 08:46:30AM -0700, Nisha Miller wrote:
> In the case of the discard command, we are able to fetch the first two
> ranges (i.e., 16 bytes/range x 2) correctly using DMA, but any ranges
> after that contain junk data. For example, if we try to fetch 4
> ranges, ranges 1-2 are valid but ranges 3-4 contain junk.

How did you set up a discard command with more than one range? The
standard discard path to the driver allows only one.
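
For context, the spec-level encoding that in-kernel path produces is a
one-range Dataset Management command: the 0's-based NR field in CDW10 is
0 and the Deallocate attribute is set in CDW11. A paraphrased sketch of
those fields (not verbatim driver code):

  #include <stdint.h>

  /* Single-range DSM command fields at the spec level (paraphrased).
   * CDW10 bits 7:0 hold the 0's-based number of ranges; CDW11 bit 2 is
   * the Deallocate (AD) attribute used for discard. */
  enum {
          NVME_DSM_OPCODE = 0x09,
          NVME_DSM_AD     = 1u << 2,
  };

  struct dsm_cmd_fields {
          uint8_t  opcode; /* 0x09, Dataset Management */
          uint32_t cdw10;  /* number of ranges - 1, so 0 for one range */
          uint32_t cdw11;  /* NVME_DSM_AD for a discard */
  };

  static const struct dsm_cmd_fields one_range_discard = {
          .opcode = NVME_DSM_OPCODE,
          .cdw10  = 0,
          .cdw11  = NVME_DSM_AD,
  };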

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Weirdness with discard cmd and get log pages
  2016-10-13 16:15 ` Keith Busch
@ 2016-10-13 18:18   ` Nisha Miller
  2016-10-13 23:18     ` Keith Busch
  0 siblings, 1 reply; 9+ messages in thread
From: Nisha Miller @ 2016-10-13 18:18 UTC (permalink / raw)


On Thu, Oct 13, 2016 at 9:15 AM, Keith Busch <keith.busch@intel.com> wrote:
> On Thu, Oct 13, 2016 at 08:46:30AM -0700, Nisha Miller wrote:
>> In the case of the discard command, we are able to fetch the first two
>> ranges (i.e., 16 bytes/range x 2) correctly using DMA, but any ranges
>> after that contain junk data. For example, if we try to fetch 4
>> ranges, ranges 1-2 are valid but ranges 3-4 contain junk.
>
> How did you set up a discard command with more than one range? The
> standard discard path to the driver allows only one.

Yes, that is what I noticed too. I used the nvme-cli command like this:

nvme dsm /dev/nvme0n1 -a 0,0,0,0 --blocks=4,5,6,7 --slbs=100,200,300,400 --ad

This turns up as nvme_user_cmd in the driver, which calls
nvme_map_user_pages to set up the SG list.
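
For concreteness, here is roughly what that invocation reduces to at the
ioctl level: build four 16-byte DSM ranges in user memory and hand them
to the driver through the NVME_IOCTL_IO_CMD passthru interface. This is
a sketch rather than nvme-cli's actual source; the nsid is assumed to be
1, the header location varies by kernel (<linux/nvme.h> on 3.x kernels,
<linux/nvme_ioctl.h> on newer ones), and error handling is trimmed:

  #include <fcntl.h>
  #include <stdint.h>
  #include <stdio.h>
  #include <string.h>
  #include <sys/ioctl.h>
  #include <linux/nvme_ioctl.h>   /* <linux/nvme.h> on 3.x kernels */

  struct dsm_range {              /* 16 bytes each, per the NVMe spec */
          uint32_t cattr;
          uint32_t nlb;
          uint64_t slba;
  };

  int main(void)
  {
          /* Same ranges as the nvme-cli example above. */
          struct dsm_range ranges[4] = {
                  { 0, 4, 100 }, { 0, 5, 200 }, { 0, 6, 300 }, { 0, 7, 400 },
          };
          struct nvme_passthru_cmd cmd;
          int fd = open("/dev/nvme0n1", O_RDONLY);

          if (fd < 0) {
                  perror("open");
                  return 1;
          }

          memset(&cmd, 0, sizeof(cmd));
          cmd.opcode   = 0x09;                        /* Dataset Management */
          cmd.nsid     = 1;                           /* assumed namespace */
          cmd.addr     = (uint64_t)(uintptr_t)ranges; /* user buffer the driver maps */
          cmd.data_len = sizeof(ranges);              /* 64 bytes */
          cmd.cdw10    = 4 - 1;                       /* NR: number of ranges - 1 */
          cmd.cdw11    = 1u << 2;                     /* AD: deallocate */

          if (ioctl(fd, NVME_IOCTL_IO_CMD, &cmd) < 0)
                  perror("NVME_IOCTL_IO_CMD");
          return 0;
  }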

Nisha

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Weirdness with discard cmd and get log pages
  2016-10-13 18:18   ` Nisha Miller
@ 2016-10-13 23:18     ` Keith Busch
  2016-10-14 17:44       ` Keith Busch
  0 siblings, 1 reply; 9+ messages in thread
From: Keith Busch @ 2016-10-13 23:18 UTC (permalink / raw)


On Thu, Oct 13, 2016 at 11:18:43AM -0700, Nisha Miller wrote:
> Yes, that is what I noticed too. I used the nvme-cli command like this:
> 
> nvme dsm /dev/nvme0n1 -a 0,0,0,0 --blocks=4,5,6,7 --slbs=100,200,300,400 --ad
> 
> This turns up as nvme_user_cmd in the driver, which calls
> nvme_map_user_pages to set up the SG list.

Okay, that's what I use too. I'm not observing any issues on a 4.8 kernel
or back to 4.4 either. I've not tested 3.19 though, and the mechanism
it uses to map user buffers is completely different. Could you verify if
your observation exists in a more current stable release?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Weirdness with discard cmd and get log pages
  2016-10-13 23:18     ` Keith Busch
@ 2016-10-14 17:44       ` Keith Busch
  2016-10-14 21:33         ` Nisha Miller
  0 siblings, 1 reply; 9+ messages in thread
From: Keith Busch @ 2016-10-14 17:44 UTC (permalink / raw)


On Thu, Oct 13, 2016 at 07:18:09PM -0400, Keith Busch wrote:
> On Thu, Oct 13, 2016 at 11:18:43AM -0700, Nisha Miller wrote:
> > Yes, that is what I noticed too. I used the nvme-cli command like this:
> > 
> > nvme dsm /dev/nvme0n1 -a 0,0,0,0 --blocks=4,5,6,7 --slbs=100,200,300,400 --ad
> > 
> > This turns up as nvme_user_cmd in the driver, which calls
> > nvme_map_user_pages to set up the SG list.
> 
> Okay, that's what I use too. I'm not observing any issues on a 4.8 kernel
> or back to 4.4 either. I've not tested 3.19 though, and the mechanism
> it uses to map user buffers is completely different. Could you verify if
> your observation exists in a more current stable release?

Just for reference, this is how I've verified 64 ranges. My device
deterministically returns 0 on any deallocated block, and is formatted
with 512b LBAs. 

  # create a random 1MB file
  dd if=/dev/urandom of=~/rand.1M.in bs=1M count=1

  # write it to the device
  dd if=~/rand.1M.in of=/dev/nvme0n1 oflag=direct

  # read it back out
  dd if=/dev/nvme0n1 of=~/rand.1M.out bs=1M count=1 iflag=direct

  # compare the two to verify they're the same
  diff ~/rand.1M.in ~/rand.1M.out

  # write a bunch of 0-filled 8k holes in the original file
  for i in $(seq 0 2 127); do dd if=/dev/zero of=~/rand.1M.in bs=8k seek=$i conv=notrunc count=1 2> /dev/null; done

  # deallocate the exact same ranges as the file's new 0-filled holes 
  nvme dsm /dev/nvme0n1 -d --slbs=`seq 0 32 2016 | tr "\n" "," | sed "s/,$//g"` --blocks=`printf "16,%0.s" {0..63} | sed "s/,$//g"`

  # read the file from the device
  dd if=/dev/nvme0n1 of=~/rand.1M.out bs=1M count=1 iflag=direct

  # verify the contents are still the same
  diff ~/rand.1M.in ~/rand.1M.out

Works for me.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Weirdness with discard cmd and get log pages
  2016-10-14 17:44       ` Keith Busch
@ 2016-10-14 21:33         ` Nisha Miller
  2016-10-14 23:19           ` Keith Busch
  0 siblings, 1 reply; 9+ messages in thread
From: Nisha Miller @ 2016-10-14 21:33 UTC (permalink / raw)


Hi Keith,

Thank you for the test case. I'll try it out.

My problem is that I need to make this work on kernel 2.6.33. For now,
since blkdiscard and fstrim send only one range, I'm OK with the current
discard support.

However, I need to fix the issue for get log pages. Are there any known
issues with mapping user buffers in older linux-nvme drivers?

thanks
Nisha


On Fri, Oct 14, 2016 at 10:44 AM, Keith Busch <keith.busch@intel.com> wrote:
> On Thu, Oct 13, 2016 at 07:18:09PM -0400, Keith Busch wrote:
>> On Thu, Oct 13, 2016 at 11:18:43AM -0700, Nisha Miller wrote:
>> > Yes, that is what I noticed too. I used the nvme-cli command like this:
>> >
>> > nvme dsm /dev/nvme0n1 -a 0,0,0,0 --blocks=4,5,6,7 --slbs=100,200,300,400 --ad
>> >
>> > This turns up as nvme_user_cmd in the driver, which calls
>> > nvme_map_user_pages to set up the SG list.
>>
>> Okay, that's what I use too. I'm not observing any issues on a 4.8 kernel
>> or back to 4.4 either. I've not tested 3.19 though, and the mechanism
>> it uses to map user buffers is completely different. Could you verify if
>> your observation exists in a more current stable release?
>
> Just for reference, this is how I've verified 64 ranges. My device
> deterministically returns 0 on any deallocated block, and is formatted
> with 512b LBAs.
>
>   # create a random 1MB file
>   dd if=/dev/urandom of=~/rand.1M.in bs=1M count=1
>
>   # write it to the device
>   dd if=~/rand.1M.in of=/dev/nvme0n1 oflag=direct
>
>   # read it back out
>   dd if=/dev/nvme0n1 of=~/rand.1M.out bs=1M count=1 iflag=direct
>
>   # compare the two to verify they're the same
>   diff ~/rand.1M.in ~/rand.1M.out
>
>   # write a bunch of 0-filled 8k holes in the original file
>   for i in $(seq 0 2 127); do dd if=/dev/zero of=~/rand.1M.in bs=8k seek=$i conv=notrunc count=1 2> /dev/null; done
>
>   # deallocate the exact same ranges as the file's new 0-filled holes
>   nvme dsm /dev/nvme0n1 -d --slbs=`seq 0 32 2016 | tr "\n" "," | sed "s/,$//g"` --blocks=`printf "16,%0.s" {0..63} | sed "s/,$//g"`
>
>   # read the file from the device
>   dd if=/dev/nvme0n1 of=~/rand.1M.out bs=1M count=1 iflag=direct
>
>   # verify the contents are still the same
>   diff ~/rand.1M.in ~/rand.1M.out
>
> Works for me.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Weirdness with discard cmd and get log pages
  2016-10-14 21:33         ` Nisha Miller
@ 2016-10-14 23:19           ` Keith Busch
  2016-10-18 16:55             ` Nisha Miller
  0 siblings, 1 reply; 9+ messages in thread
From: Keith Busch @ 2016-10-14 23:19 UTC (permalink / raw)


On Fri, Oct 14, 2016 at 02:33:49PM -0700, Nisha Miller wrote:
> My problem is that I need to make this work on kernel 2.6.33. For now,
> since blkdiscard and fstrim send only one range, I'm OK with the current
> discard support.
> 
> However, I need to fix the issue for get log pages. Are there any known
> issues with mapping user buffers in older linux-nvme drivers?

I'm pretty sure it works. I tested on RHEL 6.6 (a 2.6.32 fork), and
I can read all the log pages without issue.

The nvme-cli doesn't align every buffer on a page boundary, so many of
these passthru commands require PRP2. Is your device handling PRP2
correctly on non-read/write commands?
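
To make that concrete, a small sketch of the rule, assuming a 4 KiB
device page size: PRP1 carries the buffer's byte offset within its page,
so PRP2 is needed whenever the transfer doesn't fit in the remainder of
that first page, even when the total length is under 4K.

  #include <stdbool.h>
  #include <stdint.h>
  #include <stdio.h>

  #define NVME_PAGE_SIZE 4096u    /* assumed memory page size (MPS) */

  /* PRP1 may point anywhere within a page; only (page size - offset)
   * bytes can be transferred behind it, so anything longer needs PRP2. */
  static bool needs_prp2(uint64_t prp1, uint32_t xfer_len)
  {
          uint32_t offset        = prp1 & (NVME_PAGE_SIZE - 1);
          uint32_t room_in_page1 = NVME_PAGE_SIZE - offset;

          return xfer_len > room_in_page1;
  }

  int main(void)
  {
          /* A 512-byte log page into a buffer at offset 0xF00: only 0x100
           * bytes remain in PRP1's page, so PRP2 must cover the rest. */
          printf("%d\n", needs_prp2(0x1F00, 512)); /* prints 1 */

          /* The same 512 bytes at offset 0 fit entirely behind PRP1. */
          printf("%d\n", needs_prp2(0x2000, 512)); /* prints 0 */
          return 0;
  }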

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Weirdness with discard cmd and get log pages
  2016-10-14 23:19           ` Keith Busch
@ 2016-10-18 16:55             ` Nisha Miller
  2016-10-18 17:08               ` Keith Busch
  0 siblings, 1 reply; 9+ messages in thread
From: Nisha Miller @ 2016-10-18 16:55 UTC (permalink / raw)


On Fri, Oct 14, 2016 at 4:19 PM, Keith Busch <keith.busch@intel.com> wrote:
>
> The nvme-cli doesn't align every buffer on a page boundary, so many of
> these passthru commands require PRP2. Is your device handling PRP2
> correctly on non-read/write commands?

Hi Keith,

Since the data transfer size for get log pages and DSM is less than 4K,
won't PRP2 be invalid?

thanks
Nisha

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Weirdness with discard cmd and get log pages
  2016-10-18 16:55             ` Nisha Miller
@ 2016-10-18 17:08               ` Keith Busch
  0 siblings, 0 replies; 9+ messages in thread
From: Keith Busch @ 2016-10-18 17:08 UTC (permalink / raw)


On Tue, Oct 18, 2016 at 09:55:20AM -0700, Nisha Miller wrote:
> Since the data transfer size for get log pages and DSM is less than 4K,
> won't PRP2 be invalid?

No, that would only be true if PRP1's offset was 0, and you're not
guaranteed that.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread

Thread overview: 9+ messages
2016-10-13 15:46 Weirdness with discard cmd and get log pages Nisha Miller
2016-10-13 16:15 ` Keith Busch
2016-10-13 18:18   ` Nisha Miller
2016-10-13 23:18     ` Keith Busch
2016-10-14 17:44       ` Keith Busch
2016-10-14 21:33         ` Nisha Miller
2016-10-14 23:19           ` Keith Busch
2016-10-18 16:55             ` Nisha Miller
2016-10-18 17:08               ` Keith Busch
