* Detecting whether huge pages are working with fsdax
@ 2022-11-09 21:59 Eliot Moss
  2022-11-14 15:09 ` Detecting whether huge pages are working with fsdax - only partial success Eliot Moss
  0 siblings, 1 reply; 4+ messages in thread
From: Eliot Moss @ 2022-11-09 21:59 UTC (permalink / raw)
  To: nvdimm

Dear nvdimmers -

I tried following Darrick Wong's advice from this page:

https://nvdimm.wiki.kernel.org/2mib_fs_dax

In particular, these instructions:
================================================================================
The way that I normally do this is by looking at the filesystem DAX tracepoints:

# cd /sys/kernel/debug/tracing
# echo 1 > events/fs_dax/dax_pmd_fault_done/enable
<run test which faults in filesystem DAX mappings>
We can then look at the dax_pmd_fault_done events in /sys/kernel/debug/tracing/trace and see
whether they were successful. An event that successfully faulted in a filesystem DAX PMD
looks like this:

big-1434  [008] ....  1502.341229: dax_pmd_fault_done: dev 259:0 ino 0xc shared
WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10505000 vm_start 0x10200000 vm_end
0x10700000 pgoff 0x305 max_pgoff 0x1400 NOPAGE
The first thing to look at is the NOPAGE return value at the end of the line. This means that the
fault succeeded and didn't return a page cache page, which is expected for DAX. A 2 MiB fault that
failed and fell back to 4 KiB DAX faults will instead look like this:

small-1431  [008] ....  1499.402672: dax_pmd_fault_done: dev 259:0 ino 0xc shared
WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10420000 vm_start 0x10200000 vm_end
0x10500000 pgoff 0x220 max_pgoff 0x3ffff FALLBACK
You can see that this fault resulted in a fallback to 4 KiB faults via the FALLBACK return code at
the end of the line. The rest of the data in this line can help you determine why the fallback
happened. In this case it was because I intentionally created an mmap() area that was smaller than 2
MiB.
================================================================================
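
For concreteness, my test program is roughly of the shape below (the path and
size are placeholders, not my exact setup, and the #define fallbacks are the
generic Linux values in case the libc headers predate MAP_SYNC):

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MAP_SYNC
#define MAP_SYNC 0x80000                /* asm-generic value */
#endif
#ifndef MAP_SHARED_VALIDATE
#define MAP_SHARED_VALIDATE 0x03
#endif

#define LEN (64UL << 20)                /* 64 MiB, a multiple of 2 MiB */

int main(void)
{
        /* placeholder path: a file on an fsdax-mounted filesystem */
        int fd = open("/mnt/pmem/testfile", O_CREAT | O_RDWR, 0644);
        if (fd < 0 || ftruncate(fd, LEN) < 0) {
                perror("open/ftruncate");
                return 1;
        }

        /* MAP_SYNC is only accepted together with MAP_SHARED_VALIDATE */
        char *p = mmap(NULL, LEN, PROT_READ | PROT_WRITE,
                       MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);
        if (p == MAP_FAILED) {
                perror("mmap");
                return 1;
        }

        /* touch one byte per 2 MiB so every PMD-sized extent faults */
        for (size_t off = 0; off < LEN; off += 2UL << 20)
                p[off] = 1;

        munmap(p, LEN);
        close(fd);
        return 0;
}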

I get no trace output whatsoever, whether I am using 2 MiB or 1 GiB huge
pages.  My mmap calls are successful, but the trace buffer contains only
this:

================================================================================
# tracer: nop
#
# entries-in-buffer/entries-written: 0/0   #P:63
#
#                                _-----=> irqs-off
#                               / _----=> need-resched
#                              | / _---=> hardirq/softirq
#                              || / _--=> preempt-depth
#                              ||| / _-=> migrate-disable
#                              |||| /     delay
#           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
#              | |         |   |||||     |         |
================================================================================

Any suggestions about what may be different in my system?  It is clear that we
are mapping files created in an fsdax file system, and that the contents of the
files are changing.

Regards - Eliot Moss

* Re: Detecting whether huge pages are working with fsdax - only partial success
  2022-11-09 21:59 Detecting whether huge pages are working with fsdax Eliot Moss
@ 2022-11-14 15:09 ` Eliot Moss
  2022-11-14 19:16   ` Eliot Moss
  0 siblings, 1 reply; 4+ messages in thread
From: Eliot Moss @ 2022-11-14 15:09 UTC (permalink / raw)
  To: nvdimm

On 11/9/2022 4:59 PM, Eliot Moss wrote:
> Dear nvdimmers -
> 
> I tried following Darrick Wong's advice from this page:
> 
> https://nvdimm.wiki.kernel.org/2mib_fs_dax

With more knowledge and some fiddling, I am getting huge pages to work ...
sometimes.

But first, is this the right place to ask questions?  Folks don't seem to
respond.  If this is not the place, perhaps someone could point me to
better places?  Thanks!

Meanwhile, I now see a *mix* of FALLBACK and NOPAGE trace records when I
map 32G from a dax file.  I map with:

MAP_SYNC
MAP_SHARED
MAP_SHARED_VALIDATE

not MAP_FIXED

Also, MAP_LOCKED and MAP_POPULATE do not seem to change the behavior (a
sketch of the call as described is just below).
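
For concreteness, the call is roughly of this shape (fd and len stand in for
my actual file descriptor and 32G length; the #define fallbacks are the
generic Linux values):

#define _GNU_SOURCE
#include <sys/mman.h>

#ifndef MAP_SYNC
#define MAP_SYNC 0x80000
#endif
#ifndef MAP_SHARED_VALIDATE
#define MAP_SHARED_VALIDATE 0x03
#endif

/* sketch only; MAP_SYNC is only accepted together with MAP_SHARED_VALIDATE,
   and MAP_SHARED_VALIDATE already implies the MAP_SHARED semantics */
static void *map_dax_region(int fd, size_t len)
{
        return mmap(NULL, len, PROT_READ | PROT_WRITE,
                    MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);
}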

AFAICT, everything has proper 2M alignment: /proc/iomem shows it, the
partition starts at sector 4096 (512-byte sectors), ndctl uses 1G
alignment, and xfs uses 2M.
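
Put another way (my own summary of the userspace-visible requirements, not
something from the wiki), what I believe a 2M fault needs from the mapping
itself is roughly:

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

#define PMD_SIZE (2UL << 20)    /* 2 MiB */

/* illustrative check: the virtual address and the file offset should be
   congruent modulo 2M (so a 2M-aligned vaddr maps a 2M-aligned file
   offset), and at least one full 2M of the mapping must be available */
static bool pmd_fault_plausible(const void *map_base, off_t file_off, size_t map_len)
{
        return ((uintptr_t)map_base % PMD_SIZE) == ((uintmax_t)file_off % PMD_SIZE) &&
               map_len >= PMD_SIZE;
}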

What I see is ~11,000 FALLBACK trace records and ~11,000 NOPAGE records,
with the FALLBACKs coming first and the NOPAGEs after.  My little app then
touches cache lines sequentially through the 32G region (sketched below).
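
The access pattern is roughly this (sketch, not my exact code):

#include <stddef.h>

#define CACHELINE 64

/* one write per cache line, walking sequentially through the mapped region */
static void touch_sequential(volatile unsigned char *base, size_t len)
{
        for (size_t off = 0; off < len; off += CACHELINE)
                base[off] = (unsigned char)off;
}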

I am happy to provide more details, but did not want to create a long
message if this is not the place :-) ...

Also, can fs_dax do 1G huge pages?  If so, how do I enable that?  The same
approach to alignment, etc., does not seem to make it happen.

Regards - Eliot Moss

* Re: Detecting whether huge pages are working with fsdax - only partial success
  2022-11-14 15:09 ` Detecting whether huge pages are working with fsdax - only partial success Eliot Moss
@ 2022-11-14 19:16   ` Eliot Moss
  2022-11-30  4:25     ` Dan Williams
  0 siblings, 1 reply; 4+ messages in thread
From: Eliot Moss @ 2022-11-14 19:16 UTC (permalink / raw)
  To: nvdimm

A quick followup, which may indicate a flaw in fs_dax page mapping code.

When doing that mapping of 32G, for each group of 8G, all but the last 2M
resulted in a NOPAGE 2M fault.  The very last 2M chunk of each 8G region
resulted in FALLBACK.

Then, a spawned thread accessed the same region sequentially.  This caused
all of the upper 16G to result in FALLBACK (except those two 2M regions that
had already resulted in FALLBACK).

The first case "smells" like some kind of range error in the code.

The second one is also curiously regular, but I have less of a theory
about it.

Is this the right place for discussion of this behavior and possible patches?

Regards - Eliot Moss

* Re: Detecting whether huge pages are working with fsdax - only partial success
  2022-11-14 19:16   ` Eliot Moss
@ 2022-11-30  4:25     ` Dan Williams
  0 siblings, 0 replies; 4+ messages in thread
From: Dan Williams @ 2022-11-30  4:25 UTC (permalink / raw)
  To: Eliot Moss, nvdimm

Eliot Moss wrote:
> A quick followup, which may indicate a flaw in fs_dax page mapping code.
> 
> When doing that mapping of 32G, for each group of 8G, all but the last 2M
> resulted in a NOPAGE 2M fault.  The very last 2M chunk of each 8G region
> resulted in FALLBACK.
> 
> Then, a spawned thread accessed the same region sequentially.  This caused
> all of the upper 16G to result in FALLBACK (except those two 2M regions that
> had already resulted in FALLBACK).
> 
> The first case "smells" like some kind of range error in the code.
> 
> The second one is also curiously regular, but I have less of a theory
> about it.
> 
> Is this the right place for discussion of this behavior and possible patches?

This is a good place to reach people who can poke at this, but we may need
to pull in the fsdevel folks if this gets into a question about the layout
behaviour of a specific fs.

Even the nvdimm unit tests have gone through changes where different
kernels result in different file block allocation behaviour. For example:

https://lkml.kernel.org/r/CAPcyv4g2U6YYj6BO_nMgUYPfE2d04pZvKP0JQwNAMy9HZ3UNvg@mail.gmail.com

So my hesitation to jump in on this stems from it usually being
something outside of what fs/dax.c can control.
