linux-mm.kvack.org archive mirror
* nvdimm,pmem: makedumpfile: __vtop4_x86_64: Can't get a valid pte.
@ 2022-11-28 12:04 lizhijian
       [not found] ` <103666d5-3dcf-074c-0057-76b865f012a6@cs.umass.edu>
  2022-11-30 20:05 ` Dan Williams
  0 siblings, 2 replies; 7+ messages in thread
From: lizhijian @ 2022-11-28 12:04 UTC (permalink / raw)
  To: kexec, linux-mm, nvdimm; +Cc: dan.j.williams

Hi folks,

I'm working on making crash coredump (kdump) support pmem regions, so I
have modified kexec-tools to add the pmem region to the PT_LOAD segments of vmcore.

However, it fails in makedumpfile; the log is included below.

In my environment, I found that the last 512 pages of the pmem region trigger the error.
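
As a rough sanity check (illustrative only, assuming readelf is available in the
kdump initramfs), the pmem range reported by the first kernel can be compared
with the PT_LOAD segments that makedumpfile will actually read:

  # first kernel: physical range claimed by pmem
  grep "Persistent Memory" /proc/iomem
  # kdump kernel: program headers of the ELF vmcore
  readelf -l /proc/vmcore | grep -A1 LOAD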

qemu commandline:
  -object memory-backend-file,id=memnvdimm0,prealloc=yes,mem-path=/root/qemu-dax.img,share=yes,size=4267704320,align=2097152 \
  -device nvdimm,node=0,label-size=4194304,memdev=memnvdimm0,id=nvdimm0,slot=0

ndctl info:
[root@rdma-server ~]# ndctl list
[
   {
     "dev":"namespace0.0",
     "mode":"devdax",
     "map":"dev",
     "size":4127195136,
     "uuid":"f6fc1e86-ac5b-48d8-9cda-4888a33158f9",
     "chardev":"dax0.0",
     "align":4096
   }
]
[root@rdma-server ~]# ndctl list -iRD
{
   "dimms":[
     {
       "dev":"nmem0",
       "id":"8680-56341200",
       "handle":1,
       "phys_id":0
     }
   ],
   "regions":[
     {
       "dev":"region0",
       "size":4263510016,
       "align":16777216,
       "available_size":0,
       "max_available_extent":0,
       "type":"pmem",
       "iset_id":10248187106440278,
       "mappings":[
         {
           "dimm":"nmem0",
           "offset":0,
           "length":4263510016,
           "position":0
         }
       ],
       "persistence_domain":"unknown"
     }
   ]
}

iomem info:
[root@rdma-server ~]# cat /proc/iomem  | grep Persi
140000000-23e1fffff : Persistent Memory

makedumpfile info:
[   57.229110] kdump.sh[240]: mem_map[  71] ffffea0008e00000           238000           23e200


First, my observations so far:
1) makedumpfile reads the whole iomem range (the same range as the pmem PT_LOAD segment).
2) The first kernel only sets up mem_map (vmemmap) for this namespace, whose size is 512 pages smaller than the iomem range for some reason.
3) Since the nvdimm region has an alignment (16 MiB above), I also guess that the maximum pmem size usable by a user should
be ALIGN_DOWN(iomem end, 16 MiB); after this alignment, the last 512 pages are dropped, and the kernel then only sets up vmemmap for the
aligned range. But I didn't find any code doing this on the kernel side (see the quick calculation below).
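
To make the numbers concrete, a quick back-of-the-envelope check (assuming 4 KiB
pages, a 64-byte struct page, and the vmemmap base ffffea0000000000 shown in the
mem_map table below):

  # makedumpfile fails to read vmemmap address ffffea0008f80000; its offset
  # from the vmemmap base is 0x8f80000 bytes, i.e. the struct page of pfn:
  $ printf '0x%x\n' $(( 0x8f80000 / 64 ))
  0x23e000
  # end pfn of the iomem range 140000000-23e1fffff:
  $ printf '0x%x\n' $(( (0x23e1fffff + 1) / 4096 ))
  0x23e200
  # so the unreadable struct pages cover exactly the last:
  $ echo $(( 0x23e200 - 0x23e000 ))
  512
  # ...pages, which is also what aligning the end down to 16 MiB would drop:
  $ echo $(( (0x23e200000 % 0x1000000) / 4096 ))
  512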

So if you guys know the reason, please let me know :). Any hint/feedback is very welcome.

--------------------------------
[   56.380802] kdump.sh[240]:   OFFSET(atomic_long_t.counter)=0
[   56.385976] kdump.sh[240]:   SIZE(latched_seq)=64
[   56.390217] kdump.sh[240]:   OFFSET(latched_seq.val)=48
[   56.395750] kdump.sh[240]:   LENGTH(free_area.free_list)=5
[   56.401295] kdump.sh[240]:   NUMBER(NR_FREE_PAGES)=0
[   56.406772] kdump.sh[240]:   NUMBER(PG_lru)=4
[   56.408783] kdump.sh[240]:   NUMBER(PG_private)=13
[   56.415401] kdump.sh[240]:   NUMBER(PG_swapcache)=10
[   56.421269] kdump.sh[240]:   NUMBER(PG_swapbacked)=19
[   56.426797] kdump.sh[240]:   NUMBER(PG_slab)=9
[   56.428911] kdump.sh[240]:   NUMBER(PG_head_mask)=65536
[   56.435175] kdump.sh[240]:   NUMBER(PAGE_BUDDY_MAPCOUNT_VALUE)=-129
[   56.437522] kdump.sh[240]:   NUMBER(HUGETLB_PAGE_DTOR)=2
[   56.442233] kdump.sh[240]:   NUMBER(PAGE_OFFLINE_MAPCOUNT_VALUE)=-257
[   56.446943] kdump.sh[240]:   SYMBOL(kallsyms_names)=ffffffff9e5eb9b0
[   56.452486] kdump.sh[240]:   SYMBOL(kallsyms_num_syms)=ffffffff9e5eb9a8
[   56.458891] kdump.sh[240]:   SYMBOL(kallsyms_token_table)=ffffffff9e76e038
Excluding unnecessary pages                       : [  2.7 %] \                  __vtop4_x86_64: Can't get a valid pte.
[   56.476355] kdump.sh[240]: readmem: Can't convert a virtual address(ffffea0008f80000) to physical address.
[   56.483192] kdump.sh[240]: readmem: type_addr: 0, addr:ffffea0008f80000, size:32768
[   56.489350] kdump.sh[240]: __exclude_unnecessary_pages: Can't read the buffer of struct page.
[   56.494516] kdump.sh[240]: create_2nd_bitmap: Can't exclude unnecessary pages.
[   56.501427] kdump[242]: saving vmcore failed, _exitcode:1
[   56.506871] kdump.sh[240]:   SYMBOL(kallsyms_token_index)=ffffffff9e76e3e8
[   56.511820] kdump.sh[240]:   SYMBOL(kallsyms_offsets)=ffffffff9e574240
[   56.516816] kdump.sh[240]:   SYMBOL(kallsyms_relative_base)=ffffffff9e5eb9a0
[   56.522270] kdump.sh[240]:   NUMBER(phys_base)=4330618880
[   56.526372] kdump.sh[240]:   SYMBOL(init_top_pgt)=ffffffff9ea26000
[   56.531827] kdump.sh[240]:   NUMBER(pgtable_l5_enabled)=0
[   56.535773] kdump.sh[240]:   SYMBOL(node_data)=ffffffff9f2829a0
[   56.540830] kdump.sh[240]:   LENGTH(node_data)=64
[   56.545405] kdump.sh[240]:   KERNELOFFSET=1b400000
[   56.549841] kdump.sh[240]:   NUMBER(KERNEL_IMAGE_SIZE)=1073741824
[   56.554862] kdump.sh[240]:   NUMBER(sme_mask)=0
[   56.558820] kdump.sh[240]:   CRASHTIME=1669635006
[   56.562812] kdump.sh[240]: phys_base    : 102200000 (vmcoreinfo)
[   56.567343] kdump.sh[240]: max_mapnr    : 23e200
[   56.572837] kdump.sh[240]: There is enough free memory to be done in one cycle.
[   56.577843] kdump.sh[240]: Buffer size for the cyclic mode: 587904
[   56.582390] kdump.sh[240]: The kernel version is not supported.
[   56.590812] kdump.sh[240]: The makedumpfile operation may be incomplete.
[   56.593489] kdump.sh[240]: page_offset  : ffff888000000000 (pt_load)
[   56.599405] kdump.sh[240]: num of NODEs : 1
[   56.601457] kdump.sh[240]: Memory type  : SPARSEMEM_EX
[   56.605806] kdump.sh[240]:                        mem_map        pfn_start          pfn_end
[   56.611275] kdump.sh[240]: mem_map[   0] ffffea0000000000                0             8000
[   56.619601] kdump.sh[240]: mem_map[   1] ffffea0000200000             8000            10000
[   56.626356] kdump.sh[240]: mem_map[   2] ffffea0000400000            10000            18000
[   56.631342] kdump.sh[240]: mem_map[   3] ffffea0000600000            18000            20000
[   56.636392] kdump.sh[240]: mem_map[   4] ffffea0000800000            20000            28000
[   56.642359] kdump.sh[240]: mem_map[   5] ffffea0000a00000            28000            30000
[   56.649430] kdump.sh[240]: mem_map[   6] ffffea0000c00000            30000            38000
[   56.656364] kdump.sh[240]: mem_map[   7] ffffea0000e00000            38000            40000
[   56.659915] kdump.sh[240]: mem_map[   8] ffffea0001000000            40000            48000
[   56.666405] kdump.sh[240]: mem_map[   9] ffffea0001200000            48000            50000
[   56.671485] kdump.sh[240]: mem_map[  10] ffffea0001400000            50000            58000
[   56.678435] kdump.sh[240]: mem_map[  11] ffffea0001600000            58000            60000
[...skip...]

[  OK  ] Closed udev Control Socket.
[   57.075801] kdump.sh[240]: mem_map[  53] ffffea0006a00000           1a8000           1b0000
[   57.479463] systemd[1]: systemd-udevd-kernel.socket: Deactivated successfully.
[   57.088368] kdump.sh[240]: mem_map[  54] ffffea0006c00000           1b0000           1b8000
[   57.094242] kdump.sh[240]: mem_map[  55] ffffea0006e00000           1b8000           1c0000

[  OK  ] Closed udev Kernel Socket.
[   57.500084] systemd[1]: systemd-tmpfiles-setup-dev.service: Deactivated successfully.
[   57.101694] kdump.sh[240]: mem_map[  56] ffffea0007000000           1c0000           1c8000
[   57.113148] kdump.sh[240]: mem_map[  57] ffffea0007200000           1c8000           1d0000

[   57.120456] kdump.sh[240]: mem_map[  58] ffffea0007400000           1d0000           1d8000
[  OK  ] Stopped Create Static Device Nodes in /dev.
[   57.128022] kdump.sh[240]: mem_map[  59] ffffea0007600000           1d8000           1e0000
[   57.136194] kdump.sh[240]: mem_map[  60] ffffea0007800000           1e0000           1e8000
[   57.146275] kdump.sh[240]: mem_map[  61] ffffea0007a00000           1e8000           1f0000
[   57.153289] kdump.sh[240]: mem_map[  62] ffffea0007c00000           1f0000           1f8000

[   57.165975] kdump.sh[240]: mem_map[  63] ffffea0007e00000           1f8000           200000

[   57.178271] kdump.sh[240]: mem_map[  64] ffffea0008000000           200000           208000
[  OK  ] Unmounted /sysroot.
[   57.189515] kdump.sh[240]: mem_map[  65] ffffea0008200000           208000           210000

[   57.195177] kdump.sh[240]: mem_map[  66] ffffea0008400000           210000           218000
[   57.202848] kdump.sh[240]: mem_map[  67] ffffea0008600000           218000           220000
[   57.211403] kdump.sh[240]: mem_map[  68] ffffea0008800000           220000           228000
[   57.220052] kdump.sh[240]: mem_map[  69] ffffea0008a00000           228000           230000
[   57.224179] kdump.sh[240]: mem_map[  70] ffffea0008c00000           230000           238000
[   57.229110] kdump.sh[240]: mem_map[  71] ffffea0008e00000           238000           23e200
[   57.237287] kdump.sh[240]: mmap() is available on the kernel.
[   57.243230] kdump.sh[240]: makedumpfile Failed.
[   57.246415] systemd[1]: kdump-capture.service: Failed with result 'exit-code'.
[   57.255054] kdump[264]: Kdump is using the default log level(3).
[   57.262390] systemd[1]: Failed to start Kdump Vmcore Save Service.

Thanks
Zhijian


* Re: nvdimm,pmem: makedumpfile: __vtop4_x86_64: Can't get a valid pte.
       [not found] ` <103666d5-3dcf-074c-0057-76b865f012a6@cs.umass.edu>
@ 2022-11-28 14:46   ` lizhijian
  2022-11-28 15:03     ` Eliot Moss
  0 siblings, 1 reply; 7+ messages in thread
From: lizhijian @ 2022-11-28 14:46 UTC (permalink / raw)
  To: moss, kexec, linux-mm, nvdimm; +Cc: dan.j.williams



On 28/11/2022 20:53, Eliot Moss wrote:
> On 11/28/2022 7:04 AM, lizhijian@fujitsu.com wrote:
>> Hi folks,
>>
>> I'm going to make crash coredump support pmem region. So
>> I have modified kexec-tools to add pmem region to PT_LOAD of vmcore.
>>
>> But it failed at makedumpfile, log are as following:
>>
>> In my environment, i found the last 512 pages in pmem region will cause the error.
> 
> I wonder if an issue I reported is related: when set up to map
> 2Mb (huge) pages, the last 2Mb of a large region got mapped as
> 4Kb pages, and then later, half of a large region was treated
> that way.
> 
Could you share the URL/link? I'd like to take a look.



> I've seen no response to the report, but assume folks have
> been busy with other things or perhaps giving this lower
> priority since it does not exactly *fail*, just not work as
> a user might think it should.
> 
> Regards - Eliot Moss


* Re: nvdimm,pmem: makedumpfile: __vtop4_x86_64: Can't get a valid pte.
  2022-11-28 14:46   ` lizhijian
@ 2022-11-28 15:03     ` Eliot Moss
  2022-11-29  5:16       ` lizhijian
  0 siblings, 1 reply; 7+ messages in thread
From: Eliot Moss @ 2022-11-28 15:03 UTC (permalink / raw)
  To: lizhijian, kexec, linux-mm, nvdimm; +Cc: dan.j.williams

On 11/28/2022 9:46 AM, lizhijian@fujitsu.com wrote:
> 
> 
> On 28/11/2022 20:53, Eliot Moss wrote:
>> On 11/28/2022 7:04 AM, lizhijian@fujitsu.com wrote:
>>> Hi folks,
>>>
>>> I'm going to make crash coredump support pmem region. So
>>> I have modified kexec-tools to add pmem region to PT_LOAD of vmcore.
>>>
>>> But it failed at makedumpfile, log are as following:
>>>
>>> In my environment, i found the last 512 pages in pmem region will cause the error.
>>
>> I wonder if an issue I reported is related: when set up to map
>> 2Mb (huge) pages, the last 2Mb of a large region got mapped as
>> 4Kb pages, and then later, half of a large region was treated
>> that way.
>>
> Could you share the url/link ? I'd like to take a look

It was in a previous email to the nvdimm list.  The title was:

"Possible PMD (huge pages) bug in fs dax"

And here is the body.  I just sent it directly to the list, so there
is no URL (if I should be engaging in a different way, please let me know):
================================================================================
Folks - I posted already on nvdimm, but perhaps the topic did not quite grab
anyone's attention.  I had some trouble figuring out all the details to get
dax mapping of files from an xfs file system with underlying Optane DC memory
going, but now have that working reliably.  But there is an odd behavior:

When first mapping a file, I request mapping a 32 GB range, aligned on a 1 GB
(and thus clearly on a 2 MB) boundary.

For each group of 8 GB, the first 4095 entries map with a 2 MB huge (PMD)
page.  The 4096th one does FALLBACK.  I suspect some problem in
dax.c:grab_mapping_entry or its callees, but am not personally well enough
versed in either the dax code or the xarray implementation to dig further.
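
(For reference, the per-fault results can be watched directly from the dax PMD
fault tracepoint -- a minimal sketch, assuming tracefs is mounted and the
fs_dax trace events are compiled into the kernel:

  cd /sys/kernel/tracing          # or /sys/kernel/debug/tracing on older setups
  echo 1 > events/fs_dax/dax_pmd_fault_done/enable
  cat trace_pipe                  # the result field shows NOPAGE vs FALLBACK

This is the kind of trace output offered below.)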


If you'd like a second puzzle 😄 ... after completing this mapping, another
thread accesses the whole range sequentially.  This results in NOPAGE fault
handling for the first 4095+4095 2 MB regions that previously resulted in
NOPAGE -- so far so good.  But it gives FALLBACK for the upper 16 GB (except
the two PMD regions it already gave FALLBACK for).


I can provide trace output from a run, as well as the ndctl, gdisk -l,
fdisk -l, and xfs_info details, if you like.


In my application, it would be nice if dax.c could deliver 1 GB PUD-size
mappings as well, though it appears that would require more surgery on
dax.c.  It would be somewhat analogous to what's already there, of course,
but I don't mean to minimize the possible trickiness of it.  I realize I
should submit that request as a separate thread 😄 which I intend to do
later.
================================================================================

Regards - Eliot Moss



* Re: nvdimm,pmem: makedumpfile: __vtop4_x86_64: Can't get a valid pte.
  2022-11-28 15:03     ` Eliot Moss
@ 2022-11-29  5:16       ` lizhijian
  2022-11-29  5:22         ` Eliot Moss
  0 siblings, 1 reply; 7+ messages in thread
From: lizhijian @ 2022-11-29  5:16 UTC (permalink / raw)
  To: moss, kexec, linux-mm, nvdimm; +Cc: dan.j.williams



On 28/11/2022 23:03, Eliot Moss wrote:
> On 11/28/2022 9:46 AM, lizhijian@fujitsu.com wrote:
>>
>>
>> On 28/11/2022 20:53, Eliot Moss wrote:
>>> On 11/28/2022 7:04 AM, lizhijian@fujitsu.com wrote:
>>>> Hi folks,
>>>>
>>>> I'm going to make crash coredump support pmem region. So
>>>> I have modified kexec-tools to add pmem region to PT_LOAD of vmcore.
>>>>
>>>> But it failed at makedumpfile, log are as following:
>>>>
>>>> In my environment, i found the last 512 pages in pmem region will cause the error.
>>>
>>> I wonder if an issue I reported is related: when set up to map
>>> 2Mb (huge) pages, the last 2Mb of a large region got mapped as
>>> 4Kb pages, and then later, half of a large region was treated
>>> that way.
>>>
>> Could you share the url/link ? I'd like to take a look
> 
> It was in a previous email to the nvdimm list.  the title was:
> 
> "Possible PMD (huge pages) bug in fs dax"
> 
> And here is the body.  I just sent directly to the list so there
> is no URL (if I should be engaging in a different way, please let me know):

I found it :) at
https://www.mail-archive.com/nvdimm@lists.linux.dev/msg02743.html


> ================================================================================
> Folks - I posted already on nvdimm, but perhaps the topic did not quite grab
> anyone's attention.  I had had some trouble figuring all the details to get
> dax mapping of files from an xfs file system with underlying Optane DC memory
> going, but now have that working reliably.  But there is an odd behavior:
> 
> When first mapping a file, I request mapping a 32 Gb range, aligned on a 1 Gb
> (and thus clearly on a 2 Mb) boundary.
> 
> For each group of 8 Gb, the first 4095 entries map with a 2 Mb huge (PMD)
> page.  The 4096th one does FALLBACK.  I suspect some problem in
> dax.c:grab_mapping_entry or its callees, but am not personally well enough
> versed in either the dax code or the xarray implementation to dig further.
> 
> 
> If you'd like a second puzzle 😄 ... after completing this mapping, another
> thread accesses the whole range sequentially.  This results in NOPAGE fault
> handling for the first 4095+4095 2M regions that previously resulted in
> NOPAGE -- so far so good.  But it gives FALLBACK for the upper 16 Gb (except
> the two PMD regions it already gave FALLBACK for).
> 
> 
> I can provide trace output from a run if you'd like and all the ndctl, gdisk
> -l, fdisk -l, and xfs_info details if you like.
> 
> 
> In my application, it would be nice if dax.c could deliver 1 Gb PUD size
> mappings as well, though it would appear that that would require more surgery
> on dax.c.  It would be somewhat analogous to what's already there, of course,
> but I don't mean to minimize the possible trickiness of it.  I realize I
> should submit that request as a separate thread 😄 which I intend to do
> later.
> ================================================================================
> 
> Regards - Eliot Moss


* Re: nvdimm,pmem: makedumpfile: __vtop4_x86_64: Can't get a valid pte.
  2022-11-29  5:16       ` lizhijian
@ 2022-11-29  5:22         ` Eliot Moss
  0 siblings, 0 replies; 7+ messages in thread
From: Eliot Moss @ 2022-11-29  5:22 UTC (permalink / raw)
  To: lizhijian; +Cc: Moss, kexec, linux-mm, nvdimm, dan.j.williams

Glad you found it. Any thoughts/reactions?  EM

Sent from my iPhone

> On Nov 29, 2022, at 12:17 AM, lizhijian@fujitsu.com wrote:
> 
> 
> 
>> On 28/11/2022 23:03, Eliot Moss wrote:
>>> On 11/28/2022 9:46 AM, lizhijian@fujitsu.com wrote:
>>> 
>>> 
>>> On 28/11/2022 20:53, Eliot Moss wrote:
>>>> On 11/28/2022 7:04 AM, lizhijian@fujitsu.com wrote:
>>>>> Hi folks,
>>>>> 
>>>>> I'm going to make crash coredump support pmem region. So
>>>>> I have modified kexec-tools to add pmem region to PT_LOAD of vmcore.
>>>>> 
>>>>> But it failed at makedumpfile, log are as following:
>>>>> 
>>>>> In my environment, i found the last 512 pages in pmem region will cause the error.
>>>> 
>>>> I wonder if an issue I reported is related: when set up to map
>>>> 2Mb (huge) pages, the last 2Mb of a large region got mapped as
>>>> 4Kb pages, and then later, half of a large region was treated
>>>> that way.
>>>> 
>>> Could you share the url/link ? I'd like to take a look
>> 
>> It was in a previous email to the nvdimm list.  the title was:
>> 
>> "Possible PMD (huge pages) bug in fs dax"
>> 
>> And here is the body.  I just sent directly to the list so there
>> is no URL (if I should be engaging in a different way, please let me know):
> 
> I found it :) at
> https://www.mail-archive.com/nvdimm@lists.linux.dev/msg02743.html
> 
> 
>> ================================================================================
>> Folks - I posted already on nvdimm, but perhaps the topic did not quite grab
>> anyone's attention.  I had had some trouble figuring all the details to get
>> dax mapping of files from an xfs file system with underlying Optane DC memory
>> going, but now have that working reliably.  But there is an odd behavior:
>> 
>> When first mapping a file, I request mapping a 32 Gb range, aligned on a 1 Gb
>> (and thus clearly on a 2 Mb) boundary.
>> 
>> For each group of 8 Gb, the first 4095 entries map with a 2 Mb huge (PMD)
>> page.  The 4096th one does FALLBACK.  I suspect some problem in
>> dax.c:grab_mapping_entry or its callees, but am not personally well enough
>> versed in either the dax code or the xarray implementation to dig further.
>> 
>> 
>> If you'd like a second puzzle 😄 ... after completing this mapping, another
>> thread accesses the whole range sequentially.  This results in NOPAGE fault
>> handling for the first 4095+4095 2M regions that previously resulted in
>> NOPAGE -- so far so good.  But it gives FALLBACK for the upper 16 Gb (except
>> the two PMD regions it already gave FALLBACK for).
>> 
>> 
>> I can provide trace output from a run if you'd like and all the ndctl, gdisk
>> -l, fdisk -l, and xfs_info details if you like.
>> 
>> 
>> In my application, it would be nice if dax.c could deliver 1 Gb PUD size
>> mappings as well, though it would appear that that would require more surgery
>> on dax.c.  It would be somewhat analogous to what's already there, of course,
>> but I don't mean to minimize the possible trickiness of it.  I realize I
>> should submit that request as a separate thread 😄 which I intend to do
>> later.
>> ================================================================================
>> 
>> Regards - Eliot Moss




* RE: nvdimm,pmem: makedumpfile: __vtop4_x86_64: Can't get a valid pte.
  2022-11-28 12:04 nvdimm,pmem: makedumpfile: __vtop4_x86_64: Can't get a valid pte lizhijian
       [not found] ` <103666d5-3dcf-074c-0057-76b865f012a6@cs.umass.edu>
@ 2022-11-30 20:05 ` Dan Williams
  2022-12-01  9:42   ` lizhijian
  1 sibling, 1 reply; 7+ messages in thread
From: Dan Williams @ 2022-11-30 20:05 UTC (permalink / raw)
  To: lizhijian, kexec, linux-mm, nvdimm; +Cc: dan.j.williams

lizhijian@fujitsu.com wrote:
> Hi folks,
> 
> I'm going to make crash coredump support pmem region. So
> I have modified kexec-tools to add pmem region to PT_LOAD of vmcore.
> 
> But it failed at makedumpfile, log are as following:
> 
> In my environment, i found the last 512 pages in pmem region will cause the error.
> 
> qemu commandline:
>   -object memory-backend-file,id=memnvdimm0,prealloc=yes,mem-path=/root/qemu-dax.img,share=yes,size=4267704320,align=2097152
> -device nvdimm,node=0,label-size=4194304,memdev=memnvdimm0,id=nvdimm0,slot=0
> 
> ndctl info:
> [root@rdma-server ~]# ndctl list
> [
>    {
>      "dev":"namespace0.0",
>      "mode":"devdax",
>      "map":"dev",
>      "size":4127195136,
>      "uuid":"f6fc1e86-ac5b-48d8-9cda-4888a33158f9",
>      "chardev":"dax0.0",
>      "align":4096
>    }
> ]
> [root@rdma-server ~]# ndctl list -iRD
> {
>    "dimms":[
>      {
>        "dev":"nmem0",
>        "id":"8680-56341200",
>        "handle":1,
>        "phys_id":0
>      }
>    ],
>    "regions":[
>      {
>        "dev":"region0",
>        "size":4263510016,
>        "align":16777216,
>        "available_size":0,
>        "max_available_extent":0,
>        "type":"pmem",
>        "iset_id":10248187106440278,
>        "mappings":[
>          {
>            "dimm":"nmem0",
>            "offset":0,
>            "length":4263510016,
>            "position":0
>          }
>        ],
>        "persistence_domain":"unknown"
>      }
>    ]
> }
> 
> iomem info:
> [root@rdma-server ~]# cat /proc/iomem  | grep Persi
> 140000000-23e1fffff : Persistent Memory
> 
> makedumpfile info:
> [   57.229110] kdump.sh[240]: mem_map[  71] ffffea0008e00000           238000           23e200
> 
> 
> Firstly, i wonder that
> 1) makedumpfile read the whole range of iomem(same with the PT_LOAD of pmem)
> 2) 1st kernel side only setup mem_map(vmemmap) for this namespace, which size is 512 pages smaller than iomem for some reasons.
> 3) Since there is an align in nvdimm region(16MiB in above), i also guess the maximum size of the pmem can used by user should
> be ALIGN(iomem, 10MiB), after this alignment, the last 512 pages will be dropped. then kernel only setups vmemmap for this
> range. but i didn't see any code doing such things in kernel side.
> 
> So if you guy know the reasons, please let me know :), any hint/feedback is very welcome.

This is due to the region alignment.

2522afb86a8c libnvdimm/region: Introduce an 'align' attribute

If you want to use the full capacity it would be something like this
(untested, and may destroy any data currently on the namespace):

ndctl destroy-namespace namespace0.0
echo $((2<<20)) > /sys/bus/nd/devices/region0/align
ndctl create-namespace -m dax -a 4k -M mem
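
For reference (illustrative only, assuming the commit above is in the running
kernel and a reasonably recent ndctl), the result can be double-checked with:

  cat /sys/bus/nd/devices/region0/align   # should now report 2097152
  ndctl list -N -R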



* Re: nvdimm,pmem: makedumpfile: __vtop4_x86_64: Can't get a valid pte.
  2022-11-30 20:05 ` Dan Williams
@ 2022-12-01  9:42   ` lizhijian
  0 siblings, 0 replies; 7+ messages in thread
From: lizhijian @ 2022-12-01  9:42 UTC (permalink / raw)
  To: Dan Williams, kexec, linux-mm, nvdimm



On 01/12/2022 04:05, Dan Williams wrote:
> lizhijian@fujitsu.com wrote:
>> Hi folks,
>>
>> I'm going to make crash coredump support pmem region. So
>> I have modified kexec-tools to add pmem region to PT_LOAD of vmcore.
>>
>> But it failed at makedumpfile, log are as following:
>>
>> In my environment, i found the last 512 pages in pmem region will cause the error.
>>
>> qemu commandline:
>>    -object memory-backend-file,id=memnvdimm0,prealloc=yes,mem-path=/root/qemu-dax.img,share=yes,size=4267704320,align=2097152
>> -device nvdimm,node=0,label-size=4194304,memdev=memnvdimm0,id=nvdimm0,slot=0
>>
>> ndctl info:
>> [root@rdma-server ~]# ndctl list
>> [
>>     {
>>       "dev":"namespace0.0",
>>       "mode":"devdax",
>>       "map":"dev",
>>       "size":4127195136,
>>       "uuid":"f6fc1e86-ac5b-48d8-9cda-4888a33158f9",
>>       "chardev":"dax0.0",
>>       "align":4096
>>     }
>> ]
>> [root@rdma-server ~]# ndctl list -iRD
>> {
>>     "dimms":[
>>       {
>>         "dev":"nmem0",
>>         "id":"8680-56341200",
>>         "handle":1,
>>         "phys_id":0
>>       }
>>     ],
>>     "regions":[
>>       {
>>         "dev":"region0",
>>         "size":4263510016,
>>         "align":16777216,
>>         "available_size":0,
>>         "max_available_extent":0,
>>         "type":"pmem",
>>         "iset_id":10248187106440278,
>>         "mappings":[
>>           {
>>             "dimm":"nmem0",
>>             "offset":0,
>>             "length":4263510016,
>>             "position":0
>>           }
>>         ],
>>         "persistence_domain":"unknown"
>>       }
>>     ]
>> }
>>
>> iomem info:
>> [root@rdma-server ~]# cat /proc/iomem  | grep Persi
>> 140000000-23e1fffff : Persistent Memory
>>
>> makedumpfile info:
>> [   57.229110] kdump.sh[240]: mem_map[  71] ffffea0008e00000           238000           23e200
>>
>>
>> Firstly, i wonder that
>> 1) makedumpfile read the whole range of iomem(same with the PT_LOAD of pmem)
>> 2) 1st kernel side only setup mem_map(vmemmap) for this namespace, which size is 512 pages smaller than iomem for some reasons.
>> 3) Since there is an align in nvdimm region(16MiB in above), i also guess the maximum size of the pmem can used by user should
>> be ALIGN(iomem, 10MiB), after this alignment, the last 512 pages will be dropped. then kernel only setups vmemmap for this
>> range. but i didn't see any code doing such things in kernel side.
>>
>> So if you guy know the reasons, please let me know :), any hint/feedback is very welcome.
> 
> This is due to the region alignment.
> 
> 2522afb86a8c libnvdimm/region: Introduce an 'align' attribute
> 

Dan,

Thank you very much. That's exactly the reason.



> If you want to use the full capacity it would be something like this
> (untested, and may destroy any data currently on the namespace):
> 
> ndctl destroy-namespace namespace0.0
> echo $((2<<20)) > /sys/bus/nd/devices/region0/align
> ndctl create-namespace -m dax -a 4k -M mem
> 

It works for me, but the alignment resets to 16 MiB after reboot. Is this expected?


Thanks
Zhijian


> _______________________________________________
> kexec mailing list
> kexec@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/kexec


