linux-kernel.vger.kernel.org archive mirror
From: "Li, Zhijian" <lizhijian@fujitsu.com>
To: Dan Williams <dan.j.williams@intel.com>, bhe@redhat.com
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"Yasunori Gotou (Fujitsu)" <y-goto@fujitsu.com>,
	"Xiao Yang (Fujitsu)" <yangx.jy@fujitsu.com>,
	"Shiyang Ruan (Fujitsu)" <ruansy.fnst@fujitsu.com>,
	"x86@kernel.org" <x86@kernel.org>,
	"nvdimm@lists.linux.dev" <nvdimm@lists.linux.dev>,
	"kexec@lists.infradead.org" <kexec@lists.infradead.org>
Subject: Re: [RFC PATCH v2 0/3] pmem memmap dump support
Date: Thu, 25 May 2023 13:36:46 +0800
Message-ID: <f8aff5b7-4892-9ccb-8079-abd87e9ab8b0@fujitsu.com>
In-Reply-To: <0fe0d69e-e33b-cf45-c957-68a8159d29ab@fujitsu.com>

Ping

Baoquan, Dan

Sorry to bother you again.

Could you comment a little further on this series?


Thanks
Zhijian


on 5/10/2023 6:41 PM, Zhijian Li (Fujitsu) wrote:
> Hi Dan
>
>
> on 5/8/2023 5:45 PM, Zhijian Li (Fujitsu) wrote:
>> Dan,
>>
>>
>> On 29/04/2023 02:59, Dan Williams wrote:
>>> Li Zhijian wrote:
>>>> Hello folks,
>>>>
>>>> About 2 months ago, we posted our first RFC[3] and received your kind feedback. Thank you :)
>>>> Now, I'm back with the code.
>>>>
>>>> Currently, this RFC already implements support for case D*. Cases A and B are deliberately
>>>> disabled in makedumpfile. It includes changes to 3 source trees, as below:
>>> I think the reason this patchkit is difficult to follow is that it
>>> spends a lot of time describing a chosen solution, but not enough time
>>> describing the problem and the tradeoffs.
>>>
>>> For example why is updating /proc/vmcore with pmem metadata the chosen
>>> solution? Why not leave the kernel out of it and have makedumpfile
>>> tooling aware of how to parse persistent memory namespace info-blocks
>>> and retrieve that dump itself? This is what I proposed here:
>>>
>>> http://lore.kernel.org/r/641484f7ef780_a52e2940@dwillia2-mobl3.amr.corp.intel.com.notmuch
>> Sorry for the late reply. I'm just back from vacation.
>> And sorry again for missing your *important* earlier information on V1.
>>
>> Your proposal sounds to me like it needs fewer kernel changes, but couples makedumpfile more tightly with ndctl.
>> In my current understanding, it will include the following source changes.
> The kernel and makedumpfile have been updated. It's still at an early stage, but to make sure I'm following your proposal,
> I want to share the changes with you early. Alternatively, you can refer to my github for the full details.
> https://github.com/zhijianli88/makedumpfile/commit/8ebfe38c015cfca0545cb3b1d7a6cc9a58fc9bb3
>
> If I'm going the wrong way, feel free to let me know :)
>
>
>> -----------+-------------------------------------------------------------------+
>> Source     |                      changes                                      |
>> -----------+-------------------------------------------------------------------+
>> I.         | 1. enter force_raw in kdump kernel automatically(avoid metadata being updated again)|
> The kernel should be adapted so that the pmem metadata will not be updated again in the kdump kernel:
>
> diff --git a/drivers/nvdimm/namespace_devs.c b/drivers/nvdimm/namespace_devs.c
> index c60ec0b373c5..2e59be8b9c78 100644
> --- a/drivers/nvdimm/namespace_devs.c
> +++ b/drivers/nvdimm/namespace_devs.c
> @@ -8,6 +8,7 @@
>    #include <linux/slab.h>
>    #include <linux/list.h>
>    #include <linux/nd.h>
> +#include <linux/crash_dump.h>
>    #include "nd-core.h"
>    #include "pmem.h"
>    #include "pfn.h"
> @@ -1504,6 +1505,8 @@ struct nd_namespace_common *nvdimm_namespace_common_probe(struct device *dev)
>                           return ERR_PTR(-ENODEV);
>           }
>    
> +       if (is_kdump_kernel())
> +               ndns->force_raw = true;
>           return ndns;
>    }
>    EXPORT_SYMBOL(nvdimm_namespace_common_probe);
>
>> kernel     |                                                                   |
>>            | 2. mark the whole pmem's PT_LOAD for kexec_file_load(2) syscall   |
>> -----------+-------------------------------------------------------------------+
>> II. kexec- | 1. mark the whole pmem's PT_LOAD for kexec_load(2) syscall        |
>> tool       |                                                                   |
>> -----------+-------------------------------------------------------------------+
>> III.       | 1. parse the infoblock and calculate the boundaries of userdata and metadata   |
>> makedump-  | 2. skip pmem userdata region                                      |
>> file       | 3. exclude pmem metadata region if needed                         |
>> -----------+-------------------------------------------------------------------+
>>
>> I will try to rewrite it following your proposal ASAP
> inspect_pmem_namespace() walks the namespaces and reads each one's resource.start and infoblock. With this
> information, we can easily calculate the boundaries of userdata and metadata. But currently these changes are
> strongly coupled with ndctl/pmem, which looks a bit messy and ugly.
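
A minimal sketch of that boundary calculation, written independently of ndctl. It assumes the namespace base and size are already known (for example from resource.start and the namespace size), that dataoff and start_pad come from the info-block, and that end_trunc is zero. The names pmem_range, pmem_layout and pmem_split_namespace are made up for illustration and are not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Half-open physical address range [start, end). Hypothetical type. */
struct pmem_range {
	uint64_t start;
	uint64_t end;
};

/* Metadata (info-block + memmap) and user data parts of one namespace. */
struct pmem_layout {
	struct pmem_range metadata;
	struct pmem_range userdata;
};

/*
 * Split a namespace into metadata and user data.
 * Assumption: dataoff is relative to base + start_pad, as the comment
 * in struct pfn_sb says, and end_trunc is zero.
 * Returns 0 on success, -1 if the values do not fit the namespace.
 */
int pmem_split_namespace(uint64_t base, uint64_t size, uint64_t start_pad,
			 uint64_t dataoff, struct pmem_layout *out)
{
	if (out == NULL)
		return -1;
	if (base > UINT64_MAX - size)	/* base + size would overflow */
		return -1;
	if (start_pad > size || dataoff > size - start_pad)
		return -1;

	out->metadata.start = base + start_pad;
	out->metadata.end = base + start_pad + dataoff;
	out->userdata.start = out->metadata.end;
	out->userdata.end = base + size;

	/* the metadata must never extend past the namespace */
	assert(out->metadata.end <= out->userdata.end);
	return 0;
}
```

For a 1 GiB namespace at 0x100000000 with a 2 MiB dataoff, this gives metadata [0x100000000, 0x100200000) and user data [0x100200000, 0x140000000). makedumpfile would then skip the second range and optionally exclude the first.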
>
> ============makedumpfile=======
>
> diff --git a/Makefile b/Makefile
> index a289e41ef44d..4b4ded639cfd 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -50,7 +50,7 @@ OBJ_PART=$(patsubst %.c,%.o,$(SRC_PART))
>    SRC_ARCH = arch/arm.c arch/arm64.c arch/x86.c arch/x86_64.c arch/ia64.c arch/ppc64.c arch/s390x.c arch/ppc.c arch/sparc64.c arch/mips64.c arch/loongarch64.c
>    OBJ_ARCH=$(patsubst %.c,%.o,$(SRC_ARCH))
>    
> -LIBS = -ldw -lbz2 -ldl -lelf -lz
> +LIBS = -ldw -lbz2 -ldl -lelf -lz -lndctl
>    ifneq ($(LINKTYPE), dynamic)
>    LIBS := -static $(LIBS) -llzma
>    endif
> diff --git a/makedumpfile.c b/makedumpfile.c
> index 98c3b8c7ced9..db68d05a29f9 100644
> --- a/makedumpfile.c
> +++ b/makedumpfile.c
> @@ -27,6 +27,8 @@
>    #include <limits.h>
>    #include <assert.h>
>    #include <zlib.h>
> +#include <sys/types.h>
> +#include <ndctl/libndctl.h>
>
> +
> +#define INFOBLOCK_SZ (8192)
> +#define SZ_4K (4096)
> +#define PFN_SIG_LEN 16
> +
> +typedef uint64_t u64;
> +typedef int64_t s64;
> +typedef uint32_t u32;
> +typedef int32_t s32;
> +typedef uint16_t u16;
> +typedef int16_t s16;
> +typedef uint8_t u8;
> +typedef int8_t s8;
> +
> +/* little-endian on-media fields are unsigned, as with the kernel's __le64 etc. */
> +typedef uint64_t le64;
> +typedef uint32_t le32;
> +typedef uint16_t le16;
> +
> +struct pfn_sb {
> +       u8 signature[PFN_SIG_LEN];
> +       u8 uuid[16];
> +       u8 parent_uuid[16];
> +       le32 flags;
> +       le16 version_major;
> +       le16 version_minor;
> +       le64 dataoff; /* relative to namespace_base + start_pad */
> +       le64 npfns;
> +       le32 mode;
> +       /* minor-version-1 additions for section alignment */
> +       le32 start_pad;
> +       le32 end_trunc;
> +       /* minor-version-2 record the base alignment of the mapping */
> +       le32 align;
> +       /* minor-version-3 guarantee the padding and flags are zero */
> +       /* minor-version-4 record the page size and struct page size */
> +       le32 page_size;
> +       le16 page_struct_size;
> +       u8 padding[3994];
> +       le64 checksum;
> +};
> +
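> +/* Read the pfn info-block of @ndns and return its dataoff, or -1 on error */
> +static long long nd_read_infoblock_dataoff(struct ndctl_namespace *ndns)
> +{
> +       int fd;
> +       ssize_t rc;
> +       char path[50];
> +       char buf[INFOBLOCK_SZ + 1];
> +       /* the info-block sits 4K into the namespace */
> +       struct pfn_sb *pfn_sb = (struct pfn_sb *)(buf + SZ_4K);
> +
> +       snprintf(path, sizeof(path), "/dev/%s", ndctl_namespace_get_block_device(ndns));
> +
> +       fd = open(path, O_RDONLY|O_EXCL);
> +       if (fd < 0)
> +               return -1;
> +
> +       rc = read(fd, buf, INFOBLOCK_SZ);
> +       close(fd); /* do not leak the descriptor on either path */
> +       if (rc < INFOBLOCK_SZ)
> +               return -1;
> +
> +       /* dataoff is 64-bit, so do not truncate it to int */
> +       return pfn_sb->dataoff;
> +}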
> +
> +int inspect_pmem_namespace(void)
> +{
> +       struct ndctl_ctx *ctx;
> +       struct ndctl_bus *bus;
> +       int rc = -1;
> +
> +       fprintf(stderr, "\n\ninspect_pmem_namespace!!\n\n");
> +       rc = ndctl_new(&ctx);
> +       if (rc)
> +               return -1;
> +
> +       ndctl_bus_foreach(ctx, bus) {
> +               struct ndctl_region *region;
> +
> +               ndctl_region_foreach(bus, region) {
> +                       struct ndctl_namespace *ndns;
> +
> +                       ndctl_namespace_foreach(region, ndns) {
> +                               enum ndctl_namespace_mode mode;
> +                               long long start, end_metadata;
> +
> +                               mode = ndctl_namespace_get_mode(ndns);
> +                               /* kdump kernel should set force_raw, mode become *safe* */
> +                               if (mode == NDCTL_NS_MODE_SAFE) {
> +                                       fprintf(stderr, "Only raw can be dumpable\n");
> +                                       continue;
> +                               }
> +
> +                               start = ndctl_namespace_get_resource(ndns);
> +                               end_metadata = nd_read_infoblock_dataoff(ndns);
> +
> +                               /* metadata really starts from 2M alignment */
> +                               if (start != ULLONG_MAX && end_metadata > 2 * 1024 * 1024) // 2M
> +                                       pmem_add_next(start, end_metadata);
> +                       }
> +               }
> +       }
> +
> +       ndctl_unref(ctx);
> +       return 0;
> +}
> +
>
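
Before trusting dataoff, it may be worth checking that the block at the 4K offset really is an info-block. A rough sketch, assuming the signature strings "NVDIMM_PFN_INFO" and "NVDIMM_DAX_INFO" that the kernel uses for pfn and device-dax info-blocks, and the field order of struct pfn_sb above (which puts dataoff at byte 56). pfn_sb_read_dataoff and load_le64 are hypothetical helpers, not code from the patch, and the checksum is not verified here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PFN_SB_OFFSET      4096	/* info-block starts 4K into the namespace */
#define PFN_SB_SIZE        4096
#define PFN_SIG_LEN        16
#define PFN_DATAOFF_OFFSET 56	/* 16 + 16 + 16 + 4 + 2 + 2 bytes precede it */

/* Decode a little-endian 64-bit value byte by byte (no alignment needs). */
uint64_t load_le64(const unsigned char *p)
{
	uint64_t v = 0;
	int i;

	for (i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

/*
 * Extract dataoff from a raw buffer holding the first 8K of a namespace.
 * Returns 0 and stores the value in *dataoff if a pfn or dax signature is
 * found, -1 otherwise. The info-block checksum is NOT validated.
 */
int pfn_sb_read_dataoff(const unsigned char *buf, size_t len, uint64_t *dataoff)
{
	const unsigned char *sb;

	assert(PFN_DATAOFF_OFFSET + 8 <= PFN_SB_SIZE);
	if (buf == NULL || dataoff == NULL)
		return -1;
	if (len < PFN_SB_OFFSET + PFN_SB_SIZE)
		return -1;

	sb = buf + PFN_SB_OFFSET;
	/* each literal is 15 characters plus the NUL, i.e. 16 bytes */
	if (memcmp(sb, "NVDIMM_PFN_INFO", PFN_SIG_LEN) != 0 &&
	    memcmp(sb, "NVDIMM_DAX_INFO", PFN_SIG_LEN) != 0)
		return -1;

	*dataoff = load_le64(sb + PFN_DATAOFF_OFFSET);
	return 0;
}
```

Decoding from bytes avoids casting a char buffer to struct pfn_sb and does not depend on host endianness. A raw namespace without an info-block fails the signature check instead of yielding a garbage offset.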
> Thanks
> Zhijian
>
>
>
>> Thanks again
>>
>> Thanks
>> Zhijian
>>
>>> ...but never got an answer, or I missed the answer.
>> _______________________________________________
>> kexec mailing list
>> kexec@lists.infradead.org
>> http://lists.infradead.org/mailman/listinfo/kexec


