* [PATCH v7 0/4] resource: Use list_head to link sibling resource
From: Baoquan He
Date: 2018-07-18  2:49 UTC
  To: linux-kernel, akpm, robh+dt, dan.j.williams, nicolas.pitre, josh,
	fengguang.wu, bp, andy.shevchenko
  Cc: brijesh.singh, devicetree, airlied, linux-pci, richard.weiyang,
	jcmvbkbc, baiyaowei, kys, frowand.list, lorenzo.pieralisi,
	sthemmin, Baoquan He, linux-nvdimm, patrik.r.jakobsson,
	linux-input, gustavo, dyoung, thomas.lendacky, haiyangz,
	maarten.lankhorst, jglisse, seanpaul, bhelgaas, tglx, yinghai,
	jonathan.derrick, chris, monstr, linux-parisc, gregkh,
	dmitry.torokhov, ebiederm, devel, linuxppc-dev, davem

This patchset does the following:
1) Move reparent_resources() to kernel/resource.c to clean up the
   duplicated code in arch/microblaze/pci/pci-common.c and
   arch/powerpc/kernel/pci-common.c.
2) Replace struct resource's sibling list, converting it from a singly
   linked list to a list_head. This clears out the pointer operations
   on the singly linked list and improves code readability.
3) Based on the list_head replacement, add a new function,
   walk_system_ram_res_rev(), which iterates iomem_resource's siblings
   in reverse order (a usage sketch follows this list).
4) Change kexec_file loading to search system RAM top down for kernel
   loading, using walk_system_ram_res_rev().
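
For illustration, a minimal usage sketch of the new reverse walker. It
assumes walk_system_ram_res_rev() mirrors the signature of the existing
walk_system_ram_res(); find_hole() and arg are made-up names for the
example:

	/* called once per System RAM resource, top of RAM first */
	static int find_hole(struct resource *res, void *arg)
	{
		/* return non-zero to stop the walk at the first fit */
		return 0;
	}

	/* walk the whole physical range in reverse order */
	ret = walk_system_ram_res_rev(0, (u64)-1, arg, find_hole);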

Note:
This patchset has only been tested on x86_64 with networking enabled.
One thing to pay attention to is that a root resource's child member
needs to be initialized explicitly: with LIST_HEAD_INIT() if the
resource is statically defined, or with INIT_LIST_HEAD() if it is
dynamically allocated. See what is done for iomem_resource and
ioport_resource, or the change in get_pci_domain_busn_res().
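
For example, a minimal sketch of the two initialization styles
(example_root_res and the kzalloc() caller are made up for
illustration; the pattern matches the hunks in patch 2):

	/* statically defined root resource */
	static struct resource example_root_res = {
		.name	 = "example root",
		.start	 = 0,
		.end	 = -1,
		.flags	 = IORESOURCE_MEM,
		.sibling = LIST_HEAD_INIT(example_root_res.sibling),
		.child	 = LIST_HEAD_INIT(example_root_res.child),
	};

	/* dynamically allocated root resource */
	struct resource *res = kzalloc(sizeof(*res), GFP_KERNEL);

	if (res) {
		INIT_LIST_HEAD(&res->sibling);
		INIT_LIST_HEAD(&res->child);
	}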

v6:
http://lkml.kernel.org/r/20180704041038.8190-1-bhe@redhat.com

v5:
http://lkml.kernel.org/r/20180612032831.29747-1-bhe@redhat.com

v4:
http://lkml.kernel.org/r/20180507063224.24229-1-bhe@redhat.com

v3:
http://lkml.kernel.org/r/20180419001848.3041-1-bhe@redhat.com

v2:
http://lkml.kernel.org/r/20180408024724.16812-1-bhe@redhat.com

v1:
http://lkml.kernel.org/r/20180322033722.9279-1-bhe@redhat.com

Changelog:
v6->v7:
  Fix code bugs that the test robot reported on mips and ia64.

  Add error code descriptions to reparent_resources() according to
  Andy's comment, and fix a minor log typo.
v5->v6:
  Fix code style problems in reparent_resources() and use existing
  error codes, according to Andy's suggestion.

  Fix bugs the test robot reported.

v4->v5:
  Add a new patch 0001 to move the duplicated reparent_resources() to
  kernel/resource.c so that it can be shared by different architectures.

  Fix several code bugs reported by the test robot on powerpc and
  microblaze.
v3->v4:
  Fix several bugs the test robot reported. Rewrite the cover letter
  and patch logs according to reviewers' comments.

v2->v3:
  Rename the resource functions first_child() and sibling() to
  resource_first_child() and resource_sibling(). Dan suggested this.

  Move resource_first_child() and resource_sibling() to linux/ioport.h
  and make them inline functions. Rob suggested this. Accordingly,
  include linux/list.h in linux/ioport.h; please help review whether
  this brings any efficiency degradation or code redundancy.

  The change to struct resource {} increases its size by two pointers;
  this is now mentioned in the git log to be more specific. Rob
  suggested this.

v1->v2:
  Use list_head instead to link resource siblings. This was suggested
  by Andrew.

  Rewrite walk_system_ram_res_rev() after list_head is adopted to link
  resource siblings.

Baoquan He (4):
  resource: Move reparent_resources() to kernel/resource.c and make it
    public
  resource: Use list_head to link sibling resource
  resource: add walk_system_ram_res_rev()
  kexec_file: Load kernel at top of system RAM if required

 arch/arm/plat-samsung/pm-check.c            |   6 +-
 arch/ia64/sn/kernel/io_init.c               |   2 +-
 arch/microblaze/pci/pci-common.c            |  41 +----
 arch/mips/pci/pci-rc32434.c                 |  12 +-
 arch/powerpc/kernel/pci-common.c            |  39 +---
 arch/sparc/kernel/ioport.c                  |   2 +-
 arch/xtensa/include/asm/pci-bridge.h        |   4 +-
 drivers/eisa/eisa-bus.c                     |   2 +
 drivers/gpu/drm/drm_memory.c                |   3 +-
 drivers/gpu/drm/gma500/gtt.c                |   5 +-
 drivers/hv/vmbus_drv.c                      |  52 +++---
 drivers/input/joystick/iforce/iforce-main.c |   4 +-
 drivers/nvdimm/namespace_devs.c             |   6 +-
 drivers/nvdimm/nd.h                         |   5 +-
 drivers/of/address.c                        |   4 +-
 drivers/parisc/lba_pci.c                    |   4 +-
 drivers/pci/controller/vmd.c                |   8 +-
 drivers/pci/probe.c                         |   2 +
 drivers/pci/setup-bus.c                     |   2 +-
 include/linux/ioport.h                      |  21 ++-
 kernel/kexec_file.c                         |   2 +
 kernel/resource.c                           | 266 ++++++++++++++++++----------
 22 files changed, 260 insertions(+), 232 deletions(-)

-- 
2.13.6

* [PATCH v7 1/4] resource: Move reparent_resources() to kernel/resource.c and make it public
From: Baoquan He
Date: 2018-07-18  2:49 UTC
  To: linux-kernel, akpm, robh+dt, dan.j.williams, nicolas.pitre, josh,
	fengguang.wu, bp, andy.shevchenko
  Cc: brijesh.singh, devicetree, airlied, linux-pci, richard.weiyang,
	jcmvbkbc, Paul Mackerras, baiyaowei, kys, frowand.list,
	lorenzo.pieralisi, sthemmin, Baoquan He, linux-nvdimm,
	Michael Ellerman, patrik.r.jakobsson, linux-input, gustavo,
	dyoung, thomas.lendacky, haiyangz, maarten.lankhorst, jglisse,
	seanpaul, bhelgaas, tglx, yinghai, jonathan.derrick, chris,
	monstr, linux-parisc, gregkh, dmitry.torokhov,
	Benjamin Herrenschmidt, ebiederm, devel, linuxppc-dev, davem

reparent_resources() is duplicated in arch/microblaze/pci/pci-common.c
and arch/powerpc/kernel/pci-common.c; move it to kernel/resource.c so
that it can be shared.
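
For reference, a hedged sketch of how a caller can act on the return
values documented in the kernel-doc below (the surrounding caller
context is illustrative, not taken from this patch):

	err = reparent_resources(parent, res);
	switch (err) {
	case 0:			/* conflicting children now sit under res */
		break;
	case -ENOTSUPP:		/* a conflict is not fully contained in res */
		break;
	case -ECANCELED:	/* no conflicting entries found under parent */
		break;
	}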

Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Baoquan He <bhe@redhat.com>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
---
 arch/microblaze/pci/pci-common.c | 37 -----------------------------------
 arch/powerpc/kernel/pci-common.c | 35 ---------------------------------
 include/linux/ioport.h           |  1 +
 kernel/resource.c                | 42 ++++++++++++++++++++++++++++++++++++++++
 4 files changed, 43 insertions(+), 72 deletions(-)

diff --git a/arch/microblaze/pci/pci-common.c b/arch/microblaze/pci/pci-common.c
index f34346d56095..7899bafab064 100644
--- a/arch/microblaze/pci/pci-common.c
+++ b/arch/microblaze/pci/pci-common.c
@@ -619,43 +619,6 @@ int pcibios_add_device(struct pci_dev *dev)
 EXPORT_SYMBOL(pcibios_add_device);
 
 /*
- * Reparent resource children of pr that conflict with res
- * under res, and make res replace those children.
- */
-static int __init reparent_resources(struct resource *parent,
-				     struct resource *res)
-{
-	struct resource *p, **pp;
-	struct resource **firstpp = NULL;
-
-	for (pp = &parent->child; (p = *pp) != NULL; pp = &p->sibling) {
-		if (p->end < res->start)
-			continue;
-		if (res->end < p->start)
-			break;
-		if (p->start < res->start || p->end > res->end)
-			return -1;	/* not completely contained */
-		if (firstpp == NULL)
-			firstpp = pp;
-	}
-	if (firstpp == NULL)
-		return -1;	/* didn't find any conflicting entries? */
-	res->parent = parent;
-	res->child = *firstpp;
-	res->sibling = *pp;
-	*firstpp = res;
-	*pp = NULL;
-	for (p = res->child; p != NULL; p = p->sibling) {
-		p->parent = res;
-		pr_debug("PCI: Reparented %s [%llx..%llx] under %s\n",
-			 p->name,
-			 (unsigned long long)p->start,
-			 (unsigned long long)p->end, res->name);
-	}
-	return 0;
-}
-
-/*
  *  Handle resources of PCI devices.  If the world were perfect, we could
  *  just allocate all the resource regions and do nothing more.  It isn't.
  *  On the other hand, we cannot just re-allocate all devices, as it would
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index fe9733ffffaa..926035bb378d 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -1088,41 +1088,6 @@ resource_size_t pcibios_align_resource(void *data, const struct resource *res,
 EXPORT_SYMBOL(pcibios_align_resource);
 
 /*
- * Reparent resource children of pr that conflict with res
- * under res, and make res replace those children.
- */
-static int reparent_resources(struct resource *parent,
-				     struct resource *res)
-{
-	struct resource *p, **pp;
-	struct resource **firstpp = NULL;
-
-	for (pp = &parent->child; (p = *pp) != NULL; pp = &p->sibling) {
-		if (p->end < res->start)
-			continue;
-		if (res->end < p->start)
-			break;
-		if (p->start < res->start || p->end > res->end)
-			return -1;	/* not completely contained */
-		if (firstpp == NULL)
-			firstpp = pp;
-	}
-	if (firstpp == NULL)
-		return -1;	/* didn't find any conflicting entries? */
-	res->parent = parent;
-	res->child = *firstpp;
-	res->sibling = *pp;
-	*firstpp = res;
-	*pp = NULL;
-	for (p = res->child; p != NULL; p = p->sibling) {
-		p->parent = res;
-		pr_debug("PCI: Reparented %s %pR under %s\n",
-			 p->name, p, res->name);
-	}
-	return 0;
-}
-
-/*
  *  Handle resources of PCI devices.  If the world were perfect, we could
  *  just allocate all the resource regions and do nothing more.  It isn't.
  *  On the other hand, we cannot just re-allocate all devices, as it would
diff --git a/include/linux/ioport.h b/include/linux/ioport.h
index da0ebaec25f0..dfdcd0bfe54e 100644
--- a/include/linux/ioport.h
+++ b/include/linux/ioport.h
@@ -192,6 +192,7 @@ extern int allocate_resource(struct resource *root, struct resource *new,
 struct resource *lookup_resource(struct resource *root, resource_size_t start);
 int adjust_resource(struct resource *res, resource_size_t start,
 		    resource_size_t size);
+int reparent_resources(struct resource *parent, struct resource *res);
 resource_size_t resource_alignment(struct resource *res);
 static inline resource_size_t resource_size(const struct resource *res)
 {
diff --git a/kernel/resource.c b/kernel/resource.c
index 30e1bc68503b..81ccd19c1d9f 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -983,6 +983,48 @@ int adjust_resource(struct resource *res, resource_size_t start,
 }
 EXPORT_SYMBOL(adjust_resource);
 
+/**
+ * reparent_resources - reparent resource children of parent that res covers
+ * @parent: parent resource descriptor
+ * @res: resource descriptor desired by caller
+ *
+ * Returns 0 on success, -ENOTSUPP if a child resource is not completely
+ * contained by 'res', or -ECANCELED if no conflicting entry is found.
+ *
+ * Reparent resource children of 'parent' that conflict with 'res'
+ * under 'res', and make 'res' replace those children.
+ */
+int reparent_resources(struct resource *parent, struct resource *res)
+{
+	struct resource *p, **pp;
+	struct resource **firstpp = NULL;
+
+	for (pp = &parent->child; (p = *pp) != NULL; pp = &p->sibling) {
+		if (p->end < res->start)
+			continue;
+		if (res->end < p->start)
+			break;
+		if (p->start < res->start || p->end > res->end)
+			return -ENOTSUPP;	/* not completely contained */
+		if (firstpp == NULL)
+			firstpp = pp;
+	}
+	if (firstpp == NULL)
+		return -ECANCELED; /* didn't find any conflicting entries? */
+	res->parent = parent;
+	res->child = *firstpp;
+	res->sibling = *pp;
+	*firstpp = res;
+	*pp = NULL;
+	for (p = res->child; p != NULL; p = p->sibling) {
+		p->parent = res;
+		pr_debug("PCI: Reparented %s %pR under %s\n",
+			 p->name, p, res->name);
+	}
+	return 0;
+}
+EXPORT_SYMBOL(reparent_resources);
+
 static void __init __reserve_region_with_split(struct resource *root,
 		resource_size_t start, resource_size_t end,
 		const char *name)
-- 
2.13.6

* [PATCH v7 2/4] resource: Use list_head to link sibling resource
From: Baoquan He
Date: 2018-07-18  2:49 UTC
  To: linux-kernel, akpm, robh+dt, dan.j.williams, nicolas.pitre, josh,
	fengguang.wu, bp, andy.shevchenko
  Cc: linux-mips, brijesh.singh, devicetree, airlied, linux-pci,
	richard.weiyang, jcmvbkbc, Paul Mackerras, baiyaowei, kys,
	frowand.list, lorenzo.pieralisi, sthemmin, Baoquan He,
	linux-nvdimm, Michael Ellerman, patrik.r.jakobsson, linux-input,
	gustavo, dyoung, thomas.lendacky, haiyangz, maarten.lankhorst,
	jglisse, seanpaul, bhelgaas, tglx, yinghai, jonathan.derrick,
	chris, monstr, linux-parisc, gregkh, dmitry.torokhov,
	Benjamin Herrenschmidt, ebiederm, devel, linuxppc-dev, davem

The struct resource uses a singly linked list to link siblings,
implemented by pointer operations. Replace it with list_head for better
code readability.

Based on this list_head replacement, it will be very easy to do reverse
iteration on iomem_resource's sibling list in a later patch.

Besides, the types of the struct resource members sibling and child are
changed from 'struct resource *' to 'struct list_head'. This increases
the size of the struct by two pointers.
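
To make the conversion concrete, a before/after sketch of walking a
parent's children (do_something() is a placeholder name):

	struct resource *p;

	/* before: chase the singly linked sibling pointers */
	for (p = parent->child; p != NULL; p = p->sibling)
		do_something(p);

	/* after: standard list_head iteration over the same children */
	list_for_each_entry(p, &parent->child, sibling)
		do_something(p);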

Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Baoquan He <bhe@redhat.com>
Cc: Patrik Jakobsson <patrik.r.jakobsson@gmail.com>
Cc: David Airlie <airlied@linux.ie>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Frank Rowand <frowand.list@gmail.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Jonathan Derrick <jonathan.derrick@intel.com>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Yaowei Bai <baiyaowei@cmss.chinamobile.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: devel@linuxdriverproject.org
Cc: linux-input@vger.kernel.org
Cc: linux-nvdimm@lists.01.org
Cc: devicetree@vger.kernel.org
Cc: linux-pci@vger.kernel.org
Cc: Michal Simek <monstr@monstr.eu>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linux-mips@linux-mips.org
---
 arch/arm/plat-samsung/pm-check.c            |   6 +-
 arch/ia64/sn/kernel/io_init.c               |   2 +-
 arch/microblaze/pci/pci-common.c            |   4 +-
 arch/mips/pci/pci-rc32434.c                 |  12 +-
 arch/powerpc/kernel/pci-common.c            |   4 +-
 arch/sparc/kernel/ioport.c                  |   2 +-
 arch/xtensa/include/asm/pci-bridge.h        |   4 +-
 drivers/eisa/eisa-bus.c                     |   2 +
 drivers/gpu/drm/drm_memory.c                |   3 +-
 drivers/gpu/drm/gma500/gtt.c                |   5 +-
 drivers/hv/vmbus_drv.c                      |  52 +++----
 drivers/input/joystick/iforce/iforce-main.c |   4 +-
 drivers/nvdimm/namespace_devs.c             |   6 +-
 drivers/nvdimm/nd.h                         |   5 +-
 drivers/of/address.c                        |   4 +-
 drivers/parisc/lba_pci.c                    |   4 +-
 drivers/pci/controller/vmd.c                |   8 +-
 drivers/pci/probe.c                         |   2 +
 drivers/pci/setup-bus.c                     |   2 +-
 include/linux/ioport.h                      |  17 ++-
 kernel/resource.c                           | 206 ++++++++++++++--------------
 21 files changed, 183 insertions(+), 171 deletions(-)

diff --git a/arch/arm/plat-samsung/pm-check.c b/arch/arm/plat-samsung/pm-check.c
index cd2c02c68bc3..5494355b1c49 100644
--- a/arch/arm/plat-samsung/pm-check.c
+++ b/arch/arm/plat-samsung/pm-check.c
@@ -46,8 +46,8 @@ typedef u32 *(run_fn_t)(struct resource *ptr, u32 *arg);
 static void s3c_pm_run_res(struct resource *ptr, run_fn_t fn, u32 *arg)
 {
 	while (ptr != NULL) {
-		if (ptr->child != NULL)
-			s3c_pm_run_res(ptr->child, fn, arg);
+		if (!list_empty(&ptr->child))
+			s3c_pm_run_res(resource_first_child(&ptr->child), fn, arg);
 
 		if ((ptr->flags & IORESOURCE_SYSTEM_RAM)
 				== IORESOURCE_SYSTEM_RAM) {
@@ -57,7 +57,7 @@ static void s3c_pm_run_res(struct resource *ptr, run_fn_t fn, u32 *arg)
 			arg = (fn)(ptr, arg);
 		}
 
-		ptr = ptr->sibling;
+		ptr = resource_sibling(ptr);
 	}
 }
 
diff --git a/arch/ia64/sn/kernel/io_init.c b/arch/ia64/sn/kernel/io_init.c
index d63809a6adfa..338a7b7f194d 100644
--- a/arch/ia64/sn/kernel/io_init.c
+++ b/arch/ia64/sn/kernel/io_init.c
@@ -192,7 +192,7 @@ sn_io_slot_fixup(struct pci_dev *dev)
 		 * if it's already in the device structure, remove it before
 		 * inserting
 		 */
-		if (res->parent && res->parent->child)
+		if (res->parent && !list_empty(&res->parent->child))
 			release_resource(res);
 
 		if (res->flags & IORESOURCE_IO)
diff --git a/arch/microblaze/pci/pci-common.c b/arch/microblaze/pci/pci-common.c
index 7899bafab064..2bf73e27e231 100644
--- a/arch/microblaze/pci/pci-common.c
+++ b/arch/microblaze/pci/pci-common.c
@@ -533,7 +533,9 @@ void pci_process_bridge_OF_ranges(struct pci_controller *hose,
 			res->flags = range.flags;
 			res->start = range.cpu_addr;
 			res->end = range.cpu_addr + range.size - 1;
-			res->parent = res->child = res->sibling = NULL;
+			res->parent = NULL;
+			INIT_LIST_HEAD(&res->child);
+			INIT_LIST_HEAD(&res->sibling);
 		}
 	}
 
diff --git a/arch/mips/pci/pci-rc32434.c b/arch/mips/pci/pci-rc32434.c
index 7f6ce6d734c0..e80283df7925 100644
--- a/arch/mips/pci/pci-rc32434.c
+++ b/arch/mips/pci/pci-rc32434.c
@@ -53,8 +53,8 @@ static struct resource rc32434_res_pci_mem1 = {
 	.start = 0x50000000,
 	.end = 0x5FFFFFFF,
 	.flags = IORESOURCE_MEM,
-	.sibling = NULL,
-	.child = &rc32434_res_pci_mem2
+	.sibling = LIST_HEAD_INIT(rc32434_res_pci_mem1.sibling),
+	.child = LIST_HEAD_INIT(rc32434_res_pci_mem1.child),
 };
 
 static struct resource rc32434_res_pci_mem2 = {
@@ -63,8 +63,8 @@ static struct resource rc32434_res_pci_mem2 = {
 	.end = 0x6FFFFFFF,
 	.flags = IORESOURCE_MEM,
 	.parent = &rc32434_res_pci_mem1,
-	.sibling = NULL,
-	.child = NULL
+	.sibling = LIST_HEAD_INIT(rc32434_res_pci_mem2.sibling),
+	.child = LIST_HEAD_INIT(rc32434_res_pci_mem2.child),
 };
 
 static struct resource rc32434_res_pci_io1 = {
@@ -72,6 +72,8 @@ static struct resource rc32434_res_pci_io1 = {
 	.start = 0x18800000,
 	.end = 0x188FFFFF,
 	.flags = IORESOURCE_IO,
+	.sibling = LIST_HEAD_INIT(rc32434_res_pci_io1.sibling),
+	.child = LIST_HEAD_INIT(rc32434_res_pci_io1.child),
 };
 
 extern struct pci_ops rc32434_pci_ops;
@@ -208,6 +210,8 @@ static int __init rc32434_pci_init(void)
 
 	pr_info("PCI: Initializing PCI\n");
 
+	list_add(&rc32434_res_pci_mem2.sibling, &rc32434_res_pci_mem1.child);
+
 	ioport_resource.start = rc32434_res_pci_io1.start;
 	ioport_resource.end = rc32434_res_pci_io1.end;
 
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index 926035bb378d..28fbe83c9daf 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -761,7 +761,9 @@ void pci_process_bridge_OF_ranges(struct pci_controller *hose,
 			res->flags = range.flags;
 			res->start = range.cpu_addr;
 			res->end = range.cpu_addr + range.size - 1;
-			res->parent = res->child = res->sibling = NULL;
+			res->parent = NULL;
+			INIT_LIST_HEAD(&res->child);
+			INIT_LIST_HEAD(&res->sibling);
 		}
 	}
 }
diff --git a/arch/sparc/kernel/ioport.c b/arch/sparc/kernel/ioport.c
index cca9134cfa7d..99efe4e98b16 100644
--- a/arch/sparc/kernel/ioport.c
+++ b/arch/sparc/kernel/ioport.c
@@ -669,7 +669,7 @@ static int sparc_io_proc_show(struct seq_file *m, void *v)
 	struct resource *root = m->private, *r;
 	const char *nm;
 
-	for (r = root->child; r != NULL; r = r->sibling) {
+	list_for_each_entry(r, &root->child, sibling) {
 		if ((nm = r->name) == NULL) nm = "???";
 		seq_printf(m, "%016llx-%016llx: %s\n",
 				(unsigned long long)r->start,
diff --git a/arch/xtensa/include/asm/pci-bridge.h b/arch/xtensa/include/asm/pci-bridge.h
index 0b68c76ec1e6..f487b06817df 100644
--- a/arch/xtensa/include/asm/pci-bridge.h
+++ b/arch/xtensa/include/asm/pci-bridge.h
@@ -71,8 +71,8 @@ static inline void pcibios_init_resource(struct resource *res,
 	res->flags = flags;
 	res->name = name;
 	res->parent = NULL;
-	res->sibling = NULL;
-	res->child = NULL;
+	INIT_LIST_HEAD(&res->child);
+	INIT_LIST_HEAD(&res->sibling);
 }
 
 
diff --git a/drivers/eisa/eisa-bus.c b/drivers/eisa/eisa-bus.c
index 1e8062f6dbfc..dba78f75fd06 100644
--- a/drivers/eisa/eisa-bus.c
+++ b/drivers/eisa/eisa-bus.c
@@ -408,6 +408,8 @@ static struct resource eisa_root_res = {
 	.start = 0,
 	.end   = 0xffffffff,
 	.flags = IORESOURCE_IO,
+	.sibling = LIST_HEAD_INIT(eisa_root_res.sibling),
+	.child  = LIST_HEAD_INIT(eisa_root_res.child),
 };
 
 static int eisa_bus_count;
diff --git a/drivers/gpu/drm/drm_memory.c b/drivers/gpu/drm/drm_memory.c
index d69e4fc1ee77..33baa7fa5e41 100644
--- a/drivers/gpu/drm/drm_memory.c
+++ b/drivers/gpu/drm/drm_memory.c
@@ -155,9 +155,8 @@ u64 drm_get_max_iomem(void)
 	struct resource *tmp;
 	resource_size_t max_iomem = 0;
 
-	for (tmp = iomem_resource.child; tmp; tmp = tmp->sibling) {
+	list_for_each_entry(tmp, &iomem_resource.child, sibling)
 		max_iomem = max(max_iomem,  tmp->end);
-	}
 
 	return max_iomem;
 }
diff --git a/drivers/gpu/drm/gma500/gtt.c b/drivers/gpu/drm/gma500/gtt.c
index 3949b0990916..addd3bc009af 100644
--- a/drivers/gpu/drm/gma500/gtt.c
+++ b/drivers/gpu/drm/gma500/gtt.c
@@ -565,7 +565,7 @@ int psb_gtt_init(struct drm_device *dev, int resume)
 int psb_gtt_restore(struct drm_device *dev)
 {
 	struct drm_psb_private *dev_priv = dev->dev_private;
-	struct resource *r = dev_priv->gtt_mem->child;
+	struct resource *r;
 	struct gtt_range *range;
 	unsigned int restored = 0, total = 0, size = 0;
 
@@ -573,14 +573,13 @@ int psb_gtt_restore(struct drm_device *dev)
 	mutex_lock(&dev_priv->gtt_mutex);
 	psb_gtt_init(dev, 1);
 
-	while (r != NULL) {
+	list_for_each_entry(r, &dev_priv->gtt_mem->child, sibling) {
 		range = container_of(r, struct gtt_range, resource);
 		if (range->pages) {
 			psb_gtt_insert(dev, range, 1);
 			size += range->resource.end - range->resource.start;
 			restored++;
 		}
-		r = r->sibling;
 		total++;
 	}
 	mutex_unlock(&dev_priv->gtt_mutex);
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index b10fe26c4891..d87ec5a1bc4c 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -1412,9 +1412,8 @@ static acpi_status vmbus_walk_resources(struct acpi_resource *res, void *ctx)
 {
 	resource_size_t start = 0;
 	resource_size_t end = 0;
-	struct resource *new_res;
+	struct resource *new_res, *tmp;
 	struct resource **old_res = &hyperv_mmio;
-	struct resource **prev_res = NULL;
 
 	switch (res->type) {
 
@@ -1461,44 +1460,36 @@ static acpi_status vmbus_walk_resources(struct acpi_resource *res, void *ctx)
 	/*
 	 * If two ranges are adjacent, merge them.
 	 */
-	do {
-		if (!*old_res) {
-			*old_res = new_res;
-			break;
-		}
-
-		if (((*old_res)->end + 1) == new_res->start) {
-			(*old_res)->end = new_res->end;
+	if (!*old_res) {
+		*old_res = new_res;
+		return AE_OK;
+	}
+	tmp = *old_res;
+	list_for_each_entry_from(tmp, &tmp->parent->child, sibling) {
+		if ((tmp->end + 1) == new_res->start) {
+			tmp->end = new_res->end;
 			kfree(new_res);
 			break;
 		}
 
-		if ((*old_res)->start == new_res->end + 1) {
-			(*old_res)->start = new_res->start;
+		if (tmp->start == new_res->end + 1) {
+			tmp->start = new_res->start;
 			kfree(new_res);
 			break;
 		}
 
-		if ((*old_res)->start > new_res->end) {
-			new_res->sibling = *old_res;
-			if (prev_res)
-				(*prev_res)->sibling = new_res;
-			*old_res = new_res;
+		if (tmp->start > new_res->end) {
+			list_add(&new_res->sibling, tmp->sibling.prev);
 			break;
 		}
-
-		prev_res = old_res;
-		old_res = &(*old_res)->sibling;
-
-	} while (1);
+	}
 
 	return AE_OK;
 }
 
 static int vmbus_acpi_remove(struct acpi_device *device)
 {
-	struct resource *cur_res;
-	struct resource *next_res;
+	struct resource *res;
 
 	if (hyperv_mmio) {
 		if (fb_mmio) {
@@ -1507,10 +1498,9 @@ static int vmbus_acpi_remove(struct acpi_device *device)
 			fb_mmio = NULL;
 		}
 
-		for (cur_res = hyperv_mmio; cur_res; cur_res = next_res) {
-			next_res = cur_res->sibling;
-			kfree(cur_res);
-		}
+		res = hyperv_mmio;
+		list_for_each_entry_from(res, &res->parent->child, sibling)
+			kfree(res);
 	}
 
 	return 0;
@@ -1596,7 +1586,8 @@ int vmbus_allocate_mmio(struct resource **new, struct hv_device *device_obj,
 		}
 	}
 
-	for (iter = hyperv_mmio; iter; iter = iter->sibling) {
+	iter = hyperv_mmio;
+	list_for_each_entry_from(iter, &iter->parent->child, sibling) {
 		if ((iter->start >= max) || (iter->end <= min))
 			continue;
 
@@ -1639,7 +1630,8 @@ void vmbus_free_mmio(resource_size_t start, resource_size_t size)
 	struct resource *iter;
 
 	down(&hyperv_mmio_lock);
-	for (iter = hyperv_mmio; iter; iter = iter->sibling) {
+	iter = hyperv_mmio;
+	list_for_each_entry_from(iter, &iter->parent->child, sibling) {
 		if ((iter->start >= start + size) || (iter->end <= start))
 			continue;
 
diff --git a/drivers/input/joystick/iforce/iforce-main.c b/drivers/input/joystick/iforce/iforce-main.c
index daeeb4c7e3b0..5c0be27b33ff 100644
--- a/drivers/input/joystick/iforce/iforce-main.c
+++ b/drivers/input/joystick/iforce/iforce-main.c
@@ -305,8 +305,8 @@ int iforce_init_device(struct iforce *iforce)
 	iforce->device_memory.end = 200;
 	iforce->device_memory.flags = IORESOURCE_MEM;
 	iforce->device_memory.parent = NULL;
-	iforce->device_memory.child = NULL;
-	iforce->device_memory.sibling = NULL;
+	INIT_LIST_HEAD(&iforce->device_memory.child);
+	INIT_LIST_HEAD(&iforce->device_memory.sibling);
 
 /*
  * Wait until device ready - until it sends its first response.
diff --git a/drivers/nvdimm/namespace_devs.c b/drivers/nvdimm/namespace_devs.c
index 28afdd668905..f53d410d9981 100644
--- a/drivers/nvdimm/namespace_devs.c
+++ b/drivers/nvdimm/namespace_devs.c
@@ -637,7 +637,7 @@ static resource_size_t scan_allocate(struct nd_region *nd_region,
  retry:
 	first = 0;
 	for_each_dpa_resource(ndd, res) {
-		struct resource *next = res->sibling, *new_res = NULL;
+		struct resource *next = resource_sibling(res), *new_res = NULL;
 		resource_size_t allocate, available = 0;
 		enum alloc_loc loc = ALLOC_ERR;
 		const char *action;
@@ -763,7 +763,7 @@ static resource_size_t scan_allocate(struct nd_region *nd_region,
 	 * an initial "pmem-reserve pass".  Only do an initial BLK allocation
 	 * when none of the DPA space is reserved.
 	 */
-	if ((is_pmem || !ndd->dpa.child) && n == to_allocate)
+	if ((is_pmem || list_empty(&ndd->dpa.child)) && n == to_allocate)
 		return init_dpa_allocation(label_id, nd_region, nd_mapping, n);
 	return n;
 }
@@ -779,7 +779,7 @@ static int merge_dpa(struct nd_region *nd_region,
  retry:
 	for_each_dpa_resource(ndd, res) {
 		int rc;
-		struct resource *next = res->sibling;
+		struct resource *next = resource_sibling(res);
 		resource_size_t end = res->start + resource_size(res);
 
 		if (!next || strcmp(res->name, label_id->id) != 0
diff --git a/drivers/nvdimm/nd.h b/drivers/nvdimm/nd.h
index 32e0364b48b9..da7da15e03e7 100644
--- a/drivers/nvdimm/nd.h
+++ b/drivers/nvdimm/nd.h
@@ -102,11 +102,10 @@ unsigned sizeof_namespace_label(struct nvdimm_drvdata *ndd);
 		(unsigned long long) (res ? res->start : 0), ##arg)
 
 #define for_each_dpa_resource(ndd, res) \
-	for (res = (ndd)->dpa.child; res; res = res->sibling)
+	list_for_each_entry(res, &(ndd)->dpa.child, sibling)
 
 #define for_each_dpa_resource_safe(ndd, res, next) \
-	for (res = (ndd)->dpa.child, next = res ? res->sibling : NULL; \
-			res; res = next, next = next ? next->sibling : NULL)
+	list_for_each_entry_safe(res, next, &(ndd)->dpa.child, sibling)
 
 struct nd_percpu_lane {
 	int count;
diff --git a/drivers/of/address.c b/drivers/of/address.c
index 53349912ac75..e2e25719ab52 100644
--- a/drivers/of/address.c
+++ b/drivers/of/address.c
@@ -330,7 +330,9 @@ int of_pci_range_to_resource(struct of_pci_range *range,
 {
 	int err;
 	res->flags = range->flags;
-	res->parent = res->child = res->sibling = NULL;
+	res->parent = NULL;
+	INIT_LIST_HEAD(&res->child);
+	INIT_LIST_HEAD(&res->sibling);
 	res->name = np->full_name;
 
 	if (res->flags & IORESOURCE_IO) {
diff --git a/drivers/parisc/lba_pci.c b/drivers/parisc/lba_pci.c
index 69bd98421eb1..7482bdfd1959 100644
--- a/drivers/parisc/lba_pci.c
+++ b/drivers/parisc/lba_pci.c
@@ -170,8 +170,8 @@ lba_dump_res(struct resource *r, int d)
 	for (i = d; i ; --i) printk(" ");
 	printk(KERN_DEBUG "%p [%lx,%lx]/%lx\n", r,
 		(long)r->start, (long)r->end, r->flags);
-	lba_dump_res(r->child, d+2);
-	lba_dump_res(r->sibling, d);
+	lba_dump_res(resource_first_child(&r->child), d+2);
+	lba_dump_res(resource_sibling(r), d);
 }
 
 
diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
index 942b64fc7f1f..e3ace20345c7 100644
--- a/drivers/pci/controller/vmd.c
+++ b/drivers/pci/controller/vmd.c
@@ -542,14 +542,14 @@ static struct pci_ops vmd_ops = {
 
 static void vmd_attach_resources(struct vmd_dev *vmd)
 {
-	vmd->dev->resource[VMD_MEMBAR1].child = &vmd->resources[1];
-	vmd->dev->resource[VMD_MEMBAR2].child = &vmd->resources[2];
+	list_add(&vmd->resources[1].sibling, &vmd->dev->resource[VMD_MEMBAR1].child);
+	list_add(&vmd->resources[2].sibling, &vmd->dev->resource[VMD_MEMBAR2].child);
 }
 
 static void vmd_detach_resources(struct vmd_dev *vmd)
 {
-	vmd->dev->resource[VMD_MEMBAR1].child = NULL;
-	vmd->dev->resource[VMD_MEMBAR2].child = NULL;
+	INIT_LIST_HEAD(&vmd->dev->resource[VMD_MEMBAR1].child);
+	INIT_LIST_HEAD(&vmd->dev->resource[VMD_MEMBAR2].child);
 }
 
 /*
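
One asymmetry worth noting in the vmd conversion: attach links one entry
into each child list, while detach re-initializes the list heads instead of
unlinking the entries, which leaves stale prev/next pointers inside
vmd->resources[]. That is harmless here because those entries are never
walked again, but the entry-side form would be the more defensive habit.
A sketch of that alternative (illustrative, not what the patch does):

	/* unlink each entry and leave it self-linked */
	list_del_init(&vmd->resources[1].sibling);
	list_del_init(&vmd->resources[2].sibling);
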
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index ac876e32de4b..9624dd1dfd49 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -59,6 +59,8 @@ static struct resource *get_pci_domain_busn_res(int domain_nr)
 	r->res.start = 0;
 	r->res.end = 0xff;
 	r->res.flags = IORESOURCE_BUS | IORESOURCE_PCI_FIXED;
+	INIT_LIST_HEAD(&r->res.child);
+	INIT_LIST_HEAD(&r->res.sibling);
 
 	list_add_tail(&r->list, &pci_domain_busn_res_list);
 
diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 79b1824e83b4..8e685af8938d 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -2107,7 +2107,7 @@ int pci_reassign_bridge_resources(struct pci_dev *bridge, unsigned long type)
 				continue;
 
 			/* Ignore BARs which are still in use */
-			if (res->child)
+			if (!list_empty(&res->child))
 				continue;
 
 			ret = add_to_list(&saved, bridge, res, 0, 0);
diff --git a/include/linux/ioport.h b/include/linux/ioport.h
index dfdcd0bfe54e..b7456ae889dd 100644
--- a/include/linux/ioport.h
+++ b/include/linux/ioport.h
@@ -12,6 +12,7 @@
 #ifndef __ASSEMBLY__
 #include <linux/compiler.h>
 #include <linux/types.h>
+#include <linux/list.h>
 /*
  * Resources are tree-like, allowing
  * nesting etc..
@@ -22,7 +23,8 @@ struct resource {
 	const char *name;
 	unsigned long flags;
 	unsigned long desc;
-	struct resource *parent, *sibling, *child;
+	struct list_head child, sibling;
+	struct resource *parent;
 };
 
 /*
@@ -216,7 +218,6 @@ static inline bool resource_contains(struct resource *r1, struct resource *r2)
 	return r1->start <= r2->start && r1->end >= r2->end;
 }
 
-
 /* Convenience shorthand with allocation */
 #define request_region(start,n,name)		__request_region(&ioport_resource, (start), (n), (name), 0)
 #define request_muxed_region(start,n,name)	__request_region(&ioport_resource, (start), (n), (name), IORESOURCE_MUXED)
@@ -287,6 +288,17 @@ static inline bool resource_overlaps(struct resource *r1, struct resource *r2)
        return (r1->start <= r2->end && r1->end >= r2->start);
 }
 
+static inline struct resource *resource_sibling(struct resource *res)
+{
+	if (res->parent && !list_is_last(&res->sibling, &res->parent->child))
+		return list_next_entry(res, sibling);
+	return NULL;
+}
+
+static inline struct resource *resource_first_child(struct list_head *head)
+{
+	return list_first_entry_or_null(head, struct resource, sibling);
+}
 
 #endif /* __ASSEMBLY__ */
 #endif	/* _LINUX_IOPORT_H */
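
With the two helpers in place, tree walks can use either the list.h
iterators or the old sibling-stepping style. A minimal sketch (assuming a
parent whose children were linked by the allocation paths, which set
->parent before resource_sibling() is relied on):

	struct resource *child;

	/* idiomatic list.h form */
	list_for_each_entry(child, &parent->child, sibling)
		pr_info("child %pR\n", child);

	/* or sibling by sibling, mirroring the old ->sibling pointer */
	for (child = resource_first_child(&parent->child); child;
	     child = resource_sibling(child))
		pr_info("child %pR\n", child);
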
diff --git a/kernel/resource.c b/kernel/resource.c
index 81ccd19c1d9f..c96e58d3d2f8 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -31,6 +31,8 @@ struct resource ioport_resource = {
 	.start	= 0,
 	.end	= IO_SPACE_LIMIT,
 	.flags	= IORESOURCE_IO,
+	.sibling = LIST_HEAD_INIT(ioport_resource.sibling),
+	.child  = LIST_HEAD_INIT(ioport_resource.child),
 };
 EXPORT_SYMBOL(ioport_resource);
 
@@ -39,6 +41,8 @@ struct resource iomem_resource = {
 	.start	= 0,
 	.end	= -1,
 	.flags	= IORESOURCE_MEM,
+	.sibling = LIST_HEAD_INIT(iomem_resource.sibling),
+	.child  = LIST_HEAD_INIT(iomem_resource.child),
 };
 EXPORT_SYMBOL(iomem_resource);
 
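
Any other statically defined root resource needs the same two initializers,
because a zeroed list_head makes list_empty() and the iterators misbehave.
The pattern for an arch- or driver-local root (names illustrative):

	static struct resource my_root_res = {
		.name	 = "my root",
		.start	 = 0,
		.end	 = 0xffffffff,
		.flags	 = IORESOURCE_MEM,
		.sibling = LIST_HEAD_INIT(my_root_res.sibling),
		.child	 = LIST_HEAD_INIT(my_root_res.child),
	};
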
@@ -57,20 +61,20 @@ static DEFINE_RWLOCK(resource_lock);
  * by boot mem after the system is up. So for reusing the resource entry
  * we need to remember the resource.
  */
-static struct resource *bootmem_resource_free;
+static struct list_head bootmem_resource_free = LIST_HEAD_INIT(bootmem_resource_free);
 static DEFINE_SPINLOCK(bootmem_resource_lock);
 
 static struct resource *next_resource(struct resource *p, bool sibling_only)
 {
 	/* Caller wants to traverse through siblings only */
 	if (sibling_only)
-		return p->sibling;
+		return resource_sibling(p);
 
-	if (p->child)
-		return p->child;
-	while (!p->sibling && p->parent)
+	if (!list_empty(&p->child))
+		return resource_first_child(&p->child);
+	while (!resource_sibling(p) && p->parent)
 		p = p->parent;
-	return p->sibling;
+	return resource_sibling(p);
 }
 
 static void *r_next(struct seq_file *m, void *v, loff_t *pos)
@@ -90,7 +94,7 @@ static void *r_start(struct seq_file *m, loff_t *pos)
 	struct resource *p = PDE_DATA(file_inode(m->file));
 	loff_t l = 0;
 	read_lock(&resource_lock);
-	for (p = p->child; p && l < *pos; p = r_next(m, p, &l))
+	for (p = resource_first_child(&p->child); p && l < *pos; p = r_next(m, p, &l))
 		;
 	return p;
 }
@@ -153,8 +157,7 @@ static void free_resource(struct resource *res)
 
 	if (!PageSlab(virt_to_head_page(res))) {
 		spin_lock(&bootmem_resource_lock);
-		res->sibling = bootmem_resource_free;
-		bootmem_resource_free = res;
+		list_add(&res->sibling, &bootmem_resource_free);
 		spin_unlock(&bootmem_resource_lock);
 	} else {
 		kfree(res);
@@ -166,10 +169,9 @@ static struct resource *alloc_resource(gfp_t flags)
 	struct resource *res = NULL;
 
 	spin_lock(&bootmem_resource_lock);
-	if (bootmem_resource_free) {
-		res = bootmem_resource_free;
-		bootmem_resource_free = res->sibling;
-	}
+	res = resource_first_child(&bootmem_resource_free);
+	if (res)
+		list_del(&res->sibling);
 	spin_unlock(&bootmem_resource_lock);
 
 	if (res)
@@ -177,6 +179,10 @@ static struct resource *alloc_resource(gfp_t flags)
 	else
 		res = kzalloc(sizeof(struct resource), flags);
 
+	if (res) {
+		INIT_LIST_HEAD(&res->child);
+		INIT_LIST_HEAD(&res->sibling);
+	}
 	return res;
 }
 
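
The bootmem free list thus becomes an ordinary LIFO built from list.h
primitives; the push/pop pair in isolation (a sketch, locking elided):

	/* free_resource(): push a recycled entry */
	list_add(&res->sibling, &bootmem_resource_free);

	/* alloc_resource(): pop one, or NULL when the free list is empty */
	res = resource_first_child(&bootmem_resource_free);
	if (res)
		list_del(&res->sibling);
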
@@ -185,7 +189,7 @@ static struct resource * __request_resource(struct resource *root, struct resour
 {
 	resource_size_t start = new->start;
 	resource_size_t end = new->end;
-	struct resource *tmp, **p;
+	struct resource *tmp;
 
 	if (end < start)
 		return root;
@@ -193,64 +197,62 @@ static struct resource * __request_resource(struct resource *root, struct resour
 		return root;
 	if (end > root->end)
 		return root;
-	p = &root->child;
-	for (;;) {
-		tmp = *p;
-		if (!tmp || tmp->start > end) {
-			new->sibling = tmp;
-			*p = new;
+
+	if (list_empty(&root->child)) {
+		list_add(&new->sibling, &root->child);
+		new->parent = root;
+		INIT_LIST_HEAD(&new->child);
+		return NULL;
+	}
+
+	list_for_each_entry(tmp, &root->child, sibling) {
+		if (tmp->start > end) {
+			list_add(&new->sibling, tmp->sibling.prev);
 			new->parent = root;
+			INIT_LIST_HEAD(&new->child);
 			return NULL;
 		}
-		p = &tmp->sibling;
 		if (tmp->end < start)
 			continue;
 		return tmp;
 	}
+
+	list_add_tail(&new->sibling, &root->child);
+	new->parent = root;
+	INIT_LIST_HEAD(&new->child);
+	return NULL;
 }
 
 static int __release_resource(struct resource *old, bool release_child)
 {
-	struct resource *tmp, **p, *chd;
+	struct resource *tmp, *next, *chd;
 
-	p = &old->parent->child;
-	for (;;) {
-		tmp = *p;
-		if (!tmp)
-			break;
+	list_for_each_entry_safe(tmp, next, &old->parent->child, sibling) {
 		if (tmp == old) {
-			if (release_child || !(tmp->child)) {
-				*p = tmp->sibling;
+			if (release_child || list_empty(&tmp->child)) {
+				list_del(&tmp->sibling);
 			} else {
-				for (chd = tmp->child;; chd = chd->sibling) {
+				list_for_each_entry(chd, &tmp->child, sibling)
 					chd->parent = tmp->parent;
-					if (!(chd->sibling))
-						break;
-				}
-				*p = tmp->child;
-				chd->sibling = tmp->sibling;
+				list_splice(&tmp->child, tmp->sibling.prev);
+				list_del(&tmp->sibling);
 			}
+
 			old->parent = NULL;
 			return 0;
 		}
-		p = &tmp->sibling;
 	}
 	return -EINVAL;
 }
 
 static void __release_child_resources(struct resource *r)
 {
-	struct resource *tmp, *p;
+	struct resource *tmp, *next;
 	resource_size_t size;
 
-	p = r->child;
-	r->child = NULL;
-	while (p) {
-		tmp = p;
-		p = p->sibling;
-
+	list_for_each_entry_safe(tmp, next, &r->child, sibling) {
 		tmp->parent = NULL;
-		tmp->sibling = NULL;
+		list_del_init(&tmp->sibling);
 		__release_child_resources(tmp);
 
 		printk(KERN_DEBUG "release child resource %pR\n", tmp);
@@ -343,7 +347,8 @@ static int find_next_iomem_res(struct resource *res, unsigned long desc,
 
 	read_lock(&resource_lock);
 
-	for (p = iomem_resource.child; p; p = next_resource(p, sibling_only)) {
+	for (p = resource_first_child(&iomem_resource.child); p;
+			p = next_resource(p, sibling_only)) {
 		if ((p->flags & res->flags) != res->flags)
 			continue;
 		if ((desc != IORES_DESC_NONE) && (desc != p->desc))
@@ -532,7 +537,7 @@ int region_intersects(resource_size_t start, size_t size, unsigned long flags,
 	struct resource *p;
 
 	read_lock(&resource_lock);
-	for (p = iomem_resource.child; p ; p = p->sibling) {
+	list_for_each_entry(p, &iomem_resource.child, sibling) {
 		bool is_type = (((p->flags & flags) == flags) &&
 				((desc == IORES_DESC_NONE) ||
 				 (desc == p->desc)));
@@ -586,7 +591,7 @@ static int __find_resource(struct resource *root, struct resource *old,
 			 resource_size_t  size,
 			 struct resource_constraint *constraint)
 {
-	struct resource *this = root->child;
+	struct resource *this = resource_first_child(&root->child);
 	struct resource tmp = *new, avail, alloc;
 
 	tmp.start = root->start;
@@ -596,7 +601,7 @@ static int __find_resource(struct resource *root, struct resource *old,
 	 */
 	if (this && this->start == root->start) {
 		tmp.start = (this == old) ? old->start : this->end + 1;
-		this = this->sibling;
+		this = resource_sibling(this);
 	}
 	for(;;) {
 		if (this)
@@ -632,7 +637,7 @@ next:		if (!this || this->end == root->end)
 
 		if (this != old)
 			tmp.start = this->end + 1;
-		this = this->sibling;
+		this = resource_sibling(this);
 	}
 	return -EBUSY;
 }
@@ -676,7 +681,7 @@ static int reallocate_resource(struct resource *root, struct resource *old,
 		goto out;
 	}
 
-	if (old->child) {
+	if (!list_empty(&old->child)) {
 		err = -EBUSY;
 		goto out;
 	}
@@ -757,7 +762,7 @@ struct resource *lookup_resource(struct resource *root, resource_size_t start)
 	struct resource *res;
 
 	read_lock(&resource_lock);
-	for (res = root->child; res; res = res->sibling) {
+	list_for_each_entry(res, &root->child, sibling) {
 		if (res->start == start)
 			break;
 	}
@@ -790,32 +795,27 @@ static struct resource * __insert_resource(struct resource *parent, struct resou
 			break;
 	}
 
-	for (next = first; ; next = next->sibling) {
+	for (next = first; ; next = resource_sibling(next)) {
 		/* Partial overlap? Bad, and unfixable */
 		if (next->start < new->start || next->end > new->end)
 			return next;
-		if (!next->sibling)
+		if (!resource_sibling(next))
 			break;
-		if (next->sibling->start > new->end)
+		if (resource_sibling(next)->start > new->end)
 			break;
 	}
-
 	new->parent = parent;
-	new->sibling = next->sibling;
-	new->child = first;
+	list_add(&new->sibling, &next->sibling);
+	INIT_LIST_HEAD(&new->child);
 
-	next->sibling = NULL;
-	for (next = first; next; next = next->sibling)
+	/*
+	 * Everything from first through next falls inside new's range, so
+	 * move those entries under new as its children.
+	 */
+	list_cut_position(&new->child, first->sibling.prev, &next->sibling);
+	list_for_each_entry(next, &new->child, sibling)
 		next->parent = new;
 
-	if (parent->child == first) {
-		parent->child = new;
-	} else {
-		next = parent->child;
-		while (next->sibling != first)
-			next = next->sibling;
-		next->sibling = new;
-	}
 	return NULL;
 }
 
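
list_cut_position() carries most of the weight in the two re-parenting
paths; a sketch of its effect here, with illustrative entries:

	/*
	 * Before: parent->child = { A, first, ..., next, new, B }, where
	 * new has just been list_add()ed directly after next.
	 *
	 * list_cut_position(&new->child, first->sibling.prev, &next->sibling)
	 * detaches the run first..next from parent's list and splices it
	 * onto new->child, leaving parent->child = { A, new, B }.
	 */
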
@@ -937,19 +937,17 @@ static int __adjust_resource(struct resource *res, resource_size_t start,
 	if ((start < parent->start) || (end > parent->end))
 		goto out;
 
-	if (res->sibling && (res->sibling->start <= end))
+	if (resource_sibling(res) && (resource_sibling(res)->start <= end))
 		goto out;
 
-	tmp = parent->child;
-	if (tmp != res) {
-		while (tmp->sibling != res)
-			tmp = tmp->sibling;
+	if (res->sibling.prev != &parent->child) {
+		tmp = list_prev_entry(res, sibling);
 		if (start <= tmp->end)
 			goto out;
 	}
 
 skip:
-	for (tmp = res->child; tmp; tmp = tmp->sibling)
+	list_for_each_entry(tmp, &res->child, sibling)
 		if ((tmp->start < start) || (tmp->end > end))
 			goto out;
 
@@ -996,27 +994,30 @@ EXPORT_SYMBOL(adjust_resource);
  */
 int reparent_resources(struct resource *parent, struct resource *res)
 {
-	struct resource *p, **pp;
-	struct resource **firstpp = NULL;
+	struct resource *p, *first = NULL;
 
-	for (pp = &parent->child; (p = *pp) != NULL; pp = &p->sibling) {
+	list_for_each_entry(p, &parent->child, sibling) {
 		if (p->end < res->start)
 			continue;
 		if (res->end < p->start)
 			break;
 		if (p->start < res->start || p->end > res->end)
 			return -ENOTSUPP;	/* not completely contained */
-		if (firstpp == NULL)
-			firstpp = pp;
+		if (first == NULL)
+			first = p;
 	}
-	if (firstpp == NULL)
+	if (first == NULL)
 		return -ECANCELED; /* didn't find any conflicting entries? */
 	res->parent = parent;
-	res->child = *firstpp;
-	res->sibling = *pp;
-	*firstpp = res;
-	*pp = NULL;
-	for (p = res->child; p != NULL; p = p->sibling) {
+	list_add(&res->sibling, p->sibling.prev);
+	INIT_LIST_HEAD(&res->child);
+
+	/*
+	 * Everything from first through p's previous sibling falls inside
+	 * res's range, so move those entries under res as its children.
+	 */
+	list_cut_position(&res->child, first->sibling.prev, res->sibling.prev);
+	list_for_each_entry(p, &res->child, sibling) {
 		p->parent = res;
 		pr_debug("PCI: Reparented %s %pR under %s\n",
 			 p->name, p, res->name);
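
A subtlety in the hunk above: when list_for_each_entry() runs to completion
without breaking, p is not a valid resource -- it is the list head re-cast
through container_of(), and only its embedded list_head may be touched.
A sketch of the invariant being relied on:

	/* after the loop falls off the end: */
	WARN_ON(&p->sibling != &parent->child);

	/* so this appends res after the last child ... */
	list_add(&res->sibling, p->sibling.prev);

	/* ... while p->start, p->end etc. must not be read here */
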
@@ -1216,34 +1217,32 @@ EXPORT_SYMBOL(__request_region);
 void __release_region(struct resource *parent, resource_size_t start,
 			resource_size_t n)
 {
-	struct resource **p;
+	struct resource *res;
 	resource_size_t end;
 
-	p = &parent->child;
 	end = start + n - 1;
 
 	write_lock(&resource_lock);
 
+	res = resource_first_child(&parent->child);
 	for (;;) {
-		struct resource *res = *p;
-
 		if (!res)
 			break;
 		if (res->start <= start && res->end >= end) {
 			if (!(res->flags & IORESOURCE_BUSY)) {
-				p = &res->child;
+				res = resource_first_child(&res->child);
 				continue;
 			}
 			if (res->start != start || res->end != end)
 				break;
-			*p = res->sibling;
+			list_del(&res->sibling);
 			write_unlock(&resource_lock);
 			if (res->flags & IORESOURCE_MUXED)
 				wake_up(&muxed_resource_wait);
 			free_resource(res);
 			return;
 		}
-		p = &res->sibling;
+		res = resource_sibling(res);
 	}
 
 	write_unlock(&resource_lock);
@@ -1278,9 +1277,7 @@ EXPORT_SYMBOL(__release_region);
 int release_mem_region_adjustable(struct resource *parent,
 			resource_size_t start, resource_size_t size)
 {
-	struct resource **p;
-	struct resource *res;
-	struct resource *new_res;
+	struct resource *res, *new_res;
 	resource_size_t end;
 	int ret = -EINVAL;
 
@@ -1291,16 +1288,16 @@ int release_mem_region_adjustable(struct resource *parent,
 	/* The alloc_resource() result gets checked later */
 	new_res = alloc_resource(GFP_KERNEL);
 
-	p = &parent->child;
 	write_lock(&resource_lock);
 
-	while ((res = *p)) {
+	res = resource_first_child(&parent->child);
+	while (res) {
 		if (res->start >= end)
 			break;
 
 		/* look for the next resource if it does not fit into */
 		if (res->start > start || res->end < end) {
-			p = &res->sibling;
+			res = resource_sibling(res);
 			continue;
 		}
 
@@ -1308,14 +1305,14 @@ int release_mem_region_adjustable(struct resource *parent,
 			break;
 
 		if (!(res->flags & IORESOURCE_BUSY)) {
-			p = &res->child;
+			res = resource_first_child(&res->child);
 			continue;
 		}
 
 		/* found the target resource; let's adjust accordingly */
 		if (res->start == start && res->end == end) {
 			/* free the whole entry */
-			*p = res->sibling;
+			list_del(&res->sibling);
 			free_resource(res);
 			ret = 0;
 		} else if (res->start == start && res->end != end) {
@@ -1338,14 +1335,13 @@ int release_mem_region_adjustable(struct resource *parent,
 			new_res->flags = res->flags;
 			new_res->desc = res->desc;
 			new_res->parent = res->parent;
-			new_res->sibling = res->sibling;
-			new_res->child = NULL;
+			INIT_LIST_HEAD(&new_res->child);
 
 			ret = __adjust_resource(res, res->start,
 						start - res->start);
 			if (ret)
 				break;
-			res->sibling = new_res;
+			list_add(&new_res->sibling, &res->sibling);
 			new_res = NULL;
 		}
 
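
The middle-split case above is the one spot where insertion is relative to
an existing entry rather than to a list head; the before/after as a sketch
(illustrative ranges):

	/*
	 * Releasing [start, end] out of the middle of res:
	 *
	 *   before: res     = [res->start ................. res->end]
	 *   after:  res     = [res->start, start - 1]  (shrunk in place)
	 *           new_res = [end + 1, old res->end]   (carved-off tail)
	 *
	 * list_add(&new_res->sibling, &res->sibling) then places the tail
	 * immediately after the shrunk entry, keeping siblings sorted.
	 */
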
@@ -1526,7 +1522,7 @@ static int __init reserve_setup(char *str)
 			res->end = io_start + io_num - 1;
 			res->flags |= IORESOURCE_BUSY;
 			res->desc = IORES_DESC_NONE;
-			res->child = NULL;
+			INIT_LIST_HEAD(&res->child);
 			if (request_resource(parent, res) == 0)
 				reserved = x+1;
 		}
@@ -1546,7 +1542,7 @@ int iomem_map_sanity_check(resource_size_t addr, unsigned long size)
 	loff_t l;
 
 	read_lock(&resource_lock);
-	for (p = p->child; p ; p = r_next(NULL, p, &l)) {
+	for (p = resource_first_child(&p->child); p; p = r_next(NULL, p, &l)) {
 		/*
 		 * We can probably skip the resources without
 		 * IORESOURCE_IO attribute?
@@ -1602,7 +1598,7 @@ bool iomem_is_exclusive(u64 addr)
 	addr = addr & PAGE_MASK;
 
 	read_lock(&resource_lock);
-	for (p = p->child; p ; p = r_next(NULL, p, &l)) {
+	for (p = resource_first_child(&p->child); p; p = r_next(NULL, p, &l)) {
 		/*
 		 * We can probably skip the resources without
 		 * IORESOURCE_IO attribute?
-- 
2.13.6


^ permalink raw reply related	[flat|nested] 83+ messages in thread

LCBmaXJzdC0+c2libGluZy5wcmV2LCByZXMtPnNpYmxpbmcucHJldik7CisJbGlzdF9mb3JfZWFj
aF9lbnRyeShwLCAmcmVzLT5jaGlsZCwgc2libGluZykgewogCQlwLT5wYXJlbnQgPSByZXM7CiAJ
CXByX2RlYnVnKCJQQ0k6IFJlcGFyZW50ZWQgJXMgJXBSIHVuZGVyICVzXG4iLAogCQkJIHAtPm5h
bWUsIHAsIHJlcy0+bmFtZSk7CkBAIC0xMjE2LDM0ICsxMjE3LDMyIEBAIEVYUE9SVF9TWU1CT0wo
X19yZXF1ZXN0X3JlZ2lvbik7CiB2b2lkIF9fcmVsZWFzZV9yZWdpb24oc3RydWN0IHJlc291cmNl
ICpwYXJlbnQsIHJlc291cmNlX3NpemVfdCBzdGFydCwKIAkJCXJlc291cmNlX3NpemVfdCBuKQog
ewotCXN0cnVjdCByZXNvdXJjZSAqKnA7CisJc3RydWN0IHJlc291cmNlICpyZXM7CiAJcmVzb3Vy
Y2Vfc2l6ZV90IGVuZDsKIAotCXAgPSAmcGFyZW50LT5jaGlsZDsKKwlyZXMgPSByZXNvdXJjZV9m
aXJzdF9jaGlsZCgmcGFyZW50LT5jaGlsZCk7CiAJZW5kID0gc3RhcnQgKyBuIC0gMTsKIAogCXdy
aXRlX2xvY2soJnJlc291cmNlX2xvY2spOwogCiAJZm9yICg7OykgewotCQlzdHJ1Y3QgcmVzb3Vy
Y2UgKnJlcyA9ICpwOwotCiAJCWlmICghcmVzKQogCQkJYnJlYWs7CiAJCWlmIChyZXMtPnN0YXJ0
IDw9IHN0YXJ0ICYmIHJlcy0+ZW5kID49IGVuZCkgewogCQkJaWYgKCEocmVzLT5mbGFncyAmIElP
UkVTT1VSQ0VfQlVTWSkpIHsKLQkJCQlwID0gJnJlcy0+Y2hpbGQ7CisJCQkJcmVzID0gcmVzb3Vy
Y2VfZmlyc3RfY2hpbGQoJnJlcy0+Y2hpbGQpOwogCQkJCWNvbnRpbnVlOwogCQkJfQogCQkJaWYg
KHJlcy0+c3RhcnQgIT0gc3RhcnQgfHwgcmVzLT5lbmQgIT0gZW5kKQogCQkJCWJyZWFrOwotCQkJ
KnAgPSByZXMtPnNpYmxpbmc7CisJCQlsaXN0X2RlbCgmcmVzLT5zaWJsaW5nKTsKIAkJCXdyaXRl
X3VubG9jaygmcmVzb3VyY2VfbG9jayk7CiAJCQlpZiAocmVzLT5mbGFncyAmIElPUkVTT1VSQ0Vf
TVVYRUQpCiAJCQkJd2FrZV91cCgmbXV4ZWRfcmVzb3VyY2Vfd2FpdCk7CiAJCQlmcmVlX3Jlc291
cmNlKHJlcyk7CiAJCQlyZXR1cm47CiAJCX0KLQkJcCA9ICZyZXMtPnNpYmxpbmc7CisJCXJlcyA9
IHJlc291cmNlX3NpYmxpbmcocmVzKTsKIAl9CiAKIAl3cml0ZV91bmxvY2soJnJlc291cmNlX2xv
Y2spOwpAQCAtMTI3OCw5ICsxMjc3LDcgQEAgRVhQT1JUX1NZTUJPTChfX3JlbGVhc2VfcmVnaW9u
KTsKIGludCByZWxlYXNlX21lbV9yZWdpb25fYWRqdXN0YWJsZShzdHJ1Y3QgcmVzb3VyY2UgKnBh
cmVudCwKIAkJCXJlc291cmNlX3NpemVfdCBzdGFydCwgcmVzb3VyY2Vfc2l6ZV90IHNpemUpCiB7
Ci0Jc3RydWN0IHJlc291cmNlICoqcDsKLQlzdHJ1Y3QgcmVzb3VyY2UgKnJlczsKLQlzdHJ1Y3Qg
cmVzb3VyY2UgKm5ld19yZXM7CisJc3RydWN0IHJlc291cmNlICpyZXMsICpuZXdfcmVzOwogCXJl
c291cmNlX3NpemVfdCBlbmQ7CiAJaW50IHJldCA9IC1FSU5WQUw7CiAKQEAgLTEyOTEsMTYgKzEy
ODgsMTYgQEAgaW50IHJlbGVhc2VfbWVtX3JlZ2lvbl9hZGp1c3RhYmxlKHN0cnVjdCByZXNvdXJj
ZSAqcGFyZW50LAogCS8qIFRoZSBhbGxvY19yZXNvdXJjZSgpIHJlc3VsdCBnZXRzIGNoZWNrZWQg
bGF0ZXIgKi8KIAluZXdfcmVzID0gYWxsb2NfcmVzb3VyY2UoR0ZQX0tFUk5FTCk7CiAKLQlwID0g
JnBhcmVudC0+Y2hpbGQ7CisJcmVzID0gcmVzb3VyY2VfZmlyc3RfY2hpbGQoJnBhcmVudC0+Y2hp
bGQpOwogCXdyaXRlX2xvY2soJnJlc291cmNlX2xvY2spOwogCi0Jd2hpbGUgKChyZXMgPSAqcCkp
IHsKKwl3aGlsZSAoKHJlcykpIHsKIAkJaWYgKHJlcy0+c3RhcnQgPj0gZW5kKQogCQkJYnJlYWs7
CiAKIAkJLyogbG9vayBmb3IgdGhlIG5leHQgcmVzb3VyY2UgaWYgaXQgZG9lcyBub3QgZml0IGlu
dG8gKi8KIAkJaWYgKHJlcy0+c3RhcnQgPiBzdGFydCB8fCByZXMtPmVuZCA8IGVuZCkgewotCQkJ
cCA9ICZyZXMtPnNpYmxpbmc7CisJCQlyZXMgPSByZXNvdXJjZV9zaWJsaW5nKHJlcyk7CiAJCQlj
b250aW51ZTsKIAkJfQogCkBAIC0xMzA4LDE0ICsxMzA1LDE0IEBAIGludCByZWxlYXNlX21lbV9y
ZWdpb25fYWRqdXN0YWJsZShzdHJ1Y3QgcmVzb3VyY2UgKnBhcmVudCwKIAkJCWJyZWFrOwogCiAJ
CWlmICghKHJlcy0+ZmxhZ3MgJiBJT1JFU09VUkNFX0JVU1kpKSB7Ci0JCQlwID0gJnJlcy0+Y2hp
bGQ7CisJCQlyZXMgPSByZXNvdXJjZV9maXJzdF9jaGlsZCgmcmVzLT5jaGlsZCk7CiAJCQljb250
aW51ZTsKIAkJfQogCiAJCS8qIGZvdW5kIHRoZSB0YXJnZXQgcmVzb3VyY2U7IGxldCdzIGFkanVz
dCBhY2NvcmRpbmdseSAqLwogCQlpZiAocmVzLT5zdGFydCA9PSBzdGFydCAmJiByZXMtPmVuZCA9
PSBlbmQpIHsKIAkJCS8qIGZyZWUgdGhlIHdob2xlIGVudHJ5ICovCi0JCQkqcCA9IHJlcy0+c2li
bGluZzsKKwkJCWxpc3RfZGVsKCZyZXMtPnNpYmxpbmcpOwogCQkJZnJlZV9yZXNvdXJjZShyZXMp
OwogCQkJcmV0ID0gMDsKIAkJfSBlbHNlIGlmIChyZXMtPnN0YXJ0ID09IHN0YXJ0ICYmIHJlcy0+
ZW5kICE9IGVuZCkgewpAQCAtMTMzOCwxNCArMTMzNSwxMyBAQCBpbnQgcmVsZWFzZV9tZW1fcmVn
aW9uX2FkanVzdGFibGUoc3RydWN0IHJlc291cmNlICpwYXJlbnQsCiAJCQluZXdfcmVzLT5mbGFn
cyA9IHJlcy0+ZmxhZ3M7CiAJCQluZXdfcmVzLT5kZXNjID0gcmVzLT5kZXNjOwogCQkJbmV3X3Jl
cy0+cGFyZW50ID0gcmVzLT5wYXJlbnQ7Ci0JCQluZXdfcmVzLT5zaWJsaW5nID0gcmVzLT5zaWJs
aW5nOwotCQkJbmV3X3Jlcy0+Y2hpbGQgPSBOVUxMOworCQkJSU5JVF9MSVNUX0hFQUQoJm5ld19y
ZXMtPmNoaWxkKTsKIAogCQkJcmV0ID0gX19hZGp1c3RfcmVzb3VyY2UocmVzLCByZXMtPnN0YXJ0
LAogCQkJCQkJc3RhcnQgLSByZXMtPnN0YXJ0KTsKIAkJCWlmIChyZXQpCiAJCQkJYnJlYWs7Ci0J
CQlyZXMtPnNpYmxpbmcgPSBuZXdfcmVzOworCQkJbGlzdF9hZGQoJm5ld19yZXMtPnNpYmxpbmcs
ICZyZXMtPnNpYmxpbmcpOwogCQkJbmV3X3JlcyA9IE5VTEw7CiAJCX0KIApAQCAtMTUyNiw3ICsx
NTIyLDcgQEAgc3RhdGljIGludCBfX2luaXQgcmVzZXJ2ZV9zZXR1cChjaGFyICpzdHIpCiAJCQly
ZXMtPmVuZCA9IGlvX3N0YXJ0ICsgaW9fbnVtIC0gMTsKIAkJCXJlcy0+ZmxhZ3MgfD0gSU9SRVNP
VVJDRV9CVVNZOwogCQkJcmVzLT5kZXNjID0gSU9SRVNfREVTQ19OT05FOwotCQkJcmVzLT5jaGls
ZCA9IE5VTEw7CisJCQlJTklUX0xJU1RfSEVBRCgmcmVzLT5jaGlsZCk7CiAJCQlpZiAocmVxdWVz
dF9yZXNvdXJjZShwYXJlbnQsIHJlcykgPT0gMCkKIAkJCQlyZXNlcnZlZCA9IHgrMTsKIAkJfQpA
QCAtMTU0Niw3ICsxNTQyLDcgQEAgaW50IGlvbWVtX21hcF9zYW5pdHlfY2hlY2socmVzb3VyY2Vf
c2l6ZV90IGFkZHIsIHVuc2lnbmVkIGxvbmcgc2l6ZSkKIAlsb2ZmX3QgbDsKIAogCXJlYWRfbG9j
aygmcmVzb3VyY2VfbG9jayk7Ci0JZm9yIChwID0gcC0+Y2hpbGQ7IHAgOyBwID0gcl9uZXh0KE5V
TEwsIHAsICZsKSkgeworCWZvciAocCA9IHJlc291cmNlX2ZpcnN0X2NoaWxkKCZwLT5jaGlsZCk7
IHA7IHAgPSByX25leHQoTlVMTCwgcCwgJmwpKSB7CiAJCS8qCiAJCSAqIFdlIGNhbiBwcm9iYWJs
eSBza2lwIHRoZSByZXNvdXJjZXMgd2l0aG91dAogCQkgKiBJT1JFU09VUkNFX0lPIGF0dHJpYnV0
ZT8KQEAgLTE2MDIsNyArMTU5OCw3IEBAIGJvb2wgaW9tZW1faXNfZXhjbHVzaXZlKHU2NCBhZGRy
KQogCWFkZHIgPSBhZGRyICYgUEFHRV9NQVNLOwogCiAJcmVhZF9sb2NrKCZyZXNvdXJjZV9sb2Nr
KTsKLQlmb3IgKHAgPSBwLT5jaGlsZDsgcCA7IHAgPSByX25leHQoTlVMTCwgcCwgJmwpKSB7CisJ
Zm9yIChwID0gcmVzb3VyY2VfZmlyc3RfY2hpbGQoJnAtPmNoaWxkKTsgcDsgcCA9IHJfbmV4dChO
VUxMLCBwLCAmbCkpIHsKIAkJLyoKIAkJICogV2UgY2FuIHByb2JhYmx5IHNraXAgdGhlIHJlc291
cmNlcyB3aXRob3V0CiAJCSAqIElPUkVTT1VSQ0VfSU8gYXR0cmlidXRlPwotLSAKMi4xMy42Cgpf
X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fXwpMaW51eC1udmRp
bW0gbWFpbGluZyBsaXN0CkxpbnV4LW52ZGltbUBsaXN0cy4wMS5vcmcKaHR0cHM6Ly9saXN0cy4w
MS5vcmcvbWFpbG1hbi9saXN0aW5mby9saW51eC1udmRpbW0K

^ permalink raw reply	[flat|nested] 83+ messages in thread

* [PATCH v7 2/4] resource: Use list_head to link sibling resource
@ 2018-07-18  2:49   ` Baoquan He
  0 siblings, 0 replies; 83+ messages in thread
From: Baoquan He @ 2018-07-18  2:49 UTC (permalink / raw)
  To: linux-kernel, akpm, robh+dt, dan.j.williams, nicolas.pitre, josh,
	fengguang.wu, bp, andy.shevchenko
  Cc: patrik.r.jakobsson, airlied, kys, haiyangz, sthemmin,
	dmitry.torokhov, frowand.list, keith.busch, jonathan.derrick,
	lorenzo.pieralisi, bhelgaas, tglx, brijesh.singh, jglisse,
	thomas.lendacky, gregkh, baiyaowei, richard.weiyang, devel,
	linux-input, linux-nvdimm, devicetree, linux-pci, ebiederm,
	vgoyal, dyoung, yinghai, monstr, davem, chris, jcmvbkbc, gustavo,
	maarten.lankhorst, seanpaul, linux-parisc, linuxppc-dev,
	Baoquan He, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, linux-mips

struct resource currently uses a singly linked list to link siblings,
implemented with bare pointer operations. Replace it with list_head for
better code readability.

Based on this list_head replacement, a later patch can easily iterate
iomem_resource's sibling list in reverse order.

Besides, the types of the struct resource members sibling and child change
from 'struct resource *' to 'struct list_head', which grows struct resource
by two pointers.
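
A minimal sketch of what the conversion means for a typical caller
(illustrative only, not part of the patch; count_children() is an
invented helper):

#include <linux/ioport.h>
#include <linux/list.h>
#include <linux/printk.h>

static void count_children(struct resource *res)
{
	struct resource *p;
	int n = 0;

	/* Before: for (p = res->child; p; p = p->sibling) n++; */

	/* After: the same walk as a standard list_head iteration. */
	list_for_each_entry(p, &res->child, sibling)
		n++;

	pr_info("%s has %d children\n", res->name, n);
}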

Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Baoquan He <bhe@redhat.com>
Cc: Patrik Jakobsson <patrik.r.jakobsson@gmail.com>
Cc: David Airlie <airlied@linux.ie>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Frank Rowand <frowand.list@gmail.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Jonathan Derrick <jonathan.derrick@intel.com>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Yaowei Bai <baiyaowei@cmss.chinamobile.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: devel@linuxdriverproject.org
Cc: linux-input@vger.kernel.org
Cc: linux-nvdimm@lists.01.org
Cc: devicetree@vger.kernel.org
Cc: linux-pci@vger.kernel.org
Cc: Michal Simek <monstr@monstr.eu>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linux-mips@linux-mips.org
---
 arch/arm/plat-samsung/pm-check.c            |   6 +-
 arch/ia64/sn/kernel/io_init.c               |   2 +-
 arch/microblaze/pci/pci-common.c            |   4 +-
 arch/mips/pci/pci-rc32434.c                 |  12 +-
 arch/powerpc/kernel/pci-common.c            |   4 +-
 arch/sparc/kernel/ioport.c                  |   2 +-
 arch/xtensa/include/asm/pci-bridge.h        |   4 +-
 drivers/eisa/eisa-bus.c                     |   2 +
 drivers/gpu/drm/drm_memory.c                |   3 +-
 drivers/gpu/drm/gma500/gtt.c                |   5 +-
 drivers/hv/vmbus_drv.c                      |  52 +++----
 drivers/input/joystick/iforce/iforce-main.c |   4 +-
 drivers/nvdimm/namespace_devs.c             |   6 +-
 drivers/nvdimm/nd.h                         |   5 +-
 drivers/of/address.c                        |   4 +-
 drivers/parisc/lba_pci.c                    |   4 +-
 drivers/pci/controller/vmd.c                |   8 +-
 drivers/pci/probe.c                         |   2 +
 drivers/pci/setup-bus.c                     |   2 +-
 include/linux/ioport.h                      |  17 ++-
 kernel/resource.c                           | 206 ++++++++++++++--------------
 21 files changed, 183 insertions(+), 171 deletions(-)
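
As context for the include/linux/ioport.h hunk below, a hedged sketch of
how the two new helpers keep the old NULL-terminated traversal shape
(walk_tree() is an invented name, not from the patch):

static void walk_tree(struct resource *root, int depth)
{
	struct resource *p = resource_first_child(&root->child);

	/*
	 * resource_first_child() returns NULL for an empty child list and
	 * resource_sibling() returns NULL past the last sibling, so the
	 * pre-conversion loop shape still works.
	 */
	while (p) {
		pr_info("%*s%pR\n", depth * 2, "", p);
		walk_tree(p, depth + 1);	/* recurse into children */
		p = resource_sibling(p);	/* NULL at end of list */
	}
}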

diff --git a/arch/arm/plat-samsung/pm-check.c b/arch/arm/plat-samsung/pm-check.c
index cd2c02c68bc3..5494355b1c49 100644
--- a/arch/arm/plat-samsung/pm-check.c
+++ b/arch/arm/plat-samsung/pm-check.c
@@ -46,8 +46,8 @@ typedef u32 *(run_fn_t)(struct resource *ptr, u32 *arg);
 static void s3c_pm_run_res(struct resource *ptr, run_fn_t fn, u32 *arg)
 {
 	while (ptr != NULL) {
-		if (ptr->child != NULL)
-			s3c_pm_run_res(ptr->child, fn, arg);
+		if (!list_empty(&ptr->child))
+			s3c_pm_run_res(resource_first_child(&ptr->child), fn, arg);
 
 		if ((ptr->flags & IORESOURCE_SYSTEM_RAM)
 				== IORESOURCE_SYSTEM_RAM) {
@@ -57,7 +57,7 @@ static void s3c_pm_run_res(struct resource *ptr, run_fn_t fn, u32 *arg)
 			arg = (fn)(ptr, arg);
 		}
 
-		ptr = ptr->sibling;
+		ptr = resource_sibling(ptr);
 	}
 }
 
diff --git a/arch/ia64/sn/kernel/io_init.c b/arch/ia64/sn/kernel/io_init.c
index d63809a6adfa..338a7b7f194d 100644
--- a/arch/ia64/sn/kernel/io_init.c
+++ b/arch/ia64/sn/kernel/io_init.c
@@ -192,7 +192,7 @@ sn_io_slot_fixup(struct pci_dev *dev)
 		 * if it's already in the device structure, remove it before
 		 * inserting
 		 */
-		if (res->parent && res->parent->child)
+		if (res->parent && !list_empty(&res->parent->child))
 			release_resource(res);
 
 		if (res->flags & IORESOURCE_IO)
diff --git a/arch/microblaze/pci/pci-common.c b/arch/microblaze/pci/pci-common.c
index 7899bafab064..2bf73e27e231 100644
--- a/arch/microblaze/pci/pci-common.c
+++ b/arch/microblaze/pci/pci-common.c
@@ -533,7 +533,9 @@ void pci_process_bridge_OF_ranges(struct pci_controller *hose,
 			res->flags = range.flags;
 			res->start = range.cpu_addr;
 			res->end = range.cpu_addr + range.size - 1;
-			res->parent = res->child = res->sibling = NULL;
+			res->parent = NULL;
+			INIT_LIST_HEAD(&res->child);
+			INIT_LIST_HEAD(&res->sibling);
 		}
 	}
 
diff --git a/arch/mips/pci/pci-rc32434.c b/arch/mips/pci/pci-rc32434.c
index 7f6ce6d734c0..e80283df7925 100644
--- a/arch/mips/pci/pci-rc32434.c
+++ b/arch/mips/pci/pci-rc32434.c
@@ -53,8 +53,8 @@ static struct resource rc32434_res_pci_mem1 = {
 	.start = 0x50000000,
 	.end = 0x5FFFFFFF,
 	.flags = IORESOURCE_MEM,
-	.sibling = NULL,
-	.child = &rc32434_res_pci_mem2
+	.sibling = LIST_HEAD_INIT(rc32434_res_pci_mem1.sibling),
+	.child = LIST_HEAD_INIT(rc32434_res_pci_mem1.child),
 };
 
 static struct resource rc32434_res_pci_mem2 = {
@@ -63,8 +63,8 @@ static struct resource rc32434_res_pci_mem2 = {
 	.end = 0x6FFFFFFF,
 	.flags = IORESOURCE_MEM,
 	.parent = &rc32434_res_pci_mem1,
-	.sibling = NULL,
-	.child = NULL
+	.sibling = LIST_HEAD_INIT(rc32434_res_pci_mem2.sibling),
+	.child = LIST_HEAD_INIT(rc32434_res_pci_mem2.child),
 };
 
 static struct resource rc32434_res_pci_io1 = {
@@ -72,6 +72,8 @@ static struct resource rc32434_res_pci_io1 = {
 	.start = 0x18800000,
 	.end = 0x188FFFFF,
 	.flags = IORESOURCE_IO,
+	.sibling = LIST_HEAD_INIT(rc32434_res_pci_io1.sibling),
+	.child = LIST_HEAD_INIT(rc32434_res_pci_io1.child),
 };
 
 extern struct pci_ops rc32434_pci_ops;
@@ -208,6 +210,8 @@ static int __init rc32434_pci_init(void)
 
 	pr_info("PCI: Initializing PCI\n");
 
+	list_add(&rc32434_res_pci_mem2.sibling, &rc32434_res_pci_mem1.child);
+
 	ioport_resource.start = rc32434_res_pci_io1.start;
 	ioport_resource.end = rc32434_res_pci_io1.end;
 
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index 926035bb378d..28fbe83c9daf 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -761,7 +761,9 @@ void pci_process_bridge_OF_ranges(struct pci_controller *hose,
 			res->flags = range.flags;
 			res->start = range.cpu_addr;
 			res->end = range.cpu_addr + range.size - 1;
-			res->parent = res->child = res->sibling = NULL;
+			res->parent = NULL;
+			INIT_LIST_HEAD(&res->child);
+			INIT_LIST_HEAD(&res->sibling);
 		}
 	}
 }
diff --git a/arch/sparc/kernel/ioport.c b/arch/sparc/kernel/ioport.c
index cca9134cfa7d..99efe4e98b16 100644
--- a/arch/sparc/kernel/ioport.c
+++ b/arch/sparc/kernel/ioport.c
@@ -669,7 +669,7 @@ static int sparc_io_proc_show(struct seq_file *m, void *v)
 	struct resource *root = m->private, *r;
 	const char *nm;
 
-	for (r = root->child; r != NULL; r = r->sibling) {
+	list_for_each_entry(r, &root->child, sibling) {
 		if ((nm = r->name) == NULL) nm = "???";
 		seq_printf(m, "%016llx-%016llx: %s\n",
 				(unsigned long long)r->start,
diff --git a/arch/xtensa/include/asm/pci-bridge.h b/arch/xtensa/include/asm/pci-bridge.h
index 0b68c76ec1e6..f487b06817df 100644
--- a/arch/xtensa/include/asm/pci-bridge.h
+++ b/arch/xtensa/include/asm/pci-bridge.h
@@ -71,8 +71,8 @@ static inline void pcibios_init_resource(struct resource *res,
 	res->flags = flags;
 	res->name = name;
 	res->parent = NULL;
-	res->sibling = NULL;
-	res->child = NULL;
+	INIT_LIST_HEAD(&res->child);
+	INIT_LIST_HEAD(&res->sibling);
 }
 
 
diff --git a/drivers/eisa/eisa-bus.c b/drivers/eisa/eisa-bus.c
index 1e8062f6dbfc..dba78f75fd06 100644
--- a/drivers/eisa/eisa-bus.c
+++ b/drivers/eisa/eisa-bus.c
@@ -408,6 +408,8 @@ static struct resource eisa_root_res = {
 	.start = 0,
 	.end   = 0xffffffff,
 	.flags = IORESOURCE_IO,
+	.sibling = LIST_HEAD_INIT(eisa_root_res.sibling),
+	.child  = LIST_HEAD_INIT(eisa_root_res.child),
 };
 
 static int eisa_bus_count;
diff --git a/drivers/gpu/drm/drm_memory.c b/drivers/gpu/drm/drm_memory.c
index d69e4fc1ee77..33baa7fa5e41 100644
--- a/drivers/gpu/drm/drm_memory.c
+++ b/drivers/gpu/drm/drm_memory.c
@@ -155,9 +155,8 @@ u64 drm_get_max_iomem(void)
 	struct resource *tmp;
 	resource_size_t max_iomem = 0;
 
-	for (tmp = iomem_resource.child; tmp; tmp = tmp->sibling) {
+	list_for_each_entry(tmp, &iomem_resource.child, sibling)
 		max_iomem = max(max_iomem,  tmp->end);
-	}
 
 	return max_iomem;
 }
diff --git a/drivers/gpu/drm/gma500/gtt.c b/drivers/gpu/drm/gma500/gtt.c
index 3949b0990916..addd3bc009af 100644
--- a/drivers/gpu/drm/gma500/gtt.c
+++ b/drivers/gpu/drm/gma500/gtt.c
@@ -565,7 +565,7 @@ int psb_gtt_init(struct drm_device *dev, int resume)
 int psb_gtt_restore(struct drm_device *dev)
 {
 	struct drm_psb_private *dev_priv = dev->dev_private;
-	struct resource *r = dev_priv->gtt_mem->child;
+	struct resource *r;
 	struct gtt_range *range;
 	unsigned int restored = 0, total = 0, size = 0;
 
@@ -573,14 +573,13 @@ int psb_gtt_restore(struct drm_device *dev)
 	mutex_lock(&dev_priv->gtt_mutex);
 	psb_gtt_init(dev, 1);
 
-	while (r != NULL) {
+	list_for_each_entry(r, &dev_priv->gtt_mem->child, sibling) {
 		range = container_of(r, struct gtt_range, resource);
 		if (range->pages) {
 			psb_gtt_insert(dev, range, 1);
 			size += range->resource.end - range->resource.start;
 			restored++;
 		}
-		r = r->sibling;
 		total++;
 	}
 	mutex_unlock(&dev_priv->gtt_mutex);
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index b10fe26c4891..d87ec5a1bc4c 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -1412,9 +1412,8 @@ static acpi_status vmbus_walk_resources(struct acpi_resource *res, void *ctx)
 {
 	resource_size_t start = 0;
 	resource_size_t end = 0;
-	struct resource *new_res;
+	struct resource *new_res, *tmp;
 	struct resource **old_res = &hyperv_mmio;
-	struct resource **prev_res = NULL;
 
 	switch (res->type) {
 
@@ -1461,44 +1460,36 @@ static acpi_status vmbus_walk_resources(struct acpi_resource *res, void *ctx)
 	/*
 	 * If two ranges are adjacent, merge them.
 	 */
-	do {
-		if (!*old_res) {
-			*old_res = new_res;
-			break;
-		}
-
-		if (((*old_res)->end + 1) == new_res->start) {
-			(*old_res)->end = new_res->end;
+	if (!*old_res) {
+		*old_res = new_res;
+		return AE_OK;
+	}
+	tmp = *old_res;
+	list_for_each_entry_from(tmp, &tmp->parent->child, sibling) {
+		if ((tmp->end + 1) == new_res->start) {
+			tmp->end = new_res->end;
 			kfree(new_res);
 			break;
 		}
 
-		if ((*old_res)->start == new_res->end + 1) {
-			(*old_res)->start = new_res->start;
+		if (tmp->start == new_res->end + 1) {
+			tmp->start = new_res->start;
 			kfree(new_res);
 			break;
 		}
 
-		if ((*old_res)->start > new_res->end) {
-			new_res->sibling = *old_res;
-			if (prev_res)
-				(*prev_res)->sibling = new_res;
-			*old_res = new_res;
+		if (tmp->start > new_res->end) {
+			list_add(&new_res->sibling, tmp->sibling.prev);
 			break;
 		}
-
-		prev_res = old_res;
-		old_res = &(*old_res)->sibling;
-
-	} while (1);
+	}
 
 	return AE_OK;
 }
 
 static int vmbus_acpi_remove(struct acpi_device *device)
 {
-	struct resource *cur_res;
-	struct resource *next_res;
+	struct resource *res;
 
 	if (hyperv_mmio) {
 		if (fb_mmio) {
@@ -1507,10 +1498,9 @@ static int vmbus_acpi_remove(struct acpi_device *device)
 			fb_mmio = NULL;
 		}
 
-		for (cur_res = hyperv_mmio; cur_res; cur_res = next_res) {
-			next_res = cur_res->sibling;
-			kfree(cur_res);
-		}
+		res = hyperv_mmio;
+		list_for_each_entry_from(res, &res->parent->child, sibling)
+			kfree(res);
 	}
 
 	return 0;
@@ -1596,7 +1586,8 @@ int vmbus_allocate_mmio(struct resource **new, struct hv_device *device_obj,
 		}
 	}
 
-	for (iter = hyperv_mmio; iter; iter = iter->sibling) {
+	iter = hyperv_mmio;
+	list_for_each_entry_from(iter, &iter->parent->child, sibling) {
 		if ((iter->start >= max) || (iter->end <= min))
 			continue;
 
@@ -1639,7 +1630,8 @@ void vmbus_free_mmio(resource_size_t start, resource_size_t size)
 	struct resource *iter;
 
 	down(&hyperv_mmio_lock);
-	for (iter = hyperv_mmio; iter; iter = iter->sibling) {
+	iter = hyperv_mmio;
+	list_for_each_entry_from(iter, &iter->parent->child, sibling) {
 		if ((iter->start >= start + size) || (iter->end <= start))
 			continue;
 
diff --git a/drivers/input/joystick/iforce/iforce-main.c b/drivers/input/joystick/iforce/iforce-main.c
index daeeb4c7e3b0..5c0be27b33ff 100644
--- a/drivers/input/joystick/iforce/iforce-main.c
+++ b/drivers/input/joystick/iforce/iforce-main.c
@@ -305,8 +305,8 @@ int iforce_init_device(struct iforce *iforce)
 	iforce->device_memory.end = 200;
 	iforce->device_memory.flags = IORESOURCE_MEM;
 	iforce->device_memory.parent = NULL;
-	iforce->device_memory.child = NULL;
-	iforce->device_memory.sibling = NULL;
+	INIT_LIST_HEAD(&iforce->device_memory.child);
+	INIT_LIST_HEAD(&iforce->device_memory.sibling);
 
 /*
  * Wait until device ready - until it sends its first response.
diff --git a/drivers/nvdimm/namespace_devs.c b/drivers/nvdimm/namespace_devs.c
index 28afdd668905..f53d410d9981 100644
--- a/drivers/nvdimm/namespace_devs.c
+++ b/drivers/nvdimm/namespace_devs.c
@@ -637,7 +637,7 @@ static resource_size_t scan_allocate(struct nd_region *nd_region,
  retry:
 	first = 0;
 	for_each_dpa_resource(ndd, res) {
-		struct resource *next = res->sibling, *new_res = NULL;
+		struct resource *next = resource_sibling(res), *new_res = NULL;
 		resource_size_t allocate, available = 0;
 		enum alloc_loc loc = ALLOC_ERR;
 		const char *action;
@@ -763,7 +763,7 @@ static resource_size_t scan_allocate(struct nd_region *nd_region,
 	 * an initial "pmem-reserve pass".  Only do an initial BLK allocation
 	 * when none of the DPA space is reserved.
 	 */
-	if ((is_pmem || !ndd->dpa.child) && n == to_allocate)
+	if ((is_pmem || list_empty(&ndd->dpa.child)) && n == to_allocate)
 		return init_dpa_allocation(label_id, nd_region, nd_mapping, n);
 	return n;
 }
@@ -779,7 +779,7 @@ static int merge_dpa(struct nd_region *nd_region,
  retry:
 	for_each_dpa_resource(ndd, res) {
 		int rc;
-		struct resource *next = res->sibling;
+		struct resource *next = resource_sibling(res);
 		resource_size_t end = res->start + resource_size(res);
 
 		if (!next || strcmp(res->name, label_id->id) != 0
diff --git a/drivers/nvdimm/nd.h b/drivers/nvdimm/nd.h
index 32e0364b48b9..da7da15e03e7 100644
--- a/drivers/nvdimm/nd.h
+++ b/drivers/nvdimm/nd.h
@@ -102,11 +102,10 @@ unsigned sizeof_namespace_label(struct nvdimm_drvdata *ndd);
 		(unsigned long long) (res ? res->start : 0), ##arg)
 
 #define for_each_dpa_resource(ndd, res) \
-	for (res = (ndd)->dpa.child; res; res = res->sibling)
+	list_for_each_entry(res, &(ndd)->dpa.child, sibling)
 
 #define for_each_dpa_resource_safe(ndd, res, next) \
-	for (res = (ndd)->dpa.child, next = res ? res->sibling : NULL; \
-			res; res = next, next = next ? next->sibling : NULL)
+	list_for_each_entry_safe(res, next, &(ndd)->dpa.child, sibling)
 
 struct nd_percpu_lane {
 	int count;
diff --git a/drivers/of/address.c b/drivers/of/address.c
index 53349912ac75..e2e25719ab52 100644
--- a/drivers/of/address.c
+++ b/drivers/of/address.c
@@ -330,7 +330,9 @@ int of_pci_range_to_resource(struct of_pci_range *range,
 {
 	int err;
 	res->flags = range->flags;
-	res->parent = res->child = res->sibling = NULL;
+	res->parent = NULL;
+	INIT_LIST_HEAD(&res->child);
+	INIT_LIST_HEAD(&res->sibling);
 	res->name = np->full_name;
 
 	if (res->flags & IORESOURCE_IO) {
diff --git a/drivers/parisc/lba_pci.c b/drivers/parisc/lba_pci.c
index 69bd98421eb1..7482bdfd1959 100644
--- a/drivers/parisc/lba_pci.c
+++ b/drivers/parisc/lba_pci.c
@@ -170,8 +170,8 @@ lba_dump_res(struct resource *r, int d)
 	for (i = d; i ; --i) printk(" ");
 	printk(KERN_DEBUG "%p [%lx,%lx]/%lx\n", r,
 		(long)r->start, (long)r->end, r->flags);
-	lba_dump_res(r->child, d+2);
-	lba_dump_res(r->sibling, d);
+	lba_dump_res(resource_first_child(&r->child), d+2);
+	lba_dump_res(resource_sibling(r), d);
 }
 
 
diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
index 942b64fc7f1f..e3ace20345c7 100644
--- a/drivers/pci/controller/vmd.c
+++ b/drivers/pci/controller/vmd.c
@@ -542,14 +542,14 @@ static struct pci_ops vmd_ops = {
 
 static void vmd_attach_resources(struct vmd_dev *vmd)
 {
-	vmd->dev->resource[VMD_MEMBAR1].child = &vmd->resources[1];
-	vmd->dev->resource[VMD_MEMBAR2].child = &vmd->resources[2];
+	list_add(&vmd->resources[1].sibling, &vmd->dev->resource[VMD_MEMBAR1].child);
+	list_add(&vmd->resources[2].sibling, &vmd->dev->resource[VMD_MEMBAR2].child);
 }
 
 static void vmd_detach_resources(struct vmd_dev *vmd)
 {
-	vmd->dev->resource[VMD_MEMBAR1].child = NULL;
-	vmd->dev->resource[VMD_MEMBAR2].child = NULL;
+	INIT_LIST_HEAD(&vmd->dev->resource[VMD_MEMBAR1].child);
+	INIT_LIST_HEAD(&vmd->dev->resource[VMD_MEMBAR2].child);
 }
 
 /*
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index ac876e32de4b..9624dd1dfd49 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -59,6 +59,8 @@ static struct resource *get_pci_domain_busn_res(int domain_nr)
 	r->res.start = 0;
 	r->res.end = 0xff;
 	r->res.flags = IORESOURCE_BUS | IORESOURCE_PCI_FIXED;
+	INIT_LIST_HEAD(&r->res.child);
+	INIT_LIST_HEAD(&r->res.sibling);
 
 	list_add_tail(&r->list, &pci_domain_busn_res_list);
 
diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 79b1824e83b4..8e685af8938d 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -2107,7 +2107,7 @@ int pci_reassign_bridge_resources(struct pci_dev *bridge, unsigned long type)
 				continue;
 
 			/* Ignore BARs which are still in use */
-			if (res->child)
+			if (!list_empty(&res->child))
 				continue;
 
 			ret = add_to_list(&saved, bridge, res, 0, 0);
diff --git a/include/linux/ioport.h b/include/linux/ioport.h
index dfdcd0bfe54e..b7456ae889dd 100644
--- a/include/linux/ioport.h
+++ b/include/linux/ioport.h
@@ -12,6 +12,7 @@
 #ifndef __ASSEMBLY__
 #include <linux/compiler.h>
 #include <linux/types.h>
+#include <linux/list.h>
 /*
  * Resources are tree-like, allowing
  * nesting etc..
@@ -22,7 +23,8 @@ struct resource {
 	const char *name;
 	unsigned long flags;
 	unsigned long desc;
-	struct resource *parent, *sibling, *child;
+	struct list_head child, sibling;
+	struct resource *parent;
 };
 
 /*
@@ -216,7 +218,6 @@ static inline bool resource_contains(struct resource *r1, struct resource *r2)
 	return r1->start <= r2->start && r1->end >= r2->end;
 }
 
-
 /* Convenience shorthand with allocation */
 #define request_region(start,n,name)		__request_region(&ioport_resource, (start), (n), (name), 0)
 #define request_muxed_region(start,n,name)	__request_region(&ioport_resource, (start), (n), (name), IORESOURCE_MUXED)
@@ -287,6 +288,18 @@ static inline bool resource_overlaps(struct resource *r1, struct resource *r2)
        return (r1->start <= r2->end && r1->end >= r2->start);
 }
 
+static inline struct resource *resource_sibling(struct resource *res)
+{
+	if (res->parent && !list_is_last(&res->sibling, &res->parent->child))
+		return list_next_entry(res, sibling);
+	return NULL;
+}
+
+static inline struct resource *resource_first_child(struct list_head *head)
+{
+	return list_first_entry_or_null(head, struct resource, sibling);
+}
+
 
 #endif /* __ASSEMBLY__ */
 #endif	/* _LINUX_IOPORT_H */
diff --git a/kernel/resource.c b/kernel/resource.c
index 81ccd19c1d9f..c96e58d3d2f8 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -31,6 +31,8 @@ struct resource ioport_resource = {
 	.start	= 0,
 	.end	= IO_SPACE_LIMIT,
 	.flags	= IORESOURCE_IO,
+	.sibling = LIST_HEAD_INIT(ioport_resource.sibling),
+	.child  = LIST_HEAD_INIT(ioport_resource.child),
 };
 EXPORT_SYMBOL(ioport_resource);
 
@@ -39,6 +41,8 @@ struct resource iomem_resource = {
 	.start	= 0,
 	.end	= -1,
 	.flags	= IORESOURCE_MEM,
+	.sibling = LIST_HEAD_INIT(iomem_resource.sibling),
+	.child  = LIST_HEAD_INIT(iomem_resource.child),
 };
 EXPORT_SYMBOL(iomem_resource);
 
@@ -57,20 +61,20 @@ static DEFINE_RWLOCK(resource_lock);
  * by boot mem after the system is up. So for reusing the resource entry
  * we need to remember the resource.
  */
-static struct resource *bootmem_resource_free;
+static struct list_head bootmem_resource_free = LIST_HEAD_INIT(bootmem_resource_free);
 static DEFINE_SPINLOCK(bootmem_resource_lock);
 
 static struct resource *next_resource(struct resource *p, bool sibling_only)
 {
 	/* Caller wants to traverse through siblings only */
 	if (sibling_only)
-		return p->sibling;
+		return resource_sibling(p);
 
-	if (p->child)
-		return p->child;
-	while (!p->sibling && p->parent)
+	if (!list_empty(&p->child))
+		return resource_first_child(&p->child);
+	while (!resource_sibling(p) && p->parent)
 		p = p->parent;
-	return p->sibling;
+	return resource_sibling(p);
 }
 
 static void *r_next(struct seq_file *m, void *v, loff_t *pos)
@@ -90,7 +94,7 @@ static void *r_start(struct seq_file *m, loff_t *pos)
 	struct resource *p = PDE_DATA(file_inode(m->file));
 	loff_t l = 0;
 	read_lock(&resource_lock);
-	for (p = p->child; p && l < *pos; p = r_next(m, p, &l))
+	for (p = resource_first_child(&p->child); p && l < *pos; p = r_next(m, p, &l))
 		;
 	return p;
 }
@@ -153,8 +157,7 @@ static void free_resource(struct resource *res)
 
 	if (!PageSlab(virt_to_head_page(res))) {
 		spin_lock(&bootmem_resource_lock);
-		res->sibling = bootmem_resource_free;
-		bootmem_resource_free = res;
+		list_add(&res->sibling, &bootmem_resource_free);
 		spin_unlock(&bootmem_resource_lock);
 	} else {
 		kfree(res);
@@ -166,10 +169,9 @@ static struct resource *alloc_resource(gfp_t flags)
 	struct resource *res = NULL;
 
 	spin_lock(&bootmem_resource_lock);
-	if (bootmem_resource_free) {
-		res = bootmem_resource_free;
-		bootmem_resource_free = res->sibling;
-	}
+	res = resource_first_child(&bootmem_resource_free);
+	if (res)
+		list_del(&res->sibling);
 	spin_unlock(&bootmem_resource_lock);
 
 	if (res)
@@ -177,6 +179,8 @@ static struct resource *alloc_resource(gfp_t flags)
 	else
 		res = kzalloc(sizeof(struct resource), flags);
 
+	INIT_LIST_HEAD(&res->child);
+	INIT_LIST_HEAD(&res->sibling);
 	return res;
 }
 
@@ -185,7 +189,7 @@ static struct resource * __request_resource(struct resource *root, struct resour
 {
 	resource_size_t start = new->start;
 	resource_size_t end = new->end;
-	struct resource *tmp, **p;
+	struct resource *tmp;
 
 	if (end < start)
 		return root;
@@ -193,64 +197,62 @@ static struct resource * __request_resource(struct resource *root, struct resour
 		return root;
 	if (end > root->end)
 		return root;
-	p = &root->child;
-	for (;;) {
-		tmp = *p;
-		if (!tmp || tmp->start > end) {
-			new->sibling = tmp;
-			*p = new;
+
+	if (list_empty(&root->child)) {
+		list_add(&new->sibling, &root->child);
+		new->parent = root;
+		INIT_LIST_HEAD(&new->child);
+		return NULL;
+	}
+
+	list_for_each_entry(tmp, &root->child, sibling) {
+		if (tmp->start > end) {
+			list_add(&new->sibling, tmp->sibling.prev);
 			new->parent = root;
+			INIT_LIST_HEAD(&new->child);
 			return NULL;
 		}
-		p = &tmp->sibling;
 		if (tmp->end < start)
 			continue;
 		return tmp;
 	}
+
+	list_add_tail(&new->sibling, &root->child);
+	new->parent = root;
+	INIT_LIST_HEAD(&new->child);
+	return NULL;
 }
 
 static int __release_resource(struct resource *old, bool release_child)
 {
-	struct resource *tmp, **p, *chd;
+	struct resource *tmp, *next, *chd;
 
-	p = &old->parent->child;
-	for (;;) {
-		tmp = *p;
-		if (!tmp)
-			break;
+	list_for_each_entry_safe(tmp, next, &old->parent->child, sibling) {
 		if (tmp == old) {
-			if (release_child || !(tmp->child)) {
-				*p = tmp->sibling;
+			if (release_child || list_empty(&tmp->child)) {
+				list_del(&tmp->sibling);
 			} else {
-				for (chd = tmp->child;; chd = chd->sibling) {
+				list_for_each_entry(chd, &tmp->child, sibling)
 					chd->parent = tmp->parent;
-					if (!(chd->sibling))
-						break;
-				}
-				*p = tmp->child;
-				chd->sibling = tmp->sibling;
+				list_splice(&tmp->child, tmp->sibling.prev);
+				list_del(&tmp->sibling);
 			}
+
 			old->parent = NULL;
 			return 0;
 		}
-		p = &tmp->sibling;
 	}
 	return -EINVAL;
 }
 
 static void __release_child_resources(struct resource *r)
 {
-	struct resource *tmp, *p;
+	struct resource *tmp, *next;
 	resource_size_t size;
 
-	p = r->child;
-	r->child = NULL;
-	while (p) {
-		tmp = p;
-		p = p->sibling;
-
+	list_for_each_entry_safe(tmp, next, &r->child, sibling) {
 		tmp->parent = NULL;
-		tmp->sibling = NULL;
+		list_del_init(&tmp->sibling);
 		__release_child_resources(tmp);
 
 		printk(KERN_DEBUG "release child resource %pR\n", tmp);
@@ -259,6 +261,8 @@ static void __release_child_resources(struct resource *r)
 		tmp->start = 0;
 		tmp->end = size - 1;
 	}
+
+	INIT_LIST_HEAD(&tmp->child);
 }
 
 void release_child_resources(struct resource *r)
@@ -343,7 +347,8 @@ static int find_next_iomem_res(struct resource *res, unsigned long desc,
 
 	read_lock(&resource_lock);
 
-	for (p = iomem_resource.child; p; p = next_resource(p, sibling_only)) {
+	for (p = resource_first_child(&iomem_resource.child); p;
+			p = next_resource(p, sibling_only)) {
 		if ((p->flags & res->flags) != res->flags)
 			continue;
 		if ((desc != IORES_DESC_NONE) && (desc != p->desc))
@@ -532,7 +537,7 @@ int region_intersects(resource_size_t start, size_t size, unsigned long flags,
 	struct resource *p;
 
 	read_lock(&resource_lock);
-	for (p = iomem_resource.child; p ; p = p->sibling) {
+	list_for_each_entry(p, &iomem_resource.child, sibling) {
 		bool is_type = (((p->flags & flags) == flags) &&
 				((desc == IORES_DESC_NONE) ||
 				 (desc == p->desc)));
@@ -586,7 +591,7 @@ static int __find_resource(struct resource *root, struct resource *old,
 			 resource_size_t  size,
 			 struct resource_constraint *constraint)
 {
-	struct resource *this = root->child;
+	struct resource *this = resource_first_child(&root->child);
 	struct resource tmp = *new, avail, alloc;
 
 	tmp.start = root->start;
@@ -596,7 +601,7 @@ static int __find_resource(struct resource *root, struct resource *old,
 	 */
 	if (this && this->start == root->start) {
 		tmp.start = (this == old) ? old->start : this->end + 1;
-		this = this->sibling;
+		this = resource_sibling(this);
 	}
 	for(;;) {
 		if (this)
@@ -632,7 +637,7 @@ next:		if (!this || this->end == root->end)
 
 		if (this != old)
 			tmp.start = this->end + 1;
-		this = this->sibling;
+		this = resource_sibling(this);
 	}
 	return -EBUSY;
 }
@@ -676,7 +681,7 @@ static int reallocate_resource(struct resource *root, struct resource *old,
 		goto out;
 	}
 
-	if (old->child) {
+	if (!list_empty(&old->child)) {
 		err = -EBUSY;
 		goto out;
 	}
@@ -757,7 +762,7 @@ struct resource *lookup_resource(struct resource *root, resource_size_t start)
 	struct resource *res;
 
 	read_lock(&resource_lock);
-	for (res = root->child; res; res = res->sibling) {
+	list_for_each_entry(res, &root->child, sibling) {
 		if (res->start == start)
 			break;
 	}
@@ -790,32 +795,27 @@ static struct resource * __insert_resource(struct resource *parent, struct resou
 			break;
 	}
 
-	for (next = first; ; next = next->sibling) {
+	for (next = first; ; next = resource_sibling(next)) {
 		/* Partial overlap? Bad, and unfixable */
 		if (next->start < new->start || next->end > new->end)
 			return next;
-		if (!next->sibling)
+		if (!resource_sibling(next))
 			break;
-		if (next->sibling->start > new->end)
+		if (resource_sibling(next)->start > new->end)
 			break;
 	}
-
 	new->parent = parent;
-	new->sibling = next->sibling;
-	new->child = first;
+	list_add(&new->sibling, &next->sibling);
+	INIT_LIST_HEAD(&new->child);
 
-	next->sibling = NULL;
-	for (next = first; next; next = next->sibling)
+	/*
+	 * From first to next, they all fall into new's region, so change them
+	 * as new's children.
+	 */
+	list_cut_position(&new->child, first->sibling.prev, &next->sibling);
+	list_for_each_entry(next, &new->child, sibling)
 		next->parent = new;
 
-	if (parent->child == first) {
-		parent->child = new;
-	} else {
-		next = parent->child;
-		while (next->sibling != first)
-			next = next->sibling;
-		next->sibling = new;
-	}
 	return NULL;
 }
 
@@ -937,19 +937,17 @@ static int __adjust_resource(struct resource *res, resource_size_t start,
 	if ((start < parent->start) || (end > parent->end))
 		goto out;
 
-	if (res->sibling && (res->sibling->start <= end))
+	if (resource_sibling(res) && (resource_sibling(res)->start <= end))
 		goto out;
 
-	tmp = parent->child;
-	if (tmp != res) {
-		while (tmp->sibling != res)
-			tmp = tmp->sibling;
+	if (res->sibling.prev != &parent->child) {
+		tmp = list_prev_entry(res, sibling);
 		if (start <= tmp->end)
 			goto out;
 	}
 
 skip:
-	for (tmp = res->child; tmp; tmp = tmp->sibling)
+	list_for_each_entry(tmp, &res->child, sibling)
 		if ((tmp->start < start) || (tmp->end > end))
 			goto out;
 
@@ -996,27 +994,30 @@ EXPORT_SYMBOL(adjust_resource);
  */
 int reparent_resources(struct resource *parent, struct resource *res)
 {
-	struct resource *p, **pp;
-	struct resource **firstpp = NULL;
+	struct resource *p, *first = NULL;
 
-	for (pp = &parent->child; (p = *pp) != NULL; pp = &p->sibling) {
+	list_for_each_entry(p, &parent->child, sibling) {
 		if (p->end < res->start)
 			continue;
 		if (res->end < p->start)
 			break;
 		if (p->start < res->start || p->end > res->end)
 			return -ENOTSUPP;	/* not completely contained */
-		if (firstpp == NULL)
-			firstpp = pp;
+		if (first == NULL)
+			first = p;
 	}
-	if (firstpp == NULL)
+	if (first == NULL)
 		return -ECANCELED; /* didn't find any conflicting entries? */
 	res->parent = parent;
-	res->child = *firstpp;
-	res->sibling = *pp;
-	*firstpp = res;
-	*pp = NULL;
-	for (p = res->child; p != NULL; p = p->sibling) {
+	list_add(&res->sibling, p->sibling.prev);
+	INIT_LIST_HEAD(&res->child);
+
+	/*
+	 * From first to p's previous sibling, they all fall into
+	 * res's region, change them as res's children.
+	 */
+	list_cut_position(&res->child, first->sibling.prev, res->sibling.prev);
+	list_for_each_entry(p, &res->child, sibling) {
 		p->parent = res;
 		pr_debug("PCI: Reparented %s %pR under %s\n",
 			 p->name, p, res->name);
@@ -1216,34 +1217,32 @@ EXPORT_SYMBOL(__request_region);
 void __release_region(struct resource *parent, resource_size_t start,
 			resource_size_t n)
 {
-	struct resource **p;
+	struct resource *res;
 	resource_size_t end;
 
-	p = &parent->child;
+	res = resource_first_child(&parent->child);
 	end = start + n - 1;
 
 	write_lock(&resource_lock);
 
 	for (;;) {
-		struct resource *res = *p;
-
 		if (!res)
 			break;
 		if (res->start <= start && res->end >= end) {
 			if (!(res->flags & IORESOURCE_BUSY)) {
-				p = &res->child;
+				res = resource_first_child(&res->child);
 				continue;
 			}
 			if (res->start != start || res->end != end)
 				break;
-			*p = res->sibling;
+			list_del(&res->sibling);
 			write_unlock(&resource_lock);
 			if (res->flags & IORESOURCE_MUXED)
 				wake_up(&muxed_resource_wait);
 			free_resource(res);
 			return;
 		}
-		p = &res->sibling;
+		res = resource_sibling(res);
 	}
 
 	write_unlock(&resource_lock);
@@ -1278,9 +1277,7 @@ EXPORT_SYMBOL(__release_region);
 int release_mem_region_adjustable(struct resource *parent,
 			resource_size_t start, resource_size_t size)
 {
-	struct resource **p;
-	struct resource *res;
-	struct resource *new_res;
+	struct resource *res, *new_res;
 	resource_size_t end;
 	int ret = -EINVAL;
 
@@ -1291,16 +1288,16 @@ int release_mem_region_adjustable(struct resource *parent,
 	/* The alloc_resource() result gets checked later */
 	new_res = alloc_resource(GFP_KERNEL);
 
-	p = &parent->child;
+	res = resource_first_child(&parent->child);
 	write_lock(&resource_lock);
 
-	while ((res = *p)) {
+	while ((res)) {
 		if (res->start >= end)
 			break;
 
 		/* look for the next resource if it does not fit into */
 		if (res->start > start || res->end < end) {
-			p = &res->sibling;
+			res = resource_sibling(res);
 			continue;
 		}
 
@@ -1308,14 +1305,14 @@ int release_mem_region_adjustable(struct resource *parent,
 			break;
 
 		if (!(res->flags & IORESOURCE_BUSY)) {
-			p = &res->child;
+			res = resource_first_child(&res->child);
 			continue;
 		}
 
 		/* found the target resource; let's adjust accordingly */
 		if (res->start == start && res->end == end) {
 			/* free the whole entry */
-			*p = res->sibling;
+			list_del(&res->sibling);
 			free_resource(res);
 			ret = 0;
 		} else if (res->start == start && res->end != end) {
@@ -1338,14 +1335,13 @@ int release_mem_region_adjustable(struct resource *parent,
 			new_res->flags = res->flags;
 			new_res->desc = res->desc;
 			new_res->parent = res->parent;
-			new_res->sibling = res->sibling;
-			new_res->child = NULL;
+			INIT_LIST_HEAD(&new_res->child);
 
 			ret = __adjust_resource(res, res->start,
 						start - res->start);
 			if (ret)
 				break;
-			res->sibling = new_res;
+			list_add(&new_res->sibling, &res->sibling);
 			new_res = NULL;
 		}
 
@@ -1526,7 +1522,7 @@ static int __init reserve_setup(char *str)
 			res->end = io_start + io_num - 1;
 			res->flags |= IORESOURCE_BUSY;
 			res->desc = IORES_DESC_NONE;
-			res->child = NULL;
+			INIT_LIST_HEAD(&res->child);
 			if (request_resource(parent, res) == 0)
 				reserved = x+1;
 		}
@@ -1546,7 +1542,7 @@ int iomem_map_sanity_check(resource_size_t addr, unsigned long size)
 	loff_t l;
 
 	read_lock(&resource_lock);
-	for (p = p->child; p ; p = r_next(NULL, p, &l)) {
+	for (p = resource_first_child(&p->child); p; p = r_next(NULL, p, &l)) {
 		/*
 		 * We can probably skip the resources without
 		 * IORESOURCE_IO attribute?
@@ -1602,7 +1598,7 @@ bool iomem_is_exclusive(u64 addr)
 	addr = addr & PAGE_MASK;
 
 	read_lock(&resource_lock);
-	for (p = p->child; p ; p = r_next(NULL, p, &l)) {
+	for (p = resource_first_child(&p->child); p; p = r_next(NULL, p, &l)) {
 		/*
 		 * We can probably skip the resources without
 		 * IORESOURCE_IO attribute?
-- 
2.13.6


^ permalink raw reply related	[flat|nested] 83+ messages in thread
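
One pattern recurs across the hunks above: after the conversion a
zero-filled resource is no longer valid, so every root resource must
initialize both list_heads, statically with LIST_HEAD_INIT() (as in the
eisa-bus.c and pci-rc32434.c hunks) or at runtime with INIT_LIST_HEAD()
(as in alloc_resource() and get_pci_domain_busn_res()). A minimal sketch,
with invented names:

#include <linux/ioport.h>
#include <linux/list.h>
#include <linux/slab.h>

static struct resource my_root = {
	.name	 = "my_root",
	.start	 = 0,
	.end	 = 0xffff,
	.flags	 = IORESOURCE_MEM,
	/* A zero-filled list_head is invalid; initialize explicitly. */
	.sibling = LIST_HEAD_INIT(my_root.sibling),
	.child	 = LIST_HEAD_INIT(my_root.child),
};

static struct resource *make_res(gfp_t gfp)
{
	struct resource *res = kzalloc(sizeof(*res), gfp);

	if (res) {
		INIT_LIST_HEAD(&res->sibling);
		INIT_LIST_HEAD(&res->child);
	}
	return res;
}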

* [PATCH v7 2/4] resource: Use list_head to link sibling resource
@ 2018-07-18  2:49   ` Baoquan He
  0 siblings, 0 replies; 83+ messages in thread
From: Baoquan He @ 2018-07-18  2:49 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	robh+dt-DgEjT+Ai2ygdnm+yROfE0A,
	dan.j.williams-ral2JQCrhuEAvxtiuMwx3w,
	nicolas.pitre-QSEj5FYQhm4dnm+yROfE0A,
	josh-iaAMLnmF4UmaiuxdJuQwMA, fengguang.wu-ral2JQCrhuEAvxtiuMwx3w,
	bp-l3A5Bk7waGM, andy.shevchenko-Re5JQEeQqe8AvxtiuMwx3w
  Cc: linux-mips-6z/3iImG2C8G8FEW9MqTrA, brijesh.singh-5C7GfCeVMHo,
	devicetree-u79uwXL29TY76Z2rM5mHXA, airlied-cv59FeDIM0c,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	richard.weiyang-Re5JQEeQqe8AvxtiuMwx3w,
	jcmvbkbc-Re5JQEeQqe8AvxtiuMwx3w, Paul Mackerras,
	baiyaowei-0p4V/sDNsUmm0O/7XYngnFaTQe2KTcn/,
	kys-0li6OtcxBFHby3iVrkZq2A, frowand.list-Re5JQEeQqe8AvxtiuMwx3w,
	lorenzo.pieralisi-5wv7dgnIgG8, sthemmin-0li6OtcxBFHby3iVrkZq2A,
	Baoquan He, linux-nvdimm-hn68Rpc1hR1g9hUCZPvPmw,
	Michael Ellerman, patrik.r.jakobsson-Re5JQEeQqe8AvxtiuMwx3w,
	linux-input-u79uwXL29TY76Z2rM5mHXA,
	gustavo-THi1TnShQwVAfugRpC6u6w, dyoung-H+wXaHxf7aLQT0dZR+AlfA,
	thomas.lendacky-5C7GfCeVMHo, haiyangz-0li6OtcxBFHby3iVrkZq2A,
	maarten.lankhorst-VuQAYsv1563Yd54FQh9/CA,
	jglisse-H+wXaHxf7aLQT0dZR+AlfA, seanpaul-F7+t8E8rja9g9hUCZPvPmw,
	bhelgaas-hpIqsD4AKlfQT0dZR+AlfA, tglx-hfZtesqFncYOwBW4kG4KsQ,
	yinghai-DgEjT+Ai2ygdnm+yROfE0A,
	jonathan.derrick-ral2JQCrhuEAvxtiuMwx3w,
	chris-YvXeqwSYzG2sTnJN9+BGXg, monstr-pSz03upnqPeHXe+LvDLADg,
	linux-parisc-u79uwXL29TY76Z2rM5mHXA,
	gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r,
	dmitry.torokhov-Re5JQEeQqe8AvxtiuMwx3w, Benjamin

The struct resource uses singly linked list to link siblings, implemented
by pointer operation. Replace it with list_head for better code readability.

Based on this list_head replacement, it will be very easy to do reverse
iteration on iomem_resource's sibling list in later patch.

Besides, type of member variables of struct resource, sibling and child, are
changed from 'struct resource *' to 'struct list_head'. This brings two
pointers of size increase.

Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Baoquan He <bhe@redhat.com>
Cc: Patrik Jakobsson <patrik.r.jakobsson@gmail.com>
Cc: David Airlie <airlied@linux.ie>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Frank Rowand <frowand.list@gmail.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Jonathan Derrick <jonathan.derrick@intel.com>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Yaowei Bai <baiyaowei@cmss.chinamobile.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: devel@linuxdriverproject.org
Cc: linux-input@vger.kernel.org
Cc: linux-nvdimm@lists.01.org
Cc: devicetree@vger.kernel.org
Cc: linux-pci@vger.kernel.org
Cc: Michal Simek <monstr@monstr.eu>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>                                                                                             
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linux-mips@linux-mips.org
---
 arch/arm/plat-samsung/pm-check.c            |   6 +-
 arch/ia64/sn/kernel/io_init.c               |   2 +-
 arch/microblaze/pci/pci-common.c            |   4 +-
 arch/mips/pci/pci-rc32434.c                 |  12 +-
 arch/powerpc/kernel/pci-common.c            |   4 +-
 arch/sparc/kernel/ioport.c                  |   2 +-
 arch/xtensa/include/asm/pci-bridge.h        |   4 +-
 drivers/eisa/eisa-bus.c                     |   2 +
 drivers/gpu/drm/drm_memory.c                |   3 +-
 drivers/gpu/drm/gma500/gtt.c                |   5 +-
 drivers/hv/vmbus_drv.c                      |  52 +++----
 drivers/input/joystick/iforce/iforce-main.c |   4 +-
 drivers/nvdimm/namespace_devs.c             |   6 +-
 drivers/nvdimm/nd.h                         |   5 +-
 drivers/of/address.c                        |   4 +-
 drivers/parisc/lba_pci.c                    |   4 +-
 drivers/pci/controller/vmd.c                |   8 +-
 drivers/pci/probe.c                         |   2 +
 drivers/pci/setup-bus.c                     |   2 +-
 include/linux/ioport.h                      |  17 ++-
 kernel/resource.c                           | 206 ++++++++++++++--------------
 21 files changed, 183 insertions(+), 171 deletions(-)

diff --git a/arch/arm/plat-samsung/pm-check.c b/arch/arm/plat-samsung/pm-check.c
index cd2c02c68bc3..5494355b1c49 100644
--- a/arch/arm/plat-samsung/pm-check.c
+++ b/arch/arm/plat-samsung/pm-check.c
@@ -46,8 +46,8 @@ typedef u32 *(run_fn_t)(struct resource *ptr, u32 *arg);
 static void s3c_pm_run_res(struct resource *ptr, run_fn_t fn, u32 *arg)
 {
 	while (ptr != NULL) {
-		if (ptr->child != NULL)
-			s3c_pm_run_res(ptr->child, fn, arg);
+		if (!list_empty(&ptr->child))
+			s3c_pm_run_res(resource_first_child(&ptr->child), fn, arg);
 
 		if ((ptr->flags & IORESOURCE_SYSTEM_RAM)
 				== IORESOURCE_SYSTEM_RAM) {
@@ -57,7 +57,7 @@ static void s3c_pm_run_res(struct resource *ptr, run_fn_t fn, u32 *arg)
 			arg = (fn)(ptr, arg);
 		}
 
-		ptr = ptr->sibling;
+		ptr = resource_sibling(ptr);
 	}
 }
 
diff --git a/arch/ia64/sn/kernel/io_init.c b/arch/ia64/sn/kernel/io_init.c
index d63809a6adfa..338a7b7f194d 100644
--- a/arch/ia64/sn/kernel/io_init.c
+++ b/arch/ia64/sn/kernel/io_init.c
@@ -192,7 +192,7 @@ sn_io_slot_fixup(struct pci_dev *dev)
 		 * if it's already in the device structure, remove it before
 		 * inserting
 		 */
-		if (res->parent && res->parent->child)
+		if (res->parent && !list_empty(&res->parent->child))
 			release_resource(res);
 
 		if (res->flags & IORESOURCE_IO)
diff --git a/arch/microblaze/pci/pci-common.c b/arch/microblaze/pci/pci-common.c
index 7899bafab064..2bf73e27e231 100644
--- a/arch/microblaze/pci/pci-common.c
+++ b/arch/microblaze/pci/pci-common.c
@@ -533,7 +533,9 @@ void pci_process_bridge_OF_ranges(struct pci_controller *hose,
 			res->flags = range.flags;
 			res->start = range.cpu_addr;
 			res->end = range.cpu_addr + range.size - 1;
-			res->parent = res->child = res->sibling = NULL;
+			res->parent = NULL;
+			INIT_LIST_HEAD(&res->child);
+			INIT_LIST_HEAD(&res->sibling);
 		}
 	}
 
diff --git a/arch/mips/pci/pci-rc32434.c b/arch/mips/pci/pci-rc32434.c
index 7f6ce6d734c0..e80283df7925 100644
--- a/arch/mips/pci/pci-rc32434.c
+++ b/arch/mips/pci/pci-rc32434.c
@@ -53,8 +53,8 @@ static struct resource rc32434_res_pci_mem1 = {
 	.start = 0x50000000,
 	.end = 0x5FFFFFFF,
 	.flags = IORESOURCE_MEM,
-	.sibling = NULL,
-	.child = &rc32434_res_pci_mem2
+	.sibling = LIST_HEAD_INIT(rc32434_res_pci_mem1.sibling),
+	.child = LIST_HEAD_INIT(rc32434_res_pci_mem1.child),
 };
 
 static struct resource rc32434_res_pci_mem2 = {
@@ -63,8 +63,8 @@ static struct resource rc32434_res_pci_mem2 = {
 	.end = 0x6FFFFFFF,
 	.flags = IORESOURCE_MEM,
 	.parent = &rc32434_res_pci_mem1,
-	.sibling = NULL,
-	.child = NULL
+	.sibling = LIST_HEAD_INIT(rc32434_res_pci_mem2.sibling),
+	.child = LIST_HEAD_INIT(rc32434_res_pci_mem2.child),
 };
 
 static struct resource rc32434_res_pci_io1 = {
@@ -72,6 +72,8 @@ static struct resource rc32434_res_pci_io1 = {
 	.start = 0x18800000,
 	.end = 0x188FFFFF,
 	.flags = IORESOURCE_IO,
+	.sibling = LIST_HEAD_INIT(rc32434_res_pci_io1.sibling),
+	.child = LIST_HEAD_INIT(rc32434_res_pci_io1.child),
 };
 
 extern struct pci_ops rc32434_pci_ops;
@@ -208,6 +210,8 @@ static int __init rc32434_pci_init(void)
 
 	pr_info("PCI: Initializing PCI\n");
 
+	list_add(&rc32434_res_pci_mem2.sibling, &rc32434_res_pci_mem1.child);
+
 	ioport_resource.start = rc32434_res_pci_io1.start;
 	ioport_resource.end = rc32434_res_pci_io1.end;
 
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index 926035bb378d..28fbe83c9daf 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -761,7 +761,9 @@ void pci_process_bridge_OF_ranges(struct pci_controller *hose,
 			res->flags = range.flags;
 			res->start = range.cpu_addr;
 			res->end = range.cpu_addr + range.size - 1;
-			res->parent = res->child = res->sibling = NULL;
+			res->parent = NULL;
+			INIT_LIST_HEAD(&res->child);
+			INIT_LIST_HEAD(&res->sibling);
 		}
 	}
 }
diff --git a/arch/sparc/kernel/ioport.c b/arch/sparc/kernel/ioport.c
index cca9134cfa7d..99efe4e98b16 100644
--- a/arch/sparc/kernel/ioport.c
+++ b/arch/sparc/kernel/ioport.c
@@ -669,7 +669,7 @@ static int sparc_io_proc_show(struct seq_file *m, void *v)
 	struct resource *root = m->private, *r;
 	const char *nm;
 
-	for (r = root->child; r != NULL; r = r->sibling) {
+	list_for_each_entry(r, &root->child, sibling) {
 		if ((nm = r->name) == NULL) nm = "???";
 		seq_printf(m, "%016llx-%016llx: %s\n",
 				(unsigned long long)r->start,
diff --git a/arch/xtensa/include/asm/pci-bridge.h b/arch/xtensa/include/asm/pci-bridge.h
index 0b68c76ec1e6..f487b06817df 100644
--- a/arch/xtensa/include/asm/pci-bridge.h
+++ b/arch/xtensa/include/asm/pci-bridge.h
@@ -71,8 +71,8 @@ static inline void pcibios_init_resource(struct resource *res,
 	res->flags = flags;
 	res->name = name;
 	res->parent = NULL;
-	res->sibling = NULL;
-	res->child = NULL;
+	INIT_LIST_HEAD(&res->child);
+	INIT_LIST_HEAD(&res->sibling);
 }
 
 
diff --git a/drivers/eisa/eisa-bus.c b/drivers/eisa/eisa-bus.c
index 1e8062f6dbfc..dba78f75fd06 100644
--- a/drivers/eisa/eisa-bus.c
+++ b/drivers/eisa/eisa-bus.c
@@ -408,6 +408,8 @@ static struct resource eisa_root_res = {
 	.start = 0,
 	.end   = 0xffffffff,
 	.flags = IORESOURCE_IO,
+	.sibling = LIST_HEAD_INIT(eisa_root_res.sibling),
+	.child  = LIST_HEAD_INIT(eisa_root_res.child),
 };
 
 static int eisa_bus_count;
diff --git a/drivers/gpu/drm/drm_memory.c b/drivers/gpu/drm/drm_memory.c
index d69e4fc1ee77..33baa7fa5e41 100644
--- a/drivers/gpu/drm/drm_memory.c
+++ b/drivers/gpu/drm/drm_memory.c
@@ -155,9 +155,8 @@ u64 drm_get_max_iomem(void)
 	struct resource *tmp;
 	resource_size_t max_iomem = 0;
 
-	for (tmp = iomem_resource.child; tmp; tmp = tmp->sibling) {
+	list_for_each_entry(tmp, &iomem_resource.child, sibling)
 		max_iomem = max(max_iomem,  tmp->end);
-	}
 
 	return max_iomem;
 }
diff --git a/drivers/gpu/drm/gma500/gtt.c b/drivers/gpu/drm/gma500/gtt.c
index 3949b0990916..addd3bc009af 100644
--- a/drivers/gpu/drm/gma500/gtt.c
+++ b/drivers/gpu/drm/gma500/gtt.c
@@ -565,7 +565,7 @@ int psb_gtt_init(struct drm_device *dev, int resume)
 int psb_gtt_restore(struct drm_device *dev)
 {
 	struct drm_psb_private *dev_priv = dev->dev_private;
-	struct resource *r = dev_priv->gtt_mem->child;
+	struct resource *r;
 	struct gtt_range *range;
 	unsigned int restored = 0, total = 0, size = 0;
 
@@ -573,14 +573,13 @@ int psb_gtt_restore(struct drm_device *dev)
 	mutex_lock(&dev_priv->gtt_mutex);
 	psb_gtt_init(dev, 1);
 
-	while (r != NULL) {
+	list_for_each_entry(r, &dev_priv->gtt_mem->child, sibling) {
 		range = container_of(r, struct gtt_range, resource);
 		if (range->pages) {
 			psb_gtt_insert(dev, range, 1);
 			size += range->resource.end - range->resource.start;
 			restored++;
 		}
-		r = r->sibling;
 		total++;
 	}
 	mutex_unlock(&dev_priv->gtt_mutex);
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index b10fe26c4891..d87ec5a1bc4c 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -1412,9 +1412,8 @@ static acpi_status vmbus_walk_resources(struct acpi_resource *res, void *ctx)
 {
 	resource_size_t start = 0;
 	resource_size_t end = 0;
-	struct resource *new_res;
+	struct resource *new_res, *tmp;
 	struct resource **old_res = &hyperv_mmio;
-	struct resource **prev_res = NULL;
 
 	switch (res->type) {
 
@@ -1461,44 +1460,36 @@ static acpi_status vmbus_walk_resources(struct acpi_resource *res, void *ctx)
 	/*
 	 * If two ranges are adjacent, merge them.
 	 */
-	do {
-		if (!*old_res) {
-			*old_res = new_res;
-			break;
-		}
-
-		if (((*old_res)->end + 1) == new_res->start) {
-			(*old_res)->end = new_res->end;
+	if (!*old_res) {
+		*old_res = new_res;
+		return AE_OK;
+	}
+	tmp = *old_res;
+	list_for_each_entry_from(tmp, &tmp->parent->child, sibling) {
+		if ((tmp->end + 1) == new_res->start) {
+			tmp->end = new_res->end;
 			kfree(new_res);
 			break;
 		}
 
-		if ((*old_res)->start == new_res->end + 1) {
-			(*old_res)->start = new_res->start;
+		if (tmp->start == new_res->end + 1) {
+			tmp->start = new_res->start;
 			kfree(new_res);
 			break;
 		}
 
-		if ((*old_res)->start > new_res->end) {
-			new_res->sibling = *old_res;
-			if (prev_res)
-				(*prev_res)->sibling = new_res;
-			*old_res = new_res;
+		if (tmp->start > new_res->end) {
+			list_add(&new_res->sibling, tmp->sibling.prev);
 			break;
 		}
-
-		prev_res = old_res;
-		old_res = &(*old_res)->sibling;
-
-	} while (1);
+	}
 
 	return AE_OK;
 }
 
 static int vmbus_acpi_remove(struct acpi_device *device)
 {
-	struct resource *cur_res;
-	struct resource *next_res;
+	struct resource *res, *next;
 
 	if (hyperv_mmio) {
 		if (fb_mmio) {
@@ -1507,10 +1498,11 @@ static int vmbus_acpi_remove(struct acpi_device *device)
 			fb_mmio = NULL;
 		}
 
-		for (cur_res = hyperv_mmio; cur_res; cur_res = next_res) {
-			next_res = cur_res->sibling;
-			kfree(cur_res);
-		}
+		res = hyperv_mmio;
+		/* use the _safe variant: kfree() invalidates the cursor */
+		list_for_each_entry_safe_from(res, next, &res->parent->child,
+					      sibling)
+			kfree(res);
 	}
 
 	return 0;
@@ -1596,7 +1586,8 @@ int vmbus_allocate_mmio(struct resource **new, struct hv_device *device_obj,
 		}
 	}
 
-	for (iter = hyperv_mmio; iter; iter = iter->sibling) {
+	iter = hyperv_mmio;
+	list_for_each_entry_from(iter, &iter->parent->child, sibling) {
 		if ((iter->start >= max) || (iter->end <= min))
 			continue;
 
@@ -1639,7 +1630,8 @@ void vmbus_free_mmio(resource_size_t start, resource_size_t size)
 	struct resource *iter;
 
 	down(&hyperv_mmio_lock);
-	for (iter = hyperv_mmio; iter; iter = iter->sibling) {
+	iter = hyperv_mmio;
+	list_for_each_entry_from(iter, &iter->parent->child, sibling) {
 		if ((iter->start >= start + size) || (iter->end <= start))
 			continue;
 
diff --git a/drivers/input/joystick/iforce/iforce-main.c b/drivers/input/joystick/iforce/iforce-main.c
index daeeb4c7e3b0..5c0be27b33ff 100644
--- a/drivers/input/joystick/iforce/iforce-main.c
+++ b/drivers/input/joystick/iforce/iforce-main.c
@@ -305,8 +305,8 @@ int iforce_init_device(struct iforce *iforce)
 	iforce->device_memory.end = 200;
 	iforce->device_memory.flags = IORESOURCE_MEM;
 	iforce->device_memory.parent = NULL;
-	iforce->device_memory.child = NULL;
-	iforce->device_memory.sibling = NULL;
+	INIT_LIST_HEAD(&iforce->device_memory.child);
+	INIT_LIST_HEAD(&iforce->device_memory.sibling);
 
 /*
  * Wait until device ready - until it sends its first response.
diff --git a/drivers/nvdimm/namespace_devs.c b/drivers/nvdimm/namespace_devs.c
index 28afdd668905..f53d410d9981 100644
--- a/drivers/nvdimm/namespace_devs.c
+++ b/drivers/nvdimm/namespace_devs.c
@@ -637,7 +637,7 @@ static resource_size_t scan_allocate(struct nd_region *nd_region,
  retry:
 	first = 0;
 	for_each_dpa_resource(ndd, res) {
-		struct resource *next = res->sibling, *new_res = NULL;
+		struct resource *next = resource_sibling(res), *new_res = NULL;
 		resource_size_t allocate, available = 0;
 		enum alloc_loc loc = ALLOC_ERR;
 		const char *action;
@@ -763,7 +763,7 @@ static resource_size_t scan_allocate(struct nd_region *nd_region,
 	 * an initial "pmem-reserve pass".  Only do an initial BLK allocation
 	 * when none of the DPA space is reserved.
 	 */
-	if ((is_pmem || !ndd->dpa.child) && n == to_allocate)
+	if ((is_pmem || list_empty(&ndd->dpa.child)) && n == to_allocate)
 		return init_dpa_allocation(label_id, nd_region, nd_mapping, n);
 	return n;
 }
@@ -779,7 +779,7 @@ static int merge_dpa(struct nd_region *nd_region,
  retry:
 	for_each_dpa_resource(ndd, res) {
 		int rc;
-		struct resource *next = res->sibling;
+		struct resource *next = resource_sibling(res);
 		resource_size_t end = res->start + resource_size(res);
 
 		if (!next || strcmp(res->name, label_id->id) != 0
diff --git a/drivers/nvdimm/nd.h b/drivers/nvdimm/nd.h
index 32e0364b48b9..da7da15e03e7 100644
--- a/drivers/nvdimm/nd.h
+++ b/drivers/nvdimm/nd.h
@@ -102,11 +102,10 @@ unsigned sizeof_namespace_label(struct nvdimm_drvdata *ndd);
 		(unsigned long long) (res ? res->start : 0), ##arg)
 
 #define for_each_dpa_resource(ndd, res) \
-	for (res = (ndd)->dpa.child; res; res = res->sibling)
+	list_for_each_entry(res, &(ndd)->dpa.child, sibling)
 
 #define for_each_dpa_resource_safe(ndd, res, next) \
-	for (res = (ndd)->dpa.child, next = res ? res->sibling : NULL; \
-			res; res = next, next = next ? next->sibling : NULL)
+	list_for_each_entry_safe(res, next, &(ndd)->dpa.child, sibling)
 
 struct nd_percpu_lane {
 	int count;
diff --git a/drivers/of/address.c b/drivers/of/address.c
index 53349912ac75..e2e25719ab52 100644
--- a/drivers/of/address.c
+++ b/drivers/of/address.c
@@ -330,7 +330,9 @@ int of_pci_range_to_resource(struct of_pci_range *range,
 {
 	int err;
 	res->flags = range->flags;
-	res->parent = res->child = res->sibling = NULL;
+	res->parent = NULL;
+	INIT_LIST_HEAD(&res->child);
+	INIT_LIST_HEAD(&res->sibling);
 	res->name = np->full_name;
 
 	if (res->flags & IORESOURCE_IO) {
diff --git a/drivers/parisc/lba_pci.c b/drivers/parisc/lba_pci.c
index 69bd98421eb1..7482bdfd1959 100644
--- a/drivers/parisc/lba_pci.c
+++ b/drivers/parisc/lba_pci.c
@@ -170,8 +170,8 @@ lba_dump_res(struct resource *r, int d)
 	for (i = d; i ; --i) printk(" ");
 	printk(KERN_DEBUG "%p [%lx,%lx]/%lx\n", r,
 		(long)r->start, (long)r->end, r->flags);
-	lba_dump_res(r->child, d+2);
-	lba_dump_res(r->sibling, d);
+	lba_dump_res(resource_first_child(&r->child), d+2);
+	lba_dump_res(resource_sibling(r), d);
 }
 
 
diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
index 942b64fc7f1f..e3ace20345c7 100644
--- a/drivers/pci/controller/vmd.c
+++ b/drivers/pci/controller/vmd.c
@@ -542,14 +542,14 @@ static struct pci_ops vmd_ops = {
 
 static void vmd_attach_resources(struct vmd_dev *vmd)
 {
-	vmd->dev->resource[VMD_MEMBAR1].child = &vmd->resources[1];
-	vmd->dev->resource[VMD_MEMBAR2].child = &vmd->resources[2];
+	list_add(&vmd->resources[1].sibling, &vmd->dev->resource[VMD_MEMBAR1].child);
+	list_add(&vmd->resources[2].sibling, &vmd->dev->resource[VMD_MEMBAR2].child);
 }
 
 static void vmd_detach_resources(struct vmd_dev *vmd)
 {
-	vmd->dev->resource[VMD_MEMBAR1].child = NULL;
-	vmd->dev->resource[VMD_MEMBAR2].child = NULL;
+	INIT_LIST_HEAD(&vmd->dev->resource[VMD_MEMBAR1].child);
+	INIT_LIST_HEAD(&vmd->dev->resource[VMD_MEMBAR2].child);
 }
 
 /*
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index ac876e32de4b..9624dd1dfd49 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -59,6 +59,8 @@ static struct resource *get_pci_domain_busn_res(int domain_nr)
 	r->res.start = 0;
 	r->res.end = 0xff;
 	r->res.flags = IORESOURCE_BUS | IORESOURCE_PCI_FIXED;
+	INIT_LIST_HEAD(&r->res.child);
+	INIT_LIST_HEAD(&r->res.sibling);
 
 	list_add_tail(&r->list, &pci_domain_busn_res_list);
 
diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 79b1824e83b4..8e685af8938d 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -2107,7 +2107,7 @@ int pci_reassign_bridge_resources(struct pci_dev *bridge, unsigned long type)
 				continue;
 
 			/* Ignore BARs which are still in use */
-			if (res->child)
+			if (!list_empty(&res->child))
 				continue;
 
 			ret = add_to_list(&saved, bridge, res, 0, 0);
diff --git a/include/linux/ioport.h b/include/linux/ioport.h
index dfdcd0bfe54e..b7456ae889dd 100644
--- a/include/linux/ioport.h
+++ b/include/linux/ioport.h
@@ -12,6 +12,7 @@
 #ifndef __ASSEMBLY__
 #include <linux/compiler.h>
 #include <linux/types.h>
+#include <linux/list.h>
 /*
  * Resources are tree-like, allowing
  * nesting etc..
@@ -22,7 +23,8 @@ struct resource {
 	const char *name;
 	unsigned long flags;
 	unsigned long desc;
-	struct resource *parent, *sibling, *child;
+	struct list_head child, sibling;
+	struct resource *parent;
 };
 
 /*
@@ -216,7 +218,6 @@ static inline bool resource_contains(struct resource *r1, struct resource *r2)
 	return r1->start <= r2->start && r1->end >= r2->end;
 }
 
-
 /* Convenience shorthand with allocation */
 #define request_region(start,n,name)		__request_region(&ioport_resource, (start), (n), (name), 0)
 #define request_muxed_region(start,n,name)	__request_region(&ioport_resource, (start), (n), (name), IORESOURCE_MUXED)
@@ -287,6 +288,18 @@ static inline bool resource_overlaps(struct resource *r1, struct resource *r2)
        return (r1->start <= r2->end && r1->end >= r2->start);
 }
 
+static inline struct resource *resource_sibling(struct resource *res)
+{
+	if (res->parent && !list_is_last(&res->sibling, &res->parent->child))
+		return list_next_entry(res, sibling);
+	return NULL;
+}
+
+static inline struct resource *resource_first_child(struct list_head *head)
+{
+	return list_first_entry_or_null(head, struct resource, sibling);
+}
+
 
 #endif /* __ASSEMBLY__ */
 #endif	/* _LINUX_IOPORT_H */
diff --git a/kernel/resource.c b/kernel/resource.c
index 81ccd19c1d9f..c96e58d3d2f8 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -31,6 +31,8 @@ struct resource ioport_resource = {
 	.start	= 0,
 	.end	= IO_SPACE_LIMIT,
 	.flags	= IORESOURCE_IO,
+	.sibling = LIST_HEAD_INIT(ioport_resource.sibling),
+	.child  = LIST_HEAD_INIT(ioport_resource.child),
 };
 EXPORT_SYMBOL(ioport_resource);
 
@@ -39,6 +41,8 @@ struct resource iomem_resource = {
 	.start	= 0,
 	.end	= -1,
 	.flags	= IORESOURCE_MEM,
+	.sibling = LIST_HEAD_INIT(iomem_resource.sibling),
+	.child  = LIST_HEAD_INIT(iomem_resource.child),
 };
 EXPORT_SYMBOL(iomem_resource);
 
@@ -57,20 +61,20 @@ static DEFINE_RWLOCK(resource_lock);
  * by boot mem after the system is up. So for reusing the resource entry
  * we need to remember the resource.
  */
-static struct resource *bootmem_resource_free;
+static struct list_head bootmem_resource_free = LIST_HEAD_INIT(bootmem_resource_free);
 static DEFINE_SPINLOCK(bootmem_resource_lock);
 
 static struct resource *next_resource(struct resource *p, bool sibling_only)
 {
 	/* Caller wants to traverse through siblings only */
 	if (sibling_only)
-		return p->sibling;
+		return resource_sibling(p);
 
-	if (p->child)
-		return p->child;
-	while (!p->sibling && p->parent)
+	if (!list_empty(&p->child))
+		return resource_first_child(&p->child);
+	while (!resource_sibling(p) && p->parent)
 		p = p->parent;
-	return p->sibling;
+	return resource_sibling(p);
 }
 
 static void *r_next(struct seq_file *m, void *v, loff_t *pos)
@@ -90,7 +94,7 @@ static void *r_start(struct seq_file *m, loff_t *pos)
 	struct resource *p = PDE_DATA(file_inode(m->file));
 	loff_t l = 0;
 	read_lock(&resource_lock);
-	for (p = p->child; p && l < *pos; p = r_next(m, p, &l))
+	for (p = resource_first_child(&p->child); p && l < *pos; p = r_next(m, p, &l))
 		;
 	return p;
 }
@@ -153,8 +157,7 @@ static void free_resource(struct resource *res)
 
 	if (!PageSlab(virt_to_head_page(res))) {
 		spin_lock(&bootmem_resource_lock);
-		res->sibling = bootmem_resource_free;
-		bootmem_resource_free = res;
+		list_add(&res->sibling, &bootmem_resource_free);
 		spin_unlock(&bootmem_resource_lock);
 	} else {
 		kfree(res);
@@ -166,10 +169,9 @@ static struct resource *alloc_resource(gfp_t flags)
 	struct resource *res = NULL;
 
 	spin_lock(&bootmem_resource_lock);
-	if (bootmem_resource_free) {
-		res = bootmem_resource_free;
-		bootmem_resource_free = res->sibling;
-	}
+	res = resource_first_child(&bootmem_resource_free);
+	if (res)
+		list_del(&res->sibling);
 	spin_unlock(&bootmem_resource_lock);
 
 	if (res)
@@ -177,6 +179,8 @@ static struct resource *alloc_resource(gfp_t flags)
 	else
 		res = kzalloc(sizeof(struct resource), flags);
 
+	INIT_LIST_HEAD(&res->child);
+	INIT_LIST_HEAD(&res->sibling);
 	return res;
 }
 
@@ -185,7 +189,7 @@ static struct resource * __request_resource(struct resource *root, struct resour
 {
 	resource_size_t start = new->start;
 	resource_size_t end = new->end;
-	struct resource *tmp, **p;
+	struct resource *tmp;
 
 	if (end < start)
 		return root;
@@ -193,64 +197,62 @@ static struct resource * __request_resource(struct resource *root, struct resour
 		return root;
 	if (end > root->end)
 		return root;
-	p = &root->child;
-	for (;;) {
-		tmp = *p;
-		if (!tmp || tmp->start > end) {
-			new->sibling = tmp;
-			*p = new;
+
+	if (list_empty(&root->child)) {
+		list_add(&new->sibling, &root->child);
+		new->parent = root;
+		INIT_LIST_HEAD(&new->child);
+		return NULL;
+	}
+
+	list_for_each_entry(tmp, &root->child, sibling) {
+		if (tmp->start > end) {
+			list_add(&new->sibling, tmp->sibling.prev);
 			new->parent = root;
+			INIT_LIST_HEAD(&new->child);
 			return NULL;
 		}
-		p = &tmp->sibling;
 		if (tmp->end < start)
 			continue;
 		return tmp;
 	}
+
+	list_add_tail(&new->sibling, &root->child);
+	new->parent = root;
+	INIT_LIST_HEAD(&new->child);
+	return NULL;
 }
 
 static int __release_resource(struct resource *old, bool release_child)
 {
-	struct resource *tmp, **p, *chd;
+	struct resource *tmp, *next, *chd;
 
-	p = &old->parent->child;
-	for (;;) {
-		tmp = *p;
-		if (!tmp)
-			break;
+	list_for_each_entry_safe(tmp, next, &old->parent->child, sibling) {
 		if (tmp == old) {
-			if (release_child || !(tmp->child)) {
-				*p = tmp->sibling;
+			if (release_child || list_empty(&tmp->child)) {
+				list_del(&tmp->sibling);
 			} else {
-				for (chd = tmp->child;; chd = chd->sibling) {
+				list_for_each_entry(chd, &tmp->child, sibling)
 					chd->parent = tmp->parent;
-					if (!(chd->sibling))
-						break;
-				}
-				*p = tmp->child;
-				chd->sibling = tmp->sibling;
+				list_splice(&tmp->child, tmp->sibling.prev);
+				list_del(&tmp->sibling);
 			}
+
 			old->parent = NULL;
 			return 0;
 		}
-		p = &tmp->sibling;
 	}
 	return -EINVAL;
 }
 
 static void __release_child_resources(struct resource *r)
 {
-	struct resource *tmp, *p;
+	struct resource *tmp, *next;
 	resource_size_t size;
 
-	p = r->child;
-	r->child = NULL;
-	while (p) {
-		tmp = p;
-		p = p->sibling;
-
+	list_for_each_entry_safe(tmp, next, &r->child, sibling) {
 		tmp->parent = NULL;
-		tmp->sibling = NULL;
+		list_del_init(&tmp->sibling);
 		__release_child_resources(tmp);
 
 		printk(KERN_DEBUG "release child resource %pR\n", tmp);
@@ -259,6 +261,6 @@ static void __release_child_resources(struct resource *r)
 		tmp->start = 0;
 		tmp->end = size - 1;
 	}
 }
 
 void release_child_resources(struct resource *r)
@@ -343,7 +347,8 @@ static int find_next_iomem_res(struct resource *res, unsigned long desc,
 
 	read_lock(&resource_lock);
 
-	for (p = iomem_resource.child; p; p = next_resource(p, sibling_only)) {
+	for (p = resource_first_child(&iomem_resource.child); p;
+			p = next_resource(p, sibling_only)) {
 		if ((p->flags & res->flags) != res->flags)
 			continue;
 		if ((desc != IORES_DESC_NONE) && (desc != p->desc))
@@ -532,7 +537,7 @@ int region_intersects(resource_size_t start, size_t size, unsigned long flags,
 	struct resource *p;
 
 	read_lock(&resource_lock);
-	for (p = iomem_resource.child; p ; p = p->sibling) {
+	list_for_each_entry(p, &iomem_resource.child, sibling) {
 		bool is_type = (((p->flags & flags) == flags) &&
 				((desc == IORES_DESC_NONE) ||
 				 (desc == p->desc)));
@@ -586,7 +591,7 @@ static int __find_resource(struct resource *root, struct resource *old,
 			 resource_size_t  size,
 			 struct resource_constraint *constraint)
 {
-	struct resource *this = root->child;
+	struct resource *this = resource_first_child(&root->child);
 	struct resource tmp = *new, avail, alloc;
 
 	tmp.start = root->start;
@@ -596,7 +601,7 @@ static int __find_resource(struct resource *root, struct resource *old,
 	 */
 	if (this && this->start == root->start) {
 		tmp.start = (this == old) ? old->start : this->end + 1;
-		this = this->sibling;
+		this = resource_sibling(this);
 	}
 	for(;;) {
 		if (this)
@@ -632,7 +637,7 @@ next:		if (!this || this->end == root->end)
 
 		if (this != old)
 			tmp.start = this->end + 1;
-		this = this->sibling;
+		this = resource_sibling(this);
 	}
 	return -EBUSY;
 }
@@ -676,7 +681,7 @@ static int reallocate_resource(struct resource *root, struct resource *old,
 		goto out;
 	}
 
-	if (old->child) {
+	if (!list_empty(&old->child)) {
 		err = -EBUSY;
 		goto out;
 	}
@@ -757,7 +762,7 @@ struct resource *lookup_resource(struct resource *root, resource_size_t start)
 	struct resource *res;
 
 	read_lock(&resource_lock);
-	for (res = root->child; res; res = res->sibling) {
+	list_for_each_entry(res, &root->child, sibling) {
 		if (res->start == start)
 			break;
 	}
@@ -790,32 +795,27 @@ static struct resource * __insert_resource(struct resource *parent, struct resou
 			break;
 	}
 
-	for (next = first; ; next = next->sibling) {
+	for (next = first; ; next = resource_sibling(next)) {
 		/* Partial overlap? Bad, and unfixable */
 		if (next->start < new->start || next->end > new->end)
 			return next;
-		if (!next->sibling)
+		if (!resource_sibling(next))
 			break;
-		if (next->sibling->start > new->end)
+		if (resource_sibling(next)->start > new->end)
 			break;
 	}
-
 	new->parent = parent;
-	new->sibling = next->sibling;
-	new->child = first;
+	list_add(&new->sibling, &next->sibling);
+	INIT_LIST_HEAD(&new->child);
 
-	next->sibling = NULL;
-	for (next = first; next; next = next->sibling)
+	/*
+	 * From first to next, they all fall into new's region, so change them
+	 * as new's children.
+	 */
+	list_cut_position(&new->child, first->sibling.prev, &next->sibling);
+	list_for_each_entry(next, &new->child, sibling)
 		next->parent = new;
 
-	if (parent->child == first) {
-		parent->child = new;
-	} else {
-		next = parent->child;
-		while (next->sibling != first)
-			next = next->sibling;
-		next->sibling = new;
-	}
 	return NULL;
 }
 
@@ -937,19 +937,17 @@ static int __adjust_resource(struct resource *res, resource_size_t start,
 	if ((start < parent->start) || (end > parent->end))
 		goto out;
 
-	if (res->sibling && (res->sibling->start <= end))
+	if (resource_sibling(res) && (resource_sibling(res)->start <= end))
 		goto out;
 
-	tmp = parent->child;
-	if (tmp != res) {
-		while (tmp->sibling != res)
-			tmp = tmp->sibling;
+	if (res->sibling.prev != &parent->child) {
+		tmp = list_prev_entry(res, sibling);
 		if (start <= tmp->end)
 			goto out;
 	}
 
 skip:
-	for (tmp = res->child; tmp; tmp = tmp->sibling)
+	list_for_each_entry(tmp, &res->child, sibling)
 		if ((tmp->start < start) || (tmp->end > end))
 			goto out;
 
@@ -996,27 +994,30 @@ EXPORT_SYMBOL(adjust_resource);
  */
 int reparent_resources(struct resource *parent, struct resource *res)
 {
-	struct resource *p, **pp;
-	struct resource **firstpp = NULL;
+	struct resource *p, *first = NULL;
 
-	for (pp = &parent->child; (p = *pp) != NULL; pp = &p->sibling) {
+	list_for_each_entry(p, &parent->child, sibling) {
 		if (p->end < res->start)
 			continue;
 		if (res->end < p->start)
 			break;
 		if (p->start < res->start || p->end > res->end)
 			return -ENOTSUPP;	/* not completely contained */
-		if (firstpp == NULL)
-			firstpp = pp;
+		if (first == NULL)
+			first = p;
 	}
-	if (firstpp == NULL)
+	if (first == NULL)
 		return -ECANCELED; /* didn't find any conflicting entries? */
 	res->parent = parent;
-	res->child = *firstpp;
-	res->sibling = *pp;
-	*firstpp = res;
-	*pp = NULL;
-	for (p = res->child; p != NULL; p = p->sibling) {
+	list_add(&res->sibling, p->sibling.prev);
+	INIT_LIST_HEAD(&res->child);
+
+	/*
+	 * From first to p's previous sibling, they all fall into
+	 * res's region, change them as res's children.
+	 */
+	list_cut_position(&res->child, first->sibling.prev, res->sibling.prev);
+	list_for_each_entry(p, &res->child, sibling) {
 		p->parent = res;
 		pr_debug("PCI: Reparented %s %pR under %s\n",
 			 p->name, p, res->name);
@@ -1216,34 +1217,32 @@ EXPORT_SYMBOL(__request_region);
 void __release_region(struct resource *parent, resource_size_t start,
 			resource_size_t n)
 {
-	struct resource **p;
+	struct resource *res;
 	resource_size_t end;
 
-	p = &parent->child;
+	res = resource_first_child(&parent->child);
 	end = start + n - 1;
 
 	write_lock(&resource_lock);
 
 	for (;;) {
-		struct resource *res = *p;
-
 		if (!res)
 			break;
 		if (res->start <= start && res->end >= end) {
 			if (!(res->flags & IORESOURCE_BUSY)) {
-				p = &res->child;
+				res = resource_first_child(&res->child);
 				continue;
 			}
 			if (res->start != start || res->end != end)
 				break;
-			*p = res->sibling;
+			list_del(&res->sibling);
 			write_unlock(&resource_lock);
 			if (res->flags & IORESOURCE_MUXED)
 				wake_up(&muxed_resource_wait);
 			free_resource(res);
 			return;
 		}
-		p = &res->sibling;
+		res = resource_sibling(res);
 	}
 
 	write_unlock(&resource_lock);
@@ -1278,9 +1277,7 @@ EXPORT_SYMBOL(__release_region);
 int release_mem_region_adjustable(struct resource *parent,
 			resource_size_t start, resource_size_t size)
 {
-	struct resource **p;
-	struct resource *res;
-	struct resource *new_res;
+	struct resource *res, *new_res;
 	resource_size_t end;
 	int ret = -EINVAL;
 
@@ -1291,16 +1288,16 @@ int release_mem_region_adjustable(struct resource *parent,
 	/* The alloc_resource() result gets checked later */
 	new_res = alloc_resource(GFP_KERNEL);
 
-	p = &parent->child;
+	res = resource_first_child(&parent->child);
 	write_lock(&resource_lock);
 
-	while ((res = *p)) {
+	while (res) {
 		if (res->start >= end)
 			break;
 
 		/* look for the next resource if it does not fit into */
 		if (res->start > start || res->end < end) {
-			p = &res->sibling;
+			res = resource_sibling(res);
 			continue;
 		}
 
@@ -1308,14 +1305,14 @@ int release_mem_region_adjustable(struct resource *parent,
 			break;
 
 		if (!(res->flags & IORESOURCE_BUSY)) {
-			p = &res->child;
+			res = resource_first_child(&res->child);
 			continue;
 		}
 
 		/* found the target resource; let's adjust accordingly */
 		if (res->start == start && res->end == end) {
 			/* free the whole entry */
-			*p = res->sibling;
+			list_del(&res->sibling);
 			free_resource(res);
 			ret = 0;
 		} else if (res->start == start && res->end != end) {
@@ -1338,14 +1335,13 @@ int release_mem_region_adjustable(struct resource *parent,
 			new_res->flags = res->flags;
 			new_res->desc = res->desc;
 			new_res->parent = res->parent;
-			new_res->sibling = res->sibling;
-			new_res->child = NULL;
+			INIT_LIST_HEAD(&new_res->child);
 
 			ret = __adjust_resource(res, res->start,
 						start - res->start);
 			if (ret)
 				break;
-			res->sibling = new_res;
+			list_add(&new_res->sibling, &res->sibling);
 			new_res = NULL;
 		}
 
@@ -1526,7 +1522,7 @@ static int __init reserve_setup(char *str)
 			res->end = io_start + io_num - 1;
 			res->flags |= IORESOURCE_BUSY;
 			res->desc = IORES_DESC_NONE;
-			res->child = NULL;
+			INIT_LIST_HEAD(&res->child);
 			if (request_resource(parent, res) == 0)
 				reserved = x+1;
 		}
@@ -1546,7 +1542,7 @@ int iomem_map_sanity_check(resource_size_t addr, unsigned long size)
 	loff_t l;
 
 	read_lock(&resource_lock);
-	for (p = p->child; p ; p = r_next(NULL, p, &l)) {
+	for (p = resource_first_child(&p->child); p; p = r_next(NULL, p, &l)) {
 		/*
 		 * We can probably skip the resources without
 		 * IORESOURCE_IO attribute?
@@ -1602,7 +1598,7 @@ bool iomem_is_exclusive(u64 addr)
 	addr = addr & PAGE_MASK;
 
 	read_lock(&resource_lock);
-	for (p = p->child; p ; p = r_next(NULL, p, &l)) {
+	for (p = resource_first_child(&p->child); p; p = r_next(NULL, p, &l)) {
 		/*
 		 * We can probably skip the resources without
 		 * IORESOURCE_IO attribute?
-- 
2.13.6

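To make the shape of the conversion concrete, here is a minimal sketch of
how a walk over a resource's children changes; dump_children_old/new are
made-up helpers for illustration, not code from this series:

	/* Before: hand-rolled walk over the singly linked sibling chain. */
	static void dump_children_old(struct resource *root)
	{
		struct resource *r;

		for (r = root->child; r; r = r->sibling)
			pr_info("%pR\n", r);
	}

	/* After: siblings hang off a list_head, so the list helpers apply. */
	static void dump_children_new(struct resource *root)
	{
		struct resource *r;

		list_for_each_entry(r, &root->child, sibling)
			pr_info("%pR\n", r);
	}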

* [PATCH v7 3/4] resource: add walk_system_ram_res_rev()
@ 2018-07-18  2:49   ` Baoquan He
  0 siblings, 0 replies; 83+ messages in thread
From: Baoquan He @ 2018-07-18  2:49 UTC (permalink / raw)
  To: linux-kernel, akpm, robh+dt, dan.j.williams, nicolas.pitre, josh,
	fengguang.wu, bp, andy.shevchenko
  Cc: brijesh.singh, devicetree, airlied, linux-pci, richard.weiyang,
	jcmvbkbc, baiyaowei, kys, frowand.list, lorenzo.pieralisi,
	sthemmin, Baoquan He, linux-nvdimm, patrik.r.jakobsson,
	linux-input, gustavo, dyoung, thomas.lendacky, haiyangz,
	maarten.lankhorst, jglisse, seanpaul, bhelgaas, tglx, yinghai,
	jonathan.derrick, chris, monstr, linux-parisc, gregkh,
	dmitry.torokhov, ebiederm, devel, linuxppc-dev, davem

This function, being a variant of walk_system_ram_res() introduced in
commit 8c86e70acead ("resource: provide new functions to walk through
resources"), walks through all the System RAM resources in reverse
order, i.e., from higher to lower addresses.

It will be used in kexec_file code.

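For illustration, a caller passes the same kind of callback that
walk_system_ram_res() takes; the helpers below are hypothetical and not
part of this patch:

	static int count_ram_res(struct resource *res, void *arg)
	{
		unsigned int *nr = arg;

		(*nr)++;	/* ranges arrive from highest to lowest */
		return 0;	/* a non-zero return stops the walk */
	}

	static unsigned int count_system_ram_top_down(void)
	{
		unsigned int nr = 0;

		walk_system_ram_res_rev(0, ULONG_MAX, &nr, count_ram_res);
		return nr;
	}
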
Signed-off-by: Baoquan He <bhe@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
---
 include/linux/ioport.h |  3 +++
 kernel/resource.c      | 40 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 43 insertions(+)

diff --git a/include/linux/ioport.h b/include/linux/ioport.h
index b7456ae889dd..066cc263e2cc 100644
--- a/include/linux/ioport.h
+++ b/include/linux/ioport.h
@@ -279,6 +279,9 @@ extern int
 walk_system_ram_res(u64 start, u64 end, void *arg,
 		    int (*func)(struct resource *, void *));
 extern int
+walk_system_ram_res_rev(u64 start, u64 end, void *arg,
+			int (*func)(struct resource *, void *));
+extern int
 walk_iomem_res_desc(unsigned long desc, unsigned long flags, u64 start, u64 end,
 		    void *arg, int (*func)(struct resource *, void *));
 
diff --git a/kernel/resource.c b/kernel/resource.c
index c96e58d3d2f8..3e18f24b90c4 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -23,6 +23,8 @@
 #include <linux/pfn.h>
 #include <linux/mm.h>
 #include <linux/resource_ext.h>
+#include <linux/string.h>
+#include <linux/vmalloc.h>
 #include <asm/io.h>
 
 
@@ -443,6 +445,44 @@ int walk_system_ram_res(u64 start, u64 end, void *arg,
 }
 
 /*
+ * This function, being a variant of walk_system_ram_res(), calls the @func
+ * callback against all memory ranges of type System RAM which are marked as
+ * IORESOURCE_SYSTEM_RAM and IORESOURCE_BUSY, in reverse order, i.e., from
+ * higher to lower addresses.
+ */
+int walk_system_ram_res_rev(u64 start, u64 end, void *arg,
+				int (*func)(struct resource *, void *))
+{
+	unsigned long flags;
+	struct resource *res;
+	int ret = -1;
+
+	flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
+
+	read_lock(&resource_lock);
+	list_for_each_entry_reverse(res, &iomem_resource.child, sibling) {
+		if (start >= end)
+			break;
+		if ((res->flags & flags) != flags)
+			continue;
+		if (res->desc != IORES_DESC_NONE)
+			continue;
+		if (res->end < start)
+			break;
+
+		if ((res->end >= start) && (res->start < end)) {
+			ret = (*func)(res, arg);
+			if (ret)
+				break;
+		}
+		end = res->start - 1;
+
+	}
+	read_unlock(&resource_lock);
+	return ret;
+}
+
+/*
  * This function calls the @func callback against all memory ranges, which
  * are ranges marked as IORESOURCE_MEM and IORESOUCE_BUSY.
  */
-- 
2.13.6


* [PATCH v7 4/4] kexec_file: Load kernel at top of system RAM if required
@ 2018-07-18  2:49   ` Baoquan He
  0 siblings, 0 replies; 83+ messages in thread
From: Baoquan He @ 2018-07-18  2:49 UTC (permalink / raw)
  To: linux-kernel, akpm, robh+dt, dan.j.williams, nicolas.pitre, josh,
	fengguang.wu, bp, andy.shevchenko
  Cc: brijesh.singh, devicetree, airlied, linux-pci, richard.weiyang,
	jcmvbkbc, baiyaowei, kys, frowand.list, lorenzo.pieralisi,
	sthemmin, Baoquan He, linux-nvdimm, patrik.r.jakobsson,
	linux-input, gustavo, dyoung, thomas.lendacky, haiyangz,
	maarten.lankhorst, jglisse, seanpaul, bhelgaas, tglx, yinghai,
	jonathan.derrick, chris, monstr, linux-parisc, gregkh,
	dmitry.torokhov, kexec, ebiederm, devel, linuxppc-dev, davem

For kexec_file loading, if kexec_buf.top_down is 'true', the memory used
to load the kernel/initrd/purgatory is supposed to be allocated from the
top down. This is what we have been doing all along in the old kexec
loading interface, and kexec loading is still the default in some
distributions. However, the current kexec_file loading interface doesn't
behave this way. The function arch_kexec_walk_mem() it calls ignores
kexec_buf.top_down and instead calls walk_system_ram_res() directly to go
through all System RAM resources from the bottom up, trying to find a
memory region which can contain the kexec buffer, and then calls
locate_mem_hole_callback() to allocate memory in that region from the top
down. This causes confusion, especially now that KASLR is widely
supported: users have to work out why the kexec/kdump kernel loading
position differs between the two interfaces in order to rule out
unnecessary noise. Hence the two interfaces need to be unified in
behaviour.

Here, check whether kexec_buf.top_down is 'true' in arch_kexec_walk_mem();
if so, call the newly added walk_system_ram_res_rev() to find a memory
region from the top down in which to load the kernel.

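As a sketch of what this enables, an architecture requesting top-down
placement fills in kexec_buf roughly as below; the field values here are
illustrative only, not taken from any in-tree loader:

	struct kexec_buf kbuf = {
		.image     = image,		/* kimage being loaded */
		.buffer    = kernel_buf,	/* hypothetical payload */
		.bufsz     = kernel_len,
		.memsz     = kernel_len,
		.buf_align = PAGE_SIZE,
		.buf_min   = 0,
		.buf_max   = ULONG_MAX,
		.top_down  = true,		/* now honoured by the RAM walk */
	};
	int ret = kexec_add_buffer(&kbuf);	/* kbuf.mem holds the chosen address */
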
Signed-off-by: Baoquan He <bhe@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: kexec@lists.infradead.org
---
 kernel/kexec_file.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index c6a3b6851372..75226c1d08ce 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -518,6 +518,8 @@ int __weak arch_kexec_walk_mem(struct kexec_buf *kbuf,
 					   IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY,
 					   crashk_res.start, crashk_res.end,
 					   kbuf, func);
+	else if (kbuf->top_down)
+		return walk_system_ram_res_rev(0, ULONG_MAX, kbuf, func);
 	else
 		return walk_system_ram_res(0, ULONG_MAX, kbuf, func);
 }
-- 
2.13.6


* Re: [PATCH v7 1/4] resource: Move reparent_resources() to kernel/resource.c and make it public
  2018-07-18  2:49   ` Baoquan He
  (?)
@ 2018-07-18 16:36     ` Andy Shevchenko
  -1 siblings, 0 replies; 83+ messages in thread
From: Andy Shevchenko @ 2018-07-18 16:36 UTC (permalink / raw)
  To: Baoquan He
  Cc: Nicolas Pitre, brijesh.singh, devicetree, David Airlie,
	linux-pci, richard.weiyang, Max Filippov, Paul Mackerras,
	baiyaowei, KY Srinivasan, Frank Rowand, Lorenzo Pieralisi,
	Stephen Hemminger, linux-nvdimm, Michael Ellerman,
	Patrik Jakobsson, linux-input, Gustavo Padovan, Borislav Petkov,
	Dave Young, Tom Lendacky, Haiyang Zhang, Maarten Lankhorst,
	Josh Triplett, Jérôme Glisse, Rob Herring, Sean Paul,
	Bjorn Helgaas, Thomas Gleixner, Yinghai Lu, Jon Derrick,
	Chris Zankel, Michal Simek, linux-parisc, Greg Kroah-Hartman,
	Dmitry Torokhov, Linux Kernel Mailing List,
	Benjamin Herrenschmidt, Eric Biederman, devel, Andrew Morton,
	kbuild test robot,
	open list:LINUX FOR POWERPC PA SEMI PWRFICIENT, David S. Miller

On Wed, Jul 18, 2018 at 5:49 AM, Baoquan He <bhe@redhat.com> wrote:
> reparent_resources() is duplicated in arch/microblaze/pci/pci-common.c
> and arch/powerpc/kernel/pci-common.c, so move it to kernel/resource.c
> so that it's shared.

Some minor stuff.

> +/**
> + * reparent_resources - reparent resource children of parent that res covers
> + * @parent: parent resource descriptor
> + * @res: resource descriptor desired by caller
> + *
> + * Returns 0 on success, -ENOTSUPP if child resource is not completely
> + * contained by 'res', -ECANCELED if no any conflicting entry found.

'res' -> @res

> + *
> + * Reparent resource children of 'parent' that conflict with 'res'

Ditto + 'parent' -> @parent

> + * under 'res', and make 'res' replace those children.

Ditto.

> + */
> +int reparent_resources(struct resource *parent, struct resource *res)
> +{
> +       struct resource *p, **pp;
> +       struct resource **firstpp = NULL;
> +
> +       for (pp = &parent->child; (p = *pp) != NULL; pp = &p->sibling) {
> +               if (p->end < res->start)
> +                       continue;
> +               if (res->end < p->start)
> +                       break;
> +               if (p->start < res->start || p->end > res->end)
> +                       return -ENOTSUPP;       /* not completely contained */
> +               if (firstpp == NULL)
> +                       firstpp = pp;
> +       }
> +       if (firstpp == NULL)
> +               return -ECANCELED; /* didn't find any conflicting entries? */
> +       res->parent = parent;
> +       res->child = *firstpp;
> +       res->sibling = *pp;
> +       *firstpp = res;
> +       *pp = NULL;
> +       for (p = res->child; p != NULL; p = p->sibling) {
> +               p->parent = res;

> +               pr_debug("PCI: Reparented %s %pR under %s\n",
> +                        p->name, p, res->name);

Now, the "PCI: " prefix in that message is a bit confusing here.

> +       }
> +       return 0;
> +}
> +EXPORT_SYMBOL(reparent_resources);
> +
>  static void __init __reserve_region_with_split(struct resource *root,
>                 resource_size_t start, resource_size_t end,
>                 const char *name)
> --
> 2.13.6
>



-- 
With Best Regards,
Andy Shevchenko

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v7 1/4] resource: Move reparent_resources() to kernel/resource.c and make it public
@ 2018-07-18 16:37       ` Andy Shevchenko
  0 siblings, 0 replies; 83+ messages in thread
From: Andy Shevchenko @ 2018-07-18 16:37 UTC (permalink / raw)
  To: Baoquan He
  Cc: Nicolas Pitre, brijesh.singh, devicetree, David Airlie,
	linux-pci, richard.weiyang, Max Filippov, Paul Mackerras,
	baiyaowei, KY Srinivasan, Frank Rowand, Lorenzo Pieralisi,
	Stephen Hemminger, linux-nvdimm, Michael Ellerman,
	Patrik Jakobsson, linux-input, Gustavo Padovan, Borislav Petkov,
	Dave Young, Tom Lendacky, Haiyang Zhang, Maarten Lankhorst,
	Josh Triplett, Jérôme Glisse, Rob Herring, Sean Paul,
	Bjorn Helgaas, Thomas Gleixner, Yinghai Lu, Jon Derrick,
	Chris Zankel, Michal Simek, linux-parisc, Greg Kroah-Hartman,
	Dmitry Torokhov, Linux Kernel Mailing List,
	Benjamin Herrenschmidt, Eric Biederman, devel, Andrew Morton,
	kbuild test robot,
	open list:LINUX FOR POWERPC PA SEMI PWRFICIENT, David S. Miller

On Wed, Jul 18, 2018 at 7:36 PM, Andy Shevchenko
<andy.shevchenko@gmail.com> wrote:
> On Wed, Jul 18, 2018 at 5:49 AM, Baoquan He <bhe@redhat.com> wrote:
>> reparent_resources() is duplicated in arch/microblaze/pci/pci-common.c
>> and arch/powerpc/kernel/pci-common.c, so move it to kernel/resource.c
>> so that it's shared.

>> + * Returns 0 on success, -ENOTSUPP if a child resource is not completely
>> + * contained by 'res', -ECANCELED if no conflicting entry is found.

You can also refer to constants by prefixing them with %, e.g. %-ENOTSUPP.
But this is up to you completely.

-- 
With Best Regards,
Andy Shevchenko

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v7 4/4] kexec_file: Load kernel at top of system RAM if required
  2018-07-18  2:49   ` Baoquan He
@ 2018-07-18 22:33     ` Andrew Morton
  -1 siblings, 0 replies; 83+ messages in thread
From: Andrew Morton @ 2018-07-18 22:33 UTC (permalink / raw)
  To: Baoquan He
  Cc: nicolas.pitre, brijesh.singh, devicetree, airlied, linux-pci,
	richard.weiyang, jcmvbkbc, baiyaowei, kys, frowand.list, tglx,
	lorenzo.pieralisi, sthemmin, linux-nvdimm, patrik.r.jakobsson,
	andy.shevchenko, linux-input, gustavo, bp, dyoung,
	thomas.lendacky, haiyangz, maarten.lankhorst, josh, jglisse,
	robh+dt, seanpaul, bhelgaas, yinghai, jonathan.derrick, chris,
	monstr, linux-parisc, gregkh, dmitry.torokhov, kexec,
	linux-kernel, ebiederm, devel, fengguang.wu, linuxppc-dev, davem

On Wed, 18 Jul 2018 10:49:44 +0800 Baoquan He <bhe@redhat.com> wrote:

> For kexec_file loading, if kexec_buf.top_down is 'true', the memory used
> to load the kernel/initrd/purgatory is supposed to be allocated from the
> top down. This is what we have been doing all along in the old kexec
> loading interface, and kexec loading is still the default in some
> distributions. However, the current kexec_file loading interface doesn't
> behave like this. The function arch_kexec_walk_mem() it calls ignores
> kexec_buf.top_down and calls walk_system_ram_res() directly to go
> through all System RAM resources from bottom to top, trying to find a
> memory region that can contain the specific kexec buffer, and then calls
> locate_mem_hole_callback() to allocate memory within that region from
> the top down. This brings confusion, especially now that KASLR is widely
> supported: users have to work out why the kexec/kdump kernel loading
> position differs between the two interfaces in order to rule out
> unnecessary noise. Hence the two interfaces need to be unified in
> behaviour.

As far as I can tell, the above is the whole reason for the patchset,
yes?  To avoid confusing users.

Is that sufficient?  Can we instead simplify their lives by providing
better documentation or informative printks or better Kconfig text,
etc?

And who *are* the people who are performing this configuration?  Random
system administrators?  Linux distro engineers?  If the latter then
they presumably aren't easily confused!

In other words, I'm trying to understand how much benefit this patchset
will provide to our users as a whole.


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v7 4/4] kexec_file: Load kernel at top of system RAM if required
@ 2018-07-19 15:17       ` Baoquan He
  0 siblings, 0 replies; 83+ messages in thread
From: Baoquan He @ 2018-07-19 15:17 UTC (permalink / raw)
  To: Andrew Morton
  Cc: nicolas.pitre, brijesh.singh, devicetree, airlied, linux-pci,
	richard.weiyang, jcmvbkbc, baiyaowei, kys, frowand.list, tglx,
	lorenzo.pieralisi, sthemmin, linux-nvdimm, patrik.r.jakobsson,
	andy.shevchenko, linux-input, gustavo, bp, dyoung,
	thomas.lendacky, haiyangz, maarten.lankhorst, josh, jglisse,
	robh+dt, seanpaul, bhelgaas, yinghai, jonathan.derrick, chris,
	monstr, linux-parisc, gregkh, dmitry.torokhov, kexec,
	linux-kernel, ebiederm, devel, fengguang.wu, linuxppc-dev, davem

Hi Andrew,

On 07/18/18 at 03:33pm, Andrew Morton wrote:
> On Wed, 18 Jul 2018 10:49:44 +0800 Baoquan He <bhe@redhat.com> wrote:
> 
> > For kexec_file loading, if kexec_buf.top_down is 'true', the memory used
> > to load the kernel/initrd/purgatory is supposed to be allocated from the
> > top down. This is what we have been doing all along in the old kexec
> > loading interface, and kexec loading is still the default in some
> > distributions. However, the current kexec_file loading interface doesn't
> > behave like this. The function arch_kexec_walk_mem() it calls ignores
> > kexec_buf.top_down and calls walk_system_ram_res() directly to go
> > through all System RAM resources from bottom to top, trying to find a
> > memory region that can contain the specific kexec buffer, and then calls
> > locate_mem_hole_callback() to allocate memory within that region from
> > the top down. This brings confusion, especially now that KASLR is widely
> > supported: users have to work out why the kexec/kdump kernel loading
> > position differs between the two interfaces in order to rule out
> > unnecessary noise. Hence the two interfaces need to be unified in
> > behaviour.
> 
> As far as I can tell, the above is the whole reason for the patchset,
> yes?  To avoid confusing users.


In fact, it's not just about avoiding user confusion. Kexec loading and
kexec_file loading do the same thing in essence; we just need to do
kernel image verification on UEFI systems, so the kexec loading code had
to be ported into the kernel.

Kexec has been a formal feature in our distro, and customers owning
those kinds of very large machines use it to speed up the reboot
process. On UEFI machines, kexec_file loading searches for a place to
put the kernel below 4G, from the top down. As we know, the first 4G is
the DMA32 zone; DMA, PCI mmcfg, BIOS, etc. all try to consume it, so we
may fail to find usable space there for the kernel/initrd. Searching the
whole memory space from the top down avoids this worry.
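
For illustration, a kexec_file loader asks for such a top-down placement
roughly like this (kernel_buf and kernel_len are hypothetical names; the
point is that buf_max is not capped at 4G and top_down is set):

        struct kexec_buf kbuf = {
                .image     = image,
                .buffer    = kernel_buf,        /* hypothetical payload */
                .bufsz     = kernel_len,
                .memsz     = kernel_len,
                .buf_align = PAGE_SIZE,
                .buf_min   = 0,
                .buf_max   = ULONG_MAX,         /* search the whole space */
                .top_down  = true,              /* prefer the highest hole */
        };
        int ret = kexec_add_buffer(&kbuf);      /* kbuf.mem holds the result */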

At the first post, I posted the version below, based on AKASHI's
walk_system_ram_res_rev(). Later you suggested using list_head to link
a resource's child siblings, to see what the code change would look like.
http://lkml.kernel.org/r/20180322033722.9279-1-bhe@redhat.com

Then I posted v2:
http://lkml.kernel.org/r/20180408024724.16812-1-bhe@redhat.com
Rob Herring mentioned that other components with this tree structure
have planned to do the same thing: replacing the singly linked list with
list_head to link a resource's child siblings. Quoting Rob's words below;
I think this could be another reason.

~~~~~ From Rob
The DT struct device_node also has the same tree structure with
parent, child, sibling pointers and converting to list_head had been
on the todo list for a while. ACPI also has some tree walking
functions (drivers/acpi/acpica/pstree.c). Perhaps there should be a
common tree struct and helpers defined either on top of list_head or a
new struct if that saves some size.
~~~~~
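
Concretely, the conversion in this series changes the sibling links
roughly like this (a sketch, not the exact diff from the patches):

struct resource {
        resource_size_t start;
        resource_size_t end;
        const char *name;
        unsigned long flags;
        unsigned long desc;
        struct resource *parent;
-       struct resource *sibling, *child;       /* singly linked today */
+       struct list_head sibling;               /* entry in parent's child list */
+       struct list_head child;                 /* head of this node's children */
};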

> 
> Is that sufficient?  Can we instead simplify their lives by providing
> better documentation or informative printks or better Kconfig text,
> etc?
> 
> And who *are* the people who are performing this configuration?  Random
> system administrators?  Linux distro engineers?  If the latter then
> they presumably aren't easily confused!

Kexec was invented for kernel developers to speed up kernel rebooting.
Now high-end server admins, kernel developers and QE are also keen to
use it to reboot large boxes for faster feature testing and bug
debugging. Kernel developers may know the kernel loading position well;
admins or QE might not be aware of it.

> 
> In other words, I'm trying to understand how much benefit this patchset
> will provide to our users as a whole.

Understood. The list_head replacement patch truly involves many code
changes, so it's risky. I am willing to try any idea from reviewers,
without insisting that it must be accepted in the end; if we don't try,
we don't know what it looks like or what impact it may have. I am fine
with taking AKASHI's simple version of walk_system_ram_res_rev() to
lower the risk, even though it could be a little less efficient.

Thanks
Baoquan

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v7 1/4] resource: Move reparent_resources() to kernel/resource.c and make it public
@ 2018-07-19 15:18         ` Baoquan He
  0 siblings, 0 replies; 83+ messages in thread
From: Baoquan He @ 2018-07-19 15:18 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: Nicolas Pitre, brijesh.singh, devicetree, David Airlie,
	linux-pci, richard.weiyang, Max Filippov, Paul Mackerras,
	baiyaowei, KY Srinivasan, Frank Rowand, Lorenzo Pieralisi,
	Stephen Hemminger, linux-nvdimm, Michael Ellerman,
	Patrik Jakobsson, linux-input, Gustavo Padovan, Borislav Petkov,
	Dave Young, Tom Lendacky, Haiyang Zhang, Maarten Lankhorst,
	Josh Triplett, Jérôme Glisse, Rob Herring, Sean Paul,
	Bjorn Helgaas, Thomas Gleixner, Yinghai Lu, Jon Derrick,
	Chris Zankel, Michal Simek, linux-parisc, Greg Kroah-Hartman,
	Dmitry Torokhov, Linux Kernel Mailing List,
	Benjamin Herrenschmidt, Eric Biederman, devel, Andrew Morton,
	kbuild test robot,
	open list:LINUX FOR POWERPC PA SEMI PWRFICIENT, David S. Miller

On 07/18/18 at 07:37pm, Andy Shevchenko wrote:
> On Wed, Jul 18, 2018 at 7:36 PM, Andy Shevchenko
> <andy.shevchenko@gmail.com> wrote:
> > On Wed, Jul 18, 2018 at 5:49 AM, Baoquan He <bhe@redhat.com> wrote:
> >> reparent_resources() is duplicated in arch/microblaze/pci/pci-common.c
> >> and arch/powerpc/kernel/pci-common.c, so move it to kernel/resource.c
> >> so that it's shared.
> 
> >> + * Returns 0 on success, -ENOTSUPP if a child resource is not completely
> >> + * contained by 'res', -ECANCELED if no conflicting entry is found.
> 
> You can also refer to constants by prefixing them with %, e.g. %-ENOTSUPP.
> But this is up to you completely.

Thanks, will fix these when I repost.
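
With those applied, the kernel-doc header will read roughly:

/**
 * reparent_resources - reparent resource children of parent that res covers
 * @parent: parent resource descriptor
 * @res: resource descriptor desired by caller
 *
 * Reparent resource children of @parent that conflict with @res
 * under @res, and make @res replace those children.
 *
 * Returns 0 on success, %-ENOTSUPP if a child resource is not completely
 * contained by @res, %-ECANCELED if no conflicting entry is found.
 */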


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v7 4/4] kexec_file: Load kernel at top of system RAM if required
@ 2018-07-19 19:44         ` Andrew Morton
  0 siblings, 0 replies; 83+ messages in thread
From: Andrew Morton @ 2018-07-19 19:44 UTC (permalink / raw)
  To: Baoquan He
  Cc: nicolas.pitre, brijesh.singh, devicetree, airlied, linux-pci,
	richard.weiyang, jcmvbkbc, baiyaowei, kys, frowand.list, tglx,
	lorenzo.pieralisi, sthemmin, linux-nvdimm, patrik.r.jakobsson,
	andy.shevchenko, linux-input, gustavo, bp, dyoung,
	thomas.lendacky, haiyangz, maarten.lankhorst, josh, jglisse,
	robh+dt, seanpaul, bhelgaas, yinghai, jonathan.derrick, chris,
	monstr, linux-parisc, gregkh, dmitry.torokhov, kexec,
	linux-kernel, ebiederm, devel, fengguang.wu, linuxppc-dev, davem

On Thu, 19 Jul 2018 23:17:53 +0800 Baoquan He <bhe@redhat.com> wrote:

> Hi Andrew,
> 
> On 07/18/18 at 03:33pm, Andrew Morton wrote:
> > On Wed, 18 Jul 2018 10:49:44 +0800 Baoquan He <bhe@redhat.com> wrote:
> > 
> > > For kexec_file loading, if kexec_buf.top_down is 'true', the memory used
> > > to load the kernel/initrd/purgatory is supposed to be allocated from the
> > > top down. This is what we have been doing all along in the old kexec
> > > loading interface, and kexec loading is still the default in some
> > > distributions. However, the current kexec_file loading interface doesn't
> > > behave like this. The function arch_kexec_walk_mem() it calls ignores
> > > kexec_buf.top_down and calls walk_system_ram_res() directly to go
> > > through all System RAM resources from bottom to top, trying to find a
> > > memory region that can contain the specific kexec buffer, and then calls
> > > locate_mem_hole_callback() to allocate memory within that region from
> > > the top down. This brings confusion, especially now that KASLR is widely
> > > supported: users have to work out why the kexec/kdump kernel loading
> > > position differs between the two interfaces in order to rule out
> > > unnecessary noise. Hence the two interfaces need to be unified in
> > > behaviour.
> > 
> > As far as I can tell, the above is the whole reason for the patchset,
> > yes?  To avoid confusing users.
> 
> 
> In fact, it's not just about avoiding user confusion. Kexec loading and
> kexec_file loading do the same thing in essence; we just need to do
> kernel image verification on UEFI systems, so the kexec loading code had
> to be ported into the kernel.
> 
> Kexec has been a formal feature in our distro, and customers owning
> those kinds of very large machines use it to speed up the reboot
> process. On UEFI machines, kexec_file loading searches for a place to
> put the kernel below 4G, from the top down. As we know, the first 4G is
> the DMA32 zone; DMA, PCI mmcfg, BIOS, etc. all try to consume it, so we
> may fail to find usable space there for the kernel/initrd. Searching the
> whole memory space from the top down avoids this worry.
> 
> At the first post, I posted the version below, based on AKASHI's
> walk_system_ram_res_rev(). Later you suggested using list_head to link
> a resource's child siblings, to see what the code change would look like.
> http://lkml.kernel.org/r/20180322033722.9279-1-bhe@redhat.com
> 
> Then I posted v2:
> http://lkml.kernel.org/r/20180408024724.16812-1-bhe@redhat.com
> Rob Herring mentioned that other components with this tree structure
> have planned to do the same thing: replacing the singly linked list with
> list_head to link a resource's child siblings. Quoting Rob's words below;
> I think this could be another reason.
> 
> ~~~~~ From Rob
> The DT struct device_node also has the same tree structure with
> parent, child, sibling pointers and converting to list_head had been
> on the todo list for a while. ACPI also has some tree walking
> functions (drivers/acpi/acpica/pstree.c). Perhaps there should be a
> common tree struct and helpers defined either on top of list_head or a
> new struct if that saves some size.
> ~~~~~

Please let's get all this into the changelogs?

> > 
> > Is that sufficient?  Can we instead simplify their lives by providing
> > better documentation or informative printks or better Kconfig text,
> > etc?
> > 
> > And who *are* the people who are performing this configuration?  Random
> > system administrators?  Linux distro engineers?  If the latter then
> > they presumably aren't easily confused!
> 
> Kexec was invented for kernel developers to speed up their kernel
> reboot cycle. Now high-end server admins, kernel developers and QE are
> also keen to use it to reboot large boxes for faster feature testing
> and bug debugging. Kernel developers may know the kernel loading
> position well; admins or QE might not be aware of it.
> 
> > 
> > In other words, I'm trying to understand how much benefit this patchset
> > will provide to our users as a whole.
> 
> Understood. The list_head replacement patch truly involves many code
> changes, so it's risky. I am willing to try any idea from reviewers,
> without insisting that it finally be accepted. If we don't try, we
> don't know what it looks like or what impact it may have. I am fine
> with taking AKASHI's simple version of walk_system_ram_res_rev() to
> lower the risk, even though it could be a little less efficient.

The larger patch produces a better result.  We can handle it ;)
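
To make the discussion concrete: below is a minimal sketch of a reversed
System RAM walker built on a list_head sibling list. It is illustrative
only, not the patchset's actual code; the 'child'/'sibling' field names
are assumptions of the sketch, and resource_lock handling is omitted.

/*
 * Illustrative sketch only -- not the patchset's code. Assumes the
 * conversion discussed in this thread: iomem_resource.child is a
 * list_head and each struct resource is linked in via a 'sibling'
 * list_head (field names assumed). Locking (resource_lock) and the
 * IORESOURCE_SYSTEM_RAM flag check are omitted for brevity.
 */
static int walk_system_ram_res_rev_sketch(u64 start, u64 end, void *arg,
				int (*func)(struct resource *, void *))
{
	struct resource *res;
	int ret = -1;

	/* Walk iomem_resource's children from highest address to lowest. */
	list_for_each_entry_reverse(res, &iomem_resource.child, sibling) {
		if (res->start > end || res->end < start)
			continue;
		if (strcmp(res->name, "System RAM") != 0)
			continue;
		ret = (*func)(res, arg);
		if (ret)
			break;		/* non-zero return stops the walk */
	}
	return ret;
}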

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v7 4/4] kexec_file: Load kernel at top of system RAM if required
  2018-07-19 15:17       ` Baoquan He
                           ` (2 preceding siblings ...)
  (?)
@ 2018-07-23 14:34         ` Michal Hocko
  -1 siblings, 0 replies; 83+ messages in thread
From: Michal Hocko @ 2018-07-23 14:34 UTC (permalink / raw)
  To: Baoquan He
  Cc: nicolas.pitre, brijesh.singh, devicetree, airlied, linux-pci,
	richard.weiyang, jcmvbkbc, baiyaowei, kys, frowand.list,
	lorenzo.pieralisi, sthemmin, linux-nvdimm, patrik.r.jakobsson,
	andy.shevchenko, linux-input, gustavo, bp, dyoung,
	thomas.lendacky, haiyangz, maarten.lankhorst, josh, jglisse,
	robh+dt, seanpaul, bhelgaas, tglx, yinghai, jonathan.derrick,
	chris, monstr, linux-parisc, gregkh, dmitry.torokhov, kexec,
	linux-kernel, ebiederm, devel, Andrew Morton, fengguang.wu,
	linuxppc-dev, davem

On Thu 19-07-18 23:17:53, Baoquan He wrote:
> Kexec has been a formal feature in our distro, and customers owning
> those kinds of very large machines can make use of this feature to
> speed up the reboot process. On UEFI machines, kexec_file loading
> searches for a place to put the kernel under 4G, from top to bottom.
> As we know, the first 4G of space is the DMA32 zone; DMA, PCI MMCONFIG,
> BIOS, etc. all try to consume it, so we may not be able to find a
> usable space there for the kernel/initrd. Searching top-down through
> the whole memory space, we don't have this worry.

I do not have the full context here, but let me note that you should be
careful when doing top-down reservations, because you can easily end up
in hotpluggable memory and break the hotremove use case. We even warn
when this is done; see memblock_find_in_range_node().
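
As a toy illustration of the hazard (plain C, not kernel source): with
ranges sorted by ascending address, a top-down pick takes the highest
range first, which on movable_node systems is typically where the
hotpluggable memory lives.

/* Toy model, not kernel code: ranges sorted by ascending address. */
struct mem_range {
	unsigned long start, end;
	int hotpluggable;
};

static unsigned long pick_top_down(struct mem_range *r, int n,
				   unsigned long size)
{
	int i;

	for (i = n - 1; i >= 0; i--) {	/* highest range first */
		if (r[i].end - r[i].start < size)
			continue;
		/* On movable_node systems this is often hotpluggable RAM. */
		return r[i].end - size;
	}
	return 0;			/* no fit found */
}
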
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v7 4/4] kexec_file: Load kernel at top of system RAM if required
@ 2018-07-25  2:21           ` Baoquan He
  0 siblings, 0 replies; 83+ messages in thread
From: Baoquan He @ 2018-07-25  2:21 UTC (permalink / raw)
  To: Andrew Morton
  Cc: nicolas.pitre, brijesh.singh, devicetree, airlied, linux-pci,
	richard.weiyang, jcmvbkbc, baiyaowei, kys, frowand.list, tglx,
	lorenzo.pieralisi, sthemmin, linux-nvdimm, patrik.r.jakobsson,
	andy.shevchenko, linux-input, gustavo, bp, dyoung,
	thomas.lendacky, haiyangz, maarten.lankhorst, josh, jglisse,
	robh+dt, seanpaul, bhelgaas, yinghai, jonathan.derrick, chris,
	monstr, linux-parisc, gregkh, dmitry.torokhov, kexec,
	linux-kernel, ebiederm, devel, fengguang.wu, linuxppc-dev, davem

Hi Andrew,

On 07/19/18 at 12:44pm, Andrew Morton wrote:
> On Thu, 19 Jul 2018 23:17:53 +0800 Baoquan He <bhe@redhat.com> wrote:
> > > As far as I can tell, the above is the whole reason for the patchset,
> > > yes?  To avoid confusing users.
> > 
> > 
> > In fact, it's not just about avoiding user confusion. Kexec loading and
> > kexec_file loading do the same thing in essence; we just need to do
> > kernel image verification on UEFI systems, so the kexec loading code
> > had to be ported into the kernel.
> > 
> > Kexec has been a formal feature in our distro, and customers owning
> > those kinds of very large machines can make use of this feature to
> > speed up the reboot process. On UEFI machines, kexec_file loading
> > searches for a place to put the kernel under 4G, from top to bottom.
> > As we know, the first 4G of space is the DMA32 zone; DMA, PCI MMCONFIG,
> > BIOS, etc. all try to consume it, so we may not be able to find a
> > usable space there for the kernel/initrd. Searching top-down through
> > the whole memory space, we don't have this worry.
> > 
> > And in the first post, linked below, I posted AKASHI's version of
> > walk_system_ram_res_rev(). Later you suggested using list_head to link
> > a resource's child siblings, to see what the code change looks like.
> > http://lkml.kernel.org/r/20180322033722.9279-1-bhe@redhat.com
> > 
> > Then I posted v2
> > http://lkml.kernel.org/r/20180408024724.16812-1-bhe@redhat.com
> > Rob Herring mentioned that other components which have this tree
> > structure have planned to do the same thing, replacing the singly
> > linked list with list_head to link resource child siblings. I'll quote
> > Rob's words below; I think this could be another reason.
> > 
> > ~~~~~ From Rob
> > The DT struct device_node also has the same tree structure with
> > parent, child, sibling pointers and converting to list_head had been
> > on the todo list for a while. ACPI also has some tree walking
> > functions (drivers/acpi/acpica/pstree.c). Perhaps there should be a
> > common tree struct and helpers defined either on top of list_head or a
> > new struct if that saves some size.
> > ~~~~~
> 
> Please let's get all this into the changelogs?

Sorry for the late reply; I've been dealing with some urgent customer
hotplug issues.

I am rewriting all the changelogs and the cover letter, and found I was
wrong about the 2nd reason. The current kexec_file_load calls
kexec_locate_mem_hole() to go through all System RAM regions; if a
region is larger than the size of the kernel or initrd, it searches for
a position in that region from top to bottom. Since kexec jumps to the
2nd kernel and doesn't need to care about the 1st kernel's data, we can
always find a usable space under 4G to load the kexec kernel/initrd.

So the only reason for this patch is to keep consistent with kexec_load
and avoid confusion.
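
To make that concrete, the behaviour roughly follows the sketch below
(simplified and illustrative, not the exact kernel code; alignment and
arch-specific checks are omitted, and buf_align is assumed to be a
power of two):

/*
 * Simplified sketch of the behaviour described above: regions are
 * visited bottom-up by walk_system_ram_res(), but the hole inside the
 * first region that is big enough is taken from its top.
 */
static int locate_hole_top_down(struct resource *res, void *arg)
{
	struct kexec_buf *kbuf = arg;

	if (resource_size(res) < kbuf->memsz)
		return 0;	/* too small, keep walking bottom-up */

	kbuf->mem = (res->end + 1 - kbuf->memsz) & ~(kbuf->buf_align - 1);
	return 1;		/* non-zero stops the walk: hole found */
}

/* called as: walk_system_ram_res(0, ULONG_MAX, &kbuf, locate_hole_top_down); */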

And since x86 5-level paging mode was added, we have another issue with
top-down searching of the whole System RAM: we support dynamic
switching between 4-level and 5-level. Namely, with a kernel compiled
with 5-level support, 'no5lvl' can be added to force 4-level paging.
Then, when jumping from a 5-level kernel to a 4-level kernel, e.g. if
we load the kernel at the top of System RAM in 5-level paging mode,
that address might be above 64TB, while the 4-level kernel we jump to
has a 64TB upper limit. For this case, we need to add a limit on kexec
kernel loading when running a 5-level kernel.
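
One possible shape of such a limit, sketched against struct kexec_buf;
the constant and helper below are assumptions of the sketch, not code
from this patchset:

/*
 * Sketch only: if the running kernel uses 5-level paging but the
 * target kernel might be forced to 4-level ('no5lvl'), clamp the
 * kexec search ceiling to the 4-level limit.
 */
#define PAGING_4LVL_LIMIT	(64ULL << 40)	/* 64TB */

static void kexec_clamp_buf_max(struct kexec_buf *kbuf)
{
	if (kbuf->buf_max >= PAGING_4LVL_LIMIT)
		kbuf->buf_max = PAGING_4LVL_LIMIT - 1;
}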

All this mess makes me hesitant to settle on a suitable approach. Maybe
I should drop this patchset.

> 
> > > 
> > > Is that sufficient?  Can we instead simplify their lives by providing
> > > better documentation or informative printks or better Kconfig text,
> > > etc?
> > > 
> > > And who *are* the people who are performing this configuration?  Random
> > > system administrators?  Linux distro engineers?  If the latter then
> > > they presumably aren't easily confused!
> > 
> > Kexec was invented for kernel developers to speed up their kernel
> > reboot cycle. Now high-end server admins, kernel developers and QE are
> > also keen to use it to reboot large boxes for faster feature testing
> > and bug debugging. Kernel developers may know the kernel loading
> > position well; admins or QE might not be aware of it.
> > 
> > > 
> > > In other words, I'm trying to understand how much benefit this patchset
> > > will provide to our users as a whole.
> > 
> > Understood. The list_head replacement patch truly involves many code
> > changes, so it's risky. I am willing to try any idea from reviewers,
> > without insisting that it finally be accepted. If we don't try, we
> > don't know what it looks like or what impact it may have. I am fine
> > with taking AKASHI's simple version of walk_system_ram_res_rev() to
> > lower the risk, even though it could be a little less efficient.
> 
> The larger patch produces a better result.  We can handle it ;)

For this issue, if we stop changing the kexec top-down searching code,
I am not sure whether we should still post the list_head replacement
patches separately.

Thanks
Baoquan

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v7 4/4] kexec_file: Load kernel at top of system RAM if required
@ 2018-07-25  6:48           ` Baoquan He
  0 siblings, 0 replies; 83+ messages in thread
From: Baoquan He @ 2018-07-25  6:48 UTC (permalink / raw)
  To: Michal Hocko
  Cc: nicolas.pitre, brijesh.singh, devicetree, airlied, linux-pci,
	richard.weiyang, jcmvbkbc, baiyaowei, kys, frowand.list,
	lorenzo.pieralisi, sthemmin, linux-nvdimm, patrik.r.jakobsson,
	andy.shevchenko, linux-input, gustavo, bp, dyoung,
	thomas.lendacky, haiyangz, maarten.lankhorst, josh, jglisse,
	robh+dt, seanpaul, bhelgaas, tglx, yinghai, jonathan.derrick,
	chris, monstr, linux-parisc, gregkh, dmitry.torokhov, kexec,
	linux-kernel, ebiederm, devel, Andrew Morton, fengguang.wu,
	linuxppc-dev, davem

On 07/23/18 at 04:34pm, Michal Hocko wrote:
> On Thu 19-07-18 23:17:53, Baoquan He wrote:
> > Kexec has been a formal feature in our distro, and customers owning
> > those kinds of very large machines can make use of this feature to
> > speed up the reboot process. On UEFI machines, kexec_file loading
> > searches for a place to put the kernel under 4G, from top to bottom.
> > As we know, the first 4G of space is the DMA32 zone; DMA, PCI MMCONFIG,
> > BIOS, etc. all try to consume it, so we may not be able to find a
> > usable space there for the kernel/initrd. Searching top-down through
> > the whole memory space, we don't have this worry.
> 
> I do not have the full context here, but let me note that you should be
> careful when doing top-down reservations, because you can easily end up
> in hotpluggable memory and break the hotremove use case. We even warn
> when this is done; see memblock_find_in_range_node().

Kexec reads the kernel/initrd files into a buffer and just searches for
usable positions for them, for the later copying. You can see struct
kexec_segment below: for the old kexec_load, kernel/initrd are read
into a user-space buffer, @buf stores the user-space buffer address,
and @mem stores the position where kernel/initrd will be put. In the
kernel, kimage_load_normal_segment() copies the user-space buffer to
intermediate pages which are allocated with the GFP_KERNEL flag. These
intermediate pages are recorded as entries; later, when the user
executes "kexec -e" to trigger the kexec jump, the final copying is
done from the intermediate pages to the real destination pages @mem
points to. This is because we can't touch existing data in the 1st
kernel while loading the kexec kernel. In my understanding, GFP_KERNEL
makes those intermediate pages be allocated inside the unmovable area,
so it won't impact hotplugging. But the @mem we found by searching the
whole System RAM might be lost along with hotplug, hence we need to
load the kexec kernel again when a hotplug event is detected.

#define KEXEC_CONTROL_MEMORY_GFP (GFP_KERNEL | __GFP_NORETRY)


struct kexec_segment {
        /*
         * This pointer can point to user memory if kexec_load() system
         * call is used or will point to kernel memory if
         * kexec_file_load() system call is used.
         *
         * Use ->buf when expecting to deal with user memory and use ->kbuf
         * when expecting to deal with kernel memory.
         */
        union {
                void __user *buf;
                void *kbuf;
        };
        size_t bufsz;
        unsigned long mem;
        size_t memsz;
};
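
For illustration, a hypothetical way these fields get filled on the old
kexec_load() path; the identifiers user_kernel_image, image_size and
found_hole are invented for this example:

/*
 * Hypothetical example of using the fields above; error handling and
 * the real entry bookkeeping are omitted.
 */
struct kexec_segment seg = {
        .buf   = user_kernel_image,             /* user-space source buffer */
        .bufsz = image_size,                    /* bytes to copy from @buf */
        .mem   = found_hole,                    /* destination found by the search */
        .memsz = ALIGN(image_size, PAGE_SIZE),  /* space reserved at @mem */
};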

Thanks
Baoquan

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v7 4/4] kexec_file: Load kernel at top of system RAM if required
@ 2018-07-25  6:48           ` Baoquan He
  0 siblings, 0 replies; 83+ messages in thread
From: Baoquan He @ 2018-07-25  6:48 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, linux-kernel, robh+dt, dan.j.williams,
	nicolas.pitre, josh, fengguang.wu, bp, andy.shevchenko,
	patrik.r.jakobsson, airlied, kys, haiyangz, sthemmin,
	dmitry.torokhov, frowand.list, keith.busch, jonathan.derrick,
	lorenzo.pieralisi, bhelgaas, tglx, brijesh.singh, jglisse,
	thomas.lendacky, gregkh, baiyaowei, richard.weiyang, devel,
	linux-input, linux-nvdimm, devicetree, linux-pci, ebiederm,
	vgoyal, dyoung, yinghai, monstr, davem, chris, jcmvbkbc, gustavo,
	maarten.lankhorst, seanpaul, linux-parisc, linuxppc-dev, kexec

On 07/23/18 at 04:34pm, Michal Hocko wrote:
> On Thu 19-07-18 23:17:53, Baoquan He wrote:
> > Kexec has been a formal feature in our distro, and customers owning
> > those kind of very large machine can make use of this feature to speed
> > up the reboot process. On uefi machine, the kexec_file loading will
> > search place to put kernel under 4G from top to down. As we know, the
> > 1st 4G space is DMA32 ZONE, dma, pci mmcfg, bios etc all try to consume
> > it. It may have possibility to not be able to find a usable space for
> > kernel/initrd. From the top down of the whole memory space, we don't
> > have this worry. 
> 
> I do not have the full context here but let me note that you should be
> careful when doing top-down reservation because you can easily get into
> hotplugable memory and break the hotremove usecase. We even warn when
> this is done. See memblock_find_in_range_node

Kexec read kernel/initrd file into buffer, just search usable positions
for them to do the later copying. You can see below struct kexec_segment, 
for the old kexec_load, kernel/initrd are read into user space buffer,
the @buf stores the user space buffer address, @mem stores the position
where kernel/initrd will be put. In kernel, it calls
kimage_load_normal_segment() to copy user space buffer to intermediate
pages which are allocated with flag GFP_KERNEL. These intermediate pages
are recorded as entries, later when user execute "kexec -e" to trigger
kexec jumping, it will do the final copying from the intermediate pages
to the real destination pages which @mem pointed. Because we can't touch
the existed data in 1st kernel when do kexec kernel loading. With my
understanding, GFP_KERNEL will make those intermediate pages be
allocated inside immovable area, it won't impact hotplugging. But the
@mem we searched in the whole system RAM might be lost along with
hotplug. Hence we need do kexec kernel again when hotplug event is
detected.

#define KEXEC_CONTROL_MEMORY_GFP (GFP_KERNEL | __GFP_NORETRY)


struct kexec_segment {
        /*
         * This pointer can point to user memory if kexec_load() system
         * call is used or will point to kernel memory if
         * kexec_file_load() system call is used.
         *
         * Use ->buf when expecting to deal with user memory and use ->kbuf
         * when expecting to deal with kernel memory.
         */
        union {
                void __user *buf;
                void *kbuf;
        };
        size_t bufsz;                                                                                                                             
        unsigned long mem;
        size_t memsz;
};

Thanks
Baoquan

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v7 4/4] kexec_file: Load kernel at top of system RAM if required
@ 2018-07-25  6:48           ` Baoquan He
  0 siblings, 0 replies; 83+ messages in thread
From: Baoquan He @ 2018-07-25  6:48 UTC (permalink / raw)
  To: Michal Hocko
  Cc: nicolas.pitre, brijesh.singh, devicetree, airlied, linux-pci,
	richard.weiyang, keith.busch, jcmvbkbc, baiyaowei, kys,
	frowand.list, dan.j.williams, lorenzo.pieralisi, sthemmin,
	linux-nvdimm, patrik.r.jakobsson, andy.shevchenko, linux-input,
	gustavo, bp, dyoung, vgoyal, thomas.lendacky, haiyangz,
	maarten.lankhorst, josh, jglisse, robh+dt, seanpaul, bhelgaas,
	tglx, yinghai, jonathan.derrick, chris, monstr, linux-parisc,
	gregkh, dmitry.torokhov, kexec, linux-kernel, ebiederm, devel,
	Andrew Morton, fengguang.wu, linuxppc-dev, davem

On 07/23/18 at 04:34pm, Michal Hocko wrote:
> On Thu 19-07-18 23:17:53, Baoquan He wrote:
> > Kexec has been a formal feature in our distro, and customers owning
> > those kind of very large machine can make use of this feature to speed
> > up the reboot process. On uefi machine, the kexec_file loading will
> > search place to put kernel under 4G from top to down. As we know, the
> > 1st 4G space is DMA32 ZONE, dma, pci mmcfg, bios etc all try to consume
> > it. It may have possibility to not be able to find a usable space for
> > kernel/initrd. From the top down of the whole memory space, we don't
> > have this worry. 
> 
> I do not have the full context here but let me note that you should be
> careful when doing top-down reservation because you can easily get into
> hotplugable memory and break the hotremove usecase. We even warn when
> this is done. See memblock_find_in_range_node

Kexec read kernel/initrd file into buffer, just search usable positions
for them to do the later copying. You can see below struct kexec_segment, 
for the old kexec_load, kernel/initrd are read into user space buffer,
the @buf stores the user space buffer address, @mem stores the position
where kernel/initrd will be put. In kernel, it calls
kimage_load_normal_segment() to copy user space buffer to intermediate
pages which are allocated with flag GFP_KERNEL. These intermediate pages
are recorded as entries, later when user execute "kexec -e" to trigger
kexec jumping, it will do the final copying from the intermediate pages
to the real destination pages which @mem pointed. Because we can't touch
the existed data in 1st kernel when do kexec kernel loading. With my
understanding, GFP_KERNEL will make those intermediate pages be
allocated inside immovable area, it won't impact hotplugging. But the
@mem we searched in the whole system RAM might be lost along with
hotplug. Hence we need do kexec kernel again when hotplug event is
detected.

#define KEXEC_CONTROL_MEMORY_GFP (GFP_KERNEL | __GFP_NORETRY)


struct kexec_segment {
        /*
         * This pointer can point to user memory if kexec_load() system
         * call is used or will point to kernel memory if
         * kexec_file_load() system call is used.
         *
         * Use ->buf when expecting to deal with user memory and use ->kbuf
         * when expecting to deal with kernel memory.
         */
        union {
                void __user *buf;
                void *kbuf;
        };
        size_t bufsz;                                                                                                                             
        unsigned long mem;
        size_t memsz;
};

Thanks
Baoquan

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v7 4/4] kexec_file: Load kernel at top of system RAM if required
  2018-07-25  6:48           ` Baoquan He
@ 2018-07-26 12:59             ` Michal Hocko
  0 siblings, 0 replies; 83+ messages in thread
From: Michal Hocko @ 2018-07-26 12:59 UTC (permalink / raw)
  To: Baoquan He
  Cc: nicolas.pitre, brijesh.singh, devicetree, airlied, linux-pci,
	richard.weiyang, jcmvbkbc, baiyaowei, kys, frowand.list,
	lorenzo.pieralisi, sthemmin, linux-nvdimm, patrik.r.jakobsson,
	andy.shevchenko, linux-input, gustavo, bp, dyoung,
	thomas.lendacky, haiyangz, maarten.lankhorst, josh, jglisse,
	robh+dt, seanpaul, bhelgaas, tglx, yinghai, jonathan.derrick,
	chris, monstr, linux-parisc, gregkh, dmitry.torokhov, kexec,
	linux-kernel, ebiederm, devel, Andrew Morton, fengguang.wu,
	linuxppc-dev, davem

On Wed 25-07-18 14:48:13, Baoquan He wrote:
> On 07/23/18 at 04:34pm, Michal Hocko wrote:
> > On Thu 19-07-18 23:17:53, Baoquan He wrote:
> > > Kexec has been a formal feature in our distro, and customers who own
> > > those kinds of very large machines can use this feature to speed up
> > > the reboot process. On UEFI machines, kexec_file loading searches for
> > > a place to put the kernel under 4G, from top to bottom. As we know,
> > > the first 4G of space is the DMA32 zone, and DMA, PCI mmconfig, BIOS,
> > > etc. all try to consume it. So it may happen that no usable space can
> > > be found there for the kernel/initrd. Searching top-down across the
> > > whole memory space, we don't have this worry.
> > 
> > I do not have the full context here, but let me note that you should
> > be careful when doing top-down reservations, because you can easily
> > end up in hotpluggable memory and break the hotremove use case. We
> > even warn when this is done; see memblock_find_in_range_node().
> 
> Kexec reads the kernel/initrd files into a buffer and only searches for
> usable positions where they will be copied to later. As you can see in
> struct kexec_segment below, with the old kexec_load the kernel/initrd
> are read into a user space buffer: @buf stores the user space buffer
> address, and @mem stores the position where the kernel/initrd will
> eventually be put. In the kernel, kimage_load_normal_segment() copies
> the user space buffer into intermediate pages allocated with
> GFP_KERNEL, because we must not touch existing data in the 1st kernel
> while loading the kexec kernel. These intermediate pages are recorded
> as entries; later, when the user executes "kexec -e" to trigger the
> kexec jump, the final copy is done from the intermediate pages to the
> real destination pages that @mem points to. As I understand it,
> GFP_KERNEL makes those intermediate pages come from the unmovable
> area, so they won't impact hotplugging. But the @mem we searched for
> across the whole of system RAM might be lost on hotplug, hence we need
> to reload the kexec kernel whenever a hotplug event is detected.

I am not sure I am following. If @mem is placed in a movable node, then
memory hotremove simply won't work, because we will be seeing reserved
pages and will not know what to do about them. They are not migratable.
Allocating the intermediate pages from other nodes doesn't really help.

The memblock code warns exactly for that reason.
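
For reference, the relevant logic in memblock_find_in_range_node()
looks roughly like this (abbreviated and paraphrased from mm/memblock.c
of that era; consult the source for the exact code):

	if (memblock_bottom_up() && end > kernel_end) {
		/* with movable_node, first try bottom-up, staying
		 * above the kernel image */
		phys_addr_t bottom_up_start = max(start, kernel_end);

		ret = __memblock_find_range_bottom_up(bottom_up_start,
					end, size, align, nid, flags);
		if (ret)
			return ret;

		/* the top-down fallback can land in hotpluggable
		 * memory, hence the warning */
		WARN_ONCE(1, "memblock: bottom-up allocation failed, memory hotremove may be affected\n");
	}

	return __memblock_find_range_top_down(start, end, size, align,
					      nid, flags);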
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v7 4/4] kexec_file: Load kernel at top of system RAM if required
@ 2018-07-26 13:09               ` Baoquan He
  0 siblings, 0 replies; 83+ messages in thread
From: Baoquan He @ 2018-07-26 13:09 UTC (permalink / raw)
  To: Michal Hocko
  Cc: nicolas.pitre, brijesh.singh, devicetree, airlied, linux-pci,
	richard.weiyang, jcmvbkbc, baiyaowei, kys, frowand.list,
	lorenzo.pieralisi, sthemmin, linux-nvdimm, patrik.r.jakobsson,
	andy.shevchenko, linux-input, gustavo, bp, dyoung,
	thomas.lendacky, haiyangz, maarten.lankhorst, josh, jglisse,
	robh+dt, seanpaul, bhelgaas, tglx, yinghai, jonathan.derrick,
	chris, monstr, linux-parisc, gregkh, dmitry.torokhov, kexec,
	linux-kernel, ebiederm, devel, Andrew Morton, fengguang.wu,
	linuxppc-dev, davem

On 07/26/18 at 02:59pm, Michal Hocko wrote:
> On Wed 25-07-18 14:48:13, Baoquan He wrote:
> > On 07/23/18 at 04:34pm, Michal Hocko wrote:
> > > On Thu 19-07-18 23:17:53, Baoquan He wrote:
> > > > Kexec has been a formal feature in our distro, and customers who own
> > > > those kinds of very large machines can use this feature to speed up
> > > > the reboot process. On UEFI machines, kexec_file loading searches for
> > > > a place to put the kernel under 4G, from top to bottom. As we know,
> > > > the first 4G of space is the DMA32 zone, and DMA, PCI mmconfig, BIOS,
> > > > etc. all try to consume it. So it may happen that no usable space can
> > > > be found there for the kernel/initrd. Searching top-down across the
> > > > whole memory space, we don't have this worry.
> > > 
> > > I do not have the full context here, but let me note that you should
> > > be careful when doing top-down reservations, because you can easily
> > > end up in hotpluggable memory and break the hotremove use case. We
> > > even warn when this is done; see memblock_find_in_range_node().
> > 
> > Kexec reads the kernel/initrd files into a buffer and only searches for
> > usable positions where they will be copied to later. As you can see in
> > struct kexec_segment below, with the old kexec_load the kernel/initrd
> > are read into a user space buffer: @buf stores the user space buffer
> > address, and @mem stores the position where the kernel/initrd will
> > eventually be put. In the kernel, kimage_load_normal_segment() copies
> > the user space buffer into intermediate pages allocated with
> > GFP_KERNEL, because we must not touch existing data in the 1st kernel
> > while loading the kexec kernel. These intermediate pages are recorded
> > as entries; later, when the user executes "kexec -e" to trigger the
> > kexec jump, the final copy is done from the intermediate pages to the
> > real destination pages that @mem points to. As I understand it,
> > GFP_KERNEL makes those intermediate pages come from the unmovable
> > area, so they won't impact hotplugging. But the @mem we searched for
> > across the whole of system RAM might be lost on hotplug, hence we need
> > to reload the kexec kernel whenever a hotplug event is detected.
> 
> I am not sure I am following. If @mem is placed in a movable node, then
> memory hotremove simply won't work, because we will be seeing reserved
> pages and will not know what to do about them. They are not migratable.
> Allocating the intermediate pages from other nodes doesn't really help.

OK, I forgot about the 2nd kernel that kexec jumps into. It won't
impact hotremove in the 1st kernel, but it does impact the kernel kexec
jumps into, if the kernel is put at the top of system RAM and that top
RAM is in a movable node.

> 
> The memblock code warns exactly for that reason.
> -- 
> Michal Hocko
> SUSE Labs

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v7 4/4] kexec_file: Load kernel at top of system RAM if required
  2018-07-26 13:09               ` Baoquan He
@ 2018-07-26 13:12                 ` Michal Hocko
  0 siblings, 0 replies; 83+ messages in thread
From: Michal Hocko @ 2018-07-26 13:12 UTC (permalink / raw)
  To: Baoquan He
  Cc: nicolas.pitre, brijesh.singh, devicetree, airlied, linux-pci,
	richard.weiyang, jcmvbkbc, baiyaowei, kys, frowand.list,
	lorenzo.pieralisi, sthemmin, linux-nvdimm, patrik.r.jakobsson,
	andy.shevchenko, linux-input, gustavo, bp, dyoung,
	thomas.lendacky, haiyangz, maarten.lankhorst, josh, jglisse,
	robh+dt, seanpaul, bhelgaas, tglx, yinghai, jonathan.derrick,
	chris, monstr, linux-parisc, gregkh, dmitry.torokhov, kexec,
	linux-kernel, ebiederm, devel, Andrew Morton, fengguang.wu,
	linuxppc-dev, davem

On Thu 26-07-18 21:09:04, Baoquan He wrote:
> On 07/26/18 at 02:59pm, Michal Hocko wrote:
> > On Wed 25-07-18 14:48:13, Baoquan He wrote:
> > > On 07/23/18 at 04:34pm, Michal Hocko wrote:
> > > > On Thu 19-07-18 23:17:53, Baoquan He wrote:
> > > > > Kexec has been a formal feature in our distro, and customers who own
> > > > > those kinds of very large machines can use this feature to speed up
> > > > > the reboot process. On UEFI machines, kexec_file loading searches for
> > > > > a place to put the kernel under 4G, from top to bottom. As we know,
> > > > > the first 4G of space is the DMA32 zone, and DMA, PCI mmconfig, BIOS,
> > > > > etc. all try to consume it. So it may happen that no usable space can
> > > > > be found there for the kernel/initrd. Searching top-down across the
> > > > > whole memory space, we don't have this worry.
> > > > 
> > > > I do not have the full context here, but let me note that you should
> > > > be careful when doing top-down reservations, because you can easily
> > > > end up in hotpluggable memory and break the hotremove use case. We
> > > > even warn when this is done; see memblock_find_in_range_node().
> > > 
> > > Kexec reads the kernel/initrd files into a buffer and only searches for
> > > usable positions where they will be copied to later. As you can see in
> > > struct kexec_segment below, with the old kexec_load the kernel/initrd
> > > are read into a user space buffer: @buf stores the user space buffer
> > > address, and @mem stores the position where the kernel/initrd will
> > > eventually be put. In the kernel, kimage_load_normal_segment() copies
> > > the user space buffer into intermediate pages allocated with
> > > GFP_KERNEL, because we must not touch existing data in the 1st kernel
> > > while loading the kexec kernel. These intermediate pages are recorded
> > > as entries; later, when the user executes "kexec -e" to trigger the
> > > kexec jump, the final copy is done from the intermediate pages to the
> > > real destination pages that @mem points to. As I understand it,
> > > GFP_KERNEL makes those intermediate pages come from the unmovable
> > > area, so they won't impact hotplugging. But the @mem we searched for
> > > across the whole of system RAM might be lost on hotplug, hence we need
> > > to reload the kexec kernel whenever a hotplug event is detected.
> > 
> > I am not sure I am following. If @mem is placed in a movable node, then
> > memory hotremove simply won't work, because we will be seeing reserved
> > pages and will not know what to do about them. They are not migratable.
> > Allocating the intermediate pages from other nodes doesn't really help.
> 
> OK, I forgot about the 2nd kernel that kexec jumps into. It won't
> impact hotremove in the 1st kernel, but it does impact the kernel kexec
> jumps into, if the kernel is put at the top of system RAM and that top
> RAM is in a movable node.

It will affect the 1st kernel (which does the memblock allocation
top-down) as well, for the reasons mentioned above.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v7 4/4] kexec_file: Load kernel at top of system RAM if required
@ 2018-07-26 13:14                   ` Michal Hocko
  0 siblings, 0 replies; 83+ messages in thread
From: Michal Hocko @ 2018-07-26 13:14 UTC (permalink / raw)
  To: Baoquan He
  Cc: nicolas.pitre, brijesh.singh, devicetree, airlied, linux-pci,
	richard.weiyang, jcmvbkbc, baiyaowei, kys, frowand.list,
	lorenzo.pieralisi, sthemmin, linux-nvdimm, patrik.r.jakobsson,
	andy.shevchenko, linux-input, gustavo, bp, dyoung,
	thomas.lendacky, haiyangz, maarten.lankhorst, josh, jglisse,
	robh+dt, seanpaul, bhelgaas, tglx, yinghai, jonathan.derrick,
	chris, monstr, linux-parisc, gregkh, dmitry.torokhov, kexec,
	linux-kernel, ebiederm, devel, Andrew Morton, fengguang.wu,
	linuxppc-dev, davem

On Thu 26-07-18 15:12:42, Michal Hocko wrote:
> On Thu 26-07-18 21:09:04, Baoquan He wrote:
> > On 07/26/18 at 02:59pm, Michal Hocko wrote:
> > > On Wed 25-07-18 14:48:13, Baoquan He wrote:
> > > > On 07/23/18 at 04:34pm, Michal Hocko wrote:
> > > > > On Thu 19-07-18 23:17:53, Baoquan He wrote:
> > > > > > Kexec has been a formal feature in our distro, and customers who own
> > > > > > those kinds of very large machines can use this feature to speed up
> > > > > > the reboot process. On UEFI machines, kexec_file loading searches for
> > > > > > a place to put the kernel under 4G, from top to bottom. As we know,
> > > > > > the first 4G of space is the DMA32 zone, and DMA, PCI mmconfig, BIOS,
> > > > > > etc. all try to consume it. So it may happen that no usable space can
> > > > > > be found there for the kernel/initrd. Searching top-down across the
> > > > > > whole memory space, we don't have this worry.
> > > > > 
> > > > > I do not have the full context here, but let me note that you should
> > > > > be careful when doing top-down reservation, because you can easily
> > > > > get into hotpluggable memory and break the hotremove usecase. We even
> > > > > warn when this is done; see memblock_find_in_range_node().
> > > > 
> > > > Kexec reads the kernel/initrd files into a buffer and just searches
> > > > for usable positions to do the later copying. You can see struct
> > > > kexec_segment below: with the old kexec_load, kernel/initrd are read
> > > > into a user space buffer; @buf stores the user space buffer address,
> > > > and @mem stores the position where kernel/initrd will be put. In the
> > > > kernel, kimage_load_normal_segment() copies the user space buffer
> > > > into intermediate pages allocated with GFP_KERNEL. These intermediate
> > > > pages are recorded as entries; later, when the user executes
> > > > "kexec -e" to trigger the kexec jump, the final copying is done from
> > > > the intermediate pages to the real destination pages that @mem points
> > > > to. This is because we can't touch existing data in the 1st kernel
> > > > while loading the kexec kernel. To my understanding, GFP_KERNEL makes
> > > > those intermediate pages be allocated in the immovable area, so they
> > > > won't impact hotplugging. But the @mem we searched for across the
> > > > whole of system RAM might be lost along with a hotplug. Hence we need
> > > > to load the kexec kernel again when a hotplug event is detected.
> > > 
> > > I am not sure I am following. If @mem is placed in a movable node then
> > > memory hotremove simply won't work, because we see reserved pages and
> > > do not know what to do about them. They are not migratable. Allocating
> > > intermediate pages from other nodes doesn't really help.
> > 
> > OK, I forgot the 2nd kernel which kexec jumps into. It won't impact
> > hotremove in the 1st kernel, but it does impact the kernel kexec jumps
> > into, if that kernel is at the top of system RAM and the top RAM is in
> > a movable node.
> 
> It will affect the 1st kernel (which does the memblock allocation
> top-down) as well. For reasons mentioned above.

And btw., in the ideal world we would restrict top-down memblock
allocation to the non-movable nodes. But I do not think we have that
information ready at the time the reservation is done.
-- 
Michal Hocko
SUSE Labs
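
For context on the warning mentioned above: the relevant logic lives in
memblock_find_in_range_node() in mm/memblock.c. An abridged, paraphrased
sketch of how it looked around v4.18 (not the verbatim source; the exact
warning text may differ in your tree):

static phys_addr_t __init_memblock
memblock_find_in_range_node(phys_addr_t size, phys_addr_t align,
			    phys_addr_t start, phys_addr_t end,
			    int nid, ulong flags)
{
	phys_addr_t kernel_end = __pa_symbol(_end);

	/* Bottom-up is tried only when that mode was set (movable_node)
	 * and the range reaches above the kernel image. */
	if (memblock_bottom_up() && end > kernel_end) {
		phys_addr_t bottom_up_start = max(start, kernel_end);
		phys_addr_t ret;

		ret = __memblock_find_range_bottom_up(bottom_up_start, end,
						      size, align, nid, flags);
		if (ret)
			return ret;

		/* The warning referred to above: falling back to
		 * top-down may land the reservation in hotpluggable
		 * memory. */
		WARN_ONCE(1, "memblock: bottom-up allocation failed, memory hotunplug may be affected\n");
	}

	/* Default policy: top-down. */
	return __memblock_find_range_top_down(start, end, size, align,
					      nid, flags);
}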

* Re: [PATCH v7 4/4] kexec_file: Load kernel at top of system RAM if required
@ 2018-07-26 13:37                     ` Baoquan He
  0 siblings, 0 replies; 83+ messages in thread
From: Baoquan He @ 2018-07-26 13:37 UTC (permalink / raw)
  To: Michal Hocko
  Cc: nicolas.pitre, brijesh.singh, devicetree, airlied, linux-pci,
	richard.weiyang, jcmvbkbc, baiyaowei, kys, frowand.list,
	lorenzo.pieralisi, sthemmin, linux-nvdimm, patrik.r.jakobsson,
	andy.shevchenko, linux-input, gustavo, bp, dyoung,
	thomas.lendacky, haiyangz, maarten.lankhorst, josh, jglisse,
	robh+dt, seanpaul, bhelgaas, tglx, yinghai, jonathan.derrick,
	chris, monstr, linux-parisc, gregkh, dmitry.torokhov, kexec,
	linux-kernel, ebiederm, devel, Andrew Morton, fengguang.wu,
	linuxppc-dev, davem

On 07/26/18 at 03:14pm, Michal Hocko wrote:
> [...]
> > It will affect the 1st kernel (which does the memblock allocation
> > top-down) as well. For reasons mentioned above.
> 
> And btw., in the ideal world we would restrict top-down memblock
> allocation to the non-movable nodes. But I do not think we have that
> information ready at the time the reservation is done.

Oh, you might be mixing up kexec loading with kdump kernel loading. For
the kdump kernel, we need to reserve a memory region during bootup with
the memblock allocator. For kexec loading, we just operate after the
system is up and do not need to reserve any memory region. The memory
used to load them is handled in quite different ways.

Thanks
Baoquan
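
The split described above is visible in kernel/kexec_file.c: the crash
(kdump) case searches only inside the crashkernel region that was
reserved from memblock at boot, while plain kexec_file_load() searches
all of System RAM at load time. Roughly, based on the code of that era
(an abridged sketch, not verbatim):

int __weak arch_kexec_walk_mem(struct kexec_buf *kbuf,
			       int (*func)(struct resource *, void *))
{
	if (kbuf->image->type == KEXEC_TYPE_CRASH)
		/* kdump: stay inside the crashkernel= region reserved
		 * via memblock during early boot */
		return walk_iomem_res_desc(crashk_res.desc,
					   IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY,
					   crashk_res.start, crashk_res.end,
					   kbuf, func);
	else
		/* plain kexec: walk all of System RAM after boot; this
		 * is the walk that patch 4/4 makes top-down via
		 * walk_system_ram_res_rev() */
		return walk_system_ram_res(0, ULONG_MAX, kbuf, func);
}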

* Re: [PATCH v7 4/4] kexec_file: Load kernel at top of system RAM if required
@ 2018-07-26 14:01                       ` Michal Hocko
  0 siblings, 0 replies; 83+ messages in thread
From: Michal Hocko @ 2018-07-26 14:01 UTC (permalink / raw)
  To: Baoquan He
  Cc: nicolas.pitre, brijesh.singh, devicetree, airlied, linux-pci,
	richard.weiyang, jcmvbkbc, baiyaowei, kys, frowand.list,
	lorenzo.pieralisi, sthemmin, linux-nvdimm, patrik.r.jakobsson,
	andy.shevchenko, linux-input, gustavo, bp, dyoung,
	thomas.lendacky, haiyangz, maarten.lankhorst, josh, jglisse,
	robh+dt, seanpaul, bhelgaas, tglx, yinghai, jonathan.derrick,
	chris, monstr, linux-parisc, gregkh, dmitry.torokhov, kexec,
	linux-kernel, ebiederm, devel, Andrew Morton, fengguang.wu,
	linuxppc-dev, davem

On Thu 26-07-18 21:37:05, Baoquan He wrote:
> [...]
> > And btw., in the ideal world we would restrict top-down memblock
> > allocation to the non-movable nodes. But I do not think we have that
> > information ready at the time the reservation is done.
> 
> Oh, you might be mixing up kexec loading with kdump kernel loading. For
> the kdump kernel, we need to reserve a memory region during bootup with
> the memblock allocator. For kexec loading, we just operate after the
> system is up and do not need to reserve any memory region. The memory
> used to load them is handled in quite different ways.

I didn't know about that. I thought both use the same underlying
reservation mechanism. My bad and sorry for the noise.
-- 
Michal Hocko
SUSE Labs
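
Mechanically, "load at the top of System RAM" in kexec_file comes down
to the struct kexec_buf hole search. A hypothetical caller-side sketch
(field names are from include/linux/kexec.h; the helper and the values
here are illustrative only, not taken from the patch):

static int load_kernel_segment(struct kimage *image, void *kernel_buf,
			       unsigned long kernel_len)
{
	struct kexec_buf kbuf = { .image = image, .buf_min = 0,
				  .buf_max = ULONG_MAX, .top_down = true };

	kbuf.buffer = kernel_buf;
	kbuf.bufsz = kernel_len;
	kbuf.memsz = ALIGN(kernel_len, PAGE_SIZE);
	kbuf.buf_align = PAGE_SIZE;

	/* Walks System RAM for a hole; with top_down set (and the
	 * reversed walk from this series) the highest suitable address
	 * wins and is stored in kbuf.mem. */
	return kexec_add_buffer(&kbuf);
}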

* Re: [PATCH v7 4/4] kexec_file: Load kernel at top of system RAM if required
@ 2018-07-26 15:10                         ` Baoquan He
  0 siblings, 0 replies; 83+ messages in thread
From: Baoquan He @ 2018-07-26 15:10 UTC (permalink / raw)
  To: Michal Hocko
  Cc: nicolas.pitre, brijesh.singh, devicetree, airlied, linux-pci,
	richard.weiyang, jcmvbkbc, baiyaowei, kys, frowand.list,
	lorenzo.pieralisi, sthemmin, linux-nvdimm, patrik.r.jakobsson,
	andy.shevchenko, linux-input, gustavo, bp, dyoung,
	thomas.lendacky, haiyangz, maarten.lankhorst, josh, jglisse,
	robh+dt, seanpaul, bhelgaas, tglx, yinghai, jonathan.derrick,
	chris, monstr, linux-parisc, gregkh, dmitry.torokhov, kexec,
	linux-kernel, ebiederm, devel, Andrew Morton, fengguang.wu,
	linuxppc-dev, davem

On 07/26/18 at 04:01pm, Michal Hocko wrote:
> On Thu 26-07-18 21:37:05, Baoquan He wrote:
> > On 07/26/18 at 03:14pm, Michal Hocko wrote:
> > > On Thu 26-07-18 15:12:42, Michal Hocko wrote:
> > > > On Thu 26-07-18 21:09:04, Baoquan He wrote:
> > > > > On 07/26/18 at 02:59pm, Michal Hocko wrote:
> > > > > > On Wed 25-07-18 14:48:13, Baoquan He wrote:
> > > > > > > On 07/23/18 at 04:34pm, Michal Hocko wrote:
> > > > > > > > On Thu 19-07-18 23:17:53, Baoquan He wrote:
> > > > > > > > > Kexec has been a formal feature in our distro, and customers who own
> > > > > > > > > those kinds of very large machines can use it to speed up the reboot
> > > > > > > > > process. On UEFI machines, kexec_file loading searches for a place to
> > > > > > > > > put the kernel under 4G, from top to bottom. As we know, the first 4G
> > > > > > > > > of space is the DMA32 zone, and DMA, PCI mmcfg, BIOS, etc. all try to
> > > > > > > > > consume it. So it is possible that no usable space is found there for
> > > > > > > > > the kernel/initrd. Searching top-down across the whole memory space,
> > > > > > > > > we don't have this worry.
> > > > > > > > 
> > > > > > > > I do not have the full context here, but let me note that you should
> > > > > > > > be careful when doing top-down reservation, because you can easily get
> > > > > > > > into hotpluggable memory and break the hotremove use case. We even warn
> > > > > > > > when this is done; see memblock_find_in_range_node().
> > > > > > > 
> > > > > > > Kexec reads the kernel/initrd files into a buffer and just searches
> > > > > > > for usable positions for them for the later copying. You can see
> > > > > > > struct kexec_segment below: with the old kexec_load, the
> > > > > > > kernel/initrd are read into a user space buffer; @buf stores the
> > > > > > > user space buffer address, and @mem stores the position where the
> > > > > > > kernel/initrd will be put. In the kernel, kimage_load_normal_segment()
> > > > > > > is called to copy the user space buffer into intermediate pages
> > > > > > > allocated with the GFP_KERNEL flag. These intermediate pages are
> > > > > > > recorded as entries; later, when the user executes "kexec -e" to
> > > > > > > trigger the kexec jump, the final copy is done from the intermediate
> > > > > > > pages to the real destination pages that @mem points to, because we
> > > > > > > can't touch the existing data in the 1st kernel while doing the
> > > > > > > kexec kernel loading. In my understanding, GFP_KERNEL causes those
> > > > > > > intermediate pages to be allocated in the unmovable area, so it
> > > > > > > won't impact hotplugging. But the @mem we searched for across the
> > > > > > > whole of system RAM might be lost along with a hotplug event. Hence
> > > > > > > we need to redo the kexec kernel loading when a hotplug event is
> > > > > > > detected.
> > > > > > 
> > > > > > I am not sure I am following. If @mem is placed on a movable node, then
> > > > > > memory hotremove simply won't work, because we will be seeing reserved
> > > > > > pages and do not know what to do about them. They are not migratable.
> > > > > > Allocating the intermediate pages from other nodes doesn't really help.
> > > > > 
> > > > > OK, I forgot the 2nd kernel which kexec jumps into. It won't impact
> > > > > hotremove in the 1st kernel, but it does impact the kernel which kexec
> > > > > jumps into if that kernel is at the top of system RAM and the top RAM
> > > > > is in a movable node.
> > > > 
> > > > It will affect the 1st kernel (which does memblock allocation
> > > > top-down) as well, for the reasons mentioned above.
> > > 
> > > And btw., in an ideal world we would restrict top-down memblock
> > > allocation to the non-movable nodes. But I do not think we have that
> > > information ready at the time the reservation is done.
> > 
> > Oh, you may be mixing kexec loading up with kdump kernel loading. For the
> > kdump kernel, we need to reserve a memory region during bootup with the
> > memblock allocator. For kexec loading, we just operate after the system
> > is up and do not need to reserve any memory region. The memory used to
> > load them is handled in quite different ways.
> 
> I didn't know about that. I thought both use the same underlying
> reservation mechanism. My bad and sorry for the noise.

Not at all. It's truly confusing. I often need to take time to recall
those details.

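To make the two-phase copy described above concrete, here is a
self-contained userspace sketch. Only the buf/bufsz/mem/memsz field
layout is modeled on struct kexec_segment; FAKE_PAGE_SIZE, the helper
names and the data are invented for illustration, and the real
kimage_load_normal_segment() records physical pages in kimage entry
lists rather than copying into one calloc() block.

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define FAKE_PAGE_SIZE 4096	/* illustration only */

/* Field layout modeled on kexec_segment; not the kernel definition. */
struct kexec_segment {
	const void *buf;	/* user space buffer holding kernel/initrd */
	size_t bufsz;
	uint64_t mem;		/* final destination (here: offset into "RAM") */
	size_t memsz;
};

/*
 * Phase 1, at load time ("kexec -l"): copy the user buffer into freshly
 * allocated intermediate pages. The kernel allocates these with
 * GFP_KERNEL, so they land in unmovable zones; @mem is only recorded.
 */
static void *load_segment(const struct kexec_segment *seg)
{
	void *pages = calloc(1, seg->memsz);

	if (pages)
		memcpy(pages, seg->buf, seg->bufsz);
	return pages;
}

/*
 * Phase 2, at jump time ("kexec -e"): only now copy to the real
 * destination @mem, when the 1st kernel's data no longer matters.
 */
static void jump_copy(uint8_t *ram, const struct kexec_segment *seg,
		      const void *pages)
{
	memcpy(ram + seg->mem, pages, seg->memsz);
}

int main(void)
{
	static uint8_t ram[8 * FAKE_PAGE_SIZE];	/* toy "system RAM" */
	const char image[] = "vmlinuz bits";
	struct kexec_segment seg = {
		.buf = image, .bufsz = sizeof(image),
		.mem = 4 * FAKE_PAGE_SIZE, .memsz = FAKE_PAGE_SIZE,
	};
	void *pages = load_segment(&seg);	/* kexec -l */

	if (!pages)
		return 1;
	jump_copy(ram, &seg, pages);		/* kexec -e */
	printf("segment at 0x%llx: %s\n",
	       (unsigned long long)seg.mem, (char *)(ram + seg.mem));
	free(pages);
	return 0;
}

Note that only @mem, the result of the placement search, names a fixed
physical location; the intermediate pages come from the regular
allocator. That is why the hotplug worry attaches to @mem and to the
kernel being jumped into, not to the loading step itself, and why kdump
is different again: its target region is carved out once at boot with
the memblock allocator.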

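The top-down placement discussed throughout the thread builds on the
reverse walk that patch 3/4 adds as walk_system_ram_res_rev(). The
following is a compact userspace model of the idea only: list_head,
container_of and walk_ram_res_rev() below are hand-rolled stand-ins,
not the kernel implementations. It iterates the sibling list from the
tail so the highest System RAM range is visited first.

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Userspace stand-ins for the kernel's list_head and container_of. */
struct list_head { struct list_head *prev, *next; };

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct resource {
	uint64_t start, end;		/* inclusive range, as in ioport.h */
	const char *name;
	struct list_head sibling;	/* entry in the parent's child list */
};

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/* Visit siblings from the tail backwards: highest range first. */
static void walk_ram_res_rev(struct list_head *children)
{
	for (struct list_head *p = children->prev; p != children; p = p->prev) {
		struct resource *res =
			container_of(p, struct resource, sibling);

		printf("[0x%09llx-0x%09llx] %s\n",
		       (unsigned long long)res->start,
		       (unsigned long long)res->end, res->name);
	}
}

int main(void)
{
	struct list_head children = { &children, &children };
	struct resource low  = { 0x000100000ULL, 0x0ffffffffULL, "System RAM" };
	struct resource high = { 0x100000000ULL, 0x1ffffffffULL, "System RAM" };

	list_add_tail(&low.sibling, &children);	 /* kept sorted by address */
	list_add_tail(&high.sibling, &children);
	walk_ram_res_rev(&children);		 /* prints the high range first */
	return 0;
}

Reverse iteration like this is awkward on a singly linked sibling
chain, which is part of what the conversion to list_head in this series
buys.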

Thread overview: 83+ messages

2018-07-18  2:49 [PATCH v7 0/4] resource: Use list_head to link sibling resource Baoquan He
2018-07-18  2:49 ` [PATCH v7 1/4] resource: Move reparent_resources() to kernel/resource.c and make it public Baoquan He
2018-07-18 16:36   ` Andy Shevchenko
2018-07-18 16:37     ` Andy Shevchenko
2018-07-19 15:18       ` Baoquan He
2018-07-18  2:49 ` [PATCH v7 2/4] resource: Use list_head to link sibling resource Baoquan He
2018-07-18  2:49 ` [PATCH v7 3/4] resource: add walk_system_ram_res_rev() Baoquan He
2018-07-18  2:49 ` [PATCH v7 4/4] kexec_file: Load kernel at top of system RAM if required Baoquan He
2018-07-18 22:33   ` Andrew Morton
2018-07-19 15:17     ` Baoquan He
2018-07-19 19:44       ` Andrew Morton
2018-07-25  2:21         ` Baoquan He
2018-07-23 14:34       ` Michal Hocko
2018-07-25  6:48         ` Baoquan He
2018-07-26 12:59           ` Michal Hocko
2018-07-26 13:09             ` Baoquan He
2018-07-26 13:12               ` Michal Hocko
2018-07-26 13:14                 ` Michal Hocko
2018-07-26 13:37                   ` Baoquan He
2018-07-26 14:01                     ` Michal Hocko
2018-07-26 15:10                       ` Baoquan He
