* [PATCH v3 0/2] x86/boot: Some fixes @ 2018-01-10 13:05 Daniel Kiper 2018-01-10 13:05 ` [PATCH v3 1/2] x86/setup: do not relocate Xen over current Xen image placement Daniel Kiper 2018-01-10 13:05 ` [PATCH v3 2/2] x86/setup: remap Xen image up to PFN_DOWN(__pa(_end)) Daniel Kiper 0 siblings, 2 replies; 8+ messages in thread From: Daniel Kiper @ 2018-01-10 13:05 UTC (permalink / raw) To: xen-devel; +Cc: andrew.cooper3, jbeulich Hi, As in subject... Daniel xen/arch/x86/setup.c | 18 ++++++++++++++---- 1 file changed, 14 insertions(+), 4 deletions(-) Daniel Kiper (2): x86/setup: do not relocate Xen over current Xen image placement x86/setup: remap Xen image up to PFN_DOWN(__pa(_end)) _______________________________________________ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel ^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH v3 1/2] x86/setup: do not relocate Xen over current Xen image placement 2018-01-10 13:05 [PATCH v3 0/2] x86/boot: Some fixes Daniel Kiper @ 2018-01-10 13:05 ` Daniel Kiper 2018-01-19 15:13 ` Jan Beulich 2018-01-10 13:05 ` [PATCH v3 2/2] x86/setup: remap Xen image up to PFN_DOWN(__pa(_end)) Daniel Kiper 1 sibling, 1 reply; 8+ messages in thread From: Daniel Kiper @ 2018-01-10 13:05 UTC (permalink / raw) To: xen-devel; +Cc: andrew.cooper3, jbeulich Otherwise, due to Xen code/data changes under CPU feet, Xen may crash silently at boot. We were hit by the issue in OVS Xen 4.4 with my earlier version of EFI/Multiboot2 patches. Initially its implementation allowed relocation of Xen even if it was relocated by the bootloader. This led to the crashes on some new Oracle machines because copy destination partially overlapped with the end of current/initial Xen image placement. After some discussion on Xen-devel we decided to disable Xen relocation in my EFI/Multiboot2 upstream patches if the booloader did the work for us. Though one case is still not covered. If Xen is not relocated by the booloader then it tries to do that by itself. If all RAM regions above currently occupied one are unsuitable for relocation then Xen tries to move itself higher in it. And if (end - reloc_size + XEN_IMG_OFFSET) goes below __pa(_end) then copy/relocation destination overlaps, at least partially, with its source. I can agree that this should not happen on todays machines very often. If at all. It is rather unusual to not have usable RAM regions above ~5 MiB nowadays. Though I think that we should at least consider putting such safety measure here. Otherwise Xen may crash mysteriously without any stack trace. It is very confusing and impairs further debugging. Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com> --- v3 - suggestions/fixes: - use more readable condition (suggested by Jan Beulich), - improve comment (suggested by Jan Beulich), - improve commit message. v2 - suggestions/fixes: - improve comment (suggested by Jan Beulich), - improve commit message (suggested by Jan Beulich). --- xen/arch/x86/setup.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c index 2e10c6b..9f8abd9 100644 --- a/xen/arch/x86/setup.c +++ b/xen/arch/x86/setup.c @@ -962,7 +962,12 @@ void __init noreturn __start_xen(unsigned long mbi_p) } else end = 0; - if ( end > s ) + + /* + * Is the region size greater than zero and does it begin + * at or above the end of current Xen image placement? + */ + if ( (end > s) && (end - reloc_size + XEN_IMG_OFFSET >= __pa(_end)) ) { l4_pgentry_t *pl4e; l3_pgentry_t *pl3e; -- 1.7.10.4 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH v3 1/2] x86/setup: do not relocate Xen over current Xen image placement 2018-01-10 13:05 ` [PATCH v3 1/2] x86/setup: do not relocate Xen over current Xen image placement Daniel Kiper @ 2018-01-19 15:13 ` Jan Beulich 0 siblings, 0 replies; 8+ messages in thread From: Jan Beulich @ 2018-01-19 15:13 UTC (permalink / raw) To: Daniel Kiper; +Cc: andrew.cooper3, xen-devel >>> On 10.01.18 at 14:05, <daniel.kiper@oracle.com> wrote: > Otherwise, due to Xen code/data changes under CPU feet, Xen may crash > silently at boot. > > We were hit by the issue in OVS Xen 4.4 with my earlier version of > EFI/Multiboot2 patches. Initially its implementation allowed relocation > of Xen even if it was relocated by the bootloader. This led to the > crashes on some new Oracle machines because copy destination partially > overlapped with the end of current/initial Xen image placement. > > After some discussion on Xen-devel we decided to disable Xen relocation in > my EFI/Multiboot2 upstream patches if the booloader did the work for us. > Though one case is still not covered. If Xen is not relocated by the > booloader then it tries to do that by itself. If all RAM regions above > currently occupied one are unsuitable for relocation then Xen tries to move > itself higher in it. And if (end - reloc_size + XEN_IMG_OFFSET) goes below > __pa(_end) then copy/relocation destination overlaps, at least partially, > with its source. > > I can agree that this should not happen on todays machines very often. > If at all. It is rather unusual to not have usable RAM regions above > ~5 MiB nowadays. Though I think that we should at least consider putting > such safety measure here. Otherwise Xen may crash mysteriously without > any stack trace. It is very confusing and impairs further debugging. > > Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com> Acked-by: Jan Beulich <jbeulich@suse.com> _______________________________________________ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel ^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH v3 2/2] x86/setup: remap Xen image up to PFN_DOWN(__pa(_end)) 2018-01-10 13:05 [PATCH v3 0/2] x86/boot: Some fixes Daniel Kiper 2018-01-10 13:05 ` [PATCH v3 1/2] x86/setup: do not relocate Xen over current Xen image placement Daniel Kiper @ 2018-01-10 13:05 ` Daniel Kiper 2018-01-19 15:27 ` Jan Beulich 1 sibling, 1 reply; 8+ messages in thread From: Daniel Kiper @ 2018-01-10 13:05 UTC (permalink / raw) To: xen-devel; +Cc: andrew.cooper3, jbeulich Current limit, PFN_DOWN(xen_phys_start), introduced by commit b280442 (x86: make Xen early boot code relocatable) is not reliable. Potentially its value may fall below PFN_DOWN(__pa(_end)) and then part of Xen image may not be mapped after relocation. This will not happen in current code thanks to "x86/setup: do not relocate over current Xen image placement" patch. Though this safety measure may save a lot of debugging time when somebody decide to relax existing relocation restrictions one day. Additionally, remapping will execute a bit faster due to this change. Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com> --- xen/arch/x86/setup.c | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c index 9f8abd9..ac163e2 100644 --- a/xen/arch/x86/setup.c +++ b/xen/arch/x86/setup.c @@ -973,6 +973,11 @@ void __init noreturn __start_xen(unsigned long mbi_p) l3_pgentry_t *pl3e; l2_pgentry_t *pl2e; int i, j, k; + /* + * We have to calculate xen_remap_end_pfn before + * xen_phys_start change. + */ + unsigned long xen_remap_end_pfn = PFN_DOWN(__pa(_end)); /* Select relocation address. */ e = end - reloc_size; @@ -1002,7 +1007,7 @@ void __init noreturn __start_xen(unsigned long mbi_p) /* Not present, 1GB mapping, or already relocated? */ if ( !(l3e_get_flags(*pl3e) & _PAGE_PRESENT) || (l3e_get_flags(*pl3e) & _PAGE_PSE) || - (l3e_get_pfn(*pl3e) > PFN_DOWN(xen_phys_start)) ) + (l3e_get_pfn(*pl3e) > xen_remap_end_pfn) ) continue; *pl3e = l3e_from_intpte(l3e_get_intpte(*pl3e) + xen_phys_start); @@ -1012,7 +1017,7 @@ void __init noreturn __start_xen(unsigned long mbi_p) /* Not present, PSE, or already relocated? */ if ( !(l2e_get_flags(*pl2e) & _PAGE_PRESENT) || (l2e_get_flags(*pl2e) & _PAGE_PSE) || - (l2e_get_pfn(*pl2e) > PFN_DOWN(xen_phys_start)) ) + (l2e_get_pfn(*pl2e) > xen_remap_end_pfn) ) continue; *pl2e = l2e_from_intpte(l2e_get_intpte(*pl2e) + xen_phys_start); @@ -1036,7 +1041,7 @@ void __init noreturn __start_xen(unsigned long mbi_p) unsigned int flags; if ( !(l2e_get_flags(*pl2e) & _PAGE_PRESENT) || - (l2e_get_pfn(*pl2e) > PFN_DOWN(xen_phys_start)) ) + (l2e_get_pfn(*pl2e) > xen_remap_end_pfn) ) continue; if ( !using_2M_mapping() ) -- 1.7.10.4 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH v3 2/2] x86/setup: remap Xen image up to PFN_DOWN(__pa(_end)) 2018-01-10 13:05 ` [PATCH v3 2/2] x86/setup: remap Xen image up to PFN_DOWN(__pa(_end)) Daniel Kiper @ 2018-01-19 15:27 ` Jan Beulich 2018-02-08 13:46 ` Daniel Kiper 0 siblings, 1 reply; 8+ messages in thread From: Jan Beulich @ 2018-01-19 15:27 UTC (permalink / raw) To: Daniel Kiper; +Cc: andrew.cooper3, xen-devel >>> On 10.01.18 at 14:05, <daniel.kiper@oracle.com> wrote: > Current limit, PFN_DOWN(xen_phys_start), introduced by commit b280442 > (x86: make Xen early boot code relocatable) is not reliable. Potentially > its value may fall below PFN_DOWN(__pa(_end)) and then part of Xen image > may not be mapped after relocation. This will not happen in current code > thanks to "x86/setup: do not relocate over current Xen image placement" > patch. Though this safety measure may save a lot of debugging time when > somebody decide to relax existing relocation restrictions one day. I've gone back through the v2 discussion, and I continue to fail to see what is being fixed here, even if just theoretically. It is bad enough that the description here isn't clarifying this and I need to go back to the earlier discussion, but it's even worse if even that earlier discussion didn't really help. My conclusion is that you're talking about a case where old and positions of Xen overlap, a case which I thought patch 1 eliminates. > Additionally, remapping will execute a bit faster due to this change. Besides it hardly mattering - how come? > --- a/xen/arch/x86/setup.c > +++ b/xen/arch/x86/setup.c > @@ -973,6 +973,11 @@ void __init noreturn __start_xen(unsigned long mbi_p) > l3_pgentry_t *pl3e; > l2_pgentry_t *pl2e; > int i, j, k; > + /* > + * We have to calculate xen_remap_end_pfn before > + * xen_phys_start change. > + */ > + unsigned long xen_remap_end_pfn = PFN_DOWN(__pa(_end)); In case you can provide a convincing reason for the patch to be needed (or at least wanted) - the xen_ prefix is pointless here, and you might help the compiler (and maybe also the reader) a little by declaring the whole thing const. Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v3 2/2] x86/setup: remap Xen image up to PFN_DOWN(__pa(_end)) 2018-01-19 15:27 ` Jan Beulich @ 2018-02-08 13:46 ` Daniel Kiper 2018-02-13 9:16 ` Jan Beulich 0 siblings, 1 reply; 8+ messages in thread From: Daniel Kiper @ 2018-02-08 13:46 UTC (permalink / raw) To: Jan Beulich; +Cc: andrew.cooper3, xen-devel Sorry for late reply but I was busy with other stuff. On Fri, Jan 19, 2018 at 08:27:46AM -0700, Jan Beulich wrote: > >>> On 10.01.18 at 14:05, <daniel.kiper@oracle.com> wrote: > > Current limit, PFN_DOWN(xen_phys_start), introduced by commit b280442 > > (x86: make Xen early boot code relocatable) is not reliable. Potentially > > its value may fall below PFN_DOWN(__pa(_end)) and then part of Xen image > > may not be mapped after relocation. This will not happen in current code > > thanks to "x86/setup: do not relocate over current Xen image placement" > > patch. Though this safety measure may save a lot of debugging time when > > somebody decide to relax existing relocation restrictions one day. > > I've gone back through the v2 discussion, and I continue to fail to > see what is being fixed here, even if just theoretically. It is bad OK, let's give an example. I assume that there is no patch 1 and Xen can relocate itself even it was initially relocated by the bootloader. So, let's assume that the bootloader loaded Xen image at 0x80200000 (xen_phys_start == 0x80000000) and its size is 0x700000 (7 MiB). The RAM region ends at 0x80D00000 and there is no RAM above that address. At some point Xen realizes that it can relocate itself to 0x80600000 (xen_phys_start == 0x80400000). So, it does and then remaps itself. And here is the problem. Currently existing code will remap only Xen image up to 0x803fffff. Everything above will no be remapped. So, that is why I suggested this patch. > enough that the description here isn't clarifying this and I need to > go back to the earlier discussion, but it's even worse if even that > earlier discussion didn't really help. My conclusion is that you're Sorry about that. > talking about a case where old and positions of Xen overlap, a > case which I thought patch 1 eliminates. It does not eliminate the issue described above. It just hides it. > > Additionally, remapping will execute a bit faster due to this change. > > Besides it hardly mattering - how come? The continue triggered by "l3e_get_pfn(*pl3e) > ..." check will fire after going above xen_remap_end_pfn instead of PFN_DOWN(xen_phys_start). And xen_remap_end_pfn is often much less than PFN_DOWN(xen_phys_start). However, I agree that it hardly matters here. It is just side effect of this patch. This is not main goal of it. > > --- a/xen/arch/x86/setup.c > > +++ b/xen/arch/x86/setup.c > > @@ -973,6 +973,11 @@ void __init noreturn __start_xen(unsigned long mbi_p) > > l3_pgentry_t *pl3e; > > l2_pgentry_t *pl2e; > > int i, j, k; > > + /* > > + * We have to calculate xen_remap_end_pfn before > > + * xen_phys_start change. > > + */ > > + unsigned long xen_remap_end_pfn = PFN_DOWN(__pa(_end)); > > In case you can provide a convincing reason for the patch to be > needed (or at least wanted) - the xen_ prefix is pointless here, > and you might help the compiler (and maybe also the reader) a > little by declaring the whole thing const. OK. Daniel _______________________________________________ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v3 2/2] x86/setup: remap Xen image up to PFN_DOWN(__pa(_end)) 2018-02-08 13:46 ` Daniel Kiper @ 2018-02-13 9:16 ` Jan Beulich 2018-03-29 9:31 ` Daniel Kiper 0 siblings, 1 reply; 8+ messages in thread From: Jan Beulich @ 2018-02-13 9:16 UTC (permalink / raw) To: Daniel Kiper; +Cc: andrew.cooper3, xen-devel >>> On 08.02.18 at 14:46, <daniel.kiper@oracle.com> wrote: > Sorry for late reply but I was busy with other stuff. > > On Fri, Jan 19, 2018 at 08:27:46AM -0700, Jan Beulich wrote: >> >>> On 10.01.18 at 14:05, <daniel.kiper@oracle.com> wrote: >> > Current limit, PFN_DOWN(xen_phys_start), introduced by commit b280442 >> > (x86: make Xen early boot code relocatable) is not reliable. Potentially >> > its value may fall below PFN_DOWN(__pa(_end)) and then part of Xen image >> > may not be mapped after relocation. This will not happen in current code >> > thanks to "x86/setup: do not relocate over current Xen image placement" >> > patch. Though this safety measure may save a lot of debugging time when >> > somebody decide to relax existing relocation restrictions one day. >> >> I've gone back through the v2 discussion, and I continue to fail to >> see what is being fixed here, even if just theoretically. It is bad > > OK, let's give an example. I assume that there is no patch 1 and Xen can > relocate itself even it was initially relocated by the bootloader. So, > let's assume that the bootloader loaded Xen image at 0x80200000 > (xen_phys_start == 0x80000000) and its size is 0x700000 (7 MiB). > The RAM region ends at 0x80D00000 and there is no RAM above that > address. At some point Xen realizes that it can relocate itself > to 0x80600000 (xen_phys_start == 0x80400000). So, it does and then > remaps itself. And here is the problem. Currently existing code > will remap only Xen image up to 0x803fffff. Everything above will > no be remapped. So, that is why I suggested this patch. > >> enough that the description here isn't clarifying this and I need to >> go back to the earlier discussion, but it's even worse if even that >> earlier discussion didn't really help. My conclusion is that you're > > Sorry about that. > >> talking about a case where old and positions of Xen overlap, a >> case which I thought patch 1 eliminates. > > It does not eliminate the issue described above. It just hides it. Well, no, I disagree - it makes an overlap impossible afaict, which is more that just hiding the problem. Anyway - I'm not going to object to the change provided it comes with a clear description of what _existing_ issue (even if just a theoretical one) is being fixed _with the currently present code in mind_ (i.e. in particular including your patch 1). Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v3 2/2] x86/setup: remap Xen image up to PFN_DOWN(__pa(_end)) 2018-02-13 9:16 ` Jan Beulich @ 2018-03-29 9:31 ` Daniel Kiper 0 siblings, 0 replies; 8+ messages in thread From: Daniel Kiper @ 2018-03-29 9:31 UTC (permalink / raw) To: Jan Beulich; +Cc: andrew.cooper3, xen-devel On Tue, Feb 13, 2018 at 02:16:22AM -0700, Jan Beulich wrote: > >>> On 08.02.18 at 14:46, <daniel.kiper@oracle.com> wrote: > > Sorry for late reply but I was busy with other stuff. > > > > On Fri, Jan 19, 2018 at 08:27:46AM -0700, Jan Beulich wrote: > >> >>> On 10.01.18 at 14:05, <daniel.kiper@oracle.com> wrote: > >> > Current limit, PFN_DOWN(xen_phys_start), introduced by commit b280442 > >> > (x86: make Xen early boot code relocatable) is not reliable. Potentially > >> > its value may fall below PFN_DOWN(__pa(_end)) and then part of Xen image > >> > may not be mapped after relocation. This will not happen in current code > >> > thanks to "x86/setup: do not relocate over current Xen image placement" > >> > patch. Though this safety measure may save a lot of debugging time when > >> > somebody decide to relax existing relocation restrictions one day. > >> > >> I've gone back through the v2 discussion, and I continue to fail to > >> see what is being fixed here, even if just theoretically. It is bad > > > > OK, let's give an example. I assume that there is no patch 1 and Xen can > > relocate itself even it was initially relocated by the bootloader. So, > > let's assume that the bootloader loaded Xen image at 0x80200000 > > (xen_phys_start == 0x80000000) and its size is 0x700000 (7 MiB). > > The RAM region ends at 0x80D00000 and there is no RAM above that > > address. At some point Xen realizes that it can relocate itself > > to 0x80600000 (xen_phys_start == 0x80400000). So, it does and then > > remaps itself. And here is the problem. Currently existing code > > will remap only Xen image up to 0x803fffff. Everything above will > > no be remapped. So, that is why I suggested this patch. > > > >> enough that the description here isn't clarifying this and I need to > >> go back to the earlier discussion, but it's even worse if even that > >> earlier discussion didn't really help. My conclusion is that you're > > > > Sorry about that. > > > >> talking about a case where old and positions of Xen overlap, a > >> case which I thought patch 1 eliminates. > > > > It does not eliminate the issue described above. It just hides it. > > Well, no, I disagree - it makes an overlap impossible afaict, > which is more that just hiding the problem. Anyway - I'm not > going to object to the change provided it comes with a clear > description of what _existing_ issue (even if just a theoretical > one) is being fixed _with the currently present code in mind_ > (i.e. in particular including your patch 1). Well, it looks that I have misread the code and I was simply lying. Sorry about that. However, I had strange feeling that still something is wrong here. And it is wrong. Currently destination region between __image_base__ and (__image_base__ + XEN_IMG_OFFSET) may overlap with the end of source image. And here is the problem. If anything between __page_tables_start and __page_tables_end lands in the overlap then some or even all page table entries may not be updated. This means boom in early boot which will be difficult to the investigate. So, I think the we have three choices to fix the issue: - drop XEN_IMG_OFFSET from if ( (end > s) && (end - reloc_size + XEN_IMG_OFFSET >= __pa(_end)) ) - add XEN_IMG_OFFSET to xen_phys_start in PFN_DOWN(xen_phys_start) used in loops as one of conditions, - change PFN_DOWN(xen_phys_start) to PFN_DOWN(xen_remap_end_pfn) proposed in this patch. I think that we should choose first option. This way we will avoid all kinds of overlaps which are always full can of worms. Daniel _______________________________________________ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2018-03-29 9:31 UTC | newest] Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2018-01-10 13:05 [PATCH v3 0/2] x86/boot: Some fixes Daniel Kiper 2018-01-10 13:05 ` [PATCH v3 1/2] x86/setup: do not relocate Xen over current Xen image placement Daniel Kiper 2018-01-19 15:13 ` Jan Beulich 2018-01-10 13:05 ` [PATCH v3 2/2] x86/setup: remap Xen image up to PFN_DOWN(__pa(_end)) Daniel Kiper 2018-01-19 15:27 ` Jan Beulich 2018-02-08 13:46 ` Daniel Kiper 2018-02-13 9:16 ` Jan Beulich 2018-03-29 9:31 ` Daniel Kiper
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.