* [RFC PATCH] xen/x86: Increase xen_e820_map to E820_X_MAX possible entries @ 2016-11-15 0:11 Alex Thorlton 2016-11-15 6:33 ` Juergen Gross ` (2 more replies) 0 siblings, 3 replies; 26+ messages in thread From: Alex Thorlton @ 2016-11-15 0:11 UTC (permalink / raw) To: linux-kernel Cc: Alex Thorlton, Russ Anderson, David Vrabel, Juergen Gross, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, xen-devel Hey everyone, We're having problems with large systems hitting a BUG in xen_memory_setup, due to extra e820 entries created in the XENMEM_machine_memory_map callback. The change in the patch gets things working, but Boris and I wanted to get opinions on whether or not this is the appropriate/entire solution, which is why I've sent it as an RFC for now. Boris pointed out to me that E820_X_MAX is only large when CONFIG_EFI=y, which is a detail worth discussig. He proposed possibly adding CONFIG_XEN to the conditions under which we set E820_X_MAX to a larger value than E820MAX, since the Xen e820 table isn't bound by the zero-page memory limitations. I do *slightly* question the use of E820_X_MAX here, only from a cosmetic prospective, as I believe this macro is intended to describe the maximum size of the extended e820 table, which, AFAIK, is not used by the Xen HV. That being said, there isn't exactly a "more appropriate" macro/variable to use, so this may not really be an issue. Any input on the patch, or the questions I've raised above is greatly appreciated! - Alex Alex Thorlton (1): xen/x86: Increase xen_e820_map to E820_X_MAX possible entries arch/x86/xen/setup.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) -- 1.8.5.6 ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [RFC PATCH] xen/x86: Increase xen_e820_map to E820_X_MAX possible entries 2016-11-15 0:11 [RFC PATCH] xen/x86: Increase xen_e820_map to E820_X_MAX possible entries Alex Thorlton @ 2016-11-15 6:33 ` Juergen Gross 2016-11-15 7:15 ` Jan Beulich [not found] ` <582AC427020000780011EA7E@suse.com> 2016-11-30 3:24 ` [RFC PATCH v2] " Alex Thorlton 2016-12-05 17:49 ` [RFC PATCH v3] " Alex Thorlton 2 siblings, 2 replies; 26+ messages in thread From: Juergen Gross @ 2016-11-15 6:33 UTC (permalink / raw) To: Alex Thorlton, linux-kernel, Jan Beulich Cc: Russ Anderson, David Vrabel, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, xen-devel On 15/11/16 01:11, Alex Thorlton wrote: > Hey everyone, > > We're having problems with large systems hitting a BUG in > xen_memory_setup, due to extra e820 entries created in the > XENMEM_machine_memory_map callback. The change in the patch gets things > working, but Boris and I wanted to get opinions on whether or not this > is the appropriate/entire solution, which is why I've sent it as an RFC > for now. > > Boris pointed out to me that E820_X_MAX is only large when CONFIG_EFI=y, > which is a detail worth discussig. He proposed possibly adding > CONFIG_XEN to the conditions under which we set E820_X_MAX to a larger > value than E820MAX, since the Xen e820 table isn't bound by the > zero-page memory limitations. > > I do *slightly* question the use of E820_X_MAX here, only from a > cosmetic prospective, as I believe this macro is intended to describe > the maximum size of the extended e820 table, which, AFAIK, is not used > by the Xen HV. That being said, there isn't exactly a "more > appropriate" macro/variable to use, so this may not really be an issue. > > Any input on the patch, or the questions I've raised above is greatly > appreciated! While I think extending the e820 table is the right thing to do I'm questioning the assumptions here. Looking briefly through the Xen hypervisor sources I think it isn't yet ready for such large machines: the hypervisor's e820 map seems to be still limited to 128 e820 entries. Jan, did I overlook an EFI specific path extending this limitation? In case I'm right the Xen hypervisor should be prepared for a larger e820 map, but this won't help alone as there would still be additional entries for the IOAPICs created. So I think we need something like: #define E820_XEN_MAX (E820_X_MAX + MAX_IO_APICS) and use this for sizing xen_e820_map[]. Juergen ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [RFC PATCH] xen/x86: Increase xen_e820_map to E820_X_MAX possible entries 2016-11-15 6:33 ` Juergen Gross @ 2016-11-15 7:15 ` Jan Beulich [not found] ` <582AC427020000780011EA7E@suse.com> 1 sibling, 0 replies; 26+ messages in thread From: Jan Beulich @ 2016-11-15 7:15 UTC (permalink / raw) To: Juergen Gross Cc: David Vrabel, x86, Thomas Gleixner, xen-devel, Ingo Molnar, Alex Thorlton, Russ Anderson, linux-kernel, H. Peter Anvin >>> On 15.11.16 at 07:33, <JGross@suse.com> wrote: > On 15/11/16 01:11, Alex Thorlton wrote: >> Hey everyone, >> >> We're having problems with large systems hitting a BUG in >> xen_memory_setup, due to extra e820 entries created in the >> XENMEM_machine_memory_map callback. The change in the patch gets things >> working, but Boris and I wanted to get opinions on whether or not this >> is the appropriate/entire solution, which is why I've sent it as an RFC >> for now. >> >> Boris pointed out to me that E820_X_MAX is only large when CONFIG_EFI=y, >> which is a detail worth discussig. He proposed possibly adding >> CONFIG_XEN to the conditions under which we set E820_X_MAX to a larger >> value than E820MAX, since the Xen e820 table isn't bound by the >> zero-page memory limitations. >> >> I do *slightly* question the use of E820_X_MAX here, only from a >> cosmetic prospective, as I believe this macro is intended to describe >> the maximum size of the extended e820 table, which, AFAIK, is not used >> by the Xen HV. That being said, there isn't exactly a "more >> appropriate" macro/variable to use, so this may not really be an issue. >> >> Any input on the patch, or the questions I've raised above is greatly >> appreciated! > > While I think extending the e820 table is the right thing to do I'm > questioning the assumptions here. > > Looking briefly through the Xen hypervisor sources I think it isn't > yet ready for such large machines: the hypervisor's e820 map seems to > be still limited to 128 e820 entries. Jan, did I overlook an EFI > specific path extending this limitation? No, you didn't. I do question the correlation with "large machines" here though: The issue isn't with large machines afaict, but with ones having very many entries (i.e. heavily fragmented). > In case I'm right the Xen hypervisor should be prepared for a larger > e820 map, but this won't help alone as there would still be additional > entries for the IOAPICs created. > > So I think we need something like: > > #define E820_XEN_MAX (E820_X_MAX + MAX_IO_APICS) > > and use this for sizing xen_e820_map[]. I would say that if any change gets done here, there shouldn't be any static upper limit at all. That could even be viewed as in line with recent e820.c changes moving to dynamic allocations. In particular I don't see why MAX_IO_APICS would need adding in here, but not other (current and future) factors determining the (pseudo) E820 map Xen presents to Dom0. Jan ^ permalink raw reply [flat|nested] 26+ messages in thread
[parent not found: <582AC427020000780011EA7E@suse.com>]
* Re: [RFC PATCH] xen/x86: Increase xen_e820_map to E820_X_MAX possible entries [not found] ` <582AC427020000780011EA7E@suse.com> @ 2016-11-15 7:36 ` Juergen Gross 2016-11-15 8:01 ` Jan Beulich [not found] ` <582ACEDE020000780011EAC9@suse.com> 0 siblings, 2 replies; 26+ messages in thread From: Juergen Gross @ 2016-11-15 7:36 UTC (permalink / raw) To: Jan Beulich Cc: David Vrabel, x86, Thomas Gleixner, xen-devel, Ingo Molnar, Alex Thorlton, Russ Anderson, linux-kernel, H. Peter Anvin On 15/11/16 08:15, Jan Beulich wrote: >>>> On 15.11.16 at 07:33, <JGross@suse.com> wrote: >> On 15/11/16 01:11, Alex Thorlton wrote: >>> Hey everyone, >>> >>> We're having problems with large systems hitting a BUG in >>> xen_memory_setup, due to extra e820 entries created in the >>> XENMEM_machine_memory_map callback. The change in the patch gets things >>> working, but Boris and I wanted to get opinions on whether or not this >>> is the appropriate/entire solution, which is why I've sent it as an RFC >>> for now. >>> >>> Boris pointed out to me that E820_X_MAX is only large when CONFIG_EFI=y, >>> which is a detail worth discussig. He proposed possibly adding >>> CONFIG_XEN to the conditions under which we set E820_X_MAX to a larger >>> value than E820MAX, since the Xen e820 table isn't bound by the >>> zero-page memory limitations. >>> >>> I do *slightly* question the use of E820_X_MAX here, only from a >>> cosmetic prospective, as I believe this macro is intended to describe >>> the maximum size of the extended e820 table, which, AFAIK, is not used >>> by the Xen HV. That being said, there isn't exactly a "more >>> appropriate" macro/variable to use, so this may not really be an issue. >>> >>> Any input on the patch, or the questions I've raised above is greatly >>> appreciated! >> >> While I think extending the e820 table is the right thing to do I'm >> questioning the assumptions here. >> >> Looking briefly through the Xen hypervisor sources I think it isn't >> yet ready for such large machines: the hypervisor's e820 map seems to >> be still limited to 128 e820 entries. Jan, did I overlook an EFI >> specific path extending this limitation? > > No, you didn't. I do question the correlation with "large machines" > here though: The issue isn't with large machines afaict, but with > ones having very many entries (i.e. heavily fragmented). Fact is: non-EFI machines are limited to 128 entries. One reason for fragmentation is NUMA - which tends to be present and especially adding many entries only with lots of nodes - on large machines. So while you are of course right that the problem isn't the size of the machine, but the memory fragmentation, the chances to show up are much higher on large machines. So I'd rephrase: Looking briefly through the Xen hypervisor sources I think it isn't yet ready for machines with many e820 entries due to memory fragmentation to be found e.g. on very large NUMA machines with lots of nodes: ... >> In case I'm right the Xen hypervisor should be prepared for a larger >> e820 map, but this won't help alone as there would still be additional >> entries for the IOAPICs created. >> >> So I think we need something like: >> >> #define E820_XEN_MAX (E820_X_MAX + MAX_IO_APICS) >> >> and use this for sizing xen_e820_map[]. > > I would say that if any change gets done here, there shouldn't be > any static upper limit at all. That could even be viewed as in line > with recent e820.c changes moving to dynamic allocations. In > particular I don't see why MAX_IO_APICS would need adding in > here, but not other (current and future) factors determining the > (pseudo) E820 map Xen presents to Dom0. The hypervisor interface of XENMEM_machine_memory_map requires to specify the size of the e820 map where the hypervisor can supply entries. The alternative would be try and error: start with a small e820 map and in case of error increasing the size until success. I don't like this approach. Especially as dynamic memory allocations are not possible at this stage (using RESERVE_BRK() isn't any better than a static __initdata array IMO). Which other current factors do you see determining the number of e820 entries presented to Dom0? Juergen ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [RFC PATCH] xen/x86: Increase xen_e820_map to E820_X_MAX possible entries 2016-11-15 7:36 ` Juergen Gross @ 2016-11-15 8:01 ` Jan Beulich [not found] ` <582ACEDE020000780011EAC9@suse.com> 1 sibling, 0 replies; 26+ messages in thread From: Jan Beulich @ 2016-11-15 8:01 UTC (permalink / raw) To: Juergen Gross Cc: David Vrabel, x86, Thomas Gleixner, xen-devel, Ingo Molnar, Alex Thorlton, Russ Anderson, linux-kernel, H. Peter Anvin >>> On 15.11.16 at 08:36, <JGross@suse.com> wrote: > On 15/11/16 08:15, Jan Beulich wrote: >>>>> On 15.11.16 at 07:33, <JGross@suse.com> wrote: >>> On 15/11/16 01:11, Alex Thorlton wrote: >>>> Hey everyone, >>>> >>>> We're having problems with large systems hitting a BUG in >>>> xen_memory_setup, due to extra e820 entries created in the >>>> XENMEM_machine_memory_map callback. The change in the patch gets things >>>> working, but Boris and I wanted to get opinions on whether or not this >>>> is the appropriate/entire solution, which is why I've sent it as an RFC >>>> for now. >>>> >>>> Boris pointed out to me that E820_X_MAX is only large when CONFIG_EFI=y, >>>> which is a detail worth discussig. He proposed possibly adding >>>> CONFIG_XEN to the conditions under which we set E820_X_MAX to a larger >>>> value than E820MAX, since the Xen e820 table isn't bound by the >>>> zero-page memory limitations. >>>> >>>> I do *slightly* question the use of E820_X_MAX here, only from a >>>> cosmetic prospective, as I believe this macro is intended to describe >>>> the maximum size of the extended e820 table, which, AFAIK, is not used >>>> by the Xen HV. That being said, there isn't exactly a "more >>>> appropriate" macro/variable to use, so this may not really be an issue. >>>> >>>> Any input on the patch, or the questions I've raised above is greatly >>>> appreciated! >>> >>> While I think extending the e820 table is the right thing to do I'm >>> questioning the assumptions here. >>> >>> Looking briefly through the Xen hypervisor sources I think it isn't >>> yet ready for such large machines: the hypervisor's e820 map seems to >>> be still limited to 128 e820 entries. Jan, did I overlook an EFI >>> specific path extending this limitation? >> >> No, you didn't. I do question the correlation with "large machines" >> here though: The issue isn't with large machines afaict, but with >> ones having very many entries (i.e. heavily fragmented). > > Fact is: non-EFI machines are limited to 128 entries. > > One reason for fragmentation is NUMA - which tends to be present and > especially adding many entries only with lots of nodes - on large > machines. Yes. However, at present Xen only supports up to 64 nodes, and with one entry per node (assuming there are holes between nodes, but contiguous memory within them) that's still unlikely to overflow the table. Yet I'm not meaning to discard the fact that it's been known for a long time that since Linux needed a larger table, Xen would need to follow suit sooner or later. >>> In case I'm right the Xen hypervisor should be prepared for a larger >>> e820 map, but this won't help alone as there would still be additional >>> entries for the IOAPICs created. >>> >>> So I think we need something like: >>> >>> #define E820_XEN_MAX (E820_X_MAX + MAX_IO_APICS) >>> >>> and use this for sizing xen_e820_map[]. >> >> I would say that if any change gets done here, there shouldn't be >> any static upper limit at all. That could even be viewed as in line >> with recent e820.c changes moving to dynamic allocations. In >> particular I don't see why MAX_IO_APICS would need adding in >> here, but not other (current and future) factors determining the >> (pseudo) E820 map Xen presents to Dom0. > > The hypervisor interface of XENMEM_machine_memory_map requires to > specify the size of the e820 map where the hypervisor can supply > entries. The alternative would be try and error: start with a small > e820 map and in case of error increasing the size until success. I > don't like this approach. Especially as dynamic memory allocations > are not possible at this stage (using RESERVE_BRK() isn't any better > than a static __initdata array IMO). Well, I think as a first step we should extend XENMEM_{,machine_}memory_map to the model used elsewhere where passing in a NULL handle (and perhaps count being zero) is a request for the number of needed entries. Granted this doesn't help with Linux'es way of consuming the output, but it at least allows for dynamic sizing. And Linux would then be free to prepare a static array or have a RESERVE_BRK() as it sees fit. > Which other current factors do you see determining the number of > e820 entries presented to Dom0? Just look at the implementation: The dependency is on iomem_caps, and hence any code removing pages from there could affect the number of entries. Which means the number of IOMMUs also has an effect here. And further things could get added at any time, as we happen to find them. Jan ^ permalink raw reply [flat|nested] 26+ messages in thread
[parent not found: <582ACEDE020000780011EAC9@suse.com>]
* Re: [RFC PATCH] xen/x86: Increase xen_e820_map to E820_X_MAX possible entries [not found] ` <582ACEDE020000780011EAC9@suse.com> @ 2016-11-15 8:42 ` Juergen Gross 2016-11-15 9:45 ` Jan Beulich [not found] ` <582AE72D020000780011EBB2@suse.com> 0 siblings, 2 replies; 26+ messages in thread From: Juergen Gross @ 2016-11-15 8:42 UTC (permalink / raw) To: Jan Beulich Cc: David Vrabel, x86, Thomas Gleixner, xen-devel, Ingo Molnar, Alex Thorlton, Russ Anderson, linux-kernel, H. Peter Anvin On 15/11/16 09:01, Jan Beulich wrote: >>>> On 15.11.16 at 08:36, <JGross@suse.com> wrote: >> On 15/11/16 08:15, Jan Beulich wrote: >>>>>> On 15.11.16 at 07:33, <JGross@suse.com> wrote: >>>> On 15/11/16 01:11, Alex Thorlton wrote: >>>>> Hey everyone, >>>>> >>>>> We're having problems with large systems hitting a BUG in >>>>> xen_memory_setup, due to extra e820 entries created in the >>>>> XENMEM_machine_memory_map callback. The change in the patch gets things >>>>> working, but Boris and I wanted to get opinions on whether or not this >>>>> is the appropriate/entire solution, which is why I've sent it as an RFC >>>>> for now. >>>>> >>>>> Boris pointed out to me that E820_X_MAX is only large when CONFIG_EFI=y, >>>>> which is a detail worth discussig. He proposed possibly adding >>>>> CONFIG_XEN to the conditions under which we set E820_X_MAX to a larger >>>>> value than E820MAX, since the Xen e820 table isn't bound by the >>>>> zero-page memory limitations. >>>>> >>>>> I do *slightly* question the use of E820_X_MAX here, only from a >>>>> cosmetic prospective, as I believe this macro is intended to describe >>>>> the maximum size of the extended e820 table, which, AFAIK, is not used >>>>> by the Xen HV. That being said, there isn't exactly a "more >>>>> appropriate" macro/variable to use, so this may not really be an issue. >>>>> >>>>> Any input on the patch, or the questions I've raised above is greatly >>>>> appreciated! >>>> >>>> While I think extending the e820 table is the right thing to do I'm >>>> questioning the assumptions here. >>>> >>>> Looking briefly through the Xen hypervisor sources I think it isn't >>>> yet ready for such large machines: the hypervisor's e820 map seems to >>>> be still limited to 128 e820 entries. Jan, did I overlook an EFI >>>> specific path extending this limitation? >>> >>> No, you didn't. I do question the correlation with "large machines" >>> here though: The issue isn't with large machines afaict, but with >>> ones having very many entries (i.e. heavily fragmented). >> >> Fact is: non-EFI machines are limited to 128 entries. >> >> One reason for fragmentation is NUMA - which tends to be present and >> especially adding many entries only with lots of nodes - on large >> machines. > > Yes. However, at present Xen only supports up to 64 nodes, and > with one entry per node (assuming there are holes between nodes, > but contiguous memory within them) that's still unlikely to overflow > the table. Yet I'm not meaning to discard the fact that it's been > known for a long time that since Linux needed a larger table, Xen > would need to follow suit sooner or later. > >>>> In case I'm right the Xen hypervisor should be prepared for a larger >>>> e820 map, but this won't help alone as there would still be additional >>>> entries for the IOAPICs created. >>>> >>>> So I think we need something like: >>>> >>>> #define E820_XEN_MAX (E820_X_MAX + MAX_IO_APICS) >>>> >>>> and use this for sizing xen_e820_map[]. >>> >>> I would say that if any change gets done here, there shouldn't be >>> any static upper limit at all. That could even be viewed as in line >>> with recent e820.c changes moving to dynamic allocations. In >>> particular I don't see why MAX_IO_APICS would need adding in >>> here, but not other (current and future) factors determining the >>> (pseudo) E820 map Xen presents to Dom0. >> >> The hypervisor interface of XENMEM_machine_memory_map requires to >> specify the size of the e820 map where the hypervisor can supply >> entries. The alternative would be try and error: start with a small >> e820 map and in case of error increasing the size until success. I >> don't like this approach. Especially as dynamic memory allocations >> are not possible at this stage (using RESERVE_BRK() isn't any better >> than a static __initdata array IMO). > > Well, I think as a first step we should extend > XENMEM_{,machine_}memory_map to the model used elsewhere > where passing in a NULL handle (and perhaps count being zero) is > a request for the number of needed entries. Granted this doesn't > help with Linux'es way of consuming the output, but it at least > allows for dynamic sizing. And Linux would then be free to prepare > a static array or have a RESERVE_BRK() as it sees fit. This still needs the definition of an upper limit, as RESERVE_BRK() is a compile time feature. For a fully dynamical solution we'd need a way to get a partial E820 map from the hypervisor (e.g. first 128 entries) in order to be able to setup at least some memory and later get the rest of the memory map using some dynamically allocated memory. >> Which other current factors do you see determining the number of >> e820 entries presented to Dom0? > > Just look at the implementation: The dependency is on iomem_caps, > and hence any code removing pages from there could affect the > number of entries. Which means the number of IOMMUs also has an > effect here. And further things could get added at any time, as we > happen to find them. Its easier to see with that information. :-) Juergen ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [RFC PATCH] xen/x86: Increase xen_e820_map to E820_X_MAX possible entries 2016-11-15 8:42 ` Juergen Gross @ 2016-11-15 9:45 ` Jan Beulich [not found] ` <582AE72D020000780011EBB2@suse.com> 1 sibling, 0 replies; 26+ messages in thread From: Jan Beulich @ 2016-11-15 9:45 UTC (permalink / raw) To: Juergen Gross Cc: David Vrabel, x86, Thomas Gleixner, xen-devel, Ingo Molnar, Alex Thorlton, Russ Anderson, linux-kernel, H. Peter Anvin >>> On 15.11.16 at 09:42, <JGross@suse.com> wrote: > On 15/11/16 09:01, Jan Beulich wrote: >>>>> On 15.11.16 at 08:36, <JGross@suse.com> wrote: >>> On 15/11/16 08:15, Jan Beulich wrote: >>>>>>> On 15.11.16 at 07:33, <JGross@suse.com> wrote: >>>>> In case I'm right the Xen hypervisor should be prepared for a larger >>>>> e820 map, but this won't help alone as there would still be additional >>>>> entries for the IOAPICs created. >>>>> >>>>> So I think we need something like: >>>>> >>>>> #define E820_XEN_MAX (E820_X_MAX + MAX_IO_APICS) >>>>> >>>>> and use this for sizing xen_e820_map[]. >>>> >>>> I would say that if any change gets done here, there shouldn't be >>>> any static upper limit at all. That could even be viewed as in line >>>> with recent e820.c changes moving to dynamic allocations. In >>>> particular I don't see why MAX_IO_APICS would need adding in >>>> here, but not other (current and future) factors determining the >>>> (pseudo) E820 map Xen presents to Dom0. >>> >>> The hypervisor interface of XENMEM_machine_memory_map requires to >>> specify the size of the e820 map where the hypervisor can supply >>> entries. The alternative would be try and error: start with a small >>> e820 map and in case of error increasing the size until success. I >>> don't like this approach. Especially as dynamic memory allocations >>> are not possible at this stage (using RESERVE_BRK() isn't any better >>> than a static __initdata array IMO). >> >> Well, I think as a first step we should extend >> XENMEM_{,machine_}memory_map to the model used elsewhere >> where passing in a NULL handle (and perhaps count being zero) is >> a request for the number of needed entries. Granted this doesn't >> help with Linux'es way of consuming the output, but it at least >> allows for dynamic sizing. And Linux would then be free to prepare >> a static array or have a RESERVE_BRK() as it sees fit. > > This still needs the definition of an upper limit, as RESERVE_BRK() > is a compile time feature. That's why I did say "as it sees fit". > For a fully dynamical solution we'd need a way to get a partial > E820 map from the hypervisor (e.g. first 128 entries) in order to > be able to setup at least some memory and later get the rest of > the memory map using some dynamically allocated memory. And we could of course also make the hypercall allow for that (e.g. by defining the semantics of a specific error code, so far not used by it, to avoid mis-interpretation of output on older hypervisors), or introduce a new clone of the existing one(s). Jan ^ permalink raw reply [flat|nested] 26+ messages in thread
[parent not found: <582AE72D020000780011EBB2@suse.com>]
* Re: [RFC PATCH] xen/x86: Increase xen_e820_map to E820_X_MAX possible entries [not found] ` <582AE72D020000780011EBB2@suse.com> @ 2016-11-15 9:55 ` Juergen Gross 2016-11-15 10:44 ` Jan Beulich ` (2 more replies) 0 siblings, 3 replies; 26+ messages in thread From: Juergen Gross @ 2016-11-15 9:55 UTC (permalink / raw) To: Jan Beulich Cc: David Vrabel, x86, Thomas Gleixner, xen-devel, Ingo Molnar, Alex Thorlton, Russ Anderson, linux-kernel, H. Peter Anvin On 15/11/16 10:45, Jan Beulich wrote: >>>> On 15.11.16 at 09:42, <JGross@suse.com> wrote: >> On 15/11/16 09:01, Jan Beulich wrote: >>>>>> On 15.11.16 at 08:36, <JGross@suse.com> wrote: >>>> On 15/11/16 08:15, Jan Beulich wrote: >>>>>>>> On 15.11.16 at 07:33, <JGross@suse.com> wrote: >>>>>> In case I'm right the Xen hypervisor should be prepared for a larger >>>>>> e820 map, but this won't help alone as there would still be additional >>>>>> entries for the IOAPICs created. >>>>>> >>>>>> So I think we need something like: >>>>>> >>>>>> #define E820_XEN_MAX (E820_X_MAX + MAX_IO_APICS) >>>>>> >>>>>> and use this for sizing xen_e820_map[]. >>>>> >>>>> I would say that if any change gets done here, there shouldn't be >>>>> any static upper limit at all. That could even be viewed as in line >>>>> with recent e820.c changes moving to dynamic allocations. In >>>>> particular I don't see why MAX_IO_APICS would need adding in >>>>> here, but not other (current and future) factors determining the >>>>> (pseudo) E820 map Xen presents to Dom0. >>>> >>>> The hypervisor interface of XENMEM_machine_memory_map requires to >>>> specify the size of the e820 map where the hypervisor can supply >>>> entries. The alternative would be try and error: start with a small >>>> e820 map and in case of error increasing the size until success. I >>>> don't like this approach. Especially as dynamic memory allocations >>>> are not possible at this stage (using RESERVE_BRK() isn't any better >>>> than a static __initdata array IMO). >>> >>> Well, I think as a first step we should extend >>> XENMEM_{,machine_}memory_map to the model used elsewhere >>> where passing in a NULL handle (and perhaps count being zero) is >>> a request for the number of needed entries. Granted this doesn't >>> help with Linux'es way of consuming the output, but it at least >>> allows for dynamic sizing. And Linux would then be free to prepare >>> a static array or have a RESERVE_BRK() as it sees fit. >> >> This still needs the definition of an upper limit, as RESERVE_BRK() >> is a compile time feature. > > That's why I did say "as it sees fit". > >> For a fully dynamical solution we'd need a way to get a partial >> E820 map from the hypervisor (e.g. first 128 entries) in order to >> be able to setup at least some memory and later get the rest of >> the memory map using some dynamically allocated memory. > > And we could of course also make the hypercall allow for that (e.g. > by defining the semantics of a specific error code, so far not used > by it, to avoid mis-interpretation of output on older hypervisors), > or introduce a new clone of the existing one(s). I'd go with the new error code. What about E2BIG or ENOSPC? I think the hypervisor should fill in the number of entries required in this case. In case nobody objects I can post patches for this purpose (both Xen and Linux). Juergen ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [RFC PATCH] xen/x86: Increase xen_e820_map to E820_X_MAX possible entries 2016-11-15 9:55 ` Juergen Gross @ 2016-11-15 10:44 ` Jan Beulich [not found] ` <582AF51F020000780011ECB4@suse.com> 2016-11-15 15:22 ` Alex Thorlton 2 siblings, 0 replies; 26+ messages in thread From: Jan Beulich @ 2016-11-15 10:44 UTC (permalink / raw) To: Juergen Gross Cc: David Vrabel, x86, Thomas Gleixner, xen-devel, Ingo Molnar, Alex Thorlton, Russ Anderson, linux-kernel, H. Peter Anvin >>> On 15.11.16 at 10:55, <JGross@suse.com> wrote: > On 15/11/16 10:45, Jan Beulich wrote: >>>>> On 15.11.16 at 09:42, <JGross@suse.com> wrote: >>> For a fully dynamical solution we'd need a way to get a partial >>> E820 map from the hypervisor (e.g. first 128 entries) in order to >>> be able to setup at least some memory and later get the rest of >>> the memory map using some dynamically allocated memory. >> >> And we could of course also make the hypercall allow for that (e.g. >> by defining the semantics of a specific error code, so far not used >> by it, to avoid mis-interpretation of output on older hypervisors), >> or introduce a new clone of the existing one(s). > > I'd go with the new error code. What about E2BIG or ENOSPC? Either seems fine. > I think the hypervisor should fill in the number of entries required > in this case. And you'd then mean the caller to imply that the passed in (and now overwritten) count to describe how many entries got filled? That's not that nice an interface. I'd rather return the number of entries filled, and require the sizing variant (NULL handle) to be used to obtain the number of entries. After all if a caller _wants_ to handle a partial map, it doesn't really care how many further entries there are. Jan ^ permalink raw reply [flat|nested] 26+ messages in thread
[parent not found: <582AF51F020000780011ECB4@suse.com>]
* Re: [RFC PATCH] xen/x86: Increase xen_e820_map to E820_X_MAX possible entries [not found] ` <582AF51F020000780011ECB4@suse.com> @ 2016-11-15 11:07 ` Juergen Gross 2016-11-15 11:12 ` Jan Beulich 0 siblings, 1 reply; 26+ messages in thread From: Juergen Gross @ 2016-11-15 11:07 UTC (permalink / raw) To: Jan Beulich Cc: David Vrabel, x86, Thomas Gleixner, xen-devel, Ingo Molnar, Alex Thorlton, Russ Anderson, linux-kernel, H. Peter Anvin On 15/11/16 11:44, Jan Beulich wrote: >>>> On 15.11.16 at 10:55, <JGross@suse.com> wrote: >> On 15/11/16 10:45, Jan Beulich wrote: >>>>>> On 15.11.16 at 09:42, <JGross@suse.com> wrote: >>>> For a fully dynamical solution we'd need a way to get a partial >>>> E820 map from the hypervisor (e.g. first 128 entries) in order to >>>> be able to setup at least some memory and later get the rest of >>>> the memory map using some dynamically allocated memory. >>> >>> And we could of course also make the hypercall allow for that (e.g. >>> by defining the semantics of a specific error code, so far not used >>> by it, to avoid mis-interpretation of output on older hypervisors), >>> or introduce a new clone of the existing one(s). >> >> I'd go with the new error code. What about E2BIG or ENOSPC? > > Either seems fine. > >> I think the hypervisor should fill in the number of entries required >> in this case. > > And you'd then mean the caller to imply that the passed in (and now > overwritten) count to describe how many entries got filled? That's > not that nice an interface. I'd rather return the number of entries > filled, and require the sizing variant (NULL handle) to be used to > obtain the number of entries. After all if a caller _wants_ to handle > a partial map, it doesn't really care how many further entries there > are. I just wanted to avoid two interface changes. IMO the caller knows the buffer size he supplied and it is rather clear that e.g. E2BIG means it has been filled completely. And the assumption the caller doesn't care for the further entries is not necessarily true: he might not care at the moment of the call, but he will eventually care for them later when he is able to allocate an appropriate sized buffer. OTOH I'm not feeling strong in this case and you are the hypervisor maintainer. Either solution is fine for me. Juergen ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [RFC PATCH] xen/x86: Increase xen_e820_map to E820_X_MAX possible entries 2016-11-15 11:07 ` Juergen Gross @ 2016-11-15 11:12 ` Jan Beulich 0 siblings, 0 replies; 26+ messages in thread From: Jan Beulich @ 2016-11-15 11:12 UTC (permalink / raw) To: Juergen Gross Cc: David Vrabel, x86, Thomas Gleixner, xen-devel, Ingo Molnar, Alex Thorlton, Russ Anderson, linux-kernel, H. Peter Anvin >>> On 15.11.16 at 12:07, <JGross@suse.com> wrote: > On 15/11/16 11:44, Jan Beulich wrote: >>>>> On 15.11.16 at 10:55, <JGross@suse.com> wrote: >>> On 15/11/16 10:45, Jan Beulich wrote: >>>>>>> On 15.11.16 at 09:42, <JGross@suse.com> wrote: >>>>> For a fully dynamical solution we'd need a way to get a partial >>>>> E820 map from the hypervisor (e.g. first 128 entries) in order to >>>>> be able to setup at least some memory and later get the rest of >>>>> the memory map using some dynamically allocated memory. >>>> >>>> And we could of course also make the hypercall allow for that (e.g. >>>> by defining the semantics of a specific error code, so far not used >>>> by it, to avoid mis-interpretation of output on older hypervisors), >>>> or introduce a new clone of the existing one(s). >>> >>> I'd go with the new error code. What about E2BIG or ENOSPC? >> >> Either seems fine. >> >>> I think the hypervisor should fill in the number of entries required >>> in this case. >> >> And you'd then mean the caller to imply that the passed in (and now >> overwritten) count to describe how many entries got filled? That's >> not that nice an interface. I'd rather return the number of entries >> filled, and require the sizing variant (NULL handle) to be used to >> obtain the number of entries. After all if a caller _wants_ to handle >> a partial map, it doesn't really care how many further entries there >> are. > > I just wanted to avoid two interface changes. Just make one at a time. Jan ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [RFC PATCH] xen/x86: Increase xen_e820_map to E820_X_MAX possible entries 2016-11-15 9:55 ` Juergen Gross 2016-11-15 10:44 ` Jan Beulich [not found] ` <582AF51F020000780011ECB4@suse.com> @ 2016-11-15 15:22 ` Alex Thorlton 2016-11-16 6:06 ` Juergen Gross 2 siblings, 1 reply; 26+ messages in thread From: Alex Thorlton @ 2016-11-15 15:22 UTC (permalink / raw) To: Juergen Gross Cc: Jan Beulich, David Vrabel, x86, Thomas Gleixner, xen-devel, Ingo Molnar, Alex Thorlton, Russ Anderson, linux-kernel, H. Peter Anvin On Tue, Nov 15, 2016 at 10:55:49AM +0100, Juergen Gross wrote: > I'd go with the new error code. What about E2BIG or ENOSPC? > > I think the hypervisor should fill in the number of entries required > in this case. > > In case nobody objects I can post patches for this purpose (both Xen > and Linux). This sounds like a good solution to me. I think it's definitely more appropriate than simply bumping up the size of xen_e820_map, especially considering the fact that it's theoretically possible for the e820 map generated by the hypercall to grow too large, even on a non-EFI machine, where my change would have no effect. Thanks for your input! - Alex ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [RFC PATCH] xen/x86: Increase xen_e820_map to E820_X_MAX possible entries 2016-11-15 15:22 ` Alex Thorlton @ 2016-11-16 6:06 ` Juergen Gross 0 siblings, 0 replies; 26+ messages in thread From: Juergen Gross @ 2016-11-16 6:06 UTC (permalink / raw) To: Alex Thorlton Cc: Jan Beulich, David Vrabel, x86, Thomas Gleixner, xen-devel, Ingo Molnar, Russ Anderson, linux-kernel, H. Peter Anvin On 15/11/16 16:22, Alex Thorlton wrote: > On Tue, Nov 15, 2016 at 10:55:49AM +0100, Juergen Gross wrote: >> I'd go with the new error code. What about E2BIG or ENOSPC? >> >> I think the hypervisor should fill in the number of entries required >> in this case. >> >> In case nobody objects I can post patches for this purpose (both Xen >> and Linux). > > This sounds like a good solution to me. I think it's definitely more > appropriate than simply bumping up the size of xen_e820_map, especially > considering the fact that it's theoretically possible for the e820 map > generated by the hypercall to grow too large, even on a non-EFI machine, > where my change would have no effect. Well, it won't help with the current hypervisor, so bumping up the size of xen_e820_map will still be a good idea. I think using E820_X_MAX is okay since in the end xen_e820_map will be transferred into a struct e820map which can't hold more than E820_X_MAX entries (additional entries are ignored here, so this won't let the boot fail). Juergen ^ permalink raw reply [flat|nested] 26+ messages in thread
* [RFC PATCH v2] xen/x86: Increase xen_e820_map to E820_X_MAX possible entries 2016-11-15 0:11 [RFC PATCH] xen/x86: Increase xen_e820_map to E820_X_MAX possible entries Alex Thorlton 2016-11-15 6:33 ` Juergen Gross @ 2016-11-30 3:24 ` Alex Thorlton 2016-11-30 3:24 ` [PATCH 1/2] x86: Make E820_X_MAX unconditionally larger than E820MAX Alex Thorlton 2016-11-30 3:24 ` [PATCH 2/2] xen/x86: Increase xen_e820_map to E820_X_MAX possible entries Alex Thorlton 2016-12-05 17:49 ` [RFC PATCH v3] " Alex Thorlton 2 siblings, 2 replies; 26+ messages in thread From: Alex Thorlton @ 2016-11-30 3:24 UTC (permalink / raw) To: linux-kernel Cc: Alex Thorlton, Russ Anderson, David Vrabel, Juergen Gross, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Denys Vlasenko, Boris Ostrovsky, x86, xen-devel Here's the second round of my patches to fix up the problems that we're seeing with the XENMEM_machine_memory_map hypercall. These few simple changes, to give xen_e820_map some extra space, get things working fine on our large machine. This version of the patchset adds a patch to remove the #ifdef CONFIG_EFI conditional around the definition of E820_X_MAX, so that it's always slightly larger than E820MAX. I've also tweaked the code that works on xen_e820_map to use ARRAY_SIZE(xen_e820_map) instead of using E820_X_MAX directly, as suggested by Boris. As always, I appreciate any input that others can give! - Alex Alex Thorlton (2): x86: Make E820_X_MAX unconditionally larger than E820MAX xen/x86: Increase xen_e820_map to E820_X_MAX possible entries arch/x86/include/asm/e820.h | 8 +++----- arch/x86/xen/setup.c | 6 +++--- 2 files changed, 6 insertions(+), 8 deletions(-) -- 1.8.5.6 ^ permalink raw reply [flat|nested] 26+ messages in thread
* [PATCH 1/2] x86: Make E820_X_MAX unconditionally larger than E820MAX 2016-11-30 3:24 ` [RFC PATCH v2] " Alex Thorlton @ 2016-11-30 3:24 ` Alex Thorlton 2016-11-30 6:21 ` Ingo Molnar 2016-11-30 3:24 ` [PATCH 2/2] xen/x86: Increase xen_e820_map to E820_X_MAX possible entries Alex Thorlton 1 sibling, 1 reply; 26+ messages in thread From: Alex Thorlton @ 2016-11-30 3:24 UTC (permalink / raw) To: linux-kernel Cc: Alex Thorlton, Russ Anderson, David Vrabel, Ingo Molnar, Juergen Gross, Thomas Gleixner, H. Peter Anvin, Denys Vlasenko, Boris Ostrovsky, x86, xen-devel It's really not necessary to limit E820_X_MAX to 128 in the non-EFI case. This commit drops E820_X_MAX's dependency on CONFIG_EFI, so that E820_X_MAX is always at least slightly larger than E820MAX. The real motivation behind this is actually to prevent some issues in the Xen kernel, where the XENMEM_machine_memory_map hypercall can produce an e820 map larger than 128 entries, even on systems where the original e820 table was quite a bit smaller than that, depending on how many IOAPICs are installed on the system. Signed-off-by: Alex Thorlton <athorlton@sgi.com> Suggested-by: Ingo Molnar <mingo@redhat.com> Cc: Russ Anderson <rja@sgi.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: x86@kernel.org Cc: xen-devel@lists.xenproject.org --- arch/x86/include/asm/e820.h | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/arch/x86/include/asm/e820.h b/arch/x86/include/asm/e820.h index 476b574..aa00d33 100644 --- a/arch/x86/include/asm/e820.h +++ b/arch/x86/include/asm/e820.h @@ -1,13 +1,15 @@ #ifndef _ASM_X86_E820_H #define _ASM_X86_E820_H -#ifdef CONFIG_EFI +/* + * We need to make sure that E820_X_MAX is defined + * before we include uapi/asm/e820.h + */ #include <linux/numa.h> #define E820_X_MAX (E820MAX + 3 * MAX_NUMNODES) -#else /* ! CONFIG_EFI */ -#define E820_X_MAX E820MAX -#endif + #include <uapi/asm/e820.h> + #ifndef __ASSEMBLY__ /* see comment in arch/x86/kernel/e820.c */ extern struct e820map *e820; -- 1.8.5.6 ^ permalink raw reply related [flat|nested] 26+ messages in thread
* Re: [PATCH 1/2] x86: Make E820_X_MAX unconditionally larger than E820MAX 2016-11-30 3:24 ` [PATCH 1/2] x86: Make E820_X_MAX unconditionally larger than E820MAX Alex Thorlton @ 2016-11-30 6:21 ` Ingo Molnar 2016-12-01 18:37 ` Alex Thorlton 0 siblings, 1 reply; 26+ messages in thread From: Ingo Molnar @ 2016-11-30 6:21 UTC (permalink / raw) To: Alex Thorlton Cc: linux-kernel, Russ Anderson, David Vrabel, Ingo Molnar, Juergen Gross, Thomas Gleixner, H. Peter Anvin, Denys Vlasenko, Boris Ostrovsky, x86, xen-devel * Alex Thorlton <athorlton@sgi.com> wrote: > It's really not necessary to limit E820_X_MAX to 128 in the non-EFI > case. This commit drops E820_X_MAX's dependency on CONFIG_EFI, so that > E820_X_MAX is always at least slightly larger than E820MAX. > > The real motivation behind this is actually to prevent some issues in > the Xen kernel, where the XENMEM_machine_memory_map hypercall can > produce an e820 map larger than 128 entries, even on systems where the > original e820 table was quite a bit smaller than that, depending on how > many IOAPICs are installed on the system. > > Signed-off-by: Alex Thorlton <athorlton@sgi.com> > Suggested-by: Ingo Molnar <mingo@redhat.com> > Cc: Russ Anderson <rja@sgi.com> > Cc: David Vrabel <david.vrabel@citrix.com> > Cc: Ingo Molnar <mingo@redhat.com> > Cc: Juergen Gross <jgross@suse.com> > Cc: Thomas Gleixner <tglx@linutronix.de> > Cc: "H. Peter Anvin" <hpa@zytor.com> > Cc: Denys Vlasenko <dvlasenk@redhat.com> > Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> > Cc: x86@kernel.org > Cc: xen-devel@lists.xenproject.org > --- > arch/x86/include/asm/e820.h | 10 ++++++---- > 1 file changed, 6 insertions(+), 4 deletions(-) > > diff --git a/arch/x86/include/asm/e820.h b/arch/x86/include/asm/e820.h > index 476b574..aa00d33 100644 > --- a/arch/x86/include/asm/e820.h > +++ b/arch/x86/include/asm/e820.h > @@ -1,13 +1,15 @@ > #ifndef _ASM_X86_E820_H > #define _ASM_X86_E820_H > > -#ifdef CONFIG_EFI > +/* > + * We need to make sure that E820_X_MAX is defined > + * before we include uapi/asm/e820.h > + */ > #include <linux/numa.h> > #define E820_X_MAX (E820MAX + 3 * MAX_NUMNODES) What we need an explanation for in the comment is what does this stand for (what does the 'X' mean?), and what is the magic 3*MAX_NUMNODES about? Thanks, Ingo ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 1/2] x86: Make E820_X_MAX unconditionally larger than E820MAX 2016-11-30 6:21 ` Ingo Molnar @ 2016-12-01 18:37 ` Alex Thorlton 0 siblings, 0 replies; 26+ messages in thread From: Alex Thorlton @ 2016-12-01 18:37 UTC (permalink / raw) To: Ingo Molnar Cc: Alex Thorlton, linux-kernel, Russ Anderson, David Vrabel, Ingo Molnar, Juergen Gross, Thomas Gleixner, H. Peter Anvin, Denys Vlasenko, Boris Ostrovsky, x86, xen-devel On Wed, Nov 30, 2016 at 07:21:48AM +0100, Ingo Molnar wrote: > > * Alex Thorlton <athorlton@sgi.com> wrote: > > > It's really not necessary to limit E820_X_MAX to 128 in the non-EFI > > case. This commit drops E820_X_MAX's dependency on CONFIG_EFI, so that > > E820_X_MAX is always at least slightly larger than E820MAX. > > > > The real motivation behind this is actually to prevent some issues in > > the Xen kernel, where the XENMEM_machine_memory_map hypercall can > > produce an e820 map larger than 128 entries, even on systems where the > > original e820 table was quite a bit smaller than that, depending on how > > many IOAPICs are installed on the system. > > > > Signed-off-by: Alex Thorlton <athorlton@sgi.com> > > Suggested-by: Ingo Molnar <mingo@redhat.com> > > Cc: Russ Anderson <rja@sgi.com> > > Cc: David Vrabel <david.vrabel@citrix.com> > > Cc: Ingo Molnar <mingo@redhat.com> > > Cc: Juergen Gross <jgross@suse.com> > > Cc: Thomas Gleixner <tglx@linutronix.de> > > Cc: "H. Peter Anvin" <hpa@zytor.com> > > Cc: Denys Vlasenko <dvlasenk@redhat.com> > > Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> > > Cc: x86@kernel.org > > Cc: xen-devel@lists.xenproject.org > > --- > > arch/x86/include/asm/e820.h | 10 ++++++---- > > 1 file changed, 6 insertions(+), 4 deletions(-) > > > > diff --git a/arch/x86/include/asm/e820.h b/arch/x86/include/asm/e820.h > > index 476b574..aa00d33 100644 > > --- a/arch/x86/include/asm/e820.h > > +++ b/arch/x86/include/asm/e820.h > > @@ -1,13 +1,15 @@ > > #ifndef _ASM_X86_E820_H > > #define _ASM_X86_E820_H > > > > -#ifdef CONFIG_EFI > > +/* > > + * We need to make sure that E820_X_MAX is defined > > + * before we include uapi/asm/e820.h > > + */ > > #include <linux/numa.h> > > #define E820_X_MAX (E820MAX + 3 * MAX_NUMNODES) > > What we need an explanation for in the comment is what does this stand for (what > does the 'X' mean?), and what is the magic 3*MAX_NUMNODES about? I'm not actually certain why 3*MAX_NUMNODES was chosen as the max size for this table, but it was written by somebody here at SGI, so I'm sure I can find out :) As for the 'X,' I'm assuming that's meant to imply that this is the maxmimum size for the 'eXtended' table, but that could be made more clear in the comment as well. I'll come up with a better comment for this and submit a v3. Thanks, Ingo! ^ permalink raw reply [flat|nested] 26+ messages in thread
* [PATCH 2/2] xen/x86: Increase xen_e820_map to E820_X_MAX possible entries 2016-11-30 3:24 ` [RFC PATCH v2] " Alex Thorlton 2016-11-30 3:24 ` [PATCH 1/2] x86: Make E820_X_MAX unconditionally larger than E820MAX Alex Thorlton @ 2016-11-30 3:24 ` Alex Thorlton 2016-11-30 5:18 ` Juergen Gross 1 sibling, 1 reply; 26+ messages in thread From: Alex Thorlton @ 2016-11-30 3:24 UTC (permalink / raw) To: linux-kernel Cc: Alex Thorlton, Russ Anderson, David Vrabel, Ingo Molnar, Juergen Gross, Thomas Gleixner, H. Peter Anvin, Denys Vlasenko, Boris Ostrovsky, x86, xen-devel On systems with sufficiently large e820 tables, and several IOAPICs, it is possible for the XENMEM_machine_memory_map callback (and its counterpart, XENMEM_memory_map) to attempt to return an e820 table with more than 128 entries. This callback adds entries to the BIOS-provided e820 table to account for IOAPIC registers, which, on sufficiently large systems, can result in an e820 table that is too large to copy back into xen_e820_map. This change simply increases the size of xen_e820_map to E820_X_MAX to ensure that there is enough room to store the entire e820 map returned from this callback. Signed-off-by: Alex Thorlton <athorlton@sgi.com> Suggested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Russ Anderson <rja@sgi.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: x86@kernel.org Cc: xen-devel@lists.xenproject.org --- arch/x86/xen/setup.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c index f8960fc..8c394e3 100644 --- a/arch/x86/xen/setup.c +++ b/arch/x86/xen/setup.c @@ -41,7 +41,7 @@ unsigned long xen_released_pages; /* E820 map used during setting up memory. */ -static struct e820entry xen_e820_map[E820MAX] __initdata; +static struct e820entry xen_e820_map[E820_X_MAX] __initdata; static u32 xen_e820_map_entries __initdata; /* @@ -750,7 +750,7 @@ char * __init xen_memory_setup(void) max_pfn = min(max_pfn, xen_start_info->nr_pages); mem_end = PFN_PHYS(max_pfn); - memmap.nr_entries = E820MAX; + memmap.nr_entries = ARRAY_SIZE(xen_e820_map); set_xen_guest_handle(memmap.buffer, xen_e820_map); op = xen_initial_domain() ? @@ -923,7 +923,7 @@ char * __init xen_auto_xlated_memory_setup(void) int i; int rc; - memmap.nr_entries = E820MAX; + memmap.nr_entries = ARRAY_SIZE(xen_e820_map); set_xen_guest_handle(memmap.buffer, xen_e820_map); rc = HYPERVISOR_memory_op(XENMEM_memory_map, &memmap); -- 1.8.5.6 ^ permalink raw reply related [flat|nested] 26+ messages in thread
* Re: [PATCH 2/2] xen/x86: Increase xen_e820_map to E820_X_MAX possible entries 2016-11-30 3:24 ` [PATCH 2/2] xen/x86: Increase xen_e820_map to E820_X_MAX possible entries Alex Thorlton @ 2016-11-30 5:18 ` Juergen Gross 0 siblings, 0 replies; 26+ messages in thread From: Juergen Gross @ 2016-11-30 5:18 UTC (permalink / raw) To: Alex Thorlton, linux-kernel Cc: Russ Anderson, David Vrabel, Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Denys Vlasenko, Boris Ostrovsky, x86, xen-devel On 30/11/16 04:24, Alex Thorlton wrote: > On systems with sufficiently large e820 tables, and several IOAPICs, it > is possible for the XENMEM_machine_memory_map callback (and its > counterpart, XENMEM_memory_map) to attempt to return an e820 table with > more than 128 entries. This callback adds entries to the BIOS-provided > e820 table to account for IOAPIC registers, which, on sufficiently large > systems, can result in an e820 table that is too large to copy back into > xen_e820_map. > > This change simply increases the size of xen_e820_map to E820_X_MAX to > ensure that there is enough room to store the entire e820 map returned > from this callback. > > Signed-off-by: Alex Thorlton <athorlton@sgi.com> > Suggested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Juergen Gross <jgross@suse.com> Juergen > Cc: Russ Anderson <rja@sgi.com> > Cc: David Vrabel <david.vrabel@citrix.com> > Cc: Ingo Molnar <mingo@redhat.com> > Cc: Juergen Gross <jgross@suse.com> > Cc: Thomas Gleixner <tglx@linutronix.de> > Cc: "H. Peter Anvin" <hpa@zytor.com> > Cc: Denys Vlasenko <dvlasenk@redhat.com> > Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> > Cc: x86@kernel.org > Cc: xen-devel@lists.xenproject.org > --- > arch/x86/xen/setup.c | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c > index f8960fc..8c394e3 100644 > --- a/arch/x86/xen/setup.c > +++ b/arch/x86/xen/setup.c > @@ -41,7 +41,7 @@ > unsigned long xen_released_pages; > > /* E820 map used during setting up memory. */ > -static struct e820entry xen_e820_map[E820MAX] __initdata; > +static struct e820entry xen_e820_map[E820_X_MAX] __initdata; > static u32 xen_e820_map_entries __initdata; > > /* > @@ -750,7 +750,7 @@ char * __init xen_memory_setup(void) > max_pfn = min(max_pfn, xen_start_info->nr_pages); > mem_end = PFN_PHYS(max_pfn); > > - memmap.nr_entries = E820MAX; > + memmap.nr_entries = ARRAY_SIZE(xen_e820_map); > set_xen_guest_handle(memmap.buffer, xen_e820_map); > > op = xen_initial_domain() ? > @@ -923,7 +923,7 @@ char * __init xen_auto_xlated_memory_setup(void) > int i; > int rc; > > - memmap.nr_entries = E820MAX; > + memmap.nr_entries = ARRAY_SIZE(xen_e820_map); > set_xen_guest_handle(memmap.buffer, xen_e820_map); > > rc = HYPERVISOR_memory_op(XENMEM_memory_map, &memmap); > ^ permalink raw reply [flat|nested] 26+ messages in thread
* [RFC PATCH v3] xen/x86: Increase xen_e820_map to E820_X_MAX possible entries 2016-11-15 0:11 [RFC PATCH] xen/x86: Increase xen_e820_map to E820_X_MAX possible entries Alex Thorlton 2016-11-15 6:33 ` Juergen Gross 2016-11-30 3:24 ` [RFC PATCH v2] " Alex Thorlton @ 2016-12-05 17:49 ` Alex Thorlton 2016-12-05 17:49 ` [PATCH 1/2] x86: Make E820_X_MAX unconditionally larger than E820MAX Alex Thorlton ` (2 more replies) 2 siblings, 3 replies; 26+ messages in thread From: Alex Thorlton @ 2016-12-05 17:49 UTC (permalink / raw) To: linux-kernel Cc: Alex Thorlton, Russ Anderson, David Vrabel, Juergen Gross, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Denys Vlasenko, Boris Ostrovsky, x86, xen-devel This is the third pass at my patchset to fix up our problems with XENMEM_machine_memory_map on large systems. The only changes on this pass were to flesh out the comment above the E820_X_MAX definition, and to add Juergen's Reviewed-by to the second patch. Let me know if anyone has any questions/comments! Alex Thorlton (2): x86: Make E820_X_MAX unconditionally larger than E820MAX xen/x86: Increase xen_e820_map to E820_X_MAX possible entries arch/x86/include/asm/e820.h | 12 ++++++++---- arch/x86/xen/setup.c | 6 +++--- 2 files changed, 11 insertions(+), 7 deletions(-) -- 1.8.5.6 ^ permalink raw reply [flat|nested] 26+ messages in thread
* [PATCH 1/2] x86: Make E820_X_MAX unconditionally larger than E820MAX 2016-12-05 17:49 ` [RFC PATCH v3] " Alex Thorlton @ 2016-12-05 17:49 ` Alex Thorlton 2016-12-09 10:12 ` Juergen Gross 2016-12-05 17:49 ` [PATCH 2/2] xen/x86: Increase xen_e820_map to E820_X_MAX possible entries Alex Thorlton 2016-12-08 5:50 ` [RFC PATCH v3] " Juergen Gross 2 siblings, 1 reply; 26+ messages in thread From: Alex Thorlton @ 2016-12-05 17:49 UTC (permalink / raw) To: linux-kernel Cc: Alex Thorlton, Ingo Molnar, Russ Anderson, David Vrabel, Juergen Gross, Thomas Gleixner, H. Peter Anvin, Denys Vlasenko, Boris Ostrovsky, x86, xen-devel It's really not necessary to limit E820_X_MAX to 128 in the non-EFI case. This commit drops E820_X_MAX's dependency on CONFIG_EFI, so that E820_X_MAX is always at least slightly larger than E820MAX. The real motivation behind this is actually to prevent some issues in the Xen kernel, where the XENMEM_machine_memory_map hypercall can produce an e820 map larger than 128 entries, even on systems where the original e820 table was quite a bit smaller than that, depending on how many IOAPICs are installed on the system. Signed-off-by: Alex Thorlton <athorlton@sgi.com> Suggested-by: Ingo Molnar <mingo@redhat.com> Cc: Russ Anderson <rja@sgi.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: x86@kernel.org Cc: xen-devel@lists.xenproject.org --- arch/x86/include/asm/e820.h | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/arch/x86/include/asm/e820.h b/arch/x86/include/asm/e820.h index 476b574..ec23d8e 100644 --- a/arch/x86/include/asm/e820.h +++ b/arch/x86/include/asm/e820.h @@ -1,13 +1,17 @@ #ifndef _ASM_X86_E820_H #define _ASM_X86_E820_H -#ifdef CONFIG_EFI +/* + * E820_X_MAX is the maximum size of the extended E820 table. The extended + * table may contain up to 3 extra E820 entries per possible NUMA node, so we + * make room for 3 * MAX_NUMNODES possible entries, beyond the standard 128. + * Also note that E820_X_MAX *must* be defined before we include uapi/asm/e820.h. + */ #include <linux/numa.h> #define E820_X_MAX (E820MAX + 3 * MAX_NUMNODES) -#else /* ! CONFIG_EFI */ -#define E820_X_MAX E820MAX -#endif + #include <uapi/asm/e820.h> + #ifndef __ASSEMBLY__ /* see comment in arch/x86/kernel/e820.c */ extern struct e820map *e820; -- 1.8.5.6 ^ permalink raw reply related [flat|nested] 26+ messages in thread
* Re: [PATCH 1/2] x86: Make E820_X_MAX unconditionally larger than E820MAX 2016-12-05 17:49 ` [PATCH 1/2] x86: Make E820_X_MAX unconditionally larger than E820MAX Alex Thorlton @ 2016-12-09 10:12 ` Juergen Gross 0 siblings, 0 replies; 26+ messages in thread From: Juergen Gross @ 2016-12-09 10:12 UTC (permalink / raw) To: Alex Thorlton, linux-kernel Cc: Ingo Molnar, Russ Anderson, David Vrabel, Thomas Gleixner, H. Peter Anvin, Denys Vlasenko, Boris Ostrovsky, x86, xen-devel On 05/12/16 18:49, Alex Thorlton wrote: > It's really not necessary to limit E820_X_MAX to 128 in the non-EFI > case. This commit drops E820_X_MAX's dependency on CONFIG_EFI, so that > E820_X_MAX is always at least slightly larger than E820MAX. > > The real motivation behind this is actually to prevent some issues in > the Xen kernel, where the XENMEM_machine_memory_map hypercall can > produce an e820 map larger than 128 entries, even on systems where the > original e820 table was quite a bit smaller than that, depending on how > many IOAPICs are installed on the system. > > Signed-off-by: Alex Thorlton <athorlton@sgi.com> > Suggested-by: Ingo Molnar <mingo@redhat.com> Commited to xen/tip.git for-linus-4.10 Juergen ^ permalink raw reply [flat|nested] 26+ messages in thread
* [PATCH 2/2] xen/x86: Increase xen_e820_map to E820_X_MAX possible entries 2016-12-05 17:49 ` [RFC PATCH v3] " Alex Thorlton 2016-12-05 17:49 ` [PATCH 1/2] x86: Make E820_X_MAX unconditionally larger than E820MAX Alex Thorlton @ 2016-12-05 17:49 ` Alex Thorlton 2016-12-09 10:12 ` Juergen Gross 2016-12-08 5:50 ` [RFC PATCH v3] " Juergen Gross 2 siblings, 1 reply; 26+ messages in thread From: Alex Thorlton @ 2016-12-05 17:49 UTC (permalink / raw) To: linux-kernel Cc: Alex Thorlton, Boris Ostrovsky, Juergen Gross, Russ Anderson, David Vrabel, Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Denys Vlasenko, x86, xen-devel On systems with sufficiently large e820 tables, and several IOAPICs, it is possible for the XENMEM_machine_memory_map callback (and its counterpart, XENMEM_memory_map) to attempt to return an e820 table with more than 128 entries. This callback adds entries to the BIOS-provided e820 table to account for IOAPIC registers, which, on sufficiently large systems, can result in an e820 table that is too large to copy back into xen_e820_map. This change simply increases the size of xen_e820_map to E820_X_MAX to ensure that there is enough room to store the entire e820 map returned from this callback. Signed-off-by: Alex Thorlton <athorlton@sgi.com> Suggested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Juergen Gross <jgross@suse.com> Cc: Russ Anderson <rja@sgi.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: x86@kernel.org Cc: xen-devel@lists.xenproject.org --- arch/x86/xen/setup.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c index f8960fc..8c394e3 100644 --- a/arch/x86/xen/setup.c +++ b/arch/x86/xen/setup.c @@ -41,7 +41,7 @@ unsigned long xen_released_pages; /* E820 map used during setting up memory. */ -static struct e820entry xen_e820_map[E820MAX] __initdata; +static struct e820entry xen_e820_map[E820_X_MAX] __initdata; static u32 xen_e820_map_entries __initdata; /* @@ -750,7 +750,7 @@ char * __init xen_memory_setup(void) max_pfn = min(max_pfn, xen_start_info->nr_pages); mem_end = PFN_PHYS(max_pfn); - memmap.nr_entries = E820MAX; + memmap.nr_entries = ARRAY_SIZE(xen_e820_map); set_xen_guest_handle(memmap.buffer, xen_e820_map); op = xen_initial_domain() ? @@ -923,7 +923,7 @@ char * __init xen_auto_xlated_memory_setup(void) int i; int rc; - memmap.nr_entries = E820MAX; + memmap.nr_entries = ARRAY_SIZE(xen_e820_map); set_xen_guest_handle(memmap.buffer, xen_e820_map); rc = HYPERVISOR_memory_op(XENMEM_memory_map, &memmap); -- 1.8.5.6 ^ permalink raw reply related [flat|nested] 26+ messages in thread
* Re: [PATCH 2/2] xen/x86: Increase xen_e820_map to E820_X_MAX possible entries 2016-12-05 17:49 ` [PATCH 2/2] xen/x86: Increase xen_e820_map to E820_X_MAX possible entries Alex Thorlton @ 2016-12-09 10:12 ` Juergen Gross 0 siblings, 0 replies; 26+ messages in thread From: Juergen Gross @ 2016-12-09 10:12 UTC (permalink / raw) To: Alex Thorlton, linux-kernel Cc: Boris Ostrovsky, Russ Anderson, David Vrabel, Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Denys Vlasenko, x86, xen-devel On 05/12/16 18:49, Alex Thorlton wrote: > On systems with sufficiently large e820 tables, and several IOAPICs, it > is possible for the XENMEM_machine_memory_map callback (and its > counterpart, XENMEM_memory_map) to attempt to return an e820 table with > more than 128 entries. This callback adds entries to the BIOS-provided > e820 table to account for IOAPIC registers, which, on sufficiently large > systems, can result in an e820 table that is too large to copy back into > xen_e820_map. > > This change simply increases the size of xen_e820_map to E820_X_MAX to > ensure that there is enough room to store the entire e820 map returned > from this callback. > > Signed-off-by: Alex Thorlton <athorlton@sgi.com> > Suggested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> > Reviewed-by: Juergen Gross <jgross@suse.com> Commited to xen/tip.git for-linus-4.10 Juergen ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [RFC PATCH v3] xen/x86: Increase xen_e820_map to E820_X_MAX possible entries 2016-12-05 17:49 ` [RFC PATCH v3] " Alex Thorlton 2016-12-05 17:49 ` [PATCH 1/2] x86: Make E820_X_MAX unconditionally larger than E820MAX Alex Thorlton 2016-12-05 17:49 ` [PATCH 2/2] xen/x86: Increase xen_e820_map to E820_X_MAX possible entries Alex Thorlton @ 2016-12-08 5:50 ` Juergen Gross 2016-12-09 3:46 ` Ingo Molnar 2 siblings, 1 reply; 26+ messages in thread From: Juergen Gross @ 2016-12-08 5:50 UTC (permalink / raw) To: Alex Thorlton, linux-kernel, Ingo Molnar Cc: Russ Anderson, David Vrabel, Thomas Gleixner, H. Peter Anvin, Denys Vlasenko, Boris Ostrovsky, x86, xen-devel On 05/12/16 18:49, Alex Thorlton wrote: > This is the third pass at my patchset to fix up our problems with > XENMEM_machine_memory_map on large systems. The only changes on this > pass were to flesh out the comment above the E820_X_MAX definition, and > to add Juergen's Reviewed-by to the second patch. > > Let me know if anyone has any questions/comments! > > Alex Thorlton (2): > x86: Make E820_X_MAX unconditionally larger than E820MAX > xen/x86: Increase xen_e820_map to E820_X_MAX possible entries > > arch/x86/include/asm/e820.h | 12 ++++++++---- > arch/x86/xen/setup.c | 6 +++--- > 2 files changed, 11 insertions(+), 7 deletions(-) > Ingo, do you have any preferences through which tree those patches should go? I'd like to have at least patch 2 in 4.10, so I could take it through the Xen tree. Juergen ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [RFC PATCH v3] xen/x86: Increase xen_e820_map to E820_X_MAX possible entries 2016-12-08 5:50 ` [RFC PATCH v3] " Juergen Gross @ 2016-12-09 3:46 ` Ingo Molnar 0 siblings, 0 replies; 26+ messages in thread From: Ingo Molnar @ 2016-12-09 3:46 UTC (permalink / raw) To: Juergen Gross Cc: Alex Thorlton, linux-kernel, Ingo Molnar, Russ Anderson, David Vrabel, Thomas Gleixner, H. Peter Anvin, Denys Vlasenko, Boris Ostrovsky, x86, xen-devel * Juergen Gross <jgross@suse.com> wrote: > On 05/12/16 18:49, Alex Thorlton wrote: > > This is the third pass at my patchset to fix up our problems with > > XENMEM_machine_memory_map on large systems. The only changes on this > > pass were to flesh out the comment above the E820_X_MAX definition, and > > to add Juergen's Reviewed-by to the second patch. > > > > Let me know if anyone has any questions/comments! > > > > Alex Thorlton (2): > > x86: Make E820_X_MAX unconditionally larger than E820MAX > > xen/x86: Increase xen_e820_map to E820_X_MAX possible entries > > > > arch/x86/include/asm/e820.h | 12 ++++++++---- > > arch/x86/xen/setup.c | 6 +++--- > > 2 files changed, 11 insertions(+), 7 deletions(-) > > > > Ingo, do you have any preferences through which tree those patches > should go? I'd like to have at least patch 2 in 4.10, so I could take > it through the Xen tree. Sure, both are fine to me: Acked-by: Ingo Molnar <mingo@kernel.org> Thanks, Ingo ^ permalink raw reply [flat|nested] 26+ messages in thread
end of thread, other threads:[~2016-12-09 10:12 UTC | newest] Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2016-11-15 0:11 [RFC PATCH] xen/x86: Increase xen_e820_map to E820_X_MAX possible entries Alex Thorlton 2016-11-15 6:33 ` Juergen Gross 2016-11-15 7:15 ` Jan Beulich [not found] ` <582AC427020000780011EA7E@suse.com> 2016-11-15 7:36 ` Juergen Gross 2016-11-15 8:01 ` Jan Beulich [not found] ` <582ACEDE020000780011EAC9@suse.com> 2016-11-15 8:42 ` Juergen Gross 2016-11-15 9:45 ` Jan Beulich [not found] ` <582AE72D020000780011EBB2@suse.com> 2016-11-15 9:55 ` Juergen Gross 2016-11-15 10:44 ` Jan Beulich [not found] ` <582AF51F020000780011ECB4@suse.com> 2016-11-15 11:07 ` Juergen Gross 2016-11-15 11:12 ` Jan Beulich 2016-11-15 15:22 ` Alex Thorlton 2016-11-16 6:06 ` Juergen Gross 2016-11-30 3:24 ` [RFC PATCH v2] " Alex Thorlton 2016-11-30 3:24 ` [PATCH 1/2] x86: Make E820_X_MAX unconditionally larger than E820MAX Alex Thorlton 2016-11-30 6:21 ` Ingo Molnar 2016-12-01 18:37 ` Alex Thorlton 2016-11-30 3:24 ` [PATCH 2/2] xen/x86: Increase xen_e820_map to E820_X_MAX possible entries Alex Thorlton 2016-11-30 5:18 ` Juergen Gross 2016-12-05 17:49 ` [RFC PATCH v3] " Alex Thorlton 2016-12-05 17:49 ` [PATCH 1/2] x86: Make E820_X_MAX unconditionally larger than E820MAX Alex Thorlton 2016-12-09 10:12 ` Juergen Gross 2016-12-05 17:49 ` [PATCH 2/2] xen/x86: Increase xen_e820_map to E820_X_MAX possible entries Alex Thorlton 2016-12-09 10:12 ` Juergen Gross 2016-12-08 5:50 ` [RFC PATCH v3] " Juergen Gross 2016-12-09 3:46 ` Ingo Molnar
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).