* [PATCH 0/2] workqueue: fix a bug when numa mapping is changed @ 2015-03-26 2:17 Gu Zheng 2015-03-26 2:17 ` [PATCH 1/2] x86/cpu hotplug: make apicid <--> cpuid mapping persistent Gu Zheng ` (2 more replies) 0 siblings, 3 replies; 23+ messages in thread From: Gu Zheng @ 2015-03-26 2:17 UTC (permalink / raw) To: linux-kernel Cc: tj, laijs, isimatu.yasuaki, kamezawa.hiroyu, tangchen, guz.fnst, izumi.taku Yasuaki Ishimatsu found that node online/offline changes the cpu<->node relationship. Workqueue caches this information, which was established at boot time, so it may become stale after node hotplug. Once pool->node points to a stale node, the following allocation failure happens. == SLUB: Unable to allocate memory on node 2 (gfp=0x80d0) cache: kmalloc-192, object size: 192, buffer size: 192, default order: 1, min order: 0 node 0: slabs: 6172, objs: 259224, free: 245741 node 1: slabs: 3261, objs: 136962, free: 127656 == As the apicid <--> node relationship is persistent, the root cause is that the cpu-id <-> lapicid mapping is not persistent (the current implementation always chooses the first free cpu id for a newly added cpu). So if we build a persistent cpu-id <-> lapicid relationship, this problem will be fixed. Please refer to https://lkml.org/lkml/2015/2/27/145 for the previous discussion. Gu Zheng (2): x86/cpu hotplug: make lapicid <-> cpuid mapping persistent workqueue: update per cpu workqueue's numa affinity when cpu preparing online arch/x86/kernel/apic/apic.c | 31 ++++++++++++++++++++++++++++++- kernel/workqueue.c | 1 + 2 files changed, 31 insertions(+), 1 deletions(-) -- 1.7.7 ^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH 1/2] x86/cpu hotplug: make apicid <--> cpuid mapping persistent 2015-03-26 2:17 [PATCH 0/2] workqueue: fix a bug when numa mapping is changed Gu Zheng @ 2015-03-26 2:17 ` Gu Zheng 2015-03-26 3:19 ` Kamezawa Hiroyuki 2015-03-26 2:17 ` [PATCH 2/2] workqueue: update per cpu workqueue's numa affinity when cpu preparing online Gu Zheng 2015-03-26 3:12 ` [PATCH 0/2] workqueue: fix a bug when numa mapping is changed Kamezawa Hiroyuki 2 siblings, 1 reply; 23+ messages in thread From: Gu Zheng @ 2015-03-26 2:17 UTC (permalink / raw) To: linux-kernel Cc: tj, laijs, isimatu.yasuaki, kamezawa.hiroyu, tangchen, guz.fnst, izumi.taku Previously, we built the apicid <--> cpuid mapping when the cpu became present, but the relationship changes on cpu/node hotplug, because we always choose the first free cpuid for a hot-added cpu (whether it is newly added or re-added). So the cpuid <--> node mapping changes when node hotplug occurs, and that causes wq sub-system allocation failures: == SLUB: Unable to allocate memory on node 2 (gfp=0x80d0) cache: kmalloc-192, object size: 192, buffer size: 192, default order: 1, min order: 0 node 0: slabs: 6172, objs: 259224, free: 245741 node 1: slabs: 3261, objs: 136962, free: 127656 == So here we build the persistent [lapic id] <--> cpuid mapping when the cpu is first present, and never change it. Suggested-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> --- arch/x86/kernel/apic/apic.c | 31 ++++++++++++++++++++++++++++++- 1 files changed, 30 insertions(+), 1 deletions(-) diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c index ad3639a..d539ebc 100644 --- a/arch/x86/kernel/apic/apic.c +++ b/arch/x86/kernel/apic/apic.c @@ -2038,6 +2038,30 @@ void disconnect_bsp_APIC(int virt_wire_setup) apic_write(APIC_LVT1, value); } +/* + * Logic cpu number(cpuid) to local APIC id persistent mappings. + * Do not clear the mapping even if cpu hot removed. 
+ * */ +static int apicid_to_x86_cpu[MAX_LOCAL_APIC] = { + [0 ... MAX_LOCAL_APIC - 1] = -1, +}; + +/* + * Internal cpu id bits, set the bit once cpu present, and never clear it. + * */ +static cpumask_t cpuid_mask = CPU_MASK_NONE; + +static int get_cpuid(int apicid) +{ + int cpuid; + + cpuid = apicid_to_x86_cpu[apicid]; + if (cpuid == -1) + cpuid = cpumask_next_zero(-1, &cpuid_mask); + + return cpuid; +} + int generic_processor_info(int apicid, int version) { int cpu, max = nr_cpu_ids; @@ -2115,7 +2139,10 @@ int generic_processor_info(int apicid, int version) */ cpu = 0; } else - cpu = cpumask_next_zero(-1, cpu_present_mask); + cpu = get_cpuid(apicid); + + /* Store the mapping */ + apicid_to_x86_cpu[apicid] = cpu; /* * Validate version @@ -2144,6 +2171,8 @@ int generic_processor_info(int apicid, int version) early_per_cpu(x86_cpu_to_logical_apicid, cpu) = apic->x86_32_early_logical_apicid(cpu); #endif + /* Mark this cpu id as uesed (already mapping a local apic id) */ + cpumask_set_cpu(cpu, &cpuid_mask); set_cpu_possible(cpu, true); set_cpu_present(cpu, true); -- 1.7.7 ^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH 1/2] x86/cpu hotplug: make apicid <--> cpuid mapping persistent 2015-03-26 2:17 ` [PATCH 1/2] x86/cpu hotplug: make apicid <--> cpuid mapping persistent Gu Zheng @ 2015-03-26 3:19 ` Kamezawa Hiroyuki 2015-03-26 4:55 ` Gu Zheng 0 siblings, 1 reply; 23+ messages in thread From: Kamezawa Hiroyuki @ 2015-03-26 3:19 UTC (permalink / raw) To: Gu Zheng, linux-kernel; +Cc: tj, laijs, isimatu.yasuaki, tangchen, izumi.taku On 2015/03/26 11:17, Gu Zheng wrote: > Previously, we build the apicid <--> cpuid mapping when the cpu is present, but > the relationship will be changed if the cpu/node hotplug happenned, because we > always choose the first free cpuid for the hot added cpu (whether it is new-add > or re-add), so this the cpuid <--> node mapping changed if node hot plug > occurred, and it causes the wq sub-system allocation failture: > == > SLUB: Unable to allocate memory on node 2 (gfp=0x80d0) > cache: kmalloc-192, object size: 192, buffer size: 192, default > order: > 1, min order: 0 > node 0: slabs: 6172, objs: 259224, free: 245741 > node 1: slabs: 3261, objs: 136962, free: 127656 > == > So here we build the persistent [lapic id] <--> cpuid mapping when the cpu first > present, and never change it. > > Suggested-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> > Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> > --- > arch/x86/kernel/apic/apic.c | 31 ++++++++++++++++++++++++++++++- > 1 files changed, 30 insertions(+), 1 deletions(-) > > diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c > index ad3639a..d539ebc 100644 > --- a/arch/x86/kernel/apic/apic.c > +++ b/arch/x86/kernel/apic/apic.c > @@ -2038,6 +2038,30 @@ void disconnect_bsp_APIC(int virt_wire_setup) > apic_write(APIC_LVT1, value); > } > > +/* > + * Logic cpu number(cpuid) to local APIC id persistent mappings. > + * Do not clear the mapping even if cpu hot removed. > + * */ > +static int apicid_to_x86_cpu[MAX_LOCAL_APIC] = { > + [0 ... 
MAX_LOCAL_APIC - 1] = -1, > +}; This patch cannot handle x2apic, which is 32bit. As far as I understand, it depends on CPU's spec and the newest cpu has 9bit apicid, at least. But you can't create inifinit array. If you can't allocate the array dynamically, How about adding static int cpuid_to_apicid[MAX_CPU] = {} or using idr library ? (please see lib/idr.c) I guess you can update this map after boot(after mm initialization) and make use of idr library. About this patch, Nack. -Kame > + > +/* > + * Internal cpu id bits, set the bit once cpu present, and never clear it. > + * */ > +static cpumask_t cpuid_mask = CPU_MASK_NONE; > + > +static int get_cpuid(int apicid) > +{ > + int cpuid; > + > + cpuid = apicid_to_x86_cpu[apicid]; > + if (cpuid == -1) > + cpuid = cpumask_next_zero(-1, &cpuid_mask); > + > + return cpuid; > +} > + > int generic_processor_info(int apicid, int version) > { > int cpu, max = nr_cpu_ids; > @@ -2115,7 +2139,10 @@ int generic_processor_info(int apicid, int version) > */ > cpu = 0; > } else > - cpu = cpumask_next_zero(-1, cpu_present_mask); > + cpu = get_cpuid(apicid); > + > + /* Store the mapping */ > + apicid_to_x86_cpu[apicid] = cpu; > > /* > * Validate version > @@ -2144,6 +2171,8 @@ int generic_processor_info(int apicid, int version) > early_per_cpu(x86_cpu_to_logical_apicid, cpu) = > apic->x86_32_early_logical_apicid(cpu); > #endif > + /* Mark this cpu id as uesed (already mapping a local apic id) */ > + cpumask_set_cpu(cpu, &cpuid_mask); > set_cpu_possible(cpu, true); > set_cpu_present(cpu, true); > > ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 1/2] x86/cpu hotplug: make apicid <--> cpuid mapping persistent 2015-03-26 3:19 ` Kamezawa Hiroyuki @ 2015-03-26 4:55 ` Gu Zheng 2015-03-26 15:13 ` Tejun Heo 2015-03-26 16:31 ` Kamezawa Hiroyuki 0 siblings, 2 replies; 23+ messages in thread From: Gu Zheng @ 2015-03-26 4:55 UTC (permalink / raw) To: Kamezawa Hiroyuki Cc: linux-kernel, tj, laijs, isimatu.yasuaki, tangchen, izumi.taku Hi Kame-san, On 03/26/2015 11:19 AM, Kamezawa Hiroyuki wrote: > On 2015/03/26 11:17, Gu Zheng wrote: >> Previously, we build the apicid <--> cpuid mapping when the cpu is present, but >> the relationship will be changed if the cpu/node hotplug happenned, because we >> always choose the first free cpuid for the hot added cpu (whether it is new-add >> or re-add), so this the cpuid <--> node mapping changed if node hot plug >> occurred, and it causes the wq sub-system allocation failture: >> == >> SLUB: Unable to allocate memory on node 2 (gfp=0x80d0) >> cache: kmalloc-192, object size: 192, buffer size: 192, default >> order: >> 1, min order: 0 >> node 0: slabs: 6172, objs: 259224, free: 245741 >> node 1: slabs: 3261, objs: 136962, free: 127656 >> == >> So here we build the persistent [lapic id] <--> cpuid mapping when the cpu first >> present, and never change it. >> >> Suggested-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> >> Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> >> --- >> arch/x86/kernel/apic/apic.c | 31 ++++++++++++++++++++++++++++++- >> 1 files changed, 30 insertions(+), 1 deletions(-) >> >> diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c >> index ad3639a..d539ebc 100644 >> --- a/arch/x86/kernel/apic/apic.c >> +++ b/arch/x86/kernel/apic/apic.c >> @@ -2038,6 +2038,30 @@ void disconnect_bsp_APIC(int virt_wire_setup) >> apic_write(APIC_LVT1, value); >> } >> >> +/* >> + * Logic cpu number(cpuid) to local APIC id persistent mappings. >> + * Do not clear the mapping even if cpu hot removed. 
>> + * */ >> +static int apicid_to_x86_cpu[MAX_LOCAL_APIC] = { >> + [0 ... MAX_LOCAL_APIC - 1] = -1, >> +}; > > > This patch cannot handle x2apic, which is 32bit. IMO, if the apicid is too big (larger than MAX_LOCAL_APIC), we will skip generating a logic cpu number for it, so it seems no problem here. > > As far as I understand, it depends on CPU's spec and the newest cpu has 9bit apicid, at least. > > But you can't create inifinit array. > > If you can't allocate the array dynamically, How about adding > > static int cpuid_to_apicid[MAX_CPU] = {} > > or using idr library ? (please see lib/idr.c) > > I guess you can update this map after boot(after mm initialization) > and make use of idr library. > > About this patch, Nack. > > -Kame > > > >> + >> +/* >> + * Internal cpu id bits, set the bit once cpu present, and never clear it. >> + * */ >> +static cpumask_t cpuid_mask = CPU_MASK_NONE; >> + >> +static int get_cpuid(int apicid) >> +{ >> + int cpuid; >> + >> + cpuid = apicid_to_x86_cpu[apicid]; >> + if (cpuid == -1) >> + cpuid = cpumask_next_zero(-1, &cpuid_mask); >> + >> + return cpuid; >> +} >> + >> int generic_processor_info(int apicid, int version) >> { >> int cpu, max = nr_cpu_ids; >> @@ -2115,7 +2139,10 @@ int generic_processor_info(int apicid, int version) >> */ >> cpu = 0; >> } else >> - cpu = cpumask_next_zero(-1, cpu_present_mask); >> + cpu = get_cpuid(apicid); >> + >> + /* Store the mapping */ >> + apicid_to_x86_cpu[apicid] = cpu; >> >> /* >> * Validate version >> @@ -2144,6 +2171,8 @@ int generic_processor_info(int apicid, int version) >> early_per_cpu(x86_cpu_to_logical_apicid, cpu) = >> apic->x86_32_early_logical_apicid(cpu); >> #endif >> + /* Mark this cpu id as uesed (already mapping a local apic id) */ >> + cpumask_set_cpu(cpu, &cpuid_mask); >> set_cpu_possible(cpu, true); >> set_cpu_present(cpu, true); >> >> > > > . > ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 1/2] x86/cpu hotplug: make apicid <--> cpuid mapping persistent 2015-03-26 4:55 ` Gu Zheng @ 2015-03-26 15:13 ` Tejun Heo 2015-03-26 16:31 ` Kamezawa Hiroyuki 1 sibling, 0 replies; 23+ messages in thread From: Tejun Heo @ 2015-03-26 15:13 UTC (permalink / raw) To: Gu Zheng Cc: Kamezawa Hiroyuki, linux-kernel, laijs, isimatu.yasuaki, tangchen, izumi.taku On Thu, Mar 26, 2015 at 12:55:04PM +0800, Gu Zheng wrote: > >> +static int apicid_to_x86_cpu[MAX_LOCAL_APIC] = { > >> + [0 ... MAX_LOCAL_APIC - 1] = -1, > >> +}; > > > > > > This patch cannot handle x2apic, which is 32bit. > > IMO, if the apicid is too big (larger than MAX_LOCAL_APIC), we will skip > generating a logic cpu number for it, so it seems no problem here. You're defining 128k array on 64bit most of which will be unused on most machines. There is a problem with that. Thanks. -- tejun ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 1/2] x86/cpu hotplug: make apicid <--> cpuid mapping persistent 2015-03-26 4:55 ` Gu Zheng 2015-03-26 15:13 ` Tejun Heo @ 2015-03-26 16:31 ` Kamezawa Hiroyuki 2015-03-30 9:58 ` Gu Zheng 1 sibling, 1 reply; 23+ messages in thread From: Kamezawa Hiroyuki @ 2015-03-26 16:31 UTC (permalink / raw) To: Gu Zheng; +Cc: linux-kernel, tj, laijs, isimatu.yasuaki, tangchen, izumi.taku On 2015/03/26 13:55, Gu Zheng wrote: > Hi Kame-san, > On 03/26/2015 11:19 AM, Kamezawa Hiroyuki wrote: > >> On 2015/03/26 11:17, Gu Zheng wrote: >>> Previously, we build the apicid <--> cpuid mapping when the cpu is present, but >>> the relationship will be changed if the cpu/node hotplug happenned, because we >>> always choose the first free cpuid for the hot added cpu (whether it is new-add >>> or re-add), so this the cpuid <--> node mapping changed if node hot plug >>> occurred, and it causes the wq sub-system allocation failture: >>> == >>> SLUB: Unable to allocate memory on node 2 (gfp=0x80d0) >>> cache: kmalloc-192, object size: 192, buffer size: 192, default >>> order: >>> 1, min order: 0 >>> node 0: slabs: 6172, objs: 259224, free: 245741 >>> node 1: slabs: 3261, objs: 136962, free: 127656 >>> == >>> So here we build the persistent [lapic id] <--> cpuid mapping when the cpu first >>> present, and never change it. >>> >>> Suggested-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> >>> Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> >>> --- >>> arch/x86/kernel/apic/apic.c | 31 ++++++++++++++++++++++++++++++- >>> 1 files changed, 30 insertions(+), 1 deletions(-) >>> >>> diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c >>> index ad3639a..d539ebc 100644 >>> --- a/arch/x86/kernel/apic/apic.c >>> +++ b/arch/x86/kernel/apic/apic.c >>> @@ -2038,6 +2038,30 @@ void disconnect_bsp_APIC(int virt_wire_setup) >>> apic_write(APIC_LVT1, value); >>> } >>> >>> +/* >>> + * Logic cpu number(cpuid) to local APIC id persistent mappings. 
>>> + * Do not clear the mapping even if cpu hot removed. >>> + * */ >>> +static int apicid_to_x86_cpu[MAX_LOCAL_APIC] = { >>> + [0 ... MAX_LOCAL_APIC - 1] = -1, >>> +}; >> >> >> This patch cannot handle x2apic, which is 32bit. > > IMO, if the apicid is too big (larger than MAX_LOCAL_APIC), we will skip > generating a logic cpu number for it, so it seems no problem here. > you mean MAX_LOCAL_APIC=32768 ? ....isn't it too wasting ? Anyway, APIC IDs are sparse values. Please use proper structure. Thanks, -Kame >> >> As far as I understand, it depends on CPU's spec and the newest cpu has 9bit apicid, at least. >> >> But you can't create inifinit array. >> >> If you can't allocate the array dynamically, How about adding >> >> static int cpuid_to_apicid[MAX_CPU] = {} >> >> or using idr library ? (please see lib/idr.c) >> >> I guess you can update this map after boot(after mm initialization) >> and make use of idr library. >> >> About this patch, Nack. >> >> -Kame >> >> >> >>> + >>> +/* >>> + * Internal cpu id bits, set the bit once cpu present, and never clear it. 
>>> + * */ >>> +static cpumask_t cpuid_mask = CPU_MASK_NONE; >>> + >>> +static int get_cpuid(int apicid) >>> +{ >>> + int cpuid; >>> + >>> + cpuid = apicid_to_x86_cpu[apicid]; >>> + if (cpuid == -1) >>> + cpuid = cpumask_next_zero(-1, &cpuid_mask); >>> + >>> + return cpuid; >>> +} >>> + >>> int generic_processor_info(int apicid, int version) >>> { >>> int cpu, max = nr_cpu_ids; >>> @@ -2115,7 +2139,10 @@ int generic_processor_info(int apicid, int version) >>> */ >>> cpu = 0; >>> } else >>> - cpu = cpumask_next_zero(-1, cpu_present_mask); >>> + cpu = get_cpuid(apicid); >>> + >>> + /* Store the mapping */ >>> + apicid_to_x86_cpu[apicid] = cpu; >>> >>> /* >>> * Validate version >>> @@ -2144,6 +2171,8 @@ int generic_processor_info(int apicid, int version) >>> early_per_cpu(x86_cpu_to_logical_apicid, cpu) = >>> apic->x86_32_early_logical_apicid(cpu); >>> #endif >>> + /* Mark this cpu id as uesed (already mapping a local apic id) */ >>> + cpumask_set_cpu(cpu, &cpuid_mask); >>> set_cpu_possible(cpu, true); >>> set_cpu_present(cpu, true); >>> >>> >> >> >> . >> > > ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 1/2] x86/cpu hotplug: make apicid <--> cpuid mapping persistent 2015-03-26 16:31 ` Kamezawa Hiroyuki @ 2015-03-30 9:58 ` Gu Zheng 2015-04-01 2:59 ` Kamezawa Hiroyuki 0 siblings, 1 reply; 23+ messages in thread From: Gu Zheng @ 2015-03-30 9:58 UTC (permalink / raw) To: Kamezawa Hiroyuki Cc: linux-kernel, tj, laijs, isimatu.yasuaki, tangchen, izumi.taku Hi Kame-san, On 03/27/2015 12:31 AM, Kamezawa Hiroyuki wrote: > On 2015/03/26 13:55, Gu Zheng wrote: >> Hi Kame-san, >> On 03/26/2015 11:19 AM, Kamezawa Hiroyuki wrote: >> >>> On 2015/03/26 11:17, Gu Zheng wrote: >>>> Previously, we build the apicid <--> cpuid mapping when the cpu is present, but >>>> the relationship will be changed if the cpu/node hotplug happenned, because we >>>> always choose the first free cpuid for the hot added cpu (whether it is new-add >>>> or re-add), so this the cpuid <--> node mapping changed if node hot plug >>>> occurred, and it causes the wq sub-system allocation failture: >>>> == >>>> SLUB: Unable to allocate memory on node 2 (gfp=0x80d0) >>>> cache: kmalloc-192, object size: 192, buffer size: 192, default >>>> order: >>>> 1, min order: 0 >>>> node 0: slabs: 6172, objs: 259224, free: 245741 >>>> node 1: slabs: 3261, objs: 136962, free: 127656 >>>> == >>>> So here we build the persistent [lapic id] <--> cpuid mapping when the cpu first >>>> present, and never change it. 
>>>> >>>> Suggested-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> >>>> Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> >>>> --- >>>> arch/x86/kernel/apic/apic.c | 31 ++++++++++++++++++++++++++++++- >>>> 1 files changed, 30 insertions(+), 1 deletions(-) >>>> >>>> diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c >>>> index ad3639a..d539ebc 100644 >>>> --- a/arch/x86/kernel/apic/apic.c >>>> +++ b/arch/x86/kernel/apic/apic.c >>>> @@ -2038,6 +2038,30 @@ void disconnect_bsp_APIC(int virt_wire_setup) >>>> apic_write(APIC_LVT1, value); >>>> } >>>> >>>> +/* >>>> + * Logic cpu number(cpuid) to local APIC id persistent mappings. >>>> + * Do not clear the mapping even if cpu hot removed. >>>> + * */ >>>> +static int apicid_to_x86_cpu[MAX_LOCAL_APIC] = { >>>> + [0 ... MAX_LOCAL_APIC - 1] = -1, >>>> +}; >>> >>> >>> This patch cannot handle x2apic, which is 32bit. >> >> IMO, if the apicid is too big (larger than MAX_LOCAL_APIC), we will skip >> generating a logic cpu number for it, so it seems no problem here. >> > you mean MAX_LOCAL_APIC=32768 ? ....isn't it too wasting ? I use the big array here to keep the same format with the existed ones: int apic_version[MAX_LOCAL_APIC]; s16 __apicid_to_node[MAX_LOCAL_APIC] = { [0 ... MAX_LOCAL_APIC-1] = NUMA_NO_NODE }; Or we should also say "NO" to them? Regards, Gu > > Anyway, APIC IDs are sparse values. Please use proper structure. > > Thanks, > -Kame > >>> >>> As far as I understand, it depends on CPU's spec and the newest cpu has 9bit apicid, at least. >>> >>> But you can't create inifinit array. >>> >>> If you can't allocate the array dynamically, How about adding >>> >>> static int cpuid_to_apicid[MAX_CPU] = {} >>> >>> or using idr library ? (please see lib/idr.c) >>> >>> I guess you can update this map after boot(after mm initialization) >>> and make use of idr library. >>> >>> About this patch, Nack. 
>>> >>> -Kame >>> >>> >>> >>>> + >>>> +/* >>>> + * Internal cpu id bits, set the bit once cpu present, and never clear it. >>>> + * */ >>>> +static cpumask_t cpuid_mask = CPU_MASK_NONE; >>>> + >>>> +static int get_cpuid(int apicid) >>>> +{ >>>> + int cpuid; >>>> + >>>> + cpuid = apicid_to_x86_cpu[apicid]; >>>> + if (cpuid == -1) >>>> + cpuid = cpumask_next_zero(-1, &cpuid_mask); >>>> + >>>> + return cpuid; >>>> +} >>>> + >>>> int generic_processor_info(int apicid, int version) >>>> { >>>> int cpu, max = nr_cpu_ids; >>>> @@ -2115,7 +2139,10 @@ int generic_processor_info(int apicid, int version) >>>> */ >>>> cpu = 0; >>>> } else >>>> - cpu = cpumask_next_zero(-1, cpu_present_mask); >>>> + cpu = get_cpuid(apicid); >>>> + >>>> + /* Store the mapping */ >>>> + apicid_to_x86_cpu[apicid] = cpu; >>>> >>>> /* >>>> * Validate version >>>> @@ -2144,6 +2171,8 @@ int generic_processor_info(int apicid, int version) >>>> early_per_cpu(x86_cpu_to_logical_apicid, cpu) = >>>> apic->x86_32_early_logical_apicid(cpu); >>>> #endif >>>> + /* Mark this cpu id as uesed (already mapping a local apic id) */ >>>> + cpumask_set_cpu(cpu, &cpuid_mask); >>>> set_cpu_possible(cpu, true); >>>> set_cpu_present(cpu, true); >>>> >>>> >>> >>> >>> . >>> >> >> > > > . > ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 1/2] x86/cpu hotplug: make apicid <--> cpuid mapping persistent 2015-03-30 9:58 ` Gu Zheng @ 2015-04-01 2:59 ` Kamezawa Hiroyuki 0 siblings, 0 replies; 23+ messages in thread From: Kamezawa Hiroyuki @ 2015-04-01 2:59 UTC (permalink / raw) To: Gu Zheng; +Cc: linux-kernel, tj, laijs, isimatu.yasuaki, tangchen, izumi.taku On 2015/03/30 18:58, Gu Zheng wrote: > Hi Kame-san, > > On 03/27/2015 12:31 AM, Kamezawa Hiroyuki wrote: > >> On 2015/03/26 13:55, Gu Zheng wrote: >>> Hi Kame-san, >>> On 03/26/2015 11:19 AM, Kamezawa Hiroyuki wrote: >>> >>>> On 2015/03/26 11:17, Gu Zheng wrote: >>>>> Previously, we build the apicid <--> cpuid mapping when the cpu is present, but >>>>> the relationship will be changed if the cpu/node hotplug happenned, because we >>>>> always choose the first free cpuid for the hot added cpu (whether it is new-add >>>>> or re-add), so this the cpuid <--> node mapping changed if node hot plug >>>>> occurred, and it causes the wq sub-system allocation failture: >>>>> == >>>>> SLUB: Unable to allocate memory on node 2 (gfp=0x80d0) >>>>> cache: kmalloc-192, object size: 192, buffer size: 192, default >>>>> order: >>>>> 1, min order: 0 >>>>> node 0: slabs: 6172, objs: 259224, free: 245741 >>>>> node 1: slabs: 3261, objs: 136962, free: 127656 >>>>> == >>>>> So here we build the persistent [lapic id] <--> cpuid mapping when the cpu first >>>>> present, and never change it. 
>>>>> >>>>> Suggested-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> >>>>> Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> >>>>> --- >>>>> arch/x86/kernel/apic/apic.c | 31 ++++++++++++++++++++++++++++++- >>>>> 1 files changed, 30 insertions(+), 1 deletions(-) >>>>> >>>>> diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c >>>>> index ad3639a..d539ebc 100644 >>>>> --- a/arch/x86/kernel/apic/apic.c >>>>> +++ b/arch/x86/kernel/apic/apic.c >>>>> @@ -2038,6 +2038,30 @@ void disconnect_bsp_APIC(int virt_wire_setup) >>>>> apic_write(APIC_LVT1, value); >>>>> } >>>>> >>>>> +/* >>>>> + * Logic cpu number(cpuid) to local APIC id persistent mappings. >>>>> + * Do not clear the mapping even if cpu hot removed. >>>>> + * */ >>>>> +static int apicid_to_x86_cpu[MAX_LOCAL_APIC] = { >>>>> + [0 ... MAX_LOCAL_APIC - 1] = -1, >>>>> +}; >>>> >>>> >>>> This patch cannot handle x2apic, which is 32bit. >>> >>> IMO, if the apicid is too big (larger than MAX_LOCAL_APIC), we will skip >>> generating a logic cpu number for it, so it seems no problem here. >>> >> you mean MAX_LOCAL_APIC=32768 ? ....isn't it too wasting ? > > I use the big array here to keep the same format with the existed ones: > int apic_version[MAX_LOCAL_APIC]; > > s16 __apicid_to_node[MAX_LOCAL_APIC] = { > [0 ... MAX_LOCAL_APIC-1] = NUMA_NO_NODE > }; > Or we should also say "NO" to them? This will not survive when someone try to enlarge MAX_LOCAL_APIC for introdcing new cpu model because x2apic is 32bit by definition. Please create long surviving infrastructure. Thanks, -Kame ^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH 2/2] workqueue: update per cpu workqueue's numa affinity when cpu preparing online 2015-03-26 2:17 [PATCH 0/2] workqueue: fix a bug when numa mapping is changed Gu Zheng 2015-03-26 2:17 ` [PATCH 1/2] x86/cpu hotplug: make apicid <--> cpuid mapping persistent Gu Zheng @ 2015-03-26 2:17 ` Gu Zheng 2015-03-26 3:12 ` [PATCH 0/2] workqueue: fix a bug when numa mapping is changed Kamezawa Hiroyuki 2 siblings, 0 replies; 23+ messages in thread From: Gu Zheng @ 2015-03-26 2:17 UTC (permalink / raw) To: linux-kernel Cc: tj, laijs, isimatu.yasuaki, kamezawa.hiroyu, tangchen, guz.fnst, izumi.taku Update the per cpu workqueue's numa affinity when cpu preparing online to create the worker on the correct node. Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> --- kernel/workqueue.c | 1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/kernel/workqueue.c b/kernel/workqueue.c index 41ff75b..4c65953 100644 --- a/kernel/workqueue.c +++ b/kernel/workqueue.c @@ -4618,6 +4618,7 @@ static int workqueue_cpu_up_callback(struct notifier_block *nfb, switch (action & ~CPU_TASKS_FROZEN) { case CPU_UP_PREPARE: for_each_cpu_worker_pool(pool, cpu) { + pool->node = cpu_to_node(cpu); if (pool->nr_workers) continue; if (!create_worker(pool)) -- 1.7.7 ^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH 0/2] workqueue: fix a bug when numa mapping is changed 2015-03-26 2:17 [PATCH 0/2] workqueue: fix a bug when numa mapping is changed Gu Zheng 2015-03-26 2:17 ` [PATCH 1/2] x86/cpu hotplug: make apicid <--> cpuid mapping persistent Gu Zheng 2015-03-26 2:17 ` [PATCH 2/2] workqueue: update per cpu workqueue's numa affinity when cpu preparing online Gu Zheng @ 2015-03-26 3:12 ` Kamezawa Hiroyuki 2015-03-26 5:04 ` Gu Zheng 2 siblings, 1 reply; 23+ messages in thread From: Kamezawa Hiroyuki @ 2015-03-26 3:12 UTC (permalink / raw) To: Gu Zheng, linux-kernel; +Cc: tj, laijs, isimatu.yasuaki, tangchen, izumi.taku On 2015/03/26 11:17, Gu Zheng wrote: > Yasuaki Ishimatsu found that with node online/offline, cpu<->node > relationship is established. Because workqueue uses a info which was > established at boot time, but it may be changed by node hotpluging. > > Once pool->node points to a stale node, following allocation failure > happens. > == > SLUB: Unable to allocate memory on node 2 (gfp=0x80d0) > cache: kmalloc-192, object size: 192, buffer size: 192, default > order: > 1, min order: 0 > node 0: slabs: 6172, objs: 259224, free: 245741 > node 1: slabs: 3261, objs: 136962, free: 127656 > == > > As the apicid <--> node relationship is persistent, so the root cause is the ^^^^^^^ pxm. > cpu-id <-> lapicid mapping is not persistent (because the currently implementation > always choose the first free cpu id for the new added cpu), so if we can build > persistent cpu-id <-> lapicid relationship, this problem will be fixed. > > Please refer to https://lkml.org/lkml/2015/2/27/145 for the previous discussion. > > Gu Zheng (2): > x86/cpu hotplug: make lapicid <-> cpuid mapping persistent > workqueue: update per cpu workqueue's numa affinity when cpu > preparing online why patch(2/2) required ? 
Thanks, -Kame > > arch/x86/kernel/apic/apic.c | 31 ++++++++++++++++++++++++++++++- > kernel/workqueue.c | 1 + > 2 files changed, 31 insertions(+), 1 deletions(-) > ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 0/2] workqueue: fix a bug when numa mapping is changed 2015-03-26 3:12 ` [PATCH 0/2] workqueue: fix a bug when numa mapping is changed Kamezawa Hiroyuki @ 2015-03-26 5:04 ` Gu Zheng 2015-03-26 15:18 ` Tejun Heo 0 siblings, 1 reply; 23+ messages in thread From: Gu Zheng @ 2015-03-26 5:04 UTC (permalink / raw) To: Kamezawa Hiroyuki Cc: linux-kernel, tj, laijs, isimatu.yasuaki, tangchen, izumi.taku Hi Kame-san, On 03/26/2015 11:12 AM, Kamezawa Hiroyuki wrote: > On 2015/03/26 11:17, Gu Zheng wrote: >> Yasuaki Ishimatsu found that with node online/offline, cpu<->node >> relationship is established. Because workqueue uses a info which was >> established at boot time, but it may be changed by node hotpluging. >> >> Once pool->node points to a stale node, following allocation failure >> happens. >> == >> SLUB: Unable to allocate memory on node 2 (gfp=0x80d0) >> cache: kmalloc-192, object size: 192, buffer size: 192, default >> order: >> 1, min order: 0 >> node 0: slabs: 6172, objs: 259224, free: 245741 >> node 1: slabs: 3261, objs: 136962, free: 127656 >> == >> >> As the apicid <--> node relationship is persistent, so the root cause is the > ^^^^^^^ > pxm. > >> cpu-id <-> lapicid mapping is not persistent (because the currently implementation >> always choose the first free cpu id for the new added cpu), so if we can build >> persistent cpu-id <-> lapicid relationship, this problem will be fixed. >> >> Please refer to https://lkml.org/lkml/2015/2/27/145 for the previous discussion. >> >> Gu Zheng (2): >> x86/cpu hotplug: make lapicid <-> cpuid mapping persistent >> workqueue: update per cpu workqueue's numa affinity when cpu >> preparing online > > why patch(2/2) required ? 
wq generates the numa affinity (pool->node) for every possible cpu's per-cpu workqueue at init time, which means the affinity of currently not-present cpus may be incorrect. So we need to update pool->node for a newly added cpu to the correct node when it prepares to come online; otherwise wq may try to create a worker on an invalid node after node hotplug. Regards, Gu > > Thanks, > -Kame > >> >> arch/x86/kernel/apic/apic.c | 31 ++++++++++++++++++++++++++++++- >> kernel/workqueue.c | 1 + >> 2 files changed, 31 insertions(+), 1 deletions(-) >> > > > . > ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 0/2] workqueue: fix a bug when numa mapping is changed 2015-03-26 5:04 ` Gu Zheng @ 2015-03-26 15:18 ` Tejun Heo 2015-03-26 16:42 ` Kamezawa Hiroyuki 0 siblings, 1 reply; 23+ messages in thread From: Tejun Heo @ 2015-03-26 15:18 UTC (permalink / raw) To: Gu Zheng Cc: Kamezawa Hiroyuki, linux-kernel, laijs, isimatu.yasuaki, tangchen, izumi.taku Hello, On Thu, Mar 26, 2015 at 01:04:00PM +0800, Gu Zheng wrote: > wq generates the numa affinity (pool->node) for all the possible cpu's > per cpu workqueue at init stage, that means the affinity of currently un-present > ones' may be incorrect, so we need to update the pool->node for the new added cpu > to the correct node when preparing online, otherwise it will try to create worker > on invalid node if node hotplug occurred. If the mapping is gonna be static once the cpus show up, any chance we can initialize that for all possible cpus during boot? Thanks. -- tejun ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 0/2] workqueue: fix a bug when numa mapping is changed 2015-03-26 15:18 ` Tejun Heo @ 2015-03-26 16:42 ` Kamezawa Hiroyuki 2015-03-30 9:49 ` Gu Zheng 2015-03-30 9:49 ` Gu Zheng 0 siblings, 2 replies; 23+ messages in thread From: Kamezawa Hiroyuki @ 2015-03-26 16:42 UTC (permalink / raw) To: Tejun Heo, Gu Zheng Cc: linux-kernel, laijs, isimatu.yasuaki, tangchen, izumi.taku On 2015/03/27 0:18, Tejun Heo wrote: > Hello, > > On Thu, Mar 26, 2015 at 01:04:00PM +0800, Gu Zheng wrote: >> wq generates the numa affinity (pool->node) for all the possible cpu's >> per cpu workqueue at init stage, that means the affinity of currently un-present >> ones' may be incorrect, so we need to update the pool->node for the new added cpu >> to the correct node when preparing online, otherwise it will try to create worker >> on invalid node if node hotplug occurred. > > If the mapping is gonna be static once the cpus show up, any chance we > can initialize that for all possible cpus during boot? > I think the kernel can define all possible cpuid <-> lapicid <-> pxm <-> nodeid mapping at boot with using firmware table information. One concern is current x86 logic for memory-less node v.s. memory hotplug. (as I explained before) My idea is step1. build all possible mapping at boot cpuid <-> apicid <-> pxm <-> node id at boot. But this may be overwritten by x86's memory less node logic. So, step2. check node is online or not before calling kmalloc. If offline, use -1. rather than updating workqueue's attribute. Thanks, -Kame ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 0/2] workqueue: fix a bug when numa mapping is changed 2015-03-26 16:42 ` Kamezawa Hiroyuki @ 2015-03-30 9:49 ` Gu Zheng 2015-03-30 9:49 ` Gu Zheng 1 sibling, 0 replies; 23+ messages in thread From: Gu Zheng @ 2015-03-30 9:49 UTC (permalink / raw) To: Kamezawa Hiroyuki Cc: Tejun Heo, linux-kernel, laijs, isimatu.yasuaki, tangchen, izumi.taku Hi Kame-san, On 03/27/2015 12:42 AM, Kamezawa Hiroyuki wrote: > On 2015/03/27 0:18, Tejun Heo wrote: >> Hello, >> >> On Thu, Mar 26, 2015 at 01:04:00PM +0800, Gu Zheng wrote: >>> wq generates the numa affinity (pool->node) for all the possible cpu's >>> per cpu workqueue at init stage, that means the affinity of currently un-present >>> ones' may be incorrect, so we need to update the pool->node for the new added cpu >>> to the correct node when preparing online, otherwise it will try to create worker >>> on invalid node if node hotplug occurred. >> >> If the mapping is gonna be static once the cpus show up, any chance we >> can initialize that for all possible cpus during boot? >> > > I think the kernel can define all possible > > cpuid <-> lapicid <-> pxm <-> nodeid > > mapping at boot with using firmware table information. Could you explain more? > > One concern is current x86 logic for memory-less node v.s. memory hotplug. > (as I explained before) > > My idea is > step1. build all possible mapping at boot cpuid <-> apicid <-> pxm <-> node id at boot. > > But this may be overwritten by x86's memory less node logic. So, > step2. check node is online or not before calling kmalloc. If offline, use -1. > rather than updating workqueue's attribute. > > Thanks, > -Kame > > . > ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 0/2] workqueue: fix a bug when numa mapping is changed 2015-03-26 16:42 ` Kamezawa Hiroyuki 2015-03-30 9:49 ` Gu Zheng @ 2015-03-30 9:49 ` Gu Zheng 2015-03-31 6:09 ` Kamezawa Hiroyuki 1 sibling, 1 reply; 23+ messages in thread From: Gu Zheng @ 2015-03-30 9:49 UTC (permalink / raw) To: Kamezawa Hiroyuki Cc: Tejun Heo, linux-kernel, laijs, isimatu.yasuaki, tangchen, izumi.taku Hi Kame-san, On 03/27/2015 12:42 AM, Kamezawa Hiroyuki wrote: > On 2015/03/27 0:18, Tejun Heo wrote: >> Hello, >> >> On Thu, Mar 26, 2015 at 01:04:00PM +0800, Gu Zheng wrote: >>> wq generates the numa affinity (pool->node) for all the possible cpu's >>> per cpu workqueue at init stage, that means the affinity of currently un-present >>> ones' may be incorrect, so we need to update the pool->node for the new added cpu >>> to the correct node when preparing online, otherwise it will try to create worker >>> on invalid node if node hotplug occurred. >> >> If the mapping is gonna be static once the cpus show up, any chance we >> can initialize that for all possible cpus during boot? >> > > I think the kernel can define all possible > > cpuid <-> lapicid <-> pxm <-> nodeid > > mapping at boot with using firmware table information. Could you explain more? Regards, Gu > > One concern is current x86 logic for memory-less node v.s. memory hotplug. > (as I explained before) > > My idea is > step1. build all possible mapping at boot cpuid <-> apicid <-> pxm <-> node id at boot. > > But this may be overwritten by x86's memory less node logic. So, > step2. check node is online or not before calling kmalloc. If offline, use -1. > rather than updating workqueue's attribute. > > Thanks, > -Kame > > . > ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 0/2] workqueue: fix a bug when numa mapping is changed 2015-03-30 9:49 ` Gu Zheng @ 2015-03-31 6:09 ` Kamezawa Hiroyuki 2015-03-31 15:28 ` Tejun Heo 0 siblings, 1 reply; 23+ messages in thread From: Kamezawa Hiroyuki @ 2015-03-31 6:09 UTC (permalink / raw) To: Gu Zheng Cc: Tejun Heo, linux-kernel, laijs, isimatu.yasuaki, tangchen, izumi.taku On 2015/03/30 18:49, Gu Zheng wrote: > Hi Kame-san, > > On 03/27/2015 12:42 AM, Kamezawa Hiroyuki wrote: > >> On 2015/03/27 0:18, Tejun Heo wrote: >>> Hello, >>> >>> On Thu, Mar 26, 2015 at 01:04:00PM +0800, Gu Zheng wrote: >>>> wq generates the numa affinity (pool->node) for all the possible cpu's >>>> per cpu workqueue at init stage, that means the affinity of currently un-present >>>> ones' may be incorrect, so we need to update the pool->node for the new added cpu >>>> to the correct node when preparing online, otherwise it will try to create worker >>>> on invalid node if node hotplug occurred. >>> >>> If the mapping is gonna be static once the cpus show up, any chance we >>> can initialize that for all possible cpus during boot? >>> >> >> I think the kernel can define all possible >> >> cpuid <-> lapicid <-> pxm <-> nodeid >> >> mapping at boot with using firmware table information. > > Could you explain more? You can scan the firmware tables and store all the provided information for possible cpus/pxms somewhere, even while future cpus/mems are not yet onlined, rather than determining them on demand. But this may be considered an API change for most hot-add users. So, for now, I vote for determining ids at online time but recording them, which is a good way. Thanks, -Kame ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 0/2] workqueue: fix a bug when numa mapping is changed 2015-03-31 6:09 ` Kamezawa Hiroyuki @ 2015-03-31 15:28 ` Tejun Heo 2015-04-01 2:55 ` Kamezawa Hiroyuki 0 siblings, 1 reply; 23+ messages in thread From: Tejun Heo @ 2015-03-31 15:28 UTC (permalink / raw) To: Kamezawa Hiroyuki Cc: Gu Zheng, linux-kernel, laijs, isimatu.yasuaki, tangchen, izumi.taku Hello, Kamezawa. On Tue, Mar 31, 2015 at 03:09:05PM +0900, Kamezawa Hiroyuki wrote: > But this may be considered as API change for most hot-add users. Hmm... Why would it be? What can that possibly break? > So, for now, I vote for detemining ids at online but record it is a good way. If we know the information during boot, let's please do it at boot time by all means. Thanks. -- tejun ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 0/2] workqueue: fix a bug when numa mapping is changed 2015-03-31 15:28 ` Tejun Heo @ 2015-04-01 2:55 ` Kamezawa Hiroyuki 2015-04-01 3:02 ` Tejun Heo 0 siblings, 1 reply; 23+ messages in thread From: Kamezawa Hiroyuki @ 2015-04-01 2:55 UTC (permalink / raw) To: Tejun Heo Cc: Gu Zheng, linux-kernel, laijs, isimatu.yasuaki, tangchen, izumi.taku On 2015/04/01 0:28, Tejun Heo wrote: > Hello, Kamezawa. > > On Tue, Mar 31, 2015 at 03:09:05PM +0900, Kamezawa Hiroyuki wrote: >> But this may be considered as API change for most hot-add users. > > Hmm... Why would it be? What can that possibly break? > Now, hot-added cpus will have the lowest free cpu id. Because of this, on most systems which have only cpu hot-add, cpu-ids are always contiguous even after cpu hot add. In enterprise, this would be considered an incompatibility. Determining cpuid <-> lapicid at boot will make cpuids sparse. That may break existing scripts or configuration/resource management software. >> So, for now, I vote for detemining ids at online but record it is a good way. > > If we know the information during boot, let's please do it at boot time by all means. I basically agree. I am just thinking about the influence of this small incompatibility of forcing the ideal way, from my standing point. Thanks, -Kame ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 0/2] workqueue: fix a bug when numa mapping is changed 2015-04-01 2:55 ` Kamezawa Hiroyuki @ 2015-04-01 3:02 ` Tejun Heo 2015-04-01 3:05 ` Tejun Heo 2015-04-01 8:30 ` Kamezawa Hiroyuki 0 siblings, 2 replies; 23+ messages in thread From: Tejun Heo @ 2015-04-01 3:02 UTC (permalink / raw) To: Kamezawa Hiroyuki Cc: Gu Zheng, linux-kernel, laijs, isimatu.yasuaki, tangchen, izumi.taku On Wed, Apr 01, 2015 at 11:55:11AM +0900, Kamezawa Hiroyuki wrote: > Now, hot-added cpus will have the lowest free cpu id. > > Because of this, in most of systems which has only cpu-hot-add, cpu-ids are always > contiguous even after cpu hot add. > In enterprise, this would be considered as imcompatibility. > > determining cpuid <-> lapicid at boot will make cpuids sparse. That may corrupt > exisiting script or configuration/resource management software. Ugh... so, cpu number allocation on hot-add is part of userland interface that we're locked into? Tying hotplug and id allocation order together usually isn't a good idea. What if the cpu up fails while running the notifiers? The ID is already allocated and the next cpu being brought up will be after a hole anyway. Is this even actually gonna affect userland? -- tejun ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 0/2] workqueue: fix a bug when numa mapping is changed 2015-04-01 3:02 ` Tejun Heo @ 2015-04-01 3:05 ` Tejun Heo 2015-04-01 8:30 ` Kamezawa Hiroyuki 1 sibling, 0 replies; 23+ messages in thread From: Tejun Heo @ 2015-04-01 3:05 UTC (permalink / raw) To: Kamezawa Hiroyuki Cc: Gu Zheng, linux-kernel, laijs, isimatu.yasuaki, tangchen, izumi.taku On Tue, Mar 31, 2015 at 11:02:42PM -0400, Tejun Heo wrote: > Ugh... so, cpu number allocation on hot-add is part of userland > interface that we're locked into? Tying hotplug and id allocation > order together usually isn't a good idea. What if the cpu up fails > while running the notifiers? The ID is already allocated and the next > cpu being brought up will be after a hole anyway. Is this even > actually gonna affect userland? At any rate, this isn't big a deal. If the mapping has to be dynamic, it'd be great to move the mapping code from workqueue to cpu hotplug proper tho. Thanks. -- tejun ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 0/2] workqueue: fix a bug when numa mapping is changed 2015-04-01 3:02 ` Tejun Heo @ 2015-04-01 8:30 ` Kamezawa Hiroyuki 2015-04-02 1:36 ` Gu Zheng 1 sibling, 1 reply; 23+ messages in thread From: Kamezawa Hiroyuki @ 2015-04-01 8:30 UTC (permalink / raw) To: Tejun Heo Cc: Gu Zheng, linux-kernel, laijs, isimatu.yasuaki, tangchen, izumi.taku On 2015/04/01 12:02, Tejun Heo wrote: > On Wed, Apr 01, 2015 at 11:55:11AM +0900, Kamezawa Hiroyuki wrote: >> Now, hot-added cpus will have the lowest free cpu id. >> >> Because of this, in most of systems which has only cpu-hot-add, cpu-ids are always >> contiguous even after cpu hot add. >> In enterprise, this would be considered as imcompatibility. >> >> determining cpuid <-> lapicid at boot will make cpuids sparse. That may corrupt >> exisiting script or configuration/resource management software. > > Ugh... so, cpu number allocation on hot-add is part of userland > interface that we're locked into? We checked most of the RHEL7 packages and didn't find a problem yet. But, for example, we know a performance test team's test program assumed contiguous cpuids and it failed. It was an easy case because we could ask them to fix the application, but I guess there will be some customers who assume cpuids are contiguous. > Tying hotplug and id allocation > order together usually isn't a good idea. What if the cpu up fails > while running the notifiers? The ID is already allocated and the next > cpu being brought up will be after a hole anyway. Is this even > actually gonna affect userland? > Maybe. It's not fail-safe but.... In general, all kernel engineers (and skilled userland engineers) know that cpuids are not always contiguous and that cpuids/nodeids should be checked before running programs. I think most engineers are aware of that, but many users have their own assumptions :( Basically, I don't have strong objections; you're right technically. In summary... 
- users should not assume cpuids are contiguous. - all possible ids should be fixed at boot time. - For users, some clarifying document should be somewhere in Documentation. So, Gu-san 1) determine all possible ids at boot. 2) clarify in Documentation that cpuids/nodeids can have holes because of 1). 3) It would be good if the other guys give us their acks. In the future, I myself think a naming system like udev for cpuid/numaid will be necessary eventually. Could that renaming feature be a cgroup/namespace feature? If possible, each container could have cpuids starting from 0. Thanks, -Kame ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 0/2] workqueue: fix a bug when numa mapping is changed 2015-04-01 8:30 ` Kamezawa Hiroyuki @ 2015-04-02 1:36 ` Gu Zheng 2015-04-02 2:54 ` Kamezawa Hiroyuki 0 siblings, 1 reply; 23+ messages in thread From: Gu Zheng @ 2015-04-02 1:36 UTC (permalink / raw) To: Kamezawa Hiroyuki, Tejun Heo Cc: linux-kernel, laijs, isimatu.yasuaki, tangchen, izumi.taku Hi Kame, TJ, On 04/01/2015 04:30 PM, Kamezawa Hiroyuki wrote: > On 2015/04/01 12:02, Tejun Heo wrote: >> On Wed, Apr 01, 2015 at 11:55:11AM +0900, Kamezawa Hiroyuki wrote: >>> Now, hot-added cpus will have the lowest free cpu id. >>> >>> Because of this, in most of systems which has only cpu-hot-add, cpu-ids are always >>> contiguous even after cpu hot add. >>> In enterprise, this would be considered as imcompatibility. >>> >>> determining cpuid <-> lapicid at boot will make cpuids sparse. That may corrupt >>> exisiting script or configuration/resource management software. >> >> Ugh... so, cpu number allocation on hot-add is part of userland >> interface that we're locked into? > > We checked most of RHEL7 packages and didn't find a problem yet. > But, for examle, we know some performance test team's test program assumed contiguous > cpuids and it failed. It was an easy case because we can ask them to fix the application > but I guess there will be some amount of customers that cpuids are contiguous. > >> Tying hotplug and id allocation >> order together usually isn't a good idea. What if the cpu up fails >> while running the notifiers? The ID is already allocated and the next >> cpu being brought up will be after a hole anyway. Is this even >> actually gonna affect userland? >> > > Maybe. It's not fail-safe but.... > > In general, all kernel engineers (and skilled userland engineers) knows that > cpuids cannot be always contiguous and cpuids/nodeids should be checked before > running programs. 
I think most of engineers should be aware of that but many > users have their own assumption :( > > Basically, I don't have strong objections, you're right technically. > > In summary... > - users should not assume cpuids are contiguous. > - all possible ids should be fixed at boot time. > - For uses, some clarification document should be somewhere in Documenatation. Fine to me. > > So, Gu-san > 1) determine all possible ids at boot. > 2) clarify cpuid/nodeid can have hole because of 1) in Documenation. > 3) It would be good if other guys give us ack. Also fine. But before going this way, could you please reconsider determining the ids when the cpus first become present (the implementation in this patchset)? Though it is not perfect in some respects, we can set aside the doubts mentioned above, since cpu/node hotplug is not a frequent operation, and there seems to be nothing harmful to us in going this way. Regards, Gu > > In future, > I myself thinks naming system like udev for cpuid/numaid is necessary, at last. > Can that renaming feature can be cgroup/namespace feature ? If possible, > all container can have cpuids starting from 0. > > Thanks, > -Kame > > . > ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 0/2] workqueue: fix a bug when numa mapping is changed 2015-04-02 1:36 ` Gu Zheng @ 2015-04-02 2:54 ` Kamezawa Hiroyuki 0 siblings, 0 replies; 23+ messages in thread From: Kamezawa Hiroyuki @ 2015-04-02 2:54 UTC (permalink / raw) To: Gu Zheng, Tejun Heo Cc: linux-kernel, laijs, isimatu.yasuaki, tangchen, izumi.taku On 2015/04/02 10:36, Gu Zheng wrote: > Hi Kame, TJ, > > On 04/01/2015 04:30 PM, Kamezawa Hiroyuki wrote: > >> On 2015/04/01 12:02, Tejun Heo wrote: >>> On Wed, Apr 01, 2015 at 11:55:11AM +0900, Kamezawa Hiroyuki wrote: >>>> Now, hot-added cpus will have the lowest free cpu id. >>>> >>>> Because of this, in most of systems which has only cpu-hot-add, cpu-ids are always >>>> contiguous even after cpu hot add. >>>> In enterprise, this would be considered as imcompatibility. >>>> >>>> determining cpuid <-> lapicid at boot will make cpuids sparse. That may corrupt >>>> exisiting script or configuration/resource management software. >>> >>> Ugh... so, cpu number allocation on hot-add is part of userland >>> interface that we're locked into? >> >> We checked most of RHEL7 packages and didn't find a problem yet. >> But, for examle, we know some performance test team's test program assumed contiguous >> cpuids and it failed. It was an easy case because we can ask them to fix the application >> but I guess there will be some amount of customers that cpuids are contiguous. >> >>> Tying hotplug and id allocation >>> order together usually isn't a good idea. What if the cpu up fails >>> while running the notifiers? The ID is already allocated and the next >>> cpu being brought up will be after a hole anyway. Is this even >>> actually gonna affect userland? >>> >> >> Maybe. It's not fail-safe but.... >> >> In general, all kernel engineers (and skilled userland engineers) knows that >> cpuids cannot be always contiguous and cpuids/nodeids should be checked before >> running programs. 
I think most of engineers should be aware of that but many >> users have their own assumption :( >> >> Basically, I don't have strong objections, you're right technically. >> >> In summary... >> - users should not assume cpuids are contiguous. >> - all possible ids should be fixed at boot time. >> - For uses, some clarification document should be somewhere in Documenatation. > > Fine to me. > >> >> So, Gu-san >> 1) determine all possible ids at boot. >> 2) clarify cpuid/nodeid can have hole because of 1) in Documenation. >> 3) It would be good if other guys give us ack. > > Also fine. > But before this going, could you please reconsider determining the ids when firstly > present (the implementation on this patchset)? > Though it is not the perfect one in some words, but we can ignore the doubts that > mentioned above as the cpu/node hotplug is not frequent behaviours, and there seems > not anything harmful to us if we go this way. > Is it such heavy work? Hmm. My requests are: as step 1, implement your patches so that - the current behavior at boot is not changed, and - all possible apicids are remembered and given future cpuids if not yet assigned. As step 2, please fix the dynamic pxm<->node detection. In the future, memory-less node handling in x86 should be revisited. Thanks, -Kame ^ permalink raw reply [flat|nested] 23+ messages in thread
end of thread, other threads:[~2015-04-02 2:55 UTC | newest] Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2015-03-26 2:17 [PATCH 0/2] workqueue: fix a bug when numa mapping is changed Gu Zheng 2015-03-26 2:17 ` [PATCH 1/2] x86/cpu hotplug: make apicid <--> cpuid mapping persistent Gu Zheng 2015-03-26 3:19 ` Kamezawa Hiroyuki 2015-03-26 4:55 ` Gu Zheng 2015-03-26 15:13 ` Tejun Heo 2015-03-26 16:31 ` Kamezawa Hiroyuki 2015-03-30 9:58 ` Gu Zheng 2015-04-01 2:59 ` Kamezawa Hiroyuki 2015-03-26 2:17 ` [PATCH 2/2] workqueue: update per cpu workqueue's numa affinity when cpu preparing online Gu Zheng 2015-03-26 3:12 ` [PATCH 0/2] workqueue: fix a bug when numa mapping is changed Kamezawa Hiroyuki 2015-03-26 5:04 ` Gu Zheng 2015-03-26 15:18 ` Tejun Heo 2015-03-26 16:42 ` Kamezawa Hiroyuki 2015-03-30 9:49 ` Gu Zheng 2015-03-30 9:49 ` Gu Zheng 2015-03-31 6:09 ` Kamezawa Hiroyuki 2015-03-31 15:28 ` Tejun Heo 2015-04-01 2:55 ` Kamezawa Hiroyuki 2015-04-01 3:02 ` Tejun Heo 2015-04-01 3:05 ` Tejun Heo 2015-04-01 8:30 ` Kamezawa Hiroyuki 2015-04-02 1:36 ` Gu Zheng 2015-04-02 2:54 ` Kamezawa Hiroyuki