* [regression] cpuset: offlined CPUs removed from affinity masks
@ 2020-01-16 17:41 Mathieu Desnoyers
  2020-01-16 18:27 ` Valentin Schneider
  2020-02-17 16:03 ` Mathieu Desnoyers
  0 siblings, 2 replies; 16+ messages in thread
From: Mathieu Desnoyers @ 2020-01-16 17:41 UTC (permalink / raw)
  To: Li Zefan; +Cc: linux-kernel, Peter Zijlstra, Ingo Molnar

Hi,

I noticed the following regression with CONFIG_CPUSET=y. Note that
I am not using cpusets at all (only using the root cpuset I'm given
at boot), it's just configured in. I am currently working on a 5.2.5
kernel. I am simply combining use of taskset(1) (setting the affinity
mask of a process) and cpu hotplug. The result is that with
CONFIG_CPUSET=y, setting an affinity mask that includes an offline CPU
number does not keep that CPU in the affinity mask, and it is never put
back when the CPU comes back online. CONFIG_CPUSET=n behaves as expected,
and puts the CPU back into the affinity mask reported to user-space when
it comes back online.


* With CONFIG_CPUSET=y (unexpected behavior):

# echo 0 > /sys/devices/system/cpu/cpu1/online

% taskset 0x7 ./loop &
% taskset -p $!
pid 1341's current affinity mask: 5

# echo 1 > /sys/devices/system/cpu/cpu1/online

taskset -p $!
pid 1341's current affinity mask: 5

kill $!


* With CONFIG_CPUSET=n (expected behavior):

(Offlining CPU, then start task)

# echo 0 > /sys/devices/system/cpu/cpu1/online

% taskset 0x7 ./loop &
% taskset -p $!
pid 1358's current affinity mask: 5

# echo 1 > /sys/devices/system/cpu/cpu1/online

taskset -p $!
pid 1358's current affinity mask: 7

kill $!
Test system lscpu output:

Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              32
On-line CPU(s) list: 0-31
Thread(s) per core:  2
Core(s) per socket:  8
Socket(s):           2
NUMA node(s):        2
Vendor ID:           GenuineIntel
CPU family:          6
Model:               60
Model name:          Intel Core Processor (Haswell, no TSX, IBRS)
Stepping:            1
CPU MHz:             2399.996
BogoMIPS:            4799.99
Hypervisor vendor:   KVM
Virtualization type: full
L1d cache:           32K
L1i cache:           32K
L2 cache:            4096K
NUMA node0 CPU(s):   0-7,16-23
NUMA node1 CPU(s):   8-15,24-31
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm cpuid_fault invpcid_single pti ibrs ibpb fsgsbase bmi1 avx2 smep bmi2 erms invpcid xsaveopt

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [regression] cpuset: offlined CPUs removed from affinity masks
  2020-01-16 17:41 [regression] cpuset: offlined CPUs removed from affinity masks Mathieu Desnoyers
@ 2020-01-16 18:27 ` Valentin Schneider
  2020-02-17 16:03 ` Mathieu Desnoyers
  1 sibling, 0 replies; 16+ messages in thread
From: Valentin Schneider @ 2020-01-16 18:27 UTC (permalink / raw)
  To: Mathieu Desnoyers, Li Zefan; +Cc: linux-kernel, Peter Zijlstra, Ingo Molnar

On 16/01/2020 17:41, Mathieu Desnoyers wrote:
> Hi,
> 
> I noticed the following regression with CONFIG_CPUSET=y. Note that
> I am not using cpusets at all (only using the root cpuset I'm given
> at boot), it's just configured in. I am currently working on a 5.2.5
> kernel. I am simply combining use of taskset(1) (setting the affinity
> mask of a process) and cpu hotplug. The result is that with
> CONFIG_CPUSET=y, setting an affinity mask that includes an offline CPU
> number does not keep that CPU in the affinity mask, and it is never put
> back when the CPU comes back online. CONFIG_CPUSET=n behaves as expected,
> and puts the CPU back into the affinity mask reported to user-space when
> it comes back online.
> 
> 
> * With CONFIG_CPUSET=y (unexpected behavior):
> 
> # echo 0 > /sys/devices/system/cpu/cpu1/online
> 
> % taskset 0x7 ./loop &
> % taskset -p $!
> pid 1341's current affinity mask: 5
> 
> # echo 1 > /sys/devices/system/cpu/cpu1/online
> 
> taskset -p $!
> pid 1341's current affinity mask: 5
> 
> kill $!
> 

As discussed on IRC, this is because we have in sched_setaffinity():

  cpuset_cpus_allowed(p, cpus_allowed);
  cpumask_and(new_mask, in_mask, cpus_allowed);

Another source of issue is that CPUs are taken out of cpusets when hotplugged out, and not put back in when hotplugged back in (except for the root cpuset, which follows cpu_active_mask).
Both cpuset.effective_cpus and cpuset.allowed_cpus seem to only span online CPUs:

root@valsch-juno:~# cat /sys/fs/cgroup/cpuset/cpuset.effective_cpus
0-5
root@valsch-juno:~# cat /sys/fs/cgroup/cpuset/cpuset.cpus
0-5
root@valsch-juno:~# echo 0 > /sys/devices/system/cpu/cpu3/online
[93418.733050] CPU3: shutdown
[93418.735815] psci: CPU3 killed (polled 0 ms)
root@valsch-juno:~# cat /sys/fs/cgroup/cpuset/cpuset.cpus
0-2,4-5
root@valsch-juno:~# cat /sys/fs/cgroup/cpuset/cpuset.effective_cpus
0-2,4-5

The thing is, with CONFIG_CPUSET=n, we can absolutely cope with p->cpus_ptr spanning CPUs that are offline, because we still check the active/online mask (is_cpu_allowed()).

So one thing I'd like to know is why do cpusets remove offline CPUs from their mask? I could see cpuset.allowed containing both online & offline CPUs, and cpuset.effective containing just the online ones. That way in sched_setaffinity() we can still check for cpuset.allowed, and we still have the online/active check in __set_cpus_allowed_ptr() to deny stupid requests.
* Re: [regression] cpuset: offlined CPUs removed from affinity masks
  2020-01-16 17:41 [regression] cpuset: offlined CPUs removed from affinity masks Mathieu Desnoyers
  2020-01-16 18:27 ` Valentin Schneider
@ 2020-02-17 16:03 ` Mathieu Desnoyers
  2020-02-19 15:19 ` Tejun Heo
  1 sibling, 1 reply; 16+ messages in thread
From: Mathieu Desnoyers @ 2020-02-17 16:03 UTC (permalink / raw)
  To: Li Zefan, Tejun Heo, cgroups
  Cc: linux-kernel, Peter Zijlstra, Ingo Molnar, Valentin Schneider

Hi,

Adding Tejun and the cgroups mailing list in CC for this cpuset regression I reported last month.

Thanks,

Mathieu

----- On Jan 16, 2020, at 12:41 PM, Mathieu Desnoyers mathieu.desnoyers@efficios.com wrote:

> Hi,
> 
> I noticed the following regression with CONFIG_CPUSET=y. Note that
> I am not using cpusets at all (only using the root cpuset I'm given
> at boot), it's just configured in. I am currently working on a 5.2.5
> kernel. I am simply combining use of taskset(1) (setting the affinity
> mask of a process) and cpu hotplug. The result is that with
> CONFIG_CPUSET=y, setting an affinity mask that includes an offline CPU
> number does not keep that CPU in the affinity mask, and it is never put
> back when the CPU comes back online. CONFIG_CPUSET=n behaves as expected,
> and puts the CPU back into the affinity mask reported to user-space when
> it comes back online.
> 
> 
> * With CONFIG_CPUSET=y (unexpected behavior):
> 
> # echo 0 > /sys/devices/system/cpu/cpu1/online
> 
> % taskset 0x7 ./loop &
> % taskset -p $!
> pid 1341's current affinity mask: 5
> 
> # echo 1 > /sys/devices/system/cpu/cpu1/online
> 
> taskset -p $!
> pid 1341's current affinity mask: 5
> 
> kill $!
> 
> 
> * With CONFIG_CPUSET=n (expected behavior):
> 
> (Offlining CPU, then start task)
> 
> # echo 0 > /sys/devices/system/cpu/cpu1/online
> 
> % taskset 0x7 ./loop &
> % taskset -p $!
> pid 1358's current affinity mask: 5
> 
> # echo 1 > /sys/devices/system/cpu/cpu1/online
> 
> taskset -p $!
> pid 1358's current affinity mask: 7
> 
> kill $!
> 
> [...]
> 
> -- 
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [regression] cpuset: offlined CPUs removed from affinity masks
  2020-02-17 16:03 ` Mathieu Desnoyers
@ 2020-02-19 15:19 ` Tejun Heo
  2020-02-19 15:43 ` Mathieu Desnoyers
  0 siblings, 1 reply; 16+ messages in thread
From: Tejun Heo @ 2020-02-19 15:19 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Li Zefan, cgroups, linux-kernel, Peter Zijlstra, Ingo Molnar, Valentin Schneider

Hello,

On Mon, Feb 17, 2020 at 11:03:07AM -0500, Mathieu Desnoyers wrote:
> Hi,
> 
> Adding Tejun and the cgroups mailing list in CC for this cpuset regression I
> reported last month.
> 
> Thanks,
> 
> Mathieu
> 
> ----- On Jan 16, 2020, at 12:41 PM, Mathieu Desnoyers
> mathieu.desnoyers@efficios.com wrote:
> 
> > Hi,
> > 
> > I noticed the following regression with CONFIG_CPUSET=y. Note that
> > I am not using cpusets at all (only using the root cpuset I'm given
> > at boot), it's just configured in. I am currently working on a 5.2.5
> > kernel. I am simply combining use of taskset(1) (setting the affinity
> > mask of a process) and cpu hotplug. The result is that with
> > CONFIG_CPUSET=y, setting an affinity mask that includes an offline CPU
> > number does not keep that CPU in the affinity mask, and it is never put
> > back when the CPU comes back online. CONFIG_CPUSET=n behaves as expected,
> > and puts the CPU back into the affinity mask reported to user-space when
> > it comes back online.

Because cpuset operations irreversibly change task affinity masks rather than masking them dynamically, the interaction has always been kinda broken. Hmm... Are there older kernel versions which behave differently? Off the top of my head, I can't think of something which could have changed that behavior recently, but I could easily be missing something.

Thanks.

-- 
tejun

^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [regression] cpuset: offlined CPUs removed from affinity masks
  2020-02-19 15:19 ` Tejun Heo
@ 2020-02-19 15:43 ` Mathieu Desnoyers
  2020-02-19 15:47 ` Tejun Heo
  0 siblings, 1 reply; 16+ messages in thread
From: Mathieu Desnoyers @ 2020-02-19 15:43 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Li Zefan, cgroups, linux-kernel, Peter Zijlstra, Ingo Molnar, Valentin Schneider

----- On Feb 19, 2020, at 10:19 AM, Tejun Heo tj@kernel.org wrote:

> Hello,
> 
> On Mon, Feb 17, 2020 at 11:03:07AM -0500, Mathieu Desnoyers wrote:
>> Hi,
>> 
>> Adding Tejun and the cgroups mailing list in CC for this cpuset regression I
>> reported last month.
>> 
>> Thanks,
>> 
>> Mathieu
>> 
>> ----- On Jan 16, 2020, at 12:41 PM, Mathieu Desnoyers
>> mathieu.desnoyers@efficios.com wrote:
>> 
>> > Hi,
>> > 
>> > I noticed the following regression with CONFIG_CPUSET=y. Note that
>> > I am not using cpusets at all (only using the root cpuset I'm given
>> > at boot), it's just configured in. [...]
> 
> Because cpuset operations irreversibly change task affinity masks
> rather than masking them dynamically, the interaction has always been
> kinda broken. Hmm... Are there older kernel versions which behave
> differently? Off the top of my head, I can't think of something which
> could have changed that behavior recently, but I could easily be missing
> something.

Hi Tejun,

The regression I'm talking about here is that CONFIG_CPUSET=y changes the behavior of the sched_setaffinity system call, which existed prior to cpusets.
sched_setaffinity should behave in the same way for kernels configured with CONFIG_CPUSET=y or CONFIG_CPUSET=n.

The fact that cpuset decides to irreversibly change the task affinity mask may not be considered a regression if it has always done that, but changing the behavior of sched_setaffinity seems to fit the definition of a regression.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [regression] cpuset: offlined CPUs removed from affinity masks
  2020-02-19 15:43 ` Mathieu Desnoyers
@ 2020-02-19 15:47 ` Tejun Heo
  2020-02-19 15:50 ` Mathieu Desnoyers
  0 siblings, 1 reply; 16+ messages in thread
From: Tejun Heo @ 2020-02-19 15:47 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Li Zefan, cgroups, linux-kernel, Peter Zijlstra, Ingo Molnar, Valentin Schneider

On Wed, Feb 19, 2020 at 10:43:05AM -0500, Mathieu Desnoyers wrote:
> The regression I'm talking about here is that CONFIG_CPUSET=y changes the
> behavior of the sched_setaffinity system call, which existed prior to
> cpusets.
> 
> sched_setaffinity should behave in the same way for kernels configured with
> CONFIG_CPUSET=y or CONFIG_CPUSET=n.
> 
> The fact that cpuset decides to irreversibly change the task affinity mask
> may not be considered a regression if it has always done that, but changing
> the behavior of sched_setaffinity seems to fit the definition of a regression.

We generally use "regression" for breakages which weren't in past versions but then appeared later. It has debugging implications, because if we know something is a regression, we generally can point to the commit which introduced the bug, either through examining the history or bisection.

It is a silly bug, for sure, but slapping the "regression" name on it just confuses rather than helping anything.

-- 
tejun

^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [regression] cpuset: offlined CPUs removed from affinity masks
  2020-02-19 15:47 ` Tejun Heo
@ 2020-02-19 15:50 ` Mathieu Desnoyers
  2020-02-19 15:52 ` Tejun Heo
  0 siblings, 1 reply; 16+ messages in thread
From: Mathieu Desnoyers @ 2020-02-19 15:50 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Li Zefan, cgroups, linux-kernel, Peter Zijlstra, Ingo Molnar, Valentin Schneider

----- On Feb 19, 2020, at 10:47 AM, Tejun Heo tj@kernel.org wrote:

> On Wed, Feb 19, 2020 at 10:43:05AM -0500, Mathieu Desnoyers wrote:
>> The regression I'm talking about here is that CONFIG_CPUSET=y changes the
>> behavior of the sched_setaffinity system call, which existed prior to
>> cpusets.
>> 
>> sched_setaffinity should behave in the same way for kernels configured with
>> CONFIG_CPUSET=y or CONFIG_CPUSET=n.
>> 
>> The fact that cpuset decides to irreversibly change the task affinity mask
>> may not be considered a regression if it has always done that, but changing
>> the behavior of sched_setaffinity seems to fit the definition of a regression.
> 
> We generally use "regression" for breakages which weren't in past
> versions but then appeared later. It has debugging implications,
> because if we know something is a regression, we generally can point
> to the commit which introduced the bug, either through examining the
> history or bisection.
> 
> It is a silly bug, for sure, but slapping the "regression" name on it just
> confuses rather than helping anything.

I can look into figuring out the commit introducing this issue, which I suspect will be close to the introduction of CONFIG_CPUSET into the kernel (which was ages ago). I'll check and let you know.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [regression] cpuset: offlined CPUs removed from affinity masks
  2020-02-19 15:50 ` Mathieu Desnoyers
@ 2020-02-19 15:52 ` Tejun Heo
  2020-02-19 16:08 ` Mathieu Desnoyers
  0 siblings, 1 reply; 16+ messages in thread
From: Tejun Heo @ 2020-02-19 15:52 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Li Zefan, cgroups, linux-kernel, Peter Zijlstra, Ingo Molnar, Valentin Schneider

On Wed, Feb 19, 2020 at 10:50:35AM -0500, Mathieu Desnoyers wrote:
> I can look into figuring out the commit introducing this issue, which I
> suspect will be close to the introduction of CONFIG_CPUSET into the
> kernel (which was ages ago). I'll check and let you know.

Oh, yeah, I'm pretty sure it goes way back. I don't think tracking that down would be necessary. I was just wondering whether it was a recent change because you said it was a regression.

-- 
tejun

^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [regression] cpuset: offlined CPUs removed from affinity masks
  2020-02-19 15:52 ` Tejun Heo
@ 2020-02-19 16:08 ` Mathieu Desnoyers
  2020-02-19 16:12 ` Tejun Heo
  0 siblings, 1 reply; 16+ messages in thread
From: Mathieu Desnoyers @ 2020-02-19 16:08 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Li Zefan, cgroups, linux-kernel, Peter Zijlstra, Ingo Molnar, Valentin Schneider, Thomas Gleixner

----- On Feb 19, 2020, at 10:52 AM, Tejun Heo tj@kernel.org wrote:

> On Wed, Feb 19, 2020 at 10:50:35AM -0500, Mathieu Desnoyers wrote:
>> I can look into figuring out the commit introducing this issue, which I
>> suspect will be close to the introduction of CONFIG_CPUSET into the
>> kernel (which was ages ago). I'll check and let you know.
> 
> Oh, yeah, I'm pretty sure it goes way back. I don't think tracking
> that down would be necessary. I was just wondering whether it was a
> recent change because you said it was a regression.

It's most likely not a recent regression, but it has unfortunate effects on the affinity mask which directly affect my ongoing work on the pin_on_cpu() system call [1].

The sched_setaffinity vs cpu hotplug semantic provided by CONFIG_CPUSET=n is fine for the needs of pin_on_cpu(): when CPUs come back online, they reappear in the affinity mask, but that is not the case with CONFIG_CPUSET=y.

I wonder if applying the online cpu mask to the per-thread affinity mask is the correct approach? I suspect what we may be looking for here is to keep the affinity mask independent of cpu hotplug, and look up both the per-thread affinity mask and the online cpu mask whenever the scheduler needs to perform "is_cpu_allowed()" to check task placement. Then whenever sched_getaffinity or cpusets try to query the current set of cpus on which a task can run right now, they could also look at both the task's affinity mask and the online cpu mask.
Thanks,

Mathieu

[1] https://lore.kernel.org/r/20200121160312.26545-1-mathieu.desnoyers@efficios.com

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [regression] cpuset: offlined CPUs removed from affinity masks
  2020-02-19 16:08 ` Mathieu Desnoyers
@ 2020-02-19 16:12 ` Tejun Heo
  2020-03-07 16:06 ` Mathieu Desnoyers
  0 siblings, 1 reply; 16+ messages in thread
From: Tejun Heo @ 2020-02-19 16:12 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Li Zefan, cgroups, linux-kernel, Peter Zijlstra, Ingo Molnar, Valentin Schneider, Thomas Gleixner

On Wed, Feb 19, 2020 at 11:08:39AM -0500, Mathieu Desnoyers wrote:
> I wonder if applying the online cpu mask to the per-thread affinity mask
> is the correct approach? I suspect what we may be looking for here is to keep

Oh, the whole thing is wrong.

> the affinity mask independent of cpu hotplug, and look up both the per-thread
> affinity mask and the online cpu mask whenever the scheduler needs to perform
> "is_cpu_allowed()" to check task placement.

Yes, that's what it should have done from the get-go. The way it's implemented now, maybe we can avoid some specific cases like cpuset not being used at all, but it'll constantly get in the way if you're expecting thread affinity to retain its value across offlines.

-- 
tejun

^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [regression] cpuset: offlined CPUs removed from affinity masks
  2020-02-19 16:12 ` Tejun Heo
@ 2020-03-07 16:06 ` Mathieu Desnoyers
  2020-03-12 18:26 ` Tejun Heo
  0 siblings, 1 reply; 16+ messages in thread
From: Mathieu Desnoyers @ 2020-03-07 16:06 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Li Zefan, cgroups, linux-kernel, Peter Zijlstra, Ingo Molnar, Valentin Schneider, Thomas Gleixner

----- On Feb 19, 2020, at 11:12 AM, Tejun Heo tj@kernel.org wrote:

> On Wed, Feb 19, 2020 at 11:08:39AM -0500, Mathieu Desnoyers wrote:
>> I wonder if applying the online cpu mask to the per-thread affinity mask
>> is the correct approach? I suspect what we may be looking for here is to keep
> 
> Oh, the whole thing is wrong.
> 
>> the affinity mask independent of cpu hotplug, and look up both the per-thread
>> affinity mask and the online cpu mask whenever the scheduler needs to perform
>> "is_cpu_allowed()" to check task placement.
> 
> Yes, that's what it should have done from the get-go. The way it's
> implemented now, maybe we can avoid some specific cases like cpuset
> not being used at all, but it'll constantly get in the way if you're
> expecting thread affinity to retain its value across offlines.
Looking into solving this, one key issue seems to get in the way: cpusets appear to care about not allowing creation of a cpuset which has no currently active CPU to run on, e.g.:

# it is forbidden to create an empty cpuset if the cpu is offlined first:
mkdir /sys/fs/cgroup/cpuset/test
echo 2 > /sys/fs/cgroup/cpuset/test/cpuset.cpus
cat /sys/fs/cgroup/cpuset/test/cpuset.cpus
2
echo 0 > /sys/devices/system/cpu/cpu1/online
echo 1 > /sys/fs/cgroup/cpuset/test/cpuset.cpus
bash: echo: write error: Invalid argument
cat /sys/fs/cgroup/cpuset/test/cpuset.cpus
2

# but it's perfectly fine to generate this empty cpuset by offlining
# a cpu _after_ creating the cpuset:
echo 0 > /sys/devices/system/cpu/cpu2/online
cat /sys/fs/cgroup/cpuset/test/cpuset.cpus
              <----- empty (nothing)

Some further testing seems to show that tasks belonging to that empty cpuset are placed anywhere on active cpus.

Clearly, there is an intent that cpusets take the active mask into account to prohibit creating an empty cpuset, but nothing prevents cpu hotplug from creating an empty cpuset.

I wonder how to solve this inconsistency?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [regression] cpuset: offlined CPUs removed from affinity masks
  2020-03-07 16:06 ` Mathieu Desnoyers
@ 2020-03-12 18:26 ` Tejun Heo
  2020-03-12 19:47 ` Mathieu Desnoyers
  0 siblings, 1 reply; 16+ messages in thread
From: Tejun Heo @ 2020-03-12 18:26 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Li Zefan, cgroups, linux-kernel, Peter Zijlstra, Ingo Molnar, Valentin Schneider, Thomas Gleixner

Hello,

On Sat, Mar 07, 2020 at 11:06:47AM -0500, Mathieu Desnoyers wrote:
> Looking into solving this, one key issue seems to get in the way: cpusets
> appear to care about not allowing creation of a cpuset which has no
> currently active CPU to run on, e.g.:
...
> Clearly, there is an intent that cpusets take the active mask into
> account to prohibit creating an empty cpuset, but nothing prevents
> cpu hotplug from creating an empty cpuset.
> 
> I wonder how to solve this inconsistency?

Please try cpuset in cgroup2. It shouldn't have those issues.

Thanks.

-- 
tejun

^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [regression] cpuset: offlined CPUs removed from affinity masks
  2020-03-12 18:26 ` Tejun Heo
@ 2020-03-12 19:47 ` Mathieu Desnoyers
  2020-03-24 18:01 ` Tejun Heo
  0 siblings, 1 reply; 16+ messages in thread
From: Mathieu Desnoyers @ 2020-03-12 19:47 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Li Zefan, cgroups, linux-kernel, Peter Zijlstra, Ingo Molnar, Valentin Schneider, Thomas Gleixner

----- On Mar 12, 2020, at 2:26 PM, Tejun Heo tj@kernel.org wrote:

> Hello,
> 
> On Sat, Mar 07, 2020 at 11:06:47AM -0500, Mathieu Desnoyers wrote:
>> Looking into solving this, one key issue seems to get in the way: cpusets
>> appear to care about not allowing creation of a cpuset which has no
>> currently active CPU to run on, e.g.:
> ...
>> Clearly, there is an intent that cpusets take the active mask into
>> account to prohibit creating an empty cpuset, but nothing prevents
>> cpu hotplug from creating an empty cpuset.
>> 
>> I wonder how to solve this inconsistency?
> 
> Please try cpuset in cgroup2. It shouldn't have those issues.

After figuring out how to use cgroup2 (the systemd.unified_cgroup_hierarchy=1 boot parameter helped tremendously) and testing similar scenarios, it indeed seems to have a much saner behavior than cgroup1.

Considering that the allowed cpu mask is weird wrt cgroup1 and cpu hotplug, and that cgroup2 allows thread-level granularity, it does not make much sense to prevent the pin_on_cpu() system call I am working on from pinning on cpus which are not present in the allowed mask.

I'm currently investigating approaches that would detect situations where a thread is pinned onto a CPU which is not part of its allowed mask, and set the task prio at MAX_PRIO-1 (the lowest fair priority possible) in those cases. The basic idea is to allow applications to pin to every possible cpu, but not allow them to use this to consume a lot of cpu time on CPUs on which they are not allowed to run.

Thoughts?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [regression] cpuset: offlined CPUs removed from affinity masks
  2020-03-12 19:47 ` Mathieu Desnoyers
@ 2020-03-24 18:01 ` Tejun Heo
  2020-03-24 19:30 ` Mathieu Desnoyers
  0 siblings, 1 reply; 16+ messages in thread
From: Tejun Heo @ 2020-03-24 18:01 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Li Zefan, cgroups, linux-kernel, Peter Zijlstra, Ingo Molnar, Valentin Schneider, Thomas Gleixner

Sorry about the long delay.

On Thu, Mar 12, 2020 at 03:47:50PM -0400, Mathieu Desnoyers wrote:
> The basic idea is to allow applications to pin to every possible cpu, but
> not allow them to use this to consume a lot of cpu time on CPUs on which
> they are not allowed to run.
> 
> Thoughts?

One thing that we learned is that priority alone isn't enough to isolate cpu consumption, no matter how low the priority may be, if the workload is latency sensitive. The actual computation capacity of cpus gets saturated way before cpu time is saturated, and the latency impact from lowered mips becomes noticeable. So, depending on workloads, allowing threads to run at the lowest priority on disallowed cpus might not lead to behaviors that users expect, but I have no idea what kind of usage models you have in mind for the new system call.

Thanks.

-- 
tejun

^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [regression] cpuset: offlined CPUs removed from affinity masks
  2020-03-24 18:01 ` Tejun Heo
@ 2020-03-24 19:30 ` Mathieu Desnoyers
  2020-03-30 19:53 ` Mathieu Desnoyers
  0 siblings, 1 reply; 16+ messages in thread
From: Mathieu Desnoyers @ 2020-03-24 19:30 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Li Zefan, cgroups, linux-kernel, Peter Zijlstra, Ingo Molnar, Valentin Schneider, Thomas Gleixner

----- On Mar 24, 2020, at 2:01 PM, Tejun Heo tj@kernel.org wrote:

> On Thu, Mar 12, 2020 at 03:47:50PM -0400, Mathieu Desnoyers wrote:
>> The basic idea is to allow applications to pin to every possible cpu, but
>> not allow them to use this to consume a lot of cpu time on CPUs on which
>> they are not allowed to run.
>> 
>> Thoughts?
> 
> One thing that we learned is that priority alone isn't enough to isolate cpu
> consumption, no matter how low the priority may be, if the workload is latency
> sensitive. The actual computation capacity of cpus gets saturated way before cpu
> time is saturated, and the latency impact from lowered mips becomes noticeable.
> So, depending on workloads, allowing threads to run at the lowest priority on
> disallowed cpus might not lead to behaviors that users expect, but I have no
> idea what kind of usage models you have in mind for the new system call.

Let me take a step back and focus on the requirements for the moment. It should help us navigate more easily through the various solutions available to us.

Two goals are to enable use-cases such as user-space memory allocator migration of free memory (typically single-process), and issuing operations on each CPU's per-CPU data from the consumer of a user-space per-CPU ring buffer (multi-process over shared memory).

For the memory allocator use-case, one scenario which illustrates the situation well is related to CPU hotplug: with per-CPU memory pools, what should the application do when a CPU goes offline?
Ideally, it should have a manager thread able to detect that a CPU is offline, and be able to reclaim free memory or move it into other CPUs' pools. However, considering that user-space has no means to do this synchronously wrt CPU hotplug, the CPU may very well come back online and start using those data structures once more, so we cannot presume mutual exclusion from an offline CPU. One way to achieve this is by allowing user-space to run rseq critical sections targeting the per-CPU (user-space) data of any possible CPU.

However, when considering allowing threads to pin themselves on any of the possible CPUs, three concerns arise:

- CPU hotplug (offline CPUs),
- the sched_setaffinity affinity mask, which can be set either internally by the process or externally by a manager process,
- the cgroup cpuset allowed mask, which can be set either internally or by a manager process.

For offline CPUs, the pin_on_cpu system call ensures that a task can run on a "backup runqueue" when it pins itself onto an offline CPU. The current algorithm is to choose the first online CPU's runqueue for this. As soon as the offline CPU is brought back online, all tasks pinned to that CPU are moved to their rightful runqueue.

For sched_setaffinity's affinity mask, I don't think it is such an issue, because pinning onto specific CPUs does not provide more rights than what could have been done by setting the affinity mask to a single CPU. The main difference between sched_setaffinity to a single cpu and pin_on_cpu is the behavior when the target CPU goes offline: sched_setaffinity then allows the thread to move to any runqueue (which is really bad for rseq purposes), whereas pin_on_cpu moves the thread to a runqueue which is guaranteed to be the same for all threads which want to be pinned on that CPU.
Then there is the issue of the cgroup cpuset: AFAIU, cgroup v1's integration with CPU hotplug removes the offlined CPUs from the cgroup's allowed mask, which basically breaks the memory allocator free-memory migration/reclaim use-case, because there is then no way to target an offline CPU if we apply the cgroup's allowed mask.

For cgroup v2, AFAIU it allows creation of groups which target specific threads within a process. Therefore, some threads could have an allowed mask which differs from others. In this kind of scenario, it's not possible to have a manager thread allowed to pin itself onto each CPU which can be accessed by other threads in the same process.

Also, for the multi-process shared memory use-case (ring buffer), if the various processes which interact with the same shared memory end up in different cgroups allowed to run on different subsets of the possible CPUs, it becomes impossible to have a consumer allowed to pin itself on all the CPUs it needs.

Ideally, I would like to come up with an approach that is not fragile when combined with cgroups or cpu hotplug. One approach I have envisioned is to allow pin_on_cpu to target CPUs which are not part of the cpuset's allowed mask, but lower the priority of the threads to the lowest possible priority while doing so. That approach would allow threads to pin themselves on basically any CPU part of the possible cpu mask. But as you point out, maybe this is an issue in terms of workload isolation. I welcome ideas on how to solve this.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [regression] cpuset: offlined CPUs removed from affinity masks
  2020-03-24 19:30 ` Mathieu Desnoyers
@ 2020-03-30 19:53 ` Mathieu Desnoyers
  0 siblings, 0 replies; 16+ messages in thread
From: Mathieu Desnoyers @ 2020-03-30 19:53 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Li Zefan, cgroups, linux-kernel, Peter Zijlstra, Ingo Molnar, Valentin Schneider, Thomas Gleixner

----- On Mar 24, 2020, at 3:30 PM, Mathieu Desnoyers mathieu.desnoyers@efficios.com wrote:

> ----- On Mar 24, 2020, at 2:01 PM, Tejun Heo tj@kernel.org wrote:
> 
>> On Thu, Mar 12, 2020 at 03:47:50PM -0400, Mathieu Desnoyers wrote:
>>> The basic idea is to allow applications to pin to every possible cpu, but
>>> not allow them to use this to consume a lot of cpu time on CPUs on which
>>> they are not allowed to run.
>>> 
>>> Thoughts?
>> 
>> One thing that we learned is that priority alone isn't enough to isolate cpu
>> consumption, no matter how low the priority may be, if the workload is latency
>> sensitive. The actual computation capacity of cpus gets saturated way before cpu
>> time is saturated, and the latency impact from lowered mips becomes noticeable.
>> So, depending on workloads, allowing threads to run at the lowest priority on
>> disallowed cpus might not lead to behaviors that users expect, but I have no
>> idea what kind of usage models you have in mind for the new system call.
> [...]

One possibility would be to use the SCHED_IDLE scheduling class rather than SCHED_OTHER with nice +19.

The unfortunate side-effect, AFAIU, shows up when a thread requests to be pinned on a CPU which is continuously overcommitted. It may never run. This could come as a surprise to the user.

The only case where this would happen is if:

- A thread is pinned on CPU N, and
- CPU N is not part of the allowed mask for the task's cpuset (and is overcommitted), or
- CPU N is offline, and the fallback CPU is not part of the allowed mask for the task's cpuset (and is overcommitted).

Is this acceptable behavior?
How is userspace supposed to detect this kind of situation and mitigate it?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads: [~2020-03-30 19:53 UTC | newest]

Thread overview: 16+ messages -- links below jump to the message on this page --
2020-01-16 17:41 [regression] cpuset: offlined CPUs removed from affinity masks Mathieu Desnoyers
2020-01-16 18:27 ` Valentin Schneider
2020-02-17 16:03 ` Mathieu Desnoyers
2020-02-19 15:19   ` Tejun Heo
2020-02-19 15:43     ` Mathieu Desnoyers
2020-02-19 15:47       ` Tejun Heo
2020-02-19 15:50         ` Mathieu Desnoyers
2020-02-19 15:52           ` Tejun Heo
2020-02-19 16:08             ` Mathieu Desnoyers
2020-02-19 16:12               ` Tejun Heo
2020-03-07 16:06                 ` Mathieu Desnoyers
2020-03-12 18:26                   ` Tejun Heo
2020-03-12 19:47                     ` Mathieu Desnoyers
2020-03-24 18:01                       ` Tejun Heo
2020-03-24 19:30                         ` Mathieu Desnoyers
2020-03-30 19:53                           ` Mathieu Desnoyers