From: lizf@kernel.org
To: stable@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, Zefan Li <lizefan@huawei.com>,
Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@kernel.org>, Miao Xie <miaox@cn.fujitsu.com>,
Kees Cook <keescook@chromium.org>, Tejun Heo <tj@kernel.org>
Subject: [PATCH 3.4 64/91] cpuset: PF_SPREAD_PAGE and PF_SPREAD_SLAB should be atomic flags
Date: Thu, 27 Nov 2014 16:42:47 +0800 [thread overview]
Message-ID: <1417077794-9299-64-git-send-email-lizf@kernel.org> (raw)
In-Reply-To: <1417077368-9217-1-git-send-email-lizf@kernel.org>
From: Zefan Li <lizefan@huawei.com>
3.4.105-rc1 review patch. If anyone has any objections, please let me know.
------------------
commit 2ad654bc5e2b211e92f66da1d819e47d79a866f0 upstream.
When we change cpuset.memory_spread_{page,slab}, cpuset will flip
PF_SPREAD_{PAGE,SLAB} bit of tsk->flags for each task in that cpuset.
This should be done using atomic bitops, but currently we don't,
which is broken.
Tetsuo reported a hard-to-reproduce kernel crash on RHEL6, which happened
when one thread tried to clear PF_USED_MATH while at the same time another
thread tried to flip PF_SPREAD_PAGE/PF_SPREAD_SLAB. They both operate on
the same task.
Here's the full report:
https://lkml.org/lkml/2014/9/19/230
To fix this, we make PF_SPREAD_PAGE and PF_SPREAD_SLAB atomic flags.
v4:
- updated mm/slab.c. (Fengguang Wu)
- updated Documentation.
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Miao Xie <miaox@cn.fujitsu.com>
Cc: Kees Cook <keescook@chromium.org>
Fixes: 950592f7b991 ("cpusets: update tasks' page/slab spread flags in time")
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Zefan Li <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
[lizf: Backported to 3.4:
- adjust context
- check current->flags & PF_MEMPOLICY rather than current->mempolicy]
---
Documentation/cgroups/cpusets.txt | 6 +++---
include/linux/cpuset.h | 4 ++--
include/linux/sched.h | 12 ++++++++++--
kernel/cpuset.c | 9 +++++----
mm/slab.c | 4 ++--
5 files changed, 22 insertions(+), 13 deletions(-)
diff --git a/Documentation/cgroups/cpusets.txt b/Documentation/cgroups/cpusets.txt
index cefd3d8..a52a39f 100644
--- a/Documentation/cgroups/cpusets.txt
+++ b/Documentation/cgroups/cpusets.txt
@@ -345,14 +345,14 @@ the named feature on.
The implementation is simple.
Setting the flag 'cpuset.memory_spread_page' turns on a per-process flag
-PF_SPREAD_PAGE for each task that is in that cpuset or subsequently
+PFA_SPREAD_PAGE for each task that is in that cpuset or subsequently
joins that cpuset. The page allocation calls for the page cache
-is modified to perform an inline check for this PF_SPREAD_PAGE task
+is modified to perform an inline check for this PFA_SPREAD_PAGE task
flag, and if set, a call to a new routine cpuset_mem_spread_node()
returns the node to prefer for the allocation.
Similarly, setting 'cpuset.memory_spread_slab' turns on the flag
-PF_SPREAD_SLAB, and appropriately marked slab caches will allocate
+PFA_SPREAD_SLAB, and appropriately marked slab caches will allocate
pages from the node returned by cpuset_mem_spread_node().
The cpuset_mem_spread_node() routine is also simple. It uses the
diff --git a/include/linux/cpuset.h b/include/linux/cpuset.h
index 668f66b..bb48be7 100644
--- a/include/linux/cpuset.h
+++ b/include/linux/cpuset.h
@@ -74,12 +74,12 @@ extern int cpuset_slab_spread_node(void);
static inline int cpuset_do_page_mem_spread(void)
{
- return current->flags & PF_SPREAD_PAGE;
+ return task_spread_page(current);
}
static inline int cpuset_do_slab_mem_spread(void)
{
- return current->flags & PF_SPREAD_SLAB;
+ return task_spread_slab(current);
}
extern int current_cpuset_is_being_rebound(void);
diff --git a/include/linux/sched.h b/include/linux/sched.h
index b85b719..56d8233 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1834,8 +1834,6 @@ extern void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t *
#define PF_KTHREAD 0x00200000 /* I am a kernel thread */
#define PF_RANDOMIZE 0x00400000 /* randomize virtual address space */
#define PF_SWAPWRITE 0x00800000 /* Allowed to write to swap */
-#define PF_SPREAD_PAGE 0x01000000 /* Spread page cache over cpuset */
-#define PF_SPREAD_SLAB 0x02000000 /* Spread some slab caches over cpuset */
#define PF_THREAD_BOUND 0x04000000 /* Thread bound to specific cpu */
#define PF_MCE_EARLY 0x08000000 /* Early kill for mce process policy */
#define PF_MEMPOLICY 0x10000000 /* Non-default NUMA mempolicy */
@@ -1868,6 +1866,8 @@ extern void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t *
#define used_math() tsk_used_math(current)
/* Per-process atomic flags. */
+#define PFA_SPREAD_PAGE 1 /* Spread page cache over cpuset */
+#define PFA_SPREAD_SLAB 2 /* Spread some slab caches over cpuset */
#define TASK_PFA_TEST(name, func) \
static inline bool task_##func(struct task_struct *p) \
@@ -1970,6 +1970,14 @@ static inline int set_cpus_allowed(struct task_struct *p, cpumask_t new_mask)
}
#endif
+TASK_PFA_TEST(SPREAD_PAGE, spread_page)
+TASK_PFA_SET(SPREAD_PAGE, spread_page)
+TASK_PFA_CLEAR(SPREAD_PAGE, spread_page)
+
+TASK_PFA_TEST(SPREAD_SLAB, spread_slab)
+TASK_PFA_SET(SPREAD_SLAB, spread_slab)
+TASK_PFA_CLEAR(SPREAD_SLAB, spread_slab)
+
/*
* Do not use outside of architecture code which knows its limitations.
*
diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index 9cb82b9..7f3bde5 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -326,13 +326,14 @@ static void cpuset_update_task_spread_flag(struct cpuset *cs,
struct task_struct *tsk)
{
if (is_spread_page(cs))
- tsk->flags |= PF_SPREAD_PAGE;
+ task_set_spread_page(tsk);
else
- tsk->flags &= ~PF_SPREAD_PAGE;
+ task_clear_spread_page(tsk);
+
if (is_spread_slab(cs))
- tsk->flags |= PF_SPREAD_SLAB;
+ task_set_spread_slab(tsk);
else
- tsk->flags &= ~PF_SPREAD_SLAB;
+ task_clear_spread_slab(tsk);
}
/*
diff --git a/mm/slab.c b/mm/slab.c
index 3eb1c38..3714dd9 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3321,7 +3321,7 @@ static inline void *____cache_alloc(struct kmem_cache *cachep, gfp_t flags)
#ifdef CONFIG_NUMA
/*
- * Try allocating on another node if PF_SPREAD_SLAB|PF_MEMPOLICY.
+ * Try allocating on another node if PFA_SPREAD_SLAB|PF_MEMPOLICY.
*
* If we are in_interrupt, then process context, including cpusets and
* mempolicy, may not apply and should not be used for allocation policy.
@@ -3562,7 +3562,7 @@ __do_cache_alloc(struct kmem_cache *cache, gfp_t flags)
{
void *objp;
- if (unlikely(current->flags & (PF_SPREAD_SLAB | PF_MEMPOLICY))) {
+ if (unlikely((current->flags & PF_MEMPOLICY) || cpuset_do_slab_mem_spread())) {
objp = alternate_node_alloc(cache, flags);
if (objp)
goto out;
--
1.9.1
next prev parent reply other threads:[~2014-11-27 8:57 UTC|newest]
Thread overview: 98+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-11-27 8:36 [PATCH 3.4 00/91] 3.4.105-rc1 review lizf
2014-11-27 8:41 ` [PATCH 3.4 01/91] KVM: s390: Fix user triggerable bug in dead code lizf
2014-11-27 8:41 ` [PATCH 3.4 02/91] regmap: Fix handling of volatile registers for format_write() chips lizf
2014-11-27 8:41 ` [PATCH 3.4 03/91] drm/i915: Remove bogus __init annotation from DMI callbacks lizf
2014-11-27 8:41 ` [PATCH 3.4 04/91] get rid of propagate_umount() mistakenly treating slaves as busy lizf
2014-11-27 8:41 ` [PATCH 3.4 05/91] drm/vmwgfx: Fix a potential infinite spin waiting for fifo idle lizf
2014-11-27 8:41 ` [PATCH 3.4 06/91] ALSA: hda - Fix COEF setups for ALC1150 codec lizf
2014-11-27 8:41 ` [PATCH 3.4 07/91] ACPI / cpuidle: fix deadlock between cpuidle_lock and cpu_hotplug.lock lizf
2014-11-27 8:41 ` [PATCH 3.4 08/91] regulatory: add NUL to alpha2 lizf
2014-11-27 8:41 ` [PATCH 3.4 09/91] percpu: fix pcpu_alloc_pages() failure path lizf
2014-11-27 8:41 ` [PATCH 3.4 10/91] percpu: perform tlb flush after pcpu_map_pages() failure lizf
2014-11-27 8:41 ` [PATCH 3.4 11/91] percpu: free percpu allocation info for uniprocessor system lizf
2014-11-27 8:41 ` [PATCH 3.4 12/91] cgroup: reject cgroup names with ' ' lizf
2014-11-27 8:41 ` [PATCH 3.4 13/91] rtlwifi: rtl8192cu: Add new ID lizf
2014-11-27 8:41 ` [PATCH 3.4 14/91] ahci: Add Device IDs for Intel 9 Series PCH lizf
2014-11-27 8:41 ` [PATCH 3.4 15/91] ata_piix: " lizf
2014-11-27 8:41 ` [PATCH 3.4 16/91] USB: ftdi_sio: add support for NOVITUS Bono E thermal printer lizf
2014-11-27 8:42 ` [PATCH 3.4 17/91] USB: sierra: avoid CDC class functions on "68A3" devices lizf
2014-11-27 8:42 ` [PATCH 3.4 18/91] USB: sierra: add 1199:68AA device ID lizf
2014-11-27 8:42 ` [PATCH 3.4 19/91] xen/manage: Always freeze/thaw processes when suspend/resuming lizf
2014-11-27 8:42 ` [PATCH 3.4 20/91] block: Fix dev_t minor allocation lifetime lizf
2014-11-27 8:42 ` [PATCH 3.4 21/91] usb: dwc3: core: fix order of PM runtime calls lizf
2014-11-27 8:42 ` [PATCH 3.4 22/91] ahci: add pcid for Marvel 0x9182 controller lizf
2014-11-27 8:42 ` [PATCH 3.4 23/91] drm/radeon: add connector quirk for fujitsu board lizf
2014-11-27 8:42 ` [PATCH 3.4 24/91] usb: host: xhci: fix compliance mode workaround lizf
2014-11-27 8:42 ` [PATCH 3.4 25/91] Input: elantech - fix detection of touchpad on ASUS s301l lizf
2014-11-27 8:42 ` [PATCH 3.4 26/91] USB: ftdi_sio: Add support for GE Healthcare Nemo Tracker device lizf
2014-11-27 8:42 ` [PATCH 3.4 27/91] uwb: init beacon cache entry before registering uwb device lizf
2014-11-27 8:42 ` [PATCH 3.4 28/91] Input: synaptics - add support for ForcePads lizf
2014-11-27 8:42 ` [PATCH 3.4 29/91] libceph: gracefully handle large reply messages from the mon lizf
2014-11-27 8:42 ` [PATCH 3.4 30/91] libceph: add process_one_ticket() helper lizf
2014-11-27 8:42 ` [PATCH 3.4 31/91] libceph: do not hard code max auth ticket len lizf
2014-11-27 8:42 ` [PATCH 3.4 32/91] Input: serport - add compat handling for SPIOCSTYPE ioctl lizf
2014-11-27 8:42 ` [PATCH 3.4 33/91] usb: hub: take hub->hdev reference when processing from eventlist lizf
2014-11-27 8:42 ` [PATCH 3.4 34/91] storage: Add single-LUN quirk for Jaz USB Adapter lizf
2014-11-27 8:42 ` [PATCH 3.4 35/91] xhci: Fix null pointer dereference if xhci initialization fails lizf
2014-11-27 8:42 ` [PATCH 3.4 36/91] futex: Unlock hb->lock in futex_wait_requeue_pi() error path lizf
2014-11-27 8:42 ` [PATCH 3.4 37/91] alarmtimer: Return relative times in timer_gettime lizf
2014-11-27 8:42 ` [PATCH 3.4 38/91] alarmtimer: Do not signal SIGEV_NONE timers lizf
2014-11-27 8:42 ` [PATCH 3.4 39/91] alarmtimer: Lock k_itimer during timer callback lizf
2014-11-27 8:42 ` [PATCH 3.4 40/91] don't bugger nd->seq on set_root_rcu() from follow_dotdot_rcu() lizf
2014-11-27 8:42 ` [PATCH 3.4 41/91] jiffies: Fix timeval conversion to jiffies lizf
2014-11-27 8:42 ` [PATCH 3.4 42/91] MIPS: ZBOOT: add missing <linux/string.h> include lizf
2014-11-27 8:42 ` [PATCH 3.4 43/91] perf: Fix a race condition in perf_remove_from_context() lizf
2014-12-01 18:43 ` Sukadev Bhattiprolu
2014-12-02 1:22 ` Zefan Li
2014-11-27 8:42 ` [PATCH 3.4 44/91] ASoC: samsung-i2s: Check secondary DAI exists before referencing lizf
2014-11-27 8:42 ` [PATCH 3.4 45/91] Input: i8042 - add Fujitsu U574 to no_timeout dmi table lizf
2014-11-27 8:42 ` [PATCH 3.4 46/91] Input: i8042 - add nomux quirk for Avatar AVIU-145A6 lizf
2014-11-27 8:42 ` [PATCH 3.4 47/91] iscsi-target: Fix memory corruption in iscsit_logout_post_handler_diffcid lizf
2014-11-27 8:42 ` [PATCH 3.4 48/91] iscsi-target: avoid NULL pointer in iscsi_copy_param_list failure lizf
2014-11-27 8:42 ` [PATCH 3.4 49/91] NFSv4: Fix another bug in the close/open_downgrade code lizf
2014-11-27 8:42 ` [PATCH 3.4 50/91] libiscsi: fix potential buffer overrun in __iscsi_conn_send_pdu lizf
2014-11-27 8:42 ` [PATCH 3.4 51/91] USB: storage: Add quirk for Adaptec USBConnect 2000 USB-to-SCSI Adapter lizf
2014-11-27 8:42 ` [PATCH 3.4 52/91] USB: storage: Add quirk for Ariston Technologies iConnect USB to SCSI adapter lizf
2014-11-27 8:42 ` [PATCH 3.4 53/91] USB: storage: Add quirks for Entrega/Xircom USB to SCSI converters lizf
2014-11-27 8:42 ` [PATCH 3.4 54/91] can: flexcan: mark TX mailbox as TX_INACTIVE lizf
2014-11-27 8:42 ` [PATCH 3.4 55/91] can: flexcan: correctly initialize mailboxes lizf
2014-11-27 8:42 ` [PATCH 3.4 56/91] can: flexcan: implement workaround for errata ERR005829 lizf
2014-11-27 8:42 ` [PATCH 3.4 57/91] can: flexcan: put TX mailbox into TX_INACTIVE mode after tx-complete lizf
2014-11-27 8:42 ` [PATCH 3.4 58/91] can: at91_can: add missing prepare and unprepare of the clock lizf
2014-11-27 8:42 ` [PATCH 3.4 59/91] ALSA: pcm: fix fifo_size frame calculation lizf
2014-11-27 8:42 ` [PATCH 3.4 60/91] Fix nasty 32-bit overflow bug in buffer i/o code lizf
2014-11-27 8:42 ` [PATCH 3.4 61/91] parisc: Only use -mfast-indirect-calls option for 32-bit kernel builds lizf
2014-11-27 8:42 ` [PATCH 3.4 62/91] sched: Fix unreleased llc_shared_mask bit during CPU hotplug lizf
2014-11-27 8:42 ` [PATCH 3.4 63/91] sched: add macros to define bitops for task atomic flags lizf
2014-11-27 8:42 ` lizf [this message]
2014-11-27 8:42 ` [PATCH 3.4 65/91] MIPS: mcount: Adjust stack pointer for static trace in MIPS32 lizf
2014-11-27 8:42 ` [PATCH 3.4 66/91] nilfs2: fix data loss with mmap() lizf
2014-11-27 8:42 ` [PATCH 3.4 67/91] ocfs2/dlm: do not get resource spinlock if lockres is new lizf
2014-11-27 8:42 ` [PATCH 3.4 68/91] shmem: fix nlink for rename overwrite directory lizf
2014-11-27 8:42 ` [PATCH 3.4 69/91] ARM: 8165/1: alignment: don't break misaligned NEON load/store lizf
2014-11-27 8:42 ` [PATCH 3.4 70/91] ASoC: core: fix possible ZERO_SIZE_PTR pointer dereferencing error lizf
2014-11-27 8:42 ` [PATCH 3.4 71/91] mm: migrate: Close race between migration completion and mprotect lizf
2014-11-27 8:42 ` [PATCH 3.4 72/91] perf: fix perf bug in fork() lizf
2014-11-27 8:42 ` [PATCH 3.4 73/91] init/Kconfig: Hide printk log config if CONFIG_PRINTK=n lizf
2014-11-27 8:42 ` [PATCH 3.4 74/91] genhd: fix leftover might_sleep() in blk_free_devt() lizf
2014-11-27 8:42 ` [PATCH 3.4 75/91] nl80211: clear skb cb before passing to netlink lizf
2014-11-27 8:42 ` [PATCH 3.4 76/91] ext4: propagate errors up to ext4_find_entry()'s callers lizf
2014-11-27 8:43 ` [PATCH 3.4 77/91] ext4: avoid trying to kfree an ERR_PTR pointer lizf
2014-11-27 8:43 ` [PATCH 3.4 78/91] NFS: fix stable regression lizf
2014-11-27 8:43 ` [PATCH 3.4 79/91] perf: Handle compat ioctl lizf
2014-11-27 8:43 ` [PATCH 3.4 80/91] bluetooth: hci_ldisc: fix deadlock condition lizf
2014-11-27 8:43 ` [PATCH 3.4 81/91] mnt: Only change user settable mount flags in remount lizf
2014-11-27 8:43 ` [PATCH 3.4 82/91] dm crypt: fix access beyond the end of allocated space lizf
2014-11-27 8:43 ` [PATCH 3.4 83/91] Fix spurious request sense in error handling lizf
2014-11-27 8:43 ` [PATCH 3.4 84/91] ipv4: move route garbage collector to work queue lizf
2014-11-27 8:43 ` [PATCH 3.4 85/91] ipv4: avoid parallel route cache gc executions lizf
2014-11-27 8:43 ` [PATCH 3.4 86/91] ipv4: disable bh while doing route gc lizf
2014-11-27 8:43 ` [PATCH 3.4 87/91] rtl8192ce: Fix null dereference in watchdog lizf
2014-11-27 8:43 ` [PATCH 3.4 88/91] ipv6: reuse ip6_frag_id from ip6_ufo_append_data lizf
2014-11-27 8:43 ` [PATCH 3.4 89/91] net: Do not enable tx-nocache-copy by default lizf
2014-11-27 8:43 ` [PATCH 3.4 90/91] ixgbevf: Prevent RX/TX statistics getting reset to zero lizf
2014-11-27 8:43 ` [PATCH 3.4 91/91] l2tp: fix race while getting PMTU on PPP pseudo-wire lizf
2014-11-27 9:07 ` [PATCH 3.4 00/91] 3.4.105-rc1 review Zefan Li
2014-11-27 20:15 ` Guenter Roeck
2014-11-29 5:54 ` Zefan Li
2015-01-28 4:07 [PATCH 3.4 000/177] 3.4.106-rc1 review lizf
2015-01-28 4:08 ` [PATCH 3.4 64/91] cpuset: PF_SPREAD_PAGE and PF_SPREAD_SLAB should be atomic flags lizf
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1417077794-9299-64-git-send-email-lizf@kernel.org \
--to=lizf@kernel.org \
--cc=keescook@chromium.org \
--cc=linux-kernel@vger.kernel.org \
--cc=lizefan@huawei.com \
--cc=miaox@cn.fujitsu.com \
--cc=mingo@kernel.org \
--cc=peterz@infradead.org \
--cc=stable@vger.kernel.org \
--cc=tj@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).