All of lore.kernel.org
 help / color / mirror / Atom feed
From: Willy Tarreau <w@1wt.eu>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: "Herton R. Krzesinski" <herton@redhat.com>,
	Manfred Spraul <manfred@colorfullife.com>,
	Davidlohr Bueso <dave@stgolabs.net>,
	Rafael Aquini <aquini@redhat.com>,
	Aristeu Rozanski <aris@redhat.com>,
	David Jeffery <djeffery@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Ben Hutchings <ben@decadent.org.uk>, Willy Tarreau <w@1wt.eu>
Subject: [PATCH 2.6.32 13/38] [PATCH 13/38] ipc,sem: fix use after free on IPC_RMID after a task using same semaphore set exits
Date: Sun, 29 Nov 2015 22:47:15 +0100	[thread overview]
Message-ID: <20151129214703.429050559@1wt.eu> (raw)
In-Reply-To: <8acf8256ccc72771a80b7851061027bc@local>

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

commit 602b8593d2b4138c10e922eeaafe306f6b51817b upstream.

The current semaphore code allows a potential use after free: in
exit_sem we may free the task's sem_undo_list while there is still
another task looping through the same semaphore set and cleaning the
sem_undo list at freeary function (the task called IPC_RMID for the same
semaphore set).

For example, with a test program [1] running which keeps forking a lot
of processes (which then do a semop call with SEM_UNDO flag), and with
the parent right after removing the semaphore set with IPC_RMID, and a
kernel built with CONFIG_SLAB, CONFIG_SLAB_DEBUG and
CONFIG_DEBUG_SPINLOCK, you can easily see something like the following
in the kernel log:

   Slab corruption (Not tainted): kmalloc-64 start=ffff88003b45c1c0, len=64
   000: 6b 6b 6b 6b 6b 6b 6b 6b 00 6b 6b 6b 6b 6b 6b 6b  kkkkkkkk.kkkkkkk
   010: ff ff ff ff 6b 6b 6b 6b ff ff ff ff ff ff ff ff  ....kkkk........
   Prev obj: start=ffff88003b45c180, len=64
   000: 00 00 00 00 ad 4e ad de ff ff ff ff 5a 5a 5a 5a  .....N......ZZZZ
   010: ff ff ff ff ff ff ff ff c0 fb 01 37 00 88 ff ff  ...........7....
   Next obj: start=ffff88003b45c200, len=64
   000: 00 00 00 00 ad 4e ad de ff ff ff ff 5a 5a 5a 5a  .....N......ZZZZ
   010: ff ff ff ff ff ff ff ff 68 29 a7 3c 00 88 ff ff  ........h).<....
   BUG: spinlock wrong CPU on CPU#2, test/18028
   general protection fault: 0000 [#1] SMP
   Modules linked in: 8021q mrp garp stp llc nf_conntrack_ipv4 nf_defrag_ipv4 ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables binfmt_misc ppdev input_leds joydev parport_pc parport floppy serio_raw virtio_balloon virtio_rng virtio_console virtio_net iosf_mbi crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcspkr qxl ttm drm_kms_helper drm snd_hda_codec_generic i2c_piix4 snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore crc32c_intel virtio_pci virtio_ring virtio pata_acpi ata_generic [last unloaded: speedstep_lib]
   CPU: 2 PID: 18028 Comm: test Not tainted 4.2.0-rc5+ #1
   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150318_183358- 04/01/2014
   RIP: spin_dump+0x53/0xc0
   Call Trace:
     spin_bug+0x30/0x40
     do_raw_spin_unlock+0x71/0xa0
     _raw_spin_unlock+0xe/0x10
     freeary+0x82/0x2a0
     ? _raw_spin_lock+0xe/0x10
     semctl_down.clone.0+0xce/0x160
     ? __do_page_fault+0x19a/0x430
     ? __audit_syscall_entry+0xa8/0x100
     SyS_semctl+0x236/0x2c0
     ? syscall_trace_leave+0xde/0x130
     entry_SYSCALL_64_fastpath+0x12/0x71
   Code: 8b 80 88 03 00 00 48 8d 88 60 05 00 00 48 c7 c7 a0 2c a4 81 31 c0 65 8b 15 eb 40 f3 7e e8 08 31 68 00 4d 85 e4 44 8b 4b 08 74 5e <45> 8b 84 24 88 03 00 00 49 8d 8c 24 60 05 00 00 8b 53 04 48 89
   RIP  [<ffffffff810d6053>] spin_dump+0x53/0xc0
    RSP <ffff88003750fd68>
   ---[ end trace 783ebb76612867a0 ]---
   NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [test:18053]
   Modules linked in: 8021q mrp garp stp llc nf_conntrack_ipv4 nf_defrag_ipv4 ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables binfmt_misc ppdev input_leds joydev parport_pc parport floppy serio_raw virtio_balloon virtio_rng virtio_console virtio_net iosf_mbi crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcspkr qxl ttm drm_kms_helper drm snd_hda_codec_generic i2c_piix4 snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore crc32c_intel virtio_pci virtio_ring virtio pata_acpi ata_generic [last unloaded: speedstep_lib]
   CPU: 3 PID: 18053 Comm: test Tainted: G      D         4.2.0-rc5+ #1
   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150318_183358- 04/01/2014
   RIP: native_read_tsc+0x0/0x20
   Call Trace:
     ? delay_tsc+0x40/0x70
     __delay+0xf/0x20
     do_raw_spin_lock+0x96/0x140
     _raw_spin_lock+0xe/0x10
     sem_lock_and_putref+0x11/0x70
     SYSC_semtimedop+0x7bf/0x960
     ? handle_mm_fault+0xbf6/0x1880
     ? dequeue_task_fair+0x79/0x4a0
     ? __do_page_fault+0x19a/0x430
     ? kfree_debugcheck+0x16/0x40
     ? __do_page_fault+0x19a/0x430
     ? __audit_syscall_entry+0xa8/0x100
     ? do_audit_syscall_entry+0x66/0x70
     ? syscall_trace_enter_phase1+0x139/0x160
     SyS_semtimedop+0xe/0x10
     SyS_semop+0x10/0x20
     entry_SYSCALL_64_fastpath+0x12/0x71
   Code: 47 10 83 e8 01 85 c0 89 47 10 75 08 65 48 89 3d 1f 74 ff 7e c9 c3 0f 1f 44 00 00 55 48 89 e5 e8 87 17 04 00 66 90 c9 c3 0f 1f 00 <55> 48 89 e5 0f 31 89 c1 48 89 d0 48 c1 e0 20 89 c9 48 09 c8 c9
   Kernel panic - not syncing: softlockup: hung tasks

I wasn't able to trigger any badness on a recent kernel without the
proper config debugs enabled, however I have softlockup reports on some
kernel versions, in the semaphore code, which are similar as above (the
scenario is seen on some servers running IBM DB2 which uses semaphore
syscalls).

The patch here fixes the race against freeary, by acquiring or waiting
on the sem_undo_list lock as necessary (exit_sem can race with freeary,
while freeary sets un->semid to -1 and removes the same sem_undo from
list_proc or when it removes the last sem_undo).

After the patch I'm unable to reproduce the problem using the test case
[1].

[1] Test case used below:

    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/ipc.h>
    #include <sys/sem.h>
    #include <sys/wait.h>
    #include <stdlib.h>
    #include <time.h>
    #include <unistd.h>
    #include <errno.h>

    #define NSEM 1
    #define NSET 5

    int sid[NSET];

    void thread()
    {
            struct sembuf op;
            int s;
            uid_t pid = getuid();

            s = rand() % NSET;
            op.sem_num = pid % NSEM;
            op.sem_op = 1;
            op.sem_flg = SEM_UNDO;

            semop(sid[s], &op, 1);
            exit(EXIT_SUCCESS);
    }

    void create_set()
    {
            int i, j;
            pid_t p;
            union {
                    int val;
                    struct semid_ds *buf;
                    unsigned short int *array;
                    struct seminfo *__buf;
            } un;

            /* Create and initialize semaphore set */
            for (i = 0; i < NSET; i++) {
                    sid[i] = semget(IPC_PRIVATE , NSEM, 0644 | IPC_CREAT);
                    if (sid[i] < 0) {
                            perror("semget");
                            exit(EXIT_FAILURE);
                    }
            }
            un.val = 0;
            for (i = 0; i < NSET; i++) {
                    for (j = 0; j < NSEM; j++) {
                            if (semctl(sid[i], j, SETVAL, un) < 0)
                                    perror("semctl");
                    }
            }

            /* Launch threads that operate on semaphore set */
            for (i = 0; i < NSEM * NSET * NSET; i++) {
                    p = fork();
                    if (p < 0)
                            perror("fork");
                    if (p == 0)
                            thread();
            }

            /* Free semaphore set */
            for (i = 0; i < NSET; i++) {
                    if (semctl(sid[i], NSEM, IPC_RMID))
                            perror("IPC_RMID");
            }

            /* Wait for forked processes to exit */
            while (wait(NULL)) {
                    if (errno == ECHILD)
                            break;
            };
    }

    int main(int argc, char **argv)
    {
            pid_t p;

            srand(time(NULL));

            while (1) {
                    p = fork();
                    if (p < 0) {
                            perror("fork");
                            exit(EXIT_FAILURE);
                    }
                    if (p == 0) {
                            create_set();
                            goto end;
                    }

                    /* Wait for forked processes to exit */
                    while (wait(NULL)) {
                            if (errno == ECHILD)
                                    break;
                    };
            }
    end:
            return 0;
    }

[akpm@linux-foundation.org: use normal comment layout]
Signed-off-by: Herton R. Krzesinski <herton@redhat.com>
Acked-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Rafael Aquini <aquini@redhat.com>
CC: Aristeu Rozanski <aris@redhat.com>
Cc: David Jeffery <djeffery@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit a1c4fb80c5d94ef61a77c1e891cae616a50d8d3c)
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 ipc/sem.c | 23 +++++++++++++++++------
 1 file changed, 17 insertions(+), 6 deletions(-)

diff --git a/ipc/sem.c b/ipc/sem.c
index 26dc5b1..fe3579f 100644
--- a/ipc/sem.c
+++ b/ipc/sem.c
@@ -1296,16 +1296,27 @@ void exit_sem(struct task_struct *tsk)
 		rcu_read_lock();
 		un = list_entry_rcu(ulp->list_proc.next,
 				    struct sem_undo, list_proc);
-		if (&un->list_proc == &ulp->list_proc)
-			semid = -1;
-		 else
-			semid = un->semid;
+		if (&un->list_proc == &ulp->list_proc) {
+			/*
+			 * We must wait for freeary() before freeing this ulp,
+			 * in case we raced with last sem_undo. There is a small
+			 * possibility where we exit while freeary() didn't
+			 * finish unlocking sem_undo_list.
+			 */
+			spin_unlock_wait(&ulp->lock);
+			rcu_read_unlock();
+			break;
+		}
+		spin_lock(&ulp->lock);
+		semid = un->semid;
+		spin_unlock(&ulp->lock);
 		rcu_read_unlock();
 
+		/* exit_sem raced with IPC_RMID, nothing to do */
 		if (semid == -1)
-			break;
+			continue;
 
-		sma = sem_lock_check(tsk->nsproxy->ipc_ns, un->semid);
+		sma = sem_lock_check(tsk->nsproxy->ipc_ns, semid);
 
 		/* exit_sem raced with IPC_RMID, nothing to do */
 		if (IS_ERR(sma))
-- 
1.7.12.2.21.g234cd45.dirty




  parent reply	other threads:[~2015-11-29 22:09 UTC|newest]

Thread overview: 61+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-11-29 21:47 [PATCH 2.6.32 00/38] 2.6.32.69-longterm review Willy Tarreau
2015-11-29 21:47 ` Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 01/38] [PATCH 01/38] dcache: Handle escaped paths in prepend_path Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 03/38] [PATCH 03/38] md: use kzalloc() when bitmap is disabled Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 04/38] [PATCH 04/38] ipv6: addrconf: validate new MTU before applying it Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 05/38] [PATCH 05/38] virtio-net: drop NETIF_F_FRAGLIST Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 06/38] [PATCH 06/38] USB: whiteheat: fix potential null-deref at probe Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 07/38] [PATCH 07/38] ipc/sem.c: fully initialize sem_array before making it visible Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 08/38] [PATCH 08/38] Initialize msg/shm IPC objects before doing ipc_addid() Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 10/38] [PATCH 10/38] rds: fix an integer overflow test in rds_info_getsockopt() Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 11/38] [PATCH 11/38] net: Clone skb before setting peeked flag Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 12/38] [PATCH 12/38] net: Fix skb_set_peeked use-after-free bug Willy Tarreau
2015-11-29 21:47 ` Willy Tarreau [this message]
2015-11-29 21:47 ` [PATCH 2.6.32 14/38] [PATCH 14/38] devres: fix devres_get() Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 15/38] [PATCH 15/38] windfarm: decrement client count when unregistering Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 16/38] [PATCH 16/38] xfs: Fix xfs_attr_leafblock definition Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 17/38] [PATCH 17/38] SUNRPC: xs_reset_transport must mark the connection as disconnected Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 18/38] [PATCH 18/38] Input: evdev - do not report errors form flush() Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 19/38] [PATCH 19/38] pagemap: hide physical addresses from non-privileged users Willy Tarreau
2015-11-30  1:54   ` Ben Hutchings
2015-11-30  7:01     ` Willy Tarreau
2015-11-30  7:01       ` Willy Tarreau
2015-11-30 11:30       ` Willy Tarreau
2015-11-30 11:49         ` Konstantin Khlebnikov
2015-11-30 12:13           ` Willy Tarreau
2015-11-30 14:55         ` Ben Hutchings
2015-11-30 15:14           ` Willy Tarreau
2015-11-30 15:14             ` Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 20/38] [PATCH 20/38] hfs,hfsplus: cache pages correctly between bnode_create and bnode_free Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 21/38] [PATCH 21/38] hfs: fix B-tree corruption after insertion at position 0 Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 22/38] [PATCH 22/38] x86/paravirt: Replace the paravirt nop with a bona fide empty function Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 23/38] [PATCH 23/38] RDS: verify the underlying transport exists before creating a connection Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 24/38] [PATCH 24/38] net: Fix skb csum races when peeking Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 25/38] [PATCH 25/38] net: add length argument to skb_copy_and_csum_datagram_iovec Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 26/38] [PATCH 26/38] module: Fix locking in symbol_put_addr() Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 27/38] [PATCH 27/38] x86/process: Add proper bound checks in 64bit get_wchan() Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 28/38] [PATCH 28/38] mm: hugetlbfs: skip shared VMAs when unmapping private pages to satisfy a fault Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 29/38] [PATCH 29/38] tty: fix stall caused by missing memory barrier in drivers/tty/n_tty.c Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 31/38] [PATCH 31/38] ethtool: Use kcalloc instead of kmalloc for ethtool_get_strings Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 32/38] [PATCH 32/38] HID: core: Avoid uninitialized buffer access Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 33/38] [PATCH 33/38] devres: fix a for loop bounds check Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 34/38] [PATCH 34/38] binfmt_elf: Dont clobber passed executables file header Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 35/38] [PATCH 35/38] RDS-TCP: Recover correctly from pskb_pull()/pksb_trim() failure in rds_tcp_data_recv Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 36/38] [PATCH 36/38] ipmr: fix possible race resulting from improper usage of IP_INC_STATS_BH() in preemptible context Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 37/38] [PATCH 37/38] net: avoid NULL deref in inet_ctl_sock_destroy() Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 38/38] [PATCH 38/38] splice: sendfile() at once fails for big files Willy Tarreau
2015-11-30  1:25 ` [PATCH 2.6.32 09/38] [PATCH 09/38] xhci: fix off by one error in TRB DMA address boundary check Willy Tarreau
2015-11-30  2:04 ` [PATCH 2.6.32 30/38] [PATCH 30/38] mvsas: Fix NULL pointer dereference in mvs_slot_task_free Willy Tarreau
2015-11-30  2:42 ` [PATCH 2.6.32 00/38] 2.6.32.69-longterm review Ben Hutchings
2015-11-30  6:51   ` Willy Tarreau
2015-11-30  6:51     ` Willy Tarreau
2015-11-30 11:23     ` Willy Tarreau
2015-11-30 14:43     ` Ben Hutchings
2015-11-30 15:10       ` Willy Tarreau
     [not found] ` <20151129214702.957590241@1wt.eu>
2015-11-30  6:44   ` [PATCH 2.6.32 02/38] [PATCH 02/38] Failing to send a CLOSE if file is opened WRONLY and server reboots on a 4.x mount Willy Tarreau
2015-11-30 16:04 ` [PATCH 2.6.32 00/38] 2.6.32.69-longterm review Willy Tarreau
2015-11-30 16:04   ` Willy Tarreau
2015-11-30 16:04   ` [PATCH 2.6.32 39/38] vfs: Test for and handle paths that are unreachable from their mnt_root Willy Tarreau
2015-11-30 16:05   ` [PATCH 2.6.32 40/38] security: add cred argument to security_capable() Willy Tarreau
2015-11-30 16:05   ` [PATCH 2.6.32 19/38] pagemap: hide physical addresses from non-privileged users Willy Tarreau
2015-12-01  0:43   ` [PATCH 2.6.32 00/38] 2.6.32.69-longterm review Ben Hutchings
2015-12-01  6:57     ` Willy Tarreau
2015-12-01  6:57       ` Willy Tarreau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20151129214703.429050559@1wt.eu \
    --to=w@1wt.eu \
    --cc=akpm@linux-foundation.org \
    --cc=aquini@redhat.com \
    --cc=aris@redhat.com \
    --cc=ben@decadent.org.uk \
    --cc=dave@stgolabs.net \
    --cc=djeffery@redhat.com \
    --cc=herton@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=manfred@colorfullife.com \
    --cc=stable@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.