All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: Florian Weimer <fweimer@redhat.com>
Cc: carlos <carlos@redhat.com>,
	Joseph Myers <joseph@codesourcery.com>,
	Szabolcs Nagy <szabolcs.nagy@arm.com>,
	libc-alpha <libc-alpha@sourceware.org>,
	Thomas Gleixner <tglx@linutronix.de>, Ben Maurer <bmaurer@fb.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Paul <paulmck@linux.vnet.ibm.com>,
	Boqun Feng <boqun.feng@gmail.com>,
	Will Deacon <will.deacon@arm.com>, Paul Turner <pjt@google.com>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	linux-api <linux-api@vger.kernel.org>
Subject: Re: [PATCH 2/3] Linux: Use rseq in sched_getcpu if available (v9)
Date: Mon, 6 Jul 2020 13:33:46 -0400 (EDT)	[thread overview]
Message-ID: <1679448037.22891.1594056826859.JavaMail.zimbra@efficios.com> (raw)
In-Reply-To: <942999672.22574.1594046978937.JavaMail.zimbra@efficios.com>

----- On Jul 6, 2020, at 10:49 AM, Mathieu Desnoyers mathieu.desnoyers@efficios.com wrote:

> ----- On Jul 6, 2020, at 9:59 AM, Florian Weimer fweimer@redhat.com wrote:
> 
>> * Mathieu Desnoyers:
>> 
>>> When available, use the cpu_id field from __rseq_abi on Linux to
>>> implement sched_getcpu().  Fall-back on the vgetcpu vDSO if
>>> unavailable.
>> 
>> I've pushed this to glibc master, but unfortunately it looks like this
>> exposes a kernel bug related to affinity mask changes.
>> 
>> After building and testing glibc, this
>> 
>>  for x in {1..2000} ; do posix/tst-affinity-static  & done
>> 
>> produces some “error:” lines for me:
>> 
>> error: Unexpected CPU 2, expected 0
>> error: Unexpected CPU 2, expected 0
>> error: Unexpected CPU 2, expected 0
>> error: Unexpected CPU 2, expected 0
>> error: Unexpected CPU 138, expected 0
>> error: Unexpected CPU 138, expected 0
>> error: Unexpected CPU 138, expected 0
>> error: Unexpected CPU 138, expected 0
>> 
>> “expected 0” is a result of how the test has been written, it bails out
>> on the first failure, which happens with CPU ID 0.
>> 
>> Smaller systems can use a smaller count than 2000 to reproduce this.  It
>> also happens sporadically when running the glibc test suite itself
>> (which is why it took further testing to reveal this issue).
>> 
>> I can reproduce this with the Debian 4.19.118-2+deb10u1 kernel, the
>> Fedora 5.6.19-300.fc32 kernel, and the Red Hat Enterprise Linux kernel
>> 4.18.0-193.el8 (all x86_64).
>> 
>> As to the cause, I'd guess that the exit path in the sched_setaffinity
>> system call fails to update the rseq area, so that userspace can observe
>> the outdated CPU ID there.
> 
> Hi Florian,
> 
> We have a similar test in Linux, see tools/testing/selftests/rseq/basic_test.c.
> That test does not trigger this issue, even when executed repeatedly.
> 
> I'll investigate further what is happening within the glibc test.

Here are the missing kernel bits. Just adding those in wake_up_new_task() is
enough to make the problem go away, but to err on the safe side I'm planning to
add an rseq_migrate within sched_fork:

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 1757381cabd4..ff83f790a9b3 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3043,6 +3043,7 @@ int sched_fork(unsigned long clone_flags, struct task_struct *p)
         * so use __set_task_cpu().
         */
        __set_task_cpu(p, smp_processor_id());
+       rseq_migrate(p);
        if (p->sched_class->task_fork)
                p->sched_class->task_fork(p);
        raw_spin_unlock_irqrestore(&p->pi_lock, flags);
@@ -3103,6 +3104,7 @@ void wake_up_new_task(struct task_struct *p)
         */
        p->recent_used_cpu = task_cpu(p);
        __set_task_cpu(p, select_task_rq(p, task_cpu(p), SD_BALANCE_FORK, 0));
+       rseq_migrate(p);
 #endif
        rq = __task_rq_lock(p, &rf);
        update_rq_clock(rq);

I'm not sure why your test catches it fairly quickly but ours does not. That's a good
catch.

Now we need to discuss how we introduce that fix in a way that will allow user-space
to trust the __rseq_abi.cpu_id field's content.

The usual approach to kernel bug fixing is typically to push the fix, mark it for
stable kernels, and expect everyone to pick up the fixes. I wonder how comfortable
glibc would be to replace its sched_getcpu implementation with a broken-until-fixed
kernel rseq implementation without any mechanism in place to know whether it can
trust the value of the cpu_id field. I am extremely reluctant to do so.

One possible way to introduce this fix would be to use the rseq flags argument to allow
glibc to query whether the kernel implements a rseq with a cpu_id field it can trust.
So glibc could do, for registration:

  ret = INTERNAL_SYSCALL_CALL (rseq, &__rseq_abi, sizeof (struct rseq),
                               RSEQ_FLAG_REGISTER | RSEQ_FLAG_RELIABLE_CPU_ID,
                               RSEQ_SIG);

(I'm open to bike-shedding the actual flag name)

So if the kernel does not implement the fix, the registration would return EINVAL.
When getting EINVAL like this, we could then re-issue the rseq syscall:

  ret = INTERNAL_SYSCALL_CALL (rseq, NULL, 0, RSEQ_FLAG_RELIABLE_CPU_ID, 0);

to check whether EINVAL has been caused by other invalid system call parameters,
or it's really because the RSEQ_FLAG_RELIABLE_CPU_ID flag is unknown. Being able
to distinguish between invalid parameters and unknown flags like this would end
up requiring one extra system call on failed registration attempt on kernels which
implement a broken rseq.

This also means glibc would only start really activating rseq on kernels implementing
this fix.

Thoughts ?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

  reply	other threads:[~2020-07-06 17:33 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20200629190036.26982-1-mathieu.desnoyers@efficios.com>
2020-06-29 19:00 ` [PATCH 1/3] glibc: Perform rseq registration at C startup and thread creation (v22) Mathieu Desnoyers
2020-07-02 14:46   ` Florian Weimer
2020-07-03 17:50     ` Carlos O'Donell
2020-06-29 19:00 ` [PATCH 2/3] Linux: Use rseq in sched_getcpu if available (v9) Mathieu Desnoyers
2020-07-06 13:59   ` Florian Weimer
2020-07-06 14:49     ` Mathieu Desnoyers
2020-07-06 17:33       ` Mathieu Desnoyers [this message]
2020-07-06 17:50         ` Florian Weimer
2020-07-06 18:02           ` Mathieu Desnoyers
2020-07-06 18:11             ` Florian Weimer
2020-07-06 21:08               ` Mathieu Desnoyers
     [not found] <20200622180803.1449-1-mathieu.desnoyers@efficios.com>
2020-06-22 18:08 ` Mathieu Desnoyers

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1679448037.22891.1594056826859.JavaMail.zimbra@efficios.com \
    --to=mathieu.desnoyers@efficios.com \
    --cc=bmaurer@fb.com \
    --cc=boqun.feng@gmail.com \
    --cc=carlos@redhat.com \
    --cc=fweimer@redhat.com \
    --cc=joseph@codesourcery.com \
    --cc=libc-alpha@sourceware.org \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=peterz@infradead.org \
    --cc=pjt@google.com \
    --cc=szabolcs.nagy@arm.com \
    --cc=tglx@linutronix.de \
    --cc=will.deacon@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.