All of lore.kernel.org
 help / color / mirror / Atom feed
From: Thomas Gleixner <tglx@linutronix.de>
To: Andy Lutomirski <luto@kernel.org>
Cc: Fenghua Yu <fenghua.yu@intel.com>, Ingo Molnar <mingo@redhat.com>,
	H Peter Anvin <hpa@zytor.com>, Ashok Raj <ashok.raj@intel.com>,
	Alan Cox <alan@linux.intel.com>,
	Ravi V Shankar <ravi.v.shankar@intel.com>,
	linux-kernel <linux-kernel@vger.kernel.org>, x86 <x86@kernel.org>
Subject: Re: [PATCH 6/7] x86/vdso: Add vDSO functions for user wait instructions
Date: Tue, 31 Jul 2018 23:22:12 +0200 (CEST)	[thread overview]
Message-ID: <alpine.DEB.2.21.1807312220490.1713@nanos.tec.linutronix.de> (raw)
In-Reply-To: <2267fbe6-37e8-7063-d48f-1879f31d3258@kernel.org>

On Mon, 23 Jul 2018, Andy Lutomirski wrote:
> On 07/23/2018 05:55 AM, Fenghua Yu wrote:
> >     static void __init init_vdso_funcs_data(void)
> >   {
> > +	struct system_counterval_t sys_counterval;
> > +
> >   	if (static_cpu_has(X86_FEATURE_MOVDIRI))
> >   		vdso_funcs_data.movdiri_supported = true;
> >   	if (static_cpu_has(X86_FEATURE_MOVDIR64B))
> >   		vdso_funcs_data.movdir64b_supported = true;
> > +	if (static_cpu_has(X86_FEATURE_WAITPKG))
> > +		vdso_funcs_data.waitpkg_supported = true;
> > +	if (static_cpu_has(X86_FEATURE_TSC_KNOWN_FREQ)) {
> > +		vdso_funcs_data.tsc_known_freq = true;
> > +		sys_counterval = convert_art_ns_to_tsc(1);
> > +		vdso_funcs_data.tsc_per_nsec = sys_counterval.cycles;
> > +	}
> 
> You're losing a ton of precision here.  You might even be losing *all* of the
> precision and malfunctioning rather badly.

Indeed.

> The correct way to do this is:
> 
> tsc_counts = ns * mul >> shift;

> and the vclock code illustrates it.

Albeit you cannot use the TSC mult/shift pair as that is for the TSC to
nsec conversion.

To get the proper mult/shift pair use clocks_calc_mult_shift(). Note that
the scaled math has an upper limit when using 64 bit variables. You might
need 128bit scaled math to make it work correctly.

> convert_art_ns_to_tsc() is a bad example because it uses an expensive
> division operation for no good reason except that no one bothered to
> optimize it.

Right. It's not a hot path function and it does the job and we would need
128bit scaled math to avoid mult overflows.

Aside of that I have no idea why anyone would use convert_art_ns_to_tsc()
for anything else than converting art to nsecs.

> > +notrace int __vdso_umwait(int state, unsigned long nsec)
> 
> __vdso_umwait_relative(), please.  Because some day (possibly soon) someone
> will want __vdso_umwait_absolute() and its friend __vdso_read_art_ns() so they
> can do:
> 
> u64 start = __vdso_read_art_ns();

Errm. No. You can't read ART. ART is only used by decives to which it is
distributed. You can only read TSC here and convert that to nsecs.

> __vdso_umonitor(...);
> ... do something potentially slow or that might fault ...
> __vdso_umwait_absolute(start + timeout);

That definitely requires 128bit scaled math to work correctly, unless you
make the timeout relative before conversion.

But I really think we should avoid creating yet another interface to
retrieve TSC time in nsecs. We have enough of these things already.

Ideally we'd use CLOCK_MONOTONIC here, but that needs more thought as:

  1) TSC might be disabled as the timekeeping clocksource

  2) The mult/shift pair for converting to nanoseconds is affected by
     NTP/PTP so it can be different from the initial mult/shift pair for
     converting nanoseconds to TSC.

A possible solution would be to use CLOCK_MOTONIC_RAW which is not affected
by NTP/PTP adjustments. But that still has the issue of TSC not being the
timekeeping clocksource. Bah, the whole TSC deadline mode sucks. I have no
idea what's wrong with simple down counters. They Just Work.

> Also, this patch appears to have a subtle but show-stopping race.  Consider:

It's not the patch which has this issue. It's the hardware ....

> 1. Task A does UMONITOR on CPU 1
> 2. Task A is preempted.
> 3. Task B does UMONITOR on CPU 1 at a different address
> 4. Task A resumes
> 5. Task A does UMWAIT
> 
> Now task A hangs, at least until the next external wakeup happens.
> 
> It's not entirely clear to me how you're supposed to fix this without some
> abomination that's so bad that it torpedoes the entire feature. Except that
> there is no chicken bit to turn this thing off.  Sigh.

That sounds similar to the ARM monitor which is used for implementing
cmpxchg. They had to sprinkle monitor aborts all over the place....

So yes, unless there is undocumented magic which aborts the monitor under
certain circumstances, this thing is broken beyond repair.

Thanks,

	tglx

  parent reply	other threads:[~2018-07-31 21:22 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-07-23 12:55 [PATCH 0/7] x86: Enable a few new instructions Fenghua Yu
2018-07-23 12:55 ` [PATCH 1/7] x86/cpufeatures: Enumerate MOVDIRI instruction Fenghua Yu
2018-07-23 12:55 ` [PATCH 2/7] x86/cpufeatures: Enumerate MOVDIR64B instruction Fenghua Yu
2018-07-23 12:55 ` [PATCH 3/7] x86/cpufeatures: Enumerate UMONITOR, UMWAIT, and TPAUSE instructions Fenghua Yu
2018-07-23 12:55 ` [PATCH 4/7] x86/umwait_contro: Set global umwait maximum time limit and umwait C0.2 state Fenghua Yu
2018-07-24  1:41   ` Andy Lutomirski
2018-08-01  9:01     ` Thomas Gleixner
2018-07-23 12:55 ` [PATCH 5/7] x86/vdso: Add vDSO functions for direct store instructions Fenghua Yu
2018-07-24  1:48   ` Andy Lutomirski
2018-07-24  3:42     ` Fenghua Yu
2018-07-24  5:27       ` Andy Lutomirski
2018-07-25 22:18         ` Fenghua Yu
2018-07-23 12:55 ` [PATCH 6/7] x86/vdso: Add vDSO functions for user wait instructions Fenghua Yu
2018-07-24  2:11   ` Andy Lutomirski
2018-07-24 15:14     ` Andy Lutomirski
2018-07-31 21:22     ` Thomas Gleixner [this message]
2018-07-31 21:38       ` Andy Lutomirski
2018-08-01  8:55         ` Thomas Gleixner
2018-07-23 12:55 ` [PATCH 7/7] selftests/vDSO: Add selftest to test vDSO functions for direct store and " Fenghua Yu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=alpine.DEB.2.21.1807312220490.1713@nanos.tec.linutronix.de \
    --to=tglx@linutronix.de \
    --cc=alan@linux.intel.com \
    --cc=ashok.raj@intel.com \
    --cc=fenghua.yu@intel.com \
    --cc=hpa@zytor.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@kernel.org \
    --cc=mingo@redhat.com \
    --cc=ravi.v.shankar@intel.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.