From: Akira Yokosawa <akiyks@gmail.com>
To: "Paul E. McKenney" <paulmck@kernel.org>
Cc: perfbook@vger.kernel.org, Akira Yokosawa <akiyks@gmail.com>
Subject: Re: [PATCH 0/3] defer: misc updates
Date: Wed, 3 Jun 2020 08:05:48 +0900
Message-ID: <fde7efb7-f274-7b1e-2b21-a5e60f10c2d0@gmail.com>
In-Reply-To: <20200602152809.GF29598@paulmck-ThinkPad-P72>

On Tue, 2 Jun 2020 08:28:09 -0700, Paul E. McKenney wrote:
> On Tue, Jun 02, 2020 at 11:27:37PM +0900, Akira Yokosawa wrote:
>> On Mon, 1 Jun 2020 16:45:45 -0700, Paul E. McKenney wrote:
>>> On Tue, Jun 02, 2020 at 07:51:31AM +0900, Akira Yokosawa wrote:
>>>> On Mon, 1 Jun 2020 09:13:49 -0700, Paul E. McKenney wrote:
>>>>> On Tue, Jun 02, 2020 at 12:10:06AM +0900, Akira Yokosawa wrote:
>>>>>> On Sun, 31 May 2020 18:18:38 -0700, Paul E. McKenney wrote:
>>>>>>> On Mon, Jun 01, 2020 at 08:11:06AM +0900, Akira Yokosawa wrote:
>>>>>>>> On Sun, 31 May 2020 09:50:23 -0700, Paul E. McKenney wrote:
>>>>>>>>> On Sun, May 31, 2020 at 09:30:44AM +0900, Akira Yokosawa wrote:
>>>>>>>>>> Hi Paul,
>>>>>>>>>>
>>>>>>>>>> These are misc updates in response to your recent updates.
>>>>>>>>>>
>>>>>>>>>> Patch 1/3 annotates consecutive QQZs for the "nq" build.
>>>>>>>>>
>>>>>>>>> Good reminder, thank you!
>>>>>>>>>
>>>>>>>>>> Patch 2/3 adds a paragraph to #9 of FAQ.txt.  The wording may need
>>>>>>>>>> your touch-up for fluency.
>>>>>>>>>> Patch 3/3 is an independent improvement of runlatex.sh.  It will avoid
>>>>>>>>>> a few redundant runs of pdflatex when you have some typo in labels/refs.
>>>>>>>>>
>>>>>>>>> Nice, queued and pushed, thank you!
>>>>>>>>>
>>>>>>>>>> Another suggestion for Figures 9.25 and 9.29.
>>>>>>>>>> Wouldn't these graphs look better with a log-scale x-axis?
>>>>>>>>>>
>>>>>>>>>> X range can be 0.001 -- 10.
>>>>>>>>>>
>>>>>>>>>> You'll need to add a few data points at sub-microsecond critical-section
>>>>>>>>>> durations to show plausible shapes in those regions, though.
>>>>>>>>>
>>>>>>>>> I took a quick look and didn't find any nanosecond delay primitives
>>>>>>>>> in the Linux kernel, but yes, that would be nicer looking.
>>>>>>>>>
>>>>>>>>> I don't expect to make further progress on this particular graph
>>>>>>>>> in the immediate future, but if you know of such a delay primitive,
>>>>>>>>> please don't keep it a secret!  ;-)
>>>>>>>>
>>>>>>>> I found ndelay() defined in include/asm-generic/delay.h.
>>>>>>>> I'm not sure if it works as you would expect, though.
>>>>>>>
>>>>>>> I must be going blind, given that I missed that one!
>>>>>>
>>>>>> :-) :-)
>>>>>>
>>>>>>> I did try it out, and it suffers from about 10% timing errors.  In
>>>>>>> contrast, udelay is usually less than 1%.
>>>>>>
>>>>>> You mean udelay(1)'s error is less than 10ns, whereas ndelay(1000)'s
>>>>>> error is about 100ns?
>>>>>
>>>>> Yuck.  The 10% was a preliminary eyeballing.  An overnight run showed it
>>>>> to be worse than that.  100ns gets me about 130ns, 200ns gets me about
>>>>> 270ns, and 500ns gets me about 600ns.  So ndelay() is useful only for
>>>>> very short delays.
>>>>
>>>> To compensate for the error, how about the appended change?
>>>> Yes, this is kind of ugly...
>>>>
>>>> Another point you should be aware of: it looks like arch/powerpc
>>>> does not define __ndelay, which means ndelay() would cause a
>>>> build error.  Still, I might be missing something.
>>>
>>> That is quite clever!  It does turn ndelay(1) into ndelay(0), but it
>>> probably costs more than a nanosecond to do the integer division, so
>>> that shouldn't be a problem.
>>>
>>> However, I believe that any such compensatory schemes should be done
>>> within ndelay() rather than by its users.
>>
>> I'm not brave enough to change the behavior of ndelay(), given the
>> number of call sites in the kernel code base, especially under drivers/.
>>
>> Looking at the updated Figures 9.25 and 9.29, the timing error of
>> ndelay() makes the "rcu" plots deviate from the ideal orthogonal
>> lines in the sub-microsecond region (0.1, 0.2, and 0.5us).
>> I don't think you'd like such misleading plots.
>>
>> You could instead compensate the x-values you give to ndelay().
>>
>> On x86, as you know, the resolution of xdelay() is 1.164153ns.
>> This means that if you want a 100ns delay, ndelay(86) will give
>> 100.117ns.
>> ndelay(172) will give 200.234ns and ndelay(429) will give 499.422ns.
>> ndelay(430) will give 500.586ns, which is the 2nd closest.
>> If you don't want to fall short of 500ns, ndelay(430) would be your choice.
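The arithmetic above can be reproduced with a short sketch. It assumes that each x86 ndelay() unit lasts 5e9/2^32 ns, which is about 1.164153ns, and that the delay is exactly linear in the argument. The helper names `modeled_ns()`, `closest_arg()`, and `min_arg_at_least()` are hypothetical and are not kernel APIs. Real hardware adds call overhead that this model ignores.

```c
#include <assert.h>

/*
 * Assumed model: one x86 ndelay() unit lasts 5e9 / 2^32 ns (~1.164153 ns),
 * and the delay scales exactly linearly with the argument.  Call overhead
 * on real hardware is ignored.
 */
#define NS_PER_UNIT (5e9 / 4294967296.0)

/* Hypothetical helper: delay in ns that ndelay(arg) yields under the model. */
double modeled_ns(unsigned long arg)
{
	return arg * NS_PER_UNIT;
}

/* Hypothetical helper: argument whose modeled delay is closest to target_ns. */
unsigned long closest_arg(double target_ns)
{
	unsigned long lo;

	assert(target_ns >= 0.0);
	lo = (unsigned long)(target_ns / NS_PER_UNIT);	/* rounds down */

	/* The answer is either the rounded-down argument or the next one. */
	if (target_ns - modeled_ns(lo) <= modeled_ns(lo + 1) - target_ns)
		return lo;
	return lo + 1;
}

/* Hypothetical helper: smallest argument whose modeled delay is >= target_ns. */
unsigned long min_arg_at_least(double target_ns)
{
	unsigned long lo;

	assert(target_ns >= 0.0);
	lo = (unsigned long)(target_ns / NS_PER_UNIT);
	return modeled_ns(lo) < target_ns ? lo + 1 : lo;
}
```

Under this model, closest_arg() returns 86, 172, and 429 for targets of 100, 200, and 500ns. min_arg_at_least(500.0) returns 430.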
>>
>> I think this level of tweak is worthwhile, especially as it will
>> result in a better-looking plot of RCU scaling.
>>
>> Thoughts?
> 
> Huh.
> 
> What we could do is to do a calibration pass where we sample a
> fine-grained timesource, spin on a series of ndelay() calls that last for
> a few microseconds, then resample the fine-grained timestamp.  We could
> then do a binary search so as to compute a corrected ndelay argument.
> We would then need to verify the corrected argument.
> 
> This procedure would be architecture independent, and might also account
> for instruction-stream differences.
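The calibration pass described above might look roughly like the following userspace sketch. Several assumptions apply. `time_calls_ns()` samples CLOCK_MONOTONIC, spins on many calls of a caller-supplied delay function, and then resamples the clock. `calibrate_arg()` binary-searches for the argument whose measured delay is closest to the target, and it assumes that the measured delay never decreases as the argument grows. `verify_arg()` re-measures the chosen argument against a tolerance. All of these names are hypothetical. `x86_model_ns()` is only a deterministic stand-in for a real measurement, based on the 5e9/2^32 ns-per-unit model, so the search can be exercised without hardware. With noisy real measurements, each probe would probably need several samples.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <time.h>

typedef void (*delay_fn)(unsigned long arg);	/* e.g. a wrapper around ndelay() */
typedef double (*measure_fn)(unsigned long arg);	/* per-call delay in ns */

/* Sample the clock, spin on reps calls of delay(arg), resample; mean ns per call. */
double time_calls_ns(delay_fn delay, unsigned long arg, int reps)
{
	struct timespec t0, t1;

	assert(reps > 0);
	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (int i = 0; i < reps; i++)
		delay(arg);
	clock_gettime(CLOCK_MONOTONIC, &t1);
	return ((t1.tv_sec - t0.tv_sec) * 1e9 +
		(double)(t1.tv_nsec - t0.tv_nsec)) / reps;
}

/* Deterministic stand-in for a measurement: the assumed x86 model. */
double x86_model_ns(unsigned long arg)
{
	return arg * (5e9 / 4294967296.0);
}

/*
 * Binary search over [0, max_arg] for the argument whose measured delay is
 * closest to target_ns.  Assumes measure() is nondecreasing in arg.  If even
 * max_arg falls short, max_arg (or its predecessor) is returned.
 */
unsigned long calibrate_arg(measure_fn measure, double target_ns,
			    unsigned long max_arg)
{
	unsigned long lo = 0, hi = max_arg;

	/* Find the smallest arg with measure(arg) >= target_ns. */
	while (lo < hi) {
		unsigned long mid = lo + (hi - lo) / 2;

		if (measure(mid) < target_ns)
			lo = mid + 1;
		else
			hi = mid;
	}
	/* The previous argument may land closer to the target from below. */
	if (lo > 0 && target_ns - measure(lo - 1) < measure(lo) - target_ns)
		return lo - 1;
	return lo;
}

/* Verification step: nonzero if measure(arg) is within +/- tol of target. */
int verify_arg(measure_fn measure, unsigned long arg, double target_ns,
	       double tol)
{
	double err = (measure(arg) - target_ns) / target_ns;

	return err >= -tol && err <= tol;
}
```

With the model as the measurement, calibrate_arg(x86_model_ns, 100.0, 100000) picks 86, which matches the hand-computed value. On real hardware, a measure_fn would wrap time_calls_ns() around the actual delay primitive. The argument it returns would then absorb per-call overhead that the model cannot see.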

This calibration part could be implemented and tested on a small system,
assuming you have a sub-microsecond ndelay() and a fine-grained timer.

For example, powerpc, which I mentioned earlier, uses the fallback
definition in linux/delay.h, which rounds every request up to whole
microseconds:

	#ifndef ndelay
	static inline void ndelay(unsigned long x)
	{
		udelay(DIV_ROUND_UP(x, 1000));
	}
	#define ndelay(x) ndelay(x)
	#endif
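With this fallback, every ndelay() argument from 1 to 1000 becomes udelay(1). Sub-microsecond requests therefore cannot be told apart. The following userspace sketch shows the same rounding. `DIV_ROUND_UP()` is redefined locally using the same ceiling-division formula as the kernel macro, and `fallback_udelay_us()` is a hypothetical name.

```c
#include <assert.h>

/* Ceiling division, same formula as the kernel's DIV_ROUND_UP(), redefined for userspace. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Hypothetical helper: microseconds the fallback ndelay(ns) hands to udelay(). */
unsigned long fallback_udelay_us(unsigned long ns)
{
	return DIV_ROUND_UP(ns, 1000UL);
}

/* 100ns and 500ns requests both collapse into udelay(1). */
static_assert(DIV_ROUND_UP(100UL, 1000UL) == 1, "ndelay(100) -> udelay(1)");
static_assert(DIV_ROUND_UP(500UL, 1000UL) == 1, "ndelay(500) -> udelay(1)");
```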

> 
> Is there a better way?  Seems like there should be.  ;-)

Someone may already have done something similar.

        Thanks, Akira

> 
> 							Thanx, Paul
> 
>> PS: The bumps in Figures 9.25 and 9.29 in the sub-microsecond region
>> might be the effect of differences in the instruction stream.
>> As we have seen in Figure 9.22, slight changes in the code path,
>> e.g. jump target alignment, can cause 10% -- 20% performance
>> differences.
>>
>> Forcing un_delay() to be inlined might or might not help. Just guessing.
>>
>>
>>>                                           Plus, as you imply, different
>>> architectures might need different adjustments.  My concern is that
>>> different CPU generations within a given architecture might also need
>>> different adjustments. :-(
>>>
>>> 							Thanx, Paul
>>>
>>>>         Thanks, Akira
>>>>
>>>> diff --git a/kernel/rcu/refperf.c b/kernel/rcu/refperf.c
>>>> index 5db165ecd465..0a3764ea220c 100644
>>>> --- a/kernel/rcu/refperf.c
>>>> +++ b/kernel/rcu/refperf.c
>>>> @@ -122,7 +122,7 @@ static void un_delay(const int udl, const int ndl)
>>>>         if (udl)
>>>>                 udelay(udl);
>>>>         if (ndl)
>>>> -               ndelay(ndl);
>>>> +               ndelay((ndl * 859) / 1000); // 5 : 2^32/1000000000 (4.295)
>>>>  }
>>>>  
>>>>  static void ref_rcu_read_section(const int nloops)
>>>>
>>>>
>>>>
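The patch's scaling can be checked with plain arithmetic. The check assumes the x86 model of about 1.164153ns per ndelay() unit (5e9/2^32 ns). Multiplying by 859/1000, which is roughly 4.295/5, maps a requested ndl back to a unit count. The multiplication truncates, as integer division does, so ndl == 1 becomes 0, as noted in the reply above. The function names below are hypothetical, and the model ignores call overhead.

```c
#include <assert.h>

/* Assumed x86 model: one ndelay() unit lasts 5e9 / 2^32 ns (~1.164153 ns). */
#define NS_PER_UNIT (5e9 / 4294967296.0)

/*
 * Argument the patched un_delay() passes to ndelay(): (ndl * 859) / 1000.
 * Integer division truncates, and ndl * 859 overflows int once ndl exceeds
 * about 2.5 million, far above the sub-microsecond values of interest.
 */
int compensated_arg(int ndl)
{
	assert(ndl >= 0);
	return (ndl * 859) / 1000;
}

/* Delay in ns that the compensated call yields under the model. */
double modeled_delay_ns(int ndl)
{
	return compensated_arg(ndl) * NS_PER_UNIT;
}
```

Under this model, ndl values of 100, 200, and 500 come out at roughly 98.95ns, 199.07ns, and 499.42ns. These all fall slightly short of the request because of truncation. The hand-picked arguments discussed earlier (86, 172, 429) instead round to the nearest unit.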

Thread overview: 15+ messages
2020-05-31  0:30 [PATCH 0/3] defer: misc updates Akira Yokosawa
2020-05-31  0:32 ` [PATCH 1/3] defer: Annotate consecutive QQZs as such for 'nq' build Akira Yokosawa
2020-05-31  0:33 ` [PATCH 2/3] FAQ.txt: Advertise 'nq' build in #9 Akira Yokosawa
2020-05-31  0:35 ` [PATCH 3/3] runlatex.sh: Give up early on undefined refs Akira Yokosawa
2020-05-31 16:50 ` [PATCH 0/3] defer: misc updates Paul E. McKenney
2020-05-31 23:11   ` Akira Yokosawa
2020-06-01  1:18     ` Paul E. McKenney
2020-06-01 15:10       ` Akira Yokosawa
2020-06-01 16:13         ` Paul E. McKenney
2020-06-01 22:51           ` Akira Yokosawa
2020-06-01 23:45             ` Paul E. McKenney
2020-06-02 14:27               ` Akira Yokosawa
2020-06-02 15:28                 ` Paul E. McKenney
2020-06-02 23:05                   ` Akira Yokosawa [this message]
2020-06-03  1:02                     ` Paul E. McKenney
