From: Chris Hyser <chris.hyser@oracle.com>
To: Josh Don <joshdon@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
	"Joel Fernandes (Google)" <joel@joelfernandes.org>,
	Nishanth Aravamudan <naravamudan@digitalocean.com>,
	Julien Desfossez <jdesfossez@digitalocean.com>,
	Tim Chen <tim.c.chen@linux.intel.com>,
	Vineeth Pillai <viremana@linux.microsoft.com>,
	Aaron Lu <aaron.lwe@gmail.com>,
	Aubrey Li <aubrey.intel@gmail.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	mingo@kernel.org, torvalds@linux-foundation.org,
	fweisbec@gmail.com, Kees Cook <keescook@chromium.org>,
	Greg Kerr <kerrnel@google.com>, Phil Auld <pauld@redhat.com>,
	Valentin Schneider <valentin.schneider@arm.com>,
	Mel Gorman <mgorman@techsingularity.net>,
	Pawan Gupta <pawan.kumar.gupta@linux.intel.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	vineeth@bitbyteword.org, Chen Yu <yu.c.chen@intel.com>,
	Christian Brauner <christian.brauner@ubuntu.com>,
	Agata Gruza <agata.gruza@intel.com>,
	Antonio Gomez Iglesias <antonio.gomez.iglesias@intel.com>,
	graf@amazon.com, konrad.wilk@oracle.com, dfaggioli@suse.com,
	Paul Turner <pjt@google.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Patrick Bellasi <derkling@google.com>,
	benbjiang@tencent.com,
	Alexandre Chartre <alexandre.chartre@oracle.com>,
	James.Bottomley@hansenpartnership.com, OWeisse@umich.edu,
	Dhaval Giani <dhaval.giani@oracle.com>,
	Junaid Shahid <junaids@google.com>,
	Jesse Barnes <jsbarnes@google.com>,
	Ben Segall <bsegall@google.com>, Hao Luo <haoluo@google.com>,
	Tom Lendacky <thomas.lendacky@amd.com>
Subject: Re: [PATCH v10 2/5] sched: CGroup tagging interface for core scheduling
Date: Wed, 24 Feb 2021 08:02:58 -0500
Message-ID: <c65bde1e-bac9-e6b6-e6c8-78b93f27b8e4@oracle.com>
In-Reply-To: <CABk29NvX9_RxpZ71ihR7Y_Nhpg0TpBfdXzehptO52VuwOmS2Ww@mail.gmail.com>



On 2/24/21 12:15 AM, Josh Don wrote:
> On Tue, Feb 23, 2021 at 11:26 AM Chris Hyser <chris.hyser@oracle.com> wrote:
>>
>> On 2/23/21 4:05 AM, Peter Zijlstra wrote:
>>> On Mon, Feb 22, 2021 at 11:00:37PM -0500, Chris Hyser wrote:
>>>> On 1/22/21 8:17 PM, Joel Fernandes (Google) wrote:
>>>> While trying to test the new prctl() code I'm working on, I ran into a bug I
>>>> chased back into this v10 code. Under a fair amount of stress, when the
>>>> function __sched_core_update_cookie() is ultimately called from
>>>> sched_core_fork(), the system deadlocks or otherwise non-visibly crashes.
>>>> I've not had much success figuring out why/what. I'm running with LOCKDEP on
>>>> and seeing no complaints. Duplicating it only requires setting a cookie on a
>>>> task and forking a bunch of threads ... all of which then want to update
>>>> their cookie.
>>>
>>> Can you share the code and reproducer?
>>
>> Attached is a tarball with c code (source) and scripts. Just run ./setup_bug which will compile the source and start a
>> bash with a cs cookie. Then run ./show_bug which dumps the cookie and then fires off some processes and threads. Note
>> the cs_clone command is not doing any core sched prctls for this test (not needed and currently coded for a diff prctl
>> interface). It just creates processes and threads. I see this hang almost instantly.
>>
>> Josh, I did verify that this occurs on Joel's coresched tree both with and w/o the kprot patch and that should exactly
>> correspond to these patches.
>>
>> -chrish
>>
> 
> I think I've gotten to the root of this. In the fork code, our cases
> for inheriting task_cookie are inverted for CLONE_THREAD vs
> !CLONE_THREAD. As a result, we are creating a new cookie per-thread,
> rather than inheriting from the parent. Now this is actually ok; I'm
> not observing a scalability problem with creating this many cookies.

This isn't the issue. The test code generates cases for both CLONE_THREAD and !CLONE_THREAD, and both paths call the
cookie update code. The new code I was testing when I discovered this fixed the problem you noted.
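
For readers without the tarball, here is a minimal sketch of that kind of workload. It is not the cs_clone source: the
names spawn_threads() and spawn_mixed() are hypothetical, and it assumes the calling shell already carries a core
scheduling cookie, so that both the fork() path (no CLONE_THREAD) and the pthread_create() path (CLONE_THREAD) reach
the cookie update code. It makes no core sched prctl calls itself.

```c
#include <pthread.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Thread body: does nothing; creating the thread is the point. */
static void *thread_body(void *arg)
{
	(void)arg;
	return NULL;
}

/*
 * Create and join nthreads threads. With glibc, pthread_create() ends up in
 * clone() with CLONE_THREAD set. Returns the number of threads joined.
 */
static int spawn_threads(int nthreads)
{
	pthread_t *tids = calloc((size_t)nthreads, sizeof(*tids));
	int created = 0, joined = 0;

	if (!tids)
		return 0;

	for (int i = 0; i < nthreads; i++) {
		if (pthread_create(&tids[created], NULL, thread_body, NULL) == 0)
			created++;
	}
	for (int i = 0; i < created; i++) {
		if (pthread_join(tids[i], NULL) == 0)
			joined++;
	}
	free(tids);
	return joined;
}

/*
 * Fork nprocs children (the !CLONE_THREAD case); each child then creates
 * nthreads threads (the CLONE_THREAD case) and exits 0 only if all of them
 * were joined. Returns the number of children that exited with status 0.
 */
static int spawn_mixed(int nprocs, int nthreads)
{
	int forked = 0, ok = 0;

	for (int i = 0; i < nprocs; i++) {
		pid_t pid = fork();

		if (pid == 0)
			_exit(spawn_threads(nthreads) == nthreads ? 0 : 1);
		if (pid > 0)
			forked++;
	}
	for (int i = 0; i < forked; i++) {
		int status;

		if (wait(&status) > 0 && WIFEXITED(status) && WEXITSTATUS(status) == 0)
			ok++;
	}
	return ok;
}
```

Calling something like spawn_mixed(8, 16) from main() in a shell that has a cookie set exercises both clone variants;
the counts are arbitrary and would need to be raised to generate real stress.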


> However, it means that overall throughput of your binary is cut in
> ~half, since none of the threads can share a core. Note that I never
> saw an indefinite deadlock, just ~2x runtime for your binary vs the
> control. I've verified that both a) manually hardcoding all threads to
> be able to share regardless of cookie, and b) using a machine with 6
> cores instead of 2, both allow your binary to complete in the same
> amount of time as without the new API.

This was on a 24 core box. When I run the test, it definitely hangs. I'll answer your other email as well.

-chrish
