From: Chris Hyser <chris.hyser@oracle.com>
To: Josh Don <joshdon@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
	"Joel Fernandes (Google)" <joel@joelfernandes.org>,
	Nishanth Aravamudan <naravamudan@digitalocean.com>,
	Julien Desfossez <jdesfossez@digitalocean.com>,
	Tim Chen <tim.c.chen@linux.intel.com>,
	Vineeth Pillai <viremana@linux.microsoft.com>,
	Aaron Lu <aaron.lwe@gmail.com>,
	Aubrey Li <aubrey.intel@gmail.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	mingo@kernel.org, torvalds@linux-foundation.org,
	fweisbec@gmail.com, Kees Cook <keescook@chromium.org>,
	Greg Kerr <kerrnel@google.com>, Phil Auld <pauld@redhat.com>,
	Valentin Schneider <valentin.schneider@arm.com>,
	Mel Gorman <mgorman@techsingularity.net>,
	Pawan Gupta <pawan.kumar.gupta@linux.intel.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	vineeth@bitbyteword.org, Chen Yu <yu.c.chen@intel.com>,
	Christian Brauner <christian.brauner@ubuntu.com>,
	Agata Gruza <agata.gruza@intel.com>,
	Antonio Gomez Iglesias <antonio.gomez.iglesias@intel.com>,
	graf@amazon.com, konrad.wilk@oracle.com, dfaggioli@suse.com,
	Paul Turner <pjt@google.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Patrick Bellasi <derkling@google.com>,
	benbjiang@tencent.com,
	Alexandre Chartre <alexandre.chartre@oracle.com>,
	James.Bottomley@hansenpartnership.com, OWeisse@umich.edu,
	Dhaval Giani <dhaval.giani@oracle.com>,
	Junaid Shahid <junaids@google.com>,
	Jesse Barnes <jsbarnes@google.com>,
	Ben Segall <bsegall@google.com>, Hao Luo <haoluo@google.com>,
	Tom Lendacky <thomas.lendacky@amd.com>
Subject: Re: [PATCH v10 2/5] sched: CGroup tagging interface for core scheduling
Date: Wed, 24 Feb 2021 08:02:58 -0500	[thread overview]
Message-ID: <c65bde1e-bac9-e6b6-e6c8-78b93f27b8e4@oracle.com> (raw)
In-Reply-To: <CABk29NvX9_RxpZ71ihR7Y_Nhpg0TpBfdXzehptO52VuwOmS2Ww@mail.gmail.com>



On 2/24/21 12:15 AM, Josh Don wrote:
> On Tue, Feb 23, 2021 at 11:26 AM Chris Hyser <chris.hyser@oracle.com> wrote:
>>
>> On 2/23/21 4:05 AM, Peter Zijlstra wrote:
>>> On Mon, Feb 22, 2021 at 11:00:37PM -0500, Chris Hyser wrote:
>>>> On 1/22/21 8:17 PM, Joel Fernandes (Google) wrote:
>>>> While trying to test the new prctl() code I'm working on, I ran into a bug I
>>>> chased back into this v10 code. Under a fair amount of stress, when the
>>>> function __sched_core_update_cookie() is ultimately called from
>>>> sched_core_fork(), the system deadlocks or otherwise non-visibly crashes.
>>>> I've not had much success figuring out why/what. I'm running with LOCKDEP on
>>>> and seeing no complaints. Duplicating it only requires setting a cookie on a
>>>> task and forking a bunch of threads ... all of which then want to update
>>>> their cookie.
>>>
>>> Can you share the code and reproducer?
>>
>> Attached is a tarball with c code (source) and scripts. Just run ./setup_bug which will compile the source and start a
>> bash with a cs cookie. Then run ./show_bug which dumps the cookie and then fires off some processes and threads. Note
>> the cs_clone command is not doing any core sched prctls for this test (not needed and currently coded for a diff prctl
>> interface). It just creates processes and threads. I see this hang almost instantly.
>>
>> Josh, I did verify that this occurs on Joel's coresched tree both with and w/o the kprot patch and that should exactly
>> correspond to these patches.
>>
>> -chrish
>>
> 
> I think I've gotten to the root of this. In the fork code, our cases
> for inheriting task_cookie are inverted for CLONE_THREAD vs
> !CLONE_THREAD. As a result, we are creating a new cookie per-thread,
> rather than inheriting from the parent. Now this is actually ok; I'm
> not observing a scalability problem with creating this many cookies.

This isn't the issue. The test code generates cases for both CLONE_THREAD and !CLONE_THREAD, and both paths call the
cookie update code. The new code I was testing when I discovered this already fixes the problem you noted.


> However, it means that overall throughput of your binary is cut in
> ~half, since none of the threads can share a core. Note that I never
> saw an indefinite deadlock, just ~2x runtime for your binary vs the
> control. I've verified that both a) manually hardcoding all threads to
> be able to share regardless of cookie, and b) using a machine with 6
> cores instead of 2, both allow your binary to complete in the same
> amount of time as without the new API.

This was on a 24 core box. When I run the test, it definitely hangs. I'll answer your other email as well.

-chrish
