From: Michael Ellerman <mpe@ellerman.id.au>
To: Nathan Lynch <nathanl@linux.ibm.com>,
	Laurent Dufour <ldufour@linux.ibm.com>
Cc: tyreld@linux.ibm.com, cheloha@linux.ibm.com,
	linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH] powerpc/pseries: explicitly reschedule during drmem_lmb list traversal
Date: Sun, 02 Aug 2020 22:42:46 +1000	[thread overview]
Message-ID: <87tuxl1ant.fsf@mpe.ellerman.id.au> (raw)
In-Reply-To: <87365723m0.fsf@linux.ibm.com>

Nathan Lynch <nathanl@linux.ibm.com> writes:
> Michael Ellerman <mpe@ellerman.id.au> writes:
>> Nathan Lynch <nathanl@linux.ibm.com> writes:
>>> Michael Ellerman <mpe@ellerman.id.au> writes:
>>>> Nathan Lynch <nathanl@linux.ibm.com> writes:
>>>>> Laurent Dufour <ldufour@linux.ibm.com> writes:
>>>>>> On 28/07/2020 at 19:37, Nathan Lynch wrote:
>>>>>>> The drmem lmb list can have hundreds of thousands of entries, and
>>>>>>> unfortunately lookups take the form of linear searches. As long as
>>>>>>> this is the case, traversals have the potential to monopolize the CPU
>>>>>>> and provoke lockup reports, workqueue stalls, and the like unless
>>>>>>> they explicitly yield.
>>>>>>> 
>>>>>>> Rather than placing cond_resched() calls within various
>>>>>>> for_each_drmem_lmb() loop blocks in the code, put it in the iteration
>>>>>>> expression of the loop macro itself so users can't omit it.
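
(For context, for_each_drmem_lmb() is currently just a bare pointer walk
over drmem_info->lmbs, roughly as sketched below from memory rather than
the verbatim drmem.h definitions; the patch folds a cond_resched() into
the "(lmb)++" step of that walk so every caller picks it up automatically.)

	#define for_each_drmem_lmb_in_range(lmb, start, end)		\
		for ((lmb) = (start); (lmb) < (end); (lmb)++)

	#define for_each_drmem_lmb(lmb)					\
		for_each_drmem_lmb_in_range((lmb),			\
			&drmem_info->lmbs[0],				\
			&drmem_info->lmbs[drmem_info->n_lmbs])
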
>>>>>>
>>>>>> Isn't calling cond_resched() on every LMB too much?
>>>>>>
>>>>>> Could it be done less frequently, say every 10 or 100 LMBs? I don't really know.
>>>>>
>>>>> Everything done within for_each_drmem_lmb is relatively heavyweight
>>>>> already. E.g. calling dlpar_remove_lmb()/dlpar_add_lmb() can take dozens
>>>>> of milliseconds. I don't think cond_resched() is an expensive check in
>>>>> this context.
>>>>
>>>> Hmm, mostly.
>>>>
>>>> But there are quite a few cases like drmem_update_dt_v1():
>>>>
>>>> 	for_each_drmem_lmb(lmb) {
>>>> 		dr_cell->base_addr = cpu_to_be64(lmb->base_addr);
>>>> 		dr_cell->drc_index = cpu_to_be32(lmb->drc_index);
>>>> 		dr_cell->aa_index = cpu_to_be32(lmb->aa_index);
>>>> 		dr_cell->flags = cpu_to_be32(drmem_lmb_flags(lmb));
>>>>
>>>> 		dr_cell++;
>>>> 	}
>>>>
>>>> Which will compile to a pretty tight loop at the moment.
>>>>
>>>> Or drmem_update_dt_v2() which has two loops over all lmbs.
>>>>
>>>> And although the actual TIF check is cheap, the function call to do it is
>>>> not free.
>>>>
>>>> So I worry this is going to make some of those long loops take even
>>>> longer.
>>>
>>> That's fair, and I was wrong - some of the loop bodies are relatively
>>> simple, not doing allocations or taking locks, etc.
>>>
>>> One way to deal is to keep for_each_drmem_lmb() as-is and add a new
>>> iterator that can reschedule, e.g. for_each_drmem_lmb_slow().
>>
>> If we did that, how many call-sites would need converting?
>> Is it ~2 or ~20 or ~200?
>
> At a glance I would convert 15-20 out of the 24 users in the tree I'm
> looking at. Let me know if I should do a v2 with that approach.

OK, that's a bunch of churn then, if we're planning to rework the code
significantly in the near future.

One thought, which I possibly should not put in writing, is that we
could use the alignment of the pointer as a poor man's substitute for a
counter, e.g.:

+static inline struct drmem_lmb *drmem_lmb_next(struct drmem_lmb *lmb)
+{
+	if ((unsigned long)lmb % PAGE_SIZE == 0)
+		cond_resched();
+
+	return ++lmb;
+}

I think the lmbs are allocated in a single block, so that should work.
Maybe PAGE_SIZE is not the right size to use, but you get the idea.
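
Wired into the iterator it would be something like this (again a sketch,
going from memory of the drmem.h macros):

	/* Advance via the helper above so every traversal yields occasionally. */
	#define for_each_drmem_lmb_in_range(lmb, start, end)		\
		for ((lmb) = (start); (lmb) < (end);			\
		     (lmb) = drmem_lmb_next(lmb))

with for_each_drmem_lmb() built on top of that as it is today, so none of
the callers need to change.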

Gross, I know, but it might be OK as a short-term solution?

cheers

Thread overview: 12+ messages
2020-07-28 17:37 [PATCH] powerpc/pseries: explicitly reschedule during drmem_lmb list traversal Nathan Lynch
2020-07-28 17:46 ` Laurent Dufour
2020-07-28 19:19   ` Nathan Lynch
2020-07-30  0:57     ` Michael Ellerman
2020-07-30 15:01       ` Nathan Lynch
2020-07-31 13:16         ` Michael Ellerman
2020-07-31 13:52           ` Nathan Lynch
2020-08-02 12:42             ` Michael Ellerman [this message]
2020-08-10 20:03               ` Nathan Lynch
2020-08-12  1:32                 ` Nathan Lynch
2020-09-09 13:27 ` Michael Ellerman
2020-09-10  7:37   ` Michael Ellerman
