All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Oliver O'Halloran" <oohall@gmail.com>
To: Ganesh Goudar <ganeshgr@linux.ibm.com>
Cc: linuxppc-dev <linuxppc-dev@lists.ozlabs.org>,
	Mahesh J Salgaonkar <mahesh@linux.ibm.com>
Subject: Re: [RFC 0/3] Asynchronous EEH recovery
Date: Wed, 17 Aug 2022 17:16:09 +1000	[thread overview]
Message-ID: <CAOSf1CHicPi6B_tGZ64xEknkfCSUmERTLVq_92XHHmh6-gxYHw@mail.gmail.com> (raw)
In-Reply-To: <20220816032716.108297-1-ganeshgr@linux.ibm.com>

On Tue, Aug 16, 2022 at 1:29 PM Ganesh Goudar <ganeshgr@linux.ibm.com> wrote:
>
> Hi,
>
> EEH reocvery is currently serialized and these patches shorten
> the time taken for EEH recovery by making the recovery to run
> in parallel. The original author of these patches is Sam Bobroff,
> I have rebased and tested these patches.
>
> On powervm with 64 VFs and I see approximately 48% reduction
> in time taken in EEH recovery,

Can you elaborate on how you're testing this? A VM with 64 separate
VFs on 64 separate PHBs I guess? How are you injecting errors?

> Yet to be tested on powernv.

If you're not testing on powernv you might as well not be testing. EEH
support on pseries is relatively straightforward since managing PCI
bridges, etc is all hidden away in the hypervisor. Most of the things
about EEH which give me headaches simply don't happen on pseries since
the PCI topology seen by the guest is flat.

> These patches were originally posted as separate RFCs, I think
> posting them as single series would be more helpful, I know the
> patches are too big, I will try to logically divide in next
> iterations.
>
> Thanks
>
> Ganesh Goudar (3):
>   powerpc/eeh: Synchronization for safety

I didn't pick that patch up after Sam left for a reason. I don't
recall the exact details, but I think I decided that the lock being
added had the same (or mostly the same) scope as the pci rescan lock.
All modifications to the EEH PE tree have to occur with the rescan
lock held since the per-device EEH setup occurs while configuring the
pci_dev (similarly the teardown happens when we remove the pci_dev
from its bus). I don't think that was true when Same wrote the patch
originally though since it pre-dates my big EEH init re-work.

It's possible (probable even!) I got that wrong, or that it was
something I was trying to make true through re-works rather than an
accurate assessment of the current state of things. I'll try page some
of this back into my brain later. I think I left some comments on
Sam's original RFC patch which might still be valid.

>   powerpc/eeh: Asynchronous recovery

I remember not hating the idea, but I think the implementation leaves
a lot to be desired. If you only really care about pseries then just
moving to a model where each PHB has its own recovery thread would get
you most of the benefit without needing to turn the already
complicated recovery path into an async trainwreck. IIRC Sam's
motivation for that patch was to make recovery faster for powernv
systems when an error was injected on a PF with a lot of child VFs.
Certain drivers took forever to recover each VF (might have been mlx5)
so doing it in parallel would have sped things up considerably.

Oliver

  parent reply	other threads:[~2022-08-17  7:17 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-08-16  3:27 [RFC 0/3] Asynchronous EEH recovery Ganesh Goudar
2022-08-16  3:27 ` [RFC 1/3] powerpc/eeh: Synchronization for safety Ganesh Goudar
2022-08-16  3:27 ` [RFC 2/3] powerpc/eeh: Provide a unique ID for each EEH recovery Ganesh Goudar
2022-08-16  3:27 ` [RFC 3/3] powerpc/eeh: Asynchronous recovery Ganesh Goudar
2022-08-17  7:16 ` Oliver O'Halloran [this message]
2022-09-02  0:19 ` [RFC 0/3] Asynchronous EEH recovery Jason Gunthorpe
2022-09-15 10:15   ` Ganesh
2023-01-25 14:04     ` Christophe Leroy
2023-06-13  1:43 Ganesh Goudar
2023-06-13  2:36 ` Oliver O'Halloran
2023-07-17  8:10   ` Ganesh G R

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAOSf1CHicPi6B_tGZ64xEknkfCSUmERTLVq_92XHHmh6-gxYHw@mail.gmail.com \
    --to=oohall@gmail.com \
    --cc=ganeshgr@linux.ibm.com \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=mahesh@linux.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.