Re: [PATCH 2/4] drm/dp_mst: Only create connector for connected end device

From: Lyude Paul <lyude@redhat.com>
To: "Lin, Wayne" <Wayne.Lin@amd.com>,
	"dri-devel@lists.freedesktop.org"
	<dri-devel@lists.freedesktop.org>
Cc: "Kazlauskas, Nicholas" <Nicholas.Kazlauskas@amd.com>,
	"Wentland, Harry" <Harry.Wentland@amd.com>,
	"Zuo, Jerry" <Jerry.Zuo@amd.com>,
	"Wu, Hersen" <hersenxs.wu@amd.com>,
	"Juston Li" <juston.li@intel.com>,
	"Imre Deak" <imre.deak@intel.com>,
	"Ville Syrjälä" <ville.syrjala@linux.intel.com>,
	"Daniel Vetter" <daniel.vetter@ffwll.ch>,
	"Sean Paul" <sean@poorly.run>,
	"Maarten Lankhorst" <maarten.lankhorst@linux.intel.com>,
	"Maxime Ripard" <mripard@kernel.org>,
	"Thomas Zimmermann" <tzimmermann@suse.de>,
	"David Airlie" <airlied@linux.ie>,
	"Daniel Vetter" <daniel@ffwll.ch>,
	"Deucher, Alexander" <Alexander.Deucher@amd.com>,
	"Siqueira, Rodrigo" <Rodrigo.Siqueira@amd.com>,
	"Pillai, Aurabindo" <Aurabindo.Pillai@amd.com>,
	"Bas Nieuwenhuizen" <bas@basnieuwenhuizen.nl>,
	"Cornij, Nikola" <Nikola.Cornij@amd.com>,
	"Jani Nikula" <jani.nikula@intel.com>,
	"Manasi Navare" <manasi.d.navare@intel.com>,
	"Ankit Nautiyal" <ankit.k.nautiyal@intel.com>,
	"José Roberto de Souza" <jose.souza@intel.com>,
	"Sean Paul" <seanpaul@chromium.org>,
	"Ben Skeggs" <bskeggs@redhat.com>,
	"stable@vger.kernel.org" <stable@vger.kernel.org>
Subject: Re: [PATCH 2/4] drm/dp_mst: Only create connector for connected end device
Date: Tue, 02 Nov 2021 18:31:48 -0400	[thread overview]
Message-ID: <0badd704d7014481cc87a42e58c96a84205f1ca3.camel@redhat.com> (raw)
In-Reply-To: <CO6PR12MB548935F6AE8DA89C965A2A58FC879@CO6PR12MB5489.namprd12.prod.outlook.com>

On Fri, 2021-10-29 at 12:11 +0000, Lin, Wayne wrote:
> [Public]
> 
> Thanks Lyude for patiently guiding on this : )
> Would like to learn more as following

I do follow your bit about connectors only being created when a virtual path
is instantiated, but that still doesn't follow how connectors in DRM typically
behave though as this idea still comes down to "we don't have disconnected
connectors, only connected ones". Which still breaks force probing (if the
connector doesn't exist in userspace because we destroyed it, how do we get to
it's force sysfs file?), and also just makes hides information from userspace
that it might actually care about (what if for instance, a GUI wanted to display
the topology layout of an MST hub -including- all of the currently disconnected
ports on it? Considering we allow this for things like USB, it doesn't make
sense to hide them for MST.

As well, while your idea for what an MST connector is honestly does make a lot
more sense then what we have, that's not really the issue here. The problem is
that connector creation/destruction is already quite racy, and requires a _lot_
of care to get right. We've already had tons of bugs in the past that lead to us
resorting to all of the tricks we're currently using, for instance:
Which just seems to add a lot of
complication to the current MST code, without much reason here besides trying
to reduce the amount of connectors along with a potential bug with leaking
connectors that we still don't know the cause of. Trying to solve problems
without understanding exactly what's causing them 
something around a bug that could be entirely unrelated to how we create
connectors, because then it's not even really guaranteed we've fixed anything
if we don't know what caused the problem in the first place. Working around
problems might temporarily fix the ones we're dealing with right now, but
without understanding what's causing it there's no guarantee it won't just pop
up again in the future or that we won't introduce new problems in the process.

> 
> > 
> > Regardless though, I would think that we could just handle this mostly
> > from the atomic state even with a connector for every port. For
> > instance, i915 already has something called "big joiner" for supporting
> > display configurations where one display can take up two
> > separate display pipes (CRTCs). We could likely do something similar but
> > with connectors if we end up having to deal with a display
> > driven by two DP links.
> > 
> > > I was thinking to associate a drm connector for end stream sink only.
> > > I think we probably won't want to attach a connector to a
> > > relay/retimer/redriver within a stream path? I treat MST port as the
> > > similar role when It's fixed to connect to a MST branch device.
> > 
> > If it's a fixed connection, this might actually be OK to avoid attaching
> > connectors on. Currently with input ports where we know we can
> > never receive a CSN for them during runtime, we're able to avoid creating
> > a connector because no potential for CSN during runtime
> > means the only possible time an input port could transition would be
> > suspend/resume. So if we detect we're on another topology
> > where something that was previously an output port that is an input port
> > on the new topology, we get rid of the connector by
> > removing the drm_dp_mst_port it's associated with from the topology and
> > replace it with a new one. This works pretty well, as it
> > avoids doing any actual connector destruction from the suspend/resume
> > codepath and ensures that any pointer references to the
> > now non-existent output port remain valid for as long as needed. So I
> > might actually be open to expanding this for fixed connections
> > like relays, retimers and redrivers if we handle things in a similar
> > manner.
> > For anything that can receive a CSN though, a drm_connector is
> > unconditionally needed even if nothing's connected.
> 
> Want to deepen my knowledge here. Sorry Lyude could you explain more on this
> please?
> Are you saying that if we change to associate drm connector as what I
> proposed in this patch, we will create actual connector destruction
> from the suspend/resume codepath and which is a problem here? I thought once
> the connection status changed from connected to
> disconnected during suspend/resume, we still use the same way as what we did
> in drm_dp_delayed_destroy_port():
> i.e.
> if (port->connector) {
>         drm_connector_unregister(port->connector);
>         drm_connector_put(port->connector);
> }
> We won't directly destruct the drm connector?

Something like that, I'd need to to go look further into the details because I
very vividly remember most of the tricks we do in the MST helpers regarding
delayed connector destruction and when/how we change various members of the
drm_dp_mst_port/drm_dp_mst_branch structures. I vaguely remember the problem
with trying to hot add/remove connectors (I -did- actually try to do this once
I believe! but not as thoroughly as you have) being some kind of lockdep
issue. I started trying to dig into the MST code a bit deeper to get a clear
answer on this, but I actually decided to take that time and just finish up
the debug helpers I mentioned (I'll send the WIP patch I've got to you in a
moment, and will send it off the mailing list once I finish hooking things up
in i915) because it really just doesn't seem to me like we actually have a
clear understanding of how this issue is being caused - and it's not a good
idea for us to make any kind of API change like this to attempt (and
inevitably fail or break something else) to fix an issue we don't fully
understand.

[snip...]

> > 
> Right! I might not recall correctly, but I think that's why I want this
> patch. I probably encountered that userspace doesn’t explicitly
> try to react to this unplug event. Instead, it tries to react when we plug
> in monitor next time. And the problem is when we plug in
> monitor next time, stale resources are not released yet. It then hits the
> limitation within our HW. Which let me want to explicitly
> release resource once driver detect the unplug event (just like sst long HPD
> event I think).  By the way, just out of curiosity, when
> do you think is the timing to release sink related resource if we rely on
> hotplug event notifying userspace? When userspace frees the
> associated pipe of the connector? Won't it be a transient state that
> userspace just free the pipe temporarily?

The timing of releasing resources should be done at the same time that we
disable the connector. In general, MST modesetting shouldn't be much different
from anything else - except for having to maintain a payload table and
bandwidth limitations across a shared connection. So pretty much everything
related to enabling or disabling streams should be in the atomic commit phase
(with any bandwidth calculations done beforehand, WIP...). I'm going to say,
let's figure out where this is happening first. I've got the debugging patches
for this ready and will send them to you now.

> 
> > Also, I'm still working on the debugging stuff btw!
> Much appreciate Lyude! Thanks!
> 
> > 
> > --
> > Cheers,
> >  Lyude Paul (she/her)
> >  Software Engineer at Red Hat
> --
> Regards,
> Wayne Lin

-- 
Cheers, Lyude Paul (she/her) Software Engineer at Red Hat

-- 
Cheers,
 Lyude Paul (she/her)
 Software Engineer at Red Hat