From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.1 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id EA508C432BE for ; Wed, 1 Sep 2021 21:59:53 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id C1B126108E for ; Wed, 1 Sep 2021 21:59:53 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232600AbhIAWAu (ORCPT ); Wed, 1 Sep 2021 18:00:50 -0400 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]:32552 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230182AbhIAWAt (ORCPT ); Wed, 1 Sep 2021 18:00:49 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1630533591; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=uaJ6JdAwGVr6C+6ju0Ho3/Oo9maaQwWf3p+I5+adotY=; b=AS36Bm4MKTLUS07qLHjVfg+JPpMjn0VqUJvsVsc8+z4Sfv9dSj5XH7Rm2fQh3lDyNdPxXQ 3DQeF6fK4ETes0oosrUNJGuZgAq+sAeH7AdPj58lmGTWM0zXk4T+ahEIQGaiGGEwbeqMP9 xZwsAQMHnxM/I+yNmldmIZWbPS4J2Bk= Received: from mail-qk1-f198.google.com (mail-qk1-f198.google.com [209.85.222.198]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-154-oih-1oiiOF6J_eBapwsMaA-1; Wed, 01 Sep 2021 17:59:50 -0400 X-MC-Unique: oih-1oiiOF6J_eBapwsMaA-1 Received: by mail-qk1-f198.google.com with SMTP id q13-20020a05620a038d00b003d38f784161so978855qkm.8 for ; Wed, 01 Sep 2021 14:59:49 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:subject:from:to:cc:date:in-reply-to :references:organization:user-agent:mime-version :content-transfer-encoding; bh=uaJ6JdAwGVr6C+6ju0Ho3/Oo9maaQwWf3p+I5+adotY=; b=mTY8gh3CzQQ0WG68WPUvc3zU99cvSOdzwOZeDJ0K4htd4KqhiI5iM6nPP8w+Y/kbgc eL3IsY4phmt3BrHdHVkFEgJLNU4hFjPjjELepPFYFAJvoa4j+okjcmVaBwsumrFxr5Dt olleTiXyZZv6utVhFLfOZQ1pbPAA6/eFmQ5eQyWmZZ0FyPX7/ah+1N2F7Nu2bVfmb9+m moZitwoIXrSt+05Pjh61X/VjNWrDQsrLyVo6m9Mb1QJrHg6ti2MsuuSCNDdblHDz9toj FGeRSWPScQ8gW81D3LJz1eGzbTspLqBjz/dKywgenrR0tDdtQTsG2eL0Py5a704Er39+ alLQ== X-Gm-Message-State: AOAM5307gaqfhBkcKnSE4mNJUZJDKhmlS8oYANeDmRikb0bp3uWGJQ3z yaRfWlFxdGTQFhP4OOoHuwZbncWe7oucRzvC3Wq0YabsSs+/Hy4yozPKSPlzra0372SfU16hz4f 2UYcWOC+XipxJMuxH X-Received: by 2002:ac8:7516:: with SMTP id u22mr1611360qtq.119.1630533589494; Wed, 01 Sep 2021 14:59:49 -0700 (PDT) X-Google-Smtp-Source: ABdhPJzS0EfxoDKe30r19w7LTQumxLJiraDbw+JW3yjbYYNxa2GGnb6eeeIYa1/9Q1loUlbuQ+LWmA== X-Received: by 2002:ac8:7516:: with SMTP id u22mr1611346qtq.119.1630533589262; Wed, 01 Sep 2021 14:59:49 -0700 (PDT) Received: from [192.168.8.104] (pool-108-49-102-102.bstnma.fios.verizon.net. [108.49.102.102]) by smtp.gmail.com with ESMTPSA id e21sm724922qtg.40.2021.09.01.14.59.47 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 01 Sep 2021 14:59:48 -0700 (PDT) Message-ID: Subject: Re: [PATCH 2/4] drm/dp_mst: Only create connector for connected end device From: Lyude Paul To: "Lin, Wayne" , "dri-devel@lists.freedesktop.org" Cc: "Kazlauskas, Nicholas" , "Wentland, Harry" , "Zuo, Jerry" , "Wu, Hersen" , Juston Li , Imre Deak , Ville =?ISO-8859-1?Q?Syrj=E4l=E4?= , Daniel Vetter , Sean Paul , Maarten Lankhorst , Maxime Ripard , Thomas Zimmermann , David Airlie , Daniel Vetter , "Deucher, Alexander" , "Siqueira, Rodrigo" , "Pillai, Aurabindo" , Bas Nieuwenhuizen , "Cornij, Nikola" , Jani Nikula , Manasi Navare , Ankit Nautiyal , =?ISO-8859-1?Q?Jos=E9?= Roberto de Souza , Sean Paul , Ben Skeggs , "stable@vger.kernel.org" Date: Wed, 01 Sep 2021 17:59:46 -0400 In-Reply-To: References: <20210720160342.11415-1-Wayne.Lin@amd.com> <20210720160342.11415-3-Wayne.Lin@amd.com> <69a5f39580f0d3519468f45ecbfd50d7ad1b3036.camel@redhat.com> <292d6ead03d6afe54f81d52f705e38bbf9feb7bd.camel@redhat.com> <2012d26bb2bece43e65ce435e6ba03f1d8767f61.camel@redhat.com> <6a0868a8ce6befd5f7ddea3481e70285079fcb6a.camel@redhat.com> Organization: Red Hat Content-Type: text/plain; charset="UTF-8" User-Agent: Evolution 3.40.4 (3.40.4-1.fc34) MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk List-ID: X-Mailing-List: stable@vger.kernel.org Actually - did some more thinking, and I think we shouldn't try to make changes like this until we actually know what the problem is here. I could try to figure out what the actual race conditions I was facing before with trying to add/destroy connectors based on PDT, but we still don't even actually have a clear idea of what's broken here. I'd much rather us figure out exactly how this leak is happening before considering making changes like this, because we have no way of knowing if we've properly fixed it or not if we don't know what the problem is in the first place. I'm still happy to write up the topology debugging stuff I mentioned to you if you think that would help you debug this issue - since that would make it a lot easier for you to track down what references are keeping a connector alive (and additkionally, where those references were taken in code. thanks stack_depot!) On Tue, 2021-08-31 at 18:47 -0400, Lyude Paul wrote: > (I am going to try responding to this tomorrow btw. I haven't been super > busy > this week, but this has been a surprisingly difficult email to respond to > because I need to actually need to do a deep dive some of the MST helpers > tomorrow to figure out more of the specifics on why I realized we couldn't > just hot add/remove port->connector here). > > On Wed, 2021-08-25 at 03:35 +0000, Lin, Wayne wrote: > > [Public] > > > > > -----Original Message----- > > > From: Lyude Paul > > > Sent: Tuesday, August 24, 2021 5:18 AM > > > To: Lin, Wayne ; dri-devel@lists.freedesktop.org > > > Cc: Kazlauskas, Nicholas ; Wentland, Harry > > > ; Zuo, Jerry > > > ; Wu, Hersen ; Juston Li > > > ; Imre Deak ; > > > Ville Syrjälä ; Daniel Vetter > > > ; Sean Paul ; Maarten Lankhorst > > > ; Maxime Ripard ; > > > Thomas Zimmermann ; > > > David Airlie ; Daniel Vetter ; > > > Deucher, > > > Alexander ; Siqueira, > > > Rodrigo ; Pillai, Aurabindo > > > ; Bas Nieuwenhuizen > > > ; Cornij, Nikola ; Jani > > > Nikula ; Manasi Navare > > > ; Ankit Nautiyal > > > ; > > > José Roberto de Souza ; Sean > > > Paul ; Ben Skeggs ; > > > stable@vger.kernel.org > > > Subject: Re: [PATCH 2/4] drm/dp_mst: Only create connector for connected > > > end device > > > > > > [snip] > > > > > > I think I might still be misunderstanding something, some comments below > > > > > > On Mon, 2021-08-23 at 06:33 +0000, Lin, Wayne wrote: > > > > > > Hi Lyude, > > > > > > > > > > > > Really thankful for willing to explain in such details. Really > > > > > > appreciate. > > > > > > > > > > > > I'm trying to fix some problems that observed after these 2 > > > > > > patches > > > > > > * 09b974e8983 drm/amd/amdgpu_dm/mst: Remove ->destroy_connector() > > > > > > callback > > > > > > * 72dc0f51591 drm/dp_mst: Remove > > > > > > drm_dp_mst_topology_cbs.destroy_connector > > > > > > > > > > > > With above patches, we now change to remove dc_sink when connector > > > > > > is about to be destroyed. However, we found out that connectors > > > > > > won't get destroyed after hotplugs. Thus, after few times > > > > > > hotplugs, we won't create any new dc_sink since number of sink is > > > > > > exceeding our limitation. As the result of that, I'm trying to > > > > > > figure out why the refcount of connectors won't get zero. > > > > > > > > > > > > Based on my analysis, I found out that if we connect a sst monitor > > > > > > to a mst hub then connect the hub to the system, and then unplug > > > > > > the sst monitor from the hub. E.g. > > > > > > src - mst hub - sst monitor => src - mst hub  (unplug) sst monitor > > > > > > > > > > > > Within this case, we won't try to put refcount of the sst monitor. > > > > > > Which is what I tried to resolve by [PATCH 3/4]. > > > > > > But here comes a problem which is confusing me that if I can > > > > > > destroy connector in this case. By comparing to another case, if > > > > > > now mst hub is connected with a mst monitor like this: > > > > > > src - mst hub - mst monitor => src - mst hub  (unplug) mst monitor > > > > > > > > > > > > We will put the topology refcount of mst monitor's branching unit > > > > > > in and > > > > > > drm_dp_port_set_pdt() and eventually call > > > > > > drm_dp_delayed_destroy_port() to unregister the connector of the > > > > > > logical port. So following the same rule, I think to dynamically > > > > > > unregister a mst connector is what we want and should be > > > > > > reasonable to also destroy sst connectors in my case. But this > > > > > > conflicts the idea what we have here. We want to create connectors > > > > > > for all output ports. > > > > > > So if dynamically creating/destroying connectors is what we want, > > > > > > when is the appropriate time for us to create one is what I'm > > > > > > considering. > > > > > > > > > > > > Take the StartTech hub DP 1to4 DP output ports for instance. This > > > > > > hub, internally, is constructed by  3 1-to-2 mst branch chips. 2 > > > > > > output ports of 1st chip are hardwired to another 2 chips. It's > > > > > > how it makes it to support 1-to-4 mst branching. So within this > > > > > > case, the internal > > > > > > 2 output ports of 1st chip is not connecting to a stream sink and > > > > > > will never get connected to one.  Thus, I'm thinking maybe the > > > > > > best timing to attach a connector to a port is when the port is > > > > > > connected, and the connected PDT is determined as a stream sink. > > > > > > > > > > > > Sorry if I misunderstand anything here and really thanks for your > > > > > > time to shed light on this : ) Thanks Lyude. > > > > > > > > > > It's no problem, it is my job after all! Sorry for how long my > > > > > responses have been taking, but my plate seems to be finally > > > > > clearing up for the foreseeable future. > > > > > > > > > > That being said - it sounds like with this we still aren't actually > > > > > clear on where the topology refcount leak is happening - only when > > > > > it's happening, which says to me that's the issue we really need to > > > > > be figuring out the cause of as opposed to trying to workaround it. > > > > > > > > > > Actually - refcount leaks is an issue I've ran into a number of > > > > > times before in the past, so a while back I actually added some nice > > > > > debugging features to assist with debugging leaks. If you enable the > > > > > following options in your kernel config: > > > > > > > > > > CONFIG_EXPERT=y # This must be set first before the next option > > > > > CONFIG_DRM_DEBUG_DP_MST_TOPOLOGY_REFS=y > > > > > > > > > > Unfortunately, I'm suddenly realizing after typing this that > > > > > apparently I never bothered adding a way for us to debug the > > > > > > > refcounts of ports/mstbs that haven't been released yet - only the > > > > > ones for ones that have. This shouldn't be difficult at all for me > > > > > to add, so I'll send you a patch either today or at the start of > > > > > next week to try debugging with using this, and then we can figure > > > > > out where this leak is really coming from. > > > > > > > > Thanks Lyude! > > > > > > > > Sorry to bother you, but I would like to clarify this again.  So it > > > > sounds > > > > > > It's no problem! It's my job and I'm happy to help :). > > > > Thanks! > > I would like to learn more from you as below : p > > > > > > > like you also agree that we should destroy associated connector > > > > > > Not quite. I think a better way of explaining this might be to point out > > > that the lifetime of an MST port and its connector isn't supposed > > > to be determined by whether or not it has something plugged into it - > > > its > > > lifetime is supposed to depend on whether there's a valid > > > path from us down the MST topology to the port we're trying to reach. So > > > an MSTB with ports that is unplugged would destroy all of > > > its ports - but an unplugged port should just be the same as a > > > disconnected DRM connector - even if the port itself is just hosting a > > > branching device. > > > > This is the part a bit difficult to me. I treat DRM connector as the place > > where we associate with a stream sink. So if the statement > > is "All DP mst output ports are places we connect with stream sink", I > > would > > say false to this since I can find the negative example when > > output port is connected with mst branch device. Thus, looks like we could > > only determine whether to create a connector for an output > > port when the peer device type is known? > > > > > > Additionally - we don't want to try "delaying" connector creation > > > either. > > > In the modern world hotplugging is almost always reliable in > > > normal situations, but even so there's still use cases for wanting force > > > probing for analog devices on DP converters and just in general > > > as it's a feature commonly used by developers or users working around > > > monitors with problematic HPD issues or EDID issues. > > > > I think I understand that why we want to create connectors for all output > > ports here. But under these mentioned use cases, aren't we still > > capable to force connector to enable stream? MST hub with muti-functon > > capability, it will enumerate connected virtual DP peer device. > > For problematic HPD issues or EDID issues, their connection status is also > > connected. > > > > My understanding of output port is it is an internal node to help > > construct > > an end-to-end virtual channel between a stream source device > > and a stream sink device. Creating connectors for internal nodes within a > > virtual channel is a bit hard for me to get the idea. Please correct > > me if I misunderstand anything here. Thanks Lyude! > > > > > > > when we unplug sst monitor from a mst hub in the case that I > > > > described? In the case I described (unplug sst monitor), we only > > > > receive CSN from the hub that notifying us the connection status of > > > > one of its downstream output ports is changed to disconnected. There > > > > is no topology refcount needed to be decreased on this disconnected > > > > port but the malloc refcount. Since the output port is still declared > > > > by > > > > > > Apologies - I misunderstood your original mail as implying that topology > > > refcounts were being leaked - but it sounds like it's actually > > > malloc refcounts being leaked instead? In any case - that means we're > > > still tracing down a leak, just a malloc ref leak. > > > > > > But, this still doesn't totally make sense to me. Malloc refs only keep > > > the actual drm_dp_mst_port/drm_dp_mst_branch struct alive in > > > memory. > > > Nothing else is kept around, meaning the DRM connector (and I assume by > > > proxy, the dc_sink) should both be getting dropped still > > > and the only thing that should be leaked is a memory allocation. These > > > things should instead be dropped once there's no longer any > > > topology references around. So, are we _sure_ that the problem here is a > > > missing > > > drm_dp_mst_port_put_malloc() or drm_dp_mst_mstb_put_malloc()? > > > > Just my two cents, I don't think it's leak of malloc ref neither. As you > > said, malloc ref is dealing with the last step to free port/mstb. > > If there is still topology refcount on port/mstb in my case, we won't free > > port/mstb. > > > > > > If we are unfortunately we don't have equivalent tools for malloc() > > > tracing. I'm totally fine with trying to add some if we have trouble > > > figuring out this issue, but I'm a bit suspicious of the commits you > > > mentioned that introduced this problem. If the problem doesn't > > > happen until those two commits, then it's something in the code changes > > > there that are causing this problem. > > > > I think we probably also have the problem before these commits, but we > > didn't notice this before. Just when we change to clean up all > > things in dm_dp_mst_connector_destroy(), I start to try to figure out all > > these things out. > > > > > > The main thing I'm suspicious of just from looking at changes in > > > 09b974e8983a4b163d4a406b46d50bf869da3073 is that the call to > > > amdgpu_dm_update_freesync_caps() that was previously in > > > dm_dp_destroy_mst_connector() appears to be dropped and not re-added in > > > (oh dear, this is a /very/ confusingly similar function > > > > Lol. I also have hard time on this.. > > > name!!!) dm_dp_mst_connector_destroy(). I don't remember if this was > > > intentional on my part, but does adding a call back to > > > amdgpu_dm_update_freesync_caps() into dm_dp_destroy_mst_connector() > > > right > > > before the dc_link_remove_remote_sink() call fix > > > anything? > > > > > > As well, I'm far less suspicious of this one but does re-adding this > > > hunk: > > > > > >       aconnector->dc_sink = NULL; > > >       aconnector->dc_link->cur_link_settings.lane_count = 0; > > > > > > After dc_sink_release() fix anything either? > > > > So the main problem is we don't have chance to call > > dc_link_remove_remote_sink() in the unplugging SST case. We only have > > chance > > to > > remove the remote sink of a link when unplug a mstb. > > > > > > > the mst hub,  I think we shouldn't destroy the port. Actually, no > > > > ports nor mst branch devices should get destroyed in this case I > > > > think. > > > > The result of LINK_ADDRESS is still the same before/after removing the > > > > sst monitor except the DisplayPort_Device_Plug_Status/ > > > > Legacy_Device_Plug_Status. > > > > > > > > Hence, if you agree that we should put refcount of the connector of > > > > the disconnected port within the unplugging sst monitor case to > > > > release the allocated resource, it means we don't want to create > > > > connectors for those disconnected ports. Which conflicts current flow > > > > to create connectors for all declared output ports. > > > > > > > > Thanks again for your time Lyude! > > > > > > -- > > > Cheers, > > >  Lyude Paul (she/her) > > >  Software Engineer at Red Hat > > -- > > Regards, > > Wayne > > > -- Cheers, Lyude Paul (she/her) Software Engineer at Red Hat