From: "Vincenzo Palazzo" <vincenzopalazzodev@gmail.com>
To: "Bjorn Helgaas" <helgaas@kernel.org>
Cc: <linux-pci@vger.kernel.org>, <robh@kernel.org>, <heiko@sntech.de>,
	<kw@linux.com>, <shawn.lin@rock-chips.com>,
	<linux-kernel@vger.kernel.org>, <lgirdwood@gmail.com>,
	<linux-rockchip@lists.infradead.org>, <broonie@kernel.org>,
	<bhelgaas@google.com>,
	<linux-kernel-mentees@lists.linuxfoundation.org>,
	<lpieralisi@kernel.org>, <linux-arm-kernel@lists.infradead.org>,
	"Dan Johansen" <strit@manjaro.org>
Subject: Re: [PATCH v1] drivers: pci: introduce configurable delay for Rockchip PCIe bus scan
Date: Wed, 10 May 2023 13:35:43 +0200
Message-ID: <CSIKESNNLX5D.4VDA3E6NBN3N@vincent-arch>
In-Reply-To: <20230509211902.GA1270901@bhelgaas>

> Hi Vincenzo,

Hi :)

> Thanks for raising this issue.  Let's see what we can do to address
> it.

Yeah, as I said in my cover letter, I am not happy with my solution,
but we have to start the discussion somewhere.

> > Add a configurable delay to the Rockchip PCIe driver to address
> > crashes that occur on some old devices, such as the Pine64 RockPro64.
> > 
> > This issue is affecting the ARM community, but there is no
> > upstream solution for it yet.
>
> It sounds like this happens with several endpoints, right?  And I
> assume the endpoints work fine in other non-Rockchip systems?  If
> that's the case, my guess is the problem is with the Rockchip host
> controller and how it's initialized, not with the endpoints.


Yeah, the crash is reproducible only on Rockchip systems or, more
precisely, only on some modern devices that use the old Rockchip
driver mentioned in this patch.

> The only delays and timeouts I see in the driver now are in
> rockchip_pcie_host_init_port(), where it waits for link training to
> complete.  I assume the link training did complete successfully
> since you don't mention either a gen1 or gen2 timeout (although the
> gen2 message is a dev_dbg() that normally wouldn't go to the console).
>
> I don't know that the spec contains a retrain timeout value.  Several
> other drivers use 1 second, while rockchip uses 500ms (for example,
> see LINK_RETRAIN_TIMEOUT and LINK_UP_TIMEOUT).
>
> I think we need to understand the issue better before adding a DT
> property and a module parameter.  Those are hard for users to deal
> with.  If we can figure out a value that works for everybody, it would
> be better to just hard-code it in the driver and use that all the
> time.

Yeah, I see; this makes sense. Is there any path I can follow to better
understand what's going on at the hardware level? In other words, how
can I help narrow down this issue so that we end up with a single
solution that works for everybody?

Thanks for the nits on the patch; I will take a look with a fresh mind
later in the day.

Cheers!

Vincent.
