linux-arm-kernel.lists.infradead.org archive mirror
From: Shuai Xue <xueshuai@linux.alibaba.com>
To: Jonathan Cameron <Jonathan.Cameron@huawei.com>,
	Bjorn Helgaas <helgaas@kernel.org>
Cc: will@kernel.org, linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, rdunlap@infradead.org,
	robin.murphy@arm.com, mark.rutland@arm.com,
	baolin.wang@linux.alibaba.com, zhuo.song@linux.alibaba.com,
	linux-pci@vger.kernel.org
Subject: Re: [PATCH v1 2/3] drivers/perf: add DesignWare PCIe PMU driver
Date: Mon, 26 Sep 2022 21:31:34 +0800
Message-ID: <89efd20f-65f2-c082-1eb4-4e308957ff59@linux.alibaba.com>
In-Reply-To: <20220923165423.00007dc6@huawei.com>

+ Bjorn Helgaas

On 2022/9/23 11:54 PM, Jonathan Cameron wrote:
> 
>>
>>>   
>>>> +#define RP_NUM_MAX				32 /* 2die * 4RC * 4Ctrol */  
>>>
>>> This driver is 'almost' generic. So if you can avoid defines based on a particular
>>> platform that's definitely good!  
>>
>> Good idea. How about defining RP_NUM_MAX as 64? As far as I know,
>> some platforms use 2 sockets, with 2 dies per socket.
>> That gives 2 sockets * 2 dies * 4 Root Complexes * 4 Root Ports = 64.
> 
> Setting a reasonable maximum is fine - but make sure the code then fails with
> a suitable error message if there are more!

OK, I will add discovery logic here and count the number of PMUs at runtime.
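
A minimal userspace sketch of the bounded discovery discussed above: count
what was found, and fail loudly if it exceeds the table size. The value 64
for RP_NUM_MAX and the names rp_entry and rp_table_fill are assumptions for
illustration, not code from the patch.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

#define RP_NUM_MAX 64 /* assumed upper bound, not taken from any spec */

/* Hypothetical stand-in for one discovered root port. */
struct rp_entry {
	unsigned int bdf;
};

/*
 * Copy the discovered root ports into table.
 * Returns the number stored, or -E2BIG (after printing a message) if
 * the platform exposes more ports than the table can hold.
 */
static int rp_table_fill(struct rp_entry *table, const unsigned int *found,
			 size_t nfound)
{
	size_t i;

	if (nfound > RP_NUM_MAX) {
		fprintf(stderr, "found %zu root ports, only %d supported\n",
			nfound, RP_NUM_MAX);
		return -E2BIG;
	}

	for (i = 0; i < nfound; i++)
		table[i].bdf = found[i];

	return (int)nfound;
}
```

In a kernel driver the message would presumably go through dev_err() or
pci_err() rather than fprintf().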

> 
> 
>>>> +#define DWC_PCIE_LANE_SHIFT			4
>>>> +#define DWC_PCIE_LANE_MASK			GENMASK(9, 4)
>>>> +
>>>> +#define DWC_PCIE_EVENT_CNT_CTRL			0x8
>>>> +#define DWC_PCIE__CNT_EVENT_SELECT_SHIFT	16  
>>>
>>> Why double __?  If point is , then
>>> naming works better
>>> DWC_PCIE_EVENT_CNT_CTRL_REG
>>> DWC_PCIE_EVENT_CNT_CTRL_EV_SELECT_MSK etc  
>>
>> Yes, the point of the double `__` is to indicate that it is a field of a
>> register, as the CMN and CCN drivers do. I also considered naming with REG
>> explicitly, but the macros get so long that I often have to wrap code across
>> multiple lines. Anyway, it's fine to rename if you still suggest doing so.
> 
> I don't particularly mind.  This convention was new to me.

Haha, then I will leave the double `__` as CMN and CCN drivers do.

>>>> +struct dwc_pcie_pmu_priv {
>>>> +	struct device *dev;
>>>> +	u32 pcie_ctrl_num;
>>>> +	struct dwc_pcie_info_table *pcie_table;
>>>> +};
>>>> +
>>>> +#define DWC_PCIE_CREATE_BDF(seg, bus, dev, func)	\
>>>> +	(((seg) << 24) | (((bus) & 0xFF) << 16) | (((dev) & 0xFF) << 8) | (func))  
>>>
>>> Superficially this looks pretty standard.  Why is it DWC specific?  
>>
>> You are right, it is not DWC specific.
>>
>> I found a similar definition in arch/ia64/pci/pci.c .
>>
>> 	#define PCI_SAL_ADDRESS(seg, bus, devfn, reg)		\
>> 	(((u64) seg << 24) | (bus << 16) | (devfn << 8) | (reg))
>>
>> Should we move it into a common header first?
> 
> Maybe. The bus, devfn, reg part is standard bdf, but I don't think
> the PCI 6.0 spec defined a version with the seg in the upper bits.
> I'm not sure if we want to adopt that in Linux.

I found lots of code that uses seg, bus, devfn and reg with the format
"%04x:%02x:%02x.%x". I am not very familiar with the PCIe spec. What do you
think about it, Bjorn?
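
For reference, a small userspace sketch of how a segment-qualified ID could be
packed and printed in that format. The 16-bit bus/device/function layout
(bus[15:8], device[7:3], function[2:0]) is the standard one; placing the
segment in bits 31:16 is purely an assumption made here for illustration, and
make_bdf, make_sbdf and format_sbdf are hypothetical names.

```c
#include <stdint.h>
#include <stdio.h>

/* Standard 16-bit BDF: bus[15:8], device[7:3], function[2:0]. */
static uint16_t make_bdf(uint8_t bus, uint8_t dev, uint8_t func)
{
	return (uint16_t)((bus << 8) | ((dev & 0x1f) << 3) | (func & 0x07));
}

/* Assumed 32-bit layout: segment in the upper 16 bits, BDF below it. */
static uint32_t make_sbdf(uint16_t seg, uint8_t bus, uint8_t dev, uint8_t func)
{
	return ((uint32_t)seg << 16) | make_bdf(bus, dev, func);
}

/* Format as "ssss:bb:dd.f"; buf must hold at least 13 bytes. */
static int format_sbdf(char *buf, size_t len, uint32_t sbdf)
{
	return snprintf(buf, len, "%04x:%02x:%02x.%x",
			(unsigned int)(sbdf >> 16),
			(unsigned int)((sbdf >> 8) & 0xff),
			(unsigned int)((sbdf >> 3) & 0x1f),
			(unsigned int)(sbdf & 0x07));
}
```

Note that the macro in the patch masks the device number with 0xFF and shifts
it by 8, so it does not match this devfn packing.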


> 
>>>   
>>>> +		pci_read_config_dword(pdev, vsec + PCI_VNDR_HEADER, &header);
>>>> +		/* Is the device part of a DesignWare Cores PCIe Controller ? */  
>>>
>>> Good question... This code doesn't check that.  VSEC ID is matched only with
>>> the Vendor ID of the devices - unlike DVSEC where this would all be nice
>>> and local.  
>>
>> I think a similar fashion is
>>
>> 	u16 pci_find_vsec_capability(struct pci_dev *dev, u16 vendor, int cap)
>>
>> As you can see, I don't want to limit this driver to a specific vendor, like
>> Alibaba (0x1ded), because this driver is generic to all DesignWare Cores PCIe
>> Controllers. Therefore, dwc_pcie_find_ras_des_cap_position does not check the
>> vendor the way pci_find_vsec_capability does.
> 
> You can't do that because another vendor could use the same VSEC ID for
> an entirely different purpose. They are only valid in combination with the device VID.

That makes sense to me.

> 
> The only way this can work is with a list of specific vendor ID / VSEC pairs for
> known devices.
> 
>>
>> Do you mean to use DVSEC instead? I tried to read out the DVSEC with lspci:
>>
>>     # lspci -vvv
>>     b0:00.0 PCI bridge: Alibaba (China) Co., Ltd. M1 Root Port (rev 01) (prog-if 00 [Normal decode])
>>     [...snip...]
>>         Capabilities: [374 v1] Vendor Specific Information: ID=0002 Rev=4 Len=100 <?>
>>         Capabilities: [474 v1] Vendor Specific Information: ID=0001 Rev=1 Len=038 <?>
>>         Capabilities: [4ac v1] Data Link Feature <?>
>>         Capabilities: [4b8 v1] Designated Vendor-Specific: Vendor=0001 ID=0000 Rev=1 Len=64 <?>
>>         Capabilities: [4fc v1] Vendor Specific Information: ID=0005 Rev=1 Len=018 <?>
>>
>> How can we tell it's a DesignWare Cores PCIe Controller?
> 
> Gah. This is what DVSEC was defined to solve. It lets you have a common
> vendor defined extended capability defined by a vendor, independent of the
> VID of a given device.  With a VSEC you can't write generic code.
> 

Got it. But I don't see any description relating the RAS_DES_CAP register to
DVSEC in the PCIe Controller TRM. I will check this later.

>>
>>>> +		if (PCI_VNDR_HEADER_ID(header) == DWC_PCIE_VSEC_ID &&
>>>> +		    PCI_VNDR_HEADER_REV(header) == DWC_PCIE_VSEC_REV) {
>>>> +			*pos = vsec;
>>>> +			return 0;
>>>> +		}
>>>> +	}
>>>> +
>>>> +	return -ENODEV;
>>>> +}
>>>> +
>>>> +static int dwc_pcie_pmu_discover(struct dwc_pcie_pmu_priv *priv)
>>>> +{
>>>> +	int val, where, index = 0;
>>>> +	struct pci_dev *pdev = NULL;
>>>> +	struct dwc_pcie_info_table *pcie_info;
>>>> +
>>>> +	priv->pcie_table =
>>>> +	    devm_kcalloc(priv->dev, RP_NUM_MAX, sizeof(*pcie_info), GFP_KERNEL);
>>>> +	if (!priv->pcie_table)
>>>> +		return -EINVAL;
>>>> +
>>>> +	pcie_info = priv->pcie_table;
>>>> +	while ((pdev = pci_get_device(PCI_ANY_ID, PCI_ANY_ID, pdev)) != NULL &&
>>>> +	       index < RP_NUM_MAX) {  
>>>
>>> Having a driver that then walks the PCI topology to find root ports and add
>>> extra stuff to them is not a clean solution.
>>>
>>> The probing should be driven from the existing PCI driver topology.
>>> There are a bunch of new features we need to add to ports in the near future
>>> anyway - this would just be another one.
>>> Same problem exists for CXL CPMU perf devices - so far we only support those
>>> on end points, partly because we need a clean way to probe them on pci ports.
>>>
>>> Whatever we come up with there will apply here as well.  
>>
>> I see your point. Any link to reference?
> 
> No, though hopefully we'll get to some sort of plan in the branch of this thread
> that Bjorn commented in.
> 

OK.

>>
>>>   
>>>> +		if (!pci_dev_is_rootport(pdev))
>>>> +			continue;
>>>> +
>>>> +		pcie_info[index].bdf = dwc_pcie_get_bdf(pdev);
>>>> +		pcie_info[index].pdev = pdev;  
>>> Probably want a sanity check that this has a vendor ID appropriate for the VSEC you are about
>>> to look for.  
>>
>> If I check the vendor ID here or in dwc_pcie_find_ras_des_cap_position, this driver
>> will only work for Alibaba as I mentioned before.
> 
> Agreed. Unfortunately that's all you can do safely as VSEC IDs are not a global
> namespace.

Should we add a sanity check with a vendor list in dwc_pcie_find_ras_des_cap_position?
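
Such a check could be table-driven, roughly as sketched below in plain
userspace C. The Alibaba vendor ID (0x1ded) is the one mentioned earlier in
this thread; the VSEC ID and revision in the table are placeholders chosen
for illustration only, and struct vsec_id, known_vsecs and vsec_is_known are
hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One known (vendor ID, VSEC ID, VSEC revision) combination. */
struct vsec_id {
	uint16_t vendor;
	uint16_t vsec_id;
	uint8_t vsec_rev;
};

/*
 * Allow-list of devices known to expose the capability. A VSEC ID only
 * has meaning together with the device's vendor ID, so both are matched.
 * The VSEC ID and revision here are placeholders, not verified values.
 */
static const struct vsec_id known_vsecs[] = {
	{ 0x1ded, 0x0002, 0x4 },
};

/* Return true only if the exact vendor/VSEC pair is on the allow-list. */
static bool vsec_is_known(uint16_t vendor, uint16_t vsec_id, uint8_t vsec_rev)
{
	size_t i;

	for (i = 0; i < sizeof(known_vsecs) / sizeof(known_vsecs[0]); i++) {
		if (known_vsecs[i].vendor == vendor &&
		    known_vsecs[i].vsec_id == vsec_id &&
		    known_vsecs[i].vsec_rev == vsec_rev)
			return true;
	}

	return false;
}
```

Supporting another vendor's DesignWare-based part would then mean adding one
table entry rather than changing the matching logic.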

>>
>>>> +
>>>> +	ret = dwc_pcie_pmu_write_dword(pcie_info, DWC_PCIE_EVENT_CNT_CTRL, val);
>>>> +	if (ret)
>>>> +		pci_err(pcie_info->pdev, "PCIe write fail\n");
>>>> +
>>>> +	return ret;
>>>> +}  
>>>
>>> ...
>>>   
>>>> +
>>>> +static int dwc_pcie_pmu_read_base_time_counter(struct dwc_pcie_info_table
>>>> +					       *pcie_info, u64 *counter)
>>>> +{
>>>> +	u32 ret, val;
>>>> +
>>>> +	ret = dwc_pcie_pmu_read_dword(pcie_info,
>>>> +				      DWC_PCIE_TIME_BASED_ANALYSIS_DATA_REG_HIGH,
>>>> +				      &val);
>>>> +	if (ret) {
>>>> +		pci_err(pcie_info->pdev, "PCIe read fail\n");
>>>> +		return ret;
>>>> +	}
>>>> +
>>>> +	*counter = val;
>>>> +	*counter <<= 32;  
>>>
>>> This looks like you could get ripping between the upper and lower dwords.
>>> What prevents that? Perhaps a comment to say why that's not a problem?  
>>
>> The Time-based Analysis Data which contains the measurement results of
>> RX/TX data throughput and time spent in each low-power LTSSM state is 64 bit.
>> The data is provided by two 32 bit registers so I rip them together. I will
>> add a comment here in the next version.
> 
> If I understand correctly the only safe way to read this is in a try / retry loop.
> Read the upper part, then the lower part, then reread the upper part.
> If the upper part is unchanged you did not get ripping across the two registers.
> If it changes, try again.

That makes sense to me. I will fix it in the next version.
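
A minimal userspace sketch of the try / retry read described above: read the
upper half, then the lower half, then the upper half again, and retry if the
upper half changed. The accessor type read32_fn and the fake_counter device
are hypothetical and exist only so the loop can be exercised outside the
kernel.

```c
#include <stdint.h>

/* Hypothetical accessor: returns the high (high != 0) or low 32 bits. */
typedef uint32_t (*read32_fn)(void *ctx, int high);

/* Read a 64-bit counter exposed as two 32-bit registers. */
static uint64_t read_counter64(read32_fn rd, void *ctx)
{
	uint32_t hi, lo, hi2;

	do {
		hi = rd(ctx, 1);
		lo = rd(ctx, 0);
		hi2 = rd(ctx, 1);
	} while (hi != hi2); /* upper half moved: lo may belong to either */

	return ((uint64_t)hi << 32) | lo;
}

/* Fake device for illustration: the counter advances on every read. */
struct fake_counter {
	uint64_t value;
	uint64_t step;
};

static uint32_t fake_read(void *ctx, int high)
{
	struct fake_counter *c = ctx;
	uint32_t v = high ? (uint32_t)(c->value >> 32) : (uint32_t)c->value;

	c->value += c->step;
	return v;
}
```

With the fake counter starting at 0xFFFFFFFF and stepping by 1 per read, the
first pass sees the upper half change from 0 to 1 and is discarded; the second
pass returns a consistent value.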

> 
>>
>>>   
>>>> +
>>>> +	ret = dwc_pcie_pmu_read_dword(pcie_info,
>>>> +				      DWC_PCIE_TIME_BASED_ANALYSIS_DATA_REG_LOW,
>>>> +				      &val);
>>>> +	if (ret) {
>>>> +		pci_err(pcie_info->pdev, "PCIe read fail\n");
>>>> +		return ret;
>>>> +	}
>>>> +
>>>> +	*counter += val;
>>>> +
>>>> +	return ret;
>>>> +}  
>>> ...
>>>
>>>> +
>>>> +	ret = perf_pmu_register(&pcie_pmu->pmu, name, -1);
>>>> +	if (ret) {
>>>> +		pci_err(pcie_info->pdev, "Error %d registering PMU @%x\n", ret,
>>>> +				 pcie_info->bdf);
>>>> +		return ret;
>>>> +	}
>>>> +
>>>> +	pcie_info->pmu_is_register = DWC_PCIE_PMU_HAS_REGISTER;  
>>>
>>> As below. I think you can drop this state info.  
>>
>> Please see my confusion below.
>>
>>>   
>>>> +
>>>> +	return ret;
>>>> +}
>>>> +
>>>> +static int dwc_pcie_pmu_remove(struct platform_device *pdev)
>>>> +{
>>>> +	struct dwc_pcie_pmu_priv *priv = platform_get_drvdata(pdev);
>>>> +	int index;
>>>> +	struct dwc_pcie_pmu *pcie_pmu;
>>>> +
>>>> +	for (index = 0; index < priv->pcie_ctrl_num; index++)
>>>> +		if (priv->pcie_table[index].pmu_is_register) {
>>>> +			pcie_pmu = &priv->pcie_table[index].pcie_pmu;
>>>> +			perf_pmu_unregister(&pcie_pmu->pmu);
>>>> +		}
>>>> +	return 0;
>>>> +}
>>>> +
>>>> +static int dwc_pcie_pmu_probe(struct platform_device *pdev)
>>>> +{
>>>> +	int ret = 0;  
>>>
>>> Initialized in all paths where it is used. Compiler should be able to tell
>>> that so I doubt you need this to be set to 0 here.  
>>
>> Agree, will leave it as uninitialized.
>>
>>>   
>>>> +	int pcie_index;
>>>> +	struct dwc_pcie_pmu_priv *priv;
>>>> +
>>>> +	priv = devm_kzalloc(&pdev->dev, sizeof(*priv), GFP_KERNEL);
>>>> +	if (!priv)
>>>> +		return -ENOMEM;
>>>> +	priv->dev = &pdev->dev;
>>>> +	platform_set_drvdata(pdev, priv);
>>>> +
>>>> +	/* If PMU is not supported on the current platform, keep silent */
>>>> +	if (dwc_pcie_pmu_discover(priv))
>>>> +		return 0;
>>>> +
>>>> +	for (pcie_index = 0; pcie_index < priv->pcie_ctrl_num; pcie_index++) {
>>>> +		struct pci_dev *rp = priv->pcie_table[pcie_index].pdev;
>>>> +
>>>> +		ret = __dwc_pcie_pmu_probe(priv, &priv->pcie_table[pcie_index]);
>>>> +		if (ret) {
>>>> +			dev_err(&rp->dev, "PCIe PMU probe fail\n");
>>>> +			goto pmu_unregister;
>>>> +		}
>>>> +	}
>>>> +	dev_info(&pdev->dev, "PCIe PMUs registered\n");  
>>>
>>> Noise in the logs.  There are lots of ways to know if we reached this point
>>> so this adds no value.  
>>
>> Got it, will drop this in the next version.
>>
>>>   
>>>> +
>>>> +	return 0;
>>>> +
>>>> +pmu_unregister:
>>>> +	dwc_pcie_pmu_remove(pdev);  
>>>
>>> I'd much rather see the unwind here directly so we can clearly see that it undoes
>>> the result of errors in this function.  That removes the need to use the
>>> is_registered flag in the remove() function simplifying that flow as well.  
>>
>> Do you mean that if perf_pmu_register fails, we jump to the pmu_unregister label directly?
>> How can we tell which PMU device failed to register?
> 
> pcie_index will be set to the index of the PMU device that failed - so loops backwards
> from that removing them.

Good idea. I will fix it in next version.
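
The unwind could look roughly like the userspace sketch below: on failure the
loop index already identifies the entry that failed, so only the entries
before it are undone, newest first. struct pmu_slot and the
pmu_register_one / pmu_unregister_one helpers are hypothetical stand-ins for
the real perf_pmu_register() / perf_pmu_unregister() calls.

```c
#include <errno.h>

/* Hypothetical stand-in for one per-root-port PMU. */
struct pmu_slot {
	int registered;
	int unregistered;
	int fail; /* set to make registration of this slot fail */
};

static int pmu_register_one(struct pmu_slot *p)
{
	if (p->fail)
		return -EIO;
	p->registered = 1;
	return 0;
}

static void pmu_unregister_one(struct pmu_slot *p)
{
	p->registered = 0;
	p->unregistered = 1;
}

/* Register all slots, or none: a failure undoes the earlier ones. */
static int pmu_register_all(struct pmu_slot *slots, int n)
{
	int i, ret;

	for (i = 0; i < n; i++) {
		ret = pmu_register_one(&slots[i]);
		if (ret)
			goto unwind;
	}

	return 0;

unwind:
	/* slots[i] failed, so undo only slots[0..i-1], newest first. */
	while (i-- > 0)
		pmu_unregister_one(&slots[i]);

	return ret;
}
```

Because the error path never touches entries that were not registered, no
per-entry "is registered" flag is needed for the unwind itself.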


>>
>>
>>>   
>>>> +};
>>>> +
>>>> +static int __init dwc_pcie_pmu_init(void)
>>>> +{
>>>> +	int ret;
>>>> +
>>>> +	ret = platform_driver_register(&dwc_pcie_pmu_driver);
>>>> +
>>>> +	if (ret)
>>>> +		return ret;
>>>> +
>>>> +	dwc_pcie_pmu_dev =
>>>> +	    platform_device_register_simple(DEV_NAME, -1, NULL, 0);  
>>>
>>> I'd normally expect to see the device created as a result of firmware
>>> description (ACPI DSDT / or Device tree)
>>> It is unusual to create a 'real' device directly in the driver
>>> init - that's normally reserved for various fake / software devices.  
>>
>> I see your concerns. You mentioned that
>>
>>    > The probing should be driven from the existing PCI driver topology.  
>>
>> Should we add a fake device in firmware or drive from PCI driver topology?
> 
> Ah. I was reviewing backwards so when I wrote this hadn't realized you walk
> the PCI topology.   PCI driver topology is the right solution here.

I see, I will use PCI driver topology instead.

> 
>>
>> Thank you.
>>
>> Best Regards,
>> Shuai
>>

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
