All of lore.kernel.org
 help / color / mirror / Atom feed
* cxl: question about cxl qos_class verification
       [not found] <CGME20240205103602epcms2p8543d4f3a4bfb684c81f07a94627c7aef@epcms2p8>
@ 2024-02-05 10:36 ` Wonjae Lee
  2024-02-05 22:31   ` Dave Jiang
       [not found]   ` <CGME20240205103602epcms2p8543d4f3a4bfb684c81f07a94627c7aef@epcms2p6>
  0 siblings, 2 replies; 5+ messages in thread
From: Wonjae Lee @ 2024-02-05 10:36 UTC (permalink / raw)
  To: linux-cxl; +Cc: dave.jiang, dan.j.williams, KyungSan Kim, Hojin Nam

Hello,

To test the CXL driver with respect to QTG IDs on a real CXL device, I
connected one CXL device to a CXL 2.0 Compliant System (v6.8-rc3).

However, during cxl endpoint probing, CDAT extraction and parsing works
fine, but cxl_qos_class_verify() for cxlmd does not run properly.

To be precise, when cxl_qos_class_verify() is executed, the below error
handling code is executed since cxlmd->endpoint is NULL:

    if (!cxl_root)
        return -ENODEV;


I'm not sure if I analyzed it correctly due to the complexity of the CXL
driver, but I think it's because the cxl_port driver execute
cxl_qos_class_verify() before cxlmd->endpoint = endpoint was executed in
the cxl_mem driver. See the dmesg log below, where I've added debugging
code.

# cxl_mem driver is adding the endpoint
[] cxl_mem mem0: call devm_cxl_add_enpoint
...
# endpoint port is probed, and cxl_qos_class_verify() runs
[] cxl_port endpoint5: call cxl_qos_class_verify
[] cxl_mem mem0: cxl_qos_class_verify: cxlmd->endpoint is NULL
[] cxl_mem mem0: cxl_qos_class_verify: cxl_root is NULL
...
# cxl_mem driver sets cxlmd->endpoint
[] cxl_mem mem0: cxl_endpoint_autoremove: cxlmd->endpoint = endpoint
...


I did an experiment to validate the hypothesis. If I call
cxl_endpoint_parse_cdat() after cxlmd->endpoint is set,
cxl_qos_class_verify() runs well without problems.

diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
index c5c9d8e0d88d..33b39c6c46fd 100644
--- a/drivers/cxl/mem.c
+++ b/drivers/cxl/mem.c
@@ -74,6 +74,8 @@ static int devm_cxl_add_endpoint(struct device *host, struct cxl_memdev *cxlmd,
        if (rc)
                return rc;

+       cxl_endpoint_parse_cdat(endpoint);
+
        if (!endpoint->dev.driver) {
                dev_err(&cxlmd->dev, "%s failed probe\n",
                        dev_name(&endpoint->dev));
diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c
index 97c21566677a..ee77aba62780 100644
--- a/drivers/cxl/port.c
+++ b/drivers/cxl/port.c
@@ -111,7 +111,6 @@ static int cxl_endpoint_port_probe(struct cxl_port *port)

        /* Cache the data early to ensure is_visible() works */
        read_cdat_data(port);
-       cxl_endpoint_parse_cdat(port);

        get_device(&cxlmd->dev);
        rc = devm_add_action_or_reset(&port->dev, schedule_detach, cxlmd);


Maybe there's something I'm missing. It would be very helpful if anyone
could comment on the above analysis.

Thanks,
Wonjae

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: cxl: question about cxl qos_class verification
  2024-02-05 10:36 ` cxl: question about cxl qos_class verification Wonjae Lee
@ 2024-02-05 22:31   ` Dave Jiang
       [not found]   ` <CGME20240205103602epcms2p8543d4f3a4bfb684c81f07a94627c7aef@epcms2p6>
  1 sibling, 0 replies; 5+ messages in thread
From: Dave Jiang @ 2024-02-05 22:31 UTC (permalink / raw)
  To: wj28.lee, linux-cxl; +Cc: dan.j.williams, KyungSan Kim, Hojin Nam



On 2/5/24 3:36 AM, Wonjae Lee wrote:
> Hello,
> 
> To test the CXL driver with respect to QTG IDs on a real CXL device, I
> connected one CXL device to a CXL 2.0 Compliant System (v6.8-rc3).
> 
> However, during cxl endpoint probing, CDAT extraction and parsing works
> fine, but cxl_qos_class_verify() for cxlmd does not run properly.
> 
> To be precise, when cxl_qos_class_verify() is executed, the below error
> handling code is executed since cxlmd->endpoint is NULL:
> 
>     if (!cxl_root)
>         return -ENODEV;
> 
> 
> I'm not sure if I analyzed it correctly due to the complexity of the CXL
> driver, but I think it's because the cxl_port driver execute
> cxl_qos_class_verify() before cxlmd->endpoint = endpoint was executed in
> the cxl_mem driver. See the dmesg log below, where I've added debugging
> code.
> 
> # cxl_mem driver is adding the endpoint
> [] cxl_mem mem0: call devm_cxl_add_enpoint
> ...
> # endpoint port is probed, and cxl_qos_class_verify() runs
> [] cxl_port endpoint5: call cxl_qos_class_verify
> [] cxl_mem mem0: cxl_qos_class_verify: cxlmd->endpoint is NULL
> [] cxl_mem mem0: cxl_qos_class_verify: cxl_root is NULL
> ...
> # cxl_mem driver sets cxlmd->endpoint
> [] cxl_mem mem0: cxl_endpoint_autoremove: cxlmd->endpoint = endpoint
> ...
> 
> 
> I did an experiment to validate the hypothesis. If I call
> cxl_endpoint_parse_cdat() after cxlmd->endpoint is set,
> cxl_qos_class_verify() runs well without problems.
> 
> diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
> index c5c9d8e0d88d..33b39c6c46fd 100644
> --- a/drivers/cxl/mem.c
> +++ b/drivers/cxl/mem.c
> @@ -74,6 +74,8 @@ static int devm_cxl_add_endpoint(struct device *host, struct cxl_memdev *cxlmd,
>         if (rc)
>                 return rc;
> 
> +       cxl_endpoint_parse_cdat(endpoint);
> +
>         if (!endpoint->dev.driver) {
>                 dev_err(&cxlmd->dev, "%s failed probe\n",
>                         dev_name(&endpoint->dev));
> diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c
> index 97c21566677a..ee77aba62780 100644
> --- a/drivers/cxl/port.c
> +++ b/drivers/cxl/port.c
> @@ -111,7 +111,6 @@ static int cxl_endpoint_port_probe(struct cxl_port *port)
> 
>         /* Cache the data early to ensure is_visible() works */
>         read_cdat_data(port);
> -       cxl_endpoint_parse_cdat(port);
> 
>         get_device(&cxlmd->dev);
>         rc = devm_add_action_or_reset(&port->dev, schedule_detach, cxlmd);
> 
> 
> Maybe there's something I'm missing. It would be very helpful if anyone
> could comment on the above analysis.

I think this should fix the issue you are seeing?

https://lore.kernel.org/linux-cxl/b243e80f-1b24-4756-8bb3-8389d66ea13a@intel.com/T/#mcbce77b6584bd1031d6c1928fcb36fe67be66039


> 
> Thanks,
> Wonjae

^ permalink raw reply	[flat|nested] 5+ messages in thread

* RE:(2) cxl: question about cxl qos_class verification
       [not found]   ` <CGME20240205103602epcms2p8543d4f3a4bfb684c81f07a94627c7aef@epcms2p6>
@ 2024-02-06  1:23     ` Wonjae Lee
  2024-02-06 15:24       ` (2) " Dave Jiang
       [not found]       ` <CGME20240205103602epcms2p8543d4f3a4bfb684c81f07a94627c7aef@epcms2p7>
  0 siblings, 2 replies; 5+ messages in thread
From: Wonjae Lee @ 2024-02-06  1:23 UTC (permalink / raw)
  To: Dave Jiang, linux-cxl; +Cc: dan.j.williams, KyungSan Kim, Hojin Nam

On Mon, Feb 05, 2024 at 03:31:35PM -0700, Dave Jiang wrote:
>
>
> On 2/5/24 3:36 AM, Wonjae Lee wrote:
> > Hello,
> >
> > To test the CXL driver with respect to QTG IDs on a real CXL device, I
> > connected one CXL device to a CXL 2.0 Compliant System (v6.8-rc3).
> >
> > However, during cxl endpoint probing, CDAT extraction and parsing works
> > fine, but cxl_qos_class_verify() for cxlmd does not run properly.
> >
> > To be precise, when cxl_qos_class_verify() is executed, the below error
> > handling code is executed since cxlmd->endpoint is NULL:
> >
> >     if (!cxl_root)
> >         return -ENODEV;
> >
> >
> > I'm not sure if I analyzed it correctly due to the complexity of the CXL
> > driver, but I think it's because the cxl_port driver execute
> > cxl_qos_class_verify() before cxlmd->endpoint = endpoint was executed in
> > the cxl_mem driver. See the dmesg log below, where I've added debugging
> > code.
> >
> > # cxl_mem driver is adding the endpoint
> > [] cxl_mem mem0: call devm_cxl_add_enpoint
> > ...
> > # endpoint port is probed, and cxl_qos_class_verify() runs
> > [] cxl_port endpoint5: call cxl_qos_class_verify
> > [] cxl_mem mem0: cxl_qos_class_verify: cxlmd->endpoint is NULL
> > [] cxl_mem mem0: cxl_qos_class_verify: cxl_root is NULL
> > ...
> > # cxl_mem driver sets cxlmd->endpoint
> > [] cxl_mem mem0: cxl_endpoint_autoremove: cxlmd->endpoint = endpoint
> > ...
> >
> >
> > I did an experiment to validate the hypothesis. If I call
> > cxl_endpoint_parse_cdat() after cxlmd->endpoint is set,
> > cxl_qos_class_verify() runs well without problems.
> >
> > diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
> > index c5c9d8e0d88d..33b39c6c46fd 100644
> > --- a/drivers/cxl/mem.c
> > +++ b/drivers/cxl/mem.c
> > @@ -74,6 +74,8 @@ static int devm_cxl_add_endpoint(struct device *host, struct cxl_memdev *cxlmd,
> >         if (rc)
> >                 return rc;
> >
> > +       cxl_endpoint_parse_cdat(endpoint);
> > +
> >         if (!endpoint->dev.driver) {
> >                 dev_err(&cxlmd->dev, "%s failed probe\n",
> >                         dev_name(&endpoint->dev));
> > diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c
> > index 97c21566677a..ee77aba62780 100644
> > --- a/drivers/cxl/port.c
> > +++ b/drivers/cxl/port.c
> > @@ -111,7 +111,6 @@ static int cxl_endpoint_port_probe(struct cxl_port *port)
> >
> >         /* Cache the data early to ensure is_visible() works */
> >         read_cdat_data(port);
> > -       cxl_endpoint_parse_cdat(port);
> >
> >         get_device(&cxlmd->dev);
> >         rc = devm_add_action_or_reset(&port->dev, schedule_detach, cxlmd);
> >
> >
> > Maybe there's something I'm missing. It would be very helpful if anyone
> > could comment on the above analysis.
>
> I think this should fix the issue you are seeing?
>
> https://lore.kernel.org/linux-cxl/b243e80f-1b24-4756-8bb3-8389d66ea13a@intel.com/T/#mcbce77b6584bd1031d6c1928fcb36fe67be66039
>

Oh, you've already fixed it. I found that on the same testbed the commit
you mentioned resolves the issue.

Thanks for your response!

>
> >
> > Thanks,
> > Wonjae
>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: (2) cxl: question about cxl qos_class verification
  2024-02-06  1:23     ` Wonjae Lee
@ 2024-02-06 15:24       ` Dave Jiang
       [not found]       ` <CGME20240205103602epcms2p8543d4f3a4bfb684c81f07a94627c7aef@epcms2p7>
  1 sibling, 0 replies; 5+ messages in thread
From: Dave Jiang @ 2024-02-06 15:24 UTC (permalink / raw)
  To: wj28.lee, linux-cxl; +Cc: dan.j.williams, KyungSan Kim, Hojin Nam



On 2/5/24 6:23 PM, Wonjae Lee wrote:
> On Mon, Feb 05, 2024 at 03:31:35PM -0700, Dave Jiang wrote:
>>
>>
>> On 2/5/24 3:36 AM, Wonjae Lee wrote:
>>> Hello,
>>>
>>> To test the CXL driver with respect to QTG IDs on a real CXL device, I
>>> connected one CXL device to a CXL 2.0 Compliant System (v6.8-rc3).
>>>
>>> However, during cxl endpoint probing, CDAT extraction and parsing works
>>> fine, but cxl_qos_class_verify() for cxlmd does not run properly.
>>>
>>> To be precise, when cxl_qos_class_verify() is executed, the below error
>>> handling code is executed since cxlmd->endpoint is NULL:
>>>
>>>     if (!cxl_root)
>>>         return -ENODEV;
>>>
>>>
>>> I'm not sure if I analyzed it correctly due to the complexity of the CXL
>>> driver, but I think it's because the cxl_port driver execute
>>> cxl_qos_class_verify() before cxlmd->endpoint = endpoint was executed in
>>> the cxl_mem driver. See the dmesg log below, where I've added debugging
>>> code.
>>>
>>> # cxl_mem driver is adding the endpoint
>>> [] cxl_mem mem0: call devm_cxl_add_enpoint
>>> ...
>>> # endpoint port is probed, and cxl_qos_class_verify() runs
>>> [] cxl_port endpoint5: call cxl_qos_class_verify
>>> [] cxl_mem mem0: cxl_qos_class_verify: cxlmd->endpoint is NULL
>>> [] cxl_mem mem0: cxl_qos_class_verify: cxl_root is NULL
>>> ...
>>> # cxl_mem driver sets cxlmd->endpoint
>>> [] cxl_mem mem0: cxl_endpoint_autoremove: cxlmd->endpoint = endpoint
>>> ...
>>>
>>>
>>> I did an experiment to validate the hypothesis. If I call
>>> cxl_endpoint_parse_cdat() after cxlmd->endpoint is set,
>>> cxl_qos_class_verify() runs well without problems.
>>>
>>> diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
>>> index c5c9d8e0d88d..33b39c6c46fd 100644
>>> --- a/drivers/cxl/mem.c
>>> +++ b/drivers/cxl/mem.c
>>> @@ -74,6 +74,8 @@ static int devm_cxl_add_endpoint(struct device *host, struct cxl_memdev *cxlmd,
>>>         if (rc)
>>>                 return rc;
>>>
>>> +       cxl_endpoint_parse_cdat(endpoint);
>>> +
>>>         if (!endpoint->dev.driver) {
>>>                 dev_err(&cxlmd->dev, "%s failed probe\n",
>>>                         dev_name(&endpoint->dev));
>>> diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c
>>> index 97c21566677a..ee77aba62780 100644
>>> --- a/drivers/cxl/port.c
>>> +++ b/drivers/cxl/port.c
>>> @@ -111,7 +111,6 @@ static int cxl_endpoint_port_probe(struct cxl_port *port)
>>>
>>>         /* Cache the data early to ensure is_visible() works */
>>>         read_cdat_data(port);
>>> -       cxl_endpoint_parse_cdat(port);
>>>
>>>         get_device(&cxlmd->dev);
>>>         rc = devm_add_action_or_reset(&port->dev, schedule_detach, cxlmd);
>>>
>>>
>>> Maybe there's something I'm missing. It would be very helpful if anyone
>>> could comment on the above analysis.
>>
>> I think this should fix the issue you are seeing?
>>
>> https://lore.kernel.org/linux-cxl/b243e80f-1b24-4756-8bb3-8389d66ea13a@intel.com/T/#mcbce77b6584bd1031d6c1928fcb36fe67be66039
>>
> 
> Oh, you've already fixed it. I found that on the same testbed the commit
> you mentioned resolves the issue.
> 
> Thanks for your response!

Ok if I add your Tested-by tag to that patch?
 
> 
>>
>>>
>>> Thanks,
>>> Wonjae
>>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* RE:(2) (2) cxl: question about cxl qos_class verification
       [not found]       ` <CGME20240205103602epcms2p8543d4f3a4bfb684c81f07a94627c7aef@epcms2p7>
@ 2024-02-07  0:19         ` Wonjae Lee
  0 siblings, 0 replies; 5+ messages in thread
From: Wonjae Lee @ 2024-02-07  0:19 UTC (permalink / raw)
  To: Dave Jiang, linux-cxl; +Cc: dan.j.williams, KyungSan Kim, Hojin Nam

On Tue, Feb 06, 2024 at 08:24:04AM -0700, Dave Jiang wrote:
>
>
> On 2/5/24 6:23 PM, Wonjae Lee wrote:
> > On Mon, Feb 05, 2024 at 03:31:35PM -0700, Dave Jiang wrote:
> >>
> >>
> >> On 2/5/24 3:36 AM, Wonjae Lee wrote:
> >>> Hello,
> >>>
> >>> To test the CXL driver with respect to QTG IDs on a real CXL device, I
> >>> connected one CXL device to a CXL 2.0 Compliant System (v6.8-rc3).
> >>>
> >>> However, during cxl endpoint probing, CDAT extraction and parsing works
> >>> fine, but cxl_qos_class_verify() for cxlmd does not run properly.
> >>>
> >>> To be precise, when cxl_qos_class_verify() is executed, the below error
> >>> handling code is executed since cxlmd->endpoint is NULL:
> >>>
> >>>     if (!cxl_root)
> >>>         return -ENODEV;
> >>>
> >>>
> >>> I'm not sure if I analyzed it correctly due to the complexity of the CXL
> >>> driver, but I think it's because the cxl_port driver execute
> >>> cxl_qos_class_verify() before cxlmd->endpoint = endpoint was executed in
> >>> the cxl_mem driver. See the dmesg log below, where I've added debugging
> >>> code.
> >>>
> >>> # cxl_mem driver is adding the endpoint
> >>> [] cxl_mem mem0: call devm_cxl_add_enpoint
> >>> ...
> >>> # endpoint port is probed, and cxl_qos_class_verify() runs
> >>> [] cxl_port endpoint5: call cxl_qos_class_verify
> >>> [] cxl_mem mem0: cxl_qos_class_verify: cxlmd->endpoint is NULL
> >>> [] cxl_mem mem0: cxl_qos_class_verify: cxl_root is NULL
> >>> ...
> >>> # cxl_mem driver sets cxlmd->endpoint
> >>> [] cxl_mem mem0: cxl_endpoint_autoremove: cxlmd->endpoint = endpoint
> >>> ...
> >>>
> >>>
> >>> I did an experiment to validate the hypothesis. If I call
> >>> cxl_endpoint_parse_cdat() after cxlmd->endpoint is set,
> >>> cxl_qos_class_verify() runs well without problems.
> >>>
> >>> diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
> >>> index c5c9d8e0d88d..33b39c6c46fd 100644
> >>> --- a/drivers/cxl/mem.c
> >>> +++ b/drivers/cxl/mem.c
> >>> @@ -74,6 +74,8 @@ static int devm_cxl_add_endpoint(struct device *host, struct cxl_memdev *cxlmd,
> >>>         if (rc)
> >>>                 return rc;
> >>>
> >>> +       cxl_endpoint_parse_cdat(endpoint);
> >>> +
> >>>         if (!endpoint->dev.driver) {
> >>>                 dev_err(&cxlmd->dev, "%s failed probe\n",
> >>>                         dev_name(&endpoint->dev));
> >>> diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c
> >>> index 97c21566677a..ee77aba62780 100644
> >>> --- a/drivers/cxl/port.c
> >>> +++ b/drivers/cxl/port.c
> >>> @@ -111,7 +111,6 @@ static int cxl_endpoint_port_probe(struct cxl_port *port)
> >>>
> >>>         /* Cache the data early to ensure is_visible() works */
> >>>         read_cdat_data(port);
> >>> -       cxl_endpoint_parse_cdat(port);
> >>>
> >>>         get_device(&cxlmd->dev);
> >>>         rc = devm_add_action_or_reset(&port->dev, schedule_detach, cxlmd);
> >>>
> >>>
> >>> Maybe there's something I'm missing. It would be very helpful if anyone
> >>> could comment on the above analysis.
> >>
> >> I think this should fix the issue you are seeing?
> >>
> >> https://lore.kernel.org/linux-cxl/b243e80f-1b24-4756-8bb3-8389d66ea13a@intel.com/T/#mcbce77b6584bd1031d6c1928fcb36fe67be66039
> >>
> >
> > Oh, you've already fixed it. I found that on the same testbed the commit
> > you mentioned resolves the issue.
> >
> > Thanks for your response!
>
> Ok if I add your Tested-by tag to that patch?

Oh, sure. Feel free to add it:

Tested-by: Wonjae Lee <wj28.lee@samsung.com>

Thanks,
Wonjae

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2024-02-07  0:19 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <CGME20240205103602epcms2p8543d4f3a4bfb684c81f07a94627c7aef@epcms2p8>
2024-02-05 10:36 ` cxl: question about cxl qos_class verification Wonjae Lee
2024-02-05 22:31   ` Dave Jiang
     [not found]   ` <CGME20240205103602epcms2p8543d4f3a4bfb684c81f07a94627c7aef@epcms2p6>
2024-02-06  1:23     ` Wonjae Lee
2024-02-06 15:24       ` (2) " Dave Jiang
     [not found]       ` <CGME20240205103602epcms2p8543d4f3a4bfb684c81f07a94627c7aef@epcms2p7>
2024-02-07  0:19         ` Wonjae Lee

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.