Linux-ACPI Archive on lore.kernel.org
 help / color / Atom feed
* MPAM branch verification (was RE: [RFC PATCH 2/2] ACPI / PPTT: cacheinfo: Label caches based on fw_token)
@ 2019-06-21 15:57 Shameerali Kolothum Thodi
  2019-06-24 17:35 ` James Morse
  0 siblings, 1 reply; 5+ messages in thread
From: Shameerali Kolothum Thodi @ 2019-06-21 15:57 UTC (permalink / raw)
  To: James Morse
  Cc: linux-acpi, Vijaya Kumar K, Lorenzo Pieralisi, Jeffrey Hugo,
	Sudeep Holla, Jeremy Linton, Tomasz Nowicki, Richard Ruigrok,
	Guohanjun (Hanjun Guo), wangxiongfeng (C),
	linux-arm-kernel, Linuxarm

Hi James,

> -----Original Message-----
> From: James Morse [mailto:james.morse@arm.com]
> Sent: 19 June 2019 14:31
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
> Cc: linux-acpi@vger.kernel.org; Vijaya Kumar K <vkilari@codeaurora.org>;
> Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>; Jeffrey Hugo
> <jhugo@codeaurora.org>; Sudeep Holla <sudeep.holla@arm.com>; Jeremy
> Linton <jeremy.linton@arm.com>; Tomasz Nowicki
> <Tomasz.Nowicki@cavium.com>; Richard Ruigrok
> <rruigrok@qti.qualcomm.com>; Guohanjun (Hanjun Guo)
> <guohanjun@huawei.com>; wangxiongfeng (C)
> <wangxiongfeng2@huawei.com>; linux-arm-kernel@lists.infradead.org;
> Linuxarm <linuxarm@huawei.com>
> Subject: Re: [RFC PATCH 2/2] ACPI / PPTT: cacheinfo: Label caches based on
> fw_token
> 
> Hi Shameer,
> 
[...] 
> > I was just trying out the latest public MPAM branch available here[1]
> 
> Great!
> 
> 
> > and noted that
> > on our HiSilicon platform all the L3 cache were labeled with the same Id.
> Debugging> revealed that the above leaf node check was removed in this
> branch[2] which makes
> > the min_physid calculation going wrong.
> 
> Thanks for debugging this,
> 
> > Just wondering is there any particular reason
> > for removing the check or the branch is not carrying the latest patch?
> 
> Nope, that's a bug.
> 
> Jeremy Linton's review feedback[0] was that that PROCESSOR_ID_VALID flag
> can't be relied
> on. It looks like I over-zealously removed the whole if(), and this doesn't cause a
> problem with my pptt so I didn't notice.
> 
> I've fixed it locally, I've also pushed a fix to those branches, but it will get folded
> in
> next time I push a branch.

Thanks for that.

Apart from the above, I have come across few other issues as well and had some
temporary fixes to the branch here[0]. This is encountered while trying to get the
resctrl fs mounted and attempted a cqm test run using resctrl_tests tool. 

The fixes may not be proper ones, but I think it will give you an idea. Please take a
look and let me know your thoughts.

Thanks,
Shameer

[0] https://github.com/hisilicon/kernel-dev.git  branch: private-dbg-mpam-5.2-rc1

> 
> Thanks!
> 
> James
> 
> [0] lore.kernel.org/r/a68abfd2-1e28-d9e7-919a-8b3133db4d20@arm.com

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: MPAM branch verification (was RE: [RFC PATCH 2/2] ACPI / PPTT: cacheinfo: Label caches based on fw_token)
  2019-06-21 15:57 MPAM branch verification (was RE: [RFC PATCH 2/2] ACPI / PPTT: cacheinfo: Label caches based on fw_token) Shameerali Kolothum Thodi
@ 2019-06-24 17:35 ` James Morse
  2019-07-03 12:27   ` Shameerali Kolothum Thodi
  0 siblings, 1 reply; 5+ messages in thread
From: James Morse @ 2019-06-24 17:35 UTC (permalink / raw)
  To: Shameerali Kolothum Thodi
  Cc: linux-acpi, Vijaya Kumar K, Lorenzo Pieralisi, Jeffrey Hugo,
	Sudeep Holla, Jeremy Linton, Tomasz Nowicki, Richard Ruigrok,
	Guohanjun (Hanjun Guo), wangxiongfeng (C),
	linux-arm-kernel, Linuxarm

Hi Shameer,

On 21/06/2019 16:57, Shameerali Kolothum Thodi wrote:
>> -----Original Message-----
>> From: James Morse [mailto:james.morse@arm.com]

>> Subject: Re: [RFC PATCH 2/2] ACPI / PPTT: cacheinfo: Label caches based on
>> fw_token

>>> and noted that
>>> on our HiSilicon platform all the L3 cache were labeled with the same Id.
>> Debugging> revealed that the above leaf node check was removed in this
>> branch[2] which makes
>>> the min_physid calculation going wrong.

>>> Just wondering is there any particular reason
>>> for removing the check or the branch is not carrying the latest patch?
>>
>> Nope, that's a bug.
>>
>> Jeremy Linton's review feedback[0] was that that PROCESSOR_ID_VALID flag
>> can't be relied
>> on. It looks like I over-zealously removed the whole if(), and this doesn't cause a
>> problem with my pptt so I didn't notice.
>>
>> I've fixed it locally, I've also pushed a fix to those branches, but it will get folded
>> in
>> next time I push a branch.
> 
> Thanks for that.
> 
> Apart from the above, I have come across few other issues as well and had some
> temporary fixes to the branch here[0]. This is encountered while trying to get the
> resctrl fs mounted and attempted a cqm test run using resctrl_tests tool. 

Thanks! I haven't run that on the model yet as I want it to get the monitors working first.

If you are seeing bugs in that monitor code, beware you're the only person to ever run it!


> The fixes may not be proper ones, but I think it will give you an idea. Please take a
> look and let me know your thoughts.

{,!} exposed_mon_capable, yup that's a typo.

the evt_list being uninitialised is because that code has never run, as noted in the
KNOWN_ISSUES, (The model doesn't expose have any of the mpam counters...)

Issues around component->resource mapping will disappear as I've re-written all that to
fix issues around picking the same resource twice.


The domid bitfield not being big enough for the width of the cacheinfo id field looks like
a bug in the existing resctrl code. Could you spin that as a patch against mainline?

It won't affect any x86 system, but I don't want to 'fix' anything as part of the mpam
support.

We almost certainly need to compress the cache-id numbers down to {0,1,2} if only so we
haven't filled all the exposed bits on day-1. (so it might not matter for arm64 either...)


Thanks,

James

^ permalink raw reply	[flat|nested] 5+ messages in thread

* RE: MPAM branch verification (was RE: [RFC PATCH 2/2] ACPI / PPTT: cacheinfo: Label caches based on fw_token)
  2019-06-24 17:35 ` James Morse
@ 2019-07-03 12:27   ` Shameerali Kolothum Thodi
  2019-07-19 15:29     ` James Morse
  0 siblings, 1 reply; 5+ messages in thread
From: Shameerali Kolothum Thodi @ 2019-07-03 12:27 UTC (permalink / raw)
  To: James Morse
  Cc: Vijaya Kumar K, Lorenzo Pieralisi, Tomasz Nowicki, Jeffrey Hugo,
	Guohanjun (Hanjun Guo),
	Linuxarm, Jeremy Linton, linux-acpi, linux-arm-kernel,
	Sudeep Holla, wangxiongfeng (C),
	Richard Ruigrok

Hi James,

> -----Original Message-----
> From: linux-arm-kernel [mailto:linux-arm-kernel-bounces@lists.infradead.org]
> On Behalf Of James Morse
> Sent: 24 June 2019 18:36
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
> Cc: Vijaya Kumar K <vkilari@codeaurora.org>; Lorenzo Pieralisi
> <lorenzo.pieralisi@arm.com>; Tomasz Nowicki
> <Tomasz.Nowicki@cavium.com>; Jeffrey Hugo <jhugo@codeaurora.org>;
> Guohanjun (Hanjun Guo) <guohanjun@huawei.com>; Linuxarm
> <linuxarm@huawei.com>; Jeremy Linton <jeremy.linton@arm.com>;
> linux-acpi@vger.kernel.org; linux-arm-kernel@lists.infradead.org; Sudeep Holla
> <sudeep.holla@arm.com>; wangxiongfeng (C)
> <wangxiongfeng2@huawei.com>; Richard Ruigrok
> <rruigrok@qti.qualcomm.com>
> Subject: Re: MPAM branch verification (was RE: [RFC PATCH 2/2] ACPI / PPTT:
> cacheinfo: Label caches based on fw_token)
> 
> Hi Shameer,
> 
> On 21/06/2019 16:57, Shameerali Kolothum Thodi wrote:
> >> -----Original Message-----
> >> From: James Morse [mailto:james.morse@arm.com]
> 
> >> Subject: Re: [RFC PATCH 2/2] ACPI / PPTT: cacheinfo: Label caches based on
> >> fw_token
> 
> >>> and noted that
> >>> on our HiSilicon platform all the L3 cache were labeled with the same Id.
> >> Debugging> revealed that the above leaf node check was removed in this
> >> branch[2] which makes
> >>> the min_physid calculation going wrong.
> 
> >>> Just wondering is there any particular reason
> >>> for removing the check or the branch is not carrying the latest patch?
> >>
> >> Nope, that's a bug.
> >>
> >> Jeremy Linton's review feedback[0] was that that PROCESSOR_ID_VALID
> flag
> >> can't be relied
> >> on. It looks like I over-zealously removed the whole if(), and this doesn't
> cause a
> >> problem with my pptt so I didn't notice.
> >>
> >> I've fixed it locally, I've also pushed a fix to those branches, but it will get
> folded
> >> in
> >> next time I push a branch.
> >
> > Thanks for that.
> >
> > Apart from the above, I have come across few other issues as well and had
> some
> > temporary fixes to the branch here[0]. This is encountered while trying to get
> the
> > resctrl fs mounted and attempted a cqm test run using resctrl_tests tool.
> 
> Thanks! I haven't run that on the model yet as I want it to get the monitors
> working first.
> 
> If you are seeing bugs in that monitor code, beware you're the only person to
> ever run it!

Ya..I guessed so :).

> 
> > The fixes may not be proper ones, but I think it will give you an idea. Please
> take a
> > look and let me know your thoughts.
> 
> {,!} exposed_mon_capable, yup that's a typo.

Ok.

> the evt_list being uninitialised is because that code has never run, as noted in
> the
> KNOWN_ISSUES, (The model doesn't expose have any of the mpam counters...)
> 
> Issues around component->resource mapping will disappear as I've re-written
> all that to
> fix issues around picking the same resource twice.

Ok. Good to know that.

> 
> The domid bitfield not being big enough for the width of the cacheinfo id field
> looks like
> a bug in the existing resctrl code. Could you spin that as a patch against
> mainline?

Yes it could be a bug. But I am not sure about the assumption on x86 platforms with
respect to cache id width. Also any need to consider 32 bit systems at all or not.

> It won't affect any x86 system, but I don't want to 'fix' anything as part of the
> mpam
> support.

Does that mean the cache id width on x86 will never be >14 bits?

> We almost certainly need to compress the cache-id numbers down to {0,1,2} if
> only so we
> haven't filled all the exposed bits on day-1. (so it might not matter for arm64
> either...)

That will be nice if we can compress it like that. I think we can leave the fix for now
and come up with a solution when things gets really going.

Mean time I am trying to probe memory controller as well on our system and it looks
like there are still issues. I will debug and update if it really is a problem. Please
let me know if you have any plans to update the branch so that I can try the latest.

Cheers,
Shameer
 
> 
> Thanks,
> 
> James
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: MPAM branch verification (was RE: [RFC PATCH 2/2] ACPI / PPTT: cacheinfo: Label caches based on fw_token)
  2019-07-03 12:27   ` Shameerali Kolothum Thodi
@ 2019-07-19 15:29     ` James Morse
  2019-08-15 10:38       ` Shameerali Kolothum Thodi
  0 siblings, 1 reply; 5+ messages in thread
From: James Morse @ 2019-07-19 15:29 UTC (permalink / raw)
  To: Shameerali Kolothum Thodi
  Cc: Vijaya Kumar K, Lorenzo Pieralisi, Tomasz Nowicki, Jeffrey Hugo,
	Guohanjun (Hanjun Guo),
	Linuxarm, Jeremy Linton, linux-acpi, linux-arm-kernel,
	Sudeep Holla, wangxiongfeng (C),
	Richard Ruigrok

Hi Shameer,

On 03/07/2019 13:27, Shameerali Kolothum Thodi wrote:
>> -----Original Message-----
>> On 21/06/2019 16:57, Shameerali Kolothum Thodi wrote:
>>>> -----Original Message-----
>>>> From: James Morse [mailto:james.morse@arm.com]

>> The domid bitfield not being big enough for the width of the cacheinfo id field
>> looks like
>> a bug in the existing resctrl code. Could you spin that as a patch against
>> mainline?
> 
> Yes it could be a bug. But I am not sure about the assumption on x86 platforms with
> respect to cache id width. Also any need to consider 32 bit systems at all or not.
> 
>> It won't affect any x86 system, but I don't want to 'fix' anything as part of the
>> mpam
>> support.
> 
> Does that mean the cache id width on x86 will never be >14 bits?

I have no idea. Today they're 0,1,2, so its unlikely?, but Documentation/x86/resctrl.rst's
"Cache IDs" section says "it isn't guaranteed to be a contiguous sequence", so maybe?

The problem is 'struct cacheinfo's id field is an int, its exposed via sysfs as an int,
but resctrl packs it into a smaller size. That's going to bite one day, it would be good
to fix it now we know its a problem.


>> We almost certainly need to compress the cache-id numbers down to {0,1,2} if
>> only so we
>> haven't filled all the exposed bits on day-1. (so it might not matter for arm64
>> either...)
> 
> That will be nice if we can compress it like that> I think we can leave the fix for now
> and come up with a solution when things gets really going.
> 
> Mean time I am trying to probe memory controller as well on our system and it looks
> like there are still issues.

Typo in the MBA picking code? Should be:
| if (!mpam_has_feature(mpam_feat_mbw_part, class->features) &&
|     !mpam_has_feature(mpam_feat_mbw_max, class->features)) {

It can do something useful with either of those features, but the (!part || !max)
previously forced it to have both.

(This still doesn't work on the model as its describing a 0-bit bitmap MBW_PART)


> I will debug and update if it really is a problem. Please
> let me know if you have any plans to update the branch so that I can try the latest.

I hope to push a new version by the end of June. (whoosh! There goes June).
http://www.linux-arm.org/git?p=linux-jm.git;a=shortlog;h=refs/heads/mpam/snapshot/jun

The changes in there are to avoid the known-issues when the same 'thing' is picked as both
L3 resource and the MBA resource.

I think the risk of sleeping-while-atomic if not all mpam:devices are accessible from all
CPUs in the resctrl:domain is my next highest priority issue...


Thanks,

James

^ permalink raw reply	[flat|nested] 5+ messages in thread

* RE: MPAM branch verification (was RE: [RFC PATCH 2/2] ACPI / PPTT: cacheinfo: Label caches based on fw_token)
  2019-07-19 15:29     ` James Morse
@ 2019-08-15 10:38       ` Shameerali Kolothum Thodi
  0 siblings, 0 replies; 5+ messages in thread
From: Shameerali Kolothum Thodi @ 2019-08-15 10:38 UTC (permalink / raw)
  To: James Morse
  Cc: Vijaya Kumar K, Lorenzo Pieralisi, Tomasz Nowicki, Jeffrey Hugo,
	Guohanjun (Hanjun Guo),
	Linuxarm, Jeremy Linton, linux-acpi, linux-arm-kernel,
	Sudeep Holla, wangxiongfeng (C),
	Richard Ruigrok, Wangshaobo (bobo)

Hi James,

Sorry for the delay. It took a while to get back into this.

> -----Original Message-----
> From: James Morse [mailto:james.morse@arm.com]
> Sent: 19 July 2019 16:30
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
> Cc: Vijaya Kumar K <vkilari@codeaurora.org>; Lorenzo Pieralisi
> <lorenzo.pieralisi@arm.com>; Tomasz Nowicki
> <Tomasz.Nowicki@cavium.com>; Jeffrey Hugo <jhugo@codeaurora.org>;
> Guohanjun (Hanjun Guo) <guohanjun@huawei.com>; Linuxarm
> <linuxarm@huawei.com>; Jeremy Linton <jeremy.linton@arm.com>;
> linux-acpi@vger.kernel.org; linux-arm-kernel@lists.infradead.org; Sudeep Holla
> <sudeep.holla@arm.com>; wangxiongfeng (C)
> <wangxiongfeng2@huawei.com>; Richard Ruigrok
> <rruigrok@qti.qualcomm.com>
> Subject: Re: MPAM branch verification (was RE: [RFC PATCH 2/2] ACPI / PPTT:
> cacheinfo: Label caches based on fw_token)
> 
> Hi Shameer,
> 
> On 03/07/2019 13:27, Shameerali Kolothum Thodi wrote:
> >> -----Original Message-----
> >> On 21/06/2019 16:57, Shameerali Kolothum Thodi wrote:
> >>>> -----Original Message-----
> >>>> From: James Morse [mailto:james.morse@arm.com]
> 
> >> The domid bitfield not being big enough for the width of the cacheinfo id field
> >> looks like
> >> a bug in the existing resctrl code. Could you spin that as a patch against
> >> mainline?
> >
> > Yes it could be a bug. But I am not sure about the assumption on x86
> platforms with
> > respect to cache id width. Also any need to consider 32 bit systems at all or
> not.
> >
> >> It won't affect any x86 system, but I don't want to 'fix' anything as part of
> the
> >> mpam
> >> support.
> >
> > Does that mean the cache id width on x86 will never be >14 bits?
> 
> I have no idea. Today they're 0,1,2, so its unlikely?, but
> Documentation/x86/resctrl.rst's
> "Cache IDs" section says "it isn't guaranteed to be a contiguous sequence", so
> maybe?
> 
> The problem is 'struct cacheinfo's id field is an int, its exposed via sysfs as an
> int,
> but resctrl packs it into a smaller size. That's going to bite one day, it would be
> good
> to fix it now we know its a problem.
> 
> 
> >> We almost certainly need to compress the cache-id numbers down to {0,1,2}
> if
> >> only so we
> >> haven't filled all the exposed bits on day-1. (so it might not matter for arm64
> >> either...)
> >
> > That will be nice if we can compress it like that> I think we can leave the fix
> for now
> > and come up with a solution when things gets really going.
> >
> > Mean time I am trying to probe memory controller as well on our system and
> it looks
> > like there are still issues.
> 
> Typo in the MBA picking code? Should be:
> | if (!mpam_has_feature(mpam_feat_mbw_part, class->features) &&
> |     !mpam_has_feature(mpam_feat_mbw_max, class->features)) {
> 
> It can do something useful with either of those features, but the (!part || !max)
> previously forced it to have both.
> 
> (This still doesn't work on the model as its describing a 0-bit bitmap
> MBW_PART)

I think what happens on our hardware is, the MBA reports PMG_MAX = 0 and that
upsets mpam_pmg_bits() -->ilog2(). I am not entirely sure whether PMG_MAX= 0 is
allowed as per spec when the resource reports HAS_MSMON =1. But hasn't found
anything in spec that forbids this as the filter is a combination of PRATID:PMG.

I have a temp hack here to keep it going,

https://github.com/hisilicon/kernel-dev/commit/5e0881c4cdded4066dfac7603c53242385417a3a
 
> 
> > I will debug and update if it really is a problem. Please
> > let me know if you have any plans to update the branch so that I can try the
> latest.
> 
> I hope to push a new version by the end of June. (whoosh! There goes June).
> http://www.linux-arm.org/git?p=linux-jm.git;a=shortlog;h=refs/heads/mpam/s
> napshot/jun

Thanks for that. I am using this now. (And I see a more recent one mpam/5.3-tmp
now. Has anything changed other than rebase?)

>
> The changes in there are to avoid the known-issues when the same 'thing' is
> picked as both
> L3 resource and the MBA resource.

Now with the above fix for PMG_MAX=0, I am hitting another issue.
mount -t resctrl resctrl /sys/fs/resctrl fails with "File exists" error.

Debugging points to,
rdt_get_tree() 
  mkdir_mondata_all()
    mkdir_mondata_subdir_alldom()
      mkdir_mondata_subdir() 
        mon_addfile()

It looks like r->evt_list gets corrupted somehow and has duplicate entries. I haven’t
gone into the bottom of this issue, but please let me know if you have any idea.

Cheers,
Shameer

> I think the risk of sleeping-while-atomic if not all mpam:devices are accessible
> from all
> CPUs in the resctrl:domain is my next highest priority issue...
> 
> 
> Thanks,
> 
> James

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, back to index

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-06-21 15:57 MPAM branch verification (was RE: [RFC PATCH 2/2] ACPI / PPTT: cacheinfo: Label caches based on fw_token) Shameerali Kolothum Thodi
2019-06-24 17:35 ` James Morse
2019-07-03 12:27   ` Shameerali Kolothum Thodi
2019-07-19 15:29     ` James Morse
2019-08-15 10:38       ` Shameerali Kolothum Thodi

Linux-ACPI Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-acpi/0 linux-acpi/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-acpi linux-acpi/ https://lore.kernel.org/linux-acpi \
		linux-acpi@vger.kernel.org linux-acpi@archiver.kernel.org
	public-inbox-index linux-acpi


Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-acpi


AGPL code for this site: git clone https://public-inbox.org/ public-inbox