* [lustre-devel] Lustre Arm stuff status and work plan
@ 2021-12-16  8:23 Xinliang Liu via lustre-devel
  2021-12-20  2:30 ` Xinliang Liu via lustre-devel
  0 siblings, 1 reply; 16+ messages in thread
From: Xinliang Liu via lustre-devel @ 2021-12-16  8:23 UTC (permalink / raw)
  To: Peter Jones; +Cc: Jian Yu, lixi, cloud-dev-request, lustre-devel



Hi Peter and all,

Kevin (on cc) and I have been working on Lustre Arm support for some time.
We want to give a status and progress report to the community and list our
work plan for the next year. Please help review our work plan and give us
your comments and suggestions. Thanks.

Status and Progress
====================
Release
-------
- No Arm packages have been built for the official community release
  <https://downloads.whamcloud.com/public/lustre/> yet.

Build
-----
- Verified the Lustre and openZFS builds and a multi-node setup on Arm64
  CentOS 8; all are ok.
- Lbuild script support for Arm is in review
  <https://review.whamcloud.com/#/c/45691/>, LU-15293
  <https://jira.whamcloud.com/browse/LU-15293>.

CI
--
- No Arm server end CI support yet.
- Arm client with x86_64 server test is already in the CI gate.
  - It only runs a few ldiskfs test suites (sanity, sanity-sec, sanity-lnet,
    etc.), not a full test.
  - A full test <https://review.whamcloud.com/#/c/44760/> (with empty
    GRANT_CHECK_LIST) shows several Arm client related failed test cases, see
    the test results page
    <https://testing.whamcloud.com/test_sessions?jobs=lustre-reviews&builds=82774&start_date=2021-08-26#redirect>:
    - sanity test 317: LU-11667 <https://jira.whamcloud.com/browse/LU-11667>
      (Workaround fix landed)
    - sanityn test 16a: LU-11597 <https://jira.whamcloud.com/browse/LU-11597>,
      test 71a: LU-11787 <https://jira.whamcloud.com/browse/LU-11787>
    - conf-sanity test 98: LU-11785 <https://jira.whamcloud.com/browse/LU-11785>,
      test 112: LU-13813 <https://jira.whamcloud.com/browse/LU-13813>
    - sanity-flr test 50a: LU-14970 <https://jira.whamcloud.com/browse/LU-14970>
    - sanity-pcc test 7a: LU-14346 <https://jira.whamcloud.com/browse/LU-14346>

Arm server end test on local setup
----------------------------------
- Ran a full ldiskfs test with all test suites.
  - Due to the multi MDTs crash issue, some multi MDTs tests were not run.
  - Many new test failures appeared, see the google sheet
    <https://docs.google.com/spreadsheets/d/1EE5zU96_lqlkS0uk6NJeeNBrikYpd_ZEO7hdVt5spsw/edit#gid=969410610>
    for details.
  - The openZFS full test has not been run, but we have heard that it should
    be more stable than ldiskfs.

Bugfix
------
- Old Arm always_except bugs
  <https://jira.whamcloud.com/issues/?filter=15555>: the Arm related ones
  are almost all addressed.
  - LU-11596 <https://jira.whamcloud.com/browse/LU-11596>, LU-11597
    <https://jira.whamcloud.com/browse/LU-11597>, LU-14067
    <https://jira.whamcloud.com/browse/LU-14067>, LU-11787
    <https://jira.whamcloud.com/browse/LU-11787>: addressed, patch sent and
    waiting for Arm client CI recovery to land.
  - LU-10073 <https://jira.whamcloud.com/browse/LU-10073>, LU-11671
    <https://jira.whamcloud.com/browse/LU-11671>: cannot be reproduced on
    Arm, or also happen on x86_64.

- Other old Arm bugs LU-11785 <https://jira.whamcloud.com/browse/LU-11785>,
  LU-13813 <https://jira.whamcloud.com/browse/LU-13813>, LU-14970
  <https://jira.whamcloud.com/browse/LU-14970>, LU-14346
  <https://jira.whamcloud.com/browse/LU-14346> are still to be fixed.

- Newly created server end bugs
  - LU-15122 <https://jira.whamcloud.com/browse/LU-15122>:
    ASSERTION( iobuf->dr_rw == 0 ) crash issue, the fix has landed.
  - LU-15364 <https://jira.whamcloud.com/browse/LU-15364>: multi MDTs kernel
    oops issue, related to atomic unaligned memory access, work in progress.
  - LU-15223 <https://jira.whamcloud.com/browse/LU-15223>: 64K page size
    read/write improvement, long-term work, in progress.

- Full Arm related bug list with label arm:
  <https://jira.whamcloud.com/issues/?filter=16710>
- is not that ready for production.

Reference to:
James Simmons’ Lustre Arm update:
<https://connect.linaro.org/resources/san19/san19-224/>

Work Plan
==========
- Lustre Server End Critical Bug Fix target 2022-06
  - Lustre Multiple MDTs kernel OOPS when stripe issue: LU-15364
    <https://jira.whamcloud.com/browse/LU-15364>
  - Lustre hangs at Sanity Test 807
  - Lustre Conf-sanity test 44 kernel crash
  - Lustre Conf sanity case 58 kernel crash
  - Lustre Conf sanity case 78 kernel crash
  - Lustre Conf sanity case 79 crash
  - Lustre sanity-pcc 7a case hang the cluster

- Lustre Server End Non-critical Bug Fix target 2022-12
  - Lustre Sanity failure cases: 33 cases
  - Lustre server replay-single: 1 case
  - Lustre sanity-flr 200 cases fix: 1 case
  - Lustre sanity-hsm failure cases: 25 cases
  - Lustre lustre-rsync-test failure test: 3 cases
  - Lustre recovery-small/sanity-scrub: 2 cases
  - Lustre sanityn test cases fix: 12 cases
  - Lustre sanity-lfsck failure cases fix: 3 cases
  - Lustre sanity-sec failure cases fix: 7 cases
  - Lustre sanity-lnet failure cases test fix: 2 cases

- Continuously add more test suites to the Arm client CI ??
  - Once a test suite passes completely on Arm, add it to CI.

- Server CI support for Arm on CentOS 8 ??
  - Ideally, Arm server CI would arrive together with the Arm server end fix
    patches and ensure that patches merged in the future don’t cause any
    regressions on Arm.
  - As the test infra is not open source and is maintained by Whamcloud, it
    might need Whamcloud to make it happen ??

- Other future work
  - Test other distros like Ubuntu, SUSE, etc.
  - Test x86 client with Arm64 server
  - Basic optimization: CRC/AES
  - All-flash optimization

Best Regards,
Xinliang


_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org


* Re: [lustre-devel] Lustre Arm stuff status and work plan
  2021-12-16  8:23 [lustre-devel] Lustre Arm stuff status and work plan Xinliang Liu via lustre-devel
@ 2021-12-20  2:30 ` Xinliang Liu via lustre-devel
  2021-12-23  7:43   ` Oleg Drokin via lustre-devel
  0 siblings, 1 reply; 16+ messages in thread
From: Xinliang Liu via lustre-devel @ 2021-12-20  2:30 UTC (permalink / raw)
  To: Peter Jones; +Cc: Jian Yu, lixi, cloud-dev-request, lustre-devel



(Maybe converting to plain text email will be more readable.)

Hi Peter and all,
Kevin (on cc) and I have been working on Lustre Arm support for some time.
We want to give a status and progress report to the community and list our
work plan for the next year.
Please help review our work plan and give us your comments and suggestions.
Thanks.


Status and Progress
================
Release
-------
- No Arm packages built on official community release yet.

Build
------
- Verified Lustre, openZFS build and multi-nodes setup on Arm64 CentOS 8,
all are ok.
- Lbuild script support for Arm is on review, LU-15293.

CI
---
- No Arm server end CI support yet.
- Arm client with x86_64 server test is already in the CI gate.
  - It only runs a few ldiskfs test suites (sanity, sanity-sec, sanity-lnet,
etc.), not a full test.
  - A full test (with empty GRANT_CHECK_LIST) shows several Arm client
related failed test cases, see test results page:
https://testing.whamcloud.com/test_sessions?jobs=lustre-reviews&builds=82774&start_date=2021-08-26#redirect
    - sanity test 317: LU-11667 (Workaround fix landed)
    - sanityn test 16a: LU-11597, test 71a: LU-11787
    - conf-sanity test 98: LU-11785, test 112: LU-13813
    - sanity-flr test 50a: LU-14970
    - sanity-pcc test 7a: LU-14346

Arm server end test on local setup
---------------------------------------------
- Ran a full ldiskfs test with all test suites.
  - Due to the multi MDTs crash issue, some multi MDTs tests were not run.
  - Many new test failures appeared, see the test result google sheet for
details:
https://docs.google.com/spreadsheets/d/1EE5zU96_lqlkS0uk6NJeeNBrikYpd_ZEO7hdVt5spsw/edit#gid=969410610
  - The openZFS full test has not been run, but we have heard that it should
be more stable than ldiskfs.

Bugfix
-------
- Old Arm always_except bugs: https://jira.whamcloud.com/issues/?filter=15555
  The Arm related ones are almost all addressed.
  - LU-11596, LU-11597, LU-14067, LU-11787: addressed, patch sent and
waiting for Arm client CI recovery to land.
  - LU-10073, LU-11671: cannot be reproduced on Arm, or also happen on x86_64.

- Other old Arm bugs LU-11785, LU-13813, LU-14970, LU-14346 are still to be
fixed.

- Newly created server end bugs
  - LU-15122: ASSERTION( iobuf->dr_rw == 0 ) crash issue, the fix has
landed.
  - LU-15364: multi MDTs kernel oops issue, related to atomic unaligned
memory access, work in progress.
  - LU-15223: 64K page size read/write improvement, long-term work, in
progress.

- Full Arm related bug list with label arm:
https://jira.whamcloud.com/issues/?filter=16710

Reference to:
James Simmons’ Lustre Arm update:
https://connect.linaro.org/resources/san19/san19-224/


Work Plan
========
- Lustre Server End Critical Bug Fix target 2022-06
  - Lustre Multiple MDTs kernel OOPS when stripe issue: LU-15364
  - Lustre hangs at Sanity Test 807
  - Lustre Conf-sanity test 44 kernel crash
  - Lustre Conf sanity case 58 kernel crash
  - Lustre Conf sanity case 78 kernel crash
  - Lustre Conf sanity case 79 crash
  - Lustre sanity-pcc 7a case hang the cluster

- Lustre Server End Non-critical Bug Fix target 2022-12
  - Lustre Sanity failure cases: 33 cases
  - Lustre server replay-single: 1 case
  - Lustre sanity-flr 200 cases fix: 1 case
  - Lustre sanity-hsm failure cases: 25 cases
  - Lustre lustre-rsync-test failure test: 3 cases
  - Lustre recovery-small/sanity-scrub: 2 cases
  - Lustre sanityn test cases fix: 12 cases
  - Lustre sanity-lfsck failure cases fix: 3 cases
  - Lustre sanity-sec failure cases fix: 7 cases
  - Lustre sanity-lnet failure cases test fix: 2 cases

- Continuously add more test suites to the Arm client CI ??
  - Once a test suite passes completely on Arm, add it to CI.

- Server CI support for Arm on CentOS 8 ??
  - Ideally, Arm server CI would arrive together with the Arm server end fix
patches and ensure that patches merged in the future don’t cause any
regressions on Arm.
  - As the test infra is not open source and is maintained by Whamcloud, it
might need Whamcloud to make it happen ??

- Other future work
  - Full test with openZFS backend.
  - Test x86 client with Arm64 server
  - Test other distros like Ubuntu, SUSE, etc.
  - Basic optimization: CRC/AES
  - All-flash optimization


Best Regards,
Xinliang


* Re: [lustre-devel] Lustre Arm stuff status and work plan
  2021-12-20  2:30 ` Xinliang Liu via lustre-devel
@ 2021-12-23  7:43   ` Oleg Drokin via lustre-devel
  2021-12-27  7:56     ` Xinliang Liu via lustre-devel
  0 siblings, 1 reply; 16+ messages in thread
From: Oleg Drokin via lustre-devel @ 2021-12-23  7:43 UTC (permalink / raw)
  To: Xinliang Liu
  Cc: Jian Yu, Li Xi, cloud-dev-request, Xinliang Liu via lustre-devel



Hello!

Thank you for your work on this.

The test infra is not open source, but it is open - it’s just gerrit, so you can plug in your own CI nodes
to report test results for configurations that you think are working well (to ensure they don’t break).
This applies to everybody who has ideas about gaps in test coverage for patches in review (which is
the most important part of the pipeline IMO - stuff that was not caught at this step and got committed is much
harder to fix - it then needs triage for failures, a ticket, somebody assigned to do it, and another patch).

It’ll need some work on your end to create a setup that meets the level of testing you feel is adequate,
e.g. you could do something simple using this builder: https://wiki.lustre.org/index.php?title=Simple_Gerrit_Builder_Howto

or you could do something really advanced like what I have with the janitortester, which is also on github: https://github.com/verygreen/lustretester

or I guess you can use some off-the-shelf CI solution that integrates with gerrit.

You can also report results into the Maloo DB if you don’t want to host your own logs infra. There’s an API for that, though I see it’s not exactly public,
but that’s probably just an omission and it should be made public: https://wiki.whamcloud.com/display/TEI/Test+results+format
Let me know if that’s something you are interested in and I will try to provide you with this data.
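
As a rough illustration of what "plugging in" a CI node could look like, here is a minimal sketch that builds a request for Gerrit's "set review" REST endpoint (POST /a/changes/{change-id}/revisions/{revision-id}/review). The server URL and change number come from this thread; the "Verified" label name, the account details and the helper names are assumptions for illustration only, and the labels a given CI account may actually set depend on the server's configuration.

```python
import base64
import json
import urllib.parse
import urllib.request


def build_review_request(base_url, change_id, revision, message, verified=None):
    """Return (url, body) for Gerrit's "set review" endpoint.

    `verified` is an optional vote on a label assumed to be called
    "Verified"; leave it as None to post a comment-only review.
    """
    url = "{}/a/changes/{}/revisions/{}/review".format(
        base_url.rstrip("/"),
        urllib.parse.quote(str(change_id), safe=""),
        urllib.parse.quote(str(revision), safe=""),
    )
    payload = {"message": message}
    if verified is not None:
        # Assumption: the CI account is allowed to vote on this label.
        payload["labels"] = {"Verified": verified}
    return url, json.dumps(payload).encode("utf-8")


def post_review(url, body, user, http_password):
    """Send the review using HTTP basic auth and return the HTTP status."""
    token = base64.b64encode(
        "{}:{}".format(user, http_password).encode("utf-8")
    ).decode("ascii")
    request = urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json; charset=UTF-8",
            "Authorization": "Basic " + token,
        },
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

A CI node would call build_review_request("https://review.whamcloud.com", 45691, "current", "Arm64 build: PASS") and pass the result to post_review() with its own credentials. Keeping the request construction separate from the network call makes the payload easy to check without touching the server.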





* Re: [lustre-devel] Lustre Arm stuff status and work plan
  2021-12-23  7:43   ` Oleg Drokin via lustre-devel
@ 2021-12-27  7:56     ` Xinliang Liu via lustre-devel
  2021-12-27 15:23       ` Oleg Drokin via lustre-devel
  0 siblings, 1 reply; 16+ messages in thread
From: Xinliang Liu via lustre-devel @ 2021-12-27  7:56 UTC (permalink / raw)
  To: Oleg Drokin
  Cc: Jian Yu, Li Xi, cloud-dev-request, Xinliang Liu via lustre-devel



Hello Oleg,
Thank you for your suggestions about Arm CI.

On Thu, 23 Dec 2021 at 15:43, Oleg Drokin <green@whamcloud.com> wrote:

> Hello!
>
>  Thank you for your work on this.
>
> the test infra is not open source, but it is open - it’s just gerrit, so
> you can plug your own CI nodes
> to report test results for configurations that you think are well working
> (to ensure they don’t break).
>  This applies to everybody having ideas about gaps in test coverage for
> patches in review (which is
>   the most important part of the pipeline IMO - stuff that was not caught
> on this step and got committed is much
>   harder to fix - it then needs a triage for failures, a ticket and
> somebody assigned to do it and another patch)
>
>  It’ll need some work on your end to create the setup that meets the level
> of testing you feel is adequate,
>  e.g. you could do something simple using this builder
> https://wiki.lustre.org/index.php?title=Simple_Gerrit_Builder_Howto
>
>  or you could do something really advanced like what I have with the
> janitortester that is also on github
> https://github.com/verygreen/lustretester
>
>   or I guess you can use some off the shelf CI solution that integrtes
> with gerrit.
>
>   You can also report results into Maloo DB if you don’t want to host your
> own logs infra, there’s API for that, though I see it’s not exactly public
>    but that’s probably just an omission and it should be made public
> https://wiki.whamcloud.com/display/TEI/Test+results+format
>    Let me know if that’s something you are interested in and I will try to
> provide you with this data.
>

Yeah, that sounds interesting. Please provide me with this data; the link
returns a 404 now. Thanks.

Ideally, we want as many Arm CI tests as x86_64 has, with these tests run
and maintained in the official Lustre test infra, like the
review-ldiskfs-arm we have now.
But if that is impossible now, we will consider setting up such CI on our own.

And if we run some tests externally on our end, we would like the test
results (pass or fail, link to the run log) to be displayed on the gerrit
patch page, like the tests the community runs.
We want to make our test results public to the Lustre community, so that
developers can see them and adjust their patches if necessary, thus
avoiding breaking previous Arm work.
That can be done by reporting results into the Maloo DB, right? Am I
understanding correctly?

We will look into the links and evaluate setting up our own self-hosted
test infra and see if we can make it work.
For example:
1. How many VM resources are needed for the test infra and for running tests;
2. Other requirements for the self-hosted test infra;
3. Work items to be done;
..

Thanks,
Xinliang





* Re: [lustre-devel] Lustre Arm stuff status and work plan
  2021-12-27  7:56     ` Xinliang Liu via lustre-devel
@ 2021-12-27 15:23       ` Oleg Drokin via lustre-devel
  2021-12-28  1:53         ` Xinliang Liu via lustre-devel
  0 siblings, 1 reply; 16+ messages in thread
From: Oleg Drokin via lustre-devel @ 2021-12-27 15:23 UTC (permalink / raw)
  To: Xinliang Liu
  Cc: Jian Yu, Li Xi, cloud-dev-request, Xinliang Liu via lustre-devel



> On Dec 27, 2021, at 2:56 AM, Xinliang Liu <xinliang.liu@linaro.org> wrote:
> 
> And if we run some kind of tests externally in our end, we can see the test result (pass or not, running log link) displayed on the gerrit patch page as the community running tests. 
> We want to make our test results public to the Lustre community, so that developers can see the test results and adjust the patch if necessary. Thus avoiding breaking previous Arm work.
> That can be made via reporting results into Maloo DB, right? Am I understanding correctly?

In general, people are more likely to see results when you post a comment to gerrit with said results.
Maloo is just one place to link to so that people can actually see the results, but you can link to external resources too,
like e.g. the gatekeeper janitor helper does, or, assuming the information is small enough, it could be entirely contained
in the comment (like, say, for a build failure).
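
The choice described above (inline the output when it is small, otherwise link to wherever the logs are hosted) can be sketched as a small formatting helper. This is only an illustration: the function name, the "Arm64 CI" prefix and the 2000-character threshold are all hypothetical and not taken from any existing Lustre tool.

```python
MAX_INLINE_CHARS = 2000  # hypothetical cut-off for inlining a log


def format_ci_comment(suite, passed, log_text="", log_url=None,
                      max_inline=MAX_INLINE_CHARS):
    """Build the text of a gerrit comment for one test suite result.

    Small logs (for example a short build failure) are included in the
    comment itself; larger ones are replaced by a link to externally
    hosted results (Maloo or any other log server).
    """
    status = "PASS" if passed else "FAIL"
    lines = ["Arm64 CI: {} {}".format(suite, status)]
    if log_text and len(log_text) <= max_inline:
        # Small enough: keep everything in the comment.
        lines.append("")
        lines.append(log_text.rstrip())
    elif log_url:
        # Too large (or empty): point readers at the hosted log instead.
        lines.append("Full log: {}".format(log_url))
    elif log_text:
        # No hosted log available: keep only the tail of the output.
        lines.append("")
        lines.append("(log truncated to the last {} characters)".format(max_inline))
        lines.append(log_text[-max_inline:].rstrip())
    return "\n".join(lines)
```

The returned string can be used directly as the message of a review comment, so a short build error is readable on the patch page without following any link.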

> We will look into the links and evaluate on setting up our own or self-hosted test infra and see if we can make it.
> Such as,
> 1. How many VMs resources for the test infra and test running;
> 2. Other requirements for the self-hosted test infra; 
> 3. Work items to be done;
> ..

* Re: [lustre-devel] Lustre Arm stuff status and work plan
  2021-12-27 15:23       ` Oleg Drokin via lustre-devel
@ 2021-12-28  1:53         ` Xinliang Liu via lustre-devel
  2021-12-28  1:58           ` Oleg Drokin via lustre-devel
  0 siblings, 1 reply; 16+ messages in thread
From: Xinliang Liu via lustre-devel @ 2021-12-28  1:53 UTC (permalink / raw)
  To: Oleg Drokin
  Cc: Jian Yu, Li Xi, cloud-dev-request, Xinliang Liu via lustre-devel



On Mon, 27 Dec 2021 at 23:23, Oleg Drokin <green@whamcloud.com> wrote:

>
>
> > On Dec 27, 2021, at 2:56 AM, Xinliang Liu <xinliang.liu@linaro.org>
> wrote:
> >
> > And if we run some kind of tests externally in our end, we can see the
> test result (pass or not, running log link) displayed on the gerrit patch
> page as the community running tests.
> > We want to make our test results public to the Lustre community, so that
> developers can see the test results and adjust the patch if necessary. Thus
> avoiding breaking previous Arm work.
> > That can be made via reporting results into Maloo DB, right? Am I
> understanding correctly?
>
> In general people are more likely to see results when you post a comment
> to gerrit with said results.
>

Yeah, that's what we want.


> Maloo is just one place to link to to actually let people see the results,
> but you can link to external resources too
> like e.g. gatekeeper janitor helper does or assuming the information is
> small enough it could be entirely contained
> in the comment (like say for a build failure)
>

OK, I understand now. Is there any existing external CI that posts results
to Lustre gerrit now, which we could use as a reference?


>
> > We will look into the links and evaluate on setting up our own or
> self-hosted test infra and see if we can make it.
> > Such as,
> > 1. How many VMs resources for the test infra and test running;
> > 2. Other requirements for the self-hosted test infra;
> > 3. Work items to be done;
> > ..
>


* Re: [lustre-devel] Lustre Arm stuff status and work plan
  2021-12-28  1:53         ` Xinliang Liu via lustre-devel
@ 2021-12-28  1:58           ` Oleg Drokin via lustre-devel
  2022-02-18  8:05             ` Kevin Zhao via lustre-devel
  0 siblings, 1 reply; 16+ messages in thread
From: Oleg Drokin via lustre-devel @ 2021-12-28  1:58 UTC (permalink / raw)
  To: Xinliang Liu
  Cc: Jian Yu, Li Xi, cloud-dev-request, Xinliang Liu via lustre-devel





On Dec 27, 2021, at 8:53 PM, Xinliang Liu <xinliang.liu@linaro.org<mailto:xinliang.liu@linaro.org>> wrote:

Maloo is just one place to link to to actually let people see the results, but you can link to external resources too
like e.g. gatekeeper janitor helper does or assuming the information is small enough it could be entirely contained
in the comment (like say for a build failure)

Ok, understand now. Is there any other reference external CI that posts results to Lustre gerrit now?

Currently there are:
- checkpatch and Misc code checks (smach), which post their results 100% as comments only. They pretty much share a codebase.
- the Janitor (which also started from the above codebase but got changed and extended a lot)

There was external interest in the past in posting results to gerrit, but it never materialized in the end.


* Re: [lustre-devel] Lustre Arm stuff status and work plan
  2021-12-28  1:58           ` Oleg Drokin via lustre-devel
@ 2022-02-18  8:05             ` Kevin Zhao via lustre-devel
  2022-02-28  5:36               ` Oleg Drokin via lustre-devel
  0 siblings, 1 reply; 16+ messages in thread
From: Kevin Zhao via lustre-devel @ 2022-02-18  8:05 UTC (permalink / raw)
  To: Oleg Drokin, mdiep
  Cc: Li Xi, Jian Yu, cloud-dev-request, Xinliang Liu via lustre-devel


[-- Attachment #1.1: Type: text/plain, Size: 2623 bytes --]

Hi All,


Greetings and thanks a lot for your comments! Xinliang and I are from Linaro
<https://www.linaro.org/>, an organization focusing on Arm open-source
ecosystem development. We have been working on Lustre on the Arm64 server
and client side for a while now, and have already fixed a few bugs on arm64.

As Xinliang said before, we want to enable Arm64 CI, and Oleg advised that
we can plug our own CI nodes into Jenkins. Now we want to estimate how many
machines we need, so that we can plan the next stage of our hardware to meet
the Lustre test requirements.


As I understand, the test jobs will cover the ZFS and Ldiskfs backend with
2 scenarios:

   - Lustre Arm64 Server + Arm64 Client (High Priority)
   - Lustre Arm64 Server + x86_64 Client

After going through the Lustre test website,
https://testing.whamcloud.com/test_sessions, the test info is shown quite
clearly, but some questions remain. It would be great if the community could
answer them:

1. Is there a link that shows all the machine resources, including the machine
info, CPU, memory and peripheral info?

2. Do we have a CI infrastructure architecture overview diagram that shows the
machine usage and communication?

3. How many machines are needed for the Lustre Arm64 Server + Arm64 Client
test?


Thanks a lot for your time; we look forward to your response.




-- 
*Best Regards*

*Kevin Zhao*

Tech Lead, LDCG Cloud Infrastructure

Linaro Vertical Technologies

IRC(freenode): kevinz

Slack(kubernetes.slack.com): kevinz

kevin.zhao@linaro.org | Mobile/Direct/Wechat:  +86 18818270915


* Re: [lustre-devel] Lustre Arm stuff status and work plan
  2022-02-18  8:05             ` Kevin Zhao via lustre-devel
@ 2022-02-28  5:36               ` Oleg Drokin via lustre-devel
  2022-03-11  8:28                 ` Kevin Zhao via lustre-devel
  2022-03-29  7:37                 ` [lustre-devel] " Kevin Zhao via lustre-devel
  0 siblings, 2 replies; 16+ messages in thread
From: Oleg Drokin via lustre-devel @ 2022-02-28  5:36 UTC (permalink / raw)
  To: Kevin Zhao
  Cc: Li Xi, Jian Yu, cloud-dev-request, Xinliang Liu via lustre-devel


[-- Attachment #1.1: Type: text/plain, Size: 3678 bytes --]

Hello!

  The sizing really depends on your test scaling requirements.
  For example, my own test infrastructure is a couple of builders + 4 nodes for VMs (each has 256G RAM), 160 VM pairs in total,
  and on a particularly busy day another 80 VM pairs can be added. This is to ensure speedy feedback to developers.
  You can operate a much smaller-scale testing system if you want; just keep in mind how long the longest-running test takes,
  to understand how many patches can be tested in parallel (sometimes patch bombs result in 20+ patches being submitted at the same time).
  Here are the stats for the last 30 days: https://imgur.com/lk2ogJv (1 item means a single patch in processing). Time in testing for a patch is typically about 3.5 hours.

Maloo shows the resources when you go into a test session, for example https://testing.whamcloud.com/test_sessions/4de25b47-43fc-4bfc-87aa-15e4968519a7 - scroll down to see the list of nodes.
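
As a rough illustration of the sizing reasoning above, here is a small Python sketch that estimates how long a burst of patches takes to clear on a given number of VM pairs. It is a back-of-the-envelope model under stated assumptions, not a description of the real scheduler: it assumes every test session occupies one VM pair for the full duration, that sessions run in fixed "waves", and that the number of sessions per patch (`sessions_per_patch`) is known. Only the 20-patch burst and the 3.5 hour figure come from the message; the other numbers and the function names are made up.

```python
import math


def vm_pairs_for_full_parallelism(patches_in_burst, sessions_per_patch):
    """VM pairs needed to start every session of a burst at the same time."""
    return patches_in_burst * sessions_per_patch


def hours_to_drain(patches_in_burst, sessions_per_patch, vm_pairs, hours_per_session):
    """Worst-case hours until the last session of the burst finishes.

    Assumes each session holds one VM pair for hours_per_session and that
    sessions are started in waves of at most vm_pairs at a time.
    """
    total_sessions = patches_in_burst * sessions_per_patch
    waves = math.ceil(total_sessions / vm_pairs)
    return waves * hours_per_session


# 20 patches and 3.5 hours are taken from the message above;
# 4 sessions per patch and 16 VM pairs are assumed values.
print(vm_pairs_for_full_parallelism(20, 4))
print(hours_to_drain(20, 4, 16, 3.5))
```

With these assumed inputs a small cluster would need several waves to work through one patch bomb, which is the trade-off between cluster size and feedback time that the message describes.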







* Re: [lustre-devel] Lustre Arm stuff status and work plan
  2022-02-28  5:36               ` Oleg Drokin via lustre-devel
@ 2022-03-11  8:28                 ` Kevin Zhao via lustre-devel
  2022-03-15 22:09                   ` [lustre-devel] [EXTERNAL] " Simmons, James via lustre-devel
  2022-03-29  7:37                 ` [lustre-devel] " Kevin Zhao via lustre-devel
  1 sibling, 1 reply; 16+ messages in thread
From: Kevin Zhao via lustre-devel @ 2022-03-11  8:28 UTC (permalink / raw)
  To: Oleg Drokin
  Cc: Li Xi, Jian Yu, cloud-dev-request, Xinliang Liu via lustre-devel


[-- Attachment #1.1: Type: text/plain, Size: 4193 bytes --]

Thanks Oleg,

I will post updates on the progress of the test cluster setup on the Arm64 platform.


-- 
*Best Regards*

*Kevin Zhao*

Tech Lead, LDCG Cloud Infrastructure

Linaro Vertical Technologies

IRC(freenode): kevinz

Slack(kubernetes.slack.com): kevinz

kevin.zhao@linaro.org | Mobile/Direct/Wechat:  +86 18818270915


* Re: [lustre-devel] [EXTERNAL] Re: Lustre Arm stuff status and work plan
  2022-03-11  8:28                 ` Kevin Zhao via lustre-devel
@ 2022-03-15 22:09                   ` Simmons, James via lustre-devel
  2022-03-16  1:25                     ` Kevin Zhao via lustre-devel
  0 siblings, 1 reply; 16+ messages in thread
From: Simmons, James via lustre-devel @ 2022-03-15 22:09 UTC (permalink / raw)
  To: Kevin Zhao; +Cc: Xinliang Liu via lustre-devel


[-- Attachment #1.1: Type: text/plain, Size: 4952 bytes --]

Hello.

    I have been watching your efforts to do your own testing, and this is something ORNL has been interested in as well.
I was wondering whether you would be willing to do a joint talk at LUG on this effort. We can pool our knowledge on how to do
local testing and feed the results back to WC. Would you be interested?



* Re: [lustre-devel] [EXTERNAL] Re: Lustre Arm stuff status and work plan
  2022-03-15 22:09                   ` [lustre-devel] [EXTERNAL] " Simmons, James via lustre-devel
@ 2022-03-16  1:25                     ` Kevin Zhao via lustre-devel
  2022-03-18 22:46                       ` Simmons, James via lustre-devel
  0 siblings, 1 reply; 16+ messages in thread
From: Kevin Zhao via lustre-devel @ 2022-03-16  1:25 UTC (permalink / raw)
  To: Simmons, James; +Cc: Xinliang Liu via lustre-devel


[-- Attachment #1.1: Type: text/plain, Size: 5584 bytes --]

Hi James,

That would be great! I look forward to having a joint talk at LUG 2022 :-).

Xinliang and I are now working on the test cluster setup and hopefully, we
will have some progress quite soon.


-- 
*Best Regards*

*Kevin Zhao*

Tech Lead, LDCG Cloud Infrastructure

Linaro Vertical Technologies

IRC(freenode): kevinz

Slack(kubernetes.slack.com): kevinz

kevin.zhao@linaro.org | Mobile/Direct/Wechat:  +86 18818270915


* Re: [lustre-devel] [EXTERNAL] Re: Lustre Arm stuff status and work plan
  2022-03-16  1:25                     ` Kevin Zhao via lustre-devel
@ 2022-03-18 22:46                       ` Simmons, James via lustre-devel
  2022-03-20 12:26                         ` Kevin Zhao via lustre-devel
  0 siblings, 1 reply; 16+ messages in thread
From: Simmons, James via lustre-devel @ 2022-03-18 22:46 UTC (permalink / raw)
  To: Kevin Zhao; +Cc: Xinliang Liu via lustre-devel


[-- Attachment #1.1: Type: text/plain, Size: 6975 bytes --]

By joint talk I'm referring to a presentation at LUG about how to do this type of setup. Here is the abstract I'm thinking of sending to EasyChair.
Feedback welcome.

The scope of the Lustre support matrix has grown due to community involvement, but the testing has been very limited and the results often go unpublished. Whamcloud is too resource-constrained to support every platform requested. Discussions have opened up about how to create testing sites outside of Whamcloud that are tied into the general testing framework. The goal is to enable external sites to run the equivalent of the tests Whamcloud runs when work is submitted to the OpenSFS Lustre source tree, and to have the results published to the general Maloo testing framework. This talk will go over what needs to be done to create such a setup.


[-- Attachment #1.2: Type: text/html, Size: 23268 bytes --]

[-- Attachment #2: Type: text/plain, Size: 165 bytes --]

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [lustre-devel] [EXTERNAL] Re: Lustre Arm stuff status and work plan
  2022-03-18 22:46                       ` Simmons, James via lustre-devel
@ 2022-03-20 12:26                         ` Kevin Zhao via lustre-devel
  0 siblings, 0 replies; 16+ messages in thread
From: Kevin Zhao via lustre-devel @ 2022-03-20 12:26 UTC (permalink / raw)
  To: Simmons, James; +Cc: Xinliang Liu via lustre-devel


[-- Attachment #1.1: Type: text/plain, Size: 7225 bytes --]

Hi James,

Thanks! I think the abstract is quite nice.

On Sat, 19 Mar 2022 at 06:47, Simmons, James <simmonsja@ornl.gov> wrote:

> By joint talk I'm referring to a presentation at LUG about how to do this
> type of setup. Here is the abstract I'm thinking of sending to easychair.
> Feedback welcomed.
>
> The scope of the Lustre support matrix has grown due to community
> involvement, but the testing has been very limited and the results often go
> unpublished. Whamcloud is too resource constrained to support every platform
> requested. Discussions have opened up about how to create testing sites
> outside of Whamcloud that are tied into the general testing framework.
> This is with the goal of enabling external sites to run the equivalent
> tests Whamcloud does when work is presented to the OpenSFS Lustre source tree
> and have the results published to the general Maloo testing framework.
> This talk will go over what needs to be done to create such a setup.
> ------------------------------
> *From:* Kevin Zhao <kevin.zhao@linaro.org>
> *Sent:* Tuesday, March 15, 2022 7:25 PM
> *To:* Simmons, James <simmonsja@ornl.gov>
> *Cc:* Xinliang Liu via lustre-devel <lustre-devel@lists.lustre.org>;
> Peter Jones <pjones@whamcloud.com>
> *Subject:* Re: [EXTERNAL] Re: [lustre-devel] Lustre Arm stuff status and
> work plan
>
> Hi James,
>
> It would be great! Look forward to having a joint talk on LUG2022 :-).
>
> Xinliang and I are now working on the test cluster setup and hopefully, we
> will have some progress quite soon.
>
> On Wed, 16 Mar 2022 at 06:09, Simmons, James <simmonsja@ornl.gov> wrote:
>
> Hello.
>
>     I have been watching your efforts to do your own testing and this
> is something ORNL has been interested in as well.
> I was wondering whether you would be willing to do a joint talk at LUG on this
> effort. We can pool our knowledge on how to do
> local testing and feed it back to WC. Would you be interested?
> ------------------------------
> *From:* lustre-devel <lustre-devel-bounces@lists.lustre.org> on behalf of
> Kevin Zhao via lustre-devel <lustre-devel@lists.lustre.org>
> *Sent:* Friday, March 11, 2022 1:28 AM
> *To:* Oleg Drokin <green@whamcloud.com>
> *Cc:* Li Xi <lixi@ddn.com>; Jian Yu <jiyu@whamcloud.com>;
> cloud-dev-request@op-lists.linaro.org <
> cloud-dev-request@op-lists.linaro.org>; Xinliang Liu via lustre-devel <
> lustre-devel@lists.lustre.org>
> *Subject:* [EXTERNAL] Re: [lustre-devel] Lustre Arm stuff status and work
> plan
>
> Thanks Oleg,
>
> I will update the progress for the test clusters setup on Arm64 platform.
>
> On Mon, 28 Feb 2022 at 13:36, Oleg Drokin <green@whamcloud.com> wrote:
>
> Hello!
>
>   the sizing really depends on your test scaling requirements.
>   For example my own test infrastructure is a couple builders + 4 nodes
> for VMs (each has 256G RAM), 160 VM pairs in total,
>   and on a particularly busy day another 80 VM pairs can be added. This is
> to ensure speedy feedback to developers.
>   You can operate a much smaller scale testing system if you want, just
> keep in mind how long the longest-running test would take
>   to understand how many patches could be tested in parallel (sometimes
> patch bombs result in 20+ patches submission at the same time).
>    Here’s stats for last 30 days. https://imgur.com/lk2ogJv 1 item means
> single patch in processing. time in testing for a patch is typically about
> 3.5 hours.
>
> maloo shows the resources when you go into the test session, for example
> https://testing.whamcloud.com/test_sessions/4de25b47-43fc-4bfc-87aa-15e4968519a7
> - scroll down to see list of nodes
>
>
>
> On Feb 18, 2022, at 3:05 AM, Kevin Zhao <kevin.zhao@linaro.org> wrote:
>
> Hi All,
>
> Greetings and thanks a lot for your comments! Xinliang and I are from
> Linaro, an organization focusing on Arm open-source ecosystem development.
> We have been working on Lustre on the Arm64 server and client end for a
> while now, already fixing a few bugs on arm64.
> As Xinliang said before, we want to enable the Arm64 CI, Oleg advises
> that we can plug our own CI nodes into the Jenkins. Now we want to
> understand and estimate how many machines resources can meet our requests,
> and doing the next stage plan of our hardware to meet the Lustre test
> requirements.
>
> As I understand, the test jobs will cover the ZFS and Ldiskfs backend with
> 2 scenarios:
>
>    - Lustre Arm64 Server + Arm64 Client( High Priority )
>    - Lustre Arm64 Server + x86_64 Client
>
> After going through the Lustre test website:
> https://testing.whamcloud.com/test_sessions, it is quite clear to show
> the test info, and still remain some questions, that will be great if the
> community can give me a clear answer.
> 1. Is there a link to show all the machine resources? Including the
> machine info, CPU, memory and peripheral info.
> 2. Do we have a CI infra arch overview diagram to show the machine usage
> and communication?
> 3. How many machines are needed to meet the request of the Lustre Arm64
> Server + Arm64 Client test?
>
> Thanks a lot for your time, and look forward to your response.
>
>
> On Tue, 28 Dec 2021 at 09:58, Oleg Drokin <green@whamcloud.com> wrote:
>
>
>
> On Dec 27, 2021, at 8:53 PM, Xinliang Liu <xinliang.liu@linaro.org> wrote:
>
> Maloo is just one place to link to to actually let people see the results,
> but you can link to external resources too
> like e.g. gatekeeper janitor helper does or assuming the information is
> small enough it could be entirely contained
> in the comment (like say for a build failure)
>
>
> Ok, understand now. Is there any other reference external CI that posts
> results to Lustre gerrit now?
>
>
> Currently there are:
> - checkpatch and Misc code checks (smach) that post their results as 100%
> comment only. they share codebase pretty much
> - the Janitor (also started with above codebase but got changed and
> extended a lot)
>
> There was external interest in the past to post results to gerrit but it
> never materialized in the end
>
>
>
> --
> *Best Regards*
>
> *Kevin Zhao*
>
> Tech Lead, LDCG Cloud Infrastructure
>
> Linaro Vertical Technologies
>
> IRC(freenode): kevinz
>
> Slack(kubernetes.slack.com): kevinz
>
> kevin.zhao@linaro.org | Mobile/Direct/Wechat:  +86 18818270915
>
>
>
>
>
>

-- 
*Best Regards*

*Kevin Zhao*

Tech Lead, LDCG Cloud Infrastructure

Linaro Vertical Technologies

IRC(freenode): kevinz

Slack(kubernetes.slack.com): kevinz

kevin.zhao@linaro.org | Mobile/Direct/Wechat:  +86 18818270915

[-- Attachment #1.2: Type: text/html, Size: 25895 bytes --]

[-- Attachment #2: Type: text/plain, Size: 165 bytes --]

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [lustre-devel] Lustre Arm stuff status and work plan
  2022-02-28  5:36               ` Oleg Drokin via lustre-devel
  2022-03-11  8:28                 ` Kevin Zhao via lustre-devel
@ 2022-03-29  7:37                 ` Kevin Zhao via lustre-devel
  2022-03-29 12:51                   ` Peter Jones via lustre-devel
  1 sibling, 1 reply; 16+ messages in thread
From: Kevin Zhao via lustre-devel @ 2022-03-29  7:37 UTC (permalink / raw)
  To: Oleg Drokin
  Cc: Li Xi, Jian Yu, cloud-dev-request, Xinliang Liu via lustre-devel


[-- Attachment #1.1: Type: text/plain, Size: 4553 bytes --]

Hi Oleg,

We are now defining the test process and setting up a trial test cluster as
the external Arm64 test resource.
Can I have an account to post results to the Maloo DB? The trial arm64
external test cluster is being set up and hopefully will be finished
quite soon. I want to test whether we can post data to the Maloo DB. Btw, the
doc you pointed to before,
https://wiki.whamcloud.com/display/TEI/Test+results+format, cannot be
accessed; it just returns a 404.

On Mon, 28 Feb 2022 at 13:36, Oleg Drokin <green@whamcloud.com> wrote:

> Hello!
>
>   the sizing really depends on your test scaling requirements.
>   For example my own test infrastructure is a couple builders + 4 nodes
> for VMs (each has 256G RAM), 160 VM pairs in total,
>   and on a particularly busy day another 80 VM pairs can be added. This is
> to ensure speedy feedback to developers.
>   You can operate a much smaller scale testing system if you want, just
> keep in mind how long the longest-running test would take
>   to understand how many patches could be tested in parallel (sometimes
> patch bombs result in 20+ patches submission at the same time).
>    Here’s stats for last 30 days. https://imgur.com/lk2ogJv 1 item means
> single patch in processing. time in testing for a patch is typically about
> 3.5 hours.
>
> maloo shows the resources when you go into the test session, for example
> https://testing.whamcloud.com/test_sessions/4de25b47-43fc-4bfc-87aa-15e4968519a7
> - scroll down to see list of nodes
>
>
>
> On Feb 18, 2022, at 3:05 AM, Kevin Zhao <kevin.zhao@linaro.org> wrote:
>
> Hi All,
>
> Greetings and thanks a lot for your comments! Xinliang and I are from
> Linaro <https://www.linaro.org/>, an organization focusing on Arm
> open-source ecosystem development. We have been working on Lustre on the
> Arm64 server and client end for a while now, already fixing a few bugs on
> arm64.
> As Xinliang said before, we want to enable the Arm64 CI, Oleg advises
> that we can plug our own CI nodes into the Jenkins. Now we want to
> understand and estimate how many machines resources can meet our requests,
> and doing the next stage plan of our hardware to meet the Lustre test
> requirements.
>
> As I understand, the test jobs will cover the ZFS and Ldiskfs backend with
> 2 scenarios:
>
>    - Lustre Arm64 Server + Arm64 Client( High Priority )
>    - Lustre Arm64 Server + x86_64 Client
>
> After going through the Lustre test website:
> https://testing.whamcloud.com/test_sessions, it is quite clear to show
> the test info, and still remain some questions, that will be great if the
> community can give me a clear answer.
> 1. Is there a link to show all the machine resources? Including the
> machine info, CPU, memory and peripheral info.
> 2. Do we have a CI infra arch overview diagram to show the machine usage
> and communication?
> 3. How many machines are needed to meet the request of the Lustre Arm64
> Server + Arm64 Client test?
>
> Thanks a lot for your time, and look forward to your response.
>
>
> On Tue, 28 Dec 2021 at 09:58, Oleg Drokin <green@whamcloud.com> wrote:
>
>>
>>
>> On Dec 27, 2021, at 8:53 PM, Xinliang Liu <xinliang.liu@linaro.org>
>> wrote:
>>
>> Maloo is just one place to link to to actually let people see the
>>> results, but you can link to external resources too
>>> like e.g. gatekeeper janitor helper does or assuming the information is
>>> small enough it could be entirely contained
>>> in the comment (like say for a build failure)
>>>
>>
>> Ok, understand now. Is there any other reference external CI that posts
>> results to Lustre gerrit now?
>>
>>
>> Currently there are:
>> - checkpatch and Misc code checks (smach) that post their results as 100%
>> comment only. they share codebase pretty much
>> - the Janitor (also started with above codebase but got changed and
>> extended a lot)
>>
>> There was external interest in the past to post results to gerrit but it
>> never materialized in the end
>>
>
>
> --
> *Best Regards*
>
> *Kevin Zhao*
>
> Tech Lead, LDCG Cloud Infrastructure
>
> Linaro Vertical Technologies
>
> IRC(freenode): kevinz
>
> Slack(kubernetes.slack.com): kevinz
>
> kevin.zhao@linaro.org | Mobile/Direct/Wechat:  +86 18818270915
>
>
>

-- 
*Best Regards*

*Kevin Zhao*

Tech Lead, LDCG Cloud Infrastructure

Linaro Vertical Technologies

IRC(freenode): kevinz

Slack(kubernetes.slack.com): kevinz

kevin.zhao@linaro.org | Mobile/Direct/Wechat:  +86 18818270915

[-- Attachment #1.2: Type: text/html, Size: 16306 bytes --]

[-- Attachment #2: Type: text/plain, Size: 165 bytes --]

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [lustre-devel] Lustre Arm stuff status and work plan
  2022-03-29  7:37                 ` [lustre-devel] " Kevin Zhao via lustre-devel
@ 2022-03-29 12:51                   ` Peter Jones via lustre-devel
  0 siblings, 0 replies; 16+ messages in thread
From: Peter Jones via lustre-devel @ 2022-03-29 12:51 UTC (permalink / raw)
  To: Kevin Zhao, Oleg Drokin
  Cc: Li Xi, Jian Yu, cloud-dev-request, Xinliang Liu via lustre-devel


[-- Attachment #1.1: Type: text/plain, Size: 5172 bytes --]

Please find attached the content from the referenced wiki page.

From: Kevin Zhao <kevin.zhao@linaro.org>
Date: Tuesday, March 29, 2022 at 12:38 AM
To: Oleg Drokin <green@whamcloud.com>
Cc: Minh Diep <mdiep@whamcloud.com>, Xinliang Liu <xinliang.liu@linaro.org>, Peter Jones <pjones@whamcloud.com>, Xinliang Liu via lustre-devel <lustre-devel@lists.lustre.org>, "cloud-dev-request@op-lists.linaro.org" <cloud-dev-request@op-lists.linaro.org>, Jian Yu <jiyu@whamcloud.com>, Andreas Dilger <adilger@whamcloud.com>, "jsimmons@infradead.org" <jsimmons@infradead.org>, Li Xi <lixi@ddn.com>
Subject: Re: Lustre Arm stuff status and work plan

Hi Oleg,

We are now defining the test process and setting up a trial test cluster as the external Arm64 test resource.
Can I have an account to post results to the Maloo DB? The trial arm64 external test cluster is being set up and hopefully will be finished quite soon. I want to test whether we can post data to the Maloo DB. Btw, the doc you pointed to before, https://wiki.whamcloud.com/display/TEI/Test+results+format, cannot be accessed; it just returns a 404.

On Mon, 28 Feb 2022 at 13:36, Oleg Drokin <green@whamcloud.com<mailto:green@whamcloud.com>> wrote:
Hello!

  the sizing really depends on your test scaling requirements.
  For example my own test infrastructure is a couple builders + 4 nodes for VMs (each has 256G RAM), 160 VM pairs in total,
  and on a particularly busy day another 80 VM pairs can be added. This is to ensure speedy feedback to developers.
  You can operate a much smaller scale testing system if you want, just keep in mind how long the longest-running test would take
  to understand how many patches could be tested in parallel (sometimes patch bombs result in 20+ patches submission at the same time).
   Here’s stats for last 30 days. https://imgur.com/lk2ogJv 1 item means single patch in processing. time in testing for a patch is typically about 3.5 hours.


maloo shows the resources when you go into the test session, for example https://testing.whamcloud.com/test_sessions/4de25b47-43fc-4bfc-87aa-15e4968519a7 - scroll down to see list of nodes
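
The sizing advice above (about 3.5 hours of testing per patch, bursts of 20+
patches, a fixed pool of VM pairs) can be turned into a rough back-of-envelope
estimate. The sketch below is only an illustration under stated assumptions:
the function names and the pairs_per_patch parameter are hypothetical, and the
thread does not say how many VM pairs a single patch actually occupies.

```python
import math

# Typical time a patch spends in testing, as quoted in the thread.
HOURS_PER_PATCH = 3.5


def hours_to_drain(burst_patches, pairs_per_patch, vm_pairs,
                   hours_per_patch=HOURS_PER_PATCH):
    """Estimate wall-clock hours to finish testing a burst of patches.

    pairs_per_patch is a hypothetical parameter: the number of VM pairs
    one patch keeps busy for the whole testing window.
    """
    slots_needed = burst_patches * pairs_per_patch
    # Patches are processed in "waves"; each wave fills the VM pool once.
    waves = math.ceil(slots_needed / vm_pairs)
    return waves * hours_per_patch


def vm_pairs_needed(burst_patches, pairs_per_patch, target_hours,
                    hours_per_patch=HOURS_PER_PATCH):
    """Estimate the VM pairs required to drain a burst within target_hours."""
    # At least one wave is always needed, even for a very tight target.
    waves_allowed = max(1, math.floor(target_hours / hours_per_patch))
    return math.ceil(burst_patches * pairs_per_patch / waves_allowed)


if __name__ == "__main__":
    # Assumed example: 20 patches, each occupying 8 VM pairs (made-up figure).
    print(hours_to_drain(20, 8, vm_pairs=160))
    print(hours_to_drain(20, 8, vm_pairs=40))
    print(vm_pairs_needed(20, 8, target_hours=7.0))
```

A smaller Arm64 pool can be evaluated the same way: plug in the number of VM
pairs available and see how long a patch burst would wait for results.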






On Feb 18, 2022, at 3:05 AM, Kevin Zhao <kevin.zhao@linaro.org<mailto:kevin.zhao@linaro.org>> wrote:

Hi All,


Greetings and thanks a lot for your comments! Xinliang and I are from Linaro<https://www.linaro.org/>, an organization focusing on Arm open-source ecosystem development. We have been working on Lustre on the Arm64 server and client end for a while now, already fixing a few bugs on arm64.
As Xinliang said before, we want to enable the Arm64 CI, Oleg advises that we can plug our own CI nodes into the Jenkins. Now we want to understand and estimate how many machines resources can meet our requests, and doing the next stage plan of our hardware to meet the Lustre test requirements.


As I understand, the test jobs will cover the ZFS and Ldiskfs backend with 2 scenarios:

  *   Lustre Arm64 Server + Arm64 Client( High Priority )

  *   Lustre Arm64 Server + x86_64 Client
After going through the Lustre test website: https://testing.whamcloud.com/test_sessions, it is quite clear to show the test info, and still remain some questions, that will be great if the community can give me a clear answer.
1. Is there a link to show all the machine resources? Including the machine info, CPU, memory and peripheral info.
2. Do we have a CI infra arch overview diagram to show the machine usage and communication?
3. How many machines are needed to meet the request of the Lustre Arm64 Server + Arm64 Client test?


Thanks a lot for your time, and look forward to your response.


On Tue, 28 Dec 2021 at 09:58, Oleg Drokin <green@whamcloud.com<mailto:green@whamcloud.com>> wrote:



On Dec 27, 2021, at 8:53 PM, Xinliang Liu <xinliang.liu@linaro.org<mailto:xinliang.liu@linaro.org>> wrote:

Maloo is just one place to link to to actually let people see the results, but you can link to external resources too
like e.g. gatekeeper janitor helper does or assuming the information is small enough it could be entirely contained
in the comment (like say for a build failure)

Ok, understand now. Is there any other reference external CI that posts results to Lustre gerrit now?

Currently there are:
- checkpatch and Misc code checks (smach) that post their results as 100% comment only. they share codebase pretty much
- the Janitor (also started with above codebase but got changed and extended a lot)

There was external interest in the past to post results to gerrit but it never materialized in the end


--
Best Regards
Kevin Zhao
Tech Lead, LDCG Cloud Infrastructure
Linaro Vertical Technologies
IRC(freenode): kevinz
Slack(kubernetes.slack.com<http://kubernetes.slack.com/>): kevinz
kevin.zhao@linaro.org<mailto:kevin.zhao@linaro.org> | Mobile/Direct/Wechat:  +86 18818270915






[-- Attachment #1.2: Type: text/html, Size: 19421 bytes --]

[-- Attachment #2: Test results format - DataCenter Operations - Whamcloud Community Wiki.pdf --]
[-- Type: application/pdf, Size: 113184 bytes --]

[-- Attachment #3: Type: text/plain, Size: 165 bytes --]

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2022-03-29 16:32 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-12-16  8:23 [lustre-devel] Lustre Arm stuff status and work plan Xinliang Liu via lustre-devel
2021-12-20  2:30 ` Xinliang Liu via lustre-devel
2021-12-23  7:43   ` Oleg Drokin via lustre-devel
2021-12-27  7:56     ` Xinliang Liu via lustre-devel
2021-12-27 15:23       ` Oleg Drokin via lustre-devel
2021-12-28  1:53         ` Xinliang Liu via lustre-devel
2021-12-28  1:58           ` Oleg Drokin via lustre-devel
2022-02-18  8:05             ` Kevin Zhao via lustre-devel
2022-02-28  5:36               ` Oleg Drokin via lustre-devel
2022-03-11  8:28                 ` Kevin Zhao via lustre-devel
2022-03-15 22:09                   ` [lustre-devel] [EXTERNAL] " Simmons, James via lustre-devel
2022-03-16  1:25                     ` Kevin Zhao via lustre-devel
2022-03-18 22:46                       ` Simmons, James via lustre-devel
2022-03-20 12:26                         ` Kevin Zhao via lustre-devel
2022-03-29  7:37                 ` [lustre-devel] " Kevin Zhao via lustre-devel
2022-03-29 12:51                   ` Peter Jones via lustre-devel
