[lustre-devel] Lustre Arm stuff status and work plan

From: Xinliang Liu via lustre-devel @ 2021-12-16  8:23 UTC
  To: Peter Jones; +Cc: Jian Yu, lixi, cloud-dev-request, lustre-devel

Hi Peter and all,

Kevin (on cc) and I have been working on Lustre Arm support for some time.
We would like to give the community a status and progress report and list
our work plan for the next year. Please help review the work plan and send
any comments and suggestions. Thanks.

Status and Progress
====================

Release
- No Arm packages are built for the official community release
  <https://downloads.whamcloud.com/public/lustre/> yet.

Build
- Verified Lustre and OpenZFS builds and a multi-node setup on Arm64
  CentOS 8; all are OK.
- Lbuild script support for Arm is under review
  <https://review.whamcloud.com/#/c/45691/>, LU-15293
  <https://jira.whamcloud.com/browse/LU-15293>.

CI
- No Arm server end CI support yet.
- The Arm client with x86_64 server test is already in the CI gate.
- Only a few ldiskfs test suites are run (sanity, sanity-sec, sanity-lnet,
  etc.), not a full test.
- A full test <https://review.whamcloud.com/#/c/44760/> (with empty
  GRANT_CHECK_LIST) shows several Arm client related failed test cases; see
  the test results page
  <https://testing.whamcloud.com/test_sessions?jobs=lustre-reviews&builds=82774&start_date=2021-08-26#redirect>:
  - sanity test 317: LU-11667 <https://jira.whamcloud.com/browse/LU-11667>
    (workaround fix landed)
  - sanityn test 16a: LU-11597 <https://jira.whamcloud.com/browse/LU-11597>,
    test 71a: LU-11787 <https://jira.whamcloud.com/browse/LU-11787>
  - conf-sanity test 98: LU-11785
    <https://jira.whamcloud.com/browse/LU-11785>, test 112: LU-13813
    <https://jira.whamcloud.com/browse/LU-13813>
  - sanity-flr test 50a: LU-14970
    <https://jira.whamcloud.com/browse/LU-14970>
  - sanity-pcc test 7a: LU-14346
    <https://jira.whamcloud.com/browse/LU-14346>

Arm server end test on local setup
- Ran a full ldiskfs test with all test suites.
- Due to the multi-MDT crash issue, some multi-MDT tests were not run.
- Many new test failures appeared; see the Google sheet
  <https://docs.google.com/spreadsheets/d/1EE5zU96_lqlkS0uk6NJeeNBrikYpd_ZEO7hdVt5spsw/edit#gid=969410610>
  for details.
- The OpenZFS full test has not been run, but we have heard that it should
  be more stable than ldiskfs.

Bugfix
- Old Arm always_except bugs
  <https://jira.whamcloud.com/issues/?filter=15555>: the Arm related ones
  are almost all addressed.
  - LU-11596 <https://jira.whamcloud.com/browse/LU-11596>, LU-11597
    <https://jira.whamcloud.com/browse/LU-11597>, LU-14067
    <https://jira.whamcloud.com/browse/LU-14067>, LU-11787
    <https://jira.whamcloud.com/browse/LU-11787>: addressed; patches sent
    and waiting for Arm client CI recovery to land.
  - LU-10073 <https://jira.whamcloud.com/browse/LU-10073>, LU-11671
    <https://jira.whamcloud.com/browse/LU-11671>: can't be reproduced on
    Arm, or also happen on x86_64.
  - Other old Arm bugs LU-11785
    <https://jira.whamcloud.com/browse/LU-11785>, LU-13813
    <https://jira.whamcloud.com/browse/LU-13813>, LU-14970
    <https://jira.whamcloud.com/browse/LU-14970>, LU-14346
    <https://jira.whamcloud.com/browse/LU-14346>: to be fixed.
- Newly created server end bugs
  - LU-15122 <https://jira.whamcloud.com/browse/LU-15122>:
    ASSERTION( iobuf->dr_rw == 0 ) crash issue; the fix patch has landed.
  - LU-15364 <https://jira.whamcloud.com/browse/LU-15364>: multi-MDT kernel
    oops issue, related to atomic unaligned memory access; work in progress.
  - LU-15223 <https://jira.whamcloud.com/browse/LU-15223>: 64K page size
    read/write improvement; long-term work, in progress.
- Full Arm related bug list with label arm:
  <https://jira.whamcloud.com/issues/?filter=16710>
- is not that ready for production.

Reference to:
James Simmons’ Lustre Arm update:
<https://connect.linaro.org/resources/san19/san19-224/>

Work Plan
==========

- Lustre Server End Critical Bug Fix, target 2022-06
  - Lustre multiple MDTs kernel oops when striping: LU-15364
    <https://jira.whamcloud.com/browse/LU-15364>
  - Lustre hangs at sanity test 807
  - Lustre conf-sanity test 44 kernel crash
  - Lustre conf-sanity test 58 kernel crash
  - Lustre conf-sanity test 78 kernel crash
  - Lustre conf-sanity test 79 crash
  - Lustre sanity-pcc test 7a hangs the cluster
- Lustre Server End Non-critical Bug Fix, target 2022-12
  - Lustre sanity failure cases: 33 cases
  - Lustre server replay-single: 1 case
  - Lustre sanity-flr 200 cases fix: 1 case
  - Lustre sanity-hsm failure cases: 25 cases
  - Lustre lustre-rsync-test failed tests: 3
  - Lustre recovery-small/sanity-scrub: 2
  - Lustre sanityn test cases fix: 12
  - Lustre sanity-lfsck failure cases fix: 3
  - Lustre sanity-sec failure cases fix: 7
  - Lustre sanity-lnet failure cases test fix: 2
- Continuously add more test suites for Arm client CI ??
  - Once a test suite fully passes on Arm, add it to CI.
- Server CI support for Arm on CentOS 8 ??
  - Ideally, Arm server CI can come with the Arm server end fix patches and
    ensure that future merged patches don’t cause any regressions on Arm.
  - As the test infra is not open source and is maintained by Whamcloud, it
    might need Whamcloud to do it ??
- Other work in the future
  - Test other distros such as Ubuntu, SUSE, etc.
  - Test x86 client with Arm64 server
  - Basic optimisation: CRC/AES
  - All-flash optimization

Best Regards,
Xinliang
