The regression has been identified: Chris Wilson found commits touching
swapfile.c, and after reverting them the issue could no longer be reproduced.

 

https://patchwork.freedesktop.org/series/87549/

 

This revert will be applied to the core-for-CI branch. Once a new CI_DRM
has been built, shard testing will be enabled again.

 

Regards,

 

Tomi Sarvela

 

From: Sarvela, Tomi P

More information (excuse my top-posting):

 

- The issue happens in the Mlocking phase of
igt@gem_tiled_swapping@non-threaded, before “starting subtest” appears.

 

- The filesystem that gets trashed is the one containing the swapfile

 

- If swap is a partition, the swap signature seems to be correct even
after running the test, so for now I’m assuming that the issue has to do
with the swapfile

 

- Bisection between 20210129 and 20210215 proved to be challenging,
because the kernels hang before init, leave no dmesg, and I don’t have
a console on the testing host. Petri’s suggestion to bisect between
CI_DRM_9817 and 9818 might work better
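
The swap-signature check mentioned in the list above could be done roughly
like this. It is only a minimal sketch, assuming the usual Linux swap header
layout where a 10-byte magic (SWAPSPACE2, or SWAP-SPACE for the old format)
sits at the end of the first page. The function name and the 4096-byte
default page size are illustrative and not taken from the CI tooling.

```python
# Assumption: Linux swap areas keep a 10-byte magic at the end of page 0.
SWAP_MAGICS = (b"SWAPSPACE2", b"SWAP-SPACE")


def has_swap_signature(path, page_size=4096):
    """Return True if the first page of `path` ends with a swap magic.

    `path` may be a swapfile or a swap partition device node; `page_size`
    must match the page size the swap area was created with.
    """
    with open(path, "rb") as f:
        f.seek(page_size - 10)   # magic occupies the last 10 bytes of page 0
        magic = f.read(10)
    return magic in SWAP_MAGICS
```

Running it on the swap partition or the swapfile before and after the test
(as root for a device node) shows whether the header itself survived. It says
nothing about the state of the surrounding filesystem.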

 

Regards,

 

Tomi Sarvela

 

From: Sarvela, Tomi P

Hello,

 

The Linux i915 CI shard runs have been disabled. This is due to the
unfortunate filesystem-corrupting bug first seen in linux-next 20210215,
which has now been merged to Linus’ 5.12-rc1 and further on to DRM-Tip,
with the first instance seen in CI_DRM_9818. The last changes coming in were:

 

fb3b93df7979 drm-tip: 2021y-03m-01d-09h-36m-57s UTC integration manifest

3b3c4086295b drm-tip: 2021y-03m-01d-08h-49m-06s UTC integration manifest

fe07bfda2fb9 Linux 5.12-rc1

 

More information can be seen at:

https://phoronix.com/scan.php?page=news_item&px=Linux-5.12-Early-Buggy-Issue

 

I’ve seen this bug happen regularly with (but not limited to) the IGT test:

igt@gem_tiled_swapping@non-threaded

 

The range for bisection is linux-next 20210129 to 20210215, because the
builds in between taint the kernel and our i915 testing was not run on them.
Hitting the bug corrupts the underlying filesystem very thoroughly, wiping
out a large amount of data from the beginning of the partition, which leaves
fsck sad with thousands of items lost. Bisection of the IGT testlist was done
with two root filesystems: the kernel under test booted from the second
partition, and a copy of the second partition was stored on the first
partition and could be restored at will.
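
If the kernel bisection is driven by `git bisect run`, the outcome of each
round has to be mapped to an exit code: 0 marks the commit good, 125 tells
git to skip it, and other codes from 1 to 127 mark it bad. The following is
a minimal sketch of that mapping only. The function and its two inputs are
hypothetical; how "booted" and "corrupted" are detected on the test host is
left open here.

```python
import sys

# Exit codes understood by `git bisect run`.
GOOD, BAD, SKIP = 0, 1, 125


def bisect_exit_code(booted, fs_corrupted):
    """Map one test round to a `git bisect run` exit code.

    booted:       the kernel under test got far enough to run the IGT test
    fs_corrupted: fsck reported damage on the tested root filesystem
    """
    if not booted:
        # Untestable build (for example, a hang before init): skip it
        # instead of guessing good or bad.
        return SKIP
    return BAD if fs_corrupted else GOOD


if __name__ == "__main__":
    # Hypothetical invocation: script.py <booted 0|1> <fs_corrupted 0|1>
    if len(sys.argv) == 3:
        sys.exit(bisect_exit_code(sys.argv[1] == "1", sys.argv[2] == "1"))
```

Skipping unbootable builds keeps them from being misclassified, at the cost
of a wider final range if many commits in a row cannot be tested.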

 

I’ll continue bisecting this bug on the linux-next tree. If someone has more
information on where this issue originates, help would be appreciated.

 

Regards,

 

Tomi Sarvela

 

--

Intel Finland Oy - BIC 0357606-4 - Westendinkatu 7, 02160 Espoo