[PATCH v8 00/12] syfs: generic deadlock fix with module removal

* [PATCH v8 00/12] syfs: generic deadlock fix with module removal
@ 2021-09-27 16:37 Luis Chamberlain
  2021-09-27 16:37 ` [PATCH v8 01/12] LICENSES: Add the copyleft-next-0.3.1 license Luis Chamberlain
                   ` (11 more replies)
  0 siblings, 12 replies; 94+ messages in thread
From: Luis Chamberlain @ 2021-09-27 16:37 UTC (permalink / raw)
  To: tj, gregkh, akpm, minchan, jeyu, shuah
  Cc: bvanassche, dan.j.williams, joe, tglx, mcgrof, keescook, rostedt,
	linux-spdx, linux-doc, linux-block, linux-fsdevel,
	linux-kselftest, linux-kernel

This is a follow up to my v7 series of fixes for the zram driver [0]
which ended up uncovering a generic deadlock issue with sysfs and module
removal. I've reported this issue and proposed a few patches first since
March 2021 [1]. At the end of this email you will find an itemized list
of changes since that v1 series, you can also find these changes on my
branch 20210927-sysfs-generic-deadlock-fix [4] which is based on
linux-next tag next-20210927.

Just a heads up, I'm goin on vacation in two days, won't be back until
Monday October 11th.

On this v8 I incorporate feedback from the v7 series, namely:

 - Tejun requested I move the struct module to the last attribute when
   extending functions
 - As per discussion with Tejun, trimmed and clarified the commit log
   and documentation on the generic fix on patch 7
 - As requested by Bart Van Assche, I simplied the setting of the
   struct test_config *config into one line instead of two on many
   places on patch 3 which adds the new sysfs selftest
 - Dan Williams had some questions about patch 7, and so clarified these
   questions using a more elaborate example on the commit log to show
   where the lock call was happening.
 - Trimmed the Cc list considerably as it was way too long before
 - Rebased onto linux-next tag next-20210927

Below a list of changes of this patch set since its inception:

On v1:
  - Open coded the sysfs deadlock race to only be localized by the zram
    driver
Changes on v2:
  - used bdgrab() as well for another race which was speculated by
    Minchan
  - improved documentation of fixes
Changes on v3:
  - used a localized zram macros for the sysfs attributes instead of
    open coding on each routine
  - replaced bdget() stuff for a generic get_device() and bus_get() on
    dev_attr_show() / dev_attr_store() for the issue speculated by
    Michan
Changes on v4:
  - Cosmetic fixes on the zram fixes as requested by Greg
  - Split out the driver core fix as requested by Greg for the
    issue speculated by Michan. This fix ended up getting up to its 4th
    patch iteration [2] and eventually hit linux-next. We got a 0day
    0day suspend stres fail for this patch [3]
Changes on v5:
  - I ended up writing a test_sysfs driver and with it I ended up
    proving that the issue speculated by Michen was not possible and
    so I asked Greg to drop the patch from his queue titled
    "sysfs: fix kobject refcount to address races with kobject removal"
  - checkpatch fixes for the zram changes
Changes on v6:
  - I submitted my test_sysfs driver for inclusion upstream which easily
    abstracted the deadlock issue in a driver generically [4]
  - I rebased the zram fixes and added also a new patch for zram to use
    ATTRIBUTE_GROUPS As per Minchen I sent the patches to be merged
    through Andrew Morton.
  - Greg ended up NACK'ing the patchset because he was not sure the fix
    was correct still
Changes on v7:
  - Formalizes the original proposed generic sysfs fix intead of using
    macro helpers to work around the issue
  - I decided it is best to merge all the effort together into
    one patch set because communication was being lost when I split the
    patches up. This was not helping in any way to either fix the zram
    issues or come to consensus on a generic solution. The patches are
    also merged now because they are all related now.
  - Running checkpatch exposed that S_IRWXUGO and S_IRWXU|S_IRUGO|S_IXUGO
    should be replaced, so I did that in this series in two new patches
  - Adds a try_module_get() documentation extension with tribal
    knowledge and new information I don't think some folks still believe
    in. The new test_sysfs selftest however proves this information to
    be correct, the same selftest can be used to try to prove that
    documentation incorrect
  - Because the fix is now generic zram's deadlock can easily be fixed
    now by just making it use ATTRIBUTE_GROUPS().

[0] https://lkml.kernel.org/r/YUjLAbnEB5qPfnL8@slm.duckdns.org
[1] https://lkml.kernel.org/r/20210306022035.11266-1-mcgrof@kernel.org
[2] https://lkml.kernel.org/r/20210623215007.862787-1-mcgrof@kernel.org                                                                                                      
[3] https://lkml.kernel.org/r/20210701022737.GC21279@xsang-OptiPlex-9020                                                                                                     
[4] https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux-next.git/log/?h=20210927-sysfs-generic-deadlock-fix

Luis Chamberlain (12):
  LICENSES: Add the copyleft-next-0.3.1 license
  testing: use the copyleft-next-0.3.1 SPDX tag
  selftests: add tests_sysfs module
  kernfs: add initial failure injection support
  test_sysfs: add support to use kernfs failure injection
  kernel/module: add documentation for try_module_get()
  fs/kernfs/symlink.c: replace S_IRWXUGO with 0777 on
    kernfs_create_link()
  fs/sysfs/dir.c: replace S_IRWXU|S_IRUGO|S_IXUGO with 0755
    sysfs_create_dir_ns()
  sysfs: fix deadlock race with module removal
  test_sysfs: enable deadlock tests by default
  zram: fix crashes with cpu hotplug multistate
  zram: use ATTRIBUTE_GROUPS to fix sysfs deadlock module removal

 .../fault-injection/fault-injection.rst       |   22 +
 LICENSES/dual/copyleft-next-0.3.1             |  237 +++
 MAINTAINERS                                   |    9 +-
 arch/x86/kernel/cpu/resctrl/rdtgroup.c        |    4 +-
 drivers/block/zram/zram_drv.c                 |   74 +-
 fs/kernfs/Makefile                            |    1 +
 fs/kernfs/dir.c                               |   44 +-
 fs/kernfs/failure-injection.c                 |   91 ++
 fs/kernfs/file.c                              |   19 +-
 fs/kernfs/kernfs-internal.h                   |   75 +-
 fs/kernfs/symlink.c                           |    4 +-
 fs/sysfs/dir.c                                |    5 +-
 fs/sysfs/file.c                               |    6 +-
 fs/sysfs/group.c                              |    3 +-
 include/linux/kernfs.h                        |   19 +-
 include/linux/module.h                        |   34 +-
 include/linux/sysfs.h                         |   52 +-
 kernel/cgroup/cgroup.c                        |    2 +-
 lib/Kconfig.debug                             |   25 +
 lib/Makefile                                  |    1 +
 lib/test_kmod.c                               |   12 +-
 lib/test_sysctl.c                             |   12 +-
 lib/test_sysfs.c                              |  952 ++++++++++++
 tools/testing/selftests/kmod/kmod.sh          |   13 +-
 tools/testing/selftests/sysctl/sysctl.sh      |   12 +-
 tools/testing/selftests/sysfs/Makefile        |   12 +
 tools/testing/selftests/sysfs/config          |    5 +
 tools/testing/selftests/sysfs/sysfs.sh        | 1383 +++++++++++++++++
 28 files changed, 3026 insertions(+), 102 deletions(-)
 create mode 100644 LICENSES/dual/copyleft-next-0.3.1
 create mode 100644 fs/kernfs/failure-injection.c
 create mode 100644 lib/test_sysfs.c
 create mode 100644 tools/testing/selftests/sysfs/Makefile
 create mode 100644 tools/testing/selftests/sysfs/config
 create mode 100755 tools/testing/selftests/sysfs/sysfs.sh

-- 
2.30.2

^ permalink raw reply	[flat|nested] 94+ messages in thread