From: Nathan Chancellor <nathan@kernel.org>
To: Jani Nikula <jani.nikula@intel.com>
Cc: "Maxime Ripard" <mripard@kernel.org>,
	dri-devel@lists.freedesktop.org, intel-gfx@lists.freedesktop.org,
	intel-xe@lists.freedesktop.org,
	"David Airlie" <airlied@gmail.com>,
	"Daniel Vetter" <daniel@ffwll.ch>,
	"Maarten Lankhorst" <maarten.lankhorst@linux.intel.com>,
	"Thomas Zimmermann" <tzimmermann@suse.de>,
	"Alex Deucher" <alexander.deucher@amd.com>,
	"Christian König" <christian.koenig@amd.com>,
	"Pan, Xinhui" <Xinhui.Pan@amd.com>,
	"Karol Herbst" <kherbst@redhat.com>,
	"Lyude Paul" <lyude@redhat.com>,
	"Danilo Krummrich" <dakr@redhat.com>,
	"Rob Clark" <robdclark@gmail.com>,
	"Abhinav Kumar" <quic_abhinavk@quicinc.com>,
	"Dmitry Baryshkov" <dmitry.baryshkov@linaro.org>,
	"Sean Paul" <sean@poorly.run>,
	"Marijn Suijten" <marijn.suijten@somainline.org>,
	"Hamza Mahfooz" <hamza.mahfooz@amd.com>,
	"Javier Martinez Canillas" <javierm@redhat.com>,
	"Sui Jingfeng" <sui.jingfeng@linux.dev>,
	linux-kbuild@vger.kernel.org, llvm@lists.linux.dev
Subject: Re: [RESEND v3 2/2] drm: Add CONFIG_DRM_WERROR
Date: Wed, 27 Mar 2024 09:33:31 -0700
Message-ID: <20240327163331.GB1153323@dev-arch.thelio-3990X>
In-Reply-To: <87a5mkgl6i.fsf@intel.com>

On Wed, Mar 27, 2024 at 09:59:01AM +0200, Jani Nikula wrote:
> On Wed, 27 Mar 2024, Maxime Ripard <mripard@kernel.org> wrote:
> > On Tue, Mar 26, 2024 at 03:56:50PM -0700, Nathan Chancellor wrote:
> >> On Tue, Mar 05, 2024 at 11:07:36AM +0200, Jani Nikula wrote:
> >> > Add kconfig to enable -Werror subsystem wide. This is useful for
> >> > development and CI to keep the subsystem warning free, while avoiding
> >> > issues outside of the subsystem that kernel wide CONFIG_WERROR=y might
> >> > hit.
> >> > 
> >> > v2: Don't depend on COMPILE_TEST
> >> > 
> >> > Reviewed-by: Hamza Mahfooz <hamza.mahfooz@amd.com> # v1
> >> > Signed-off-by: Jani Nikula <jani.nikula@intel.com>
> >> > ---
> >> >  drivers/gpu/drm/Kconfig  | 13 +++++++++++++
> >> >  drivers/gpu/drm/Makefile |  3 +++
> >> >  2 files changed, 16 insertions(+)
> >> > 
> >> > diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
> >> > index 6e853acf15da..c08e18108c2a 100644
> >> > --- a/drivers/gpu/drm/Kconfig
> >> > +++ b/drivers/gpu/drm/Kconfig
> >> > @@ -416,3 +416,16 @@ config DRM_LIB_RANDOM
> >> >  config DRM_PRIVACY_SCREEN
> >> >  	bool
> >> >  	default n
> >> > +
> >> > +config DRM_WERROR
> >> > +	bool "Compile the drm subsystem with warnings as errors"
> >> > +	depends on EXPERT
> >> > +	default n
> >> > +	help
> >> > +	  A kernel build should not cause any compiler warnings, and this
> >> > +	  enables the '-Werror' flag to enforce that rule in the drm subsystem.
> >> > +
> >> > +	  The drm subsystem enables more warnings than the kernel default, so
> >> > +	  this config option is disabled by default.
> >> > +
> >> > +	  If in doubt, say N.
> >> 
> >> While I understand the desire for an easy switch that maintainers and
> >> developers can use to ensure that their changes are warning free for the
> >> drm subsystem specifically, I think subsystem specific configuration
> >> options like this are actively detrimental to developers and continuous
> >> integration systems that build test the entire kernel. For example, we
> >> turned off CONFIG_WERROR for our Hexagon builds because of warnings that
> >> appear with -Wextra that are legitimate but require treewide changes to
> >> resolve in a manner sufficient for Linus:
> >> 
> >> https://github.com/ClangBuiltLinux/linux/issues/1285
> >> https://lore.kernel.org/all/CAHk-=wg80je=K7madF4e7WrRNp37e3qh6y10Svhdc7O8SZ_-8g@mail.gmail.com/
> >> https://lore.kernel.org/all/20230522105049.1467313-1-schnelle@linux.ibm.com/
> >> 
> >> But now, due to CONFIG_DRM_WERROR getting enabled by all{mod,yes}config
> >> and -Wextra being unconditionally enabled for DRM, those warnings hard
> >> break the build despite CONFIG_WERROR=n...
> >
> > Would making DRM_WERROR depends on WERROR address your concerns?
> 
> But then what would be the point of having DRM_WERROR at all? For me the
> point is, "werror in drm, ignore the rest, they're someone else's
> problem".

Right, I do think this is a valid view point and one I am sympathetic
to, especially since it is in the pursuit of increased code quality. I
do not want to disrupt that.

> An alternative would be to "depends on !COMPILE_TEST" that we have in
> i915, but then some folks want to have COMPILE_TEST in drm, because some
> drivers are otherwise hard for people to build.

Right. I think it is unfortunate how (at least in my opinion)
CONFIG_COMPILE_TEST has two meanings: genuinely just compile testing or
"allmodconfig". For the first case, we would want CONFIG_DRM_WERROR=y
but for the second case, it would be nice for CONFIG_DRM_WERROR to
default to off (because CONFIG_WERROR is enabled) but allow developers
to turn it on explicitly.

Another lofty/wistful idea to solve this would be to implement something
similar to compiler diagnostic groups for Kconfig, where there would be
a hierarchy like

  CONFIG_WERROR
    CONFIG_DRM_WERROR
    CONFIG_SUBSYSTEM_A_WERROR
    CONFIG_SUBSYSTEM_B_WERROR

where CONFIG_WERROR sets the same value for every subconfiguration under
it, but each one could still be enabled individually without any
additional dependencies (à la '-Wno-unused -Wunused-variable'). That
would allow my use case, where CONFIG_WERROR=n removes all instances of
-Werror, to keep working, while other developers and CI systems could
just set their specific -Werror configuration and be done with it. I
don't think something like that exists, but maybe I don't know Kconfig
as well as I think I do :)
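
A rough sketch of those semantics, modeled in C because Kconfig has no
such construct today. The option names are the hypothetical ones from
the hierarchy above; settings are applied in order and the last one
wins, the same way '-Wno-unused -Wunused-variable' behaves on a compiler
command line. This is only an illustration of the idea, not an existing
Kconfig feature or API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical child options from the hierarchy above. */
enum werror_opt {
    DRM_WERROR,
    SUBSYSTEM_A_WERROR,
    SUBSYSTEM_B_WERROR,
    NUM_WERROR_OPTS
};

static const char *const werror_names[NUM_WERROR_OPTS] = {
    "DRM_WERROR", "SUBSYSTEM_A_WERROR", "SUBSYSTEM_B_WERROR",
};

/* One user setting, e.g. {"WERROR", false} or {"DRM_WERROR", true}. */
struct werror_setting {
    const char *name;
    bool value;
};

/*
 * Resolve the settings in order, last one wins: setting the parent
 * "WERROR" overwrites every child, while a later child setting only
 * affects that child. Anything never mentioned stays off.
 */
static void resolve_werror(const struct werror_setting *settings, size_t n,
                           bool out[NUM_WERROR_OPTS])
{
    for (int i = 0; i < NUM_WERROR_OPTS; i++)
        out[i] = false;

    for (size_t s = 0; s < n; s++) {
        bool is_parent = strcmp(settings[s].name, "WERROR") == 0;

        for (int i = 0; i < NUM_WERROR_OPTS; i++) {
            if (is_parent || strcmp(settings[s].name, werror_names[i]) == 0)
                out[i] = settings[s].value;
        }
    }
}
```

With this model, an allmodconfig-style WERROR=y followed by an explicit
WERROR=n turns every child off, while a CI system that only sets
DRM_WERROR=y gets -Werror for drm alone.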

> Nathan, we do want to fix any issues swiftly. Are you hitting specific
> build problems?

Yes, I see three distinct sets of problems in our CI as a direct result
of this series. I already covered two in the prior mail, but I'll be a
little more expansive below.

1. Instances of -Wunused-but-set-variable from variables that only have
   unary operations applied to them. Clang can warn in this case where
   GCC cannot: https://godbolt.org/z/d368q3coP

      int main(void)
      {
          int a = 0;
          a++;
          return 0;
      }

   This pattern shows up in a few drm drivers. Most instances have a
   patch on the mailing list that has not been applied yet.

     drivers/gpu/drm/qxl/qxl_cmd.c:424:6: error: variable 'count' set but not used [-Werror,-Wunused-but-set-variable]
       424 |         int count = 0;
           |             ^
     https://lore.kernel.org/all/20230408165023.2706235-1-trix@redhat.com/ (almost a year old)

     drivers/gpu/drm/qxl/qxl_ioctl.c:148:14: error: variable 'num_relocs' set but not used [-Werror,-Wunused-but-set-variable]
       148 |         int i, ret, num_relocs;
           |                     ^
     https://lore.kernel.org/all/20240307104119.1980621-1-colin.i.king@gmail.com/

     drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c:843:6: error: variable 'out' set but not used [-Werror,-Wunused-but-set-variable]
       843 |         u64 out = dumper->iova + A6XX_CD_DATA_OFFSET;
           |             ^
     https://lore.kernel.org/all/20240326212324.185832-1-ojeda@kernel.org/ (recent patch)

     drivers/gpu/drm/panthor/panthor_sched.c:2048:6: error: variable 'csg_mod_mask' set but not used [-Werror,-Wunused-but-set-variable]
      2048 |         u32 csg_mod_mask = 0, free_csg_slots = 0;
           |             ^
     No patch, new driver, not reported yet it seems.
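
   To make the pattern concrete, here is a minimal sketch using
   hypothetical functions (not taken from any of the drivers above):
   in the first one, 'count' is only ever incremented and never read,
   which is what clang reports as -Wunused-but-set-variable. The second
   one simply consumes the value; deleting the variable is the other
   obvious way out.

   ```c
   #include <stddef.h>

   /* Hypothetical: 'count' is written but never read, so clang warns. */
   int count_positive_warns(const int *items, size_t n)
   {
       int count = 0;

       for (size_t i = 0; i < n; i++) {
           if (items[i] > 0)
               count++; /* only unary updates, never a read */
       }
       return 0;
   }

   /* Same loop, but the result is consumed, so there is no warning. */
   int count_positive_fixed(const int *items, size_t n)
   {
       int count = 0;

       for (size_t i = 0; i < n; i++) {
           if (items[i] > 0)
               count++;
       }
       return count;
   }
   ```

   Compiling this with 'clang -c -Wunused-but-set-variable' should flag
   only the first function, since the second one reads 'count'.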

2. High stack usage in AMDGPU files for ARCH=powerpc allmodconfig. This
   might be a compiler issue, but until now there have been more
   important fires to put out.

     drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c:1096:12: error: stack frame size (2064) exceeds limit (2048) in 'vcn_v3_0_start' [-Werror,-Wframe-larger-than]
      1096 | static int vcn_v3_0_start(struct amdgpu_device *adev)
           |            ^

     drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c:955:12: error: stack frame size (2112) exceeds limit (2048) in 'vcn_v4_0_5_start' [-Werror,-Wframe-larger-than]
       955 | static int vcn_v4_0_5_start(struct amdgpu_device *adev)
           |            ^

     drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c:713:12: error: stack frame size (2992) exceeds limit (2048) in 'vcn_v5_0_0_start' [-Werror,-Wframe-larger-than]
       713 | static int vcn_v5_0_0_start(struct amdgpu_device *adev)
           |            ^

   After a brief look while writing this email, the problem appears to
   be related to CONFIG_UBSAN_BOUNDS, as none of the warnings appear when
   that is disabled on top of allmodconfig. I suspect that the sanitizer
   instrumentation and inlining might be messing something up here, as
   has happened with other sanitizers like KASAN and KCSAN in the past.
   Without CONFIG_UBSAN_BOUNDS, the stack usage of these functions does
   not seem too bad:

     drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c:1096:12: warning: stack frame size (816) exceeds limit (512) in 'vcn_v3_0_start' [-Wframe-larger-than]
     drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c:955:12: warning: stack frame size (800) exceeds limit (512) in 'vcn_v4_0_5_start' [-Wframe-larger-than]
     drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c:713:12: warning: stack frame size (1040) exceeds limit (512) in 'vcn_v5_0_0_start' [-Wframe-larger-than]
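
   The diagnostic itself is easy to reproduce outside the kernel, which
   passes CONFIG_FRAME_WARN to the compiler as -Wframe-larger-than=. The
   sketch below uses a hypothetical function, unrelated to the
   vcn_*_start() code, that keeps a 1 KiB buffer on the stack so it
   crosses a 512-byte limit like the one in the second set of warnings.

   ```c
   #include <stddef.h>
   #include <string.h>

   /*
    * Hypothetical function whose 1 KiB local buffer alone exceeds a
    * 512-byte frame limit, e.g. with -Wframe-larger-than=512.
    */
   int sum_with_big_frame(const unsigned char *src, size_t len)
   {
       unsigned char scratch[1024];
       size_t n = len < sizeof(scratch) ? len : sizeof(scratch);
       int sum = 0;

       memcpy(scratch, src, n); /* copy at most sizeof(scratch) bytes */
       for (size_t i = 0; i < n; i++)
           sum += scratch[i];
       return sum;
   }
   ```

   Building it with 'clang -O0 -c -Wframe-larger-than=512' should report
   that the frame of sum_with_big_frame() exceeds the limit; at higher
   optimization levels the compiler may shrink or drop the buffer, so
   the reported size can vary.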

3. -Wnull-pointer-arithmetic from IO port accessors on architectures
   that do not have them (such as hexagon and s390). For example:

     In file included from drivers/gpu/drm/virtio/virtgpu_plane.c:26:
     In file included from include/drm/drm_atomic_helper.h:31:
     In file included from include/drm/drm_crtc.h:32:
     In file included from include/drm/drm_modes.h:33:
     In file included from include/drm/drm_connector.h:32:
     In file included from include/drm/drm_util.h:35:
     In file included from include/linux/interrupt.h:11:
     In file included from include/linux/hardirq.h:11:
     In file included from ./arch/hexagon/include/generated/asm/hardirq.h:1:
     In file included from include/asm-generic/hardirq.h:17:
     In file included from include/linux/irq.h:20:
     In file included from include/linux/io.h:13:
     In file included from arch/hexagon/include/asm/io.h:328:
     include/asm-generic/io.h:584:33: error: performing pointer arithmetic on a null pointer has undefined behavior [-Werror,-Wnull-pointer-arithmetic]
       584 |         __raw_writeb(value, PCI_IOBASE + addr);
           |                             ~~~~~~~~~~ ^
     include/asm-generic/io.h:594:59: error: performing pointer arithmetic on a null pointer has undefined behavior [-Werror,-Wnull-pointer-arithmetic]
       594 |         __raw_writew((u16 __force)cpu_to_le16(value), PCI_IOBASE + addr);
           |                                                       ~~~~~~~~~~ ^
     include/asm-generic/io.h:604:59: error: performing pointer arithmetic on a null pointer has undefined behavior [-Werror,-Wnull-pointer-arithmetic]
       604 |         __raw_writel((u32 __force)cpu_to_le32(value), PCI_IOBASE + addr);
           |                                                       ~~~~~~~~~~ ^

   These warnings are numerous throughout drivers/gpu/drm/ because
   -Wnull-pointer-arithmetic is enabled by -Wextra. Again, this is not
   exactly your problem, and it should eventually be fixed by [1] (it
   appears that Niklas is working on a new version at [2]), but it is
   exacerbated by DRM now defaulting to W=1 plus -Werror under
   allmodconfig, even with CONFIG_WERROR=n.
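
   As the log shows, the generic port accessors compute
   'PCI_IOBASE + addr'; on these architectures that base is a null
   pointer, which is exactly what -Wnull-pointer-arithmetic objects to.
   The minimal sketch below shows the same base-plus-port computation
   against an emulated I/O window instead of a null base. Every name in
   it (sketch_io_window, SKETCH_PCI_IOBASE, sketch_outb(), sketch_inb())
   is hypothetical and only mirrors the shape of the kernel's outb()/inb().

   ```c
   #include <assert.h>
   #include <stdint.h>

   /*
    * Hypothetical emulated I/O window standing in for an architecture's
    * real PCI_IOBASE mapping. Port accessors compute base + port; when
    * the base is a null pointer, that addition is what clang flags.
    */
   #define SKETCH_IO_WINDOW_SIZE 0x100u
   static uint8_t sketch_io_window[SKETCH_IO_WINDOW_SIZE];
   #define SKETCH_PCI_IOBASE (sketch_io_window)

   /* Argument order (value, port) mirrors the kernel's outb(). */
   static void sketch_outb(uint8_t value, unsigned long addr)
   {
       assert(addr < SKETCH_IO_WINDOW_SIZE); /* stay inside the window */
       *(volatile uint8_t *)(SKETCH_PCI_IOBASE + addr) = value;
   }

   static uint8_t sketch_inb(unsigned long addr)
   {
       assert(addr < SKETCH_IO_WINDOW_SIZE);
       return *(volatile uint8_t *)(SKETCH_PCI_IOBASE + addr);
   }
   ```

   With a real, non-null base the addition is well defined; the warnings
   above only arise because there is no such base on hexagon or s390.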

Hopefully that helps clear things up. I am more than happy to send
patches or work towards solutions that satisfy everyone (or at least a
majority/consensus). Wider testing with clang never hurts either, but I
understand increasing build matrices is not always an easy sell.

[1]: https://lore.kernel.org/all/20230522105049.1467313-45-schnelle@linux.ibm.com/
[2]: https://git.kernel.org/niks/l/has_ioport_v6

Cheers,
Nathan
