Re: [PATCH] x86, vmlinux.lds: Add debug option to force all data sections aligned

From: Feng Tang <feng.tang@intel.com>
To: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	Dave Hansen <dave.hansen@intel.com>,
	H Peter Anvin <hpa@zytor.com>,
	Peter Zijlstra <peterz@infradead.org>,
	x86@kernel.org, linux-kernel@vger.kernel.org,
	Josh Poimboeuf <jpoimboe@redhat.com>
Subject: Re: [PATCH] x86, vmlinux.lds: Add debug option to force all data sections aligned
Date: Thu, 17 Feb 2022 15:37:53 +0800	[thread overview]
Message-ID: <20220217073753.GE8191@shbuild999.sh.intel.com> (raw)
In-Reply-To: <20220217062232.GD8191@shbuild999.sh.intel.com>

On Thu, Feb 17, 2022 at 02:22:32PM +0800, Feng Tang wrote:
> Hi Densy,
> 
> Thanks for the review!
> 
> On Wed, Feb 16, 2022 at 02:35:10PM +0100, Denys Vlasenko wrote:
> > On 2/16/22 9:28 AM, Feng Tang wrote:
> > > 0day has reported many strange performance changes (regression or
> > > improvement), in which there was no obvious relation between the culprit
> > > commit and the benchmark at the first look, and it causes people to doubt
> > > the test itself is wrong.
> > > 
> > > Upon further check, many of these cases are caused by the change to the
> > > alignment of kernel text or data, as whole text/data of kernel are linked
> > > together, change in one domain can affect alignments of other domains.
> > > 
> > > To help quickly identifying if the strange performance change is caused
> > > by _data_ alignment, add a debug option to force the data sections from
> > > all .o files aligned on THREAD_SIZE, so that change in one domain won't
> > > affect other modules' data alignment.
> > > 
> > > We have used this option to check some strange kernel changes [1][2][3],
> > > and those performance changes were gone after enabling it, which proved
> > > they are data alignment related. Besides these publicly reported cases,
> > > recently there are other similar cases found by 0day, and this option
> > > has been actively used by 0Day for analyzing strange performance changes.
> > ...
> > > +	.data : AT(ADDR(.data) - LOAD_OFFSET)
> > > +#ifdef CONFIG_DEBUG_FORCE_DATA_SECTION_ALIGNED
> > > +	/* Use the biggest alignment of below sections */
> > > +	SUBALIGN(THREAD_SIZE)
> > > +#endif
> > 
> > "Align every input section to 4096 bytes" ?
> 
> The THREAD_SIZE depends on ARCH, and could be 16KB or more (KASAN related).
> 
> > This is way, way, WAY too much. The added padding will be very wasteful.
>  
> Yes, its too much. One general kernel's data section could be bumped
> from 18MB to 50+ MB.
> 
> And I should make it more clear, that this is for debugging only.
> 
> > Performance differences are likely to be caused by cacheline alignment.
> > Factoring in an odd hardware prefetcher grabbing an additional
> > cacheline after every accessed one, I'd say alignment to 128 bytes
> > (on x86) should suffice for almost any scenario. Even 64 bytes
> > would almost always work fine.
>  
> Yep, when I started it, I tried 128 bytes for HW adjacent cacheline
> prefetch consideration. But the built kernel won't boot, then I  
> tried 512/1024/2048/4096 but there were still boot issue.

I just did these tests again, and confirmed that alignment <= 2048
will not boot on x86 HW, while 4096 and bigger will work.

The reason is this SUBALIGN() has privilege over the alignment
inside the data sections. like in vmlinux.lds:

 .data : AT(ADDR(.data) - 0xffffffff80000000)
 SUBALIGN(2048)
 {
_sdata = .;
  . = ALIGN(((1 << 12) << (2 + 0))); __start_init_task = .; init_thread_union = .; init_stack = .; KEEP(*(.data..init_task)) KEEP(*(.data..init_thread_info)) . = __start_init_task + ((1 << 12) << (2 + 0)); __end_init_task = .;
  . = ALIGN((1 << 12)); *(.data..page_aligned) . = ALIGN((1 << 12));
  ...

And the section of *(.data..page_aligned) asks for 4KB align but
our 2048 setting will break it, as confirmed by the System.map:

	ffffffff82800000 D _sdata
	ffffffff82800000 D init_stack
	ffffffff82800000 D init_thread_union
	ffffffff82804000 D __end_init_task
	ffffffff82804000 d bringup_idt_table
	ffffffff82804800 D __vsyscall_page  <--------- broken
	ffffffff82806000 d hpet

But this also means if necessary, we can change the alignment
from THREAD_SIZE to PAGE_SIZE, which will make data sections
much smaller.

Thanks,
Feng

> So I 
> chose the biggest alignment within data sections, considering
> this is purely for analyzing/debugging purpose.