linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Kirill A. Shutemov" <kirill@shutemov.name>
To: Ingo Molnar <mingo@kernel.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Ingo Molnar <mingo@redhat.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	x86@kernel.org, Thomas Gleixner <tglx@linutronix.de>,
	"H. Peter Anvin" <hpa@zytor.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Andy Lutomirski <luto@amacapital.net>,
	Cyrill Gorcunov <gorcunov@openvz.org>,
	Borislav Petkov <bp@suse.de>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 0/6] Boot-time switching between 4- and 5-level paging for 4.15, Part 1
Date: Thu, 26 Oct 2017 17:40:41 +0300	[thread overview]
Message-ID: <20171026144040.hjm45civpm74gafx@node.shutemov.name> (raw)
In-Reply-To: <20171026073752.fl4eicn4x7wudpop@gmail.com>

On Thu, Oct 26, 2017 at 09:37:52AM +0200, Ingo Molnar wrote:
> 
> * Kirill A. Shutemov <kirill@shutemov.name> wrote:
> 
> > On Tue, Oct 24, 2017 at 02:47:41PM +0200, Ingo Molnar wrote:
> > > > > > > > > Making a variable that 'looks' like a constant macro dynamic in a rare Kconfig 
> > > > > > > > > scenario is asking for trouble.
> > > > > > > > 
> > > > > > > > We expect boot-time page mode switching to be enabled in kernel of next
> > > > > > > > generation enterprise distros. It shoudn't be that rare.
> > > > > > > 
> > > > > > > My point remains even with not-so-rare Kconfig dependency.
> > > > > > 
> > > > > > I don't follow how introducing new variable that depends on Kconfig option
> > > > > > would help with the situation.
> > > > > 
> > > > > A new, properly named variable or function (max_physmem_bits or 
> > > > > max_physmem_bits()) that is not all uppercase would make it abundantly clear that 
> > > > > it is not a constant but a runtime value.
> > > > 
> > > > Would we need to rename every uppercase macros that would depend on
> > > > max_physmem_bits()? Like MAXMEM.
> > > 
> > > MAXMEM isn't used in too many places either - what's the total impact of it?
> > 
> > The impact is not very small. The tree of macros dependent on
> > MAX_PHYSMEM_BITS:
> > 
> > MAX_PHYSMEM_BITS
> >   MAXMEM
> >     KEXEC_SOURCE_MEMORY_LIMIT
> >     KEXEC_DESTINATION_MEMORY_LIMIT
> >     KEXEC_CONTROL_MEMORY_LIMIT
> >   SECTIONS_SHIFT
> >     ZONEID_SHIFT
> >       ZONEID_PGSHIFT
> >       ZONEID_MASK
> > 
> > The total number of users of them is not large. It's doable. But I expect
> > it to be somewhat ugly, since we're partly in generic code and it would
> > require some kind of compatibility layer for other archtectures.
> > 
> > Do you want me to rename them all?
> 
> Yeah, I think these former constants should be organized better.
> 
> Here's their usage frequency:
> 
>  triton:~/tip> for N in MAX_PHYSMEM_BITS MAXMEM KEXEC_SOURCE_MEMORY_LIMIT \
>  KEXEC_DESTINATION_MEMORY_LIMIT KEXEC_CONTROL_MEMORY_LIMIT SECTIONS_SHIFT \
>  ZONEID_SHIFT ZONEID_PGSHIFT ZONEID_MASK; do printf "  %-40s: " $N; git grep -w $N  | grep -vE 'define| \* ' | wc -l; done
> 
>    MAX_PHYSMEM_BITS                        : 10
>    MAXMEM                                  : 5
>    KEXEC_SOURCE_MEMORY_LIMIT               : 2
>    KEXEC_DESTINATION_MEMORY_LIMIT          : 2
>    KEXEC_CONTROL_MEMORY_LIMIT              : 2
>    SECTIONS_SHIFT                          : 2
>    ZONEID_SHIFT                            : 1
>    ZONEID_PGSHIFT                          : 1
>    ZONEID_MASK                             : 1
> 
> So it's not too bad to clean up, I think.
> 
> How about something like this:
> 
> 	machine.physmem.max_bytes		/* ex MAXMEM */
> 	machine.physmem.max_bits		/* bit count of the highest in-use physical address */
> 	machine.physmem.zones.id_shift		/* ZONEID_SHIFT */
> 	machine.physmem.zones.pg_shift		/* ZONEID_PGSHIFT */
> 	machine.physmem.zones.id_mask		/* ZONEID_MASK */
> 
> 	machine.kexec.physmem_bytes_src		/* KEXEC_SOURCE_MEMORY_LIMIT */
> 	machine.kexec.physmem_bytes_dst		/* KEXEC_DESTINATION_MEMORY_LIMIT */
> 
> ( With perhaps 'physmem' being an alias to '&machine->physmem', so that 
>   physmem->max_bytes and physmem->max_bits would be a natural thing to write. )
> 
> I'd suggest doing this in a finegrained fashion, one step at a time, introducing 
> 'struct machine' and 'struct physmem' and extending it gradually with new fields.

I don't think this design is reasonable.

  - It introduces memory references where we haven't had them before.

    At this point all variable would fit a cache line, which is not that
    bad. But I don't see what would stop the list from growing in the
    future.

  - We loose ability to optimize out change with static branches
    (cpu_feature_enabled() instead of pgtable_l5_enabled variable).

    It's probably, not that big of an issue here, but if we are going to
    use the same approach for other dynamic macros in the patchset, it
    might be.

  - AFAICS, it requires changes to all architectures to provide such
    structures as we now partly in generic code.

    Or to introduce some kind of compatibility layer, but it would make
    the kernel as a whole uglier than cleaner. Especially, given that
    nobody beyond x86 need this.

To me it's pretty poor trade off for a clean up.

> To re-discuss the virt_addr_valid() concern you raised:
> 
> > > For instance, virt_addr_valid() depends indirectly on it:
> > > 
> > >   virt_addr_valid()
> > >     __virt_addr_valid()
> > >       phys_addr_valid()
> > >         boot_cpu_data.x86_phys_bits (initialized with MAX_PHYSMEM_BITS)
> > > 
> > > virt_addr_valid() is used in things like implementation /dev/kmem.
> > > 
> > > To me it's far more risky than occasional build breakage for
> > > CONFIG_X86_5LEVEL=y.
> > 
> > So why do we have two variables here, one boot_cpu_data.x86_phys_bits and the
> > other MAX_PHYSMEM_BITS - both set once during boot?
> 
> So it's still unclear to me why virt_addr_valid() would be a problem: this 
> function could probably (in a separate patch) use physmem->max_bits, which would 
> make it more secure than using even a dynamic MAX_PHYSMEM_BITS: it would detect 
> any physical addresses that are beyond the recognized maximum range.

Here we discussed what would happen if we leave MAX_PHYSMEM_BITS as a
constant -- maximum possible physmem bits, regardless of paging mode --
and introduce new variable that would reflect the actual maximum.

And this was example for the case that may misbehave (not only bloat a
data structure) if we would forget to change MAX_PHYSMEM_BITS to a
new variable.

-- 
 Kirill A. Shutemov

  reply	other threads:[~2017-10-26 14:40 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-09-29 14:08 [PATCH 0/6] Boot-time switching between 4- and 5-level paging for 4.15, Part 1 Kirill A. Shutemov
2017-09-29 14:08 ` [PATCH 1/6] mm/sparsemem: Allocate mem_section at runtime for SPARSEMEM_EXTREME Kirill A. Shutemov
2017-10-20 12:27   ` [tip:x86/mm] mm/sparsemem: Allocate mem_section at runtime for CONFIG_SPARSEMEM_EXTREME=y tip-bot for Kirill A. Shutemov
2017-11-02 12:31     ` Sudeep Holla
2017-11-02 13:34       ` Kirill A. Shutemov
2017-11-02 13:42         ` Sudeep Holla
2017-11-02 14:12           ` Kirill A. Shutemov
2017-11-02 15:07             ` Sudeep Holla
2017-11-02 15:37             ` Thierry Reding
2017-11-06 19:00             ` Bjorn Andersson
2017-11-07  1:15               ` Will Deacon
2017-09-29 14:08 ` [PATCH 2/6] mm/zsmalloc: Prepare to variable MAX_PHYSMEM_BITS Kirill A. Shutemov
2017-10-14  0:00   ` Nitin Gupta
2017-10-16 14:44     ` Kirill A. Shutemov
2017-10-18 23:39       ` Nitin Gupta
2017-09-29 14:08 ` [PATCH 3/6] x86/kasan: Use the same shadow offset for 4- and 5-level paging Kirill A. Shutemov
2017-10-20 12:28   ` [tip:x86/mm] " tip-bot for Andrey Ryabinin
2017-09-29 14:08 ` [PATCH 4/6] x86/xen: Provide pre-built page tables only for XEN_PV and XEN_PVH Kirill A. Shutemov
2017-10-20 12:28   ` [tip:x86/mm] x86/xen: Provide pre-built page tables only for CONFIG_XEN_PV=y and CONFIG_XEN_PVH=y tip-bot for Kirill A. Shutemov
2017-09-29 14:08 ` [PATCH 5/6] x86/xen: Drop 5-level paging support code from XEN_PV code Kirill A. Shutemov
2017-10-20 12:29   ` [tip:x86/mm] x86/xen: Drop 5-level paging support code from the " tip-bot for Kirill A. Shutemov
2017-09-29 14:08 ` [PATCH 6/6] x86/boot/compressed/64: Detect and handle 5-level paging at boot-time Kirill A. Shutemov
2017-10-03  8:27 ` [PATCH 0/6] Boot-time switching between 4- and 5-level paging for 4.15, Part 1 Kirill A. Shutemov
2017-10-17 15:42   ` Kirill A. Shutemov
2017-10-20  8:18     ` Ingo Molnar
2017-10-20  9:41       ` Kirill A. Shutemov
2017-10-20 15:23         ` Ingo Molnar
2017-10-20 16:23           ` Kirill A. Shutemov
2017-10-23 11:56             ` Ingo Molnar
2017-10-23 12:21               ` Kirill A. Shutemov
2017-10-23 12:40                 ` Ingo Molnar
2017-10-23 12:48                   ` Kirill A. Shutemov
2017-10-24  9:40                     ` Ingo Molnar
2017-10-24 11:38                       ` Kirill A. Shutemov
2017-10-24 12:47                         ` Ingo Molnar
2017-10-24 13:12                           ` Kirill A. Shutemov
2017-10-26  7:37                             ` Ingo Molnar
2017-10-26 14:40                               ` Kirill A. Shutemov [this message]
2017-10-31  9:47                                 ` Ingo Molnar
2017-10-31 12:04                                   ` Kirill A. Shutemov
2017-10-20  9:49       ` Minchan Kim
2017-10-20 12:18         ` Kirill A. Shutemov
2017-10-24 11:32     ` hpa
2017-10-24 11:43       ` Kirill A. Shutemov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20171026144040.hjm45civpm74gafx@node.shutemov.name \
    --to=kirill@shutemov.name \
    --cc=akpm@linux-foundation.org \
    --cc=bp@suse.de \
    --cc=gorcunov@openvz.org \
    --cc=hpa@zytor.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=luto@amacapital.net \
    --cc=mingo@kernel.org \
    --cc=mingo@redhat.com \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).