Re: [PATCH 1/2] x86/mm: don't mmap() over 4GB with compat syscall

From: Dmitry Safonov <dsafonov@virtuozzo.com>
To: Andy Lutomirski <luto@amacapital.net>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"Dmitry Safonov" <0x7f454c46@gmail.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	"Ingo Molnar" <mingo@redhat.com>,
	"H. Peter Anvin" <hpa@zytor.com>,
	Andy Lutomirski <luto@kernel.org>, Borislav Petkov <bp@suse.de>,
	X86 ML <x86@kernel.org>
Subject: Re: [PATCH 1/2] x86/mm: don't mmap() over 4GB with compat syscall
Date: Thu, 12 Jan 2017 12:46:03 +0300	[thread overview]
Message-ID: <dec32bd7-36ad-7273-5874-62328a98bb30@virtuozzo.com> (raw)
In-Reply-To: <CALCETrX1Y44HB3J+qLLV+N4br=-RUWNMBJE+8Dcb2nF_7+YcFg@mail.gmail.com>

On 01/12/2017 01:26 AM, Andy Lutomirski wrote:
> On Wed, Jan 11, 2017 at 10:17 AM, Dmitry Safonov <dsafonov@virtuozzo.com> wrote:
>> During fixing CRIU bugs on ZDTM tests for 32-bit C/R, I found that
>> compatible ia32/x32 syscalls mmap() and mmap2() can return address
>> over 4Gb in x86_64 applications, which results in returning lower
>> 4 bytes of address while dropping the higher bytes.
>> It happens because mmap() upper limit doesn't differ native/compat
>> syscalls for 64-bit task, it's: (TASK_UNMAPPED_BASE + random_factor)
>> which is: (PAGE_ALIGN(TASK_SIZE / 3)) + random_factor
>> (in case of legacy mmap it's just TASK_SIZE).
>> This patch limits higher address that can be mmaped with compat
>> syscalls in 64-bit applications with IA32_PAGE_OFFSET (+randomization).
>>
>> Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
>> ---
>>  arch/x86/kernel/sys_x86_64.c | 37 +++++++++++++++++++++++++++++--------
>>  1 file changed, 29 insertions(+), 8 deletions(-)
>>
>> diff --git a/arch/x86/kernel/sys_x86_64.c b/arch/x86/kernel/sys_x86_64.c
>> index a55ed63b9f91..0893725db6e6 100644
>> --- a/arch/x86/kernel/sys_x86_64.c
>> +++ b/arch/x86/kernel/sys_x86_64.c
>> @@ -100,7 +100,7 @@ SYSCALL_DEFINE6(mmap, unsigned long, addr, unsigned long, len,
>>  static void find_start_end(unsigned long flags, unsigned long *begin,
>>                            unsigned long *end)
>>  {
>> -       if (!test_thread_flag(TIF_ADDR32) && (flags & MAP_32BIT)) {
>> +       if (!test_thread_flag(TIF_ADDR32)) {
>>                 /* This is usually used needed to map code in small
>>                    model, so it needs to be in the first 31bit. Limit
>>                    it to that.  This means we need to move the
>> @@ -109,14 +109,24 @@ static void find_start_end(unsigned long flags, unsigned long *begin,
>>                    malloc knows how to fall back to mmap. Give it 1GB
>>                    of playground for now. -AK */
>>                 *begin = 0x40000000;
>> -               *end = 0x80000000;
>> -               if (current->flags & PF_RANDOMIZE) {
>> -                       *begin = randomize_page(*begin, 0x02000000);
>> +
>> +               if (flags & MAP_32BIT) {
>> +                       if (current->flags & PF_RANDOMIZE)
>> +                               *begin = randomize_page(*begin, 0x02000000);
>> +                       *end = 0x80000000;
>> +                       return;
>> +               }
>> +               if (current->thread.status & TS_COMPAT) {
>> +                       if (current->flags & PF_RANDOMIZE)
>> +                               *begin = randomize_page(*begin,
>> +                                       1UL << mmap_rnd_compat_bits);
>> +                       *end = IA32_PAGE_OFFSET;
>> +                       return;
>>                 }
>> -       } else {
>> -               *begin = current->mm->mmap_legacy_base;
>> -               *end = TASK_SIZE;
>>         }
>> +
>> +       *begin = current->mm->mmap_legacy_base;
>> +       *end = TASK_SIZE;
>>  }
>>
>>  unsigned long
>> @@ -187,10 +197,21 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
>>                         return addr;
>>         }
>>
>> +       if (current->thread.status & TS_COMPAT) {
>
> in_compat_syscall(), please.

Indeed, forgot about the helper.

>
> Also, we need to verify that, if this is called execve(), it does the
> right thing.

Hmm, not sure I get it right.
A test for calling compat sys_execve() from and for 64-bit ELF?

>> +               if (current->flags & PF_RANDOMIZE) {
>> +                       unsigned long rnd = 1UL << mmap_rnd_compat_bits;
>> +
>> +                       info.high_limit =
>> +                               randomize_page(IA32_PAGE_OFFSET - rnd, rnd);
>> +               } else {
>> +                       info.high_limit = IA32_PAGE_OFFSET;
>> +               }
>> +       } else {
>> +               info.high_limit = mm->mmap_base;
>> +       }
>
> This code was incomprehensible before and it's worse now.  Could you
> try to clean it up a bit?  For example, a patch that simply folds
> find_start_end() into its sole caller as the first patch in the series
> without changing any semantics would probably help.

Well, yep, I also don't like how this code looks like.
That will need to add a parameter to find_start_end() whether
allocation is bottom-up or up-bottom.
I'll try to cleanup for v2.

>
> --Andy
>

-- 
              Dmitry