Re: ❌ FAIL: Test report for kernel 5.4.0-rc2-d6c2c23.cki (stable-next)

From: Vincenzo Frascino <vincenzo.frascino@arm.com>
To: Catalin Marinas <catalin.marinas@arm.com>, Will Deacon <will@kernel.org>
Cc: Andrey Konovalov <andreyknvl@google.com>,
	Jan Stancek <jstancek@redhat.com>,
	CKI Project <cki-project@redhat.com>,
	LTP List <ltp@lists.linux.it>,
	Linux Stable maillist <stable@vger.kernel.org>,
	Memory Management <mm-qe@redhat.com>,
	Szabolcs Nagy <szabolcs.nagy@arm.com>,
	Dave P Martin <Dave.Martin@arm.com>
Subject: Re: ❌ FAIL: Test report for kernel 5.4.0-rc2-d6c2c23.cki (stable-next)
Date: Tue, 15 Oct 2019 17:02:31 +0100	[thread overview]
Message-ID: <342a9218-d840-f56d-2349-a5cfaafb6e16@arm.com> (raw)
In-Reply-To: <20191015152651.GG13874@arrakis.emea.arm.com>

Hi Catalin,

On 10/15/19 4:26 PM, Catalin Marinas wrote:
> Adding Szabolcs and Dave from ARM as we've discussed this internally
> briefly but we should include the wider audience.
> 
> On Mon, Oct 14, 2019 at 10:33:32PM +0100, Will Deacon wrote:
>> On Mon, Oct 14, 2019 at 05:26:51PM +0100, Catalin Marinas wrote:
>>> On Mon, Oct 14, 2019 at 02:54:17PM +0200, Andrey Konovalov wrote:
>>>> On Mon, Oct 14, 2019 at 9:29 AM Jan Stancek <jstancek@redhat.com> wrote:
>>>>>> We ran automated tests on a recent commit from this kernel tree:
>>>>>>
>>>>>>        Kernel repo:
>>>>>>        git://git.kernel.org/pub/scm/linux/kernel/git/sashal/linux-stable.git
>>>>>>             Commit: d6c2c23a29f4 - Merge branch 'stable-next' of
>>>>>>             ssh://chubbybox:/home/sasha/data/next into stable-next
>>>>>>
>>>>>> The results of these automated tests are provided below.
>>>>>>
>>>>>>     Overall result: FAILED (see details below)
>>>>>>              Merge: OK
>>>>>>            Compile: OK
>>>>>>              Tests: FAILED
>>>>>>
>>>>>> All kernel binaries, config files, and logs are available for download here:
>>>>>>
>>>>>>   https://artifacts.cki-project.org/pipelines/223563
>>>>>>
>>>>>> One or more kernel tests failed:
>>>>>>
>>>>>>     aarch64:
>>>>>>       ❌ LTP: openposix test suite
>>>>>>
>>>>>
>>>>> Test [1] is passing value close to LONG_MAX, which on arm64 is now treated as tagged userspace ptr:
>>>>>   057d3389108e ("mm: untag user pointers passed to memory syscalls")
>>>>>
>>>>> And now seems to hit overflow check after sign extension (EINVAL).
>>>>> Test should probably find different way to choose invalid pointer.
>>>>>
>>>>> [1] https://github.com/linux-test-project/ltp/blob/master/testcases/open_posix_testsuite/conformance/interfaces/mlock/8-1.c
>>>>

[...]

>>> The options I see:
>>>
>>> 1. Revert commit 057d3389108e and try again to document that the memory
>>>    syscalls do not support tagged pointers
>>>
>>> 2. Change untagged_addr() to only 0-extend from bit 55 or leave the
>>>    tag unchanged if bit 55 is 1. We could mask out the tag (0 rather
>>>    than sign-extend) but if we had an mlock test passing ULONG_MASK,
>>>    then we get -ENOMEM instead of -EINVAL
>>>
>>> 3. Make untagged_addr() depend on the TIF_TAGGED_ADDR bit and we only
>>>    break the ABI for applications opting in to this new ABI. We did look
>>>    at this but the ptrace(PEEK/POKE_DATA) needs a bit more thinking on
>>>    whether we check the ptrace'd process or the debugger flags
>>>
>>> 4. Leave things as they are, consider the address space 56-bit and
>>>    change the test to not use LONG_MAX on arm64. This needs to be passed
>>>    by the sparc guys since they probably have a similar issue
>>
>> I'm in favour of (2) or (4) as long as there's also an update to the docs.
> 
> With (4) we'd start differing from other architectures supported by
> Linux. This works if we consider the test to be broken. However, reading
> the mlock man page:
> 
>        EINVAL The result of the addition addr+len was less than addr
>        (e.g., the addition may have resulted in an overflow).
> 
>        ENOMEM Some of the specified address range does not correspond to
>        mapped pages in the address space of the process.
> 
> There is no mention of EINVAL outside the TASK_SIZE, seems to fall more
> within the ENOMEM description above.
>

I agree with your analysis and vote for option (2) as well, because of what you
reported about mlock() error meanings and because being this ABI I would prefer
where possible to not diverge from other architectures.
 [...]

-- 
Regards,
Vincenzo