linux-sgx.vger.kernel.org archive mirror
From: Jarkko Sakkinen <jarkko@kernel.org>
To: Reinette Chatre <reinette.chatre@intel.com>
Cc: linux-sgx@vger.kernel.org,
	Haitao Huang <haitao.huang@linux.intel.com>,
	Vijay Dhanraj <vijay.dhanraj@intel.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Paul Menzel <pmenzel@molgen.mpg.de>,
	stable@vger.kernel.org, Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	"maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)"
	<x86@kernel.org>, "H. Peter Anvin" <hpa@zytor.com>,
	"open list:X86 ARCHITECTURE (32-BIT AND 64-BIT)" 
	<linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v2 2/6] x86/sgx: Do not consider unsanitized pages an error
Date: Fri, 2 Sep 2022 00:53:52 +0300
Message-ID: <YxEp8Ji+ukLBoNE+@kernel.org>
In-Reply-To: <24906e57-461f-6c94-9e78-0d8507df01bb@intel.com>

On Wed, Aug 31, 2022 at 01:39:53PM -0700, Reinette Chatre wrote:
> Hi Jarkko,
> 
> On 8/31/2022 10:38 AM, Jarkko Sakkinen wrote:
> > In sgx_init(), if misc_register() fails or misc_register() succeeds but
> > neither sgx_drv_init() nor sgx_vepc_init() succeeds, then ksgxd will be
> > prematurely stopped. This may leave some unsanitized pages, which does
> > not matter, because SGX will be disabled for the whole power cycle.
> > 
> > This triggers WARN_ON() because sgx_dirty_page_list ends up being
> > non-empty, and dumps the call stack:
> > 
> > [    0.268103] sgx: EPC section 0x40200000-0x45f7ffff
> > [    0.268591] ------------[ cut here ]------------
> > [    0.268592] WARNING: CPU: 6 PID: 83 at
> > arch/x86/kernel/cpu/sgx/main.c:401 ksgxd+0x1b7/0x1d0
> > [    0.268598] Modules linked in:
> > [    0.268600] CPU: 6 PID: 83 Comm: ksgxd Not tainted 6.0.0-rc2 #382
> > [    0.268603] Hardware name: Dell Inc. XPS 13 9370/0RMYH9, BIOS 1.21.0
> > 07/06/2022
> > [    0.268604] RIP: 0010:ksgxd+0x1b7/0x1d0
> > [    0.268607] Code: ff e9 f2 fe ff ff 48 89 df e8 75 07 0e 00 84 c0 0f
> > 84 c3 fe ff ff 31 ff e8 e6 07 0e 00 84 c0 0f 85 94 fe ff ff e9 af fe ff
> > ff <0f> 0b e9 7f fe ff ff e8 dd 9c 95 00 66 66 2e 0f 1f 84 00 00 00 00
> > [    0.268608] RSP: 0000:ffffb6c7404f3ed8 EFLAGS: 00010287
> > [    0.268610] RAX: ffffb6c740431a10 RBX: ffff8dcd8117b400 RCX:
> > 0000000000000000
> > [    0.268612] RDX: 0000000080000000 RSI: ffffb6c7404319d0 RDI:
> > 00000000ffffffff
> > [    0.268613] RBP: ffff8dcd820a4d80 R08: ffff8dcd820a4180 R09:
> > ffff8dcd820a4180
> > [    0.268614] R10: 0000000000000000 R11: 0000000000000006 R12:
> > ffffb6c74006bce0
> > [    0.268615] R13: ffff8dcd80e63880 R14: ffffffffa8a60f10 R15:
> > 0000000000000000
> > [    0.268616] FS:  0000000000000000(0000) GS:ffff8dcf25580000(0000)
> > knlGS:0000000000000000
> > [    0.268617] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [    0.268619] CR2: 0000000000000000 CR3: 0000000213410001 CR4:
> > 00000000003706e0
> > [    0.268620] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> > 0000000000000000
> > [    0.268621] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
> > 0000000000000400
> > [    0.268622] Call Trace:
> > [    0.268624]  <TASK>
> > [    0.268627]  ? _raw_spin_lock_irqsave+0x24/0x60
> > [    0.268632]  ? _raw_spin_unlock_irqrestore+0x23/0x40
> > [    0.268634]  ? __kthread_parkme+0x36/0x90
> > [    0.268637]  kthread+0xe5/0x110
> > [    0.268639]  ? kthread_complete_and_exit+0x20/0x20
> > [    0.268642]  ret_from_fork+0x1f/0x30
> > [    0.268647]  </TASK>
> > [    0.268648] ---[ end trace 0000000000000000 ]---
> > 
> 
> Are you still planning to trim this?
> 
> > Ultimately this can crash the kernel, if the following is set:
> > 
> > 	/proc/sys/kernel/panic_on_warn
> > 
> > In premature stop, print nothing, as the number is by practical means a
> > random number. Otherwise, it is an indicator of a bug in the driver, and
> > therefore print the number of unsanitized pages with pr_err().
> 
> I think that "print the number of unsanitized pages with pr_err()" 
> contradicts the patch subject of "Do not consider unsanitized pages
> an error".
> 
> ...
> 
> > @@ -388,17 +393,40 @@ void sgx_reclaim_direct(void)
> >  
> >  static int ksgxd(void *p)
> >  {
> > +	long ret;
> > +
> >  	set_freezable();
> >  
> >  	/*
> >  	 * Sanitize pages in order to recover from kexec(). The 2nd pass is
> >  	 * required for SECS pages, whose child pages blocked EREMOVE.
> >  	 */
> > -	__sgx_sanitize_pages(&sgx_dirty_page_list);
> > -	__sgx_sanitize_pages(&sgx_dirty_page_list);
> > +	ret = __sgx_sanitize_pages(&sgx_dirty_page_list);
> > +	if (ret == -ECANCELED)
> > +		/* kthread stopped */
> > +		return 0;
> >  
> > -	/* sanity check: */
> > -	WARN_ON(!list_empty(&sgx_dirty_page_list));
> > +	ret = __sgx_sanitize_pages(&sgx_dirty_page_list);
> > +	switch (ret) {
> > +	case 0:
> > +		/* success, no unsanitized pages */
> > +		break;
> > +
> > +	case -ECANCELED:
> > +		/* kthread stopped */
> > +		return 0;
> > +
> > +	default:
> > +		/*
> > +		 * Never expected to happen in a working driver. If it happens
> > +		 * the bug is expected to be in the sanitization process, but
> > +		 * successfully sanitized pages are still valid and driver can
> > +		 * be used and most importantly debugged without issues. To put
> > +		 * short, the global state of kernel is not corrupted so no
> > +		 * reason to do any more complicated rollback.
> > +		 */
> > +		pr_err("%ld unsanitized pages\n", ret);
> > +	}
> >  
> >  	while (!kthread_should_stop()) {
> >  		if (try_to_freeze())
> 
> 
> I think I am missing something here. A lot of logic is added here but I
> do not see why it is necessary.  ksgxd() knows via kthread_should_stop() if
> the reclaimer was canceled. I am thus wondering, could the above not be
> simplified to something similar to V1:
> 
> @@ -388,6 +393,8 @@ void sgx_reclaim_direct(void)
>  
>  static int ksgxd(void *p)
>  {
> +	unsigned long left_dirty;
> +
>  	set_freezable();
>  
>  	/*
> @@ -395,10 +402,10 @@ static int ksgxd(void *p)
>  	 * required for SECS pages, whose child pages blocked EREMOVE.
>  	 */
>  	__sgx_sanitize_pages(&sgx_dirty_page_list);

IMHO, it would also make sense to have here:

        if (kthread_should_stop())
                return 0;

> -	__sgx_sanitize_pages(&sgx_dirty_page_list);
>  
> -	/* sanity check: */
> -	WARN_ON(!list_empty(&sgx_dirty_page_list));
> +	left_dirty = __sgx_sanitize_pages(&sgx_dirty_page_list);
> +	if (left_dirty && !kthread_should_stop())
> +		pr_err("%lu unsanitized pages\n", left_dirty);

That would be incorrect if the function returned
because the kthread was stopped.

If you do the check here, you already have a window
where the kthread could have been stopped anyhow.

So even this would be less correct:

        if (kthread_should_stop()) {
                return 0;
        } else if (left_dirty) {
                pr_err("%lu unsanitized pages\n", left_dirty);
        }

So in the end you end up with an equally complicated and
less correct fix. This is all explained in the commit message.

If you print the error unconditionally, the number of
unsanitized pages has no meaning.

BR, Jarkko
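
Note: the window discussed in this reply can be modeled in userspace.
The following is a minimal sketch in C, not kernel code. The names
sanitize_model(), classify_by_return(), classify_by_flag() and
enum outcome are hypothetical, and a plain boolean stands in for
kthread_should_stop(). It assumes, as in the quoted v2 patch, that the
sanitizer returns -ECANCELED when it was stopped and the number of
unsanitized pages otherwise.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical labels for what ksgxd concludes after the second pass. */
enum outcome { OUTCOME_CLEAN, OUTCOME_STOPPED, OUTCOME_BUG };

/*
 * Hypothetical model of the patched sanitizer: walk `dirty` pages, of
 * which `stuck` cannot be sanitized. If a stop request is observed at
 * page index `stop_at` (use -1 for "never"), bail out with -ECANCELED.
 * Otherwise return the number of pages left unsanitized.
 */
long sanitize_model(long dirty, long stuck, long stop_at)
{
	assert(stuck >= 0 && stuck <= dirty);

	for (long i = 0; i < dirty; i++) {
		if (i == stop_at)
			return -ECANCELED; /* stopped in the middle of the walk */
	}

	return stuck;
}

/* v2 approach: the return value itself says why the sanitizer returned. */
enum outcome classify_by_return(long ret)
{
	if (ret == -ECANCELED)
		return OUTCOME_STOPPED;

	return ret ? OUTCOME_BUG : OUTCOME_CLEAN;
}

/*
 * Alternative approach: sample the stop flag after the sanitizer has
 * returned. `stop_seen_later` models a stop request that may have
 * arrived in the window between the return and the check.
 */
enum outcome classify_by_flag(long left_dirty, bool stop_seen_later)
{
	if (stop_seen_later)
		return OUTCOME_STOPPED;

	return left_dirty ? OUTCOME_BUG : OUTCOME_CLEAN;
}
```

In this model, a full walk that leaves three pages unsanitized is
reported as OUTCOME_BUG by classify_by_return(3). If a stop request
arrives just after the walk finishes, classify_by_flag(3, true) reports
OUTCOME_STOPPED instead and the count is never printed.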
