* Fwd: Spurious SIGBUS when threads race to insert a DAX page
From: Christopher Hodgkins @ 2022-03-14 20:04 UTC (permalink / raw)
  To: linux-fsdevel

NOTE: This question is about kernel 4.15. All line numbers and symbol
names correspond to the Git source at tag v4.15.

Hi all,
I've been running some benchmarks using ext4 files on PMEM (first-gen
Intel Optane) as "anonymous" memory, and I've run into a weird error.
For reference, the way this works is that we have a runtime that at
startup `fallocate`s a large PMEM-backed file and maps the whole thing
R/W with MAP_SYNC, and then it interposes on calls to `mmap` in
userspace to return page-sized chunks of PMEM when anonymous memory is
requested.
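
For concreteness, the setup step looks roughly like the sketch below. It
is a simplified illustration, not the actual runtime: `map_pool` and its
`allow_fallback` parameter are hypothetical names, `posix_fallocate` is
used as a portable stand-in for `fallocate`, and the plain MAP_SHARED
fallback exists only so the sketch also runs on a non-DAX filesystem.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Older libc headers may lack these; values match the Linux UAPI headers. */
#ifndef MAP_SHARED_VALIDATE
#define MAP_SHARED_VALIDATE 0x03
#endif
#ifndef MAP_SYNC
#define MAP_SYNC 0x80000
#endif

/*
 * Hypothetical helper: preallocate `len` bytes in the file at `path`
 * and map the whole range read/write. MAP_SYNC is tried first; it is
 * only accepted on a DAX-capable mount. If `allow_fallback` is nonzero
 * and MAP_SYNC is refused, a plain MAP_SHARED mapping is used instead
 * (an assumption of this sketch, so it can run without PMEM).
 * Returns the mapping, or MAP_FAILED on error.
 */
static void *map_pool(const char *path, size_t len, int allow_fallback)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return MAP_FAILED;

    /* Preallocate the file's blocks; returns 0 on success. */
    if (posix_fallocate(fd, 0, (off_t)len) != 0) {
        close(fd);
        return MAP_FAILED;
    }

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);
    if (p == MAP_FAILED && allow_fallback)
        p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    close(fd); /* the mapping keeps the file referenced */
    return p;
}
```

The interposed `mmap` then hands out page-sized pieces of the returned
range; that part is omitted here.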

The error I have encountered is the nondeterministic delivery of
SIGBUS on the first access to an untouched page of the mapped region
(which, since the file is handed to the application sequentially,
is also typically the first uninitialized extent in the file at the
time of the crash). The accesses are aligned and within a mapped
region according
to smaps, which eliminates the only documented reasons for delivery of
SIGBUS that I'm aware of.

I did a bit of digging with FTrace, and the course of events at a
crash seems to be as follows. Multiple (>2) threads start faulting in
the page, and go through the "synchronous page fault" path. They all
return error-free from the fdatasync() call at dax.c:1588 and call
dax_insert_pfn_mkwrite. The first thread to exit that function returns
NOPAGE (success) and the others all return SIGBUS, and each raises the
userspace signal on the return path.
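
The access pattern that leads to this sequence can be sketched as
several threads storing to the same never-touched page at nearly the
same moment. The code below is a minimal illustration under stated
assumptions, not the actual benchmark: `RACERS`, `struct race`, `racer`
and `race_first_touch` are hypothetical names, the thread count of 8 is
arbitrary (the trace only shows "more than two"), and `page` is assumed
to point into the MAP_SYNC mapping.

```c
#include <pthread.h>
#include <stdatomic.h>

#define RACERS 8 /* hypothetical thread count */

struct race {
    atomic_int go;       /* set to 1 to release all racers together */
    _Atomic char *page;  /* first byte of a page nobody has touched */
};

static void *racer(void *arg)
{
    struct race *r = arg;

    /* Spin until released so the stores land as close together as possible. */
    while (!atomic_load(&r->go))
        ;
    /* First write to the page: threads arriving before the PTE is
     * installed each take their own write fault. */
    atomic_store_explicit(r->page, 1, memory_order_relaxed);
    return NULL;
}

/*
 * Start RACERS threads that all store to `page`. Returns 0 if every
 * thread was started and joined, -1 if thread creation failed. On the
 * affected setup the process may instead be killed by SIGBUS.
 */
static int race_first_touch(void *page)
{
    struct race r;
    pthread_t tid[RACERS];
    int started = 0;

    atomic_init(&r.go, 0);
    r.page = page;

    while (started < RACERS &&
           pthread_create(&tid[started], NULL, racer, &r) == 0)
        started++;

    atomic_store(&r.go, 1); /* release whoever was started */
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);

    return started == RACERS ? 0 : -1;
}
```

Calling `race_first_touch` once per fresh page of the mapping, with
`-pthread` at build time, is how this would be driven; since the failure
is nondeterministic, many pages may pass before one triggers it.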

My best guess for why this occurs is that the unsuccessful calls all
bounce with EBUSY (because of the successful one?) in insert_pfn
(which tails into the call to vm_insert_mixed_mkwrite at dax.c:1548),
and then dax_fault_return maps that to SIGBUS. The signal is
definitely spurious -- as mentioned, one of the threads returns
success, and if I catch the signal with GDB, the faulting access can
be successfully performed after the signal is caught. Also, as
mentioned above, the error is nondeterministic -- it happens maybe one
out of every five runs. To clarify some other things that could make a
difference, the pages are normal-sized (not huge) and the SIGBUS isn't
due to PMEM failure (i.e. HWPOISON).
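
The suspected error mapping can be modelled in userspace as below. This
is a simplified sketch of the guess above, not kernel code: the enum
values are illustrative stand-ins for the kernel's VM_FAULT_* codes, and
both function names are hypothetical. The second function shows one
conceivable handling (treating -EBUSY as "another thread already
installed the PTE"); it is an assumption for illustration, not a known
upstream fix.

```c
#include <errno.h>

/* Illustrative stand-ins for VM_FAULT_NOPAGE, VM_FAULT_OOM, VM_FAULT_SIGBUS. */
enum fault_result { FAULT_NOPAGE, FAULT_OOM, FAULT_SIGBUS };

/*
 * Model of the mapping as described: 0 means the PTE was installed,
 * -ENOMEM becomes an OOM result, and every other error, including
 * -EBUSY from a lost insertion race, becomes SIGBUS.
 */
static enum fault_result model_fault_return(int error)
{
    if (error == 0)
        return FAULT_NOPAGE;
    if (error == -ENOMEM)
        return FAULT_OOM;
    return FAULT_SIGBUS;
}

/*
 * Hypothetical variant: a lost race (-EBUSY) is not an error from the
 * faulting thread's point of view, because the page is mapped by the
 * time it retries the access.
 */
static enum fault_result model_fault_return_tolerant(int error)
{
    if (error == -EBUSY)
        error = 0;
    return model_fault_return(error);
}
```

Under this model, the threads that lose the race get `FAULT_SIGBUS` from
the first function even though the page ends up mapped, which matches
the observation that the faulting access succeeds when retried.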

I'm on an old kernel (4.15) so if this is really an error in the
kernel code it may be fixed on the current series. If that's the case,
just point me to a patch or release number where it was fixed and I'll
be happy. It may also be an error in my code -- I will be less happy
in that case, but please still point it out or ask questions for
clarification if you think I'm doing something wrong to cause this.

Thanks,
George Hodgkins


* Re: Fwd: Spurious SIGBUS when threads race to insert a DAX page
From: Dave Chinner @ 2022-03-14 21:00 UTC (permalink / raw)
  To: Christopher Hodgkins; +Cc: linux-fsdevel

On Mon, Mar 14, 2022 at 02:04:35PM -0600, Christopher Hodgkins wrote:
> NOTE: This question is about kernel 4.15. All line numbers and symbol
> names correspond to the Git source at tag v4.15.
> 
> Hi all,
> I've been running some benchmarks using ext4 files on PMEM (first-gen
> Intel Optane) as "anonymous" memory, and I've run into a weird error.
> For reference, the way this works is that we have a runtime that at
> startup `fallocate`s a large PMEM-backed file and maps the whole thing
> R/W with MAP_SYNC, and then it interposes on calls to `mmap` in
> userspace to return page-sized chunks of PMEM when anonymous memory is
> requested.
> 
> The error I have encountered is the nondeterministic delivery of
> SIGBUS on the first access to an untouched page of the mapped region
> (which since the file is passed to the application sequentially, is
> also typically the first uninitialized extent in the file at time of
> crash). The accesses are aligned and within a mapped region according
> to smaps, which eliminates the only documented reasons for delivery of
> SIGBUS that I'm aware of.

First thing to check is whether it occurs with XFS+DAX on that
kernel. That will tell you if it's an infrastructure or ext4
problem.

Second thing to do is to test a current kernel (5.17-rc8) to see if
the problem still reproduces, i.e. determine whether it has actually
been fixed or not.

If it reproduces on a current kernel, then update the bug report
with all that information and post the code that reproduces the
problem so we can look at it in more detail.

> I did a bit of digging with FTrace, and the course of events at a
> crash seems to be as follows. Multiple (>2) threads start faulting in
> the page, and go through the "synchronous page fault" path. They all
> return error-free from the fdatasync() call at dax.c:1588 and call
> dax_insert_pfn_mkwrite. The first thread to exit that function returns
> NOPAGE (success) and the others all return SIGBUS, and each raises the
> userspace signal on the return path.
> 
> My best guess for why this occurs is that the unsuccessful calls all
> bounce with EBUSY (because of the successful one?) in insert_pfn
> (which tails into the call to vm_insert_mixed_mkwrite at dax.c:1548),
> and then dax_fault_return maps that to SIGBUS. The signal is
> definitely spurious -- as mentioned, one of the threads returns
> success, and if I catch the signal with GDB, the faulting access can
> be successfully performed after the signal is caught. Also, as
> mentioned above, the error is nondeterministic -- it happens maybe one
> out of every five runs. To clarify some other things that could make a
> difference, the pages are normal-sized (not huge) and the SIGBUS isn't
> due to PMEM failure (ie HWPOISON).
> 
> I'm on an old kernel (4.15) so if this is really an error in the
> kernel code it may be fixed on the current series. If that's the case,
> just point me to a patch or release number where it was fixed and I'll
> be happy.

git bisect is your friend, and it doesn't require any upstream
developer time for you to run the bisect and determine where it was
fixed...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
