All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: [Devel] ref-counting races?
@ 2013-03-24 11:10 Andriy Gapon
  0 siblings, 0 replies; 18+ messages in thread
From: Andriy Gapon @ 2013-03-24 11:10 UTC (permalink / raw)
  To: devel

[-- Attachment #1: Type: text/plain, Size: 2933 bytes --]

on 22/03/2013 06:28 Zheng, Lv said the following:
>> 1. Ensure that AcpiGetObjectInfo/AcpiUtExecute_HID/etc are never executed
>> concurrently on the same object at the OS level code (primarily in the battery
>> driver).
>>
>> 2. Ensure that AcpiUtRemoveReference is always invoked under some lock.
>> Not sure
>> if it should be the Interpreter mutex or the Namespace mutex.  I believe that
>> AcpiUtAddReference are always consistently called under a lock, but I haven't
>> verified all the code paths.  On the other hand, it is obvious that there are
>> many
>> places where AcpiUtRemoveReference is called without any protection, but it is
>> hard to conclusively enumerate _all_ of them.
> 
> Hi, Andriy, your codes are using AcpiOsAcquireMutext to achieve this goal, it does not look like correct.
>

Perhaps it's not perfect/appropriate, but I am sure that it is not incorrect.
In either case, I support using a spin-lock here.

>> 3. Introduce locking to AcpiUtUpdateRefCount to ensure that ReferenceCount
>> is
>> never modified concurrently.  Essentially this should give an equivalence of
>> atomicity for ReferenceCount manipulations.
>> Additionally, it would be a good idea to make ReferenceCount == 0 check more
>> strict (if not fatal).  Also, unused and dangerous REF_FORCE_DELETE should
>> be removed.
> 
> It does not look good and sufficient.
> We need to add a "never-return error" mutex in OSL first then do many tests to see if there are regressions.

Well, I think that possibilities for ACPICA mutex to return an error are very
well enumerated in the ACPICA reference manual.  So, if there is no timeout
specified and the lock parameter is definitely not NULL, then we should expect
that mutex lock and unlock never fail.

> There might be regressions if dead lock happens.

I do not see any possibility for a deadlock with the proposed new lock, because
it is a leaf lock (it spans only basic operations and no function calls).

>> I like #3 the best because it provides the most guarantees while requiring the
>> least code inspection/verification efforts.
>> Of course, you know the code much better, so your assessment of the problem
>> and
>> possible solutions could be very different.
>>
>> Here is a quickly hacked together prototype patch that seems to have helped
>> the user:
>> http://people.freebsd.org/~avg/acpi-ref-count-2.diff
> 
> The code looks broken on Linux, it might be broken on freebsd. Isn't it?

It is not broken on FreeBSD.
What is broken on Linux?  If you mean the panic calls, then I've already said
that they are FreeBSD-specific and are only there to mark dangerous (fatal,
even) conditions that the current code simply ignores.

> Why don't we consider to have an AcpiAtomic implementation in every OSL?

AcpiAtomic could be useful for other things, but I do not see it being
_required_ here.

-- 
Andriy Gapon

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Devel] ref-counting races?
@ 2013-03-24 10:54 Andriy Gapon
  0 siblings, 0 replies; 18+ messages in thread
From: Andriy Gapon @ 2013-03-24 10:54 UTC (permalink / raw)
  To: devel

[-- Attachment #1: Type: text/plain, Size: 1946 bytes --]

on 22/03/2013 22:23 Moore, Robert said the following:
> Andriy,
> 
> This will of course take a bit of time for us to completely sort out. You are
> correct that there are unlocked calls to the reference count mechanism. This
> could be a result of some code restructuring sometime in the past, as I hope
> that this didn't happen as an oversight.
> 
> I want to investigate some older code as well as the original design of the
> reference counts.

Thank you!

> If we do need a separate lock for the reference count mechanism, perhaps a
> spin lock could be used instead of a mutex, as there is nothing that will
> block for any length of time. This may not make any difference on FreeBSD as
> spinlocks may not be available, I'm not sure.

I agree that a spinlock would be more appropriate.  Just in case, FreeBSD does
provide ACPICA spinlock implementation.

> In the meantime, I have removed the REF_FORCE_DELETE option as per your
> suggestion; it was implemented in case it was ever needed, which it has not.

Thank you again.

Do you perhaps have an advice on an interim solution for FreeBSD users who seem
to keep bumping into this issue?
I will have a look into trying to prevent concurrency at the battery driver
level.  To, at least, reduce the chances of hitting the problem.

In fact, I'll ask the user who helped me, kron24(a)gmail.com, to test the
following (rather coarse) patch:
--- a/sys/dev/acpica/acpi_battery.c
+++ b/sys/dev/acpica/acpi_battery.c
@@ -360,6 +360,8 @@ acpi_battery_ioctl(u_long cmd, caddr_t addr, void *arg)
     int error, unit;
     device_t dev;

+    mtx_lock(&Giant);
+
     /* For commands that use the ioctl_arg struct, validate it first. */
     error = ENXIO;
     unit = 0;
@@ -417,6 +419,7 @@ acpi_battery_ioctl(u_long cmd, caddr_t addr, void *arg)
 	error = EINVAL;
     }

+    mtx_unlock(&Giant);
     return (error);
 }


-- 
Andriy Gapon

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Devel] ref-counting races?
@ 2013-03-22 20:23 Moore, Robert
  0 siblings, 0 replies; 18+ messages in thread
From: Moore, Robert @ 2013-03-22 20:23 UTC (permalink / raw)
  To: devel

[-- Attachment #1: Type: text/plain, Size: 7224 bytes --]

Andriy,

This will of course take a bit of time for us to completely sort out. You are correct that there are unlocked calls to the reference count mechanism. This could be a result of some code restructuring sometime in the past, as I hope that this didn't happen as an oversight.

I want to investigate some older code as well as the original design of the reference counts.

If we do need a separate lock for the reference count mechanism, perhaps a spin lock could be used instead of a mutex, as there is nothing that will block for any length of time. This may not make any difference on FreeBSD as spinlocks may not be available, I'm not sure.

In the meantime, I have removed the REF_FORCE_DELETE option as per your suggestion; it was implemented in case it was ever needed, which it has not.

Thanks,
Bob


> -----Original Message-----
> From: Andriy Gapon [mailto:avg(a)FreeBSD.org]
> Sent: Thursday, March 21, 2013 11:08 AM
> To: Moore, Robert
> Cc: devel(a)acpica.org; Zheng, Lv
> Subject: Re: [Devel] ref-counting races?
> 
> on 21/11/2012 17:18 Moore, Robert said the following:
> >> -----Original Message-----
> >> From: Andriy Gapon [mailto:avg(a)FreeBSD.org]
> [snip]
> >> AcpiUtUpdateRefCount handling of REF_DECREMENT && Count < 1 looks
> >> worrying.
> >> Not sure if that ever happens in practice, but since the code exists...
> >> The case is being treated as a minor event (judging from the
> >> debugging print), but we are already lucky if we see Count == 0 there
> >> instead of some garbage or just plain crashing because the memory is
> already freed.
> >> Nevertheless we discard even that luck by happily continuing into
> >> potentially a repeated call to AcpiUtDeleteInternalObj.
> >
> > This code appears in order to essentially catch a case where an
> > attempt is made to delete an object twice. Admittedly, it doesn't
> > serve much purpose these days and should probably be removed. In
> > practice, we never, ever hit the (Count < 1) case that I know of. In
> > other words, I don't know of any cases where the acpica code ever
> attempts to delete an object twice.
> 
> With a help from a skilled user we've been able to catch a likely culprit
> of the FreeBSD + ACPICA heisenbugs, which all showed some relation to the
> FreeBSD battery driver.
> 
> Probably unlike other OSes FreeBSD liberally uses AcpiGetObjectInfo in the
> code that handles battery status queries from userland.  What's more,
> FreeBSD allows such queries to run in parallel and, thus,
> AcpiGetObjectInfo may be executed concurrently by more than one thread.
> In turn, AcpiGetObjectInfo calls functions like AcpiUtExecute_HID,
> AcpiUtExecute_UID, etc.  These functions are called without holding any
> lock and thus, for example, AcpiUtExecute_HID may run in parallel.
> 
> The above mentioned functions all follow the same template:
> {
>     Status = AcpiUtEvaluateObject (DeviceNode, METHOD_NAME__HID,
>                 ACPI_BTYPE_INTEGER | ACPI_BTYPE_STRING, &ObjDesc); ... do
> useful work ...
> 
> Cleanup:
> 
>     /* On exit, we must delete the return object */
> 
>     AcpiUtRemoveReference (ObjDesc);
>     return_ACPI_STATUS (Status);
> }
> 
> 
> It can be clearly seen that AcpiUtRemoveReference is called without
> protection of any lock.  Neither of AcpiUtRemoveReference ->
> AcpiUtUpdateObjectReference -> AcpiUtUpdateRefCount use any lock.
> 
> So, in the above code path ReferenceCount is decremented without holding
> any lock.
> Given that the decrement is performed in "highly" non-atomic fashion, it
> is open to races.
> 
> E.g. let's assume that we have some like this in ASL:
> Name (_HID, EisaId ("PNP0C0E"))
> 
> Creation of the _HID node would create a value object with reference count
> of 1.
> Let's assume that thread T1 executes AcpiUtExecute_HID.
> Then AcpiUtEvaluateObject would increment the count to 2.
> Let's assume that thread T2 then concurrently executes AcpiUtExecute_HID.
> It is possible that T1 and T2 would then concurrently call
> AcpiUtUpdateRefCount:
> T1 with REF_DECREMENT (via AcpiUtRemoveReference) and T2 with
> REF_INCREMENT (via AcpiUtEvaluateObject).
> Both would see the same reference count of 2 as a starting value.
> If T2 is quicker to write 3 into ReferenceCount, then T1 would overwrite
> it with 1.
> Then, when T2 calls AcpiUtRemoveReference, it would decrement
> ReferenceCount to zero and thus the value object would be freed while the
> node still has a pointer to its address.
> 
> Subsequent accesses to the _HID node would be accessing freed memory.
> 
> This description is very consistent with the problem reports from users.
> Taking the most recent example:
> http://thread.gmane.org/gmane.os.freebsd.devel.acpi/7707
> 
> The panic was preceded by the following messages:
> ACPI Error: No object attached to node 0xfffffe00094a51c0
> (20110527/exresnte-138) ACPI Error: Method execution failed
> [\_SB_.BAT0._UID] (Node 0xfffffe00094a51c0), AE_AML_NO_OPERAND
> (20110527/uteval-113)
> 
> The panic itself is consistent with corruption of ACPICA object cache:
> http://thread.gmane.org/gmane.os.freebsd.devel.acpi/7707/focus=7730
> 
> All the reports (some of which I mentioned at the start of this thread)
> follow the same pattern: a panic in object cache code, frequently preceded
> by various error messages related to battery device _UID/_HID/etc (such as
> missing object or wrong object type, etc).
> 
> I see several possibilities of fixing this problem.
> From worst to best:
> 
> 1. Ensure that AcpiGetObjectInfo/AcpiUtExecute_HID/etc are never executed
> concurrently on the same object at the OS level code (primarily in the
> battery driver).
> 
> 2. Ensure that AcpiUtRemoveReference is always invoked under some lock.
> Not sure if it should be the Interpreter mutex or the Namespace mutex.  I
> believe that AcpiUtAddReference are always consistently called under a
> lock, but I haven't verified all the code paths.  On the other hand, it is
> obvious that there are many places where AcpiUtRemoveReference is called
> without any protection, but it is hard to conclusively enumerate _all_ of
> them.
> 
> 3. Introduce locking to AcpiUtUpdateRefCount to ensure that ReferenceCount
> is never modified concurrently.  Essentially this should give an
> equivalence of atomicity for ReferenceCount manipulations.
> Additionally, it would be a good idea to make ReferenceCount == 0 check
> more strict (if not fatal).  Also, unused and dangerous REF_FORCE_DELETE
> should be removed.
> 
> I like #3 the best because it provides the most guarantees while requiring
> the least code inspection/verification efforts.
> Of course, you know the code much better, so your assessment of the
> problem and possible solutions could be very different.
> 
> Here is a quickly hacked together prototype patch that seems to have
> helped the user:
> http://people.freebsd.org/~avg/acpi-ref-count-2.diff
> 
> The panic calls are to demonstrate various dangerous places in the current
> code.
> They were actually hit in testing before the locking was added.
> 
> --
> Andriy Gapon

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Devel] ref-counting races?
@ 2013-03-22  4:28 Zheng, Lv
  0 siblings, 0 replies; 18+ messages in thread
From: Zheng, Lv @ 2013-03-22  4:28 UTC (permalink / raw)
  To: devel

[-- Attachment #1: Type: text/plain, Size: 1938 bytes --]

> 1. Ensure that AcpiGetObjectInfo/AcpiUtExecute_HID/etc are never executed
> concurrently on the same object at the OS level code (primarily in the battery
> driver).
> 
> 2. Ensure that AcpiUtRemoveReference is always invoked under some lock.
> Not sure
> if it should be the Interpreter mutex or the Namespace mutex.  I believe that
> AcpiUtAddReference are always consistently called under a lock, but I haven't
> verified all the code paths.  On the other hand, it is obvious that there are
> many
> places where AcpiUtRemoveReference is called without any protection, but it is
> hard to conclusively enumerate _all_ of them.

Hi, Andriy, your codes are using AcpiOsAcquireMutext to achieve this goal, it does not look like correct.

> 3. Introduce locking to AcpiUtUpdateRefCount to ensure that ReferenceCount
> is
> never modified concurrently.  Essentially this should give an equivalence of
> atomicity for ReferenceCount manipulations.
> Additionally, it would be a good idea to make ReferenceCount == 0 check more
> strict (if not fatal).  Also, unused and dangerous REF_FORCE_DELETE should
> be removed.

It does not look good and sufficient.
We need to add a "never-return error" mutex in OSL first then do many tests to see if there are regressions.
There might be regressions if dead lock happens.

> I like #3 the best because it provides the most guarantees while requiring the
> least code inspection/verification efforts.
> Of course, you know the code much better, so your assessment of the problem
> and
> possible solutions could be very different.
> 
> Here is a quickly hacked together prototype patch that seems to have helped
> the user:
> http://people.freebsd.org/~avg/acpi-ref-count-2.diff

The code looks broken on Linux, it might be broken on freebsd. Isn't it?
Why don't we consider to have an AcpiAtomic implementation in every OSL?

Thanks and best regards
-Lv

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Devel] ref-counting races?
@ 2013-03-21 19:46 Moore, Robert
  0 siblings, 0 replies; 18+ messages in thread
From: Moore, Robert @ 2013-03-21 19:46 UTC (permalink / raw)
  To: devel

[-- Attachment #1: Type: text/plain, Size: 6481 bytes --]

Andriy,

Thanks for your analysis. We'll take a look and get back to you.
Bob


> -----Original Message-----
> From: Andriy Gapon [mailto:avg(a)FreeBSD.org]
> Sent: Thursday, March 21, 2013 11:08 AM
> To: Moore, Robert
> Cc: devel(a)acpica.org; Zheng, Lv
> Subject: Re: [Devel] ref-counting races?
> 
> on 21/11/2012 17:18 Moore, Robert said the following:
> >> -----Original Message-----
> >> From: Andriy Gapon [mailto:avg(a)FreeBSD.org]
> [snip]
> >> AcpiUtUpdateRefCount handling of REF_DECREMENT && Count < 1 looks
> >> worrying.
> >> Not sure if that ever happens in practice, but since the code exists...
> >> The case is being treated as a minor event (judging from the
> >> debugging print), but we are already lucky if we see Count == 0 there
> >> instead of some garbage or just plain crashing because the memory is
> already freed.
> >> Nevertheless we discard even that luck by happily continuing into
> >> potentially a repeated call to AcpiUtDeleteInternalObj.
> >
> > This code appears in order to essentially catch a case where an
> > attempt is made to delete an object twice. Admittedly, it doesn't
> > serve much purpose these days and should probably be removed. In
> > practice, we never, ever hit the (Count < 1) case that I know of. In
> > other words, I don't know of any cases where the acpica code ever
> attempts to delete an object twice.
> 
> With a help from a skilled user we've been able to catch a likely culprit
> of the FreeBSD + ACPICA heisenbugs, which all showed some relation to the
> FreeBSD battery driver.
> 
> Probably unlike other OSes FreeBSD liberally uses AcpiGetObjectInfo in the
> code that handles battery status queries from userland.  What's more,
> FreeBSD allows such queries to run in parallel and, thus,
> AcpiGetObjectInfo may be executed concurrently by more than one thread.
> In turn, AcpiGetObjectInfo calls functions like AcpiUtExecute_HID,
> AcpiUtExecute_UID, etc.  These functions are called without holding any
> lock and thus, for example, AcpiUtExecute_HID may run in parallel.
> 
> The above mentioned functions all follow the same template:
> {
>     Status = AcpiUtEvaluateObject (DeviceNode, METHOD_NAME__HID,
>                 ACPI_BTYPE_INTEGER | ACPI_BTYPE_STRING, &ObjDesc); ... do
> useful work ...
> 
> Cleanup:
> 
>     /* On exit, we must delete the return object */
> 
>     AcpiUtRemoveReference (ObjDesc);
>     return_ACPI_STATUS (Status);
> }
> 
> 
> It can be clearly seen that AcpiUtRemoveReference is called without
> protection of any lock.  Neither of AcpiUtRemoveReference ->
> AcpiUtUpdateObjectReference -> AcpiUtUpdateRefCount use any lock.
> 
> So, in the above code path ReferenceCount is decremented without holding
> any lock.
> Given that the decrement is performed in "highly" non-atomic fashion, it
> is open to races.
> 
> E.g. let's assume that we have some like this in ASL:
> Name (_HID, EisaId ("PNP0C0E"))
> 
> Creation of the _HID node would create a value object with reference count
> of 1.
> Let's assume that thread T1 executes AcpiUtExecute_HID.
> Then AcpiUtEvaluateObject would increment the count to 2.
> Let's assume that thread T2 then concurrently executes AcpiUtExecute_HID.
> It is possible that T1 and T2 would then concurrently call
> AcpiUtUpdateRefCount:
> T1 with REF_DECREMENT (via AcpiUtRemoveReference) and T2 with
> REF_INCREMENT (via AcpiUtEvaluateObject).
> Both would see the same reference count of 2 as a starting value.
> If T2 is quicker to write 3 into ReferenceCount, then T1 would overwrite
> it with 1.
> Then, when T2 calls AcpiUtRemoveReference, it would decrement
> ReferenceCount to zero and thus the value object would be freed while the
> node still has a pointer to its address.
> 
> Subsequent accesses to the _HID node would be accessing freed memory.
> 
> This description is very consistent with the problem reports from users.
> Taking the most recent example:
> http://thread.gmane.org/gmane.os.freebsd.devel.acpi/7707
> 
> The panic was preceded by the following messages:
> ACPI Error: No object attached to node 0xfffffe00094a51c0
> (20110527/exresnte-138) ACPI Error: Method execution failed
> [\_SB_.BAT0._UID] (Node 0xfffffe00094a51c0), AE_AML_NO_OPERAND
> (20110527/uteval-113)
> 
> The panic itself is consistent with corruption of ACPICA object cache:
> http://thread.gmane.org/gmane.os.freebsd.devel.acpi/7707/focus=7730
> 
> All the reports (some of which I mentioned at the start of this thread)
> follow the same pattern: a panic in object cache code, frequently preceded
> by various error messages related to battery device _UID/_HID/etc (such as
> missing object or wrong object type, etc).
> 
> I see several possibilities of fixing this problem.
> From worst to best:
> 
> 1. Ensure that AcpiGetObjectInfo/AcpiUtExecute_HID/etc are never executed
> concurrently on the same object at the OS level code (primarily in the
> battery driver).
> 
> 2. Ensure that AcpiUtRemoveReference is always invoked under some lock.
> Not sure if it should be the Interpreter mutex or the Namespace mutex.  I
> believe that AcpiUtAddReference are always consistently called under a
> lock, but I haven't verified all the code paths.  On the other hand, it is
> obvious that there are many places where AcpiUtRemoveReference is called
> without any protection, but it is hard to conclusively enumerate _all_ of
> them.
> 
> 3. Introduce locking to AcpiUtUpdateRefCount to ensure that ReferenceCount
> is never modified concurrently.  Essentially this should give an
> equivalence of atomicity for ReferenceCount manipulations.
> Additionally, it would be a good idea to make ReferenceCount == 0 check
> more strict (if not fatal).  Also, unused and dangerous REF_FORCE_DELETE
> should be removed.
> 
> I like #3 the best because it provides the most guarantees while requiring
> the least code inspection/verification efforts.
> Of course, you know the code much better, so your assessment of the
> problem and possible solutions could be very different.
> 
> Here is a quickly hacked together prototype patch that seems to have
> helped the user:
> http://people.freebsd.org/~avg/acpi-ref-count-2.diff
> 
> The panic calls are to demonstrate various dangerous places in the current
> code.
> They were actually hit in testing before the locking was added.
> 
> --
> Andriy Gapon

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Devel] ref-counting races?
@ 2013-03-21 18:07 Andriy Gapon
  0 siblings, 0 replies; 18+ messages in thread
From: Andriy Gapon @ 2013-03-21 18:07 UTC (permalink / raw)
  To: devel

[-- Attachment #1: Type: text/plain, Size: 5893 bytes --]

on 21/11/2012 17:18 Moore, Robert said the following:
>> -----Original Message-----
>> From: Andriy Gapon [mailto:avg(a)FreeBSD.org]
[snip]
>> AcpiUtUpdateRefCount handling of REF_DECREMENT && Count < 1 looks
>> worrying.
>> Not sure if that ever happens in practice, but since the code exists...
>> The case is being treated as a minor event (judging from the debugging
>> print), but we are already lucky if we see Count == 0 there instead of
>> some garbage or just plain crashing because the memory is already freed.
>> Nevertheless we discard even that luck by happily continuing into
>> potentially a repeated call to AcpiUtDeleteInternalObj.
> 
> This code appears in order to essentially catch a case where an attempt is made to
> delete an object twice. Admittedly, it doesn't serve much purpose these days and
> should probably be removed. In practice, we never, ever hit the (Count < 1) case
> that I know of. In other words, I don't know of any cases where the acpica code
> ever attempts to delete an object twice.

With a help from a skilled user we've been able to catch a likely culprit of the
FreeBSD + ACPICA heisenbugs, which all showed some relation to the FreeBSD battery
driver.

Probably unlike other OSes FreeBSD liberally uses AcpiGetObjectInfo in the code
that handles battery status queries from userland.  What's more, FreeBSD allows
such queries to run in parallel and, thus, AcpiGetObjectInfo may be executed
concurrently by more than one thread.  In turn, AcpiGetObjectInfo calls functions
like AcpiUtExecute_HID, AcpiUtExecute_UID, etc.  These functions are called
without holding any lock and thus, for example, AcpiUtExecute_HID may run in parallel.

The above mentioned functions all follow the same template:
{
    Status = AcpiUtEvaluateObject (DeviceNode, METHOD_NAME__HID,
                ACPI_BTYPE_INTEGER | ACPI_BTYPE_STRING, &ObjDesc);
... do useful work ...

Cleanup:

    /* On exit, we must delete the return object */

    AcpiUtRemoveReference (ObjDesc);
    return_ACPI_STATUS (Status);
}


It can be clearly seen that AcpiUtRemoveReference is called without protection of
any lock.  Neither of AcpiUtRemoveReference -> AcpiUtUpdateObjectReference ->
AcpiUtUpdateRefCount use any lock.

So, in the above code path ReferenceCount is decremented without holding any lock.
Given that the decrement is performed in "highly" non-atomic fashion, it is open
to races.

E.g. let's assume that we have some like this in ASL:
Name (_HID, EisaId ("PNP0C0E"))

Creation of the _HID node would create a value object with reference count of 1.
Let's assume that thread T1 executes AcpiUtExecute_HID.
Then AcpiUtEvaluateObject would increment the count to 2.
Let's assume that thread T2 then concurrently executes AcpiUtExecute_HID.
It is possible that T1 and T2 would then concurrently call AcpiUtUpdateRefCount:
T1 with REF_DECREMENT (via AcpiUtRemoveReference) and T2 with REF_INCREMENT (via
AcpiUtEvaluateObject).
Both would see the same reference count of 2 as a starting value.
If T2 is quicker to write 3 into ReferenceCount, then T1 would overwrite it with 1.
Then, when T2 calls AcpiUtRemoveReference, it would decrement ReferenceCount to
zero and thus the value object would be freed while the node still has a pointer
to its address.

Subsequent accesses to the _HID node would be accessing freed memory.

This description is very consistent with the problem reports from users.
Taking the most recent example:
http://thread.gmane.org/gmane.os.freebsd.devel.acpi/7707

The panic was preceded by the following messages:
ACPI Error: No object attached to node 0xfffffe00094a51c0 (20110527/exresnte-138)
ACPI Error: Method execution failed [\_SB_.BAT0._UID] (Node 0xfffffe00094a51c0),
AE_AML_NO_OPERAND (20110527/uteval-113)

The panic itself is consistent with corruption of ACPICA object cache:
http://thread.gmane.org/gmane.os.freebsd.devel.acpi/7707/focus=7730

All the reports (some of which I mentioned at the start of this thread) follow the
same pattern: a panic in object cache code, frequently preceded by various error
messages related to battery device _UID/_HID/etc (such as missing object or wrong
object type, etc).

I see several possibilities of fixing this problem.
From worst to best:

1. Ensure that AcpiGetObjectInfo/AcpiUtExecute_HID/etc are never executed
concurrently on the same object at the OS level code (primarily in the battery
driver).

2. Ensure that AcpiUtRemoveReference is always invoked under some lock.  Not sure
if it should be the Interpreter mutex or the Namespace mutex.  I believe that
AcpiUtAddReference are always consistently called under a lock, but I haven't
verified all the code paths.  On the other hand, it is obvious that there are many
places where AcpiUtRemoveReference is called without any protection, but it is
hard to conclusively enumerate _all_ of them.

3. Introduce locking to AcpiUtUpdateRefCount to ensure that ReferenceCount is
never modified concurrently.  Essentially this should give an equivalence of
atomicity for ReferenceCount manipulations.
Additionally, it would be a good idea to make ReferenceCount == 0 check more
strict (if not fatal).  Also, unused and dangerous REF_FORCE_DELETE should be removed.

I like #3 the best because it provides the most guarantees while requiring the
least code inspection/verification efforts.
Of course, you know the code much better, so your assessment of the problem and
possible solutions could be very different.

Here is a quickly hacked together prototype patch that seems to have helped the user:
http://people.freebsd.org/~avg/acpi-ref-count-2.diff

The panic calls are to demonstrate various dangerous places in the current code.
They were actually hit in testing before the locking was added.

-- 
Andriy Gapon

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Devel] ref-counting races?
@ 2012-11-21 15:18 Moore, Robert
  0 siblings, 0 replies; 18+ messages in thread
From: Moore, Robert @ 2012-11-21 15:18 UTC (permalink / raw)
  To: devel

[-- Attachment #1: Type: text/plain, Size: 1990 bytes --]

> -----Original Message-----
> From: Andriy Gapon [mailto:avg(a)FreeBSD.org]
> Sent: Tuesday, November 20, 2012 3:09 PM
> To: Moore, Robert
> Cc: devel(a)acpica.org; Zheng, Lv
> Subject: Re: [Devel] ref-counting races?
> 
> on 13/11/2012 18:50 Moore, Robert said the following:
> > I still think that it might be interesting to examine the
> > AcpiGbl_MutexInfo array during add reference and remove reference to
> > see if there is any case where the reference count functions are called
> with no lock held.
> 
> I still haven't got around to implementing this useful debugging
> technique.
> 
> Meanwhile we've got another FreeBSD report which appears to be related:
> http://article.gmane.org/gmane.os.freebsd.devel.acpi/7562
> 
> Two new notes.
> 
> Could these issues be caused by FreeBSD using multiple threads to handle
> ACPI
> Notify-s?   So that some Notify-s may get processed in parallel and thus
> cause
> some parallel access to ACPICA and AML.
> 


I don't know of any issues concerning multiple threads with multiple notifies.



> AcpiUtUpdateRefCount handling of REF_DECREMENT && Count < 1 looks
> worrying.
> Not sure if that ever happens in practice, but since the code exists...
> The case is being treated as a minor event (judging from the debugging
> print), but we are already lucky if we see Count == 0 there instead of
> some garbage or just plain crashing because the memory is already freed.
> Nevertheless we discard even that luck by happily continuing into
> potentially a repeated call to AcpiUtDeleteInternalObj.
> 
> --
> Andriy Gapon

This code appears in order to essentially catch a case where an attempt is made to delete an object twice. Admittedly, it doesn't serve much purpose these days and should probably be removed. In practice, we never, ever hit the (Count < 1) case that I know of. In other words, I don't know of any cases where the acpica code ever attempts to delete an object twice.




^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Devel] ref-counting races?
@ 2012-11-20 23:08 Andriy Gapon
  0 siblings, 0 replies; 18+ messages in thread
From: Andriy Gapon @ 2012-11-20 23:08 UTC (permalink / raw)
  To: devel

[-- Attachment #1: Type: text/plain, Size: 1231 bytes --]

on 13/11/2012 18:50 Moore, Robert said the following:
> I still think that it might be interesting to examine the AcpiGbl_MutexInfo
> array during add reference and remove reference to see if there is any case
> where the reference count functions are called with no lock held.

I still haven't got around to implementing this useful debugging technique.

Meanwhile we've got another FreeBSD report which appears to be related:
http://article.gmane.org/gmane.os.freebsd.devel.acpi/7562

Two new notes.

Could these issues be caused by FreeBSD using multiple threads to handle ACPI
Notify-s?   So that some Notify-s may get processed in parallel and thus cause
some parallel access to ACPICA and AML.

AcpiUtUpdateRefCount handling of REF_DECREMENT && Count < 1 looks worrying.
Not sure if that ever happens in practice, but since the code exists...
The case is being treated as a minor event (judging from the debugging print),
but we are already lucky if we see Count == 0 there instead of some garbage or
just plain crashing because the memory is already freed.  Nevertheless we
discard even that luck by happily continuing into potentially a repeated call to
AcpiUtDeleteInternalObj.

-- 
Andriy Gapon

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Devel] ref-counting races?
@ 2012-11-13 16:50 Moore, Robert
  0 siblings, 0 replies; 18+ messages in thread
From: Moore, Robert @ 2012-11-13 16:50 UTC (permalink / raw)
  To: devel

[-- Attachment #1: Type: text/plain, Size: 1950 bytes --]

Here's what I see in AcpiEvaluateObject:

The incoming method arguments are converted from external format to internal format. During this conversion, new objects are created and the reference counts are set. However, at this time AcpiEvaluateObject has exclusive access to these objects, at least until they are handed off to the interpreter (which is locked just before the call to AcpiNsEvaluate.)

I still think that it might be interesting to examine the AcpiGbl_MutexInfo array during add reference and remove reference to see if there is any case where the reference count functions are called with no lock held.

Bob


> -----Original Message-----
> From: Andriy Gapon [mailto:avg(a)FreeBSD.org]
> Sent: Saturday, November 10, 2012 7:04 AM
> To: Moore, Robert
> Cc: devel(a)acpica.org
> Subject: Re: [Devel] ref-counting races?
> 
> on 09/11/2012 17:59 Andriy Gapon said the following:
> > on 09/11/2012 17:42 Moore, Robert said the following:
> >>> To make sure that we speak about the same thing - is e.g.
> >>> AcpiUtEvaluateObject always called under that lock?
> >>
> >> Yes.
> >>
> >>
> >> AcpiUtEvaluateObject
> >>     AcpiNsEvaluate
> >>         AcpiExEnterInterpreter ();
> >>             AcpiUtAcquireMutex (ACPI_MTX_INTERPRETER);
> >>         AcpiPsExecuteMethod (Info);
> >>
> >
> > This is not exactly what I meant.  ACPI_MTX_INTERPRETER does not cover
> > the calls to AcpiUtRemoveReference that can be made in
> > AcpiUtEvaluateObject or in AcpiUtExecute_HID or in
> AcpiUtEvaluateNumericObject, etc.
> > There could be other places in the code where AcpiUtRemoveReference or
> > AcpiUtAddReference are called without any lock serializing the access
> > to a reference counter.
> >
> 
> Just in case, AcpiEvaluateObject->AcpiNsResolveReferences seems to be an
> example of a place where AcpiUtAddReference is called without
> ACPI_MTX_INTERPRETER serialization.
> --
> Andriy Gapon

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Devel] ref-counting races?
@ 2012-11-13  1:51 Moore, Robert
  0 siblings, 0 replies; 18+ messages in thread
From: Moore, Robert @ 2012-11-13  1:51 UTC (permalink / raw)
  To: devel

[-- Attachment #1: Type: text/plain, Size: 1357 bytes --]

Ok, I will take a look.
Thanks,
Bob


> -----Original Message-----
> From: Andriy Gapon [mailto:avg(a)FreeBSD.org]
> Sent: Saturday, November 10, 2012 7:04 AM
> To: Moore, Robert
> Cc: devel(a)acpica.org
> Subject: Re: [Devel] ref-counting races?
> 
> on 09/11/2012 17:59 Andriy Gapon said the following:
> > on 09/11/2012 17:42 Moore, Robert said the following:
> >>> To make sure that we speak about the same thing - is e.g.
> >>> AcpiUtEvaluateObject always called under that lock?
> >>
> >> Yes.
> >>
> >>
> >> AcpiUtEvaluateObject
> >>     AcpiNsEvaluate
> >>         AcpiExEnterInterpreter ();
> >>             AcpiUtAcquireMutex (ACPI_MTX_INTERPRETER);
> >>         AcpiPsExecuteMethod (Info);
> >>
> >
> > This is not exactly what I meant.  ACPI_MTX_INTERPRETER does not cover
> > the calls to AcpiUtRemoveReference that can be made in
> > AcpiUtEvaluateObject or in AcpiUtExecute_HID or in
> AcpiUtEvaluateNumericObject, etc.
> > There could be other places in the code where AcpiUtRemoveReference or
> > AcpiUtAddReference are called without any lock serializing the access
> > to a reference counter.
> >
> 
> Just in case, AcpiEvaluateObject->AcpiNsResolveReferences seems to be an
> example of a place where AcpiUtAddReference is called without
> ACPI_MTX_INTERPRETER serialization.
> --
> Andriy Gapon

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Devel] ref-counting races?
@ 2012-11-10 15:04 Andriy Gapon
  0 siblings, 0 replies; 18+ messages in thread
From: Andriy Gapon @ 2012-11-10 15:04 UTC (permalink / raw)
  To: devel

[-- Attachment #1: Type: text/plain, Size: 1039 bytes --]

on 09/11/2012 17:59 Andriy Gapon said the following:
> on 09/11/2012 17:42 Moore, Robert said the following:
>>> To make sure that we speak about the same thing - is e.g.
>>> AcpiUtEvaluateObject always called under that lock?
>>
>> Yes.
>>
>>
>> AcpiUtEvaluateObject
>>     AcpiNsEvaluate
>>         AcpiExEnterInterpreter ();
>>             AcpiUtAcquireMutex (ACPI_MTX_INTERPRETER);
>>         AcpiPsExecuteMethod (Info);
>>
> 
> This is not exactly what I meant.  ACPI_MTX_INTERPRETER does not cover the calls
> to AcpiUtRemoveReference that can be made in AcpiUtEvaluateObject or in
> AcpiUtExecute_HID or in AcpiUtEvaluateNumericObject, etc.
> There could be other places in the code where AcpiUtRemoveReference or
> AcpiUtAddReference are called without any lock serializing the access to a
> reference counter.
> 

Just in case, AcpiEvaluateObject->AcpiNsResolveReferences seems to be an example
of a place where AcpiUtAddReference is called without ACPI_MTX_INTERPRETER
serialization.
-- 
Andriy Gapon

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Devel] ref-counting races?
@ 2012-11-09 16:03 Andriy Gapon
  0 siblings, 0 replies; 18+ messages in thread
From: Andriy Gapon @ 2012-11-09 16:03 UTC (permalink / raw)
  To: devel

[-- Attachment #1: Type: text/plain, Size: 511 bytes --]

on 09/11/2012 17:50 Moore, Robert said the following:
> Of course, that is not to say that there isn't a hole somewhere. However, we
> have not seen any similar issues from other host operating systems at this
> time.

Unfortunately it is not easy to reproduce the condition and it is very hard to
debug it postmortem.
Maybe my conjecture is completely wrong and the problem is not in the reference
counting in general, but in some code which does double-free or alike by mistake.

-- 
Andriy Gapon

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Devel] ref-counting races?
@ 2012-11-09 15:59 Andriy Gapon
  0 siblings, 0 replies; 18+ messages in thread
From: Andriy Gapon @ 2012-11-09 15:59 UTC (permalink / raw)
  To: devel

[-- Attachment #1: Type: text/plain, Size: 783 bytes --]

on 09/11/2012 17:42 Moore, Robert said the following:
>> To make sure that we speak about the same thing - is e.g.
>> AcpiUtEvaluateObject always called under that lock?
> 
> Yes.
> 
> 
> AcpiUtEvaluateObject
>     AcpiNsEvaluate
>         AcpiExEnterInterpreter ();
>             AcpiUtAcquireMutex (ACPI_MTX_INTERPRETER);
>         AcpiPsExecuteMethod (Info);
> 

This is not exactly what I meant.  ACPI_MTX_INTERPRETER does not cover the calls
to AcpiUtRemoveReference that can be made in AcpiUtEvaluateObject or in
AcpiUtExecute_HID or in AcpiUtEvaluateNumericObject, etc.
There could be other places in the code where AcpiUtRemoveReference or
AcpiUtAddReference are called without any lock serializing the access to a
reference counter.

-- 
Andriy Gapon

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Devel] ref-counting races?
@ 2012-11-09 15:50 Moore, Robert
  0 siblings, 0 replies; 18+ messages in thread
From: Moore, Robert @ 2012-11-09 15:50 UTC (permalink / raw)
  To: devel

[-- Attachment #1: Type: text/plain, Size: 3620 bytes --]

Of course, that is not to say that there isn't a hole somewhere. However, we have not seen any similar issues from other host operating systems at this time.

We keep track of which global mutex objects are held at any given time. Another debug strategy might be to add some code to the reference count function(s) that looks at the interpreter mutex and squawks if it isn't held.

Bob


> -----Original Message-----
> From: devel-bounces(a)acpica.org [mailto:devel-bounces(a)acpica.org] On Behalf
> Of Moore, Robert
> Sent: Friday, November 09, 2012 7:43 AM
> To: Andriy Gapon
> Cc: devel(a)acpica.org
> Subject: Re: [Devel] ref-counting races?
> 
> > To make sure that we speak about the same thing - is e.g.
> > AcpiUtEvaluateObject always called under that lock?
> 
> Yes.
> 
> 
> AcpiUtEvaluateObject
>     AcpiNsEvaluate
>         AcpiExEnterInterpreter ();
>             AcpiUtAcquireMutex (ACPI_MTX_INTERPRETER);
>         AcpiPsExecuteMethod (Info);
> 
> 
> > -----Original Message-----
> > From: Andriy Gapon [mailto:avg(a)FreeBSD.org]
> > Sent: Friday, November 09, 2012 6:42 AM
> > To: Moore, Robert
> > Cc: devel(a)acpica.org
> > Subject: Re: [Devel] ref-counting races?
> >
> > on 09/11/2012 16:28 Moore, Robert said the following:
> > > There is a lock around the entire AML interpreter. We do support the
> > limited threading model defined by the ACPI spec, but at any given
> > time, only one thread is executing the interpreter.
> >
> > Bob,
> >
> > thank you for the reply.
> > To make sure that we speak about the same thing - is e.g.
> > AcpiUtEvaluateObject always called under that lock?
> > That function and its users have calls to AcpiUtRemoveReference.
> >
> >
> > >> -----Original Message-----
> > >> From: devel-bounces(a)acpica.org [mailto:devel-bounces(a)acpica.org] On
> > >> Behalf Of Andriy Gapon
> > >> Sent: Thursday, November 08, 2012 11:34 PM
> > >> To: devel(a)acpica.org
> > >> Subject: [Devel] ref-counting races?
> > >>
> > >>
> > >> It looks like the reference count manipulations performed in
> > >> AcpiUtUpdateRefCount/AcpiUtUpdateObjectReference are not protected
> > >> by any lock.
> > >>  Although, it is possible that I am missing some implied locking in
> > >> the higher levels.
> > >>
> > >> In particular, I am concerned about access to the ASL elements like
> > >> Name(_HID,
> > >> 2) via AcpiNsEvaluate -> AcpiExResolveNodeToValue.  In this case
> > >> multiple threads can concurrently get or drop references to the
> > >> value
> > object.
> > >>
> > >> The reason to inquire about this is that over the time several
> > >> FreeBSD users has reported some seldom occurring, non-reproducible,
> > >> ACPI-related heisen-panics.
> > >> Their common element is a trap upon memory access in
> > >> AcpiOsAcquireObject(AcpiGbl_OperandCache).  Which, in my opinion,
> > >> strongly hints at corruption of AcpiGbl_OperandCache via
> > >> double-deallocation of an object.
> > >> Which I then connect to the reference counting.
> > >>
> > >> Some links:
> > >> http://lists.freebsd.org/pipermail/freebsd-current/2012-March/032637.
> > >> html
> > >> http://lists.freebsd.org/pipermail/freebsd-acpi/2012-January/007406
> > >> .h
> > >> tml
> > >> http://thread.gmane.org/gmane.os.freebsd.stable/83262/focus=83445
> > >> https://bugs.dragonflybsd.org/issues/568
> > >>
> > >>
> > >> --
> > >> Andriy Gapon
> >
> >
> > --
> > Andriy Gapon
> _______________________________________________
> Devel mailing list
> Devel(a)acpica.org
> http://lists.acpica.org/listinfo/devel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Devel] ref-counting races?
@ 2012-11-09 15:42 Moore, Robert
  0 siblings, 0 replies; 18+ messages in thread
From: Moore, Robert @ 2012-11-09 15:42 UTC (permalink / raw)
  To: devel

[-- Attachment #1: Type: text/plain, Size: 2655 bytes --]

> To make sure that we speak about the same thing - is e.g.
> AcpiUtEvaluateObject always called under that lock?

Yes.


AcpiUtEvaluateObject
    AcpiNsEvaluate
        AcpiExEnterInterpreter ();
            AcpiUtAcquireMutex (ACPI_MTX_INTERPRETER);
        AcpiPsExecuteMethod (Info);


> -----Original Message-----
> From: Andriy Gapon [mailto:avg(a)FreeBSD.org]
> Sent: Friday, November 09, 2012 6:42 AM
> To: Moore, Robert
> Cc: devel(a)acpica.org
> Subject: Re: [Devel] ref-counting races?
> 
> on 09/11/2012 16:28 Moore, Robert said the following:
> > There is a lock around the entire AML interpreter. We do support the
> limited threading model defined by the ACPI spec, but at any given time,
> only one thread is executing the interpreter.
> 
> Bob,
> 
> thank you for the reply.
> To make sure that we speak about the same thing - is e.g.
> AcpiUtEvaluateObject always called under that lock?
> That function and its users have calls to AcpiUtRemoveReference.
> 
> 
> >> -----Original Message-----
> >> From: devel-bounces(a)acpica.org [mailto:devel-bounces(a)acpica.org] On
> >> Behalf Of Andriy Gapon
> >> Sent: Thursday, November 08, 2012 11:34 PM
> >> To: devel(a)acpica.org
> >> Subject: [Devel] ref-counting races?
> >>
> >>
> >> It looks like the reference count manipulations performed in
> >> AcpiUtUpdateRefCount/AcpiUtUpdateObjectReference are not protected by
> >> any lock.
> >>  Although, it is possible that I am missing some implied locking in
> >> the higher levels.
> >>
> >> In particular, I am concerned about access to the ASL elements like
> >> Name(_HID,
> >> 2) via AcpiNsEvaluate -> AcpiExResolveNodeToValue.  In this case
> >> multiple threads can concurrently get or drop references to the value
> object.
> >>
> >> The reason to inquire about this is that over the time several
> >> FreeBSD users has reported some seldom occurring, non-reproducible,
> >> ACPI-related heisen-panics.
> >> Their common element is a trap upon memory access in
> >> AcpiOsAcquireObject(AcpiGbl_OperandCache).  Which, in my opinion,
> >> strongly hints at corruption of AcpiGbl_OperandCache via
> >> double-deallocation of an object.
> >> Which I then connect to the reference counting.
> >>
> >> Some links:
> >> http://lists.freebsd.org/pipermail/freebsd-current/2012-March/032637.
> >> html
> >> http://lists.freebsd.org/pipermail/freebsd-acpi/2012-January/007406.h
> >> tml
> >> http://thread.gmane.org/gmane.os.freebsd.stable/83262/focus=83445
> >> https://bugs.dragonflybsd.org/issues/568
> >>
> >>
> >> --
> >> Andriy Gapon
> 
> 
> --
> Andriy Gapon

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Devel] ref-counting races?
@ 2012-11-09 14:41 Andriy Gapon
  0 siblings, 0 replies; 18+ messages in thread
From: Andriy Gapon @ 2012-11-09 14:41 UTC (permalink / raw)
  To: devel

[-- Attachment #1: Type: text/plain, Size: 2009 bytes --]

on 09/11/2012 16:28 Moore, Robert said the following:
> There is a lock around the entire AML interpreter. We do support the limited threading model defined by the ACPI spec, but at any given time, only one thread is executing the interpreter.

Bob,

thank you for the reply.
To make sure that we speak about the same thing - is e.g. AcpiUtEvaluateObject
always called under that lock?
That function and its users have calls to AcpiUtRemoveReference.


>> -----Original Message-----
>> From: devel-bounces(a)acpica.org [mailto:devel-bounces(a)acpica.org] On Behalf
>> Of Andriy Gapon
>> Sent: Thursday, November 08, 2012 11:34 PM
>> To: devel(a)acpica.org
>> Subject: [Devel] ref-counting races?
>>
>>
>> It looks like the reference count manipulations performed in
>> AcpiUtUpdateRefCount/AcpiUtUpdateObjectReference are not protected by any
>> lock.
>>  Although, it is possible that I am missing some implied locking in the
>> higher levels.
>>
>> In particular, I am concerned about access to the ASL elements like
>> Name(_HID,
>> 2) via AcpiNsEvaluate -> AcpiExResolveNodeToValue.  In this case multiple
>> threads can concurrently get or drop references to the value object.
>>
>> The reason to inquire about this is that over the time several FreeBSD
>> users has reported some seldom occurring, non-reproducible, ACPI-related
>> heisen-panics.
>> Their common element is a trap upon memory access in
>> AcpiOsAcquireObject(AcpiGbl_OperandCache).  Which, in my opinion, strongly
>> hints at corruption of AcpiGbl_OperandCache via double-deallocation of an
>> object.
>> Which I then connect to the reference counting.
>>
>> Some links:
>> http://lists.freebsd.org/pipermail/freebsd-current/2012-March/032637.html
>> http://lists.freebsd.org/pipermail/freebsd-acpi/2012-January/007406.html
>> http://thread.gmane.org/gmane.os.freebsd.stable/83262/focus=83445
>> https://bugs.dragonflybsd.org/issues/568
>>
>>
>> --
>> Andriy Gapon


-- 
Andriy Gapon

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Devel] ref-counting races?
@ 2012-11-09 14:28 Moore, Robert
  0 siblings, 0 replies; 18+ messages in thread
From: Moore, Robert @ 2012-11-09 14:28 UTC (permalink / raw)
  To: devel

[-- Attachment #1: Type: text/plain, Size: 1692 bytes --]

There is a lock around the entire AML interpreter. We do support the limited threading model defined by the ACPI spec, but at any given time, only one thread is executing the interpreter.

Bob


> -----Original Message-----
> From: devel-bounces(a)acpica.org [mailto:devel-bounces(a)acpica.org] On Behalf
> Of Andriy Gapon
> Sent: Thursday, November 08, 2012 11:34 PM
> To: devel(a)acpica.org
> Subject: [Devel] ref-counting races?
> 
> 
> It looks like the reference count manipulations performed in
> AcpiUtUpdateRefCount/AcpiUtUpdateObjectReference are not protected by any
> lock.
>  Although, it is possible that I am missing some implied locking in the
> higher levels.
> 
> In particular, I am concerned about access to the ASL elements like
> Name(_HID,
> 2) via AcpiNsEvaluate -> AcpiExResolveNodeToValue.  In this case multiple
> threads can concurrently get or drop references to the value object.
> 
> The reason to inquire about this is that over the time several FreeBSD
> users has reported some seldom occurring, non-reproducible, ACPI-related
> heisen-panics.
> Their common element is a trap upon memory access in
> AcpiOsAcquireObject(AcpiGbl_OperandCache).  Which, in my opinion, strongly
> hints at corruption of AcpiGbl_OperandCache via double-deallocation of an
> object.
> Which I then connect to the reference counting.
> 
> Some links:
> http://lists.freebsd.org/pipermail/freebsd-current/2012-March/032637.html
> http://lists.freebsd.org/pipermail/freebsd-acpi/2012-January/007406.html
> http://thread.gmane.org/gmane.os.freebsd.stable/83262/focus=83445
> https://bugs.dragonflybsd.org/issues/568
> 
> 
> --
> Andriy Gapon

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Devel] ref-counting races?
@ 2012-11-09  7:33 Andriy Gapon
  0 siblings, 0 replies; 18+ messages in thread
From: Andriy Gapon @ 2012-11-09  7:33 UTC (permalink / raw)
  To: devel

[-- Attachment #1: Type: text/plain, Size: 1182 bytes --]


It looks like the reference count manipulations performed in
AcpiUtUpdateRefCount/AcpiUtUpdateObjectReference are not protected by any lock.
 Although, it is possible that I am missing some implied locking in the higher
levels.

In particular, I am concerned about access to the ASL elements like Name(_HID,
2) via AcpiNsEvaluate -> AcpiExResolveNodeToValue.  In this case multiple
threads can concurrently get or drop references to the value object.

The reason to inquire about this is that over the time several FreeBSD users has
reported some seldom occurring, non-reproducible, ACPI-related heisen-panics.
Their common element is a trap upon memory access in
AcpiOsAcquireObject(AcpiGbl_OperandCache).  Which, in my opinion, strongly hints
at corruption of AcpiGbl_OperandCache via double-deallocation of an object.
Which I then connect to the reference counting.

Some links:
http://lists.freebsd.org/pipermail/freebsd-current/2012-March/032637.html
http://lists.freebsd.org/pipermail/freebsd-acpi/2012-January/007406.html
http://thread.gmane.org/gmane.os.freebsd.stable/83262/focus=83445
https://bugs.dragonflybsd.org/issues/568


-- 
Andriy Gapon

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2013-03-24 11:10 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-03-24 11:10 [Devel] ref-counting races? Andriy Gapon
  -- strict thread matches above, loose matches on Subject: below --
2013-03-24 10:54 Andriy Gapon
2013-03-22 20:23 Moore, Robert
2013-03-22  4:28 Zheng, Lv
2013-03-21 19:46 Moore, Robert
2013-03-21 18:07 Andriy Gapon
2012-11-21 15:18 Moore, Robert
2012-11-20 23:08 Andriy Gapon
2012-11-13 16:50 Moore, Robert
2012-11-13  1:51 Moore, Robert
2012-11-10 15:04 Andriy Gapon
2012-11-09 16:03 Andriy Gapon
2012-11-09 15:59 Andriy Gapon
2012-11-09 15:50 Moore, Robert
2012-11-09 15:42 Moore, Robert
2012-11-09 14:41 Andriy Gapon
2012-11-09 14:28 Moore, Robert
2012-11-09  7:33 Andriy Gapon

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.