Re: [PATCH] md/bcache: Fix a deadlock while calculating writeback rate

All of lore.kernel.org
 help / color / mirror / Atom feed

* Re: [PATCH] md/bcache: Fix a deadlock while calculating writeback rate
       [not found] <20170507155222.19114-1-phillipp.roell@trafficplex.de>
@ 2017-05-09 18:53 ` Eric Wheeler
       [not found]   ` <4e43a54e075047af9b2e3cf8a0452c1b@EX2013-CAS2.de2.eu-rbx.webhod.com>
  0 siblings, 1 reply; 7+ messages in thread
From: Eric Wheeler @ 2017-05-09 18:53 UTC (permalink / raw)
  To: Phillipp Röll; +Cc: Kent Overstreet, linux-bcache

[-- Attachment #1: Type: TEXT/PLAIN, Size: 2610 bytes --]

On Sun, 7 May 2017, Phillipp Röll wrote:
> A bcache foreground write holds a read lock on the writeback_lock
> rwsem. In order to complete and release the lock after the actual
> write, it needs to update the index, which runs out of a workqueue.
> 
> If the bch_writeback_thread is starting to write back dirty data while
> the foreground write still holds the lock, it tries to get a write
> lock on the writeback lock, thus blocking until the foreground write
> completes.
> 
> If at this moment a workqueue worker starts an update of the write-
> back rate, while foreground write completions are queued behind it,
> the update thread will try to get a read lock, which blocks, because
> the writeback thread is trying to get a write lock. This blocks the
> foreground writes behind it and locks up all writeback and foreground
> writes.
> 
> Removing the read lock from the writeback rate update thread is the
> simplest solution: the worst thing that will happen when the scan
> for dirty data is inconsistent is a wrong P-factor in the PID controller.

Hi Phillipp,

Thank you for tracking this down!  Did you have an actual deadlock case 
that this fixed?


I wonder: in the event of a race, can the updated P-factor be invalid (ie, 
partially populated)? I don't believe 64-bit assignments are atomic on all 
architectures (see __update_writeback_rate) so you might get the upper or 
lower 32 bits of the P-factor or of other values assigned therein.  Could 
the rate be calculated in such a race by the writeback thread such that no 
progress is made (eg, effectively rate==0)?

I certainly want to see deadlocks fixed and thought this could be a 
consideration.  Worst case we could put a spinlock around the use of the 
dc->writeback_rate* fields.

What do you think?

--
Eric Wheeler

> 
> Signed-off-by: Phillipp Röll <phillipp.roell@trafficplex.de>
> ---
>  drivers/md/bcache/writeback.c | 4 ----
>  1 file changed, 4 deletions(-)
> 
> diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
> index 6ac2e48..90da3bd 100644
> --- a/drivers/md/bcache/writeback.c
> +++ b/drivers/md/bcache/writeback.c
> @@ -75,14 +75,10 @@ static void update_writeback_rate(struct work_struct *work)
>  					     struct cached_dev,
>  					     writeback_rate_update);
>  
> -	down_read(&dc->writeback_lock);
> -
>  	if (atomic_read(&dc->has_dirty) &&
>  	    dc->writeback_percent)
>  		__update_writeback_rate(dc);
>  
> -	up_read(&dc->writeback_lock);
> -
>  	schedule_delayed_work(&dc->writeback_rate_update,
>  			      dc->writeback_rate_update_seconds * HZ);
>  }
> -- 
> 2.10.0
> 
> 
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: AW: [PATCH] md/bcache: Fix a deadlock while calculating writeback rate
       [not found]   ` <4e43a54e075047af9b2e3cf8a0452c1b@EX2013-CAS2.de2.eu-rbx.webhod.com>
@ 2017-05-12 18:54     ` Eric Wheeler
  2017-05-18 17:05       ` Eric Wheeler
  0 siblings, 1 reply; 7+ messages in thread
From: Eric Wheeler @ 2017-05-12 18:54 UTC (permalink / raw)
  To: Phillipp Röll; +Cc: linux-bcache

[-- Attachment #1: Type: TEXT/PLAIN, Size: 2758 bytes --]

On Wed, 10 May 2017, Phillipp Röll wrote:

> Hi Eric,
> 
> > Thank you for tracking this down!  Did you have an actual deadlock case that
> > this fixed?
> 
> Yes. We had at least dozens, if not hundreds, of "kernel hung task" events
> followed by complete IO lockups on our production systems, with traces like
> this: https://paste42.de/059f78f7f9f27e9d88cd65fc02811e92/11749/
> 
> Reproduction is actually pretty easy:
> 
> - Setup bcache with SSD + HDD and LVM on top of the resulting bcache device
> - Create three LVs in the new cached VG
> - Start dd'ing from one of these LVs to another LV
> - Start fio with randrw on the third LV
> - Wait 5-10 minutes for the message to pop up and the system to lock up
> 
> > I wonder: in the event of a race, can the updated P-factor be invalid (ie,
> > partially populated)? I don't believe 64-bit assignments are atomic on all
> > architectures (see __update_writeback_rate) so you might get the upper or
> > lower 32 bits of the P-factor or of other values assigned therein.  Could
> > the rate be calculated in such a race by the writeback thread such that no
> > progress is made (eg, effectively rate==0)?
> 
> That seems to be right, yes. I think I can add that spinlock. Would you say
> the spinlock should be used for the sysfs-info as well?

Yes, anywhere it is used.  It might be a good idea to write a short 
accessor function if there are lots of places that need to grab the lock.  
Some kind of future-proofing would be a good idea to make it easy if 
anyone else grabs the value and doesn't realize it needs to be locked.

--
Eric Wheeler



> 
> We're using amd64 only and had the patch tested with IO stress tests for
> 10 to 14 days and running in prod for 3 months now without issue.
> But that doesn't mean anything for the other architectures...
> 
> Phillipp
> 
> > > Signed-off-by: Phillipp Röll <phillipp.roell@trafficplex.de>
> > > ---
> > >  drivers/md/bcache/writeback.c | 4 ----
> > >  1 file changed, 4 deletions(-)
> > >
> > > diff --git a/drivers/md/bcache/writeback.c
> > b/drivers/md/bcache/writeback.c
> > > index 6ac2e48..90da3bd 100644
> > > --- a/drivers/md/bcache/writeback.c
> > > +++ b/drivers/md/bcache/writeback.c
> > > @@ -75,14 +75,10 @@ static void update_writeback_rate(struct
> > work_struct *work)
> > >  					     struct cached_dev,
> > >  					     writeback_rate_update);
> > >
> > > -	down_read(&dc->writeback_lock);
> > > -
> > >  	if (atomic_read(&dc->has_dirty) &&
> > >  	    dc->writeback_percent)
> > >  		__update_writeback_rate(dc);
> > >
> > > -	up_read(&dc->writeback_lock);
> > > -
> > >  	schedule_delayed_work(&dc->writeback_rate_update,
> > >  			      dc->writeback_rate_update_seconds * HZ);
> > >  }
> > > --
> > > 2.10.0
> > >
> > >
> > >
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: AW: [PATCH] md/bcache: Fix a deadlock while calculating writeback rate
  2017-05-12 18:54     ` AW: " Eric Wheeler
@ 2017-05-18 17:05       ` Eric Wheeler
  2017-05-18 18:52         ` AW: " Phillipp Röll
  0 siblings, 1 reply; 7+ messages in thread
From: Eric Wheeler @ 2017-05-18 17:05 UTC (permalink / raw)
  To: Phillipp Röll; +Cc: linux-bcache

[-- Attachment #1: Type: TEXT/PLAIN, Size: 3129 bytes --]

On Fri, 12 May 2017, Eric Wheeler wrote:

> On Wed, 10 May 2017, Phillipp Röll wrote:
> 
> > Hi Eric,
> > 
> > > Thank you for tracking this down!  Did you have an actual deadlock case that
> > > this fixed?
> > 
> > Yes. We had at least dozens, if not hundreds, of "kernel hung task" events
> > followed by complete IO lockups on our production systems, with traces like
> > this: https://paste42.de/059f78f7f9f27e9d88cd65fc02811e92/11749/
> > 
> > Reproduction is actually pretty easy:
> > 
> > - Setup bcache with SSD + HDD and LVM on top of the resulting bcache device
> > - Create three LVs in the new cached VG
> > - Start dd'ing from one of these LVs to another LV
> > - Start fio with randrw on the third LV
> > - Wait 5-10 minutes for the message to pop up and the system to lock up
> > 
> > > I wonder: in the event of a race, can the updated P-factor be invalid (ie,
> > > partially populated)? I don't believe 64-bit assignments are atomic on all
> > > architectures (see __update_writeback_rate) so you might get the upper or
> > > lower 32 bits of the P-factor or of other values assigned therein.  Could
> > > the rate be calculated in such a race by the writeback thread such that no
> > > progress is made (eg, effectively rate==0)?
> > 
> > That seems to be right, yes. I think I can add that spinlock. Would you say
> > the spinlock should be used for the sysfs-info as well?
> 
> Yes, anywhere it is used.  It might be a good idea to write a short 
> accessor function if there are lots of places that need to grab the lock.  
> Some kind of future-proofing would be a good idea to make it easy if 
> anyone else grabs the value and doesn't realize it needs to be locked.

Hi Phillipp,

Just checking in: 

Will you have a spinlock version of the patch available soon?

-Eric

ps, be sure to Cc: linux-bcache@vger.kernel.org on your replies :)

> 
> --
> Eric Wheeler
> 
> 
> 
> > 
> > We're using amd64 only and had the patch tested with IO stress tests for
> > 10 to 14 days and running in prod for 3 months now without issue.
> > But that doesn't mean anything for the other architectures...
> > 
> > Phillipp
> > 
> > > > Signed-off-by: Phillipp Röll <phillipp.roell@trafficplex.de>
> > > > ---
> > > >  drivers/md/bcache/writeback.c | 4 ----
> > > >  1 file changed, 4 deletions(-)
> > > >
> > > > diff --git a/drivers/md/bcache/writeback.c
> > > b/drivers/md/bcache/writeback.c
> > > > index 6ac2e48..90da3bd 100644
> > > > --- a/drivers/md/bcache/writeback.c
> > > > +++ b/drivers/md/bcache/writeback.c
> > > > @@ -75,14 +75,10 @@ static void update_writeback_rate(struct
> > > work_struct *work)
> > > >  					     struct cached_dev,
> > > >  					     writeback_rate_update);
> > > >
> > > > -	down_read(&dc->writeback_lock);
> > > > -
> > > >  	if (atomic_read(&dc->has_dirty) &&
> > > >  	    dc->writeback_percent)
> > > >  		__update_writeback_rate(dc);
> > > >
> > > > -	up_read(&dc->writeback_lock);
> > > > -
> > > >  	schedule_delayed_work(&dc->writeback_rate_update,
> > > >  			      dc->writeback_rate_update_seconds * HZ);
> > > >  }
> > > > --
> > > > 2.10.0
> > > >
> > > >
> > > >
> > 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* AW: AW: [PATCH] md/bcache: Fix a deadlock while calculating writeback rate
  2017-05-18 17:05       ` Eric Wheeler
@ 2017-05-18 18:52         ` Phillipp Röll
  2017-05-18 20:50           ` Eric Wheeler
  0 siblings, 1 reply; 7+ messages in thread
From: Phillipp Röll @ 2017-05-18 18:52 UTC (permalink / raw)
  To: Eric Wheeler; +Cc: linux-bcache



> -----Ursprüngliche Nachricht-----
> Von: Eric Wheeler [mailto:bcache@lists.ewheeler.net]
> Gesendet: Donnerstag, 18. Mai 2017 19:05
> An: Phillipp Röll
> Cc: linux-bcache@vger.kernel.org
> Betreff: Re: AW: [PATCH] md/bcache: Fix a deadlock while calculating
> writeback rate
> 
> On Fri, 12 May 2017, Eric Wheeler wrote:
> 
> > On Wed, 10 May 2017, Phillipp Röll wrote:
> >
> > > Hi Eric,
> > >
> > > > Thank you for tracking this down!  Did you have an actual deadlock
> > > > case that this fixed?
> > >
> > > Yes. We had at least dozens, if not hundreds, of "kernel hung task"
> > > events followed by complete IO lockups on our production systems,
> > > with traces like
> > > this: https://paste42.de/059f78f7f9f27e9d88cd65fc02811e92/11749/
> > >
> > > Reproduction is actually pretty easy:
> > >
> > > - Setup bcache with SSD + HDD and LVM on top of the resulting bcache
> > > device
> > > - Create three LVs in the new cached VG
> > > - Start dd'ing from one of these LVs to another LV
> > > - Start fio with randrw on the third LV
> > > - Wait 5-10 minutes for the message to pop up and the system to lock
> > > up
> > >
> > > > I wonder: in the event of a race, can the updated P-factor be
> > > > invalid (ie, partially populated)? I don't believe 64-bit
> > > > assignments are atomic on all architectures (see
> > > > __update_writeback_rate) so you might get the upper or lower 32
> > > > bits of the P-factor or of other values assigned therein.  Could
> > > > the rate be calculated in such a race by the writeback thread such that
> no progress is made (eg, effectively rate==0)?
> > >
> > > That seems to be right, yes. I think I can add that spinlock. Would
> > > you say the spinlock should be used for the sysfs-info as well?
> >
> > Yes, anywhere it is used.  It might be a good idea to write a short
> > accessor function if there are lots of places that need to grab the lock.
> > Some kind of future-proofing would be a good idea to make it easy if
> > anyone else grabs the value and doesn't realize it needs to be locked.
> 
> Hi Phillipp,
> 
> Just checking in:
> 
> Will you have a spinlock version of the patch available soon?
> 
> -Eric
> 
> ps, be sure to Cc: linux-bcache@vger.kernel.org on your replies :)
> 

Hi Eric,
honestly I don't know. I haven't touched C in a decade and would have
to read up on documentation first, setup the dev ecosystem with
build and testing and I have no clue when I can spare a day or two
for that. Removing a lock is one thing, adding one is a different beast.

Plus, for our arch the problem is solved, it's just not upstream,
so the priority has gone from "critical" to "none". I wouldn't be
surprised if I couldn't work on this until after the summer, sorry.

- Phillipp

> >
> > --
> > Eric Wheeler
> >
> >
> >
> > >
> > > We're using amd64 only and had the patch tested with IO stress tests
> > > for
> > > 10 to 14 days and running in prod for 3 months now without issue.
> > > But that doesn't mean anything for the other architectures...
> > >
> > > Phillipp
> > >
> > > > > Signed-off-by: Phillipp Röll <phillipp.roell@trafficplex.de>
> > > > > ---
> > > > >  drivers/md/bcache/writeback.c | 4 ----
> > > > >  1 file changed, 4 deletions(-)
> > > > >
> > > > > diff --git a/drivers/md/bcache/writeback.c
> > > > b/drivers/md/bcache/writeback.c
> > > > > index 6ac2e48..90da3bd 100644
> > > > > --- a/drivers/md/bcache/writeback.c
> > > > > +++ b/drivers/md/bcache/writeback.c
> > > > > @@ -75,14 +75,10 @@ static void update_writeback_rate(struct
> > > > work_struct *work)
> > > > >  					     struct cached_dev,
> > > > >  					     writeback_rate_update);
> > > > >
> > > > > -	down_read(&dc->writeback_lock);
> > > > > -
> > > > >  	if (atomic_read(&dc->has_dirty) &&
> > > > >  	    dc->writeback_percent)
> > > > >  		__update_writeback_rate(dc);
> > > > >
> > > > > -	up_read(&dc->writeback_lock);
> > > > > -
> > > > >  	schedule_delayed_work(&dc->writeback_rate_update,
> > > > >  			      dc->writeback_rate_update_seconds *
> HZ);  }
> > > > > --
> > > > > 2.10.0
> > > > >
> > > > >
> > > > >
> > >

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: AW: AW: [PATCH] md/bcache: Fix a deadlock while calculating writeback rate
  2017-05-18 18:52         ` AW: " Phillipp Röll
@ 2017-05-18 20:50           ` Eric Wheeler
  2017-10-27 19:22             ` Eric Wheeler
  0 siblings, 1 reply; 7+ messages in thread
From: Eric Wheeler @ 2017-05-18 20:50 UTC (permalink / raw)
  To: Phillipp Röll; +Cc: linux-bcache

[-- Attachment #1: Type: TEXT/PLAIN, Size: 4645 bytes --]

On Thu, 18 May 2017, Phillipp Röll wrote:

> 
> 
> > -----Ursprüngliche Nachricht-----
> > Von: Eric Wheeler [mailto:bcache@lists.ewheeler.net]
> > Gesendet: Donnerstag, 18. Mai 2017 19:05
> > An: Phillipp Röll
> > Cc: linux-bcache@vger.kernel.org
> > Betreff: Re: AW: [PATCH] md/bcache: Fix a deadlock while calculating
> > writeback rate
> > 
> > On Fri, 12 May 2017, Eric Wheeler wrote:
> > 
> > > On Wed, 10 May 2017, Phillipp Röll wrote:
> > >
> > > > Hi Eric,
> > > >
> > > > > Thank you for tracking this down!  Did you have an actual deadlock
> > > > > case that this fixed?
> > > >
> > > > Yes. We had at least dozens, if not hundreds, of "kernel hung task"
> > > > events followed by complete IO lockups on our production systems,
> > > > with traces like
> > > > this: https://paste42.de/059f78f7f9f27e9d88cd65fc02811e92/11749/
> > > >
> > > > Reproduction is actually pretty easy:
> > > >
> > > > - Setup bcache with SSD + HDD and LVM on top of the resulting bcache
> > > > device
> > > > - Create three LVs in the new cached VG
> > > > - Start dd'ing from one of these LVs to another LV
> > > > - Start fio with randrw on the third LV
> > > > - Wait 5-10 minutes for the message to pop up and the system to lock
> > > > up
> > > >
> > > > > I wonder: in the event of a race, can the updated P-factor be
> > > > > invalid (ie, partially populated)? I don't believe 64-bit
> > > > > assignments are atomic on all architectures (see
> > > > > __update_writeback_rate) so you might get the upper or lower 32
> > > > > bits of the P-factor or of other values assigned therein.  Could
> > > > > the rate be calculated in such a race by the writeback thread such that
> > no progress is made (eg, effectively rate==0)?
> > > >
> > > > That seems to be right, yes. I think I can add that spinlock. Would
> > > > you say the spinlock should be used for the sysfs-info as well?
> > >
> > > Yes, anywhere it is used.  It might be a good idea to write a short
> > > accessor function if there are lots of places that need to grab the lock.
> > > Some kind of future-proofing would be a good idea to make it easy if
> > > anyone else grabs the value and doesn't realize it needs to be locked.
> > 
> > Hi Phillipp,
> > 
> > Just checking in:
> > 
> > Will you have a spinlock version of the patch available soon?
> > 
> > -Eric
> > 
> > ps, be sure to Cc: linux-bcache@vger.kernel.org on your replies :)
> > 
> 
> Hi Eric,
> honestly I don't know. I haven't touched C in a decade and would have
> to read up on documentation first, setup the dev ecosystem with
> build and testing and I have no clue when I can spare a day or two
> for that. Removing a lock is one thing, adding one is a different beast.
> 
> Plus, for our arch the problem is solved, it's just not upstream,
> so the priority has gone from "critical" to "none". I wouldn't be
> surprised if I couldn't work on this until after the summer, sorry.

Thank you for letting me know, I'll take a look.


--
Eric Wheeler

> 
> - Phillipp
> 
> > >
> > > --
> > > Eric Wheeler
> > >
> > >
> > >
> > > >
> > > > We're using amd64 only and had the patch tested with IO stress tests
> > > > for
> > > > 10 to 14 days and running in prod for 3 months now without issue.
> > > > But that doesn't mean anything for the other architectures...
> > > >
> > > > Phillipp
> > > >
> > > > > > Signed-off-by: Phillipp Röll <phillipp.roell@trafficplex.de>
> > > > > > ---
> > > > > >  drivers/md/bcache/writeback.c | 4 ----
> > > > > >  1 file changed, 4 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/md/bcache/writeback.c
> > > > > b/drivers/md/bcache/writeback.c
> > > > > > index 6ac2e48..90da3bd 100644
> > > > > > --- a/drivers/md/bcache/writeback.c
> > > > > > +++ b/drivers/md/bcache/writeback.c
> > > > > > @@ -75,14 +75,10 @@ static void update_writeback_rate(struct
> > > > > work_struct *work)
> > > > > >  					     struct cached_dev,
> > > > > >  					     writeback_rate_update);
> > > > > >
> > > > > > -	down_read(&dc->writeback_lock);
> > > > > > -
> > > > > >  	if (atomic_read(&dc->has_dirty) &&
> > > > > >  	    dc->writeback_percent)
> > > > > >  		__update_writeback_rate(dc);
> > > > > >
> > > > > > -	up_read(&dc->writeback_lock);
> > > > > > -
> > > > > >  	schedule_delayed_work(&dc->writeback_rate_update,
> > > > > >  			      dc->writeback_rate_update_seconds *
> > HZ);  }
> > > > > > --
> > > > > > 2.10.0
> > > > > >
> > > > > >
> > > > > >
> > > >
> --
> To unsubscribe from this list: send the line "unsubscribe linux-bcache" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: AW: AW: [PATCH] md/bcache: Fix a deadlock while calculating writeback rate
  2017-05-18 20:50           ` Eric Wheeler
@ 2017-10-27 19:22             ` Eric Wheeler
  2017-10-27 21:41               ` Eric Wheeler
  0 siblings, 1 reply; 7+ messages in thread
From: Eric Wheeler @ 2017-10-27 19:22 UTC (permalink / raw)
  To: Michael Lyle; +Cc: Phillipp Röll, linux-bcache

[-- Attachment #1: Type: TEXT/PLAIN, Size: 5092 bytes --]

On Thu, 18 May 2017, Eric Wheeler wrote:

> On Thu, 18 May 2017, Phillipp Röll wrote:
> 
> > 
> > 
> > > -----Ursprüngliche Nachricht-----
> > > Von: Eric Wheeler [mailto:bcache@lists.ewheeler.net]
> > > Gesendet: Donnerstag, 18. Mai 2017 19:05
> > > An: Phillipp Röll
> > > Cc: linux-bcache@vger.kernel.org
> > > Betreff: Re: AW: [PATCH] md/bcache: Fix a deadlock while calculating
> > > writeback rate
> > > 
> > > On Fri, 12 May 2017, Eric Wheeler wrote:
> > > 
> > > > On Wed, 10 May 2017, Phillipp Röll wrote:
> > > >
> > > > > Hi Eric,
> > > > >
> > > > > > Thank you for tracking this down!  Did you have an actual deadlock
> > > > > > case that this fixed?
> > > > >
> > > > > Yes. We had at least dozens, if not hundreds, of "kernel hung task"
> > > > > events followed by complete IO lockups on our production systems,
> > > > > with traces like
> > > > > this: https://paste42.de/059f78f7f9f27e9d88cd65fc02811e92/11749/
> > > > >
> > > > > Reproduction is actually pretty easy:
> > > > >
> > > > > - Setup bcache with SSD + HDD and LVM on top of the resulting bcache
> > > > > device
> > > > > - Create three LVs in the new cached VG
> > > > > - Start dd'ing from one of these LVs to another LV
> > > > > - Start fio with randrw on the third LV
> > > > > - Wait 5-10 minutes for the message to pop up and the system to lock
> > > > > up
> > > > >
> > > > > > I wonder: in the event of a race, can the updated P-factor be
> > > > > > invalid (ie, partially populated)? I don't believe 64-bit
> > > > > > assignments are atomic on all architectures (see
> > > > > > __update_writeback_rate) so you might get the upper or lower 32
> > > > > > bits of the P-factor or of other values assigned therein.  Could
> > > > > > the rate be calculated in such a race by the writeback thread such that
> > > no progress is made (eg, effectively rate==0)?
> > > > >
> > > > > That seems to be right, yes. I think I can add that spinlock. Would
> > > > > you say the spinlock should be used for the sysfs-info as well?
> > > >
> > > > Yes, anywhere it is used.  It might be a good idea to write a short
> > > > accessor function if there are lots of places that need to grab the lock.
> > > > Some kind of future-proofing would be a good idea to make it easy if
> > > > anyone else grabs the value and doesn't realize it needs to be locked.
> > > 
> > > Hi Phillipp,
> > > 
> > > Just checking in:
> > > 
> > > Will you have a spinlock version of the patch available soon?
> > > 
> > > -Eric
> > > 
> > > ps, be sure to Cc: linux-bcache@vger.kernel.org on your replies :)
> > > 
> > 
> > Hi Eric,
> > honestly I don't know. I haven't touched C in a decade and would have
> > to read up on documentation first, setup the dev ecosystem with
> > build and testing and I have no clue when I can spare a day or two
> > for that. Removing a lock is one thing, adding one is a different beast.
> > 
> > Plus, for our arch the problem is solved, it's just not upstream,
> > so the priority has gone from "critical" to "none". I wouldn't be
> > surprised if I couldn't work on this until after the summer, sorry.
> 
> Thank you for letting me know, I'll take a look.

Hi Michael,

I'm writing you to give you a heads up about this bug:

Do you have any ideas on how it might be addressed?

--
Eric Wheeler



> 
> --
> Eric Wheeler
> 
> > 
> > - Phillipp
> > 
> > > >
> > > > --
> > > > Eric Wheeler
> > > >
> > > >
> > > >
> > > > >
> > > > > We're using amd64 only and had the patch tested with IO stress tests
> > > > > for
> > > > > 10 to 14 days and running in prod for 3 months now without issue.
> > > > > But that doesn't mean anything for the other architectures...
> > > > >
> > > > > Phillipp
> > > > >
> > > > > > > Signed-off-by: Phillipp Röll <phillipp.roell@trafficplex.de>
> > > > > > > ---
> > > > > > >  drivers/md/bcache/writeback.c | 4 ----
> > > > > > >  1 file changed, 4 deletions(-)
> > > > > > >
> > > > > > > diff --git a/drivers/md/bcache/writeback.c
> > > > > > b/drivers/md/bcache/writeback.c
> > > > > > > index 6ac2e48..90da3bd 100644
> > > > > > > --- a/drivers/md/bcache/writeback.c
> > > > > > > +++ b/drivers/md/bcache/writeback.c
> > > > > > > @@ -75,14 +75,10 @@ static void update_writeback_rate(struct
> > > > > > work_struct *work)
> > > > > > >  					     struct cached_dev,
> > > > > > >  					     writeback_rate_update);
> > > > > > >
> > > > > > > -	down_read(&dc->writeback_lock);
> > > > > > > -
> > > > > > >  	if (atomic_read(&dc->has_dirty) &&
> > > > > > >  	    dc->writeback_percent)
> > > > > > >  		__update_writeback_rate(dc);
> > > > > > >
> > > > > > > -	up_read(&dc->writeback_lock);
> > > > > > > -
> > > > > > >  	schedule_delayed_work(&dc->writeback_rate_update,
> > > > > > >  			      dc->writeback_rate_update_seconds *
> > > HZ);  }
> > > > > > > --
> > > > > > > 2.10.0
> > > > > > >
> > > > > > >
> > > > > > >
> > > > >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-bcache" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: AW: AW: [PATCH] md/bcache: Fix a deadlock while calculating writeback rate
  2017-10-27 19:22             ` Eric Wheeler
@ 2017-10-27 21:41               ` Eric Wheeler
  0 siblings, 0 replies; 7+ messages in thread
From: Eric Wheeler @ 2017-10-27 21:41 UTC (permalink / raw)
  To: Michael Lyle; +Cc: Phillipp Röll, linux-bcache

[-- Attachment #1: Type: TEXT/PLAIN, Size: 6279 bytes --]

On Fri, 27 Oct 2017, Michael Lyle wrote:
> On Fri, 27 Oct 2017, Eric Wheeler wrote:
> > On Thu, 18 May 2017, Eric Wheeler wrote:
> > 
> > > On Thu, 18 May 2017, Phillipp Röll wrote:
> > > 
> > > > 
> > > > 
> > > > > -----Ursprüngliche Nachricht-----
> > > > > Von: Eric Wheeler [mailto:bcache@lists.ewheeler.net]
> > > > > Gesendet: Donnerstag, 18. Mai 2017 19:05
> > > > > An: Phillipp Röll
> > > > > Cc: linux-bcache@vger.kernel.org
> > > > > Betreff: Re: AW: [PATCH] md/bcache: Fix a deadlock while calculating
> > > > > writeback rate
> > > > > 
> > > > > On Fri, 12 May 2017, Eric Wheeler wrote:
> > > > > 
> > > > > > On Wed, 10 May 2017, Phillipp Röll wrote:
> > > > > >
> > > > > > > Hi Eric,
> > > > > > >
> > > > > > > > Thank you for tracking this down!  Did you have an actual deadlock
> > > > > > > > case that this fixed?
> > > > > > >
> > > > > > > Yes. We had at least dozens, if not hundreds, of "kernel hung task"
> > > > > > > events followed by complete IO lockups on our production systems,
> > > > > > > with traces like
> > > > > > > this: https://paste42.de/059f78f7f9f27e9d88cd65fc02811e92/11749/
> > > > > > >
> > > > > > > Reproduction is actually pretty easy:
> > > > > > >
> > > > > > > - Setup bcache with SSD + HDD and LVM on top of the resulting bcache
> > > > > > > device
> > > > > > > - Create three LVs in the new cached VG
> > > > > > > - Start dd'ing from one of these LVs to another LV
> > > > > > > - Start fio with randrw on the third LV
> > > > > > > - Wait 5-10 minutes for the message to pop up and the system to lock
> > > > > > > up
> > > > > > >
> > > > > > > > I wonder: in the event of a race, can the updated P-factor be
> > > > > > > > invalid (ie, partially populated)? I don't believe 64-bit
> > > > > > > > assignments are atomic on all architectures (see
> > > > > > > > __update_writeback_rate) so you might get the upper or lower 32
> > > > > > > > bits of the P-factor or of other values assigned therein.  Could
> > > > > > > > the rate be calculated in such a race by the writeback thread such that
> > > > > no progress is made (eg, effectively rate==0)?
> > > > > > >
> > > > > > > That seems to be right, yes. I think I can add that spinlock. Would
> > > > > > > you say the spinlock should be used for the sysfs-info as well?
> > > > > >
> > > > > > Yes, anywhere it is used.  It might be a good idea to write a short
> > > > > > accessor function if there are lots of places that need to grab the lock.
> > > > > > Some kind of future-proofing would be a good idea to make it easy if
> > > > > > anyone else grabs the value and doesn't realize it needs to be locked.
> > > > > 
> > > > > Hi Phillipp,
> > > > > 
> > > > > Just checking in:
> > > > > 
> > > > > Will you have a spinlock version of the patch available soon?
> > > > > 
> > > > > -Eric
> > > > > 
> > > > > ps, be sure to Cc: linux-bcache@vger.kernel.org on your replies :)
> > > > > 
> > > > 
> > > > Hi Eric,
> > > > honestly I don't know. I haven't touched C in a decade and would have
> > > > to read up on documentation first, setup the dev ecosystem with
> > > > build and testing and I have no clue when I can spare a day or two
> > > > for that. Removing a lock is one thing, adding one is a different beast.
> > > > 
> > > > Plus, for our arch the problem is solved, it's just not upstream,
> > > > so the priority has gone from "critical" to "none". I wouldn't be
> > > > surprised if I couldn't work on this until after the summer, sorry.
> > > 
> > > Thank you for letting me know, I'll take a look.
> > 
> > Hi Michael,
> > 
> > I'm writing you to give you a heads up about this bug:
> > 
> > Do you have any ideas on how it might be addressed?
>
> This analysis seems incorrect-- at least for current code, and I don't
> think it's changed lately.  It may be possible to have a divide by
> zero on a very-unlikely data race-- I will peek at that a little more
> sometime.  But the delay sleeping is constrained to be a maximum of  
> 2.5 seconds no matter what the rate is...

Hi Michael,

I'm not sure I see the divide by zero issue.   Can you elaborate?

Phillipp Röll who originally reported this issue in May said that the 
patch below fixed his issue, but removing the locks might not be safe.

What do you think, is that safe or is there a better way to solve this?

--
Eric Wheeler


> 
> --
> Eric Wheeler
> 
> 
> 
> > 
> > --
> > Eric Wheeler
> > 
> > > 
> > > - Phillipp
> > > 
> > > > >
> > > > > --
> > > > > Eric Wheeler
> > > > >
> > > > >
> > > > >
> > > > > >
> > > > > > We're using amd64 only and had the patch tested with IO stress tests
> > > > > > for
> > > > > > 10 to 14 days and running in prod for 3 months now without issue.
> > > > > > But that doesn't mean anything for the other architectures...
> > > > > >
> > > > > > Phillipp
> > > > > >
> > > > > > > > Signed-off-by: Phillipp Röll <phillipp.roell@trafficplex.de>
> > > > > > > > ---
> > > > > > > >  drivers/md/bcache/writeback.c | 4 ----
> > > > > > > >  1 file changed, 4 deletions(-)
> > > > > > > >
> > > > > > > > diff --git a/drivers/md/bcache/writeback.c
> > > > > > > b/drivers/md/bcache/writeback.c
> > > > > > > > index 6ac2e48..90da3bd 100644
> > > > > > > > --- a/drivers/md/bcache/writeback.c
> > > > > > > > +++ b/drivers/md/bcache/writeback.c
> > > > > > > > @@ -75,14 +75,10 @@ static void update_writeback_rate(struct
> > > > > > > work_struct *work)
> > > > > > > >  					     struct cached_dev,
> > > > > > > >  					     writeback_rate_update);
> > > > > > > >
> > > > > > > > -	down_read(&dc->writeback_lock);
> > > > > > > > -
> > > > > > > >  	if (atomic_read(&dc->has_dirty) &&
> > > > > > > >  	    dc->writeback_percent)
> > > > > > > >  		__update_writeback_rate(dc);
> > > > > > > >
> > > > > > > > -	up_read(&dc->writeback_lock);
> > > > > > > > -
> > > > > > > >  	schedule_delayed_work(&dc->writeback_rate_update,
> > > > > > > >  			      dc->writeback_rate_update_seconds *
> > > > HZ);  }
> > > > > > > > --
> > > > > > > > 2.10.0
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > >
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-bcache" in
> > > the body of a message to majordomo@vger.kernel.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > 

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2017-10-27 21:41 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20170507155222.19114-1-phillipp.roell@trafficplex.de>
2017-05-09 18:53 ` [PATCH] md/bcache: Fix a deadlock while calculating writeback rate Eric Wheeler
     [not found]   ` <4e43a54e075047af9b2e3cf8a0452c1b@EX2013-CAS2.de2.eu-rbx.webhod.com>
2017-05-12 18:54     ` AW: " Eric Wheeler
2017-05-18 17:05       ` Eric Wheeler
2017-05-18 18:52         ` AW: " Phillipp Röll
2017-05-18 20:50           ` Eric Wheeler
2017-10-27 19:22             ` Eric Wheeler
2017-10-27 21:41               ` Eric Wheeler

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.