All of lore.kernel.org
* Formatting of backing device
@ 2012-02-01 10:10 Piergiorgio Sartor
       [not found] ` <20120201101041.GA2779-W+Wf6LxwHt0@public.gmane.org>
       [not found] ` <CAHYUNGYcs3CeRA8Pk-R_3hA6mFHshKzysxRaCcsfm3WLT__B0A@mail.gmail.com>
  0 siblings, 2 replies; 19+ messages in thread
From: Piergiorgio Sartor @ 2012-02-01 10:10 UTC (permalink / raw)
  To: linux-bcache-u79uwXL29TY76Z2rM5mHXA

Hi all,

first of all, I would like to congratulate you on this
project; I think it is one of the most promising
features the Linux kernel can have.

That said, I have a question about the concept of
formatting the backing device.

As far as I understood, the original concept of bcache
was to simply "register" or "attach" a cache to a
backing device; that is, the backing device did not
have to be formatted.

Lately, if I understood it correctly, this
behaviour was changed and the backing device
now needs to be formatted.

So, the question is:

How about a device that is already in use? Is it still
possible to attach a cache in that situation?

In general, would it be possible (in the future) to
attach/detach a cache to any already available device?
Or must the caching/backing setup be planned
before the HW is available, so to speak?

It would be useful (and cool too) to be able to
attach/detach the SSD cache on the fly (at run-time)
to any device that needs it.

I hope the questions are clear; if not, please
let me know.

Thanks a lot in advance,

bye,

-- 

piergiorgio


* Re: Formatting of backing device
       [not found] ` <20120201101041.GA2779-W+Wf6LxwHt0@public.gmane.org>
@ 2012-02-01 19:12   ` Adam Berkan
  0 siblings, 0 replies; 19+ messages in thread
From: Adam Berkan @ 2012-02-01 19:12 UTC (permalink / raw)
  To: Piergiorgio Sartor; +Cc: linux-bcache-u79uwXL29TY76Z2rM5mHXA

You can attach bcache to a drive with an existing file system, and it
will continue as normal.  If you connect to a drive without a file
system, then it will continue to not have a file system, but you can
format it while attached.

Attach/detach should work while the device is in use.  This isn't the
most tested code path, especially with writeback on, but it's supposed
to work.  Detaching while the cache is dirty requires flushing all
that data so performance will be bad until the detach completes.
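
A minimal sketch of why detaching a dirty writeback cache has to flush
first, written as a toy Python model. The class and method names are
hypothetical and this is not how bcache itself is implemented; it only
shows that dirty blocks live in the cache alone until they are copied
back, so the detach cost grows with the amount of dirty data.

```python
class ToyWritebackCache:
    """Toy model of a writeback cache in front of a slow backing device."""

    def __init__(self, backing):
        self.backing = backing  # dict: block number -> data on the slow disk
        self.dirty = {}         # blocks written to the cache but not yet to disk

    def write(self, block, data):
        # Writeback: the write is complete once it is in the cache.
        self.dirty[block] = data

    def read(self, block):
        # Newest data wins: dirty cache contents shadow the backing device.
        return self.dirty.get(block, self.backing.get(block))

    def detach(self):
        # Every dirty block must reach the backing device before the cache
        # can go away, so the work is proportional to the dirty data.
        flushed = 0
        for block, data in sorted(self.dirty.items()):
            self.backing[block] = data
            flushed += 1
        self.dirty.clear()
        return flushed


backing = {0: b"old"}
cache = ToyWritebackCache(backing)
cache.write(0, b"new")
cache.write(7, b"more")
stale_on_disk = backing[0]   # the disk still holds the old block here
flushed = cache.detach()     # copies both dirty blocks back
print(stale_on_disk, flushed, backing[0])
```

Until detach() returns, the backing device alone is out of date, which is
why the flush cannot be skipped.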

Let us know if you find any bugs.



* Re: Formatting of backing device
       [not found]   ` <CAHYUNGYcs3CeRA8Pk-R_3hA6mFHshKzysxRaCcsfm3WLT__B0A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2012-02-01 20:54     ` Piergiorgio Sartor
       [not found]       ` <20120201205456.GA7669-W+Wf6LxwHt0@public.gmane.org>
       [not found]       ` <CAHYUNGaB4LCESDWU1tWB1ZJp_kBH_=19e07vCndxXS5T98_xBA@mail.gmail.com>
  0 siblings, 2 replies; 19+ messages in thread
From: Piergiorgio Sartor @ 2012-02-01 20:54 UTC (permalink / raw)
  To: Adam Berkan; +Cc: Piergiorgio Sartor, linux-bcache-u79uwXL29TY76Z2rM5mHXA

Hi Adam,

thanks for the answer, see below.

On Wed, Feb 01, 2012 at 11:04:59AM -0800, Adam Berkan wrote:
> You can attach bcache to a drive with an existing file system, and it will
> continue as normal.  If you connect to a drive without a file system, then
> it will continue to not have a file system, but you can format it while
> attached.

Maybe I misused the term "format".

I did not mean filesystem format, but bcache format.

What I understood, maybe I'm wrong, is that the backing
device, before being used, must be "initialized" with
the bcache tool.

From the docs:

Getting started:
You'll need make-bcache from the bcache-tools repository. Both the cache device
and backing device must be formatted before use.
  make-bcache -B /dev/sdb
  make-bcache -C -w2k -b1M -j64 /dev/sdc

I understand this to mean that something gets
written on the backing device (note the term "formatted").

Am I wrong? I hope so...

Thanks again,

bye,

pg

-- 

piergiorgio


* Re: Formatting of backing device
       [not found]       ` <20120201205456.GA7669-W+Wf6LxwHt0@public.gmane.org>
@ 2012-02-01 21:43         ` Adam Berkan
  0 siblings, 0 replies; 19+ messages in thread
From: Adam Berkan @ 2012-02-01 21:43 UTC (permalink / raw)
  To: Piergiorgio Sartor; +Cc: linux-bcache-u79uwXL29TY76Z2rM5mHXA

Oh, sorry I misunderstood.

You have to run make-bcache once to add a bcache superblock to the
drive.  After that the drive contents are destroyed and it needs to be
formatted with a filesystem.

At that point you can attach or detach the drive while it is in use.



* Re: Formatting of backing device
       [not found]         ` <CAHYUNGaB4LCESDWU1tWB1ZJp_kBH_=19e07vCndxXS5T98_xBA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2012-02-01 21:44           ` Piergiorgio Sartor
       [not found]             ` <20120201214443.GA8544-W+Wf6LxwHt0@public.gmane.org>
  0 siblings, 1 reply; 19+ messages in thread
From: Piergiorgio Sartor @ 2012-02-01 21:44 UTC (permalink / raw)
  To: Adam Berkan; +Cc: Piergiorgio Sartor, linux-bcache-u79uwXL29TY76Z2rM5mHXA

Hi Adam,

On Wed, Feb 01, 2012 at 01:38:12PM -0800, Adam Berkan wrote:
> Oh, sorry I misunderstood.
> 
> You have to run make-bcache once to add a bcache superblock to the drive.
>  After that the drive contents are destroyed and it needs to be formatted
> with a filesystem.

ah! That's not good...

Is there any plan to have the caching device attachable
and detachable from *any* backing device without prior
"formatting" of this second one?

I think bcache is a very interesting and promising
project, but formatting the backing device is
something, I think, that should be avoided.

bye,

pg

-- 

piergiorgio


* Re: Formatting of backing device
       [not found]             ` <20120201214443.GA8544-W+Wf6LxwHt0@public.gmane.org>
@ 2012-02-01 23:11               ` Adam Berkan
  2012-02-02 19:01                 ` Piergiorgio Sartor
  0 siblings, 1 reply; 19+ messages in thread
From: Adam Berkan @ 2012-02-01 23:11 UTC (permalink / raw)
  To: Piergiorgio Sartor; +Cc: linux-bcache-u79uwXL29TY76Z2rM5mHXA

When we make-bcache on a drive we need to replace the filesystem
superblock with a bcache superblock so the kernel knows to load the
drive through bcache, but this destroys the filesystem.  We've talked
about hacky ways to hide the bcache superblock somewhere else, but
it's very dangerous stuff that's likely to fail and we don't want to
support it.
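
A rough sketch of the effect described above, as a toy Python model.
The signatures, offset and size below are made up for illustration and
are not the real on-disk layout of bcache or of any filesystem; the
point is only that one superblock area cannot identify the device as
both a filesystem and a bcache backing device.

```python
FS_MAGIC = b"TOYFS"          # made-up filesystem signature
BCACHE_MAGIC = b"TOYBCACHE"  # made-up stand-in for the bcache signature
SB_OFFSET = 0                # assumption: both signatures share one location
SB_SIZE = 16                 # assumption: size of the toy superblock area


def probe(device):
    """Return what a signature scan of the superblock area would find."""
    sb = bytes(device[SB_OFFSET:SB_OFFSET + SB_SIZE])
    if sb.startswith(BCACHE_MAGIC):
        return "bcache"
    if sb.startswith(FS_MAGIC):
        return "filesystem"
    return "unknown"


def make_toy_bcache(device):
    """Overwrite the superblock area, as formatting a backing device would."""
    device[SB_OFFSET:SB_OFFSET + SB_SIZE] = BCACHE_MAGIC.ljust(SB_SIZE, b"\0")


device = bytearray(64)                                   # toy 64-byte "disk"
device[SB_OFFSET:SB_OFFSET + len(FS_MAGIC)] = FS_MAGIC   # existing filesystem
before = probe(device)
make_toy_bcache(device)
after = probe(device)
print(before, "->", after)
```

After the overwrite the filesystem signature is gone, so the device is
only recognisable as a bcache device and has to be formatted again
through it.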

Adam





* Re: Formatting of backing device
  2012-02-01 23:11               ` Adam Berkan
@ 2012-02-02 19:01                 ` Piergiorgio Sartor
       [not found]                   ` <20120202190122.GA2353-W+Wf6LxwHt0@public.gmane.org>
  0 siblings, 1 reply; 19+ messages in thread
From: Piergiorgio Sartor @ 2012-02-02 19:01 UTC (permalink / raw)
  To: Adam Berkan; +Cc: Piergiorgio Sartor, linux-bcache-u79uwXL29TY76Z2rM5mHXA

Hi Adam,

On Wed, Feb 01, 2012 at 03:11:54PM -0800, Adam Berkan wrote:
> When we make-bcache on a drive we need to replace the filesystem
> superblock with a bcache superblock so the kernel knows to load the
> drive through bcache, but this destroys the filesystem.  We've talked

well, I guess it will destroy the md superblock 1.1 too,
how about LVM metadata?

I think the mismatch is with /dev/bcacheX device.

The first implementation, as far as I remember, was simply
telling the caching device (using UUID) which was the
backing device, i.e. it was registering the backing to
the caching.
Then, still if I got it right, the bcache was caching the
backing device directly, without any need of a third
device (/dev/bcacheX).

I understand that the current implementation is easier and,
maybe, simpler, since a completely new device is added,
which will have the new caching "features", while the
old one (backing device) is just a further layer.
This is similar to LVM over md over /dev/sdX.

Nevertheless, my opinion, while still considering bcache
a great project, is that it should work on already existing
devices, without touching them.

Anyway, thanks a lot for the chat,

bye,

pg

> about hacky ways to hide the bcache superblock somewhere else, but
> it's very dangerous stuff that's likely to fail and we don't want to
> support it.
> 
> Adam
> 
> 
> 

-- 

piergiorgio


* Re: Formatting of backing device
       [not found]                   ` <20120202190122.GA2353-W+Wf6LxwHt0@public.gmane.org>
@ 2012-02-02 22:11                     ` Kent Overstreet
       [not found]                       ` <20120202221101.GA26768-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 19+ messages in thread
From: Kent Overstreet @ 2012-02-02 22:11 UTC (permalink / raw)
  To: Piergiorgio Sartor; +Cc: Adam Berkan, linux-bcache-u79uwXL29TY76Z2rM5mHXA

On Thu, Feb 02, 2012 at 08:01:22PM +0100, Piergiorgio Sartor wrote:
> Hi Adam,
> 
> On Wed, Feb 01, 2012 at 03:11:54PM -0800, Adam Berkan wrote:
> > When we make-bcache on a drive we need to replace the filesystem
> > superblock with a bcache superblock so the kernel knows to load the
> > drive through bcache, but this destroys the filesystem.  We've talked
> 
> well, I guess it will destroy the md superblock 1.1 too,
> how about LVM metadata?
> 
> I think the mismatch is with /dev/bcacheX device.
> 
> The first implementation, as far as I remember, was simply
> telling the caching device (using UUID) which was the
> backing device, i.e. it was registering the backing to
> the caching.
> Then, still if I got it right, the bcache was caching the
> backing device directly, without any need of a third
> device (/dev/bcacheX).
> 
> I understand that the actual implementation is easier and,
> maybe, simpler, since a completely new device is added,
> which will have the new caching "features", while the
> old one (backing device) is just a further layer.
> This is similar to LVM over md over /dev/sdX.

The reason for getting rid of transparent caching didn't have anything
to do with ease of implementation: the real reason is that safely doing
persistent caching (and writeback!) is impossible with transparent
caching.

Adding back a mode that caches a device without a bcache superblock but
without the cache being persistent isn't out of the question, but it
wouldn't be terribly useful to us so it's not at all a priority for me.
If someone else wrote the code I'd take patches, though.


* Re: Formatting of backing device
       [not found]                       ` <20120202221101.GA26768-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
@ 2012-02-02 22:24                         ` Piergiorgio Sartor
  2012-02-16 19:42                           ` Alex Elsayed
  0 siblings, 1 reply; 19+ messages in thread
From: Piergiorgio Sartor @ 2012-02-02 22:24 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Piergiorgio Sartor, Adam Berkan, linux-bcache-u79uwXL29TY76Z2rM5mHXA

Hi Kent,

nice to have you in this discussion!

> > I understand that the actual implementation is easier and,
> > maybe, simpler, since a completely new device is added,
> > which will have the new caching "features", while the
> > old one (backing device) is just a further layer.
> > This is similar to LVM over md over /dev/sdX.
> 
> The reason for getting rid of transparent caching didn't have anything
> to do with ease of implementation: the real reason is that safely doing
> persistent caching (and writeback!) is impossible with transparent
> caching.

Well, it seems to me "impossible" is a big word...
I could imagine it is more "invasive".

> Adding back a mode that caches a device without a bcache superblock but
> without the cache being persistent isn't out of the question, but it

I'm missing the point: the superblock can be stored in
the caching device instead of the backing one, and
the actual device *could* stay the same.
The kernel would have to discover the caching device first,
the backing device later, and then put things together.
So the cache would still be persistent, wouldn't it?

As I wrote above, I see this as more complex than
adding a further layer; I would likely do the same.

> wouldn't be terribly useful to us so it's not at all a priority for me.
> If someone else wrote the code I'd take patches, though.

No time for that, unfortunately.

I take the opportunity to congratulate you
personally on this project, well done!

bye,

-- 

piergiorgio


* Re: Formatting of backing device
  2012-02-02 22:24                         ` Piergiorgio Sartor
@ 2012-02-16 19:42                           ` Alex Elsayed
       [not found]                             ` <loom.20120216T200235-190-eS7Uydv5nfjZ+VzJOa5vwg@public.gmane.org>
  0 siblings, 1 reply; 19+ messages in thread
From: Alex Elsayed @ 2012-02-16 19:42 UTC (permalink / raw)
  To: linux-bcache-u79uwXL29TY76Z2rM5mHXA

Piergiorgio Sartor <piergiorgio.sartor@...> writes:

> > The reason for getting rid of transparent caching didn't have anything
> > to do with ease of implementation: the real reason is that safely doing
> > persistent caching (and writeback!) is impossible with transparent
> > caching.
>
> Well, it seems to me "impossible" is a big word...
> I could image is more "invasive".

Not invasive, *horribly unsafe*

> > Adding back a mode that caches a device without a bcache superblock but
> > without the cache being persistent isn't out of the question, but it
> 
> I miss the point, the superblock can be stored in
> the caching device, instead of the backing and
> the actual device *could* stay the same.
> The kernel would have to discover first the caching,
> later the backing and then put things together.
> So, the cache will be persistent, or?

Oh sure, the cache is persistent. But device discovery order is undefined, and
if the backing device is no different from one without a cache and writeback
caching is enabled the kernel has no *possible* way to know that a caching
device is needed or even exists. So it mounts it, but it doesn't have any of the
data in the writeback cache meaning it thinks the filesystem is corrupted.
Depending on the filesystem and exactly what is missing, it may run some
in-kernel recovery code that alters the disk. You just lost your data.
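
A minimal sketch of the failure described above, in Python. It assumes a toy model in which a device is just a mapping from block number to contents; the names `backing`, `dirty_cache`, `read_raw` and `read_through_cache` are hypothetical and do not reflect bcache's real on-disk layout.

```python
# Toy model (an assumption for illustration): block number -> contents.
backing = {0: "fs-superblock v1", 1: "journal v1", 2: "data v1"}

# In writeback mode these writes have reached the cache only,
# not yet the backing device.
dirty_cache = {1: "journal v2", 2: "data v2"}


def read_through_cache(block):
    """What the filesystem sees when the cache is attached."""
    return dirty_cache.get(block, backing[block])


def read_raw(block):
    """What the filesystem sees if the backing device is mounted alone."""
    return backing[block]


# Blocks whose raw contents differ from the up-to-date contents.
stale = [b for b in backing if read_raw(b) != read_through_cache(b)]
print(stale)
```

With no marker on the backing device, nothing distinguishes this state from a clean disk, so a mount of the raw device would read the stale blocks as if they were current.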

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Formatting of backing device
       [not found]                             ` <loom.20120216T200235-190-eS7Uydv5nfjZ+VzJOa5vwg@public.gmane.org>
@ 2012-02-16 20:33                               ` Piergiorgio Sartor
       [not found]                                 ` <20120216203332.GA6597-W+Wf6LxwHt0@public.gmane.org>
  0 siblings, 1 reply; 19+ messages in thread
From: Piergiorgio Sartor @ 2012-02-16 20:33 UTC (permalink / raw)
  To: Alex Elsayed; +Cc: linux-bcache-u79uwXL29TY76Z2rM5mHXA

Hi Alex,

> Oh sure, the cache is persistent. But device discovery order is undefined, and
> if the backing device is no different from one without a cache and writeback
> caching is enabled the kernel has no *possible* way to know that a caching
> device is needed or even exists. So it mounts it, but it doesn't have any of the
> data in the writeback cache meaning it thinks the filesystem is corrupted.
> Depending on the filesystem and exactly what is missing, it may run some
> in-kernel recovery code that alters the disk. You just lost your data.

nonono, I believe I wrote that the kernel
should *first* look for caching devices
and later for the others...

Formatting is clearly a much more standard
approach for the current kernel architecture,
but nothing forbids a hierarchical search
of devices.
This could be done, for example, by assigning
different classes to each device type, to
be scanned in a specific order.

In this scope (not bcache, but device discovery),
a layered software RAID mixing metadata 1.0 with
1.2 (or 1.1) is already a problem: the first lies
at the end and the second at the beginning of the
HDDs, making it difficult (but not impossible) to
find out which is the outer and which is the
inner one.

bye,

-- 

piergiorgio

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Formatting of backing device
       [not found]                                 ` <20120216203332.GA6597-W+Wf6LxwHt0@public.gmane.org>
@ 2012-02-16 20:50                                   ` Alex Elsayed
       [not found]                                     ` <CA++fp8wcxTDJ=mbsKmWi27+yRZg-tyNdWgmhWU6=UeWgC0TZuw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 19+ messages in thread
From: Alex Elsayed @ 2012-02-16 20:50 UTC (permalink / raw)
  To: Piergiorgio Sartor; +Cc: linux-bcache-u79uwXL29TY76Z2rM5mHXA

On Thu, Feb 16, 2012 at 12:33 PM, Piergiorgio Sartor
<piergiorgio.sartor-KvP5wT2u2U0@public.gmane.org> wrote:
> Hi Alex,
>
>> Oh sure, the cache is persistent. But device discovery order is undefined, and
>> if the backing device is no different from one without a cache and writeback
>> caching is enabled the kernel has no *possible* way to know that a caching
>> device is needed or even exists. So it mounts it, but it doesn't have any of the
>> data in the writeback cache meaning it thinks the filesystem is corrupted.
>> Depending on the filesystem and exactly what is missing, it may run some
>> in-kernel recovery code that alters the disk. You just lost your data.
>
> nonono, I believe I wrote that the kernel
> should *first* look for caching devices
> and later for the others...
>
> The formatting thing is, clearly, a much
> standard approach, for the current kernel
> architecture, but nothing forbids to have
> a hierarchical search of devices.
> This could be done, for example, by assigning
> different classes to each device type, to
> be scanned in a specific order.
>
> In this scope (not bcache, but device discovery)
> it is already a problem a layered software RAID
> with metadata 1.0 together with 1.2 (or 1.1).
> Where the first lies at the end and the second
> at the beginning of the HDDs, making it difficult
> (but not impossible) to find out which is the
> outer and which is the inner one.

The difference is that for MD devices, both types
of metadata are on the same block device. You're
prioritizing which *type of metadata* is checked
for first in that case. For bcache, you'd have to
scan /dev/sdz before /dev/sda if sdz is the cache
and sda is the backing device. Now consider a
few things:

1.) SCSI/SATA devices may be probed in parallel

2.) udev gets events when each device is probed,
*not* after all devices have been probed

3.) The bcache device may not even be attached
to the system at the time

4.) Even in the MD case, there is still *some*
change to the backing device, there is still some
sort of data there that says "hey, there's more."
A totally unchanged backing device won't do that.
Even if it doesn't invalidate the other metadata, it
still tells the kernel that it's not enough - think of
it as invalidating it at the logical rather than the
physical level

3 and 4 are the really critical ones. If the cable
that connects the SSD to the computer is flaky,
and it never gets probed, and there is *no*
metadata on the backing device, there is
*exactly* zero information available to the kernel
to inform it that a backing device ever existed at all.

Also, you say that the cache must be scanned
before the backing device - but how do you know
it's a cache or a backing device until you've probed it?
You could delay sending any uevents until all
devices are probed, except there are some devices
that take 30sec timeouts and fail, or iSCSI, or devices
that get plugged in at runtime, or...

And since you can't do that, you have a chicken
and egg problem. You can't probe the backing
device before the cache, but you don't know which
is the cache until you probe it. And there may be
more than one of each. You can have one cache
and 200 backing devices, in theory. Want to take
the odds that the cache gets probed first at random?
Because the kernel doesn't have enough information
for it to be anything other than random.
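
To put a number on those odds, here is a small Python sketch. It assumes the probe order is uniformly random (an assumption for illustration; real probe order is merely unspecified), and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def p_cache_first(n_backing):
    """Chance that one cache is probed before all n_backing backing
    devices, assuming every probe order is equally likely."""
    return Fraction(1, n_backing + 1)


def p_cache_first_bruteforce(n_backing):
    """Same quantity, counted over every possible order (small n only)."""
    devices = ["cache"] + [f"backing{i}" for i in range(n_backing)]
    orders = list(permutations(devices))
    hits = sum(1 for order in orders if order[0] == "cache")
    return Fraction(hits, len(orders))


print(p_cache_first(200))           # one cache, 200 backing devices
print(p_cache_first_bruteforce(3))  # cross-check on a tiny case
```

Under that assumption, the one-cache, 200-backing-device case gives the cache a 1 in 201 chance of being seen first.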

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Formatting of backing device
       [not found]                                     ` <CA++fp8wcxTDJ=mbsKmWi27+yRZg-tyNdWgmhWU6=UeWgC0TZuw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2012-02-16 20:52                                       ` Alex Elsayed
  2012-02-16 22:35                                       ` Piergiorgio Sartor
  1 sibling, 0 replies; 19+ messages in thread
From: Alex Elsayed @ 2012-02-16 20:52 UTC (permalink / raw)
  To: Piergiorgio Sartor; +Cc: linux-bcache-u79uwXL29TY76Z2rM5mHXA

On Thu, Feb 16, 2012 at 12:50 PM, Alex Elsayed <eternaleye-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> On Thu, Feb 16, 2012 at 12:33 PM, Piergiorgio Sartor
> <piergiorgio.sartor-KvP5wT2u2U0@public.gmane.org> wrote:
>> Hi Alex,
>>
>>> Oh sure, the cache is persistent. But device discovery order is undefined, and
>>> if the backing device is no different from one without a cache and writeback
>>> caching is enabled the kernel has no *possible* way to know that a caching
>>> device is needed or even exists. So it mounts it, but it doesn't have any of the
>>> data in the writeback cache meaning it thinks the filesystem is corrupted.
>>> Depending on the filesystem and exactly what is missing, it may run some
>>> in-kernel recovery code that alters the disk. You just lost your data.
>>
>> nonono, I believe I wrote that the kernel
>> should *first* look for caching devices
>> and later for the others...
>>
>> The formatting thing is, clearly, a much
>> standard approach, for the current kernel
>> architecture, but nothing forbids to have
>> a hierarchical search of devices.
>> This could be done, for example, by assigning
>> different classes to each device type, to
>> be scanned in a specific order.
>>
>> In this scope (not bcache, but device discovery)
>> it is already a problem a layered software RAID
>> with metadata 1.0 together with 1.2 (or 1.1).
>> Where the first lies at the end and the second
>> at the beginning of the HDDs, making it difficult
>> (but not impossible) to find out which is the
>> outer and which is the inner one.
>
> The difference is that for MD devices, both types
> of metadata are on the same block device. You're
> prioritizing which *type of metadata* is checked
> for first in that case. For bcache, you'd have to
> scan /dev/sdz before /dev/sda if sdz is the cache
> and sda is the backing device. Now consider a
> few things:
>
> 1.) SCSI/SATA devices may be probed in parallel
>
> 2.) udev gets events when each device is probed,
> *not* after all devices have been probed
>
> 3.) The bcache device may not even be attached
> to the system at the time
>
> 4.) Even in the MD case, there is still *some*
> change to the backing device, there is still some
> sort of data there that says "hey, there's more."
> A totally unchanged backing device won't do that.
> Even if it doesn't invalidate the other metadata, it
> still tells the kernel that it's not enough - think of
> it as invalidating it at the logical rather than the
> physical level
>
> 3 and 4 are the really critical ones. If the cable
> that connects the SSD to the computer is flaky,
> and it never gets probed, and there is *no*
> metadata on the backing device, there is
> *exactly* zero information available to the kernel
> to inform it that a backing device ever existed at all.

Er, to inform it that a *cache* device ever existed

>
> Also, you say that the cache must be scanned
> before the backing device - but how do you know
> it's a cache or a backing device until you've probed it?
> You could delay sending any uevents untill all
> devices are probed, except there are some devices
> that take 30sec timeouts and fail, or iscsi, or devices
> that get plugged in at runtime, or...
>
> And since you can't do that, you have a chicken
> and egg problem. You can't probe the backing
> device before the cache, but you don't know which
> is the cache until you probe it. And there may be
> more than one of each. You can have one cache
> and 200 backing devices, in theory. Want to take
> the odds that the cache gets probed first at random?
> Because the kernel doesn't have enough information
> for it to be anything other than random.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Formatting of backing device
       [not found]                                     ` <CA++fp8wcxTDJ=mbsKmWi27+yRZg-tyNdWgmhWU6=UeWgC0TZuw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2012-02-16 20:52                                       ` Alex Elsayed
@ 2012-02-16 22:35                                       ` Piergiorgio Sartor
       [not found]                                         ` <20120216223554.GA6947-W+Wf6LxwHt0@public.gmane.org>
  1 sibling, 1 reply; 19+ messages in thread
From: Piergiorgio Sartor @ 2012-02-16 22:35 UTC (permalink / raw)
  To: Alex Elsayed; +Cc: Piergiorgio Sartor, linux-bcache-u79uwXL29TY76Z2rM5mHXA

Hi Alex.

> The difference is that for MD devices, both types
> of metadata are on the same block device. You're
> prioritizing which *type of metadata* is checked

how? 1.0 in 1.1 is the same as 1.1 in 1.0...
The only difference would be that one is smaller
than the other, which can hint which is first
and which is second.

> for first in that case. For bcache, you'd have to
> scan /dev/sdz before /dev/sda if sdz is the cache
> and sda is the backing device. Now consider a
> few things:

Again, you scan *all* and check *only* for
cache devices.
After that, if none are found, you have your list
of devices; if some are found, you activate these
first and then the corresponding backing devices.

> 1.) SCSI/SATA devices may be probed in parallel

And this does not make any difference in
this context. Probed does not necessarily mean
activated. Maybe you mean probed as activated.
For me it is different.

> 2.) udev gets events when each device is probed,
> *not* after all devices have been probed

This is a udev issue, which can be fixed... :-)

> 3.) The bcache device may not even be attached
> to the system at the time

Good, so the persistency is not needed, I guess,
in that case...
Or, the backing device cannot be activated,
which might be an option, in the current
architecture, but, maybe a bit borderline.

> 4.) Even in the MD case, there is still *some*
> change to the backing device, there is still some
> sort of data there that says "hey, there's more."

If you mean the 1.1 in 1.0 (or the other way around),
there is no information telling you there's more,
except, as mentioned, the size, which is not directly
related to device probing.

Otherwise, I do not understand what you mean.

> Even if it doesn't invalidate the other metadata, it
> still tells the kernel that it's not enough - think of
> it as invalidating it at the logical rather than the
> physical level
> 
> 3 and 4 are the really critical ones. If the cable
> that connects the SSD to the computer is flaky,

In this case you've much more serious problems,
I guess, this is not a use case.
The cable can be flaky also after the probing
and activation, and result in a disaster.

> Also, you say that the cache must be scanned
> before the backing device - but how do you know
> it's a cache or a backing device until you've probed it?

The cache has a "header" with enough information,
namely the UUID(s) of the backing device(s).
So you probe (I use "scan") all devices, sort out
caches, sort out backing and the rest.
Then you activate in proper order.
There are many other alternatives.
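
A rough Python sketch of this scan-then-activate idea. The record layout (`name`, `uuid`, `cache_for`) and the function `plan_activation` are hypothetical, chosen for illustration; it also assumes that every device has already been scanned when the function runs, which is exactly the point under debate.

```python
def plan_activation(devices):
    """Return device names in activation order: caches first, then the
    backing devices they claim, then everything else.

    devices: list of dicts with 'name', 'uuid' and, for caches only,
    'cache_for' (list of backing-device UUIDs found in the cache header).
    """
    caches = [d for d in devices if d.get("cache_for")]
    claimed = {uuid for c in caches for uuid in c["cache_for"]}
    backing = [d for d in devices
               if d["uuid"] in claimed and not d.get("cache_for")]
    rest = [d for d in devices if d not in caches and d not in backing]
    return [d["name"] for d in caches + backing + rest]


scanned = [
    {"name": "sda", "uuid": "A"},
    {"name": "sdb", "uuid": "B"},
    {"name": "sdz", "uuid": "Z", "cache_for": ["A"]},
]
print(plan_activation(scanned))
print(plan_activation(scanned[:2]))  # cache missing from the scan
```

In the second call the cache is absent, so `sda` falls into the "rest" group and would be activated as an ordinary device.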

> You could delay sending any uevents untill all
> devices are probed, except there are some devices
> that take 30sec timeouts and fail, or iscsi, or devices
> that get plugged in at runtime, or...

Those are *all* solvable problems. Some of
them are even too generic. That is, they're
problems in any case.

As I wrote a few posts ago, it is clear why it is
like it is. It is *complex* to implement all the
required changes in order to have the backing
device unformatted, which has, in the end,
limited advantage.

No problem with that, very fine for me, but
saying that it is not possible is just,
well, let's say funny.

> And since you can't do that, you have a chicken
> and egg problem. You can't probe the backing
> device before the cache, but you don't know which
> is the cache until you probe it. And there may be
> more than one of each. You can have one cache
> and 200 backing devices, in theory. Want to take
> the odds that the cache gets probed first at random?
> Because the kernel doesn't have enough information
> for it to be anything other than random.

The kernel, again, has to separate the probing
process, from the activation process.

Furthermore, it could always be possible to
configure the booting process to do so, in
an *explicit* way, like md does usually, i.e.
with a configuration file (in initramfs).

bye,

-- 

piergiorgio

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Formatting of backing device
       [not found]                                         ` <20120216223554.GA6947-W+Wf6LxwHt0@public.gmane.org>
@ 2012-02-16 23:09                                           ` Joseph Glanville
       [not found]                                             ` <CAOzFzEhO+6ECN-WjvtMK+-2g7Dwo+DPwQMVWuCZG=Y3BVRNEBw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 19+ messages in thread
From: Joseph Glanville @ 2012-02-16 23:09 UTC (permalink / raw)
  To: Piergiorgio Sartor; +Cc: Alex Elsayed, linux-bcache-u79uwXL29TY76Z2rM5mHXA

Hi Piergiorgio,

Your reasoning is quite sound assuming the cache device is present at
activation time.

In the case where the cache device has failed but the backing device
has survived the failure, the case looks somewhat more like this:
1) OS probes all devices, searches for caches and finds none.
2) Activate the raw backing device with possibly corrupt data....

This is the primary reason Alex has been trying to convince you of the
necessity of the superblock on the backing device: it exists to tell
the kernel not to try to activate it raw if the cache is not found.
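
The role of the superblock can be sketched as a small decision function in Python. This is an illustration under assumed names (`activation_decision` and its return strings are hypothetical), not the kernel's actual logic.

```python
def activation_decision(has_bcache_superblock, cache_found):
    """Decide what to do with a backing device after scanning.

    has_bcache_superblock: the backing device carries a marker saying
    that it belongs to a cached setup.
    cache_found: a matching cache device was seen during the scan.
    """
    if cache_found:
        return "activate cached"
    if has_bcache_superblock:
        # The marker says data may live elsewhere: do not touch the disk.
        return "refuse: wait for cache"
    # No marker and no cache: indistinguishable from a plain disk.
    return "activate raw"


print(activation_decision(has_bcache_superblock=True, cache_found=False))
print(activation_decision(has_bcache_superblock=False, cache_found=False))
```

Only the first case leaves the situation recoverable once the cache device shows up again.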

Joseph.

On 17 February 2012 09:35, Piergiorgio Sartor
<piergiorgio.sartor-KvP5wT2u2U0@public.gmane.org> wrote:
> Hi Alex.
>
>> The difference is that for MD devices, both types
>> of metadata are on the same block device. You're
>> prioritizing which *type of metadata* is checked
>
> how? 1.0 in 1.1 is the same as 1.1 in 1.0...
> The only difference would be that one is smaller
> than the other, which can hint which is first
> and which is second.
>
>> for first in that case. For bcache, you'd have to
>> scan /dev/sdz before /dev/sda if sdz is the cache
>> and sda is the backing device. Now consider a
>> few things:
>
> Again, you scan *all* and check *only* for
> cache devices.
> After that, if none found, you've your list of
> devices, if someone found, you activate these
> first and then the corresponding backing device.
>
>> 1.) SCSI/SATA devices may be probed in parallel
>
> And this does not make any difference, in
> this context. Probed does not mean necessarily
> activated. Maybe you mean probed as activated.
> For me it is different.
>
>> 2.) udev gets events when each device is probed,
>> *not* after all devices have been probed
>
> This is a udev issue, which can be fixed... :-)
>
>> 3.) The bcache device may not even be attached
>> to the system at the time
>
> Good, so the persistency is not needed, I guess,
> in that case...
> Or, the backing device cannot be activated,
> which might be an option, in the current
> architecture, but, maybe a bit borderline.
>
>> 4.) Even in the MD case, there is still *some*
>> change to the backing device, there is still some
>> sort of data there that says "hey, there's more."
>
> If you mean the 1.1 in 1.0 (or the other way around),
> there is no information telling you there's more,
> except, as mentioned, the size, which is not directly
> related to device probing.
>
> Otherwise, I do not understand what do you mean.
>
>> Even if it doesn't invalidate the other metadata, it
>> still tells the kernel that it's not enough - think of
>> it as invalidating it at the logical rather than the
>> physical level
>>
>> 3 and 4 are the really critical ones. If the cable
>> that connects the SSD to the computer is flaky,
>
> In this case you've much more serious problems,
> I guess, this is not a use case.
> The cable can be flaky also after the probing
> and activation, and result in a disaster.
>
>> Also, you say that the cache must be scanned
>> before the backing device - but how do you know
>> it's a cache or a backing device until you've probed it?
>
> The cache has ad "header" with enough information,
> namely the UUID(s) of the backing device(s)
> So you probe (I use "scan") all devices, sort out
> caches, sort out backing and the rest.
> Then you activate in proper order.
> There are many other alternatives.
>
>> You could delay sending any uevents untill all
>> devices are probed, except there are some devices
>> that take 30sec timeouts and fail, or iscsi, or devices
>> that get plugged in at runtime, or...
>
> Those are *all* solvable problems. Some of
> them are even too generic. That is, they're
> problems in any case.
>
> As I wrote few posts ago, it is clear why it is
> like it is. It is *complex* to implement all the
> required changes in order to have the backing
> device unformatted. Which has, in the end,
> limited advantage.
>
> No problem with that, very fine for me, but
> telling the it is not possible, it is just,
> well, let's say funny.
>
>> And since you can't do that, you have a chicken
>> and egg problem. You can't probe the backing
>> device before the cache, but you don't know which
>> is the cache until you probe it. And there may be
>> more than one of each. You can have one cache
>> and 200 backing devices, in theory. Want to take
>> the odds that the cache gets probed first at random?
>> Because the kernel doesn't have enough information
>> for it to be anything other than random.
>
> The kernel, again, has to separate the probing
> process, from the activation process.
>
> Furthermore, it could always be possible to
> configure the booting process to do so, in
> an *explicit* way, like md does usually, i.e.
> with a configuration file (in initramfs).
>
> bye,
>
> --
>
> piergiorgio



-- 
Founder | Director | VP Research
Orion Virtualisation Solutions | www.orionvm.com.au | Phone: 1300 56
99 52 | Mobile: 0428 754 846

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Formatting of backing device
       [not found]                                             ` <CAOzFzEhO+6ECN-WjvtMK+-2g7Dwo+DPwQMVWuCZG=Y3BVRNEBw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2012-02-16 23:17                                               ` Piergiorgio Sartor
       [not found]                                                 ` <20120216231754.GA14206-W+Wf6LxwHt0@public.gmane.org>
  0 siblings, 1 reply; 19+ messages in thread
From: Piergiorgio Sartor @ 2012-02-16 23:17 UTC (permalink / raw)
  To: Joseph Glanville
  Cc: Piergiorgio Sartor, Alex Elsayed, linux-bcache-u79uwXL29TY76Z2rM5mHXA

Hi Joseph,

> Your reasoning is quite sound assuming the cache device is present at
> activation time.
> 
> In the case where the cache device has failed but the backing device
> has persisted the failure then the case looks somewhat more like this:
> 1) OS probes all devices, searches for caches and finds none.
> 2) Activate the raw backing device with possibly corrupt data....

as I mentioned, this is a bit borderline.

One reason is that it would be a failure in
any case, depending on what the system will
do with the backing device.

Second, as per md, the configuration could
be in a file in initramfs, which would make it
possible to support this type of failure *and*
have the backing device unformatted.

In other words, it does not need to be
activated automatically by the kernel, it can
be done by the user, like md...

As I wrote before, I'm fine with the formatting,
very clear and understandable.

bye,

-- 

piergiorgio

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Formatting of backing device
       [not found]                                                 ` <20120216231754.GA14206-W+Wf6LxwHt0@public.gmane.org>
@ 2012-02-16 23:34                                                   ` Alex Elsayed
       [not found]                                                     ` <CA++fp8w7_uUd35Tcwy1bwEYpR6tJ+fkWMEg+iVEyJ1H4hqKBKg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 19+ messages in thread
From: Alex Elsayed @ 2012-02-16 23:34 UTC (permalink / raw)
  To: Piergiorgio Sartor; +Cc: Joseph Glanville, linux-bcache-u79uwXL29TY76Z2rM5mHXA

On Thu, Feb 16, 2012 at 3:17 PM, Piergiorgio Sartor
<piergiorgio.sartor-KvP5wT2u2U0@public.gmane.org> wrote:
> Hi joseph,
>
>> Your reasoning is quite sound assuming the cache device is present at
>> activation time.
>>
>> In the case where the cache device has failed but the backing device
>> has persisted the failure then the case looks somewhat more like this:
>> 1) OS probes all devices, searches for caches and finds none.
>> 2) Activate the raw backing device with possibly corrupt data....
>
> as I mentioned, this is a bit borderline.
>
> One reason is that it would be a failure in
> any case, depending on what the system will
> do with the backing device.

Perhaps, but there are two types of failures that are
absolutely critical to distinguish between:

Recoverable, and unrecoverable.

If there is a superblock, any error in which the cache
device is not available at activation is recoverable so
long as the cache device can be made available at
some other time.

If there is a superblock, whether such a situation is
recoverable is now undefined, and dependent on the
implementation of the filesystem.

This is a recipe for a horrible disaster.

> Second, as per md, the configuration could
> be in a file in initramfs, which will allow
> to support this type of failure *and* have
> the backing device unformatted.

Actually, in modern initramfs implementations (see
dracut) the way md devices are set up is via dynamic
scanning, NOT via a static configuration file.

This is possible *because* md devices have a
superblock on the backing devices. This is
*desirable* because a generic initramfs reduces
the burden on the user (to know what they are
doing) and on the distribution (to support users
who roll their own initramfs)

And dracut's entire logic is based on acting on
devices as they are detected, so delaying all
uevents until everything has been found would
catastrophically break it. Especially because
its acting on those events can create more devices
which also need to be probed.

What if your cache is on LVM but your backing
devices are whole disks? Waiting until all devices
have been probed before poking userspace means
that you will never find the cache at all.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Formatting of backing device
       [not found]                                                     ` <CA++fp8w7_uUd35Tcwy1bwEYpR6tJ+fkWMEg+iVEyJ1H4hqKBKg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2012-02-16 23:35                                                       ` Alex Elsayed
  2012-02-17 19:12                                                       ` Piergiorgio Sartor
  1 sibling, 0 replies; 19+ messages in thread
From: Alex Elsayed @ 2012-02-16 23:35 UTC (permalink / raw)
  To: Piergiorgio Sartor; +Cc: Joseph Glanville, linux-bcache-u79uwXL29TY76Z2rM5mHXA

> If there is a superblock, whether such a situation is
> recoverable is now undefined, and dependent on the
 Argh. "If there is *not* a superblock"

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Formatting of backing device
       [not found]                                                     ` <CA++fp8w7_uUd35Tcwy1bwEYpR6tJ+fkWMEg+iVEyJ1H4hqKBKg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2012-02-16 23:35                                                       ` Alex Elsayed
@ 2012-02-17 19:12                                                       ` Piergiorgio Sartor
  1 sibling, 0 replies; 19+ messages in thread
From: Piergiorgio Sartor @ 2012-02-17 19:12 UTC (permalink / raw)
  To: Alex Elsayed
  Cc: Piergiorgio Sartor, Joseph Glanville,
	linux-bcache-u79uwXL29TY76Z2rM5mHXA

Hi Alex,

> Perhaps, but there are two types of failures that are
> absolutely critical to distinguish between:
> 
> Recoverable, and unrecoverable.
> 
> If there is a superblock, any error in which the cache
> device is not available at activation is recoverable so
> long as the cache device can be made available at
> some other time.

Yes and no: it depends, as I wrote before, on
*whether* the cache device comes back and on what
the system is doing with the volume.
 
> If there is a superblock, whether such a situation is
> recoverable is now undefined, and dependent on the
> implementation of the filesystem.

OK, let's say it differently, the superblock
*could* be in a different place than the
backing device.
 
> Actually, in modern initramfs' (see dracut) the
> way md devices are set up is via dynamic scanning,
> NOT via a static configuration file.

Actually, the dynamic scanning is done in
user space, not in kernel space.
It could be done using "mdadm.conf" or, by
"udev", using "mdadm -I" and proper udev rules.
This could be replicated with bcache.

As an example, and please note this is just
an example, so no nitpicking, we can consider
the following.

First of all, what is required is to activate
the bcache system from boot and to be able to
use the persistent caching, maybe even in
writeback mode (backing not in sync).

What we need is:

1) udev rule, which is triggered by any storage device
found by the kernel. This should support the skipping
of following rules. AFAIK this is somehow supported
in udev, if not it will require a patch.

2) User space tool, let's call it "bcacheadm", which can
activate bcache devices. This is called by the above
udev rule, must keep state across calls (it could use
the /dev/ fs or daemonize, for example) and should
have proper return codes, in order to allow udev to
skip following rules.

3) Configuration file, which contains, in the simple
case, pairs of device UUIDs, let's say caching-backing.
In case of more complex configurations it could be a
human (un)readable xml file.

Everything is packed into the initramfs, like it is
nowadays done with mdadm.

When a storage device pops up, udev calls the
bcache rule first. bcacheadm will then check if the
device is in the configuration list. If not, it will
just return and the following udev rules will run.
If yes, it will "copy" the device into the proper slot
(figuratively, slot in the list) and, if the slot is
full (caching and backing present), it will ask the
kernel to create the bcache device (and trigger a
further udev event). If the slot is not full, then
it will return and inform udev to skip the following
rules (for this device).
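
The slot logic just described could be sketched in Python as follows. "bcacheadm" is the hypothetical tool named above; the class `SlotTable`, its method `on_device` and the action names "pass", "hold" and "create" are likewise assumptions made for illustration, and state is simply kept in memory rather than across separate udev invocations.

```python
class SlotTable:
    """In-memory stand-in for the state a 'bcacheadm' would keep."""

    def __init__(self, pairs):
        # pairs: (cache_uuid, backing_uuid) entries from the config file.
        self.pairs = [tuple(p) for p in pairs]
        self.seen = set()

    def on_device(self, uuid):
        """Handle one 'device appeared' event.

        Returns (action, ready_pairs):
          "pass"   - device not in the config, let other udev rules run
          "hold"   - device is ours, but its slot is not full yet
          "create" - at least one slot is now full, create bcache device(s)
        """
        matching = [p for p in self.pairs if uuid in p]
        if not matching:
            return ("pass", [])
        self.seen.add(uuid)
        ready = [p for p in matching
                 if p[0] in self.seen and p[1] in self.seen]
        return ("create", ready) if ready else ("hold", [])


table = SlotTable([("Z", "A"), ("Z", "B")])  # one cache, two backing
events = [table.on_device(u) for u in ["X", "A", "Z", "B"]]
print(events)
```

Note that a device stuck in "hold" stays unusable until its partner appears, which is the behaviour the configuration file is meant to guarantee.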

As I wrote above, this all runs in initramfs, like it
happens for md devices.
This is a bit complex, but I'm pretty sure smart
people can do better.

More or less the initial requirements are fulfilled.

This approach will, de facto, detach the superblock
from the backing device and put it in the config file.

What do we gain? A backing device unformatted.

What do we pay? Apart from a little complexity, we
introduce a single point of failure, namely the
configuration file. If this is lost, damaged or
changed unintentionally, we can, potentially,
create the situation where the backing device
is activated without cache.
Of course, the information in this config file
could be redundant and several fail-safe mechanisms
could be considered.

Is this worth it? If you ask me, it is *NOT* worth it.

Nevertheless, my point is that it is *possible*.
Complexity is the limitation, the several dependencies,
the udev weaknesses, and so on.

That's why, I write it again, I fully agree on having
the superblock on the backing device.

Sorry for the long post,

bye,

-- 

piergiorgio

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2012-02-17 19:12 UTC | newest]

Thread overview: 19+ messages
2012-02-01 10:10 Formatting of backing device Piergiorgio Sartor
     [not found] ` <20120201101041.GA2779-W+Wf6LxwHt0@public.gmane.org>
2012-02-01 19:12   ` Adam Berkan
     [not found] ` <CAHYUNGYcs3CeRA8Pk-R_3hA6mFHshKzysxRaCcsfm3WLT__B0A@mail.gmail.com>
     [not found]   ` <CAHYUNGYcs3CeRA8Pk-R_3hA6mFHshKzysxRaCcsfm3WLT__B0A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2012-02-01 20:54     ` Piergiorgio Sartor
     [not found]       ` <20120201205456.GA7669-W+Wf6LxwHt0@public.gmane.org>
2012-02-01 21:43         ` Adam Berkan
     [not found]       ` <CAHYUNGaB4LCESDWU1tWB1ZJp_kBH_=19e07vCndxXS5T98_xBA@mail.gmail.com>
     [not found]         ` <CAHYUNGaB4LCESDWU1tWB1ZJp_kBH_=19e07vCndxXS5T98_xBA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2012-02-01 21:44           ` Piergiorgio Sartor
     [not found]             ` <20120201214443.GA8544-W+Wf6LxwHt0@public.gmane.org>
2012-02-01 23:11               ` Adam Berkan
2012-02-02 19:01                 ` Piergiorgio Sartor
     [not found]                   ` <20120202190122.GA2353-W+Wf6LxwHt0@public.gmane.org>
2012-02-02 22:11                     ` Kent Overstreet
     [not found]                       ` <20120202221101.GA26768-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
2012-02-02 22:24                         ` Piergiorgio Sartor
2012-02-16 19:42                           ` Alex Elsayed
     [not found]                             ` <loom.20120216T200235-190-eS7Uydv5nfjZ+VzJOa5vwg@public.gmane.org>
2012-02-16 20:33                               ` Piergiorgio Sartor
     [not found]                                 ` <20120216203332.GA6597-W+Wf6LxwHt0@public.gmane.org>
2012-02-16 20:50                                   ` Alex Elsayed
     [not found]                                     ` <CA++fp8wcxTDJ=mbsKmWi27+yRZg-tyNdWgmhWU6=UeWgC0TZuw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2012-02-16 20:52                                       ` Alex Elsayed
2012-02-16 22:35                                       ` Piergiorgio Sartor
     [not found]                                         ` <20120216223554.GA6947-W+Wf6LxwHt0@public.gmane.org>
2012-02-16 23:09                                           ` Joseph Glanville
     [not found]                                             ` <CAOzFzEhO+6ECN-WjvtMK+-2g7Dwo+DPwQMVWuCZG=Y3BVRNEBw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2012-02-16 23:17                                               ` Piergiorgio Sartor
     [not found]                                                 ` <20120216231754.GA14206-W+Wf6LxwHt0@public.gmane.org>
2012-02-16 23:34                                                   ` Alex Elsayed
     [not found]                                                     ` <CA++fp8w7_uUd35Tcwy1bwEYpR6tJ+fkWMEg+iVEyJ1H4hqKBKg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2012-02-16 23:35                                                       ` Alex Elsayed
2012-02-17 19:12                                                       ` Piergiorgio Sartor
