From: Wei-Chung Cheng <freeze.vicente.cheng@gmail.com>
To: Travis Rhoden <trhoden@gmail.com>
Cc: Sage Weil <sage@newdream.net>,
	Robert LeBlanc <robert@leblancnet.us>,
	Wido den Hollander <wido@42on.com>,
	Loic Dachary <loic@dachary.org>,
	ceph-devel <ceph-devel@vger.kernel.org>
Subject: Re: ceph-deploy osd destroy feature
Date: Wed, 7 Jan 2015 10:18:51 +0800
Message-ID: <CABF_e-FvtHzwzc5MUV+=hWmFXDn=eNsD9XV-Q5P8KMMi5uN6hg@mail.gmail.com>
In-Reply-To: <CACkq2mqG41qOebE4ghFWyhAc8_6MCKrHrLuQo+b_QqdYzWfMNA@mail.gmail.com>

2015-01-07 0:30 GMT+08:00 Travis Rhoden <trhoden@gmail.com>:
> On Tue, Jan 6, 2015 at 11:23 AM, Sage Weil <sage@newdream.net> wrote:
>> On Tue, 6 Jan 2015, Travis Rhoden wrote:
>>> On Tue, Jan 6, 2015 at 9:28 AM, Sage Weil <sage@newdream.net> wrote:
>>> > On Tue, 6 Jan 2015, Wei-Chung Cheng wrote:
>>> >> 2015-01-06 13:08 GMT+08:00 Sage Weil <sage@newdream.net>:
>>> >> > On Tue, 6 Jan 2015, Wei-Chung Cheng wrote:
>>> >> >> Dear all:
>>> >> >>
>>> >> >> I agree with Robert's opinion because I hit a similar problem once.
>>> >> >> I think that how to handle the journal partition is a separate problem
>>> >> >> from the destroy subcommand.
>>> >> >> (Although it will work normally most of the time.)
>>> >> >>
>>> >> >> I also agree that we need the "secure erase" feature.
>>> >> >> In my experience, I just make a new label for the disk with the
>>> >> >> "parted" command.
>>> >> >> I will think about how we could do a secure erase, unless someone has
>>> >> >> a good idea for this?
>>> >> >
>>> >> > The simplest secure erase is to encrypt the disk and destroy the key.  You
>>> >> > can do that with dm-crypt today.  Most drives also will do this in the
>>> >> > firmware but I'm not familiar with the toolchain needed to use that
>>> >> > feature.  (It would be much preferable to go that route, though, since it
>>> >> > will avoid any CPU overhead.)
>>> >> >
>>> >> > sage
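As a sketch of the dm-crypt route (device and mapping names here are
placeholders, not what ceph-disk itself does):

  # set up the encrypted volume when the disk is first deployed
  $ cryptsetup luksFormat /dev/sdX
  $ cryptsetup luksOpen /dev/sdX osd-data
  # ... use /dev/mapper/osd-data as the OSD data device ...

  # later, "erase" the disk by destroying all LUKS key slots
  $ cryptsetup luksErase /dev/sdX

With the key slots gone, the ciphertext left on disk is unrecoverable.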
>>> >>
>>> >> I think I misunderstood something.
>>> >> Does "secure erase" mean handling a disk that has a built-in
>>> >> encryption feature (an SED disk)?
>>> >> Or does it mean encrypting the disk with dm-crypt?
>>> >
>>> > Normally secure erase simply means destroying the data on disk.
>>> > In practice, that can be hard.  Overwriting it will mostly work, but it's
>>> > slow, and with effort forensics can often still recover the old data.
>>> >
>>> > Encrypting a disk and then destroying just the encryption key is an easy
>>> > way to "erase" an entire disk.  It's not uncommon to do this so that old
>>> > disks can be RMAed or disposed of through the usual channels without fear
>>> > of data being recovered.
>>> >
>>> > sage
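For the in-firmware variant, the usual toolchain is hdparm's ATA
Security Erase (a sketch only; /dev/sdX and the "p" password are
placeholders, and the drive must not be in the "frozen" state):

  # set a temporary user password, which enables the security feature
  $ hdparm --user-master u --security-set-pass p /dev/sdX
  # the drive firmware then wipes itself, with no host CPU overhead
  $ hdparm --user-master u --security-erase p /dev/sdX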
>>> >
>>> >
>>> >>
>>> >> Could Travis describe the "secure erase" in more detail?
>>>
>>> Encrypting and throwing away the key is a good way to go, for sure.
>>> But for now, I'm suggesting that we don't add a secure erase
>>> functionality.  It can certainly be added later, but I'd rather focus
>>> on getting the baseline deactivate and destroy functionality in first,
>>> and use --zap with destroy to blow away a disk.
>>>
>>> I'd rather not have a secure erase feature hold up the other functionality.
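For context, "zap" just means clearing the partition tables; the rough
shell equivalent is something like this (placeholder device):

  # destroy the GPT and MBR structures, then write a fresh empty GPT
  $ sgdisk --zap-all --clear /dev/sdX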
>>
>> Agreed.. sorry for running off into the weeds!  :)
>
> Oh, not at all.  Very good info.  It was more since Vicente said he
> was going to start working on some things, I didn't want him to worry
> about how to add secure erase at the very beginning.  :)

OK, according to your description I think I can ignore the "secure
erase" at the beginning. :D
Your and Sage's info taught me how to erase an entire disk quickly, thanks!
It is very useful to me!

>
> To that end, Vicente, I saw your comments on GitHub as well.  To
> clarify, were you thinking of adding 'deactivate' to ceph-disk or
> ceph-deploy?  I may have misunderstood your intent.  We definitely
> need to add deactivate/destroy to ceph-disk, then ceph-deploy can call
> them.  But you may have meant that you were going to pre-emptively
> work on ceph-deploy to call the (hopefully soon to exist) 'ceph-disk
> deactivate' command.
>
>  - Travis

If all of the disk-related functions live in ceph-disk, I agree with
adding deactivate to ceph-disk.
(As you said, ceph-deploy could call them to keep things simple.)

As you mentioned, you have started working on deactivate in ceph-disk.
I haven't started to work on it yet.
Yesterday I worked on the ceph-deploy OSD-related functions that you
mentioned in your GitHub comment (osd_list() and osd_tree()).
Maybe you would like to push a wip- branch that I can help you to
complete, if you need it.
Or should I rework ceph-deploy to call the ceph-disk deactivate?
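For what it's worth, the cluster queries those helpers would wrap
already exist as CLI commands, so they should be thin wrappers, e.g.:

  $ ceph osd ls                    # list all OSD ids in the cluster
  $ ceph osd tree --format=json    # CRUSH tree in machine-readable form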

vicente

>>
>> sage
>>
>>
>>>
>>> >>
>>> >> Thanks very much!
>>> >>
>>> >> vicente
>>> >>
>>> >> >
>>> >> >
>>> >> >>
>>> >> >> Anyway, I will rework and implement the deactivate first.
>>>
>>> I started working on this yesterday as well, but don't want to
>>> duplicate work.  I haven't pushed a wip- branch or anything yet,
>>> though.  I can hold off if you are actively working on it.
>>>
>>> >> >>
>>> >> >>
>>> >> >>
>>> >> >>
>>> >> >> 2015-01-06 8:42 GMT+08:00 Robert LeBlanc <robert@leblancnet.us>:
>>> >> >> > I do think the "find a journal partition" code isn't particularly robust.
>>> >> >> > I've had experiences with ceph-disk trying to create a new partition even
>>> >> >> > though I had wiped/zapped a disk previously. It would make the operational
>>> >> >> > component of Ceph much easier with replacing disks if the journal partition
>>> >> >> > is cleanly removed and able to be reused automatically.
>>> >> >> >
>>> >> >> > On Mon, Jan 5, 2015 at 11:18 AM, Sage Weil <sage@newdream.net> wrote:
>>> >> >> >> On Mon, 5 Jan 2015, Travis Rhoden wrote:
>>> >> >> >>> On Mon, Jan 5, 2015 at 12:27 PM, Sage Weil <sage@newdream.net> wrote:
>>> >> >> >>> > On Mon, 5 Jan 2015, Travis Rhoden wrote:
>>> >> >> >>> >> Hi Loic and Wido,
>>> >> >> >>> >>
>>> >> >> >>> >> Loic - I agree with you that it makes more sense to implement the core
>>> >> >> >>> >> of the logic in ceph-disk where it can be re-used by other tools (like
>>> >> >> >>> >> ceph-deploy) or by administrators directly.  There are a lot of
>>> >> >> >>> >> conventions put in place by ceph-disk such that ceph-disk is the best
>>> >> >> >>> >> place to undo them as part of clean-up.  I'll pursue this with other
>>> >> >> >>> >> Ceph devs to see if I can get agreement on the best approach.
>>> >> >> >>> >>
>>> >> >> >>> >> At a high-level, ceph-disk has two commands that I think could have a
>>> >> >> >>> >> corollary -- prepare, and activate.
>>> >> >> >>> >>
>>> >> >> >>> >> Prepare will format and mkfs a disk/dir as needed to make it usable by Ceph.
>>> >> >> >>> >> Activate will put the resulting disk/dir into service by allocating an
>>> >> >> >>> >> OSD ID, creating the cephx key, and marking the init system as needed,
>>> >> >> >>> >> and finally starting the ceph-osd service.
>>> >> >> >>> >>
>>> >> >> >>> >> It seems like there could be two opposite commands that do the following:
>>> >> >> >>> >>
>>> >> >> >>> >> deactivate:
>>> >> >> >>> >>  - set "ceph osd out"
>>> >> >> >>> >
>>> >> >> >>> > I don't think 'osd out' belongs at all.  It's redundant (and extra work)
>>> >> >> >>> > if we remove the osd from the CRUSH map.  I would imagine it being a
>>> >> >> >>> > possibly independent step.  I.e.,
>>> >> >> >>> >
>>> >> >> >>> >  - drain (by setting CRUSH weight to 0)
>>> >> >> >>> >  - wait
>>> >> >> >>> >  - deactivate
>>> >> >> >>> >  - (maybe) destroy
>>> >> >> >>> >
>>> >> >> >>> > That would make deactivate
>>> >> >> >>> >
>>> >> >> >>> >>  - stop ceph-osd service if needed
>>> >> >> >>> >>  - remove OSD from CRUSH map
>>> >> >> >>> >>  - remove OSD cephx key
>>> >> >> >>> >>  - deallocate OSD ID
>>> >> >> >>> >>  - remove 'ready', 'active', and INIT-specific files (to Wido's point)
>>> >> >> >>> >>  - umount device and remove mount point
>>> >> >> >>> >
>>> >> >> >>> > which I think makes sense if the next step is to destroy or to move the
>>> >> >> >>> > disk to another box.  In the latter case the data will likely need to move
>>> >> >> >>> > to another disk anyway so keeping it around is just a data safety thing
>>> >> >> >>> > (keep as many copies as possible).
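Spelling out the drain/wait part of Sage's sequence above with today's
CLI (osd.12 and the polling loop are illustrative, not a proposed
implementation):

  # drain: set the OSD's CRUSH weight to 0 so data migrates off it
  $ ceph osd crush reweight osd.12 0
  # wait: poll until recovery finishes before deactivating
  $ while ! ceph health | grep -q HEALTH_OK; do sleep 60; done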
>>> >> >> >>> >
>>> >> >> >>> > OTOH, if you clear out the OSD id then deactivate isn't reversible
>>> >> >> >>> > with activate as the OSD might get a new id even if it isn't moved.  An
>>> >> >> >>> > alternative approach might be
>>> >> >> >>> >
>>> >> >> >>> > deactivate:
>>> >> >> >>> >   - stop ceph-osd service if needed
>>> >> >> >>> >   - remove 'ready', 'active', and INIT-specific files (to Wido's point)
>>> >> >> >>> >   - umount device and remove mount point
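A concrete sketch of that reversible deactivate (osd.12, the sysvinit
service invocation, and the mount point are placeholder assumptions;
upstart systems would use "stop ceph-osd id=12" instead):

  $ service ceph stop osd.12
  $ rm /var/lib/ceph/osd/ceph-12/ready /var/lib/ceph/osd/ceph-12/active
  $ umount /var/lib/ceph/osd/ceph-12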
>>> >> >> >>>
>>> >> >> >>> Good point.  It would be a very nice result if activate/deactivate
>>> >> >> >>> were reversible by each other.  Perhaps that should be the guiding
>>> >> >> >>> principle, with any additional steps pushed off to other commands,
>>> >> >> >>> such as destroy...
>>> >> >> >>>
>>> >> >> >>> >
>>> >> >> >>> > destroy:
>>> >> >> >>> >   - remove OSD from CRUSH map
>>> >> >> >>> >   - remove OSD cephx key
>>> >> >> >>> >   - deallocate OSD ID
>>> >> >> >>> >   - destroy data
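In CLI terms that destroy step is roughly the following (osd.12 is
illustrative):

  $ ceph osd crush remove osd.12   # remove OSD from CRUSH map
  $ ceph auth del osd.12           # remove OSD cephx key
  $ ceph osd rm 12                 # deallocate the OSD id
  # ...then whatever data-destruction policy applies, e.g. zap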
>>> >> >> >>>
>>> >> >> >>> I like this demarcation between deactivate and destroy.
>>> >> >> >>>
>>> >> >> >>> >
>>> >> >> >>> > It's not quite true that the OSD ID should be preserved if the data
>>> >> >> >>> > is, but I don't think there is harm in associating the two...
>>> >> >> >>>
>>> >> >> >>> What if we make destroy data optional by using the --zap flag?  Or,
>>> >> >> >>> since zap is just removing the partition table, do we want to add more
>>> >> >> >>> of a "secure erase" feature?  Almost seems like that sets a difficult
>>> >> >> >>> precedent.  There are so many ways of trying to "securely" erase data
>>> >> >> >>> out there that it may be best left to the policies of the cluster
>>> >> >> >>> administrator(s).  In that case, --zap would still be a good middle
>>> >> >> >>> ground, but you should do more if you want to be extra secure.
>>> >> >> >>
>>> >> >> >> Sounds good to me!
>>> >> >> >>
>>> >> >> >>> One other question -- should we be doing anything with the journals?
>>> >> >> >>
>>> >> >> >> I think destroy should clear the partition type so that it can be reused
>>> >> >> >> by another OSD.  That will need to be tested, though... I forget how smart
>>> >> >> >> the "find a journal partition" code is (it might blindly try to create a
>>> >> >> >> new one or something).
>>> >> >> >>
>>> >> >> >> sage
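Clearing the type would presumably mean resetting the journal
partition's GPT type GUID from the Ceph-journal type back to a generic
one, e.g. with sgdisk (the partition number and device are
placeholders, and the GUIDs are the Ceph-journal and generic Linux-data
type codes, worth double-checking):

  # Ceph journal partitions use type 45b0969e-9b03-4f30-b4c6-b4b80ceff106;
  # reset partition 2 to the generic Linux data type
  $ sgdisk --typecode=2:0fc63daf-8483-4772-8e79-3d69d8477de4 /dev/sdX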
>>> >> >> >>
>>> >> >> >>
>>> >> >> >>
>>> >> >> >>>
>>> >> >> >>> >
>>> >> >> >>> > sage
>>> >> >> >>> >
>>> >> >> >>> >
>>> >> >> >>> >
>>> >> >> >>> >>
>>> >> >> >>> >> destroy:
>>> >> >> >>> >>  - zap disk (removes partition table and disk content)
>>> >> >> >>> >>
>>> >> >> >>> >> A few questions I have from this, though.  Is this granular enough?
>>> >> >> >>> >> If all the steps listed above are done in deactivate, is it useful?
>>> >> >> >>> >> Or are there usecases we need to cover where some of those steps need
>>> >> >> >>> >> to be done but not all?  Deactivating in this case would be
>>> >> >> >>> >> permanently removing the disk from the cluster.  If you are just
>>> >> >> >>> >> moving a disk from one host to another, Ceph already supports that
>>> >> >> >>> >> with no additional steps other than stop service, move disk, start
>>> >> >> >>> >> service.
>>> >> >> >>> >>
>>> >> >> >>> >> Is "destroy" even necessary?  It's really just zap at that point,
>>> >> >> >>> >> which already exists.  It only seems necessary to me if we add extra
>>> >> >> >>> >> functionality, like the ability to do a wipe of some kind first.  If
>>> >> >> >>> >> it is just zap, you could call zap separately or use --zap as an option
>>> >> >> >>> >> to deactivate.
>>> >> >> >>> >>
>>> >> >> >>> >> And all of this would need to be able to fail somewhat gracefully, as
>>> >> >> >>> >> you would often be dealing with dead/failed disks that may not allow
>>> >> >> >>> >> these commands to run successfully.  That's why I'm wondering if it
>>> >> >> >>> >> would be best to break the steps currently in "deactivate" into two
>>> >> >> >>> >> commands -- (1) deactivate: which would deal with commands specific to
>>> >> >> >>> >> the disk (osd out, stop service, remove marker files, umount) and (2)
>>> >> >> >>> >> remove: which would undefine the OSD within the cluster (remove from
>>> >> >> >>> >> CRUSH, remove cephx key, deallocate OSD ID).
>>> >> >> >>> >>
>>> >> >> >>> >> I'm mostly talking out loud here.  Looking for more ideas, input.  :)
>>> >> >> >>> >>
>>> >> >> >>> >>  - Travis
>>> >> >> >>> >>
>>> >> >> >>> >>
>>> >> >> >>> >> On Sun, Jan 4, 2015 at 6:07 AM, Wido den Hollander <wido@42on.com> wrote:
>>> >> >> >>> >> > On 01/02/2015 10:31 PM, Travis Rhoden wrote:
>>> >> >> >>> >> >> Hi everyone,
>>> >> >> >>> >> >>
>>> >> >> >>> >> >> There has been a long-standing request [1] to implement an OSD
>>> >> >> >>> >> >> "destroy" capability to ceph-deploy.  A community user has submitted a
>>> >> >> >>> >> >> pull request implementing this feature [2].  While the code needs a
>>> >> >> >>> >> >> bit of work (there are a few things to work out before it would be
>>> >> >> >>> >> >> ready to merge), I want to verify that the approach is sound before
>>> >> >> >>> >> >> diving into it.
>>> >> >> >>> >> >>
>>> >> >> >>> >> >> As it currently stands, the new feature would allow for the following:
>>> >> >> >>> >> >>
>>> >> >> >>> >> >> ceph-deploy osd destroy <host> --osd-id <id>
>>> >> >> >>> >> >>
>>> >> >> >>> >> >> From that command, ceph-deploy would reach out to the host, do "ceph
>>> >> >> >>> >> >> osd out", stop the ceph-osd service for the OSD, then finish by doing
>>> >> >> >>> >> >> "ceph osd crush remove", "ceph auth del", and "ceph osd rm".  Finally,
>>> >> >> >>> >> >> it would umount the OSD, typically in /var/lib/ceph/osd/...
>>> >> >> >>> >> >>
>>> >> >> >>> >> >
>>> >> >> >>> >> > Prior to the unmount, shouldn't it also clean up the 'ready' file to
>>> >> >> >>> >> > prevent the OSD from starting after a reboot?
>>> >> >> >>> >> >
>>> >> >> >>> >> > Although its key has been removed from the cluster it shouldn't matter
>>> >> >> >>> >> > that much, but it seems a bit cleaner.
>>> >> >> >>> >> >
>>> >> >> >>> >> > It could even be more destructive: if you pass --zap-disk to it, it
>>> >> >> >>> >> > also runs wipefs or something to clean the whole disk.
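For reference, wipefs by itself erases every filesystem, RAID, and
partition-table signature it can find (placeholder device):

  $ wipefs --all /dev/sdX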
>>> >> >> >>> >> >
>>> >> >> >>> >> >>
>>> >> >> >>> >> >> Does this high-level approach seem sane?  Anything that is missing
>>> >> >> >>> >> >> when trying to remove an OSD?
>>> >> >> >>> >> >>
>>> >> >> >>> >> >>
>>> >> >> >>> >> >> There are a few specifics to the current PR that jump out to me as
>>> >> >> >>> >> >> things to address.  The format of the command is a bit rough, as other
>>> >> >> >>> >> >> "ceph-deploy osd" commands take a list of [host[:disk[:journal]]] args
>>> >> >> >>> >> >> to specify a bunch of disks/osds to act on at once.  But this command
>>> >> >> >>> >> >> only allows one at a time, by virtue of the --osd-id argument.  We
>>> >> >> >>> >> >> could try to accept [host:disk] and look up the OSD ID from that, or
>>> >> >> >>> >> >> potentially take [host:ID] as input.
>>> >> >> >>> >> >>
>>> >> >> >>> >> >> Additionally, what should be done with the OSD's journal during the
>>> >> >> >>> >> >> destroy process?  Should it be left untouched?
>>> >> >> >>> >> >>
>>> >> >> >>> >> >> Should there be any additional barriers to performing such a
>>> >> >> >>> >> >> destructive command?  User confirmation?
>>> >> >> >>> >> >>
>>> >> >> >>> >> >>
>>> >> >> >>> >> >>  - Travis
>>> >> >> >>> >> >>
>>> >> >> >>> >> >> [1] http://tracker.ceph.com/issues/3480
>>> >> >> >>> >> >> [2] https://github.com/ceph/ceph-deploy/pull/254
>>> >> >> >>> >> >>
>>> >> >> >>> >> >
>>> >> >> >>> >> >
>>> >> >> >>> >> > --
>>> >> >> >>> >> > Wido den Hollander
>>> >> >> >>> >> > 42on B.V.
>>> >> >> >>> >> > Ceph trainer and consultant
>>> >> >> >>> >> >
>>> >> >> >>> >> > Phone: +31 (0)20 700 9902
>>> >> >> >>> >> > Skype: contact42on
