How to clone LDOMs using ZFS

The ZFS snapshot and cloning features can be used to clone LDOMs. This comes in very handy when you need to create multiple LDOMs with some software already installed. The steps involved are:

1. Set up the primary domain
2. Create a guest LDOM (base LDOM)
3. Unconfigure, stop and unbind the base LDOM
4. Take a ZFS snapshot of the base LDOM (the golden image)
5. Clone the golden image to create new LDOMs

Set up the primary domain

Set up the primary domain with the necessary resources and services, then reboot the machine for the configuration changes to take effect.
Create default services

primary# ldm add-vcc port-range=5000-5100 primary-vcc0 primary
primary# ldm add-vds primary-vds0 primary
primary# ldm add-vsw net-dev=nxge0 primary-vsw0 primary
primary# ldm list-services primary 
VDS
    NAME             VOLUME         OPTIONS          DEVICE
    primary-vds0
VCC
    NAME             PORT-RANGE
    primary-vcc0     5000-5100
VSW
    NAME             MAC               NET-DEV   DEVICE     MODE
    primary-vsw0     02:04:4f:fb:9f:0d nxge0     switch@0   prog,promisc

Ensure the ldmd daemon is online and set the CPU and memory resources for the primary domain.

primary# svcs -a | grep ldmd
online 14:23:34 svc:/ldoms/ldmd:default
primary# ldm set-mau 1 primary
primary# ldm set-vcpu 8 primary
primary# ldm start-reconf primary          (delayed reconfiguration)
primary# ldm set-memory 4G primary
primary# ldm add-config new_config
primary# ldm list-config 
factory-default
new_config [current]

Reboot primary domain for new configuration (new_config) to become active.

primary# shutdown -y -g0 -i6

Enable networking between primary and guest domains

primary# ifconfig nxge0 down unplumb
primary# ifconfig vsw0 plumb
primary# ifconfig vsw0 192.168.1.2 netmask + broadcast + up
primary# mv /etc/hostname.nxge0 /etc/hostname.vsw0

Enable virtual network terminal server daemon if not already enabled.

primary# svcadm enable vntsd
primary# svcs vntsd
    STATE          STIME    FMRI
    online         Oct_12   svc:/ldoms/vntsd:default

Setting up the base LDOM

Set up the base LDOM (base_ldom) with 8 VCPUs, 2 GB of memory, a virtual network device (vnet1) and a ZFS volume (base_ldomvol) presented as a virtual disk (vdisk1).

primary# ldm add-domain base_ldom
primary# ldm add-vcpu 8 base_ldom
primary# ldm add-memory 2G base_ldom
primary# ldm add-vnet vnet1 primary-vsw0 base_ldom
primary# zfs create -V 5gb ldompool/base_ldomvol
primary# ldm add-vdsdev /dev/zvol/dsk/ldompool/base_ldomvol vol01@primary-vds0
primary# ldm add-vdisk vdisk1 vol01@primary-vds0 base_ldom

Set the boot environment variables

primary# ldm set-var auto-boot?=true base_ldom
primary# ldm set-var boot-device=vdisk1 base_ldom

Install Solaris 10 on the base LDOM using a Solaris 10 ISO image. We will add the ISO image as a virtual disk and then boot the base LDOM from this disk to install Solaris 10.

primary# ldm add-vdsdev options=ro /data/sol_10.iso iso@primary-vds0
primary# ldm add-vdisk sol10_iso iso@primary-vds0 base_ldom

Bind and start the base LDOM. The Solaris 10 ISO should appear as sol10_iso in the devalias output at the OK prompt. Boot from this image to start the installation.

primary# ldm bind base_ldom
primary# ldm start base_ldom
LDom base_ldom started
ok> devalias
sol10_iso                /virtual-devices@100/channel-devices@200/disk@1
vdisk0                   /virtual-devices@100/channel-devices@200/disk@0
vnet1                    /virtual-devices@100/channel-devices@200/network@0
net                      /virtual-devices@100/channel-devices@200/network@0
disk                     /virtual-devices@100/channel-devices@200/disk@0
virtual-console          /virtual-devices/console@1
name                     aliases
ok> boot sol10_iso

Unconfigure, stop and unbind base LDOM

Unconfigure the base LDOM, which automatically halts it. We then stop and unbind the LDOM so that we can take a snapshot of its boot disk volume (base_ldomvol).

base_ldom# sys-unconfigure  (the ldom halts after this)
primary-domain# ldm stop base_ldom
primary-domain# ldm unbind base_ldom

Create the golden image

To create the golden image, take a snapshot of base_ldomvol, the boot disk volume of the base LDOM.

primary-domain# zfs snapshot ldompool/base_ldomvol@golden

Clone the golden image to create new LDOM

Clone the base_ldomvol snapshot (the golden image) and use it to create a new LDOM, ldom01, with 4 VCPUs, 4 GB of memory and 1 MAU.

primary-domain# zfs clone ldompool/base_ldomvol@golden ldompool/ldom01_bootvol
primary-domain# ldm add-domain ldom01
primary-domain# ldm set-mau 1 ldom01
primary-domain# ldm set-vcpu 4 ldom01
primary-domain# ldm set-mem 4G ldom01
primary-domain# ldm add-vnet vnet1 primary-vsw0 ldom01
primary-domain# ldm add-vdsdev /dev/zvol/dsk/ldompool/ldom01_bootvol vol02@primary-vds0
primary-domain# ldm add-vdisk vdisk1 vol02@primary-vds0 ldom01
primary-domain# ldm set-variable auto-boot?=false ldom01
primary-domain# ldm bind ldom01
primary-domain# ldm start ldom01

When you boot the new LDOM, you will have to configure the hostname, IP address, time zone and other settings, as it is an unconfigured LDOM.


ZFS Part 5: ZFS Clones and Sending/Receiving ZFS Data

A ZFS clone is a read-write copy of a filesystem created from a snapshot. It still refers to the snapshot it was created from, but allows us to make changes. We cannot remove the origin snapshot whilst the clone exists, unless we promote the clone. These concepts will become clear during the examples.

Let’s create a test dataset:
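
For example, using the dataset name that appears later in this section:

# zfs create datapool/clonefs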

Put some data on the filesystem:
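
Any files will do; the ones below are only illustrative:

# cp /etc/passwd /etc/group /etc/hosts /datapool/clonefs/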

We can now take a snapshot of the filesystem:
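
For example:

# zfs snapshot datapool/clonefs@20130129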

Now, take a clone of this snapshot into a new dataset:
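
For example:

# zfs clone datapool/clonefs@20130129 datapool/cloned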

Here, the clone is being created from the snapshot datapool/clonefs@20130129 into the new dataset datapool/cloned. zfs list shows the new dataset:

See that 19KB is used, but the dataset refers to 306KB somewhere else. Where does that originate? The origin property of the ZFS dataset datapool/cloned will show us:
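
The property can be queried with zfs get; the output will look roughly like this:

# zfs get origin datapool/cloned
NAME             PROPERTY  VALUE                      SOURCE
datapool/cloned  origin    datapool/clonefs@20130129  -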

There we go. We will be unable to delete the origin snapshot (as it’s still required for the clone to function):
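
Attempting it is a simple way to see this:

# zfs destroy datapool/clonefs@20130129     (fails – zfs reports that the snapshot has dependent clones)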

No – we don’t want that! Before we complete this, verify that the dataset is in-fact a read-write clone:
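
A quick write test will do (the file name is arbitrary):

# touch /datapool/cloned/rw-test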

The ZFS dataset can be promoted:
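
For example:

# zfs promote datapool/cloned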

See that the snapshot is now a snapshot of the CLONED filesystem:

And that the original filesystem (the one we cloned) is now dependent on this snapshot, and uses it as its origin:
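
Checking the origin property of datapool/clonefs should now show something like:

# zfs get origin datapool/clonefs
NAME              PROPERTY  VALUE                     SOURCE
datapool/clonefs  origin    datapool/cloned@20130129  -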

Essentially, the parent-child relationship is switched. We can switch it back:
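
For example:

# zfs promote datapool/clonefs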

Switch it back again (dizzy yet?) and then you can destroy the dataset that datapool/cloned was created from (i.e. datapool/clonefs):
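
For example:

# zfs promote datapool/cloned
# zfs destroy datapool/clonefs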

As the dependent filesystem has now been removed, the snapshot too can be removed:
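
For example:

# zfs destroy datapool/cloned@20130129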

And we’re done with clones.

Sending/Receiving ZFS Data

ZFS send/receive is essentially ufsdump/ufsrestore on steroids. zfs send can be used to create “streams” from snapshots, and send those streams to files, other systems, or indeed another dataset with zfs recv.

zfs send/recv, along with the snapshot functionality, allow us to create our own complex backup solutions relatively simply.

Start, as usual, with a test dataset with a few files copied to it:
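
Something along these lines (the source dataset name datapool/sendfs and the files are assumptions for illustration):

# zfs create datapool/sendfs
# cp /etc/passwd /etc/group /datapool/sendfs/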

Create a snapshot of the filesystem:
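
For example, continuing with the assumed datapool/sendfs name:

# zfs snapshot datapool/sendfs@20130129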

zfs send writes a stream of the current snapshot to STDOUT. zfs recv receives a ZFS data stream on STDIN. Thus, we can simply pipe the output of zfs send into zfs recv and create a new filesystem from the stream on-the-fly:
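
For example (datapool/receiveme is the destination dataset referred to below):

# zfs send datapool/sendfs@20130129 | zfs recv datapool/receiveme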

We can see that the new filesystem has been created:

The snapshot has also been created:

We won’t be needing that, so we can remove it:
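
For example:

# zfs destroy datapool/receiveme@20130129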

We can verify that datapool/receiveme contains the data we sent:

zfs send will, by default, send a full backup. You can use zfs send -i to send incremental backups and fashion yourself that backup system I've been prattling on about – this is where the snapshots created on the destination (receiving) dataset are required, so that the filesystem can be restored to a point in time. Going into that solution within this article would stretch the scope a little.
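
As a rough sketch of the idea (the snapshot names are assumptions), an incremental stream between two snapshots can be sent like this:

# zfs snapshot datapool/sendfs@20130130
# zfs send -i datapool/sendfs@20130129 datapool/sendfs@20130130 | zfs recv datapool/receiveme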

As zfs send/recv operate on streams (just like the rest of UNIX), we can do things like:
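
For instance, compressing the stream into a file, or sending it over ssh to another host (paths and host names are illustrative):

# zfs send datapool/sendfs@20130129 | gzip > /backup/sendfs-20130129.zfs.gz
# zfs send datapool/sendfs@20130129 | ssh backuphost zfs recv backuppool/sendfs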

Conclusion

This article has covered the ZFS basics, and a few advanced concepts too. In a later article, I’ll introduce other concepts such as ZFS Delegated administration, setting up NFS/SMB servers using a ZFS backing store, repairing failed zpools (and scrubbing) and much more.

ZFS Part 4: ZFS Snapshots

Snapshots are another piece of awesome functionality built right into ZFS. Essentially, snapshots are a read-only picture of a filesystem at a particular point-in-time. You can use these snapshots to perform incremental backups of filesystems (sending them to remote systems, too), create filesystem clones, create pre-upgrade backups prior to working with new software on a filesystem, and so on.

A snapshot will, initially, only refer to the files on the parent ZFS dataset from which it was created and will not consume any space. It only starts to consume space once data on the original dataset is changed: the snapshot keeps referring to the old blocks, which therefore cannot be freed, and so the snapshot begins to consume space within the pool. The files in the snapshot can be accessed and read via standard UNIX tools (once you know where to look).

The best way to discuss these concepts is via some examples. Let us start by creating a test dataset:
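
For example, using the dataset name that appears later in this section:

# zfs create datapool/snapshotfs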

Copy a few files to the new filesystem:
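
The later examples refer to copies of /etc/passwd, /etc/shadow and /etc/group, so:

# cp /etc/passwd /etc/shadow /etc/group /datapool/snapshotfs/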

Verify that the dataset has been created and is using space within the pool:

So, we can see that 35KB is being referenced, and used, by the dataset. Which is all as expected.

Now, let’s take a snapshot of this dataset. Snapshots are named <datasetname>@<arbitrarystring> – you can pretty much use whichever string you want for arbitrarystring, but it’d make sense to use something meaningful, such as the date or some such.

Create the snapshot:
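
For example:

# zfs snapshot datapool/snapshotfs@20130129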

Verify that the snapshot has been created:

You can see here that whilst the snapshot refers to 35KB of data, 0 bytes are currently used. This is expected as we are referencing all the data within datapool/snapshotfs, and haven’t yet changed anything.

Let’s delete a file from the filesystem.
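
The text below refers to the copy of /etc/shadow, so:

# rm /datapool/snapshotfs/shadow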

Now, a zfs list -t snapshot shows that the snapshot is consuming 21KB:

This is expected, as the snapshot now has a copy (via copy-on-write) of the data we deleted.

Create another snapshot (note REFER is now 34K as the file copy of /etc/shadow was removed from datapool/snapshotfs) …
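
Presumably following the same naming pattern:

# zfs snapshot datapool/snapshotfs@20130129-2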

and remove another file.
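
This time the copy of /etc/passwd:

# rm /datapool/snapshotfs/passwd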

USED will change accordingly:

See that the first snapshot we took HASN’T changed, again due to copy-on-write. Just to labour the point, let’s create another snapshot:
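
For example (this is the snapshot referred to as 20130129-3 below):

# zfs snapshot datapool/snapshotfs@20130129-3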

Again, REFER has gone down to 31.5K (due to the copy of /etc/passwd being removed from datapool/snapshotfs) and USED is 0 because we haven’t done anything else to datapool/snapshotfs since creating the snapshot.

Remove the final file:
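
That is, the copy of /etc/group:

# rm /datapool/snapshotfs/group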

All as expected. USED is up again. Thus, snapshots contain incremental changes and can be used as the basis of developing incremental backups (once an appropriate full backup has been taken). We can use these to rollback, too. Let’s rollback to datapool/snapshotfs@20130129-3, and get our copy of /etc/group back:
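
For example:

# zfs rollback datapool/snapshotfs@20130129-3
# ls /datapool/snapshotfs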

Very cool. Now, let’s rollback the dataset to the first snapshot we took:
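
For example:

# zfs rollback -r datapool/snapshotfs@20130129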

Note that we need to specify the -r option to zfs rollback as we have multiple snapshots taken later than snapshotfs@20130129 that must be removed during the rollback to the original dataset state. Note that we still have the ORIGINAL snapshot we rolled back to, and this will still use space as the dataset is changed.

Our files are now restored.

We can access the copies of the files stored on the snapshot, under <dataset_mountpoint>/.zfs/snapshot/<snapshot_name>, for example:
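
Assuming the default mountpoint of /datapool/snapshotfs:

# ls /datapool/snapshotfs/.zfs/snapshot/20130129/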

Let’s destroy the test dataset (remembering -r to zfs destroy so that the snapshot(s) are removed too):
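
For example:

# zfs destroy -r datapool/snapshotfs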

We can now use what we’ve learned about ZFS Snapshots to work with ZFS Clones.

ZFS Part 3: Compression & Encryption

Also available to us is ZFS compression. Let's create a test dataset to play with. We'll turn a few options on and off so you can see the syntax:
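
A sketch of what this might look like (the dataset name and the particular options chosen here are assumptions):

# zfs create -o compression=on -o atime=off datapool/compressfs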

Verify that the dataset was created with all appropriate options:

Compression ratio is 1.00x, as you’d expect for an empty filesystem. Copy some stuff to it:

Then check the compressratio variable within the dataset properties:
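
For example, using the assumed dataset name from above:

# zfs get compressratio datapool/compressfs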

So – compression has given us some benefit. It’d be worth weighing up compression/dedup/encryption of ZFS filesystems against the system resources which they consume. Nowadays I’d be pushing FOR turning all this stuff on – servers are cheap and can work hard. Put them to use.

Encryption

Filesystem encryption is another easy-to-implement feature of ZFS. ZFS root pools and other OS components (such as the /var filesystem) cannot be encrypted.

To start, I’ll create a new encrypted dataset. You will be prompted for a passphrase to use when encrypting/decrypting the filesystem. Needless to say – do not forget this passphrase! Create the dataset with the encryption=on option:
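
For example, using the dataset name referred to later:

# zfs create -o encryption=on datapool/encryptfs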

Verify that the operation has succeeded and that the encrypted dataset has been created:

Encrypted ZFS datasets, when created with encryption=on and no other options, use aes-128-ccm as the default encryption algorithm.

You will see that by default, ZFS uses passphrase,prompt as the value for the keysource property:
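
For example:

# zfs get keysource datapool/encryptfs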

In this configuration, the ZFS filesystem will not be automatically mounted at boot – observe. After a reboot the output of zfs mount does not contain an entry for datapool/encryptfs:

So – any encrypted datasets, using passphrase,prompt as the value for the keysource property, require manual mount with zfs mount:
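
For example (you are prompted for the passphrase):

# zfs mount datapool/encryptfs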

The passphrase can be placed in a file, so that it is automatically mounted on boot. Note – this is not secure nor is it recommended. It is best to use keys, and we will configure this shortly. For now, place the passphrase in a read-only file in root’s home directory:
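
Something like this (the file name is arbitrary; "mypassphrase" stands in for the real passphrase):

# echo "mypassphrase" > /root/encryptfs.passphrase
# chmod 400 /root/encryptfs.passphrase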

Set the keysource property to the passphrase,file:///path/to/key value as shown below:
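
Using the assumed file path from the previous step:

# zfs set keysource=passphrase,file:///root/encryptfs.passphrase datapool/encryptfs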

Now, unmount the filesystem and unload the cached key (otherwise the filesystem will be remounted using the cached key and nothing will change):
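
For example:

# zfs unmount datapool/encryptfs
# zfs key -u datapool/encryptfs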

If the filesystem is mounted now, we are not prompted for a passphrase:

A reboot confirms this:

OK – this is all well and good, but storing the passphrase in a file is far from being best practice. A more secure method is to use pktool to create a key, then change the dataset to use this key (or indeed create the ZFS dataset in the first place using this key). Whilst this is better – it’s still only as secure as the security of the key file location. Any compromise of the key file leads to a potential compromise of the ZFS dataset. However, we’re not storing the passphrase clear-text in a file somewhere, which is positive in my book.

The conversion from using a passphrase based key to the pktool generated key is completed as follows. First, generate a key, and store in as secure a location as possible:
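
Something along these lines (the key location and length are illustrative):

# pktool genkey keystore=file outkey=/root/encryptfs.key keytype=aes keylen=128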

Load the existing wrapping key for the dataset, by either mounting the dataset, or using zfs key -l. In our case, we can see that the key is already loaded, and so can ignore the warning:
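
For example:

# zfs key -l datapool/encryptfs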

Change the wrapping key via zfs key -c:
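
For example, pointing the new keysource at the key file generated above:

# zfs key -c -o keysource=raw,file:///root/encryptfs.key datapool/encryptfs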

To test that the change has been successfully implemented, unmount the filesystem and unload the wrapping key for the dataset:

Try mounting the dataset, and confirm that the operation is successful, and that no prompts/warnings are displayed:

As previously discussed, a ZFS filesystem with encryption=on set uses aes-128-ccm by default. We can change this when we create a new dataset, however. Observe:
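
For example (the dataset name is illustrative):

# zfs create -o encryption=aes-256-ccm datapool/encryptfs2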

Our new dataset has been created using aes-256-ccm encryption.

ZFS Part 2: Implementing Zpool/ZFS Configuration and ZFS Features

OK – we’re good to go for our final ZFS configuration. Recall from earlier that I will be configuring a two-disk RAID1 set, with an extra disk for hot-spare use, and the final disk to play with for backups, encryption, dedup, etc.
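
A command along these lines would do it (the disk device names here are purely illustrative):

# zpool create datapool mirror c1t1d0 c1t2d0 spare c1t3d0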

It was that easy:

If we query the pool, all required elements (mirroring, hot-spare), will be in place:

The default ZFS dataset will also have been created:

For the root dataset mount, that mountpoint will be fine. Each new ZFS dataset created will have a more appropriate mountpoint set.

So – let’s assume we need to install some software, and require a /u01 filesystem on which to install (a common sight if you’ve worked around Oracle software long enough).
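
The dataset can be created with a single command:

# zfs create datapool/u01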

That simple command has created the new dataset for us:

Next, let’s move the mountpoint to somewhere more sensible, namely /u01:
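
For example:

# zfs set mountpoint=/u01 datapool/u01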

Again – another simple task; it even created the non-existent mountpoint for us:

This dataset will report the same available space as its parent, datapool, until we actually start using it (or create and use other datasets).

Let's turn on deduplication for the parent, so that it will be inherited by any other datasets configured within datapool. I won't turn on encryption and compression here and will control those at a finer-grained, per-dataset level.
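
For example:

# zfs set dedup=on datapool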

Then we verify:

As we created datapool/u01 prior to setting dedup=on, we need to head over there and set it on for that dataset too:
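
For example:

# zfs set dedup=on datapool/u01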

Let’s start creating some more interesting child datasets.

ZFS Features

The first feature I wanted to try out was deduplication. Essentially – if a block is duplicated numerous times across a pool, it will be deduplicated (i.e. its duplicates removed) thus improving storage utilisation.

I set dedup=on on datapool so any new dataset created will inherit that property from its parent. Therefore:
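
A sketch of the idea (the dataset name is an assumption; the three copies of /usr/sbin are what the figures below refer to):

# zfs create datapool/dedupfs
# cp -r /usr/sbin /datapool/dedupfs/copy1
# cp -r /usr/sbin /datapool/dedupfs/copy2
# cp -r /usr/sbin /datapool/dedupfs/copy3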

Verify as always:
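
For example (output along these lines):

# zfs get dedup datapool/dedupfs
NAME              PROPERTY  VALUE  SOURCE
datapool/dedupfs  dedup     on     inherited from datapool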

You even get a nice little note that the value for this property has been inherited from the parent ZFS dataset – datapool.

If we check via du the “real” values are reported:

A zpool list, however, has something else to say:
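
For example:

# zpool list datapool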

You can see that only 37.1M is actually allocated due to a deduplication factor of 7.37x – between our three copies of /usr/sbin, the system was able to deduplicate by quite a large factor, saving us a couple of hundred meg. Pretty cool. You can also get a simulated deduplication histogram for a pool (with dedup=on or dedup=off) using zdb:
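
For example:

# zdb -S datapool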

Except in the most demanding scenarios, where calculating deduplication would put a strain on system resources, setting dedup=on on a dataset is normally a good idea.