
Linux Kernel Upgrade on Exadata(manual way)

A kernel upgrade can be applied node by node on Exadata, so there is no service interruption. Kernel upgrades are required when you need new functionality or when you hit bugs in the current kernel version. I had to upgrade the kernel of a box, and it was a good experience. The following procedure is based on a kernel upgrade of an Exadata compute node running Oracle Linux 5.8 with the Unbreakable Enterprise Kernel [2.6.32].

PRE-UPGRADE

==> If you have EM12c, the targets on the host will be unavailable for the upgrade period. Put them in a blackout state so that no false alarms are generated from them.
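
If you use emcli, the blackout can also be created from the command line. This is only a rough sketch; the blackout name, host names and target list below are made up, so adjust them to your environment and check emcli help create_blackout for the exact schedule syntax on your version:

[oracle@emhost ~]$ emcli login -username=sysman
[oracle@emhost ~]$ emcli create_blackout -name="host1_kernel_upgrade" \
      -add_targets="host1.example.com:host" \
      -reason="kernel upgrade" -schedule="duration:-1"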

==> Run the upgrade steps in an X-Windows session such as VNC. This prevents problems caused by SSH client disconnections.

==> Disable all NFS mounts on the system. Check the usual locations: /etc/rc.local and /etc/fstab.
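
A quick way to find and unmount them (the share path below is just an example):

[root@host1 ~]# grep -i nfs /etc/fstab /etc/rc.local     # find configured NFS mounts
[root@host1 ~]# mount -t nfs                             # list currently mounted NFS shares
[root@host1 ~]# umount /mnt/backup_share                 # unmount each one; example path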

==> Check whether any ASM operations are running on the system and wait for them to finish. In particular, make sure no rebalance is running; check v$asm_operation.
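
A minimal check against the ASM instance could look like the following sketch; it assumes the oracle OS user owns the grid home, has the ASM environment set in its profile and can connect as SYSASM. No rows returned means nothing is running:

[root@host1 ~]# su - oracle -c "sqlplus -S / as sysasm" <<EOF
select group_number, operation, state, est_minutes from v\$asm_operation;
EOF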

==> Back up the GRUB configuration file /boot/grub/grub.conf. You might need it for a rollback.
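
For example (the backup file name is arbitrary):

[root@host1 ~]# cp -p /boot/grub/grub.conf /boot/grub/grub.conf.pre_kernel_upgrade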

==> Shut down CRS and disable CRS autostart. Also shut down any databases or listeners that are not registered with CRS.
[root@host1 ~]# /u01/app/11.2.0.3/grid/bin/crsctl disable crs
CRS-4621: Oracle High Availability Services autostart is disabled.
[root@host1 ~]# /u01/app/11.2.0.3/grid/bin/crsctl stop crs -f

==> Make sure CRS is not running
[root@host1 ~]# ps -ef | grep d.bin
root 66664 60395 0 09:55 pts/1 00:00:00 grep d.bin

==> Reboot the system and make sure it comes back up cleanly before making any kernel changes 🙂

==> Check the ILOM problem page and make sure there are no open problems on the server. If there are any (memory faults, etc.), fix them first.
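
The fault list can also be pulled from the host through ipmitool; this is a hedged example and the exact ILOM command set depends on your firmware version:

[root@host1 ~]# ipmitool sunoem cli "show faulty"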

==> Record the current kernel
[root@host1 ~]# uname -r
2.6.32-400.11.1.el5uek

==> Check the server version and make sure the next kernel is designed for the server.
[root@host1 ~]# dmidecode -s system-product-name
SUN FIRE X4170 M3

==> Make sure enough space is available
[root@host1 ~]# df -h

==> Shut down any databases or listeners that are not registered with CRS, and check one last time that CRS is down.
[root@host1 ~]# ps -ef | grep d.bin
root 66664 60395 0 09:55 pts/1 00:00:00 grep d.bin

UPGRADE
==> Upgrade the kernel packages

[root@host1 ~]# rpm -ivh kernel-uek-firmware-2.6.32-400.34.1.el5uek.noarch.rpm kernel-uek-2.6.32-400.34.1.el5uek.x86_64.rpm ofa-2.6.32-400.34.1.el5uek-1.5.1-4.0.58.1.x86_64.rpm
Preparing… ########################################### [100%]
1:kernel-uek-firmware ########################################### [ 33%]
2:kernel-uek ########################################### [ 67%]
3:ofa-2.6.32-400.34.1.el5########################################### [100%]

==> Reboot the system
[root@host1 ~]# reboot

POST-UPGRADE
==> Check the ILOM and /var/log/messages for any errors.

==> Check the new kernel version
[root@host1 ~]# uname -r
2.6.32-400.34.1.el5uek

==> Start CRS and enable CRS autostart
[root@host1 ~]# /u01/app/11.2.0.3/grid/bin/crsctl enable crs
CRS-4622: Oracle High Availability Services autostart is enabled.
[root@host1 ~]# /u01/app/11.2.0.3/grid/bin/crsctl start crs

==> Check that CRS is starting

[root@host1 ~]# ps -ef | grep d.bin
root 11852 1 4 10:22 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/ohasd.bin reboot
oracle 12013 1 0 10:22 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/oraagent.bin
oracle 12025 1 0 10:22 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/mdnsd.bin
oracle 12109 1 1 10:22 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/gpnpd.bin
root 12119 1 0 10:22 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/orarootagent.bin
oracle 12122 1 1 10:22 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/gipcd.bin
root 12137 1 1 10:22 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/osysmond.bin
root 12150 1 0 10:22 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/cssdmonitor
root 12167 1 0 10:22 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/cssdagent
oracle 12169 1 1 10:22 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/diskmon.bin -d -f
oracle 12187 1 2 10:22 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/ocssd.bin
root 12389 10620 0 10:23 pts/0 00:00:00 grep d.bin
[root@host1 ~]#

==> Re-enable the NFS mounts on the system and mount them.
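
If the shares are defined in /etc/fstab, something along these lines brings them all back:

[root@host1 ~]# mount -a -t nfs      # mount every fstab entry of type nfs
[root@host1 ~]# mount -t nfs         # verify they are mounted again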

==> In EM12c, end the blackout period for the targets.
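
Assuming the blackout was created with the name used in the earlier sketch, it can be stopped from emcli as well:

[oracle@emhost ~]$ emcli stop_blackout -name="host1_kernel_upgrade"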

Now you can move on to the next server in the cluster.


Change exadata flashcache mode from WriteThrough to WriteBack

We can enable WriteBack mode for the flash cache without shutting down ASM or any database instance on the Exadata. This is done in a rolling fashion, one cell at a time. Make sure you finish one cell before starting the operation on the next one.

I will follow the document below to change the flash cache mode on an Exadata Database Machine X3-2 Eighth Rack system. You can also find the reasons for using WriteBack mode and other useful information in that document.

Exadata Write-Back Flash Cache – FAQ (Doc ID 1500257.1)

4. How to determine if you have write back flash cache enabled?

[root@testdbadm01 ~]# dcli -g ~/cell_group -l root cellcli -e "list cell attributes flashcachemode"
testceladm01: WriteThrough
testceladm02: WriteThrough
testceladm03: WriteThrough

5. How can we enable the write back flash cache?

Before proceeding any further, make sure all the griddisks are online and there are no problems on the cells.

[root@testdbadm01 ~]# dcli -g cell_group -l root cellcli -e list griddisk attributes asmdeactivationoutcome, asmmodestatus
testceladm01: Yes        ONLINE
testceladm01: Yes        ONLINE
testceladm01: Yes        ONLINE
testceladm01: Yes        ONLINE
testceladm01: Yes        ONLINE
testceladm01: Yes        ONLINE
testceladm01: Yes        ONLINE
testceladm01: Yes        ONLINE
testceladm01: Yes        ONLINE
testceladm01: Yes        ONLINE
testceladm01: Yes        ONLINE
testceladm01: Yes        ONLINE
testceladm01: Yes        ONLINE
testceladm01: Yes        ONLINE
testceladm01: Yes        ONLINE
testceladm01: Yes        ONLINE
testceladm02: Yes        ONLINE
testceladm02: Yes        ONLINE
testceladm02: Yes        ONLINE
testceladm02: Yes        ONLINE
testceladm02: Yes        ONLINE
testceladm02: Yes        ONLINE
testceladm02: Yes        ONLINE
testceladm02: Yes        ONLINE
testceladm02: Yes        ONLINE
testceladm02: Yes        ONLINE
testceladm02: Yes        ONLINE
testceladm02: Yes        ONLINE
testceladm02: Yes        ONLINE
testceladm02: Yes        ONLINE
testceladm02: Yes        ONLINE
testceladm02: Yes        ONLINE
testceladm03: Yes        ONLINE
testceladm03: Yes        ONLINE
testceladm03: Yes        ONLINE
testceladm03: Yes        ONLINE
testceladm03: Yes        ONLINE
testceladm03: Yes        ONLINE
testceladm03: Yes        ONLINE
testceladm03: Yes        ONLINE
testceladm03: Yes        ONLINE
testceladm03: Yes        ONLINE
testceladm03: Yes        ONLINE
testceladm03: Yes        ONLINE
testceladm03: Yes        ONLINE
testceladm03: Yes        ONLINE
testceladm03: Yes        ONLINE
testceladm03: Yes        ONLINE

On the 1/8th rack system the flash cache is reduced to half of its size. Because of that, the size appears as 744 GB.

[root@testdbadm01 ~]#  dcli -g cell_group -l root cellcli -e list flashcache detail
testceladm01: name:                      testceladm01_FLASHCACHE
testceladm01: cellDisk:                  FD_03_testceladm01,FD_02_testceladm01,FD_00_testceladm01,FD_01_testceladm01,FD_06_testceladm01,FD_05_testceladm01,FD_04_testceladm01,FD_07_testceladm01
testceladm01: creationTime:              2013-07-29T17:55:28+03:00
testceladm01: degradedCelldisks:
testceladm01: effectiveCacheSize:        744.125G
testceladm01: id:                        5adas74d-asdc-4477-382d-30c14052c23d
testceladm01: size:                      744.125G
testceladm01: status:                    normal
testceladm02: name:                      testceladm02_FLASHCACHE
testceladm02: cellDisk:                  FD_03_testceladm02,FD_07_testceladm02,FD_02_testceladm02,FD_00_testceladm02,FD_06_testceladm02,FD_01_testceladm02,FD_04_testceladm02,FD_05_testceladm02
testceladm02: creationTime:              2013-07-29T17:55:40+03:00
testceladm02: degradedCelldisks:
testceladm02: effectiveCacheSize:        744.125G
testceladm02: id:                        80c140a3-0c14-40ba-8364-10c1460d802f
testceladm02: size:                      744.125G
testceladm02: status:                    normal
testceladm03: name:                      testceladm03_FLASHCACHE
testceladm03: cellDisk:                  FD_06_testceladm03,FD_03_testceladm03,FD_00_testceladm03,FD_04_testceladm03,FD_05_testceladm03,FD_02_testceladm03,FD_01_testceladm03,FD_07_testceladm03
testceladm03: creationTime:              2013-07-29T17:55:25+03:00
testceladm03: degradedCelldisks:
testceladm03: effectiveCacheSize:        744.125G
testceladm03: id:                        6950c148-992b-4347-a1e1-e60c1406bda6
testceladm03: size:                      744.125G
testceladm03: status:                    normal

Repeat the following procedure for all the cells. We have 3 cells on the 1/8th rack Exadata. Operate on only one cell at a time, and before proceeding to the next cell make sure the current cell is fully operational. I follow the steps from the support note above, but I prefer to run the cellcli commands interactively from the CellCLI prompt.

Also, I prefer to do any disk operation during off-peak hours.

1. Drop the flash cache on that cell

CellCLI>  drop flashcache
Flash cache testceladm01_FLASHCACHE successfully dropped

2. Check if ASM will be OK if the grid disks go OFFLINE. The following command should return ‘Yes’ for the grid disks being listed:

CellCLI> list griddisk attributes name,asmmodestatus,asmdeactivationoutcome
         DATA_TEST_CD_00_testceladm01    ONLINE  Yes
         DATA_TEST_CD_01_testceladm01    ONLINE  Yes
         DATA_TEST_CD_02_testceladm01    ONLINE  Yes
         DATA_TEST_CD_03_testceladm01    ONLINE  Yes
         DATA_TEST_CD_04_testceladm01    ONLINE  Yes
         DATA_TEST_CD_05_testceladm01    ONLINE  Yes
         DBFS_DG_CD_02_testceladm01      ONLINE  Yes
         DBFS_DG_CD_03_testceladm01      ONLINE  Yes
         DBFS_DG_CD_04_testceladm01      ONLINE  Yes
         DBFS_DG_CD_05_testceladm01      ONLINE  Yes
         RECO_TEST_CD_00_testceladm01    ONLINE  Yes
         RECO_TEST_CD_01_testceladm01    ONLINE  Yes
         RECO_TEST_CD_02_testceladm01    ONLINE  Yes
         RECO_TEST_CD_03_testceladm01    ONLINE  Yes
         RECO_TEST_CD_04_testceladm01    ONLINE  Yes
         RECO_TEST_CD_05_testceladm01    ONLINE  Yes

3. Inactivate the grid disks on the cell

CellCLI> alter griddisk all inactive
GridDisk DATA_TEST_CD_00_testceladm01 successfully altered
GridDisk DATA_TEST_CD_01_testceladm01 successfully altered
GridDisk DATA_TEST_CD_02_testceladm01 successfully altered
GridDisk DATA_TEST_CD_03_testceladm01 successfully altered
GridDisk DATA_TEST_CD_04_testceladm01 successfully altered
GridDisk DATA_TEST_CD_05_testceladm01 successfully altered
GridDisk DBFS_DG_CD_02_testceladm01 successfully altered
GridDisk DBFS_DG_CD_03_testceladm01 successfully altered
GridDisk DBFS_DG_CD_04_testceladm01 successfully altered
GridDisk DBFS_DG_CD_05_testceladm01 successfully altered
GridDisk RECO_TEST_CD_00_testceladm01 successfully altered
GridDisk RECO_TEST_CD_01_testceladm01 successfully altered
GridDisk RECO_TEST_CD_02_testceladm01 successfully altered
GridDisk RECO_TEST_CD_03_testceladm01 successfully altered
GridDisk RECO_TEST_CD_04_testceladm01 successfully altered
GridDisk RECO_TEST_CD_05_testceladm01 successfully altered

4. Shut down cellsrv service

CellCLI> alter cell shutdown services cellsrv 

Stopping CELLSRV services... 
The SHUTDOWN of CELLSRV services was successful.

5. Set the cell flashcache mode to writeback

CellCLI> alter cell flashCacheMode=writeback
Cell testceladm01 successfully altered

6. Restart the cellsrv service

CellCLI> alter cell startup services cellsrv

Starting CELLSRV services...
The STARTUP of CELLSRV services was successful.

7. Reactivate the griddisks on the cell

CellCLI> alter griddisk all active
GridDisk DATA_TEST_CD_00_testceladm01 successfully altered
GridDisk DATA_TEST_CD_01_testceladm01 successfully altered
GridDisk DATA_TEST_CD_02_testceladm01 successfully altered
GridDisk DATA_TEST_CD_03_testceladm01 successfully altered
GridDisk DATA_TEST_CD_04_testceladm01 successfully altered
GridDisk DATA_TEST_CD_05_testceladm01 successfully altered
GridDisk DBFS_DG_CD_02_testceladm01 successfully altered
GridDisk DBFS_DG_CD_03_testceladm01 successfully altered
GridDisk DBFS_DG_CD_04_testceladm01 successfully altered
GridDisk DBFS_DG_CD_05_testceladm01 successfully altered
GridDisk RECO_TEST_CD_00_testceladm01 successfully altered
GridDisk RECO_TEST_CD_01_testceladm01 successfully altered
GridDisk RECO_TEST_CD_02_testceladm01 successfully altered
GridDisk RECO_TEST_CD_03_testceladm01 successfully altered
GridDisk RECO_TEST_CD_04_testceladm01 successfully altered
GridDisk RECO_TEST_CD_05_testceladm01 successfully altered

8. Verify all grid disks have been successfully put online using the following command:

(Currently the DATA_TEST disk group has started synchronization)

CellCLI> list griddisk attributes name, asmmodestatus
DATA_TEST_CD_00_testceladm01 SYNCING
DATA_TEST_CD_01_testceladm01 SYNCING
DATA_TEST_CD_02_testceladm01 SYNCING
DATA_TEST_CD_03_testceladm01 SYNCING
DATA_TEST_CD_04_testceladm01 SYNCING
DATA_TEST_CD_05_testceladm01 SYNCING
DBFS_DG_CD_02_testceladm01 OFFLINE
DBFS_DG_CD_03_testceladm01 OFFLINE
DBFS_DG_CD_04_testceladm01 OFFLINE
DBFS_DG_CD_05_testceladm01 OFFLINE
RECO_TEST_CD_00_testceladm01 OFFLINE
RECO_TEST_CD_01_testceladm01 OFFLINE
RECO_TEST_CD_02_testceladm01 OFFLINE
RECO_TEST_CD_03_testceladm01 OFFLINE
RECO_TEST_CD_04_testceladm01 OFFLINE
RECO_TEST_CD_05_testceladm01 OFFLINE

9. Recreate the flash cache

CellCLI> create flashcache all
Flash cache testceladm01_FLASHCACHE successfully created

10. Check the status of the cell to confirm that it’s now in WriteBack mode:

CellCLI> list cell attributes flashCacheMode
writeback

11. Repeat the same steps on the next cell. However, before taking another storage server offline, execute the following and make sure ‘asmdeactivationoutcome’ displays YES. (Currently the RECO_TEST disk group has started synchronization.)

CellCLI> list griddisk attributes name,asmmodestatus,asmdeactivationoutcome
DATA_TEST_CD_00_testceladm01 ONLINE Yes
DATA_TEST_CD_01_testceladm01 ONLINE Yes
DATA_TEST_CD_02_testceladm01 ONLINE Yes
DATA_TEST_CD_03_testceladm01 ONLINE Yes
DATA_TEST_CD_04_testceladm01 ONLINE Yes
DATA_TEST_CD_05_testceladm01 ONLINE Yes
DBFS_DG_CD_02_testceladm01 ONLINE Yes
DBFS_DG_CD_03_testceladm01 ONLINE Yes
DBFS_DG_CD_04_testceladm01 ONLINE Yes
DBFS_DG_CD_05_testceladm01 ONLINE Yes
RECO_TEST_CD_00_testceladm01 SYNCING Yes
RECO_TEST_CD_01_testceladm01 SYNCING Yes
RECO_TEST_CD_02_testceladm01 SYNCING Yes
RECO_TEST_CD_03_testceladm01 SYNCING Yes
RECO_TEST_CD_04_testceladm01 SYNCING Yes
RECO_TEST_CD_05_testceladm01 SYNCING Yes

All disk groups are synchronized now, so we can proceed to the next cell.

CellCLI> list griddisk attributes name,asmmodestatus,asmdeactivationoutcome
DATA_TEST_CD_00_testceladm01 ONLINE Yes
DATA_TEST_CD_01_testceladm01 ONLINE Yes
DATA_TEST_CD_02_testceladm01 ONLINE Yes
DATA_TEST_CD_03_testceladm01 ONLINE Yes
DATA_TEST_CD_04_testceladm01 ONLINE Yes
DATA_TEST_CD_05_testceladm01 ONLINE Yes
DBFS_DG_CD_02_testceladm01 ONLINE Yes
DBFS_DG_CD_03_testceladm01 ONLINE Yes
DBFS_DG_CD_04_testceladm01 ONLINE Yes
DBFS_DG_CD_05_testceladm01 ONLINE Yes
RECO_TEST_CD_00_testceladm01 ONLINE Yes
RECO_TEST_CD_01_testceladm01 ONLINE Yes
RECO_TEST_CD_02_testceladm01 ONLINE Yes
RECO_TEST_CD_03_testceladm01 ONLINE Yes
RECO_TEST_CD_04_testceladm01 ONLINE Yes
RECO_TEST_CD_05_testceladm01 ONLINE Yes

FINALLY
After changing the flash cache mode on all cells, verify that it is now WriteBack on every cell.

[root@testdbadm01 ~]# dcli -g ~/cell_group -l root cellcli -e "list cell attributes flashcachemode"
testceladm01: writeback
testceladm02: writeback
testceladm03: writeback

Using the dcli Utility


The dcli utility executes commands across a group of servers on Oracle Big Data Appliance and returns the output.

This chapter contains the following sections: Overview of the dcli Utility, dcli Syntax, dcli Return Values, and dcli Examples.

1 Overview of the dcli Utility

The dcli utility executes commands on multiple Oracle Big Data Appliance servers in parallel, using the InfiniBand (bondib0) interface to make the connections. You can run the utility from any server.

1.1 Setting Up Passwordless SSH

The dcli utility requires a passwordless Secure Shell (SSH) between the local server and all target servers. You run the dcli utility on the local server, and the commands specified in dcli execute on the target servers.

Two scripts facilitate the use of SSH on Oracle Big Data Appliance: setup-root-ssh and remove-root-ssh. These scripts accept two options that are also used by dcli:

  • -C: Targets all the servers in a Hadoop cluster
  • -g: Targets a user-defined set of servers

To set up passwordless SSH for root:

  1. Connect to an Oracle Big Data Appliance server using PuTTY or a similar utility. Select an SSH connection type.
  2. Log in as root.
  3. Set up passwordless SSH for root across the rack:
    setup-root-ssh
    

    Or, to set up passwordless SSH across a Hadoop cluster of multiple racks:

    setup-root-ssh -C
    

    You see the message “ssh key added” from each server.

  4. You can now run any ssh command on any server in the rack without entering a password. In addition to dcli commands, you can use scp to copy files between servers.
  5. To remove passwordless SSH from root:
    remove-root-ssh
    

1.2 Basic Use of dcli

This topic identifies some of the basic options to the dcli command.

1.2.1 Getting Help

To see the dcli help page, issue the dcli command with the -h or --help options. You can see a description of the commands by issuing the dcli command with no options.

1.2.2 Identifying the Target Servers

You can identify the servers where you want the commands to run either on the command line or in a file. For a list of the default target servers, use the -t option. To change the target servers for the current command, use the -c or -g options described in the table below.

The /opt/oracle/bda directory contains two files for executing commands on multiple servers:

  • rack-hosts-infiniband is the default target group of servers for the dcli, setup-root-ssh, and remove-root-ssh utilities. The file initially contains the default factory IP addresses. The network configuration process changes this file to the custom IP addresses identified in the Oracle Big Data Appliance Configuration Worksheets.
  • cluster-hosts-infiniband contains the names of all servers in the Hadoop cluster created by the Mammoth Utility. A cluster can span one or more Oracle Big Data Appliance racks.

You can manually create additional files with groups of servers that you want to manage together. For example, you might manage servers 5 to 18 together because, unlike servers 1 to 4, they have no special functions.
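
As an illustrative sketch (the file name and host names are made up), such a group file is simply a list of host names, one per line, which you then pass to dcli with the -g option:

[root@bda1node01 ~]# cat > /opt/oracle/bda/worker-hosts <<EOF
bda1node05-adm.example.com
bda1node06-adm.example.com
bda1node07-adm.example.com
EOF
[root@bda1node01 ~]# dcli -g /opt/oracle/bda/worker-hosts -l root uptime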

1.2.3 Specifying the Commands

You typically specify a command for execution on the target servers on the command line. However, you can also create a command file for a series of commands that you often issue together or for commands with complex syntax. See the -x option in the table below.

You can also copy files to the target servers without executing them by using the -f option.
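
For example (the file names are hypothetical), a command file is pushed and executed with -x, while -f only copies a file, optionally into the directory given with -d:

[root@bda1node01 ~]# dcli -g /opt/oracle/bda/worker-hosts -x check_disks.sh
[root@bda1node01 ~]# dcli -g /opt/oracle/bda/worker-hosts -f /tmp/settings.conf -d /tmp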

1.2.4 Controlling the Output Levels

You can request more information with the -v option or less information with the -n option. You can also limit the number of returned lines with the --maxlines option, or replace matching strings with the -r option.

Following are examples of various output levels using a simple example: the Linux date command.

Note:

The output from only one server (node07) is shown. The syntax in these examples issues the date command on all 18 servers.

This is the default output, which lists the server followed by the output.

# dcli date
bda1node07-adm.example.com: Tue Feb 14 10:22:31 PST 2012

The minimal output returns OK for completed execution:

# dcli -n date
OK: ['bda1node07.example.com']

Verbose output provides extensive information about the settings under which the command ran:

# dcli -v date
options.nodes: None
options.destfile: None
options.file: None
options.group: dcservers
options.maxLines: 100000
options.listNegatives: False
options.pushKey: False
options.regexp: None
options.sshOptions: None
options.scpOptions: None
options.dropKey: False
options.serializeOps: False
options.userID: root
options.verbosity 1
options.vmstatOps None
options.execfile: None
argv: ['/opt/oracle/bda/bin/dcli', '-g', 'dcservers', '-v', 'date']
Success connecting to nodes: ['bda1node07.example.com']
...entering thread for bda1node07.example.com:
execute: /usr/bin/ssh -l root bda1node07.example.com ' date'
...exiting thread for bda1node07.example.com status: 0
bda1node07.example.com: Tue Feb 14 10:24:43 PST 2013

2 dcli Syntax

dcli [options] [command]

Parameters

options
The options described in the table below. You can omit all options to run a command on all servers.

command
Any command that runs from the operating system prompt. If the command contains punctuation marks or special characters, then enclose the command in double quotation marks.

The backslash (\) is the escape character. Precede the following special characters with a backslash on the command line to prevent interpretation by the shell. The backslash is not needed in a command file. See the -x option for information about command files.
$ (dollar sign)
' (quotation mark)
< (less than)
> (greater than)
( ) (parentheses)
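
As an illustrative, unverified sketch of this escaping rule, a dollar sign escaped locally is passed through and expanded on each target server rather than on the local node:

# dcli echo \$HOSTNAME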

Table  dcli Options

  • -c nodes: Specifies a comma-separated list of Oracle Big Data Appliance servers where the command is executed.
  • -C: Uses the list of servers in /opt/oracle/bda/cluster-hosts-infiniband as the target.
  • -d destfile: Specifies a target directory or file name for the -f option.
  • -f file: Specifies files to be copied to the user's home directory on the target servers. The files are not executed. See the -l option.
  • -g groupfile: Specifies a file containing a list of Oracle Big Data Appliance servers where the command is executed. Either server names or IP addresses can be used in the file.
  • -h, --help: Displays a description of the commands.
  • -k: Pushes the ssh key to each server's /root/.ssh/authorized_keys file.
  • -l userid: Identifies the user ID for logging in to another server. The default ID is root.
  • --maxlines=maxlines: Identifies the maximum lines of output displayed from a command executed on multiple servers. The default is 10,000 lines.
  • -n: Abbreviates the output for non-error messages. Only the server name is displayed when a server returns normal output (return code 0). You cannot use the -n and -r options together.
  • -r regexp: Replaces the output with the server name for lines that match the specified regular expression.
  • -s sshoptions: Specifies a string of options that are passed to SSH.
  • --scp=scpoptions: Specifies a string of options that are passed to Secure Copy (SCP), when these options are different from sshoptions.
  • --serial: Serializes execution over the servers. The default is parallel execution.
  • -t: Lists the target servers.
  • --unkey: Drops the keys from the authorized_key files of the target servers.
  • -v: Displays the verbose version of all messages.
  • --version: Displays the dcli version number.
  • --vmstat=VMSTATOPS: Displays the syntax of the Linux Virtual Memory Statistics utility (vmstat). This command returns process, virtual memory, disk, trap, and CPU activity information. To issue a vmstat command, enclose its options in quotation marks, for example: --vmstat="-a 3 5". See your Linux documentation for more information about vmstat.
  • -x execfile: Specifies a command file to be copied to the user's home directory and executed on the target servers. See the -l option.

3 dcli Return Values

  • 0: The command ran successfully on all servers.
  • 1: One or more servers were inaccessible or remote execution returned a nonzero value. A message lists the unresponsive servers. Execution continues on the other servers.
  • 2: A local error prevented the command from executing.

If you interrupt the local dcli process, then the remote commands may continue without returning their output or status.

4 dcli Examples

Following are examples of the dcli utility.

This example returns the default list of target servers:

# dcli -t
Target nodes: ['bda1node01-adm.example.com', 'bda1node02-adm.example.com', 
'bda1node03-adm.example.com', 'bda1node04-adm.example.com', 
'bda1node05-adm.example.com', 'bda1node06-adm.example.com', 
'bda1node07-adm.example.com', 'bda1node08-adm.example.com', 
'bda1node09-adm.example.com']

The next example checks the temperature of all servers:

# dcli 'ipmitool sunoem cli "show /SYS/T_AMB" | grep value'

bda1node01-adm.example.com: value = 22.000 degree C
bda1node02-adm.example.com: value = 22.000 degree C
bda1node03-adm.example.com: value = 22.000 degree C
bda1node04-adm.example.com: value = 23.000 degree C

Are you getting the most out of your Exadata? (by Tanel Poder)

Checking Whether Smart Scanning Is Used

Let's first see how to identify whether your queries are taking full advantage of the Exadata "secret sauce" – Smart Scans. Here's a simple execution plan, which has some Exadata-specific elements in it:

[Screenshot 2014-07-01 16.36.59]

[Screenshot 2014-07-01 16.38.34]

All the TABLE ACCESS FULL row sources do have the STORAGE keyword in them. It is important to know that this does not mean that Exadata smart scans are used. The STORAGE keyword in row sources means that this execution plan is using an Exadata storage-aware, smart-scan-capable row source, but it doesn't tell us whether a smart scan actually happened for a given execution. Also, at the bottom you see two storage() predicates in addition to the usual filter() and access() ones. This shows that if a smart scan were chosen for scanning some segments during a given SQL execution, then Oracle would attempt to offload the predicate conditions into the cells. If the smart scan was not used, then predicate offloading did not happen either (as the filter predicate offloading is part of smart scan).

Important:

The key thing to remember is that Exadata smart scan is not only a property of an execution plan – it's actually a runtime decision made separately for each table/index/partition segment accessed with a full scan. It is not possible to determine whether a smart scan happened just by looking at the execution plan; you should measure execution metrics from V$SQL or V$SESSION to be sure.

It is also possible that a smart scan is attempted against a segment, but during the smart scan execution, the cells have to fall back to regular block IO mode for some blocks and ship the blocks back to database for processing – instead of extracting rows from them inside the cell. For example, this happens for blocks for which the consistent reads require access to undo data or there are chained rows in a block. There are more reasons and we explain these in detail in the upcoming Expert Oracle Exadata book.

This should also explain why there is a filter predicate (executed in the database layer) in addition to every storage predicate in the plan: sometimes the filtering cannot be entirely offloaded to the storage cells and the database layer has to perform the final filtering.

Serial Execution Without Smart Scan

Let's look into an example query now, executed in serial mode at first. Basically it returns all customers who have ever placed orders where the total order amount exceeds their customer-specific credit limit. All the tables are partitioned and their total size is around 11 GB. Note that this query is written against a regular normalized OLTP-style schema, not the star or snowflake schema you are likely using in your DW databases, but for the purpose of this demo it should be enough.

[Screenshot 2014-07-01 16.39.43]

When executed on an otherwise idle quarter rack Exadata V2 installation, it took 151 seconds to run, which seems too much, knowing that smart scans on even a quarter rack can scan data on hard disks at multiple gigabytes per second (and even faster from flash cache).

So, I will identify the SQL ID, child cursor number and execution ID of this current ongoing SQL execution first:

[Screenshot 2014-07-01 16.40.50]

Now, let’s look into V$SQL first, using the SQL ID and child cursor number:

[Screenshot 2014-07-01 16.41.26]

The physical_read_bytes metric (phyrd_mb) shows that the Oracle database layer has issued 10833 MB worth of IO calls for this SQL, and io_interconnect_bytes (ret_mb) shows that this query has also used 10833 MB worth of IO interconnect traffic (between the database host and the storage cells). So the smart scans were not able to reduce the cell-to-database IO traffic at all. In fact, when looking at io_cell_offload_eligible_bytes (elig_mb), it's zero, which means the database has not even tried to do smart scan offloading for this statement. If 10 GB worth of segments had been read via smart scans, the "eligible bytes for offload" would also show 10 GB. So this io_cell_offload_eligible_bytes metric is a key metric for determining whether any offloading has been attempted for a query. Note that V$SQL accumulates statistics over multiple executions of the same query, so if this cursor has already been executed before (and is still in cache) you should not look at the absolute values in the V$SQL columns, but rather at how much they increase (calculate deltas from before- and after-test values).
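
The V$SQL query itself is only shown as a screenshot above, but a rough equivalent of what is being measured would be something like the following sketch (the column aliases match the ones discussed above; the SQL ID literal and prompt are placeholders):

[oracle@dbnode01 ~]$ sqlplus -S / as sysdba <<EOF
select sql_id, child_number,
       round(physical_read_bytes/1024/1024)            phyrd_mb,
       round(io_cell_offload_eligible_bytes/1024/1024) elig_mb,
       round(io_interconnect_bytes/1024/1024)          ret_mb
from   v\$sql
where  sql_id = 'your_sql_id_here';   -- placeholder: the SQL ID identified earlier
EOF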

Another option is to look into what the session(s) executing this SQL are doing and waiting for. You can use SQL trace or ASH for this, or any other tool, which is capable of extracting the wait information from Oracle. Here’s a wait profile example from the problem session, taken with Oracle Session Snapper

[Screenshot 2014-07-01 16.42.11]

Indeed we aren't seeing the cell smart table scan (or cell smart index scan) wait events, but a cell multiblock physical read wait event as the top one – which shows that half of the query time is spent doing regular multiblock IOs from storage to the database. Note that there's still a chance that smart scans are happening for some tables involved in the query (they just don't show up in the top waits thanks to their fast, asynchronous nature), while the regular multiblock reads show up because the rest of the tables are not using smart scan. Luckily it's possible to check exactly which execution plan row sources use smart scan and how much they benefit from it.

We can get this information from the Real Time SQL Monitoring views, either by manually querying V$SQL_PLAN_MONITOR or by Grid Control/Database Control UI tools. I like to always know where the data is coming from, so let’s start from a manual query. Note that the sql_id and sql_exec_id values are taken from my previous query against V$SESSION above:

[Screenshot 2014-07-01 16.42.47]
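
Since that query is also only a screenshot, here is a hedged reconstruction of the kind of statement being discussed (the SQL ID and SQL_EXEC_ID values and the prompt are placeholders):

[oracle@dbnode01 ~]$ sqlplus -S / as sysdba <<EOF
select plan_line_id, plan_operation, plan_object_name,
       round(physical_read_bytes/1024/1024)   phyrd_mb,
       round(io_interconnect_bytes/1024/1024) ret_mb
from   v\$sql_plan_monitor
where  sql_id = 'your_sql_id_here'        -- placeholder
and    sql_exec_id = 16777216             -- placeholder
order  by plan_line_id;
EOF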

The V$SQL_PLAN_MONITOR doesn’t have the io_offload_eligible_bytes column showing how many bytes worth of segments were scanned using smart scans, but nevertheless the io_interconnect_bytes column tells us how many bytes of IO traffic between the database host and cells were done for the given access path. In the above example, exactly the same amount of data was returned by cells (RET_MB) as requested by database (PHYRD_MB), so there was no interconnect traffic saving at all. When these numbers exactly match, this is a good indication that no offloading was performed (as all the IO requested by database was returned in untouched blocks to the database and no data reduction due to filtering and projection offloading was performed in the cells). So, this is additional confirmation that none of the tables were scanned with smart scan.

I caused the above problem deliberately. I just set the undocumented parameter _small_table_threshold to a big value (1000000 blocks) in my testing session. This made the Oracle table scanning function think that all the scanned partitions were "small" segments, which should be scanned using regular buffered reads as opposed to direct path reads, which are a pre-requisite for smart scans. Note that the above query was executed in serial mode, not parallel. Starting from Oracle 11g, Oracle can decide to use direct path reads (and as a result smart scans) even for serial sessions. The direct path read decision is made at runtime, separately for each table/index/partition segment accessed – and it depends on how many blocks the segment has under its HWM, the buffer cache size, and so on. So you might encounter such an issue of Oracle deciding not to use direct path reads (and thus smart scan) in real life too – and as it's an automatic decision, it may change unexpectedly.
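
For reference, the kind of session-level change described above looks like this; it is an undocumented parameter, so treat the sketch as testing-only (never set it in production), and note that the connection string is a placeholder:

[oracle@dbnode01 ~]$ sqlplus testuser@testdb                              # placeholder connection
SQL> alter session set "_small_table_threshold" = 1000000;               -- testing only, undocumented
SQL> -- then run the test query from this same session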

Serial Execution With Smart Scan

Let's run the same query without my trick to disable smart scans; I'm using default session parameters now. The query, still executed in serial, took 26 seconds (as opposed to 151 seconds previously). The wait profile looks different: the cell smart table scan wait is present and there are no regular block IO related waits. The CPU usage percentage is higher, but remember that this query completed over 5 times faster than before. Higher CPU usage with lower response time is a good thing, as it means that your query spent less time waiting and more time working, and wasn't throttled by IO and other issues.

[Screenshot 2014-07-01 16.43.42]

For simple enough queries you might actually see that only a couple of percent of response time is spent waiting for smart scans and all the rest is CPU usage. This happens thanks to the asynchronous nature of smart scans, where cells work for the database sessions independently and may be able to constantly have some data ready for the DB session to consume.

Now, let’s look into V$SQL entries of that cursor (a new child cursor 1 was parsed thanks to me changing the _small_table_threshold parameter value back):

[Screenshot 2014-07-01 16.44.19]

Apparently all the physical IOs were requested using the smart scanning method, as ELIG_MB is the same as PHYRD_MB. Some filtering and projection was clearly done in the cells, as the interconnect traffic (RET_MB) of that statement is about 69.2% less than the scanned segment sizes on disk.

Let's drill down into individual execution plan lines, which allows us to look at the efficiency of accessing individual tables. In real-life DWs you are more likely dealing with 10-20 table joins, not 3-table joins. Note that the sql_exec_id has increased as I've re-executed the SQL statement:

[Screenshot 2014-07-01 16.44.55]

The above output clearly shows that the ORDERS and CUSTOMERS tables benefit from smart scan the most.

Note that the execution plan join order has changed too; this is thanks to automatic cursor re-optimization using cardinality feedback from the previous child cursor's execution statistics.

Summary

In order to get the most out of your Exadata performance for your DW application, you'll have to use smart scans. Otherwise you won't be able to take advantage of the cells' computing power and their IO reduction features. Full segment scans (like full table/partition scans and fast full index scans) are a pre-requisite for smart scans. Additionally, direct path reads have to be used in order for the smart scans to kick in. For serial sessions, the direct path read decision is made based on the scanned segment size, the buffer cache size and some other factors. For parallel execution, direct path access is always used for full scans, unless you use the new parallel_degree_policy = AUTO feature, in which case the decision would again be dynamic.

 

 

Siebel on Exadata – are there any design patterns?

Oracle would say that Exadata is transparent to Siebel. But is it really?

The single biggest benefit of Exadata is the concept of Storage Indexes, which reside on the Storage Server – versus traditional B-Tree database indexes, which are physically stored on disk or reside in cache when being accessed.

Storage indexes are great, especially for those expensive Siebel queries which require full-table scans; there they can result in impressive performance gains. Storage indexes essentially prevent I/O by telling which 1 MB regions of storage do NOT contain the requested data. They can also be helpful for "IS NOT NULL" predicates, which B-Tree indexes cannot cope with very well in most cases.

But there can be only eight Storage indexes on a single table; this means only eight columns can be indexed by the Storage Server. And, well, Siebel tends to have very wide tables with huge logical row sizes.

Storage indexes are created dynamically by the Storage Server based on the workload, especially the WHERE predicates used. If certain columns are filtered on frequently, those columns become more likely to receive a Storage index.
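
A hedged way to check whether storage indexes actually kick in for a given Siebel session is to watch the "cell physical IO bytes saved by storage index" statistic for that session; the SID and prompt below are placeholders:

[oracle@dbnode01 ~]$ sqlplus -S / as sysdba <<EOF
select n.name, round(s.value/1024/1024) saved_mb
from   v\$sesstat s, v\$statname n
where  s.statistic# = n.statistic#
and    s.sid = 123        -- placeholder: SID of the session running the query
and    n.name = 'cell physical IO bytes saved by storage index';
EOF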

The limit of eight Storage indexes is really worth noting. They are quite well suited for visibility intersection tables (e.g. S_POSTN_CON, S_SRV_REQ_BU, etc.), which are tables carrying only a limited number of columns plus some denormalized columns from the related base tables. Driving most of the expensive/heavy queries through visibility intersection tables is ideal, especially if the appropriate key columns have been denormalized. In most cases customers have not gone down this path, since it requires Oracle ACS/ES assistance to create the denormalized columns, but the benefit is typically impressive. And on Exadata, having your key query columns denormalized will improve the benefit of Storage indexes, since the limit of eight indexes per table then becomes reasonable.

One other design pattern when running Siebel on Exadata might be to use 1:1 extension tables. On non-Exadata implementations you rather do not use 1:1 extension tables, especially not for columns used in WHERE predicates. But given the concept of Storage indexes, there are situations where it is better to put a frequently queried column in an extension table, to ensure that the Storage Server will create a Storage index on that table. This helps especially for those users who have to run typically expensive queries (needed for list exports, BIP reports, etc.). It could even make sense to create "denormalized" 1:1 extension columns for this purpose (e.g. denormalizing through the BC layer using the On Field Update user property). This enables you both to create concatenated B-Tree indexes and to let particular queries or reports leverage the denormalized column.