Spark Streaming with HBase

January 25, 2016

What is Spark Streaming?

First of all, what is streaming? A data stream is an unbounded sequence of data arriving continuously. Streaming divides the continuously flowing input data into discrete units for processing. Stream processing is the low-latency processing and analysis of streaming data. Spark Streaming is an extension of the core Spark API that enables scalable, high-throughput, fault-tolerant stream processing of live data. Spark Streaming is for use cases that require a significant amount of data to be processed quickly as soon as it arrives. Example real-time use cases are:

  • Website and network monitoring
  • Fraud detection
  • Web clicks
  • Advertising
  • Internet of Things: sensors

Spark Streaming supports data sources such as HDFS directories, TCP sockets, Kafka, Flume, Twitter, etc. Data streams can be processed with Spark’s core APIs, DataFrames and SQL, or machine learning APIs, and can be persisted to a filesystem, HDFS, databases, or any data source offering a Hadoop OutputFormat.

How Spark Streaming Works

Streaming data is continuous and needs to be batched to be processed. Spark Streaming divides the data stream into batches of X seconds called DStreams (discretized streams). Each DStream is internally a sequence of RDDs. Your Spark application processes these RDDs using the Spark APIs, and the processed results of the RDD operations are returned in batches.

Architecture of the example Streaming Application

The Spark Streaming example code does the following:

  • Reads streaming data.
  • Processes the streaming data.
  • Writes the processed data to an HBase Table.

Other Spark example code does the following:

  • Reads HBase Table data written by the streaming code
  • Calculates daily summary statistics
  • Writes summary statistics to the HBase table Column Family stats

Example data set

The oil pump sensor data comes in as comma-separated value (CSV) files dropped into a directory. Spark Streaming will monitor the directory and process any files created in that directory. (As stated before, Spark Streaming supports different streaming data sources; for simplicity, this example will use files.) Below is an example of the CSV file with some sample data:

We use a Scala case class to define the Sensor schema corresponding to the sensor data CSV files, and a parseSensor function to parse the comma-separated values into the Sensor case class.

// schema for sensor data
case class Sensor(resid: String, date: String, time: String, hz: Double, disp: Double, flo: Double, 
          sedPPM: Double, psi: Double, chlPPM: Double)

object Sensor {
   // function to parse line of csv data into Sensor class
   def parseSensor(str: String): Sensor = {
       val p = str.split(",")
        Sensor(p(0), p(1), p(2), p(3).toDouble, p(4).toDouble, p(5).toDouble, p(6).toDouble,
            p(7).toDouble, p(8).toDouble)
  }
…
}

HBase Table schema

The HBase Table Schema for the streaming data is as follows:

  • Composite row key of the pump name, date, and time stamp
  • Column Family data with columns corresponding to the input data fields
  • Column Family alerts with columns corresponding to any filters for alarming values

Note that the data and alerts column families could be set to expire values after a certain amount of time.

The Schema for the daily statistics summary rollups is as follows:

  • Composite row key of the pump name and date
  • Column Family stats
  • Columns for min, max, avg.
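
As a hedged example (not part of the original application), a table matching this schema could be created from Scala with the same HBase 0.98-era API used elsewhere in this post; the table and column family names follow the schema above:

import org.apache.hadoop.hbase.{HBaseConfiguration, HColumnDescriptor, HTableDescriptor, TableName}
import org.apache.hadoop.hbase.client.HBaseAdmin

// create the sensor table with the data, alerts, and stats column families
val conf = HBaseConfiguration.create()
val admin = new HBaseAdmin(conf)
val desc = new HTableDescriptor(TableName.valueOf("sensor"))
desc.addFamily(new HColumnDescriptor("data"))
desc.addFamily(new HColumnDescriptor("alerts"))
desc.addFamily(new HColumnDescriptor("stats"))
admin.createTable(desc)
admin.close()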

The function below converts a Sensor object into an HBase Put object, which is used to insert a row into HBase.

val cfDataBytes = Bytes.toBytes("data")

object Sensor {
. . .
  //  Convert a row of sensor object data to an HBase put object
  def convertToPut(sensor: Sensor): (ImmutableBytesWritable, Put) = {
      val dateTime = sensor.date + " " + sensor.time
      // create a composite row key: sensorid_date time
      val rowkey = sensor.resid + "_" + dateTime
      val put = new Put(Bytes.toBytes(rowkey))
      // add to column family data, column  data values to put object 
      put.add(cfDataBytes, Bytes.toBytes("hz"), Bytes.toBytes(sensor.hz))
      put.add(cfDataBytes, Bytes.toBytes("disp"), Bytes.toBytes(sensor.disp))
      put.add(cfDataBytes, Bytes.toBytes("flo"), Bytes.toBytes(sensor.flo))
      put.add(cfDataBytes, Bytes.toBytes("sedPPM"), Bytes.toBytes(sensor.sedPPM))
      put.add(cfDataBytes, Bytes.toBytes("psi"), Bytes.toBytes(sensor.psi))
      put.add(cfDataBytes, Bytes.toBytes("chlPPM"), Bytes.toBytes(sensor.chlPPM))
      return (new ImmutableBytesWritable(Bytes.toBytes(rowkey)), put)
  }
}

Configuration for Writing to an HBase Table

You can use the TableOutputFormat class with Spark to write to an HBase table, similar to how you would write to an HBase table from MapReduce. Below we set up the configuration for writing to HBase using the TableOutputFormat class.

   val tableName = "sensor"

   // set up Hadoop HBase configuration using TableOutputFormat
   val conf = HBaseConfiguration.create()
   conf.set(TableOutputFormat.OUTPUT_TABLE, tableName)
   val jobConfig: JobConf = new JobConf(conf, this.getClass)
   jobConfig.setOutputFormat(classOf[TableOutputFormat])
   jobConfig.set(TableOutputFormat.OUTPUT_TABLE, tableName)
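
For completeness, here is a hedged list of the imports these snippets assume (the TableOutputFormat here is the older org.apache.hadoop.hbase.mapred version, since saveAsHadoopDataset takes a JobConf, while the TableInputFormat used later with newAPIHadoopRDD comes from org.apache.hadoop.hbase.mapreduce):

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.{Put, Result}
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapred.TableOutputFormat
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.mapred.JobConf
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.util.StatCounter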

The Spark Streaming Example Code

These are the basic steps for Spark Streaming code:

  1. Initialize a Spark StreamingContext object.
  2. Apply transformations and output operations to DStreams.
  3. Start receiving data and processing it using streamingContext.start().
  4. Wait for the processing to be stopped using streamingContext.awaitTermination().

We will go through each of these steps with the example application code.

Initializing the StreamingContext

First we create a StreamingContext, the main entry point for streaming functionality, with a 2-second batch interval.

val sparkConf = new SparkConf().setAppName("HBaseStream")

//  create a StreamingContext, the main entry point for all streaming functionality
val ssc = new StreamingContext(sparkConf, Seconds(2))

Next, we use the StreamingContext textFileStream(directory) method to create an input stream that monitors a Hadoop-compatible file system for new files and processes any files created in that directory.

// create a DStream that represents streaming data from a directory source
val linesDStream = ssc.textFileStream("/user/user01/stream")

The linesDStream represents the stream of data; each record is a line of text. Internally, a DStream is a sequence of RDDs, one RDD per batch interval.

Apply transformations and output operations to DStreams

Next we parse the lines of data into Sensor objects, with the map operation on the linesDStream.

// parse each line of data in linesDStream  into sensor objects

val sensorDStream = linesDStream.map(Sensor.parseSensor) 

The map operation applies the Sensor.parseSensor function on the RDDs in the linesDStream, resulting in RDDs of Sensor objects.

Next we use the DStream foreachRDD method to apply processing to each RDD in this DStream. We filter the sensor objects for low psi to create alerts, then we write the sensor and alert data to HBase by converting them to Put objects and using the PairRDDFunctions saveAsHadoopDataset method, which outputs the RDD to any Hadoop-supported storage system using a Hadoop Configuration object for that storage system (see the configuration for writing to an HBase table above).

// foreachRDD applies the given function to each RDD in the DStream
sensorDStream.foreachRDD { rdd =>
      // filter sensor data for low psi
      val alertRDD = rdd.filter(sensor => sensor.psi < 5.0)

      // convert sensor data to Put objects and write to HBase Table CF data
      rdd.map(Sensor.convertToPut).saveAsHadoopDataset(jobConfig)

      // convert alerts to Put objects and write to HBase Table CF alerts
      alertRDD.map(Sensor.convertToPutAlert).saveAsHadoopDataset(jobConfig)
}

The Sensor objects in each RDD are converted to Put objects and then written to HBase.
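
The Sensor.convertToPutAlert helper used above is not shown in this post; a minimal sketch, assuming an alerts column family with a single psi column, might look like this:

// hedged sketch of the alert conversion; the "alerts" family and "psi" column names are assumptions
val cfAlertBytes = Bytes.toBytes("alerts")

def convertToPutAlert(sensor: Sensor): (ImmutableBytesWritable, Put) = {
    // same composite row key as convertToPut: sensorid_date time
    val rowkey = sensor.resid + "_" + sensor.date + " " + sensor.time
    val put = new Put(Bytes.toBytes(rowkey))
    put.add(cfAlertBytes, Bytes.toBytes("psi"), Bytes.toBytes(sensor.psi))
    (new ImmutableBytesWritable(Bytes.toBytes(rowkey)), put)
}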

Start receiving data

To start receiving data, we must explicitly call start() on the StreamingContext, then call awaitTermination to wait for the streaming computation to finish.

    // Start the computation
    ssc.start()
    // Wait for the computation to terminate
    ssc.awaitTermination()

Spark Reading from and Writing to HBase

Now we want to read the HBase sensor table data, calculate daily summary statistics, and write these statistics to the stats column family.

The code below reads the psi column data from the sensor table, calculates statistics on this data using StatCounter, and then writes the statistics to the sensor table’s stats column family.

     // configure HBase for reading 
    val conf = HBaseConfiguration.create()
    conf.set(TableInputFormat.INPUT_TABLE, HBaseSensorStream.tableName)
    // scan data column family psi column
    conf.set(TableInputFormat.SCAN_COLUMNS, "data:psi") 

// Load an RDD of (row key, row Result) tuples from the table
    val hBaseRDD = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat],
      classOf[org.apache.hadoop.hbase.io.ImmutableBytesWritable],
      classOf[org.apache.hadoop.hbase.client.Result])

    // transform (row key, row Result) tuples into an RDD of Results
    val resultRDD = hBaseRDD.map(tuple => tuple._2)

    // transform into an RDD of (rowkey, column value) pairs, with the time removed from the row key
    val keyValueRDD = resultRDD.
              map(result => (Bytes.toString(result.getRow()).
              split(" ")(0), Bytes.toDouble(result.value)))

    // group by rowkey, get statistics for the column values
    val keyStatsRDD = keyValueRDD.
             groupByKey().
             mapValues(list => StatCounter(list))

    // convert rowkey, stats to put and write to hbase table stats column family
    keyStatsRDD.map { case (k, v) => convertToPut(k, v) }.saveAsHadoopDataset(jobConfig)

The output from newAPIHadoopRDD is an RDD of (row key, Result) pairs. The PairRDDFunctions saveAsHadoopDataset method then saves the Put objects to HBase.
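
The convertToPut overload used for the rollup (taking a row key and a StatCounter) is also not shown in this post; a hedged sketch, assuming min, max, avg, and count columns for psi in the stats column family, could be:

val cfStatsBytes = Bytes.toBytes("stats")

// hedged sketch: write the daily psi statistics for one pump_date row key
def convertToPut(key: String, stats: StatCounter): (ImmutableBytesWritable, Put) = {
    val put = new Put(Bytes.toBytes(key))
    put.add(cfStatsBytes, Bytes.toBytes("psimin"), Bytes.toBytes(stats.min))
    put.add(cfStatsBytes, Bytes.toBytes("psimax"), Bytes.toBytes(stats.max))
    put.add(cfStatsBytes, Bytes.toBytes("psiavg"), Bytes.toBytes(stats.mean))
    put.add(cfStatsBytes, Bytes.toBytes("psicount"), Bytes.toBytes(stats.count))
    (new ImmutableBytesWritable(Bytes.toBytes(key)), put)
}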


Running the Application

You can run the code as a standalone application as described in the tutorial on Getting Started with Spark on MapR Sandbox.

Here are the steps summarized:

  1. Log into the MapR Sandbox, as explained in Getting Started with Spark on MapR Sandbox, using userid user01, password mapr.
  2. Build the application using maven.
  3. Copy the jar file and data file to your sandbox home directory /user/user01 using scp.
  4. Run the streaming app:
     /opt/mapr/spark/spark-1.3.1/bin/spark-submit --driver-class-path `hbase classpath` 
       --class examples.HBaseSensorStream sparkstreamhbaseapp-1.0.jar
    
  5. Copy the streaming data file to the stream directory:
    cp sensordata.csv /user/user01/stream/
  6. Read data and calculate stats for one column
       /opt/mapr/spark/spark-1.3.1/bin/spark-submit --driver-class-path `hbase classpath` 
        --class examples.HBaseReadWrite sparkstreamhbaseapp-1.0.jar
    
  7. Calculate stats for whole row
      /opt/mapr/spark/spark-1.3.1/bin/spark-submit --driver-class-path `hbase classpath` 
       --class examples.HBaseReadRowWriteStats sparkstreamhbaseapp-1.0.jar
Categories: #oracle_Emp, big data

An In-Depth Look at the HBase Architecture

January 25, 2016

HBase Architectural Components

Physically, HBase is composed of three types of servers in a master-slave type of architecture. Region Servers serve data for reads and writes. When accessing data, clients communicate with HBase RegionServers directly. Region assignment and DDL (create, delete table) operations are handled by the HBase Master process. ZooKeeper, which is part of the Hadoop ecosystem, maintains the live cluster state.

The Hadoop DataNode stores the data that the Region Server is managing. All HBase data is stored in HDFS files. Region Servers are collocated with the HDFS DataNodes, which enables data locality (putting the data close to where it is needed) for the data served by the RegionServers. HBase data is local when it is written, but when a region is moved, it is not local until compaction.

The NameNode maintains metadata information for all the physical data blocks that comprise the files.

Regions

HBase Tables are divided horizontally by row key range into “Regions.” A region contains all rows in the table between the region’s start key and end key. Regions are assigned to the nodes in the cluster, called “Region Servers,” and these serve data for reads and writes. A region server can serve about 1,000 regions.

HBase HMaster

Region assignment and DDL (create, delete table) operations are handled by the HBase Master.

A master is responsible for:

  • Coordinating the region servers
    • Assigning regions on startup; re-assigning regions for recovery or load balancing
    • Monitoring all RegionServer instances in the cluster (listening for notifications from ZooKeeper)
  • Admin functions
    • Interface for creating, deleting, and updating tables

ZooKeeper: The Coordinator

HBase uses ZooKeeper as a distributed coordination service to maintain server state in the cluster. ZooKeeper tracks which servers are alive and available, and provides server failure notification. ZooKeeper uses consensus to guarantee common shared state; note that an odd number of ZooKeeper servers (typically three or five) is used for consensus.

How the Components Work Together

ZooKeeper is used to coordinate shared state information for members of distributed systems. Region Servers and the active HMaster connect to ZooKeeper with a session. ZooKeeper maintains ephemeral nodes for active sessions via heartbeats.

Each Region Server creates an ephemeral node. The HMaster monitors these nodes to discover available region servers, and it also monitors these nodes for server failures. HMasters vie to create an ephemeral node; ZooKeeper grants it to the first one and uses it to make sure that only one master is active. The active HMaster sends heartbeats to ZooKeeper, and the inactive HMaster listens for notifications of the active HMaster’s failure.

If a region server or the active HMaster fails to send a heartbeat, the session expires and the corresponding ephemeral node is deleted. Listeners for updates will be notified of the deleted nodes. The active HMaster listens for region server failures and will recover region servers on failure. The inactive HMaster listens for active HMaster failure, and if the active HMaster fails, the inactive HMaster becomes active.

HBase First Read or Write

There is a special HBase Catalog table called the META table, which holds the location of the regions in the cluster. ZooKeeper stores the location of the META table.

This is what happens the first time a client reads or writes to HBase:

  1. The client gets the Region server that hosts the META table from ZooKeeper.
  2. The client will query the .META. server to get the region server corresponding to the row key it wants to access. The client caches this information along with the META table location.
  3. It will get the Row from the corresponding Region Server.

For future reads, the client uses the cache to retrieve the META location and previously read row keys. Over time, it does not need to query the META table, unless there is a miss because a region has moved; then it will re-query and update the cache.
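
As a hedged illustration (the table name and row key below are assumptions), the META lookup and caching described above happen transparently inside the HBase client library, so a read is just a Get:

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.{Get, HTable}
import org.apache.hadoop.hbase.util.Bytes

val conf = HBaseConfiguration.create()
val table = new HTable(conf, "sensor")                  // table name is an assumption
val get = new Get(Bytes.toBytes("pump1_3/14/14 1:01"))  // hypothetical row key
val result = table.get(get)                             // META and region locations are cached after the first lookup
val psi = Bytes.toDouble(result.getValue(Bytes.toBytes("data"), Bytes.toBytes("psi")))
table.close()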

HBase Meta Table

  • The META table is an HBase table that keeps a list of all regions in the system.
  • The .META. table is like a B-tree.
  • The .META. table structure is as follows:
    • Key: region start key, region id
    • Values: RegionServer

Region Server Components

A Region Server runs on an HDFS data node and has the following components:

  • WAL: Write Ahead Log is a file on the distributed file system. The WAL is used to store new data that hasn’t yet been persisted to permanent storage; it is used for recovery in the case of failure.
  • BlockCache: is the read cache. It stores frequently read data in memory. Least Recently Used data is evicted when full.
  • MemStore: is the write cache. It stores new data which has not yet been written to disk. It is sorted before writing to disk. There is one MemStore per column family per region.
  • HFiles store the rows as sorted KeyValues on disk.

HBase Write Steps (1)

When the client issues a Put request, the first step is to write the data to the write-ahead log, the WAL:

  • Edits are appended to the end of the WAL file that is stored on disk.
  • The WAL is used to recover not-yet-persisted data in case a server crashes.

HBase Write Steps (2)

Once the data is written to the WAL, it is placed in the MemStore. Then, the put request acknowledgement returns to the client.

HBase MemStore

The MemStore stores updates in memory as sorted KeyValues, the same as it would be stored in an HFile. There is one MemStore per column family. The updates are sorted per column family.

HBase Region Flush

When the MemStore accumulates enough data, the entire sorted set is written to a new HFile in HDFS. HBase uses multiple HFiles per column family, which contain the actual cells, or KeyValue instances. These files are created over time as KeyValue edits sorted in the MemStores are flushed as files to disk.

Note that this is one reason why there is a limit to the number of column families in HBase: there is one MemStore per CF, and when one is full, they all flush. The flush also saves the last written sequence number, so the system knows what has been persisted so far.

The highest sequence number is stored as a meta field in each HFile, to reflect where persisting has ended and where to continue. On region startup, the sequence number is read, and the highest is used as the sequence number for new edits.

HBase HFile

Data is stored in an HFile which contains sorted key/values. When the MemStore accumulates enough data, the entire sorted KeyValue set is written to a new HFile in HDFS. This is a sequential write. It is very fast, as it avoids moving the disk drive head.

HBase HFile Structure

An HFile contains a multi-layered index which allows HBase to seek to the data without having to read the whole file. The multi-level index is like a b+tree:

  • Key value pairs are stored in increasing order
  • Indexes point by row key to the key value data in 64KB “blocks”
  • Each block has its own leaf-index
  • The last key of each block is put in the intermediate index
  • The root index points to the intermediate index

The trailer points to the meta blocks, and is written at the end of persisting the data to the file. The trailer also has information like bloom filters and time range info. Bloom filters help to skip files that do not contain a certain row key. The time range info is useful for skipping the file if it is not in the time range the read is looking for.

HFile Index

The index, which we just discussed, is loaded when the HFile is opened and kept in memory. This allows lookups to be performed with a single disk seek.

HBase Read Merge

We have seen that the KeyValue cells corresponding to one row can be in multiple places: row cells already persisted are in HFiles, recently updated cells are in the MemStore, and recently read cells are in the BlockCache. So when you read a row, how does the system get the corresponding cells to return? A read merges KeyValues from the BlockCache, MemStore, and HFiles in the following steps:

  1. First, the scanner looks for the Row cells in the Block cache – the read cache. Recently Read Key Values are cached here, and Least Recently Used are evicted when memory is needed.
  2. Next, the scanner looks in the MemStore, the write cache in memory containing the most recent writes.
  3. If the scanner does not find all of the row cells in the MemStore and Block Cache, then HBase will use the Block Cache indexes and bloom filters to load HFiles into memory, which may contain the target row cells.

HBase Read Merge

As discussed earlier, there may be many HFiles per column family (one per MemStore flush), which means that for a read, multiple files may have to be examined, which can affect performance. This is called read amplification.

HBase Minor Compaction

HBase will automatically pick some smaller HFiles and rewrite them into fewer, bigger HFiles. This process is called minor compaction. Minor compaction reduces the number of storage files by rewriting smaller files into fewer but larger ones, performing a merge sort.

HBase Major Compaction

Major compaction merges and rewrites all the HFiles in a region to one HFile per column family, and in the process, drops deleted or expired cells. This improves read performance; however, since major compaction rewrites all of the files, lots of disk I/O and network traffic might occur during the process. This is called write amplification.

Major compactions can be scheduled to run automatically. Due to write amplification, major compactions are usually scheduled for weekends or evenings. Note that MapR-DB has made improvements and does not need to do compactions. A major compaction also makes any data files that were remote, due to server failure or load balancing, local to the region server.

Region = Contiguous Keys

Let’s do a quick review of regions:

  • A table can be divided horizontally into one or more regions. A region contains a contiguous, sorted range of rows between a start key and an end key
  • Each region is 1GB in size (default)
  • A region of a table is served to the client by a RegionServer
  • A region server can serve about 1,000 regions (which may belong to the same table or different tables)

Region Split

Initially there is one region per table. When a region grows too large, it splits into two child regions. Both child regions, each representing one half of the original region, are opened in parallel on the same Region Server, and then the split is reported to the HMaster. For load balancing reasons, the HMaster may schedule the new regions to be moved off to other servers.

Read Load Balancing

Splitting happens initially on the same region server, but for load balancing reasons, the HMaster may schedule the new regions to be moved off to other servers. This results in the new Region Server serving data from a remote HDFS node until a major compaction moves the data files to the Region Server’s local node. HBase data is local when it is written, but when a region is moved (for load balancing or recovery), it is not local until major compaction.

HDFS Data Replication

All writes and reads go to/from the primary node. HDFS replicates the WAL and HFile blocks. HFile block replication happens automatically. HBase relies on HDFS to provide data safety as it stores its files. When data is written to HDFS, one copy is written locally, then it is replicated to a secondary node, and a third copy is written to a tertiary node.

HDFS Data Replication (2)

The WAL file and the HFiles are persisted on disk and replicated, so how does HBase recover the MemStore updates that were not persisted to HFiles? See the next section for the answer.

HBase Crash Recovery

When a RegionServer fails, the crashed regions are unavailable until detection and recovery steps have happened. ZooKeeper determines node failure when it loses the region server’s heartbeats. The HMaster will then be notified that the Region Server has failed.

When the HMaster detects that a region server has crashed, it reassigns the regions from the crashed server to active Region Servers. In order to recover the crashed region server’s MemStore edits that were not flushed to disk, the HMaster splits the WAL belonging to the crashed region server into separate files and stores these files on the new region servers’ data nodes. Each Region Server then replays the edits from its respective split WAL to rebuild the MemStore for those regions.

Data Recovery

WAL files contain a list of edits, with one edit representing a single put or delete. Edits are written chronologically, so, for persistence, additions are appended to the end of the WAL file that is stored on disk.

What happens if there is a failure when the data is still in memory and not persisted to an HFile? The WAL is replayed. Replaying a WAL is done by reading the WAL and adding and sorting the contained edits into the current MemStore. At the end, the MemStore is flushed to write the changes to an HFile.

Apache HBase Architecture Benefits

HBase provides the following benefits:

  • Strong consistency model
    • When a write returns, all readers will see the same value
  • Scales automatically
    • Regions split when data grows too large
    • Uses HDFS to spread and replicate data
  • Built-in recovery
    • Using the Write Ahead Log (similar to journaling on a file system)
  • Integrated with Hadoop
    • MapReduce on HBase is straightforward

Apache HBase Has Problems Too…

  • Business continuity reliability:
    • Slow WAL replay
    • Slow, complex crash recovery
    • Major compaction I/O storms

MapR-DB with MapR-FS does not have these problems

The diagram below compares the application stacks for Apache HBase on top of HDFS on the left, Apache HBase on top of MapR’s read/write file system MapR-FS in the middle, and MapR-DB and MapR-FS in a Unified Storage Layer on the right.

MapR-DB exposes the same HBase API, and the data model for MapR-DB is the same as for Apache HBase. However, the MapR-DB implementation integrates table storage into the MapR file system, eliminating all JVM layers and interacting directly with disks for both file and table storage.

MapR-DB offers many benefits over HBase, while maintaining the virtues of the HBase API and the idea of data being sorted according to primary key. MapR-DB provides operational benefits such as no compaction delays and automated region splits that do not impact the performance of the database. The tables in MapR-DB can also be isolated to certain machines in a cluster by utilizing the topology feature of MapR. The final differentiator is that MapR-DB is just plain fast, due primarily to the fact that it is tightly integrated into the MapR file system itself, rather than being layered on top of a distributed file system that is layered on top of a conventional file system.

Key differences between MapR-DB and Apache HBase

  • Tables part of the MapR Read/Write File system
    • Guaranteed data locality
  • Smarter load balancing
    • Uses container Replicas
  • Smarter fail over
    • Uses container replicas
  • Multiple small WALs
    • Faster recovery
  • Memstore Flushes Merged into Read/Write File System
    • No compaction!

You can take this free On Demand training to learn more about MapR-FS and MapR-DB 

In this blog post, you learned more about the HBase architecture and its main benefits compared to other NoSQL data store solutions. If you have any questions about HBase, please ask them in the comments section below.

Categories: #oracle_Emp, big data

Comparing SQL Functions and Performance with Apache Spark and Apache Drill

January 25, 2016

SQL engines for Hadoop differ in their approach and functionality. My focus for this blog post is to compare and contrast the functions and performance of Apache Spark and Apache Drill and discuss their expected use cases.

Running queries and analysis on structured databases is a standard operation and has been in place for decades. This typically involves developing a structure and schema which gets implemented in a database.

Although the data handled by companies is increasing every year, the pace of this increase is of a completely different magnitude. According to a recent investigation, there are presently 3.5 ZB (zettabytes) of structured data, with 1.4 times more semi-structured and unstructured data, totaling about 5 ZB. By 2020, however, it is predicted that there will be roughly 6 times more semi-structured and unstructured data than structured data, at 30 ZB and 5 ZB, respectively.1

Figuring out how to analyze this growing amount of data is a key objective for business and data analysts. This is because, along with the multiplicity of data types contained in unstructured data, the data’s actual meanings (namely the words contained in word documents, video frames, etc.) are likely to depend on the actual semantics (context) of the things being analyzed. In addition to the semantics on which the data itself relies, there might also be a need to understand the data path included in that data as a part of the semantics. (For example, the data paths: “C:\Users\kiuchi\My Pets\whale\johnny.jpg” and “http://kiuchi.local/My Pets/cat/michelle.jpg” do not only lead to the image which they are referencing; they can also lead us to believe that I have a whale and a cat among my pets. Essentially, it may be possible to interpret data paths as their nested columns.)

In addition, the increase of information content contained in “semi-structured data” in lighter data exchange formats such as XML and JSON, which are inspired by the development of information sharing over the internet, has made the situation even more complicated. While this semi-structured data, because of its flexible schema structure, does not fit into data stores that strictly define schema beforehand, such as relational databases, compared to data such as emails or office documents, it can nonetheless clearly be said to be structured data. The problem of how to store and analyze such semi-structured data is currently one of the major issues faced by analytics professionals.

Among my readers, I suspect that not many people know the expression “Polyglot Persistence.” This concept was originally developed by Scott Leberknight and was brought into the limelight in 2011 when it was taken up by Martin Fowler, an advocate of XP (eXtreme Programming). It advocates cross-cutting access to data stored in a wide variety of formats across various data sources.2 Should it be achieved, data analysts will be able to grasp the location and format of the data that they are trying to analyze, leading to a dramatic reduction in data conversion costs.

In other words, we can say that for big data analytics, which has recently gained a lot of exposure as a buzzword, solving the problem of handling wide varieties of data remains an ongoing effort. Both Apache Drill and Apache Spark are execution engines for Hadoop that were born under these circumstances. By comparing the functions and performance of Apache Drill and Apache Spark, this article will explore their characteristics and expected use cases.

Functional Comparison

As data sources, Apache Drill and Apache Spark support semi-structured data such as CSV and JSON, as well as relational databases through JDBC. Apache Drill also supports NoSQL databases such as MongoDB and HBase.

Supported Data Sources

  • Apache Drill: CSV, JSON, Parquet, Hive, HBase, MongoDB, MapR-DB, Amazon S3
  • Apache Spark: CSV, JSON, Parquet, HiveQL (compatible), JDBC

One of the defining characteristics of both engines is that they are able to analyze a mix of structured and semi-structured data sources. This sets them apart from traditional SQL engines. In addition, they also have means to access local files, remote data sources that support standard database connection protocols, and data sources on the internet that require connection methods other than JDBC. Finally, it should also be mentioned that both have “plugin” features, meaning that additional data sources can be added.3

Clearly, these designs presuppose a wide variety of data sources and cross-cutting data analysis. They are good representatives of next-generation SQL engines operating in a big data environment with mixed data sources, taking “Polyglot Persistence” into account.

Performance Comparison and Discussion

Let me now compare the query performance of the two technologies. Here, we will use the “MovieLens” data set, which consists of a ratings database from movie viewers collected by a research project at the University of Minnesota. Performance will be measured by running three types of SQL queries, including JOIN access.

The program used here is available on GitHub (m-kiuchi/MovieLensSQL).
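
For illustration only (this is not the benchmark program itself, and the file names, schema, and column names below are assumptions), a JOIN-style query over the MovieLens ratings and movies files could be expressed in Spark 1.x Scala roughly as follows:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

case class Rating(userId: Int, movieId: Int, rating: Double)
case class Movie(movieId: Int, title: String)

val sc = new SparkContext(new SparkConf().setAppName("MovieLensSQL"))
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._

// hypothetical file names; header rows and quoted titles are not handled here
val ratings = sc.textFile("ratings.csv").map(_.split(",")).map(r => Rating(r(0).toInt, r(1).toInt, r(2).toDouble)).toDF()
val movies = sc.textFile("movies.csv").map(_.split(",")).map(m => Movie(m(0).toInt, m(1))).toDF()
ratings.registerTempTable("ratings")
movies.registerTempTable("movies")

// average rating per title, joining the two tables
sqlContext.sql(
  """SELECT m.title, AVG(r.rating) AS avg_rating
     FROM ratings r JOIN movies m ON r.movieId = m.movieId
     GROUP BY m.title ORDER BY avg_rating DESC LIMIT 10""").show()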

The results below are those that were obtained on my PC (2 cores, 6GB memory).

Where did the differences in results come from? I believe they originated from the differences in the ideas behind Apache Spark and Apache Drill. Since Apache Drill was developed to scan and filter data sources, it likely avoids any extra processing: as data is scanned, it is sent to the next step of pipeline processing, with the focus on obtaining the final results in the shortest time possible. Since Drill assumes that query processing completes in a short time, the whole query is simply re-run if a failure occurs during execution. On the other hand, data source scanning in Spark is only the preliminary step for parallel processing. Since reusability is the goal, partitioning and staging take place at each step, as in MapReduce. This allows more flexible control of the data flows in the program, and in the case of a failure during a long batch job, processing can be restarted from the intermediate data, giving increased fault tolerance. This is the likely cause of the differences in performance. Of course, if sufficient machine resources are available, or if we consider the processing time of the entire workflow, the observed differences in processing time might end up falling within the margin of error.

The intent here is not to determine whether Apache Drill or Apache Spark is better based on the above results. What is important is the fact that both of these engines are able to execute queries on, and transparently capture data from, a wide variety of data sources, with Spark being able to conduct more varied processing than SQL queries. What is clear is that what these two execution engines can achieve goes beyond the scope of traditional relational databases.

2 There is a similar concept called “Data Virtualization”
Categories: big data

Oracle Database 11g Release 2 (11.2.0.3.0) RAC On Oracle Linux 6.3 Using VirtualBox

January 1, 2016

This article describes the installation of Oracle Database 11g release 2 (11.2.0.3 64-bit) RAC on Linux (Oracle Linux 6.3 64-bit) using VirtualBox (4.2.6) with no additional shared disk devices.

  • Introduction
  • Download Software
  • VirtualBox Installation
  • Virtual Machine Setup
  • Guest Operating System Installation
  • Oracle Installation Prerequisites
    • Automatic Setup
    • Manual Setup
    • Additional Setup
  • Install Guest Additions
  • Create Shared Disks
  • Clone the Virtual Machine
  • Install the Grid Infrastructure
  • Install the Database
  • Check the Status of the RAC

 

Introduction

One of the biggest obstacles preventing people from setting up test RAC environments is the requirement for shared storage. In a production environment, shared storage is often provided by a SAN or high-end NAS device, but both of these options are very expensive when all you want to do is get some experience installing and using RAC. A cheaper alternative is to use a FireWire disk enclosure to allow two machines to access the same disk(s), but that still costs money and requires two servers. A third option is to use virtualization to fake the shared storage.

Using VirtualBox you can run multiple Virtual Machines (VMs) on a single server, allowing you to run both RAC nodes on a single machine. In addition, it allows you to set up shared virtual disks, overcoming the obstacle of expensive shared storage.

Virtual RAC

Before you launch into this installation, here are a few things to consider.

  • The finished system includes the host operating system, two guest operating systems, two sets of Oracle Grid Infrastructure (Clusterware + ASM) and two Database instances all on a single server. As you can imagine, this requires a significant amount of disk space, CPU and memory.
  • Following on from the last point, the VMs will each need at least 3G of RAM, preferably 4G if you don’t want the VMs to swap like crazy. As you can see, 11gR2 RAC requires much more memory than 11gR1 RAC. Don’t assume you will be able to run this on a small PC or laptop. You won’t.
  • This procedure provides a bare bones installation to get the RAC working. There is no redundancy in the Grid Infrastructure installation or the ASM installation. To add this, simply create double the amount of shared disks and select the “Normal” redundancy option when it is offered. Of course, this will take more disk space.
  • During the virtual disk creation, I always choose not to preallocate the disk space. This makes virtual disk access slower during the installation, but saves on wasted disk space. The shared disks must have their space preallocated.
  • This is not, and should not be considered, a production-ready system. It’s simply to allow you to get used to installing and using RAC.
  • The Single Client Access Name (SCAN) should be defined in the DNS or GNS and round-robin between 3 addresses, which are on the same subnet as the public and virtual IPs. Prior to 11.2.0.2 it could be defined as a single IP address in the “/etc/hosts” file, which is wrong and will cause the cluster verification to fail, but it allowed you to complete the install without the presence of a DNS. This does not seem to work from 11.2.0.2 onward.
  • The virtual machines can be limited to 2Gig of swap, which causes a prerequisite check failure, but doesn’t prevent the installation working. If you want to avoid this, define 3+Gig of swap.
  • This article uses the 64-bit versions of Oracle Linux and Oracle 11g Release 2.
  • When doing this installation on my server, I split the virtual disks on to different physical disks (“/u02”, “/u03”, “/u04”). This is not necessary, but makes things run a bit faster.

Download Software

Download the following software.

VirtualBox Installation

First, install the VirtualBox software. On RHEL and its clones you do this with the following type of command as the root user.

# rpm -Uvh VirtualBox-4.2-4.2.6_82870_fedora17-1.x86_64.rpm

The package name will vary depending on the host distribution you are using. Once complete, VirtualBox is started from the “Applications > System Tools > Oracle VM VirtualBox” menu option.

Virtual Machine Setup

Now we must define the two virtual RAC nodes. We can save time by defining one VM, then cloning it when it is installed.

Start VirtualBox and click the “New” button on the toolbar. Enter the name “ol6-112-rac1”, OS “Linux” and Version “Oracle (64 bit)”, then click the “Next” button.

New VM Wizard - Name and Operating System

Enter “4096” as the base memory size, then click the “Next” button.

New VM Wizard - Memory Size

Accept the default option to create a new virtual hard disk by clicking the “Create” button.

New VM Wizard - Hard Drive

Accept the default hard drive file type by clicking the “Next” button.

Create Virtual Hard Drive - Hard Drive File Type

Accept the “Dynamically allocated” option by clicking the “Next” button.

Create Virtual Hard Drive - Storage on Physical Hard Drive

Accept the default location and set the size to “30G”, then click the “Create” button. If you can spread the virtual disks onto different physical disks, that will improve performance.

Create Virtual Hard Drive - File Location And Size

The “ol6-112-rac1” VM will appear on the left hand pane. Scroll down the “Details” tab on the right and click on the “Network” link.

VirtualBox - Console

Make sure “Adapter 1” is enabled, set to “Bridged Adapter”, then click on the “Adapter 2” tab.

VirtualBox - Network Adapter 1

Make sure “Adapter 2” is enabled, set to “Bridged Adapter” or “Internal Network”, then click on the “System” section.

VirtualBox - Network Adapter 2

Move “Hard Disk” to the top of the boot order and uncheck the “Floppy” option, then click the “OK” button.

VirtualBox - System Settings

The virtual machine is now configured so we can start the guest operating system installation.

Guest Operating System Installation

With the new VM highlighted, click the “Start” button on the toolbar. On the “Select start-up disk” screen, choose the relevant Oracle Linux ISO image and click the “Start” button.

VirtualBox - Select start-up disk

The resulting console window will contain the Oracle Linux boot screen.

Oracle Linux Boot

Continue through the Oracle Linux 6 installation as you would for a basic server. A general pictorial guide to the installation can be found here. More specifically, it should be a server installation with a minimum of 4G+ swap, firewall disabled, SELinux set to permissive and the following package groups installed:

  • Base System > Base
  • Base System > Client management tools
  • Base System > Compatibility libraries
  • Base System > Hardware monitoring utilities
  • Base System > Large Systems Performance
  • Base System > Network file system client
  • Base System > Performance Tools
  • Base System > Perl Support
  • Servers > Server Platform
  • Servers > System administration tools
  • Desktops > Desktop
  • Desktops > Desktop Platform
  • Desktops > Fonts
  • Desktops > General Purpose Desktop
  • Desktops > Graphical Administration Tools
  • Desktops > Input Methods
  • Desktops > X Window System
  • Applications > Internet Browser
  • Development > Additional Development
  • Development > Development Tools

To be consistent with the rest of the article, the following information should be set during the installation:

  • hostname: ol6-112-rac1.localdomain
  • IP Address eth0: 192.168.0.111 (public address)
  • Default Gateway eth0: 192.168.0.1 (public address)
  • IP Address eth1: 192.168.1.111 (private address)
  • Default Gateway eth1: none

You are free to change the IP addresses to suit your network, but remember to stay consistent with those adjustments throughout the rest of the article.

Oracle Installation Prerequisites

Perform either the Automatic Setup or the Manual Setup to complete the basic prerequisites. The Additional Setup is required for all installations.

Automatic Setup

If you plan to use the “oracle-rdbms-server-11gR2-preinstall” package to perform all your prerequisite setup, follow the instructions at http://public-yum.oracle.com to setup the yum repository for OL, then perform the following command.

# yum install oracle-rdbms-server-11gR2-preinstall

All necessary prerequisites will be performed automatically.

It is probably worth doing a full update as well, but this is not strictly speaking necessary.

# yum update

Manual Setup

If you have not used the “oracle-rdbms-server-11gR2-preinstall” package to perform all prerequisites, you will need to manually perform the following setup tasks.

In addition to the basic OS installation, the following packages must be installed whilst logged in as the root user. This includes the 64-bit and 32-bit versions of some packages. The commented out packages are those already installed if you have followed the suggested package selection.

# From Oracle Linux 6 DVD
cd /media/cdrom/Server/Packages
#rpm -Uvh binutils-2.*
#rpm -Uvh compat-libstdc++-33*
#rpm -Uvh elfutils-libelf-0.*
#rpm -Uvh libaio-0.*
rpm -Uvh libaio-devel-0.*
#rpm -Uvh sysstat-9.*
#rpm -Uvh glibc-2.*
#rpm -Uvh glibc-common-2.*
#rpm -Uvh glibc-devel-2.* glibc-headers-2.*
rpm -Uvh ksh-2*
#rpm -Uvh make-3.*
#rpm -Uvh libgcc-4.*
#rpm -Uvh libstdc++-4.*
rpm -Uvh libstdc++-4.*.i686*
#rpm -Uvh libstdc++-devel-4.*
#rpm -Uvh gcc-4.*x86_64*
#rpm -Uvh gcc-c++-4.*x86_64*
#rpm -Uvh --allfiles elfutils-libelf-0*x86_64* elfutils-libelf-devel-0*x86_64*
rpm -Uvh elfutils-libelf-0*i686* elfutils-libelf-devel-0*i686*
rpm -Uvh libtool-ltdl*i686*
rpm -Uvh ncurses*i686*
rpm -Uvh readline*i686*
rpm -Uvh unixODBC*
cd /
eject

Add or amend the following lines to the “/etc/sysctl.conf” file.

fs.aio-max-nr = 1048576
fs.file-max = 6815744
#kernel.shmall = 2097152 
#kernel.shmmax = 1054504960
kernel.shmmni = 4096
# semaphores: semmsl, semmns, semopm, semmni
kernel.sem = 250 32000 100 128
net.ipv4.ip_local_port_range = 9000 65500
net.core.rmem_default=262144
net.core.rmem_max=4194304
net.core.wmem_default=262144
net.core.wmem_max=1048586

Run the following command to change the current kernel parameters.

/sbin/sysctl -p

Add the following lines to the “/etc/security/limits.conf” file.

oracle              soft    nproc   2047
oracle              hard    nproc   16384
oracle              soft    nofile  4096
oracle              hard    nofile  65536
oracle              soft    stack   10240

Add the following line to the “/etc/pam.d/login” file, if it does not already exist.

session    required     pam_limits.so

Create the new groups and users.

groupadd -g 1000 oinstall
groupadd -g 1200 dba
useradd -u 1100 -g oinstall -G dba oracle
passwd oracle

Additional Setup

Perform the following steps whilst logged into the “ol6-112-rac1” virtual machine as the root user.

Set the password for the “oracle” user.

passwd oracle

Install the following package from the Oracle grid media after you’ve defined groups.

cd /your/path/to/grid/rpm
rpm -Uvh cvuqdisk*

If you are not using DNS, the “/etc/hosts” file must contain the following information.

127.0.0.1       localhost.localdomain   localhost
# Public
192.168.0.111   ol6-112-rac1.localdomain        ol6-112-rac1
192.168.0.112   ol6-112-rac2.localdomain        ol6-112-rac2
# Private
192.168.1.111   ol6-112-rac1-priv.localdomain   ol6-112-rac1-priv
192.168.1.112   ol6-112-rac2-priv.localdomain   ol6-112-rac2-priv
# Virtual
192.168.0.113   ol6-112-rac1-vip.localdomain    ol6-112-rac1-vip
192.168.0.114   ol6-112-rac2-vip.localdomain    ol6-112-rac2-vip
# SCAN
192.168.0.115   ol6-112-scan.localdomain ol6-112-scan
192.168.0.116   ol6-112-scan.localdomain ol6-112-scan
192.168.0.117   ol6-112-scan.localdomain ol6-112-scan

Even with the SCAN address defined in the hosts file, it still needs to be defined on the DNS to round-robin between 3 addresses on the same subnet as the public IPs. The DNS configuration is described here. Having said that, I normally include everything except the SCAN entries when using DNS.
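
A hedged example of what the SCAN entries might look like in a BIND zone file for the “localdomain” zone (the name and addresses follow the “/etc/hosts” example above; your DNS setup may differ):

; round-robin SCAN entries
ol6-112-scan   IN   A   192.168.0.115
ol6-112-scan   IN   A   192.168.0.116
ol6-112-scan   IN   A   192.168.0.117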

Amend the “/etc/security/limits.d/90-nproc.conf” file as described below. See MOS Note [ID 1487773.1]

# Change this
*          soft    nproc    1024

# To this
* - nproc 16384

Change the setting of SELinux to permissive by editing the “/etc/selinux/config” file, making sure the SELINUX flag is set as follows.

SELINUX=permissive

If you have the Linux firewall enabled, you will need to disable or configure it, as shown here or here. The following is an example of disabling the firewall.

# service iptables stop
# chkconfig iptables off

Either configure NTP, or make sure it is not configured so the Oracle Cluster Time Synchronization Service (ctssd) can synchronize the times of the RAC nodes. If you want to deconfigure NTP do the following.

# service ntpd stop
Shutting down ntpd:                                        [  OK  ]
# chkconfig ntpd off
# mv /etc/ntp.conf /etc/ntp.conf.orig
# rm /var/run/ntpd.pid

If you want to use NTP, you must add the “-x” option into the following line in the “/etc/sysconfig/ntpd” file.

OPTIONS="-x -u ntp:ntp -p /var/run/ntpd.pid"

Then restart NTP.

# service ntpd restart

Create the directories in which the Oracle software will be installed.

mkdir -p  /u01/app/11.2.0.3/grid
mkdir -p /u01/app/oracle/product/11.2.0.3/db_1
chown -R oracle:oinstall /u01
chmod -R 775 /u01/

Log in as the “oracle” user and add the following lines at the end of the “/home/oracle/.bash_profile” file.

# Oracle Settings
TMP=/tmp; export TMP
TMPDIR=$TMP; export TMPDIR

ORACLE_HOSTNAME=ol6-112-rac1.localdomain; export ORACLE_HOSTNAME
ORACLE_UNQNAME=RAC; export ORACLE_UNQNAME
ORACLE_BASE=/u01/app/oracle; export ORACLE_BASE
GRID_HOME=/u01/app/11.2.0.3/grid; export GRID_HOME
DB_HOME=$ORACLE_BASE/product/11.2.0.3/db_1; export DB_HOME
ORACLE_HOME=$DB_HOME; export ORACLE_HOME
ORACLE_SID=RAC1; export ORACLE_SID
ORACLE_TERM=xterm; export ORACLE_TERM
BASE_PATH=/usr/sbin:$PATH; export BASE_PATH
PATH=$ORACLE_HOME/bin:$BASE_PATH; export PATH

LD_LIBRARY_PATH=$ORACLE_HOME/lib:/lib:/usr/lib; export LD_LIBRARY_PATH
CLASSPATH=$ORACLE_HOME/JRE:$ORACLE_HOME/jlib:$ORACLE_HOME/rdbms/jlib; export CLASSPATH

if [ $USER = "oracle" ]; then
  if [ $SHELL = "/bin/ksh" ]; then
    ulimit -p 16384
    ulimit -n 65536
  else
    ulimit -u 16384 -n 65536
  fi
fi

alias grid_env='. /home/oracle/grid_env'
alias db_env='. /home/oracle/db_env'

Create a file called “/home/oracle/grid_env” with the following contents.

ORACLE_SID=+ASM1; export ORACLE_SID
ORACLE_HOME=$GRID_HOME; export ORACLE_HOME
PATH=$ORACLE_HOME/bin:$BASE_PATH; export PATH

LD_LIBRARY_PATH=$ORACLE_HOME/lib:/lib:/usr/lib; export LD_LIBRARY_PATH
CLASSPATH=$ORACLE_HOME/JRE:$ORACLE_HOME/jlib:$ORACLE_HOME/rdbms/jlib; export CLASSPATH

Create a file called “/home/oracle/db_env” with the following contents.

ORACLE_SID=RAC1; export ORACLE_SID
ORACLE_HOME=$DB_HOME; export ORACLE_HOME
PATH=$ORACLE_HOME/bin:$BASE_PATH; export PATH

LD_LIBRARY_PATH=$ORACLE_HOME/lib:/lib:/usr/lib; export LD_LIBRARY_PATH
CLASSPATH=$ORACLE_HOME/JRE:$ORACLE_HOME/jlib:$ORACLE_HOME/rdbms/jlib; export CLASSPATH

Once the “/home/oracle/.bash_profile” has been run, you will be able to switch between environments as follows.

$ grid_env
$ echo $ORACLE_HOME
/u01/app/11.2.0.3/grid
$ db_env
$ echo $ORACLE_HOME
/u01/app/oracle/product/11.2.0.3/db_1
$

We’ve made a lot of changes, so it’s worth doing a reboot of the VM at this point to make sure all the changes have taken effect.

# shutdown -r now

Install Guest Additions

Click on the “Devices > Install Guest Additions” menu option at the top of the VM screen. If you get the option to auto-run take it. If not, then run the following commands.

cd /media/VBOXADDITIONS_4.2.6_82870
sh ./VBoxLinuxAdditions.run

The VM will need to be restarted for the additions to be used properly. The next section requires a shutdown so no additional restart is needed at this time.

Create Shared Disks

Shut down the “ol6-112-rac1” virtual machine using the following command.

# shutdown -h now

On the host server, create 4 sharable virtual disks and associate them as virtual media using the following commands. You can pick a different location, but make sure they are outside the existing VM directory.

$ mkdir -p /u04/VirtualBox/ol6-112-rac
$ cd /u04/VirtualBox/ol6-112-rac
$
$ # Create the disks and associate them with VirtualBox as virtual media.
$ VBoxManage createhd --filename asm1.vdi --size 5120 --format VDI --variant Fixed
$ VBoxManage createhd --filename asm2.vdi --size 5120 --format VDI --variant Fixed
$ VBoxManage createhd --filename asm3.vdi --size 5120 --format VDI --variant Fixed
$ VBoxManage createhd --filename asm4.vdi --size 5120 --format VDI --variant Fixed
$
$ # Connect them to the VM.
$ VBoxManage storageattach ol6-112-rac1 --storagectl "SATA" --port 1 --device 0 --type hdd \
    --medium asm1.vdi --mtype shareable
$ VBoxManage storageattach ol6-112-rac1 --storagectl "SATA" --port 2 --device 0 --type hdd \
    --medium asm2.vdi --mtype shareable
$ VBoxManage storageattach ol6-112-rac1 --storagectl "SATA" --port 3 --device 0 --type hdd \
    --medium asm3.vdi --mtype shareable
$ VBoxManage storageattach ol6-112-rac1 --storagectl "SATA" --port 4 --device 0 --type hdd \
    --medium asm4.vdi --mtype shareable
$
$ # Make shareable.
$ VBoxManage modifyhd asm1.vdi --type shareable
$ VBoxManage modifyhd asm2.vdi --type shareable
$ VBoxManage modifyhd asm3.vdi --type shareable
$ VBoxManage modifyhd asm4.vdi --type shareable

Start the “ol6-112-rac1” virtual machine by clicking the “Start” button on the toolbar. When the server has started, log in as the root user so you can configure the shared disks. The current disks can be seen by issuing the following commands.

# cd /dev
# ls sd*
sda  sda1  sda2  sdb  sdc  sdd  sde
#

Use the “fdisk” command to partition the disks sdb to sde. The following output shows the expected fdisk output for the sdb disk.

# fdisk /dev/sdb
Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel
Building a new DOS disklabel with disk identifier 0x62be91cf.
Changes will remain in memory only, until you decide to write them.
After that, of course, the previous content won't be recoverable.

Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)

WARNING: DOS-compatible mode is deprecated. It's strongly recommended to
         switch off the mode (command 'c') and change display units to
         sectors (command 'u').

Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-652, default 1): 
Using default value 1
Last cylinder, +cylinders or +size{K,M,G} (1-652, default 652): 
Using default value 652

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.
#

In each case, the sequence of answers is “n”, “p”, “1”, “Return”, “Return” and “w”.

Once all the disks are partitioned, the results can be seen by repeating the previous “ls” command.

# cd /dev
# ls sd*
sda  sda1  sda2  sdb  sdb1  sdc  sdc1  sdd  sdd1  sde  sde1
#

Configure your UDEV rules, as shown here.

Add the following to the “/etc/scsi_id.config” file to configure SCSI devices as trusted. Create the file if it doesn’t already exist.

options=-g

The SCSI ID of my disks are displayed below.

# /sbin/scsi_id -g -u -d /dev/sdb
1ATA_VBOX_HARDDISK_VB348c4cfc-e3924169
# /sbin/scsi_id -g -u -d /dev/sdc
1ATA_VBOX_HARDDISK_VB5a922b63-bdda991a
# /sbin/scsi_id -g -u -d /dev/sdd
1ATA_VBOX_HARDDISK_VB4bcd7321-f022a60f
# /sbin/scsi_id -g -u -d /dev/sde
1ATA_VBOX_HARDDISK_VBec4843fc-6004ae11
#

Using these values, edit the “/etc/udev/rules.d/99-oracle-asmdevices.rules” file adding the following 4 entries. All parameters for a single entry must be on the same line.

KERNEL=="sd?1", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -d /dev/$parent", RESULT=="1ATA_VBOX_HARDDISK_VB348c4cfc-e3924169",
  NAME="asm-disk1", OWNER="oracle", GROUP="dba", MODE="0660"
KERNEL=="sd?1", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -d /dev/$parent", RESULT=="1ATA_VBOX_HARDDISK_VB5a922b63-bdda991a",
  NAME="asm-disk2", OWNER="oracle", GROUP="dba", MODE="0660"
KERNEL=="sd?1", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -d /dev/$parent", RESULT=="1ATA_VBOX_HARDDISK_VB4bcd7321-f022a60f",
  NAME="asm-disk3", OWNER="oracle", GROUP="dba", MODE="0660"
KERNEL=="sd?1", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -d /dev/$parent", RESULT=="1ATA_VBOX_HARDDISK_VBec4843fc-6004ae11",
  NAME="asm-disk4", OWNER="oracle", GROUP="dba", MODE="0660"

Load updated block device partition tables.

# /sbin/partprobe /dev/sdb1
# /sbin/partprobe /dev/sdc1
# /sbin/partprobe /dev/sdd1
# /sbin/partprobe /dev/sde1

Test the rules are working as expected.

# /sbin/udevadm test /block/sdb/sdb1

Reload the UDEV rules and start UDEV.

# /sbin/udevadm control --reload-rules
# /sbin/start_udev

The disks should now be visible and have the correct ownership using the following command. If they are not visible, your UDEV configuration is incorrect and must be fixed before you proceed.

# ls -al /dev/asm*
brw-rw---- 1 oracle dba 8, 17 Oct 12 14:39 /dev/asm-disk1
brw-rw---- 1 oracle dba 8, 33 Oct 12 14:38 /dev/asm-disk2
brw-rw---- 1 oracle dba 8, 49 Oct 12 14:39 /dev/asm-disk3
brw-rw---- 1 oracle dba 8, 65 Oct 12 14:39 /dev/asm-disk4
#

The shared disks are now configured for the grid infrastructure.

Clone the Virtual Machine

Later versions of VirtualBox allow you to clone VMs, but these also attempt to clone the shared disks, which is not what we want. Instead we must manually clone the VM.

Shut down the “ol6-112-rac1” virtual machine using the following command.

# shutdown -h now

Manually clone the “ol6-112-rac1.vdi” disk using the following commands on the host server.

$ mkdir -p /u03/VirtualBox/ol6-112-rac2
$ VBoxManage clonehd /u01/VirtualBox/ol6-112-rac1/ol6-112-rac1.vdi /u03/VirtualBox/ol6-112-rac2/ol6-112-rac2.vdi

Create the “ol6-112-rac2” virtual machine in VirtualBox in the same way as you did for “ol6-112-rac1”, with the exception of using an existing “ol6-112-rac2.vdi” virtual hard drive.

New VM Wizard - Hard Drive

Remember to add the second network adapter as you did on the “ol6-112-rac1” VM. When the VM is created, attach the shared disks to this VM.

$ cd /u04/VirtualBox/ol6-112-rac
$
$ VBoxManage storageattach ol6-112-rac2 --storagectl "SATA" --port 1 --device 0 --type hdd \
    --medium asm1.vdi --mtype shareable
$ VBoxManage storageattach ol6-112-rac2 --storagectl "SATA" --port 2 --device 0 --type hdd \
    --medium asm2.vdi --mtype shareable
$ VBoxManage storageattach ol6-112-rac2 --storagectl "SATA" --port 3 --device 0 --type hdd \
    --medium asm3.vdi --mtype shareable
$ VBoxManage storageattach ol6-112-rac2 --storagectl "SATA" --port 4 --device 0 --type hdd \
    --medium asm4.vdi --mtype shareable

Start the “ol6-112-rac2” virtual machine by clicking the “Start” button on the toolbar. Ignore any network errors during the startup.

Log in to the “ol6-112-rac2” virtual machine as the “root” user so we can reconfigure the network settings to match the following.

  • hostname: ol6-112-rac2.localdomain
  • IP Address eth0: 192.168.0.112 (public address)
  • Default Gateway eth0: 192.168.0.1 (public address)
  • IP Address eth1: 192.168.1.112 (private address)
  • Default Gateway eth1: none

Amend the hostname in the “/etc/sysconfig/network” file.

NETWORKING=yes
HOSTNAME=ol6-112-rac2.localdomain
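
The change in this file only takes effect after a restart. If you want the running session to reflect the new name immediately, you can also set it by hand; this is optional, since the VM is restarted later anyway.

# hostname ol6-112-rac2.localdomain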

Check the MAC address of each of the available network connections. Don’t worry that they are listed as “eth2” and “eth3”. These are dynamically created connections because the MAC addresses of the “eth0” and “eth1” connections are incorrect.

# ifconfig -a | grep eth
eth2      Link encap:Ethernet  HWaddr 08:00:27:95:ED:33
eth3      Link encap:Ethernet  HWaddr 08:00:27:E3:DA:B6
#

Edit the “/etc/sysconfig/network-scripts/ifcfg-eth0”, amending only the IPADDR and HWADDR settings as follows and deleting the UUID entry. Note, the HWADDR value comes from the “eth2” interface displayed above.

HWADDR=08:00:27:95:ED:33
IPADDR=192.168.0.112

Edit the “/etc/sysconfig/network-scripts/ifcfg-eth1”, amending only the IPADDR and HWADDR settings as follows and deleting the UUID entry. Note, the HWADDR value comes from the “eth3” interface displayed above.

HWADDR=08:00:27:E3:DA:B6
IPADDR=192.168.1.112

If the adapter names do not reset properly, check the HWADDR in the “/etc/udev/rules.d/70-persistent-net.rules” file. If it is incorrect, amend it to match the settings described above.

Edit the “/home/oracle/.bash_profile” file on the “ol6-112-rac2” node to correct the ORACLE_SID and ORACLE_HOSTNAME values.

ORACLE_SID=RAC2; export ORACLE_SID
ORACLE_HOSTNAME=ol6-112-rac2.localdomain; export ORACLE_HOSTNAME

Also, amend the ORACLE_SID setting in the “/home/oracle/db_env” and “/home/oracle/grid_env” files.
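
If those files were created with the “ol6-112-rac1” settings, a simple substitution does the job. The sketch below assumes they contain a line of the form “ORACLE_SID=RAC1”; check the files first if in doubt.

$ sed -i 's/ORACLE_SID=RAC1/ORACLE_SID=RAC2/' /home/oracle/db_env /home/oracle/grid_env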

Restart the “ol6-112-rac2” virtual machine and start the “ol6-112-rac1” virtual machine. When both nodes have started, check they can both ping all the public and private IP addresses using the following commands.

ping -c 3 ol6-112-rac1
ping -c 3 ol6-112-rac1-priv
ping -c 3 ol6-112-rac2
ping -c 3 ol6-112-rac2-priv

At this point the virtual IP addresses defined in the “/etc/hosts” file will not work, so don’t bother testing them.

Check that the UDEV rules are working on both machines. In previous versions of OL6 the “/etc/udev/rules.d/99-oracle-asmdevices.rules” file was copied between servers during the clone without any issues. For some reason this doesn’t seem to happen on my OL6.3 installations, so you may need to repeat the UDEV configuration on the second node if the output of the following command is not consistent on both nodes; one way of fixing it is shown after the listing below.

# ls -al /dev/asm*
brw-rw----. 1 oracle dba 8, 17 Jan 12 20:16 /dev/asm-disk1
brw-rw----. 1 oracle dba 8, 33 Jan 12 20:16 /dev/asm-disk2
brw-rw----. 1 oracle dba 8, 49 Jan 12 20:16 /dev/asm-disk3
brw-rw----. 1 oracle dba 8, 65 Jan 12 20:16 /dev/asm-disk4
#
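
If the rules file is simply missing on the second node, one option is to copy it across rather than recreate it by hand; a sketch, assuming root SSH access between the nodes is available.

# scp /etc/udev/rules.d/99-oracle-asmdevices.rules root@ol6-112-rac2:/etc/udev/rules.d/

Then, on “ol6-112-rac2”, reload the rules and restart UDEV as described earlier and repeat the “ls -al /dev/asm*” check.

# /sbin/udevadm control --reload-rules
# /sbin/start_udev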

Prior to 11gR2 we would probably use the “runcluvfy.sh” utility in the clusterware root directory to check that the prerequisites have been met. If you are intending to configure SSH connectivity using the installer, this check should be omitted, as it will always fail. If you want to set up SSH connectivity manually, then once it is done you can run “runcluvfy.sh” with the following command.

/mountpoint/clusterware/runcluvfy.sh stage -pre crsinst -n ol6-112-rac1,ol6-112-rac2 -verbose

If you get any failures be sure to correct them before proceeding.

The virtual machine setup is now complete.

Before moving forward, you should probably shut down your VMs and take snapshots of them. If any failures happen beyond this point, it is probably better to switch back to those snapshots, clean up the shared drives and start the grid installation again. An alternative to cleaning up the shared disks is to back them up now using zip and simply replace them in the event of a failure.

$ cd /u04/VirtualBox/ol6-112-rac
$ zip PreGrid.zip *.vdi
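
Restoring from that backup later is just the reverse; this assumes both VMs are shut down and you are happy to overwrite the current shared disk images.

$ cd /u04/VirtualBox/ol6-112-rac
$ unzip -o PreGrid.zip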

Install the Grid Infrastructure

Make sure both virtual machines are started, then log in to “ol6-112-rac1” as the oracle user and start the Oracle installer.

$ cd /host/software/oracle/11gR2/11.2.0.3.0/linux64_grid
$ ./runInstaller

Select the “Skip software updates” option, then click the “Next” button.

Grid - Download Software Updates

Select the “Install and Configure Oracle Grid Infrastructure for a Cluster” option, then click the “Next” button.

Grid - Select Installation Option

Select the “Typical Installation” option, then click the “Next” button.

Grid - Select Installation Type

On the “Specify Cluster Configuration” screen, enter the correct SCAN Name and click the “Add” button.

Grid - Specify Cluster Configuration

Enter the details of the second node in the cluster, then click the “OK” button.

Grid - Add Cluster Node Information

Click the “SSH Connectivity…” button and enter the password for the “oracle” user. Click the “Setup” button to configure SSH connectivity, and the “Test” button to test it once it is complete.

Grid - SSH Connectivity

Click the “Identify network interfaces…” button and check the public and private networks are specified correctly. Once you are happy with them, click the “OK” button and the “Next” button on the previous screen.

Grid - Network Interfaces

Enter “/u01/app/11.2.0.3/grid” as the software location and “Automatic Storage Manager” as the cluster registry storage type. Enter the ASM password, select “dba” as the group and click the “Next” button.

Grid - Specify Install Locations

Set the redundancy to “External”, click the “Change Discovery Path” button and set the path to “/dev/asm*”. Return to the main screen, select all 4 disks and click the “Next” button.

Grid - Create ASM Disk Group

Accept the default inventory directory by clicking the “Next” button.

Grid - Create Inventory

Wait while the prerequisite checks complete. If you have any issues, either fix them or check the “Ignore All” checkbox and click the “Next” button.

Grid - Perform Prerequisite Checks

If you are happy with the summary information, click the “Install” button.

Grid - Summary

Wait while the setup takes place.

Grid - Setup

When prompted, run the configuration scripts on each node.

Grid - Execute Configuration Scripts

The output from the “orainstRoot.sh” script should look something like the listing below.

# cd /u01/app/oraInventory
# ./orainstRoot.sh
Changing permissions of /u01/app/oraInventory.
Adding read,write permissions for group.
Removing read,write,execute permissions for world.

Changing groupname of /u01/app/oraInventory to oinstall.
The execution of the script is complete.
#

The output of the “root.sh” script will vary a little depending on the node it is run on.
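
For reference, “root.sh” lives in the grid home chosen earlier, so on each node it is run along the following lines; use the exact commands shown in the installer prompt if they differ.

# cd /u01/app/11.2.0.3/grid
# ./root.sh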

Once the scripts have completed, return to the “Execute Configuration Scripts” screen on “ol6-112-rac1” and click the “OK” button.

Grid - Execute Configuration Scripts

Wait for the configuration assistants to complete.

Grid - Configuration Assistants

We expect the verification phase to fail with an error relating to the SCAN, assuming you are not using DNS.

INFO: Checking Single Client Access Name (SCAN)...
INFO: Checking name resolution setup for "rac-scan.localdomain"...
INFO: ERROR:
INFO: PRVF-4664 : Found inconsistent name resolution entries for SCAN name "rac-scan.localdomain"
INFO: ERROR:
INFO: PRVF-4657 : Name resolution setup check for "rac-scan.localdomain" (IP address: 192.168.2.201) failed
INFO: ERROR:
INFO: PRVF-4664 : Found inconsistent name resolution entries for SCAN name "rac-scan.localdomain"
INFO: Verification of SCAN VIP and Listener setup failed

Provided this is the only error, it is safe to ignore this and continue by clicking the “Next” button.
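
If you want to confirm the failure really is just down to the SCAN name being resolved through the “/etc/hosts” file rather than DNS, a quick lookup from either node will show whether a DNS entry exists; with no DNS configured it is expected to fail.

# nslookup rac-scan.localdomain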

Click the “Close” button to exit the installer.

Grid - Finish

The grid infrastructure installation is now complete.

Install the Database

Make sure the “ol6-112-rac1” and “ol6-112-rac2” virtual machines are started, then log in to “ol6-112-rac1” as the oracle user and start the Oracle installer.

$ cd /host/software/oracle/11gR2/11.2.0.3.0/linux64_database
$ ./runInstaller

Uncheck the security updates checkbox, click the “Next” button, then click “Yes” on the subsequent warning dialog.

DB - Configure Security Updates

Check the “Skip software updates” checkbox and click the “Next” button.

DB - Download Software Updates

Accept the “Create and configure a database” option by clicking the “Next” button.

DB - Select Installation Option

Accept the “Server Class” option by clicking the “Next” button.

DB - System Class

Make sure both nodes are selected, then click the “Next” button.

DB - Node Selection

Accept the “Typical install” option by clicking the “Next” button.

DB - Select Install Type

Enter “/u01/app/oracle/product/11.2.0.3/db_1” for the software location. The storage type should be set to “Automatic Storage Manager”. Enter the appropriate passwords and database name, in this case “RAC.localdomain”.

DB - Typical Install Configuration

Wait for the prerequisite check to complete. If there are any problems either fix them, or check the “Ignore All” checkbox and click the “Next” button.

DB - Perform Prerequisite Checks

If you are happy with the summary information, click the “Install” button.

DB - Summary

Wait while the installation takes place.

DB - Install Product

Once the software installation is complete the Database Configuration Assistant (DBCA) will start automatically.

DB - DBCA

Once the Database Configuration Assistant (DBCA) has finished, click the “OK” button.

DB - DBCA Complete

When prompted, run the configuration scripts on each node. When the scripts have been run on each node, click the “OK” button.

DB - Execute Configuration Scripts

Click the “Close” button to exit the installer.

DB - Finish

The RAC database creation is now complete.

Check the Status of the RAC

There are several ways to check the status of the RAC. The srvctl utility shows the current configuration and status of the RAC database.

$ srvctl config database -d RAC
Database unique name: RAC
Database name: RAC
Oracle home: /u01/app/oracle/product/11.2.0.3/db_1
Oracle user: oracle
Spfile: +DATA/RAC/spfileRAC.ora
Domain: localdomain
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: AUTOMATIC
Server pools: RAC
Database instances: RAC2,RAC1
Disk Groups: DATA
Mount point paths: 
Services: 
Type: RAC
Database is administrator managed
$

$ srvctl status database -d RAC
Instance RAC1 is running on node ol6-112-rac1
Instance RAC2 is running on node ol6-112-rac2
$
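
The clusterware view of all resources can be checked with the “crsctl” utility from the grid home as well; a quick sketch, using the grid software location chosen during the installation.

$ /u01/app/11.2.0.3/grid/bin/crsctl stat res -t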

The V$ACTIVE_INSTANCES view can also display the current status of the instances.

$ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.3.0 Production on Tue Sep 27 22:20:14 2011

Copyright (c) 1982, 2011, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options

SQL> SELECT inst_name FROM v$active_instances;

INST_NAME
--------------------------------------------------------------------------------
ol6-112-rac1.localdomain:RAC1
ol6-112-rac2.localdomain:RAC2

SQL>

If you have configured Enterprise Manager, it can be used to view the configuration and current status of the database using a URL like “https://ol6-112-rac1.localdomain:1158/em”.

OEM

How to check which PSU is installed…if any

January 1, 2016 Leave a comment

Oracle PSUs (Patch Set Updates) are referenced by their 5-place version number. Unfortunately, applying a PSU does not change the version number reported in the Oracle binaries, product banners and so on (see MOS note 861152.1), so here’s how to identify which PSU your ORACLE_HOME is at…

Database Server:

$ORACLE_HOME/OPatch/opatch lsinventory -bugs_fixed \
  | grep -i -E 'DATABASE PSU|DATABASE PATCH SET UPDATE'

$ORACLE_HOME/OPatch/opatch lsinventory -bugs_fixed \
  | egrep -i 'DATABASE PSU|DATABASE PATCH SET UPDATE'

(The first command above is the one to use on Linux.)

…or using the following SQL:

select comments, version, bundle_series
from sys.registry$history
where bundle_series = 'PSU'
order by action_time;

COMMENTS                       VERSION            BUNDLE_SERIES
------------------------------ ------------------ -----------------
Patchset 11.2.0.2.0            11.2.0.3           PSU
PSU 11.2.0.3.5                 11.2.0.3           PSU

The above view is populated when catbundle.sql is executed. If the query above fails with ORA-00904: "BUNDLE_SERIES": invalid identifier, then no bundle patch (PSU or CPU) has been applied.
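
If you hit that ORA-00904 but still want to see whatever patch history has been recorded, a plainer query against the DBA_REGISTRY_HISTORY view, which does not rely on the BUNDLE_SERIES column, is a reasonable fallback.

select action_time, action, version, comments
from dba_registry_history
order by action_time;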

Grid Infrastructure:

$ORACLE_HOME/OPatch/opatch lsinventory -bugs_fixed \
  | grep -i 'GI PSU'

Cluster Ready Services:

$ORACLE_HOME/OPatch/opatch lsinventory -bugs_fixed \
  | grep -i 'TRACKING BUG' | grep -i 'PSU'

Enterprise Manager Agent:

$ORACLE_HOME/OPatch/opatch lsinventory -bugs_fixed \
  | grep -i 'ENTERPRISE MANAGER AGENT' | grep -i 'PSU'

Enterprise Manager OMS:

$ORACLE_HOME/OPatch/opatch lsinventory -bugs_fixed \
  | grep -i 'ENTERPRISE MANAGER OMS' | grep -i 'PSU'

WebLogic Server:

. $WLS_HOME/server/bin/setWLSEnv.sh
java weblogic.version | grep PSU
Categories: #oracle_Emp, 12c, Database, Oracle