
FDR Analysis on Siebel

September 11, 2013

PURPOSE

FDR stands for Flight Data Recorder and is a framework in the Siebel Server infrastructure that collects data about the running Siebel application in a circular buffer. In the event of a crash, the data is written in binary format to a file in the application /bin subdirectory. The file has the .fdr extension and can be post-processed into human-readable format using the sarmanalyzer.exe utility. The output can help show what was happening in different application subsystems immediately prior to the crash.
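To make the "circular buffer" idea concrete, here is a conceptual Python sketch. It is not Siebel's implementation, which this article does not describe: a fixed-capacity buffer keeps only the most recent entries, and each entry gets an increasing id so that their order can be recovered without timestamps. The class and method names are hypothetical.

```python
from collections import deque
from itertools import count


class RingRecorder:
    """Conceptual flight-data recorder: keeps only the most recent `capacity` entries."""

    def __init__(self, capacity: int) -> None:
        self._entries: deque = deque(maxlen=capacity)
        self._ids = count(1)  # increasing id, so order survives without timestamps

    def record(self, event: str) -> None:
        # When the buffer is full, deque(maxlen=...) silently drops the oldest entry.
        self._entries.append((next(self._ids), event))

    def dump(self) -> list:
        """What a crash handler would write out: the surviving entries, oldest first."""
        return list(self._entries)


rec = RingRecorder(capacity=3)
for event in ["login", "goto view", "write record", "invoke workflow"]:
    rec.record(event)
print(rec.dump())  # [(2, 'goto view'), (3, 'write record'), (4, 'invoke workflow')]
```

Only the last three events survive, which is why an FDR file shows what happened *immediately* before a crash rather than the whole session.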

NOTE: The application may not generate an FDR file when there is a “soft” crash, meaning that the server process exits but the exit is not recognized by the Siebel crash handler. This can happen when one of the following occurs:

  • The Siebel crash handler is disabled (this should only happen under the supervision of Technical Support).
  • The process exits without going through the internal code path that executes the Siebel crash handler logic, so none of the related crash output (for example, the crash.txt file or the FDR file) is created.

SCOPE

This document is informational and intended for any user.

DETAILS

Here are the high-level sections covered in this document:

  • How to Identify the Correct FDR file When a Crash Occurs
  • How to Process Binary .fdr File into .csv Format and Identify the Crashing Thread
  • How to Flush FDR Output
  • How to Review Entries Prior to Crashing Thread to Understand What Happened Immediately Prior to the Crash
    • Example Case
    • Diagram of Interaction between Elements Involved in Crash Scenario
    • Analysis of the FDR output:
  • Where to go for more information

How to Identify the Correct FDR file When a Crash Occurs

The correct output file can be identified from its .fdr file name, which includes a timestamp and the id of the process that crashed, in the format:

T<YYYYMMDDHHMM>_P<process id value>.fdr

For example:

T200503181601_P001376.fdr

This file name belongs to a component process that was started on March 18, 2005 at 4:01 PM with process id 1376.
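A minimal Python sketch of reading the two parts of the name, assuming the T<YYYYMMDDHHMM>_P<process id>.fdr layout described above. The regular expression and the names FdrName and parse_fdr_name are illustrative, not part of any Siebel tool.

```python
import re
from datetime import datetime
from typing import NamedTuple

# Matches names such as T200503181601_P001376.fdr
FDR_NAME = re.compile(r"^T(\d{12})_P(\d+)\.fdr$", re.IGNORECASE)


class FdrName(NamedTuple):
    started: datetime  # timestamp from the name, to the minute
    pid: int           # OS process id, as a decimal integer


def parse_fdr_name(name: str) -> FdrName:
    """Split an .fdr file name into its timestamp and process id."""
    m = FDR_NAME.match(name)
    if m is None:
        raise ValueError(f"not an FDR file name: {name!r}")
    started = datetime.strptime(m.group(1), "%Y%m%d%H%M")
    return FdrName(started=started, pid=int(m.group(2)))


info = parse_fdr_name("T200503181601_P001376.fdr")
print(info.started, info.pid)  # 2005-03-18 16:01:00 1376
```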

NOTE: Bug 10509303 has been logged to address the documentation defect with how the file format is documented in the System Monitoring and Diagnostics Guide for Siebel Business Applications.

If the process id is known, use the second part of the file name to find the correct file; otherwise, use the timestamp in the first part as a guide. NOTE: If a crash_xxxx.txt file is available, convert the hexadecimal process id found in that file to a decimal value to get the process id that should appear in the .fdr file name. The example below shows a crash.txt file generated in the Microsoft Windows environment.

NOTE: On HP-UX, the crash.txt file created in Siebel version 7.7 is a single file that is appended to. The process id is displayed in decimal format as shown below:
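For the Windows case, where crash_xxxx.txt reports the process id in hexadecimal, the conversion and file lookup can be sketched in Python. The hexadecimal value 0x560 is a made-up example, and the six-digit zero padding of the _P part is an assumption based only on the file names shown in this article; the function names are hypothetical.

```python
def pid_from_hex(hex_pid: str) -> int:
    """Convert a hexadecimal process id such as '0x560' or '560' to decimal."""
    return int(hex_pid.strip(), 16)


def fdr_pid_suffix(pid: int) -> str:
    """Process-id part of an .fdr name; six-digit zero padding is assumed from the examples."""
    return f"_P{pid:06d}.fdr"


def find_fdr_for_pid(file_names: list, pid: int) -> list:
    """Return the .fdr names (if any) whose process-id part matches pid."""
    suffix = fdr_pid_suffix(pid).lower()
    return [n for n in file_names if n.lower().endswith(suffix)]


pid = pid_from_hex("0x560")
print(pid, find_fdr_for_pid(["T200503181601_P001376.fdr", "T200403121323_P002576.fdr"], pid))
# 1376 ['T200503181601_P001376.fdr']
```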

How to Process Binary .fdr File into .csv Format and Identify the Crashing Thread

Here are the steps to post-process the raw .fdr file:

  1. Identify the appropriate .fdr file to process, using the procedure described above. NOTE: On UNIX platforms only, source the shell environment variables before running the sarmanalyzer utility.
    To do this, run the following shell command from the $SIEBEL_ROOT/siebsrvr directory:

. ./siebenv.sh

  1. Use the sarmanalyzer.exe command line utility and issue the following command:

sarmanalyzer -o <output_csv_file> -x -f <fdr_file>

For example:

sarmanalyzer -o T200503181601_P001376.csv -x -f T200503181601_P001376.fdr

The output .csv file will be written to the SIEBSRVR_ROOT\bin directory unless redirected to a different directory.

NOTE: While you can specify any file name for the .csv file, it is good practice to keep the same base name as the .fdr file. This preserves the date and time stamp and the crashing PID in the file name, which is useful when multiple FDR files have been generated and provides reference points if these files need to be supplied to Technical Support.
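A Python sketch that builds the same sarmanalyzer command line, deriving the .csv name from the .fdr name so the timestamp and PID are preserved. Only the -o, -x and -f options shown above are used; the helper names are hypothetical, and decode_fdr assumes sarmanalyzer can be found on the PATH.

```python
import subprocess
from pathlib import Path


def sarmanalyzer_args(fdr_file: str) -> list:
    """Command line for decoding fdr_file into a .csv with the same base name."""
    fdr = Path(fdr_file)
    csv_out = fdr.with_suffix(".csv")  # keeps the T<timestamp>_P<pid> part of the name
    return ["sarmanalyzer", "-o", str(csv_out), "-x", "-f", str(fdr)]


def decode_fdr(fdr_file: str) -> None:
    """Run sarmanalyzer (on UNIX, start Python from a shell that has sourced siebenv.sh)."""
    subprocess.run(sarmanalyzer_args(fdr_file), check=True)


print(" ".join(sarmanalyzer_args("T200503181601_P001376.fdr")))
# sarmanalyzer -o T200503181601_P001376.csv -x -f T200503181601_P001376.fdr
```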

  1. A best practice is to open the output .csv file using a spreadsheet application like Microsoft Excel so that you can easily filter the data.
  1. To do this in Excel, open the .csv file and select Data > Filter > AutoFilter.
    1. Next, filter the SubAreaDesc column by the value ** CRASHING THREAD ** to find the record for the crashing thread.
    1. Filter the ThreadID column on the thread id shown in that record (in this example, 4068).
    1. Then clear the filter on the SubAreaDesc column. All records with the same thread id as the crashing thread should now be displayed; these are the relevant records to review when analyzing FDR output. Note that several threads may crash before the operating system terminates the process, in which case you may find several ** CRASHING THREAD ** records.
    1. The .csv file created by sarmanalyzer.exe is not sorted, so an important step is to put it in chronological order. For performance reasons, the FDR file does not contain timestamps; instead, sort on the FdrID column in ascending order.
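The spreadsheet filtering and sorting steps above can also be scripted. A minimal Python sketch, assuming the decoded .csv has a header row with the ThreadID, SubAreaDesc and FdrID columns named as in this article and that FdrID holds decimal integers; the sample rows are invented for illustration.

```python
import csv
import io
from typing import Iterable

CRASH_MARKER = "CRASHING THREAD"


def crashing_thread_entries(lines: Iterable[str]) -> list:
    """Rows belonging to any crashing thread, in chronological (FdrID) order."""
    rows = list(csv.DictReader(lines))
    # Find the ThreadID of every row flagged ** CRASHING THREAD ** in SubAreaDesc.
    crash_tids = {r["ThreadID"] for r in rows if CRASH_MARKER in (r.get("SubAreaDesc") or "")}
    # Keep every row from those threads, whatever its SubAreaDesc.
    related = [r for r in rows if r["ThreadID"] in crash_tids]
    # FDR data has no timestamps, so ascending FdrID gives chronological order.
    return sorted(related, key=lambda r: int(r["FdrID"]))


# Hypothetical rows for illustration only; real files have more columns.
SAMPLE = """FdrID,ThreadID,AreaDesc,SubAreaDesc,UserStr1
3,4068,Script,Invoke,GetProperty
1,4068,SWE,Command,GotoView
2,5120,SWE,Command,GotoView
4,4068,Crash,** CRASHING THREAD **,
"""

for row in crashing_thread_entries(io.StringIO(SAMPLE)):
    print(row["FdrID"], row["ThreadID"], row["SubAreaDesc"])
```

For a real file, pass an open file object, for example open("T200503181601_P001376.csv", newline=""), instead of the StringIO sample.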

How to Flush FDR Output

Besides automatically creating the .fdr output file when a process crashes, it is possible to force the file to be flushed (written to disk) on command. The Siebel Server task id (or, in version 8, the OS process id) must be provided as an argument.

To cause the FDR buffer for a component process to be written to disk, follow these steps:

  1. Identify the task you want to generate the dump for, either by using the list tasks command in the srvrmgr.exe command line utility or by navigating to the Administration – Server Management > Server > Tasks view from the Site Map in the Siebel application. For example:

srvrmgr> list tasks

  1. Identify the task id value for the component task that you want to generate the dump for and note it.
  1. Flush the FDR buffer to disk by executing the following command in the srvrmgr.exe command line utility:

srvrmgr> flush FDR for task <task_id> (for versions 7.7 and 7.8)

srvrmgr> flush FDR for process <process_id> (for version 8)

Replace <task_id> with the task id noted in the previous step. For version 8, replace <process_id> with the OS process id of the component process.
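Because the flush syntax differs by version, a small helper can pick the right form. This Python sketch only builds the text to type at the srvrmgr> prompt; the version-string parsing is an assumption, and the ids in the example are hypothetical.

```python
def flush_fdr_command(version: str, ident: int) -> str:
    """srvrmgr command that flushes the FDR buffer, chosen by Siebel version.

    Versions 7.7 and 7.8 take the Siebel Server task id; version 8 takes the OS process id.
    """
    if version.startswith(("7.7", "7.8")):
        return f"flush FDR for task {ident}"
    if version.split(".")[0] == "8":
        return f"flush FDR for process {ident}"
    raise ValueError(f"no flush FDR syntax listed for version {version}")


print(flush_fdr_command("7.8", 12345))   # flush FDR for task 12345
print(flush_fdr_command("8.1.1", 2576))  # flush FDR for process 2576
```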

  1. The FDR file will be written to the SIEBSRVR_ROOT\bin directory with the naming convention:

T<timestamp_YYYYMMDDHHMM>_P<OS_process_id>.fdr

An example of this is:

T200403121323_P002576.fdr

This is the FDR file for process id 2576, where the process was started at approximately 1:23 PM on March 12, 2004.

  1. Identify the correct OS thread id for your task. Each entry in the FDR file records the OS thread id of the operation captured, not the task id. To find the OS thread id used by the component task, run the following command in the srvrmgr.exe command line utility:

srvrmgr> list tasks show CC_ALIAS, TK_TASKID, TK_TID, TK_PID

This will generate a list of tasks and include the component alias (CC_ALIAS), the task id (TK_TASKID), the OS thread id (TK_TID), and OS process id (TK_PID).

Note the TK_TID and TK_PID values for the task whose FDR buffer you flushed. The TK_PID value identifies the FDR file described above (the process id in the last part of the file name should match TK_PID), and after the file is decoded, the TK_TID value identifies the entries relevant to that task: each should have a ThreadID value equal to TK_TID. This is especially important because a single process may have many threads.
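The mapping from a task to its FDR file and entries can be sketched in Python as follows. The TK_TID and TK_PID values are assumed to have been copied by hand from the list tasks output (parsing srvrmgr output is not shown); the component alias, ids, six-digit padding and helper names are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskInfo:
    """Values copied from 'list tasks show CC_ALIAS, TK_TASKID, TK_TID, TK_PID'."""
    cc_alias: str  # CC_ALIAS
    task_id: int   # TK_TASKID
    tid: int       # TK_TID, OS thread id
    pid: int       # TK_PID, OS process id


def fdr_files_for_task(task: TaskInfo, file_names: list) -> list:
    """FDR file names whose _P<pid> part matches TK_PID (six-digit padding assumed)."""
    suffix = f"_P{task.pid:06d}.fdr".lower()
    return sorted(n for n in file_names if n.lower().endswith(suffix))


def rows_for_task(task: TaskInfo, rows: list) -> list:
    """Decoded CSV rows whose ThreadID equals TK_TID, in FdrID order."""
    mine = [r for r in rows if int(r["ThreadID"]) == task.tid]
    return sorted(mine, key=lambda r: int(r["FdrID"]))


# Hypothetical values for illustration.
task = TaskInfo(cc_alias="ExampleObjMgr_enu", task_id=12345, tid=4068, pid=2576)
print(fdr_files_for_task(task, ["T200403121323_P002576.fdr", "T200503181601_P001376.fdr"]))
# ['T200403121323_P002576.fdr']
```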

How to Review Entries Prior to Crashing Thread to Understand What Happened Immediately Prior to the Crash

FDR entries generally include the following columns:

  • FdrID: The id assigned to a particular FDR entry. Each entry has a different id value.
  • ThreadID: The operating system thread id. Each entry is associated with a thread; entries may share a thread id or have different ones, depending on whether the process is multi-threaded and whether more than one thread was in use at the time of the crash.
  • AreaSymbol: A category for a particular subsystem, so that all of its entries can be grouped together.
  • AreaDesc: Descriptive text naming the product area each entry is associated with.
  • SubAreaSymbol: Similar to AreaSymbol; a category that distinguishes different functionality within a particular area.
  • SubAreaDesc: Descriptive text for the sub-area. The crashing thread's record is marked ** CRASHING THREAD ** in this column.
  • UserInt1, UserInt2: Integer values assigned by internal instrumentation, which may store values such as internal pointer references; these are normally only useful to Oracle Engineering.
  • UserStr1, UserStr2: Contextual information that helps explain the significance of each entry, such as object names, parameter values, row_ids, or other messages that indicate context within the area and sub-area.
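For scripted analysis, these columns can be mapped onto a small typed record. This Python sketch assumes the decoded .csv has a header row using these column names and that FdrID and ThreadID are decimal integers; UserInt1 and UserInt2 are left out because they are mainly of use to Oracle Engineering. FdrEntry and its methods are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FdrEntry:
    """One decoded FDR row (UserInt1/UserInt2 deliberately omitted)."""
    fdr_id: int      # FdrID: ascending order is chronological order
    thread_id: int   # ThreadID: OS thread id
    area: str        # AreaDesc
    sub_area: str    # SubAreaDesc
    user_str1: str   # UserStr1: context such as object or parameter names
    user_str2: str   # UserStr2

    @classmethod
    def from_row(cls, row: dict) -> "FdrEntry":
        """Build an entry from a csv.DictReader row; missing text columns become ''."""
        return cls(
            fdr_id=int(row["FdrID"]),
            thread_id=int(row["ThreadID"]),
            area=row.get("AreaDesc") or "",
            sub_area=row.get("SubAreaDesc") or "",
            user_str1=row.get("UserStr1") or "",
            user_str2=row.get("UserStr2") or "",
        )

    def summary(self) -> str:
        """One readable line: id, thread, area/sub-area and any context strings."""
        context = " | ".join(s for s in (self.user_str1, self.user_str2) if s)
        return f"{self.fdr_id:>8} {self.thread_id:>6} {self.area}/{self.sub_area} {context}".rstrip()


# Hypothetical row for illustration.
entry = FdrEntry.from_row({"FdrID": "12", "ThreadID": "4068", "AreaDesc": "Workflow",
                           "SubAreaDesc": "Step", "UserStr1": "AA", "UserStr2": ""})
print(entry.summary())
```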

Example Case

A custom DLL has been developed that can be called when string transformations are necessary. The DLL is called from a business service that includes custom eScript code to pass a parameter to the DLL and receive the output from it. In a particular implementation, eScript code on the Account business component WriteRecord event calls a workflow process that uses the business service. The script passes the location value of the account to the workflow, the workflow process passes the value as a process property to the business service, and that value is in turn passed to the DLL for processing.

In this scenario, the DLL is implemented incorrectly in a way that causes it to exit unexpectedly, which in turn causes the Object Manager (OM) process hosting the user session that invokes the DLL to crash. The FDR output can be examined to show what the user session was doing prior to the crash and what happened in the different internal subsystems (the object manager, scripting, and the workflow manager), tracking the point of failure down to the workflow process and to the step that calls the DLL.

Diagram of Interaction between Elements Involved in Crash Scenario

Steps to cause the failure:

  1. Log in to the Siebel application and navigate to the Account List View.
  1. Create a new Account with a location value of New York and step off the record to commit it.
  1. Because the DLL will fail when the location value is set to TT, set the location of the record to TT and step off the record to commit it.
  1. Step 3 causes the Object Manager process to crash, and an FDR file is written to disk in SIEBSRVR_ROOT/bin. The web client displays an error indicating that the server is busy, and the user must start a new session to continue using the application.
  1. The .fdr file will need to be post-processed using the sarmanalyzer.exe utility to determine what happened before the crash.

Analysis of the FDR output:

See the section above called “How to Process Binary .fdr File into .csv Format and Identify the Crashing Thread” for details on how to post-process the binary .fdr file and put the .csv file into the proper order, showing the records of the session relevant to the crash.

After sorting the .csv file by the FdrID column, scroll to the bottom of the list and work upward through the last few entries before the crashing-thread record. Those entries show what happened immediately before the crash.
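Reading upward from the crash record can be sketched in Python as follows, assuming rows are dictionaries keyed by the column names used in this article (for example, from csv.DictReader). If several threads crashed, this sketch uses the first ** CRASHING THREAD ** record by FdrID; the function name and sample rows are hypothetical.

```python
CRASH_MARKER = "CRASHING THREAD"


def entries_before_crash(rows: list, n: int = 10) -> list:
    """Last n entries of the crashing thread recorded before its crash record, oldest first."""
    ordered = sorted(rows, key=lambda r: int(r["FdrID"]))
    crash = next((r for r in ordered if CRASH_MARKER in (r.get("SubAreaDesc") or "")), None)
    if crash is None:
        raise ValueError("no ** CRASHING THREAD ** record found")
    crash_id = int(crash["FdrID"])
    earlier = [r for r in ordered
               if r["ThreadID"] == crash["ThreadID"] and int(r["FdrID"]) < crash_id]
    return earlier[max(0, len(earlier) - n):]


# Hypothetical rows; real rows come from csv.DictReader over the decoded file.
rows = [
    {"FdrID": "4", "ThreadID": "4068", "SubAreaDesc": "** CRASHING THREAD **"},
    {"FdrID": "1", "ThreadID": "4068", "SubAreaDesc": "Invoke"},
    {"FdrID": "3", "ThreadID": "4068", "SubAreaDesc": "GetProperty"},
    {"FdrID": "2", "ThreadID": "5120", "SubAreaDesc": "Other"},
]
for r in reversed(entries_before_crash(rows, n=2)):  # newest first, like reading upward
    print(r["FdrID"], r["SubAreaDesc"])
```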

The output will help to show things like:

  1. The SWE command executed in the client to navigate to the Account List View.
  1. The applet where the account record is written.
  1. The business component and script event that is executed.
  1. The different methods that are invoked by the BusComp_WriteRecord event and what script language is used.
  1. The call in the script to invoke a workflow, and its execution by the workflow subsystem.
  1. The invocation of the business service and method from within the workflow and by the object manager.
  1. The execution of the script methods in the Service_PreInvokeMethod event.
  1. The last successful operation is the GetProperty(WF_LOC) call in the AA business service, so it can be deduced that the next call in the business service, SElib.dynamicLink("revstr.dll", "_BlockRev@4", CDECL, myloc), is the point of failure. Reviewing the DLL code confirms that the failure occurs when the DLL receives a value of TT: the file object is never initialized before the DLL attempts to write to it.
  1. Finally, the crashing thread record itself.

In this case, analyzing the FDR output quickly:

  • shows the interaction of several subsystems in the product,
  • helps deconstruct how each one is used prior to the crash, and
  • assists in pinpointing the last several operations prior to the failure.
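The subsystem interaction can be summarized by collapsing consecutive entries from the same area into one step, giving a compact trail of where the thread went before the crash. A Python sketch, assuming rows keyed by FdrID and AreaDesc; the area names in the sample are invented and are not actual Siebel AreaDesc values.

```python
from itertools import groupby


def area_trail(rows: list) -> list:
    """Subsystems in the order they were entered; consecutive rows with the same AreaDesc collapse to one step."""
    ordered = sorted(rows, key=lambda r: int(r["FdrID"]))
    return [area for area, _ in groupby(r.get("AreaDesc") or "?" for r in ordered)]


# Hypothetical AreaDesc values for one thread; real names come from the decoded file.
rows = [
    {"FdrID": "1", "AreaDesc": "SWE"},
    {"FdrID": "2", "AreaDesc": "SWE"},
    {"FdrID": "3", "AreaDesc": "Script"},
    {"FdrID": "4", "AreaDesc": "Workflow"},
    {"FdrID": "5", "AreaDesc": "Script"},
]
print(" -> ".join(area_trail(rows)))  # SWE -> Script -> Workflow -> Script
```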

Given this information, it is possible to reconstruct what led to the failure, identify the likely cause, and decide where to focus diagnostic and recovery efforts.
