Troubleshooting EMC Ionix (SMARTS) – Part 1

Recently I went to a client to help them understand the current state of their Enterprise Management. They used EMC’s Ionix (aka SMARTS) for server, network, and storage monitoring. As usual in any IT enterprise, the person who implemented it had left the company, and the newcomer was still ramping up. Additionally, they were experiencing high latency and poor user performance on their Service Assurance Manager (SAM) Global Console.

In this series of posts, I hope to shed some light on the approach for identifying (and correcting) bottlenecks in your Ionix infrastructure. Things like:
-Resource sizing
-Domain Manager performance
-Adapter issues
-Escalation Policies
-SAM Subscription “surge” issues
-Out of date/Incompatible software versions
-Underlying O/S Issues

What’s interesting about Ionix is not only the amount of intelligence behind it, but also the intelligence necessary to understand all of the interconnections and dependencies. This particular client had various components of Ionix (SAM, BIM, Dashboard, ICIP, Adapters), each of which needed to be tested, assessed, and reported on for future remediation.

The first step I usually take is to understand the Ionix infrastructure deployment:
-Where/what are the brokers?
-What domains are registered/what domains are not registered but exist out there?
-Where/what are the FlexLM license managers?
-Are all components registering as alive, or dead?
-Are the underlying O/Ss healthy?
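To make the checklist above a bit more concrete, here is a minimal sketch of how a saved broker listing could be compared against the domains you expect to find. The column layout of the listing, the domain names, and the host names are all assumptions made up for illustration; adjust the parsing to whatever your broker actually prints.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DomainEntry:
    name: str
    host: str
    port: int
    state: str


def parse_broker_listing(text: str) -> List[DomainEntry]:
    """Parse rows assumed to look like: NAME HOST PORT PID STATE <timestamp...>."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        # Skip headers, separators, and blank lines: data rows have numeric port and PID.
        if len(fields) < 5 or not fields[2].isdigit() or not fields[3].isdigit():
            continue
        entries.append(DomainEntry(fields[0], fields[1], int(fields[2]), fields[4]))
    return entries


def audit(entries: List[DomainEntry], expected: List[str]) -> Dict[str, List[str]]:
    """Report domains that are not running, not registered, or not expected."""
    registered = {e.name for e in entries}
    return {
        "not_running": sorted(e.name for e in entries if e.state != "RUNNING"),
        "missing": sorted(set(expected) - registered),
        "unexpected": sorted(registered - set(expected)),
    }


# Hypothetical listing, saved to a string for the example.
SAMPLE = """\
Domain        Host Name   Port   Proc ID  State    Last Chg Time
------        ---------   ----   -------  -----    -------------
INCHARGE-AM   hostA       40001  1234     RUNNING  Apr 10 08:00:01 2011
INCHARGE-SA   hostB       40002  2345     DEAD     Apr 11 23:14:52 2011
"""

report = audit(
    parse_broker_listing(SAMPLE),
    expected=["INCHARGE-AM", "INCHARGE-SA", "INCHARGE-OI"],
)
print(report)
```

The "missing" and "unexpected" lists correspond to the question of which domains exist but are not registered (and vice versa), while "not_running" covers the alive-or-dead check.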

Next, I try to get a good understanding of the volume. By volume, I mean the number of events (root cause and/or symptomatic), the number of topological objects in each domain manager and in the Service Assurance Manager, and the “surge” of subscribed objects. More on this later.
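As a rough illustration of turning "volume" into a number, the sketch below computes an event arrival rate from a few timestamped samples of a cumulative event count. The sample values are invented, and how you collect the counts is left open; this only shows the arithmetic.

```python
from datetime import datetime


def events_per_minute(samples):
    """samples: list of (timestamp, cumulative_event_count), ordered by time."""
    rates = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        minutes = (t1 - t0).total_seconds() / 60
        rates.append((c1 - c0) / minutes)
    return rates


# Hypothetical samples taken five minutes apart.
samples = [
    (datetime(2011, 4, 12, 9, 0), 12000),
    (datetime(2011, 4, 12, 9, 5), 12600),
    (datetime(2011, 4, 12, 9, 10), 14100),
]

rates = events_per_minute(samples)
print("rates per minute:", rates)
print("peak rate:", max(rates))
```

Comparing the peak rate against the typical rate over a longer window is one simple way to spot the kind of surge mentioned above.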

Below is a good picture of the type of statistics that help frame an understanding of Ionix health.

Now obviously, those are not all of the important statistics. You should also take advantage of the “sm_tpmgr --sizes” utility to get a full picture of the topological objects discovered/monitored by each domain manager. This will list all of the Node-level objects (discovered/instrumented) as well as interface/card objects. This is critical for understanding scale and server sizing.
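Once you have per-domain object counts written down, a small script can rank the domain managers by size. The sketch below assumes you have already transcribed the counts into a dictionary by hand; the domain names and numbers are made up, and the split between node-level and component-level classes is my own simplification.

```python
# Classes treated as node-level here (an assumption for this example);
# everything else is counted as a component.
NODE_CLASSES = {"Router", "Switch", "Host"}

# Hypothetical counts per domain manager, keyed by object class.
inventory = {
    "DOMAIN-A": {"Router": 120, "Switch": 340, "Interface": 9800, "Card": 610},
    "DOMAIN-B": {"Host": 900, "Interface": 2100},
}


def summarize(inventory):
    """Return (domain, node_count, component_count) rows, largest domain first."""
    rows = []
    for domain, counts in inventory.items():
        nodes = sum(n for cls, n in counts.items() if cls in NODE_CLASSES)
        components = sum(n for cls, n in counts.items() if cls not in NODE_CLASSES)
        rows.append((domain, nodes, components))
    return sorted(rows, key=lambda row: row[1] + row[2], reverse=True)


summary = summarize(inventory)
for domain, nodes, components in summary:
    print(f"{domain}: {nodes} nodes, {components} components")
```

A domain with a modest node count but a very large component count is often the one that deserves a closer look when sizing servers.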

Another good yardstick is the number of events held in SAM’s memory and the rate at which events arrive in SAM. Utilities such as “dmctl” and “sm_adapter” are key for coming up with a statistic such as this:

In the next post, I will review how to use dmdebug to dump statistical information about each domain manager and understand its underlying performance.

Written by

April 12, 2011