Archive for the ‘Gestalt IT’ Category

The Blue lights on EMC Symmetrix V-Max Systems

September 10th, 2009

If you were to walk into a datacenter and see an EMC Symmetrix V-Max for the first time, you would end up giving it a second look.

It's those blue flashing lights on the front of the unit that catch your eye.

They give the Symmetrix V-Max a sleek and sexy look.

Here are some pictures to prove it.


v-max image 2

v-max image

Visible in the picture below are Cisco UCS blades, NetApp systems, HP systems, Cisco switches, and Xsigo systems, but you can still easily spot the Symmetrix V-Max.

vmworld 2009

EMC Symmetrix V-Max

A video from EMC World 2009

These lights are powered through USB cables, a very simple yet genius idea: USB ports at the back end of these enterprise-class arrays keep the blue flashing lights on.

Here are pictures of those USB connectors from the front. Believe it or not, they are redundant as well.

(Look at the USB cable connectors that go to the front door, two of them, right above the blue light.)


As Storage Anarchist says in his Blog post “The first thing you’ll probably notice about the new Symmetrix V-Max is the packaging – and specifically the glossy-black panel with the blazing blue LED light bar that underscores the name Symmetrix on every door. The design team had a lot of fun blending the modern gloss-black look of today’s popular personal communications devices with the image of stability and security that customers expect from Symmetrix.”

Yes, that's right: this post was about the blue lights on the Symmetrix V-Max systems :-)

Oh, the big question: will it call home through the EMC ESRS Gateway if one of these blue lights fails?

Storage Economics – Hardware Maintenance – Part 2

August 6th, 2009

This blog post is a continuation of yesterday's post about various aspects of Storage Economics as they relate to hardware maintenance cost.

To catch up, read Storage Economics – Hardware Maintenance – Part 1.

Topics covered in the previous post included:

The concept of Hardware Maintenance

The Strategy related to Hardware Maintenance

The Facts about Hardware Maintenance

The beliefs about Hardware Maintenance

Here are a few other components of storage hardware maintenance services as they fit into the concept of Storage Economics.

There was an interesting post yesterday by David Merrill at HDS about how a customer in the APAC market has been able to leverage Independent Service Providers for the various assets they own, and how they decide what stays with the manufacturer and what is maintained by Independent Service Providers.

The Plan for Hardware Maintenance

Hardware Support: Support for storage assets may be available through Independent Service Providers (ISPs), which can help reduce CapEx and OpEx and improve ROA by leveraging the existing technology on the floor for a longer time span.

Remote Support and Diagnostics: Independent Service Providers can enable storage frames for remote call-home features, provide remote support, and perform diagnostics for troubleshooting.

Code Upgrades (Firmware) and Engineering: This support is typically available only through the manufacturer. But here is a fact: at the end of the equipment's three-year life cycle (when you start paying for off-warranty support), how many times have you seen code upgrades offered to customers? Vendors are more focused on the technology that is current today.

Global technical Support: Global 24 x 7 technical support is often provided by Independent Service Providers as a part of service offerings.

Onsite Certified & Trained Engineers: Independent Service Providers typically hire the same engineers that have worked for the vendor and redeploy them onsite for services.

Spares: Spare parts are a standard offering through Independent Service Providers, either shipped to the site within a 4-hour SLA or possibly stored as onsite spares.

SLA: Independent Service Providers' SLAs are normally matched to vendor specifications. A custom-tailored support plan can also be created for test and development systems (non-critical systems that might not need the utmost priority).

Software Support: In most cases, software support can be continued with the vendor, which enables you to receive software updates for your host environment or any other layered software. If your storage platform is more than 5 years old, maybe you can investigate dropping the software support.

The Pricing for Hardware Maintenance

So typically, off-warranty hardware maintenance services may be available to you at a 50% to 70% discount off the vendor's list price, since these organizations do not carry a sustaining engineering cost.

This will help increase the life of the asset you already own on the floor, which is fully functional and operational.

This will further help you reduce your CAPEX (by not purchasing new assets), reduce your OPEX (by reducing your maintenance cost) and improve your ROA (an asset you have already paid for).

These savings will need a 12-month cycle to fully materialize, since hardware maintenance services are billed on a monthly basis.

With 50% to 70% savings per device (storage frame), an environment with 1PB of storage could see savings of millions of dollars over 3 years, and a 5PB environment might see double-digit million-dollar savings over 3 years.
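To make the arithmetic concrete, here is a minimal back-of-the-envelope sketch. The monthly maintenance rate is a made-up assumption for illustration, not a quote from any vendor or ISP:

```python
# Rough sketch of off-warranty maintenance savings at ISP discount levels.
# The $20,000/month vendor rate below is an illustrative assumption.

def three_year_savings(vendor_monthly: float, discount: float) -> float:
    """Savings over a 3-year off-warranty term at a given ISP discount (0.0-1.0)."""
    isp_monthly = vendor_monthly * (1.0 - discount)
    return (vendor_monthly - isp_monthly) * 36  # 36 monthly billing cycles

low = three_year_savings(20_000, 0.50)   # 50% discount
high = three_year_savings(20_000, 0.70)  # 70% discount
print(f"3-year savings per frame: ${low:,.0f} to ${high:,.0f}")
```

Multiply that per-frame figure across the dozens of frames in a multi-petabyte environment and the multi-million-dollar numbers above follow directly.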

Is this something that sounds interesting and can help you overall preserve your CapEx and reduce your OpEx?

Your Alternatives for Hardware Maintenance

Are people within your organization ready and open for this concept?

Do your homework in selecting the right service provider.

It is okay to consider giving an Independent Service Provider a partial environment, perhaps consisting of test and development systems, to get a better feel for their services and response times.

Compare service offerings from multiple independent service providers.

Ask the right questions and see how long a transition plan would take for a cutover.

Ask questions about outages in these environments and the Independent Service Provider's experience with them.

It is completely okay to drill the Independent Service Provider's technical folks with tons of questions related to your environment.

Here are some additional things to ask the Independent Service Provider:

Ask about recertification cost

Ask about reconfiguration cost

Ask for any hidden cost

Ask about spares that are used

Ask about spares procurement process

Ask about spares testing process

Ask to see their operations (visit the ISP)

Ask about the support plan

Ask for access to the online service-call portal

Ask about an online web portal for advisories and errors

Ask about response times

Look into escalation plans

Look into call flow processes

Ask about gaps in service compared to the vendor

Ask for a dedicated trained engineer (based on the amount of business)

Ask about training for engineers

Ask for a dedicated account manager

Ask for a dedicated technical contact

Ask for sales contact for your account

Ask for escalation contacts

Ask for project plan related to the transition from vendor to ISP

Investigate how big the Independent Service Provider is

Check references; ask reference customers about outages, the parts replacement process, hardware and software issues, and call ownership issues

Compare SLAs

Check the viability of the support solution (tools, processes, escalation, risks, etc.)

Do not make a decision based on pricing alone

Determine contingency plans.

So where do you find these alternatives?

Well, your initial search can begin on the web. Following that, inquire further into the company and dig into their areas of expertise. Ask them about their competition, and inquire with the competition about their support plans.

If you have a Storage partner, go to them and ask them to find an Independent Service Provider for your organization.

If you have Global Outsourcing partners, inquire with them, to see if they have any strategic partners they recommend.

Honestly, having worked with many different customers around the world, I have seen Independent Service Providers do well: they help the customer reduce CapEx and OpEx, extend the life of the equipment, and, most importantly, run the entire operation without any disruption. That said, I have also seen many Independent Service Providers fail miserably to deliver on promised services.

Do the research and jump on this if it is a viable option for your organization.

It's all about Storage Economics!

Storage Economics – Hardware Maintenance – Part 1

August 5th, 2009

On several occasions, I have written about storage management and the cost reductions associated with it in terms of CapEx and OpEx. In this blog post, we will talk about how your organization may further leverage resources available in the industry to reduce TCO (Total Cost of Ownership) and improve ROA (Return on Assets) for the storage devices you own.

For example purposes, let's assume we are only talking about a single storage device (frame) in the environment. Also, for this blog post, let's assume the manufacturer (OEM) of the storage frame is the vendor.

The concept of Hardware Maintenance

You purchased a storage asset 3 years ago. You spent a million dollars in acquisition cost on that storage device, and also paid for software licenses, implementation, migration, and training. You are almost at the 2 million dollar mark to implement this enterprise-class storage, which holds your Tier 1 and/or Tier 2 data.

How is this Storage frame doing today?

It's working great, the applications associated with it are robust, and thank goodness, over the past 3 years we haven't seen any outages in this environment.

Oh, by the way, the vendor just visited today and is proposing we do a tech refresh in this environment.

The Strategy related to Hardware Maintenance

So the first question, are you ready for this tech refresh?

Is your business ready for this tech refresh?

Is your team ready and trained for this new technology?

Do you need external resources for this tech refresh?

Are there budgets and proposed growth in the business to pay for this tech refresh?

Do we really need a tech refresh?

Are your applications ready for this tech refresh?

Would your host environments be ready for this tech refresh?

What is it that you are trying to gain from this tech refresh: processing power, speed, savings, a green datacenter, power and electricity costs, management cost, etc.?

Are your users complaining about your application performance?

Is the number of users growing on these apps?

So how many nays and yeas do we have on the questions above?

The Facts about Hardware Maintenance

The vendor is proposing substantial savings and offering to help us reduce the TCO on these assets over the next three years.

The cost of hardware maintenance from the vendor for years 4, 5, and 6 (on the existing storage asset) is almost equivalent to the cost of purchasing new assets.

We are being offered the best deal: free training, a 20% reduction in hardware acquisition cost, and another 5% discount for the quarter closing tomorrow.

The beliefs about Hardware Maintenance

Hardware Support: No one other than the vendor can provide hardware support on the Storage assets because it is just too complex to manage.

Remote Support and Diagnostics: No one other than the vendor can provide remote support and diagnostics.

Code Upgrades (Firmware) and Engineering Support: No one other than the vendor can provide Code upgrades and Engineering Support.

Global Technical Support: No one other than the vendor has a 24 x 7 global technical support.

Onsite Certified & Trained Engineers: Only the vendor has trained and certified onsite engineers.

Spares: 24 x 7, 4 hour response spare parts logistics, only the vendor has it.

SLA: Only the vendor can provide a mission critical or a premium SLA that would include either 24 x 7 x 2 support or 24 x 7 x 4 support.

Software Support: No one other than the vendor can provide software support.

So, how do you get around these industry notions?

Please stay tuned for the next blog post on Storage Economics – Hardware Maintenance – Part 2 tomorrow.

EMC Clariion Systems: Global Hot Spares & Proactive Hot Spares

July 30th, 2009

The concept of Global Hot Spares has been supported in Clariion environments since the first generation of FC & CX platforms. Now the technology has been extended into the CX3 and then the CX4 platforms. The primary purpose of global hot sparing is to protect the system against disk drive failures.

Take, for example, a CX4-960, which can be scaled up to 960TB of raw storage with as many as 960 disk drives. With certain failure rates guaranteed, a large number of drives creates a higher probability of failure, so every storage manufacturer these days includes some sort of hot-sparing technology in its storage subsystems. EMC started offering this technology to its customers as Global Hot Spares. Then came an era of value-add offerings around proactive failures, to minimize the chance of data loss. This brought to the table a technology termed Proactive Hot Spares, where a proactively failing drive is identified and a global hot spare is kicked in.

I believe FLARE release 24 started offering Proactive Hot Spares. With this FLARE release, customers can proactively initiate a hot spare kickoff through Navisphere or naviseccli against a suspect drive.

Depending on the RAID type implemented, RAID groups can withstand drive failures and run in a degraded state without data loss or data unavailability. With RAID 6, a machine can sustain as many as 2 drive failures in the same RAID group; with RAID 5, as many as 1 drive failure in the same RAID group; and with RAID 1/0 or RAID 1, as many as 1 drive failure in the RAID group without data loss.
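As a quick reference, those per-RAID-group tolerances can be captured in a small lookup. This is only a sketch of the guarantees described above (RAID 3 is included since it appears later in this post), not a per-array limit:

```python
# Maximum simultaneous drive failures a single RAID group can absorb
# without data loss, per the tolerances described above.
RAID_FAILURE_TOLERANCE = {
    "RAID 6": 2,    # dual parity (row + diagonal)
    "RAID 5": 1,    # single distributed parity
    "RAID 3": 1,    # dedicated parity drive
    "RAID 1": 1,    # mirror pair
    "RAID 1/0": 1,  # guaranteed worst case: one failure per group
}

def survives(raid_type: str, failed_drives: int) -> bool:
    """True if the RAID group is still serving data (possibly degraded)."""
    return failed_drives <= RAID_FAILURE_TOLERANCE[raid_type]

print(survives("RAID 6", 2))  # True
print(survives("RAID 5", 2))  # False
```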

Drives supported on Clariion CX, CX3, CX4, AX, and AX4 systems are typically FC (Fibre Channel), SATA II, and ATA drives.

A Global Hot Spare has to be configured on an EMC Clariion system as a single-drive RAID group. Once the RAID group is created, a LUN must be bound on it as a Global Hot Spare before it can be activated.

The following is the sequence of steps that takes place on a Clariion subsystem for Global Hot Spares (supported on CX, CX3, and CX4 systems):

  1. Disk drive failure: A disk drive fails in the system, and the FLARE code marks it bad.
  2. Hot spare invoked: A preconfigured Global Hot Spare is invoked based on the Global Hot Spare selection criteria.
  3. Rebuild: The Global Hot Spare is rebuilt from the surviving RAID group members.
  4. Failed drive replaced: The failed disk drive is replaced with a good drive by a Customer Engineer.
  5. Copy back: The copy from the Global Hot Spare to the new drive has to finish before the new drive takes over. The rebuild or equalize happens in sequential order of LBA (Logical Block Address), not per the LUNs bound on the drive.
  6. Return hot spare: Once the sync of the new drive is finished, the hot spare is invalidated (zeroed) and returned to the Global Hot Spare pool.

The following is the sequence of steps that take place on a Clariion Subsystem related to Proactive Hot Spares (Supported on CX300, CX500, CX700, CX3, CX4). Proactive Hot Spares essentially use the same drives that are configured as Global Hot Spares.

  1. Threshold of errors on a disk drive: A drive gets hit with errors; once it surpasses a threshold in the number and type of those errors, the FLARE code marks it as a potential candidate for failure.
  2. Proactive Hot Spare invoked: Based on the potential candidate's drive type, drive size, and bus location, a Global Hot Spare is identified and the data rebuild process is kicked off.
  3. Potential candidate fails: Once the Proactive Hot Spare is synced, the FLARE code fails the identified potential candidate.
  4. Failed drive replacement: The failed drive is replaced by a Customer Engineer.
  5. Copy back: Data is copied back from the Proactive Hot Spare to the newly inserted drive. The rebuild or equalize happens in sequential order of LBA (Logical Block Address).
  6. Return Proactive Hot Spare: Once the sync of the new drive is finished, the hot spare is invalidated (zeroed) and returned to the Global Hot Spare pool.

The Global Hot Spares Selection Criteria:

The following criteria are applied when selecting (invoking) a Global Hot Spare once a proactive candidate is identified or a disk drive has failed. In the sequence listed below, drive type is the first criterion, drive size the second, and Global Hot Spare location the third. Drive speed (RPM) is not a selection criterion.

  1. Type of Global Hot Spare drive: As discussed above, Clariion systems use three primary drive types. FC and SATA II drives can be invoked against each other's failures; ATA drives can only be invoked against an ATA drive failure.
  2. Size of Global Hot Spare: Upon a disk failure, the FLARE code examines the Global Hot Spare's drive size. The size of the failed drive is not the key to invoking the hot spare; rather, the total space of all LUNs bound on the drive is used as the determining criterion.
  3. Location of Global Hot Spare: Location is considered third, after the two criteria above. If a Global Hot Spare on the same bus as the failed drive meets the first two criteria, it is the primary selection; otherwise, a Global Hot Spare is selected from another bus.

Other Considerations:

  1. RAID types: For the copy of data, with RAID 3 and RAID 5 the data on the hot spare is rebuilt using the parity drive. With RAID 6, the data on the hot spare is rebuilt using RP (row parity) and/or DP (diagonal parity), depending on the number of failures in the RAID group. For RAID 1/0 and RAID 1, the data on the hot spare is rebuilt from the surviving mirrors.
  2. Copy times: The time required to copy or rebuild a hot spare depends on how large the drive is, the speed of the drive, the cache available on the drive, the cache available on the array, the type of array, the RAID type, and the current job load on the array. Typical rebuild times vary from 30 minutes to 90 minutes, again depending on how busy the storage subsystem is.
  3. Global Hot Spare counts: For every 30 drives (2 DAEs of drives), consider having 1 drive as a Global Hot Spare. Also verify that for every drive type (size, speed) in the machine you have at least one configured Global Hot Spare. It is a good idea to have Global Hot Spares on various buses and spread across multiple Service Processors.
  4. Vault drives: Vault drives cannot be used as Global Hot Spares. The vault drives are the first 5 drives [ 0_0_0, 0_0_1, 0_0_2, 0_0_3, 0_0_4 ] on the Clariion system. If a vault drive fails, a Global Hot Spare takes over its position.
  5. Rotational speed: The rotational speed of a Global Hot Spare is not considered before invoking it, so it might be a good idea to have Global Hot Spares running at 15K RPM, potentially with large drive sizes.
  6. Mixed loop speed: Certain Clariion systems like the CX3 offer 4Gb and/or 2Gb loop options, and you can have mixed loop speeds in one machine. Loop speed is not considered in hot spare selection, so in those cases it might be wise to have similar hot spares on both the 2Gb and 4Gb loops.
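The 1-spare-per-30-drives rule of thumb from consideration 3 can be turned into a rough planning aid. This is only a sketch of the guideline above, not an EMC sizing tool:

```python
# Rough planning aid for the hot-spare sizing rule of thumb above:
# at least 1 spare per 30 drives, and at least 1 per distinct drive type.
import math

def recommended_hot_spares(total_drives: int, distinct_drive_models: int) -> int:
    """Minimum spare count satisfying both the ratio and the per-type rule."""
    by_ratio = math.ceil(total_drives / 30)  # partial groups of 30 still get a spare
    return max(by_ratio, distinct_drive_models)

print(recommended_hot_spares(960, 3))  # 32
print(recommended_hot_spares(45, 4))   # 4
```

On a fully loaded CX4-960 with three drive models, that works out to 32 spares; on a small 45-drive array with four models, the per-type rule dominates.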