
True IT – Storage Stories: 1 (Dataloss)

September 22nd, 2009

About four years ago, I got a call early one morning from a customer for whom we were doing some data migration work. The customer had decided to put the project on hold until they sorted out some issues in their storage environment.

Later in the day, I had another call with the same customer, and they passed on some very terrifying news. One of the storage arrays we were planning to migrate data from was managed by an independent service provider. This independent service provider had never set up email home on the system, which meant that in the case of a catastrophic failure the system would not notify anyone. The customer started reporting problems to the independent service provider: hosts were losing access to certain volumes, data corruption started to appear, and within a few hours the entire disk array was completely gone.

This was beyond the capabilities of the independent service provider to fix, so they escalated the call to the OEM. The OEM engineering folks and onsite teams worked round the clock for four days trying to recover data from the machine. Because the failed component in the storage array never called home or emailed home, the corruption spread and brought the entire system to its knees. THE DATA WAS GONE!!!

The customer lost 60TB of raw storage in a few hours.

The question shouldn't be whether the data could be recovered from backup tapes and other media, which the customer was able to do over the next three weeks.

The primary question is: why did it happen, and what can be done to prevent a similar catastrophic failure?

Lesson Learnt:

If you are in charge of managing any data in your organization that lives on storage arrays, open a call with the OEM or your independent service provider on a monthly basis. Have them check every storage array in the environment and verify that each one is calling home or emailing home regularly. If a modem is attached to the system, verify all its components are working; if you have a TCP/IP/SSL-based connection, verify it is working; if you use email home, verify the emails are not getting queued on your Exchange server and that its IP address has not changed.

These call home, email home, and TCP/IP/SSL features allow the storage arrays to regularly report errors, warnings, events, and heartbeats back to the OEM or your independent service provider.
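A check like this can even be scripted as part of the monthly routine. The sketch below is only an illustration: every hostname, port, and the SMTP relay are placeholders, not any vendor's real endpoints, so substitute whatever your OEM or independent service provider actually uses.

```python
# A minimal monthly "is call-home still working?" check.
# All hostnames, ports and the SMTP relay are PLACEHOLDERS; substitute the
# endpoints your OEM or independent service provider actually uses.

import smtplib
import socket

ARRAYS = {
    "array-01": ("array01.example.local", 443),  # TCP/IP/SSL call-home endpoint
    "array-02": ("array02.example.local", 443),
}
SMTP_RELAY = "mail.example.local"  # relay that email-home messages pass through


def tcp_reachable(host, port, timeout=5):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def smtp_reachable(relay, timeout=5):
    """Check the mail relay accepts connections, so email-home is not silently queued."""
    try:
        with smtplib.SMTP(relay, timeout=timeout) as smtp:
            smtp.noop()
        return True
    except OSError:
        return False


if __name__ == "__main__":
    for name, (host, port) in ARRAYS.items():
        status = "OK" if tcp_reachable(host, port) else "FAILED; escalate now"
        print(f"{name}: call-home path {status}")
    relay_status = "OK" if smtp_reachable(SMTP_RELAY) else "FAILED; escalate now"
    print(f"email-home relay {SMTP_RELAY}: {relay_status}")
```

A script only proves the network path is open; it does not replace the monthly call with the OEM to confirm heartbeats are actually arriving on their side.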

If you are using any SRM tools, check their alerts regularly. If you are receiving failed-communication alerts, escalate the situation immediately rather than waiting. Also verify that you are not repeatedly seeing the same failed components in the array through the SRM tools.

True IT – Storage Stories: A Series

September 21st, 2009

I have been thinking about this for the past week. Across the various IT storage projects implemented in the industry, there are times when one runs into very unique situations. These situations may work in a project's favor, work against it, leave it unaffected, or completely change its course.

There are also times when you hear these stories from customers because they encountered something out of the ordinary in their IT storage environments, completely unrelated to the project. These situations can occur out of the blue, take the customer by surprise, and change their IT storage organization forever.

Each of these True IT – Storage Stories will have a number associated with it, for example True IT – Storage Stories 1, 2, and so forth. You will also be able to follow all the stories under the tag “True IT – Storage Stories”.

I will stay neutral (and try to keep any negativity out): I will not mention customer names, the people involved, products, or OEM technology, and I will not promote any particular OEM or product over another.

I hope we all get to learn some lessons from these true stories.

Cloud: The Quest for Standards

September 17th, 2009

Note: This is my first attempt to write about Cloud technology; please feel free to correct me if my understanding of any aspect is incorrect.

Many of us in the industry think of standards as a hindrance to technological growth, and creating standards has been a long-drawn-out process for any technology set for robust growth. Cloud services is one of those areas where we expect exponential growth over the next few years. It is better to guide that growth than to let it run haphazardly, and standards typically help: they are known to work in the favor of consumers.


Why Standards

Example

Everyone in the industry, and most home users, heard about the battle between HD-DVD and Blu-ray: years of consistent and persistent fighting between two major groups, with Toshiba and its allies supporting the HD-DVD format while Sony and its allies supported the Blu-ray disc format.

Billions of dollars of investment; millions in marketing, legal battles, customer investments, and HD-DVD investments; futures and employment dependent on those technologies.

In the end, HD-DVD lost the battle and Blu-ray won. But who else lost? The consumers, the investors, the video game developers, and the alliance partners: that means all of us. In a battle like this, YOU as a consumer are never the winner.

Example

Let's look at WiFi technology, 802.11 a/b/g/n. When these WiFi technologies started, how excited were we: now I could connect my laptop over the air and browse the Internet without needing a physical cable.

Imagine not having the 802.11 standards in place during this boom. You buy a Linksys WiFi router for your home, install a Linksys wireless card in your laptop, and everything works. You go to the Starbucks around the corner, they have Netgear, and your laptop's WiFi stops working. You go to the office and they have a completely separate set of devices that are incompatible with your existing WiFi. All around, it would have been chaos.

It's good we didn't go through that. Why? Because there were standards. Interoperability was the key. Everyone made products and brought enhancements, but at the end of the day they were all compatible for the users, consumers, and customers.

Cloud

Standards

A standard is typically a document that defines a shared set of protocols, resources, APIs, interoperability requirements, security, methods, practices, and other aspects relating to usage.

A standards committee is normally formed from various manufacturers, service providers, and experts, acting as a governing body that defines and decides on common technology practices.

Do not get me wrong, though: standards will not define your product features.

The Cloud Standards

Some experts differ on whether a Cloud standard should come this early in the game, which is right now. I truly believe that, in the best interest of consumers, customers, and investments, a common governing body should start deciding on standards for Cloud services.

Hypervisor

The hypervisor is the underlying technology that enables virtualization in the cloud.

Question: Can a standard be created that lets applications move from hypervisor to hypervisor (VMware vSphere to Citrix Xen or Microsoft Hyper-V) without going through a redesign?

Result: That would let your applications move around the Cloud infrastructure regardless of the underlying technology provider.

Today: This cannot be done

Service Provider

Some of the well-known and emerging service providers today are Amazon, EMC Atmos (beta), Nirvanix, Terremark, Savvis, and Rackspace.

Question: Can you move your applications from Amazon to Rackspace, and how easily can this be achieved?

Result: Free movement of applications regardless of the underlying service provider; very hard to achieve.

Today: This cannot be done
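To make the gap concrete, here is a hypothetical sketch of what a provider-neutral interface could look like if such a standard existed. Nothing here corresponds to any real provider API; the class and method names are invented for illustration only.

```python
# Hypothetical sketch of a provider-neutral Cloud interface. These class and
# method names are INVENTED for illustration; no real provider exposes them.

from abc import ABC, abstractmethod


class CloudProvider(ABC):
    """The contract a Cloud standard could require every provider to implement."""

    @abstractmethod
    def deploy(self, app_image: str) -> str:
        """Deploy a portable application image; return a provider-side handle."""

    @abstractmethod
    def export(self, handle: str) -> str:
        """Export a running application back into a portable image."""


def migrate(handle: str, src: CloudProvider, dst: CloudProvider) -> str:
    """With a common standard, migration is just: export from src, deploy to dst."""
    portable_image = src.export(handle)
    return dst.deploy(portable_image)
```

If every provider implemented the same contract, moving an application would reduce to an export from one provider and a deploy to another: exactly the movement that is so hard to achieve today.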

Security

Security within the Cloud infrastructure enables users, applications, application owners, and other related applications to interact with each other based on standard communication protocols.

Today: There are no standard security protocols within the Cloud infrastructure that are common to all service providers. Each security-centric provider and service-provider alliance supplies its own security mechanism.

Storage

The underlying storage may well be different, but there is no real dependency on the choice of storage as it relates to the Cloud and standardization. Though Amazon uses one kind of storage while Terremark and Savvis may use another, that should not create dependencies on application movement.

API’s

Application movement between private clouds. Application movement between public clouds. Application movement between private and public clouds.

Question: Can you move your applications from Terremark to Amazon's EC2 and then back to a private cloud?

Result: Re-engineering the application and the related APIs would enable that movement.

Today: Very hard to do, and a very tedious process; there are no standards around it today.

Cost Models

Cost is yet another driving force behind Cloud-based services. Some service providers today charge based on computational power, some on storage capacity, some on bandwidth, and some on a mix of all three.

Question: Is your application hosted in Amazon's EC2 costing you the same money that a similar application at Terremark would cost you?

Result: Standard pricing practices need to be established. Pricing can still vary based on feature sets.
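As an illustration of why like-for-like comparison is hard, consider two invented pricing models side by side. Every rate below is made up for this example; real provider pricing differed and changed often.

```python
# Two INVENTED pricing models, showing why cost comparison is hard without
# standard pricing practices. All rates are made up for illustration.

def compute_only_bill(hours, rate_per_hour):
    """Provider A: bills purely on computational hours."""
    return hours * rate_per_hour

def mixed_bill(hours, gb_stored, gb_transferred,
               cpu_rate, storage_rate, bandwidth_rate):
    """Provider B: bills on a mix of compute, storage and bandwidth."""
    return (hours * cpu_rate
            + gb_stored * storage_rate
            + gb_transferred * bandwidth_rate)

# The same month-long workload priced under each model:
bill_a = compute_only_bill(720, 0.10)                 # 720 h * $0.10/h = $72
bill_b = mixed_bill(720, 500, 200, 0.05, 0.02, 0.10)  # $36 + $10 + $20 = $66
```

The two bills come out different for the identical workload, and neither number tells you which provider is actually cheaper once feature sets and growth are factored in; that is the argument for standard pricing practices.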


Standards vs Dominance

If you let a certain technology grow by itself in an uncontrolled manner, it simply becomes a dominating factor and tends to mold people, the industry, customers, and ideas all in one direction. Typically that direction is wherever the dominance lies.

If we wait too long to establish a standard for a certain technology, that technology will more or less become the standard, and everyone will need to adapt around it. Have you heard "survival of the fittest and dominance of the strongest"?

The Customer

Remember, we are talking about large-scale investments as Cloud services are developed, designed, and implemented.

We should see exponential growth in the Cloud services market over the next few years. This means billions of dollars will be poured into R&D, infrastructure, services, and other related areas enabling Cloud services.

This is the ripe time, while things are not yet out of control: the more players join the mix, the stronger the dominance effect will become.

This is the time to act, and act in our favor, to preserve the investment in a technology we all expect and hope to see grow. SNIA and the DMTF are currently working on Cloud standards initiatives (EDIT 09/17/09 at 11:15 AM).

It seems that just as this post is being released, the first CDMI (Cloud Data Management Interface) draft has been completed and released by SNIA. Glad to see the progress.


Unrelated to this discussion, but a really funny YouTube video by Mr. Larry Ellison of Oracle on Cloud Computing.

Other Reading on Cloud (EDIT 09/17/09 at 11:15 AM)

http://cloud-standards.org/

http://www.snia.org/cloud

http://samj.net/search/label/cloud

http://thecloudclinic.com/tag/cloud-standards/

Note: This is my first attempt to write about Cloud technology; please feel free to correct me if my understanding of any of the above aspects is incorrect.


SUN ORACLE Exadata Version 2: Showing the power of ORACLE SUN

September 16th, 2009

US regulators gave the ORACLE purchase of SUN the go-ahead several weeks ago, but EU regulators are still actively reviewing the buyout under antitrust law. ORACLE's (Mr. Larry Ellison's) quest to own an infrastructure company is coming true with the purchase of SUN.

But the pending approvals haven't stopped Mr. Ellison's team from redesigning the SUN ORACLE Exadata platform as Version 2 (with SUN ORACLE logos on it). The joint venture between ORACLE and SUN has been running for several years now, and today the new Exadata Version 2 platform was presented to the world by Mr. Ellison himself. It was clearly visible that Mr. Ellison already takes a lot of pride in this acquisition, even before it is approved.

Earlier this week there was an advertisement from ORACLE SUN challenging IBM and all its products, showing how Mr. Ellison now wants to go after IBM to become the top infrastructure company. That said, there are only three big infrastructure companies today: IBM and HP going neck and neck in revenue, competing for first position, with the pending ORACLE-SUN combination at number three.

It is great to see Mr. Ellison's vision and how he is transforming ORACLE from a database software company into an infrastructure company. Today's announcement of the SUN ORACLE Exadata Version 2 platform is very unique in that sense. Exadata products have been developed through years of partnership between SUN and ORACLE, and this release goes to show how the combined companies can fulfill the datacenter vision END to END.

This platform extensively uses the SUN FlashFire technology and is truly the first OLTP (Online Transaction Processing) system designed to optimize customer data processing using a mix of SUN hardware and ORACLE software. It was very noticeable that, during the 35-minute introduction, Mr. Ellison drove the presentation for more than 25 minutes before handing over to John Fowler, EVP at SUN, for a technical talk.

Clearly ORACLE is targeting IBM and NCR Teradata products with the release of this platform. It was obvious in the presentation that Mr. Larry Ellison used the word "THEY" numerous times, signaling toward IBM and NCR. Though it was not said during the presentation, "THEY" could include HP as well; at this point, without final approval of the SUN purchase, it wouldn't make a lot of sense for ORACLE to make another enemy of HP.

Here are some SUN ORACLE Exadata Version 2 platform highlights:

  1. Exadata Version 2 is optimized for OLTP (Online Transaction Processing), the first of its kind to hit the market.
  2. Typically, SUN ORACLE Exadata Version 2 should give customers 50x to 100x better performance than standard data warehousing servers.
  3. Optimized for random I/O.
  4. 1M IOPS per system cabinet.
  5. Each system cabinet has 8 compute servers, 176 total processors, 336 TB of raw disk, 5TB of Flash Cache (56 Flash Cache cards), and 400GB of total DRAM across the 8 compute servers.
  6. Intel Nehalem processors, InfiniBand switching, FlashCache, and 4 Ethernet links per database node.
  7. Runs Linux; Oracle manages the cache; fully redundant compute servers and storage. On-demand capacity expansion for compute servers, storage, or InfiniBand switches.
  8. 1 node (compute server) is the smallest configuration; large configurations can be 8 nodes in one cabinet, or 32 cabinets combined to reach a massive 32 million IOPS and several hundred petabytes of storage optimized for OLTP.
  9. InfiniBand speed per link is 40 Gbps, aggregating to 880 Gbps for a system (cabinet); the non-blocking switch gives full open and distributed system access for faster processing.
  10. Power consumption is 14% less than Exadata Version 1.0.
  11. Fastest OLTP system and fastest data warehousing system in the world.
  12. All calculations are done in memory (FlashCache), optimizing the system.
  13. Massively parallel processing; the scale-out architecture enables easy on-demand expansion.
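A little arithmetic sanity-checks the aggregate figures in the list above. The inputs are the per-link and per-cabinet numbers as stated at the launch; the derived values simply confirm the stated totals are internally consistent.

```python
# Sanity-checking the aggregate Exadata Version 2 figures from the list above.
# Inputs are the per-link / per-cabinet numbers as stated at the launch.

LINK_GBPS = 40            # InfiniBand speed per link
CABINET_GBPS = 880        # stated aggregate bandwidth per cabinet
links_per_cabinet = CABINET_GBPS // LINK_GBPS   # 880 / 40 = 22 active links

IOPS_PER_CABINET = 1_000_000
MAX_CABINETS = 32
max_iops = IOPS_PER_CABINET * MAX_CABINETS      # the "32 million IOPS" claim

FLASH_TB_PER_CABINET = 5
max_flash_tb = FLASH_TB_PER_CABINET * MAX_CABINETS  # Flash Cache at full scale

print(links_per_cabinet, f"{max_iops:,}", max_flash_tb)
```

The stated 880 Gbps implies 22 forty-gigabit links per cabinet, and a full 32-cabinet build works out to 160 TB of Flash Cache alongside the 32 million IOPS figure.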

A couple of things to note:

  1. Mr. Ellison calls flash disk "dumb flash disk", truly remarkable.
  2. Another highlight was the use of 56 Flash Cache cards per system (5 TB) combined with 168 x 2TB SATA drives (possibly 7.2K RPM) to optimize data space (somehow this didn't make a lot of sense).


Some Questions to Consider:

  1. Is this a real threat to storage and host providers, where a specialized hardware/software combination optimizes performance for certain applications?
  2. Is this the power of ORACLE SUN that we will see in the future?
  3. How does this compete with the EMC COMPUTE platform (rumors) or the Cisco-EMC Alpine project (rumors)?
  4. Does the VCE (VMware - Cisco - EMC) partnership really focus on the giant to come, ORACLE SUN?
  5. What will happen to the ORACLE - HP partnership if the ORACLE SUN buyout gets approved, and what happens to HP Oracle Exadata?
  6. Does this create any antitrust scenarios for the future?
  7. Does Mr. Ellison's dream to own the infrastructure end with the purchase of SUN?


Here are some links for references if you would like to read more about Exadata products

http://www.orafaq.com/wiki/Exadata_FAQ
http://www.sun.com
http://www.oracle.com