beingexchanged

Monday, November 30, 2009

Going cheap still has limits

Over the Thanksgiving holiday here in the US, I finally got a chance to catch up on a lot of the information on Database Availability Groups (DAG) and other neat new features in Exchange 2010.  I’ll get back to talking about earlier versions shortly, but one trend that got me thinking was that smaller organizations will be looking to use Ex2010 to get failover capability without clustering technologies and – therefore – at a lower cost.  The problem is that while you can implement DAG much less expensively than a traditional or CCR cluster, there are some severe limits you need to be aware of. 

Note: I will be attempting to keep everything very neutral in this article, but do keep in mind that I work for a High Availability/Disaster Recovery solution provider (see notice below).

First, to spell out the Standard versus Enterprise versioning debate.  Yes, you can get DAG capabilities in the Standard version of Exchange 2010.  This means that you can create a DAG without the need for shelling out the extra cash for the Enterprise version of the Exchange Server software itself.  However, since DAG requires some of the components from Microsoft Failover Clustering, if you want to use DAG you must be on Server 2008 RTM or R2 Enterprise Edition.  So, in short, Exchange Standard is a yes, Windows Standard is a big no.

Also, keep in mind that each Exchange 2010 Server Standard may have no more than 5 databases on it. There seems to be a good deal of confusion around that, but as has been quoted in Jim McBee's blog and other places, that doesn’t mean each Standard server can host 5 live databases. It means that the total of both live and passive copies of databases housed on that server may not be more than 5. So, if you want 1 live database on each of 4 servers, you can get away with Exchange 2010 Standard. However, if you have 3 live databases on each of 2 servers, the Standard version is not enough to allow you to perform DAG on all databases, as that would make 3 live and 3 passive on each box, for a total of 6 per server.
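To make the counting rule concrete, here is a minimal sketch of the arithmetic. The function name and constant are my own for illustration; the only fact taken from the paragraph above is that live plus passive copies on a Standard server may not exceed 5.

```python
# Limit described above: live + passive database copies per Standard server.
STANDARD_EDITION_LIMIT = 5


def fits_on_standard(live: int, passive: int,
                     limit: int = STANDARD_EDITION_LIMIT) -> bool:
    """Return True if one server's total copy count stays within the limit."""
    return live + passive <= limit


# Four servers, 1 live database each, every server also holding a passive
# copy of the other three: 1 live + 3 passive per box.
four_server_layout = fits_on_standard(live=1, passive=3)

# Two servers, 3 live databases each, each holding passive copies of the
# other's three: 3 live + 3 passive per box.
two_server_layout = fits_on_standard(live=3, passive=3)

print(four_server_layout, two_server_layout)
```

The first layout fits on Standard and the second does not, which matches the two examples in the paragraph.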

One thing that is not limited is your ability to use any Client Access License (CAL) on any Exchange Server version you’d like. Enterprise CALs run just fine on Exchange Standard, and vice-versa. This means that end-users running on Standard can get nifty features without requiring you to upgrade to Exchange Enterprise.

So, smaller organizations may very well be able to use the Standard version of Exchange 2010 (but not Windows) in order to get DAG functionality for their databases and other higher-end feature sets.  Just keep in mind that there are still limitations on the Standard version, and avoid hitting those limits if you’re staying on Standard.

posted by Mike Talon at 0 Comments

Thursday, November 5, 2009

Time to pay the bills! Exchange 2003 and GeoCluster.

Exchange 2007 introduced the idea of Cluster Continuous Replication (CCR) to the world, allowing you to extend an Exchange Cluster between sites (especially on Server 2008) and to create more than one copy of the mailbox data. Exchange 2010 will introduce Database Availability Groups (DAG), further pushing the technology to provide up to 16 total copies of the mailbox data in any number of locations. Both of these technologies are stellar in their own right, but leave those who are still running Exchange 2003 solidly in the dust. Granted, Exchange 2003 is nearing end-of-life, but with a large portion of the market still running on it (at the very least until the upgrades are done), many folks need solutions.

As I work for Double-Take Software, of course I’m happy to advocate our cluster-extending technology to help alleviate the situation on earlier versions of Exchange Server. This is both because they pay me to vocally advocate it (the FCC may be watching) and because it works remarkably well. More so for the latter reason.

GeoCluster (which was once a stand-alone product but is now a feature set of Double-Take Availability) allows you to create a Microsoft Cluster using Microsoft Clustering Services (MSCS) on Server 2003, but to do so without creating a shared-disk configuration that could lead to a single point of failure and would restrict how far apart the nodes can physically be. The idea is simple: GeoCluster works under the hood of MSCS, replicating data on each disk resource from the owning node to all potential owning nodes in the cluster. So Exchange sees a traditional cluster, but in reality the disks are replicated, creating multiple copies of the data based on the active node for each disk.

Since GeoCluster can support any valid cluster configuration, you can freely create clusters that span more than 2 nodes, or even more than one physical site. Keep in mind, however, that you’ll still be limited by single-subnet restrictions in Server 2003’s MSCS implementation. The good news is that moving resources from node to node works exactly the same way as it would in a shared-disk cluster, and therefore automatic failover and on-command moves are all possible.

If you lose a node, GeoCluster lets the MSCS engine arbitrate who should take over, then begins replicating data from that new owner to all the other, surviving, potential owners. Once you repair or replace the original node, the system will sync up the volumes and be ready to allow you to move the resources back to the original node if you want to. This replication is all done with the Double-Take Replication Engine, which allows GeoCluster to have the same level of write-order integrity and data reliability as any other Double-Take connection.

So, until you’re ready to make the jump to Exchange 2007 and beyond, or if you cannot take advantage of CCR for whatever reason, have a look at the GeoCluster solution. It is a cost-effective and reliable way to make MSCS even more flexible, and does so without making Exchange work differently than it was designed to function.

Don’t believe me?  Check out this TechNET blog post about what the MSFT Virtualization Team does with partners like DBTK.  We help them with clustering solutions for Hyper-V, and can help you with that and much more.

Tomorrow, back to my usual, non-vendor-specific stuff =)

posted by Mike Talon at 0 Comments

Wednesday, November 4, 2009

Can I get a Witness?

Cluster Continuous Replication (CCR) in Exchange 2007 allows for two nodes of a Distributed Failover Cluster (DFC) for Exchange to be held in different physical locations and different physical network segments. This is a good thing to leverage if you’re not concerned with local High Availability, but can lead to some interesting issues if something goes wrong. The two nodes will use their quorum resources to find out which server should be in control of the cluster (and therefore assign resources accordingly) – but that doesn’t help if the nodes cannot see each other due to network failure.

One of two things could happen in the situation described so far. The first is split brain, where both nodes think they’re in charge and bring up Exchange resources. Cleaning that up can take hours or even days of manual work, so Microsoft has taken steps to prohibit it.

The second is the opposite: neither node can figure out who is in control, so both shut down. While this doesn’t put your data in danger, it does effectively shut off your Exchange system, stopping all messaging flow. Neither situation is good, but by default, if arbitration is not possible via either quorum or other means, this safer outcome is the one that occurs, preventing split brain at all costs. Luckily, there are “other means,” specifically the File Share Witness (FSW).

FSW is a file share (as its name implies) that both nodes can see under normal circumstances. It must be placed on a server that isn’t part of the cluster. Usually, you find it on a file server within the environment, but be aware that the server will need to be running Windows Server 2003 SP1 or later. The FSW should also be placed either locally to the preferred node (the one you want to “win” in the event of arbitration) or in an independent location that can be seen by both networks where CCR nodes reside.

In a CCR cluster there are only two nodes, so right off the bat, neither node could gain a majority and take over on its own if a communication failure or other emergency triggered arbitration. The FSW acts as a third resource that can be polled to find out who is in control. Both nodes will attempt to take ownership of the FSW, but due to the physical placement of the Witness Server, only one will successfully do so. That node stays online as owner of the cluster; the other node prohibits resources from going live until the emergency has been resolved. As you can see, placement of the FSW becomes a critical component to the overall success of this arbitration system.

If you have only two physical locations, your best bet is to place the FSW on a server in the secondary site. This allows the cluster to properly arbitrate to the remote site if the production site goes offline. If you have more than two locations, then you can place the FSW on a server at a third location; just make sure connectivity to that site is stable and constant to and from both CCR servers. If that link is unstable to one or more sites, you can create accidental arbitration events when they’re not really needed. The benefit to putting the FSW at a third site is that you can survive a link outage at either CCR node location without having to manually force one node or the other to take control (called a Force Quorum Operation).

Here’s an example of what I mean.  If you have only two sites, and place the FSW at Site 2, a network link failure at Site 1 would force arbitration to Site 2 since the CCR node at Site 1 would not be able to communicate with either the node at Site 2 or the FSW hosted there.  In this scenario, there may be no value to failing over to Site 2, but you would automatically fail over anyway.  If, however, the FSW is hosted at a 3rd site, and both sites can see it, then a network fault between Site 1 and Site 2 would not flip everything to Site 2. Since Site 1 is the preferred owner, and can maintain control of the FSW, it will stay in control of the cluster.
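The two scenarios above can be played out in a small model. This is only a sketch under simplifying assumptions of my own: sites are plain strings, a broken link is a pair of site names, and when both nodes can still reach the FSW the preferred owner is assumed to win. It is not how the cluster service is actually implemented.

```python
def link(a: str, b: str) -> frozenset:
    """An undirected network link between two sites."""
    return frozenset((a, b))


def arbitrate(fsw_site: str, broken: set,
              preferred: str = "site1", other: str = "site2") -> str:
    """Return the site that ends up owning the cluster, or 'offline'."""

    def reaches(site: str, target: str) -> bool:
        return site == target or link(site, target) not in broken

    # If the nodes can still see each other, no arbitration is needed.
    if reaches(preferred, other):
        return preferred

    # Otherwise only nodes that can reach the witness may stay online.
    # Assumption: the preferred owner wins when both can reach it.
    winners = [n for n in (preferred, other) if reaches(n, fsw_site)]
    return winners[0] if winners else "offline"


inter_site_down = {link("site1", "site2")}

# FSW hosted at Site 2: Site 1 loses sight of the node and the witness.
print(arbitrate("site2", inter_site_down))

# FSW hosted at a third site that both nodes can still reach.
print(arbitrate("site3", inter_site_down))
```

With the witness at Site 2 the model flips to Site 2; with the witness at a third site, Site 1 keeps control, as in the example above.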

You can find out a lot more about configuring FSW for Exchange 2007 via this TechNET article. The use of FSW technology is mandatory for CCR, and will continue to be a good idea for Exchange 2010 and Database Availability Groups as well.  Learning how this technology works today will allow you to create redundant solutions that last through your future Exchange solution sets.

posted by Mike Talon at 0 Comments

Tuesday, September 15, 2009

Get back to where you once belonged (Failover Cluster version)

In honor of the re-release of the Beatles stuff all over the world (games, CDs, maybe iTunes at some point), I took the title of today’s post from their song “Get Back” on the album Let It Be (Remastered).

I am, of course, going to tie this to something in Exchange; specifically Exchange 2007 Standby Clustering. Standby clustering refers to the theory of using a replication engine (like the native CCR or a 3rd-party system like Double-Take Availability – see disclaimer below) to place a copy of the data for the Storage Groups of the production cluster onto a secondary cluster.  Once the data is replicated, you can use the /RecoverCMS commands to recreate the production Exchange Cluster Mailbox Servers (CMS’s) on that secondary cluster.

The solution set for bringing up the Storage Groups and CMS’s on another physical cluster set up in the same or another location is fairly well established. If a single node fails on a production cluster, other nodes take over the failed Storage Groups and work resumes in a very automated fashion. If multiple nodes, or the entire cluster, fail, you use /RecoverCMS and the associated protocols to manually get everything working on another system – so long as a copy of the data exists to work from.

The problem has traditionally been best expressed by the phrase, “And then what?”

If the original cluster failed completely, the answer was simple.  Rebuild the systems with the same node names, but prepare the systems as though they would be a new /RecoverCMS target system.  However, if you have not lost the production systems, and they’re stable enough to be used again, you would still have to reinstall them without some additional help.  The most common reasons for this kind of outage are routine testing of the failover systems and extended power failures that generators and UPS systems can’t handle.

Microsoft does offer a command set to fix this particular problem, but it is not well known or publicized. As a matter of fact, during a recent client troubleshooting session, we had a couple of techs from Microsoft on the phone (Premier Support in this case) and they were not aware of this particular method for cluster restoration.

Once you have fixed whatever went wrong, if your production cluster is still viable (and is suitably stable for continued use), you can use a command set called /ClearLocalCMS to remove the original CMS entries from the original production cluster.  Doing so is not without risks, and you should familiarize yourself with this KB article on the subject before you try it. 

/ClearLocalCMS will remove the CMS components off the original production nodes, clean up AD, and disable the virtual computer object for the original cluster CMS.  This ensures that Exchange doesn’t accidentally address the original cluster system, even after the restore process begins.  Once the CMS is cleaned, you can go about restoration of the data using the same tools as you used to get it over to the standby cluster in the first place.

To get back to your original servers, use the /RecoverCMS command in the opposite direction (from DR back to production) and then use /ClearLocalCMS commands to re-prepare your DR cluster for use in the next emergency.
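The ordering of those steps matters, so here is a rough model of the round trip. Everything in it (the `Cluster` class, the flag, the function names) is hypothetical and only tracks which cluster still holds CMS entries; it assumes, as described above, that a cluster must be cleaned with /ClearLocalCMS before it can be a /RecoverCMS target again. It does not run any Exchange commands.

```python
class Cluster:
    """Hypothetical stand-in for a cluster and its leftover CMS entries."""

    def __init__(self, name: str, has_cms_entries: bool = False):
        self.name = name
        self.has_cms_entries = has_cms_entries


def recover_cms(target: Cluster) -> str:
    # Modelling assumption: the target must be clean before taking the CMS.
    if target.has_cms_entries:
        raise RuntimeError(
            f"{target.name} still holds CMS entries; run /ClearLocalCMS first")
    target.has_cms_entries = True
    return f"/RecoverCMS on {target.name}"


def clear_local_cms(cluster: Cluster) -> str:
    cluster.has_cms_entries = False
    return f"/ClearLocalCMS on {cluster.name}"


def fail_over_and_back(production: Cluster, dr: Cluster) -> list:
    """Full round trip: out to DR, then back to production."""
    return [
        recover_cms(dr),              # bring the CMS up on the standby cluster
        clear_local_cms(production),  # clean the surviving production nodes
        recover_cms(production),      # move back once data is restored
        clear_local_cms(dr),          # re-prepare DR for the next emergency
    ]


production = Cluster("PROD", has_cms_entries=True)
standby = Cluster("DR")
for step in fail_over_and_back(production, standby):
    print(step)
```

Skipping the /ClearLocalCMS step on production makes the model refuse the return trip, which is the "And then what?" problem in miniature.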

Jumping between clusters is not an automated or easy process, but it does work correctly if you follow all the steps in both directions.  This set of command suites (/RecoverCMS and /ClearLocalCMS) can allow you to get back to where you once belonged, every time.

posted by Mike Talon at 0 Comments

Tuesday, September 8, 2009

CCR clustering is still clustering, and so is DAG

As more and more of my readers move to Exchange 2007 and 2010 from Exchange 2003 and earlier versions, I hear a lot about how using the new High Availability tools will finally free them from the yoke of clustering in Windows. While both CCR and DAG are definite improvements over traditional shared-disk clustering, neither is a departure from clustering entirely.

We’ll be talking about the new HA stuff in Exchange 2010 (along with much more of course) in the webinar Double-Take Software and Microsoft are presenting tomorrow.  I’m the speaker for Double-Take, and Patrick Foley from Microsoft is going to be doing their portion. It’s September 9th at 11am, and you can still register for free by clicking here.

In the meantime, it is important to realize that both CCR (Cluster Continuous Replication) and DAG (Database Availability Groups) are offshoots of Windows Failover Clustering (WFC). They both change the way WFC works, and by quite a lot, so you may never touch the underlying cluster technology, but it is still there.

CCR – as its name implies – works by allowing you to create a cluster during the installation of Exchange 2007.  This one is a bit easier to see as part of WFC, as you have to create a Failover Cluster first – specifically a Distributed Majority-Node File Share Witness Failover Cluster.  After that, when you install Exchange Server you can specify which server(s) will be the Active node(s) and which will be passive.  This creates the clustered Exchange resources for you, making the overall process of setting up clustering for Exchange a lot easier.  As this one has Cluster in the name, it’s easier to see the WFC roots.

DAG will permit you to create the cluster itself from Exchange 2010 command sets, eliminating the need to pre-create the Failover Cluster prior to getting the Exchange installation rolling.  While this makes the process even easier than in 2007, it still requires that you have two or more servers capable of running Distributed Failover Clustering.  This means that not every version of Windows 2008 is going to be suitable for DAG, but also means that – under the hood – you still need to know how Distributed Failover Clustering works to properly manage the DAG systems.

In both cases, the required level of understanding of clustering is greatly diminished from what was needed in Exchange 2003 and earlier versions.  Most of the guts of the cluster are controlled by Exchange itself, which is a double-edged sword.  On one side you have the fact that folks who don’t have a lot of cluster know-how can now set up HA solutions for Exchange.  On the other side, people who don’t have a lot of cluster know-how are facing troubleshooting clustered Exchange solutions they may not have realized were there.

Both solutions work great for Exchange.  While they don’t eliminate the need for 3rd-party products to help with overall HA (and I’m biased on this one, see disclaimer below), they do make mailbox server protection much more complete.  Just remember that you’re still running on a cluster, and arm yourself with the knowledge needed to keep it running smoothly.

posted by Mike Talon at 0 Comments

Wednesday, May 27, 2009

When your cluster goes “oops,” Using RecoverCMS

First, a quick note:  I’m posting this one from Windows Live Writer on Windows 7 RC1, which I’m happy to say is remarkably stable and much faster overall than Vista.  I’d recommend it wholeheartedly!

Funny story: I once had a client who swore that clustering was enough protection for their messaging environment, until an outage took out their entire cluster at once – causing them to be down for about a week. Now, that’s not the funny part, but what caused the outage is somewhat hilarious. More on that later.

Exchange 2003 and earlier had a pretty straight-forward method for recovering an entire MSCS cluster if one had failed on you.  You built one or more nodes of a brand new cluster, created an Exchange Virtual Server (EVS) Resource Group with the same parameters (names, IP’s etc) as the production system had, and Exchange would do the rest.

With Exchange 2007, the rules changed significantly, leaving many cluster users confused as to how the system now works if they suffer a cataclysmic failure of the production cluster. Adding both Single Copy Cluster (SCC) and Cluster Continuous Replication (CCR) to the mix just makes things more confusing, so Microsoft created a new recovery method for Exchange 2007 clusters. Called RecoverCMS, the system is really a setup task rather than a true failover system, but since your failover system just went belly-up, that’s not a bad thing.

If your Recovery Time Objectives are flexible enough to handle some downtime if an entire cluster fails, then you can leverage this system to get back up and running, either at the original production site, or at a new location. There are some definite limits to what you can do with it, which I’ll explain later, but the basics of how it works are pretty simple.

Step one is to rebuild, repair or replace the original cluster hardware. If the repair works then you’re done; just restore any missing data from tape or other backup (due disclaimer, see below, I am biased on backup tools) and then resume normal operations. If you rebuild or replace completely, bring up a new server that is configured with Exchange 2007 in the Passive Cluster Node configuration. You can find out how to do that:

Here for CCR clustering or,

Here for SCC Clustering

During that process you will also have installed the Exchange 2007 binaries on at least one node of the cluster system, so go to the directory that has the Exchange setup files and execute the following command:

Setup.com /recoverCMS /CMSName:<name> /CMSIPaddress:<ip>

Where <name> is the name of the EVS you’re restoring from, and <ip> is the IP address you want the recovered system to have – in theory the same IP as the original EVS had.
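If you script your recovery runbooks, it can help to validate the two values before anything gets typed on a cluster node. The sketch below only assembles the command line shown above and checks that the IP parses; the helper name and the sample values (`EXCMS01`, `192.0.2.25`) are made up for illustration, and nothing here executes Setup.com.

```python
import ipaddress


def recover_cms_command(cms_name: str, cms_ip: str) -> list:
    """Build the argument list for the /recoverCMS call shown above."""
    if not cms_name or " " in cms_name:
        raise ValueError("CMS name must be non-empty and contain no spaces")
    # ip_address() raises ValueError if the string is not a valid IP.
    ipaddress.ip_address(cms_ip)
    return [
        "Setup.com",
        "/recoverCMS",
        f"/CMSName:{cms_name}",
        f"/CMSIPaddress:{cms_ip}",
    ]


# Hypothetical values, for illustration only.
print(" ".join(recover_cms_command("EXCMS01", "192.0.2.25")))
```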

The rest of the procedure is pretty automated, and when finished, you will have a new EVS running on your new cluster node(s) that matches the original EVS and has all the users already assigned to it.  From there, you can restore your data if it was also lost to the disaster.

There are a few things that are extremely important to be aware of before you begin:

1 – Keep in mind that /recoverCMS is designed to restore a failed cluster only.  Attempting to use it for migration or for any other purpose will result in unpredictable behavior and is not supported by MSFT.

2 – You will need to manually create the volumes that existed on the failed cluster before you run /recoverCMS.  If volumes are missing then the recovery will fail.  They don’t have to be the same physical disk or size, just large enough to hold the data and with the same drive letters as the original cluster held.

3 – The System Attendant service will start and then immediately stop after you recover. This is normal; just bring the resource back online when you’re ready.

4 – Your databases are not mounted after a recovery. You must mount them manually through PowerShell or the Exchange Management Console after you’re done with the restore.

5 – Do NOT try to use this across OS’s. If you started on Windows Server 2003, you must recover to Windows Server 2003, and 2008 to 2008.   It will not work if you try to go from one to the other.

6 – While you can pre-configure many portions of this system, it will still take some time to run through a /recoverCMS procedure from start to finish, so if you need a second-stage failover, /recoverCMS isn’t the best bet.  I’m quite biased on this (see disclaimer below), but unless you can be down for a few hours if both cluster nodes fail, you might want to go with another tool to provide remote site failover in addition to SCC or CCR clustering.

7 – Finally, SCR and CCR will not automatically work with /recoverCMS.  You will need to stop SCR if it’s running before you recover, and neither will resume automatically after the recovery is done.  Once you’re set up in the new node configuration, re-enable CCR and SCR manually as required.
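Point 2 in the list above is the easiest one to get wrong under pressure, so here is a minimal pre-flight sketch for it. It assumes you have recorded the original cluster's drive letters and data sizes somewhere outside the cluster; the dictionaries and the function name are hypothetical, and the check only compares letters and sizes.

```python
def volume_problems(original_data_gb: dict, target_capacity_gb: dict) -> list:
    """Compare recorded source volumes against the rebuilt cluster's volumes.

    original_data_gb:   drive letter -> GB of data the volume must hold
    target_capacity_gb: drive letter -> GB of capacity on the new cluster
    """
    problems = []
    for letter, data_gb in sorted(original_data_gb.items()):
        if letter not in target_capacity_gb:
            problems.append(f"{letter}: missing")
        elif target_capacity_gb[letter] < data_gb:
            problems.append(f"{letter}: too small")
    return problems


# Hypothetical example: the log volume L: was never recreated.
print(volume_problems({"E": 200, "L": 50}, {"E": 500}))
```

An empty list means the volume layout should not be what stops the recovery; anything else is worth fixing before you run /recoverCMS.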

/RecoverCMS is a great way to restore a failed cluster system to new hardware or rebuilt hardware after a fault.  You still need to back up your data to some device outside the cluster itself, but once you have that backup /recoverCMS can get your cluster back up and running much faster than the manual methodologies used in previous versions of Exchange.

As to the funny story I mentioned at the top of the blog, this particular client was in a hardened datacenter with UPS systems, 24/7 staff and a backup generator. They were convinced that clustering was going to be more than enough for them. After trying to explain that a shared-disk cluster (the only option at the time) had weak points, I finally gave up and let them be. A few months later I got a great phone call. Apparently – unbeknownst to the client – the datacenter crew had run all power connections through the UPS – including the generator. The UPS was rated to handle the full power load of the datacenter on 1 of its 2 redundant circuit loops. So far so good. Well, this particular datacenter was in the middle of the dot-com boom (this was some time ago) and had grown exponentially in a short period of time. What they had was well over half the full expected load on each of the two circuits, and one was failing. So they diligently got replacement parts and moved the load over to the good circuit. Since circuit 1 was carrying over half the expected load, and circuit 2 was already carrying over half the load itself, they immediately overloaded the UPS, shorting it out. The way it was explained to me, a solenoid shot through the casing of the UPS…and there was indeed a nice hole in the unit to back that up. No one was hurt, but needless to say, the whole datacenter was offline until they replaced the UPS, 4 days later, so they lost about one business week, without anything happening to the physical cluster at all. Just goes to show you that anything that can go wrong, will.

posted by Mike Talon at 0 Comments

Monday, April 27, 2009

The Dread Pirate Re-Seed – Part 1

Among the most common questions I get from clients about the new data-protection features in Exchange 2007 (and the soon-to-be-released Exchange 2010) is, “What is a re-seed and why does it happen?”  This mostly falls into the category of “fear of the unknown” since the technology is new, and documentation on how it works is somewhat scarce.

Re-seeds are a commonly confusing part of most protection methods, though they fall under different names and methodologies.  In a solution like Double-Take (see disclaimer below), they’re called re-mirrors or re-synchronization operations – and are typically differences only.  In a tape-backup solution it’s a restore operation, and might be everything, incremental pieces, or some combination thereof.  In Exchange 2007 these operations are called re-seeds, basically the replay of data from a server that has a “correct” copy to one that does not.  Today, we see these operations in Exchange 2007 Local Continuous Replication (LCR), Cluster Continuous Replication (CCR) and Server – or Standby – Continuous Replication (SCR).  Today, we’ll talk about CCR, and address LCR/SCR next week.

CCR allows an active node of a 2-node Active/Passive Exchange Cluster to replicate a copy of its data to the passive node. This allows the passive node to take over with a nearly-current copy of the data if the production system fails due to hardware or software failure. There is a log replay lag to be considered, but it’s only 50 logs that need to be applied to the passive node during a rollover event, and that does give you some measure of protection against corruption if you catch it fast enough. Otherwise, the system acts much like a traditional Single Copy Cluster (formerly Shared Disk Cluster) in behavior, and is controlled with a combination of Windows cluster tools and PowerShell.

Whenever a log file is committed, and a new prime log (usually E00) is created, the closed log is copied over to the passive node via an SMB share, where it is held until it passes the 50 log replay limit and is then committed to the database, or a rollover occurs and the logs are committed immediately.  Exchange 2010 will move away from the SMB share, but will utilize a similar methodology overall, if the beta is to be taken at face value.
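The hold-then-commit behavior described above can be pictured as a simple queue. This is a toy model, not Exchange code: the class and method names are mine, log files are just strings, and the only number taken from the text is the 50-log replay lag.

```python
from collections import deque

REPLAY_LAG = 50  # replay lag described above


class PassiveCopy:
    """Toy model of the passive node's copied-but-not-yet-replayed logs."""

    def __init__(self, lag: int = REPLAY_LAG):
        self.lag = lag
        self.pending = deque()   # logs copied over, waiting out the lag
        self.committed = []      # logs already replayed into the database

    def receive(self, log_name: str) -> None:
        """A closed log arrives from the active node."""
        self.pending.append(log_name)
        # Anything beyond the lag window is replayed into the database.
        while len(self.pending) > self.lag:
            self.committed.append(self.pending.popleft())

    def rollover(self) -> None:
        """On a rollover, every held log is committed immediately."""
        while self.pending:
            self.committed.append(self.pending.popleft())


passive = PassiveCopy()
for generation in range(1, 61):
    passive.receive(f"E00{generation:08X}.log")

print(len(passive.committed), len(passive.pending))
passive.rollover()
print(len(passive.committed), len(passive.pending))
```

After 60 logs the model has replayed 10 and is holding 50; a rollover drains the rest in one go.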

In order to get the passive node in sync with the active data, the CCR system starts with a re-seed operation.  All data from the database is copied from the active node to the passive node, as well as any non-truncated logs.  From then on, only log files are copied, as they are committed on the active node.  If all goes well, this will probably be the only re-seed you see unless you have a rollover.

If you do flip nodes – let’s say from Node A to Node B – then Node B will re-seed back to Node A if Node A becomes divergent; in other words, if Exchange cannot determine what logs still exist on Node A, if the logs are inconsistent, or if some are missing. A graceful rollover will not cause a re-seed, but most emergency rollovers will require it.

The same will happen if you haven’t rolled over, but instead Node B was offline for some other reason.  When Node B comes back online, Node A will see if all the required logs are on both machines, and then either just continue CCR protection or else initiate a re-seed to copy the data over again if anything is amiss.  The only issue here is if your backup tools purge logs while Node B is still offline.  In that case the servers will be considered divergent and need a re-seed to get back up and running properly.
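The log-purge case is easier to see with numbers. The sketch below reduces the decision to one question: are all the logs the returning node still needs present on the active node? Log generations are plain integers and the function name is hypothetical; the real divergence check in Exchange looks at more than this.

```python
def needs_reseed(passive_last_gen: int, active_logs: set,
                 active_current_gen: int) -> bool:
    """True if any log the passive node still needs is gone from the active node."""
    required = range(passive_last_gen + 1, active_current_gen + 1)
    return any(gen not in active_logs for gen in required)


# Node B went offline after receiving log 100; Node A is now at log 120.
# Case 1: Node A still has logs 90 through 120, so log copy can just resume.
print(needs_reseed(100, set(range(90, 121)), 120))

# Case 2: a backup purged everything up to log 110 while Node B was down.
print(needs_reseed(100, set(range(111, 121)), 120))
```

The second case is the one described above: logs 101 through 110 no longer exist anywhere Node B can copy them from, so only a full re-seed gets it current again.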

Finally, if a cluster is restored from a backup (tape or otherwise) to the active node, then a re-seed must be manually initiated to re-sync the nodes properly.  You will see errors telling you to do this after the restore is complete and you bring Node A back online.

One other condition exists, but it is a manually created condition. If you perform Offline Defragmentation of the database, you will trigger a re-seed operation when Node A is brought back online.  As long as the first Exchange log is still present (which it should be) then this will happen automatically. Otherwise, it will need to be initiated manually.

So, why is this an issue?  Normally, it’s not, but keep in mind that re-seed operations are *full* copies of the entire database.  So if you have relatively small databases and only a few of them, this isn’t a problem.  But let’s say you have over 1 Terabyte of data in your Exchange cluster.  Re-seeding that much data locally will be time and resource consuming, and doing it over a WAN (for distributed failover clustering) could be problematic – to say the least.  So you want to avoid re-seed operations at all costs and wherever possible, which means treating the CCR cluster very carefully, and following all the best practices from Microsoft on Exchange 2007 Clustering in general.
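A quick back-of-the-envelope calculation shows why. The link speeds and the 70% efficiency factor below are assumptions I picked for illustration (a gigabit LAN and a 45 Mbps WAN link); plug in your own numbers.

```python
def reseed_hours(db_bytes: float, link_mbps: float,
                 efficiency: float = 0.7) -> float:
    """Rough hours needed to copy db_bytes over a link of link_mbps.

    efficiency is an assumed fraction of the nominal bandwidth that the
    copy actually achieves (protocol overhead, competing traffic).
    """
    bits = db_bytes * 8
    usable_bits_per_second = link_mbps * 1_000_000 * efficiency
    return bits / usable_bits_per_second / 3600


one_terabyte = 10 ** 12  # bytes

lan_hours = reseed_hours(one_terabyte, link_mbps=1000)  # assumed gigabit LAN
wan_hours = reseed_hours(one_terabyte, link_mbps=45)    # assumed 45 Mbps WAN

print(f"LAN: {lan_hours:.1f} h, WAN: {wan_hours:.1f} h")
```

Under those assumptions the local copy takes a few hours, while the WAN copy runs to roughly 70 hours, during which the passive copy is not protecting you.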

For information on when re-seeds occur, take a look at this TechNet article.  They’re not an everyday occurrence, but you will need to be sure you know when and why they will happen to avoid confusion and frustration.

posted by Mike Talon at 0 Comments

Monday, September 15, 2008

Update on Wildcard certificates

Not that long ago I talked about using wildcard certificates to allow you to move OWA and ActiveSync services from one physical server to another.  Since single certificates are assigned to a single server, failing over or moving to another server would cause the clients to suddenly lose SSL connectivity, as the certificate would not match up, and ActiveSync devices cannot pop up the error about a non-secure connection. OWA can, but it can be troublesome with end-users who suddenly start seeing security warnings.

Following up on this theory of using a domain-assigned wildcard certificate, research has shown that older Windows Mobile devices (WM 5 or earlier) cannot leverage these types of certificates at all.  WM 6 and higher can leverage this technology, but earlier versions were not coded with the required information to recognize that a certificate could possibly be assigned to more than one physical server or networked device.

So, for those using WM 6, OWA and Outlook Anywhere, you can use wildcard certificates to allow for services to move from server to server as required.  For those using earlier versions of WM (or the 3G iPhone - though the jury is still out on that one), you must use server-specific certificates, and re-configure the devices' ActiveSync connection if the server itself moves. 
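For anyone wondering what "recognizing" a wildcard certificate involves on the client side, the core of it is a name comparison along these lines. This is a simplified sketch of the common rule that the `*` covers exactly one left-most label; the function and the `example.com` host names are illustrative, and real clients perform many more checks (chain, expiry, and so on).

```python
def wildcard_matches(pattern: str, hostname: str) -> bool:
    """Check a certificate name such as '*.example.com' against a host name."""
    pattern_labels = pattern.lower().split(".")
    host_labels = hostname.lower().split(".")

    # The wildcard stands in for exactly one label, so lengths must agree.
    if len(pattern_labels) != len(host_labels):
        return False

    if pattern_labels[0] == "*":
        return host_labels[0] != "" and pattern_labels[1:] == host_labels[1:]

    # No wildcard: the certificate is tied to one specific server name.
    return pattern_labels == host_labels


# A wildcard certificate follows the service from server to server...
print(wildcard_matches("*.example.com", "mail1.example.com"))
print(wildcard_matches("*.example.com", "mail2.example.com"))

# ...while a server-specific certificate does not.
print(wildcard_matches("mail1.example.com", "mail2.example.com"))
```

A client that only knows the last branch (exact match) behaves the way the older devices described above do when the service moves to a different server.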

None of this impacts Blackberries, as they authenticate via the RIM network, and not directly to the Exchange servers.  If you're using Blackberries, wildcard certificates offer the ability for all other mobile systems (OWA, Outlook Anywhere, etc) to move with your servers in the event of a loss of a particular physical machine, while RIM will handle moving the Blackberry devices.

Long story short, if you're on WM 5 or earlier, be ready for a few support calls when you need to move the services or fail over between servers - even if you use wildcard certificates.  If your users are on the new iPhone systems, be sure to keep a close watch on Apple's forums, as new information is being discovered every day.

posted by Mike Talon at 0 Comments

Monday, August 11, 2008

Quite a stretch for clustering in 2008

Microsoft Clustering Services (MSCS) have existed in one form or another since NT4, but have always suffered from a significant limitation.  All cluster resources had to exist within the same logical subnet, or else you couldn't create the cluster itself.  Windows Server 2008 allows for some flexibility in that regard, with the ability to create nodes of a contiguous cluster in different logical subnets.
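
If you're not sure whether two planned nodes fall under the old same-subnet rule, a quick check like the following Python sketch can help. The addresses and prefix lengths are made up for illustration, and this only compares network ranges; it says nothing about how the cluster service itself validates its networks.

```python
import ipaddress


def same_subnet(node_a: str, node_b: str) -> bool:
    """Compare two interface addresses given as "address/prefix" strings."""
    net_a = ipaddress.ip_interface(node_a).network
    net_b = ipaddress.ip_interface(node_b).network
    return net_a == net_b


# Hypothetical node addresses in two sites.
print(same_subnet("10.1.0.11/24", "10.1.0.12/24"))  # True: fits the old rule
print(same_subnet("10.1.0.11/24", "10.2.0.11/24"))  # False: needs Server 2008
```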

Before we dive too far into that, you may want to see the official MSFT information here:

http://technet.microsoft.com/en-us/library/cc770625.aspx

So what does this mean for you and me?  It means that we can create CCR clusters on Exchange 2007 that stretch between physical locations and subnets.  However, to do this you'll need to be on Server 2008; the function just isn't available in Server 2003.  This allows you to provide basic availability for Mailbox Role (MBX) servers in your Exchange environment, but doesn't take care of everything when it comes to DR planning.

First off, this applies only to Exchange 2007 Enterprise Edition, and then only to MBX role servers.  While most other roles are natively fault-tolerant with multiple servers installed with the same role able to stand in for each other, organizational or regulatory rules might not make that kind of redundancy possible.  Edge servers are the biggest example of this.  Since they're not tied into the domain structures, they don't contain any way to quickly flip traffic from one Edge server to another in different sites.  Third-party tools (see disclaimer below) can often take care of that function for you, as can working with your DNS provider to facilitate moving the MX records in the event of an emergency.

If you have legacy Exchange 2000/2003 servers, you're also not able to take advantage of this new MSCS feature set on those boxes.  The same goes with any non-Exchange tools, like SQL, anti-virus servers, anti-spam servers, etc.  Even if these servers run on Server 2008 clusters, they'll require some third-party intervention to handle the data replication for those systems.  This would include things like Blackberry servers, GoodLink systems and other non-Exchange remote email tools.

Finally, keep in mind that CCR clusters can only extend to Active/Passive, 2-node configurations. That will mean you can't use these solutions if you need to go beyond that model - which Exchange 2007 easily allows without CCR involved.

Server 2008 Failover Clustering is a great method for basic High Availability for Exchange 2007 CCR systems - even across subnets.  With some additional tools, it can become the center of an Exchange DR solution set that can help your organization withstand even site-wide emergencies.

posted by Mike Talon at 0 Comments

Friday, July 11, 2008

Dial-tone revisited

The theory of Dial-Tone Recovery (DTR) is one that has often been overlooked in the world of Disaster Recovery (DR) for Exchange Server.  However, even in Exchange 2007, DTR can provide a great method for immediate restoration of email services, though with a few things to keep in mind.

For those who haven't heard of DTR before, here's a primer:

If a primary Exchange 2000, 2003 or 2007 server fails, you can attempt to restore services by deleting the corrupted databases and re-starting Exchange services.  This will create blank databases and allow users to send and receive new mail, access new calendar items and access all shared contact information.  Running a /disasterrecovery install of Exchange (in Exchange 2007, the equivalent setup switch is /m:RecoverServer) on a rebuilt box with no data will do the same thing.  Though end-users can access their email systems again, there will be no historical data, so this isn't a total solution set for true DR, but gives you some options for immediate availability.

In an emergency, this can give you time to perform restoration steps - which could take quite a while to finish - without making everyone wait to get back basic send/receive capability. You can restore a copy of the historical data via several methods, taking the time you need to do it right.

If you used a brick-level tape or disk backup solution, you can restore mailboxes via that tool's recovery system. Archiving solutions and Continuous Data Recovery systems (like TimeData from Double-Take - see disclaimer below) can let you move mailbox, folder and other data back over time as well.  If neither of those tools is at your disposal, but you do have a backup of the database and logs, you can restore those to a Recovery Storage Group, and use ExMerge or Exchange 2007 tools to bring back mailbox data and merge it with the new information on the DTR-recovered server.
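
The merge step is easier to picture with a small model. The Python sketch below is purely conceptual: the `Item` type and `merge_mailboxes` function are invented for illustration and are not how ExMerge or the Exchange 2007 tools work internally. It only shows the idea of folding restored historical items into the mailbox that has been collecting new mail since the dial-tone switch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    message_id: str
    received: str  # ISO 8601 timestamp, so string order is time order
    subject: str


def merge_mailboxes(dial_tone, restored):
    """Combine restored history with items created since the dial-tone switch."""
    merged = {item.message_id: item for item in restored}
    # If the same ID appears in both, keep the dial-tone (newer) copy.
    for item in dial_tone:
        merged[item.message_id] = item
    return sorted(merged.values(), key=lambda item: item.received)


restored = [Item("a", "2008-07-01T09:00:00", "Budget draft")]
dial_tone = [Item("b", "2008-07-10T12:00:00", "Are we back up?")]
for item in merge_mailboxes(dial_tone, restored):
    print(item.received, item.subject)
```

Users keep working in the dial-tone mailbox the whole time; the historical items simply appear alongside the new ones once the merge finishes.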

If no backup is available at all, you can still provide Exchange services from the point of DTR onward.  While no historical info will be available to them, the end-users will be able to send and receive new email, calendar entries and Public Folder data.

posted by Mike Talon at 0 Comments

Monday, June 23, 2008

Legally important

Providing Disaster Recovery (DR) for Exchange servers has always had strong arguments in its favor.  As an overall requirement, being able to get the email systems, calendars and contacts back up and running ranks pretty high up the list.  But there are other reasons to look at Dynamic Infrastructure solutions for Exchange that go beyond the convenience of Exchange end-users.

When email is stored only on the production Exchange server, it can be altered or destroyed by anyone with access to that server, which means anyone with an email account that allows them to see that particular user's mailbox.  This leaves you with a whopping legal liability (check local listings on exactly what that is for you), but one that can be avoided in most cases.

Using an Operational Recovery tool, like Double-Take TimeData (see disclaimer below), will allow you to ensure that another copy of the data is not only held off-site, but held in a repository that tracks all changes to the email, calendars, contacts and all other information.  This way if a critical change is accidentally applied, or if someone maliciously attacks the data on the production server, you have the ability to revert either individual items or the entire Store or Storage Group as required.
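
For a rough picture of what "tracks all changes" buys you, here is a conceptual Python sketch of a change journal that can answer "what did this item look like at time T?". The `ChangeJournal` class and its methods are hypothetical and do not describe how TimeData or any other product stores its data; the sketch also assumes changes are recorded in time order.

```python
from bisect import bisect_right


class ChangeJournal:
    """Keeps every recorded version of an item so any point in time can be read back."""

    def __init__(self):
        # item_id -> list of (timestamp, content), appended in time order
        self._history = {}

    def record(self, item_id, timestamp, content):
        self._history.setdefault(item_id, []).append((timestamp, content))

    def as_of(self, item_id, timestamp):
        """Return the content as it stood at timestamp, or None if it did not exist yet."""
        versions = self._history.get(item_id, [])
        times = [t for t, _ in versions]
        index = bisect_right(times, timestamp)
        return versions[index - 1][1] if index else None


journal = ChangeJournal()
journal.record("msg-1", 1, "original text")
journal.record("msg-1", 5, "altered text")
print(journal.as_of("msg-1", 3))  # original text
```

Because nothing is overwritten, a bad change on the production server never destroys the earlier version; you just read back from a point before the change.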

This means that if the data is needed to meet a legal requirement (like court-ordered discovery), you can be sure that not only will the system be available, but also that any information that could be lost or altered without impacting the server itself can be quickly and efficiently restored.  Of course, being able to get that information back to some other server or even to a desktop or laptop is best, so look for tools that give you that flexibility.  After all, if you don't happen to have the original server running to perform the restore of information, you'll need to be able to designate someplace else to receive the data instead.

This doesn't change anything you might be doing with DR and Dynamic Infrastructure in your Exchange Environment, but gives you a deeper level of protection and flexibility that most standard DR tool-sets just can't natively offer.

posted by Mike Talon at 0 Comments