Saturday, November 28, 2009

High Availability in Exchange Server 2010

The architecture of Exchange Server 2010 has changed considerably from previous versions of Exchange. The high availability features of Exchange Server 2010 ensure messaging continuity in an organization. These features include incremental deployment, database mobility, and continuous mailbox availability. In addition, Exchange Server 2010 includes enhanced disaster recovery options that help recover data in the organization.

Incremental Deployment
The core architecture of Exchange Server 2010 has been improved to incorporate high availability and provide a messaging continuity service in an enterprise.

In earlier versions of Exchange Server, to provide uninterrupted service for the Mailbox server role, you had to create a clustered mailbox server and deploy Exchange Server in a Windows failover cluster. For this, you first had to build a failover cluster and then install the program files. If the Exchange Server program files were already installed on a non-clustered server, then to create a clustered mailbox server you had to either build a cluster on new hardware and move the mailboxes over, or uninstall Exchange Server from the existing server, install failover clustering, reinstall Exchange Server, and then restore the mailboxes from backup.
The concept of a clustered mailbox server does not exist in Exchange Server 2010. Exchange Server 2010 features incremental deployment, which allows you to achieve service and data availability for all Mailbox servers and databases even after Exchange Server is installed.

Certain additional features in Exchange Server 2010 such as database availability groups (DAG) and database copies help you achieve service and data redundancy. High availability is achieved using the continuous mailbox availability and inter-site continuous mailbox availability solutions. These solutions combine the cluster continuous replication (CCR) and standby continuous replication (SCR) technologies. After you have deployed Exchange Server 2010, you can start deploying either of the two solutions for high availability and have these availability features enabled anytime you want.

How Database Mobility Works
In addition to the high availability and site resilience features introduced in Exchange Server 2007, Exchange Server 2010 introduces database mobility, database availability groups (DAGs), and incremental reseed to assure a highly available Exchange environment.

Database mobility allows you to move a mailbox database between servers, DAG provides automatic database-level recovery from failures, and incremental reseed provides an automatic correction to discrepancies in database copies after an automatic failover.
Database mobility detaches the mailbox database from mailbox servers and helps maintain several copies of a database on multiple servers. It also provides a native experience for adding database copies to a database.

In Exchange Server 2010, storage groups have been removed. Therefore, continuous replication operates at the database level and not at the storage group level. Transaction logs can be replicated to one or more Mailbox servers and replayed into one or more copies of a mailbox database that is stored on those servers.

Database names for Exchange Server 2010 should be unique within the organization. In situations where a mailbox database has been configured with one or more database copies, the full path for all database copies on all Mailbox servers that host a copy must be identical.
A mailbox database copy can be backed up at any point in time using an Exchange-aware, Volume Shadow Copy Service (VSS)-based backup application.
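The naming and path rules above can be checked mechanically. The following is a minimal Python sketch, not an Exchange tool: the `databases` structure and the function name `check_database_layout` are hypothetical, and the case-insensitive comparison is an assumption based on Windows path conventions.

```python
from collections import Counter


def check_database_layout(databases):
    """Return a list of problems found in a hypothetical inventory.

    Each entry is assumed to look like:
    {"name": "DB1", "copies": {"MBX1": r"D:\\DB1\\DB1.edb", ...}}
    """
    problems = []

    # Rule 1: database names must be unique within the organization.
    # Names are compared case-insensitively (an assumption of this sketch).
    name_counts = Counter(db["name"].lower() for db in databases)
    for name, count in name_counts.items():
        if count > 1:
            problems.append(f"duplicate database name: {name}")

    # Rule 2: every copy of a database must use the same full path
    # on every Mailbox server that hosts a copy.
    for db in databases:
        paths = {path.lower() for path in db["copies"].values()}
        if len(paths) > 1:
            problems.append(f"{db['name']}: copies use different paths")

    return problems
```

An empty list means the inventory satisfies both rules; each string in a non-empty list names one violation.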

Failures such as disk failures and server failures can affect individual databases. Recovery from such failures can be provided by DAG, which can contain up to 16 Mailbox servers.
A DAG is represented in Active Directory as an object that stores information about the DAG, such as server membership and database copy information. When a DAG is created, it is initially empty.

When the first server is added to the DAG, a failover cluster is automatically created for the DAG. The infrastructure that monitors the servers for network or server failures is also initiated. The failover cluster heartbeat mechanism and cluster database are then used to track and manage information about the DAG.
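The DAG lifecycle described above (created empty, limited to 16 Mailbox servers, cluster created automatically on membership) can be modeled in a few lines. This is only an illustrative Python sketch: the `Dag` class and its fields are hypothetical, and it assumes the cluster is created when the first member is added.

```python
from dataclasses import dataclass, field

MAX_DAG_MEMBERS = 16  # upper limit stated in the text


@dataclass
class Dag:
    """Toy model of a DAG object; not an Exchange or Active Directory API."""
    name: str
    members: list = field(default_factory=list)  # a new DAG starts empty
    cluster_created: bool = False

    def add_member(self, server):
        if server in self.members:
            raise ValueError(f"{server} is already a member of {self.name}")
        if len(self.members) >= MAX_DAG_MEMBERS:
            raise ValueError(f"{self.name} already has {MAX_DAG_MEMBERS} members")
        # Assumption: the failover cluster comes into being with the
        # first member and is reused for every later member.
        if not self.members:
            self.cluster_created = True
        self.members.append(server)
```

For example, `Dag("DAG1")` has no members and no cluster; after `add_member("MBX1")` the `cluster_created` flag is set, and a seventeenth `add_member` call raises `ValueError`.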

The transaction log stream in between the source and target storage group may have certain discrepancies. In Exchange Server 2007, incremental reseed helps you correct these discrepancies by using the delayed replay capabilities of lost log resilience (LLR).

However, the incremental reseed feature in Exchange Server 2007 does not provide a way to correct divergences in the passive copy of a database after divergent logs have been replayed. In that case, a complete reseed is required.

The upgraded version of incremental reseed in Exchange Server 2010 provides automatic correction of divergences in database copies.

This correction occurs when there is an automatic failover for all configured copies of a database, when a new copy is enabled at a location where database and log files already exist, or when replication is resumed after a suspension or after a restart of the Microsoft Exchange Replication service.
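Conceptually, correcting a divergence starts with finding the point at which two log streams stop agreeing. The Python sketch below illustrates only that idea and is not the algorithm Exchange uses: the function name and the representation of a log stream as a dictionary of generation number to checksum are assumptions made for the example.

```python
def first_divergent_generation(active_logs, passive_logs):
    """Return the first log generation at which two streams differ.

    Both arguments are hypothetical dicts mapping a log generation
    number to a checksum of that log file. Returns None when every
    generation present in both streams matches.
    """
    shared = sorted(set(active_logs) & set(passive_logs))
    for generation in shared:
        if active_logs[generation] != passive_logs[generation]:
            return generation
    return None
```

A copy that diverges at generation N would, in this simplified picture, need only the data from generation N onward to be corrected rather than a complete reseed.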

Mailbox Availability in Exchange Server 2010
Exchange Server 2010 is designed to address certain challenges with mailbox availability in Exchange Server 2007.

Exchange Server 2007 provides several features such as integrated Setup experience, optimized out-of-box configuration settings, and the ability to manage most aspects of the high availability solution using native Exchange management tools. These features make deploying high availability and site resiliency solutions for Exchange fast and simple. However, Exchange Server 2007 faces certain challenges.
To manage a high availability solution, administrators have to master the concepts of moving network identities and managing cluster resources. Troubleshooting issues related to clustered mailbox servers requires both Exchange tools and cluster tools to analyze and correlate logs and events from the Exchange organization and the cluster. At least four Exchange servers are needed to achieve full redundancy of the primary components of a deployment, because only the Mailbox server role can be installed on a node in the cluster. Failover of a clustered mailbox server occurs at the server level. Therefore, when a single database fails, administrators have to either fail over the entire clustered mailbox server to another node in the cluster or leave the users of the failed database offline for hours while restoring the database from backup.
Exchange Server 2010 has been designed to overcome these challenges. The CCR and SCR features of Exchange Server 2007 have been combined and enhanced into the database mobility feature. This feature, along with the continuous replication and database copy features, provides continuous mailbox availability. The database mobility feature provides automatic failover protection at the individual mailbox database level rather than at the server level. As a result, failover actions complete in less time than in earlier versions of Exchange. For example, with Exchange Server 2007, failover of a clustered mailbox server in a CCR environment takes about 2 minutes to complete. With Exchange Server 2010, failover of a mailbox database completes in about 20-30 seconds. The combination of database-level failovers and significantly faster failover times considerably improves the overall uptime of the Exchange organization.
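To get a feel for what the failover times quoted above mean over a year, the arithmetic can be written out. In this Python sketch the figure of 12 failovers per year is purely hypothetical; only the per-failover durations (about 2 minutes, and about 20-30 seconds) come from the text.

```python
def failover_downtime_seconds(failovers, seconds_per_failover):
    """Total time spent in failover for a given number of failovers."""
    return failovers * seconds_per_failover


FAILOVERS_PER_YEAR = 12  # hypothetical figure, for illustration only

# Exchange Server 2007 CCR: about 2 minutes per failover.
ccr_downtime = failover_downtime_seconds(FAILOVERS_PER_YEAR, 120)

# Exchange Server 2010: about 20-30 seconds per database failover.
dag_downtime_low = failover_downtime_seconds(FAILOVERS_PER_YEAR, 20)
dag_downtime_high = failover_downtime_seconds(FAILOVERS_PER_YEAR, 30)

print(ccr_downtime, dag_downtime_low, dag_downtime_high)
```

Under these assumptions the yearly failover time drops from 1440 seconds to between 240 and 360 seconds, and in Exchange Server 2010 each failover affects only the failed database rather than every database on the server.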

Although Exchange Server 2007 provides message redundancy with the help of the Transport Dumpster feature, it ensures that messages are not lost only when a cluster fails over. The Shadow Redundancy feature of Exchange Server 2010 provides redundancy for all messages that are in transit. This feature ensures that messages are reliably transmitted to their destinations by delaying the deletion of e-mail messages from the transport database until the transport server verifies that the message has been completely delivered. Also, the truncation of the transport dumpster is based on log copy status. During the replication process, messages are not removed from the dumpster until they have been replicated to all the servers.
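The dumpster truncation rule (nothing is removed until every copy has it) amounts to taking the minimum replication position across all copies. The following Python sketch illustrates that rule only; the function name, the message identifiers, and the idea of tagging each message with a log generation are hypothetical simplifications, not the Exchange implementation.

```python
def truncatable_messages(dumpster, replicated_by_copy):
    """Return the message ids that are safe to remove from the dumpster.

    dumpster: hypothetical dict of message id -> log generation that
        contains the message.
    replicated_by_copy: hypothetical dict of database copy name ->
        highest log generation that copy has received.
    """
    if not replicated_by_copy:
        # No status is known for any copy, so nothing is considered safe.
        return []
    # A message is safe only once the slowest copy has caught up to it.
    safe_up_to = min(replicated_by_copy.values())
    return sorted(m for m, gen in dumpster.items() if gen <= safe_up_to)
```

With messages at generations 10, 12, and 15 and copies that have received generations 12 and 14, only the first two messages can be truncated, because one copy has not yet received generation 15.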
Continuous mailbox availability provides other benefits for organizations and their administrators. It allows multiple server roles to co-exist on servers that provide high availability. Organizations can deploy a two-server configuration that provides full redundancy of the mailbox data and, at the same time, redundant Client Access and Hub Transport services. Administrators can create a highly available environment without rebuilding standalone servers as clustered servers. An event stream correlates related events from the operating system and the Exchange organization to help determine the root cause of a failure.
The inter-site continuous mailbox availability solution, besides providing high availability for storage, database, and server failures, provides additional benefits such as site resilience and rapid recovery from datacenter failures.

The following table summarizes the main differences in the way various features are implemented in Exchange Server 2007 and Exchange Server 2010 in order to achieve mailbox availability.



Sunday, May 17, 2009

High Availability for Exchange Server 2007

Circumstances such as component failures, power outages, operator errors, and natural disasters can affect a messaging system's availability. To protect against such circumstances, it is crucial that companies plan and implement reliable strategies for maintaining high availability. A highly available messaging system can save money by providing consistent messaging functionality to users.

Exchange Server 2007 has three main High Availability features: Single Copy Cluster (SCC), Local Continuous Replication (LCR) and Cluster Continuous Replication (CCR). Exchange 2007 SP1 has an additional feature named Standby Continuous Replication (SCR), which can be classified as a Disaster Recovery feature rather than a High Availability feature.

Local Continuous Replication (LCR)
Local continuous replication (LCR) is a single-server solution that uses built-in technology to create and maintain a copy of a storage group on a second set of disks that are connected to the same server as the production Mailbox Server. LCR provides asynchronous log shipping, log replay, and a quick manual switch to a copy of the data.

Cluster Continuous Replication (CCR)
Cluster continuous replication (CCR) combines the replication and replay features in Exchange 2007 with the failover features in Microsoft Cluster services. CCR is a solution that can be deployed with no single point of failure in a single datacenter or between two datacenters, and it provides several advantages over clustering in previous versions of Exchange Server and over single copy clusters in Exchange 2007. It takes the new Exchange Server 2007 log file shipping and replay features and combines them with the features of a more traditional two-node Windows Server 2003 active/passive cluster. A traditional two-node active/passive cluster has its benefits, but it also has one major drawback: the data storage remains a single point of failure. CCR overcomes this disadvantage by storing the active and passive copies of the Exchange database on different storage devices.

Single Copy Clusters (SCC)
Single copy clusters (SCC), known as shared storage clusters in previous versions of Exchange Server, are present in Exchange 2007, with some significant changes and improvements. With SCC, all of the hardware, including the disks used for Exchange data, must be listed in the Cluster category of the Windows Server Catalog. With CCR, the disks used for Exchange databases are local to each system and are not controlled or failed over as part of the cluster.

SCC provides redundancy for the server, but not for storage. CCR provides redundancy with no single point of failure. CCR also allows you to simplify backup administration and offload backup I/O demands completely to the passive replica server.

Friday, May 15, 2009

Microsoft Exchange Disaster Recovery with Site Resilience

Messaging services are mission-critical or business-critical to most organizations. If the messaging system is not available, productivity can drop, and business and revenue opportunities can be lost. Site resilience is only one means of Exchange Server disaster recovery, but in this post we are focusing on the site resilience part of the Exchange 2007 disaster recovery strategy. Site resilience is especially important for organizations with multiple offices in different locations that all rely on the same Exchange servers, which is the typical setup in most organizations. It is useful in situations where the primary datacenter is down or has lost connectivity for longer than the business can go without e-mail.

Exchange 2007 SP1 has a much-awaited feature named Standby Continuous Replication (SCR); this kind of capability was previously available only with third-party products. In Exchange 2007 SP1, Microsoft has integrated this disaster recovery feature into the Exchange product itself. With site resilience built into Exchange Server, administrators have the advantage of managing Exchange Server disaster recovery within the Exchange Server framework and avoiding third-party support overheads.

SCR allows you to replicate your Exchange database information from your production servers to a standby server that can be brought online should the production servers be lost. Although existing Exchange 2007 technologies such as Cluster Continuous Replication (CCR) offer high availability, site resilience is currently best achieved with SCR.

SCR enables a separation of high availability and site resilience. For example, SCR can be combined with CCR to replicate storage groups locally in a primary datacenter (using CCR for high availability) and remotely in a backup datacenter (using SCR for site resilience).

Sources and Targets
The starting point for SCR is called the source, which is any storage group on any of the following:
  • Stand-alone Mailbox server
  • Clustered mailbox server in a single copy cluster (SCC)
  • Clustered mailbox server in a CCR environment
The endpoint for SCR is called the target, and the target can be either of the following:
  • Stand-alone Mailbox server that does not have LCR enabled for any storage groups
  • A standby cluster, which is a failover cluster where the Passive Clustered Mailbox role is installed, but no clustered mailbox server (i.e., no Active Clustered Mailbox role) has been installed in the cluster
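
The source and target rules listed above can be expressed as two small checks. This is a Python sketch for illustration only: the function names and the string labels (`"standalone"`, `"scc"`, `"ccr"`, `"standby_cluster"`) are hypothetical and do not correspond to Exchange cmdlets or property values.

```python
# Hypothetical labels for the server configurations named in the text.
SCR_SOURCE_KINDS = {"standalone", "scc", "ccr"}


def can_be_scr_source(kind):
    """A storage group on any of the three listed server types can be a source."""
    return kind in SCR_SOURCE_KINDS


def can_be_scr_target(kind, lcr_enabled=False, has_clustered_mailbox_server=False):
    """Apply the two target rules from the list above."""
    if kind == "standalone":
        # A stand-alone Mailbox server qualifies only if LCR is not
        # enabled for any of its storage groups.
        return not lcr_enabled
    if kind == "standby_cluster":
        # A standby cluster qualifies only while no clustered mailbox
        # server has been installed in it.
        return not has_clustered_mailbox_server
    return False
```

For instance, a CCR clustered mailbox server can act as a source but not as a target, while a stand-alone Mailbox server can be a target as long as LCR is not enabled on it.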

Cheers