Saturday, November 28, 2009

High Availability in Exchange Server 2010

The architecture of Exchange Server 2010 has changed considerably from previous versions of Exchange. The high availability features of Exchange Server 2010 ensure messaging continuity in an organization. These features include incremental deployment, database mobility, and continuous mailbox availability. In addition, Exchange Server 2010 includes enhanced disaster recovery options that help the organization recover its data.

Incremental Deployment
The core architecture of Exchange Server 2010 has been improved to incorporate high availability and provide a messaging continuity service in an enterprise.

In earlier versions of Exchange Server, providing uninterrupted service for the Mailbox server role required you to create a clustered mailbox server and deploy Exchange Server in a Windows failover cluster. To do this, you first had to build a failover cluster and then install the Exchange program files. If the program files were already installed on a non-clustered server, creating a clustered mailbox server meant either building a cluster on new hardware and moving the mailboxes over, or uninstalling Exchange Server from the existing server, installing failover clustering, reinstalling Exchange Server, and then restoring the mailboxes from backup.
The concept of a clustered mailbox server does not exist in Exchange Server 2010. Exchange Server 2010 features incremental deployment, which allows you to add service and data availability for all Mailbox servers and databases even after Exchange Server is installed.

Additional features in Exchange Server 2010, such as database availability groups (DAGs) and database copies, help you achieve service and data redundancy. High availability is achieved using the continuous mailbox availability and inter-site continuous mailbox availability solutions. These solutions combine the cluster continuous replication (CCR) and standby continuous replication (SCR) technologies. After you have deployed Exchange Server 2010, you can deploy either of the two solutions and enable these availability features at any time.

How Database Mobility Works
In addition to the high availability and site resilience features introduced in Exchange Server 2007, Exchange Server 2010 introduces the concepts of database mobility, database availability groups (DAGs), and incremental reseed to provide a highly available Exchange environment.

Database mobility allows you to move a mailbox database between servers, DAG provides automatic database-level recovery from failures, and incremental reseed provides an automatic correction to discrepancies in database copies after an automatic failover.
Database mobility detaches the mailbox database from mailbox servers and helps maintain several copies of a database on multiple servers. It also provides a native experience for adding database copies to a database.

In Exchange Server 2010, storage groups have been removed. Therefore, continuous replication operates at the database level and not at the storage group level. Transaction logs can be replicated to one or more Mailbox servers and replayed into one or more copies of a mailbox database that is stored on those servers.
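
The replication flow described above (logs shipped to each server that hosts a copy, then replayed in order) can be pictured with a small toy model. This is only a conceptual sketch in Python, not Exchange's implementation: the `DatabaseCopy` class, the `replicate` helper, and the server names MBX2 and MBX3 are hypothetical, and log files are reduced to integer generation numbers.

```python
from dataclasses import dataclass, field


@dataclass
class DatabaseCopy:
    """A passive copy of one mailbox database on one server (hypothetical model)."""
    server: str
    copied_logs: list = field(default_factory=list)  # logs received, not yet replayed
    replayed_through: int = 0                         # highest generation replayed so far

    def receive(self, generation: int) -> None:
        # Log shipping: the copy stores each log generation it is sent.
        self.copied_logs.append(generation)

    def replay(self) -> None:
        # Log replay: apply received logs strictly in order, stopping at any gap.
        for generation in sorted(self.copied_logs):
            if generation == self.replayed_through + 1:
                self.replayed_through = generation
        # Keep only the logs that could not be replayed yet.
        self.copied_logs = [g for g in self.copied_logs if g > self.replayed_through]


def replicate(active_logs, copies):
    """Ship every log generation from the active copy to each passive copy."""
    for generation in active_logs:
        for copy in copies:
            copy.receive(generation)


copies = [DatabaseCopy("MBX2"), DatabaseCopy("MBX3")]
replicate([1, 2, 3], copies)
for copy in copies:
    copy.replay()
print([(c.server, c.replayed_through) for c in copies])
```

Because each copy tracks its own replay position, one database can have several copies at different stages of replication, which is what operating at the database level rather than the storage group level makes possible.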

Database names for Exchange Server 2010 should be unique within the organization. In situations where a mailbox database has been configured with one or more database copies, the full path for all database copies on all Mailbox servers that host a copy must be identical.
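
The two rules above (unique database names, identical paths for every copy) lend themselves to a simple consistency check. The sketch below assumes a hypothetical inventory of `(guid, name, server, path)` records that you have exported yourself; it does not call any Exchange API, and the case-insensitive comparison is an assumption made for illustration.

```python
def validate_layout(records):
    """Return a list of rule violations found in (guid, name, server, path) records.

    An empty list means the hypothetical inventory satisfies both rules.
    """
    problems = []
    guids_by_name = {}
    paths_by_guid = {}
    for guid, name, server, path in records:
        # Assumption: names and paths are compared case-insensitively.
        guids_by_name.setdefault(name.lower(), set()).add(guid)
        paths_by_guid.setdefault(guid, set()).add(path.lower())
    for name, guids in sorted(guids_by_name.items()):
        if len(guids) > 1:
            problems.append(f"name '{name}' is used by {len(guids)} databases")
    for guid, paths in sorted(paths_by_guid.items()):
        if len(paths) > 1:
            problems.append(f"database {guid} has copies at different paths")
    return problems


records = [
    ("g1", "DB1", "MBX1", r"D:\DB1\DB1.edb"),
    ("g1", "DB1", "MBX2", r"D:\DB1\DB1.edb"),
    ("g2", "DB2", "MBX1", r"D:\DB2\DB2.edb"),
    ("g2", "DB2", "MBX2", r"E:\DB2\DB2.edb"),  # copy on a different drive
]
print(validate_layout(records))
```

Running a check like this before adding a copy highlights servers whose drive layout would not allow the copy to use the same path as the original.
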
A mailbox database copy can be backed up at any point in time using an Exchange-aware, Volume Shadow Copy Service (VSS)-based backup application.

Failures such as disk failures and server failures can affect individual databases. Recovery from such failures is provided by a DAG, which can contain up to 16 Mailbox servers.
A DAG is represented in Active Directory as an object that stores information about the group, including its server membership and its database copies. When a DAG is created, it is initially empty.

When the first server is added to the DAG, a failover cluster is automatically created for the DAG, and the infrastructure that monitors the servers for network or server failures is initiated. The failover cluster heartbeat mechanism and cluster database are then used to track and manage information about the DAG.
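
The DAG life cycle described above (created empty, cluster created when a member is added, at most 16 members) can be summarized in a toy model. The class and attribute names below are hypothetical and serve only to illustrate the rules in the text; they are not part of any Exchange management interface.

```python
class DatabaseAvailabilityGroup:
    """Toy model of DAG membership rules (not an Exchange API)."""

    MAX_MEMBERS = 16  # limit stated in the text

    def __init__(self, name):
        self.name = name
        self.members = []            # a new DAG starts empty
        self.cluster_created = False

    def add_server(self, server):
        if server in self.members:
            raise ValueError(f"{server} is already a member of {self.name}")
        if len(self.members) >= self.MAX_MEMBERS:
            raise ValueError(f"{self.name} already has {self.MAX_MEMBERS} members")
        if not self.members:
            # Adding the first member is what brings the underlying
            # failover cluster into existence.
            self.cluster_created = True
        self.members.append(server)


dag = DatabaseAvailabilityGroup("DAG1")
print(dag.members, dag.cluster_created)
dag.add_server("MBX1")
print(dag.members, dag.cluster_created)
```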

The transaction log streams of the source and target storage groups may diverge. In Exchange Server 2007, incremental reseed helps you correct these discrepancies by using the delayed replay capabilities of lost log resilience (LLR).

However, the Exchange Server 2007 incremental reseed feature cannot correct divergences in the passive copy of a database once divergent logs have been replayed. In that case, a complete reseed is required.

The upgraded version of incremental reseed in Exchange Server 2010 provides automatic correction of divergences in database copies.

This correction occurs after an automatic failover for all configured copies of a database, when a new copy is enabled at a location where database and log files already exist, or when replication resumes after a suspension or a restart of the Microsoft Exchange Replication service.
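
The idea behind incremental reseed, correcting only the divergent part of a copy instead of reseeding everything, can be illustrated with two log streams. The sketch below is a conceptual model under simplifying assumptions: each log is represented by a hypothetical string such as "L3:c", and the functions `first_divergence` and `incremental_resync` are invented for illustration. Exchange's actual mechanism works on log files and database pages and is not reproduced here.

```python
def first_divergence(active_logs, passive_logs):
    """Return the index of the first log at which the passive stream differs
    from the active stream, or None if the passive copy is merely behind."""
    for index, (active, passive) in enumerate(zip(active_logs, passive_logs)):
        if active != passive:
            return index
    if len(passive_logs) > len(active_logs):
        # The passive copy holds logs the active copy never had.
        return len(active_logs)
    return None


def incremental_resync(active_logs, passive_logs):
    """Keep the common prefix and replace only the divergent tail."""
    point = first_divergence(active_logs, passive_logs)
    if point is None:
        point = len(passive_logs)  # nothing diverged, copy the missing tail only
    return passive_logs[:point] + active_logs[point:]


active = ["L1:a", "L2:b", "L3:c", "L4:d"]
passive = ["L1:a", "L2:b", "L3:x"]  # third log diverged
print(first_divergence(active, passive))
print(incremental_resync(active, passive))
```

In this model, only the logs from the divergence point onward are transferred again, whereas a complete reseed would correspond to copying the entire stream.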

Mailbox Availability in Exchange Server 2010
Exchange Server 2010 is designed to address certain challenges with mailbox availability in Exchange Server 2007.

Exchange Server 2007 provides several features such as integrated Setup experience, optimized out-of-box configuration settings, and the ability to manage most aspects of the high availability solution using native Exchange management tools. These features make deploying high availability and site resiliency solutions for Exchange fast and simple. However, Exchange Server 2007 faces certain challenges.
To manage a high availability solution, administrators have to master the concepts of moving network identities and managing cluster resources. Troubleshooting issues with clustered mailbox servers requires both Exchange tools and cluster tools to analyze and correlate logs and events from the Exchange organization and the cluster. At least four Exchange servers are needed to achieve full redundancy of the primary components of a deployment, because only the Mailbox server role can be installed on a node in the cluster. Failover of a clustered mailbox server occurs at the server level. Therefore, when a single database fails, administrators have to either fail over the entire clustered mailbox server to another node in the cluster or leave the users on the failed database offline, possibly for hours, while the database is restored from backup.
Exchange Server 2010 has been designed to overcome these challenges. The CCR and SCR features of Exchange Server 2007 have been combined and enhanced into the database mobility feature. This feature, together with the continuous replication and database copy features, provides continuous mailbox availability. The database mobility feature provides automatic failover protection at the level of the individual mailbox database rather than at the server level. As a result, failover completes in less time than in earlier versions of Exchange. For example, with Exchange Server 2007, failover of a clustered mailbox server in a CCR environment takes about 2 minutes to complete. With Exchange Server 2010, failover of a mailbox database completes in about 20-30 seconds. The combination of database-level failover and significantly shorter failover times considerably improves the overall uptime of the Exchange organization.
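
A rough back-of-the-envelope comparison shows why failing over a single database matters as much as the shorter failover time. The failover durations below come from the figures quoted above (the upper bound of 30 seconds is used for Exchange Server 2010); the number of databases and users per database are hypothetical values chosen only for illustration.

```python
def user_minutes(affected_users, failover_seconds):
    """Total interruption, expressed as users multiplied by minutes offline."""
    return affected_users * failover_seconds / 60


databases = 4             # hypothetical: databases hosted on one server
users_per_database = 500  # hypothetical: mailboxes per database

# Server-level failover: every database on the server moves, about 2 minutes.
server_level = user_minutes(databases * users_per_database, 120)

# Database-level failover: only the failed database moves, about 30 seconds.
database_level = user_minutes(users_per_database, 30)

print(server_level, database_level)
```

Under these assumed numbers, the interruption drops from 4000 user-minutes to 250 user-minutes for a single database failure.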

Although Exchange Server 2007 provides message redundancy through the Transport Dumpster feature, that feature protects against message loss only when a clustered mailbox server fails over. The Shadow Redundancy feature of Exchange Server 2010 provides redundancy for all messages that are in transit. This feature ensures that messages are reliably transmitted to their destinations by delaying the deletion of e-mail messages from the transport database until the transport server verifies that the message has been delivered completely. In addition, truncation of the transport dumpster is based on log copy status: messages are not removed from the dumpster until they have been replicated to all the servers.
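
The two retention rules described above can be expressed as a small toy model: a message is kept until its delivery is verified, and a dumpster entry is removable only once every server has replicated it. The `ShadowQueue` class, the `can_truncate` function, and the server names are hypothetical and only illustrate the logic; they do not reflect the actual transport database or its interfaces.

```python
class ShadowQueue:
    """Toy model of delayed deletion: keep each message until delivery is verified."""

    def __init__(self):
        self.retained = {}  # message id -> message body

    def send(self, message_id, body):
        # Sending does not delete the local copy.
        self.retained[message_id] = body

    def confirm_delivery(self, message_id):
        # Only a verified delivery removes the retained copy.
        self.retained.pop(message_id, None)

    def pending(self):
        # Messages that would need resubmission if the next hop failed now.
        return sorted(self.retained)


def can_truncate(message_generation, replicated_generation_by_server):
    """A dumpster entry is removable only when every server has replicated
    the log generation that contains the message."""
    return all(
        replicated >= message_generation
        for replicated in replicated_generation_by_server.values()
    )


queue = ShadowQueue()
queue.send("msg-1", "hello")
queue.send("msg-2", "world")
queue.confirm_delivery("msg-1")
print(queue.pending())
print(can_truncate(7, {"MBX1": 9, "MBX2": 6}))
print(can_truncate(7, {"MBX1": 9, "MBX2": 7}))
```

In this sketch, "msg-2" stays available for resubmission, and the dumpster entry is held back while MBX2 still lags behind log generation 7.
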
Continuous mailbox availability provides other benefits for organizations and their administrators. It allows multiple server roles to coexist on servers that provide high availability. Organizations can deploy a two-server configuration that provides full redundancy of the mailbox data and, at the same time, redundant Client Access and Hub Transport services. Administrators can create a highly available environment without rebuilding standalone servers as clustered servers. An event stream correlates related events from the operating system and the Exchange organization to help determine the root cause of a failure.
The inter-site continuous mailbox availability solution, besides providing high availability for storage, database, and server failures, provides additional benefits such as site resilience and rapid recovery from datacenter failures.

The following table summarizes the main differences in the way various features are implemented in Exchange Server 2007 and Exchange Server 2010 in order to achieve mailbox availability.