(WO/1998/049620) REDUNDANT SERVER FAILOVER IN NETWORKED ENVIRONMENT
- Note: Text based on automatic Optical
Character Recognition processes. Please
use the PDF version for legal matters
REDUNDANT SERVER FAILOVER IN NETWORKED ENVIRONMENT
1. Technical Field
This invention relates to distributed computing environments and in particular to methods and associated apparatus for activating a standby server upon takeover from a failed server in a redundant server environment with minimal delays imposed in client services and applicable to a wide variety of network topologies.
2. Background Art
Distributed computing environments, as the term is used herein, are those in which client processes request services from server processes. In particular, distributed computing environments are often used in conjunction with network communication media and protocols to enable distribution of the various communicating processes across physically remote nodes and locations. A server process may, for example, be operable in a server computing node while a client process which requests services from the server process may be operable in the same node or in a remote node connected via communication networks.
Computing nodes in a distributed computing environment are connected
to the network communication medium via network adapters or network
interface cards (also referred to herein as NICs). A NIC provides circuits to
receive and transmit data over the network communication medium on behalf
of the computing node in which it is housed. As used herein, such a
computing system may include general purpose computer systems (e.g., host
systems) into which a NIC is inserted as well as peripheral devices with
embedded NIC circuits which attach the peripheral to the network medium. A
server or service process may be, for example, a file server providing
coordinated access to files on behalf of a plurality of client processes, a print
server providing printer functions to a plurality of client processes, or any other
function which provides services on behalf of a client process. A server
Protocols used in network communication are often modeled as layers of modules. The lowest level layers manage the lowest levels of processing required to apply signals to the network communication medium destined to another node. Higher level modules manage correspondingly higher level functions relating to transferring information between two processes (e.g., routing of messages between a client node and a server node or error recovery retransmissions).
At the lowest layers of network communications in many common protocols used in network distributed computing environments (e.g., Ethernet, Token Ring, FDDI, etc.), the NIC has an address associated therewith often referred to as a media access control address or MAC address (also referred to herein as a physical address). This address is programmed into the NIC typically at time of manufacture in accordance with industry standards which help assure a globally unique address is assigned to each NIC. The MAC address is used to uniquely identify and distinguish the communicating nodes on the network medium. Blocks (packets) of information applied to the communication medium typically include address fields which identify the NIC which is the source of the packet and the NIC which is the destination of the packet. The MAC address of the NIC is therefore physical in nature in that it identifies the NIC globally regardless of the particular system in which it is operating.
Higher level protocols (e.g., TCP/IP) utilize other addresses (also
referred to herein as logical addresses) within their portions of the data packet
to identify higher level components of the network. In TCP/IP protocols, for
example, a packet includes a source and destination IP address to identify
the communicating nodes. These addresses are logical in that they may be
In many network protocols, portions of the protocol are defined for exchange of address mapping information so that changes in address mapping can be promulgated throughout the network. In TCP/IP protocol standards, for example, an address resolution protocol (ARP packets) is defined for use in transferring such address mapping information throughout the network. A first node sends out a packet on the network including a higher level logical address (e.g., an IP address in TCP/IP) for which it requires the low level physical address (e.g., NIC MAC address). Nodes in the network typically cache such information after acquiring it a first time to reduce the volume of such network traffic.
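The caching behavior described above can be sketched as follows. This is a minimal illustration only, not an implementation of ARP itself: the class name `ArpCache`, the `query_network` callback, and the dictionary standing in for the network are all hypothetical, and the address labels IP.A, MAC.A, etc. are borrowed from the description.

```python
from typing import Callable, Dict, Optional


class ArpCache:
    """Toy logical-to-physical address cache (hypothetical, illustrative only)."""

    def __init__(self, query_network: Callable[[str], Optional[str]]) -> None:
        # query_network stands in for an address resolution request on the wire.
        self._query_network = query_network
        self._table: Dict[str, str] = {}
        self.queries_sent = 0

    def resolve(self, ip: str) -> Optional[str]:
        # Answer from the cache when possible to avoid extra network traffic.
        if ip in self._table:
            return self._table[ip]
        self.queries_sent += 1
        mac = self._query_network(ip)
        if mac is not None:
            self._table[ip] = mac
        return mac


# A dictionary plays the role of the nodes answering resolution requests.
network = {"IP.A": "MAC.A", "IP.B": "MAC.B"}
cache = ArpCache(network.get)
first = cache.resolve("IP.A")   # goes to the "network"
second = cache.resolve("IP.A")  # served from the cache
```

Only the first lookup generates a query; the second is answered from the cached mapping, which is why a stale entry can persist after the mapping changes.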
It is known in the art to provide redundancy as a means for improving reliability and availability of a computing application. In a client/server distributed environment, redundant server nodes are often utilized to help assure reliable access to the service(s) provided thereby. Typically, one server node or system provides a particular service while its redundant paired server node remains idle (with respect to provision of the same service). The idle second server node takes over the provision of the service when it senses that the first server has failed in some manner. The process of taking over service on behalf of a failed server is also referred to herein as "activating" the idle second server.
In general, it is a problem in such environments to efficiently permit
one server node to take over for a redundant but failed server node with
respect to network communications addressing. For example, in TCP/IP
In other prior techniques applied to TCP/IP networks, the redundant server simply assumes the IP address of the failed server when taking over from the failed server thereby creating a new association (mapping) between an IP address and a lower level MAC address. However, client processes attempting to use the services of the redundant server will continue to use the cached MAC address associated with the failed server's IP address.
Information routed to the failed server at its IP address will therefore not be received by the redundant server because the MAC address used in the transmission is not the MAC address used by the redundant server's NIC. Eventually, the client process node will time out awaiting acknowledgment of information sent to the failed server's IP and MAC address. This timeout (often referred to in TCP/IP protocols as the ARP timeout) will cause the client to request an update of its cached IP/MAC address mapping. ARP messages are exchanged to update the client's ARP cache and processing then continues with the redundant server.
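The delay in this prior technique can be made concrete with a small simulation. The sketch below is an assumption-laden model, not a description of any real protocol stack: the function names and the `retries_before_refresh` parameter are hypothetical, and a set of strings stands in for the addresses on which the redundant server's NIC listens.

```python
from typing import Set


def frame_received(dest_mac: str, nic_listen_addresses: Set[str]) -> bool:
    # A NIC only passes up frames whose destination it is configured to accept.
    return dest_mac in nic_listen_addresses


def attempts_until_service(cached_mac: str, fresh_mac: str,
                           nic_listen_addresses: Set[str],
                           retries_before_refresh: int) -> int:
    """Count transmissions a client makes before one reaches the standby."""
    attempts = 0
    # The client keeps using its cached mapping until it times out.
    for _ in range(retries_before_refresh):
        attempts += 1
        if frame_received(cached_mac, nic_listen_addresses):
            return attempts
    # After the timeout the client refreshes its mapping and tries again.
    attempts += 1
    if frame_received(fresh_mac, nic_listen_addresses):
        return attempts
    raise RuntimeError("service never restored")


# Prior technique: the standby took over IP.A but listens only on MAC.B.
prior_art_attempts = attempts_until_service(
    cached_mac="MAC.A", fresh_mac="MAC.B",
    nic_listen_addresses={"MAC.B"}, retries_before_refresh=3)
```

Under these assumed numbers every transmission made before the refresh is lost, so the client is served only on the attempt that follows its timeout.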
Such problems are not unique to TCP/IP protocol applications. More generally, it is a problem with redundant network servers to take over for one another in a manner which is rapid, transparent to client applications, and applicable to a wide variety of network applications and topologies.
It can be seen from the above discussion that a need exists for an
improved method for activating a standby (redundant) server on a network
when taking over for a failed server. Specifically, a need exists for a method
3. Disclosure of Invention
The present invention solves the above and other problems, thereby advancing the state of the useful arts, by providing methods and associated apparatus for essentially immediate restoration of service to all clients in a wide variety of network topologies and applications. In particular, the methods of the present invention restore service to clients without need for the client processes to await error recovery timeout conditions and associated updates of their address mapping cache (e.g., ARP cache in TCP/IP protocols). The methods of the present invention are applicable in Ethernet as well as ring topologies (e.g., Token Ring and FDDI).
Specifically, methods of the present invention utilize multi-cast features of modern NICs to enable a NIC to receive information addressed to any of a plurality of MAC addresses. In particular, the NIC in a redundant server is configured to actively send and receive information using its own unique MAC address but is additionally configured to receive information addressed to the unique MAC address of the server for which it is the redundant mate. Multi-cast features are generally available on a number of commercially available NICs for enabling a NIC to "listen" to a plurality of MAC addresses. In true multi-cast applications, a multi-cast MAC address is used to broadcast information to a plurality of NICs all configured to receive information packets addressed to the multi-cast address as well as their own unique MAC address. Typically the multi-cast address is programmed into the NIC to provide flexible configuration of the multi-cast addressing.
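The receive filtering described above can be modeled in a few lines. This is a sketch under stated assumptions rather than a driver for any actual NIC: the class `Nic` and its methods `listen_also`, `accepts`, and `source_address` are hypothetical names chosen for illustration.

```python
class Nic:
    """Toy model of a NIC with one transmit address and extra receive addresses."""

    def __init__(self, own_mac: str) -> None:
        self.own_mac = own_mac
        # Additional addresses programmed into the (modeled) multi-cast filter.
        self._extra_receive = set()

    def listen_also(self, mac: str) -> None:
        # Program the filter to accept frames addressed to a second MAC address.
        self._extra_receive.add(mac)

    def accepts(self, dest_mac: str) -> bool:
        return dest_mac == self.own_mac or dest_mac in self._extra_receive

    def source_address(self) -> str:
        # Transmissions always carry the NIC's own unique address.
        return self.own_mac


nic_b = Nic("MAC.B")
nic_b.listen_also("MAC.A")  # the paired server's unique address
```

In this model the NIC accepts frames for either address while every frame it sends still carries MAC.B as the source, which mirrors the separation of receive and transmit addressing the paragraph describes.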
The multi-cast address feature of the redundant server's NIC is programmed to receive packets addressed to the address of its paired server.
When the redundant server determines that it must take over provision of the
This feature of the present invention helps alleviate delay problems inherent in prior solutions. Other clients, depending upon the system architecture of the client, may not update their ARP cache. Rather, they will continue to transmit messages to the physical address associated with the failed system.
However, the redundant server will receive such messages and respond appropriately as above.
In addition, methods of the present invention are applicable to a wider variety of network topologies than are previous techniques. Specifically, the methods of the present invention are applicable to, for example, Ethernet, Token Ring, and FDDI network topologies. Whereas prior techniques are inapplicable to Token Ring and FDDI (as well as other ring topologies), the methods of the present invention utilize the preconfigured MAC address uniquely assigned to the redundant server's NIC when responding to requests directed to the failed server's IP address.
Preferably, the redundant servers operate in a symmetric manner in
that each may serve as a standby for the other. The definition of standby
and primary server, as used herein, is therefore a relative one which describes
a role and mode of operation rather than necessarily a physical designation of
the servers. Specifically, each server of a pair of redundant servers acts in
the role of primary server with regard to services it provides to particular
clients. Simultaneously, each server of the redundant pair of servers acts as
In a first embodiment of the present invention, both redundant servers continually monitor the physical address of the other so as to maintain a synchronized state in case a take over is required. The standby server in such an embodiment may therefore assume the identity of the failed server with complete knowledge of the present state of the failed server. Such an embodiment is preferred where the network protocols in use between the clients and servers are state based rather than state-less.
In the preferred embodiment where state-less protocols such as network file system protocols (NFS) are used, a standby server need not constantly monitor messages addressed to the primary server. Rather, the standby server takes over from the failed primary server when the primary server is determined to have failed. At the time of such take over processing the standby server assumes the identity of the primary server. This relieves the standby server from the overhead of receiving messages destined for the primary server during normal operation which are ignored (other than maintaining state synchronization with the primary server as above).
It is therefore an object of the present invention to provide methods and associated apparatus for activating a standby server in a redundant server network.
It is another object of the present invention to provide methods and associated apparatus for activating a standby server in a redundant server network having any of a number of topologies including Ethernet and ring topologies.
It is a further object of the present invention to provide methods and associated apparatus for activating a standby server in a redundant server network while minimizing delays in service to clients.
It is still another object of the present invention to provide methods and
associated apparatus for activating a standby server by assuming the
It is still a further object of the present invention to provide methods and associated apparatus for activating a standby server by assuming the physical address identity of a redundant primary server when the primary server fails and for maintaining a synchronized state between the primary and standby servers for requests processed by the primary server.
It is yet a further object of the present invention to provide a method and associated apparatus for activating a standby server in a redundant server network while minimizing delays in service to clients due to take over of a failed server by the standby server.
The above and other objects, aspects, features and advantages of the present invention will become apparent from the following detailed description and the attached drawings.
4. Brief Description of the Drawings
FIG. 1 is a block diagram of a redundant server network environment
operable in accordance with known techniques devoid of the methods and
apparatus of the present invention;
FIG. 2 is a block diagram of a redundant server environment operable
in accordance with the improved methods and apparatus of the present
invention;
FIG. 3 is a flowchart describing the operation of a redundant server
while operable in a standby mode and while operable to take over from a
failed server for a state based protocol requiring synchronization between the
servers; and
FIG. 4 is a flowchart describing the operation of a redundant server
while operable in a standby mode and while operable to take over from a
failed server for a state-less protocol requiring no synchronization between
the servers.
5. Detailed Description of the Preferred Embodiments
While the invention is susceptible to various modifications and alternative forms, a specific embodiment thereof has been shown by way of example in the drawings and will herein be described in detail. It should be understood, however, that it is not intended to limit the invention to the particular form disclosed, but on the contrary, the invention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the invention as defined by the appended claims.
FIG. 1 is a block diagram depicting a typical networked client/server distributed computing application 100. As noted above, client/server model computing environments are applicable to networked as well as non-networked computing environments. In general, client processes request and receive services by communicating with server processes. The various client and server processes may be operable within a single, stand-alone, non-networked computing environment or may be distributed over a plurality of networked computing nodes. Though the client/server distributed computing model covers such a broad range of computing environments, the problems addressed by the present invention stem from configurations in which redundant server processes are each operable on distinct computing nodes and a client process is operable on yet another computing node.
In FIG. 1, a service is provided by system A 102 (also referred to herein as a server or server node) to client X 106 via communications over network 108. System B 104 (also referred to herein as a standby server or a redundant server) stands ready to take over provision of the service from server 102 in response to a failure detected in server 102. All nodes (102, 104, and 106) are attached to the network 108 via network interface circuits 112, 114, and 116 (also referred to herein as NICs), respectively.
As noted above, it is common in many network protocols to associate a low level (physical) address with each device (NIC) attached to a network.
NIC 112 is associated with a physical address (MAC address) of MAC.A, NIC
Service requests generated by client 106 include source and
destination logical addresses indicating that the service request is from client
106 by its logical address IP.X and is destined to server 102 by its logical
address
Server 104 is idle (with respect to the particular service nominally
provided by server 102) awaiting an indication that server 102 has failed in
some manner. Such indications are provided and sensed by well known
means outside the scope of the present invention. For example, servers 102
and 104 may periodically communicate to determine each other's respective
operational status. Failure to appropriately respond to such an inquiry may
suffice as an indication that the system failing to respond has failed and that
the redundant server should take over control of the service nominally
Redundant server 104 takes over provision of the services nominally provided by server 102 when it senses a failure in the operation of server 102.
In accordance with prior methods, redundant server 104 may adopt the
identity of failed server 102 by assuming both the logical and physical
address identity
In accordance with other known techniques, redundant server 104 may
adopt only the logical address of failed server 102 (e.g.,
Eventually client 106 will detect an error in transmission of a packet to
the physical address of failed server 102. For example, a timeout condition is
likely to occur after sufficient time has passed while client 106 waits for a
response to its request. In response to such an error client 106 may invoke
protocol messaging features to inquire as to a new mapping of the logical
address of server 102 to a new physical address. In TCP/IP protocols such
messaging is referred to as address resolution protocol or ARP. ARP
messages are generated by client 106 in response to an error condition in
communicating with server 102 and redundant server 104 responds with a
Client 106 includes ARP cache 120 wherein it stores address mapping information received in its initial configuration and as modified over time by address resolution protocol message exchange over network 108. Client 106 relies on the information stored in ARP cache 120 to avoid generating excessive ARP message traffic on network 108, which would consume available bandwidth. Information in ARP cache 120 is updated when new mapping information is made available to client 106 or at certain time based events (e.g., timeouts corresponding to error conditions in network message traffic).
As noted elsewhere, these delays may be undesirable in certain high performance distributed applications. During these delay periods the services nominally provided by failed server 102 are simply unavailable to client 106.
Although redundant server 104 can assume the logical address identity of failed server 102, client 106 remains unaware of the address mapping change until the ARP cache 120 is updated. Therefore, redundant server 104 receives none of the requests from client 106 and hence can do nothing to process such requests.
These and other problems of prior redundant network server take over
techniques are solved by the techniques of the present invention. FIG. 2 is a
block diagram of a distributed computing networked application 200 similar in
many respects to that of FIG. 1 but operable in accordance with the methods
of the present invention. In particular, each server 202 and 204 is operable to
take over operation of the other in case of failure. Each server 202 and 204
is configured with its own logical address, IP.A and
Each NIC 212 and 214 is configured with its own unique physical address,
MAC.A and MAC.B, respectively. These nominal addresses are used in
normal operation to identify each server and its associated service in
communication over network 208 with clients 106.
In addition to their respective nominal logical and physical addresses,
each server 202 and 204 is configured to remember the logical address of the
alternate server, namely IP.B and
When server 204 detects a failure in the operation of server 202 it takes over processing of requests destined to server 202. Server 204 assumes the logical address identity (e.g., IP.A) of the failed server 202.
Since server 204 was monitoring requests destined to failed server 202 it may immediately process any outstanding request and return an appropriate response to requesting client 106.
The response from redundant server 204 to client 106 is generated
with the assumed logical address identity of failed server 202 (e.g., IP.A) and
the physical address of NIC 214 in redundant server 204 (e.g., MAC.B). As
noted above, depending upon the internal architecture of client 106, client
106 may immediately recognize in the response packet that the mapping of a
logical address for server 202 (e.g., IP.A) to a physical address has changed
and should now be directed to the new physical address received as the
source physical address in the response, namely MAC.B, the physical address
of redundant server 204. Further requests to logical address IP.A are
therefore transferred via network 208 to server 204 rather than failed server
202. Other client architectures may not update their address mapping
information based upon the response message. Rather, such clients will
However, redundant (standby) server 204 receives, processes, and responds to such messages as described above during such take over mode processing of requests on behalf of the failed server.
Client 106 receives its responses essentially without delay under this method of the present invention because redundant server 204 is prepared to respond to any monitored message destined to server 202 in response to detecting a failure in operation of server 202. This method of the present invention is usable in a wider variety of network topologies such as ring networks because transmissions from server 204 via its NIC 214 use the preconfigured nominal physical address associated therewith, namely MAC.B.
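The address handling in the take over response can be sketched as follows. The sketch assumes a client architecture that does update its mapping from the source addresses of a response, which the description notes is not true of all clients. The `Packet` type and both function names are hypothetical, as is the client's physical address label MAC.X.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Packet:
    src_ip: str
    src_mac: str
    dst_ip: str
    dst_mac: str


def takeover_response(request: Packet, standby_mac: str) -> Packet:
    # Reply with the failed server's logical address (the request's destination)
    # but the standby's own nominal physical address as the source.
    return Packet(src_ip=request.dst_ip, src_mac=standby_mac,
                  dst_ip=request.src_ip, dst_mac=request.src_mac)


def learn_from_response(arp_cache: Dict[str, str], response: Packet) -> None:
    # Assumed client behavior: refresh the mapping from the response's source.
    arp_cache[response.src_ip] = response.src_mac


client_cache = {"IP.A": "MAC.A"}
request = Packet(src_ip="IP.X", src_mac="MAC.X", dst_ip="IP.A", dst_mac="MAC.A")
response = takeover_response(request, standby_mac="MAC.B")
learn_from_response(client_cache, response)
```

After the single response, the modeled client maps IP.A to MAC.B, so later requests go directly to the standby without waiting for a timeout. Because the response never uses MAC.A as a source address, no node transmits with an address other than its own.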
Redundant server 204 continues to process requests addressed to the logical address of failed server 202 until server 204 detects that server 202 is again operable and prepared to resume normal processing of service requests. When so sensed as operable, server 202 will begin processing requests directed to its original logical address (IP.A) in a manner analogous to the take over processing above. Specifically, server 202 will monitor messages sent to its logical address but to the physical address of server 204 (MAC.B). Server 202 will process the next such request (in cooperation with redundant server 204). The response generated by such processing will use the logical address of server 202 (IP.A) and its nominal physical address (MAC.A). As above, client 106 will immediately detect the change in logical to physical address mapping reflected in the response received and update its ARP cache 120 accordingly.
Those skilled in the art will recognize that the operation of servers 202
and 204 is essentially symmetric. Each acts as a redundant service provider
for the other. Each performs a take over process when it senses the other
has failed. Each permits the other to resume processing of requests by
All address changes required for the take over and reversion processing between the two servers are performed as described above such that all information transmitted from a server uses the nominal physical address for that server. This feature allows the method of the present invention to be applied to ring topology networks such as Token Ring and FDDI in addition to Ethernet bus topologies. In addition, client 106 is notified immediately of the address mapping changes required in its ARP cache 120 by the first response returned from the redundant server 204 following its take over from server 202. This feature eliminates delays incurred by previous techniques wherein the client would be unaware of the address change until potentially lengthy timeout delays when the client would naturally seek to update ARP cache information (e.g., in response to transmission and retry timeout errors).
FIG. 3 is a flowchart describing the methods of the present invention
operable within server 202 and 204 to perform the take over processing as
described with respect to FIG. 2 above. As noted above, the processing
within each of a pair of redundant servers is (typically) symmetric. Each
server provides its own particular services and serves as a redundant server
to take over provision of services by the other in case of failure. The
processing described in FIG. 3 is therefore applicable to either server. In
particular, the method is preferably operable simultaneously within both
servers 202 and 204 of FIG. 2 wherein each provides its own services to
requesting clients and also provides a standby (redundant) service for the
services of the other in case of failure. Those skilled in the art will recognize
that the methods described herein are equally applicable if only one of the
pair of servers 202 and 204 performs the role of redundant server. Similarly,
the methods described herein may be trivially extended by those skilled in the
art to encompass more than two redundant servers. Any number of
redundant servers may be operable in accordance with the methods of the
FIG. 3 describes a method of the present invention most applicable to
state based network protocols in which messages are potentially
interdependent upon one another. Exemplary of such a state based protocol
is the
The method of FIG. 3 is required where state based protocols are applied to the network but is also operable in applications where state-less protocols are utilized.
Element 300 is first operable to determine the physical address
(MAC.B) for the alternate system B. This may be determined by any form of
communication between system A and system B. For example, the two
servers may exchange configuration information at initialization via the
common network 208. Alternatively, there may be a dedicated
communication channel between systems A and B for purposes of such
Element 304 then provides configuration information to system A's NIC to enable the NIC to receive information on either of the two physical addresses associated with the redundant pair of servers (MAC.A or MAC.B).
As noted, standard multi-cast features of most commercially available NICs may be used to receive information on either of two physical addresses. One such exemplary NIC device utilized in the preferred embodiment of the present invention is the Digital Semiconductor 21140A PCI Fast Ethernet LAN Controller made by Digital Equipment Corporation (DEC) of Maynard, Massachusetts. This integrated circuit provides high speed Ethernet communications including programmable multi-cast features to enable operation of the methods of the present invention.
The method of the present invention as shown in FIG. 3 then enters a normal processing mode awaiting failure of system B which, in turn, initiates a take over process. Specifically, element 306 is operable to determine if such a failure indication from system B has been sensed. As noted above, any of several well known methods for detecting a failure may be utilized in conjunction with the methods of the present invention. For example, so-called watchdog timer messages may be exchanged between the redundant servers on a frequent periodic basis. Failure to receive such a watchdog message from the alternate system is an indication that a take over process is required.
Such messages may be exchanged via the common network 208 of FIG. 2 or over a dedicated communication channel among the redundant servers.
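One way such a watchdog check might be structured is sketched below. The description leaves the detection method open, so everything here is an assumption made for illustration: the class name `Watchdog`, its methods, and the two-second timeout are hypothetical. Time is passed in explicitly so the example is deterministic.

```python
class Watchdog:
    """Tracks the last watchdog message seen from the alternate system."""

    def __init__(self, timeout_s: float, now: float = 0.0) -> None:
        self.timeout_s = timeout_s
        self.last_seen = now

    def heartbeat(self, now: float) -> None:
        # Called whenever a watchdog message arrives from the alternate system.
        self.last_seen = now

    def peer_failed(self, now: float) -> bool:
        # The peer is presumed failed once it has been silent past the timeout.
        return (now - self.last_seen) > self.timeout_s


wd = Watchdog(timeout_s=2.0)       # assumed timeout, for illustration only
wd.heartbeat(now=1.0)
alive_at_2 = wd.peer_failed(now=2.5)   # 1.5 s of silence: still considered alive
failed_at_4 = wd.peer_failed(now=4.0)  # 3.0 s of silence: take over indicated
```

The check corresponds to the decision made by element 306: as long as `peer_failed` is false, normal processing continues.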
So long as element 306 does not sense such a failure indication,
elements 308 and 310 are operable to process normal requests received by
system A through its NIC via network 208 addressed to its nominal logical
and physical address (e.g., IP.A and MAC.A). Specifically, element 308
Element 310 ensures that information received via system A's NIC destined for system B's physical address (MAC.B) is processed to the extent necessary to synchronize with the state processing of the network protocol in use. Beyond such synchronization, the message is otherwise ignored in the sense that no reply is generated or transmitted to the requesting client by the standby server. Depending on the computing environment (e.g., operating system features and networking features), element 310 may represent proactive steps to assure such packets are ignored or discarded within system A. In other environments, such messages may be discarded at lower levels of the computing environment (e.g., at the NIC or within various layers of the network processing modules, i.e., the protocol stack). Essentially, element 310 is shown to stress that packets received by system A which are destined for system B are processed differently than packets destined to system A during normal processing so that system B may process such requests in its normal course of processing. Normal processing continues iteratively with elements 306-310 until element 306 senses an indication of failure in the alternate system B.
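The distinction drawn between elements 308 and 310 can be sketched as a simple dispatch on the destination physical address. This is an illustrative model only: the function name, the list used as shadow state, and the string replies are hypothetical stand-ins for whatever protocol processing an actual server performs.

```python
from typing import List, Optional


def handle_normal_mode(dest_mac: str, request: str, own_mac: str,
                       peer_mac: str, peer_state: List[str]) -> Optional[str]:
    """Normal-mode handling in system A; returns a reply or None."""
    if dest_mac == own_mac:
        # Element 308: a request for this server is processed and answered.
        return "reply:" + request
    if dest_mac == peer_mac:
        # Element 310: track the peer's protocol state, but send no reply.
        peer_state.append(request)
        return None
    # Anything else is not for this server pair and is dropped.
    return None


shadow_state: List[str] = []
own_reply = handle_normal_mode("MAC.A", "read", "MAC.A", "MAC.B", shadow_state)
peer_reply = handle_normal_mode("MAC.B", "open", "MAC.A", "MAC.B", shadow_state)
```

Requests addressed to MAC.B update the shadow state without producing a reply, so system B remains the only responder for its own clients during normal operation.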
Responsive to sensing of a failure by element 306, elements 312-316 are iteratively operable to perform redundant mode processing within system A, thereby processing requests destined to the failed system B as well as those destined to system A for normal processing. Specifically, element 312 is operable in like manner to element 308 to process any requests received by system A's NIC which are addressed to system A for normal processing.
Such requests are processed by performing the requisite processing and
returning any required messages to the requesting client. Such requests and
responses use the nominal logical and physical addresses of system
Element 312 is also operable to process requests received on system
A's physical address (MAC.A) but directed to system B's logical address
Requests generated by clients following such adjustments to their address
mapping tables are directed to system B's logical address (IP.B) but system
A's physical address (MAC.A). Such requests are processed on behalf of the
failed system B as though they were normally directed for processing to
system A. As noted elsewhere, some clients may not adjust their respective
address mapping tables. Such clients will continue to address requests to the
address of failed system B
Element 314 is operable within system A to process requests destined to system B and normally ignored by element 310 during normal processing.
During redundant processing, element 314 receives and processes requests
addressed to the logical address of failed system B
When element 316 determines that system B has been restored to an
operational status, processing returns to element 306 to continue processing
requests in normal mode wherein requests directed to system B are
processed by system B. To achieve the switch back to normal processing
within restored system B, system B must be prepared to receive a request
addressed to its logical address (IP.B) but still addressed to the physical
address of system A (MAC.A) due to the address mapping changes induced
by system A's response to a client's request. Restoration of normal
processing is therefore analogous to the processing shown in FIG. 3 to
perform an initial take over of operations for a failed system. Specifically,
system B will receive requests addressed to its logical address (IP.B) with a
physical address of either system A or B (MAC.A or MAC.B). Responses
generated to processing of such requests will utilize the nominal logical and
physical addresses of system B
FIG. 4 is a flowchart describing a preferred embodiment of the methods of the present invention operable within server 202 and 204 to perform the take over processing as described with respect to FIG. 2 above.
As noted above, the processing within each of a pair of redundant servers is
(typically) symmetric. Each server provides its own particular services and
serves as a redundant server to take over provision of services by the other in
Similarly, the methods described herein may be trivially extended by those
skilled in the art to encompass more than two redundant servers. Any
number of redundant servers may be operable in accordance with the
methods of the present invention by appropriately coordinating among the
redundant servers which of them will assume the identity of the failed server. FIG. 4
describes the processing of the methods of the present invention with respect
to a pair of servers for simplicity of presentation. Specifically, FIG. 4 describes
the operation of the methods of the present invention from the perspective of
system A (server 202 of
FIG. 4 describes a preferred embodiment of a method of the present invention wherein exclusively state-less protocols are used in communicating over the network. Exemplary of such a state-less protocol is the network file system (NFS) networking protocol used for file sharing among computers.
Under this protocol a newly received message requires no state information derived from processing of preceding messages. In such state-less protocol applications, the standby server need not monitor message traffic destined to the primary server during normal processing by the primary server.
Specifically, in the preferred method of FIG. 4, each server receives packets
addressed only to its nominal physical and logical address during normal
operation. Only when a failure is sensed in the primary server (system B as
described in FIG. 4) does the standby server (system A) reconfigure its NIC to
receive packets destined to either physical address (MAC.A or MAC.B).
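The receive behavior described above may be illustrated by a minimal sketch, offered only as an aid to understanding and not as the claimed implementation. The class name Nic and its methods are hypothetical, and MAC.A and MAC.B are the symbolic labels used in this description rather than real hardware addresses.

```python
# Hypothetical model of a NIC receive filter; names are illustrative only.
MAC_A = "MAC.A"  # nominal physical address of system A (standby)
MAC_B = "MAC.B"  # nominal physical address of system B (primary)


class Nic:
    def __init__(self, own_mac):
        self.own_mac = own_mac
        # Normal mode: accept only frames sent to the nominal address.
        self.accepted = {own_mac}

    def enable_takeover(self, peer_mac):
        # Redundant mode: also accept frames sent to the failed peer.
        self.accepted.add(peer_mac)

    def restore_normal(self):
        # Back to normal mode: accept only the nominal address again.
        self.accepted = {self.own_mac}

    def accepts(self, dest_mac):
        return dest_mac in self.accepted
```

In this sketch, system A would call enable_takeover(MAC_B) only after a failure of system B is sensed, so that no traffic destined to system B is examined during normal operation.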
Element 402 is first operable to logically bind or associate the logical address (IP.A) for system A to the physical address for system A's NIC (MAC.A). This binding is a logical configuration which establishes the logical to physical address mapping for information to be received or transmitted by system A via network 208. Element 404 then provides configuration information to system A's NIC to enable the NIC to receive information on its nominally assigned physical address (MAC.A).
The method of the present invention as shown in FIG. 4 then enters a normal processing mode awaiting a failure of system B which, in turn, initiates a take over process. Specifically, element 406 is operable to determine whether such failure indicia from system B have been sensed. As noted above, any of several well known methods for detecting a failure may be utilized in conjunction with the methods of the present invention. For example, so-called watchdog timer messages (also referred to as heartbeat messages) may be exchanged between the redundant servers on a frequent periodic basis.
Failure to receive such a watchdog message from the alternate system is an indication that a take over process is required. Such messages may be exchanged via the common network 208 of FIG. 2 or over a dedicated communication channel among the redundant controllers.
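One possible form of such watchdog monitoring is sketched below. The sketch assumes a fixed timeout and assumes that timestamps are supplied by the caller; the class name HeartbeatMonitor, its methods, and the timeout value are hypothetical and are not taken from this description.

```python
# Hypothetical heartbeat (watchdog) monitor; all names and values are
# illustrative assumptions, not part of the described method.
class HeartbeatMonitor:
    def __init__(self, timeout_s, start):
        self.timeout_s = timeout_s  # assumed maximum gap between heartbeats
        self.last_seen = start      # time of the most recent heartbeat

    def record_heartbeat(self, now):
        # Called whenever a watchdog message arrives from the peer.
        self.last_seen = now

    def peer_failed(self, now):
        # The peer is presumed failed once no heartbeat has been
        # received for longer than the timeout.
        return (now - self.last_seen) > self.timeout_s
```

Because a received heartbeat resets the timer, the same monitor could also serve to indicate that a previously failed system has resumed sending watchdog messages.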
So long as element 406 does not sense such a failure indication, element 408 is operable to process normal requests received by system A through its NIC via network 208 addressed to its nominal logical and physical address (e.g., IP.A and MAC.A). Specifically, element 408 processes normal requests destined for system A by performing the requested processing and returning normal responses resulting therefrom. Processing then continues iteratively performing elements 406 and 408 until a failure is sensed in system B.
Responsive to sensing of a failure by element 406, elements 410-416
are iteratively operable to perform redundant mode processing within system A,
thereby processing requests destined to the failed system B as well as those
destined to system A for normal processing. Specifically, element 410 and
Element 416 is then operable to detect whether it is now possible to return to normal processing mode. Specifically, element 416 determines whether indicia of a return to normal processing by failed system B have been sensed. As noted above, watchdog timer messaging may provide both an indication of a failure as well as an indication of a return to operational status in a previously failed system. Those skilled in the art will recognize several equivalent methods for sensing indicia of restoration of a failed system to an operational state. If element 416 does not sense such an indication, received messages continue to be processed by iterative performance of element 414.
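The alternation between normal mode and redundant mode may be summarized, purely for illustration, as a two-state transition function. The names Mode and next_mode are hypothetical, and the sketch assumes that the failure and restoration indications are supplied as boolean inputs by whatever sensing method is employed.

```python
from enum import Enum


class Mode(Enum):
    NORMAL = "normal"        # system A serves only its own requests
    REDUNDANT = "redundant"  # system A also serves requests for system B


def next_mode(mode, peer_failed, peer_restored):
    """Return the mode for the next iteration (hypothetical helper)."""
    if mode is Mode.NORMAL and peer_failed:
        # A failure of system B has been sensed: begin take over.
        return Mode.REDUNDANT
    if mode is Mode.REDUNDANT and peer_restored:
        # Restoration of system B has been sensed: resume normal mode.
        return Mode.NORMAL
    # Otherwise keep processing in the current mode.
    return mode
```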
When element 416 determines that system B has been restored to an
operational status, processing returns to element 404 to continue processing
As noted above, a key feature of both methods described above with respect to FIGS. 3 and 4 is that replies returned from the standby server when performing requests on behalf of the failed primary server are returned with a physical address corresponding to the nominal physical address of the standby server. This key feature enables the methods of the present invention to be utilized with a wider range of network topologies and media than was possible with prior approaches.
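The effect of this reply addressing on a client may be illustrated by the following sketch. It assumes that the reply carries the logical address of the failed server (IP.B) as its logical source, consistent with clients subsequently mapping IP.B to MAC.A, and it models the client's address mapping table as a simple dictionary. The names Frame, build_reply, and update_mapping, and the client labels IP.C and MAC.C, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    src_mac: str
    dst_mac: str
    src_ip: str
    dst_ip: str


def build_reply(request, standby_mac):
    # Reply on behalf of the failed server: the logical source is the
    # address the client requested (assumed), while the physical source
    # is the standby server's own nominal physical address.
    return Frame(src_mac=standby_mac, dst_mac=request.src_mac,
                 src_ip=request.dst_ip, dst_ip=request.src_ip)


def update_mapping(table, reply):
    # A client that adjusts its mapping table learns the new
    # logical-to-physical pairing from the reply it receives.
    table[reply.src_ip] = reply.src_mac
```

For example, a client holding the mapping IP.B to MAC.B that receives such a reply from system A would, if it adjusts its table, thereafter address requests for IP.B to MAC.A.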
While the invention is susceptible to various modifications and alternative forms, a specific embodiment thereof has been shown by way of example in the drawing and has been described in detail. It should be understood, however, that it is not intended to limit the invention to the particular form disclosed, but on the contrary, the invention is intended to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the invention as defined by the appended claims.