Abstract

Exploiting advances in intrusion detection, alarm correlation and fault-tolerant computing, we present a middleware architecture for building a large-scale distributed data-management system that is self-healing and self-protective. The architecture has two related aspects that are novel relative to the state of the art. Tolerance to intrusions and faults is generally provided under the assumption that failures never exceed a pre-defined threshold, and this assumption is justified by design-stage provisions, e.g., replica diversity. The autonomic system architected here instead strives to proactively maintain this failure threshold through intrusion detection at the node level and replicated alarm correlation at the system level. The requirement for the latter leads to the construction of intrusion-aware signal-on-fail nodes, so that the well-known Fischer-Lynch-Paterson (FLP) impossibility result does not hinder the timely construction of a consistent snapshot of the alarms raised by the nodes.

Against Attacks and Faults: an Autonomic Approach to Secure and Reliable Data Management
Ezhilchelvan, P. and Maxion, R.
In Workshop on Dependable Distributed Data Management, October 17, 2004, Florianopolis, Brazil. In conjunction with the 23rd International Symposium on Reliable Distributed Systems.
pp. 51-56
IEEE Computer Society, 2004