Archive for September, 2008

23 Sep 08

Axon Monitor: a real-time monitoring system

No matter how reliable the software you build, and whatever steps developers take to keep applications reliable and stable, errors and problem situations can always occur.
What happens next depends on the application domain: the world can easily survive a failure in your photo gallery management tool, but a failure in the control system of a nuclear power plant or in an exchange server can have rather unpleasant consequences.

Errors will happen; this has to be accepted as an axiom.
Hard disks and memory fail, network equipment and communication links go down, disk partitions fill up, and applications crash on incorrect input data. The list could go on forever.
What matters is being able to react in time when such abnormal situations arise.

Most of our clients are financial institutions.
Our experience with them led us to create Axon Monitor, a tool that provides a centralized console for collecting and processing status information from various sources.
From a single console, a technical support engineer can track the state of a large number of services running on various servers and computers.

Architecture

At the center of Axon Monitor is a data transfer bus, which in general is a message exchange server.

[Figure: Axon Monitor architecture]

Many existing solutions already cope with the data transfer problem quite successfully.
Our experiments allowed us to choose what we consider the best of them.

The server's job is to transfer messages from applications to the central console of the monitoring system.
Note that more than one console can be connected, and each can perform its own tasks.
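To make the bus-and-consoles idea concrete, here is a minimal sketch in Python. It is not Axon Monitor code: `InMemoryBus`, `Console` and `StatusMessage` are hypothetical names, and the bus is a toy in-process stand-in for a real message exchange server. It only shows how one published message can reach several consoles, each of which keeps just the message types it is interested in.

```python
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass(frozen=True)
class StatusMessage:
    source: str  # hypothetical field: name of the reporting application
    kind: str    # one of the message types, e.g. "ERROR"
    text: str


class InMemoryBus:
    """Toy stand-in for the message exchange server (not a real broker)."""

    def __init__(self) -> None:
        self._handlers: List[Callable[[StatusMessage], None]] = []

    def subscribe(self, handler: Callable[[StatusMessage], None]) -> None:
        self._handlers.append(handler)

    def publish(self, message: StatusMessage) -> None:
        # Every connected console sees every message (fan-out).
        for handler in self._handlers:
            handler(message)


class Console:
    """A console that keeps only the message kinds it cares about."""

    def __init__(self, name: str, kinds: Set[str]) -> None:
        self.name = name
        self.kinds = kinds
        self.received: List[StatusMessage] = []

    def handle(self, message: StatusMessage) -> None:
        if message.kind in self.kinds:
            self.received.append(message)


bus = InMemoryBus()
support = Console("support", {"INFORMATION", "ERROR", "CRITICAL_ERROR"})
paging = Console("paging", {"CRITICAL_ERROR"})
bus.subscribe(support.handle)
bus.subscribe(paging.handle)

bus.publish(StatusMessage("billing", "ERROR", "queue is backing up"))
bus.publish(StatusMessage("billing", "CRITICAL_ERROR", "database unreachable"))
```

In this sketch the support console ends up with both messages, while the paging console keeps only the critical one.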

Using the configuration utilities, you can define reactions to events of a certain type, e.g. send an SMS to manager Y if value X exceeds some threshold.
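A reaction of this kind can be pictured as a small rule object. The sketch below is only an illustration in Python: `ThresholdRule`, the metric name and `fake_sms` are hypothetical, and no real SMS gateway or Axon Monitor configuration format is implied.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class ThresholdRule:
    metric: str                         # hypothetical metric name ("value X")
    threshold: float
    recipient: str                      # e.g. "manager Y"
    notify: Callable[[str, str], None]  # delivery function, e.g. an SMS sender

    def check(self, metric: str, value: float) -> bool:
        """Fire the notification if the value exceeds the threshold."""
        if metric == self.metric and value > self.threshold:
            self.notify(self.recipient, f"{metric} = {value} exceeded {self.threshold}")
            return True
        return False


# Stand-in for an SMS gateway: it only records what would have been sent.
sent: List[Tuple[str, str]] = []


def fake_sms(recipient: str, text: str) -> None:
    sent.append((recipient, text))


rule = ThresholdRule("disk_used_percent", 90.0, "manager Y", fake_sms)
rule.check("disk_used_percent", 72.5)  # below the threshold: nothing happens
rule.check("disk_used_percent", 95.0)  # above the threshold: one message recorded
```

Passing the delivery function in as a parameter keeps the rule independent of the notification channel, so the same rule could drive an e-mail or Jabber alert instead.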

Basic features

  • Ability to create groups of users (e.g. support engineers, managers)
  • Schedule management: individual users or groups of users can be bound to time intervals
  • Support for several types of messages from remote/monitored systems: HEARTBEAT, INFORMATION, ERROR, CRITICAL_ERROR
  • Several notification methods:
    1. On central console
    2. Sound alert from central console
    3. E-mail alert
    4. SMS alert
    5. Alert to an IM client over the Jabber protocol
  • Support for several types of message bus:
    1. Spread
    2. RabbitMQ
    3. ApacheMQ
  • Embedded language that lets you create your own callbacks on alerts
  • Standard set of alerts
  • Easy-to-use API that lets you add support for your own services quickly
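The way user groups, schedules and message types might fit together can be sketched as follows. This is an assumption-laden illustration in Python, not the product's API: `Shift`, `on_duty`, `recipients`, the user names and the routing policy (errors go to support engineers, critical errors also go to managers, heartbeats and informational messages alert nobody) are all made up for the example. Only the four message type names come from the list above.

```python
from dataclasses import dataclass
from datetime import time
from enum import Enum
from typing import List


class MessageType(Enum):
    HEARTBEAT = "HEARTBEAT"
    INFORMATION = "INFORMATION"
    ERROR = "ERROR"
    CRITICAL_ERROR = "CRITICAL_ERROR"


@dataclass(frozen=True)
class Shift:
    """Binds one user of a group to a daily time interval."""
    user: str
    group: str
    start: time
    end: time  # exclusive; a shift may wrap past midnight

    def covers(self, at: time) -> bool:
        if self.start <= self.end:
            return self.start <= at < self.end
        return at >= self.start or at < self.end  # overnight shift


def on_duty(shifts: List[Shift], group: str, at: time) -> List[str]:
    return [s.user for s in shifts if s.group == group and s.covers(at)]


def recipients(shifts: List[Shift], kind: MessageType, at: time) -> List[str]:
    # Assumed policy, for illustration only.
    if kind in (MessageType.HEARTBEAT, MessageType.INFORMATION):
        return []
    names = on_duty(shifts, "support engineers", at)
    if kind is MessageType.CRITICAL_ERROR:
        names += on_duty(shifts, "managers", at)
    return names


SHIFTS = [
    Shift("alice", "support engineers", time(8, 0), time(20, 0)),
    Shift("bob", "support engineers", time(20, 0), time(8, 0)),
    Shift("carol", "managers", time(9, 0), time(18, 0)),
]
```

With these example shifts, an ERROR at 10:00 would go to alice only, a CRITICAL_ERROR at 10:00 to alice and carol, and anything arriving at 23:30 to bob, whose shift wraps past midnight.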

The original article is located here