srishakthidhara b9b84f8015
#415 Update README.md ["What's Observability?"] (#10208)
2023-08-24 22:55:19 +03:00


Observability

What's Observability?

In distributed systems, observability is the ability to collect data about programs' execution, modules' internal states, and the communication among components.
To improve observability, software engineers use a wide range of logging and tracing techniques to gather telemetry information, and tools to analyze and use it.
Observability is foundational to site reliability engineering, as it is the first step in triaging a service outage.[1]
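As a minimal sketch of the logging side of telemetry, using only Python's standard `logging` and `json` modules: each log entry is emitted as one JSON object carrying a `trace_id`, so entries written by different components for the same request can be correlated later. The `log_event` helper, the service name, and the field names are hypothetical and not part of any particular observability tool.

```python
import json
import logging
import time
import uuid

logging.basicConfig(level=logging.INFO, format="%(message)s")
logger = logging.getLogger("checkout-service")  # hypothetical service name


def log_event(message: str, trace_id: str, **fields) -> str:
    """Emit one structured (JSON) log line and return it.

    The trace_id lets a log backend group entries that belong to the same
    request, even when they are written by different components.
    """
    record = {
        "ts": time.time(),
        "service": logger.name,
        "trace_id": trace_id,
        "message": message,
        **fields,  # arbitrary extra context, e.g. order_id
    }
    line = json.dumps(record)
    logger.info(line)
    return line


trace_id = uuid.uuid4().hex
log_event("payment authorized", trace_id, order_id=42, amount_usd=19.99)
```

Structured lines like these can be filtered and joined by field (for example, every entry with a given `trace_id`), which is much harder with free-form text messages.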

Monitoring

What's monitoring? How is it related to Observability?

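Google: "Monitoring is one of the primary means by which service owners keep track of a system's health and availability".

Monitoring builds on observability: it continuously checks known signals (such as error rates or CPU usage) and notifies people when they deviate from expected values, while observability is the broader ability to understand a system's internal state from the telemetry it emits, including failures nobody anticipated.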

What types of monitoring outputs are you familiar with or have you used in the past?

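  • Alerts: a human needs to take action immediately
  • Tickets: a human needs to take action, but not immediately
  • Logging: no one needs to look at it right now; it is recorded for diagnostic or forensic purposes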
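A minimal sketch of routing a single metric reading to one of these three outputs. The `route_cpu_reading` function and its thresholds (90% for an alert, 75% for a ticket) are hypothetical values chosen for illustration, not recommendations.

```python
from enum import Enum


class Output(Enum):
    ALERT = "alert"    # a human must act now
    TICKET = "ticket"  # a human must act, but not immediately
    LOG = "log"        # recorded for later diagnosis, no action needed


def route_cpu_reading(cpu_percent: float,
                      alert_at: float = 90.0,
                      ticket_at: float = 75.0) -> Output:
    """Map one CPU utilization reading (in %) to a monitoring output.

    The default thresholds are hypothetical, for illustration only.
    """
    if cpu_percent >= alert_at:
        return Output.ALERT
    if cpu_percent >= ticket_at:
        return Output.TICKET
    return Output.LOG


for reading in (17, 80, 91):
    print(reading, route_cpu_reading(reading).value)
```

In practice, acting on a single sample is noisy; alerting rules usually require a condition to hold for some duration before paging anyone.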

Data

Can you mention what types of things are often monitored in the IT industry?
  • Hardware (CPU, RAM, ...)
  • Infrastructure (Disk capacity, Network latency, ...)
  • App (Status code, Errors in logs, ...)
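A minimal sketch of collecting two host-level signals with only the Python standard library: disk usage via `shutil.disk_usage` and, on Unix-like systems, the 1-minute load average via `os.getloadavg`. The `collect_host_metrics` function and its metric keys are hypothetical; real setups usually rely on a dedicated agent or exporter rather than hand-written collectors.

```python
import os
import shutil
import time


def collect_host_metrics(path="/"):
    """Take one snapshot of a few host-level metrics.

    disk_used_percent: share of the filesystem at `path` that is in use.
    load_avg_1m: 1-minute load average, only on platforms that provide
    os.getloadavg() (Unix-like systems).
    """
    usage = shutil.disk_usage(path)
    metrics = {
        "timestamp": time.time(),
        "disk_used_percent": round(usage.used / usage.total * 100, 1),
    }
    if hasattr(os, "getloadavg"):
        try:
            metrics["load_avg_1m"] = os.getloadavg()[0]
        except OSError:  # raised if the load average cannot be obtained
            pass
    return metrics


print(collect_host_metrics())
```

Calling such a function on a schedule and keeping each snapshot together with its timestamp produces time series data.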

Explain "Time Series" data

Time series data is a sequence of measurements of a given parameter, ordered by time: each value is paired with the timestamp at which it was taken.

An example would be CPU utilization (%) sampled every hour:

08:00   17
09:00   22
10:00   91
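The hourly readings above can be represented as a minimal sketch: a list of `(timestamp, value)` pairs, which makes time-based queries straightforward. The date and the helper names `in_range` and `peak` are hypothetical.

```python
from datetime import datetime

# The hourly CPU utilization readings (%) as (timestamp, value) pairs.
# The date is arbitrary; only the ordering by time matters.
cpu_series = [
    (datetime(2024, 1, 1, 8, 0), 17),
    (datetime(2024, 1, 1, 9, 0), 22),
    (datetime(2024, 1, 1, 10, 0), 91),
]


def in_range(series, start, end):
    """Return the points whose timestamp satisfies start <= ts < end."""
    return [(ts, value) for ts, value in series if start <= ts < end]


def peak(series):
    """Return the (timestamp, value) pair with the highest value."""
    return max(series, key=lambda point: point[1])


print(peak(cpu_series))  # the 10:00 point, value 91
```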

Explain data aggregation

In monitoring, data aggregation means combining a collection of values into a single summary value (or a smaller set of values). It can be done in different ways, such as taking the average of the values, their sum, or the count of how many values there are, and the right choice mainly depends on the type of data being aggregated. Time series data, for example, is usually aggregated per time window, such as averaging per-minute samples into hourly values.
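A minimal sketch of both kinds of aggregation: summarizing a plain collection of values, and downsampling a time series into fixed-size time buckets. The helpers `summarize` and `downsample` are hypothetical names; timestamps are assumed to be Unix seconds.

```python
from collections import defaultdict
from statistics import mean


def summarize(values):
    """Aggregate a collection of values into a few summary numbers."""
    return {
        "avg": mean(values),
        "sum": sum(values),
        "count": len(values),
        "max": max(values),
    }


def downsample(samples, bucket_seconds, agg=mean):
    """Aggregate (unix_ts, value) samples into fixed time buckets.

    Each sample goes to the bucket starting at ts - ts % bucket_seconds,
    and each bucket is reduced to one value with `agg` (mean by default).
    """
    buckets = defaultdict(list)
    for ts, value in samples:
        buckets[ts - ts % bucket_seconds].append(value)
    return sorted((start, agg(vals)) for start, vals in buckets.items())


print(summarize([17, 22, 91]))
# Four 30-second samples rolled up into 60-second averages.
print(downsample([(0, 10), (30, 20), (60, 30), (90, 50)], bucket_seconds=60))
```

Passing `agg=max` or `agg=sum` instead of the default changes what each bucket represents (worst case or total), which is the choice the paragraph above refers to.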

Application Performance Management

What is Application Performance Management?
  • IT metrics translated into business insights
  • Practices for monitoring application performance and behavior so we can improve performance, reduce issues, and improve the overall user experience
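A minimal sketch of what APM instrumentation does at its core: wrap an operation and record how often it runs, how often it fails, and how long it takes, which can then be reported as business-level numbers such as "checkout error rate". The `instrument` decorator, the in-memory `STATS` store, and the `checkout` function are hypothetical; real APM agents export such data to a backend.

```python
import functools
import time
from collections import defaultdict

# In-memory per-operation stats; a real APM agent would ship these
# to a backend instead of keeping them in process memory.
STATS = defaultdict(lambda: {"calls": 0, "errors": 0, "total_s": 0.0})


def instrument(name):
    """Decorator recording call count, error count, and cumulative latency."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            except Exception:
                STATS[name]["errors"] += 1
                raise
            finally:
                STATS[name]["calls"] += 1
                STATS[name]["total_s"] += time.perf_counter() - start
        return wrapper
    return decorator


@instrument("checkout")
def checkout(cart_total):
    if cart_total <= 0:
        raise ValueError("empty cart")
    return "ok"


checkout(20)
try:
    checkout(0)
except ValueError:
    pass

s = STATS["checkout"]
print(f"calls={s['calls']} error_rate={s['errors'] / s['calls']:.0%} "
      f"avg_latency={s['total_s'] / s['calls'] * 1000:.3f} ms")
```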

Name three aspects of a project you can monitor with APM (e.g. backend)
  • Frontend
  • Backend
  • Infra
  • ...

What can be collected/monitored to perform APM monitoring?
  • Metrics
  • Logs
  • Events
  • Traces
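Of these four, traces are the least self-explanatory: a trace is a tree of spans, where each span records one unit of work, and nested spans share the same trace ID and point at their parent. The sketch below illustrates that model with the standard `contextvars` module; the `span` helper and span names are hypothetical, and while the model resembles the one used by tracing systems such as OpenTelemetry, this is not the OpenTelemetry API.

```python
import time
import uuid
from contextlib import contextmanager
from contextvars import ContextVar

finished_spans = []  # a real tracer would export these to a tracing backend
_current_span = ContextVar("current_span", default=None)


@contextmanager
def span(name):
    """Record one unit of work; nested spans share the parent's trace_id."""
    parent = _current_span.get()
    record = {
        "trace_id": parent["trace_id"] if parent else uuid.uuid4().hex,
        "span_id": uuid.uuid4().hex[:16],
        "parent_id": parent["span_id"] if parent else None,
        "name": name,
    }
    start = time.perf_counter()
    token = _current_span.set(record)
    try:
        yield record
    finally:
        record["duration_ms"] = (time.perf_counter() - start) * 1000
        _current_span.reset(token)
        finished_spans.append(record)


with span("GET /checkout"):
    with span("query inventory"):
        time.sleep(0.01)
    with span("charge card"):
        time.sleep(0.02)

# Child spans finish first, so they are recorded before their parent.
for s in finished_spans:
    print(s["name"], s["parent_id"], round(s["duration_ms"], 1))
```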