This article discusses the importance of time series databases and data visualization tools such as Grafana for managing and interpreting streams of data in various applications.
The author mentions several time series databases (TSDs) and visualization tools, focusing on their features, advantages, and some limitations. The article also provides an example of a Building Management and Control (BMaC) project that uses InfluxDB and Grafana for data visualization.
| Database | Description | Notable Features |
|-------------------|-------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------|
| InfluxDB | Partially open source, with version 3 being an edge data collector. | Shard-based storage, compaction levels, time series index, optional retention. |
| Apache Kudu | Column-based database optimized for multidimensional OLAP workloads. | Part of the Apache Hadoop ecosystem. |
| Prometheus | Developed at SoundCloud for metrics monitoring. | Written in Go, similar to InfluxDB v1 and v2. |
| RRDTool | All-in-one package with a circular buffer TSD that also does graphing. | Language bindings for various programming languages. |
| Graphite | Similar to RRDTool but uses a Django web-based application to render graphs. | Web-based graphing. |
| TimescaleDB | Extends PostgreSQL, supporting typical SQL queries with TSD functionality and optimizations. | Supports all typical SQL queries. |
The article also discusses Grafana as a popular tool for creating dashboards to visualize time series data, mentioning its compatibility with multiple TSDs and SQL databases. It concludes by highlighting the importance of understanding one's specific needs before choosing a TSD and visualization solution.
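The circular-buffer idea listed for RRDTool in the table above can be illustrated with a minimal sketch. This is not RRDTool's implementation or API: the class name `RoundRobinSeries` and its methods are hypothetical, and the sketch only shows the core property that storage stays fixed in size because the oldest samples are overwritten by new ones. Real RRDTool additionally consolidates samples into archives, which is omitted here.

```python
from collections import deque


class RoundRobinSeries:
    """Fixed-capacity time series store (hypothetical illustration).

    Once `capacity` points are stored, each new point evicts the oldest one,
    so memory use never grows.
    """

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        # deque with maxlen drops items from the left when it is full
        self._points = deque(maxlen=capacity)

    def append(self, timestamp, value):
        """Store one (timestamp, value) sample."""
        self._points.append((timestamp, value))

    def points(self):
        """Return the retained samples, oldest first."""
        return list(self._points)

    def __len__(self):
        return len(self._points)


# Usage: keep only the three most recent temperature readings.
series = RoundRobinSeries(capacity=3)
for ts, temp in enumerate([20.5, 20.7, 21.0, 21.4, 21.1]):
    series.append(ts, temp)
```

The trade-off of this design is that history beyond the buffer capacity is lost, which is acceptable when only a recent window of measurements needs to be graphed.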
performing principal components analysis of the auxiliary variables, and including a small number of components in the imputation model instead of the original variables.
Impute composite variables instead of individual components
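The principal-components suggestion above can be sketched as follows. This is a minimal illustration under stated assumptions, not the imputation model the text has in mind: the function names `pca_components` and `impute_with_components` are hypothetical, the auxiliary variables are assumed to be fully observed, and a single least-squares regression stands in for whatever imputation model is actually used.

```python
import numpy as np


def pca_components(aux, n_components):
    """Return the first `n_components` principal component scores of `aux`.

    `aux` is an (n_samples, n_variables) array with no missing values.
    """
    centered = aux - aux.mean(axis=0)
    # Rows of vt are the principal axes, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def impute_with_components(y, aux, n_components=2):
    """Fill NaNs in `y` using a regression on a few principal components.

    The components replace the original auxiliary variables as predictors,
    which keeps the imputation model small.
    """
    scores = pca_components(aux, n_components)
    design = np.column_stack([np.ones(len(y)), scores])  # intercept + scores
    observed = ~np.isnan(y)
    # Fit on rows where y is observed, then predict the missing rows.
    coef, *_ = np.linalg.lstsq(design[observed], y[observed], rcond=None)
    filled = y.copy()
    filled[~observed] = design[~observed] @ coef
    return filled
```

Using a handful of components rather than every auxiliary variable reduces the number of parameters to estimate, at the cost of discarding variance that the retained components do not capture.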
techniques may perform well, it is rarely the case, so you need a few backups.
Identifying the Type of Missingness
The first step to implementing an effective imputation strategy is identifying why the values are missing. Even though each case is unique, missingness can be grouped into three broad categories:
Missing Completely At Random (MCAR): this is a genuine case of data missing randomly. Examples are sudden mistakes in data entry, temporary sensor failures, or generally missing data that is not associated with any outside factor. The amount of missingness is low.
Missing At Random (MAR): this is a broader case of MCAR. Even though missing data may seem random at first glance, it will have some systematic relationship with the other observed features, for example, data missing from observational equipment during scheduled maintenance breaks. The number of null values may vary.
Missing Not At Random (MNAR):