Tick Tock: What the heck is time-series data? by Tanay Pant | OSDC 2019

by Saeid Hassan-Abadi | May 5, 2020 | Monitoring & Observability

This entry is part 6 of 6 in the series OSDC 2019 | Recap

The rise of IoT and smart infrastructure has led to the generation of massive amounts of complex data. In his talk at the Open Source Data Center Conference (OSDC) 2019, Tanay Pant raised a question to gather insights: Tick Tock: What the heck is time-series data? Watch the video of Tanay's presentation and read a summary below.

The conference formerly known as OSDC will be held for the first time in 2020 under its new name, stackconf. With the changes in modern IT in recent years, the focus of the conference has increasingly shifted from a mainly static infrastructure approach to a broader spectrum that includes agile methods, continuous integration, containers, and hybrid and cloud solutions. The new name reflects this development and opens the topic area to further innovations.

Due to concerns around the coronavirus (COVID-19), the decision was made to hold stackconf 2020 as an online conference. The online event will now take place from June 16 – 18, 2020. Join us, live online! Save your ticket now at: stackconf.eu/ticket/

Tick Tock: What the heck is time-series data?

This summary covers what time-series data is, how the load of different kinds of data is distributed, and the use cases in which time-series data is frequently used. It then looks at how CrateDB helps to work with machine data.

What are time series?

To answer this question, consider a sensor that sends data at regular intervals. When we read or plot this data, time is one of the axes. Unlike in other workloads, this data is not written to the database as updates: each new reading is added as an insert, and inserting is the primary write operation. A time-series database essentially gains its efficiency by treating time as a first-class dimension, which lets us intuitively work with such data, for example to monitor all aspects of our operation at different points in time.
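The insert-only pattern described above can be sketched with Python's built-in `sqlite3` module. This is only a minimal illustration, not something shown in the talk: the table name `readings`, its columns, and the helper functions `record` and `average_per_bucket` are assumptions made up for this example.

```python
import sqlite3

# Hypothetical table and column names, chosen for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (ts INTEGER, sensor_id TEXT, value REAL)")


def record(ts, sensor_id, value):
    """Append one reading; existing rows are never updated."""
    conn.execute("INSERT INTO readings VALUES (?, ?, ?)", (ts, sensor_id, value))


# A sensor reporting every 10 seconds (timestamps in seconds).
for i, value in enumerate([20.0, 20.5, 21.0, 21.5, 22.0, 22.5]):
    record(i * 10, "sensor-1", value)


def average_per_bucket(bucket_seconds):
    """Return (bucket_start, average value) pairs, using time as the axis."""
    rows = conn.execute(
        "SELECT (ts / ?) * ? AS bucket, AVG(value) FROM readings "
        "GROUP BY bucket ORDER BY bucket",
        (bucket_seconds, bucket_seconds),
    )
    return rows.fetchall()


print(average_per_bucket(30))
```

Every reading becomes a new row, and the query groups rows into 30-second buckets along the time axis, which is the typical read pattern for monitoring dashboards.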

Now we have a picture of what time series are. If you abstract from the individual use cases and look at how the data is generated, you can group them into two categories. The first is IT and monitoring, which can be described as the traditional use of time-series databases. Typical properties here are tens or hundreds of metrics or sensors, as well as a lot of complex data and queries, with data volumes that often exceed several gigabytes. InfluxDB is a good example in this category.

The second category is industrial sensor data, an emerging sector that has not been talked about much. Here, too, there are hundreds or thousands of sensors or metrics. This puts pressure on real-time queries, which must be able to access all of those gigabytes of data. CrateDB is a good example in this category.

We start with the core technology and look at what exactly CrateDB is and how it differs from other databases in this segment. CrateDB is a new type of distributed SQL database that is well suited to handling industrial sensor data, thanks to its ease of use and its ability to handle many different kinds of data, including data from thousands of different sensors. CrateDB supports distributed SQL with full-text search and data queries, and the nodes in a CrateDB cluster coordinate seamlessly with one another. In addition, write and query operations are automatically distributed across the nodes in the cluster. CrateDB uses columnar caches to deliver in-memory SQL performance for time-series data. In-memory performance normally requires all data to fit in main memory, which limits the amount of data that can be managed in real time.
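The idea of automatically distributing writes and queries across nodes can be pictured with a small in-process model. This is a toy sketch under stated assumptions, not how any particular database implements it: the `Node` and `Cluster` classes, the routing by a CRC32 hash of the sensor id, and the scatter-gather average are all hypothetical.

```python
import zlib


class Node:
    """One node of a toy cluster holding (ts, sensor_id, value) rows."""

    def __init__(self, name):
        self.name = name
        self.rows = []

    def insert(self, row):
        self.rows.append(row)

    def partial_sum(self, start, end):
        """Return (sum, count) of values with start <= ts < end on this node."""
        values = [value for ts, _, value in self.rows if start <= ts < end]
        return sum(values), len(values)


class Cluster:
    """Routes writes to nodes and merges partial query results."""

    def __init__(self, node_count):
        self.nodes = [Node(f"node-{i}") for i in range(node_count)]

    def insert(self, ts, sensor_id, value):
        # Assumption: route by a stable hash of the sensor id.
        # zlib.crc32 is stable across runs, unlike the built-in hash() for str.
        index = zlib.crc32(sensor_id.encode()) % len(self.nodes)
        self.nodes[index].insert((ts, sensor_id, value))

    def average(self, start, end):
        """Scatter the query to every node, then gather and merge the partials."""
        partials = [node.partial_sum(start, end) for node in self.nodes]
        total = sum(s for s, _ in partials)
        count = sum(c for _, c in partials)
        return total / count if count else None
```

The caller only talks to `Cluster`; which node stores a row, and how the per-node partial results are merged, stays hidden behind `insert` and `average`.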

One solution for in-memory performance without data volume restrictions is to keep memory-resident columnar caches on each node. The caches tell the query engine whether there are any matching records on that node and where those records are. Distributed query processing also contributes to fast performance, as does a query planner that makes wise decisions about which nodes are best suited to execute a query. In addition, CrateDB offers machine data functions and a cloud-native architecture that lets it run seamlessly in the cloud.

Finally, we look at a few advantages of CrateDB. Installation is simple: you can start a CrateDB instance with a single line on the terminal or with Docker. It has a distributed query engine that supports full-text queries. It runs well on commodity hardware and instances, and the architecture is easy to scale.
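The per-node cache idea described above, where a cache tells the query engine whether a node holds matching records and where they are, can be illustrated with a toy model. The sketch below is an assumption-laden simplification and not the database's actual implementation: `NodeWithCache`, `sensor_cache`, and `query` are hypothetical names, and a plain Python list stands in for on-disk storage.

```python
class NodeWithCache:
    """Toy node: a list stands in for disk, a dict for the in-memory cache."""

    def __init__(self):
        self.rows = []          # stand-in for on-disk storage
        self.sensor_cache = {}  # in memory: sensor_id -> positions in self.rows

    def insert(self, ts, sensor_id, value):
        # Record where the new row lives before appending it.
        self.sensor_cache.setdefault(sensor_id, []).append(len(self.rows))
        self.rows.append((ts, sensor_id, value))

    def lookup(self, sensor_id):
        """Fetch only the rows whose positions the cache points to."""
        positions = self.sensor_cache.get(sensor_id, [])
        return [self.rows[p] for p in positions]


def query(nodes, sensor_id):
    """Return (matching rows, number of nodes whose storage was read)."""
    results = []
    visited = 0
    for node in nodes:
        # The cache answers "does this node have matching rows?" from memory.
        if sensor_id not in node.sensor_cache:
            continue
        visited += 1
        results.extend(node.lookup(sensor_id))
    return results, visited
```

In this model, a query for a sensor that a node has never seen is answered from the cache alone, so that node's storage is never touched.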

Saeid Hassan-Abadi

Systems Engineer

Saeid completed his training as an IT specialist for system integration with us in July 2022 and now works in the Operations team. Born in Persia, he studied industrial engineering in his home country, Iran. He is passionate about working with computers and enjoys acquiring new knowledge. His hobbies are listening to music, playing sports, and spending time with his friends.