DEV Community

Cover image for Statistics by Time Window — From SQL to SPL #31
Judith-Data-Processing-Hacks
Judith-Data-Processing-Hacks

Posted on

Statistics by Time Window — From SQL to SPL #31

Problem description & analysis:

The Time field of a certain database table is time, and the time interval is sometimes greater than 1 minute.

source table

Task: Now we need to divide the data into windows every minute, fill in the missing windows, and calculate the four values for each window, which are: start_value, the last item of the previous window; end_value, the last item in this window; min, the minimum value of this window; max, the maximum value of this window. The start_value in the first minute is based on the first record in this window; If data for a certain window is missing, replace it with the last item of the previous window (same as the start_value of this window).

expected results

Code comparisons:

SQL

with overview as ( SELECT distinct on (a.time) a.id, a.time, b.time as "end", a.value, date_trunc('minute', a.time) as minute_start, date_trunc('minute', b.time) as minute_end FROM main a left join main b on a."time"<b."time" and a.id = b.id order by a.time, b.time asc ), overview2 as ( select id, value, true as backfill, date_trunc('minute', "end") as time, date_trunc('minute', "end") as minute from overview where minute_start <> minute_end UNION ALL select id, time, value, false as backfill, date_trunc('minute', time) as minute from overview ), overview3 as ( select * from overview2 UNION ALL ( Select distinct on (a.missingminute) c.id, a.missingminute as time, a.missingminute as minute, c.value, true as backfill from ( SELECT date_trunc('minute', time.time) as missingminute FROM generate_series((select min(minute) from overview2),(select max(minute) from overview2),'1 minute'::interval) time left join ( select distinct minute from overview2 ) b on date_trunc('minute', time) = b.minute where b.minute isnull ) a left join main c on a.missingminute > c.time order by a.missingminute, c.time desc ) order by time ) select t1.id, t1.minute as minute_start, t1.minute + interval '1 minute' as minute_end, t1.backfill as start_backfill, t1.start, t2.end, coalesce(t3.min, t1.start) as min, coalesce(t3.max, t1.start) as max from (select distinct on (id, minute) id, minute, value as start, backfill from overview3 order by id, minute, time asc) t1 left join (select distinct on (id, minute) id, minute, value as end from overview3 order by id, minute, time desc) t2 on t1.id = t2.id and t1.minute = t2.minute left join (select id, minute, min(value) min, max(value) max from overview2 group by id,minute) t3 on t1.id = t3.id and t1.minute = t3.minute 
Enter fullscreen mode Exit fullscreen mode

SQL requires multiple layers of nested subqueries and complex join statements to implement, and the code is lengthy and difficult to understand.

SPL: SPL directly provides time series functions, sequence-aligned functions, and position-related syntax.

➡️Try.DEMO

esProc SPL code

A1: Load data.

A2: Change the Time field to full minutes.

A3: Generate a continuous minute time series list.

A4: Align the data with the list, with each group being the data of a window and some windows being empty.

A5: Generate a new two-dimensional table, use the original records from each group to generate a new record, start takes values from the list according to the sequence number, start_value is taken from the end_value of the previous new record, and when this value is null (in the first minute), it is taken from the first record of the current group. end_value is taken from the last record of the previous group, min is taken from the minimum value of this group, and max is taken from the maximum value of this group. If the last three items are missing, use start_value(sv) instead. [-1] represents the relative previous one, and m(-1) represents the last member in the set.


Get Started with esProc SPL — esProc SPL FREE Download.

Top comments (3)

Collapse
 
nathan_tarbert profile image
Nathan Tarbert

insane how many hoops you gotta jump through for this in sql - been there. ever think we’ll see this kind of stuff get way simpler in mainstream dbs or is that just wishful thinking?

Collapse
 
judith677 profile image
Judith-Data-Processing-Hacks

Totally agree—SQL can make it way harder than it needs to be. SPL is designed to makes the set-based logic much clearer and much more. Feel free to follow up and stay tuned for more SPL solutions!🙌🏻

Collapse
 
judith677 profile image
Judith-Data-Processing-Hacks

Feel free to download and share your feedback with us!

🤲🏻Discord
🤲🏻Reddit