Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update Rollups example script #1022

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

M-Sayed
Copy link

@M-Sayed M-Sayed commented Oct 27, 2021

Assume we run the cron jobs on an hour basis, the following will happen:

timeline runs_at last_rollup_time curr_rollup_time what happens
the Nth cron job runs 2021-01-05 20:00:00 2021-01-05 19:00:00 2021-01-05 20:00:00 hours included [20]
new http_requests are created at 2021-01-05 20:30:00, let's say http_requests (A and B)
the (N+1)th cron job 2021-01-05 21:00:00 2021-01-05 20:00:00 2021-01-05 21:00:00 hours included [21]

Events A and B will not be added to the rollup table. To confirm:

> select date_trunc('hour', '2021-01-05 20:30:00'::timestamp) <@ tsrange('2021-01-05 19:00:00'::timestamp, '2021-01-05 20:00:00'::timestamp, '(]');
> true

Events A, and B won't be added here although their ingest_times are within the range, because they are not in the citus DB yet, they are created at 20:30:00 and the job is running at 20:00:00. That's 30 minutes in between.

> select date_trunc('hour', '2021-01-05 20:30:00'::timestamp) <@ tsrange('2021-01-05 20:00:00'::timestamp, '2021-01-05 21:00:00'::timestamp, '(]');
> false

Events A, and B won't be added here because their ingest_times are not within the range.

Which means that http_requests A and B will be lost.

I think the issue can be fixed if we use the timestamp itself without any truncation in the where clause, so instead of using date_trunc('minute', ingest_time) we just use ingest_time.

@jonels-msft
Copy link
Member

Hi @M-Sayed, is the method you're proposing preferable to that in #943? I just merged that older PR, but we can switch to use your suggestion instead if you think that's best.

@M-Sayed
Copy link
Author

M-Sayed commented Aug 23, 2024

@jonels-msft Sorry for that late response, Yes. I believe mine is preferable and it can work with any rollup table, no matter what interval is used. It seems more reliable.

I tried and discussed the suggested approach in #943 with my team, but unluckily I don't recall the reasoning behind our outcome as it was so long ago, but we decided to go with the approach I suggested here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants