use pushdown as default null filling implementation for timeseries API #8549
Conversation
rows := q.Result.Data
require.Len(t, rows, 1)
i := 0
require.Equal(t, parseTime(t, "2023-11-05T05:00:00Z").AsTime(), rows[i].Ts.AsTime())
What's the reasoning behind this test case change? If the time range is:
start=2023-11-05T05:00:00.000Z
end=2023-11-05T05:00:01.000Z
then you'd expect it to return a timestamp that is inside the time range in UTC?
Do I have this mapping right?
- 4:00 UTC is 0:00 in New York
- 5:00 UTC is 1:00 in New York
- 6:00 UTC is 1:00 in New York
- 7:00 UTC is 2:00 in New York
So asking for the time between 5:00 and 5:01 UTC should still give 5:00 UTC, not 6:00 UTC? However, if it were asking for the time between 1:00 and 1:01 New York, then it would be acceptable to return 6:00 UTC (although ideally it should return two rows: both 5:00 UTC and 6:00 UTC).
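For reference, the mapping above can be double-checked with a short sketch (not part of the PR; Python's stdlib zoneinfo is used here purely for illustration of the 2023-11-05 fall-back transition):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

ny = ZoneInfo("America/New_York")
# DST ends 2023-11-05 at 02:00 EDT, so 01:00 local time occurs twice.
for hour in (4, 5, 6, 7):
    utc = datetime(2023, 11, 5, hour, tzinfo=timezone.utc)
    print(utc.strftime("%H:%M UTC"), "->", utc.astimezone(ny).strftime("%H:%M %Z"))
# 04:00 UTC -> 00:00 EDT
# 05:00 UTC -> 01:00 EDT
# 06:00 UTC -> 01:00 EST
# 07:00 UTC -> 02:00 EST
```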
The timezone is America/New_York in the test case, and DuckDB seems to resolve 1:00 America/New_York as 6:00 UTC instead of 5:00.
Actually it relates to this change as well on line 435 below.
I agree, it does look odd. I can try rewriting the range query for DuckDB, but I'm not sure it would work. The last option is to always build a manual inline query like we do for the other OLAPs.
Let me know your view on this: should we try changing it further, fall back to the OLAP approach if the change does not work, or just keep it this way?
Annoying... can you think of any workarounds for this?
Maybe it's due to not passing UTC values into range? This seems to output the correct value:
D SELECT timezone('America/New_York', range) AS "timestamp" FROM range('2023-11-05T05:00:00Z'::TIMESTAMP, '2023-11-05T05:01:00Z'::TIMESTAMP, INTERVAL '1 HOUR');
┌──────────────────────────┐
│ timestamp │
│ timestamp with time zone │
├──────────────────────────┤
│ 2023-11-05 05:00:00-05   │
└──────────────────────────┘
But range then doesn't take partial hourly offsets into account, so it would need some extra hoops, I guess.
Yeah, there were cases that were not handled when passing UTC values; I will need to revisit it.
It does not seem to work for this simple case:
SELECT timezone('America/Los_Angeles', range) AS "timestamp" FROM range('2024-03-10T08:00:00Z'::TIMESTAMP, '2024-03-13T07:00:00Z'::TIMESTAMP, INTERVAL '1 DAY');
┌──────────────────────────┐
│ timestamp │
│ timestamp with time zone │
├──────────────────────────┤
│ 2024-03-10 15:00:00+00 │
│ 2024-03-11 15:00:00+00 │
│ 2024-03-12 15:00:00+00 │
└──────────────────────────┘
Expected 2024-03-10 08:00:00+00, 2024-03-11 07:00:00+00 and 2024-03-12 07:00:00+00. Also tried:
SELECT range AT TIME ZONE 'America/New_York' AS "timestamp" FROM range('2023-11-05T05:00:00Z'::TIMESTAMPTZ AT TIME ZONE 'America/New_York', '2023-11-05T05:01:00Z'::TIMESTAMPTZ AT TIME ZONE 'America/New_York', INTERVAL '1 MINUTE');
┌──────────────────────────┐
│ timestamp │
│ timestamp with time zone │
├──────────────────────────┤
│ 2023-11-05 06:00:00+00 │
└──────────────────────────┘
So it's a DuckDB thing where it interprets 01:00 as 06:00:
D SELECT range AT TIME ZONE 'America/New_York' AS "timestamp" FROM range('2023-11-05T05:00:00Z'::TIMESTAMPTZ AT TIME ZONE 'America/New_York', '2023-11-05T05:00:01Z'::TIMESTAMPTZ AT TIME ZONE 'America/New_York', INTERVAL '1 SECOND');
┌──────────────────────────┐
│ timestamp │
│ timestamp with time zone │
├──────────────────────────┤
│ 2023-11-05 06:00:00+00 │
└──────────────────────────┘
D SELECT range AS "timestamp" FROM range('2023-11-05T05:00:00Z'::TIMESTAMPTZ AT TIME ZONE 'America/New_York', '2023-11-05T05:00:01Z'::TIMESTAMPTZ AT TIME ZONE 'America/New_York', INTERVAL '1 SECOND');
┌─────────────────────┐
│ timestamp │
│ timestamp │
├─────────────────────┤
│ 2023-11-05 01:00:00 │
└─────────────────────┘
runtime/metricsview/ast.go
Outdated
// time spine needs to be applied before having so that treat_nulls_as works correctly for derived measures
// and treat_nulls_as should be applied before Having so that wrong bins are not included in final results
// however having also removes null filling bins that does not match criteria, so its depended on time spine
// this creates a cyclic dependency between time spine and having, so we re-apply time spine after having to restore removed time bins
err = ast.addSpineSelect(ast.Root, ast.Query.TimeRange, false)
however having also removes null filling bins that does not match criteria, so its depended on time spine
If it's easier/simpler, I'm okay with not supporting this – using a HAVING on a null-filled timeseries feels somewhat undefined. I'd be okay with saying "any HAVING will be evaluated after null filling", unless that breaks some important behavior?
Yeah, even I don't like adding this complexity, but this seems to be failing two tests - 1 and 2 - as the final results have some null-filled time bins filtered out because of the Having criteria. Do you think it's okay to change them?
Another simpler option that I tried is to always apply the time spine at the end, but that causes two issues with treat_nulls_as:
- we can end up with derived measures not using treat_nulls_as values for base measures
- final bins might have measure values that could have been filtered out by the Having criteria
So we will have to choose between these two behaviours if we want to simplify.
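To make the trade-off concrete, here is a toy sketch (hypothetical, not the engine code; the `spine`, `data`, and `having` names are made up for illustration) of the two orderings on an in-memory set of time bins:

```python
spine = ["t1", "t2", "t3"]          # full set of time bins from the time spine
data = {"t1": 10, "t3": 2}          # t2 has no underlying data
treat_nulls_as = 0
having = lambda v: v > 5            # HAVING measure > 5

# Option A: fill nulls first, then apply HAVING. Bins failing the criteria
# (including filled ones) are dropped, so the spine has gaps again.
filled = {t: data.get(t, treat_nulls_as) for t in spine}
after_having = {t: v for t, v in filled.items() if having(v)}
print(after_having)                 # {'t1': 10}

# Option B: re-apply the spine after HAVING, so the removed
# time bins come back as null-filled rows.
restored = {t: after_having.get(t, treat_nulls_as) for t in spine}
print(restored)                     # {'t1': 10, 't2': 0, 't3': 0}
```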
I think it's okay to change those two tests, and that probably sounds more correct to me (since HAVING is supposed to filter out any rows that don't match the criteria).