GH-48593: [C++] C++20: use standard calendar / timezone APIs#48601
GH-48593: [C++] C++20: use standard calendar / timezone APIs#48601rok wants to merge 74 commits intoapache:mainfrom
Conversation
|
|
4283740 to
d82f990
Compare
|
It seems that std::chrono on GCC (14.3.0, 15.2.0) potentially has a bug that triggers some of our tests. Meanwhile std::chrono on MSVC 19.44 ( 14.44) appears to be pass them and is correct or at least consistent with vendored Below is the explanation and reproduction of the bug. // GCC libstdc++ DST bug reproduction
//
// The AN to AS Transition Bug
// ---------------------------
// Source https://github.com/eggert/tz/blob/c37fbc3249c1a1334948b38f3bca47dee5c11dd1/australasia#L165-L192
// Australia/Broken_Hill used the AN (New South Wales) rules until 2000, then
// switched to AS (South Australia) rules. Under AN rules, DST started the last
// Sunday of October and ended the last Sunday of March. For the 1999-2000
// summer, DST started October 31, 1999 and would end March 26, 2000. February
// 29, 2000 falls squarely within this DST period, so the correct offset should
// be 9:30 base + 1:00 DST = 10:30 (630 minutes).
//
// Why GCC's Data is Wrong
// -----------------------
// When libstdc++ processes the zone transition from AN rules to AS rules (which
// happens in year 2000), it appears to lose or reset the DST state inherited
// from the AN rules. Instead of recognizing that DST is still active from the
// October 1999 transition, it reports offset=570 (just the 9:30 base) with
// save=0. The inconsistency is evident: it returns abbrev="ACDT" (daylight
// time) but the offset and save values indicate standard time. The AN rules
// clearly show DST should be active until the last Sunday of March 2000.
//
// Compile: g++ -std=c++20 -o gcc_dst_bug gcc_libstdcxx_dst_bug.cpp
// Expected: 630 (10:30 = 9:30 base + 1:00 DST)
// Actual: 570 (9:30 = base only, DST missing)
#include <chrono>
#include <iostream>
int main() {
using namespace std::chrono;
auto* tz = locate_zone("Australia/Broken_Hill");
auto info = tz->get_info(sys_days{2000y / February / 29d} + 23h + 23min + 23s);
std::cout << duration_cast<minutes>(info.offset).count() << "\n";
} |
Is the bug reported somewhere? If not, can you do that? |
Yes, I think we should do so. There are also a bunch of code snippets in the C++ and Python codebase that could be removed, IIRC. |
It seems to be related to a known issue, I added comment explaining our case. |
|
@github-actions crossbow submit -g python |
|
Revision: 4c5113f Submitted crossbow builds: ursacomputing/crossbow @ actions-e83bbadd74 |
Co-authored-by: Raúl Cumplido <raulcumplido@gmail.com>
|
@github-actions crossbow submit wheel-windows-* |
|
Revision: cde409e Submitted crossbow builds: ursacomputing/crossbow @ actions-be2cb464a0 |
|
@github-actions verify-rc-source-windows |
There was a problem hiding this comment.
Sorry for being the bearer of bad news but the Windows wheels failures are related:
> check_example_file(path, table, need_fix=True)
Python310\lib\site-packages\pyarrow\tests\test_orc.py:145:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
Python310\lib\site-packages\pyarrow\tests\test_orc.py:101: in check_example_file
table = orc_file.read()
Python310\lib\site-packages\pyarrow\orc.py:187: in read
return self.reader.read(columns=columns)
pyarrow/_orc.pyx:374: in pyarrow._orc.ORCReader.read
???
pyarrow/error.pxi:155: in pyarrow.lib.pyarrow_internal_check_status
???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> ???
E pyarrow.lib.ArrowException: Unknown error: Time zone file /usr/share/zoneinfo/America/Los_Angeles does not exist. Please install IANA time zone database and set TZDIR env.
pyarrow/error.pxi:92: ArrowException|
@github-actions crossbow submit wheel-windows-cp314-cp314t-amd64 |
|
Revision: 7e49c63 Submitted crossbow builds: ursacomputing/crossbow @ actions-e400d8f35f
|
|
@github-actions crossbow submit wheel-windows-cp314-cp314t-amd64 |
|
Revision: 54017ee Submitted crossbow builds: ursacomputing/crossbow @ actions-3a266783b2
|
|
@github-actions crossbow submit wheel-windows-* |
|
Revision: 54017ee Submitted crossbow builds: ursacomputing/crossbow @ actions-b3fc0a238b |
|
@jonkeane in light of CRAN build issues are we ok to merge here? |
I wish we had a way of testing more reliably on the older macOS framework. What's in #49221 (comment) I'm not really comfortable merging to main, but does exercise the code path. There's at least one other place we need to clean up on main already #49223 so maybe it's fine to merge this and when we get a better way to test this manualy or in CI we'll need to revert this or add a work around. I don't want worries about CRAN to totally slow down everything C++(20) related, but there is a bit of an annoying place we are in! |
Rationale for this change
Switch to std::chrono for MSVC to be able to use the system-provided timezone automatically on Windows.
What changes are included in this PR?
This adds
chrono_internal.hthat uses C++20 std::chrono timezone/calendar APIs on compilers with support (MSVC only for now) and falls back to vendoreddate.hotherwise.Are these changes tested?
Partially tested locally and partially to be tested on CI.
Are there any user-facing changes?
Yes, Windows users will no longer need to install the IANA tzdb (see instructions here and here). We possibly have tzdb download set up in CI too and should update it appropriately.