WIP: Warn on datetime decoding error#11140
WIP: Warn on datetime decoding error#11140ChrisBarker-NOAA wants to merge 7 commits intopydata:mainfrom
Conversation
5a60aac to
1082fba
Compare
|
Thanks for starting this PR.
I would suggest number 3
We could finalize the deprecation of We definitely should refrain from inventing any new keyword arguments, but push them into the existing Coder classes (or split out more Coders, if necessary). |
|
oops -- moving too fast -- I meant: have That's a bit trickier as it could be a mapping too, and that doesn'tmake a lot of sense with these options -- but I'm not sure it would too confusing.
set on_error at CFDatetimeCoder init and advertise this more prominent in the docstring. Better advertising would help, but I'd really like a quick and easy way to simply set it in the
and done. Is what you you are suggesting is that that would look like:
Which isn't too bad, but would require folks to import CFDatetimeCoder, and also to grok the whole encoder idea, which may not be obvious to folks just wanting to open the darn file ... However, the upside os not changes to open_dataset other than the docstring (and error messages?) so it should with this existing PR so I can test it :-) |
|
Aftering digging sround through the stack of open_dataset calls, I think we'd only need to change something in line 215: in That doesn't seem too painful |
|
for reference, the existing API would be: xr.open_dataset(url, engine=..., decode_times=xr.coders.CFDatetimeCoder(on_errors="ignore"))which feels like a good compromise, given that values in a type union are pretty confusing (we have this problem with I agree with this not being very well known right now, but I'm sure we can improve upon this. |
…OAA/xarray into warn_on_time_error
for more information, see https://pre-commit.ci
| ) | ||
| decode_times = CFDatetimeCoder(use_cftime=use_cftime) | ||
| # decode_times = CFDatetimeCoder(use_cftime=use_cftime) | ||
| decode_times_options = { |
There was a problem hiding this comment.
maybe this dict should be a global?
|
OK -- since I was in the code, I figured I give it a try. I haven't tested Where should I put a |
Well, Ii think the whole point of the If we wanted folks to provide decoders and all that explicitly, there wouldn't be a need for the decode_times as a simple True at all :-)
Well, I'm a bit of a typing skeptic, to tell the truth, but in any case, I don't think making the type hints easy to write should be a design criteria for Python code -- it's a dynamic language, that's why we use it. Particularly for high-level use-focused functions like But anyway, I think (and i'm a typing newbie), that: would do it. alternatively, we could use an StrEnum, but I'm not a huge fan of that for user-facing APIs either. And I don't think xarray is using that (or Enum, for that matter), anywhere else. |
|
sorry, I've been a bit misleading. The type hints would obviously be easy to write, but interpreting the option would become a lot harder. Right now, what |
Good point -- how about "decode_times="ignore_errors" or something like that? And this doesn't seem THAT hard to document. And I think documenting how to pass in a CFDatetimeCoder is harder. The entire point of this PR is to make it easy for users to get "decode what you can, don't decode what you can't, but don't error out ever" behavior. As pointed out in the issue, as it stands, you can pass a different decoder object for each variable, or a number of other ways customize the loading already. If we don't expand the flag, we haven't really helped much (though a little, so still worth it) |
As discussed in #2848
This is a start on implementing the ability to have xarray warn or ignore time encoding errors.
This allows it to seamlessly load a file with some time variables correctly encoded, and others not correctly encoded.
It adds a
on_errorflag toCFDatetimeCoder:TODO: figure out how to pass this flag in through
open_dataset()Options:
open_datasetacceptuse_cftimeas one of {True, False, 'raise', 'warn', 'ignore'}raise(same as now) and False would mean "don't even try for any variables, same as it does now.open_dataset, maybeon_cftime_errorI prefer (1) as there's a bit of an interplay between the two flags -- plus lots of flags!
whats-new.rstapi.rst@kmuehlbauer: your thoughts?