Skip to content

WIP: Warn on datetime decoding error#11140

Open
ChrisBarker-NOAA wants to merge 7 commits intopydata:mainfrom
ChrisBarker-NOAA:warn_on_time_error
Open

WIP: Warn on datetime decoding error#11140
ChrisBarker-NOAA wants to merge 7 commits intopydata:mainfrom
ChrisBarker-NOAA:warn_on_time_error

Conversation

@ChrisBarker-NOAA
Copy link

As discussed in #2848

This is a start on implementing the ability to have xarray warn or ignore time encoding errors.

This allows it to seamlessly load a file with some time variables correctly encoded, and others not correctly encoded.

It adds a on_error flag to CFDatetimeCoder:

on_error  :  str, optional
       What to do if there is an error when attempting to decode
       a time variable. Options are: "raise", "warn", "ignore".
       Defaults to "raise". 

TODO: figure out how to pass this flag in through open_dataset()

Options:

  1. have open_dataset accept use_cftime as one of {True, False, 'raise', 'warn', 'ignore'}
  • True would be the same as raise (same as now) and False would mean "don't even try for any variables, same as it does now.
  1. add another flag to open_dataset, maybe on_cftime_error

I prefer (1) as there's a bit of an interplay between the two flags -- plus lots of flags!

@kmuehlbauer: your thoughts?

@kmuehlbauer
Copy link
Contributor

kmuehlbauer commented Feb 7, 2026

Thanks for starting this PR.

use_cftime as keyword argument is deprecated since 2025.1.

I would suggest number 3

  • set on_error at CFDatetimeCoder init and advertise this more prominent in the docstring.

We could finalize the deprecation of use_cftime here or in another PR.

We definitely should refrain from inventing any new keyword arguments, but push them into the existing Coder classes (or split out more Coders, if necessary).

@ChrisBarker-NOAA
Copy link
Author

oops -- moving too fast -- I meant:

have decode_times as one of {True, False, 'raise', 'warn', 'ignore'}

That's a bit trickier as it could be a mapping too, and that doesn'tmake a lot of sense with these options -- but I'm not sure it would too confusing.

I would suggest number 3

set on_error at CFDatetimeCoder init and advertise this more prominent in the docstring.

Better advertising would help, but I'd really like a quick and easy way to simply set it in the open_dataset call, e.g.

ds = xr.open_dataset(my_file, decode_times='ignore')

and done.

Is what you you are suggesting is that that would look like:

ds = xr.open_dataset(my_file, CFDatetimeCoder(on_error="ignore"))

Which isn't too bad, but would require folks to import CFDatetimeCoder, and also to grok the whole encoder idea, which may not be obvious to folks just wanting to open the darn file ...

However, the upside os not changes to open_dataset other than the docstring (and error messages?) so it should with this existing PR so I can test it :-)

@ChrisBarker-NOAA
Copy link
Author

Aftering digging sround through the stack of open_dataset calls, I think we'd only need to change something in convensions.py:

line 215: in decode_cf_variable

    if decode_times:
        # remove checks after end of deprecation cycle
        if not isinstance(decode_times, CFDatetimeCoder):
            if use_cftime is not None:
                emit_user_level_warning(
                    "Usage of 'use_cftime' as a kwarg is deprecated. "
                    "Please pass a 'CFDatetimeCoder' instance initialized "
                    "with 'use_cftime' to the 'decode_times' kwarg instead.\n"
                    "Example usage:\n"
                    "    time_coder = xr.coders.CFDatetimeCoder(use_cftime=True)\n"
                    "    ds = xr.open_dataset(decode_times=time_coder)\n",
                    FutureWarning,
                )
            decode_times = CFDatetimeCoder(use_cftime=use_cftime)

That doesn't seem too painful

@keewis
Copy link
Collaborator

keewis commented Feb 7, 2026

for reference, the existing API would be:

xr.open_dataset(url, engine=..., decode_times=xr.coders.CFDatetimeCoder(on_errors="ignore"))

which feels like a good compromise, given that values in a type union are pretty confusing (we have this problem with xr.where's keep_attrs).

I agree with this not being very well known right now, but I'm sure we can improve upon this.

)
decode_times = CFDatetimeCoder(use_cftime=use_cftime)
# decode_times = CFDatetimeCoder(use_cftime=use_cftime)
decode_times_options = {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe this dict should be a global?

@ChrisBarker-NOAA
Copy link
Author

OK -- since I was in the code, I figured I give it a try.

I haven't tested open_dataset yet, nor worked ont he docstring, but at least this doesn't break any existing tests.

Where should I put a open_dataset() test?

@ChrisBarker-NOAA
Copy link
Author

ChrisBarker-NOAA commented Feb 7, 2026

which feels like a good compromise,

Well, Ii think the whole point of the open_dataset() top level function is that it's a single, easy to use function to do the "ordinary" things.

If we wanted folks to provide decoders and all that explicitly, there wouldn't be a need for the decode_times as a simple True at all :-)

given that values in a type union are pretty confusing (we have this problem with xr.where's keep_attrs).

Well, I'm a bit of a typing skeptic, to tell the truth, but in any case, I don't think making the type hints easy to write should be a design criteria for Python code -- it's a dynamic language, that's why we use it.

Particularly for high-level use-focused functions like open_dataset()

But anyway, I think (and i'm a typing newbie), that:

    decode_times: bool | str | CFDatetimeCoder = True,

would do it.

alternatively, we could use an StrEnum, but I'm not a huge fan of that for user-facing APIs either.

And I don't think xarray is using that (or Enum, for that matter), anywhere else.

@keewis
Copy link
Collaborator

keewis commented Feb 7, 2026

sorry, I've been a bit misleading. The type hints would obviously be easy to write, but interpreting the option would become a lot harder. Right now, what decode_times: bool does is switch time decoding on/off, but decode_times="ignore" reads wrong: did we mean to ignore time decoding? (I know the intention in this PR is to switch decoding on, but also ignore any errors). I think for a good API we'd either need to add another option (which we explicitly want to avoid), or group all related options into a single object like a dict or the CFDatetimeCoder and pass that to decode_times. However, while decode_times={"on_error": "ignore"} is easier to understand than decode_times="ignore", I would argue that API-wise the coder is still preferable.

@ChrisBarker-NOAA
Copy link
Author

decode_times="ignore" reads wrong: did we mean to ignore time decoding?

Good point -- how about "decode_times="ignore_errors" or something like that?

And this doesn't seem THAT hard to document. And I think documenting how to pass in a CFDatetimeCoder is harder.

The entire point of this PR is to make it easy for users to get "decode what you can, don't decode what you can't, but don't error out ever" behavior.

As pointed out in the issue, as it stands, you can pass a different decoder object for each variable, or a number of other ways customize the loading already. If we don't expand the flag, we haven't really helped much (though a little, so still worth it)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

When decode_times fails, warn rather than failing

3 participants