guard timestamp_column in LocalDedupNode against missing DataFrame column #5985
faustaround wants to merge 1 commit into feast-dev:master from …
Conversation
…t in feature schema timestamp_column is unconditionally added to sort_keys even when the column doesn't exist in the DataFrame (e.g. when the timestamp_field isn't declared in the feature view schema and gets projected away by the DAG pipeline). The adjacent created_timestamp_column already has an in df.columns guard — timestamp_column needs the same treatment.
if sort_keys:
    df = self.backend.drop_duplicates(
        df, keys=dedup_keys, sort_by=sort_keys, ascending=False
    )
🔴 Deduplication silently skipped when no timestamp columns are present in DataFrame
When neither timestamp_column nor created_timestamp_column is present in the DataFrame columns, sort_keys will be empty and the if sort_keys: guard at line 204 causes the entire drop_duplicates call to be skipped. This means duplicate rows (by join key) pass through undetected.
Root Cause and Impact
The PR description states the intent is to "add a fallback to deduplicate by key only when no sort columns survive (rather than crashing)." However, the implementation at lines 204-207 simply skips deduplication entirely when sort_keys is empty:
if sort_keys:
    df = self.backend.drop_duplicates(
        df, keys=dedup_keys, sort_by=sort_keys, ascending=False
    )

When sort_keys is empty (falsy), no deduplication happens at all. The correct behavior should be to still deduplicate by dedup_keys alone, just without a deterministic sort order: for example, using pandas' df.drop_duplicates(subset=dedup_keys) or equivalent.
Impact: Any feature view where timestamp_field is an internal bookkeeping column not exposed in the feature schema will have its timestamp column projected away before the dedup node runs. In this case, duplicate entity rows will silently remain in the output, leading to incorrect feature values (e.g., duplicated rows in training datasets or multiple values written to the online store for the same entity key).
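At the pandas level, the difference is easy to demonstrate before looking at the suggested change below. This sketch uses made-up column names (driver_id, conv_rate) and plain pandas rather than the backend abstraction:

import pandas as pd

# Two rows for the same entity key; no timestamp columns survived projection.
df = pd.DataFrame({"driver_id": [1, 1, 2], "conv_rate": [0.1, 0.9, 0.5]})
dedup_keys, sort_keys = ["driver_id"], []

# Current behavior: the guard is falsy, so deduplication is skipped entirely
# and both rows for driver 1 pass through.
if sort_keys:
    df = df.sort_values(sort_keys, ascending=False).drop_duplicates(subset=dedup_keys)
assert len(df) == 3

# Key-only fallback: one row per entity survives, just without a
# deterministic "latest wins" ordering.
df = df.drop_duplicates(subset=dedup_keys)
assert len(df) == 2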
Suggested change:

  if sort_keys:
      df = self.backend.drop_duplicates(
          df, keys=dedup_keys, sort_by=sort_keys, ascending=False
      )
+ else:
+     df = self.backend.drop_duplicates(
+         df, keys=dedup_keys, sort_by=dedup_keys, ascending=True
+     )
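One note on the suggested else branch: when sorting by dedup_keys themselves, all rows sharing a key are tied on every sort column, so which duplicate survives is arbitrary. The point of the fallback is not ordering but coverage: at most one row per entity key remains instead of duplicates passing through untouched.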
LocalDedupNode.execute unconditionally appends timestamp_column to sort_keys, but created_timestamp_column (added immediately after) already guards against this with an in df.columns check.

When the feature view's timestamp_field column is not declared in the feature schema, the DAG pipeline projects it away before the dedup node runs. The column is present in the raw Redshift result but absent from the DataFrame by the time drop_duplicates is called, causing a crash. This affects any feature view where timestamp_field is an internal bookkeeping column not exposed as a feature.

Apply the same guard to timestamp_column for consistency, and add a fallback to deduplicate by key only when no sort columns survive (rather than crashing).
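For concreteness, a minimal sketch of the guarded construction described above, as it might look inside LocalDedupNode.execute (the exact attribute names and surrounding code are assumptions based on this description, not the actual patch):

# Only sort by timestamp columns that actually survived projection.
sort_keys = []
if self.timestamp_column and self.timestamp_column in df.columns:
    sort_keys.append(self.timestamp_column)
if self.created_timestamp_column and self.created_timestamp_column in df.columns:
    sort_keys.append(self.created_timestamp_column)

if sort_keys:
    # Normal path: latest row per entity key wins.
    df = self.backend.drop_duplicates(
        df, keys=dedup_keys, sort_by=sort_keys, ascending=False
    )
else:
    # Fallback: no sort columns survived, so deduplicate by key alone
    # rather than crashing on a missing column.
    df = self.backend.drop_duplicates(
        df, keys=dedup_keys, sort_by=dedup_keys, ascending=True
    )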