
[Bug]: Beam YAML JSON Schema validation silent failure and internal crashes in json_utils.py #37576

@atheendre130505

Description


What happened?

The _validate_compatible function in json_utils.py does not correctly validate object-type schemas. As a result, incompatible data either passes silently or triggers an internal crash.

This utility has several problems:

  • Silent logic failure: at line 321, elif weak_schema == 'object': compares the whole schema dictionary to a string, which is always False. The compatibility check for object properties is therefore skipped entirely. The comparison presumably should be against weak_schema['type'].
  • Unpacking crash: at line 325, for name, spec in weak_schema.get('properties', {}): iterates over the dictionary's keys rather than its (key, value) pairs because .items() is missing. Unpacking each key into name, spec raises a ValueError (for the one-character key 'f': "not enough values to unpack (expected 2, got 1)").
  • Mangled error messages: some error strings are missing the f prefix, e.g. ValueError('Expected object type, got {json_type}.'), so the placeholder is printed literally instead of the actual type.

Beam Version: 2.61.0 (Python SDK)
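The three pitfalls can be reproduced in isolation with plain Python, without Beam. This is a minimal sketch: the weak_schema dict is a hand-written stand-in (an assumption) for the schema that _validate_compatible receives, not data taken from Beam itself.

```python
# Hypothetical stand-in for the "weak" schema derived from a Beam schema
# with a single string field 'f'; not produced by Beam here.
weak_schema = {'type': 'object', 'properties': {'f': {'type': 'string'}}}

# 1. Comparing the whole dict to a string is always False, so the object
#    branch never runs. Comparing the 'type' entry gives the intended result.
buggy_branch_taken = weak_schema == 'object'          # False
fixed_branch_taken = weak_schema['type'] == 'object'  # True

# 2. Iterating a dict yields its keys. Unpacking the one-character key 'f'
#    into two names raises ValueError; .items() yields (key, value) pairs.
unpack_error = None
try:
    for name, spec in weak_schema.get('properties', {}):
        pass
except ValueError as exn:
    unpack_error = str(exn)

pairs = [(name, spec)
         for name, spec in weak_schema.get('properties', {}).items()]

# 3. Without the f prefix the placeholder is emitted literally.
json_type = 'array'
plain_message = 'Expected object type, got {json_type}.'
f_message = f'Expected object type, got {json_type}.'

print(buggy_branch_taken, fixed_branch_taken)
print(unpack_error)
print(pairs)
print(plain_message)
print(f_message)
```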

from apache_beam.portability.api import schema_pb2
from apache_beam.yaml import json_utils

# A Beam schema with a single string field 'f'.
beam_schema = schema_pb2.Schema(fields=[
    schema_pb2.Field(
        name='f',
        type=schema_pb2.FieldType(atomic_type=schema_pb2.STRING)),
])

# An incompatible JSON schema that expects 'f' to be an integer.
json_schema = {
    'type': 'object',
    'properties': {
        'f': {'type': 'integer'}
    }
}

# 1. This SHOULD fail with a compatibility error, but it silently succeeds.
json_utils.row_validator(beam_schema, json_schema)

# 2. Once the comparison at line 321 is fixed so the object branch is
#    reached, the loop at line 325 crashes with:
#    ValueError: not enough values to unpack (expected 2, got 1)
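For comparison, here is a minimal standalone sketch of how the object branch might behave once the three problems listed above are fixed. The function name validate_compatible is hypothetical, and so is the hand-written weak schema (it assumes the Beam STRING field maps to {'type': 'string'}). The code does not import Beam and is not the actual json_utils implementation.

```python
from typing import Any, Dict


def validate_compatible(weak_schema: Dict[str, Any],
                        strong_schema: Dict[str, Any]) -> None:
    """Raises ValueError if weak_schema cannot satisfy strong_schema.

    Hypothetical sketch of the corrected logic; not Beam's actual code.
    """
    # An empty schema on either side imposes no constraint in this sketch.
    if not weak_schema or not strong_schema:
        return
    weak_type = weak_schema.get('type')
    strong_type = strong_schema.get('type')
    if weak_type != strong_type:
        # f-string, so the actual types appear in the message.
        raise ValueError(
            f'Incompatible types: {weak_type!r} vs {strong_type!r}')
    if weak_type == 'array':
        validate_compatible(
            weak_schema.get('items', {}), strong_schema.get('items', {}))
    elif weak_type == 'object':  # compare the 'type' entry, not the dict
        strong_props = strong_schema.get('properties', {})
        # .items() yields (name, spec) pairs instead of bare keys.
        for name, spec in weak_schema.get('properties', {}).items():
            if name in strong_props:
                try:
                    validate_compatible(spec, strong_props[name])
                except ValueError as exn:
                    raise ValueError(
                        f'Incompatible schema for {name!r}') from exn


# Assumed JSON-schema form of the Beam schema with a string field 'f'.
weak = {'type': 'object', 'properties': {'f': {'type': 'string'}}}
incompatible = {'type': 'object', 'properties': {'f': {'type': 'integer'}}}
compatible = {'type': 'object', 'properties': {'f': {'type': 'string'}}}

validate_compatible(weak, compatible)  # passes, as it should

error = None
try:
    validate_compatible(weak, incompatible)
except ValueError as exn:
    error = exn
print(error)            # Incompatible schema for 'f'
print(error.__cause__)  # Incompatible types: 'string' vs 'integer'
```

Chaining with "from exn" keeps the field name in the outer message and the concrete type mismatch in the cause, which makes failures on nested schemas easier to trace.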

Issue Priority

Priority: 2 (default / most bugs should be filed as P2)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam YAML
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Infrastructure
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner

Metadata


Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
