Description
What happened?
The _validate_compatible function in json_utils.py does not correctly validate object-type schemas. Incompatible schemas pass silently, and fixing the first bug exposes a crash.
There are three separate problems in this utility:
- Silent logic failure: at line 321, elif weak_schema == 'object': compares a dictionary to a string, which is always false. As a result, the compatibility check for object properties is skipped entirely.
- Unpacking crash: at line 325, for name, spec in weak_schema.get('properties', {}): iterates over a dictionary without .items(). Iterating a dict yields only its keys, so unpacking each key into name, spec raises a ValueError.
- Mangled error messages: some error strings are missing the f prefix (e.g., ValueError('Expected object type, got {json_type}.')), so the placeholder is printed literally instead of the actual type.
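The three failure modes can be reproduced in plain Python without Beam. This is a minimal sketch: the weak_schema dict below is a stand-in built from the snippets quoted above, not a copy of the code in json_utils.py.

```python
# Stand-in for the JSON schema passed to the validator (illustrative only).
weak_schema = {'type': 'object', 'properties': {'f': {'type': 'integer'}}}

# 1. A dict never compares equal to a string, so this is always False.
branch_taken = (weak_schema == 'object')

# The comparison that was presumably intended inspects the 'type' entry.
intended = (weak_schema.get('type') == 'object')

# 2. Iterating a dict yields keys only; unpacking the key 'f' into two names fails.
try:
    for name, spec in weak_schema.get('properties', {}):
        pass
    unpack_error = None
except ValueError as exc:
    unpack_error = str(exc)

# 3. Without the f prefix, the placeholder is kept as literal text.
json_type = 'integer'
plain = 'Expected object type, got {json_type}.'
formatted = f'Expected object type, got {json_type}.'

print(branch_taken, intended)
print(unpack_error)
print(plain)
print(formatted)
```

Note that the unpacking error depends on the key length: a property name with exactly two characters would unpack without an exception and silently produce wrong values instead.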
Beam Version: 2.61.0 (Python SDK)
from apache_beam.yaml import json_utils
from apache_beam.portability.api import schema_pb2
from apache_beam.typehints import schemas
# A schema with a string field
beam_schema = schema_pb2.Schema(fields=[
    schemas.schema_field('f', schema_pb2.STRING)
])
# An incompatible JSON schema expecting an integer
json_schema = {
    'type': 'object',
    'properties': {
        'f': {'type': 'integer'}
    }
}
# 1. This SHOULD fail with a compatibility error, but it silently succeeds.
json_utils.row_validator(beam_schema, json_schema)
# 2. Once the comparison at line 321 is fixed, the loop at line 325 raises:
# ValueError: not enough values to unpack (expected 2, got 1)
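For discussion, here is a rough sketch of what a corrected object branch might look like. It is not Beam's code or a proposed patch: check_object_compatible and the field_types mapping are hypothetical stand-ins for _validate_compatible and the Beam schema, and skipping properties that the schema does not declare is an assumption.

```python
def check_object_compatible(field_types, json_schema):
    """Raise ValueError if json_schema disagrees with field_types.

    field_types is a hypothetical mapping of field name -> JSON type name,
    used here in place of a real Beam schema.
    """
    json_type = json_schema.get('type')
    # Compare the 'type' entry, not the whole dict, against 'object'.
    if json_type != 'object':
        raise ValueError(f'Expected object type, got {json_type}.')
    # .items() yields (name, spec) pairs, so the unpacking is well defined.
    for name, spec in json_schema.get('properties', {}).items():
        if name not in field_types:
            continue  # Assumption: undeclared properties are tolerated.
        expected = field_types[name]
        actual = spec.get('type')
        if actual != expected:
            raise ValueError(
                f'Field {name!r}: expected {expected}, got {actual}.')


incompatible = {'type': 'object', 'properties': {'f': {'type': 'integer'}}}
try:
    check_object_compatible({'f': 'string'}, incompatible)
except ValueError as exc:
    print(exc)
```

With all three fixes in place, the string-versus-integer mismatch from the reproduction is reported instead of passing silently.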
Issue Priority
Priority: 2 (default / most bugs should be filed as P2)
Issue Components
- Component: Python SDK
- Component: Java SDK
- Component: Go SDK
- Component: Typescript SDK
- Component: IO connector
- Component: Beam YAML
- Component: Beam examples
- Component: Beam playground
- Component: Beam katas
- Component: Website
- Component: Infrastructure
- Component: Spark Runner
- Component: Flink Runner
- Component: Samza Runner
- Component: Twister2 Runner
- Component: Hazelcast Jet Runner
- Component: Google Cloud Dataflow Runner