Description
What happened?
Description: Beam YAML's JSON schema compatibility validation for objects is effectively disabled due to a logic error in json_utils.py.
The function _validate_compatible (used by row_validator) is meant to check whether a Beam schema is compatible with a provided JSON schema. However, it contains three simple coding bugs:
- It compares the weak_schema dictionary directly to the string 'object' instead of checking its type field.
- It attempts to unpack a dictionary during iteration without calling .items().
- It uses improper string formatting for error messages, leading to unhelpful or crashing error reports.
As a result, Validate transforms in Beam YAML may silently proceed even when schemas are fundamentally incompatible, or fail with distracting internal tracebacks.
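The three patterns listed above can be reproduced in plain Python without Beam. The sketch below is an illustration only, not the actual source of json_utils.py: every name in it (from_beam, from_user, unpack_without_items, format_buggy, check_compatible) is hypothetical, and the exact form of the formatting bug is an assumption.

```python
# Illustration only: plain dicts, no Beam imports. All names are hypothetical.
from_beam = {'type': 'object', 'properties': {'f': {'type': 'string'}}}
from_user = {'type': 'object', 'properties': {'f': {'type': 'integer'}}}

# Pattern 1: a dict never equals a string, so a branch guarded by this
# comparison can never run.
branch_taken_buggy = (from_user == 'object')
branch_taken_fixed = (from_user.get('type') == 'object')


# Pattern 2: iterating a dict yields keys only, so two-name unpacking
# needs .items() to receive (key, value) pairs.
def unpack_without_items(properties):
    return [(name, spec) for name, spec in properties]


def unpack_with_items(properties):
    return [(name, spec) for name, spec in properties.items()]


# Pattern 3 (one assumed form of the formatting problem): two placeholders
# with a single non-tuple argument raise TypeError instead of a message.
def format_buggy(a, b):
    return 'Incompatible types: %r vs %r' % (a != b)


def format_fixed(a, b):
    return 'Incompatible types: %r vs %r' % (a, b)


def check_compatible(weak_schema, strong_schema):
    """Raise ValueError if two JSON-schema-like dicts disagree (sketch)."""
    if not weak_schema:
        return
    weak_type = weak_schema.get('type')
    strong_type = strong_schema.get('type')
    if weak_type != strong_type:
        raise ValueError(format_fixed(weak_type, strong_type))
    if weak_type == 'object':
        strong_props = strong_schema.get('properties', {})
        for name, spec in weak_schema.get('properties', {}).items():
            if name in strong_props:
                try:
                    check_compatible(spec, strong_props[name])
                except ValueError as exn:
                    raise ValueError(
                        'Incompatible schema for %r' % name) from exn
```

With the type-field check and .items() in place, check_compatible(from_user, from_beam) raises a ValueError naming field 'f', which is the behavior the report expects from row_validator.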
Beam Version: 2.61.x (Python SDK)
Steps to reproduce
Run the following Python snippet. It should raise a ValueError about incompatible types ('string' vs 'integer'), but it currently completes successfully because the validation logic is skipped.
from apache_beam.yaml import json_utils
from apache_beam.portability.api import schema_pb2
from apache_beam.typehints import schemas

# A schema with a string field
beam_schema = schema_pb2.Schema(fields=[
    schemas.schema_field('f', schema_pb2.STRING)
])

# An incompatible JSON schema expecting an integer for the same field
json_schema = {
    'type': 'object',
    'properties': {
        'f': {'type': 'integer'}
    }
}

# This SHOULD fail, but silently succeeds due to a logic error in json_utils.py
json_utils.row_validator(beam_schema, json_schema)
Issue Priority
Priority: 2 (default / most bugs should be filed as P2)
Issue Components
- Component: Python SDK
- Component: Java SDK
- Component: Go SDK
- Component: Typescript SDK
- Component: IO connector
- Component: Beam YAML
- Component: Beam examples
- Component: Beam playground
- Component: Beam katas
- Component: Website
- Component: Infrastructure
- Component: Spark Runner
- Component: Flink Runner
- Component: Samza Runner
- Component: Twister2 Runner
- Component: Hazelcast Jet Runner
- Component: Google Cloud Dataflow Runner