opentelemetry authors not detected properly #4736

@alexmohr

Description

ScanCode's author detection produces false positives for two-segment CamelCase code identifiers (e.g., MeterProvider, TracerProvider) found in Rust doc comments. The pattern "created by a MeterProvider" or "created by a TracerProvider" in source code documentation triggers the AUTHOR: { } grammar rule because these words match the NNP tokenizer regex ^([A-Z][a-z0-9]+){1,2}$ at line 2219 of copyrights.py.

The existing CamelCase JUNK rule (^([A-Z][a-z]+){3,}$) only catches words with three or more CamelCase segments (e.g., GetQueueReference), so two-segment identifiers like MeterProvider slip through.
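The gap between the two patterns can be checked in isolation. The sketch below uses only the two regexes quoted above. The `classify` helper is hypothetical and does not reproduce how copyrights.py orders or applies its tokenizer rules; it only reports which of the two patterns matches a given token.

```python
import re

# The two regexes as quoted in this issue.
NNP_RE = re.compile(r"^([A-Z][a-z0-9]+){1,2}$")
CAMELCASE_JUNK_RE = re.compile(r"^([A-Z][a-z]+){3,}$")


def classify(token):
    """Hypothetical helper: report which of the two quoted patterns match."""
    return {
        "nnp": bool(NNP_RE.match(token)),
        "junk": bool(CAMELCASE_JUNK_RE.match(token)),
    }


for token in ("MeterProvider", "TracerProvider", "GetQueueReference"):
    print(token, classify(token))
```

Under these assumptions, the two-segment identifiers match only the NNP pattern, while the three-segment GetQueueReference matches only the JUNK pattern.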

How To Reproduce

Scan the opentelemetry-rust repository. The following files produce false author detections:

opentelemetry-sdk/src/metrics/meter_provider.rs line 26: "All Meters created by a MeterProvider will be..." -> detected author: MeterProvider
opentelemetry-sdk/src/trace/span_processor.rs line 13: "All Tracer instances created by a TracerProvider share..." -> detected author: TracerProvider

scancode --copyright --json output.json opentelemetry-rust/opentelemetry-sdk/

Expected: No author detection for these lines — MeterProvider and TracerProvider are Rust type names, not people or organizations.
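To confirm the false positives in the scan result, the JSON can be filtered for the two identifiers. This is a rough sketch: it assumes the output has a top-level `files` list whose entries carry `path` and an `authors` list of objects with `author` and `start_line` keys. Adjust the keys if your ScanCode version emits a different layout. The `load_scan` and `find_authors` helpers are hypothetical names, not part of ScanCode.

```python
import json

# Identifiers reported as false author detections in this issue.
SUSPECT_NAMES = {"MeterProvider", "TracerProvider"}


def load_scan(path):
    """Load a ScanCode JSON result file from disk."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def find_authors(scan, names=SUSPECT_NAMES):
    """Return (path, start_line, author) for each author detection in names.

    Assumes scan["files"][i]["authors"][j]["author"] holds the detected text.
    """
    hits = []
    for entry in scan.get("files", []):
        for detection in entry.get("authors", []):
            if detection.get("author") in names:
                hits.append(
                    (entry.get("path"), detection.get("start_line"), detection.get("author"))
                )
    return hits


# Usage, after running the scancode command above:
#   for hit in find_authors(load_scan("output.json")):
#       print(hit)
```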

System configuration

For bug reports, it really helps us to know:

OS: Linux (Docker: Linux-6.12.65-linuxkit-x86_64-with-glibc2.36)
scancode-toolkit version: 32.5.0
Installation method: docker
