Description
ScanCode's author detection produces false positives for two-segment CamelCase code identifiers (e.g., MeterProvider, TracerProvider) found in Rust doc comments. The pattern "created by a MeterProvider" or "created by a TracerProvider" in source code documentation triggers the AUTHOR: { } grammar rule because these words match the NNP tokenizer regex ^([A-Z][a-z0-9]+){1,2}$ at line 2219 of copyrights.py.
The existing CamelCase JUNK rule (^([A-Z][a-z]+){3,}$) only catches words with three or more CamelCase segments (e.g., GetQueueReference), so two-segment identifiers like MeterProvider slip through.
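The gap between the two patterns can be checked in isolation. The following is a minimal sketch that copies only the two regexes quoted above; the `classify` helper and the constant names are hypothetical and not part of ScanCode, and the real tokenizer in copyrights.py applies many more rules than these two.

```python
import re

# Regexes quoted in this report; everything else here is illustrative.
NNP_PATTERN = re.compile(r"^([A-Z][a-z0-9]+){1,2}$")
CAMELCASE_JUNK_PATTERN = re.compile(r"^([A-Z][a-z]+){3,}$")


def classify(token):
    """Return which of the two quoted patterns a token matches (hypothetical helper)."""
    if CAMELCASE_JUNK_PATTERN.match(token):
        return "JUNK"
    if NNP_PATTERN.match(token):
        return "NNP"
    return "other"


for token in ("MeterProvider", "TracerProvider", "GetQueueReference"):
    print(token, classify(token))
```

For these tokens the two patterns are mutually exclusive, so the order in which the sketch checks them does not affect the result: the two-segment names fall through to NNP, while the three-segment name is caught as JUNK.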
How To Reproduce
Scan the opentelemetry-rust repository. The following files produce false author detections:
opentelemetry-sdk/src/metrics/meter_provider.rs line 26: "All Meters created by a MeterProvider will be..." -> detected author: MeterProvider
opentelemetry-sdk/src/trace/span_processor.rs line 13: "All Tracer instances created by a TracerProvider share..." -> detected author: TracerProvider
scancode --copyright --json output.json opentelemetry-rust/opentelemetry-sdk/
Expected: No author detection for these lines — MeterProvider and TracerProvider are Rust type names, not people or organizations.
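To confirm the false positives in the scan result, something like the sketch below could be run against output.json. It assumes the JSON output has a top-level `files` list whose entries carry a `path` and an `authors` list of objects with `author` and `start_line` keys; that layout is an assumption here and should be checked against the actual output. The `find_authors` and `load_scan` helpers and the sample data are hypothetical.

```python
import json


def load_scan(path):
    """Load a ScanCode JSON result file from disk."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def find_authors(scan, names):
    """Collect (path, author, start_line) for author detections in `names`.

    Assumes the layout files[].authors[].author, which is not verified here.
    """
    hits = []
    for entry in scan.get("files", []):
        for detection in entry.get("authors", []):
            if detection.get("author") in names:
                hits.append(
                    (entry.get("path"), detection.get("author"), detection.get("start_line"))
                )
    return hits


# Hand-written sample in the assumed shape, not real scan output.
sample = {
    "files": [
        {
            "path": "opentelemetry-sdk/src/metrics/meter_provider.rs",
            "authors": [{"author": "MeterProvider", "start_line": 26, "end_line": 26}],
        },
        {"path": "opentelemetry-sdk/src/lib.rs", "authors": []},
    ]
}

print(find_authors(sample, {"MeterProvider", "TracerProvider"}))
```

With a real scan, `find_authors(load_scan("output.json"), {"MeterProvider", "TracerProvider"})` should return an empty list once the false positives are fixed.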
System configuration
OS: Linux (Docker: Linux-6.12.65-linuxkit-x86_64-with-glibc2.36)
scancode-toolkit version: 32.5.0
Installation method: docker