Description
Note: As I was reviewing this project, and before creating any pull requests, I wanted to open a tracking issue on this topic. Much of the content (including links, recommendations, and priorities) was produced with AI-assisted analysis using Kiro IDE.
Overview
This issue catalogs all external dependencies used in the PDF Accessibility project, organized by technology stack and source folder. Each dependency includes its current version, latest available version, and links to the specific files in the repository.
Python Dependencies
Root Level - CDK Infrastructure
File: requirements.txt
| Package | Current Version | Latest Version | PyPI Link |
|---|---|---|---|
| aws-cdk-lib | 2.147.2 | 2.196.0 | pypi.org/project/aws-cdk-lib |
| constructs | >=10.0.0,<11.0.0 | 10.x | pypi.org/project/constructs |
Lambda: split_pdf
File: lambda/split_pdf/requirements.txt
| Package | Current Version | Latest Version | PyPI Link |
|---|---|---|---|
| pypdf | 4.3.1 | 6.2.0+ | pypi.org/project/pypdf |
Notes: pypdf is two major versions behind (4.x → 6.x). Version 6.x includes significant improvements and bug fixes.
Lambda: add_title
File: lambda/add_title/requirements.txt
| Package | Current Version | Latest Version | PyPI Link |
|---|---|---|---|
| PyMuPDF | 1.24.14 | 1.25.5+ | pypi.org/project/PyMuPDF |
Notes: PyMuPDF is actively maintained with regular updates for performance and features.
Lambda: accessibility_checker_before_remidiation
File: lambda/accessibility_checker_before_remidiation/requirements.txt
| Package | Current Version | Latest Version | PyPI Link |
|---|---|---|---|
| pdfservices-sdk | 4.1.0 | 4.1.0 | pypi.org/project/pdfservices-sdk |
Notes: Adobe PDF Services SDK - requires Python 3.10+. Check Adobe's documentation for updates.
Lambda: accessability_checker_after_remidiation
File: lambda/accessability_checker_after_remidiation/requirements.txt
| Package | Current Version | Latest Version | PyPI Link |
|---|---|---|---|
| pdfservices-sdk | 4.1.0 | 4.1.0 | pypi.org/project/pdfservices-sdk |
Docker Container: docker_autotag
File: docker_autotag/requirements.txt
| Package | Current Version | Latest Version | PyPI Link |
|---|---|---|---|
| boto3 | 1.34.160 | 1.40.0+ | pypi.org/project/boto3 |
| numpy | 2.0.1 | 2.x | pypi.org/project/numpy |
| openpyxl | 3.1.5 | 3.1.x | pypi.org/project/openpyxl |
| pandas | 2.2.2 | 2.2.x | pypi.org/project/pandas |
| pillow | 10.4.0 | 11.1.0+ | pypi.org/project/Pillow |
| PyMuPDF | 1.25.1 | 1.25.5+ | pypi.org/project/PyMuPDF |
| pypdf | 4.3.1 | 6.2.0+ | pypi.org/project/pypdf |
| pdfservices-sdk | 4.1.0 | 4.1.0 | pypi.org/project/pdfservices-sdk |
Notes:
- boto3 updates frequently with new AWS service support
- Pillow has a major version update available (10.x → 11.x)
- pypdf has major version updates available
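The "major version update" notes above can be checked mechanically. The following is a minimal sketch, assuming version strings shaped like the ones in these tables (including the `+` and `.x` suffixes); `parse_version` and `update_kind` are hypothetical helper names, not part of the project.

```python
import re


def parse_version(text: str) -> tuple[int, ...]:
    """Return the leading numeric components, e.g. '6.2.0+' -> (6, 2, 0), '2.x' -> (2,)."""
    parts = []
    for piece in text.strip().split("."):
        match = re.match(r"\d+", piece)
        if not match:
            break  # stop at placeholders such as 'x'
        parts.append(int(match.group()))
    return tuple(parts)


def update_kind(current: str, latest: str) -> str:
    """Classify the gap between two versions as 'major', 'minor', 'patch', or 'none'."""
    cur, new = parse_version(current), parse_version(latest)
    labels = ("major", "minor", "patch")
    # Compare only the components that both versions actually specify.
    for label, c, n in zip(labels, cur, new):
        if n != c:
            return label if n > c else "none"
    return "none"


# Rows taken from the docker_autotag table above.
rows = {
    "boto3": ("1.34.160", "1.40.0+"),
    "pillow": ("10.4.0", "11.1.0+"),
    "PyMuPDF": ("1.25.1", "1.25.5+"),
    "pypdf": ("4.3.1", "6.2.0+"),
}
report = {name: update_kind(cur, new) for name, (cur, new) in rows.items()}
```

A placeholder such as `2.x` carries no minor or patch information, so the sketch reports `none` for it rather than guessing.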
PDF-to-HTML Solution
File: pdf2html/requirements.txt
| Package | Current Version | Latest Version | PyPI Link |
|---|---|---|---|
| Pillow | >=11.1.0 | 11.1.0+ | pypi.org/project/Pillow |
| boto3 | >=1.37.11 | 1.40.0+ | pypi.org/project/boto3 |
| botocore | >=1.37.11 | 1.41.0+ | pypi.org/project/botocore |
| beautifulsoup4 | >=4.11.0 | 4.12.x | pypi.org/project/beautifulsoup4 |
| bs4 | >=0.0.1 | 0.0.2 | pypi.org/project/bs4 |
| pydantic | >=2.10.6 | 2.10.x | pypi.org/project/pydantic |
| defusedcsv | >=2.0.0 | 2.0.0 | pypi.org/project/defusedcsv |
| Flask | >=3.1.0 | 3.1.x | pypi.org/project/Flask |
| PyYaml | >=6.0.0 | 6.0.2 | pypi.org/project/PyYAML |
| pypdf | >=5.4.0 | 6.2.0+ | pypi.org/project/pypdf |
Notes: This solution uses minimum version constraints (>=), which allows flexibility but means newer releases, including new major versions, can be installed without having been tested.
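To see which lines in a requirements file are floor-only, a small classifier may help. This is a rough sketch using only the standard library; `constraint_style` is a hypothetical name, and the `==` syntax for pinned entries is an assumption, since the tables above list bare version numbers.

```python
import re


def constraint_style(line: str) -> tuple[str, str]:
    """Return (package, style) for a single requirements.txt line."""
    match = re.match(r"\s*([A-Za-z0-9_.-]+)\s*(.*)", line)
    if not match:
        raise ValueError(f"not a requirement line: {line!r}")
    name, rest = match.group(1), match.group(2)
    # Drop trailing comments and environment markers before reading specifiers.
    rest = rest.split("#", 1)[0].split(";", 1)[0]
    specs = [s.strip() for s in rest.split(",") if s.strip()]
    if not specs:
        return name, "unconstrained"
    if any(s.startswith("==") for s in specs):
        return name, "pinned"  # wildcards such as ==1.* are not distinguished here
    has_floor = any(s.startswith(">") for s in specs)    # matches > and >=
    has_ceiling = any(s.startswith("<") for s in specs)  # matches < and <=
    if has_floor and has_ceiling:
        return name, "bounded range"
    if has_floor:
        return name, "floor only"
    return name, "other"


styles = [
    constraint_style("pypdf>=5.4.0"),
    constraint_style("constructs>=10.0.0,<11.0.0"),
    constraint_style("boto3==1.34.160"),
]
```

Entries reported as `floor only` are the ones where a fresh install can resolve to a release that was never tested with this project.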
JavaScript/Node.js Dependencies
Docker Container: javascript_docker
File: javascript_docker/package.json
| Package | Current Version | Latest Version | NPM Link |
|---|---|---|---|
| @aws-sdk/client-bedrock-runtime | ^3.632.0 | 3.734.0+ | npmjs.com/package/@aws-sdk/client-bedrock-runtime |
| @aws-sdk/client-s3 | ^3.633.0 | 3.734.0+ | npmjs.com/package/@aws-sdk/client-s3 |
| @aws-sdk/util-buffer-from | ^3.374.0 | 3.x | npmjs.com/package/@aws-sdk/util-buffer-from |
| aws-sdk | ^2.1678.0 | 2.x (EOL Sept 2025) | npmjs.com/package/aws-sdk |
| better-sqlite3 | ^11.8.1 | 11.x | npmjs.com/package/better-sqlite3 |
| pdf-lib | ^1.17.1 | 1.17.1 | npmjs.com/package/pdf-lib |
| pdfjs-dist | ^4.6.82 | 5.3.x | npmjs.com/package/pdfjs-dist |
| winston | ^3.14.2 | 3.x | npmjs.com/package/winston |
Notes:
- aws-sdk v2 is in maintenance mode and reaches end-of-support on September 8, 2025
- Recommend migrating to @aws-sdk v3 packages only
- pdfjs-dist has a major version update available (4.x → 5.x)
PDF-to-HTML CDK Deployment
File: pdf2html/cdk/package.json
| Package | Current Version | Latest Version | NPM Link |
|---|---|---|---|
| aws-cdk-lib | ^2.0.0 | 2.215.0+ | npmjs.com/package/aws-cdk-lib |
| constructs | ^10.0.0 | 10.x | npmjs.com/package/constructs |
Notes: Caret (^) ranges allow automatic minor and patch updates, so ^2.0.0 accepts any 2.x release. Consider pinning to specific versions for production stability.
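For reference, the caret rule can be sketched in a few lines. The code below follows the documented npm semver behavior (a caret range allows changes that do not modify the left-most non-zero component) but handles only plain `major.minor.patch` versions, with no prerelease tags or partial versions; `caret_bounds` and `caret_allows` are hypothetical helper names.

```python
def parse(version: str) -> tuple[int, int, int]:
    """Parse a plain 'major.minor.patch' string into a tuple of ints."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def caret_bounds(spec: str) -> tuple[tuple[int, int, int], tuple[int, int, int]]:
    """Return the inclusive lower and exclusive upper bound of a caret range."""
    major, minor, patch = parse(spec.lstrip("^"))
    if major > 0:
        upper = (major + 1, 0, 0)
    elif minor > 0:
        upper = (0, minor + 1, 0)
    else:
        upper = (0, 0, patch + 1)
    return (major, minor, patch), upper


def caret_allows(spec: str, version: str) -> bool:
    """True if `version` falls inside the caret range `spec`."""
    lower, upper = caret_bounds(spec)
    return lower <= parse(version) < upper


# ^2.0.0 (aws-cdk-lib) accepts any 2.x release but never 3.0.0.
cdk_ok = caret_allows("^2.0.0", "2.215.0")
# ^4.6.82 (pdfjs-dist) does not pick up a 5.x release automatically.
pdfjs_ok = caret_allows("^4.6.82", "5.3.0")
```

This is why the 5.x update for pdfjs-dist needs an explicit change to `package.json`, while aws-cdk-lib can move anywhere within 2.x on a fresh install without a lockfile.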
Java Dependencies
Lambda: java_lambda/PDFMergerLambda
File: lambda/java_lambda/PDFMergerLambda/pom.xml
Notes:
- JUnit 3.8.1 is extremely outdated (released 2002). Upgrade to JUnit 5.x
- Apache PDFBox has a major version update (2.0.27 → 3.0.3+)
- logback-classic has security updates in 1.5.x
- slf4j-api has a major version update (1.7.x → 2.0.x)
- Consider migrating from AWS SDK v1 (com.amazonaws) to v2 (software.amazon.awssdk)
Maven Build Plugins
| Group ID | Artifact ID | Current Version | Latest Version |
|---|---|---|---|
| org.apache.maven.plugins | maven-shade-plugin | 3.2.4 | 3.6.x |
| com.github.edwgiz | maven-shade-plugin.log4j2-cachefile-transformer | 2.13.0 | 2.x |
Docker Base Images
Python-based Containers
| Container | Base Image | Current Version | Latest Available |
|---|---|---|---|
| docker_autotag | public.ecr.aws/docker/library/python | 3.12-slim | 3.12-slim |
| lambda/split_pdf | public.ecr.aws/lambda/python | 3.12 | 3.12 |
| lambda/add_title | public.ecr.aws/lambda/python | 3.12 | 3.12 |
| lambda/accessibility_checker_before_remidiation | public.ecr.aws/lambda/python | 3.12 | 3.12 |
| lambda/accessability_checker_after_remidiation | public.ecr.aws/lambda/python | 3.12 | 3.12 |
| pdf2html | public.ecr.aws/lambda/python | 3.12 | 3.12 |
Node.js-based Containers
| Container | Base Image | Current Version | Latest Available |
|---|---|---|---|
| javascript_docker | public.ecr.aws/docker/library/node | 20 | 22 LTS |
Notes: Node.js 22 is the current LTS version. Consider upgrading from Node 20.
Security & Compliance Considerations
High Priority Updates
- JavaScript: aws-sdk v2 EOL - Reaches end-of-support September 2025
  - Action: Migrate to @aws-sdk v3 packages
  - Impact: High - Security vulnerabilities will not be patched
  - Files: javascript_docker/package.json
- Java: JUnit 3.8.1 - Extremely outdated (20+ years old)
  - Action: Upgrade to JUnit 5.x
  - Impact: Medium - Missing modern testing features
  - Files: lambda/java_lambda/PDFMergerLambda/pom.xml
- Java: Apache PDFBox 2.0.27 - Major version behind
  - Action: Evaluate upgrade to 3.0.3+
  - Impact: Medium - Bug fixes and performance improvements
  - Files: lambda/java_lambda/PDFMergerLambda/pom.xml
Medium Priority Updates
- Python: pypdf - Multiple major versions behind (4.3.1 → 6.2.0+)
  - Action: Test and upgrade to pypdf 6.x
  - Impact: Medium - Performance and feature improvements
  - Files: Multiple locations
- Python: boto3 - Regular updates available
  - Action: Update to latest stable version
  - Impact: Low-Medium - New AWS service features
  - Files: docker_autotag/requirements.txt, pdf2html/requirements.txt
- Python: Pillow - Version inconsistency
  - Action: Standardize on Pillow 11.1.0+ across all components
  - Impact: Low - Security and bug fixes
  - Files: docker_autotag/requirements.txt
- Python: AWS CDK - Behind latest (2.147.2 → 2.196.0)
  - Action: Update to latest CDK version
  - Impact: Low - New CloudFormation features
  - Files: requirements.txt
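Version inconsistencies like the Pillow one above could be found automatically by comparing specifiers across requirements files. The following is a minimal sketch under a few assumptions: `find_inconsistencies` is a hypothetical helper, the file contents are reconstructed from the tables in this issue, and the `==` pin syntax is assumed rather than confirmed.

```python
import re
from collections import defaultdict


def find_inconsistencies(files: dict[str, str]) -> dict[str, dict[str, str]]:
    """Map each package to {path: specifier} when its specifier differs between files."""
    seen: dict[str, dict[str, str]] = defaultdict(dict)
    for path, text in files.items():
        for raw in text.splitlines():
            line = raw.split("#", 1)[0].strip()  # ignore comments and blank lines
            if not line:
                continue
            match = re.match(r"([A-Za-z0-9_.-]+)\s*(.*)", line)
            if not match:
                continue
            # Normalize names so that 'Pillow' and 'pillow' count as the same package.
            name = re.sub(r"[-_.]+", "-", match.group(1)).lower()
            seen[name][path] = match.group(2).strip()
    return {
        name: specs
        for name, specs in seen.items()
        if len(set(specs.values())) > 1
    }


# File contents assumed from the tables in this issue (pin syntax is a guess).
EXAMPLE_FILES = {
    "lambda/split_pdf/requirements.txt": "pypdf==4.3.1\n",
    "lambda/add_title/requirements.txt": "PyMuPDF==1.24.14\n",
    "docker_autotag/requirements.txt": (
        "pillow==10.4.0\nPyMuPDF==1.25.1\npypdf==4.3.1\npdfservices-sdk==4.1.0\n"
    ),
    "pdf2html/requirements.txt": "Pillow>=11.1.0\npypdf>=5.4.0\n",
}
conflicts = find_inconsistencies(EXAMPLE_FILES)
```

With this input the sketch flags Pillow, PyMuPDF, and pypdf. It compares specifier strings only, so two different specifiers that happen to resolve to the same version would still be reported.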
Recommendations
Immediate Actions
- Plan aws-sdk v2 migration - Create migration plan before September 2025 EOL
- Update JUnit - Upgrade Java tests to JUnit 5
- Security audit - Run dependency vulnerability scans (npm audit, pip-audit, OWASP Dependency Check)
Short-term Actions (1-3 months)
- Standardize Python dependency versions - Ensure consistent dependency versions across components
- Update pypdf - Test and migrate to pypdf 6.x for performance improvements
- Update Apache PDFBox - Evaluate PDFBox 3.x compatibility
- Update Node.js base image - Upgrade to Node 22 LTS
Long-term Actions (3-6 months)
- Dependency automation - Implement Dependabot or Renovate for automated updates
- Version pinning strategy - Document policy for version constraints vs pinning
- Regular update cadence - Establish quarterly dependency review process
Dependency Management Tools
Recommended Tools
- Python: pip-audit, safety, Dependabot
- JavaScript: npm audit, snyk, Dependabot
- Java: OWASP Dependency-Check, Snyk
- Docker: Trivy, Snyk Container
GitHub Actions Integration
Consider adding automated dependency scanning to the CI/CD pipeline:
- Weekly dependency vulnerability scans
- Automated PR creation for security updates
- Dependency update notifications
Appendix: Quick Reference Links
Python Package Indexes
- PyPI: https://pypi.org/
- Python Package Health: https://snyk.io/advisor/python
JavaScript Package Indexes
- NPM: https://www.npmjs.com/
- NPM Package Health: https://snyk.io/advisor/npm-package
Java Package Indexes
- MVN Repository: https://mvnrepository.com/
- Maven Central Search: https://central.sonatype.com/
AWS SDK Documentation
- AWS SDK for Python (Boto3): https://boto3.amazonaws.com/v1/documentation/api/latest/index.html
- AWS SDK for JavaScript v3: https://docs.aws.amazon.com/AWSJavaScriptSDK/v3/latest/
- AWS SDK for Java v2: https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/home.html