Update External Dependencies (Python, JavaScript, Java, Docker, etc) #30

@hakanson

Note: While reviewing this project, and before creating any pull requests, I wanted to open a tracking issue on this topic. Much of the content (including links, recommendations, and priorities) was produced with the help of AI analysis in Kiro IDE.

Overview

This issue catalogs all external dependencies used in the PDF Accessibility project, organized by technology stack and source folder. Each dependency includes its current version, latest available version, and links to the specific files in the repository.


Python Dependencies

Root Level - CDK Infrastructure

File: requirements.txt

| Package | Current Version | Latest Version | PyPI Link |
|---|---|---|---|
| aws-cdk-lib | 2.147.2 | 2.196.0 | pypi.org/project/aws-cdk-lib |
| constructs | >=10.0.0,<11.0.0 | 10.x | pypi.org/project/constructs |

Lambda: split_pdf

File: lambda/split_pdf/requirements.txt

| Package | Current Version | Latest Version | PyPI Link |
|---|---|---|---|
| pypdf | 4.3.1 | 6.2.0+ | pypi.org/project/pypdf |

Notes: pypdf has had major version updates. Version 6.x includes significant improvements and bug fixes.


Lambda: add_title

File: lambda/add_title/requirements.txt

| Package | Current Version | Latest Version | PyPI Link |
|---|---|---|---|
| PyMuPDF | 1.24.14 | 1.25.5+ | pypi.org/project/PyMuPDF |

Notes: PyMuPDF is actively maintained with regular updates for performance and features.


Lambda: accessibility_checker_before_remidiation

File: lambda/accessibility_checker_before_remidiation/requirements.txt

| Package | Current Version | Latest Version | PyPI Link |
|---|---|---|---|
| pdfservices-sdk | 4.1.0 | 4.1.0 | pypi.org/project/pdfservices-sdk |

Notes: Adobe PDF Services SDK - requires Python 3.10+. Check Adobe's documentation for updates.


Lambda: accessability_checker_after_remidiation

File: lambda/accessability_checker_after_remidiation/requirements.txt

| Package | Current Version | Latest Version | PyPI Link |
|---|---|---|---|
| pdfservices-sdk | 4.1.0 | 4.1.0 | pypi.org/project/pdfservices-sdk |

Docker Container: docker_autotag

File: docker_autotag/requirements.txt

| Package | Current Version | Latest Version | PyPI Link |
|---|---|---|---|
| boto3 | 1.34.160 | 1.40.0+ | pypi.org/project/boto3 |
| numpy | 2.0.1 | 2.x | pypi.org/project/numpy |
| openpyxl | 3.1.5 | 3.1.x | pypi.org/project/openpyxl |
| pandas | 2.2.2 | 2.2.x | pypi.org/project/pandas |
| pillow | 10.4.0 | 11.1.0+ | pypi.org/project/Pillow |
| PyMuPDF | 1.25.1 | 1.25.5+ | pypi.org/project/PyMuPDF |
| pypdf | 4.3.1 | 6.2.0+ | pypi.org/project/pypdf |
| pdfservices-sdk | 4.1.0 | 4.1.0 | pypi.org/project/pdfservices-sdk |

Notes:

  • boto3 updates frequently with new AWS service support
  • Pillow has a major version update available (10.x → 11.x)
  • pypdf has major version updates available
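The "major version" labels in these notes follow from comparing the leading component of the current and latest versions. A minimal sketch of that classification, assuming plain dotted numeric versions (the helper names `parse_version` and `bump_kind` are illustrative and not part of the project):

```python
def parse_version(text):
    """Turn '10.4.0' into (10, 4, 0), padded to three components.

    Only plain dotted integers are handled; suffixes such as '+' or 'rc1'
    are outside the scope of this sketch.
    """
    parts = [int(part) for part in text.split(".")]
    return tuple((parts + [0, 0, 0])[:3])


def bump_kind(current, latest):
    """Classify the gap between two versions as major, minor, patch, or none."""
    cur, new = parse_version(current), parse_version(latest)
    if new <= cur:
        return "none"
    if new[0] != cur[0]:
        return "major"
    if new[1] != cur[1]:
        return "minor"
    return "patch"


# Versions taken from the docker_autotag table above (without the '+' suffix).
for name, current, latest in [
    ("pillow", "10.4.0", "11.1.0"),
    ("pypdf", "4.3.1", "6.2.0"),
    ("PyMuPDF", "1.25.1", "1.25.5"),
    ("boto3", "1.34.160", "1.40.0"),
]:
    print(f"{name}: {current} -> {latest} is a {bump_kind(current, latest)} update")
```

Major updates are the ones most likely to need code changes, so they are the natural candidates for dedicated testing.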

PDF-to-HTML Solution

File: pdf2html/requirements.txt

| Package | Current Version | Latest Version | PyPI Link |
|---|---|---|---|
| Pillow | >=11.1.0 | 11.1.0+ | pypi.org/project/Pillow |
| boto3 | >=1.37.11 | 1.40.0+ | pypi.org/project/boto3 |
| botocore | >=1.37.11 | 1.41.0+ | pypi.org/project/botocore |
| beautifulsoup4 | >=4.11.0 | 4.12.x | pypi.org/project/beautifulsoup4 |
| bs4 | >=0.0.1 | 0.0.2 | pypi.org/project/bs4 |
| pydantic | >=2.10.6 | 2.10.x | pypi.org/project/pydantic |
| defusedcsv | >=2.0.0 | 2.0.0 | pypi.org/project/defusedcsv |
| Flask | >=3.1.0 | 3.1.x | pypi.org/project/Flask |
| PyYaml | >=6.0.0 | 6.0.2 | pypi.org/project/PyYAML |
| pypdf | >=5.4.0 | 6.2.0+ | pypi.org/project/pypdf |

Notes: This solution uses minimum version constraints (>=) which is good for flexibility but requires testing with newer versions.
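To illustrate why a `>=` constraint needs testing against newer releases, here is a minimal sketch of how a minimum-only constraint behaves. It assumes requirement lines of the simple form `name>=X.Y.Z`; the helper names are illustrative, and the `7.0.0` candidate is a hypothetical future release, not a real version:

```python
def parse_version(text):
    """Turn '5.4.0' into (5, 4, 0), padded to three components."""
    parts = [int(part) for part in text.split(".")]
    return tuple((parts + [0, 0, 0])[:3])


def parse_minimum(requirement):
    """Split a line such as 'pypdf>=5.4.0' into ('pypdf', (5, 4, 0))."""
    name, _, minimum = requirement.partition(">=")
    return name.strip(), parse_version(minimum.strip())


def allowed(requirement, candidate):
    """Return True if the candidate version satisfies the minimum constraint."""
    _, minimum = parse_minimum(requirement)
    return parse_version(candidate) >= minimum


requirement = "pypdf>=5.4.0"
for candidate in ["4.3.1", "5.4.0", "6.2.0", "7.0.0"]:  # 7.0.0 is hypothetical
    print(candidate, allowed(requirement, candidate))
```

Because no upper bound is given, every later major version is accepted at install time, including ones with breaking changes.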


JavaScript/Node.js Dependencies

Docker Container: javascript_docker

File: javascript_docker/package.json

| Package | Current Version | Latest Version | NPM Link |
|---|---|---|---|
| @aws-sdk/client-bedrock-runtime | ^3.632.0 | 3.734.0+ | npmjs.com/package/@aws-sdk/client-bedrock-runtime |
| @aws-sdk/client-s3 | ^3.633.0 | 3.734.0+ | npmjs.com/package/@aws-sdk/client-s3 |
| @aws-sdk/util-buffer-from | ^3.374.0 | 3.x | npmjs.com/package/@aws-sdk/util-buffer-from |
| aws-sdk | ^2.1678.0 | 2.x (EOL Sept 2025) | npmjs.com/package/aws-sdk |
| better-sqlite3 | ^11.8.1 | 11.x | npmjs.com/package/better-sqlite3 |
| pdf-lib | ^1.17.1 | 1.17.1 | npmjs.com/package/pdf-lib |
| pdfjs-dist | ^4.6.82 | 5.3.x | npmjs.com/package/pdfjs-dist |
| winston | ^3.14.2 | 3.x | npmjs.com/package/winston |

⚠️ CRITICAL NOTES:

  • aws-sdk v2 is in maintenance mode and reaches end-of-support on September 8, 2025
  • Recommend migrating to @aws-sdk v3 packages only
  • pdfjs-dist has a major version update available (4.x → 5.x)

PDF-to-HTML CDK Deployment

File: pdf2html/cdk/package.json

| Package | Current Version | Latest Version | NPM Link |
|---|---|---|---|
| aws-cdk-lib | ^2.0.0 | 2.215.0+ | npmjs.com/package/aws-cdk-lib |
| constructs | ^10.0.0 | 10.x | npmjs.com/package/constructs |

Notes: Using caret (^) ranges allows automatic minor/patch updates. Consider pinning to specific versions for production stability.
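The caret rule can be sketched as follows. This is a simplified model of npm's caret semantics for full three-part versions without prerelease tags, not npm's actual implementation; the helper names are illustrative:

```python
def caret_bounds(spec):
    """Return (lower, upper) for a caret range such as '^2.0.0'.

    The upper bound is exclusive. A caret range allows changes that do not
    modify the left-most non-zero component.
    """
    major, minor, patch = (int(part) for part in spec.lstrip("^").split("."))
    lower = (major, minor, patch)
    if major > 0:
        upper = (major + 1, 0, 0)
    elif minor > 0:
        upper = (0, minor + 1, 0)
    else:
        upper = (0, 0, patch + 1)
    return lower, upper


def caret_allows(spec, version):
    """Return True if a full X.Y.Z version falls inside the caret range."""
    lower, upper = caret_bounds(spec)
    candidate = tuple(int(part) for part in version.split("."))
    return lower <= candidate < upper


print(caret_allows("^2.0.0", "2.215.0"))  # True: same major version
print(caret_allows("^2.0.0", "3.0.0"))    # False: next major is excluded
print(caret_allows("^4.6.82", "5.3.0"))   # False: pdfjs-dist 5.x needs a manual bump
```

Under this model, `^2.0.0` spans every 2.x release, so two installs made months apart can resolve to very different versions unless a lockfile or an exact pin is used.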


Java Dependencies

Lambda: java_lambda/PDFMergerLambda

File: lambda/java_lambda/PDFMergerLambda/pom.xml

| Group ID | Artifact ID | Current Version | Latest Version | Maven Link |
|---|---|---|---|---|
| software.amazon.awssdk | cloudwatch | 2.20.0 | 2.x | mvnrepository.com/artifact/software.amazon.awssdk/cloudwatch |
| org.json | json | 20230227 | 20240303+ | mvnrepository.com/artifact/org.json/json |
| ch.qos.logback | logback-classic | 1.2.11 | 1.5.x | mvnrepository.com/artifact/ch.qos.logback/logback-classic |
| org.slf4j | slf4j-api | 1.7.32 | 2.0.x | mvnrepository.com/artifact/org.slf4j/slf4j-api |
| com.amazonaws | aws-java-sdk-logs | 1.12.400 | 1.12.x | mvnrepository.com/artifact/com.amazonaws/aws-java-sdk-logs |
| com.amazonaws | aws-java-sdk-cloudwatch | 1.12.400 | 1.12.x | mvnrepository.com/artifact/com.amazonaws/aws-java-sdk-cloudwatch |
| junit | junit | 3.8.1 | 4.13.2 / 5.x | mvnrepository.com/artifact/junit/junit |
| org.apache.pdfbox | pdfbox | 2.0.27 | 3.0.3+ | mvnrepository.com/artifact/org.apache.pdfbox/pdfbox |
| com.amazonaws | aws-java-sdk-s3 | 1.12.537 | 1.12.x | mvnrepository.com/artifact/com.amazonaws/aws-java-sdk-s3 |
| com.amazonaws | aws-lambda-java-core | 1.2.2 | 1.2.x | mvnrepository.com/artifact/com.amazonaws/aws-lambda-java-core |
| com.amazonaws | aws-lambda-java-events | 3.11.1 | 3.x | mvnrepository.com/artifact/com.amazonaws/aws-lambda-java-events |

⚠️ CRITICAL NOTES:

  • JUnit 3.8.1 is extremely outdated (released 2002). Upgrade to JUnit 5.x
  • Apache PDFBox has a major version update (2.0.27 → 3.0.3+)
  • logback-classic has security updates in 1.5.x
  • slf4j-api has a major version update (1.7.x → 2.0.x)
  • Consider migrating from AWS SDK v1 (com.amazonaws) to v2 (software.amazon.awssdk)

Maven Build Plugins

| Group ID | Artifact ID | Current Version | Latest Version |
|---|---|---|---|
| org.apache.maven.plugins | maven-shade-plugin | 3.2.4 | 3.6.x |
| com.github.edwgiz | maven-shade-plugin.log4j2-cachefile-transformer | 2.13.0 | 2.x |

Docker Base Images

Python-based Containers

| Container | Base Image | Current Version | Latest Available |
|---|---|---|---|
| docker_autotag | public.ecr.aws/docker/library/python | 3.12-slim | 3.12-slim |
| lambda/split_pdf | public.ecr.aws/lambda/python | 3.12 | 3.12 |
| lambda/add_title | public.ecr.aws/lambda/python | 3.12 | 3.12 |
| lambda/accessibility_checker_before_remidiation | public.ecr.aws/lambda/python | 3.12 | 3.12 |
| lambda/accessability_checker_after_remidiation | public.ecr.aws/lambda/python | 3.12 | 3.12 |
| pdf2html | public.ecr.aws/lambda/python | 3.12 | 3.12 |

Node.js-based Containers

| Container | Base Image | Current Version | Latest Available |
|---|---|---|---|
| javascript_docker | public.ecr.aws/docker/library/node | 20 | 22 LTS |

Notes: Node.js 22 is the current LTS version. Consider upgrading from Node 20.


Security & Compliance Considerations

High Priority Updates

  1. JavaScript: aws-sdk v2 EOL - Reaches end-of-support September 2025

  2. Java: JUnit 3.8.1 - Extremely outdated (20+ years old)

  3. Java: Apache PDFBox 2.0.27 - Major version behind

Medium Priority Updates

  1. Python: pypdf - Multiple major versions behind (4.3.1 → 6.2.0+)

    • Action: Test and upgrade to pypdf 6.x
    • Impact: Medium - Performance and feature improvements
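    • Files: lambda/split_pdf/requirements.txt and docker_autotag/requirements.txt (both pin 4.3.1); pdf2html/requirements.txt already requires >=5.4.0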
  2. Python: boto3 - Regular updates available

  3. Python: Pillow - Version inconsistency (docker_autotag pins 10.4.0; pdf2html requires >=11.1.0)

  4. Python: AWS CDK - Behind latest (2.147.2 → 2.196.0)

    • Action: Update to latest CDK version
    • Impact: Low - New CloudFormation features
    • Files: requirements.txt

Recommendations

Immediate Actions

  1. Plan aws-sdk v2 migration - Create migration plan before September 2025 EOL
  2. Update JUnit - Upgrade Java tests to JUnit 5
  3. Security audit - Run dependency vulnerability scans (npm audit, pip-audit, OWASP Dependency Check)

Short-term Actions (1-3 months)

  1. Standardize Python dependency versions - Ensure shared packages (pypdf, Pillow, PyMuPDF) use consistent versions across components
  2. Update pypdf - Test and migrate to pypdf 6.x for performance improvements
  3. Update Apache PDFBox - Evaluate PDFBox 3.x compatibility
  4. Update Node.js base image - Upgrade to Node 22 LTS
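As a starting point for the standardization step, the following sketch flags packages whose version specifiers differ between requirements files. The file contents are inlined and abbreviated from the tables in this issue; the `==` pin syntax is an assumption, since the issue lists only version numbers, and the helper names are illustrative:

```python
import re
from collections import defaultdict

# Abbreviated stand-ins for the real files; '==' pins are an assumption.
REQUIREMENTS = {
    "lambda/split_pdf/requirements.txt": "pypdf==4.3.1\n",
    "lambda/add_title/requirements.txt": "PyMuPDF==1.24.14\n",
    "lambda/accessibility_checker_before_remidiation/requirements.txt":
        "pdfservices-sdk==4.1.0\n",
    "docker_autotag/requirements.txt":
        "pillow==10.4.0\nPyMuPDF==1.25.1\npypdf==4.3.1\npdfservices-sdk==4.1.0\n",
    "pdf2html/requirements.txt": "Pillow>=11.1.0\npypdf>=5.4.0\n",
}

# Package name followed by whatever specifier text remains on the line.
LINE = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)\s*(.*?)\s*$")


def collect_specifiers(files):
    """Map lower-cased package name -> {file path: specifier string}."""
    found = defaultdict(dict)
    for path, text in files.items():
        for raw in text.splitlines():
            line = raw.split("#", 1)[0].strip()  # drop comments and blanks
            if not line:
                continue
            match = LINE.match(line)
            if match:
                found[match.group(1).lower()][path] = match.group(2) or "(unpinned)"
    return found


def inconsistent(files):
    """Return only the packages that carry more than one distinct specifier."""
    return {
        name: by_file
        for name, by_file in collect_specifiers(files).items()
        if len(set(by_file.values())) > 1
    }


for name, by_file in sorted(inconsistent(REQUIREMENTS).items()):
    print(name)
    for path, spec in sorted(by_file.items()):
        print(f"  {path}: {spec}")
```

In a real run, the dictionary would be replaced by reading the files from disk. Package names are compared case-insensitively so that `pillow` and `Pillow` are treated as the same package.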

Long-term Actions (3-6 months)

  1. Dependency automation - Implement Dependabot or Renovate for automated updates
  2. Version pinning strategy - Document policy for version constraints vs pinning
  3. Regular update cadence - Establish quarterly dependency review process

Dependency Management Tools

Recommended Tools

  • Python: pip-audit, safety, Dependabot
  • JavaScript: npm audit, snyk, Dependabot
  • Java: OWASP Dependency-Check, Snyk
  • Docker: Trivy, Snyk Container

GitHub Actions Integration

Consider adding automated dependency scanning to CI/CD pipeline:

  • Weekly dependency vulnerability scans
  • Automated PR creation for security updates
  • Dependency update notifications
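One way to cover the weekly scans and automated PRs is a Dependabot configuration. The sketch below generates a candidate `.github/dependabot.yml` from the folders cataloged in this issue. The directory list is an assumption based on the file paths above, and whether a Dockerfile sits directly in `docker_autotag` has not been verified:

```python
# (ecosystem, directory) pairs derived from the manifests listed in this issue.
TARGETS = [
    ("pip", "/"),
    ("pip", "/docker_autotag"),
    ("pip", "/pdf2html"),
    ("npm", "/javascript_docker"),
    ("npm", "/pdf2html/cdk"),
    ("maven", "/lambda/java_lambda/PDFMergerLambda"),
    ("docker", "/docker_autotag"),  # assumes the Dockerfile lives in this folder
]


def render_dependabot(targets, interval="weekly"):
    """Render a version 2 Dependabot config as YAML text."""
    lines = ["version: 2", "updates:"]
    for ecosystem, directory in targets:
        lines += [
            f'  - package-ecosystem: "{ecosystem}"',
            f'    directory: "{directory}"',
            "    schedule:",
            f'      interval: "{interval}"',
        ]
    return "\n".join(lines) + "\n"


# Review the output before committing it as .github/dependabot.yml.
print(render_dependabot(TARGETS))
```

Each Lambda folder with its own requirements.txt would need a separate `pip` entry; they are omitted here to keep the sketch short.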

Appendix: Quick Reference Links

Python Package Indexes

JavaScript Package Indexes

Java Package Indexes

AWS SDK Documentation
