Harvesting client improvements: configurable delay between GetRecord calls; a fix for a problem with long-running DataCite harvests#11486

Open
landreev wants to merge 13 commits into develop from 11473-harvesting-client-ratelimit

@landreev
Contributor

@landreev landreev commented May 12, 2025

What this PR does / why we need it:

This is based on a patch that I made a while ago for another Dataverse instance. It has since come in handy here at HDV, and it may benefit other instances as well.
The changes are quite straightforward.

From the accompanying release note:

A setting has been added for configuring sleep intervals between OAI calls for specific harvesting clients, making it possible to harvest without interruption from servers that enforce rate-limit policies. See the configuration guide for details. Additionally, this release fixes a problem with harvesting from DataCite OAI-PMH where initial, long-running harvests were failing on sets with large numbers of records.
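
To illustrate the idea in the release note, here is a minimal sketch of a per-client sleep between consecutive GetRecord calls. It is not the code from this PR: the class `HarvestDelaySketch`, the methods `delayMillisFor` and `harvest`, the client name `slowServer`, and the map-based configuration are all hypothetical stand-ins, and the actual setting name and format are described in the configuration guide.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch; none of these names come from the Dataverse code base.
final class HarvestDelaySketch {

    // Returns the delay configured for a client, or the default when the
    // client has no entry. Negative values are treated as "no delay".
    static long delayMillisFor(String clientName, Map<String, Long> perClientDelays, long defaultMillis) {
        Long configured = perClientDelays.get(clientName);
        long delay = (configured != null) ? configured : defaultMillis;
        return Math.max(0L, delay);
    }

    // Calls getRecord for each identifier in order, sleeping between
    // consecutive calls (but not after the last one). If the thread is
    // interrupted while sleeping, the interrupt flag is restored and the
    // records fetched so far are returned.
    static List<String> harvest(List<String> identifiers, Function<String, String> getRecord, long delayMillis) {
        List<String> records = new ArrayList<>();
        for (int i = 0; i < identifiers.size(); i++) {
            records.add(getRecord.apply(identifiers.get(i)));
            boolean moreToFetch = i < identifiers.size() - 1;
            if (delayMillis > 0 && moreToFetch) {
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }
        return records;
    }

    public static void main(String[] args) {
        // Assumed configuration: only "slowServer" gets a delay.
        Map<String, Long> delays = Map.of("slowServer", 50L);
        long delay = delayMillisFor("slowServer", delays, 0L);
        List<String> out = harvest(List.of("oai:a", "oai:b", "oai:c"), id -> "record for " + id, delay);
        System.out.println(out.size() + " records harvested with a " + delay + " ms delay");
    }
}
```

A client without an entry in the map falls back to the default (zero here), so harvests from servers without rate limits would run at full speed under this sketch.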

Which issue(s) this PR closes:

Special notes for your reviewer:

Suggestions on how to test this:

Does this PR introduce a user interface change? If mockups are available, please link/include them here:

Is there a release notes update needed for this change?:

Additional documentation:

…g client calls

(The only thing I want to add is an option to enable this setting for specific clients, similar to how ingest size limits can apply to all formats or to some specific formats only.) #11473
@coveralls

coveralls commented May 12, 2025

Coverage Status

coverage: 24.397% (-0.02%) from 24.414%
when pulling 5b87133 on 11473-harvesting-client-ratelimit
into f20e75a on develop.

Resolved merge conflicts in:
	src/main/java/edu/harvard/iq/dataverse/settings/SettingsServiceBean.java
	src/main/java/edu/harvard/iq/dataverse/util/SystemConfig.java
@landreev landreev changed the title from "adds a setting for configuring a delay between GetRecord calls in harvesting client" to "Harvesting client improvements: configurable delay between GetRecord calls; a fix for a problem with long-running DataCite harvests" Feb 26, 2026
@landreev
Contributor Author

Un-drafting this thing.

@landreev landreev marked this pull request as ready for review February 26, 2026 15:57
@landreev
Contributor Author

(will sync w/ develop shortly)

@scolapasta scolapasta moved this from Ready for Triage to Ready for Review ⏩ in IQSS Dataverse Project Feb 26, 2026
@scolapasta scolapasta added this to the 6.10 milestone Feb 26, 2026
@stevenwinship stevenwinship self-assigned this Feb 26, 2026
@stevenwinship stevenwinship moved this from Ready for Review ⏩ to In Review 🔎 in IQSS Dataverse Project Feb 26, 2026
@stevenwinship stevenwinship added the FY26 Sprint 18 (2026-02-25 - 2026-03-11) label Feb 26, 2026
@landreev
Contributor Author

Synced the branch with develop just now. There were no merge conflicts to resolve, however, contrary to what GitHub was saying. 🤔

@github-project-automation github-project-automation bot moved this from In Review 🔎 to Ready for QA ⏩ in IQSS Dataverse Project Feb 26, 2026
@stevenwinship stevenwinship removed their assignment Feb 26, 2026
@github-actions

📦 Pushed preview images as

ghcr.io/gdcc/dataverse:11473-harvesting-client-ratelimit
ghcr.io/gdcc/configbaker:11473-harvesting-client-ratelimit

🚢 See on GHCR. Use by referencing with full name as printed above, mind the registry name.

Labels

FY26 Sprint 18 (2026-02-25 - 2026-03-11)

Projects

Status: Ready for QA ⏩

Development

Successfully merging this pull request may close these issues.

Add a setting for configuring a rate limit in Harvesting Client (a limit on outgoing calls)

4 participants