@Niloth-p
Contributor

@Niloth-p Niloth-p commented Dec 18, 2025

Fixes: #831

I've tested the basic functionality (unordered feeds + max batch size), but I haven't tested every case thoroughly (yet).

Self-review checklist
  • Self-reviewed the changes for clarity and maintainability
    (variable names, code reuse, readability, etc.).

Communicate decisions, questions, and potential concerns.

  • Explains differences from previous plans (e.g., issue description).
  • Highlights technical choices and bugs encountered.
  • Calls out remaining decisions and concerns.
  • Automated tests verify logic where appropriate.

Individual commits are ready for review (see commit discipline).

  • Each commit is a coherent idea.
  • Commit message(s) explain reasoning and motivation for changes.

Completed manual review and testing of the following:

  • Visual appearance of the changes.
  • Responsiveness and internationalization.
  • Strings and tooltips.
  • End-to-end functionality of buttons, interactions and flows.
  • Corner cases, error conditions, and easily imagined bugs.

rss-bot had 2 different feed_file variables:
1. The user-provided file with the list of feed URLs.
2. The file for each feed URL, to store the feed entries' hashes.

To clearly differentiate between them, the latter has been renamed to
feed_hashes_file.

Previously, it was being set for every entry.

Renamed the OLDNESS_THRESHOLD constant that was being used for the
same, to match the name of the newly added option.

We will be using entry time for sorting entries in the following
commits.
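
A minimal sketch of how an entry time could be derived for sorting, assuming feedparser-style entries that expose published_parsed or updated_parsed as UTC time tuples. The helper name entry_time and the None fallback are hypothetical and are not taken from this PR.

```python
import calendar
import time
from typing import Any, Mapping, Optional


def entry_time(entry: Mapping[str, Any]) -> Optional[float]:
    """Return the entry's timestamp in seconds since the epoch, or None.

    Hypothetical helper: tries the parsed publication time first, then the
    parsed update time, and gives up if neither is present.
    """
    for key in ("published_parsed", "updated_parsed"):
        parsed = entry.get(key)
        if parsed is not None:
            # The parsed tuples are assumed to be in UTC, so calendar.timegm
            # is the matching inverse (time.mktime would assume local time).
            return float(calendar.timegm(parsed))
    return None


# A plain dict stands in for a feed entry here.
example = {"published_parsed": time.gmtime(0)}
print(entry_time(example))
```

Returning None for entries without any time tag leaves the decision of where to place them in the sort order to the caller.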
@Niloth-p
Contributor Author

@Pritesh-30 Could you please help me manually test this PR?
We'd want to check that the following cases work fine:

  • oldest-first / newest-first / especially unordered feeds
  • with a max batch size (less than the number of new entries) and earliest entry age (less than the oldest unhashed entries)
  • Entries without time tags

And let me know if you can find any missing edge cases.

By splitting the logic into two loops - one for processing all the
entries in the feed, and another to post only the latest ones in
chronological order.

Instead of tracking new_hashes in memory while processing the feed file,
we track unhashed_entries now, since we will not be hashing all the
entries, only the ones that we post.

Fixes zulip#831.
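
The two-loop approach described in this commit message can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in the PR: the entry shape (a dict with "id" and an optional "time"), the function names, the id-based hash, and the choice to sort entries without a time as newest are all hypothetical.

```python
import hashlib
from typing import Any, Callable, Dict, List, Optional, Set, Tuple

Entry = Dict[str, Any]  # assumed shape: {"id": str, "time": Optional[float]}


def entry_hash(entry: Entry) -> str:
    # Hypothetical hashing scheme: hash the entry id only.
    return hashlib.sha256(str(entry["id"]).encode()).hexdigest()


def post_latest_entries(
    entries: List[Entry],
    old_hashes: Set[str],
    max_batch_size: Optional[int],
    earliest_time: float,
    post: Callable[[Entry], None],
) -> List[str]:
    """Post the newest unseen entries oldest-first; return the hashes posted."""
    # Loop 1: scan the whole feed. There is no early exit, so unordered
    # feeds are handled the same way as sorted ones.
    unhashed_entries: List[Tuple[Entry, str]] = []
    for entry in entries:
        digest = entry_hash(entry)
        if digest in old_hashes:
            continue  # already posted in an earlier run
        timestamp = entry.get("time")
        if timestamp is not None and timestamp < earliest_time:
            continue  # older than the configured cutoff
        unhashed_entries.append((entry, digest))

    # Sort chronologically. Assumption: entries without a time sort as newest.
    def sort_key(pair: Tuple[Entry, str]) -> float:
        timestamp = pair[0].get("time")
        return float("inf") if timestamp is None else timestamp

    unhashed_entries.sort(key=sort_key)
    if max_batch_size is not None:
        start = max(0, len(unhashed_entries) - max_batch_size)
        unhashed_entries = unhashed_entries[start:]

    # Loop 2: post the selected entries in chronological order, and hash
    # only what was actually posted so the rest is retried on the next run.
    new_hashes: List[str] = []
    for entry, digest in unhashed_entries:
        post(entry)
        new_hashes.append(digest)
    return new_hashes
```

Because only posted entries are hashed, entries cut off by the batch limit remain eligible on the next run, as long as they still fall within the age cutoff.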
@Pritesh-30
Collaborator

@Niloth-p Sure, I can help with the manual testing. I’ll run this locally against a few feeds, covering the cases you mentioned, and report back with the results once I’m done.

@Pritesh-30
Collaborator

Pritesh-30 commented Dec 18, 2025

@Niloth-p I manually tested using real RSS feeds. I used the following feeds for testing:

The entries were posted to the test here stream.

I tested for:

  • Newest-first feeds as well as unordered / oldest-first ones
  • Behavior when the number of new entries is greater than max-batch-size
  • earliest-entry-age by running against a fresh data directory
  • Multiple consecutive runs, to make sure entries aren’t duplicated and that entries not sent in
    earlier runs get posted, until the final run has zero entries left to send
  • Feeds where entries don’t have time tags

Observed behavior:

  • Unordered feeds are getting scanned properly without stopping
  • Hash persistence works correctly; no duplicate entries are sent
  • I verified that max-batch-size behaves correctly when there are more new entries than the batch limit — entries that aren’t sent aren’t hashed and show up in the next run.
  • Entry age filtering works as expected. I verified this by adjusting the cutoff — only entries within the specified age were sent.
  • Entries without time tags are handled correctly and don’t cause the run to fail.

One edge case I encountered:

  • The current script assumes that feed.title exists for every feed. Feeds without a feed.title cause the script to crash, so we have to add a fallback here. For my testing, I changed feed_name: str = data.feed.title or feed_url to
    feed_name: str = getattr(data.feed, "title", None) or feed_url
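
To illustrate why the getattr fallback helps, here is a small self-contained sketch. FeedInfo is a hypothetical stand-in for the parsed feed object (it is not feedparser itself), assuming that a missing key raises AttributeError on attribute access; feed_name_for is likewise an illustrative name.

```python
class FeedInfo(dict):
    """Hypothetical stand-in for a parsed feed: keys are readable as
    attributes, and a missing key raises AttributeError."""

    def __getattr__(self, key: str) -> object:
        try:
            return self[key]
        except KeyError:
            raise AttributeError(key) from None


def feed_name_for(feed: FeedInfo, feed_url: str) -> str:
    # getattr with a default covers a missing title; "or" covers an empty one.
    return getattr(feed, "title", None) or feed_url


url = "https://example.com/rss"  # placeholder URL for the example
print(feed_name_for(FeedInfo(title="Example Blog"), url))
print(feed_name_for(FeedInfo(title=""), url))
print(feed_name_for(FeedInfo(), url))
```

With this stand-in, plain attribute access such as FeedInfo().title raises AttributeError, which matches the crash described above.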

Overall, behavior looks correct from manual testing.

@Niloth-p
Contributor Author

Niloth-p commented Dec 18, 2025

Thank you for testing, @Pritesh-30!
Ah, is that so? I'd have expected feed.title to be provided by feedparser even if it doesn't exist. I'll have to check that. Which URL did that happen with?

(Marking this as draft because I want to discuss a couple of design decisions on CZO before proceeding.)

@Niloth-p Niloth-p marked this pull request as draft December 18, 2025 11:35
@Pritesh-30
Collaborator

@Niloth-p It happened with this feed: https://uptime.com/rss. I looked into it: feedparser handles the case where the title is present but empty, but not the case where the title tag is absent.
Thanks! Let me know if there’s anything else I can help with!

Successfully merging this pull request may close these issues.

RSS bot pulls the oldest entries and ignores the newer/latest entries on the feed.

3 participants