Commit 4590a0d

rss-bot: Support unordered RSS feeds.
This is done by splitting the logic into two loops: one that processes all the entries in the feed, and another that posts only the latest ones in chronological order. Instead of tracking new_hashes in memory while processing the feed, we now track new_entries, since we no longer record hashes for every entry, only for the ones that we post. Fixes #831.
Parent commit: 0deb6cd

1 file changed: +22, -27 lines
zulip/integrations/rss/rss-bot

@@ -241,7 +241,7 @@ for feed_url in feed_urls:
     except OSError:
         old_feed_hashes = {}

-    new_hashes: List[str] = []
+    new_entries: List[tuple[Any, str, float]] = []
     data = feedparser.parse(feed_url)
     feed_name: str = data.feed.title or feed_url
     # Safeguard to not process older entries in unordered feeds
@@ -251,32 +251,27 @@ for feed_url in feed_urls:
         entry_hash = compute_entry_hash(entry)
         entry_time, is_time_tagged = get_entry_time(entry)
         if (is_time_tagged and entry_time < entry_threshold) or entry_hash in old_feed_hashes:
-            # As a safeguard against misbehaving feeds, don't try to process
-            # entries older than some threshold.
             continue
-        if entry_hash in old_feed_hashes:
-            # We've already seen this. No need to process any older entries.
-            break
-        if not old_feed_hashes and len(new_hashes) >= opts.max_batch_size:
-            # On a first run, pick up the n (= opts.max_batch_size) most recent entries.
-            # An RSS feed has entries in reverse chronological order.
-            break
-
-        response: Dict[str, Any] = send_zulip(entry, feed_name)
-        if response["result"] != "success":
-            logger.error("Error processing %s", feed_url)
-            logger.error("%s", response)
-            if first_message:
-                # This is probably some fundamental problem like the stream not
-                # existing or something being misconfigured, so bail instead of
-                # getting the same error for every RSS entry.
-                log_error_and_exit("Failed to process first message")
-            # Go ahead and move on -- perhaps this entry is corrupt.
-        new_hashes.append(entry_hash)
-        first_message = False
+        new_entries.append((entry, entry_hash, entry_time))

-    with open(feed_hashes_file, "a") as f:
-        for hash in new_hashes:
-            f.write(hash + "\n")
+    # We process all entries to support unordered feeds,
+    # but post only the latest ones in chronological order.
+    sorted_entries = sorted(new_entries, key=lambda x: x[2])[-opts.max_batch_size :]

-    logger.info("Sent zulips for %d %s entries", len(new_hashes), feed_url)
+    with open(feed_hashes_file, "a") as f:
+        for entry_tuple in sorted_entries:
+            entry, entry_hash, _ = entry_tuple
+
+            response: Dict[str, Any] = send_zulip(entry, feed_name)
+            if response["result"] != "success":
+                logger.error("Error processing %s", feed_url)
+                logger.error("%s", response)
+                if not old_feed_hashes and entry_tuple == sorted_entries[0]:
+                    # This is probably some fundamental problem like the stream not
+                    # existing or something being misconfigured, so bail instead of
+                    # getting the same error for every RSS entry.
+                    log_error_and_exit("Failed to process first message")
+                # Go ahead and move on -- perhaps this entry is corrupt.
+            f.write(entry_hash + "\n")
+
+    logger.info("Sent zulips for %d %s entries", len(new_entries), feed_url)
