Several headlines for the same frontpage - Toronto Star #39

Open
SkelAlex opened this issue May 12, 2023 · 4 comments
Labels: bug (Something isn't working), help wanted (Extra attention is needed)

Comments

@SkelAlex
Collaborator

Please let me know if I should split this issue into three separate issues. The pattern I noticed is exactly the same in each case.

I do not know if this is specific to the Toronto Star or a more general problem.

In four instances (one of which I will cover in a separate issue because it raises other challenges), I found two or more headline Hublot elements for the same frontpage. Interestingly, some of these headlines include duplicates, and some frontpages from the past day were associated with only one headline, as they should be, so the issue does not happen every time. The headline elements associated with the same frontpage differ from each other in only a few ways: (1) the hashed_html is different; (2) the timestamps are different and non-overlapping (but, taken together, they match the frontpage); (3) the final numbers of the lake item are different.
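To check whether this pattern is specific to the Toronto Star or more general, one option is to group headline records by frontpage and list those with more than one distinct hashed_html. The sketch below is only an illustration: the record layout and the `frontpage_id` key are assumptions, not the project's actual schema, and the sample records are made up.

```python
from collections import defaultdict


def frontpages_with_multiple_headlines(headlines):
    """Return {frontpage_id: sorted distinct hashed_html values} for every
    frontpage associated with more than one distinct hashed_html."""
    hashes_by_frontpage = defaultdict(set)
    for record in headlines:
        # "frontpage_id" is a hypothetical key; "hashed_html" is the field
        # named in this issue.
        hashes_by_frontpage[record["frontpage_id"]].add(record["hashed_html"])
    return {
        frontpage: sorted(hashes)
        for frontpage, hashes in hashes_by_frontpage.items()
        if len(hashes) > 1
    }


# Made-up records, for illustration only.
sample = [
    {"frontpage_id": "fp-1", "hashed_html": "aaa"},
    {"frontpage_id": "fp-1", "hashed_html": "bbb"},
    {"frontpage_id": "fp-2", "hashed_html": "ccc"},
]
print(frontpages_with_multiple_headlines(sample))
```

Running the same grouping per media outlet would show whether other outlets are affected as well.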

Instance #1:

Instance #2:

Instance #3:

Screenshots from instance #3 only (since the pattern is very similar for instances #1 and #2; happy to add more screenshots if needed):
Screenshot, 2023-05-12 at 13 22 45
Screenshot, 2023-05-12 at 13 22 26
Screenshot, 2023-05-12 at 13 22 36

@ClementCadieux
Contributor

This is normal. When an article is modified, the frontpage remains the same but a different headline is pushed to the lake.

@SkelAlex
Collaborator Author

However, I had a look at some of these articles and saw absolutely no differences in the URL, title, or content.

@ClementCadieux
Contributor

ClementCadieux commented May 12, 2023

If the hashed_html values are different, then the code is finding differences in the content. This could be a bug, but the code is behaving as it should when it sees different content related to the same frontpage.
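A hash changes whenever any byte of the input changes, so two snapshots can look identical in URL, title, and body while still hashing differently. A minimal sketch for locating what changed between two stored HTML snapshots is shown below. It assumes the raw HTML of both versions is available and uses SHA-256 purely as a stand-in, since the hash function behind hashed_html is not stated in this thread; `html_hash` and `changed_lines` are hypothetical helper names.

```python
import difflib
import hashlib


def html_hash(html):
    # SHA-256 is an assumption; the project's actual hash may differ.
    return hashlib.sha256(html.encode("utf-8")).hexdigest()


def changed_lines(old_html, new_html):
    """Return only the removed ("- ") and added ("+ ") lines between two
    HTML snapshots, skipping unchanged lines and ndiff hint lines ("? ")."""
    diff = difflib.ndiff(old_html.splitlines(), new_html.splitlines())
    return [line for line in diff if line.startswith(("- ", "+ "))]


# Made-up snapshots: only the recommendation block differs.
old = "<h1>Title</h1>\n<p>Body</p>\n<aside>Next: A</aside>"
new = "<h1>Title</h1>\n<p>Body</p>\n<aside>Next: B</aside>"

print(html_hash(old) == html_hash(new))
for line in changed_lines(old, new):
    print(line)
```

If the reported lines fall outside the article itself, the extra headline elements likely come from peripheral page content rather than from edits to the article.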

@SkelAlex
Collaborator Author

Yes, that makes sense. Maybe the differences are in some other items, such as the "next recommended article"? If this doubles the amount of data found in radarplus/headlines, it might be worth looking into eventually. But maybe the steps required to fix it would be too burdensome to undertake.
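If the differences do turn out to come from peripheral items, one possible direction (not something the current code is known to do) would be to fingerprint only the fields of interest instead of the full HTML. The sketch below assumes the URL, title, and body text have already been extracted; `content_fingerprint` is a hypothetical helper, and SHA-256 is again only a stand-in.

```python
import hashlib
import json


def content_fingerprint(url, title, text):
    """Hash only the URL, title, and body text, after collapsing runs of
    whitespace, so changes elsewhere on the page do not affect the result."""
    normalized = {
        "url": url.strip(),
        "title": " ".join(title.split()),
        "text": " ".join(text.split()),
    }
    payload = json.dumps(normalized, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Made-up values: same article, different whitespace in the body.
a = content_fingerprint("https://example.com/story", "Some title", "Body  text\n")
b = content_fingerprint("https://example.com/story", "Some title", "Body text")
print(a == b)
```

The trade-off is that genuine edits outside the chosen fields would no longer produce a new headline element, so the field list would need to be chosen with care.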
