Discussion: timestamp inconsistencies #91

Open
SHAcollision opened this issue Aug 27, 2024 · 0 comments
Labels
enhancement New feature or request 🔮 nexus

Problem with Running a New Indexer: Timestamp Inconsistencies

Description:
We've identified a significant issue that arises when running a new indexer on an existing home server. Specifically, the problem centers on how timestamps (created_at and indexed_at) are handled and what they mean in different contexts. This can break functionality, particularly when generating timelines and sorting data.

Current Behavior:

  • When a new indexer runs, it generates new indexed_at timestamps for items such as tags, bookmarks, and follows.
  • These newly created timestamps represent the time when the data was indexed, not the actual time the event (e.g., a tag or bookmark creation) occurred on the home server. This is intentional, as we cannot trust that clients' timestamps are faithful or that their system clocks are correct.
  • This leads to a situation where all events indexed by a new indexer are timestamped as if they occurred at the same time, regardless of when they actually happened. This behavior can make it impossible to reconstruct a meaningful timeline for the user when running a new Nexus instance.

For example:

  • A new indexer running on a home server may assign the same timestamp to all bookmarks and tags that it indexes, as it is unaware of the original created_at time from the home server.
  • This results in the inability to generate meaningful views such as "last 24 hours", "last month", or similar time-based filters, since all items will appear as if they were created at the same time.
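
To make the failure concrete, here is a minimal sketch in Python. The `Bookmark` record, its field names, and the `within_window` helper are hypothetical illustrations, not Nexus code. It assumes a fresh indexer stamps every backfilled item with the time of its indexing run.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Bookmark:
    uri: str
    created_at: datetime  # when the user actually created it (home server side)
    indexed_at: datetime  # when this indexer processed it


def within_window(items, field, now, window):
    """Return the items whose `field` timestamp lies within `window` before `now`."""
    return [item for item in items if now - getattr(item, field) <= window]


now = datetime(2024, 8, 27, 12, 0, tzinfo=timezone.utc)
ages = [timedelta(hours=1), timedelta(days=10), timedelta(days=400)]

# A fresh indexer backfills three bookmarks of very different ages in one run,
# so all of them receive the same indexed_at.
bookmarks = [
    Bookmark(uri=f"bookmark-{i}", created_at=now - age, indexed_at=now)
    for i, age in enumerate(ages)
]

last_24h_by_indexed = within_window(bookmarks, "indexed_at", now, timedelta(hours=24))
last_24h_by_created = within_window(bookmarks, "created_at", now, timedelta(hours=24))

print(len(last_24h_by_indexed))  # every bookmark looks "recent"
print(len(last_24h_by_created))  # only the one-hour-old bookmark qualifies
```

Filtering on indexed_at puts the 400-day-old bookmark into the "last 24 hours" view, which is exactly the breakdown described above.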

Proposed Solutions:
To address this issue, we need to explore several potential solutions:

  1. Use the Home Server's created_at Timestamps:

    • When a new indexer runs, instead of generating new indexed_at timestamps, it could retrieve and use the original created_at timestamps from the home server (which we usually do not trust). This would ensure that the indexed data reflects the actual creation time of the items, allowing for accurate timelines and sorting.
  2. Improve Event Stream:

    • We could enhance the home server's event stream by including timestamps of when the data was stored, which would help the indexer stay up-to-date with the home server.
    • The event stream could help the indexer differentiate between historical and current events, reducing the risk of incorrect timestamping.
  3. Hybrid Approach with indexed_at and created_at:

    • Use a hybrid approach where both created_at and indexed_at timestamps are maintained. The created_at timestamp would be retrieved from the home server, while the indexed_at timestamp would represent when the indexer processed the event. The indexer would then prioritize created_at for user-facing timelines and sorting, while indexed_at could be used internally for tracking indexing operations.
  4. Detection of Indexer Sync State:

    • Implement logic to detect if the indexer is up-to-date with the home server. For example, the indexer could request a batch of events and compare the number of events received with the number it expected (for instance, the requested batch size). If it receives fewer events than expected, it would know it is up-to-date and can safely use indexed_at for new events. Otherwise, it would prioritize created_at.
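
One possible way to combine options 3 and 4 is sketched below in Python. Everything here is an assumption made for illustration: the `Event`, `IndexedItem`, and `Indexer` names, the `timeline_at` field, millisecond timestamps, and the rule that a batch shorter than the requested limit means the indexer has caught up. It is not the actual Nexus implementation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Event:
    uri: str
    created_at: Optional[int]  # client-supplied creation time in ms (untrusted)


@dataclass
class IndexedItem:
    uri: str
    created_at: Optional[int]  # kept as reported by the home server
    indexed_at: int            # when this indexer processed the event
    timeline_at: int           # value used for user-facing sorting


class Indexer:
    def __init__(self, batch_limit: int):
        self.batch_limit = batch_limit
        self.caught_up = False  # a new indexer starts out behind the home server

    def process_batch(self, events: List[Event], now_ms: int) -> List[IndexedItem]:
        items = []
        for event in events:
            # While backfilling, prefer the original creation time. Once caught
            # up, the indexing time is close to when the event happened.
            if self.caught_up or event.created_at is None:
                timeline_at = now_ms
            else:
                timeline_at = event.created_at
            items.append(IndexedItem(event.uri, event.created_at, now_ms, timeline_at))
        # A short batch suggests nothing is left to fetch; this state applies
        # to the next batch, since this one may still hold backlog.
        self.caught_up = len(events) < self.batch_limit
        return items


indexer = Indexer(batch_limit=3)
first = indexer.process_batch(
    [Event("a", 1_000), Event("b", 2_000), Event("c", 3_000)], now_ms=10_000
)
second = indexer.process_batch([Event("d", 4_000)], now_ms=10_500)
third = indexer.process_batch([Event("e", 999_999)], now_ms=11_000)

print([item.timeline_at for item in first + second + third])
```

In this sketch the first two batches are treated as backlog and sorted by created_at, while the third batch arrives after the indexer has caught up, so its implausible client timestamp is ignored in favor of indexed_at.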

Challenges:

  • System Clock Issues: One challenge with relying solely on created_at timestamps is the potential for incorrect timestamps if the client device has an incorrect system clock. This could result in inaccurate data.
  • Event Stream Complexity: Adding timestamps and maintaining a reliable event stream system adds complexity and requires careful consideration of edge cases, such as network interruptions and server downtime.
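
The system clock challenge can be illustrated with a small Python sketch. The `StreamEvent` type and the `stored_at` field are hypothetical; `stored_at` stands in for the home server storage timestamp suggested in proposed solution 2, and the numbers are made up for the example.

```python
from dataclasses import dataclass

DAY_MS = 24 * 60 * 60 * 1000
BASE_MS = 1_724_760_000_000  # an arbitrary instant in late August 2024


@dataclass
class StreamEvent:
    uri: str
    created_at: int  # client-reported time in ms, subject to clock skew
    stored_at: int   # hypothetical: time in ms when the home server stored the data


events = [
    # Written first, by a client with a correct clock.
    StreamEvent("post-1", created_at=BASE_MS, stored_at=BASE_MS + 50),
    # Written one minute later, by a client whose clock runs one day slow.
    StreamEvent(
        "post-2",
        created_at=BASE_MS + 60_000 - DAY_MS,
        stored_at=BASE_MS + 60_000 + 40,
    ),
]

by_created = [e.uri for e in sorted(events, key=lambda e: e.created_at)]
by_stored = [e.uri for e in sorted(events, key=lambda e: e.stored_at)]

print(by_created)  # post-2 sorts before post-1 because of the skewed clock
print(by_stored)   # order in which the home server stored the data
```

Sorting by the client value misorders the two posts, while a server-side storage time preserves the order in which the home server received them, at the cost of the extra event stream complexity noted above.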

Conclusion:
This issue presents a challenge that must be addressed to ensure the integrity and usability of the data indexed by new Nexus instances. We propose evaluating the solutions outlined above and determining the best approach to ensure consistent and meaningful timestamps across all indexed data.

@SHAcollision SHAcollision added enhancement New feature or request 🔮 nexus labels Aug 27, 2024
@SHAcollision SHAcollision changed the title Timestamp Inconsistencies Discussion: timestamp inconsistencies Nov 18, 2024