The TikTok URLs data source currently logs skipped posts as 'not found, may have been removed, skipping', but the reason a post is not found (404, private, etc.) can be interesting as well. We have a dataset with such unavailable posts, which we could use to detect the reason and log it more accurately.
(Thought: is the log the best place to store that information? Probably not, but there is no other obvious place to put it except the dataset itself, and that should include only the existing posts.)
I think there is an argument for including them in the dataset itself. After all, the URLs to those posts were specifically provided by the user in order to get a response (in a crawl dataset, e.g. Telegram, it may be more ambiguous). We could add a column (status/error/something like that) that describes whether the post was collected or, if not, what we know about why. That could then easily be used as a filter if, for example, you are only interested in available videos.
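A rough sketch of what such a column could look like. Everything here is an assumption for illustration: `describe_status`, `to_row`, the `status` labels and the `is_private` flag are made-up names, not existing 4CAT code, and how a private post is actually detected is left open.

```python
from typing import Optional


def describe_status(http_status: int, is_private: bool = False) -> str:
    """Turn what we know about a fetch attempt into a status label.

    Hypothetical helper: the labels are placeholders, not an agreed format.
    """
    if is_private:
        return "unavailable: private"
    if http_status == 200:
        return "collected"
    if http_status == 404:
        return "unavailable: not found (404)"
    return f"unavailable: unknown reason (HTTP {http_status})"


def to_row(url: str, http_status: int, is_private: bool = False,
           likes: Optional[int] = None) -> dict:
    """Build one dataset row; unavailable posts keep their URL and a status."""
    return {
        "url": url,
        "status": describe_status(http_status, is_private),
        "likes": likes,
    }


rows = [
    to_row("https://example.com/video/1", 200, likes=12),
    to_row("https://example.com/video/2", 404),
    to_row("https://example.com/video/3", 200, is_private=True),
]

# Filtering on the status column keeps only the posts that were collected
available = [row for row in rows if row["status"] == "collected"]
```

With a column like this, the dataset has one row per URL the user supplied, and anyone who only wants available videos can filter on `status`.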
The knock-on effects may need to be handled, as some processors are not prepared for this. I do not think anything would fail, but results could be misleading if you are, say, counting likes and it is not clear that certain posts do not have 0 likes but that their likes are simply unavailable. That could be handled by using MissingMappedField.
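To illustrate the difference between "0 likes" and "likes unavailable", here is a minimal sketch. `MissingValue` is a hypothetical stand-in written for this example; it is not the actual `MissingMappedField` class, whose interface may differ, and `total_likes` is likewise made up.

```python
class MissingValue:
    """Hypothetical marker for a field whose value is unknown (not zero)."""

    def __init__(self, placeholder=""):
        # Value to fall back on when the row is written out, e.g. to CSV
        self.placeholder = placeholder


def total_likes(rows):
    """Sum the known like counts and report how many rows had none available."""
    known = [row["likes"] for row in rows
             if not isinstance(row["likes"], MissingValue)]
    missing = len(rows) - len(known)
    return sum(known), missing


rows = [
    {"url": "https://example.com/video/1", "likes": 10},
    {"url": "https://example.com/video/2", "likes": 0},               # really 0 likes
    {"url": "https://example.com/video/3", "likes": MissingValue()},  # unavailable post
]

likes, unavailable = total_likes(rows)
```

A processor that checks for the marker can report the unavailable posts separately instead of silently counting them as zero.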