Tvmaze metadata lookup failures #654
Comments
I have a solution for this bug. Once I figure out how to push my solution to my GitHub branch, the fix will be available.
Looking at your provided patch, this issue actually addresses four items:
Once you have considered this feedback, please open a pull request against mythtv/master
@rcrdnalor : Thanks for your feedback.
Bullet 2: I agree that assigning a default of 60 minutes was an ugly hack. I'll rework the code to eliminate it.
Bullet 4: It's true that some TV shows air multiple original episodes in one day, but that's rare. There are also cases where the TvMaze database provides an airdate but no timestamp, but that's very rare too. Only in the extremely rare case of both happening at once would there be multiple episodes that could match the recording. Nothing requires that there be only one episode per day: the script will collect more than one entry in 'matchesFound' and print them all out, and it's already designed to do this. If all you have is a date, and there are 5 episodes matching that date, then all 5 should be printed out.
If I follow your suggestion to break my solution down into smaller chunks, do I have to create a new Issue for each one? Or can there be four commits corresponding to one Issue, each adding a little bit more? Does each commit require its own pull request, or can 4 commits go in one pull request?
The Submitting Bug Fixes document says to "run the current unstable/development version", but that seems risky to me. I need MythTV to run reliably every day, so I'd rather base my system on fixes/32. How do you test your code on an unstable release? Do you have to run it on a different machine?
Pull requests should always be made against git/master. Just a quick question:
@rcrdnalor : Could you please clarify the process? When you said "open a pull request against mythtv/master ... which addresses above items in separate commits", does that mean I should have 4 separate commits against one issue (this one), or would I need to create 4 separate issues, with one commit against each one?
@rcrdnalor : Regarding your quick question, I first noticed the missing 'runtime' and 'airtime' fields in the TvMaze database when processing an episode of a show called "Everything's Trash". At the broadcast time of the show (7/27/22), and again 2 days later (7/29/22), the database had:
"id":2356087, ... "airdate":"2022-07-27", "airtime":"", "airstamp":"2022-07-27T16:00:00+00:00", "runtime":null
About a week later, I think, when I checked again, it showed:
"id":2356087, ... "airdate":"2022-07-27", "airtime":"22:00", "airstamp":"2022-07-28T02:00:00+00:00", "runtime":30
It appears that someone updated the database between these fetches. Since the metadata is fetched right after the recording has finished, an update a week later doesn't really help.
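The two snapshots line up once airtime is filled in: 22:00 on 2022-07-27 in US Eastern Daylight Time (UTC-4; where the show airs is an assumption here) is 02:00 UTC on 2022-07-28, which is the later airstamp. In the earlier snapshot airtime is empty, and its 16:00 UTC airstamp does not match the broadcast time given later. A minimal sketch, with the hypothetical helper `airstamp_from_local` (not from tvmaze.py), that derives a UTC timestamp only when airtime is present:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumption: the show airs in US Eastern Daylight Time (UTC-4) in July 2022.
EDT = timezone(timedelta(hours=-4))


def airstamp_from_local(airdate: str, airtime: str, tz: timezone) -> Optional[datetime]:
    """Combine TVmaze-style 'airdate' and 'airtime' strings into a UTC datetime.

    Returns None when 'airtime' is empty, because a date alone does not
    identify a broadcast time.
    """
    if not airtime:
        return None
    local = datetime.strptime(f"{airdate} {airtime}", "%Y-%m-%d %H:%M").replace(tzinfo=tz)
    return local.astimezone(timezone.utc)


# Later snapshot: airtime filled in.
print(airstamp_from_local("2022-07-27", "22:00", EDT).isoformat())  # 2022-07-28T02:00:00+00:00
# Earlier snapshot: airtime empty, so no usable timestamp.
print(airstamp_from_local("2022-07-27", "", EDT))  # None
```

Returning None, rather than a placeholder time, forces the caller to fall back to a date-based comparison instead of trusting an airstamp that may later change.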
Updating the solution to split it into smaller commits and to base it off of mythtv/master.
…ime field. If both are missing, avoid a crash by testing for a non-NULL episode duration. MythTV#654
… episode, and not the time. MythTV#654
…rageRuntime field. If both are missing, avoid a crash by testing for a non-NULL episode duration. MythTV#654" This reverts commit 82bad85. I've learned that I cannot push more than one commit from one branch.
Steve,
See https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue
Thank you for improving MythTV,
In rare cases, the TvMaze database provides null timestamp or runtime data fields. To make the metadata retrieval script more fault tolerant, null checks have been added for these values. When the timestamp field is null, don't use it to generate a datetime to use in a search for a matching episode. When the runtime duration field is null, the delta size is now set to zero minutes. The code which looks for an episode match has been updated to separate "on time" recordings from "a little late" recordings to accommodate the possible delta size of zero. Refs: MythTV#654
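A rough sketch of the match classification described above, assuming we have a recording start time plus a candidate episode's start time and runtime, either of which may be null. The helper `classify_match` is hypothetical, not taken from tvmaze.py. It illustrates why the "on time" case is checked separately: with a zero-minute delta, a single window test such as `start <= t < start + delta` would reject even an exact match.

```python
from datetime import datetime, timedelta
from typing import Optional


def classify_match(recording_start: datetime,
                   episode_start: Optional[datetime],
                   runtime_minutes: Optional[int]) -> Optional[str]:
    """Classify one candidate episode as 'on_time', 'late', or no match (None)."""
    if episode_start is None:
        # Null timestamp: there is no datetime to compare against.
        return None
    if recording_start == episode_start:
        # Checked on its own so an exact match still works when delta is zero.
        return "on_time"
    # Null runtime: the delta size becomes zero minutes.
    delta = timedelta(minutes=runtime_minutes or 0)
    if episode_start < recording_start < episode_start + delta:
        # The recording started a little after the episode did.
        return "late"
    return None


base = datetime(2022, 11, 24, 21, 0)
print(classify_match(base, base, None))                        # on_time
print(classify_match(base + timedelta(minutes=5), base, 30))   # late
print(classify_match(base + timedelta(minutes=5), base, None)) # None
```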
The script tvmaze/show.py had a line which retrieved the episode duration from the metadata provider, but then assigned it to num_episodes, which makes no sense. There is no code looking for num_episodes, but there is code looking for runtime, so the line has been fixed to assign to runtime. Using the TVmaze service for metadata lookup, there are rare cases where the 'runtime' field is empty, but 'averageRuntime' is not. This happened when looking up metadata for a TV show called "Everything's Trash". Note that the database entry for this show was subsequently updated to include the missing 'runtime' field. But on the night it was recorded, and for several days later, that field was null. Here's an excerpt of the original metadata for "Everything's Trash": "id":60518, "url":"https://www.tvmaze.com/shows/60518/everythings-trash", "name":"Everything's Trash", "type":"Scripted", "language":"English", "genres":[], "status":"Running", "runtime":null, "averageRuntime":30, To handle this type of scenario, when the 'runtime' field is null, read the duration from the 'averageRuntime' field. If both are null, then avoid trying to cast that value to an integer in tvmaze.py. Refs: MythTV#654
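A minimal sketch of the runtime fallback, assuming the show record has already been parsed from TVmaze JSON into a dict. `get_runtime` is a hypothetical name, not the function in show.py or tvmaze.py. Testing with `is None` rather than `or` keeps a value of 0 from silently falling through to 'averageRuntime', and returning None when both are null means the caller never reaches `int(None)`.

```python
from typing import Optional


def get_runtime(show: dict) -> Optional[int]:
    """Return a runtime in minutes, falling back to 'averageRuntime'.

    Returns None when both fields are null or absent.
    """
    value = show.get("runtime")
    if value is None:
        value = show.get("averageRuntime")
    return int(value) if value is not None else None


# Trimmed from the original "Everything's Trash" record.
show = {"id": 60518, "name": "Everything's Trash", "runtime": None, "averageRuntime": 30}
print(get_runtime(show))  # 30
```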
The tool pylint reported several unnecessary imports in tvmaze.py. These have been removed to make the script run a little bit faster. Refs: MythTV#654
@rcrdnalor : I've amended commit messages and have generated 3 pull requests: #658, #662, #663. The fourth commit will be slightly dependent on the 3rd one getting merged to master. I'll have to wait for that to happen before proceeding with generation of that commit and the corresponding pull request.
A recording of a broadcast of the first episode in season 16 of "Criminal Minds" on CBS led to a metadata lookup failure. The general syntax for this metadata lookup was:
tvmaze.py -N "Criminal Minds" "2022-11-24 11:00:00"
Investigation revealed that in recent years this show has normally been available only on a streaming service called Paramount+.
https://api.tvmaze.com/shows/81 ... "webChannel":{"id":107,"name":"Paramount+", "country":null,"officialSite": "https://www.paramountplus.com/"} ...
The null value for country caused the timezone retrieval code in tvmaze.py to fail. The code has been updated to handle a null country field.
During regression testing, another crash appeared:
tvmaze.py -N "The Masked Singer" "2022-11-24 19:00:00"
This show has distinct versions in many countries and timezones:
country='United States', timezone='America/New_York'
country='United Kingdom', timezone='Europe/London'
country='Australia', timezone='Australia/Sydney'
country='Belgium', timezone='Europe/Brussels'
country=None, show_tz=None (Japanese version)
country='Finland', timezone='Europe/Helsinki'
country='Germany', timezone='Europe/Busingen'
country='Israel', timezone='Asia/Jerusalem'
country='Mexico', timezone='America/Monterrey'
The null country field for the Japanese version is handled fine, but passing 'Asia/Jerusalem' to astimezone() caused it to throw an exception:
UnboundLocalError: local variable 'ttmfmt' referenced before assignment
This appears to be a bug in python3. To prevent it from crashing our script, UnboundLocalError has been added to the exception handler. Some of the debug print messages have also been updated to improve their format.
Refs: MythTV#654
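A hedged sketch of a null-tolerant timezone lookup. It assumes the TVmaze show JSON nests the timezone as network.country.timezone or webChannel.country.timezone, and that any level may be null, as in the Paramount+ excerpt above. `get_show_timezone` is a hypothetical helper, not the code in tvmaze.py, and the `"network": None` in the example is an assumption made for illustration.

```python
from typing import Optional


def get_show_timezone(show: dict) -> Optional[str]:
    """Return an IANA timezone name for a TVmaze show, or None if unknown."""
    for key in ("network", "webChannel"):
        channel = show.get(key) or {}           # the channel itself may be null
        country = channel.get("country") or {}  # and so may its country
        tz = country.get("timezone")
        if tz:
            return tz
    return None


# Shape modeled on the Paramount+ excerpt: webChannel with a null country.
criminal_minds = {
    "network": None,
    "webChannel": {"id": 107, "name": "Paramount+", "country": None,
                   "officialSite": "https://www.paramountplus.com/"},
}
print(get_show_timezone(criminal_minds))  # None
```

A None result leaves the caller to decide what to do (for example, skip the timezone conversion) instead of raising on a missing key.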
The UnboundLocalError is not necessarily a bug in python3; it seems to be in the MythTV version of dt.py. Here's the stack trace when show_tz = 'Asia/Jerusalem':
Looking at the definition:
the version 3 is not supported by
Please open an issue against the python bindings.
@rcrdnalor : Thanks. I've opened #672 for the issue. Should I change the comment to say UnboundLocalError is needed until version 3 of the timezone protocol is supported?
The previous commit added an exception catch for UnboundLocalError because regression testing revealed a failure in handling the Asia/Jerusalem timezone. This timezone uses the version 3 data format. A new issue MythTV#672 was opened to ask for support for version 3. Since that issue has now been fixed, we no longer need to catch UnboundLocalError. Refs: MythTV#654
When using tvmaze.py to retrieve Collection information, the runtime was not being displayed. This update adds it to the output. For example, the Collection information for the TV show "Hogan's Heroes" is now:
/usr/share/mythtv/metadata/Television/tvmaze.py -C 1475
<?xml version='1.0' encoding='UTF-8'?>
<metadata>
  <item>
    <title>Hogan's Heroes</title>
    ...
    <inetref>1475</inetref>
    <imdb>tt0058812</imdb>
    <collectionref>1475</collectionref>
    <language>en</language>
    <releasedate>1965-09-17</releasedate>
    <userrating>8.000000</userrating>
    <popularity>85.0</popularity>
    <year>1965</year>
    <runtime>30</runtime>
    ...
  </item>
</metadata>
Refs: MythTV#654
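For illustration only, a sketch of emitting an optional <runtime> element with Python's standard xml.etree.ElementTree. tvmaze.py produces its XML through its own code path, so `build_item` and its parameters are hypothetical; the point shown is that the element is written only when a runtime value is known.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def build_item(title: str, inetref: int, runtime: Optional[int] = None) -> ET.Element:
    """Build a <metadata><item> tree, adding <runtime> only when it is known."""
    metadata = ET.Element("metadata")
    item = ET.SubElement(metadata, "item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "inetref").text = str(inetref)
    if runtime is not None:
        ET.SubElement(item, "runtime").text = str(runtime)
    return metadata


print(ET.tostring(build_item("Hogan's Heroes", 1475, 30), encoding="unicode"))
# <metadata><item><title>Hogan's Heroes</title><inetref>1475</inetref><runtime>30</runtime></item></metadata>
```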
Sorry, I didn't mean to jump in here. I was looking at the list of pull requests and didn't realize they were all connected to this issue. I hope I haven't caused a problem.
@linuxdude42 : No apology necessary. I've been waiting a long time for any progress on getting these changes merged. Thank you very much! |
Sometimes, the TvMaze database specifies only a date, not a time, for an episode or set of episodes. This is most common for shows that originate on a webChannel. In such a case, the 'Find Episode by Timestamp' syntax fails to find an exact or close match. For example:
tvmaze.py -N "Criminal Minds" "2022-11-24 21:00:00"
For this episode, the TvMaze database has airtime = (empty) and airdate = 2022-11-24.
With this code update, we detect when no airtime is specified in the database and apply a match-by-date behavior. The match-by-date results are used only as a last resort: first we select exact timestamp matches; if there are none of those, we select close timestamp matches; if there are none of those either, we select date-only matches. For example:
tvmaze.py -N "Criminal Minds" "2022-11-24 21:00:00"
<?xml version='1.0' encoding='UTF-8'?>
<metadata>
  <item>
    <title>Criminal Minds</title>
    <subtitle>Just Getting Started</subtitle>
    ...
    <season>16</season>
    <episode>1</episode>
    <inetref>81</inetref>
    ...
  </item>
  <item>
    <title>Criminal Minds</title>
    <subtitle>Sicarius</subtitle>
    ...
    <season>16</season>
    <episode>2</episode>
    <inetref>81</inetref>
    ...
  </item>
</metadata>
Note that 2 episodes were released on 2022-11-24, so both appear in the output.
In addition, 'runtime' is now displayed for the -M invocation option. For example:
tvmaze.py -M "Fire Country"
<?xml version='1.0' encoding='UTF-8'?>
<metadata>
  <item>
    <title>Fire Country</title>
    ...
    <inetref>60339</inetref>
    <collectionref>60339</collectionref>
    <language>en</language>
    <releasedate>2022-10-07</releasedate>
    <userrating>6.300000</userrating>
    <popularity>99.0</popularity>
    <year>2022</year>
    <runtime>60</runtime>
    ...
  </item>
</metadata>
Resolves MythTV#654
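A sketch of the match precedence described above: exact timestamp matches first, then close matches, then date-only matches as a last resort. Each episode is assumed to be a dict with a precomputed `start` (a datetime, or None when airtime is empty), TVmaze's `airdate` string, and `runtime` in minutes; the `start`, `runtime` and `subtitle` keys, the helper `select_matches`, and the null runtimes in the example data are hypothetical. Because every date-only match in the last tier is kept, two episodes released on the same day both come back.

```python
from datetime import datetime, timedelta
from typing import List


def select_matches(recording_start: datetime, episodes: List[dict]) -> List[dict]:
    """Return the best non-empty tier of episode matches for a recording."""
    exact, close, by_date = [], [], []
    rec_date = recording_start.date().isoformat()  # e.g. "2022-11-24"
    for ep in episodes:
        start = ep.get("start")
        if start is None:
            # No airtime in the database: compare dates only.
            if ep.get("airdate") == rec_date:
                by_date.append(ep)
            continue
        # A null runtime gives a zero-minute window (exact matches only).
        delta = timedelta(minutes=ep.get("runtime") or 0)
        if start == recording_start:
            exact.append(ep)
        elif start < recording_start < start + delta:
            close.append(ep)
    # Prefer exact, then close, then date-only matches.
    return exact or close or by_date


episodes = [
    {"subtitle": "Just Getting Started", "airdate": "2022-11-24", "start": None, "runtime": None},
    {"subtitle": "Sicarius", "airdate": "2022-11-24", "start": None, "runtime": None},
]
matches = select_matches(datetime(2022, 11, 24, 21, 0), episodes)
print([m["subtitle"] for m in matches])  # ['Just Getting Started', 'Sicarius']
```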
Platform: All
MythTV version: fixes/32
Package version:
Component: metadatalookup
What steps will reproduce the bug?
Using the TVmaze service for metadata lookup, there are rare cases where the 'runtime' field is empty, but 'averageRuntime' is not. This happened when looking up metadata for a TV show called "Everything's Trash". Note that the database entry for this show was subsequently updated to include the missing 'runtime' field. But on the night it was recorded, that field was empty.
For the same show, the early contents of the database entries had 'airdate' populated, but no 'airtime'. I'm guessing that at the time the data was populated, the broadcast time had not been decided yet. The missing 'airtime' caused the specific episode lookup to fail.
How often does it reproduce? Is there a required condition?
It's rare, but when it happens, it causes metadata lookups to fail.
What is the expected behaviour?
The lookup scripts should be more flexible to handle these sorts of issues.
If the 'runtime' field is empty, try using the 'averageRuntime' field. If the 'averageRuntime' field is also empty, set a default runtime length. If the 'airtime' field is empty, look for an episode which matches the 'airdate' field.
What do you see instead?
Additional information
Here's an example of the original data for "Everything's Trash":
{"id":60518,
"url":"https://www.tvmaze.com/shows/60518/everythings-trash",
"name":"Everything's Trash",
"type":"Scripted",
"language":"English",
"genres":[],
"status":"Running",
"runtime":null,
"averageRuntime":30,
"premiered":"2022-07-13",
"ended":null,
"officialSite":null,
"schedule": {"time":"","days":[]},
"rating": {"average":null},
"weight":94,