Tvmaze metadata lookup failures #654

SteveErl · 2022-10-24T22:03:28Z

Platform: All
MythTV version: fixes/32
Package version:
Component: metadatalookup

What steps will reproduce the bug?

Using the TVmaze service for metadata lookup, there are rare cases where the 'runtime' field is empty, but 'averageRuntime' is not. This happened when looking up metadata for a TV show called "Everything's Trash". Note that the database entry for this show was subsequently updated to include the missing 'runtime' field. But on the night it was recorded, that field was empty.

For the same show, the early contents of the database entries had 'airdate' populated, but no 'airtime'. I'm guessing that at the time the data was populated, the broadcast time had not been decided yet. The missing 'airtime' caused the specific episode lookup to fail.

How often does it reproduce? Is there a required condition?

It's rare, but when it happens, it causes metadata lookups to fail.

What is the expected behaviour?

The lookup scripts should be more flexible to handle these sorts of issues.
If the 'runtime' field is empty, try using the 'averageRuntime' field. If the 'averageRuntime' field is also empty, set a default runtime length. If the 'airtime' field is empty, look for an episode which matches the 'airdate' field.

What do you see instead?

Additional information

Here's an example of the original data for "Everything's Trash":

{"id":60518,
"url":"https://www.tvmaze.com/shows/60518/everythings-trash",
"name":"Everything's Trash",
"type":"Scripted",
"language":"English",
"genres":[],
"status":"Running",
"runtime":null,
"averageRuntime":30,
"premiered":"2022-07-13",
"ended":null,
"officialSite":null,
"schedule": {"time":"","days":[]},
"rating": {"average":null},
"weight":94,

SteveErl · 2022-10-24T22:34:31Z

I have a solution for this bug. Once I figure out how to push my solution to my GitHub branch, the fix will be available.

…lookup failures MythTV#654

rcrdnalor · 2022-10-30T18:57:22Z

Looking at your provided patch, this issue actually addresses four items:

Cleanup python imports of not needed modules
No functional change.
Fall back on average runtime if the the episode's runtime is not provided
I am not sure what improvement is behind this change. I prefer to not
provide inaccurate/guessed data if no data are available, e.g.: the
hardcoded 60 mins for a duration which is not provided by tvmaze.
Add robustness when fetching a non existent timestamp for an episode:
Do not interrupt iteration of all found episodes matching a certain timestamp
if one episode does not provide accurate data.
Workaround when the grabber tvmaze.py is not providing correct data for
the timestamp of an aired episode: Use aired date instead of aired timestamp.
IMO, this workaround only fits to your actual use-case: 'record one episode a day'.
What happens if a user records multiple episodes a day and is using the
'timestamp' method to fetch actual episode data and tvmaze does not - yet -
provide current timestamps?
I assume the user will get confused because he does not know that this kind
of workaround exists.

Once you considered this feedback, please open a pull request against mythtv/master
on git which addresses above items in separately commits and explain the need for change.
See https://www.mythtv.org/wiki/Development_guide
and https://www.mythtv.org/wiki/Submitting_Bug_Fixes
for details.
Writing detailed pull requests against current mythtv/master allows other
developers and users to efficiently review them and incorporate the changes.

SteveErl · 2022-11-01T03:16:17Z

@rcrdnalor : Thanks for your feedback.

Bullet 2 - I agree that assigning a default of 60 minutes was an ugly hack. I'll rework the code to eliminate it.

Bullet 4 - It's true that some TV shows air multiple original broadcast episodes in one day, but that's rare. There are cases where the TvMaze database provides an airdate but doesn't provide a timestamp, but that's very rare. So, in the extremely rare case of both of these happening simultaneously, there would be multiple episodes which would be possible matches for the recording. There's no demand that there be only one episode per day. The script will have more than one 'matchesFound', and will print them all out. It's already designed to do this. If all you have is a date, and there are 5 episodes matching that date, then all 5 should be printed out.

If I follow your suggestion to break down my solution into smaller chunks, do I have to create a new Issue for each one? Or can there be four commits corresponding to one Issue, with each one adding a little bit more? Does each commit require its own pull request, or can 4 commits go in one pull request?

The Submitting Bug Fixes document says to "run the current unstable/development version" but that seems dangerous to me. I need my MythTv to run in a stable fashion every day, so I'd rather have my system based on fixes/32. How can you test your code on an unstable release? Do you have to run it on a different machine?
The Submitting Bug Fixes document also suggests using Trac, but isn't that deprecated now? I'm under the impression that the current process is to use GitHub to document bugs and features, but it's frustrating that I cannot see how to assign myself an issue which I've created.

rcrdnalor · 2022-11-02T20:26:20Z

Writing pull requests should be always against git/master.
If you use *buntu then this may help to install master for testing:
https://github.com/MythTV/packaging/tree/master/deb-light
For the separation of the bullets, I think I will do item 1 and item 3, anyway.
Please focus on items 2 & 4.

Just a quick question:
You said, that in 'rare cases' tvmaze does not provide early data.
If you fetch the metadata for that cases the next or the other day, does it succeed?

SteveErl · 2022-11-02T21:15:36Z

@rcrdnalor : Could you please clarify the process? When you said "open a pull request against mythtv/master ... which addresses above items in separate commits", does that mean I should have 4 separate commits against one issue (this one), or would I need to create 4 separate issues, with one commit against each one?
I've started porting my changes to git/master. So far, I've implemented bullet 1. Should I commit that change with the explanation for the need for the change in the commit comment, or does the need for the change have to be documented in its own issue?
Should there be one pull request for all 4 bullet items, or 4 pull requests for the 4 bullets?

SteveErl · 2022-11-03T03:53:42Z

@rcrdnalor : In regards to your quick question, I first noticed missing 'runtime' and 'airtime' fields in the TvMaze database when processing an episode of a show called "Everything's Trash". At the broadcast time of the show (7/27/22), and 2 days later (7/29/22), the database had:

"id":2356087, ... "airdate":"2022-07-27", "airtime":"","airstamp":"2022-07-27T16:00:00+00:00","runtime":null

I think about a week later, when I checked again, it showed

"id":2356087,... "airdate":"2022-07-27", "airtime":"22:00","airstamp":"2022-07-28T02:00:00+00:00","runtime":30

It appears that someone updated the database between these fetch times. Since the metadata is fetched right after the recording has finished, an update a week later doesn't really help.

SteveErl · 2022-11-04T01:44:04Z

Updating the solution to split it into smaller commit components and to base it off of mythtv/master.

…ime field. If both are missing, avoid a crash by testing for a non-NULL episode duration. MythTV#654

… episode, and not the time. MythTV#654

…ime field. If both are missing, avoid a crash by testing for a non-NULL episode duration. MythTV#654

…rageRuntime field. If both are missing, avoid a crash by testing for a non-NULL episode duration. MythTV#654" This reverts commit 82bad85. I've learned that I cannot push more than one commit from one branch.

rcrdnalor · 2022-11-12T20:30:27Z

Steve,
I see good progress here. Please keep in mind that the commit messages of your commits in your pull requests are empty.
Future reader of this source code will not see the motivation of the change and need to go down the hard way to identify the issue behind the pull request.
The text you wrote in the pull request message will not be included in the commit message when the pull request will be applied.

See https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue
BTW, the keyword "Refs #654" should work as well within your commit message of your local branch.

Thank you for improving MythTV,
regards,
Roland

In rare cases, the TvMaze database provides null timestamp or runtime data fields. To make the metadata retrieval script more fault tolerant, null checks have been added for these values. When the timestamp field is null, don't use it to generate a datetime to use in a search for a matching episode. When the runtime duration field is null, the delta size is now set to zero minutes. The code which looks for an episode match has been updated to separate "on time" recordings from "a little late" recordings to accommodate the possible delta size of zero. Refs: MythTV#654

The script tvmaze/show.py had a line which retrieved the episode duration from the metadata provider, but then assigned it to num_episodes, which makes no sense. There is no code looking for num_episodes, but there is code looking for runtime, so the line has been fixed to assign to runtime. Using the TVmaze service for metadata lookup, there are rare cases where the 'runtime' field is empty, but 'averageRuntime' is not. This happened when looking up metadata for a TV show called "Everything's Trash". Note that the database entry for this show was subsequently updated to include the missing 'runtime' field. But on the night it was recorded, and for several days later, that field was null. Here's an excerpt of the original metadata for "Everything's Trash": "id":60518, "url":"https://www.tvmaze.com/shows/60518/everythings-trash", "name":"Everything's Trash", "type":"Scripted", "language":"English", "genres":[], "status":"Running", "runtime":null, "averageRuntime":30, To handle this type of scenario, when the 'runtime' field is null, read the duration from the 'averageRuntime' field. If both are null, then avoid trying to cast that value to an integer in tvmaze.py. Refs: MythTV#654

The tool pylint reported several unnecessary includes in tvmaze.py. These have been removed to make the script run a little bit faster. Refs: MythTV#654

SteveErl · 2022-11-14T07:05:06Z

@rcrdnalor : I've amended commit messages and have generated 3 pull requests: #658, #662, #663. The fourth commit will be slightly dependent on the 3rd one getting merged to master. I'll have to wait for that to happen before proceeding with generation of that commit and the corresponding pull request.

A recording of a broadcast of the first episode in season 16 of "Criminal Minds" on CBS lead to a metadata lookup failure. The general syntax for this metadata lookup was: tvmaze.py -N "Criminal Minds" "2022-11-24 11:00:00" Investigation revealed that for recent years this show is normally only available on a streaming service called Paramount+. https://api.tvmaze.com/shows/81 ... "webChannel":{"id":107,"name":"Paramount+", "country":null,"officialSite": "https://www.paramountplus.com/"} ... The null value for Country caused the Timezone retrieval code in tvmaze.py to fail. The code has been updated to handle a null Country field. While regression testing, another crash appeared. tvmaze.py -N "The Masked Singer" "2022-11-24 19:00:00" This show has distinct versions in many countries and timezones. country='United States', timezone='America/New_York' country='United Kingdom', timezone='Europe/London' country='Australia', timezone='Australia/Sydney' country='Belgium', timezone='Europe/Brussels' country=None, show_tz =None (Japanese version) country='Finland', timezone='Europe/Helsinki' country='Germany', timezone='Europe/Busingen' country='Israel', timezone='Asia/Jerusalem' country='Mexico', timezone='America/Monterrey' The null Country field for the Japanese version is handled fine, but passing 'Asia/Jerusalem' to astimezone() caused it to throw an exception. UnboundLocalError: local variable 'ttmfmt' referenced before assignment This appears to be a bug in python3. To prevent it from crashing our script, UnboundLocalError has been added to the exception catcher. Some of the debug print messages have also been updated to improve their format. Refs: MythTV#654

SteveErl · 2022-11-30T07:15:59Z

The UnboundLocalError is not necessarily a bug in python3. It seems to be in the Mythtv version of dt.py. Here's the stack trace when show_tz = 'Asia/Jerusalem':

Traceback (most recent call last):
  File "programs/scripts/metadata/Television/tvmaze.py", line 767, in <module>
    main()
  File "programs/scripts/metadata/Television/tvmaze.py", line 744, in main
    buildNumbers(args, opts)
  File "programs/scripts/metadata/Television/tvmaze.py", line 235, in buildNumbers
    dtInTgtZone = dtInLocalZone.astimezone(posixtzinfo(show_tz))
  File "/usr/lib/python3/dist-packages/MythTV/utility/singleton.py", line 49, in __call__
    inst = type.__call__(cls, *args, **kwargs)
  File "/usr/lib/python3/dist-packages/MythTV/utility/dt.py", line 234, in __init__
    self._process(fd, version)
  File "/usr/lib/python3/dist-packages/MythTV/utility/dt.py", line 167, in _process
    t = unpack(ttmfmt, fd.read(calcsize(ttmfmt)))[0]
UnboundLocalError: local variable 'ttmfmt' referenced before assignment

rcrdnalor · 2022-11-30T19:00:43Z

Looking at the definition:

$ file /usr/share/zoneinfo/Asia/Jerusalem 
/usr/share/zoneinfo/Asia/Jerusalem: timezone data (fat), version 3, 9 gmt time flags, 9 std time flags, no leap seconds, 149 transition times, 9 local time types, 21 abbreviation chars

the version 3 is not supported by

mythtv/mythtv/bindings/python/MythTV/utility/dt.py

Line 119 in 00e4fc5

class posixtzinfo(basetzinfo, metaclass=InputSingleton):

Please open an issue against the python bindings.

SteveErl · 2022-11-30T22:23:06Z

@rcrdnalor : Thanks. I've opened #672 for the issue. Should I change the comment to say UnboundLocalError is needed until version 3 of the timezone protocol is supported?

The previous commit added an exception catch for UnboundLocalError because regression testing revealed a failure in handling the Asia/Jerusalem timezone. This timezone uses the version 3 data format. A new issue MythTV#672 was opened to ask for support for version 3. Since that issue has now been fixed, we no longer need to catch UnboundLocalError. Refs: MythTV#654

When using tvmaze.py to retrieve Collection information, the runtime information was not being displayed. This update adds it to output. For example, the Collection information for the TV show "Hogan's Heroes" is now: /usr/share/mythtv/metadata/Television/tvmaze.py -C 1475 <?xml version='1.0' encoding='UTF-8'?> <metadata> <item> <title>Hogan's Heroes</title> ... <inetref>1475</inetref> <imdb>tt0058812</imdb> <collectionref>1475</collectionref> <language>en</language> <releasedate>1965-09-17</releasedate> <userrating>8.000000</userrating> <popularity>85.0</popularity> <year>1965</year> <runtime>30</runtime> ... </item> </metadata> Refs: MythTV#654

The tool pylint reported several unnecessary includes in tvmaze.py. These have been removed to make the script run a little bit faster. Refs: #654

The script tvmaze/show.py had a line which retrieved the episode duration from the metadata provider, but then assigned it to num_episodes, which makes no sense. There is no code looking for num_episodes, but there is code looking for runtime, so the line has been fixed to assign to runtime. Using the TVmaze service for metadata lookup, there are rare cases where the 'runtime' field is empty, but 'averageRuntime' is not. This happened when looking up metadata for a TV show called "Everything's Trash". Note that the database entry for this show was subsequently updated to include the missing 'runtime' field. But on the night it was recorded, and for several days later, that field was null. Here's an excerpt of the original metadata for "Everything's Trash": "id":60518, "url":"https://www.tvmaze.com/shows/60518/everythings-trash", "name":"Everything's Trash", "type":"Scripted", "language":"English", "genres":[], "status":"Running", "runtime":null, "averageRuntime":30, To handle this type of scenario, when the 'runtime' field is null, read the duration from the 'averageRuntime' field. If both are null, then avoid trying to cast that value to an integer in tvmaze.py. Refs: #654

When using tvmaze.py to retrieve Collection information, the runtime information was not being displayed. This update adds it to output. For example, the Collection information for the TV show "Hogan's Heroes" is now: /usr/share/mythtv/metadata/Television/tvmaze.py -C 1475 <?xml version='1.0' encoding='UTF-8'?> <metadata> <item> <title>Hogan's Heroes</title> ... <inetref>1475</inetref> <imdb>tt0058812</imdb> <collectionref>1475</collectionref> <language>en</language> <releasedate>1965-09-17</releasedate> <userrating>8.000000</userrating> <popularity>85.0</popularity> <year>1965</year> <runtime>30</runtime> ... </item> </metadata> Refs: #654

In rare cases, the TvMaze database provides null timestamp or runtime data fields. To make the metadata retrieval script more fault tolerant, null checks have been added for these values. When the timestamp field is null, don't use it to generate a datetime to use in a search for a matching episode. When the runtime duration field is null, the delta size is now set to zero minutes. The code which looks for an episode match has been updated to separate "on time" recordings from "a little late" recordings to accommodate the possible delta size of zero. Refs: #654

A recording of a broadcast of the first episode in season 16 of "Criminal Minds" on CBS lead to a metadata lookup failure. The general syntax for this metadata lookup was: tvmaze.py -N "Criminal Minds" "2022-11-24 11:00:00" Investigation revealed that for recent years this show is normally only available on a streaming service called Paramount+. https://api.tvmaze.com/shows/81 ... "webChannel":{"id":107,"name":"Paramount+", "country":null,"officialSite": "https://www.paramountplus.com/"} ... The null value for Country caused the Timezone retrieval code in tvmaze.py to fail. The code has been updated to handle a null Country field. While regression testing, another crash appeared. tvmaze.py -N "The Masked Singer" "2022-11-24 19:00:00" This show has distinct versions in many countries and timezones. country='United States', timezone='America/New_York' country='United Kingdom', timezone='Europe/London' country='Australia', timezone='Australia/Sydney' country='Belgium', timezone='Europe/Brussels' country=None, show_tz =None (Japanese version) country='Finland', timezone='Europe/Helsinki' country='Germany', timezone='Europe/Busingen' country='Israel', timezone='Asia/Jerusalem' country='Mexico', timezone='America/Monterrey' The null Country field for the Japanese version is handled fine, but passing 'Asia/Jerusalem' to astimezone() caused it to throw an exception. UnboundLocalError: local variable 'ttmfmt' referenced before assignment This appears to be a bug in python3. To prevent it from crashing our script, UnboundLocalError has been added to the exception catcher. Some of the debug print messages have also been updated to improve their format. Refs: #654

The previous commit added an exception catch for UnboundLocalError because regression testing revealed a failure in handling the Asia/Jerusalem timezone. This timezone uses the version 3 data format. A new issue #672 was opened to ask for support for version 3. Since that issue has now been fixed, we no longer need to catch UnboundLocalError. Refs: #654

linuxdude42 · 2023-01-22T21:22:02Z

Sorry, didn't meant to jump in here. I was looking at the list of pull requests, and didn't realize they were all connected to this issue. Hope I haven't caused a problem.

SteveErl · 2023-01-23T04:47:57Z

@linuxdude42 : No apology necessary. I've been waiting a long time for any progress on getting these changes merged. Thank you very much!

Sometimes, the TvMaze database only specifies a date and not a time for an episode or set of episodes. This is most common in webChannel originated shows. In such a case, the 'Find Episode by Timestamp' syntax, fails to find an exact or close match. For example: tvmaze.py -N "Criminal Minds" "2022-11-24 21:00:00" For this episode, in the TvMaze database airtime = airdate = 2022-11-24 With this code update, we detect when no airtime is specified in the database, and apply a match-by-date behavior. The match-by-date results are only used as a last resort. First we select exact timestamp matches. If there are none of those, we select close timestamp matches. If there are none of those, then we'll select date only matches. For example: tvmaze.py -N "Criminal Minds" "2022-11-24 21:00:00" <?xml version='1.0' encoding='UTF-8'?> <metadata> <item> <title>Criminal Minds</title> <subtitle>Just Getting Started</subtitle> ... <season>16</season> <episode>1</episode> <inetref>81</inetref> ... </item> <item> <title>Criminal Minds</title> <subtitle>Sicarius</subtitle> ... <season>16</season> <episode>2</episode> <inetref>81</inetref> ... </item> </metadata> Note that there were 2 episodes released on 2022-11-24, so both episodes appear in the output results. In addition, a display of 'runtime' has been added for the -M invocation option. For example tvmaze.py -M "Fire Country" <?xml version='1.0' encoding='UTF-8'?> <metadata> <item> <title>Fire Country</title> ... <inetref>60339</inetref> <collectionref>60339</collectionref> <language>en</language> <releasedate>2022-10-07</releasedate> <userrating>6.300000</userrating> <popularity>99.0</popularity> <year>2022</year> <runtime>60</runtime> ... </item> </metadata> Resolves MythTV#654

SteveErl added a commit to SteveErl/mythtv that referenced this issue Oct 25, 2022

Adding flexibility to TVmaze metadata lookup. Solves Tvmaze metadata …

155586a

…lookup failures MythTV#654

SteveErl added a commit to SteveErl/mythtv that referenced this issue Nov 4, 2022

Cleanup to remove unnecessary python imports MythTV#654

de496f8

SteveErl added a commit to SteveErl/mythtv that referenced this issue Nov 4, 2022

When a TvMaze episode runtime field is missing, try using averageRunt…

497dede

…ime field. If both are missing, avoid a crash by testing for a non-NULL episode duration. MythTV#654

SteveErl added a commit to SteveErl/mythtv that referenced this issue Nov 4, 2022

Add robustness when the timestamp is missing for an episode. MythTV#654

9da4396

SteveErl added a commit to SteveErl/mythtv that referenced this issue Nov 4, 2022

Add handling for rare case when TvMaze only specifies the date for an…

5a1a8bb

… episode, and not the time. MythTV#654

SteveErl mentioned this issue Nov 8, 2022

Remove unnecessary python imports in tvmaze.py #658

Merged

5 tasks

SteveErl added a commit to SteveErl/mythtv that referenced this issue Nov 8, 2022

When a TvMaze episode runtime field is missing, try using averageRunt…

82bad85

…ime field. If both are missing, avoid a crash by testing for a non-NULL episode duration. MythTV#654

This was referenced Nov 11, 2022

Fix setting of TvMaze show runtime #662

Merged

Handle null timestamp or runtime fields in TvMaze data #663

Merged

SteveErl added a commit to SteveErl/mythtv that referenced this issue Nov 13, 2022

Cleanup to remove unnecessary python imports

ec29ebc

The tool pylint reported several unnecessary includes in tvmaze.py. These have been removed to make the script run a little bit faster. Refs: MythTV#654

linuxdude42 pushed a commit that referenced this issue Jan 22, 2023

Cleanup to remove unnecessary python imports

5be6bda

The tool pylint reported several unnecessary includes in tvmaze.py. These have been removed to make the script run a little bit faster. Refs: #654

SteveErl linked a pull request Jan 24, 2023 that will close this issue

Match by date when no time specified in TvMaze DB #701

Open

5 tasks

rcrdnalor self-assigned this Feb 11, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Tvmaze metadata lookup failures #654

Tvmaze metadata lookup failures #654

SteveErl commented Oct 24, 2022

SteveErl commented Oct 24, 2022

rcrdnalor commented Oct 30, 2022

SteveErl commented Nov 1, 2022

rcrdnalor commented Nov 2, 2022

SteveErl commented Nov 2, 2022

SteveErl commented Nov 3, 2022 •

edited

Loading

SteveErl commented Nov 4, 2022

rcrdnalor commented Nov 12, 2022 •

edited

Loading

SteveErl commented Nov 14, 2022

SteveErl commented Nov 30, 2022

rcrdnalor commented Nov 30, 2022

SteveErl commented Nov 30, 2022

linuxdude42 commented Jan 22, 2023

SteveErl commented Jan 23, 2023

Tvmaze metadata lookup failures #654

Tvmaze metadata lookup failures #654

Comments

SteveErl commented Oct 24, 2022

What steps will reproduce the bug?

How often does it reproduce? Is there a required condition?

What is the expected behaviour?

What do you see instead?

Additional information

SteveErl commented Oct 24, 2022

rcrdnalor commented Oct 30, 2022

SteveErl commented Nov 1, 2022

rcrdnalor commented Nov 2, 2022

SteveErl commented Nov 2, 2022

SteveErl commented Nov 3, 2022 • edited Loading

SteveErl commented Nov 4, 2022

rcrdnalor commented Nov 12, 2022 • edited Loading

SteveErl commented Nov 14, 2022

SteveErl commented Nov 30, 2022

rcrdnalor commented Nov 30, 2022

SteveErl commented Nov 30, 2022

linuxdude42 commented Jan 22, 2023

SteveErl commented Jan 23, 2023

SteveErl commented Nov 3, 2022 •

edited

Loading

rcrdnalor commented Nov 12, 2022 •

edited

Loading