Scrape audible author ID #27

Open
buswedg opened this issue Nov 23, 2022 · 8 comments

@buswedg

buswedg commented Nov 23, 2022

Would it be possible, and worthwhile, to capture the Audible author identifier? I don't see it in the current query scope at the title level.

For context -- I'm working towards a folder structure where the first-level folders include the first author's full name, but I want to include an ID alongside the name to keep folders unique.

I'll admit I'm not overly familiar with Audible's metadata, but it does look like they have a 10-character alphanumeric string for all authors in their database.

For example, I see Michelle Obama has an id of B07B436TLF -- https://www.audible.com/author/Michelle-Obama/B07B436TLF
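As a minimal sketch of what that identifier looks like in practice, the snippet below pulls the trailing ID out of an author URL. It assumes author pages always follow the `/author/<slug>/<10-character ID>` shape seen in the example above, which has not been verified beyond that one URL; the function name `author_id_from_url` is made up for illustration.

```python
import re
from typing import Optional

# Assumption: author URLs look like /author/<slug>/<ID>, where <ID> is
# 10 uppercase letters or digits, as in the Michelle Obama example.
AUTHOR_URL = re.compile(r"/author/[^/]+/([A-Z0-9]{10})(?:[/?#]|$)")


def author_id_from_url(url: str) -> Optional[str]:
    """Return the author ID at the end of an Audible author URL, or None."""
    match = AUTHOR_URL.search(url)
    return match.group(1) if match else None


print(author_id_from_url("https://www.audible.com/author/Michelle-Obama/B07B436TLF"))
```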

@seanap
Owner

seanap commented Nov 23, 2022

Interesting. I did some digging in the page source of a few books, and it looks like there is a datalayer section at the very bottom that includes the author ID. I only checked two books and it was there in both, but I have no idea how consistent it is.

This shouldn't be hard to scrape. Which ID3 tag should we put this ID in? I see a few options: WWWARTIST, MUSICBRAINZ_ALBUMARTISTID, or a custom tag (AudibleAlbumArtistID, or AAAI). https://docs.mp3tag.de/mapping-table/

I hesitate to use the Mbz tag since the ID is from Audible, not Mbz. The WWW tag is a good option. The custom tag would have the best description, but no other program would know to read it (does that even matter?). Maybe we could consult with the Audiobookshelf team, but I don't know if having this ID tag would even be beneficial for them.

I'm also curious: how do you plan on handling books with multiple authors? Books by authors like J.N. Chaney frequently have multiple authors, and sometimes his name is listed first, sometimes second or third. Do we just pull the first author listed?

@seanap
Owner

seanap commented Nov 23, 2022

I made the executive decision to put the ID in a custom tag called "AUDIBLE_ALBUMARTISTID" to keep it consistent with the Mbz tag (still up for debate, let me know if anyone has a better idea). I've only tested this on a few books, but it seems to work pretty well.

Please test this on a variety of books and report back any issues. Then once we're happy that it's good enough I will merge with the main script.

Download the new .src script here https://github.com/seanap/Audible.com-Search-by-Album/blob/master/Audible.com%23Search%20by%20Album%20-%20BETA.src

@seanap
Owner

seanap commented Nov 24, 2022

Here's a format string that does what you're looking for:
Z:\temp\TEST\audiobooks\%albumartist%[ '['%audible_albumartistid%']']\%series%\%year% - %album% [ '['%series% %series-part%']']\%album% (%year%)[ '['%series% %series-part%']']$ifgreater(%_total_files%,1, - pt$num(%track%,2),)

@buswedg
Author

buswedg commented Nov 24, 2022

Amazing -- thanks for the quick response. I'll give this a shot tomorrow.

@buswedg
Author

buswedg commented Nov 24, 2022

> Here's a format string that does what you're looking for:
> Z:\temp\TEST\audiobooks\%albumartist%[ '['%audible_albumartistid%']']\%series%\%year% - %album% [ '['%series% %series-part%']']\%album% (%year%)[ '['%series% %series-part%']']$ifgreater(%_total_files%,1, - pt$num(%track%,2),)

Couldn't %albumartist% include more than one author, though? I thought I saw some instances of that earlier today when I was playing with this. Is the albumartistID only for the first author in that comma-delimited list?

@seanap
Owner

seanap commented Nov 24, 2022

This will only grab the ID of whichever author Audible lists first. A folder would look like /Author1, Author2 [IDof1]/...
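To make the resulting folder name concrete, here is a small sketch that mirrors the `%albumartist%[ '['%audible_albumartistid%']']` part of the format string: the bracketed ID is appended only when an ID was actually scraped. The helper name `author_folder` is hypothetical, and this is an illustration of the naming rule, not how Mp3tag evaluates format strings internally.

```python
from typing import Optional


def author_folder(album_artist: str, first_author_id: Optional[str]) -> str:
    """Build 'Author1, Author2 [IDof1]'; omit the bracket when no ID is known."""
    name = album_artist.strip()
    if first_author_id:
        # The ID belongs to whichever author Audible lists first,
        # even though the name part may contain several authors.
        return f"{name} [{first_author_id}]"
    return name


print(author_folder("Author1, Author2", "IDof1"))
print(author_folder("Author1", None))
```

One consequence of this rule: the same co-author pair listed in a different order would land in a different folder, because both the name string and the first-author ID change.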

@buswedg
Author

buswedg commented Nov 24, 2022

OK, I just took a quick look at some page sources. It looks like the authors field is just a list of dicts. The first entry should hold both the first author's name and their ID, which is where you're pulling the author ID from anyway. I'll make a new custom field along the lines of AUDIBLE_PRIMARYARTIST and maybe rename your custom ID field to AUDIBLE_PRIMARYARTISTID.

On a separate note, I started taking a look at your beets.io fork this morning, as I'll need an automated solution here. I have some suggestions on prioritizing search by ASIN (if already available in the tag or filename), which I think will also improve results. I'll open a separate issue there in time.

@buswedg
Author

buswedg commented Nov 25, 2022

You might want to update to something like the below to pull both the first author's name and ID. I tested on a bunch of books this morning, and all looks fine.

findline "product:[{"
findinline "{\"fullName\":\"" 1 1
outputto "AUDIBLE_FIRSTARTIST"
sayuntil "\""
findinline "\"id\":\""
outputto "AUDIBLE_FIRSTARTISTID"
sayuntil "\""
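For anyone who wants to check the same logic outside Mp3tag, here is a rough Python equivalent of the steps above: find the line containing `product:[{`, read the text after the first `{"fullName":"` up to the next quote, then do the same for the next `"id":"`. The sample line in the usage is invented to match what the script expects and is not a capture of a real Audible page; the function names are hypothetical.

```python
from typing import Optional, Tuple


def _value_after(text: str, key: str, pos: int = 0) -> Optional[Tuple[str, int]]:
    """Return (value, end position) for the text between key and the next quote."""
    start = text.find(key, pos)
    if start == -1:
        return None
    start += len(key)
    end = text.find('"', start)
    if end == -1:
        return None
    return text[start:end], end


def first_author(page_source: str) -> Optional[Tuple[str, str]]:
    """Return (name, id) of the first author found on the 'product:[{' line."""
    for line in page_source.splitlines():
        if "product:[{" not in line:
            continue
        name = _value_after(line, '{"fullName":"')
        if name is None:
            return None
        # Search for the ID only after the name, like the script does.
        author_id = _value_after(line, '"id":"', name[1])
        if author_id is None:
            return None
        return name[0], author_id[0]
    return None


# Hypothetical sample shaped after the script's search strings.
sample = 'product:[{"authors":[{"fullName":"Michelle Obama","id":"B07B436TLF"}]}]'
print(first_author(sample))
```

Like the script, this stops at the first quote it sees, so a name containing an escaped quote would be cut short.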

I'd say querying the api.audnex.us endpoint would be the more sustainable solution, however, same as the beets audible plugin. No doubt it will remain more stable than the source of Audible's audiobook summary pages. I also see that the API includes the first author's ASIN as part of its spec.
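A rough sketch of how the first author could be read from such an API response is below. It works on an already-decoded JSON object and makes no network call. The field names are assumptions that should be checked against the audnex spec: it assumes the book object carries an `authors` list whose entries have `name` and, optionally, `asin` keys. The function name and sample data are made up for illustration.

```python
from typing import Any, Dict, Optional, Tuple


def first_author_from_book(book: Dict[str, Any]) -> Optional[Tuple[str, Optional[str]]]:
    """Return (name, asin) for the first listed author, or None if there are none.

    Assumption: `book` is a decoded JSON object with an "authors" list of
    {"name": ..., "asin": ...} entries; the ASIN may be missing.
    """
    authors = book.get("authors") or []
    if not authors:
        return None
    first = authors[0]
    return first.get("name", ""), first.get("asin")


# Hypothetical response fragment, not real API output.
book = {"authors": [{"asin": "B07B436TLF", "name": "Michelle Obama"}]}
print(first_author_from_book(book))
```

Returning `None` for the ASIN rather than raising keeps the folder-naming step free to fall back to the name alone.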
