Filter Promoted Company ads #1

jfcolomer · 2023-09-15T11:35:30Z

Hi there,
Thanks for creating this script, it's fabulous!
I was wondering what would be the best way to target not every post but specifically promoted ads, that is, the ones listed here:
https://www.linkedin.com/company/{company-name}/posts/?feedView=ads
When I update the link variable in the scrape function to something like link = f'{link}/posts/?feedView=ads', it only picks up the very first promoted ad and does not collect the remaining ones (e.g. with 50 ads, it returns only 1 result). For that one result it also fails to collect likes and links (e.g. for an ad with a carousel whose items have links).
For all other posts, it works like a charm.
Thanks
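
For reference, the URL change described above can be sketched as a small helper. `build_feed_url` is a hypothetical name and is not part of the script; the company suffix is the one quoted in this issue, and the non-company suffix is assumed to be the `/recent-activity/all/` path used by `scrape`.

```python
def build_feed_url(link, profile_type):
    """Return the feed URL to open for a profile (hypothetical helper)."""
    # Strip a trailing slash so the result never contains '//posts/'.
    base = link.rstrip("/")
    if profile_type == "Company":
        # Promoted-ads view of a company page, as quoted in this issue.
        return f"{base}/posts/?feedView=ads"
    # Assumed default for personal profiles.
    return f"{base}/recent-activity/all/"


print(build_feed_url("https://www.linkedin.com/company/{company-name}/", "Company"))
```

This only normalizes the URL; it does not by itself change which posts the scraper finds on the page.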

jfcolomer · 2023-10-25T15:17:54Z

Hi,

Any help understanding how the individual post items are created before they are passed to postInfo = getPostInformation(str(post)) would be really appreciated:

```python
def scrape(driver, link, profileType):
    if (profileType == "Company"):
        link = f'{link}/posts/?feedView=ads'
    else:
        link = f'{link}/recent-activity/all/'

    driver.get(link)

    time.sleep(3)

    posts = {}

    old_position = 0
    new_position = None
    counter = 0
    while new_position != old_position:
        # Get old scroll position
        old_position = driver.execute_script(
                ("return (window.pageYOffset !== undefined) ?"
                 " window.pageYOffset : (document.documentElement ||"
                 " document.body.parentNode || document.body);"))
        # Try removing this delay to see whether the program runs faster.
        # Since the program does processing here, the sleep may not be
        # needed as it was for Instagram, where the loop only scrolled
        # with no processing in between.
        time.sleep(1)
        soup = BeautifulSoup(driver.page_source, 'html.parser')
        soup = str(soup)

        results = soup.split('occludable-update')

        # results = {}
        for result in results:

            try:
                counter += 1

                postlink = result.split('data-urn="')[counter].split('"')[0]
                postlink = f'https://www.linkedin.com/feed/update/{postlink}'
            except:
                postlink = ''

            if('linkedin' in postlink):
                posts[postlink] = result
        new_position = scroll(driver, old_position)

    print(f'\n\nFound {len(posts)} posts.')
    postsFiltered = []

    for postlink, post in posts.items():
        postInfo = getPostInformation(str(post))
        postInfo.append(postlink)
        postsFiltered.append(postInfo)
```

After changing the link variable to link = f'{link}/posts/?feedView=ads', I can get the script to export all of the company's promoted posts to the CSV in this format:
https://www.linkedin.com/feed/update/urn:li:activity:00000000000000001
https://www.linkedin.com/feed/update/urn:li:activity:00000000000000002
https://www.linkedin.com/feed/update/urn:li:activity:00000000000000003

and so on ...

But the description, hashtags, etc. are only returned for the first post, in this case https://www.linkedin.com/feed/update/urn:li:activity:00000000000000001, so I'd really appreciate it if you could explain how the post variable referenced here

LinkedinScraperCompanies/linkdin.py

Line 50 in 8365a6e

for postlink, post in posts.items():

is generated.

Thanks
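
To make the question concrete, here is a minimal sketch of how each `post` value is produced by the loop above, run on made-up HTML instead of a live page. `collect_posts`, `split_by_urn` and `fake_page` are hypothetical names for illustration. The sketch assumes, without having verified it, that the ads view does not contain the `occludable-update` marker; under that assumption it shows one way the symptom could arise. The bare `except` is narrowed to `IndexError`, which behaves the same on this input.

```python
def collect_posts(page_source, posts, counter):
    """Mirror the inner for-loop of scrape() for one scroll pass."""
    results = page_source.split('occludable-update')
    for result in results:
        try:
            counter += 1
            urn = result.split('data-urn="')[counter].split('"')[0]
            postlink = f'https://www.linkedin.com/feed/update/{urn}'
        except IndexError:
            postlink = ''
        if 'linkedin' in postlink:
            # This value is what later becomes `post`.
            posts[postlink] = result
    return counter


def split_by_urn(page_source):
    """Alternative sketch: one chunk per data-urn attribute."""
    posts = {}
    for chunk in page_source.split('data-urn="')[1:]:
        urn, _, rest = chunk.partition('"')
        posts[f'https://www.linkedin.com/feed/update/{urn}'] = rest
    return posts


# Made-up page with three ads and no 'occludable-update' marker (assumption).
fake_page = (
    '<div data-urn="urn:li:activity:1">first ad</div>'
    '<div data-urn="urn:li:activity:2">second ad</div>'
    '<div data-urn="urn:li:activity:3">third ad</div>'
)

posts, counter = {}, 0
for _ in range(3):  # three scroll passes over the same page source
    counter = collect_posts(fake_page, posts, counter)

# Three distinct links, but every value is the whole page.
print(len(posts), all(value == fake_page for value in posts.values()))

# Splitting on data-urn instead gives each link its own chunk.
for postlink, post in split_by_urn(fake_page).items():
    print(postlink, post)
```

In this toy run the marker never matches, so `split` returns the whole page as a single chunk, and `counter`, which is not reset between scroll passes, selects a different `data-urn` on each pass. Every link is therefore stored with the same page-sized value, and a parser that reads the first description it finds would return the first ad's data each time. Whether `getPostInformation` can parse the per-urn chunks from `split_by_urn`, and whether real pages contain extra nested `data-urn` attributes, would need to be checked against the actual page source.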
