Ad Removal Layer (Ad Blocking) #30

nb333 · 2013-12-31T03:31:11Z

Our Ad Blocking will be similar to AdBlock's service, but ours will be server-side. Thus, we will strip out the ad so it's never even sent to the user. :D

zlatanvasovic · 2013-12-31T11:01:55Z

Hmm. In JavaScript or Python? With Python we'll need a DOM library (I don't really know which one) to search within <body> for predefined ad classes.

nb333 · 2014-01-01T01:22:09Z

@zdroid Correct. With JavaScript being client-side, we would have to remove the ad after it has already been fetched, or prevent the call that fetches it. If we choose Python (server-side) instead, we can remove the ad code before it ever reaches the client.

zlatanvasovic · 2014-01-01T10:56:34Z

Yeah, but the Python one would be super complex.

arunenigma · 2014-01-01T15:14:18Z

@nb333 @zdroid Sorry, I have been really busy lately with school and work. @zdroid, if you can send me the detailed requirements, I can try to help with this issue.

zlatanvasovic · 2014-01-01T15:31:24Z

@arunenigma Don't worry, it's New Year! Relax... :)

Details: the OpenFaux server should detect ads and remove them, then the OpenFaux client renders the page and displays it without ads.

Sp3ctr3 · 2014-01-01T15:34:45Z

Hmm... Privoxy has something similar; we should be able to emulate that in Python.

zlatanvasovic · 2014-01-01T15:47:40Z

Ok.


admwx7 · 2014-01-06T17:34:44Z

A big portion of the ads we'll be removing are served through major services (such as Google AdSense) and will have a standardized implementation we can look for. If we wanted to, it would be as easy as running the code through a regex and stripping out anything that matches. Personally, I'd prefer to use a DOM handler so we know the objects are preserved as expected; then we can just run the attributes of the elements the DOM generates through a regex.

boxtown · 2014-01-08T01:24:07Z

Just to let you guys know, regex cannot be used to parse HTML. HTML is not a regular language. You need to parse the HTML first and then possibly use regex (although that's probably not required after parsing the HTML). It shouldn't be a problem if done server-side, though, because Python comes with an HTMLParser class in its standard library.
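
To illustrate the "parse first, then match" order described above, here is a minimal sketch using the standard library's `html.parser` module (the Python 3 home of the HTMLParser class). The `AD_CLASS_RE` pattern and the `find_ad_elements` helper are hypothetical names made up for this example; real ad class names would have to be researched.

```python
import re
from html.parser import HTMLParser

# Hypothetical pattern: matches class values such as "ad", "ads" or
# "banner-ad top", but not ordinary words like "header" or "loading".
AD_CLASS_RE = re.compile(r"(^|[-_\s])ad(s|vert)?($|[-_\s])", re.IGNORECASE)


class AdClassFinder(HTMLParser):
    """Collects (tag, class) pairs whose class attribute looks ad-related."""

    def __init__(self):
        super().__init__()
        self.matches = []

    def handle_starttag(self, tag, attrs):
        # The parser has already split the tag into attributes, so the
        # regex only ever sees an attribute value, never raw HTML.
        for name, value in attrs:
            if name == "class" and value and AD_CLASS_RE.search(value):
                self.matches.append((tag, value))


def find_ad_elements(html):
    """Return the suspicious (tag, class) pairs found in an HTML string."""
    finder = AdClassFinder()
    finder.feed(html)
    finder.close()
    return finder.matches
```

This only reports candidates; what to do with them (remove, blank out, leave alone) is a separate decision.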

zlatanvasovic · 2014-01-08T09:38:35Z

Lawl lawl lawl. I said load HTML and then search it. :D

Problem is that Python doesn't love HTML too much.


admwx7 · 2014-01-08T14:26:13Z

@boxtown HTML is a regular language. To be exact, it's a markup language; yes, its syntax is different from a programming language's, but it's still a standardized language.

@zdroid if you're saying we render the HTML then search it, we don't want to do that either.

HTMLParser should do the trick. If it's anything like the built-in parser for JS, then we can just search for all elements of type x with class y and remove them. It will just require a bit of research on our part to find the common elements between ads generated by the different ad services. It may help to check out the source code for Adblock Plus (https://hg.adblockplus.org/adblockplus/) since they do this already, although their service is client-side. It's possible Adblock uses a different method we haven't thought of that might work better; the same goes for any other service of this type.

The only thing I'm worried about with removing HTML elements is that it may destroy the flow of the page. In that case, maybe we can just unlink all of the files the ad requires: if the ad is generated through some JS, remove the JS include; if there's an image associated with it, remove the image; and so on. That way the resources are never fetched, but the element is still there and (depending on how the ad service implements it) still fills the space, just with empty space now. Ideas?

Sp3ctr3 · 2014-01-08T16:19:44Z

Wouldn't building a blacklist of ad elements help? Check whether any of them exist in the browser contents and then remove them altogether?

admwx7 · 2014-01-08T16:24:26Z

We don't want to accidentally break someone's layout by just blindly removing the elements, but a blacklist will be needed. Instead of blacklisting a

<div class="ad">...</div>

element, we can focus on the part that will actually impact the user's experience, such as removing the

<script src="getYoAdHere.spam/..." />

that will actually be making requests out. When the element renders, it keeps the div element and the styling that was added, so it shouldn't break the flow of the page; but since it never grabs the script that fetches the image, it will never render anything more than some blank space. This will also help with those pesky sites that have JS built in to overlay an ad that you have to click to close before you can see the content.
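
A rough sketch of that idea, assuming Python 3's standard `html.parser`: the page is re-emitted as parsed, except that `<script>` tags whose `src` points at a blacklisted host are dropped, so the surrounding div and its styling stay in place. `BLOCKED_HOSTS`, `ScriptStripper` and `strip_ad_scripts` are hypothetical names, and the host is a placeholder rather than a real ad server. Scheme-less `src` values like the one above would need extra handling, and end tags are re-emitted in lowercase.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical blacklist; the host below is a placeholder.
BLOCKED_HOSTS = {"ads.example.com"}


class ScriptStripper(HTMLParser):
    """Re-emits HTML while dropping <script> tags from blocked hosts."""

    def __init__(self):
        # Keep entities such as &amp; untouched so they can be re-emitted.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.skipping = False  # True while inside a blocked <script>

    def _blocked(self, tag, attrs):
        if tag != "script":
            return False
        src = dict(attrs).get("src") or ""
        # hostname is None for relative URLs, which are never blocked.
        return urlparse(src).hostname in BLOCKED_HOSTS

    def handle_starttag(self, tag, attrs):
        if self._blocked(tag, attrs):
            self.skipping = True
        elif not self.skipping:
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        if not self.skipping and not self._blocked(tag, attrs):
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.skipping:
            if tag == "script":
                self.skipping = False
            return
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        if not self.skipping:
            self.out.append(data)

    def handle_entityref(self, name):
        if not self.skipping:
            self.out.append("&%s;" % name)

    def handle_charref(self, name):
        if not self.skipping:
            self.out.append("&#%s;" % name)

    def handle_comment(self, data):
        if not self.skipping:
            self.out.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.out.append("<!%s>" % decl)


def strip_ad_scripts(html):
    """Return html with blacklisted <script> tags removed."""
    parser = ScriptStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Because only the script include is removed, the `<div class="ad">` wrapper survives and keeps whatever space its CSS reserves.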

Sp3ctr3 · 2014-01-08T16:29:40Z

Alright. Once we figure out what to remove, the actual removal should be fairly trivial. We just modify the buffer in the proxy accordingly: parse the HTML content using Beautiful Soup or lxml (faster?) and then remove the element.
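
The "modify the buffer in the proxy" step could look roughly like the sketch below. It uses only the standard library and takes the actual ad filter as a callable, so it does not depend on whether Beautiful Soup, lxml or HTMLParser ends up doing the parsing. `filter_response` and its parameters are hypothetical and not part of any existing OpenFaux code; the UTF-8 fallback is also an assumption.

```python
from email.message import Message


def filter_response(body, content_type, strip_ads):
    """Apply strip_ads (a str -> str callable) to HTML response bodies.

    body is the raw response buffer (bytes) and content_type is the value
    of the Content-Type header. Non-HTML responses pass through untouched.
    """
    # Reuse the stdlib header parser to split media type and charset.
    header = Message()
    header["Content-Type"] = content_type
    if header.get_content_type() != "text/html":
        return body

    # Assumption: fall back to UTF-8 when the header names no charset.
    charset = header.get_content_charset() or "utf-8"
    try:
        text = body.decode(charset)
    except (UnicodeDecodeError, LookupError):
        # Rather than risk corrupting the page, leave it unmodified.
        return body
    return strip_ads(text).encode(charset)
```

Passing non-HTML and undecodable bodies through unchanged is a deliberately conservative choice: a missed ad is less harmful than a broken image or a garbled page.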

admwx7 · 2014-01-08T16:44:08Z

Agreed, we'll just have to find the common culprits and create a blacklist for them.

zlatanvasovic mentioned this issue Jan 4, 2014

Add support for encryption openfaux/openfaux-client#17

Open
