v2.0.0 - use playwright chromium
deedy5 committed Feb 16, 2024
1 parent a95dccf commit 510a938
Showing 7 changed files with 275 additions and 322 deletions.
78 changes: 37 additions & 41 deletions README.md
@@ -1,26 +1,16 @@
[![Python >= 3.6](https://img.shields.io/badge/python->=3.6-red.svg)](https://www.python.org/downloads/) [![](https://badgen.net/github/release/deedy5/fake_traffic)](https://github.com/deedy5/fake_traffic/releases) [![](https://badge.fury.io/py/fake-traffic.svg)](https://pypi.org/project/fake-traffic)
[![Python >= 3.8](https://img.shields.io/badge/python->=3.8-red.svg)](https://www.python.org/downloads/) [![](https://badgen.net/github/release/deedy5/fake_traffic)](https://github.com/deedy5/fake_traffic/releases) [![](https://badge.fury.io/py/fake-traffic.svg)](https://pypi.org/project/fake-traffic)
# fake_traffic
Imitating an Internet user by mimicking popular web traffic (internet traffic generator).

### How it works:
```python3
1. you specify the country, language and category of interests of a user,
while True:
2. from google trends the script gets a list of popular keywords that are searched in real time
on google by people with a given category of interest in a given country in a given language,
threads:
3. select a random trend, take from there the keywords and urls of related articles,
4. the selected keywords are searched on google and duckduckgo, the found urls are added
to the existing ones,
5. the script sequentially sends requests to a list of urls,
6. in each open url, recursive queries to random links are performed to a random depth (1-5).
```
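
Step 6 above (following random links to a random depth) can be sketched as a small recursive walk. This is an illustration only, not the project's code: `random_walk`, `get_links`, and the in-memory `LINKS` graph are hypothetical names that stand in for real page fetching.

```python
import random

# Hypothetical in-memory link graph standing in for real web pages.
LINKS = {
    "a": ["b", "c"],
    "b": ["c", "d"],
    "c": ["a"],
    "d": [],
}


def get_links(url):
    """Return the outgoing links of a page (stubbed with LINKS here)."""
    return LINKS.get(url, [])


def random_walk(url, depth, rng, visited=None):
    """Follow one random link per page until depth runs out or a page has no links."""
    visited = [] if visited is None else visited
    visited.append(url)
    links = get_links(url)
    if depth <= 0 or not links:
        return visited
    return random_walk(rng.choice(links), depth - 1, rng, visited)


rng = random.Random(0)
depth = rng.randint(1, 5)  # random depth in the 1-5 range from step 6
path = random_walk("a", depth, rng)
print(depth, path)
```

A walk of depth `d` visits at most `d + 1` pages, and it stops early when a page has no outgoing links.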
---
### Install

```python3
pip install -U fake_traffic
```

⚠️ When FakeTraffic runs for the first time, playwright downloads the Chromium browser under the hood, which takes some time.

---
### CLI version
```python3
@@ -29,35 +19,40 @@ fake_traffic -h
CLI examples:
```python3
# user located in Turkey, who speaks Kurdish and is interested in hot stories
fake_traffic -c tr -l ku-tr -ca h -d
fake_traffic -c tr -l ku-tr -ca h
# user located in Brazil, who speaks Portuguese and is interested in sports
fake_traffic -c br -l pt-br -ca s -d

fake_traffic -c br -l pt-br -ca s
# save logs into 'fake_traffic.log'
fake_traffic -c ru -l ru-ru -ca s -lf
# define wait times between requests
fake_traffic -c fr -l fr-fr -ca b -min_w 1 -max_w 100 -lf
# use non-headless mode
fake_traffic -c us -l en-us -ca t -nh -lf
```
---
### Simple usage
```python3
from fake_traffic import fake_traffic
from fake_traffic import FakeTraffic

fake_traffic(country='US', language='en-US")
FakeTraffic(country='US', language='en-US').crawl()
```
---
### Advanced usage
```python3
from fake_traffic import fake_traffic
from fake_traffic import FakeTraffic

fake_traffic(country='US', language='en-US', category='h', threads=2, min_wait=1, max_wait=5, debug=True)
ft = FakeTraffic(country='US', language='en-US', category='h', min_wait=1, max_wait=5, headless=True)
""" Imitating an Internet user by mimicking popular web traffic (internet traffic generator).
country = ISO 3166-1 alpha-2 country code (https://www.iso.org/obp/ui/),
language = country-language code ISO-639 and ISO-3166 (https://www.fincher.org/Utilities/CountryLanguageList.shtml),
category = category of interest of the user (defaults to 'h'):
'all' (all), 'b' (business), 'e' (entertainment),
'm' (health), 's' (sports), 't' (sci/tech), 'h' (top stories);
threads = number of threads (defaults to 2),
min_wait = minimal delay between requests (defaults to 1),
max_wait = maximum delay between requests (defaults to 60),
debug = if True, then print the details of the requests (defaults to False).
max_wait = maximum delay between requests (defaults to 10),
headless = True/False (defaults to True).
"""
ft.crawl()
```
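
The `min_wait` and `max_wait` parameters bound the pause between requests. How the pause is drawn is not shown in this README; the sketch below assumes a uniform random delay, and `pick_delay` is a hypothetical helper, not part of the `fake_traffic` API. Its defaults mirror the documented ones (1 and 10).

```python
import random


def pick_delay(min_wait=1, max_wait=10, rng=random):
    """Return a pause in seconds between min_wait and max_wait (assumed uniform)."""
    if min_wait > max_wait:
        raise ValueError("min_wait must not exceed max_wait")
    return rng.uniform(min_wait, max_wait)


delay = pick_delay(min_wait=1, max_wait=5)
print(f"sleeping for {delay:.2f}s")
# A real crawler would pause here, e.g. with time.sleep(delay).
```

Validating the bounds up front keeps a swapped pair such as `min_wait=60, max_wait=10` from silently producing unexpected delays.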
---
### Example
@@ -74,32 +69,33 @@ Find Turkey country-language code ([ISO-639 and ISO-3166](https://www.fincher.or
Set the category ('h', because the user in the example is interested in hot stories):
- category = 'h'

Starting work in two threads:
- threads=2
Starting in non-headless mode:
- headless=False
```python3
from fake_traffic import fake_traffic
from fake_traffic import FakeTraffic

fake_traffic(country="TR", language="ku-TR", category='h', threads=2)
ft = FakeTraffic(country="TR", language="ku-TR", category='h', headless=False)
ft.crawl()
```
P.S. You can select a language from another country.
For example, such combinations are also correct:
```python3
fake_traffic(country="TR", language="ar-TR")
fake_traffic(country="US", language="he-IL")
fake_traffic(country="DE", language="hi-IN")
FakeTraffic(country="TR", language="ar-TR").crawl()
FakeTraffic(country="US", language="he-IL").crawl()
FakeTraffic(country="DE", language="hi-IN").crawl()
```
---
### Other examples
Country | Language | Function |
----------|---------- | ---------------------------------------------|
France | French | fake_traffic(country="FR", language="fr-FR") |
Germany | German | fake_traffic(country="DE", language="de-DE", category='b') |
India | English | fake_traffic(country="IN", language="en-IN", category='all') |
India | Hindi | fake_traffic(country="IN", language="hi-IN", max_wait=10) |
Russia | English | fake_traffic(country="RU", language="en-US", category='b', threads=3, debug=True) |
Russia | Russian | fake_traffic(country="RU", language="ru-RU", min_wait=0.5, max_wait=3, threads=5) |
Brazil | Portuguese | fake_traffic(country="BR", language="pt-BR", category='s', threads=2, max_wait=60, debug=True) |
United Kingdom | English | fake_traffic(country="GB", language="en-GB") |
United States | English | fake_traffic(country="US", language="en-US", min_wait=60, max_wait=300) |
United States | Hebrew Israel | fake_traffic(country="US", language="he-IL") |
France | French | `FakeTraffic(country="FR", language="fr-FR")` |
Germany | German | `FakeTraffic(country="DE", language="de-DE", category='b')` |
India | English | `FakeTraffic(country="IN", language="en-IN", category='all')` |
India | Hindi | `FakeTraffic(country="IN", language="hi-IN", max_wait=10)` |
Russia | English | `FakeTraffic(country="RU", language="en-US", category='b', headless=False)` |
Russia | Russian | `FakeTraffic(country="RU", language="ru-RU", min_wait=0.5, max_wait=3)` |
Brazil | Portuguese | `FakeTraffic(country="BR", language="pt-BR", category='s', max_wait=60)` |
United Kingdom | English | `FakeTraffic(country="GB", language="en-GB")` |
United States | English | `FakeTraffic(country="US", language="en-US", min_wait=60, max_wait=300)` |
United States | Hebrew Israel | `FakeTraffic(country="US", language="he-IL")` |

2 changes: 1 addition & 1 deletion fake_traffic/__init__.py
@@ -1,2 +1,2 @@
from .fake_traffic import fake_traffic
from .fake_traffic import FakeTraffic
from .version import __version__
56 changes: 44 additions & 12 deletions fake_traffic/cli.py
@@ -1,6 +1,7 @@
import argparse
import logging

from fake_traffic import fake_traffic
from .fake_traffic import FakeTraffic


parser = argparse.ArgumentParser(
@@ -28,9 +29,6 @@
choices=["all", "b", "e", "m", "s", "t", "h"],
required=False,
)
parser.add_argument(
"-t", "--threads", default=2, help="default=2. Number of threads.", required=False
)
parser.add_argument(
"-min_w",
"--min_wait",
@@ -41,25 +39,59 @@
parser.add_argument(
"-max_w",
"--max_wait",
default=60,
help="default=60. Maximum wait time between requests.",
default=10,
help="default=10. Maximum wait time between requests.",
required=False,
)
parser.add_argument(
"-d", "--debug", action="store_true", help="Print debug information(requests)", required=False
"-nh",
"--no-headless",
dest="headless",
action="store_false",
help="Run the browser in non-headless mode",
required=False,
)
parser.add_argument(
"-ll",
"--logging_level",
default="INFO",
help="logging level. default=INFO",
required=False,
)
parser.add_argument(
"-lf",
"--logging_file",
action="store_true",
help="save the log into 'fake_traffic.log'",
required=False,
)
args = parser.parse_args()

# logging
logging.basicConfig(
level=args.logging_level,
format="%(asctime)s | %(levelname)s | %(message)s",
datefmt="%Y-%m-%d %H:%M:%S",
force=True,
handlers=[logging.FileHandler("fake_traffic.log"), logging.StreamHandler()]
if args.logging_file
else [logging.StreamHandler()],
)

country = args.country.upper()
language_split = args.language.split("-")
language = f"{language_split[0]}-{language_split[1].upper()}"
logging.info(
f"Run crawl with: {country=}, {language=}, category={args.category}, min_w={args.min_wait}, max_w={args.max_wait}, headless={args.headless}, logging_level={args.logging_level}, logging_file={args.logging_file}"
)


fake_traffic(
fake_traffic = FakeTraffic(
country=country,
language=language,
category=args.category,
threads=args.threads,
min_wait=args.min_wait,
max_wait=args.max_wait,
debug=args.debug,
min_wait=int(args.min_wait),
max_wait=int(args.max_wait),
headless=args.headless,
)
fake_traffic.crawl()
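
The `-nh` flag above relies on argparse's `store_false` action: `headless` defaults to `True` and flips to `False` only when the flag is passed. A minimal standalone sketch of that behavior, together with the country/language normalization, follows. `build_parser` and `normalize` are hypothetical helper names, only a subset of the options is reproduced, and the `-c`/`-l` defaults are assumed for the sketch.

```python
import argparse


def build_parser():
    """Build a reduced parser with the options relevant to this sketch."""
    parser = argparse.ArgumentParser(prog="fake_traffic")
    parser.add_argument("-c", "--country", default="US")      # assumed default
    parser.add_argument("-l", "--language", default="en-US")  # assumed default
    parser.add_argument("-max_w", "--max_wait", default=10)
    # store_false: args.headless is True unless -nh is given.
    parser.add_argument("-nh", "--no-headless", dest="headless", action="store_false")
    return parser


def normalize(country, language):
    """Upper-case the country and the region part of the language code."""
    lang, _, region = language.partition("-")
    # Unlike the split-and-index approach, a code without "-" is passed through.
    language = f"{lang}-{region.upper()}" if region else lang
    return country.upper(), language


parser = build_parser()
default_args = parser.parse_args([])
custom_args = parser.parse_args(["-c", "tr", "-l", "ku-tr", "-max_w", "100", "-nh"])

print(default_args.headless, custom_args.headless)
print(normalize(custom_args.country, custom_args.language), int(custom_args.max_wait))
```

Values passed on the command line arrive as strings, which is why the wait times are converted with `int(...)` before being handed to `FakeTraffic`.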