While studying for AWS Data Analyst, I was looking for something hands-on to do. :) I found an interesting dataset on the internet about mobile base stations in Shanghai. Many thanks to the authors for sharing it! The dataset comes with these references:
[1] Yuanzhe Li, Ao Zhou, Xiao Ma, Shangguang Wang. Profit-aware Edge Server Placement, IEEE Internet of Things Journal, 2021. https://ieeexplore.ieee.org/document/9442389
[2] Y. Guo, S. Wang, A. Zhou, J. Xu, J. Yuan, C. Hsu. User Allocation-aware Edge Cloud Placement in Mobile Edge Computing, Software: Practice and Experience, vol. 50, no. 5, pp. 489-502, 2020.
[3] S. Wang, Y. Guo, N. Zhang, P. Yang, A. Zhou, X. Shen. Delay-aware Microservice Coordination in Mobile Edge Computing: A Reinforcement Learning Approach, IEEE Transactions on Mobile Computing, vol. 20, no. 3, pp. 939-953, 2021.
The dataset, provided by Shanghai Telecom, contains more than 7.2 million records of Internet access through 3,233 base stations from 9,481 mobile phones over six months. For example, the following figure shows the distribution of base stations; each node denotes a base station in Shanghai, China. This dataset can help researchers evaluate their solutions on mobile edge computing topics such as edge server placement, service migration, and service recommendation.
ID | Parameter Name | Description |
---|---|---|
1 | Month | The month when one record happens |
2 | Date | The date when one record happens |
3 | Start Time | The time when a record starts |
4 | End Time | The time when a record ends |
5 | Base Station Location | Location of the base station (latitude and longitude) |
6 | User ID | The mobile phone ID |
As shown in the table above, the Telecom dataset has six parameters: Month, Date, Start Time, End Time, Base Station Location, and Mobile Phone ID. A user's trajectory can be reconstructed from these records.
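To make the trajectory idea concrete, here is a minimal sketch in plain Python. It assumes each record has been loaded as a dict with `user_id`, `start_time`, `latitude`, and `longitude` keys and that `start_time` is a `YYYY-MM-DD HH:MM:SS` string. Those field names, the timestamp format, and the sample rows are all made up for illustration and may not match the real files.

```python
from datetime import datetime

# Made-up sample rows for illustration only; field names and the
# timestamp format are assumptions, not the dataset's actual schema.
records = [
    {"user_id": "u1", "start_time": "2000-01-01 09:00:00", "latitude": "31.20", "longitude": "121.40"},
    {"user_id": "u2", "start_time": "2000-01-01 09:10:00", "latitude": "31.25", "longitude": "121.45"},
    {"user_id": "u1", "start_time": "2000-01-01 12:00:00", "latitude": "31.25", "longitude": "121.45"},
    {"user_id": "u1", "start_time": "2000-01-01 10:00:00", "latitude": "31.20", "longitude": "121.40"},
    {"user_id": "u1", "start_time": "2000-01-02 08:00:00", "latitude": "31.20", "longitude": "121.40"},
]


def user_trajectory(rows, user_id):
    """Return the time-ordered list of (latitude, longitude) stops for one user."""
    visits = sorted(
        (r for r in rows if r["user_id"] == user_id),
        key=lambda r: datetime.strptime(r["start_time"], "%Y-%m-%d %H:%M:%S"),
    )
    trajectory = []
    for r in visits:
        stop = (r["latitude"], r["longitude"])
        # Collapse consecutive records made at the same base station
        if not trajectory or trajectory[-1] != stop:
            trajectory.append(stop)
    return trajectory
```

Calling `user_trajectory(records, "u1")` sorts that user's records by start time and keeps one entry per consecutive stay, so repeated sessions at the same station do not inflate the path.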
The most interesting part of Big Data is not the coding but what we can do with the data to provide useful information, and then applying the SQL needed to get it. Two ideas came to mind:
- Can we get the top ten base station locations where users access the internet most? Based on this, we can help the mobile service provider evaluate their solutions, for example planning replacement or service migration to minimize downtime, or increasing the network bandwidth at specific BTSs.
- Can we draw an "internet mobile access map" for a specific user, showing where he/she accesses the internet and which shops are nearby, so that we could send suitably targeted advertisements?
import argparse

from pyspark.sql import SparkSession


def query_bts_location(data_source, output_uri):
    """
    Query the top 10 BTS with the highest internet load and return their latitude/longitude attributes.
    """
    with SparkSession.builder.appName("Check BTS location").getOrCreate() as spark:
        # Load the base station CSV data
        bts_df = spark.read.option("header", "true").csv(data_source)

        # Register the DataFrame as a temporary view so it can be queried with SQL
        bts_df.createOrReplaceTempView("bts_data")

        # Create a DataFrame of the 10 base station locations with the most records
        top_ten_base_station_locations = spark.sql("""SELECT latitude, longitude, count(*) AS top_ten_base_station
          FROM bts_data
          GROUP BY latitude, longitude
          ORDER BY top_ten_base_station DESC LIMIT 10""")

        # Write the results to the specified output URI
        top_ten_base_station_locations.write.option("header", "true").mode("overwrite").csv(output_uri)


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument(
        '--data_source', required=True, help="The URI for your CSV data, like an S3 bucket location.")
    parser.add_argument(
        '--output_uri', required=True, help="The URI where output is saved, like an S3 bucket location.")
    args = parser.parse_args()

    query_bts_location(args.data_source, args.output_uri)
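The second idea, the per-user access map, could start from a similar query restricted to one user. The sketch below tries it locally with Python's built-in `sqlite3` as a stand-in for Spark SQL, so it runs without a cluster. The `user_id` column name and the sample rows are assumptions for illustration. The `?` placeholders are sqlite3's parameter syntax; the `SELECT` itself should carry over to `spark.sql` against the `bts_data` view with little change.

```python
import sqlite3


def top_locations_for_user(conn, user_id, limit=10):
    """Return (latitude, longitude, visits) rows for one user, busiest first."""
    query = """
        SELECT latitude, longitude, COUNT(*) AS visits
        FROM bts_data
        WHERE user_id = ?
        GROUP BY latitude, longitude
        ORDER BY visits DESC
        LIMIT ?
    """
    return conn.execute(query, (user_id, limit)).fetchall()


# Build a tiny in-memory table with made-up rows (column names are assumed)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bts_data (user_id TEXT, latitude TEXT, longitude TEXT)")
conn.executemany(
    "INSERT INTO bts_data VALUES (?, ?, ?)",
    [
        ("u1", "31.20", "121.40"),
        ("u1", "31.20", "121.40"),
        ("u1", "31.25", "121.45"),
        ("u2", "31.25", "121.45"),
        ("u2", "31.25", "121.45"),
        ("u2", "31.25", "121.45"),
    ],
)

user_rows = top_locations_for_user(conn, "u1")
```

Each returned row is one location with the number of records that user made there, which is the raw material for plotting that user's access points.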
Next step: draw the results on a map (in development).
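The map itself is not built yet. One possible starting point, sketched here under assumptions, is to convert the query output into GeoJSON, a format many mapping tools can display. The function name `to_geojson`, the `record_count` property, and the sample rows are hypothetical; the rows mimic the (latitude, longitude, count) shape of the CSV written by the script, where every value arrives as a string.

```python
import json


def to_geojson(rows):
    """Convert (latitude, longitude, count) rows into a GeoJSON FeatureCollection."""
    features = []
    for latitude, longitude, count in rows:
        features.append({
            "type": "Feature",
            "geometry": {
                "type": "Point",
                # GeoJSON lists longitude first, then latitude
                "coordinates": [float(longitude), float(latitude)],
            },
            "properties": {"record_count": int(count)},
        })
    return {"type": "FeatureCollection", "features": features}


# Made-up rows standing in for the script's CSV output
sample_rows = [("31.20", "121.40", "120"), ("31.25", "121.45", "95")]
geojson_text = json.dumps(to_geojson(sample_rows), indent=2)
```

Saving `geojson_text` to a `.geojson` file gives something that can be dropped into a map viewer to check whether the busiest stations land where expected.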