publication.html

<!doctype html><html><head><meta charset="utf-8"><meta http-equiv="X-UA-Compatible" content="IE=edge"><title>SUSTech DBGroup</title><meta name="description" content=""><meta name="viewport" content="width=device-width,initial-scale=1"><meta name="robots" content="all,follow"><link rel="stylesheet" href="vendor/bootstrap/css/bootstrap.min.css"><link rel="stylesheet" href="https://fonts.googleapis.com/css?family=Roboto:300,400,700"><link rel="stylesheet" href="css/pe-icon-7-stroke.css"><link rel="stylesheet" href="vendor/lightbox2/css/lightbox.min.css"><link rel="stylesheet" href="vendor/leaflet/leaflet.css"><link rel="stylesheet" href="css/style.default.css" id="theme-stylesheet"><link rel="stylesheet" href="css/custom.css"><link rel="shortcut icon" href="img/favicon.png"><!--[if lt IE 9]>
        <script src="https://oss.maxcdn.com/html5shiv/3.7.3/html5shiv.min.js"></script>
        <script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script><![endif]--><link rel="stylesheet" href="https://unpkg.com/element-ui/lib/theme-chalk/index.css"><script src="vendor/vue/vue.min.js"></script><script src="https://unpkg.com/element-ui/lib/index.js"></script><script defer="defer" src="js/publication.5489fb0dc6902048e604.js"></script></head><body><header class="header"><nav class="navbar navbar-expand-lg py-lg-0"><div class="container"><a href="index.html"><img src="img/dbgroup-logo.png" alt="DBGroup" height="50px"/> </a><button type="button" data-toggle="collapse" data-target="#navbarSupportedContent" aria-controls="navbarSupportedContent" aria-expanded="false" aria-label="Toggle navigation" class="navbar-toggler navbar-toggler-right"><span class="text-uppercase font-weight-bold mr-2">Menu</span><i class="fas fa-bars"></i></button><div id="navbarSupportedContent" class="collapse navbar-collapse"><ul class="navbar-nav ml-auto d-lg-flex align-items-lg-center"><li class="nav-item"><a href="index.html" class="nav-link font-weight-bold text-uppercase px-lg-3 py-lg-4">Home</a></li><li class="nav-item"><a href="publication.html" class="nav-link font-weight-bold text-uppercase px-lg-3 py-lg-4 active">Publication</a></li></ul></div></div></nav></header><section class="py-2 bg-cover bg-gray"><div class="container"><div aria-label="breadcrumb" class="nav"><ol class="breadcrumb"><li class="breadcrumb-item"><a href="./">Home</a></li><li aria-current="page" class="breadcrumb-item active">Publications</li></ol></div><h1 class="">Publications</h1><div style="width: 100%; display: flex; justify-content: flex-end; flex-wrap: wrap"><span style="padding-right: 10px;">Author: <select class="form-select form-select-sm" id="authorsSelector" style="width: 120px;"><option selected="selected" value="all">All</option><option value="Bo Tang">Bo Tang</option><option value="Xiao Yan">Xiao Yan</option><option value="Qiaomu Shen">Qiaomu Shen</option><option value="Dan Zeng">Dan Zeng</option></select> </span><span style="padding-right: 10px;">Category: <select class="form-select form-select-sm" id="categorySelector" style="width: 120px;"><option selected="selected" value="all">All</option><option value="CCF-A">CCF-A</option><option value="CCF-B">CCF-B</option><option value="CCF-C">CCF-C</option><option value="JCR-Q1">JCR-Q1</option><option value="JCR-Q3">JCR-Q3</option><option value="None">None</option></select></span></div></div></section><section class="py-5 annual-publication"><div class="container py-2"><div style="width: 100%; display: flex; justify-content: center"><h2 class="section-title mb-4 lined">2022</h2></div><div class="publication-list"><div class="publication-item" id="tang2022CVPR_Face2Exp" data-authors="Dan Zeng, Zhiyuan Lin, Xiao Yan, Yuting liu, Fei Wang and Bo Tang." data-category="CCF-A"><div class="row publication-section"><div class="col-md-2 publication-img-wrapper"><div class="publication-img-container"><img src="img/loading.gif" data-src="img/publication/tang2022CVPR_Face2Exp.png" alt="" class="lazy-load-img img-fluid"></div></div><div class="col-md-10"><p class="publication-title">Face2Exp: Combating Data Biases for Facial Expression Recognition</p><p class="publication-location">IEEE Conference on Computer Vision and Pattern Recognition (CVPR, CCF-A), 2022</p><p class="publication-author" title="Authors: Dan Zeng, Zhiyuan Lin, Xiao Yan, Yuting liu, Fei Wang and Bo Tang.">Authors: Dan Zeng, Zhiyuan Lin, Xiao Yan, Yuting liu, Fei Wang and Bo Tang.</p><div class="publication-btn-group"><span class="btn btn-disabled" title="Unavailable">abstract</span> <span class="btn btn-disabled" title="Unavailable">pdf</span></div><div class="abstract" id="abstract-0-0"><div></div></div></div></div></div><div class="publication-item" id="tang2022SIGMOD_T-LevelIndex" data-authors="Jiahao Zhang, Bo Tang, Man Lung Yiu, Xiao Yan, Keming Li." data-category="CCF-A"><div class="row publication-section"><div class="col-md-2 publication-img-wrapper"><div class="publication-img-container"><img src="img/loading.gif" data-src="img/publication/tang2022SIGMOD_T-LevelIndex.png" alt="" class="lazy-load-img img-fluid"></div></div><div class="col-md-10"><p class="publication-title">𝜏-LevelIndex: Towards Efficient Query Processing in Continuous Preference Space</p><p class="publication-location">ACM Conference on Management of Data (SIGMOD, CCF-A), 2022</p><p class="publication-author" title="Authors: Jiahao Zhang, Bo Tang, Man Lung Yiu, Xiao Yan, Keming Li.">Authors: Jiahao Zhang, Bo Tang, Man Lung Yiu, Xiao Yan, Keming Li.</p><div class="publication-btn-group"><span class="btn btn-disabled" title="Unavailable">abstract</span> <span class="btn btn-disabled" title="Unavailable">pdf</span></div><div class="abstract" id="abstract-0-1"><div></div></div></div></div></div><div class="publication-item" id="tang2022SIGMOD_Spatial" data-authors="Huan Li, Bo Tang, Hua Lu, Christian S. Jensen, Muhammad Aamir Cheema." data-category="CCF-A"><div class="row publication-section"><div class="col-md-2 publication-img-wrapper"><div class="publication-img-container"><img src="img/loading.gif" data-src="img/publication/tang2022SIGMOD_Spatial.png" alt="" class="lazy-load-img img-fluid"></div></div><div class="col-md-10"><p class="publication-title">Spatial Data Quality in the IoT Era: Management and Exploitation</p><p class="publication-location">ACM Conference on Management of Data (SIGMOD, CCF-A), 2022</p><p class="publication-author" title="Authors: Huan Li, Bo Tang, Hua Lu, Christian S. Jensen, Muhammad Aamir Cheema.">Authors: Huan Li, Bo Tang, Hua Lu, Christian S. Jensen, Muhammad Aamir Cheema.</p><div class="publication-btn-group"><span class="btn btn-disabled" title="Unavailable">abstract</span> <span class="btn btn-disabled" title="Unavailable">pdf</span></div><div class="abstract" id="abstract-0-2"><div></div></div></div></div></div><div class="publication-item" id="tang2022SIGMOD_GHive" data-authors="Haotian Liu, Bo Tang, Jiashu Zhang, Yangshen Deng, Xinying Zheng, Qiaomu Shen, Xiao Yan, Dan Zeng, Zunyao Mao, Chaozu Zhang, Zhengxin You, Zhihao Wang, Jiang Runzhe, Fang Wang, Man Lung Yiu, Huan Li, Mingji Han, Qian Li, Zhenghai Luo." data-category="CCF-A"><div class="row publication-section"><div class="col-md-2 publication-img-wrapper"><div class="publication-img-container"><img src="img/loading.gif" data-src="img/publication/tang2022SIGMOD_GHive.png" alt="" class="lazy-load-img img-fluid"></div></div><div class="col-md-10"><p class="publication-title">GHive: A Demonstration of GPU-Accelerated Query Processing in Apache Hive</p><p class="publication-location">ACM Conference on Management of Data (SIGMOD, CCF-A), 2022</p><p class="publication-author" title="Authors: Haotian Liu, Bo Tang, Jiashu Zhang, Yangshen Deng, Xinying Zheng, Qiaomu Shen, Xiao Yan, Dan Zeng, Zunyao Mao, Chaozu Zhang, Zhengxin You, Zhihao Wang, Jiang Runzhe, Fang Wang, Man Lung Yiu, Huan Li, Mingji Han, Qian Li, Zhenghai Luo.">Authors: Haotian Liu, Bo Tang, Jiashu Zhang, Yangshen Deng, Xinying Zheng, Qiaomu Shen, Xiao Yan, Dan Zeng, Zunyao Mao, Chaozu Zhang, Zhengxin You, Zhihao Wang, Jiang Runzhe, Fang Wang, Man Lung Yiu, Huan Li, Mingji Han, Qian Li, Zhenghai Luo.</p><div class="publication-btn-group"><span class="btn btn-disabled" title="Unavailable">abstract</span> <span class="btn btn-disabled" title="Unavailable">pdf</span></div><div class="abstract" id="abstract-0-3"><div></div></div></div></div></div><div class="publication-item" id="tang2022PVLDB_Efficient" data-authors="Huan Li, Lanjing Yi, Bo Tang, Hua Lu, Christian Jensen." data-category="CCF-A"><div class="row publication-section"><div class="col-md-2 publication-img-wrapper"><div class="publication-img-container"><img src="img/loading.gif" data-src="img/publication/tang2022PVLDB_Efficient.png" alt="" class="lazy-load-img img-fluid"></div></div><div class="col-md-10"><p class="publication-title">Efficient and Error-bounded Spatiotemporal Quantile Monitoring in Edge Computing Environments</p><p class="publication-location">Proceedings of the VLDB Endowment (PVLDB, CCF-A), 2022</p><p class="publication-author" title="Authors: Huan Li, Lanjing Yi, Bo Tang, Hua Lu, Christian Jensen.">Authors: Huan Li, Lanjing Yi, Bo Tang, Hua Lu, Christian Jensen.</p><div class="publication-btn-group"><span class="btn btn-disabled" title="Unavailable">abstract</span> <span class="btn btn-disabled" title="Unavailable">pdf</span></div><div class="abstract" id="abstract-0-4"><div></div></div></div></div></div><div class="publication-item" id="tang2022ICDE_Constructing" data-authors="Jing Zhao, Peng Wang, Bo Tang, Lu Liu, Chen Wang, Wei Wang, Jianmin Wang." data-category="CCF-A"><div class="row publication-section"><div class="col-md-2 publication-img-wrapper"><div class="publication-img-container"><img src="img/loading.gif" data-src="img/publication/tang2022ICDE_Constructing.png" alt="" class="lazy-load-img img-fluid"></div></div><div class="col-md-10"><p class="publication-title">Constructing Compact Time Series Index for Efficient Window Query Processing</p><p class="publication-location">IEEE International Conference on Data Engineering (ICDE, CCF-A), 2022</p><p class="publication-author" title="Authors: Jing Zhao, Peng Wang, Bo Tang, Lu Liu, Chen Wang, Wei Wang, Jianmin Wang.">Authors: Jing Zhao, Peng Wang, Bo Tang, Lu Liu, Chen Wang, Wei Wang, Jianmin Wang.</p><div class="publication-btn-group"><span class="btn btn-disabled" title="Unavailable">abstract</span> <span class="btn btn-disabled" title="Unavailable">pdf</span></div><div class="abstract" id="abstract-0-5"><div></div></div></div></div></div><div class="publication-item" id="tang2022ICDE_imDedup" data-authors="Cai Deng, Qi Chen, Xiangyu Zou, Eric Xu, Bo Tang, Wen Xia." data-category="CCF-A"><div class="row publication-section"><div class="col-md-2 publication-img-wrapper"><div class="publication-img-container"><img src="img/loading.gif" data-src="img/publication/tang2022ICDE_imDedup.png" alt="" class="lazy-load-img img-fluid"></div></div><div class="col-md-10"><p class="publication-title">imDedup: a Lossless Deduplication Scheme to Eliminate Fine-grained Redundancy among Images</p><p class="publication-location">IEEE International Conference on Data Engineering (ICDE, CCF-A), 2022</p><p class="publication-author" title="Authors: Cai Deng, Qi Chen, Xiangyu Zou, Eric Xu, Bo Tang, Wen Xia.">Authors: Cai Deng, Qi Chen, Xiangyu Zou, Eric Xu, Bo Tang, Wen Xia.</p><div class="publication-btn-group"><a href="javascript:;" class="btn btn-primary my-2" onclick="toggleAbstractDisplayPubPage('abstract-0-6')">abstract</a> <a href="https://www.researchgate.net/profile/Xiangyu-Zou-4/publication/359685410_imDedup_a_Lossless_Deduplication_Scheme_to_Eliminate_Fine-grained_Redundancy_among_Images/links/6247ff0257084c718b7f01f5/imDedup-a-Lossless-Deduplication-Scheme-to-Eliminate-Fine-grained-Redundancy-among-Images.pdf?_sg%5B0%5D=1xX9IoYFYKRqf9zQnXHViI4PBRM9SMiEu8wpC65YkkFlyn8X4siSGZnC47np8Vh2M32DXol28tsJS2IVNS-Vkw.6uuoHS8gHPe-wZUp_XT49KIFx4pa3GLQxV2jxT0zyD2ctcoHtIHwtipBQUWoMkDj3-SyaGM4ZpSM_RWR86kfuw&_sg%5B1%5D=EtkwlKZsxBU8ZYZLZKhOAbIpTruoGwjFYMbQBwjMwIet0rEqOPMST8dE2VteO9Tb6tKprdUJL1hrrK-q3s071Gmq7hSRr9uXEDz5r_rFlWl7.6uuoHS8gHPe-wZUp_XT49KIFx4pa3GLQxV2jxT0zyD2ctcoHtIHwtipBQUWoMkDj3-SyaGM4ZpSM_RWR86kfuw&_iepl=" target="_blank" class="btn btn-primary">pdf</a></div><div class="abstract" id="abstract-0-6"><div><p>Images occupy a large amount of storage in data centers. To cope with the explosive growth of the image storage requirement, image compression techniques are devised to shrink the size of every single image at first. Furthermore, image deduplication methods are proposed to reduce the storage cost as they could be used to eliminate redundancy among images. However, state-of-the-art image deduplication methods either can only eliminate file-level coarse-grained redundancy or cannot guarantee lossless deduplication. In this work, we propose a new lossless image deduplication framework to eliminate fine-grained redundancy among images. It first decodes images to expose similarity, then eliminates fine-grained redundancy on the decoded data by delta compression , and finally re-compresses the remaining data by image compression encoding. Based on this framework, we propose a novel lossless similarity-based deduplication (SBD) scheme for decoded image data (called imDedup). Specifically, imDedup uses a novel and fast sampling method (called Feature Map) to detect similar images in a two-dimensional way, which greatly reduces computation overhead. Meanwhile, it uses a novel delta encoder (called Idelta) which incorporates image compression encoding characteristics into deduplication to guarantee the remaining deduplicated image data to be friendly re-compressed via image encoding, which significantly improves the compression ratio. We implement a prototype of imDedup for JPEG images, and demonstrate its superiority on four datasets: Compared with exact image deduplication, imDedup achieves a 19%-38% higher compression ratio by efficiently eliminating fine-grained redundancy. Compared with the similarity detector and delta encoder of state-of-the-art SBD schemes running on the decoded image data, imDedup achieves a 1.8×-3.4× higher throughput and a 1.3×-1.6× higher compression ratio, respectively.</p></div></div></div></div></div><div class="publication-item" id="tang2022ICDE_CheetahKG" data-authors="Bo Tang, Zeng Jian, Qiandong Tang, Chuan Yang, Qiaomu Shen, Leong Hou U, Xiao Yan, Dan Zeng." data-category="CCF-A"><div class="row publication-section"><div class="col-md-2 publication-img-wrapper"><div class="publication-img-container"><img src="img/loading.gif" data-src="img/publication/tang2022ICDE_CheetahKG.png" alt="" class="lazy-load-img img-fluid"></div></div><div class="col-md-10"><p class="publication-title">CheetahKG: A Demonstration for Core-based Top-k Frequent Pattern Discovery on Knowledge Graphs</p><p class="publication-location">IEEE International Conference on Data Engineering (ICDE, CCF-A), 2022</p><p class="publication-author" title="Authors: Bo Tang, Zeng Jian, Qiandong Tang, Chuan Yang, Qiaomu Shen, Leong Hou U, Xiao Yan, Dan Zeng.">Authors: Bo Tang, Zeng Jian, Qiandong Tang, Chuan Yang, Qiaomu Shen, Leong Hou U, Xiao Yan, Dan Zeng.</p><div class="publication-btn-group"><span class="btn btn-disabled" title="Unavailable">abstract</span> <span class="btn btn-disabled" title="Unavailable">pdf</span></div><div class="abstract" id="abstract-0-7"><div></div></div></div></div></div><div class="publication-item" id="tang2022ICDE_Fast" data-authors="Jiahao Zhang, Man Lung Yiu, Bo Tang, Qing Li." data-category="CCF-A"><div class="row publication-section"><div class="col-md-2 publication-img-wrapper"><div class="publication-img-container"><img src="img/loading.gif" data-src="img/publication/tang2022ICDE_Fast.png" alt="" class="lazy-load-img img-fluid"></div></div><div class="col-md-10"><p class="publication-title">Fast Error-Bounded Distance Distribution Computation (Extended Abstract)</p><p class="publication-location">IEEE International Conference on Data Engineering (ICDE, CCF-A), 2022</p><p class="publication-author" title="Authors: Jiahao Zhang, Man Lung Yiu, Bo Tang, Qing Li.">Authors: Jiahao Zhang, Man Lung Yiu, Bo Tang, Qing Li.</p><div class="publication-btn-group"><a href="javascript:;" class="btn btn-primary my-2" onclick="toggleAbstractDisplayPubPage('abstract-0-8')">abstract</a> <a href="https://web.comp.polyu.edu.hk/csmlyiu/journal/TKDE_acdd.pdf" target="_blank" class="btn btn-primary">pdf</a></div><div class="abstract" id="abstract-0-8"><div><p>In this work we study the distance distribution computation problem. It has been widely used in many real-world applications, e.g., human genome clustering, cosmological model analysis, and parameter tuning. The straightforward solution for the exact distance distribution computation problem is unacceptably slow due to (i) massive data size, and (ii) expensive distance computation. In this paper, we propose a novel method to compute approximate distance distributions with error bound guarantees. Furthermore, our method is generic to different distance measures. We conduct extensive experimental studies on three widely used distance measures with real-world datasets. The experimental results demonstrate that our proposed method outperforms sampling-based solution (without error guarantees) by up to three orders of magnitude.</p></div></div></div></div></div><div class="publication-item" id="zeng2022ICME_Rank-based" data-authors="Xucheng Wang, Dan Zeng, Qijun Zhao, Shuiwang Li." data-category="CCF-B"><div class="row publication-section"><div class="col-md-2 publication-img-wrapper"><div class="publication-img-container"><img src="img/loading.gif" data-src="img/publication/zeng2022ICME_Rank-based.png" alt="" class="lazy-load-img img-fluid"></div></div><div class="col-md-10"><p class="publication-title">Rank-based filter pruning for real-time UAV tracking</p><p class="publication-location">IEEE International Conference on Multimedia & Expo (ICME, CCF-B), 2022</p><p class="publication-author" title="Authors: Xucheng Wang, Dan Zeng, Qijun Zhao, Shuiwang Li.">Authors: Xucheng Wang, Dan Zeng, Qijun Zhao, Shuiwang Li.</p><div class="publication-btn-group"><span class="btn btn-disabled" title="Unavailable">abstract</span> <span class="btn btn-disabled" title="Unavailable">pdf</span></div><div class="abstract" id="abstract-0-9"><div></div></div></div></div></div></div></div></section><section class="py-5 bg-gray annual-publication"><div class="container py-2"><div style="width: 100%; display: flex; justify-content: center"><h2 class="section-title mb-4 lined">2021</h2></div><div class="publication-list"><div class="publication-item" id="tang2021SIGMOD_On" data-authors="Bo Tang, Kyriakos Mouratidis, Mingji Han." data-category="CCF-A"><div class="row publication-section"><div class="col-md-2 publication-img-wrapper"><div class="publication-img-container"><img src="img/loading.gif" data-src="img/publication/tang2021SIGMOD_On.png" alt="" class="lazy-load-img img-fluid"></div></div><div class="col-md-10"><p class="publication-title">On m-Impact Regions and Standing Top-k Influence Problems</p><p class="publication-location">ACM Conference on Management of Data (SIGMOD, CCF-A), 2021</p><p class="publication-author" title="Authors: Bo Tang, Kyriakos Mouratidis, Mingji Han.">Authors: Bo Tang, Kyriakos Mouratidis, Mingji Han.</p><div class="publication-btn-group"><a href="javascript:;" class="btn btn-primary my-2" onclick="toggleAbstractDisplayPubPage('abstract-1-0')">abstract</a> <a href="http://www.mysmu.edu/faculty/kyriakos/SIGMOD21-mIR.pdf" target="_blank" class="btn btn-primary">pdf</a></div><div class="abstract" id="abstract-1-0"><div><p>In this paper, we study the m-impact region problem (mIR). In a context where users look for available products with top-k queries, mIR identifies the part of the product space that attracts the most user attention. Specifically, mIR determines the kind of attribute values that lead a (new or existing) product to the top-k result for at least a fraction of the user population. mIR has several applications, ranging from effective marketing to product improvement. Importantly, it also leads to (exact and efficient) solutions for standing top-k impact problems, which were previously solved heuristically only, or whose current solutions face serious scalability limitations. We experiment, among others, on data mined from actual user reviews for real products, and demonstrate the practicality and efficiency of our algorithms, both for mIR and for standing top-k impact problems.</p></div></div></div></div></div><div class="publication-item" id="tang2021SIGMOD_Marrying" data-authors="Kyriakos Mouratidis, Keming Li, Bo Tang." data-category="CCF-A"><div class="row publication-section"><div class="col-md-2 publication-img-wrapper"><div class="publication-img-container"><img src="img/loading.gif" data-src="img/publication/tang2021SIGMOD_Marrying.png" alt="" class="lazy-load-img img-fluid"></div></div><div class="col-md-10"><p class="publication-title">Marrying Top-k with Skyline Queries: Relaxing the Preference Input while Producing Output of Controllable Size</p><p class="publication-location">ACM Conference on Management of Data (SIGMOD, CCF-A), 2021</p><p class="publication-author" title="Authors: Kyriakos Mouratidis, Keming Li, Bo Tang.">Authors: Kyriakos Mouratidis, Keming Li, Bo Tang.</p><div class="publication-btn-group"><a href="javascript:;" class="btn btn-primary my-2" onclick="toggleAbstractDisplayPubPage('abstract-1-1')">abstract</a> <a href="https://ink.library.smu.edu.sg/cgi/viewcontent.cgi?article=7226&context=sis_research" target="_blank" class="btn btn-primary">pdf</a></div><div class="abstract" id="abstract-1-1"><div><p>The two most common paradigms to identify records of preference in a multi-objective setting rely either on dominance (e.g., the skyline operator) or on a utility function defined over the records&#39; attributes (typically, using a top-k query). Despite their proliferation, each of them has its own palpable drawbacks. Motivated by these drawbacks, we identify three hard requirements for practical decision support, namely, personalization, controllable output size, and flexibility in preference specification. With these requirements as a guide, we combine elements from both paradigms and propose two new operators, ORD and ORU. We perform a qualitative study to demonstrate how they work, and evaluate their performance against adaptations of previous work that mimic their output.</p></div></div></div></div></div><div class="publication-item" id="tang2021SIGMOD_Vertex-Centric" data-authors="Yidi Wu, Yuntao Gui, Tatiana Jin, James Cheng, Xiao Yan, Peiqi Yin, Yufei Cai, Bo Tang, Fan Yu." data-category="CCF-A"><div class="row publication-section"><div class="col-md-2 publication-img-wrapper"><div class="publication-img-container"><img src="img/loading.gif" data-src="img/publication/tang2021SIGMOD_Vertex-Centric.png" alt="" class="lazy-load-img img-fluid"></div></div><div class="col-md-10"><p class="publication-title">Vertex-Centric Visual Programming for Graph Neural Networks</p><p class="publication-location">ACM Conference on Management of Data (SIGMOD, CCF-A), 2021</p><p class="publication-author" title="Authors: Yidi Wu, Yuntao Gui, Tatiana Jin, James Cheng, Xiao Yan, Peiqi Yin, Yufei Cai, Bo Tang, Fan Yu.">Authors: Yidi Wu, Yuntao Gui, Tatiana Jin, James Cheng, Xiao Yan, Peiqi Yin, Yufei Cai, Bo Tang, Fan Yu.</p><div class="publication-btn-group"><a href="javascript:;" class="btn btn-primary my-2" onclick="toggleAbstractDisplayPubPage('abstract-1-2')">abstract</a> <a href="https://www.cse.cuhk.edu.hk/~jcheng/papers/seastar_sigmod21.pdf" target="_blank" class="btn btn-primary">pdf</a></div><div class="abstract" id="abstract-1-2"><div><p>Graph neural networks (GNNs) have achieved remarkable performance in many graph analytics tasks such as node classification, link prediction and graph clustering. Existing GNN systems (e.g., PyG and DGL) adopt a tensor-centric programming model and train GNNs with manually written operators. Such design results in poor usability due to the large semantic gap between the API and the GNN models, and suffers from inferior efficiency because of high memory consumption and massive data movement. We demonstrateSeastar, a novel GNN training framework that adopts avertex-centric programming paradigm and supportsautomatic kernel generation, to simplify model development and improve training efficiency. We will (i) show how to express GNN models succinctly using a visual &quot;drag-and-drop&#39;&#39; interface or Seastar&#39;s vertex-centric python API; (ii) demonstrate the performance advantage of Seastar over existing GNN systems in convergence speed, training throughput and memory consumption; and (iii) illustrate how Seastar&#39;s optimizations (e.g., operator fusion and constant folding) improve training efficiency by profiling the run-time performance.</p></div></div></div></div></div><div class="publication-item" id="tang2021SIGIR_GAIPS" data-authors="Long Xiang, Xiao Yan, Lan Lu, Bo Tang." data-category="CCF-A"><div class="row publication-section"><div class="col-md-2 publication-img-wrapper"><div class="publication-img-container"><img src="img/loading.gif" data-src="img/publication/tang2021SIGIR_GAIPS.png" alt="" class="lazy-load-img img-fluid"></div></div><div class="col-md-10"><p class="publication-title">GAIPS: Accelerating Maximum Inner Product Search with GPU</p><p class="publication-location">International Conference on Research on Development in Information Retrieval (SIGIR, CCF-A), 2021</p><p class="publication-author" title="Authors: Long Xiang, Xiao Yan, Lan Lu, Bo Tang.">Authors: Long Xiang, Xiao Yan, Lan Lu, Bo Tang.</p><div class="publication-btn-group"><a href="javascript:;" class="btn btn-primary my-2" onclick="toggleAbstractDisplayPubPage('abstract-1-3')">abstract</a> <a href="https://dl.acm.org/doi/pdf/10.1145/3404835.3462997?casa_token=YnIsfvp7H88AAAAA:MmIFff-A6HoPSiBVcbAWTOGOuEZcEctP8UtOp9-zP7rzEbc7TVC5ZZvIIyr4A6m54RBcdaMj4fC8BiQ" target="_blank" class="btn btn-primary">pdf</a></div><div class="abstract" id="abstract-1-3"><div><p>In this paper, we propose the GAIPS framework for efficient maximum inner product search (MIPS) on GPU. We observe that a query can usually find a good lower bound of its maximum inner product in some large norm items that take up only a small portion of the dataset and utilize this fact to facilitate pruning. In addition, we design norm-based, residue-based and hash-based pruning techniques to avoid computation for items that are unlikely to be the MIPS results. Experiment results show that compared with FAISS, the state-of-the-art GPU-based similarity search framework, GAIPS has significantly shorter query processing time at the same recall.</p></div></div></div></div></div><div class="publication-item" id="tang2021ICDE_Fast" data-authors="Jiang Zeng, Leong Hou U, Xiao Yan, Mingji Han, Bo Tang." data-category="CCF-A"><div class="row publication-section"><div class="col-md-2 publication-img-wrapper"><div class="publication-img-container"><img src="img/loading.gif" data-src="img/publication/tang2021ICDE_Fast.png" alt="" class="lazy-load-img img-fluid"></div></div><div class="col-md-10"><p class="publication-title">Fast Core-based Top-k Frequent Pattern Discovery in Knowledge Graphs</p><p class="publication-location">IEEE International Conference on Data Engineering (ICDE, CCF-A), 2021</p><p class="publication-author" title="Authors: Jiang Zeng, Leong Hou U, Xiao Yan, Mingji Han, Bo Tang.">Authors: Jiang Zeng, Leong Hou U, Xiao Yan, Mingji Han, Bo Tang.</p><div class="publication-btn-group"><a href="javascript:;" class="btn btn-primary my-2" onclick="toggleAbstractDisplayPubPage('abstract-1-4')">abstract</a> <a href="https://ieeexplore.ieee.org/abstract/document/9458875" target="_blank" class="btn btn-primary">pdf</a></div><div class="abstract" id="abstract-1-4"><div><p>Knowledge graph is a way of structuring information in graph form, by representing entities as nodes and relationships between entities as edges. A knowledge graph often consists of large amount of facts in real-world which can be used in supporting many analytical tasks, e.g., exceptional facts discovery and fact check of claims. In this work, we study a core-based top-k frequent pattern discovery problem which is frequently used as a subroutine in analyzing knowledge graphs. The main challenge of the problem is search space of the candidate patterns is exponential to the combinations of the nodes and edges in the knowledge graph.To reduce the search space, we devise a novel computation framework FastPat with a suite of optimizations. First, we devise a meta-index, which can be used to avoid generating invalid candidate patterns. Second, we propose an upper bound of the frequency score (i.e., MNI) of the candidate pattern that prunes unqualified candidates earlier and prioritize the enumeration order of the patterns. Lastly, we design a join-based approach to compute the MNI of candidate pattern efficiently. We conduct extensive experimental studies in real-world datasets to verify the superiority of our proposed method over the baselines. We also demonstrate the utility of the discovered frequent patterns by a case study in COVID-19 knowledge graph.</p></div></div></div></div></div><div class="publication-item" id="tang2021ICDE_Towards" data-authors="Wentao Ning, Xiao Yan, Bo Tang." data-category="CCF-A"><div class="row publication-section"><div class="col-md-2 publication-img-wrapper"><div class="publication-img-container"><img src="img/loading.gif" data-src="img/publication/tang2021ICDE_Towards.png" alt="" class="lazy-load-img img-fluid"></div></div><div class="col-md-10"><p class="publication-title">Towards Efficient MaxBRNN Computation for Streaming Updates</p><p class="publication-location">IEEE International Conference on Data Engineering (ICDE, CCF-A), 2021</p><p class="publication-author" title="Authors: Wentao Ning, Xiao Yan, Bo Tang.">Authors: Wentao Ning, Xiao Yan, Bo Tang.</p><div class="publication-btn-group"><a href="javascript:;" class="btn btn-primary my-2" onclick="toggleAbstractDisplayPubPage('abstract-1-5')">abstract</a> <a href="https://ieeexplore.ieee.org/document/9458861" target="_blank" class="btn btn-primary">pdf</a></div><div class="abstract" id="abstract-1-5"><div><p>In this paper, we propose the streaming MaxBRNNquery, which finds the optimal region to deploy a new service point when both the service points and client points are under continuous updates. The streaming MaxBRNN query has many applications such as taxi scheduling, shared bike placements, etc. Existing MaxBRNN solutions are insufficient for streaming updates as they need to re-run from scratch even for a small amount of updates, resulting in long query processing time. To tackle this problem, we devise an efficient slot partitioning-based algorithm (SlotP), which divides the space into equal-sized slots and processes each slot independently. The superiorities of our proposal for streaming MaxBRNN query are: (i) an update affects only a smaller number of slots and works done on the unaffected slots can be reused directly; (ii) the influence value upper bound of each slot can be derived efficiently and accurately, which facilitate pruning many slots from expensive computation. We conducted extensive experiments to validate the performance of the SlotP algorithm. The results show that SlotP is 2-3 orders of magnitude faster than state-of-the-art baselines.</p></div></div></div></div></div><div class="publication-item" id="tang2021ICDE_GRAB" data-authors="Yi Lu, Peng Wang, Bo Tang, Shen Liang, Chen Wang, Wei Wang, Jianmin Wang." data-category="CCF-A"><div class="row publication-section"><div class="col-md-2 publication-img-wrapper"><div class="publication-img-container"><img src="img/loading.gif" data-src="img/publication/tang2021ICDE_GRAB.png" alt="" class="lazy-load-img img-fluid"></div></div><div class="col-md-10"><p class="publication-title">GRAB: Finding Time Series Natural Structures via A Novel Graph-Based Scheme</p><p class="publication-location">IEEE International Conference on Data Engineering (ICDE, CCF-A), 2021</p><p class="publication-author" title="Authors: Yi Lu, Peng Wang, Bo Tang, Shen Liang, Chen Wang, Wei Wang, Jianmin Wang.">Authors: Yi Lu, Peng Wang, Bo Tang, Shen Liang, Chen Wang, Wei Wang, Jianmin Wang.</p><div class="publication-btn-group"><a href="javascript:;" class="btn btn-primary my-2" onclick="toggleAbstractDisplayPubPage('abstract-1-6')">abstract</a> <a href="https://ieeexplore.ieee.org/abstract/document/9458905" target="_blank" class="btn btn-primary">pdf</a></div><div class="abstract" id="abstract-1-6"><div><p>In recent years, the widespread use of sensors has substantially stimulated researchers&#39; interest in time series data mining. Real-world time series often include natural structures. For example, a time series captured from a patient rehabilitation app can be divided into a series of movements, e.g., sitting, standing, and walking. Finding time series natural structures (i.e., latent semantic states) is one of the core subroutines in time series mining applications. However, this task is not trivial as it has two challenges: (1) how to determine the correct change points between consecutive segments, and (2) how to cluster segments into different states.In this paper, we propose a novel graph-based approach, GRAB, to discover time series natural structures. In particular, GRAB first partitions the time series into a set of non-overlapping fragments via the similarity between subsequences. Then, it constructs a fragment-based graph and employs a graph partition method to cluster the fragments into states. Extensive experiments on real-world datasets demonstrate the effectiveness and efficiency of our GRAB method. Specifically, GRAB finds high-quality latent states, and it outperforms state-of-the-art solutions by orders of magnitude.</p></div></div></div></div></div><div class="publication-item" id="yan2021IEEEJournalofBiomedicalandHealthInformatics_Combating" data-authors="Jiansheng Fang, Huazhu Fu, Dan Zeng, Xiao Yan, and Jiang Liu." data-category="JCR-Q1"><div class="row publication-section"><div class="col-md-2 publication-img-wrapper"><div class="publication-img-container"><img src="img/loading.gif" data-src="img/publication/yan2021IEEEJournalofBiomedicalandHealthInformatics_Combating.png" alt="" class="lazy-load-img img-fluid"></div></div><div class="col-md-10"><p class="publication-title">Combating Ambiguity for hash-code Learning in Medical Instance Retrieval</p><p class="publication-location">IEEE Journal of Biomedical and Health Informatics (JCR-Q1), 2021</p><p class="publication-author" title="Authors: Jiansheng Fang, Huazhu Fu, Dan Zeng, Xiao Yan, and Jiang Liu.">Authors: Jiansheng Fang, Huazhu Fu, Dan Zeng, Xiao Yan, and Jiang Liu.</p><div class="publication-btn-group"><a href="javascript:;" class="btn btn-primary my-2" onclick="toggleAbstractDisplayPubPage('abstract-1-7')">abstract</a> <a href="https://arxiv.org/pdf/2105.08872" target="_blank" class="btn btn-primary">pdf</a></div><div class="abstract" id="abstract-1-7"><div><p>When encountering a dubious diagnostic case, medical instance retrieval can help radiologists make evidence-based diagnoses by finding images containing instances similar to a query case from a large image database. The similarity between the query case and retrieved similar cases is determined by visual features extracted from pathologically abnormal regions. However, the manifestation of these regions often lacks specificity, i.e., different diseases can have the same manifestation, and different manifestations may occur at different stages of the same disease. To combat the manifestation ambiguity in medical instance retrieval, we propose a novel deep framework called Y-Net, encoding images into compact hash-codes generated from convolutional features by feature aggregation. Y-Net can learn highly discriminative convolutional features by unifying the pixel-wise segmentation loss and classification loss. The segmentation loss allows exploring subtle spatial differences for good spatial-discriminability while the classification loss utilizes class-aware semantic information for good semantic-separability. As a result, Y-Net can enhance the visual features in pathologically abnormal regions and suppress the disturbing of the background during model training, which could effectively embed discriminative features into the hash-codes in the retrieval stage. Extensive experiments on two medical image datasets demonstrate that Y-Net can alleviate the ambiguity of pathologically abnormal regions and its retrieval performance outperforms the state-of-the-art method by an average of 9.27% on the returned list of 10.</p></div></div></div></div></div><div class="publication-item" id="tang2021CSUR_Spatial" data-authors="Huan Li, Hua Lu, Christian S. Jensen, Bo Tang, Muhammad Aamir Cheema." data-category="JCR-Q1"><div class="row publication-section"><div class="col-md-2 publication-img-wrapper"><div class="publication-img-container"><img src="img/loading.gif" data-src="img/publication/tang2021CSUR_Spatial.png" alt="" class="lazy-load-img img-fluid"></div></div><div class="col-md-10"><p class="publication-title">Spatial Data Quality in the Internet of Things: Management, Exploitation, and Prospects</p><p class="publication-location">ACM Computing Surveys (CSUR, JCR-Q1), 2021</p><p class="publication-author" title="Authors: Huan Li, Hua Lu, Christian S. Jensen, Bo Tang, Muhammad Aamir Cheema.">Authors: Huan Li, Hua Lu, Christian S. Jensen, Bo Tang, Muhammad Aamir Cheema.</p><div class="publication-btn-group"><a href="javascript:;" class="btn btn-primary my-2" onclick="toggleAbstractDisplayPubPage('abstract-1-8')">abstract</a> <a href="https://www.researchgate.net/publication/358278395_Spatial_Data_Quality_in_the_Internet_of_Things_Management_Exploitation_and_Prospects" target="_blank" class="btn btn-primary">pdf</a></div><div class="abstract" id="abstract-1-8"><div><p>With the continued deployment of the Internet of Things (IoT), increasing volumes of devices are being deployed that emit massive spatially referenced data. Due in part to the dynamic, decentralized, and heterogeneous architecture of the IoT, the varying and often low quality of spatial IoT data (SID) presents challenges to applications built on top of this data. This survey aims to provide unique insight to practitioners who intend to develop IoT-enabled applications and to researchers who wish to conduct research that relates to data quality in the IoT setting. The survey offers an inventory analysis of major data quality dimensions in SID and covers significant data characteristics and associated quality considerations. The survey summarizes data quality related technologies from both task and technique perspectives. Organizing the technologies from the task perspective, it covers recent progress in SID quality management, encompassing location refinement, uncertainty elimination, outlier removal, fault correction, data integration, and data reduction; and it covers low-quality SID exploitation, encompassing querying, analysis, and decision-making techniques. Finally, the survey covers emerging trends and open issues concerning the quality of SID.</p></div></div></div></div></div><div class="publication-item" id="tang2021IS_RCELF" data-authors="Shiqi Zhang, Xinxun Zeng, Bo Tang." data-category="CCF-B"><div class="row publication-section"><div class="col-md-2 publication-img-wrapper"><div class="publication-img-container"><img src="img/loading.gif" data-src="img/publication/tang2021IS_RCELF.png" alt="" class="lazy-load-img img-fluid"></div></div><div class="col-md-10"><p class="publication-title">RCELF: A Residual-based Approach for Influence Maximization Problem</p><p class="publication-location">Information Systems (IS, CCF-B), 2021</p><p class="publication-author" title="Authors: Shiqi Zhang, Xinxun Zeng, Bo Tang.">Authors: Shiqi Zhang, Xinxun Zeng, Bo Tang.</p><div class="publication-btn-group"><a href="javascript:;" class="btn btn-primary my-2" onclick="toggleAbstractDisplayPubPage('abstract-1-9')">abstract</a> <a href="https://arxiv.org/pdf/2001.06630.pdf" target="_blank" class="btn btn-primary">pdf</a></div><div class="abstract" id="abstract-1-9"><div><p>Influence Maximization Problem (IMP) is selecting a seed set of nodes in the social network to spread the influence as widely as possible. It has many applications in multiple domains, e.g., viral marketing is frequently used for new products or activities advertisement. While it is a classic and well-studied problem in computer science, unfortunately, all those proposed techniques are compromising among time efficiency, memory consumption, and result quality. In this paper, we conduct comprehensive experimental studies on the state-of-the-art IMP approximate approaches to reveal the underlying trade-off strategies. Interestingly, we find that even the state-of-the-art approaches are impractical when the propagation probability of the network have been taken into consideration. With the findings of existing approaches, we propose a novel residual-based approach (i.e., RCELF) for IMP, which (i) overcomes the deficiencies of existing approximate approaches, and (ii) provides theoretical guaranteed results with high efficiency in both time- and space-perspectives. We demonstrate the superiority of our proposal by extensive experimental evaluation on real datasets.</p></div></div></div></div></div><div class="publication-item" id="tang2021GeoInformatica_On" data-authors="Bo Tang, Man Lung Yiu, Kyriakos Mouratidis, Jiahao Zhang, Kai Wang." data-category="CCF-B"><div class="row publication-section"><div class="col-md-2 publication-img-wrapper"><div class="publication-img-container"><img src="img/loading.gif" data-src="img/publication/tang2021GeoInformatica_On.png" alt="" class="lazy-load-img img-fluid"></div></div><div class="col-md-10"><p class="publication-title">On Discovering Motifs and Frequent Patterns in Spatial Trajectories with Discrete Frechet Distance</p><p class="publication-location">GeoInformatica (GeoInformatica, CCF-B), 2021</p><p class="publication-author" title="Authors: Bo Tang, Man Lung Yiu, Kyriakos Mouratidis, Jiahao Zhang, Kai Wang.">Authors: Bo Tang, Man Lung Yiu, Kyriakos Mouratidis, Jiahao Zhang, Kai Wang.</p><div class="publication-btn-group"><a href="javascript:;" class="btn btn-primary my-2" onclick="toggleAbstractDisplayPubPage('abstract-1-10')">abstract</a> <a href="https://ink.library.smu.edu.sg/cgi/viewcontent.cgi?article=7947&context=sis_research" target="_blank" class="btn btn-primary">pdf</a></div><div class="abstract" id="abstract-1-10"><div><p>The discrete Fréchet distance (DFD) captures perceptual and geographical similarity between two trajectories. It has been successfully adopted in a multitude of applications, such as signature and handwriting recognition, computer graphics, as well as geographic applications. Spatial applications, e.g., sports analysis, traffic analysis, etc. require discovering similar subtrajectories within a single trajectory or across multiple trajectories. In this paper, we adopt DFD as the similarity measure, and study two representative trajectory analysis problems, namely, motif discovery and frequent pattern discovery. Due to the time complexity of DFD, these tasks are computationally challenging. We address that challenge with a suite of novel lower bound functions and a grouping-based solution. Our techniques apply directly when the analysis tasks are defined within the same or across multiple trajectories. An extensive empirical study on real trajectory datasets reveals that our approaches are 3 orders of magnitude faster than baseline solutions.</p></div></div></div></div></div><div class="publication-item" id="yan2021EuroSys_DGCL" data-authors="Zhenkun Cai, Xiao Yan, Yidi Wu, Kaihao Ma, James Cheng, Fan Yu." data-category="CCF-B"><div class="row publication-section"><div class="col-md-2 publication-img-wrapper"><div class="publication-img-container"><img src="img/loading.gif" data-src="img/publication/yan2021EuroSys_DGCL.png" alt="" class="lazy-load-img img-fluid"></div></div><div class="col-md-10"><p class="publication-title">DGCL: An efficient communication library for distributed GNN training</p><p class="publication-location">European Conference on Computer Systems (EuroSys, CCF-B), 2021</p><p class="publication-author" title="Authors: Zhenkun Cai, Xiao Yan, Yidi Wu, Kaihao Ma, James Cheng, Fan Yu.">Authors: Zhenkun Cai, Xiao Yan, Yidi Wu, Kaihao Ma, James Cheng, Fan Yu.</p><div class="publication-btn-group"><a href="javascript:;" class="btn btn-primary my-2" onclick="toggleAbstractDisplayPubPage('abstract-1-11')">abstract</a> <a href="https://www.cse.cuhk.edu.hk/~jcheng/papers/dgcl_eurosys21.pdf" target="_blank" class="btn btn-primary">pdf</a></div><div class="abstract" id="abstract-1-11"><div><p>Graph neural networks (GNNs) have gained increasing popularity in many areas such as e-commerce, social networks and bio-informatics. Distributed GNN training is essential for handling large graphs and reducing the execution time. However, for distributed GNN training, a peer-to-peer communication strategy suffers from high communication overheads. Also, different GPUs require different remote vertex embeddings, which leads to an irregular communication pattern and renders existing communication planning solutions unsuitable. We propose the distributed graph communication library (DGCL) for efficient GNN training on multiple GPUs. At the heart of DGCL is a communication planning algorithm tailored for GNN training, which jointly considers fully utilizing fast links, fusing communication, avoiding contention and balancing loads on different links. DGCL can be easily adopted to extend existing single-GPU GNN systems to distributed training. We conducted extensive experiments on different datasets and network configurations to compare DGCL with alternative communication schemes. In our experiments, DGCL reduces the communication time of the peer-to-peer communication by 77.5% on average and the training time for an epoch by up to 47%.</p></div></div></div></div></div><div class="publication-item" id="yan2021IJCNN_Self-enhanced" data-authors="Han Yang, Xiao Yan, Xinyan Dai, Yongqiang Chen, James Cheng." data-category="CCF-C"><div class="row publication-section"><div class="col-md-2 publication-img-wrapper"><div class="publication-img-container"><img src="img/loading.gif" data-src="img/publication/yan2021IJCNN_Self-enhanced.png" alt="" class="lazy-load-img img-fluid"></div></div><div class="col-md-10"><p class="publication-title">Self-enhanced gnn: Improving graph neural networks using model outputs</p><p class="publication-location">International Joint Conference on Neural Networks (IJCNN, CCF-C), 2021</p><p class="publication-author" title="Authors: Han Yang, Xiao Yan, Xinyan Dai, Yongqiang Chen, James Cheng.">Authors: Han Yang, Xiao Yan, Xinyan Dai, Yongqiang Chen, James Cheng.</p><div class="publication-btn-group"><a href="javascript:;" class="btn btn-primary my-2" onclick="toggleAbstractDisplayPubPage('abstract-1-12')">abstract</a> <a href="https://arxiv.org/pdf/2002.07518.pdf" target="_blank" class="btn btn-primary">pdf</a></div><div class="abstract" id="abstract-1-12"><div><p>Graph neural networks (GNNs) have received much attention recently because of their excellent performance on graph-based tasks. However, existing research on GNNs focuses on designing more effective models without considering much about the quality of the input data. In this paper, we propose self-enhanced GNN (SEG), which improves the quality of the input data using the outputs of existing GNN models for better performance on semi-supervised node classification. As graph data consist of both topology and node labels, we improve input data quality from both perspectives. For topology, we observe that higher classification accuracy can be achieved when the ratio of inter-class edges (connecting nodes from different classes) is low and propose topology update to remove inter-class edges and add intra-class edges. For node labels, we propose training node augmentation, which enlarges the training set using the labels predicted by existing GNN models. SEG is a general framework that can be easily combined with existing GNN models. Experimental results validate that SEG consistently improves the performance of well-known GNN models such as GCN, GAT and SGC across different datasets.</p></div></div></div></div></div><div class="publication-item" id="zeng2021IETBiometrics_Occlusion" data-authors="Dan Zeng, Raymond Veldhuis, Luuk Spreeuwers, Richard Arendsen." data-category="JCR-Q3"><div class="row publication-section"><div class="col-md-2 publication-img-wrapper"><div class="publication-img-container"><img src="img/loading.gif" data-src="img/publication/zeng2021IETBiometrics_Occlusion.png" alt="" class="lazy-load-img img-fluid"></div></div><div class="col-md-10"><p class="publication-title">Occlusion Invariant Face Recognition Using Simultaneous Segmentation</p><p class="publication-location">IET Biometrics (JCR-Q3), 2021</p><p class="publication-author" title="Authors: Dan Zeng, Raymond Veldhuis, Luuk Spreeuwers, Richard Arendsen.">Authors: Dan Zeng, Raymond Veldhuis, Luuk Spreeuwers, Richard Arendsen.</p><div class="publication-btn-group"><a href="javascript:;" class="btn btn-primary my-2" onclick="toggleAbstractDisplayPubPage('abstract-1-13')">abstract</a> <a href="https://pdfs.semanticscholar.org/4168/30115219f94e07d2c159aad2da04da87c727.pdf" target="_blank" class="btn btn-primary">pdf</a></div><div class="abstract" id="abstract-1-13"><div><p>When using convolutional neural network (CNN) models to extract features of an occluded face, the occluded part will inevitably be embedded into the representation just as with other facial regions. Existing methods deal with occluded face recognition either by augmenting the training dataset with synthesized occluded faces or by segmenting occlusions first and subsequently recognize the face based on unoccluded facial regions. Instead, simultaneous occlusion segmentation and face recognition is developed to make the most of these correlated two tasks. This is inspired by the phenomenon that features corrupted by occlusion are traceable within a CNN trained to segment occluded parts in face images. Specifically, a simultaneous occlusion invariant deep network (SOIDN) is proposed that contains simultaneously operating face recognition and occlusion segmentation networks coupled with an occlusion mask adaptor module as their bridge to learn occlusion invariant features. The training of SOIDN is jointly supervised by classification and segmentation losses aiming to obtain (1) occlusion invariant features, (2) occlusion segmentation, and (3) an occlusion feature mask that weighs the reliability of features. Experiments on synthesized occluded dataset (e.g. LFW-occ) and real occluded face dataset (e.g. AR) demonstrate that SOIDN outperforms state of the art methods for face verification and identification.</p></div></div></div></div></div><div class="publication-item" id="zeng2021IETBiometrics_A" data-authors="Dan Zeng, Raymond Veldhuis, Luuk Spreeuwers." data-category="JCR-Q3"><div class="row publication-section"><div class="col-md-2 publication-img-wrapper"><div class="publication-img-container"><img src="img/loading.gif" data-src="img/publication/zeng2021IETBiometrics_A.png" alt="" class="lazy-load-img img-fluid"></div></div><div class="col-md-10"><p class="publication-title">A survey of face recognition techniques under occlusion</p><p class="publication-location">IET Biometrics (JCR-Q3), 2021</p><p class="publication-author" title="Authors: Dan Zeng, Raymond Veldhuis, Luuk Spreeuwers.">Authors: Dan Zeng, Raymond Veldhuis, Luuk Spreeuwers.</p><div class="publication-btn-group"><a href="javascript:;" class="btn btn-primary my-2" onclick="toggleAbstractDisplayPubPage('abstract-1-14')">abstract</a> <a href="https://arxiv.org/abs/2006.11366" target="_blank" class="btn btn-primary">pdf</a></div><div class="abstract" id="abstract-1-14"><div><p>The limited capacity to recognize faces under occlusions is a long-standing problem that presents a unique challenge for face recognition systems and even for humans. The problem regarding occlusion is less covered by research when compared to other challenges such as pose variation, different expressions, etc. Nevertheless, occluded face recognition is imperative to exploit the full potential of face recognition for real-world applications. In this paper, we restrict the scope to occluded face recognition. First, we explore what the occlusion problem is and what inherent difficulties can arise. As a part of this review, we introduce face detection under occlusion, a preliminary step in face recognition. Second, we present how existing face recognition methods cope with the occlusion problem and classify them into three categories, which are 1) occlusion robust feature extraction approaches, 2) occlusion aware face recognition approaches, and 3) occlusion recovery based face recognition approaches. Furthermore, we analyze the motivations, innovations, pros and cons, and the performance of representative approaches for comparison. Finally, future challenges and method trends of occluded face recognition are thoroughly discussed.</p></div></div></div></div></div><div class="publication-item" id="tang2021AdvancedTherapeutics_Drug" data-authors="Vincent K. C. Yan, Xiaodong Li, Xuxiao Ye, Min Ou, Ruibang Luo, Qingpeng Zhang, Bo Tang, Benjamin J. Cowling, Ivan Hung, Chung Wah Siu, Ian C. K. Wong, Reynold C. K. Cheng, Esther W. Chan." data-category="None"><div class="row publication-section"><div class="col-md-2 publication-img-wrapper"><div class="publication-img-container"><img src="img/loading.gif" data-src="img/publication/tang2021AdvancedTherapeutics_Drug.png" alt="" class="lazy-load-img img-fluid"></div></div><div class="col-md-10"><p class="publication-title">Drug repurposing for the treatment of COVID-19: a knowledge graph approach</p><p class="publication-location">Advanced Therapeutics (None), 2021</p><p class="publication-author" title="Authors: Vincent K. C. Yan, Xiaodong Li, Xuxiao Ye, Min Ou, Ruibang Luo, Qingpeng Zhang, Bo Tang, Benjamin J. Cowling, Ivan Hung, Chung Wah Siu, Ian C. K. Wong, Reynold C. K. Cheng, Esther W. Chan.">Authors: Vincent K. C. Yan, Xiaodong Li, Xuxiao Ye, Min Ou, Ruibang Luo, Qingpeng Zhang, Bo Tang, Benjamin J. Cowling, Ivan Hung, Chung Wah Siu, Ian C. K. Wong, Reynold C. K. Cheng, Esther W. Chan.</p><div class="publication-btn-group"><a href="javascript:;" class="btn btn-primary my-2" onclick="toggleAbstractDisplayPubPage('abstract-1-15')">abstract</a> <a href="https://onlinelibrary.wiley.com/doi/epdf/10.1002/adtp.202100055" target="_blank" class="btn btn-primary">pdf</a></div><div class="abstract" id="abstract-1-15"><div><p>Identifying effective drug treatments for COVID-19 is essential to reduce morbidity and mortality. Although a number of existing drugs have been proposed as potential COVID-19 treatments, effective data platforms and algorithms to prioritize drug candidates for evaluation and application of knowledge graph for drug repurposing have not been adequately explored. A COVID-19 knowledge graph by integrating 14 public bioinformatic databases containing information on drugs, genes, proteins, viruses, diseases, symptoms and their linkages is developed. An algorithm is developed to extract hidden linkages connecting drugs and COVID-19 from the knowledge graph, to generate and rank proposed drug candidates for repurposing as treatments for COVID-19 by integrating three scores for each drug: motif scores, knowledge graph PageRank scores, and knowledge graph embedding scores. The knowledge graph contains over 48 000 nodes and 13 37 000 edges, including 13 563 molecules in the DrugBank database. From the 5624 molecules identified by the motif-discovery algorithms, ranking results show that 112 drug molecules had the top 2% scores, of which 50 existing drugs with other indications approved by health administrations reported. The proposed drug candidates serve to generate hypotheses for future evaluation in clinical trials and observational studies.</p></div></div></div></div></div><div class="publication-item" id="tang2021ISPRAS_Spatial" data-authors="Xiang Ji, Xiao Yan, Kaijun Ren, Xiang Wang, Bo Tang." data-category="None"><div class="row publication-section"><div class="col-md-2 publication-img-wrapper"><div class="publication-img-container"><img src="img/loading.gif" data-src="img/publication/tang2021ISPRAS_Effective.png" alt="" class="lazy-load-img img-fluid"></div></div><div class="col-md-10"><p class="publication-title">Effective and Efficient Summarization for Non-hierarchical Data</p><p class="publication-location">Ivannikov Ispras Open Conference (ISPRAS, None), 2021</p><p class="publication-author" title="Authors: Xiang Ji, Xiao Yan, Kaijun Ren, Xiang Wang, Bo Tang.">Authors: Xiang Ji, Xiao Yan, Kaijun Ren, Xiang Wang, Bo Tang.</p><div class="publication-btn-group"><a href="javascript:;" class="btn btn-primary my-2" onclick="toggleAbstractDisplayPubPage('abstract-1-16')">abstract</a> <a href="https://ieeexplore.ieee.org/document/9708661" target="_blank" class="btn btn-primary">pdf</a></div><div class="abstract" id="abstract-1-16"><div><p>Data cubes are ubiquitous in domains including meteorology, sales and demography, and data summarization is an important service that can be used to compress data cubes and extract insight. Existing data summarization methods require presupposed hierarchies for the data cube dimensions, which do not exist for many types of data (e.g., rainfall and temperature). To tackle this problem, we first define the non-hierarchical data summarization (NHDS) problem, which covers data cube using rectangle regions with an error bound and minimizes the summary size. We then show that the NHDS problem is NP-hard and design the Mark and Select (MS) algorithm to find an approximate solution. MS first identifies the qualified rectangles and then selects among the rectangles to cover the data cube. To improve efficiency, we show that it suffices to find only some of the qualified rectangles, devise a procedure to avoid checking rectangles that do not contribute to the result, save unnecessary computation during rectangle selection using sub-modularity. We conducted experiments on both real and synthetic datasets. The results show that MS significantly outperforms a state-of-the-art baseline in summary size, error and running time.</p></div></div></div></div></div></div></div></section><section class="py-5 annual-publication"><div class="container py-2"><div style="width: 100%; display: flex; justify-content: center"><h2 class="section-title mb-4 lined">2020</h2></div><div class="publication-list"><div class="publication-item" id="zeng2020TPAMI_Joint" data-authors="Feng Liu, Qijun Zhao, Xiaoming Liu, Dan Zeng." data-category="CCF-A,JCR-Q1"><div class="row publication-section"><div class="col-md-2 publication-img-wrapper"><div class="publication-img-container"><img src="img/loading.gif" data-src="img/publication/zeng2020TPAMI_Joint.png" alt="" class="lazy-load-img img-fluid"></div></div><div class="col-md-10"><p class="publication-title">Joint Face Alignment and 3D Face Reconstruction with Application to Face Recognition</p><p class="publication-location">IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI, CCF-A, JCR-Q1), 2020</p><p class="publication-author" title="Authors: Feng Liu, Qijun Zhao, Xiaoming Liu, Dan Zeng.">Authors: Feng Liu, Qijun Zhao, Xiaoming Liu, Dan Zeng.</p><div class="publication-btn-group"><a href="javascript:;" class="btn btn-primary my-2" onclick="toggleAbstractDisplayPubPage('abstract-2-0')">abstract</a> <a href="https://www.researchgate.net/publication/308157586_Joint_Face_Alignment_and_3D_Face_Reconstruction" target="_blank" class="btn btn-primary">pdf</a></div><div class="abstract" id="abstract-2-0"><div><p>Face alignment and 3D face reconstruction are traditionally accomplished as separated tasks. By exploring the strong correlation between 2D landmarks and 3D shapes, in contrast, we propose a joint face alignment and 3D face reconstruction method to simultaneously solve these two problems for 2D face images of arbitrary poses and expressions. This method, based on a summation model of 3D faces and cascaded regression in 2D and 3D shape spaces, iteratively and alternately applies two cascaded regressors, one for updating 2D landmarks and the other for 3D shape. The 3D shape and the landmarks are correlated via a 3D-to-2D mapping matrix, which is updated in each iteration to refine the location and visibility of 2D landmarks. Unlike existing methods, the proposed method can fully automatically generate both pose-and-expression-normalized (PEN) and expressive 3D faces and localize both visible and invisible 2D landmarks. Based on the PEN 3D faces, we devise a method to enhance face recognition accuracy across poses and expressions. Both linear and nonlinear implementations of the proposed method are presented and evaluated in this paper. Extensive experiments show that the proposed method can achieve the state-of-the-art accuracy in both face alignment and 3D face reconstruction, and benefit face recognition owing to its reconstructed PEN 3D face.</p></div></div></div></div></div><div class="publication-item" id="tang2020VLDB_CheetahVIS" data-authors="Wentao Ning, Qiandong Tang, Yi Zhao, Chuan Yang, Xiaofeng Wang, Teng Wang, Haotian Liu, Chaozu Zhang, Zhiyuan Zhou, Qiaomu Shen, Bo Tang." data-category="CCF-A"><div class="row publication-section"><div class="col-md-2 publication-img-wrapper"><div class="publication-img-container"><img src="img/loading.gif" data-src="img/publication/tang2020VLDB_CheetahVIS.png" alt="" class="lazy-load-img img-fluid"></div></div><div class="col-md-10"><p class="publication-title">CheetahVIS: A Visual Analytical System for Large Urban Bus Data</p><p class="publication-location">International Conference on Very Large Data Bases (VLDB, CCF-A), 2020</p><p class="publication-author" title="Authors: Wentao Ning, Qiandong Tang, Yi Zhao, Chuan Yang, Xiaofeng Wang, Teng Wang, Haotian Liu, Chaozu Zhang, Zhiyuan Zhou, Qiaomu Shen, Bo Tang.">Authors: Wentao Ning, Qiandong Tang, Yi Zhao, Chuan Yang, Xiaofeng Wang, Teng Wang, Haotian Liu, Chaozu Zhang, Zhiyuan Zhou, Qiaomu Shen, Bo Tang.</p><div class="publication-btn-group"><a href="javascript:;" class="btn btn-primary my-2" onclick="toggleAbstractDisplayPubPage('abstract-2-1')">abstract</a> <a href="https://vldb.org/pvldb/vol13/p2805-ning.pdf" target="_blank" class="btn btn-primary">pdf</a></div><div class="abstract" id="abstract-2-1"><div><p>Recently, the spatial-temporal data of urban moving objects, e.g., cars and buses, are collected and widely used in urban trajectory exploratory analysis. Urban bus service is one of the most common public transportation services. Urban bus data analysis plays an important role in smart city applications. For example, data analysts in bus companies use the urban bus data to optimize their bus scheduling plan. Map services providers, e.g., Google map, Ten-cent map, take urban bus data into account to improve their service quality (e.g., broadcast road update instantly). Unlike urban moving cars or pedestrians, urban buses travel on known bus routes. The operating buses form the &quot;bus flows&quot; in a city. Efficient analyzing urban bus flows has many challenges, e.g., how to analyze the dynamics of given bus routes? How to help users to identify traffic flow of interests easily?</p><p>In this work, we present CheetahVIS, a visual analytical system for efficient massive urban bus data analysis. CheetahVIS builds upon Spark and provides a visual analytical platform for the stakeholders (e.g., city planner, data analysts in bus company) to conduct effective and efficient analytical tasks. In the demonstration, demo visitors will be invited to experience our proposed CheetahVIS system with different urban bus data analytical functions, e.g., bus route analysis, public bus flow overview, multiple region analysis, in a real-world dataset. We also will present a case study, which compares different regions in a city, to demonstrate the effectiveness of CheetahVIS.</p></div></div></div></div></div><div class="publication-item" id="tang2020AAAI_Draft" data-authors="Meng-Hsuan Yu, Juntao Li, Danyang Liu, Bo Tang, Haisong Zhang, Dongyan Zhao, Rui Yan." data-category="CCF-A"><div class="row publication-section"><div class="col-md-2 publication-img-wrapper"><div class="publication-img-container"><img src="img/loading.gif" data-src="img/publication/tang2020AAAI_Draft.png" alt="" class="lazy-load-img img-fluid"></div></div><div class="col-md-10"><p class="publication-title">Draft and Edit: Automatic Storytelling Through Multi-Pass Hierarchical Conditional Variational Autoencoder</p><p class="publication-location">AAAI Conference on Artificial Intelligence (AAAI, CCF-A), 2020</p><p class="publication-author" title="Authors: Meng-Hsuan Yu, Juntao Li, Danyang Liu, Bo Tang, Haisong Zhang, Dongyan Zhao, Rui Yan.">Authors: Meng-Hsuan Yu, Juntao Li, Danyang Liu, Bo Tang, Haisong Zhang, Dongyan Zhao, Rui Yan.</p><div class="publication-btn-group"><a href="javascript:;" class="btn btn-primary my-2" onclick="toggleAbstractDisplayPubPage('abstract-2-2')">abstract</a> <a href="https://ojs.aaai.org/index.php/AAAI/article/view/5538" target="_blank" class="btn btn-primary">pdf</a></div><div class="abstract" id="abstract-2-2"><div><p>Automatic Storytelling has consistently been a challenging area in the field of natural language processing. Despite considerable achievements have been made, the gap between automatically generated stories and human-written stories is still significant. Moreover, the limitations of existing automatic storytelling methods are obvious, e.g., the consistency of content, wording diversity. In this paper, we proposed a multi-pass hierarchical conditional variational autoencoder model to overcome the challenges and limitations in existing automatic storytelling models. While the conditional variational autoencoder (CVAE) model has been employed to generate diversified content, the hierarchical structure and multi-pass editing scheme allow the story to create more consistent content. We conduct extensive experiments on the ROCStories Dataset. The results verified the validity and effectiveness of our proposed model and yields substantial improvement over the existing state-of-the-art approaches.</p></div></div></div></div></div><div class="publication-item" id="tang2020IEEEBigData_Towards" data-authors="Chris Liu, Pengfei Zhang, Bo Tang, Hang Shen, Ziliang Lai, Eric Lo, Korris Chung." data-category="CCF-C"><div class="row publication-section"><div class="col-md-2 publication-img-wrapper"><div class="publication-img-container"><img src="img/loading.gif" data-src="img/publication/tang2020IEEEBigData_Towards.png" alt="" class="lazy-load-img img-fluid"></div></div><div class="col-md-10"><p class="publication-title">Towards Self-Tuning Parameter Servers</p><p class="publication-location">IEEE International Conference on Big Data (IEEE BigData, CCF-C), 2020</p><p class="publication-author" title="Authors: Chris Liu, Pengfei Zhang, Bo Tang, Hang Shen, Ziliang Lai, Eric Lo, Korris Chung.">Authors: Chris Liu, Pengfei Zhang, Bo Tang, Hang Shen, Ziliang Lai, Eric Lo, Korris Chung.</p><div class="publication-btn-group"><a href="javascript:;" class="btn btn-primary my-2" onclick="toggleAbstractDisplayPubPage('abstract-2-3')">abstract</a> <a href="https://arxiv.org/pdf/1810.02935.pdf" target="_blank" class="btn btn-primary">pdf</a></div><div class="abstract" id="abstract-2-3"><div><p>Recent years, many applications have been driven advances by the use of Machine Learning (ML). Nowadays, it is common to see industrial-strength machine learning jobs that involve millions of model parameters, terabytes of training data, and weeks of training. Good efficiency, i.e., fast completion time of running a specific ML training job, therefore, is a key feature of a successful ML system. While the completion time of a long-running ML job is determined by the time required to reach model convergence, that is also largely influenced by the values of various system settings. In this paper, we contribute techniques towards building self-tuning parameter servers. Parameter Server (PS) is a popular system architecture for large-scale machine learning systems; and by self-tuning we mean while a long-running ML job is iteratively training the expert-suggested model, the system is also iteratively learning which system setting is more efficient for that job and applies it online. Our techniques are general enough to various PS-style ML systems. Experiments on TensorFlow show that our techniques can reduce the completion times of a variety of long-running TensorFlow jobs from 1.4× to 18×.</p></div></div></div></div></div><div class="publication-item" id="tang2020DI2KG_CheetahER" data-authors=", Wendi Luan, Haotian Liu, Bo Tang." data-category="None"><div class="row publication-section"><div class="col-md-2 publication-img-wrapper"><div class="publication-img-container"><img src="img/loading.gif" data-src="img/publication/tang2020DI2KG_CheetahER.png" alt="" class="lazy-load-img img-fluid"></div></div><div class="col-md-10"><p class="publication-title">CheetahER: A Fast Entity Resolution System for Heterogeneous Camera Data</p><p class="publication-location">International Workshop on Challenges and Experiences from Data Integration to Knowledge Graphs (DI2KG, None), 2020</p><p class="publication-author" title="Authors: , Wendi Luan, Haotian Liu, Bo Tang.">Authors: , Wendi Luan, Haotian Liu, Bo Tang.</p><div class="publication-btn-group"><a href="javascript:;" class="btn btn-primary my-2" onclick="toggleAbstractDisplayPubPage('abstract-2-4')">abstract</a> <a href="https://acm.sustech.edu.cn/btang/pub/DI2KG_CheetahER.pdf" target="_blank" class="btn btn-primary">pdf</a></div><div class="abstract" id="abstract-2-4"><div><p>The SIGMOD Programming Contest 2020 raises a real-world entity resolution problem, which requires to identify product specifica?tions from multiple e-commerce websites that represent the same real-world cameras. Entity resolution has been extensively studied and the general solution framework consists of two phases: blocking and matching. Most existing works focus on the matching phase, which trains (complex) models on large volumes of data and uses the models to decide whether a pair of descriptions refers to the same real-world object. However, training a high-quality model is difficult for the SIGMOD contest because there is only a limited amount of labeled data and the product specifications can be dirty and incomplete. In this paper, we propose CheetahER, an accurate and efficient entity resolution system. Different from existing works, we focus on improving the effectiveness of the blocking phase, which is overlooked in both academia researches and industry systems, and propose a two-phase blocking framework to group the product specifications according to brand and model. The pre-processing and data cleaning procedures are also carefully designed to improve data quality. CheetahER ranks the 1st in accuracy among 53 teams and completes the task within 20 seconds. Even though some de?signs of CheetahER are specialized for camera datasets, its novel two-phase blocking framework and operators (i.e., merging and splitting) may generalize to other entity resolution tasks.</p></div></div></div></div></div></div></div></section><section class="py-5 bg-gray annual-publication"><div class="container py-2"><div style="width: 100%; display: flex; justify-content: center"><h2 class="section-title mb-4 lined">2019</h2></div><div class="publication-list"><div class="publication-item" id="tang2019PVLDB_Creating" data-authors="Bo Tang, Kyriakos Mouratidis, Man Lung Yiu, Zhenyu Chen." data-category="CCF-A"><div class="row publication-section"><div class="col-md-2 publication-img-wrapper"><div class="publication-img-container"><img src="img/loading.gif" data-src="img/publication/tang2019PVLDB_Creating.png" alt="" class="lazy-load-img img-fluid"></div></div><div class="col-md-10"><p class="publication-title">Creating Top Ranking Options in the Continuous Option and Preference Space</p><p class="publication-location">Proceedings of the VLDB Endowment (PVLDB, CCF-A), 2019</p><p class="publication-author" title="Authors: Bo Tang, Kyriakos Mouratidis, Man Lung Yiu, Zhenyu Chen.">Authors: Bo Tang, Kyriakos Mouratidis, Man Lung Yiu, Zhenyu Chen.</p><div class="publication-btn-group"><a href="javascript:;" class="btn btn-primary my-2" onclick="toggleAbstractDisplayPubPage('abstract-3-0')">abstract</a> <a href="https://www.researchgate.net/publication/335052712_Creating_top_ranking_options_in_the_continuous_option_and_preference_space" target="_blank" class="btn btn-primary">pdf</a></div><div class="abstract" id="abstract-3-0"><div><p>Top-k queries are extensively used to retrieve the k most relevant options (e.g., products, services, accommodation alternatives, etc) based on a weighted scoring function that captures user preferences. In this paper, we take the viewpoint of a business owner who plans to introduce a new option to the market, with a certain type of clientele in mind. Given a target region in the consumer spectrum, we determine what attribute values the new option should have, so that it ranks among the top-k for any user in that region. Our methodology can also be used to improve an existing option, at the minimum modification cost, so that it ranks consistently high for an intended type of customers. This is the first work on competitive option placement where no distinct user(s) are targeted, but a general clientele type, i.e., a continuum of possible preferences. Here also lies our main challenge (and contribution), i.e., dealing with the interplay between two continuous spaces: the targeted region in the preference spectrum, and the option domain (where the new option will be placed). At the core of our methodology lies a novel and powerful interlinking between the two spaces. Our algorithms offer exact answers in practical response times, even for the largest of the standard benchmark datasets.</p></div></div></div></div></div><div class="publication-item" id="tang2019SIGIR_Accelerating" data-authors="Long Xiang, Bo Tang, Chuan Yang." data-category="CCF-A"><div class="row publication-section"><div class="col-md-2 publication-img-wrapper"><div class="publication-img-container"><img src="img/loading.gif" data-src="img/publication/tang2019SIGIR_Accelerating.png" alt="" class="lazy-load-img img-fluid"></div></div><div class="col-md-10"><p class="publication-title">Accelerating Exact Inner Product Retrieval by CPU-GPU System</p><p class="publication-location">International Conference on Research on Development in Information Retrieval (SIGIR, CCF-A), 2019</p><p class="publication-author" title="Authors: Long Xiang, Bo Tang, Chuan Yang.">Authors: Long Xiang, Bo Tang, Chuan Yang.</p><div class="publication-btn-group"><a href="javascript:;" class="btn btn-primary my-2" onclick="toggleAbstractDisplayPubPage('abstract-3-1')">abstract</a> <a href="https://acm.sustech.edu.cn/btang/pub/SIGIR19_GPUIR.pdf" target="_blank" class="btn btn-primary">pdf</a></div><div class="abstract" id="abstract-3-1"><div><p>Recommender systems are widely used in many applications, e.g., social network, e-commerce. Inner product retrieval IPR is the core subroutine in Matrix Factorization (MF) based recommender systems. It consists of two phases: i) inner product computation and ii) top-k items retrieval. The performance bottleneck of existing solutions is inner product computation phase. Exploiting Graphics Processing Units (GPUs) to accelerate the computation intensive workloads is the gold standard in data mining and machine learning communities. However, it is not trivial to apply CPU-GPU systems to boost the performance of IPR solutions due to the nature complex of the IPR problem. In this work, we analyze the time cost of each phase in IPR solutions at first. Second, we exploit the characteristics of CPU-GPU systems to improve performance. Specifically, the computation tasks of IPR solution are heterogeneously processed in CPU-GPU systems. Third, we demonstrate the efficiency of our proposal on four standard real datasets.</p></div></div></div></div></div><div class="publication-item" id="tang2019ICDE_Vaite" data-authors="Chuang Yang, Yilan Zhang, Bo Tang, Min Zhu." data-category="CCF-A"><div class="row publication-section"><div class="col-md-2 publication-img-wrapper"><div class="publication-img-container"><img src="img/loading.gif" data-src="img/publication/tang2019ICDE_Vaite.png" alt="" class="lazy-load-img img-fluid"></div></div><div class="col-md-10"><p class="publication-title">Vaite: a Visualization-Assisted Interactive Big Urban Trajectory Data Exploration System</p><p class="publication-location">IEEE International Conference on Data Engineering (ICDE, CCF-A), 2019</p><p class="publication-author" title="Authors: Chuang Yang, Yilan Zhang, Bo Tang, Min Zhu.">Authors: Chuang Yang, Yilan Zhang, Bo Tang, Min Zhu.</p><div class="publication-btn-group"><a href="javascript:;" class="btn btn-primary my-2" onclick="toggleAbstractDisplayPubPage('abstract-3-2')">abstract</a> <a href="https://acm.sustech.edu.cn/btang/pub/ICDE19_Vaite.pdf" target="_blank" class="btn btn-primary">pdf</a></div><div class="abstract" id="abstract-3-2"><div><p>Big urban trajectory exploration extracts insights from trajectories. It has many smart-city applications, e.g., traffic jam detection, taxi movement pattern analysis. The challenges of big urban trajectory data exploration are: (i) the data analysts probably do not have experience or knowledge on issuing their analysis tasks by SQL-like queries or analysis operations accurately; and (ii) big urban trajectory data is naturally complex, e.g., unpredictability, interrelation, etc. In this work, we architect and implement a visualization-assisted big urban trajectory data exploration system (Vaiet) to address these chanllenges. Vaiet includes three layers, from data collection to results visualization. We devise novel visualization views in Vaiet to support interactive big urban trajectory exploratory analysis. We demonstrate the effectiveness of Vaiet by the real world applications.</p></div></div></div></div></div><div class="publication-item" id="tang2019AAAI_Insufficient" data-authors="Juntao Li, Lisong Qiu, Bo Tang, Dongmin Chen, Dongyan Zhao, Rui Yan." data-category="CCF-A"><div class="row publication-section"><div class="col-md-2 publication-img-wrapper"><div class="publication-img-container"><img src="img/loading.gif" data-src="img/publication/tang2019AAAI_Insufficient.png" alt="" class="lazy-load-img img-fluid"></div></div><div class="col-md-10"><p class="publication-title">Insufficient Data Can Also Rock！Learning to Converse Using Smaller Data with Augmentation</p><p class="publication-location">AAAI Conference on Artificial Intelligence (AAAI, CCF-A), 2019</p><p class="publication-author" title="Authors: Juntao Li, Lisong Qiu, Bo Tang, Dongmin Chen, Dongyan Zhao, Rui Yan.">Authors: Juntao Li, Lisong Qiu, Bo Tang, Dongmin Chen, Dongyan Zhao, Rui Yan.</p><div class="publication-btn-group"><a href="javascript:;" class="btn btn-primary my-2" onclick="toggleAbstractDisplayPubPage('abstract-3-3')">abstract</a> <a href="https://www.researchgate.net/publication/335398508_Insufficient_Data_Can_Also_Rock_Learning_to_Converse_Using_Smaller_Data_with_Augmentation" target="_blank" class="btn btn-primary">pdf</a></div><div class="abstract" id="abstract-3-3"><div><p>Recent successes of open-domain dialogue generation mainly rely on the advances of deep neural networks. The effectiveness of deep neural network models depends on the amount of training data. As it is laboursome and expensive to acquire a huge amount of data in most scenarios, how to effectively utilize existing data is the crux of this issue. In this paper, we use data augmentation techniques to improve the performance of neural dialogue models on the condition of insufficient data. Specifically, we propose a novel generative model to augment existing data, where the conditional variational autoencoder (CVAE) is employed as the generator to output more training data with diversified expressions. To improve the correlation of each augmented training pair, we design a discriminator with adversarial training to supervise the augmentation process. Moreover, we thoroughly investigate various data augmentation schemes for neural dialogue system with generative models, both GAN and CVAE. Experimental results on two open corpora, Weibo and Twitter, demonstrate the superiority of our proposed data augmentation model.</p></div></div></div></div></div><div class="publication-item" id="tang2019AAAI_Find" data-authors="Mingyue Shang, Zhenxin Fu, Hongzhi Yin, Bo Tang, Dongyan Zhao and Rui Yan." data-category="CCF-A"><div class="row publication-section"><div class="col-md-2 publication-img-wrapper"><div class="publication-img-container"><img src="img/loading.gif" data-src="img/publication/tang2019AAAI_Find.png" alt="" class="lazy-load-img img-fluid"></div></div><div class="col-md-10"><p class="publication-title">Find a Reasonable Ending for Stories: Does Logic Relation Help the Story Cloze Test?</p><p class="publication-location">AAAI Conference on Artificial Intelligence (AAAI, CCF-A), 2019</p><p class="publication-author" title="Authors: Mingyue Shang, Zhenxin Fu, Hongzhi Yin, Bo Tang, Dongyan Zhao and Rui Yan.">Authors: Mingyue Shang, Zhenxin Fu, Hongzhi Yin, Bo Tang, Dongyan Zhao and Rui Yan.</p><div class="publication-btn-group"><a href="javascript:;" class="btn btn-primary my-2" onclick="toggleAbstractDisplayPubPage('abstract-3-4')">abstract</a> <a href="https://arxiv.org/pdf/1812.05411.pdf" target="_blank" class="btn btn-primary">pdf</a></div><div class="abstract" id="abstract-3-4"><div><p>Natural language understanding is a challenging problem that covers a wide range of tasks. While previous methods generally train each task separately, we consider combining the cross-task features to enhance the task performance. In this paper, we incorporate the logic information with the help of the Natural Language Inference (NLI) task to the Story Cloze Test (SCT). Previous work on SCT considered various semantic information, such as sentiment and topic, but lack the logic information between sentences which is an essential element of stories. Thus we propose to extract the logic information during the course of the story to improve the understanding of the whole story. The logic information is modeled with the help of the NLI task. Experimental results prove the strength of the logic information.</p></div></div></div></div></div><div class="publication-item" id="tang2019EDBT_Fast" data-authors="Jiahao Zhang, Bo Tang, Man Lung Yiu." data-category="CCF-B"><div class="row publication-section"><div class="col-md-2 publication-img-wrapper"><div class="publication-img-container"><img src="img/loading.gif" data-src="img/publication/tang2019EDBT_Fast.png" alt="" class="lazy-load-img img-fluid"></div></div><div class="col-md-10"><p class="publication-title">Fast Trajectory Range Query with Discrete Frechet Distance</p><p class="publication-location">International Conference on Extending DB Technology (EDBT, CCF-B), 2019</p><p class="publication-author" title="Authors: Jiahao Zhang, Bo Tang, Man Lung Yiu.">Authors: Jiahao Zhang, Bo Tang, Man Lung Yiu.</p><div class="publication-btn-group"><a href="javascript:;" class="btn btn-primary my-2" onclick="toggleAbstractDisplayPubPage('abstract-3-5')">abstract</a> <a href="https://www4.comp.polyu.edu.hk/~csmlyiu/conf/EDBT19_dfd.pdf" target="_blank" class="btn btn-primary">pdf</a></div><div class="abstract" id="abstract-3-5"><div><p>The discrete Fréchet distance (DFD) is widely used to measure the similarity between two trajectories. Trajectory range query has been extensively studied in trajectory analytical applications, e.g., outlier detection, movement pattern analysis. With the discrete Fréchet distance, the above applications are computation bound rather than disk I/O bound. In this work, we propose new lower and upper bound functions to speedup the evaluation of trajectory range queries. Experimental studies on three real datasets demonstrate the superiority of our proposal.</p></div></div></div></div></div><div class="publication-item" id="tang2019SIGMODChina_Mining" data-authors="Jian Zeng, Bo Tang." data-category="None"><div class="row publication-section"><div class="col-md-2 publication-img-wrapper"><div class="publication-img-container"><img src="img/loading.gif" data-src="img/publication/tang2019SIGMODChina_Mining.png" alt="" class="lazy-load-img img-fluid"></div></div><div class="col-md-10"><p class="publication-title">Mining Heterogeneous Urban Data for Retail Store Placement</p><p class="publication-location">Proceedings of the ACM Turing Celebration Conference-China (SIGMOD China, None), 2019</p><p class="publication-author" title="Authors: Jian Zeng, Bo Tang.">Authors: Jian Zeng, Bo Tang.</p><div class="publication-btn-group"><a href="javascript:;" class="btn btn-primary my-2" onclick="toggleAbstractDisplayPubPage('abstract-3-6')">abstract</a> <a href="https://dl.acm.org/doi/10.1145/3321408.3322834" target="_blank" class="btn btn-primary">pdf</a></div><div class="abstract" id="abstract-3-6"><div><p>Retail store placement problem has been extensively studied in both academic and industry as it decides the retail success of a business. Existing methods exploited either consumer studies (e.g., consultant-based solutions) or geographic features (e.g., points of interests) to settle it. However, due to the limitations of data sources (i.e., costly in time and labor), none of these methods could provide an accurate and timely solution.</p><p>In this paper, we rethink retail store placement problem by mining heterogeneous urban data. In particular, unlike existing works which only used geographic features or consumer studies solely, we extract three categories of features (i.e., human movement features, commercial area features and geographic features) from heterogeneous urban data, and integrate them into various machine learning models to predict the popularity of a prospective retail store in the candidate area. We conduct a case study with real data in Shenzhen to demonstrate the predictive power of our proposal.</p></div></div></div></div></div></div></div></section><section class="py-5 annual-publication"><div class="container py-2"><div style="width: 100%; display: flex; justify-content: center"><h2 class="section-title mb-4 lined">2018</h2></div><div class="publication-list"><div class="publication-item" id="tang2018PVLDB_Exact" data-authors="Kyriakos Mouratidis, Bo Tang." data-category="CCF-A"><div class="row publication-section"><div class="col-md-2 publication-img-wrapper"><div class="publication-img-container"><img src="img/loading.gif" data-src="img/publication/tang2018PVLDB_Exact.png" alt="" class="lazy-load-img img-fluid"></div></div><div class="col-md-10"><p class="publication-title">Exact Processing of Uncertain Top-k Queries in Multi-criteria Settings</p><p class="publication-location">Proceedings of the VLDB Endowment (PVLDB, CCF-A), 2018</p><p class="publication-author" title="Authors: Kyriakos Mouratidis, Bo Tang.">Authors: Kyriakos Mouratidis, Bo Tang.</p><div class="publication-btn-group"><a href="javascript:;" class="btn btn-primary my-2" onclick="toggleAbstractDisplayPubPage('abstract-4-0')">abstract</a> <a href="https://acm.sustech.edu.cn/btang/pub/PVLDB18_UTK.pdf" target="_blank" class="btn btn-primary">pdf</a></div><div class="abstract" id="abstract-4-0"><div><p>Traditional rank-aware processing assumes a dataset that contains available options to cover a specific need (e.g., restaurants, hotels, etc) and users who browse that dataset via top-k queries with linear scoring functions, i.e., by ranking the options according to the weighted sum of their attributes, for a set of given weights. In practice, however, user preferences (weights) may only be estimated with bounded accuracy, or may be inherently uncertain due to the inability of a human user to specify exact weight values with absolute accuracy. Motivated by this, we introduce the uncertain top-k query (UTK). Given uncertain preferences, that is, an approximate description of the weight values, the UTK query reports all options that may belong to the top-k set. A second version of the problem additionally reports the exact top-k set for each of the possible weight settings. We develop a scalable processing framework for both UTK versions, and demonstrate its efficiency using standard benchmark datasets.</p></div></div></div></div></div><div class="publication-item" id="tang2018IEEEBigData_Deriving" data-authors="Bo Tang, Chuan Yang, Long Xiang, Jian Zeng." data-category="CCF-C"><div class="row publication-section"><div class="col-md-2 publication-img-wrapper"><div class="publication-img-container"><img src="img/loading.gif" data-src="img/publication/tang2018IEEEBigData_Deriving.png" alt="" class="lazy-load-img img-fluid"></div></div><div class="col-md-10"><p class="publication-title">Deriving Real-time City Crowd Flows by Heterogeneous Big Urban Data</p><p class="publication-location">IEEE International Conference on Big Data (IEEE BigData, CCF-C), 2018</p><p class="publication-author" title="Authors: Bo Tang, Chuan Yang, Long Xiang, Jian Zeng.">Authors: Bo Tang, Chuan Yang, Long Xiang, Jian Zeng.</p><div class="publication-btn-group"><a href="javascript:;" class="btn btn-primary my-2" onclick="toggleAbstractDisplayPubPage('abstract-4-1')">abstract</a> <a href="https://ieeexplore.ieee.org/document/8622370" target="_blank" class="btn btn-primary">pdf</a></div><div class="abstract" id="abstract-4-1"><div><p>Real-time city crowd flows are extremely important for facility placement, transportation management, and public safety. In this paper, we show how to derive the real-time city crowd flows with heterogeneous urban data. Unlike existing prediction-based approaches, our proposal does not rely on the training data and learning models. We propose a computation framework for it by exploiting the massive heterogeneous urban data, which includes both immutable data (i.e., bus and subway stations, commercial-based regions) and mutable data (i.e., real-time taxi, bus, subway transactions and trajectories). In addition, our solution provides accurate and timely city crowd-flows.To provide a practical solution for it, we first partition the city into commercial-based regions with geographic information (e.g., road network, administrative regions). Then, we devise three major components (i.e., urban data fusion model, heterogeneous urban data integration model, and effective crowd flows computation model) in the computation framework to process massive heterogeneous urban data effectively and derive city crowd flows accurately. Finally, we conduct extensive experiments to demonstrate the effectiveness and efficiency of our proposed solution with real heterogeneous urban data in Shenzhen, China.</p></div></div></div></div></div><div class="publication-item" id="tang2018APWeb-WAIM_Efficient" data-authors="Wentao Wang, Bo Tang, Min Zhu." data-category="CCF-C"><div class="row publication-section"><div class="col-md-2 publication-img-wrapper"><div class="publication-img-container"><img src="img/loading.gif" data-src="img/publication/tang2018APWeb-WAIM_Efficient.png" alt="" class="lazy-load-img img-fluid"></div></div><div class="col-md-10"><p class="publication-title">Efficient Longest Streak Discovery in Multidimensional Sequence Data</p><p class="publication-location">Proceedings of the APWeb-WAIM joint Conference (APWeb-WAIM, CCF-C), 2018</p><p class="publication-author" title="Authors: Wentao Wang, Bo Tang, Min Zhu.">Authors: Wentao Wang, Bo Tang, Min Zhu.</p><div class="publication-btn-group"><a href="javascript:;" class="btn btn-primary my-2" onclick="toggleAbstractDisplayPubPage('abstract-4-2')">abstract</a> <a href="https://acm.sustech.edu.cn/btang/pub/WAIM18_lstreak.pdf" target="_blank" class="btn btn-primary">pdf</a></div><div class="abstract" id="abstract-4-2"><div><p>This paper studies the problem of discovering longest streak in multidimensional sequence dataset. Given a multidimensional sequence dataset, the contextual longest streak is the longest consecutive tuples in a context subspace which match with a specific measure constraint. It has various applications in social network analysis, computational journalism, etc. The challenges of the longest streak discovery problem are (i) huge search space, and (ii) non-monotonicity property of streak lengths. In this paper, we propose a novel computation framework with a suite of optimization techniques for it. Our solutions outperform the baseline solution by two orders of magnitude in both real and synthetic datasets. In addition, we validate the effectiveness of our proposal by a real-world case study.</p></div></div></div></div></div><div class="publication-item" id="tang2018IJBDI_A" data-authors="Julie Yixuan Zhu, Bo Tang, Victor Li." data-category="None"><div class="row publication-section"><div class="col-md-2 publication-img-wrapper"><div class="publication-img-container"><img src="img/loading.gif" data-src="img/publication/tang2018IJBDI_A.png" alt="" class="lazy-load-img img-fluid"></div></div><div class="col-md-10"><p class="publication-title">A Five-layer Architecture for Big Data Processing and Analytics</p><p class="publication-location">International Journal of Big Data Intelligenc (IJBDI, None), 2018</p><p class="publication-author" title="Authors: Julie Yixuan Zhu, Bo Tang, Victor Li.">Authors: Julie Yixuan Zhu, Bo Tang, Victor Li.</p><div class="publication-btn-group"><a href="javascript:;" class="btn btn-primary my-2" onclick="toggleAbstractDisplayPubPage('abstract-4-3')">abstract</a> <a href="https://acm.sustech.edu.cn/btang/pub/IJBDI19_arch.pdf" target="_blank" class="btn btn-primary">pdf</a></div><div class="abstract" id="abstract-4-3"><div><p>Big data technologies have attracted much attention in recent years. The academia and industry have reached a consensus, that is, the ultimate goal of big data is about transforming ‘big data’ to ‘real value’. In this article, we discuss how to achieve this goal and propose five-layer architecture for big data processing and analytics (BDPA), including a collection layer, a storage layer, a processing layer, an analytics layer, and an application layer. The five-layer architecture targets to set up a de facto standard for current BDPA solutions, to collect, manage, process, and analyse the vast volume of both static data and online data streams, and make valuable decisions for all types of industries. Functionalities and challenges of the five-layers are illustrated, with the most recent technologies and solutions discussed accordingly. We conclude with the requirements for the future BDPA solutions, which may serve as a foundation for the future big data ecosystem.</p></div></div></div></div></div></div></div></section><section class="py-5 bg-gray annual-publication"><div class="container py-2"><div style="width: 100%; display: flex; justify-content: center"><h2 class="section-title mb-4 lined">2017</h2></div><div class="publication-list"><div class="publication-item" id="tang2017SIGMOD_Determining" data-authors="Bo Tang, Kyriakos Mouratidis, Man Lung Yiu." data-category="CCF-A"><div class="row publication-section"><div class="col-md-2 publication-img-wrapper"><div class="publication-img-container"><img src="img/loading.gif" data-src="img/publication/tang2017SIGMOD_Determining.png" alt="" class="lazy-load-img img-fluid"></div></div><div class="col-md-10"><p class="publication-title">Determining the Impact Regions of Competing Options in Preference Space</p><p class="publication-location">Proceedings of the ACM Conference on Management of Data (SIGMOD, CCF-A), 2017</p><p class="publication-author" title="Authors: Bo Tang, Kyriakos Mouratidis, Man Lung Yiu.">Authors: Bo Tang, Kyriakos Mouratidis, Man Lung Yiu.</p><div class="publication-btn-group"><a href="javascript:;" class="btn btn-primary my-2" onclick="toggleAbstractDisplayPubPage('abstract-5-0')">abstract</a> <a href="https://acm.sustech.edu.cn/btang/pub/SIGMOD17_kSPR.pdf" target="_blank" class="btn btn-primary">pdf</a></div><div class="abstract" id="abstract-5-0"><div><p>In rank-aware processing, user preferences are typically represented by a numeric weight per data attribute, collectively forming a weight vector. The score of an option (data record) is defined as the weighted sum of its individual attributes. The highest-scoring options across a set of alternatives (dataset) are shortlisted for the user as the recommended ones. In that setting, the user input is a vector (equivalently, a point) in a d-dimensional preference space, where d is the number of data attributes. In this paper we study the problem of determining in which regions of the preference space the weight vector should lie so that a given option (focal record) is among the top-k score-wise. In effect, these regions capture all possible user profiles for which the focal record is highly preferable, and are therefore essential in market impact analysis, potential customer identification, profile-based marketing, targeted advertising, etc. We refer to our problem as k-Shortlist Preference Region identification (kSPR), and exploit its computational geometric nature to develop a framework for its efficient (and exact) processing. Using real and synthetic benchmarks, we show that our most optimized algorithm outperforms by three orders of magnitude a competitor we constructed from previous work on a different problem.</p></div></div></div></div></div><div class="publication-item" id="tang2017SIGMOD_Extracting" data-authors="Bo Tang, Shi Han, Man Lung Yiu, Rui Ding, Dongmei Zhang." data-category="CCF-A"><div class="row publication-section"><div class="col-md-2 publication-img-wrapper"><div class="publication-img-container"><img src="img/loading.gif" data-src="img/publication/tang2017SIGMOD_Extracting.png" alt="" class="lazy-load-img img-fluid"></div></div><div class="col-md-10"><p class="publication-title">Extracting Top-K Insights from Multi-dimensional Data</p><p class="publication-location">Proceedings of the ACM Conference on Management of Data (SIGMOD, CCF-A), 2017</p><p class="publication-author" title="Authors: Bo Tang, Shi Han, Man Lung Yiu, Rui Ding, Dongmei Zhang.">Authors: Bo Tang, Shi Han, Man Lung Yiu, Rui Ding, Dongmei Zhang.</p><div class="publication-btn-group"><a href="javascript:;" class="btn btn-primary my-2" onclick="toggleAbstractDisplayPubPage('abstract-5-1')">abstract</a> <a href="https://acm.sustech.edu.cn/btang/pub/SIGMOD17_insight.pdf" target="_blank" class="btn btn-primary">pdf</a></div><div class="abstract" id="abstract-5-1"><div><p>OLAP tools have been extensively used by enterprises to make better and faster decisions. Nevertheless, they require users to specify group-by attributes and know precisely what they are looking for. This paper takes the first attempt towards automatically extracting top-k insights from multi-dimensional data. This is useful not only for non-expert users, but also reduces the manual effort of data analysts. In particular, we propose the concept of insight which captures interesting observation derived from aggregation results in multiple steps (e.g., rank by a dimension, compute the percentage of measure by a dimension). An example insight is: ``Brand B&#39;s rank (across brands) falls along the year, in terms of the increase in sales&#39;&#39;. Our problem is to compute the top-k insights by a score function. It poses challenges on (i) the effectiveness of the result and (ii) the efficiency of computation. We propose a meaningful scoring function for insights to address (i). Then, we contribute a computation framework for top-k insights, together with a suite of optimization techniques (i.e., pruning, ordering, specialized cube, and computation sharing) to address (ii). Our experimental study on both real data and synthetic data verifies the effectiveness and efficiency of our proposed solution.</p></div></div></div></div></div><div class="publication-item" id="tang2017ICDE_Exploit" data-authors="Bo Tang, Man Lung Yiu, Kien A Hua." data-category="CCF-A"><div class="row publication-section"><div class="col-md-2 publication-img-wrapper"><div class="publication-img-container"><img src="img/loading.gif" data-src="img/publication/tang2017ICDE_Exploit.png" alt="" class="lazy-load-img img-fluid"></div></div><div class="col-md-10"><p class="publication-title">Exploit Every Bit: Effective Caching for High-Dimensional Nearest Neighbor Search (Extended Abstract)</p><p class="publication-location">International Conference on Data Engineering (ICDE, CCF-A), 2017</p><p class="publication-author" title="Authors: Bo Tang, Man Lung Yiu, Kien A Hua.">Authors: Bo Tang, Man Lung Yiu, Kien A Hua.</p><div class="publication-btn-group"><a href="javascript:;" class="btn btn-primary my-2" onclick="toggleAbstractDisplayPubPage('abstract-5-2')">abstract</a> <a href="https://acm.sustech.edu.cn/btang/pub/ICDE17_bitcache.pdf" target="_blank" class="btn btn-primary">pdf</a></div><div class="abstract" id="abstract-5-2"><div><p>High-dimensional k nearest neighbor (kNN) search has a wide range of applications in multimedia information retrieval. Existing disk-based k NN search methods incur significant I/O costs in the candidate refinement phase. In this paper, we propose to cache compact approximate representations of data points in main memory in order to reduce the candidate refinement time during k NN search. This problem raises two challenging issues: (i) which is the most effective encoding scheme for data points to support k NN search? and (ii) what is the optimal number of bits for encoding a data point? For (i), we formulate and solve a novel histogram optimization problem that decides the most effective encoding scheme. For (ii), we develop a cost model for automatically tuning the optimal number of bits for encoding points. In addition, our approach is generic and applicable to exact / approximate k NN search methods. Extensive experimental results on real datasets demonstrate that our proposal can accelerate the candidate refinement time of k NN search by at least an order of magnitude.</p></div></div></div></div></div><div class="publication-item" id="tang2017EDBT_Efficient" data-authors="Bo Tang, Man Lung Yiu, Kyriakos Mouratidis, Kai Wang." data-category="CCF-B"><div class="row publication-section"><div class="col-md-2 publication-img-wrapper"><div class="publication-img-container"><img src="img/loading.gif" data-src="img/publication/tang2017EDBT_Efficient.png" alt="" class="lazy-load-img img-fluid"></div></div><div class="col-md-10"><p class="publication-title">Efficient Motif Discovery in Spatial Trajectories Using Discrete Fréchet Distance</p><p class="publication-location">Proceedings of the International Conference on Extending Database Technology (EDBT, CCF-B), 2017</p><p class="publication-author" title="Authors: Bo Tang, Man Lung Yiu, Kyriakos Mouratidis, Kai Wang.">Authors: Bo Tang, Man Lung Yiu, Kyriakos Mouratidis, Kai Wang.</p><div class="publication-btn-group"><a href="javascript:;" class="btn btn-primary my-2" onclick="toggleAbstractDisplayPubPage('abstract-5-3')">abstract</a> <a href="https://acm.sustech.edu.cn/btang/pub/EDBT17_trajmotif.pdf" target="_blank" class="btn btn-primary">pdf</a></div><div class="abstract" id="abstract-5-3"><div><p>The discrete Fréchet distance (DFD) captures perceptual and geographical similarity between discrete trajectories. It has been successfully adopted in a multitude of applications, such as signature and handwriting recognition, computer graphics, as well as geographic applications. Spatial applications, e.g., sports analysis, traffic analysis, etc. require discovering the pair of most similar subtrajectories, be them parts of the same or of different input trajectories. The identified pair of subtrajectories is called a motif. The adoption of DFD as the similarity measure in motif discovery, although semantically ideal, is hindered by the high computational complexity of DFD calculation. In this paper, we propose a suite of novel lower bound functions and a grouping-based solution with multi-level pruning in order to compute motifs with DFD efficiently. Our techniques apply directly to motif discovery within the same or between different trajectories. An extensive empirical study on three real trajectory datasets reveals that our approach is 3 orders of magnitude faster than a baseline solution.</p></div></div></div></div></div><div class="publication-item" id="tang2017EDBT_Fast" data-authors="Bo Tang#, Yuhong Li#, Leong Hou U, Man Lung Yiu, Zhiguo Gong. # indicates equal contribution" data-category="CCF-B"><div class="row publication-section"><div class="col-md-2 publication-img-wrapper"><div class="publication-img-container"><img src="img/loading.gif" data-src="img/publication/tang2017EDBT_Fast.png" alt="" class="lazy-load-img img-fluid"></div></div><div class="col-md-10"><p class="publication-title">Fast Subsequence Search on Time Series Data</p><p class="publication-location">Proceedings of the International Conference on Extending Database Technology (EDBT, CCF-B), 2017</p><p class="publication-author" title="Authors: Bo Tang#, Yuhong Li#, Leong Hou U, Man Lung Yiu, Zhiguo Gong. # indicates equal contribution">Authors: Bo Tang#, Yuhong Li#, Leong Hou U, Man Lung Yiu, Zhiguo Gong. # indicates equal contribution</p><div class="publication-btn-group"><a href="javascript:;" class="btn btn-primary my-2" onclick="toggleAbstractDisplayPubPage('abstract-5-4')">abstract</a> <a href="https://acm.sustech.edu.cn/btang/pub/EDBT17_fastsubseq.pdf" target="_blank" class="btn btn-primary">pdf</a></div><div class="abstract" id="abstract-5-4"><div><p>Many applications on time series data require solving the subsequence search problem, which has been extensively studied in the database and data mining communities. These applications are computation bound rather than disk I/O bound. In this work, we further propose effective and cheap lower-bounds to reduce the computation cost of the subsequence search problem. Experimental studies show that the proposed lower-bounds can boost the performance of the state-of-the-art solution by up to an order of magnitude.</p></div></div></div></div></div></div></div></section><section class="py-5 annual-publication"><div class="container py-2"><div style="width: 100%; display: flex; justify-content: center"><h2 class="section-title mb-4 lined">2016</h2></div><div class="publication-list"><div class="publication-item" id="tang2016ADMSPVLDB_Exploit" data-authors="Bo Tang, Man Lung Yiu, Yuhong Li, Leong Hou U." data-category="CCF-A"><div class="row publication-section"><div class="col-md-2 publication-img-wrapper"><div class="publication-img-container"><img src="img/loading.gif" data-src="img/publication/tang2016ADMSPVLDB_Exploit.png" alt="" class="lazy-load-img img-fluid"></div></div><div class="col-md-10"><p class="publication-title">Exploit Every Cycle: Vectorized Time Series Algorithms on Modern Commodity CPUs</p><p class="publication-location">Proceedings of International Workshop on Accelerating Analytics and Data Management Systems Using Modern Processor and Storage Architectures (ADMS), affiliated with PVLDB (CCF-A), 2016</p><p class="publication-author" title="Authors: Bo Tang, Man Lung Yiu, Yuhong Li, Leong Hou U.">Authors: Bo Tang, Man Lung Yiu, Yuhong Li, Leong Hou U.</p><div class="publication-btn-group"><a href="javascript:;" class="btn btn-primary my-2" onclick="toggleAbstractDisplayPubPage('abstract-6-0')">abstract</a> <a href="https://acm.sustech.edu.cn/btang/pub/ADMS16_SIMDTS.pdf" target="_blank" class="btn btn-primary">pdf</a></div><div class="abstract" id="abstract-6-0"><div><p>Many time series algorithms reduce the computation cost by pruning unpromising candidates with lower-bound distance functions. In this paper, we focus on an orthogonal research direction that further boosts the performance by unlocking the potentials of modern commodity CPUs. First, we conduct a performance profiling on existing algorithms to understand where does time go. Second, we design vectorized implementations for lower-bound and distance functions that can enjoy characteristics (e.g., data parallelism, caching, branch prediction) provided by CPU. Third, our vectorized methods are general and applicable to many time series problems such as subsequence search, motif discovery and kNN classification. Our experimental study on real datasets shows that our proposal can achieve up to 6 times of speedup.</p></div></div></div></div></div><div class="publication-item" id="tang2016TKDE_Exploit" data-authors="Bo Tang, Man Lung Yiu, Kien A Hua." data-category="CCF-A"><div class="row publication-section"><div class="col-md-2 publication-img-wrapper"><div class="publication-img-container"><img src="img/loading.gif" data-src="img/publication/tang2016TKDE_Exploit.png" alt="" class="lazy-load-img img-fluid"></div></div><div class="col-md-10"><p class="publication-title">Exploit Every Bit: Effective Caching for High-Dimensional Nearest Neighbor Search</p><p class="publication-location">IEEE Transactions on Knowledge and Data Engineering (TKDE, CCF-A), 2016</p><p class="publication-author" title="Authors: Bo Tang, Man Lung Yiu, Kien A Hua.">Authors: Bo Tang, Man Lung Yiu, Kien A Hua.</p><div class="publication-btn-group"><a href="javascript:;" class="btn btn-primary my-2" onclick="toggleAbstractDisplayPubPage('abstract-6-1')">abstract</a> <a href="https://acm.sustech.edu.cn/btang/pub/TKDE16_bitcache.pdf" target="_blank" class="btn btn-primary">pdf</a></div><div class="abstract" id="abstract-6-1"><div><p>High-dimensional k nearest neighbor (kNN) search has a wide range of applications in multimedia information retrieval. Existing disk-based k NN search methods incur significant I/O costs in the candidate refinement phase. In this paper, we propose to cache compact approximate representations of data points in main memory in order to reduce the candidate refinement time during k NN search. This problem raises two challenging issues: (i) which is the most effective encoding scheme for data points to support k NN search? and (ii) what is the optimal number of bits for encoding a data point? For (i), we formulate and solve a novel histogram optimization problem that decides the most effective encoding scheme. For (ii), we develop a cost model for automatically tuning the optimal number of bits for encoding points. In addition, our approach is generic and applicable to exact / approximate k NN search methods. Extensive experimental results on real datasets demonstrate that our proposal can accelerate the candidate refinement time of k NN search by at least an order of magnitude.</p></div></div></div></div></div></div></div></section><section class="py-5 bg-gray annual-publication"><div class="container py-2"><div style="width: 100%; display: flex; justify-content: center"><h2 class="section-title mb-4 lined">2015</h2></div><div class="publication-list"><div class="publication-item" id="tang2015ICED_Diversified" data-authors="Chuanfei Xu, Bo Tang, Man Lung Yiu." data-category="CCF-A"><div class="row publication-section"><div class="col-md-2 publication-img-wrapper"><div class="publication-img-container"><img src="img/loading.gif" data-src="img/publication/tang2015ICED_Diversified.png" alt="" class="lazy-load-img img-fluid"></div></div><div class="col-md-10"><p class="publication-title">Diversified caching for replicated web search engines</p><p class="publication-location">IEEE International Conference on Data Engineering (ICED, CCF-A), 2015</p><p class="publication-author" title="Authors: Chuanfei Xu, Bo Tang, Man Lung Yiu.">Authors: Chuanfei Xu, Bo Tang, Man Lung Yiu.</p><div class="publication-btn-group"><a href="javascript:;" class="btn btn-primary my-2" onclick="toggleAbstractDisplayPubPage('abstract-7-0')">abstract</a> <a href="https://acm.sustech.edu.cn/btang/pub/ICDE15_divcache.pdf" target="_blank" class="btn btn-primary">pdf</a></div><div class="abstract" id="abstract-7-0"><div><p>Commercial web search engines adopt parallel and replicated architecture in order to support high query throughput. In this paper, we investigate the effect of caching on the throughput in such a setting. A simple scheme, called uniform caching, would replicate the cache content to all servers. Unfortunately, it does not exploit the variations among queries, thus wasting memory space on caching the same cache content redundantly on multiple servers. To tackle this limitation, we propose a diversified caching problem, which aims to diversify the types of queries served by different servers, and maximize the sharing of terms among queries assigned to the same server. We show that it is NP-hard to find the optimal diversified caching scheme, and identify intuitive properties to seek good solutions. Then we present a framework with a suite of techniques and heuristics for diversified caching. Finally, we evaluate the proposed solution with competitors by using a real dataset and a real query log.</p></div></div></div></div></div></div></div></section><footer style="background: #eee;" class="footer"><div class="bg-dark py-2"><div class="container"><div class="row"><div class="" style="width: 100%; display: flex; align-items: center;justify-content: space-around;flex-wrap: wrap;"><p class="small text-gray mb-0 mx-3" style="white-space: nowrap;">© 2022 All rights reserved</p><p class="small text-gray mb-0 mx-3" style="white-space: nowrap;">Template by <a href="https://bootstrapious.com" class="text-gray">Bootstrapious.com</a></p></div></div></div></div></footer><script src="vendor/jquery/jquery.min.js"></script><script src="vendor/bootstrap/js/bootstrap.bundle.min.js"></script><script src="vendor/lightbox2/js/lightbox.min.js"></script><script src="vendor/leaflet/leaflet.js"></script><link rel="stylesheet" href="https://use.fontawesome.com/releases/v5.7.1/css/all.css" integrity="sha384-fnmOCqbTlWIlj8LyTjo7mOUStjsKC4pOpQbqyi7RrhN7udi9RwhKkMHpvLbHG9Sr" crossorigin="anonymous"></body></html>