<html><head><meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<title>CVPR 2019 Summarization Tutorial</title>
<style type="text/css" media="screen">
html, body, div, span, applet, object, iframe, h1, h2, h3, h4, h5, h6, p, blockquote, pre, a, abbr, acronym, address, big, cite, code, del, dfn, em, font, img, ins, kbd, q, s, samp, small, strike, strong, sub, tt, var, dl, dt, dd, ol, ul, li, fieldset, form, label, legend, table, caption, tbody, tfoot, thead, tr, th, td {
border: 0pt none;
font-family: inherit;
font-size: 100%;
font-style: inherit;
font-weight: inherit;
margin: 0pt;
outline-color: invert;
outline-style: none;
outline-width: 0pt;
padding: 0pt;
vertical-align: baseline;
}
a {
color: #1772d0;
text-decoration:none;
}
a:focus, a:hover {
color: #f09228;
text-decoration:none;
}
a.paper {
font-weight: bold;
font-size: 12pt;
}
b.paper {
font-weight: bold;
font-size: 12pt;
}
* {
margin: 0pt;
padding: 0pt;
}
body {
position: relative;
margin: 3em auto 2em auto;
width: 800px;
font-family: Lato, Verdana, Helvetica, sans-serif;
font-size: 14px;
background: #eee;
}
h2 {
font-family: Lato, Verdana, Helvetica, sans-serif;
font-size: 18pt;
font-weight: 700;
}
h3 {
font-family: Lato, Verdana, Helvetica, sans-serif;
font-size: 16px;
font-weight: 700;
}
strong {
font-family: Lato, Verdana, Helvetica, sans-serif;
font-size: 13px;
}
ul {
list-style: circle;
}
img {
border: none;
}
li {
padding-bottom: 0.5em;
margin-left: 1.4em;
}
strong, b {
font-weight:bold;
}
em, i {
font-style:italic;
}
div.section {
clear: both;
margin-bottom: 1.5em;
background: #eee;
}
div.spanner {
clear: both;
}
div.paper {
clear: both;
margin-top: 0.5em;
margin-bottom: 1em;
border: 1px solid #ddd;
background: #fff;
padding: 1em 1em 1em 1em;
}
div.paper div {
padding-left: 200px;
}
img.paper {
margin-bottom: 0.5em;
float: left;
width: 170px;
}
div.dissert {
clear: both;
margin-top: 0.5em;
margin-bottom: 1em;
border: 1px solid #ddd;
background: #fff;
padding: 1em 1em 1em 1em;
}
div.dissert div {
padding-left: 150px;
}
img.dissert {
margin-bottom: 0.5em;
float: left;
width: 140px;
}
span.blurb {
font-style:italic;
display:block;
margin-top:0.75em;
margin-bottom:0.5em;
}
pre, code {
font-family: 'Lucida Console', 'Andale Mono', 'Courier', monospace;
margin: 1em 0;
padding: 0;
}
div.paper pre {
font-size: 0.9em;
}
</style>
<script type="text/javascript" async="" src="./page_files/ga.js"></script><script type="text/javascript">
var _gaq = _gaq || [];
_gaq.push(['_setAccount', 'UA-7953909-1']);
_gaq.push(['_trackPageview']);
(function() {
var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
})();
</script>
<script type="text/javascript" src="./page_files/hidebib.js"></script>
<link href="./page_files/css" rel="stylesheet" type="text/css">
<!--<link href='http://fonts.googleapis.com/css?family=Open+Sans+Condensed:300' rel='stylesheet' type='text/css'>-->
<!--<link href='http://fonts.googleapis.com/css?family=Open+Sans' rel='stylesheet' type='text/css'>-->
<!--<link href='http://fonts.googleapis.com/css?family=Yanone+Kaffeesatz' rel='stylesheet' type='text/css'>-->
<style id="style-1-cropbar-clipper">/* Copyright 2014 Evernote Corporation. All rights reserved. */
.en-markup-crop-options {
top: 18px !important;
left: 50% !important;
margin-left: -100px !important;
width: 200px !important;
border: 2px rgba(255,255,255,.38) solid !important;
border-radius: 4px !important;
}
.en-markup-crop-options div div:first-of-type {
margin-left: 0px !important;
}
</style></head>
<body>
<div style="margin-bottom: 1em; border: 1px solid #ddd; background-color: #fff; padding: 1em;">
<div style="padding-left: 1em; vertical-align: top;">
<span style="font-size: 18pt; line-height: 130%;">Recent Advances in Visual Data Summarization</span><br><br>
<span>CVPR 2019 Tutorial<br><br>
Location: Room 203C
<br><br>
Sunday, June 16, 2019, 1:30 pm - 5:30 pm
</span>
</div>
</div>
<div class="section">
<h2> Organizers </h2>
<div class="paper">
<ul>
<li><a href="https://rpand002.github.io/">Rameswar Panda</a>: Research Staff Member, IBM Research AI, MIT-IBM Watson AI Lab.</li>
<li><a href="http://www.ccs.neu.edu/home/eelhami/">Ehsan Elhamifar</a>: Assistant Professor, Northeastern University.</li>
<li><a href="https://gyglim.github.io/me/index.html">Michael Gygli</a>: Research Scientist, Google Research, Zurich.</li>
<li><a href="http://boqinggong.info/">Boqing Gong</a>: Research Scientist, Google Research, Seattle.</li>
</ul>
</div>
</div>
<div class="section">
<h2>Tutorial Description</h2>
<div class="paper">
Visual data summarization has many applications, ranging from computer vision (video summarization, video
captioning, active visual learning, object detection, image/video segmentation, etc.) to data mining
(recommender systems, web-data analysis, etc.). As a consequence, several important research topics
have recently emerged: (i) online and distributed summarization, (ii) weakly supervised summarization,
(iii) summarization of sequential data, and (iv) summarization in camera networks, particularly for
surveillance tasks. The objective of this tutorial is to present the audience with a unifying
perspective on the visual data summarization problem from both theoretical and application standpoints,
and to discuss, motivate, and encourage future research that will spur disruptive
progress in the emerging field of summarization.
</div>
</div>
<div class="section">
<h2>Schedule</h2>
<div class="paper">
<ul>
<li>1:30 pm - 1:50 pm: Introduction and Overview: Rameswar Panda
<a href="https://www.dropbox.com/s/bhhhjqmcqth2211/Introduction_Rameswar.pdf?dl=0">[Slides]</a></li>
<li>1:50 pm - 2:40 pm: Dynamic Subset Selection: Ehsan Elhamifar
<a href="https://www.dropbox.com/s/qi0rwxmo4ylu49r/EhsanElhamifar_CVPR19Tutorial.pdf?dl=0">[Slides]</a></li>
<li>2:40 pm - 3:30 pm: Video Summarization Objectives: Michael Gygli
<a href="https://docs.google.com/presentation/d/1fabTAz48AIAYTX6Vyqc_yhWTa_XVRm1FVU7JhhW_Nts/edit#slide=id.g579e21329d_0_546">[Slides]</a></li>
<li>3:30 pm - 3:50 pm: Break</li>
<li>3:50 pm - 4:40 pm: Weakly Supervised Video Summarization: Rameswar Panda
<a href="https://www.dropbox.com/s/77acu0iofytk4dd/Weak_Summ_Rameswar.pdf?dl=0">[Slides]</a></li>
<li>4:40 pm - 5:30 pm: Sequential Determinantal Point Processes: Boqing Gong
<a href="http://boqinggong.info/assets/seqDPP-tutorial.pdf">[Slides]</a></li>
</ul>
</div>
</div>
<div class="section">
<h2>Abstracts</h2>
<div class="paper">
<strong>Dynamic Subset Selection: Algorithms, Theory and Applications to Procedure Learning (Ehsan):</strong>
Subset selection is the task of finding a small subset of the most informative points in a large dataset; it has many applications in computer vision,
including image and video summarization, data clustering, active visual learning, and classifier selection. Despite many studies,
the majority of existing methods ignore dynamics and important structured dependencies among points, and require many pairs of datasets and ground-truth summaries
for effective learning. In this talk, I will introduce a new class of utility functions that generalizes the well-known facility-location objective to structured settings,
develop scalable algorithms based on extensions of submodular maximization, and discuss the theoretical underpinnings of these methods.
I will then turn to an important application in vision, understanding procedural videos, where tools from dynamic subset selection significantly
improve performance over existing methods. I will also discuss incorporating high-level reasoning into these methods by learning from humans with a small amount of annotation.<br><br>
<strong>Video Summarization Objectives during Training and Testing (Michael):</strong>
The omnipresence of video recording devices has created the need to automatically edit and summarize videos.
Video summarization is a challenging task, however: what characterizes a good summary depends on the context and on the task one aims to execute.
This makes obtaining ground truth for summarization datasets, and evaluating summarization methods, difficult.
As a result, datasets are typically small, and it is unclear how well existing evaluation metrics align with human preferences.
In this talk, I will first discuss existing datasets and how recent works compensate for the lack of large-scale datasets.
Approaches include pre-training on other tasks and using weakly supervised or unsupervised training objectives; others rely on web priors or use topic similarity to summarize multiple videos jointly.
Second, I will discuss the advantages and disadvantages of existing evaluation metrics.
Finally, I will propose ideas on how to train better models and more reliably track the performance of summarization models.<br><br>
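As a concrete reference point for the evaluation discussion, the following is a simplified sketch of the keyshot-level F1 score used on benchmarks such as SumMe and TVSum (the actual protocols additionally handle temporal segment boundaries and multiple reference annotators; the indices below are made up):

```python
# Keyshot-level F1 between a predicted and a reference summary, each given
# as a set of selected shot indices: the set-overlap core of the usual
# summarization evaluation protocol.

def summary_f1(predicted, reference):
    overlap = len(predicted & reference)
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)

pred = {0, 3, 5, 7}    # shots chosen by the model
ref = {0, 3, 4, 7, 9}  # shots chosen by a human annotator
print(round(summary_f1(pred, ref), 3))  # 0.667 (precision 0.75, recall 0.6)
```

A weakness of this metric, raised in the talk, is that it treats all non-overlapping shots as equally wrong, even when a predicted shot is semantically interchangeable with a reference shot.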
<strong>Weakly Supervised Video Summarization (Rameswar):</strong>
Many of the recent successes in video summarization have been driven by the availability of large quantities of labeled training data.
In the vast majority of real-world settings, however, collecting such datasets by hand is infeasible due to the cost of labeling or
the paucity of data in a given domain. One increasingly popular approach is to use weaker forms of supervision that are potentially less precise
but substantially less costly than producing explicit annotations for the given task. In this talk, we will first discuss the different
forms of weak supervision that can be leveraged when summarizing videos. We will present how the context of additional topic-related videos can
provide knowledge and useful clues for extracting semantically meaningful video summaries. Next, we will show how the context of a video
in a scene, e.g., video-level labels, helps generate a meaningful summary while avoiding the huge number of human-labeled
video-summary pairs required by fully supervised algorithms. Finally, we will describe how sparse optimization methods that exploit content correlations
across multiple videos, whether in a camera network or returned by a web search, can generate an informative multi-video summary describing the whole collection.<br><br>
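As a rough, hypothetical illustration of that last idea, the sketch below pools frame features from two videos into one matrix and greedily picks the columns that best reconstruct the whole collection: a simultaneous-OMP-style stand-in for the actual row-sparse optimization formulations discussed in the talk, with synthetic two-dimensional features:

```python
import numpy as np

def select_representatives(X, k):
    """Greedily pick k columns of X (frames pooled across videos) that
    best reconstruct every column of X (simultaneous-OMP style)."""
    _, n = X.shape
    selected = []
    residual = X.copy()
    for _ in range(k):
        # Score each unselected column by its total correlation with the
        # still-unexplained part of the collection.
        scores = [np.linalg.norm(X[:, j] @ residual) if j not in selected else -1.0
                  for j in range(n)]
        selected.append(int(np.argmax(scores)))
        # Project the data onto the span of the selected columns; the
        # residual is whatever those representatives cannot explain.
        B = X[:, selected]
        coeff, *_ = np.linalg.lstsq(B, X, rcond=None)
        residual = X - B @ coeff
    return selected

# Two "videos": 5 frames near [1, 0] and 5 frames near [0, 1].
rng = np.random.default_rng(0)
video1 = np.array([1.0, 0.0])[:, None] + 0.05 * rng.standard_normal((2, 5))
video2 = np.array([0.0, 1.0])[:, None] + 0.05 * rng.standard_normal((2, 5))
X = np.concatenate([video1, video2], axis=1)

reps = select_representatives(X, 2)
print(reps)  # one representative frame from each video
```

Because the second pick is scored against the residual, the method naturally avoids choosing two frames from the same video, which is the content-correlation effect the multi-video formulations exploit.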
<strong>Sequential Determinantal Point Processes: Models, Algorithms, and Applications in Diverse and Sequential Subset Selection
(Boqing):</strong>
Determinantal point processes (DPPs) were first used to characterize the Pauli exclusion principle,
which states that two identical particles cannot occupy the same quantum state simultaneously.
This notion of exclusion has made the DPP an appealing tool for modeling diversity in applications such as video summarization and image ranking.
In this talk, I will give a gentle review of DPPs and then present sequential DPPs (seqDPPs), a probabilistic model we originally proposed
for treating video summarization as a supervised, diverse, and sequential subset selection process; in contrast, prior approaches to
video summarization were largely unsupervised. The talk will cover both seqDPPs and hierarchical seqDPPs, three tailored training algorithms
(maximum likelihood estimation, large-margin, and reinforcement learning), and their applications to vanilla as well as query-focused video summarization.<br>
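The diversity-promoting behavior of a DPP is easy to verify numerically. In the standard L-ensemble formulation, a subset S has probability det(L_S) / det(L + I); the sketch below uses a small, made-up 3-item kernel to show that subsets of similar items, whose kernel rows are nearly linearly dependent, receive small determinants:

```python
import numpy as np

def dpp_prob(L, S):
    """P(S) = det(L_S) / det(L + I) for an L-ensemble DPP."""
    L = np.asarray(L, dtype=float)
    L_S = L[np.ix_(S, S)]  # principal submatrix indexed by S
    return np.linalg.det(L_S) / np.linalg.det(L + np.eye(len(L)))

# Hypothetical kernel over 3 items; items 0 and 1 are near-duplicates.
L = [
    [1.0, 0.9, 0.1],
    [0.9, 1.0, 0.1],
    [0.1, 0.1, 1.0],
]

# The redundant pair {0,1} has det 1 - 0.81 = 0.19, while the diverse
# pair {0,2} has det 1 - 0.01 = 0.99: the DPP strongly prefers diversity.
print(dpp_prob(L, [0, 2]) > dpp_prob(L, [0, 1]))  # True
```

A seqDPP applies this machinery segment by segment, conditioning each segment's selection on what the previous segment chose, which is what makes the subset selection both diverse and sequential.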
</div>
</div>
<div class="section">
<h2>Target Audience</h2>
<div class="paper">
The intended audience is academics, graduate students, and industrial
researchers interested in state-of-the-art machine learning techniques for information
extraction and summarization in large, high-dimensional datasets that are mixed,
multi-modal, inhomogeneous, heterogeneous, or hybrid. Attendees with a mathematical and theoretical
inclination will enjoy the course as much as those with a practical bent.
</div>
</div>
<div class="section">
<h2>Speaker Bios</h2>
<div class="paper">
<a href="https://rpand002.github.io/">Rameswar Panda</a> is currently a Research Staff Member at IBM
Research AI, MIT-IBM Watson AI Lab, Cambridge, USA. Prior to joining IBM, he obtained his Ph.D. in Electrical and Computer Engineering from the
University of California, Riverside in 2018. His primary research interests span the areas of computer vision, machine learning, and multimedia.
In particular, his current focus is on developing semi-supervised, weakly supervised, and unsupervised algorithms for solving different vision problems.
His work has been published in top-tier conferences such as CVPR, ICCV, ECCV, and MM, as well as in high-impact journals such as TIP and TMM.<br><br>
<a href="http://www.ccs.neu.edu/home/eelhami/">Ehsan Elhamifar</a> is an Assistant Professor in the College of Computer and Information Science (CCIS) and
the director of the Mathematical, Computational and Applied Data Science (MCADS) Lab at Northeastern
University. Prof. Elhamifar is a recipient of the DARPA Young Faculty Award and the NSF CISE Career
Research Initiation Award on the topic of big data summarization. Previously, he was a postdoctoral scholar in
the Electrical Engineering and Computer Science (EECS) department at the University of California, Berkeley.
He obtained his Ph.D. from the Electrical and Computer Engineering (ECE) department at Johns
Hopkins University. Prof. Elhamifar's research areas are machine learning, computer vision, and optimization.
He is interested in developing scalable and robust algorithms that address the challenges of complex and massive
high-dimensional data. Specifically, he uses tools from convex, nonconvex, and submodular optimization, sparse
and low-rank modeling, deep learning, and high-dimensional statistics to develop algorithms and theory, and
applies them to challenging real-world problems, including big data summarization, procedure learning
from instructional data, large-scale recognition with small labeled data, and active learning for visual data.<br><br>
<a href="https://gyglim.github.io/me/index.html">Michael Gygli</a> is a research scientist at Google AI in Zurich, working with Prof. Vittorio Ferrari. Before
joining Google, Michael was the head of AI at gifs.com, leading the efforts to automate video editing through
summarization and highlight detection. In 2017 he obtained a PhD from ETH Zurich for his thesis on interest-based
video summarization via subset selection, under the supervision of Prof. Luc Van Gool.
Michael has published several papers at venues such as CVPR, ICCV, ECCV, ICML, and MM.<br><br>
<a href="http://boqinggong.info/">Boqing Gong</a> is a research scientist at Google, Seattle, and a remote principal investigator at ICSI, Berkeley.
His research in machine learning and computer vision focuses on modeling algorithms and visual recognition. Before joining Google in 2019, he worked at Tencent and
was a tenure-track Assistant Professor at the University of Central Florida (UCF). He received an NSF CRII award in 2016 and an NSF BIGDATA award in 2017,
both of which were the first of their kind ever granted to UCF. He is/was a (senior) area chair of NeurIPS 2019, ICCV 2019, ICML 2019, AISTATS 2019, AAAI 2020, and WACV 2018-2020.
He received his Ph.D. in 2015 from the University of Southern California, where his work was partially supported by the Viterbi Fellowship.<br>
</div>
</div>
</body></html>