<html><head><meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<title>CVPR 2019 Summarization Tutorial</title>
<style type="text/css" media="screen">
html, body, div, span, applet, object, iframe, h1, h2, h3, h4, h5, h6, p, blockquote, pre, a, abbr, acronym, address, big, cite, code, del, dfn, em, font, img, ins, kbd, q, s, samp, small, strike, strong, sub, tt, var, dl, dt, dd, ol, ul, li, fieldset, form, label, legend, table, caption, tbody, tfoot, thead, tr, th, td {
border: 0pt none;
font-family: inherit;
font-size: 100%;
font-style: inherit;
font-weight: inherit;
margin: 0pt;
outline-color: invert;
outline-style: none;
outline-width: 0pt;
padding: 0pt;
vertical-align: baseline;
}
a {
color: #1772d0;
text-decoration:none;
}
a:focus, a:hover {
color: #f09228;
text-decoration:none;
}
a.paper {
font-weight: bold;
font-size: 12pt;
}
b.paper {
font-weight: bold;
font-size: 12pt;
}
* {
margin: 0pt;
padding: 0pt;
}
body {
position: relative;
margin: 3em auto 2em auto;
width: 800px;
font-family: Lato, Verdana, Helvetica, sans-serif;
font-size: 14px;
background: #eee;
}
h2 {
font-family: Lato, Verdana, Helvetica, sans-serif;
font-size: 18pt;
font-weight: 700;
}
h3 {
font-family: Lato, Verdana, Helvetica, sans-serif;
font-size: 16px;
font-weight: 700;
}
strong {
font-family: Lato, Verdana, Helvetica, sans-serif;
font-size: 13px;
}
ul {
list-style: circle;
}
img {
border: none;
}
li {
padding-bottom: 0.5em;
margin-left: 1.4em;
}
strong, b {
font-weight:bold;
}
em, i {
font-style:italic;
}
div.section {
clear: both;
margin-bottom: 1.5em;
background: #eee;
}
div.spanner {
clear: both;
}
div.paper {
clear: both;
margin-top: 0.5em;
margin-bottom: 1em;
border: 1px solid #ddd;
background: #fff;
padding: 1em 1em 1em 1em;
}
div.paper div {
padding-left: 200px;
}
img.paper {
margin-bottom: 0.5em;
float: left;
width: 170px;
}
div.dissert {
clear: both;
margin-top: 0.5em;
margin-bottom: 1em;
border: 1px solid #ddd;
background: #fff;
padding: 1em 1em 1em 1em;
}
div.dissert div {
padding-left: 150px;
}
img.dissert {
margin-bottom: 0.5em;
float: left;
width: 140px;
}
span.blurb {
font-style:italic;
display:block;
margin-top:0.75em;
margin-bottom:0.5em;
}
pre, code {
font-family: 'Lucida Console', 'Andale Mono', 'Courier', monospace;
margin: 1em 0;
padding: 0;
}
div.paper pre {
font-size: 0.9em;
}
</style>
<script type="text/javascript" async="" src="./page_files/ga.js"></script><script type="text/javascript">
var _gaq = _gaq || [];
_gaq.push(['_setAccount', 'UA-7953909-1']);
_gaq.push(['_trackPageview']);
(function() {
var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
})();
</script>
<script type="text/javascript" src="./page_files/hidebib.js"></script>
<link href="./page_files/css" rel="stylesheet" type="text/css">
<!--<link href='http://fonts.googleapis.com/css?family=Open+Sans+Condensed:300' rel='stylesheet' type='text/css'>-->
<!--<link href='http://fonts.googleapis.com/css?family=Open+Sans' rel='stylesheet' type='text/css'>-->
<!--<link href='http://fonts.googleapis.com/css?family=Yanone+Kaffeesatz' rel='stylesheet' type='text/css'>-->
<style id="style-1-cropbar-clipper">/* Copyright 2014 Evernote Corporation. All rights reserved. */
.en-markup-crop-options {
top: 18px !important;
left: 50% !important;
margin-left: -100px !important;
width: 200px !important;
border: 2px rgba(255,255,255,.38) solid !important;
border-radius: 4px !important;
}
.en-markup-crop-options div div:first-of-type {
margin-left: 0px !important;
}
</style></head>
<body>
<div style="margin-bottom: 1em; border: 1px solid #ddd; background-color: #fff; padding: 1em;">
<div style="padding-left: 1em; vertical-align: top;">
<span style="font-size: 18pt; line-height: 130%;">Recent Advances in Visual Data Summarization</span><br><br>
<span>CVPR 2019 Tutorial<br><br>
Location: Room 203C
<br><br>
Sunday, June 16, 2019, 1:30 pm - 5:30 pm
</span>
</div>
</div>
<div class="section">
<h2> Organizers </h2>
<div class="paper">
<ul>
<li><a href="https://rpand002.github.io/">Rameswar Panda</a>: Research Staff Member, IBM Research AI, MIT-IBM Watson AI Lab.</li>
<li><a href="http://www.ccs.neu.edu/home/eelhami/">Ehsan Elhamifar</a>: Assistant Professor, Northeastern University.</li>
<li><a href="https://gyglim.github.io/me/index.html">Michael Gygli</a>: Research Scientist, Google Research, Zurich.</li>
<li><a href="http://boqinggong.info/">Boqing Gong</a>: Research Scientist, Google Research, Seattle.</li>
</ul>
</div>
</div>
<div class="section">
<h2>Tutorial Description</h2>
<div class="paper">
Visual data summarization has many applications, ranging from computer vision (video summarization, video
captioning, active visual learning, object detection, image/video segmentation, etc.) to data mining
(recommender systems, web-data analysis, etc.). As a consequence, several important research topics
have recently emerged: (i) online and distributed summarization, (ii) weakly supervised summarization,
(iii) summarization of sequential data, and (iv) summarization in camera networks, particularly for
surveillance tasks. The objective of this tutorial is to present the audience with a unifying
perspective on the visual data summarization problem from both theoretical and application standpoints,
and to discuss, motivate, and encourage future research that will spur disruptive
progress in the emerging field of summarization.
</div>
</div>
<div class="section">
<h2>Schedule</h2>
<div class="paper">
<ul>
<li>1:30 pm - 1:50 pm: Introduction and Overview: Rameswar Panda
<a href="https://www.dropbox.com/s/bhhhjqmcqth2211/Introduction_Rameswar.pdf?dl=0">[Slides]</a></li>
<li>1:50 pm - 2:40 pm: Dynamic Subset Selection: Ehsan Elhamifar
<a href="https://www.dropbox.com/s/qi0rwxmo4ylu49r/EhsanElhamifar_CVPR19Tutorial.pdf?dl=0">[Slides]</a></li>
<li>2:40 pm - 3:30 pm: Video Summarization Objectives: Michael Gygli
<a href="https://docs.google.com/presentation/d/1fabTAz48AIAYTX6Vyqc_yhWTa_XVRm1FVU7JhhW_Nts/edit#slide=id.g579e21329d_0_546">[Slides]</a></li>
<li>3:30 pm - 3:50 pm: Break</li>
<li>3:50 pm - 4:40 pm: Weakly Supervised Video Summarization: Rameswar Panda
<a href="https://www.dropbox.com/s/77acu0iofytk4dd/Weak_Summ_Rameswar.pdf?dl=0">[Slides]</a></li>
<li>4:40 pm - 5:30 pm: Sequential Determinantal Point Processes: Boqing Gong
<a href="http://boqinggong.info/assets/seqDPP-tutorial.pdf">[Slides]</a></li>
</ul>
</div>
</div>
<div class="section">
<h2>Abstracts</h2>
<div class="paper">
<strong>Dynamic Subset Selection: Algorithms, Theory and Applications to Procedure Learning (Ehsan):</strong>
Subset selection is the task of finding a small subset of the most informative points in a large dataset; it has many applications in computer vision,
including image and video summarization, data clustering, active visual learning, and classifier selection. Despite many studies,
the majority of existing methods ignore dynamics and important structured dependencies among points, and require many pairs of datasets and ground-truth summaries
for effective learning. In this talk, I will introduce a new class of utility functions that generalizes the well-known facility-location objective to structured settings,
develop scalable algorithms based on extensions of submodular maximization, and discuss the theoretical underpinnings of these methods.
I will then turn to an important application in vision, understanding procedural videos, where tools from dynamic subset selection significantly
improve performance over existing methods. I will also discuss incorporating high-level reasoning into these methods by learning from humans with a small amount of annotation.<br><br>
<strong>Video Summarization Objectives during Training and Testing (Michael):</strong>
The omnipresence of video recording devices has created the need to automatically edit and summarize videos.
Video summarization is a challenging task, however: what characterizes a good summary depends on the context and on the task one aims to execute.
This makes obtaining ground truth for summarization datasets, and evaluating summarization methods, difficult.
As a result, datasets are typically small, and it is unclear how well existing evaluation metrics align with human preferences.
In this talk, I will first discuss existing datasets and how recent works compensate for the lack of large-scale datasets.
Approaches include pre-training on other tasks and using weakly supervised or unsupervised training objectives; others rely on web priors or use topic similarity to summarize multiple videos jointly.
Second, I will discuss the advantages and disadvantages of existing evaluation metrics.
Finally, I will propose ideas on how to train better models and more reliably track the performance of summarization models.<br><br>
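As a concrete reference point for the evaluation discussion, the following is a simplified sketch of the keyshot-level F1 score used on benchmarks such as SumMe and TVSum (the actual protocols additionally handle temporal segment boundaries and multiple reference annotators; the indices below are made up):

```python
# Keyshot-level F1 between a predicted and a reference summary, each given
# as a set of selected shot indices: the set-overlap core of the usual
# summarization evaluation protocol.

def summary_f1(predicted, reference):
    overlap = len(predicted & reference)
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)

pred = {0, 3, 5, 7}    # shots chosen by the model
ref = {0, 3, 4, 7, 9}  # shots chosen by a human annotator
print(round(summary_f1(pred, ref), 3))  # 0.667 (precision 0.75, recall 0.6)
```

A weakness of this metric, raised in the talk, is that it treats all non-overlapping shots as equally wrong, even when a predicted shot is semantically interchangeable with a reference shot.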
<strong>Weakly Supervised Video Summarization (Rameswar):</strong>
Many of the recent successes in video summarization have been driven by the availability of large quantities of labeled training data.
In the vast majority of real-world settings, however, collecting such datasets by hand is infeasible due to the cost of labeling or
the paucity of data in a given domain. One increasingly popular approach is to use weaker forms of supervision that are potentially less precise
but substantially less costly than producing explicit annotations for the given task. In this talk, we will first discuss the different
forms of weak supervision that can be leveraged when summarizing videos. We will present how the context of additional topic-related videos can
provide knowledge and useful clues for extracting semantically meaningful video summaries. Next, we will show how the context of a video
in a scene, e.g., video-level labels, helps generate a meaningful summary while avoiding the huge number of human-labeled
video-summary pairs required by fully supervised algorithms. Finally, we will describe how sparse optimization methods that exploit content correlations
across multiple videos, whether in a camera network or returned by a web search, can generate an informative multi-video summary describing the whole collection.<br><br>
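As a rough, hypothetical illustration of that last idea, the sketch below pools frame features from two videos into one matrix and greedily picks the columns that best reconstruct the whole collection: a simultaneous-OMP-style stand-in for the actual row-sparse optimization formulations discussed in the talk, with synthetic two-dimensional features:

```python
import numpy as np

def select_representatives(X, k):
    """Greedily pick k columns of X (frames pooled across videos) that
    best reconstruct every column of X (simultaneous-OMP style)."""
    _, n = X.shape
    selected = []
    residual = X.copy()
    for _ in range(k):
        # Score each unselected column by its total correlation with the
        # still-unexplained part of the collection.
        scores = [np.linalg.norm(X[:, j] @ residual) if j not in selected else -1.0
                  for j in range(n)]
        selected.append(int(np.argmax(scores)))
        # Project the data onto the span of the selected columns; the
        # residual is whatever those representatives cannot explain.
        B = X[:, selected]
        coeff, *_ = np.linalg.lstsq(B, X, rcond=None)
        residual = X - B @ coeff
    return selected

# Two "videos": 5 frames near [1, 0] and 5 frames near [0, 1].
rng = np.random.default_rng(0)
video1 = np.array([1.0, 0.0])[:, None] + 0.05 * rng.standard_normal((2, 5))
video2 = np.array([0.0, 1.0])[:, None] + 0.05 * rng.standard_normal((2, 5))
X = np.concatenate([video1, video2], axis=1)

reps = select_representatives(X, 2)
print(reps)  # one representative frame from each video
```

Because the second pick is scored against the residual, the method naturally avoids choosing two frames from the same video, which is the content-correlation effect the multi-video formulations exploit.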
<strong>Sequential Determinantal Point Processes: Models, Algorithms, and Applications in Diverse and Sequential Subset Selection
(Boqing):</strong>
Determinantal point processes (DPPs) were first used to characterize the Pauli exclusion principle,
which states that two identical particles cannot occupy the same quantum state simultaneously.
This notion of exclusion has made the DPP an appealing tool for modeling diversity in applications such as video summarization and image ranking.
In this talk, I will give a gentle review of DPPs and then present sequential DPPs (seqDPPs), a probabilistic model we originally proposed
for treating video summarization as a supervised, diverse, and sequential subset selection process; in contrast, prior approaches to
video summarization were largely unsupervised. The talk will cover both seqDPPs and hierarchical seqDPPs, three tailored training algorithms
(maximum likelihood estimation, large-margin, and reinforcement learning), and their applications to vanilla as well as query-focused video summarization.<br>
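The diversity-promoting behavior of a DPP is easy to verify numerically. In the standard L-ensemble formulation, a subset S has probability det(L_S) / det(L + I); the sketch below uses a small, made-up 3-item kernel to show that subsets of similar items, whose kernel rows are nearly linearly dependent, receive small determinants:

```python
import numpy as np

def dpp_prob(L, S):
    """P(S) = det(L_S) / det(L + I) for an L-ensemble DPP."""
    L = np.asarray(L, dtype=float)
    L_S = L[np.ix_(S, S)]  # principal submatrix indexed by S
    return np.linalg.det(L_S) / np.linalg.det(L + np.eye(len(L)))

# Hypothetical kernel over 3 items; items 0 and 1 are near-duplicates.
L = [
    [1.0, 0.9, 0.1],
    [0.9, 1.0, 0.1],
    [0.1, 0.1, 1.0],
]

# The redundant pair {0,1} has det 1 - 0.81 = 0.19, while the diverse
# pair {0,2} has det 1 - 0.01 = 0.99: the DPP strongly prefers diversity.
print(dpp_prob(L, [0, 2]) > dpp_prob(L, [0, 1]))  # True
```

A seqDPP applies this machinery segment by segment, conditioning each segment's selection on what the previous segment chose, which is what makes the subset selection both diverse and sequential.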
</div>
</div>
<div class="section">
<h2>Target Audience</h2>
<div class="paper">
The intended audience is academics, graduate students, and industrial
researchers interested in state-of-the-art machine learning techniques for information
extraction and summarization in large, high-dimensional datasets that are mixed,
multi-modal, inhomogeneous, heterogeneous, or hybrid. Attendees with a mathematical and theoretical
inclination will enjoy the course as much as those with a practical bent.
</div>
</div>
<div class="section">
<h2>Speaker Bios</h2>
<div class="paper">
<a href="https://rpand002.github.io/">Rameswar Panda</a> is currently a Research Staff Member at IBM
Research AI, MIT-IBM Watson AI Lab, Cambridge, USA. Prior to joining IBM, he obtained his Ph.D. in Electrical and Computer Engineering from the
University of California, Riverside in 2018. His primary research interests span the areas of computer vision, machine learning, and multimedia.
In particular, his current focus is on developing semi-supervised, weakly supervised, and unsupervised algorithms for solving different vision problems.
His work has been published in top-tier conferences such as CVPR, ICCV, ECCV, and MM, as well as in high-impact journals such as TIP and TMM.<br><br>
<a href="http://www.ccs.neu.edu/home/eelhami/">Ehsan Elhamifar</a> is an Assistant Professor in the College of Computer and Information Science (CCIS) and
the director of the Mathematical, Computational and Applied Data Science (MCADS) Lab at Northeastern
University. Prof. Elhamifar is a recipient of the DARPA Young Faculty Award and the NSF CISE Career
Research Initiation Award on the topic of big data summarization. Previously, he was a postdoctoral scholar in
the Electrical Engineering and Computer Science (EECS) department at the University of California, Berkeley.
He obtained his Ph.D. from the Electrical and Computer Engineering (ECE) department at Johns
Hopkins University. Prof. Elhamifar's research areas are machine learning, computer vision, and optimization.
He is interested in developing scalable and robust algorithms that address the challenges of complex and massive
high-dimensional data. Specifically, he uses tools from convex, nonconvex, and submodular optimization, sparse
and low-rank modeling, deep learning, and high-dimensional statistics to develop algorithms and theory, and
applies them to challenging real-world problems, including big data summarization, procedure learning
from instructional data, large-scale recognition with small labeled data, and active learning for visual data.<br><br>
<a href="https://gyglim.github.io/me/index.html">Michael Gygli</a> is a research scientist at Google AI in Zurich, working with Prof. Vittorio Ferrari. Before
joining Google, Michael was the head of AI at gifs.com, leading the efforts to automate video editing through
summarization and highlight detection. In 2017 he obtained a PhD from ETH Zurich for his thesis on interest-based
video summarization via subset selection, under the supervision of Prof. Luc Van Gool.
Michael has published several papers at venues such as CVPR, ICCV, ECCV, ICML, and MM.<br><br>
<a href="http://boqinggong.info/">Boqing Gong</a> is a research scientist at Google, Seattle, and a remote principal investigator at ICSI, Berkeley.
His research in machine learning and computer vision focuses on modeling algorithms and visual recognition. Before joining Google in 2019, he worked at Tencent and
was a tenure-track Assistant Professor at the University of Central Florida (UCF). He received an NSF CRII award in 2016 and an NSF BIGDATA award in 2017,
both of which were the first of their kind ever granted to UCF. He is/was a (senior) area chair of NeurIPS 2019, ICCV 2019, ICML 2019, AISTATS 2019, AAAI 2020, and WACV 2018-2020.
He received his Ph.D. in 2015 from the University of Southern California, where his work was partially supported by the Viterbi Fellowship.<br>
</div>
</div>
</body></html>