Pulling Actions out of Context: Explicit Separation for Effective Combination #159

yiskw713 · 2021-04-08T07:22:48Z

INFO

author

Yang Wang and Minh Hoai

affiliation

Stony Brook University, Stony Brook, NY 11794, USA

conference or year

CVPR 2018

link

paper

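Summary

In video recognition, current systems do not cleanly separate human actions from the meaningful factors that accompany them (objects, background, and so on), so action recognition results can be influenced by the background.

This work therefore proposes a method that separates human-action information from context information without any additional annotation. It does so by exploiting conjugate samples: videos whose context (all visual information, including background, objects, and camera motion) resembles that of a video containing an action, but which do not contain the action themselves.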

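Proposed method

The frames immediately before and after the point where the action occurs in a video are defined as conjugate samples.
This yields videos whose context information is almost entirely the same as that of the action video,
while only the action information differs.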
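The definition above can be sketched as a small helper that picks the clip just before and the clip just after an annotated action segment. This is only an illustration under assumptions: the function name `conjugate_clips`, the fixed `clip_len`, and the half-open frame ranges are hypothetical choices, not details taken from the paper.

```python
from typing import List, Tuple


def conjugate_clips(
    action_start: int,
    action_end: int,
    num_frames: int,
    clip_len: int,
) -> List[Tuple[int, int]]:
    """Return [start, end) frame ranges adjacent to the action segment.

    The action occupies frames [action_start, action_end). A clip is
    returned only if it fits entirely inside the video.
    """
    clips: List[Tuple[int, int]] = []
    # Clip immediately before the action (hypothetical sampling rule).
    if action_start - clip_len >= 0:
        clips.append((action_start - clip_len, action_start))
    # Clip immediately after the action.
    if action_end + clip_len <= num_frames:
        clips.append((action_end, action_end + clip_len))
    return clips
```

For example, `conjugate_clips(40, 80, 120, 16)` gives one clip on each side of the action, while a video with too few surrounding frames gives an empty list.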

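A naive way to exploit conjugate samples is to treat them as negative examples, but this does not work well: the model learns to regard context information as negative evidence, even though context is sometimes useful for classification.
Another naive approach is to treat all conjugate samples as positive examples. This is not effective either, because conjugate samples contain no action information, so the model cannot learn action information from them.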

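This work instead proposes the following approach. The action recognizer consists of three components: an action extractor, a context extractor, and an action classifier. It is trained to minimize three terms: (i) the classification loss, (ii) the similarity between the action features of the action sample and the conjugate sample, and (iii) the dissimilarity between the context features of the action sample and the conjugate sample.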
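The three-term objective can be sketched in framework-free Python as below. This is a minimal sketch under stated assumptions, not the authors' implementation: this note does not say which similarity or dissimilarity measures the paper uses, so cosine similarity for the action features, squared Euclidean distance for the context features, and the weights `w_action` and `w_context` are all hypothetical stand-ins.

```python
import math
from typing import Sequence


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Softmax cross-entropy for one sample (numerically stable)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed similarity measure for action features."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-8)


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed dissimilarity measure for context features."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def total_loss(
    logits: Sequence[float],
    label: int,
    act_feat: Sequence[float],
    conj_act_feat: Sequence[float],
    ctx_feat: Sequence[float],
    conj_ctx_feat: Sequence[float],
    w_action: float = 1.0,   # hypothetical weight
    w_context: float = 1.0,  # hypothetical weight
) -> float:
    # (i) classification loss on the action sample
    loss = cross_entropy(logits, label)
    # (ii) penalize similar action features across the pair
    loss += w_action * cosine_similarity(act_feat, conj_act_feat)
    # (iii) penalize dissimilar context features across the pair
    loss += w_context * squared_distance(ctx_feat, conj_ctx_feat)
    return loss
```

With these choices, a pair whose action features are orthogonal and whose context features are identical contributes only the classification term, which matches the intent of pushing action features apart while keeping context features together.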

The network is based on C3D.

The training steps are as follows.

Experiments

The proposed method was confirmed to improve accuracy.

date

Apr. 8, 2021

yiskw713 added the action recognition label Apr 8, 2021
