A collection of papers on the dialog direction

This post records the titles of dialog-related papers from top conferences and journals over the past four years (since 2019), to support the survey and follow-up work on video dialog.
(This document has not been filtered in any way; the paper titles were obtained purely by keyword search.)
Recommended secondary search keywords: history, genera, visual, Supervis, video, etc. (partial stems such as "genera" and "Supervis" are meant to also match "generation"/"generative" and "supervised"/"supervision"); a minimal filtering sketch is given below.
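
As an illustration of this secondary keyword filtering, here is a minimal Python sketch; the file name `eccv2022_video.txt` and the one-title-per-line format are assumptions for the example, not part of the original notes:

```python
# Minimal sketch of the secondary keyword filtering described above.
# Assumes one paper title per line in a plain-text file "eccv2022_video.txt";
# the file name and keyword list are illustrative assumptions.
keywords = ["history", "genera", "visual", "Supervis", "video"]

with open("eccv2022_video.txt", encoding="utf-8") as f:
    titles = [line.strip() for line in f if line.strip()]

# Case-insensitive substring match, so stems like "genera" also hit
# "generation"/"generative" and "Supervis" hits "supervised"/"supervision".
for title in titles:
    hits = [k for k in keywords if k.lower() in title.lower()]
    if hits:
        print(f"[{', '.join(hits)}] {title}")
```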

ECCV

2022 ECCV

New Datasets and Models for Contextual Reasoning in Visual Dialog
Video Dialog as Conversation About Objects Living in Space-Time


A collection of papers on the video direction

This post records the titles of video-related papers from top conferences and journals over the past four years (since 2019), to support the survey and follow-up work on video dialog.
(This document has not been filtered in any way; the paper titles were obtained purely by keyword search.)

ECCV

2022 ECCV

DexMV: Imitation Learning for Dexterous Manipulation from Human Videos
Video Dialog as Conversation About Objects Living in Space-Time
Actor-Centered Representations for Action Localization in Streaming Videos
AutoTransition: Learning to Recommend Video Transition Effects
Sports Video Analysis on Large-Scale Data
Semantic-Aware Implicit Neural Audio-Driven Video Portrait Generation
Quantized GAN for Complex Music Generation from Dance Videos
Telepresence Video Quality Assessment
GAMa: Cross-View Video Geo-Localization
FAR: Fourier Aerial Video Recognition
Fabric Material Recovery from Video Using Multi-scale Geometric Auto-Encoder
Video Graph Transformer for Video Question Answering
Video Question Answering with Iterative Video-Text Co-tokenization
Can Shuffling Video Benefit Temporal Bias Problem: A Novel Training Framework for Temporal Grounding
Selective Query-Guided Debiasing for Video Corpus Moment Retrieval
Learning Linguistic Association Towards Efficient Text-Video Retrieval
VisageSynTalk: Unseen Speaker Video-to-Speech Synthesis via Speech-Visage Feature Selection
CycDA: Unsupervised Cycle Domain Adaptation to Learn from Image to Video
Expanding Language-Image Pretrained Models for General Video Recognition
AdaFocusV3: On Unified Spatial-Temporal Dynamic Video Recognition
Delving into Details: Synopsis-to-Detail Networks for Video Recognition
Scale-Aware Spatio-Temporal Relation Learning for Video Anomaly Detection
Continual 3D Convolutional Neural Networks for Real-time Processing of Videos
Geometric Features Informed Multi-person Human-Object Interaction Recognition in Videos
Neural Capture of Animatable 3D Human from Monocular Video
FAST-VQA: Efficient End-to-End Video Quality Assessment with Fragment Sampling
Real-RawVSR: Real-World Raw Video Super-Resolution with a Benchmark Dataset
Synthesizing Light Field Video from Monocular Video
Video Interpolation by Event-Driven Anisotropic Adjustment of Optical Flow
CelebV-HQ: A Large-Scale Video Facial Attributes Dataset
SmoothNet: A Plug-and-Play Network for Refining Human Poses in Videos
RayTran: 3D Pose Estimation and Shape Reconstruction of Multiple Objects from Videos with Ray-Traced Transformers
SALISA: Saliency-Based Input Sampling for Efficient Video Object Detection
Video Anomaly Detection by Solving Decoupled Spatio-Temporal Jigsaw Puzzles
Weakly-Supervised Temporal Action Detection for Fine-Grained Videos with Hierarchical Atomic Actions
Contrast-Phys: Unsupervised Video-Based Remote Physiological Measurement via Spatiotemporal Contrast
Hierarchical Contrastive Inconsistency Learning for Deepfake Video Detection
Generative Adversarial Network for Future Hand Segmentation from Egocentric Video
My View is the Best View: Procedure Learning from Egocentric Videos
Self-supervised Sparse Representation for Video Anomaly Detection
Few-Shot Video Object Detection
Inductive and Transductive Few-Shot Video Classification via Appearance and Temporal Alignments
Graph Neural Network for Cell Tracking in Microscopy Videos
Particle Video Revisited: Tracking Through Occlusions Using Point Trajectories
Towards Generic 3D Tracking in RGBD Videos: Benchmark and Baseline
Tackling Background Distraction in Video Object Segmentation
Learned Variational Video Color Propagation
Ensemble Learning Priors Driven Deep Unfolding for Scalable Video Snapshot Compressive Imaging
Bridging Images and Videos: A Simple Learning Framework for Large Vocabulary Video Object Detection
LocVTP: Video-Text Pre-training for Temporal Localization
Learning to Drive by Watching YouTube Videos: Action-Conditioned Contrastive Policy Pretraining
Static and Dynamic Concepts for Self-supervised Video Representation Learning
Neural Video Compression Using GANs for Detail Synthesis and Propagation
Is It Necessary to Transfer Temporal Knowledge for Domain Adaptive Video Semantic Segmentation?
Meta Spatio-Temporal Debiasing for Video Scene Graph Generation
PolyphonicFormer: Unified Query Learning for Depth-Aware Video Panoptic Segmentation
Video Restoration Framework and Its Meta-adaptations to Data-Poor Conditions
SeqFormer: Sequential Transformer for Video Instance Segmentation
In Defense of Online Models for Video Instance Segmentation
XMem: Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model
Video Mask Transfiner for High-Quality Video Instance Segmentation
Point Primitive Transformer for Long-Term 4D Point Cloud Video Understanding
Waymo Open Dataset: Panoramic Video Panoptic Segmentation
One-Trimap Video Matting
Learning Quality-aware Dynamic Memory for Video Object Segmentation
Instance as Identity: A Generic Online Paradigm for Video Instance Segmentation
BATMAN: Bilateral Attention Transformer in Motion-Appearance Neighboring Space for Video Object Segmentation
Global Spectral Filter Memory Network for Video Object Segmentation
Video Instance Segmentation via Multi-Scale Spatio-Temporal Split Attention Transformer
Domain Adaptive Video Segmentation via Temporal Pseudo Supervision
GOCA: Guided Online Cluster Assignment for Self-supervised Video Representation Learning
Learn2Augment: Learning to Composite Videos for Data Augmentation in Action Recognition
Federated Self-supervised Learning for Video Understanding
NeuMan: Neural Human Radiance Field from a Single Video
Structure and Motion from Casual Videos
The Anatomy of Video Editing: A Dataset and Benchmark Suite for AI-Assisted Video Editing
MUGEN: A Playground for Video-Audio-Text Multimodal Understanding and GENeration
Learning Omnidirectional Flow in 360° Video via Siamese Representation
PTSEFormer: Progressive Temporal-Spatial Enhanced TransFormer Towards Video Object Detection
Dual-Stream Knowledge-Preserving Hashing for Unsupervised Video Retrieval
Multi-query Video Retrieval
TS2-Net: Token Shift and Selection Transformer for Text-Video Retrieval
Learning Audio-Video Modalities from Image Captions
Lightweight Attentional Feature Fusion: A New Baseline for Text-to-Video Retrieval
Audio-Visual Mismatch-Aware Video Retrieval via Association and Adjustment
CAViT: Contextual Alignment Vision Transformer for Video Object Re-identification
Relighting4D: Neural Relightable Human from Videos
Real-Time Intermediate Flow Estimation for Video Frame Interpolation
Deep Bayesian Video Frame Interpolation
A Perceptual Quality Metric for Video Frame Interpolation
Fast-Vid2Vid: Spatial-Temporal Compression for Video-to-Video Synthesis
Temporally Consistent Semantic Video Editing
Error Compensation Framework for Flow-Guided Video Inpainting
Learning Cross-Video Neural Representations for High-Quality Frame Interpolation
A Style-Based GAN Encoder for High Fidelity Reconstruction of Images and Videos
Harmonizer: Learning to Perform White-Box Image and Video Harmonization
Text2LIVE: Text-Driven Layered Image and Video Editing
CANF-VC: Conditional Augmented Normalizing Flows for Video Compression
Video Extrapolation in Space and Time
Augmentation of rPPG Benchmark Datasets: Learning to Remove and Embed rPPG Signals via Double Cycle Consistent Learning from Unpaired Facial Videos
Layered Controllable Video Generation
Spatio-Temporal Deformable Attention Network for Video Deblurring
Sound-Guided Semantic Video Generation
Controllable Video Generation Through Global and Local Motion Dynamics
Long Video Generation with Time-Agnostic VQGAN and Time-Sensitive Transformer
Combining Internal and External Constraints for Unrolling Shutter in Videos
A Codec Information Assisted Framework for Efficient Compressed Video Super-Resolution
Diverse Generation from a Single Video Made Possible
Learning Shadow Correspondence for Video Shadow Detection
Flow-Guided Transformer for Video Inpainting
Learning Spatio-Temporal Downsampling for Effective Video Upscaling
Learning Spatiotemporal Frequency-Transformer for Compressed Video Super-Resolution
Efficient Meta-Tuning for Content-Aware Neural Video Delivery
Towards Interpretable Video Super-Resolution via Alternating Optimization
Event-guided Deblurring of Unknown Exposure Time Videos
Unidirectional Video Denoising by Mimicking Backward Recurrent Modules with Look-Ahead Forward Ones
ERDN: Equivalent Receptive Field Deformable Network for Video Deblurring
RealFlow: EM-Based Realistic Optical Flow Dataset Generation from Videos
Efficient Video Deblurring Guided by Motion Magnitude
TempFormer: Temporally Consistent Transformer for Video Denoising
Rethinking Video Rain Streak Removal: A New Synthesis Model and a Deraining Network with Video Rain Prior
AlphaVC: High-Performance and Efficient Learned Video Compression
Source-Free Video Domain Adaptation by Learning Temporal Consistency for Action Recognition
Towards Open Set Video Anomaly Detection
EclipSE: Efficient Long-Range Video Retrieval Using Sight and Sound
Joint-Modal Label Denoising for Weakly-Supervised Audio-Visual Video Parsing
Less Than Few: Self-shot Video Instance Segmentation
Real-Time Online Video Detection with Temporal Smoothing Transformers
Mining Relations Among Cross-Frame Affinities for Video Semantic Segmentation
TL;DW? Summarizing Instructional Videos with Task Relevance and Cross-Modal Saliency
DualFormer: Local-Global Stratified Transformer for Efficient Video Recognition
Hierarchical Feature Alignment Network for Unsupervised Video Object Segmentation
PAC-Net: Highlight Your Video via History Preference Modeling
How Severe Is Benchmark-Sensitivity in Video Self-supervised Learning?
NSNet: Non-saliency Suppression Sampler for Efficient Video Recognition
Video Activity Localisation with Uncertainties in Temporal Boundary
Temporal Saliency Query Network for Efficient Video Recognition
Efficient One-Stage Video Object Detection by Exploiting Temporal Consistency
Spotting Temporally Precise, Fine-Grained Events in Video
Efficient Video Transformers with Spatial-Temporal Token Selection
Long Movie Clip Classification with State-Space Video Models
Prompting Visual-Language Models for Efficient Video Understanding
Asymmetric Relation Consistency Reasoning for Video Relation Grounding
K-centered Patch Sampling for Efficient Video Recognition
GraphVid: It only Takes a Few Nodes to Understand a Video
Delta Distillation for Efficient Video Processing
COMPOSER: Compositional Reasoning of Group Activity in Videos with Keypoint-Only Modality
E-NeRV: Expedite Neural Video Representation with Disentangled Spatial-Temporal Context
TDViT: Temporal Dilated Video Transformer for Dense Video Tasks
Flow Graph to Video Grounding for Weakly-Supervised Multi-step Localization
MaCLR: Motion-Aware Contrastive Learning of Representations for Videos
Frozen CLIP Models are Efficient Video Learners
Panoramic Vision Transformer for Saliency Detection in 360° Videos
Bayesian Tracking of Video Graphs Using Joint Kalman Smoothing and Registration
Motion Sensitive Contrastive Learning for Self-supervised Video Representation
Dynamic Temporal Filtering in Video Models
VTC: Improving Video-Text Retrieval with User Comments
Automatic Dense Annotation of Large-Vocabulary Sign Language Videos
MILES: Visual BERT Pre-training with Injected Language Semantics for Video-Text Retrieval

