Machine Perception

Research in machine perception tackles the hard problems of understanding images, sounds, music and video. In recent years, our computers have become much better at such tasks, enabling a variety of new applications such as content-based search in Google Photos and Image Search, natural handwriting interfaces for Android, optical character recognition for Google Drive documents, and recommendation systems that understand music and YouTube videos. Our approach is driven by algorithms that benefit from processing very large, partially labeled datasets using parallel computing clusters. A good example is our recent work on object recognition using a novel deep convolutional neural network architecture known as Inception, which achieves state-of-the-art results on academic benchmarks and allows users to easily search through their large collections of Google Photos. The ability to mine meaningful information from multimedia is broadly applied throughout Google.

454 Publications

A Neural Representation of Sketch Drawings

David Ha, Douglas Eck

ICLR 2018
Aperture Supervision for Monocular Depth Estimation

Pratul Srinivasan, Rahul Garg, Neal Wadhwa, Ren Ng, Jonathan T. Barron

CVPR (2018) (to appear)
Burst Denoising with Kernel Prediction Networks

Ben Mildenhall, Jonathan T. Barron, Jiawen Chen, Dillon Sharlet, Ren Ng, Rob Carroll

CVPR (2018) (to appear)
COCO-Stuff: Thing and Stuff Classes in Context

Holger Caesar, Jasper Uijlings, Vittorio Ferrari

CVPR (2018) (to appear)
Cross-View Training for Semi-Supervised Learning

Kevin Clark, Quoc V. Le, Thang Luong

ICLR (2018) (to appear)
Frame-Recurrent Video Super-Resolution

Mehdi S. M. Sajjadi, Raviteja Vemulapalli, Matthew Brown

CVPR (2018) (to appear)
Grader variability and the importance of reference standards for evaluating machine learning models for diabetic retinopathy

Dale Webster, Ehsan Rahimy, Greg Corrado, Jonathan Krause, Kasumi Widner, Lily Peng, Peter Karth, Varun Gulshan

Ophthalmology (2018)
Intriguing Properties of Adversarial Examples

Barret Zoph, Ekin Dogus Cubuk, Quoc V. Le, Sam Schoenholz

ICLR (2018)
Large-Scale 3D Scene Classification With Multi-View Volumetric CNN

Dror Aiger, Brett Allen, Aleksey Golovinskiy

arXiv (2018)
Learning Intelligent Dialogs for Bounding-Box Annotation

Ksenia Konyushkova, Jasper Uijlings, Chris Lampert, Vittorio Ferrari

CVPR (2018) (to appear)
Learning with Imprinted Weights

Hang Qi, David Lowe, Matthew Brown

CVPR (2018) (to appear)
Matrix capsules with EM routing

Geoffrey Hinton, Sara Sabour, Nicholas Frosst

ICLR (2018) (to appear)
Revisiting knowledge transfer for training object class detectors

Jasper Uijlings, Stefan Popov, Vittorio Ferrari

CVPR (2018) (to appear)
Searching for Activation Functions

Prajit Ramachandran, Barret Zoph, Quoc Le

ICLR (2018)
Sequences with Low-Discrepancy Blue-Noise 2-D Projections

Helene Perrier, David Coeurjolly, Feng Xie, Matt Pharr, Pat Hanrahan, Victor Ostromoukhov

Proceedings of Eurographics (2018)
Thermometer Encoding: One Hot Way To Resist Adversarial Examples

Aurko Roy, Colin Raffel, Ian Goodfellow, Jacob Buckman

ICLR (2018)
Time-Contrastive Networks: Self-Supervised Learning from Video

Pierre Sermanet, Corey Lynch, Yevgen Chebotar, Jasmine Hsu, Eric Jang, Stefan Schaal, Sergey Levine

Proceedings of International Conference in Robotics and Automation (ICRA 2018) + Deep Learning for Robotic Vision (DLRV) Workshop at CVPR 2017 + Deep Reinforcement Learning Symposium at NIPS 2017 (2018)
Towards learning a metric for neural prosthetics

Nishal P Shah, Sasidhar Madugula, Alan Litke, Alexander Sher, EJ Chichilnisky, Yoram Singer, Jonathon Shlens

ICLR (2018) (to appear)
Unsupervised Learning of Depth and Egomotion from Monocular Video Using 3D Geometric Constraints

Reza Mahjourian, Martin Wicke, Anelia Angelova

CVPR (2018)
Unsupervised Learning of Semantic Audio Representations

Aren Jansen, Manoj Plakal, Ratheet Pandya, Dan Ellis, Shawn Hershey, Jiayang Liu, Channing Moore, Rif A. Saurous

Proceedings of ICASSP 2018 (to appear)
Using Simulation and Domain Adaptation to Improve Efficiency of Deep Robotic Grasping

Konstantinos Bousmalis, Alex Irpan, Paul Wohlhart, Yunfei Bai, Matthew Kelcey, Mrinal Kalakrishnan, Laura Downs, Julian Ibarz, Peter Pastor Sampedro, Kurt Konolige, Sergey Levine, Vincent Vanhoucke

ICRA (2018)
3D object classification and retrieval with Spherical CNNs

Carlos Esteves, Christine Allen-Blanchette, Ameesh Makadia, Kostas Daniilidis

arXiv (2017)
A Learned Representation For Artistic Style

Vincent Dumoulin, Jonathon Shlens, Manjunath Kudlur

ICLR (2017)
A No-Reference Video Quality Predictor for H.264 Compression and Scaling Artifacts

Deepti Ghadiyaram, Chao Chen, Sasi Inguva, Anil Kokaram

IEEE International Conference on Image Processing, IEEE (2017) (to appear)
A discriminative view of MRF pre-processing algorithms

Chen Wang, Charles Herrmann, Ramin Zabih

ICCV 2017
AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions

Chunhui Gu, Chen Sun, David A. Ross, Carl Vondrick, Caroline Pantofaru, Yeqing Li, Sudheendra Vijayanarasimhan, George Toderici, Susanna Ricco, Rahul Sukthankar, Cordelia Schmid, Jitendra Malik

arXiv (2017)
Accelerating Eulerian Fluid Simulation With Convolutional Networks

Jonathan Tompson, Kristofer Schlachter, Pablo Sprechmann, Ken Perlin

ICML (2017)
Adversarial Machine Learning at Scale

Alexey Kurakin, Ian J. Goodfellow, Samy Bengio

ICLR (2017)
Adversarial Patch

Tom Brown, Dandelion Mane, Aurko Roy, Martin Abadi, Justin Gilmer

NIPS Workshop (2017)
Adversarial examples in the physical world

Alexey Kurakin, Ian Goodfellow, Samy Bengio

ICLR Workshop (2017)
Ambisonics soundfield navigation using directional decomposition and path distance estimation

Andrew Allen, Bastiaan Kleijn

(2017)
Appearance-and-Relation Networks for Video Classification

Limin Wang, Wei Li, Wen Li, Luc Van Gool

arXiv (2017)
Are GANs Created Equal? A Large-Scale Study

Mario Lučić, Karol Kurach, Marcin Michalski, Sylvain Gelly, Olivier Bousquet

arXiv (2017)
Associative Domain Adaptation

Philip Haeusser, Thomas Frerix, Alexander Mordvintsev, Daniel Cremers

International Conference on Computer Vision (ICCV), IEEE (2017) (to appear)
Attention-based Extraction of Structured Information from Street View Imagery

Zbigniew Wojna, Alex Gorban, Dar-Shyang Lee, Kevin Murphy, Qian Yu, Yeqing Li, Julian Ibarz

ICDAR (2017), pp. 8
Audio Set: An ontology and human-labeled dataset for audio events

Jort F. Gemmeke, Daniel P. W. Ellis, Dylan Freedman, Aren Jansen, Wade Lawrence, R. Channing Moore, Manoj Plakal, Marvin Ritter

Proc. IEEE ICASSP 2017, New Orleans, LA (to appear)
Automatic Spatially-aware Fashion Concept Discovery

Xintong Han, Zuxuan Wu, Phoenix X Huang, Xiao Zhang, Menglong Zhu, Yuan Li, Yang Zhao, Larry S Davis

ICCV (2017)
BranchOut: Regularization for Online Ensemble Tracking with CNNs

Bohyung Han, Hartwig Adam, Jack Sim

CVPR (2017) (to appear)
CNN Architectures for Large-Scale Audio Classification

Shawn Hershey, Sourish Chaudhuri, Daniel P. W. Ellis, Jort F. Gemmeke, Aren Jansen, Channing Moore, Manoj Plakal, Devin Platt, Rif A. Saurous, Bryan Seybold, Malcolm Slaney, Ron Weiss, Kevin Wilson

International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE (2017)
Cognitive Mapping and Planning for Visual Navigation

Saurabh Gupta, James Davidson, Sergey Levine, Rahul Sukthankar, Jitendra Malik

CVPR (2017)
Conditional Image Synthesis With Auxiliary Classifier GANs

Augustus Odena, Christopher Olah, Jonathon Shlens

ICML (2017)
Context-aware Captions from Context-agnostic Supervision

Shanmukha Ramakrishna Vedantam, Samy Bengio, Kevin Murphy, Devi Parikh, Gal Chechik

CVPR (2017)
CycleGAN, a Master of Steganography

Casey Chu, Andrey Zhmoginov, Mark Sandler

NIPS 2017 Workshop “Machine Deception” (2017)
Decomposing Motion and Content for Natural Video Sequence Prediction

Ruben Villegas, Jimei Yang, Seunghoon Hong, Xunyu Lin, Honglak Lee

ICLR (2017)
Deep Bilateral Learning for Real-Time Image Enhancement

Michaël Gharbi, Jiawen Chen, Jonathan T. Barron, Sam Hasinoff, Frédo Durand

ACM Transactions on Graphics, ACM (2017)
Deep Metric Learning via Facility Location

Hyun Oh Song, Stefanie Jegelka, Vivek Rathod, Kevin Murphy

IEEE CVPR (2017)
Deep Visual Foresight for Planning Robot Motion

Sergey Levine, Chelsea Finn

ICRA (2017)
Deformable Shape Completion with Graph Convolutional Autoencoders

Or Litany, Alex Bronstein, Michael Bronstein, Ameesh Makadia

CVPR 2018 (2017) (to appear)
Deformable block based motion estimation in omnidirectional image sequences

Francesca De Simone, Neil Birkbeck, Balu Adsumilli, Pascal Frossard

IEEE 19th International Workshop on Multimedia Signal Processing (2017)
Detecting Cancer Metastases on Gigapixel Pathology Images

Yun Liu, Krishna Kumar Gadepalli, Mohammad Norouzi, George Dahl, Timo Kohlberger, Subhashini Venugopalan, Aleksey S Boyko, Aleksei Timofeev, Philip Q Nelson, Greg Corrado, Jason Hipp, Lily Peng, Martin Stumpe

MICCAI (2017)
Encoding Bitrate Optimization Using Playback Statistics for HTTP-based Adaptive Video Streaming

Chao Chen, Yao-Chung Lin, Anil Kokaram, Steve Benting

arXiv (2017)
End-to-End Learning of Semantic Grasping

Eric Jang, Julian Ibarz, Peter Pastor Sampedro, Sergey Levine, Sudheendra Vijayanarasimhan

CoRL 2017 (2017) (to appear)
Enhancing Video Summarization via Vision-Language Embedding

Bryan Plummer, Matthew Brown, Svetlana Lazebnik

IEEE International Conference on Computer Vision and Pattern Recognition (2017)
Exploring the structure of a real-time, arbitrary neural artistic stylization network

Golnaz Ghiasi, Honglak Lee, Manjunath Kudlur, Vincent Dumoulin, Jonathon Shlens

Proceedings of the 28th British Machine Vision Conference (BMVC) (2017)
Extreme clicking for efficient object annotation

Dim Papadopoulos, Jasper Uijlings, Frank Keller, Vittorio Ferrari

ICCV (2017)
Eyemotion: Classifying facial expressions in VR using eye-tracking cameras

Steven Hickson, Nick Dufour, Avneesh Sud, Vivek Kwatra, Irfan Essa

arXiv, https://arxiv.org/abs/1707.07204 (2017)
Fast Fourier Color Constancy

Jonathan T. Barron, Yun-Ta Tsai

CVPR (2017)
Feature agnostic geometric alignment

Dror Aiger, Yoni Weill

Patent (2017)
Geometry-Based Next Frame Prediction from Monocular Video

Reza Mahjourian, Martin Wicke, Anelia Angelova

Intelligent Vehicles Symposium (2017)
Guetzli: Perceptually Guided JPEG Encoder

Jyrki Alakuijala, Robert Obryk, Ostap Stoliarchuk, Zoltan Szabadka, Lode Vandevenne, Jan Wassenberg

arXiv (2017)
Headset Removal for Virtual and Mixed Reality

Christian Frueh, Avneesh Sud, Vivek Kwatra

SIGGRAPH Talks 2017, ACM SIGGRAPH (to appear)
Human and Machine Hearing: Extracting Meaning from Sound

Richard F. Lyon

Cambridge University Press (2017)
Improving Phenotypic Measurements in High-Content Imaging Screens

D. Mike Ando, Cory McLean, Marc Berndl

bioRxiv (2017)
Improving Smiling Detection with Race and Gender Diversity

Hee Jung Ryu, Margaret Mitchell, Hartwig Adam

arXiv (2017)
Incoherent idempotent ambisonics rendering

W. Bastiaan Kleijn, Andrew Allen, Jan Skoglund, Felicia Lim

2017 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (2017)
Joint Wideband Source Localization and Acquisition Based on a Grid-Shift Approach

Christos Tzagkarakis, Bastiaan Kleijn, Jan Skoglund

2017 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (2017)
Large-Scale Audio Event Discovery in One Million YouTube Videos

Aren Jansen, Jort F. Gemmeke, Daniel P. W. Ellis, Xiaofeng Liu, Wade Lawrence, Dylan Freedman

Proceedings of ICASSP (2017) (to appear)
Large-Scale Content-Only Video Recommendation

Joonseok Lee, Sami Abu-El-Haija

International Conference on Computer Vision Workshop, Computer Vision Foundation (2017), pp. 987 - 995
Large-Scale Image Retrieval with Attentive Deep Local Features

Hyeonwoo Noh, Andre Araujo, Jack Sim, Tobias Weyand, Bohyung Han

Proc. ICCV (2017) (to appear)
Learning Discriminative and Transformation Covariant Local Feature Detectors

Xu Zhang, Felix Yu, Svebor Karaman, Shih-Fu Chang

CVPR (2017)
Learning From Noisy Large-Scale Datasets With Minimal Supervision

Andreas Veit, Neil Alldrin, Gal Chechik, Ivan Krasin, Abhinav Gupta, Serge Belongie

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2017), pp. 839-847
Learning Spread-out Local Feature Descriptors

Xu Zhang, Felix Yu, Sanjiv Kumar, Shih-Fu Chang

ICCV (2017)
Learning Unified Embedding for Apparel Recognition

Yang Song, Yuan Li, Bo Wu, Chao-Yeh Chen, Xiao Zhang, Hartwig Adam

ICCV Computational Fashion Workshop (2017)
Learning by Association - A versatile semi-supervised training method for neural networks

Philip Haeusser, Alexander Mordvintsev, Daniel Cremers

IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017)
Learning to Generate Long-term Future via Hierarchical Prediction

Ruben Villegas, Jimei Yang, Yuliang Zou, Sungryull Sohn, Xunyu Lin, Honglak Lee

ICML (2017)
Learning typographic style: from discrimination to synthesis

Shumeet Baluja

Machine Vision and Applications, vol. 28, Issues 5-6 (2017), pp. 551-568
Learning with Proxy Supervision for End-To-End Visual Learning

Jiří Čermák, Anelia Angelova

Deep Learning for Vehicle Perception Workshop, Intelligent Vehicles Symposium (2017)
Modulating early visual processing by language

Harm de Vries, Florian Strub, Jérémie Mary, Hugo Larochelle, Olivier Pietquin, Aaron Courville

NIPS (2017)
No Fuss Distance Metric Learning using Proxies

Yair Movshovitz-Attias, Alexander Toshev, Thomas Leung, Sergey Ioffe, Saurabh Singh

International Conference on Computer Vision (ICCV), IEEE (2017) (to appear)
Novel inter and intra prediction tools under consideration for the emerging AV1 video codec

Urvang Joshi, Debargha Mukherjee, Jingning Han, Yue Chen, Sarah Parker, Hui Su, Angie Chiang, Yaowu Xu, Zoe Liu, Yunqing Wang, Jim Bankoski, Chen Wang, Emil Keyder

SPIE Optical Engineering + Applications, vol. 10396 (2017), 10396 - 10396 - 13
Novel modes and adaptive block scanning order for intra prediction in AV1

Ofer Hadar, Ariel Shleifer, Debargha Mukherjee, Urvang Joshi, Itai Mazar, Michael Yuzvinsky, Nitzan Tavor, Nati Itzhak, Raz Birman

SPIE Optical Engineering + Applications, vol. 10396 (2017), 10396 - 10396 - 10
Object category learning and retrieval with weak supervision

Steven Hickson, Anelia Angelova, Irfan Essa, Rahul Sukthankar

NIPS Workshop on Learning With Limited Labeled Data (2017)
Onsets and Frames: Dual-Objective Piano Transcription

Curtis Hawthorne, Erich Elsen, Jialin Song, Adam Roberts, Ian Simon, Colin Raffel, Jesse Engel, Sageev Oore, Douglas Eck

arXiv Preprint (2017)
PixColor: Pixel Recursive Colorization

Sergio Guadarrama, Ryan Dahl, David Bieber, Mohammad Norouzi, Jonathon Shlens, Kevin Murphy

Proceedings of the 28th British Machine Vision Conference (BMVC) (2017)
Pixel Recursive Super Resolution

Ryan Dahl, Mohammad Norouzi, Jonathon Shlens

ICCV (2017)
Practically Efficient Nonlinear Acoustic Echo Cancellers Using Cascaded Block RLS and FLMS Adaptive Filters

Yiteng (Arden) Huang, Jan Skoglund, Alejandro Luebs

ICASSP (2017)
Predicting Cardiovascular Risk Factors in Retinal Fundus Photographs using Deep Learning

Ryan Poplin, Avinash Vaidyanathan Varadarajan, Katy Blumer, Yun Liu, Mike McConnell, Greg Corrado, Lily Peng, Dale Webster

arXiv (2017)
Quantitative evaluation of omnidirectional video quality

Neil Birkbeck, Chip Brown, Rob Suderman

Quality of Multimedia Experience (QoMEX) (2017)
Revisiting Unreasonable Effectiveness of Data in Deep Learning Era

Chen Sun, Abhinav Shrivastava, Saurabh Singh, Abhinav Gupta

ICCV (2017)
Seamless texturing of 3D meshes of objects from multiple views

Yoni Weill, Dror Aiger

Patent (2017)
Self-Supervised Learning of Structure and Motion from Video

Aikaterini Fragkiadaki, Bryan Seybold, Rahul Sukthankar, Sudheendra Vijayanarasimhan, Susanna Ricco

arXiv (2017)
Show, Ask, Attend, and Answer: A Strong Baseline For Visual Question Answering

Vahid Kazemi, Ali Elqursh

arXiv (2017)
Soft 3D Reconstruction for View Synthesis

Eric Penner, Li Zhang

ACM Transactions on Graphics (Proc. SIGGRAPH Asia), vol. 36 (2017) (to appear)
Spatially Adaptive Computation Time for Residual Networks

Dmitry P. Vetrov, Jonathan Huang, Li Zhang, Maxwell Collins, Michael Figurnov, Ruslan Salakhutdinov, Yukun Zhu

IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017)
Spatially adaptive image compression using a tiled deep network

David Minnen, George Toderici, Michele Covell, Troy Chinen, Nick Johnston, Joel Shor, Sung Jin Hwang, Damien Vincent, Saurabh Singh

Proceedings of the International Conference on Image Processing (2017), pp. 2796-2800
Spatiotemporal atlas parameterization for evolving meshes

Fabian Prada, Misha Kazhdan, Ming Chuang, Alvaro Collet, Hugues Hoppe

ACM Transactions on Graphics, vol. 36 (2017)
Speed and accuracy trade-offs for modern convolutional object detectors

Alireza Fathi, Anoop Korattikara, Chen Sun, Ian Fischer, Jonathan Huang, Kevin Murphy, Menglong Zhu, Sergio Guadarrama, Vivek Rathod, Yang Song, Zbigniew Wojna

CVPR 2017, Honolulu, Hawaii (2017)
Strategies for Foveated Compression and Transmission

Behnam Bastani

Symposium for Information Display, Palisades Convention Management, Inc. 411 Lafayette Street, Suite 201 New York, NY 10003 (2017) (to appear)
Supervision via Competition: Robot Adversaries for Learning Tasks

Lerrel Pinto, James Davidson, Abhinav Gupta

ICRA (2017)
Synthesizing Normalized Faces from Facial Identity Features

Forrester Cole, David Belanger, Dilip Krishnan, Aaron Sarna, Inbar Mosseri, William T. Freeman

Conference on Computer Vision and Pattern Recognition (CVPR) (2017) (to appear)
TALL: Temporal Activity Localization via Language Query

Jiyang Gao, Chen Sun, Zhenheng Yang, Ram Nevatia

ICCV (2017)
TURN TAP: Temporal Unit Regression Networks for Temporal Action Proposals

Jiyang Gao, Zhenheng Yang, Chen Sun, Kan Chen, Ram Nevatia

ICCV (2017)
The Devil is in the Decoders

Zbigniew Wojna, Vittorio Ferrari, Sergio Guadarrama, Nathan Silberman, Liang-chieh Chen, Alireza Fathi, Jasper Uijlings

BMVC (2017)
The Kinetics Human Action Video Dataset

Andrew Zisserman, Joao Carreira, Karen Simonyan, Will Kay, Brian Zhang, Chloe Hillier, Sudheendra Vijayanarasimhan, Fabio Viola, Tim Green, Trevor Back, Paul Natsev, Mustafa Suleyman

arXiv (2017)
The power of sparsity in convolutional neural networks

Soravit Changpinyo, Mark Sandler, Andrey Zhmoginov

arXiv (2017)
Three-dimensional models visual differential

Yoni Weill, Dror Aiger

Patent (2017)
Towards Accurate Multi-person Pose Estimation in the Wild

George Papandreou, Tyler Zhu, Nori Kanazawa, Alexander Toshev, Jonathan Tompson, Chris Bregler, Kevin Murphy

CVPR (2017)
Towards Learning Semantic Audio Representations from Unlabeled Data

Aren Jansen, Manoj Plakal, Ratheet Pandya, Dan Ellis, Shawn Hershey, Jiayang Liu, Channing Moore, Rif A. Saurous

NIPS Workshop on Machine Learning for Audio Signal Processing (ML4Audio) (2017) (to appear)
Training object class detectors with click supervision

Dim Papadopoulos, Jasper Uijlings, Frank Keller, Vittorio Ferrari

CVPR (2017)
Training ultra-deep CNNs with critical initialization

Lechao Xiao, Yasaman Bahri, Sam Schoenholz, Jeffrey Pennington

NIPS Workshop (2017) (to appear)
Unsupervised Learning of Depth and Ego-Motion from Video

Tinghui Zhou, Matthew Brown, Noah Snavely, David Lowe

Computer Vision and Pattern Recognition, IEEE (2017)
Unsupervised Perceptual Rewards for Imitation Learning

Pierre Sermanet, Kelvin Xu, Sergey Levine

Proceedings of Robotics: Science and Systems (RSS 2017) + Deep Learning for Action and Interaction workshop at NIPS (2016) + International Conference on Learning Representations (ICLR 2017) Workshop (2017)
Unsupervised Pixel-level Domain Adaptation with Generative Adversarial Networks

Konstantinos Bousmalis, Nathan Silberman, David Dohan, Dumitru Erhan, Dilip Krishnan

CVPR (2017)
Unsupervised deep clustering for semantic object retrieval

Steven Hickson, Anelia Angelova, Irfan Essa, Rahul Sukthankar

BayLearn, http://www.baylearn.org/ (2017)
Using Perceptual Metrics for Something Other Than Compression

Anil Kokaram

IS&T, Hyatt Regency, Burlingame, California (2017)
Video Frame Synthesis Using Deep Voxel Flow

Ziwei Liu, Raymond Yeh, Xiaoou Tang, Yiming Liu, Aseem Agarwala

Proceedings of International Conference on Computer Vision (ICCV) (2017) (to appear)
XGAN: Unsupervised Image-to-Image Translation for many-to-many Mappings

Amelie Royer, Konstantinos Bousmalis, Stephan Gouws, Fred Bertsch, Inbar Mosseri, Forrester Cole, Kevin Murphy

arXiv (2017)
YouTube-BoundingBoxes: A Large High-Precision Human-Annotated Dataset for Object Detection in Video

Esteban Real, Jon Shlens, Stefano Mazzocchi, Vincent Vanhoucke, Xin Pan

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7464-7473
A Dynamic Motion Vector Referencing Scheme for Video Coding

Jingning Han, Yaowu Xu, James Bankoski

IEEE ICIP (2016)
A Deep Matrix Factorization Method for Learning Attribute Representations

George Trigeorgis, Konstantinos Bousmalis, Stefanos Zafeiriou, Björn W. Schuller

IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 39 (2016), pp. 417-429
A No-reference Perceptual Quality Metric for Videos Distorted by Spatially Correlated Noise

Chao Chen, Mohammad Izadi, Anil Kokaram

ACM Multimedia 2016, Amsterdam, The Netherlands (to appear)
A Perceptual Visibility Metric for Banding Artifacts

Yilin Wang, Sang-Uok Kum, Chao Chen, Anil Kokaram

IEEE International Conference on Image Processing (2016) (to appear)
A Staircase Transform Coding Scheme for Screen Content Video Coding

Cheng Chen, Jingning Han, Yaowu Xu, James Bankoski

IEEE ICIP (2016)
A Subjective Study for the Design of Multi-resolution ABR Video Streams with the VP9 Codec

Chao Chen, Sasi Inguva, Andrew Rankin, Anil Kokaram

SPIE Electronic Imaging, Human Visual Perception (2016) (to appear)
A cloud-based large-scale distributed video analysis system

Yongzhe Wang, Wei-Ta Chen, Huahui Wu, Anil Kokaram, Jaron Schaeffer

IEEE International Conference on Image Processing (2016)
An Acoustic Keystroke Transient Canceler for Speech Communication Terminals Using a Semi-Blind Adaptive Filter Model

Herbert Buchner, Simon Godsill, Jan Skoglund

ICASSP (2016)
Adversarial Autoencoders

Alireza Makhzani, Jonathon Shlens, Navdeep Jaitly, Ian Goodfellow

International Conference on Learning Representations (2016)
Attend, Infer, Repeat: Fast Scene Understanding with Generative Models

S. M. Ali Eslami, Nicolas Heess, Theophane Weber, Yuval Tassa, David Szepesvari, Koray Kavukcuoglu, Geoffrey E. Hinton

NIPS (2016)
Audio Deepdream: Optimizing raw audio with convolutional networks

Adam Roberts, Cinjon Resnick, Diego Ardila, Doug Eck

International Society for Music Information Retrieval Conference, Google Brain (2016)
Bi-Magnitude Processing Framework for Nonlinear Acoustic Echo Cancellation on Android Devices

Yiteng (Arden) Huang, Jan Skoglund, Alejandro Luebs

International Workshop on Acoustic Signal Enhancement 2016 (IWAENC 2016)
Bilateral Guided Upsampling

Jiawen Chen, Andrew Adams, Neal Wadhwa, Sam Hasinoff

ACM Transactions on Graphics (Proceedings of SIGGRAPH Asia 2016) (2016)
Bitrate Classification of Twice-Encoded Audio using Objective Quality Features

Colm Sloan, Naomi Harte, Anil Kokaram, Damien Kelly, Andrew Hines

8th International Conference on Quality of Multimedia Experience (QoMEX 2016)
Blockout: Dynamic Model Selection for Hierarchical Deep Networks

Calvin Murdock, Zhen Li, Howard Zhou, Tom Duerig

CVPR 2016
Burst photography for high dynamic range and low-light imaging on mobile cameras

Sam Hasinoff, Dillon Sharlet, Ryan Geiss, Andrew Adams, Jonathan T. Barron, Florian Kainz, Jiawen Chen, Marc Levoy

ACM Transactions on Graphics (Proc. SIGGRAPH Asia 2016) (2016)
Chained Predictions Using Convolutional Neural Networks

Georgia Gkioxari, Navdeep Jaitly, Alexander Toshev

European Conference on Computer Vision (2016)
Computer Vision for Active and Assisted Living

Rainer Planinc, Alexandros Chaaraoui, Martin Kampel, Francisco Florez-Revuelta

Active and Assisted Living: Technologies and Applications, IET - The institution of Engineering and Technology, Savoy Place London WC2R 0BL UK (2016)
Content-based Related Video Recommendations

Joonseok Lee, Nisarg Kothari, Paul Natsev

Advances in Neural Information Processing Systems (NIPS) Demonstration Track (2016)
DeepStereo: Learning to Predict New Views From the World's Imagery

John Flynn, Ivan Neulander, James Philbin, Noah Snavely

The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016)
Density Estimation using Real NVP

Laurent Dinh, Jascha Sohl-Dickstein, Samy Bengio

arXiv preprint (2016)
Detecting Events and Key Actors in Multi-Person Videos

Vignesh Ramanathan, Jonathan Huang, Sami Abu-El-Haija, Alexander Gorban, Kevin Murphy, Li Fei-Fei

Computer Vision and Pattern Recognition (CVPR) (2016)
Discovering the physical parts of an articulated object class from multiple videos

Luca DelPero, Susanna Ricco, Rahul Sukthankar, Vittorio Ferrari

CVPR (2016)
Do-It-Yourself Lighting Design for Product Videography

Ivaylo Boyadzhiev, Jiawen Chen, Kavita Bala, Sylvain Paris

IEEE International Conference on Computational Photography (2016)
Domain Separation Networks

Konstantinos Bousmalis, George Trigeorgis, Nathan Silberman, Dilip Krishnan, Dumitru Erhan

NIPS 2016 (2016)
Exploiting cyclic symmetry in convolutional neural networks

Jeffrey De Fauw, Koray Kavukcuoglu, Sander Dieleman

International Conference on Machine Learning (2016)
G-RMI Object Detection

Alireza Fathi, Anoop Korattikara, Chen Sun, Ian Fischer, Jonathan Huang, Kevin Murphy, Menglong Zhu, Sergio Guadarrama, Vivek Rathod, Yang Song, Zbigniew Wojna

2nd ImageNet and COCO Visual Recognition Challenges Joint Workshop, Amsterdam (2016)
Globally Optimized Least-Squares Post-Filtering for Microphone Array Speech Enhancement

Yiteng (Arden) Huang, Alejandro Luebs, Jan Skoglund, W. Bastiaan Kleijn

ICASSP (2016)
Generation and Comprehension of Unambiguous Object Descriptions

Junhua Mao, Jonathan Huang, Alexander Toshev, Oana Camburu, Kevin Murphy

Computer Vision and Pattern Recognition (2016)
Geometry-driven quantization for omnidirectional image coding

Francesca De Simone, Pascal Frossard, Paul Wilkins, Neil Birkbeck, Anil Kokaram

Picture Coding Symposium (PCS) (2016)
Improving the Robustness of Deep Neural Networks via Stability Training

Stephan Zheng, Yang Song, Thomas Leung, Ian Goodfellow

CVPR'2016, IEEE (to appear)
Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning

Christian Szegedy, Sergey Ioffe, Vincent Vanhoucke, Alex A. Alemi

ICLR 2016 Workshop
Inverting Face Embeddings with Convolutional Neural Networks

Andrey Zhmoginov, Mark Sandler

arXiv (2016)
Jump: Virtual Reality Video

Robert Anderson, David Gallup, Jonathan T. Barron, Janne Kontkanen, Noah Snavely, Carlos Hernandez Esteban, Sameer Agarwal, Steven M. Seitz

ACM Transactions on Graphics (Proc. of SIGGRAPH Asia 2016) (2016) (to appear)
Learning Typographic Style

Shumeet Baluja

arXiv (2016)
Leveraging Contextual Cues for Generating Basketball Highlights

Vinay Bettadapura, Caroline Pantofaru, Irfan Essa

ACM Multimedia (2016)
Multi-Task Convolutional Music Models

Adam Roberts, Cinjon Resnick, Diego Ardila, Doug Eck

BayLearn (2016)
On Pre-Filtering Strategies for the GCC-PHAT Algorithm

Hong-Goo Kang, Michael Graczyk, Jan Skoglund

International Workshop on Acoustic Signal Enhancement 2016 (IWAENC 2016)
On The Existence of Epipolar Matrices

Sameer Agarwal, Hon Leung Lee, Bernd Sturmfels, Rekha R. Thomas

International Journal of Computer Vision (2016), pp. 1-13
Perspective Transformer Nets: Learning Single-View 3D Object Reconstruction without 3D Supervision

Xinchen Yan, Jimei Yang, Ersin Yumer, Yijie Guo, Honglak Lee

NIPS (2016)
Perspective-aware manipulation of portrait photos

Ohad Fried, Eli Shechtman, Dan B Goldman, Adam Finkelstein

ACM Transactions on Graphics (Proc. SIGGRAPH), vol. 35(4) (2016)
PlaNet - Photo Geolocation with Convolutional Neural Networks

Tobias Weyand, Ilya Kostrikov, James Philbin

European Conference on Computer Vision (ECCV) (2016)
Rethinking the Inception Architecture for Computer Vision

Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens, Zbigniew Wojna

Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (2016)
Robust Estimation of Reverberation Time Using Polynomial Roots

Ian Kelly, Francis Boland, Jan Skoglund

AES 60th Conference on Dereverberation and Reverberation of Audio, Music, and Speech, Google Ireland Ltd. (2016)
SSD: Single Shot MultiBox Detector

Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C. Berg

Proceedings of the European Conference on Computer Vision (ECCV) (2016) (to appear)
Scalable Learning of Non-Decomposable Objectives

Elad Eban, Mariano Schain, Alan Mackey, Ariel Gordon, Rif A. Saurous, Gal Elidan

arXiv preprint arXiv:1608.04802 (2016)
Semantic Video Trailers

Harrie Oosterhuis, Sujith Ravi, Mike Bendersky

ICML 2016 Workshop on Multi-View Representation Learning
The Fast Bilateral Solver

Jonathan T. Barron, Ben Poole

ECCV (2016)
The Unreasonable Effectiveness of Noisy Data for Fine-Grained Recognition

Jonathan Krause, Andrew Howard, Benjamin Sapp, Howard Zhou, Alexander Toshev, Tom Duerig, James Philbin, Li Fei-Fei

Computer Vision and Pattern Recognition (2016)
The little Engine that Could: Regularization by Denoising (RED)

Yaniv Romano, Michael Elad, Peyman Milanfar

arXiv (2016) (to appear)
Understanding Image and Text Simultaneously: a Dual Vision-Language Machine Comprehension Task

Nan Ding, Sebastian Goodman, Fei Sha, Radu Soricut

arXiv, https://arxiv.org/abs/1612.07833 (2016)
Unsupervised Learning for Physical Interaction through Video Prediction

Chelsea Finn, Ian Goodfellow, Sergey Levine

arXiv e-prints (2016)
Webly-supervised Video Recognition by Mutually Voting for Relevant Web Images and Web Video Frames

Chuang Gan, Chen Sun, Lixin Duan, Boqing Gong

European Conference on Computer Vision (ECCV) (2016) (to appear)
YouTube-8M: A Large-Scale Video Classification Benchmark

Sami Abu-El-Haija, Nisarg Kothari, Joonseok Lee, Apostol (Paul) Natsev, George Toderici, Balakrishnan Varadarajan, Sudheendra Vijayanarasimhan

arXiv:1609.08675 (2016)
A 6 µW per Channel Analog Biomimetic Cochlear Implant Processor Filterbank Architecture With Across Channels AGC

Guang Wang, Richard F. Lyon, Emmanuel M. Drakakis

IEEE Transactions on Biomedical Circuits and Systems, vol. 9 (2015), pp. 72-86
A Computational Approach for Obstruction-Free Photography

Tianfan Xue, Michael Rubinstein, Ce Liu, William T. Freeman

ACM Transactions on Graphics, vol. 34, no. 4 (Proc. SIGGRAPH) (2015)
A World of Movement

Fredo Durand, William T. Freeman, Michael Rubinstein

Scientific American, vol. 312, no. 1 (2015)
An Exploration of Parameter Redundancy in Deep Networks with Circulant Projections

Yu Cheng, Felix X. Yu, Rogerio Feris, Sanjiv Kumar, Shih-Fu Chang

International Conference on Computer Vision (ICCV) (2015)
An estimation-theoretic approach to video denoising

Jingning Han, Timothy Kopp, Yaowu Xu

2015 IEEE International Conference on Image Processing, IEEE, pp. 4273-4277
Attention for fine-grained categorization

Pierre Sermanet, Andrea Frome, Esteban Real

International Conference on Learning Representations (ICLR 2015) workshop
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

Sergey Ioffe, Christian Szegedy

Proceedings of The 32nd International Conference on Machine Learning (2015), pp. 448-456
Best-Buddies Similarity for Robust Template Matching

Tali Dekel, Shaul Oron, Michael Rubinstein, Shai Avidan, William T. Freeman

IEEE Conf. on Computer Vision and Pattern Recognition (CVPR) (2015)
Beyond Short Snippets: Deep Networks for Video Classification

Joe Yue-Hei Ng, Matthew Hausknecht, Sudheendra Vijayanarasimhan, Oriol Vinyals, Rajat Monga, George Toderici

Computer Vision and Pattern Recognition (2015)
Convolutional Color Constancy

Jonathan T Barron

ICCV (2015) (to appear)
Detection and Suppression of Keyboard Transient Noise in Audio Streams with Auxiliary Keybed Microphone

Simon Godsill, Herbert Buchner, Jan Skoglund

ICASSP 2015, IEEE
Direct-to-Reverberant Ratio Estimation Using a Null-Steered Beamformer

James Eaton, Alastair Moore, Patrick Naylor, Jan Skoglund

ICASSP 2015, IEEE
Deep Networks With Large Output Spaces

Sudheendra Vijayanarasimhan, Jonathon Shlens, Rajat Monga, Jay Yagnik

International Conference on Learning Representations (2015)
Efficient Large Scale Video Classification

Balakrishnan Varadarajan, George Toderici, Paul Natsev, Sudheendra Vijayanarasimhan

dblp computer science bibliography, http://dblp.org (2015) (to appear)
Egocentric Field-of-View Localization Using First-Person Point-of-View Devices

Vinay Bettadapura, Irfan Essa, Caroline Pantofaru

Proceedings of Winter Conference on Applications of Computer Vision (WACV), IEEE (2015)
Fast Bilateral-Space Stereo for Synthetic Defocus

Jonathan T Barron, Andrew Adams, YiChang Shih, Carlos Hernández

CVPR (2015)
Fast Orthogonal Projection Based on Kronecker Product

Xu Zhang, Felix X. Yu, Ruiqi Guo, Sanjiv Kumar, Shengjin Wang, Shih-Fu Chang

International Conference on Computer Vision (ICCV) (2015)
Going Deeper with Convolutions

Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich

Computer Vision and Pattern Recognition (CVPR) (2015)
Im2Calories: towards an automated mobile vision food diary

Austin Myers, Nick Johnston, Vivek Rathod, Anoop Korattikara, Alex Gorban, Nathan Silberman, Sergio Guadarrama, George Papandreou, Jonathan Huang, Kevin Murphy

ICCV (2015)
IsoMatch: Creating Informative Grid Layouts

Ohad Fried, Stephen DiVerdi, Maciej Halber, Elena Sizikova, Adam Finkelstein

Computer Graphics Forum (Proceedings of Eurographics), vol. 34(2) (2015) (to appear)
Label Transition and Selection Pruning and Automatic Decoding Parameter Optimization for Time-Synchronous Viterbi Decoding

Yasuhisa Fujii, Dmitriy Genzel, Ashok C. Popat, Remco Teunen

13th International Conference on Document Analysis and Recognition (ICDAR), IEEE (2015), pp. 756-760
Large Scale Business Discovery from Street Level Imagery

Qian Yu, Christian Szegedy, Martin C. Stumpe, Liron Yatziv, Vinay Shet, Julian Ibarz, Sacha Arnoud

arXiv (2015)
Learning semantic relationships for better action retrieval in images

Vignesh Ramanathan, Congcong Li, Jia Deng, Wei Han, Zhen Li, Kunlong Gu, Yang Song, Samy Bengio, Chuck Rosenberg, Li Fei-Fei

CVPR (2015)
Object Recognition from Short Videos for Robotic Perception

Ivan Bogun, Anelia Angelova, Navdeep Jaitly

CoRR, vol. abs/1509.01602 (2015)
Ontological Supervision for Fine Grained Classification of Street View Storefronts

Yair Movshovitz-Attias, Qian Yu, Martin C. Stumpe, Vinay Shet, Sacha Arnoud, Liron Yatziv

CVPR (2015)
Palette-based Photo Recoloring

Huiwen Chang, Ohad Fried, Yiming Liu, Stephen DiVerdi, Adam Finkelstein

Transactions on Graphics (Proceedings of SIGGRAPH) (2015) (to appear)
Pedestrian Detection with a Large-Field-Of-View Deep Network

Anelia Angelova, Alex Krizhevsky, Vincent Vanhoucke

Proceedings of ICRA 2015
Pose Embeddings: A Deep Architecture for Learning to Match Human Poses

Greg Mori, Caroline Pantofaru, Nisarg Kothari, Thomas Leung, George Toderici, Alexander Toshev, Weilong Yang

arXiv (2015)
Probabilistic Label Relation Graphs with Ising Models

Nan Ding, Jia Deng, Kevin Murphy, Hartmut Neven

International Conference on Computer Vision (2015)
Real-Time Grasp Detection Using Convolutional Neural Networks

Joseph Redmon, Anelia Angelova

International Conference on Robotics and Automation (ICRA), IEEE (2015)
Real-Time Pedestrian Detection With Deep Network Cascades

Anelia Angelova, Alex Krizhevsky, Vincent Vanhoucke, Abhijit Ogale, Dave Ferguson

Proceedings of BMVC 2015
Refer-to-as Relations as Semantic Knowledge

Song Feng, Sujith Ravi, Ravi Kumar, Polina Kuznetsova, Wei Liu, Alex Berg, Tamara Berg, Yejin Choi

AAAI Conference on Artificial Intelligence (2015)
Show and tell: A neural image caption generator

Oriol Vinyals, Alexander Toshev, Samy Bengio, Dumitru Erhan

Computer Vision and Pattern Recognition (2015)
Speech Acoustic Modeling from Raw Multichannel Waveforms

Yedid Hoshen, Ron Weiss, Kevin W Wilson

International Conference on Acoustics, Speech, and Signal Processing, IEEE (2015)
Temporal Localization of Fine-Grained Actions in Videos by Domain Transfer from Web Images

Chen Sun, Sanketh Shetty, Rahul Sukthankar, Ram Nevatia

ACM Multimedia (2015)
The latest open-source video codec VP9 - An overview and preliminary results

Debargha Mukherjee, Jingning Han, Jim Bankoski, Ronald S Bultje, Adrian Grange, John Koleszar, Paul Wilkins, Yaowu Xu

SMPTE Motion Imaging Journal, vol. 124 (2015)
VIP: Finding Important People in Images

Clint Solomon Mathialagan, Andrew C. Gallagher, Dhruv Batra

Computer Vision and Pattern Recognition (2015), pp. 4858-4966
ViSQOLAudio: An objective audio quality metric for low bitrate codecs

Andrew Hines, Eoin Gillen, Damien Kelly, Jan Skoglund, Anil Kokaram, Naomi Harte

The Journal of the Acoustical Society of America, vol. 137 (6) (2015), EL449-EL455
Visual Vibrometry: Estimating Material Properties from Small Motion in Video

Abe Davis, Katherine L. Bouman, Justin G. Chen, Michael Rubinstein, Fredo Durand, William T. Freeman

IEEE Conf. on Computer Vision and Pattern Recognition (CVPR) (2015)
What’s Cookin’? Interpreting Cooking Videos using Text, Speech and Vision

Jonathan Malmaud, Jonathan Huang, Vivek Rathod, Nicholas Johnston, Andrew Rabinovich, Kevin Murphy

North American Chapter of the Association for Computational Linguistics – Human Language Technologies (NAACL HLT 2015) (to appear)
An optimized template matching approach to intra coding in video/image compression

Hui Su, Jingning Han, Yaowu Xu

IS&T/SPIE Electronic Imaging, 2014, SPIE, pp. 1-6
Auto-Rectification of User Photos

Krishnendu Chaudhury (aka Krish Chaudhury), Stephen DiVerdi, Sergey Ioffe

Proceedings of International Conference on Image Processing, ICIP, IEEE (2014), pp. 3479-3483
Co-Segmentation of Textured 3D Shapes with Sparse Annotations

M. Ersin Yumer, Ameesh Makadia

Computer Vision and Pattern Recognition (CVPR) (2014)
DaMN – Discriminative and Mutually Nearest: Exploiting Pairwise Category Proximity for Video Action Recognition

Rui Hou, Amir Roshan Zamir, Rahul Sukthankar, Mubarak Shah

Proceedings of European Conference on Computer Vision (2014)
DeepPose: Human Pose Estimation via Deep Neural Networks

Alexander Toshev, Christian Szegedy

Computer Vision and Pattern Recognition (2014) (to appear)
Discovering Groups of People in Images

Wongun Choi, Yu-Wei Chao, Caroline Pantofaru, Silvio Savarese

European Conference on Computer Vision (ECCV) (2014)
Indoor Scene Understanding with Geometric and Semantic Contexts

Wongun Choi, Yu-Wei Chao, Caroline Pantofaru, Silvio Savarese

International Journal of Computer Vision (IJCV) (2014)
Large-Scale Object Classification Using Label Relation Graphs

Jia Deng, Nan Ding, Yangqing Jia, Andrea Frome, Kevin Murphy, Samy Bengio, Yuan Li, Hartmut Neven, Hartwig Adam

European Conference on Computer Vision (2014)
Large-scale Video Classification with Convolutional Neural Networks

Andrej Karpathy, George Toderici, Sanketh Shetty, Thomas Leung, Rahul Sukthankar, Li Fei-Fei

Proceedings of International Computer Vision and Pattern Recognition (CVPR 2014), IEEE
Learning 3D Part Detection from Sparsely Labeled Data

Ameesh Makadia, Mehmet Ersin Yumer

2nd International Conference on 3D Vision (2014)
Learning Fine-grained Image Similarity with Deep Ranking

Jiang Wang, Yang Song, Thomas Leung, Chuck Rosenberg, Jingbin Wang, James Philbin, Bo Chen, Ying Wu

CVPR'2014, IEEE
Multi-digit Number Recognition from Street View Imagery using Deep Convolutional Neural Networks

Ian Goodfellow, Yaroslav Bulatov, Julian Ibarz, Sacha Arnoud, Vinay Shet

ICLR 2014 (to appear)
Neural Networks and Neuroscience-Inspired Computer Vision

David Cox, Tom Dean

Current Biology, vol. 24 (2014), pp. 921-929
On Learning Where to Look

Marc'Aurelio Ranzato

Google Inc. (2014)
Painting with Triangles

Mark D. Benjamin, Stephen DiVerdi, Adam Finkelstein

Proceedings of the Workshop on Non-Photorealistic Animation and Rendering, NPAR, ACM, New York, NY, USA (2014), pp. 13-20
RealPigment: Paint Compositing by Example

Jingwan Lu, Stephen DiVerdi, Willa Chen, Connelly Barnes, Adam Finkelstein

Proceedings of the Workshop on Non-Photorealistic Animation and Rendering, NPAR, ACM, New York, NY, USA (2014), pp. 21-30
Recognition of Complex Events: Exploiting Temporal Dynamics between Underlying Concepts

Subhabrata Bhattacharya, Mahdi M. Kalayeh, Rahul Sukthankar, Mubarak Shah

Proceedings of International Computer Vision and Pattern Recognition (CVPR 2014), IEEE
Super 4PCS: Fast Global Pointcloud Registration via Smart Indexing

Nicolas Mellado, Dror Aiger, Niloy Mitra

Eurographics Symposium on Geometry Processing 2014
Scalable Object Detection using Deep Neural Networks

Dumitru Erhan, Christian Szegedy, Alexander Toshev, Dragomir Anguelov

Computer Vision and Pattern Recognition, IEEE (2014), pp. 2155-2162
Sinusoidal Interpolation Across Missing Data

W. Bastiaan Kleijn, Turaj Zakizadeh Shabestary, Jan Skoglund

International Workshop on Acoustic Signal Enhancement 2014 (IWAENC 2014), pp. 71-75
Temporal Synchronization of Multiple Audio Signals

Julius Kammerl, Neil Birkbeck, Sasi Inguva, Damien Kelly, Andy Crawford, Hugh Denman, Anil Kokaram, Caroline Pantofaru

Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Florence, Italy (2014)
The Optical Mouse: Early Biomimetic Embedded Vision

Richard F. Lyon

Advances in Embedded Computer Vision, Springer (2014), pp. 3-22
Training Highly Multi-class Linear Classifiers

Maya R. Gupta, Samy Bengio, Jason Weston

Journal of Machine Learning Research (JMLR) (2014), pp. 1461-1492
Unsupervised Discovery of Object Classes with a Mobile Robot

Julian Mason, Bhaskara Marthi, Ronald Parr

ICRA 2014
Video Object Discovery and Co-segmentation with Extremely Weak Supervision

Le Wang, Gang Hua, Rahul Sukthankar, Jianru Xue, Nanning Zheng

Proceedings of European Conference on Computer Vision (2014)
Video Quality Assessment for Web Content Mirroring

Ye He, Kevin Fei, Gus Fernandez, Edward J. Delp

Imaging and Multimedia Analytics in a Web and Mobile World 2014, IS&T/SPIE Electronic Imaging, San Francisco, California, pp. 9027-11
Zero-Shot Learning by Convex Combination of Semantic Embeddings

Mohammad Norouzi, Tomas Mikolov, Samy Bengio, Yoram Singer, Jonathon Shlens, Andrea Frome, Greg Corrado, Jeffrey Dean

International Conference on Learning Representations (2014)
3DNN: Viewpoint Invariant 3D Geometry Matching for Scene Understanding

Scott Satkin, Martial Hebert

Proceedings of the International Conference on Computer Vision (ICCV) (2013) (to appear)
A Butterfly Structured Design of The Hybrid Transform Coding Scheme

Jingning Han, Yaowu Xu, Debargha Mukherjee

Picture Coding Symposium, IEEE (2013), pp. 1-4
A Discriminative Model for Learning Semantic and Geometric Interactions in Indoor Scenes

Wongun Choi, Yu-Wei Chao, Caroline Pantofaru, Silvio Savarese

Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Scene Understanding Workshop (SUNw) (2013)
Accelerating defocus blur magnification

Florian Kriener, Thomas Binder, Manuel Wille

Proceedings SPIE Vol. 8667 (Multimedia Content and Mobile Devices), SPIE (2013)
Category-Independent Object-level Saliency Detection

Yangqing Jia, Mei Han

International Conference on Computer Vision (2013)
DeViSE: A Deep Visual-Semantic Embedding Model

Andrea Frome, Greg Corrado, Jonathon Shlens, Samy Bengio, Jeffrey Dean, Marc’Aurelio Ranzato, Tomas Mikolov

Neural Information Processing Systems (NIPS) (2013)
Deep Neural Networks for Object Detection

Christian Szegedy, Alexander Toshev, Dumitru Erhan

Advances in Neural Information Processing Systems (2013)
Design of user interfaces for selective editing of digital photos on touchscreen devices

Thomas Binder, Meikel Steiding, Manuel Wille, Nils Kokemohr

Proceedings SPIE 8667 (Multimedia Content and Mobile Devices), SPIE (2013)
Discriminative Segment Annotation in Weakly Labeled Video

Kevin Tang, Rahul Sukthankar, Jay Yagnik, Li Fei-Fei

Proceedings of International Conference on Computer Vision and Pattern Recognition (CVPR 2013)
Fast, Accurate Detection of 100,000 Object Classes on a Single Machine

Thomas Dean, Mark Ruzon, Mark Segal, Jonathon Shlens, Sudheendra Vijayanarasimhan, Jay Yagnik

Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, IEEE Computer Society, Washington, DC, USA (2013)
Fast, Accurate Detection of 100,000 Object Classes on a Single Machine: Technical Supplement

Thomas Dean, Mark Ruzon, Mark Segal, Jonathon Shlens, Sudheendra Vijayanarasimhan, Jay Yagnik

Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, IEEE Computer Society, Washington, DC, USA (2013)
HMM-based script identification for OCR

Dmitriy Genzel, Ashok Popat, Remco Teunen, Yasuhisa Fujii

Proceedings of the 4th International Workshop on Multilingual OCR, ACM, New York, NY, US (2013), 2:1-2:5
Handling Packet Loss in WebRTC

Stefan Holmer, Mikhal Shemer, Marco Paniconi

International Conference on Image Processing (ICIP 2013), IEEE, pp. 1860-1864
High-Resolution Global Maps of 21st-Century Forest Cover Change

Rebecca Moore, Matt Hancher, David Thau

Science, vol. 342 (2013), pp. 850-853
Image Annotation in Presence of Noisy Labels

Chandrashekhar V., Shailesh Kumar, C. V. Jawahar

International Conference on Pattern Recognition and Machine Intelligence (2013) (to appear)
Image Compression via Colorization Using Semi-Regular Color Samples

Chenguang Zhang, Hui Fang

Data Compression Conference (2013)
Joint Noise Level Estimation from Personal Photo Collections

YiChang Shih, Vivek Kwatra, Troy Chinen, Hui Fang, Sergey Ioffe

ICCV 2013 (to appear)
Learning Binary Codes for High Dimensional Data Using Bilinear Projections

Yunchao Gong, Sanjiv Kumar, Henry Rowley, Svetlana Lazebnik

IEEE Computer Vision and Pattern Recognition (2013)
Learning Multiple Non-Linear Sub-Spaces using K-RBMs

Siddhartha Chandra, Shailesh Kumar, C. V. Jawahar

Computer Vision and Pattern Recognition (2013)
Learning Part-based Templates from Large Collections of 3D Shapes

Vladimir Kim, Wilmot Li, Niloy Mitra, Siddhartha Chaudhuri, Stephen DiVerdi, Thomas Funkhouser

ACM Transactions on Graphics (TOG) - SIGGRAPH 2013 Conference Proceedings, vol. 32, no. 4 (2013), 70:1-70:12
Learning Query-Specific Distance Functions for Large-Scale Web Image Search

Yushi Jing, Michele Covell, David Tsai, James M. Rehg

IEEE Transactions on Multimedia, vol. 15 (2013), pp. 2022-2034
Modelling the Distortion Produced by Cochlear Compression

Roy D. Patterson, Timothy Ives, Thomas C. Walters, Richard F. Lyon

Basic Aspects of Hearing, Springer (2013), pp. 81-88
Random Grids: Fast Approximate Nearest Neighbors and Range Searching for Image Search

Dror Aiger, Efi Kokiopoulou, Ehud Rivlin

ICCV 2013
Rate-Distortion Optimization for Multichannel Audio Compression

Minyue Li, Jan Skoglund, W. Bastiaan Kleijn

2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)
RealBrush: Painting with Examples of Physical Media

Jingwan Lu, Connelly Barnes, Stephen DiVerdi, Adam Finkelstein

ACM Transactions on Graphics (TOG) -- SIGGRAPH 2013 Conference Proceedings, vol. 32, no. 4 (2013), 117:1-117:12
Rendering Fur in Life of Pi

Ivan Neulander, Toshi Kato, Kevin Beason

ACM, New York, NY, USA
Reporting Neighbors in High-Dimensional Euclidean Space

Dror Aiger, Haim Kaplan, Micha Sharir

SODA (2013)
Spatiotemporal Deformable Part Models for Action Detection

Yicong Tian, Rahul Sukthankar, Mubarak Shah

Proceedings of International Conference on Computer Vision and Pattern Recognition (CVPR 2013)
Street View Motion-from-Structure-from-Motion

Bryan Klingner, David Martin, James Roseborough

Proceedings of the International Conference on Computer Vision, IEEE (2013)
The Intervalgram: An Audio Feature for Large-Scale Cover-Song Recognition

Thomas C. Walters, David A. Ross, Richard F. Lyon

From Sounds to Music and Emotions: 9th International Symposium, CMMR 2012, London, UK, June 19-22, 2012, Revised Selected Papers, Springer Berlin Heidelberg (2013), pp. 197-213
The latest open-source video codec VP9 - An overview and preliminary results

Debargha Mukherjee, Jim Bankoski, Adrian Grange, Jingning Han, John Koleszar, Paul Wilkins, Yaowu Xu, Ronald S Bultje

Picture Coding Symposium (2013)
Tracking Large-Scale Video Remix in Real-World Events

Lexing Xie, Apostol Natsev, Xuming He, John R. Kender, Matthew L. Hill, John R. Smith

IEEE Transactions on Multimedia, vol. 15, no. 6 (2013), pp. 1244-1254
Understanding Indoor Scenes using 3D Geometric Phrases

Wongun Choi, Yu-Wei Chao, Caroline Pantofaru, Silvio Savarese

Proceedings of International Conference on Computer Vision and Pattern Recognition (CVPR 2013)
Using Web Co-occurrence Statistics for Improving Image Categorization

Samy Bengio, Jeffrey Dean, Dumitru Erhan, Eugene Ie, Quoc Le, Andrew Rabinovich, Jonathon Shlens, Yoram Singer

arXiv (2013)
Video Motion for Every Visible Point

Susanna Ricco, Carlo Tomasi

International Conference on Computer Vision (ICCV) (2013)
A QCQP Approach to Triangulation

Chris Aholt, Rekha Thomas, Sameer Agarwal

European Conference on Computer Vision, Springer Verlag (2012)
All Smiles: Automatic Photo Enhancement by Facial Expression Analysis

Rajvi Shah, Vivek Kwatra

Conference for Visual Media Production (CVMP 2012) [Best Paper]
Apparel silhouette attributes recognition

Wei Zhang, Emilio Antunez, Salih Gokturk, Baris Sumengen

Proceedings of the 2012 IEEE Workshop on the Applications of Computer Vision, IEEE Computer Society, Washington, DC, USA, pp. 489-496
Automatically Discovering Talented Musicians with Acoustic Analysis of YouTube Videos

Eric Nichols, Charles DuHadway, Hrishikesh Aradhye, Richard F. Lyon

Proceedings of the 2012 IEEE 12th International Conference on Data Mining (ICDM), IEEE Computer Society, Washington, DC, USA, pp. 559-565
Building Musically-relevant Audio Features through Multiple Timescale Representations

Philippe Hamel, Yoshua Bengio, Douglas Eck

Proceedings of the 13th International Society for Music Information Retrieval Conference, Porto, Portugal (2012)
Building high-level features using large scale unsupervised learning

Quoc Le, Marc'Aurelio Ranzato, Rajat Monga, Matthieu Devin, Kai Chen, Greg Corrado, Jeff Dean, Andrew Ng

International Conference on Machine Learning (2012)
Calibration-Free Rolling Shutter Removal

Matthias Grundmann, Vivek Kwatra, Daniel Castro, Irfan Essa

International Conference on Computational Photography [Best Paper], IEEE (2012)
Capturing Indoor Scenes with Smartphones

Aditya Sankar, Steve Seitz

Proc. UIST (2012) (to appear)
Coherent image selection using a fast approximation to the generalized traveling salesman problem

Meng Wang, Prakash Ishwar, Janusz Konrad, Cenk Gazen, Rohit Saboo

Proceedings of the 20th ACM international conference on Multimedia, ACM, New York, NY, USA (2012), pp. 981-984
D-Nets: Beyond Patch-Based Image Descriptors

Felix von Hundelshausen, Rahul Sukthankar

IEEE International Conference on Computer Vision and Pattern Recognition (CVPR'12) (2012)
Efficient Closed-Form Solution to Generalized Boundary Detection

Marius Leordeanu, Rahul Sukthankar, Cristian Sminchisescu

Proceedings of European Conference on Computer Vision (ECCV'12) (2012)
Efficient model based single and double thresholding for real time recognition

Dror Aiger, Silvio Guimarães

ACCV Workshop on Detection and Tracking in Challenging Environments (2012)
Embedded Voxel Colouring with Adaptive Threshold Selection Using Globally Minimal Surfaces

Carlos Leung, Ben Appleton, Mitchell Buckley, Changming Sun

IJCV, vol. 99 (2012), pp. 215-231
General and Nested Wiberg Minimization

Dennis Strelow

Computer Vision and Pattern Recognition, IEEE (2012)
General and nested Wiberg minimization: L2 and maximum likelihood

Dennis Strelow

European Conference on Computer Vision, Springer (2012)
Improved Prediction of Nearly-Periodic Signals

Bastiaan Kleijn, Jan Skoglund

International Workshop on Acoustic Signal Enhancement 2012 (IWAENC2012)
Improving Book OCR by Adaptive Language and Image Models

Dar-Shyang Lee, Ray Smith

Proceedings of 2012 10th IAPR International Workshop on Document Analysis Systems, IEEE, pp. 115-119
Joint Image and Word Sense Discrimination For Image Retrieval

Aurelien Lucchi, Jason Weston

ECCV (2012)
Learning Hierarchical Bag of Words Using Naive Bayes Clustering

Siddhartha Chandra, Shailesh Kumar, C. V. Jawahar

Asian Conference on Computer Vision (2012), pp. 382-395
Measuring Noise Correlation for Improved Video Denoising

Anil Kokaram, Damien Kelly, Hugh Denman, Andrew Crawford

IEEE International Conference on Image Processing, IEEE (2012)
Measuring the Objectness of Image Windows

Bogdan Alexe, Thomas Deselaers, Vittorio Ferrari

IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34/11 (2012), pp. 2189-2202
Mobile Music Modeling, Analysis and Recognition

Pavel Golik, Boulos Harb, Ananya Misra, Michael Riley, Alex Rudnick, Eugene Weinstein

International Conference on Acoustics, Speech, and Signal Processing (ICASSP) (2012)
Model Recommendation for Action Recognition

Pyry Matikainen, Rahul Sukthankar, Martial Hebert

IEEE International Conference on Computer Vision and Pattern Recognition (CVPR'12) (2012)
Molli: Interactive Visualization for Exploratory Protein Analysis

Sara L. Su, Connor Gramazio, Megan Strait, Caitlin Crumm, Daniela Extrum-Fernandez, Matt Menke, Lenore Cowen

IEEE Computer Graphics & Applications, vol. 32 (2012), pp. 62-69
Multi-component Models for Object Detection

Chunhui Gu, Pablo Arbelaez, Yuanqing Lin, Kai Yu, Jitendra Malik

European Conference on Computer Vision, Springer (2012), Volume 4, 445-458
Multimedia Semantics: Interactions Between Content and Community

Hari Sundaram, Lexing Xie, Munmun De Choudhury, Yu-Ru Lin, Apostol Natsev

Proceedings of the IEEE, vol. 100, no. 9 (2012)
On Using Nearly-Independent Feature Families for High Precision and Confidence

Omid Madani, Manfred Georg, David Ross

Fourth Asian Machine Learning Conference, JMLR workshop and conference proceedings (2012), pp. 269-284
Photo Tours

Avanish Kushal, Ben Self, Yasutaka Furukawa, David Gallup, Carlos Hernandez, Brian Curless, Steve Seitz

3DimPVT 2012 (to appear)
Real-Time Human Pose Tracking from Range Data

Varun Ganapathi, Christian Plagemann, Daphne Koller, Sebastian Thrun

Proceedings of the European Conference on Computer Vision (ECCV) (2012)
Reconstructing the World's Museums

Jianxiong Xiao, Yasutaka Furukawa

European Conference on Computer Vision (2012) (to appear)
Refractive Height Fields from Single and Multiple Images

Qi Shan, Sameer Agarwal, Brian Curless

IEEE Conference on Computer Vision and Pattern Recognition, IEEE (2012)
Repetition Maximization based Texture Rectification

Dror Aiger, Niloy Mitra, Daniel Cohen-Or

EUROGRAPHICS 2012
Scene Aligned Pooling for Complex Video Recognition

Liangliang Cao, Yadong Mu, Apostol Natsev, Shih-Fu Chang, Gang Hua, John R. Smith

ECCV (2012), pp. 688-701
Schematic Surface Reconstruction

Changchang Wu, Sameer Agarwal, Brian Curless, Steven M. Seitz

IEEE Conference on Computer Vision and Pattern Recognition, IEEE (2012)
Semantic Segmentation Using Regions and Parts

Pablo Arbelaez, Bharath Hariharan, Chunhui Gu, Saurabh Gupta, Lubomir Bourdev, Jitendra Malik

Computer Vision and Pattern Recognition, IEEE Computer Society Washington, DC, USA (2012), pp. 3378-3385
Semi-Supervised Hashing for Large Scale Search

Jun Wang, Sanjiv Kumar, Shih-Fu Chang

IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) (2012)
Shadow Removal for Aerial Imagery by Information Theoretic Intrinsic Image Analysis

Vivek Kwatra, Mei Han, Shengyang Dai

International Conference on Computational Photography, IEEE (2012)
Size Matters: Exhaustive Geometric Verification for Image Retrieval

Henrik Stewenius, Steinar H. Gunderson, Julien Pilet

12th European Conference on Computer Vision (ECCV), Springer (2012), pp. 674-687
Street view goes indoors: Automatic pose estimation from uncalibrated unordered spherical panoramas

Mohamed Aly, Jean-Yves Bouguet

Proceedings of the 2012 IEEE Workshop on the Applications of Computer Vision, IEEE Computer Society, Washington, DC, USA, pp. 1-8
Unsupervised Learning for Graph Matching

Marius Leordeanu, Rahul Sukthankar, Martial Hebert

International Journal of Computer Vision, vol. 96 (2012), pp. 28-45
ViSQOL: The Virtual Speech Quality Objective Listener

Andrew Hines, Jan Skoglund, Anil Kokaram, Naomi Harte

International Workshop on Acoustic Signal Enhancement 2012 (IWAENC2012)
Video Description Length Guided Constant Quality Video Coding with Bitrate Constraint

Lei Yang, Debargha Mukherjee, Dapeng Wu

2012 IEEE International Conference on Multimedia and Expo Workshops (ICMEW), IEEE, pp. 366-371
Visibility Based Preconditioning for Bundle Adjustment

Avanish Kushal, Sameer Agarwal

IEEE Conference on Computer Vision and Pattern Recognition, IEEE (2012)
Weakly Supervised Learning of Object Segmentations from Web-Scale Video

Glenn Hartmann, Matthias Grundmann, Judy Hoffman, David Tsai, Vivek Kwatra, Omid Madani, Sudheendra Vijayanarasimhan, Irfan Essa, James Rehg, Rahul Sukthankar

ECCV'12 Proceedings of the 12th international conference on Computer Vision - Volume Part I, Springer-Verlag, Berlin, Heidelberg (2012), pp. 198-208
A Hierarchical Conditional Random Field Model for Labeling and Segmenting Images of Street Scenes

Qixing Huang, Mei Han, Bo Wu, Sergey Ioffe

International Conference on Computer Vision and Pattern Recognition (2011)
A Pole-Zero Filter Cascade Provides Good Fits to Human Masking Data and to Basilar Membrane and Neural Data

Richard F. Lyon

Mechanics of Hearing (2011)
Aesthetics and Emotions in Images

Dhiraj Joshi, Ritendra Datta, Elena Fedorovskaya, Quang-Tuan Luong, James Z. Wang, Jia Li, Jiebo Luo

IEEE Signal Processing Magazine, vol. 28, no. 5 (2011), pp. 94-115
Auditory Sparse Coding

Steven R. Ness, Thomas Walters, Richard F. Lyon

Music Data Mining, CRC Press/Chapman Hall (2011)
Auto-Directed Video Stabilization with Robust L1 Optimal Camera Paths

Matthias Grundmann, Vivek Kwatra, Irfan Essa

IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2011)
Automatic Language Identification in Music Videos with Low Level Audio and Visual Features

Vijay Chandrasekhar, Mehmet Emre Sargin, David A. Ross

Proceedings of the IEEE Conference on Acoustics, Speech, and Signal Processing (ICASSP) (2011)
Boosting Video Classification Using Cross-Video Signals

Mehmet Emre Sargin, Hrishikesh Aradhye

Proceedings of the IEEE Conference on Acoustics, Speech, and Signal Processing (ICASSP) (2011) (to appear)
Building Rome in a day

Sameer Agarwal, Yasutaka Furukawa, Noah Snavely, Ian Simon, Brian Curless, Steven M. Seitz, Rick Szeliski

Communications of the ACM, vol. 54 (2011), pp. 105-112
Cascades of two-pole–two-zero asymmetric resonators are good models of peripheral auditory function

Richard F. Lyon

Journal of the Acoustical Society of America, vol. 130 (2011), pp. 3893-3904
Crowdsourcing Event Detection in YouTube Videos

Thomas Steiner, Ruben Verborgh, Rik Van de Walle, Michael Hausenblas, Joaquim Gabarro

Detection, Representation, and Exploitation of Events in the Semantic Web (DeRiVE 2011), Bonn, Germany
Discrete Point Based Signatures and Applications to Document Matching

Nemanja Spasojevic, Guillaume Poncin, Dan Bloomberg

ICIAP 2011
Discriminative Tag Learning on YouTube Videos with Latent Sub-tags

Weilong Yang, George Toderici

Computer Vision and Pattern Recognition, IEEE (2011)
Dynamic Stylized Shading Primitives

David Vanderhaeghe, Romain Vergne, Pascal Barla, William Baxter

Proc. Symposium on Non-Photorealistic Animation and Rendering (NPAR 2011), ACM
Exploring Photobios

Ira Kemelmacher-Shlizerman, Eli Shechtman, Rahul Garg, Steven Seitz

ACM Trans. on Graphics (Proc. SIGGRAPH), vol. 30(4) (2011) (to appear)
Feature Seeding for Action Recognition

Pyry Matikainen, Rahul Sukthankar, Martial Hebert

International Conference on Computer Vision (ICCV) (2011)
Geometric Overpass Extraction from Vector Road Data and DSMs

Joshua Schpok

Proceedings of the 19th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, 2011 (to appear)
Handling Label Noise in Video Classification via Multiple Instance Learning

Thomas Leung, Yang Song, John Zhang

ICCV'2011, IEEE
Image Saliency: From Local to Global Context

Meng Wang, Janusz Konrad, Prakash Ishwar, Yushi Jing, Henry Rowley

Proc. Conference on Computer Vision and Pattern Recognition (CVPR) (2011)
Improving Video Classification via YouTube Video Co-Watch Data

John Zhang, Yang Song, Thomas Leung

ACM Workshop on Social and Behavioural Networked Media Access at ACM MM 2011, ACM
Kernelized Structural SVM Learning for Supervised Object Segmentation

Luca Bertelli, Tianli Yu, Diem Vu, Burak Gokturk

Proceedings of IEEE Conference on Computer Vision and Pattern Recognition 2011
Large-Scale Image Annotation using Visual Synset

David Tsai, Yushi Jing, Henry Rowley, Yi Liu, Sergey Ioffe, James Rehg

Proc. International Conference on Computer Vision (ICCV) (2011)
Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis

Quoc V. Le, Will Zou, Serena Yeung, Andrew Y. Ng

Conference on Computer Vision and Pattern Recognition (2011)
Limits on the Application of Frequency-based Language Models to OCR

Ray Smith

ICDAR, IEEE (2011), pp. 538-542
Multicore Bundle Adjustment

Changchang Wu, Sameer Agarwal, Brian Curless, Steven Seitz

Proc. IEEE Conf. on Computer Vision and Pattern Recognition (2011), pp. 3057-3064
Privacy protection and face recognition

Andrew Senior, Sharat Pankanti

Handbook of Face Recognition, Springer (2011), pp. 671-692
Reading Digits in Natural Images with Unsupervised Feature Learning

Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bissacco, Bo Wu, Andrew Y. Ng

NIPS Workshop on Deep Learning and Unsupervised Feature Learning 2011
Sparse coding of auditory features for machine hearing in interference

Richard F. Lyon, Gal Chechik, Jay Ponte

Proc. ICASSP, IEEE (2011)
Summary of Opus listening test results

Christian Hoene, Jean-Marc Valin, Koen Vos, Jan Skoglund

IETF (2011)
Survey and Evaluation of Audio Fingerprinting Schemes for Mobile Query-By-Example Applications

Vijay Chandrasekhar, Matt Sharifi, David Ross

12th International Society for Music Information Retrieval Conference (ISMIR) (2011)
Technical Overview of VP8, an open source video codec for the web

Jim Bankoski, Paul Wilkins, Yaowu Xu

2011 International Workshop on Acoustics and Video Coding and Communication, IEEE, Barcelona, Spain (to appear)
The Power of Comparative Reasoning

Jay Yagnik, Dennis Strelow, David Ross, Ruei-Sung Lin

International Conference on Computer Vision, IEEE (2011)
Using a Cascade of Asymmetric Resonators with Fast-Acting Compression as a Cochlear Model for Machine-Hearing Applications

Richard F. Lyon

Autumn Meeting of the Acoustical Society of Japan (2011), pp. 509-512
Visual and Semantic Similarity in ImageNet

Thomas Deselaers, Vittorio Ferrari

IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2011), pp. 1777-1784
Where's Waldo: Matching People in Images of Crowds

Rahul Garg, Deva Ramanan, Steven M. Seitz, Noah Snavely

Proc. IEEE Conf. on Computer Vision and Pattern Recognition (2011), pp. 1793-1800
YouTubeEvent: On Large-Scale Video Event Classification

Bingbing Ni, Yang Song, Ming Zhao

The 3rd International Workshop on Video Event Categorization, Tagging and Retrieval for Real-World Applications at IEEE ICCV'2011
A Large-Scale Taxonomic Classification System for Web-based Videos

Yang Song, Ming Zhao, Reto Strobl, John Zhang, Jay Yagnik

The 11th European Conference on Computer Vision (ECCV 2010)
Baselines for Image Annotation

Ameesh Makadia, Vladimir Pavlovic, Sanjiv Kumar

International Journal of Computer Vision (IJCV) (2010)
Beyond “Near-Duplicates”: Learning Hash Codes for Efficient Similar-Image Retrieval

Shumeet Baluja, Michele Covell

20th International Conference on Pattern Recognition 2010
Comparison of Clustering Approaches for Summarizing Large Populations of Images

Yushi Jing, Michele Covell, Henry A. Rowley

Proceedings ICME VCIDS, IEEE, Singapore (2010)
Discontinuous Seam-Carving for Video Retargeting

Matthias Grundmann, Vivek Kwatra, Mei Han, Irfan Essa

Computer Vision and Pattern Recognition (CVPR 2010)
Document Image Analysis (Chapter 18)

Dan Bloomberg, Luc Vincent

Mathematical morphology: theory and applications, ISTE-Wiley (2010), pp. 425-438
Efficient Hierarchical Graph-Based Video Segmentation

Matthias Grundmann, Vivek Kwatra, Mei Han, Irfan Essa

Computer Vision and Pattern Recognition (CVPR 2010)
Example-based Image Compression

Jing-Yu Cui, Saurabh Mathur, Michele Covell, Vivek Kwatra, Mei Han

International Conference on Image Processing (ICIP 2010)
Fast Covariance Computation and Dimensionality Reduction for Sub-Window Features in Images

Vivek Kwatra, Mei Han

European Conference on Computer Vision (ECCV 2010)
Feature Tracking for Wide-Baseline Image Retrieval

Ameesh Makadia

European Conference on Computer Vision (ECCV) (2010)
Google Street View: Capturing the World at Street Level

Dragomir Anguelov, Carole Dulong, Daniel Filip, Christian Frueh, Stéphane Lafon, Richard Lyon, Abhijit Ogale, Luc Vincent, Josh Weaver

Computer, vol. 43 (2010)
History and Future of Auditory Filter Models

Richard F. Lyon, Andreas G. Katsiamis, Emmanuel M. Drakakis

Proc. ISCAS, IEEE (2010), pp. 3809-3812
Improved Consistent Sampling, Weighted Minhash and L1 Sketching

Sergey Ioffe

ICDM (2010) (to appear)
Looking for Pieces of Needles in Millions of Haystacks: Finding Distorted Audio/Video Snippets

Michele Covell, Shumeet Baluja

International Workshop on Computer Vision (2010)
Machine Hearing: An Emerging Field

Richard F. Lyon

IEEE Signal Processing Magazine, vol. 27 (2010), pp. 131-139
SemWebVid - Making Video a First Class Semantic Web Citizen and a First Class Web Bourgeois -- Semantic Web Challenge

Thomas Steiner, Michael Hausenblas

9th International Semantic Web Conference (ISWC 2010)
Semi-Supervised Hashing for Scalable Image Retrieval

Jun Wang, Sanjiv Kumar, Shih-Fu Chang

IEEE Conf on Computer Vision and Pattern Recognition (CVPR) (2010)
Sound Retrieval and Ranking Using Sparse Auditory Representations

Richard F. Lyon, Martin Rehn, Samy Bengio, Thomas C. Walters, Gal Chechik

Neural Computation, vol. 22 (2010), pp. 2390-2416
Table Detection in Heterogeneous Documents

Faisal Shafait, Ray Smith

Document Analysis Systems 2010, ACM International Conference Proceedings series
Taxonomic Classification for Web-based Videos

Yang Song, Ming Zhao, Jay Yagnik, Xiaoyun Wu

IEEE Conf on Computer Vision and Pattern Recognition (CVPR), IEEE (2010)
Video coding mode decision as a classification problem

Rashad Jillani, Urvang Joshi, Chiranjib Bhattacharya, Hari Kalva, RK Ramakrishnan

IS&T/SPIE Electronic Imaging, vol. 7543 (2010), 7543 - 7543 - 8
YouTubeCat: Learning to Categorize Wild Web Videos

Zheshen Wang, Ming Zhao, Yang Song, Sanjiv Kumar, Baoxin Li

IEEE Conf on Computer Vision and Pattern Recognition (CVPR) (2010)
A Biomimetic, 4.5 µW, 120+dB, Log-domain Cochlea Channel with AGC

Andreas G. Katsiamis, Emmanuel M. Drakakis, Richard F. Lyon

IEEE JSSC (Journal of Solid-State Circuits), vol. 44 (2009), pp. 1006-1022
Adapting the Tesseract Open Source OCR Engine for Multilingual OCR

Ray Smith, Daria Antonova, Dar-Shyang Lee

MOCR '09: Proceedings of the International Workshop on Multilingual OCR (2009)
Adaptive, selective, automatic tonal enhancement of faces

Hrishikesh Aradhye, George D. Toderici, Jay Yagnik

ACM Multimedia, ACM, New York, NY, USA (2009), pp. 677-680
Audiovisual Celebrity Recognition in Unconstrained Web Videos

Mehmet Emre Sargin, Hrishikesh Aradhye, Pedro Moreno, Ming Zhao

Proceedings of the IEEE Conference on Acoustics, Speech, and Signal Processing (ICASSP) (2009)
Automatic, Efficient, Temporally-Coherent Video Enhancement for Large Scale Applications

George Toderici, Jay Yagnik

ACM Multimedia, ACM (2009), pp. 609-612
Combined Orientation and Script Detection using the Tesseract OCR Engine

Ranjith Unnikrishnan, Ray Smith

Workshop on Multilingual OCR (MOCR), Proc. 10th Intl. Conf. on Document Analysis and Recognition (ICDAR) (2009)
Computer Vision Interfaces for Interactive Art

Andrew Senior, Alejandro Jaimes

Human-Centric Interfaces for Ambient Intelligence, Elsevier (2009)
Efficient and Robust Music Identification with Weighted Finite-State Transducers

Mehryar Mohri, Pedro Moreno, Eugene Weinstein

IEEE Transactions on Audio, Speech, and Language Processing (2009) (to appear)
Flight patterns

Aaron Koblin

SIGGRAPH ASIA '09: ACM SIGGRAPH ASIA 2009 Art Gallery & Emerging Technologies: Adaptation, ACM, New York, NY, USA, pp. 29-29
Google Newspaper Search – Image Processing and Analysis Pipeline

Krishnendu Chaudhury, Ankur Jain, Sriram Thirthala, Vivek Sahasranaman, Shobhit Saxena, Selvam Mahalingam

10th International Conference on Document Analysis and Recognition, ICDAR 2009, pp. 621-625
Hybrid Page Layout Analysis via Tab-Stop Detection

Ray Smith

Proceedings of the 10th International Conference on Document Analysis and Recognition, IEEE (2009)
Image Reconstruction in the Gigavision Camera

Feng Yang, Luciano Sbaiz, Edoardo Charbon, Sabine Susstrunk, Martin Vetterli

ICCV workshop OMNIVIS 2009
LSH Banding for Large-Scale Retrieval with Memory and Recall Constraints

Michele Covell, Shumeet Baluja

International Conference on Acoustics, Speech, and Signal Processing, IEEE (2009)
Large-scale Privacy Protection in Google Street View

Andrea Frome, German Cheung, Ahmad Abdulkader, Marco Zennaro, Bo Wu, Alessandro Bissacco, Hartwig Adam, Hartmut Neven, Luc Vincent

IEEE International Conference on Computer Vision (2009)
Low Cost Correction of OCR Errors Using Learning in a Multi-Engine Environment

Ahmad Abdulkader, Matthew R. Casey

Proceedings of the 10th International Conference on Document Analysis and Recognition, IEEE (2009)
Models for patch-based image restoration

Mithun Das Gupta, Shyamsundar Rajaram, Nemanja Petrovic, Thomas S. Huang

J. Image Video Process., vol. 2009 (2009), pp. 1-12
Predictive Models for Music

Jean-Francois Paiement, Yves Grandvalet, Samy Bengio

Connection Science, vol. 21 (2009), pp. 253-272
Privacy Protection in Video Surveillance

Andrew W. Senior

Springer (2009)
SD-VBS: The San Diego Vision Benchmark Suite

Sravanthi Kota Venkata, Ikkjin Ahn, Donghwan Jeon, Anshuman Gupta, Christopher Louie, Saturnino Garcia, Serge Belongie, Michael Bedford Taylor

IEEE Workload Characterization Symposium (2009), pp. 55-64
Shape-based Object Recognition in Videos Using 3D Synthetic Object Models

Alexander Toshev, Ameesh Makadia, Kostas Daniilidis

Computer Vision and Pattern Recognition (2009)
Softcuts: A Soft Edge Smoothness Prior for Color Image Super Resolution

Shengyang Dai, Mei Han, Wei Xu, Ying Wu, Yihong Gong, Aggelos K. Katsaggelos

IEEE Transactions on Image Processing (T-IP), vol. 18 (2009), pp. 969-981
Sound Ranking Using Auditory Sparse-Code Representations

Martin Rehn, Richard F. Lyon, Samy Bengio, Thomas C. Walters, Gal Chechik

ICML 2009 Workshop on Sparse Methods for Music Audio
State of the Art in Example-based Texture Synthesis

Li-Yi Wei, Sylvain Lefebvre, Vivek Kwatra, Greg Turk

Eurographics 2009, State of the Art Report, EG-STAR, Eurographics Association
Tour the World: building a web-scale landmark recognition engine

Yantao Zheng, Ming Zhao, Yang Song, Hartwig Adam, Ulrich Buddemeier, Alessandro Bissacco, Fernando Brucher, Tat-Seng Chua, Hartmut Neven

International Conference on Computer Vision and Pattern Recognition (CVPR) (2009)
Tree detection from aerial imagery

Lin Yang, Xiaqing Wu, Emil Praun, Xiaoxu Ma

Proceedings of the 17th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, Seattle, Washington (2009)
Visualizing Web Images via Google Image Swirl

Yushi Jing, Henry A. Rowley, Chuck Rosenberg, Jingbin Wang, Michele Covell

NIPS Workshop on Statistical Machine Learning for Visual Analytics (2009)
A New Baseline For Image Annotation

Ameesh Makadia, Vladimir Pavlovic, Sanjiv Kumar

European Conference on Computer Vision (ECCV) (2008)
Beyond Sliding Windows: Object Localization by Efficient Subwindow Search

Christoph H. Lampert, Matthew B. Blaschko, Thomas Hofmann

IEEE Computer Vision and Pattern Recognition (CVPR), Anchorage, AK (2008)
Coordinated Multi-Device Presentations: Ambient-Audio Identification

Michael Fink, Michele Covell, Shumeet Baluja

Encyclopedia of Wireless and Mobile Communications, Taylor & Francis (2008), pp. 274-285
Estimating the Spectral Reflectance of Natural Imagery Using Color Image Features

Josh Hyman, Mark Hansen, Eric Graham, Deborah Estrin

Workshop on Applications, Systems, and Algorithms for Image Sensing (2008)
Face Tracking and Recognition with Visual Constraints in Real-World Videos

Minyoung Kim, Sanjiv Kumar, Vladimir Pavlovic, Henry A. Rowley

IEEE Computer Vision and Pattern Recognition (CVPR) (2008)
Fluid in Video: Augmenting Real Video with Simulated Fluids

Vivek Kwatra, Philippos Mordohai, Rahul Narain, Sashi Kumar Penta, Mark Carlson, Marc Pollefeys, Ming C. Lin

Comput. Graph. Forum (Proc. Eurographics), vol. 27 (2008), pp. 487-496
Large Scale Learning and Recognition of Faces in Web Videos

Ming Zhao, Jay Yagnik, Hartwig Adam, David Bau

FG2008
Large-Scale Manifold Learning

Ameet Talwalkar, Sanjiv Kumar, Henry A. Rowley

Computer Vision and Pattern Recognition (CVPR) (2008)
Linear Time Maximally Stable Extremal Regions

David Nistér, Henrik Stewénius

Proc. 10th Europ. Conf. Comput. Vision (2008), pp. 183-196
Markovian Mixture Face Recognition with discriminative face alignment

Ming Zhao

Automatic Face and Gesture Recognition, IEEE (2008)
Mass Personalization: Social and Interactive Applications using Sound-Track Identification

Michael Fink, Michele Covell, Shumeet Baluja

Journal of Multimedia Tools and Applications, vol. 36 (2008), pp. 115-132
PageRank for Product Image Search

Yushi Jing, Shumeet Baluja

WWW-2008
Permutation Grouping: Intelligent Hash Function Design for Audio & Image Retrieval

Shumeet Baluja, Michele Covell, Sergey Ioffe

International Conference on Acoustics, Speech and Signal Processing (ICASSP-2008)
Reducing Photon Mapping Bandwidth by Query Reordering

Joshua Steinhurst, Greg Coombe, Anselmo Lastra

IEEE Transactions on Visualization and Computer Graphics, vol. 14 (2008)
Solving the label resolution problem in supervised video content classification

Ullas Gargi, Jay Yagnik

MIR '08: Proceeding of the 1st ACM international conference on Multimedia information retrieval, ACM, New York, NY, USA (2008), pp. 276-282
Stereo Matching with Color-weighted Correlation, Hierarchical Belief Propagation and Occlusion Handling

Qingxiong Yang, Liang Wang, Ruigang Yang, Henrik Stewénius, David Nistér

IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) (2008)
Visual Synset: Towards a Higher-level Visual Representation

Yantao Zheng, Ming Zhao, Shi-Yong Neo, Tat-Seng Chua, Qi Tian

CVPR (2008)
VisualRank: Applying PageRank to Large-Scale Image Search

Yushi Jing, Shumeet Baluja

IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30 (2008), pp. 1877-1890
Waveprint: Efficient Wavelet-Based Audio Fingerprinting

Shumeet Baluja, Michele Covell

Pattern Recognition (2008)
Web-scale Image Annotation

Jiakai Liu, Rong Hu, Meihong Wang, Yi Wang, Edward Chang

Pacific-Rim Conference on Multimedia (2008) (to appear)
An Overview of the Tesseract OCR Engine

Ray Smith

Proc. Ninth Int. Conference on Document Analysis and Recognition (ICDAR), IEEE Computer Society (2007), pp. 629-633
Audio Fingerprinting: Combining Computer Vision & Data Stream Processing

Shumeet Baluja, Michele Covell

Proceedings of the 2007 International Conference on Acoustics, Speech, and Signal Processing
Automated Image Orientation Detection: A Scalable Boosting Approach

Shumeet Baluja

Pattern Analysis and Applications (2007)
Automatic Alignment of Large-scale Aerial Rasters to Road-maps

James Xiaqing Wu, Rodrigo Carceroni, Hui Fang, Steve Zelinka, Andrew Kirmse

ACM GIS 2007, ACM
Boosting Sex Identification Performance

Shumeet Baluja, Henry A. Rowley

International Journal of Computer Vision, vol. 71 (2007), pp. 111-119
Canonical Image Selection from the Web

Yushi Jing, Shumeet Baluja, Henry A. Rowley

ACM International Conference on Image and Video Retrieval (2007)
Classification of Weakly-Labeled Data with Partial Equivalence Relations

Sanjiv Kumar, Henry A. Rowley

International Conference on Computer Vision (ICCV) (2007)
Detail Preserving Shape Deformation in Image Editing

Hui Fang, John C. Hart

Proc. SIGGRAPH 2007, ACM, San Diego, no. 12
Efficient Complete and Incomplete Path Openings and Closings

Hugues Talbot, Ben Appleton

Image and Vision Computing, vol. 25, no. 4 (2007), pp. 416-425
GRADE-IV: Visualizing Graphics Library Operations in an Executing Program

Hidehiko Abe, Takeo Igarashi

SIGGRAPH 2007 Posters, ACM, no. 118
Google Books: Making the public domain universally accessible

Adam Langley, Dan Bloomberg

Document Recognition and Retrieval XIV, SPIE (2007), 65000H1-65000H10
Imagers as sensors: Correlating plant CO2 uptake with digital visible-light imagery

Josh Hyman, Eric Graham, Mark Hansen, Deborah Estrin

Data Management for Sensor Networks (2007)
Known-Audio Detection Using Waveprint: Spectrogram Fingerprinting By Wavelet Hashing

Michele Covell, Shumeet Baluja

Proceedings of the 2007 International Conference on Acoustics, Speech, and Signal Processing
Music Identification with Weighted Finite-State Transducers

Eugene Weinstein, Pedro J. Moreno

Proceedings of the International Conference in Acoustics, Speech and Signal Processing (ICASSP) (2007)
Ordinal Regression Based Subpixel Shift Estimation for Video Super-Resolution

Mithun Das Gupta, Shyamsundar Rajaram, Thomas S. Huang, Nemanja Petrovic

EURASIP Journal on Advances in Signal Processing, vol. 85963 (2007)
Practical Gammatone-Like Filters for Auditory Modeling

Andreas G. Katsiamis, Emmanuel M. Drakakis, Richard F. Lyon

EURASIP Journal on Audio, Speech, and Music Processing, vol. 2007 (2007), pp. 12
Practical MythTV: Building a PVR and Media Center PC

Michael Still, Stewart Smith

Apress (2007), pp. 350
Raising Global Awareness with Google Earth

Rebecca Moore

Imaging Notes, vol. 22, no. 2 (2007), pp. 24-29
Robust music identification, detection, and analysis

M. Mohri, Pedro J. Moreno, Eugene Weinstein

Proceedings of the International Conference on Music Information Retrieval (ISMIR) (2007)
Temporally Consistent Reconstruction from Multiple Video Streams using Enhanced Belief Propagation

E. Scott Larsen, Philippos Mordohai, Marc Pollefeys, Henry Fuchs

Eleventh IEEE International Conference on Computer Vision (2007)
Advertisement Detection and Replacement using Acoustic and Visual Repetition

Michele Covell, Shumeet Baluja, Michael Fink

Proceedings of the 2006 International Workshop on Multimedia Signal Processing, IEEE
Content Fingerprinting Using Wavelets

Shumeet Baluja, Michele Covell

Proceedings of the Conference of Visual Media Production, IET (2006)
Detecting Ads in Video Streams using Acoustic and Visual Cues

Michele Covell, Shumeet Baluja, Michael Fink

Computer Magazine (2006), pp. 135-137
Globally Minimal Surfaces by Continuous Maximal Flows

Ben Appleton, Hugues Talbot

IEEE Trans. Pattern Anal. Mach. Intell., vol. 28 (2006), pp. 106-118
Large Scale Image-Based Adult-Content Filtering

Henry A. Rowley, Yushi Jing, Shumeet Baluja

1st International Conference on Computer Vision Theory, Setúbal, Portugal (2006)
Query by Semantic Example

Nikhil Rasiwasia, Nuno Vasconcelos, Pedro J. Moreno

CIVR (2006), pp. 51-60
Social- and Interactive-Television Applications Based on Real-Time Ambient-Audio Identification

Michael Fink, Michele Covell, Shumeet Baluja

European Interactive TV Conference (Euro-ITV) (2006)
Time-Scale Modification for 3G-Telephony Video

Michele Covell, Sumit Roy, Bo Shen

Proceedings of the 2006 International Workshop on Multimedia Signal Processing, IEEE
Boosting Sex Identification Performance

Shumeet Baluja, Henry A. Rowley

Proceedings of the Seventeenth Innovative Applications of Artificial Intelligence Conference, AAAI (2005), pp. 1508-1513
Large Scale Performance Measurement of Content-Based Automated Image-Orientation Detection

Shumeet Baluja, Henry A. Rowley

International Conference on Image Processing, Genova, Italy (2005)
The Definitive Guide to ImageMagick

Michael Still

Apress (2005), pp. 335
Efficient Face Orientation Discrimination

Shumeet Baluja, Mehran Sahami, Henry A. Rowley

International Conference on Image Processing (ICIP-2004)