Xinchen Liu (刘鑫辰)
Email: or *My previous email xinchenliu [at] bupt [dot] edu [dot] cn has been deprecated.
I am currently a Senior Researcher at the Computer Vision and Multimedia Lab of JD AI Research, working with Dr. Wu Liu and Dr. Tao Mei. I received my Ph.D. degree in computer science under the supervision of Prof. Huadong Ma at the Beijing Key Lab of Intelligent Telecommunication Software and Multimedia, Beijing University of Posts and Telecommunications, in June 2018. My research interests include multimedia computing, computer vision, and their applications in retail.
We are recruiting self-motivated interns in computer vision, deep learning, and multimedia. Please directly send your CV to my email if you are interested in the positions! :D
- November, 2021, I will serve as an Area Chair for ICME 2022.
- October, 2021, our team won 2nd place in the ICCV21 DeeperAction Challenge Track 3 - Kinetics-TPS Challenge on Part-level Action Parsing. Congratulations to our teammates Xiaodong Chen and Kun Liu! The technical report of our solution can be found HERE.
- September, 2021, The 2nd International Workshop On Human-Centric Multimedia Analysis (HUMA) will be held on 20 October 2021 in conjunction with ACM Multimedia 2021. This year we have three invited talks and five paper presentations. More details can be found on the Website.
- August, 2021, we are organizing The 2nd International Workshop On Human-Centric Multimedia Analysis (HUMA) in conjunction with ACM Multimedia 2021. The paper submission deadline has been extended to 17 August 2021. Submissions are welcome! Website
- July, 2021, one paper on Explainable Person ReID was accepted by ICCV 2021.
- May, 2021, our paper at IEEE ISCAS 2021 received the MSA-TC “Best Paper Award - Honorable Mention”. Thanks to our collaborators!
- May, 2021, our team is organizing ACM Multimedia Asia 2021. Paper submission due: 19 July 2021. Website
- April, 2021, we are organizing The 2nd International Workshop On Human-Centric Multimedia Analysis (HUMA) with ACM Multimedia 2021. Paper submission due: 10 August 2021. Website
- January, 2021, one paper on gait recognition was accepted as a lecture (oral) presentation by IEEE ISCAS 2021. Congratulations to Jinkai Zheng and thanks to our collaborators!
- December, 2020, our team won the Championship in the NAIC Challenge 2020 AI+Person ReID Track. Congratulations to our teammates Xingyu Liao, Lingxiao He, Peng Cheng, and Guanan Wang!
- November, 2020, the codebase for human parsing and vehicle parsing from our ACM MM’19 and ACM MM’20 papers has been released; please refer to CODE. It supports multiple segmentation and parsing methods and two datasets (LIP for human, MVP for vehicle).
- November, 2020, the code for the paper, “Beyond the Parts: Learning Multi-view Cross-part Correlation for Vehicle Re-identification, ACM MM, 2020”, has been released; please refer to CODE.
- August, 2020, one regular paper and one demo paper were accepted by ACM Multimedia 2020.
- July, 2020, we released a large-scale multi-grained vehicle parsing dataset, the MVP dataset, for vehicle part segmentation. For more details, please refer to MVP.
- July, 2020, one paper was published in IEEE Transactions on Image Processing LINK.
- June, 2020, our team presented FastReID, a powerful toolbox of object re-identification for academia and industry. It achieves state-of-the-art performance for both person Re-Id and vehicle Re-Id. Please refer to our PAPER and CODE for more details.
- December, 2019, we made a performance list of recent vehicle Re-Id methods on the VeRi dataset. We also applied a strong baseline model for Re-Id to six vehicle Re-Id datasets. Please refer to LINK and the strong baseline model CODE.
- November, 2019, my Ph.D. thesis, “Research on Key Techniques of Vehicle Search in Urban Video Surveillance Networks”, received the Outstanding Doctoral Dissertation Award of the China Society of Image and Graphics (中国图象图形学学会优秀博士学位论文). NEWS PDF
- October 22, 2019, one paper (Paper ID: P1C-10) will be presented at ACM Multimedia 2019, Nice, France.
- July, 2019, our paper, “PROVID: Progressive and Multimodal Vehicle Reidentification for Large-Scale Urban Surveillance”, published in IEEE Trans. Multimedia 20(3): 645-658 (2018), was awarded the TMM Multimedia Prize Paper Award 2019. Thanks to Dr. Wu Liu, Dr. Tao Mei, and Prof. Huadong Ma!
- July, 2019, one paper was presented at ICME 2019, Shanghai, China.
- June, 2019, one paper was presented at CVPR 2019, Long Beach, USA.
Publications (dblp Google Scholar)
- Wu Liu, Xinchen Liu, Jingkuan Song, Dingwen Zhang, Wenbing Huang, Junbo Guo, John Smith: HUMA’21: 2nd International Workshop on Human-centric Multimedia Analysis. ACM Multimedia 2021: 5690-5691
- Xiaodong Chen, Xinchen Liu, Kun Liu, Wu Liu, Tao Mei: A Baseline Framework for Part-level Action Parsing and Action Recognition. CoRR abs/2110.03368 (2021) arXiv (2nd place solution to Kinetics-TPS Track on Part-level Action Parsing in ICCV DeeperAction Challenge 2021)
- Xiaodong Chen, Xinchen Liu, Wu Liu, Yongdong Zhang, Xiao-Ping Zhang, Tao Mei: Explainable Person Re-Identification with Attribute-guided Metric Distillation. ICCV 2021 PAPER PDF
- Jinkai Zheng, Xinchen Liu, Chenggang Yan, Jiyong Zhang, Wu Liu, Xiaoping Zhang, Tao Mei: TraND: Transferable Neighborhood Discovery for Unsupervised Cross-domain Gait Recognition. ISCAS 2021 PAPER PDF CODE (IEEE CAS MSA-TC Best Paper Award Honorable Mention)
- Xiaodong Chen, Wu Liu, Xinchen Liu, Yongdong Zhang, Tao Mei: A Cross-modality and Progressive Person Search System. ACM MM Demo 2020: 4550-4552 PDF
- Lingxiao He, Xingyu Liao, Wu Liu, Xinchen Liu, Peng Cheng, Tao Mei: FastReID: A Pytorch Toolbox for General Instance Re-identification. CoRR abs/2006.02631 (2020) ARXIV
- Qi Wang, Xinchen Liu, Wu Liu, Anan Liu, Wenyin Liu, Tao Mei: MetaSearch: Incremental Product Search via Deep Meta-learning. IEEE Trans. Image Process. 29: 7549-7564 (2020) LINK
- Xinchen Liu, Meng Zhang, Wu Liu, Jingkuan Song, Tao Mei: BraidNet: Braiding Semantics and Details for Accurate Human Parsing. ACM MM 2019: 338-346 PDF
- Xinchen Liu, Wu Liu, Huadong Ma, Shuangqun Li: PVSS: A Progressive Vehicle Search System for Video Surveillance Networks. J. Comput. Sci. Technol. 34(3): 634-644 (2019) PDF
- Xinchen Liu, Wu Liu, Tao Mei, Huadong Ma: PROVID: Progressive and Multimodal Vehicle Reidentification for Large-Scale Urban Surveillance. IEEE Trans. Multimedia 20(3): 645-658 (2018) (TMM Multimedia Prize Paper Award 2019) PDF
- Xinchen Liu, Wu Liu, Huadong Ma, Shuangqun Li: A Progressive Vehicle Search System for Video Surveillance Networks. BigMM 2018: 1-7
- Wenhui Gao, Xinchen Liu, Huadong Ma, Yanan Li, Liang Liu: MMH: Multi-Modal Hash for Instant Mobile Video Search. MIPR 2018: 57-62
- Wu Liu, Xinchen Liu, Huadong Ma, Peng Cheng: Beyond Human-level License Plate Super-resolution with Progressive Vehicle Search and Domain Priori GAN. ACM Multimedia 2017: 1618-1626 PDF
- Shuangqun Li, Xinchen Liu, Wu Liu, Huadong Ma, Haitao Zhang: A discriminative null space based deep learning approach for person re-identification. CCIS 2016: 480-484
- Xinchen Liu, Wu Liu, Tao Mei, Huadong Ma: A Deep Learning-Based Approach to Progressive Vehicle Re-identification for Urban Surveillance. ECCV (2) 2016: 869-884 PDF
- Xinchen Liu, Wu Liu, Huadong Ma, Huiyuan Fu: Large-scale vehicle re-identification in urban surveillance videos. ICME 2016: 1-6 (Best Student Paper Award) PDF
- Xinchen Liu, Huadong Ma, Huiyuan Fu, Mo Zhou: Vehicle Retrieval and Trajectory Inference in Urban Traffic Surveillance Scene. ICDSC 2014: 26:1-26:6
June, 2020, NCIG2020 Outstanding Doctor and Young Scholar Panel (2020全国图象图形学学术会议，优秀博士与青年学者论坛), “Large-scale Vehicle Search in Smart City (智慧城市中的车辆搜索)” (In Chinese). SLIDES
Area Chair, ICME 2022
Local Session Chair, ACM Multimedia 2021
Proceedings Co-Chair, ACM Multimedia Asia 2021
Co-chair, HUMA 2021 Workshop at ACM Multimedia 2021
Journal Reviewer: IEEE TPAMI, IEEE TMM, IEEE TIP, IEEE TCSVT, IEEE TITS, IEEE TMC, ACM TOMM, ACM TIST, IoTJ, Neurocomputing, MTAP, …
Conference Reviewer: CVPR, ACM MM, AAAI, ICME, ICASSP, ICIP, …
Membership: IEEE/ACM/CCF/CSIG Member.
Awards and Honors
ICCV 2021 DeeperAction Challenge, Track 3 Kinetics-TPS Challenge on Part-level Action Parsing, 2nd Award
IEEE CAS MSA-TC Best Paper Award - Honorable Mention, 2021, for the paper “TraND: Transferable Neighborhood Discovery for Unsupervised Cross-domain Gait Recognition”
Outstanding Doctoral Dissertation Award of the China Society of Image and Graphics, 2019, for my Ph.D. thesis “Research on Key Techniques of Vehicle Search in Urban Video Surveillance Networks”
IEEE TMM Multimedia Prize Paper Award, 2019, for the paper “PROVID: Progressive and Multimodal Vehicle Reidentification for Large-Scale Urban Surveillance”
ICME 2019, 2021, Outstanding Reviewer Award
CVPR 2019 LIP Challenge, Track 3 Multi-Person Human Parsing, 2nd Award
CVPR 2018 LIP Challenge, Track 1 Single-Person Human Parsing, 2nd Award
IEEE ICME Best Student Paper Award, 2016, for the paper “Large-scale vehicle re-identification in urban surveillance videos”
Progressive Vehicle Search in Large-scale Surveillance Networks (More Details)
Compared with person re-identification, which has attracted concentrated attention, vehicle re-identification is an important yet frontier problem in video surveillance that has been neglected by the multimedia and vision communities. Since most existing approaches mainly consider the general vehicle appearance for re-identification while overlooking the distinct vehicle identifier, such as the license number plate, they attain suboptimal performance. In this work, we propose PROVID, a PROgressive Vehicle re-IDentification framework based on deep neural networks. In particular, our framework not only utilizes the multi-modality data in large-scale video surveillance, such as visual features, license plates, camera locations, and contextual information, but also considers vehicle re-identification in two progressive procedures: coarse-to-fine search in the feature domain, and near-to-distant search in the physical space. Furthermore, to evaluate our progressive search framework and facilitate related research, we construct the VeRi dataset, which is the most comprehensive dataset from real-world surveillance videos. It not only provides large numbers of vehicles with varied labels and sufficient cross-camera recurrences but also contains license number plates and contextual information. Extensive experiments on the VeRi dataset demonstrate both the accuracy and efficiency of our progressive vehicle re-identification framework.
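The two progressive procedures described above (coarse-to-fine in the feature domain, near-to-distant in the physical space) can be illustrated with a toy sketch. This is not the PROVID implementation: the `Observation` fields, the weights `plate_weight` and `space_weight`, and the use of plain Euclidean distance are all assumptions made for illustration, whereas the actual framework relies on learned deep features.

```python
from dataclasses import dataclass
from math import dist


@dataclass
class Observation:
    """One vehicle sighting. Every field is a hypothetical stand-in."""
    name: str
    appearance: tuple   # coarse visual feature vector
    plate: tuple        # fine license-plate feature vector
    camera_xy: tuple    # camera location in some planar coordinates


def progressive_search(query, gallery, top_k=3, plate_weight=1.0, space_weight=0.1):
    """Return gallery names ranked by a toy progressive search."""
    # Coarse stage: cheap appearance filtering keeps only the top_k candidates.
    coarse = sorted(
        gallery, key=lambda g: dist(query.appearance, g.appearance)
    )[:top_k]

    # Fine stage: re-rank the survivors with the plate feature, and favor
    # cameras near the query camera over distant ones (assumed weighting).
    def fine_score(g):
        return (
            dist(query.appearance, g.appearance)
            + plate_weight * dist(query.plate, g.plate)
            + space_weight * dist(query.camera_xy, g.camera_xy)
        )

    return [g.name for g in sorted(coarse, key=fine_score)]


query = Observation("q", (0.0, 0.0), (1.0, 0.0), (0.0, 0.0))
gallery = [
    Observation("a", (0.1, 0.0), (0.0, 1.0), (5.0, 0.0)),   # similar look, different plate
    Observation("b", (0.2, 0.0), (1.0, 0.0), (1.0, 0.0)),   # similar look, same plate, nearby
    Observation("c", (3.0, 3.0), (1.0, 0.0), (0.0, 0.0)),   # dropped by the coarse filter
    Observation("d", (0.3, 0.0), (0.9, 0.0), (10.0, 0.0)),  # similar plate, distant camera
]
ranking = progressive_search(query, gallery)
print(ranking)
```

In this toy setup the candidate with the most similar appearance ("a") falls to the bottom once the plate feature is considered, which is the motivation for going beyond appearance alone.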
Multi-grained Vehicle Parsing (More Details)
We present a novel large-scale dataset, Multi-grained Vehicle Parsing (MVP), for semantic analysis of vehicles in the wild, which has several featured properties. First of all, the MVP contains 24,000 vehicle images captured in real-world surveillance scenes, which makes it more scalable than existing datasets. Moreover, for different requirements, we annotate the vehicle images with pixel-level part masks in two granularities, i.e., the coarse annotations of ten classes and the fine annotations of 59 classes. The former can be applied to object-level applications such as vehicle Re-Id, fine-grained classification, and pose estimation, while the latter can be explored for high-quality image generation and content manipulation. Furthermore, the images reflect the complexity of real surveillance scenes, such as different viewpoints, illumination conditions, and backgrounds. In addition, the vehicles come from diverse countries and vary in type, brand, model, and color, which makes the dataset more diverse and challenging. A codebase for person and vehicle parsing can be found HERE.
Fine-grained Human Parsing
This paper focuses on fine-grained human parsing in images. This is a very challenging task due to the diverse person appearance, semantic ambiguity of different body parts and clothing, and extremely small parsing targets. Although existing approaches can achieve significant improvement by pyramid feature learning, multi-level supervision, and joint learning with pose estimation, human parsing is still far from being solved. Different from existing approaches, we propose a Braiding Network, named BraidNet, to learn complementary semantics and details for fine-grained human parsing. The BraidNet contains a two-stream braid-like architecture. The first stream is a semantic abstracting net with a deep yet narrow structure, which can learn semantic knowledge through a hierarchy of fully convolutional layers to overcome the challenges of diverse person appearance. To capture low-level details of small targets, the detail-preserving net is designed to exploit a shallow yet wide network without down-sampling, which can retain sufficient local structures for small objects. Moreover, we design a group of braiding modules across the two sub-nets, by which complementary information can be exchanged during end-to-end training. In addition, at the end of BraidNet, a Pairwise Hard Region Embedding strategy is proposed to eliminate the semantic ambiguity of different body parts and clothing. Extensive experiments show that the proposed BraidNet achieves better performance than the state-of-the-art methods for fine-grained human parsing.
Try the Human Parsing Online API at JD Neuhub.
Social Relation Recognition
Discovering social relations, e.g., kinship, friendship, etc., from visual contents can make machines better interpret the behaviors and emotions of human beings. Existing studies mainly focus on recognizing social relations from still images while neglecting another important medium: video. On the one hand, the actions and storylines in videos provide more important cues for social relation recognition. On the other hand, the key persons may appear at arbitrary spatial-temporal locations and may never appear together in the same frame from beginning to end. To overcome these challenges, we propose a Multi-scale Spatial-Temporal Reasoning (MSTR) framework to recognize social relations from videos. For the spatial representation, we not only adopt a temporal segment network to learn global action and scene information, but also design a Triple Graphs model to capture visual relations between persons and objects. For the temporal domain, we propose a Pyramid Graph Convolutional Network to perform temporal reasoning with multi-scale receptive fields, which can obtain both long-term and short-term storylines in videos. By this means, MSTR can comprehensively explore the multi-scale actions and storylines in spatial-temporal dimensions for social relation reasoning in videos. Extensive experiments on a new large-scale Video Social Relation dataset demonstrate the effectiveness of the proposed framework. The dataset can be downloaded from BaiduPan (~57GB, download code: jzei).
Last Update: December, 2021