Peng Xu

Bio

I am a fourth-year undergraduate student majoring in Electronic Information Engineering at Beijing University of Posts and Telecommunications. My research interests include statistical machine learning, data mining, and parallel computation. I also have a good command of algorithms and mathematics. I am working with Prof. Zhihu Du on parallel and distributed computation, specifically efficient particle-mesh spreading on GPUs and processing large numbers of images with MPI, Spark, and Hadoop. I am also working with Prof. Jie Tang on data mining, machine learning, and information retrieval, specifically name disambiguation in large-scale databases.

Education

Beijing University of Posts and Telecommunications

BA, 2011-2015 (expected)

Overall GPA: 87.06; major GPA: 89.62

Major Courses

Probability Theory and Stochastic Process 98
Mathematical Analysis I 94
Advanced Algebra 91
Discrete Mathematics 94
Analytic Geometry of Space 93
Practicum in Computer Skills 93
Database Technologies and Applications 91
Fundamentals of Information Theory 95
Digital Signal Processing 91
Digital Circuit and Logic Design 99
Signals and Systems 91

Research Experience

Knowledge Engineering Group, Tsinghua University

Advisor: Jie Tang, 2014/9 - present

Identify Experts Online
- Developed a GUI that accepts expert information entered manually or loaded from the database.
- Used the given information to identify the experts on Baidu Baike, a Chinese-language website similar to Wikipedia.
- Tried several classifiers, including SVM and logistic regression, and achieved a precision of 90.9%, a recall of 98.5%, and an F1 score of 94.5%.
- This is my independent work. The implementation is available in my GitHub repo IdentifyExpertsInBaiduBaike, and the work is described in my report (in Chinese).
Crawler For Google Scholar
- Developed a web crawler with the Scrapy framework in Python to crawl the entire co-author network on Google Scholar, together with all of the authors' papers.
- To avoid being blocked by Google, I also wrote several crawlers that collect thousands of proxy IP addresses and ports.
- This is my independent work. The implementation is available in my GitHub repo CrawlerForGoogleScholar, and the work is described in my report.
Development of ArnetMiner II
- ArnetMiner is a website that offers comprehensive search and mining services for the academic community. I am working on its second version.
- My work mainly concerns the data, including caching, MongoDB, Redis, and the related computation.
- In addition, I am working on improving the system's name disambiguation performance and on handling new data properly.
- ArnetMiner has already been released; you can visit it here.

High Performance Computing Laboratory, Tsinghua University

Advisor: Zhihu Du, 2014/4 - present

Implement Parallel Algorithm
- Learn how to implement and optimize parallel algorithms on GPU hardware with CUDA.
- Work on my own to implement an SVM (support vector machine) and parallelize it on the GPU.
Algorithm Optimization
- We optimized all spreading and interpolation algorithms on the GPU, from their granularity down to the algorithmic details.
- I implemented several spreading algorithms, ran tests, and plotted the corresponding figures for the paper.
- Our work has been submitted to IPDPS 2015; the paper can be found here.
Develop an Image Library and Processing System
- First, crawl a large number of images online.
- Process those images with parallel and distributed systems, specifically MPI, Spark, and Hadoop, and compare their performance in different scenarios.

Publications

Efficient Particle-Mesh Spreading on GPUs (submitted)

Xiangyu Guo, Xing Liu, Peng Xu, Zhihu Du, Edmond Chow

In 29th IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2015

Awards

2014 Quarter-finalist in the Microsoft Beauty of Programming Contest, ranked 134 of 988
2013 Bronze Medalist in ACM-ICPC Hunan Invitational Programming Contest
2013 Gold Medalist in Programming Contest of Beijing University of Posts and Telecommunications

Skills and Qualifications

Programming Languages: C/C++, Python, CUDA, Matlab, Shell, R, Scala

Software and Tools: LaTeX, Git, LIBSVM, MongoDB, Scrapy, Redis, Spark

Languages: Chinese (native), English (fluent in speaking and writing)

Mathematics: Interested in statistics, number theory, graph theory, and combinatorics, which I study on my own from textbooks. I also often solve problems on Project Euler.

Algorithms: I love participating in programming contests and have a Codeforces rating of 1732. I also solve problems on online judges such as LeetCode and write up my solutions, which can be found in my blog's category:solution. Furthermore, I have read part of CLRS and keep my study notes in my blog's category:CLRS. I have also written my own algorithm template in LaTeX, which can be found here.

Coursera: Machine Learning, Game Theory, Probabilistic Graphical Models, Mining Massive Datasets, Statistical Learning. My notes and summaries for these courses can be found under my blog's tag:Coursera.

CV

You can download my CV here