I am a fourth-year undergraduate student majoring in Electronic Information Engineering at Beijing University of Posts and Telecommunications. My research interests include statistical machine learning, data mining, and parallel computation, and I have a good command of algorithms and mathematics. I am working with Prof. Zhihu Du on parallel and distributed computation, more specifically, efficient particle-mesh spreading on GPUs and processing large collections of images via MPI, Spark, and Hadoop. I am also working with Prof. Jie Tang on data mining, machine learning, and information retrieval, more specifically, name disambiguation in large-scale databases.


Beijing University of Posts and Telecommunications

BA, 2011-2015 (expected)

Overall GPA: 87.06; major GPA: 89.62

Major Courses

  • Probability Theory and Stochastic Process 98
  • Mathematical Analysis I 94
  • Advanced Algebra 91
  • Discrete Mathematics 94
  • Analytic Geometry of Space 93
  • Practicum in Computer Skills 93
  • Database Technologies and Applications 91
  • Fundamentals of Information Theory 95
  • Digital Signal Processing 91
  • Digital Circuit and Logic Design 99
  • Signals and Systems 91

Research Experience

Knowledge Engineering Group, Tsinghua University

Advisor: Jie Tang, 2014/9 - present

  • Identify Experts Online
    • Develop a GUI that accepts expert information entered manually or loaded from the database.
    • Use the given information to identify the experts on Baidu Baike, a Chinese encyclopedia website similar to Wikipedia.
    • Try various methods, including SVM and Logistic Regression, achieving a precision of 90.9%, recall of 98.5%, and F1 of 94.5%.
    • This is my independent work; my implementation can be found in my GitHub repo: IdentifyExpertsInBaiduBaike, and the work is described in my report (in Chinese).
  • Crawler For Google Scholar
    • Develop a web crawler that collects the entire co-author network, with all of their papers, from Google Scholar using the Scrapy framework in Python.
    • To avoid being blocked by Google, I also wrote several crawlers to obtain thousands of proxy IPs and ports.
    • This is my independent work; my implementation can be found in my GitHub repo: CrawlerForGoogleScholar, and the work is described in my report.
  • Development of ArnetMiner II
    • ArnetMiner is a website that offers comprehensive search and mining services for the academic community. I am working on its second version.
    • My work mainly involves the data layer, including caching, MongoDB, Redis, and the corresponding computation.
    • In addition, I am working on improving the system's name-disambiguation performance and on handling new data properly.
    • ArnetMiner II has already been released; you can visit it here
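As a rough illustration of the name-disambiguation problem mentioned above, the following is a minimal Python sketch of a common coauthor-overlap baseline, not ArnetMiner's actual algorithm: papers published under the same ambiguous author name are merged into one person whenever they share a coauthor.

```python
# Illustrative baseline for name disambiguation (my sketch, not the
# system's real implementation): cluster papers with union-find,
# merging two papers whenever their coauthor sets intersect.

def disambiguate(papers):
    """papers: list of coauthor sets, one per paper under the same name.
    Returns clusters of paper indices, one cluster per inferred person."""
    parent = list(range(len(papers)))

    def find(i):
        # Find the cluster root, with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        parent[find(i)] = find(j)

    for i in range(len(papers)):
        for j in range(i + 1, len(papers)):
            if papers[i] & papers[j]:  # shared coauthor -> same person
                union(i, j)

    clusters = {}
    for i in range(len(papers)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())
```

Real systems refine this with richer features (venues, titles, affiliations) and probabilistic models, but coauthor overlap is the usual starting point.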

High Performance Computing Laboratory, Tsinghua University

Advisor: Zhihu Du, 2014/4 - present

  • Implement Parallel Algorithm
    • Learn the implementation and optimization of parallel algorithms on GPU hardware via CUDA.
    • Implement an SVM (Support Vector Machine) and parallelize it on the GPU by myself.
  • Algorithm Optimization
    • We optimized all spreading and interpolation algorithms on the GPU, from their overall granularity down to algorithmic details.
    • I implemented several spreading algorithms, ran tests, and plotted the corresponding figures for the paper.
    • Our work has been submitted to IPDPS 2015; the paper can be found here
  • Develop Image Library and Process System
    • First, crawl large numbers of images online.
    • Then process those images with parallel computing and distributed systems, specifically MPI, Spark, and Hadoop, and compare their performance in different situations.
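A hypothetical sketch of the first step of that comparison: splitting the image list across workers the way an MPI scatter would. The function name and round-robin scheme are my illustration, not the project's actual code.

```python
# Split an image workload across `size` workers, MPI-style: rank r
# takes every size-th image. With mpi4py, the per-rank chunks would
# feed comm.scatter; with Spark, the full list would instead become
# an RDD via sc.parallelize and the split would be implicit.

def partition(image_paths, rank, size):
    """Return the slice of work assigned to worker `rank` of `size`."""
    return image_paths[rank::size]
```

Round-robin assignment keeps the per-worker load balanced when image sizes are roughly uniform; for skewed workloads a size-aware split would be better.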


Efficient Particle-Mesh Spreading on GPUs (submitted)

Xiangyu Guo, Xing Liu, Peng Xu, Zhihu Du, Edmond Chow

In 29th IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2015
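For readers unfamiliar with the operation, here is a minimal serial 1-D sketch of particle-mesh spreading with linear (cloud-in-cell) weights. This is my simplification for illustration; the paper treats more general spreading kernels and the GPU-specific optimizations.

```python
# Serial 1-D sketch of the spreading step that the GPU kernels
# accelerate: each particle's charge is deposited onto its two
# nearest grid points with linear weights, with periodic wrap-around.

def spread(positions, charges, n_cells, box_length):
    """Spread each particle's charge onto the mesh."""
    grid = [0.0] * n_cells
    h = box_length / n_cells          # cell width
    for x, q in zip(positions, charges):
        s = x / h                     # position in units of cells
        i = int(s)                    # grid point to the particle's left
        frac = s - i                  # fractional distance into the cell
        grid[i % n_cells] += q * (1.0 - frac)   # weight to left point
        grid[(i + 1) % n_cells] += q * frac     # weight to right point
    return grid
```

On a GPU, many particles scatter into the same grid points concurrently; handling those conflicting writes efficiently while keeping memory access coalesced is the crux of the optimization problem.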


  • 2014 Quarter-finalist in the Microsoft Beauty of Programming Contest, ranked 134 of 988
  • 2013 Bronze Medalist in ACM-ICPC Hunan Invitational Programming Contest
  • 2013 Gold Medalist in Programming Contest of Beijing University of Posts and Telecommunications

Skills and Qualifications

Programming Languages: C/C++, Python, CUDA, Matlab, Shell, R, Scala

Software and Tools: LaTeX, Git, LIBSVM, MongoDB, Scrapy, Redis, Spark

Languages: Chinese (mother tongue), English (fluent, both spoken and written)

Mathematics: Interested in Statistics, Number Theory, Graph Theory, and Combinatorics, and I have studied related textbooks on my own. I also often solve problems on Project Euler.

Algorithms: I enjoy participating in various programming contests and have a rating of 1732 on Codeforces. I also solve problems on online judges such as LeetCode and write my own solution reports, which can be found in my blog's category: solution. Furthermore, I have read part of CLRS and written study notes in my blog's category: CLRS. I have also written my own algorithm templates in LaTeX, which can be found here.

Coursera: Machine Learning, Game Theory, Probabilistic Graphical Models, Mining Massive Datasets, Statistical Learning. My notes and summaries of these courses can be found under my blog's tag: Coursera


You can download my CV here