Pengcheng Xu

Warm greetings, and welcome to my home page! My name is Pengcheng Xu (pronounced Peng-cheng Hsu).
In Chinese, my name is written as 徐鹏程, where 徐 is Xu, 鹏 is Peng, and 程 is Cheng. 徐 is my surname (it also means "slow"). 鹏 is a giant bird in ancient Chinese myth; it has huge wings and can fly 5000 kilometers at a time. 程 means journey. So my name basically means "the journey of the giant bird 鹏". It carries my parents' hope that I will have a bright future and a promising, ambitious path ahead.

You are welcome to chat with the ChatGPT version of me on my PengchengGPT website 😄!
The ChatGPT-Chatbot-Personal-Website repo is here. If you think it's cool or want to use it, please star it! Thank you~

You are also welcome to play Wacky Brawlers, a funny battle game I designed and developed! I welcome any suggestions and reports of bugs you find. I am planning to make it an online multiplayer game.

🌱 Education

I am currently pursuing a PhD in Computer Science at the University of California, Irvine. I started in September 2024 and expect to complete the degree by 2029.
I received my Master's degree in Electrical and Computer Engineering from the University of Illinois Urbana-Champaign (August 2022 - December 2023).
Before that, I earned my Bachelor's degree in the same field (ECE) from the University of Michigan - Shanghai Jiao Tong University Joint Institute (2018 - 2022).

🎸 Hobbies

In my free time, I enjoy singing, playing the guitar, working out, playing basketball, tennis, and table tennis, swimming, reading, watching films (especially sci-fi ones), and exploring everything related to science and technology. I'm particularly inspired by Richard Feynman and Tsung-Dao Lee. You can check out some of my technology talks and guitar playing on my bilibili.

💻 Programming Languages

When it comes to programming languages, I'm skilled in C, C++, and Python. Additionally, I have project experience with MATLAB, Verilog, Java, R, and JavaScript. I also have experience with frameworks and libraries such as PyTorch, TensorFlow, Keras, scikit-learn, Pandas, and Horovod.

🔭 Area of Interest

My main interests are in applied machine learning for health and science (bioinformatics, molecular optimization, and so on); large language models (LLMs in general, LoRA, RLHF, Transformers, foundation models for DNA, pangenome, protein, and single-cell data, and LLMs for lay summarization); Transformer tokenization, positional encoding, and normalization; graph Transformers and graph neural networks; reinforcement learning; manifold learning; diffusion models; computer vision; multi-modal learning; distributed systems and distributed ML; and computer architecture.

🔬 Research Experience

Molecular de-novo design through Decision Transformer and Oracle-feedback reinforcement learning
First Author
May 2023 - Present
Advisors: Tianfan Fu (Assistant Professor at Rensselaer Polytechnic Institute), Jimeng Sun (Professor at UIUC CS Department)
Accepted to KDD 2024 AIDSH Workshop (Oral). Read the paper.

Adapting Large Language Models for Biomedical Lay Summarization
Third Author
Apr 2024 - Jun 2024
Advisor: Hongyi Xin, Associate Professor at University of Michigan - Shanghai Jiao Tong University Joint Institute.
Accepted to ACL 2024 BioNLP Workshop. Read the paper.

MIRACLE: Multi-task learning based Interpretable Regulation of Autoimmune diseases through Common Latent Epigenetics
First Author
Sep 2021 - Present
Advisor: Hongyi Xin, Associate Professor at University of Michigan - Shanghai Jiao Tong University Joint Institute.
Paper submitted and under review. Read the paper preprint.

L2 Normalization and Geodesic Distance for Enhanced Information Preservation in Visualizing High-dimensional Single-cell Sequencing Data
Fourth Author
Sep 2023 - Jun 2024
Advisor: Hongyi Xin, Associate Professor at University of Michigan - Shanghai Jiao Tong University Joint Institute.
Accepted to ACM BCB 2024, SigBio Best Paper Award (1/204 accepted). Read the paper.

Pipe-Déjàvu: Hardware-aware Latency Predictable, Differentiable Search for Faster Config and Convergence of Distributed ML Pipeline Parallelism
First Author
Feb 2023 - May 2023
Advisor: Indranil Gupta, Professor at UIUC CS Department
Read the research report.

💼 Work Experience

Research Scientist Intern
XtalPi Inc (QuantumPharm Inc) | XAB-Antibody IDD
Duration: Aug 2024 - Present
Designing AI models for antibody binding affinity prediction.

Software Engineer Intern
Amazon Web Services, Seattle, WA
Duration: May 2023 - Aug 2023
Worked with the VMware Cloud on AWS group.

Multi-modal Cognitive Computing Algorithm Intern
Shanghai AI Laboratory
Duration: May 2022 - Aug 2022
Worked on multi-modal target detection with zero-shot depth estimation and multi-modal Neural Architecture Search.

OpenMLDB - Open-source Software Developer
4Paradigm Co., Ltd.
Duration: July 2022 - Oct 2022 (Part-Time)
Developed AutoFE, an automated feature engineering tool. View pull request.
Presented at OpenMLDB Meetup No.7 and at the OpenMLDB-GitLink Code Camp 2022 mid-term presentation.

Deep Learning Software Engineer Intern
Intel Corporation
Duration: Nov 2021 - June 2022
Worked on Intel Neural Compressor and ML inference server software.

Algorithms Intern
Shukun Technology Co., Ltd.
Duration: Dec 2020 - Apr 2021
Worked on multi-node training for 3D-UNet with Horovod.
Gave a presentation on multi-node training and Horovod.

🧑‍🏫 Teaching Experience

Teaching Assistant for VE370 Intro to Computer Organization, Fall 2021
Supervisor: Gang Zheng, University of Michigan - Shanghai Jiao Tong University Joint Institute.

🏆 Honors & Awards

2022 Shanghai Jiao Tong University Outstanding Graduate (School Level) - Aug 2022
2021 Microsoft Imagine Cup Global Competition - Third Prize in China - Jan 2021
2020 Mathematical Contest in Modeling - Meritorious Winner (Top 6%) - Apr 2020 [PDF]
2020 “Jidong Cup” CCVR China Virtual Reality Competition, Product Creative Group - Second Prize - Nov 2020
2018-2019 and 2020-2021 Academic Year Undergraduate Excellence Scholarship - Nov 2019/2021

Resume

Recent Work

Molecular de-novo design through Transformer-based Reinforcement Learning

(Accepted to KDD-AIDSH workshop 2024, Oral) Pengcheng Xu, Tianfan Fu, Wenhao Gao, Jimeng Sun. 2024. Preprint Link: http://arxiv.org/abs/2310.05365

Advisors: Tianfan Fu, Jimeng Sun

Implemented a Decision Transformer architecture that improved the AUC on more than fifteen molecular optimization tasks by 5% each on average.
Applied Oracle-feedback reinforcement learning on the downstream tasks to reach higher performance than the pretrained model.
Carried out an ablation study and investigated the loss curves and the model's conditional probability of the next token as a function of the previously chosen tokens.

Adapting Large Language Models for Biomedical Lay Summarization

(Accepted to ACL-BioNLP workshop 2024) Jieli Zhou*, Cheng Ye*, Pengcheng Xu, Hongyi Xin. 2024. Paper Link: https://aclanthology.org/2024.bionlp-1.76.pdf

Adapting Large Language Models for Biomedical Lay Summarization | Research Assistant | Apr 2024 - June 2024
Advisors: Jieli Zhou, PhD, UM-SJTU Joint Institute, Shanghai Jiao Tong University; Hongyi Xin, Associate Professor, UM-SJTU Joint Institute, Shanghai Jiao Tong University

Fine-tuned Llama 3 using Low-Rank Adaptation (LoRA) and optimized model performance, achieving a 68.8% improvement in the LENS readability score and earning first place in readability at the 2024 BioLaySumm workshop.
Implemented K-shot prompting based on semantic similarity, improving factuality scores by 15.0% (AlignScore) and 7.9% (SummaC) by integrating contextually relevant examples, which led to more accurate and relevant summaries.
Developed techniques to resolve repeated-word issues after fine-tuning, resulting in a 12% increase in the coherence and conciseness of the generated summaries.
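
As a rough illustration of K-shot prompting based on semantic similarity, the sketch below ranks a pool of example (article, summary) pairs by cosine similarity to the query article and builds a prompt from the top k. It is not the code from the paper: the function names, the prompt wording, and the toy two-dimensional embeddings are all assumptions made for illustration, and a real system would obtain the embeddings from a sentence-embedding model.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def select_k_shots(query_vec, example_pool, k):
    """Return the k pool entries most similar to the query.

    example_pool is a list of (embedding, article, summary) tuples
    (a hypothetical layout chosen for this sketch).
    """
    ranked = sorted(
        example_pool,
        key=lambda ex: cosine_similarity(query_vec, ex[0]),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(article, shots):
    """Concatenate the selected examples, then the article to summarize."""
    parts = []
    for _, shot_article, shot_summary in shots:
        parts.append(f"Article: {shot_article}\nLay summary: {shot_summary}\n")
    parts.append(f"Article: {article}\nLay summary:")
    return "\n".join(parts)

# Toy embeddings stand in for real sentence embeddings (assumption).
pool = [
    ([0.9, 0.1], "article A", "summary A"),
    ([0.0, 1.0], "article B", "summary B"),
    ([0.7, 0.7], "article C", "summary C"),
]
shots = select_k_shots([1.0, 0.0], pool, k=2)
prompt = build_prompt("new article", shots)
```

Sorting the whole pool is fine for a small example set; a larger pool would more likely use an approximate nearest-neighbor index.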

MIRACLE: Multi-task learning based Interpretable Regulation of Autoimmune diseases through Common Latent Epigenetics

(Under Review) Pengcheng Xu*, Jinpu Cai*, Yulin Gao, Ziqi Rong, Hongyi Xin. 2023. Preprint Link: arxiv.org/abs/2306.13866

Multi-task learning based interpretable gene-level methylation estimations | Research Assistant | Sep 2021 - Present
Advisor: Hongyi Xin, Associate Professor, UM-SJTU Joint Institute, Shanghai Jiao Tong University

Explored an adaptable and interpretable neural network to find common genotypes given 480k-dimensional site data and only hundreds of samples.
Designed an explainable site-gene-pathway ontology constraint on the neural network to discover new biomarkers by inspecting its weights.
Implemented a Variational Auto-Encoder to support a gene-level embedding shared among datasets, enabling multi-task learning.
Optimized a pretrain-finetune training scheme to increase accuracy by over 10%, and wrote the paper, which is under review in 2024.

Balancing Information Preservation and Computational Efficiency: L2 Normalization and Geodesic Distance in Manifold Learning

(Accepted to ACM BCB 2024, Best Paper Award, Oral (36 oral / 204 accepted)) Ziqi Rong, Jinpu Cai, Jiahao Qiu, Pengcheng Xu, Lana Garmire, Qiuyu Lian, Hongyi Xin. 2024. Preprint Link: https://explcre.github.io/files/7472_balancing_information_preserva.pdf

Highlights the importance of distinguishable information in similarity measurement for unsupervised learning, manifold learning, and high-dimensional data visualization tasks.
Examines the limitations of conventional metrics such as Euclidean distance after L1-normalization on high-dimensional data, which stem from the "curse of dimensionality".
Studies the influence of normalization with different p-norms and the defect of Euclidean distance.
Shows that differences between observations are better preserved when data are normalized to a higher p-norm and geodesic distance is used instead of Euclidean distance.
Finds that L2-normalization onto the hypersphere is sufficient to preserve delicate differences in relatively high-dimensional data while maintaining computational efficiency.
Presents HS-SNE, an augmentation of t-SNE based on a hypersphere representation system, which effectively addresses the intricacies of high-dimensional data visualization and similarity measurement.
Demonstrates that the hypersphere representation system resolves subtle differences in high-dimensional data better while balancing efficiency and computational feasibility.
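
To make the two building blocks named above concrete, here is a minimal sketch of L2-normalization onto the unit hypersphere and of the geodesic (great-circle) distance between normalized points, next to the plain Euclidean distance. It is only an illustration of the general technique, not the HS-SNE implementation; the function names are chosen for this example.

```python
import math

def l2_normalize(vec):
    """Project a vector onto the unit hypersphere by dividing by its L2 norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize the zero vector")
    return [x / norm for x in vec]

def geodesic_distance(u, v):
    """Arc length between u and v after L2-normalization (angle in radians)."""
    u_hat, v_hat = l2_normalize(u), l2_normalize(v)
    cos_angle = sum(a * b for a, b in zip(u_hat, v_hat))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return math.acos(cos_angle)

def euclidean_distance(u, v):
    """Straight-line (chord) distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

u = [1.0, 2.0, 3.0]
v = [3.0, 1.0, 0.5]
arc = geodesic_distance(u, v)
chord = euclidean_distance(l2_normalize(u), l2_normalize(v))
```

On the unit hypersphere the chord and the arc are related by chord = 2 sin(arc / 2), so the geodesic distance is never smaller than the Euclidean distance between the same normalized points.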

Pipe-Déjàvu: Hardware-aware Latency Predictable, Differentiable Search for Faster Config and Convergence of Distributed ML Pipeline Parallelism

(To be submitted) Pengcheng Xu, Kaiyang Chen, Yuanrui Zhang, Indranil Gupta. 2023. Read the research report

Hardware-aware Latency Predictable, Differentiable Search for Faster Config and Convergence of Distributed ML Pipeline Parallelism | Advanced Distributed Systems | Researcher | Feb 2023 - May 2023
Advisor: Indranil Gupta, Professor of CS, UIUC

Implemented a predictive model that considers communication cost, model computational cost, and hardware information to predict latency and resources of parallel configurations, saving time on pre-profiling before searching the parallel configuration.
Proposed a differentiable parallel configuration search space inspired by DARTS, which can potentially reach the optimal configuration faster than the original dynamic programming approach.
Employed parallel random initialization using sampling algorithms such as Bayesian Optimization for faster training loss convergence.

Vascular Intervention Training System Based on Electromagnetic Tracking Technology

Zhikai Yang, Pengcheng Xu, Dekun Yang, Yufeng Chen, Yancong Ma. ICVRV, 2020. ieeexplore.ieee.org/document/9479727

Advisor: Lixu Gu, Professor of Biomedical Engineering, Shanghai Jiao Tong University

Developed the framework of an augmented reality surgery training assistant system for medical students and surgeons.
Predicted the operation trajectory using an LSTM and used a KD-Tree to calculate distances for operation safety warnings.
Displayed the vascular model in AR with OpenGL and designed the UI to support translation.
Used the ArUco library in OpenCV for coordinate positioning with the QR code.
Published "Vascular Intervention Training System Based on Electromagnetic Tracking Technology" at ICVRV as second author.
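
The KD-Tree distance check mentioned above can be sketched as follows. This is a small illustrative version, not the system's code: it assumes the safety warning is triggered when the instrument tip comes closer than some threshold to the nearest sampled point of the vessel model, and the function names, the sample points, and the threshold are all hypothetical.

```python
import math

def build_kdtree(points, depth=0):
    """Recursively build a k-d tree as nested tuples (point, left, right)."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return (
        points[mid],
        build_kdtree(points[:mid], depth + 1),
        build_kdtree(points[mid + 1:], depth + 1),
    )

def nearest_distance(node, query, depth=0, best=math.inf):
    """Return the distance from query to its nearest point in the tree."""
    if node is None:
        return best
    point, left, right = node
    axis = depth % len(query)
    best = min(best, math.dist(point, query))
    diff = query[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest_distance(near, query, depth + 1, best)
    # Search the far side only if the splitting plane is closer than the best hit.
    if abs(diff) < best:
        best = nearest_distance(far, query, depth + 1, best)
    return best

def is_unsafe(tree, tip_position, threshold):
    """Warn when the tip is closer than threshold to any model point (assumed rule)."""
    return nearest_distance(tree, tip_position) < threshold

# Hypothetical sample points standing in for the vessel model.
wall_points = [
    (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0),
    (1.0, 1.0, 0.0), (0.5, 0.5, 1.0),
]
tree = build_kdtree(wall_points)
tip = (0.9, 0.1, 0.0)
warning = is_unsafe(tree, tip, threshold=0.2)
```

The tree is built once from the static model, so each per-frame query avoids scanning every point, which is the usual reason for choosing a KD-Tree over a brute-force search.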

Full Portfolio

Get In Touch

Feel free to email me or message me.

Address
Irvine, CA 92617
United States
Phone
217-550-1337
Email
pengchx3@uci.edu

Misc

Video of me playing the guitar: "Fly me to the Moon".

Video of me playing the guitar: "七里香" (Common Jasmine Orange) by Jay Chou.

Video of me playing the guitar: "Modern Loneliness" by Lauv.

Video of me giving a tech talk on my open-source software development project "Integrated Automatic Feature Engineering" for the 4Paradigm OpenMLDB community.

Video of me giving a presentation on Distributed Machine Learning (Horovod, NVIDIA Clara Train SDK, 3D UNet model distributed training on 2+2 GPUs).

A Poem: Good "New" Days

A Poem: Human Brain is Perfect

A Sci-Fi Novel: 祂

(祂 is a gender-neutral term in Chinese. Sometimes it's used to refer to God.)

祂 (Original Version, In Chinese)

Fun Facts:

My MBTI personality type is INTJ. (Around 2020-2021 it shifted to ENTJ, but now it's returning to INTJ. Currently it's about 55% Introvert and 45% Extrovert.)