Warm greetings, and welcome to my home page! My name is Pengcheng Xu (pronounced Peng-cheng Hsu).
In Chinese, my name is written as 徐鹏程, where 徐 is Xu, 鹏 is Peng, and 程 is Cheng. 徐 is my surname (it also means "slow"). 鹏 is a giant bird in ancient Chinese myth; it has huge wings and can fly 5000 kilometers at a time. 程 means journey. So my name basically means "the journey of the giant bird 鹏". The name carries my parents' hope that I will have a bright future and a promising, ambitious path ahead.
You are welcome to chat with the ChatGPT version of me on my PengchengGPT website 😄!
The ChatGPT-Chatbot-Personal-Website repo is here. If you think it's cool or want to use it, please star it! Thank you~

🌱 Education

🎸 Hobbies

In my free time, I enjoy singing, playing the guitar, working out, playing basketball and tennis, swimming, reading, watching films (especially sci-fi), and exploring everything related to science and technology. I'm particularly inspired by Richard Feynman and Elon Musk. You can check out some of my technology talks and guitar playing on my bilibili.

💻 Programming Languages

I'm skilled in C, C++, and Python, and I have project experience with MATLAB, Verilog, Java, R, and JavaScript. I have also worked with frameworks and libraries such as PyTorch, TensorFlow, Keras, scikit-learn, pandas, and Horovod.

🔭 Area of Interest

My main interests are applied machine learning for health and science (bioinformatics, molecular optimization, and so on); large language models (LLMs in general, LoRA, RLHF, Transformers, foundation models for DNA, pangenomes, proteins, and single cells, and LLMs for lay summarization); Transformer tokenization, positional encoding, and normalization; graph Transformers and graph neural networks; reinforcement learning; manifold learning; diffusion models; computer vision; multi-modal learning; distributed systems and distributed ML; and computer architecture.

🔬 Research Experience

💼 Work Experience

🧑‍🏫 Teaching Experience

Recent Work

Molecular de-novo design through Transformer-based Reinforcement Learning

(Accepted to KDD-AIDSH workshop 2024, Oral) Pengcheng Xu, Tianfan Fu, Wenhao Gao, Jimeng Sun. 2024. Preprint Link: http://arxiv.org/abs/2310.05365

Advisor: Tianfan Fu, Jimeng Sun

  • Implemented a decision transformer architecture that improved the AUC on over fifteen molecular optimization tasks by 5% on average.
  • Applied oracle-feedback reinforcement learning to the downstream tasks, reaching higher performance than the pretrained model.
  • Carried out an ablation study and investigated the loss curve and the model's conditional probability of the next token given the previously chosen ones.

Adapting Large Language Models for Biomedical Lay Summarization

(Accepted to ACL-BioNLP workshop 2024) Jieli Zhou*, Cheng Ye*, Pengcheng Xu, Hongyi Xin. 2024. Paper Link: https://aclanthology.org/2024.bionlp-1.76.pdf

Adapting Large Language Models for Biomedical Lay Summarization | Research Assistant | Apr 2024 - June 2024 | Advisors: Jieli Zhou, PhD, UM-SJTU Joint Institute, Shanghai Jiao Tong University; Hongyi Xin, Associate Professor, UM-SJTU Joint Institute, Shanghai Jiao Tong University

  • Fine-tuned Llama 3 using Low-Rank Adaptation (LoRA) and optimized model performance, achieving a 68.8% improvement in the LENS readability score and placing first in readability at the 2024 BioLaySumm workshop.
  • Implemented K-shot prompting based on semantic similarity, enhancing factuality scores by 15.0% (AlignScore) and 7.9% (SummaC) through the integration of contextually relevant examples, leading to more accurate and relevant summaries.
  • Developed techniques to resolve repeated-word issues after fine-tuning, resulting in a 12% increase in the coherence and conciseness of the generated summaries.
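The K-shot prompting step above can be illustrated with a minimal sketch. It assumes each candidate example already has a non-zero embedding vector from some sentence-embedding model (the model is not specified here); the names `select_k_shot` and `build_prompt`, the prompt wording, and the toy vectors are hypothetical and are not taken from the paper's code.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def select_k_shot(query_vec, pool, k):
    """Return the k pool examples most similar to the query.

    pool is a list of (embedding, article, summary) tuples (hypothetical layout).
    """
    ranked = sorted(pool, key=lambda ex: cosine(query_vec, ex[0]), reverse=True)
    return ranked[:k]

def build_prompt(query_article, shots):
    """Place the selected examples before the query article (illustrative template)."""
    parts = [f"Article: {article}\nLay summary: {summary}\n" for _, article, summary in shots]
    parts.append(f"Article: {query_article}\nLay summary:")
    return "\n".join(parts)

# Toy 2-D embeddings standing in for real sentence embeddings.
pool = [
    ([1.0, 0.1], "article A", "summary A"),
    ([0.0, 1.0], "article B", "summary B"),
    ([0.9, 0.5], "article C", "summary C"),
]
shots = select_k_shot([1.0, 0.0], pool, k=2)
prompt = build_prompt("new article", shots)
```

With these toy vectors the two examples closest in direction to the query (A, then C) are selected, and the prompt ends with the unanswered query so the model continues with its lay summary.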

MIRACLE: Multi-task learning based Interpretable Regulation of Autoimmune diseases through Common Latent Epigenetics

(Under Review) Pengcheng Xu*, Jinpu Cai*, Yulin Gao, Ziqi Rong, Hongyi Xin. 2023. Preprint Link: arxiv.org/abs/2306.13866

Multi-task learning based interpretable gene-level methylation estimations | Research Assistant Sep 2021 - Present Advisor: Hongyi Xin, Associate Professor of UM-SJTU Joint Institute, Shanghai Jiao Tong University

  • Explored an adaptable and interpretable neural network to find common genotypes given 480k-dimensional site data and only hundreds of samples.
  • Designed an explainable site-gene-pathway ontology constraint on the neural network so that new biomarkers can be discovered by inspecting its weights.
  • Implemented a variational autoencoder to learn gene-level embeddings shared among datasets, enabling multi-task learning.
  • Optimized a pretrain-finetune training scheme that increased accuracy by over 10%, and wrote the paper, which is under review in 2024.

Balancing Information Preservation and Computational Efficiency: L2 Normalization and Geodesic Distance in Manifold Learning

(Accepted to ACM BCB 2024, Oral (36 oral / 204 accepted)) Ziqi Rong, Jinpu Cai, Jiahao Qiu, Pengcheng Xu, Lana Garmire, Qiuyu Lian, Hongyi Xin. 2024. Preprint Link: https://explcre.github.io/files/7472_balancing_information_preserva.pdf

  • The importance of distinguishable information in similarity measurement for unsupervised learning, manifold learning, and high-dimensional data visualization tasks.
  • The limitations of conventional metrics like Euclidean distance after L1-normalization in handling high-dimensional data due to the "curse of dimensionality".
  • The influence of normalization with different p-norms and the shortcomings of Euclidean distance.
  • The preservation of observation differences when normalizing data to a higher p-norm and using geodesic distance instead of Euclidean distance.
  • The sufficiency of L2-normalization onto the hypersphere in preserving delicate differences in relatively high-dimensional data while maintaining computational efficiency.
  • The presentation of HS-SNE, an augmentation to t-SNE based on a hypersphere representation system, which effectively addresses high-dimensional data visualization and similarity measurement intricacies.
  • The better resolution of the hypersphere representation system in identifying subtle differences in high-dimensional data while balancing efficiency and computational feasibility.
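As a rough illustration of the two ingredients named above, the sketch below L2-normalizes vectors onto the unit hypersphere and compares the geodesic (great-circle) distance with the Euclidean distance between the same normalized points. It is a minimal standard-library sketch with hypothetical function names, not the HS-SNE implementation.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit L2 norm so that it lies on the unit hypersphere."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize the zero vector")
    return [x / norm for x in v]

def geodesic_distance(u, v):
    """Great-circle distance (angle in radians) between two L2-normalized vectors."""
    u, v = l2_normalize(u), l2_normalize(v)
    cos_angle = sum(a * b for a, b in zip(u, v))
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding error
    return math.acos(cos_angle)

def chord_distance(u, v):
    """Euclidean distance between the same two L2-normalized vectors."""
    u, v = l2_normalize(u), l2_normalize(v)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Orthogonal directions: geodesic = pi/2, chord = sqrt(2).
g_orth = geodesic_distance([1.0, 0.0], [0.0, 2.0])
c_orth = chord_distance([1.0, 0.0], [0.0, 2.0])

# Opposite directions: geodesic = pi, chord = 2.
g_opp = geodesic_distance([1.0, 0.0], [-3.0, 0.0])
c_opp = chord_distance([1.0, 0.0], [-3.0, 0.0])
```

On the unit hypersphere the geodesic distance equals the angle between the two points, while the Euclidean (chord) distance equals 2·sin(angle/2). The chord therefore compresses large angles: going from orthogonal to opposite directions doubles the geodesic distance but increases the chord only from √2 to 2.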

Pipe-Déjàvu: Hardware-aware Latency Predictable, Differentiable Search for Faster Config and Convergence of Distributed ML Pipeline Parallelism

(To be submitted) Pengcheng Xu, Kaiyang Chen, Yuanrui Zhang, Indranil Gupta. 2023. Read the research report

Hardware-aware Latency Predictable, Differentiable Search for Faster Config and Convergence of Distributed ML Pipeline Parallelism | Advisor: Indranil Gupta, Professor of CS, UIUC | Advanced Distributed Systems | Researcher | Feb 2023 – May 2023

  • Implemented a predictive model that considers communication cost, model computational cost, and hardware information to predict latency and resources of parallel configurations, saving time on pre-profiling before searching the parallel configuration.
  • Proposed a differentiable parallel-configuration search space inspired by DARTS, which can potentially reach the optimal configuration faster than the original dynamic programming approach.
  • Employed parallel random initialization using sampling algorithms such as Bayesian optimization for faster training-loss convergence.

Vascular Intervention Training System Based on Electromagnetic Tracking Technology

Zhikai Yang, Pengcheng Xu, Dekun Yang, Yufeng Chen, Yancong Ma. ICVRV, 2020. ieeexplore.ieee.org/document/9479727

Advisor: Lixu Gu, Professor of Biomedical Engineering, Shanghai Jiao Tong University

  • Developed the framework of an augmented reality surgery training assistant system for medical students and surgeons.
  • Predicted the operation trajectory using an LSTM and used a KD-tree to calculate distances for operation safety warnings.
  • Displayed the vascular model in AR with OpenGL and designed the UI to support translation.
  • Used the ArUco library in OpenCV for coordinate positioning of the QR code.
  • Published "Vascular Intervention Training System Based on Electromagnetic Tracking Technology" at ICVRV as second author.

Get In Touch

Feel free to email me or message me.

  • Address

    1010 West University Ave
    Urbana, IL 61801
    United States
  • Phone

    217-550-1337
  • Email

    px6@illinois.edu

Misc

Video of me playing the guitar: "Fly me to the Moon".

Video of me playing the guitar: "七里香" (Common Jasmine Orange) by Jay Chou.

Video of me playing the guitar: "Modern Loneliness" by Lauv.

Video of me giving a tech talk on my open-source software development project "Integrated Automatic Feature Engineering" for the 4Paradigm OpenMLDB community.

Video of me giving a presentation on distributed machine learning (Horovod, NVIDIA Clara Train SDK, and distributed training of a 3D UNet model on 2+2 GPUs).

A Poem: Good "New" Days

A Poem: Human Brain is Perfect

A Sci-Fi Novel: 祂

(祂 is a gender-neutral term in Chinese. Sometimes it's used to refer to God.) 祂 (Original Version, In Chinese)

Fun Facts:

My MBTI personality type is INTJ. (Around 2020-2021 it became ENTJ, but now it is returning to INTJ: about 55% introvert and 45% extrovert.)