Molecular de-novo design through Transformer-based Reinforcement Learning
(Accepted to KDD-AIDSH workshop 2024, Oral) Pengcheng Xu, Tianfan Fu, Wenhao Gao, Jimeng Sun.
2024. Preprint Link: http://arxiv.org/abs/2310.05365
Advisors: Tianfan Fu, Jimeng Sun
- Implemented a decision transformer architecture that improved AUC by 5% on average across
more than fifteen molecular optimization tasks.
- Applied oracle-feedback reinforcement learning on the downstream tasks to reach higher
performance than the pretrained model.
- Carried out ablation studies and analyzed the loss curves and the model's conditional
probability over the next token as a function of previously chosen tokens.
Adapting Large Language Models for Biomedical Lay Summarization
(Accepted to ACL-BioNLP workshop 2024) Jieli Zhou*, Cheng Ye*, Pengcheng Xu, Hongyi Xin. 2024.
Paper Link: https://aclanthology.org/2024.bionlp-1.76.pdf
Adapting Large Language Models for Biomedical Lay Summarization | Research Assistant
Apr 2024 - June 2024
Advisors: Jieli Zhou, PhD of UM-SJTU Joint Institute, Shanghai Jiao Tong University;
Hongyi Xin, Associate Professor of UM-SJTU Joint Institute, Shanghai Jiao Tong University
- Fine-tuned Llama 3 using Low-Rank Adaptation (LoRA) and optimized model performance,
achieving a 68.8% improvement in the LENS readability score and first place in readability
at the 2024 BioLaySumm workshop.
- Implemented K-shot prompting based on semantic similarity, improving factuality scores by
15.0% (AlignScore) and 7.9% (SummaC) through the integration of contextually relevant
examples, leading to more accurate and relevant summaries.
- Developed techniques to resolve repeated-word issues after fine-tuning, resulting in a 12%
increase in the coherence and conciseness of the generated summaries.
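As an illustration of K-shot prompting based on semantic similarity, the minimal sketch below picks the k stored examples closest to a query article and assembles a prompt from them. It is an assumption-laden toy rather than the submitted system: the bag-of-words `embed` function stands in for a real sentence encoder, and `select_k_shots`, `build_prompt`, and the example pool are hypothetical names and data.

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words embedding (assumption: a real system would use a sentence encoder)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def select_k_shots(query, pool, k=2):
    """Return the k (article, summary) pairs whose article is most similar to the query."""
    q = embed(query)
    ranked = sorted(pool, key=lambda pair: cosine(q, embed(pair[0])), reverse=True)
    return ranked[:k]

def build_prompt(query, shots):
    """Place the selected examples before the query so the model can imitate them."""
    parts = [f"Article: {a}\nLay summary: {s}" for a, s in shots]
    parts.append(f"Article: {query}\nLay summary:")
    return "\n\n".join(parts)

# Hypothetical example pool of (article, lay summary) pairs.
pool = [
    ("gene expression in tumor cells", "How genes behave in cancer."),
    ("protein folding dynamics in yeast", "How yeast proteins take shape."),
    ("tumor cells respond to immune therapy", "How cancer reacts to immune treatment."),
]
query = "immune response of tumor cells"
shots = select_k_shots(query, pool, k=2)
prompt = build_prompt(query, shots)
```

Selecting examples per query, instead of using one fixed set, keeps the in-context demonstrations topically close to the article being summarized.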
MIRACLE: Multi-task learning based Interpretable Regulation of Autoimmune diseases through
Common Latent Epigenetics
(Under Review) Pengcheng Xu*, Jinpu Cai*, Yulin Gao, Ziqi Rong, Hongyi Xin. 2023. Preprint Link:
arxiv.org/abs/2306.13866
Multi-task learning based interpretable gene-level methylation estimations | Research Assistant
Sep 2021 - Present
Advisor: Hongyi Xin, Associate Professor of UM-SJTU Joint Institute, Shanghai Jiao Tong
University
- Explored an adaptable, interpretable neural network to find common genotypes given
480k-dimensional site data and only hundreds of samples.
- Designed an explainable site-gene-pathway ontology constraint on the neural network to
discover new biomarkers by inspecting its weights.
- Implemented a Variational Auto-Encoder to support gene-level embeddings shared among
datasets, enabling multi-task learning.
- Optimized a pretrain-finetune training scheme to increase accuracy by over 10%; wrote the
paper, under review in 2024.
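One common way to impose an ontology constraint of this kind is to mask a linear layer so that each gene unit only receives input from the sites mapped to it, which makes the surviving weights directly inspectable. The sketch below illustrates that general idea only. The source does not state how MIRACLE implements its constraint, and the site-to-gene mapping, weights, and inputs are hypothetical.

```python
def masked_linear(x, weights, mask):
    """Gene scores: each gene only sees the sites that the mask assigns to it."""
    return [
        sum(w * m * xi for w, m, xi in zip(w_row, m_row, x))
        for w_row, m_row in zip(weights, mask)
    ]

# Hypothetical ontology: sites 0 and 1 belong to gene 0, sites 2 and 3 to gene 1.
site_to_gene = {0: 0, 1: 0, 2: 1, 3: 1}
n_genes = 2
n_sites = len(site_to_gene)

# mask[g][s] is 1.0 only when site s is annotated to gene g.
mask = [
    [1.0 if site_to_gene[s] == g else 0.0 for s in range(n_sites)]
    for g in range(n_genes)
]

# Hypothetical dense weights; entries outside the mask never affect the output.
weights = [
    [0.5, -0.2, 0.9, 0.9],
    [0.3, 0.3, 1.0, -1.0],
]
x = [0.8, 0.1, 0.4, 0.6]  # hypothetical methylation values for four sites

gene_scores = masked_linear(x, weights, mask)
```

Because off-ontology connections are zeroed out, a large weight can be attributed to one specific site-gene pair, which is what makes checking weights meaningful.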
Balancing Information Preservation and Computational Efficiency: L2 Normalization and Geodesic
Distance in Manifold Learning
(Accepted to ACM BCB 2024, Oral (36 oral / 204 accepted)) Ziqi Rong, Jinpu Cai, Jiahao Qiu, Pengcheng
Xu, Lana Garmire, Qiuyu Lian, Hongyi Xin. 2024. Preprint Link:
https://explcre.github.io/files/7472_balancing_information_preserva.pdf
- The importance of distinguishable information in similarity measurement for unsupervised
learning, manifold learning, and high-dimensional data visualization tasks.
- The limitations of conventional metrics like Euclidean distance after L1-normalization in
handling high-dimensional data due to the "curse of dimensionality".
- The influence of normalization with different p-norms and the shortcomings of Euclidean distance.
- The preservation of observation differences when normalizing data to a higher p-norm and
using geodesic distance instead of Euclidean distance.
- The sufficiency of L2-normalization onto the hypersphere in preserving delicate differences
in relatively high-dimensional data while maintaining computational efficiency.
- The presentation of HS-SNE, an augmentation to t-SNE based on a hypersphere representation
system, which effectively addresses high-dimensional data visualization and similarity
measurement intricacies.
- The better resolution of the hypersphere representation system in identifying subtle
differences in high-dimensional data while balancing efficiency and computational
feasibility.
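The core quantities in the points above can be sketched in a few lines: project each observation onto the unit hypersphere with L2 normalization, then measure similarity by geodesic (great-circle) distance, which for unit vectors is the angle between them. This is a minimal illustration of those two operations only, not the HS-SNE implementation, and the sample vectors are made up.

```python
import math

def l2_normalize(x):
    """Scale a vector to unit L2 norm, i.e. project it onto the unit hypersphere."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm == 0:
        raise ValueError("cannot normalize the zero vector")
    return [v / norm for v in x]

def euclidean(u, v):
    """Straight-line (chord) distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def geodesic(u, v):
    """Great-circle distance between two unit vectors: the angle between them."""
    dot = sum(a * b for a, b in zip(u, v))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    return math.acos(max(-1.0, min(1.0, dot)))

# Made-up observations, normalized onto the unit sphere.
a = l2_normalize([3.0, 4.0, 0.0])
b = l2_normalize([0.0, 0.0, 2.0])

chord = euclidean(a, b)  # distance cutting through the sphere
arc = geodesic(a, b)     # distance measured along the sphere's surface
```

On the unit sphere the two distances are related by chord = 2 sin(arc / 2), so the chord compresses large angles while the geodesic distance grows linearly with the angle.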
Pipe-Déjàvu: Hardware-aware Latency Predictable, Differentiable Search for Faster Config and
Convergence of Distributed ML Pipeline Parallelism
(To be submitted) Pengcheng Xu, Kaiyang Chen, Yuanrui Zhang, Indranil Gupta. 2023.
Hardware-aware Latency Predictable, Differentiable Search for Faster Config and Convergence of
Distributed ML Pipeline Parallelism
Advisor: Indranil Gupta, Professor of CS, UIUC | Advanced Distributed Systems | Researcher |
Feb 2023 - May 2023
- Implemented a predictive model that considers communication cost, model computational cost,
and hardware information to predict latency and resources of parallel configurations, saving
time on pre-profiling before searching the parallel configuration.
- Proposed a differentiable parallel configuration search space inspired by DARTS, which can
potentially reach the optimal configuration faster than the original dynamic programming.
- Employed parallel random initialization using sampling algorithms like Bayesian Optimization
for faster training-loss convergence.
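The DARTS-inspired idea can be sketched as a continuous relaxation: instead of choosing one parallel configuration discretely, keep a learnable score per candidate, mix the candidates with a softmax, and run gradient descent on the expected latency. The sketch below is a toy under stated assumptions, not the project's search procedure. The candidate latencies stand in for the output of a latency predictor, and `search`, `steps`, and `lr` are hypothetical.

```python
import math

def softmax(alpha):
    """Numerically stable softmax over the configuration scores."""
    m = max(alpha)
    exps = [math.exp(a - m) for a in alpha]
    total = sum(exps)
    return [e / total for e in exps]

def expected_latency(alpha, latencies):
    """Relaxed objective: latency averaged under the softmax mixture."""
    return sum(p * l for p, l in zip(softmax(alpha), latencies))

def search(latencies, steps=200, lr=0.5):
    """Gradient descent on the scores alpha to minimize expected latency."""
    alpha = [0.0] * len(latencies)
    for _ in range(steps):
        probs = softmax(alpha)
        mean = sum(p * l for p, l in zip(probs, latencies))
        # d E[latency] / d alpha_j = p_j * (l_j - E[latency])
        grad = [p * (l - mean) for p, l in zip(probs, latencies)]
        alpha = [a - lr * g for a, g in zip(alpha, grad)]
    return alpha

# Hypothetical predicted latencies (ms) for four candidate parallel configurations.
latencies = [12.0, 9.5, 15.0, 10.2]
alpha = search(latencies)

# Discretize at the end by taking the highest-scoring configuration.
best = max(range(len(alpha)), key=lambda i: alpha[i])
```

With a fixed latency table this simply recovers the argmin. The relaxation becomes useful when the cost is a differentiable function of many interacting choices, where enumerating every combination is expensive.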
Vascular Intervention Training System Based on Electromagnetic Tracking Technology
Zhikai Yang, Pengcheng Xu, Dekun Yang, Yufeng Chen, Yancong Ma. ICVRV, 2020. ieeexplore.ieee.org/document/9479727
Advisor: Lixu Gu, Professor of Biomedical Engineering, Shanghai Jiao Tong University
- Developed the framework of an augmented reality surgery training assistant system for
medical students and surgeons.
- Predicted the operation trajectory using LSTM and used KD-Tree to calculate the distance for
operation safety warning.
- Displayed the vascular model in AR with OpenGL and designed the UI to support
translation.
- Used the ArUco library in OpenCV for coordinate positioning of the QR code.
- Published Vascular Intervention Training System Based on Electromagnetic Tracking Technology
at ICVRV as second author.
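The KD-Tree distance check used for the safety warning can be sketched as follows: index sampled points of the vessel wall once, then query the nearest distance from the tracked instrument tip and compare it with a threshold. This is a minimal pure-Python illustration, not the system's code. The wall points, tip positions, threshold, and function names are hypothetical.

```python
import math

def build_kdtree(points, depth=0):
    """Recursively split the points on alternating axes at the median."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }

def nearest_distance(node, target, best=math.inf):
    """Distance from target to its nearest neighbor stored in the tree."""
    if node is None:
        return best
    best = min(best, math.dist(node["point"], target))
    diff = target[node["axis"]] - node["point"][node["axis"]]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest_distance(near, target, best)
    # Visit the far side only if the splitting plane is closer than the best so far.
    if abs(diff) < best:
        best = nearest_distance(far, target, best)
    return best

def too_close(tree, tip, threshold):
    """Raise a warning when the tip is within threshold of any wall point."""
    return nearest_distance(tree, tip) < threshold

# Hypothetical vessel-wall samples and instrument tip position.
wall = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (1.0, 1.0, 1.0)]
tree = build_kdtree(wall)
tip = (0.9, 0.1, 0.0)
warning = too_close(tree, tip, threshold=0.2)
```

Building the tree once and pruning subtrees during each query avoids scanning every wall point per frame, which matters when the tip position updates in real time.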