Molecular de-novo design through Transformer-based Reinforcement Learning
					(Accepted to KDD-AIDSH workshop 2024, Oral) Pengcheng Xu, Tianfan Fu, Wenhao Gao, Jimeng Sun.
						2024. Preprint Link: http://arxiv.org/abs/2310.05365
					
					Advisors: Tianfan Fu, Jimeng Sun
					
						- Implemented a decision transformer architecture that improved AUC by 5% on average
							across more than fifteen molecular optimization tasks.
 
						- Applied oracle-feedback reinforcement learning to the downstream tasks, reaching higher
							performance than the pretrained model.
 
						- Carried out an ablation study and analyzed the loss curve and the model's conditional
							probability of the next token given the previously chosen tokens.
 
					
					
				
				
					
					Adapting Large Language Models for Biomedical Lay Summarization
					(Accepted to ACL-BioNLP workshop 2024) Jieli Zhou*, Cheng Ye*, Pengcheng Xu, Hongyi Xin. 2024.
						Paper Link: https://aclanthology.org/2024.bionlp-1.76.pdf
					
					Adapting Large Language Models for Biomedical Lay Summarization | Research Assistant
						Apr 2024 - June 2024
						Advisors: Jieli Zhou, PhD, UM-SJTU Joint Institute, Shanghai Jiao Tong University;
						Hongyi Xin, Associate Professor, UM-SJTU Joint Institute, Shanghai Jiao Tong University
					
						- Fine-tuned Llama 3 using Low-Rank Adaptation (LoRA) and optimized model performance,
							achieving a 68.8% improvement in the LENS readability score and first place in
							readability at the 2024 BioLaySumm workshop.
 
						- Implemented K-shot prompting based on semantic similarity, enhancing factuality scores by
							15.0% (AlignScore) and 7.9% (SummaC) through the integration of contextually relevant
							examples, leading to more accurate and relevant summaries.
 
						- Developed techniques to resolve repeated-word issues after fine-tuning, resulting in a 12%
							increase in coherence and conciseness of the generated summaries.
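The similarity-based K-shot selection mentioned in the list above can be sketched roughly as follows. This is a minimal illustration, not the code used in the paper: it assumes each candidate example already has an embedding vector from some sentence encoder (not shown), and `cosine_similarity`, `select_k_shots`, `build_prompt`, and the prompt layout are hypothetical names and choices introduced here.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_k_shots(
    query_vec: Sequence[float],
    pool: List[Tuple[Sequence[float], str]],
    k: int,
) -> List[str]:
    """Return the k example texts whose embeddings are most similar to the query."""
    ranked = sorted(
        pool,
        key=lambda item: cosine_similarity(query_vec, item[0]),
        reverse=True,
    )
    return [text for _, text in ranked[:k]]


def build_prompt(shots: List[str], article: str) -> str:
    """Assemble a few-shot prompt (hypothetical layout): examples first, then the article."""
    parts = [f"Example:\n{shot}" for shot in shots]
    parts.append(f"Article:\n{article}\nLay summary:")
    return "\n\n".join(parts)
```

In use, `pool` would hold (embedding, article-plus-summary) pairs from the training set, and the query embedding would come from the article to be summarized.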
						
 
					
					
				
				
					
					MIRACLE: Multi-task learning based Interpretable Regulation of Autoimmune diseases through
						Common Latent Epigenetics
					(Under Review) Pengcheng Xu*, Jinpu Cai*, Yulin Gao, Ziqi Rong, Hongyi Xin. 2023. Preprint Link:
						arxiv.org/abs/2306.13866
					
					Multi-task learning based interpretable gene-level methylation estimations | Research Assistant
						Sep 2021 - Present
						Advisor: Hongyi Xin, Associate Professor of UM-SJTU Joint Institute, Shanghai Jiao Tong
						University
					
						- Explored an adaptable and interpretable neural network to find common genotypes given
							480k-dimensional sites and hundreds of samples.
 
						- Designed an explainable site-gene-pathway ontology constraint on the neural network to
							discover new biomarkers by inspecting the learned weights.
 
						- Implemented a Variational Auto-Encoder to support gene-level embeddings shared among
							datasets, enabling multi-task learning.
 
						- Optimized a pretrain-finetune training scheme to increase accuracy by over 10%; wrote the
							paper, under review in 2024.
 
					
					
				
				
					
					Balancing Information Preservation and Computational Efficiency: L2 Normalization and Geodesic
						Distance in Manifold Learning
					(Accepted to ACM BCB 2024, Best Paper Award, Oral (36 orals / 204 accepted)) Ziqi Rong, Jinpu Cai,
						Jiahao Qiu, Pengcheng Xu, Lana Garmire, Qiuyu Lian, Hongyi Xin. 2024. Preprint Link:
						https://explcre.github.io/files/7472_balancing_information_preserva.pdf
						
					
					
						- The importance of distinguishable information in similarity measurement for unsupervised
							learning, manifold learning, and high-dimensional data visualization tasks.
 
						- The limitations of conventional metrics like Euclidean distance after L1-normalization in
							handling high-dimensional data due to the "curse of dimensionality".
 
						- The influence of normalization with different p-norms and the shortcomings of Euclidean distance.
						
 
						- The preservation of observation differences when normalizing data to a higher p-norm and
							using geodesic distance instead of Euclidean distance.
 
						- The sufficiency of L2-normalization onto the hypersphere in preserving delicate differences
							in relatively high-dimensional data while maintaining computational efficiency.
 
						- The presentation of HS-SNE, an augmentation to t-SNE based on a hypersphere representation
							system, which effectively addresses high-dimensional data visualization and similarity
							measurement intricacies.
 
						- The better resolution of the hypersphere representation system in identifying subtle
							differences in high-dimensional data while balancing efficiency and computational
							feasibility.
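As a rough illustration of the L2-normalization and geodesic-distance idea in the list above, the sketch below projects vectors onto the unit hypersphere and measures the arc length between them. It is not the paper's HS-SNE implementation; the function names are hypothetical and only the Python standard library is assumed.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 norm, i.e. project it onto the unit hypersphere."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize the zero vector")
    return [x / norm for x in v]


def geodesic_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Arc length (in radians) between u and v after L2 normalization."""
    a, b = l2_normalize(u), l2_normalize(v)
    cos_theta = sum(x * y for x, y in zip(a, b))
    # Clamp to guard against floating-point drift outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.acos(cos_theta)


def chord_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean (straight-line) distance between the normalized vectors."""
    a, b = l2_normalize(u), l2_normalize(v)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

On the unit hypersphere the two measures are related by chord = 2 sin(theta / 2), so the chord compresses large angles while the geodesic distance grows linearly with the angle.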
 
					
				
				
					
					Pipe-Déjàvu: Hardware-aware Latency Predictable, Differentiable Search for Faster Config and
						Convergence of Distributed ML Pipeline Parallelism
					(To be submitted) Pengcheng Xu, Kaiyang Chen, Yuanrui Zhang, Indranil Gupta. 2023. Read the research report
					Hardware-aware Latency Predictable, Differentiable Search for Faster Config and Convergence of
						Distributed ML Pipeline Parallelism
						Advisor: Indranil Gupta, Professor of CS, UIUC | Advanced Distributed Systems | Researcher |
						Feb 2023 – May 2023
					
						- Implemented a predictive model that considers communication cost, model computational cost,
							and hardware information to predict latency and resources of parallel configurations, saving
							time on pre-profiling before searching the parallel configuration.
 
						- Proposed a differentiable parallel configuration search space inspired by DARTS, which can
							potentially reach the optimal configuration faster than the original dynamic programming.
 
						- Employed parallel random initialization using sampling algorithms such as Bayesian
							optimization for faster training loss convergence.
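The DARTS-inspired relaxation mentioned in the list above can be illustrated with a toy sketch: a discrete choice among candidate configurations is replaced by softmax weights, and the expected predicted latency is minimized by gradient descent. This is a simplified assumption-laden example, not the project's implementation; the fixed latency numbers stand in for the output of a latency predictor, and `relaxed_search` and its parameters are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def relaxed_search(
    latencies: Sequence[float], steps: int = 200, lr: float = 0.5
) -> Tuple[int, List[float]]:
    """Minimize the expected latency sum_i p_i * l_i over softmax weights p.

    Returns the index of the highest-weight candidate and the final weights.
    """
    alphas = [0.0] * len(latencies)  # one relaxation parameter per candidate
    for _ in range(steps):
        probs = softmax(alphas)
        expected = sum(p * l for p, l in zip(probs, latencies))
        # Analytic gradient of the expected latency w.r.t. each alpha_j:
        # d/d alpha_j = p_j * (l_j - expected)
        grads = [p * (l - expected) for p, l in zip(probs, latencies)]
        alphas = [a - lr * g for a, g in zip(alphas, grads)]
    probs = softmax(alphas)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs
```

With fixed latencies the result is simply the fastest candidate; the relaxation becomes useful when the objective depends on several interacting choices that cannot be enumerated cheaply.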
 
						
					
				
				
					
					Vascular Intervention Training System Based on Electromagnetic Tracking Technology
					Zhikai Yang, Pengcheng Xu, Dekun Yang, Yufeng Chen, Yancong Ma. ICVRV, 2020. ieeexplore.ieee.org/document/9479727
					
					Advisor: Lixu Gu, Professor of Biomedical Engineering, Shanghai Jiao Tong University
					
						- Developed the framework of an augmented reality surgical training assistant system for
							medical students and surgeons.
 
						- Predicted the operation trajectory using LSTM and used KD-Tree to calculate the distance for
							operation safety warning.
 
						- Displayed the vascular model in AR with OpenGL and designed the UI to support
							translation.
 
						- Used the ArUco library in OpenCV for coordinate positioning of the QR code.
 
						- Published "Vascular Intervention Training System Based on Electromagnetic Tracking
							Technology" at ICVRV as second author.
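The KD-tree distance check for safety warnings mentioned in the list above might look roughly like the sketch below. It is an illustrative pure-Python version, not the system's code: the vessel wall is assumed to be available as a list of 3D points, and `build_kdtree`, `nearest_distance`, `is_unsafe`, and the threshold are hypothetical.

```python
import math
from typing import Optional, Sequence


def build_kdtree(points: Sequence[Sequence[float]], depth: int = 0) -> Optional[dict]:
    """Recursively build a KD-tree, splitting on axes in round-robin order."""
    if not points:
        return None
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return {
        "point": pts[mid],
        "axis": axis,
        "left": build_kdtree(pts[:mid], depth + 1),
        "right": build_kdtree(pts[mid + 1:], depth + 1),
    }


def nearest_distance(node: Optional[dict], target: Sequence[float],
                     best: float = math.inf) -> float:
    """Distance from target to its nearest neighbor stored in the tree."""
    if node is None:
        return best
    best = min(best, math.dist(node["point"], target))
    diff = target[node["axis"]] - node["point"][node["axis"]]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest_distance(near, target, best)
    # Search the far side only if the splitting plane is closer than the best so far.
    if abs(diff) < best:
        best = nearest_distance(far, target, best)
    return best


def is_unsafe(tree: Optional[dict], tip: Sequence[float], threshold: float) -> bool:
    """Warn when the instrument tip is closer to the wall than the threshold."""
    return nearest_distance(tree, tip) < threshold
```

In a setting like the one described, the tip position passed to `is_unsafe` could be either the tracked position or a predicted future position along the trajectory.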