About
I am a PhD student in computer engineering in the Responsible Data Science Lab (ReDS Lab) at Virginia Polytechnic Institute and State University (Virginia Tech).
I am fortunate to be advised by Prof. Ruoxi Jia.
I completed my bachelor's degrees in mathematics and computer science at Gettysburg College,
where I had the pleasure of working with Prof. Béla Bajnok and Prof. Todd Neller, respectively.
I enjoy working on data-centric AI, especially on measuring the importance of each data point used to train a model.
Focus Questions:
- [Data Valuation] How much should the data cost?
- [Data Selection] How to choose the best data to meet the model owner's expectations?
- [Model Prediction] How to predict the model performance given training data?
- [AI Privacy] How to protect your data used to train a model?
- [Data Leakage] How to extract data used to train a model?
Research
Feiyang Kang, Hoang Anh Just, Yifan Sun, Himanshu Jahagirdar, Yuanzhi Zhang, Rongxing Du, Anit Sahu, Ruoxi Jia
- Developed a scalable data selection method to pre-fine-tune a pretrained large language model (LLM) by selecting (unlabeled) data that can shift the source distribution to better align with the target distribution.
Feiyang Kang*, Hoang Anh Just*, Anit Sahu, Ruoxi Jia
- Proposed a performance estimator for a model trained on any data composition given only sample information and a scaling law to predict performance on larger scales, which effectively finds the optimal composition of data sources for any target data size.
Yi Zeng*, Minzhou Pan*, Hoang Anh Just, Lingjuan Lyu, Meikang Qiu and Ruoxi Jia
- Launched an efficient (poisoning 0.5% of the target class and 0.05% of the entire training dataset) and stealthy (hard to detect) backdoor attack that requires only knowledge of the target class to deploy.
Myeongseob Ko, Xinyu Yang, Zhengjie Ji, Hoang Anh Just, Peng Gao, Ruoxi Jia
- Established an efficient real-time detection system for membership inference attacks, which prevents attackers from inferring sensitive data used for model training.
Liu Zhihong*, Hoang Anh Just*, Xiangyu Chang, Xi Chen, Ruoxi Jia
- Proposed a novel, efficient approach to fine-grained data analysis, which valuates the quality of each feature of each data point with theoretical grounding.
Hoang Anh Just*, Feiyang Kang*, Tianhao Wang, Yi Zeng, Myeongseob Ko, Ming Jin and Ruoxi Jia
- Introduced an efficient data quality valuation method through adopting a modified class-wise Wasserstein distance, which is robust to noisy, mislabeled, and poisoned data without requiring any model training.
Yingyan Zeng, Tianhao Wang, Si Chen, Hoang Anh Just, Ran Jin, Ruoxi Jia
- Developed a set-function-based neural network that can predict model weights from a training dataset of any size. This method enables efficient applications in data valuation, data selection, and data memorization, all of which require multiple model re-trainings.
Mostafa Kahla, Si Chen, Hoang Anh Just, Ruoxi Jia
- Designed a novel practical model inversion attack which recovers sensitive data by accessing only labels of the model output without additional information.
Béla Bajnok, Connor Berson and Hoang Anh Just
- Proved that for sets of size greater than 3, there is no perfect restricted 2-basis in Z_n. Showed, by contradiction and using the fact that Z_n is closed under both addition and subtraction, that a perfect restricted 2-basis in Z_n exists only for sets of size at most 3.
Peter Francis*, Hoang Anh Just*, Todd Neller
- We describe various approaches to opponent hand estimation in the card game Gin Rummy. We use an application of Bayes' rule, as well as both simple and convolutional neural networks, to recognize patterns in simulated game play and predict the opponent's hand. We also present a new minimal-sized construction for using arrays to pre-populate hand representation images.
Teaching
The Bradley Department of Electrical and Computer Engineering, Virginia Tech
Graduate Teaching Assistant
Artificial Intelligence and Engineering Applications
Fall 2021 - Fall 2022
Mathematics Department, Gettysburg College
Peer Learning Associate
Abstract Mathematics
Fall 2018 - Spring 2021
Computer Science Department, Gettysburg College
Teaching Assistant and Grader
Computer Science I and II
Fall 2018 - Spring 2021
Volunteer
Reviewer
CVPR 2023
NeurIPS 2023
ICLR 2023
Education
Virginia Polytechnic Institute and State University (Virginia Tech)
Degree: PhD Student in Computer Engineering
Relevant Coursework:
- Natural Language Processing (NLP)
- Advanced Machine Learning (ML)
- Convex Optimization
Gettysburg College
Degrees: BA in Mathematics, BS in Computer Science
Relevant Coursework:
- Data Structures and Algorithms
- Combinatorics
- Abstract Algebra
- Artificial Intelligence (AI)
Contact
2024 @ Hoàng Anh Just | Thank you!