Portfolio
This portfolio consists of:
- My timeline-based AI/ML/DL projects
- My blogs/presentations/contributions
- My publications
- Interesting Work
- Interview prep
- What I have learnt
The latest version is available here.
Projects
2018
- Intro to Machine Learning final project: Titanic Survival Prediction
- Trained and validated multiple ML algorithms using Scikit-learn (SVM, Random Forest, Decision Tree, Logistic Regression) on the Titanic dataset to predict passenger survival; a minimal sketch of this setup follows the 2018 list.
- TCReepy
- Implemented K-Nearest-Neighbors to learn informative amino-acid positions and distinguish two types of T cell receptor hypervariable CDR3 sequences.
- Achieved: Best Desk to Best Bedside Award at the 2018 Med U-Hack at UT Southwestern
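A minimal sketch of the model comparison used in the Titanic project, assuming preprocessed numeric features; a synthetic dataset stands in for the real Titanic features and all hyper-parameters are illustrative:

```python
# Minimal sketch: train and cross-validate several Scikit-learn classifiers.
# A synthetic dataset stands in for the preprocessed Titanic features.
from sklearn.datasets import make_classification
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=800, n_features=8, random_state=0)

models = {
    "SVM": SVC(),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

for name, model in models.items():
    # 5-fold cross-validation to compare the candidates on held-out folds
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    print(f"{name}: mean accuracy {scores.mean():.3f}")
```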
2019
- Neural Engineering research projects (2018 - 2019) advised by Prof. Tan Chin-Tuan at the Auditory Perception Engineering Laboratory, UT Dallas
- Chest X-Ray Abnormal Detection
- Applied multiple Transfer Learning models and hyper-parameter tuning to detect abnormalities in chest x-rays with 88% accuracy.
- Achieved: 1st Prize at the HealthCare AI 2019 Hackathon at the University of Texas at Dallas
2020
- MoCV
- An open-source Python package implementing Computer Vision and Image Processing algorithms.
- Senior Co-op Project: Interaction Tuning Tool
- Led a team of 6 engineers to build and deploy a no-code Intent Extraction system that reduces manual intent-labeling work (no coding or domain knowledge required) in chatbot data preparation.
- Contribution: utilized StanfordNLP and Tensorflow to develop a Deep Learning model (LSTM-Attention + MLP) that extracts intents from raw utterances (75% accuracy in development and 20% in deployment). Unlike Google Dialogflow, which relies on a fixed intent list, our system forms VERB-NOUN intents, so it is not limited to particular industry domains.
- Named Entity Recognizer
- Implemented BiLSTM-CRF for Named Entity Recognition, built the data pipeline in Tensorflow, and deployed it with Flask
- Intent Classifier
- Trained a Support Vector Machine (SVM) and Gradient Boosting on text features extracted with TF-IDF for Intent-Classification tasks (accuracy: 97% in training, 80% in validation); see the sketch after the 2020 list.
- Borot
- Built a Question & Answering chatbot with Flask, Scikit-learn, Tensorflow, and SQL.
- Implemented Information Retrieval with the Intent Classifier (SVM), the Named-Entity Recognizer (BiLSTM-CRF), and TF-IDF to retrieve answers to questions. Applied OOP design to collect users’ QA queries for personalization.
- Mask-RCNN - Implementation of Mask-RCNN in Tensorflow >= 2.0.0
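As referenced in the Intent Classifier entry above, a minimal sketch of a TF-IDF + SVM intent classifier as a Scikit-learn Pipeline; the utterances, intent labels, and hyper-parameters are toy examples, not the project's data:

```python
# Minimal sketch: TF-IDF text features feeding a linear SVM for intent classification.
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC

# Toy utterances and intent labels (illustrative only)
utterances = [
    "book a flight to dallas",
    "cancel my reservation",
    "what is the weather today",
    "reserve a table for two",
]
intents = ["book_flight", "cancel_booking", "get_weather", "book_table"]

clf = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),  # unigram + bigram features
    ("svm", SVC(kernel="linear")),
])
clf.fit(utterances, intents)
print(clf.predict(["please book a flight"]))
```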
2021
- Emorecom, ICDAR2021 Competition – Multimodal Emotion Recognition on Comic scenes
- Developed a multimodal Deep Learning model composed of CNNs (ResNet, FaceNet) for visual features and RNN/BERT for textual features to detect emotions in comic scenes. Ranked 13th.
- Utilized Tensorflow Data/string/image APIs and OpenCV to build the image/text augmentation pipeline and the TFRecord data pipeline.
- h-at (Information Extraction)
- Developed a Python program to extract template-based information by designing rules and utilizing Dependency Parsing, Named Entities, and POS tags.
- Find-unicorns (search engine)
- Developed a search engine by crawling & scraping >100k links and implementing ranking algorithms (TF-IDF, HITS, PageRank); a PageRank sketch follows the 2021 list.
- Attention2Log
- An experimental study of Transformers for Log-Key-based Anomaly Detection
- FRL
- Neural Collaborative Filtering + GraphNN for Interactive Recommendation System
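As referenced in the Find-unicorns entry, a minimal PageRank sketch via power iteration over a toy link graph; the graph, damping factor, and iteration count are illustrative, not the project's configuration:

```python
# Minimal sketch of PageRank by power iteration on a toy link graph.
import numpy as np

links = {0: [1, 2], 1: [2], 2: [0], 3: [2]}  # page -> outgoing links (toy graph)
n, d = len(links), 0.85                      # number of pages, damping factor

# Column-stochastic transition matrix: M[j, i] = 1/outdegree(i) if i links to j
M = np.zeros((n, n))
for i, outs in links.items():
    for j in outs:
        M[j, i] = 1.0 / len(outs)

rank = np.full(n, 1.0 / n)                   # uniform initial ranks
for _ in range(50):                          # power iteration until (roughly) converged
    rank = (1 - d) / n + d * M @ rank

print(rank / rank.sum())                     # normalized PageRank scores
```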
Blogs and Workshops
Machine Learning with Google Cloud Platform (GCP)
Install Tensorflow on AMD GPUs
Entry-ML Workshops @DSC-UTD - workshops on fundamental machine learning algorithms
Workshop on Multimodal Emotion Recognition @AIS
Publications
- Neural Entrainment to Speech Envelope in response to Perceived Sound Quality
- Authors: Dat Quoc Ngo, Garret Oliver, Gleb Tcheslavski, Chin-Tuan Tan.
- Affiliation: Undergraduate Research Assistant at Auditory Perception Engineering Laboratory, UT Dallas
- Status: Accepted to IEEE Neural Engineering Conference 2019
- Linear and Nonlinear Reconstruction of Speech Envelope from EEG
- Authors: Dat Quoc Ngo, Garret Oliver, Gleb Tcheslavski, Fei Chen, Chin-Tuan Tan.
- Affiliation: Undergraduate Research Assistant at Auditory Perception Engineering Laboratory, UT Dallas
- Status: Preprint
- Depression Detection: Text Augmentation for Robustness to Label Noise in Self-reports
- Authors: Dat Quoc Ngo, Aninda Bhattacharjee, Tannistha Maiti, Tarry Singh, Jie Mei
- Affiliation: Visiting AI Researcher at deepkapha.ai
- Status: in submission
Interesting Work
Computer Vision
- You Only Look Once
- A unified Object Detection model consisting of DarkNet (a deep CNN) with Non-Max-Suppression at the output layer to merge overlapping bounding boxes into the final detections (a sketch of NMS follows this list). Multiple losses (objectness, class, and position) are combined.
- Mask RCNN
- An instance segmentation model composed of a backbone (ResNet or Feature Pyramid Network), a Region Proposal Network (a simple CNN), and an ROIAlign layer (with Non-Max-Suppression), where the latter two share hidden features from the backbone.
- Prone to overfitting since it uses a Fully Connected layer for prediction.
- Single Image Super Resolution based on a Modified U-NET with Mixed Gradient Loss
- Introduces loss functions (MGE and MixGE) for Single-Image Super-Resolution with U-NET.
- MSE, a common loss function, is limited to errors in pixel values and misses errors along object contours (aka gradient error)
- To address the gradient error introduced by Super-Resolution, Mean-Gradient-Error (MGE) applies the Sobel operator to sharpen object contours in the predicted and true images and then takes the squared difference of the resulting gradient maps; a sketch follows this list
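As referenced in the YOLO entry above, a minimal Non-Max-Suppression sketch; box coordinates, scores, and the IoU threshold are toy values:

```python
# Minimal sketch of Non-Max-Suppression: keep the highest-scoring box and drop
# boxes that overlap it above an IoU threshold, then repeat on the remainder.
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, format [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_box = (box[2] - box[0]) * (box[3] - box[1])
    area_boxes = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_box + area_boxes - inter)

def nms(boxes, scores, iou_thresh=0.5):
    order = np.argsort(scores)[::-1]         # indices sorted by descending score
    keep = []
    while order.size > 0:
        best, rest = order[0], order[1:]
        keep.append(int(best))
        order = rest[iou(boxes[best], boxes[rest]) < iou_thresh]
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 10, 10], [20, 20, 30, 30]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))  # -> [0, 2]: the second box is suppressed by the first
```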
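And a minimal sketch of the Mean-Gradient-Error idea from the Super-Resolution entry, using Tensorflow's Sobel filter; the MixGE weighting factor here is illustrative and may differ from the paper:

```python
# Minimal sketch: squared difference of Sobel gradient maps as a gradient-error
# term, mixed with MSE (the alpha weighting is illustrative).
import tensorflow as tf

def mean_gradient_error(y_true, y_pred):
    # tf.image.sobel_edges expects float images of shape [batch, height, width, channels]
    grad_true = tf.image.sobel_edges(y_true)
    grad_pred = tf.image.sobel_edges(y_pred)
    return tf.reduce_mean(tf.square(grad_true - grad_pred))

def mixed_gradient_loss(y_true, y_pred, alpha=0.1):
    mse = tf.reduce_mean(tf.square(y_true - y_pred))   # pixel-value error
    return mse + alpha * mean_gradient_error(y_true, y_pred)

y_true = tf.random.uniform((1, 64, 64, 1))
y_pred = tf.random.uniform((1, 64, 64, 1))
print(mixed_gradient_loss(y_true, y_pred))
```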
Natural Language Processing
- Bidirectional LSTM-CRF Models for Sequence Tagging
- Implementation of BiLSTM-CRF for Sequence Tagging tasks.
- Attention is all you need
- Attention-based seq2seq model for state-of-the-art Neural Machine Translation
- BERT: Pretraining of Deep Bidirectional Transformers for Language Understanding
- A deep Transformer-based Encoder pretrained on Wikipedia and BookCorpus that allows easy fine-tuning for downstream tasks.
- Makes text-based Transfer Learning feasible
- Pros: effectively extracts context
- Cons: because of its unsupervised pretraining tasks (i.e. random input masks; see the masking sketch after this list) and because attention weights are computed for each token independently across the text, BERT ignores the sequential (autoregressive) dependency between tokens. Hence, BERT is not well suited for several text-generation tasks such as Machine Translation.
- XLNet: Generalized Autoregressive Pretraining for Language Understanding
- XLNet (an autoregressive language model) implements Permutation Language Modeling, which samples permutations of token positions; each token then depends only on the tokens that precede it in the sampled order, preserving autoregressive language modeling (remember RNN and LSTM?). A toy permutation-mask sketch follows this list.
- Conditional BERT Contextual Augmentation
- Proposed Conditional Masked Language Modeling (Conditional MLM) and a novel text data augmentation for labeled sentences to train Conditional BERT.
- Idea: replace the Segment Embedding with vocabulary-indexed labels during pretraining.
- Result: Conditional BERT works well for contextual data augmentation and improves BERT on several NLP tasks. It can also be used for text style transfer (replacing words without changing the context).
- DistilBERT, a distilled version of BERT: smaller, faster, cheaper, and lighter
- A smaller version of BERT with comparable performance but fewer layers (6 in DistilBERT vs. 12 in BERT)
- More practical for real-time inference
- COMET: Commonsense Transformers for Automatic Knowledge Graph Construction
- A Transformer-based model to generate/complete knowledge graphs.
- Input encoding:
- Consists of a 3-element tuple {subject, relation, object}
- Unlike BERT, which uses 3 embeddings, COMET uses only 2: token and position embeddings
- The architecture is similar to BERT, except for the Transformer's output layer
- Unsupervised Commonsense Question Answering with Self-Talk
- To be added
- Visual Question Answering To-read-list
- Collection of to-read publications regarding Visual Question Answering
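As referenced in the BERT entry above, a minimal sketch of the random input masking used in masked-language-model pretraining: roughly 15% of tokens are selected, of which 80% become [MASK], 10% a random token, and 10% stay unchanged. Token IDs and vocabulary size are toy values:

```python
# Minimal sketch of BERT-style masked-language-model input corruption.
import random

MASK_ID, VOCAB_SIZE = 103, 30000   # toy values standing in for a real tokenizer

def mask_tokens(token_ids, mask_prob=0.15):
    inputs, labels = list(token_ids), [-100] * len(token_ids)  # -100 = ignored in the loss
    for i, tok in enumerate(token_ids):
        if random.random() < mask_prob:
            labels[i] = tok                                # predict the original token here
            r = random.random()
            if r < 0.8:
                inputs[i] = MASK_ID                        # 80%: replace with [MASK]
            elif r < 0.9:
                inputs[i] = random.randrange(VOCAB_SIZE)   # 10%: replace with a random token
            # remaining 10%: keep the original token
    return inputs, labels

print(mask_tokens([2023, 2003, 1037, 7953, 6251]))
```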
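And, as referenced in the XLNet entry, a toy sketch of a permutation-based attention mask; this only illustrates the factorization-order constraint (each position attends only to positions earlier in the sampled order), not XLNet's full two-stream attention:

```python
# Minimal sketch of the permutation idea behind XLNet's pretraining objective.
import numpy as np

def permutation_mask(perm):
    n = len(perm)
    rank = {pos: i for i, pos in enumerate(perm)}   # position -> place in the sampled order
    mask = np.zeros((n, n), dtype=int)
    for q in range(n):                              # query position
        for k in range(n):                          # key position
            mask[q, k] = 1 if rank[k] < rank[q] else 0
    return mask

perm = [3, 0, 4, 1, 2]          # sampled factorization order over positions 0..4
print(permutation_mask(perm))   # mask[q, k] = 1 means position q may attend to position k
```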
Others
Label Noise
- Label Noise Types and Their Effects on Deep Learning
- Investigates the effects of different label noise types on Deep Learning
- Label noise is common in large-scale datasets or in active learning when samples are labeled by non-experts
- Exploiting Context for Robustness to Label Noise in Active Learning
- Proposed to exploit graphical context to improve CNN’s performance in classification tasks.
Knowledge Distillation
- Knowledge Distillation: A Survey
- A literature review of Knowledge Distillation techniques, organized by:
- Knowledge: response-based, feature-based, and relation-based
- Schemes: offline distillation, online distillation, and self-distillation
- Algorithms: attention-based, adversarial-based, cross-modal, multi-teacher, graph-based, data-free, quantization-based, NAS-based
- Knowledge distillation’s applications in CV, NLP, Speech Recognition
- I personally found the data-free algorithms useful when distilling huge models into smaller ones (e.g. T5-3B to T5-small) without data in hand.
- Adversarial Self-Supervised Data-Free Distillation for Text Classification
- A data-free distillation algorithm to distill BERT-large into smaller BERTs without training data in hand.
- Idea: use the BERT teacher to generate and optimize synthetic embeddings (i.e. skipping discrete text, which is hard to generate) that serve as inputs to the BERT student.
- Update: I utilized this method, with minor modifications, to distill T5-3B to T5-small for Grammar Error Correction.
- Be Your Own Teacher: Improve the Performance of Convolutional Neural Networks via Self Distillation
- A self-distillation scheme for ResNet in which each low-level module is a student network. In training, the teacher (the whole ResNet) and the students (sub-ResNet modules) are trained and optimized simultaneously with KL-divergence and cross-entropy losses; a sketch of such a loss follows this list.
- Notes: useful when distilling huge models (e.g. T5-3B, T5-11B, or GPT-#)
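As referenced in the self-distillation entry above, a minimal sketch of a distillation objective combining cross-entropy on hard labels with KL-divergence between temperature-softened teacher and student predictions; the temperature and weighting are illustrative:

```python
# Minimal sketch of a knowledge-distillation loss: hard-label cross-entropy plus
# KL-divergence between temperature-softened teacher and student distributions.
import tensorflow as tf

def distillation_loss(labels, student_logits, teacher_logits, T=2.0, alpha=0.5):
    # Cross-entropy against the ground-truth labels
    ce = tf.keras.losses.sparse_categorical_crossentropy(
        labels, student_logits, from_logits=True)
    # KL-divergence between softened teacher and student predictions
    kl = tf.keras.losses.KLDivergence()(
        tf.nn.softmax(teacher_logits / T),
        tf.nn.softmax(student_logits / T))
    # T**2 rescales the soft-target gradient magnitude (standard practice)
    return alpha * tf.reduce_mean(ce) + (1 - alpha) * (T ** 2) * kl

labels = tf.constant([1, 0])
student_logits = tf.random.normal((2, 3))
teacher_logits = tf.random.normal((2, 3))
print(distillation_loss(labels, student_logits, teacher_logits))
```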
Interview Prep
What I have learnt
Only graduate courses are listed.