MACHINE/DEEP LEARNING

Reinforcement Learning

Finding optimal path in unknown mazes

(by A* and Dynamic Programming)

Navigating in unknown real world is a key challenge in autonomous vehicle or mobile robot application. The problem is simplified as a robot navigating in an unknown maze and finding its optimal path. A motion agent for the robot has been developed and validated in the simulation. The approach is based on A* like algorithm with heuristic function and the number times of path traveled to learn the maze in the training run; and Dynamic programming method to find the fastest path in the testing run. By comparing with benchmark metrics, the test results proved that the robot was able to complete the mazes and find the fastest path within a very reasonable time.

- link to GitHub

Finding optimal path in unknown mazes (by Q-Learning)

The Micromouse problem is solved with Q-Learning method. The Q-Learning method would require much more training time to learn the maze than the A* + DP approach. However, programming is simpler, and it is able to find the optimal path with sufficient training.

- link to GitHub

Cart-Pole from OpenAI Gym (by Simple Random Gains)

To learn more about Reinforcement Learning for controlling a dynamic system, the inverted pole on the cart problem from OpenAI Gym is tried out.

Main alogrithm comes from:

http://kvfrans.com/simple-algoritms-for-solving-cartpole/

It is a linear state feedback control with discrete output, 0 or 1.

Feedback gains are selected randomly, but it works surprisingly well.

It is to be used as a benchmark for future Reforcement Learning developemnt.

- link to GitHub

- OpenAI Gym score

Deep Learning Classifier

Hand Writing Recognition (Kaggle Digit Recognizer Competition - MNIST)

Applied convolutional neural network (Keras + TensorFlow + TensorBoard) to MINST dataset. The CNN is based on a VGG like architecture for the images with 28x28x1 and achieved 99.37% accuracy with test data set, ranking at the top 9% at Kaggle Digit Recognizer competition. I believe that 99.37% accuracy would better than the level any human can do.

The Tensorboard-enabled embedding visualization is used to confirm the proper clustering of similar images (unlabeled data set).

- link to GitHub

State Farm Distracted Drivers Detection (Classification)

Built a deep learning model to detect distracted drivers from images. The data set from State Farm Kaggle competition is used for the development. The model is based on pretrained VGG model with ImageNet data and then re-trained with State Farm dataset. ~93% accuracy is achieved for 10 categories classification:

c0: normal driving

c1: text - right

c2: talking on the phone - right

c3: text - left

c4: talking on the phone - left

c5: operating the radio

c6: drinking

c7: reaching behind

c8: hair and makeup

c9: talking to passenger

Image Classification and Analysis thru Deep Learning API

This is a simple Python Flask web app to demonstrate image classification and analysis using Deep Learning. After uploading images from the local directories of your computer, the web app dectects and analyzes objects from the image, and print out key features to provide the context of the images. For example, 'sky', 'freedom', 'fun', 'people', 'flying', 'adults', 'outdoors', 'motion', ........

- link to Web App

Object Detection

Image Detection (Classification+Regression) for Pictures and Videos

Applied the state-of-the-art deep learning frameworks to detect multiple objects in the images in near real time without sacrifying accuracy. Currently achieved mAP 74% at 59 FPS with VOC 2007 dataset (compared to Faster RCNN, mAP 73% at 7 FPS; YOLO mAP 63% at 45 FPS) - Tensflow, KERAS, VGG

Transfer Learning of Image Detection

Pre-trained with imagenet and then re-train top layer only with new dataset (custom). The code demonstrates detecting and classifying fish from the image data set: https://www.kaggle.com/c/the-nature-conservancy-fisheries-monitoring/data.

Building Detection with TensorBox - Spacenet Dataset

TensorBox developed by R. Stewart is applied to detect buildings on SpaceNet dataset. It is an the end-to-end objects detection algorithm that directly predicts bounding boxes and object class confidences through all locations and scales of an image. Specifically, GoogLeNet-LSTM-Rezoom is used for the final model.

The algorithm is implemented with AWS, GPU enabled EC. It took about 7 hours for 150K iterations.

- link to GitHub

CV for Automotives

Lane and Objects Detection with OpenCV and MobileNet-SSD

Applied OpenCV to detect the lanes based on the multiple channels and deployed a lighter version of deep learning algorithm, mobilenet-SSD to detect objects from mobile devices. It was developed under Tensorflow framework.

link to the video

3D Pose Detection and Tracking from a Mono Camera

Developed 3D pose objects detection and tracking with a mono camera. It is based on DNN, Kaman filter, and Hungarian algorithm, and able to estimate the 3D position and orientation of multiple objects and track them thru a sequence of images.

link to the video

Inventory Monitoring Drone

Warehouse Inventory Monitoring Drone - Indoor Vision Based Autonomous Flight

Flying drones autonomously at the warehouses where gps signal is denied. Used two off-the-shelf stereo cameras for drone localization and navigation. To achieve +/- 10 cm accuracy, Apriltags are installed at the site mitigating occasional drifts.

The drone has two on board computers: Intel i7 Nuc for primary communication and rack analysis - OCR and bar coding reading; NVIDIA TX2 for navigation and AI work.

link to the video

Screen Shot 2020-04-18 at 11.18.10 AM.pn

Box Detection, Depth Calculation and 3D Reconstruction with Stereo Camera

TensorFlow + ROS running on NVIDIA TX2 to detect the boxes on the rack - SSD + MobileNet; Depth estimation and occupancy grid were performed thru point clouds generated by a stereo camera - Octomap.

link to the video

Stereo Cameras

Monitoring Social Distancing with Deep Learning + Stereo Camera

Social distancing violation is detected with deep learning algorithm and a stereo camera. The stereo camera generates point cloud to provide bird's eye view of the detected objects. Therefore, it provides with more reliable and accurate warnings when 6 ft distancing is violated.

link to the video

3D Pointcloud and SLAM

Visual SLAM with Intel RealSense D435 Depth Cameras

Performed SLAM (Simultaneous Localization and Mapping ) to estimate the location of products in a store and map its environment with Intel D435 and NVIDIA AGX Nano on a cart. It initialized with an Apriltag at the starting point , ran "rtabmap_ros" for SLAM algorithm and used "YoloV5" for product detection and out-of-stock identification.

link to video

3D PointCloud Construction with Multiple Stereo Cameras

Constructed 3D pointcloud of an object using four Intel Realsense cameras. Overcame technical challenges such as multiple cameras calibration and synchronization with OpenCV and Open3D .

link to video

Multiple Camera CV

Multiple Cameras and Multiple People Tracking

Tracking multiple people with multiple camera to overcome occlusion and re-entry. Applied object detection, key point tracking, re-ID to consolidate and maintain unique tracking ID per person. It ran on NVIDIA AGX Xavier with 4 GMSL cameras and NVIDIA Deepstream, and achieved ~10 frames/sec performance.

link to video

Product Recognition with Multiple Cameras

Recognized products on the checkout counter by aggregating 4 camera views and applying deep learning CV. It was based on YoloV7 segment, SORT, and a fine-grain classifier on NVIDIA AGX Xavier.

link to video

Supervised Learning

Movie Review Classification

Demonstrated movie review classification predicting positive or negative from your written review comments.

Link to Web App (temporally disconnected)

Predictiong Boston Housing Prices

Boston Housing dataset contains aggregated data on various features for houses in Greater Boston communities, including the median value of home for each of those areas. An optimal model based on a statistical analysis is developed first. Then the model is used to estimate the best selling price for a home. DecisionTreeRegressor and GridSearchCV from sklearn are used.

Some portion of code is imported from Udacity MLND assignment 2.

- link to Github

Predicting Student Intervention

Analyzed the data set on student's performance and develop a model that will predict the likehoold that a given student will pass, quantifying whether an intervention is necessary. Three classification methods are initially applied and compared: AdaBoost, Naive Bayesian and support vector machine. The AdaBoost model was picked and further fine-tuned with GridSearchCV().

Some portion of code is imported from Udacity MLND assignment 1.

- link to GitHub

Unsupervised Learning

Customer Segments

Used unsupervised learning techniques to see if any similarities exist between customers, and how to best segment customers in distinct categories. One goal of this project is to best describe the variation in the different types of customers that a wholesale distributor interacts with. Doing so would equip the distributor with insight into how to best structure their delivery service to meet the needs of each customer.

PCA and KMeans from sklearn used for data dimensioni reduction and clustering.

Some portion of code is imported from Udacity MLND assignment 3.

- link to GitHub

Mobile OCR (Optical Character Recognition) with OpenCV

Convert Currency

ConvertCurrency is an iOS app to make currency conversion simple, easy ans accurate. It helps your trip to abroad to be more enjoyable. It uses optimized Optical Character Recognition (OCR) technology to read instaneously one currency and converts to the other currency without any input. It utlizes the most recent exchange rate available from internet. It is based on swift and openCV.

- link to video

Tip Properly

TipProperly is an iOS app to make tip calculation simple, easy and fair. It provides saving opportunities to an user; and encourage good service and discourage bad service by tipping properly - eliminate reading/rounding error. It uses the optimized Optical Character Recognition (OCR) technology to read number instantaneously and calculates tip automatcally with user options. Pinch to zoom and flashlight support help reading under the poor ambient light condition. It is based on swift and openCV.

- link to video

Reinforcement Learning

Finding optimal path in unknown mazes

(by A* and Dynamic Programming)

Finding optimal path in unknown mazes (by Q-Learning)

Cart-Pole from OpenAI Gym (by Simple Random Gains)

Deep Learning Classifier

Hand Writing Recognition (Kaggle Digit Recognizer Competition - MNIST)

State Farm Distracted Drivers Detection (Classification)

Image Classification and Analysis thru Deep Learning API

Object Detection

Image Detection (Classification+Regression) for Pictures and Videos

Transfer Learning of Image Detection

Building Detection with TensorBox - Spacenet Dataset

CV for Automotives

Lane and Objects Detection with OpenCV and MobileNet-SSD

3D Pose Detection and Tracking from a Mono Camera

Inventory Monitoring Drone

Warehouse Inventory Monitoring Drone - Indoor Vision Based Autonomous Flight

Box Detection, Depth Calculation and 3D Reconstruction with Stereo Camera

Stereo Cameras

Monitoring Social Distancing with Deep Learning + Stereo Camera

3D Pointcloud and SLAM

Visual SLAM with Intel RealSense D435 Depth Cameras

3D PointCloud Construction with Multiple Stereo Cameras

Multiple Camera CV

Multiple Cameras and Multiple People Tracking

Product Recognition with Multiple Cameras

Supervised Learning

Movie Review Classification

Predictiong Boston Housing Prices

Predicting Student Intervention

Unsupervised Learning

Customer Segments

​

Mobile OCR (Optical Character Recognition) with OpenCV

Convert Currency

Tip Properly