• Designed and implemented, across the full development life cycle, a text mining system in Java and Python that extracts, cleans, and classifies professional skills from raw email data
• Developed a keyword extraction and prediction system based on Stanford CoreNLP, using POS tagging, named-entity recognition (NER), and lemmatization
• Developed a topic extraction system that queries DBpedia with SPARQL through Apache Jena to build a domain-specific ontology
• Expanded the topic extraction system with Wikidata and DMOZ data, cleaned and imported with Python and stored in MySQL
• Implemented the PageRank algorithm to score a social graph built from email contact information