Friday, August 11, 2017

Last Week

This was my last week at LexisNexis for my Summer 2017 internship, and a glorious run comes to a close.

The plan for this week was to give the final intern presentation in Raleigh, make some final code changes, push the code for a pull request, and sign off.

Day 1:
  • Ran benchmarking tests to find ways to improve classification performance.
  • Zeroed in on the parts of the source that can be parallelized.
Day 2:
  • The HR team at LexisNexis Raleigh arranged for a session to practice and calibrate the talk before the final presentation.
  • Made a few changes to the slides and updated my resume with the work done at LN Raleigh. Here it is.
Day 3:
  • Final presentation at LN Raleigh. The talk went well and garnered some attention from folks in Raleigh. Here are the slides.
  • Met and greeted other interns and senior staff at LexisNexis Raleigh.
Day 4:
  • Final parallelization changes, based on the candidates narrowed down on Monday.
  • Documentation for gradient boosting.

Day 5:
  • Pushed the code for commit, incorporating feedback from Dr. Holt.
  • Final acknowledgments for everyone.

Thanks to John Holt, Lorraine Chapman and the HR team at Alpharetta and Raleigh for giving me the opportunity and infrastructure to learn and showcase my talent. Cheers!!



Friday, August 4, 2017

Week 11

The plan for this week was to present at "The Download" tech talk and optimize classification. The plan was also to model data from Scopus.

Day 1:
  • Model data from Scopus using the updated doc2vec technique.
  • Results now give a common platform to query both legislative and Scopus documents.
Day 2:
  • Practice Talk for the tech talk
  • Present at the tech talk
Day 3 and 4:
  • Probe classification to identify regions of parallelization
  • Parallelize independent regressions in classification.
Day 5:
  • Finish a few training assignments.
  • Prepare slides for final intern presentation.
  • Work on standardization.
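
To illustrate what "independent regressions" means here, below is a minimal Python sketch, not the ECL code from the internship. It assumes a one-vs-rest setup in which each class gets its own regression against a 0/1 indicator target, so the fits share no state and can be dispatched concurrently. The names `fit_line`, `fit_one_vs_rest`, and `classify` are hypothetical, and a plain least-squares line stands in for whatever regression the real classifier used.

```python
from concurrent.futures import ThreadPoolExecutor


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def fit_one_vs_rest(xs, labels):
    """Fit one regression per class; the fits do not depend on each other."""
    classes = sorted(set(labels))

    def fit_class(cls):
        # Each class regresses on its own 0/1 indicator target.
        indicator = [1.0 if label == cls else 0.0 for label in labels]
        return cls, fit_line(xs, indicator)

    # Every class is an independent task, so they can be mapped concurrently.
    with ThreadPoolExecutor() as pool:
        return dict(pool.map(fit_class, classes))


def classify(models, x):
    """Pick the class whose regression gives the highest score at x."""
    return max(models, key=lambda cls: models[cls][0] * x + models[cls][1])


xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
labels = ["low", "low", "low", "high", "high", "high"]
models = fit_one_vs_rest(xs, labels)
print(classify(models, 0.0), classify(models, 5.0))
```

Threads are used only to show the structure. For CPU-bound pure-Python work the interpreter lock prevents a real speedup, whereas a distributed platform can run each regression on separate nodes.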

Friday, July 28, 2017

Week 10

The plan for this week was to prepare slides for "The Download" tech talk, merge the categorical and continuous trees into the original push, and parallelize classification. The plan was also to fetch data from Scopus.

Day 1:
  • Work on slides for the Tech Talk.
Day 2:
  • Practice Talk for the tech talk
  • Combine Categorical and Continuous Trees with original Gradient Boosting Trees framework
Day 3 and 4:
  • Scraper to fetch data from Scopus.
  • Reading on projections and clustering vectors.
Day 5:
  • A few code optimizations and fixes for code-style violations.
  • Finished a few touch-ups to the slides for the tech talk.

Friday, July 21, 2017

Week 9

The plan for this week was to test continuous and categorical data for gradient boosting and integrate regression and classification trees on larger datasets.

Day 1:
  • Studied the dataset provided by Roger Dev. It had 5000 records, 52 attributes, and 7 classes.
  • Developed a script to read this data into ECL and verify the algorithm.
Day 2:
  • Spent this day studying Scopus data as part of my parallel project on connecting legislative and research documents.
Day 3:
  • The dataset took a long time to run.
  • Spent the rest of the day debugging the problem.
  • Fixed the issue.
Day 4:
  • Wrote a naive scraper to parse data from Scopus.
  • Need to fetch a large amount of data. Planning to do that over the weekend.
Day 5:
  • Started working on slides for Tech Talk on Aug 1st

Friday, July 14, 2017

Week 8

The plan for this week was to develop a decision tree to handle both continuous and categorical data for gradient boosting and integrate regression and classification trees.

Day 1:
  • Addressed feedback from my mentor on commits for Regression and Decision Trees
Day 2:
  • Developed stub methods and created generic modules for future implementations of mixed decision tree regression.
Day 3:
  • Combined splitting techniques for regression and classification
Day 4:
  • Implement mixed trees
  • Plug mixed trees into gradient boosting
  • Test Gradient Boosting using mixed trees for classification and regression
Day 5:
  • Field Type generator to easily use default field types
  • Community service at a food bank for three-quarters of the day.
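
For readers unfamiliar with the idea, here is a minimal Python sketch of plugging a "mixed" tree into gradient boosting. It is not the ECL implementation from the internship. It assumes squared-error loss, uses one-split stumps instead of full trees, tests continuous features with `x <= value` and categorical features with `x == value`, and all names (`fit_stump`, `boost`, `predict`, `kinds`) are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def goes_left(stump_kind, x, value):
    """Continuous features split on a threshold, categorical ones on equality."""
    return x <= value if stump_kind == "continuous" else x == value


def fit_stump(rows, residuals, kinds):
    """Find the single split, over all features, with the lowest squared error.

    Returns (feature_index, value, left_mean, right_mean), or None if no
    split separates the rows.
    """
    best, best_cost = None, float("inf")
    for j, kind in enumerate(kinds):
        for value in sorted({row[j] for row in rows}):
            left = [r for row, r in zip(rows, residuals) if goes_left(kind, row[j], value)]
            right = [r for row, r in zip(rows, residuals) if not goes_left(kind, row[j], value)]
            if not left or not right:
                continue
            lm, rm = mean(left), mean(right)
            cost = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if cost < best_cost:
                best, best_cost = (j, value, lm, rm), cost
    return best


def stump_predict(stump, row, kinds):
    j, value, lm, rm = stump
    return lm if goes_left(kinds[j], row[j], value) else rm


def boost(rows, targets, kinds, rounds=20, learning_rate=0.5):
    """Each round fits a stump to the current residuals and adds a damped copy."""
    base = mean(targets)
    preds = [base] * len(rows)
    stumps = []
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(targets, preds)]
        stump = fit_stump(rows, residuals, kinds)
        if stump is None:
            break
        stumps.append(stump)
        preds = [p + learning_rate * stump_predict(stump, row, kinds)
                 for p, row in zip(preds, rows)]
    return base, learning_rate, stumps


def predict(model, row, kinds):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_predict(s, row, kinds) for s in stumps)


def mse(preds, targets):
    return mean([(y - p) ** 2 for p, y in zip(preds, targets)])


kinds = ["continuous", "categorical"]
rows = [(1.0, "a"), (2.0, "a"), (3.0, "b"), (4.0, "b")]
targets = [1.0, 2.0, 6.0, 7.0]

model = boost(rows, targets, kinds)
initial_mse = mse([mean(targets)] * len(rows), targets)
final_mse = mse([predict(model, row, kinds) for row in rows], targets)
print(initial_mse, final_mse)
```

The boosting loop only ever calls `fit_stump` and `stump_predict`, so swapping in a deeper mixed tree would not change the loop itself.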

Friday, July 7, 2017

Week 7

A relatively short week due to the 4th of July break.

Day 1:
  • Implement tests for all the algorithms implemented over the last six weeks
Day 2:
  • Fix a few leftover tests
  • Send code for review to Dr Holt
Day 5:
  • Made the recommended changes and sent the code back again.

Friday, June 30, 2017

Week 6

The plan for this week was to develop a decision tree and a generic framework for gradient boosting, and to integrate regression and classification trees.

Day 1 and 2 (Research Days):
  • Studying the representation of categorical data in ECL
  • Reformatting the Gradient Boosting interface to support categorical data
  • Generated stub methods for Gradient Boosting for ordinal data.
Day 3:
  • Implement a partitioning function for categorical data for regression
Day 4:
  • Complete implementation of the Regression Tree.
  • Plug it into the Gradient Boosting framework
Day 5:
  • Testing and verifying the framework.
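
As an illustration of what a partitioning function for categorical data can look like in a regression tree, here is a minimal Python sketch. It is an assumption about the general technique, not the ECL code written that week: categories are ordered by their mean target value, and only the two-way splits along that ordering are scanned for the lowest sum of squared errors. The names `sse` and `best_categorical_split` and the sample data are hypothetical.

```python
from collections import defaultdict


def sse(values):
    """Sum of squared errors of values around their mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_categorical_split(categories, targets):
    """Return (left_categories, cost) for the best two-way partition.

    Returns (None, inf) when there is only one category to split on.
    """
    groups = defaultdict(list)
    for cat, y in zip(categories, targets):
        groups[cat].append(y)

    # Order categories by mean target; ties are broken by name for determinism.
    ordered = sorted(groups, key=lambda c: (sum(groups[c]) / len(groups[c]), c))

    best = (None, float("inf"))
    for i in range(1, len(ordered)):
        left = [y for c in ordered[:i] for y in groups[c]]
        right = [y for c in ordered[i:] for y in groups[c]]
        cost = sse(left) + sse(right)
        if cost < best[1]:
            best = (set(ordered[:i]), cost)
    return best


colors = ["red", "red", "blue", "blue", "green", "green"]
prices = [10.0, 12.0, 30.0, 32.0, 11.0, 13.0]
left, cost = best_categorical_split(colors, prices)
print(sorted(left), cost)
```

Ordering by mean target keeps the scan linear in the number of categories instead of trying every possible subset.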