David Kaspar UpwardTrajectory

Hi there 👋

My name is David Kaspar and I'm a creative data scientist/engineer and design thinker with experience in Python, statistical analysis, machine learning, and natural language processing. With a background in math education and entrepreneurship, I build and connect data pipelines, creating robust & performant tools while answering business-centric questions. I transform numbers, data, and abstract ideas into something that makes sense to people and organizations so that they can make informed, data-driven decisions.

Downloadable Resume

My Time as a Data Science Consultant

Due to signing an NDA, I cannot discuss anything here in great detail, but I can share some high-level ideas and some of the tools I used to collaborate with both our internal team and our external clients (technical & non-technical) to solve a wide variety of challenging, data-related problems.

Source-control: Git, GitHub, Atlassian BitBucket
Python Machine Learning Libraries: Pandas, NumPy, Scikit-Learn, PyTorch, Tensorflow, SpaCy, NLTK, Facebook Prophet
Visualization: Tableau, Power BI, Matplotlib, Seaborn, Plot.ly
Cloud Computing:
- Amazon Web Services (AWS, S3, EC2, SageMaker)
- Google Cloud Platform (GCP, App Engine, Compute Engine, Cloud Functions, Cloud Storage, BigQuery, Cloud AI, Cloud IAM)
- IBM Watson (Natural Language Understanding, Natural Language Classifier, Speech to Text)

⚡ Highlighted Projects

Tracking Customer Engagement in 3D Space (on a team of 4 ppl)

Understanding how customers interact with a physical space is often difficult (and expensive) to measure accurately. If done wrong, it can easily be seen as an invasion of privacy. However, if done well, it can give a lot of insights into how a company might want to do things differently in order to optimize their customers' experience or simply to improve their own bottom line. This project focused on assessing anonymous movement trends & group behaviors across a multi-story space covering an area that exceeds 500,000 sq ft with 45 distinct zones. Extracting trends based on visit duration, day of the week, time of the year, and then comparing with historical data. All findings were presented in a Tableau dashboard that was updated once per day.

Automatically processed tens of millions of records per night using cloud computing & scheduling
Gather raw data -> Google Big Query (raw) -> Modeling & Analysis (Python) -> Google Big Query (processed) -> Tableau Front-end
Source control & collaboration using GitHub
My individual contributions & responsibilites included:
- Design & implement data gathering protocols to create a formal training & testing data set
- Model iteration: improve zone labeling algorithm for labeling an interaction in one of 45 zones from a random guess, 2.22% accuracy, to over 60% accuracy
- Improve performance & legibility of inherited legacy code
- Automate cloud scheduling to read & write from Google BigQuery daily, as well as backfill data for previous months

User Recommendations Engine (on a team of 4 ppl)

Create serendipitous recommendations for a user-base in the hundreds of thousands to recommend opportunities for personal growth. Leveraging Tensorflow & AWS SageMaker, build a recommendation system that can compare implicit user-profiles & opportunity-profiles, records of past interactions, and explicit feedback from the users to deliver relevant options that delight the user & foster greater adoption and interaction with the smartphone app.

Gather raw data -> AWS S3 Data Lake -> AWS SageMaker -> AWS S3 Data Lake -> Smartphone app -> Cycle feedback back into the AWS S3 Data Lake
Source control & collaboration using Atlassian BitBucket
My individual contributions & responsibilities included:
- Onboarding our team to AWS SageMaker & configuring the environment
- Connecting to the S3 Data Lake to read inputs
- Validating & cleaning input data
- Integrating the "User Profile" into the Tensorflow model to address the "cold-start problem" as well as improve ongoing recommendations
- Validating model outputs to ensure useful results were being served to the smartphone app
- Sending outputs back to the S3 Data Lake

Meander Maker (solo developer)

Google Maps is great for finding individual places to go, but if you want to find a cluster of multiple related places, it can take a lot of work. There's a lot of scrolling, saving things for later, interacting with the search bar over and over, and eventually just eyeballing what you think might work, and hoping for the best. This location discovery tool addresses that frustration, and is great for things like:

Planning an urban themed walk
Efficiently visiting the nearest group of shoe stores
Creating an itinerary for winetasting through a cluster of walkable tasting rooms
Discovering a neighborhood in a foriegn city with a high density of something you like (museums, gluten-free restaurants, etc)
Of course, there's always the good old-fashioned pub crawl

Meander Maker leverages user-input to customize the "best" cluster based on how much the end user values:

High ratings from Google Maps
Overall quantity of stops within the cluster
Short initial distance from the user's starting position
Short transit distance within the stops of the cluster (after initial travel to stop #1 is completed)

💬 Blog Articles

📫 Connect with Me

🔭 Currently Working On

Clustering of unlabeled text documents with Natural Language Understanding (NLU) techniques
Pros & Cons between cloud-computing services (AWS, GCP, Azure, DataBricks, IBM Watson, etc)
Contributing to Open-Source machine learning libraries
Drafting a "Performant Pandas" blog. A collection of tips to improve performance & readability for many common Pandas DataFrame operations

Provide feedback

Saved searches

Use saved searches to filter your results more quickly