
Collecting and Populating Wardrobe Data
Our ultimate goal of deriving wardrobe insights and taking data-driven action on fits and clothing items is only achievable with accurate and complete records. So here we tackle the important task of data collection and population.
Most of the time, we data folk inherit data from external sources. Often this puts a smile on our faces, as we can get right to work with analysis, machine learning, or visualization. However, there is a dark side.
Data is often messy, incomplete, or imbalanced, which limits how well analysis, machine learning, or visualization can unearth relevant insights. In fact, most of our time is spent remedying these issues to the best of our ability (see below, from the Data Scientist Report).

Often, in the midst of wrangling data, it occurs to us that "if only the experiment designer had considered X" or "if only they had collected data on Y", our job would be so much easier. Such is my motivation for the wardrobe database.
Beginning with the end in mind (the questions I want answered, the insights I'm hoping for, and the analysis techniques I'm interested in employing) pushed me to design a wardrobe data source ideal for these activities.
With the database now ready, we can begin the process of data collection and population.
Collecting Wardrobe Data
Data collection for this project, as you might expect, was tedious, to say the least.
Think about an individual item of clothing and what must be checked and examined prior to inputting the relevant values into the database:
- Who owns the item (e.g. Adam)?
- What is the item (e.g. shirt)?
- What is the brand (e.g. Nike)?
- Is the item active (e.g. yes)?
- What should the item be called (e.g. Basketball Shorts)?
- What sizing measurements describe the item (e.g. large)?
- How many units of the exact item in question are owned (e.g. 1)?
- When was the item purchased (e.g. 2022/05/24)?
- What was the monetary value of the item when purchased (e.g. 24.99)?
- What status(es) apply to the item (e.g. damaged, stained)?
- What are the materials that make up the item (e.g. polyester, cotton)?
- What are the colors that make up the item (e.g. blue, green)?
- What are the patterns that make up the item (e.g. solid, stripes)?
- What are the keywords that describe the item (e.g. t-shirt, collar)?
That's a lot to think about for each item in one's wardrobe. I can say with a good degree of certainty that logging every item in my modestly sized wardrobe will take a handful of hours at best.
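To make that checklist concrete, here is a minimal sketch of how one item's answers could be held in Python before anything is written to the database. The class name, field names, and the two sanity checks are illustrative assumptions, not the actual columns or rules of the wardrobe database.

```python
from dataclasses import dataclass, field
from datetime import date
from decimal import Decimal


@dataclass
class WardrobeItem:
    """One clothing item, with one field per question in the checklist.

    Field names are hypothetical, chosen only for this sketch.
    """
    owner: str
    item_type: str
    brand: str
    name: str
    size: str
    purchase_date: date
    purchase_price: Decimal
    active: bool = True
    quantity: int = 1
    # Multi-valued answers are kept as lists, since an item can have several.
    statuses: list[str] = field(default_factory=list)
    materials: list[str] = field(default_factory=list)
    colors: list[str] = field(default_factory=list)
    patterns: list[str] = field(default_factory=list)
    keywords: list[str] = field(default_factory=list)

    def __post_init__(self):
        # Catch obvious entry mistakes before they reach the database.
        if self.quantity < 1:
            raise ValueError("quantity must be at least 1")
        if self.purchase_price < 0:
            raise ValueError("purchase_price cannot be negative")


shorts = WardrobeItem(
    owner="Adam",
    item_type="shorts",
    brand="Nike",
    name="Basketball Shorts",
    size="large",
    purchase_date=date(2022, 5, 24),
    purchase_price=Decimal("24.99"),
    materials=["polyester"],
    colors=["blue"],
    patterns=["solid"],
)
```

Keeping the single-valued answers as plain fields and the multi-valued ones (statuses, materials, colors, patterns, keywords) as lists mirrors the split in the checklist.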

Adding individual fit data is much simpler, though logging outfits on a daily basis requires its own level of detail. Think about what that takes:
- What is the occasion (e.g. work)?
- When was the outfit worn (e.g. 2022/05/24 at 7:00 pm)?
- What was the level of satisfaction with the outfit from 1-4 (e.g. 3)?
- What item(s) comprised the outfit in question (e.g. green sandals, golf shorts)?
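A fit entry is small enough that a quick validation step at logging time is cheap. The sketch below assumes the timestamp is typed in exactly the form shown in the list ("2022/05/24 at 7:00 pm") and that an English-style am/pm marker is in use; the function name and dictionary keys are hypothetical.

```python
from datetime import datetime


def build_fit_record(occasion, worn_at_text, satisfaction, item_names):
    """Validate one outfit entry and return it as a plain dictionary."""
    # Parse the hand-typed timestamp; raises ValueError if the format is off.
    worn_at = datetime.strptime(worn_at_text, "%Y/%m/%d at %I:%M %p")
    # Satisfaction is logged on a 1-4 scale.
    if satisfaction not in (1, 2, 3, 4):
        raise ValueError("satisfaction must be 1, 2, 3, or 4")
    # An outfit with no items is almost certainly a logging mistake.
    if not item_names:
        raise ValueError("an outfit needs at least one item")
    return {
        "occasion": occasion,
        "worn_at": worn_at,
        "satisfaction": satisfaction,
        "items": list(item_names),
    }


record = build_fit_record(
    "work", "2022/05/24 at 7:00 pm", 3, ["green sandals", "golf shorts"]
)
```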
In summary, the data collection task required patience and process. Aligning the collection requirements stated above with the tools for populating the database was the approach for optimum efficiency.
Populating Wardrobe Data
Just as the sequence of adding tables to a database is important, so is the sequence for adding source data to those same tables.
Because the "wItem" table stores foreign keys for "wItemBrand", "wItemType", etc., we cannot add a clothing item unless those brand and type records are already in the database with keys to reference.
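The ordering constraint is easy to see in a small experiment. The sketch below uses an in-memory SQLite database as a stand-in, which is an assumption about the engine; the table names come from the text above, while the column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE wItemBrand (
    brand_id   INTEGER PRIMARY KEY,
    brand_name TEXT UNIQUE NOT NULL
);
CREATE TABLE wItemType (
    type_id   INTEGER PRIMARY KEY,
    type_name TEXT UNIQUE NOT NULL
);
CREATE TABLE wItem (
    item_id   INTEGER PRIMARY KEY,
    item_name TEXT NOT NULL,
    brand_id  INTEGER NOT NULL REFERENCES wItemBrand(brand_id),
    type_id   INTEGER NOT NULL REFERENCES wItemType(type_id)
);
""")

insert_item = "INSERT INTO wItem (item_name, brand_id, type_id) VALUES (?, ?, ?)"

# Wrong order: the item points at brand and type rows that do not exist yet.
try:
    conn.execute(insert_item, ("Basketball Shorts", 1, 1))
    inserted_too_early = True
except sqlite3.IntegrityError:
    inserted_too_early = False

# Right order: parent rows first, then the item that references their keys.
brand_id = conn.execute(
    "INSERT INTO wItemBrand (brand_name) VALUES (?)", ("Nike",)
).lastrowid
type_id = conn.execute(
    "INSERT INTO wItemType (type_name) VALUES (?)", ("shorts",)
).lastrowid
conn.execute(insert_item, ("Basketball Shorts", brand_id, type_id))
conn.commit()

item_count = conn.execute("SELECT COUNT(*) FROM wItem").fetchone()[0]
```

In this sketch the first insert is rejected by the database itself, so the sequence is enforced rather than left to memory.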
Below is a basic flowchart depicting the order and logic I run through every time I populate clothing item data:

At most stages, it's imperative to check whether the value related to the current clothing item (e.g. brand, type, or color) is new. Should we be working with a brand not yet populated into the database, we have an additional step before proceeding to add the item.
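That "is this value new?" check is a classic get-or-create lookup. Here is a rough sketch of one way to write it, again assuming SQLite and hypothetical column names; the `LOOKUPS` mapping and the `get_or_create` helper are illustrative and not part of any existing tooling.

```python
import sqlite3

# Hypothetical mapping of lookup table -> (key column, name column).
# Restricting identifiers to this mapping keeps them out of user input.
LOOKUPS = {
    "wItemBrand": ("brand_id", "brand_name"),
    "wItemType": ("type_id", "type_name"),
}


def get_or_create(conn, table, value):
    """Return the key for `value`, inserting a new row only if it is new."""
    id_col, name_col = LOOKUPS[table]
    row = conn.execute(
        f"SELECT {id_col} FROM {table} WHERE {name_col} = ?", (value,)
    ).fetchone()
    if row is not None:
        return row[0]  # already populated, reuse the existing key
    cursor = conn.execute(
        f"INSERT INTO {table} ({name_col}) VALUES (?)", (value,)
    )
    return cursor.lastrowid  # the extra step for a brand-new value


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE wItemBrand (brand_id INTEGER PRIMARY KEY, brand_name TEXT UNIQUE NOT NULL);
CREATE TABLE wItemType (type_id INTEGER PRIMARY KEY, type_name TEXT UNIQUE NOT NULL);
""")

first_id = get_or_create(conn, "wItemBrand", "Nike")   # new brand, inserted
second_id = get_or_create(conn, "wItemBrand", "Nike")  # known brand, looked up
brand_rows = conn.execute("SELECT COUNT(*) FROM wItemBrand").fetchone()[0]
```

Calling the helper for each brand, type, or color before inserting the item means the item row always has a valid key to reference.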
In a similar fashion and with similar considerations, we can add outfit data (please see the far simpler flowchart below):
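As a sketch of what populating one outfit could look like in code: every item in the fit must already exist, then the fit row is added, then one link row per item. The "wFit" and "wFitItem" table names, all column names, and the use of SQLite are assumptions made for this example only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE wItem (item_id INTEGER PRIMARY KEY, item_name TEXT UNIQUE NOT NULL);
CREATE TABLE wFit (
    fit_id       INTEGER PRIMARY KEY,
    occasion     TEXT NOT NULL,
    worn_at      TEXT NOT NULL,
    satisfaction INTEGER NOT NULL CHECK (satisfaction BETWEEN 1 AND 4)
);
CREATE TABLE wFitItem (
    fit_id  INTEGER NOT NULL REFERENCES wFit(fit_id),
    item_id INTEGER NOT NULL REFERENCES wItem(item_id),
    PRIMARY KEY (fit_id, item_id)
);
INSERT INTO wItem (item_name) VALUES ('green sandals'), ('golf shorts');
""")


def add_fit(conn, occasion, worn_at, satisfaction, item_names):
    """Insert one outfit and its item links as a single transaction."""
    with conn:  # commits on success, rolls back if any step raises
        item_ids = []
        for name in item_names:
            row = conn.execute(
                "SELECT item_id FROM wItem WHERE item_name = ?", (name,)
            ).fetchone()
            if row is None:
                raise LookupError(f"item not in wardrobe yet: {name}")
            item_ids.append(row[0])
        fit_id = conn.execute(
            "INSERT INTO wFit (occasion, worn_at, satisfaction) VALUES (?, ?, ?)",
            (occasion, worn_at, satisfaction),
        ).lastrowid
        conn.executemany(
            "INSERT INTO wFitItem (fit_id, item_id) VALUES (?, ?)",
            [(fit_id, item_id) for item_id in item_ids],
        )
    return fit_id


fit_id = add_fit(
    conn, "work", "2022-05-24 19:00", 3, ["green sandals", "golf shorts"]
)
```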

Collecting and populating wardrobe data is never really complete. We all buy new clothing items throughout the year, get rid of some stuff, notice when an item gets damaged, and at the very least wear clothes daily. This all requires strict cataloging of what's going on with the wardrobe.
Over the next several weeks, I hope to reach the point where the current state of my wardrobe is represented in data and habits for consistent tracking are taking hold.