top of page

Sephora Products

Dataset: Kaggle

My Role: Data Analyst

Date: Fall 2025 

Tools: Excel

01 | Project Summary

The Sephora Skincare Analytics Dashboard was built to answer a fundamental question: what truly drives value in the skincare market: price, perception, or performance?

​

This project transforms over 1.1 million rows of Sephora product and review data into an interactive analytical model that reveals how pricing strategy, consumer sentiment, and product accessibility intersect to shape brand success.

​

Understanding these relationships was essential:

  • Pricing metrics such as retail vs. sale price, discount depth, and price tier helped quantify how affordability affects visibility and demand.

  • Sentiment metrics like average rating, recommendation rate, and helpfulness scores captured the emotional and experiential side of brand trust.

  • Engagement and exclusivity metrics (Online-Only, Limited Edition, Sephora Exclusive) exposed which types of products foster the strongest loyalty and differentiation.​

dashboard.jpg

02 | Power Query & Power Pivot

This project involved transforming a raw, unstructured Sephora dataset into a relational, analytics-ready data model using Excel Power Query and Power Pivot. The goal was to establish consistency, integrity, and clarity across over 1.1 million product and review records, ensuring smooth joins and meaningful DAX calculations.

Data Ingestion & Schema Setup

  • Imported multiple CSVs from Kaggle’s Sephora Products and Reviews Dataset into Power Query.

  • Inspected column headers for irregular naming conventions and missing field alignments between product and review files.

  • Removed redundant columns (e.g., site-specific metadata, null review fields).

  • Cast all columns to proper data types (Decimal, Whole Number, Date, Text, Boolean).

  • Trimmed and cleaned text fields (Title, Text) to remove trailing spaces and inconsistent punctuation.

  • Replaced nulls with blanks or zeros depending on data type.

Products Table Transformations

  • Renamed columns with readable, consistent formatting (BrandName, RetailPrice, OnlineOnly?).

  • Capitalized text fields (brand, category, tag, formulation) using Text.Proper.

  • Created derived fields:

    • DiscountAmount = Retail – Sale Price

    • PriceTier = Budget / Midrange / Premium (based on thresholds)

    • PerceivedValueGap = RetailPrice – MarketValue

    • FinalSizeCategory = Mapped grouping from Mini, Standard, Large, XL

  • Filtered out duplicates, nulls, and products missing essential data (e.g., empty price or product name).

  • Reordered columns into logical order: Identifiers → Attributes → Pricing → Flags → Calculated Fields.

  • Mapped ingredient and feature lists by cleaning brackets, delimiters, and irregular characters.

  • Extracted product volume from text strings using custom parsing functions:

    • GetNumberBeforeToken() to isolate numeric values before “oz” or “mL”.

    • Converted all size units → standardized FinalVolumeML.

Reviews Table Transformations

  • Applied Text.Proper to categorical demographics (SkinType, SkinTone, HairColor, EyeColor).

  • Standardized review timestamps to Date type → created clean SubmissionTime field for Calendar joins.

  • Removed duplicates and invalid ProductID links to ensure referential integrity.

  • Reordered columns into logical structure:                                                                                                                              Product Info → Review Metrics → Author Demographics → Text Fields.

Demographics Normalization

​NewDemographics:
  • Unpivoted all four attributes into Attribute–Value pairs (SkinType, SkinTone, HairColor, EyeColor).

  • Filtered ambiguous or erroneous categories (e.g., “Notsurest”, “Unknown”).

  • Established consistent taxonomy across all attributes.

​

MapDemographics:
  • Created lookup table with columns:
    Attribute, Value, NewValue, Status (Kept / Removed).

  • Used this mapping for controlled replacement in the main Reviews query.

  • Enabled seamless filtering by clean demographic groups in the dashboard.

Product Size Normalization

NewSizing:
  • Extracted distinct ProductSize and ProductValue combinations.

  • Parsed and converted sizes to milliliters using conversion logic (oz → mL).

  • Created standardized numeric field FinalVolumeML.

​

MapSizing:
  • Built lookup mapping between raw and cleaned size values.

  • Defined categorical groupings for dashboard visuals:

  • Travel/Mini, Standard, Large, XL.

  • Joined back to main Products table for consistent size-tier segmentation.

Text Analysis Query

  • Isolated Title and Text from Reviews for separate text processing.

  • Added custom Boolean field HasValueMention to flag reviews referencing “value”

  • Casted logical fields, replaced nulls, and created standardized outputs for use in DAX sentiment analysis

Data Model Optimization

  • ​Established one-to-many relationships between Products, Reviews, and Calendar tables.

  • Verified referential integrity to prevent orphaned records and maintain relational accuracy.

  • Defined explicit primary keys (ProductID, Date) and validated relationship cardinality.

  • Removed unnecessary columns and disabled auto-detect to optimize DAX calculation efficiency.

03 | Insights & Dashboards

Each dashboard sheet served a different analytical audience from executives to product strategists to data consumers. 

Profit & Pricing Trends

Monetary performance and pricing strategy health.

​

Insights:

  • Revenue trails category goals by ~59% overall, largely because discount depth underperformed relative to target thresholds (especially in Fragrance and Hair).

  • Premium-tier products (>$60) account for ~28% of items yet contribute over half of total revenue, confirming that high-end pricing remains Sephora’s main profit driver.

  • Midrange and Mini/Travel sizes outperform in product count, suggesting consumer preference for accessible, flexible options: a key indicator of “trial-first” buying behavior.

feature.jpg
category.jpg

Consumer Behavior

Sentiment, engagement, and community dynamics.

​

Insights:

  • 82.9% of all reviews are 4★+, confirming high post-purchase satisfaction. But sentiment varies strongly by product type: Men’s Grooming and Gifts outperform Skincare by 0.4–0.6 points.

  • Helpful reviews average 4+ feedback votes each, suggesting users actively engage with reviews they trust.

  • Repeat reviewers make up roughly 10–15% of all unique users, hinting at brand affinity and engagement loops (especially around cult-favorite brands like The Ordinary and LANEIGE).

Product Segmentation

Catalog architecture, pricing tiers, and feature mix.

​

Insights:

  • Average product variant spread is $170, showing significant internal price diversity: often caused by brands selling minis and luxury variants under a single line.

  • Skincare and Gift sets show the widest price variance, aligning with both entry-level and luxury bundles that appeal to mixed audiences.

  • Average retail price ($51.65) vs average sale price ($46.50) reveals a mild gap, reflecting Sephora’s restrained discount strategy compared to mass retailers.

products.jpg

Executive Dashboard

Sentiment, engagement, and community dynamics.

​

Insights:

  • Sephora’s strength lies in brand equity and loyal engagement, not deep discounting.

  • Revenue growth will likely depend on targeted promotion rather than mass markdowns, especially in premium and fragrance categories.

  • Mini sizes and accessible SKUs act as gateways for new customer acquisition, reinforcing the importance of balanced price tiers.

  • Exclusivity works: Exclusive and Online-Only items outperform in both rating and review helpfulness.

  • There’s a strong alignment between customer satisfaction and product credibility (feedback ratio), suggesting trust as a key KPI for long-term retention.

04| Reflection

This project was my first true sandbox for learning how analytical thinking, design, and data modeling come together to tell a business story. Using Excel as my main tool forced me to become creative with constraint. Power Query and Power Pivot gave me the freedom to model data like in Power BI, while DAX challenged me to think in relationships, context, and measure logic rather than just formulas.

​

In many ways, this Sephora dashboard has been my foundation for mastering Power BI methodologies inside Excel; from data cleaning and modeling to dynamic visualization and KPI storytelling. Building this by hand gave me a clearer understanding of how business questions evolve, and how dashboards must evolve with them.

​

As I keep improving my technical toolkit, I plan to revisit and expand this workbook to include more advanced and industry-level capabilities, such as:

  • Goal Seek and Scenario Manager: for forecasting pricing and discount outcomes based on revenue targets.

  • Combo Charts and Sparklines: to show trend continuity between performance KPIs and time-based changes.

  • Advanced Time Intelligence: using DAX functions like DATEADD, SAMEPERIODLASTYEAR, and PARALLELPERIOD for seasonal comparisons.

  • Drill-throughs and Tooltips: enabling deep dives from summary visuals into specific brand or category performance.

  • Bookmarks and Buttons: to simulate multi-page storytelling and guided analysis inside the workbook — similar to Power BI dashboards.

bottom of page