Canonical Correlation Analysis

Background Knowledge

Basic linear algebra literacy is needed. Videos 1 and 2 are required; the rest are optional but highly recommended:
Essence of linear algebra series - 3Blue1Brown


The example in the following article is implemented in R, but the idea is the same in any language:
Canonical Correlation Analysis - Joos Korstanje

What is Canonical Correlation Analysis?

In short, Canonical Correlation Analysis (CCA) is used for analyzing the correlations between two data sets, that is, two sets of variables measured on the same observations.

Canonical Variables

Canonical Variables are linear combinations of the variables in one of the data sets. Since Canonical Correlation Analysis focuses on correlations between two data sets, you define pairs of Canonical Variables: one Canonical Variable coming from the left data set and a second Canonical Variable coming from the right data set.
If the two data sets have different numbers of variables, you can have as many pairs of Canonical Variables as there are variables in the smaller data set.
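
To make the idea of paired Canonical Variables concrete, here is a minimal NumPy sketch. It is only one of several ways to compute CCA (a QR decomposition followed by an SVD) and is not taken from the linked article. The function name `cca` and the simulated data are assumptions made for illustration.

```python
import numpy as np

def cca(X, Y):
    """Sketch of CCA via QR + SVD (illustrative helper, not a library API).

    X has shape (n, p) and Y has shape (n, q); rows are the same observations.
    Returns the canonical correlations and the weight matrices A and B.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Orthonormal bases for the column spaces of the centered data sets.
    Qx, Rx = np.linalg.qr(Xc)
    Qy, Ry = np.linalg.qr(Yc)
    # The singular values of Qx^T Qy are the canonical correlations.
    U, s, Vt = np.linalg.svd(Qx.T @ Qy)
    k = min(X.shape[1], Y.shape[1])        # number of canonical pairs
    A = np.linalg.solve(Rx, U[:, :k])      # weights for the left data set
    B = np.linalg.solve(Ry, Vt.T[:, :k])   # weights for the right data set
    return s[:k], A, B

# Simulated data (assumption): 3 variables on the left, 2 on the right,
# sharing one latent signal.
rng = np.random.default_rng(0)
n = 500
latent = rng.normal(size=n)
X = np.column_stack([latent + rng.normal(size=n),
                     rng.normal(size=n),
                     rng.normal(size=n)])
Y = np.column_stack([latent + rng.normal(size=n),
                     rng.normal(size=n)])

corrs, A, B = cca(X, Y)
left_vars = (X - X.mean(axis=0)) @ A    # Canonical Variables, left data set
right_vars = (Y - Y.mean(axis=0)) @ B   # Canonical Variables, right data set
print(corrs.shape)  # two pairs, because the smaller data set has 2 variables
```

Each column of `left_vars` is paired with the same column of `right_vars`, and the correlation within a pair equals the corresponding entry of `corrs`.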

Canonical Correlation

    Canonical correlation seeks a weighted linear composite for each variate (the set of dependent variables and the set of independent variables) that maximizes the overlap in their distributions.
    The labeling of dependent variables and independent variables is arbitrary. The procedure looks for relationships, not causation.
    The goal is to maximize the correlation (not the variance extracted, as in most other techniques).
    The results can be hard to interpret in specific terms, which may limit its usefulness in many situations.
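
The points above can be written compactly in standard textbook notation. The symbols are assumptions, since the page does not define any: $X$ and $Y$ are the two sets of variables, $a$ and $b$ are the weight vectors, and $\Sigma_{XX}$, $\Sigma_{YY}$, $\Sigma_{XY}$ are the within-set and between-set covariance matrices.

```latex
U = a^{\top} X = a_1 X_1 + \dots + a_p X_p, \qquad
V = b^{\top} Y = b_1 Y_1 + \dots + b_q Y_q

\rho = \max_{a,\,b} \operatorname{corr}(U, V)
     = \max_{a,\,b}
       \frac{a^{\top} \Sigma_{XY}\, b}
            {\sqrt{a^{\top} \Sigma_{XX}\, a}\;\sqrt{b^{\top} \Sigma_{YY}\, b}}
```

The quantity being maximized is a correlation between the two composites, not the variance of either composite.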


Correlation concepts:
    Bivariate correlations across sets
    Multiple correlations across sets
    Principal components within sets; correlations between principal components across sets
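
The first two concepts can be sketched with NumPy as follows. The simulated data and the helper name `multiple_correlation` are assumptions for illustration only; the third concept (principal components within sets) is not covered here.

```python
import numpy as np

# Simulated data (assumption): 3 variables in X, 2 variables in Y.
rng = np.random.default_rng(1)
n = 300
X = rng.normal(size=(n, 3))
Y = np.column_stack([X[:, 0] + rng.normal(size=n),
                     rng.normal(size=n)])

# Bivariate correlations across sets: the (p x q) off-diagonal block of
# the correlation matrix of all variables taken together.
p = X.shape[1]
R = np.corrcoef(X, Y, rowvar=False)
R_xy = R[:p, p:]

def multiple_correlation(X, y):
    """Correlation between y and its least-squares prediction from all of X."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return np.corrcoef(design @ coef, y)[0, 1]

# Multiple correlation across sets: one Y variable against the whole X set.
R_mult = multiple_correlation(X, Y[:, 0])

print(R_xy.round(2))
print(round(R_mult, 2))
```

The multiple correlation is never smaller than the largest absolute bivariate correlation for the same Y variable, because it uses all X variables at once.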

Canonical Correlation

[Figure: Canonical correlation]
Linear combinations:
[Figure: Linear combinations]


    Requires multiple continuous variables in both the dependent and the independent set; categorical variables can be included with dummy coding.
    Assumes a linear relationship between any two variables and between the variates.
    Multivariate normality is necessary to perform statistical tests.
    Multicollinearity in either variate confounds the interpretation of canonical results.
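
One rough way to screen a variable set for multicollinearity before running the analysis is to look at the condition number of its correlation matrix. This is a sketch under assumptions: the diagnostic is a common rule of thumb rather than something this page prescribes, and the data and the helper name `condition_number` are hypothetical.

```python
import numpy as np

# Simulated data (assumption): the third column of X_collinear is almost
# exactly the sum of the first two.
rng = np.random.default_rng(2)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
X_ok = np.column_stack([x1, x2])
X_collinear = np.column_stack([x1, x2, x1 + x2 + 1e-6 * rng.normal(size=n)])

def condition_number(X):
    """Condition number of the correlation matrix of the columns of X."""
    R = np.corrcoef(X, rowvar=False)
    return np.linalg.cond(R)

cond_ok = condition_number(X_ok)          # close to 1: little collinearity
cond_bad = condition_number(X_collinear)  # very large: near-singular set

print(cond_ok, cond_bad)
```

A very large condition number means some variable is nearly a linear combination of the others, so the canonical weights for that set are unstable and hard to interpret.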