Normalize PCA with scikit-learn when data is split

I have a follow-up question to: How to normalize with PCA and scikit-learn.

I'm creating an emotion detection system, and what I currently do is:

  1. Split the data by emotion (distributing the data over multiple subsets).
  2. Add all the data together (merge the multiple subsets into one set).
  3. Get the PCA parameters of the combined data (self.pca = RandomizedPCA(n_components=self.n_components, whiten=True).fit(self.data)).
  4. Per emotion (per subset), apply the PCA to the data of that emotion (subset).

I think I should normalize at two points: in step 2, normalize all the combined data, and in step 4, normalize each subset.
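
As a minimal sketch of the four steps above (not the actual system): the subset names and toy values are hypothetical, and PCA(svd_solver="randomized") is assumed as a stand-in for RandomizedPCA, which recent scikit-learn releases no longer provide.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import normalize

# Step 1: hypothetical per-emotion subsets (toy values, two features each).
subsets = {
    "happy": np.array([[52.0, 254.0], [4.0, 128.0]]),
    "sad": np.array([[39.0, 213.0], [123.0, 7.0]]),
}

# Step 2: stack the subsets into one array and normalize each row.
combined = normalize(np.vstack(list(subsets.values())))

# Step 3: fit PCA once, on the combined data only.
# Assumption: PCA with the randomized solver replaces RandomizedPCA here.
pca = PCA(n_components=2, whiten=True, svd_solver="randomized", random_state=0)
pca.fit(combined)

# Step 4: normalize each subset the same way, then project it with the fitted PCA.
projected = {name: pca.transform(normalize(rows)) for name, rows in subsets.items()}

for name, rows in projected.items():
    print(name, rows.shape)
```

The PCA is fitted only once, so every subset is projected onto the same components and the results stay comparable across emotions.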

Edit

I was wondering whether normalizing all the data and normalizing a subset give the same result. When I tried to simplify my example at the suggestion of @BartoszKP, I found that my understanding of how the normalization works was wrong. The normalization works the same way in both cases, so this is a valid way to do it, right? (see code)

from sklearn.preprocessing import normalize
from sklearn.decomposition import RandomizedPCA
import numpy as np

data_1 = np.array(([52, 254], [4, 128]), dtype=float)
data_2 = np.array(([39, 213], [123, 7]), dtype=float)
data_combined = np.vstack((data_1, data_2))
#print(data_combined)
"""
Output
[[  52.  254.]
 [   4.  128.]
 [  39.  213.]
 [ 123.    7.]]
"""
#Normalize all data
data_norm = normalize(data_combined)
print(data_norm)
"""
[[ 0.20056452  0.97968054]
 [ 0.03123475  0.99951208]
 [ 0.18010448  0.98364753]
 [ 0.99838448  0.05681863]]
"""

# The data has only 2 features, so at most 2 components are meaningful
pca = RandomizedPCA(n_components=2, whiten=True)
pca.fit(data_norm)

#Normalize subset of data
data_1_norm = normalize(data_1)
print(data_1_norm)
"""
[[ 0.20056452  0.97968054]
 [ 0.03123475  0.99951208]]
"""
pca.transform(data_1_norm)
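
Why the two normalizations agree can be checked by hand. With its default settings, normalize() divides each row by that row's own Euclidean norm, so a row's result does not depend on which other rows are in the array. The sketch below reproduces this with plain NumPy; l2_normalize_rows is a hypothetical helper written for illustration, and unlike normalize() it does not guard against all-zero rows.

```python
import numpy as np


def l2_normalize_rows(x):
    """Divide every row by its own Euclidean (L2) norm."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / norms


data_1 = np.array([[52, 254], [4, 128]], dtype=float)
data_2 = np.array([[39, 213], [123, 7]], dtype=float)
data_combined = np.vstack((data_1, data_2))

norm_combined = l2_normalize_rows(data_combined)
norm_subset = l2_normalize_rows(data_1)

# The first two rows of the combined result equal the normalized subset.
print(np.allclose(norm_combined[:2], norm_subset))  # True
```

This holds only for per-row (per-sample) normalization. A scaling that uses statistics computed over all rows would generally give different results on a subset than on the combined data.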


Download normalize.pca.with.scikit.learn.when.data.is.split.zip
Direct Link


Download


Download normalize.pca.with.scikit.learn.when.data.is.split.zip
Mediafire


Download


Download normalize.pca.with.scikit.learn.when.data.is.split.zip
Junocloud


Download


Download normalize.pca.with.scikit.learn.when.data.is.split.zip
Hotfile


Download


Download normalize.pca.with.scikit.learn.when.data.is.split.zip
Ultrafile


Download