Diff of /preprocessing.py [000000] .. [a23339]

'''
About: Python script to preprocess the predictor and target matrices for the MLP classification program.
Author: Iman Kafian-Attari
Date: 20.07.2021
Licence: MIT
Version: 0.1
=========================================================
How to use:
1. Select the output directory.
2. Select the file containing information on the predictor matrix.
3. Select the file containing information on the target matrix.
=========================================================
Notes:
1. This script creates the training and test datasets for the MLP neural network used in the classification problem.
2. This script must be executed before running the main script.
3. It requires the following inputs from the user:
   - an output directory,
   - a text file holding the predictor matrix as a 2D numpy array of shape m x n,
     where m: number of observations and n: number of predictor variables,
   - a text file holding the target matrix as a 2D numpy array of shape m x 1,
     where m: number of observations and 1: the only target variable.
4. It rescales the predictor variables to the range [0, 1].
5. It randomly splits the predictor and target variables into training (75%) and test (25%) sets.
6. It stores the following data:
   - x_train,
   - x_test,
   - y_train,
   - y_test.
7. The output files are saved as tab-delimited text files readable with numpy.loadtxt.
8. To use this program without any errors, the target variable should be in the form of m x 1, where m: number of samples.
=========================================================
TODO for version 0.2
1. Modify the code in a functional form.
2. Modify the code to work for any number of target variables.
=========================================================
'''

print(__doc__)

import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
import tkinter as tk
from tkinter import filedialog

root = tk.Tk()
root.withdraw()

output_dir = filedialog.askdirectory(parent=root, initialdir='C:\\', title='Select the output directory')

# Import the data
# PREDICTORS: text file holding an m x n matrix
us = np.loadtxt(filedialog.askopenfilename(parent=root, initialdir='C:\\', title='Select the predictor file, a 2D numpy array'))

# TARGETS: text file holding an m x 1 matrix
cells = np.loadtxt(filedialog.askopenfilename(parent=root, initialdir='C:\\', title='Select the target file, a 2D numpy array'))

# Normalize the predictors into [0, 1]
scaler = MinMaxScaler()
us = scaler.fit_transform(us)

# Make the train and test sets (75% / 25%, shuffled)
x_train, x_test, y_train, y_test = train_test_split(us, cells, test_size=0.25)

# Save the sets as tab-delimited text; '/' works as a path separator on every platform
np.savetxt(f'{output_dir}/x_train.txt', x_train, delimiter='\t')
np.savetxt(f'{output_dir}/x_test.txt', x_test, delimiter='\t')
np.savetxt(f'{output_dir}/y_train.txt', y_train, delimiter='\t')
np.savetxt(f'{output_dir}/y_test.txt', y_test, delimiter='\t')
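
The TODO list asks for a functional form of this script. One possible shape is sketched below; it is not the author's implementation. The function name `preprocess` and its `random_state` parameter are hypothetical. The sketch also makes one deliberate change: the script above scales the full predictor matrix before splitting, while the sketch splits first and fits `MinMaxScaler` on the training rows only, so no test-set statistics reach the training data. As a consequence, test values may fall slightly outside [0, 1].

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler


def preprocess(x, y, test_size=0.25, random_state=None):
    """Split the data, then scale predictors with training-set statistics only.

    Hypothetical helper: the name and signature are assumptions, not part of
    the original script.
    """
    x_train, x_test, y_train, y_test = train_test_split(
        x, y, test_size=test_size, random_state=random_state
    )
    scaler = MinMaxScaler()
    x_train = scaler.fit_transform(x_train)  # learn min/max from training rows
    x_test = scaler.transform(x_test)        # reuse the same min/max for test rows
    return x_train, x_test, y_train, y_test


if __name__ == "__main__":
    # Synthetic stand-in data: 20 observations, 3 predictors, 1 target column
    rng = np.random.default_rng(0)
    x = rng.normal(size=(20, 3))
    y = rng.integers(0, 2, size=(20, 1))
    x_train, x_test, y_train, y_test = preprocess(x, y, random_state=0)
    print(x_train.shape, x_test.shape)  # (15, 3) (5, 3)
```

Passing a fixed `random_state` makes the split reproducible between runs; the script above leaves it unset, so each run produces a different split.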