[973ab6]: / Stats / __pycache__ / FeatureSelection.cpython-35.pyc

ţ2÷Y4Ń@s˘dZddlmZmZmZddlmZddlmZddlmZddlm	Z	ddl
mZddlm
Z
dd	lZdd
lmZedâZdZd
ZdgZdZdZdZdZdZGddädâZd	S)z4It is an interface for ranking features importance.
Ú)┌List┌TypeVar┌Any)┌ensemble)┌feature_selection)┌tree)┌svm)┌SVR)┌RandomizedLogisticRegressionN)┌	CONSTANTS┌	DataFramezMohsen Mesgarpourz-Copyright 2016, https://github.com/mesgarpour┌GPLz1.1zmohsen.mesgarpour@gmail.com┌Releasec@sEeZdZddäZdeeeeedddÉäZ	deeeeedddÉäZ
eeeed	d
dÉäZeeeed	dd
ÉäZeeeed	ddÉäZ
ddeeeeeedddÉäZeeeedddÉäZeeeedddÉäZeeeeedddÉäZdS) ┌FeatureSelectioncCs)tjtjâ|_|jjtâdS)z.Initialise the objects and constants.
        N)┌logging┌	getLoggerr┌app_name┌_FeatureSelection__logger┌debug┌__name__)┌selfęr˙NC:\Users\eagle\Documents\GitHub\Analytics_UoW\TCARER\Stats\FeatureSelection.py┌__init__,szFeatureSelection.__init__Ú)┌features_indep_df┌feature_target┌n_jobs┌kwargs┌returncKs5|jjdâtjd||Ź}|j||âS)u`Use Breiman Random Forest Classifier to rank features.
        Attributes:
        model.estimators_
        model.classes_
        model.n_classes_
        model.n_features_
        model.n_outputs_
        model.feature_importances_

        :param features_indep_df: the independent features, which are inputted into the model.
        :param feature_target: the target feature, which is being estimated.
        :param n_jobs: number of CPUs to use during the resampling. If '-1', use all the CPUs.
        :param kwargs: n_estimators=10, criterion='gini', max_depth=None, min_samples_split=2, min_samples_leaf=1,
        min_weight_fraction_leaf=0.0, max_features='auto', max_leaf_nodes=None, bootstrap=True,
        oob_score=False, random_state=None, verbose=0, warm_start=False, class_weight=None
        :return: the importance ranking model.
        z'Run Random Forest Classifier (Brieman).r)rrrZRandomForestClassifier┌fit)rrrrr┌
classifierrrr┌rank_random_forest_breiman2sz+FeatureSelection.rank_random_forest_breimancKs2|jjdâtd||Ź}|j||âS)uťUse Randomized Logistic Regression to rank features.
        Attributes:
        model.scores_
        model.all_scores_

        :param features_indep_df: the independent features, which are inputted into the model.
        :param feature_target: the target feature, which is being estimated.
        :param n_jobs: number of CPUs to use during the resampling. If '-1', use all the CPUs.
        :param kwargs: C=1, scaling=0.5, sample_fraction=0.75, n_resampling=200, selection_threshold=0.25, tol=0.001,
        fit_intercept=True, verbose=False, normalize=True, random_state=None, pre_dispatch='3*n_jobs'
        :return: the importance ranking model.
        zRun Random Logistic Regression.r)rrr
r )rrrrrr!rrr┌rank_random_logistic_regressionLsz0FeatureSelection.rank_random_logistic_regression)rrrrcKs/|jjdâtj|Ź}|j||âS)apUse Scalable Linear Support Vector Machine for classification.
        In C-Support Vector Classification (SVC), the C parameter trades off misclassification of training examples
        against simplicity of the decision surface.
        Attributes:
        model.support_
        model.support_vectors_
        model.n_support_
        model.dual_coef_
        model.coef_
        model.intercept_

        :param features_indep_df: the independent features, which are inputted into the model.
        :param feature_target: the target feature, which is being estimated.
        :param kwargs: C=1.0, kernel='rbf', degree=3, gamma='auto', coef0=0.0, shrinking=True, probability=False,
        tol=0.001, cache_size=200, class_weight=None, verbose=False, max_iter=-1, decision_function_shape=None,
        random_state=None
        :return: the importance ranking model.
        z$Run C-Support Vector Classification.)rrrZSVCr )rrrrr!rrr┌rank_svm_c_supportasz#FeatureSelection.rank_svm_c_supportcKs/|jjdâtj|Ź}|j||âS)aŘUse Breiman decision tree classifier to rank features.
        Attributes:
        model.classes_
        model.feature_importances_
        model.max_features_
        model.n_classes_
        model.n_features_
        model.n_outputs_
        model.tree_

        :param features_indep_df: the independent features, which are inputted into the model.
        :param feature_target: the target feature, which is being estimated.
        :param kwargs: criterion='gini', splitter='best', max_depth=None, min_samples_split=2, min_samples_leaf=1,
        min_weight_fraction_leaf=0.0, max_features=None, random_state=None, max_leaf_nodes=None,
        min_impurity_split=1e-07, class_weight=None, presort=False
        :return: the importance ranking model.
        z'Run Decision Tree Classifier (Brieman).)rrrZDecisionTreeClassifierr )rrrrr!rrr┌rank_tree_brieman{sz"FeatureSelection.rank_tree_briemancKs/|jjdâtj|Ź}|j||âS)a)Use Gradient Boosted Regression Trees (GBRT) to rank features.
        Attributes:
        model.feature_importances_
        model.train_score_
        model.loss_
        model.init
        model.estimators_

        :param features_indep_df: the independent features, which are inputted into the model.
        :param feature_target: the target feature, which is being estimated.
        :param kwargs: loss='ls', learning_rate=0.1, n_estimators=100, subsample=1.0, criterion='friedman_mse',
        min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_depth=3, min_impurity_split=1e-07,
        init=None, random_state=None, max_features=None, alpha=0.9, verbose=0, max_leaf_nodes=None, warm_start=False,
        presort='auto'
        :return: the importance ranking model.
        z-Run Gradient Boosted Regression Trees (GBRT).)rrrZGradientBoostingRegressorr )rrrrr!rrr┌rank_tree_gbrtöszFeatureSelection.rank_tree_gbrt┌linear)rr┌kernelrrrcKsJ|jjdâtd|â}tjd|d||Ź}|j||âS)uŹSelect top features using recursive feature elimination and cross-validated selection of the best number
        of features, to rank features.
        Attributes:
        model.n_features_
        model.support_
        model.ranking_
        model.grid_scores_
        model.estimator_

        :param features_indep_df: the independent features, which are inputted into the model.
        :param feature_target: the target feature, which is being estimated.
        :param kernel: Specifies the kernel type to be used in the algorithm. It must be one of 'linear', 'poly',
        'rbf', 'sigmoid', 'precomputed' or a callable. If none is given, 'linear' will be used.
        :param n_jobs: number of CPUs to use during the resampling. If '-1', use all the CPUs.
        :param kwargs: step=1, cv=None, scoring=None, verbose=0
        :return: the feature selection model.
        z7Run Feature Ranking with Recursive Feature Elimination.r(┌	estimatorr)rrr	r┌RFECVr )rrrr(rrr)┌selectorrrr┌selector_logistic_rfeČsz&FeatureSelection.selector_logistic_rfe)rr┌kbestrcCs)|jjdâ|j||tj|âS)u:Select features according to the k highest scores, using 'chi2':
        Chi-squared stats of non-negative features for classification tasks.
        Attributes:
        model.scores_
        model.pvalues_

        :param features_indep_df: the independent features, which are inputted into the model.
        :param feature_target: the target feature, which is being estimated.
        :param kbest: number of top features to select. The "all" option bypasses selection, for use in a parameter
        search.
        :return: the feature selection model.
        z@Select features according to the k highest scores, using 'chi2'.)rr┌5_FeatureSelection__selector_univarite_selection_kbestr┌chi2)rrrr-rrr┌'selector_univarite_selection_kbest_chi2╚s
	z8FeatureSelection.selector_univarite_selection_kbest_chi2cCs)|jjdâ|j||tj|âS)u8Select features according to the k highest scores, using 'f_classif':
        ANOVA F-value between label/feature for classification tasks.
        Attributes:
        model.scores_
        model.pvalues_

        :param features_indep_df: the independent features, which are inputted into the model.
        :param feature_target: the target feature, which is being estimated.
        :param kbest: number of top features to select. The "all" option bypasses selection, for use in a parameter
        search.
        :return: the feature selection model.
        zESelect features according to the k highest scores, using 'f_classif'.)rrr.r┌	f_classif)rrrr-rrr┌,selector_univarite_selection_kbest_f_classifŮsz=FeatureSelection.selector_univarite_selection_kbest_f_classif)rr┌
score_funcr-rcCsU|jjdâtt|â|jdâ}tjd|d|â}|j||âS)uvSelect features according to the k highest scores.
        Attributes:
        model.scores_
        model.pvalues_

        :param features_indep_df: the independent features, which are inputted into the model.
        :param feature_target: the target feature, which is being estimated.
        :param score_func: Function taking two arrays X and y, and returning a pair of arrays (scores, pvalues) or
        a single array with scores.
        :param kbest: number of top features to select. The "all" option bypasses selection, for use in a parameter
        search.
        :return: the feature selection model.
        z<Run Univariate Feature Selection with Configurable Strategy.rr3┌k)rr┌int┌float┌shaper┌SelectKBestr )rrrr3r-r+rrrZ$__selector_univarite_selection_kbest˛s
	z5FeatureSelection.__selector_univarite_selection_kbestNÚ    r9r9)r┌
__module__┌__qualname__r┌PandasDataFramerr5r┌objectr"r#r$r%r&┌strr,r0r2r.rrrrr+s6	r)┌__doc__┌typingrrr┌sklearnrrrrZsklearn.svmr	Zsklearn.linear_modelr
r┌Configs.CONSTANTSrr<┌
__author__┌
__copyright__┌__credits__┌__license__┌__version__┌__maintainer__┌	__email__┌
__status__rrrrr┌<module>s&
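
The univariate selectors documented above "select features according to the k highest scores". The sketch below illustrates only that idea in plain Python. It is not the module's implementation, which the docstrings indicate is delegated to scikit-learn's `SelectKBest`. The helper name `top_k_features` and the example scores are hypothetical and exist only for illustration.

```python
from typing import Dict, List


def top_k_features(scores: Dict[str, float], k: int) -> List[str]:
    """Return the names of the k highest-scoring features (hypothetical helper)."""
    if k < 0:
        raise ValueError("k must be non-negative")
    # Sort by score, highest first; break ties by name so the result is deterministic.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:k]]


# Made-up per-feature scores, for illustration only.
example_scores = {"age": 12.4, "bmi": 3.1, "smoker": 27.9, "region": 0.6}
selected = top_k_features(example_scores, k=2)
print(selected)
```

In the real module the scores would come from a scoring function such as `chi2` or `f_classif`, computed from the independent features and the target, not from a hand-written dictionary.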