[973ab6]: / Stats / __pycache__ / Plots.cpython-35.pyc

"""It consists of a set of custom plots, using Matplotlib and Scikit libraries."""

from typing import Dict, List, TypeVar, Any
from sklearn import metrics
from sklearn.model_selection import validation_curve
from sklearn.model_selection import learning_curve
from sklearn.metrics import confusion_matrix
from sklearn.neighbors import KernelDensity
import matplotlib.pyplot as plt
import numpy as np
import itertools

PandasDataFrame = TypeVar('DataFrame')
MatplotlibFigure = TypeVar('Figure')

__author__ = "Mohsen Mesgarpour"
__copyright__ = "Copyright 2016, https://github.com/mesgarpour"
__credits__ = ["Mohsen Mesgarpour"]
__license__ = "GPL"
__version__ = "1.1"
__maintainer__ = "Mohsen Mesgarpour"
__email__ = "mohsen.mesgarpour@gmail.com"
__status__ = "Release"

Plots.confusion_matrix(predicted_scores, feature_target, model_labals, normalize, title, cmap)
        Plot the confusion matrix.
        :param predicted_scores: the predicted scores.
        :param feature_target: the target feature, which is being estimated.
        :param model_labals: the target labels (default [0, 1]).
        :param normalize: whether to normalise the confusion matrix, dividing each row by its sum.
        :param title: the figure title.
        :param cmap: the colour map.
        :return: the plot object, and the data used to plot.
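A dependency-free sketch of what this method tabulates, assuming binary 0/1 labels. scikit-learn's `confusion_matrix` puts true labels on the rows and predicted labels on the columns. The compiled code appears to implement `normalize=True` as a row-wise division, and to draw cell text in white when a value exceeds half of the matrix maximum. `confusion_counts` and `normalise_rows` are hypothetical helpers, not part of `Plots`.

```python
from typing import List, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int],
                     labels: Sequence[int] = (0, 1)) -> List[List[int]]:
    """Count (true, predicted) pairs; rows are true labels, columns are predicted labels."""
    index = {label: k for k, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for truth, pred in zip(y_true, y_pred):
        matrix[index[truth]][index[pred]] += 1
    return matrix


def normalise_rows(matrix: List[List[int]]) -> List[List[float]]:
    """Divide each row by its total (rows with no samples stay at zero)."""
    return [[cell / sum(row) if sum(row) else 0.0 for cell in row] for row in matrix]


cm = confusion_counts([0, 0, 1, 1, 1], [0, 1, 1, 1, 0])
print(cm)                  # [[1, 1], [1, 2]]
print(normalise_rows(cm))  # approx. [[0.5, 0.5], [0.333, 0.667]]

# Text colour rule: white on cells above half of the largest count, black elsewhere.
thresh = max(max(row) for row in cm) / 2.0
colours = [["white" if cell > thresh else "black" for cell in row] for row in cm]
print(colours)             # [['black', 'black'], ['black', 'white']]
```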

Plots.stepwise_model(summaries, title, lw)
        Plot a performance summary for the step-wise training and testing.
        :param summaries: the summary statistics which will be used for plotting.
        It must contain 'Step', 'Train_Precision', 'Train_Recall', 'Train_ROC', 'Test_Precision', 'Test_Recall', and 'Test_ROC'
        for each training and testing step.
        :param title: the figure title.
        :param lw: the line-width.
        :return: the plot object.
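The `summaries` argument must carry specific columns. The following is a minimal pre-flight check, assuming `summaries` is a dict of lists or a pandas DataFrame; both answer `key in summaries` by column name. The 'Step' key is an assumption read from the compiled code, which appears to plot every series against it, with the x-axis labelled "Number of Features". `missing_summary_keys` is a hypothetical helper.

```python
from typing import Iterable, List, Mapping

# Columns read by stepwise_model: 'Step' gives the x-axis, the rest are the six curves.
REQUIRED_KEYS = ("Step",
                 "Train_Precision", "Train_Recall", "Train_ROC",
                 "Test_Precision", "Test_Recall", "Test_ROC")


def missing_summary_keys(summaries: Mapping[str, Iterable[float]]) -> List[str]:
    """Return the required keys absent from `summaries` (hypothetical helper)."""
    return [key for key in REQUIRED_KEYS if key not in summaries]


# Placeholder values only, one entry per step of the step-wise procedure.
partial = {"Step": [1, 2, 3], "Train_Precision": [0.70, 0.74, 0.78]}
print(missing_summary_keys(partial))
# ['Train_Recall', 'Train_ROC', 'Test_Precision', 'Test_Recall', 'Test_ROC']
```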

Plots.precision_recall(predicted_scores, feature_target, title, lw)
        Plot the precision-recall curve.
        "The precision-recall plot is a model-wide measure for evaluating binary classifiers
        and closely related to the ROC plot."
        :param predicted_scores: the predicted scores.
        :param feature_target: the target feature, which is being estimated.
        :param title: the figure title.
        :param lw: the line-width.
        :return: the plot object, and the data used to plot.
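The compiled method appears to call `metrics.precision_recall_curve` and `metrics.average_precision_score`, and to print the average precision in the title. The sketch below computes those quantities without dependencies; it is not scikit-learn's implementation. Each distinct score is used as a threshold, and AP is accumulated as Σ (R_n − R_(n−1)) P_n, the definition scikit-learn documents for `average_precision_score`. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def pr_points(y_true: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """(recall, precision) for every distinct threshold, from the highest score down."""
    positives = sum(1 for t in y_true if t == 1)
    points = []
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, t in zip(scores, y_true) if s >= threshold and t == 1)
        fp = sum(1 for s, t in zip(scores, y_true) if s >= threshold and t == 0)
        points.append((tp / positives, tp / (tp + fp)))
    return points


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AP = sum_n (R_n - R_(n-1)) * P_n, starting from recall 0."""
    ap, prev_recall = 0.0, 0.0
    for recall, precision in pr_points(y_true, scores):
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(round(average_precision(y_true, scores), 4))  # 0.8333
```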

Plots.precision_recall_multiple(predicted_scores_list, feature_target_list, label_list,
                                marker_list, linestyle_list, color_list, title, lw,
                                markersize, markevery, legend_prop, legend_markerscale)
        Plot one or more precision-recall curves.
        "The precision-recall plot is a model-wide measure for evaluating binary classifiers
        and closely related to the ROC plot."
        :param predicted_scores_list: the predicted scores (one or multiple).
        :param feature_target_list: the target feature, which is being estimated (one or multiple).
        :param label_list: the line label (one or multiple).
        :param marker_list: the line marker (one or multiple).
        :param linestyle_list: the line style (one or multiple).
        :param color_list: the line color (one or multiple).
        :param title: the figure title.
        :param lw: the line-width.
        :param markersize: the marker size.
        :param markevery: draw a marker on every markevery-th data point.
        :param legend_prop: the legend font size.
        :param legend_markerscale: the legend's marker scale.
        :return: the plot object, and the data used to plot.
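The `*_list` arguments are parallel lists: element i of each list describes curve i. The sketch below covers two details under stated assumptions. First, a length check before plotting; `check_parallel_lists` is hypothetical and not part of `Plots`. Second, the legend-label pattern suggested by the strings embedded in the compiled method, "Avg. Precision (<label>)=<AP to two decimals>". Treat that pattern as an approximation.

```python
from typing import Sequence


def check_parallel_lists(**per_curve: Sequence) -> int:
    """Raise ValueError unless every per-curve list has the same length; return that length."""
    lengths = {name: len(values) for name, values in per_curve.items()}
    if len(set(lengths.values())) != 1:
        raise ValueError("per-curve lists differ in length: {}".format(lengths))
    return next(iter(lengths.values()))


def legend_label(label: str, avg_precision: float) -> str:
    """Label pattern inferred from the compiled method's string constants."""
    return "Avg. Precision (" + label + ")={0:0.2f}".format(avg_precision)


n_curves = check_parallel_lists(label_list=["LR", "RF"],
                                marker_list=["o", "s"],
                                linestyle_list=["-", "--"],
                                color_list=["navy", "darkorange"])
print(n_curves)                    # 2
print(legend_label("LR", 0.8333))  # Avg. Precision (LR)=0.83
```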

Plots.roc(predicted_scores, feature_target, title, lw)
        Plot the Receiver Operating Characteristic (ROC) curve.
        :param predicted_scores: the predicted scores.
        :param feature_target: the target feature, which is being estimated.
        :param title: the figure title.
        :param lw: the line-width.
        :return: the plot object, and the data used to plot.
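The compiled method appears to compute `fpr`, `tpr` and `roc_auc` with `metrics.roc_curve` and `metrics.auc`, and to draw the chance diagonal as a dashed line. The sketch below reproduces those two steps without dependencies, using hypothetical names. It sweeps each distinct score as a threshold, then integrates with the trapezoidal rule that `metrics.auc` applies. scikit-learn's `roc_curve` also drops some collinear points by default, which does not change the area.

```python
from typing import List, Sequence, Tuple


def roc_points(y_true: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """(FPR, TPR) from the strictest threshold to the loosest, starting at (0, 0).

    Assumes both classes are present in y_true.
    """
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, t in zip(scores, y_true) if s >= threshold and t == 1)
        fp = sum(1 for s, t in zip(scores, y_true) if s >= threshold and t == 0)
        points.append((fp / neg, tp / pos))
    return points


def trapezoid_auc(points: List[Tuple[float, float]]) -> float:
    """Area under the piecewise-linear curve through `points` (trapezoidal rule)."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


pts = roc_points([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(trapezoid_auc(pts))  # 0.75
```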

Plots.roc_multiple(predicted_scores_list, feature_target_list, label_list, marker_list,
                   linestyle_list, color_list, title, lw, markersize, markevery,
                   legend_prop, legend_markerscale)
        Plot one or more Receiver Operating Characteristic (ROC) curves.
        :param predicted_scores_list: the predicted scores (one or multiple).
        :param feature_target_list: the target feature, which is being estimated (one or multiple).
        :param label_list: the line label (one or multiple).
        :param marker_list: the line marker (one or multiple).
        :param linestyle_list: the line style (one or multiple).
        :param color_list: the line color (one or multiple).
        :param title: the figure title.
        :param lw: the line-width.
        :param markersize: the marker size.
        :param markevery: draw a marker on every markevery-th data point.
        :param legend_prop: the legend font size.
        :param legend_markerscale: the legend's marker scale.
        :return: the plot object, and the data used to plot.

Plots.learning_curve(estimator, features_indep_df, feature_target, title, ylim, cv, n_jobs,
                     train_sizes)
        Plot the learning curve.
        "A learning curve shows the validation and training score of an estimator for varying numbers of training
        samples. It is a tool to find out how much we benefit from adding more training data and whether the estimator
        suffers more from a variance error or a bias error. If both the validation score and the training score
        converge to a value that is too low with increasing size of the training set, we will not benefit much
        from more training data."
        :param estimator: the estimator object, which implements the “fit” and “predict” methods;
        it is cloned for each validation.
        :param features_indep_df: the independent features, which are fed into the model.
        :param feature_target: the target feature, which is being estimated.
        :param title: the figure title.
        :param ylim: the y-limit for the axis.
        :param cv: the cross-validation splitting strategy (optional).
        :param n_jobs: the number of jobs to run in parallel (default -1).
        :param train_sizes: the relative or absolute numbers of training examples used to generate the learning curve.
        :return: the plot object, and the data used to plot.
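The compiled method appears to store `train_scores_mean`, `train_scores_std`, `test_scores_mean` and `test_scores_std`, and to shade one standard deviation around each mean with `fill_between`. The band can be sketched without NumPy as below, assuming score matrices shaped like scikit-learn's `learning_curve` output (one row per training size, one column per CV fold). The function name and the scores are hypothetical.

```python
import statistics
from typing import List, Sequence, Tuple


def mean_std_band(scores: Sequence[Sequence[float]]) -> List[Tuple[float, float, float]]:
    """One row per training size, one column per CV fold -> (mean - std, mean, mean + std).

    pstdev is the population standard deviation, matching numpy's default std (ddof=0).
    """
    band = []
    for fold_scores in scores:
        mu = statistics.mean(fold_scores)
        sd = statistics.pstdev(fold_scores)
        band.append((mu - sd, mu, mu + sd))
    return band


# Hypothetical scores for two training sizes and three folds.
test_scores = [[0.70, 0.70, 0.70],
               [0.80, 0.90, 1.00]]
for lower, mid, upper in mean_std_band(test_scores):
    print(round(lower, 3), round(mid, 3), round(upper, 3))
# 0.7 0.7 0.7
# 0.818 0.9 0.982
```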

Plots.validation_curve(estimator, features_indep_df, feature_target, param_name, param_range,
                       title, ylim, cv, lw, n_jobs)
        Plot the validation curve.
        "it is sometimes helpful to plot the influence of a single hyperparameter on the training score and the
        validation score to find out whether the estimator is overfitting or underfitting for some hyperparameter
        values."
        :param estimator: the estimator object, which implements the “fit” and “predict” methods;
        it is cloned for each validation.
        :param features_indep_df: the independent features, which are fed into the model.
        :param feature_target: the target feature, which is being estimated.
        :param param_name: the name of the parameter that will be varied.
        :param param_range: the values of the parameter that will be evaluated.
        :param title: the figure title.
        :param ylim: the y-limit for the axis.
        :param cv: the cross-validation splitting strategy (optional).
        :param lw: the line-width.
        :param n_jobs: the number of jobs to run in parallel (default -1).
        :return: the plot object, and the data used to plot.
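The compiled method appears to plot `param_range` on a logarithmic x-axis (`semilogx`), with mean training and cross-validation scores. The quoted passage frames the reading of such a plot in terms of overfitting and underfitting. The sketch below shows one way to summarise the curve numerically: take the value with the highest mean validation score, and report the train-minus-validation gap there. `best_param` and the scores are hypothetical.

```python
from typing import Sequence, Tuple


def best_param(param_range: Sequence[float],
               train_mean: Sequence[float],
               valid_mean: Sequence[float]) -> Tuple[float, float]:
    """Return (value with the highest mean validation score, train - validation gap there)."""
    best = max(range(len(param_range)), key=lambda i: valid_mean[i])
    return param_range[best], train_mean[best] - valid_mean[best]


# Hypothetical mean scores for three values of a hyperparameter.
value, gap = best_param([0.001, 0.01, 0.1],
                        train_mean=[0.80, 0.90, 0.99],
                        valid_mean=[0.78, 0.86, 0.70])
print(value, round(gap, 2))  # 0.01 0.04
```

At 0.1 in this made-up example, the training score keeps rising while the validation score falls. That is the overfitting pattern the passage describes.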

Plots.distribution_bar(feature, feature_name, title, ylim)
        Plot the distribution, using a bar plot.
        :param feature: the value of the feature.
        :param feature_name: the name of the feature.
        :param title: the figure title.
        :param ylim: the y-limit for the axis.
        :return: the plot object.

Plots.distribution_hist(feature, feature_name, title, num_bins, ylim)
        Plot the distribution, using a histogram.
        :param feature: the value of the feature.
        :param feature_name: the name of the feature.
        :param title: the figure title.
        :param num_bins: number of bins in the histogram.
        :param ylim: the y-limit for the axis.
        :return: the plot object.
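The compiled method appears to pass `normed=True`, `facecolor='green'` and `alpha=0.5` to `plt.hist`; later Matplotlib versions replaced `normed` with `density=True`. The sketch below shows what that normalisation computes, without dependencies. It assumes equal-width bins over [min, max], with the maximum placed in the last bin (the NumPy/Matplotlib convention). `density_histogram` is a hypothetical name.

```python
from typing import List, Sequence


def density_histogram(values: Sequence[float], num_bins: int) -> List[float]:
    """Bar heights count / (n * bin_width), so the total area of the bars is 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are equal; the bin width would be zero")
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for v in values:
        k = min(int((v - lo) / width), num_bins - 1)  # the maximum falls in the last bin
        counts[k] += 1
    return [c / (len(values) * width) for c in counts]


heights = density_histogram([0, 1, 1, 2, 3, 3, 3, 4], num_bins=4)
print(heights)  # [0.125, 0.25, 0.125, 0.5]
```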

Plots.distribution_kde(feature, feature_name, title, x_values, kernel, bandwidth, ylim)
        Plot the distribution, using Kernel Density Estimation (KDE).
        :param feature: the value of the feature.
        :param feature_name: the name of the feature.
        :param title: the figure title.
        :param x_values: the grid to use for plotting (default: based on the feature range and size).
        :param kernel: the kernel to use (default 'gaussian'). Valid kernels are 'gaussian', 'tophat',
        'epanechnikov', 'exponential', 'linear' and 'cosine'.
        :param bandwidth: the bandwidth of the kernel.
        :param ylim: the y-limit for the axis.
        :return: the plot object.
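`KernelDensity.score_samples` returns log-densities, which the compiled method appears to exponentiate before plotting. When `x_values` is omitted, it appears to build an evenly spaced grid from `min(feature)` to `max(feature)` with `len(feature)` points. The sketch below covers the Gaussian case without dependencies: f(x) = (1/(n h)) Σ φ((x − x_i)/h), where φ is the standard normal density and h is the bandwidth. The names are hypothetical, and this is not scikit-learn's implementation.

```python
import math
from typing import List, Sequence


def linspace(start: float, stop: float, num: int) -> List[float]:
    """Evenly spaced points including both end points (num >= 2)."""
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]


def gaussian_kde(samples: Sequence[float], grid: Sequence[float],
                 bandwidth: float = 0.5) -> List[float]:
    """Density at each grid point: (1 / (n * h)) * sum_i phi((x - x_i) / h)."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    return [norm * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in samples)
            for x in grid]


feature = [1.0, 2.0, 2.5, 4.0]
grid = linspace(min(feature), max(feature), len(feature))  # default-style grid
print([round(d, 3) for d in gaussian_kde(feature, grid)])
```

Evaluated on a wide, fine grid, the estimated density integrates to approximately 1, which is a quick sanity check on the normalising constant.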