[27c943]: / pathflowai / __pycache__ / datasets.cpython-36.pyc

3

┴┐A]@Ń@sŘdZddlZddlmZddlZddlZddljZddl	Z
ddlZddl
TddlZddlZddlmZmZddlZddlZddlZddlmZddlmZddlmZdd	lmZd
däZ dggdd
dfddäZ!ddäZ"ddäZ#ddäZ$GddädeâZ%dS)z×
datasets.py
=======================
Houses the DynamicImageDataset class, as well as functions to help with image color channel normalization, transformers, etc.
ÚN)┌
transforms)┌*)┌Dataset┌
DataLoader)┌pytorch)┌LabelBinarizer)┌compute_class_weight)┌
class2one_hotcCsddäS)zvTransformer for random 90 degree rotation image.

	Returns
	-------
	function
		Transformer function for operation.

	cSs |jtjddddgddŹdâS)NrÚZÚ┤iÚ)┌k)┌rotate┌random┌sample)┌imgęr˙D/Users/joshualevy/Documents/GitHub/PathFlowAI/pathflowai/datasets.py┌<lambda>#sz RandomRotate90.<locals>.<lambda>rrrrr┌RandomRotate90s	rF┌torchTcCs┬tjtjâtj||fâtj|âtjdddddŹtjâtjâtâtj	âtj
|rP|ndddg|dk	rd|ndddgâg	âtjtjâtj||fâtj|âtj	âtj
|ró|ndddg|dk	rÂ|ndddgâgâtjtjâtj||fâtj|âtj	âtj
|r˘|ndddg|dk	Ér
|ndddgâgâtjtjâtj|âtj	âgâdťtjj
jtjjj||âtjjj||âg|Ésĺtjjjdd	Źtjjjdd	Źtjjjdd	Źgntjjjdd	Źtjjjdd	Źgtjj	t|Ér─|ndddg|dk	Ér┌|ndddgd
ŹdŹgâtjj
jtjjj||âtjjj||âtjj	t|Ér&|ndddg|dk	Ér<|ndddgd
ŹdŹgâtjj
jtjjj||âtjjj||âtjj	t|Érć|ndddg|dk	Érť|ndddgd
ŹdŹgâdťd
ť}||S)a╣Get data transformers for training test and validation sets.

	Parameters
	----------
	patch_size:int
		Original patch size being transformed.
	mean : list of float
		Mean RGB
	std : list of float
		Std RGB
	resize : int
		Which patch size to resize to.
	transform_platform : str
		Use pytorch or albumentations transforms.
	elastic : bool
		Whether to add elastic deformations from albumentations.

	Returns
	-------
	dict
		Transformers.

	gÜÖÖÖÖÖÚ?gÓ?)┌
brightness┌contrast┌
saturation┌huegffffffŠ?g333333Ń?Ng333333├?)┌train┌val┌test┌pass)┌p)┌mean┌std)┌	normalize)rrr)r┌albumentations)r┌ComposeZ
ToPILImage┌ResizeZ
CenterCropZColorJitterZRandomHorizontalFlipZRandomVerticalFliprZToTensor┌	Normalize┌alb┌coreZcompositionZ
augmentationsZFlipZ	TransposeZShiftScaleRotateZElasticTransform┌albtorch┌dict)┌
patch_sizer r!┌resize┌transform_platform┌elasticZdata_transformsrrr┌get_data_transforms%sJ..0░<Fr/cCstd||ddŹS)zĘCreate transformers.

	Parameters
	----------
	mean : list
		See get_data_transforms.
	std : list
		See get_data_transforms.

	Returns
	-------
	dict
		Transformers.

	ÚÓT)r+r r!r,)r/)r r!rrr┌create_transformsusr1cCsćtjj|ârtj|â}nd|i}d|kÉrétdggdddŹ}||d<tf|Ä}|dr^|jât|ddd	d
Ź}tj	dddgtj
dŹ}tj	dddgtj
dŹ}tjjâr┤|jâ}|jâ}tj
âĆRxJt|âD]>\}\}	}
tjjârŠ|	jâ}	|tj|	dâ7}|tj|	dâ7}q╚WWdQRX|d}|t
|â}|t
|â}|jâjâjâjâ}|jâjâjâjâ}tjt||dŹ|dâtj|dâ}|S)a Find mean and standard deviation of images in batches.

	Parameters
	----------
	normalization_file : str
		File to store normalization information.
	dataset_opts : dict
		Dictionary storing information to create DynamicDataset class.

	Returns
	-------
	dict
		Stores RGB mean, stdev.

	┌normalization_filer0Tr)r+r r!r,r-┌transformers┌classify_annotationsÚÇÚ)┌
batch_size┌shuffle┌num_workersg)┌dtyperÚÚNr)r r!)rr;r<)rr;r<)┌os┌path┌existsr┌loadr/┌DynamicImageDataset┌binarize_annotationsr┌tensor┌float┌cuda┌is_available┌no_grad┌	enumerater r!┌detach┌cpu┌numpy┌tolist┌saver*)r2┌dataset_opts┌	norm_dictr3┌dataset┌
dataloaderZall_meanZall_std┌i┌X┌_┌Nrrr┌get_normalizerës:




rVcCs"|d||dŹ}|d|djâfS)aRun albumentations and return an image and its segmentation mask.

	Parameters
	----------
	img : array
		Image as array
	mask : array
		Categorical pixel by pixel.
	transformer :
		Transformation object.

	Returns
	-------
	tuple arrays
		Image and mask array.

	T)┌image┌maskrWrX)┌long)rrX┌transformer┌resrrr┌segmentation_transform╚sr\c@steZdZdZgddddddddddfd	d
äZddäZd
däZddäZdddäZdddäZ	ddäZ
ddäZddäZdS) rAaˇGenerate image dataset that accesses images and annotations via dask.

	Parameters
	----------
	dataset_df : dataframe
		Dataframe with WSI, which set it is in (train/test/val) and corresponding WSI labels if applicable.
	set : str
		Whether train, test, val or pass (normalization) set.
	patch_info_file : str
		SQL db with positional and annotation information on each slide.
	transformers : dict
		Contains transformers to apply on images.
	input_dir : str
		Directory where images come from.
	target_names : list/str
		Names of initial targets, which may be modified.
	pos_annotation_class : str
		If selected and predicting on WSI, this class is labeled as a positive from the WSI, while the other classes are not.
	other_annotations : list
		Other annotations to consider from patch info db.
	segmentation : bool
		Conducting segmentation task?
	patch_size : int
		Patch size.
	fix_names : bool
		Whether to change the names of dataset_df.
	target_segmentation_class : list
		Samples images only from this class; can now also be used for classification. Paired with the two options below; this option and those two can be specified multiple times.
	target_threshold : list
		Sampled only if above this threshold of occurrence in the patches.
	oversampling_factor : list
		Oversample them by this factor.
	n_segmentation_classes : int
		Number classes to segment.
	gdl : bool
		Using generalized dice loss?
	mt_bce : bool
		For multi-target prediction tasks.
	classify_annotations : bool
		For classifying annotations.

	Fr0Tršg­?r6cst||ł_tj|â}|dkr d}|ł_|ł_|ł_|	ł_tłjâdkrRłjdł_|dkrjçfddäł_nBłjrÇçfddäł_n,dt	łjâkr×çfd	däł_nçfd
däł_||d|kł_
łjrÍdł_d
łj
łj<łjoÓ|Érłj
djtâłj
j
ddůdf<tjłj
jdâj
ddůłjfâł_łjÉrXłjÉrX|ÉrR|gt|âł_ndł_tłjâłjjjâ}t|łj||
łj|||
|dŹ	}tf|Äł_łjÉrż|dkÉrżçfddä|Dâł_çfddä|Dâł_|dkÉrÔdł_|dkÉrtjłjgt|âddŹjddŹł_n"|dkÉr4łjj|dŹjddŹł_łjjdł_ |ł_!łjÉrT|ndł_"dł_#|ł_$tłjâdS)Nrrrrcsłj|âtjdtjdŹfS)Ng­?)r:)rZrrCrD)┌x┌y)┌selfrrrsz.DynamicImageDataset.__init__.<locals>.<lambda>cst||łjâS)N)r\rZ)r^r_)r`rrrsrcs łjd|dŹdtj|âjâfS)NT)rWrW)rZr┌
from_numpyrD)r^r_)r`rrrscsłj|âtj|âjâfS)N)rZrrarD)r^r_)r`rrr!s┌set┌targetg­?┌ID)	┌
input_info_db┌slide_labels┌pos_annotation_classr+┌segmentation┌other_annotations┌target_segmentation_class┌target_thresholdr4cs.i|]&}tjtjtłdj|ââddŹâ|ôqS)z{}_mask.npyzr+)┌	mmap_mode)┌da┌
from_array┌npr@┌join┌format)┌.0┌slide)┌	input_dirrr˙
<dictcomp>5sz0DynamicImageDataset.__init__.<locals>.<dictcomp>cs$i|]}tjtłdj|âââ|ôqS)z{}.zarr)rm┌	from_zarrrprq)rrrs)rtrrru6sF)┌axisT)┌drop)┌frac)%rZ┌copy┌deepcopy┌targets┌mt_bcerbrh┌len┌transform_fn┌dirZ	image_set┌map┌fix_name┌loc┌pd┌	DataFrame┌	set_indexZ
slide_info┌list┌print┌indexrLr*┌modify_patch_info┌
patch_info┌segmentation_maps┌slides┌concat┌int┌reset_indexr┌shape┌length┌n_segmentation_classes┌gdl┌	binarizedr4)r`┌
dataset_dfrb┌patch_info_filer3rt┌target_namesrgrirhr+┌	fix_namesrjrk┌oversampling_factorrôrör}r4Zoriginal_set┌IDsZpi_dictr)rtr`r┌__init__s^

 $


&
zDynamicImageDataset.__init__cCsFtj|j|jgddŹjddŹ|_|jjd|_|jrB|jj|jâdS)zžConcatenate this dataset with others. Updates its own internal attributes.

		Parameters
		----------
		other_dataset : DynamicImageDataset
			Other image dataset.

		r)rwT)rxN)	rärÄrőrÉrĹrĺrhrî┌update)r`Z
other_datasetrrrrÄFs	 zDynamicImageDataset.concatcCs*|jj|jd|k|_|jjd|_|S)zĽReduce the sample set to just images from one ID.

		Parameters
		----------
		ID : str
			Basename/ID to predict on.

		Returns
		-------
		self

		rdr)rőrârĹrĺ)r`rdrrr┌	retain_IDUs
zDynamicImageDataset.retain_IDccs6x0|jdjâD]}tj|â}||j|âfVqWdS)zĺGenerator similar to groupby, but splits up by ID, generates (ID,data) using retain_ID.

		Returns
		-------
		generator
			ID, DynamicDataset

		rdN)rő┌uniquerzr{r×)r`rdZnew_datasetrrr┌split_by_IDfs	
zDynamicImageDataset.split_by_IDrcCsŔ|jr4d|jttttt|jââââjddŹj}n░|j	rbd|j|j
jddŹj}|t|â}né|jrÉt|j
âdkrÉt
j|j|j
jddŹ}n0t|j
âtdâkr░|j|j
}n|j|j
|}|jjtâjâ}tdt
j|â|dŹ}|S)zđWeight loss function with weights inversely proportional to the class appearence.

		Parameters
		----------
		i : int
			If multi-target, class used for weighting.

		Returns
		-------
		self
			Dataset.

		g­?r)rwr┌┌balanced)┌class_weight┌classesr_)rhrőrçrü┌str┌rangerô┌sum┌valuesr}r|rĽr~ro┌argmax┌type┌astyperĆ┌flattenrrč)r`rR┌weightsr_rrr┌get_class_weightsss.z%DynamicImageDataset.get_class_weightsNcCsP|jd}ddät|jjddůddůfâDâ}|jrÓ|dkrÓ|dkrXtâj|â|_ntj|â|_|jj	|_
tj|jj
|â|jj|j
dŹjtâ}x░t|âD]>}|t|jâkr╩||j|jjddů|f<qť||j|j|<qťWndd|_||_
|dkÉr|j
d
g|_
|dkÉr:|j|j
|kjjtjâ|jjddů|j
f<t|j
âd	|_|jS)apLabel binarize some annotations or threshold them if classifying slide annotations.

		Parameters
		----------
		binarizer : LabelBinarizer
			Binarizes the labels of a column(s)
		num_targets : int
			Number of desired targets to predict on.
		binary_threshold : float
			Amount of annotation in patch before positive annotation.

		Returns
		-------
		binarizer

		┌
annotationcSsg|]}|dkr|ĹqS)┌arear)rr┌annotrrr˙
<listcomp>Ąsz<DynamicImageDataset.binarize_annotations.<locals>.<listcomp>NÚr)rë┌columnsgTÚ    )rőrç┌ilocr}r┌fit┌	binarizerrzr{┌classes_r|rärů┌	transformrërźrDrĘrâro┌float32rłrĽ)r`rŞ┌num_targets┌binary_threshold┌annotationsZannotsZannotation_labels┌colrrrrBĹs*
&
$

,
z(DynamicImageDataset.binarize_annotationscCs.tjjdâ|jj|dŹ|_|jjd|_dS)z^Sample subset of dataset.

		Parameters
		----------
		p : float
			Fraction to subsample.

		Ú*)ryrN)ror┌seedrőrrĹrĺ)r`rrrr┌	subsampleŻs	zDynamicImageDataset.subsamplecCs\|jj|}|d}|j}d}|jsî||j}t|tjârr|jjt	â}|j
rr|jrrt|âdkrrt
j|jââ}d}t
j|â}|jsî|jdâ}|d}|d}|d}	|js«|n&t
j|j||||	ů|||	ůfâ}|j|j||||	ů|||	ůddůfjâjt
jâ|â\}
}|jÉr@|jÉr@|jÉr@|Ér@|jâ}|jÉrTt||jâ}|
|fS)	NrdFrTr^r_r+r<)rőrÂr|rh┌
isinstancerä┌SeriesrĘrźrDrĽr}r~ro┌arrayręrĹ┌reshaperîrrŹ┌compute┌uint8r4rYrör	rô)r`rRrőrdr|Zuse_longr_┌xs┌ysr+rWrrr┌__getitem__╦s0


2@"zDynamicImageDataset.__getitem__cCs|jS)N)rĺ)r`rrr┌__len__šszDynamicImageDataset.__len__rÁ)r)Nrr])
┌__name__┌
__module__┌__qualname__┌__doc__rťrÄr×rár«rBr┬r╦r╠rrrrrAŮs* ;

,rA)&rđrZtorchvisionrr=┌dask┌
dask.arrayr┼rm┌pandasrärKro┌pathflowai.utils┌	pysnooper┌
nonechucks┌nc┌torch.utils.datarrrr#r'rzrr)Zsklearn.preprocessingrZsklearn.utils.class_weightrZpathflowai.lossesr	rr/r1rVr\rArrrr┌<module>s,P?