blocks.models
Classes
- BinarySegmentation: Configurable binary segmentation auto-encoder with skip-connections.
- RegionProposalNetwork: Configurable region proposal network (RPN) inspired by the Faster R-CNN architecture.
class blocks.models.BinarySegmentation(dataset, log_dir, inputs, outputs, session_config=None, n_gpus=0, restore_from=None, optimizer=None, freeze=False, loss_name='loss', monitor=None, clip_gradient=None, profile=False, keep_profiles=5, **kwargs)[source]

Bases: emloop_tensorflow.model.BaseModel

Configurable binary segmentation auto-encoder with skip-connections. The segmentation works with multiple masks in parallel; the following outputs are therefore named per mask.
- Inputs
images (4-dim tensor NHWC) scaled to 0-255
<name> (3-dim tensor NHW) scaled to 0/255 for each <name> in mask_names
- Outputs
<name>_probabilities and <name>_predictions (3-dim tensors NHW) scaled to 0-1 and 0/1 respectively for each <name> in mask_names
loss and <name>_pixel_loss optimization targets for each <name> in mask_names
<name>_f1, <name>_recall and <name>_precision performance measures for each <name> in mask_names
- Requirements
The dataset has to provide
img_shape() method returning a 2- or 3-tuple or list with the image shape. Only the channel dimension needs to be specified; other values are ignored.
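The img_shape() requirement can be met with a minimal dataset stub. A sketch, assuming a hypothetical class name ToyDataset and illustrative shape values:

```python
class ToyDataset:
    """Hypothetical dataset stub satisfying the BinarySegmentation requirement."""

    def img_shape(self):
        # A 3-tuple (height, width, channels); only the channel
        # dimension (the last value) is actually read by the model.
        return (256, 256, 3)
```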
example usage in config

model:
  name: SegmentationNet
  class: blocks.models.BinarySegmentation
  input_name: images
  mask_names: [masks, masks_eroded]
  architecture:
    encoder_config: [16c3, 16c3, 16c3, 16c3, mp2, 32c3, 32c3, 32c3, 32c3, mp2, 64c3, 64c3, 64c3]
    use_bn: true
    use_ln: false
    skip_connections: true
  l2: 0.00001
  balance_loss: false
  optimizer:
    class: AdamOptimizer
    learning_rate: 0.0001
  inputs: [images, masks, masks_eroded]
  outputs: [loss, masks_predictions, masks_probabilities, masks_f1, masks_eroded_predictions, masks_eroded_probabilities, masks_eroded_f1]

_create_model(architecture, loss_type='mse', balance_loss=False, l2=0.0, input_name='images', mask_names=('masks',), final_kernel=(5, 5))[source]

Create new binary segmentation auto-encoder.
- Parameters
architecture (Mapping) – architecture configuration as accepted by emloop.models.conv.cnn_autoencoder
loss_type (str) – loss type (either mse, l1, or xtropy)
balance_loss (Union[bool, str, Mapping[str, str]]) – 0/1 pixel loss balancing. If false, all pixel losses remain untouched. If true, each pixel loss is balanced according to the corresponding mask. If a string, all pixel losses are balanced according to the mask identified by the string. If a mapping, each pixel loss l is balanced by balance_loss[l]; pixel losses not present in the mapping are not balanced at all.
l2 (float) – L2 weights regularization rate
input_name (str) – stream source name providing the input images
mask_names (Sequence[str]) – sequence of stream source names providing the target segmentations
final_kernel (Tuple[int, int]) – kernel size of the final convolution
- Return type
None
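To illustrate the balance_loss semantics, the following NumPy sketch up-weights the (typically rare) positive pixels so both classes contribute equal total weight. The function name and the exact weighting scheme are assumptions, not the model's actual implementation:

```python
import numpy as np

def balance_pixel_loss(pixel_loss, mask):
    """Hypothetical sketch: balance a per-pixel loss by its 0/1 mask."""
    n_pos = mask.sum()
    n_neg = mask.size - n_pos
    # Up-weight positive pixels so the positive class carries the same
    # total weight as the negative class.
    weights = np.where(mask > 0, n_neg / max(n_pos, 1), 1.0)
    return pixel_loss * weights
```

With a mask containing one positive pixel among three negatives, the positive pixel's loss is scaled by 3 so both classes sum to the same weight.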
class blocks.models.RegionProposalNetwork(dataset, log_dir, inputs, outputs, session_config=None, n_gpus=0, restore_from=None, optimizer=None, freeze=False, loss_name='loss', monitor=None, clip_gradient=None, profile=False, keep_profiles=5, **kwargs)[source]

Bases: emloop_tensorflow.model.BaseModel

Configurable region proposal network (RPN) inspired by the Faster R-CNN architecture.
RPN predicts regions of interest (ROIs) from an input image. RPN starts with encoding the input images into feature maps. For each position of the feature maps, a fixed number of anchors corresponding to fixed regions in the original image is considered.
- For each anchor, RPN predicts:
whether the anchor matches a ROI in the original image
the anchor diff (correction) to the respective ROI
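Applying a predicted diff to an anchor can be sketched with the standard Faster R-CNN box parameterization (centers shifted relative to anchor size, log-scaled width and height). The exact diff encoding here is defined by the dataset, so this parameterization is an assumption:

```python
import numpy as np

def apply_anchor_diffs(anchors, diffs):
    """Hypothetical sketch: turn anchors plus predicted diffs into boxes.

    Boxes are (x_center, y_center, width, height); diffs are
    (dx, dy, dw, dh) in the Faster R-CNN parameterization.
    """
    x, y, w, h = anchors.T
    dx, dy, dw, dh = diffs.T
    # Shift centers by a fraction of the anchor size; scale the size
    # exponentially so predicted widths/heights stay positive.
    return np.stack([x + dx * w, y + dy * h,
                     w * np.exp(dw), h * np.exp(dh)], axis=1)
```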
- Inputs
images (4-dim tensor NHWC) scaled to 0-255
anchors_label (4-dim tensor NHWA) anchors label 0/1 determining if the anchors match certain regions
anchors_mask (4-dim tensor NHWA) anchors mask 0/1 determining the valid anchors to be trained
diffs (5-dim tensor NHWAD) anchor differences to the respective ROIs; the diff dimension D is configurable
- Outputs
classifier_probabilities and classifier_predictions (4-dim tensors NHWA) scaled to 0-1 and 0/1 respectively
regression_predictions (5-dim tensor NHWAD) of anchor differences (corrections) to the respective ROIs
classifier_loss, regression_loss and loss (1-dim tensors N)
RPN is tightly coupled with the dataset used for training. It needs to learn the input image shape, the number of anchors per feature map position, and the dimension of the diffs input. The dataset, in turn, has to be configured with the feature map shape and the amount of pooling applied to the images.
- Dataset requirements
img_shape() function returning a 3-tuple or list with the image shape
diffs_dim() function returning the last dimension of the diffs input
n_anchors_per_position property
configure_shape(features_shape, pool_amount) function which will be called after creating the feature map
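A minimal dataset stub meeting these requirements might look as follows; the class name and the concrete values are illustrative assumptions:

```python
class ToyRPNDataset:
    """Hypothetical stub of the dataset interface RegionProposalNetwork expects."""

    # e.g. 3 scales x 3 aspect ratios, as in the Faster R-CNN paper
    n_anchors_per_position = 9

    def img_shape(self):
        return (480, 640, 3)

    def diffs_dim(self):
        # last dimension of the diffs input, e.g. (dx, dy, dw, dh)
        return 4

    def configure_shape(self, features_shape, pool_amount):
        # called by the model once the feature map has been created
        self.features_shape = features_shape
        self.pool_amount = pool_amount
```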
example usage in config

model:
  name: RegionProposal
  class: blocks.models.RegionProposalNetwork
  architecture:
    encoder_config: [14c3, 14c3, 14c3, 14c3, mp2, 32c3, 32c3, 32c3, 32c3, mp2, 64c3, 64c3, 64c3, 64c3, mp2, 128c3, 128c3, 128c3]
    use_ln: true
  optimizer:
    class: AdamOptimizer
    learning_rate: 0.0001
  inputs: [images, anchors_mask, anchors_label, diffs]
  outputs: [loss, regression_loss, classifier_loss, classifier_accuracy]
Reference: Faster R-CNN

_create_model(architecture, shared_dim=512, window_size=5, loss_ratio=0.5)[source]

Create new RPN instance.
- Parameters
architecture (Mapping) – CNN encoder architecture
shared_dim (int) – dimension of the feature vector shared between the classifier and regression nets
window_size (int) – sliding window size after the CNN encoder
loss_ratio (float) – ratio between the classifier and regression losses from the [0, 1] interval; a value of 0.1 means the classifier will be trained 9 times less than the regression net
- Return type
None
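The loss_ratio weighting presumably amounts to a convex combination of the two losses; a minimal sketch, with a hypothetical function name:

```python
def combined_rpn_loss(classifier_loss, regression_loss, loss_ratio=0.5):
    """Hypothetical sketch of how loss_ratio mixes the two RPN losses."""
    # loss_ratio weights the classifier term; its complement weights regression.
    return loss_ratio * classifier_loss + (1.0 - loss_ratio) * regression_loss
```

With loss_ratio=0.1, the classifier loss carries weight 0.1 against 0.9 for the regression loss, i.e. nine times less.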