blocks.models

Classes

  • BinarySegmentation: Configurable binary segmentation auto-encoder with optional skip connections.

  • RegionProposalNetwork: Configurable region proposal network (RPN) inspired by the Faster R-CNN architecture.

class blocks.models.BinarySegmentation(dataset, log_dir, inputs, outputs, session_config=None, n_gpus=0, restore_from=None, optimizer=None, freeze=False, loss_name='loss', monitor=None, clip_gradient=None, profile=False, keep_profiles=5, **kwargs)[source]

Bases: emloop_tensorflow.model.BaseModel

Configurable binary segmentation auto-encoder with optional skip connections. The segmentation is computed for multiple masks in parallel; hence, the outputs listed below are named per mask.

Inputs
  • images (4-dim tensor NHWC) scaled to 0-255

  • <name> (3-dim tensor NHW) scaled to 0/255 for each <name> in mask_names

Outputs
  • <name>_probabilities and <name>_predictions (3-dim tensor NHW) scaled to 0-1 and 0/1 respectively for each <name> in mask_names

  • loss and <name>_pixel_loss optimization targets for each <name> in mask_names

  • <name>_f1, <name>_recall and <name>_precision performance measures for each <name> in mask_names

Requirements
  • The dataset has to provide an img_shape() method returning a 2- or 3-tuple or list with the image shape. Only the channel dimension needs to be specified; the other values are ignored.
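For illustration, a minimal dataset stub satisfying this requirement might look as follows (the class name and the concrete shape are hypothetical; only the channel dimension is actually used by the model):

```python
class ToyBinarySegmentationDataset:
    """Hypothetical dataset stub; only img_shape() is required by the model."""

    def img_shape(self):
        # Height and width are ignored by the model; only the channel
        # dimension (here 3 for RGB images) is taken into account.
        return (256, 256, 3)
```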

example usage in config
model:
  name: SegmentationNet
  class: blocks.models.BinarySegmentation

  input_name: images
  mask_names: [masks, masks_eroded]

  architecture:
    encoder_config: [16c3, 16c3, 16c3, 16c3, mp2,
                     32c3, 32c3, 32c3, 32c3, mp2,
                     64c3, 64c3, 64c3]
    use_bn: true
    use_ln: false
    skip_connections: true

  l2: 0.00001
  balance_loss: false

  optimizer:
    class: AdamOptimizer
    learning_rate: 0.0001

  inputs: [images, masks, masks_eroded]
  outputs: [loss,
            masks_predictions, masks_probabilities, masks_f1,
            masks_eroded_predictions, masks_eroded_probabilities, masks_eroded_f1]
Inheritance diagram of BinarySegmentation

_create_model(architecture, loss_type='mse', balance_loss=False, l2=0.0, input_name='images', mask_names=('masks', ), final_kernel=(5, 5))[source]

Create new binary segmentation auto-encoder.

Parameters
  • architecture (Mapping[~KT, +VT_co]) – architecture configuration as accepted by emloop.models.conv.cnn_autoencoder

  • loss_type (str) – loss type (either mse, l1, or xtropy)

  • balance_loss (Union[bool, str, Mapping[str, str]]) – 0/1 pixel loss balancing. If false, all pixel losses remain untouched. If true, each pixel loss is balanced according to the corresponding mask. If a string, all pixel losses are balanced according to the mask identified by the string. If a mapping, each pixel loss l is balanced by balance_loss[l]; pixel losses not present in the mapping are not balanced at all.

  • l2 (float) – l2 weights regularization rate

  • input_name (str) – stream source name providing the input images

  • mask_names (Sequence[str]) – sequence of stream source names providing the target segmentations

  • final_kernel (Tuple[int, int]) – kernel size of the final convolution

Return type

None
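The four balance_loss cases above can be illustrated with a small helper that resolves, for each mask name, which mask (if any) its pixel loss should be balanced by. This is a hypothetical sketch of the documented semantics, not part of the actual implementation:

```python
def resolve_balance_masks(balance_loss, mask_names):
    """Map each mask name to the mask balancing its pixel loss (or None).

    Mirrors the documented semantics of the ``balance_loss`` parameter:
    False -> no balancing; True -> each loss balanced by its own mask;
    str -> all losses balanced by the named mask; mapping -> per-loss
    lookup, with missing entries left unbalanced.
    """
    if balance_loss is False:
        return {name: None for name in mask_names}
    if balance_loss is True:
        return {name: name for name in mask_names}
    if isinstance(balance_loss, str):
        return {name: balance_loss for name in mask_names}
    # mapping case: missing entries are left unbalanced (None)
    return {name: balance_loss.get(name) for name in mask_names}
```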

class blocks.models.RegionProposalNetwork(dataset, log_dir, inputs, outputs, session_config=None, n_gpus=0, restore_from=None, optimizer=None, freeze=False, loss_name='loss', monitor=None, clip_gradient=None, profile=False, keep_profiles=5, **kwargs)[source]

Bases: emloop_tensorflow.model.BaseModel

Configurable region proposal network (RPN) inspired by the Faster R-CNN architecture.

RPN predicts regions of interest (ROIs) from an input image. It starts by encoding the input images into feature maps. For each position of the feature maps, a fixed number of anchors, each corresponding to a fixed region in the original image, is considered.

For each anchor, RPN predicts:
  • whether the anchor matches a ROI in the original image

  • anchor diff (correction) to the respective ROI

Inputs
  • images (4-dim tensor NHWC) scaled to 0-255

  • anchors_label (4-dim tensor NHWA) 0/1 labels determining whether the anchors match certain regions

  • anchors_mask (4-dim tensor NHWA) 0/1 mask marking the valid anchors to be trained

  • diffs (5-dim tensor NHWAD) anchor differences (corrections) to the respective ROIs; the diff dimension D is configurable

Outputs
  • classifier_probabilities and classifier_predictions (4-dim tensors NHWA) scaled to 0-1 and 0/1 respectively

  • regression_predictions (5-dim tensor NHWAD) of anchor differences (corrections) to the respective ROIs

  • classifier_loss, regression_loss and loss (1-dim tensors N)

RPN is tightly coupled with the dataset used for training. It needs to know the input image shape, the number of anchors per feature map position and the dimension of the diffs input. Conversely, the dataset has to be configured with the feature map shape and the amount of pooling applied to the images.

Dataset requirements
  • img_shape() function returning a 3-tuple or list with the image shape

  • diffs_dim() function returning the last dimension of the diffs input

  • n_anchors_per_position property

  • configure_shape(features_shape, pool_amount) function which will be called after creating the feature map
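A minimal dataset stub satisfying these four requirements could look as follows (all concrete values are hypothetical placeholders):

```python
class ToyRPNDataset:
    """Hypothetical dataset stub implementing the RPN dataset requirements."""

    def img_shape(self):
        return (512, 512, 3)          # full input image shape (HWC)

    def diffs_dim(self):
        return 4                      # e.g. (dx, dy, dw, dh) corrections

    @property
    def n_anchors_per_position(self):
        return 9                      # e.g. 3 scales x 3 aspect ratios

    def configure_shape(self, features_shape, pool_amount):
        # Called by the model after the feature map is created, so the
        # dataset can generate anchors matching the encoder output.
        self._features_shape = features_shape
        self._pool_amount = pool_amount
```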

example usage in config
model:
  name: RegionProposal
  class: blocks.models.RegionProposalNetwork

  architecture:
    encoder_config: [14c3, 14c3, 14c3, 14c3, mp2,
                     32c3, 32c3, 32c3, 32c3, mp2,
                     64c3, 64c3, 64c3, 64c3, mp2,
                     128c3, 128c3, 128c3]
    use_ln: true

  optimizer:
    class: AdamOptimizer
    learning_rate: 0.0001

  inputs: [images, anchors_mask, anchors_label, diffs]
  outputs: [loss, regression_loss, classifier_loss, classifier_accuracy]

Reference: Faster R-CNN

Inheritance diagram of RegionProposalNetwork

_create_model(architecture, shared_dim=512, window_size=5, loss_ratio=0.5)[source]

Create new RPN instance.

Parameters
  • architecture (Mapping[~KT, +VT_co]) – CNN encoder architecture

  • shared_dim (int) – the dimension of the feature vector shared between the classifier and regression net

  • window_size (int) – sliding window size after the CNN encoder

  • loss_ratio (float) – ratio from the [0, 1] interval between the classifier and regression losses; e.g., a value of 0.1 means the classifier is trained 9 times less than the regression net

Return type

None
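Under the interpretation above, the total loss presumably combines the two partial losses as a convex combination weighted by loss_ratio; a sketch of this assumed behaviour (not taken from the implementation):

```python
def combined_rpn_loss(classifier_loss, regression_loss, loss_ratio=0.5):
    """Weight the two RPN losses; loss_ratio weights the classifier part.

    With loss_ratio = 0.1 the classifier loss contributes 9 times less
    to the total than the regression loss, matching the documented
    description of the parameter.
    """
    return loss_ratio * classifier_loss + (1.0 - loss_ratio) * regression_loss
```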