blocks.models

Classes

  • BinarySegmentation: Configurable binary segmentation auto-encoder with optional skip connections.

  • RegionProposalNetwork: Configurable region proposal network (RPN) inspired by the Faster R-CNN architecture.

class blocks.models.BinarySegmentation(dataset, log_dir, inputs, outputs, session_config=None, n_gpus=0, restore_from=None, optimizer=None, freeze=False, loss_name='loss', monitor=None, clip_gradient=None, profile=False, keep_profiles=5, **kwargs)[source]

Bases: emloop_tensorflow.model.BaseModel

Configurable binary segmentation auto-encoder with optional skip connections. The segmentation is computed for multiple masks in parallel; hence, the outputs listed below are named per mask.

Inputs
  • images (4-dim tensor NHWC) scaled to 0-255

  • <name> (3-dim tensor NHW) scaled to 0/255 for each <name> in mask_names

Outputs
  • <name>_probabilities and <name>_predictions (3-dim tensor NHW) scaled to 0-1 and 0/1 respectively for each <name> in mask_names

  • loss and <name>_pixel_loss optimization targets for each <name> in mask_names

  • <name>_f1, <name>_recall and <name>_precision performance measures for each <name> in mask_names

Requirements
  • The dataset has to provide an img_shape() method returning a 2- or 3-tuple or list with the image shape. Only the channel dimension needs to be specified; the other values are ignored.
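For illustration, a minimal dataset stub satisfying this requirement might look as follows (the class name and the concrete shape are hypothetical; only the channel dimension is actually used by the model):

```python
class ToyBinarySegmentationDataset:
    """Hypothetical dataset stub; only img_shape() is required by the model."""

    def img_shape(self):
        # Height and width are ignored by the model; only the channel
        # dimension (here 3 for RGB images) is taken into account.
        return (256, 256, 3)
```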

example usage in config
model:
  name: SegmentationNet
  class: blocks.models.BinarySegmentation

  input_name: images
  mask_names: [masks, masks_eroded]

  architecture:
    encoder_config: [16c3, 16c3, 16c3, 16c3, mp2,
                     32c3, 32c3, 32c3, 32c3, mp2,
                     64c3, 64c3, 64c3]
    use_bn: true
    use_ln: false
    skip_connections: true

  l2: 0.00001
  balance_loss: false

  optimizer:
    class: AdamOptimizer
    learning_rate: 0.0001

  inputs: [images, masks, masks_eroded]
  outputs: [loss,
            masks_predictions, masks_probabilities, masks_f1,
            masks_eroded_predictions, masks_eroded_probabilities, masks_eroded_f1]
Inheritance diagram of BinarySegmentation

_create_model(architecture, loss_type='mse', balance_loss=False, l2=0.0, input_name='images', mask_names=('masks', ), final_kernel=(5, 5))[source]

Create new binary segmentation auto-encoder.

Parameters
  • architecture (Mapping[~KT, +VT_co]) – architecture configuration as accepted by emloop.models.conv.cnn_autoencoder

  • loss_type (str) – loss type (either mse, l1, or xtropy)

  • balance_loss (Union[bool, str, Mapping[str, str]]) – 0/1 pixel loss balancing. If false, all pixel losses remain untouched. If true, each pixel loss is balanced according to the corresponding mask. If a string, all pixel losses are balanced according to the mask identified by the string. If a mapping, each pixel loss l is balanced by balance_loss[l]; pixel losses not present in the mapping are not balanced at all.

  • l2 (float) – l2 weights regularization rate

  • input_name (str) – stream source name providing the input images

  • mask_names (Sequence[str]) – sequence of stream source names providing the target segmentations

  • final_kernel (Tuple[int, int]) – kernel size of the final convolution

Return type

None
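The four balance_loss cases above can be illustrated with a small helper that resolves, for each mask name, which mask (if any) its pixel loss should be balanced by. This is a hypothetical sketch of the documented semantics, not part of the actual implementation:

```python
def resolve_balance_masks(balance_loss, mask_names):
    """Map each mask name to the mask balancing its pixel loss (or None).

    Mirrors the documented semantics of the ``balance_loss`` parameter:
    False -> no balancing; True -> each loss balanced by its own mask;
    str -> all losses balanced by the named mask; mapping -> per-loss
    lookup, with missing entries left unbalanced.
    """
    if balance_loss is False:
        return {name: None for name in mask_names}
    if balance_loss is True:
        return {name: name for name in mask_names}
    if isinstance(balance_loss, str):
        return {name: balance_loss for name in mask_names}
    # mapping case: missing entries are left unbalanced (None)
    return {name: balance_loss.get(name) for name in mask_names}
```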

class blocks.models.RegionProposalNetwork(dataset, log_dir, inputs, outputs, session_config=None, n_gpus=0, restore_from=None, optimizer=None, freeze=False, loss_name='loss', monitor=None, clip_gradient=None, profile=False, keep_profiles=5, **kwargs)[source]

Bases: emloop_tensorflow.model.BaseModel

Configurable region proposal network (RPN) inspired by the Faster R-CNN architecture.

RPN predicts regions of interest (ROIs) from an input image. It starts by encoding the input images into feature maps. For each position of the feature maps, a fixed number of anchors, each corresponding to a fixed region in the original image, is considered.

For each anchor, RPN predicts:
  • whether the anchor matches a ROI in the original image

  • anchor diff (correction) to the respective ROI

Inputs
  • images (4-dim tensor NHWC) scaled to 0-255

  • anchors_label (4-dim tensor NHWA) 0/1 labels determining whether the anchors match certain regions

  • anchors_mask (4-dim tensor NHWA) 0/1 mask marking the valid anchors to be trained

  • diffs (5-dim tensor NHWAD) anchor differences (corrections) to the respective ROIs; the diff dimension D is configurable

Outputs
  • classifier_probabilities and classifier_predictions (4-dim tensors NHWA) scaled to 0-1 and 0/1 respectively

  • regression_predictions (5-dim tensor NHWAD) of anchor differences (corrections) to the respective ROIs

  • classifier_loss, regression_loss and loss (1-dim tensors N)

RPN is tightly coupled with the dataset used for training. It needs to know the input image shape, the number of anchors per feature map position and the dimension of the diffs input. Conversely, the dataset has to be configured with the feature map shape and the amount of pooling applied to the images.

Dataset requirements
  • img_shape() function returning a 3-tuple or list with the image shape

  • diffs_dim() function returning the last dimension of the diffs input

  • n_anchors_per_position property

  • configure_shape(features_shape, pool_amount) function which will be called after creating the feature map
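A minimal dataset stub satisfying these four requirements could look as follows (all concrete values are hypothetical placeholders):

```python
class ToyRPNDataset:
    """Hypothetical dataset stub implementing the RPN dataset requirements."""

    def img_shape(self):
        return (512, 512, 3)          # full input image shape (HWC)

    def diffs_dim(self):
        return 4                      # e.g. (dx, dy, dw, dh) corrections

    @property
    def n_anchors_per_position(self):
        return 9                      # e.g. 3 scales x 3 aspect ratios

    def configure_shape(self, features_shape, pool_amount):
        # Called by the model after the feature map is created, so the
        # dataset can generate anchors matching the encoder output.
        self._features_shape = features_shape
        self._pool_amount = pool_amount
```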

example usage in config
model:
  name: RegionProposal
  class: blocks.models.RegionProposalNetwork

  architecture:
    encoder_config: [14c3, 14c3, 14c3, 14c3, mp2,
                     32c3, 32c3, 32c3, 32c3, mp2,
                     64c3, 64c3, 64c3, 64c3, mp2,
                     128c3, 128c3, 128c3]
    use_ln: true

  optimizer:
    class: AdamOptimizer
    learning_rate: 0.0001

  inputs: [images, anchors_mask, anchors_label, diffs]
  outputs: [loss, regression_loss, classifier_loss, classifier_accuracy]

Reference: Faster R-CNN

Inheritance diagram of RegionProposalNetwork

_create_model(architecture, shared_dim=512, window_size=5, loss_ratio=0.5)[source]

Create new RPN instance.

Parameters
  • architecture (Mapping[~KT, +VT_co]) – CNN encoder architecture

  • shared_dim (int) – the dimension of the feature vector shared between the classifier and regression net

  • window_size (int) – sliding window size after the CNN encoder

  • loss_ratio (float) – ratio from the [0, 1] interval between the classifier and regression losses; e.g., a value of 0.1 means the classifier is trained 9 times less than the regression net

Return type

None
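Under the interpretation above, the total loss presumably combines the two partial losses as a convex combination weighted by loss_ratio; a sketch of this assumed behaviour (not taken from the implementation):

```python
def combined_rpn_loss(classifier_loss, regression_loss, loss_ratio=0.5):
    """Weight the two RPN losses; loss_ratio weights the classifier part.

    With loss_ratio = 0.1 the classifier loss contributes 9 times less
    to the total than the regression loss, matching the documented
    description of the parameter.
    """
    return loss_ratio * classifier_loss + (1.0 - loss_ratio) * regression_loss
```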