Vision sensor filter composition

A vision sensor normally produces two images at each simulation pass: a color image and a depth map. Those two images can be inspected programmatically by retrieving them through the appropriate API function calls, then iterating over each individual pixel or depth map value. While this approach offers maximum flexibility, it is, however, cumbersome and slow. Instead, it is much more convenient (and fast!) to use the built-in filtering and triggering capabilities. Indeed, each vision sensor has an associated filter that can be composed in a very flexible way by combining several components. The following figure illustrates a simple filter that inverts colors:
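
For reference, the manual pixel-by-pixel approach could look as follows in a child script. This is a minimal Lua sketch: the sensor name 'Vision_sensor' and the average-intensity computation are illustrative assumptions, not part of the filter mechanism itself:

    -- Retrieve the color image (table of RGB floats in [0,1]) and the depth map
    -- (table of values in [0,1], 0=near clipping plane, 1=far clipping plane):
    local sensorHandle=simGetObjectHandle('Vision_sensor')
    local image=simGetVisionSensorImage(sensorHandle)
    local depth=simGetVisionSensorDepthBuffer(sensorHandle)

    -- Iterate over each pixel, e.g. to compute the average intensity:
    local sum=0
    for i=1,#image,3 do
        sum=sum+(image[i]+image[i+1]+image[i+2])/3
    end
    local averageIntensity=sum/(#image/3)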

[Vision sensor filter with 3 components]


A component can perform 4 basic operations:

  • Transfer data from one buffer to another (e.g. transfer input image to work image)
  • Perform operations on one or more buffers (e.g. invert work image)
  • Activate a trigger (e.g. if average image intensity > 0.3 then activate trigger)
  • Return specific values that can be accessed through an API call (e.g. return the position of the center of mass of a binary image)

The following figure illustrates the various types of buffers a component can access:

[Vision sensor buffers and operations between buffers]


While the input image and input depth image are volatile buffers (i.e. normally automatically overwritten with new data at each simulation pass), the work image, buffer1 image and buffer2 image are persistent buffers (i.e. their content is not modified unless a component operates on them; persistent buffers can for example be used to compare vision sensor data from one simulation pass to the next).

A vision sensor is triggered if at least one component of its filter activates the trigger. The API function simHandleVisionSensor (or simReadVisionSensor) returns the sensor's trigger state, followed by a series of auxiliary values grouped into packets (see the sketch after the list below). The first packet may contain different data depending on the vision sensor settings:

  • 15 auxiliary values (default): the values are calculated over all the image pixels and represent the minimum of intensity, red, green, blue and depth value, the maximum of intensity, red, green, blue and depth value, and the average of intensity, red, green, blue and depth value. With higher-resolution images, computing these values can noticeably slow down the vision sensor; if they are not needed, their calculation can be disabled in the vision sensor properties (Packet1 is blank (faster))
  • 15 blank values: the values do not represent anything. This happens when the option Packet1 is blank (faster) is enabled in the vision sensor properties, in order to speed up the operation of the vision sensor.
  • n values: the values represent the object handles of seen objects, when the vision sensor's render mode is set to object handles in the vision sensor properties. In that mode the object handles are coded as RGB values, which allows identifying the handles of all seen objects. The values in the returned packet should be rounded down.
  • If additional filter components return values, they are appended as further packets after the first packet. See the function's API description for more details.
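
The following Lua child script sketch shows how the trigger state and packets could be read. It assumes the Lua overload that returns the trigger state followed by one table per packet, and default vision sensor settings (15 auxiliary values in packet1); the sensor name 'Vision_sensor' is an illustrative assumption, and the exact return layout should be verified against the simReadVisionSensor API page of your V-REP version:

    local sensorHandle=simGetObjectHandle('Vision_sensor')
    local result,packet1,packet2=simReadVisionSensor(sensorHandle)

    if result==1 then
        -- at least one component of the sensor's filter activated the trigger
    end
    if packet1 then
        -- with default settings, packet1 holds the 15 auxiliary values:
        -- [1..5]=min intensity/red/green/blue/depth, [6..10]=max, [11..15]=average
        local minIntensity=packet1[1]
        local averageIntensity=packet1[11]
    end
    if packet2 then
        -- values returned by an additional filter component, e.g. the position
        -- of the center of mass of a binary image
    end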

V-REP has more than 30 built-in filter components that can be combined as needed. In addition, new filter components can be developed through plugins. When a filter component from a plugin is used, you should always distribute the created scene together with the plugin, or make sure to check whether the plugin is present with the simGetModuleName API function in a child script (or C application).
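
Such a presence check could look as follows in a child script (a minimal sketch; the plugin name 'MyFilterPlugin' is a hypothetical placeholder):

    -- Enumerate all loaded plugins with simGetModuleName until nil is returned:
    local function isPluginLoaded(name)
        local index=0
        local moduleName=simGetModuleName(index)
        while moduleName do
            if moduleName==name then
                return true
            end
            index=index+1
            moduleName=simGetModuleName(index)
        end
        return false
    end

    if not isPluginLoaded('MyFilterPlugin') then
        simAddStatusbarMessage('Warning: the vision filter plugin was not found!')
    end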


Recommended topics

  • Vision sensors
  • Vision sensor types and mode of operation
  • Vision sensor properties