Bot call grouping, streams, and licensing for Microsoft Teams capture

To reduce the load on the Microsoft Media SDK, especially during large meetings, the Microsoft Azure BOT sends two or more call objects to the Verint recording solution. The first call object is always for the meeting host. The next call object is created when the first participant joins the meeting, and it then groups all participants who join thereafter.

The BOT creates more than two call objects when multiple participants join a call simultaneously as the first participant. For details, see Known issues and limitations for Microsoft Teams capture.

As part of each call object, the Microsoft BOT sends the following to the Verint recording solution:

A list of participants.
One to four audio streams.
- When unmixed audio is enabled on the data source, then each person's voice is received as a separate audio stream. The maximum is four streams at a time.
- When unmixed audio is disabled, then all participant voices are mixed in a single audio stream.
One camera video stream per participant that has their camera on, up to the maximum number of streams allowed. The default is four streams, but you can adjust the number from zero (none) to 10 streams.
One video-based screen sharing (VBSS) stream, if anyone shares their screen.
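To make the stream counts above concrete, here is a minimal sketch that models what one call object carries. It assumes the rules exactly as listed (unmixed audio yields one to four streams, mixed audio yields one, camera streams are capped at a limit from 0 to 10, and there is at most one VBSS stream). The function and class names are hypothetical and are not part of any Microsoft or Verint API.

```python
from dataclasses import dataclass

# Hypothetical model of one call object's media streams. The function and
# field names are illustrative assumptions, not a Microsoft or Verint API.
MAX_UNMIXED_AUDIO_STREAMS = 4


@dataclass(frozen=True)
class CallObjectStreams:
    audio: int
    camera: int
    vbss: int


def streams_for_call_object(
    speakers: int,
    unmixed_audio: bool,
    cameras_on: int,
    max_camera_streams: int = 4,
    screen_shared: bool = False,
) -> CallObjectStreams:
    if not 0 <= max_camera_streams <= 10:
        raise ValueError("camera stream limit must be between 0 and 10")
    if unmixed_audio:
        # One stream per voice, between one and four streams at a time.
        audio = max(1, min(speakers, MAX_UNMIXED_AUDIO_STREAMS))
    else:
        # All voices are mixed into a single stream.
        audio = 1
    # One camera stream per participant with the camera on, up to the limit.
    camera = min(cameras_on, max_camera_streams)
    # At most one VBSS stream, and only if someone is sharing.
    vbss = 1 if screen_shared else 0
    return CallObjectStreams(audio, camera, vbss)


print(streams_for_call_object(speakers=6, unmixed_audio=True, cameras_on=7, screen_shared=True))
# CallObjectStreams(audio=4, camera=4, vbss=1)
```

With the default limit of four camera streams, seven open cameras still yield only four camera streams in the call object.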

When the Recorder receives the streams, it captures only the media types specified for each monitored participant.

A monitored participant has a Verint Employee account that is associated with a recording profile for the Teams data source. The profile specifies which media types to capture, such as audio, video, and screen share.
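The filtering step can be pictured as a simple membership test against the recording profile. The sketch below is hypothetical: the `Stream` and `RecordingProfile` shapes and the media-type labels are illustrative assumptions, not the Verint configuration model. A participant with no recording profile is treated as unmonitored, so nothing is captured for them.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


# Hypothetical shapes for illustration; the media-type labels and class
# names are assumptions, not the Verint configuration model.
@dataclass(frozen=True)
class Stream:
    stream_id: str
    media_type: str  # "audio", "video", or "screen_share"


@dataclass(frozen=True)
class RecordingProfile:
    media_types: FrozenSet[str]


def streams_to_capture(
    streams: List[Stream], profile: Optional[RecordingProfile]
) -> List[Stream]:
    # A participant without a recording profile is not monitored: capture nothing.
    if profile is None:
        return []
    # Otherwise keep only the media types the profile asks for.
    return [s for s in streams if s.media_type in profile.media_types]


received = [Stream("a1", "audio"), Stream("v1", "video"), Stream("s1", "screen_share")]
audio_and_share = RecordingProfile(frozenset({"audio", "screen_share"}))
print([s.stream_id for s in streams_to_capture(received, audio_and_share)])  # ['a1', 's1']
```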

For each monitored participant on the Teams call (or meeting), the following media is captured:

One or two audio streams:
- One audio stream is captured when Unmixed audio is disabled on the capture source.
- Two audio streams are captured when Unmixed audio is enabled on the Teams capture source.
The screen share video stream (VBSS).
The camera video of the monitored participant, if their camera is on, and the camera video of other participants that the monitored participant received.
- The default is 4 camera video streams per monitored participant, but you can adjust the number from zero (none) to 10 streams.

The Recorder limits the number of captured camera video streams per monitored participant. By default, it keeps the camera video of the four most recently active participants. When all of the configured video sockets are occupied and a new participant becomes the dominant speaker, the oldest camera stream is evicted and the dominant speaker's video is captured instead. To change the maximum number of captured camera streams from zero (none) to 10, configure the MSTeams.NumberOfCameraSocketsPerCallObject key on the Teams capture source. Microsoft limits the maximum to 10 streams.
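The eviction behavior resembles a least-recently-used (LRU) cache keyed by dominant speaker. The sketch below assumes that "oldest" means the participant who was least recently the dominant speaker, and that the speaker has their camera on; the class and method names are hypothetical. `max_sockets` stands in for the value of MSTeams.NumberOfCameraSocketsPerCallObject.

```python
from collections import OrderedDict
from typing import List, Optional


class CameraSockets:
    """Hypothetical per-monitored-participant camera socket tracker (LRU)."""

    def __init__(self, max_sockets: int = 4):
        # Assumed to mirror MSTeams.NumberOfCameraSocketsPerCallObject (0-10).
        if not 0 <= max_sockets <= 10:
            raise ValueError("camera socket limit must be between 0 and 10")
        self.max_sockets = max_sockets
        self._sockets: "OrderedDict[str, None]" = OrderedDict()

    def on_dominant_speaker(self, participant_id: str) -> Optional[str]:
        """Record a dominant-speaker change; return the evicted participant, if any."""
        if self.max_sockets == 0:
            return None  # zero sockets: no camera video is captured
        if participant_id in self._sockets:
            # Already captured: mark as most recently active.
            self._sockets.move_to_end(participant_id)
            return None
        evicted = None
        if len(self._sockets) >= self.max_sockets:
            # All sockets busy: drop the least recently active stream.
            evicted, _ = self._sockets.popitem(last=False)
        self._sockets[participant_id] = None
        return evicted

    @property
    def captured(self) -> List[str]:
        return list(self._sockets)


sockets = CameraSockets(max_sockets=2)
for speaker in ["A", "B", "A", "C"]:
    sockets.on_dominant_speaker(speaker)
print(sockets.captured)  # ['A', 'C']
```

Here B is evicted when C speaks because A spoke again after B, which is the "last N active participants" behavior under the LRU assumption.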

Identifying participant media (INUM)

To identify the streams associated with a monitored participant, the IP Recorder creates an INUM (identifying number) for each media stream: an INUM for each audio stream, an INUM for each camera video stream, and an INUM for the screen share video. The serial number of the recording uniquely identifies the interaction and is stored in the recording buffer.
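The one-INUM-per-stream rule can be illustrated with a simple counter. This is a hypothetical sketch: the text does not describe how the Recorder generates INUM values, so `itertools.count` only stands in for that scheme, and `MediaStream` and its fields are illustrative names. The point it shows is that every stream, including each separate camera stream, gets its own distinct number.

```python
import itertools
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable


# Hypothetical: a counter stands in for the Recorder's INUM numbering scheme.
@dataclass(frozen=True)
class MediaStream:
    stream_id: str  # assumed unique per stream
    participant: str
    kind: str  # "audio", "camera", or "screen_share"


def assign_inums(streams: Iterable[MediaStream], start: int = 1) -> Dict[MediaStream, int]:
    # One INUM per media stream; no two streams share a number.
    counter = itertools.count(start)
    return {stream: next(counter) for stream in streams}


def inums_by_kind(inums: Dict[MediaStream, int]) -> Counter:
    # Tally INUMs per media type, e.g. for totals such as "4 audio, 20 camera".
    return Counter(stream.kind for stream in inums)


streams = [
    MediaStream("a1", "alice", "audio"),
    MediaStream("c1", "alice", "camera"),
    MediaStream("c2", "alice", "camera"),
    MediaStream("s1", "alice", "screen_share"),
]
print(dict(inums_by_kind(assign_inums(streams))))
# {'audio': 1, 'camera': 2, 'screen_share': 1}
```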

Required Recorder Channel Licenses

Each recorded participant consumes one channel license for each media type captured:

Each monitored participant audio recording consumes 1 IP Voice Channel license.
Each monitored participant video recording consumes 1 IP Video Channel license (regardless of how many participants' video streams the interaction contains).

A Recorder IP Voice Channel license is consumed for each monitored participant in the meeting, even if they do not speak.

A Recorder IP Video Channel license is required for each recorded participant who has video captured by the Recorder. When a single participant turns on their camera and at least one video socket is available to accommodate the video stream, an IP Video Channel license is consumed for the interaction.
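The license rules reduce to two counts per interaction. In this minimal sketch, `MonitoredParticipant` and `channel_licenses` are hypothetical names. It also assumes that any captured video stream (camera or screen share) triggers the single IP Video Channel license for that participant. Voice licenses count every monitored participant, whether or not they speak.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class MonitoredParticipant:
    # Hypothetical per-participant summary; not a Verint data structure.
    name: str
    video_streams_captured: int  # camera and screen share streams captured


def channel_licenses(monitored: List[MonitoredParticipant]) -> Tuple[int, int]:
    """Return (ip_voice_licenses, ip_video_licenses) for one interaction."""
    # One IP Voice Channel license per monitored participant, even if silent.
    voice = len(monitored)
    # One IP Video Channel license per monitored participant with any video
    # captured, regardless of how many video streams that is.
    video = sum(1 for p in monitored if p.video_streams_captured > 0)
    return voice, video


print(channel_licenses([
    MonitoredParticipant("alice", 4),
    MonitoredParticipant("bob", 0),
    MonitoredParticipant("carol", 1),
]))  # (3, 2)
```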

Voice and Video Channel licenses are set in the following Enterprise Manager screen:

Screenshot of the Enterprise Manager screen

Required Delivery IP Port Usage

Each monitored participant may consume one or more Recorder delivery IP ports for media capture (set in Recorder Manager > General Setup > Cards and Filters). For mono audio recording, each recorded participant consumes 1 delivery IP port for audio capture. For speaker-separated (stereo) audio recording, each recorded participant consumes 2 delivery IP ports for audio capture.

Each video stream recorded for a participant (camera video or screen share) consumes 1 delivery IP port for video capture.
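The port rules can be expressed as a small calculation. In this minimal sketch, `delivery_ports` and its parameters are hypothetical names. It assumes that audio ports depend only on the number of monitored participants and the mono/stereo setting, and that every captured video stream takes exactly one port.

```python
from typing import Sequence, Tuple


def delivery_ports(
    monitored: int, stereo: bool, video_streams_per_participant: Sequence[int]
) -> Tuple[int, int]:
    """Return (audio_ports, video_ports) for one Teams interaction.

    Hypothetical helper: video_streams_per_participant holds, for each
    monitored participant, the number of camera and screen share streams
    the Recorder captured for them.
    """
    if len(video_streams_per_participant) != monitored:
        raise ValueError("need one video stream count per monitored participant")
    # Mono: 1 audio port per recorded participant; stereo: 2.
    audio_ports = monitored * (2 if stereo else 1)
    # Each captured video stream takes its own delivery IP port.
    video_ports = sum(video_streams_per_participant)
    return audio_ports, video_ports


# Four monitored participants, mono audio, each with 5 camera streams + 1 screen share.
print(delivery_ports(4, stereo=False, video_streams_per_participant=[6, 6, 6, 6]))  # (4, 24)
```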

Delivery IP ports are set in the following Recorder Manager screen:

Screenshot of the Recorder Manager screen

Example:

A Teams call has 10 participants, of which 5 are monitored participants. One person shares their screen. All participants have their camera on. The Teams data source is configured to capture four cameras per participant and to capture stereo (unmixed) audio. All 10 participants talk during the meeting.

In this scenario, 25 video delivery IP ports are used: five for the screen share video received by the monitored participants, and 20 for the camera video received by the monitored participants (5 monitored participants x 4 allowed). In addition, 10 audio delivery IP ports are consumed: two (stereo) for each monitored participant.

To summarize:

10 participants in the call
5 monitored participants
10 participants spoke
10 participants have camera on
5 monitored participants received screen share video
4 camera videos allowed per participant

License and port consumption:

5 IP Voice Channel licenses consumed
5 IP Video Channel licenses consumed
10 audio delivery IP ports consumed (5 monitored participants x 2, stereo)
5 video delivery IP ports consumed for screen share
20 video delivery IP ports consumed for camera video (5 monitored participants x 4 allowed)

Diagram: Objects in a 4-person Microsoft Teams conference call when using an Azure BOT

The total number of Recorder delivery IP ports consumed for the call shown in the diagram:

4 audio IP ports
24 video IP ports

The total number of INUMs is 28: 4 audio, 20 camera video, and 4 screen share video.