ANSI SCTE 242-4-2018: Next Generation Audio Coding Constraints for Cable Systems
6.2. Overview
The DTS-UHD bitstream supports 32 pre-defined channel locations and the definition of up to 224 objects plus 32 object groups. A DTS-UHD object is a set of coded waveforms plus an associated metadata structure, referred to in ETSI TS 103 491 [3] as an audio chunk element and a metadata chunk element, respectively. It is worth noting that more than one object can reference the same audio chunk. These chunks are carried within an Audio Frame. In addition to the metadata required to decode the audio chunk elements, metadata chunk elements may carry metadata that is passed downstream to a renderer, e.g. as described in ETSI TS 103 584 [9].

The audio frame is organized by a frame table of contents (FTOC) at the beginning of the frame. The FTOC locates the metadata chunks and audio chunks within the frame and creates the associations of objects, object groups, and presentations. An object group is a collection of objects that are always played together and assigned to a single object ID. A presentation is a list of object IDs plus some additional metadata, usually including loudness and dynamics parameters. Up to 32 presentations can be defined in a stream. A more detailed introduction to the DTS-UHD stream architecture is provided in clause 4 of ETSI TS 103 491 [3].

The DTS-UHD decoder processes the selected audio chunks into sets of linear PCM waveforms. The waveforms are then passed to a renderer along with their object metadata, plus loudness and dynamic range metadata for the entire presentation. A reference renderer for DTS-UHD can be found in ETSI TS 103 584 [9].

A presentation is referenced by a single parameter when invoking the decoding process. This parameter is referred to in ETSI TS 103 491 [3] as ucAudPresIndex. Note that a given presentation may internally reference other presentations defined within the same stream. For linear broadcast applications, all objects and object groups will be combined into presentations, so the preselection interfaces in the consumer premises equipment will not reference audio objects or object groups directly.
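The relationships above can be pictured concretely. The following is a minimal, illustrative C sketch, not normative code: all type and field names are invented, and the actual bitstream syntax is defined in ETSI TS 103 491 [3]. It shows how an FTOC-like table could associate objects with (possibly shared) audio chunks, and how a presentation could be selected by a single index, analogous to ucAudPresIndex.

    /* Illustrative only: hypothetical C structures sketching the frame
     * organization described above. Names are invented for clarity. */

    #include <stdint.h>
    #include <stddef.h>

    #define MAX_OBJECTS        224   /* object limit per the stream model   */
    #define MAX_OBJECT_GROUPS   32   /* object-group limit                  */
    #define MAX_PRESENTATIONS   32   /* presentations definable per stream  */

    typedef struct {
        uint32_t object_id;          /* object or object-group ID           */
        uint32_t audio_chunk_id;     /* several objects may share a chunk   */
        uint32_t metadata_chunk_id;  /* decode + pass-through metadata      */
    } ObjectEntry;

    typedef struct {
        uint32_t num_object_ids;
        uint32_t object_ids[MAX_OBJECTS];
        /* Presentation-level metadata, e.g. loudness and dynamics
         * parameters; these fields are stand-ins. */
        float    loudness;
        float    dynamics;
    } Presentation;

    typedef struct {
        /* The FTOC locates chunks within the frame and binds objects,
         * object groups, and presentations together. */
        uint32_t     num_objects;
        ObjectEntry  objects[MAX_OBJECTS + MAX_OBJECT_GROUPS];
        uint32_t     num_presentations;
        Presentation presentations[MAX_PRESENTATIONS];
    } FrameTOC;

    /* Selecting a presentation by index, analogous to invoking the
     * decoding process with the single ucAudPresIndex parameter. */
    const Presentation *select_presentation(const FrameTOC *ftoc,
                                            uint8_t ucAudPresIndex)
    {
        if (ucAudPresIndex >= ftoc->num_presentations)
            return NULL;         /* index not defined in this stream */
        return &ftoc->presentations[ucAudPresIndex];
    }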
6.3. Sync frames and non-sync frames
A given DTS-UHD audio frame is either a sync frame or a non-sync frame. Properties of sync frames are specified in ETSI TS 103 491 [3]. This clause provides an overview of some important implications of these frame types. A sync frame contains all parameters necessary to unpack metadata and audio chunks, describe audio chunks, render and process audio samples, and generate a frame of linear PCM samples. A decoder can attempt to establish initial synchronization only at a sync frame. To minimize payload size, a non-sync frame may contain only parameters that have changed in value since the previous frame or sync frame. The time period between sync frames is called the sync interval. The sync interval can be any duration, but it is recommended to be at least 500 ms and is nominally about 2 seconds.
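As a rough illustration of why initial synchronization can only be established at a sync frame, consider the following C sketch. The frame and state structures are hypothetical assumptions; the normative frame syntax and decoder behavior are specified in ETSI TS 103 491 [3].

    /* Illustrative only: a hypothetical sketch of sync acquisition.
     * Non-sync frames carry only changed parameters, so they are
     * unusable until a sync frame has established the full state. */

    #include <stdbool.h>

    typedef struct {
        bool is_sync_frame;        /* carries the full parameter set    */
        /* ... coded parameters and chunk payloads ... */
    } UhdFrame;

    typedef struct {
        bool have_full_state;      /* true once a sync frame is decoded */
        /* ... accumulated decoding parameters ... */
    } DecoderState;

    /* Returns true if the frame could be decoded into PCM. */
    bool decode_frame(DecoderState *st, const UhdFrame *frame)
    {
        if (frame->is_sync_frame) {
            /* Full parameter set present: (re)initialize all state. */
            st->have_full_state = true;
        } else if (!st->have_full_state) {
            /* Cannot start mid-interval; wait for the next sync frame. */
            return false;
        }
        /* Apply full parameters (sync) or deltas (non-sync), then
         * unpack metadata and audio chunks and produce PCM samples. */
        return true;
    }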
7. Multi-stream playback
When an audio program contains multiple DTS-UHD streams, there shall always be one “main” stream and a maximum of seven “auxiliary” streams. The main stream contains a required default Audio Preselection and may contain additional Preselections and Components. The auxiliary streams contain additional Components used to create new Preselections. Every Preselection shall contain all audio and metadata assets needed to render the final output to the speakers.
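The stream-count constraint can be expressed as a simple check. The sketch below is illustrative only, with invented names; it assumes each stream is already tagged as main or auxiliary by higher-level signaling.

    /* Illustrative only: hypothetical validation of the one-main,
     * at-most-seven-auxiliary stream constraint for an Audio Program. */

    #include <stdbool.h>
    #include <stddef.h>

    #define MAX_AUX_STREAMS 7

    typedef enum { STREAM_MAIN, STREAM_AUXILIARY } StreamRole;

    typedef struct {
        StreamRole role;
    } UhdStream;

    bool program_stream_layout_valid(const UhdStream *streams, size_t count)
    {
        size_t mains = 0, auxes = 0;
        for (size_t i = 0; i < count; i++) {
            if (streams[i].role == STREAM_MAIN) mains++;
            else auxes++;
        }
        /* Exactly one main stream; auxiliary streams are optional. */
        return mains == 1 && auxes <= MAX_AUX_STREAMS;
    }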
The conceptual model of multi-stream playback is that of multiple decoder sessions running in parallel. An implementation may instead use a single decoder that processes the frames from the various streams sequentially and then renders all waveforms from the given time interval together to generate the final output to the speakers. When an Audio Program contains multiple elementary streams, the indexing of the streams contributing to the Audio Preselection determines which metadata will be used to render the final output. The final rendering metadata for scaling the output is always provided by the highest indexed stream in the sequence that contains such metadata. An example is shown in Figure 1, where three streams contribute to a preselection.
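The highest-indexed-stream rule can be sketched as a reverse scan over the contributing streams. The C fragment below is illustrative only; the structure and field names are assumptions, not part of the standard.

    /* Illustrative only: the final rendering metadata for scaling the
     * output comes from the highest indexed contributing stream that
     * carries such metadata. */

    #include <stdbool.h>
    #include <stddef.h>

    typedef struct {
        bool  has_rendering_metadata;  /* carries output-scaling data  */
        float output_scale;            /* stand-in for actual metadata */
    } ContributingStream;

    /* Streams are ordered by index; scan from the highest index down
     * and return the first stream carrying rendering metadata. */
    const ContributingStream *
    final_rendering_metadata(const ContributingStream *streams, size_t count)
    {
        for (size_t i = count; i > 0; i--) {
            if (streams[i - 1].has_rendering_metadata)
                return &streams[i - 1];
        }
        return NULL;  /* no contributing stream carries such metadata */
    }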