
dev -> dev-1.2.0-rc #294

Merged
ojw28 merged 17 commits into dev-1.2.0-rc from dev
Feb 13, 2015
Conversation


ojw28 (Contributor) commented Feb 13, 2015

No description provided.

ojw28 added 17 commits February 10, 2015 12:25
This is the start of a sequence of changes to fix the referenced
GitHub issue. Currently TsExtractor involves multiple memory
copy steps:

DataSource->Ts_BitArray->Pes_BitArray->Sample->SampleHolder

This is inefficient, but more importantly, the copy into
Sample is problematic, because Samples are of dynamically
varying size. The way we end up expanding Sample objects to
be large enough to hold the data being written means that we
end up gradually expanding all Sample objects in the pool
(which wastes memory), and that we generate a lot of GC churn,
particularly when switching to a higher quality which can
trigger all Sample objects to expand.

The fix will be to reduce the copy steps to:

DataSource->TsPacket->SampleHolder

We will track Pes and Sample data with lists of pointers into
TsPackets, rather than actually copying the data. We will
recycle these pointers.
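
The idea might be sketched roughly like this (SampleExtent and the
method names here are illustrative assumptions, not the real classes):

    import java.util.List;

    // Hypothetical sketch: track where each sample's bytes live inside
    // pooled TS packet buffers instead of copying them into per-sample
    // storage.
    final class SampleExtent {
      final byte[] packetBuffer; // Pooled buffer holding TS packet data.
      final int offset;          // Start of this sample's bytes in the buffer.
      final int length;          // Number of bytes belonging to the sample.

      SampleExtent(byte[] packetBuffer, int offset, int length) {
        this.packetBuffer = packetBuffer;
        this.offset = offset;
        this.length = length;
      }

      // A sample spanning packet boundaries is a list of extents; writing
      // it into a SampleHolder's buffer is then a single gathering copy.
      static void copyTo(List<SampleExtent> extents, byte[] dest) {
        int destOffset = 0;
        for (SampleExtent extent : extents) {
          System.arraycopy(
              extent.packetBuffer, extent.offset, dest, destOffset, extent.length);
          destOffset += extent.length;
        }
      }
    }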

The following steps are approximately how the refactor will
progress:

1. Start reducing use of BitArray. It's going to be way too
complicated to track bit-granularity offsets into multiple packets
and to allow reading across packet boundaries. In practice, reads
from TS packets are all byte aligned except for small sections,
so we'll move over to using ParsableByteArray instead, which means
we only need to track byte offsets.

2. Move TsExtractor to use ParsableByteArray except for small
sections where we really need bit-granularity offsets.

3. Do the actual optimization.

Issue: #278
- TsExtractor is now based on ParsableByteArray rather than BitArray.
  This makes it much clearer that, for the most part, data is byte
  aligned. It will allow us to optimize TsExtractor without worrying
  about arbitrary bit offsets.
- BitArray is renamed ParsableBitArray for consistency, and is now
  exclusively for bit-stream level reading.
- There are some temporary methods in ParsableByteArray that should be
  cleared up once the optimizations are in place.

Issue: #278
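
To illustrate the split, here's a minimal self-contained sketch of the
two reading styles; the real ParsableByteArray and ParsableBitArray
APIs are richer, and the names and methods below are simplified
assumptions:

    // Byte-aligned reading: a single int position is all the state needed.
    final class ByteArrayReader {
      private final byte[] data;
      private int position;

      ByteArrayReader(byte[] data) { this.data = data; }

      int readUnsignedByte() { return data[position++] & 0xFF; }

      int readUnsignedShort() {
        return (readUnsignedByte() << 8) | readUnsignedByte();
      }

      void skipBytes(int count) { position += count; }
    }

    // Bit-granularity reading: must also track a bit offset within the byte.
    final class BitArrayReader {
      private final byte[] data;
      private int byteOffset;
      private int bitOffset; // 0..7 within the current byte.

      BitArrayReader(byte[] data) { this.data = data; }

      int readBits(int numBits) {
        int result = 0;
        for (int i = 0; i < numBits; i++) {
          result = (result << 1) | ((data[byteOffset] >> (7 - bitOffset)) & 1);
          if (++bitOffset == 8) {
            bitOffset = 0;
            byteOffset++;
          }
        }
        return result;
      }
    }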
1. AdtsReader would previously copy all data through an intermediate
adtsBuffer. This change eliminates the additional copy step, and
instead copies directly into Sample objects.

2. PesReader would previously accumulate a whole packet by copying
multiple TS packets into an intermediate buffer. This change
eliminates this copy step. After the change, TS packet buffers
are propagated directly to PesPayloadReaders, which are required
to handle partial payload data correctly. The copy steps in the
extractor are simplified from:

DataSource->Ts_BitArray->Pes_BitArray->Sample->SampleHolder

To:

DataSource->Ts_BitArray->Sample->SampleHolder

Issue: #278
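
A hypothetical sketch of the contract this implies; the real interface
and method signatures may differ:

    // Each TS packet's payload goes straight to the payload reader, which
    // must cope with receiving a PES packet in arbitrary-sized pieces.
    interface PesPayloadReader {
      // Called once per TS packet carrying payload for this stream.
      // payloadUnitStart is true when a new PES packet begins here, so
      // implementations must accumulate state across consecutive calls.
      void consume(byte[] data, int offset, int length, boolean payloadUnitStart);
    }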
There's no functional change here; the only difference is how
TsExtractor's getLargestSampleTimestamp method does its work.
I'm not really a fan of micro-optimizations, but given this method
scans through every H264 frame in the HLS case, it seems worthwhile.
The trick here is to examine the first 7 bits of the third byte
first. If they're not all 0s, then we know that we haven't found a
NAL unit, and also that we won't find one at the next two positions.
This allows the loop to increment 3 bytes at a time.

Speedup is around 60% on ART according to some ad-hoc benchmarking.
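
As a sketch (not the exact method), the scan looks roughly like this:

    // 3-byte-stride scan for the 0x000001 NAL start code. If the top
    // 7 bits of data[i + 2] are non-zero, the byte is neither 0x00 nor
    // 0x01, so no start code can begin at i, i + 1, or i + 2.
    static int findNalStartCode(byte[] data, int start, int end) {
      int i = start;
      while (i < end - 2) {
        if ((data[i + 2] & 0xFE) != 0) {
          i += 3; // Safe to skip three positions at once.
        } else if (data[i] == 0 && data[i + 1] == 0 && data[i + 2] == 1) {
          return i; // Start code found.
        } else {
          i++;
        }
      }
      return end; // Not found.
    }
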
Reordering in the extractor isn't going to work well with the
optimizations I'm making there. This change moves sorting back
to the renderer, although it keeps all of the renderer
simplifications. It's basically just moving where the sort
happens from one place to another.
The complexity around not enabling the video renderer before it
has a valid surface exists because MediaCodecTrackRenderer supports
a "discard" mode where it pulls through and discards samples
without a decoder. This mode means that if the demo app were to
enable the renderer before supplying the surface, the renderer
could discard the first few frames prior to getting the surface,
meaning video rendering wouldn't happen until the following sync
frame.

To get a handle on complexity, I think we're better off just removing
support for this mode, which nicely decouples how the demo app
handles surfaces vs. how it handles enabling/disabling renderers.
- Remove TsExtractor's knowledge of Sample.
- Push handling of Sample objects into SampleQueue as much
  as possible. This is a precursor to replacing Sample objects
  with a different type of backing memory. Ideally, the
  individual readers shouldn't know how the sample data is
  stored (see the sketch below). This is true after this CL,
  with the exception of the TODO in H264Reader.
- Avoid double-scanning every H264 sample for NAL units, by
  moving the scan for SEI units from SeiReader into H264Reader.

Issue: #278
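
A hypothetical sketch of the boundary this establishes (names,
signatures, and the simplified SampleHolder are assumptions):

    // Readers write through a narrow interface and never learn how the
    // sample data is stored, so the backing memory can be swapped later.
    interface SampleQueue {
      // Write side, used by the elementary stream readers.
      void appendData(byte[] data, int offset, int length);
      void commitSample(long timeUs, boolean isKeyframe);

      // Read side: fills the holder with the next complete sample,
      // returning false if none is available yet.
      boolean getSample(SampleHolder holder);
    }

    // Simplified stand-in for the real SampleHolder.
    final class SampleHolder {
      byte[] data;
      int size;
      long timeUs;
    }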
Use of Sample objects was inefficient for several reasons:

- Lots of objects (1 per sample, obviously).
- When switching up bitrates, there was a tendency for all Sample
  instances to need to expand, which effectively led to our whole
  media buffer being GC'd as each Sample discarded its byte[] to
  obtain a larger one.
- When a keyframe was encountered, the Sample would typically need
  to expand to accommodate it. Over time, this would lead to a
  gradual increase in the population of Samples that were sized to
  accommodate keyframes. These Sample instances were then typically
  underutilized whenever recycled to hold a non-keyframe, leading
  to inefficient memory usage.

This CL introduces RollingBuffer, which tightly packs pending sample
data into byte[]s obtained from an underlying BufferPool, fixing all
of the issues above. There is still an issue where the total
memory allocation may grow when switching up bitrate, but we can
easily fix that from this point, if we choose to restrict the buffer
based on allocation size rather than time.

Issue: #278
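
A hypothetical sketch of the RollingBuffer idea (the allocation size
and names are illustrative assumptions):

    import java.util.ArrayDeque;

    // Sample data is packed back-to-back into fixed-size byte[]
    // allocations taken from a pool, so no per-sample byte[] ever
    // needs to grow or be discarded.
    final class RollingBufferSketch {
      private static final int ALLOCATION_SIZE = 64 * 1024;

      private final ArrayDeque<byte[]> pool = new ArrayDeque<>();
      private final ArrayDeque<byte[]> allocations = new ArrayDeque<>();
      private int writeOffset; // Write position within the last allocation.

      void append(byte[] data, int offset, int length) {
        while (length > 0) {
          if (allocations.isEmpty() || writeOffset == ALLOCATION_SIZE) {
            // Reuse a pooled allocation when one is available.
            byte[] allocation =
                pool.isEmpty() ? new byte[ALLOCATION_SIZE] : pool.pop();
            allocations.addLast(allocation);
            writeOffset = 0;
          }
          int copyLength = Math.min(length, ALLOCATION_SIZE - writeOffset);
          System.arraycopy(data, offset, allocations.peekLast(), writeOffset, copyLength);
          writeOffset += copyLength;
          offset += copyLength;
          length -= copyLength;
        }
      }

      // When the oldest data has been read, its allocation returns to
      // the pool instead of becoming garbage.
      void releaseOldestAllocation() {
        if (!allocations.isEmpty()) {
          pool.push(allocations.removeFirst());
        }
      }
    }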
I think this is the limit of how far we should be pushing complexity
vs. efficiency. It's a little complicated to understand, but probably
worth it since the H264 bitstream is the majority of the data.

Issue: #278
This prevents excessive memory consumption when switching to
very high bitrate streams.

Issue: #278
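
One way to do that, as a hypothetical sketch (the budget value and
names are assumptions):

    // Bound buffering by total allocated bytes rather than buffered
    // duration, so a switch to a high bitrate stream cannot balloon
    // memory use.
    static final int TARGET_BUFFER_BYTES = 16 * 1024 * 1024;

    static boolean allowBuffering(int currentAllocatedBytes) {
      // The loader backs off once the byte budget is reached, resuming
      // as samples are consumed and their allocations are recycled.
      return currentAllocatedBytes < TARGET_BUFFER_BYTES;
    }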
ojw28 merged commit 2f04580 into dev-1.2.0-rc Feb 13, 2015
google locked and limited conversation to collaborators Jun 28, 2017