Skip to content

dev-1.2.0-rc -> master#297

Merged
ojw28 merged 182 commits intomasterfrom
dev-1.2.0-rc
Feb 16, 2015
Merged

dev-1.2.0-rc -> master#297
ojw28 merged 182 commits intomasterfrom
dev-1.2.0-rc

Conversation

@ojw28
Copy link
Contributor

@ojw28 ojw28 commented Feb 16, 2015

No description provided.

andudo and others added 30 commits October 1, 2014 17:03
Add file missing from "Improve subtitle handling" change.
- The HlsSampleSource now owns the extractor. TsChunk is more or less dumb.
  The previous model was weird, because you'd end up "reading" samples from
  TsChunk objects that were actually parsed from the previous chunk (due to
  the way the extractor was shared and maintained internal queues).
- Split out consuming and reading in the extractor.
- Make it so we consume 5s ahead. This is a window we allow for uneven
  interleaving, whilst preventing huge read-ahead (e.g. in the case of sparse
  ID3 samples).
- Avoid flushing the extractor for a discontinuity until it has been fully
  drained of previously parsed samples. This avoids skipping media shortly
  before discontinuities.
- Also made start-up faster by avoiding double-loading the first segment.

Issue: #3
- Add a readBit method to BitsArray for reading a boolean flag.
- Make things accessed from inner classes package visibility to avoid
  the compiler generating thunk methods.
Use largeHeap in demo app
Add cookies support and use the same UserAgent in ManifestFetcher and in...
Merge dev-l into dev-hls, now L SDK is released.
ojw28 added 26 commits February 6, 2015 11:31
If the timesource track renderer ends, but other track renderers
haven't finished, the player would get stuck in a pending state.
This change enables automatic switching to the media clock in the
case that the timesource renderer has ended, which allows other
renderers to continue to play.
Also add parseMp4vFromParent and return the track's duration in parseTrak.

This is in preparation for adding a non-fragmented MP4 extractor.
- Remove simple variant. Maintaining both simple + full is
  unnecessary effort.
- Remove need to specify content id in Sample definition,
  except where it's actually required (for DRM requests in
  the Widevine GTS samples)
- Also make text renderers respect the decodeOnly flag.
- Also fix AC3 passthrough to always allocate direct buffers.
- Bump version.
- Remove AC3 renderer. It's not ready.
This is the start of a sequence of changes to fix the ref'd
github issue. Currently TsExtractor involves multiple memory
copy steps:

DataSource->Ts_BitArray->Pes_BitArray->Sample->SampleHolder

This is inefficient, but more importantly, the copy into
Sample is problematic, because Samples are of dynamically
varying size. The way we end up expanding Sample objects to
be large enough to hold the data being written means that we
end up gradually expanding all Sample objects in the pool
(which wastes memory), and that we generate a lot of GC churn,
particularly when switching to a higher quality which can
trigger all Sample objects to expand.

The fix will be to reduce the copy steps to:

DataSource->TsPacket->SampleHolder

We will track Pes and Sample data with lists of pointers into
TsPackets, rather than actually copying the data. We will
recycle these pointers.

The following steps are approximately how the refactor will
progress:

1. Start reducing use of BitArray. It's going to be way too
complicated to track bit-granularity offsets into multiple packets,
and allow reading across packet boundaries. In practice reads
from Ts packets are all byte aligned except for small sections,
so we'll move over to using ParsableByteArray instead, so we
only need to track byte offsets.

2. Move TsExtractor to use ParsableByteArray except for small
sections where we really need bit-granularity offsets.

3. Do the actual optimization.

Issue: #278
- TsExtractor is now based on ParsableByteArray rather than BitArray.
  This makes is much clearer that, for the most part, data is byte
  aligned. It will allow us to optimize TsExtractor without worrying
  about arbitrary bit offsets.
- BitArray is renamed ParsableBitArray for consistency, and is now
  exclusively for bit-stream level reading.
- There are some temporary methods in ParsableByteArray that should be
  cleared up once the optimizations are in place.

Issue: #278
1. AdtsReader would previously copy all data through an intermediate
adtsBuffer. This change eliminates the additional copy step, and
instead copies directly into Sample objects.

2. PesReader would previously accumulate a whole packet by copying
multiple TS packets into an intermediate buffer. This change
eliminates this copy step. After the change, TS packet buffers
are propagated directly to PesPayloadReaders, which are required
to handle partial payload data correctly. The copy steps in the
extractor are simplified from:

DataSource->Ts_BitArray->Pes_BitArray->Sample->SampleHolder

To:

DataSource->Ts_BitArray->Sample->SampleHolder

Issue: #278
There's no code change here at all, except for how TsExtractor's
getLargestSampleTimestamp method works.
I'm not really a fan of micro-optimizations, but given this method
scans through every H264 frame in the HLS case, it seems worthwhile.
The trick here is to examine the first 7 bits of the third byte
first. If they're not all 0s, then we know that we haven't found a
NAL unit, and also that we wont find one at the next two positions.
This allows the loop to increment 3 bytes at a time.

Speedup is around 60% on Art according to some ad-hoc benchmarking.
Reordering in the extractor isn't going to work well with the
optimizations I'm making there. This change moves sorting back
to the renderer, although keeps all of the renderer
simplifications. It's basically just moving where the sort
happens from one place to another.
The complexity around not enabling the video renderer before it
has a valid surface is because MediaCodecTrackRenderer supports
a "discard" mode where it pulls through and discards samples
without a decoder. This mode means that if the demo app were to
enable the renderer before supplying the surface, the renderer
could discard the first few frames prior to getting the surface,
meaning video rendering wouldn't happen until the following sync
frame.

To get a handle on complexity, I think we're better off just removing
support for this mode, which nicely decouples how the demo app
handles surfaces v.s. how it handles enabling/disabling renderers.
- Remove TsExtractor's knowledge of Sample.
- Push handling of Sample objects into SampleQueue as much
  as possible. This is a precursor to replacing Sample objects
  with a different type of backing memory. Ideally, the
  individual readers shouldn't know how the sample data is
  stored. This is true after this CL, with the except of the
  TODO in H264Reader.
- Avoid double-scanning every H264 sample for NAL units, by
  moving the scan for SEI units from SeiReader into H264Reader.

Issue: #278
Use of Sample objects was inefficient for several reasons:

- Lots of objects (1 per sample, obviously).
- When switching up bitrates, there was a tendency for all Sample
  instances to need to expand, which effectively led to our whole
  media buffer being GC'd as each Sample discarded its byte[] to
  obtain a larger one.
- When a keyframe was encountered, the Sample would typically need
  to expand to accommodate it. Over time, this would lead to a
  gradual increase in the population of Samples that were sized to
  accommodate keyframes. These Sample instances were then typically
  underutilized whenever recycled to hold a non-keyframe, leading
  to inefficient memory usage.

This CL introduces RollingBuffer, which tightly packs pending sample
data into a byte[]s obtained from an underlying BufferPool. Which
fixes all of the above. There is still an issue where the total
memory allocation may grow when switching up bitrate, but we can
easily fix that from this point, if we choose to restrict the buffer
based on allocation size rather than time.

Issue: #278
I think this is the limit of how far we should be pushing complexity
v.s. efficiency. It's a little complicated to understand, but probably
worth it since the H264 bitstream is the majority of the data.

Issue: #278
This prevents excessive memory consumption when switching to
very high bitrate streams.

Issue: #278
ojw28 added a commit that referenced this pull request Feb 16, 2015
@ojw28 ojw28 merged commit 953e111 into master Feb 16, 2015
@ojw28 ojw28 deleted the dev-1.2.0-rc branch February 16, 2015 17:48
@google google locked and limited conversation to collaborators Jun 28, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants

Comments