
dev -> dev-1.2.0-rc #294

Merged
ojw28 merged 17 commits into dev-1.2.0-rc from dev
Feb 13, 2015
Conversation


ojw28 (Contributor) commented Feb 13, 2015

No description provided.

ojw28 added 17 commits February 10, 2015 12:25
This is the start of a sequence of changes to fix the referenced
GitHub issue. Currently TsExtractor involves multiple memory
copy steps:

DataSource->Ts_BitArray->Pes_BitArray->Sample->SampleHolder

This is inefficient, but more importantly, the copy into
Sample is problematic, because Samples are of dynamically
varying size. The way we end up expanding Sample objects to
be large enough to hold the data being written means that we
end up gradually expanding all Sample objects in the pool
(which wastes memory), and that we generate a lot of GC churn,
particularly when switching to a higher quality which can
trigger all Sample objects to expand.

The fix will be to reduce the copy steps to:

DataSource->TsPacket->SampleHolder

We will track Pes and Sample data with lists of pointers into
TsPackets, rather than actually copying the data. We will
recycle these pointers.
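
The idea might be sketched roughly like this (SampleExtent and the
method names here are illustrative assumptions, not the real classes):

    import java.util.List;

    // Hypothetical sketch: track where each sample's bytes live inside
    // pooled TS packet buffers instead of copying them into per-sample
    // storage.
    final class SampleExtent {
      final byte[] packetBuffer; // Pooled buffer holding TS packet data.
      final int offset;          // Start of this sample's bytes in the buffer.
      final int length;          // Number of bytes belonging to the sample.

      SampleExtent(byte[] packetBuffer, int offset, int length) {
        this.packetBuffer = packetBuffer;
        this.offset = offset;
        this.length = length;
      }

      // A sample spanning packet boundaries is a list of extents; writing
      // it into a SampleHolder's buffer is then a single gathering copy.
      static void copyTo(List<SampleExtent> extents, byte[] dest) {
        int destOffset = 0;
        for (SampleExtent extent : extents) {
          System.arraycopy(
              extent.packetBuffer, extent.offset, dest, destOffset, extent.length);
          destOffset += extent.length;
        }
      }
    }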

The following steps are approximately how the refactor will
progress:

1. Start reducing use of BitArray. It's going to be way too
complicated to track bit-granularity offsets into multiple packets
and to allow reading across packet boundaries. In practice, reads
from TS packets are all byte aligned except for small sections,
so we'll move over to using ParsableByteArray instead, which means
we only need to track byte offsets.

2. Move TsExtractor to use ParsableByteArray except for small
sections where we really need bit-granularity offsets.

3. Do the actual optimization.

Issue: #278
- TsExtractor is now based on ParsableByteArray rather than BitArray.
  This makes it much clearer that, for the most part, data is byte
  aligned. It will allow us to optimize TsExtractor without worrying
  about arbitrary bit offsets.
- BitArray is renamed ParsableBitArray for consistency, and is now
  exclusively for bit-stream level reading.
- There are some temporary methods in ParsableByteArray that should be
  cleared up once the optimizations are in place.

Issue: #278
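
To illustrate the split, here's a minimal self-contained sketch of the
two reading styles; the real ParsableByteArray and ParsableBitArray
APIs are richer, and the names and methods below are simplified
assumptions:

    // Byte-aligned reading: a single int position is all the state needed.
    final class ByteArrayReader {
      private final byte[] data;
      private int position;

      ByteArrayReader(byte[] data) { this.data = data; }

      int readUnsignedByte() { return data[position++] & 0xFF; }

      int readUnsignedShort() {
        return (readUnsignedByte() << 8) | readUnsignedByte();
      }

      void skipBytes(int count) { position += count; }
    }

    // Bit-granularity reading: must also track a bit offset within the byte.
    final class BitArrayReader {
      private final byte[] data;
      private int byteOffset;
      private int bitOffset; // 0..7 within the current byte.

      BitArrayReader(byte[] data) { this.data = data; }

      int readBits(int numBits) {
        int result = 0;
        for (int i = 0; i < numBits; i++) {
          result = (result << 1) | ((data[byteOffset] >> (7 - bitOffset)) & 1);
          if (++bitOffset == 8) {
            bitOffset = 0;
            byteOffset++;
          }
        }
        return result;
      }
    }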
1. AdtsReader would previously copy all data through an intermediate
adtsBuffer. This change eliminates the additional copy step, and
instead copies directly into Sample objects.

2. PesReader would previously accumulate a whole packet by copying
multiple TS packets into an intermediate buffer. This change
eliminates this copy step. After the change, TS packet buffers
are propagated directly to PesPayloadReaders, which are required
to handle partial payload data correctly. The copy steps in the
extractor are simplified from:

DataSource->Ts_BitArray->Pes_BitArray->Sample->SampleHolder

To:

DataSource->Ts_BitArray->Sample->SampleHolder

Issue: #278
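
A hypothetical sketch of the contract this implies; the real interface
and method signatures may differ:

    // Each TS packet's payload goes straight to the payload reader, which
    // must cope with receiving a PES packet in arbitrary-sized pieces.
    interface PesPayloadReader {
      // Called once per TS packet carrying payload for this stream.
      // payloadUnitStart is true when a new PES packet begins here, so
      // implementations must accumulate state across consecutive calls.
      void consume(byte[] data, int offset, int length, boolean payloadUnitStart);
    }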
There's no functional change here; the only difference is how
TsExtractor's getLargestSampleTimestamp method does its work.
I'm not really a fan of micro-optimizations, but given this method
scans through every H264 frame in the HLS case, it seems worthwhile.
The trick here is to examine the first 7 bits of the third byte
first. If they're not all 0s, then we know that we haven't found a
NAL unit, and also that we won't find one at the next two positions.
This allows the loop to increment 3 bytes at a time.

Speedup is around 60% on ART according to some ad-hoc benchmarking.
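
As a sketch (not the exact method), the scan looks roughly like this:

    // 3-byte-stride scan for the 0x000001 NAL start code. If the top
    // 7 bits of data[i + 2] are non-zero, the byte is neither 0x00 nor
    // 0x01, so no start code can begin at i, i + 1, or i + 2.
    static int findNalStartCode(byte[] data, int start, int end) {
      int i = start;
      while (i < end - 2) {
        if ((data[i + 2] & 0xFE) != 0) {
          i += 3; // Safe to skip three positions at once.
        } else if (data[i] == 0 && data[i + 1] == 0 && data[i + 2] == 1) {
          return i; // Start code found.
        } else {
          i++;
        }
      }
      return end; // Not found.
    }
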
Reordering in the extractor isn't going to work well with the
optimizations I'm making there. This change moves sorting back
to the renderer, although it keeps all of the renderer
simplifications. It's basically just moving where the sort
happens from one place to another.
The complexity around not enabling the video renderer before it
has a valid surface exists because MediaCodecTrackRenderer supports
a "discard" mode where it pulls through and discards samples
without a decoder. This mode means that if the demo app were to
enable the renderer before supplying the surface, the renderer
could discard the first few frames prior to getting the surface,
meaning video rendering wouldn't happen until the following sync
frame.

To get a handle on complexity, I think we're better off just removing
support for this mode, which nicely decouples how the demo app
handles surfaces vs. how it handles enabling/disabling renderers.
- Remove TsExtractor's knowledge of Sample.
- Push handling of Sample objects into SampleQueue as much
  as possible. This is a precursor to replacing Sample objects
  with a different type of backing memory. Ideally, the
  individual readers shouldn't know how the sample data is
  stored (see the sketch below). This is true after this CL,
  with the exception of the TODO in H264Reader.
- Avoid double-scanning every H264 sample for NAL units, by
  moving the scan for SEI units from SeiReader into H264Reader.

Issue: #278
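
A hypothetical sketch of the boundary this establishes (names,
signatures, and the simplified SampleHolder are assumptions):

    // Readers write through a narrow interface and never learn how the
    // sample data is stored, so the backing memory can be swapped later.
    interface SampleQueue {
      // Write side, used by the elementary stream readers.
      void appendData(byte[] data, int offset, int length);
      void commitSample(long timeUs, boolean isKeyframe);

      // Read side: fills the holder with the next complete sample,
      // returning false if none is available yet.
      boolean getSample(SampleHolder holder);
    }

    // Simplified stand-in for the real SampleHolder.
    final class SampleHolder {
      byte[] data;
      int size;
      long timeUs;
    }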
Use of Sample objects was inefficient for several reasons:

- Lots of objects (1 per sample, obviously).
- When switching up bitrates, there was a tendency for all Sample
  instances to need to expand, which effectively led to our whole
  media buffer being GC'd as each Sample discarded its byte[] to
  obtain a larger one.
- When a keyframe was encountered, the Sample would typically need
  to expand to accommodate it. Over time, this would lead to a
  gradual increase in the population of Samples that were sized to
  accommodate keyframes. These Sample instances were then typically
  underutilized whenever recycled to hold a non-keyframe, leading
  to inefficient memory usage.

This CL introduces RollingBuffer, which tightly packs pending sample
data into byte[]s obtained from an underlying BufferPool, fixing all
of the issues above. There is still an issue where the total
memory allocation may grow when switching up bitrate, but we can
easily fix that from this point, if we choose to restrict the buffer
based on allocation size rather than time.

Issue: #278
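
A hypothetical sketch of the RollingBuffer idea (the allocation size
and names are illustrative assumptions):

    import java.util.ArrayDeque;

    // Sample data is packed back-to-back into fixed-size byte[]
    // allocations taken from a pool, so no per-sample byte[] ever
    // needs to grow or be discarded.
    final class RollingBufferSketch {
      private static final int ALLOCATION_SIZE = 64 * 1024;

      private final ArrayDeque<byte[]> pool = new ArrayDeque<>();
      private final ArrayDeque<byte[]> allocations = new ArrayDeque<>();
      private int writeOffset; // Write position within the last allocation.

      void append(byte[] data, int offset, int length) {
        while (length > 0) {
          if (allocations.isEmpty() || writeOffset == ALLOCATION_SIZE) {
            // Reuse a pooled allocation when one is available.
            byte[] allocation =
                pool.isEmpty() ? new byte[ALLOCATION_SIZE] : pool.pop();
            allocations.addLast(allocation);
            writeOffset = 0;
          }
          int copyLength = Math.min(length, ALLOCATION_SIZE - writeOffset);
          System.arraycopy(data, offset, allocations.peekLast(), writeOffset, copyLength);
          writeOffset += copyLength;
          offset += copyLength;
          length -= copyLength;
        }
      }

      // When the oldest data has been read, its allocation returns to
      // the pool instead of becoming garbage.
      void releaseOldestAllocation() {
        if (!allocations.isEmpty()) {
          pool.push(allocations.removeFirst());
        }
      }
    }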
I think this is the limit of how far we should be pushing complexity
vs. efficiency. It's a little complicated to understand, but probably
worth it since the H264 bitstream is the majority of the data.

Issue: #278
This prevents excessive memory consumption when switching to
very high bitrate streams.

Issue: #278
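
One way to do that, as a hypothetical sketch (the budget value and
names are assumptions):

    // Bound buffering by total allocated bytes rather than buffered
    // duration, so a switch to a high bitrate stream cannot balloon
    // memory use.
    static final int TARGET_BUFFER_BYTES = 16 * 1024 * 1024;

    static boolean allowBuffering(int currentAllocatedBytes) {
      // The loader backs off once the byte budget is reached, resuming
      // as samples are consumed and their allocations are recycled.
      return currentAllocatedBytes < TARGET_BUFFER_BYTES;
    }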
ojw28 merged commit 2f04580 into dev-1.2.0-rc Feb 13, 2015
google locked and limited conversation to collaborators Jun 28, 2017