-
Notifications
You must be signed in to change notification settings - Fork 291
/
notes.txt
113 lines (70 loc) · 10.6 KB
/
notes.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
In this text file I keep notes about design decisions and issues I came across while writing this program, so I don't make the same mistakes twice. This should also make it easier to understand the code later.
#### Keep the front-end and back-end completely separated ####
The front-end (GUI) calls the back-end, not the other way around. This way the front-end can easily be replaced with something else (e.g. command-line). The back-end can use Qt Core classes like QString and QThread, but not GUI classes.
Logging is done with the 'Logger' class. This should become an abstract class, in order to make it easier to show log results in different places (e.g. stderr for the command-line interface, and a text area for the GUI).
#### Back-end design ####
The back-end uses a 'push' pipeline design. Classes push their results to the next stage, and the next stage doesn't need to know where these results came from (only where they are supposed to go). This makes it very easy to use the same output class with different input classes. It should also be possible to use the same input class with different output classes by making the output class abstract, but I haven't done this yet because it wasn't needed so far. A push design is a better idea than a pull design in this case, because it makes switching inputs while recording trivial.
The pipeline:
+---------------+ +--------------+
| X11Input | +--------------+ | | +-------+
| or | ---> | | ===> | VideoEncoder | ===> | |
| GLInjectInput | | | | | | |
+---------------+ | Synchronizer | +--------------+ | Muxer |
| | | | | | | |
| ALSAInput | ---> | | ===> | AudioEncoder | ===> | |
| | +--------------+ | | +-------+
+---------------+ +--------------+
"--->" means the result is sent to the next stage and processed immediately.
"===>" means the result is stored in a queue, and will be processed by the next stage later.
- X11Input: Captures the screen, or a part of it. Originally I used the 'x11grab' device from libav/ffmpeg, but this added overhead and was less flexible.
- GLInjectInput: Captures frames from an OpenGL program (which was launched with GLInjectLauncher).
- ALSAInput: Records audio using ALSA. Real PulseAudio support would be nice too (not a priority though).
- Synchronizer: Synchronizes audio and video based on the timestamps of the frames. This class is needed even if there is no audio, because it still has to translate the timestamps to 'pts' values (which are not the same). I've hijacked the AVFrame::pkt_dts field to store the timestamps because it wasn't used (and because I want to avoid confusion with AVFrame::pts which has a very different meaning). The synchronizer drops or inserts frames of video/audio to make sure both have the same length. The synchronizer also stitches together separate segments (when the recording is paused and resumed later). This is more complex than it sounds, trust me :). The synchronizer will also do sample format conversions if needed.
- VideoEncoder/AudioEncoder: Encodes the video/audio frames and generates packets. Inherits BaseEncoder. All the hard work is done by libav/ffmpeg, this is basically a wrapper.
- Muxer: Combines video and audio packets with the required metadata and writes it to a file. Again, all the hard work is done by libav/ffmpeg.
#### How to use libav/ffmpeg ####
Libav/ffmpeg is very confusing and the documentation doesn't really help either. It took a long time to figure out how it all works. The list below describes how I *think* libav/ffmpeg should be used. Note that the order of the function calls is very important. For example, calling avcodec_close before av_write_trailer appears to cause segfaults inside libav/ffmpeg.
Create muxer:
- av_guess_format: get AVOutputFormat pointer
- avformat_alloc_context: get format context
- avio_open: open output file
Create encoders:
- avcodec_find_encoder_by_name: get AVCodec pointer
- avformat_new_stream: create stream for codec
- avcodec_open2: open codec
Start muxer:
- avformat_write_header: write the header
Actual encoding (flushing is optional):
- avcodec_encode_video2/avcodec_encode_audio2: encode a frame
- av_interleaved_write_frame: write a packet (i.e. a frame)
Stop muxer:
- av_write_trailer: write the trailer and free internal data
Destroy encoders:
- avcodec_close: close codec
Destroy muxer:
- avio_close: close output file
- av_freep(formatcontext->stream[i]->codec): free codec context
- av_freep(formatcontext->stream[i]): free stream
- av_free(formatcontext): free format context
#### Multithreading ####
When you write a multithreaded application, you have to think about deadlocks. A simple and reliable way to avoid deadlocks is to give all mutexes a strict (partial) order, and only ever lock them in that order (the order of unlocking is irrelevant but is usually the reverse of the order of locking). This is automatically accomplished by using a 'push' pipeline design where each class only has pointers to successive stages, because this makes it impossible to lock mutexes in the wrong order, as long as all mutexes are private and locks are never held after a function call returns.
There's one exception in my pipeline: Muxer has a pointer to VideoEncoder/AudioEncoder which it uses to tell them to stop, and to delete them. This seems to be unavoidable because of the design of libav/ffmpeg. Special care has to be taken here to avoid deadlocks: BaseEncoder::Stop and 'delete BaseEncoder' are called while no locks in Muxer are held by the thread.
There's also an issue that's not very well documented: just using volatile is not enough to guarantee correct behaviour. The 'volatile' keyword is meant to stop some compiler optimisations, not to ensure correct multithreaded behaviour. Some of the problems with 'volatile':
- It only synchronizes access between two volatile variables, not between a volatile and a normal one. So in order to synchronize multiple variables correctly, you would have to make them ALL volatile, which makes things slow and is simply impossible for STL containers. The solution is to use mutexes instead. Apparently volatile is not even needed here because compilers should be smart enough to understand what mutexes do.
- For lock-free variables (i.e. flags like 'should_finish', 'should_stop', 'error_occurred', ...), volatile will indeed stop compiler optimisations, and even if synchronization with other variables isn't a problem, it is still possible that the CPU itself will reorder memory accesses, or that memory operations won't be visible to other CPU cores because of memory caching issues. This can only be fixed by inserting memory fences (which is also done by mutexes). C++11 provides these in the <atomic> header.
#### OpenGL recording ####
OpenGL recording is usually called 'GLInject' in the code. It consists of three parts:
- A small library that is injected into the program that will be recorded. The library overrides some system functions (most importantly glXSwapBuffers) so it can capture the frames before they are displayed on the screen. It should be relatively easy to draw extra things using OpenGL (such as the frame rate), but the program doesn't do that right now. Frames are sent to the main program using shared memory.
- The GLInjectLauncher class, which creates the shared memory circular buffer that is used for communication with the injected library. It can also launch the program that will be recorded, or alternatively just print the command to do so.
- The GLInjectInput class, which reads frames from the shared memory circular buffer, scales them to the correct output size, flips them vertically (because OpenGL returns images upside-down), and sends them to the synchronizer. The timestamp is also read from shared memory, which is more accurate than using the time the frame was received.
#### Video corruption ####
If the video is corrupted after some change to the back-end, check the timestamps. In most of the cases I've seen, the muxer was simply messing up the interleaving of the video and audio frames. The typical result is video that starts fine, then gets corrupted, and then freezes after a few seconds.
Video can also get corrupted if the frame rate is too low (like 1 fps). This may be a problem with the player though, I'm not sure.
#### Dealing with old versions of libav/ffmpeg ####
This is very annoying. Libav/ffmpeg likes to change its API a lot. The API is usually backwards compatible, but the old functions often have some disadvantages. So for now I'm supporting both old and new versions of the API, based on the preprocessor definitions from the libav/ffmpeg headers.
Libav/ffmpeg has a very useful document that keeps track of all API changes, it's located in 'doc/APIchanges'. The best way to find out how the old API actually worked is to search for the commit that removed the function (just search for the function name), go back one commit, go to the corresponding tree, and look at the example code (e.g. output-example.c for libavformat).
#### GUI ####
The GUI uses a wizard-like layout to show all the settings, rather than separate settings windows. This makes all settings immediately visible, so you're less likely to waste an hour of time because you forgot to turn on the microphone, for example. It also means all settings are easily discoverable. Yes, it means the user has to press 'continue' a few times. But I assume that many of these settings will be changed very often, if not every time. Opening separate windows to change settings, or just to check if they are still correct, would take a far more time.
#### Capturing hotkeys ####
Capturing key presses system-wide is a lot more annoying than I expected. I'm using XGrabKey because this appears to be the right way to do it. But with this function you can't capture all keys, apparently. Function keys are blocked. Ubuntu 12.04 only blocked function keys with no modifiers (and since X considers Num Lock a modifier, it wasn't really blocking anything), but Xubuntu 12.10 appears to block them completely. So for now I'm just relying on the letter keys.
An unexpected but useful side-effect of XGrabKey is that will capture key presses *without* sending the actual key press to the program that you are recording (or any other program actually). This means conflicts are less likely - instead of choosing a hotkey that isn't used by anything, the user can now simply choose any hotkey that he doesn't currently need. And how often do you really *need* something like Ctrl+R to use a program?