A tutorial for mjpeg.decoder.winrt - Source code for an MJPEG Decoder running on Windows RT (8.1 Universal Apps)

February 22, 2015

For anyone interested in integrating an MJPEG stream decoder into a Windows 8.1 Universal App (i.e. targeting both Windows 8.1 and Windows Phone 8.1 'store' apps), I have published a project on CodePlex that will help you with this. You can find the project here. The current version is v1.1.0.

The project is a rework of another MJPEG Decoder project on CodePlex (see here), which lacks support for Windows Runtime 8.1 apps.

The modified decoder adopts the API of the original project, except for the following:

  • The name of the decoder class has changed. It is now MJPEGDecoder.
  • The name of the parse stream method has changed; it is now ParseStreamAsync().
  • A number of read/write properties have been added to configure certain settings in the decoder.

Internally, the 2 most important changes are:

  • The decoder now creates an HTTP connection (with, for instance, an IP camera) using a C# HttpClient from the Windows.Web.Http namespace. This is now the recommended way to implement HTTP connection capabilities in Windows 8.1 Universal Apps. Because the API of the new HttpClient is radically different from other well-known HttpClient implementations on Windows, it turned out to be difficult to integrate the modified code into the original decoder, so I decided to create a completely new project.
  • The decoder (starting with version 1.1.0) now also supports 2 mechanisms to extract an image frame from the input stream:
    • Extraction based on the Content-Length header in the MJPEG boundary section. This is the most efficient way of dealing with MJPEG and leads to considerably higher fps than the second mechanism.
    • Extraction based on detecting the next boundary section, followed by the JPEG lead-in bytes, and then retrieving chunks of image data until the next boundary marker is detected. This involves a lot of I/O on the TCP layer (because of the relatively small chunk buffers) and therefore doesn't give the highest possible fps.

Of course, all thanks must go to Brian Peek, the developer of the original MJPEG Decoder. The code he wrote in MjpegDecoder.cs is extremely well structured and so it wasn't too difficult to modify it.

1 Usage

Usage is more or less the same as with the original MjpegDecoder.cs file:

  1. There are 2 ways to integrate the modified decoder software:
    • Copy the MJPEGDecoder.cs file that is available in the Codeplex project somewhere in your Visual Studio project.
    • Build or download the MJPEGDecoderWinRTLib.dll file and reference it in your project.
  2. In your app, create an MJPEGDecoder object and attach handlers to the FrameReady and Error events.
  3. Optionally, modify the default chunk sizes of the decoder and/or force the decoder to use a specific image detection mode by setting the appropriate decoder properties (ChunkSize, BoundaryChunkSize, and ImgDetectionMode).
  4. When ready to start listening for the MJPEG stream, invoke the ParseStreamAsync method on the decoder object. You must pass a suitable URI string, and optionally a username and a password. The URI string is typically of the form 'http://hostAddress[":" port]["/" resourcePath]'. Almost all IP cameras require a resourcePath to tell the camera which video stream it has to deliver. For MJPEG it will be something like "video.mjpg", "video2.mjpg", "videostream.cgi", etc.
  5. In the FrameReady event handler, get the byte array of the received frame, write the array to an InMemoryRandomAccessStream object, and use that object as the source for your BitmapImage.

The following piece of code shows the above in full detail. The example code assumes that the MVVM design pattern is used.

1.1 The ViewModel

The approach I typically take when using a ViewModel is to pass the ViewModel object to the View (the XAML and its code-behind) as the parameter of the Frame.Navigate method. This way the NavigationHelper_LoadState(object sender, LoadStateEventArgs e) method in the code-behind can retrieve the ViewModel and assign it to defaultViewModel["Root"]. Inside the XAML one can then reference the ViewModel's properties and commands.
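
For example, the navigation call in the page (or command) that opens the camera view could look something like this (selectedCamera is just an illustrative variable; use whatever Camera object your app provides):

// Hypothetical example: create the ViewModel and hand it to the target page via Frame.Navigate.
// NavigationHelper_LoadState on the target page receives it as e.NavigationParameter.
var viewModel = new ViewCameraPageViewModel(this.Frame, selectedCamera);
this.Frame.Navigate(typeof(ViewCameraPage), viewModel);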

This is what the ViewCameraPageViewModel properties and commands could look like:

class ViewCameraPageViewModel: BindableBase
{
    private Frame _frame;
    Camera _camera;
    BitmapImage _bitmap;
    string _errorMsg;
    string _requestUri;
    MJPEGDecoder _mjpegDecoder;

    TRelayCommand<object> _startCameraStreamCmd;
    TRelayCommand<object> _stopCameraStreamCmd;

    public ViewCameraPageViewModel(Frame frame, Camera camera)
    {
        _frame = frame;
        _camera = camera;
        _bitmap = new BitmapImage();
        _errorMsg = "";
        _mjpegDecoder = new MJPEGDecoder();
    }

    public Camera Camera
    {
        get { return _camera; }
        set { _camera = value; }
    }

    public BitmapImage CameraBitmap
    {
        get { return _bitmap; }
        set { SetProperty(ref _bitmap, value); }
    }

    public string ErrorMsg
    {
        get { return _errorMsg; }
        set { SetProperty(ref _errorMsg, value); }
    }

    public string RequestUri
    {
        get { return _requestUri; }
        set { SetProperty(ref _requestUri, value); }
    }

    public ICommand StartCameraStreamCmd
    {
        get
        {
            if (_startCameraStreamCmd == null)
            {
                _startCameraStreamCmd = new TRelayCommand<object>(
                    async (o) =>
                    {
                        // Register listener methods
                        _mjpegDecoder.FrameReady += mjpegDecoder_FrameReady;
                        _mjpegDecoder.Error += mjpegDecoder_Error;

                        // Construct Http Uri
                        RequestUri = String.Format("{0}/{1}", Camera.Uri, Camera.CameraResourcePath);

                        // Tell MJPEGDecoder to connect to the IP camera, parse the mjpeg stream, and 
                        // report the received image frames.
                        await _mjpegDecoder.ParseStreamAsync(RequestUri, Camera.UserName, Camera.Password);
                    });
            }
            return _startCameraStreamCmd;
        }
    }

    public ICommand StopCameraStreamCmd
    {
        get
        {
            if (_stopCameraStreamCmd == null)
            {
                _stopCameraStreamCmd = new TRelayCommand<object>(
                    (o) =>
                    {
                        _mjpegDecoder.StopStream();
                    });
            }
            return _stopCameraStreamCmd;
        }
    }
    ...
}

This is all pretty straightforward, except maybe for the command type, where you might expect a plain RelayCommand. Although not necessary here, I always use the generic version, TRelayCommand<T>, in order to be able to pass a parameter with the command when that would be required.
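
TRelayCommand<T> is not a framework class; it is a small helper of my own. If you don't have an equivalent at hand, a minimal sketch (illustrative only, not the exact implementation used in the test apps) could look like this:

using System;
using System.Windows.Input;

// Minimal generic relay command: wraps delegates so a ViewModel can expose them as ICommand.
public class TRelayCommand<T> : ICommand
{
    private readonly Action<T> _execute;
    private readonly Func<T, bool> _canExecute;

    public TRelayCommand(Action<T> execute, Func<T, bool> canExecute = null)
    {
        if (execute == null) throw new ArgumentNullException("execute");
        _execute = execute;
        _canExecute = canExecute;
    }

    public event EventHandler CanExecuteChanged;

    public bool CanExecute(object parameter)
    {
        return _canExecute == null || _canExecute((T)parameter);
    }

    public void Execute(object parameter)
    {
        _execute((T)parameter);
    }

    public void RaiseCanExecuteChanged()
    {
        EventHandler handler = CanExecuteChanged;
        if (handler != null) handler(this, EventArgs.Empty);
    }
}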

As you can see in StartCameraStreamCmd, the viewmodel registers for the FrameReady and Error events and invokes ParseStreamAsync to start listening for an MJPEG stream. The URI, resource path, username and password are properties of the Camera model object (which I will not explain further here, because it should be obvious what this class does).

The 2 event handlers in ViewCameraPageViewModel look as follows:

class ViewCameraPageViewModel: BindableBase
{
    ...

    private async void mjpegDecoder_FrameReady(object sender, FrameReadyEventArgs e)
    {
        // Copy the received FrameBuffer to an InMemoryRandomAccessStream.
        using (InMemoryRandomAccessStream ms = new InMemoryRandomAccessStream())
        {

            using (DataWriter writer = new DataWriter(ms.GetOutputStreamAt(0)))
            {
                writer.WriteBytes(e.FrameBuffer);
                await writer.StoreAsync();
            }
            // Update source of CameraBitmap with the memory stream
            CameraBitmap.SetSource(ms);
        }
    }

    private void mjpegDecoder_Error(object sender, ErrorEventArgs e)
    {
        ErrorMsg = e.Message;
    }
}

When ViewCameraPageViewModel receives a frame, it creates an InMemoryRandomAccessStream object and writes the byte array of the received frame to it. When done, the memory stream is passed as the new source to the CameraBitmap property (a BitmapImage object that can be bound to an Image control in XAML and shown to the user).

1.2 The View

The XAML/View code-behind file is best created using the Visual Studio wizard. Make sure you pick a page template that comes with navigation support.

The most important code for our example is listed here:

public sealed partial class ViewCameraPage : Page
{
    private NavigationHelper navigationHelper;
    private ObservableDictionary defaultViewModel = new ObservableDictionary();
    private ViewCameraPageViewModel _pageViewModel;

    ...

    private void NavigationHelper_LoadState(object sender, LoadStateEventArgs e)
    {
        _pageViewModel = e.NavigationParameter as ViewCameraPageViewModel;
        defaultViewModel["Root"] = _pageViewModel;
        _pageViewModel.StartCameraStreamCmd.Execute(null);
    }

    protected override void OnNavigatedFrom(NavigationEventArgs e)
    {
        _pageViewModel.StopCameraStreamCmd.Execute(null);
        this.navigationHelper.OnNavigatedFrom(e);
    }
}

As you can see the ViewCameraPageViewModel object is assigned to defaultViewModel["Root"] when the page is loaded. At that moment the viewmodel is also instructed to start the camera stream.

The stream is stopped when we navigate away from the page.

Finally, the XAML part looks as follows:

<Page
    x:Class="MJPEGDecoderWinRT.View.ViewCameraPage"
    xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
    xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
    xmlns:local="using:MJPEGDecoderWinRT.View"
    xmlns:d="http://schemas.microsoft.com/expression/blend/2008"
    xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
    DataContext="{Binding DefaultViewModel.Root, RelativeSource={RelativeSource Self}}"
    mc:Ignorable="d"
    Background="{ThemeResource ApplicationPageBackgroundThemeBrush}">

    <Grid>
        <Grid.RowDefinitions>
            <RowDefinition Height="Auto" />
            <RowDefinition Height="Auto" />
            <RowDefinition Height="Auto" />
            <RowDefinition Height="Auto" />
        </Grid.RowDefinitions>
        <StackPanel Grid.Row="0" Orientation="Horizontal" Margin="12,12,0,0">
            <TextBlock Text="camera:" FontSize="14" Width="60" />
            <TextBlock Text="{Binding Camera.Name}" FontSize="14"/>
        </StackPanel>
        <StackPanel Grid.Row="1" Orientation="Horizontal" Margin="12,0,0,0">
            <TextBlock Text="uri:" FontSize="14" Width="60" />
            <TextBlock Text="{Binding RequestUri}" FontSize="14"/>
        </StackPanel>
        <Image Grid.Row="2" Source="{Binding CameraBitmap}" Stretch="UniformToFill" />
        <TextBlock Grid.Row="3" Text="{Binding ErrorMsg}"
                   Margin="12,0,0,12" FontSize="16" Foreground="Red" TextWrapping="Wrap"/>
    </Grid>
</Page>

You can see that ViewCameraPageViewModel is bound to the XAML through the DataContext attribute in the XAML page definition.

The actual camera picture is rendered by means of an Image control, which is bound to the CameraBitmap property of the viewmodel.

2 Configuring the decoder

It is possible (through the ImgDetectionMode property) to configure the method the decoder uses to extract picture frames from the input stream. You do this by setting the property to one of 3 possible values (a short configuration example follows the list):

  • ImageDetectionMode.ContentLengthBased
  • ImageDetectionMode.BoundaryBased
  • ImageDetectionMode.Auto
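
For example, forcing content-length based extraction and tweaking the chunk sizes could look like this (the values are only examples; pick what suits your camera):

// Example configuration; the values shown here are illustrative, not recommendations.
_mjpegDecoder.ImgDetectionMode = ImageDetectionMode.ContentLengthBased;
_mjpegDecoder.ChunkSize = 4096;          // chunk size used when reading image data
_mjpegDecoder.BoundaryChunkSize = 200;   // buffer size used when reading a boundary section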

ImageDetectionMode.ContentLengthBased

In this mode the decoder assumes that each boundary section in the stream contains a Content-Length attribute. The content-length value tells the decoder exactly how many bytes of image data to read from the HTTP input stream.
This approach is the fastest, requires the least processing and consequently has the best fps potential.

An MJPEG image boundary section must then look like this:

--myboundary
Content-Type: image/jpeg
Content-Length: 64199
\r\n

The MJPEG image data follows the boundary section and always starts with the bytes 0xff, 0xd8 (the JPEG start-of-image marker).

Note: I have seen some variants with different MJPEG IP cameras:

  • 'Content-Length' is spelled 'Content-length' on a D-Link
  • Some cameras add a 'Date' MIME header
  • Some devices send \r\n bytes before --myboundary.

The decoder takes care of this by looking for the string "ength:" (instead of "Content-Length") and by making the buffer into which the boundary data is read a little bit bigger (130 bytes). If that is not sufficient, you can change it through the BoundaryChunkSize property.
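
As a simplified sketch (not the decoder's actual code), extracting the content length from a boundary section could be done like this:

// Simplified illustration, not the actual decoder code.
// 'boundaryText' is assumed to hold the text of one boundary section.
private static int ExtractContentLength(string boundaryText)
{
    // Search for "ength:" so that both "Content-Length" and "Content-length" match.
    int idx = boundaryText.IndexOf("ength:", StringComparison.Ordinal);
    if (idx < 0)
        return -1;                                    // no Content-Length header present

    int start = idx + "ength:".Length;
    int end = boundaryText.IndexOf('\r', start);
    if (end < 0)
        end = boundaryText.Length;

    // The resulting value is the exact number of image bytes to read next.
    return int.Parse(boundaryText.Substring(start, end - start).Trim());
}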

ImageDetectionMode.BoundaryBased

In this mode we still assume, of course, a boundary section, but we isolate the image data by searching for the JPEG start bytes (0xff, 0xd8) and then treating all data as image data until we reach the next boundary marker.
Since we don't know the size of the image, we have to read from the input stream in chunks.
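
The following pseudo-C# sketch shows the idea (heavily simplified: ReadChunkAsync, FindBytes and OnFrameReady are hypothetical helpers, and markers that straddle two chunks are ignored; the real decoder has to handle that case):

// Pseudo-C# sketch of boundary based extraction; not the actual decoder code.
List<byte> image = new List<byte>();
bool insideImage = false;
byte[] jpegStart = new byte[] { 0xff, 0xd8 };                       // JPEG start-of-image bytes
byte[] boundary = Encoding.UTF8.GetBytes("--myboundary");

while (streaming)
{
    byte[] chunk = await ReadChunkAsync(inputStream, chunkSize);    // hypothetical read helper

    if (!insideImage)
    {
        int start = FindBytes(chunk, jpegStart);                    // hypothetical pattern search
        if (start >= 0)
        {
            insideImage = true;
            image.AddRange(chunk.Skip(start));                      // keep data from 0xff 0xd8 onwards
        }
    }
    else
    {
        int end = FindBytes(chunk, boundary);
        if (end < 0)
        {
            image.AddRange(chunk);                                  // still inside the image
        }
        else
        {
            image.AddRange(chunk.Take(end));                        // image complete
            OnFrameReady(image.ToArray());                          // raise the FrameReady event
            image.Clear();
            insideImage = false;
        }
    }
}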

The chunk size is, however, problematic when reaching for the highest fps. Small chunks inevitably lead to lower fps. Larger chunk sizes (when moved up to, for instance, half of the frame size) give better fps. But if you have to handle multiple types of devices in one app, with various resolutions and quality settings, it will be difficult to define the optimal chunk size unless you can configure it on a per-device basis.

The default chunk size is 1024 but can be modified using the ChunkSize property. Of course, the chunk size must always be smaller than the device's image size; if you set it higher, it becomes problematic to correctly detect boundaries/images.

The next table gives you an idea of how the optimal chunk size varies across different resolutions and picture quality settings:

Resolution  JPEG quality  Chunk size
----------  ------------  ----------
176x144     medium        1400
176x144     excellent     4000
320x240     medium        3000
800x600     excellent     55000
1280x800    standard      30000
1280x800    excellent     100000  

ImageDetectionMode.Auto

This is the default mode of the decoder. It tells the decoder to detect for itself which of the above 2 modes should be chosen, based on the presence of a Content-Length header in the first boundary section that it receives.
If the header is present, the mode is set to ImageDetectionMode.ContentLengthBased; otherwise it is set to ImageDetectionMode.BoundaryBased.
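
Conceptually the decision boils down to something like this (simplified, not the actual decoder code):

// Simplified illustration of the automatic mode selection.
// 'firstBoundaryText' is assumed to hold the text of the first boundary section received.
if (imgDetectionMode == ImageDetectionMode.Auto)
{
    imgDetectionMode = firstBoundaryText.Contains("ength:")
        ? ImageDetectionMode.ContentLengthBased
        : ImageDetectionMode.BoundaryBased;
}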

3 Download

You can download the source code of the latest version of the library and 2 test apps from CodePlex. You will need Visual Studio 2013 (and of course Windows 8.1) to build and test the library and the apps.
