The digital imaging pipeline and objects

This project seeks to understand, and also takes place in, the ‘digital imaging pipeline’: that space of objects and object connections within hardware and software scopic apparatuses where light-becomes-data-becomes-image. Although the term ‘pipeline’ tends to connote process, movement and linear relations, I will come to draw it in terms of objects.

To provide a grounding for these discussions it is necessary to lay out the technological form of that pipeline in order to establish the range of objects we are dealing with.

One way to understand the ‘digital imaging pipeline’ is via its chemical equivalent. I use the term ‘chemical’ rather than ‘analogue’ because I want to avoid debates about an analogue–digital divide. The issue here is not whether one deals in discrete steps of ones and zeros and the other in a smooth curve, but rather the way in which encoding works. The difference that matters for JPEG as protocol is between the action of light on silver halides within a chemical process and on silicon within a digital process.

In ‘chemical photography’ photographic film carries an emulsion binding silver halide crystals to a gelatine base. Silver halide consists of silver combined with a halogen element, such as chlorine, bromine or iodine. These crystals react to the light that hits them, forming a latent image which, when the film is developed, is amplified to form a visible, black image where light has struck. When the film is ‘fixed’ the remaining unexposed crystals are removed, leaving a negative image on film. Vastly simplified, the chemical imaging pipeline can be characterised as: light hits silver creating a latent image; development amplifies the latent image creating the final image. Of course photography adds other stages and technologies to the process. Most photographers want to turn the negative into a positive. By shining light through the negative onto a paper coated with a similar silver halide emulsion, the exposed areas (black in the negative) stop light hitting the paper, while unexposed areas (clear in the negative) let light hit the crystals.⁠1 What was light in the scene and black in the negative becomes light in the print, and vice versa. There are other technologies (or objects as I would refer to them) in play: lenses (or in my case pinholes); camera apparatuses including the shutter and aperture assembly; enlargers; film and paper etc. These are of course also in play in digital photography. What is different is the encoding – the journey of light through latent image or data to visible image.

Many things are similar, metaphorically and literally. Concentrating just on the encoding: light hits a sensor (silicon rather than silver halide). This generates data (electronic information rather than a latent image in silver) that becomes an image (through software and protocol processing rather than through chemical development). But there are important differences that affect how JPEG, as my main focus, works within imaging and creates images.

To work with objects, the first object in the digital imaging pipeline is the sensor. In digital photography these come in one of two types: CCD (charge-coupled device) and CMOS (complementary metal-oxide semiconductor) sensors.⁠2

Sensors are effectively an array of silicon, solar or photovoltaic cells. When light hits one of these cells, some of its energy is absorbed by the silicon, knocking electrons loose; these are forced to flow in a particular direction, creating a current: photons become electrons, light becomes electricity.

Digital camera sensors have either a red, green or blue filter over each pixel/cell, essentially making the cell sensitive only to red, green or blue light. These are arranged in a Bayer mosaic pattern consisting of two green, one red and one blue filter in each 2×2 block – designed to match the bias of human colour perception towards green.
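To make the mosaic concrete, the 2×2 Bayer tile can be sketched in a few lines of Python (an illustrative layout only, not any particular sensor’s implementation):

```python
# The repeating 2x2 Bayer tile: each sensor cell sits under one filter,
# with green sampled twice as often as red or blue.
BAYER_TILE = [["G", "R"],
              ["B", "G"]]

def filter_at(row, col):
    """Return which colour filter covers the cell at (row, col)."""
    return BAYER_TILE[row % 2][col % 2]

# Count the filters over an 8x8 patch of the sensor:
counts = {"R": 0, "G": 0, "B": 0}
for r in range(8):
    for c in range(8):
        counts[filter_at(r, c)] += 1
print(counts)  # green appears twice as often as red or blue
```

Over any even-sided patch the counts come out in the 1:2:1 ratio the pattern is designed for.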

The sensor reads the amount of charge from each cell (what comes to be known as a pixel). These electrical charges need to be collected and organised before they can be processed by other software objects. A CCD sensor handles this differently from a CMOS sensor. In a CCD sensor, a control circuit causes each capacitor to transfer its contents to its neighbour, with the final output read at one corner of the array. In a CMOS sensor, each pixel/cell is accompanied by several transistors that amplify and move the charge using more traditional wires. Thus each pixel can be read individually.⁠3
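The contrast between the two readout styles can be caricatured in code – a conceptual toy only, since real readout happens in analogue hardware:

```python
charges = [[5, 9],
           [3, 7]]  # invented charge values for a tiny 2x2 array

def ccd_readout(array):
    """CCD-style: charges are shifted neighbour to neighbour and leave
    the array serially through a single output."""
    out = []
    for row in array:
        shift_register = list(row)
        while shift_register:
            out.append(shift_register.pop(0))  # edge charge reaches the output
    return out

def cmos_readout(array, row, col):
    """CMOS-style: any pixel can be addressed and read individually."""
    return array[row][col]

print(ccd_readout(charges))         # the whole array, in shift order
print(cmos_readout(charges, 1, 0))  # one pixel, read directly
```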

At this point the light-as-electricity is still ‘analogue’. In order for the software (including JPEG) to be able to work with it, it needs to become digital. Here we come to our second object⁠4: the analogue-to-digital converter (ADC). An ADC is an integrated circuit that samples the analogue feed from the sensor into a number of discrete levels of brightness. Many cameras use 12-bit ADCs, which allow 4096 distinct values for the brightness of each pixel.⁠5
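The ADC’s sampling can be sketched as simple arithmetic – mapping a continuous voltage onto 2^bits discrete levels. This illustrates the principle only, not any real converter:

```python
def adc_sample(voltage, v_max=1.0, bits=12):
    """Map an analogue voltage in [0, v_max] onto one of 2**bits levels.
    Illustrative arithmetic only; real ADCs are hardware circuits."""
    levels = 2 ** bits
    level = int(voltage / v_max * (levels - 1))
    return max(0, min(levels - 1, level))  # clamp to the valid range

# The same half-scale signal, at different bit depths:
print(adc_sample(0.5, bits=12))  # 2047 of 4096 levels
print(adc_sample(0.5, bits=8))   # 127 of 256 levels
print(adc_sample(0.5, bits=2))   # 1 of only 4 levels
```

The fewer the bits, the coarser the record of brightness: the 2-bit converter of footnote 5 can only ever say 0, 1, 2 or 3.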

This digital information simply records the luminance at each location on the sensor. This is greyscale data. The ADC adds extra information to its output: information about a pixel’s location (and hence whether it was ‘under’ a red, green or blue filter); metadata about the sensor’s colour space; and the camera’s white balance setting. This digital information becomes the RAW data file that is written to the camera’s storage medium.⁠6

Because each pixel/cell senses only one band of light (red, green or blue), the information making up the ‘latent image’ needs to be interpolated so that the image can represent, say, the amount of red across the whole image, not just at those points where a filter measured red light. To do this a ‘demosaicing algorithm’ averages the values from the closest surrounding pixels to assign a ‘true colour’ to each pixel. This data can then be encoded as a visible colour image file. It is here that JPEG comes in.
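A toy version of that averaging, over a hypothetical 4×4 readout with invented values (real demosaicing algorithms are considerably more sophisticated):

```python
# A toy 4x4 sensor readout under a Bayer layout (G R / B G repeating tile).
# The raw numbers are invented purely for illustration.
tile = [["G", "R"], ["B", "G"]]
filters = [[tile[r % 2][c % 2] for c in range(4)] for r in range(4)]
raw = [[10, 200, 12, 210],
       [60, 14, 62, 16],
       [11, 205, 13, 215],
       [61, 15, 63, 17]]

def demosaic_at(row, col, colour):
    """Estimate `colour` at (row, col): use the cell's own reading if its
    filter matches, otherwise average the adjacent cells that measured it."""
    if filters[row][col] == colour:
        return raw[row][col]
    values = []
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            r, c = row + dr, col + dc
            if 0 <= r < 4 and 0 <= c < 4 and filters[r][c] == colour:
                values.append(raw[r][c])
    return sum(values) / len(values)

# The green cell at (1, 1) gets its red value from the red cells above and below
print(demosaic_at(1, 1, "R"))  # 202.5, the average of 200 and 205
```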

The demosaicing algorithm outputs three 8-bit colour channels of data as opposed to the single 12-bit RAW channel. Protocols within the in-camera software encode those three channels in particular formats: usually either a 24-bit TIFF or a 24-bit JPEG/JFIF or JPEG/EXIF file (see below).

To concentrate just on the JPEG protocol’s processing of that RAW feed of data, the digital imaging pipeline continues in four steps: Sampling, Discrete Cosine Transform, Quantization and Huffman Coding (Miano 1999, p. 44). At the end, the light-as-data is a JFIF image, commonly known as a ‘jpeg photograph’.

The pixel data is first converted from RGB to YCbCr colorspace. The JPEG protocol is principally about compression. Its role in the imaging pipeline is to reduce the amount of data in the file – hence its importance in the early days of the Internet when bandwidth was at a premium. Part of the work of compression is the move from RGB to YCbCr. Storing image data in either colorspace demands three channels of information: in RGB, red, green and blue; in YCbCr, luminance (Y) and two chrominance channels, blue (Cb) and red (Cr) (Miano 1999, p. 6). Both allow a full range of colours, but in RGB each channel is sampled at the same frequency, while in YCbCr this can be varied. The Y component contributes most information to the visible image, and JPEG therefore assigns more weight to that component and reduces the amount of information in the Cb and Cr channels, thus reducing the amount of information and so the file size. As John Miano explains:

“By adjusting the sampling frequencies you could include each pixel’s Y component value in the compressed data and 1 value for every 4 pixels from the other components. Instead of storing 12 values for every 4 pixels, you would be storing 6 – a 50% reduction” (Miano 1999, p. 41).
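The colorspace conversion itself is fixed arithmetic. A sketch using the ITU-R BT.601 weights that JFIF specifies, together with Miano’s subsampling sum:

```python
def rgb_to_ycbcr(r, g, b):
    """The RGB -> YCbCr conversion used by JFIF (ITU-R BT.601 weights,
    with the chrominance channels centred on 128)."""
    y  =  0.299    * r + 0.587    * g + 0.114    * b
    cb = -0.168736 * r - 0.331264 * g + 0.5      * b + 128
    cr =  0.5      * r - 0.418688 * g - 0.081312 * b + 128
    return y, cb, cr

# A pure grey pixel carries no chrominance: Cb and Cr sit at their midpoint
print(tuple(round(v, 6) for v in rgb_to_ycbcr(100, 100, 100)))  # (100.0, 128.0, 128.0)

# Miano's arithmetic: with the chroma channels sampled once per 2x2 block,
# 4 pixels need 4 Y values + 1 Cb + 1 Cr = 6 stored values instead of 12.
y_samples, chroma_samples = 4, 2
print(y_samples + chroma_samples)  # 6 rather than 3 * 4 = 12
```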

The next step in JPEG encoding is the ‘Discrete Cosine Transform’ (DCT). First the protocol divides the YCbCr image data into 8×8 blocks called data units.⁠7 DCT does not actually compress or throw information away; it merely readies the data for that to happen in the next step by sorting out the information which can safely be discarded. It can be assumed that, over an 8×8 block, the values of the Y, Cb and Cr components will not vary greatly. Rather than record the individual value of each component, we could average the values for each block and record how each pixel differs from that average value.

DCT takes the set of values in each data unit and transforms it into a set of coefficients of cosine functions with increasing frequencies (Miano 1999, pp. 77-90). DCT arranges the digital information ready for compression by finding the frequency components of each block – in lay terms, how quickly the tone or colour values change across it.
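The transform can be written out directly from its definition. A brute-force sketch of the forward 8×8 DCT (the level shift to signed values that JPEG also performs is omitted for clarity; real encoders use fast factorised versions):

```python
import math

def dct_2d(block):
    """Forward 8x8 DCT: spatial values in, frequency coefficients out."""
    def c(k):  # the 1/sqrt(2) normalisation for the zero-frequency terms
        return 1 / math.sqrt(2) if k == 0 else 1.0
    out = [[0.0] * 8 for _ in range(8)]
    for u in range(8):
        for v in range(8):
            total = 0.0
            for x in range(8):
                for y in range(8):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / 16)
                              * math.cos((2 * y + 1) * v * math.pi / 16))
            out[u][v] = 0.25 * c(u) * c(v) * total
    return out

# A perfectly flat data unit has no variation: all its energy collapses
# into the single zero-frequency (DC) coefficient.
flat = [[100] * 8 for _ in range(8)]
coeffs = dct_2d(flat)
print(round(coeffs[0][0]))        # 800, i.e. 8 x the average value
print(abs(coeffs[3][5]) < 1e-9)   # True: every other coefficient is (near) zero
```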

JPEG compression depends on the fact that human perception is not perfect. A lot of information can be thrown away and, effectively, we fill in the gaps in a similar way to the demosaicing algorithm. The next step takes the sorted data from the DCT and discards those coefficients that contribute less information to the image.⁠8 This is the Quantization step. Quantization is a “fancy name for division. To quantize the DCT coefficients we simply divide them by another value and round to the nearest integer” (Miano 1999, p. 88). This rounding process effectively discards some of the coefficients, and so information, because their values become zero.

The JPEG standard does not specify the values to be used; it leaves that up to the application using the protocol. Rather it provides 8×8 ‘quantization tables’ that map onto the 8×8 data units. We normally come across these tables when we choose the ‘quality’ setting for JPEG compression in end-user software such as Photoshop, or select Fine, HQ or SHQ quality settings in a camera.
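Quantization really is just division and rounding. A sketch with a made-up table (the divisor values here are illustrative, not those of any real application or camera):

```python
def quantize(coeffs, table):
    """Divide each DCT coefficient by its table entry and round -
    the one step in the pipeline where information is actually lost."""
    return [[round(coeffs[u][v] / table[u][v]) for v in range(8)]
            for u in range(8)]

# An invented table: divisors grow with frequency, so high-frequency
# detail is rounded away more aggressively.
table = [[10 * (1 + u + v) for v in range(8)] for u in range(8)]
coeffs = [[800.0 if (u, v) == (0, 0) else 3.0 for v in range(8)]
          for u in range(8)]
q = quantize(coeffs, table)
print(q[0][0])  # 80: the large DC coefficient survives division by 10
print(q[7][7])  # 0: the small high-frequency coefficient vanishes (3 / 150)
```

A larger table (a lower ‘quality’ setting) zeroes more coefficients and so produces a smaller file.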

Having discarded data from the RAW data file, JPEG’s final step is to create a visible (JFIF) file. This is achieved through Huffman coding. Like the DCT, Huffman coding takes the set of values in each data unit and transforms it into another set of values. Unlike the DCT, Huffman coding is lossless – no further information is discarded. Rather, this process saves further space by assigning shorter codes to the most frequently occurring values, according to a Huffman table – just as Morse code gives vowels shorter symbols than x or z. As Calvin Haas explains: “Creating these tables generally involves counting how frequently each symbol (DCT code word) appears in an image, and allocating the bit strings accordingly. But, most JPEG encoders simply use the huffman tables presented in the JPEG standard” (2008).

Having mapped the data to new (shorter) values according to a Huffman table, the resultant file must include that table (or reference the standard table) to enable other software to decode the data as a visible image.
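The principle can be sketched generically – frequent symbols get short bit strings. This is the general Huffman algorithm, not the specific table format the JPEG standard defines:

```python
import heapq
from collections import Counter

def huffman_codes(symbols):
    """Build a prefix code in which frequent symbols get shorter bit strings.
    A generic sketch, not JPEG's own table representation."""
    freq = Counter(symbols)
    # Heap entries: (count, unique tie-breaker, {symbol: bit-string})
    heap = [(n, i, {s: ""}) for i, (s, n) in enumerate(freq.items())]
    heapq.heapify(heap)
    tick = len(heap)
    while len(heap) > 1:
        n1, _, c1 = heapq.heappop(heap)  # the two rarest subtrees...
        n2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + bits for s, bits in c1.items()}
        merged.update({s: "1" + bits for s, bits in c2.items()})
        heapq.heappush(heap, (n1 + n2, tick, merged))  # ...are merged
        tick += 1
    return heap[0][2]

# 'a' occurs 7 times, 'b' 3, 'c' 2, 'd' once:
codes = huffman_codes("aaaaaaabbbccd")
print(sorted(codes.items()))  # the commonest symbol gets the shortest code
```

Encoding the 13-symbol string with these codes takes 7×1 + 3×2 + 2×3 + 1×3 = 22 bits, against 26 with a flat 2-bit code.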

Having started as light photons, been turned into electrical charge and from there into data, the resultant information has been sorted and compressed by JPEG into a file ready to be written (alongside a RAW file) to the camera’s memory. The JPEG protocol wraps the compressed data within a format that includes the Huffman and quantization tables necessary to decode the compressed data, the data itself, and a series of markers that break the stream of encoded data into its component structures. These markers are 2 bytes in length, with the second byte denoting the type of marker.

One such marker is the APP marker, which holds application-specific data. APP markers are used by software or applications to add information beyond what is demanded by JPEG. An encoder that uses JPEG can specify particular information within an APP marker. This is important when it comes to the two most widely used JPEG-encoded file formats.

JPEG does not define a file format. As John Miano says: “it says nothing about how colors are represented, but deals only with how component values are stored” (1999, p. 40). Other file formats such as TIFF can compress using JPEG. JPEG can therefore write more than one sort of data/image file. The two most common follow the JFIF (JPEG File Interchange Format) (Hamilton 1992) and the EXIF (Exchangeable Image File Format) (CIPA 2011) standards. The two standards are very similar, with EXIF allowing the addition of specific metadata tags but not colour profiles. Most cameras encode to an EXIF file while imaging applications use JFIF. Technically JFIF and EXIF use different APP markers (APP0 and APP1). In practice most photo applications use JFIF and include the metadata from the APP1 marker.⁠9
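The marker structure can be illustrated by scanning a hand-made byte fragment (not a real file) for the 0xFF marker prefix – JFIF opens with an APP0 (0xE0) segment, EXIF with APP1 (0xE1):

```python
def list_markers(data):
    """List the marker type bytes in a JPEG datastream: each marker is
    0xFF followed by a type byte, most with a big-endian length field."""
    markers = []
    i = 0
    while i < len(data) - 1:
        if data[i] == 0xFF and data[i + 1] not in (0x00, 0xFF):
            markers.append(data[i + 1])
            if data[i + 1] in (0xD8, 0xD9):  # SOI/EOI carry no payload
                i += 2
                continue
            length = (data[i + 2] << 8) | data[i + 3]  # includes its own 2 bytes
            i += 2 + length  # skip over the segment's payload
        else:
            i += 1
    return markers

# SOI marker followed by a minimal APP0 segment carrying the JFIF signature:
fragment = bytes([0xFF, 0xD8,                      # SOI: start of image
                  0xFF, 0xE0, 0x00, 0x07]) + b"JFIF\x00"
print([hex(m) for m in list_markers(fragment)])  # ['0xd8', '0xe0']
```

Real decoders do essentially this walk over the stream, dispatching on each marker type to find the tables, the image dimensions and the compressed data itself.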

Other markers provide space in the file for comments; details of the width, height and number of components in the image; and the Huffman and quantization tables.

As I shall discuss in ‘The JPEG object in theory’ and ‘The JPEG object in practice’ chapters, this ‘family of compression algorithms’ (Lane 1999) can be addressed as an object in Harman’s terms, not only in terms of its existence in paper standards documents but also in terms of its ‘weird’ quadruple existence within the digital imaging pipeline. Clearly, however, it is possible to address this whole pipeline (or indeed the chemical imaging pipeline) through OOP.

OOP enables, even forces, us to see a panoply of objects in play in any situation or assemblage. Human and unhuman, material and virtual, even real and imaginary actants (in Latour’s terms) connect and reconnect in ways that we experience as processes or pipelines. In terms of chemical photography, the photographer, lens, shutter blades, gelatine, silver and sodium thiosulfate (fixer) all have their own presence and material actuality. They all do things as individual objects as they connect and reconnect with each other. But they also form components of other objects: the camera, the photographic society, Snappy Snaps, Kodak. It is objects all the way down.

Similarly, in the digital imaging pipeline hardware and software objects, mathematical algorithms and tables, silicon and electrical charges, Adobe, the photographer and photons are all in play. They all have their specificity and their connections. Some are material, others virtual. Some we can distinguish; others – like an algorithm, the law of gravity or Pi – have a weird presence and actuality. Some are often characterised as systems or contexts, but they too are objects, just at a different scale. We may experience the pipeline as a process but what we are really faced with is a network of objects connecting and reconnecting within other objects.

Where OOP (at least in the Harman version I explore) differs is firstly in refusing to leave that focus on objects – refusing to talk of systems, assemblages or contexts as anything other than objects, or as Timothy Morton calls them, ‘hyperobjects’. Secondly, Harman’s OOP refuses to characterise those objects as defined by their relations. Rather they have an existence and, in Jane Bennett’s terms, a vitality that exceeds their relations. Thirdly, those objects are not processes. They are not in flux. Rather, change is a matter of new objects formed in new object connections. Finally, the objects in play in the digital imaging pipeline do not hold anything back. They do not harbour potential. They are fully present in their connections, not waiting to ‘become’.

This perspective on objects runs counter to much discussion about digital/software objects and protocols.

[NOTE: If you’ve managed to get this far and have any knowledge of maths, electrical engineering or suchlike and can tell me if I’ve made any great glaring errors ever… I’d be very grateful!]
  • CIPA 2011, Exchangeable Image File Format For Digital Still Cameras: Exif Version 2.3, Camera & Imaging Products Association. Retrieved September 14, 2011, from
  • Haas, C 2008, JPEG Huffman Coding Tutorial. Retrieved September 13, 2011, from
  • Hamilton, E 1992, JPEG File Interchange Format Version 1.02, World Wide Web Consortium. Retrieved September 14, 2011, from
  • Lane, T 1999, JPEG image compression FAQ, part 1/2. Retrieved September 14, 2011, from
  • Miano, J 1999, Compressed Image File Formats, Addison Wesley, Reading, Mass.


1 For the sake of clarity I focus on basic black and white photography rather than colour imaging or reversal (slide) photography.

2 The camera I used as the basis for my RAW/JPEG imaging apparatus (the Olympus E-420) uses a Live MOS sensor, the brand name used by Leica, Panasonic and Olympus in the Four Thirds System cameras they have manufactured since 2006. The companies claim the sensor can achieve the same image quality as CCD-based sensors while keeping energy consumption down to CMOS levels.

3 CCD sensors are generally seen as more expensive and more power hungry, but also as having higher sensitivity and being capable of delivering higher quality. CMOS sensors tend to be found in mobile phone cameras.

4 Of course OOP would be clear that we have already been dealing with a whole series of nested objects in terms of the sensor, but for clarity’s sake I outline only the key actants.

5 A bit in computer terms has a value of on or off, one or zero. A two-bit ADC would divide the information from the sensor into the levels 00, 01, 10 and 11. An 8-bit ADC can divide it into 256 levels, from 00000000 to 11111111. My Olympus E-420 uses a 12-bit ADC, dividing the information into 4096 possible levels.

6 This RAW data/file is not ‘pure’. Each camera has its own way of writing the RAW data, its own format. RAW converters (the part of software that interprets and renders that data as an image within other software) have to know the various formats Olympus, Nikon, Canon etc. use in order to ‘make sense’ of that data.

7 JPEG works with 8-bit data.

8 This is why JPEG compression is referred to as ‘lossy compression’: data is lost.

9 Strictly speaking this goes against the standards with both JFIF and EXIF demanding that their marker is first in the datastream. As with much software, this demand is fudged.