Computer Science

Data Compression

Data compression is the process of reducing the size of data to save storage space or transmission time. It is achieved by encoding information using fewer bits than the original representation. This can be done with a variety of algorithms and techniques, such as lossless compression, which preserves all of the original data, or lossy compression, which discards some information to achieve higher compression ratios.

Written by Perlego with AI-assistance

9 Key excerpts on "Data Compression"

  • Elements of Multimedia
    7 Multimedia Data Compression
    7.1 Introduction
    Data compression is one of the most important requirements in the development of multimedia. The need for data compression in computers is a consequence of the limitations of available memory storage capacity and processing capability [1, 2, 3, 4, 5, 6]. The need for data compression is also prompted by the following three factors:
    1. Representation of multimedia data
    2. Storage and processing of data
    3. Transmission of multimedia information
    Some examples give an idea of the representation, storage, and transmission requirements of various multimedia elements. A grayscale image of 320 × 240 pixels at 8 bits per pixel requires about 77 KB (kilobytes) of storage. A color image of 1100 × 900 pixels at 24-bit color requires about 3 MB (megabytes). Color video of 640 × 480 pixels at 24-bit color and 30 frames per second (fps) requires a transmission rate of 27.6 MB per second. High-definition television (HDTV; 1920 × 1080 pixels, 24-bit color, 30 fps) requires about 1.5 Gb/sec; for cable TV, the typical transmission rate is 18 Mb/sec. Audio CD requires a rate of 1.4 Mb/sec, and cinema-quality audio with six channels requires about 1 Gb per hour.
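To make the arithmetic behind these figures explicit, here is a small Python sketch (not from the source text) that reproduces the uncompressed storage and data-rate numbers quoted above:

```python
# Raw storage and transmission requirements for uncompressed media,
# following the examples above (8 bits = 1 byte).

def image_bytes(width, height, bits_per_pixel):
    """Uncompressed image size in bytes."""
    return width * height * bits_per_pixel / 8

def video_bytes_per_second(width, height, bits_per_pixel, fps):
    """Uncompressed video data rate in bytes per second."""
    return image_bytes(width, height, bits_per_pixel) * fps

print(image_bytes(320, 240, 8) / 1e3)                  # ~76.8 KB grayscale image
print(image_bytes(1100, 900, 24) / 1e6)                # ~3.0 MB color image
print(video_bytes_per_second(640, 480, 24, 30) / 1e6)  # ~27.6 MB/s color video
print(1920 * 1080 * 24 * 30 / 1e9)                     # ~1.5 Gbit/s HDTV (in bits)
```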
    To understand why compression is necessary, one must distinguish between data and information [7]. Data is not synonymous with information: information is expressed through data, so the volume of data can exceed the amount of information it actually conveys. Data that provides no relevant information is called redundant data, or redundancy. The goal of multimedia coding or compression is to reduce the amount of data by reducing the amount of redundancy. Different types of redundancy are discussed in a later section.
    Compression
  • Art of Digital Audio
    • John Watkinson (Author)
    • 2013 (Publication Date)
    • Routledge (Publisher)

    5 Compression

    5.1 Introduction

    Compression, bit rate reduction and data reduction are all terms which mean basically the same thing in this context. In essence the same (or nearly the same) audio information is carried using a smaller quantity or rate of data. It should be pointed out that in audio, compression traditionally means a process in which the dynamic range of the sound is reduced, typically by broadcasters wishing their station to sound louder. However, when bit rate reduction is employed, the dynamics of the decoded signal are unchanged. Provided the context is clear, the two meanings can co-exist without a great deal of confusion.
    There are several reasons why compression techniques are popular:
    (a) Compression extends the playing time of a given storage device.
    (b) Compression allows miniaturization. With fewer data to store, the same playing time is obtained with smaller hardware. This is useful in portable and consumer devices.
    (c) Tolerances can be relaxed. With fewer data to record, storage density can be reduced, making equipment which is more resistant to adverse environments and which requires less maintenance.
    (d) In transmission systems, compression allows a reduction in bandwidth which will generally result in a reduction in cost. This may make possible some process which would be uneconomic without it.
    (e) If a given bandwidth is available to an uncompressed signal, compression allows faster than real-time transmission within that bandwidth.
    (f) If a given bandwidth is available, compression allows a better-quality signal within that bandwidth.
    Figure 5.1 In (a) a compression system consists of a compressor or coder, a transmission channel and a matching expander or decoder. The combination of coder and decoder is known as a codec. (b) MPEG is asymmetrical since the encoder is much more complex than the decoder.
    Compression is summarized in Figure 5.1. It will be seen in (a) that the PCM audio data rate is reduced at source by the compressor. The compressed data are then passed through a communication channel and returned to the original audio rate by the expander. The ratio between the source data rate and the channel data rate is called the compression factor; the term coding gain is also used. Sometimes a compressor and expander in series are referred to as a compander. The compressor may equally well be referred to as a coder and the expander as a decoder, in which case the tandem pair may be called a codec.
  • Digital Image Processing with Application to Digital Cinema
    • KS Thyagarajan (Author)
    • 2005 (Publication Date)
    • Routledge (Publisher)
    8 Image Compression
    8.1 INTRODUCTION
    So far we have described several image processing methods that enable us to modify a given image so that a specific image characteristic is altered. This chapter describes another image processing method used in compressing images. Image compression is a digital process by which the amount of data (in bits) in a given image is reduced to as low a level as desired. The need for image compression stems from the fact that more and more image and video data are used for transmission and storage in this Internet age. As more and more TV channels are introduced, transmission bandwidth becomes very precious. For example, the data rates for SDTV and HDTV are shown in Table 8-1: raw video data rates range from about 20 to 120 MB/s. However, the transmission channel bandwidths are around 4 and 8 MHz for SDTV and HDTV, respectively, hence the need for data compression. In fact, the required compression ratios for the two TV systems are about 20:1 and 60:1. The term compression ratio refers to the ratio of the number of bits in the original digital source data to that in the compressed digital data.
    At this point it is worth mentioning the difference between the terms bandwidth compression and data compression. Bandwidth compression refers to an analog signal whose bandwidth is reduced from that of the original signal. Obviously, this can be achieved by filtering the analog signal through an appropriate filter. Data compression, however, refers to a process of reducing the data rate of a digital signal. Data compression does not necessarily reduce the signal bandwidth. Consider, for instance, a speech signal in telephony. The speech signal in native form has a bandwidth of about 4 kHz. Therefore, a transmission medium supporting a maximum of 4 kHz is enough to carry the analog speech signal. If the same speech signal is to be conveyed through digital means, then it has to be sampled at a rate of at least 8 kHz (the Nyquist rate), with each sample quantized to 8 bits or more. So the same analog signal in digital form generates data at a rate of 64 kbits/s. In order to transmit this digital signal using digital modulation schemes, a bandwidth of about 32 kHz is required. In fact, the speech signal in digital form has expanded its bandwidth over that of the same signal in analog form. Even if we compress the digital speech signal by a factor of two, which gives a data rate of 32 kbits/s, it still requires the transmission channel to support a bandwidth of about 16 kHz. Thus, we see that data compression does not reduce the bandwidth of the original signal source, which is analog, although the converse is true. Despite this fact, converting an analog signal into a digital signal enables many service features that the analog signal cannot offer, which is one of the important reasons for the exploding deployment of digital techniques in almost all communications applications.
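As a back-of-the-envelope restatement of the telephony example above (the numbers follow the text; the Python sketch itself is only an illustration):

```python
# Digitizing 4 kHz telephone speech: sample at the Nyquist rate (2 x 4 kHz)
# with 8 bits per sample, as described above.

bandwidth_hz = 4_000             # analog speech bandwidth
sample_rate = 2 * bandwidth_hz   # Nyquist rate: 8 kHz
bits_per_sample = 8

data_rate_bps = sample_rate * bits_per_sample
print(data_rate_bps)             # 64000 bits/s, i.e. 64 kbits/s

# A 2:1 compression still leaves 32 kbits/s, needing roughly 16 kHz of
# channel bandwidth, well above the 4 kHz the analog signal required.
print(data_rate_bps // 2)        # 32000 bits/s
```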
  • Compression for Great Video and Audio
    Master Tips and Common Sense

    • Ben Waggoner (Author)
    • 2013 (Publication Date)
    • Routledge (Publisher)
    In fact, we use randomness as a measure of compressibility. Compression is sometimes called “entropy coding,” since what you’re really saving is the entropy (randomness) in the data, while the stuff that could be predicted from that entropy is what gets compressed away to be reconstructed on decode.

    The More Efficient the Coding, the More Random the Output

    Using a codebook makes the file smaller by reducing redundancy. Because there is less redundancy, there is by definition less of a pattern to the data itself, and hence the data itself looks random. You can look at the first few dozen characters of a text file, and immediately see what language it’s in. Look at the first few dozen characters of a compressed file, and you’ll have no idea what it is.

    Data Compression

    Data compression is compression that works on arbitrary content, like computer files, without having to know much in advance about their contents. There have been many different compression algorithms used over the past few decades. Ones that are currently available use different techniques, but they share similar properties.
    The most-used data compression technique is Deflate, which originated in PKWare's .zip format and is also used in .gz files, .msi installers, HTTP header compression, and many, many other places. Deflate was even used in writing this book: Microsoft Word's .docx format (along with all Microsoft Office ".???x" formats) is really a directory of files that are then Deflated into a single file.
    For example, the longest chapter in my current draft ("Production, Post, and Acquisition") is 78,811 bytes. Using Deflate, it goes down to 28,869 bytes. And if I use an advanced text-tuned compressor like PPMd (included in the popular 7-Zip tool), it can get down to 22,883 bytes. But that's getting pretty close to the theoretical lower limit for how much this kind of content can be compressed. That's called the Shannon limit, and data compression is all about getting as close to that as possible.
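As a rough illustration of Deflate in practice, the sketch below uses Python's standard zlib module, which implements Deflate; the sample text and the resulting sizes are made up for illustration and are not the chapter figures quoted above:

```python
import zlib

# Highly redundant made-up text: repetition is exactly what Deflate exploits.
text = ("Using a codebook makes the file smaller by reducing redundancy. " * 50).encode("utf-8")

compressed = zlib.compress(text, level=9)   # zlib wraps the Deflate algorithm
print(len(text), "->", len(compressed))     # repetitive input shrinks dramatically

assert zlib.decompress(compressed) == text  # lossless: the original is recovered exactly
```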
  • Nine Algorithms That Changed the Future
    The Ingenious Ideas That Drive Today's Computers

    every file. But a good compression algorithm will produce substantial savings on certain common types of files.
    So how can we get our hands on this free lunch? How on earth can you make a piece of data, or information, smaller than its actual "true" size without destroying it, so that everything can be reconstructed perfectly later on? In fact, humans do this all the time without even thinking about it. Consider the example of your weekly calendar. To keep things simple, let's assume you work eight-hour days, five days a week, and that you divide your calendar into one-hour slots. So each of the five days has eight possible slots, for a total of 40 slots per week. Roughly speaking, then, to communicate a week of your calendar to someone else, you have to communicate 40 pieces of information. But if someone calls you up to schedule a meeting for next week, do you describe your availability by listing 40 separate pieces of information? Of course not! Most likely you will say something like "Monday and Tuesday are full, and I'm booked from 1 p.m. to 3 p.m. on Thursday and Friday, but otherwise available." This is an example of lossless data compression! The person you are talking to can exactly reconstruct your availability in all 40 slots for next week, but you didn't have to list them explicitly.
    At this point, you might be thinking that this kind of "compression" is cheating, since it depends on the fact that huge chunks of your schedule were the same. Specifically, all of Monday and Tuesday were booked, so you could describe them very quickly, and the rest of the week was available except for two slots that were also easy to describe. It's true that this was a particularly simple example. Nevertheless, data compression in computers works this way too: the basic idea is to find parts of the data that are identical to each other and use some kind of trick to describe those parts more efficiently.
    This is particularly easy when the data contains repetitions. For example, you can probably think of a good way to compress the following data:
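The book's sample data is not included in this excerpt, but the general idea it points at can be illustrated with a minimal run-length encoding sketch (my construction, not the author's):

```python
from itertools import groupby

def rle_encode(s: str) -> list[tuple[str, int]]:
    """Run-length encoding: replace each run of repeated characters
    with a (character, run length) pair."""
    return [(ch, len(list(run))) for ch, run in groupby(s)]

def rle_decode(pairs: list[tuple[str, int]]) -> str:
    return "".join(ch * count for ch, count in pairs)

data = "AAAAAAAABBBCCCCCCCCCCCC"   # made-up repetitive data
encoded = rle_encode(data)
print(encoded)                      # [('A', 8), ('B', 3), ('C', 12)]
assert rle_decode(encoded) == data  # lossless round trip
```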
  • Television Technology Demystified
    • Aleksandar Louis Todorovic (Author)
    • 2014 (Publication Date)
    • Routledge (Publisher)
    In that respect, compression is not an attribute of digital television alone, since it existed in analog television as well. The interlace principle in itself represents a 2:1 compression, since progressive scanning would require a bandwidth twice as large. The color-coding standards are also a sort of compression, because they squeeze two additional color-difference signals into a channel designed, and dimensioned, for the transmission of a single luminance (black-and-white) signal.
    The digital domain, however, offers considerably more opportunities for compression or bit-rate reduction. Compression methods can be divided into two general groups:
    1. lossless compression — the rejected redundancy is fully recoverable at the moment of decompression.
    2. lossy compression — this is nonreversible and causes a permanent loss of some picture information. However, thanks to the psychophysical characteristics and limitations of human vision, even lossy compression can produce video signals whose quality is subjectively assessed as acceptable; in some instances, depending on the compression ratio and on the content of the compressed pictures, that quality can even be very good.
    The goal of all compression methods is to achieve maximum compression efficiency, which expresses the relationship between the bit rate and the picture quality at the end of the decoding process. Higher efficiency could mean better picture quality at a given bit rate, or it could mean that a lower bit rate will ensure a given picture quality.
    Every signal (or, more precisely, every piece of information) has two parts (see Figure 6.1):
    entropy — a part that is unpredictable and that must be kept at all times as complete as possible;
    redundancy — a part that has a very high degree of repetitiveness or is highly predictable and can be easily reconstructed from a simple initial indication.
    Figure 6.1 The principle of compression (a. Ideal compression, b. Excessive compression, c. Practical compression)
    All compression methods function by rejecting as much redundancy as possible while preserving the entropy untouched. In Figure 6.1
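As a rough companion to the entropy/redundancy split described above (not part of the excerpt), Shannon's entropy formula gives the average number of bits per symbol that the unpredictable part actually requires:

```python
from collections import Counter
from math import log2

def entropy_bits_per_symbol(message: str) -> float:
    """Shannon entropy of the symbol distribution in `message`:
    the average number of bits genuinely needed per symbol."""
    counts = Counter(message)
    total = len(message)
    return -sum((c / total) * log2(c / total) for c in counts.values())

print(entropy_bits_per_symbol("AAAAAAAA"))     # 0.0  -- pure redundancy
print(entropy_bits_per_symbol("ABABABAB"))     # 1.0  -- one bit per symbol suffices
print(entropy_bits_per_symbol("compression"))  # ~3.1 -- closer to random text
```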
  • Videoconferencing
    The Whole Picture

    • James R. Wilcox (Author)
    • 2000 (Publication Date)
    • Routledge (Publisher)
    interframe codecs. They, too, try to achieve entropy through compression.
    What is entropy? It is a scientific term that refers to the measure of disorder, or uncertainty, in a system. Good compression techniques eliminate signal duplication by using shorthand methods that are readily understood by the coding and decoding devices on each end. The only part of the signal that must be fully described is the part that is random, and therefore impossible to predict.
    Redundancy-elimination compression first worked on achieving entropy in text messages. Text files consist of only about 50 different characters, some of which appear often. The letters E, T, O, and I, for instance, collectively account for 37% of all character appearances in an average text file. David Huffman, a researcher at MIT, noted and subsequently exploited their repetition. In 1952, Huffman devised the first text compression algorithm. The idea behind Huffman encoding is similar to that of Morse code. It assigns short, simple codes to common characters or sequences and longer, more complex ones to those that appear infrequently. Huffman encoding can reduce a text file by approximately 40%. A lossless compression technique, it is used to reduce long strings of zeros in many standards-based video compression systems, including H.261, H.263, and MPEG.
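As a hedged sketch of the idea behind Huffman coding (not the exact procedure used inside H.261, H.263 or MPEG), the following builds a code that gives frequent characters short bit strings and rare characters longer ones:

```python
import heapq
from collections import Counter

def huffman_codes(text: str) -> dict[str, str]:
    """Build a prefix code: repeatedly merge the two least frequent
    groups, prepending a 0 bit to one side and a 1 bit to the other."""
    freq = Counter(text)
    if len(freq) == 1:                        # degenerate case: one distinct symbol
        return {ch: "0" for ch in freq}
    # heap entries: [total frequency, tie-breaker, {char: code-so-far}]
    heap = [[f, i, {ch: ""}] for i, (ch, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {ch: "0" + code for ch, code in left.items()}
        merged.update({ch: "1" + code for ch, code in right.items()})
        heapq.heappush(heap, [f1 + f2, counter, merged])
        counter += 1
    return heap[0][2]

text = "this is an example of huffman coding"
codes = huffman_codes(text)
encoded = "".join(codes[ch] for ch in text)
print(len(text) * 8, "bits raw ->", len(encoded), "bits encoded")
```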
    Other lossless compression techniques followed Huffman coding. Developed in the 1970s, Lempel-Ziv-Welch (LZW) compression focused not on the characters themselves, but on repetitive bit combinations. LZW coding builds a dictionary of these commonly used data sequences and represents them with abbreviated codes. Originally developed for text, another technique called run-length coding compresses at the bit level by reducing the number of zeros and ones in a file: a string of identical bits is indicated by sending only one example, followed by a shorthand description of the number of times it repeats. LZW and run-length coding can be used in video compression, but are usually performed after lossy techniques have done their job.
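A minimal sketch of the LZW dictionary-building idea follows (real implementations add variable code widths and dictionary resets, which are omitted here):

```python
def lzw_compress(data: bytes) -> list[int]:
    """LZW: learn a dictionary of byte sequences seen so far and emit
    dictionary indices instead of the raw bytes."""
    dictionary = {bytes([i]): i for i in range(256)}  # seed with all single bytes
    next_code = 256
    current = b""
    output = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in dictionary:
            current = candidate                 # keep extending the current match
        else:
            output.append(dictionary[current])  # emit code for longest known match
            dictionary[candidate] = next_code   # learn the new sequence
            next_code += 1
            current = bytes([byte])
    if current:
        output.append(dictionary[current])
    return output

data = b"TOBEORNOTTOBEORTOBEORNOT"
codes = lzw_compress(data)
print(len(data), "bytes ->", len(codes), "codes:", codes)
```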
  • The Manual of Photography
    • Elizabeth Allen, Sophie Triantaphillidou (Authors)
    • 2012 (Publication Date)
    • Routledge (Publisher)
    Chapter 29

    Image compression

    Elizabeth Allen

    INTRODUCTION

    The growth in global use of the Internet, coupled with improvements in methods of transmitting digital data, such as the widespread adoption of broadband and wireless networking, means that an ever greater range of information is represented using digital imagery. Improvements in digital image sensor technology enable the production of larger digital images at acquisition. Advances in areas such as 3D imaging, multispectral imaging and high-dynamic-range imaging all add to the already huge requirements in terms of storage and transmission of the data produced. The cost of permanent storage continues to drop, but the need to find novel and efficient methods of data compression prior to storage remains a relevant issue.
    Much work has been carried out, over many decades, in the fields of communications and signal processing to determine methods of reducing data, without significantly affecting the information conveyed. More recently there has been a focus on developing and adapting these methods to deal specifically with data representing images.
    In many cases the attributes that distinguish images from other types of data, for example their spatial and statistical structure and the typical frequency-domain characteristics of natural images, are exploited to reduce file size. Additionally, the limitations of the human visual system in terms of resolution, the contrast sensitivity function (see Chapter 4), and tone and colour discrimination are used in clever ways to produce significant compression of images which can appear virtually indistinguishable from uncompressed originals.

    UNCOMPRESSED IMAGE FILE SIZES

    The uncompressed image file size is the size of the image data alone, without including space taken up by other aspects of a file stored in a particular format, such as the file header and metadata. It is calculated on the basis that the same number of binary digits (or length of code) is assigned to every pixel. The image file size stored on disc (accessed through the file properties) may vary significantly from this calculated size, particularly if there is extra information embedded in the file, or if the file has been compressed in some way. The uncompressed file size (in bits) is calculated using:
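The excerpt breaks off before the formula itself; the standard calculation, stated here as an assumption consistent with the preceding text, is simply the pixel count multiplied by the number of bits assigned to each pixel:

```python
def uncompressed_size_bits(width, height, channels, bits_per_channel):
    """Uncompressed image data size: one fixed-length code per pixel,
    i.e. width x height x channels x bits per channel."""
    return width * height * channels * bits_per_channel

# e.g. a 3000 x 2000 pixel RGB image at 8 bits per channel (values chosen for illustration)
bits = uncompressed_size_bits(3000, 2000, 3, 8)
print(bits / 8 / 2**20, "MiB")   # ~17.2 MiB of image data before compression
```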
  • The MPEG Handbook
    • John Watkinson (Author)
    • 2012 (Publication Date)
    • Routledge (Publisher)
    MPEG is, however, much more than a compression scheme as it also standardizes the protocol and syntax under which it is possible to combine or multiplex audio data with video data to produce a digital equivalent of a television program. Many such programs can be combined in a single multiplex and MPEG defines the way in which such multiplexes can be created and transported. The definitions include the metadata which decoders require to demultiplex correctly and which users will need to locate programs of interest.
    Figure 1.2 (a) MPEG defines the protocol of the bitstream between encoder and decoder. The decoder is defined by implication; the encoder is left very much to the designer. (b) This approach allows future encoders of better performance to remain compatible with existing decoders. (c) This approach also allows an encoder to produce a standard bitstream whilst its technical operation remains a commercial secret.
    As with all video systems there is a requirement for synchronizing or genlocking and this is particularly complex when a multiplex is assembled from many signals which are not necessarily synchronized to one another.
    1.2 Why compression is necessary
    Compression, bit rate reduction, data reduction and source coding are all terms which mean basically the same thing in this context. In essence the same (or nearly the same) information is carried using a smaller quantity or rate of data. It should be pointed out that in audio, compression traditionally means a process in which the dynamic range of the sound is reduced. In the context of MPEG the same word means that the bit rate is reduced, ideally leaving the dynamics of the signal unchanged. Provided the context is clear, the two meanings can co-exist without a great deal of confusion.
    There are several reasons why compression techniques are popular:
    a Compression extends the playing time of a given storage device.
    b Compression allows miniaturization. With fewer data to store, the same playing time is obtained with smaller hardware. This is useful in ENG (electronic news gathering) and consumer devices.
    c Tolerances can be relaxed. With fewer data to record, storage density can be reduced, making equipment which is more resistant to adverse environments and which requires less maintenance.