Computer Science

Data Encoding

Data encoding is the process of converting information from one format to another for efficient transmission or storage. It involves transforming data into a code that a computer or other electronic device can readily interpret. This process is essential for ensuring that data is transmitted and stored accurately and efficiently.

Written by Perlego with AI-assistance

2 Key excerpts on "Data Encoding"

  • Statistical Data Cleaning with Applications in R
    • Mark van der Loo, Edwin de Jonge(Authors)
    • 2018(Publication Date)
    • Wiley
      (Publisher)
    is easier to compute.
    Many encodings encountered in practice are actually character encodings. That is, they can be (de)coded on a symbol-by-symbol basis. Examples include the widely used ASCII, UTF-8, and Latin-1 character encoding standards. Encodings that do not allow for symbol-by-symbol (de)coding exist as well. For example, ‘punycode’ (Costello, 2003) is an encoding system that translates strings made up of symbols from the Unicode alphabet (see subsequent text) into a possibly longer sequence of ASCII symbols. A second example is the ‘BOCU-1’ encoding standard (Scherer and Davis, 2006), which compresses strings from the same alphabet while keeping, among other things, the binary sorting order unaltered.
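
    As a rough illustration (not part of the excerpt), the following Python sketch contrasts symbol-by-symbol character encodings with punycode, using only codecs from the standard library:

        # Symbol-by-symbol character encodings: each symbol maps to its own byte sequence.
        text = "bücher"
        print(text.encode("utf-8"))    # b'b\xc3\xbccher'  (ü becomes two bytes)
        print(text.encode("latin-1"))  # b'b\xfccher'      (ü becomes one byte)
        print(text.encode("ascii", errors="replace"))  # b'b?cher' (ü has no ASCII code)

        # Punycode translates the whole Unicode string into a possibly longer
        # all-ASCII sequence; it cannot be decoded symbol by symbol.
        print(text.encode("punycode"))          # b'bcher-kva'
        print(b"bcher-kva".decode("punycode"))  # 'bücher'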

    3.2.2 Unicode

    Historically, hundreds of encodings have been developed, and many of them are still widely in use. One practical problem is that different encoding schemes can assign different symbols to the same byte sequence, or a byte sequence that is valid in one encoding may be invalid in another. The development of all these conflicting encodings has led to the current situation where, in general, one cannot determine with certainty which encoding is used by looking at the byte sequence alone. This is why, for example, the XML standard requires the encoding of an XML document to be declared explicitly if it is not one of the encodings predefined in the XML standard (W3C, 2008).
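
    A minimal Python sketch of this ambiguity (not part of the excerpt): the same byte is a different symbol in different encodings, and not even valid in a third:

        raw = b"\xe9"                  # a single byte, 0xE9
        print(raw.decode("latin-1"))   # 'é'  (Latin-1)
        print(raw.decode("cp1251"))    # 'й'  (the same byte in Windows-1251)
        try:
            raw.decode("utf-8")        # invalid UTF-8: 0xE9 opens a three-byte
        except UnicodeDecodeError:     # sequence that is never completed
            print("not valid UTF-8")
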
    In an attempt to resolve this situation, the Unicode Consortium developed a set of standards that aim to facilitate the exchange of textual data between systems using different encodings. Two of these standards are of particular interest. First, the Unicode Standard aims to assign a number to every symbol used in any language, ever. The Unicode Standard can be seen as a large table where each row contains a description of a symbol and a number. The column with descriptions can be interpreted as the Unicode alphabet: it is the list of abstract symbols contained in the standard. The numbers are called code points in Unicode terminology. The standard is still updated frequently. For example, in version 7.0 of the standard (Allen et al., 2014), the alphabet was extended with more than 2500 symbols, including the set of ‘Sinhala Archaic Numbers’ and an extensive set of emojis. In total, the Unicode Standard currently has room to define code points of which
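
    As a small illustration (not part of the excerpt), Python exposes this table directly: ord() returns a symbol's code point, and the standard unicodedata module returns its descriptive name:

        import unicodedata

        for ch in ["A", "é", "€", "😀"]:
            # Print each symbol's code point (in the usual U+ notation) and its name.
            print(f"U+{ord(ch):04X}", unicodedata.name(ch))
        # U+0041 LATIN CAPITAL LETTER A
        # U+00E9 LATIN SMALL LETTER E WITH ACUTE
        # U+20AC EURO SIGN
        # U+1F600 GRINNING FACE
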
  • Practical Digital Forensics

    Forensic Lab Setup, Evidence Analysis, and Structured Investigation Across Windows, Mobile, Browser, HDD, and Memory (English Edition)

    • Dr. Akashdeep Bhardwaj, Keshav Kaushik(Authors)
    • 2023(Publication Date)
    • BPB Publications
      (Publisher)
    CHAPTER 2

    Essential Technical Concepts

    Introduction

    Undertaking a digital forensics investigation demands a deep grasp of some of computing’s most fundamental technological concepts. To discover and manage digital evidence, you must understand how information is stored in computers, how number systems work, how digital files are constructed, and the many types of storage units and their differences. This chapter covers these fundamental topics and explains how computers store, process, and represent digital data.

    Structure

    In this chapter, we will cover the following:
    • Different number systems
    • Encoding schema
    • File carving and structure
    • File metadata
    • Hash analysis
    • System memory
    • Storage
    • Filesystem
    • Cloud computing
    • Windows OS
    • Networking

    Objectives

    In this part, we will explore how a system represents data, the typical numbering systems, and the principal encoding strategies machines use to generate human-readable text. Let us start with the standard numbering schemes.

    Decimal (Base-10)

    The base-10 system, which employs 10 digits or symbols (0, 1, 2, 3, 4, 5, 6, 7, 8, and 9) to represent its values, is the most widely used numbering system; we use it every day when conducting arithmetic calculations (for example, 17 + 71 = 88). In decimal, the value a digit contributes is determined by its position: each digit is multiplied by the power of 10 corresponding to that digit’s location. Take, for example, the decimal number 7,654. This number can be interpreted as:
    7,654 = (7 × 1,000) + (6 × 100) + (5 × 10) + (4 × 1) = 7,000 + 600 + 50 + 4
    An understanding of the decimal numbering system is essential, as the other numbering systems follow similar rules.
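
    A brief Python sketch of this positional expansion (an illustration, not from the text):

        # Expand 7,654 digit by digit, scaling each digit by its power of 10.
        n = 7654
        digits = [int(d) for d in str(n)]
        powers = range(len(digits) - 1, -1, -1)
        terms = [d * 10 ** p for d, p in zip(digits, powers)]
        print(terms)       # [7000, 600, 50, 4]
        print(sum(terms))  # 7654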

    Binary

    Computers store data in binary format, the base-2 numeric system represented by 1s and 0s. Binary, the language of computers, follows the same positional rules as decimal. However, binary has only two symbols (0 and 1) and multiplies by powers of two, unlike decimal, which has 10 symbols and multiplies by powers of 10. Each 1 or 0 in a computer is referred to as a bit (binary digit), and a group of eight bits is referred to as a byte. The most significant bit (MSB) is the highest-order bit; it occupies the leftmost position and carries the greatest value. The least significant bit (LSB), on the other hand, occupies the rightmost position and carries the smallest value. Table 2.1
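
    A short Python sketch of these ideas (an illustration, not from the text):

        # One byte: eight bits, each scaled by a power of two.
        value = 0b10110101
        print(value)                 # 181 = 128 + 32 + 16 + 4 + 1
        print(format(value, "08b"))  # '10110101', padded to 8 bits

        msb = (value >> 7) & 1       # leftmost bit (most significant)
        lsb = value & 1              # rightmost bit (least significant)
        print(msb, lsb)              # 1 1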