Computer Science

Database Replication

Database replication is the process of copying data from one database to another, ensuring that both databases are consistent and up-to-date. This is done to improve performance, increase availability, and provide redundancy in case of failures. It is commonly used in distributed systems and high-availability environments.

Written by Perlego with AI-assistance

11 Key excerpts on "Database Replication"

eBook - ePub
Logical Database Design Principles
- John Garmany, Jeff Walker, Terry Clark (Authors)
- 2005 (Publication Date)
- Auerbach Publications (Publisher)
8 DESIGNING REPLICATED DATABASES

Database Replication is the copying of part or all of a database to one or more remote sites. The remote site can be in another part of the world, or right next to the primary site. Because of the speed and reliability of the Internet, many people believe that Database Replication is no longer needed. While it is true that distributed transactions provide access to real-time data almost anywhere in the world, they require that all databases operate all the time. When one database is down for any reason or cannot be contacted, other databases can no longer access its data. Replication solves this problem because each site has a copy of the data.

When a query accesses data on multiple databases, it is called a distributed query. Applications using distributed queries are difficult to optimize because each brand (in fact, different versions of the same brand) of database handles the execution of the query differently. Sometimes, the local database will retrieve data from the remote database and process it locally. This can result in high network bandwidth use and slow query performance. More modern databases will analyze the query and send a request to the remote database for only that information it needs—greatly reducing network traffic, but also making performance tuning very complicated. Either way, distributed queries always contain some network latency time. If this response delay becomes unacceptable, replicating the databases may be the solution. Querying local data is always faster than accessing remote data.
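The two execution strategies described above can be compared with a minimal sketch. It is an illustration only, not taken from the book: `REMOTE_ORDERS` is a hypothetical in-memory stand-in for a remote table, and "rows shipped" is used as a rough proxy for network traffic.

```python
# Hypothetical in-memory stand-in for a table on a remote database.
REMOTE_ORDERS = [
    {"order_id": i, "region": "EU" if i % 4 == 0 else "US", "total": 10 * i}
    for i in range(1, 101)
]


def fetch_all_then_filter(region):
    """Local processing: every remote row is shipped, then filtered locally."""
    shipped = list(REMOTE_ORDERS)  # all rows cross the network
    result = [row for row in shipped if row["region"] == region]
    return result, len(shipped)


def push_filter_to_remote(region):
    """Pushdown: the remote side filters and ships only the matching rows."""
    shipped = [row for row in REMOTE_ORDERS if row["region"] == region]
    return shipped, len(shipped)


rows_a, shipped_a = fetch_all_then_filter("EU")
rows_b, shipped_b = push_filter_to_remote("EU")
print(f"rows shipped: local filtering={shipped_a}, pushdown={shipped_b}")
```

Both strategies return the same rows; only the amount of data moved differs. A local replica would avoid the transfer altogether, at the cost of keeping the copy up to date.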

Replication of all or part of a database is becoming increasingly common as companies consolidate servers and information. Many companies use some form of replication to load data into the company data warehouse. It can also be used to create a separate database for reporting, thereby removing the impact of data aggregation from the main database. Many companies use replication to create a subset of data that is accessed by another application, a Web site for example. By replicating only a subset of the data, or using non-updateable replication, they can protect sensitive information while allowing users to have access to up-to-date data.
eBook - ePub
Principles of Transaction Processing
- Philip A. Bernstein, Eric Newcomer (Authors)
- 2009 (Publication Date)
- Morgan Kaufmann (Publisher)
Chapter 9 Replication
9.1 Introduction
Replication is the technique of using multiple copies of a server or a resource for better availability and performance. Each copy is called a replica.

The main goal of replication is to improve availability, since a service is available even if some of its replicas are not. This helps mission critical services, such as many financial systems or reservation systems, where even a short outage can be very disruptive and expensive. It helps when communication is not always available, such as a laptop computer that contains a database replica and is connected to the network only intermittently. It is also useful for making a cluster of unreliable servers into a highly-available system, by replicating data on multiple servers.

Replication can also be used to improve performance by creating copies of databases, such as data warehouses, which are snapshots of TP databases that are used for decision support. Queries on the replicas can be processed without interfering with updates to the primary database server. If applied to the primary server, such queries would degrade performance, as discussed in Section 6.6, Query-Update Problems in two-phase locking.

In each of these cases, replication can also improve response time. The overall capacity of a set of replicated servers can be greater than the capacity of a single server. Moreover, replicas can be distributed over a wide area network, ensuring that some replica is near each user, thereby reducing communications delay.

9.2 Replicated Servers

The Primary-Backup Model

To maximize a server’s availability, we should try to maximize its mean time between failures (MTBF) and minimize its mean time to repair (MTTR). After doing the best we can at this, we can still expect periods of unavailability. To improve availability further requires that we introduce some redundant processing capability by configuring each server as two server processes: a primary server that is doing the real work, and a backup server that is standing by, ready to take over immediately after the primary fails (see Figure 9.1
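
The relationship between MTBF, MTTR, and a standby server can be made concrete with a small calculation. This is a sketch under stated assumptions, not the book's own model: it uses the common steady-state formula availability = MTBF / (MTBF + MTTR), and it assumes that the primary and backup fail independently and that failover is instantaneous.

```python
def availability(mtbf_hours, mttr_hours):
    """Steady-state fraction of time a single server is up."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def availability_with_backups(single, copies=2):
    """Availability when the service is up if at least one copy is up.

    Assumes independent failures and instantaneous failover, which real
    deployments only approximate.
    """
    return 1 - (1 - single) ** copies


single = availability(mtbf_hours=999, mttr_hours=1)
paired = availability_with_backups(single, copies=2)
print(f"single server: {single:.6f}, primary plus backup: {paired:.6f}")
```

Under these assumptions a second copy turns one hour of downtime per thousand into roughly one hour per million, which is why redundancy helps once MTBF and MTTR cannot be improved further.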
eBook - ePub
Advanced MySQL 8
Discover the full potential of MySQL and ensure high performance of your database
- Eric Vanier, Birju Shah, Tejaswi Malepati (Authors)
- 2019 (Publication Date)
- Packt Publishing (Publisher)
replication to achieve high availability, in the case of failure.

Replication

Replication is defined as:
"The process of generating and reproducing multiple copies of data at one or more sites." - Thomas M Connolly, Carolyn E Begg, 2002.

When a database instance fails, the service it provided must be available through another route. This requires process redundancy. In a database, either only one processor can process all transactions, or there can be multiple processors. When a single process manages all transactions, a standby process should be available to take over from the active one in case of failure. It is also necessary to make sure that this failover is seamless and that the process itself can detect the failure.

Data redundancy is also required for high availability. Data redundancy can be achieved either by having a storage subsystem that maintains redundancy while presenting the data to DBMS processes as a single copy, or having the DBMS explicitly maintain multiple copies of the data.


Group replication

Group replication is a MySQL server plugin. It helps us create a replication topology that has the following attributes:
eBook - ePub
Professional Microsoft SQL Server 2008 Programming
- Robert Vieira (Author)
- 2010 (Publication Date)
- Wrox (Publisher)
17 Replication

Coming off the heels of significant change in 2005, replication is one of a few quiet areas in terms of version differences in SQL Server 2008. Indeed, virtually nothing has changed that isn't directly tied to a non-replication feature. (They had to allow for replication of the new data types, didn't they?)

Replication is one of those things that everyone loves to ignore, until they need it. Then, it seems, there is a sudden crisis about learning and implementing it instantly (and not necessarily in that order, I'm sorry to say).

So, what then, exactly, is replication? I'll shy entirely away from the Webster's definition of it and go to my own definition: Replication is the process of taking one or more databases and systematically providing a rule-based copy mechanism for that data to and potentially from a different database.

Replication is often a topology and administration question. As such, many developers have a habit of ignoring it; bad idea. Replication has importance to software architects in a rather big way, as it can be a solution to many complex load and data distribution issues such as:
- Making data available to clients that are generally not connected to your main network
- Distributing the load associated with heavy reporting demands
- Addressing latency issues with geographically dispersed database needs
- Supporting geographic redundancy

And those are just a few of the biggies. So, with that in mind, we're going to take a long look at replication. I'm going to warn you in advance that this isn't going to have quite as many walkthroughs as I usually do, but patience, my young padawan; there is a reason. In simple terms, once you've built one or two of the styles of replication, you have most of the “constructing” part of the learning out of the way. What's more, the actual building up of the replication instance is indeed mostly an administrator's role
eBook - ePub
MySQL 8 Administrator's Guide
- Chintan Mehta, Hetal Oza, Ankit K Bhavsar, Subhash Shah (Authors)
- 2018 (Publication Date)
- Packt Publishing (Publisher)
It is assumed that you are reading this for one of two reasons: you're familiar with MySQL replication and want to learn more, or you're unfamiliar with MySQL replication and want to learn.

MySQL replication serves many different purposes. Usually, people start thinking about MySQL replication when they have more queries than a single database server can handle. Based on this, can you guess what MySQL replication is? Replication is the technique of setting up more than one database to serve one or more client applications. A client can be an end user who sends read or write queries from a device such as a computer, mobile phone, or tablet. These databases are replicas of the same database; that is, all databases participating in Database Replication hold the same data. Replication works by frequently copying data from one database to all the other replica databases. These databases may be located on the same database server, on different database servers, or on different machines altogether.
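
The copying described above, and the way it lets several databases share the query load, can be sketched in a few lines. This is a toy model and not MySQL's API: the `Database` and `ReplicatedCluster` classes are hypothetical, and writes are copied synchronously for simplicity, whereas real replication is often asynchronous.

```python
import itertools


class Database:
    """Hypothetical in-memory database holding key/value rows."""

    def __init__(self, name):
        self.name = name
        self.rows = {}


class ReplicatedCluster:
    """Writes go to one database and are copied to every replica."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = replicas
        self._next_replica = itertools.cycle(replicas)

    def write(self, key, value):
        self.primary.rows[key] = value
        # Copied immediately here; real systems usually ship changes later.
        for replica in self.replicas:
            replica.rows[key] = value

    def read(self, key):
        # Reads rotate across the replicas to spread the load.
        replica = next(self._next_replica)
        return replica.name, replica.rows.get(key)


cluster = ReplicatedCluster(
    Database("primary"), [Database("replica-1"), Database("replica-2")]
)
cluster.write("user:1", "Ada")
first = cluster.read("user:1")
second = cluster.read("user:1")
print(first, second)
```

Because every replica holds the same rows, either one can answer the read, which is what makes adding replicas a way to handle more queries.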

As mentioned earlier, Database Replication serves various purposes, and the right setup depends on why it is needed. MySQL replication is typically set up to scale a database or an application that is backed by the database. It is also useful for maintaining database backups and for reporting. We will discuss these in detail a little later in this chapter.

MySQL replication is mostly set up for scaling reads. In any web application, the number of read operations is considerably higher than the number of write operations. Most common web applications are read heavy. Consider the example of a social networking website. If we navigate to a user profile page, we see a lot of information such as the user's personal information, demographic information,
eBook - ePub
PostgreSQL 10 Administration Cookbook
Over 165 effective recipes for database management and maintenance in PostgreSQL 10
- Gianni Ciolli, Simon Riggs (Authors)
- 2018 (Publication Date)
- Packt Publishing (Publisher)
Replication technology can be confusing. You might be forgiven for thinking that people have a reason to keep it that way. My observation is that there are many techniques, each with their own advocates, and the strengths and weaknesses are often hotly debated.

There are some simple underlying concepts that can help you understand the various options available. The terms used here are designed to avoid favoring any particular technique, and we've used standard industry terms whenever available.

Topics

Database Replication is the term we use to describe the technology used to maintain a copy of a set of data on a remote system.
There are usually two main reasons for you wanting to do this, and those reasons are often combined:

High Availability: Reducing the chances of data unavailability by having multiple systems, each holding a full copy of the data.

Data Movement: Allowing data to be used by additional applications or workload on additional hardware. Examples of this are Reference Data Management (RDM), where a single central server might provide information to many other applications, and Business Intelligence/Reporting Systems.

Of course, both of those topics are complex areas, and there are many architectures and possibilities for implementing each of them.
What we will talk about here is High Availability, where there is no transformation of the data. We simply copy the data from one PostgreSQL database server to another. So we are specifically avoiding all discussion on ETL tools, EAI tools, inter-database migration, data warehousing strategies, and so on. Those are valid topics in IT architecture; it's just that we don't cover them in this book.


Basic concepts

Let's look at the basic architecture. Typically, individual database servers are referred to as nodes. The whole group of database servers involved in replication is known as a cluster. That is the common usage of the term, but be careful; the term cluster is also used for two other quite separate meanings elsewhere in PostgreSQL. Firstly, cluster is sometimes used to refer to the database instance, though I prefer the term database server
eBook - ePub
Professional Microsoft SQL Server 2008 Administration
- Brian Knight, Ketan Patel, Wayne Snyder, Ross LoForte, Steven Wort (Authors)
- 2011 (Publication Date)
- Wrox (Publisher)
Chapter 16 Replication
Today's enterprise needs to distribute its data across many departments and geographically dispersed offices. SQL Server replication provides ways to distribute data and database objects among its SQL Server databases, databases from other vendors such as Oracle, and mobile devices such as Pocket PC and point-of-sale terminals. Along with log shipping, database mirroring, and clustering, replication provides functionalities that satisfy customers' needs for load balancing, high availability, and scaling.

This chapter introduces you to the concept of replication, explaining how to implement basic snapshot replication, and noting things to pay attention to when setting up transactional and merge replication.
Replication Overview
SQL Server replication closely resembles the magazine publishing industry, so we'll use that analogy to explain its overall architecture. Consider National Geographic. The starting point is the large pool of journalists writing articles. From all the available articles, the editor picks which ones will be included in the current month's magazine. The selected set of articles is then published in a publication. Once a monthly publication is printed, it is shipped out via various distribution channels to subscribers all over the world.

In SQL Server replication, similar terminology is used. The pool from which a publication is formed can be considered a database. Each piece selected for publication is an article; it can be a table, a stored procedure, or another database object. Like a magazine publisher, replication also needs a distributor
eBook - ePub
Microsoft SQL Server 2012 Administration
Real-World Skills for MCSA Certification and Beyond (Exams 70-461, 70-462, and 70-463)
- Tom Carpenter (Author)
- 2013 (Publication Date)
- Sybex (Publisher)
Replication can be used to automatically export data for delivery to multiple clients. In addition to replication, data can be imported or exported from files. As the DBA for your organization, you may be called upon to implement a replication strategy. To do this, you must understand the replication model implemented in SQL Server and the steps required to enable it. You may also need to import data from CSV files (or other file types), and you should be aware of the methods and tools used for this process as well. Both replication and data import and export are addressed in this chapter.

If you are preparing for the 70-462 exam, it is important that you know how to choose the proper replication type and implement replication for a specified database. You should also know how to import and export data from SQL Server databases.

SQL Server Replication

When you want the same data to be available in multiple physical locations or on multiple server instances, you may choose to implement data replication. Data replication, in SQL Server, should not be conceptualized as Database Replication, because you can replicate part of the database, and you are not required to replicate the entire database. Instead, you create publications that include articles. The articles are tables and other objects that you want to replicate. A publication could include an entire database, but it doesn’t have to; this is why you should think of it as data replication and not Database Replication. As an example, salespeople may want to replicate just the portion of the customers table that is in their area of responsibility. In this case, the computer of the salesperson would be the subscriber. In this section, you’ll learn about the different replication types, replication roles (such as publisher and distributor), and replication models, as well as how to implement the different roles used to provide the replication architecture. You’ll also learn to monitor replication and replication performance.
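
The idea that a publication carries only selected articles, and possibly only some rows of each, can be sketched as follows. This is a conceptual model and not SQL Server's API: `Article`, `Publication`, and `snapshot` are hypothetical names, and the sample tables are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Article:
    """One replicated object: a table name plus a row filter."""

    table: str
    row_filter: Callable[[dict], bool]


@dataclass
class Publication:
    """A named set of articles offered to subscribers."""

    name: str
    articles: list = field(default_factory=list)


def snapshot(publication, database):
    """Copy only the published tables, and only rows that pass each filter."""
    return {
        article.table: [
            dict(row) for row in database[article.table] if article.row_filter(row)
        ]
        for article in publication.articles
    }


# Invented sample data: only part of `customers` is published, `payroll` is not.
database = {
    "customers": [
        {"id": 1, "region": "north", "name": "Acme"},
        {"id": 2, "region": "south", "name": "Bolt"},
        {"id": 3, "region": "north", "name": "Cray"},
    ],
    "payroll": [{"employee": "x", "salary": 1}],
}
north_sales = Publication(
    "north_sales", [Article("customers", lambda row: row["region"] == "north")]
)
subscriber_copy = snapshot(north_sales, database)
print(subscriber_copy)
```

The subscriber, here standing in for the salesperson's computer, receives the northern customers and nothing from the unpublished table.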
eBook - ePub
Building Dependable Distributed Systems
- Wenbing Zhao (Author)
- 2014 (Publication Date)
- Wiley-Scrivener (Publisher)
Chapter 4

Data and Service Replication

Different from checkpointing/logging and recovery-oriented computing, which focus on the recovery of an application should a fault occur, the replication technique offers another way of achieving high availability of a distributed service by masking various hardware and software faults. The goal of the replication technique is to extend the mean time to failure of a distributed service. As the name suggests, the replication technique resorts to the use of space redundancy, i.e., instead of running a single copy of the service, multiple copies are deployed across a group of physical nodes for fault isolation. For replication to work (i.e., to be able to mask individual faults), it is important to ensure that faults occur independently at different replicas.

The most well-known approach to service replication is state-machine replication [26]. In this approach, each replica is modeled as a state machine that consists of a set of state variables and a set of interfaces accessible by the clients that operate on the state variables deterministically. With the presence of multiple copies of the state machine, the issue of consistency among the replicas becomes important. It is apparent that the access to the replicas must be coordinated by a replication algorithm so that they remain consistent at the end of each operation. More specifically, a replication algorithm must ensure that a client’s request (that invokes one of the interfaces defined by the state machine) reaches all non-faulty replicas, and all non-faulty replicas must deliver the requests (that potentially come from different clients) in exactly the same total order. It is important that the execution of a client’s request is deterministic, i.e., given the same request, the same response will be generated at all non-faulty replicas. If an application contains nondeterministic behavior, it must be rendered deterministic by controlling such behavior, e.g.
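
The role of total order and determinism in state-machine replication can be illustrated with a toy example. This is a sketch under strong assumptions and not the book's algorithm: the `Sequencer` class is a hypothetical, fault-free stand-in for a real ordering protocol, and the replicated state is a single integer.

```python
class CounterReplica:
    """A deterministic state machine whose state is a single integer."""

    def __init__(self):
        self.value = 0

    def apply(self, request):
        op, amount = request
        if op == "add":
            self.value += amount
        elif op == "mul":
            self.value *= amount
        else:
            raise ValueError(f"unknown operation: {op}")
        return self.value


class Sequencer:
    """Hypothetical stand-in for the ordering protocol (assumed fault-free)."""

    def __init__(self, replicas):
        self.replicas = replicas
        self.log = []  # the agreed total order of requests

    def submit(self, request):
        self.log.append(request)
        # Every replica applies every request in the same order.
        return [replica.apply(request) for replica in self.replicas]


replicas = [CounterReplica() for _ in range(3)]
sequencer = Sequencer(replicas)
sequencer.submit(("add", 2))
responses = sequencer.submit(("mul", 5))

# A replica that saw the same requests in a different order diverges.
reordered = CounterReplica()
reordered.apply(("mul", 5))
reordered.apply(("add", 2))
print(responses, reordered.value)
```

All three replicas return the same response because they apply the same deterministic operations in the same order; the reordered replica ends in a different state, which is the inconsistency the replication algorithm exists to prevent.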
eBook - ePub
SQL Server 2019 Administrator's Guide
A definitive guide for DBAs to implement, monitor, and maintain enterprise database solutions, 2nd Edition
- Marek Chmel, Vladimír Mužný (Authors)
- 2020 (Publication Date)
- Packt Publishing (Publisher)
read-only access and such configuration allows performance offload to the secondary system.

Replication

Replication is a feature used for moving data from one server to another and allows for many different scenarios and topologies.
Note:
Replication uses a Publisher/Subscriber model, where the Publisher is the server offering the content via a replication article and the Subscribers are getting the data.

The configuration is more complex compared to mirroring and log shipping but allows much more variety in terms of configuring security, performance, and topology.

Replication has many benefits, and a few of them are as follows:
- Works at the object level (whereas other features work at the database or instance level)
- Allows merge replication, where more servers synchronize data between each other
- Allows bi-directional synchronization of data
- Allows partners other than SQL Server (Oracle, for example)
There are several different replication types that can be used with SQL Server. You can choose them based on your needs for HA/DR and the data availability requirements on the secondary servers. These options include the following:
- Snapshot replication
- Transactional replication
- Peer-to-peer replication
- Merge replication
In the next section, we will look at SQL Server replication in detail.

Configuring replication on SQL Server

In this section, we will focus on SQL Server replication in detail and we'll learn how to configure replication for a database between different servers. Like with many other features, the configuration can be done with SQL Server Management Studio (SSMS) or with Transact-SQL (T-SQL) code, which sometimes provides greater flexibility. Be aware that replication is one of the features that you can configure immediately during installation so that it's available on your system. If you haven't installed the feature, you can always add replication
eBook - ePub
Handbook of Data Management
1999 Edition
- Sanjiv Purba (Author)
- 2019 (Publication Date)
- Auerbach Publications (Publisher)
It is also possible to address the concurrency control issues within the application design itself by designing the application to avoid conflicts or by compensating for conflicts when they occur. Conflict avoidance ensures that all transactions are unique — that updates only originate from one site at a time. Schemas can be fragmented in such a manner as to ensure conflict avoidance. In addition, each site can be assigned a slice of time for delivering updates, thereby avoiding conflicts by establishing a business practice.
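
Both conflict-avoidance practices described above, fragmenting data so that each row has one owning site and assigning each site a time slice, can be sketched briefly. This is an illustration only: the site names, the region-based fragmentation, and the equal slots within each hour are all assumptions.

```python
class OwnershipError(Exception):
    """Raised when a site tries to update a row it does not own."""


# Assumed fragmentation: each site owns the rows for its own region.
OWNER_BY_REGION = {"east": "site_a", "west": "site_b"}


def accept_update(site, row):
    """Allow an update only from the site that owns the row's fragment."""
    owner = OWNER_BY_REGION[row["region"]]
    if owner != site:
        raise OwnershipError(f"{site} may not update rows owned by {owner}")
    return row


def site_for_time_slot(minute_of_hour, sites):
    """Time-slice variant: each site delivers updates only in its own slot."""
    slot_length = 60 // len(sites)
    index = min(minute_of_hour // slot_length, len(sites) - 1)
    return sites[index]


accepted = accept_update("site_a", {"id": 1, "region": "east"})
try:
    accept_update("site_b", {"id": 1, "region": "east"})
    rejected = False
except OwnershipError:
    rejected = True

print(accepted, rejected, site_for_time_slot(10, ["site_a", "site_b"]))
```

In both variants a given row can be updated from only one place at a time, so two sites never produce competing versions that would have to be reconciled.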

Special Considerations for Disaster Recovery

One of the most common uses of replication is to replicate transactions to a backup system for the purpose of disaster recovery. However, users are advised to pay close attention to several issues relating to the use of replication for disaster recovery.

In a high-volume transaction environment, many transactions that have occurred at the primary system may be lost in flight and will need to be reentered into the backup system. When the primary fails, how will the lost transactions be identified so that they can be reentered? How will the users be switched over to the backup? The backup system will have a different network address, requiring users to manually log in to the backup and restart their applications.

Is there a documented process for users and administrators to follow in the event of a disaster? Once the primary is recovered, is there a clean mechanism for switching users back to the primary? In the best-case scenario, it should be possible to switch the roles of the systems, making the backup the new primary and making the primary the new backup once it is recovered. This practice eliminates the need to switch all users back to the original configuration.

Special Consideration for Network Failures

Although the availability of data is enhanced through replication because there is a local copy of data, users must understand the implications of extended network failures. Network failures can cause replicate databases to quickly drift apart because local updates are not being propagated. If the application is sensitive to the sequence of transactions originating at various replicate sites, then extended network failures may cause local processing to shut down, nullifying any perceived benefit to data availability through replication.
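
How replicas drift apart during a network failure can be shown with a small simulation. This is a sketch, not a real replication protocol: the `Site` class and `propagate` function are hypothetical, and the two sites update different keys, so no conflict resolution is needed when the link returns.

```python
class Site:
    """Hypothetical replica that queues local updates until they are sent."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.outbox = []  # local updates not yet propagated

    def local_update(self, key, value):
        self.data[key] = value
        self.outbox.append((key, value))


def propagate(source, target, network_up):
    """Deliver queued updates if the link is up; return how many stay queued."""
    if network_up:
        for key, value in source.outbox:
            target.data[key] = value
        source.outbox.clear()
    return len(source.outbox)


a, b = Site("a"), Site("b")
a.local_update("x", 1)
b.local_update("y", 2)

# During the outage nothing is delivered and the copies differ.
backlog = propagate(a, b, network_up=False) + propagate(b, a, network_up=False)
diverged = a.data != b.data

# Once the link is restored the queued updates flow and the copies match.
propagate(a, b, network_up=True)
propagate(b, a, network_up=True)
converged = a.data == b.data
print(backlog, diverged, converged)
```

The longer the outage lasts, the larger the backlog grows. If both sites had changed the same key, simply replaying the queues would not be enough, which is where the sequence sensitivity mentioned above becomes a problem.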

Learn about this page

Index pages curate the most relevant extracts from our library of academic textbooks. They’ve been created using an in-house natural language model (NLM), each adding context and meaning to key research topics.
