Difference between revisions of "Replication Server Performance Tuning"

From SybaseWiki
Jump to: navigation, search
m (Performance Tuning)
 
(37 intermediate revisions by 7 users not shown)
Line 1: Line 1:
==Things about Replication Server==
+
Here are some rules of thumb to tune a replication server (mainly Warm Standby setup). They may not always be suitable for your environment, but it can be used as a starting point for further improvements. The list is not complete.
  
===Performance Tuning===
+
==RepAgent==
 
+
Here are some rules of thumb to tune a replication server. They may not always be suitable for your environment, but it can be used as a starting point for further improvements.
+
 
+
====RepAgent====
+
 
Some basic settings need to be changed to get a better througput from the RepAgent to the RepServer. Change these setting with the sp_config_rep_agent stored procedure and do a restart of the RepAgent afterwards.
 
Some basic settings need to be changed to get a better througput from the RepAgent to the RepServer. Change these setting with the sp_config_rep_agent stored procedure and do a restart of the RepAgent afterwards.
 
* "scan batch size", set to "10000"
 
* "scan batch size", set to "10000"
Line 15: Line 11:
 
In the output of sp_help_rep_agent the markers are shown like this (example): (2954283,4). The first number is the page number and second one the record number on the page.
 
In the output of sp_help_rep_agent the markers are shown like this (example): (2954283,4). The first number is the page number and second one the record number on the page.
  
====SQM====
+
==SQM==
 
Do not spend a lot of time on this. Just make sure that the stable devices are on raw disk when your operating system supports it.
 
Do not spend a lot of time on this. Just make sure that the stable devices are on raw disk when your operating system supports it.
  
====Replicating Stored Procedures====
+
==STS==
 +
Increase the value of sts_cachesize so rows of the RSSD system tables can be cached in memory. A fairly good indication of the value of this configuration parameter can be determined by adding the rowcount of all system tables and use the total number at the value. Add something like 10% to allow for growth.
 +
configure replication server set sts_cachesize to '550'
 +
Cache the system tables itself by running these commands:
 +
configure replication server set sts_full_cache_rs_classes to 'on'
 +
configure replication server set sts_full_cache_rs_columns to 'on'
 +
configure replication server set sts_full_cache_rs_config to 'on'
 +
configure replication server set sts_full_cache_rs_databases to 'on'
 +
configure replication server set sts_full_cache_rs_datatype to 'on'
 +
configure replication server set sts_full_cache_rs_diskaffinity to 'on'
 +
configure replication server set sts_full_cache_rs_functions to 'on'
 +
configure replication server set sts_full_cache_rs_objects to 'on'
 +
configure replication server set sts_full_cache_rs_publications to 'on'
 +
configure replication server set sts_full_cache_rs_queues to 'on'
 +
configure replication server set sts_full_cache_rs_repdbs to 'on'
 +
configure replication server set sts_full_cache_rs_routes to 'on'
 +
configure replication server set sts_full_cache_rs_sites to 'on'
 +
configure replication server set sts_full_cache_rs_systext to 'on'
 +
configure replication server set sts_full_cache_rs_translation to 'on'
 +
configure replication server set sts_full_cache_rs_users to 'on'
 +
configure replication server set sts_full_cache_rs_version to 'on'
 +
In RepServer version before 15.0 you could technically also cache the rs_locater table. '''Do not''' cache this table. A crash of Repserver can then lead to inconsistencies.
 +
 
 +
Also, you should be aware that while caching these tables will improve performance of replication in general and will reduce CPU utilization by repserver and dataserver, it can impact operations that cause the table caches to be rebuilt.  Specifically, if rs_objects and rs_columns are cached, it will severely increase the time required to create, drop, or alter replication definitions.  Repserver's CPU usage will spike during this time as it is forced to rebuild its cache.
 +
 
 +
One user experienced this after executing the above commands to cache the tables on RepServer 12.5.  A support call to Sybase lked to un-caching rs_objects and rs_columns.  From a private communication with Sybase support:
 +
CR 500890 More efficient processing of changes to cached that are identified to sts_full_cache_<tablename>
 +
Workaround:It is not necessary to fully cache tables.  STS cache automatically does that for RS and is quite efficient.
 +
If there are frequent changes to RS like adding/dropping repdefs and subscriptions, it is best NOT to fully cache those tables,
 +
but to increase the sts_cachesize to hold more entries
 +
The user's experience was restricted purely to slow adding/dropping of repdefs, whereas subscriptions were added/dropped with normal speed. Sybase support further communicated the fact that this will not be changed in later versions.
 +
 
 +
==SQT==
 +
The most important configuration parameter for the SQT is "sqt_max_cache_size". To check if this parameter has been set too low, run an "admin who,sqt" at regular intervals and check if the columns "removed" or "full" contain a non-zero value. Be aware that the output of "admin who,sqt" also shows the DSI threads. So when "removed" or "full" are non-zero for a DSI thread, change the memory setting at the connection-level using "alter connection to <dataserver.database> set dsi_sqt_max_cache_size to '<new-value>'". Otherwise change the "sqt_max_cache_size" configuration parameter using "rs_configure" or (better!) "configure replication server".
 +
 
 +
It's best to start increasing "sqt_max_cache_size" before changing settings at the connection level.
 +
 
 +
For RepServer 12.6 do not increase "sqt_max_cache_size" to much. Oversizing the cache will in fact decrease performance. SQT cache is sufficiently sized if you do not observe, or observe only infrequently, transactions being flushed from cache because there is not enough room to store more in cache.
 +
 
 +
==DIST==
 +
For standard warm standby setups you can safely disable the distributor thread (DIST) to save some unneeded overhead within Replication Server. When you want to define subscriptions using the logical connection (for instance for Multi Site Availability) you will need the DIST thread so either leave it as is, or re-enable the distributor thread when it has been disabled.
 +
Disabling the DIST can be done with "alter logical connection to <logical-ds.logical-db> set distribution off".
 +
 
 +
==Replicating Stored Procedures==
 
Using stored procedure replication can greatly improve the performance of a replicated environment. Before using this feature check if the tables that are modified through the stored procedure are also marked for replication. For warm standby environments this is usually done with the sp_reptostandby stored procedure.
 
Using stored procedure replication can greatly improve the performance of a replicated environment. Before using this feature check if the tables that are modified through the stored procedure are also marked for replication. For warm standby environments this is usually done with the sp_reptostandby stored procedure.
 
Since data at the replicate site is now modified through a stored proc, some side effects can occur.
 
Since data at the replicate site is now modified through a stored proc, some side effects can occur.
 
* Tables using identity columns will have a different value in their identity column on the primary and replicate site.
 
* Tables using identity columns will have a different value in their identity column on the primary and replicate site.
* Inserted/Updated values like getdate(), global variables (like @@spid) and the result of some system functions, will also differ between primary and replicate, causing a difference in data between sites.
+
* Inserted/Updated values from global variables (like @@spid), various system functions like getdate() will also differ between primary and replicate, causing a difference in data between sites.
 +
* You cannot use nested stored procedure replication (stored procedure a executes b, and a and b are both marked for replication). See http://www.sybase.com/detail?id=1039889 for recovery instructions when you got "RepAgent(n): Nested replicated stored procedure detected. Transaction log may be corrupt. Please contact SYBASE Technical Support.". It might be neccesary to run the recovery steps twice.
 
Check if this applies to your situation before using stored procedure replication.
 
Check if this applies to your situation before using stored procedure replication.
  
 
Usually a stored procedure is marked for replication like this: sp_setrepproc "<procedure name>","function"
 
Usually a stored procedure is marked for replication like this: sp_setrepproc "<procedure name>","function"
 
There is no need to create anything else (like subscribtions or other stuff).
 
There is no need to create anything else (like subscribtions or other stuff).
 +
 +
==Modifying the RSSD using sql scripts==
 +
(from sybase.com): Every time you run a SQL script that modifies the Replication Server System Database (RSSD), you must shut down Replication Server before running the script, then restart it after running the script. This is because of changes to the way heterogeneous datatype information is cached.
 +
 +
==Minimal Column Replication==
 +
With minimal column replication updates and deletes will be handled with greater efficiency by the RepServer. Before implementing  this for a table check the following:
 +
* the table has a unique index
 +
* the columns in that index will never be changed by an update statement
 +
Once you have verified this it's easy to implement minimal column replication. Here is a template:
 +
  create replication definition <repdef-name>
 +
  with primary at &lt;logical-connection&gt;.&lt;database&gt;
 +
  with all tables named '&lt;table-name&gt;'
 +
  (
 +
    &lt;column-list-with-datatypes&gt;
 +
  )
 +
  primary key (&lt;column_list&gt;)
 +
  replicate minimal columns
 +
The replication definition is defined at the logical connection and so it will also be used after a "switch active".
 +
 +
==Configure for SMP systems==
 +
When replication server runs on a multiprocessor machine you can take advantage of the multi-threaded capability of RepServer. Run this command:
 +
configure replication server set smp_enable to 'on'
 +
A reboot is required to activate the setting.
 +
 +
 +
[[Category:RepServer]]

Latest revision as of 22:01, 10 February 2010

Here are some rules of thumb to tune a replication server (mainly Warm Standby setup). They may not always be suitable for your environment, but it can be used as a starting point for further improvements. The list is not complete.

RepAgent

Some basic settings need to be changed to get a better througput from the RepAgent to the RepServer. Change these setting with the sp_config_rep_agent stored procedure and do a restart of the RepAgent afterwards.

  • "scan batch size", set to "10000"
  • "scan timeout", set to "5" (seconds) (not really a performance booster)
  • "send buffer size", set to "16k"
  • Check if "batch ltl" is set to "true".

With the sp_help_rep_agent stored procedure you can see if the RepAgent can keep up with the activity in the transaction log. Output of this procedure (with the "scan" option") shows the current and the end marker. The end marker is the last page/row number of the transaction log, and the current marker is the page/row number currently being scanned by the RepAgent. In the output of sp_help_rep_agent the markers are shown like this (example): (2954283,4). The first number is the page number and second one the record number on the page.

SQM

Do not spend a lot of time on this. Just make sure that the stable devices are on raw disk when your operating system supports it.

STS

Increase the value of sts_cachesize so rows of the RSSD system tables can be cached in memory. A fairly good indication of the value of this configuration parameter can be determined by adding the rowcount of all system tables and use the total number at the value. Add something like 10% to allow for growth.

configure replication server set sts_cachesize to '550'

Cache the system tables itself by running these commands:

configure replication server set sts_full_cache_rs_classes to 'on'
configure replication server set sts_full_cache_rs_columns to 'on'
configure replication server set sts_full_cache_rs_config to 'on'
configure replication server set sts_full_cache_rs_databases to 'on'
configure replication server set sts_full_cache_rs_datatype to 'on'
configure replication server set sts_full_cache_rs_diskaffinity to 'on'
configure replication server set sts_full_cache_rs_functions to 'on'
configure replication server set sts_full_cache_rs_objects to 'on'
configure replication server set sts_full_cache_rs_publications to 'on'
configure replication server set sts_full_cache_rs_queues to 'on'
configure replication server set sts_full_cache_rs_repdbs to 'on'
configure replication server set sts_full_cache_rs_routes to 'on'
configure replication server set sts_full_cache_rs_sites to 'on'
configure replication server set sts_full_cache_rs_systext to 'on'
configure replication server set sts_full_cache_rs_translation to 'on'
configure replication server set sts_full_cache_rs_users to 'on'
configure replication server set sts_full_cache_rs_version to 'on'

In RepServer version before 15.0 you could technically also cache the rs_locater table. Do not cache this table. A crash of Repserver can then lead to inconsistencies.

Also, you should be aware that while caching these tables will improve performance of replication in general and will reduce CPU utilization by repserver and dataserver, it can impact operations that cause the table caches to be rebuilt. Specifically, if rs_objects and rs_columns are cached, it will severely increase the time required to create, drop, or alter replication definitions. Repserver's CPU usage will spike during this time as it is forced to rebuild its cache.

One user experienced this after executing the above commands to cache the tables on RepServer 12.5. A support call to Sybase lked to un-caching rs_objects and rs_columns. From a private communication with Sybase support:

CR 500890 More efficient processing of changes to cached that are identified to sts_full_cache_<tablename>
Workaround:It is not necessary to fully cache tables.  STS cache automatically does that for RS and is quite efficient. 
If there are frequent changes to RS like adding/dropping repdefs and subscriptions, it is best NOT to fully cache those tables,
but to increase the sts_cachesize to hold more entries

The user's experience was restricted purely to slow adding/dropping of repdefs, whereas subscriptions were added/dropped with normal speed. Sybase support further communicated the fact that this will not be changed in later versions.

SQT

The most important configuration parameter for the SQT is "sqt_max_cache_size". To check if this parameter has been set too low, run an "admin who,sqt" at regular intervals and check if the columns "removed" or "full" contain a non-zero value. Be aware that the output of "admin who,sqt" also shows the DSI threads. So when "removed" or "full" are non-zero for a DSI thread, change the memory setting at the connection-level using "alter connection to <dataserver.database> set dsi_sqt_max_cache_size to '<new-value>'". Otherwise change the "sqt_max_cache_size" configuration parameter using "rs_configure" or (better!) "configure replication server".

It's best to start increasing "sqt_max_cache_size" before changing settings at the connection level.

For RepServer 12.6 do not increase "sqt_max_cache_size" to much. Oversizing the cache will in fact decrease performance. SQT cache is sufficiently sized if you do not observe, or observe only infrequently, transactions being flushed from cache because there is not enough room to store more in cache.

DIST

For standard warm standby setups you can safely disable the distributor thread (DIST) to save some unneeded overhead within Replication Server. When you want to define subscriptions using the logical connection (for instance for Multi Site Availability) you will need the DIST thread so either leave it as is, or re-enable the distributor thread when it has been disabled. Disabling the DIST can be done with "alter logical connection to <logical-ds.logical-db> set distribution off".

Replicating Stored Procedures

Using stored procedure replication can greatly improve the performance of a replicated environment. Before using this feature check if the tables that are modified through the stored procedure are also marked for replication. For warm standby environments this is usually done with the sp_reptostandby stored procedure. Since data at the replicate site is now modified through a stored proc, some side effects can occur.

  • Tables using identity columns will have a different value in their identity column on the primary and replicate site.
  • Inserted/Updated values from global variables (like @@spid), various system functions like getdate() will also differ between primary and replicate, causing a difference in data between sites.
  • You cannot use nested stored procedure replication (stored procedure a executes b, and a and b are both marked for replication). See http://www.sybase.com/detail?id=1039889 for recovery instructions when you got "RepAgent(n): Nested replicated stored procedure detected. Transaction log may be corrupt. Please contact SYBASE Technical Support.". It might be neccesary to run the recovery steps twice.

Check if this applies to your situation before using stored procedure replication.

Usually a stored procedure is marked for replication like this: sp_setrepproc "<procedure name>","function" There is no need to create anything else (like subscribtions or other stuff).

Modifying the RSSD using sql scripts

(from sybase.com): Every time you run a SQL script that modifies the Replication Server System Database (RSSD), you must shut down Replication Server before running the script, then restart it after running the script. This is because of changes to the way heterogeneous datatype information is cached.

Minimal Column Replication

With minimal column replication updates and deletes will be handled with greater efficiency by the RepServer. Before implementing this for a table check the following:

  • the table has a unique index
  • the columns in that index will never be changed by an update statement

Once you have verified this it's easy to implement minimal column replication. Here is a template:

 create replication definition <repdef-name>
 with primary at <logical-connection>.<database>
 with all tables named '<table-name>'
 (
   <column-list-with-datatypes>
 )
 primary key (<column_list>)
 replicate minimal columns

The replication definition is defined at the logical connection and so it will also be used after a "switch active".

Configure for SMP systems

When replication server runs on a multiprocessor machine you can take advantage of the multi-threaded capability of RepServer. Run this command:

configure replication server set smp_enable to 'on'

A reboot is required to activate the setting.