Bring Elastic and Resilient Multi-Tenancy to TiKV

May 01, 2023 20 min Free

Description

To meet the requirements of multi-tenancy and Change Data Capture (for RawKV), TiKV introduces two significant changes: it separates the data space into logical sub-ranges for different tenants, and it appends a timestamp to each key as a suffix for MVCC. These changes bring challenges: region management becomes a bottleneck, and it is difficult to limit the blast radius among tenants. The TSO service also becomes a bottleneck for performance and resilience. To make multi-tenancy elastic and resilient, the region management and TSO services are refactored into microservices that isolate tenants by scale and QoS. A TSO cache is implemented in TiKV to acquire timestamps in batches for performance and to tolerate service interruption during PD faults and failover. The causal consistency issues introduced by the TSO cache are handled with caution.
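
The batching idea behind the TSO cache can be illustrated with a minimal sketch. It is not TiKV's implementation: the names `TsoCache`, `FakeTsoService`, `fetch_batch`, and `TsoUnavailable` are hypothetical, and the in-memory service merely stands in for PD. The sketch assumes the service hands out a contiguous range of timestamps per request, so a node can keep serving from its local range while the service is briefly unreachable.

```python
from typing import Callable, Tuple


class TsoUnavailable(Exception):
    """Raised when the local range is used up and the service cannot be reached."""


class FakeTsoService:
    """Hypothetical in-memory stand-in for the TSO service (not the PD API)."""

    def __init__(self, start: int = 100) -> None:
        self.next_ts = start
        self.calls = 0
        self.available = True

    def fetch_batch(self, count: int) -> Tuple[int, int]:
        # Simulate an outage (for example, during failover).
        if not self.available:
            raise ConnectionError("TSO service unavailable")
        self.calls += 1
        first = self.next_ts
        self.next_ts += count  # reserve the whole range for the caller
        return first, count


class TsoCache:
    """Serves timestamps from a locally cached range, refilling in batches."""

    def __init__(self, fetch_batch: Callable[[int], Tuple[int, int]],
                 batch_size: int = 4) -> None:
        self._fetch_batch = fetch_batch
        self._batch_size = batch_size
        self._next = 0
        self._end = 0  # exclusive upper bound of the cached range

    def _refill(self) -> bool:
        try:
            first, count = self._fetch_batch(self._batch_size)
        except ConnectionError:
            return False
        self._next, self._end = first, first + count
        return True

    def next_ts(self) -> int:
        # Contact the service only when the cached range is exhausted.
        if self._next >= self._end and not self._refill():
            raise TsoUnavailable("cached range exhausted and service unreachable")
        ts = self._next
        self._next += 1
        return ts
```

With `batch_size=4`, four calls to `next_ts()` cost one round trip to the service, and timestamps already cached remain usable during an outage. The trade-off is the one the abstract points to: a timestamp drawn from one node's cached range can be smaller than a timestamp issued earlier in real time from another node's range, so ordering across nodes needs separate handling.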