Bring Elastic and Resilient Multi-Tenancy to TiKV
Description
To support multi-tenancy and Change Data Capture (for RawKV), TiKV introduces two significant changes: the data space is separated into logical sub-ranges for different tenants, and a timestamp is appended to each key as a suffix for MVCC. These changes bring challenges: region management becomes a bottleneck, and it is difficult to limit the blast radius among tenants. The TSO service also becomes a bottleneck for performance and resilience. To make multi-tenancy elastic and resilient, the region management and TSO services are refactored into microservices, so that tenants are isolated by scale and QoS. A TSO cache is implemented in TiKV to acquire timestamps in batches, which improves performance and tolerates service interruption during faults and failover of PD. The causal consistency issues introduced by the TSO cache are handled with caution.
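As an illustration of the batching idea only, the following is a minimal Rust sketch of a timestamp cache, not TiKV's actual implementation. The names `TsoSource`, `TsoCache`, `FakeTso`, `alloc_batch`, and `get_ts` are hypothetical, and the sketch assumes the TSO service can reserve a contiguous range of timestamps in one request. It refills only when the cache is empty and does not address the cross-node ordering concern mentioned above.

```rust
use std::ops::Range;

/// Hypothetical stand-in for the remote TSO service (PD in the text).
pub trait TsoSource {
    /// Reserve `count` consecutive timestamps in one round trip.
    /// An `Err` means the service is currently unreachable.
    fn alloc_batch(&mut self, count: u64) -> Result<Range<u64>, String>;
}

/// Local cache that hands out timestamps from a pre-fetched batch.
pub struct TsoCache<S: TsoSource> {
    source: S,
    batch_size: u64,
    cached: Range<u64>,
}

impl<S: TsoSource> TsoCache<S> {
    pub fn new(source: S, batch_size: u64) -> Self {
        TsoCache { source, batch_size, cached: 0..0 }
    }

    /// Return the next timestamp. The remote service is contacted only
    /// when the local batch is exhausted, so requests keep succeeding
    /// during an outage for as long as cached timestamps remain.
    pub fn get_ts(&mut self) -> Result<u64, String> {
        if self.cached.is_empty() {
            self.cached = self.source.alloc_batch(self.batch_size)?;
        }
        self.cached
            .next()
            .ok_or_else(|| "TSO source returned an empty batch".to_string())
    }

    /// Number of timestamps still available without a remote call.
    pub fn remaining(&self) -> u64 {
        self.cached.end.saturating_sub(self.cached.start)
    }
}

/// Hypothetical in-memory source used for illustration: it serves
/// `calls_left` batches and then behaves as if the service were down.
pub struct FakeTso {
    pub next: u64,
    pub calls_left: u32,
}

impl TsoSource for FakeTso {
    fn alloc_batch(&mut self, count: u64) -> Result<Range<u64>, String> {
        if self.calls_left == 0 {
            return Err("TSO service unavailable".to_string());
        }
        self.calls_left -= 1;
        let start = self.next;
        self.next += count;
        Ok(start..self.next)
    }
}
```

With a batch size of 3 and a `FakeTso` that serves a single batch, three calls to `get_ts` succeed from one remote request and the fourth fails once the cache is drained. A larger batch lengthens the outage the cache can absorb, at the cost of timestamps that lag further behind the service's current value.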