| Issue | Security and Safety, Volume 4, 2025 |
|---|---|
| Article Number | 2025010 |
| Number of page(s) | 19 |
| Section | Digital Finance |
| DOI | https://doi.org/10.1051/sands/2025010 |
| Published online | 31 July 2025 |
Research Article
HTAP benchmark in financial scenarios
1 School of Computer Science, Fudan University, Shanghai, 200438, China
2 Research Institute of Financial Technology, Fudan University, Shanghai, 200438, China
* Corresponding author
Received: 17 February 2025
Revised: 27 April 2025
Accepted: 29 July 2025
With the development of HTAP (Hybrid Transactional/Analytical Processing), the use of HTAP databases has become increasingly common. HTAP databases integrate OLTP (Online Transaction Processing) and OLAP (Online Analytical Processing) into a single system, replacing traditional ETL methods and thereby reducing the cost of system usage and maintenance. However, financial scenarios impose higher demands on data and service solutions, with significantly more complex data structures and business processes. This complexity poses a series of challenges for the application of HTAP databases in financial scenarios. To address these challenges, this paper proposes a benchmark tailored for financial scenarios. This benchmark evaluates the key characteristics of HTAP systems in financial contexts: throughput frontier and freshness. It simulates data distributions specific to financial scenarios and generates data consistent with financial use cases. Based on financial data patterns, it defines a variety of analytical queries, including those for marketing, risk control, single-table aggregation, and multi-table joins. It also simulates common banking transactions such as account operations, deposits, and withdrawals, and supports risk detection rollback to reflect the real-world behavior of banking transactions. The testing program allows the generation of transactional and analytical clients in varying proportions based on parameters, enabling concurrent database testing. This paper presents adaptations of the benchmark to a community edition of a specific database for testing purposes. Considering the growing adoption of cloud databases, where resource allocation has become more flexible, this paper also investigates the changes in database performance and workload isolation under different resource allocation scenarios for transactional and analytical clients. The study finds that appropriate resource allocation can significantly enhance workload isolation. 
Additionally, the benchmark allows database users to assign intent weights to transactional and analytical workloads, enabling a unified performance evaluation metric for the overall performance of the database. A comparison of two different resource allocation strategies shows that appropriate resource allocation improved the comprehensive performance score of the database from 55.0 to 60.0.
Key words: HTAP / Financial scenarios / Benchmark / Resource allocation / Data generation / Workload isolation
Citation: Wu W, Zhang H and Jing Y et al. HTAP benchmark in financial scenarios. Security and Safety 2025; 4: 2025010. https://doi.org/10.1051/sands/2025010
© The Author(s) 2025. Published by EDP Sciences and China Science Publishing & Media Ltd.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
1. Introduction
Since the emergence of databases, a wide range of database technologies has arisen for different application scenarios [1]. As the scale and diversity of enterprise data expand [2], the traditional architecture of loading transactional data into a data warehouse for analysis through ETL faces rising operational costs and a lack of immediacy. There is a growing need for a system that can support both forms of workloads. To address these problems, the concept of HTAP (Hybrid Transactional/Analytical Processing) was proposed [3]. The concept was subsequently extended into an application architecture, and more and more databases are moving towards HTAP systems, including TiDB [4], MySQL Heatwave [5], SQL Server [6], and SAP HANA [7]. These databases have different architectures and technical solutions, and accordingly, they have performance advantages in different aspects [8]. For instance, TiDB combines distributed row storage with column-storage replicas and achieves good transactional and analytical throughput. Since transactional and analytical workloads do not run on the same node, its workload isolation is relatively strong; however, because data must be synchronized across nodes, its data freshness is relatively poor. SQL Server, by contrast, adopts an architecture of primary row storage with in-memory column storage and also achieves high transactional and analytical throughput. Since both workloads usually operate on the same node, data synchronization is generally fast, so its data freshness is relatively high. Conversely, because the workloads share a node, they may compete for resources, with one type of workload reducing the throughput of the other; that is, the workload isolation of this database is not strong.
Due to performance differences caused by variations in database architecture, design, and technology, evaluating a database’s performance across various dimensions to help users select a database suitable for their business scenarios is of profound significance. A testing benchmark typically consists of several components: data schema, data generator, test procedures, and evaluation metrics. Most existing HTAP testing benchmarks are derived from extensions of existing data schemas, while workloads in different business scenarios have distinct characteristics and data distributions. Considering the above issues, this paper designs an HTAP testing benchmark for banking and financial scenarios. This benchmark designs its data schema by referencing common banking operations, samples data from real-world bank data distributions, and simultaneously devises a workload operation mode that ensures security.
The main innovations and contributions of this paper are as follows:
(1) The design involves 13 tables, including customer information, related dimension tables, transfer record tables, customer history information tables, wealth management information tables, loan information tables, and account information tables. The data scale grows proportionally with the number of institutions and is controlled by the input parameter SF. Considering the security of bank transaction services, this paper designs status fields for the customer table and account table to identify the status of customers and accounts, ensuring the detection of dishonest users and abnormal accounts and providing data schema-level support for transaction security detection.
(2) For the above data schema, combined with the obtained real urban institution information and the data distribution of each field, test data that restores real financial transaction scenarios as much as possible is generated. Considering the correlation between data, we first generate associated information to ensure the correctness of data relationships. For example, each different customer may have a different number of accounts. Meanwhile, to simulate abnormal data states detected in banking operations, this paper generates a certain proportion of tuples containing abnormal fields based on real bank data distributions, providing data-level support for transaction anomaly detection.
(3) Account information for each user is generated according to a certain probability distribution. Three common banking operations are designed: deposit, withdrawal, and transfer, along with 21 analytical SQLs related to risk control and marketing. For businesses requiring security guarantees, status fields of relevant tables need to be detected to ensure operational security.
(4) Considering that transactional and analytical workloads in an HTAP architecture run on the same system, in addition to the basic metrics such as the throughput of transactions and queries, we also need to consider the load isolation and freshness of the workloads, that is, the degree to which one workload affects the other and the freshness of the data obtained by the analytical workload.
(5) We design a comprehensive indicator that reflects database throughput, freshness, and user intention, adapt it to the database under test, and study the changes in workload isolation and comprehensive performance under different resource allocation situations.
2. Related work
In the 1980s, during the early development of database technology, performance evaluation was relatively simple and lacked unified standards. Researchers began to attempt to write some simple benchmark test programs to measure the basic performance of databases, such as testing the efficiency of operations like data storage, retrieval, and update. Tests at this stage mainly focused on the basic functions and performance of the database system. The test scenarios and data scale were relatively small. For example, the Wisconsin Benchmark [9] only had three data tables and some test SQL statements.
In the 1990s, with the increasingly widespread application of databases, the demand for database performance evaluation grew. Some more standardized database test benchmarks began to take shape. The Transaction Processing Performance Council (TPC) was established. Among them, TPC-C was approved in July 1992 and became an influential Online Transaction Processing (OLTP) benchmark. TPC-C [10] simulated the warehouse order management scenario. Compared with previous benchmark tests, TPC-C was more complex, with multiple transaction types, a more complex database, and an overall execution structure, involving a mix of five different types and complexities of concurrent transactions. It measured the performance of the database by Transactions Per Minute (TPMC), providing a relatively unified performance evaluation standard for database vendors and users.
In addition, the TPC-H [11] benchmark, which was designed for data warehouses and decision support systems, was widely used. TPC-H simulated the dynamic queries of real-world business transaction databases, included a complete set of business-oriented instant queries and concurrent data modifications, emphasized the testing of operating systems, databases, and I/O performance, and focused on the query performance for analysis.
Subsequently, with the continuous development of database technology, in addition to traditional relational databases, various new types of databases such as NoSQL databases and distributed databases emerged. Therefore, database test benchmarks also continued to develop and improve to meet the testing needs of different types of databases. For example, Graph 500 [12] was for graph databases, TSBenchmark [13] was for time-series databases, and TPCx-BB [14] was for big data applications. At the same time, some open-source communities and research institutions also began to introduce their own database testing tools and benchmarks, such as Sysbench.
Considering the evaluation of the hybrid workloads of HTAP databases, TPC-CH [15] first integrated TPC-C and TPC-H, which included transactional and analytical workloads. Therefore, in addition to the throughput indicators of the workloads, it also considered the mutual influence and interaction between the workloads. In 2022, the Li Guoliang team from Tsinghua University designed a new HTAP test benchmark, Hatrick [16]. This test benchmark first proposed a throughput frontier that could reflect the load interference situation and an indicator that could reflect data freshness, and gave specific definitions and calculation methods.
At present, in the financial and banking field, there are the DataBench-T [17] test benchmark for OLTP and DataBench-H for OLAP, which were released by the China Academy of Information and Communications Technology. However, these two test benchmarks do not consider the situation of mixed workloads and do not design indicators that reflect the characteristics of HTAP. The current development of test benchmarks is shown in Figure 1 [18].
In summary, although various types of database test benchmarks exist, there is still room for development and improvement of test benchmarks for specific scenarios, and research on HTAP test benchmarks and their characteristic metrics is not yet mature. The Fintech Benchmark designs a data schema for the bank transaction scenario, generates data that conforms to real-world data distributions, designs workloads that conform to real-world scenarios, and studies the performance of HTAP databases in the financial scenario. It aims to address the deficiencies of existing database test benchmarks in business-scenario adaptation, test data generation, and HTAP database testing; provide a unified evaluation and measurement for database products applied in financial business scenarios; help financial practitioners make more accurate choices; guide the healthy development of data service vendors; and promote the development of financial information innovation.
3. Design of test benchmark
This test benchmark is designed with 13 tables covering common financial and banking-related businesses. The data scale increases proportionally with the parameter SF. Based on the data schema, a multi-threaded data generator is designed to improve the data generation efficiency, and test data of different scales are generated according to SF. During the testing phase, the test program automatically creates database tables and controls the import of offline data into the database. Online data will also be generated during the testing process. After the test is completed, relevant performance indicators are collected from the thread space, written into a data file, and visualized.
3.1. Schema design
Figure 2 shows the Schema of Fintech Benchmark, which mainly includes the following aspects:
(1) Relevant information of users from various dimensions, such as user address information (ADD_INFO), telephone information (TEL_INFO), and basic information (BASE_INFO).
(2) Users’ accounts and related product information, such as demand deposit account information (HQ_PRDT_INFO), time deposit account information (DQ_PRDT_INFO), financial products (FIN_PRDT), the intermediate business information table (INTER_BUS_PRDT_INFO), and the loan voucher information table (PULOANVCH) that users own.
(3) Business-related information and processed information, such as transaction flow table (TRANS_INFO), deposit information table (CK_INFO), and user historical information table (CUST_INFO_HIS).
(4) FRESHNESS TABLE: To support the measurement of freshness, a FRESHNESS TABLE is established for each TP client. It contains two columns, Client_ID and Tnx_ID, which are used to record the ID of the most recently completed transaction of each TP client. Whenever a TP client submits a transaction, the Tnx_ID column of this table is modified.
Figure 2. The schema of the test benchmark
The overall data scale grows proportionally with the parameter SF, allocating 3000 × SF users per organization. Among them, a selected part of users have multiple addresses and multiple mobile phone numbers, bringing the address and telephone data scale to 4000 × SF. The customer information table only records the unique mobile phone number and address information left by customers. For financial products and intermediate businesses, we select 1000 × SF customers to own them. Similarly, for loan vouchers, 1000 × SF customers are also selected to own them. For the demand deposit accounts and time deposit accounts that each user owns, we generate a certain number of accounts per user according to the real-world data distribution. We use 3000 × SF Ra (Random) to denote this data scale, that is, the number of accounts randomly generated over the 3000 × SF customers. For the transaction flow information, considering that its data scale is larger than that of other tables, we set it to 60 000 × SF. The accounts of both parties in the transactions are sampled from the account information table. The deposit information table is derived from the transaction flow and account information during initialization. It reflects the changes in the deposits of all accounts according to the transaction flow, and its data scale equals the total number of accounts. For the main subjects of bank information services, customers and accounts, this paper sets status fields to identify whether customers and accounts are abnormal, so as to achieve risk control in the business process. The following statuses are set for customer status:
active: The customer is in normal status, can use bank accounts and various services normally (such as deposit/withdrawal, transfer, loan, etc.), and has no restrictions.
suspend: The customer’s account is temporarily frozen, with partial or all functions restricted (such as prohibited transactions, limited login, etc.).
delete: The customer’s account has been permanently closed, and the data may be archived but can no longer be used.
locked: The account is locked for specific reasons and can only be restored after manual intervention to unlock it.
apply: The customer is handling a certain business (such as account opening, loan, report of loss, etc.), and the status is under process approval.
loss: The customer applies for (report of loss) due to loss of documents, bank cards, passwords, etc., and the account may be in a protected state.
stop: The bank suspends fund outflows from the account (but may allow inflows) according to the customer’s application or regulatory requirements.
For accounts, three statuses are set: active (normal activation), suspend (account suspension due to anomalies), and delete (account deletion).
In addition to the business-related tables, each FRESHNESS TABLE holds only one record. When a new transaction is submitted, the transaction ID column of that single row is directly updated to the current transaction ID.
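The scaling rules in this subsection can be expressed as a short sketch. The helper name, the per-user account distribution, and the fixed seed are illustrative assumptions, not part of the benchmark's implementation; only the multiples of SF follow the rules stated above.

```python
import random

def table_cardinalities(sf: int, seed: int = 42) -> dict:
    """Approximate row counts per the scaling rules: most tables are fixed
    multiples of SF, while account counts are random per user (SF x 3000 Ra)."""
    rng = random.Random(seed)
    n_users = 3000 * sf
    # A subset of users has extra addresses/phones, pushing those tables to ~4000 x SF.
    n_addr = n_tel = 4000 * sf
    # Each user gets a random number of accounts (this distribution is illustrative).
    n_accounts = sum(rng.randint(1, 3) for _ in range(n_users))
    return {
        "BASE_INFO": n_users,
        "ADD_INFO": n_addr,
        "TEL_INFO": n_tel,
        "FIN_PRDT": 1000 * sf,           # selected customers own financial products
        "INTER_BUS_PRDT_INFO": 1000 * sf,
        "PULOANVCH": 1000 * sf,          # selected customers hold loan vouchers
        "ACCOUNTS": n_accounts,          # random per-user account counts
        "TRAN_FLOW": 60000 * sf,
    }
```

For SF = 2, for example, this yields 6000 users and 120 000 transaction-flow rows, with the account count varying between runs only if the seed changes.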
3.2. Data generator
The data generation adheres to logical relevance, rationality of data distribution, and efficiency of data generation. The system architecture of the data generator is roughly shown in Figure 3. By inputting the data scale parameter SF, the data generator can start multiple threads to generate logically consistent data information based on the obtained data distribution.
Figure 3. Data generator
This data generator uses a rule-based method to generate data in batches, creating twelve tables of data in total. Since there are inter-relationships among the table data, and in particular many tables in the Schema are related to the user table, this module first generates the intermediate structure CUST_LITE, which represents the association between users, demand deposit accounts, and time deposit accounts. This structure is a collection of all user IDs and their corresponding accounts: according to the actual data distribution, a different number of demand deposit accounts and time deposit accounts is generated for each user, and related tables sample their keys from this intermediate structure. The ORG_ID in the institution table is handled in the same way.
During the process of generating table data, to better simulate a real-world database, we collected statistics on columns that are frequently used in the workloads and whose data distribution can affect query performance. These statistics include the standard deviation (STD()), variance (VARIANCE()), maximum value (MAX()), minimum value (MIN()), average field length (AVG(LENGTH())), and cardinality information, and we ensured that the generated data conforms to these characteristics. For example, from the cardinality statistics of the ORG table, we can obtain the total number of financial institutions and the fact that they are divided into six different levels. In the BASE_INFO user basic information table, in addition to the cardinality, the extreme values of the B_AGE column determine the approximate age distribution. In the TRAN_FLOW transaction flow table, a data distribution histogram is obtained from the statistics of the TF_MONEY transaction amount column. Considering the roll-back situations that occur in actual bank transactions, such as insufficient balance in transfer transactions, abnormal roll-backs of transfer accounts, and restrictions on withdrawals from time deposit accounts, the data generator can control the account distribution, such as the distribution of account balances and the proportion of abnormal accounts, to manage the roll-back ratio.
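Histogram-driven generation of this kind can be sketched as follows: pick a bucket by weight, then sample uniformly inside it. The bucket boundaries and weights below are invented for illustration and do not reproduce the real TF_MONEY statistics.

```python
import random

# Hypothetical histogram of transaction amounts (bucket bounds in CNY, weights),
# standing in for the statistics collected from the real TF_MONEY column.
BUCKETS = [(1, 100), (100, 1000), (1000, 10000), (10000, 100000)]
WEIGHTS = [0.40, 0.35, 0.20, 0.05]

def sample_tran_money(rng: random.Random) -> float:
    """Draw an amount whose distribution matches the target histogram:
    choose a bucket by its weight, then sample uniformly within it."""
    lo, hi = rng.choices(BUCKETS, weights=WEIGHTS, k=1)[0]
    return rng.uniform(lo, hi)

rng = random.Random(7)
amounts = [sample_tran_money(rng) for _ in range(10000)]
```

With enough samples, the empirical bucket shares converge to the target weights, so generated data reproduces the measured distribution to the histogram's resolution.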
On this basis, the following data tables are generated respectively:
(1) ADD_INFO: This is the address information of customers. All CUST_IDs in this table are obtained from CUST_LITE, and it is stipulated that a certain proportion of customers have multiple addresses.
(2) TEL_INFO: This is the telephone information of customers. All CUST_IDs in this table are obtained from CUST_LITE, and it is stipulated that a certain proportion of customers have multiple phone numbers.
(3) BASE_INFO: This contains basic information of customers such as their birthdays and genders.
(4) CUST_INFO: This integrates the basic information, telephone, and address information of customers, involving information related to customer status, name, age, change date, phone number, and address.
(5) INTER_BUS_PRDT_INFO: This is the table of intermediate business information that customers own. Only a part of customers are selected to own intermediate products.
(6) FIN_PRDT: This is the table of financial product information that customers own. Only a part of customers are selected to own financial products, and the types and amounts of financial products are set according to a certain data distribution.
(7) PULOANVCH: This is the table of loan information that customers own. Only a part of customers are selected to have loans, and the loan amounts are generated according to a certain data distribution.
(8) HQ_PRDT_INFO: This is the information of demand deposit accounts that customers own. It is stipulated that each user corresponds to a series of demand deposit accounts, with the correspondence determined by CUST_LITE. The deposit amount in each account is generated according to a certain data distribution. The number of generated demand deposit accounts is written into the file of the test module to provide data support for the transactions in the test module.
(9) DQ_PRDT_INFO: This is the information of time deposit accounts that customers own. It is stipulated that each user corresponds to a series of time deposit accounts. The correspondence and the generation rule for account balances are the same as those for demand deposit accounts. The number of generated time deposit accounts is written into the file of the test module to provide data support for the transactions in the test module.
(10) TRAN_FLOW: This is the transaction flow table. This table selects the accounts of both parties in the transaction by reading the previously generated demand deposit account information and generates the transaction amount and transaction time according to a certain data distribution. Since the data volume of this table is large, multiple threads are used for data generation to fully utilize system resources and speed up the process.
(11) CUST_INFO_HIS: This is the customer historical information table. It is generated based on CUST_INFO. It is assumed that a part of users has updated their information at certain times, and the data in this table is generated on this basis.
(12) CK_PRDT_INFO: This is the deposit information table. The data in this table is generated by SQL, which calculates and generates the monthly, quarterly, and annual deposit balances of demand deposit accounts.
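The CUST_LITE-first generation order described above can be sketched as follows. The per-user account counts and the sequential ID scheme are illustrative assumptions; the point is that dependent tables sample keys from this structure, so every foreign key they emit is guaranteed to exist.

```python
import random
from dataclasses import dataclass, field

@dataclass
class CustLite:
    """Intermediate structure linking one customer to their account IDs."""
    cust_id: int
    hq_accounts: list = field(default_factory=list)  # demand deposit accounts
    dq_accounts: list = field(default_factory=list)  # time deposit accounts

def build_cust_lite(n_users: int, seed: int = 1) -> list:
    rng = random.Random(seed)
    out, next_acct = [], 0
    for cid in range(n_users):
        c = CustLite(cid)
        for _ in range(rng.randint(1, 2)):   # illustrative per-user account counts
            c.hq_accounts.append(next_acct); next_acct += 1
        for _ in range(rng.randint(0, 2)):
            c.dq_accounts.append(next_acct); next_acct += 1
        out.append(c)
    return out

# Tables such as TRAN_FLOW then sample both transaction parties from this pool.
cust_lite = build_cust_lite(100)
all_hq = [a for c in cust_lite for a in c.hq_accounts]
```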
3.3. Test program
The overall testing process is shown in Figure 4. The test program supports setting test parameters through configuration files and command-line parameters, allowing users to configure the data source and the numbers of transactional and analytical clients. To eliminate the influence of cache misses, the test supports a warm-up phase, enabling users to set the warm-up time and test time. The test program allows users to choose whether to conduct multi-user testing, that is, accessing the database through two different database users. This design is beneficial for evaluating database performance under different resource allocation scenarios for different users. In the parameter selection, users can choose single-point testing or isolation testing: a single-point test reports the throughput at one client-ratio setting, while the isolation test reports the load interference and isolation of the database.
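A configuration in the spirit described above might look like the following; every key name here is hypothetical, since the paper does not list the actual parameters.

```ini
[datasource]
jdbc_url = jdbc:mysql://localhost:3306/fintech
user_tp  = bench_tp      ; separate database users enable per-user resource allocation
user_ap  = bench_ap

[clients]
tp_clients = 32
ap_clients = 8

[run]
warmup_seconds = 60      ; absorbs cold-cache effects before measurement
test_seconds   = 300
mode = isolation         ; single_point | isolation
```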
Figure 4. The operation process of the test program
The test program first initializes the database tables and then imports the data generated offline by the data generator. Meanwhile, according to the number of transactional clients, a freshness table is created for each transactional client and its transaction number is initialized to 0. Then, according to the numbers of transactional and analytical clients given by the user, the corresponding numbers of analytical and transactional threads are spawned. Each transactional thread randomly selects one of the transactions according to a certain proportion and executes it. After the transaction is submitted, the transaction number in the freshness table corresponding to the current thread is updated, and the current transaction number together with the current timestamp is appended to a global data structure. This process continues until the test time is reached. Each analytical thread selects one of the 21 analytical queries, queries all freshness tables at the same time, and returns the results. The calculation method of this indicator was proposed by the Li Guoliang team in Hatrick [16]: the transaction numbers in the queried freshness tables represent all the transactions observed by the analytical query at the current moment. Using the queried results to probe the global data structure, the first transaction that cannot be seen by the query is found (its transaction number is greater than the currently read transaction number, and its timestamp is earlier than the start time of the query). The core idea is to reflect data freshness through the synchronization of transaction numbers. The freshness value is calculated by subtracting the submission time of the first unobservable transaction from the query start time; a smaller value indicates higher freshness. The test continuously repeats the above process until the test time ends.
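The freshness computation described above can be sketched per transactional client as follows; this is a minimal sketch, and the benchmark's actual data structures may differ.

```python
def freshness(q_start: float, seen_txn: int, commit_log: list) -> float:
    """commit_log: [(txn_id, commit_ts), ...] in commit order for one TP client.
    The first unobservable transaction is the earliest one that committed
    before the query started (commit_ts < q_start) yet has txn_id > seen_txn,
    where seen_txn is the ID read from this client's freshness table.
    Freshness = q_start - its commit time; 0.0 if everything was visible."""
    for txn_id, ts in commit_log:
        if txn_id > seen_txn and ts < q_start:
            return q_start - ts
    return 0.0
```

For example, if a client committed transactions 1, 2, 3 at times 0.0, 1.0, 2.0 and a query starting at 2.5 only saw transaction 1, the first unobservable transaction is number 2, giving a freshness value of 1.5.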
For the data required to calculate the throughput frontier, the test program first measures the maximum number of clients. By continuously increasing the number of clients, the maximum numbers of transactional and analytical clients that the database can handle under the current configuration are determined. Single-point tests are then conducted with client counts configured as proportions of these maxima, such as 0.8 times the maximum number of transactional clients with 0.2 times the maximum number of analytical clients, or 0.5 of each. By aggregating the single-point test data of multiple proportion configurations, the change in database throughput as the number of the other type of clients increases is reflected. However, across multiple tests, each previous test generates new data, causing inconsistent data scales between runs and introducing test errors. Considering this, the test program tracks the initial data scale of the affected tables and deletes the added rows. Because data updates may still cause minor errors, users can also choose to reload the data for each round; this approach ensures test accuracy but increases test overhead.
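The proportional client configurations can be enumerated as a simple grid; the ratio set used here is illustrative, not the benchmark's fixed choice.

```python
def frontier_points(max_tp: int, max_ap: int,
                    ratios=(0.0, 0.2, 0.5, 0.8, 1.0)) -> list:
    """Build the single-point test configurations used to trace the frontier:
    each point pairs r x max_tp transactional clients with
    (1 - r) x max_ap analytical clients."""
    return [(round(r * max_tp), round((1 - r) * max_ap)) for r in ratios]

# With a measured maximum of 100 TP clients and 20 AP clients:
points = frontier_points(100, 20)
```

The two endpoints correspond to the pure transactional and pure analytical single-workload runs, and the interior points reveal how far the measured throughput pairs fall below the diagonal.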
3.4. Workload
The workload of the test benchmark consists of analytical workload and transactional workload.
The analytical workload contains 21 SQL statements related to risk control and marketing, such as:
(1) Determine the product-holding situation of a certain customer in a certain institution. This query obtains the products held by a certain customer in a certain institution, including demand deposits, time deposits, loans, credit cards, salary disbursement on behalf of others, water bill payment on behalf of others, electricity bill payment on behalf of others, etc.
(2) Conduct a fuzzy query of the customer information of a certain institution in a certain region.
(3) Query the handling situation of intermediate business to understand the situation of those who have opened debit card business in our bank, are high-asset customers and have opened more than three types of businesses, but have not yet handled intermediate business. Such customers can be marketed in the later stage.
(4) Query transactions where the transaction amount of any payer exceeds 100 000 CNY.
(5) The product distribution held by customers of different ages in each institution.
The transactional workload mainly contains three transactions. The transaction process is shown in Algorithm 1 as follows:
(1) Deposit: Randomly select an account as the deposit account from demand deposit and time deposit accounts according to a certain proportion, generate the deposit amount according to a certain data distribution, and check whether the account is abnormal. If the account is abnormal, roll back the transaction. Otherwise, update the demand deposit account table and insert into the transaction flow table, where both the transfer-in and transfer-out parties are the same account and the amount is positive. Then submit the transaction.
(2) Withdrawal: Select an account as the withdrawal account from demand deposit and time deposit accounts according to a certain proportion, and generate the withdrawal amount according to a certain data distribution. If the account is abnormal, roll back the transaction. If the account is a time deposit account, there is a certain probability that the withdrawal cannot be made and the transaction is rolled back, simulating a time deposit that has not reached its withdrawable period; if it has, check the account balance and roll back if the balance is less than the withdrawal amount. If the account is a demand deposit account, directly check the account balance and roll back if the amount does not meet the withdrawal requirement. After passing the above checks, update the account balance table, insert a record with a negative amount into the transaction flow table, and submit the transaction.
(3) Transfer: Select two accounts from the demand deposit accounts as the two parties to the transaction, re-selecting until the two accounts are different. Generate the transaction amount and check whether either account is abnormal; if so, roll back the transaction. If both accounts are normal, check whether the balance of the transfer-out party is sufficient; if not, roll back the transaction. After passing the above checks, insert into the transaction flow table, update the balances of both parties, and submit the transaction.
1. tf_account = RandomChoice(hq_account_sf)  # Select the transfer-out account
2. op_account = RandomChoice(hq_account_sf)  # Select the opposite account
3. WHILE (tf_account == op_account) op_account = RandomChoice(hq_account_sf)  # Re-select until the two accounts differ
4. IF CheckAnomalies(tf_account, op_account) THEN Rollback  # Roll back the transaction due to an abnormal account
5. tr_money = RandomGenerateTranMoney()  # Generate the transaction amount
6. IF tf_account.amount < tr_money THEN Rollback  # Roll back the transaction due to insufficient balance
7. Update(tf_account, op_account, tr_money)
   InsertInto(tr_flow)
   Commit  # Update both account balances, insert into the transaction flow table, and commit the transaction
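The transfer flow above can be sketched as a minimal, self-contained Python routine. This is only an illustrative sketch: in-memory dictionaries stand in for the account and flow tables, and the anomaly check and amount distribution are simplified assumptions.

```python
import random

def transfer(accounts, tr_flow, rng=random.Random(42)):
    """One simulated transfer; returns True on commit, False on rollback."""
    ids = list(accounts)
    tf_account = rng.choice(ids)               # transfer-out account
    op_account = rng.choice(ids)               # opposite account
    while tf_account == op_account:            # re-select until the accounts differ
        op_account = rng.choice(ids)
    if accounts[tf_account]["abnormal"] or accounts[op_account]["abnormal"]:
        return False                           # roll back: abnormal account
    tr_money = rng.randint(1, 1000)            # simplified amount distribution
    if accounts[tf_account]["balance"] < tr_money:
        return False                           # roll back: insufficient balance
    accounts[tf_account]["balance"] -= tr_money  # update both parties' balances
    accounts[op_account]["balance"] += tr_money
    tr_flow.append((tf_account, op_account, tr_money))  # transaction flow record
    return True                                # commit
```

A committed transfer conserves the total balance across the two accounts and appends exactly one flow record, mirroring the rollback semantics described above.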
For operations that change funds, transactions undergo pre-execution anomaly checks, including whether the account is in a tradable status and whether its balance is sufficient. Fund changes involving accounts in an abnormal status are rolled back to guarantee fund security.
3.5. Performance indicators and collection of test data statistics
This benchmark mainly evaluates two HTAP-related characteristics of the database: the throughput frontier and freshness. In an HTAP system, transactional and analytical workloads run simultaneously and contend for resources, and the throughput frontier captures this phenomenon well: if the frontier lies below the diagonal, resource contention in the database is relatively severe; otherwise, the isolation between the transactional and analytical workloads is relatively strong. To obtain the throughput frontier, a pre-test first measures the maximum numbers of analytical and transactional clients the database can support under the current resources. Transactional and analytical clients are then allocated in different proportions of these maxima across multiple test runs to obtain the throughput of each run, and finally the boundary points are extracted. In principle, when both workloads run in one system, the data read by the analytical workload may lag behind the transactional workload; that is, it may not be the latest. Freshness therefore measures how up to date the data read by the analytical workload is: it is defined as the time difference between the arrival time of the current analytical query and the first transaction it cannot observe. The smaller the difference, the fresher the data, with 0 being the optimal value. The benchmark further combines these indicators into a Composite Metric, referred to as CM, which lets users evaluate database performance under their preferred mix of transactions and analytical queries and thus choose the database best suited to a given scenario.
Its formulation is shown in Formula 1, where ap_weight denotes the user's preference for analytical workloads, tp_weight the preference for transactional workloads, and freshness reflects data recency in the system. The score is higher when system throughput is high and the ratio of transactional to analytical throughput matches the user-given weights more closely; in addition, the smaller the freshness value, i.e., the fresher the data, the higher the score.
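Formula 1 itself is not reproduced in this excerpt, so the following Python sketch is only an illustrative scoring function consistent with the description (total throughput, scaled by how well the workload mix matches the user's weights and discounted by staleness); it is not the paper's actual formula.

```python
def composite_metric(tp_tput, ap_tput, tp_weight, ap_weight, fresh):
    """Illustrative composite score (NOT the paper's Formula 1).
    Higher throughput, a tp/ap mix closer to the user's weights, and fresher
    data (smaller `fresh`) all raise the score."""
    total = tp_tput + ap_tput
    if total == 0:
        return 0.0
    tp_share = tp_tput / total
    ap_share = ap_tput / total
    # penalize deviation of the achieved mix from the user's stated weights
    mismatch = (abs(tp_share - tp_weight) + abs(ap_share - ap_weight)) / 2.0
    return total * (1.0 - mismatch) / (1.0 + fresh)
```

For example, a system whose achieved mix exactly matches the weights and whose freshness is 0 scores its full combined throughput, while any staleness or mix mismatch lowers the score, matching the qualitative behavior described above.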
The workflow of the result collection module is shown in Figure 5. It computes the performance indicators from the test data recorded by each thread during the run: it aggregates the indicators of all analytical clients to obtain the average execution time of each query, aggregates the indicators of all transactional clients to obtain the average latency of each transaction, collects the freshness of every query from all analytical clients and takes the 95th percentile as the system's freshness score, and calculates the transactional and analytical throughput. After the test completes, the module writes the test results to a local file and visualizes them.
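The per-query freshness definition and the aggregation above can be sketched together as follows. This is a simplified sketch: the function names are illustrative, and `visible_count` (the number of committed transactions a query's snapshot could observe) is an assumed piece of bookkeeping.

```python
import statistics

def freshness(query_arrival, commit_times, visible_count):
    """Freshness of one analytical query: the gap between its arrival time and
    the commit time of the first transaction it could NOT observe; 0 (optimal)
    if it observed every committed transaction."""
    commit_times = sorted(commit_times)
    if visible_count >= len(commit_times):
        return 0.0
    return max(0.0, query_arrival - commit_times[visible_count])

def collect_metrics(txn_latencies, query_times, freshness_samples, duration_s):
    """Aggregate per-client samples into summary metrics, taking the 95th
    percentile of all queries' freshness values as the system score."""
    p95_idx = min(len(freshness_samples) - 1, int(0.95 * len(freshness_samples)))
    return {
        "avg_txn_latency": statistics.mean(txn_latencies),
        "avg_query_time": statistics.mean(query_times),
        "freshness_p95": sorted(freshness_samples)[p95_idx],
        "txn_throughput": len(txn_latencies) / duration_s,
        "query_throughput": len(query_times) / duration_s,
    }
```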
Figure 5. Metrics collector
4. Experimental analysis
4.1. Experimental setup
Experimental environment: The database is deployed on a single node. The test tenant is allocated 6 GB of RAM, 4 Intel Xeon CPUs, X, and 40 GB of SSD storage. During the test, 16 KB IOPS is capped at 1024.
Test benchmark setup: The benchmark is deployed on the same machine as the database under test. To account for caching effects, each round begins with a 40-second warm-up followed by a 5-minute formal test. The experiment supports single-point performance testing with a specified number of clients, reporting the latency of each transaction and the duration of each query together with the overall system throughput and freshness; alternatively, it can test the interference between the transactional and analytical workloads. For the study of mutual interference between workloads, we use three data scales: SF1, SF5, and SF10. For the impact of resource allocation on overall database performance, we use the SF5 data scale to study workload interference and the composite performance indicator.
4.2. Load interference and freshness testing
We carried out load interference and freshness tests of SF1, SF5, and SF10 on this database respectively. For the data of each table, we use the “WITH COLUMN GROUP (all columns, each column)” statement provided by the database to create a column-stored copy. The query optimizer autonomously selects whether to use column-storage according to the cost.
Figure 6 shows the test results at SF1. Figure 6a shows how transaction throughput changes with the number of analytical clients when the number of transactional clients is fixed at different values. The more transactional clients there are, the more sharply transaction throughput drops as the number of analytical clients increases; conversely, with few transactional clients, adding analytical clients has little impact on transaction throughput. Each line in Figure 6b fixes a different number of analytical clients while the number of transactional clients varies: the more analytical clients there are, the more pronounced the decline in query throughput, i.e., the greater the interference caused by transactional clients. Figure 6c shows the throughput frontier of the system under the current configuration, enclosed by the boundary points of query throughput and transaction throughput obtained from tests with clients in various proportions. The blue proportion line represents a linear trade-off, in which increasing the share of one type of client decreases the throughput of the other in direct proportion. If the throughput frontier lies below this proportion line, workload isolation is weak; the degree of isolation can be read from the relative area of the gray region under the frontier, with a larger area indicating stronger isolation.
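The isolation measure described above, the relative area under the normalized throughput frontier, can be computed from the boundary points as follows. This is a sketch; the normalization by each axis's maximum is an assumption.

```python
def isolation_area(frontier):
    """Area under a normalized throughput frontier via the trapezoid rule.
    `frontier` is a list of (txn_tput, query_tput) boundary points. Returns
    an area in [0, 1]: 0.5 corresponds to the linear trade-off (proportion)
    line, and a larger area indicates stronger workload isolation."""
    max_t = max(t for t, _ in frontier)
    max_q = max(q for _, q in frontier)
    pts = sorted((t / max_t, q / max_q) for t, q in frontier)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0   # one trapezoid slice
    return area
```

A straight-line frontier from (0, max) to (max, 0) yields exactly 0.5, while a frontier that stays high as the other workload scales up yields a larger area, i.e., better isolation.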
Figure 6. SF1 HTAP performance indicators
Meanwhile, we also ran tests at the SF5 and SF10 data scales, shown in Figures 7 and 8. The load isolation of the database does not change significantly. The suboptimal isolation stems from the database's natively distributed architecture, which contrasts with its single-node deployment in this experiment: the analytical and transactional clients share one set of resources, so severe resource contention is inevitable and isolation is poor in the single-machine case. At the same time, as the data volume grows, query throughput decreases to some extent. In the figures, f2, f5, and f8 denote the freshness of the database when the ratio of transactional to analytical clients is 2:8, 5:5, and 8:2, respectively. For SF1, SF5, and SF10 alike, freshness is 0; that is, the analytical clients always see the latest data. This follows from the design of OceanBase: the database adopts an LSM-tree storage architecture, and reads merge the queried data with the latest version on the fly, so the data read is always current. This approach maximizes data freshness and guarantees that all reads see the freshest data, but the extra overhead on reads costs the database some throughput.
Figure 7. SF5 HTAP performance indicators
Figure 8. SF10 HTAP performance indicators
4.3. Influence of resource proportions on database performance
This database supports resource isolation within tenants. By configuring cgroups and resource groups, it is convenient to control the resource usage of different users within a certain tenant. The resource limitations are mainly controlled by the following parameters:
MGMT_P1: Specifies the proportion of CPU committed to this resource group when the system is under full load.
UTILIZATION_LIMIT: Specifies the upper limit of CPU resources the resource group may use, restricting its maximum usage.
MIN_IOPS: The IOPS resources reserved for the resource group when there is I/O contention. The sum across groups must not exceed 100; the default value is 0.
MAX_IOPS: Specifies the maximum IOPS resources the resource group may use. The sum across groups can exceed 100; the default value is 100.
WEIGHT_IOPS: Sets the IOPS weight of the resource group; when there is no I/O resource contention, this parameter mainly determines the relative priority between resource groups.
We have set up two database users, tp_user and ap_user, who execute transactional workload and analytical workload respectively. The resources allocated to them are as follows:
ap_user: MGMT_P1: 80, UTILIZATION_LIMIT: 100, MIN_IOPS: 20, MAX_IOPS: 100, WEIGHT_IOPS: 20.
tp_user: MGMT_P1: 20, UTILIZATION_LIMIT: 100, MIN_IOPS: 80, WEIGHT_IOPS: 80.
Under this resource configuration, both ap_user and tp_user can in principle obtain the tenant's full resources; under contention, resources are allocated in proportion to the configured values. A second configuration is exactly the opposite. We call the two allocations Strategy One and Strategy Two, respectively. The measured load interference is shown in the figures. Comparing Figures 9 and 10 shows that Strategy One, which allocates more I/O resources to ap_user, has better load isolation. The main reason is that the analytical workload is relatively I/O-intensive: under contention, we allocated more I/O resources to ap_user and more CPU resources to tp_user.
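The proportional-sharing behavior under contention can be illustrated with a simplified water-filling model. This is our own sketch of the semantics of weight-style knobs such as MGMT_P1 and WEIGHT_IOPS, not the database's actual scheduler.

```python
def share_under_contention(demands, weights, capacity):
    """Split `capacity` among contending groups in proportion to their
    weights, never granting a group more than it demands; capacity left
    over by satisfied groups is redistributed to the rest."""
    alloc = {g: 0.0 for g in demands}
    active = set(demands)
    remaining = capacity
    while active and remaining > 1e-9:
        total_w = sum(weights[g] for g in active)
        granted = {g: remaining * weights[g] / total_w for g in active}
        done = set()
        for g in active:
            take = min(demands[g] - alloc[g], granted[g])
            alloc[g] += take
            if alloc[g] >= demands[g] - 1e-9:
                done.add(g)             # this group's demand is satisfied
        remaining = capacity - sum(alloc.values())
        if not done:
            break                       # every group is still hungry: stable split
        active -= done                  # redistribute leftover among the rest
    return alloc
```

With weights 20:80 and both users demanding the full capacity, the split is 20/80; if the lighter-weighted user demands little, its unused share flows to the other user, which matches the "proportional allocation under contention" behavior described above.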
Figure 9. AP_USER (MGMT_P1: 80, UTILIZATION_LIMIT: 100, MIN_IOPS: 20, MAX_IOPS: 100, WEIGHT_IOPS: 20)
Figure 10. AP_USER (MGMT_P1: 20, UTILIZATION_LIMIT: 100, MIN_IOPS: 80, MAX_IOPS: 100, WEIGHT_IOPS: 80)
In this article, a composite index is designed for the overall performance of the database, where ap_weight and tp_weight represent the user's preference for the analytical and transactional workloads. In real-world scenarios, a bank may run batch-processing jobs at night and handle more transactions during the day, so the weights may differ across situations. Each weight lies in the range 0-1, and the two must sum to at most 1. Using the resource allocations above, and increasing the numbers of transactional and analytical clients until the database reaches its maximum performance, with equal weights for the two workloads, Strategy One scores 55.0 and Strategy Two scores 61.0. The specific indicators are shown in Table 1. When the user considers ap and tp equally important, the allocation of Strategy Two therefore scores higher than that of Strategy One. This evaluation method can be used to assess the value of a resource allocation in HTAP databases and provides a more reasonable criterion for comparing resource configuration schemes.
Table 1. Performance indicators of the HTAP database with a weight ratio of 0.5:0.5
When ap_weight is adjusted to 0.8 and tp_weight to 0.2, the indicators are shown in Table 2. The composite indexes of resource configurations that favor transaction throughput, such as Strategy One and Strategy Two, decrease significantly, because the smaller tp_weight reduces the influence of transaction throughput on the index. Such an index would also guide a resource-scheduling model to allocate more resources to analytical users. By adjusting the weights, the performance of an HTAP system can be evaluated appropriately for different scenarios, and diverse scheduling models can be trained.
Table 2. Performance indicators of the HTAP database with a weight ratio of 0.8:0.2
Table 3. Comparison with relevant test benchmarks
5. Discussion and prospect
This paper compares the proposed FinTech Benchmark with existing HTAP test benchmarks and financial-scenario benchmarks in terms of workload, metrics, data, and schema, focusing on the HTAP performance of databases in financial scenarios under the premise of fully ensuring business security, and on the impact of resource isolation configurations on the overall performance of HTAP databases. Among existing benchmarks, DataBench-T, an OLTP benchmark for financial scenarios, focuses on transactional workloads to evaluate transaction performance and characteristics. DataBench-A, an OLAP benchmark for financial scenarios, predefines a typical financial test model that simulates real precision-marketing scenarios and focuses on the efficiency of complex queries over large data volumes, such as query response time and execution time, to measure how databases handle large-scale data and complex queries. The HATtrick benchmark for HTAP scenarios extends the SSB data schema based on retail business and innovatively proposes descriptions of database HTAP characteristics. The specific comparisons are shown in Table 3.
Although this benchmark completes the data schema, data generation, workload design, and composite index design of an HTAP test benchmark for the financial scenario, and the database testing has been finished, the following aspects can still be expanded and improved:
(1) Expand the transaction workload. This benchmark currently focuses on high-frequency banking transactions, specifically deposit, withdrawal, and fund transfer operations. More transactions in the financial and banking fields should be developed to enrich the complexity and authenticity of the workload.
(2) Optimize the test plan. Measuring throughput currently requires multiple test runs, which makes testing relatively time-consuming. New test plans can be explored to streamline the testing process.
(3) Adapt to multiple databases, test the performance of different databases under different architectures, and enhance the scalability of the test benchmark.
6. Conclusion
With the development and gradual adoption of HTAP systems, most current benchmarks still evaluate database systems against a single type of workload; research related to finance and banking is relatively scarce, and systematic HTAP evaluation and in-depth study of HTAP system characteristics are lacking. This paper proposes an HTAP benchmark based on banking transaction and analysis scenarios, and designs and implements the benchmark's data generation and test programs. The characteristics of HTAP are studied, including database performance at different data scales and changes in database isolation under different resource configurations. A composite index incorporating user-intention weights is designed, and under given weights, composite scores are compared across resource configurations at the same data scale.
Acknowledgments
No Acknowledgments.
Funding
This work was supported by National Natural Science Foundation of China (No. 62072113), and Shanghai Key Laboratory of Computer Software Testing and Evaluating (SSTL) Open Project (No. SSTL2023_04).
Conflicts of interest
The authors hereby declare that there are no conflicts of interest regarding the tested databases. This ensures that the research process and results are not interfered with by any interest-related relationships that may affect the fairness and objectivity, thereby enhancing the credibility and academic rigor of the paper.
Data availability statement
No data are associated with this article.
Author contribution statement
Weilong Wu: Primarily responsible for the design and implementation of the benchmark framework, including data schema development, test program coding, experiment execution, and manuscript drafting. Conducted data analysis and result interpretation. Hanbing Zhang: Contributed to manuscript revision and optimization, assisting in refining experimental designs and data analysis methods to enhance the research’s logical rigor. Yinan Jing: Provided academic guidance, oversaw the research direction, and participated in designing the benchmark’s technical framework and reviewing experimental protocols. Zhenying He, Kai Zhang, Xiaoyang Wang: Provided academic resources and support, including guidance on financial scenario modeling, system architecture suggestions, and security compliance reviews. Hongfeng Chai: Initiated the research vision and strategic planning, integrated fintech resources, and supervised the overall project progress to provide critical support.
References
- China Academy of Information and Communications Technology. Database Development Research Report 2021. Beijing, 2021.
- Li G, Zhou X and Sun J et al. openGauss: An autonomous database system. Proc VLDB Endow 2021; 14: 3028–3041.
- Pezzini M, Feinberg D and Rayner N et al. Hybrid transaction/analytical processing will foster opportunities for dramatic business innovation. Gartner 2014; 2014: 4–20.
- Huang D, Liu Q and Cui Q et al. TiDB: A raft-based HTAP database. Proc VLDB Endow 2020; 13: 3072–3084.
- MySQL HeatWave. Real-time analytics for MySQL Database Service, 2021.
- Larson PÅ, Birka A and Hanson EN et al. Real-time analytical processing with SQL Server. Proc VLDB Endow 2015; 8: 1740–1751.
- Sikka V, Färber F and Lehner W et al. Efficient transaction processing in SAP HANA database: The end of a column store myth. In: SIGMOD, 2012, 731–742.
- Li G and Zhang C. HTAP databases: What is new and what is next. In: Proceedings of the 2022 International Conference on Management of Data (SIGMOD '22). Association for Computing Machinery, New York, NY, USA, 2022, 2483–2488.
- Bitton D, DeWitt DJ and Turbyfill C. Benchmarking database systems: A systematic approach. Madison: University of Wisconsin-Madison, 1983.
- Leutenegger ST and Dias DM. A modeling study of the TPC-C benchmark. In: Proceedings of the 1993 ACM International Conference on Management of Data, Washington DC, 1993.
- Poess M and Floyd C. New TPC benchmarks for decision support and web commerce. ACM SIGMOD Record 2000; 29: 64–71.
- Murphy RC, Wheeler KB and Barrett BW et al. Introducing the Graph 500. Cray Users Group 2010; 19: 4574.
- Hao Y, Qin X and Chen Y et al. TS-Benchmark: A benchmark for time series databases. In: 37th IEEE International Conference on Data Engineering, Chania, 2021.
- Cao P, Gowda B and Lakshmi S et al. From BigBench to TPCx-BB: Standardization of a big data benchmark. In: 8th TPC Technology Conference, New Delhi, 2016.
- Funke F, Kemper A and Krompass S et al. Metrics for measuring the performance of the mixed workload CH-benCHmark. In: Third TPC Technology Conference, Seattle, 2011.
- Milkai E, Chronis Y and Gaffney KP et al. How good is my HTAP system? In: Proceedings of the 2022 International Conference on Management of Data (SIGMOD '22). Association for Computing Machinery, New York, NY, USA, 2022, 1810–1824.
- Jiang C, Tian J and Ma P. DataBench-T: A transactional database benchmark for financial scenarios. In: 2021 IEEE 20th International Conference on Trust, Security and Privacy in Computing and Communications, Shenyang, 2021.
- Jing Y, Zhang H and Li Z et al. Next-generation database benchmark for financial scenarios. Strategic Study of CAE 2022; 24: 121.

Weilong Wu is currently a master's student in the School of Computer Science, Fudan University, China. His current research interests include HTAP benchmarks and HTAP resource scheduling.

Hanbing Zhang received the Ph.D. degree in Software Engineering from Fudan University, China, in 2022. He is currently a postdoc research fellow in the School of Computer Science, Fudan University. His research interests include interactive data exploration, approximate query processing, and query optimization. He is a member of ACM and IEEE.

Yinan Jing received the Ph.D. degree in Computer Science from Fudan University, Shanghai, China, in 2007. He is currently an Associate Professor with the School of Computer Science at Fudan University. He was also a Visiting Scholar with the Department of Computer Science at the University of Southern California. His current research interests include big data analytics, database systems, spatial and temporal data management, and data security and privacy.

Zhenying He received the BS, MS, and Ph.D. degrees in computer science from the Harbin Institute of Technology, China, in 1998, 2000, and 2006, respectively. Currently, he is an associate professor in the School of Computer Science, Fudan University, China. His current research interests include keywords search on structured data, query processing on RDF data, and big data.

Kai Zhang received the Ph.D. degree from the University of Science and Technology of China in 2016. He is currently an associate professor with the School of Computer Science and Engineering, Fudan University, China. He was a research fellow with the National University of Singapore from 2016 to 2017. He was a visiting scholar with the Ohio State University from 2013 to 2015. His research interests are mainly in the fields of parallel and distributed computing and database systems.

Xiaoyang Wang received the Ph.D. degree in computer science from the University of Southern California. He is currently a distinguished professor with the School of Computer Science, Fudan University, China. Before joining Fudan University in 2011, he was the Dorothean Chair Professor in Computer Science with the University of Vermont. His research interests include data systems and data security. He is a fellow of CCF and a member of ACM.

Hongfeng Chai is currently a professor and the dean of the Institute of Financial Technology at Fudan University, China. Before joining Fudan University in 2021, he served as vice president of China Foreign Exchange Trade System and is currently a director of UnionPay and chairman and director of the National Engineering Laboratory of E-commerce and Electronic Payment. His research interests mainly focus on Financial Technology and Security. He is an academician of the Chinese Academy of Engineering.