1. Introduction
In recent years, distributed and microservice systems have come to depend heavily on multilayered software stacks and external libraries. As dependencies grow, maintainability decreases and the risk of conflicts increases. In the MySQL [1] environment in particular, libmysqlclient is highly functional but has deep implementation dependencies: handling the replication protocol requires linking many static libraries from the server source tree (Table 2). This structure makes it difficult to use in lightweight or embedded environments.
In contrast, Trilogy [3], a lightweight MySQL client library published by GitHub, minimizes external dependencies and adopts a design optimized for asynchronous I/O. In this study, we extend Trilogy with a client-side implementation of the replication protocol (COM_BINLOG_DUMP = 0x12) and thereby realize a low-overhead inter-process communication (IPC) module between the MySQL server and applications. (The implementation has been published as a Trilogy pull request, commit hash a61c97a: https://github.com/trilogy-libraries/trilogy/pull/247.)
2. Background and Issues
2.1. Overview of the Replication Mechanism
MySQL Server replication is based primarily on the binary log (binlog). All transaction events are recorded and transferred via the Log_event class hierarchy inside the server. The standard client library (libmysqlclient) implements only SQL-level commands such as COM_QUERY and COM_PING, whereas the replication protocol (COM_BINLOG_DUMP) is implemented only inside the server.
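For reference, the MySQL Internals Manual [1] describes the COM_BINLOG_DUMP request as a single command packet: the command byte 0x12, a 4-byte binlog position, 2-byte flags, a 4-byte server ID and the binlog file name, preceded by the usual 4-byte packet header (3-byte little-endian payload length and 1-byte sequence ID). The C sketch below serializes such a request. build_binlog_dump is a hypothetical helper, not part of Trilogy or libmysqlclient, and the connection and session setup that precede the dump are omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Command byte for COM_BINLOG_DUMP (0x12); hypothetical constant name. */
#define SKETCH_COM_BINLOG_DUMP 0x12

/* Write the low `width` bytes of v into dst in little-endian order. */
static void put_le(uint8_t *dst, uint64_t v, size_t width) {
    for (size_t i = 0; i < width; i++) {
        dst[i] = (uint8_t)(v >> (8 * i));
    }
}

/*
 * Hypothetical helper: serialize one COM_BINLOG_DUMP request including the
 * 4-byte packet header. Payload: command(1) binlog_pos(4) flags(2)
 * server_id(4) filename(string[EOF]). Returns total bytes written, or 0 if
 * the buffer is too small.
 */
static size_t build_binlog_dump(uint8_t *out, size_t cap,
                                uint32_t binlog_pos, uint16_t flags,
                                uint32_t server_id, const char *filename) {
    assert(out != NULL && filename != NULL);
    size_t name_len = strlen(filename);
    size_t payload_len = 1 + 4 + 2 + 4 + name_len;
    if (payload_len > 0xFFFFFF || cap < 4 + payload_len) {
        return 0;
    }
    put_le(out, payload_len, 3);    /* 3-byte payload length */
    out[3] = 0;                     /* sequence ID: a new command starts at 0 */
    uint8_t *p = out + 4;
    *p++ = SKETCH_COM_BINLOG_DUMP;
    put_le(p, binlog_pos, 4); p += 4;
    put_le(p, flags, 2);      p += 2;
    put_le(p, server_id, 4);  p += 4;
    memcpy(p, filename, name_len);  /* string[EOF]: no terminator, no length prefix */
    return 4 + payload_len;
}
```

For example, requesting "binlog.000001" from position 4 as server ID 2 yields a 28-byte packet whose payload length field is 24.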
2.2. Dependent Source Files and Build Issues
Using the MySQL Server replication mechanism directly from an application requires depending on internal source files of the server itself, which are listed in Table 1. These source files are designed on the premise that they are used only inside mysqld, and building them also requires the static libraries listed in Table 2.
Consequently, linking libmysqlclient alone is not enough for compilation and linking to succeed, and the entire MySQL Server build environment must be reproduced. In addition, because these libraries are provided under the GPL-2 license, statically linking them into proprietary products is subject to licensing constraints.
2.3. Summary of Problems
The above structural issues are summarized as follows.
Build Complexity: The large number of dependencies makes the build environment difficult to reproduce.
Maintenance Difficulty: The internal ABI tends to change with MySQL version upgrades, which makes relinking difficult.
License Risk: GPL code must be linked, which is unsuitable for MIT/BSD-licensed environments.
2.4. Issues with libmysqlclient
To process MySQL Server replication events, internal libraries (libmysys.a, libmysql_binlog_event.a, libmysql_serialization.a, etc.) must be linked (see Table 2). These libraries are designed for use inside the server and are not intended for external use; as a result, the build configuration becomes complicated and ABI compatibility is not guaranteed.
Using them therefore entails operational risks in terms of both licensing and technology.
2.5. MySQL Dependency and System Design Constraints
libmysqlclient is distributed under GPL-2, so statically linking it into commercial systems creates license risks. In addition, because the set of dependent libraries is complex, differences between build environments readily lead to conflicts and compatibility problems. These issues can be critical in resource-constrained environments such as IoT and edge devices.
3. Proposed Method
3.1. Low-Dependency IPC Design Using Trilogy
In this study, we use the MIT-licensed lightweight MySQL client Trilogy as a foundation and extend it with replication protocol functionality to build a lightweight, loosely coupled IPC layer with minimal dependencies (Figure 1).
trilogy_binlog_dump() requests a binary log stream from the MySQL server.
trilogy_binlog_dump_recv() receives and parses event packets sequentially.
trilogy_parse_binlog_event_packet() parses the event header, and FORMAT_DESCRIPTION_EVENT, TABLE_MAP_EVENT, and WRITE_ROWS_EVENT are reconstructed in user space.
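The parser in the pull request is not reproduced here. As a hedged illustration of what event-header parsing involves, the sketch below decodes the common 19-byte binlog v4 event header (timestamp, type code, server ID, event size, next position, flags) that follows the 0x00 OK marker at the start of each event packet in the dump stream, as documented in [1]. The struct, constants and parse_event_packet are hypothetical names; they are not Trilogy's API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Event type codes named in the text (values from the binlog v4 format). */
enum {
    SKETCH_FORMAT_DESCRIPTION_EVENT = 15,
    SKETCH_TABLE_MAP_EVENT = 19,
    SKETCH_WRITE_ROWS_EVENT = 30     /* v2 row event used since MySQL 5.6 */
};

#define SKETCH_EVENT_HEADER_LEN 19

/* Fields of the common v4 event header. */
struct sketch_event_header {
    uint32_t timestamp;
    uint8_t  type_code;
    uint32_t server_id;
    uint32_t event_size;  /* header + body (+ checksum, if enabled) */
    uint32_t log_pos;     /* position of the next event */
    uint16_t flags;
};

static uint32_t get_le32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Hypothetical parser for one stream packet payload (packet header already
 * stripped). Returns 0 on success, or -1 for a non-event packet (e.g. the
 * 0xFE EOF or 0xFF ERR markers) or a truncated header.
 */
static int parse_event_packet(const uint8_t *payload, size_t len,
                              struct sketch_event_header *h) {
    assert(payload != NULL && h != NULL);
    if (len < 1 + SKETCH_EVENT_HEADER_LEN || payload[0] != 0x00) {
        return -1;
    }
    const uint8_t *p = payload + 1;  /* skip the OK marker */
    h->timestamp  = get_le32(p);
    h->type_code  = p[4];
    h->server_id  = get_le32(p + 5);
    h->event_size = get_le32(p + 9);
    h->log_pos    = get_le32(p + 13);
    h->flags      = (uint16_t)(p[17] | (p[18] << 8));
    return 0;
}
```

A caller would dispatch on type_code, using FORMAT_DESCRIPTION_EVENT and TABLE_MAP_EVENT to interpret the row images carried by subsequent WRITE_ROWS_EVENTs.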
Table 3. Dependencies of mysqlclient and Trilogy.
| Item | Conventional (libmysqlclient) | Proposed (Trilogy extension) |
|---|---|---|
| Number of dependent sources | 13 or more | 0 |
| Static link libraries | 7 | 1 (libtrilogy.a) |
| License | GPL-2 | MIT |
| Build time | about 30 minutes | a few seconds |
| Execution performance | High speed | High speed |
This method makes it possible to handle the replication protocol directly from the application layer without depending on the internal structure of MySQL Server.
3.2. Microservice Configuration
This IPC library functions as an intermediate layer between the control plane (ctrl-plane) and the data plane (data-plane) of a microservice architecture (Figure 2). Applications subscribe to the MySQL server's binary log through Trilogy and can transmit session information and commands asynchronously.
3.3. Comparison of Evaluation Results
In this section, we compare operational metrics for communication between the control plane and the data plane of the SFU (Selective Forwarding Unit) under the proposed method (Trilogy extension) and the conventional configuration (libmysqlclient). Both configurations were deployed in a Docker environment, and the CPU no-op (HALT) time during the IPC communication associated with SDP exchange processing was measured. The HALT value indicates the CPU processing capacity remaining in each processing cycle after the Edge-SFU has performed real-time packet processing; it comprehensively reflects I/O wait, packet throughput, the number of processes remaining in the loop, and so on. This indicator allows node load and the accuracy of autonomous control decisions under real operation to be evaluated quantitatively. Container size and build time were also measured to confirm the differences in overall build cost and runtime overhead. The evaluation results are shown in Table 4.
Compared with the libmysqlclient configuration, the Trilogy-based configuration showed significant improvements in build cost, number of dependent libraries, and container size. Because this study targets stable delivery for pub/sub use, we selected a simple INSERT-only workload that does not involve transactional consistency or multi-statement queries; as a result, the evaluation was dominated by scheduler behavior rather than being I/O-bound.
3.4. Context Switch Comparison
In this section, we examine the differences in I/O context management between the proposed method (Trilogy extension) and the conventional method (libmysqlclient) and analyze their effect on the number of context switches. Measurements were taken with pidstat -wt 1 and strace -fc -p <pid>; in addition, the call path of poll(2) was traced from symbols using gdb.
As shown in Figure 3, in the Trilogy implementation, the contexts marked A, B, and C exhibited stable context-switch cycles.
Point A: In SFU-0 (main thread) and SFU-2 (DB ingress context), the polling loop keeps its designed interval of 1 ms, and scheduling remained stable even under increased load.
Point B: Threads with the RTC / SCTP / JUICE prefixes are internal threads of libdatachannel [2]. When a PeerConnection is initialized, as many std::threads as there are CPU cores are created, and resident event-wait loops run in them.
Point C: The JUICE lower layer (UDP socket processing) uses a receive loop based on poll(2), and its context switches (CS) tended to increase in proportion to the number of sessions.
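How the 1 ms interval of SFU-0 and SFU-2 is implemented is not stated above. One common way to hold a fixed period on Linux, shown here only as an assumption-laden sketch, is to sleep until absolute deadlines with clock_nanosleep(2) and TIMER_ABSTIME, so that the processing time of each cycle does not accumulate as drift. run_periodic, its callback and PERIOD_NS are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <time.h>

#define PERIOD_NS 1000000L  /* 1 ms cycle, matching the interval in Point A */

/* Advance an absolute deadline by ns nanoseconds, keeping tv_nsec normalized. */
static void timespec_add_ns(struct timespec *t, long ns) {
    assert(t != NULL && ns >= 0 && ns < 1000000000L);
    t->tv_nsec += ns;
    if (t->tv_nsec >= 1000000000L) {
        t->tv_nsec -= 1000000000L;
        t->tv_sec += 1;
    }
}

/*
 * Hypothetical fixed-period loop: call step(arg) once per cycle, then sleep
 * until the next absolute deadline. If a step overruns, the next sleep returns
 * immediately. Returns 0 after `cycles` iterations, -1 on clock/sleep failure.
 */
static int run_periodic(int cycles, void (*step)(void *), void *arg) {
    struct timespec deadline;
    if (clock_gettime(CLOCK_MONOTONIC, &deadline) != 0) {
        return -1;
    }
    for (int i = 0; i < cycles; i++) {
        if (step != NULL) {
            step(arg);  /* e.g. drain ready sockets and queues */
        }
        timespec_add_ns(&deadline, PERIOD_NS);
        int rc;
        /* clock_nanosleep returns an error number rather than setting errno. */
        do {
            rc = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &deadline, NULL);
        } while (rc == EINTR);
        if (rc != 0) {
            return -1;
        }
    }
    return 0;
}
```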
Figure 3.
CS distribution for each SFU context by pidstat (Trilogy)
Figure 4.
CS distribution for each SFU context by pidstat (mysqlclient)
Figure 5.
CS distribution for each SFU context by pidstat (mysqlclient detail)
As shown in Figure 6, the main differences between the two implementations are the timeout argument passed to poll(2) (0 or -1), the number of calls, and whether ppoll(2) is used.
Point A/B: As shown in Figure 6, the difference between Trilogy and mysqlclient appears in lock waits and in the number of usleep(3) calls.
Point C/D: As shown in Figure 6, mysqlclient calls poll(pfd, 1, 0), whereas Trilogy specifies an infinite wait (timeout = -1) with poll(pfd, 1, -1) and waits for socket state changes while dedicating the thread context to that wait.
Listing 1. Socket wait implementation example in Trilogy
Trilogy's design does not aim to improve throughput through asynchronous I/O; rather, it aims to preserve the performance of the main context by reducing context switches. Waiting indefinitely in poll(2) lets the thread wait while keeping its CPU cache warm and suppresses unnecessary thread scheduling. In contrast, we confirmed that the libmysqlclient implementation (vio/viosocket.cc) passes 0 (non-blocking) as the poll(2) timeout, as shown in the following excerpt.
Listing 2. poll call in mysqlclient
The gdb analysis showed that the mysqlclient implementation frequently performs non-blocking checks with poll(0); because these are combined with ppoll(2), which involves signal-mask handling, the number of context switches tends to be larger than with Trilogy. The corresponding function is implemented in MySQL Connector/C++ (mysql/mysql-connector-cpp/blob/trunk/cdk/foundation/socket_detail.cc#L862), and on the path linked as libmysqlclient, timeout_usec=0 is passed as a fixed value (https://github.com/mysql/mysql-connector-cpp/blob/trunk/cdk/foundation/socket_detail.cc). In the internal implementation of MySQL Connector/C (libmysqlclient), the poll() timeout is fixed to 0 on the normal path, and the application is given no means of changing it. Although some paths specify poll(-1) or an arbitrary timeout during connection establishment and TLS processing, persistent stream processing such as COM_BINLOG_DUMP always uses poll(0). On the normal path, this should therefore be regarded as a "design policy difference" rather than a "default difference."
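The fixed zero-timeout pattern described above can be illustrated with a minimal poll(0)-style readiness probe. This is not code from libmysqlclient or Connector/C++; the function name and its EAGAIN convention are assumptions. Because poll() returns immediately, any waiting has to be done by the caller's user-space loop (for example, with retries and sleeps such as the usleep(3) calls noted in Figure 6).

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/*
 * Hypothetical non-blocking read in the poll(0) style: probe readiness with a
 * zero timeout and return at once. Returns the number of bytes read, or -1
 * with errno == EAGAIN when no data is ready yet (the caller must retry), or
 * -1 with the errno from poll/read on failure.
 */
static ssize_t try_read_nonblocking(int fd, void *buf, size_t len) {
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    assert(fd >= 0 && buf != NULL);
    int rc = poll(&pfd, 1, 0);  /* timeout 0: never sleeps in the kernel */
    if (rc == 0) {
        errno = EAGAIN;         /* not ready: control returns to the caller's loop */
        return -1;
    }
    if (rc < 0) {
        return -1;              /* includes EINTR; the caller may retry */
    }
    return read(fd, buf, len);
}
```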
3.5. Discussion
poll(0): the wait loop runs in user space (close to busy-waiting) → more context switches and non-linear spikes.
poll(-1): the kernel manages the sleep → stable RUNNABLE density and easier cooperation between the scheduler and IRQ handling.
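The poll(-1) strategy in the second bullet can be sketched in C as follows. The helper names are hypothetical and this is not Trilogy's source: the thread sleeps in the kernel until the descriptor becomes readable (or reports an error or hangup), and an interruption by a signal (EINTR) simply restarts the wait. A pipe works as well as a socket for trying it out.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/*
 * Hypothetical helper: sleep in the kernel until fd is readable (or reports
 * POLLERR/POLLHUP). With timeout -1 there is no periodic wakeup; the thread is
 * woken only by a state change on the descriptor or by a signal.
 * Returns 0 when fd is ready, -1 on error (errno is set).
 */
static int wait_readable_blocking(int fd) {
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    assert(fd >= 0);
    for (;;) {
        int rc = poll(&pfd, 1, -1);
        if (rc > 0) {
            return 0;
        }
        if (rc < 0 && errno == EINTR) {
            continue;  /* interrupted by a signal: wait again */
        }
        return -1;     /* rc == 0 cannot occur with timeout -1 */
    }
}

/* Block until data is available, then read it. */
static ssize_t read_blocking(int fd, void *buf, size_t len) {
    if (wait_readable_blocking(fd) != 0) {
        return -1;
    }
    return read(fd, buf, len);
}
```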
The reduction in context switches with Trilogy stems from a design that suppresses contention for CPU caches and memory bandwidth and keeps the performance of the application's main context consistent. This suggests that, in modern multi-core environments, suppressing unnecessary scheduling contributes more to performance than raising throughput through asynchronous I/O.
On the other hand, a design like libmysqlclient's, which uses short-timeout polling and therefore incurs frequent context switches, is effective in high-load, multi-connection environments: frequent scheduling spreads CPU usage and ensures fairness among waiting threads, which improves the balance between fairness and throughput in some use cases. In a single-IPC setting such as the one in this study, suppressing context switches through an infinite wait with poll(-1) helps maintain main-context performance, whereas in multi-connection environments the strategy of signal masking plus short-term polling may be advantageous.
4. Limitations and Future Work
This report focuses on the poll(0) and poll(-1) waiting strategies under the CFS (Completely Fair Scheduler). In recent years, however, io_uring (introduced in Linux 5.1) has made it possible to control kernel I/O asynchronously from user space. io_uring and kernel-bypass designs such as DPDK aim to minimize scheduler intervention, which differs from the "waiting optimization based on scheduling behavior" studied in this paper. They are therefore excluded from the comparison in this study.
5. Related Works
5.1. Microservice Design Patterns
Meijer et al. [4] experimentally evaluated the impact of microservice design patterns (Gateway Aggregation, etc.) on system performance and clarified bottleneck transitions and nonlinear behavior in CPU utilization. This study is complementary in that it presents micro-level optimization at the library layer (libmysqlclient vs. Trilogy), in contrast to macro-level optimization of design patterns.
5.2. Linux Scheduler’s Complex Load Balancing Algorithms
Lozi et al. [6] showed that the complex load-balancing algorithms of the Linux scheduler cause threads to wait unnecessarily in NUMA environments. The poll(-1) strategy in this study's Trilogy implementation is an application-layer approach that simplifies the scheduler's work by suppressing excessive CS.
5.3. poll, ppoll
In multi-core CPU environments, frequent CS causes the following kinds of performance degradation [5]: (1) higher L1/L2 cache miss rates due to loss of cache locality, (2) higher memory latency due to NUMA node migration, and (3) spinlock waiting due to kernel lock contention. These effects are particularly serious in IPC processing with high thread density.
5.4. Performance Characteristics
Reducing context switches, the focus of this study, is a particularly important issue in modern multi-core and many-core architectures. In multi-core CPU environments, frequent CS between threads is known to cause significant performance degradation due to the following factors.
Loss of cache locality: Each time a CS occurs, the next thread's working set must be brought into the CPU caches, evicting lines used by the previous thread. This raises the L1/L2 cache miss rate and causes significant memory access delays.
NUMA (Non-Uniform Memory Access) node migration: When a thread is scheduled onto a core in a different NUMA node, the memory it references may reside on a remote node, increasing memory latency.
Increased interrupt handling and kernel lock contention: With a high CS frequency, contention for kernel locks increases and CPU cycles are wasted spinning on spin locks.
These phenomena are particularly serious in IPC processing with high thread density and in microservice platforms that run many parallel event loops, and stabilization strategies based on CS reduction have been reported to be effective.
6. Conclusion
In modern many-core (≈64-core) environments, the characteristics of the Trilogy-type and mysqlclient-type approaches can be summarized as follows.
Trilogy-type is optimal for workloads that "pin CPUs to a small number of tasks" (e.g., low-latency IPC). It uses poll(-1) for long-term waiting, which avoids frequent wakeups by the kernel scheduler and suppresses unnecessary context switches. As a result, cache locality is maintained and effective CPU utilization is stabilized.
mysqlclient-type is optimal for workloads that "rotate CPUs through many tasks" (e.g., web servers, DB clients). It repeats short-cycle poll/ppoll calls, which makes it easier for the CFS scheduler to ensure fairness among tasks and can improve throughput in multi-connection environments.
Therefore, the better choice depends on the use case. When MySQL is used as IPC in combination with CPU pinning, the Trilogy-type is suitable (poll(-1) dominates the main loop). Conversely, in cases such as MySQL Connector and DB proxies, where CPU pinning is not used and a single process handles hundreds of connections, the CS-intensive (short poll/ppoll) strategy is considered more globally optimal.
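Where the conclusion pairs the Trilogy-type wait with CPU pinning, Linux provides sched_setaffinity(2) for this purpose (taskset(1) is the command-line equivalent). The minimal sketch below pins the calling thread to one CPU; the helper name is hypothetical, and choosing which core to dedicate to the IPC loop is left to the deployment.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>

/*
 * Hypothetical helper: pin the calling thread to a single CPU (Linux-specific).
 * sched_setaffinity() with pid 0 applies to the calling thread; if the thread
 * is currently on another CPU, the kernel migrates it. Returns 0 on success,
 * -1 with errno set (e.g. EINVAL if the CPU is not available to this process).
 */
static int pin_current_thread(int cpu) {
    cpu_set_t set;
    assert(cpu >= 0 && cpu < CPU_SETSIZE);
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set);
}
```

Pinning the thread that blocks in poll(-1) keeps its wakeups on the same core, which is the cache-locality argument made above.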
Listing 3. inc/trilogy/client.h
Listing 4. inc/trilogy/protocol.h
Listing 5. src/protocol.c
Listing 6. src/client.c
References
- Oracle Corporation: MySQL Internals Manual, 2024. Available online: https://dev.mysql.com/doc/internals/en/.
- P.-L. Ageneau: libdatachannel, GitHub repository. Available online: https://github.com/paullouisageneau/libdatachannel.
- GitHub: Trilogy: a client library for MySQL-compatible database servers, designed for performance, flexibility, and ease of embedding, 2023. Available online: https://github.com/trilogy-libraries/trilogy.
- W. Meijer, C. Trubiani, A. Aleti: "Experimental Evaluation of Architectural Software Performance Design Patterns in Microservices," Journal of Systems and Software, 2024.
- The Linux man-pages project: "poll(2), ppoll(2) - Wait for some event on a file descriptor," Linux Programmer's Manual. Available online: https://man7.org/linux/man-pages/man2/poll.2.
- J.-P. Lozi, B. Lepers, J. Funston, F. Gaud, V. Quéma, A. Fedorova: "The Linux Scheduler: a Decade of Wasted Cores," Proceedings of EuroSys 2016, ACM, pp. 1-16. Available online: https://people.ece.ubc.ca/sasha/papers/eurosys16-final29.pdf.
Disclaimer/Publisher's Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).