Performance And Hardware Requirements

Introduction

The Knomi Voice Matcher service can scale both vertically and horizontally. It can scale up by utilizing more CPU cores, and an integrator can scale out by running the service on multiple servers; however, load balancing and enrolling to multiple servers must be handled by the integrator.

Methodology used for benchmarking

For benchmarking, we used three types of AWS instances: 4-core, 8-core, and 36-core. We ran the server and the HTTP client on the same host to get performance numbers as close to bare metal as possible; for example, this eliminates network round-trip time from the measurements. We used JMeter as the HTTP client for the testing, but other popular clients such as curl can be used as well.
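The same-host measurement setup can be sketched in a few lines of Python. This is a minimal illustration, not the actual JMeter test plan: the stub HTTP server below merely stands in for the Knomi Voice Matcher service, and the thread/request counts are placeholders. Because client and server share one host, network round-trip time is excluded from the latency numbers, as described above.

```python
import threading
import time
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

class StubHandler(BaseHTTPRequestHandler):
    """Placeholder for the Voice Matcher service; real endpoints and
    payloads (add, compare, verify, export) are not modeled here."""
    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"ok")

    def log_message(self, *args):  # silence per-request console logging
        pass

def benchmark(url, threads=4, requests_per_thread=5):
    """Fire requests from several threads; return (avg latency ms, requests/sec)."""
    latencies = []
    lock = threading.Lock()

    def worker():
        for _ in range(requests_per_thread):
            t0 = time.perf_counter()
            urllib.request.urlopen(url).read()
            with lock:
                latencies.append(time.perf_counter() - t0)

    start = time.perf_counter()
    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    elapsed = time.perf_counter() - start

    total = threads * requests_per_thread
    return 1000 * sum(latencies) / total, total / elapsed

# Run client and server in the same process to exclude network RTT.
server = ThreadingHTTPServer(("127.0.0.1", 0), StubHandler)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()
avg_ms, throughput = benchmark(f"http://127.0.0.1:{port}/")
server.shutdown()
```

In the real tests, JMeter plays the role of the `benchmark` function, with the thread count, ramp-up time, and loop count varied per run.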

Size         vCPU   Mem (GiB)
c5.xlarge       4           8
c5.2xlarge      8          16
c5.9xlarge     36          72

We measured two metrics: response time and throughput. Varying the number of threads, the ramp-up time, and the requests per thread (loop count) in JMeter, we recorded the average response time and throughput of each run. We report the minimum of those response times as the response time. For throughput, we decided that a response time up to 30-40% above that minimum is allowable, and report the best throughput observed among the runs within that budget.
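The selection rule above can be expressed compactly. This is a hedged sketch: each tuple is a hypothetical (average response time in ms, throughput per second) pair from one JMeter run, and the numbers are illustrative, not actual measurements.

```python
# Illustrative (avg response time ms, throughput /sec) pairs, one per run.
runs = [
    (129, 5.2),   # few threads: lowest latency, low throughput
    (140, 9.8),
    (165, 11.6),  # ~28% above minimum latency: within the 40% budget
    (240, 14.0),  # far past the budget: excluded
]

def select_metrics(runs, budget=0.40):
    """Report the minimum response time, and the best throughput among
    runs whose response time is within `budget` of that minimum."""
    min_rt = min(rt for rt, _ in runs)
    eligible = [tp for rt, tp in runs if rt <= min_rt * (1 + budget)]
    return min_rt, max(eligible)

min_rt, throughput = select_metrics(runs)
# min_rt is 129; the reported throughput is 11.6, not the 14.0 run
# whose latency blew past the budget.
```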

Benchmarking results with the latest version

Voice matcher response time (ms)

endpoint    4 core    8 core    36 core
add            129        88         75
compare        245       165        148
verify         148       109        101
export         140       106         98

Voice matcher throughput (requests/sec)

endpoint    4 core    8 core    36 core
add           11.6      26.1       76.7
compare        7.0      13.0       39.0
verify        11.4      25.5       69.6
export        13.0      25.5       69.0

How to calculate hardware requirements

For all tested endpoints - addupdate, compare, verify, and export - we found higher throughput and lower response time on AWS instances with more cores, so a user can choose the host server's hardware configuration accordingly.

Please note that while scaling up improves both response time and throughput as expected, the improvement is not exactly linear with the amount of hardware resource added.

From our tests, the 8-core instance (c5.2xlarge) showed almost 50% lower response time than the 4-core instance (c5.xlarge), and throughput increased by more than 100% (almost linearly). Hence, when multiple servers are needed to meet a throughput requirement, we recommend using as many 8-core servers as possible and covering the remainder with 4-core servers.

On an 8-core AWS instance, the addupdate, verify, and export endpoints can each handle ~1.5k transactions per minute, and the compare endpoint ~750 transactions per minute.

On a 4-core AWS instance, the addupdate, verify, and export endpoints can each handle ~700 transactions per minute, and the compare endpoint ~400 transactions per minute.

If, for example, 10k transactions per minute of type addupdate, verify, or export must be supported, then 10k / 1.5k = 6.66, so 6 instances of the 8-core machine, plus (10k - 6 × 1.5k) / 700 = 1.43, so 2 instances of the 4-core machine, should be able to support the requirement.

If, for example, 10k transactions per minute of type compare must be supported, then 10k / 750 = 13.33, so 13 instances of the 8-core machine, plus (10k - 13 × 750) / 400 = 0.625, so 1 instance of the 4-core machine, should be able to support the requirement.
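The sizing arithmetic in the two examples above can be wrapped in a small helper. The per-instance capacities below are the transactions-per-minute figures from this document; the function name and structure are illustrative, not part of the product.

```python
import math

# Per-instance capacity in transactions per minute, taken from this document.
CAPACITY_TPM = {
    "8core": {"addupdate": 1500, "verify": 1500, "export": 1500, "compare": 750},
    "4core": {"addupdate": 700,  "verify": 700,  "export": 700,  "compare": 400},
}

def size_cluster(tpm, endpoint):
    """Fill the requirement with 8-core instances first, then cover the
    remainder with 4-core instances. Returns (n_8core, n_4core)."""
    big = CAPACITY_TPM["8core"][endpoint]
    small = CAPACITY_TPM["4core"][endpoint]
    n_big = tpm // big                       # whole 8-core instances
    remainder = tpm - n_big * big
    n_small = math.ceil(remainder / small)   # 4-core instances for the rest
    return n_big, n_small

# 10k tpm of addupdate -> 6 eight-core + 2 four-core instances;
# 10k tpm of compare   -> 13 eight-core + 1 four-core instance.
```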