Performance And Hardware Requirements

Introduction

The Knomi Voice Matcher service can scale both vertically and horizontally. It can scale up by utilizing more CPU cores, and an integrator can scale out by running the service on multiple servers; however, load balancing and enrolling to multiple servers must be handled by the integrator.

Methodology used for benchmarking

For benchmarking, we used three types of AWS instances: 4-core, 8-core, and 36-core. We ran the server and the HTTP client on the same host to get performance numbers as close to bare metal as possible; for example, this eliminates network round-trip time from the measurements. We used JMeter as the HTTP client for the testing, but other popular clients such as curl can be used as well.
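The same-host measurement setup can be sketched in a few lines of Python. This is a minimal illustration, not the actual JMeter test plan: the stub HTTP server below merely stands in for the Knomi Voice Matcher service, and the thread/request counts are placeholders. Because client and server share one host, network round-trip time is excluded from the latency numbers, as described above.

```python
import threading
import time
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

class StubHandler(BaseHTTPRequestHandler):
    """Placeholder for the Voice Matcher service; real endpoints and
    payloads (add, compare, verify, export) are not modeled here."""
    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"ok")

    def log_message(self, *args):  # silence per-request console logging
        pass

def benchmark(url, threads=4, requests_per_thread=5):
    """Fire requests from several threads; return (avg latency ms, requests/sec)."""
    latencies = []
    lock = threading.Lock()

    def worker():
        for _ in range(requests_per_thread):
            t0 = time.perf_counter()
            urllib.request.urlopen(url).read()
            with lock:
                latencies.append(time.perf_counter() - t0)

    start = time.perf_counter()
    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    elapsed = time.perf_counter() - start

    total = threads * requests_per_thread
    return 1000 * sum(latencies) / total, total / elapsed

# Run client and server in the same process to exclude network RTT.
server = ThreadingHTTPServer(("127.0.0.1", 0), StubHandler)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()
avg_ms, throughput = benchmark(f"http://127.0.0.1:{port}/")
server.shutdown()
```

In the real tests, JMeter plays the role of the `benchmark` function, with the thread count, ramp-up time, and loop count varied per run.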

Size         vCPU   Mem (GiB)
c5.xlarge       4           8
c5.2xlarge      8          16
c5.9xlarge     36          72

We measured two metrics: response time and throughput. Varying the number of threads, the ramp-up time, and the requests per thread (loop count) in JMeter, we recorded the average response time and throughput of each run. We report the minimum of those response times as the response time. For throughput, we decided that a response time up to 30-40% above that minimum is allowable, and report the best throughput observed among the runs within that budget.
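The selection rule above can be expressed compactly. This is a hedged sketch: each tuple is a hypothetical (average response time in ms, throughput per second) pair from one JMeter run, and the numbers are illustrative, not actual measurements.

```python
# Illustrative (avg response time ms, throughput /sec) pairs, one per run.
runs = [
    (129, 5.2),   # few threads: lowest latency, low throughput
    (140, 9.8),
    (165, 11.6),  # ~28% above minimum latency: within the 40% budget
    (240, 14.0),  # far past the budget: excluded
]

def select_metrics(runs, budget=0.40):
    """Report the minimum response time, and the best throughput among
    runs whose response time is within `budget` of that minimum."""
    min_rt = min(rt for rt, _ in runs)
    eligible = [tp for rt, tp in runs if rt <= min_rt * (1 + budget)]
    return min_rt, max(eligible)

min_rt, throughput = select_metrics(runs)
# min_rt is 129; the reported throughput is 11.6, not the 14.0 run
# whose latency blew past the budget.
```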

Benchmarking results with the latest version

Voice matcher response time (ms)

endpoint    4 core    8 core    36 core
add            129        88         75
compare        245       165        148
verify         148       109        101
export         140       106         98

Voice matcher throughput (requests/sec)

endpoint    4 core    8 core    36 core
add           11.6      26.1       76.7
compare        7.0      13.0       39.0
verify        11.4      25.5       69.6
export        13.0      25.5       69.0

How to calculate hardware requirements

For all tested endpoints - addupdate, compare, verify, and export - we found higher throughput and lower response time on AWS instances with more cores, so a user can choose the host server's hardware configuration accordingly.

Please note that while scaling up improves both response time and throughput as expected, the improvement is not exactly linear with the amount of hardware resource added.

From our tests, the 8-core instance (c5.2xlarge) showed almost 50% lower response time than the 4-core instance (c5.xlarge), and throughput increased by more than 100% (almost linearly). Hence, when multiple servers are needed to meet a throughput requirement, we recommend using as many 8-core servers as possible and covering the remainder with 4-core servers.

On an 8-core AWS instance, the addupdate, verify, and export endpoints can each handle ~1.5k transactions per minute, and the compare endpoint ~750 transactions per minute.

On a 4-core AWS instance, the addupdate, verify, and export endpoints can each handle ~700 transactions per minute, and the compare endpoint ~400 transactions per minute.

If, for example, 10k transactions per minute of type addupdate, verify, or export must be supported, then 10k / 1.5k = 6.66, so 6 instances of the 8-core machine, plus (10k - 6 × 1.5k) / 700 = 1.43, so 2 instances of the 4-core machine, should be able to support the requirement.

If, for example, 10k transactions per minute of type compare must be supported, then 10k / 750 = 13.33, so 13 instances of the 8-core machine, plus (10k - 13 × 750) / 400 = 0.625, so 1 instance of the 4-core machine, should be able to support the requirement.
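The sizing arithmetic in the two examples above can be wrapped in a small helper. The per-instance capacities below are the transactions-per-minute figures from this document; the function name and structure are illustrative, not part of the product.

```python
import math

# Per-instance capacity in transactions per minute, taken from this document.
CAPACITY_TPM = {
    "8core": {"addupdate": 1500, "verify": 1500, "export": 1500, "compare": 750},
    "4core": {"addupdate": 700,  "verify": 700,  "export": 700,  "compare": 400},
}

def size_cluster(tpm, endpoint):
    """Fill the requirement with 8-core instances first, then cover the
    remainder with 4-core instances. Returns (n_8core, n_4core)."""
    big = CAPACITY_TPM["8core"][endpoint]
    small = CAPACITY_TPM["4core"][endpoint]
    n_big = tpm // big                       # whole 8-core instances
    remainder = tpm - n_big * big
    n_small = math.ceil(remainder / small)   # 4-core instances for the rest
    return n_big, n_small

# 10k tpm of addupdate -> 6 eight-core + 2 four-core instances;
# 10k tpm of compare   -> 13 eight-core + 1 four-core instance.
```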