Netty-based application performance issues

I have a producer-consumer application built on Netty. The basic requirement was to build a message-oriented middleware (MOM).

MOM

So the MOM is based on the concept of queuing (queuing makes systems loosely coupled, which was the basic requirement of the application). The broker understands the MQTT protocol. We stress tested the application on our local machine. These are the specs of the local machine.
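For context, the broker's network layer looks roughly like the sketch below. This is a minimal outline, not our actual code: `BrokerHandler` is a hypothetical placeholder for our business logic, and it assumes the `MqttDecoder`/`MqttEncoder` codecs bundled with Netty 4.1.

```java
import java.net.InetSocketAddress;

import io.netty.bootstrap.ServerBootstrap;
import io.netty.channel.Channel;
import io.netty.channel.ChannelInitializer;
import io.netty.channel.EventLoopGroup;
import io.netty.channel.nio.NioEventLoopGroup;
import io.netty.channel.socket.SocketChannel;
import io.netty.channel.socket.nio.NioServerSocketChannel;
import io.netty.handler.codec.mqtt.MqttDecoder;
import io.netty.handler.codec.mqtt.MqttEncoder;

public final class MqttBrokerSketch {
    private final EventLoopGroup boss = new NioEventLoopGroup(1);
    private final EventLoopGroup workers = new NioEventLoopGroup(); // defaults to 2 * cores

    private Channel serverChannel;

    /** Binds the broker; port 0 asks the OS for any free port. */
    public void start(int port) throws InterruptedException {
        ServerBootstrap b = new ServerBootstrap()
                .group(boss, workers)
                .channel(NioServerSocketChannel.class)
                .childHandler(new ChannelInitializer<SocketChannel>() {
                    @Override
                    protected void initChannel(SocketChannel ch) {
                        // MQTT wire protocol codecs shipped with Netty 4.1
                        ch.pipeline().addLast("decoder", new MqttDecoder());
                        ch.pipeline().addLast("encoder", MqttEncoder.INSTANCE);
                        // ch.pipeline().addLast("broker", new BrokerHandler());
                        // ^ hypothetical handler holding the queuing/business logic
                    }
                });
        serverChannel = b.bind(port).sync().channel();
    }

    /** The port the broker actually bound to. */
    public int port() {
        return ((InetSocketAddress) serverChannel.localAddress()).getPort();
    }

    /** Closes the server channel and releases the event loops. */
    public void stop() {
        if (serverChannel != null) {
            serverChannel.close();
        }
        boss.shutdownGracefully();
        workers.shutdownGracefully();
    }
}
```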

(image: specs of the local machine)

We were getting great results. However, our production server is an AWS Ubuntu instance, so we stress tested the same application on the AWS Ubuntu server. Performance was roughly 10x worse than on the local machine. This is the configuration of the AWS server.

(image: configuration of the AWS server)

We have tried the following options to figure out where the issue is.

  • Initially we checked for bugs in our business logic. Did not find any.
  • Made the broker, client and all other dependencies identical on the Mac and on AWS. By "identical dependencies" I mean we installed the same versions on AWS as on the Mac.
  • Increased the ulimit on AWS.
  • Played with sysctl settings.
  • We were using Netty 4.1 and suspected it might be a Netty bug, since Netty 4.1 does not have a stable release yet. So we rebuilt the entire application on Netty 3.9.8.Final (stable) and still faced the same issue.
  • Increased the hardware configurations substantially of the AWS machine.
  • Now we have literally run out of options. The Java version is the same on both machines.

(image: Java version on both machines)

The last resort for us is to rebuild the entire application in NodeJS, but that would take far more effort than tweaking something in Netty itself. We are not looking for Java-based alternatives to Netty, because we suspect this might even be a bug in the JVM's native NIO implementations, which differ between Mac and Ubuntu.

What further options can we try to track this down? Is this an inherent Netty issue, or is it caused by differing internal implementations on Mac and Ubuntu that lead to the performance gap we are seeing?
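One concrete difference between the two platforms worth checking: on Linux, Netty 4.x can bypass the JDK NIO selector layer entirely via its native epoll transport (it needs the separate `netty-transport-native-epoll` artifact on the classpath). This is a hedged sketch of how one might pick the transport at runtime; it is not from the question, just an experiment that isolates whether the JDK NIO layer on Ubuntu is the bottleneck.

```java
import io.netty.bootstrap.ServerBootstrap;
import io.netty.channel.epoll.Epoll;
import io.netty.channel.epoll.EpollEventLoopGroup;
import io.netty.channel.epoll.EpollServerSocketChannel;
import io.netty.channel.nio.NioEventLoopGroup;
import io.netty.channel.socket.nio.NioServerSocketChannel;

public final class TransportChooser {
    /** Configures the bootstrap with epoll on Linux, NIO elsewhere. */
    public static ServerBootstrap configure(ServerBootstrap b) {
        if (Epoll.isAvailable()) {
            // Linux: native epoll transport, skips the JDK NIO selector layer
            b.group(new EpollEventLoopGroup(1), new EpollEventLoopGroup())
             .channel(EpollServerSocketChannel.class);
        } else {
            // Mac and others: portable JDK NIO transport
            b.group(new NioEventLoopGroup(1), new NioEventLoopGroup())
             .channel(NioServerSocketChannel.class);
        }
        return b;
    }
}
```

If throughput on Ubuntu improves markedly with epoll, that points at the JDK selector implementation rather than your application logic.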

EDIT

The stress testing parameters are as follows.

  • We had 1000 clients sending 1000 messages per second in total (a global rate).
  • We ran the test for about 10 minutes and recorded the latency.
  • On the server side we have 10 consumer threads handling the messages.
  • We create a new ChannelHandler instance per client.
  • For the boss pool and worker pool required by Netty, we used a cached thread pool.
  • We have tried tuning the number of consumer threads, but to no avail.
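Since an unbounded cached thread pool can behave very differently under load on different machines, one experiment is to bound the boss and worker pools explicitly. Below is a hedged sketch against the Netty 3.9 API we fell back to; the pool sizes are illustrative assumptions, not our production values.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.jboss.netty.bootstrap.ServerBootstrap;
import org.jboss.netty.channel.socket.nio.NioServerSocketChannelFactory;

public final class BoundedPools {
    /** Builds a Netty 3.x bootstrap with fixed-size boss/worker pools. */
    public static ServerBootstrap create() {
        int workerCount = Runtime.getRuntime().availableProcessors() * 2;
        // Fixed-size pools instead of Executors.newCachedThreadPool(),
        // so thread creation cannot balloon under a connection burst.
        ExecutorService bossPool = Executors.newFixedThreadPool(1);
        ExecutorService workerPool = Executors.newFixedThreadPool(workerCount);
        return new ServerBootstrap(
                new NioServerSocketChannelFactory(bossPool, workerPool, workerCount));
    }
}
```

Comparing latency with bounded versus cached pools on both machines would show whether thread churn contributes to the gap.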
EDIT 2

These are the profiler results reported by jvmtop during one phase of the load test.

(image: jvmtop profiler output)
