With the rapid growth of Internet services, large-scale, high-performance servers such as CC-NUMA multiprocessors are gaining importance in network computing. In a CC-NUMA multiprocessor, the key component connecting a computing node to the interconnection network is the node controller. Node controllers perform protocol processing to exchange messages with other nodes in the system. As new-generation CC-NUMA multiprocessors move toward application-specific protocol processing, a node controller requires powerful protocol processors, or protocol engines, that are flexible enough to process different kinds of protocols. In this paper, we study the design of a thread-based node controller, in which the protocol engines have a multithreaded architecture. Multithreading allows protocol processing of different requests to proceed in parallel, thereby reducing blocking and improving response time. Four important design parameters for a multithreaded protocol engine are examined: (1) the number of thread context storage units, (2) the number of protocol operation units, (3) the scheduling policy, and (4) the thread allocation scheme. From application-driven simulation of six representative applications, we conclude that the number of thread contexts and the number of protocol operation units both have a great impact on overall system performance. An appropriate thread allocation scheme for invalidation traffic is also needed, and assigning priorities to threads and scheduling them accordingly is important for system performance.