Concretely efficient secure multi-party computation protocols: survey and more

Secure multi-party computation (MPC) allows a set of parties to jointly compute a function on their private inputs while revealing nothing but the output of the function. In the last decade, MPC has rapidly moved from a purely theoretical study to an object of practical interest, with growing interest in practical applications such as privacy-preserving machine learning (PPML). In this paper, we comprehensively survey existing work on concretely efficient MPC protocols with both semi-honest and malicious security, in both the dishonest-majority and honest-majority settings. We focus on the notion of security with abort, meaning that corrupted parties may prevent honest parties from receiving output after the corrupted parties themselves have received output. We present high-level ideas of the basic and key approaches for designing different styles of MPC protocols and the crucial building blocks of MPC. For MPC applications, we compare the known PPML protocols built on MPC, and describe the efficiency of private inference and training for the state-of-the-art PPML protocols. Furthermore, we summarize several challenges and open problems for breaking through the efficiency barriers of MPC protocols, as well as some interesting future work worth addressing. This survey aims to present the recent developments and key approaches of MPC to researchers who are interested in understanding, improving, and applying concretely efficient MPC protocols.


Introduction
Secure multi-party computation (MPC) allows a set of parties to jointly compute a function on their private inputs without revealing anything but the output of the function. Specifically, MPC allows $n$ parties to jointly compute the function $(y_1, \ldots, y_n) \leftarrow f(x_1, \ldots, x_n)$, where every party $P_i$ holds an input $x_i$, obtains an output $y_i$, and learns nothing beyond what is implied by $(x_i, y_i, f)$; the function $f$ is often modeled as a Boolean or arithmetic circuit. MPC is a fundamental primitive in cryptography, and is also a core technology for protecting data privacy in cooperative computing in the big-data era.

Our contribution
In this paper, we comprehensively survey the known work on concretely efficient MPC protocols with both semi-honest and malicious security. This survey covers not only the work in the dishonest-majority setting, but also the recent developments in the honest-majority setting. Our survey involves both secret-sharing-based MPC and garbled-circuit-based MPC. We present the basic approaches for designing concretely efficient MPC protocols, and give the high-level ideas of the key techniques underlying these protocols. In particular, we give a uniform framework and view of MPC protocols based on additive, Shamir, and replicated secret sharings. We also provide a high-level view of the recent approach, with sublinear communication, for generating correlated oblivious transfer (COT), which is a crucial building block for dishonest-majority MPC. In addition, we describe one important application of MPC (i.e., privacy-preserving machine learning, or PPML for short), summarize the known PPML protocols in terms of functionality, security, techniques and neural-network architectures, and discuss several key techniques for designing concretely efficient PPML protocols. Furthermore, we propose several challenges and interesting open problems for breaking through the efficiency barriers of differently flavored MPC protocols, and also present some future work that needs to be addressed for developing and deploying MPC.
Comparison with other MPC surveys. Recently, Lindell [88] presented a short MPC survey, which gives an overview of the security of MPC, a summary of MPC feasibility results, an honest-majority MPC framework based on Shamir secret sharing, two specific MPC protocols (i.e., PSI and threshold RSA), a very short overview of dishonest-majority MPC, and several application examples of MPC. That survey focuses on the notion and security of MPC and on how MPC is currently used, and does not cover the recent development of MPC protocols, which is addressed by our survey. Besides, Orsini [89] gave a survey of only SPDZ-style MPC protocols in the dishonest-majority malicious setting. Compared to these two surveys [88,89], our survey is more comprehensive, presents an important MPC application (i.e., PPML), and also discusses interesting future work for MPC.

Organization
In Section 2, we define the notation used in the main body, describe the security and communication models for MPC, and sketch several basic building blocks of MPC. We describe the development and the key techniques for MPC based on secret sharings (resp., garbled circuits) in Section 3 (resp., Section 4). We present the crucial building blocks for dishonest-majority MPC in Section 5. In Section 6, we summarize the protocols that apply MPC to realize private inference and training of machine learning, as well as several key techniques for PPML applications. Finally, we conclude this survey and propose open problems and future work for MPC.

Notation
We use $\kappa$ and $\rho$ to denote the computational and statistical security parameters, respectively. Known MPC implementations typically adopt $\kappa = 128$ and $\rho = 40$, and sometimes $\rho = 64$ or $\rho = 80$. For two integers $a, b$ with $a < b$, we denote by $[a, b]$ the set $\{a, \ldots, b\}$ and by $[a, b)$ the set $\{a, \ldots, b-1\}$. We use $x \leftarrow S$ to denote sampling $x$ uniformly at random from a set $S$, and $x \leftarrow \mathcal{D}$ to denote sampling $x$ according to a distribution $\mathcal{D}$. For a bit-string $x$, we use $\mathsf{lsb}(x)$ to denote the least significant bit of $x$. For a row vector $\mathbf{x}$, we use $x_i$ to denote the $i$-th component of $\mathbf{x}$, with $x_1$ the first entry, and denote by $\mathsf{HW}(\mathbf{x})$ the Hamming weight of $\mathbf{x}$ (i.e., the number of non-zero entries in $\mathbf{x}$). For two families of distributions $X = \{X_\kappa\}_{\kappa \in \mathbb{N}}$ and $Y = \{Y_\kappa\}_{\kappa \in \mathbb{N}}$, we write $X \stackrel{c}{\approx} Y$ if $X$ and $Y$ are computationally indistinguishable. We use $\mathbb{F}$ to denote a finite field and $\mathbb{K}$ to denote an extension field of $\mathbb{F}$. When we write arithmetic expressions involving both elements of $\mathbb{F}$ and elements of $\mathbb{K}$, values in $\mathbb{F}$ are viewed as polynomials in $\mathbb{K}$ with only constant terms. Specifically, we use $\mathbb{F}_{2^\kappa}$ to denote the degree-$\kappa$ extension field of $\mathbb{F}_2$. In the context of MPC for Boolean circuits, we often use $\{0,1\}^\kappa$, $\mathbb{F}_2^\kappa$ and $\mathbb{F}_{2^\kappa}$ interchangeably, so that addition in $\mathbb{F}_2^\kappa$ or $\mathbb{F}_{2^\kappa}$ corresponds to XOR in $\{0,1\}^\kappa$. Most known MPC protocols model a function as a circuit. Specifically, a circuit $C$ over an arbitrary field $\mathbb{F}$ is defined by a set of input wires and output wires along with a list of gates of the form $(\alpha, \beta, \gamma, T)$, where $\alpha, \beta$ are the indices of the input wires of the gate, $\gamma$ is the index of the output wire of the gate, and $T \in \{\mathsf{ADD}, \mathsf{MULT}\}$ is the type of the gate. If $\mathbb{F} = \mathbb{F}_2$, then $C$ is a Boolean circuit with $\mathsf{ADD} = \oplus$ and $\mathsf{MULT} = \wedge$. Otherwise, $C$ is an arithmetic circuit where $\mathsf{ADD}$/$\mathsf{MULT}$ corresponds to addition/multiplication in $\mathbb{F}$. We let $|C|$ denote the number of $\mathsf{MULT}$ gates in the circuit.
We use $n$ and $t$ to denote the total number of parties and the number of corrupted parties, respectively, unless otherwise specified. We also call $t$ the corruption threshold.

Security and communication models
Security model. All security properties of MPC can be formalized in the ideal/real paradigm [90,91], which provides a modular way to prove security. The real-world execution, where the parties interact with a semi-honest/malicious adversary $\mathcal{A}$ and execute a protocol $\Pi$, is compared to the ideal-world execution, where the parties interact with a simulator $\mathcal{S}$ and an ideal functionality $\mathcal{F}$. In the ideal world, $\mathcal{F}$ plays the role of an incorruptible trusted party and captures the security of MPC protocols. We use $P_1, \ldots, P_n$ to denote the $n$ parties running a protocol. In the multi-party setting (i.e., $n > 2$), we often consider a rushing adversary $\mathcal{A}$, meaning that $\mathcal{A}$ is allowed to receive the honest parties' messages in a round before it sends its own messages for that round. If the adversary is allowed to be computationally unbounded, the protocol is said to be information-theoretically secure. If the adversary is bounded to (non-uniform) probabilistic polynomial time (PPT), the protocol is said to be computationally secure.
Known MPC protocols mainly use two simulation-based security models: the stand-alone setting [91,92] and universal composability (UC) [90]. Compared to the stand-alone model, the UC model additionally involves an environment $\mathcal{Z}$, which can determine the inputs of honest parties and receive the outputs of honest parties. This may make security proofs of MPC protocols somewhat more complex. While the stand-alone model only guarantees security under sequential composition, the UC model guarantees that protocols remain secure even when run concurrently with other (possibly insecure) protocols. By the result of [93], any MPC protocol in the stand-alone model that is proven secure with a black-box straight-line simulator, and in which the inputs of all parties are fixed before the protocol execution begins (referred to as start synchronization or input availability), is also secure under concurrent composition.
Communication model. By default, parties in MPC communicate over authenticated channels, which can be implemented in practice using known standard techniques. In the multi-party setting, all parties are connected via point-to-point channels, and sometimes also need a broadcast channel. Since we only consider security with abort, a broadcast channel can be efficiently implemented using the standard 2-round echo-broadcast protocol [94]. The communication overhead of this broadcast protocol can be significantly reduced by letting all parties send hash values of the received messages, and aborting if an inconsistent hash value is detected. Sometimes, the parties need to communicate over a private channel, meaning that messages sent over the channel are kept confidential as well as authenticated. Such private channels can also be established using standard techniques.
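To make the hash-based optimization concrete, the following minimal Python sketch simulates only the echo round, in which every receiver forwards a SHA-256 digest of the message it got from the sender instead of the message itself. The function name `echo_broadcast` and the choice of SHA-256 are our own assumptions, and the simulation assumes that all receivers echo honestly; it only illustrates how an equivocating sender is caught, not the full protocol of [94].

```python
import hashlib


def echo_broadcast(delivered):
    """Simulate the echo round of a hash-optimized echo broadcast.

    delivered[j] is the message the (possibly malicious) sender delivered to
    receiver j. Each receiver echoes SHA-256(delivered[j]) to all others and
    outputs its message only if every echoed digest equals its own;
    otherwise it outputs "abort". Receivers are assumed to echo honestly.
    """
    digests = [hashlib.sha256(m).digest() for m in delivered]
    outputs = []
    for j, msg in enumerate(delivered):
        if all(d == digests[j] for d in digests):
            outputs.append(msg)
        else:
            outputs.append("abort")
    return outputs


# An honest sender: every receiver accepts the same message.
print(echo_broadcast([b"hello"] * 3))
# An equivocating sender: every receiver detects the mismatch and aborts.
print(echo_broadcast([b"hello", b"hello", b"bye"]))
```

Each echo costs a fixed 32 bytes regardless of the message length, which is where the communication saving comes from.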

Oblivious transfer and oblivious linear-function evaluation
Informally, oblivious transfer (OT) [95,96] involves two parties: a sender inputs two messages $(m_0, m_1)$ and a receiver inputs a choice bit $b$; the receiver obtains $m_b$, while $b$ is kept secret from the sender and $m_{1-b}$ is kept confidential from the receiver. OT extension protocols are used to generate a large number of OT correlations (see Section 5). Oblivious linear-function evaluation (OLE) is an arithmetic generalization of OT to a large field $\mathbb{F}$, and is a special case of oblivious polynomial evaluation introduced in [97]. Specifically, OLE is a two-party protocol between a sender and a receiver, where the sender inputs $u, v \in \mathbb{F}$, and the receiver inputs $x \in \mathbb{F}$ and obtains the output $y = u \cdot x + v \in \mathbb{F}$. We discuss the development of OT/OLE and their variants in Section 5.
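As a reference point, the following Python sketch writes down the input/output behaviour of the ideal OT and OLE functionalities described above, i.e., what a trusted party would compute; it is not a protocol and provides no security by itself. The prime modulus `P` is an arbitrary illustrative choice, and the function names are hypothetical. The last lines show a standard observation: with a uniform $v$, the OLE output gives the two parties additive shares of $u \cdot x$.

```python
import secrets

P = 2**61 - 1  # illustrative prime field F_P (a Mersenne prime)


def ideal_ot(m0: bytes, m1: bytes, b: int) -> bytes:
    """Ideal 1-out-of-2 OT: the receiver learns m_b; the sender learns nothing about b."""
    if b not in (0, 1):
        raise ValueError("choice bit must be 0 or 1")
    return m1 if b == 1 else m0


def ideal_ole(u: int, v: int, x: int) -> int:
    """Ideal OLE over F_P: the sender inputs (u, v); the receiver inputs x and learns u*x + v."""
    return (u * x + v) % P


# With a uniform v, the receiver's y and the sender's -v are additive shares of u*x.
u, v, x = secrets.randbelow(P), secrets.randbelow(P), secrets.randbelow(P)
y = ideal_ole(u, v, x)
assert (y - v) % P == (u * x) % P
```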

Commitment and coin-tossing
Commitment. In the dishonest-majority setting, MPC protocols often need a commitment scheme to commit to a value while keeping it secret (the hiding property), and later to open the value while guaranteeing that it is consistent with the committed one (the binding property). To achieve high efficiency, the commitment scheme is often constructed in the random-oracle model by defining $\mathsf{Commit}(x) = H(x, r)$, where $x$ is a message, $r$ is a random string, and $H : \{0,1\}^* \to \{0,1\}^{2\kappa}$ is a cryptographic hash function modeled as a random oracle.
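A minimal Python sketch of the random-oracle commitment $\mathsf{Commit}(x) = H(x, r)$ follows, assuming SHA-256 as a stand-in for the random oracle $H$ (its 256-bit output equals $2\kappa$ bits for $\kappa = 128$) and $\kappa$-bit randomness $r$. The helper names `commit` and `verify_opening` are hypothetical. Length-prefixing $x$ is a design choice that makes the encoding of the pair $(x, r)$ unambiguous.

```python
import hashlib
import hmac
import secrets

KAPPA = 128  # computational security parameter in bits


def _h(x: bytes, r: bytes) -> bytes:
    # H(x, r) instantiated with SHA-256 (2 * kappa = 256 output bits).
    # The 8-byte length prefix makes the encoding of (x, r) unambiguous.
    return hashlib.sha256(len(x).to_bytes(8, "big") + x + r).digest()


def commit(x: bytes):
    """Return (commitment, opening randomness) for message x."""
    r = secrets.token_bytes(KAPPA // 8)
    return _h(x, r), r


def verify_opening(c: bytes, x: bytes, r: bytes) -> bool:
    """Check that (x, r) opens commitment c."""
    return hmac.compare_digest(_h(x, r), c)


c, r = commit(b"bid: 42")
assert verify_opening(c, b"bid: 42", r)
assert not verify_opening(c, b"bid: 43", r)
```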
Coin-tossing. Many MPC protocols with malicious security require the parties to generate public random values by executing a coin-tossing protocol, while guaranteeing that malicious parties can neither control the randomness nor make it deviate from the uniform distribution. In the dishonest-majority setting, the coin-tossing protocol works as follows: (1) every party $P_i$ commits to a random seed $s_i$ and, after all commitments have been received, opens its seed; (2) every party computes $s := \sum_{i \in [n]} s_i$ and then generates the public randomness from $s$ using a pseudorandom generator (PRG) modeled as a random oracle. In the honest-majority setting, the coin-tossing protocol is constructed in a totally different way. Specifically, all parties generate multiple random sharings and then open them as the public randomness, where random sharings can be generated very efficiently when a majority of parties are honest.
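The order of the two steps matters: all commitments must be received before any seed is opened, so that no party can choose its seed after seeing the others. The sketch below shows the verification and expansion part of the dishonest-majority coin toss under stated assumptions: SHA-256 commitments, 128-bit seeds combined by XOR (addition in $\mathbb{F}_2^\kappa$), and SHAKE-256 standing in for the PRG modeled as a random oracle. The function name `public_randomness` is hypothetical.

```python
import hashlib
import hmac
import secrets

SEED_LEN = 16  # kappa = 128-bit seeds


def commit(seed: bytes, r: bytes) -> bytes:
    # seed has a fixed length, so seed + r encodes the pair unambiguously.
    return hashlib.sha256(seed + r).digest()


def public_randomness(commitments, openings, num_bytes: int) -> bytes:
    """Step (2): commitments[i] was broadcast by P_i before any opening;
    openings[i] = (s_i, r_i) is what P_i reveals afterwards."""
    if len(commitments) != len(openings):
        raise ValueError("one opening per commitment is required")
    s = bytes(SEED_LEN)
    for c, (seed, r) in zip(commitments, openings):
        if len(seed) != SEED_LEN or not hmac.compare_digest(commit(seed, r), c):
            raise RuntimeError("abort: invalid opening")
        s = bytes(a ^ b for a, b in zip(s, seed))  # s := s_1 + ... + s_n over F_2^kappa
    return hashlib.shake_256(s).digest(num_bytes)  # PRG expansion of the joint seed


# Usage with n = 3 honest parties.
n = 3
seeds = [secrets.token_bytes(SEED_LEN) for _ in range(n)]
rands = [secrets.token_bytes(SEED_LEN) for _ in range(n)]
coms = [commit(s, r) for s, r in zip(seeds, rands)]       # phase 1: commit
coins = public_randomness(coms, list(zip(seeds, rands)), 32)  # phase 2: open and expand
```

As usual for security with abort, a malicious party can still refuse to open after seeing the others' openings; it cannot, however, bias the output it does not abort on.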

MPC protocols based on secret sharings
Based on the secret-sharing approach, concretely efficient MPC protocols let the parties send small messages per non-linear gate, but require a number of rounds linear in the depth of the circuit being computed. Currently, concretely efficient MPC protocols mainly adopt three kinds of linear secret-sharing schemes (LSSSs): additive secret sharing, Shamir secret sharing [98], and replicated secret sharing (a.k.a. CNF secret sharing) [99,100]. Additive secret sharing is mainly used for MPC protocols in the dishonest-majority setting, while Shamir and replicated secret sharings are adopted for honest-majority MPC protocols. We first recall the constructions of these LSSSs in a uniform view. To achieve malicious security, additive secret sharing needs to be equipped with information-theoretic message authentication codes (IT-MACs), and thus we define two types of IT-MACs used in MPC with dishonest majority. Note that IT-MACs are unnecessary for Shamir/replicated secret sharing in the honest-majority setting. Then, based on the LSSSs, we describe how to construct semi-honest MPC protocols with a uniform structure. Finally, we present how to transform semi-honest MPC protocols into maliciously secure ones using state-of-the-art checking techniques.

Linear secret-sharing schemes
All three kinds of LSSSs used in MPC are $(n, t)$-threshold secret-sharing schemes, which enable $n$ parties to share a secret $x$ such that no subset of $t$ parties can obtain any information about $x$, while any subset of $t + 1$ parties can reconstruct $x$. Additive secret sharing only supports $t = n - 1$, while Shamir/replicated secret sharing allows any $t < n$ (we often adopt $t < n/2$ for honest-majority MPC). The three kinds of LSSSs are defined over a field $\mathbb{F}$. While additive/replicated secret sharing works over a field of any size (including $\mathbb{F}_2$), Shamir secret sharing requires $|\mathbb{F}| > n$. Below, we describe the constructions of these LSSSs and the procedures useful for designing MPC protocols.
• Share(x): For $x \in \mathbb{F}$, a dealer generates $n$ shares $x_1, \ldots, x_n$. We denote by $[x]$ the sharing of $x$.
(a) Additive secret sharing: This is the simplest LSSS. To share a value $x \in \mathbb{F}$, the dealer samples $x_i \leftarrow \mathbb{F}$ for $i \in [1, n]$ such that $\sum_{i \in [1,n]} x_i = x$, and sends $x_i$ to $P_i$.
(b) Shamir secret sharing: Let $\alpha_1, \ldots, \alpha_n \in \mathbb{F}$ be $n$ distinct non-zero elements (e.g., $\alpha_i = i$ for $i \in [1, n]$). The dealer samples a random polynomial $f(X)$ of degree $t$ over $\mathbb{F}$ such that $f(0) = x$, and then sends $x_i = f(\alpha_i)$ to $P_i$. Shamir secret sharing is mainly used for a large $n$ in honest-majority MPC protocols.
(c) Replicated secret sharing: Let $\mathcal{T}$ consist of all sets of $t$ parties. The dealer samples $x_T \leftarrow \mathbb{F}$ for all $T \in \mathcal{T}$ such that $\sum_{T \in \mathcal{T}} x_T = x$. Every party $P_i$ obtains the shares $x_T$ for all $T \in \mathcal{T}$ with $i \notin T$. In general, the total number of shares is $\binom{n}{t}$ and every party stores $\binom{n-1}{t}$ shares, which can become very large as $n$ and $t$ grow. Therefore, replicated secret sharing is mainly used when $n$ is small. For example, if $n = 3$ and $t = 1$, then the dealer samples $x_1, x_2, x_3 \leftarrow \mathbb{F}$ such that $x_1 + x_2 + x_3 = x$, and then sends $(x_2, x_3)$ to $P_1$, $(x_1, x_3)$ to $P_2$ and $(x_1, x_2)$ to $P_3$.
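The three Share procedures can be sketched in Python with a single trusted dealer, over an illustrative prime field $\mathbb{F}_P$ with $P = 2^{61}-1$ (any field with $|\mathbb{F}| > n$ would do for Shamir). Parties are indexed from 1, Shamir sharing uses $\alpha_i = i$, and all function names are our own.

```python
import secrets
from itertools import combinations

P = 2**61 - 1  # illustrative prime; Shamir sharing needs |F| > n


def rand_elem() -> int:
    return secrets.randbelow(P)


def share_additive(x: int, n: int) -> list:
    """Additive sharing: n - 1 uniform shares, the last one fixes the sum to x."""
    xs = [rand_elem() for _ in range(n - 1)]
    xs.append((x - sum(xs)) % P)
    return xs


def share_shamir(x: int, n: int, t: int) -> list:
    """Degree-t Shamir sharing with alpha_i = i; returns [f(1), ..., f(n)]."""
    coeffs = [x % P] + [rand_elem() for _ in range(t)]  # f(0) = x
    shares = []
    for a in range(1, n + 1):
        acc = 0
        for c in reversed(coeffs):  # Horner evaluation of f(a)
            acc = (acc * a + c) % P
        shares.append(acc)
    return shares


def share_replicated(x: int, n: int, t: int) -> dict:
    """Replicated (CNF) sharing: one additive share x_T per t-subset T;
    party i receives every x_T with i not in T."""
    family = list(combinations(range(1, n + 1), t))
    x_T = dict(zip(family, share_additive(x, len(family))))
    return {i: {T: v for T, v in x_T.items() if i not in T} for i in range(1, n + 1)}


# (n, t) = (3, 1): P_1 holds the shares indexed by T = {2} and T = {3}.
rep = share_replicated(7, 3, 1)
assert sorted(rep[1]) == [(2,), (3,)]
```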
Note that the shares of replicated/Shamir secret sharing need to be sent over a private channel, whereas this may not be necessary for additive secret sharing in most cases. In MPC protocols, either a party $P_i$ acts as the dealer if it knows the secret $x$, or the dealer's computation is performed distributedly by a protocol if no one knows $x$. In the computational setting, we can use the pseudo-random secret sharing (PRSS) approach [99] to reduce the communication cost of distributing the shares of Shamir and replicated sharings. For example, to generate a degree-$t$ Shamir sharing $[x]_t$, the dealer can send a random seed $s_i$ to $P_i$, who computes $x_i$ from $s_i$ with a pseudo-random generator PRG, for each $i \in [1, t]$. Then, the dealer sends $x_{t+1}, \ldots, x_n$ to $P_{t+1}, \ldots, P_n$, respectively, such that $(x_1, \ldots, x_{t+1})$ defines a degree-$t$ polynomial $f$ with $f(0) = x$ and $x_i = f(\alpha_i)$ for $i \in [t+2, n]$. Since the seeds $s_1, \ldots, s_t$ can be reused for multiple Shamir sharings, the communication for sending the seeds is amortized and can be ignored. Besides, based on the PRSS technique [99], a random replicated sharing can be converted into a random Shamir sharing with only the small communication needed to distribute the seeds. This technique is only suitable for a small number of parties, as the number of random seeds is $\binom{n}{t}$, which grows exponentially with the number of parties.
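The PRSS idea for Shamir sharing can be sketched as follows. Parties $P_1, \ldots, P_t$ derive their shares locally from reusable seeds, and the dealer computes the remaining shares from the unique degree-$t$ polynomial through $(0, x)$ and these $t$ pseudo-random points, which is the same polynomial as the one defined by $(x_1, \ldots, x_{t+1})$ above. Assumptions: SHAKE-256 as the PRG, a per-sharing label as the identifier that lets seeds be reused, $\alpha_i = i$, and hypothetical function names.

```python
import hashlib
import secrets

P = 2**61 - 1  # illustrative prime field


def prg_elem(seed: bytes, label: bytes) -> int:
    # PRG output reduced mod P; 128 bits mod a 61-bit prime has negligible bias.
    return int.from_bytes(hashlib.shake_256(seed + label).digest(16), "big") % P


def lagrange_eval(points, z: int) -> int:
    """Evaluate at z the unique polynomial of degree < len(points) through points."""
    total = 0
    for j, (xj, yj) in enumerate(points):
        num, den = 1, 1
        for k, (xk, _) in enumerate(points):
            if k != j:
                num = num * (z - xk) % P
                den = den * (xj - xk) % P
        total = (total + yj * num * pow(den, -1, P)) % P
    return total


def prss_shamir(x: int, n: int, t: int, seeds, label: bytes):
    """Return (shares P_1..P_t derive from their seeds, shares sent to P_{t+1}..P_n)."""
    assert len(seeds) == t
    derived = [prg_elem(seeds[i], label) for i in range(t)]          # alpha_i = i + 1
    points = [(0, x % P)] + [(i + 1, derived[i]) for i in range(t)]  # fixes f with f(0) = x
    sent = [lagrange_eval(points, a) for a in range(t + 1, n + 1)]
    return derived, sent


# Seeds are distributed once and reused with a fresh label for every sharing.
seeds = [secrets.token_bytes(16) for _ in range(2)]
derived, sent = prss_shamir(42, 5, 2, seeds, b"sharing-0")
```

Only the $n - t$ explicitly sent shares cost communication per sharing; the $t$ derived shares are free once the seeds are in place.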
• Reconstruct([x], i): This procedure enables only $P_i$ to obtain the secret $x$. When any $t+1$ shares of $[x]$ are sent to $P_i$ over a private channel, $P_i$ can reconstruct $x$ as follows:
(a) Additive secret sharing: Given $\{x_j\}_{j \neq i}$ from all other parties, $P_i$ computes $x := \sum_{j \in [1,n]} x_j$.
(b) Shamir secret sharing: The secret $x$ can be reconstructed using Lagrange interpolation. Without loss of generality, assume that $P_i$ gets the shares $x_1, \ldots, x_{t+1}$. Then $P_i$ computes $x := \sum_{j \in [1,t+1]} x_j \cdot \prod_{k \in [1,t+1], k \neq j} \frac{\alpha_k}{\alpha_k - \alpha_j}$.
(c) Replicated secret sharing: $P_i$ receives from the other parties the shares $x_T$ for all $T \in \mathcal{T}$ with $i \in T$, and computes $x := \sum_{T \in \mathcal{T}} x_T$. Specifically, for $(n, t) = (3, 1)$, after receiving $x_i$ from one party, $P_i$ computes $x := x_1 + x_2 + x_3$.
• Open([x]): This procedure allows all parties to learn $x$, and is easily realized by executing Reconstruct([x], i) for all $i \in [1, n]$. In particular, when $n = 3$ and $t = 1$, every party $P_i$ only needs to send its share $x_{i+1}$ to $P_{i+1}$ (indices are taken modulo 3).
The LSSSs defined above guarantee perfect privacy in the presence of malicious adversaries, but only provide correctness against semi-honest adversaries. To achieve malicious security for the case of $t < n/2$, we modify the Reconstruct procedure as follows:
(a) Shamir secret sharing: All parties send their shares to $P_i$, and $P_i$ interpolates the degree-$t$ polynomial $f(X)$ from $x_1, \ldots, x_{t+1}$. Then, for $j \in [1, n] \setminus [1, t+1]$, $P_i$ computes $f(\alpha_j)$ and checks that $x_j = f(\alpha_j)$. If the check fails, $P_i$ aborts; otherwise, it computes $x := f(0)$. If the check passes, then all shares lie on a unique polynomial $f(X)$ of degree $t$. Since $t < n/2$, at least $t + 1$ of the $n$ shares come from honest parties and are correct, and these shares uniquely determine $f(X)$. Therefore, if $P_i$ does not abort, the secret $x$ it reconstructs is correct. If multiple secret values need to be reconstructed, the communication can be reduced using a cryptographic hash function. In particular, $P_j$ for all $j \in [1, n] \setminus [1, t+1]$ send only the hash values of their shares to $P_i$, and then $P_i$ computes these shares by polynomial interpolation and checks whether the received hash values are correct.
(b) Replicated secret sharing: For the sake of simplicity, we focus on the case where at most one of three parties may be corrupted (i.e., $n = 3$ and $t = 1$); other honest-majority cases can be handled similarly. Both $P_{i-1}$ and $P_{i+1}$ hold $x_i$ and send it to $P_i$; denote the received values by $x_i^{(i-1)}$ and $x_i^{(i+1)}$. $P_i$ checks that $x_i^{(i-1)} = x_i^{(i+1)}$. If the check fails, $P_i$ aborts; otherwise, it defines $x_i := x_i^{(i-1)}$ and computes $x := x_1 + x_2 + x_3$. Since either $P_{i-1}$ or $P_{i+1}$ is honest, the equality check guarantees that the share $x_i$ is correct, and thus the secret $x$ reconstructed by $P_i$ is correct. When $\ell$ secrets $[y_1], \ldots, [y_\ell]$ need to be reconstructed, $P_{i-1}$ sends its shares $y_{1,i}, \ldots, y_{\ell,i}$ to $P_i$, while $P_{i+1}$ only sends the hash value $H(y_{1,i}, \ldots, y_{\ell,i})$ computed from its own copies, and $P_i$ checks that the hash of the received shares equals this value. If $P_{i-1}$ is honest, then $P_i$ clearly obtains the correct secrets. Otherwise (i.e., $P_{i+1}$ is honest), if some share $y_{h,i}$ with $h \in [1, \ell]$ received from $P_{i-1}$ differs from the correct one, then $P_i$ aborts except with probability $\mathsf{negl}(\kappa)$, by the second-preimage resistance of $H$.
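The maliciously secure Shamir reconstruction in (a) can be sketched in Python as follows, assuming the illustrative field $\mathbb{F}_P$ with $P = 2^{61}-1$ and $\alpha_j = j$; `reconstruct_checked` is a hypothetical name, and the hash-based optimization for many secrets is omitted.

```python
P = 2**61 - 1  # illustrative prime field, alpha_j = j


def lagrange_eval(points, z: int) -> int:
    """Evaluate at z the unique polynomial of degree < len(points) through points."""
    total = 0
    for j, (xj, yj) in enumerate(points):
        num, den = 1, 1
        for k, (xk, _) in enumerate(points):
            if k != j:
                num = num * (z - xk) % P
                den = den * (xj - xk) % P
        total = (total + yj * num * pow(den, -1, P)) % P
    return total


def reconstruct_checked(shares, t: int) -> int:
    """shares[j - 1] is the share received from P_j. Interpolate f from the first
    t + 1 shares, check all remaining shares against f, and abort on any mismatch."""
    n = len(shares)
    if n < 2 * t + 1:
        raise ValueError("requires an honest majority, n >= 2t + 1")
    base = [(j, shares[j - 1] % P) for j in range(1, t + 2)]
    for j in range(t + 2, n + 1):
        if lagrange_eval(base, j) != shares[j - 1] % P:
            raise RuntimeError("abort: shares do not lie on one degree-t polynomial")
    return lagrange_eval(base, 0)


# f(X) = 5 + 3X, so the honest shares for n = 3 are f(1), f(2), f(3) = 8, 11, 14.
assert reconstruct_checked([8, 11, 14], t=1) == 5
```

Tampering with any single share, including one of the first $t+1$ used for interpolation, makes the honest shares disagree with the interpolated polynomial, so the check aborts.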
For additive secret sharing, we need to equip it with IT-MACs to guarantee the security of procedures Reconstruct and Open in the presence of malicious adversaries, which is shown in the next subsection.

Information-theoretic message authentication codes
In the dishonest-majority setting, MPC protocols can use additive secret sharing to evaluate the circuit privately [4,102]. This is sufficient for semi-honest security. In the malicious setting, however, IT-MACs need to be introduced to guarantee the correctness of secret values [103,104]. Currently, two styles of IT-MACs are used in MPC protocols: BDOZ-style [105] and SPDZ-style [103]. While the original IT-MACs are defined over a single large field, they are easily extended so that values lie in a field $\mathbb{F}$ of any size while authentication is done over a large extension field $\mathbb{K}$, as follows:
• BDOZ-style IT-MACs [105]: Sample a global key (a.k.a. MAC key) $\Delta \leftarrow \mathbb{K}$. For a message $x \in \mathbb{F}$, sample a local key $K \leftarrow \mathbb{K}$, and define a MAC on $x$ as $M = K + x \cdot \Delta$, where $(x, M)$ is held by a party $P_A$ and $(K, \Delta)$ is held by the other party $P_B$. If a malicious $P_A$ forges a MAC $M'$ on a message $x' \neq x$, i.e., $M' = K + x' \cdot \Delta$, then it can compute $\Delta = (M' - M) \cdot (x' - x)^{-1}$, which happens with probability at most $1/|\mathbb{K}|$ since $\Delta$ is perfectly hidden from $P_A$.
• SPDZ-style IT-MACs [103]: Sample a global key $\Delta \leftarrow \mathbb{K}$. For a message $x \in \mathbb{F}$, the MAC on $x$ is defined as $M = x \cdot \Delta$. Note that every party holds additive shares of $\Delta$ and $M$, and no party knows the key $\Delta$ or the MAC $M$ (see below for details). The security analysis is similar to that of BDOZ-style IT-MACs.
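The following Python sketch illustrates BDOZ-style IT-MACs in the special case $\mathbb{F} = \mathbb{K} = \mathbb{F}_P$ with $P = 2^{61}-1$, an assumption made for brevity (with a small $\mathbb{F}$ the MAC would live in an extension field). The function names are hypothetical. It also shows the additive homomorphism used later: adding keys and MACs yields a valid MAC on the sum under the same global key.

```python
import secrets

P = 2**61 - 1  # here F = K = F_P for simplicity


def authenticate(x: int, delta: int):
    """P_B holds (K, delta); P_A holds (x, M) with M = K + x * delta."""
    K = secrets.randbelow(P)
    M = (K + x * delta) % P
    return K, M


def verify(x: int, M: int, K: int, delta: int) -> bool:
    """P_B accepts (x, M) iff M = K + x * delta."""
    return (K + x * delta) % P == M % P


delta = secrets.randbelow(P)  # global key of P_B
K1, M1 = authenticate(10, delta)
K2, M2 = authenticate(32, delta)
# Additive homomorphism: (K1 + K2, M1 + M2) authenticates 10 + 32 under the same delta.
assert verify(42, (M1 + M2) % P, (K1 + K2) % P, delta)
# To claim x + e instead of x, P_A would need the MAC M + e * delta, i.e., it must know delta.
```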
SPDZ-style IT-MACs are clearly more compact than BDOZ-style IT-MACs. Nevertheless, when applied to MPC, the two styles are incomparable: BDOZ-style IT-MACs are more suitable for constant-round MPC protocols based on distributed garbling [106][107][108][109][110][111], whereas SPDZ-style IT-MACs are mainly adopted to transform the semi-honest GMW protocol [4] into efficient MPC protocols with malicious security.
Given the above IT-MACs, we can define authenticated secret sharing (i.e., additive secret sharing authenticated with IT-MACs) as follows:
• Initialize: For each $i \in [1, n]$, a dealer samples $\Delta_i \leftarrow \mathbb{K}$ and sends $\Delta_i$ to $P_i$. For SPDZ-style IT-MACs, the dealer also defines the global key $\Delta := \sum_{i \in [1,n]} \Delta_i$, so that $\Delta_i$ is $P_i$'s additive share of $\Delta$.
• Share(x): For $x \in \mathbb{F}$, the dealer generates $n$ random additive shares $x_1, \ldots, x_n \in \mathbb{F}$ such that $\sum_{i \in [1,n]} x_i = x$, where $x_i$ is $P_i$'s share. Then, the dealer computes the MAC tags as follows:
(a) BDOZ-style: Each share $x_i$ is authenticated independently under the keys of the $n - 1$ other parties. In particular, for each $j \neq i$, the dealer samples a local key $K_j[x_i] \leftarrow \mathbb{K}$ and computes the MAC $M_j[x_i] := K_j[x_i] + x_i \cdot \Delta_j$; it then sends $x_i$, $\{M_j[x_i]\}_{j \neq i}$ and $\{K_i[x_j]\}_{j \neq i}$ to every party $P_i$.
(b) SPDZ-style: The dealer computes $M := x \cdot \Delta$, and then samples uniform MAC shares $M_1, \ldots, M_n$ such that $\sum_{i \in [1,n]} M_i = x \cdot \Delta$, i.e., $\sum_{i \in [1,n]} M_i = (\sum_{i \in [1,n]} x_i) \cdot (\sum_{i \in [1,n]} \Delta_i)$.
We denote by $[[x]]$ the authenticated sharing of $x$.
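• [...] each party $P_j$ sends $(\ldots, x_j, \tau_j)$ to $P_i$ over a private channel. Then, for each $j \neq i$, $P_i$ checks that $\tau_j$ is a valid MAC on $x_j$ under its own keys (for BDOZ-style IT-MACs, $\tau_j = K + x_j \cdot \Delta_i$, where $K$ is the local key that $P_i$ holds for $x_j$), and aborts if any check fails. The soundness error is bounded by $(n-1)/|\mathbb{K}| + (n-1)/2^\kappa$ based on the analysis of [112]. Following the work [...], the MAC check for SPDZ-style shares on an opened value $y$ includes the following steps: [...] (iii) Every party $P_i$ computes $\sigma_i := M(y)_i - y \cdot \Delta_i$, where $M(y)_i$ is $P_i$'s MAC share on $y$, and then commits to $\sigma_i$ towards all other parties. (iv) Every party $P_i$ opens $\sigma_i$ to all other parties, and all parties abort if $\sum_{i \in [1,n]} \sigma_i \neq 0$.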
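A minimal Python simulation of SPDZ-style authenticated sharing and of the commit-then-open MAC check in steps (iii) and (iv) above, for a single directly opened value $y$ (so $M(y)_i$ is just $P_i$'s MAC share). Assumptions: $\mathbb{F} = \mathbb{K} = \mathbb{F}_P$ with $P = 2^{61}-1$, a trusted dealer, SHA-256 commitments, and hypothetical function names; the omitted earlier steps of the check are not reproduced.

```python
import hashlib
import hmac
import secrets

P = 2**61 - 1  # illustrative: F = K = F_P


def additive(v: int, n: int) -> list:
    s = [secrets.randbelow(P) for _ in range(n - 1)]
    s.append((v - sum(s)) % P)
    return s


def spdz_share(x: int, deltas: list) -> list:
    """Dealer: P_i gets (x_i, M_i) with sum x_i = x and sum M_i = x * Delta."""
    delta = sum(deltas) % P
    n = len(deltas)
    return list(zip(additive(x, n), additive(x * delta % P, n)))


def open_with_mac_check(shares: list, deltas: list) -> int:
    y = sum(x_i for x_i, _ in shares) % P  # open y
    sigmas = [(m_i - y * d_i) % P for (_, m_i), d_i in zip(shares, deltas)]
    # (iii) every P_i commits to sigma_i before anyone reveals it ...
    rands = [secrets.token_bytes(16) for _ in sigmas]
    coms = [hashlib.sha256(s.to_bytes(8, "big") + r).digest() for s, r in zip(sigmas, rands)]
    # (iv) ... then opens it; openings are verified and the sigmas must sum to zero.
    for s, r, c in zip(sigmas, rands, coms):
        if not hmac.compare_digest(hashlib.sha256(s.to_bytes(8, "big") + r).digest(), c):
            raise RuntimeError("abort: bad opening")
    if sum(sigmas) % P != 0:
        raise RuntimeError("abort: MAC check failed")
    return y
```

If some party shifts its value share by $e \neq 0$, the opened value becomes $y + e$ and the $\sigma_i$ sum to $-e \cdot \Delta$, which is non-zero unless $\Delta = 0$.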
For MPC protocols in the dishonest-majority setting, the dealer is realized distributedly by executing a protocol, where each global key $\Delta_i$ is sampled uniformly at random by party $P_i$. Note that authenticated shares (i.e., additive secret sharing equipped with IT-MACs) are still additively homomorphic, as IT-MACs are additively homomorphic. That is, given public coefficients $c_1, \ldots, c_\ell, c \in \mathbb{F}$, all parties can locally compute $[[y]] := \sum_{h \in [1, \ell]} c_h \cdot [[x_h]] + c$. Authenticated shares of both styles can be generated using COT or its arithmetic generalization, vector oblivious linear-function evaluation (VOLE), which are discussed in Section 5.
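The local linear combination of SPDZ-style authenticated shares can be sketched as follows, under the same illustrative assumptions ($\mathbb{F} = \mathbb{K} = \mathbb{F}_P$, a trusted dealer, hypothetical names). One convention detail not spelled out above: the public constant $c$ is added to the value share of a single designated party (here $P_1$), while every party adds $c \cdot \Delta_i$ to its MAC share, so that the relation $\sum_i M_i = y \cdot \Delta$ is preserved.

```python
import secrets

P = 2**61 - 1  # illustrative: F = K = F_P


def additive(v: int, n: int) -> list:
    s = [secrets.randbelow(P) for _ in range(n - 1)]
    s.append((v - sum(s)) % P)
    return s


def spdz_share(x: int, deltas: list) -> list:
    """Dealer: P_i gets (x_i, M_i) with sum x_i = x and sum M_i = x * Delta."""
    delta = sum(deltas) % P
    return list(zip(additive(x, len(deltas)), additive(x * delta % P, len(deltas))))


def linear_combination(coeffs: list, sharings: list, c: int, deltas: list) -> list:
    """Each P_i locally computes its part of [[y]] = sum_h c_h * [[x_h]] + c."""
    out = []
    for i, d_i in enumerate(deltas):
        y_i = sum(c_h * sh[i][0] for c_h, sh in zip(coeffs, sharings)) % P
        m_i = sum(c_h * sh[i][1] for c_h, sh in zip(coeffs, sharings)) % P
        if i == 0:
            y_i = (y_i + c) % P    # only P_1 adds the public constant to its value share
        m_i = (m_i + c * d_i) % P  # every P_i adds c * Delta_i to its MAC share
        out.append((y_i, m_i))
    return out


deltas = [secrets.randbelow(P) for _ in range(3)]
a, b = spdz_share(3, deltas), spdz_share(4, deltas)
y = linear_combination([2, 5], [a, b], 7, deltas)
# The value is 2*3 + 5*4 + 7 = 33 and the MAC relation sum M_i = y * Delta still holds.
assert sum(v for v, _ in y) % P == 33
assert sum(m for _, m in y) % P == 33 * sum(deltas) % P
```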

Semi-honest protocols
In the semi-honest setting, we use a simple framework to unify the state-of-the-art concretely efficient MPC protocols, including 1) the GMW protocol [4] with optimizations [102,113,114,115] based on additive secret sharing; 2) the BGW protocol [2] with optimizations [116,117,118,119] based on Shamir secret sharing; and 3) the secure three-party computation (3PC) protocol [87] based on replicated secret sharing. For MPC based on replicated secret sharing, we focus on the three-party case for the sake of simplicity. While the original GMW protocol only considers Boolean circuits, it is easily extended to arithmetic circuits over any finite field $\mathbb{F}$ [120]. Similarly, while the state-of-the-art semi-honest 3PC protocol [87] in the honest-majority setting focuses on Boolean circuits, it is easy to extend it to work over any finite field $\mathbb{F}$ [101]. For more parties (e.g., $n = 4$ or $n = 5$), MPC protocols based on replicated secret sharing can also be constructed efficiently (see, e.g., [124][125][126]). In the presence of semi-honest adversaries, the GMW-like protocols and the MPC protocols based on replicated secret sharing can be straightforwardly extended to work over a ring such as $\mathbb{Z}_{2^k}$ for $k = 32$ or $k = 64$. Furthermore, the BGW-like protocols based on Shamir secret sharing can also work over a general ring (see, e.g., [127]). While integer computations modulo $2^k$ are more natural for modern computers and may simplify implementations and applications such as machine learning (ML), we focus on the case of finite fields for the sake of simplicity.
Below, we present the framework for secret-sharing-based MPC protocols in the semi-honest setting, which is shown in Figure 1. Specifically, the inputs are secret-shared among all parties, and then the circuit is evaluated layer by layer, where all gates in a layer are computed in parallel; thus the number of communication rounds is linear in the depth of the circuit. Finally, the output of every party is reconstructed. While addition gates are free (they require no communication), the main cost of MPC protocols lies in computing multiplication gates by executing a semi-honest multiplication protocol $\Pi^{\mathsf{semi}}_{\mathsf{Mult}}$. For each kind of LSSS, $\Pi^{\mathsf{semi}}_{\mathsf{Mult}}$ is constructed differently. We sketch three classical constructions of $\Pi^{\mathsf{semi}}_{\mathsf{Mult}}$ corresponding to the three kinds of secret sharings in Figure 2, where the protocols are divided into two phases: the preprocessing phase, where the circuit and inputs are unknown, and the online phase, where the circuit is known to all parties and the parties hold their inputs. For additive secret sharing, the state-of-the-art protocol adopts Beaver multiplication triples [121] to multiply two secret values. In particular, a random Beaver triple $([a], [b], [c])$ with $c = a \cdot b \in \mathbb{F}$ is generated in the preprocessing phase, and then consumed to compute one multiplication using the Beaver technique in the online phase. If $\mathbb{F} = \mathbb{F}_2$, Beaver triples can be generated by OT extension protocols; otherwise, they can be generated using OLE protocols. Recently, Mouchet et al. [128] also used multi-party homomorphic encryption (MHE) to generate Beaver triples with semi-honest security in the dishonest-majority setting. While the Beaver technique requires two elements per multiplication gate per party in the online phase, the communication can be further reduced to one element per multiplication gate per party using the technique underlying the Turbospeedz protocol [129].
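In particular, a function-dependent preprocessing phase is introduced, in which the circuit is known but the inputs are still unknown, and circuit-dependent Beaver triples are generated:
(1) Consider $(\Lambda_w, [\lambda_w])$ as the additive secret sharing on a wire $w$, where $\Lambda_w = z_w + \lambda_w \in \mathbb{F}$ is a public value, $z_w$ is the actual value on $w$ and $\lambda_w$ is a random element. Here, $[\lambda_w]$ can be generated in the preprocessing phase by having every party share a random value with the other parties, and then letting each party sum all the shares it holds as its share of $\lambda_w$.
(2) For each wire $w$ associated with $P_i$'s input $x_w$, $P_i$ sends $\Lambda_w = x_w + \lambda_w$ to all parties, who define $(\Lambda_w, [\lambda_w])$ as the sharing on wire $w$.
(3) For each ADD gate $(\alpha, \beta, \gamma, \mathsf{ADD})$, all parties locally set $\Lambda_\gamma := \Lambda_\alpha + \Lambda_\beta$ and $[\lambda_\gamma] := [\lambda_\alpha] + [\lambda_\beta]$. For each MULT gate $(\alpha, \beta, \gamma, \mathsf{MULT})$, given the circuit-dependent triple $([\lambda_\alpha], [\lambda_\beta], [\lambda_\alpha \cdot \lambda_\beta])$, all parties locally compute $[\Lambda_\gamma] := \Lambda_\alpha \cdot \Lambda_\beta - \Lambda_\alpha \cdot [\lambda_\beta] - \Lambda_\beta \cdot [\lambda_\alpha] + [\lambda_\alpha \cdot \lambda_\beta] + [\lambda_\gamma]$ (the public term $\Lambda_\alpha \cdot \Lambda_\beta$ is added by one designated party), where $[\lambda_\gamma]$ is a random sharing generated by the parties in the preprocessing phase. The parties execute Open($[\Lambda_\gamma]$) to obtain $\Lambda_\gamma$, and define $(\Lambda_\gamma, [\lambda_\gamma])$ as the sharing on the output wire $\gamma$.
The GMW protocol based on the multiplication protocol $\Pi^{\mathsf{semi}}_{\mathsf{Mult}}$ shown in Figure 2 supports the corruption threshold $t = n - 1$, where $n$ is the total number of parties. If we allow $n/2 \le t < n - 1$ (which is still the dishonest-majority setting), we can use the TinyKeys approach [114] to improve the efficiency of $\Pi^{\mathsf{semi}}_{\mathsf{Mult}}$ over a binary field. Specifically, when the number of honest parties $h > 1$, e.g., $(h = 6, n = 20)$ or $(h = 120, n = 400)$, Hazay et al. [114] used the IKNP OT extension protocol [113,130] with short keys to generate Beaver multiplication triples. The basic idea is that, although the short key of a single honest party has only low entropy, the combination of the short keys of $h$ honest parties has high entropy. The TinyKeys approach can also be extended to the malicious setting using IT-MACs with short keys [131].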
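The two online multiplication styles for additive sharing can be compared in a small Python simulation over $\mathbb{F}_P$, $P = 2^{61}-1$, with a trusted dealer standing in for the preprocessing phase (all names are hypothetical). `beaver_mult` opens $d = x - a$ and $e = y - b$, i.e., two elements per party; `masked_mult` follows the Turbospeedz-style wire representation $(\Lambda_w, [\lambda_w])$ and opens only $\Lambda_\gamma$, i.e., one element per party.

```python
import secrets

P = 2**61 - 1  # illustrative prime field F_P


def rand() -> int:
    return secrets.randbelow(P)


def share(v: int, n: int) -> list:
    """Additive sharing of v among n parties (trusted dealer)."""
    s = [rand() for _ in range(n - 1)]
    s.append((v - sum(s)) % P)
    return s


def open_(shares: list) -> int:
    return sum(shares) % P


def beaver_mult(xs: list, ys: list, triple) -> list:
    """Consume ([a], [b], [c]) with c = a*b; each party sends 2 elements to open d and e."""
    a_sh, b_sh, c_sh = triple
    d = open_([(x - a) % P for x, a in zip(xs, a_sh)])  # d = x - a
    e = open_([(y - b) % P for y, b in zip(ys, b_sh)])  # e = y - b
    # x*y = c + d*b + e*a + d*e; the public term d*e is added by P_1 only.
    zs = [(c + d * b + e * a) % P for a, b, c in zip(a_sh, b_sh, c_sh)]
    zs[0] = (zs[0] + d * e) % P
    return zs


def masked_mult(mask_a: int, mask_b: int, la_sh, lb_sh, lab_sh, lg_sh) -> int:
    """Turbospeedz-style MULT gate: mask_a, mask_b are the public Lambda values of the
    input wires; sharings of lambda_a, lambda_b, lambda_a*lambda_b, lambda_g come from
    preprocessing. Each party sends 1 element to open Lambda_g = z_a*z_b + lambda_g."""
    shares = [(-mask_a * lb - mask_b * la + lab + lg) % P
              for la, lb, lab, lg in zip(la_sh, lb_sh, lab_sh, lg_sh)]
    shares[0] = (shares[0] + mask_a * mask_b) % P  # public term added once
    return open_(shares)


n = 3
a, b = rand(), rand()
z = open_(beaver_mult(share(6, n), share(7, n), (share(a, n), share(b, n), share(a * b % P, n))))

la, lb, lg = rand(), rand(), rand()  # wire masks (known to nobody in a real protocol)
mask_out = masked_mult((9 + la) % P, (11 + lb) % P,
                       share(la, n), share(lb, n), share(la * lb % P, n), share(lg, n))
```

Here `z` equals $6 \cdot 7$, and `mask_out` minus $\lambda_\gamma$ equals $9 \cdot 11$, matching the identity $(\Lambda_\alpha - \lambda_\alpha)(\Lambda_\beta - \lambda_\beta) = z_\alpha z_\beta$.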
Owing to recent developments, PCG-style OT extension protocols [132][133][134] outperform IKNP-style OT extension protocols [113,130,135,136], and achieve sublinear communication compared to the linear communication of IKNP-style protocols (see Section 5 for more details). Consequently, the efficiency improvement of the TinyKeys approach appears to be much smaller if the recent PCG-style (instead of IKNP-style) OT extension protocols are used to construct $\Pi^{\mathsf{semi}}_{\mathsf{Mult}}$.
For Shamir secret sharing, if the number of parties is small (particularly $n \le 5$), the state-of-the-art multiplication protocol is the GRR protocol [117], which uses the degree-reduction approach. The GRR protocol works by letting every party locally multiply its shares of the inputs $[x], [y]$, share the result with all other parties (which allows the degree of the polynomial to be reduced from $2t$ to $t$), and then locally compute a linear combination of the received shares as its share of the output $[z]$. If the number of parties is larger (e.g., $n > 5$), we mainly adopt the Damgård-Nielsen (DN) protocol [116] to multiply two secret values. The original DN protocol [116] described in Figure 2 is information-theoretically secure, and needs six elements of communication per multiplication gate per party. In the information-theoretic setting, the communication cost of the DN protocol was first reduced by Goyal et al. [119,137] from 6 elements to 5.5 elements using a simple technique. Recently, Goyal et al. [118] further reduced the communication of the DN protocol to 4 elements per multiplication gate per party. Inspired by the technique of [119,137], they first modify the original DN protocol [116] shown in Figure 2 so that $P_{\mathsf{king}}$ sends a random degree-$t$ Shamir sharing $[e]_t$ of the reconstructed value $e = x \cdot y + r$ (instead of $e$ itself) to all other parties, and then all parties locally compute $[z]_t := [e]_t - [r]_t$.
Their crucial observation is that when $P_{\mathsf{king}}$ is honest, the corrupted parties only receive random elements from $P_{\mathsf{king}}$. This holds even if the corrupted parties know the whole double sharings $([r]_t, [r]_{2t})$, since they only receive $t$ shares, which are uniformly random and independent of the secret $z = x \cdot y$. Therefore, the task of acting as $P_{\mathsf{king}}$ can be split among all parties instead of being assigned to a fixed party as in the original DN protocol. In particular, when computing $n$ multiplication gates, we let every party act as $P_{\mathsf{king}}$ for one multiplication gate. If $P_{\mathsf{king}}$ is corrupted, we still need a pair of random double sharings; if $P_{\mathsf{king}}$ is honest, the double sharings do not need to be random. This means that only $t$ pairs of random double sharings need to be generated for $n$ multiplication gates in the preprocessing phase. In the computational setting, the pseudo-random secret sharing approach [99] can further reduce the communication cost, as in previous honest-majority MPC protocols such as [101,138,139,140]. For example, the state-of-the-art DN-style protocol [118] can be optimized to two elements per multiplication gate per party using a pseudo-random generator (PRG). In addition, Goyal et al. [118] also proposed combining the improved DN multiplication with Beaver-triple multiplication to reduce the round complexity by a factor of 2, at the cost of 0.5 additional elements of communication per multiplication gate per party. Recently, Abspoel et al. [141] used regenerating codes to construct a single-round multiplication protocol, at the cost of increasing the communication complexity by a factor of $O(n/\log n)$; in comparison, the DN multiplication protocol needs about two rounds. The regenerating property [142] of Shamir secret sharing used in this approach requires the number of parties $n$ to be large.
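The two Shamir-based multiplication protocols above can be sketched in Python as follows (illustrative field $\mathbb{F}_P$, $\alpha_i = i$, hypothetical names). `grr_mult` performs the GRR degree reduction; `dn_mult` follows the DN pattern with a double sharing $([r]_t, [r]_{2t})$ and a king who reconstructs $e = x \cdot y + r$, optionally re-sharing $[e]_t$ as in the Goyal et al. variant. For brevity a trusted dealer produces the double sharing, whereas in DN such sharings are generated jointly by the parties, and only the arithmetic, not the message pattern or the rotation of the king role, is modeled.

```python
import secrets

P = 2**61 - 1  # illustrative prime field, alpha_i = i


def shamir(x: int, n: int, t: int) -> list:
    coeffs = [x % P] + [secrets.randbelow(P) for _ in range(t)]
    out = []
    for a in range(1, n + 1):
        acc = 0
        for c in reversed(coeffs):  # Horner evaluation
            acc = (acc * a + c) % P
        out.append(acc)
    return out


def lagrange_at_zero(alphas: list) -> list:
    """Coefficients lam_j with f(0) = sum_j lam_j * f(alphas[j]) for deg f < len(alphas)."""
    lams = []
    for j, aj in enumerate(alphas):
        num, den = 1, 1
        for k, ak in enumerate(alphas):
            if k != j:
                num = num * (-ak) % P
                den = den * (aj - ak) % P
        lams.append(num * pow(den, -1, P) % P)
    return lams


def recon(shares: list, t: int) -> int:
    """Reconstruct a degree-t sharing from the shares of P_1, ..., P_{t+1}."""
    lams = lagrange_at_zero(list(range(1, t + 2)))
    return sum(lam_j * s for lam_j, s in zip(lams, shares)) % P


def grr_mult(xs: list, ys: list, t: int) -> list:
    n = len(xs)
    assert n >= 2 * t + 1
    hs = [x * y % P for x, y in zip(xs, ys)]       # local degree-2t shares of x*y
    subs = [shamir(h, n, t) for h in hs]            # every P_i re-shares h_i with degree t
    lams = lagrange_at_zero(list(range(1, n + 1)))  # recombination vector
    return [sum(lams[i] * subs[i][j] for i in range(n)) % P for j in range(n)]


def dn_mult(xs: list, ys: list, t: int, king_reshares: bool = False) -> list:
    n = len(xs)
    r = secrets.randbelow(P)
    r_t, r_2t = shamir(r, n, t), shamir(r, n, 2 * t)  # double sharing (dealer-made here)
    masked = [(x * y + s) % P for x, y, s in zip(xs, ys, r_2t)]
    e = recon(masked, 2 * t)                           # P_king reconstructs e = x*y + r
    e_sh = shamir(e, n, t) if king_reshares else [e] * n
    return [(ei - ri) % P for ei, ri in zip(e_sh, r_t)]  # [z]_t = e - [r]_t


n, t = 5, 2
xs, ys = shamir(6, n, t), shamir(7, n, t)
print(recon(grr_mult(xs, ys, t), t), recon(dn_mult(xs, ys, t), t))
```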
As for replicated secret sharing in the 3PC setting, the state-of-the-art protocol [87,101] is simple and requires each party to send only one element per multiplication gate. Specifically, every party can locally compute an additive share of $z = x \cdot y$ from its replicated shares. Each party then randomizes this share with a random 0-sharing and sends the result to another party. The random 0-sharing $[r]$ can be computed easily: every party $P_i$ sends a uniform key $k_i$ to $P_{i+1}$ for $i \in [1, 3]$ (indices modulo 3), and then computes $r_i := F(k_{i-1}, \mathsf{id}) - F(k_i, \mathsf{id})$ using the two keys it holds, where $F$ is a pseudorandom function and $\mathsf{id}$ is an identifier. While Shamir secret sharing is suitable for a large number of parties, replicated secret sharing is mainly used for a small number of parties (e.g., $n \in \{3, 4, 5, 7, 9\}$). In addition, Keller et al. [143] showed that the 3PC protocol [87] can be generalized to any $Q_2$ access structure, and this was further improved in [144] to remove the limitation of replicated secret sharing (i.e., that it requires an exponentially large number of shares for a large number of parties).
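The 3PC multiplication with a PRF-based 0-sharing described above can be sketched as follows. This is a minimal sketch, not the exact share format of [87]: it assumes the ring $\mathbb{Z}_{2^{64}}$, a generic replicated sharing in which party $i$ holds $(x_i, x_{i+1})$, and HMAC-SHA256 truncated to 64 bits as a stand-in for $F$; all function names are hypothetical.

```python
import hashlib
import hmac
import secrets

MOD = 2**64  # ring Z_{2^64} (assumed)


def prf(key: bytes, ident: int) -> int:
    """Stand-in PRF F(k, id): HMAC-SHA256 truncated to 64 bits (an assumption)."""
    mac = hmac.new(key, ident.to_bytes(8, "big"), hashlib.sha256).digest()
    return int.from_bytes(mac[:8], "big")


def share(x: int):
    """Replicated sharing: x = x_0 + x_1 + x_2 and party i holds (x_i, x_{i+1})."""
    parts = [secrets.randbelow(MOD), secrets.randbelow(MOD)]
    parts.append((x - sum(parts)) % MOD)
    return [(parts[i], parts[(i + 1) % 3]) for i in range(3)]


def reconstruct(shares) -> int:
    return sum(first for first, _ in shares) % MOD


def zero_sharing(keys, ident):
    """Party i holds its own key k_i and k_{i-1} (received from P_{i-1}); the r_i sum to 0."""
    return [(prf(keys[i - 1], ident) - prf(keys[i], ident)) % MOD for i in range(3)]


def multiply(x_sh, y_sh, keys, ident):
    r = zero_sharing(keys, ident)
    z = []
    for i in range(3):
        (xi, xn), (yi, yn) = x_sh[i], y_sh[i]
        # Summed over i, these local terms cover all nine products x_j * y_k.
        z.append((xi * yi + xi * yn + xn * yi + r[i]) % MOD)
    # Each P_i sends the single element z_i to P_{i-1}, so P_i ends with (z_i, z_{i+1}).
    return [(z[i], z[(i + 1) % 3]) for i in range(3)]
```

Since the $r_i$ cancel, the masked additive shares still sum to $x \cdot y$, while each share seen by the receiving party is pseudorandom.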

Maliciously secure protocols
The MPC protocols based on secret sharing described in the previous subsection are secure against semi-honest adversaries. To achieve malicious security, checking procedures must be added. The techniques used to ensure security against malicious adversaries differ between dishonest-majority and honest-majority MPC. For example, dishonest-majority MPC needs IT-MACs to authenticate values shared among all parties, whereas this is unnecessary for honest-majority MPC. We therefore present the development of maliciously secure MPC in these two settings separately.
Dishonest majority. Goldreich, Micali and Wigderson (GMW) [4] proposed a general compiler that converts a semi-honest MPC protocol into a maliciously secure MPC protocol for the same task. However, this compiler is non-black-box, using generic zero-knowledge proofs to prove the correctness of every computation step, and is thus not concretely efficient. Later, Ishai, Prabhakaran and Sahai (IPS) [145] proposed a black-box compiler, in which an inner MPC protocol with semi-honest security computes a circuit in the OT-hybrid model, and an outer MPC protocol with malicious security in the honest-majority setting guarantees the security of the whole protocol against malicious adversaries. The IPS compiler was improved in [123] for the multi-party setting, and further optimized in [54,146] for the two-party setting. However, the concrete efficiency of maliciously secure MPC protocols based on the IPS compiler is still not sufficiently high. Recently, building on the IPS framework, Hazay et al. [147] proposed a new compiler using two-level sharing, where the outer level is Shamir secret sharing or algebraic geometric (AG) secret sharing [148] and the inner level is additive secret sharing. Their compiler supports fields of arbitrary size with constant communication overhead over the semi-honest GMW protocol [4], but its concrete efficiency is still low.
In the dishonest-majority setting, concretely efficient MPC protocols based on IT-MACs have the smallest overhead for achieving malicious security. Using IT-MACs, the semi-honest GMW protocol shown in Figure 1 can be transformed into a maliciously secure protocol as follows:
• Replace all additive secret sharings with the authenticated secret sharings defined in Section 3.2.
• For each wire $w$ associated with $P_i$'s input $x_w$, the parties generate a random authenticated share $[\![r_w]\!]$ in the preprocessing phase; $P_i$ then broadcasts $\Lambda_w := x_w - r_w$ to all parties, who locally compute $[\![x_w]\!] := \Lambda_w + [\![r_w]\!]$.
• For each multiplication gate, the parties compute the product with a maliciously secure protocol $\Pi^{\mathsf{mali}}_{\mathsf{Mult}}$.
The Beaver technique [121] can also be used to construct protocol $\Pi^{\mathsf{mali}}_{\mathsf{Mult}}$ in the following steps:
• Preprocessing phase: All parties generate a random authenticated triple $([\![a]\!], [\![b]\!], [\![c]\!])$ with $c = a \cdot b$. The generation of authenticated triples can be divided into two steps: 1) computing authenticated shares by executing a correlated OT (COT) or vector OLE (VOLE) protocol with malicious security, where the notions of COT and VOLE are given in Section 5; 2) generating faulty authenticated triples from the authenticated shares and then checking the correctness of these faulty triples. In the first step, all parties execute a maliciously secure COT/VOLE protocol pairwise, and then run a consistency-check procedure to verify that the shares and global keys are consistent across the multiple executions. The state-of-the-art consistency check adopts the random-linear-combination approach (see, e.g., [43,104,106,108,110]) and incurs a very small communication overhead. For the second step, different approaches can be used for different applications (see below).
• Online phase: This phase is similar to the semi-honest protocol shown in Figure 2, except that authenticated shares are involved and the corresponding open procedure described in Section 3.2 is used. As in the semi-honest case, the communication in this phase can be reduced from 2 elements per multiplication gate per party to only one element using the Turbospeedz technique [129].
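The online Beaver multiplication on authenticated shares, with a MAC check at each opening, can be sketched as a plaintext simulation. This is a minimal sketch under stated assumptions: SPDZ-style MACs over $\mathbb{F}_p$ ($p = 2^{61} - 1$) with global key $\Delta = \sum_i \Delta_i$, three parties in one process, a trusted-dealer helper (`deal`) standing in for the preprocessing phase, and no commitment before revealing the MAC-check values (real protocols commit first). All names are hypothetical.

```python
import secrets

P = 2**61 - 1  # prime field F_p (assumed)
N = 3          # number of parties (assumed)


def additive(v):
    """Additive sharing of v among N parties."""
    s = [secrets.randbelow(P) for _ in range(N - 1)]
    return s + [(v - sum(s)) % P]


class Auth:
    """[[x]]: value shares xs and MAC shares ms with sum(ms) = delta * sum(xs) mod P."""
    def __init__(self, xs, ms):
        self.xs, self.ms = list(xs), list(ms)


def deal(v, delta):
    """Trusted-dealer stand-in for the preprocessing phase (hypothetical helper)."""
    return Auth(additive(v), additive(delta * v % P))


def add(a, b):
    return Auth([(x + y) % P for x, y in zip(a.xs, b.xs)],
                [(x + y) % P for x, y in zip(a.ms, b.ms)])


def scale(a, c):
    return Auth([c * x % P for x in a.xs], [c * m % P for m in a.ms])


def add_const(a, c, delta_sh):
    """Add a public constant: P_1 adjusts its value share; every P_i adds c * delta_i to its MAC share."""
    xs = a.xs[:]
    xs[0] = (xs[0] + c) % P
    return Auth(xs, [(m + c * d) % P for m, d in zip(a.ms, delta_sh)])


def open_and_check(a, delta_sh):
    """Open x, then check sum_i (m_i - delta_i * x) == 0 (commitments omitted)."""
    x = sum(a.xs) % P
    sigma = sum(m - d * x for m, d in zip(a.ms, delta_sh)) % P
    if sigma != 0:
        raise ValueError("MAC check failed")
    return x


def beaver_mult(x, y, triple, delta_sh):
    a, b, c = triple
    d = open_and_check(add(x, scale(a, P - 1)), delta_sh)  # d = x - a
    e = open_and_check(add(y, scale(b, P - 1)), delta_sh)  # e = y - b
    z = add(add(c, scale(b, d)), scale(a, e))              # c + d*b + e*a
    return add_const(z, d * e % P, delta_sh)               # ... + d*e = x*y
```

A party that shifts its value share by any nonzero amount changes the MAC-check value by a multiple of the unknown $\Delta$, so the check fails except with probability about $1/p$.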
When the Turbospeedz technique [129] and the batch-check technique described in Section 3.2 are used, the online phase of the maliciously secure GMW-like protocols [103,104] has optimal communication cost without sacrificing security. Most MPC studies focus on improving the efficiency of the preprocessing phase, where generating authenticated triples is the main bottleneck. For the generation of authenticated triples, we consider two cases: 1) large fields (i.e., $|\mathbb{F}| \ge 2^{\rho}$) and 2) the binary field (i.e., $\mathbb{F} = \mathbb{F}_2$). Note that the techniques for binary fields can also be applied to other small fields (e.g., $\mathbb{F}_{2^8}$).
For the case of large fields, the SPDZ framework [103,149] is the state of the art in the dishonest-majority malicious setting. The original SPDZ protocol [103,149] uses a depth-1 homomorphic encryption (HE) scheme (i.e., one supporting a single multiplication) to generate authenticated triples in the preprocessing phase, and evaluates the circuit quickly in the online phase. Later, Keller et al. [43] proposed a SPDZ-style protocol called MASCOT, which uses the OT extension protocol [136] and Gilboa's multiplication idea [150] to generate authenticated triples more efficiently. Subsequently, based on additively homomorphic encryption [151] and lattice-based zero-knowledge proofs, Keller et al. [152] presented an optimized SPDZ-style protocol called Overdrive, which significantly reduces the communication for generating authenticated triples. Overdrive has two versions: LowGear for a small number of parties and HighGear for a large number of parties. The performance of HighGear was further improved by Baum et al. [153] by optimizing the underlying zero-knowledge protocol. In these SPDZ protocols, the correctness of faulty authenticated triples is checked with the so-called "sacrifice" technique. The improved sacrifice technique [43] works as follows:
• To check an authenticated triple $([\![a]\!], [\![b]\!], [\![c]\!])$, the parties sacrifice another triple $([\![\hat{a}]\!], [\![b]\!], [\![\hat{c}]\!])$ that uses the same $[\![b]\!]$. They sample a uniformly random $r \in \mathbb{F}$, open $\rho := r \cdot a - \hat{a}$ from $r \cdot [\![a]\!] - [\![\hat{a}]\!]$, locally compute $[\![\sigma]\!] := r \cdot [\![c]\!] - [\![\hat{c}]\!] - \rho \cdot [\![b]\!]$, and run $\mathsf{CheckZero}([\![\sigma]\!])$ to verify that $\sigma = 0$, where $\mathsf{CheckZero}$ is the same as $\mathsf{Open}$ defined in Section 3.2, except that the value to be opened is 0 and thus need not be sent.
• When $\ell$ faulty authenticated triples need to be checked for some integer $\ell$, the randomness $r$ can be reused for all checking procedures, and the procedures $\mathsf{Open}$ and $\mathsf{CheckZero}$ can be done in a batch (see Section 3.2).
• If $c = a \cdot b + e$ for some adversarially chosen error $e \ne 0$, then
$$\sigma = r \cdot c - \hat{c} - \rho \cdot b = r \cdot (a \cdot b + e) - \hat{c} - (r \cdot a - \hat{a}) \cdot b = r \cdot e - \hat{e},$$
where $\hat{e} = \hat{c} - \hat{a} \cdot b$ is another error introduced by the adversary.
If $e \ne 0$, the probability that $\sigma = r \cdot e - \hat{e}$ equals 0 is $1/|\mathbb{F}|$, since $r$ is sampled uniformly at random after $e$ and $\hat{e}$ have been fixed. Therefore, this checking procedure requires $\mathbb{F}$ to be a large field. To support a small field $\mathbb{F}$, the checking procedure can be repeated $t$ times, where $|\mathbb{F}|^t \ge 2^{\rho}$ (so $t = \rho$ if $\mathbb{F} = \mathbb{F}_2$). However, this incurs a large computation and communication overhead.
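The sacrifice check above can be sketched in a few lines. This is a minimal sketch that handles the values in the clear to expose the algebra: in the protocol, $a, \hat{a}, b, c, \hat{c}$ are authenticated shares, $\rho$ is opened, and $\sigma$ goes through $\mathsf{CheckZero}$. The field $\mathbb{F}_p$ with $p = 2^{61} - 1$ and the function names are assumptions.

```python
import secrets

P = 2**61 - 1  # a large prime field, so that 1/|F| is negligible (assumed)


def sacrifice(triple, spare, r):
    """Return sigma for checking (a, b, c) by sacrificing (a_hat, b, c_hat).

    Values are handled in the clear for illustration; in the protocol they are
    authenticated shares, rho is opened, and sigma goes through CheckZero.
    """
    a, b, c = triple
    a_hat, b_spare, c_hat = spare
    if b != b_spare:
        raise ValueError("both triples must use the same b")
    rho = (r * a - a_hat) % P
    return (r * c - c_hat - rho * b) % P


def check_batch(pairs):
    """Check a batch of (triple, spare) pairs, reusing a single random r."""
    r = secrets.randbelow(P)
    return all(sacrifice(t, s, r) == 0 for t, s in pairs)
```

With errors $e$ and $\hat{e}$ injected into $c$ and $\hat{c}$, the returned value equals $r \cdot e - \hat{e}$, matching the derivation above.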
Recently, Chen et al. [154] integrated the depth-2 HE scheme [155,156] into the SPDZ framework to improve the efficiency of SPDZ protocols for matrix multiplication and two-dimensional convolution. For other general functions, it is unclear whether their approach achieves better efficiency, due to the larger HE parameters.
While the semi-honest GMW protocol over a field extends straightforwardly to the ring $\mathbb{Z}_{2^k}$, this is not easy for SPDZ-style protocols with malicious security. Cramer et al. [157] proposed the first concretely efficient MPC protocol over the ring $\mathbb{Z}_{2^k}$ in the SPDZ framework (named $\mathrm{SPDZ}_{2^k}$), using IT-MACs over two different rings, where the secret values are in $\mathbb{Z}_{2^k}$ and the MAC tags are in $\mathbb{Z}_{2^{k+s}}$. $\mathrm{SPDZ}_{2^k}$ uses the MASCOT idea to generate authenticated triples, but requires more communication than MASCOT [43]. Later, Damgård et al. [158] implemented $\mathrm{SPDZ}_{2^k}$, designed new protocols for equality test, comparison, and truncation over $\mathbb{Z}_{2^k}$, and demonstrated that these operations, as used in the ML domain, are more efficient with $\mathrm{SPDZ}_{2^k}$ than with the field-based SPDZ-style protocols [43,152]. Subsequently, two improved MPC protocols based on the $\mathrm{SPDZ}_{2^k}$ idea were proposed [159,160].
Recently, Boyle et al. [161] proposed several new protocols that generate Beaver multiplication triples and authenticated triples in the PCG framework, based on a new variant of the ring-LPN assumption. They use a distributed point function (DPF) and the sparsity of the noise in ring-LPN to generate Beaver triples in the semi-honest two-party setting with small communication. Using the programmability of PCG [162], the semi-honest two-party protocol for producing Beaver triples can easily be extended to the multi-party setting. Based on the construction of SPDZ-style authenticated shares, Boyle et al. [161] also extended the semi-honest protocol to generate authenticated triples in the two-party malicious setting. The communication of this maliciously secure protocol is two orders of magnitude smaller than that of Overdrive. While this communication efficiency is attractive, the computational cost needs to be reduced further to make the PCG approach more practical. Boyle et al. [161] also gave a candidate construction for generating authenticated triples in the multi-party setting using the three-party DPF [163], but its concrete efficiency is very low.
For the binary field (i.e., $\mathbb{F} = \mathbb{F}_2$), we mainly consider two types of maliciously secure MPC protocols: TinyOT-style [104,106-110,164,165] and MiniMAC-style [164,166-169]. In particular, the sub-protocols for generating authenticated triples underlying the TinyOT-style protocols can be used to design constant-round MPC protocols with malicious security [106-110]. These TinyOT-style protocols adopt BDOZ-style IT-MACs [105] to authenticate bits, and use the bucketing approach to eliminate the possible leakage of shares caused by the selective-failure attack, in which the adversary can guess a bit share of an honest party with probability 1/2 but is caught with the same probability. MiniMAC [168] aims to address the large communication overhead of SPDZ [103] over the binary field $\mathbb{F}_2$ (in particular, the communication is blown up by a factor of $\rho$). Specifically, MiniMAC adopts a batch-authentication idea: if $k$ instances of the same Boolean circuit need to be computed at once, one can bundle these computations together and view them as the computation of a single Boolean circuit over the ring $\mathbb{F}^k$, where addition and multiplication over $\mathbb{F}^k$ are component-wise. In MiniMAC, an IT-MAC on a message $x \in \mathbb{F}^k$ is defined as $C(x) * \Delta$, where $C$ is a linear error-correcting code with a large minimum distance, $\Delta$ is a key vector of the same length as the codeword $C(x)$, and $*$ is the component-wise product. MiniMAC-style MPC protocols [164,166-169] also work for layered Boolean circuits, whose gates are partitioned into ordered layers such that each layer consists only of gates of the same type. The recent MiniMAC-style protocol [166] adopts an algebraic tool called reverse multiplication-friendly embedding (RMFE) [170], originally proposed for honest-majority MPC, and obtains a lower communication cost.
Besides, for small fields, TinyTable-style protocols [112,171,172] have been proposed; they are well suited to secure AES or 3DES evaluation using the one-time truth-table approach.
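The BDOZ-style bit authentication used by TinyOT-style protocols can be illustrated for a single pair of parties. This is a minimal sketch assuming $\kappa = 128$ and the common form $M = K \oplus x \cdot \Delta$, where $P_A$ holds the bit $x$ and MAC $M$, and $P_B$ holds the local key $K$ and global key $\Delta$; the correlation, which comes from (correlated) OT in the protocols, is sampled locally here, and the function names are hypothetical.

```python
import secrets

KAPPA = 128  # MAC key length in bits (assumed)


def authenticate_bit(x, delta):
    """P_A holds (x, M); P_B holds (K, delta) with M = K xor x*delta.

    In TinyOT-style protocols this correlation comes from (correlated) OT;
    here it is sampled locally for illustration.
    """
    key = secrets.randbits(KAPPA)
    mac = key ^ (delta if x else 0)
    return mac, key


def verify(x, mac, key, delta):
    """P_B accepts a claimed bit x with MAC `mac` iff the correlation holds."""
    return mac == key ^ (delta if x else 0)
```

Because the MACs are XOR-homomorphic under a fixed $\Delta$, XOR-ing two bits, their MACs, and their keys yields a valid authentication of the XOR, and claiming the wrong bit requires guessing $\Delta$, which succeeds with probability $2^{-\kappa}$.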
Honest majority. In the malicious setting, only the correctness of multiplication gates needs to be checked, as addition gates are computed locally and are always correct. In 2017, Lindell and Nof [101] observed that the semi-honest DN protocol [116] already guarantees the privacy of secret values in the presence of malicious adversaries, but allows the adversary to introduce an additive error in the output: for two sharings $[x], [y]$, the DN protocol may output a sharing $[z]$ with $z = x \cdot y + d$, where $d$ is an additive error. This observation also holds for the GRR protocol [117] and for the multiplication protocol based on replicated secret sharing [87]. Lindell and Nof adopt Beaver triples and the random-linear-combination approach to check the correctness of multiplication gates, which introduces a relatively large overhead compared to the semi-honest protocol. Subsequently, Chida et al. [138] proposed a different approach to verifying multiplication gates, in which the semi-honest multiplication protocol is executed twice and the parties then check the correctness of each multiplication gate using the resulting related multiplication triple. Their MPC protocol still doubles the communication cost of the semi-honest DN protocol. Concurrently, Nordholt and Veeningen [140] also achieved a factor-of-two overhead. The studies [101,138] mainly consider large fields, and handle small fields by repeating the verification procedure, which introduces a large overhead. In the three-party setting, Furukawa et al. [173] and Araki et al. [174] converted the semi-honest protocol for Boolean circuits [87] into maliciously secure protocols using the cut-and-choose approach, which introduces an overhead of $O(\rho/\log N)$, where $N$ is the number of multiplication gates. This overhead is smaller than that of naive repetition, but is not optimal.
The MPC protocols described above tolerate a corruption threshold $t < n/2$, where $n$ is the number of parties. If only $t < n/3$ is required, an MPC protocol with malicious security can be constructed at essentially the same cost as the best-known semi-honest protocol [139], i.e., the overhead for achieving malicious security is 1, which is optimal. Their approach is as follows: 1) given two Shamir sharings $[x]_t$ and $[y]_t$, the parties locally compute a Shamir sharing $[z]_{2t}$ with $z = x \cdot y$; 2) when $t < n/3$, the opening of $[z]_{2t}$ is guaranteed to be correct; 3) $[z]_{2t}$ can then be used to check the correctness of $[z]_t$ obtained by running the semi-honest DN protocol. If $t < n/2$, there are two recent techniques that achieve malicious security with an overhead of 1 over the best-known semi-honest protocol.
Specifically, distributed zero-knowledge proofs with sublinear communication [124] can be used to verify the correctness of multiplication gates. First, Boneh et al. [124] used such zero-knowledge proofs and a variant of the DN protocol with replicated secret sharing to construct a maliciously secure MPC protocol for a constant number of parties. For the first time, their approach achieves an amortized communication cost of 1 bit per AND gate per party for a 3PC protocol over Boolean circuits. Their verification protocol requires $O(n\sqrt{N} + n)$ field elements of communication per party and constant rounds using the Fiat-Shamir heuristic, where $n$ is the number of parties. In the three-party setting, the concrete efficiency of the 3PC protocol by Boneh et al. [124] was significantly improved in [175], which currently achieves the best efficiency. Recently, Boyle et al. [125] used distributed zero-knowledge proofs in a new way, and constructed an MPC protocol with an optimal overhead over the best-known semi-honest protocol for an arbitrary number of parties. Their approach relies on a new insight: for any secret sharing of a value $x$, the shares of $x$ can simultaneously be viewed as a sharing of each secret share $x_i$ itself. Their verification protocol [125] for checking multiplication triples requires $O(n\log N + n)$ field elements of communication per party and constant rounds.
Building on the distributed zero-knowledge proof technique [124], Goyal et al. [119,137] proposed another verification technique for multiplication gates that achieves malicious security with an overhead of 1; its verification protocol requires $O(\log N)$ rounds and $O(n\log N + n)$ field elements of communication per party.
They use the key observation that the semi-honest DN protocol can compute the inner product of two vectors at the same communication cost as a single multiplication [138], and adopt a recursive idea to verify multiplication gates. Concretely, their verification technique works as follows:
(1) Given $N$ multiplication tuples $([x_i]_t, [y_i]_t, [z_i]_t)_{i \in [1,N]}$ to be verified, where these tuples are computed using the semi-honest DN protocol, parties $P_1, \ldots, P_n$ use a random-linear-combination approach with a uniformly random $r$ to combine them into a single inner-product tuple of dimension $N$, whose correctness implies, except with small probability, the correctness of all $N$ tuples.
(2) The parties split an inner-product tuple of dimension $m$ into $k$ inner-product tuples of dimension $m/k$. This can be done in a batch using the batch-wise multiplication verification technique [140,176], which compresses the verification of the $k$ inner-product tuples into one check of a single inner-product tuple of dimension $m/k$.
(3) The parties repeat the second step $\log_k N$ times so that only a single multiplication triple needs to be checked. This final check can be performed using the randomization-and-opening approach.
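The recursive verification can be illustrated on values in the clear. This is a minimal sketch under several assumptions: step (1) folds the tuples with powers of $r$ (a standard random-linear-combination choice that may differ from the exact vectors in [137]); step (2) uses $k = 2$ with the polynomial-interpolation style of compression common in batch-wise multiplication verification; and the sub inner products that the parties would compute with the semi-honest DN inner-product protocol on shares are computed in the clear. The field $\mathbb{F}_p$ with $p = 2^{61} - 1$ and all names are hypothetical.

```python
import secrets

P = 2**61 - 1  # large prime field (assumed)


def inner(u, v):
    return sum(a * b for a, b in zip(u, v)) % P


def combine(tuples, r):
    """Step (1): fold tuples (x_i, y_i, z_i) into one inner-product tuple using powers of r."""
    xs = [pow(r, i + 1, P) * x % P for i, (x, _, _) in enumerate(tuples)]
    ys = [y % P for _, y, _ in tuples]
    z = sum(pow(r, i + 1, P) * zi for i, (_, _, zi) in enumerate(tuples)) % P
    return xs, ys, z


def compress(xs, ys, z):
    """Step (2) with k = 2: halve an inner-product tuple claimed to satisfy z = <xs, ys>."""
    h = len(xs) // 2
    x0, x1, y0, y1 = xs[:h], xs[h:], ys[:h], ys[h:]
    z0 = inner(x0, y0)     # in MPC: semi-honest DN inner product
    z1 = (z - z0) % P      # implied by the claimed total z
    # f(X), g(X) are degree-1 vector polynomials with f(0)=x0, f(1)=x1 (same for g).
    f2 = [(2 * b - a) % P for a, b in zip(x0, x1)]
    g2 = [(2 * b - a) % P for a, b in zip(y0, y1)]
    z2 = inner(f2, g2)     # h(2) = <f(2), g(2)>, again via the inner-product protocol
    s = secrets.randbelow(P)  # public random challenge
    fs = [(a + s * (b - a)) % P for a, b in zip(x0, x1)]
    gs = [(a + s * (b - a)) % P for a, b in zip(y0, y1)]
    inv2 = pow(2, P - 2, P)
    # Lagrange interpolation of the degree-2 polynomial h through (0,z0), (1,z1), (2,z2), at s.
    hs = (z0 * (s - 1) * (s - 2) * inv2 - z1 * s * (s - 2) + z2 * s * (s - 1) * inv2) % P
    return fs, gs, hs


def verify(tuples):
    """Accept correct tuples; a wrong tuple slips through with prob. at most ~(N + 2*log2 N)/P."""
    xs, ys, z = combine(tuples, secrets.randbelow(P))
    while len(xs) & (len(xs) - 1):  # pad to a power of two
        xs.append(0)
        ys.append(0)
    while len(xs) > 1:
        xs, ys, z = compress(xs, ys, z)
    return xs[0] * ys[0] % P == z  # final check (randomize-and-open in MPC)
```

Each call to `compress` halves the dimension, so $\log_2 N$ rounds of compression leave a single multiplication triple, mirroring step (3) with $k = 2$.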
The verification technique of Goyal et al. [119,137] was originally described for Shamir secret sharing, and also works for replicated secret sharing, since the state-of-the-art semi-honest multiplication protocol [87,101] has the same communication cost for computing the inner product of two vectors.
Both state-of-the-art verification techniques [125,137] are based on the technique underlying distributed zero-knowledge proofs [124]. Both achieve the same communication complexity but differ in round complexity: the technique of [119] requires $O(\log N)$ rounds, whereas the technique of [125] has constant rounds. In terms of concrete communication efficiency, the verification protocol of Goyal et al. [137] is slightly better than that of Boyle et al. [125].
In Table 1, we compare the communication cost of several known honest-majority MPC protocols based on Shamir secret sharing for evaluating a single arithmetic circuit; the left part compares the communication cost of semi-honest MPC protocols, and the right part compares the communication overhead of maliciously secure MPC protocols over semi-honest ones. For a large number of parties, the honest-majority MPC protocols described above adopt Shamir secret sharing as the underlying LSSS, and thus require $O(n \log n)$ bits of communication per multiplication gate when evaluating a single Boolean circuit, because Shamir secret sharing requires the field $\mathbb{F}$ to be larger than the number $n$ of parties. Recently, based on RMFE [170], Polychroniadou and Song [177] combined Shamir secret sharing with additive secret sharing to reduce the communication complexity to $O(n)$ bits per multiplication gate.
Two recent studies [178,179] designed large-scale MPC protocols that scale in practice to hundreds of thousands of parties. Such protocols are interesting for applications in which a large number of parties participate in the protocol execution. For example, in privacy-preserving federated learning, thousands of low-resource devices may need to train an ML model on their collective data. Moreover, as the number of parties grows, the honest-majority assumption becomes more plausible. While the honest-majority MPC protocols [101,116,118,119,125,137-139] have total communication complexity $O(n|C|)$, both concretely efficient protocols [178,179] adopt packed secret sharing [180] to obtain total communication complexity $O(|C|)$, where $n$ is the number of parties and $|C|$ is the size of the circuit to be computed. The work of Gordon et al. [179] has total computation complexity $O(\log n \cdot |C|)$ for any polynomial-sized circuit, whereas the work of Beck et al. [178] obtains total computation complexity $O(|C|)$ for highly repetitive circuits (e.g., ML training algorithms). Packed secret sharing is an important tool that has been used to obtain total communication complexity $\tilde{O}(|C|)$ for SIMD circuits¹ [181-183]. It generalizes Shamir secret sharing and is defined as follows:
• Let $n$ be the number of parties and $k$ be the number of secrets packed in a single sharing.
• Let $\alpha_1, \ldots, \alpha_n, \beta_1, \ldots, \beta_k$ be $n + k$ distinct non-zero elements of a field $\mathbb{F}$.
• A degree-$d$ packed Shamir sharing of $\mathbf{x} = (x_1, \ldots, x_k) \in \mathbb{F}^k$, where $d \ge k - 1$, is a vector $(w_1, \ldots, w_n) \in \mathbb{F}^n$ for which there exists a polynomial $f(\cdot)$ of degree at most $d$ such that $f(\beta_j) = x_j$ for all $j \in [1, k]$ and $f(\alpha_i) = w_i$ for all $i \in [1, n]$; party $P_i$ receives $w_i$.

Recently, Goyal et al. [184] constructed a large-scale MPC protocol that achieves total communication complexity $O(|C|)$ for a single circuit evaluation, using Hall's Marriage Theorem. In the malicious setting, the protocol of Goyal et al. [184] achieves an overhead of 1 over the semi-honest protocol using the verification technique [119], while the overhead of the other recent protocols [178,179] is more than a factor of two. Since no implementation is provided, the concrete efficiency of their protocol is unclear. All the recent large-scale MPC protocols [178,179,184] require the number of corrupted parties to satisfy $t \le n(1/2 - \epsilon)$ for some $0 < \epsilon < 1/2$. Besides, Gordon et al. [179] suggested using the SPDZ technique and a committee of $t + 1$ parties to design the online protocol, which reduces the online communication cost because only $t + 1$ parties instead of $n$ run the online protocol. Concurrently, Escudero and Dalskov [185] improved the online communication cost of honest-majority MPC protocols using the Turbospeedz technique and the committee idea, obtaining the minimal online communication (i.e., one field element per multiplication gate per party).
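To make packed secret sharing concrete, the sketch below packs $k$ secrets into one degree-$d$ sharing and reconstructs them from $d + 1$ shares. It is a minimal sketch under stated assumptions: the field is $\mathbb{F}_p$ with $p = 2^{61} - 1$, the caller picks the points $\alpha_i$ and $\beta_j$, the polynomial is randomized by pinning it at $d + 1 - k$ extra points $p - 1, p - 2, \ldots$ (assumed distinct from all $\alpha_i, \beta_j$), and the function names are hypothetical.

```python
import secrets

P = 2**61 - 1  # prime field F_p (assumed)


def interpolate(points, x):
    """Evaluate at x the unique polynomial of degree < len(points) through `points`."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total


def packed_share(values, d, alphas, betas):
    """Degree-d packed Shamir sharing: f(beta_j) = values[j]; party i receives f(alpha_i)."""
    k = len(values)
    if d < k - 1:
        raise ValueError("need d >= k - 1")
    # Pin f at the k secret points and at d + 1 - k extra random points (P-1, P-2, ...),
    # which fixes a random polynomial of degree <= d.
    extra = [(P - 1 - i, secrets.randbelow(P)) for i in range(d + 1 - k)]
    points = list(zip(betas, values)) + extra
    return [interpolate(points, a) for a in alphas]


def packed_reconstruct(shares, d, alphas, betas):
    """Recover all k secrets from the first d + 1 shares."""
    points = list(zip(alphas, shares))[: d + 1]
    return [interpolate(points, b) for b in betas]
```

Multiplying two degree-$d$ packed sharings share-wise yields a degree-$2d$ packed sharing of the component-wise product, so with $n = 10$, $k = 3$ and $d = 4$ one local multiplication handles three products at once (reconstruction then needs $2d + 1 = 9$ shares), which is the SIMD effect exploited above.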

Constant-round MPC based on garbled circuits
At present, known concretely efficient constant-round MPC protocols are based on garbled circuits, which are encrypted versions of circuits that can be evaluated only once. We first consider semi-honest protocols, and then show how they are compiled into maliciously secure MPC protocols.

Secure two-party computation
The first constant-round secure two-party computation (2PC) protocol was proposed by Yao [6] and achieves semi-honest security. Yao's 2PC protocol uses garbled circuits (GCs) and OT as building blocks. Specifically, using a garbling scheme, a garbler $P_A$ generates a garbled circuit GC, encoding information $e$, and decoding information $d$.
An evaluator $P_B$ can then evaluate GC using $e$ and obtain the output bits according to $d$. The garbling scheme enables $P_B$ to obtain the function output without revealing any other information about the inputs. We refer the reader to [186,187] for the formal definition of garbling schemes. Yao's 2PC protocol with semi-honest security is roughly described in Figure 3.
The 2PC protocol can be further optimized using the OT precomputation idea [188], where a random oblivious transfer (ROT) protocol is run in the preprocessing phase and then transformed into a standard OT with chosen choice bits in the online phase. In addition, a GC can be sent in a pipelined way [189] (i.e., the garbled rows for a batch of gates are computed and sent, and then the same is done for the next batch of gates), which allows GC implementations to scale to an unlimited number of gates with nearly constant memory. Subsequent studies have optimized Yao's 2PC protocol in two respects: improving the construction of GCs and designing more efficient OT extension protocols. Below, we describe the development of GCs, and postpone that of OT extension to Section 5.
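The transformation from a precomputed random OT to an OT with a chosen choice bit can be sketched as follows. This is a minimal sketch of the standard derandomization step, assuming 16-byte messages and hypothetical function names; the random OT itself is simply sampled in one process here rather than produced by an OT protocol.

```python
import secrets


def rot_preprocess(nbytes=16):
    """Random OT: the sender gets (r0, r1); the receiver gets a random bit c and r_c."""
    r0, r1 = secrets.token_bytes(nbytes), secrets.token_bytes(nbytes)
    c = secrets.randbits(1)
    return (r0, r1), (c, (r0, r1)[c])


def xor(a, b):
    return bytes(u ^ v for u, v in zip(a, b))


def online_ot(m0, m1, b, sender_state, receiver_state):
    """Online phase: the receiver with choice bit b learns m_b and nothing about m_{1-b}."""
    (r0, r1), (c, rc) = sender_state, receiver_state
    d = b ^ c                                   # receiver -> sender: one bit
    pads = (r0, r1) if d == 0 else (r1, r0)     # pad for m_j is r_{j xor d}
    y0, y1 = xor(m0, pads[0]), xor(m1, pads[1])  # sender -> receiver
    return xor((y0, y1)[b], rc)                 # y_b = m_b xor r_c
```

The online phase uses only XORs, one bit from the receiver, and two padded messages from the sender; all public-key work is moved into the preprocessing phase.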
Garbled circuits. The first GC construction was introduced by Yao [6] in 1986, but its formal description and security proof were not presented until 2004, by Lindell and Pinkas [190]. The original Yao GC construction requires $8\kappa$ bits per gate. The communication cost can be reduced to $4\kappa$ bits per gate using the "point-and-permute" technique [1,191], where the actual bit $z_w$ on each wire $w$ is masked by a random bit (a.k.a. permute bit) $\lambda_w$, and the resulting public value $\Lambda_w = z_w \oplus \lambda_w \in \{0, 1\}$ can be revealed to the evaluator. In this case, the decoding information $d$ can be defined as the wire masks $\lambda_w$ for all circuit-output wires $w$. The garbled row reduction (GRR) technique [192] further reduces the communication to $3\kappa$ bits per gate by choosing the output-wire labels such that one garbled row is always zero and need not be sent. Later, Pinkas et al. [193] used a polynomial-interpolation approach to reduce the communication to $2\kappa$ bits per gate; this technique is called 4-to-2 GRR, in contrast to the 4-to-3 GRR technique [192]. In 2008, Kolesnikov and Schneider [194] proposed the free-XOR technique, which enables XOR gates to be garbled with no communication. In particular, the garbler sets $L_{w,1} = L_{w,0} \oplus \Delta$ for each wire $w$, where $\Delta$ is a fixed offset (a.k.a. global key). For each XOR gate $(\alpha, \beta, \gamma, \oplus)$, the garbler also sets $L_{\gamma,0} = L_{\alpha,0} \oplus L_{\beta,0}$ and $\lambda_\gamma = \lambda_\alpha \oplus \lambda_\beta$. Given the public-value and label pairs $(\Lambda_\alpha, L_{\alpha,\Lambda_\alpha})$ and $(\Lambda_\beta, L_{\beta,\Lambda_\beta})$ on the input wires, the evaluator can locally compute the public value $\Lambda_\gamma = \Lambda_\alpha \oplus \Lambda_\beta$ and the label $L_{\gamma,\Lambda_\gamma} = L_{\alpha,\Lambda_\alpha} \oplus L_{\beta,\Lambda_\beta}$ on the output wire $\gamma$. While the 4-to-2 GRR technique [193] is not compatible with free XOR, the earlier 4-to-3 GRR technique [192] is. Afterward, Zahur, Rosulek, and Evans [195] reduced the GC size to $2\kappa$ bits per AND gate while keeping XOR gates free.
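The main idea behind their construction is to break an AND gate into two half gates, in each of which one input is known to one of the parties (e.g., the evaluator knows a public value). We review the half-gate construction as follows:
• Construction of GCs: The garbler computes two garbled rows for an AND gate $(\alpha, \beta, \gamma, \wedge)$ as
$$G_{\gamma,0} = H(L_{\alpha,0}) \oplus H(L_{\alpha,1}) \oplus \lambda_\beta \Delta, \qquad G_{\gamma,1} = H(L_{\beta,0}) \oplus H(L_{\beta,1}) \oplus L_{\alpha,0} \oplus \lambda_\alpha \Delta.$$
The garbler also computes the 0-label on the output wire $\gamma$ as
$$L_{\gamma,0} = H(L_{\alpha,0}) \oplus H(L_{\beta,0}) \oplus (\lambda_\alpha \lambda_\beta \oplus \lambda_\gamma)\Delta.$$
• Evaluation of circuits: For any AND gate $(\alpha, \beta, \gamma, \wedge)$, given $(\Lambda_\alpha, L_{\alpha,\Lambda_\alpha})$ and $(\Lambda_\beta, L_{\beta,\Lambda_\beta})$, the evaluator computes the label on the output wire $\gamma$ as
$$L_{\gamma,\Lambda_\gamma} = H(L_{\alpha,\Lambda_\alpha}) \oplus H(L_{\beta,\Lambda_\beta}) \oplus \Lambda_\alpha G_{\gamma,0} \oplus \Lambda_\beta (G_{\gamma,1} \oplus L_{\alpha,\Lambda_\alpha}).$$
If the garbler sets $\mathsf{lsb}(\Delta) = 1$, then $\mathsf{lsb}(L_{w,\Lambda_w}) = \mathsf{lsb}(L_{w,0} \oplus \Lambda_w \cdot \Delta) = \mathsf{lsb}(L_{w,0}) \oplus \Lambda_w$ for every wire $w$. Thus, the garbler can send $\mathsf{lsb}(L_{w,0})$ for the output wire $w$ of each AND gate to the evaluator, who then computes $\Lambda_\gamma = \mathsf{lsb}(L_{\gamma,\Lambda_\gamma}) \oplus \mathsf{lsb}(L_{\gamma,0})$. In fact, sending the bit $\mathsf{lsb}(L_{w,0})$ for each AND gate can be avoided if, for every wire $w$, the label $L_{w,z_w}$ is defined to correspond to the actual bit $z_w$ instead of the public value $\Lambda_w$, and $\lambda_w = \mathsf{lsb}(L_{w,0})$ is set; then $\Lambda_w = \mathsf{lsb}(L_{w,z_w}) = \mathsf{lsb}(L_{w,0} \oplus z_w \cdot \Delta) = \lambda_w \oplus z_w$. For every circuit-output wire $w$, the garbler can send the wire mask $\lambda_w \in \{0, 1\}$ to the evaluator, who computes the output bit $z_w := \Lambda_w \oplus \lambda_w$.

[Table 2, partial: [199] | GC size per AND / XOR gate: $1.5\kappa$ / 0 bits | garbler calls per AND / XOR gate: $\le 6$ / 0 | evaluator calls per AND / XOR gate: $\le 3$ / 0 | assumption: circular CRHF. Notes: a) For GC size, a small constant additive term (i.e., 5 bits) is ignored for [199]. b) PRF denotes a pseudo-random function.]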
For security, $H$ need not be modeled as a random oracle; it suffices that $H$ is a circular correlation-robust hash function (circular CRHF) [196]. In this case, a random permutation such as fixed-key AES can be used to implement CRHFs [197,198].
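The half-gate garbling and evaluation above, together with free XOR, can be checked on a single gate. This is a minimal sketch under stated assumptions: $\kappa = 128$, labels indexed by public value, SHA-256 truncated to $\kappa$ bits (with a per-half-gate tweak derived from a gate index) standing in for the circular CRHF $H$ that practical implementations build from fixed-key AES, and hypothetical function names.

```python
import hashlib

KAPPA = 128  # label length in bits (assumed)


def H(label, tweak):
    """Stand-in for the hash H: SHA-256 of (label, tweak), truncated to KAPPA bits."""
    data = label.to_bytes(KAPPA // 8, "big") + tweak.to_bytes(8, "big")
    return int.from_bytes(hashlib.sha256(data).digest()[: KAPPA // 8], "big")


def garble_and(La0, Lb0, lam_a, lam_b, lam_g, delta, gid):
    """Garbler: two garbled rows and the output 0-label (labels indexed by public value)."""
    La1, Lb1 = La0 ^ delta, Lb0 ^ delta
    ta, tb = 2 * gid, 2 * gid + 1  # distinct tweaks for the two half gates
    G0 = H(La0, ta) ^ H(La1, ta) ^ (delta if lam_b else 0)
    G1 = H(Lb0, tb) ^ H(Lb1, tb) ^ La0 ^ (delta if lam_a else 0)
    Lg0 = H(La0, ta) ^ H(Lb0, tb) ^ (delta if (lam_a & lam_b) ^ lam_g else 0)
    return (G0, G1), Lg0


def eval_and(Lam_a, La, Lam_b, Lb, rows, gid):
    """Evaluator: label L_{gamma, Lambda_gamma} from (Lam_a, La) and (Lam_b, Lb)."""
    G0, G1 = rows
    out = H(La, 2 * gid) ^ H(Lb, 2 * gid + 1)
    if Lam_a:
        out ^= G0
    if Lam_b:
        out ^= G1 ^ La
    return out


def eval_xor(Lam_a, La, Lam_b, Lb):
    """Free XOR: no garbled rows; public values and labels are simply XORed."""
    return Lam_a ^ Lam_b, La ^ Lb
```

With $\mathsf{lsb}(\Delta) = 1$ (e.g., `delta = secrets.randbits(128) | 1`), the evaluator's output label $L$ satisfies $\mathsf{lsb}(L) \oplus \mathsf{lsb}(L_{\gamma,0}) = \Lambda_\gamma$ for all four input combinations, and only the two rows $(G_{\gamma,0}, G_{\gamma,1})$, i.e., $2\kappa$ bits, are sent per AND gate.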
With hardware instruction support, the computational efficiency of GCs can be significantly improved [197,198]. As a result, the size of garbled circuits becomes the efficiency bottleneck of GCs.
Zahur et al. [195] proved a lower bound of $2\kappa$ bits per AND gate in a model of linear garbling, which treats each label as a whole. Recently, Rosulek and Roy [199] circumvented the half-gates lower bound by introducing a new technique called slicing and dicing, while remaining fully compatible with the free-XOR technique. In particular, they reduced the GC size to $1.5\kappa + 5$ bits per AND gate, with slightly more computation than half gates. Technically, they slice the garbled labels into two halves, and introduce more linear combinations to increase the linear-algebraic dimension in which the garbling scheme operates. They also add random control bits to the construction of GCs; these control bits determine the linear combinations of labels and garbled ciphertexts, and lie outside the linear garbling model. However, the state-of-the-art garbling scheme [199] is more complex and involves many linear-algebraic operations. Giving a simple description of their scheme, i.e., describing it as a clean composition of simpler components similar to the half-gate construction, may be a challenging task.
In Table 2, we summarize the communication and computation costs of known efficient garbling schemes, following the comparison table in [199]; the flexible-XOR technique [200] and the fast 4-to-2 GRR technique [201] are also compared. Our survey only considers garbling schemes for Boolean circuits, which allow the smallest garbled circuits to be obtained. We refer the reader to [202-205] for garbling arithmetic circuits.

Secure multi-party computation
In the multi-party setting, constant-round MPC must handle the case in which multiple parties collude to cheat an honest party. Therefore, we cannot let a single party construct the garbled circuit; instead, all parties jointly construct a garbled circuit in a distributed manner. Distributed garbling schemes are used to generate such multi-party garbled circuits. The first distributed garbling scheme was introduced by Beaver, Micali, and Rogaway (BMR) [1] in 1990. Based on distributed garbling, they presented a constant-round MPC protocol in the dishonest-majority setting, but this protocol has very low concrete efficiency. In the semi-honest setting, we focus on the case of all-but-one corruption (i.e., up to $n - 1$ of the $n$ parties may be corrupted). We describe several works that construct more efficient constant-round MPC protocols in the honest-majority malicious setting in Section 4.2.
Surprisingly, in the dishonest-majority setting, the BMR garbling was not optimized with the free-XOR technique until 2016 [206]. Based on the optimized BMR garbled circuits, the authors of [206] proposed an efficient constant-round MPC protocol with semi-honest security. In particular, their improved BMR garbled circuit [206] is defined as follows:
• Every party $P_i$ with $i \in [1, n]$ holds the following secret values: (a) a global key $\Delta_i \in \{0, 1\}^\kappa$; (b) for each wire $w$ in the circuit, a share of a mask bit $\lambda_w \in \{0, 1\}$; (c) for each wire $w$, two garbled labels $L^i_{w,0}, L^i_{w,1} \in \{0, 1\}^\kappa$ such that $L^i_{w,0} \oplus L^i_{w,1} = \Delta_i$.
• For each AND gate $(\alpha, \beta, \gamma, \wedge)$ and each $u, v \in \{0, 1\}$, all parties jointly compute, for every $j \in [1, n]$,
$$G^j_{\gamma,u,v} = \bigoplus_{i=1}^{n} \left( H(L^i_{\alpha,u}, \gamma, j) \oplus H(L^i_{\beta,v}, \gamma, j) \right) \oplus L^j_{\gamma,0} \oplus \left( (\lambda_\alpha \oplus u)(\lambda_\beta \oplus v) \oplus \lambda_\gamma \right) \Delta_j.$$
The multi-party garbled circuit consists of $(G^1_{\gamma,u,v}, \ldots, G^n_{\gamma,u,v})$ for the output wire $\gamma$ of each AND gate and each $u, v \in \{0, 1\}$.
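The BMR-style garbled gate above can be checked with a plaintext simulation in which one process plays all $n$ parties. This is a minimal sketch under stated assumptions: $\kappa = 128$, SHA-256 truncated to $\kappa$ bits as a stand-in for $H(L, \gamma, j)$, mask bits held in the clear instead of secret-shared, and garbled rows computed directly instead of jointly; the function names are hypothetical.

```python
import hashlib

KAPPA = 128  # label length in bits (assumed)


def H(label, gate, j):
    """Stand-in for H(L, gamma, j): SHA-256 truncated to KAPPA bits (an assumption)."""
    data = label.to_bytes(KAPPA // 8, "big") + gate.to_bytes(4, "big") + j.to_bytes(4, "big")
    return int.from_bytes(hashlib.sha256(data).digest()[: KAPPA // 8], "big")


def label(L0, delta, bit):
    """L_{w,bit} = L_{w,0} xor bit * Delta."""
    return L0 ^ (delta if bit else 0)


def garble_and(gate, deltas, La0, Lb0, Lg0, lam_a, lam_b, lam_g):
    """All rows G^j_{gamma,u,v}; computed in the clear here, jointly in the real protocol."""
    n = len(deltas)
    rows = {}
    for u in (0, 1):
        for v in (0, 1):
            bit = ((lam_a ^ u) & (lam_b ^ v)) ^ lam_g
            row = []
            for j in range(n):
                g = label(Lg0[j], deltas[j], bit)
                for i in range(n):
                    g ^= H(label(La0[i], deltas[i], u), gate, j)
                    g ^= H(label(Lb0[i], deltas[i], v), gate, j)
                row.append(g)
            rows[(u, v)] = row
    return rows


def evaluate_and(gate, rows, u, La, v, Lb):
    """Given public values u, v and every party's input labels, recover each L^j_{gamma,Lambda}."""
    out = []
    for j, g in enumerate(rows[(u, v)]):
        for La_i, Lb_i in zip(La, Lb):
            g ^= H(La_i, gate, j) ^ H(Lb_i, gate, j)
        out.append(g)
    return out
```

Since $(\lambda_\alpha \oplus \Lambda_\alpha)(\lambda_\beta \oplus \Lambda_\beta) \oplus \lambda_\gamma = z_\alpha z_\beta \oplus \lambda_\gamma = \Lambda_\gamma$, the evaluator recovers every party's label $L^j_{\gamma,\Lambda_\gamma}$, which illustrates why each AND gate costs $4n$ ciphertexts of $\kappa$ bits.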
While the BMR garbled circuit is symmetric, allowing every party to evaluate the circuit, Wang, Ranellucci and Katz (WRK) [109] proposed an asymmetric distributed garbling that allows only one party (e.g., $P_1$) to evaluate the circuit. The WRK garbled circuit is defined as follows: • Every party $P_i$ holds the same secret values as in the BMR garbled circuit.
Recently, Yang et al. [110] partially applied the half-gate technique to further reduce the size of the WRK garbled circuit from $4n|C|\kappa$ bits to $(4n - 6)|C|\kappa$ bits. The BMR garbled circuit has size $4n|C|\kappa$ bits, which is larger than the WRK garbled circuit. If all parties obtain the output, MPC protocols based on BMR garbled circuits need one or two fewer online rounds than those based on WRK garbled circuits. Constant-round MPC protocols based on distributed garbling achieve optimal online communication cost, so the main task in improving constant-round MPC is to reduce the communication cost of the preprocessing phase while keeping the computation fast. However, the known concretely efficient constant-round MPC protocols have $O(n^2)$ computation complexity in the online phase, which becomes expensive for a large number of parties. In 2017, Ben-Efraim et al. [207] constructed a constant-round MPC protocol in the BMR framework that achieves $O(1)$ computation complexity in the online phase, using key-homomorphic pseudorandom functions that can be constructed under the DDH or LWE assumption. The online phase of their protocol is more efficient than that of the state-of-the-art semi-honest MPC protocol [206], which has $O(n^2)$ computation complexity, when the number of parties is at least 100. However, their approach is not compatible with the free-XOR optimization [194], which introduces a large overhead in the preprocessing phase. Recently, Ben-Efraim et al. [208] used an encryption scheme that is both key-homomorphic and message-homomorphic under the LPN assumption to construct a BMR-like garbled circuit that is compatible with free XOR. Their LPN-based technique obtains faster online computation when $n \ge 100$, but requires a rather expensive preprocessing phase. If the number of honest parties is relaxed to $h = n/c$ for some constant $1 < c < n$, the preprocessing phase can be accelerated significantly using the techniques in [106,109].

Maliciously secure protocols
In the malicious setting, we first consider the two-party case, and then discuss the multi-party case in the dishonest-majority and honest-majority settings, respectively.

Secure two-party computation
Before 2017, one popular approach for designing maliciously secure constant-round 2PC protocols was the "cut-and-choose" (C&C) technique, which comes in two flavors. The first is the circuit-level C&C approach, introduced by Lindell and Pinkas [209] and optimized in [210][211][212][213][214][215][216][217][218][219][220][221][222][223][224], where many garbled circuits are prepared, a random subset of them is opened and verified, and the remaining unchecked circuits are evaluated. In the single-execution setting, where a circuit is evaluated once on one input, $\rho$ garbled circuits need to be prepared to achieve statistical security $2^{-\rho}$, and the most efficient 2PC protocol in this setting is by Wang et al. [224]. In the amortized setting, where the same circuit is evaluated multiple times on different inputs, only $O(\rho/\log\tau)$ garbled circuits per execution need to be prepared when amortizing over $\tau$ executions, and the best-known 2PC protocol in this setting is by Rindal and Rosulek [221]. The second is the gate-level C&C approach, introduced by Nielsen and Orlandi [225] under the name LEGO, where many individual garbled AND gates are prepared, a random subset of them is opened and verified, and the remaining unchecked garbled gates are soldered into a fault-tolerant garbled circuit using XOR-homomorphic commitments. The LEGO protocol was subsequently optimized in [226][227][228][229][230][231]. Compared to the circuit-level approach, the gate-level C&C approach has a lower asymptotic complexity $O(\rho/\log|C|)$ and supports function-independent preprocessing, in which neither the circuit nor the input is known (the circuit-level approach does not support such preprocessing). However, it is less efficient in the amortized setting, and is also less efficient for some functions in the single-execution setting.
In 2017, the milestone work by Wang, Ranellucci and Katz [108] proposed the authenticated garbling approach to construct highly efficient 2PC protocols, in which a single "authenticated" garbled circuit is constructed and transmitted. Their approach works in the following framework: (1) Function-independent preprocessing phase: In this phase, in which the circuit is not yet known, the parties generate authenticated shares using a TinyOT-like protocol. (2) Function-dependent preprocessing phase: In this phase, in which the circuit is known, the parties generate an authenticated garbled circuit in a distributed way. The key observation is that the same global key can be used for both the garbled circuit and the authenticated shares, so the MAC tags and local keys of the authenticated shares naturally serve as the shares used to construct the garbled circuit. (3) Online phase: Party $P_1$ evaluates the circuit and obtains the output.
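The observation that keys and MAC tags can be reused as garbling shares can be illustrated with a small Python sketch. It assumes an idealized dealer in place of the TinyOT-like protocol, treats $\kappa$-bit strings as integers combined with XOR, and only shows how $P_A$ and $P_B$ locally obtain XOR-shares of $(a \oplus b)\cdot\Delta$, the kind of masked term that appears in garbled rows; it is not the full WRK garbling, and the function names are hypothetical.

```python
from __future__ import annotations

import secrets

KAPPA = 128  # bits


def authenticated_bit(b: int, delta: int) -> tuple[int, int]:
    """Idealized authenticated bit of P_B under P_A's global key Delta.

    P_A keeps the local key K; P_B keeps the MAC tag M = K xor b*Delta."""
    K = secrets.randbits(KAPPA)
    return K, K ^ (delta if b else 0)


def shares_of_masked_delta(a: int, b: int, delta: int) -> tuple[int, int]:
    """XOR-shares of (a xor b)*Delta for P_A's bit a and P_B's authenticated bit b."""
    K, M = authenticated_bit(b, delta)
    share_A = (delta if a else 0) ^ K  # computed locally by P_A
    share_B = M                        # P_B's MAC tag, used as is
    return share_A, share_B


if __name__ == "__main__":
    delta = secrets.randbits(KAPPA)  # P_A's global key, reused as its free-XOR offset
    for a in (0, 1):
        for b in (0, 1):
            sA, sB = shares_of_masked_delta(a, b, delta)
            assert sA ^ sB == (delta if a ^ b else 0)
    print("key and MAC tag form XOR-shares of (a xor b) * Delta")
```

No extra communication is needed to obtain these shares, which is the point of using one global key for both purposes.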
Wang et al. [108] proposed and adopted the WRK garbled circuit, so only party $P_1$ can evaluate the circuit. Concurrently, Hazay, Scholl, and Soria-Vazquez [106] proposed a similar approach based on the BMR garbled circuit. Later, Katz et al. [107] significantly optimized the 2PC protocol by 1) applying the half-gate technique to distributed garbling, and 2) improving the communication and computation of the TinyOT-like protocol. The 2PC protocol [107] generates a garbled circuit with $2\kappa + 1$ bits per AND gate in the preprocessing phase, and performs circuit authentication separately, in a batch, in the online phase. Currently, the state-of-the-art approach for maliciously secure 2PC is distributed garbling [106][107][108], which significantly outperforms both C&C approaches. An interesting direction for future work is to further reduce the size of the distributed garbled circuits of Katz et al. [107].

Secure multi-party computation
Dishonest majority. For constant-round MPC protocols tolerating all-but-one malicious corruptions, several studies [232][233][234] adopt the cut-and-choose approach or combine BMR with SPDZ. However, their concrete efficiency is low. In this setting, the best-known MPC protocols [106,[109][110][111] follow the distributed garbling framework [106,108] based on TinyOT-like protocols. These MPC protocols have the same structure as the 2PC protocols [107,108], but need to execute a consistency-check procedure to ensure that shares or global keys are consistent across multiple executions. Recently, Poddar et al. [235] applied the maliciously secure constant-round MPC protocol [109] to build a system called Senate, which allows $n$ parties to collaboratively run analytical SQL queries while keeping their individual data private. The state-of-the-art constant-round MPC protocol with malicious security, proposed by Yang et al. [110], could further improve the performance of such applications. While the half-gate optimization is fully applied to distributed garbling in the two-party setting [107], it is only partially applied in the multi-party setting [110]. Fully applying the half-gate technique (or even the recent slicing-and-dicing technique [199]) to multi-party garbled circuits remains a challenge.
Honest majority. In the honest-majority setting, constant-round MPC protocols can be constructed based on replicated secret sharing with less communication and computation (e.g., [236][237][238][239]). In the three-party setting with at most one malicious corruption, Mohassel et al. [239] proposed the currently most efficient 3PC protocol, which has three rounds and constructs a single Yao-style garbled circuit; this maliciously secure 3PC protocol has essentially the same cost as the semi-honest Yao's 2PC protocol. Concurrently, Ishai et al. [238] constructed a two-round 3PC protocol, which, however, requires sending three garbled circuits. In the four-party setting with at most one malicious corruption, the state-of-the-art protocol, by Byali et al. [236], has five rounds of communication, sends a single Yao-style garbled circuit, and achieves the stronger security property of GOD. In the five-party setting with at most two malicious corruptions, Chandran et al. [237] presented the best-known MPC protocol, with 8 rounds of communication. They adopted the BMR garbled circuit to prevent collusion, and proposed an attested OT primitive so that the whole MPC protocol relies only on symmetric-key primitives, without the need for OT protocols. In terms of communication cost, their maliciously secure protocol requires 60% less communication than the semi-honest MPC protocol with dishonest majority [206], and its semi-honest variant needs $8\times$ less communication. Their construction [237] can also be extended to $n$ parties with corruption threshold $t \le \sqrt{n}$. Later, building on [237], secure five-party computation (5PC) with fairness or GOD was constructed in [240] with a small overhead over the 5PC protocol [237] that satisfies security with abort.

Oblivious transfer and oblivious linear-function evaluation
In this section, we describe recent developments and techniques for oblivious transfer (OT) and its important variants (i.e., random OT and correlated OT). Furthermore, we present the arithmetic generalization of OT, called oblivious linear-function evaluation (OLE), and its key variant, vector OLE. While OT is mainly used in MPC protocols for Boolean circuits, OLE is mainly used in MPC protocols for arithmetic circuits. In this survey, we mainly review the state-of-the-art techniques for constructing (correlated) OT, and give a concise overview of the techniques for designing (vector) OLE; this emphasis does not mean that OLE is less important than OT. Additionally, with the state-of-the-art techniques based on learning parity with noise (LPN), vector OLE can be designed in the same framework as correlated OT. We note that homomorphic encryption (HE) is a key technique for generating (vector) OLE correlations, although it is not described in detail in this section. Recent techniques based on LPN variants achieve sublinear communication complexity, compared to the linear communication complexity of HE-based approaches.

Oblivious transfer
Oblivious transfer (OT) [95,96] is a fundamental cryptographic primitive between a sender S and a receiver R, which enables R to obtain only one of the two input messages of S, while S learns nothing about R's choice bit. It can be used to construct not only Yao's 2PC protocol but also many other MPC protocols with both semi-honest and malicious security, as well as many other kinds of cryptographic protocols. OT protocols can be constructed from different cryptographic assumptions, including decisional Diffie-Hellman (DDH) [241][242][243][244][245], computational Diffie-Hellman (CDH) [244,[246][247][248], learning with errors (LWE) [242,245,[249][250][251], learning parity with noise (LPN) [247] and commutative supersingular isogeny Diffie-Hellman (CSIDH) [252]. However, when a large number of OT correlations need to be generated (particularly for MPC applications), these OT protocols based on public-key operations are very expensive. To deal with this problem, Beaver [253] introduced the notion of OT extension, in which a small number of base OTs are efficiently extended to a large number of OTs (even any polynomial number of OTs) using fast operations. The first OT extension protocol by Beaver [253] uses the pseudorandom generator (PRG) in a non-black-box way, and thus is only of theoretical interest. Currently, concretely efficient OT extension protocols come in two styles: one based on the IKNP framework [130] and the other on the PCG framework [162,254]. While IKNP-style protocols use the symmetric-key primitive PRG to perform the extension and support chosen choice bits, PCG-style protocols exploit the sparsity of the noise in the LPN problem [255] to realize the extension and only allow random choice bits.
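OT extension of both styles adopts the following structure, from weak OTs to standard OTs:
$$\mathrm{COT} \Rightarrow \mathrm{ROT} \Rightarrow \mathrm{OT},$$
where COT requires the two messages $(m_0, m_1)$ of the sender to satisfy a fixed correlation (i.e., $m_0 \oplus m_1 = \Delta$ for a fixed string $\Delta$), ROT only outputs two uniformly random messages, and OT allows the sender to input two arbitrary messages. Both transformations (i.e., COT ⇒ ROT and ROT ⇒ OT) are standard and recalled as follows:
• COT ⇒ ROT: Given a COT correlation $((K_i, K_i \oplus \Delta), (b, M_i))$, where $K_i, \Delta \in \{0,1\}^\kappa$ are two random strings held by the sender, and $b \in \{0,1\}$ and $M_i = K_i \oplus b \cdot \Delta \in \{0,1\}^\kappa$ are held by the receiver, a ROT correlation $((r_0, r_1), (b, r_b))$ can be computed without any interaction as
$$r_0 = H(i, K_i), \quad r_1 = H(i, K_i \oplus \Delta), \quad r_b = H(i, M_i),$$
where $H$ is a correlation-robust hash function and $i$ is an index associated with the COT correlation. While the index $i$ can be omitted in the semi-honest setting, it is necessary for malicious security [198].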
• ROT ⇒ OT: Given a ROT correlation $((r_0, r_1), (b, r_b))$, where the sender obtains two random strings $r_0, r_1$ and the receiver gets a choice bit $b$ and the string $r_b$, a standard OT correlation $((m_0, m_1), (b, m_b))$ can be constructed using the one-time pad as follows: the sender sends $\tau_0 = m_0 \oplus r_0$ and $\tau_1 = m_1 \oplus r_1$ to the receiver, who computes $m_b = \tau_b \oplus r_b$.
Therefore, we can focus on designing concretely efficient COT protocols, and then transform them into standard OT protocols. In addition, COT protocols can be used to generate BDOZ-style authenticated shares using TinyOT-like protocols [104,[106][107][108][109][110], as well as SPDZ-style authenticated shares using the bit-decomposition idea [43,150]. For garbled circuits with free-XOR, the two garbled labels of every wire satisfy the COT correlation, and thus can be transmitted obliviously from the garbler to the evaluator using a COT protocol; that is, COT can also be used directly in MPC protocols.
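Both transformations can be checked with a minimal Python sketch. It assumes an idealized COT functionality, uses SHA-256 truncated to $\kappa = 128$ bits as a stand-in for the correlation-robust hash $H$, and restricts OT messages to $\kappa$ bits so that the one-time pad applies directly; all function names are hypothetical.

```python
from __future__ import annotations

import hashlib
import secrets

KAPPA_BYTES = 16  # kappa = 128 bits


def xor(x: bytes, y: bytes) -> bytes:
    return bytes(p ^ q for p, q in zip(x, y))


def H(i: int, x: bytes) -> bytes:
    """Index-tweaked hash; truncated SHA-256 stands in for a correlation-robust hash."""
    return hashlib.sha256(i.to_bytes(8, "big") + x).digest()[:KAPPA_BYTES]


def ideal_cot(delta: bytes, b: int) -> tuple[bytes, bytes]:
    """Idealized COT: the sender gets K, the receiver gets M = K xor b*Delta."""
    K = secrets.token_bytes(KAPPA_BYTES)
    return K, (xor(K, delta) if b else K)


def cot_to_rot(i: int, K: bytes, delta: bytes, M: bytes):
    """Non-interactive: the sender computes (r0, r1), the receiver computes r_b."""
    return (H(i, K), H(i, xor(K, delta))), H(i, M)


def rot_to_ot(m0: bytes, m1: bytes, r0: bytes, r1: bytes, b: int, rb: bytes) -> bytes:
    tau0, tau1 = xor(m0, r0), xor(m1, r1)  # sent by the sender
    return xor(tau1 if b else tau0, rb)    # the receiver recovers m_b = tau_b xor r_b


if __name__ == "__main__":
    delta = secrets.token_bytes(KAPPA_BYTES)  # sender's global COT key
    m0, m1 = secrets.token_bytes(KAPPA_BYTES), secrets.token_bytes(KAPPA_BYTES)
    for i, b in enumerate((0, 1)):
        K, M = ideal_cot(delta, b)
        (r0, r1), rb = cot_to_rot(i, K, delta, M)
        assert rot_to_ot(m0, m1, r0, r1, b, rb) == (m1 if b else m0)
    print("COT => ROT => OT delivered m_b for both choice bits")
```

The receiver sees both $\tau_0$ and $\tau_1$, but $r_{1-b}$ looks random to it, which is what hides $m_{1-b}$.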
The semi-honest IKNP protocol [130] (improved in [113]) works roughly as follows: 1) in the setup phase, execute a base-OT protocol (relying on public-key operations) with the roles of sender and receiver switched, to generate $\kappa$ ROT correlations; and then 2) in the extension phase, extend the $\kappa$ ROT correlations to a large number of COT correlations using a PRG and by transposing column vectors into row vectors. The extension phase can be executed iteratively to generate an unlimited number of COT correlations from the same setup phase [113]. Later, IKNP-style OT extension protocols with malicious security were proposed in [135,136]. Using a random-linear-combination check, the maliciously secure protocol by Keller, Orsini, and Scholl [136] achieves the best efficiency in the IKNP framework, with communication cost matching that of the best-known semi-honest protocol [113]. While IKNP-style OT extension protocols enjoy fast computation, they require linear communication ($\kappa$ bits per COT correlation).
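The two phases can be traced in the following minimal Python sketch of semi-honest IKNP extension. It assumes ideal base OTs, uses SHAKE-128 as a stand-in PRG, and represents each column as a Python integer; it outputs COT correlations $q_i = t_i \oplus r_i \cdot s$, where the extension sender's base-OT choice string $s$ plays the role of $\Delta$. The function names are hypothetical.

```python
from __future__ import annotations

import hashlib
import secrets

KAPPA = 128  # number of base OTs


def prg(seed: bytes, nbits: int) -> int:
    """Expand a seed into nbits pseudorandom bits (SHAKE-128 as a stand-in PRG)."""
    out = hashlib.shake_128(seed).digest((nbits + 7) // 8)
    return int.from_bytes(out, "big") >> (-nbits % 8)


def transpose(cols: list[int], m: int) -> list[int]:
    """Turn KAPPA columns of m bits into m rows of KAPPA bits."""
    return [sum(((c >> i) & 1) << j for j, c in enumerate(cols)) for i in range(m)]


def iknp(m: int, r: int):
    """Extend KAPPA base OTs into m COTs; bit i of r is the receiver's i-th choice bit."""
    # Setup (roles switched): the extension sender is the base-OT receiver with choices s.
    s = secrets.randbits(KAPPA)
    seeds = [(secrets.token_bytes(16), secrets.token_bytes(16)) for _ in range(KAPPA)]
    got = [seeds[j][(s >> j) & 1] for j in range(KAPPA)]  # ideal base OTs
    # Extension: the receiver expands both seeds of each column and sends u_j.
    t_cols = [prg(k0, m) for k0, _ in seeds]
    u_cols = [t_cols[j] ^ prg(seeds[j][1], m) ^ r for j in range(KAPPA)]
    # Sender: q_j = G(k_j^{s_j}) xor s_j * u_j = t_j xor s_j * r.
    q_cols = [prg(got[j], m) ^ (u_cols[j] if (s >> j) & 1 else 0) for j in range(KAPPA)]
    return s, transpose(q_cols, m), transpose(t_cols, m)


if __name__ == "__main__":
    m = 64
    r = secrets.randbits(m)
    s, q, t = iknp(m, r)
    assert all(q[i] == t[i] ^ (s if (r >> i) & 1 else 0) for i in range(m))
    print(f"{m} COT correlations with Delta = s")
```

Hashing the rows, e.g., $H(i, q_i)$ and $H(i, q_i \oplus s)$ for the sender and $H(i, t_i)$ for the receiver, then yields ROT correlations as in the COT ⇒ ROT transformation.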
Another style of OT extension protocols lies in the pseudorandom correlation generator (PCG) framework [162,254]. In general, PCG-style OT extension protocols [132,134,162,256] can generate COT correlations with sublinear communication (i.e., $\tilde{O}(\sqrt{N})$ for producing $N$ COT correlations), but need more computation than IKNP-style protocols. To simplify the following description, we give an informal definition of COT in vector form: sender S obtains a uniform global key $\Delta \in \mathbb{F}_{2^\kappa}$ and a random vector $v \in \mathbb{F}_{2^\kappa}^N$, while receiver R holds a uniform choice-bit vector $u \in \mathbb{F}_2^N$ and a vector $w \in \mathbb{F}_{2^\kappa}^N$ such that $w = v + \Delta \cdot u$. For both semi-honest and malicious security, the state-of-the-art PCG-style COT protocols [132,134] are constructed in the following three layers:
(1) SPCOT: Construct a single-point COT (SPCOT) protocol, a variant of COT where the Hamming weight of the choice-bit vector $u$ is exactly 1 (i.e., $\mathrm{HW}(u) = 1$). We use a point $\alpha \in [1, N]$ to denote the location of the single non-zero entry, meaning that $u_\alpha = 1$ and $u_i = 0$ for $i \neq \alpha$. The SPCOT protocol can be constructed as follows:
• Semi-honest security: The best-known SPCOT protocol [132] in the semi-honest setting follows the design of the puncturable pseudorandom function (PPRF) construction based on the GGM tree [257]. A PPRF is a special pseudorandom function $F$ that can generate a normal key $k$ and a punctured key $k\{\alpha\}$ for an input $\alpha$, such that $k$ can be used to evaluate $F$ at every point, and $k\{\alpha\}$ allows evaluating $F$ at every point except $\alpha$ without leaking any information about $F(k, \alpha)$ [258,259].
Using the binary-tree structure of the GGM-based PPRF, sender S can transmit $k\{\alpha\}$ to receiver R without learning any information about $\alpha$, by executing $\log N$ OT protocols in parallel. We refer the reader to [132,260] for details. Using key $k$, S computes the vector $v = (F(k,1), \ldots, F(k,N))$ and sends $c = \Delta + \sum_{i \in [1,N]} v_i$ to R, who sets $w_i = F(k,i)$ for $i \neq \alpha$ and $w_\alpha = c + \sum_{i \neq \alpha} w_i$, where addition is performed over the binary field $\mathbb{F}_{2^\kappa}$. Therefore, $w = v + \Delta \cdot u$ holds, where R defines $u$ by $u_\alpha = 1$ and $u_i = 0$ for $i \neq \alpha$.
• Malicious security: The above SPCOT protocol allows a malicious sender S to send incorrect messages in the OT protocol executions, so that the punctured key obtained by receiver R does not correspond to the punctured point $\alpha$. The receiver can detect such a deviation between the two parties' outputs through a consistency-check procedure. The high-level idea of the state-of-the-art consistency check [134] is as follows:
(a) Starting from $v + w = \Delta \cdot u$, we apply a random linear combination, defined by uniformly random coefficients $\chi_1, \ldots, \chi_N \in \mathbb{F}_{2^\kappa}$ sampled by R, to both sides of the equation. Since $u_\alpha = 1$ and $u_i = 0$ for $i \neq \alpha$, we obtain
$$\sum_{i \in [1,N]} \chi_i \cdot v_i + \sum_{i \in [1,N]} \chi_i \cdot w_i = \chi_\alpha \cdot \Delta.$$
(b) Using the approach underlying the MASCOT protocol [43], S and R compute additive shares $Y$ (held by S) and $Z$ (held by R) of $\chi_\alpha \cdot \Delta$:
$$Y + Z = \chi_\alpha \cdot \Delta,$$
which requires $\kappa$ extra COT correlations.
(c) Combining the two equations, we have
$$V := \sum_{i \in [1,N]} \chi_i \cdot v_i + Y = \sum_{i \in [1,N]} \chi_i \cdot w_i + Z =: W.$$
The remaining task is to check $V = W$ by running an equality-test protocol. Since $V$ and $W$ need not be kept secret, the equality test can be implemented very efficiently using a cryptographic hash function [69,134].
(2) MPCOT: Construct a multi-point COT (MPCOT) protocol, a variant of COT with $\mathrm{HW}(u) = t$ for a parameter $t > 1$. Based on the SPCOT protocol, an MPCOT protocol can be constructed in a fairly straightforward way:
• Given the length $N$ of MPCOT vectors, S and R execute the SPCOT protocol $t$ times, each with output length $N/t$. Then, for $i \in [1, t]$, S obtains $v_i \in \mathbb{F}_{2^\kappa}^{N/t}$, and R gets $w_i \in \mathbb{F}_{2^\kappa}^{N/t}$ and $u_i \in \mathbb{F}_2^{N/t}$ with $\mathrm{HW}(u_i) = 1$. Sender S defines $v := (v_1, \ldots, v_t) \in \mathbb{F}_{2^\kappa}^N$, and receiver R sets $w := (w_1, \ldots, w_t) \in \mathbb{F}_{2^\kappa}^N$ and $u := (u_1, \ldots, u_t) \in \mathbb{F}_2^N$, where $\mathrm{HW}(u) = \sum_{i \in [1,t]} \mathrm{HW}(u_i) = t$.
• The MPCOT protocol described above needs $t \log(N/t)$ OT correlations to execute the SPCOT sub-protocol. These OT correlations (thousands of them for concrete parameters $t, N$ [132,134]) can be generated using IKNP-style OT extension protocols [113,136]. For malicious security, $t\kappa$ extra COT correlations are required; this can be reduced to $\kappa$ COT correlations by combining the $t$ consistency checks into a single check (see [134] for details).
• In the malicious setting, a malicious sender S may use different values of $\Delta$ in the $t$ SPCOT executions, so a consistency-check procedure is needed to guarantee the consistency of $\Delta$. The state-of-the-art consistency check [134] guarantees this for free, since the SPCOT correlations are always forced to use the same $\Delta$ as the $\kappa$ extra COT correlations, whose consistency of $\Delta$ is already guaranteed.
(3) COT from LPN: This procedure extends MPCOT correlations to COT correlations with uniform choice bits, based on the LPN assumption. It is the same for semi-honest and malicious security, and involves only local computation.
• Using the MPCOT protocol, S and R generate a length-$N$ MPCOT vector $(s, (r, e))$ such that $r = s + \Delta \cdot e \in \mathbb{F}_{2^\kappa}^N$. Here, $e \in \mathbb{F}_2^N$ can be viewed as the noise vector of an LPN problem, divided into $t$ consecutive sub-vectors of length $N/t$, each with a single non-zero entry at a random position. Such a distribution is called a regular noise distribution. As analyzed in previous work [114,132,162,254,261], no known attack exploits a regular noise distribution to perform significantly better than against a uniform noise distribution, where $e$ is a uniform vector with $\mathrm{HW}(e) = t$.
• Based on two flavors of the LPN assumption, S and R can generate COT correlations in the following two ways:
(a) Dual LPN: Informally, the dual-LPN assumption with a regular noise distribution $\mathcal{D}_t$ states that
$$(H, e \cdot H) \approx_c (H, u),$$
where $e \leftarrow \mathcal{D}_t$, $H \in \mathbb{F}_2^{N \times n}$ is a matrix created by a code generation algorithm, $u \in \mathbb{F}_2^n$ is a uniform vector, and $N = c \cdot n$ for a compression parameter $c > 1$ (e.g., $c = 2$ or $4$). The dual-LPN assumption is also known as the regular syndrome decoding (RSD) assumption, which was introduced in [261] as the assumption underlying the security of the fast syndrome-based (FSB) hash function, a candidate in the SHA-3 competition. Given an MPCOT vector $(s, (r, e))$, S and R output a COT vector $(v, (u, w))$ as follows:
$$v = s \cdot H, \quad u = e \cdot H, \quad w = r \cdot H.$$
(b) Primal LPN: Informally, the primal-LPN assumption with a regular noise distribution $\mathcal{D}_t$ states that
$$(A, a \cdot A + e) \approx_c (A, u),$$
where $a \leftarrow \mathbb{F}_2^k$ is a uniform vector, $A \in \mathbb{F}_2^{k \times n}$ is a matrix created by a code generation algorithm, $e \leftarrow \mathcal{D}_t$, $u \in \mathbb{F}_2^n$ is a uniform vector, and $k < n$. In this case, S and R additionally need a length-$k$ COT vector $(b, (a, c))$ such that $c = b + \Delta \cdot a \in \mathbb{F}_{2^\kappa}^k$ and $a \in \mathbb{F}_2^k$ is a uniform vector. Then, given an MPCOT vector $(s, (r, e))$, S outputs $v$ and R outputs $(u, w)$ as follows:
$$v = b \cdot A + s, \quad u = a \cdot A + e, \quad w = c \cdot A + r.$$
In both cases, $w = v + \Delta \cdot u$ follows from linearity.
The protocol runs iteratively. Each iteration consumes $M$ setup COT correlations, where $M = t \log(N/t)$ for dual LPN and $M = k + t \log(n/t)$ for primal LPN in the semi-honest setting; $M$ is further increased by $\kappa$ for malicious security. Every iteration produces $n$ COT correlations from the setup COT correlations and outputs $n - M$ of them, while the remaining $M$ COT correlations are stored and bootstrapped as the refreshed setup COT correlations for the next iteration. In the first iteration, the $M$ setup COT correlations can be generated using IKNP-style OT extension protocols [113,136]. When a huge number of COT correlations is required and many iterations are executed, the cost of generating the setup COT correlations in the first iteration becomes negligible after amortization.
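The three layers can be traced end to end in the following Python sketch of the semi-honest, dual-LPN variant. It makes several loud simplifying assumptions: the $\log N$ OTs that deliver the punctured key are idealized, SHA-256 stands in for the GGM length-doubling PRG, the matrix $H$ is a hypothetical random sparse matrix rather than the codes used in [132,133], and the parameters ($n = 64$, $c = 2$, $t = 8$) are toy values with no security. Elements of $\mathbb{F}_{2^\kappa}$ are 128-bit integers, so field addition is XOR; all function names are hypothetical.

```python
from __future__ import annotations

import hashlib
import secrets

KAPPA = 128  # elements of F_{2^128} are ints; field addition is XOR


def prg2(seed: int) -> tuple[int, int]:
    """Length-doubling PRG for the GGM tree (SHA-256 stand-in)."""
    d = hashlib.sha256(seed.to_bytes(KAPPA // 8, "big")).digest()
    return int.from_bytes(d[:16], "big"), int.from_bytes(d[16:], "big")


def ggm_levels(root: int, depth: int) -> list[list[int]]:
    levels = [[root]]
    for _ in range(depth):
        levels.append([child for node in levels[-1] for child in prg2(node)])
    return levels  # levels[-1][i] = F(k, i), 0-indexed


def punctured_key(levels: list[list[int]], alpha: int) -> list[int]:
    """k{alpha}: the sibling of alpha's path node on every level of the tree."""
    depth = len(levels) - 1
    return [levels[lvl][(alpha >> (depth - lvl)) ^ 1] for lvl in range(1, depth + 1)]


def eval_punctured(key: list[int], alpha: int, depth: int) -> list:
    """Recompute every leaf except leaf alpha, which is returned as None."""
    known: dict[int, int] = {}
    for lvl in range(1, depth + 1):
        nxt: dict[int, int] = {}
        for idx, val in known.items():
            nxt[2 * idx], nxt[2 * idx + 1] = prg2(val)
        nxt[(alpha >> (depth - lvl)) ^ 1] = key[lvl - 1]
        known = nxt
    return [known.get(i) for i in range(1 << depth)]


def spcot(delta: int, depth: int):
    """Layer (1): semi-honest SPCOT of length 2^depth; OTs delivering k{alpha} are idealized."""
    N = 1 << depth
    alpha = secrets.randbelow(N)                       # R's secret point
    levels = ggm_levels(secrets.randbits(KAPPA), depth)
    v = levels[-1]                                     # S: v_i = F(k, i)
    c = delta
    for x in v:
        c ^= x                                         # S -> R: c = Delta + sum_i v_i
    w = eval_punctured(punctured_key(levels, alpha), alpha, depth)
    w_alpha = c
    for i, x in enumerate(w):
        if i != alpha:
            w_alpha ^= x
    w[alpha] = w_alpha                                 # equals v_alpha + Delta
    return v, [int(i == alpha) for i in range(N)], w


def mpcot(delta: int, t: int, depth: int):
    """Layer (2): concatenate t SPCOTs, giving a regular noise vector e."""
    s, e, r = [], [], []
    for _ in range(t):
        v, u, w = spcot(delta, depth)
        s += v
        e += u
        r += w
    return s, e, r


def times_H(x: list[int], H_rows: list[list[int]], n: int) -> list[int]:
    """Compute x * H for sparse H in F_2^{N x n}; H_rows[j] lists the 1-columns of row j."""
    out = [0] * n
    for j, cols in enumerate(H_rows):
        for k in cols:
            out[k] ^= x[j]
    return out


def dual_lpn_cot(n: int, c: int = 2, t: int = 8, row_weight: int = 7):
    """Layer (3): compress an MPCOT vector of length N = c*n into n COTs (toy parameters)."""
    N = c * n
    depth = (N // t).bit_length() - 1                  # assumes N/t is a power of two
    delta = secrets.randbits(KAPPA)
    s, e, r = mpcot(delta, t, depth)
    rng = secrets.SystemRandom()
    H_rows = [rng.sample(range(n), row_weight) for _ in range(N)]  # hypothetical sparse H
    v, u, w = (times_H(x, H_rows, n) for x in (s, e, r))
    return delta, v, u, w
```

Because every step after the MPCOT is the same linear map applied to $s$, $e$, and $r$, the relation $w = v + \Delta \cdot u$ survives the compression, which is why layer (3) needs no interaction.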
In Figure 4, we present the structure of the PCG-based COT protocol described above. According to the best-known implementations, IKNP-style protocols are highly efficient for computing thousands of COT correlations, whereas PCG-style protocols become more efficient when millions of COT correlations are required, even when network bandwidth is high.
Recently, Rindal et al. [133] proposed a new variant of the dual-LPN assumption, using a structured and sparse matrix $H$ generated by a new LDPC code. Applying the new dual-LPN problem to the semi-honest COT protocol of Boyle et al. [132], they showed that the COT protocol based on dual LPN can simultaneously achieve lower communication and faster computation than the best-known COT protocol based on primal LPN [134], since the new structured LDPC codes [133] support fast encoding. Based on the new dual-LPN assumption [133], the COT protocol [132] even needs 37% less computation than the best-known IKNP-style protocol [113]. However, this efficiency gain relies on an aggressive dual-LPN problem based on heuristically designed linear codes. Rindal et al. [133] analyzed two key properties of the underlying linear codes (including a large minimum distance) under the linear-test framework to establish a degree of confidence in the hardness of the new dual-LPN problem. Further analysis of the new dual-LPN problem is needed to build more confidence in the security of their COT protocol. Applying the state-of-the-art consistency check of Yang et al. [134] to the semi-honest COT protocol [132] based on the new dual-LPN assumption [133] yields the currently most efficient COT protocol with malicious security. Both this consistency check and the check technique of Boyle et al. [132] allow a malicious sender to guess the positions of some non-zero entries of the noise vector $e$ in a selective-failure manner, i.e., an incorrect guess will be caught. This means that the adversary can learn (on average) one bit of information about the noise vector. It is therefore worth analyzing whether the current parameter selection for the underlying LPN assumptions is sufficient to achieve the $\kappa$-bit security level (e.g., $\kappa = 128$).
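The algebra of the consistency check of Yang et al. [134], summarized in steps (a) to (c) of the SPCOT description, can be sanity-checked with a short Python sketch over $\mathbb{F}_{2^{128}}$, using the reduction polynomial $x^{128} + x^7 + x^2 + x + 1$. It idealizes the SPCOT output and the MASCOT-style sharing of $\chi_\alpha \cdot \Delta$ (assumptions), and models a cheating sender simply as an additive error in one entry of R's output; it does not model the selective-failure leakage discussed above. The function names are hypothetical.

```python
from __future__ import annotations

import secrets

KAPPA = 128
MODULUS = (1 << 128) | 0x87  # x^128 + x^7 + x^2 + x + 1


def gf_mul(a: int, b: int) -> int:
    """Multiply in F_{2^128} (shift-and-add with modular reduction)."""
    res = 0
    while b:
        if b & 1:
            res ^= a
        b >>= 1
        a <<= 1
        if a >> KAPPA:
            a ^= MODULUS
    return res


def ideal_spcot(delta: int, N: int):
    """Honest SPCOT output: w = v + Delta * u with a single 1 in u at position alpha."""
    alpha = secrets.randbelow(N)
    v = [secrets.randbits(KAPPA) for _ in range(N)]
    w = [x ^ (delta if i == alpha else 0) for i, x in enumerate(v)]
    return alpha, v, w


def consistency_check(delta: int, alpha: int, v: list[int], w: list[int]) -> bool:
    chi = [secrets.randbits(KAPPA) for _ in v]        # sampled by R
    Y = secrets.randbits(KAPPA)                        # S's share (idealized sharing)
    Z = Y ^ gf_mul(chi[alpha], delta)                  # R's share: Y + Z = chi_alpha * Delta
    V, W = Y, Z
    for c, vi, wi in zip(chi, v, w):
        V ^= gf_mul(c, vi)                             # computed by S
        W ^= gf_mul(c, wi)                             # computed by R
    return V == W                                      # compared via a hash in practice
```

A tampered entry $w_\beta$ with error $\epsilon \neq 0$ shifts $W$ by $\chi_\beta \cdot \epsilon$, which is non-zero unless $\chi_\beta = 0$, so in this model the check fails except with probability $2^{-128}$.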
Recently, Boyle et al. [256] proposed the notion of a pseudorandom correlation function (PCF), and gave an efficient PCF construction for generating COT correlations under a variable-density variant of the LPN assumption (VDLPN). While a PCG only generates a fixed amount of correlated randomness (e.g., COT) all at once and does not support the stateful incremental evaluation that a PRG enables in "stream-cipher" mode, a PCF can produce correlated randomness on the fly and securely generate a virtually unbounded amount of correlated randomness. In particular, a PCF allows two parties to generate two short correlated keys $k_0$ and $k_1$ in the setup phase, and then use the keys to compute COT correlations on the fly, i.e., $v_i = \mathrm{PCF}(k_0, x_i)$ and $(u_i, w_i) = \mathrm{PCF}(k_1, x_i)$ for a uniform string $x_i$, such that $w_i = v_i + \Delta \cdot u_i$, where $\Delta$ can be included in key $k_0$. Boyle et al. [256] only presented a semi-honest construction for distributing the short keys used to compute COT correlations; this approach may have an efficiency advantage over the PCG approach when the number $N$ of resulting COT correlations is very large (e.g., $N \approx 2^{48}$ or larger).

Oblivious linear-function evaluation
OLE. Oblivious linear-function evaluation (OLE) is an arithmetic generalization of OT, and is particularly useful for designing MPC protocols for arithmetic circuits over large fields [120,146,[262][263][264]. In particular, OLE directly gives a two-party additive sharing of the product of two secret values. Therefore, by executing OLE pairwise, we can use OLE to generate Beaver multiplication triples without authentication in the multi-party setting. OLE can be constructed using OT extension and the Gilboa multiplication approach [43,150], which has a cheap computation cost but a much higher communication cost. A standard approach designs OLE from additively homomorphic encryption (AHE) based on RLWE, and has been used in Overdrive [152] and the recent work [265]: a receiver R sends $\mathrm{Enc}(x)$ to sender S, who computes $\mathrm{Enc}(y) = u \cdot \mathrm{Enc}(x) + v$ and sends it to R, who decrypts it to obtain $y = u \cdot x + v \in \mathbb{F}$ for a large field $\mathbb{F}$. Here, the AHE needs to satisfy circuit privacy. In addition, OLE can also be constructed from somewhat homomorphic encryption [103,149], but this requires more communication. Without relying on homomorphic encryption, OLE can also be built directly from Ring-LWE [266,267]. Besides, OLE protocols can be constructed from OT and noisy Reed-Solomon encodings [97,264,268], or from Paillier encryption [269]. Among all these OLE protocols, the protocols [152,265] based on AHE achieve the best communication efficiency, and the protocol [266] from RLWE has the optimal single round of communication.
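The claim that pairwise OLE yields unauthenticated Beaver triples follows from expanding $c = (\sum_i a_i)(\sum_j b_j) = \sum_i a_i b_i + \sum_{i \neq j} a_i b_j$: each party computes its own $a_i b_i$ locally, and each cross term $a_i b_j$ is split into additive shares with one OLE in which $P_i$ acts as sender with $(u, v) = (a_i, -r_{ij})$ and $P_j$ acts as receiver with $x = b_j$. The Python sketch below assumes an idealized OLE functionality and a hypothetical prime field $\mathbb{F}_p$ with $p = 2^{61} - 1$; the function names are illustrative only.

```python
from __future__ import annotations

import secrets

P = 2**61 - 1  # hypothetical prime field F_p


def ideal_ole(u: int, v: int, x: int) -> int:
    """Idealized OLE: sender inputs (u, v), receiver inputs x and learns u*x + v."""
    return (u * x + v) % P


def pairwise_ole_triples(n: int):
    """Additive shares a_i, b_i, c_i with sum(c) = sum(a) * sum(b) mod P (no MACs)."""
    a = [secrets.randbelow(P) for _ in range(n)]
    b = [secrets.randbelow(P) for _ in range(n)]
    c = [a[i] * b[i] % P for i in range(n)]                 # local term a_i * b_i
    for i in range(n):
        for j in range(n):
            if i != j:
                r = secrets.randbelow(P)                    # P_i's mask for the cross term
                c[j] = (c[j] + ideal_ole(a[i], -r % P, b[j])) % P  # P_j gets a_i*b_j - r
                c[i] = (c[i] + r) % P                       # P_i keeps r
    return a, b, c


if __name__ == "__main__":
    a, b, c = pairwise_ole_triples(4)
    assert sum(c) % P == sum(a) * sum(b) % P
    print("valid (unauthenticated) Beaver triple among 4 parties")
```

Each of the $n(n-1)$ ordered pairs costs one OLE; adding MACs for malicious security is a separate step not shown here.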
Recently, Boyle et al. [162] proposed an OLE construction based directly on LPN, which has much lower communication cost than the above OLE protocols but requires computation of at least $O(N^2)$ for generating $N$ OLE correlations. Later, they [161] solved this computational problem using a variant of the ring-LPN assumption, and constructed an OLE protocol for computing a large number of OLE correlations. This OLE protocol has much lower communication cost than the protocols based on RLWE, and has computational complexity $\tilde{O}(N)$. Their PCG approach based on ring-LPN is well suited to generating a large number of OLE correlations (e.g., $N = 2^{20}$), while for a small number of OLE correlations, the approaches based on RLWE may be better. With ring-LPN, the resulting OLE correlations are random (i.e., $u, v, x \in \mathbb{F}$ are uniformly random), but they suffice for designing MPC protocols where only random multiplication triples need to be generated in the preprocessing phase.
VOLE. Vector oblivious linear-function evaluation (VOLE) is an arithmetic generalization of COT to a large field, defined as follows:
• A sender holds a uniform global key $\Delta \in \mathbb{F}$.
• For each VOLE execution, the sender obtains a vector $v \in \mathbb{F}^N$, and a receiver gets two vectors $w, u \in \mathbb{F}^N$, such that $w = v + \Delta \cdot u$.
We have a standard transformation from COT to OT using CRHFs [130]. This is not the case for VOLE and OLE, because the underlying field $\mathbb{F}$ is large and the sender cannot enumerate all possible values of $x \in \mathbb{F}$. Similar to OLE, VOLE can be built from OT extension [43] or AHE [152,265], where the latter has lower communication. The VOLE protocols [152,265] based on AHE have communication complexity linear in the output length of VOLE. Based on the LPN assumption, the PCG approach [254] can construct VOLE protocols with sublinear communication, and is the most promising approach for producing a large number of VOLE correlations (e.g., $N \ge 10^5$). Subsequently, this approach was further optimized in [69, 132-134, 162, 256, 260]. The efficiency and security comparisons among these LPN-based VOLE protocols are similar to the COT case discussed in the previous subsection.
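The AHE-based construction of (V)OLE can be illustrated with a toy Python sketch. As a loud substitution, it uses Paillier encryption (one of the alternatives mentioned for OLE [269]) instead of the RLWE-based AHE of [152,265], works over the ring $\mathbb{Z}_N$ rather than a field, and uses two known Mersenne primes that are far too small and structured to be secure. R encrypts its vector $u$, S replies with $\mathrm{Enc}(u_i)^{\Delta} \cdot \mathrm{Enc}(v_i) = \mathrm{Enc}(\Delta u_i + v_i)$ for random $v_i$, and R decrypts to obtain $w = v + \Delta \cdot u$. The function names are hypothetical.

```python
from __future__ import annotations

import math
import secrets

# Toy parameters (two Mersenne primes): insecure, for illustration only.
p, q = 2**31 - 1, 2**61 - 1
N = p * q
N2 = N * N
LAM = math.lcm(p - 1, q - 1)  # Carmichael function of N (Python 3.9+)
MU = pow(LAM, -1, N)          # valid because the generator is g = N + 1


def enc(m: int) -> int:
    """Paillier encryption Enc(m) = (N + 1)^m * r^N mod N^2."""
    while True:
        r = secrets.randbelow(N)
        if r > 0 and math.gcd(r, N) == 1:
            break
    return pow(N + 1, m, N2) * pow(r, N, N2) % N2


def dec(c: int) -> int:
    """Paillier decryption m = L(c^lambda mod N^2) * mu mod N, with L(x) = (x - 1) / N."""
    return (pow(c, LAM, N2) - 1) // N * MU % N


def paillier_vole(delta: int, u: list[int]):
    """S holds Delta, R holds u; R learns w = v + Delta*u (mod N) for S's random v."""
    cts = [enc(x) for x in u]                                   # R -> S
    v = [secrets.randbelow(N) for _ in u]                       # S's output vector
    replies = [pow(ct, delta, N2) * enc(vi) % N2 for ct, vi in zip(cts, v)]  # S -> R
    w = [dec(ct) for ct in replies]                             # R decrypts
    return v, w


if __name__ == "__main__":
    delta = secrets.randbelow(N)
    u = [secrets.randbelow(N) for _ in range(4)]
    v, w = paillier_vole(delta, u)
    assert all(wi == (vi + delta * ui) % N for ui, vi, wi in zip(u, v, w))
    print("w = v + Delta * u holds for all 4 entries")
```

Using a fresh encryption of $v_i$ in each reply also rerandomizes the ciphertext, which plays the role of the circuit-privacy requirement mentioned above; communication stays linear in the vector length, as stated for the AHE-based protocols.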
The state-of-the-art VOLE protocols [69,133] adopt the same framework as the best-known COT protocols [133,134] based on dual LPN or primal LPN, except that an additional VOLE correlation needs to be generated in each single-point VOLE execution, because the single non-zero element is uniform in the large field $\mathbb{F}$ rather than equal to 1. Additionally, VOLE requires the LPN assumption over the large field $\mathbb{F}$ instead of $\mathbb{F}_2$. The AHE-based VOLE protocols [152,265] can be used to generate the VOLE correlations in the setup phase. Besides, the PCF approach can generate VOLE correlations under the VDLPN assumption [256], and may have an efficiency advantage over the PCG approach when the number of resulting VOLE correlations is very large. Similar to the case of COT, the state-of-the-art consistency check [69,134] can be used to construct maliciously secure VOLE protocols.

MPC application to machine learning
Recent advances in machine learning (ML) have driven many real-life applications, such as healthcare, financial risk analysis, facial recognition, image and video analysis for self-driving cars, recommendation systems, text translation, voice assistants, and image classification. Mission-critical applications (e.g., healthcare) require high accuracy, which is mainly governed by two factors: 1) the large amount of computing power demanded to train deep learning models; and 2) the variance in datasets, which comes from collecting data from multiple diverse sources and is generally infeasible for a single company to achieve.
To this end, multiple companies (e.g., Microsoft, Amazon, Google) provide machine learning as a service (MLaaS), which works in the following two ways:
• Inference: A company offers a trained ML model, and a customer submits a feature input as a query to obtain the inference result.
• Training: Multiple companies work together to train a high-accuracy model on their datasets.
In the first scenario, companies want to keep the ML model secret, since training a model can be very costly, and customers wish to protect the privacy of their inputs, which may be sensitive, such as personal health data or facial images. In the second scenario, companies are unwilling to share their data, since the data are proprietary and the companies may be competitors; moreover, privacy laws may prohibit them from sharing client information. Here, we say that an ML model is kept secret when the model parameters are hidden but the model structure (e.g., which functions are used) is still known; protecting the privacy of the model structure while keeping PPML concretely efficient remains a challenge. To address the above privacy concerns in ML applications, privacy-preserving machine learning (PPML) is highly desirable and has emerged as a flourishing research area. In particular, PPML allows ML computations over private data while ensuring the privacy of the data. Due to this privacy-protection requirement, PPML makes the already compute-intensive ML algorithms even more demanding in terms of computation power and communication cost. However, many everyday users lack the computation and communication capacity to execute PPML. Thus, it may be economical and convenient for users to securely outsource an ML task to a set of powerful, specialized cloud servers in a pay-per-use manner, where security is guaranteed as long as at most $t$ of the $n$ servers collude (either $t < n$ or $t < n/2$, depending on the MPC protocol used). In this case, inference and training can be realized as follows:
• Outsourcing inference: A company may host its trained ML model in a secret-shared way on $n$ (untrusted) servers. A customer can secret-share its feature input among the same $n$ servers. The servers compute the inference result in a shared fashion and return it to the customer.
• Outsourcing training: Multiple companies can secret-share their datasets among a set of (untrusted) servers, who cooperatively train a common model on their joint datasets while keeping each individual dataset private.
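As a minimal illustration of computing "in a shared fashion", the Python sketch below lets two servers evaluate a dot product between a secret-shared weight vector and a secret-shared input, as in one neuron of a linear layer. It assumes additive sharing over $\mathbb{Z}_{2^{64}}$, semi-honest servers, and random Beaver triples from an idealized dealer (in practice they would come from a preprocessing phase, e.g., based on OT or OLE); it omits fixed-point encoding and non-linear layers, and the function names are hypothetical.

```python
from __future__ import annotations

import secrets

MOD = 2**64  # additive sharing over Z_{2^64} (assumed)


def share(x: int) -> tuple[int, int]:
    s0 = secrets.randbelow(MOD)
    return s0, (x - s0) % MOD


def reveal(s0: int, s1: int) -> int:
    return (s0 + s1) % MOD


def dealer_triple():
    """Idealized preprocessing: shares of a random triple (a, b, c = a*b)."""
    a, b = secrets.randbelow(MOD), secrets.randbelow(MOD)
    return share(a), share(b), share(a * b % MOD)


def beaver_mul(x_sh, y_sh):
    """Two servers multiply shared x and y, opening only d = x - a and e = y - b."""
    a_sh, b_sh, c_sh = dealer_triple()
    d = reveal((x_sh[0] - a_sh[0]) % MOD, (x_sh[1] - a_sh[1]) % MOD)
    e = reveal((y_sh[0] - b_sh[0]) % MOD, (y_sh[1] - b_sh[1]) % MOD)
    # x*y = c + d*b + e*a + d*e; only server 0 adds the public term d*e.
    z0 = (c_sh[0] + d * b_sh[0] + e * a_sh[0] + d * e) % MOD
    z1 = (c_sh[1] + d * b_sh[1] + e * a_sh[1]) % MOD
    return z0, z1


def shared_dot(w_sh, x_sh):
    """Shares of sum_i w_i * x_i; additions of shares are local."""
    acc0 = acc1 = 0
    for ws, xs in zip(w_sh, x_sh):
        z0, z1 = beaver_mul(ws, xs)
        acc0, acc1 = (acc0 + z0) % MOD, (acc1 + z1) % MOD
    return acc0, acc1


if __name__ == "__main__":
    weights, features = [3, 5, 7], [2, 4, 6]  # company's model, customer's input
    out = shared_dot([share(w) for w in weights], [share(x) for x in features])
    print(reveal(*out))  # 3*2 + 5*4 + 7*6 = 68
```

The opened values $d$ and $e$ are masked by the random triple, so neither server learns the weights or the features; only the final result is reconstructed for the customer.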
MPC is one of the key techniques for realizing PPML, and is the most promising approach to performing PPML in the above outsourced-computation setting based on secret sharing. A series of PPML protocols have been built upon MPC techniques. We can partition these PPML protocols into two categories: one is in the dishonest-majority setting, and the other is in the honest-majority setting. We have surveyed the known PPML protocols based on MPC and compare them in Table 3.
Note to Table 3: All protocols for secure three-party/four-party computation (i.e., 3PC/4PC) tolerate one corruption, and thus belong to the honest-majority setting. For malicious adversaries, "Abort", "Fairness", and "GOD" denote the PPML protocols that achieve security with abort, fairness, and guaranteed output delivery, respectively. For the underlying LSSS, we use "ASS" and "RSS" to denote additive secret sharing and replicated secret sharing, respectively. If a PPML protocol supports multiple neural-network architectures, we only describe the one with the largest number of parameters for private ML inference.
All PPML protocols shown in Table 3, along with other PPML protocols [30,34,270–276], are customized in the following ways:
• Based on the known MPC protocols, the ML algorithms are modified to be more MPC-friendly.
• According to the definitions of the ML algorithms, the known MPC protocols are tailored accordingly.
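Replicated secret sharing (RSS), one of the two LSSS types distinguished in the Table 3 note, underlies several honest-majority 3PC protocols. The sketch below shows the basic idea under simplifying assumptions: three parties, semi-honest security, one corruption, and a zero-sharing sampled centrally rather than derived from pairwise PRF keys as protocols typically do. The names `rss_share`, `rss_mul`, etc. are hypothetical; the point is that each party holds two of three additive parts, so multiplication needs only one ring element sent per party.

```python
import secrets

MOD = 1 << 64  # illustrative ring Z_{2^64}

def rss_share(x):
    """2-out-of-3 replicated sharing: x = x0 + x1 + x2, party i holds (x_i, x_{i+1})."""
    x0, x1 = secrets.randbelow(MOD), secrets.randbelow(MOD)
    parts = [x0, x1, (x - x0 - x1) % MOD]
    return [(parts[i], parts[(i + 1) % 3]) for i in range(3)]

def rss_open(shares):
    """Any two parties together hold all three parts; here parties 0 and 1."""
    (x0, x1), (_, x2) = shares[0], shares[1]
    return (x0 + x1 + x2) % MOD

def rss_add(xs, ys):
    """Addition is local: each party adds its two parts componentwise."""
    return [((a + c) % MOD, (b + d) % MOD) for (a, b), (c, d) in zip(xs, ys)]

def rss_mul(xs, ys):
    """Semi-honest multiplication with one ring element sent per party."""
    # Local step: party i obtains an additive share z_i of x*y (covers all 9 cross terms).
    z = [(xi * yi + xi * yj + xj * yi) % MOD for (xi, xj), (yi, yj) in zip(xs, ys)]
    # Re-randomize with a sharing of zero (sampled centrally here for simplicity).
    a0, a1 = secrets.randbelow(MOD), secrets.randbelow(MOD)
    alpha = [a0, a1, (-a0 - a1) % MOD]
    z = [(zi + ai) % MOD for zi, ai in zip(z, alpha)]
    # Resharing: party i+1 sends z_{i+1} to party i, restoring the replicated form.
    return [(z[i], z[(i + 1) % 3]) for i in range(3)]

x, y = rss_share(6), rss_share(7)
print(rss_open(rss_mul(x, y)), rss_open(rss_add(x, y)))  # 42 13
```

Compared with plain additive sharing (ASS), the redundancy of RSS is what lets the multiplication step proceed without preprocessed triples, at the cost of requiring an honest majority.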
These customized PPML protocols can achieve high efficiency for specific learning tasks. Recently, Zheng et al. [35] designed a platform (Cerebro) for privacy-preserving training and inference of generic ML tasks, which supports new neural-network architectures but is less efficient.
In the dishonest-majority setting, the PPML protocols focus on the two-party case, except for Helen [36] and Cerebro [35]. Helen and Cerebro implemented the inference and training of ML algorithms among 4 parties and among 2-12 parties, respectively, and allow the adversary to be semi-honest or malicious. For 12 parties tolerating 11 semi-honest corruptions, the recent PPML protocol Cerebro [35] can perform inference on a decision tree of depth 12 in about 20 s on average. In the semi-honest setting with six parties, Cerebro can perform logistic-regression training in about 16 minutes and linear-regression training in about 100 s. According to the experimental results in Cerebro [35], the maliciously secure version is 61-3300× slower than the semi-honest version. In the multi-party setting with a dishonest majority, the models evaluated for ML inference are small, and so are the datasets and neural-network architectures used for ML training. More efficiency optimizations need to be explored to support larger datasets and models, and the maliciously secure protocols need to be further improved to reduce their overhead over the semi-honest protocols. Now, we turn our attention to the two-party case. Most two-party PPML protocols consider semi-honest adversaries. The only exception is QuantizedNN [283], which uses SPDZ_{2^k} [157,158] and Overdrive [152] to design maliciously secure protocols; both SPDZ_{2^k} and Overdrive have been implemented in the MP-SPDZ library [17]. The maliciously secure protocol of [283] is roughly 3-9× slower than its semi-honest counterpart. In the semi-honest two-party setting, most PPML protocols focus on ML inference, except for SecureML [24] and QUOTIENT [18]. Nevertheless, SecureML and QUOTIENT were only evaluated on the small MNIST dataset, which has 60,000 training samples and 10 classes.
For two-party ML inference with semi-honest security, the state-of-the-art PPML protocol CrypTFlow2 [28] can perform private inference over complex deep neural networks (DNNs) such as ResNet-50 (50 layers, 23.5 million parameters) and DenseNet-121 (121 layers, 8.5 million parameters), which are trained on the large-scale ImageNet dataset containing more than 1,000,000 training samples and 1000 classes. Their implementation [28] needs about 546 s and 32 GB of communication for ResNet-50, and about 463 s and 35 GB of communication for DenseNet-121. Thus, in the two-party setting, ML inference with semi-honest security appears to be highly efficient, but ML inference against malicious adversaries and ML training remain inefficient, which needs to be addressed in future work.
In the honest-majority setting, the known PPML protocols only consider the three-party and four-party cases tolerating one corruption. In this setting, relatively high efficiency can be achieved for both private inference and training. By accelerating semi-honest 3PC with GPUs, CryptGPU [31] can perform one private inference over the ImageNet-scale ResNet-50 (resp., ResNet-152) using 9.3 s and 3.1 GB of communication (resp., 25.8 s and 6.6 GB of communication). The private training implemented by CryptGPU supports VGG-16 (16 layers, 138 million parameters) trained on the Tiny ImageNet dataset, which contains 100,000 training samples and 200 classes. Their implementation reports the running time and communication for a single iteration of private training, which are 13.9 s and 7.6 GB, respectively. For malicious security, the best-known PPML protocol Falcon [292] can run a private inference over a VGG-16 network trained on Tiny ImageNet using 12.1 s of running time and 0.4 GB of communication. However, training a VGG-16 model on the Tiny ImageNet dataset with Falcon under malicious security would take over 3 years and about 1012 TB of communication. When a majority of parties are honest, several PPML protocols can also achieve stronger security properties than security with abort (namely, fairness and GOD) with small overhead. In this case, the PPML protocols in the four-party setting perform better than those in the three-party setting, but require a stronger assumption on the number of honest parties (three of four rather than two of three). Among the PPML protocols achieving fairness or GOD, Tetrad [294] currently has the best efficiency. In particular, Tetrad takes 183 s and 35 GB of communication to train a VGG-16 model on the small CIFAR-10 dataset, which includes 50,000 training samples and 10 classes.
Overall, in the three-party/four-party setting, private inference has become practical and can scale to complex models and large datasets, even in the presence of malicious adversaries. In the same setting, private training is highly efficient and supports moderate-sized datasets under semi-honest security, but remains very inefficient under malicious security. Moreover, designing honest-majority PPML protocols with at least five parties that tolerate two corrupted parties is an interesting direction for future work.
For ML applications, we need to handle functions of many different types. For example, in DNNs, we need to compute matrix multiplication, convolution, etc., for linear layers, and ReLU, max pooling, sigmoid, softmax, etc., for non-linear layers. Therefore, we need mixed-mode MPC protocols, which support both arithmetic circuits and Boolean circuits and allow conversion between them. Additionally, for division operations, or for a function represented as a circuit of large depth, we may need the garbled-circuit approach to achieve better efficiency. In the dishonest-majority setting, the ABY-like protocols [7,25] developed techniques for converting among arithmetic sharing, Boolean sharing, and Yao's GCs in the presence of semi-honest adversaries. The ABY-like protocols focus on the case where the circuit evaluation is executed between two parties. In the multi-party setting, the recent work [295] proposes semi-honest protocols supporting arithmetic sharing, Boolean sharing, Yao's GCs, and conversions between any two of these representations. However, their protocols need a trusted party to distribute all correlated randomness among the parties evaluating the circuits, which weakens the security guarantee. For malicious security, the conversion between distributed garbled circuits and SPDZ-style authenticated sharings can be realized using the doubly authenticated bits (daBits) technique [158,296–298] or the more efficient extended daBits (edaBits) technique [299]. Specifically, a daBit consists of a pair of random sharings ([r]_2, [r]_M), where r ∈ {0, 1} and either M = p for a prime p or M = 2^k. The daBits technique was first presented by Rotaru and Wood [298] for the case of M = p.
Subsequently, the performance for the case of M = p was further improved in [296,297,299], and daBits over the ring Z_{2^k} were also given in [158,299]. The implementation in [296] requires about 11.8 KB of communication to generate one daBit with a large prime p, and can generate about 2150 daBits per second in the two-party malicious setting. The state-of-the-art edaBits protocol [299] adopts the "Cut-and-Bucketing" technique to check the consistency of values across the two domains, where an edaBit consists of a set of random sharings ([r_{m−1}]_2, ..., [r_0]_2) in the binary domain and a sharing [r]_M in the arithmetic domain, such that r = Σ_{i∈[0,m)} r_i · 2^i mod M. In the two-party malicious setting, the edaBits approach reduces the communication cost by a factor of 2 for comparing 63-bit integers, compared to the daBits technique [299]. However, the "Cut-and-Bucketing" technique for malicious security incurs at least a factor of 3 overhead over the semi-honest protocol. It is an interesting open problem to construct a concretely efficient edaBits protocol with malicious security that achieves an overhead of 2 or even smaller. In the honest-majority setting, maliciously secure protocols that convert between the arithmetic, Boolean, and garbling worlds can be constructed more efficiently [19,294], where the techniques underlying constant-round MPC protocols (e.g., [236,239]) can be used and adapted.
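The reason a daBit ([r]_2, [r]_M) enables conversion can be seen in a short sketch: to turn a Boolean sharing [b]_2 into an arithmetic sharing [b]_M, the parties open c = b ⊕ r (which is uniformly random) and use the identity b = c + r − 2cr over Z_M, which is linear in r. The sketch below makes strong simplifying assumptions that differ from the maliciously secure setting discussed above: two parties, semi-honest security, unauthenticated sharings, M = 2^64, and a hypothetical trusted dealer standing in for the daBit/edaBit generation protocols of [296–299]. It also shows what an edaBit looks like, again produced by a hypothetical dealer rather than by Cut-and-Bucketing.

```python
import secrets

K = 64
MOD = 1 << K  # arithmetic domain M = 2^K (the text also allows M = p)

def a_share(x):
    """Two-party additive sharing [x]_M."""
    s0 = secrets.randbelow(MOD)
    return [s0, (x - s0) % MOD]

def b_share(bit):
    """Two-party XOR sharing [bit]_2."""
    s0 = secrets.randbelow(2)
    return [s0, bit ^ s0]

def dealer_dabit():
    """Hypothetical trusted dealer producing a daBit ([r]_2, [r]_M)."""
    r = secrets.randbelow(2)
    return b_share(r), a_share(r)

def b2a(b_sh, dabit):
    """Convert [b]_2 to [b]_M: open c = b XOR r, then use b = c + r - 2*c*r over Z_M."""
    r2, r_m = dabit
    c = (b_sh[0] ^ r2[0]) ^ (b_sh[1] ^ r2[1])  # each party reveals b_i XOR r_i
    out = [((1 - 2 * c) * s) % MOD for s in r_m]  # local: (1 - 2c) * [r]_M
    out[0] = (out[0] + c) % MOD  # the public constant c is added by one party
    return out

def dealer_edabit(m):
    """Hypothetical trusted dealer producing an edaBit; bits[i] holds [r_i]_2."""
    bits = [secrets.randbelow(2) for _ in range(m)]
    r = sum(bit << i for i, bit in enumerate(bits)) % MOD
    return [b_share(bit) for bit in bits], a_share(r)

for b in (0, 1):
    print(b, sum(b2a(b_share(b), dealer_dabit())) % MOD)  # prints "0 0" then "1 1"
```

An edaBit bundles m such bits with a single arithmetic sharing of their weighted sum, which is why it is cheaper than m independent daBits when converting whole integers.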
On the other hand, recent work by Boyle et al. [300,301] proposed a new approach to constructing mixed-mode MPC protocols based on function secret sharing (FSS), which achieves optimal online communication and round complexity and is useful for ML applications. Their FSS approach supports arithmetic operations mixed with non-arithmetic operations. In particular, for a non-arithmetic function g such as ReLU, two parties can obtain two succinct FSS keys for evaluating the function g_r(x) = g(x + r), where r is a random value shared by the parties. In general, the FSS-based approach requires more communication in the preprocessing phase than the GC or GMW approach, unless the FSS keys are distributed by a trusted dealer or the input length is relatively small. This naturally leaves the future work of reducing the preprocessing cost of distributing the FSS keys via a concretely efficient 2PC protocol. In terms of online communication cost and rounds, the FSS-based approach outperforms the GC and GMW approaches. The FSS-based approach can also be made secure against malicious adversaries [300]. So far, Boyle et al. [300,301] have only proposed efficient constructions for functions including comparison (e.g., ReLU), splines (e.g., used in sigmoid), bit decomposition, zero test, and arithmetic/logical shifts. Many functions used in ML and scientific computation (e.g., exponentiation, tanh, and reciprocal square root) still lack concretely efficient FSS-based 2PC protocols. Besides, Boyle et al. [300,301] only gave two-party constructions. It is an interesting future work to construct concretely efficient FSS-based MPC protocols with optimal online communication and rounds for more than two parties (i.e., n ≥ 3). While prior work uses a uniform bitwidth for the whole ML inference, the recent work by Rathee et al. [27] proposed a mixed-bitwidth approach, i.e., operating in low bitwidths and moving to high bitwidths only when necessary.
They designed new protocols to switch between bitwidths and to operate on values of differing bitwidths, which yields better efficiency. While the work [27] only considers private ML inference in the two-party setting, it is worth extending the mixed-bitwidth approach to private ML training and to the multi-party setting.
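One reason why switching bitwidths needs a dedicated protocol can be illustrated with two-party additive shares: if shares s0, s1 of an unsigned ℓ-bit value x over Z_{2^ℓ} are simply reinterpreted in a larger ring Z_{2^L}, their sum is x + w·2^ℓ, where w is the (secret) wrap bit indicating whether s0 + s1 ≥ 2^ℓ. The sketch below is our own illustration, not the protocol of [27]: it computes w in the clear purely to show the required correction, whereas a secure protocol must obtain shares of w without revealing it (e.g., via a secure comparison). The function names are hypothetical.

```python
import secrets

def share(x, bits):
    """Two-party additive sharing over Z_{2^bits}."""
    mod = 1 << bits
    s0 = secrets.randbelow(mod)
    return [s0, (x - s0) % mod]

def wrap_bit(shares, bits):
    """w = 1 iff s0 + s1 >= 2^bits. Computed in the clear ONLY for illustration;
    a real protocol must produce shares of w securely."""
    return int(sum(shares) >= (1 << bits))

def zero_extend(shares, src_bits, dst_bits, w):
    """Lift shares of an unsigned src_bits-bit value to Z_{2^dst_bits}, correcting by w * 2^src_bits."""
    out = [s % (1 << dst_bits) for s in shares]
    out[0] = (out[0] - w * (1 << src_bits)) % (1 << dst_bits)
    return out

src_bits, dst_bits = 8, 32
x = 200
x_sh = share(x, src_bits)
naive = sum(x_sh) % (1 << dst_bits)  # local reinterpretation yields x or x + 2^8
w = wrap_bit(x_sh, src_bits)
# First value is 0 or 256 depending on the random shares; second is always 200.
print(naive - x, sum(zero_extend(x_sh, src_bits, dst_bits, w)) % (1 << dst_bits))
```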

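The offset function g_r(x) = g(x + r) in the FSS-based approach of Boyle et al. [300,301] can be illustrated with a deliberately naive stand-in. In the sketch below, the two FSS keys are simply additive shares of the entire truth table of g_r over a tiny 8-bit domain, so key size is exponential in the input length; the actual constructions use succinct keys (e.g., built from distributed comparison functions). The sketch assumes a hypothetical trusted dealer and semi-honest parties, and all names are illustrative. What it does show is the online phase: the parties open the masked value x − r with one message each and then evaluate their keys locally, obtaining shares of g_r(x − r) = g(x).

```python
import secrets

N_BITS = 8          # tiny input domain Z_{2^8} so the full table fits in memory
IN_MOD = 1 << N_BITS
OUT_MOD = 1 << 32   # illustrative output group Z_{2^32}

def relu(x):
    """ReLU on N_BITS-bit two's-complement values."""
    return x if x < IN_MOD // 2 else 0

def share_in(x):
    s0 = secrets.randbelow(IN_MOD)
    return (s0, (x - s0) % IN_MOD)

def dealer_keys(g):
    """Toy, NON-succinct FSS for g_r(x) = g(x + r): the keys are additive shares of g_r's
    whole truth table (2^N_BITS entries). Also returns shares of the random mask r."""
    r = secrets.randbelow(IN_MOD)
    table = [g((x + r) % IN_MOD) % OUT_MOD for x in range(IN_MOD)]
    k0 = [secrets.randbelow(OUT_MOD) for _ in range(IN_MOD)]
    k1 = [(t - s) % OUT_MOD for t, s in zip(table, k0)]
    return share_in(r), (k0, k1)

def online_eval(x_sh, r_sh, keys):
    """Online phase: each party sends one share of x - r, then evaluates its key locally."""
    masked = (x_sh[0] - r_sh[0] + x_sh[1] - r_sh[1]) % IN_MOD  # public value x - r
    return [key[masked] for key in keys]  # additive shares of g_r(x - r) = g(x)

for x in (5, -3):
    r_sh, keys = dealer_keys(relu)  # fresh keys per evaluation: reusing r would leak
    y = online_eval(share_in(x % IN_MOD), r_sh, keys)
    print(x, sum(y) % OUT_MOD)  # prints "5 5" then "-3 0"
```

Replacing the truth-table keys by succinct keys is exactly where the cost discussed above moves: the online phase stays one opening, while the burden shifts to generating the keys in preprocessing.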
Conclusion and future work
We have described the recent development of concretely efficient MPC protocols along with the key techniques underlying them. In particular, we presented the high-level ideas of recent MPC protocols and OT/OLE protocols. As an example of MPC applications, we discussed privacy-preserving machine learning, and summarized related work as well as conversion and FSS-based techniques. We hope that this survey will help new researchers interested in MPC quickly understand the recent development of concretely efficient MPC, and gain a preliminary understanding of some key techniques as a starting point for studying MPC.
To deploy MPC on a large scale, standardization is a necessary step. Nevertheless, this is not an easy task, as there exist many different kinds of MPC protocols with different advantages in terms of security and efficiency. Besides, many techniques and different assumptions are used in the design of MPC. These factors make MPC standardization difficult. Of course, we can first standardize a batch of MPC protocols in one setting, and then standardize the next batch in another setting. However, since standardization is a lengthy procedure that requires substantial financial resources, this approach is very expensive. Furthermore, maintaining compatibility among MPC protocols standardized in different procedures is a problem. All of these issues need to be addressed in future work. ISO has recently been preparing a standardization process for MPC based on secret sharing [302]. Besides, NIST plans to standardize multi-party threshold cryptography [303], where MPC is a key technique for realizing AES encryption/decryption, EdDSA signing, distributed RSA key generation, etc. NIST also intends to follow the progress of emerging technologies in the area of privacy-enhancing cryptography [304], which includes MPC, ZK, HE, etc.
We have summarized some open problems and future work in the previous sections. In the following, we conclude this work by further listing several open problems and future work for concretely efficient MPC protocols.
• Constant-round 2PC: The recent breakthrough work by Rosulek and Roy [199] reduced the size of garbled circuits from 2κ bits per AND gate to 3κ/2 + 5 bits per AND gate. A natural open problem is whether one can do better, e.g., about 4κ/3 bits per AND gate, while keeping compatibility with free-XOR and high computational efficiency. If this is impossible, one can attempt to prove that ≈ 3κ/2 bits per AND gate is optimal in a more inclusive model than the linear garbling model in [195]. Since the work [199] focuses on semi-honest adversaries, another open problem is to extend the slicing-and-dicing technique in [199] to two-party distributed garbling, which can be combined with BDOZ-style IT-MACs to design maliciously secure 2PC protocols.
• Constant-round MPC: We identify three directions of future work for constant-round MPC.
(a) Dishonest majority for garbling: For multi-party distributed garbling based only on symmetric primitives, the state-of-the-art technique by Yang et al. [110] achieves a garbled-circuit size of (4n − 6)|C|κ bits. It is a challenging task to further reduce it to about 2n|C|κ bits while still relying only on symmetric primitives. In other words, is it possible to fully apply the half-gates technique to multi-party distributed garbling?
(b) Dishonest majority for AND triples: Currently, authenticated AND triples are generated by TinyOT-like protocols, which incur an overhead of at least 3 for malicious security over the corresponding semi-honest protocol, where the overhead comes from the bucketing technique. It is an interesting open problem to design an authenticated-AND-triple protocol achieving an overhead of 2 (or even smaller) using a novel technique.
(c) Honest majority: For n > 5 parties, Chandran et al. [237] presented a constant-round MPC protocol with corruption threshold t ≤ √n. It is an interesting future work to construct a concretely efficient constant-round MPC protocol tolerating t < n/2 corrupted parties when n > 5.
• SPDZ: The efficiency bottleneck of SPDZ-style protocols is generating authenticated triples over a large field. The state-of-the-art protocol [161], based on a variant of the ring-LPN assumption, achieves relatively low communication complexity, two orders of magnitude smaller than that of Overdrive [152]. However, this protocol has a computational complexity of O(N log N) for encoding (using the fast Fourier transform (FFT)), which is large for large N, where N is the number of resulting authenticated triples. An important future work is to reduce the computational complexity while keeping the communication complexity small for the ring-LPN-based protocol.
• LPN variants for MPC: COT and OLE as well as their variants are key building blocks for MPC in the dishonest-majority setting. To design COT and (V)OLE protocols with low communication, several LPN variants have been proposed, including LPN with a regular noise distribution [114,132], LPN with static leakage [132], and ring-LPN with reducible polynomials and a regular noise distribution [161]. An important future work is to further analyze the LPN variants proposed in the MPC context, so as to establish more confidence in the hardness of these LPN problems.
• Large-scale MPC with honest majority: For honest-majority MPC in the malicious setting, several recent works [178,179,184] designed large-scale MPC protocols that scale practically to hundreds of thousands of parties. However, their concrete efficiency is still limited. Constructing large-scale maliciously secure MPC protocols with higher concrete efficiency, and giving an efficient implementation that scales to thousands of parties, will be an interesting future work.
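To put the per-gate garbling costs from the constant-round 2PC item into perspective, the short calculation below compares 2κ bits per AND gate (the half-gates cost), 3κ/2 + 5 bits per AND gate (Rosulek and Roy [199]), and the hypothetical 4κ/3 target raised as an open problem. It assumes κ = 128 and a hypothetical circuit with one million AND gates; XOR gates are treated as free, as under free-XOR.

```python
KAPPA = 128  # computational security parameter (a common choice)

def gc_size_bits(and_gates, bits_per_and):
    """Garbled-circuit size in bits; XOR gates are free under free-XOR."""
    return and_gates * bits_per_and

per_and = {
    "half-gates, 2κ": 2 * KAPPA,
    "Rosulek-Roy, 3κ/2 + 5": 3 * KAPPA // 2 + 5,
    "hypothetical target, 4κ/3": 4 * KAPPA / 3,
}
and_gates = 1_000_000  # hypothetical circuit with one million AND gates
for name, bits in per_and.items():
    megabytes = gc_size_bits(and_gates, bits) / 8 / 1e6
    print(f"{name}: {bits:.1f} bits per AND gate, {megabytes:.1f} MB")
```

For this example the three garbled circuits take about 32.0 MB, 24.6 MB, and 21.3 MB, respectively; that is, the 3κ/2 + 5 scheme saves roughly 23% over 2κ, and reaching 4κ/3 would save roughly another 13%.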