Ultrasound Signal Processing: From Models to Deep Learning

Medical ultrasound imaging relies heavily on high-quality signal processing to provide reliable and interpretable image reconstructions. Conventionally, reconstruction algorithms were derived from physical principles. These algorithms rely on assumptions and approximations of the underlying measurement model, limiting image quality in settings where these assumptions break down. Conversely, more sophisticated solutions, based on statistical modelling, careful parameter tuning, or increased model complexity, can be sensitive to different environments. Recently, deep-learning-based methods, which are optimized in a data-driven fashion, have gained popularity. These model-agnostic techniques often rely on generic model structures, and require vast amounts of training data to converge to a robust solution. A relatively new paradigm combines the power of the two: leveraging data-driven deep learning as well as exploiting domain knowledge. These model-based solutions yield high robustness, and require fewer parameters and less training data than conventional neural networks. In this work we provide an overview of these techniques from recent literature, and discuss a wide variety of ultrasound applications. We aim to inspire the reader to further research in this area, and to address the opportunities within the field of ultrasound signal processing. We conclude with a future perspective on model-based deep learning techniques for medical ultrasound.


Introduction
Ultrasound (US) imaging has proven itself to be an invaluable tool in medical diagnostics. Among many imaging technologies, such as X-ray, computed tomography (CT), and magnetic resonance imaging (MRI), US uniquely positions itself as an interactive diagnostic tool, providing real-time spatial and temporal information to the clinician. Combined with its relatively low cost, compact size, and absence of ionizing radiation, US imaging is an increasingly popular choice in patient monitoring.
Consequently, the versatility of US imaging has spurred a wide range of applications in the field. While conventionally it is used for the acquisition of B-mode (2D) images, more recent developments have enabled ultrafast and 3D volumetric imaging. Additionally, US devices can be used for measuring clinically relevant features such as blood velocity (Doppler), tissue characteristics (e.g. elastography maps), and perfusion through ultrasound localization microscopy (ULM). While this wide range of applications shares the same underlying measurement steps (acquisition, reconstruction, and visualisation), their signal processing pipelines are often specific to each application.
It follows that the quality of US imaging strongly depends on the implemented signal processing algorithms. The resulting demand for high-quality signal processing has pushed the reconstruction process from fixed, often hardware-based implementations to the digital domain (Thomenius, 1996; Kim et al., 1997). More recently, this has led to fully software-based algorithms, as they open up the potential for complex measurement models and statistical signal interpretations. However, this shift has also posed a new set of challenges, as it puts a significant strain on the digitisation hardware, bandwidth-constrained data channels, and computational capacity. As a result, clinical devices, where real-time imaging and robustness are of utmost importance, still mainly rely on simple hardware-based solutions.
A more recent development in this field is the utilisation of deep neural networks. Such networks can provide fast approximations for signal recovery, and after training they can be implemented efficiently to facilitate ultra-fast signal processing through their exploitation of parallel processing. However, by inheriting generic network architectures from computer vision tasks, these approaches are highly data-driven and often over-parameterized, posing several challenges. In order to converge to a well-generalised solution across the full data distribution encountered in practice, large amounts of (unbiased) training data are needed, which are not always trivial to obtain. Furthermore, these models are often treated as a 'black box', making it difficult to guarantee correct behavior in a real clinical setting.
To overcome some of the challenges of purely data-driven methods, an alternative approach is to combine model-based and data-driven methods, in an attempt to get the best of both worlds. The proposition here is that the design of data-driven methods for ultrasound signal processing can likely benefit from the vast amount of research on conventional, model-based reconstruction algorithms, informing e.g. specific neural network designs or hybrid processing approaches.
In this review paper, we aim to provide the reader with a comprehensive overview of ultrasound signal processing based on modelling, machine learning, and model-based learning. To achieve this, we take a probabilistic perspective and place methods in the context of their assumptions on signal models, statistics, and training data. While other works (Shlezinger et al., 2020; Monga et al., 2021; Van Sloun et al., 2019a; Al Kassir et al., 2022; Liu et al., 2019) offer an excellent overview of the different aspects of AI applied to ultrasound image processing, the focus of this paper is to put the theory of both signal processing and machine learning under a unifying umbrella, rather than to showcase a general review of deep learning applied to ultrasound-specific problems. To that end, we cover topics ranging from beamforming to post-processing and advanced applications such as super-resolution. Throughout the paper we will distinguish between three types of approaches, which we cover in separate sections.
• Model-Based Methods for US Signal Processing: Conventional model-based methods derive signal processing algorithms by modelling the problem based on first principles, such as knowledge of the acquisition model, noise, or signal statistics. Simple models offer analytical solutions, while more complex models often require iterative algorithms.
• Deep Learning (DL) for US Signal Processing: Deep learning (DL) solutions are fully data-driven and fit highly-parameterized algorithms (in the form of deep neural networks) to data. DL methods are model-agnostic and thus rely on the training data to expose structure and relations between inputs and desired outputs.
• Model-Based DL for US Signal Processing: Model-based DL aims at bridging the gap by deriving algorithms from first-principle models (and their assumptions) while learning parts of these models (or their analytic/iterative solutions) from data. These approaches enable incorporating prior knowledge and structure (inductive biases), and offer tools for designing deep neural networks with architectures that are tailored to a specific problem and setting. The resulting methods resemble conventional model-based methods, but allow for overcoming mismatched or incomplete model information by learning from data.
In all cases, data is needed to test the performance of (clinical) signal processing algorithms. However, in deep learning based solutions specifically, we observe an increasing need for training data when prior knowledge on the underlying signal model is not fully exploited. A schematic overview of these approaches is given in Figure 1, including examples of corresponding techniques in the case of ultrasound beamforming.
We begin by briefly explaining the probabilistic perspective and notation we will adopt throughout the paper in a preliminaries section, after which we provide background information on the basics of US acquisition, which can be skipped by experts in the field of ultrasound. Following this background information, we will dive into model-based US signal processing, in which we will derive various conventional beamforming and post-processing algorithms from their models and statistical assumptions. Next, we turn to DL methods, after which we bridge the gap between model-based and DL-based processing, identifying opportunities for data-driven enhancement of model-based methods (and their assumptions) by DL. Finally we provide a discussion and conclusion, where we provide a future outlook and several opportunities for deep learning in ultrasound signal processing.
A probabilistic approach to deep learning in ultrasound signal processing

In this paper we will use the language and tools of probability theory to seamlessly bridge the gap between conventional model-based signal processing and contemporary machine/deep learning approaches. As Shakir Mohamed (DeepMind) phrased it: "Almost all of machine learning can be viewed in probabilistic terms, making probabilistic thinking fundamental. It is, of course, not the only view. But it is through this view that we can connect what we do in machine learning to every other computational science, whether that be in stochastic optimisation, control theory, operations research, econometrics, information theory, statistical physics or bio-statistics. For this reason alone, mastery of probabilistic thinking is essential." To that end, we begin by briefly reviewing some concepts in probabilistic signal processing based on models, and then turn to recasting such problems as data-driven learning problems.

Preliminaries on model-based probabilistic inference
Let us consider a general linear model

y = Ax + n,   (1)

where y is our observed signal, A a measurement matrix, n a noise vector, and x the signal of interest. As we shall see throughout the paper, many problems in ultrasound signal processing can be described by such linear models. In ultrasound beamforming, for example, y may denote the measured (noisy) RF signals, x the spatial tissue reflectivity, and A a matrix that transforms such a reflectivity map to channel-domain signals. The goal of beamforming is then to infer x from y, under the measurement model in (1).
Recalling Bayes' rule, we can define the posterior probability of x given y as a product of the likelihood p(y|x) and a prior p(x), such that

p(x|y) = p(y|x) p(x) / p(y) ∝ p(y|x) p(x).   (3)

Following (3), we can define a maximum a posteriori (MAP) estimator for (1), given by

x̂_MAP = argmax_x p(x|y) = argmax_x [log p(y|x) + log p(x)],   (4)

which provides a single, most likely, estimate according to the posterior distribution. If we assume a white Gaussian noise vector n in (1), i.e.
y ∼ N(Ax, σ_n² I), the MAP estimator becomes

x̂_MAP = argmin_x [‖y − Ax‖₂² − λ log p(x)],   (5)

where λ is a scalar regularization parameter (absorbing the noise power σ_n²).
Evidently, the MAP estimator takes the prior density function p(x) into account. In other words, it allows us to incorporate and exploit prior information on x, should this be available. Conversely, if x is assumed to be deterministic but unknown, we get the maximum likelihood (ML) estimator.
The ML estimator thus implicitly assigns equal prior likelihood to each x. As such, the estimator simplifies to

x̂_ML = argmax_x log p(y|x) = argmin_x ‖y − Ax‖₂².

Many traditional ultrasound processing methods are of this form, where the output depends only on a set of (finely tuned) hyper-parameters and the input data. This is not surprising, as deriving a strong and useful prior that generalizes well to the entire expected data distribution is challenging in its own right.
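As a concrete numerical sketch of these two estimators (all dimensions, values, and variable names here are illustrative choices of ours, not tied to any particular ultrasound setup), the ML estimate of the linear model reduces to plain least squares, while a zero-mean Gaussian prior on x turns the MAP estimate into a ridge/Tikhonov solution:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions: M measurements of an N-dimensional signal.
M, N = 64, 32
A = rng.standard_normal((M, N))          # measurement matrix
x_true = rng.standard_normal(N)          # signal of interest
sigma_n = 0.1
y = A @ x_true + sigma_n * rng.standard_normal(M)   # observed signal

# ML estimate: no prior on x, plain least squares.
x_ml = np.linalg.lstsq(A, y, rcond=None)[0]

# MAP estimate with a Gaussian prior x ~ N(0, sigma_x^2 I): the negative
# log posterior becomes ||y - Ax||^2 + lam*||x||^2, lam = sigma_n^2/sigma_x^2,
# whose minimizer is the ridge (Tikhonov) solution below.
sigma_x = 1.0
lam = sigma_n ** 2 / sigma_x ** 2
x_map = np.linalg.solve(A.T @ A + lam * np.eye(N), A.T @ y)
```

Here the ratio σ_n²/σ_x² plays the role of the regularization parameter λ: a stronger prior (smaller σ_x) pulls the estimate further towards zero.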
Data-driven approaches aim to overcome the challenges of accurate modeling by learning the likelihood function, the prior, the entire posterior, or a direct end-to-end mapping (replacing the complete MAP estimator) from data. We will detail on these methods in the following section.

Preliminaries on deep-learning-based inference
Fully data-driven methods aim at learning the optimal parameters θ* of a generic parameterized mapping f_θ(·): Y → X from training data. In deep learning, the mapping function f_θ(·) is a deep neural network. Learning itself can also be formulated as a probabilistic inference problem, where optimized parameter settings for a fixed network architecture are inferred from a dataset D. To that end we define a posterior over the parameters:

p(θ|D) ∝ p(D|θ) p(θ),

where p(θ) denotes a prior over the parameters. Often p(θ) is fully factorized, i.e. each parameter is assumed independent, to keep the learning problem in deep networks (with millions of parameters) tractable. Typical priors are Gaussian or Laplacian density functions.
Most deep learning applications rely on MAP estimation to find the set of parameters that minimizes the negative log posterior:

θ* = argmin_θ [−log p(D|θ) − log p(θ)].

Note that for measurement (input) and signal (output) training pairs (y_i, x_i) ∈ D, common forms of p(x|f_θ(y), θ) are Gaussian, Laplacian, or categorical distributions, resulting in mean-squared-error, mean-absolute-error, and cross-entropy negative log-likelihood functions, respectively. Similarly, Gaussian and Laplacian priors lead to ℓ2 and ℓ1 regularization on the parameters, respectively. It is worth noting that while most deep learning applications perform MAP estimation, there is increasing interest in so-called Bayesian deep learning, which aims at learning the parameters of the prior distribution p(θ) as well. This enables posterior sampling during inference (by sampling from p(θ)) for (epistemic) uncertainty estimation. Again, these distributions are often fully factorized (e.g. independent Gaussian or Bernoulli) to make the problem tractable (Gal and Ghahramani, 2016).
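The correspondence between likelihood/prior choices and common loss/regularization terms can be made concrete with a toy example: a linear "network" trained by gradient descent on the negative log posterior, where the Gaussian likelihood contributes an MSE term and a Gaussian prior on θ contributes ℓ2 weight decay. All names and values below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy dataset of (measurement, signal) pairs; the "network" is linear,
# x_hat = Y @ theta, purely for illustration.
Y = rng.standard_normal((200, 4))                     # inputs y_i
theta_true = np.array([1.0, -2.0, 0.5, 3.0])
X = Y @ theta_true + 0.05 * rng.standard_normal(200)  # targets x_i

lam = 1e-3   # l2 weight decay, i.e. a zero-mean Gaussian prior on theta
lr = 0.05
theta = np.zeros(4)
for _ in range(500):
    resid = Y @ theta - X
    # Gradient of the negative log posterior: the Gaussian likelihood
    # yields the MSE term, the Gaussian prior the l2 (weight-decay) term.
    grad = Y.T @ resid / len(X) + lam * theta
    theta -= lr * grad
```

Swapping the Gaussian likelihood for a Laplacian one would replace the MSE term with a mean-absolute-error term, mirroring the correspondences listed above.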
After training (i.e. inferring parameter settings), we can use the network to perform MAP inference to retrieve x from new input measurements y:

x̂ = argmax_x p(x|f_θ(y)).

The neural network thus directly models the parameters of the posterior, and does not factorize it into a likelihood and prior term as model-based MAP inference does. Note that for Gaussian and Laplace density functions, the network output f_θ(y) computes the distribution mean, so that the MAP estimate is simply x̂ = f_θ(y). For categorical distributions, f_θ(y) computes the probabilities for each category/class.
Typical deep neural network parameterizations f_θ(·) are therefore model-agnostic, as they disregard the structure of the measurement/likelihood model and prior, and offer a high degree of flexibility to fit many data distributions and problems. However, many such parameterizations do exploit specific symmetries in the expected input data. Examples are convolutional neural networks, which exploit the spatially shift-invariant structure of many image classification/regression problems through shift-equivariant convolutional layers. Similarly, in many applications where the input is temporally correlated, such as time-series analysis, recurrent neural networks (RNNs) are employed.

Preliminaries on model-based deep learning
Model-based DL aims at imposing much more structure on the network architectures and parameterizations of f_θ(·). Where standard deep networks aim to fit a broad class of problems, model-based DL offers architectures that are highly tailored to specific inference problems of the form given in (1) and (4), i.e. they are aware of the model and structure of the problem. This promises to relax challenges related to generalization, robustness, and interpretability in deep learning. It often also enables designing smaller (but more specialized) networks with a lower computational and memory footprint.
To derive a model-based DL method, one can start by deriving a MAP estimator for x from the model, including assumptions on likelihood models p(y|x) and priors p(x). Generally, such estimators come in two forms: analytic (direct) and iterative solutions. The solution structure dictates the neural network architecture. One then has to select which parts of the original model-based algorithm to keep, and which parts to replace by learned components. A typical example is the proximal gradient scheme for MAP estimation. This iterative solution consists of two alternating steps: 1) a gradient step on x that maximizes the log-likelihood log p(y|x), and 2) a proximal step that promotes solutions consistent with the prior p(x). Also within the field of US imaging and signal processing, model-based DL is seeing increasing adoption for problems spanning from beamforming to clutter suppression (Solomon et al., 2019a) and localization microscopy. Exact implementations of these model-based DL methods for US imaging are indeed highly application-specific (which is their merit), as we will discuss in a later section.

Fundamentals of US acquisition
Ultrasound imaging is based on the pulse-echo principle. First, a pressure pulse is transmitted towards a region of interest by the US transducer consisting of multiple transducer elements. Within the medium, scattering occurs due to inhomogeneities in density, speed-of-sound and non-linear behavior.
The resulting back-scattered echoes are recorded using the same transducer, yielding a set of radio-frequency (RF) channel signals that can be processed.
Typical ultrasound signal processing includes B-mode image reconstruction via beamforming, velocity estimation (Doppler), and additional downstream post-processing and analysis.
Although the focus of this paper lies on these processing methods, which we will discuss in later chapters, we will for the sake of completeness briefly review the basic principles of ultrasound channel signal acquisition.

Transmit schemes
Consider an ultrasound transducer with channels c = 1, ..., C. A transmit scheme consists of a series of transmit events e = 1, ..., E. Different transmit events can be constructed by adjusting the per-channel transmit delays (focusing), the number of active channels (aperture), and, in advanced modes, also waveform parameters. We briefly list the most common transmit schemes.

Line scanning
Most commercial ultrasound devices rely on focused, line-by-line, acquisition schemes, as it yields superior resolution and contrast compared to unfocused strategies. In line scanning, a subaperture of channels focuses the acoustic energy by channel-dependent transmit delays along a single (axial) path at a set depth, maximizing the reflected echo intensity in a region-of-interest (Ding et al., 2014). Some transmit schemes make use of multiple foci per line. To cover the full lateral field of view, many scan lines are needed, limiting the overall frame rate.

Synthetic aperture
In synthetic aperture (SA) imaging, each channel transmit-receive pair is acquired separately (Ylitalo and Ermert, 1994; Jensen et al., 2006). To that end, each element independently fires a spherical wavefront, of which the reflections can be simultaneously recorded by all receiving elements. Typically, the number of transmit events is equal to the number of transducer elements (E = C). Having access to these individual transmit-receive pairs enables retrospective transmit focusing to an arbitrary set of foci (e.g. each pixel).
While SA imaging offers advantages in terms of receive processing, it is time consuming, similar to line scanning. Furthermore, single elements generate low acoustic energy, which reduces the SNR.

Plane-and Diverging wave
Recently, unfocused (parallel) acquisition schemes have become more popular, since they can drastically reduce acquisition times, yielding so-called ultrafast imaging at very high frame rates. Plane-wave (PW) imaging insonifies the entire region of interest at once through a planar wave field, by firing with all elements and placing the axial focus point at infinity. Diverging-wave (DW) transmissions also insonify the entire region of interest in one shot, but generate a spherical (diverging) wavefront by placing a (virtual) focus point behind the transducer array. Especially for small transducer footprints (e.g. phased-array probes), DW schemes are useful to cover a large image region.
Both PW and DW imaging suffer from deteriorated resolution and low contrast (high clutter) due to strong interference by scattering from all directions. Often, multiple transmits at different angles are therefore compounded to boost image quality. However, this reduces frame rate. Unfocused transmissions rely heavily on the powerful receive processing to yield an image of sufficient quality, raising computational requirements.

Doppler
Beyond positional information, ultrasound also permits the measurement of velocities, useful in the context of e.g. blood flow imaging or tissue motion estimation. This imaging mode, called Doppler imaging (Chan and Perlas, 2011; Routh, 1996; Hamelmann et al., 2019), often requires dedicated transmit schemes with multiple high-rate sequential acquisitions. Continuous-wave Doppler allows for simultaneous transmission and reception of acoustic waves, using separate sub-apertures. While this yields a high temporal sampling rate, and prevents aliasing, it does result in some spatial ambiguity: the entire region of overlap between the transmit and receive beams contributes to the velocity estimate. Alternatively, pulsed-wave Doppler relies on a series of snapshots of the slow-time signal, with the temporal sampling rate being equal to the frame rate. From these measurements, a more confined region-of-interest can be selected for improved position information, at the cost of possible aliasing.
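The pulsed-wave principle can be illustrated with a toy slow-time signal from a single range gate, where the axial velocity appears as a per-event phase rotation; we estimate it here with the classic lag-one autocorrelation (Kasai) phase estimator. All parameter values below are illustrative choices of ours:

```python
import numpy as np

c = 1540.0      # speed of sound in soft tissue [m/s]
f0 = 5e6        # transmit center frequency [Hz]
prf = 4e3       # pulse repetition frequency [Hz] (slow-time sampling rate)
v_true = 0.2    # axial scatterer velocity [m/s]

# Synthetic slow-time signal: the Doppler shift appears as a phase
# rotation of 2*pi*f_d/prf per transmit event.
f_d = 2 * v_true * f0 / c                  # Doppler frequency [Hz]
n = np.arange(64)
s = np.exp(2j * np.pi * f_d * n / prf)

# Lag-one autocorrelation phase -> mean Doppler frequency -> velocity.
r1 = np.sum(s[1:] * np.conj(s[:-1]))
f_est = np.angle(r1) * prf / (2 * np.pi)
v_est = f_est * c / (2 * f0)
```

Note that the estimator is only unambiguous for |f_d| < prf/2; larger velocities alias, exactly as described above.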

Waveform and frequency
The resolution that can be obtained with ultrasound depends to a large extent on the frequency of the transmitted pulse. High transmit pulse frequencies and short pulse durations yield high spatial resolution, but are strongly affected by attenuation. This becomes especially problematic in deep tissue regions. As a general rule, the smallest measurable structures scale to approximately half the wavelength of the transmitted pulse, i.e. the diffraction limit. In practice, the transmit pulse spans multiple wavelengths, which additionally limits the axial resolution to half the transmit pulse length. Design choices such as the transducer array aperture, element sensitivity, bandwidth of the front-end circuitry, and reconstruction algorithms also play a dominant role.
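These scaling rules amount to simple arithmetic. The numbers below (a 5 MHz, three-cycle pulse in soft tissue) are illustrative, not tied to any specific probe:

```python
# Rough resolution arithmetic for a pulse-echo system.
c = 1540.0          # speed of sound in soft tissue [m/s]
f0 = 5e6            # transmit center frequency [Hz]

wavelength = c / f0                 # ~0.31 mm
diffraction_limit = wavelength / 2  # smallest resolvable structure

n_cycles = 3                        # transmit pulse length in cycles
axial_res = n_cycles * wavelength / 2   # half the transmit pulse length

half_wave_pitch = wavelength / 2    # element pitch avoiding grating lobes
```

At 5 MHz this gives a wavelength of about 0.31 mm and an axial resolution of about 0.46 mm; doubling f0 halves both, at the cost of stronger attenuation.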

Array designs
Depending on the application, different transducer types may be preferred, either due to physical constraints, or because of desirable imaging properties.
Commonly used transducer geometries include linear, convex, and phased arrays. Effectively, the transducer array, consisting of individual elements, spatially samples the array response. Typically, these array elements have a center-to-center spacing (pitch) of λ/2 or less, in order to avoid spatial aliasing. In general, a higher number of elements yields a better resolution image, but this consequently increases size, complexity, and bandwidth requirements.
Especially for 2D arrays (used in 3D imaging), the high number of transducer elements can be problematic in implementation due to the vast number of physical hardware connections. Besides translating to an increase in cost and complexity, it also raises power consumption. In those cases, some form of micro-beamforming is often applied in the front-end, combining individual channel signals early in the signal chain.
Similar reductions in data rates can be achieved through sub-sampling of the receive channels. Trivial approaches include uniform or random sub-sampling, at the cost of reduced resolution and more pronounced aliasing artifacts (grating lobes). Several works have shown that these effects can be mitigated either by principled array designs (Cohen and Eldar, 2020; Song et al., 2020), or by learning sub-sampling patterns from data in a task-adaptive fashion (Huijben et al., 2020).

Sub-Nyquist signal sampling
Digital signal processing of US signals requires sampling of the signals received by the transducer, after which the digital signal is transferred to the processing unit. To prevent frequency-aliasing artifacts, sampling at or above the Nyquist rate is necessary. In practice, sampling rates of 4-10 times the Nyquist rate are common, as this allows for finer resolution during digital focusing. As a consequence, this leads to high-bandwidth data streams, which become especially problematic for large transducer arrays (e.g. 3D probes).
Compressed sensing (CS) provides a framework that allows for reduced data rates, by sampling below the Nyquist limit, alleviating the burden on data transfer (Eldar, 2015). CS acquisition methods provide strong signal recovery guarantees when complemented with advanced processing methods for reconstruction of the signal of interest. These reconstruction methods are typically based on MAP estimation, combining likelihood models on the measured data (i.e. a measurement matrix), with priors on signal structure (e.g. sparsity in some basis). Many of the signal processing algorithms that we will list throughout the paper will find application within a CS context, especially those methods that introduce a signal prior for reconstruction, either through models or by learning from data. The latter is especially useful for elaborate tasks where little is known about the distribution of system parameters, offering signal reconstruction beyond what is possible using conventional CS methods.
For further reading into the fundamentals of ultrasound, the reader may refer to works such as Brahme (2014).

Model-based US signal processing
Model-based ultrasound signal processing techniques are based on first principles such as the underlying physics of the imaging setup or knowledge of the statistical structure of the signals. We will now describe some of the most commonly used model-based ultrasound signal processing techniques, building upon the probabilistic perspective sketched in earlier sections. For each algorithm we will explicitly list 1) inputs and outputs (and dimensions), 2) the assumed signal model and statistics, 3) signal priors, and 4) the resulting ML/MAP objective and solution.
Beamforming, the act of reconstructing an image from the received raw RF channel signals, is central to ultrasound imaging and typically the first step in the signal processing pipeline. We will thus start our description with beamforming methods.

Beamforming
Given an ultrasound acquisition of C transducer channels, N_t axial samples, and E transmission events, we can denote Y ∈ R^{E×C×N_t} as the recorded RF data cube, representing back-scattered echoes from each transmission event.
With beamforming, we aim to transform the raw aperture-domain signals Y to the spatial domain, through a processing function f(·) such that

X̂ = f(Y),

where X̂ represents the data beamformed to a set of focus points S_r. As an example, in pixel-based beamforming, these focus points could be a pixel grid such that S_r ∈ R^{r_x×r_z}, where r_x and r_z represent the lateral and axial components of the vector indicating the pixel coordinates, respectively. Note that, while this example is given in cartesian coordinates, beamforming to other coordinate systems (e.g. polar coordinates) is also common.

Delay-and-sum beamforming
Delay-and-sum (DAS) beamforming has been the backbone of ultrasound image reconstruction for decades. This is mainly driven by its low computational complexity, which allows for real-time processing and efficient hardware implementations. DAS beamforming aims at aligning the received signals for a set of focus points (in pixel-based beamforming: pixels) by applying time-delays. We can define the total time-of-flight (TOF) from transmission to the receiving element as

τ_r = (‖r − r_e‖ + ‖r − r_c‖) / v,   (13)

where τ_r is the required channel delay to focus to an imaging point r, the vectors r_e and r_c correspond to the origin of the transmit event e and the position of element c, respectively, and v is the speed of sound in the medium. Note that the speed of sound is generally assumed to be constant throughout the medium. As a consequence, speed-of-sound variations can cause misalignment of the channel signals, and result in aberration errors.
After TOF correction, we obtain a channel vector y_r per pixel r, for which we can define a linear forward model to recover the pixel reflectivity x_r:

y_r = 1 x_r + n_r,   (14)

where y_r ∈ R^C is a vector containing the received aperture signals, x_r ∈ R the tissue reflectivity at a single focus point r, 1 a vector of ones, and n_r ∈ R^C an additive Gaussian noise vector ∼ N(0, σ_n² I). In this simplified model, all interference (e.g. clutter, off-axis scattering, thermal noise) is contained in n_r. Note that (without loss of generality) we assume a real-valued array response in our analysis, which can be straightforwardly extended to complex values (e.g. after in-phase and quadrature demodulation). Under the Gaussian noise model, (14) yields the following likelihood model for the channel vector:

p(y_r|x_r) = N(1 x_r, σ_n² I),   (15)

where σ_n² denotes the noise power. The delay-and-sum beamformer is the per-pixel ML estimator of the tissue reflectivity, x̂_r, given by

x̂_r = argmax_{x_r} log p(y_r|x_r)   (16)
    = argmin_{x_r} ‖y_r − 1 x_r‖₂².   (17)

Solving (17) yields

x̂_r = (1/C) 1^H y_r = (1/C) Σ_{c=1}^{C} y_{r,c},   (18)

where C is the number of array elements. In practice, apodization/tapering weights are included to suppress sidelobes:

x̂_r = w^H y_r = Σ_{c=1}^{C} w_c y_{r,c}.   (19)

This form can be recognized as the standard definition of DAS beamforming, in which the channel signals are weighed using an apodization function, w, and subsequently summed to yield a beamformed signal.
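The TOF correction and weighted sum described above can be sketched in a few lines of numpy. The example below performs pixel-based DAS for a single zero-degree plane-wave transmit with nearest-sample delays; the function name, array layout, and parameter values are illustrative choices of ours, not from any specific system:

```python
import numpy as np

def das_beamform(rf, elem_x, grid_x, grid_z, fs, c=1540.0):
    """Minimal pixel-based DAS for a single 0-degree plane-wave transmit.

    rf      : (C, Nt) raw channel data
    elem_x  : (C,) lateral element positions [m]
    grid_x, grid_z : pixel coordinates [m]
    fs      : RF sampling rate [Hz]
    """
    C, Nt = rf.shape
    w = np.hanning(C)                        # apodization/tapering weights
    img = np.zeros((len(grid_z), len(grid_x)))
    for iz, z in enumerate(grid_z):
        for ix, x in enumerate(grid_x):
            # Two-way time of flight: plane-wave transmit path (depth z)
            # plus the return path from the pixel to each element.
            tof = (z + np.sqrt(z ** 2 + (x - elem_x) ** 2)) / c
            idx = np.clip(np.round(tof * fs).astype(int), 0, Nt - 1)
            # Delay (nearest sample), weigh, and sum across the aperture.
            img[iz, ix] = np.sum(w * rf[np.arange(C), idx])
    return img
```

Practical implementations would use interpolated (fractional) delays and vectorized indexing, but the structure (delay, apodize, sum) is the same.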
Beamforming can equivalently be carried out in the Fourier domain, where the beamformed spectrum is obtained as a weighted combination of the channel signal spectra; here, Q_{k,c,r} are the Fourier coefficients of a distortion function derived from the beamforming delays at r, as in (13).
When not all Fourier coefficients are sampled (i.e. in sub-Nyquist acquisition), the desired time-domain signal can be recovered using CS methods such as NESTA (Becker et al., 2011), or via deep learning approaches.

Advanced adaptive Beamforming
The shortcomings of standard DAS beamforming have spurred the development of a wide range of adaptive beamforming algorithms. These methods aim to overcome some of the limitations that DAS faces, by adaptively tuning its processing based on the input signal statistics.

Minimum Variance
DAS beamforming is the ML solution of (14) under white Gaussian noise.
To improve realism for more structured noise sources, such as off-axis interference, we can introduce a colored (correlated) Gaussian noise profile n_r ∼ N(0, Γ_n), with Γ_n being the array noise covariance matrix for beamforming point r. Maximum (log-)likelihood estimation for x_r then yields:

x̂_r = argmin_{x_r} (y_r − 1 x_r)^H Γ_n⁻¹ (y_r − 1 x_r).   (22)

Setting the gradient of the argument in (22) with respect to x_r equal to zero gives:

x̂_r = (1^H Γ_n⁻¹ y_r) / (1^H Γ_n⁻¹ 1).   (25)

It can be shown that the solution (25) can also be obtained by minimizing the total output power (or variance), while maintaining unity gain in a desired direction (the look direction):

w_MV = argmin_w w^H Γ_n w, subject to w^H 1 = 1.   (26)

Solving (26) yields the closed-form solution

w_MV = Γ_n⁻¹ 1 / (1^H Γ_n⁻¹ 1),   (27)

which is known as Minimum Variance (MV) or Capon beamforming.
In practice, the noise covariance is unknown, and is instead empirically estimated from the data (Γ̂_n = E[y_r y_r^H]). For stability of the covariance matrix inversion, this estimation often relies on averaging over multiple sub-apertures and focus points, or on adding a constant factor to the diagonal of the covariance matrix (diagonal loading). Note that for Γ_n = σ_n² I (white Gaussian noise), we recover the DAS solution as in (18).
Minimum Variance beamforming was shown to improve both resolution and contrast in ultrasound images, and has similarly found application in plane wave compounding (Austeng et al., 2011). However, it is computationally complex due to the inversion of the covariance matrix (Raz, 2002), leading to significantly longer reconstruction times compared to DAS. To boost image quality further, eigen-space based MV beamforming has been proposed (Deylami et al., 2016), at the expense of further increasing computational complexity. As a result of this, real-time implementations remain challenging, to an extent that MV beamforming is almost exclusively used as a research tool.
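A minimal numpy sketch of MV weight estimation from channel snapshots, including the empirical covariance and diagonal loading described above (function name and loading factor are illustrative choices of ours):

```python
import numpy as np

def mv_weights(y_snapshots, diag_load=1e-2):
    """Minimum-variance (Capon) weights from TOF-corrected channel data.

    y_snapshots : (C, K) array of K channel-vector snapshots
    Returns unity-gain weights w = Gamma^-1 1 / (1^H Gamma^-1 1).
    """
    C, K = y_snapshots.shape
    # Empirical covariance estimate from the snapshots.
    Gamma = y_snapshots @ y_snapshots.conj().T / K
    # Diagonal loading for a stable inversion.
    Gamma = Gamma + diag_load * np.trace(Gamma).real / C * np.eye(C)
    ones = np.ones(C)
    g = np.linalg.solve(Gamma, ones)
    return g / (ones @ g)
```

For white-noise snapshots the empirical covariance approaches a scaled identity, and the weights reduce to the uniform DAS weights 1/C, mirroring the observation above.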

Wiener beamforming
In the previously covered methods, we have considered the ML estimate of x̂_r. Following (4), we can extend this by including a prior probability distribution p(x_r), such that

x̂_r = argmax_{x_r} [log p(y_r|x_r) + log p(x_r)].   (28)

For a Gaussian likelihood model, the solution to this MAP estimate is equivalent to minimizing the mean-squared-error, such that

x̂_r = w_W^H y_r, with w_W = argmin_w E[|x_r − w^H y_r|²],   (29)

also known as Wiener beamforming (Van Trees, 2004). Solving this yields

w_W = (σ_x² / (σ_x² + w_MV^H Γ_n w_MV)) w_MV,   (30)

with Γ_n being the array (noise) covariance for beamforming point r, and w_MV the MV beamforming weights given by (27). Wiener beamforming is therefore equivalent to MV beamforming followed by a scaling factor based on the ratio between the signal power and the total power of the output signal, which can be referred to as post-filtering. Based on this result, Nilsen and Holm (2010) observe that for any w that satisfies w^H 1 = 1 (unity gain), we can find a Wiener post-filter that minimizes the MSE of the estimated signal. As such, we can write

H_w = σ_x² / (σ_x² + w^H Γ_n w),   (31)
x̂_r = H_w w^H y_r.   (32)

Assuming white Gaussian noise (Γ_n = σ_n² I) and x_r ∼ N(0, σ_x²), the Wiener beamformer is equivalent to Wiener post-filtering for DAS, given by:

x̂_r = (σ_x² / (σ_x² + σ_n²/C)) (1/C) 1^H y_r.   (33)

Coherence Factor weighing

The Coherence Factor (CF) (Mallart and Fink, 1994) aims to quantify the coherence of the back-scattered echoes in order to improve image quality through scaling with a so-called coherence factor, defined as

CF = |Σ_{c=1}^{C} y_c|² / (C Σ_{c=1}^{C} |y_c|²),   (34)

where C denotes the number of channels. Effectively, this operates as a post-filter, after beamforming, based on the ratio of coherent and incoherent energy across the array. As such, it can suppress focusing errors that may occur due to speed-of-sound inhomogeneity, given by

x̂_CF = CF · (1/C) 1^H y_r.   (35)

The CF has been reported to significantly improve contrast, especially in regions affected by phase distortions. However, it also suffers from reduced brightness and speckle degradation. An explanation for this can be found when comparing (35) with the Wiener post-filter for DAS in (33).
We can see that CF weighing is in fact a Wiener post-filter where the noise is scaled by a factor C, leading to a stronger suppression of interference, but consequently also reducing brightness. Several derivations of the CF have been proposed to overcome some of these limitations, or to further improve image quality, such as the Generalized CF (Pai-Chi Li and Meng-Lin Li, 2003), and Phase Coherence factor (Camacho et al., 2009).
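As an illustration, the CF post-filter is inexpensive to compute from TOF-corrected channel data. A minimal NumPy sketch on toy data (the channel counts and signals are assumed):

```python
import numpy as np

def coherence_factor(y):
    """Coherence factor per fast-time sample.

    y : (C, N) TOF-corrected channel data (C channels, N samples)
    """
    C = y.shape[0]
    coherent = np.abs(y.sum(axis=0)) ** 2            # |sum_c y_c|^2
    incoherent = C * (np.abs(y) ** 2).sum(axis=0)    # C * sum_c |y_c|^2
    return coherent / (incoherent + 1e-12)

rng = np.random.default_rng(1)
C, N = 16, 128
signal = np.ones((1, N))                             # perfectly coherent component
noise = rng.standard_normal((C, N))                  # incoherent interference
cf_clean = coherence_factor(np.repeat(signal, C, axis=0))
cf_noisy = coherence_factor(noise)
x_das = np.repeat(signal, C, 0).mean(axis=0)
x_cf = cf_clean * x_das                              # CF-weighted DAS output
```

Fully coherent data gives CF ≈ 1 (the output is untouched), while incoherent noise gives CF ≈ 1/C, illustrating the strong interference suppression, and the brightness loss, described above.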

Iterative MAP beamforming
Chernyakova et al. (2019) propose an iterative maximum a posteriori (iMAP) estimator, which provides a statistical interpretation of post-filtering.
The iMAP estimator assumes knowledge of the received signal model, and treats the signal of interest and the interference as uncorrelated Gaussian random variables with variances σ_x² and σ_n², respectively. Given the likelihood model in (15), and x ∼ N(0, σ_x²), the MAP estimator of x is given by

x̂_MAP = σ_x² / (C σ_x² + σ_n²) · 1^H y.  (37)

However, the parameters σ_x² and σ_n² are unknown in practice. Instead, they can be estimated from the data at hand, leading to an iterative solution. First, estimates of the signal and noise variances are calculated as

σ̂_x²(t) = |x̂^(t)|²,  σ̂_n²(t) = (1/C) ∥y − 1 x̂^(t)∥²,  (38)

initializing with the DAS estimate x̂^(0) = (1/C) 1^H y. Following (4) and (37), a MAP estimate of the beamformed signal is then obtained by re-evaluating (37) with these variance estimates, where t is an index denoting the iteration. Equations (37) and (38) are alternated until convergence, or for a fixed number of iterations.

A related model-based approach is aperture-domain model image reconstruction (ADMIRE), which fits a model of on-axis and off-axis (clutter) contributions to short-time windows of the channel data through a regularized regression of the form

β̂ = arg min_β ∥y − Aβ∥₂² + λ (α ∥β∥₁ + ((1 − α)/2) ∥β∥₂²),  (39)

where α and λ are regularization parameters. This particular form of regularization is also called elastic-net regularization. ADMIRE shows a significant reduction in clutter due to multi-path scattering and reverberation, resulting in a 10-20 dB improvement in CNR.
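Returning to iMAP: a minimal sketch of iterations (37)-(38) for a single pixel. The toy data and iteration count below are assumptions for illustration, not taken from the cited work.

```python
import numpy as np

def imap(y, n_iter=2):
    """iMAP post-filter for one pixel, following (37)-(38).

    y : (C,) TOF-corrected channel samples for a single pixel
    """
    C = y.shape[0]
    x = y.mean()                                      # DAS initialization x^(0)
    for _ in range(n_iter):
        var_x = np.abs(x) ** 2                        # signal-variance estimate
        var_n = np.sum(np.abs(y - x) ** 2) / C        # noise-variance estimate
        x = var_x * y.sum() / (C * var_x + var_n + 1e-20)  # MAP update (37)
    return x

rng = np.random.default_rng(2)
C = 32
y_sig = 1.0 + 0.1 * rng.standard_normal(C)            # strongly coherent pixel
y_noise = rng.standard_normal(C)                      # interference-only pixel
x_sig, x_noise = imap(y_sig), imap(y_noise)
```

The coherent pixel is preserved (the shrinkage factor is close to one), while the incoherent pixel is strongly suppressed relative to its DAS value.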

Sparse coding
Chernyakova and Eldar (2014) propose to formulate the beamforming process as a line-by-line recovery of back-scatter intensities from (potentially undersampled) Fourier coefficients. Denoting the axial fast-time intensities by x ∈ R^N, and the noisy measured DFT coefficients of a scan line by ỹ ∈ R^M, with M ≤ N, we can formulate the following linear measurement model:

ỹ = Ax + n,

where A comprises the selected (sub-sampled) rows of the DFT matrix. Recovery of x (assuming it is sparse) can again be posed as a MAP estimation problem:

x̂ = arg min_x ∥ỹ − Ax∥₂² + λ ∥x∥₁,  (41)

where λ is a regularization parameter. Problem (41) can be solved using the Iterative Shrinkage and Thresholding Algorithm (ISTA), a proximal gradient method that iterates

x̂^(k+1) = τ_λ(x̂^(k) + μ A^H (ỹ − A x̂^(k))),

where τ_λ(x_i) = sgn(x_i)(|x_i| − λ)_+ is the proximal operator of the ℓ1 norm, μ is the gradient step size, and (·)^H denotes the Hermitian, or conjugate, transpose.
It is interesting to note that the first ISTA step (starting from x̂^(0) = 0), given by x̂^(1) = τ_λ(μ A^H ỹ), maps ỹ back to the axial/fast-time domain through the zero-filled inverse DFT.
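The ISTA recursion above is only a few lines of code. A self-contained NumPy sketch on a toy real-valued sparse recovery problem (the operator A, sparsity level, and parameters are assumptions for illustration):

```python
import numpy as np

def ista(A, y, lam=0.1, mu=None, n_iter=200):
    """ISTA for min_x ||y - Ax||^2_2 + lam * ||x||_1 (real-valued toy)."""
    if mu is None:
        mu = 1.0 / np.linalg.norm(A, 2) ** 2              # step size 1/L
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        g = x + mu * A.T @ (y - A @ x)                    # gradient step
        x = np.sign(g) * np.maximum(np.abs(g) - mu * lam, 0.0)  # soft threshold
    return x

rng = np.random.default_rng(3)
M, N = 64, 128
A = rng.standard_normal((M, N)) / np.sqrt(M)              # compressed measurement operator
x_true = np.zeros(N); x_true[[5, 40, 90]] = [1.0, -2.0, 1.5]  # sparse scene
y = A @ x_true + 0.01 * rng.standard_normal(M)
x_hat = ista(A, y, lam=0.05)
```

Despite M < N, the sparse scene is recovered up to the usual small soft-thresholding bias on each non-zero coefficient.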

Wavefield inversion
The previously described beamforming methods all build upon measurement models that treat pixels or scan lines (or, for ADMIRE, short-time windows) independently. As a result, the complex interaction of contributions and interference across the full lateral field of view is not explicitly modeled, and is often approximated through some noise model. To that end, several works explore reconstruction methods that model the joint signal across the full field of view, with its intricate behavior, at the cost of a higher computational footprint.
Such methods typically rely on some form of "wavefield inversion", i.e.
inverting the physical wave propagation model. One option is to pose beamforming as a MAP optimization problem through a likelihood model that relates the per-pixel back-scatter intensities to the channel signals (Szasz et al., 2016b,a; Ozkan et al., 2017), combined with some prior/regularization term on the statistics of the spatial distribution of back-scatter intensities in anatomical images. Based on the time delays given by (13) (and the Green's function of the wave equation), one can again formulate our typical linear forward model:

y = Ax + n,

where x ∈ R^{r_x r_z} is a vector of beamformed data (over an r_x × r_z pixel grid), n ∈ R^{C N_t} is an additive white Gaussian noise vector, and y ∈ R^{C N_t} is the received channel data. The space-time mapping is encoded in the sparse matrix A ∈ R^{C N_t × r_x r_z}.
Solving this system of equations relies heavily on priors to yield a unique and anatomically feasible solution, and yields the following MAP optimization problem:

x̂ = arg min_x ∥y − Ax∥₂² − log p_θ(x),

where log p_θ(x) acts as a regularizer, with parameters θ (e.g. an ℓ1 norm to promote a sparse solution (Combettes and Wajs, 2005)). Ozkan et al. (2017) investigate several intuition- and physics-based regularizers and their effect on the beamformed image. The results show benefits in contrast and resolution for all proposed regularization methods, with each yielding different visual characteristics. This shows that choosing regularization terms and parameters that yield a robust beamformer can be challenging.

Post processing
After mapping the channel data to the image domain via beamforming, ultrasound systems apply several post processing steps. Classically, this includes further image processing to boost B-mode image quality (e.g. contrast, resolution, de-speckling), but also spatio-temporal processing to suppress tissue clutter and to estimate motion (e.g. blood flow). Beyond this, we see increasing attention for post-processing methods dedicated to advanced applications such as super-resolution ultrasound localization microscopy (ULM). We will now go over some of the model-based methods for post processing, covering B-mode image quality improvement, tissue clutter filtering, and ULM.

B-mode image quality improvement
Throughout the years, many B-mode image-quality boost algorithms have been proposed with aims that can be broadly categorized into: 1) resolution enhancement, 2) contrast enhancement, and 3) speckle suppression.
Although our focus lies on model-based methods (to recall: methods derived from models and first principles), it is worth noting that B-mode processing often also relies on heuristics to accommodate e.g. user preferences. These include fine-tuned brightness curves (S-curves) to improve perceived contrast.
A commonly used method to boost image quality is to coherently compound multiple transmissions with diverse transmit parameters. Often, a simple measurement model similar to that in DAS is assumed, where multiple transmissions are (after potential TOF alignment) assumed to measure the same tissue intensity for a given pixel, but with different Gaussian noise realizations. As for the Gaussian likelihood model in DAS, this then simply yields averaging of the individual measurements (e.g. different plane-wave angles, or frequencies). More advanced model-based compounding methods use MV weighting of the transmits, thus assuming a likelihood model where the multiple measurements have correlated noise:

x̂_r = arg max_{x_r} log p(y_r | x_r, Γ_r) = arg min_{x_r} (y_r − 1 x_r)^H Γ_r^{-1} (y_r − 1 x_r).

Note that here, unlike in MV beamforming, y_r is a vector containing the beamformed pixel intensities from multiple transmits/measurements (after TOF alignment), x̂_r is the compounded pixel, and Γ_r is the auto-correlation matrix across the series of transmits, which is to be estimated. Compounding can boost resolution and contrast, and suppress speckle. For speckle suppression specifically, patch-based denoising methods such as non-local means (NLM) are also widely used; a Bayesian interpretation of NLM has enabled ultrasound-specific implementations with more realistic (multiplicative) noise models (Coupé et al., 2008).
Other MAP approaches pose denoising as a dictionary matching problem (Jabarulla and Lee, 2018). These methods do not explicitly estimate patch density functions from the image, but instead learn a dictionary of patches.
To achieve a boost in image resolution, the problem can be recast as MAP estimation under a likelihood model that includes a deterministic blurring/point-spread-function matrix A_blur:

x̂ = arg max_x log p(y | x) + log p(x), with y = A_blur x + n,

where x is the (vectorized) high-resolution image to be recovered, and y the (vectorized) blurred observation corrupted by white Gaussian noise n. This deconvolution problem is ill-posed and requires adequate regularization via priors.
As we noted before, the log-prior term can take many forms, including ℓ1- or total-variation-based regularizers.
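For the special case of a Gaussian prior on x, the MAP deconvolution problem has a closed-form (Tikhonov/Wiener-type) solution that can be evaluated efficiently in the frequency domain when A_blur is circulant. A 1-D NumPy sketch with an assumed Gaussian PSF (the signal and parameters are illustrative):

```python
import numpy as np

def l2_deconvolve(y, psf, lam=1e-2):
    """Closed-form l2-regularized (Tikhonov) deconvolution via the FFT.

    Solves min_x ||y - h * x||^2 + lam * ||x||^2, assuming circular convolution.
    """
    H = np.fft.fft(psf, n=y.size)
    X = np.conj(H) * np.fft.fft(y) / (np.abs(H) ** 2 + lam)
    return np.real(np.fft.ifft(X))

# Toy example: blur two point scatterers with a Gaussian PSF
n = 256
x_true = np.zeros(n); x_true[[80, 100]] = 1.0
t = np.arange(n)
psf = np.exp(-0.5 * ((t - n // 2) / 4.0) ** 2)
psf = np.roll(psf / psf.sum(), -n // 2)               # centered, unit-gain PSF
y = np.real(np.fft.ifft(np.fft.fft(psf) * np.fft.fft(x_true)))  # blurred observation
x_hat = l2_deconvolve(y, psf, lam=1e-4)
```

The deconvolved output has visibly sharper peaks at the scatterer locations than the blurred observation; ℓ1 or total-variation priors, as mentioned above, require iterative solvers instead of this one-shot inversion.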
Clutter filtering for flow
Slow-moving tissue introduces a clutter signal that causes artefacts and obscures the feature of interest being imaged (be it blood velocity or e.g. contrast agents), and considerable effort has gone into suppressing this tissue clutter. Although Infinite Impulse Response (IIR) and Finite Impulse Response (FIR) filters have been the most commonly used filters for this task, it remains very difficult to separate the signals originating from slow-moving blood and fast-moving tissue with such temporal filters. Therefore, spatio-temporal clutter filtering is receiving increasing attention. We will here go over some of these more advanced methods (including singular value thresholding and robust principal component analysis), again taking a probabilistic MAP perspective.
We define the spatio-temporal measured signal as a Casorati matrix, Y ∈ R^{NM×T}, where N and M are the spatial dimensions and T is the time dimension, which we model as Y = X_tissue + X_blood, where X_tissue ∈ R^{NM×T} is the tissue component and X_blood ∈ R^{NM×T} is the blood/flow component. We then impose a prior on X_tissue, assuming it to be low rank. If we additionally assume X_blood to have i.i.d. Gaussian entries, the MAP estimation problem for the tissue clutter signal becomes:

X̂_tissue = arg min_{X_tissue} ∥Y − X_tissue∥_F² + λ ∥X_tissue∥_*,  (48)

where ∥·∥_F and ∥·∥_* denote the Frobenius norm and the nuclear norm, respectively. The solution to (48) is:

X̂_tissue = T_SVT,λ(Y),

where T_SVT,λ is the singular value thresholding function, which is the proximal operator of the nuclear norm (Cai et al., 2010).
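Singular value thresholding itself is a one-liner around the SVD. A toy sketch separating a synthetic low-rank "tissue" component from a dense low-amplitude residual (dimensions and threshold are assumed):

```python
import numpy as np

def svt(Y, lam):
    """Singular-value thresholding: prox of the nuclear norm (Cai et al., 2010)."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return U @ np.diag(np.maximum(s - lam, 0.0)) @ Vt

rng = np.random.default_rng(4)
NM, T = 400, 50
# Rank-2 "tissue" clutter plus a dense low-amplitude "blood" component
tissue = rng.standard_normal((NM, 2)) @ rng.standard_normal((2, T))
blood = 0.05 * rng.standard_normal((NM, T))
Y = tissue + blood
X_tissue = svt(Y, lam=5.0)
X_blood = Y - X_tissue          # clutter-suppressed flow estimate
```

Because the two tissue singular values are far above the threshold and the residual singular values far below it, the estimate is exactly rank 2 and close to the true clutter component.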
To improve upon the model in (48), one can include a more specific prior on the flow components, and separate them from the noise: Y = X_tissue + X_blood + N, where we place a mixed ℓ1/ℓ2 prior on the blood-flow component X_blood, and assume i.i.d. Gaussian entries in the noise matrix N, such that:

X̂_tissue, X̂_blood = arg min ∥Y − X_tissue − X_blood∥_F² + λ_1 ∥X_tissue∥_* + λ_2 ∥X_blood∥_{1,2},  (51)

where ∥·∥_{1,2} indicates the mixed ℓ1/ℓ2 norm. This low-rank-plus-sparse optimization problem is also termed Robust Principal Component Analysis (RPCA), and can be solved through an iterative proximal gradient method:

X̂_tissue^(k+1) = T_SVT,λ_1 (X̂_tissue^(k) + μ_1 (Y − X̂_tissue^(k) − X̂_blood^(k))),
X̂_blood^(k+1) = T_{1,2,λ_2} (X̂_blood^(k) + μ_2 (Y − X̂_tissue^(k) − X̂_blood^(k))),  (52)

where T_SVT,λ_1 is the solution of (48) (i.e. the proximal operator of the nuclear norm), T_{1,2,λ_2} is the mixed ℓ1-ℓ2 thresholding operation, and μ_1 and μ_2 are the gradient steps for the two terms. Shen et al. (2019) further augment the RPCA formulation to boost the resolution of the blood-flow estimates. To that end, they add a PSF-based convolution kernel to the blood component, A_r ⊛ X_blood, casting it as a joint deblurring and signal-separation problem.
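The RPCA iterations (52) can likewise be sketched directly. Below, a toy proximal-gradient loop separates a rank-1 "tissue" background from a few "flow" pixels; the step size, regularization weights, and data are illustrative assumptions, not tuned values from the literature.

```python
import numpy as np

def svt(X, lam):
    """Singular-value thresholding (prox of the nuclear norm)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - lam, 0.0)) @ Vt

def row_threshold(X, lam):
    """Mixed l1/l2 thresholding: soft-threshold each spatial row (pixel) by its l2 norm."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X * np.maximum(1.0 - lam / (norms + 1e-12), 0.0)

def rpca(Y, lam1, lam2, mu=0.45, n_iter=200):
    """Proximal-gradient RPCA: Y ~ low-rank tissue + row-sparse blood."""
    L = np.zeros_like(Y)
    S = np.zeros_like(Y)
    for _ in range(n_iter):
        R = Y - L - S                                 # data-consistency residual
        L = svt(L + mu * R, mu * lam1)                # low-rank (tissue) update
        S = row_threshold(S + mu * R, mu * lam2)      # row-sparse (blood) update
    return L, S

rng = np.random.default_rng(6)
NM, T = 200, 40
tissue = np.outer(np.ones(NM), 3.0 * np.ones(T))          # rank-1 clutter background
blood = np.zeros((NM, T))
blood[[10, 50, 90]] = 2.0 * rng.standard_normal((3, T))   # three flow pixels
Y = tissue + blood
L, S = rpca(Y, lam1=1.0, lam2=0.5)
flow_pixels = np.argsort(np.linalg.norm(S, axis=1))[-3:]  # strongest rows of S
```

The nuclear-norm prox pulls the shared background into L, while the row-wise threshold leaves only the genuine flow pixels in S.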

Ultrasound Localization Microscopy
We will now turn to an advanced and increasingly popular ultrasound signal processing application: ULM. Conventional ultrasound resolution is fundamentally limited by wave physics, to half the wavelength of the transmitted wave, i.e., the diffraction limit. This limit is in the range of millimeters for most ultrasound probes, and is inversely proportional to the transmission frequency. However, high transmit frequencies come at the cost of lower penetration depth.
To overcome this diffraction limit, ULM adapts concepts from Nobel-prize-winning super-resolution fluorescence microscopy to ultrasound. Instead of localizing fluorescent blinking molecules, ULM detects and localizes ultrasound contrast agents, microbubbles, flowing through the vascular bed.
These microbubbles have a size similar to red blood cells, and act as point scatterers. By accumulating precisely localized microbubbles across many frames, a super-resolution image of the vascular bed can be obtained. In typical implementations, the localization of the microbubbles is performed by centroid detection (Siepmann et al., 2011; Couture et al., 2011; Christensen-Jeffries et al., 2020).
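A minimal sketch of such centroid-based localization: detect peaks above a threshold and refine each to sub-pixel precision with a local center of mass. The window size, threshold, and synthetic Gaussian PSFs below are assumptions for illustration.

```python
import numpy as np

def localize_centroids(frame, threshold=0.5, win=2):
    """Localize isolated point scatterers by thresholded center-of-mass.

    frame : 2-D envelope image containing a few isolated PSFs
    Returns a list of sub-pixel (row, col) coordinates.
    """
    peaks = []
    F = frame.copy()
    while F.max() > threshold:
        r, c = np.unravel_index(np.argmax(F), F.shape)
        r0, r1 = max(r - win, 0), min(r + win + 1, F.shape[0])
        c0, c1 = max(c - win, 0), min(c + win + 1, F.shape[1])
        patch = frame[r0:r1, c0:c1]
        rr, cc = np.mgrid[r0:r1, c0:c1]
        m = patch.sum()
        peaks.append((float((rr * patch).sum() / m), float((cc * patch).sum() / m)))
        F[r0:r1, c0:c1] = 0.0                 # suppress this detection
    return peaks

# Two Gaussian PSFs at sub-pixel positions
yy, xx = np.mgrid[0:64, 0:64]
img = np.exp(-((yy - 20.3)**2 + (xx - 30.6)**2) / 4.0) \
    + np.exp(-((yy - 45.0)**2 + (xx - 10.0)**2) / 4.0)
locs = localize_centroids(img)
```

Both microbubbles are recovered to well within a pixel, illustrating how sub-diffraction localization precision is obtained from a diffraction-limited frame.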
Not surprisingly, we can also pose microbubble localization as a MAP estimation problem (Van Sloun et al., 2017). We define a sparse high-resolution image that is vectorized into x, in which only a few pixels have non-zero entries: those pixels that contain a microbubble. Our vectorized measurements can then be modeled as: y = Ax + n, where A is a PSF matrix and n is a white Gaussian noise vector. This yields the following MAP problem:

x̂ = arg min_x ∥y − Ax∥₂² + λ ∥x∥₁,

where the ℓ1 prior promotes sparse solutions. The localized microbubbles can subsequently be linked across frames according to a motion model.

Deep Learning for US Signal Processing
Deep learning based ultrasound signal processing offers a highly flexible framework for learning a desired input-output mapping X̂ = f_θ(Y) from training data, overcoming the need for explicit modeling and derivation of solutions. This can be especially advantageous for complex problems in which models fall short (e.g. because they are incomplete or rely on naive assumptions), or in which model-based solutions are computationally demanding or even intractable. We will now go over some emerging applications of deep learning in the ultrasound signal processing pipeline. As in the previous section, we will first cover methods for beamforming, and then turn to downstream post-processing such as B-mode image quality improvement, clutter suppression, and ULM.

Beamforming
We discern two categories of approaches: neural networks that replace the entire mapping from channel data to images, and those that only replace the beamsumming operation, i.e. that operate after TOF correction (Hyun et al., 2021; Bell et al., 2019, 2020).

Direct channel to image transformation
Nair et al. (2018, 2020) train networks that map raw channel data directly to an output image, thereby learning the TOF correction implicitly. Most beamforming approaches, however, benefit from traditional alignment of the channel data before processing. In addition, a mechanism for jointly learning optimal channel selection/sparse array designs has been proposed, via a technique dubbed deep probabilistic subsampling.

Beam-Summing after TOF correction
While most beamforming methods aim at boosting resolution and contrast, Hyun et al. (2019) argue that beamformers should accurately estimate the true tissue backscatter map, and thus also target speckle reduction. The authors train their beamformer on ultrasound simulations of a large variety of artificial tissue backscatter maps derived from natural images.

Post processing
Application of deep learning methods to general image processing/restoration problems has seen a surge of interest in recent years, showing remarkable performance across a range of applications. Naturally, these pure image processing methods are being explored for ultrasound post processing as well.
In this section we will treat the same topics as in the previous chapter, but focus on recent deep learning methods. For speckle reduction, for instance, Ando et al. (2020) attempt to extend this line of work to 3D imaging using a 3D U-net model. It is interesting to note that Vedula et al. (2017) and Ando et al. (2020) use simulated data, while the other works on speckle reduction use in-vivo data gathered from volunteers. There is, however, uniformity in how these works create their target images: through model-based speckle reduction algorithms.

B-mode image quality improvement
Most deep learning methods for image quality improvement rely on supervised learning, requiring ground-truth targets that are often difficult to obtain. As an alternative, Huh et al. (2021) present a self-supervised method based on the cycle-GAN architecture, originally developed for un-paired (cycle-consistent) style transfer. This approach aims at transferring the features of a high-quality target distribution of images to a given low-quality image, which the authors leverage to improve elevational image quality in 3D ultrasound.
Clutter filtering and localization microscopy
Deep networks have also been applied to spatio-temporal tissue-clutter suppression and microbubble localization, achieving high processing rates (Brown et al., 2021). Notably, Youn et al. (2020) perform localization directly from channel data.

Model-Based Deep Learning for US Signal Processing
We now highlight several works that incorporate signal processing knowledge in their deep learning approaches, to improve performance, reduce network complexity, and provide reliable inference models. Generally, these methods keep a large part of the conventional signal processing pipeline intact, and replace critical components with neural networks so as to provide robust inference. We will discuss methods ranging from iterative solvers to unfolded, fixed-complexity solutions.

Beamforming
Model-based pre-focusing using DL
Pre-focusing (or TOF correction) is conventionally done deterministically, based on the array geometry and an assumed constant speed of sound.
Instead, data-adaptive focusing, i.e. calculating delays based on the recorded data, facilitates correction for speed-of-sound mismatches. The work by Nair et al. (2018, 2020) does this implicitly, by finding a direct mapping from the time domain to an output image using DL. However, this yields a black-box solution that can be difficult to interpret. The authors of Kim et al. (2021) adhere more strictly to a conventional beamforming structure, and tackle the problem in two steps: first, the estimation of a local speed-of-sound map, and second, the calculation of the corresponding beamforming delays. The speed-of-sound image is predicted from multi-angled plane-wave transmissions using SQI-net (Oh et al., 2021), a type of U-net. One then needs to find the propagation path and travel time of the transmitted pulse, i.e. the delay matrix, between each imaging point and each transducer element. For a uniform speed of sound this is trivial, since the shortest distance between a point and an element corresponds to the fastest path. For a non-uniform speed of sound this is more challenging, and requires a path-finding algorithm that adds to the computational complexity. The Dijkstra algorithm (Dijkstra, 1959), for instance, which is commonly used to find the fastest path, has a complexity of O(n² log n), where n is the number of nodes in the graph, or equivalently, the density of the local speed-of-sound grid.
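For illustration, travel times through a heterogeneous speed-of-sound map can be computed with Dijkstra's algorithm on an 8-connected pixel graph. The edge cost below (mean slowness of adjacent pixels times step length) is an assumed discretization, not the one used by Kim et al. (2021).

```python
import heapq
import numpy as np

def travel_times(slowness, src):
    """Shortest travel time from a source pixel through a slowness map.

    slowness : (H, W) per-pixel slowness (1 / local speed of sound)
    src      : (row, col) of the transducer element
    """
    H, W = slowness.shape
    t = np.full((H, W), np.inf)
    t[src] = 0.0
    pq = [(0.0, src)]
    moves = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]
    while pq:
        d, (r, c) = heapq.heappop(pq)
        if d > t[r, c]:
            continue                                   # stale queue entry
        for dr, dc in moves:
            nr, nc = r + dr, c + dc
            if 0 <= nr < H and 0 <= nc < W:
                # edge cost: step length times mean slowness of the two pixels
                step = np.hypot(dr, dc) * 0.5 * (slowness[r, c] + slowness[nr, nc])
                if d + step < t[nr, nc]:
                    t[nr, nc] = d + step
                    heapq.heappush(pq, (d + step, (nr, nc)))
    return t

# Uniform medium sanity check: the fastest path is the straight line
s = np.full((32, 32), 1.0 / 1540.0)        # c = 1540 m/s, unit pixel pitch
t = travel_times(s, (0, 0))
```

In a uniform medium the computed delays reduce to distance times slowness, recovering the trivial geometric case described above; a heterogeneous `slowness` map yields the bent, refraction-corrected travel times.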
As such, the authors propose a second U-net-style neural network, referred to as DelayNet, for estimating these delay times. The network comprises 3×3 locally masked convolutions, such that no filter weights are assigned in the direction opposite to the direction of wave propagation. Intuitively, this can be understood as enforcing a delay time that increases with distance from the transducer, i.e. the wave does not move in the reverse direction. Furthermore, the reduced filter count improves computational efficiency by ∼33%.
Finally, the predicted delay matrix is used to focus the RF data, after which it is beamsummed to yield the beamformed output signal. As such, DelayNet does not need to be trained directly on a target delay matrix, but can instead be trained end-to-end on the desired beamformed targets. Note that in this method the estimation of the speed of sound is done in a purely data-driven fashion, while the pre-focusing itself inherits a model-based structure, constraining the problem to learning time shifts from the aforementioned speed-of-sound map.
Model-based beamsumming using DL
Luijten et al. (2019, 2020) propose adaptive beamforming by deep learning (ABLE), a deep-learning-based beamsumming approach that inherits its structure from adaptive beamforming algorithms, specifically minimum variance (MV) beamforming. ABLE specifically aims to overcome the most computationally complex part of beamforming, the calculation of the adaptive apodization weights, replacing it with a neural network f_θ. The step from the model-based MAP estimator to ABLE is then given by

x̂_r = f_θ(y_r)^H y_r,

where θ comprises the neural network weights, and y_r the TOF-corrected RF data. Multiplying the predicted weights with the TOF-corrected data and summing the result yields the beamformed output signal.
Note that for training, we do not need access to the apodization weights as in MV beamforming. Instead, ABLE is trained end-to-end towards an MV-generated target:

θ̂ = arg min_θ L(f_θ(y_r)^H y_r, x̂_MV),

where x̂_MV is the MV training target and L a loss function. Since the network operates directly on RF data, which has positive and negative signal components as well as a high dynamic range, the authors propose an antirectifier as activation function. The antirectifier introduces a non-linearity while preserving sign information and dynamic range, unlike the rectified linear unit or the hyperbolic tangent. Similarly, a signed mean-squared-logarithmic-error (SMSLE) loss function is introduced, which ensures that errors in the RF domain reflect the errors in the log-compressed output image. The authors show that a relatively small network, comprising four fully connected layers, can solve this task and generalizes well to different datasets.
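The exact antirectifier used in ABLE is not reproduced here; a common antirectifier formulation (an assumption for illustration) splits positive and negative parts into separate channels, so the non-linearity discards no sign information:

```python
import numpy as np

def antirectifier(x):
    """Split positive and negative parts into separate output channels.

    NOTE: this is one common antirectifier variant, assumed for illustration;
    the formulation in the cited work may differ in detail.
    """
    return np.concatenate([np.maximum(x, 0.0), np.maximum(-x, 0.0)], axis=-1)

x = np.array([-2.0, 0.5, 3.0])
z = antirectifier(x)   # -> [0. , 0.5, 3. , 2. , 0. , 0. ]
```

The original signal is exactly recoverable as the difference of the two halves, unlike with a plain ReLU, which zeroes all negative RF samples.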
They report an increase in resolution and contrast, while reducing computational complexity by two to three orders of magnitude. Wiacek et al. (2020) similarly exploit DNNs as function approximators, in their case to accelerate the calculation of short-lag spatial coherence (SLSC).
Specifically, the authors apply their method to SLSC beamforming, which displays the spatial coherence of back-scattered echoes across the transducer array, in contrast to conventional DAS beamforming, in which the recorded pressures are visualized. The authors report a 3.4-times faster computation compared to the standard CPU-based approach, corresponding to a frame rate of 11 frames per second. Luchies and Byram (2018) propose a wideband DNN for suppressing off-axis scattering, which operates in the frequency domain, similar to ADMIRE discussed earlier. After focusing an axially gated section of channel data, the RF signals undergo a discrete Fourier transform (DFT), mapping the signal into different frequency bins. The neural network operates specifically on these frequency bins, after which the data is transformed back to the time domain using the inverse discrete Fourier transform (IDFT) and summed to yield a beamformed signal. The same fully connected network structure was used for different center frequencies, only retraining the weights.
An extension of this work is described in Khan et al. (2021a), where the neural network itself is replaced by a model-based network architecture. The estimation of the model parameters β, as formulated in (39), can be seen as a sparse coding problem y = Aβ (where β is a sparse vector), which can be solved using an iterative algorithm such as ISTA. This yields

β̂^(k+1) = τ_λ(β̂^(k) + μ A^T (y − A β̂^(k))),  (57)

where τ_λ(·) is the soft-thresholding function parameterized by λ, and μ is the gradient step size.
To derive a model-based network architecture, (57) is unfolded as a feed-forward neural network with input A^T y and output β̂, the predicted model coefficients. For each iteration, or fold, the weight matrices and the soft-thresholding parameter λ are then made trainable. This leads to a learned ISTA algorithm (LISTA):

β̂^(k+1) = τ_{λ_k}(W_k^(1) β̂^(k) + W_k^(2) A^T y),

where W_k^(1) and W_k^(2) represent trainable fully connected layers, and λ_k is a (per-fold) trainable thresholding parameter. When contrasted with its model-based iterative counterpart ISTA, LISTA is a fixed-complexity solution that tailors its processing to a given dataset using deep learning. Compared to conventional deep neural networks, however, LISTA has a low number of trainable parameters.
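A sketch of the LISTA forward pass in NumPy. Here the per-fold weights are initialized from the model (A and a fixed step size), which makes fold k coincide with ISTA iteration k; in actual LISTA these weights and thresholds would be learned from data, and all dimensions and parameters below are assumptions for illustration.

```python
import numpy as np

def soft(x, lam):
    """Soft-thresholding, the prox of the l1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

class LISTA:
    """Forward pass of an unfolded ISTA network with model-based initialization."""

    def __init__(self, A, n_folds=64, lam=0.05):
        L = np.linalg.norm(A, 2) ** 2                       # Lipschitz constant
        self.We = A.T / L                                   # input layer (learnable)
        self.Ws = [np.eye(A.shape[1]) - A.T @ A / L         # per-fold layers (learnable)
                   for _ in range(n_folds)]
        self.lams = [lam / L] * n_folds                     # per-fold thresholds (learnable)

    def forward(self, y):
        b = self.We @ y
        x = soft(b, self.lams[0])
        for W, lam in zip(self.Ws[1:], self.lams[1:]):
            x = soft(W @ x + b, lam)                        # one fold = one "iteration"
        return x

rng = np.random.default_rng(5)
M, N = 64, 128
A = rng.standard_normal((M, N)) / np.sqrt(M)
x_true = np.zeros(N); x_true[[7, 70]] = [2.0, -1.5]
net = LISTA(A, n_folds=64)
x_hat = net.forward(A @ x_true)
```

With this initialization the 64-fold network reproduces 64 ISTA iterations; training would adapt `Ws` and `lams` so that far fewer folds reach the same accuracy on in-distribution data.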
The authors show that LISTA can be trained on model fits of ADMIRE, or even on simulation data containing targets without off-axis scattering, thereby potentially outperforming the fully model-based algorithm, ADMIRE, due to its ability to learn optimal regularization parameters from data. Mamistvalov and Eldar (2021) follow a similar deep unfolding strategy to recover images from sub-sampled (sub-Nyquist) channel data.

Other works pursue model-based deep learning for the full channel-to-image mapping. The ultrasound forward model is based on a set of differential equations, and mainly depends on three parameters: the acoustic velocity c_0, the density ρ_0, and the attenuation α_0. Such a model could abstractly be defined as y = f(x; c_0, ρ_0, α_0) + n. However, due to the complex non-linear nature of this forward model, a simplified linear model was developed (details are given in Almansouri et al. (2018a)), where A is a matrix that accounts for time-shifting and attenuation of the transmit pulse. The adjoint operator of this linearized model gives an approximate estimator for x, given by x̃ = A^T y. The authors adopt a U-net architecture to compensate for artifacts caused by non-linearities. Effectively, the network takes a relatively simple estimate, based on the physical measurement model, and maps it to a desired high-quality estimate:

x̂ = f_θ(A^T y),

where f_θ denotes the neural network and x̂ the high-quality estimate.

Post-Processing and Interpretation
Deep unfolding for B-mode IQ enhancement/PW compounding/compressed acquisition
Chennakeshava et al. (2020, 2021) propose a plane-wave compounding and deconvolution method based on deep unfolding. Their architecture is based on a proximal gradient descent algorithm derived from a model-based MAP optimization problem, which is subsequently unfolded and trained to compound 3 plane-wave images, acquired at a low frequency, into an image obtained from 75 compounded plane-wave transmissions at a higher frequency. This encourages a learned proximal operator that maps low-resolution, low-contrast input images onto a manifold of images with better spatial resolution and contrast.
Denote x ∈ R^N the vectorized high-resolution beamformed RF image, and y ∈ R^{NM} the vectorized measurements of low-resolution beamformed RF images from M = 3 transmitted plane waves. The authors assume the following acquisition model:

y = Ax + n,

with y = [y_1^T, y_2^T, ..., y_M^T]^T and A = [A_1^T, A_2^T, ..., A_M^T]^T, where y_m is the vectorized, beamformed RF image belonging to the m-th steered plane-wave transmission, n ∈ R^{NM} is a noise vector assumed to follow a Gaussian distribution with zero mean and diagonal covariance, and A ∈ R^{NM×N} is a block matrix whose blocks A_1, A_2, ..., A_M are the measurement matrices of the individual PW acquisitions. The authors assume that the measurement matrices (which capture the system PSF for each PW) follow a convolutional Toeplitz structure.
Based on this model, each fold of the unfolded proximal gradient algorithm aimed at recovering the high-resolution image x is written as

x̂^(k+1) = P_θ(x̂^(k) + Σ_{m=1}^{M} w_m^(k) ⊛ y_m),

where P_θ is a U-net-style neural network replacing the generalised proximal operator, ⊛ denotes a convolution operation, and {w_m^(k)} are trainable convolution kernels that take the role of the (adjoint) measurement matrices.

Deep unfolding for clutter filtering
Solomon et al. (2019a) propose deep unfolded convolutional robust PCA for ultrasound clutter suppression. The approach is derived from the RPCA algorithm, given by (51) and (52), but unfolds it and learns all the parameters (gradient projections and regularization weights) from data. Each network layer in the unfolded architecture takes the following form:

X̂_tissue^(k+1) = T_SVT,λ_1^(k) (W_1^(k) ⊛ Y + W_3^(k) ⊛ X̂_tissue^(k) + W_5^(k) ⊛ X̂_blood^(k)), and
X̂_blood^(k+1) = T_{1,2,λ_2}^(k) (W_2^(k) ⊛ Y + W_4^(k) ⊛ X̂_tissue^(k) + W_6^(k) ⊛ X̂_blood^(k)),

where W_1 through W_6 are trainable convolutional kernels.
The resulting deep network has two distinct (model-based) non-linearities/activations per layer: the mixed ℓ1,2 thresholding and singular value thresholding. The authors train the architecture end-to-end on a combination of simulations and RPCA results on real data, and demonstrate that it outperforms a strong non-model-based deep network (a ResNet).

Deep unfolding for ultrasound localisation microscopy
In the spirit of unfolding, Van Sloun et al. (2019a) propose to unfold their sparse recovery algorithm for ULM, to enable accurate localization even for high concentrations of microbubbles. Similar to the previous examples of unfolding, each layer k of the resulting architecture consists of trainable convolutional weights W^(k) followed by a proximal operator, where λ is a parameter that depends on the assumed noise variance. Iterative optimization of (71) was performed using gradient descent, and the recovered clean image is given by x̂ = f_θ^{-1}(ẑ).

Discussion
Over the past decade, the field of ultrasound signal processing has seen a large transformation, with the development of novel algorithms and processing methods. This development is driven in large part by the move from hardware-based to software-based reconstruction. In this review, we have showcased several works, from conventional algorithms to fully deep-learning-based approaches, each having their own strengths and weaknesses.
Conventional model-based algorithms are based on first principles and offer a great amount of interpretability, which is relevant in clinical settings.
However, as we show in this paper, these methods rely on estimations, and often simplifications, of the underlying physical model, which results in suboptimal signal reconstructions. For example, DAS beamforming assumes a linear measurement model and a Gaussian noise profile, both of which are crude approximations of a realistic ultrasound measurement. In contrast, adaptive methods (e.g. MV beamforming) that aim at modeling the signal statistics more accurately are often too computationally expensive for real-time implementation.
Spurred by the need to overcome these limitations, we see a shift of research towards data-driven signal processing methods (mostly based on deep learning), a trend that started around 2014, since when the number of peer-reviewed AI publications has increased significantly (Zhang et al., 2021).
This can be explained by two significant factors: 1) the availability of high-compute-power GPUs, and 2) the availability of easy-to-use machine learning frameworks such as TensorFlow (Abadi et al., 2015) and PyTorch (Paszke et al., 2019), which have significantly lowered the threshold of entry into the field of AI for ultrasound researchers. However, the performance of data-driven, and more specifically deep learning, algorithms is inherently bounded by the availability of large amounts of high-quality training data. Acquiring ground-truth data is not trivial in ultrasound beamforming and signal processing applications, and thus simulations, or the outputs of advanced yet slow model-based algorithms, are often used as training targets. Moreover, the lack of a clear understanding of the behavior of learned models (i.e. the black-box problem), and the limited ability to predict their performance "in the wild", make implementation in clinical devices challenging.
These general challenges associated with fully data-driven deep learning methods have in turn spurred research in the field of "model-based deep learning". Model-based deep learning combines the model-based and data-driven paradigms, and offers a robust signal processing framework. It enables learning those aspects of full models from data for which no adequate first-principles derivation is available, or complementing/augmenting partial model knowledge. Compared to conventional deep neural networks, these systems often require a smaller number of parameters, and less training data, in order to learn an accurate input-output mapping. In ABLE, for instance, MAP beamforming under unknown, non-diagonal-covariance Gaussian channel noise is augmented with a neural network, and the entire hybrid solution is optimized end-to-end. The methods covered here aim to achieve a better imaging quality, e.g. temporal or spatial resolution, ultimately aiding the diagnostic process. While a deeper analysis of the clinical relevance is a crucial and interesting topic, it is beyond the scope of this work.

Conclusion
In this review, we outline the development of signal processing methods in US, from classic model-based algorithms to fully data-driven DL-based methods. We also discuss methods that lie at the intersection of these two approaches, using neural architectures inspired by model-based algorithms and derived from probabilistic inference problems. Throughout, we take a probabilistic perspective, offering a generalised framework that places the multitude of approaches described in this paper under the same umbrella.
This perspective provides insight into the demarcation between components derived from first principles and components derived from data. It also affords us the ability to combine such components in new ways, deriving architectures that integrate multiple classes of signal processing algorithms.
The application of such novel DL-based reconstruction methods requires the next generation of US devices to be equipped accordingly: either with fast networking and on-device encoding, or with sufficient and appropriate processing power (GPUs and TPUs) that allows for flexible and real-time deployment of AI algorithms.