Applications, promises, and pitfalls of deep learning for fluorescence image reconstruction

Deep learning is becoming an increasingly important tool for image reconstruction in fluorescence microscopy. We review state-of-the-art applications such as image restoration and super-resolution imaging, and discuss how the latest deep learning research could be applied to other image reconstruction tasks. Despite its successes, deep learning also poses substantial challenges and has limits. We discuss key questions, including how to obtain training data, whether discovery of unknown structures is possible, and the danger of inferring unsubstantiated image details. This Perspective highlights recent applications of deep learning in fluorescence microscopy image reconstruction and discusses future directions and limitations of these approaches.

Fluorescence microscopy is an indispensable tool in the biologist's arsenal that has enabled the systematic spatiotemporal dissection of life's molecular machines 1 . In all microscopy modalities, the resolution and quality of the images obtained are fundamentally limited by the optics, the photophysics of molecular probes, and sensor technology. Recent developments such as super-resolution microscopy allow researchers to circumvent these limits by means of clever experimental strategies followed by image reconstruction. In the case of photo-activated localization microscopy (PALM) 2 and stochastic optical reconstruction microscopy (STORM) 3 , hundreds of images need to be processed to reconstruct a super-resolved image 4 . Computation is thus increasingly becoming an essential component of the imaging process. Any substantial improvement in the algorithms used for reconstruction would not only improve image quality but also open the door to the development of new imaging modalities.
In recent years, deep learning 5 and, in particular, deep convolutional neural networks (CNNs) have had stunning successes, surpassing human performance for hard problems such as visual 6 and speech recognition 7 (Box 1). Almost every discipline of science and engineering has been impacted, from astronomy 8 and biology 9 to high-energy physics 10 . CNNs are particularly relevant in fields that produce or interpret images. For example, medical imaging picked up the trend early and applied deep CNNs to image processing of computed tomography and magnetic resonance images 11,12 . Fluorescence microscopy, too, has recently benefited from these advances, with deep CNNs applied to image restoration 13 , deconvolution 14,15 , super-resolution 16-18 , translation between label-free and fluorescence images 19,20 , and 'virtual' staining 21 , as well as image segmentation, classification, and phenotyping 22 .

Deep learning for image reconstruction
A common denominator among the applications mentioned above is the formulation of image reconstruction as a transformation between acquired and reconstructed images. Classical approaches to image reconstruction handcraft these transformations from first principles (Box 2). In deconvolution, for example, this requires a precise understanding of the optics and well-characterized noise statistics, and has led to the design of popular algorithms such as Richardson-Lucy deconvolution, which requires knowledge of the point spread function of the microscope and assumes Poisson noise statistics 23 . However, such handcrafted algorithms are limited by the accuracy of their assumptions and do not capture the full statistical complexity of microscopy images. Hence, data-driven approaches may in some cases be more broadly applicable and better suited than analytical ones for solving image reconstruction problems. Ideally, scientists want generic reconstruction algorithms that can be trained from exemplary data obtained either experimentally or from simulation. This is where deep learning comes into play. Deep neural networks can learn end-to-end image transformations from data without the need for explicit analytical modeling (Box 3).
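To make the contrast concrete, the classical approach fits in a few lines. Below is a minimal sketch of the Richardson-Lucy update under its stated assumptions (a known, shift-invariant PSF and Poisson noise); it is illustrative only, not the implementation used in the cited packages.

```python
import numpy as np
from scipy.signal import fftconvolve

def richardson_lucy(observed, psf, n_iter=50, eps=1e-12):
    # Multiplicative update: estimate <- estimate * H^T(observed / H(estimate)),
    # where H is convolution with the PSF and H^T uses the mirrored PSF.
    psf = psf / psf.sum()                    # PSF must integrate to 1
    psf_mirror = psf[::-1, ::-1]             # adjoint of the convolution
    estimate = np.full_like(observed, observed.mean(), dtype=float)
    for _ in range(n_iter):
        blurred = fftconvolve(estimate, psf, mode="same")
        ratio = observed / (blurred + eps)   # data / model prediction
        estimate *= fftconvolve(ratio, psf_mirror, mode="same")
    return estimate
```

Every quantity here (the PSF, the noise model, the stopping point) must be supplied by the expert, which is precisely the knowledge that learned approaches acquire from data instead.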

Fluorescence microscopy image reconstruction
Image reconstruction in fluorescence microscopy encompasses image denoising, deconvolution, registration, stitching, fusion, and super-resolution, as well as challenges brought by more exotic acquisition modalities such as tomographic, light-field, lens-less, and deep-tissue imaging. Here we review existing applications of deep learning in fluorescence microscopy and suggest promising new ones.
Image denoising. Image noise in fluorescence imaging is typically caused by the combined effects of Poisson statistics from photon counting, thermal and readout noise of the sensor, and background autofluorescence. Early on, denoising algorithms were recognized as an opportunity to speed up acquisition by using shorter exposures and to reduce photodamage by decreasing illumination intensity 24 . Algorithms were specially developed for fluorescence images that leveraged compressed sensing 25 , image self-similarity 24,26 , and thresholding after wavelet transform 27 . Today, state-of-the-art classical denoising algorithms in computer vision research use sparse coding techniques 28 , low-rank decomposition 29 , or image self-similarity 30,31 . The lesson learned over time is that to effectively denoise an image, it is better to look beyond small patches of pixels, gather information across multiple image regions, and ideally learn from large sets of images. In 2009, Jain et al. 32 trained a five-layer CNN to denoise natural images, achieving a lower error rate than previous approaches while processing images two orders of magnitude faster. Since then, numerous deep CNN architectures have been proposed for denoising natural images 33 . Recently, Weigert et al. 13 introduced content-aware image restoration (CARE), a method that trains networks on pairs of low signal-to-noise ratio (input) and high signal-to-noise ratio (target) images. They demonstrate effective denoising of fluorescence microscopy images acquired under low-light conditions or with short camera exposures (Fig. 1a). CARE maintains image quality while decreasing light exposure 60-fold and outperforms classical denoising algorithms such as non-local means 30 or block-matching and three-dimensional (3D) filtering 31 . Such deep-learning-based restoration can be useful when specimens are light sensitive, when fluorophore bleaching is of concern, or when studying fast dynamics in live samples. Yet a major obstacle to applying deep learning for image denoising is the need for matched pairs of high-noise and low-noise training images. This requirement can be relaxed, as shown by recent developments. Self-supervision approaches such as noise2noise 34,35 , noise2self 36 , and noise2void 37,38 leverage the statistical independence of noise across pixels. Another approach, deep image prior 39 , uses a generative model trained on a single image. Also promising are unpaired image transformation schemes such as the cycle generative adversarial network (cycleGAN) 40 (Fig. 2f). We expect that these advances will broaden the applicability and facilitate the adoption of CNN-based denoising in fluorescence microscopy.

Spatial deconvolution. The resolution of images obtained by fluorescence microscopy is fundamentally limited by the optical parameters of the microscope used. Image deconvolution can improve image quality by enhancing high-frequency details. In the case of 3D fluorescence microscopy, this is particularly useful for reducing axial blur and out-of-focus light 41,42 . Classic approaches such as Richardson-Lucy deconvolution 23 are still used extensively and have been extended to multi-view light-sheet microscopy 43 . Recent open-source packages such as DeconvolutionLab2 44 have helped to democratize advanced methods for deconvolution in fluorescence microscopy. Challenges remain: when imaging large samples, the point spread function (PSF) varies in space, across views, and possibly over time. Shajkofci et al. 15 recently showed how a CNN can be used to estimate optical aberrations and improve image quality compared with that achieved by other deconvolution algorithms. It would be interesting to investigate whether the same approach can be applied to volumetric imaging. Another opportunity is to address challenges that are opto-mechanical in nature: in light-sheet microscopy, there is a limited choice of illumination and detection objectives that can be placed orthogonal to each other. To solve this problem, one could use lower-numerical-aperture objectives that mechanically fit and apply deep learning to computationally increase the detection numerical aperture 45 .
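The self-supervised denoising schemes mentioned above admit a compact sketch. The following noise2void-style training step (a simplification of the published method, assuming PyTorch and a hypothetical network `model`) masks random pixels, replaces them with neighboring values, and evaluates the loss only at the masked positions, so the network cannot simply copy its input:

```python
import torch
import torch.nn.functional as F

def blindspot_step(model, noisy, n_mask=64):
    # One self-supervised training step on a batch of noisy images (b, c, h, w).
    b, c, h, w = noisy.shape
    inp = noisy.clone()
    ys = torch.randint(0, h, (b, n_mask))
    xs = torch.randint(0, w, (b, n_mask))
    for i in range(b):
        # Replace each masked pixel with a random neighbor (the full method
        # also excludes the zero offset; omitted here for brevity).
        oy = torch.randint(-2, 3, (n_mask,))
        ox = torch.randint(-2, 3, (n_mask,))
        inp[i, :, ys[i], xs[i]] = noisy[i, :, (ys[i] + oy).clamp(0, h - 1),
                                        (xs[i] + ox).clamp(0, w - 1)]
    pred = model(inp)
    # Because noise is statistically independent across pixels, the best
    # prediction at a masked pixel is the underlying signal.
    loss = 0.0
    for i in range(b):
        loss = loss + F.mse_loss(pred[i, :, ys[i], xs[i]],
                                 noisy[i, :, ys[i], xs[i]])
    return loss / b
```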
Axial inpainting. A pervasive problem in microscopy is poor axial sampling and resolution, which leads to 3D image anisotropy, that is, lower resolution along the objective axis (z) than in the transverse (x-y) planes. To address this, Weigert et al. 13,14 recently demonstrated a self-supervised training strategy, IsoNet, which deconvolves axial (xz and yz) slices to restore isotropic resolution and substantially outperforms the popular Richardson-Lucy algorithm.

Box 1 | The deep learning revolution

In 2012, Krizhevsky, Sutskever, and Hinton published their now famous AlexNet paper 89 . In that work, they showed that a deep convolutional neural network (CNN) could win the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) by an unprecedented margin. AlexNet was trained on 1.2 million high-resolution training examples to classify images into 1,000 object categories. The double-digit decrease in error rate achieved by Hinton's team astounded the computer vision community. One year later, most competing teams used CNNs for what was considered at the time to be the most challenging machine learning competition. Since 2012, deep CNNs have continued to break records, going from an error rate of 15.4% (ref. 89 ) in 2012 to just 3.08% (ref. 6 ) in 2018, surpassing human accuracy. Deep CNNs are now the state of the art for major machine learning challenges in object 6 and speech recognition 7 . Recent notable artificial intelligence successes critically depended on deep CNNs, but also on other deep learning models such as deep reinforcement learning, as with DeepMind's AlphaGo victory against the world's second-best human Go player 84 . This has contributed to the enthusiasm, but also the hype, surrounding deep learning.

Box 2 | Image reconstruction, inverse problems, and deep learning
In most imaging modalities, there is a well-founded understanding of the physics behind the image formation process. People can write formulas that describe, and code that simulates, the way light propagates through optical elements to eventually form an image on a detector; this is called the forward model f (see figure). This model is a transformation from the desired ideal image y to observed images x. Observed images x are incomplete, degraded, or convolved compared with y. For example, image noise, low-pass filtering, pixel-value quantization, and subsampling cause partial and often irrecoverable loss of information. In the opposite direction, reconstructing the true image y from the observed images x is a challenging inverse problem. Recovering y necessitates a pseudo-inverse function g, classically implemented as an iterative algorithm. One obstacle to inversion is that forward processes are often stochastic. Another fundamental difficulty is that the forward model maps many images to the same observation, so there is no well-defined inverse but instead multiple confounding solutions (red line in the figure). However, in most cases there is a priori knowledge about the validity of solutions, which makes it possible to select one inverse. Until recently, this prior information about the solution had to be mathematically formulated by experts and engineered into algorithms. These handcrafted priors typically prescribe, for example, that correct solutions should be smooth (Tikhonov and total-variation regularization) or admit a concise representation in some basis (wavelet basis, dictionary) 108 . However, the statistical structure of real microscopy images is far more complex, thus limiting the quality of images reconstructed with such priors. This led to the advent of data-driven inversion schemes that learn a pseudo-inverse g from large datasets of (x, y) pairs. These pairs can be obtained by simulation, that is, by computing f, or empirically by means of clever experimental strategies that produce both x and y (ref. 13 ). The learning procedure learns not only the function g but also, implicitly, the prior knowledge about 'good' solutions. Deep learning is one promising data-driven approach for solving inverse problems and, by extension, image reconstruction tasks 109 .

Super-resolution microscopy. Nehme et al. 18 introduced deep-STORM, a CNN trained to reconstruct super-resolved images from diffraction-limited single-molecule frames (Fig. 1b). The authors used simulated pairs of diffraction-limited and super-resolved images to train their network and obtained high emitter detection efficiency on both real and simulated images. Instead of predicting high-resolution images, DeepLoco 48 demonstrates direct prediction of emitter coordinates. In another application of deep learning to super-resolution microscopy, Ouyang et al. 17 trained a U-Net (Box 4) fed with sparsely sampled PALM images to predict their densely sampled counterparts (Fig. 1c). In contrast with the mean-squared-error training loss of deep-STORM, their artificial neural network accelerated (ANNA)-PALM method uses an ingenious combination of L1, structural similarity (SSIM), and conditional adversarial losses 49,50 . Using a similar network architecture and training loss, Wang et al. 16 recently demonstrated that the resolution of diffraction-limited confocal images can be enhanced (Fig. 1d). This work shows that the use of structured losses, whether explicit (SSIM) or adversarially learned (conditional GAN (cGAN); Box 4), results in output images that are sharper and of better perceptual quality 50 .
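The structured losses discussed above are easy to combine. A minimal sketch of an L1 + SSIM training loss (with a uniform rather than Gaussian SSIM window, and a hypothetical weighting `alpha`; not the exact losses of ANNA-PALM or ref. 16):

```python
import torch
import torch.nn.functional as F

def ssim(x, y, c1=0.01 ** 2, c2=0.03 ** 2, win=11):
    # Mean SSIM over (b, c, h, w) tensors scaled to [0, 1]; local statistics
    # are computed with a uniform window via average pooling.
    pad = win // 2
    mu_x, mu_y = F.avg_pool2d(x, win, 1, pad), F.avg_pool2d(y, win, 1, pad)
    var_x = F.avg_pool2d(x * x, win, 1, pad) - mu_x ** 2
    var_y = F.avg_pool2d(y * y, win, 1, pad) - mu_y ** 2
    cov = F.avg_pool2d(x * y, win, 1, pad) - mu_x * mu_y
    s = ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))
    return s.mean()

def restoration_loss(pred, target, alpha=0.8):
    # Blend pixel fidelity (L1) with structural fidelity (1 - SSIM).
    return (1 - alpha) * F.l1_loss(pred, target) + alpha * (1 - ssim(pred, target))
```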
For super-resolved fluorescence imaging of live cells, algorithms must cope with high emitter densities, which has led to new approaches such as super-resolution radial fluctuations (SRRF) 51 . We expect that the use of CNNs in this challenging regime could improve image quality and temporal resolution for live-cell imaging.

Structured illumination. Another super-resolution approach is structured illumination microscopy (SIM). It can surpass the diffraction limit twofold by computational synthesis of images acquired with shifted illumination patterns 52,53 . However, current algorithms require precise knowledge of the effective pattern and are thus sensitive to distortions or attenuation caused by the sample. Although classical algorithms exist that can handle low-intensity 54 , unknown 55 , or distorted 56 illumination patterns, it would be interesting to compare them with a CNN-based SIM reconstruction algorithm (Fig. 2a), possibly leveraging ideas similar to magnetic resonance imaging reconstruction with automated transform by manifold approximation (AUTOMAP) 57 . The training data could be generated by simulations that consider all relevant degradations and distortions, as sketched below. We believe that such an algorithm could help scientists obtain better SIM reconstructions under challenging signal-to-noise conditions.
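A forward-model simulation of this kind can be sketched in a few lines (illustrative parameter values; a Gaussian stands in for the true PSF, and a real simulator would also vary pattern orientation and model sample-induced distortions):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(0)

def simulate_sim_raw(sample, k=(0.2, 0.0), phase=0.0, modulation=0.9,
                     psf_sigma=2.0, photons=200.0):
    # One raw SIM frame: sinusoidal illumination, diffraction blur, shot noise.
    h, w = sample.shape
    yy, xx = np.mgrid[0:h, 0:w]
    illum = 1.0 + modulation * np.cos(2 * np.pi * (k[0] * xx + k[1] * yy) + phase)
    blurred = gaussian_filter(sample * illum, psf_sigma)
    return rng.poisson(photons * blurred / blurred.max())

# Training pair: three phase-shifted raw frames as input, the sample as target.
sample = rng.random((128, 128))
raw = [simulate_sim_raw(sample, phase=p) for p in (0, 2 * np.pi / 3, 4 * np.pi / 3)]
```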
Spectral deconvolution. The trend in fluorescence imaging is for faster acquisition with more colors. However, fluorescent proteins and dyes used for labeling have broad, overlapping emission spectra, which leads to channel mixing that complicates interpretation. One solution that sacrifices speed is to acquire the channels sequentially. Another solution is to acquire the channels simultaneously, but this requires the separation of channels with spectral unmixing algorithms. To address this problem, algorithms such as linear unmixing 58 and phasor approaches 59 have been devised to deconvolve these spectra into separate channels corresponding to distinct labels. In some sense, the unmixing of multicolor acquisitions is a deconvolution problem in the spectral dimension that is difficult in low signal-to-noise conditions. Could CNNs be applied to deconvolve more wavelengths in poor signal-to-noise conditions? One straightforward approach to obtain training data would be to acquire each channel sequentially and then simultaneously (Fig. 2b), or to use physics-based models to generate synthetic training data.
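For reference, the classical linear-unmixing baseline that a CNN would be compared against is a pixel-wise least-squares problem (sketch; `spectra` holds the known per-channel emission of each fluorophore):

```python
import numpy as np

def linear_unmix(image, spectra):
    # image: (channels, h, w); spectra: (channels, n_fluorophores).
    # Solves measured = spectra @ abundances independently for every pixel.
    c, h, w = image.shape
    pixels = image.reshape(c, -1)
    abundances, *_ = np.linalg.lstsq(spectra, pixels, rcond=None)
    # Crude non-negativity as a post hoc clip; proper solvers constrain it.
    return np.clip(abundances, 0, None).reshape(-1, h, w)
```

Least squares degrades quickly as noise grows and spectra overlap, which is exactly the regime where a learned prior could help.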
Other, more creative uses of deep learning for multicolor microscopy can be envisioned: Hershko et al. 60 recently showed how CNNs can leverage chromatic effects on the PSF to determine the color of single emitters from a grayscale image.
Correcting uneven illumination and detection. Unfortunately, in most microscopes, illumination and detection efficiency are not uniform over the entire field of view. In widefield and spinning-disk confocal microscopes 61 , uneven detection efficiency often requires nonuniformity correction. In the case of mesoscale light-sheet microscopy, the problem is far more complex because illumination shadows are caused by sample-induced occlusion 62 . We believe that CNNs are well positioned to address this problem. Promising CNN-based techniques exist in computer vision for context-based inpainting of missing image regions 63,64 . Fortunately, in light-sheet microscopy, information is not completely absent, because the occlusion is often only partial. Moreover, in multiview imaging, training data are readily available in the form of images acquired from different detection arms or with light sheets of distinct propagation angles.

Box 3 | Deep learning as differential learning
In deep learning, nonlinear parameterized processing modules are combined to progressively transform an input x into the desired output y, typically attaining only an approximation ŷ (panels a and d). The adjective "deep" refers to the large number of stacked modules required for building a universal function approximator g (panel a). The learning process begins with a random choice of the module parameters θ, which are iteratively updated to improve the accuracy of the reconstruction. This is possible because each module is differentiable, meaning that it is known exactly how changes in parameter values cause changes in output values (panel b). More formally, the partial derivatives ∂ŷ/∂θ of the output ŷ with respect to any parameter θ are known; similarly, one can compute the partial derivatives of the loss described below with respect to the parameters. Artificial neural networks such as CNNs are the most popular implementation of such modules. Networks with a sufficient number of parameters and at least three layers can theoretically approximate any function 110 .
The back-propagation algorithm 111 uses the chain rule to efficiently compute all partial derivatives, or gradients, with just one forward pass through the network followed by a backward pass. The discrepancy between the desired output y and the actual output ŷ is quantified with a loss function l(ŷ, y) (panel c). The parameters θ are updated to minimize this loss by stochastic gradient descent (panel e). Because the updates can be computed on small batches of data, in parallel, on specialized hardware, networks with millions of parameters can be fit on datasets with millions of observations.
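The loop described in this box is a few lines in a modern framework. A minimal sketch (toy two-layer CNN, random placeholder data, illustrative learning rate):

```python
import torch

# A small differentiable module stack g(x; θ).
model = torch.nn.Sequential(
    torch.nn.Conv2d(1, 16, 3, padding=1), torch.nn.ReLU(),
    torch.nn.Conv2d(16, 1, 3, padding=1),
)
opt = torch.optim.SGD(model.parameters(), lr=1e-3)

x = torch.randn(8, 1, 64, 64)   # batch of observed images (placeholders)
y = torch.randn(8, 1, 64, 64)   # corresponding desired outputs

y_hat = model(x)                                  # forward pass
loss = torch.nn.functional.mse_loss(y_hat, y)     # loss l(ŷ, y)
opt.zero_grad()
loss.backward()   # back-propagation: all gradients in one backward pass
opt.step()        # stochastic gradient descent update of the parameters θ
```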
Scattered light haze reduction. Scattered light is a cause of substantial image-quality deterioration in live fluorescence imaging. For example, in light-sheet microscopy, background induced by scattered light is often suppressed to facilitate subsequent image processing steps such as image stitching and fusion 65 . Yet many challenges remain because scattering is depth sensitive and sample dependent. Recently, powerful hybrid approaches that combine CNNs and optical modeling have been developed for dehazing natural scenes 66 . Could such ideas be applied to microscopy to further improve image quality? In multiview light-sheet microscopy, an obvious way to obtain training data is to collect low- and high-haze image pairs from overlapping regions imaged from distinct views (Fig. 2c). In some cases, matching image pairs cannot be obtained, such as when one is imaging deep within a sample with a spinning-disk confocal microscope. In such instances, recent advances in unpaired image-to-image translation 40 could be used to learn to dehaze images from examples taken at the surface and deeper within the sample (Fig. 2f).
Registration, stitching, and fusion. Because of the limited field of view of objectives and the large size of some specimens, it is often necessary to break down an acquisition into 2D or 3D tiles 43 . Reconstructing a complete image requires stitching the tiles together after registration to a common reference frame. State-of-the-art algorithms 43,67 rely on iterative schemes that leverage direct image cross-correlation or matching of fiducial markers. One obvious application of deep learning to image registration is to teach CNNs to find image correspondences 68 and thus make a deep learning component part of a larger algorithm. However, one key promise of deep learning is its ability to train complex models that solve a sequence of tasks in an end-to-end fashion. Recently, Rohé et al. 69 showed that pairs of large 3D medical images can be nonrigidly registered in less than 30 ms with a fully convolutional CNN that directly outputs a deformation vector field. Their method is more accurate and faster than traditional approaches and does not require iterations of an optimizer, but it does require a large amount of training data. Similarly, Nguyen et al. 70 demonstrated an unsupervised approach for nonrigid transformation estimation that is faster and more accurate than classical correlation-based or fiducial-based approaches. We expect that these ideas will eventually percolate into the field of fluorescence imaging. An upcoming and exciting challenge is the imaging of live, moving animals. For example, Caenorhabditis elegans larvae twist incessantly before hatching, making the analysis of time-lapse data challenging. This requires specialized stabilization algorithms capable of unfolding, straightening, and registering worm images 71 . Can a CNN learn to stabilize 3D time-lapse images of highly dynamic samples against a fixed template shape (Fig. 2d)?
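A core building block of such direct registration networks is differentiable warping of an image by a predicted displacement field, so that an image-similarity loss can be back-propagated into the network. A sketch (PyTorch, 2D, pixel-unit displacements):

```python
import torch
import torch.nn.functional as F

def warp(image, flow):
    # image: (b, c, h, w); flow: (b, 2, h, w) holding (dx, dy) per pixel.
    b, _, h, w = image.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    gx = (xs.unsqueeze(0) + flow[:, 0]) / (w - 1) * 2 - 1  # normalize to [-1, 1]
    gy = (ys.unsqueeze(0) + flow[:, 1]) / (h - 1) * 2 - 1
    grid = torch.stack((gx, gy), dim=-1)                   # (b, h, w, 2)
    return F.grid_sample(image, grid, align_corners=True)
```

Training would then minimize a similarity loss between warp(moving, predicted_flow) and the fixed image, typically with an added smoothness penalty on the flow.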
Light-field, lens-less imaging, and tomography. Recently, exotic forms of fluorescence imaging have emerged that pose novel computational challenges. For example, light-field imaging can multiplex an entire volume into a single 2D image acquisition, thus enabling rapid functional imaging at single-neuron resolution 72 . Preliminary work by Fei et al. 73 demonstrated CNN-based light-field 3D image reconstruction of live C. elegans worms (Fig. 1e). Similarly, lens-less imaging schemes such as DiffuserCam achieve one-shot 3D imaging by replacing the objective lens with a diffusing element placed in front of the sensor 74 . Tomography is another indirect approach for obtaining 3D images from 2D acquisitions that is particularly relevant for mesoscopic fluorescence imaging 75 . In both cases, the final 3D image is computationally reconstructed from a single 2D image. We believe that both lens-less imaging and tomography will benefit from deep-learning-based algorithms, just as has been demonstrated for light-field imaging.
Temporal consistency. Applications of deep learning to microscopy typically do not capture temporal patterns: predictions are made independently, frame by frame. As a consequence, successive reconstructed images are not necessarily temporally consistent and can exhibit artifacts that become clearly noticeable when examined across time. Explicit modeling of the temporal dimension and training with time-resolved data would lead to more accurate reconstructions. A straightforward way to augment any CNN with temporal consistency is simply to treat time as an additional dimension (Fig. 2h). However, this can become intractable for large networks that must handle long-term correlations. A better alternative is to combine CNNs with recurrent neural networks, whose outputs are fed back as inputs and which are therefore specifically designed for sequence prediction. One obvious candidate is the convolutional long short-term memory (convLSTM) architecture, which combines CNNs for space and LSTMs for time and has been applied to weather forecasting 76 .
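A minimal convLSTM cell makes the idea concrete: the standard LSTM gates are computed with convolutions, so the recurrent state keeps its spatial layout (sketch; channel counts are hypothetical):

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        # One convolution produces all four gates at once.
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

    def forward(self, x, state):
        h, c = state  # hidden and cell states, each (b, hid_ch, height, width)
        i, f, o, g = self.gates(torch.cat([x, h], dim=1)).chunk(4, dim=1)
        i, f, o, g = i.sigmoid(), f.sigmoid(), o.sigmoid(), g.tanh()
        c = f * c + i * g     # forget old content, write new content
        h = o * c.tanh()      # expose a gated view of the cell state
        return h, c
```

Applied frame by frame to a time-lapse, the cell carries information forward in time, encouraging temporally consistent reconstructions.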
Beyond fluorescence. In some cases, fluorescence measurements are a proxy for other quantities. For example, the measurement of material flow by fluorescence is an important tool in the emerging field of cell mechanobiology 77 . Unfortunately, current handcrafted optical flow algorithms typically do not produce smooth flow fields in challenging imaging conditions. CNN-based optical flow estimation algorithms recently developed for self-driving cars 78 could be a source of inspiration for designing more robust algorithms (Fig. 2e). Another interesting challenge is registering electron microscopy (EM) and fluorescence microscopy images. This task often requires the placement of fiducial markers visible in both modalities 79 . Recent work on virtual fluorescence labeling 19-21 suggests that it is possible to automatically translate images across modalities. Unsupervised approaches for learning image correspondence such as cycleGAN 40 could be applied to image registration in correlative imaging 20 .
Deep learning for adaptive and smart imaging. State-of-the-art microscopes, in particular light-sheet microscopes, are becoming adaptive machines capable of closed-loop control and real-time image analysis 80,81 . Yet current adaptive microscopes rely on time-consuming and complex iterative schemes 80 or direct measurements 81 . What if one could design a system that quickly and correctly guesses the best imaging parameters from a few measurements? This is an inverse problem, because it may be easy to deduce the consequence of a particular choice of parameters but difficult to determine which choice will lead to the desired image. A few early examples already show how to (1) train a CNN to focus light deep within turbid media 82 , (2) engineer illumination patterns for optimal imaging 83 , and (3) design PSFs for optimal spectral separation 60 . Yet the true challenge going forward will be to optimize multistep imaging strategies: the microscope will observe, make moves that change imaging parameters, and thus play a game whose ultimate goal is to gain the most information about the sample (Fig. 2i). Deep reinforcement learning, another key concept behind DeepMind's AlphaGo success 84 (Box 1), could be used to automatically learn such strategies. Ultimately, we expect that most of the human decision-making required for imaging optimization will be done automatically, something that will also require extensive end-to-end robotic automation so that the 'game' can be restarted at will.

Limits and pitfalls
Deep learning versus classical methods. Although classical, principled algorithms are often outperformed by deep learning models, they retain key advantages. First, classical algorithms inherently produce outputs that are consistent with their inputs, because they rely exclusively on first principles formulated as explicit analytical models instead of being trained from data (see section "The hallucination problem" and Box 4). Second, classical algorithms generalize to any valid measurement because they are not limited by the adequacy of the training data (see section "The generalization problem"). Third, classical algorithms, in contrast to deep learning models, do not produce wildly erratic outputs in response to minute changes to their inputs (see section "The adversarial fragility problem").

Box 4 | Architectures for image reconstruction
Image restoration or reconstruction requires neural networks that map one or several input images to an output image. Input and output images may be 2D or 3D and may even have different dimensions. The encoder-decoder convolutional network architecture 112 is one of the simplest networks capable of image-to-image translation. First, the input image is successively and repeatedly downsampled and convolved, and is also subjected to the normalization and nonlinear operations typical of CNNs. At every step, the image dimensions are reduced while the number of image channels, an additional dimension, is increased. This creates a representation bottleneck that is then reversed with successive and repeated upsampling and convolution steps. The bottleneck forces the network to encode the image into a small set of abstracted variables (often called latent variables) along the channel dimension, which are then decoded by the second half of the network. This ensures that only the most relevant features of the input images are retained, and those deemed unessential to the reconstruction are discarded. The U-Net 113,114 is a variation of the encoder-decoder network that adds shortcut, or skip, connections between the encoder and decoder branches. These additional connections are beneficial for image restoration tasks because they send fine image details directly to the decoder path, bypassing the bottleneck. For image restoration tasks in which the identity function is a good first guess, residual connections that add the input image to the output image can be used to learn a residual mapping. Overall, skip and residual connections have been shown to help train deeper networks by preventing vanishing gradients 115 .
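A toy one-level U-Net shows all three ingredients of this box, the bottleneck, a skip connection, and a residual connection (sketch; real networks use several levels and more channels):

```python
import torch
import torch.nn as nn

def block(cin, cout):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(cout, cout, 3, padding=1), nn.ReLU())

class TinyUNet(nn.Module):
    def __init__(self, ch=1, base=16):
        super().__init__()
        self.enc = block(ch, base)
        self.down = nn.MaxPool2d(2)
        self.mid = block(base, 2 * base)        # bottleneck at half resolution
        self.up = nn.ConvTranspose2d(2 * base, base, 2, stride=2)
        self.dec = block(2 * base, base)
        self.out = nn.Conv2d(base, ch, 1)

    def forward(self, x):
        e = self.enc(x)                          # full-resolution features
        m = self.mid(self.down(e))               # abstracted representation
        d = self.dec(torch.cat([self.up(m), e], dim=1))   # skip connection
        return self.out(d) + x                   # residual connection
```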
In the GAN 49 framework, two networks, a generator and a discriminator, are trained together. In applications to microscopy, an improvement on GANs is often used: the cGAN. The generator learns to transform an input image into an output image, while the discriminator learns to classify images as real (from the training set; label 1) or counterfeit (from the generator; label 0). In the conditional setting, the input image is also provided to the discriminator 50 . Therefore, cGANs can be understood as providing an implicitly learned loss function defined by the action of the discriminator. In practice, this behavior is achieved by training the generator to minimize the loss function while training the discriminator to maximize it. At the end of training, the discriminator network is dropped and only the generator is used for image reconstruction.

The hallucination problem. The most serious issue when applying deep learning for discovery is that of hallucination. When looking at random patterns such as clouds, human brains can perceive shapes of objects and animals. Similarly, deep learning systems can hallucinate details and make mistakes when provided with inadequate training data (Box 5). These hallucinations are deceptive artifacts that appear highly plausible in the absence of contradictory information and can be challenging, if not impossible, to detect. Some network architectures, such as GANs, are particularly susceptible to hallucination because their explicit goal is to fool a discriminator by forging persuasive details. cGANs (Box 4), as used by Isola et al. 50 , alleviate this problem but may still produce unsubstantiated image details. A possible mitigation strategy is to add consistency losses that penalize hallucinations, as done in ANNA-PALM 17 . In general, mistakes made by deep learning models can be highly plausible and more subtle than those of classical algorithms, a potential cause for concern when such models are used for discovery.
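For concreteness, a single cGAN training step with an added L1 fidelity term, the combination commonly used to temper hallucination, might look as follows (hypothetical generator `G` and discriminator `D`, the latter returning a per-image probability; the weight `lam` is illustrative):

```python
import torch
import torch.nn.functional as F

def cgan_step(G, D, opt_g, opt_d, x, y, lam=100.0):
    fake = G(x)
    real_lbl = torch.ones(x.size(0), 1)
    fake_lbl = torch.zeros(x.size(0), 1)
    # Discriminator: label real (input, target) pairs 1, counterfeit pairs 0.
    d_loss = (F.binary_cross_entropy(D(x, y), real_lbl)
              + F.binary_cross_entropy(D(x, fake.detach()), fake_lbl))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # Generator: fool the discriminator while staying close to the target;
    # the L1 term tethers outputs to the data rather than to plausible invention.
    g_loss = F.binary_cross_entropy(D(x, fake), real_lbl) + lam * F.l1_loss(fake, y)
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```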
The generalization problem. Another failure mode of neural networks is overlearning, or overfitting: the network memorizes the training data exactly and fails to generalize to unseen data. This can be caused by insufficient training data or by a poor choice of network parameters. Techniques such as dropout regularization can help, but the best way to avoid overfitting is to train with data that exceed the memorization capacity of the network. Overall, current deep learning approaches typically do not generalize well to data obtained from a different microscope or under different conditions 13 .
The adversarial fragility problem. Recent research such as DeepFool 85 and the work by Sabour et al. 86 shows that neural networks can be tricked into producing completely different outputs after the application of imperceptible perturbations to their inputs. Modifying even a single pixel can be enough to fool deep neural networks into confusing different object classes on the CIFAR-10 image classification dataset 87 . Overall, this fragility of deep learning in the face of adversarial attacks is concerning and raises many questions about the nature and brittleness of deep CNN computation (see "Interpreting and trusting the black box" below). Research is currently under way in the deep learning community to better understand adversarial attacks, as well as defenses against them 88 .
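The simplest attack of this family, the fast gradient sign method (a different method from the cited DeepFool, shown here because it fits in a few lines), perturbs each input pixel by a tiny step in the direction that increases the loss most:

```python
import torch

def fgsm(model, x, y, loss_fn, eps=2 / 255):
    # Returns an adversarial copy of x within an eps-ball of the original.
    x_adv = x.clone().requires_grad_(True)
    loss_fn(model(x_adv), y).backward()
    return (x_adv + eps * x_adv.grad.sign()).detach().clamp(0, 1)
```

A perturbation of a few gray levels per pixel, invisible to the eye, can be enough to change the output drastically.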

Challenges
What amount of training data is needed? The success of deep learning depends critically on the availability of training data and their suitability for the images at hand. If the amount of training data is insufficient, poor performance will ensue (Box 5). Yet a common misconception is that deep learning always requires a very large number of training examples. For example, Wang et al. 16 used at most 3,000 image pairs for training, Christiansen et al. 19 needed fewer than 100 high-resolution images (at most 4,600 × 4,600 pixels), and Ounkomol et al. 20 used only 40 images (1,500 × 1,500 pixels). In some cases, training and inference can even be done on the very same 3D image stack, as shown by Weigert et al. 14 . This is in stark contrast with the millions of images needed for the ImageNet challenge 89 . Indeed, far fewer training data are required in fluorescence microscopy than are needed to distinguish between, for example, 800 different kinds of birds in natural images. In fact, the quality of the data and their suitability for the problem are perhaps more important than their quantity (Box 5). In any case, fluorescence microscopy needs creative experimental and computational strategies to obtain more and better training data.
Strategies for obtaining training data. One approach is to perform dedicated experiments that produce the necessary images for training. For example, Weigert et al. 13 acquired pairs of low-quality and high-quality images by varying the exposure and laser power. In cases where the physics of the image-degradation process is well understood, forward-model simulations can be used to generate realistic images 13,18 . Neural networks could also be used to improve the quality of these simulations. Recently, much effort has been invested in building generative models of cells 90 , notably with adversarial approaches 91-93 . The images produced by these models could in turn be used to train restoration or reconstruction algorithms.
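When the degradation physics is known, synthetic (input, target) pairs can be generated from any stack of high-quality images. A sketch of such a forward model (blur, low photon budget, read noise; all parameter values illustrative and uncalibrated):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(1)

def degrade(clean, psf_sigma=1.5, photons=20.0, read_noise=1.0):
    # clean: high-quality image with non-negative values.
    blurred = gaussian_filter(clean, psf_sigma)              # optical blur
    scaled = photons * blurred / max(blurred.max(), 1e-12)
    shot = rng.poisson(scaled)                               # photon-counting noise
    return shot + rng.normal(0.0, read_noise, clean.shape)   # sensor read noise

# (input, target) training pairs from a stack of high-quality images:
# pairs = [(degrade(img), img) for img in stack]
```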
Leveraging available training data. A classic approach to increasing the number of training examples is data augmentation, which creates variants of existing images through rotation, scaling, changes in lighting, and other transformations. Another way to leverage available data is transfer learning, which relies on the observation that the first layers of a neural network act as detectors of universally applicable textures and patterns 94 (Fig. 2g). Transfer learning consists of pretraining networks on large datasets from other domains, thus accelerating convergence and improving generalization 95 . As an example, Esteva et al. 96 recently demonstrated how Google's Inception network, trained on ImageNet images, can be retrained to classify dermatological images for melanoma classification. Instead of using ImageNet, could scientists use existing large collections of fluorescence microscopy images? Prime candidates are the cellular and organelle fluorescence images from the Human Protein Atlas 97 or other large public repositories of fluorescence images.

Box 5 | Deep-learning-based discovery and its limits
Does data-driven image reconstruction preclude discovery? Can novel structures and unknown patterns be discovered even if absent from the training data? To shed some light on this subtle but important question, we will use an analogy: let us train a neural network (U-Net) to restore a highly degraded image of an ancient English word: "Witenagemot". First, we use a training dataset consisting of images of all three-letter substrings occurring in the most common English words, excluding "Witenagemot" and its variants. As shown in the figure, the word is recovered: we have discovered a word that is not present in the training data. Knowing which alphabet to expect does not preclude the discovery of new words, sentences, or ideas. What if we exclude the letters a and e from the training data? In that case, the network does not recognize these letters, and discovery of the word is hampered. If one uses the wrong alphabet for training, for example 6,000 Chinese characters, discovery is impossible. The mistakes made are interesting in their own right: the network tries to reconcile the inadequate prior with the information present in the degraded image. Finally, we can rescue the restoration by instead using a 50%-50% mix of Chinese and Latin characters. However, we also see that some characters, such as m, are decorated with artifacts of Chinese provenance.

Interpreting and trusting the black box. Ideas from the growing literature on interpreting neural networks need to be adapted to fluorescence imaging, and tools should be built to facilitate the interpretation of results. One possibility is to devise tools that explain given predictions. Take, for example, low-contrast images of fluorescently labeled cell membranes restored with a CNN. The user could select pixels along a predicted membrane and ask for an explanation, for example in the form of a covariance map, presented interactively to show which pixels and patterns in the input image justify the reconstruction and why. Yet it is not enough to explain how a result is attained; it is also necessary to quantify the confidence in these results. Recently, several methods for computing pixel-wise confidence measures and confidence intervals have been proposed that leverage ensembles of predictors and adversarial training 13,103,104 . Another promising approach is to use internal consistency checks, for example by comparing the input of the network with the output transformed by the forward model, as done during inference in ANNA-PALM 17 .
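One such ensemble-based confidence measure is straightforward to sketch: train several networks independently and report their per-pixel disagreement alongside the mean prediction (hypothetical list `models`):

```python
import torch

def ensemble_confidence(models, x):
    # x: input batch; returns (mean prediction, per-pixel standard deviation).
    with torch.no_grad():
        preds = torch.stack([m(x) for m in models])  # (n_models, b, c, h, w)
    return preds.mean(0), preds.std(0)
```

High disagreement flags pixels whose reconstruction is poorly constrained by the data and should not be trusted without further evidence.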
Is discovery possible? An oft-asked question is, "Is it possible to reconstruct structures or patterns that are not present in the training data?" To illustrate this problem, we use the restoration of degraded text as an analogy for image reconstruction in microscopy (Box 5).
What are the lessons to be learned? First, data-driven discovery is possible, but it depends critically on the quality and compatibility of the training data. There are also serious hallucination risks when the training data contain either less or more information than strictly required. The same is true for microscopy: the recognition of image features does not preclude further analysis of their spatial organization or the study of their temporal dynamics. For example, being able to super-resolve microtubules with fewer raw image acquisitions enables observations at higher temporal resolution and consequently opens the door to new discoveries 17,18 . However, mistakes can be made if the training data are inadequate. In light of these results, it is our opinion that deep learning results cannot be blindly trusted, something that is certainly true of any singular piece of evidence. Although confidence measures are needed 13 , discoveries must be confirmed by multiple lines of evidence gathered from distinct experiments.

Reproducibility. Reproducibility is the bedrock of modern science, and it is only natural to expect independent validation of published deep learning results. However, a lack of experimental consistency and of reporting guidelines sometimes prevents deep learning researchers from replicating each other's results 105 . Moreover, the statistical significance of measured performance metrics is rarely reported in the published literature because of the high costs, in terms of time and resources, associated with training 106 . To ameliorate this, researchers should adopt guidelines when publishing their results. Such best practices could include, for example, (1) making the source code and trained models freely available, (2) allowing access to the dataset used to train the models, (3) fully disclosing all hyperparameters used during training, (4) reporting the statistical significance of observed results, and (5) reporting failure cases in addition to exemplary ones. Such practices would increase confidence in results and facilitate the widespread adoption of deep learning techniques in fluorescence microscopy.

Dissemination
To accelerate the adoption of deep learning in microscopy, new software frameworks tailored for biologists are needed to use, adapt, train, validate, and interpret deep neural networks 13 . In particular, tools with better ergonomics, user-friendly graphical interfaces, and smart parameter-free or auto-tuning algorithms would help lower the technical bar to adoption. Another potential obstacle is the cost of hardware for training. Training deep learning models today typically requires specialized hardware such as graphical processing units (GPUs). Using GPUs instead of standard central processing units (CPUs) can speed up training by as much as 100-fold, typically reducing training times from weeks or days to hours. Although desktop computers outfitted with GPUs are relatively affordable, assembling and maintaining such machines requires nontrivial technical skills. An alternative is pay-as-you-go cloud-based platforms, which are best reserved for short-term, high-intensity workloads.

Conclusion
Deep learning holds many promises for fluorescence microscopy. Some applications will stand the test of time; others will fall short. Some will be redefined by deep learning, whereas others will continue to rely on classical methods. In either case, learned algorithms will be key to upcoming computational advances in microscopy. Because of the compositionality of deep learning models, it is possible to train models that directly process acquired raw data and produce the final analysis product, such as segmentations and classifications. Hence, in the future, we expect that deep learning models trained end-to-end will blur the frontier between image reconstruction and image analysis.

Data availability
Source code for the experiment described in Box 5 can be found at http://github.com/royerlab/DLDiscovery.
