Accepted Manuscript

Physics-Informed Neural Networks: A Deep Learning Framework for Solving Forward and
Inverse Problems Involving Nonlinear Partial Differential Equations

M. Raissi, P. Perdikaris, G.E. Karniadakis

PII: S0021-9991(18)30712-5
DOI: https://doi.org/10.1016/j.jcp.2018.10.045
Reference: YJCPH 8347

To appear in: Journal of Computational Physics

Received date: 13 June 2018


Revised date: 26 October 2018
Accepted date: 28 October 2018

Please cite this article in press as: M. Raissi et al., Physics-Informed Neural Networks: A Deep Learning Framework for Solving Forward
and Inverse Problems Involving Nonlinear Partial Differential Equations, J. Comput. Phys. (2018), https://doi.org/10.1016/j.jcp.2018.10.045

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing
this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is
published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all
legal disclaimers that apply to the journal pertain.
Highlights
• We put forth a deep learning framework that creates a new interface between machine learning and scientific computing by enabling
the synergistic combination of models and data.
• We introduce an effective mechanism for incorporating physical first principles into machine learning and for regularizing the training of
deep neural networks in small data regimes.
• The resulting data-driven algorithms open a new path to discovering governing equations, and predicting complex dynamics from
incomplete models and incomplete data.
Physics-Informed Neural Networks: A Deep Learning
Framework for Solving Forward and Inverse Problems
Involving Nonlinear Partial Differential Equations
M. Raissi¹, P. Perdikaris², and G.E. Karniadakis¹

¹ Division of Applied Mathematics, Brown University, Providence, RI 02912, USA
² Department of Mechanical Engineering and Applied Mechanics, University of Pennsylvania, Philadelphia, PA 19104, USA

Abstract
We introduce physics-informed neural networks – neural networks that are
trained to solve supervised learning tasks while respecting any given laws of
physics described by general nonlinear partial differential equations. In this
work, we present our developments in the context of solving two main classes
of problems: data-driven solution and data-driven discovery of partial differ-
ential equations. Depending on the nature and arrangement of the available
data, we devise two distinct types of algorithms, namely continuous time
and discrete time models. The first type of models forms a new family of
data-efficient spatio-temporal function approximators, while the latter type
allows the use of arbitrarily accurate implicit Runge-Kutta time-stepping
schemes with an unlimited number of stages. The effectiveness of the proposed
framework is demonstrated through a collection of classical problems in flu-
ids, quantum mechanics, reaction-diffusion systems, and the propagation of
nonlinear shallow-water waves.
Keywords: Data-driven scientific computing, Machine learning, Predictive
modeling, Runge-Kutta methods, Nonlinear dynamics

1. Introduction

With the explosive growth of available data and computing resources, recent advances in machine learning and data analytics have yielded transformative results across diverse scientific disciplines, including image recognition [1], cognitive science [2], and genomics [3]. However, more often than not, in the course of analyzing complex physical, biological or engineering systems, the cost of data acquisition is prohibitive, and we are inevitably faced with the challenge of drawing conclusions and making decisions under partial information. In this small data regime, the vast majority of state-of-the-art machine learning techniques (e.g., deep/convolutional/recurrent neural networks) lack robustness and fail to provide any guarantees of convergence.

At first sight, the task of training a deep learning algorithm to accurately identify a nonlinear map from a few – potentially very high-dimensional – input and output data pairs seems at best naive. Coming to our rescue, for many cases pertaining to the modeling of physical and biological systems, there exists a vast amount of prior knowledge that is currently not being utilized in modern machine learning practice. Whether it be principled physical laws that govern the time-dependent dynamics of a system, or some empirically validated rules or other domain expertise, this prior information can act as a regularization agent that constrains the space of admissible solutions to a manageable size (e.g., in incompressible fluid dynamics problems, by discarding any unrealistic flow solutions that violate the conservation of mass principle). In return, encoding such structured information into a learning algorithm amplifies the information content of the data that the algorithm sees, enabling it to quickly steer itself towards the right solution and generalize well even when only a few training examples are available.

The first glimpses of promise for exploiting structured prior information to construct data-efficient and physics-informed learning machines have already been showcased in the recent studies of [4–6]. There, the authors employed Gaussian process regression [7] to devise functional representations that are tailored to a given linear operator, and were able to accurately infer solutions and provide uncertainty estimates for several prototype problems in mathematical physics. Extensions to nonlinear problems were proposed in subsequent studies by Raissi et al. [8, 9] in the context of both inference and systems identification. Despite the flexibility and mathematical elegance of Gaussian processes in encoding prior information, the treatment of nonlinear problems introduces two important limitations. First, in [8, 9] the authors had to locally linearize any nonlinear terms in time, thus limiting the applicability of the proposed methods to discrete-time domains and compromising the accuracy of their predictions in strongly nonlinear regimes. Second, the Bayesian nature of Gaussian process regression requires certain prior assumptions that may limit the representation capacity of the model and give rise to robustness/brittleness issues, especially for nonlinear problems [10].

2. Problem setup

In this work we take a different approach by employing deep neural networks and leveraging their well-known capability as universal function approximators [11]. In this setting, we can directly tackle nonlinear problems without the need for committing to any prior assumptions, linearization, or local time-stepping. We exploit recent developments in automatic differentiation [12] – one of the most useful but perhaps under-utilized techniques in scientific computing – to differentiate neural networks with respect to their input coordinates and model parameters to obtain physics-informed neural networks. Such neural networks are constrained to respect any symmetries, invariances, or conservation principles originating from the physical laws that govern the observed data, as modeled by general time-dependent and nonlinear partial differential equations. This simple yet powerful construction allows us to tackle a wide range of problems in computational science and introduces a potentially transformative technology leading to the development of new data-efficient and physics-informed learning machines, new classes of numerical solvers for partial differential equations, as well as new data-driven approaches for model inversion and systems identification.

The general aim of this work is to set the foundations for a new paradigm in modeling and computation that enriches deep learning with the longstanding developments in mathematical physics. To this end, our manuscript is divided into two parts that aim to present our developments in the context of two major classes of problems: data-driven solution and data-driven discovery of partial differential equations. All code and data-sets accompanying this manuscript are available on GitHub at https://github.com/maziarraissi/PINNs. Throughout this work we have been using relatively simple deep feed-forward neural network architectures with hyperbolic tangent activation functions and no additional regularization (e.g., L1/L2 penalties, dropout, etc.). Each numerical example in the manuscript is accompanied by a detailed discussion of the neural network architecture we employed as well as details about its training process (e.g., optimizer, learning rates, etc.). Finally, a comprehensive series of systematic studies that aims to demonstrate the performance of the proposed methods is provided in Appendix A and Appendix B.

In this work, we consider parametrized and nonlinear partial differential equations of the general form

u_t + N[u; λ] = 0,  x ∈ Ω,  t ∈ [0, T],    (1)

where u(t, x) denotes the latent (hidden) solution, N[·; λ] is a nonlinear operator parametrized by λ, and Ω is a subset of R^D. This setup encapsulates a wide range of problems in mathematical physics including conservation laws, diffusion processes, advection-diffusion-reaction systems, and kinetic equations. As a motivating example, the one-dimensional Burgers equation [13] corresponds to the case where N[u; λ] = λ_1 u u_x − λ_2 u_xx and λ = (λ_1, λ_2). Here, the subscripts denote partial differentiation in either time or space. Given noisy measurements of the system, we are interested in the solution of two distinct problems. The first problem is that of inference, filtering and smoothing, or data-driven solutions of partial differential equations [4, 8], which states: given fixed model parameters λ, what can be said about the unknown hidden state u(t, x) of the system? The second problem is that of learning, system identification, or data-driven discovery of partial differential equations [5, 9, 14], stating: what are the parameters λ that best describe the observed data?

3. Data-driven solutions of partial differential equations

Let us start by concentrating on the problem of computing data-driven solutions to partial differential equations (i.e., the first problem outlined above) of the general form

u_t + N[u] = 0,  x ∈ Ω,  t ∈ [0, T],    (2)

where u(t, x) denotes the latent (hidden) solution, N[·] is a nonlinear differential operator, and Ω is a subset of R^D. In sections 3.1 and 3.2, we put forth two distinct types of algorithms, namely continuous and discrete time models, and highlight their properties and performance through the lens of different benchmark problems. In the second part of our study (see section 4), we shift our attention to the problem of data-driven discovery of partial differential equations [5, 9, 14].
3.1. Continuous Time Models

We define f(t, x) to be given by the left-hand side of equation (2); i.e.,

f := u_t + N[u],    (3)

and proceed by approximating u(t, x) by a deep neural network. This assumption, along with equation (3), results in a physics-informed neural network f(t, x). This network can be derived by applying the chain rule for differentiating compositions of functions using automatic differentiation [12], and has the same parameters as the network representing u(t, x), albeit with different activation functions due to the action of the differential operator N. The shared parameters between the neural networks u(t, x) and f(t, x) can be learned by minimizing the mean squared error loss

MSE = MSE_u + MSE_f,    (4)

where

MSE_u = (1/N_u) Σ_{i=1}^{N_u} |u(t_u^i, x_u^i) − u^i|²,

and

MSE_f = (1/N_f) Σ_{i=1}^{N_f} |f(t_f^i, x_f^i)|².

Here, {t_u^i, x_u^i, u^i}_{i=1}^{N_u} denote the initial and boundary training data on u(t, x) and {t_f^i, x_f^i}_{i=1}^{N_f} specify the collocation points for f(t, x). The loss MSE_u corresponds to the initial and boundary data, while MSE_f enforces the structure imposed by equation (2) at a finite set of collocation points. Although similar ideas for constraining neural networks using physical laws have been explored in previous studies [15, 16], here we revisit them using modern computational tools and apply them to more challenging dynamic problems described by time-dependent nonlinear partial differential equations.
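For concreteness, the composite objective of equation (4) can be assembled in a few lines. The sketch below is written in the same TensorFlow style as the code snippet of Appendix A.1; the tensor names (u_pred, u_train, f_pred) are illustrative placeholders rather than identifiers from the code released with this manuscript.

import tensorflow as tf

def pinn_loss(u_pred, u_train, f_pred):
    # u_pred: network output at the N_u data points; u_train: measured values u^i;
    # f_pred: PDE residual f(t, x) evaluated at the N_f collocation points.
    mse_u = tf.reduce_mean(tf.square(u_train - u_pred))   # data misfit, MSE_u
    mse_f = tf.reduce_mean(tf.square(f_pred))              # residual penalty, MSE_f
    return mse_u + mse_f                                   # MSE = MSE_u + MSE_f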

132 Here, we should underline an important distinction between this line of


133 work and existing approaches in the literature that elaborate on the use of
machine learning in computational physics. The term physics-informed machine learning has also been used recently by Wang et al. [17] in the context of turbulence modeling. Other examples of machine learning approaches for predictive modeling of physical systems include [18–29]. All these approaches employ machine learning algorithms like support vector machines, random
139 forests, Gaussian processes, and feed-forward/convolutional/recurrent neural
140 networks merely as black-box tools. As described above, the proposed work
141 aims to go one step further by revisiting the construction of “custom” activa-
142 tion and loss functions that are tailored to the underlying differential opera-
143 tor. This allows us to open the black-box by understanding and appreciating
144 the key role played by automatic differentiation within the deep learning field.
145 Automatic differentiation in general, and the back-propagation algorithm in
146 particular, is currently the dominant approach for training deep models by
147 taking their derivatives with respect to the parameters (e.g., weights and
148 biases) of the models. Here, we use the exact same automatic differentiation
149 techniques, employed by the deep learning community, to physics-inform
150 neural networks by taking their derivatives with respect to their input co-
151 ordinates (i.e., space and time) where the physics is described by partial
152 differential equations. We have empirically observed that this structured ap-
153 proach introduces a regularization mechanism that allows us to use relatively
154 simple feed-forward neural network architectures and train them with small
155 amounts of data. The effectiveness of this simple idea may be related to
156 the remarks put forth by Lin, Tegmark and Rolnick [30] and raises many
157 interesting questions to be quantitatively addressed in future research. To
158 this end, the proposed work draws inspiration from the early contributions of
Psichogios and Ungar [16], Lagaris et al. [15], as well as the contemporary
160 works of Kondor [31, 32], Hirn [33], and Mallat [34].
161

162 In all cases pertaining to data-driven solution of partial differential equa-


163 tions, the total number of training data Nu is relatively small (a few hundred
164 up to a few thousand points), and we chose to optimize all loss functions using
165 L-BFGS, a quasi-Newton, full-batch gradient-based optimization algorithm
166 [35]. For larger data-sets, such as the data-driven model discovery examples
167 discussed in section 4, a more computationally efficient mini-batch setting can
168 be readily employed using stochastic gradient descent and its modern vari-
169 ants [36, 37]. Despite the fact that there is no theoretical guarantee that this
170 procedure converges to a global minimum, our empirical evidence indicates
171 that, if the given partial differential equation is well-posed and its solution is
172 unique, our method is capable of achieving good prediction accuracy given
173 a sufficiently expressive neural network architecture and a sufficient num-
174 ber of collocation points Nf . This general observation deeply relates to the
resulting optimization landscape induced by the mean square error loss of equation (4), and defines an open question for research that is in sync with recent theoretical developments in deep learning [38, 39]. To this end, we will test the robustness of the proposed methodology using a series of systematic sensitivity studies that are provided in Appendix A and Appendix B.
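To illustrate the full-batch, quasi-Newton optimization pattern referred to above, the following self-contained toy sketch fits a small least-squares model with SciPy's L-BFGS implementation; it only demonstrates the interface (loss and gradient evaluated on the full batch at every iteration) and is not our training code, in which the gradients are obtained by automatic differentiation.

import numpy as np
from scipy.optimize import minimize

# Toy full-batch problem: fit y = a*sin(x) + b to data by least squares.
rng = np.random.default_rng(0)
x = np.linspace(0.0, np.pi, 200)
y = 1.5 * np.sin(x) + 0.3 + 0.01 * rng.standard_normal(x.size)

def loss_and_grad(theta):
    a, b = theta
    r = a * np.sin(x) + b - y                        # residuals over the full batch
    loss = np.mean(r ** 2)
    grad = np.array([2.0 * np.mean(r * np.sin(x)),   # d loss / d a
                     2.0 * np.mean(r)])              # d loss / d b
    return loss, grad

res = minimize(loss_and_grad, x0=np.zeros(2), jac=True, method="L-BFGS-B")
print(res.x)   # recovers (a, b) close to (1.5, 0.3)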

3.1.1. Example (Schrödinger Equation)

This example aims to highlight the ability of our method to handle periodic boundary conditions, complex-valued solutions, as well as different types of nonlinearities in the governing partial differential equations. The one-dimensional nonlinear Schrödinger equation is a classical field equation that is used to study quantum mechanical systems, including nonlinear wave propagation in optical fibers and/or waveguides, Bose-Einstein condensates, and plasma waves. In optics, the nonlinear term arises from the intensity-dependent index of refraction of a given material. Similarly, the nonlinear term for Bose-Einstein condensates is a result of the mean-field interactions of an interacting, N-body system. The nonlinear Schrödinger equation along with periodic boundary conditions is given by

i h_t + 0.5 h_xx + |h|² h = 0,  x ∈ [−5, 5],  t ∈ [0, π/2],    (5)
h(0, x) = 2 sech(x),
h(t, −5) = h(t, 5),
h_x(t, −5) = h_x(t, 5),

where h(t, x) is the complex-valued solution. Let us define f(t, x) to be given by

f := i h_t + 0.5 h_xx + |h|² h,

and proceed by placing a complex-valued neural network prior on h(t, x). In fact, if u denotes the real part of h and v is the imaginary part, we are placing a multi-output neural network prior on h(t, x) = [u(t, x), v(t, x)]. This will result in the complex-valued (multi-output) physics-informed neural network f(t, x). The shared parameters of the neural networks h(t, x) and f(t, x) can be learned by minimizing the mean squared error loss

MSE = MSE_0 + MSE_b + MSE_f,    (6)

where

MSE_0 = (1/N_0) Σ_{i=1}^{N_0} |h(0, x_0^i) − h_0^i|²,

MSE_b = (1/N_b) Σ_{i=1}^{N_b} ( |h(t_b^i, −5) − h(t_b^i, 5)|² + |h_x(t_b^i, −5) − h_x(t_b^i, 5)|² ),

and

MSE_f = (1/N_f) Σ_{i=1}^{N_f} |f(t_f^i, x_f^i)|².

Here, {x_0^i, h_0^i}_{i=1}^{N_0} denotes the initial data, {t_b^i}_{i=1}^{N_b} corresponds to the collocation points on the boundary, and {t_f^i, x_f^i}_{i=1}^{N_f} represents the collocation points on f(t, x). Consequently, MSE_0 corresponds to the loss on the initial data, MSE_b enforces the periodic boundary conditions, and MSE_f penalizes the Schrödinger equation not being satisfied on the collocation points.
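To spell out how the complex-valued residual can be handled in practice, write h = u + iv and separate real and imaginary parts. The sketch below does this in the TensorFlow style of Appendix A.1, assuming u and v are the two outputs of the network evaluated at tensors t and x; the function and variable names are illustrative, not taken from the code released with this manuscript.

import tensorflow as tf

def schrodinger_residual(u, v, t, x):
    # h = u + i v.  Substituting into i h_t + 0.5 h_xx + |h|^2 h = 0 and
    # separating real and imaginary parts gives the two residuals below.
    u_t  = tf.gradients(u, t)[0]
    v_t  = tf.gradients(v, t)[0]
    u_xx = tf.gradients(tf.gradients(u, x)[0], x)[0]
    v_xx = tf.gradients(tf.gradients(v, x)[0], x)[0]
    sq = u ** 2 + v ** 2                    # |h|^2
    f_u = -v_t + 0.5 * u_xx + sq * u        # real part of the residual
    f_v =  u_t + 0.5 * v_xx + sq * v        # imaginary part of the residual
    return f_u, f_v

The loss term MSE_f above then uses |f|² = f_u² + f_v² at the collocation points.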

In order to assess the accuracy of our method, we have simulated equation (5) using conventional spectral methods to create a high-resolution data set. Specifically, starting from an initial state h(0, x) = 2 sech(x) and assuming periodic boundary conditions h(t, −5) = h(t, 5) and h_x(t, −5) = h_x(t, 5), we have integrated equation (5) up to a final time t = π/2 using the Chebfun package [40] with a spectral Fourier discretization with 256 modes and a fourth-order explicit Runge-Kutta temporal integrator with time-step Δt = (π/2)·10⁻⁶. Under our data-driven setting, all we observe are measurements {x_0^i, h_0^i}_{i=1}^{N_0} of the latent function h(t, x) at time t = 0. In particular, the training set consists of a total of N_0 = 50 data points on h(0, x) randomly parsed from the full high-resolution data-set, as well as N_b = 50 randomly sampled collocation points {t_b^i}_{i=1}^{N_b} for enforcing the periodic boundaries. Moreover, we have assumed N_f = 20,000 randomly sampled collocation points used to enforce equation (5) inside the solution domain. All randomly sampled point locations were generated using a space-filling Latin Hypercube Sampling strategy [41].
226

227 Here our goal is to infer the entire spatio-temporal solution h(t, x) of the
228 Schrödinger equation (5). We chose to jointly represent the latent function
229 h(t, x) = [u(t, x) v(t, x)] using a 5-layer deep neural network with 100 neu-
230 rons per layer and a hyperbolic tangent activation function. In general, the
231 neural network should be given sufficient approximation capacity in order to
232 accommodate the anticipated complexity of u(t, x). Although more system-
233 atic procedures such as Bayesian optimization [42] can be employed in order

8
234 to fine-tune the design of the neural network, in the absence of theoretical
235 error/convergence estimates, the interplay between the neural architecture/-
236 training procedure and the complexity of the underlying differential equation
237 is still poorly understood. One viable path towards assessing the accuracy
238 of the predicted solution could come by adopting a Bayesian approach and
239 monitoring the variance of the predictive posterior distribution, but this goes
240 beyond the scope of the present work and will be investigated in future stud-
241 ies.
242 In this example, our setup aims to highlight the robustness of the pro-
243 posed method with respect to the well known issue of over-fitting. Specifi-
244 cally, the term in M SEf in equation (6) acts as a regularization mechanism
245 that penalizes solutions that do not satisfy equation (5). Therefore, a key
246 property of physics-informed neural networks is that they can be effectively
247 trained using small data sets; a setting often encountered in the study of
248 physical systems for which the cost of data acquisition may be prohibitive.
249 Figure 1 summarizes the results of our experiment. Specifically, the top
panel of figure 1 shows the magnitude of the predicted spatio-temporal solution |h(t, x)| = √(u²(t, x) + v²(t, x)), along with the locations of the initial and
252 boundary training data. The resulting prediction error is validated against
253 the test data for this problem, and is measured at 1.97 · 10−3 in the rela-
254 tive L2 -norm. A more detailed assessment of the predicted solution is pre-
255 sented in the bottom panel of Figure 1. In particular, we present a compar-
256 ison between the exact and the predicted solutions at different time instants
257 t = 0.59, 0.79, 0.98. Using only a handful of initial data, the physics-informed
258 neural network can accurately capture the intricate nonlinear behavior of the
259 Schrödinger equation.
260

261 One potential limitation of the continuous time neural network models
262 considered so far stems from the need to use a large number of colloca-
263 tion points Nf in order to enforce physics-informed constraints in the en-
264 tire spatio-temporal domain. Although this poses no significant issues for
265 problems in one or two spatial dimensions, it may introduce a severe bot-
266 tleneck in higher dimensional problems, as the total number of collocation
points needed to globally enforce a physics-informed constraint (i.e., in our case a partial differential equation) will increase exponentially. Although this limitation could be addressed to some extent using sparse grid or quasi-Monte Carlo sampling schemes [43, 44], in the next section, we put forth a
271 different approach that circumvents the need for collocation points by in-

Figure 1: Schrödinger equation: Top: Predicted solution |h(t, x)| along with the initial and boundary training data. In addition we are using 20,000 collocation points generated using a Latin Hypercube Sampling strategy. Bottom: Comparison of the predicted and exact solutions corresponding to the three temporal snapshots depicted by the dashed vertical lines in the top panel. The relative L² error for this case is 1.97 · 10⁻³.

272 troducing a more structured neural network representation leveraging the


273 classical Runge-Kutta time-stepping schemes [45].

3.2. Discrete Time Models

Let us apply the general form of Runge-Kutta methods with q stages [45] to equation (2) and obtain

u^{n+c_i} = u^n − Δt Σ_{j=1}^{q} a_ij N[u^{n+c_j}],  i = 1, ..., q,
u^{n+1} = u^n − Δt Σ_{j=1}^{q} b_j N[u^{n+c_j}].    (7)

Here, u^{n+c_j}(x) = u(t^n + c_j Δt, x) for j = 1, ..., q. This general form encapsulates both implicit and explicit time-stepping schemes, depending on the choice of the parameters {a_ij, b_j, c_j}. Equations (7) can be equivalently expressed as

u^n = u_i^n,  i = 1, ..., q,
u^n = u_{q+1}^n,    (8)

where

u_i^n := u^{n+c_i} + Δt Σ_{j=1}^{q} a_ij N[u^{n+c_j}],  i = 1, ..., q,
u_{q+1}^n := u^{n+1} + Δt Σ_{j=1}^{q} b_j N[u^{n+c_j}].    (9)

We proceed by placing a multi-output neural network prior on

[u^{n+c_1}(x), ..., u^{n+c_q}(x), u^{n+1}(x)].    (10)

This prior assumption along with equations (9) results in a physics-informed neural network that takes x as an input and outputs

[u_1^n(x), ..., u_q^n(x), u_{q+1}^n(x)].    (11)

3.2.1. Example (Allen-Cahn Equation)

This example aims to highlight the ability of the proposed discrete time models to handle different types of nonlinearity in the governing partial differential equation. To this end, let us consider the Allen-Cahn equation along with periodic boundary conditions

u_t − 0.0001 u_xx + 5u³ − 5u = 0,  x ∈ [−1, 1],  t ∈ [0, 1],    (12)
u(0, x) = x² cos(πx),
u(t, −1) = u(t, 1),
u_x(t, −1) = u_x(t, 1).

The Allen-Cahn equation is a well-known equation from the area of reaction-diffusion systems. It describes the process of phase separation in multi-component alloy systems, including order-disorder transitions. For the Allen-Cahn equation, the nonlinear operator in equation (9) is given by

N[u^{n+c_j}] = −0.0001 u_xx^{n+c_j} + 5 (u^{n+c_j})³ − 5 u^{n+c_j},

and the shared parameters of the neural networks (10) and (11) can be learned by minimizing the sum of squared errors

SSE = SSE_n + SSE_b,    (13)

where

SSE_n = Σ_{j=1}^{q+1} Σ_{i=1}^{N_n} |u_j^n(x^{n,i}) − u^{n,i}|²,

and

SSE_b = Σ_{i=1}^{q} |u^{n+c_i}(−1) − u^{n+c_i}(1)|² + |u^{n+1}(−1) − u^{n+1}(1)|²
      + Σ_{i=1}^{q} |u_x^{n+c_i}(−1) − u_x^{n+c_i}(1)|² + |u_x^{n+1}(−1) − u_x^{n+1}(1)|².

Here, {x^{n,i}, u^{n,i}}_{i=1}^{N_n} corresponds to the data at time-step t^n. In classical numerical analysis, these time-steps are usually confined to be small due to stability constraints for explicit schemes or computational complexity constraints for implicit formulations [45]. These constraints become more severe as the total number of Runge-Kutta stages q is increased, and, for most problems of practical interest, one needs to take thousands to millions of such steps until the solution is resolved up to a desired final time. In sharp contrast to classical methods, here we can employ implicit Runge-Kutta schemes with an arbitrarily large number of stages at effectively very little extra cost.¹ This enables us to take very large time steps while retaining stability and high predictive accuracy, therefore allowing us to resolve the entire spatio-temporal solution in a single step.

¹ To be precise, it is only the number of parameters in the last layer of the neural network that increases linearly with the total number of stages.
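To make the bookkeeping of equations (9) and (13) concrete, the following NumPy sketch assembles the data part of the loss for the Allen-Cahn operator, assuming the stage predictions and their spatial derivatives have already been evaluated by the network and automatic differentiation, and that the implicit Runge-Kutta coefficients a_ij and b_j are available as arrays; all array contents below are placeholders and the variable names are illustrative.

import numpy as np

# Values used in this example (introduced below): q = 100 IRK stages,
# N_n = 200 training points at time t^n, and a single step of size dt = 0.8.
q, N_n, dt = 100, 200, 0.8

# Placeholders standing in for quantities produced elsewhere:
U    = np.zeros((N_n, q + 1))   # stage predictions u^{n+c_1..q}(x^{n,i}) and u^{n+1}(x^{n,i})
U_xx = np.zeros((N_n, q + 1))   # their second spatial derivatives
u_n  = np.zeros(N_n)            # training data u^{n,i} at time t^n
A    = np.zeros((q, q))         # IRK coefficients a_ij
b    = np.zeros(q)              # IRK weights b_j

# Allen-Cahn operator N[u] = -0.0001 u_xx + 5 u^3 - 5 u, evaluated at the q stages.
N_stages = -0.0001 * U_xx[:, :q] + 5.0 * U[:, :q] ** 3 - 5.0 * U[:, :q]

# Equation (9): map every stage prediction back to time t^n.
U_n = np.empty((N_n, q + 1))
U_n[:, :q] = U[:, :q] + dt * (N_stages @ A.T)   # u_i^n,     i = 1, ..., q
U_n[:, q]  = U[:, q]  + dt * (N_stages @ b)     # u_{q+1}^n

# Data part of the loss (13): all q + 1 columns must match the snapshot at t^n.
SSE_n = np.sum((U_n - u_n[:, None]) ** 2)

The periodic-boundary term SSE_b is assembled in the same fashion from the stage predictions and their x-derivatives at x = ±1.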

In this example, we have generated a training and test data-set by simulating the Allen-Cahn equation (12) using conventional spectral methods. Specifically, starting from an initial condition u(0, x) = x² cos(πx) and assuming periodic boundary conditions u(t, −1) = u(t, 1) and u_x(t, −1) = u_x(t, 1), we have integrated equation (12) up to a final time t = 1.0 using the Chebfun package [40] with a spectral Fourier discretization with 512 modes and a fourth-order explicit Runge-Kutta temporal integrator with time-step Δt = 10⁻⁵.

Our training data-set consists of N_n = 200 initial data points that are randomly sub-sampled from the exact solution at time t = 0.1, and our goal is to predict the solution at time t = 0.9 using a single time-step with size Δt = 0.8. To this end, we employ a discrete time physics-informed neural network with 4 hidden layers and 200 neurons per layer, while the output layer predicts 101 quantities of interest corresponding to the q = 100 Runge-Kutta stages u^{n+c_i}(x), i = 1, ..., q, and the solution at final time u^{n+1}(x). The theoretical error estimates for this scheme predict a temporal error accumulation of O(Δt^{2q}) [45], which in our case translates into an error way below machine precision, i.e., Δt^{2q} = 0.8²⁰⁰ ≈ 10⁻²⁰. To our knowledge, this is the first time that an implicit Runge-Kutta scheme of such high order has ever been used. Remarkably, starting from smooth initial data at t = 0.1 we can predict the nearly discontinuous solution at t = 0.9 in a single time-step with a relative L² error of 6.99 · 10⁻³, as illustrated in Figure 2. This error is entirely attributed to the neural network's capacity to approximate u(t, x), as well as to the degree that the sum of squared errors loss allows interpolation of the training data.

338 The key parameters controlling the performance of our discrete time al-
339 gorithm are the total number of Runge-Kutta stages q and the time-step
340 size Δt. As we demonstrate in the systematic studies provided in Appendix
341 A and Appendix B, low-order methods, such as the case q = 1 correspond-
342 ing to the classical trapezoidal rule, and the case q = 2 corresponding to the
4th-order Gauss-Legendre method, cannot retain their predictive accuracy for
344 large time-steps, thus mandating a solution strategy with multiple time-steps
345 of small size. On the other hand, the ability to push the number of Runge-
346 Kutta stages to 32 and even higher allows us to take very large time steps,
347 and effectively resolve the solution in a single step without sacrificing the
348 accuracy of our predictions. Moreover, numerical stability is not sacrificed
either, as implicit Gauss-Legendre is the only family of time-stepping schemes
350 that remain A-stable regardless of their order, thus making them ideal for
351 stiff problems [45]. These properties are unprecedented for an algorithm of
352 such implementation simplicity, and illustrate one of the key highlights of
353 our discrete time approach.

354 4. Data-driven discovery of partial differential equations


355 In the current part of our study, we shift our attention to the problem of
356 data-driven discovery of partial differential equations [5, 9, 14]. In sections 4.1
357 and 4.2, we put forth two distinct types of algorithms, namely continuous

Figure 2: Allen-Cahn equation: Top: Solution u(t, x) along with the location of the initial training snapshot at t = 0.1 and the final prediction snapshot at t = 0.9. Bottom: Initial training data and final prediction at the snapshots depicted by the white vertical lines in the top panel. The relative L² error for this case is 6.99 · 10⁻³.

358 and discrete time models, and highlight their properties and performance
359 through the lens of various canonical problems.

4.1. Continuous Time Models

Let us recall equation (1) and, similar to section 3.1, define f(t, x) to be given by the left-hand side of equation (1); i.e.,

f := u_t + N[u; λ].    (14)

We proceed by approximating u(t, x) by a deep neural network. This assumption, along with equation (14), results in a physics-informed neural network f(t, x). This network can be derived by applying the chain rule for differentiating compositions of functions using automatic differentiation [12]. It is worth highlighting that the parameters of the differential operator λ turn into parameters of the physics-informed neural network f(t, x).
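As a concrete illustration, for the Burgers operator used as the motivating example in equation (1), the residual of equation (14) can be coded in the same TensorFlow style as the snippet of Appendix A.1, with λ_1 and λ_2 declared as trainable variables; neural_net, weights, and biases are assumed to be defined as in that snippet, and the sketch is illustrative rather than an excerpt of the released code.

import tensorflow as tf

lambda_1 = tf.Variable(0.0, dtype=tf.float32)   # unknown PDE parameters, trained
lambda_2 = tf.Variable(0.0, dtype=tf.float32)   # jointly with the network weights

def u(t, x):
    # Solution network, defined as in the Appendix A.1 snippet.
    return neural_net(tf.concat([t, x], 1), weights, biases)

def f(t, x):
    u_ = u(t, x)
    u_t  = tf.gradients(u_, t)[0]
    u_x  = tf.gradients(u_, x)[0]
    u_xx = tf.gradients(u_x, x)[0]
    # Equation (14) with N[u; lambda] = lambda_1 * u * u_x - lambda_2 * u_xx.
    return u_t + lambda_1 * u_ * u_x - lambda_2 * u_xx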

4.1.1. Example (Navier-Stokes Equation)

Our next example involves a realistic scenario of incompressible fluid flow described by the ubiquitous Navier-Stokes equations. The Navier-Stokes equations describe the physics of many phenomena of scientific and engineering interest. They may be used to model the weather, ocean currents, water flow in a pipe and air flow around a wing. The Navier-Stokes equations in their full and simplified forms help with the design of aircraft and cars, the study of blood flow, the design of power stations, the analysis of the dispersion of pollutants, and many other applications. Let us consider the Navier-Stokes equations in two dimensions² (2D) given explicitly by

u_t + λ_1 (u u_x + v u_y) = −p_x + λ_2 (u_xx + u_yy),
v_t + λ_1 (u v_x + v v_y) = −p_y + λ_2 (v_xx + v_yy),    (15)

where u(t, x, y) denotes the x-component of the velocity field, v(t, x, y) the y-component, and p(t, x, y) the pressure. Here, λ = (λ_1, λ_2) are the unknown parameters. Solutions to the Navier-Stokes equations are sought in the set of divergence-free functions; i.e.,

u_x + v_y = 0.    (16)

This extra equation is the continuity equation for incompressible fluids that describes the conservation of mass of the fluid. We make the assumption that

u = ψ_y,  v = −ψ_x,    (17)

for some latent function ψ(t, x, y).³ Under this assumption, the continuity equation (16) will be automatically satisfied. Given noisy measurements

{t^i, x^i, y^i, u^i, v^i}_{i=1}^{N}

of the velocity field, we are interested in learning the parameters λ as well as the pressure p(t, x, y). We define f(t, x, y) and g(t, x, y) to be given by

f := u_t + λ_1 (u u_x + v u_y) + p_x − λ_2 (u_xx + u_yy),
g := v_t + λ_1 (u v_x + v v_y) + p_y − λ_2 (v_xx + v_yy),    (18)

and proceed by jointly approximating [ψ(t, x, y), p(t, x, y)] using a single neural network with two outputs. This prior assumption along with equations (17) and (18) results in a physics-informed neural network [f(t, x, y), g(t, x, y)]. The parameters λ of the Navier-Stokes operator as well as the parameters of the neural networks [ψ(t, x, y), p(t, x, y)] and [f(t, x, y), g(t, x, y)] can be trained by minimizing the mean squared error loss

MSE := (1/N) Σ_{i=1}^{N} ( |u(t^i, x^i, y^i) − u^i|² + |v(t^i, x^i, y^i) − v^i|² )
     + (1/N) Σ_{i=1}^{N} ( |f(t^i, x^i, y^i)|² + |g(t^i, x^i, y^i)|² ).    (19)

² It is straightforward to generalize the proposed framework to the Navier-Stokes equations in three dimensions (3D).
³ This construction can be generalized to three dimensional problems by employing the notion of vector potentials.
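For illustration, the sketch below shows how the assumptions of equations (17) and (18) translate into code in the same TensorFlow style as the snippet of Appendix A.1; psi_p_net stands for the two-output network approximating [ψ(t, x, y), p(t, x, y)] and, like the other names, is an assumption of this sketch rather than part of the released code.

import tensorflow as tf

lambda_1 = tf.Variable(0.0, dtype=tf.float32)
lambda_2 = tf.Variable(0.0, dtype=tf.float32)

def navier_stokes_residuals(t, x, y):
    psi_p = psi_p_net(tf.concat([t, x, y], 1))      # two outputs: psi and p
    psi, p = psi_p[:, 0:1], psi_p[:, 1:2]

    u =  tf.gradients(psi, y)[0]                    # equation (17): u = psi_y
    v = -tf.gradients(psi, x)[0]                    #                v = -psi_x

    u_t  = tf.gradients(u, t)[0]
    u_x  = tf.gradients(u, x)[0]
    u_y  = tf.gradients(u, y)[0]
    u_xx = tf.gradients(u_x, x)[0]
    u_yy = tf.gradients(u_y, y)[0]

    v_t  = tf.gradients(v, t)[0]
    v_x  = tf.gradients(v, x)[0]
    v_y  = tf.gradients(v, y)[0]
    v_xx = tf.gradients(v_x, x)[0]
    v_yy = tf.gradients(v_y, y)[0]

    p_x = tf.gradients(p, x)[0]
    p_y = tf.gradients(p, y)[0]

    # Equation (18): momentum residuals with unknown lambda_1, lambda_2.
    f = u_t + lambda_1 * (u * u_x + v * u_y) + p_x - lambda_2 * (u_xx + u_yy)
    g = v_t + lambda_1 * (u * v_x + v * v_y) + p_y - lambda_2 * (v_xx + v_yy)
    return u, v, p, f, g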

396 Here we consider the prototype problem of incompressible flow past a circular
397 cylinder; a problem known to exhibit rich dynamic behavior and transitions
398 for different regimes of the Reynolds number Re = u∞ D/ν. Assuming a
399 non-dimensional free stream velocity u∞ = 1, cylinder diameter D = 1, and
400 kinematic viscosity ν = 0.01, the system exhibits a periodic steady state
behavior characterized by an asymmetric vortex shedding pattern in the
402 cylinder wake, known as the Kármán vortex street [46].
403

404 To generate a high-resolution data set for this problem we have employed
405 the spectral/hp-element solver NekTar [47]. Specifically, the solution domain
406 is discretized in space by a tessellation consisting of 412 triangular elements,
407 and within each element the solution is approximated as a linear combination
408 of a tenth-order hierarchical, semi-orthogonal Jacobi polynomial expansion
409 [47]. We have assumed a uniform free stream velocity profile imposed at the
410 left boundary, a zero pressure outflow condition imposed at the right bound-
411 ary located 25 diameters downstream of the cylinder, and periodicity for the
412 top and bottom boundaries of the [−15, 25] × [−8, 8] domain. We integrate
413 equation (15) using a third-order stiffly stable scheme [47] until the system
414 reaches a periodic steady state, as depicted in figure 3(a). In what follows,
415 a small portion of the resulting data-set corresponding to this steady state

16
416 solution will be used for model training, while the remaining data will be
417 used to validate our predictions. For simplicity, we have chosen to confine
our sampling in a rectangular region downstream of the cylinder as shown in
419 figure 3(a).
420

421 Given scattered and potentially noisy data on the stream-wise u(t, x, y)
422 and transverse v(t, x, y) velocity components, our goal is to identify the un-
423 known parameters λ1 and λ2 , as well as to obtain a qualitatively accurate
424 reconstruction of the entire pressure field p(t, x, y) in the cylinder wake, which
425 by definition can only be identified up to a constant. To this end, we have
426 created a training data-set by randomly sub-sampling the full high-resolution
427 data-set. To highlight the ability of our method to learn from scattered and
428 scarce training data, we have chosen N = 5, 000, corresponding to a mere
429 1% of the total available data as illustrated in figure 3(b). Also plotted are
430 representative snapshots of the predicted velocity components u(t, x, y) and
431 v(t, x, y) after the model was trained. The neural network architecture used
432 here consists of 9 layers with 20 neurons in each layer.
433

434 A summary of our results for this example is presented in figure 4. We


435 observe that the physics-informed neural network is able to correctly identify
436 the unknown parameters λ1 and λ2 with very high accuracy even when the
437 training data was corrupted with noise. Specifically, for the case of noise-
438 free training data, the error in estimating λ1 and λ2 is 0.078%, and 4.67%,
439 respectively. The predictions remain robust even when the training data are
440 corrupted with 1% uncorrelated Gaussian noise, returning an error of 0.17%,
441 and 5.70%, for λ1 and λ2 , respectively.
442

443 A more intriguing result stems from the network’s ability to provide a
444 qualitatively accurate prediction of the entire pressure field p(t, x, y) in the
445 absence of any training data on the pressure itself. A visual comparison
446 against the exact pressure solution is presented in figure 4 for a represen-
447 tative pressure snapshot. Notice that the difference in magnitude between
448 the exact and the predicted pressure is justified by the very nature of the
449 incompressible Navier-Stokes system, as the pressure field is only identifiable
450 up to a constant. This result of inferring a continuous quantity of interest
451 from auxiliary measurements by leveraging the underlying physics is a great
452 example of the enhanced capabilities that physics-informed neural networks
453 have to offer, and highlights their potential in solving high-dimensional in-

Figure 3: Navier-Stokes equation: Top: Incompressible flow and dynamic vortex shedding past a circular cylinder at Re = 100. The spatio-temporal training data correspond to the depicted rectangular region in the cylinder wake. Bottom: Locations of training data-points for the stream-wise and transverse velocity components, u(t, x, y) and v(t, x, y), respectively.

454 verse problems.


455

456 Our approach so far assumes availability of scattered data throughout the
457 entire spatio-temporal domain. However, in many cases of practical interest,
458 one may only be able to observe the system at distinct time instants. In the
459 next section, we introduce a different approach that tackles the data-driven
460 discovery problem using only two data snapshots. We will see how, by lever-
461 aging the classical Runge-Kutta time-stepping schemes, one can construct
462 discrete time physics-informed neural networks that can retain high predic-
463 tive accuracy even when the temporal gap between the data snapshots is
464 very large.

Correct PDE:
u_t + (u u_x + v u_y) = −p_x + 0.01 (u_xx + u_yy)
v_t + (u v_x + v v_y) = −p_y + 0.01 (v_xx + v_yy)

Identified PDE (clean data):
u_t + 0.999 (u u_x + v u_y) = −p_x + 0.01047 (u_xx + u_yy)
v_t + 0.999 (u v_x + v v_y) = −p_y + 0.01047 (v_xx + v_yy)

Identified PDE (1% noise):
u_t + 0.998 (u u_x + v u_y) = −p_x + 0.01057 (u_xx + u_yy)
v_t + 0.998 (u v_x + v v_y) = −p_y + 0.01057 (v_xx + v_yy)

Figure 4: Navier-Stokes equation: Top: Predicted versus exact instantaneous pressure field p(t, x, y) at a representative time instant. By definition, the pressure can be recovered up to a constant, hence justifying the different magnitude between the two plots. This remarkable qualitative agreement highlights the ability of physics-informed neural networks to identify the entire pressure field, despite the fact that no data on the pressure are used during model training. Bottom: Correct partial differential equation along with the identified one obtained by learning λ_1, λ_2 and p(t, x, y).

4.2. Discrete Time Models

We begin by applying the general form of Runge-Kutta methods [45] with q stages to equation (1) and obtain

u^{n+c_i} = u^n − Δt Σ_{j=1}^{q} a_ij N[u^{n+c_j}; λ],  i = 1, ..., q,
u^{n+1} = u^n − Δt Σ_{j=1}^{q} b_j N[u^{n+c_j}; λ].    (20)

Here, u^{n+c_j}(x) = u(t^n + c_j Δt, x) is the hidden state of the system at time t^n + c_j Δt for j = 1, ..., q. This general form encapsulates both implicit and explicit time-stepping schemes, depending on the choice of the parameters {a_ij, b_j, c_j}. Equations (20) can be equivalently expressed as

u^n = u_i^n,  i = 1, ..., q,
u^{n+1} = u_i^{n+1},  i = 1, ..., q,    (21)

where

u_i^n := u^{n+c_i} + Δt Σ_{j=1}^{q} a_ij N[u^{n+c_j}; λ],  i = 1, ..., q,
u_i^{n+1} := u^{n+c_i} + Δt Σ_{j=1}^{q} (a_ij − b_j) N[u^{n+c_j}; λ],  i = 1, ..., q.    (22)

We proceed by placing a multi-output neural network prior on

[u^{n+c_1}(x), ..., u^{n+c_q}(x)].    (23)

This prior assumption along with equations (22) results in two physics-informed neural networks

[u_1^n(x), ..., u_q^n(x), u_{q+1}^n(x)],    (24)

and

[u_1^{n+1}(x), ..., u_q^{n+1}(x), u_{q+1}^{n+1}(x)].    (25)

Given noisy measurements at two distinct temporal snapshots {x^n, u^n} and {x^{n+1}, u^{n+1}} of the system at times t^n and t^{n+1}, respectively, the shared parameters of the neural networks (23), (24), and (25) along with the parameters λ of the differential operator can be trained by minimizing the sum of squared errors

SSE = SSE_n + SSE_{n+1},    (26)

where

SSE_n := Σ_{j=1}^{q} Σ_{i=1}^{N_n} |u_j^n(x^{n,i}) − u^{n,i}|²,

and

SSE_{n+1} := Σ_{j=1}^{q} Σ_{i=1}^{N_{n+1}} |u_j^{n+1}(x^{n+1,i}) − u^{n+1,i}|².

Here, x^n = {x^{n,i}}_{i=1}^{N_n}, u^n = {u^{n,i}}_{i=1}^{N_n}, x^{n+1} = {x^{n+1,i}}_{i=1}^{N_{n+1}}, and u^{n+1} = {u^{n+1,i}}_{i=1}^{N_{n+1}}.

486 4.2.1. Example (Korteweg–de Vries Equation)


487 Our final example aims to highlight the ability of the proposed frame-
488 work to handle governing partial differential equations involving higher or-
489 der derivatives. Here, we consider a mathematical model of waves on shallow
490 water surfaces; the Korteweg-de Vries (KdV) equation. This equation can
491 also be viewed as Burgers equation with an added dispersive term. The KdV

492 equation has several connections to physical problems. It describes the evolu-
493 tion of long one-dimensional waves in many physical settings. Such physical
494 settings include shallow-water waves with weakly non-linear restoring forces,
495 long internal waves in a density-stratified ocean, ion acoustic waves in a
496 plasma, and acoustic waves on a crystal lattice. Moreover, the KdV equa-
497 tion is the governing equation of the string in the Fermi-Pasta-Ulam problem
498 [48] in the continuum limit. The KdV equation reads as

u_t + λ_1 u u_x + λ_2 u_xxx = 0,    (27)

499 with (λ1 , λ2 ) being the unknown parameters. For the KdV equation, the
500 nonlinear operator in equations (22) is given by

N[u^{n+c_j}] = λ_1 u^{n+c_j} u_x^{n+c_j} + λ_2 u_xxx^{n+c_j},

501 and the shared parameters of the neural networks (23), (24), and (25) along
502 with the parameters λ = (λ1 , λ2 ) of the KdV equation can be learned by
503 minimizing the sum of squared errors (26).
504

505 To obtain a set of training and test data we simulated (27) using con-
506 ventional spectral methods. Specifically, starting from an initial condition
507 u(0, x) = cos(πx) and assuming periodic boundary conditions, we have inte-
508 grated equation (27) up to a final time t = 1.0 using the Chebfun package
509 [40] with a spectral Fourier discretization with 512 modes and a fourth-order
510 explicit Runge-Kutta temporal integrator with time-step Δt = 10−6 . Using
511 this data-set, we then extract two solution snapshots at time tn = 0.2 and
512 tn+1 = 0.8, and randomly sub-sample them using Nn = 199 and Nn+1 = 201
513 to generate a training data-set. We then use these data to train a discrete
514 time physics-informed neural network by minimizing the sum of squared error
515 loss of equation (26) using L-BFGS [35]. The network architecture used here
comprises 4 hidden layers, 50 neurons per layer, and an output layer predicting the solution at the q Runge-Kutta stages, i.e., u^{n+c_j}(x), j = 1, ..., q, where q is empirically chosen to yield a temporal error accumulation of the order of machine precision ε by setting⁴

q = 0.5 log ε / log(Δt),    (28)

where the time-step for this example is Δt = 0.6.

⁴ This is motivated by the theoretical error estimates for implicit Runge-Kutta schemes suggesting a truncation error of O(Δt^{2q}) [45].
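As a quick sanity check of equation (28) (not taken from the code accompanying this manuscript), the number of stages implied by double-precision machine precision and Δt = 0.6 can be evaluated directly:

import numpy as np

dt = 0.6
eps = np.finfo(np.float64).eps             # machine precision, ~2.22e-16
q_raw = 0.5 * np.log(eps) / np.log(dt)     # equation (28): ~35.3 for this example
q = int(np.ceil(q_raw))                    # rounding up keeps the error below eps
print(q)                                   # 36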

522 The results of this experiment are summarized in figure 5. In the top
523 panel, we present the exact solution u(t, x), along with the locations of the
524 two data snapshots used for training. A more detailed overview of the exact
525 solution and the training data is given in the middle panel. It is worth notic-
526 ing how the complex nonlinear dynamics of equation (27) causes dramatic
527 differences in the form of the solution between the two reported snapshots.
528 Despite these differences, and the large temporal gap between the two train-
529 ing snapshots, our method is able to correctly identify the unknown param-
530 eters regardless of whether the training data is corrupted with noise or not.
531 Specifically, for the case of noise-free training data, the error in estimating
532 λ1 and λ2 is 0.023%, and 0.006%, respectively, while the case with 1% noise
533 in the training data returns errors of 0.057%, and 0.017%, respectively.

534 5. Conclusions
535 We have introduced physics-informed neural networks, a new class of
universal function approximators that is capable of encoding any underlying physical laws that govern a given data-set and that can be described by partial differential equations. In this work, we design data-driven algorithms for
539 inferring solutions to general nonlinear partial differential equations, and con-
540 structing computationally efficient physics-informed surrogate models. The
541 resulting methods showcase a series of promising results for a diverse collec-
542 tion of problems in computational science, and open the path for endowing
543 deep learning with the powerful capacity of mathematical physics to model
544 the world around us. As deep learning technology is continuing to grow
545 rapidly both in terms of methodological and algorithmic developments, we
546 believe that this is a timely contribution that can benefit practitioners across
547 a wide range of scientific domains. Specific applications that can readily en-
548 joy these benefits include, but are not limited to, data-driven forecasting of
549 physical processes, model predictive control, multi-physics/multi-scale mod-
550 eling and simulation.
551

552 We must note however that the proposed methods should not be viewed
553 as replacements of classical numerical methods for solving partial differential
554 equations (e.g., finite elements, spectral methods, etc.). Such methods have
555 matured over the last 50 years and, in many cases, meet the robustness and

Correct PDE: u_t + u u_x + 0.0025 u_xxx = 0
Identified PDE (clean data): u_t + 1.000 u u_x + 0.0025002 u_xxx = 0
Identified PDE (1% noise): u_t + 0.999 u u_x + 0.0024996 u_xxx = 0

Figure 5: KdV equation: Top: Solution u(t, x) along with the temporal locations of the two training snapshots. Middle: Training data and exact solution corresponding to the two temporal snapshots depicted by the dashed vertical lines in the top panel. Bottom: Correct partial differential equation along with the identified one obtained by learning λ_1, λ_2.

556 computational efficiency standards required in practice. Our message here,


557 as advocated in Section 3.2, is that classical methods such as the Runge-
558 Kutta time-stepping schemes can coexist in harmony with deep neural net-
559 works, and offer invaluable intuition in constructing structured predictive

560 algorithms. Moreover, the implementation simplicity of the latter greatly
561 favors rapid development and testing of new ideas, potentially opening the
562 path for a new era in data-driven scientific computing.
563

564 Although a series of promising results was presented, the reader may per-
565 haps agree this work creates more questions than it answers. How deep/wide
566 should the neural network be? How much data is really needed? Why does
567 the algorithm converge to unique values for the parameters of the differen-
568 tial operators, i.e., why is the algorithm not suffering from local optima for
569 the parameters of the differential operator? Does the network suffer from
570 vanishing gradients for deeper architectures and higher order differential op-
571 erators? Could this be mitigated by using different activation functions?
572 Can we improve on initializing the network weights or normalizing the data?
573 Are the mean square error and the sum of squared errors the appropriate
574 loss functions? Why are these methods seemingly so robust to noise in the
575 data? How can we quantify the uncertainty associated with our predictions?
576 Throughout this work, we have attempted to answer some of these questions,
577 but we have observed that specific settings that yielded impressive results for
578 one equation could fail for another. Admittedly, more work is needed collec-
579 tively to set the foundations in this field.
580

581 In a broader context, and along the way of seeking answers to those
582 questions, we believe that this work advocates a fruitful synergy between
583 machine learning and classical computational physics that has the potential
584 to enrich both fields and lead to high-impact developments.

585 Acknowledgements
586 This work received support by the DARPA EQUiPS grant N66001-15-2-
587 4055 and the AFOSR grant FA9550-17-1-0013.

588 Appendix A. Data-driven solution of partial differential equations


589 This Appendix accompanies the main manuscript, and contains a series
590 of systematic studies that aim to demonstrate the performance of the pro-
591 posed algorithms for problems pertaining to data-driven solution of partial
592 differential equations. Throughout this document, we will use the Burgers’
593 equation as a canonical example.

594 Appendix A.1. Continuous Time Models
In one space dimension, the Burgers equation along with Dirichlet boundary conditions reads as

u_t + u u_x − (0.01/π) u_xx = 0,  x ∈ [−1, 1],  t ∈ [0, 1],    (A.1)
u(0, x) = − sin(πx),
u(t, −1) = u(t, 1) = 0.

597 Let us define f (t, x) to be given by

f := u_t + u u_x − (0.01/π) u_xx,

598 and proceed by approximating u(t, x) by a deep neural network. To highlight


the simplicity in implementing this idea we have included a Python code snippet using TensorFlow [49], currently one of the most popular and well-documented open source libraries for machine learning computations. To
602 this end, u(t, x) can be simply defined as
import tensorflow as tf

def u(t, x):
    u = neural_net(tf.concat([t, x], 1), weights, biases)
    return u
606 Correspondingly, the physics-informed neural network f (t, x) takes the form
import numpy as np

def f(t, x):
    u_ = u(t, x)  # renamed locally so it does not shadow the function u defined above
    u_t = tf.gradients(u_, t)[0]
    u_x = tf.gradients(u_, x)[0]
    u_xx = tf.gradients(u_x, x)[0]
    f = u_t + u_*u_x - (0.01/np.pi)*u_xx  # np.pi, since TensorFlow exposes no pi constant
    return f
The shared parameters between the neural networks u(t, x) and f(t, x) can be learned by minimizing the mean squared error loss

MSE = MSE_u + MSE_f,    (A.2)

where

MSE_u = (1/N_u) Σ_{i=1}^{N_u} |u(t_u^i, x_u^i) − u^i|²,

and

MSE_f = (1/N_f) Σ_{i=1}^{N_f} |f(t_f^i, x_f^i)|².

Here, {t_u^i, x_u^i, u^i}_{i=1}^{N_u} denote the initial and boundary training data on u(t, x) and {t_f^i, x_f^i}_{i=1}^{N_f} specify the collocation points for f(t, x). The loss MSE_u corresponds to the initial and boundary data, while MSE_f enforces the structure imposed by equation (A.1) at a finite set of collocation points. Although similar ideas for constraining neural networks using physical laws have been explored in previous studies [15, 16], here we revisit them using modern computational tools and apply them to more challenging dynamic problems described by time-dependent nonlinear partial differential equations.
625 described by time-dependent nonlinear partial differential equations.
626

627 The Burgers equation is often considered as a prototype example of a


628 hyperbolic conservation law (as ν → 0). Notice that if we want to “fabri-
629 cate” an “exact” solution to this equation we would select a solution u(t, x)
630 (e.g., e−t sin(πx)) and obtain the corresponding right hand side f (t, x) by
631 differentiation. The resulting u(t, x) and f (t, x) are “guaranteed” to satisfy
632 the Burgers equation and conserve all associated invariances by construction.
633 In our work, we replace u(t, x) by a neural network u(t, x; W, b) and obtain
634 a physics-informed neural network f (t, x; W, b) by automatic differentiation.
635 Consequently, the resulting pair u(t, x; W, b) and f (t, x; W, b) must satisfy
636 the Burgers equation regardless of the choice of the weights W and bias b
637 parameters. Hence, at this “prior” level, i.e. before we train the networks
on a given set of data, our model should exactly preserve the continuity
639 and momentum equations by construction. During training, given a data-set
640 ti , xi , ui and tj , xj , fj , we then try to find the “correct” parameters W ∗ and b∗
641 such that we get as good a fit as possible to both the observed data and the
642 differential equation residual. During this process the residual, albeit small,
643 will not be exactly zero, and therefore our approximation will conserve mass
644 and momentum within the accuracy of the residual loss. Similar behavior is
645 observed in classical Galerkin finite element methods, while the only numer-
646 ical methods that are known to have exact conservation properties in this
647 setting are discontinuous Galerkin and finite volumes.
648

649 In all benchmarks considered in this work, the total number of training
650 data Nu is relatively small (a few hundred up to a few thousand points), and
we chose to optimize all loss functions using L-BFGS, a quasi-Newton, full-

652 batch gradient-based optimization algorithm [35]. For larger data-sets a more
653 computationally efficient mini-batch setting can be readily employed using
654 stochastic gradient descent and its modern variants [36, 37]. Despite the
655 fact that there is no theoretical guarantee that this procedure converges to
656 a global minimum, our empirical evidence indicates that, if the given partial
657 differential equation is well-posed and its solution is unique, our method is
658 capable of achieving good prediction accuracy given a sufficiently expressive
659 neural network architecture and a sufficient number of collocation points Nf .
660 This general observation deeply relates to the resulting optimization land-
661 scape induced by the mean square error loss of equation 4, and defines an
662 open question for research that is in sync with recent theoretical develop-
663 ments in deep learning [38, 39]. Here, we will test the robustness of the
664 proposed methodology using a series of systematic sensitivity studies that
665 accompany the numerical results presented in the following.
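In practice, the full-batch L-BFGS updates can be driven through SciPy. A minimal sketch, assuming TensorFlow 1.x with its contrib module and a feed dictionary train_dict supplying the placeholders, might look as follows; the option values shown are illustrative rather than prescriptive.

# Sketch: full-batch L-BFGS optimization via SciPy, exposed in TensorFlow 1.x contrib.
optimizer = tf.contrib.opt.ScipyOptimizerInterface(
    loss, method='L-BFGS-B',
    options={'maxiter': 50000, 'ftol': 1.0e-10})

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    optimizer.minimize(sess, feed_dict=train_dict)   # runs until convergence or maxiter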
666

667 Figure A.6 summarizes our results for the data-driven solution of the
668 Burgers equation. Specifically, given a set of Nu = 100 randomly distributed
669 initial and boundary data, we learn the latent solution u(t, x) by training all
670 3021 parameters of a 9-layer deep neural network using the mean squared
671 error loss of (A.2). Each hidden layer contained 20 neurons and a hyperbolic
672 tangent activation function. The top panel of Figure A.6 shows the predicted
673 spatio-temporal solution u(t, x), along with the locations of the initial and
674 boundary training data. We must underline that, unlike any classical nu-
675 merical method for solving partial differential equations, this prediction is
676 obtained without any sort of discretization of the spatio-temporal domain.
677 The exact solution for this problem is analytically available [13], and the
678 resulting prediction error is measured at 6.7 · 10−4 in the relative L2 -norm.
679 Note that this error is about two orders of magnitude lower than the one
680 reported in our previous work on data-driven solution of partial differential
equations using Gaussian processes [8]. A more detailed assessment of the
682 predicted solution is presented in the bottom panel of figure A.6. In partic-
683 ular, we present a comparison between the exact and the predicted solutions
684 at different time instants t = 0.25, 0.50, 0.75. Using only a handful of ini-
685 tial and boundary data, the physics-informed neural network can accurately
686 capture the intricate nonlinear behavior of the Burgers’ equation that leads
687 to the development of a sharp internal layer around t = 0.4. The latter is
688 notoriously hard to accurately resolve with classical numerical methods and
689 requires a laborious spatio-temporal discretization of equation (A.1).
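The 3,021 trainable parameters quoted above are consistent with a [2, 20, ..., 20, 1] architecture, i.e. eight hidden layers of 20 neurons followed by a linear output layer. A sketch of how such per-layer weight and bias lists could be created is given below; the Xavier-style initialization is our assumption, not a detail stated in this excerpt (NumPy and TensorFlow imports are assumed as before).

# Sketch: weights/biases for a 9-layer network with 8 hidden layers of 20 neurons.
layers = [2] + 8 * [20] + [1]
weights, biases = [], []
for l in range(len(layers) - 1):
    in_dim, out_dim = layers[l], layers[l + 1]
    std = np.sqrt(2.0 / (in_dim + out_dim))                    # Xavier-style scale (assumed)
    W = tf.Variable(std * np.random.randn(in_dim, out_dim), dtype=tf.float32)
    b = tf.Variable(tf.zeros([1, out_dim], dtype=tf.float32))
    weights.append(W)
    biases.append(b)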

[Figure A.6 appears here; axis data omitted.]

Figure A.6: Burgers' equation: Top: Predicted solution u(t, x) along with the initial and boundary training data. In addition we are using 10,000 collocation points generated using a Latin Hypercube Sampling strategy. Bottom: Comparison of the predicted and exact solutions corresponding to the three temporal snapshots depicted by the white vertical lines in the top panel. The relative L2 error for this case is 6.7 · 10⁻⁴. Model training took approximately 60 seconds on a single NVIDIA Titan X GPU card.
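The Latin Hypercube Sampling mentioned in the caption can be reproduced, for instance, with the pyDOE package; the following sketch, with variable names of our choosing, draws N_f = 10,000 collocation points over the domain [0, 1] × [−1, 1].

# Sketch: collocation points via Latin Hypercube Sampling (pyDOE).
import numpy as np
from pyDOE import lhs

lb = np.array([0.0, -1.0])                      # lower bounds for (t, x)
ub = np.array([1.0, 1.0])                       # upper bounds for (t, x)
X_f = lb + (ub - lb) * lhs(2, samples=10000)    # rows are (t, x) collocation points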

690

691 To further analyze the performance of our method, we have performed


692 the following systematic studies to quantify its predictive accuracy for differ-
693 ent number of training and collocation points, as well as for different neural
694 network architectures. In table A.1 we report the resulting relative L2 error
695 for different number of initial and boundary training data Nu and different
696 number of collocation points Nf , while keeping the 9-layer network archi-
697 tecture fixed. The general trend shows increased prediction accuracy as the
698 total number of training data Nu is increased, given a sufficient number of
699 collocation points Nf . This observation highlights a key strength of physics-
700 informed neural networks: by encoding the structure of the underlying phys-

  N_u \ N_f      2000       4000       6000       7000       8000       10000
   20          2.9e-01    4.4e-01    8.9e-01    1.2e+00    9.9e-02    4.2e-02
   40          6.5e-02    1.1e-02    5.0e-01    9.6e-03    4.6e-01    7.5e-02
   60          3.6e-01    1.2e-02    1.7e-01    5.9e-03    1.9e-03    8.2e-03
   80          5.5e-03    1.0e-03    3.2e-03    7.8e-03    4.9e-02    4.5e-03
  100          6.6e-02    2.7e-01    7.2e-03    6.8e-04    2.2e-03    6.7e-04
  200          1.5e-01    2.3e-03    8.2e-04    8.9e-04    6.1e-04    4.9e-04

Table A.1: Burgers' equation: Relative L2 error between the predicted and the exact solution u(t, x) for different number of initial and boundary training data Nu, and different number of collocation points Nf. Here, the network architecture is fixed to 9 layers with 20 neurons per hidden layer.

  Layers \ Neurons      10         20         40
   2                 7.4e-02    5.3e-02    1.0e-01
   4                 3.0e-03    9.4e-04    6.4e-04
   6                 9.6e-03    1.3e-03    6.1e-04
   8                 2.5e-03    9.6e-04    5.6e-04

Table A.2: Burgers' equation: Relative L2 error between the predicted and the exact solution u(t, x) for different number of hidden layers and different number of neurons per layer. Here, the total number of training and collocation points is fixed to Nu = 100 and Nf = 10,000, respectively.

701 ical law through the collocation points Nf , one can obtain a more accurate
702 and data-efficient learning algorithm.5 Finally, table A.2 shows the resulting
relative L2 error for different number of hidden layers, and different number of
704 neurons per layer, while the total number of training and collocation points
705 is kept fixed to Nu = 100 and Nf = 10, 000, respectively. As expected, we
706 observe that as the number of layers and neurons is increased (hence the
707 capacity of the neural network to approximate more complex functions), the
708 predictive accuracy is increased.

⁵ Note that the case Nf = 0 corresponds to a standard neural network model, i.e., a neural network that does not take into account the underlying governing equation.
709 Appendix A.2. Discrete Time Models
Let us apply the general form of Runge-Kutta methods with q stages [45] to a general equation of the form

u_t + N[u] = 0,    x ∈ Ω, t ∈ [0, T],    (A.3)

and obtain

u^{n+c_i} = u^n − Δt Σ_{j=1}^{q} a_{ij} N[u^{n+c_j}],   i = 1, ..., q,
u^{n+1} = u^n − Δt Σ_{j=1}^{q} b_j N[u^{n+c_j}].    (A.4)

Here, u^{n+c_j}(x) = u(t^n + c_j Δt, x) for j = 1, ..., q. This general form encapsulates both implicit and explicit time-stepping schemes, depending on the choice of the parameters {a_ij, b_j, c_j}. Equations (7) can be equivalently expressed as

u^n = u^n_i,   i = 1, ..., q,
u^n = u^n_{q+1},    (A.5)

where

u^n_i := u^{n+c_i} + Δt Σ_{j=1}^{q} a_{ij} N[u^{n+c_j}],   i = 1, ..., q,
u^n_{q+1} := u^{n+1} + Δt Σ_{j=1}^{q} b_j N[u^{n+c_j}].    (A.6)
We proceed by placing a multi-output neural network prior on

[u^{n+c_1}(x), ..., u^{n+c_q}(x), u^{n+1}(x)].    (A.7)

This prior assumption, along with equations (A.6), results in a physics-informed neural network that takes x as an input and outputs

[u^n_1(x), ..., u^n_q(x), u^n_{q+1}(x)].    (A.8)
721 To highlight the key features of the discrete time representation we revisit
722 the problem of data-driven solution of the Burgers’ equation. For this case,
723 the nonlinear operator in equation (A.6) is given by
N[u^{n+c_j}] = u^{n+c_j} u^{n+c_j}_x − (0.01/π) u^{n+c_j}_{xx},

724 and the shared parameters of the neural networks (A.7) and (A.8) can be
725 learned by minimizing the sum of squared errors
SSE = SSEn + SSEb , (A.9)

where

SSE_n = Σ_{j=1}^{q+1} Σ_{i=1}^{N_n} |u^n_j(x^{n,i}) − u^{n,i}|²,

and

SSE_b = Σ_{i=1}^{q} ( |u^{n+c_i}(−1)|² + |u^{n+c_i}(1)|² ) + |u^{n+1}(−1)|² + |u^{n+1}(1)|².

Here, {x^{n,i}, u^{n,i}}_{i=1}^{N_n} corresponds to the data at time t^n. The Runge-Kutta scheme now allows us to infer the latent solution u(t, x) in a sequential fashion. Starting from initial data {x^{n,i}, u^{n,i}}_{i=1}^{N_n} at time t^n and data at the domain boundaries x = −1 and x = 1, we can use the aforementioned loss


732 function (A.9) to train the networks of (A.7), (A.8), and predict the solu-
733 tion at time tn+1 . A Runge-Kutta time-stepping scheme would then use this
734 prediction as initial data for the next step and proceed to train again and
735 predict u(tn+2 , x), u(tn+3 , x), etc., one step at a time.
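To make the construction above concrete, the sketch below assembles the q + 1 outputs of (A.8) from the multi-output network of (A.7). It is a minimal illustration in the TensorFlow setting of Appendix A.1 and assumes that IRK_alpha (the q × q matrix of a_ij) and IRK_beta (the vector of b_j) are available as NumPy arrays; none of the names below are taken from the original code.

import numpy as np
import tensorflow as tf

# Sketch: Runge-Kutta residuals (A.6) for the Burgers' equation.
def rk_residuals(x, U, IRK_alpha, IRK_beta, dt, nu=0.01 / np.pi):
    q = IRK_alpha.shape[0]
    A_T = tf.constant(IRK_alpha.T, dtype=U.dtype)          # (q, q)
    b = tf.constant(IRK_beta.reshape(q, 1), dtype=U.dtype) # (q, 1)
    N_cols = []
    for j in range(q):                                     # nonlinearity at each stage
        u_j = U[:, j:j+1]                                  # u^{n+c_j}
        u_x = tf.gradients(u_j, x)[0]
        u_xx = tf.gradients(u_x, x)[0]
        N_cols.append(u_j * u_x - nu * u_xx)               # N[u^{n+c_j}]
    N_stack = tf.concat(N_cols, axis=1)                    # shape (N, q)
    U_n = U[:, :q] + dt * tf.matmul(N_stack, A_T)          # u^n_i, i = 1..q
    U_n_last = U[:, q:q+1] + dt * tf.matmul(N_stack, b)    # u^n_{q+1}
    return tf.concat([U_n, U_n_last], axis=1)              # the q+1 outputs of (A.8)

These q + 1 quantities are then compared against the shared snapshot data in SSE_n, while SSE_b is built from the network outputs evaluated at x = −1 and x = 1.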
736

737 The result of applying this process to the Burgers’ equation is presented
738 in figure A.7. For illustration purposes, we start with a set of Nn = 250 initial
739 data at t = 0.1, and employ a physics-informed neural network induced by an
740 implicit Runge-Kutta scheme with 500 stages to predict the solution at time
741 t = 0.9 in a single step. The theoretical error estimates for this scheme predict
a temporal error accumulation of O(Δt^{2q}) [45], which in our case translates into an error far below machine precision, i.e., Δt^{2q} = 0.8^{1000} ≈ 10⁻⁹⁷. To
744 our knowledge, this is the first time that an implicit Runge-Kutta scheme
of such high order has ever been used. Remarkably, starting from smooth
746 initial data at t = 0.1 we can predict the nearly discontinuous solution at
747 t = 0.9 in a single time-step with a relative L2 error of 8.2·10−4 . This error is
two orders of magnitude lower than the one reported in [8], and it is entirely
749 attributed to the neural network’s capacity to approximate u(t, x), as well as
750 to the degree that the sum of squared errors loss allows interpolation of the
751 training data. The network architecture used here consists of 4 layers with
752 50 neurons in each hidden layer.
753

754 A detailed systematic study to quantify the effect of different network


755 architectures is presented in table A.3. By keeping the number of Runge-
756 Kutta stages fixed to q = 500 and the time-step size to Δt = 0.8, we have

[Figure A.7 appears here; axis data omitted.]

Figure A.7: Burgers' equation: Top: Solution u(t, x) along with the location of the initial training snapshot at t = 0.1 and the final prediction snapshot at t = 0.9. Bottom: Initial training data and final prediction at the snapshots depicted by the white vertical lines in the top panel. The relative L2 error for this case is 8.2 · 10⁻⁴.

757 varied the number of hidden layers and the number of neurons per layer, and
758 monitored the resulting relative L2 error for the predicted solution at time
759 t = 0.9. Evidently, as the neural network capacity is increased the predictive
760 accuracy is enhanced.
761

762 The key parameters controlling the performance of our discrete time al-
763 gorithm are the total number of Runge-Kutta stages q and the time-step size
764 Δt. In table A.4 we summarize the results of an extensive systematic study
765 where we fix the network architecture to 4 hidden layers with 50 neurons

  Layers \ Neurons      10         25         50
   1                 4.1e-02    4.1e-02    1.5e-01
   2                 2.7e-03    5.0e-03    2.4e-03
   3                 3.6e-03    1.9e-03    9.5e-04

Table A.3: Burgers' equation: Relative final prediction error measured in the L2 norm for different number of hidden layers and neurons in each layer. Here, the number of Runge-Kutta stages is fixed to 500 and the time-step size to Δt = 0.8.

766 per layer, and vary the number of Runge-Kutta stages q and the time-step
767 size Δt. Specifically, we see how cases with low numbers of stages fail to
768 yield accurate results when the time-step size is large. For instance, the case
769 q = 1 corresponding to the classical trapezoidal rule, and the case q = 2
770 corresponding to the 4th -order Gauss-Legendre method, cannot retain their
771 predictive accuracy for time-steps larger than 0.2, thus mandating a solu-
772 tion strategy with multiple time-steps of small size. On the other hand, the
773 ability to push the number of Runge-Kutta stages to 32 and even higher
774 allows us to take very large time steps, and effectively resolve the solution
775 in a single step without sacrificing the accuracy of our predictions. More-
over, numerical stability is not sacrificed either, as the implicit Gauss-Legendre methods are the only family of time-stepping schemes that remain A-stable regardless of
778 their order, thus making them ideal for stiff problems [45]. These properties
779 are unprecedented for an algorithm of such implementation simplicity, and
780 illustrate one of the key highlights of our discrete time approach.
781 Finally, in table A.5 we provide a systematic study to quantify the accu-
782 racy of the predicted solution as we vary the spatial resolution of the input
783 data. As expected, increasing the total number of training data results in
784 enhanced prediction accuracy.

785 Appendix B. Data-driven discovery of partial differential equations


786 This Appendix accompanies the main manuscript, and contains a series
787 of systematic studies that aim to demonstrate the performance of the pro-
788 posed algorithms for problems pertaining to data-driven discovery of partial
789 differential equations. Throughout this document, we will use the Burgers’
790 equation as a canonical example.

  q \ Δt         0.2        0.4        0.6        0.8
    1         3.5e-02    1.1e-01    2.3e-01    3.8e-01
    2         5.4e-03    5.1e-02    9.3e-02    2.2e-01
    4         1.2e-03    1.5e-02    3.6e-02    5.4e-02
    8         6.7e-04    1.8e-03    8.7e-03    5.8e-02
   16         5.1e-04    7.6e-02    8.4e-04    1.1e-03
   32         7.4e-04    5.2e-04    4.2e-04    7.0e-04
   64         4.5e-04    4.8e-04    1.2e-03    7.8e-04
  100         5.1e-04    5.7e-04    1.8e-02    1.2e-03
  500         4.1e-04    3.8e-04    4.2e-04    8.2e-04

Table A.4: Burgers' equation: Relative final prediction error measured in the L2 norm for different number of Runge-Kutta stages q and time-step sizes Δt. Here, the network architecture is fixed to 4 hidden layers with 50 neurons in each layer.

  N        250        200        150        100        50         10
  Error    4.02e-4    2.93e-3    9.39e-3    5.54e-2    1.77e-2    7.58e-1

Table A.5: Burgers' equation: Relative L2 error between the predicted and the exact solution u(t, x) for different number of training data Nn. Here, we have fixed q = 500, and used a neural network architecture with 3 hidden layers and 50 neurons per hidden layer.

791 Appendix B.1. Continuous Time Models


792 As a first example, let us consider the Burgers’ equation. This equation
793 arises in various areas of applied mathematics, including fluid mechanics,
794 nonlinear acoustics, gas dynamics, and traffic flow [13]. It is a fundamen-
795 tal partial differential equation and can be derived from the Navier-Stokes
796 equations for the velocity field by dropping the pressure gradient term. For
797 small values of the viscosity parameters, Burgers’ equation can lead to shock
798 formation that is notoriously hard to resolve by classical numerical methods.
799 In one space dimension the equation reads as
u_t + λ1 u u_x − λ2 u_xx = 0.    (B.1)

Let us define f(t, x) to be given by

f := u_t + λ1 u u_x − λ2 u_xx,    (B.2)
801 and proceed by approximating u(t, x) by a deep neural network. This will
802 result in the physics-informed neural network f (t, x). The shared parameters

803 of the neural networks u(t, x) and f (t, x) along with the parameters λ =
804 (λ1 , λ2 ) of the differential operator can be learned by minimizing the mean
805 squared error loss
M SE = M SEu + M SEf , (B.3)
where

MSE_u = (1/N) Σ_{i=1}^{N} |u(t_u^i, x_u^i) − u^i|²,

and

MSE_f = (1/N) Σ_{i=1}^{N} |f(t_u^i, x_u^i)|².

Here, {t_u^i, x_u^i, u^i}_{i=1}^{N} denote the training data on u(t, x). The loss MSE_u corresponds to the training data on u(t, x) while MSE_f enforces the structure imposed by equation (B.1) at a finite set of collocation points, whose number and locations are taken to be the same as the training data.
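In the TensorFlow setting of Appendix A.1, the only modification needed for the identification problem is to expose λ1 and λ2 as trainable variables inside the residual network. The sketch below illustrates this; the initial values are chosen arbitrarily for the sake of the example.

# Sketch: unknown PDE coefficients as extra trainable variables (identification setting).
lambda_1 = tf.Variable(0.0, dtype=tf.float32)
lambda_2 = tf.Variable(0.01, dtype=tf.float32)

def f(t, x):
    u_ = u(t, x)
    u_t = tf.gradients(u_, t)[0]
    u_x = tf.gradients(u_, x)[0]
    u_xx = tf.gradients(u_x, x)[0]
    return u_t + lambda_1 * u_ * u_x - lambda_2 * u_xx

Minimizing (B.3) then updates the network weights and (λ1, λ2) jointly.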
812

813 To illustrate the effectiveness of our approach, we have created a train-


814 ing data-set by randomly generating N = 2, 000 points across the entire
815 spatio-temporal domain from the exact solution corresponding to λ1 = 1.0
816 and λ2 = 0.01/π. The locations of the training points are illustrated in the
817 top panel of figure B.8. This data-set is then used to train a 9-layer deep
818 neural network with 20 neurons per hidden layer by minimizing the mean
819 square error loss of (B.3) using the L-BFGS optimizer [35]. Upon training,
820 the network is calibrated to predict the entire solution u(t, x), as well as the
821 unknown parameters λ = (λ1 , λ2 ) that define the underlying dynamics. A
822 visual assessment of the predictive accuracy of the physics-informed neural
823 network is given in the middle and bottom panels of figure B.8. The network
824 is able to identify the underlying partial differential equation with remark-
825 able accuracy, even in the case where the scattered training data is corrupted
826 with 1% uncorrelated noise.
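The noisy case can be emulated, for example, by perturbing the clean training values as in the following sketch; scaling the perturbation by the standard deviation of the field is our convention here, not a detail stated in the text.

# Sketch: corrupting the scattered training data with 1% uncorrelated Gaussian noise.
noise = 0.01
u_train = u_exact + noise * np.std(u_exact) * np.random.randn(*u_exact.shape)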
827

828 To further scrutinize the performance of our algorithm, we have per-


829 formed a systematic study with respect to the total number of training data,
830 the noise corruption levels, and the neural network architecture. The results
831 are summarized in tables B.6 and B.7. The key observation here is that the
832 proposed methodology appears to be very robust with respect to noise levels
833 in the data, and yields a reasonable identification accuracy even for noise

[Figure B.8 appears here; axis data omitted.]

Correct PDE:                 u_t + u u_x − 0.0031831 u_xx = 0
Identified PDE (clean data): u_t + 0.99915 u u_x − 0.0031794 u_xx = 0
Identified PDE (1% noise):   u_t + 1.00042 u u_x − 0.0032098 u_xx = 0

Figure B.8: Burgers' equation: Top: Predicted solution u(t, x) along with the training data. Middle: Comparison of the predicted and exact solutions corresponding to the three temporal snapshots depicted by the dashed vertical lines in the top panel. Bottom: Correct partial differential equation along with the identified one obtained by learning λ1 and λ2.

834 corruption up to 10%. This enhanced robustness seems to greatly outper-


835 form competing approaches using Gaussian process regression as previously
836 reported in [9] as well as approaches relying on sparse regression that require
837 relatively clean data for accurately computing numerical gradients [50]. We
also observe some variability and non-monotonic trends in tables B.6 and B.7
839 as the network architecture and total number of training points are changed.
840 This variability could be potentially attributed to different factors pertaining
841 to the equation itself as well as the particular neural network setup, and gives
842 rise to a series of questions that require further investigation, as discussed in
843 the concluding remarks section of this paper.

                      % error in λ1                        % error in λ2
  N_u \ noise     0%      1%      5%      10%         0%      1%      5%      10%
   500          0.131   0.518   0.118   1.319      13.885   0.483   1.708    4.058
  1000          0.186   0.533   0.157   1.869       3.719   8.262   3.481   14.544
  1500          0.432   0.033   0.706   0.725       3.093   1.423   0.502    3.156
  2000          0.096   0.039   0.190   0.101       0.469   0.008   6.216    6.391

Table B.6: Burgers' equation: Percentage error in the identified parameters λ1 and λ2 for different number of training data N corrupted by different noise levels. Here, the neural network architecture is kept fixed to 9 layers and 20 neurons per layer.

                        % error in λ1                   % error in λ2
  Layers \ Neurons     10       20       40           10       20       40
   2                11.696    2.837    1.679      103.919   67.055   49.186
   4                 0.332    0.109    0.428        4.721    1.234    6.170
   6                 0.668    0.629    0.118        3.144    3.123    1.158
   8                 0.414    0.141    0.266        8.459    1.902    1.552

Table B.7: Burgers' equation: Percentage error in the identified parameters λ1 and λ2 for different number of hidden layers and neurons per layer. Here, the training data is considered to be noise-free and fixed to N = 2,000.

844 Appendix B.2. Discrete Time Models


Our starting point here is similar to that described in section 3.2. Now equations (7) can be equivalently expressed as

u^n = u^n_i,   i = 1, ..., q,
u^{n+1} = u^{n+1}_i,   i = 1, ..., q,    (B.4)

where

u^n_i := u^{n+c_i} + Δt Σ_{j=1}^{q} a_{ij} N[u^{n+c_j}; λ],   i = 1, ..., q,
u^{n+1}_i := u^{n+c_i} + Δt Σ_{j=1}^{q} (a_{ij} − b_j) N[u^{n+c_j}; λ],   i = 1, ..., q.    (B.5)

We proceed by placing a multi-output neural network prior on

[u^{n+c_1}(x), ..., u^{n+c_q}(x)].    (B.6)

This prior assumption along with equations (22) results in two physics-informed neural networks

[u^n_1(x), ..., u^n_q(x), u^n_{q+1}(x)],    (B.7)

and

[u^{n+1}_1(x), ..., u^{n+1}_q(x), u^{n+1}_{q+1}(x)].    (B.8)
852 Given noisy measurements at two distinct temporal snapshots {xn , un } and
853 {xn+1 , un+1 } of the system at times tn and tn+1 , respectively, the shared
854 parameters of the neural networks (B.6), (B.7), and (B.8) along with the
855 parameters λ of the differential operator can be trained by minimizing the
856 sum of squared errors

SSE = SSEn + SSEn+1 , (B.9)

where

SSE_n := Σ_{j=1}^{q} Σ_{i=1}^{N_n} |u^n_j(x^{n,i}) − u^{n,i}|²,

and

SSE_{n+1} := Σ_{j=1}^{q} Σ_{i=1}^{N_{n+1}} |u^{n+1}_j(x^{n+1,i}) − u^{n+1,i}|².

Here, x^n = {x^{n,i}}_{i=1}^{N_n}, u^n = {u^{n,i}}_{i=1}^{N_n}, x^{n+1} = {x^{n+1,i}}_{i=1}^{N_{n+1}}, and u^{n+1} = {u^{n+1,i}}_{i=1}^{N_{n+1}}.
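A minimal sketch of assembling the loss (B.9), in the same TensorFlow setting as before and with tensor names of our choosing: U0_pred and U1_pred denote the stage predictions of the two physics-informed networks (B.7) and (B.8) evaluated at the snapshot locations, and u_n, u_n1 hold the corresponding measurements.

# Sketch: sum of squared errors (B.9); broadcasting compares every predicted
# column against the same measured snapshot, mirroring the double sums above.
sse_n = tf.reduce_sum(tf.square(U0_pred - u_n))
sse_n1 = tf.reduce_sum(tf.square(U1_pred - u_n1))
sse = sse_n + sse_n1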

861 Appendix B.3. Example (Burgers’ Equation)


862 Let us illustrate the key features of this method through the lens of the
863 Burgers’ equation. Recall the equation’s form

ut + λ1 uux − λ2 uxx = 0, (B.10)

864 and notice that the nonlinear spatial operator in equation (B.5) is given by

N[u^{n+c_j}] = λ1 u^{n+c_j} u^{n+c_j}_x − λ2 u^{n+c_j}_{xx}.

865 Given merely two training data snapshots, the shared parameters of the neu-
866 ral networks (B.6), (B.7), and (B.8) along with the parameters λ = (λ1 , λ2 )
867 of the Burgers’ equation can be learned by minimizing the sum of squared
868 errors in equation (B.9). Here, we have created a training data-set compris-
ing Nn = 199 and Nn+1 = 201 spatial points by randomly sampling the

870 exact solution at time instants tn = 0.1 and tn+1 = 0.9, respectively. The
871 training data are shown in the top and middle panel of figure B.9. The neural
872 network architecture used here consists of 4 hidden layers with 50 neurons
873 each, while the number of Runge-Kutta stages is empirically chosen to yield
a temporal error accumulation of the order of machine precision ε by setting⁶

q = 0.5 log ε / log(Δt),    (B.11)

875 where the time-step for this example is Δt = 0.8. The bottom panel of fig-
876 ure B.9 summarizes the identified parameters λ = (λ1 , λ2 ) for the cases of
877 noise-free data, as well as noisy data with 1% of Gaussian uncorrelated noise
878 corruption. For both cases, the proposed algorithm is able to learn the cor-
879 rect parameter values λ1 = 1.0 and λ2 = 0.01/π with remarkable accuracy,
880 despite the fact that the two data snapshots used for training are very far
881 apart, and potentially describe different regimes of the underlying dynamics.
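For reference, a quick evaluation of (B.11): with double-precision machine accuracy ε ≈ 2.2 · 10⁻¹⁶ and Δt = 0.8, one obtains q = 0.5 log(ε)/log(Δt) ≈ 0.5 · (−36.0)/(−0.22) ≈ 81, so on the order of eighty Runge-Kutta stages suffice to push the temporal truncation error below machine precision for this single large step.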
882

883 A sensitivity analysis is performed to quantify the accuracy of our predic-


884 tions with respect to the gap between the training snapshots Δt, the noise
885 levels in the training data, and the physics-informed neural network architec-
886 ture. As shown in table B.8, the proposed algorithm is quite robust to both
887 Δt and the noise corruption levels, and it returns reasonable estimates for
888 the unknown parameters. This robustness is mainly attributed to the flexi-
889 bility of the underlying implicit Runge-Kutta scheme to admit an arbitrarily
890 high number of stages, allowing the data snapshots to be very far apart in
891 time, while not compromising the accuracy with which the nonlinear dynam-
892 ics of equation (B.10) are resolved. This is the key highlight of our discrete
893 time formulation for identification problems, setting it apart from competing
894 approaches [9, 50]. Lastly, table B.9 presents the percentage error in the
895 identified parameters, demonstrating the robustness of our estimates with
896 respect to the underlying neural network architecture. Despite the overall
897 positive results, the variability observed in tables B.8 and B.9 is still largely
898 unexplained and naturally motivates a series of questions provided in the
899 concluding remarks section of this paper.

⁶ This is motivated by the theoretical error estimates for implicit Runge-Kutta schemes suggesting a truncation error of O(Δt^{2q}) [45].

[Figure B.9 appears here; axis data omitted.]

Correct PDE:                 u_t + u u_x − 0.003183 u_xx = 0
Identified PDE (clean data): u_t + 1.000 u u_x − 0.003193 u_xx = 0
Identified PDE (1% noise):   u_t + 1.000 u u_x − 0.003276 u_xx = 0

Figure B.9: Burgers' equation: Top: Solution u(t, x) along with the temporal locations of the two training snapshots. Middle: Training data (199 points at t = 0.10 and 201 points at t = 0.90) and exact solution corresponding to the two temporal snapshots depicted by the dashed vertical lines in the top panel. Bottom: Correct partial differential equation along with the identified one obtained by learning λ1, λ2.

References

[1] A. Krizhevsky, I. Sutskever, G. E. Hinton, Imagenet classification with deep convolutional neural networks, in: Advances in neural information processing systems, pp. 1097–1105.

                      % error in λ1                        % error in λ2
  Δt \ noise      0%      1%      5%      10%         0%      1%      5%       10%
  0.2           0.002   0.435   6.073   3.273       0.151   4.982   59.314   83.969
  0.4           0.001   0.119   1.679   2.985       0.088   2.816    8.396    8.377
  0.6           0.002   0.064   2.096   1.383       0.090   0.068    3.493   24.321
  0.8           0.010   0.221   0.097   1.233       1.918   3.215   13.479    1.621

Table B.8: Burgers' equation: Percentage error in the identified parameters λ1 and λ2 for different gap size Δt between two different snapshots and for different noise levels.

                        % error in λ1                   % error in λ2
  Layers \ Neurons     10       25       50           10        25        50
   1                 1.868    4.868    1.960      180.373   237.463   123.539
   2                 0.443    0.037    0.015       29.474     2.676     1.561
   3                 0.123    0.012    0.004        7.991     1.906     0.586
   4                 0.012    0.020    0.011        1.125     4.448     2.014

Table B.9: Burgers' equation: Percentage error in the identified parameters λ1 and λ2 for different number of hidden layers and neurons in each layer.

[2] B. M. Lake, R. Salakhutdinov, J. B. Tenenbaum, Human-level concept learning through probabilistic program induction, Science 350 (2015) 1332–1338.

[3] B. Alipanahi, A. Delong, M. T. Weirauch, B. J. Frey, Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning, Nature Biotechnology 33 (2015) 831–838.

[4] M. Raissi, P. Perdikaris, G. E. Karniadakis, Inferring solutions of differential equations using noisy multi-fidelity data, Journal of Computational Physics 335 (2017) 736–746.

[5] M. Raissi, P. Perdikaris, G. E. Karniadakis, Machine learning of linear differential equations using Gaussian processes, Journal of Computational Physics 348 (2017) 683–693.

[6] H. Owhadi, Bayesian numerical homogenization, Multiscale Modeling & Simulation 13 (2015) 812–828.

[7] C. E. Rasmussen, C. K. Williams, Gaussian processes for machine learning, volume 1, MIT Press, Cambridge, 2006.

[8] M. Raissi, P. Perdikaris, G. E. Karniadakis, Numerical Gaussian processes for time-dependent and non-linear partial differential equations, arXiv preprint arXiv:1703.10230 (2017).

[9] M. Raissi, G. E. Karniadakis, Hidden physics models: Machine learning of nonlinear partial differential equations, arXiv preprint arXiv:1708.00588 (2017).

[10] H. Owhadi, C. Scovel, T. Sullivan, et al., Brittleness of Bayesian inference under finite information in a continuous world, Electronic Journal of Statistics 9 (2015) 1–79.

[11] K. Hornik, M. Stinchcombe, H. White, Multilayer feedforward networks are universal approximators, Neural Networks 2 (1989) 359–366.

[12] A. G. Baydin, B. A. Pearlmutter, A. A. Radul, J. M. Siskind, Automatic differentiation in machine learning: a survey, arXiv preprint arXiv:1502.05767 (2015).

[13] C. Basdevant, M. Deville, P. Haldenwang, J. Lacroix, J. Ouazzani, R. Peyret, P. Orlandi, A. Patera, Spectral and finite difference solutions of the Burgers equation, Computers & Fluids 14 (1986) 23–41.

[14] S. H. Rudy, S. L. Brunton, J. L. Proctor, J. N. Kutz, Data-driven discovery of partial differential equations, Science Advances 3 (2017).

[15] I. E. Lagaris, A. Likas, D. I. Fotiadis, Artificial neural networks for solving ordinary and partial differential equations, IEEE Transactions on Neural Networks 9 (1998) 987–1000.

[16] D. C. Psichogios, L. H. Ungar, A hybrid neural network-first principles approach to process modeling, AIChE Journal 38 (1992) 1499–1511.

[17] J.-X. Wang, J. Wu, J. Ling, G. Iaccarino, H. Xiao, A comprehensive physics-informed machine learning framework for predictive turbulence modeling, arXiv preprint arXiv:1701.07102 (2017).

[18] Y. Zhu, N. Zabaras, Bayesian deep convolutional encoder-decoder networks for surrogate modeling and uncertainty quantification, arXiv preprint arXiv:1801.06879 (2018).

[19] T. Hagge, P. Stinis, E. Yeung, A. M. Tartakovsky, Solving differential equations with unknown constitutive relations as recurrent neural networks, arXiv preprint arXiv:1710.02242 (2017).

[20] R. Tripathy, I. Bilionis, Deep UQ: Learning deep neural network surrogate models for high dimensional uncertainty quantification, arXiv preprint arXiv:1802.00850 (2018).

[21] P. R. Vlachas, W. Byeon, Z. Y. Wan, T. P. Sapsis, P. Koumoutsakos, Data-driven forecasting of high-dimensional chaotic systems with long-short term memory networks, arXiv preprint arXiv:1802.07486 (2018).

[22] E. J. Parish, K. Duraisamy, A paradigm for data-driven predictive modeling using field inversion and machine learning, Journal of Computational Physics 305 (2016) 758–774.

[23] K. Duraisamy, Z. J. Zhang, A. P. Singh, New approaches in turbulence and transition modeling using data-driven techniques, in: 53rd AIAA Aerospace Sciences Meeting, p. 1284.

[24] J. Ling, A. Kurzawski, J. Templeton, Reynolds averaged turbulence modelling using deep neural networks with embedded invariance, Journal of Fluid Mechanics 807 (2016) 155–166.

[25] Z. J. Zhang, K. Duraisamy, Machine learning methods for data-driven turbulence modeling, in: 22nd AIAA Computational Fluid Dynamics Conference, p. 2460.

[26] M. Milano, P. Koumoutsakos, Neural network modeling for near wall turbulent flow, Journal of Computational Physics 182 (2002) 1–26.

[27] P. Perdikaris, D. Venturi, G. E. Karniadakis, Multifidelity information fusion algorithms for high-dimensional systems and massive data sets, SIAM J. Sci. Comput. 38 (2016) B521–B538.

[28] R. Rico-Martinez, J. Anderson, I. Kevrekidis, Continuous-time nonlinear signal processing: a neural network based approach for gray box
sand.
The time of our visit coincided with the Mizan or Equinox, which is
supposed to mark the close of the hot weather, and the “kindly fruits
of the earth” were nearly ready for the harvest. The cotton crop was
being gathered, its bursting pods lying on the ground; the handsome
man-high maize and millet were yellowing, and we revelled in
delicious corn cobs, boiled and then smeared with butter and
sprinkled with salt, as I had learned to eat them in Canada. We were
also given another vegetable, the roots of the lotus, which the
Chinese look upon as a delicacy; but it did not appeal to my taste.
The pomegranates were a glorious scarlet and the many varieties of
grapes were in their prime; the melons, peaches and nectarines had
passed their zenith.
On the evening before we left Guma our servants, together with the
various travellers who had attached themselves to our party,
organized an entertainment. There was much singing, the
performers yelling at the top of their voices, accompanied by a
thrumming of sitars, a thudding of drums and a squealing of pipes.
Three of the men executed a pantomime dance, one being disguised
as a woman, another as an old man, and the third, a handsome
young fellow, having no make-up at all. All three went round in a
circle one after the other with curious steps and much waving of
arms, the play being based on the well-known theme of the girl-wife
snatched from an old husband by her youthful lover. I felt rather like
an Oriental woman as I watched the show from behind a curtain, and
was amused to hear later that I was considered to be a model of
discreet behaviour because I had not attended any of the Chinese
banquets.
It was rather disturbing at night to hear the Chinese watchman going
his rounds, beating two sticks together as an assurance to the
citizens that he was guarding them faithfully, but I fancy that he and
his colleagues were of the Dogberry type and would probably
pretend not to notice were any devilry afoot.
Although we saw very little veiling after we had left Yarkand, this
Mohamedan custom prevailing less and less the nearer we
approached China, the women were extremely nervous at our
approach, having seldom or never seen Europeans. They would
rush in all directions, hiding their faces in the long cotton shawls
which they wore over their heads, and would vanish like rabbits into
their mud hovels, giving me the queer sense of being watched by
legions of eyes as we rode through the mean bazars. There were
many public eating-houses in this part of the world, with Chinese
painted screens to hide the customers seated behind them, and with
gaily coloured pictures on the walls. The food was cooked in big
cauldrons in full view of the public, and I was told that the restaurant-
keepers, who are Tunganis (Chinese Moslems), usually become rich,
especially in one district, where both men and women take all their
meals in public. As a rule no payment is demanded until six months
have elapsed, and then mine host goes round to collect his debts,
with the not uncommon result that greedy folk who have partaken
too lavishly of the seven dishes provided are obliged to sell their
property in order to pay up. Fuel is certainly a heavy item for the
poor, who use it only for cooking and not to heat their houses;
therefore these restaurants, if used with discretion, ought to make for
economy.
During this journey the weather as a rule was perfect—fresh in the
morning and evening, quite cold at night, and only during the middle
of the day uncomfortably hot. I felt as if I were on a riding-tour and
picnic combined, so little of the discomforts of travel did we
experience, the supply question being easy and our servants doing
their work with scarcely a hitch. At night we generally slept in the
open air under our mosquito nets, and when the full moon rode
across the heavens I was often obliged to bandage my eyes to shut
out the brilliant light.
It was on our march between Sang-uya and Pialma that the desert,
for once, showed itself in an unamiable mood. The morning was fine
when we left our comfortable quarters in a Chinese country house,
and we soon entered the region of sand-dunes, our horses racing up
and down them with much spirit, though the loose sand made the
going very heavy. We stopped a picturesque party of wayfarers with
their donkeys in order to photograph them, and gave them money for
their trouble. They posed themselves and their animals as my
brother directed, but when we had finished they remarked that they
had expected to be shot, as they imagined the camera to be some
kind of firearm! Not unnaturally I thought that this was a joke on their
part, but later on we passed a company of beggars, and my brother
took a group consisting of a wild-looking woman leading an ox and a
man wearing a red leather sugar-loaf hat. I noticed that the latter
clasped his hands in an attitude of entreaty as he stood perfectly
motionless beside the animal, and when he received his douceur he
burst into speech, saying with many exclamations that he had verily
believed that his last hour had come. These incidents gave me a
glimpse of the docile spirit of the race, and partly explained why the
inhabitants of Chinese Turkestan have nearly always been ruled by a
succession of foreign masters. They are small cultivators and petty
shopkeepers, taking little interest in anything outside their immediate
circle, and their life seems to destroy initiative and independence,
thus rendering the task of their Chinese rulers easy.
The morning breeze that blew in our faces was pleasant enough at
first, but gradually turned into a gale, which raised the sand in such
great clouds that the sun and sky were obscured with a yellow haze.
In spite of my veil and blue goggles the grit whipped my face and
eyes as we galloped our fastest in order to reach our destination
before matters grew worse. The horses were much excited, being as
anxious as we were to escape from the whirling sand, and it was
annoying when the grey broke loose from the rider who was leading
him and cantered off until we nearly lost sight of him in the thick
haze. A couple of men did their best to head him back, while the rest
of us waited, my chestnut screaming loudly and plunging violently in
his eagerness to join in the chase. The grey behaved in the usual
provoking manner of horses on the loose, circling round and round
us, almost letting himself be caught, and then galloping off a short
distance before he returned to coquet with the other horses. Finally
my brother made a lucky snatch at the trailing halter, and off we went
faster than ever, noting with thankfulness potai after potai as they
loomed up out of the blinding dust. Suddenly a change occurred that
seemed almost like magic. We plunged into a tree-bordered lane
with fields of maize stretching on either side, while overhead the
clear blue sky seemed free from every particle of dust. I looked back
at the whirling yellow inferno from which we had escaped, and in a
few minutes thankfully dismounted in a large garden with irrigation
channels through which the water flowed with a faint delicious
splashing. Here our tents were in readiness, pitched under shady
trees, and hot tea was brought that served a double purpose; for we
found it a soothing lotion for our sore eyes as well as grateful to our
parched throats.
The waggons, which had done this last stage during the night, left
again at five o’clock in the afternoon, as the horses would be forced
to do a double stage of some thirty miles, with no water obtainable
on the road. But the animals had had thirteen hours’ rest and the
going was good for the first part of the way, so we hoped they would
be able to manage it. We ourselves were to break the stage at Ak
Langar, some fourteen miles away, and rest there for four hours
before undertaking the remainder of the march, which, we were told,
was a continuous series of lofty sand-dunes. Accordingly, after our
evening meal we mounted at seven o’clock, and leaving the little
oasis, rode off under the full moon across an absolutely barren
gravelly desert. We were told that some years before our visit a
governor of Khotan had placed posts at intervals along this stage,
upon which lamps were hung and lighted on dark nights. Unluckily
this benefactor, a rara avis among officials, failing to keep his
finances in order, was dismissed from his post and was now
dragging out a precarious existence in the Chinatown of Kashgar.
We of course stood in no need of lanterns, but in spite of the
moonlight the desert seemed rather eerie, and our horses,
unaccustomed to night marches, were curiously nervous and
suddenly shied at some dark moving shapes that turned out to be
camels grazing on the scanty tamarisk scrub. A little farther on they
were startled by a large dog, which we disturbed at its meal on a
dead ass, and here and there the moon gleamed on the white bones
of deceased pack-animals that lay beside the track. I am not
ashamed to confess that I should not have cared to ride this stage
alone, and I did not wonder that the peasants whom we passed
driving laden donkeys were always in large parties.
After a while we came to a ruined potai, against which a rough post
was leaning, and learnt that this was the boundary between the
districts of Kargalik and Khotan. We were therefore in the Kingdom
of Jade, and our horses, having become used to their novel
experience, trotted along briskly in the keen night air, pricking their
ears and hastening whenever they espied the remains of a deserted
serai sharply silhouetted in the moonlight; for they were as anxious
for their night’s rest as I was.
With the exception I have mentioned there were no potais to mark
this particular route, so I had not the pleasing sensation of knowing
that two and a quarter miles were accomplished whenever we
passed one, and was feeling extremely sleepy, when a black mass
of building seemed to rear up suddenly ahead of us. It was just upon
midnight, and I was most thankful to dismount and pass into a serai
built of hewn stone, the welcome cleanliness of its rooms being due
to the fact that practically no one halted there, owing to the lack of
water. Yet the first sight that met my eyes was a man drawing up a
bucket from a well by means of a windlass; but Jafar Bai explained
that the water was bitter and harmful to horses.
The natives had given us such alarming accounts of the difficulties of
the latter part of the stage that, tired as we all were, we were allowed
to sleep for only four hours, and it seemed to me as if I had hardly
closed my eyes when Sattur roused me. He brought a lighted candle
by which I dressed; for my room had no window and opened on to
the public courtyard, and a fat pigeon, disturbed by the light, flopped
down from the rafters and fluttered feebly round and round until I let
it out.
When we rode off in the crisp air of the early morning we were
surprised to find that for some miles ahead of us the road lay across
a gravelly plain that made excellent going for horses and baggage
waggons. Close to the serai four huge vultures were feeding on the
remains of a dead camel, and the loathsome birds were so gorged
that on the approach of our party they could only with difficulty flap or
hobble away for a few feet; they watched us until we had passed and
then returned to their interrupted meal. How horrible it must be for a
dying animal to be ringed about with these birds biding their time, or
even fastening on their prey before life is extinct! Owing to the recent
storm the atmosphere was unusually clear, and we enjoyed the
somewhat rare experience of seeing the lower slopes of the Kuen-
lun range, the existence of which was not even mentioned by Marco
Polo, presumably on account of its invisibility, which is notorious.
After a while we rode among low sand-dunes curved and ribbed by
the wind, and then crossed a high ridge that was more like a low hill
than a dune and must have meant a stiff pull for even our five-horse
arabas. Below its crest stood a couple of wooden posts, signifying
that we had reached the boundary of the famous Kaptar Mazzar or
Pigeon Shrine, where all good Moslems must dismount to approach
the sacred spot on foot. There in the midst of the sand lay a
graveyard marked by poles on which hung fluttering rags and bits of
sheepskin, and near by was a tiny mosque with fretted wooden door
and window and some low buildings, the roofs of which were
crowded with grey pigeons. Legend has it that Imam Shakir
Padshah, trying to convert the Buddhist inhabitants of the country to
Islam by the drastic agency of the sword, fell here in battle against
the army of Khotan and was buried in the little cemetery. It is
affirmed that two doves flew forth from the heart of the dead saint
and became the ancestors of the swarms of sacred pigeons that we
saw. Our arrival caused a stir among them and a great cloud rose
up, with a tremendous whirring of wings, and some settled upon the
maize that our party flung upon the ground as an offering.
The guardian of the shrine, in long blue coat and white turban, left
his study of the Koran and, accompanied by his little scarlet-clad
daughter, hurried to meet us. My brother asked them to attract their