Analysis and
Visualization of
Discrete Data Using
Neural Networks
Analysis and
Visualization of
Discrete Data Using
Neural Networks

Koji Koyamada
Kyoto University, Japan

World Scientific
NEW JERSEY • LONDON • SINGAPORE • BEIJING • SHANGHAI • HONG KONG • TAIPEI • CHENNAI • TOKYO
Published by
World Scientific Publishing Co. Pte. Ltd.
5 Toh Tuck Link, Singapore 596224
USA office: 27 Warren Street, Suite 401-402, Hackensack, NJ 07601
UK office: 57 Shelton Street, Covent Garden, London WC2H 9HE

Library of Congress Cataloging-in-Publication Data


Names: Koyamada, Kōji, author.
Title: Analysis and visualization of discrete data using neural networks / Koji Koyamada.
Description: New Jersey : World Scientific, [2024] | Includes bibliographical references.
Identifiers: LCCN 2023038211 | ISBN 9789811283611 (hardcover) |
ISBN 9789811283628 (ebook for institutions) | ISBN 9789811283635 (ebook for individuals)
Subjects: LCSH: Data mining. | Information visualization--Data processing. |
Neural networks (Computer science)
Classification: LCC QA76.9.D343 K685 2024 | DDC 006.3/12--dc23/eng/20231222
LC record available at https://lccn.loc.gov/2023038211

British Library Cataloguing-in-Publication Data


A catalogue record for this book is available from the British Library.

Copyright © 2024 by World Scientific Publishing Co. Pte. Ltd.


All rights reserved. This book, or parts thereof, may not be reproduced in any form or by any means,
electronic or mechanical, including photocopying, recording or any information storage and retrieval
system now known or to be invented, without written permission from the publisher.

For photocopying of material in this volume, please pay a copying fee through the Copyright Clearance
Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA. In this case permission to photocopy
is not required from the publisher.

For any available supplementary material, please visit
https://www.worldscientific.com/worldscibooks/10.1142/13603#t=suppl

Desk Editor: Quek ZhiQin, Vanessa

Typeset by Stallion Press


Email: [email protected]

Printed in Singapore
Contents

1. Introduction 1
1.1 Basic operations of Excel . . . . . . . . . . . . . . . . . . . . 2
1.1.1 Table components . . . . . . . . . . . . . . . . . . . . 3
1.1.2 Name box and formula bar . . . . . . . . . . . . . . . 4
1.1.3 Ribbon . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.1.4 File tab and Backstage view . . . . . . . . . . . . . . 5
1.1.5 Autofill . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.1.6 Relative reference . . . . . . . . . . . . . . . . . . . . 7
1.1.7 Absolute reference . . . . . . . . . . . . . . . . . . . . 8
1.1.8 Introduction of visualization with Excel . . . . . . . . 10
1.1.9 Introduction of PDE derivation with Excel . . . . . . 12
1.2 Basic operations of Google Colab (Colab) . . . . . . . . . . . 15
1.2.1 Code cell . . . . . . . . . . . . . . . . . . . . . . . . . 17
1.2.2 Text cell . . . . . . . . . . . . . . . . . . . . . . . . . 17
1.2.3 Introduction to visualization with Colab . . . . . . . . 18
1.2.4 Introduction of deep learning with Colab . . . . . . . 19
1.3 Organization of this document . . . . . . . . . . . . . . . . . 24

2. Basic 25
2.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.1.1 Data analysis using NNs . . . . . . . . . . . . . . . . 25
2.1.2 Format of physical data . . . . . . . . . . . . . . . . . 30
2.1.3 Physical data visualization . . . . . . . . . . . . . . . 30
2.2 Statistical Analysis in Excel . . . . . . . . . . . . . . . . . . 36
2.2.1 Correlation analysis . . . . . . . . . . . . . . . . . . . 37
2.2.2 F-test . . . . . . . . . . . . . . . . . . . . . . . . . . . 38


2.2.3 t-Test . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
2.2.4 Z-test . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
2.3 Regression analysis . . . . . . . . . . . . . . . . . . . . . . . 41
2.3.1 Model characteristics . . . . . . . . . . . . . . . . . . 42
2.3.2 Regression analysis assumptions . . . . . . . . . . . . 43
2.3.3 Regression analysis using the data analysis tools
in Excel . . . . . . . . . . . . . . . . . . . . . . . . . . 49
2.3.4 Excel macro . . . . . . . . . . . . . . . . . . . . . . . 55
2.3.5 Implementation of the variable reduction method
using Excel VBA . . . . . . . . . . . . . . . . . . . . . 59
2.4 What-if analysis . . . . . . . . . . . . . . . . . . . . . . . . . 63
2.5 Solver . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
2.5.1 Optimization . . . . . . . . . . . . . . . . . . . . . . . 68
2.5.2 Implementation of regression analysis . . . . . . . . . 69
2.6 Colab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
2.6.1 Correlation analysis . . . . . . . . . . . . . . . . . . . 74
2.6.2 F-test . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
2.6.3 t-Test . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
2.6.4 Z-test . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
2.6.5 Regression analysis . . . . . . . . . . . . . . . . . . . 77
2.6.6 Optimization problem . . . . . . . . . . . . . . . . . . 80
2.7 NNs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
2.7.1 Universal approximation theorem . . . . . . . . . . . 83
2.7.2 Regression analysis using NNs . . . . . . . . . . . . . 85
2.7.3 NN implementation using Excel VBA . . . . . . . . . 87
2.7.4 Function approximation using NNs . . . . . . . . . . . 95
2.7.5 Point cloud data analysis using NNs: Application
example . . . . . . . . . . . . . . . . . . . . . . . . . . 98
2.7.6 Line group data analysis using NN application
examples . . . . . . . . . . . . . . . . . . . . . . . . . 101

3. Practical Part 109


3.1 About PINNs . . . . . . . . . . . . . . . . . . . . . . . . . . 109
3.2 Automatic differentiation in NN . . . . . . . . . . . . . . . . 113
3.3 Using automatic differentiation in regression equation . . . . 116
3.4 Automatic differentiation using Colab . . . . . . . . . . . . . 118
3.5 Increasing the accuracy of visualization of magnetic line


group data using automatic differentiation . . . . . . . . . . 119
3.6 Generation of a CAD model from point cloud data using
point cloud data processing software . . . . . . . . . . . . . 121

4. Advanced Application Section 129


4.1 Derivation of PDEs describing physical data . . . . . . . . . 129
4.1.1 Related research . . . . . . . . . . . . . . . . . . . . . 132
4.1.2 PDE derivation using regularized regression
analysis . . . . . . . . . . . . . . . . . . . . . . . . . . 134
4.1.3 Definition of error . . . . . . . . . . . . . . . . . . . . 136
4.1.4 Example of PDE derivation . . . . . . . . . . . . . . . 137
4.2 Use of visual analysis techniques . . . . . . . . . . . . . . . . 171
4.3 Methods for solving a given PDE . . . . . . . . . . . . . . . 173
4.3.1 How to solve PDEs using the Fourier transform . . . 175
4.3.2 PDE approximate solution . . . . . . . . . . . . . . . 177
4.3.3 How to find solutions to PDEs using PINNs . . . . . 183
4.3.4 Example of finding solutions to PDEs using
PINNs . . . . . . . . . . . . . . . . . . . . . . . . . . 185
4.3.5 Implementation of the PDE solution method . . . . . 195

5. Physically Based Surrogate Model 203


5.1 About the CAE surrogate model . . . . . . . . . . . . . . . . 203
5.1.1 Example of SM construction . . . . . . . . . . . . . . 203
5.1.2 Expectations for physics-based learning . . . . . . . . 209
5.2 Application to carbon neutrality (CN) . . . . . . . . . . . . 209

6. Closing Remarks 213

References 217
Index 221
Chapter 1

Introduction

In this chapter, you will learn practical methods for analyzing physical data defined spatiotemporally at discrete points. Physical data refer to data used to describe physical and other phenomena. Physical phenomena are the various phenomena that exist in nature; they include, for example, weather, temperature, earthquakes, and floods. Physical data are often defined as spatiotemporal discrete data, that is, data that are not continuous in space and time. When we handle spatiotemporal discrete data, we usually divide the spatiotemporal axes into discrete segments so that the data in each segment can be handled separately.

Spatiotemporal discrete data are used in a variety of fields. For example, in the social and natural sciences, spatiotemporal discrete data are sometimes used to analyze various phenomena and their continuous changes. In biology and geology, they are sometimes used to analyze phenomena that occur over time.

Physical data can be measured in a variety of ways, as follows:

• Using measuring instruments: Physical data can be measured directly using measuring instruments. For example, a thermometer or a hygrometer can be used to measure temperature or humidity, and a seismometer can be used to measure the vibrations of an earthquake.
• Observation: Physical data can be obtained by observing phenomena. For example, a weather forecast site or app can be used to observe the weather. To acquire weather data, you can observe temperature, humidity, wind speed, etc., at meteorological observatories.
• Using simulations: Physical data can be obtained by running simulations on a computer. For example, you can understand fluid and heat conduction phenomena by simulating them.

Today, discovering a spatiotemporal model that explains physical data defined at spatiotemporal discrete points (hereafter simply referred to as "physical data") is an interesting topic for researchers and practitioners. A key step in the discovery of a spatiotemporal model is the discrete-to-continuous transformation needed to evaluate partial differential terms effectively. More specifically, it is about finding a partially differentiable function that adequately approximates physical data defined discretely in space and time. A partially differentiable function is characterized by having a partial derivative at a given point; partial differentiation refers to calculating the derivative with respect to one particular variable of a multivariable function at that point. Successfully finding such a partially differentiable function enables effective visualization and also enables a partial differential equation (PDE) that explains the phenomenon to be derived from the results of the analysis.

This document uses Excel to explain the basics of how to analyze physical data. Excel, a widely used piece of spreadsheet software for organizing data, is well suited for data analysis. Using Excel has the following benefits:

• Easy to use: Excel is commonly pre-installed on most computers and does not require additional installation. Moreover, Excel can be used without specialized skills.
• Has a variety of functions: Excel has a variety of functions that allow you to aggregate data efficiently.
• Can create graphs or charts: Excel can display data visually in the form of a graph or chart, which facilitates understanding of the data.
• Data can be easily imported: Excel allows you to import data from, for example, a CSV or text file.
• Supports joint work: Excel allows multiple people to work jointly, making it useful for data analysis projects.

Given this array of benefits, Excel is widely used in data analysis. Next, the basics of Excel will be explained.

1.1. Basic operations of Excel


Excel is spreadsheet software developed by Microsoft. Spreadsheet software is a tool that allows you to enter and edit data in a tabular format for aggregation, analysis, and other purposes. Excel is widely used by individuals and businesses and can handle large amounts of data, making it suitable for aggregation and analysis. Excel can be used not only to create tables but also to create graphs and charts. Additionally, Excel has a variety of functions for organizing data, allowing you to analyze data and calculate formulas. Excel is commonly pre-installed on Windows and macOS and sold as part of Microsoft Office. It is also available as a cloud service, so you can use it from any Internet-connected environment.

The data to be analyzed are often provided in Excel format. Quite a few tools can read and analyze Excel data, but Excel itself also contains functions for performing analyses. Excel provides tools for determining the correlation between data, testing for differences in the mean and variance of data, and regression analysis. It also serves as an optimization tool. These tools can even be used to implement deep learning. Excel can be regarded as a tool for preliminary analysis prior to using dedicated analysis tools, and it is advisable to become familiar with its functionality. This section describes basic Excel terminology and explains how to perform what-if analysis using sample data.

1.1.1. Table components


The main components of an Excel table are as follows (Figure 1.1):

• Worksheet: One of the basic building blocks of an Excel document. An Excel file usually contains one worksheet, but it may contain several.
• Cell: The basic element for entering data in Excel, located where a row and a column intersect. You can enter text, numbers, and other values in a cell.
• Row: A basic element that divides an Excel worksheet horizontally. A row contains one cell for each column. In Excel, you can refer to a row by specifying its row number.
• Column: A basic element that divides an Excel worksheet vertically. A column contains one cell for each row. In Excel, you can refer to a column by specifying its column letter.
• Row number: A number that identifies a row in Excel and appears on the left. Row numbers typically start from "1".
• Column letter: A letter that identifies a column in Excel and appears at the top. Column letters usually start from "A".

Each component of a table in Excel plays an important role in creating and manipulating a table. When creating a table or entering data, it is important to understand these components and manipulate them appropriately.

Figure 1.1. Components of a table in Excel.

1.1.2. Name box and formula bar


The name box and formula bar (Figure 1.2) in Excel have the following roles:

• Name box: In Excel, you can give a name to a specified cell or range; this is called naming. A named cell or range can then be referred to by that name, which helps you manage your data more efficiently. The "name box" displays the name (or address) of the current selection and allows you to jump to a named cell or range by entering its name.
• Formula bar: In Excel, you can calculate data by entering a formula. Formulas are entered in the "formula bar", which is located below the ribbon and is used to enter or edit the contents of a cell. Entering formulas allows you to calculate data efficiently.
Figure 1.2. Name box and formula bar.

Figure 1.3. Ribbon.

1.1.3. Ribbon
The Excel ribbon refers to the tabbed menus that appear at the top of the main Excel window. The ribbon groups various functions and commands into tabs according to category; there are various tabs, such as Home and Data. Users can use the commands in the ribbon to manipulate Excel to create or edit a table. The tabs and the area of buttons that switches when you click a tab are collectively called the "ribbon" (Figure 1.3); you switch the visible buttons by clicking the tabs placed above them.

1.1.4. File tab and Backstage view


The Excel File tab is a special tab that appears in the upper-left corner of the main Excel window. Clicking the File tab brings up the Backstage view (Figure 1.4).

Figure 1.4. File tab and Backstage view.

The Backstage view is a space for managing Excel files and data. It contains templates for creating a new workbook, as well as commands for opening, saving, printing, and sharing existing workbooks and for setting Excel options. If you want an Excel data analysis tool or the Solver to appear as a menu on the ribbon, you must set the corresponding option here: clicking Options in the Backstage view displays the settings dialog box. You can also check your Excel account and privacy settings in the Backstage view.

1.1.5. Autofill
The Autofill function in Excel helps you enter data efficiently. It automatically suggests the data to be entered next based on the data that have already been entered (Figure 1.5).

For example, suppose that columns contain month and day data: "January", "February", and "March" in the "Month" column, and "1st", "2nd", and "3rd" in the "Day" column. If you enter "April" in the next cell of the "Month" column, the corresponding "4th" can be filled in automatically in the "Day" column.

The Autofill function allows you to enter data more efficiently, so you can perform Excel work more quickly. It also reduces input errors, thereby increasing data reliability.
Figure 1.5. Autofill.

You can easily enter continuous data such as "January, February, March, . . ." by dragging:

1. Select a cell filled with a number.
2. Move the mouse pointer to the small square at the bottom right of the thick frame.
3. When the shape of the mouse pointer changes into a cross, drag it.
4. Click "Autofill Options", which appears in the lower right area.

1.1.6. Relative reference


In Excel, the relative reference function automatically changes the reference destination in a cell when the cell containing the reference is moved or copied (Figure 1.6).

For example, for a cell containing the formula "= A1", if you move this cell to the right by one cell, the formula is automatically rewritten to "= B1". This function, which changes the reference destination automatically when the cell is moved, is known as "relative reference". It is useful when writing formulas: when data that use relative references are duplicated, the reference destinations in the duplicated cells change accordingly, allowing you to manage data efficiently.

Figure 1.6. Relative reference.

Autofill is used to copy a formula:

1. If you select cell D2, which contains a formula, a small square appears at the bottom right of that cell.
2. Drag it downward to automatically fill the other cells.
3. Select cell B2. The contents of that cell are displayed in the formula bar.

1.1.7. Absolute reference


In Excel, when a cell uses an "absolute reference", the reference destination is fixed even when the cell is moved or copied.

For instance, consider a cell that contains the formula "= $A$1". If you shift this cell to the right by one cell, the formula in the moved cell remains "= $A$1". This function, which keeps a reference destination fixed even after the cell is moved, is known as "absolute reference". It is useful for writing formulas whose reference destination must not change: when data that use absolute references are duplicated, the reference destinations in the duplicated cells stay fixed, allowing you to manage data efficiently (Figure 1.7).
Figure 1.7. Absolute reference.

It is recommended to use the absolute reference function to retain a cell address in a formula when the formula is copied:

1. Select cell B4. The cell is surrounded by a thick frame with a small square at the bottom right.
2. Dragging the small square downward does not produce the intended result, because the relative reference to the commission rate shifts.
3. Instead, in the formula, click to select B1, which is the "commission rate", and press F4.
4. Make sure the column and row designations are prefixed with "$", indicating absolute references. There are three prefixing patterns: absolute reference for both column and row, for row only, and for column only. The pattern changes each time F4 is pressed.

To learn how to use relative and absolute references, display the products of the numbers in column A × the numbers in row 1:

1. Enter 1 to 9 downward, starting from row 2 of column A.
2. Enter 1 to 9 rightward, starting from row 1 of column B.
3. Calculate the products for all 9 × 9 cells using relative and absolute references appropriately (Figure 1.8). More specifically, enter the formula "= $A2*B$1" in cell B2. Copy this cell down to row 10, and then copy rows 2 to 10 of column B across to column J.
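The table that the copied formula "= $A2*B$1" produces can be checked outside Excel: fixing the column of the first factor and the row of the second makes every cell the product of its row header and column header, i.e., an outer product. A minimal Python sketch of this idea:

```python
import numpy as np

# Column A (rows 2-10) holds 1..9; row 1 (columns B-J) holds 1..9.
col_a = np.arange(1, 10)   # values entered downward in column A
row_1 = np.arange(1, 10)   # values entered rightward in row 1

# "= $A2*B$1" copied over the 9x9 range multiplies the fixed-column
# value $A2 by the fixed-row value B$1: an outer product.
table = np.outer(col_a, row_1)

print(table[0, 0], table[8, 8])  # corners of the table: 1 and 81
```

Cell (i, j) of `table` corresponds to the worksheet cell in row i + 2, column B + j.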
Figure 1.8. Creation of a 9 × 9 multiplication table.

1.1.8. Introduction of visualization with Excel


Excel is widely used as spreadsheet software, but it is also suitable for visualizing data. There are several ways to visualize data in Excel:

1. Visualization with charts: Excel provides a variety of chart types to help you visualize your data with bar, line, pie charts, etc.
2. Visualization using a pivot table: You can visualize data by using an Excel pivot table to aggregate data and convert the result to a chart.
3. Visualization using a spreadsheet: You can use an Excel spreadsheet to organize data and visualize them using color coding or formulas.
4. Visualization using macros: Excel has a macro language called VBA, which you can use to create a custom chart or an automated visualization tool.

Before visualizing data in Excel, you may have to organize and process them. Excel also has visualization add-ins that allow more detailed visualizations.
Displaying a chart using Excel:

1. Creating a chart: Excel has a variety of chart types that allow you to enter data and create a chart.
• Select the data, select Chart from the Insert tab, and select a chart type.
• After a chart is created, the data appear in the chart area.
2. Editing a chart: You can edit a chart you have created.
• Select a chart and select Chart Options from the Design tab.
• You can change detailed chart settings by using tabs such as Axis, Data Series, and Layout.
3. Chart design: You can change the appearance of a chart.
• You can change the appearance of a chart by selecting the chart and choosing Shape, Frame, etc., from the Format tab.

A concrete example is used to explain this. Consider the following two-dimensional function: f(x, y) = sin(x) · cos(2y). Calculate f(x, y) using the variable values (y) in column A and the variable values (x) in row 1 (Figure 1.9).

• Autofill is used to fill variable values downward from the cell in row 2 of column A.
• Autofill is used to fill variable values rightward from the cell in row 1 of column B.
• Calculate the function values using relative and absolute references as appropriate.
• Set the chart type to "Contour" and display the chart.

Figure 1.9. Presentation of two-dimensional function as a contour chart.
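The same worksheet can be mimicked in Python. The sketch below (names are illustrative) builds the grid of f(x, y) = sin(x) · cos(2y) that Excel would render as a contour chart:

```python
import numpy as np

# x values across row 1, y values down column A, as in the worksheet.
x = np.linspace(0, 2 * np.pi, 25)
y = np.linspace(0, 2 * np.pi, 25)
X, Y = np.meshgrid(x, y)       # X varies along rows, Y down columns

F = np.sin(X) * np.cos(2 * Y)  # f(x, y) = sin(x) * cos(2y)

# Each entry F[i, j] corresponds to one worksheet cell; a contour plot
# (e.g. matplotlib's plt.contourf(X, Y, F)) would display this grid.
print(F.shape)
```

The grid plays the role of the block of formula cells, and the contour chart is just a rendering of it.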

1.1.9. Introduction of PDE derivation with Excel


This section introduces how to use data analysis (regression analysis) in Excel to derive a PDE from given data; detailed operations are described in subsequent sections. Regression analysis is a method of analyzing the degree of impact (weight) of two or more explanatory variables on an objective variable. As a concrete example, consider a model that describes the annual sales of a restaurant. Each restaurant corresponds to a row in Excel. In each row, enter the annual sales (unit: 10,000 yen), the number of seats, the walking time from the nearest station (minutes), and the presence/absence of breakfast (0, 1). The regression analysis tool allows you to easily create a formula that explains annual sales by assigning a weight to each explanatory variable (number of seats, walking time from the nearest station in minutes, and presence/absence of breakfast) and adding an intercept (Figure 1.10).
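As a rough sketch of what the Excel regression tool computes, the same least-squares fit can be reproduced in Python on made-up restaurant rows. The weights 2.5, -3.0, 150 and intercept 400 below are invented purely to generate toy data; they are not taken from the book:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50

seats = rng.integers(10, 80, n)    # number of seats
walk = rng.integers(1, 20, n)      # walking time from station (min)
breakfast = rng.integers(0, 2, n)  # breakfast offered (0/1)

# Hypothetical "true" relationship used only to generate toy data:
sales = 2.5 * seats - 3.0 * walk + 150 * breakfast + 400

# Least-squares fit with an intercept, as Excel's regression tool does.
X = np.column_stack([seats, walk, breakfast, np.ones(n)])
weights, *_ = np.linalg.lstsq(X, sales, rcond=None)
print(weights)  # recovers the weights and the intercept
```

Because the toy data contain no noise, the fit returns the generating weights exactly; with real restaurant data the weights would be estimates.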
This is an introductory exercise for general regression analysis techniques, but it provides a tip for developing a data-driven method for deriving a PDE (Figure 1.11). For example, enter the partial derivatives

ut, ux, uy, uxx, uyy

for the physical data (u) at each of multiple coordinate points in each Excel row. The regression analysis tool allows you to analyze the degree of impact of each partial derivative on ut [1]. As a result, a PDE

ut = a0 ux + a1 uy + a2 uxx + a3 uyy

is derived that describes the physical data.

Figure 1.10. Explanation of sales using regression analysis.

Figure 1.11. Derivation of a PDE using regression analysis.
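To see that this regression really can recover a PDE, here is a small self-contained Python sketch (an illustration of mine, not from the book). It samples a function u(x, y, t) built from two modes of the heat equation ut = uxx + uyy, tabulates the partial derivatives exactly (one "Excel row" per sample point), and regresses ut on them:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 2000
x = rng.uniform(0, 2 * np.pi, N)
y = rng.uniform(0, 2 * np.pi, N)
t = rng.uniform(0, 1, N)

# Two decaying modes; their sum satisfies the heat equation ut = uxx + uyy.
m1 = np.exp(-2 * t) * np.sin(x) * np.sin(y)
m2 = np.exp(-5 * t) * np.sin(x) * np.sin(2 * y)

# Exact partial derivatives of u = m1 + m2.
ut = -2 * m1 - 5 * m2
ux = np.exp(-2 * t) * np.cos(x) * np.sin(y) + np.exp(-5 * t) * np.cos(x) * np.sin(2 * y)
uy = np.exp(-2 * t) * np.sin(x) * np.cos(y) + 2 * np.exp(-5 * t) * np.sin(x) * np.cos(2 * y)
uxx = -m1 - m2
uyy = -m1 - 4 * m2

# One row per sample point: regress ut on (ux, uy, uxx, uyy).
A = np.column_stack([ux, uy, uxx, uyy])
coef, *_ = np.linalg.lstsq(A, ut, rcond=None)
print(np.round(coef, 6))  # approximately [0, 0, 1, 1]: ut = uxx + uyy
```

The regression assigns weight 1 to uxx and uyy and weight 0 to ux and uy, i.e., it rediscovers the heat equation from tabulated data.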
As a concrete example, consider an equation of motion (Figure 1.12). An equation of motion is an ordinary differential equation representing the laws of motion of an object in physics. A schematic diagram is shown here where the mass of the object is m kg, the spring constant is k, and the proportional coefficient of air resistance is c.

This ordinary differential equation is solved to calculate the position coordinate of the object while changing the time data. The equation is a second-order ordinary differential equation in the object's position coordinate x, which can be solved by giving initial conditions. Enter the coordinate data (x) and the differential values xt and xtt at each of multiple times in each Excel row. The regression analysis tool then allows you to analyze the degree of impact of the position x and the velocity xt on xtt. As a result, an ordinary differential equation

xtt = a0 xt + a1 x + a2    (1.1)

is derived that describes the physical data.
Figure 1.12. Derivation of equation of motion.

The advection equation is a PDE that describes phenomena in which physical quantities, such as a substance or momentum, are carried along with a flow. As another example, consider a case where a substance is transported in a flow at a constant velocity a0 (a translational equation) and the concentration of the substance is u(x, t). Figure 1.13 shows a schematic diagram for the one-dimensional equation.

The advection equation given by

ut + a0 ux = 0    (1.2)

has an exact solution in which the shape given by the initial condition moves along the x axis at a velocity of a0.

The concentration u can be calculated by setting the time t and the position x in this exact solution. Furthermore, the exact solution itself can be differentiated to calculate the time derivative ut or the spatial derivative ux. Regression analysis is then used to obtain a regression equation that expresses the time derivative in terms of the spatial derivative. This regression equation is the derived PDE (Figure 1.13).

Figure 1.13. Derivation of advection equation.

As described earlier, if physical data are obtained and a partial derivative can be obtained for each piece of data, a PDE can be derived in a data-driven manner using an Excel analysis tool. The above examples deal with PDEs consisting of linear terms, where constant coefficients are assumed for the partial differential terms. Even nonlinear terms, where a coefficient is a function of the physical data, can be reduced to a regression analysis problem by merging the nonlinear parts into the partial differential terms.
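The advection example can be sketched the same way in Python (illustrative; the velocity a0 = 1.5 and the Gaussian initial shape are chosen arbitrarily): differentiate the exact travelling-wave solution u(x, t) = exp(-(x - a0 t)²) and regress ut on ux:

```python
import numpy as np

a0 = 1.5                       # advection velocity (assumed)
rng = np.random.default_rng(2)
x = rng.uniform(-5, 5, 1000)
t = rng.uniform(0, 2, 1000)

s = x - a0 * t                 # travelling-wave coordinate
u = np.exp(-s**2)              # exact solution u(x, t)
ux = -2 * s * u                # spatial derivative
ut = 2 * a0 * s * u            # time derivative (equals -a0 * ux)

# Regress ut on ux: the slope is -a0, giving ut + a0*ux = 0.
slope = np.linalg.lstsq(ux[:, None], ut, rcond=None)[0][0]
print(round(slope, 6))  # -1.5
```

The regression recovers the slope -a0 exactly here because the derivatives come from the exact solution; with sampled data the slope would be an estimate.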
For the partial differential terms, if an exact solution is given, exact differential values can be obtained. If not, the calculation of the differential values becomes an issue. One possible strategy is to obtain and differentiate an approximate function that represents the physical data continuously in some manner. One such approximate function that has attracted attention is based on a neural network (NN). This will be discussed in Chapter 2.

1.2. Basic operations of Google Colab (Colab)


Approximately 1,000 rows of data can be processed comfortably with Excel; to understand data analysis techniques that handle larger amounts of data, however, Google Colab is used. Google Colaboratory (Colab for short) is Google's cloud environment for data analysis [2]. Colab allows you to start analyzing data immediately in your browser without requiring special hardware or software. Data analysis in Colab has the following advantages:

• No hardware or software setup: Colab allows you to start analyzing right away without having to prepare a PC or server or install the necessary tools.
• Runs in a browser, so you can access it from any device: Colab runs in a browser, so you can access it from any device, including a PC, smartphone, or tablet.
• Works with large amounts of data: Colab, which runs on Google's cloud infrastructure, can handle large amounts of data.
• Keeps a history of your work in notebook format: Colab allows you to proceed with data analysis in a notebook format, so you can keep a history of your work. This enhances the repeatability of your work and facilitates collaboration.

Colab is a free cloud service provided by Google in which you can use Jupyter notebooks and Python to analyze data and build machine learning models. The following are some basic Colab operations:
1. Creating a notebook.
2. Opening a notebook.
3. Saving a notebook.
4. Sharing a notebook.
5. Connecting a notebook.
6. Running a notebook.
7. Downloading a notebook.
A Colab notebook has a “cell” in which to write text and code. A notebook
is composed of multiple cells that can be edited directly in a web browser.
The Colab notebook has two types of cell:
• Code cell: A cell for writing Python code.
• Text cell: A cell for describing text that can use Markdown notation to
decorate your text (Figure 1.14).
To create a new cell, click “+ Code” or “+ Text” in the menu bar.
Alternatively, you can also copy and paste existing cells. To run a cell,
click to select it and then press “Shift + Enter” or click the Run button
in the cell.
Figure 1.14. Google Colab.

1.2.1. Code cell


You can enter and execute Python code in a code cell.

• To create a code cell, click “+ Code” at the top of a Colab page.
• To enter and run code, select the cell and click the Run button.

Colab has Python libraries pre-installed, allowing you to program with
popular libraries like TensorFlow [3], Keras [4], and NumPy [5]. Colab can
also perform high-speed computation using GPU and can perform deep
learning tasks such as machine learning and image recognition. Colab also
has other functions, such as importing external data, saving them in Google
Drive, and publishing them to GitHub.

1.2.2. Text cell


The text cell allows you to write program descriptions, formulas, links,
images, and more.

• To create a text cell, click “+ Text” at the top of a Colab page.
• To enter text, select the cell and type your text in it.

You can use the Markdown notation to format headings, lists, and more.
Colab text cells are very useful for writing descriptions and documents.
In particular, they can be used to describe programs and interpret results.
The Colab text cells also allow you to better describe the output of your
program.

1.2.3. Introduction to visualization with Colab


There are many ways to visualize data with Colab.

1. Visualization with Matplotlib [6]: Matplotlib, pre-installed in Colab, is
the most commonly used Python library for data visualization. Matplotlib
allows you to draw a variety of charts, such as line, scatter, and
bar charts.
2. Visualization with Seaborn [7]: Seaborn is a data visualization library
based on Matplotlib, allowing you to draw a beautiful chart with ease.
3. Visualization with Plotly [8]: Plotly is a Python library that can be used
to create an interactive chart. Plotly allows you to draw a beautiful and
interactive chart.
4. Map visualization with Folium [9]: Folium is a library for map visual-
ization in Python that allows you to display markers, polygons, etc., on
a map.

For example, you can write the following code to draw a line chart with
Matplotlib:

< Start >

import matplotlib.pyplot as plt
# X-axis values
x = [1, 2, 3, 4, 5]
# Y-axis values
y = [10, 20, 30, 40, 50]
# Draw a line chart
plt.plot(x, y)
# Chart title
plt.title("Line Plot")
# X-axis label
plt.xlabel("X-axis")
# Y-axis label
plt.ylabel("Y-axis")
# Display the chart
plt.show()
< End >

Figure 1.15. Example of drawing a line chart using Matplotlib.

The code above stores the X-axis and Y-axis values in x and y,
respectively, and draws a line chart using the plt.plot() function. The
plt.title(), plt.xlabel(), and plt.ylabel() functions are used to set
the title of the chart, the X-axis label, and the Y-axis label, respectively.
Finally, the plt.show() function is used to display the chart. As described
above, Matplotlib allows you to draw a line chart easily (Figure 1.15).
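Matplotlib covers the other chart types mentioned above in the same way. The following sketch draws a scatter chart and a bar chart side by side; the output file name charts.png is an illustrative choice, and in Colab plt.show() would display the figure inline instead.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; safe in headless environments
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [10, 20, 30, 40, 50]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.scatter(x, y)             # scatter chart
ax1.set_title("Scatter Plot")
ax2.bar(x, y)                 # bar chart
ax2.set_title("Bar Plot")
fig.savefig("charts.png")     # in Colab, plt.show() displays the figure inline
```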

1.2.4. Introduction of deep learning with Colab


To conduct deep learning on Colab, you can perform the following steps:

1. Create a Colab notebook.
2. Install the libraries required for your notebook, such as TensorFlow and
Keras.
3. Upload the data required for learning to Colab.
4. Build an NN used for learning.
5. Train the NN with learning data.
6. Make predictions using the trained model.

Colab has pre-installed libraries required for deep learning, such as
TensorFlow and Keras, so deep learning can commence immediately after
importing the libraries. Colab also offers GPUs for free so that high-speed
training on large amounts of data can be conducted. It is worth noting
that Colab is a cloud service provided by Google; therefore, it must be
utilized with connection to the Internet, and data storage is the user’s
responsibility.
As a sample, we will introduce a TensorFlow program that trains the
NN to recognize handwritten digits using the Modified National Institute of
Standards and Technology (MNIST) handwritten digit dataset. MNIST is
a modified version of the original dataset provided by the National Institute
of Standards and Technology. The MNIST handwritten digits dataset is
an image dataset of handwritten digits widely used in the field of machine
learning and deep learning. This dataset contains handwritten digits from
0 to 9, each being a grayscale image of 28 × 28 pixels. The dataset contains
60,000 images for training and 10,000 for testing. MNIST is widely used
as a benchmark for developing and evaluating machine learning algorithms
and models.

< Start >

# Importing TensorFlow and the MNIST dataset
import tensorflow as tf
from tensorflow import keras
# Reading MNIST data
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
# Building a model
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(10, activation='softmax')
])
# Compiling the model
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
# Training
model.fit(x_train, y_train, epochs=5)
# Evaluation
test_loss, test_acc = model.evaluate(x_test, y_test, verbose=2)
print('\nTest accuracy:', test_acc)
< End >

The above code uses TensorFlow's Keras APIs to read the MNIST data and
build, train, and evaluate an NN.
MNIST data reading: keras.datasets.mnist.load_data() is a Keras
function that downloads and reads the MNIST dataset. MNIST is an image
dataset of handwritten digits, consisting of 60,000 training images and
10,000 test images.
(x_train, y_train), (x_test, y_test) are the variables that receive the
four NumPy arrays returned by the function. x_train and y_train are
NumPy arrays representing the training images and their corresponding
labels (digits). Similarly, x_test and y_test represent the test images and
their corresponding labels.
More specifically, x_train is a NumPy array containing 60,000 elements,
each representing a grayscale image of 28 × 28 pixels whose values are
integers between 0 and 255. y_train is a NumPy array containing
60,000 elements, each being an integer between 0 and 9 that represents
the digit written in the corresponding image. x_test and y_test are
similar, except that each NumPy array contains 10,000 pieces of
test data.
These arrays are used to define and train a model in Keras.
In model building, Keras is used to define an NN model, and each
instruction in the above code has the following meaning:

• keras.Sequential(): A class for creating a Sequential model, a simple
model in which layers are stacked linearly.
• keras.layers.Flatten(input_shape=(28, 28)): Defines a flatten layer
that converts a two-dimensional array of 28 × 28 pixels into a
one-dimensional flat array as an input layer. This means you can arrange
the pixels of an image in a row and treat them as input.
• keras.layers.Dense(128, activation='relu'): Defines a dense layer
with 128 neurons. This is a fully connected layer in which every unit
in the previous layer is connected to every unit in the next layer. The
ReLU activation function is applied to this layer to introduce nonlinearity,
allowing the model to learn more complex functions.
• keras.layers.Dense(10, activation='softmax'): Defines a dense
layer with 10 neurons. This is the final output layer, which outputs a
probability distribution over 10 classes. The probabilities are calculated
by the softmax function.

Stacking these layers on the Sequential model defines the NN from the
input layer to the hidden layer to the output layer. The model receives a
handwritten digit image of 28 × 28 pixels as input and is trained to classify
it into one of 10 classes.
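The shapes involved can be illustrated without Keras. The following NumPy sketch runs one untrained forward pass with randomly initialized weights (illustrative values only, not the trained model) to show how Flatten, Dense with ReLU, and softmax transform a 28 × 28 input into a probability distribution over 10 classes.

```python
import numpy as np

rng = np.random.default_rng(0)

# One dummy 28x28 "image": random values stand in for pixel intensities
image = rng.random((28, 28))

# Randomly initialized weights: for shape illustration only, not trained
W1 = rng.normal(0, 0.05, (784, 128)); b1 = np.zeros(128)
W2 = rng.normal(0, 0.05, (128, 10));  b2 = np.zeros(10)

x = image.reshape(-1)                 # Flatten: (28, 28) -> (784,)
h = np.maximum(0.0, x @ W1 + b1)      # Dense(128) with ReLU
logits = h @ W2 + b2                  # Dense(10)
p = np.exp(logits - logits.max())     # softmax, shifted for numerical stability
p /= p.sum()

print(p.shape)  # (10,): one probability per digit class; the values sum to 1
```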
In model compiling, Keras is used to compile the NN model, and each
instruction in the above code has the following meaning:

• optimizer='adam': Specifies that the Adam optimizer is used as the
optimization algorithm to update model parameters. Adam is a type
of optimization algorithm based on the gradient descent method and is
characterized by fast convergence.
• loss='sparse_categorical_crossentropy': Sets the loss function of the
model. In this case, a sparse version of the cross-entropy loss for
classification tasks is used. This version is used when each correct label
is given as a scalar value encoding a class index.
• metrics=['accuracy']: Sets the metrics used to evaluate the model. In this
case, it specifies that the correct answer rate (accuracy) is used to evaluate
the model's performance. The correct answer rate represents the percentage
of samples for which the prediction matches the correct answer.

These settings define how the model learns: they determine the optimization
algorithm, the loss function, and the evaluation metrics.
In model training, Keras is used to train the NN model, and each
instruction in the above code has the following meaning:

• x_train: Represents the input data for training. In this case, it contains
image data of handwritten digits from the MNIST dataset. The input
data are fed into the input layer of the NN.
• y_train: Represents the correct answer data for training. In this case,
it contains label data for handwritten digits from the MNIST dataset.
The correct answer data are represented in a format that corresponds to
the number of nodes in the output layer of the NN and are used when
training the model.
• epochs=5: Specifies the number of training passes. One epoch means
that the model is trained on the entire training dataset once. In this
case, the model is trained for five epochs.

The model.fit() method trains the model using the given training
dataset. More specifically, the training data are split into minibatches, and
the gradient descent method is used for each minibatch to update model
parameters. Then, the loss function and evaluation metrics for the model
are calculated to report the progress of training. Splitting training data
into minibatches means that a large dataset is split into several smaller
batches to update model parameters for each minibatch. This allows the
model to be trained efficiently without having to process all the data at
once.
A power of two is usually specified as the number of samples for a mini-
batch. For example, 32, 64, and 128 are commonly used. Using minibatches
can reduce the amount of memory required during the training process and
parallelize computations. In many cases, a random selection of minibatches
can help the model learn in a balanced manner.
Minibatch learning is most often associated with the stochastic gradient
descent (SGD) method but can also be used with other optimization
algorithms. Minibatch learning enables efficient learning on
a large dataset.
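The minibatch procedure described above can be sketched with NumPy alone. The toy example below fits a single weight by minibatch gradient descent on the squared error; the dataset, learning rate, and batch size are illustrative choices, not values from the MNIST example.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy dataset: 1000 samples of y = 3x plus a little noise
X = rng.random(1000)
y = 3.0 * X + rng.normal(0, 0.01, 1000)

w = 0.0                  # single model parameter to learn
batch_size = 32          # a power of two, as is customary
lr = 0.1                 # learning rate

for epoch in range(5):                           # five epochs, as in the text
    order = rng.permutation(len(X))              # random minibatch selection
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        xb, yb = X[idx], y[idx]
        grad = 2 * np.mean((w * xb - yb) * xb)   # gradient of the MSE loss
        w -= lr * grad                           # one update per minibatch

print(w)  # close to the true slope 3.0
```

In the Keras example, the same split is performed automatically by model.fit, whose batch_size argument defaults to 32.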
After five epochs, the model is expected to learn the entire training
dataset and improve prediction performance. The output results are as
follows:

< Start >


Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-
datasets/mnist.npz
11490434/11490434 [==============================]
- 1s 0us/step
Epoch 1/5
1875/1875 [==============================] - 16s 8ms/step
- loss: 2.5951 - accuracy: 0.8703
Epoch 2/5
1875/1875 [==============================] - 10s 5ms/step
- loss: 0.3623 - accuracy: 0.9147
Epoch 3/5
1875/1875 [==============================] - 8s 4ms/step
- loss: 0.2840 - accuracy: 0.9276
Epoch 4/5
1875/1875 [==============================] - 8s 4ms/step
- loss: 0.2487 - accuracy: 0.9360
Epoch 5/5
1875/1875 [==============================] - 7s 4ms/step
- loss: 0.2324 - accuracy: 0.9419
313/313 - 1s - loss: 0.2557 - accuracy: 0.9429 - 813ms/epoch
- 3ms/step
Test accuracy: 0.9429000020027161
< End >

These messages indicate that TensorFlow is used to download the MNIST
dataset and that the downloaded data are used to train the NN model.
The dataset contains images of handwritten digits and their corresponding
labels (digit classes).
This model was trained for five epochs. During one epoch, all the
data in the training dataset are learned once. At the end of each
epoch, the model reports a loss and a correct answer rate. The loss is a
metric that measures the difference between the model's predictions and the
actual labels. The correct answer rate indicates the percentage of data that the
model predicted correctly.
Finally, it is indicated that the model has an accuracy of 0.9429, which
means that the model can recognize handwritten digits with about 94.3%
accuracy. This is the percentage of digits in the test dataset that the model
classifies correctly.
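Step 6, making predictions with the trained model, converts the softmax outputs into digit labels. In Keras this is model.predict(x_test) followed by an argmax; the sketch below uses hand-written probability rows in place of real model output so that it runs stand-alone.

```python
import numpy as np

# Example softmax outputs for three test images (illustrative values,
# shaped like what model.predict(x_test[:3]) would return in the Keras example)
probs = np.array([
    [0.01, 0.02, 0.05, 0.02, 0.01, 0.03, 0.01, 0.80, 0.03, 0.02],
    [0.90, 0.01, 0.01, 0.01, 0.01, 0.02, 0.01, 0.01, 0.01, 0.01],
    [0.02, 0.03, 0.70, 0.05, 0.04, 0.05, 0.03, 0.03, 0.03, 0.02],
])

# The predicted digit is the class with the highest probability in each row
predicted_digits = np.argmax(probs, axis=1)
print(predicted_digits)  # [7 0 2]
```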
TensorFlow is an open-source machine learning library that provides
low-level APIs for performing a variety of machine learning tasks. Keras
is a high-level NN API for deep learning using TensorFlow as the back
end. Keras is built on top of TensorFlow and hides TensorFlow’s low-level
operations, making it easier to build and train NNs.

1.3. Organization of this document


This book begins with the Basic, which describes data analysis methods
using Excel. In the Basic, we will explain mean and variance testing, regres-
sion analysis using the Excel analysis tool, and how to implement an NN
using the Excel solver. Furthermore, we will demonstrate automatic differ-
entiation and introduce how to solve a simple PDE.
In the Practical Part that follows, we will explain how to create a
spatiotemporal model using an NN. First, we will explain a method for
visualizing page information from ancient literature data shot with a three-
dimensional CT machine. Then, we will regard a magnetic line group calcu-
lated from the results of analyzing electromagnetic fields in a fusion reactor
as physical data and explain how to visualize the plasma region using an
NN. Up to this point, physical data irrelevant to time are handled.
As an advanced application, we will explain how to use an NN to
derive a PDE that describes physical data and how to solve the PDE using
the NN. Finally, we will demonstrate a surrogate model, together with
some examples. The surrogate model streamlines what-if analysis by link-
ing parameters, such as conditions from large-scale numerical simulations,
with physical data.
Chapter 2

Basic

2.1. Background
2.1.1. Data analysis using NNs
This document describes the analysis of physical data primarily using NNs.
An NN is a combination of mathematical models of neurons in the human
cranial nervous system (Figure 2.1). The concept of NN is inspired by the
mechanics of the human brain (all the interconnections between neurons).
NN refers to a mathematical model created to represent some of the fea-
tures of brain function on a computer. NN is one of the modeling methods
in machine learning (ML) that mimic the workings of the human brain;
it does not aim to model the billions of neurons in the brain accurately
but simply simplifies them and makes them operational. In an NN, when one
inputs data (numerical values), the numerical values are propagated with
weights to the next layer. This is similar to the sequence of processes in the
human brain, whereby synapses are connected with weights and neurons
produce outputs, which are then connected to the next level.
Areas of use for NNs include pattern recognition and data mining
(e.g., image recognition and recommendations).
NNs are widely used in video websites to analyze comments posted on
videos. Specifically, NNs combine natural language processing techniques
to perform sentiment analysis, topic classification, spam filtering, etc., of
comments. The objective of sentiment analysis is to determine whether
a comment falls into the positive, negative, or neutral category. NNs are
useful for sentiment analysis because they can capture the meaning of words
and sentences. Topic classification determines to which topic a comment
relates. For example, to classify comments about a particular product or
service, NNs can create a classifier using keywords related to that product


Figure 2.1. About NNs (https://ledge.ai/neural-network/).

or service. Spam filtering aims to detect comments automatically generated
by spammers and bots. By using NNs, characteristic patterns of spam comments
can be detected and deleted automatically. Comment analysis using NNs
can help video platforms control quality and improve products and services.
The introduction of NN-based methods in machine translation has dra-
matically improved translation accuracy. In machine translation, NNs are
mainly used in a method called NMT (Neural Machine Translation). NMT
uses an NN architecture called the Sequence-to-Sequence (Seq2Seq) model
to encode input sentences into a latent semantic space, which is then
decoded to produce output sentences.
Specifically, NMT uses two NNs called the “encoder” and the “decoder”. The
encoder receives input sentences as sequences of words or characters and
converts them into a latent semantic space. The decoder receives as input
the latent semantic space created by the encoder and decodes it to produce
output sentences. In NMT, encoders and decoders use architectures such as
RNN (Recurrent Neural Network), LSTM, and GRU. These architectures
contribute to the accuracy of translation because they can handle long
sequences and take into account the context needed for translation. NMT
may also incorporate a mechanism called “attention” between the encoder
and decoder. Attention is a mechanism that uses the output of the encoder
to determine which part of the encoder’s input sentence should correspond
to which part of the decoder-generated sentence, and thus is useful when
translating long or complex sentences.
In stock price prediction in the financial field, NNs are used to predict
future stock prices based on past stock price data, economic indicators, and
other information. In general, NNs use deep learning models such as the MLP
(Multilayer Perceptron) and LSTM (Long Short-Term Memory). There are
two main methods of stock price forecasting. One is to forecast time series
data and the other is to predict future trends, such as the rise or fall of a
stock price.
The objective of forecasting time series data is to use past stock price
data and other information to train NNs to predict future stock prices.
For example, LSTM can be used to predict future stock prices by learning
patterns of past stock price fluctuations.
Conversely, NNs are used primarily for classification when predicting
trends such as rising or falling stock prices. The objective is to predict
whether stock prices will rise or fall, based on past stock price fluctuations
and other economic indicators. In this case, classifiers such as the MLP
(Multilayer Perceptron) and CNN (Convolutional Neural Network)
are used.
However, since many factors affect stock price forecasts, it is difficult
to make forecasts using only a single NN model. For this reason, ensemble
learning, which combines multiple models to make forecasts, is often used.
Image recognition has become an indispensable technology in automated
driving development. Image recognition systems using NNs are already in
use to recognize surrounding objects faster and more accurately on behalf
of the driver. In automated driving, NNs are used to recognize surrounding
objects. A typical approach is to collect data from cameras, radar, lidar,
and sensors and feed that information as input data to the NN. The NN
then extracts feature values from the input data and uses them to recog-
nize objects and extract information such as position, speed, and direction.
In general, CNN (Convolutional Neural Network) is widely used for object
detection and segmentation. Deep learning-based object detection models
such as YOLO (You Only Look Once), SSD (Single Shot MultiBox Detector),
and Faster R-CNN are also used. These models can provide fast and accurate
object detection. NNs are also used in other aspects of automation, such
as vehicle tracking, lane recognition, and traffic sign recognition.
NNs can detect gastric cancer from endoscopic images, which allows
them to be used in practice. NNs have achieved excellent results in image
analysis and are used in various fields.
To detect gastric cancer, endoscopic images are input into NNs to iden-
tify regions where cancer is most likely to be present. Specifically, NNs can
now automatically identify regions on a gastric endoscopic image where can-
cer is most likely to be present and detect gastric cancers 6 mm or larger
with the same accuracy as a skilled endoscopist. In gastric cancer detection
using NNs, it is important to train NNs using a large number of endoscopic
images as training data. Once trained, the NNs can detect regions where
cancer is likely to be present by inputting unknown endoscopic images.
NNs have also been applied to determine damage to paved roads, esti-
mate the degree of damage inside bridges, detect abnormalities in power
lines, etc. In paved road damage determination, data collected by vehicle
vibration sensors and other devices are input into the NNs to estimate the
state of pavement deterioration. In estimating the degree of damage inside
bridges, data collected from vibration tests, sonic inspections, etc., are input
into the NNs to assess the type and extent of the damage. In power line
anomaly detection, data obtained from vibration sensors attached to the
power lines are input into the NNs to detect abnormal vibrations. In this
manner, NNs are used to detect various types of damage and abnormalities.
In agricultural work, harvesting and crop sorting heavily burden
workers. Therefore, NNs are used in harvesting robots that assist farm-
ers in the following ways:
Automatic Classification of Fruits and Vegetables: A harvesting robot
automatically classifies harvested crops. NNs are used to analyze images
of crops captured via a camera, to determine the type and ripeness of the
crop.
Automated Harvesting: To harvest crops, a harvesting robot uses NNs
to learn what ripe crops look like and then harvest them accurately. For
example, a tomato harvesting robot learns the color and shape of ripe
tomatoes, thereby accurately identifying and harvesting them.
Crop Health Monitoring: In a harvesting robot, NNs are used to moni-
tor crop health. A harvesting robot uses a camera to capture images of the
crop and uses NNs to identify diseases or insect damage affecting the crop.
Crop Growth Prediction: NNs are used to predict crop growth. A har-
vesting robot uses data collected during crop growth to train the NNs and
predict future growth. This allows farmers to predict the timing and quality
of the harvest accurately.
As described, NNs play an important role in various fields. However, the
training targets are mainly images and videos. NNs have rarely been used
to describe physical data defined in spatiotemporal coordinates. PointNet
is a well-known NN for point cloud data (a set of points in 3D space,
i.e., a set of spatial coordinate points).
PointNet is a type of NN that takes point cloud data as input data and
extracts a global feature representation of the point cloud. PointNet was
first introduced in 2017 by Qi et al. [10] and is widely used in areas such
as 3D object recognition, autonomous vehicles, and robot vision. A unique
feature of PointNet is its ability to process point clouds directly. Conven-
tional methods typically represent point clouds by converting them to voxel
grids or 3D meshes and then applying a CNN. However, such methods can
lead to compression and loss of information. PointNet avoids these prob-
lems by processing point clouds directly. PointNet can be used for tasks
such as classification, segmentation, and object detection of point clouds.
PointNet learns features such as point locations, colors, and normals from
an input point cloud and combines these features to create a feature repre-
sentation of the entire point cloud (Figure 2.2).
Using this global feature representation, PointNet can generate
classification outputs for given point cloud data (Figure 2.3).

Figure 2.2. Overview of PointNet.

Figure 2.3. PointNet classification.
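The two PointNet ingredients shown in Figures 2.2 and 2.3, a shared per-point transformation and a symmetric pooling function, can be sketched as follows. The single random linear layer below is a stand-in for PointNet's learned per-point MLP; the point cloud and weights are made-up values.

```python
import numpy as np

rng = np.random.default_rng(0)

# A point cloud: N points with (x, y, z) coordinates
points = rng.random((100, 3))

# A shared per-point "MLP": one random linear layer + ReLU here,
# whereas real PointNet uses several learned layers
W = rng.normal(0, 0.1, (3, 64))
b = np.zeros(64)

per_point = np.maximum(0.0, points @ W + b)   # (100, 64) per-point features
global_feature = per_point.max(axis=0)        # symmetric max pooling -> (64,)

# The max operation is order-invariant: shuffling the points
# leaves the global feature unchanged.
shuffled = rng.permutation(points, axis=0)
g2 = np.maximum(0.0, shuffled @ W + b).max(axis=0)
print(np.allclose(global_feature, g2))  # True
```

This order invariance is why PointNet can consume raw point clouds directly, without first converting them to voxel grids or meshes.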



2.1.2. Format of physical data


Before describing the analysis and visualization of physical data, the
format of physical data needs to be explained. Roughly stated, the format
is a physical quantity defined on point cloud data. In this document,
two types of physical data are assumed: The first is a physical quantity
u measured at a fixed spatial location at a fixed time. A typical
example is the data u(t_i, x_j, y_j, z_j) obtained by a measurement
device installed at a spatially fixed location (x_j, y_j, z_j) at time t_i. If there
are N measurement times and M spatially fixed locations, the physical
data are denoted as follows:

u(t_i, x_j, y_j, z_j),  i = 1, . . . , N,  j = 1, . . . , M
u is scalar data defined at discrete spatiotemporal points but can also
be vector or tensor data, depending on the instrument used for the
measurement. When N = 1, the data are purely spatial discrete data.
The second is data obtained at varying spatiotemporal locations.
A typical example is data obtained by a mobile device (such as a UAV).
If there are N measured spatiotemporal points, the physical data
are denoted as follows:

u(t_i, x_i, y_i, z_i),  i = 1, . . . , N

This N is called the number of observations.
When given physical data, interpolation or approximation is used to
compute a spatiotemporal model that explains these data well. Interpolation
creates a function that always passes through the given discrete data;
a typical interpolation method is kriging. If the given discrete data
are likely to contain noise, however, an interpolant does not necessarily
explain the underlying phenomenon well. To address this, approximations
are sometimes used. An approximation is not required to pass through the
given discrete data and instead determines a function with as small an error
as possible. A typical approximation method is regression. An NN is a type of
approximation and has more explanatory power than basic linear regression.
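The distinction between interpolation and approximation can be illustrated on noisy 1-D data: piecewise-linear interpolation (a simple stand-in for kriging here) reproduces every noisy sample exactly, while a least-squares polynomial (a basic regression) leaves residuals but smooths the noise. The signal, noise level, and polynomial degree are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)

# Noisy 1-D physical data u(x_i): a sine signal plus measurement noise
x = np.linspace(0, 2 * np.pi, 20)
u = np.sin(x) + rng.normal(0, 0.1, x.size)

# Interpolation: a piecewise-linear function through every sample
ui = np.interp(x, x, u)          # exact at the data points, noise included

# Approximation: degree-5 least-squares polynomial; it need not pass
# through the samples, but it smooths the noise
coef = np.polyfit(x, u, deg=5)
ua = np.polyval(coef, x)

print(np.max(np.abs(ui - u)))    # essentially zero: interpolation reproduces the samples
print(np.max(np.abs(ua - u)))    # nonzero: approximation leaves residuals
```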

2.1.3. Physical data visualization


Physical data visualization refers to converting physical data into a form
that is easily understood by humans through its representation in graphs
and images. This allows us to visually understand the relationships and
characteristics of the data, which is useful information for data analysis
and research.
There are many ways to visualize physical data.

2.1.3.1. 1-D plot


A 1-D plot is suitable when you are interested in only one parameter, such
as time series data or statistical data. For example, plotting temperature
changes along a time axis, with time on the x-axis and temperature values
on the y-axis, provides a visual understanding of temperature changes.
A 1-D plot includes line graphs, histograms, and box plots.

2.1.3.2. 2-D Plot


A 2-D plot refers to plotting data using two axes (x-axis and y-axis).
A 2-D plot is appropriate when you are interested in multiple parame-
ters. For example, when examining the relationship between temperature
and humidity, you can plot temperature on the x-axis and humidity on the
y-axis to visually understand the relationship between temperature and
humidity. A 2-D plot includes scatter plots, scatter plot matrices, and
contour plots.
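As an example of the contour plot mentioned above, the following sketch draws contour lines of a synthetic 2-D field (a Gaussian bump standing in for measured data); the output file name contour.png is an illustrative choice.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen; safe in headless environments
import matplotlib.pyplot as plt

# Synthetic field over a 2-D grid (a stand-in for measured data)
x = np.linspace(-2, 2, 50)
y = np.linspace(-2, 2, 50)
X, Y = np.meshgrid(x, y)
Z = np.exp(-(X**2 + Y**2))           # a Gaussian "temperature" field

cs = plt.contour(X, Y, Z, levels=8)  # contour lines of the field
plt.clabel(cs)                       # label each contour with its value
plt.xlabel("x")
plt.ylabel("y")
plt.savefig("contour.png")
```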

2.1.3.3. 3-D Plot


A 3-D plot refers to plotting data using three axes (x-, y-, and z-axes).
A 3-D plot is suitable when you are interested in three or more parameters.
For example, it is useful for displaying a bird’s-eye view of temperature data
acquired at various latitudes, longitudes, and heights in the atmosphere.
Such data defined in three-dimensional coordinates are sometimes called
volume data. Volume data refer to data that represent the position, shape,
texture, etc., of a point or object in three-dimensional space. Volume data
are used in a variety of fields, including medical imaging such as CT
scans and MRIs, science and technology, architecture, design, and gaming.
Volume data may be stored on a three-dimensional grid or expressed by a
tetrahedral mesh. To visualize these data, expression methods such as 3D
graphs and slice images are used. Also, when examining the relationship
between temperature, humidity, and air pressure, you can plot temperature
on the x-axis, humidity on the y-axis, and air pressure on the z-axis to visu-
ally understand the relationship. A 3-D plot includes isosurface plots and
volume renderings.
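A 3-D plot of the kind described above can be produced with Matplotlib's 3-D axes. The sketch below scatters synthetic temperature samples at (longitude, latitude, height) positions; all values, including the rough lapse rate and the output file name, are made up for illustration.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen; safe in headless environments
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)

# Synthetic volume samples: temperature at (longitude, latitude, height)
lon = rng.uniform(135, 140, 200)
lat = rng.uniform(34, 36, 200)
height = rng.uniform(0, 10, 200)                     # height in km
temp = 25 - 6.5 * height + rng.normal(0, 0.5, 200)   # rough lapse rate

fig = plt.figure()
ax = fig.add_subplot(projection="3d")                # 3-D axes
sc = ax.scatter(lon, lat, height, c=temp)            # color encodes temperature
fig.colorbar(sc, ax=ax, label="Temperature [deg C]")
ax.set_xlabel("Longitude")
ax.set_ylabel("Latitude")
ax.set_zlabel("Height [km]")
fig.savefig("scatter3d.png")
```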
Video: Animation visualization of continuous data along a time axis
allows for a visual understanding of data changes.
Visualization Software: Visualization can be performed with programming
languages such as Python and R, using libraries such as Matplotlib. Specialized
software such as Origin [11] and Grafana [12] can also be utilized.

VR/AR: VR/AR technology [13] can also be used to visualize data in
three-dimensional space.

2.1.3.4. Particle-based volume rendering


For 3-D plots, this section describes one of the typical visualization
methods, volume rendering [14]. This visualization method successfully
represents the overall characteristics of the data, including its internal struc-
ture, by representing the target volume data as a semitransparent cloud.
This section focuses on physical data visualization techniques based on
volume rendering.
We will discuss PBVR (Particle-Based Volume Rendering) [15], in which
the given physical data are represented as opaque emitting particles to perform
volume rendering. PBVR is a simple method that basically consists of two
processing steps: particle generation and particle projection. Particle gen-
eration first requires an estimate of the particle density. PBVR was origi-
nally a visualization technique for continuously distributed physical data.
Therefore, it was necessary to determine how to sample for that defined
area. This time, however, the physical data are defined at discrete points.
Thus, the particle density is determined by the originally given points.
Next, the particle radius is determined by the user-specified transfer func-
tion for the opacity according to Equation (2.4).
The particle density ρ means the number of particles per unit volume.
Consider a sphere of a certain radius and align each particle’s position with
the center of the sphere. The particles inside the sphere are counted and
divided by the volume of the sphere to obtain the number density, which
is the particle density ρ.
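The counting procedure above can be sketched in a few lines of Python. The point set, query center, and radius in the example are hypothetical; the function simply counts the particles inside the sphere and divides by its volume.

```python
import math

def particle_density(points, center, R):
    """Estimate the number density rho: count the particles inside the
    sphere of radius R centered at `center`, then divide by the sphere's
    volume (4/3) * pi * R**3."""
    inside = sum(1 for p in points if math.dist(p, center) <= R)
    volume = (4.0 / 3.0) * math.pi * R ** 3
    return inside / volume
```

For instance, if three of four sample points fall inside a unit sphere at the origin, the estimated density is 3 / ((4/3)π).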
In ray casting, consider a cylindrical interval of radius r and a certain
length l along a ray and assume that the particles are Poisson
distributed in this interval. The Poisson distribution is one of the probabil-
ity distributions in probability theory and is used for random variables that
take non-negative integer values, not continuous values. In particular, the
Poisson distribution with mean λ is expressed by the following probability
mass function:

p(k; λ) = (λ^k e^(−λ)) / k!   (2.1)

In this function, k represents a non-negative integer value (0, 1, 2, 3 . . .),


λ represents the positive mean, e represents the base of the natural
logarithm, and k! represents the factorial of k.
Basic 33

The Poisson distribution is often used to represent the number of times


independent events occur. For example, it is used to represent the number
of times in which a certain event occurs in a given time frame, such as an
accident or breakdown. The Poisson distribution is also used as an approx-
imation for many events because the mean and variance are equal. In terms
of physical data analysis and visualization, it is a means of expressing, for
example, the probability of the number of particles generated in a particular
spatial region.
Ray casting is a method of determining the pixels of an image along
the line of sight from within a three-dimensional scene. This allows a
three-dimensional scene to be projected onto a two-dimensional screen.
Ray casting is often used to create realistic images because it allows phys-
ically accurate visualization of shadows, reflections, and other physical
phenomena.
Light can travel along a ray without any obstruction in a space with
a defined group of particles when the number of particles present in that
space is zero. This probability is called transparency. The transparency
t can be calculated as

t = e^(−ρπr²l),   (2.2)
assuming that particles are generated according to the Poisson distribution
described before since the volume is now πr2 l and the particle density
is ρ. The sum of opacity and transparency is 1. Thus, the opacity can be
calculated as follows:
α = 1 − t = 1 − e^(−ρπr²l)   (2.3)
The particle radius r can be calculated using the particle density ρ, the
opacity value α, and the ray segment length Δt used in volume ray casting:

r = √( −log(1 − α) / (πρΔt) )   (2.4)
The opacity value α is calculated by a user-specified transfer function. The
transfer function shows the relationship between physical data values and
opacity. A high opacity is set for data values that are to be emphasized,
and a low opacity is set for physical data values that are not.
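Equations (2.2)–(2.4) translate into a short Python sketch under the Poisson model above; the density, opacity, and segment-length values in the usage example are hypothetical.

```python
import math

def transparency(rho, r, l):
    """Equation (2.2): probability that no particle lies in a cylinder of
    radius r and length l at particle density rho."""
    return math.exp(-rho * math.pi * r ** 2 * l)

def radius_from_opacity(alpha, rho, dt):
    """Equation (2.4): particle radius that realizes opacity alpha over a
    ray segment of length dt at particle density rho."""
    return math.sqrt(-math.log(1.0 - alpha) / (math.pi * rho * dt))
```

By construction the two functions invert each other: substituting the radius computed for a target opacity α back into Equation (2.3) reproduces α = 1 − t.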
The first step in PBVR is to generate particles in the coordinates where
the physical data are defined according to the particle radius given in the
above equation. The second step is to project the generated particles onto
the image plane. The particles are projected onto the image surface, and
pixel values are calculated for each pixel. This basic processing step is
repeated several times. The pixel values are added to the frame buffer and
then divided by the number of iterations to obtain the final pixel values by
averaging.
In PBVR, the projected images from the generated particles are added
to the image plane to calculate the luminance values at the corresponding
pixels. PBVR makes particles completely opaque so that when viewed by
the eye, light, c_i (i = 1, . . . , n), from the particles at the back is blocked by
the particles in the front. This effect can be achieved using the Z-buffer
algorithm. The pre-projection sorting and alpha compositing processes typ-
ically required in volume rendering are unnecessary. The translucency effect
is achieved in this averaging process.
In the projection process of the generated particles, the Z-buffer algo-
rithm is used to keep the projected images of the particles closest to the eye
position, and pixel values are calculated using these images. As particles are
opaque, they do not require alpha compositing or reordering according to
their distance from the eye position. For the last remaining particles, color
mapping and shading calculations are performed. Color mapping uses a
color transfer function to convert scalar data calculated by interpolation
at the particle position into color data. The shading is determined by cal-
culating the brightness based on the interpolated gradient vector and the
light source vector, both of which are computed at the particle’s position,
and then multiplying this value by the color data. To calculate the final
pixel values, we discuss ensemble averaging (Figure 2.4).
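The color mapping and shading calculation described above can be sketched as follows. The Lambertian-style diffuse brightness term and the RGB tuples are simplifying assumptions for illustration, not the book's exact formulation.

```python
import math

def shade(color, gradient, light):
    """Scale an RGB `color` by the brightness computed from the
    interpolated gradient vector and the light source vector."""
    def norm(v):
        n = math.sqrt(sum(c * c for c in v))
        return tuple(c / n for c in v)
    g, s = norm(gradient), norm(light)
    brightness = abs(sum(a * b for a, b in zip(g, s)))  # diffuse term
    return tuple(brightness * c for c in color)
```

When the gradient is aligned with the light vector the color is returned at full brightness; when the two are perpendicular the shaded color goes to black.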
An ensemble is a set of results from multiple iterations of conceptually
equivalent trial experiments. Each set of ensembles has the same attribute
values for particle radius and particle density. In the ensemble averaging
method, a trial experiment is assumed to be a single image creation process
consisting of particle generation and projection with the same attribute
values. The seed for random number generation is changed for each trial
experiment. In ensemble averaging, the final pixel value is the average of
the pixel values over all iterations. Therefore, if the i luminance value is
Bi , the final pixel value is calculated as follows:
LR
 Bi
B total (LR ) = B i  = (2.5)
i=1
LR

In this equation, L_R represents the number of iterations and B^total(L_R)
represents the final luminance value from L_R iterations. Also, ⟨B_i⟩
represents the ensemble mean of B_i.
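The averaging of Equation (2.5) can be sketched as follows. Here render_once stands in for one particle-generation-and-projection pass and is a hypothetical placeholder; in the usage example it just returns one noisy luminance sample.

```python
import random

def ensemble_average(render_once, iterations):
    """Equation (2.5): average the luminance values B_i over `iterations`
    trials, changing the random seed for each trial experiment."""
    total = 0.0
    for i in range(iterations):
        random.seed(i)  # a different seed per trial experiment
        total += render_once()
    return total / iterations
```

Averaging many noisy samples scattered around a true luminance of 0.5 recovers a value close to 0.5, which is the translucency-producing averaging effect described in the text.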
Figure 2.4. Pixel value calculation by ensemble averaging.

The variance of the final luminance value, with the luminance value B_i
as the random variable, is expanded as follows:

B_Var^total(L_R) = Var( Σ_{i=1}^{L_R} B_i / L_R )
                 = (1/L_R²) { Σ_{i=1}^{L_R} Var(B_i) + 2 Σ_{i<j} Cov(B_i, B_j) }   (2.6)

The luminance values B_i are independent of each other. Thus, the value of
the covariance Cov(B_i, B_j) is 0. The variance of B_i takes a constant
value B_Var regardless of the number of iterations. Thus, the variance
of the final luminance value is as follows:
B_Var^total(L_R) = B_Var / L_R   (2.7)
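Equation (2.7) can be checked empirically: averaging L_R independent samples divides the variance by L_R. The Gaussian luminance model below (mean 0.5, standard deviation 0.2) and the trial counts are hypothetical choices for the demonstration.

```python
import random

def variance_of_average(n_iterations, n_trials=2000, seed=0):
    """Sample variance of the mean of n_iterations i.i.d. luminance
    values, estimated over n_trials independent repetitions."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_trials):
        samples = [rng.gauss(0.5, 0.2) for _ in range(n_iterations)]
        means.append(sum(samples) / n_iterations)
    m = sum(means) / n_trials
    return sum((x - m) ** 2 for x in means) / n_trials
```

The ratio variance_of_average(1) / variance_of_average(16) comes out close to 16, as Equation (2.7) predicts.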
Note that the greater the number of iterations, the better the image
quality becomes. This method is often used to improve the image quality
of noisy images such as satellite images. The ensemble averaging method
can be used to control the level of detail in the display of volume rendering
results. Creating an average image with a small number of iterations is
suitable for fast display. If high-quality rendering results are required, a
sufficient number of iterations should be performed.

Figure 2.5. Relationship between the number of iterations and the quality of the generated images.
Figure 2.5 shows the relationship between the number of iterations and
the quality of the generated image. You will note that the image quality
improves as the number of iterations increases. In this case, image quality
was evaluated by the average pixel value of the image difference from the
ray-casted volume-rendered image created using the same transfer function.

2.2. Statistical Analysis in Excel


The following statistical analyses can be performed in Excel:

• Calculation of mean, median, maximum, minimum, standard deviation,
variance, ratio, and rate of change.
• Correlation analysis, regression analysis, t-test, F-test, ANOVA, Z-test,
χ² test.
• Histograms, box plots, scatter plots, line charts, bar charts, pie charts,
3D charts.
• Pivot tables, contingency tables, conditional formatting, copying formu-
las, search and replace.
• Filtering, sorting, importing, and exporting data.
• Creating a macro, recording the operation history, saving and executing
macros.

The following sections describe correlation analysis, regression analysis,
t-test, F-test, and Z-test.