
User Guide for

DualSPHysics code

DualSPHysics_v2.0
March 2012

[email protected]
A.J.C. Crespo ([email protected])
J.M. Dominguez ([email protected])
M.G. Gesteira ([email protected])
A. Barreiro ([email protected])
B.D. Rogers ([email protected])

DualSPHysics is part of the SPHysics project:

University of Vigo, The University of Manchester and Johns Hopkins University

Acknowledgements

The development of DualSPHysics was partially supported by:

- Xunta de Galicia under project PGIDIT06PXIB383285PR.
- Xunta de Galicia under the project Programa de Consolidación e Estructuración de Unidades de Investigación Competitivas (Grupos de Referencia Competitiva), also financed by the European Regional Development Fund (FEDER).
- ESPHI (A European Smooth Particle Hydrodynamics Initiative) project supported by the Commission of the European Communities (Marie Curie Actions, contract number MTKI-CT-2006-042350).
- Research Councils UK (RCUK) Research Fellowship.
- EPSRC Grant EP/H003045/1.

We also want to acknowledge Orlando Garcia Feal for his technical help.

Abstract

This guide documents the DualSPHysics code, which is based on the Smoothed Particle
Hydrodynamics model named SPHysics. This manuscript describes how to compile and
run the DualSPHysics code (a set of C++ and CUDA files). New pre-processing tools
are implemented to create more complex geometries and new post-processing tools are
developed to analyse numerical results easily. Several working examples are
documented to enable the user to run the codes and understand how they work.

Contents

1. Introduction
2. SPH formulation
3. CPU and GPU implementation
   3.1 CPU optimizations
   3.2 GPU implementation
   3.3 GPU optimizations
4. DualSPHysics open-source code
   4.1 CPU source files
   4.2 GPU source files
5. Compiling DualSPHysics
   5.1 Windows compilation
   5.2 Linux compilation
6. Running DualSPHysics
7. Format Files
8. Processing
9. Pre-processing
10. Post-processing
   10.1 Visualization of boundaries
   10.2 Visualization of all particles output data
   10.3 Analysis of numerical measurements
   10.4 Surface representation
11. Testcases
   11.1 CaseDambreak
   11.2 CaseWavemaker
   11.3 CaseRealface
   11.4 CasePump
   11.5 CaseDambreak2D
   11.6 CaseWavemaker2D
   11.7 CaseFloating
12. How to modify DualSPHysics for your application
   12.1 Creating new cases
   12.2 Source files
13. FAQ
14. DualSPHysics future
15. References
16. Licenses

1. Introduction

Smoothed Particle Hydrodynamics is a Lagrangian meshless method that has been used
in an expanding range of applications within the field of Computational Fluid Dynamics
[Gómez-Gesteira et al. 2010a] where fluids present discontinuities in the flow, interact
with structures, and exhibit large deformation with moving boundaries. The SPH model
is approaching a mature stage with continuing improvements and modifications such
that the accuracy, stability and reliability of the model are reaching an acceptable level
for practical engineering applications.

SPHysics is an open-source SPH model developed by researchers at the Johns Hopkins
University (US), the University of Vigo (Spain), the University of Manchester (UK) and
the University of Rome, La Sapienza. The software is available for free download at
www.sphysics.org. A complete guide of the FORTRAN code can be found in
[Gómez-Gesteira et al. 2010b].

The SPHysics FORTRAN code was validated for different problems of wave breaking
[Dalrymple and Rogers, 2006], dam-break behaviour [Crespo et al. 2008], interaction
with coastal structures [Gómez-Gesteira and Dalrymple, 2004] and interaction with a
moving breakwater [Rogers et al. 2010].

Although SPHysics allows problems to be modelled with a fine resolution, the main
obstacle to applying it to real engineering problems is the long computational runtime,
meaning that SPHysics is rarely applied to large domains. Hardware acceleration and
parallel computing are required to make SPHysics more useful and versatile.

Graphics Processing Units (GPUs) appear as a cheap alternative to handle High
Performance Computing (HPC) for numerical modelling. GPUs are designed to manage
huge amounts of data and their computing power has developed in recent years much
faster than that of conventional central processing units (CPUs). Compute Unified
Device Architecture (CUDA) is a parallel programming framework and language for
GPU computing based on extensions to the C/C++ language. Researchers and engineers
in different fields are achieving high speedups by implementing their codes with the
CUDA language. Thus, the parallel computing power of GPUs can also be applied to
SPH methods, where the same loops over each particle during the simulation can be
parallelised.

The first work where a classical SPH approach was executed on the GPU belongs to
Harada et al. [2007]. A remarkable element of their work is that the implementation
predates the appearance of CUDA. The first GPU model based on the SPHysics
formulation was developed by Hérault et al. [2010], where they applied SPH to study
free-surface flows.

The DualSPHysics code has been developed starting from the SPH formulation
implemented in SPHysics. This FORTRAN code is robust and reliable but is not
properly optimised for huge simulations. DualSPHysics is implemented in C++ and
CUDA to carry out simulations on the CPU and GPU respectively. The new CPU code
presents some advantages, such as more optimised use of the memory. The
object-oriented programming paradigm provides the means to develop a code that is
easy to understand, maintain and modify, with sophisticated error control available.
Furthermore, better approaches are implemented: for example, particles are reordered to
give faster access to memory, symmetry is considered in the force computation to
reduce the number of particle interactions, and the best approach to create the neighbour
list is implemented [Dominguez et al. 2010]. The CUDA language manages the parallel
execution of threads on the GPU. The best approaches were chosen to be implemented
as an extension of the C++ code, so the best optimizations to parallelise particle
interaction on the GPU were implemented [Crespo et al. 2009]. Preliminary results
were presented in Crespo et al. [2010] and the first rigorous validations were presented
in Crespo et al. [2011].

The DualSPHysics code has been developed to simulate real-life engineering problems
using SPH models.

In the following sections we will describe the SPH formulation available in
DualSPHysics, the implementation and optimization techniques, how to compile and
run the different codes of the DualSPHysics package, and future developments.

The user can download from www.dual.sphysics.org the following files:

- DUALSPHYSICS DOCUMENTATION:
o DualSPHysics_v2.0_GUIDE.pdf
o ExternalModelsConversion_GUIDE.pdf
o GenCase_XML_GUIDE.pdf

- DUALSPHYSICS PACKAGE:
o DualSPHysics_v2.0_linux_32bit.zip
o DualSPHysics_v2.0_linux_64bit.zip
o DualSPHysics_v2.0_windows_32bit.zip
o DualSPHysics_v2.0_windows_64bit.zip

- DUALSPHYSICS SOURCE FILES:
o DualSPHysics_v2.0_SourceFiles_linux.zip
o DualSPHysics_v2.0_SourceFiles_windows.zip

2. SPH formulation

All the SPH theory implemented in DualSPHysics is taken directly from the SPHysics
code. Therefore a more detailed description, equations and references are collected in
[Gómez-Gesteira et al. 2012a, 2012b].

Here we only summarise the SPH formulation already available in the new
DualSPHysics code:
- Time integration schemes:
  - Verlet [Verlet, 1967].
  - Symplectic [Leimkuhler, 1997].
- Variable time step [Monaghan and Kos, 1999].
- Kernel functions:
  - Cubic Spline kernel [Monaghan and Lattanzio, 1985].
  - Quintic Wendland kernel [Wendland, 1995].
- Kernel gradient correction [Bonet and Lok, 1999].
- Shepard density filter [Panizzo, 2004].
- Viscosity treatments:
  - Artificial viscosity [Monaghan, 1992].
  - Laminar viscosity + SPS turbulence model [Dalrymple and Rogers, 2006].
- Weakly compressible approach using Tait's equation of state [Monaghan et al., 1999].
- Dynamic boundary conditions [Crespo et al. 2007].
- Floating objects [Monaghan et al. 2003] (not implemented for Laminar+SPS
  viscosity and Kernel Gradient Correction).

Features that will soon be integrated into the CPU-GPU solver as future improvements:
- Periodic open boundaries.
- SPH-ALE with Riemann solver [Rogers et al. 2010].
- Primitive-variable Riemann solver.
- Variable particle resolution [Omidvar et al. 2012].
- Multiphase (gas-solid-water).
- Inlet/outlet flow conditions.
- Modified virtual boundary conditions [Vacondio et al. 2011].

3. CPU and GPU implementation

The DualSPHysics code is the result of an optimised implementation using the best
approaches for CPU and GPU with the accuracy, robustness and reliability shown by
the SPHysics code. SPH simulations such as those in the SPHysics and DualSPHysics
codes can be split into three main steps: (i) generation of the neighbour list, (ii)
computation of the forces between particles (solving the momentum and continuity
equations) and (iii) the update of the physical quantities at the next time step. Thus,
running a simulation means executing these steps in an iterative manner:

1st STEP: Neighbour list (cell-linked list described in Dominguez et al. [2010]; a code
sketch follows this list):
- The domain is divided into square cells of side 2h (or the size of the kernel domain).
- Only a list of particles, ordered according to the cell they belong to, is generated.
- All the physical variables of the particles are reordered.

2nd STEP: Force computation:
- Particles of the same cell and adjacent cells are candidates to be neighbours.
- Each particle interacts with all its neighbouring particles (at a distance < 2h).

3rd STEP: System update:
- The new time step is computed.
- Physical quantities are updated for the next step starting from the current
magnitudes, the interaction forces and the new time step value.
- Particle data are stored occasionally.
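As a hedged illustration of the first step, the sketch below builds a cell-linked list by counting the particles of each cell and then reordering the particle indices; all names and the exact data layout are assumptions for illustration, not the DualSPHysics implementation (the real one lives in JDivideCpu and CudaSphNL.cu):

// Minimal sketch of a cell-linked list build (step 1), assuming a box-shaped
// domain split into cells of side 'cellsize' (2h, or h as in Section 3.1).
#include <cstddef>
#include <vector>

struct Particle { float x, y, z; /* plus velocity, density, ... */ };

// On return, the particles of cell c are sorted[begin[c]] .. sorted[begin[c+1]-1].
void BuildCellList(const std::vector<Particle>& parts, float cellsize,
                   int ncx, int ncy, int ncz,
                   std::vector<int>& begin, std::vector<int>& sorted) {
  const int ncells = ncx * ncy * ncz;
  std::vector<int> count(ncells, 0);
  std::vector<int> cellof(parts.size());
  for (std::size_t p = 0; p < parts.size(); ++p) {  // count particles per cell
    const int cx = int(parts[p].x / cellsize);
    const int cy = int(parts[p].y / cellsize);
    const int cz = int(parts[p].z / cellsize);
    cellof[p] = cx + ncx * (cy + ncy * cz);         // cells ordered along X, Y, Z
    ++count[cellof[p]];
  }
  begin.assign(ncells + 1, 0);
  for (int c = 0; c < ncells; ++c) begin[c + 1] = begin[c] + count[c];
  sorted.resize(parts.size());
  std::vector<int> next(begin.begin(), begin.end() - 1);
  for (std::size_t p = 0; p < parts.size(); ++p)    // reorder particle indices
    sorted[next[cellof[p]]++] = int(p);
}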

3.1 CPU optimizations

The optimizations applied to the CPU implementation of DualSPHysics are:

Applying symmetry to particle-particle interaction

When the force fab exerted by a particle a on a neighbouring particle b is computed, the
force exerted by the neighbouring particle on the first one is also known, since it has the
same magnitude but opposite direction (fba = -fab). Thus, the number of interactions to
be evaluated can be halved, which decreases the computational time.
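A minimal sketch of this idea for a brute-force pair loop follows (PairForce is a hypothetical placeholder for the pair-wise SPH force evaluation; the real code applies the same trick cell by cell):

#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

static float Dist2(const Vec3& a, const Vec3& b) {
  const float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
  return dx * dx + dy * dy + dz * dz;
}

// Placeholder standing in for the SPH momentum/continuity evaluation of one pair.
static Vec3 PairForce(const Vec3&, const Vec3&) { return Vec3{0, 0, 0}; }

void ComputeForces(const std::vector<Vec3>& pos, std::vector<Vec3>& force, float h) {
  const float r2max = 4 * h * h;                      // interaction radius 2h, squared
  for (std::size_t a = 0; a < pos.size(); ++a)
    for (std::size_t b = a + 1; b < pos.size(); ++b)  // each pair evaluated only once
      if (Dist2(pos[a], pos[b]) < r2max) {
        const Vec3 f = PairForce(pos[a], pos[b]);
        force[a].x += f.x; force[a].y += f.y; force[a].z += f.z;  // fab acts on a
        force[b].x -= f.x; force[b].y -= f.y; force[b].z -= f.z;  // fba = -fab acts on b
      }
}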

Splitting the domain into smaller cells


The domain is split into cells of size (2h×2h×2h) to reduce the neighbour search to only
the adjacent cells. However, only about 30% of the particles of the adjacent cells are
real neighbours (at a distance < 2h). A suitable technique to diminish the number of
false neighbours would be to reduce the volume of the cell. Thus, the domain can be
split into cells of size (h×h×h).

Figure 3-1. The volume searched using cells of side 2h (left panels) is bigger than using cells of
side h (right panels). Note that symmetry is applied in the interaction.

Multi-core programming with OpenMP


Current CPUs have several cores or processing units, so it is essential to distribute the
computation load among them to maximize the CPU performance and to accelerate the
SPH code. OpenMP is a portable and flexible programming interface whose
implementation does not involve major changes in the code. Using OpenMP, multiple
threads for a process can be easily created. These threads are distributed among all the
cores of the CPU sharing the memory. So, there is no need to duplicate data or to
transfer information between threads.
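As a minimal sketch (assuming a simple per-particle loop rather than the real cell-based traversal), a force loop can be distributed among the CPU cores with a single OpenMP pragma:

#include <omp.h>

void ComputeForcesOmp(int n, float* forcex /*, particle arrays ... */) {
  // Each thread writes only the entries of its own particles, so the shared
  // arrays need no duplication and no data transfer between threads.
  #pragma omp parallel for schedule(dynamic, 64)
  for (int a = 0; a < n; ++a) {
    float acc = 0;
    // ... accumulate the contributions of the neighbours of particle a ...
    forcex[a] = acc;
  }
}

Note that when symmetry is applied (each pair visited once, writing to both particles), threads could update the same neighbour concurrently; the actual implementation therefore provides dedicated variants that keep symmetry while balancing the load dynamically or statically (-ompdynamic, -ompstatic; see Section 8).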

3.2 GPU implementation

A GPU implementation initially focused on the force computation since, following
Dominguez et al. [2010], this is the most time-consuming part of the execution. However,
the most efficient technique consists of minimising the communications between the
CPU and GPU for the data transfers. If the neighbour list and system update are also
implemented on the GPU, only one CPU-GPU transfer is needed at the beginning of the
simulation, while relevant data are transferred back to the CPU only when saving output
data is required (usually infrequently). Crespo et al. [2011] used an execution of
DualSPHysics performed entirely on the GPU to run a numerical experiment whose
results are in close agreement with the experimental ones.

The GPU implementation presents some key differences in comparison to the CPU
version. The main difference is the parallel execution of all tasks that can be
parallelised, such as all loops over particles. One GPU execution thread computes the
resulting force of one particle, performing all the interactions with its neighbours. The
symmetry of the particle interaction is employed on the CPU to reduce the runtime, but
it is not applied in the GPU implementation since it is not efficient there due to memory
coalescence issues.
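A hedged CUDA sketch of this one-thread-per-particle pattern follows; the kernel and variable names are illustrative assumptions (the real kernels are KerComputeForcesFluid and KerComputeForcesBound in CudaSphFC.cu):

// One thread accumulates the force of one particle; every thread writes only
// its own output element, so no symmetry (write to the neighbour) is attempted.
__global__ void KerForcesSketch(unsigned n, const float4* pos, float3* ace) {
  const unsigned p = blockIdx.x * blockDim.x + threadIdx.x;  // global particle index
  if (p < n) {
    float3 acep = make_float3(0, 0, 0);
    // ... sweep the candidate neighbours of particle p (see the range-based
    //     search in Section 3.3) and accumulate their contributions ...
    ace[p] = acep;  // single coalesced write per particle
  }
}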

DualSPHysics is unique in that the same application can be run using either the CPU or
GPU implementation; this facilitates the use of the code not only on workstations with
an Nvidia GPU but also on machines without a CUDA-enabled GPU. The main code
has a common core for both the CPU and GPU implementations, with only minor
source code differences implemented for the two devices applying the specific
optimizations for CPU and GPU. Thus, debugging or maintenance is easier and
comparisons of results and computational time are more direct. Figure 3-2 shows a flow
diagram representing the differences between the CPU and GPU implementations and
the different steps involved in a complete execution.

Figure 3-2. Flow diagram of the CPU (left) and total GPU implementation (right).

3.3 GPU optimizations

An efficient and full use of all the capabilities of the GPU is not straightforward; the
GPU implementation presents several limitations, mainly due to the Lagrangian nature
of the SPH method. Thus, the CPU-GPU transfers must be minimised, code divergence
must be reduced, non-coalescent memory accesses (which degrade performance) must
be limited and an unbalanced workload must be avoided. Therefore the following
optimizations were developed to avoid or minimize these problems:

Full implementation on GPU


To minimize the CPU-GPU data transfers, the entire SPH computation can be
implemented on the GPU, keeping data in the GPU memory (see right panel of Fig. 3-2).
Therefore, the CPU-GPU communications are drastically reduced and only specific
results are recovered from the GPU at some time steps. Moreover, when the other two
main stages of the SPH method (neighbour list generation and system update) are
implemented on the GPU, the computational time devoted to these processes decreases.

Maximizing the occupancy of GPU


Since the access to the GPU global memory is irregular during the particle interaction, it
is essential to have the largest number of active warps in order to hide the latencies of
memory access and keep the hardware as busy as possible. The number of active warps
depends on the registers required for the CUDA kernel (defined in the file
DualSPHysics_ptxasinfo), the GPU specifications and the number of threads per block.
Using this optimization, the size of the block is adjusted according to the registers of the
kernel and the hardware specifications to maximise the occupancy.
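As a toy illustration of this block-size adjustment (the simplified multiprocessor limits below are assumptions roughly matching a compute-capability-2.0 device; the real adjustment uses the register counts from DualSPHysics_ptxasinfo together with the actual hardware specifications):

#include <cstdio>

// Returns the block size (threads per block) that maximises resident threads
// per multiprocessor for a kernel needing 'regsPerThread' registers.
int BestBlockSize(int regsPerThread) {
  const int smRegisters = 32768;  // registers per multiprocessor (assumed)
  const int smMaxThreads = 1536;  // max resident threads per SM (assumed)
  int best = 0, bestActive = 0;
  for (int block = 64; block <= 512; block += 32) {  // multiples of the warp size
    const int byRegs = smRegisters / (regsPerThread * block);
    const int byThreads = smMaxThreads / block;
    const int blocks = byRegs < byThreads ? byRegs : byThreads;
    if (blocks * block > bestActive) { bestActive = blocks * block; best = block; }
  }
  return best;
}

int main() {
  std::printf("block size for 36 regs/thread: %d\n", BestBlockSize(36));
  return 0;
}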

Simplifying the neighbour search

During the GPU execution of the interaction kernel, each thread has to look for the
neighbours of its particle, sweeping through the particles that belong to its own cell and
to the surrounding cells, a total of 27 cells in 3-D since symmetry cannot be applied.
However, this procedure can be optimized by simplifying the neighbour search. The
search process can be removed from the interaction kernel if the range of particles that
could interact with the target particle is known in advance. Since particles are reordered
according to the cells, and cells follow the order of the X, Y and Z axes, the range of
particles of three consecutive cells in the X-axis (cell(x,y,z), cell(x+1,y,z) and
cell(x+2,y,z)) is equal to the range from the first particle of cell(x,y,z) to the last one of
cell(x+2,y,z). Thus, the 27 cells can be defined as 9 ranges of particles. The interaction
kernel is significantly simplified when these ranges are known in advance: the memory
accesses decrease and the number of divergent warps is reduced (a device-side sketch
follows Figure 3-3).

Figure 3-3. Interaction cells using 9 ranges of three consecutive cells (right) instead of 27 cells
(left). Note that symmetry is not applied in the GPU interaction.
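A hedged device-side sketch of this range-based traversal follows, reusing the 'begin' array from the cell-list sketch in Section 3 (names are assumptions; the real logic lives in CudaSphNL.cu and CudaSphFC.cu):

// Builds the 9 particle-index ranges around cell (cx,cy,cz); each range spans
// three consecutive cells along X. Border clamping is omitted for brevity.
__device__ void NeighbourRanges(int cx, int cy, int cz, int ncx, int ncy,
                                const int* begin, int2 range[9]) {
  int k = 0;
  for (int dz = -1; dz <= 1; ++dz)
    for (int dy = -1; dy <= 1; ++dy) {
      const int first = (cx - 1) + ncx * ((cy + dy) + ncy * (cz + dz));
      // begin[first+3] is the first particle after cells first, first+1, first+2:
      range[k++] = make_int2(begin[first], begin[first + 3]);
    }
  // The interaction kernel then simply loops p from range[k].x to range[k].y-1
  // for k = 0..8, with no per-cell search left inside the kernel.
}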

Division of the domain into smaller cells


As described for the optimization applied in the CPU implementation, the procedure
consists of dividing the domain into cells of size h instead of 2h in order to increase the
percentage of real neighbours. Using cells of size h in the GPU implementation, the
number of pair-wise interactions decreases.

4. DualSPHysics open-source code

This section gives a brief description of the source files of the SPH solver that have
been released with DualSPHysics v2.0. The source code is freely redistributable under
the terms of the GNU General Public License (GPL) as published by the Free Software
Foundation.

Thus, users can download the files from DUALSPHYSICS SOURCE FILES.

First, note that more complete documentation is provided in the directory DOXY. This
documentation has been created using the documentation system Doxygen
(www.doxygen.org).
To navigate through the full documentation, open the HTML file index.html as shown
in Figure 4-1.

Figure 4-1. Documentation for DualSPHysics code generated with Doxygen.

Open source files are in the directory SOURCE and a complete list of the code files is
summarised in Table 4-1.

Common:
  JException (.h .cpp), JLog (.h .cpp), JObject (.h .cpp), JPartData (.h .cpp),
  JPtxasInfo (.h .cpp), JSpaceCtes (.h .cpp), JSpaceEParms (.h .cpp),
  JSpaceParts (.h .cpp), JVarsAscii (.h .cpp), TypesDef.h, Functions.h, JTimer.h,
  JTimerCuda.h, JFormatFiles2.lib (libjformatfiles2.a),
  JSphMotion.lib (libjsphmotion.a), JXml.lib (libjxml.a)

SPH solver (CPU & GPU):
  main.cpp, JCfgRun (.h .cpp), JSph (.h .cpp), Types.h

SPH on CPU:
  JSphCpu (.h .cpp), JSphTimersCpu.h, JDivideCpu (.h .cpp)

SPH on GPU:
  JSphGpu (.h .cpp), JSphTimersGpu.h, CudaSphApi (.h .cu), CudaSphFC.cu,
  CudaSphNL.cu, CudaSphSU.cu

Table 4-1. List of source files of the DualSPHysics code.

Common files are also used by other codes such as GenCase, BoundaryVTK, PartVTK,
MeasureTool and IsoSurface. The rest of the files implement the SPH solver; some of
them are used for both CPU and GPU executions and others are specific:

COMMON FILES:
JException.h & JException.cpp
Declares/implements the class that defines exceptions with the information of the class and method.

JLog.h & JLog.cpp


Declares and implements the class that allows creating a log file with info of the execution.

JObject.h & JObject.cpp


Declares/implements the parent class that defines objects with methods that throw exceptions.

JPartData.h & JPartData.cpp


Declares/implements the class that allows reading/writing files with particle data in different formats.

JPtxasInfo.h & JPtxasInfo.cpp


Declares/implements the class that returns the number of registers of each CUDA kernel.

JSpaceCtes.h & JSpaceCtes.cpp


Declares/implements the class that manages the info of constants from the input XML file.

JSpaceEParms.h & JSpaceEParms.cpp


Declares/implements the class that manages the info of execution parameters from the input XML file.

JSpaceParts.h & JSpaceParts.cpp


Declares/implements the class that manages the info of particles from the input XML file.

JVarsAscii.h & JVarsAscii.cpp


Declares/implements the class that reads variables from a text file in ASCII format.

TypesDef.h
Declares general types and functions for the entire application.

Functions.h
Declares basic/general functions for the entire application.

JTimer.h
Declares a class to measure short time intervals.

JTimerCuda.h
Declares a class to measure short time intervals using cudaEvent.

JFormatFiles2.lib (libjformatfiles2.a)
Precompiled library that provides functions to store particle data in VTK, CSV and ASCII formats.

JSphMotion.lib (libjsphmotion.a)
Precompiled library that provides the displacement of moving objects during a time interval.

JXml.lib (libjxml.a)
Precompiled library with a class that helps to manage XML documents using the TinyXML library.

SPH SOLVER:
main.cpp
Main file of the project that executes the code on CPU or GPU.

JCfgRun.h & JCfgRun.cpp


Declares/implements the class responsible for collecting the execution parameters from
the command line.

JSph.h & JSph.cpp


Declares/implements the class that defines all the attributes and functions that CPU and GPU simulations
share.

Types.h
Defines specific types for the SPH application.

SPH SOLVER ONLY FOR CPU EXECUTIONS:


JSphCpu.h & JSphCpu.cpp
Declares/implements the class that defines the attributes and functions used only in CPU simulations.

JSphTimersCpu.h
Measures time intervals during CPU execution.

JDivideCpu.h & JDivideCpu.cpp


Declares/implements the class responsible for computing the neighbour list.

SPH SOLVER ONLY FOR GPU EXECUTIONS:


JSphGpu.h & JSphGpu.cpp
Declares/implements the class that defines the attributes and functions used only in GPU simulations.

JSphTimersGpu.h
Measures time intervals during GPU execution.

CudaSphApi.h & CudaSphApi.cu


Declares/implements all the functions used on GPU execution.

CudaSphFC.cu
Functions to compute forces on GPU.

CudaSphNL.cu
Functions to compute the neighbour list on GPU.

CudaSphSU.cu
Functions to update system on GPU.

4.1 CPU source files

The source file JSphCpu.cpp can be better understood with the help of the outline
represented in Figure 4-2:

[Figure 4-2 is a call-tree diagram of JSphCpu.cpp when using the Verlet time
algorithm; its functions are listed below.]

RUN                     Starts simulation.
  AllocMemory           Allocates memory of main data.
  DivideConfig          Configuration for neighbour list.
  LoadPartBegin         Loads PART to restart simulation.
  InitVars              Initialization of arrays and variables for the execution.
  PrintMemoryAlloc      Prints out the allocated memory.
  RunDivideBoundary     Computes neighbour list of boundary particles.
  RunDivideFluid        Computes neighbour list of fluid particles.
  SaveData              Creates files with output data.
  MAIN LOOP             Main loop of the simulation.
    ComputeStep_Ver     Computes particle interaction and updates system using Verlet.
      Interaction_Forces       Computes particle interaction.
        PreInteraction_Forces  Prepares variables for particle interaction.
        CallInteractionCells   Interaction between cells.
        SPSCalcTau             Computes sub-particle stress tensor for SPS model.
      DtVariable               Computes a variable time step.
      ComputeVerletVars        Computes new values of position and velocity using Verlet.
      ComputeRhop              Computes new values of density.
      RunFloating              Processes movement of particles of floating objects.
    RunMotion           Processes movement of moving boundary particles.
    ComputeStepDivide   Computes new neighbour list.
      RunDivideBoundary Computes neighbour list of boundary particles.
      RunDivideFluid    Computes neighbour list of fluid particles.
    RunShepard          Applies Shepard density filter.
    SaveData            Creates files with output data.

Figure 4-2. Outline of JSphCpu.cpp when using the Verlet time algorithm.

The previous outline corresponds to the Verlet algorithm; if Symplectic is used, the step
is split into predictor and corrector. Thus, Figure 4-3 shows the structure of the CPU
code using this time scheme and correcting forces using the kernel gradient correction
(KGC):

[Figure 4-3 is a call-tree diagram of JSphCpu.cpp when using the Symplectic time
algorithm with KGC; the functions specific to this scheme are listed below.]

ComputeStep_Sym  Computes particle interaction and updates system with Symplectic.
  PREDICTOR      Predictor step.
    Interaction_Forces        Computes particle interaction.
      Interaction_MatrixKgc   Computes matrix for kernel gradient correction.
        CallInteractionCells <INTER_MatrixKgc>
                              Interaction between cells to compute elements of the matrix KGC.
      PreInteraction_Forces   Prepares variables for particle interaction.
      CallInteractionCells <INTER_ForcesKgc>
                              Interaction between cells using corrected force in predictor.
      SPSCalcTau              Computes sub-particle stress tensor for SPS model.
    DtVariable                Computes a variable time step.
    ComputeVars               Computes new values of position and velocity.
    ComputeRhop               Computes new values of density.
    RunFloating               Processes movement of particles of floating objects.
  CORRECTOR      Corrector step.
    RunDivideFluid            Computes neighbour list of fluid particles.
    Interaction_Forces        Computes particle interaction.
      Interaction_MatrixKgc   Computes matrix for kernel gradient correction.
        CallInteractionCells <INTER_MatrixKgc>
                              Interaction between cells to compute elements of the matrix KGC.
      PreInteraction_Forces   Prepares variables for particle interaction.
      CallInteractionCells <INTER_ForcesCorrKgc>
                              Interaction between cells using corrected force in corrector.
      SPSCalcTau              Computes sub-particle stress tensor for SPS model.
    DtVariable                Computes a variable time step.
    ComputeVars               Computes new values of position and velocity.
    ComputeRhopEpsilon        Computes new values of density in corrector.
    RunFloating               Processes movement of particles of floating objects.

Figure 4-3. Outline of JSphCpu.cpp when using the Symplectic time algorithm and KGC.

Note that JSphCpu::CallInteractionCells is implemented using a template, declared in
the file JSphCpu.h, that calls the function InteractionCells:

[Figure 4-4 shows the call graph: CallInteractionCells, instantiated as
<INTER_Forces>, <INTER_ForcesCorr>, <INTER_MatrixKgc>, <INTER_ForcesKgc>,
<INTER_ForcesCorrKgc> or <INTER_Shepard>, calls InteractionCells, which runs as
InteractionCells_Single, InteractionCells_Dynamic or InteractionCells_Static; these
call InteractCelij and InteractSelf, which in turn call ComputeMatrixKgc,
ComputeForces or ComputeForcesShepard.]

Figure 4-4. Call graph for the function CallInteractionCells.

The interaction between cells can be performed with a single execution thread
(InteractionCells_Single) or using different threads thanks to the OpenMP
implementation (InteractionCells_Dynamic & InteractionCells_Static). Furthermore,
particles of a cell can interact with particles of the cell itself (InteractSelf) or with
particles of the adjacent cells (InteractCelij).

Thus, depending on the parameter <INTER_XXXX>, the interaction between cells
computes different options, as can be seen in Table 4-2 (a code sketch follows the table):

CallInteractionCells<INTER_MatrixKgc>      -> ComputeMatrixKgc
  Interaction between cells to compute elements of matrix KGC.
CallInteractionCells<INTER_ForcesKgc>      -> ComputeForces
  Interaction between cells using corrected force in predictor.
CallInteractionCells<INTER_ForcesCorrKgc>  -> ComputeForces
  Interaction between cells using corrected force in corrector.
CallInteractionCells<INTER_Forces>         -> ComputeForces
  Interaction between cells without corrected force in predictor.
CallInteractionCells<INTER_ForcesCorr>     -> ComputeForces
  Interaction between cells without corrected force in corrector.
CallInteractionCells<INTER_Shepard>        -> ComputeForcesShepard
  Interaction between cells for Shepard filter.

Table 4-2. Different options computed during cell interactions on CPU executions.
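A hedged C++ sketch of this template-based dispatch follows (the mode names come from the table above, but the signature and body are illustrative assumptions, not the actual source):

// One templated interaction routine specialised at compile time by the
// interaction mode, so selecting the accumulated quantity costs nothing
// inside the pair loop.
enum TpInter { INTER_Forces, INTER_ForcesCorr, INTER_MatrixKgc,
               INTER_ForcesKgc, INTER_ForcesCorrKgc, INTER_Shepard };

template <TpInter TInter>
void InteractionCellsSketch(int cella, int cellb /*, particle data ... */) {
  // The loop over candidate particle pairs of cells cella/cellb would go here;
  // the compile-time mode selects what each pair contributes:
  if (TInter == INTER_MatrixKgc) {
    // accumulate the KGC matrix elements (ComputeMatrixKgc)
  } else if (TInter == INTER_Shepard) {
    // accumulate the Shepard filter sums (ComputeForcesShepard)
  } else {
    // accumulate momentum/continuity contributions (ComputeForces),
    // with or without the corrected kernel gradient
  }
}

// The caller picks the instantiation, mirroring CallInteractionCells:
//   InteractionCellsSketch<INTER_ForcesKgc>(ca, cb);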

As mentioned before, more complete documentation was generated using Doxygen.
For example, the call graph of the ComputeStep_Ver function can be seen in Figure 4-5:

Figure 4-5. Call graph for the function ComputeStep_Ver in Doxygen.

And the call graph of ComputeStepDivide function can be seen in Figure 4-6:

Figure 4-6. Call graph for the function ComputeStepDivide in Doxygen.

4.2 GPU source files

The source file JSphGpu.cpp can also be understood better with the outline represented
in Figure 4-7, which includes the functions implemented in the CUDA files:

[Figure 4-7 is a call-tree diagram of JSphGpu.cpp when using the Verlet time
algorithm; its main functions are listed below, and Table 4-3 details the
force-computation branch. The diagram also shows the CUDA kernels (KerPreSort,
KerRhopOut, KerSortData, KerCellsBeg, KerCellBegNv, KerLimitZ,
ReduMaxF/KerReduMaxF, CsMoveLinBound/KerMoveLinBound, CsCalcRidpmv/KerCalcRidp,
KerPreShepard, KerComputeForcesShepard) grouped by the CUDA file that implements
them: CudaSphApi.cu, CudaSphNL.cu, CudaSphFC.cu and CudaSphSU.cu.]

RUN                       Starts simulation.
  CsInitCuda              Initialisation of CUDA device.
  AllocMemory             Memory allocation. DivideConfig is now called in
                          AllocMemory since it depends on the allocated GPU memory.
  LoadPartBegin           Loads PART to restart simulation.
  InitVars                Initialisation of arrays and variables for the execution.
  PrintMemoryAlloc        Prints out the allocated memory.
  CsCallDivide            Calls CsDivide() to compute the initial neighbour list.
  SaveData                Creates files with output data.
  MAIN LOOP               Main loop of the simulation.
    CsCallComputeStep       Calls CsComputeStep() depending on the time algorithm.
    RunMotion               Processes movement of moving boundary particles.
    CsCallComputeStepDivide Calls CsDivide() to compute the new neighbour list.
    CsCallRunShepard        Calls CsRunShepard() to apply the Shepard density filter.
    SaveData                Creates files with output data.
  GetVarsDevice           Recovers values of variables used in CUDA files.

Figure 4-7. Outline of JSphGpu.cpp when using the Verlet time algorithm.

The colour of the boxes indicates the CUDA file where the functions are implemented:
CudaSphApi.cu implements all the functions used in the GPU execution and includes
CudaSphNL.cu, which implements all the functions to compute the neighbour list on
the GPU; CudaSphFC.cu, which implements all the functions to compute forces on the
GPU; and CudaSphSU.cu, which implements all the functions to update the system on
the GPU. Dashed boxes indicate that the functions inside are CUDA kernels.

The following table summarises the calls of the CsCallComputeStep function:

CsCallComputeStep             Computes particle interaction and updates system.
  CsComputeStep_Ver           Updates system using Verlet algorithm.
    CsCallInteraction_Forces  Interaction between particles to compute forces.
      CsInteraction_Forces
        CsPreInteraction_Forces        Prepares variables for force computation,
          -> KerPreInteraction_Forces    and its CUDA kernel.
        KerComputeForcesFluid          CUDA kernel to compute interaction between
                                       particles (Fluid-Fluid & Fluid-Bound).
          -> KerComputeForcesFluidBox    CUDA kernel to compute interaction of a
                                       particle with a set of particles.
        KerComputeForcesBound          CUDA kernel to compute interaction between
                                       particles (Bound-Fluid).
          -> KerComputeForcesBoundBox    CUDA kernel to compute interaction of a
                                       particle with a set of particles.
    CsDtVariable              Computes a variable DT.
      -> KerCalcFa2           CUDA kernel to compute Fa2.
    KerComputeStepVerlet      CUDA kernel to update pos, vel and rhop using Verlet.
    CsRunFloating             Processes movement of particles of floating objects.
      CsCalcRidpft            Computes position of particles of floating objects,
        -> KerCalcRidp          and its CUDA kernel.
      KerCalcFtOmega          CUDA kernel to compute values for a floating body.
      KerFtUpdate             CUDA kernel to update particles of a floating body.

Table 4-3. Outline of CsCallComputeStep when using the Verlet time algorithm.

Two different types of CUDA kernels can be launched to compute the Fluid-Fluid and
Fluid-Bound interactions in the GPU executions:
- KerComputeForcesFluid: this kernel is launched in the basic configuration,
where artificial viscosity is used.
- KerComputeForcesFullFluid: this kernel is used with the Laminar+SPS viscosity
treatment and/or when the Kernel Gradient Correction is applied. This CUDA
kernel is less optimised and its performance is degraded by the higher number of
registers it requires.

5. Compiling DualSPHysics

The code can be compiled for either CPU executions or GPU executions (but not both
simultaneously). All computations have been implemented both in C++ for CPU
simulations and in CUDA for the GPU simulations. Most of the source code is common
to CPU and GPU which allows the code to be run on workstations without a CUDA-
enabled GPU, using only the CPU implementation.

To run DualSPHysics on a GPU, only an Nvidia CUDA-enabled GPU card is needed
and the latest version of the driver must be installed. However, to compile the source
code, the GPU programming language CUDA must be installed on your computer.
CUDA Toolkit 4.0 can be downloaded from the Nvidia website:
http://developer.nvidia.com/cuda-toolkit-40.

Once the nvcc compiler has been installed on your machine, you can download the
relevant files from the section DUALSPHYSICS SOURCE FILES:
o DualSPHysics_v2.0_SourceFiles_linux.zip
o DualSPHysics_v2.0_SourceFiles_windows.zip

5.1 Windows compilation

After unzipping DualSPHysics_v2.0_SourceFiles_windows.zip, the project file
DualSPHysics_R80.sln is provided to be opened with Visual Studio 2010. Different
configurations can be chosen for compilation:

a) Release: for CPU and GPU
b) ReleaseCPU: only for CPU
c) x64 / Win32: choose between the two Windows platforms

The project is created always including the libraries for OpenMP in the executable. To
exclude them, the user can modify Props config -> C/C++ -> Language -> OpenMP.

The use of OpenMP can also be deactivated by commenting out this code line in Types.h:
#define USE_OPENMP ///<Enables/Disables OpenMP.

The .zip also contains several folders:
- Source: contains all the source files;
- Libs: precompiled libraries for x64 and Win32;
- Doxy: includes the settings for Doxygen and the generated documentation;
- Run: will be created once the solution is built.

5.2 Linux compilation

After unzipping DualSPHysics_v2.0_SourceFiles_linux.zip, different Makefiles can be
used to compile the code in Linux:

a) make (-f Makefile): full compilation just using the make command
b) make -f Makefile_CPU: only for CPU

To choose the correct architecture you must set the variable ARCH to 32 or 64 (for
32-bit and 64-bit respectively) in the Makefile.

To include the OpenMP libraries you must set the variable WITHOMP to 1 or 0 (if
WITHOMP=0, the code line #define USE_OPENMP in Types.h must be commented out).

Figure 5-1. Content of the file Makefile for Linux.

Other compilation parameters for both Windows and Linux are:

“sm_10,compute_10” compiles for GPUs with compute capability 1.0;
“sm_12,compute_12” compiles for GPUs with compute capability 1.2;
“sm_20,compute_20” compiles for GPUs with compute capability 2.0;
“-use_fast_math” allows the use of mathematical functions that are less accurate but faster.

By default, the output from compiling the CUDA code is stored in the file
DualSPHysics_ptxasinfo. Thus, any possible error in the compilation can be identified
in this file.

6. Running DualSPHysics

To start using DualSPHysics, users have to follow these instructions:

1) First, download and read the DUALSPHYSICS DOCUMENTATION:
- DualSPHysics_v2.0_GUIDE: this manuscript.
- GenCase_XML_GUIDE: helps to create a new case.
- ExternalModelsConversion_GUIDE: describes how to convert the file format of any
external geometry of a 3D model to VTK, PLY or STL using open-source codes.

2) Download the DUALSPHYSICS PACKAGE (see Figure 6-1):
- EXECS
- HELP
- MOTION
- RUN_DIRECTORY:
o CASEDAMBREAK & CASEDAMBREAK2D
o CASEWAVEMAKER & CASEWAVEMAKER2D
o CASEREALFACE
o CASEPUMP

3) The user has to choose the package depending on the operating system. There are
versions for Windows/Linux and 32/64 bits of the following codes:
- GenCase
- DualSPHysics
- BoundaryVTK
- PartVTK
- MeasureTool
- IsoSurface

4) The appropriate batch (BAT) files in the CASES directories must be used depending
on whether a CPU or GPU execution is desired.

5) The open-source software Paraview (www.paraview.org) is recommended for
visualising the results.

In Figure 6-1, the following directories can be observed:

EXECS:
- Contains all the executable codes.
- Some libraries needed for the codes are also included.
- The file DualSPHysics_ptxasinfo is used to optimise the block size for the
different CUDA kernels on GPU executions.

HELP:
- Contains CaseTemplate.xml, an XML example with all the different labels
and formats that can be used in the input XML file.
- The folder XML_examples includes more examples of the XML files used to
run some of the simulations shown in www.vimeo.com/dualsphysics.
- The information about the execution parameters of the different codes is
presented in HELP_NameCode.out.

MOTION:
- Contains the BAT file Motion.bat to perform the examples with the different
types of movement that can be described with DualSPHysics. Eight
examples can be carried out (Motion01.xml, ..., Motion08.xml).
- The text file motion08mov_f3.out describes the prescribed motion used in
the eighth example.

RUN_DIRECTORY:
- The directory created to run the working cases, where the output files will
be written.
EXECS
  GenCase
  DualSPHysics (+ dll's or lib's)
  BoundaryVTK
  PartVTK
  MeasureTool
  DualSPHysics_ptxasinfo

HELP
  CaseTemplate.xml
  XML_examples
  HELP_GenCase.out, HELP_DualSPHysics.out, HELP_BoundaryVTK.out,
  HELP_PartVTK.out, HELP_MeasureTool.out

MOTION
  Motion.bat
  Motion01.xml, Motion02.xml, ..., Motion08.xml
  motion08mov_f3.out

RUN_DIRECTORY
  CASEDAMBREAK
    CaseDambreak.bat, CaseDambreak_Def.xml,
    FastCaseDambreak.bat, FastCaseDambreak_Def.xml, PointsVelocity.txt
  CASEWAVEMAKER
    CaseWavemaker.bat, CaseWavemaker_Def.xml,
    FastCaseWavemaker.bat, FastCaseWavemaker_Def.xml, PointsHeights.txt
  CASEREALFACE
    CaseRealface.bat, CaseRealface_Def.xml,
    FastCaseRealface.bat, FastCaseRealface_Def.xml, face.vtk, PointsPressure.txt
  CASEPUMP
    CasePump.bat, CasePump_Def.xml, pump_fixed.vtk, pump_moving.vtk

Figure 6-1. Directory tree and provided files.

7. Format Files

The codes provided with this guide present some improvements in comparison to the
codes available within SPHysics. One of them is related to the format of the files that
are used as input and output data during the execution of DualSPHysics and the
pre-processing and post-processing codes. Three new formats are introduced in
comparison to SPHysics: XML, binary and VTK-binary.

XML File

The EXtensible Markup Language is a textual data format compatible with any
hardware and software. It is based on a set of labels that organise the information and
that can be loaded or written easily using any standard text editor. This format is used
for the input files of the pre-processing code and the SPH solver.

BINARY File

The output data in the SPHysics code are written in text files, so ASCII format is used.
ASCII files present some interesting advantages such as visibility and portability;
however, they also present important disadvantages: data stored in text format consume
at least six times more memory than the same data stored in binary format, precision is
reduced when figures are converted from real numbers to text, and reading or writing
data in ASCII is more expensive (by two orders of magnitude). Since DualSPHysics
allows simulations with a high number of particles to be performed, a binary file
format is necessary to avoid these problems. The binary format reduces the volume of
the files and the time dedicated to generating them. The file format used in
DualSPHysics is named BINX2 (.bi2). These files contain only the minimum
information on particle properties. For instance, some variables can be removed; e.g.
the pressure is not stored since it can be calculated from the density using the equation
of state. The mass values are constant for fluid particles and for boundaries, so only two
values are used instead of an array. The position of fixed boundary particles is only
stored in the first file since it remains unchanged throughout the simulation. Data for
particles that leave the limits of the domain are stored in an independent file, which
leads to an additional saving.
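As an illustration of recovering the pressure from the density, a sketch of Tait's equation of state follows (the weakly compressible formulation listed in Section 2; gamma = 7 and B = c0^2*rho0/gamma are the usual SPH choices for water, shown here as assumptions):

#include <cmath>

// p = B * ((rho/rho0)^gamma - 1): the pressure never needs to be stored in
// the BINX2 files because it can be recomputed from the density.
float PressureFromDensity(float rhop, float rho0, float c0) {
  const float gamma = 7.0f;                // polytropic exponent for water
  const float b = c0 * c0 * rho0 / gamma;  // stiffness from the speed of sound c0
  return b * (std::pow(rhop / rho0, gamma) - 1.0f);
}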

Advantages of BINX2:
- Memory storage reduction: an important reduction in file size is obtained
using the binary format by saving only the values of the particles that change
along the simulation.
- Fast access: read-write time is decreased since binary conversion is faster than
the ASCII one.
- No precision loss.
- Portability.
- Additional information: a header is stored at the beginning of each file with
information about the number of particles, mass value, smoothing length,
inter-particle spacing, time…

VTK File

This format is basically used for visualization. In the case of the tools for
DualSPHysics, this format is used for the results; that is, not only the particle positions,
but also the physical magnitudes obtained numerically for the particles involved in the
simulations.
VTK supports many data types, such as scalar, vector, tensor and texture, and also
supports different algorithms such as polygon reduction, mesh smoothing, cutting,
contouring and Delaunay triangulation. Essentially, a VTK file consists of a header that
describes the data and includes any other useful information, the dataset structure with
the geometry and topology of the dataset, and the dataset attributes. Here we use VTK
of the POLYDATA type with the legacy-binary format, which is easy for read-write
operations.
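As a hedged sketch of this legacy POLYDATA layout (writing the ASCII variant for readability, whereas the DualSPHysics tools write the binary variant through the precompiled JFormatFiles2 library):

#include <cstddef>
#include <cstdio>
#include <vector>

struct Pt { float x, y, z; };

// Writes particle positions as a legacy-ASCII VTK POLYDATA file: header,
// dataset structure (points plus one vertex cell per point); dataset
// attributes (velocity, density, ...) could be appended as POINT_DATA.
void WriteVtkPoints(const char* fname, const std::vector<Pt>& pts) {
  std::FILE* f = std::fopen(fname, "w");
  if (!f) return;
  std::fprintf(f, "# vtk DataFile Version 3.0\nparticles\nASCII\nDATASET POLYDATA\n");
  std::fprintf(f, "POINTS %zu float\n", pts.size());
  for (std::size_t i = 0; i < pts.size(); ++i)
    std::fprintf(f, "%g %g %g\n", pts[i].x, pts[i].y, pts[i].z);
  std::fprintf(f, "VERTICES %zu %zu\n", pts.size(), pts.size() * 2);
  for (std::size_t i = 0; i < pts.size(); ++i)
    std::fprintf(f, "1 %zu\n", i);  // one vertex cell per particle
  std::fclose(f);
}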

8. Processing

In this section and the two following ones, the running and execution of the different
codes of the DualSPHysics package will be described. The main code, which performs
the SPH simulation, is named DualSPHysics.

The input files to run the DualSPHysics code include one XML file (Case.xml) and a
binary file (Case.bi2). Case.xml contains all the parameters of the system configuration
and its execution, such as key variables (smoothing length, reference density, gravity,
coefficient to calculate pressure, speed of sound…), the number of particles in the
system, the movement definition of moving boundaries and the properties of moving
bodies. The binary file Case.bi2 contains the particle data: arrays of position, velocity
and density, and headers with mass values, time instant...

The output files consist of binary format files with the particle information at different
instants of the simulation (Part0000.bi2, Part0001.bi2, Part0002.bi2 …).

The configuration of the execution is defined in the XML file, but it can also be defined
or changed using execution parameters. Furthermore, new options and possibilities
for the execution can be imposed using additional execution parameters.

Typing “DualSPHysics.exe –h” in the command window will display a brief HELP
manual:
DualSPHysics
================================
Information about execution parameters:
DualSPHysics [name_case [dir_out]] [options]
Options:
-h Shows information about parameters
-opt <file> Load a file configuration
-cpu Execution on Cpu (option by default)
-gpu[:id] Execution on Gpu and id of the device

-ompthreads:<int> Only for Cpu execution, indicates the number of threads


by host for parallel execution, it takes the number of
cores of the device by default (or using zero value)
-ompdynamic Parallel execution with symmetry in interaction
and dynamic load balancing
-ompstatic Parallel execution with symmetry in interaction
and static load balancing

-symplectic Symplectic algorithm as time step algorithm


-verlet[:steps] Verlet algorithm as time step algorithm and number of
time steps to switch equations
-cubic Cubic spline kernel
-wendland Wendland kernel
-kgc:<0/1> Kernel Gradient Correction
-viscoart:<float> Artificial viscosity [0-1]
-viscolamsps:<float> Laminar+SPS viscosity [order of 1E-6]
-shepard:steps Shepard filter and number of steps to be applied
-dbc:steps Hughes and Graham correction and number of steps
to update the density of the boundaries

33
-sv:[formats,...] Specify the output formats.
none No particle files are generated
binx2 Binary files (option by default)
ascii ASCII files (PART_xxxx of SPHysics)
vtk VTK files
csv CSV files
-svdt:<0/1> Generate file with information about the time step dt
-svres:<0/1> Generate file that summarizes the execution process
-svtimers:<0/1> Obtain timing for each individual process
-name <string> Specify path and name of the case
-runname <string> Specify name for case execution
-dirout <dir> Specify the out directory

-partbegin:begin[:first] dir
Specify the beginning of the simulation starting from a given PART
(begin) and located in the directory (dir), (first) indicates the
number of the first PART to be generated

-incz:<float> Allowed increase in Z+ direction


-rhopout:min:max Exclude fluid particles out of these density limits

-tmax:<float> Maximum time of simulation


-tout:<float> Time between output files

-cellmode:<mode> Specify the cell division mode. By default, the fastest


approach is chosen
hneigs fastest and the most expensive in memory (only for gpu)
h intermediate
2h slowest and the least expensive in memory

-ptxasfile <file> Indicate the file with information about the compilation
of kernels in CUDA to adjust the size of the blocks depending on the
needed registers for each kernel (only for gpu)

Examples:
DualSPHysics case out_case -sv:binx2,csv
DualSPHysics case -gpu -svdt:1

This help provides information about the execution parameters that can be changed: the
time-stepping algorithm, specifying Symplectic or Verlet (-symplectic, -verlet[:steps]);
the choice of kernel function, which can be Cubic or Wendland (-cubic, -wendland);
whether kernel gradients are corrected or not (-kgc:<0/1>); the value for the artificial
viscosity (-viscoart:) or the laminar+SPS viscosity treatment (-viscolamsps:); activation
of the Shepard density filter and how often it must be applied (-shepard:steps); the
Hughes and Graham [2010] correction and the number of steps to update the density of
the boundaries (-dbc:steps); and the maximum time of simulation and time intervals to
save the output data (-tmax:, -tout:). To run the code, it is also necessary to specify
whether the simulation is going to run in CPU or GPU mode (-cpu, -gpu[:id]), the
format of the output files (-sv:[formats,...]: none, binx2, ascii, vtk, csv), whether to
generate files with information about the time steps (-svdt:<0/1>), a summary of the
execution process (-svres:<0/1>), the computational time of each individual process
(-svtimers:<0/1>) and information about density (-svrhop:<0/1>). It is also possible to
exclude fluid particles that move out of given density limits (-rhopout:min:max).

For CPU executions, a multi-core implementation using OpenMP enables parallel
executions using the different cores of the machine. It takes the maximum number of
cores of the device by default, or users can specify it (-ompthreads:<int>). In addition,
the parallel execution can use dynamic (-ompdynamic) or static (-ompstatic) load
balancing. On the other hand, for GPU executions, different cell divisions of the
domain can be used (-cellmode:<mode>), which differ in memory usage and efficiency.

9. Pre-processing

A new C++ code named GenCase has been implemented to define the initial
configuration of the simulation, the movement description of moving objects and the
parameters of the execution in DualSPHysics. All this information is contained in an
input file in XML format: Case_Def.xml. Two output files are created after running
GenCase: Case.xml and Case.bi2 (the input files for the DualSPHysics code).

GenCase employs a 3-D mesh to locate particles. The idea is to build any object using
particles. These particles are created at the nodes of the 3-D mesh. Firstly, the mesh
nodes around the object are defined and then particles are created only at the nodes
needed to draw the desired geometry. Figure 9-1 illustrates how this mesh is used; in
this case a triangle is generated in 2-D. First the nodes of a mesh are defined starting
from the maximum dimensions of the desired triangle, then the three lines connecting
the three points of the triangle are defined and finally particles are created at the
available points under the three lines.

Figure 9-1. Generation of a 2-D triangle formed by particles using GenCase.

Independently of the complexity of the case geometry, all particles will be placed with
an equidistant spacing between them. The geometry of the case is defined independently
of the inter-particle distance. This allows the number of particles to be varied while
maintaining the size of the problem by defining a different distance among particles.
Furthermore, GenCase is able to generate millions of particles in only seconds on the CPU.

Very complex geometries can be created easily since a wide variety of commands
(labels in the XML file) are available to create different objects: points, lines,
triangles, quadrilaterals, polygons, pyramids, prisms, boxes, beaches, spheres, ellipsoids,
cylinders, waves (<drawpoint />, <drawpoints />, <drawline />, <drawlines />, <drawtriangle />,
<drawquadri />, <drawtrianglesstrip />, <drawtrianglesfan />, <drawpolygon>, <drawtriangles />,
<drawpyramid />, <drawprism />, <drawbox />, <drawbeach />, <drawsphere />, <drawellipsoid />,
<drawcylinder />, <drawwave />).

Once the mesh nodes that represent the desired object are selected, these points are
stored as a matrix of nodes. The shape of the object can be transformed using a
translation (<move />), a scaling (<scale />) or a rotation (<rotate />, <rotateline />). The
main process is creating particles at the nodes; different types of particles can be created:
a fluid particle (<setmkfluid />), a boundary particle (<setmkbound />) or none (<setmkvoid />).
So, mk is the value used to mark a set of particles with a common feature in the
simulation. Particles can be located only at the boundary nodes of the object
(<setdrawmode mode="face" />), at the nodes inside the bounds (<setdrawmode mode="solid" />)
or at both (<setdrawmode mode="full" />).

The set of fluid particles can be labelled with features or special behaviours (<initials />).
For example, an initial velocity (<velocity />) can be imposed on fluid particles or a
solitary wave can be defined (<velwave />). Furthermore, particles can be defined as part
of a floating object (<floatings />).

Once the boundaries have been defined, filling a region with particles is particularly
useful when working with complex geometries (<fillpoint />, <fillbox />, <fillfigure />,
<fillprism />).
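Putting some of these labels together, a hypothetical fragment of a Case_Def.xml is sketched below; the nesting and attribute names are illustrative assumptions only, so consult CaseTemplate.xml and GenCase_XML_GUIDE.pdf for the authoritative format:

<!-- Hypothetical sketch: a box drawn with boundary particles and a region
     filled with fluid particles. Attributes and nesting are assumptions. -->
<setmkbound mk="0" />
<setdrawmode mode="face" />
<drawbox>
    <point x="0" y="0" z="0" />
    <size x="1" y="1" z="1" />
</drawbox>
<setmkfluid mk="0" />
<fillbox x="0.5" y="0.5" z="0.1">
    <size x="0.9" y="0.9" z="0.4" />
</fillbox>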

In the case of more complex geometries, objects can be imported from 3DS files
(Figure 9-2) or CAD files (Figure 9-3). This allows the use of realistic geometries
generated by 3-D design applications to be combined with the drawing commands of
GenCase. These files (3DS or CAD) must first be converted to STL format
(<drawfilestl />), PLY format (<drawfileply />) or VTK format (<drawfilevtk />), formats
that are easily loaded by GenCase. Any object in STL, PLY or VTK can be split into
different triangles and any triangle can be converted into particles using the GenCase code.

Figure 9-2. Example of a 3D file that can be imported by GenCase and converted into particles.

Figure 9-3. Example of a CAD file that can be imported by GenCase and converted into particles.

Different kinds of movements can be imposed on a set of particles: linear, rotational,
circular, sinusoidal... To help users define movements, a directory with some examples
is also included in the DualSPHysics package. Thus, the directory MOTION includes:
- Motion01: uniform rectilinear motion (<mvrect />) that also includes pauses
(<wait />)
- Motion02: combination of two uniform rectilinear motions (<mvrect />)
- Motion03: movement of an object depending on the movement of another
(hierarchy of objects)
- Motion04: accelerated rectilinear motion (<mvrectace />)
- Motion05: rotational motion (<mvrot />). See Figure 9-4.
- Motion06: accelerated rotational motion (<mvrotace />) and accelerated circular
motion (<mvcirace />). See Figure 9-5.
- Motion07: sinusoidal movement (<mvrectsinu />, <mvrotsinu />, <mvcirsinu />)
- Motion08: prescribed movement with data from an external file (<mvpredef />)

Figure 9-4. Example of rotational motion. Figure 9-5. Example of accelerated rotation
motion and accelerated circular motion.

Typing “GenCase.exe –h” in the command window generates a brief HELP manual:

GenCase
===========================
Execution parameters information:
GenCase config_in config_out [options]
Options:
-h Shows information about the different parameters
-template Generates the example file 'CaseTemplate.xml'
-dp:<float> Distance between particles

-save:<values> Indicates the format of output files


+/-all: To choose or reject all options
+/-binx2: Binary format for the initial configuration
+/-vtkall: VTK with all particles of the initial configuration
+/-vtkbound: VTK with boundary particles of the initial configuration
+/-vtkfluid: VTK with fluid particles of the initial configuration
(preselected formats: binx2)

-debug:<int> Debug level (-1:Off, 0:Explicit, n:Debug level)

Examples:
GenCase case_def case
GenCase case case -dp:0.01

The user is invited to execute “GenCase.exe –template” and read carefully the generated
CaseTemplate.xml, which includes all the different labels with the corresponding
format that can be used in the input XML file. It is readily available in the downloaded
files.

A more complete description of the code can be found in the PDF files
GenCase_XML_GUIDE.pdf and ExternalModelsConversion_GUIDE.pdf, already
available for download from the DualSPHysics website.

The GenCase code facilitates generating arbitrarily complex geometries in a more
user-friendly way and makes DualSPHysics useful for examining realistic cases.

10. Post-processing

10.1 Visualization of boundaries

In order to visualise the boundary shapes formed by the boundary particles, different
geometry files can be generated using the BoundaryVTK code.

As input data, shapes can be loaded from a VTK file (-loadvtk), a PLY file (-loadply) or
an STL file (-loadstl), while boundary movement can be imported from an XML file
(-filexml file.xml) using the timing of the simulation (-motiontime) or the exact instants
of the output data (-motiondatatime). The movement of the boundaries can also be
determined starting from the particle positions (-motiondata). The output files consist of
VTK files (-savevtk), PLY files (-saveply) or STL files (-savestl) with the loaded
information and the moving boundary positions at different instants. For example, the
output files can be named motion_0000.vtk, motion_0001.vtk, motion_0002.vtk...
These files can also be generated for only a selected object defined by mk (-onlymk:) or
by the id of the particles (-onlyid:).

Typing “BoundaryVTK.exe –h” in the command window, a brief HELP manual is
displayed with the different parameters of execution:

BoundaryVtk
===============================
Information about parameters of execution:
BoundaryVtk <options>

Basic options:
-h Shows information about parameters
-opt <file> Load a file configuration

Load shapes:
-loadvtk <file.vtk> Load shapes of vtk files (PolyData)
-onlymk:<values> Indicates the mk of the shapes to be loaded, affects
to the previous -loadvtk and can not be used with
-onlyid
-onlyid:<values> Indicates the code of object of the shapes to be
loaded, only affects to the previous -loadvtk
-changemk:<value> Changes the mk of the loaded shapes to a given value,
only affects to the previous -loadvtk
-loadply:<mkvalue> <file.ply> Load shapes of ply files with a indicated mk
-loadstl:<mkvalue> <file.stl> Load shapes of stl files with a indicated mk

Load configuration for a mobile boundary:


-filexml file.xml Load xml file with information of the movement and
kind of particles
-motiontime:<time>:<step> Configure the duration and time step to
simulate the movement defined on the xml file,
can not be used with -motiondatatime and -motiondata

-motiondatatime <dir> Indicates the directory where real times of the
simulation are loaded for the mobile boundary
can not be used with -motiontime and -motiondata
-motiondata <dir> Indicates the directory where position of particles
are loaded to generate the mobile boundary
can not be used with -motiontime and -motiondatatime

Define output files:


-savevtk <file.vtk> Generates vtk(polydata) files with the loaded shapes
-saveply <file.ply> Generates ply file with the loaded information
-savestl <file.stl> Generates stl file with the loaded information
-savevtkdata <file.vtk> Generates vtk(polydata) file with the loaded
shapes including mk and shape code
-onlymk:<values> Indicates the mk of the shapes to be stored, affects
to the previous out option and
can not be used with -onlyid
-onlyid:<values> Indicates the code of the object of the shapes to be
stored, affects to the previous out option

Examples:
BoundaryVtk -loadvtk bound.vtk -savevtk box.vtk -onlymk:10,12

BoundaryVtk -loadvtk bound.vtk -filexml case.xml -motiondata .


-saveply motion.ply

10.2 Visualization of all particles output data

The BoundaryVTK code allows creating triangles or planes to represent the boundaries.
In contrast, the PartVTK code is used to handle all particles (fixed boundaries, moving
boundaries or fluid particles) and to work with the different physical quantities
computed in DualSPHysics.

The output files of DualSPHysics, i.e. the binary files (.bi2), are now the input files for
the post-processing code PartVTK, which generates files of the selected particles in
different formats. The output files can be VTK-binary (-savevtk), CSV (-savecsv)
or ASCII (-saveascii); for example PartVtkBin_0000.vtk, PartVtkBin_0001.vtk,
PartVtkBin_0002.vtk... These files can be generated by only a selected set of particles
defined by mk (-onlymk:), by the id of the particles (-onlyid:) or by the type of the
particle (-onlytype:) so we can check or uncheck all the particles (+/-all), the boundaries
(+/-bound), the fixed boundaries (+/-fixed), the moving boundaries (+/-moving), the
floating bodies (+/-floating), the fluid particles (+/-fluid) or the excluded fluid particles
during the simulation (+/-fluidout). The output files can contain different particle data (-
vars:); all the magnitudes (+/-all), velocity (+/-vel), density (+/-rhop), pressure (+/-press),
mass (+/-mass), acceleration (+/-ace), vorticity (+/-vor), the id of the particle (+/-id), the
type (+/-type:) and mk (+/-mk). Acceleration values can be positive (AcePos) or negative
(AceNeg) according to the direction of the axis. To compute the acceleration or
vorticity of particles, some parameters can also be defined, such as the kernel used to
compute the interactions (-acekercubic, -acekerwendland), the value of the viscosity
(-aceviscoart:) or even the gravity (-acegravity:), as illustrated below.
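
For instance, a hypothetical invocation (the directory and file names are placeholders)
that stores acceleration and vorticity computed with the Wendland kernel could read:

PartVtk -dirin Case_out -savevtk Case_out/PartAce -vars:+ace,+vor
        -acekerwendland -aceviscoart:0.1 -acegravity:0:0:-9.81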

Typing "PartVTK.exe -h" in the command window, the HELP manual is visible with
the available parameters of execution:

PartVtk
===========================
Information about parameters of execution:

PartVtk <options>

Basic options:
-h Shows information about parameters
-opt <file> Load configuration of a file

Define input file:


-dirin <dir> Directory with particle data
-filein <file> File with particle data
-filexml file.xml Load xml file with information of mk and
type of particles, this is needed for the filter -onlymk
and for the variable -vars:mk
-first:<int> Indicates the first file to be computed
-last:<int> Indicates the last file to be computed
-jump:<int> Indicates the number of files to be skipped
-threads:<int> Indicates the number of threads for parallel execution of
the interpolation; by default (or when zero is given) the number of cores
of the device is used.

Define parameters for acceleration or vorticity calculation:


-acekercubic Use Cubic spline kernel
-acekerwendland Use Wendland kernel
-aceviscoart:<float> Artificial viscosity [0-1] (only for acceleration)
-acegravity:<float:float:float> Gravity value (only for acceleration)

Define output file:


-savevtk <file.vtk> Generates vtk(polydata) files with particles
according to given filters with the options
onlymk, onlyid and onlytype
-savecsv <file.csv> Generates CSV files to use with calculation sheets
-saveascii <file.asc> Generates ASCII files without headers

Configuration for each output file:


-onlymk:<values> Indicates the mk of selected particles
-onlyid:<values> Indicates the id of selected particles
-onlytype:<values> Indicates the type of selected particles
+/-all: To choose or reject all options
+/-bound: Boundary particles (fixed, moving and floating)
+/-fixed: Boundary fixed particles
+/-moving: Boundary moving particles
+/-floating: Floating body particles
+/-fluid: Fluid particles (no excluded)
+/-fluidout: Excluded fluid particles
(Preselected types: all)

-vars:<values> Indicates the stored variables of each particle


+/-all: To choose or reject all options
+/-vel: Velocity
+/-rhop: Density
+/-press: Pressure
+/-mass: Mass
+/-id: Id of particle
+/-type: Type (fixed,moving,floating,fluid,fluidout)
+/-mk: Value of mk associated to the particles
+/-ace: Acceleration
+/-vor: Vorticity
(Preselected variables: id,vel,rhop,type)

Examples:
PartVtk -savevtk partfluid.vtk -onlytype:-bound
-savevtk partbound.vtk -onlytype:-all,bound -vars:+mk

10.3 Analysis of numerical measurements

To compare experimental and numerical values, a tool to analyse these numerical
measurements is needed. The MeasureTool code allows different physical quantities to
be computed at a set of given points.

The binary files (.bi2) generated by DualSPHysics are the input files of the
MeasureTool code, and the output files are again VTK-binary (-savevtk), CSV
(-savecsv) or ASCII (-saveascii). The numerical values are computed as an interpolation
of the values of the neighbouring particles around a given position. The interpolation
can be computed using different kernels (-kercubic, -kerwendland). Kernel correction is
applied when the summation of the kernel values around the position is higher than a
given value (-kclimit:), and a dummy value is assigned if the correction is not applied
(-kcdummy:). The positions where the interpolation is performed are given in a text file
(-points <file>) and the distance of interpolation can be 2h (the size of the kernel) or
can be changed (-distinter_2h:, -distinter:). The computation can be restricted to a
selected set of particles, so the same commands as in PartVTK can be used (-onlymk:,
-onlyid:, -onlytype:). Different interpolated variables (-vars) can be calculated
numerically: all available ones (+/-all), velocity (+/-vel), density (+/-rhop), pressure
(+/-press), mass (+/-mass), vorticity (+/-vor), acceleration (+/-ace), the id of the particle
(+/-id), the type (+/-type:) and mk (+/-mk). The maximum water depth can also be
computed. Height values (-height:) are calculated according to the interpolated mass: if
the nodal mass is higher than a given reference mass, that Z-position is considered the
maximum height. The reference value can be calculated relative to the mass values of
the selected particles (-height:0.5, half the mass by default) or can be given in an
absolute way (-heightlimit:).
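
As an illustration, a hypothetical command (the points file name is a placeholder) that
records the water height at a set of gauge positions in a CSV file could read:

MeasureTool -dirin Case_out -points gauges.txt -onlytype:-all,+fluid
            -height:0.5 -savecsv Case_out/WaveHeight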

Typing "MeasureTool.exe -h" in the command window, a brief HELP manual is
available with the different parameters of execution:
MeasureTool
===============================
Information about parameters of execution:
MeasureTool <options>

Basic options:
-h Shows information about parameters
-opt <file> Load configuration of a file

Define input file:


-dirin <dir> Directory with particle data
-filein <file> File with particle data
-filexml file.xml Load xml file with information of mk and
type of particles, this is needed for the filter -onlymk
and for the variable -vars:mk
-first:<int> Indicates the first file to be computed
-last:<int> Indicates the last file to be computed
-jump:<int> Indicates the number of files to be skipped
-threads:<int> Indicates the number of threads for parallel execution of
the interpolation; by default (or when zero is given) the number of cores
of the device is used.

Define parameters for acceleration or vorticity calculation:
-acekercubic Use Cubic spline kernel
-acekerwendland Use Wendland kernel
-aceviscoart:<float> Artificial viscosity [0-1]
-acegravity:<float:float:float> Gravity value

Set the filters that are applied to particles:


-onlymk:<values> Indicates the mk of selected particles
for interpolation
-onlyid:<values> Indicates the id of selected particles
-onlytype:<values> Indicates the type of selected particles
+/-all: To choose or reject all options
+/-bound: Boundary particles (fixed, moving and floating)
+/-fixed: Boundary fixed particles
+/-moving: Boundary moving particles
+/-floating: Floating body particles
+/-fluid: Fluid particles (no excluded)
+/-fluidout: Excluded fluid particles
(preselected types: all)

Set the configuration of interpolation:


-pointstemplate Generates an example file to be used with -points
-points <file> Defines the points where interpolated data will be
computed (each value separated by an space or a new line)
-particlesmk:<values> Indicates the mk of selected particles for define
the points where interpolated data will be computed
-kercubic Use Cubic spline kernel for interpolation
-kerwendland Use Wendland kernel for interpolation (by default)
-kclimit:<float> Defines the minimum value of sum_wab_vol to apply
the Kernel Correction (0.5 by default)
-kcdummy:<float> Defines the dummy for the interpolated quantity
if Kernel Correction is not applied (0 by default)
-kcusedummy:<0/1> Defines whether or not to use the dummy value
(1 by default)
-distinter_2h:<float> Defines the maximum distance for the interaction
among particles depending on 2h (1 by default)
-distinter:<float> Defines the maximum distance for the interaction
among particles in an absolute way.

Set the values to be calculated:


-vars[:<values>] Defines the variables or magnitudes that
are going to be computed as an interpolation
of the selected particles around a given position
+/-all: To choose or reject all options
+/-vel: Velocity
+/-rhop: Density
+/-press: Pressure
+/-mass: Mass
+/-id: Id of particles
+/-type: Type (fixed,moving,floating,fluid,fluidout)
+/-mk: Value of mk associated to the particles
+/-ace: Acceleration
+/-vor: Vorticity
(preselected values: vel,rhop,id)
-height[:<float>] Height value is calculated starting from
mass values for each point x,y
The reference mass to obtain the height is
calculated according to the mass values of the
selected particles; 0.5 by default(half the mass)
-heightlimit:<float> The same as -height but the reference mass is
given in an absolute way.

Define output files format:
-savevtk <file.vtk> Generates vtk(polydata) with the given
interpolation points
-savecsv <file.csv> Generates one CSV file with the time history
of the obtained values
-saveascii <file.asc> Generates one ASCII file without headers
with the time history of the obtained values
Examples:
MeasureTool -points fpoints.txt -onlytype:-all,fluid -savecsv dataf
MeasureTool -points fpoints.txt -vars:press -savevtk visupress

10.4 Surface representation

Using a large number of particles, the visualization of the simulation can be improved
by representing surfaces instead of particles. To create the surfaces, the marching cubes
algorithm is used [Lorensen and Cline, 1987]. This computer graphics technique
extracts a polygonal mesh (set of triangles) of an isosurface from a 3D scalar field.

Figure 10-1 represents a 3D dam-break simulation using 300,000 particles. The first
snapshot shows the particle representation. Values of mass are interpolated at the points
of a 3D mesh that covers the entire domain. Thus, if a value of mass is chosen, a
polygonal mesh is created where the vertices of the triangles have the same value of
mass. The triangles of this isosurface of mass are represented in the second frame of the
figure. The last snapshots correspond to the surface representation, where the colour
corresponds to the interpolated velocity value of the triangles.

Figure 10-1. Conversion of points to surfaces.

The output binary files of DualSPHysics are the input files of the IsoSurface code and
the output files are VTK files (-saveiso[:<var>]) with the isosurfaces calculated using a
variable (<var>) or can be structured points (-savegrid) with data obtained after the
interpolation. The interpolation of the values at the nodes of the scalar field can be
performed using different distances. Thus, the distance between the nodes can be
defined depending on dp (-distnode_dp:) or in an absolute way (-distnode:). On the other
hand, the maximum distance for the interaction between particles to interpolate values
on the nodes can also be defined depending on 2h (-distinter_2h:) or in an absolute way
(-distinter:). The limits of the mesh can be calculated starting from the particle data of
all part files (-domain_static_all) or adjusted to the selected particles of each part file
(-domain_dynamic).
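
For instance, a hypothetical invocation (the directory names are placeholders)
combining these options could read:

IsoSurface -dirin Case_out -onlytype:-all,+fluid -distnode_dp:1.0
           -distinter_2h:1.0 -saveiso Case_out/Surface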

IsoSurface
=============================
Information about parameters of execution:

IsoSurface <options>

Basic options:
-h Shows information about parameters
-opt <file> Load configuration of a file

Define input files:


-dirin <dir> Directory with particle data
-filein <file> File with particle data

-filexml file.xml Load xml file with information of mk and type of


particles, this is needed for the filter -onlymk and
for the variable -vars:mk

-dirgridin <filesname> Directory and name of the VTK files with data of
the nodes interpolated starting from the particles
(structured_points).
-filegridin <file> VTK file with data of the nodes interpolated
starting from the particles (structured_points).
-first:<int> Indicates the first file to be computed
-last:<int> Indicates the last file to be computed
-jump:<int> Indicates the number of files to be skipped

Define parameters for acceleration or vorticity calculation:


-acekercubic Use Cubic spline kernel
-acekerwendland Use Wendland kernel
-aceviscoart:<float> Artificial viscosity [0-1]
-acegravity:<float:float:float> Gravity value

Set the filters that are applied to particles:


-onlymk:<values> Indicates the mk of selected particles
for interpolation
-onlyid:<values> Indicates the id of selected particles
-onlytype:<values> Indicates the type of selected particles
+/-all: To choose or reject all options
+/-bound: Boundary particles (fixed, moving and floating)
+/-fixed: Boundary fixed particles
+/-moving: Boundary moving particles
+/-floating: Floating body particles
+/-fluid: Fluid particles (no excluded)
+/-fluidout: Excluded fluid particles
(preselected types: fluid)

Set the configuration of interpolation:
-domain_static_all Mesh limits are calculated starting from particle data
selected in all part files
-domain_static_limits:xmin:ymin:zmin:xmax:ymax:zmax Mesh limits are given.
-domain_dynamic Mesh limits are adjusted to the selected particles of each
part file (option by default)
-domain_dynamic_limits:xmin:ymin:zmin:xmax:ymax:zmax Mesh limits are
adjusted to the selected particles within the given limits.

-kercubic Use Cubic spline kernel for interpolation


-kerwendland Use Wendland kernel for interpolation (by default)

-kclimit:<float> Defines the minimum value of sum_wab_vol to apply


the Kernel Correction (deactivated by default)
-kcdummy:<float> Defines the dummy for the interpolated quantity
if Kernel Correction is not applied (0 by default)
-kcusedummy:<0/1> Defines whether or not to use the dummy value
(1 by default)

-distinter_2h:<float> Defines the maximum distance for the interaction


among particles depending on 2h (1 by default)
-distinter:<float> Defines the maximum distance for the interaction
among particles in an absolute way.

-distnode_dp:<float> Defines the distance between nodes by multiplying dp


and the given value (option by default)
-distnode:<float> Distance between nodes is given

-threads:<int> Indicates the number of threads for parallel execution of
the interpolation; by default (or when zero is given) the number of cores
of the device is used.

Set the values to be calculated:


-vars[:<values>] Defines the variables or magnitudes that are going to
be computed as an interpolation of the selected particles around a given
position
+/-all: To choose or reject all options
+/-vel: Velocity
+/-rhop: Density
+/-press: Pressure
+/-mass: Mass
+/-id: Id of particles
+/-type: Type (fixed,moving,floating,fluid,fluidout)
+/-mk: Value of mk associated to the particles
+/-ace: Acceleration
+/-vor: Vorticity
(preselected values: vel,rhop,mass,id)

Define output files format:


-savegrid <file.vtk> Generates VTK files (structured_points) with
data obtained after interpolation

-saveiso[:<values>[:<var>]] <file.vtk> Generates VTK files (polydata)
with the isosurface calculated starting from a variable (mass, rhop or
press) and using several limit values. The variable by default is mass.
When limit values are not given, the intermediate value of the first data
file is considered

Examples:
IsoSurface -onlytype:-all,fluid -domain_static_all -savegrid filegrid
IsoSurface -saveiso:0.04,0.045,0.05 fileiso

11. Testcases

Some demonstration cases are included with the DualSPHysics package (Figure 11-1):
a dam break interacting with an obstacle in a tank (a), a wavemaker on a beach (b), the
real geometry of a human face interacting with impinging fluid (c) and a pump
mechanism that translates water within a storage tank (d).

Figure 11-1. Test cases: dam break (a), wave maker (b), human face (c) and pump (d).

The different test cases are described in the following sections. The BAT files are
presented and the different command lines to execute the different programs are
described. The XML files are also explained in detail to address the execution
parameters and the different labels to plot boundaries, fluid volumes, movements, etc.

For each Case.bat there are 4 options:


-Case_win_CPU.bat (windows on CPU)
-Case_win_GPU.bat (windows on GPU)
-Case_linux_CPU.bat (linux on CPU)
-Case_linux_GPU.bat (linux on GPU)

Two different versions of some test cases are provided: a full version and a faster
version with fewer particles.

Since many 3-D applications can be reduced to a 2-D problem, the new executable of
DualSPHysics can now perform 2-D simulations; however, this option is not fully
optimized. Thus, cases for 2-D simulations are also provided (CASEDAMBREAK2D
and CASEWAVEMAKER2D). A 2-D simulation can be achieved easily by imposing
the same values along the Y-direction (<pointmin> = <pointmax> in the input XML
file). The initial configuration is then a 2-D system, and DualSPHysics will
automatically detect that the simulation must be carried out using the 2-D formulation.
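
A minimal sketch of the relevant labels, assuming the attribute-style syntax described
in GenCase_XML_GUIDE.pdf (the coordinate values are placeholders):

<pointmin x="0.0" y="0.5" z="0.0" />
<pointmax x="3.0" y="0.5" z="1.0" />

With the same y-value in both labels, the initial configuration has no extent along the
Y-direction and the case is treated as 2-D.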

The following tables summarise the computational runtimes and the performance of
DualSPHysics carrying out the different testcases on a 64-bit LINUX system with the
CPU Intel Xeon E5620 with 2.4GHz and the GPU GTX 480 at 1.40 GHz. Better
speedups are obtained increasing the number of particles. This performance also
depends on the percentage of fixed boundaries, moving boundaries and fluid particles.

Table 11-1 summarises computational times of using Intel Xeon E5620 with one core
and using 8 threads.

DualSPHysics v2.0       np       Runtime (min)    Runtime (min)    Speed-up
                                 CPU single-core  CPU multi-core
FastCaseDambreak       40,368        20.95             3.15           6.6
FastCaseWavemaker      20,232        25.73             4.42           5.8
FastCaseRealface      102,254        23.79             4.49           5.3
CaseDambreak          342,992       516.58            74.27           7.0
CaseWavemaker         202,959       888.30           141.13           6.3
CaseRealface        1,010,213       663.98           119.06           5.6
CasePump            1,009,607      >27 days         5345.77          >7.3
CaseDambreak2D         20,161        40.22            10.24           3.9
CaseWavemaker2D        19,079        68.59            17.68           3.8
CaseFloating          226,551      1489.69           221.30           6.7

Table 11-1. Total runtimes (mins) and CPU performance of DualSPHysics v2.0.

Table 11-2 summarises computational times of using Intel Xeon E5620 and GTX 480.

DualSPHysics v2.0       np       Runtime (min)    Runtime (min)    Speed-up
                                 CPU single-core       GPU
FastCaseDambreak       40,368        20.95             0.45           46.8
FastCaseWavemaker      20,232        25.73             1.40           18.4
FastCaseRealface      102,254        23.79             0.54           43.9
CaseDambreak          342,992       516.58             7.09           72.9
CaseWavemaker         202,959       888.30            17.74           50.1
CaseRealface        1,010,213       663.98            10.62           62.5
CasePump            1,009,607      >27 days          571.06           >68
CaseDambreak2D         20,161        40.22             2.30           17.5
CaseWavemaker2D        19,079        68.59             6.68           10.3
CaseFloating          226,551      1489.69            27.15           54.9

Table 11-2. Total runtimes (mins) and CPU/GPU performance of DualSPHysics v2.0.

11.1 CASEDAMBREAK

The first test case consists of a dam-break flow impacting on a structure inside a
numerical tank. No moving boundary particles are involved in this simulation and no
movement is defined.

CaseDambreak.bat file summarises the tasks to be carried out using the different codes
in a given order:

CaseDambreak.bat
GenCase.exe CaseDambreak_Def CaseDambreak_out/CaseDambreak
DualSPHysics.exe CaseDambreak_out/CaseDambreak -svres -gpu
PartVTK.exe -dirin CaseDambreak_out
            -savevtk CaseDambreak_out/PartFluid
            -onlytype:-all,+fluid
MeasureTool.exe -dirin CaseDambreak_out -points PointsVelocity.txt
            -onlytype:-all,+fluid
            -vars:-all,+vel
            -savevtk CaseDambreak_out/PointsVelocity
            -savecsv CaseDambreak_out/PointsVelocity

Pre-processing
GenCase.exe CaseDambreak_Def CaseDambreak_out/CaseDambreak

The CaseDambreak.xml is described in detail:


Different constants are defined:
lattice=staggered grid (2) to locate particles
gravity=gravity acceleration
cflnumber=coefficient in the Courant condition
hswl=maximum still water level, automatically calculated with TRUE
coefsound=coefficient needed to compute the speed of sound
coefficient=coefficient needed to compute the smoothing length
ALL THESE CONSTANTS ARE ALSO DEFINED IN SPHYSICS

dp=distance between particles


WHEN CHANGING THIS PARAMETER,
THE TOTAL NUMBER OF PARTICLES IS MODIFIED
x,y and z values are used to define the limits of the domain

Volume of fluid:
setmkfluid mk=0,
solid to plot particles under the volume limits
drawbox to plot a rectangular box defining a corner and its
size in the three directions
Boundary Tank:
setmkbound mk=0,
face to plot particles only in the sides of
the defined box where the top is removed
To create CaseDambreak_Box_Dp.vtk

setmkvoid is used to remove particles in the given box, to create the hole
where the structure is then placed

Boundary Building:
setmkbound mk=1,
face to plot particles only in the sides of the
defined box where the bottom is removed

To create CaseDambreak_Building_Dp.vtk

Parameters of configuration: kind of step algorithm, kernel, viscosity,
density filter, boundary correction, simulation timings…

Final number of particles:
np=total number of particles
nb=boundary particles
nbf=fixed boundary particles
and final mk of the objects

Summary of the computed constants

PARAVIEW is used to visualise all the VTK files. In this case, two VTK files can be
opened with it: CaseDambreak_Box_Dp.vtk and CaseDambreak_Building_Dp.vtk.

Figure 11-2. VTK files generated during the pre-processing for CaseDambreak.

Processing (choices depending on device cpu or gpu)

DualSPHysics.exe CaseDambreak_out/CaseDambreak -svres -cpu
DualSPHysics.exe CaseDambreak_out/CaseDambreak -svres -gpu

Table 11-3 summarises the different computational times after running the test case
on CPU (single-core and 8 threads) and GPU:

                      np       CPU           CPU          GPU    Speed-up
                              single-core   multi-core          (CPU vs. GPU)
FastCaseDambreak     40,368     20.95          3.15       0.45      46.8
CaseDambreak        342,992    516.58         74.27       7.09      72.9

Table 11-3. Total runtimes (mins) and CPU/GPU performance for CaseDambreak.

Post-Processing
PartVTK.exe -dirin CaseDambreak_out
            -savevtk CaseDambreak_out/PartFluid
            -onlytype:-all,+fluid

Different instants of the CaseDambreak simulation are depicted in Figure 11-3, where
the PartFluid_xxxx.vtk files of the fluid particles are represented.

Figure 11-3. Instants of the CaseDambreak simulation. Colour represents velocity of
the particles.

Post-processing
MeasureTool.exe -dirin CaseDambreak_out -points PointsVelocity.txt
            -onlytype:-all,+fluid
            -vars:-all,+vel
            -savevtk CaseDambreak_out/PointsVelocity
            -savecsv CaseDambreak_out/PointsVelocity

The file PointsVelocity.txt contains a list of points where numerical velocity can be
computed using the post-processing codes. Eight positions from (0.754,0.31,0.0) m to
(0.754,0.31,0.16) m with dz = 0.02 m are used as control points. The generated
PointsVelocity_xxxx.vtk files can be visualised to study the velocity behaviour along the
column as Figure 11-4 shows.
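
The points file is a plain list of x y z coordinates, one point per line (a template can be
generated with -pointstemplate, see Section 10.3). A sketch of its expected contents for
this case:

0.754 0.31 0.00
0.754 0.31 0.02
0.754 0.31 0.04
...
0.754 0.31 0.16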

Figure 11-4. Numerical velocity computed at the given points.

On the other hand, the obtained PointsVelocity_Vel.csv collects all the numerical values
of the velocity in three directions. The x-component of the velocity is depicted in Figure
11-5 at different numerical heights.
[Figure: time series of the x-velocity Vx (m/s) against t (s) at the control points
z = 0, 0.02, 0.04 and 0.06 m]

Figure 11-5. Numerical velocity at different experimental points.

11.2 CASEWAVEMAKER

CaseWavemaker simulates several waves breaking on a numerical beach. A
wavemaker is used to generate and propagate the waves. In this testcase, a sinusoidal
movement is imposed on the boundary particles of the wavemaker.

CaseWavemaker.bat file describes the execution task to be carried out in this directory:

CaseWavemaker.bat
GenCase.exe CaseWavemaker_Def CaseWavemaker_out/CaseWavemaker
DualSPHysics.exe CaseWavemaker_out/CaseWavemaker -svres -gpu
BoundaryVTK.exe -loadvtk CaseWavemaker_out/CaseWavemaker__Real.vtk
            -filexml CaseWavemaker_out/CaseWavemaker.xml
            -motiondatatime CaseWavemaker_out
            -savevtkdata CaseWavemaker_out/MotionPiston -onlymk:21
            -savevtkdata CaseWavemaker_out/Box.vtk -onlymk:11
PartVTK.exe -dirin CaseWavemaker_out
            -savevtk CaseWavemaker_out/PartFluid
            -onlytype:-all,+fluid
IsoSurface.exe -dirin CaseWavemaker_out -saveiso CaseWavemaker_out/Surface
            -onlytype:-all,+fluid

Pre-processing
GenCase.exe CaseWavemaker_Def CaseWavemaker_out/CaseWavemaker

The CaseWavemaker.xml is now described in detail:


Different constants are defined:
lattice= staggered grid (2) for boundaries and cubic grid (1) for fluid particles
gravity=gravity acceleration
cflnumber=coefficient in the Courant condition
hswl=maximum still water level, automatically calculated with TRUE
coefsound=coefficient needed to compute the speed of sound
coefficient=coefficient needed to compute the smoothing length
ALL THESE CONSTANTS ARE ALSO DEFINED IN SPHYSICS

Particles are created in order of increasing x-direction, then y-direction and finally
z-direction; this ordering, which determines the particle IDs, can be changed.
dp=distance between particles
WHEN CHANGING THIS PARAMETER,
THE TOTAL NUMBER OF PARTICLES IS MODIFIED
x,y and z values are used to define the limits of the domain

Volume of fluid:
setmkfluid mk=0,
full to plot particles under the volume limits and the faces
drawprism to plot a figure that mimics a beach

setmkvoid is used to remove particles, to define the maximum water level at z=0.75 m,
since all particles above are removed

Piston Wavemaker:
setmkbound mk=10,
face to plot particles only in the sides of the
defined box that formed the wavemaker

Boundary Tank:
setmkbound mk=0,
drawprism to plot the beach
face to plot only in the sides, except face with mask 96
boundary particles will replace the fluid ones in the sides of the beach

To create CaseWavemaker__Dp.vtk and CaseWavemaker__Real.vtk

Piston movement:
Particles associated to mk=10 move following a sinusoidal movement
The movement is the combination of three different movements with
different frequencies, amplitudes and phases
The duration and the order of the moves are also indicated

Parameters of configuration

Final number of particles:


np=total number of particles
nb=boundary particles
nbf=fixed boundary particles
nb-nbf=moving boundaries
and final mk of the objects

Summary of the computed constants

Summary of the movement

Processing (choices depending on device cpu or gpu)

DualSPHysics.exe CaseWavemaker_out/CaseWavemaker -svres -cpu
DualSPHysics.exe CaseWavemaker_out/CaseWavemaker -svres -gpu

Table 11-4 summarises the different computational times after running the test case
on CPU (single-core and 8 threads) and GPU:

                       np       CPU           CPU          GPU    Speed-up
                               single-core   multi-core          (CPU vs. GPU)
FastCaseWavemaker     20,232     25.73          4.42       1.40      18.4
CaseWavemaker        202,959    888.30        141.13      17.74      50.1

Table 11-4. Total runtimes (mins) and CPU/GPU performance for CaseWavemaker.

Post-processing
BoundaryVTK.exe -loadvtk CaseWavemaker_out/CaseWavemaker__Real.vtk
            -filexml CaseWavemaker_out/CaseWavemaker.xml
            -motiondatatime CaseWavemaker_out
            -savevtkdata CaseWavemaker_out/MotionPiston -onlymk:21
            -savevtkdata CaseWavemaker_out/Box.vtk -onlymk:11
IsoSurface.exe -dirin CaseWavemaker_out -saveiso CaseWavemaker_out/Surface
            -onlytype:-all,+fluid

The files MotionPiston_xxxx.vtk with the shape of the wavemaker at different instants
and Surface_xxxx.vtk with the surface representation are depicted in Figure 11-6:

Figure 11-6. Different instants of the CaseWavemaker simulation. Colour represents
velocity.

11.3 CASEREALFACE

The third test case combines the feature of defining an initial movement for the fluid
particles with importing a real geometry from a 3D Studio file (face.vtk).

CaseRealface.bat
GenCase.exe CaseRealface_Def CaseRealface_out/CaseRealface
DualSPHysics.exe CaseRealface_out/CaseRealface -svres -gpu
PartVTK.exe -dirin CaseRealface_out
            -savevtk CaseRealface_out/PartFluid
            -onlytype:-all,+fluid
MeasureTool.exe -dirin CaseRealface_out -points PointsPressure.txt
            -onlytype:-all,+bound
            -vars:-all,+press
            -savevtk CaseRealface_out/PointsPressure
            -savecsv CaseRealface_out/PointsPressure

Pre-processing
GenCase.exe CaseRealface_Def CaseRealface_out/CaseRealface

The CaseRealface.xml is here described in detail:


Different constants are defined:
lattice=cubic grid to locate particles
gravity=gravity acceleration
cflnumber=coefficient in the Courant condition
hswl=maximum still water level, automatically calculated with TRUE
coefsound=coefficient needed to compute the speed of sound
coefficient=coefficient needed to compute the smoothing length
ALL THESE CONSTANTS ARE ALSO DEFINED IN SPHYSICS
Particles are created in order of increasing x-direction, then y-direction and finally
z-direction; this ordering, which determines the particle IDs, can be changed.

dp=distance between particles


WHEN CHANGING THIS PARAMETER,
THE TOTAL NUMBER OF PARTICLES IS MODIFIED
x,y and z values are used to define the limits of the domain

Real Geometry: Load face.vtk and conversion to boundaries with mk=0

The eyes of the face are created as boundary particles to prevent particle penetration
through the eye holes. Firstly, two spheres are created using boundary particles and
then two new spheres with a displaced center are created using void to remove
particles.

The 3D matrix used to locate particles is here rotated to create the next object with a
different orientation.

Volume of fluid:
setmkfluid mk=0,
full to plot particles under the volume limits and the faces
drawellipsoid to plot an ellipsoid of fluid particles

Two boundary points to fix the limits of the domain.


To create CaseRealface__Dp.vtk and CaseRealface__Real.vtk

Fluid particles associated to mk=0 will move with an initial velocity

Parameters of configuration

Final number of particles:


np=total number of particles
nb=boundary particles
nbf=fixed boundary particles
and final mk of the objects

Summary of the computed constants

Processing (choices depending on device cpu or gpu)

DualSPHysics.exe CaseRealface_out/CaseRealface -svres -cpu
DualSPHysics.exe CaseRealface_out/CaseRealface -svres -gpu

Table 11-5 summarises the different computational times after running the test case
on CPU (single-core and 8 threads) and GPU:

                       np       CPU           CPU          GPU    Speed-up
                               single-core   multi-core          (CPU vs. GPU)
FastCaseRealface     102,254     23.79          4.49       0.54      43.9
CaseRealface       1,010,213    663.98        119.06      10.62      62.5

Table 11-5. Total runtimes (mins) and CPU/GPU performance for CaseRealface.

Post-processing

PartVTK.exe -dirin CaseRealface_out
            -savevtk CaseRealface_out/PartFluid
            -onlytype:-all,+fluid

In Figure 11-7, face.vtk and PartFluid_xxxx.vtk of the fluid particles are represented at
different instants of the simulation.

Figure 11-7. Different instants of the CaseRealface simulation. Colour represents
density of the particles.

Post-processing

MeasureTool.exe -dirin CaseRealface_out -points PointsPressure.txt
            -onlytype:-all,+bound
            -vars:-all,+press
            -savevtk CaseRealface_out/PointsPressure
            -savecsv CaseRealface_out/PointsPressure

The position of the numerical point used to compute the numerical pressure, starting
from the pressures of the boundary particles, is provided in the file PointsPressure.txt.
This position is also represented in the left panel of Figure 11-8. The time history of
the numerical pressure values is depicted in the right panel of the figure.

[Figure: time history of the numerical pressure (Pa) against time (s)]

Figure 11-8. Numerical pressure stored in PointsPressure.csv

11.4 CASEPUMP

The last 3D testcase loads an external model of a pump with a fixed part
(pump_fixed.vtk) and a moving part (pump_moving.vtk). The moving part describes a
rotational movement, and the reservoir is pre-filled with fluid particles.

CasePump.bat
GenCase.exe CasePump_Def CasePump_out/CasePump
DualSPHysics.exe CasePump_out/CasePump -svres -gpu
BoundaryVTK.exe -loadvtk CasePump_out/CasePump__Real.vtk
            -loadxml CasePump_out/CasePump.xml
            -motiondatatime CasePump_out
            -savevtkdata CasePump_out/MotionPump -onlymk:13
            -savevtkdata CasePump_out/Pumpfixed.vtk -onlymk:11
PartVTK.exe -dirin CasePump_out
            -savevtk CasePump_out/PartFluid
            -onlytype:-all,+fluid

Pre-processing

GenCase.exe CasePump_Def CasePump_out/CasePump

The CasePump.xml is here described in detail:


Different constants are defined:
lattice=cubic grid to locate particles
gravity=gravity acceleration
cflnumber=coefficient in the Courant condition
hswl=maximum still water level, automatically calculated with TRUE
coefsound=coefficient needed to compute the speed of sound
coefficient=coefficient needed to compute the smoothing length
ALL THESE CONSTANTS ARE ALSO DEFINED IN SPHYSICS

dp=distance between particles


WHEN CHANGING THIS PARAMETER,
THE TOTAL NUMBER OF PARTICLES IS MODIFIED
x,y and z values are used to define the limits of the domain

Real Geometry:
Load pump_fixed.vtk and conversion to boundaries with mk=0.
Load pump_moving.vtk and conversion to boundaries with mk=2

Filling a region with fluid.


Fluid particles are created while points have type void,
starting from the seed point and within the limits of a box.
To create CasePump__Dp.vtk and CasePump__Real.vtk

Pump movement:
Particles associated to mk=2 move with an accelerated rotational movement for
2 seconds, followed by a uniform rotation

Parameters of configuration

Final number of particles:


np=total number of particles
nb=boundary particles
nbf=fixed boundary particles
nb-nbf=moving boundaries
and final mk of the objects

Summary of the computed constants

Summary of the movement

Processing (choices depending on device cpu or gpu)

DualSPHysics.exe CasePump_out/CasePump -svres -cpu
DualSPHysics.exe CasePump_out/CasePump -svres -gpu

Table 11-6 summarises the different computational times after running the test case
on CPU (single-core and 8 threads) and GPU:

               np       CPU           CPU          GPU     Speed-up
                       single-core   multi-core           (CPU vs. GPU)
CasePump    1,009,607  >27 days      5345.77      571.06     >68

Table 11-6. Total runtimes (mins) and CPU/GPU performance for CasePump.

Post-processing

BoundaryVTK.exe -loadvtk CasePump_out/CasePump__Real.vtk
            -loadxml CasePump_out/CasePump.xml
            -motiondatatime CasePump_out
            -savevtkdata CasePump_out/MotionPump -onlymk:13
            -savevtkdata CasePump_out/Pumpfixed.vtk -onlymk:11
PartVTK.exe -dirin CasePump_out
            -savevtk CasePump_out/PartFluid
            -onlytype:-all,+fluid

In Figure 11-9, the fixed Pumpfixed.vtk, the different MotionPump.vtk and the
PartFluid_xxxx.vtk files of the fluid particles are represented at different instants of the
simulation.

Figure 11-9. Different instants of the CasePump simulation with 2.2M particles.
Colour represents velocity of the particles.

11.5 CASEDAMBREAK2D

The CaseDambreak2D is a 2D dam-break flow evolving inside a numerical tank.

CaseDambreak2D.bat file summarises the tasks to be carried out using the different
codes in a given order:

CaseDambreak2D.bat
GenCase.exe CaseDambreak2D_Def CaseDambreak2D_out/CaseDambreak2D
DualSPHysics.exe CaseDambreak2D_out/CaseDambreak2D -svres -gpu
PartVTK.exe -dirin CaseDambreak2D_out
            -savevtk CaseDambreak2D_out/PartFluid
            -onlytype:-all,+fluid
            -vars:+id,+vel,+rhop,+press,+vor

Pre-processing
GenCase.exe CaseDambreak2D_Def CaseDambreak2D_out/CaseDambreak2D

The CaseDambreak2D.xml is described in detail:

Processing (choices depending on device cpu or gpu)

DualSPHysics.exe CaseDambreak2D_out/CaseDambreak2D -svres -cpu
DualSPHysics.exe CaseDambreak2D_out/CaseDambreak2D -svres -gpu

Table 11-7 summarises the different computational times:

                     np      CPU           CPU          GPU    Speed-up
                            single-core   multi-core          (CPU vs. GPU)
CaseDambreak2D     20,161     40.22         10.24       2.30     17.5

Table 11-7. Total runtimes (mins) and CPU/GPU performance for CaseDambreak2D.

Post-processing
PartVTK.exe -dirin CaseDambreak2D_out
            -savevtk CaseDambreak2D_out/PartFluid
            -onlytype:-all,+fluid
            -vars:+id,+vel,+rhop,+press,+vor

Different instants of the CaseDambreak2D simulation are depicted in Figure 11-10,
where the PartFluid_xxxx.vtk files of the fluid particles are represented:

Figure 11-10. Different instants of the CaseDambreak2D simulation. For each time
instant, the top panel represents velocity and the lower panel shows particles’ vorticity.

11.6 CASEWAVEMAKER2D

CaseWavemaker2D simulates breaking waves on a numerical beach.

CaseWavemaker2D.bat file summarises the tasks to be carried out using the different
codes in a given order:

CaseWavemaker2D.bat
GenCase.exe CaseWavemaker2D_Def CaseWavemaker2D_out/CaseWavemaker2D
DualSPHysics.exe CaseWavemaker2D_out/CaseWavemaker2D -svres -gpu
PartVTK.exe -dirin CaseWavemaker2D_out
            -savevtk CaseWavemaker2D_out/PartAll
            -onlytype:+all,-fluidout
MeasureTool.exe -dirin CaseWavemaker2D_out
            -points PointsHeights.txt
            -onlytype:-all,+fluid
            -height
            -savevtk CaseWavemaker2D_out/PointsHeight
            -savecsv CaseWavemaker2D_out/PointsHeight

Pre-processing
GenCase.exe CaseWavemaker2D_Def CaseWavemaker2D_out/CaseWavemaker2D

The CaseWavemaker2D.xml is described in detail:

Processing (choices depending on device cpu or gpu)

DualSPHysics.exe CaseWavemaker2D_out/CaseWavemaker2D -svres -cpu
DualSPHysics.exe CaseWavemaker2D_out/CaseWavemaker2D -svres -gpu

Table 11-8 summarises the different computational times:

                      np      CPU           CPU          GPU    Speed-up
                             single-core   multi-core          (CPU vs. GPU)
CaseWavemaker2D     19,079     68.59         17.68       6.68     10.3

Table 11-8. Total runtimes (mins) and CPU/GPU performance for CaseWavemaker2D.

Post-processing
PartVTK.exe -dirin CaseWavemaker2D_out
            -savevtk CaseWavemaker2D_out/PartAll
            -onlytype:+all,-fluidout
MeasureTool.exe -dirin CaseWavemaker2D_out
            -points PointsHeights.txt
            -onlytype:-all,+fluid
            -height
            -savevtk CaseWavemaker2D_out/PointsHeight
            -savecsv CaseWavemaker2D_out/PointsHeight

Different instants of the CaseWavemaker2D simulation are depicted in Figure 11-11,
where the PartAll_xxxx.vtk and PointsHeight_h_xxxx.vtk files are represented:

Figure 11-11. Different instants of the CaseWavemaker2D simulation. For each time instant,
top panel represents the particles’ density and lower panel displays maximum wave height.

11.7 CASEFLOATING

Finally, a new case named CaseFloating is also provided in the directory
RUN_DIRECTORY. In this case, a floating box moves in waves generated with a
piston. This example has been added since many users study problems that involve
fluid-structure interaction.

The floating objects in DualSPHysics are treated as rigid bodies, and the evolution of
the boundary particles of the fluid-driven objects is obtained from the equations
described in the SPH literature (see [Monaghan et al. 2003] and the sketch below).
However, this basic formulation does not reproduce the correct sliding movement of
rigid bodies when interacting with other boundaries, and frictional forces must be
introduced (see [Rogers et al. 2010]). The next section explains how to modify the
DualSPHysics code for your own applications by adding new features.
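
In outline, following [Monaghan et al. 2003] (the notation here is introduced only for
illustration): summing the force per unit mass f_k exerted by the fluid on each boundary
particle k of the body gives the motion of the centre of mass R_0 and the rotation of
the object,

\[ M\frac{d\mathbf{V}}{dt}=\sum_{k} m_k\,\mathbf{f}_k, \qquad
   I\frac{d\boldsymbol{\Omega}}{dt}=\sum_{k} m_k\,(\mathbf{r}_k-\mathbf{R}_0)\times\mathbf{f}_k, \]

where M is the mass of the object and I its moment of inertia; each boundary particle
then moves rigidly with velocity \( \mathbf{u}_k=\mathbf{V}+\boldsymbol{\Omega}\times(\mathbf{r}_k-\mathbf{R}_0) \).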

Figure 11-12 shows different instants of the simulation of this test case. The floating
object is represented using VTK files generated by the provided BAT file, which can
help users with their own post-processing:

Figure 11-12. Different instants of the CaseFloating simulation.

12. How to modify DualSPHysics for your application

12.1 Creating new cases

More information is provided to help users create their own cases of study:
- A guide presenting all the commands that can be used in the XML file.
GenCase_XML_GUIDE.pdf.
- A guide describing how to convert the file format of any external geometry of a 3D
model to VTK, PLY or STL using open-source codes.
ExternalModelsConversion_GUIDE.pdf.
- More XML examples of some simulations from www.vimeo.com/dualsphysics.

12.2 Source files of the SPH solver

To add new formulations or make changes, you only have to modify the following files:

CPU and GPU executions:  main.cpp
                         JCfgRun (.h .cpp)
                         JSph (.h .cpp)
                         Types.h

For only CPU executions: JSphCpu (.h .cpp)
                         JSphTimersCpu.h
                         JDivideCpu (.h .cpp)

For only GPU executions: JSphGpu (.h .cpp)
                         JSphTimersGpu.h
                         CudaSphApi (.h .cu)
                         CudaSphFC.cu
                         CudaSphNL.cu
                         CudaSphSU.cu

Read Section 4 for a complete description of the files.

13. FAQ: Frequently asked questions about DualSPHysics

What do I need to use DualSPHysics? What are the hardware and software
requirements?
In order to use DualSPHysics code on a GPU, you need a CUDA-enabled Nvidia GPU
card on your machine (http://developer.nvidia.com/cuda-gpus). If you want to run GPU
only simulations (i.e. not develop the source code) the latest version of the driver for
your graphics card must be installed. If no source code development is required, there
is no need to install any compiler to run the binary of the code, only the driver must be
updated. If you also want to compile the code you must install the nvcc compiler and
some C++ compiler. The nvcc compiler is included in the CUDA Toolkit 4.0 that can be
downloaded from the Nvidia website and must be installed on your machine.

Why can I not run the DualSPHysics binary?
An error when the simulation starts can usually be solved by installing the latest
version of the driver of the GPU card.

How can I compile the code with different environments/compilers?
The provided source files in this release can be compiled for Linux using a ‘makefile’
along with the gcc and nvcc compilers, and for Windows using a project for Visual
Studio 2010. In case you use another compiler or another environment, you can adjust
the contents of the makefile.
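
A hypothetical excerpt of the kind of lines one would typically adjust (the variable
names and paths are placeholders, not the actual contents of the distributed makefile):

CC=g++
NCC=nvcc
CUDAPATH=/usr/local/cuda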

How many particles can I simulate with the GPU code?
The number of particles that can be simulated depends on (i) the memory space of the
GPU card and (ii) the options of the simulation.

How can I learn how to use DualSPHysics?
The main information is provided in this document. A more complete description of the
SPH formulation implemented in the code is presented in [Gómez-Gesteira et al. 2012a,
2012b]. We highly recommend running the test cases and making small modifications
to them. The user can also attend courses where the code is used for tasks such as the
Training days held during the SPHERIC workshops
[http://wiki.manchester.ac.uk/spheric/index.php/Events_and_Activities] or the 2-day
SPH short courses organized by University of Manchester
[http://www.mace.manchester.ac.uk/business/cpd/courses/sph/].

How should I start looking at the source code?
Section 4 of this guide introduces the source files, including some call graphs; then you
should turn to the documentation generated with Doxygen (www.doxygen.org).

How can I create my own geometry?
Different documents can help users to create their own geometry. The input XML file
can be modified following GenCase_XML_GUIDE.pdf and different input formats of
real geometries can be converted using ExternalModelsConversion_GUIDE.pdf. This
manuscript also describes in detail the input files of the different test cases.

How can I contribute to the project?
You can contribute to the DualSPHysics project by reporting bugs, suggesting new
improvements, citing DualSPHysics in your papers if you use it, and submitting your
modified codes together with examples.

How do I prevent fluid particle penetration through boundaries?
Due to the nature of the boundary conditions implemented in DualSPHysics (the
dynamic boundary conditions), particle penetration can be reduced by changing the
following parameters (a sketch of how they might be set follows this list):
- coefsound defines how much the pressure can vary starting from the computed density.
By increasing this value, you allow the pressure of boundary particles to vary within a
higher range, so higher pressures for boundary particles can be reached, avoiding
particle penetration.
- DBCsteps, when set equal to 1, means that you are updating the density/pressure of
boundaries at each time step. The lower this value, the less likely fluid particles will
travel through the walls.
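
A minimal sketch, assuming the attribute-style label syntax described in
GenCase_XML_GUIDE.pdf (the values shown are placeholders, not recommendations):

<coefsound value="20" />
<parameter key="DBCsteps" value="1" />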

How does the code define the limits of the domain?
In the input XML file, the parameters pointmin and pointmax only define the domain in
which particles are created, so no fluid or boundary particles will be created outside
these limits. The limits of the computational domain are computed at the beginning of
the simulation using the initial minimum and maximum positions of the particles that
were already created with GenCase.

How do I prevent the boundary particles from going outside of the domain limits
when applying motion?
As explained in the previous question, the limits of the computational domain are
computed starting from the initial minimum and maximum positions of the particles.
Since these values use the initial configuration, any movement of boundaries that
implies positions beyond these limits will give the error ‘boundary particles out the
domain’. One solution that avoids creating larger tanks or domains is to define
boundary points (drawpoint) at the minimum and maximum positions that the particles
are expected to reach during the simulation, as sketched below.
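
A minimal sketch, assuming the attribute-style drawpoint label described in
GenCase_XML_GUIDE.pdf (the coordinates are placeholders marking the extremes the
moving boundary is expected to reach):

<drawpoint x="-1.0" y="0.0" z="0.0" />
<drawpoint x="4.0" y="1.0" z="2.0" />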

When are fluid particles excluded from the simulation?
Fluid particles are excluded during the simulation (i) if their positions are outside the
limits of the domain (except in the Z+ direction) and (ii) when their density values are
out of a given range.

How do I create a 2-D simulation?
DualSPHysics can also perform 2-D simulations. To generate a 2-D configuration you
only have to change the XML file, imposing the same values along the Y-direction in
the labels that define the limits of the domain where particles are created. For example,
the geometry of CASEWAVEMAKER2D is the 2-D version of the 3-D setup of
CASEWAVEMAKER, obtained by simply setting the same y-value (y=1) in
<pointmin> and <pointmax> in the input XML file.

Can I apply the formulation about floating objects already implemented for any
problem?
As explained in Section 11.7, the formulation of Monaghan et al. (2003) has been
implemented in DualSPHysics and different examples in the literature show the good
agreement of SPH results with experimental data. However, this basic formulation does
not include the correct physics when the floating objects interact with other objects or
with solid walls and friction forces are not taken into account. Thus, only fluid-driven
objects can be reproduced.

How must I cite the use of the code in my paper?
If you use the code in a paper, please cite it with the references [Crespo et al. 2011] and
[Gómez-Gesteira et al. 2012a, 2012b].

14. DualSPHysics future

The new version DualSPHysics v3.0 that will be released before summer 2012 will
include:
 Periodic open boundaries.
 Double precision.
 New MPI implementation.
 MultiGPU.
 New output format files with parallel VTK.

Other features that will be integrated soon into the CPU-GPU solver are:
 SPH-ALE with Riemann Solver.
 Primitive-variable Riemann Solver.
 Variable particle resolution.
 Multiphase (gas-solid-water).
 Inlet/outlet flow conditions.
 Modified virtual boundary conditions.

15. References

1. Bonet J and Lok T-SL (1999) Variational and momentum preservation aspects of
Smoothed Particle Hydrodynamic formulations. Computer Methods in Applied
Mechanics and Engineering 180, 97-115.
2. Crespo AJC, Gómez-Gesteira M and Dalrymple RA (2007) Boundary conditions
generated by dynamic particles in SPH methods. Computers, Materials & Continua, 5,
173-184.
3. Crespo AJC, Gómez-Gesteira M and RA Dalrymple (2008) Modeling Dam Break
Behavior over a Wet Bed by a SPH Technique. Journal of Waterway, Port, Coastal, and
Ocean Engineering, 134(6), 313-320.
4. Crespo AJC, Marongiu JC, Parkinson E, Gómez-Gesteira M and Dominguez JM (2009)
High Performance of SPH Codes: Best approaches for efficient parallelization on GPU
computing. Proc IVth Int SPHERIC Workshop (Nantes), 69-76.
5. Crespo AJC, Dominguez JM, Barreiro A and Gómez-Gesteira M (2010) Development
of a Dual CPU-GPU SPH model. Proc 5th Int SPHERIC Workshop (Manchester), 401-
407.
6. Crespo AJC, Dominguez JM, Barreiro A, Gómez-Gesteira M and Rogers BD (2011)
GPUs, a new tool of acceleration in CFD: Efficiency and reliability on Smoothed
Particle Hydrodynamics methods. PLoS ONE 6 (6), e20685,
doi:10.1371/journal.pone.0020685.
7. Dalrymple RA and Rogers BD (2006) Numerical modeling of water waves with the
SPH method. Coastal Engineering, 53, 141–147.
8. Dominguez JM, Crespo AJC, Gómez-Gesteira M and Marongiu JC (2011) Neighbour
lists in Smoothed Particle Hydrodynamics. International Journal for Numerical Methods
in Fluids, 67, 2026-2042, doi: 10.1002/fld.2481
9. Gómez-Gesteira M and Dalrymple R (2004) Using a 3D SPH method for wave impact
on a tall structure. Journal of Waterway, Port, Coastal, and Ocean Engineering, 130(2),
63-69.
10. Gómez-Gesteira M, Rogers BD, Dalrymple RA and Crespo AJC (2010a) State-of-the-
art of classical SPH for free-surface flows. Journal of Hydraulic Research 48 Extra
Issue: 6–27, doi:10.3826/jhr.2010.0012.
11. Gómez-Gesteira M, Rogers BD, Dalrymple RA, Crespo AJC and Narayanaswamy M
(2010b) User Guide for the SPHysics Code v2.0, available at:
http://wiki.manchester.ac.uk/sphysics/images/SPHysics_v2.0.001_GUIDE.pdf
12. Gómez-Gesteira M, Rogers BD, Crespo AJC, Dalrymple RA, Narayanaswamy M and
Dominguez JM (2012a) SPHysics - development of a free-surface fluid solver- Part 1:
Theory and Formulations. Computers & Geosciences, doi:10.1016/j.cageo.2012.02.029.
13. Gómez-Gesteira M, Crespo AJC, Rogers BD, Dalrymple RA, Dominguez JM and
Barreiro A (2012b) SPHysics - development of a free-surface fluid solver- Part 2:
Efficiency and test cases. Computers & Geosciences, doi:10.1016/j.cageo.2012.02.028.
14. Harada T, Koshizuka S and Kawaguchi Y (2007) Smoothed Particle Hydrodynamics on
GPUs. Proc of Comp Graph Inter, 63-70.

15. Herault A, Bilotta G and Dalrymple RA (2010) SPH on GPU with CUDA. Journal of
Hydraulic Research, 48 Extra Issue, 74–79, doi:10.3826/jhr.2010.0005.
16. Hughes J and Graham D (2010) Comparison of incompressible and weakly-
compressible SPH models for free-surface water flows, Journal of Hydraulic Research,
48 Extra Issue, 105-117, doi:10.3826/jhr.2010.0009.
17. Leimkuhler BJ, Reich S, Skeel RD (1996) Integration methods for molecular
dynamics. IMA Volumes in Mathematics and its Applications. Springer.
18. Lorensen WE, Cline HE (1987) Marching Cubes: A High Resolution 3D Surface
Construction Algorithm, Computer Graphics SIGGRAPH 87 Proceedings, 21, 4, 163-
170.
19. Monaghan, JJ (1992) Smoothed particle hydrodynamics. Annual Review of Astronomy
and Astrophysics 30, 543- 574.
20. Monaghan JJ and Lattanzio JC (1985) A refined particle method for astrophysical
problems. Astron. Astrophys. 149, 135–143.
21. Monaghan JJ and Kos A (1999) Solitary waves on a Cretan beach. Journal of Waterway,
Port, Coastal and Ocean Engineering 125, 145-154.
22. Monaghan JJ, Cas RAF, Kos AM and Hallworth M (1999) Gravity currents descending a
ramp in a stratified tank. Journal of Fluid Mechanics 379, 39–70.
23. Monaghan JJ, Kos A and Issa N (2003) Fluid motion generated by impact. Journal of
Waterway, Port, Coastal and Ocean Engineering 129, 250-259.
24. Omidvar P, Stansby PK and Rogers BD (2012) Wave body interaction in 2D using
smoothed particle hydrodynamics (SPH) with variable particle mass. International
Journal for Numerical Methods in Fluids 68, 686-705, doi:10.1002/fld.2528.
25. Panizzo A (2004) Physical and Numerical Modelling of Subaerial Landslide Generated
Waves. PhD thesis, Universita degli Studi di L'Aquila.
26. Rogers BD, Dalrymple RA, Stansby PK (2010) Simulation of caisson breakwater
movement using SPH. Journal of Hydraulic Research, 48, 135-141,
doi:10.3826/jhr.2010.0013.
27. Vacondio R, Rogers BD and Stansby PK (2011) Smoothed Particle Hydrodynamics:
Approximate zero-consistent 2-D boundary conditions and still shallow-water test.
International Journal for Numerical Methods in Fluids, doi:10.1002/fld.255.
28. Verlet L (1967) Computer experiments on classical fluids. I. Thermodynamical
properties of Lennard-Jones molecules. Physical Review, 159, 98-103.
29. Wendland H (1995) Piecewise polynomial, positive definite and compactly supported
radial functions of minimal degree. Advances in Computational Mathematics 4, 389-
396.

16. Licenses

GPL License

The source code for DualSPHysics is freely redistributable under the terms of the GNU
General Public License (GPL) as published by the Free Software Foundation.
Simply put, the GPL says that anyone who redistributes the software, with or without
changes, must pass along the freedom to further copy and change it. By distributing the
complete source code for GNU DualSPHysics under the terms of the GPL, we
guarantee that you and all other users will have the freedom to redistribute and change
DualSPHysics.
Releasing the source code for DualSPHysics has another benefit as well. By having
access to all of the source code for a mathematical system like SPHysics, you have the
ability to see exactly how each and every computation is performed.
Although enhancements to DualSPHysics are not required to be redistributed under the
terms of the GPL, we encourage you to release your enhancements to SPHysics under
the same terms for the benefit of all users. We also encourage you to submit your
changes for inclusion in future versions of DualSPHysics.

Copyright (C) 2012 by Jose M. Dominguez, Dr. Alejandro Crespo, Prof. M. Gomez
Gesteira, Anxo Barreiro and Dr. Benedict Rogers.

DualSPHysics code belongs to the SPHysics project, www.sphysics.org, an
international collaboration between the University of Vigo (Spain), the University of
Manchester (UK) and Johns Hopkins University (USA).

This file is part of DualSPHysics project.


DualSPHysics is free software: you can redistribute it and/or modify it under the terms
of the GNU General Public License as published by the Free Software Foundation,
either version 3 of the License, or (at your option) any later version.

DualSPHysics is distributed in the hope that it will be useful, but WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
more details.

You should have received a copy of the GNU General Public License, along with
DualSPHysics. If not, see http://www.gnu.org/licenses/.

BSD License

Copyright (c) 2012 by Jose M. Dominguez, Dr Alejandro Crespo, Prof. M. Gómez
Gesteira, Anxo Barreiro and Dr Benedict Rogers. All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are
permitted provided that the following conditions are met:

-Redistributions of source code must retain the above copyright notice, this list of
conditions and the following disclaimer.

-Redistributions in binary form must reproduce the above copyright notice, this list of
conditions and the following disclaimer in the documentation and/or other materials
provided with the distribution.

-Neither the name of the DualSPHysics nor the names of its contributors may be used to
endorse or promote products derived from this software without specific prior written
permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES,
INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED
AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
THE POSSIBILITY OF SUCH DAMAGE.

TinyXml License

www.sourceforge.net/projects/tinyxml

Original code (2.0 and earlier) copyright (c) 2000-2006 Lee Thomason
(www.grinninglizard.com)

This software is provided 'as-is', without any express or implied warranty. In no event
will the authors be held liable for any damages arising from the use of this software.

Permission is granted to anyone to use this software for any purpose, including
commercial applications, and to alter it and redistribute it freely, subject to the
following restrictions:

1. The origin of this software must not be misrepresented; you must not claim that you
wrote the original software. If you use this software in a product, an acknowledgment in
the product documentation would be appreciated but is not required.

2. Altered source versions must be plainly marked as such, and must not be
misrepresented as being the original software.

3. This notice may not be removed or altered from any source distribution.

