Professional Documents
Culture Documents
NWChem User Manual 5.1
NWChem User Manual 5.1
Release 5.1
December 2007
2
DISCLAIMER
This material was prepared as an account of work sponsored by an agency of the United States Government. Neither
the United States Government nor the United States Department of Energy, nor Battelle, nor any of their employees,
MAKES ANY WARRANTY, EXPRESS OR IMPLIED, OR ASSUMES ANY LEGAL LIABILITY OR RESPON-
SIBILITY FOR THE ACCURACY, COMPLETENESS, OR USEFULNESS OF ANY INFORMATION, APPARA-
TUS, PRODUCT, SOFTWARE, OR PROCESS DISCLOSED, OR REPRESENTS THAT ITS USE WOULD NOT
INFRINGE PRIVATELY OWNED RIGHTS.
LIMITED USE
This software (including any documentation) is being made available to you for your internal use only, solely for
use in performance of work directly for the U.S. Federal Government or work under contracts with the U.S. Department
of Energy or other U.S. Federal Government agencies. This software is a version which has not yet been evaluated and
cleared for commercialization. Adherence to this notice may be necessary for the author, Battelle Memorial Institute,
to successfully assert copyright in and commercialize this software. This software is not intended for duplication or
distribution to third parties without the permission of the Manager of Software Products at Pacific Northwest National
Laboratory, Richland, Washington, 99352.
ACKNOWLEDGMENT
This software and its documentation were produced with Government support under Contract Number DE-AC05-
76RL01830 awarded by the United States Department of Energy. The Government retains a paid-up non-exclusive,
irrevocable worldwide license to reproduce, prepare derivative works, perform publicly and display publicly by or for
the Government, including the right to distribute to other Government contractors.
3
4
AUTHOR DISCLAIMER
This software contains proprietary information of the authors, Pacific Northwest National Laboratory (PNNL),
and the US Department of Energy (USDOE). The information herein shall not be disclosed to others, and shall not be
reproduced whole or in part, without written permission from PNNL or USDOE. The information contained in this
document is provided “AS IS” without guarantee of accuracy. Use of this software is prohibited without written per-
mission from PNNL or USDOE. The authors, PNNL, and USDOE make no representations or warranties whatsoever
with respect to this software, including the implied warranty of merchant-ability or fitness for a particular purpose.
The user assumes all risks, including consequential loss or damage, in respect to the use of the software. In addition,
PNNL and the authors shall not be obligated to correct or maintain the program, or notify the user community of
modifications or updates that will be made over the course of time.
Contents
1 Introduction 19
1.1 Citation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.2 User Feedback . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2 Getting Started 23
2.1 Input File Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.2 Simple Input File — SCF geometry optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.3 Water Molecule Sample Input File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.4 Input Format and Syntax for Directives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.4.1 Input Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.4.2 Format and syntax of directives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3 NWChem Architecture 31
3.1 Database Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.2 Persistence of data and restart . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
4 Functionality 35
4.1 Molecular electronic structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
4.2 Relativistic effects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
4.3 Pseudopotential plane-wave electronic structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
4.4 Molecular dynamics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4.5 Python . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4.6 Parallel tools and libraries (ParSoft) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
5 Top-level directives 41
5.1 START and RESTART — Start-up mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
5.2 SCRATCH_DIR and PERMANENT_DIR — File directories . . . . . . . . . . . . . . . . . . . . . . . 43
5
6 CONTENTS
6 Geometries 57
6.1 Keywords on the GEOMETRY directive . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
6.2 SYMMETRY — Symmetry Group Input . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
6.3 Cartesian coordinate input . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
6.4 ZMATRIX — Z-matrix input . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
6.5 ZCOORD — Forcing internal coordinates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
6.6 Applying constraints in geometry optimizations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
6.7 SYSTEM — Lattice parameters for periodic systems . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
7 Basis sets 71
7.1 Basis set library . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
7.2 Explicit basis set definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
7.3 Combinations of library and explicit basis set input . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
13 COSMO 139
16 MP2 175
16.1 FREEZE — Freezing orbitals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176
16.2 TIGHT — Increased precision . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
16.3 SCRATCHDISK — Limiting I/O usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
16.4 PRINT and NOPRINT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
16.5 VECTORS — MO vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
16.6 RI-MP2 fitting basis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
16.7 FILE3C — RI-MP2 3-center integral filename . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
16.8 RIAPPROX — RI-MP2 Approximation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
16.9 Advanced options for RI-MP2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
16.9.1 Control of linear dependence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
16.9.2 Reference Spin Mapping for RI-MP2 Calculations . . . . . . . . . . . . . . . . . . . . . . . 179
16.9.3 Batch Sizes for the RI-MP2 Calculation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180
16.9.4 Energy Memory Allocation Mode: RI-MP2 Calculation . . . . . . . . . . . . . . . . . . . . 181
16.9.5 Local Memory Usage in Three-Center Transformation . . . . . . . . . . . . . . . . . . . . . 181
16.10One-electron properties and natural orbitals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
18 Selected CI 187
18.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187
18.2 Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188
18.3 Configuration Generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
18.3.1 Specifying the reference occupation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
18.3.2 Applying creation-annihilation operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190
18.3.3 Uniform excitation level . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
CONTENTS 11
24 Hessians 223
24.1 Hessian Module Input . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223
24.1.1 Defining the wavefunction threshold . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223
24.1.2 Profile . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 224
24.1.3 Print Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 224
26 DPLOT 229
26.1 GAUSSIAN — Gaussian Cube format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229
26.2 TITLE — Title directive . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229
26.3 LIMITXYZ — Plot limits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229
26.4 SPIN — Density to be plotted . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 230
26.5 OUTPUT — Filename . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 230
26.6 VECTORS — MO vector file name . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 230
26.7 WHERE — Density evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 230
26.8 ORBITAL — Orbital sub-space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 230
26.9 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231
28 Properties 237
28.1 Property keywords . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 238
28.1.1 Nbofile . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 238
29 VSCF 239
31 Prepare 245
31.1 Default database directories . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247
31.2 System name and coordinate source . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247
31.3 Sequence file generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 248
31.4 Topology file generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 250
14 CONTENTS
33 Analysis 275
33.1 System specification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 275
33.2 Reference coordinates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 275
33.3 File specification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 276
33.4 Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 277
33.5 Coordinate analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 278
33.6 Essential dynamics analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 280
33.7 Trajectory format conversion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 280
33.8 Electrostatic potentials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 282
39 Acknowledgments 371
Introduction
NWChem is a computational chemistry package designed to run on high-performance parallel supercomputers. Code
capabilities include the calculation of molecular electronic energies and analytic gradients using Hartree-Fock self-
consistent field (SCF) theory, Gaussian density function theory (DFT), and second-order perturbation theory. For all
methods, geometry optimization is available to determine energy minima and transition states. Classical molecular
dynamics capabilities provide for the simulation of macromolecules and solutions, including the computation of free
energies using a variety of force fields.
NWChem is scalable, both in its ability to treat large problems efficiently, and in its utilization of available parallel
computing resources. The code uses the parallel programming tools TCGMSG and the Global Array (GA) library
developed at PNNL for the High Performance Computing and Communication (HPCC) grand-challenge software
program and the Environmental Molecular Sciences Laboratory (EMSL) Project. NWChem has been optimized to
perform calculations on large molecules using large parallel computers, and it is unique in this regard.
This document is intended as an aid to chemists using the code for their own applications. Users are not expected
to have a detailed understanding of the code internals, but some familiarity with the overall structure of the code,
how it handles information, and the nature of the algorithms it contains will generally be helpful. The following
sections describe the structure of the input file, and give a brief overview of the code architecture. All input directives
recognized by the code are described in detail, with options, defaults, and recommended usages, where applicable.
The appendices present additional information on the molecular geometry and basis function libraries included in the
code.
1.1 Citation
The EMSL Software Agreement stipulates that the use of NWChem will be acknowledged in any publications which
use results obtained with NWChem. The acknowledgment should be of the form:
NWChem Version 5.1, as developed and distributed by Pacific Northwest National Laboratory, P. O. Box
999, Richland, Washington 99352 USA, and funded by the U. S. Department of Energy, was used to
obtain some of these results.
The words “A modified version of” should be added at the beginning, if appropriate. Note: Your EMSL Software
Agreement contains the complete specification of the required acknowledgment.
Please use the following citation when publishing results obtained with NWChem:
19
20 CHAPTER 1. INTRODUCTION
Straatsma, T.P.; Aprà, E.; Windus, T.L.; Bylaska, E.J.; de Jong, W.; Hirata, S.; Valiev, M.; Hackler, M.;
Pollack, L.; Harrison, R.; Dupuis, M.; Smith, D.M.A; Nieplocha, J.; Tipparaju V.; Krishnan, M.; Auer,
A.A.; Brown, E.; Cisneros, G.; Fann, G.; Früchtl, H.; Garza, J.; Hirao, K.; Kendall, R.; Nichols, J.; Tse-
mekhman, K.; Wolinski, K.; Anchell, J.; Bernholdt, D.; Borowski, P.; Clark, T.; Clerc, D.; Dachsel, H.;
Deegan, M.; Dyall, K.; Elwood, D.; Glendening, E.; Gutowski, M.; Hess, A.; Jaffe, J.; Johnson, B.; Ju,
J.; Kobayashi, R.; Kutteh, R.; Lin, Z.; Littlefield, R.; Long, X.; Meng, B.; Nakajima, T.; Niu, S.; Rosing,
M.; Sandrone, G.; Stave, M.; Taylor, H.; Thomas, G.; van Lenthe, J.; Wong, A.; Zhang, Z.; NWChem,
A Computational Chemistry Package for Parallel Computers, Version 4.6 (2004), Pacific Northwest Na-
tional Laboratory, Richland, Washington 99352-0999, USA.
High Performance Computational Chemistry: an Overview of NWChem a Distributed Parallel Applica-
tion, Kendall, R.A.; Aprà, E.; Bernholdt, D.E.; Bylaska, E.J.; Dupuis, M.; Fann, G.I.; Harrison, R.J.; Ju,
J.; Nichols, J.A.; Nieplocha, J.; Straatsma, T.P.; Windus, T.L.; Wong, A.T. Computer Phys. Comm., 2000,
128, 260–283 .
If you use the DIRDYVTST portion of NWChem, please also use the additional citation:
DIRDYVTST, Yao-Yuan Chuang and Donald G. Truhlar, Department of Chemistry and Super Computer
Institute, University of Minnesota; Ricky A. Kendall,Scalable Computing Laboratory, Ames Laboratory
and Iowa State University; Bruce C. Garrett and Theresa L. Windus, Environmental Molecular Sciences
Laboratory, Pacific Northwest Laboratory.
Users can also subscribe to the [email protected] electronic mailing list itself. This is intended
as a general forum through which code users can contact one another and the developers, to share experience with the
code and discuss problems. Announcements of new releases and bug fixes will also be made to this list.
To subscribe to the user list, send a message to
1.2. USER FEEDBACK 21
subscribe nwchem-users
The automated list manager is capable of recognizing a number of commands, including ; “subscribe”, “unsub-
scribe”, “get”, “index”, “which”, “who”, “info” and “lists”. The command “end” halts processing of commands. It
will provide some help if the message includes the line help in the body.
22 CHAPTER 1. INTRODUCTION
Chapter 2
Getting Started
This section provides an overview of NWChem input and program architecture, and the syntax used to describe the
input. See Sections 2.2 and 2.3 for examples of NWChem input files with detailed explanation.
NWChem consists of independent modules that perform the various functions of the code. Examples of modules
include the input parser, SCF energy, SCF analytic gradient, DFT energy, etc.. Data is passed between modules and
saved for restart using a disk-resident database or dumpfile (see Section 3).
The input to NWChem is composed of commands, called directives, which define data (such as basis sets, ge-
ometries, and filenames) and the actions to be performed on that data. Directives are processed in the order presented
in the input file, with the exception of certain start-up directives (see Section 2.1) which provide critical job control
information, and are processed before all other input. Most directives are specific to a particular module and define
data that is used by that module only. A few directives (see Section 5) potentially affect all modules, for instance by
specifying the total electric charge on the system.
There are two types of directives. Simple directives consist of one line of input, which may contain multiple fields.
Compound directives group together multiple simple directives that are in some way related and are terminated with
an END directive. See the sample inputs (Sections 2.2, 2.3) and the input syntax specification (Section 2.4).
All input is free format and case is ignored except for actual data (e.g., names/tags of centers, titles). Directives
or blocks of module-specific directives (i.e., compound directives) can appear in any order, with the exception of the
TASK directive (see sections 2.1 and 5.10) which is used to invoke an NWChem module. All input for a given task
must precede the TASK directive. This input specification rule allows the concatenation of multiple tasks in a single
NWChem input file.
To make the input as short and simple as possible, most options have default values. The user needs to supply
input only for those items that have no defaults, or for items that must be different from the defaults for the particular
application. In the discussion of each directive, the defaults are noted, where applicable.
The input file structure is described in the following sections, and illustrated with two examples. The input format
and syntax for directives is also described in detail.
The structure of an input file reflects the internal structure of NWChem. At the beginning of a calculation, NWChem
needs to determine how much memory to use, the name of the database, whether it is a new or restarted job, where
to put scratch/permanent files, etc.. It is not necessary to put this information at the top of the input file, however.
23
24 CHAPTER 2. GETTING STARTED
NWChem will read through the entire input file looking for the start-up directives. In this first pass, all other directives
are ignored.
The start-up directives are
• START
• RESTART
• SCRATCH_DIR
• PERMANENT_DIR
• MEMORY
• ECHO
After the input file has been scanned for the start-up directives, it is rewound and read sequentially. Input is
processed either by the top-level parser (for the directives listed in Section 5, such as TITLE, SET, . . . ) or by the
parsers for specific computational modules (e.g., SCF, DFT, . . . ). Any directives that have already been processed
(e.g., MEMORY) are ignored. Input is read until a TASK directive (see Section 5.10) is encountered. A TASK directive
requests that a calculation be performed and specifies the level of theory and the operation to be performed. Input
processing then stops and the specified task is executed. The position of the TASK directive in effect marks the end of
the input for that task. Processing of the input resumes upon the successful completion of the task, and the results of
that task are available to subsequent tasks in the same input file.
The name of the input file is usually provided as an argument to the execute command for NWChem. That is, the
execute command looks something like the following;
nwchem input_file
The default name for the input file is nwchem.nw. If an input file name input_file is specified without
an extension, the code assumes .nw as a default extension, and the input filename becomes input_file.nw. If
the code cannot locate a file named either input_file or input_file.nw (or nwchem.nw if no file name is
provided), an error is reported and execution terminates. The following section presents two input files to illustrate the
directive syntax and input file format for NWChem applications.
Examining the input line by line, it can be seen that it contains only four directives; TITLE, GEOMETRY, BASIS,
and TASK. The TITLE directive is optional, and is provided as a means for the user to more easily identify out-
puts from different jobs. An initial geometry is specified in Cartesian coordinates and Angstrøms by means of the
GEOMETRY directive. The Dunning cc-pvdz basis is obtained from the NWChem basis library, as specified by the
BASIS directive input. The TASK directive requests an SCF geometry optimization.
The GEOMETRY directive (Section 6) defaults to Cartesian coordinates and Angstrøms (options include atomic
units and Z-matrix format; see Section 6.4). The input blocks for the BASIS and GEOMETRY directives are structured
in similar fashion, i.e., name, keyword, . . . , end (In this simple example, there are no keywords). The BASIS input
block must contain basis set information for every atom type in the geometry with which it will be used. Refer to
Sections 7 and 8, and Appendix A for a description of available basis sets and a discussion of how to define new ones.
The last line of this sample input file (task scf optimize) tells the program to optimize the molecular
geometry by minimizing the SCF energy. (For a description of possible tasks and the format of the TASK directive,
refer to Section 5.10.)
If the input is stored in the file n2.nw, the command to run this job on a typical UNIX workstation is as follows:
nwchem n2
NWChem output is to UNIX standard output, and error messages are sent to both standard output and standard
error.
start h2o_freq
charge 1
basis
H library sto-3g
O library sto-3g
end
scf
uhf; doublet
print low
26 CHAPTER 2. GETTING STARTED
end
basis
H library 6-31g**
O library 6-31g**
end
The START directive (Section 5.1) tells NWChem that this run is to be started from the beginning. This directive
need not be at the beginning of the input file, but it is commonly placed there. Existing database or vector files are to
be ignored or overwritten. The entry h2o_freq on the START line is the prefix to be used for all files created by the
calculation. This convention allows different jobs to run in the same directory or to share the same scratch directory
(see Section 5.2), as long as they use different prefix names in this field.
As in the first sample problem, the geometry is given in Cartesian coordinates. In this case, the units are specified
as Angstrøms. (Since this is the default, explicit specification of the units is not actually necessary, however.) The
CHARGE directive defines the total charge of the system. This calculation is to be done on an ion with charge +1.
A small basis set (STO-3G) is specified for the intial geometry optimization. Next, the multiple lines of the first
SCF directive in the scf ...end block specify details about the SCF calculation to be performed. Unrestricted
Hartree-Fock is chosen here (by specifying the keyword uhf), rather than the default, restricted open-shell high-
spin Hartree-Fock (ROHF). This is necessary for the subsequent MP2 calculation, because only UMP2 is currently
available for open-shell systems (see Section 4). For open-shell systems, the spin multiplicity has to be specified
(using doublet in this case), or it defaults to singlet. The print level is set to low to avoid verbose output for the
starting basis calculations.
All input up to this point affects only the settings in the runtime database. The program takes its information from
this database, so the sequence of directives up to the first TASK directive is irrelevant. An exchange of order of the
different blocks or directives would not affect the result. The TASK directive, however, must be specified after all
relevant input for a given problem. The TASK directive causes the code to perform the specified calculation using the
parameters set in the preceding directives. In this case, the first task is an SCF calculation with geometry optimization,
specified with the input scf and optimize. (See Section 5.10 for a list of available tasks and operations.)
After the completion of any task, settings in the database are used in subsequent tasks without change, unless
they are overridden by new input directives. In this example, before the second task (task mp2 optimize), a
better basis set (6-31G**) is defined and the title is changed. The second TASK directive invokes an MP2 geometry
optimization.
2.4. INPUT FORMAT AND SYNTAX FOR DIRECTIVES 27
Once the MP2 optimization is completed, the geometry obtained in the calculation is used to perform a frequency
calculation. This task is invoked by the keyword freq in the final TASK directive, task mp2 freq. The second
derivatives of the energy are calculated as numerical derivatives of analytical gradients. The intermediate energies
and gradients are not of interest in this case, so output from the SCF and MP2 modules is disabled with the PRINT
directives.
This section describes the input format and the syntax used in the rest of this documentation to describe the format
of directives. The input format for the directives used in NWChem is similar to that of UNIX shells, which is also
used in other chemistry packages, most notably GAMESS-UK. An input line is parsed into whitespace (blanks or
tabs) separating tokens or fields. Any token that contains whitespace must be enclosed in double quotes in order to be
processed correctly. For example, the basis set with the descriptive name modified Dunning DZ must appear in
a directive as "modified Dunning DZ", since the name consists of three separate words.
A (physical) line in the input file is terminated with a newline character (also known as a ‘return’ or ‘enter’ character).
A semicolon (;) can be also used to indicate the end of an input line, allowing a single physical line of input to contain
multiple logical lines of input. For example, five lines of input for the GEOMETRY directive can be entered as follows;
geometry
O 0 0 0
H 0 1.430 1.107
H 0 -1.430 1.107
end
This one physical input line comprises five logical input lines. Each logical or physical input line must be no longer
than 1023 characters.
In the input file:
• a string, token, or field is a sequence of ASCII characters (NOTE: if the string includes blanks or tabs (i.e., white
space), the entire string must be enclosed in double quotes).
• \ (backslash) at the end of a line concatenates it with the next line. Note that a space character is automatically
inserted at this point so that it is not possible to split tokens across lines. A backslash is also used to quote special
characters such as whitespace, semi-colons, and hash symbols so as to avoid their special meaning (NOTE: these
special symbols must be quoted with the backslash even when enclosed within double quotes).
• ; (semicolon) is used to mark the end of a logical input line within a physical line of input.
• # (the hash or pound symbol) is the comment character. All characters following # (up to the end of the physical
line) are ignored.
28 CHAPTER 2. GETTING STARTED
• If any input line (excluding Python programs, Section 37) begins with the string INCLUDE (ignoring case) and
is followed by a valid file name, then the data in that file are read as if they were included into the current
input file at the current line. Up to three levels of nested include files are supported. The user should note that
inputting a basis set from the standard basis library (Section 7) uses one level of include.
• Data is read from the input file until an end-of-file is detected, or until the string EOF (ignoring case) is encoun-
tered at the beginning of an input line.
Directives consist of a directive name, keywords, and optional input, and may contain one line or many. Simple
directives consist of a single line of input with one or more fields. Compound directives can have multiple input lines,
and can also include other optional simple and compound directives. A compound directive is terminated with an END
directive. The directives START (see Section 5.1) and ECHO (see Section 5.4) are examples of simple directives. The
directive GEOMETRY (see Section 6) is an example of a compound directive.
Some limited checking of the input for self-consistency is performed by the input module, but most defaults are
imposed by the application modules at runtime. It is therefore usually impossible to determine beforehand whether or
not all selected options are consistent with each other.
In the rest of this document, the following notation and syntax conventions are used in the generic descriptions of
the NWChem input.
• a directive name always appears in all-capitals, and in computer typeface (e.g., GEOMETRY, BASIS, SCF). Note
that the case of directives and keywords is ignored in the actual input.
• a keyword always appears in lower case, in computer typeface (e.g., swap, print, units, bqbq).
• variable names always appear in lower case, in computer typeface, and enclosed in angle brackets to distinguish
them from keywords (e.g., <input_filename>, <basisname>, <tag>).
• () is used to group items (the parentheses and other special symbols should not appear in the input).
• < > enclose a type, a name of a value to be specified, or a default value, if any.
An input parameter is identified in the description of the directive by prefacing the name of the item with the type
of data expected, i.e.,
If an input item is not prefaced by one of these type names, it is assumed to be of type “string”.
In addition, integer lists may be specified using Fortran triplet notation, which interprets lo:hi:inc as lo,
lo+inc, lo+2*inc, . . . , hi. For example, where a list of integers is expected in the input, the following two lines
are equivalent
7 10 21:27:2 1:3 99
7 10 21 23 25 27 1 2 3 99
(In Fortran triplet notation, the increment, if unstated, is 1; e.g., 1:3 = 1:3:1.)
The directive VECTORS (Section 10.5) is presented here as an example of an NWChem input directive. The
general form of the directive is as follows:
This directive contains three optional keywords, as indicated by the three main sets of square brackets enclosing the
keywords input, swap, and output. The keyword input allows the user to specify the source of the molecular
orbital vectors. There are two mutually exclusive options for specifying the vectors, as indicated by the || symbol
separating the option descriptions;
The first option, (<string input_movecs default atomic>), allows the user to specify an ASCII
character string for the parameter input_movecs. If no entry is specified, the code uses the default atomic (i.e.,
atomic guess). The second option, (project <string basisname> <string filename>), contains the
keyword project, which takes two string arguments. When this keyword is used, the vectors in file <filename>
will be projected from the (smaller) basis <basisname> into the current atomic orbital (AO) basis.
The second keyword, swap, allows the user to re-order the starting vectors, specifying the pairs of vectors to be
swapped. As many pairs as the user wishes to have swapped can be listed for <integer vec1 vec2 ... >.
The optional keywords alpha and beta allow the user to swap the alpha or beta spin orbitals.
The third keyword, output, allows the user to tell the code where to store the vectors, by specifying an ASCII
string for the parameter output_movecs. If no entry is specified for this parameter, the default is to write the vectors
back into either the user- specified MO vectors input file or, if this is not available, the file $file_prefix$.movecs.
A particular example of the VECTORS directive is shown below. It specifies both the input and output key-
words, but does not use the swap option.
This directive tells the code to generate input vectors by projecting from vectors in a smaller basis named "small basis",
which is stored in the file small_basis.movecs. The output vectors will be stored in the file large_basis.movecs.
The order of keyed optional entries within a directive should not matter, unless noted otherwise in the specific
instructions for a particular directive.
30 CHAPTER 2. GETTING STARTED
Chapter 3
NWChem Architecture
As noted above, NWChem consists of independent modules that perform the various functions of the code. Examples
include the input parser, self-consistent field (SCF) energy, SCF analytic gradient, and density functional theory (DFT)
energy modules. The independent NWChem modules can share data only through a disk-resident database, which is
similar to the GAMESS-UK dumpfile or the Gaussian checkpoint file. This allows the modules to share data, or to
share access to files containing data.
It is not necessary for the user to be intimately familiar with the contents of the database in order to run NWChem.
However, a nodding acquaintance with the design of the code will help in clarifying the logic behind the input re-
quirements, especially when restarting jobs or performing multiple tasks within one job. Section 3.1 gives a general
description of the database.
As described above (Section 2.1), all start-up directives are processed at the beginning of the job by the main
program, and then the input module is invoked. Each input directive usually results in one or more entries being made
in the database. When a TASK directive is encountered, control is passed to the appropriate module, which extracts
relevant data from the database and any associated files. Upon completion of the task, the module will store significant
results in the database, and may also modify other database entries in order to affect the behavior of subsequent
computations.
1. the name of the array, which is a string of ASCII characters (e.g., "reference energies")
2. the type of the data in the array (i.e., real, integer, logical, or character)
3. the number (N) of data items in the array (Note: A scalar is stored as an array of unit length.)
It is possible to enter data directly into the database using the SET directive (see Section 5.7). For example, to store
a (64-bit precision) three-element real array with the name "reference energies" in the database, the directive
is as follows:
31
32 CHAPTER 3. NWCHEM ARCHITECTURE
NWChem determines the data to be real (based on the type of the first element, 0.0), counts the number of elements
in the array, and enters the array into the database.
Much of the data stored in the database is internally managed by NWChem and should not be modified by the user.
However, other data, including some NWChem input options, can be freely modified.
Objects are built in the database by storing associated data as multiple entries, using an internally consistent
naming convention. This data is managed exclusively by the subroutines (or methods) that are associated with the
object. Currently, the code has two main objects: basis sets and geometries. Sections 6 and 7 present a complete
discussion of the input to describe these objects.
As an illustration of what comprises a geometry object, the following table contains a partial listing of the database
contents for a water molecule geometry named "test geom". Each entry contains the field test geom, which is
the unique name of the object.
Entry Type[nelem]
--------------------------- ----------------------
geometry:test geom:efield double[3]
geometry:test geom:coords double[9]
geometry:test geom:ncenter int[1]
geometry:test geom:charges double[3]
geometry:test geom:tags char[6]
...
Using this convention, multiple instances of objects may be stored with different names in the same database. For
example, if a user needed to do calculations considering alternative geometries for the water molecule, an input file
could be constructed containing all the geometries of interest by storing them in the database under different names.
The runtime database contents for the file h2o.db listed above were generated from the user-specified input
directive,
The GEOMETRY directive allows the user to specify the coordinates of the atoms (or centers), and identify the geometry
with a unique name. (Refer to Section 6 for a complete description of the GEOMETRY directive.)
Unless a specific name is defined for the geometry, (such as the name "test geom" shown in the example), the
default name of geometry is assigned. This is the geometry name that computational modules will look for when
executing a calculation. The SET directive can be used in the input to force NWChem to look for a geometry with
a name other than geometry. For example, to specify use of the geometry with the name "test geom" in the
example above, the SET directive is as follows:
NWChem will automatically check for such indirections when loading geometries. Storage of data associated with
basis sets, the other database resident object, functions in a similar fashion, using the default name "ao basis".
Functionality
NWChem provides many methods to compute the properties of molecular and periodic systems using standard quan-
tum mechanical descriptions of the electronic wavefunction or density. In addition, NWChem has the capability to
perform classical molecular dynamics and free energy simulations. These approaches may be combined to perform
mixed quantum-mechanics and molecular-mechanics simulations.
NWChem is available on almost all high performance computing platforms, workstations, PCs running LINUX,
as well as clusters of desktop platforms or workgroup servers. NWChem development has been devoted to provid-
ing maximum efficiency on massively parallel processors. It achieves this performance on the 1960 processors HP
Itanium2 system in the EMSL’s MSCF. It has not been optimized for high performance on single processor desktop
systems.
The following methods are available to calculate energies and analytic first derivatives with respect to atomic
coordinates. Second derivatives are computed by finite difference of the first derivatives.
35
36 CHAPTER 4. FUNCTIONALITY
The following methods are available to compute energies only. First and second derivatives are computed by finite
difference of the energies.
• COSMO energies - the continuum solvation ‘COnductor-like Screening MOdel’ of A. Klamt and G. Schüür-
mann to describe dielectric screening effects in solvents.
• Python
• the POLYRATE direct dynamics software
4.2. RELATIVISTIC EFFECTS 37
• Spin-free and spin-orbit one-electron Douglas-Kroll and zeroth-order regular approximations (ZORA) are avail-
able for all quantum mechanical methods and their gradients.
• Dyall’s spin-free Modified Dirac Hamiltonian approximation is available for the Hartree-Fock method and its
gradients.
• One-electron spin-orbit effects can be included via spin-orbit potentials. This option is available for DFT and
its gradients, but has to be run without symmetry.
• PSPW - (Pseudopotential plane-wave) A gamma point code for calculating molecules, liquids, crystals, and
surfaces.
• Band - A prototype band structure code for calculating crystals and surfaces with small band gaps (e.g. semi-
conductors and metals)
With
• Pseudopotential libraries
• Orthorhombic simulation cells with periodic and free space boundary conditions.
• Mulliken, point charge, DPLOT (wavefunction, density and electrostatic potential plotting) analysis
38 CHAPTER 4. FUNCTIONALITY
• Energy minimization
• Effective pair potentials (functional form used in AMBER, GROMOS, CHARMM, etc.)
• SHAKE constraints
NWChem also has the capability to combine classical and quantum descriptions in order to perform:
• Mixed quantum-mechanics and molecular-mechanics (QM/MM) minimizations and molecular dynamics simu-
lation , and
• Quantum molecular dynamics simulation by using any of the quantum mechanical methods capable of returning
gradients.
By using the DIRDYVTST module of NWChem, the user can write an input file to the POLYRATE program, which
can be used to calculate rate constants including quantum mechanical vibrational energies and tunneling contributions.
4.5 Python
The Python programming language has been embedded within NWChem and many of the high level capabilities of
NWChem can be easily combined and controlled by the user to perform complex operations.
4.6. PARALLEL TOOLS AND LIBRARIES (PARSOFT) 39
Top-level directives
Top-level directives are directives that can affect all modules in the code. Some specify molecular properties (e.g.,
total charge) or other data that should apply to all subsequent calculations with the current database. However, most
top-level directives provide the user with the means to manage the resources for a calculation and to start computations.
As the first step in the execution of a job, NWChem scans the entire input file looking for start-up directives, which
NWChem must process before all other input. The input file is then rewound and processed sequentially, and each
directive is processed in the order in which it is encountered. In this second pass, start-up directives are ignored.
The following sections describe each of the top-level directives in detail, noting all keywords, options, required
input, and defaults.
(RESTART || START) \
[<string file_prefix default $input_file_prefix$>] \
[rtdb <string rtdb_file_name default $file_prefix$.db>]
The START directive indicates that the calculation is one in which a new database is to be created. Any rel-
evant information that already exists in a previous database of the same name is destroyed. The string variable
<file_prefix> will be used as the prefix to name any files created in the course of the calculation.
E.g., to start a new calculation on water, one might specify
start water
41
42 CHAPTER 5. TOP-LEVEL DIRECTIVES
file name /home/dave/job.2.nw yields job.2 as the file prefix, if a name is not assigned explicitly using the
START directive.
The user also has the option of specifying a unique name for the database, using the keyword rtdb. When this
keyword is entered, the string entered for rtdb_file_name is used as the database name. If the keyword rtbd is
omitted, the name of the database defaults to $<file_prefix>$.db in the directory for permanent files.
If a calculation is to start from a previous calculation and go on using the existing database, the RESTART directive
must be used. In such a case, the previous database must already exist. The name specified for <file_prefix>
usually should not be changed when restarting a calculation. If it is changed, NWChem will not be able to find needed
files when going on with the calculation.
In the most common situation, the previous calculation was completed (with or without an error condition), and it
is desired to perform a new task or restart the previous one, perhaps with some input changes. In these instances, the
RESTART directive should be used. This reuses the previous database and associated files, and reads the input file for
new input and task information.
The RESTART directive looks immediately for new input and task information, deleting information about pre-
vious incomplete tasks. For example, when doing a RESTART there is no need to specify geometry or basis set
declaration because the program will detect this information since it is stored in the run-time database.
If a calculation runs out of time, for example because it is on a queuing system, this is another instance where doing
a RESTART is advisable. Simply include nothing after the RESTART directive except those tasks that are unfinished.
NOTE: Due to changes in the runtime database structure, RESTART will not work on database files generated by
NWChem versions 4.0.1 and older.
To summarize the default options for this start-up directive, if the input file does not contain a START or a
RESTART directive, then
• the variable <file_prefix> is assigned the name of the input file for the job, without the suffix (which is
usually .nw)
• the variable <rtdb_file_name> is assigned the default name, $file_prefix$.db
If the database with name $file_prefix$.db does not already exist, the calculation is carried out as if a START
directive had been encountered. If the database with name $file_prefix$.db does exist, then the calculation is
performed as if a RESTART directive had been encountered.
For example, NWChem can be run using an input file with the name water.nw by typing the UNIX command
line,
nwchem water.nw
If the NWChem input file water.nw does not contain a START or RESTART directive, the code sets the vari-
able <file_prefix> to water. Files created by the job will have this prefix, and the database will be named
water.db. If the database water.db does not exist already, the code behaves as if the input file contains the
directive,
start water
If the database water.db does exist, the code behaves as if the input file contained the directive,
restart water
5.2. SCRATCH_DIR AND PERMANENT_DIR — FILE DIRECTORIES 43
Directories are extracted from the user input by executing the following steps, in sequence:
1. Look for a directory qualified by the process ID number of the invoking process. Processes are numbered from
zero. Else,
2. If there is a list of directories qualified by the name of the host machine1 , then use round-robin allocation from
the list for processes executing on the given host. Else,
3. If there is a list of directories unqualified by any hostname or process ID, then use round-robin allocation from
this list.
If directory allocation directive(s) are not specified in the input file, or if no match is found to the directory names
specified by input using these directives, then the steps above are executed using the installation-specific defaults. If
the code cannot find a valid directory name based on the input specified in either the directive(s) or the system defaults,
files are automatically written to the current working directory (".").
The following is a list of examples of specific allocations of scratch directory locations:
• Put scratch files from all processes in the local scratch directory (Warning: the definition of “local scratch
directory” may change from machine to machine):
scratch_dir /localscratch
• Put scratch files from Process 0 in /piofs/rjh, but put all other scratch files in /scratch:
• Put scratch files from Process 0 in directory scr1, those from Process 1 in scr2, and so forth, in a round-robin
fashion, using the given list of directories:
1 As returned by util_hostname() which maps to the output of the command hostname on Unix workstations.
44 CHAPTER 5. TOP-LEVEL DIRECTIVES
• Allocate files in a round-robin fashion from host-specific lists for processes distributed across two SGI multi-
processor machines (node names coho and bohr):
• integer
• byte
• kb (kilobytes)
• mb (megabytes)
In most cases, the user need specify only the total memory limit to adjust the amount of memory used by NWChem.
The following specifications all provide for eight megabytes of total memory (assuming 64-bit floating point numbers),
which will be distributed according to the default partitioning:
memory 1048576
memory 1048576 real
memory 1 mw
memory 8 mb
memory total 8 mb
memory total 1048576
5.4. ECHO — PRINT INPUT FILE 45
In NWChem there are three distinct regions of memory: stack, heap, and global. Stack and heap are node-private,
while the union of the global region on all processors is used to provide globally-shared memory. The allowed lim-
its on each category are determined from a default partitioning (currently 25% heap, 25% stack, and 50% global).
Alternatively, the keywords stack, heap, and global can be used to define specific allocations for each of these
categories. If the user sets only one of the stack, heap, or global limits by input, the limits for the other two categories
are obtained by partitioning the remainder of the total memory available in proportion to the weight of those two cat-
egories in the default memory partitioning. If two of the category limits are given, the third is obtained by subtracting
the two given limits from the total limit (which may have been specified or may be a default value). If all three category
limits are specified, they determine the total memory allocated. However, if the total memory is also specified, it must
be larger than the sum of all three categories. The code will abort if it detects an inconsistent memory specification.
The following memory directives also allocate 8 megabytes, but specify a complete partitioning as well:
The optional keywords verify and noverify in the directive give the user the option of enabling or disabling
automatic detection of corruption of allocated memory. The default is verify, which enables the feature. This
incurs some overhead (which can be around 10% increase in walltime on some platforms), which can be eliminated
by specifying noverify.
The keywords hardfail and nohardfail give the user the option of forcing (or not forcing) the local memory
management routines to generate an internal fatal error if any memory operation fails. The default is nohardfail,
which allows the code to continue past any memory operation failure, and perhaps generate a more meaningful error
message before terminating the calculation. Forcing a hard-fail can be useful when poorly coded applications do not
check the return status of memory management routines.
When assigning the specific memory allocations using the keywords stack, heap, and global in the MEMORY
directive, the user should be aware that some of the distinctions among these categories of memory have been blurred in
their actual implementation in the code. The memory allocator (MA) allocates both the heap and the stack from a single
memory region of size heap+stack, without enforcing the partition. The heap vs. stack partition is meaningful only
to applications developers, and can be ignored by most users. Further complicating matters, the global array (GA)
toolkit is allocated from within the MA space on distributed memory machines, while on shared-memory machines it
is separate2 .
On distributed memory platforms, the MA region is actually the total size of
stack+heap+global
All three types of memory allocation compete for the same pool of memory, with no limits except on the total available
memory. This relaxation of the memory category definitions usually benefits the user, since it can allow allocation
requests to succeed where a stricter memory model would cause the directive to fail. These implementation character-
istics must be kept in mind when reading program output that relates to memory usage.
Standard default for memory is currently 400 MB.
ECHO
The ECHO directive is processed only once, by Process 0 when the input file is read.
The character string <title> is assigned to the contents of the string following the TITLE directive. If the string
contains white space, it must be surrounded by double quotes. For example,
The title is stored in the database and will be used in all subsequent tasks/jobs until redefined in the input.
In addition, it is possible to enable the printing of specific items by naming them in the PRINT directive in
the <list_of_names>. Items identified in this way will be printed, regardless of the overall print level speci-
fied. Similarly, the NOPRINT directive can be used to suppress the printing of specific items by naming them in its
<list_of_names>. These items will not be printed, regardless of the overall print level, or the specific print level
of the individual items.
The list of items that can be printed for each module is documented as part of the input instructions for that module.
The items recognized by the top level of the code, and their thresholds, are:
The following example shows how a PRINT directive for the top level process can be used to limit printout to only
essential information. The directive is
This directive instructs the NWChem main program to print nothing, except for the memory usage statistics
(ma stats) and the names of all items stored in the database at the end of the job.
The print level within a module is inherited from the calling layer. For instance, by specifying the print to be low
within the MP2 module will cause the SCF, CPHF and gradient modules when invoked from the MP2 to default to
low print. Explicit user input of print thresholds overrides the inherited value.
The entry for variable <name> is the name of data to be entered into the database. This must be specified; there
is no default. The variable <type>, which is optional, allows the user to define a string specifying the type of data in
the array <name>. The data type can be explicitly specified as integer, real, double, logical, or string.
If no entry for <type> is specified on the directive, its value is inferred from the data type of the first datum. In such
a case, floating-point data entered using this directive must include either an exponent or a decimal point, to ensure
that the correct default type will be inferred. The correct default type will be inferred for logical values if logical-true
48 CHAPTER 5. TOP-LEVEL DIRECTIVES
values are specified as .true., true, or t, and logical-false values are specified as .false., false, or f. One
exception to the automatic detection of the data type is that the data type must be explicitly stated to input integer
ranges, unless the first element in the list is an integer that is not a range (c.f., 2.4). For example,
will not work since the first element will be interpreted as a string and not an integer. To work around this feature, use
instead
geometry "Ar+ghost"
Ar1 0 0 0
Bq2 0 0 2
end
basis
Ar1 library aug-cc-pvdz
Ar2 library aug-cc-pvdz
Bq2 library Ar aug-cc-pvdz
end
This input tells the code to perform MP2 energy calculations on an argon dimer in the first task, and then on the
argon atom in the presence of the “ghost” basis of the other atom.
The SET directive can also be used as an indirect means of supplying input to a part of the code that does not have a
separate input module (e.g., the atomic SCF, Section 10.5.2). Additional examples of applications of this directive can
5.8. UNSET — DELETE DATA IN THE RTDB 49
be found in the sample input files (see Section 2.3), and its usage with basis sets (Section 7) and geometries (Section
6). Also see Section 3.1 for an example of how to store an array in the database.
This directive cannot be used with complex objects such as geometries and basis sets3 . A wild-card (*) specified
at the end of the string <name> will cause all entries whose name begins with that string to be deleted. This is very
useful as a way to reset modules to their default behavior, since modules typically store information in the database
with names that begin with module:. For example, the SCF program can be restored to its default behavior by
deleting all database entries beginning with scf:, using the directive
unset scf:*
set mylist 1 2 3 4
unset mylist
STOP
As soon as this directive is encountered, all processing ceases and the calculation terminates with an error condi-
tion.
database is persistent, multiple tasks within one job behave exactly the same as multiple restart jobs with the same
sequence of input.
There are four main forms of the the TASK directive. The most common form is used to tell the code at what
level of theory to perform an electronic structure calculation, and which specific calculations to perform. The second
form is used to specify tasks that do not involve electronic structure calculations or tasks that have not been fully
implemented at all theory levels in NWChem, such as simple property evaluations. The third form is used to execute
UNIX commands on machines having a Bourne shell. The fourth form is specific to combined quantum-mechanics
and molecular-mechanics (QM/MM) calculations.
By default, the program terminates when a task does not complete successfully. The keyword ignore can be used
to prevent this termination, and is recognized by all forms of the TASK directive. When a TASK directive includes the
keyword ignore, a warning message is printed if the task fails, and code execution continues with the next task. An
example of this feature is given in the sample input file in Section 11.5.
The input options, keywords, and defaults for each of these four forms for the TASK directive are discussed in the
following sections.
This is the most commonly used version of the TASK directive, and it has the following form:
The string <theory> specifies the level of theory to be used in the calculations for this task. NWChem currently
supports ten different options. These are listed below, with the corresponding entry for the variable <theory>:
• scf — Hartree-Fock
• dft — Density functional theory for molecules
• sodft — Spin-Orbit Density functional theory
• mp2 — MP2 using a semi-direct algorithm
• direct_mp2 — MP2 using a full-direct algorithm
• rimp2 — MP2 using the RI approximation
• ccsd — Coupled-cluster single and double excitations
• ccsd(t) — Coupled-cluster linearized triples approximation
• ccsd+t(ccsd) — Fourth order triples contribution
• mcscf — Multiconfiguration SCF
• selci — Selected configuration interaction with perturbation correction
• md — Classical molecular dynamics simulation
• pspw — Pseudopotential plane-wave density functional theory for molecules and insulating solids using NWPW
• band — Pseudopotential plane-wave density functional theory for solids using NWPW
• tce — Tensor Contraction Engine (please see Section 15.4 for a complete description of this task directive
5.10. TASK — PERFORM A TASK 51
The string <operation> specifies the calculation that will be performed in the task. The default operation is a
single point energy evaluation. The following list gives the selection of operations currently available in NWChem:
NOTE: See Section 36.1 for the complete list of operations that accompany the NWPW module.
The user should be aware that some of these operations (gradient, optimize, dynamics, thermodynamics) require
computation of derivatives of the energy with respect to the molecular coordinates. If analytical derivatives are not
available (Section 4), they must be computed numerically, which can be very computationally intensive.
Here are some examples of the TASK directive, to illustrate the input needed to specify particular calculations with
the code. To perform a single point energy evaluation using any level of theory, the directive is very simple, since the
energy evaluation is the default for the string operation. For an SCF energy calculation, the input line is simply
task scf
Similarly, to perform a geometry optimization using density functional theory, the TASK directive is
The optional keyword ignore can be used to allow execution to continue even if the task fails, as discussed
above. An example with the keyword ignore can be found in Section 11.5.
This form of the TASK directive is used in instances where the task to be performed does not fit the model of the
previous version (such as execution of a Python program, Section 37), or if the operation has not yet been implemented
in a fashion that applies to a wide range of theories (e.g., property evaluation). Instead of requiring theory and
operation as input, the directive needs only a string identifying the task. The form of the directive in such cases is
as follows:
52 CHAPTER 5. TOP-LEVEL DIRECTIVES
The supported tasks that can be accessed with this form of the TASK directive are listed below, with the corre-
sponding entries for string variable <task>.
This directive also recognizes the keyword ignore, which allows execution to continue after a task has failed.
This form of the TASK directive is supported only on machines with a fully UNIX-style operating system. This
directive causes specified processes to be executed using the Bourne shell. This form of the task directive is:
The keyword shell is required for this directive. It specifies that the given command will be executed in the
Bourne shell. The user can also specify which process(es) will execute this command by entering values for process
on the directive. The default is for only process zero to execute the command. A range of processes may be specified,
using Fortran triplet notation4 . Alternatively, all processes can be specified simply by entering the keyword all. The
input entered for command must form a single string, and must consist of valid UNIX command(s). If the string
includes white space, it must be enclosed in double quotes.
For example, the TASK directive to tell process zero to copy the molecular orbitals file to a backup location
/piofs/save can be input as follows:
The TASK directive to tell all processes to list the contents of their /scratch directories is as follows:
The TASK directive to tell processes 0 to 10 to remove the contents of the current directory is as follows:
Note that NWChem’s ability to quote special input characters is very limited when compared with that of the
Bourne shell. To execute all but the simplest UNIX commands, it is usually much easier to put the shell script in a file
and execute the file from within NWChem.
4 The notation lo:hi:inc denotes the integers lo, lo+inc, lo+2*inc, . . . , hi
5.10. TASK — PERFORM A TASK 53
This is very similar to the most commonly used version of the TASK directive described in Section 5.10.1, and it has
the following form;
The string <theory> specifies the QM theory to be used in the QM/MM simulation5 . The level of theory may be
any QM method that can compute gradients but those algorithms in NWChem that do not support analytic gradients
should be avoided (c.f., Section 4).
The string <operation> is used to specify the calculation that will be performed in the QM/MM task. The
default operation is a single point energy evaluation. The following list gives the selection of operations currently
available in the NWChem QM/MM module;
Here are some examples of the TASK directive for QM/MM simulations. To perform a single point energy of a
QM/MM system using any QM level of theory, the directive is very simple. As with the general task directive, the
QM/MM energy evaluation is the default. For a DFT energy calculation the task directive input is,
or completely as
To do a molecular dynamics simulation of a QM/MM system using the SCF level of theory the task directive input
would be
The optional keyword ignore can be used to allow execution to continue even if the task fails, as discussed
above.
NWChem computes the basis set superposition error (BSSE) when two or more fragments are interacting by using
the counterpoise method. This directive is performed if the BSSE section is present. Single point energies, energy
gradients, geometry optimizations, Hessians and frequencies, at the level of theory that allows these tasks, can be
obtained with the BSSE correction. The input options for the BSSE section are:
BSSE
MON <string monomer name> <integer natoms> \
5 If theory is “md” this is not a QM/MM simulation and will result in an appropriate error
54 CHAPTER 5. TOP-LEVEL DIRECTIVES
MON defines the monomer’s name and its atoms; <string monomer name> defines the name of the monomer,
<integer atoms> is the list of atoms corresponding to the monomer (where such a list is relative to the initial geometry).
This information is needed for each monomer. With the tag INPUT the user can modify any calculation attributes for
each monomer without ghost. For example, the iterations number and the grid can be changed in a DFT calculation
(see the example of the interaction between Zn2+ and water). INPUT_WGHOST is the same than INPUT but for the
monomer with ghost. The input changes will be applied within this and for the following calculations, you should
be cautious reverting the changes for the next monomers. CHARGE assigns a charge to a monomer and it must be
consistent with the total charge in the whole system (see Section 5.11). The options OFF and ON turns off and on any
BSSE calculation.
The energy evaluation involves 1 + 2N calculations, i.e. one for the supermolecule and two for the N monomers. [S.
Simon, M. Duran, J. J. Dannenberg, J. Chem. Phys., 105, 11024 (1996)] NWChem stores the vector files for each cal-
culation (<string monomer name>.bsse.movecs), and one hessian file (<string monomer name>.bsse.hess).
The code does not assign automatically the basis set for the ghost atoms, you must assign the corresponding bqX for
each element, instead.
Examples
title dimer
start dimer
geometry units angstrom
symmetry c1
F 1.47189 2.47463 -0.00000
H 1.47206 3.29987 0.00000
F 1.46367 -0.45168 0.00000
H 1.45804 0.37497 -0.00000
end
title znwater
start znwater
echo
geometry noautoz units angstrom
symmetry c1
Zn -1.89334 -0.72741 -0.00000
O -0.20798 0.25012 0.00000
H -0.14200 1.24982 -0.00000
H 0.69236 -0.18874 -0.00000
end
bsse
mon metal 1
charge 2
input_wghost "scf\; maxiter 200\; end"
mon water 2 3 4
end
The default charge6 is zero if this directive is omitted. An example of a case where the directive would be needed
is for a calculation on a doubly charged cation. In such a case, the directive is simply,
6 The charge directive, in conjunction with the charges of atomic nuclei (which can be changed via the geometry input, cf. Section 6.3),
determines the total number of electrons in the chemical system. Therefore, a charge n specification removes "n" electrons from the chemical
system. Similarly, charge -n adds "n" electrons.
56 CHAPTER 5. TOP-LEVEL DIRECTIVES
charge 2
If centers with fractional charge have been specified (Section 6) the net charge of the system should be adjusted to
ensure that there are an integral number of electrons.
The charge may be changed between tasks, and is used by all wavefunction types. For instance, in order to
compute the first two vertical ionization energies of LiH, one might optimize the geometry of LiH using a UHF SCF
wavefunction, and then perform energy calculations at the optimized geometry on LiH + and LiH 2+ in turn. This is
accomplished with the following input:
charge 1
scf; uhf; doublet; end
task scf
charge 2
scf; uhf; singlet; end
task scf
The GEOMETRY, BASIS, and SCF directives are described below (Sections 6, 7 and 10 respectively) but their intent
should be clear. The TASK directive is described above (Section 5.10).
The entry for variable <name> is the name of the file that will contain the Ecce import information and should
include the full path to the directory where you want that file. For example
ecce_print /home/user/job/ecce.out
If the full path is not given and only the file name is given, the file will be located in whatever directory the job is
started in. For example, if the line
ecce_print ecce.out
is in the input file, the file could end up in the scratch directory if the user is using a batch script that copies the
input file to a local scratch directory and then launches NWChem from there. If the system then automatically removes
files in the scratch space at the end of the job, the ecce.out file will be lost. So, the best practice is to include the full
path name for the file.
Chapter 6
Geometries
The GEOMETRY directive is a compound directive that allows the user to define the geometry to be used for a given
calculation. The directive allows the user to specify the geometry with a relatively small amount of input, but there
are a large number of optional keywords and additional subordinate directives that the user can specify, if needed. The
directive therefore appears to be rather long and complicated when presented in its general form, as follows:
[VARIABLES
<string symbol> <real value>
... ]
57
58 CHAPTER 6. GEOMETRIES
[CONSTANTS
<string symbol> <real value>
... ]
(END || ZEND)]
[ZCOORD
CVR_SCALING <real value>
BOND <integer i> <integer j> \
[<real value>] [<string name>] [constant]
ANGLE <integer i> <integer j> \
[<real value>] [<string name>] [constant]
TORSION <integer i> <integer j> <integer k> <integer l> \
[<real value>] [<string name>] [constant]
END]
END
• keywords on the first line of the directive (to specify such optional input as the geometry name, input units, and
print level for the output)
• symmetry information
• Cartesian coordinates or Z-matrix input to specify the locations of the atoms and centers
• lattice parameters (needed only for periodic systems)
The following sections present the input for this compound directive in detail, describing the options available and
the usages of the various keywords in each of the three main parts.
All of the keywords and input on this line are optional. The following list describes all options and their defaults.
• <name> – user-supplied name for the geometry; the default name is geometry, and all NWChem modules
look for a geometry with this name. However, multiple geometries may be specified by using a different name
for each. Subsequently, the user can direct a module to a named geometry by using the SET directive (see the
example in Section 5.7) to associate the default name of geometry with the alternate name.
• units – keyword specifying that a value will be entered by the user for the string variable <units>. The
default units for the geometry input are Angstrøms (Note: atomic units or Bohr are used within the code, regard-
less of the option specified for the input units. The default conversion factor used in the code to convert from
Angstrøms to Bohr is 1.8897265 which may be overidden with the angstrom_to_au keyword described
below.). The code recognizes the following possible values for the string variable <units>:
– angstroms or an — Angstroms (Å), the default (converts to A.U. using the Åto A.U. conversion factor)
– au or atomic or bohr — Atomic units (A.U.)
– nm or nanometers — nanometers (converts to A.U. using a conversion factor computed as 10.0 times
the Å to A.U. conversion factor)
– pm or picometers — picometers (converts to A.U. using a conversion factor computed as 0.01 times
the Å to A.U. conversion factor)
• angstrom_to_au – may also be specified as ang2au. This enables the user to modify the conversion factors
used to convert between Å and A.U.. The default value is 1.8897265.
• bqbq – keyword to specify the treatment of interactions between dummy centers. The default in NWChem is
to ignore such interactions when computing energies or energy derivatives. These interactions will be included
if the keyword bqbq is specified.
• print and noprint – complementary keyword pair to enable or disable printing of the geometry. The default
is to print the output associated with the geometry. In addition, the keyword print may be qualified by the
additional keyword xyz, which specifies that the coordinates should be printed in the XYZ format of molecular
graphics program XMol.
• center and nocenter – complementary keyword pair to enable or disable translation of the center of nuclear
charge to the origin. With the origin at this position, all three components of the nuclear dipole are zero. The
default is to move the center of nuclear charge to the origin.
• autosym – keyword to specify that the symmetry of the geometric system should be automatically determined.
This option is on by default. Only groups up to and including Oh are recognized. Occasionally NWChem will
be unable to determine the full symmetry of a molecular system, but will find a proper subgroup of the full
symmetry. The default tolerance is set to work for most cases, but may need to be decreased to find the full
symmetry of a geometry. Note that autosym will be turned off if the SYMMETRY group input is given (See
section 6.2).
• noautoz – by default NWChem (release 3.3 and later) will generate redundant internal coordinates from user
input Cartesian coordinates. The internal coordinates will be used in geometry optimizations. The noautoz
keyword disables use of internal coordinates. The autoz keyword is provided only for backward compatibility.
See Section 6.5 for a more detailed description of redundant internal coordinates, including how to force the
definition of specific internal variables in combination with automatically generated variables.
60 CHAPTER 6. GEOMETRIES
• adjust – This indicates that an existing geometry is to be adjusted. Only new input for the redundant internal
coordinates may be provided (Section 6.5). It is not possible to define new centers or to modify the point group
using this keyword. See Section 6.5 for an example of its usage.
• nucleus – keyword to specify the default model for the nuclear charge distribution. The following values are
recognized:
NOTE: If you specify a finite nuclear size, you should ensure that the basis set you use is contracted for a finite
nuclear size. See the Section 7 for more information.
The following examples illustrate some of the various options that the user can specify on the first input line of the
GEOMETRY directive, using the keywords and input options described above.
The following directives all specify the same geometry for H2 (a bond length of 0.732556 Å):
The keyword group is optional, and can be omitted without affecting how the input for this directive is processed1 .
However, if the SYMMETRY directive is used, a group name must be specified by supplying an entry for the string
variable <group_name>. The group name should be specified as the standard Schöflies symbol. Examples of
expected input for the variable group_name include such entries as:
useful.
6.3. CARTESIAN COORDINATE INPUT 61
The SYMMETRY directive is optional. The default is no symmetry (i.e., C1 point group). Automatic detection of
point group symmetry is available through the use of autosym in the GEOMETRY directive main line (discussed in
Section 6.1). Note: if the SYMMETRY directive is present the autosym keyword is ignored.
If only symmetry-unique atoms are specified, the others will be generated through the action of the point group
operators, but the user if free to specify all atoms. The user must know the symmetry of the molecule being modeled,
and be able to specify the coordinates of the atoms in a suitable orientation relative to the rotation axes and planes of
symmetry. Appendix C lists a number of examples of the GEOMETRY directive input for specific molecules having
symmetry patterns recognized by NWChem. The exact point group symmetry will be forced upon the molecule, and
atoms within 10−3 A.U. of a symmetry element (e.g., a mirror plane or rotation axis) will be forced onto that element.
Thus, it is not necessary to specify to a high precision those coordinates that are determined solely by symmetry.
The keyword print gives information concerning the point group generation, including the group generators, a
character table, the mapping of centers, and the group operations.
The keyword tol relates to the accuracy with which the symmetry-unique atoms should be specified. When the
atoms are generated, those that are within the tolerance, tol, are considered the same.
The string <tag> is the name of the atom or center, and its case (upper or lower) is important. The tag is limited
to 16 characters and is interpreted as follows:
• If the entry for <tag> begins with either the symbol or name of an element (regardless of case), then the
center is treated as an atom of that type. The default charge is the atomic number (adjusted for the presence
of ECPs by the ECP NELEC directive ; see Section 8). Additional characters can be added to the string, to
distinguish between atoms of the same element (For example, the tags oxygen, O, o34, olonepair, and
Oxygen-ether, will all be interpreted as oxygen atoms.).
• If the entry for <tag> begins with the characters bq or x (regardless of case), then the center is treated as a
dummy center with a default zero charge (Note: a tag beginning with the characters xe will be interpreted as a
xenon atom rather than as a dummy center.). Dummy centers may optionally have basis functions or non-zero
charge. See Section B.2 for a sample input using dummy centers with charges.
It is important to be aware of the following points regarding the definitions and usage of the values specified for
the variable <tag> to describe the centers in a system:
• If the tag begins with characters that cannot be matched against an atom, and those characters are not BQ or X,
then a fatal error is generated.
62 CHAPTER 6. GEOMETRIES
• The tag of a center is used in the BASIS (Section 7) and ECP (Section 8) directives to associate functions with
centers.
• All centers with the same tag will have the same basis functions.
• When using automatic symmetry detection, only centers with the same tag will be candidates for testing for
symmetry equivalence.
• The user-specified charges (of all centers, atomic and dummy) and any net total charge of the system (Section
5.11) are used to determine the number of electrons in the system.
The Cartesian coordinates of the atom in the molecule are specified as real numbers supplied for the variables x,
y, and z following the characters entered for the tag. The values supplied for the coordinates must be in the units
specified by the value of the variable <units> on the first line of the GEOMETRY directive input.
After the Cartesian coordinate input, optional velocities may be entered as real numbers for the variables vx, vy,
and vz. The velocities should be given in atomic units and are used in QMD and PSPW calculations.
The Cartesian coordinate input line also contains the optional keywords charge, mass and nucleus, which
allow the user to specify the charge of the atom (or center) and its mass (in atomic mass units), and the nuclear model.
The default charge for an atom is its atomic number, adjusted for the presence of ECPs (see Section 8). In order to
specify a different value for the charge on a particular atom, the user must enter the keyword charge, followed by
the desired value for the variable <charge>.
The default mass for an atom is taken to be the mass of its most abundant naturally occurring isotope or of the
isotope with the longest half-life. To model some other isotope of the element, its mass must be defined explicitly by
specifying the keyword mass, followed by the value (in atomic mass units) for the variable <mass>.
The default nuclear model is a point nucleus. The keyword nucleus (or nucl or nuc) followed by the model
name <nucmodel> overrides this default. Allowed values of <nucmodel> are point or pt and finite or fi.
The finite option is a nuclear model with a Gaussian shape. The RMS radius of the Gaussian is determined by the
atomic mass number via the formula rRMS = 0.836 ∗ A1/3 + 0.57 fm. The mass number A is derived from the variable
<mass>.
The geometry of the system can be specified entirely in Cartesian coordinates by supplying a <tag> line of the
type described above for each atom or center. The user has the option, however, of supplying the geometry of some or
all of the atoms or centers using a Z-matrix description. In such a case, the user supplies the input tag line described
above for any centers to be described by Cartesian coordinates, and then specifies the remainder of the system using
the optional ZMATRIX directive described below in Section 6.4.
[VARIABLES
<string symbol> <real value>
6.4. ZMATRIX — Z-MATRIX INPUT 63
... ]
[CONSTANTS
<string symbol> <real value>
... ]
(END || ZEND)]
The input module recognizes three possible spellings of this directive name. It can be invoked with ZMATRIX,
ZMT, or ZMAT. The user can specify the molecular structure using either Cartesian coordinates or internal coordinates
(bond lengths, bond angles and dihedral angles. The Z-matrix input for a center defines connectivity, bond length,
and bond or torsion angles. Cartesian coordinate input for a center consists of three real numbers defining the x,y,z
coordinates of the atom.
Within the Z-matrix input, bond lengths and Cartesian coordinates must be input in the user-specified units, as
defined by the value specified for the variable <units> on the first line of the GEOMETRY directive. All angles are
specified in degrees.
The individual centers (denoted as i, j, and k below) used to specify Z-matrix connectivity may be designated
either as integers (identifying each center by number) or as tags (If tags are used, the tag must be unique for each
center.) The use of “dummy” atoms is possible, by using X or BQ at the start of the tag.
Bond lengths, bond angles and dihedral angles (denoted below as R, alpha, and beta, respectively) may be
specified either as numerical values or as symbolic strings that must be subsequently defined using the VARIABLES
or CONSTANTS directives. The numerical values of the symbolic strings labeled VARIABLES may be subject to
changes during a geometry optimization say, while the numerical values of the symbolic strings labeled CONSTANTS
will stay frozen to the value given in the input. The same symbolic string can be used more than once, and any mixture
of numeric data and symbols is acceptable. Bond angles (α) must be in the range 0 < α < 180.
The Z-matrix input is specified sequentially as follows:
tag1
tag2 i R
tag3 i R j alpha
tag4 i R j alpha k beta [orient]
...
The structure of this input is described in more detail below. In the following discussion, the tag or number of the
center being currently defined is labeled as C (“C” for current). The values entered for these tags for centers defined in
the Z-matrix input are interpreted in the same way as the <tag> entries for Cartesian coordinates described above (see
Section 6.3). Figures 6.1, 6.2 and 6.3 display the relationships between the input data and the definitions of centers
and angles.
The Z-matrix input shown above is interpreted as follows:
1. tag1
Only a tag is required for the first center.
2. tag2 i R
The second center requires specification of its tag and the bond length (RCi ) distance to a previous atom, which
is identified by i.
3. tag3 i R j alpha
64 CHAPTER 6. GEOMETRIES
00111100k
i k 00111100
11α
00 11
00 C
11
00
R
00j
11
00111100β
C
11
00 11j
00 i
Figure 6.1: Relationships between the centers, bond angle and dihedral angle in Z-matrix input.
The third center requires specification of its tag, its bond length distance (RCi ) to one of the two previous centers
(identified by the value of i), and the bond angle α = Ci cj.
(a) the dihedral angle (β) between the current center and centers i, j, and k (Figure 6.1), or
(b) a second bond angle β = Cik
c and an orientation to the plane containing the other three centers (Figure 6.2
and 6.3).
By default, β is interpreted as a dihedral angle (see Figure 6.1), but if the optional final parameter (<orient>)
is specified with the value ±1, then β is interpreted as the angle Cik. c The sign of <orient> specifies the
direction of the bond angle relative to the plane containing the three reference atoms. If <orient> is +1, then
the new center (C) is above the plane (Figure 6.2); and if <orient> is −1, then C is below the plane (Figure
6.3).
Following the Z-matrix center definitions described above, the user can specify initial values for any symbolic
variables used to define the Z-matrix tags. This is done using the optional VARIABLES directive, which has the
general form:
VARIABLES
<string symbol> <real value>
...
Each line contains the name of a variable followed by its value. Optionally, an equals sign (=) can be included between
the symbol and its value, for clarity in reading the input file.
Following the VARIABLES directive, the CONSTANTS directive may be used to define any Z-matrix symbolic
variables that remain unchanged during geometry optimizations. To freeze the Cartesian coordinates of an atom, refer
to Section 6.6. The general form of this directive is as follows:
k111
000
65
111
000
0011
C
k
00
11
β i 111
000
11
00
00111100i 000
111
β
R α
111
000
00
11
11
00
j
C 00111100
j
6.4. ZMATRIX — Z-MATRIX INPUT
111
000
000k
111
111
000
i
11
00 000
111
β11
00i β 111
000
R
11
00
α
j
00111100
C11
00
11
00
C 00111100
j
Input line: C i R j alpha k beta -1
66
6.4. ZMATRIX — Z-MATRIX INPUT 67
CONSTANTS
<string symbol> <real value>
...
Each line contains the name of a variable followed by its value. As with the VARIABLES directive, an equals sign (=)
can be included between the symbol and its value.
The end of the Z-matrix input using the compound ZMATRIX directive is signaled by a line containing either END
or ZEND, following all input for the directive itself and its associated optional directives.
A simple example is presented for water. All Z-matrix parameters are specified numerically, and symbolic tags are
used to specify connectivity information. This requires that all tags be unique, and therefore different tags are used for
the two hydrogen atoms, which may or may not be identical.
geometry
zmatrix
O
H1 O 0.95
H2 O 0.95 H1 108.0
end
end
The following example illustrates the Z-matrix input for the molecule CH3CF3 . This input uses the numbers of
centers to specify the connectivity information (i, j, and k), and uses symbolic variables for the Z-matrix parameters
R, alpha, and beta, which are defined in the inputs for the VARIABLES and CONSTANTS directives.
geometry
zmatrix
C
C 1 CC
H 1 CH1 2 HCH1
H 1 CH2 2 HCH2 3 TOR1
H 1 CH3 2 HCH3 3 -TOR2
F 2 CF1 1 CCF1 3 TOR3
F 2 CF2 1 CCF2 6 FCH1
F 2 CF3 1 CCF3 6 -FCH1
variables
CC 1.4888
CH1 1.0790
CH2 1.0789
CH3 1.0789
CF1 1.3667
CF2 1.3669
CF3 1.3669
constants
HCH1 104.28
HCH2 104.74
HCH3 104.7
CCF1 112.0713
CCF2 112.0341
CCF3 112.0340
TOR1 109.3996
68 CHAPTER 6. GEOMETRIES
TOR2 109.3997
TOR3 180.0000
FCH1 106.7846
end
end
The input for any centers specified with Cartesian coordinates must be specified using the format of the <tag>
lines described in Section 6.3 above. However, in order to correctly specify these Cartesian coordinates within the
Z-matrix, the user must understand the orientation of centers specified using internal coordinates. These are arranged
as follows:
ZCOORD
CVR_SCALING <real value>
BOND <integer i> <integer j> \
[<real value>] [<string name>] [constant]
ANGLE <integer i> <integer j> <integer k> \
[<real value>] [<string name>] [constant]
TORSION <integer i> <integer j> <integer k> <integer l> \
[<real value>] [<string name>] [constant]
END
The centers i, j, k and l must be specified using the numbers of the centers, as supplied in the input for the
Cartesian coordinates. The ZCOORD input parameters are defined as follows:
A value may be specified for a user-defined internal coordinate, in which case it is forced upon the input Cartesian
coordinates while attempting to make only small changes in the other internal coordinates. If no value is provided the
6.6. APPLYING CONSTRAINTS IN GEOMETRY OPTIMIZATIONS 69
value implicit in the input coordinates is kept. If the keyword constant is specified, then that internal variable is not
modified during a geometry optimization with DRIVER (Section 20). Each internal coordinate may also be named
either for easy identification in the output, or for the application of constraints (Section 6.6).
If the keyword adjust is specified on the main GEOMETRY directive, only ZCOORD data may be specified and it
can be used to change the user-defined internal coordinates, including adding/removing constraints and changing their
values.
This defines only the centers in the list as active. All other centers will have zero force assigned to them, and will
remain frozen at their starting coordinates during a geometry optimization.
For example, the following directive specifies that atoms numbered 1, 5, 6, 7, 8, and 15 are active and all other
atoms are frozen:
or equivalently,
set geometry:actlist 1 5 6 7 8 15
If this option is not specified by entering a SET directive, the default behavior in the code is to treat all atoms as
active. To revert to this default behavior after the option to define frozen atoms has been invoked, the UNSET directive
must be used (since the database is persistent, see Section 3.2). The form of the UNSET directive is as follows:
unset geometry:actlist
When the system possess translational symmetry, fractional coordinates are used in the directions where trans-
lational symmetry exists. This means that for crystals x, y and z are fractional, for surfaces x and y are fractional,
whereas for polymers only z is fractional. For example, in the following H2 O layer input (a 2-d periodic system), x
and y coordinates are fractional, whereas z is expressed in Å.
Since no space group symmetry is available yet other than P1, input of cell parameters is relative to the primitive
cell. For example, this is the input required for the cubic face-centered type structure of bulk MgO.
system crystal
lat_a 2.97692 lat_b 2.97692 lat_c 2.97692
alpha 60.00 beta 60.00 gamma 60.00
end
Chapter 7
Basis sets
NWChem currently supports basis sets consisting of generally contracted1 Cartesian Gaussian functions up to a max-
imum angular momentum of six (h functions), and also sp (or L) functions2 . The BASIS directive is used to define
these, and also to specify use of an effective core potential (ECP) that is associated with a basis set; see Section 8.
The basis functions to be used for a given calculation can be drawn from a standard set in the EMSL basis set
library that is included in the release of NWChem (See Appendix A for a list of the standard basis sets currently
supplied with the release of the code). Alternatively, the user can specify particular functions explicitly in the input, to
define a particular basis set.
The general form of the BASIS directive is as follows:
...
END
• name
1 Generally contracted meaning that the same primitive, Gaussian functions are contracted into multiple contracted functions using different
contraction coefficients. Reuse of the radial functions increases the efficiency of integral generation.
2 An sp shell is two-component general contraction. However, the first component specifies an s shell and the second a p shell. Again, reuse of
71
72 CHAPTER 7. BASIS SETS
By default, the basis set is stored in the database with the name "ao basis". Another name may be specified
in the BASIS directive, thus, multiple basis sets may be stored simultaneously in the database. Also, the DFT
(Section 11) and RI-MP2 (Section 16) modules and the Dyall-modified-Dirac relativistic method (Section 9.3)
require multiple basis sets with specific names.
The user can associate the "ao basis" with another named basis using the SET directive (see Section 5.7).
• SPHERICAL or CARTESIAN
The keywords spherical and cartesian offer the option of using either spherical-harmonic (5 d, 7 f, 9 g,
. . . ) or Cartesian (6 d, 10 f, 15 g, . . . ) angular functions. The default is Cartesian.
Note that the correlation-consistent basis sets were designed using spherical harmonics and to use these, the
spherical keyword should be present in the BASIS directive. The use of spherical functions also helps
eliminate problems with linear dependence.
• SEGMENT or NOSEGMENT
By default, NWChem forces all basis sets to be segmented, even if they are input with general contractions or
L or sp shells. This is because the current derivative integral program cannot handle general contractions. If a
calculation is computing energies only, a performance gain can result from exploiting generally contracted basis
sets, in which case NOSEGMENT should be specified.
• PRINT or NOPRINT
The default is for the input module to print all basis sets encountered. Specifying the keyword noprint allows
the user to suppress this output.
• REL
This keyword marks the entire basis as a relativistic basis for the purposes of the Dyall-modified-Dirac rela-
tivistic integral code. The marking of the basis set is necessary for the code to make the proper association
between the relativistic shells in the ao basis and the shells in the large and/or small component basis. This is
only necessary for basis sets which are to be used as the ao basis. The user is referred to Section 9.3 for more
details.
Basis sets are associated with centers by using the tag of a center in a geometry that has either been input by the
user (Section 6) or is available elsewhere. Each atom or center with the same tag will have the same basis set. All
atoms must have basis functions assigned to them — only dummy centers (X or Bq) may have no basis functions. To
facilitate the specification of the geometry and the basis set for any chemical system, the matching process of a basis
set tag to a geometry tag first looks for an exact match. If no match is found, NWChem will attempt to match, ignoring
case, the name or symbol of the element. E.g., all hydrogen atoms in a system could be labeled “H1”, “H2”, . . . , in
the geometry but only one basis set specification for “H” or “hydrogen” is necessary. If desired, a special basis may
be added to one or more centers (e.g., “H1”) by providing a basis for that tag. If the matching mechanism fails then
NWChem stops with an appropriate error message.
A special set of tags, “*” and tags ending with a “*” (E.g. “H*”) can be used in combination with the keyword
library (see section below). These tags facilitate the definition of a certain type of basis set of all atoms, or a group
of atoms, in a geometry using only a single or very few basis set entries. The “*” tag will not place basis sets on
dummy atoms, Bq* can be used for that if necessary.
Examined next is how to reference standard basis sets in the basis set library, and finally, how to define a basis set
using exponents and coefficients.
7.1. BASIS SET LIBRARY 73
For example, the NWChem basis set library contains the Dunning cc-pvdz basis set. These may be used as follows
basis
oxygen library cc-pvdz
hydrogen library cc-pvdz
end
A default path of the NWChem basis set libraries is provided on installation of the code, but a different path can be
defined by specifying the keyword file, and one can explicitly name the file to be accessed for the basis functions.
For example,
basis
o library 3-21g file /usr/d3g681/nwchem/library
si library 6-31g file /usr/d3g681/nwchem/libraries/
end
This directive tells the code to use the basis set 3-21g in the file /usr/d3g681/nwchem/library for atom o
and to use the basis set 6-31g in the directory /usr/d3g681/nwchem/libraries/ for atom si, rather than
look for them in the default libraries. When a directory is defined the code will search for the basis set in a file with
the name 6-31g.
The “*” tag can be used to efficiently define basis set input directives for large numbers of atoms. An example is:
basis
* library 3-21g
end
This directive tells the code to assign the basis sets 3-21g to all the atom tags defined in the geometry. If one wants
to place a different basis set on one of the atoms defined in the geometry, the following directive can be used:
basis
* library 3-21g except H
end
This directive tells the code to assign the basis sets 3-21g to all the atoms in the geometry, except the hydrogen
atoms. Remember that the user will have to explicitly define the hydrogen basis set in this directive! One may also
define tags that end with a “*”:
74 CHAPTER 7. BASIS SETS
basis
oxy* library 3-21g
end
This directive tells the code to assign the basis sets 3-21g to all atom tags in the geometry that start with “oxy”.
If standard basis sets are to be placed upon a dummy center, the variable <tag_in_lib> must also be entered on
this line, to identify the correct atom type to use from the basis function library (see the ghost atom example in Section
5.7 and below). For example: To specify the cc-pvdz basis for a calculation on the water monomer in the dimer basis,
where the dummy oxygen and dummy hydrogen centers have been identified as bqo and bqh respectively, the BASIS
directive is as follows:
basis
o library cc-pvdz
h library cc-pvdz
bqo library o cc-pvdz
bqh library h cc-pvdz
end
A special dummy center tag is bq*, which will assign the same basis set to all bq centers in the geometry. Just as with
the “*” tag, the except list can be used to assign basis sets to unique dummy centers.
The library basis sets can also be marked as relativistic by adding the rel keyword to the tag line. See Section 9.3
for more details. The correlation consistent basis sets have been contracted for relativistic effects and are included in
the standard library.
There are also contractions in the standard library for both a point nucleus and a finite nucleus of Gaussian shape.
These are usually distinguished by the suffixex _pt and _fi. It is the user’s responsibility to ensure that the con-
traction matches the nuclear type specified in the geometry object. The specification of a finite nucleus basis set does
NOT automatically set the nuclear type for that atom to be finite. See Section 6 for information.
The variable <shell_type> identifies the angular momentum of the shell, s, p, d, . . . . NWChem is configured
to handle up to h shells. The keyword rel marks the shell as relativistic — see Section 9.3 for more details. Subse-
quent lines define the primitive function exponents and contraction coefficients. General contractions are specified by
including multiple columns of coefficients.
The following example defines basis sets for the water molecule:
The resulting basis set defined is identical to the one defined above in the explicit basis set input.
Chapter 8
Effective core potentials (ECPs) are a useful means of replacing the core electrons in a calculation with an effective
potential, thereby eliminating the need for the core basis functions, which usually require a large set of Gaussians to
describe them. In addition to replacing the core, they may be used to represent relativistic effects, which are largely
confined to the core. In this context, both the scalar (spin-free) relativistic effects and spin-orbit (spin-dependent)
relativistic effects may be included in effective potentials. NWChem has the facility to use both, and these are described
in the next two sections.
A brief recapitulation of the development of RECPs is given here, following Pacios and Christiansen1 . The process
can be viewed as starting from an atomic Dirac-Hartree-Fock calculation, done in jj coupling, and producing relativistic
effective potentials (REPs) for each l and j value, UlREP
j . From these, a local potential is extracted, which for example
contains the Coulomb potential of the core electrons balanced by the part of the nuclear attraction which cancels the
core electron charge. The residue is expressed in a semi-local form,
L−1 l+1/2
U REP = ULJ
REP
∑ ∑ (r) ∑ |l jmihl jm|
REP REP
(r) + Ul j (r) −ULJ (8.1)
l=0 j=|l−1/2 m
where L is one larger than the maximum angular momentum in the atom. The scalar potential is obtained by averaging
the REPs for each j for a given l to give an averaged relativistic effective potential, or AREP,
1 h REP i
UlAREP (r) = REP
lUl−1/2 (r) + (l + 1)Ul+1/2 (r) . (8.2)
2l + 1
L−1
2
H SO = s · ∑ 2l + 1 ∆UlREP ∑0 |lmihlm|l|lm
ˆ 0 ihlm0 |. (8.3)
l=1 mm
where
∆UlREP = Ul+1/2
REP REP
(r) −Ul−1/2 (r). (8.4)
1 l. F. Pacios and P. A. Christiansen, J. Chem. Phys. 82, 2664 (1985)
77
78 CHAPTER 8. EFFECTIVE CORE POTENTIALS
The spin-orbit integrals generated by NWChem are the integrals over the sum, including the factor of 2/(2l + 1), so
that they may be treated as an effective spin-orbit operator without further factors introduced.
The effective potentials, both scalar and spin-orbit, are fitted to Gaussians with the form
where Alk is the contraction coefficient, nlk is the exponent of the “r” term (r-exponent), and Blk is the Gaussian
exponent. The nlk is shifted by 2, in accordance with most of the ECP literature and implementations, i.e., an nlk = 0
implies r−2 . The current implementation allows nlk values of only 0, 1, or 2.
The optional directive ECP allows the user to describe an effective core potential (ECP) in terms of contracted Gaussian
functions as given above. Potentials using these functions must be specified explicitly by user input in the ECP
directive. This directive has essentially the same form and properties as the standard BASIS directive, except for
essential differences required for ECPs. Because of this, the ECP is treated internally as a basis set. The form of the
input for the ECP directive is as follows:
...
END
ECPs are automatically segmented, even if general contractions are input. The projection operators defined in an
ECP are spherical by default, so there is no need to include the CARTESIAN or SPHERICAL keyword as there is for
a standard basis set. ECPs are associated with centers in geometries through tags or names of centers. These tags must
match in the same manner as for basis sets the tags in a GEOMETRY and ECP directives, and are limited to sixteen
(16) characters. Each center with the same tag will have the same ECP. By default, the input module prints each ECP
that it encounters. The NOPRINT option can be used to disable printing. There can be only one active ECP, even
though several may exist in the input deck. The ECP modules load “ecp basis” inputs along with any “ao basis” inputs
present. ECPs may be used in both energy and gradient calculations.
ECPs are named in the same fashion as geometries or regular basis sets, with the default name being "ecp basis".
It should be clear from the above discussion on geometries and database entries how indirection is supported. All di-
rectives that are in common with the standard Gaussian basis set input have the same function and syntax.
8.1. SCALAR ECPS 79
As for regular basis sets, ECPs may be obtained from the standard library. The names of the sets of ECPs available
in the standard library (their coverage is described in Appendix A) are
• "LANL2DZ ECP"
• "CRENBL ECP"
• "CRENBS ECP"
The keyword nelec allows the user to specify the number of core electrons replaced by the ECP. Additional input
lines define the specific coefficients and exponents. The variable <shell_type> is used to specify the components
of the ECP. The keyword ul entered for <shell_type> denotes the local part of the ECP. This is equivalent to the
highest angular momentum functions specified in the literature for most ECPs. The standard entries (s, p, d, etc.)
for shell_type specify the angular momentum projector onto the local function. The shell type label of s indicates
the ul-s projector input, p indicates the ul-p, etc.
For example, the Christiansen, Ross and Ermler ARECPs are available in the standard basis set libary named
{crenbl_ecp}. To perform a calculation on uranyl (UO2+ 2 ) with all-electron oxygen (aug-cc-pvdz basis), and
uranium with an ARECP and using the corresponding basis the following input can be used
geometry
U 0 0 0
O 0 0 1.65
O 0 0 -1.65
end
basis
U library crenbl_ecp
O library aug-cc-pvdz
end
ecp
U library crenbl_ecp
end
The following is an example of explicit input of an ECP for H2 CO. It defines an ECP for the carbon and oxygen
atoms in the molecule.
ecp
C nelec 2 # ecp replaces 2 electrons on C
C ul # d
1 80.0000000 -1.60000000
1 30.0000000 -0.40000000
2 0.5498205 -0.03990210
C s # s - d
80 CHAPTER 8. EFFECTIVE CORE POTENTIALS
0 0.7374760 0.63810832
0 135.2354832 11.00916230
2 8.5605569 20.13797020
C p # p - d
2 10.6863587 -3.24684280
2 23.4979897 0.78505765
O nelec 2 # ecp replaces 2 electrons on O
O ul # d
1 80.0000000 -1.60000000
1 30.0000000 -0.40000000
2 1.0953760 -0.06623814
O s # s - d
0 0.9212952 0.39552179
0 28.6481971 2.51654843
2 9.3033500 17.04478500
O p # p - s
2 52.3427019 27.97790770
2 30.7220233 -16.49630500
end
Various ECPs without a local function are available, including those of the Stuttgart group. For those, no "ul" part
needs to be defined. To define the absence of the local potential, simply specify one contraction with a zero coefficient:
<string tag> ul
2 1.00000 0.00000
END
8.2. SPIN-ORBIT ECPS 81
Note: in the literature the coefficients of the spin-orbit potentials are NOT always defined in the same manner. The
NWChem code assumes that the spin-orbit potential defined in the input is of the form:
2
∆UlNWChem = ∆Ul (8.5)
2l + 1
For example, in the literature the Stuttgart potentials are defined as ∆Ul and, hence, have to be multiplied by 2/(2l +1).
On the other hand, the CRENBL potentials in the published papers are defined as 2l+1 l
∆Ul and, hence, have to be
multiplied by 2/l (Warning: on the CRENBL website the spin-orbit potentials already have been corrected with the
2/l factor).
82 CHAPTER 8. EFFECTIVE CORE POTENTIALS
Chapter 9
All methods which include treatment of relativistic effects are ultimately based on the Dirac equation, which has a four
component wave function. The solutions to the Dirac equation describe both positrons (the “negative energy” states)
and electrons (the “positive energy” states), as well as both spin orientations, hence the four components. The wave
function may be broken down into two-component functions traditionally known as the large and small components;
these may further be broken down into the spin components.
The implementation of approximate all-electron relativistic methods in quantum chemical codes requires the re-
moval of the negative energy states and the factoring out of the spin-free terms. Both of these may be achieved using a
transformation of the Dirac Hamiltonian known in general as a Foldy-Wouthuysen transformation. Unfortunately this
transformation cannot be represented in closed form for a general potential, and must be approximated. One popular
approach is that originally formulated by Douglas and Kroll1 and developed by Hess2 . This approach decouples the
positive and negative energy parts to second order in the external potential (and also fourth order in the fine structure
constant, α). Other approaches include the Zeroth Order Regular Approximation (ZORA)3 and modification of the
Dirac equation by Dyall4 , and involves an exact FW transformation on the atomic basis set level5 .
Since these approximations only modify the integrals, they can in principle be used at all levels of theory. At
present the Douglas-Kroll and ZORA implementations can be used at all levels of theory whereas Dyall’s approach
is currently available at the Hartree-Fock level. The derivatives have been implemented, allowing both methods to be
used in geometry optimizations and frequency calculations.
The RELATIVISTIC directive provides input for the implemented relativistic approximations and is a compound
directive that encloses additional directives specific to the approximations:
RELATIVISTIC
[DOUGLAS-KROLL [<string (ON||OFF) default ON> \
<string (FPP||DKH||DKFULL||DK3||DK3FULL) default DKH>] ||
ZORA [ (ON || OFF) default ON ] ||
DYALL-MOD-DIRAC [ (ON || OFF) default ON ]
[ (NESC1E || NESC2E) default NESC1E ] ]
[CLIGHT <real clight default 137.0359895>]
END
1 M.Douglas and N. M. Kroll, Ann. Phys. (N.Y.) 82, 89 (1974)
2 B.A.Hess, Phys. Rev. A 32, 756 (1985); 33, 3742 (1986)
3 C. Chang, M. Pelissier, M. Durand, Physica Scripta 34, 294 (1986); E. van Lenthe, The ZORA Equation, doctoral thesis, Vrije Universiteit,
Amsterdam (1996); S. Faas, J.G. Snijders, J.H. van Lenthe, E. van Lenthe, and E.J. Baerends, Chem. Phys. Lett. 246, 632 (1995).
4 K. G. Dyall, J. Chem. Phys. 100, 2118 (1994)
5 K. G. Dyall, J. Chem. Phys. 106, 9618 (1997); K. G. Dyall and T. Enevoldsen, J. Chem. Phys. 111, 10000 (1999).
83
84 CHAPTER 9. RELATIVISTIC ALL-ELECTRON APPROXIMATIONS
Only one of the methods may be chosen at a time. If both methods are found to be on in the input block, NWChem
will stop and print an error message. There is one general option for both methods, the definition of the speed of light
in atomic units:
The following sections describe the optional sub-directives that can be specified within the RELATIVISTIC
block.
The spin-free and spin-orbit one-electron Douglas-Kroll approximation have been implemented. The use of relativistic
effects from this Douglas-Kroll approximation can be invoked by specifying:
The ON|OFF string is used to turn on or off the Douglas-Kroll approximation. By default, if the DOUGLAS-KROLL
keyword is found, the approximation will be used in the calculation. If the user wishes to calculate a non-relativistic
quantity after turning on Douglas-Kroll, the user will need to define a new RELATIVISTIC block and turn the ap-
proximation OFF. The user could also simply put a blank RELATIVISTIC block in the input file and all options will
be turned off.
The FPP is the approximation based on free-particle projection operators6 whereas the DKH and DKFULL approx-
imations are based on external-field projection operators7 . The latter two are considerably better approximations than
the former. DKH is the Douglas-Kroll-Hess approach and is the approach that is generally implemented in quantum
chemistry codes. DKFULL includes certain cross-product integral terms ignored in the DKH approach (see for example
Häberlen and Rösch8 ). The third-order Douglas-Kroll approximation has been implemented by T. Nakajima and K.
Hirao9 . This approximation can be called using DK3 (DK3 without cross-product integral terms) or DK3FULL (DK3
with cross-product integral terms).
The contracted basis sets used in the calculations should reflect the relativistic effects, i.e. one should use con-
tracted basis sets which were generated using the Douglas-Kroll Hamiltonian. Basis sets that were contracted using
the non-relativistic (Schödinger) Hamiltonian WILL PRODUCE ERRONEOUS RESULTS for elements beyond the
first row. See appendix A for available basis sets and their naming convention.
NOTE: we suggest that spherical basis sets are used in the calculation. The use of high quality cartesian basis sets
can lead to numerical inaccuracies.
In order to compute the integrals needed for the Douglas-Kroll approximation the implementation makes use of a
fitting basis set (see literature given above for details). The current code will create this fitting basis set based on the
given "ao basis" by simply uncontracting that basis. This again is what is commonly implemented in quantum
chemistry codes that include the Douglas-Kroll method. Additional flexibility is available to the user by explicitly
specifying a Douglas-Kroll fitting basis set. This basis set must be named "D-K basis" (see Chapter 7).
The ON|OFF string is used to turn on or off ZORA. By default, if the ZORA keyword is found, the approximation
will be used in the calculation. If the user wishes to calculate a non-relativistic quantity after turning on ZORA, the
user will need to define a new RELATIVISTIC block and turn the approximation OFF. The user can also simply put
a blank RELATIVISTIC block in the input file and all options will be turned off.
The ON|OFF string is used to turn on or off the Dyall’s modified Dirac approximation. By default, if the
DYALL-MOD-DIRAC keyword is found, the approximation will be used in the calculation. If the user wishes
to calculate a non-relativistic quantity after turning on Dyall’s modified Dirac, the user will need to define a new
RELATIVISTIC block and turn the approximation OFF. The user could also simply put a blank RELATIVISTIC
block in the input file and all options will be turned off.
Both one- and two-electron approximations are available NESC1E || NESC2E, and both have analytic gradi-
ents. The one-electron approximation is the default. The two-electron approximation specified by NESC2E has some
sub options which are placed on the same logical line as the DYALL-MOD-DIRAC directive, with the following syntax:
The first sub-option gives the capability to limit the two-electron corrections to those in which the small com-
ponents in any density must be on the same center. This reduces the (LL|SS) contributions to at most three-center
integrals and the (SS|SS) contributions to two centers. For a case with only one relativistic atom this option is redun-
dant. The second controls the inclusion of the (SS|SS) integrals which are of order α4 . For light atoms they may safely
be neglected, but for heavy atoms they should be included.
In addition to the selection of this keyword in the RELATIVISTIC directive block, it is necessary to supply
basis sets in addition to the ao basis. For the one-electron approximation, three basis sets are needed: the atomic
FW basis set, the large component basis set and the small component basis set. The atomic FW basis set should be
included in the ao basis. The large and small components should similarly be incorporated in basis sets named
large component and small component, respectively. For the two-electron approximation, only two basis
sets are needed. These are the large component and the small component. The large component should be included
in the ao basis and the small component is specified separately as small component, as for the one-electron
approximation. This means that the two approximations can not be run correctly without changing the ao basis,
and it is up to the user to ensure that the basis sets are correctly specified.
There is one further requirement in the specification of the basis sets. In the ao basis, it is necessary to add
the rel keyword either to the basis directive or the library tag line (See below for examples). The former marks
the basis functions specified by the tag as relativistic, the latter marks the whole basis as relativistic. The marking
is actually done at the unique shell level, so that it is possible not only to have relativistic and nonrelativistic atoms,
it is also possible to have relativistic and nonrelativistic shells on a given atom. This would be useful, for example,
for diffuse functions or for high angular momentum correlating functions, where the influence of relativity was small.
The marking of shells as relativistic is necessary to set up a mapping between the ao basis and the large and/or small
component basis sets. For the one-electron approximation the large and small component basis sets MUST be of the
same size and construction, i.e. differing only in the contraction coefficients.
It should also be noted that the relativistic code will NOT work with basis sets that contain sp shells, nor will it
work with ECPs. Both of these are tested and flagged as an error.
Some examples follow. The first example sets up the data for relativistic calculations on water with the one-electron
approximation and the two-electron approximation, using the library basis sets.
start h2o-dmd
basis "large"
oxygen library cc-pvdz_pt_sf_lc
hydrogen library cc-pvdz_pt_sf_lc
end
9.3. DYALL’S MODIFIED DIRAC HAMITONIAN APPROXIMATION 87
basis "small"
oxygen library cc-pvdz_pt_sf_sc
hydrogen library cc-pvdz_pt_sf_sc
end
relativistic
dyall-mod-dirac
end
task scf
relativistic
dyall-mod-dirac nesc2e
end
task scf
The second example has oxygen as a relativistic atom and hydrogen nonrelativistic.
start h2o-dmd2
relativistic
dyall-mod-dirac
end
task scf
Chapter 10
The NWChem self-consistent field (SCF) module computes closed-shell restricted Hartree-Fock (RHF) wavefunc-
tions, restricted high-spin open-shell Hartree-Fock (ROHF) wavefunctions, and spin-unrestricted Hartree-Fock (UHF)
wavefunctions.
The SCF directive provides input to the SCF module and is a compound directive that encloses additional directives
specific to the SCF module:
SCF
...
END
SINGLET
DOUBLET
TRIPLET
QUARTET
QUINTET
SEXTET
SEPTET
OCTET
NOPEN <integer nopen default 0>
RHF
ROHF
UHF
The optional keywords SINGLET, DOUBLET, . . . , OCTET and NOPEN allow the user to specify the number of
89
90 CHAPTER 10. HARTREE-FOCK OR SELF-CONSISTENT FIELD
singly occupied orbitals for a particular calculation. SINGLET is the default, and specifies a closed shell; DOUBLET
specifies one singly occupied orbital; TRIPLET specifies two singly occupied orbitals; and so forth. If there are more
than seven singly occupied orbitals, the keyword NOPEN must be used, with the integer nopen defining the number
of singly occupied orbitals (sometimes referred to as open shells).
If the multiplicity is any value other than SINGLET, the default calculation will be a spin-restricted, high-spin,
open-shell SCF calculation (keyword ROHF). The open-shell orbitals must be the highest occupied orbitals. If nec-
essary, any starting vectors may be rearranged through the use of the SWAP keyword on the VECTORS directive (see
Section 10.5) to accomplish this.
A spin-unrestricted solution can also be performed by specifying the keyword UHF. In UHF calculations, it is
assumed that the number of singly occupied orbitals corresponds to the difference between the number of alpha-spin
and beta-spin orbitals. For example, a UHF calculation with 2 more alpha-spin orbitals than beta-spin orbitals can be
obtained by specifying
scf
triplet ; uhf # (Note: two logical lines of input)
...
end
The user should be aware that, by default, molecular orbitals are symmetry adapted in NWChem. This may not
be desirable for fully unrestricted wavefunctions. In such cases, the user has the option of defeating the defaults by
specifying the keywords ADAPT OFF (see Section 10.3) and SYM OFF (see Section 10.2).
The keywords RHF and ROHF are provided in the code for completeness. It may be necessary to specify these in
order to modify the behavior of a previous calculation (see Section 3.2 for restart behavior).
This directive enables/disables the use of symmetry to speed up Fock matrix construction (via the petite-list or
skeleton algorithm) in the SCF, if symmetry was used in the specification of the geometry. Symmetry adaptation of
the molecular orbitals is not affected by this option. The default is to use symmetry if it is specified in the geometry
directive (Section 6).
For example, to disable use of symmetry in Fock matrix construction:
sym off
The default in the SCF module calculation is to force symmetry adaption of the molecular orbitals. This does not
affect the speed of the calculation, but without explicit adaption the resulting orbitals may be symmetry contaminated
for some problems. This is especially likely if the calculation is started using orbitals from a distorted geometry.
10.4. TOL2E — INTEGRAL SCREENING THRESHOLD 91
The underlying assumption in the use of symmetry in Fock matrix construction is that the density is totally sym-
metric. If the orbitals are symmetry contaminated, this assumption may not be valid — which could result in incorrect
energies and poor convergence of the calculation. It is thus advisable when specifying ADAPT OFF to also specify
SYM OFF (Section 10.2).
The variable tol2e is used in determining the integral screening threshold for the evaluation of the energy and
related Fock-like matrices. The Schwarz inequality is used to screen the product of integrals and density matrices in a
manner that results in an accuracy in the energy and Fock matrices that approximates the value specified for tol2e.
It is generally not necessary to set this parameter directly. Specify instead the required precision in the wavefunc-
tion, using the THRESH directive (Section 10.7). The default threshold is the minimum of 10−7 and 0.01 times the
requested convergence threshold for the SCF calculation (Section 10.7).
The input to specify the threshold explicitly within the SCF directive is, for example:
tol2e 1e-9
For very diffuse basis sets, or for high-accuracy calculations it might be necessary to set this parameter. A value
of 10−12 is sufficient for nearly all such purposes.
The VECTORS directive allows the user to specify the source and destination of the molecular orbital vectors. In
a startup calculation (see Section 5.1), the default source for guess vectors is a diagonalized Fock matrix constructed
from a superposition of the atomic density matrices for the particular problem. This is usually a very good guess. For
a restarted calculation, the default is to use the previous MO vectors.
The optional keyword INPUT allows the user to specify the source of the input molecular orbital vectors as any of
the following:
• ATOMIC — eigenvectors of a Fock-like matrix formed from a superposition of the atomic densities (the default
guess). See Sections 10.5.2 and 10.6.
• filename — the name of a file containing the MO vectors from a previous calculation. Note that unless the
path is fully qualified, or begins with a dot (“.”), then it is assumed to reside in the directory for permanent files
(see Section 5.2).
• PROJECT basisname filename — projects the existing MO vectors in the file filename from the
smaller basis with name basisname into the current basis. The definition of the basis basisname must be
available in the current database, and the basis must be smaller than the current basis. In addition, the geometry
used for the previous calculations must have the atoms in the same order and in the same orientation as the
current geometry.
• FRAGMENT file1 ... — assembles starting MO vectors from previously performed calculations on frag-
ments of the system and is described in more detail in Section 10.5.1. Even though there are some significant
restrictions in the use of the initial implementation of this method (see Section 10.5.1), this is the most powerful
initial guess option within the code. It is particularly indispensable for open shell metallic systems.
• ROTATE input_geometry input_movecs — rotates MO vectors generated at a previous geometry to
the current active geometry.
The molecular orbitals are saved every iteration if more than 600 seconds have elapsed, and also at the end of the
calculation. At completion (converged or not), the SCF module always canonically transforms the molecular orbitals
by separately diagonalizing the closed–closed, open–open, and virtual–virtual blocks of the Fock matrix.
The name of the file used to store the MO vectors is determined as follows:
• if the OUTPUT keyword was specified on the VECTORS directive, then the filename that follows this keyword
is used, or
• if the input vectors were read from a file, this file is reused for the output vectors (overwriting the input vectors);
else,
• a default file name is generated in the directory for permanent files (Section 5.2) by prepending ".movecs"
with the file prefix, i.e., "<file_prefix>.movecs".
The name of this file is stored in the database so that a subsequent SCF calculation will automatically restart from
these MO vectors.
Applications of this directive are illustrated in the following examples.
Example 1:
Assuming a start-up calculation, this directive will result in use of the default atomic density guess, and will output
the vectors to the file h2o.movecs.
Example 2:
This directive will result in the initial vectors being read from the file "initial.movecs". The results will be
written to the file final.movecs. The contents of "initial.movecs" will not be changed.
Example 3:
This directive will cause the calculation to start from vectors in the file "small.movecs" which are in a basis
named "small basis". The output vectors will be written to the default file "<file_prefix.movecs>".
Once starting vectors have been obtained using any of the possible options, they may be reordered through use of
the SWAP keyword. This optional keyword requires a list of orbital pairs that will be swapped. For UHF calculations,
separate SWAP keywords may be provided for the alpha and beta orbitals, as necessary.
An example of use of the SWAP directive:
vectors input try1.movecs swap 173 175 174 176 output try2.movecs
This directive will cause the initial orbitals to be read from the file "try1.movecs". The vectors for the orbitals
within the pairs 173–175 will be swapped with those within 174–176, so the resulting order is 175, 176, 173, 174. The
final orbitals obtained in the calculation will be written to the file "try2.movecs".
The swapping of orbitals occurs as a sequential process in the order (left to right) input by the user. Thus, regarding
each pair as an elementary transposition it is possible to construct arbitrary permutations of the orbitals. For instance,
to apply the permutation (6789)1 we note that this permutation is equal to (67)(78)(89), and thus may be specified as
vectors swap 8 9 7 8 6 7
Another example, now illustrating this feature for a UHF calculation, is the directive
This input will result in the swapping of the 5–6 alpha orbital pair and the 4–5 beta orbital pair. (All other items in the
input use the default values.)
The LOCK keyword allows the user to specify that the ordering of orbitals will be locked to that of the initial vectors,
insofar as possible. The default is to order by ascending orbital energies within each orbital space. One application
where locking might be desirable is a calculation where it is necessary to preserve the ordering of a previous geometry,
despite flipping of the orbital energies. For such a case, the LOCK directive can be used to prevent the SCF calculation
from changing the ordering, even if the orbital energies change.
The mapping of the MO’s to the nuclei can be changed using the REORDER keyword. Once starting vectors have
been obtained using any of the possible options, the REORDER keyword moves the MO coefficients between atoms
listed in the integer list. This keyword is particularly useful for calculating localized electron and hole states.
This optional keyword requires a list containing the new atom ordering. It is not necessary to provide separate lists
for alpha and beta orbitals.
An example of use of the REORDER keyword:
This directive will cause the initial orbitals to be read from the file "try1.movecs". The MO coefficients for the
basis functions on atom 2 will be swapped with those on atom 1. The final orbitals obtained in the calculation will be
written to the file "try2.movecs".
The following example shows how the ROTATE keyword can be used to rotate MO vectors calculated at geometry
geom1 to geometry geom2, which has a different rotational orientation:
dft
vectors input atomic output geom1.mo
end
task dft
• The system naturally decomposes into molecules that can be treated individually, e.g., a cluster.
• One or more fragments are particularly hard to converge and therefore much time can be saved by converging
them independently.
• A fragment (e.g., a metal atom) must be prepared with a specific occupation. This can often be readily accom-
plished with a calculation on the fragment using dummy charges to model a ligand field.
• The molecular occupation predicted by the atomic initial guess is often wrong for systems with heavy metals
which may have partially occupied orbitals with lower energy than some doubly occupied orbitals. The fragment
initial guess avoids this problem.
The molecular orbitals are formed by superimposing the previously generated orbitals of fragments of the molecule
being studied. These fragment molecular orbitals must be in the same basis as the current calculation. The input
specifies the files containing the fragment molecular orbitals. For instance, in a calculation on the water dimer, one
might specify
where h2o1.movecs contains the orbitals for the first fragment, and h2o2.movecs contains the orbitals for the
second fragment.
A complete example of the input for a calculation on the water dimer using the fragment guess is as follows:
start dimer
geometry dimer
O -0.595 1.165 -0.048
H 0.110 1.812 -0.170
H -1.452 1.598 -0.154
O 0.724 -1.284 0.034
10.5. VECTORS — INPUT/OUTPUT OF MO VECTORS 95
geometry h2o1
O -0.595 1.165 -0.048
H 0.110 1.812 -0.170
H -1.452 1.598 -0.154
end
geometry h2o2
O 0.724 -1.284 0.034
H 0.175 -2.013 0.348
H 0.177 -0.480 0.010
end
basis
o library 3-21g
h library 3-21g
end
First, the geometry of the dimer and the two monomers are specified and given names. Then, after the basis specifi-
cation, calculations are performed on the fragments by setting the geometry to the appropriate fragment (Section 5.7)
and redirecting the output molecular orbitals to an appropriately named file. Note also that use of the atomic initial
guess is forced, since the default initial guess is to use any existing MOs which would not be appropriate for the second
fragment calculation. Finally, the dimer calculation is performed by specifying the dimer geometry, indicating use of
the fragment guess, and redirecting the output MOs.
The following points are important in using the fragment initial guess:
1. The fragment calculations must be in the same basis set as the full calculation.
2. The order of atoms in the fragments and the order in which the fragment files are specified must be such that
when the fragment basis sets are concatenated all the basis functions are in the same order as in the full system.
This is readily accomplished by first generating the full geometry with atoms for each fragment contiguous, split-
ting this into numbered fragments and specifying the fragment MO files in the correct order on the VECTORS
directive.
96 CHAPTER 10. HARTREE-FOCK OR SELF-CONSISTENT FIELD
3. The occupation of orbitals is preserved when they are merged from the fragments to the full molecule and the
resulting occupation must match the requested occupation for the full molecule. E.g., a triplet ROHF calculation
must be comprised of fragments that have a total of exactly two open-shell orbitals.
4. Because of these restrictions, it is not possible to introduce additional atoms (or basis functions) into fragments
for the purpose of cleanly breaking real bonds. However, it is possible, and highly recommended, to introduce
additional point charges to simulate the presence of other fragments.
5. MO vectors of partially occupied or strongly polarized systems are very sensitive to orientation. While it is
possible to specify the same fragment MO vector file multiple times in the VECTORS directive, it is usually
much better to do a separate calculation for each fragment.
6. Linear dependencies which were present in a fragment calculation may be magnified in the full calculation.
When this occurs, some of the fragment’s highest virtual orbitals will not be copied to the full system, and a
warning will be printed.
A more involved example is now presented. We wish to model the sextet state of Fe(III) complexed with water,
imidazole and a heme with a net unit positive charge. The default atomic guess does not give the correct d 5 occupation
for the metal and also gives an incorrect state for the double anion of the heme. The following performs calculations
on all of the fragments. Things to note are:
1. The use of a dummy +2 charge in the initial guess on the heme which in part simulates the presence of the metal
ion, and also automatically forces an additional two electrons to be added to the system (the default net charge
being zero).
2. The iron fragment calculation (charge +3, d 5 , sextet) will yield the correct open-shell occupation for the full
system. If, instead, the d-orbitals were partially occupied (e.g., the doublet state) it would be useful to introduce
dummy charges around the iron to model the ligand field and thereby lift the degeneracy to obtain the correct
occupation.
3. Cs symmetry is used for all of the calculations. It is not necessary that the same symmetry be used in all of the
calculations, provided that the order and orientation of the atoms is preserved.
4. The unset scf:* directive is used immediately before the calculation on the full system so that the default
name for the output MO vector file can be used, rather than having to specify it explicitly.
start heme6a1
title "heme-H2O (6A1) from M.Dupuis"
############################################################
# Define the geometry of the full system and the fragments #
############################################################
geometry full-system
symmetry cs
geometry ring-only
symmetry cs
H 0.438 -0.002 4.549
C 0.443 -0.001 3.457
C 0.451 -1.251 2.828
C 0.452 1.250 2.828
H 0.455 2.652 4.586
H 0.461 -2.649 4.586
N1 0.455 -1.461 1.441
N1 0.458 1.458 1.443
C 0.460 2.530 3.505
C 0.462 -2.530 3.506
C 0.478 2.844 1.249
C 0.478 3.510 2.534
C 0.478 -2.848 1.248
C 0.480 -3.513 2.536
C 0.484 3.480 0.000
C 0.485 -3.484 0.000
H 0.489 4.590 2.664
H 0.496 -4.592 2.669
98 CHAPTER 10. HARTREE-FOCK OR SELF-CONSISTENT FIELD
geometry imid-only
symmetry cs
H 0.498 4.573 0.000
H 0.503 -4.577 0.000
H -4.925 1.235 0.000
H -4.729 -1.338 0.000
C -3.987 0.685 0.000
N -3.930 -0.703 0.000
C -2.678 1.111 0.000
C -2.622 -1.076 0.000
H -2.284 2.126 0.000
H -2.277 -2.108 0.000
N -1.838 0.007 0.000
end
geometry fe-only
symmetry cs
Fe .307 0.000 0.000
end
geometry water-only
symmetry cs
O 2.673 -0.009 0.000
H 3.238 -0.804 0.000
H 3.254 0.777 0.000
end
############################
# Basis set for everything #
############################
basis nosegment
O library 6-31g*
N library 6-31g*
C library 6-31g*
H library 6-31g*
Fe library "Ahlrichs pVDZ"
end
##########################################################
# SCF on the fragments for initial guess for full system #
##########################################################
task scf
charge 3
set geometry fe-only
scf; sextet; vectors atomic output fe.mo; end
task scf
##########################
# SCF on the full system #
##########################
charge 1
scf
sextet
vectors fragment ring.mo imid.mo fe.mo water.mo
maxiter 50
end
task scf
As noted above, the default guess vectors are based on superimposing the density matrices of the neutral atoms. If
some atoms are significantly charged, this default guess may be improved upon by modifying the atomic densities.
This is done by setting parameters that add fractional charges to the occupation of the valence atomic orbitals. Since
the atomic SCF program does not have its own input block, the SET directive (Section 5.7) must be used to set these
parameters.
The input specifies a list of tags (i.e., names of atoms in a geometry, see Section 6) and the charges to be added to
those centers. Two parameters must be set as follows:
The array of strings atomscf:tags_z should be set to the list of tags, and the array atomscf:z should be set
to the list of charges which must be real numbers (not integers). All atoms that have a tag specified in the list of tags
will be assigned the corresponding charge from the list of charges.
100 CHAPTER 10. HARTREE-FOCK OR SELF-CONSISTENT FIELD
For example, the following specifies that all oxygen atoms with tag O be assigned a charge of -1 and all iron atoms
with tag Fe be assigned a charge of +2
There are some limitations to this feature. It is not possible to add electrons to closed shell atoms, nor is it possible
to remove all electrons from a given atom. Attempts to do so will cause the code to report an error, and it will not
report further errors in the input for modifying the charge even when they are detected.
Finally, recall that the database is persistent (Section 3.2) and that the modified settings will be used in subsequent
atomic guess calculations unless the data is deleted from the database with the UNSET directive (Section 5.8).
This directive specifies the convergence threshold for the calculation. The convergence threshold is the norm of
the orbital gradient, and has a default value in the code of 10−4 .
The norm of the orbital gradient corresponds roughly to the precision available in the wavefunction, and the energy
should be converged to approximately the square of this number. It should be noted, however, that the precision in the
energy will not exceed that of the integral screening tolerance. This tolerance (Section 10.4) is automatically set from
the convergence threshold, so that sufficient precision is usually available by default.
The default convergence threshold suffices for most SCF energy and geometry optimization calculations, provid-
ing about 6–8 decimal places in the energy, and about four significant figures in the density and energy derivative
with respect to nuclear coordinates. However, greater precision may be required for calculations involving weakly in-
teracting systems, floppy molecules, finite-difference of gradients to compute the Hessian, and for post-Hartree-Fock
calculations. A threshold of 10−6 is adequate for most such purposes, and a threshold of 10−8 might be necessary for
very high accuracy or very weak interactions. A threshold of 10−10 should be regarded as the best that can be attained
in most circumstances.
The maximum number of iterations for the SCF calculation defaults to 20 for both ROHF/RHF and UHF calcu-
lations. For most molecules, this number of iterations is more than sufficient for the quadratically convergent SCF
algorithm to obtain a solution converged to the default threshold (see Section 10.7 above). If the SCF program detects
that the quadratically convergent algorithm is not efficient, then it will resort to a linearly convergent algorithm and
increase the maximum number of iterations by 10.
Convergence may not be reached in the maximum number of iterations for many reasons, including input error
(e.g., an incorrect geometry or a linearly dependent basis), a very low convergence threshold, a poor initial guess, or
the fact that the system is intrinsically hard to converge due to the presence of many states with similar energies.
The following sets the maximum number of SCF iterations to 50:
maxiter 50
PROFILE
This option can be helpful in understanding the computational performance of an SCF calculation. However, it
can introduce a significant overhead on machines that have expensive timing routines, such as the SUN.
DIIS
The implementation of this option is currently fairly rudimentary. It does not have level-shifting and damping, and
does not support open shells or UHF. It is provided on an “as is” basis, and should be used with caution.
When the DIIS directive is specified in the input, the user has the additional option of specifying the size of the
subspace for the DIIS extrapolation. This is accomplished with the DIISBAS directive, which is of the form:
The default of 5 should be adequate for most applications, but may be increased if convergence is poor. On large
systems, it may be necessary to specify a lower value for diisbas, to conserve memory.
are computed once and stored. Semi-direct calculations are between these two extremes with some integrals being
precomputed and stored, and all other integrals being recomputed as necessary.
The default behavior of the SCF module is
• If enough memory is available, the integrals are computed once and are cached in memory.
• If there is not enough memory to store all the integrals at once, then 95% of the available disk space in the
scratch directory (see Section 5.2) is assumed to be available for this purpose, and as many integrals as possible
are cached on disk (with no memory being used for caching). Some attempt is made to store the most expensive
integrals in the cache.
• If there is not enough room in memory or on disk for all the integrals, then the ones that are not cached are
recomputed in a semidirect fashion.
The integral file is deleted at the end of a calculation, so it is not possible to restart a semidirect calculation when
the integrals are cached in memory or on disk. Many computer systems (e.g., the EMSL IBM SP) clear the fast
scratch space at the end of each job, adding a further complication to the problem of restarting a parallel semidirect
calculation.
On the IBM SP or any other computer with fast disks local to each processor, semidirect calculation offers the best
behavior. It can result in quadratic speedup as more processors are added.
A fully direct calculation (with recomputation of the integrals at each iteration) is forced by specifying the directive
DIRECT
Alternatively, the SEMIDIRECT directive can be used to control the default semidirect calculation by defining the
amount of disk space and the cache memory size. The form of this directive is as follows:
The keyword FILESIZE allows the user to specify the amount of disk space to be used per process for storing
the integrals in 64-bit words. Similarly, the keyword MEMSIZE allows the user to specify the number of 64-bit words
to be used per process for caching integrals in memory. (Note: If the amount of storage space specified by the entry
for memsize is not available, the code cuts the value in half and checks again for available space. This process is
repeated until the request is satisfied.)
By default, the integral files are placed into the scratch directory (see Section 5.2). Specifying the keyword
FILENAME overrides this default. The user-specified name entered in the string filename has the process number
appended to it, so that each process has a distinct file but with a common base-name and directory. Therefore, it is
not possible to use this keyword to specify different disks for different processes. The SCRATCH_DIR directive (see
Section 5.2) can be used for this purpose.
For example, to force full recomputation of all integrals:
direct
To disable the use of memory for caching integrals and limit disk usage by each process to 100 megawords (MW):
The integral records are typically 32769 words long and any non-zero value for filesize or memsize should
be enough to hold at least one record.
10.11.1 Integral File Size and Format for the SCF Module
The file format is rather complex, since it accommodates a variety of packing and compression options and the dis-
tribution of data. This section presents some information that may help the user understand the output, and illustrates
how to use the output information to estimate file sizes.
If integrals are stored with a threshold of greater than 10−10 , then the integrals are stored in a 32-bit fixed-point
format (with appropriate treatment for large values to retain precision). If integrals are stored with a threshold less
than 10−10 , however, the values are stored in 64-bit floating-point format. If a replicated-data calculation is being run,
then 8 bits are used for each basis function label, unless there are more than 256 functions, in which case 16 bits are
used. If distributed data is being used, then the labels are always packed to 8 bits (the distributed blocks always being
less than 256; labels are relative to the start of the block).
Thus, the number (W ) of 64-bit words required to store N integrals, may be computed as
N , 8-bit labels and 32-bit values
3
2 N , 16-bit labels and 32-bit values
W= 3
2 N , 8-bit labels and 64-bit values
2N , 16-bit labels and 64-bit values
The actual number of words required can exceed this computed value by up to one percent, due to bookkeeping
overhead, and because the file itself is organized into fixed-size records.
With at least the default print level, all semidirect (not direct) calculations will print out information about the
integral file and the number of integrals computed. The form of this output is as follows:
The file information above relates only to process 0. The line of information about the number of quartets, integrals,
etc., is a sum over all processes.
When the integral file is closed, additional information of the following form is printed:
------------------------------------------------------------
EAF file 0: "./c6h6.aoints.0" size=262152 bytes
------------------------------------------------------------
write read awrite aread wait
----- ---- ------ ----- ----
calls: 6 12 0 0 0
104 CHAPTER 10. HARTREE-FOCK OR SELF-CONSISTENT FIELD
Again, the detailed file information relates just to process 0, but the final line indicates the total number of integral
records stored by all processes.
This information may be used to optimize subsequent calculations, for instance by assigning more memory or disk
space.
Note to users: It is desired that the SCF program converge reliably with the default options for a wide variety of
molecules. In addition, it should be guaranteed to converge for any system, with sufficient iterations. Please report
significant convergence problems to [email protected], and include the input file.
The SCF program uses a preconditioned conjugate gradient (PCG) method that is unconditionally convergent. Ba-
sically, a search direction is generated by multiplying the orbital gradient (the derivative of the energy with respect to
the orbital rotations) by an approximation to the inverse of the level-shifted orbital Hessian. In the initial iterations (see
Section 10.13), an inexpensive one-electron approximation to the inverse orbital Hessian is used. Closer to conver-
gence, the full orbital Hessian is used, which should provide quadratic convergence. For both the full or one-electron
orbital Hessians, the inverse-Hessian matrix-vector product is formed iteratively. Subsequently, an approximate line
search is performed along the new search direction. If the exact Hessian is being employed, then the line search should
require a single step (of unity). Preconditioning with approximate Hessians may require additional steps, especially
in the initial iterations. It is the (approximate) line search that provides the convergence guarantee. The iterations
required to solve the linear equations are referred to as micro-iterations. A macro-iteration comprises both the iterative
solution and a line search.
Level-shifting plays the same role in this algorithm as it does in the conventional iterative solution of the SCF
equations. The approximate Hessian used for preconditioning should be positive definite. If this is not the case, then
level-shifting by a positive constant (∆) serves to make the preconditioning matrix positive definite, by adding ∆ to all
of its eigenvalues. The level-shifts employed for the RHF orbital Hessian should be approximately four times (only
twice for UHF) the value that one would employ in a conventional SCF2 . Level-shifting is automatically enabled in
the early iterations, and the default options suffice for most test cases.
So why do things go wrong and what can be done to fix convergence problems? Most problems encountered so
far arise either poor initial guesses or from small or negative eigenvalues of the orbital Hessian. The atomic orbital
guess is usually very good. However, in calculations on charged systems, especially with open shells, incorrect initial
occupations may result. The SCF might then converge very slowly since very large orbital rotations might be required
to achieve the correct occupation or move charge large distances in the molecule. Possible actions are
• Modify the atomic guess by assigning charges to the atoms known to carry substantial charges (Section 10.5.2)
• Examining an analysis of the initial orbitals (Section 10.16) and then swapping them to attain the desired occu-
pation (Section 10.5).
• Converging the calculation in a minimal basis set, which is usually easier, and then projecting into a larger basis
set (Section 10.5).
Small or negative Hessian eigenvalues can occur even though the calculation seem to be close to convergence (as
measured by the gradient norm, or the off-diagonal Fock matrix elements). Small eigenvalues will cause the iterative
linear equation solver to converge slowly, resulting in an excessive number of micro-iterations. This makes the SCF
expensive in terms of computation time, and it is possible to exceed the maximum number of iterations without
achieving the accuracy required for quadratic convergence — which causes more macro-iterations to be performed.
Two main options are available when a problem will not converge: Newton-Raphson can be disabled temporarily
or permanently (see Section 10.13), and level-shifting can be applied to the matrix (see Section 10.14). In some cases,
both options may be necessary to achieve final convergence.
If there is reason to suspect a negative eigenvalue, the first course is to disable the Newton-Raphson iteration until
the solution is closer to convergence. It may be necessary to disable it completely. At some point close to convergence,
the Hessian will be positive definite, so disabling Newton-Raphson should yield a solution with approximately the
same convergence rate as DIIS.
If temporarily disabling Newton-Raphson is not sufficient to achieve convergence, it may be necessary to disable
it entirely and apply a small level-shift to the approximate Hessian. This should improve the convergence rate of the
micro-iterations and stabilize the macro-iterations. The level-shifting will destroy exact quadratic convergence, but
the optimization process is automatically adjusted to reflect this by enforcing conjugacy and reducing the accuracy to
which the linear equations are solved. The net result of this is that the solution will do more macro-iterations, but each
one should take less time than it would with the unshifted Hessian.
The following sections describe the directives needed to disable the Newton-Raphson iteration and specify level-
shifting.
The exact orbital Hessian is adopted as the preconditioner when the maximum element of the orbital gradient
is below the value specified for nr_switch. The default value is 0.1, which means that Newton-Raphson will be
disabled until the maximum value of the orbital gradient (twice the largest off-diagonal Fock matrix element) is less
than 0.1. To disable the Newton-Raphson entirely, the value of nr_switch must be set to zero. The directive to
accomplish this is as follows:
nr 0
This directive contains only two keywords: one for the PCG method and the other for the exact Hessian (Newton
Raphson, or NR). Use of PCG or NR is determined by the input specified for nr_switch on the NR directive, Section
10.13 above.
Specifying the keyword pcg on the LEVEL directive allows the user to define the level shifting for the approximate
(i.e., PCG) method. Specifying the keyword nr allows the user to define the level shifting for the exact Hessians. In
both options, the initial level shift is defined by the value specified for the variable initial. Optionally, tol can be
specified independently with each keyword to define the level of accuracy that must be attained in the solution before
the level shifting is changed to the value specified by input in the real variable final. Level shifts and gradient
thresholds are specified in atomic units.
For the PCG method (as specified using the keyword pcg), the defaults for this input are 20.0 for initial, 0.5
for tol, and 0.0 for final. This means that the approximate Hessian will be shifted by 20.0 until the maximum
element of the gradient falls below 0.5, at which point the shift will be set to zero.
For the exact Hessian (as specified using the keyword nr), the defaults are all zero. The exact Hessian is usually
not shifted since this destroys quadratic convergence. An example of an input directive that applies a shift of 0.2 to the
exact Hessian is as follows:
level nr 0.2
To apply this shift to the exact Hessian only until the maximum element of the gradient falls below 0.005, the
required input directive is as follows:
Note that in both of these examples, the parameters for the PCG method are at the default values. To obtain values
different from the defaults, the keyword pcg must also be specified. For example, to specify the level shifting in the
above example for the exact Hessian and non-default shifting for the PCG method, the directive would be something
like the following:
This input will cause the PCG method to be level-shifted by 20.0 until the maximum element of the gradient falls
below 0.3, then the shift will be zero. For the exact Hessian, the level shifting is initially 0.2, until the maximum
element falls below 0.005, after which the shift is zero.
The default options correspond to
set scf:localize t
will separately localize the core, valence, and virtual orbital spaces using the Pipek-Mezey algorithm. If the additional
directive
set scf:loctype FB
is included, then the Foster-boys algorithm is used. The partitioning of core-orbitals is performed using the atomic
information described in Section 16.1.
In the next release, this functionality will be extended to included all wavefunctions using molecular orbitals.
108 CHAPTER 10. HARTREE-FOCK OR SELF-CONSISTENT FIELD
GRADIENTS
[print || noprint] ...
END
The complementary keyword pair print and noprint allows the user some additional control on the informa-
tion that can be included in the print output from the SCF calculation. Currently, only a few items can be explicitly
invoked via print control. These are as follows:
The NWChem density functional theory (DFT) module uses the Gaussian basis set approach to compute closed shell
and open shell densities and Kohn-Sham orbitals in the:
The formal scaling of the DFT computation can be reduced by choosing to use auxiliary Gaussian basis sets to fit
the charge density (CD) and/or fit the exchange-correlation (XC) potential.
DFT input is provided using the compound DFT directive
DFT
...
END
The actual DFT calculation will be performed when the input module encounters the TASK directive (Section 5.10).
TASK DFT
Once a user has specified a geometry and a Kohn-Sham orbital basis set the DFT module can be invoked with no
input directives (defaults invoked throughout). There are sub-directives which allow for customized application; those
currently provided as options for the DFT module are:
111
112 CHAPTER 11. DFT FOR MOLECULES (DFT)
DECOMP
ODFT
DIRECT
INCORE
ITERATIONS <integer iterations default 30>
MAX_OVL
MULLIKEN
MULT <integer mult default 1>
NOIO
PRINT||NOPRINT
The following sections describe these keywords and optional sub-directives that can be specified for a DFT calcu-
lation in NWChem.
114 CHAPTER 11. DFT FOR MOLECULES (DFT)
MAX_OVL
The user has the option of specifying the exchange-correlation treatment in the DFT Module (see table 11.1). The
default exchange-correlation functional is defined as the local density approximation (LDA) for closed shell systems
and its counterpart the local spin-density (LSD) approximation for open shell systems. Within this approximation the
exchange functional is the Slater ρ1/3 functional (from J.C. Slater, Quantum Theory of Molecules and Solids, Vol. 4:
The Self-Consistent Field for Molecules and Solids (McGraw-Hill, New York, 1974)), and the correlation functional
is the Vosko-Wilk-Nusair (VWN) functional (functional V) (S.J. Vosko, L. Wilk and M. Nusair, Can. J. Phys. 58, 1200
(1980)). The parameters used in this formula are obtained by fitting to the Ceperley and Alder2 Quantum Monte-Carlo
solution of the homogeneous electron gas.
These defaults can be invoked explicitly by specifying the following keywords within the DFT module input
directive, XC slater vwn_5.
That is, this statement in the input file
dft
XC slater vwn_5
end
task dft
task dft
The DECOMP directive causes the components of the energy corresponding to each functional to be printed, rather
than just the total exchange-correlation energy which is the default. You can see an example of this directive in the
sample input in Section 11.5.
Many alternative exchange and correlation functionals are available to the user as listed in table 11.1. The following
sections describe how to use these options.
There are several Exchange and Correlation functionals in addition to the default slater and vwn_5 functionals.
These are either local or gradient-corrected functionals (GCA); a full list can be found in table 11.1.
The Hartree-Fock exact exchange functional, (which has O(N 4 ) computation expense), is invoked by specifying
2 D.M. Ceperley and B.J. Alder, Phys. Rev. Lett. 45, 566 (1980).
116 CHAPTER 11. DFT FOR MOLECULES (DFT)
XC HFexch
Note that the user also has the ability to include only the local or nonlocal contributions of a given functional. In
addition the user can specify a multiplicative prefactor (the variable <prefactor> in the input) for the local/nonlocal
component or total. An example of this might be,
The user should be aware that the Becke88 local component is simply the Slater exchange and should be input as such.
Any combination of the supported exchange functional options can be used. For example the popular Gaussian B3
exchange could be specified as:
Any combination of the supported correlation functional options can be used. For example B3LYP could be
specified as:
XC vwn_1_rpa 0.19 lyp 0.81 HFexch 0.20 slater 0.80 becke88 nonlocal 0.72
In addition to the options listed above for the exchange and correlation functionals, the user has the alternative of
specifying combined exchange and correlation functionals. A complete list of the available functionals appears in
table 11.1.
The available hybrid functionals (where a Hartree-Fock Exchange component is present) consist of the Becke
“half and half” (see A.D. Becke, J. Chem. Phys. 98, 1372 (1992)), the adiabatic connection method (see A.D. Becke,
J. Chem. Phys. 98, 5648 (1993)), B3LYP (popularized by Gaussian9X), Becke 1997 (“Becke V” paper: A.D.Becke,
J. Chem. Phys., 107, 8554 (1997)).
The keyword beckehandh specifies that the exchange-correlation energy will be computed as
1 HF 1 Slater 1 PW91LDA
EXC ≈ E + EX + EC
2 X 2 2
We know this is NOT the correct Becke prescribed implementation which requires the XC potential in the energy
expression. But this is what is currently implemented as an approximation to it.
The keyword acm specifies that the exchange-correlation energy is computed as
One way to calculate meta-GGA energies is to use orbitals and densities from fully self-consistent GGA or LDA
calculations and run them in one iteration in the meta-GGA functional. It is expected that meta-GGA energies obtained
this way will be close to fully self consistent meta-GGA calculations.
It is possible to calculate metaGGA energies both ways in NWChem, that is, self-consistently or with GGA/LDA
orbitals and densities. However, since second derivatives are not available for metaGGAs, in order to calculate fre-
quencies, one must use task dft freq numerical. A sample file with this is shown below, in 11.5. In this
instance, the energy is calculated self-consistently and geometry is optimized using the analytical gradients.
(For more information on metaGGAs, see S. Kurth, J. Perdew, P. Blaha, Int. J. Quant. Chem 75, 889 (1999) for a
brief description of meta-GGAs, and citations 14-27 therein for thorough background )
Note: both TPSS and PKZB correlation require the PBE GGA CORRELATION (which is itself dependent on an
LDA). The decision has been made to use these functionals with the accompanying local PW91LDA. The user does
not have the ability to set the local part of these metaGGA functionals.
The keyword LB94 will correct the asymptotic region of the XC definition of exchange-correlation potential by the
van-Leeuwen–Baerends exchange-correlation potential that has the correct −1/r asymptotic behavior. The total en-
ergy will be computed by the XC definition of exchange-correlation functional. This scheme is known to tend to
overcorrect the deficiency of most uncorrected exchange-correlation potentials.
The keyword CS00, when supplied with a real value of shift (in atomic units), will perform Casida–Salahub
’00 asymptotic correction. This is primarily intended for use in conjunction with TDDFT and the background of this
method is given in more detail in Chapter 14. The shift is normally positive (which means that the original uncorrected
exchange-correlation potential must be shifted down).
When the keyword CS00 is specified without the value of shift, the program will automatically supply it according
to the semi-empirical formula of Zhan, Nichols, and Dixon (again, see Chapter 14 for more details and references).
As the Zhan’s formula is calibrated against B3LYP results, it is most meaningful to use this in conjunction with the
B3LYP functional, although the program does not prohibit (or even warn) the use of any other functional.
Sample input files of asymptotically corrected TDDFT calculations can be found in Chapter 14.
A simple example calculates the geometry of water, using the metaGGA functionals xtpss03 and ctpss03. This
also highlights some of the print features in the DFT module. Note that you must use the line task dft freq numerical
because analytic hessians are not available for the metaGGAs:
118 CHAPTER 11. DFT FOR MOLECULES (DFT)
The default optimization in the DFT module is to iterate on the Kohn-Sham (SCF) equations for a specified number
of iterations (default 30). The keyword that controls this optimization is ITERATIONS, and has the following general
form,
The optimization procedure will stop when the specified number of iterations is reached or convergence is met.
See an example that uses this directive in section 11.5.
• convergence of the total energy; this is defined to be when the total DFT energy at iteration N and at iteration
N-1 differ by a value less than some value (the default is 1e-6). This value can be modified using the key word,
• convergence of the total density; this is defined to be when the total DFT density matrix at iteration N and at
iteration N-1 have a RMS difference less than some value (the default is 1e-5). This value can be modified using
the key word,
• convergence of the orbital gradient; this is defined to be when the DIIS error vector becomes less than some
value (the default is 5e-4). This value can be modified using the key word,
122 CHAPTER 11. DFT FOR MOLECULES (DFT)
The default optimization strategy is to immediately begin direct inversion of the iterative subspace3 . Damping is
also initiated (using 70% of the previous density) for the first 2 iteration. In addition, if the HOMO - LUMO gap is
small and the Fock matrix somewhat diagonally dominant, then level-shifting is automatically initiated. There are a
variety of ways to customize this procedure to whatever is desired.
An alternative optimization strategy is to specify, by using the change in total energy (from iterations when N and
N-1), when to turn damping, level-shifting, and/or DIIS on/off. Start and stop keywords for each of these is available
as,
So, for example, damping, DIIS, and/or level-shifting can be turned on/off as desired.
Another strategy can be to simply specify how many iterations (cycles) you wish each type of procedure to be
used. The necessary keywords to control the number of damping cycles (ncydp), the number of DIIS cycles (ncyds),
and the number of level-shifting cycles (ncysh) are input as,
The amount of damping, level-shifting, time at which level-shifting is automatically imposed, and Fock matrices
used in the DIIS extrapolation can be modified by the following keywords
Damping is defined to be the percentage of the previous iterations density mixed with the current iterations density.
So, for example
CONVERGENCE damp 70
would mix 30% of the current iteration density with 70% of the previous iteration density.
Level-Shifting4 is defined as the amount of shift applied to the diagonal elements of the unoccupied block of the
Fock matrix. The shift is specified by the keyword lshift. For example the directive,
causes the diagonal elements of the Fock matrix corresponding to the virtual orbitals to be shifted by 0.5 a.u. By
default, this level-shifting procedure is switched on whenever the HOMO-LUMO gap is small. Small is defined by
3 P. Pulay, Chem. Phys. Lett. 73, 393 (1980) and P. Pulay, J. Comp. Chem. 3, 566 (1982)
4 M.F. Guest and V.R. Saunders, Mol. Phys. 28, 819 (1974)
11.8. CDFT — CONSTRAINED DFT 123
default to be 0.05 au but can be modified by the directive hl_tol. An example of changing the HOMO-LUMO gap
tolerance to 0.01 would be,
Direct inversion of the iterative subspace with extrapolation of up to 10 Fock matrices is a default optimization
procedure. For large molecular systems the amount of available memory may preclude the ability to store this number
of N**2 arrays in global memory. The user may then specify the number of Fock matrices to be used in the extrap-
olation (must be greater than three (3) to be effective). To set the number of Fock matrices stored and used in the
extrapolation procedure to 3 would take the form,
CONVERGENCE diis 3
The user has the ability to simply turn off any optimization procedures deemed undesirable with the obvious
keywords,
For systems where the initial guess is very poor, the user can try the method described in 5 that makes use of frac-
tional occupation of the orbital levels during the initial cycles of the SCF convergence. The input has the following
form
where the optional value n_rabuck determines the number of SCF cycles during which the method will be active.
For example, to set equal to 30 the number of cycles where the Rabuck method is active, you need to use the following
line
CONVERGENCE rabuck 30
CDFT <integer fatom1 latom1> [<integer fatom2 latom2>] (charge||spin <real constaint_valu
[pop (becke||mulliken||lowdin) default lowdin]
Variables fatom1 and latom1 define the first and last atom of the group of atoms to which the constaint will be
applied. Therefore the atoms in the same group should be placed continuously in the geometry input. If fatom2 and
latom2 are specified, the difference between group 1 and 2 (i.e. 1-2) is constrained.
The constraint can be either on the charge or the spin density (# of alpha - beta electrons) with a user specified
constaint_value. Note: No gradients have been implemented for the spin constaints case. Geometry optimizations can
only be performed using the charge constaint.
5 A. D. Rabuck and G. E. Scuseria, J. Chem. Phys 110,695 (1999)
124 CHAPTER 11. DFT FOR MOLECULES (DFT)
To calculate the charge or spin density, the Becke, Mulliken, and Lowdin population schemes can be used. The
Lowdin scheme is default while the Mulliken scheme is not recommended. If basis sets with many diffuse functions
are used, the Becke population scheme is recommended.
Multiple constaints can be defined simultaniously by defining multiple cdft lines in the input. The same popu-
lation scheme will be used for all constaints and only needs to be specified once. If multiple population options are
defined, the last one will be used. When there are convergence problems with multiple constaints, the user is advised
to do one constraint first and to use the resulting orbitals for the next step of the constained calculations.
It is best to put "convergence nolevelshifting" in the dft directive to avoid issues with gradient calculations and
convergence in CDFT. Use orbital swap to get a broken-symmetry solution.
An input example is given below.
geometry
symmetry
C 0.0 0.0 0.0
O 1.2 0.0 0.0
C 0.0 0.0 2.0
O 1.2 0.0 2.0
end
basis
* library 6-31G*
end
dft
xc b3lyp
convergence nolevelshifting
odft
mult 1
vectors swap beta 14 15
cdft 1 2 charge 1.0
end
task dft
The SMEAR keyword is useful in cases with many degenerate states near the HOMO (eg metallic clusters)
This option allows fractional occupation of the molecular orbitals. A Gaussian broadening function of exponent
smear is used as described in the paper: R.W. Warren and B.I. Dunlap, Chem. Phys. Letters 262, 384 (1996).
The user must be aware that an additional energy term is added to the total energy in order to have energies and
gradients consistent.
11.10. GRID — NUMERICAL INTEGRATION OF THE XC POTENTIAL 125
A numerical integration is necessary for the evaluation of the exchange-correlation contribution to the density
functional. The default quadrature used for the numerical integration is an Euler-MacLaurin scheme for the radial
components (with a modified Mura-Knowles transformation) and a Lebedev scheme for the angular components.
Within this numerical integration procedure various levels of accuracy have been defined and are available to the user.
The user can specify the level of accuracy with the keywords; xcoarse, coarse, medium, fine, and xfine. The default is
medium.
GRID [xcoarse||coarse||medium||fine||xfine]
Our intent is to have a numerical integration scheme which would give us approximately the accuracy defined
below regardless of molecular composition.
In order to determine the level of radial and angular quadrature needed to give us the target accuracy we computed
total DFT energies at the LDA level of theory for many homonuclear atomic, diatomic and triatomic systems in rows
1-4 of the periodic table. In each case all bond lengths were set to twice the Bragg-Slater radius. The total DFT energy
of the system was computed using the converged SCF density with atoms having radial shells ranging from 35-235 (at
fixed 48/96 angular quadratures) and angular quadratures of 12/24-48/96 (at fixed 235 radial shells). The error of the
numerical integration was determined by comparison to a “best” or most accurate calculation in which a grid of 235
radial points 48 theta and 96 phi angular points on each atom was used. This corresponds to approximately 1 million
points per atom. The following tables were empirically determined to give the desired target accuracy for DFT total
energies. These tables below show the number of radial and angular shells which the DFT module will use for for a
given atom depending on the row it is in (in the periodic table) and the desired accuracy. Note, differing atom types
in a given molecular system will most likely have differing associated numerical grids. The intent is to generate the
desired energy accuracy (with utter disregard for speed).
126 CHAPTER 11. DFT FOR MOLECULES (DFT)
Table 11.2: Program default number of radial and angular shells empirically determined for Row 1 atoms (Li → F) to
reach the desired accuracies.
Table 11.3: Program default number of radial and angular shells empirically determined for Row 2 atoms (Na → Cl)
to reach the desired accuracies.
Table 11.4: Program default number of radial and angular shells empirically determined for Row 3 atoms (K → Br) to
reach the desired accuracies.
Table 11.5: Program default number of radial and angular shells empirically determined for Row 4 atoms (Rb → I) to
reach the desired accuracies.
In addition to the simple keyword specifying the desired accuracy as described above, the user has the option of
specifying a custom quadrature of this type in which ALL atoms have the same grid specification. This is accomplished
by using the gausleg keyword.
GRID gausleg <integer nradpts default 50> <integer nagrid default 10>
In this type of grid, the number of phi points is twice the number of theta points. So, for example, a specification
of,
GRID gausleg 80 20
would be interpreted as 80 radial points, 20 theta points, and 40 phi points per center (or 64000 points per center before
pruning).
Lebedev angular grid A second quadrature is the Lebedev scheme for the angular components6 . Within this nu-
merical integration procedure various levels of accuracy have also been defined and are available to the user. The input
for this type of grid takes the form,
In this context the variable iangquad specifies a certain number of angular points as indicated by the table below.7
Therefore the user can specify any number of radial points along with the level of angular quadrature (1-29).
The user can also specify grid parameters specific for a given atom type: parameters that must be supplied are:
atom tag and number of radial points. As an example, here is a grid input line for the water molecule
grid lebedev 80 11 H 70 8 O 90 11
6 The
subroutine for the Lebedev grid was derived from a routine supplied by M. Causà of the University of Torino and from the grid points
supplied by D.N. Laikov from Moscow State University.
7 V.I. Lebedev and D.N. Laikov, Doklady Mathematics 366, 741 (1999).
128 CHAPTER 11. DFT FOR MOLECULES (DFT)
IANGQUAD Nangular l
1 38 9
2 50 11
3 74 13
4 86 15
5 110 17
6 146 19
7 170 21
8 194 23
9 230 25
10 266 27
11 302 29
12 350 31
13 434 35
14 590 41
15 770 47
16 974 53
17 1202 59
18 1454 65
19 1730 71
20 2030 77
21 2354 83
22 2702 89
23 3074 95
24 3470 101
25 3890 107
26 4334 113
27 4802 119
28 5294 125
29 5810 131
1
∏2 1 − er f (µ0AB )
wA (r) =
B6=A
1 µAB
µ0AB =
α (1 − µ2AB )n
rA − rB
µAB =
|rA − rB |
euler Euler-McLaurin quadrature wih the transformation devised by C.W. Murray, N.C. Handy, and G.L. Laming,
Mol. Phys.78, 997 (1993).
mura Modification of the Murray-Handy-Laming scheme by M.E.Mura and P.J.Knowles, J Chem Phys 104, 9848
(1996) (we are not using the scaling factors proposed in this paper).
treutler Gauss-Chebyshev using the transformation suggested by O.Treutler and R.Alrhichs, J.Chem.Phys 102,
346 (1995).
NODISK
This keyword turns off storage of grid points and weights on disk.
The user has the option of controlling screening for the tolerances in the integral evaluations for the DFT module. In
most applications, the default values will be adequate for the calculation, but different values can be specified in the
input for the DFT module using the keywords described below.
The input parameter accCoul is used to define the tolerance in Schwarz screening for the Coulomb integrals.
Only integrals with estimated values greater than 10(−accCoul) are evaluated.
Screening away needless computation of the XC functional (on the grid) due to negligible density is also possible
with the use of,
XC functional computation is bypassed if the corresponding density elements are less than tol_rho.
A screening parameter, radius, used in the screening of the Becke or Delley spatial weights is also available as,
TOLERANCES tight
This option sets all tolerances to their default/user specified values at the very first iteration.
DIRECT||INCORE
NOIO
The inverted charge-density and exchange-correlation matrices for a DFT calculation are normally written to disk
storage. The user can prevent this by specifying the keyword noio within the input for the DFT directive. The input
to exercise this option is as follows,
noio
If this keyword is encountered, then the two matrices (inverted charge-density and exchange-correlation) are computed
“on-the-fly” whenever needed.
The INCORE option is always assumed to be true but can be overridden with the option DIRECT in which case all
integrals are computed “on-the-fly”.
11.13. ODFT AND MULT — OPEN SHELL SYSTEMS 131
Both closed-shell and open-shell systems can be studied using the DFT module. Specifying the keyword MULT
within the DFT directive allows the user to define the spin multiplicity of the system. The form of the input line is as
follows;
When the keyword MULT is specified, the user can define the integer variable mult, where mult is equal to the
number of alpha electrons minus beta electrons, plus 1.
The keyword ODFT is unnecessary except in the context of forcing a singlet system to be computed as an open
shell system (i.e., using a spin-unrestricted wavefunction).
The Perdew and Zunger (see J. P. Perdew and A. Zunger, Phys. Rev. B 23, 5048 (1981)) method to remove the self-
interaction contained in many exchange-correlation functionals has been implemented with the Optimized Effective
Potential method (see R. T. Sharp and G. K. Horton, Phys. Rev. 90, 317 (1953), J. D. Talman and W. F. Shadwick,
Phys. Rev. A 14, 36 (1976)) within the Krieger-Li-Iafrate approximation (J. B. Krieger, Y. Li, and G. J. Iafrate, Phys.
Rev. A 45, 101 (1992); 46, 5453 (1992); 47, 165 (1993)) Three variants of these methods are included in NWChem:
• sic perturbative This is the default option for the sic directive. After a self-consistent calculation, the
Kohn-Sham orbitals are localized with the Foster-Boys algorithm (see section 10.15) and the self-interaction
energy is added to the total energy. All exchange-correlation functionals implemented in the NWChem can be
used with this option.
• sic oep With this option the optimized effective potential is built in each step of the self-consistent process.
Because the electrostatic potential generated for each orbital involves a numerical integration, this method can
be expensive.
• sic oep-loc This option is similar to the oep option with the addition of localization of the Kohn-Sham
orbitals in each step of the self-consistent process.
With oep and oep-loc options a xfine grid (see section 11.10) must be used in order to avoid numerical noise, further-
more the hybrid functionals can not be used with these options. More details of the implementation of this method can
be found in J. Garza, J. A. Nichols and D. A. Dixon, J. Chem. Phys. 112, 7880 (2000). The components of the sic
energy can be printed out using:
MULLIKEN
When this keyword is encountered, Mulliken analysis of both the input density as well as the output density will occur.
For example, to perform a mulliken analysis and print the explicit population analysis of the basis functions, use the
following
dft
mulliken
print "mulliken ao"
end
task dft
basis
H library aug-cc-pvdz
O library aug-cc-pvdz
bqH library H aug-cc-pvdz
bqO library O aug-cc-pvdz
end
Please note that the “ghost” oxygen atom has been labeled bqO, and not just bq.
The PRINT||NOPRINT options control the level of output in the DFT. Please see some examples using this
directive in section 11.5, a sample input file. Known controllable print options are:
11.17. PRINT CONTROL 133
The spin-orbit DFT module (SODFT) in the NWChem code allows for the variational treatment of the one-electron
spin-orbit operator within the DFT framework. The implementation requires the definition of an effective core poten-
tial (ECP) and a matching spin-orbit potential (SO). The current implementation does NOT use symmetry.
The actual SODFT calculation will be performed when the input module encounters the TASK directive (Section
5.10).
TASK SODFT
Input parameters are the same as for the DFT, see section 11 for specifications. Some of the DFT options are not
available in the SODFT. These are max_ovl and sic.
Besides using the standard ECP and basis sets, see Section 8 for details, one also has to specify a spin-orbit (SO)
potential. The input specification for the SO potential can be found in section 8.2. At this time we have not included
any spin-orbit potentials in the basis set library.
Note: One should use a combination of ECP and SO potentials that were designed for the same size core, i.e. don’t
use a small core ECP potential with a large core SO potential (it will produce erroneous results).
Also, note that charge fitting basis sets will not work with spin-orbit calculations.
The following is an example of a calculation of UO2 :
start uo2_sodft
echo
Memory 32 mw
charge 2
135
136 CHAPTER 12. SPIN-ORBIT DFT (SODFT)
U S
12.12525300 0.02192100
7.16154500 -0.22516000
4.77483600 0.56029900
2.01169300 -1.07120900
U S
0.58685200 1.00000000
U S
0.27911500 1.00000000
U S
0.06337200 1.00000000
U S
0.02561100 1.00000000
U P
17.25477000 0.00139800
7.73535600 -0.03334600
5.15587800 0.11057800
2.24167000 -0.31726800
U P
0.58185800 1.00000000
U P
0.26790800 1.00000000
U P
0.08344200 1.00000000
U P
0.03213000 1.00000000
U D
4.84107000 0.00573100
2.16016200 -0.05723600
0.57563000 0.23882800
U D
0.27813600 1.00000000
U D
0.12487900 1.00000000
U D
0.05154800 1.00000000
U F
2.43644100 0.35501100
1.14468200 0.40084600
0.52969300 0.30467900
U F
0.24059600 1.00000000
U F
0.10186700 1.00000000
O S
47.10551800 -0.01440800
5.91134600 0.12956800
0.97648300 -0.56311800
O S
0.29607000 1.00000000
O P
137
16.69221900 0.04485600
3.90070200 0.22261300
1.07825300 0.50018800
O P
0.28418900 1.00000000
O P
0.07020000 1.00000000
END
ECP
U nelec 78
U s
2 4.06365300 112.92010300
2 1.88399500 15.64750000
2 0.88656700 -3.68997100
U p
2 3.98618100 118.75801600
2 2.00016000 15.07722800
2 0.96084100 0.55672000
U d
2 4.14797200 60.85589200
2 2.23456300 29.28004700
2 0.91369500 4.99802900
U f
2 3.99893800 49.92403500
2 1.99884000 -24.67404200
2 0.99564100 1.38948000
O nelec 2
O s
2 10.44567000 50.77106900
O p
2 18.04517400 -4.90355100
O d
2 8.16479800 -3.31212400
END
SO
U p
2 3.986181 1.816350
2 2.000160 11.543940
2 0.960841 0.794644
U d
2 4.147972 0.353683
2 2.234563 3.499282
2 0.913695 0.514635
U f
2 3.998938 4.744214
2 1.998840 -5.211731
2 0.995641 1.867860
END
138 CHAPTER 12. SPIN-ORBIT DFT (SODFT)
dft
mult 1
xc hfexch
odft
grid fine
convergence energy 1.000000E-06
convergence density 1.000000E-05
convergence gradient 1E-05
iterations 100
mulliken
end
task sodft
Chapter 13
COSMO
COSMO is the continuum solvation ‘COnductor-like Screening MOdel’ of A. Klamt and G. Schüürmann to describe
dielectric screening effects in solvents.
1. A. Klamt and G. Schüürmann, J. Chem. Soc. Perkin Trans. 2, 1993 799 (1993).
The NWChem COSMO module implements algorithm for calculation of the energy for the following methods:
by determining the solvent reaction field self-consistently with the solute charge distribution from the respective meth-
ods. Note that COSMO for unrestricted Hartree-Fock (UHF) method can also be performed by invoking the DFT
module with appropriate keywords.
Correlation energy of solvent molecules may also be evaluated at
1. MP2,
2. CCSD,
3. CCSD+T(CCSD),
4. CCSD(T),
levels of theory. It is cautioned, however, that these correlated COSMO calculations determine the solvent reaction
field using the HF charge distribution of the solute rather than the charge distribution of the correlation theory and are
not entirely self consistent in that respect. In other words, these calculations assume that the correlation effect and
solvation effect are largely additive, and the combination effect thereof is neglected. COSMO for MCSCF has not
been implemented yet.
In the current implementation the code calculates the gas-phase energy of the system followed by the solution-
phase energy, and returns the electrostatic contribution to the solvation free energy. At the present gradients are
139
140 CHAPTER 13. COSMO
calculated by finite difference of the energy. Known problems include that the code does not work with spherical
basis functions. The code does not calculate the non-electrostatic contributions to the free energy, except for the
cavitation/dispersion contribution to the solvation free energy, which is computed and printed. It should be noted that
one must in general take into account the standard state correction besides the electrostatic and cavitation/dispersion
contribution to the solvation free energy, when a comparison to experimental data is made.
Invoking the COSMO solvation model is done by specifying the input COSMO input block with the input options
as:
cosmo
[off]
[dielec <real dielec default 78.4>]
[radius <real atom1>
<real atom2>
. . .
<real atomN>]
[rsolv <real rsolv default 0.00>]
[iscren <integer iscren default 0>]
[minbem <integer minbem default 2>]
[maxbem <integer maxbem default 3>]
[ificos <integer ificos default 0>]
[lineq <integer lineq default 1>]
end
followed by the task directive specifying the wavefunction and type of calculation, e.g., task scf energy,
task mp2 energy, task dft optimize, etc.
off can be used to turn off COSMO in a compound (multiple task) run. By default, once the COSMO solvation
model has been defined it will be used in subsequent calculations. Add the keyword off if COSMO is not needed in
subsequent calculations.
Dielec is the value of the dielectric constant of the medium, with a default value of 78.4 (the dielectric constant
for water).
Radius is an array that specifies the radius of the spheres associated with each atom and that make up the
molecule-shaped cavity. Default values are Van der Waals radii. Values are in units of angstroms. The codes uses the
following Van der Waals radii by default:
data vdwr(103) /
1 0.80,0.49,0.00,0.00,0.00,1.65,1.55,1.50,1.50,0.00,
2 2.30,1.70,2.05,2.10,1.85,1.80,1.80,0.00,2.80,2.75,
3 0.00,0.00,1.20,0.00,0.00,0.00,2.70,0.00,0.00,0.00,
4 0.00,0.00,0.00,1.90,1.90,0.00,0.00,0.00,0.00,1.55,
5 0.00,1.64,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,
6 0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,
7 0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,
8 0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,
9 0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,
1 0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,1.65,
2 0.00,0.00,0.00/
with 0.0 values replaced by 1.80. Other radii can be used as well. See for examples:
Rsolv is a parameter used to define the solvent accessible surface. See the original reference of Klamt and
Schuurmann for a description. The default value is 0.00 (in angstroms).
Iscren is a flag to define the dielectric charge scaling option. “iscren 1” implies the original scaling from
Klamt and Schüürmann, mainly “(ε − 1)/(ε + 1/2)”, where ε is the dielectric constant. “iscren 0” implies the
modified scaling suggested by Stefanovich and Truong, mainly “(ε − 1)/ε”. Default is to use the modified scaling.
For high dielectric the difference between the scaling is not significant.
The next three parameters define the tesselation of the unit sphere. The approach follows the original proposal
by Klamt and Schüürmann. A very fine tesselation is generated from maxbem refining passes starting from either
an octahedron or an icosahedron. The boundary elements created with the fine tesselation are condensed down to a
coarser tesselation based on minbem. The induced point charges from the polarization of the medium are assigned to
the centers of the coarser tesselation. Default values are “minbem 2” and “maxbem 3”. The flag ificos serves to
select the original tesselation, “ificos 0” for an octahedron (default) and “ificos 1” for an icoshedron. Starting
from an icosahedron yields a somewhat finer tesselation that converges somewhat faster. Solvation energies are not
really sensitive to this choice for sufficiently fine tesselations.
The lineq parameter serves to select the numerical algorithm to solve the linear equations yielding the effective
charges that represent the polarization of the medium. “lineq 0” selects an iterative method (default), “lineq 1”
selects a dense matrix linear equation solver. For large molecules where the number of effective charges is large, the
codes selects the iterative method.
The following example is for a water molecule in ‘water’, using the HF/6-31G** level of theory:
start
echo
title "h2o"
geometry
o .0000000000 .0000000000 -.0486020332
h .7545655371 .0000000000 .5243010666
h -.7545655371 .0000000000 .5243010666
end
basis segment cartesian
o library 6-31g**
h library 6-31g**
end
cosmo
dielec 78.0
radius 1.40
1.16
1.16
rsolv 0.50
lineq 0
end
task scf energy
142 CHAPTER 13. COSMO
Chapter 14
14.1 Overview
NWChem supports a spectrum of single excitation theories for vertical excitation energy calculations, namely, configu-
ration interaction singles (CIS),1 time-dependent Hartree–Fock (TDHF or also known as random-phase approximation
RPA), time-dependent density functional theory (TDDFT),2 and Tamm–Dancoff approximation to TDDFT.3 These
methods are implemented in a single framework that invokes Davidson’s trial vector algorithm (or its modification for
a non-Hermitian eigenvalue problem).4 The capabilities of the module are summarized as follows:
143
144 CHAPTER 14. CIS, TDHF, AND TDDFT
These are very effective way to rectify the shortcomings of TDDFT when applied to Rydberg excited states (see
below).
formula proposed by Zhan, Nichols, and Dixon.13 Both Casida-Salahub scheme and this new asymptotic correction
scheme give considerably improved (Koopmans type) ionization potentials and Rydberg excitation energies. The
latter, however, supply the shift by itself unlike to former.
for an excited-state geometry optimization (and perhaps an adiabatic excitation energy calculation), and
for an excited-state vibrational frequency calculation. The TDDFT module first invokes DFT module for a ground-
state calculation (regardless of whether the calculations uses a HF reference as in CIS or TDHF or a DFT functional),
and hence there is no need to perform a separate ground-state DFT calculation prior to calling a TDDFT task. When
no second argument of the task directive is given, a single-point excitation energy calculation will be assumed. For
geometry optimizations, it is usually necessary to specify the target excited state and its irreducible representation it
belongs to. See the subsections TARGET and TARGETSYM for more detail.
Individual parameters and keywords may be supplied in the TDDFT input block. The syntax is:
TDDFT
[(CIS||RPA) default RPA]
[NROOTS <integer nroots default 1>]
[MAXVECS <integer maxvecs default 1000>]
[(SINGLET||NOSINGLET) default SINGLET]
[(TRIPLET||NOTRIPLET) default TRIPLET]
[THRESH <double thresh default 1e-4>]
[MAXITER <integer maxiter default 100>]
[TARGET <integer target default 1>]
[TARGETSYM <character targetsym default ’none’>]
[SYMMETRY]
[ALGORITHM <integer algorithm default 0>]
[FREEZE [[core] (atomic || <integer nfzc default 0>)] \
[virtual <integer nfzv default 0>]]
[PRINT (none||low||medium||high||debug)
<string list_of_names ...>]
END
The user can also specify the reference wave function in the DFT input block (even when CIS and TDHF calcula-
tions are requested). See the section of Sample input and output for more details.
Since each keyword has a default value, a minimal input file will be
13 C.-G. Zhan, J. A. Nichols, and D. A. Dixon, J. Phys. Chem. A 107, 4184 (2003).
146 CHAPTER 14. CIS, TDHF, AND TDDFT
GEOMETRY
Be 0.0 0.0 0.0
END
BASIS
Be library 6-31G**
END
Note that the keyword for the asymptotic correction must be given in the DFT input block, since all the effects
of the correction (and also changes in the computer program) occur in the SCF calculation stage. See Chapter 11
(keyword CS00 and LB94) for details.
These keywords toggle the Tamm–Dancoff approximation. CIS means that the Tamm–Dancoff approximation is used
and the CIS or Tamm–Dancoff TDDFT calculation is requested. RPA, which is the default, requests TDHF (RPA) or
TDDFT calculation.
The performance of CIS (Tamm–Dancoff TDDFT) and RPA (TDDFT) are comparable in accuracy. However, the
computational cost is slightly greater in the latter due to the fact that the latter involves a non-Hermitian eigenvalue
problem and requires left and right eigenvectors while the former needs just one set of eigenvectors of a Hermitian
eigenvalue problem. The latter has much greater chance of aborting the calculation due to triplet near instability or
other instability problems.
One can specify the number of excited state roots to be determined. The default value is 1. It is advised that the users
request several more roots than actually needed, since owing to the nature of the trial vector algorithm, some low-lying
roots can be missed when they do not have sufficient overlap with the initial guess vectors.
This keyword limits the subspace size of Davidson’s algorithm; in other words, it is the maximum number of trial
vectors that the calculation is allowed to hold. Typically, 10 to 20 trial vectors are needed for each excited state root to
be converged. However, it need not exceed the product of the number of occupied orbitals and the number of virtual
orbitals. The default value is 1000.
SINGLET (NOSINGLET) requests (suppresses) the calculation of singlet excited states when the reference wave
function is closed shell. The default is SINGLET.
14.4. KEYWORDS OF TDDFT INPUT BLOCK 147
TRIPLET (NOTRIPLET) requests (suppresses) the calculation of triplet excited states when the reference wave func-
tion is closed shell. The default is TRIPLET.
This keyword specifies the convergence threshold of Davidson’s iterative algorithm to solve a matrix eigenvalue prob-
lem. The threshold refers to the norm of residual, namely, the difference between the left-hand side and right-hand
side of the matrix eigenvalue equation with the current solution vector. With the default value of 1e-4, the excitation
energies are usually converged to 1e-5 hartree.
It typically takes 10–30 iterations for the Davidson algorithm to get converged results. The default value is 100.
14.4.8 TARGET and TARGETSYM— the target root and its symmetry
At the moment, the first and second geometrical derivatives of excitation energies that are needed in force, geome-
try, and frequency calculations are obtained by numerical differentiation. These keywords may be used to specify
which excited state root is being used for the geometrical derivative calculation. For instance, when TARGET 3 and
TARGETSYM a1g are included in the input block, the total energy (ground state energy plus excitation energy) of
the third lowest excited state root (excluding the ground state) transforming as the irreducible representation a1g will
be passed to the module which performs the derivative calculations. The default values of these keywords are 1 and
none, respectively.
The keyword TARGETSYM is essential in excited state geometry optimization, since it is very common that the
order of excited states changes due to the geometry changes in the course of optimization. Without specifying the
TARGETSYM, the optimizer could (and would likely) be optimizing the geometry of an excited state that is different
from the one the user had intended to optimize at the starting geometry. On the other hand, in the frequency calcula-
tions, TARGETSYM must be none, since the finite displacements given in the course of frequency calculations will
lift the spatial symmetry of the equilibrium geometry. When these finite displacements can alter the order of excited
states including the target state, the frequency calculation is not be feasible.
By adding this keyword to the input block, the user can request the module to generate the initial guess vectors
transforming as the same irreducible representation as TARGETSYM. This causes the final excited state roots be (ex-
clusively) dominated by those with the specified irreducible representation. This may be useful, when the user is
interested in just the optically allowed transitions, or in the geometry optimization of an excited state root with a par-
ticular irreducible representation. By default, this option is not set. TARGETSYM must be specified when SYMMETRY
is invoked.
There are four distinct algorithms to choose from, and the default value of 0 (optimal) means that the program makes
an optimal choice from the four algorithms on the basis of available memory. In the order of decreasing memory
148 CHAPTER 14. CIS, TDHF, AND TDDFT
The incore algorithm stores all the trial and product vectors in memory across different nodes with the GA, and often
decreases the MAXITER value to accommodate them. The disk-based algorithm stores the vectors on disks across
different nodes with the DRA, and retrieves each vector one at a time when it is needed. The multiple and single
tensor contraction refers to whether just one or more than one trial vectors are contracted with integrals. The multiple
tensor contraction algorithm is particularly effective (in terms of speed) for CIS and TDHF, since the number of the
direct evaluations of two-electron integrals is diminished substantially.
Some of the lowest-lying core orbitals and/or some of the highest-lying virtual orbitals may be excluded in the CIS,
TDHF, and TDDFT calculations by this keyword (this does not affect the ground state HF or DFT calculation). No
orbitals are frozen by default. To exclude the atom-like core regions altogether, one may request
FREEZE atomic
To specify the number of lowest-lying occupied orbitals be excluded, one may use
FREEZE 10
FREEZE core 10
To freeze the highest virtual orbitals, use the virtual keyword. For instance, to freeze the top 5 virtuals
FREEZE virtual 5
This keyword changes the level of output verbosity. One may also request some particular items in Table 14.1 printed.
The following is a sample input for a spin-restricted TDDFT calculation of singlet excitation energies for the water
molecule at the B3LYP/6-31G*.
14.5. SAMPLE INPUT 149
Table 14.1: Printable items in the TDDFT modules and their default print levels.
Item Print Level Description
“timings” high CPU and wall times spent in each step
“trial vectors” high Trial CI vectors
“initial guess” debug Initial guess CI vectors
“general information” default General information
“xc information” default HF/DFT information
“memory information” default Memory information
“convergence” debug Convergence
“subspace” debug Subspace representation of CI matrices
“transform” debug MO to AO and AO to MO transformation of CI vectors
“diagonalization” debug Diagonalization of CI matrices
“iteration” default Davidson iteration update
“contract” debug Integral transition density contraction
“ground state” default Final result for ground state
“excited state” low Final result for target excited state
START h2o
GEOMETRY
O 0.00000000 0.00000000 0.12982363
H 0.75933475 0.00000000 -0.46621158
H -0.75933475 0.00000000 -0.46621158
END
BASIS
* library 6-31G*
END
DFT
XC B3LYP
END
TDDFT
RPA
NROOTS 20
END
START co
CHARGE 1
150 CHAPTER 14. CIS, TDHF, AND TDDFT
GEOMETRY
C 0.0 0.0 0.0
O 1.5 0.0 0.0
END
BASIS
* library aug-cc-pVDZ
END
DFT
XC HFexch
MULT 2
END
TDDFT
RPA
NROOTS 5
END
A geometry optimization followed by a frequency calculation for an excited state is carried out for BF at the
CIS/6-31G* level in the following sample input.
START bf
GEOMETRY
B 0.0 0.0 0.0
F 0.0 0.0 1.2
END
BASIS
* library 6-31G*
END
DFT
XC HFexch
END
TDDFT
CIS
NROOTS 3
NOTRIPLET
TARGET 1
END
TDDFT with an asymptotically corrected SVWN exchange-correlation potential. Casida-Salahub scheme has been
used with the shift value of 0.1837 a.u. supplied as an input parameter.
START tddft_ac_co
GEOMETRY
O 0.0 0.0 0.0000
C 0.0 0.0 1.1283
END
BASIS SPHERICAL
C library aug-cc-pVDZ
O library aug-cc-pVDZ
END
DFT
XC Slater VWN_5
CS00 0.1837
END
TDDFT
NROOTS 12
END
START tddft_ac_co
GEOMETRY
O 0.0 0.0 0.0000
C 0.0 0.0 1.1283
END
BASIS SPHERICAL
C library aug-cc-pVDZ
O library aug-cc-pVDZ
END
DFT
XC B3LYP
CS00
END
TDDFT
NROOTS 12
END
152 CHAPTER 14. CIS, TDHF, AND TDDFT
15.1 Overview
The Tensor Contraction Engine (TCE) Module of NWChem implements a variety of approximations that converge
at the exact solutions of Schrödinger equation. They include configuration interaction theory through singles, dou-
bles, triples, and quadruples substitutions, coupled-cluster theory through connected singles, doubles, triples, and
quadruples substitutions, and many-body perturbation theory through fourth order in its tensor formulation. Not only
optimized parallel programs of some of these high-end correlation theories are new, but also the way in which they
have been developed is unique. The working equations of all of these methods have been derived completely auto-
matically by a symbolic manipulation program called a Tensor Contraction Engine (TCE), and the optimized parallel
programs have also been computer-generated by the same program, which were interfaced to NWChem. The devel-
opment of the TCE program and this portion of the NWChem program has been financially supported by the United
States Department of Energy, Office of Science, Office of Basic Energy Science, through the SciDAC program.
The capabilities of the module include:
• Unrestricted coupled-cluster theory (LCCD, CCD, LCCSD, CCSD, QCISD, CCSDT, CCSDTQ),
• Unrestricted iterative many-body perturbation theory [MBPT(2), MBPT(3), MBPT(4)] in its tensor formulation,
• Unrestricted coupled-cluster singles and doubles with perturbative connected triples {CCSD(T), CCSD[T]},
153
154 CHAPTER 15. TENSOR CONTRACTION ENGINE MODULE: CI, MBPT, AND CC
Version 4.6 and onwards the distributed binary executables do not contain CCSDTQ and its derivative methods, owing
to their large volume. The source code includes them, so a user can reinstate them by setenv CCSDTQ yes and
recompile TCE module. The following optimizations have been used in the module:
• Spin symmetry (spin integration is performed wherever possible within the unrestricted framework, making the
present unrestricted program optimal for an open-shell system. The spin adaption was not performed, although
in a restricted calculation for a closed-shell system, certain spin blocks of integrals and amplitudes are further
omitted by symmetry, and consequently, the present unrestricted CCSD requires only twice as many operations
as a spin-adapted restricted CCSD for a closed-shell system),
• Point-group symmetry,
• Dynamic load balancing (local index sort and matrix multiplications) parallelism,
• Multiple parallel I/O schemes including fully incore algorithm using Global Arrays,
This extensible module is designed such that an existing or new model of many-electron theory can be added and
further optimization can be incorporated with ease by virtue of the TCE. This module is still being actively enhanced
by the TCE and we hope to include more models and optimizations in future releases!
In addition to changes made in the 4.7 version the most essential component of the 5.0 release include:
• Several variants of active-space CCSDt and EOMCCSDt methods that employ limited set of triply excited
cluster amplitudes defined by active orbitals.
• Ground-state non-iterative CC approaches that account for the effect of triply and/or quadruply excited con-
nected clusters: the perturbative approaches based on the similarity transformed Hamiltonian: CCSD(2), CCSD(2)T ,
CCSDT(2)Q , the completely and locally renormalized methods: CR-CCSD(T), LR-CCSD(T), LR-CCSD(TQ)-
1.
• Excited-state non-iterative corrections due to triples to the EOMCCSD excitation energies: the completely
renormalized EOMCCSD(T) method (CR-EOMCCSD(T)).
• New form of the offset tables for files that store cluster amplitudes, recursive intermediates, and one- and two-
electron integrals.
• More efficient storage of two-electron integrals for CC calculations based on the RHF or ROHF references.
• A. Szabo and N. S. Ostlund, Modern Quantum Chemistry: Introduction to Advanced Electronic Structure The-
ory,
• S. Hirata, T. Yanai, W. A. de Jong, T. Nakajima, and K. Hirao, J. Chem. Phys. 120, 3297 (2004).
• S. Hirata, P.-D. Fan, A.A. Auer, M. Nooijen, P. Piecuch, J. Chem. Phys. 121, 12197 (2004).
• K. Kowalski, S. Hirata, M. Włoch, P. Piecuch, T.L. Windus, J. Chem. Phys. 123, 074319 (2005).
• P. Piecuch, S.A. Kucharski, and R.J. Bartlett, J. Chem. Phys. 110, 6103 (1999).
• N. Oliphant and L. Adamowicz, Int. Rev. Phys. Chem. 12, 339 (1993).
• P. Piecuch, K. Kowalski, I.S.O. Pimienta, M.J. McGuire, Int. Rev. Phys. Chem. 21, 527 (2002).
The TCE thoroughly analyzes the working equation of many-electron theory models and automatically generates a
program that takes full advantage of these symmetries at the same time. To do so, the TCE first recognizes the index
permutation symmetries among the working equations, and perform strength reduction and factorization by carefully
monitoring the index permutation symmetries of intermediate tensors. Accordingly, every input and output tensor
(such as integrals, excitation amplitudes, residuals) has just two independent but strictly ordered index strings, and
each intermediate tensor has just four independent but strictly ordered index strings. The operation cost and storage
size of tensor contraction is minimized by using the index range restriction arising from these index permutation
symmetries and also spin and spatial symmetry integration.
To maintain the peak local memory usage at a manageable level, in the beginning of the calculation, the orbitals
are rearranged into tiles (blocks) that contains orbitals with the same spin and spatial symmetries. So the tensor
contractions in these methods are carried out at the tile level; the spin, spatial, and index permutation symmetry is
employed to reduce the operation and storage cost at the tile level also.
In a parallel execution, dynamic load balancing of tile-level local tensor index sorting and local tensor contraction
(matrix multiplication) will be invoked.
Each process is assigned a local tensor index sorting and tensor contraction dynamically. It must first retrieve the
tiles of input tensors, and perform these local operations, and accumulate the output tensors to the storage. We have
developed a uniform interface for these I/O operations to either (1) a global file on a global file system, (2) a global
memory on a global or distributed memory system, and (3) semi-replicated files on a distributed file systems. Some of
these operations depend on the ParSoft library.
in the input file, which may be preceeded by the TCE input block that details the calculations:
TCE
[(DFT||HF||SCF) default HF=SCF]
[FREEZE [[core] (atomic || <integer nfzc default 0>)] \
[virtual <integer nfzv default 0>]]
[(LCCD||CCD||CCSD||CC2||LR-CCSD||LCCSD||CCSDT||CCSDTA||CCSDTQ|| \
CCSD(T)||CCSD[T]||CCSD(2)_T||CCSD(2)||CCSDT(2)_Q|| \
CR-CCSD[T]||CR-CCSD(T)|| \
LR-CCSD(T)||LR-CCSD(TQ)-1||CREOMSD(T)|| \
QCISD||CISD||CISDT||CISDTQ|| \
MBPT2||MBPT3||MBPT4||MP2||MP3||MP4) default CCSD]
[THRESH <double thresh default 1e-6>]
[MAXITER <integer maxiter default 100>]
[PRINT (none||low||medium||high||debug)
<string list_of_names ...>]
[IO (fortran||eaf||ga||sf||replicated||dra||ga_eaf) default ga]
[DIIS <integer diis default 5>]
[LSHIFT <double lshift default is 0.0d0>]
[NROOTS <integer nroots default 0>]
[TARGET <integer target default 1>]
[TARGETSYM <character targetsym default ’none’>]
[SYMMETRY]
[2EORB]
[T3A_LVL]
[ACTIVE_OA]
[ACTIVE_OB]
[ACTIVE_VA]
[ACTIVE_VB]
[DIPOLE]
[TILESIZE <no default (automatically adjusted)>]
[(NO)FOCK <logical recompf default .true.>]
[FRAGMENT <default -1 (off)>]
END
Also supported are energy gradient calculation, geometry optimization, and vibrational frequency (or hessian) calcu-
lation, on the basis of numerical differentiation. To perform these calculations, use
or
or
Alternatively, more descriptive keywords for each individual method can be used. For instance, to perform a
CCSDT energy, gradient, etc. calculation, use
158 CHAPTER 15. TENSOR CONTRACTION ENGINE MODULE: CI, MBPT, AND CC
or
or
or
with an (optional) input block enclosed either by UCCSDT and END or by UCC and END. The keywords for individual
methods of TCE module always start with letter U which stands for “unrestricted” to avoid confusion with other related
methods (such as spin-restricted CCSD and various canonical MP2 implementation) already in place in NWChem.
(UCCSDT||UCC)
[(DFT||HF||SCF) default HF=SCF]
[FREEZE [[core] (atomic || <integer nfzc default 0>)] \
[virtual <integer nfzv default 0>]]
[THRESH <double thresh default 1e-6>]
[MAXITER <integer maxiter default 100>]
[PRINT (none||low||medium||high||debug)]
<string list_of_names ...>]
[IO (fortran||eaf||ga||sf||replicated||dra||ga_eaf) default ga]
[DIIS <integer diis default 5>]
[NROOTS <integer nroots default 0>]
[TARGET <integer target default 1>]
[TARGETSYM <character targetsym default ’none’>]
[SYMMETRY]
[DIPOLE]
[TILESIZE <no default (automatically adjusted)>]
[(NO)FOCK <logical recompf default .true.>]
[FRAGMENT <default -1 (off)>]
END
When a method (CCSDT in this example) is specified in the task directive, a duplicate method specification is not
necessary (indeed not allowed) in the corresponding (UCCSDT or UCC in this case) input block. The keywords of the
other methods for task directive are:
or
or
15.5. KEYWORDS OF TCE INPUT BLOCK 159
etc. The input block can be specified by the same name (UCISDT and END block for TASK UCISDT ENERGY) or
UCC for the CC family, UCI for the CI family, and UMP or UMBPT for the MP family of methods.
The user may also specify the parameters of reference wave function calculation in a separate block for either HF
(SCF) or DFT, depending on the first keyword in the above syntax.
Since each keyword has a default value, a minimal input file will be
GEOMETRY
Be 0.0 0.0 0.0
END
BASIS
Be library cc-pVDZ
END
which performs a CCSD/cc-pVDZ calculation of the Be atom in its singlet ground state with a spin-restricted HF
reference.
This keyword tells the module which of the HF (SCF) or DFT module is going to be used for the calculation of a
reference wave function. The keyword HF and SCF are one and the same keyword internally, and are default. When
these are used, the details of the HF (SCF) calculation can be specified in the SCF input block, whereas if DFT is
chosen, DFT input block may be provided.
For instance, RHF-RCCSDT calculation (R standing for spin-restricted) can be performed with the following input
blocks:
SCF
SINGLET
RHF
END
TCE
SCF
CCSDT
END
or
SCF
160 CHAPTER 15. TENSOR CONTRACTION ENGINE MODULE: CI, MBPT, AND CC
SINGLET
RHF
END
UCCSDT
SCF
END
or
SCF
SINGLET
RHF
END
UCC
SCF
END
This calculation (and any correlation calculation in the TCE module using a RHF or RDFT reference for a closed-shell
system) skips the storage and computation of all β spin blocks of integrals and excitation amplitudes. ROHF-UCCSDT
(U standing for spin-unrestricted) for an open-shell doublet system can be requested by
SCF
DOUBLET
ROHF
END
TCE
SCF
CCSDT
END
and likewise, UHF-UCCSDT for an open-shell doublet system can be specified with
SCF
DOUBLET
UHF
END
TCE
SCF
CCSDT
END
15.5. KEYWORDS OF TCE INPUT BLOCK 161
The operation and storage costs of the last two calculations are identical. To use the KS DFT reference wave function
for a UCCSD calculation of an open-shell doublet system,
DFT
ODFT
MULT 2
END
TCE
DFT
CCSD
END
• CREOMSD(T): EOMCCSD energies and completely renormalized EOMCCSD(T)(IA) correction. In this op-
CR−EOMCCSD(T),IA
tion NWCHEM prints two components: (1) total energy of the K-th state EK = EKEOMCCSD +δK (T )
CR−EOMCCSD(T),IA CR−EOMCCSD(T),IA
and (2) the so-called δ-corrected EOMCCSD excitation energy ωK = ωKEOMCCSD +δK (T ).
All of these models are based on spin-orbital expressions of the amplitude and energy equations, and designed
primarily for spin-unrestricted reference wave functions. However, for a restricted reference wave function of a closed-
shell system, some further reduction of operation and storage cost will be made. Within the unrestricted framework,
all these methods take full advantage of spin, spatial, and index permutation symmetries to save operation and storage
costs at every stage of the calculation. Consequently, these computer-generated programs will perform significantly
faster than, for instance, a hand-written spin-adapted CCSD program in NWChem, although the nominal operation
cost for a spin-adapted CCSD is just one half of that for spin-unrestricted CCSD (in spin-unrestricted CCSD there are
three independent sets of excitation amplitudes, whereas in spin-adapted CCSD there is only one set, so the nominal
operation cost for the latter is one third of that of the former. For a restricted reference wave function of a closed-shell
system, all β spin block of the excitation amplitudes and integrals can be trivially mapped to the all α spin block,
reducing the ratio to one half).
While the MBPT (MP) models implemented in the TCE module give identical correlation energies as conventional
implementation for a canonical HF reference of a closed-shell system, the former are intrinsically more general and
theoretically robust for other less standard reference wave functions and open-shell systems. This is because the zeroth
order of Hamiltonian is chosen to be the full Fock operatior (not just the diagonal part), and no further approximation
was invoked. So unlike the conventional implementation where the Fock matrix is assumed to be diagonal and a
correlation energy is evaluated in a single analytical formula that involves orbital energies (or diagonal Fock matrix
elements), the present tensor MBPT requires the iterative solution of amplitude equations and subsequent energy
evaluation and is generally more expensive than the former. For example, the operation cost of many conventional
implementation of MBPT(2) scales as the fourth power of the system size, but the cost of the present tensor MBPT(2)
scales as the fifth power of the system size, as the latter permits non-canonical HF reference and the former does not
(to reinstate the non-canonical HF reference in the former makes it also scale as the fifth power of the system size).
15.5. KEYWORDS OF TCE INPUT BLOCK 163
This keyword specifies the convergence threshold of iterative solutions of amplitude equations, and applies to all of
the CI, CC, and MBPT models. The threshold refers to the norm of residual, namely, the deviation from the amplitude
equations. The default value is 1e-6.
It sets the maximum allowed number iterations for the iterative solutions of amplitude equations. The default value is
100.
There are five parallel I/O schemes implemented for all the models, which need to be wisely chosen for a particular
problem and computer architecture.
The GA algorithm, which is default, stores all input (integrals and excitation amplitudes), output (residuals), and
intermediate tensors in the shared memory area across all nodes by virtue of GA library. This fully incore algorithm
replaces disk I/O by inter-process communications. This is a recommended algorithm whenever feasible. Note that the
memory management through runtime orbital range tiling described above applies to local (unshared) memory of each
node, which may be separately allocated from the shared memory space for GA. So when there is not enough shared
memory space (either physically or due to software limitations, in particular, shmmax setting), the GA algorithm
can crash due to an out-of-memory error. The replicated scheme is the currently the only disk-based algorithm for
a genuinely distributed file system. This means that each node keeps an identical copy of input tensors and it holds
non-identical overlapping segments of intermediate and output tensors in its local disk. Whenever data coherency is
required, a file reconcilation process will take place to make the intermediate and output data identical throughout the
nodes. This algorithm, while requiring redundant data space on local disk, performs reasonably efficiently in parallel.
For sequential execution, this reduces to the EAF scheme. For a global file system, the SF scheme is recommended.
This together with the Fortran77 direct access scheme does not usually exhibit scalability unless shared files on the
global file system also share the same I/O buffer. For sequential executions, the SF, EAF, and replicated schemes are
interchangeable, while the Fortran77 scheme is appreciably slower.
Two new I/O algorithms dra and ga_eaf combines GA and DRA or EAF based replicated algorithm. In the
former, arrays that are not active (e.g., prior T amplitudes used in DIIS or EOM-CC trial vectors) in GA algorithm
will be moved to DRA. In the latter, the intermediates that are formed by tensor contractions are initially stored in
GA, thereby avoiding the need to accumulate the fragments of the intermediate scattered in EAFs in the original EAF
algorithm. Once the intermediate is formed completely, then it will be replicated as EAFs.
164 CHAPTER 15. TENSOR CONTRACTION ENGINE MODULE: CI, MBPT, AND CC
It sets the number iterations in which a DIIS extrapolation is performed to accelerate the convergence of excitation
amplitudes. The default value is 5, which means in every five iteration, one DIIS extrapolation is performed (and in
the rest of the iterations, Jacobi rotation is used). When zero or negative value is specified, the DIIS is turned off. It is
not recommended to perform DIIS every iteration, whereas setting a large value for this parameter necessitates a large
memory (disk) space to keep the excitation amplitudes of previous iterations. In 5.0 version we significantly improved
the DIIS solver by re-organizing the itrative process and by introducing the level shift option (lshift) that enable
to increase small orbital energy differences used in calculating the up-dates for cluster amplitudes. Typical values
for lshift oscillates between 0.3 and 0.5 for CC calculations for ground states of multi-configurational character.
Otherwise, the value of lshift is by default set equal to 0.
Some of the lowest-lying core orbitals and/or some of the highest-lying virtual orbitals may be excluded in the calcula-
tions by this keyword (this does not affect the ground state HF or DFT calculation). No orbitals are frozen by default.
To exclude the atom-like core regions altogether, one may request
FREEZE atomic
To specify the number of lowest-lying occupied orbitals be excluded, one may use
FREEZE 10
FREEZE core 10
To freeze the highest virtual orbitals, use the virtual keyword. For instance, to freeze the top 5 virtuals
FREEZE virtual 5
One can specify the number of excited state roots to be determined. The default value is 1. It is advised that the users
request several more roots than actually needed, since owing to the nature of the trial vector algorithm, some low-lying
roots can be missed when they do not have sufficient overlap with the initial guess vectors.
15.5.9 TARGET and TARGETSYM — the target root and its symmetry
At the moment, the first and second geometrical derivatives of excitation energies that are needed in force, geome-
try, and frequency calculations are obtained by numerical differentiation. These keywords may be used to specify
which excited state root is being used for the geometrical derivative calculation. For instance, when TARGET 3 and
TARGETSYM a1g are included in the input block, the total energy (ground state energy plus excitation energy) of
the third lowest excited state root (excluding the ground state) transforming as the irreducible representation a1g will
be passed to the module which performs the derivative calculations. The default values of these keywords are 1 and
none, respectively.
15.5. KEYWORDS OF TCE INPUT BLOCK 165
The keyword TARGETSYM is essential in excited state geometry optimization, since it is very common that the
order of excited states changes due to the geometry changes in the course of optimization. Without specifying the
TARGETSYM, the optimizer could (and would likely) be optimizing the geometry of an excited state that is different
from the one the user had intended to optimize at the starting geometry. On the other hand, in the frequency calcula-
tions, TARGETSYM must be none, since the finite displacements given in the course of frequency calculations will
lift the spatial symmetry of the equilibrium geometry. When these finite displacements can alter the order of excited
states including the target state, the frequency calculation is not be feasible.
By adding this keyword to the input block, the user can request the module to seek just the roots of the specified
irreducible representation as TARGETSYM. By default, this option is not set. TARGETSYM must be specified when
SYMMETRY is invoked.
In the 5.0 version a new option has been added in order to provide more economical way of storing two-electron
integrals used in CC calculations based on the RHF and ROHF references. The 2EORB keyword can be used in the
context of all (except for the active-space approaches) CC approaches. All two-electron integrals are transfromed
and subsequently stored in a way which is compatible with assumed tiling scheme. The transformation from orbital
to spinorbital form of the two-electron integrals is performed on-the-fly during execution of the CC module. This
option, although slower, allows to significantly reduce the memory requirements needed by the first half of 4-index
transformation and final file with fully transformed two-electron integrals . Savings in the memory requirements on
the order of magnitude (and more) have been observed for large-scale open-shell calculations.
In addition to the algorithm implemented in the 5.0 version, several new computation-intensive algorithms has been
added to the 5.1 version with the purpose of improving scalability and overcoming local memory bottleneck of the
5.0 2EORB 4-index transformation. All new variants of 4-index transormation should be executed on multiprocessor
machines. In order to give the user a full control over this part of the TCE code several keywords were designed to
define the most vital parameters that determine the perfromance of 4-index transformation. All new keywords must be
used with the 2EORB keyword. The 2emet keyword (default value 1 or "2emet 1", refers to the old 5.0 version of
the 4-index transformation), defines the algorithm to be used. By putting "2emet 2" the TCE code will execute the
algoritm based on the two step procedure with two intermediate files. In many instances this algorithm is characterized
by better timings compared to algorithms 3 and 4, although it is more memory demanding. In contrast to algorithms
nr 1,3, and 4 this approach can make use of a disk to store intermediate files. For this purpose one should use the
keyword idiskx ("idiskx 0" causes that all intermediate files are stored on globall arrays, while "idiskx 1"
tells the code to use a disk to store intermediates; default value of idiskx is equal 0). Algorithm nr 3 ("2emet 3")
uses only one intermediate file whereas algorithm nr 4 ("2emet 4") is a version of algorithm 3 with inbuild option
of reducing the memory requirements. For example, by using new keyword "split 4" we will reduce the size of
only intermediate file by factor of 4 (the split keyword can be only used in the context of algorithm nr 4). All
new algorithms (i.e., 2,3,4) use the attilesize to define the size of atomic tile. By default attilesize is set
equal 30. For larger systems the use of larger values of attilesize is recommended (typically between 40-60).
In the later part of this manual several examples illustrate the use of the newly introduced keywords.
When this is set, the ground-state CC calculation will enter another round of iterative step for the so-called Λ equation
to obtain the one-particle density matrix and dipole moments. Likewise, for excited-states (EOM-CC), the transition
166 CHAPTER 15. TENSOR CONTRACTION ENGINE MODULE: CI, MBPT, AND CC
moments and dipole moments will be computed when (and only when) this option is set. In the latter case, EOM-CC
left hand side solutions will be sought incurring approximately three times the computational cost of excitation energies
alone (note that the EOM-CC effective Hamiltonian is not Hermitian and has distinct left and right eigenvectors).
The default is FOCK meaning that the Fock matrix will be reconstructed (as opposed to using the orbital energies
as the diagonal part of Fock). This is essential in getting correct correlation energies with ROHF or DFT reference
wave functions. However, currently, this module cannot reconstruct the Fock matrix when one-component relativistic
effects are operative. So when a user wishes to run TCE’s correlation methods with DK or other relativistic reference,
NOFOCK must be set and orbital energies must be used for the Fock matrix.
This keyword changes the level of output verbosity. One may also request some particular items in Table 15.1 printed.
Table 15.1: Printable items in the TCE modules and their default print levels.
Item Print Level Description
“time” vary CPU and wall times
“tile” vary Orbital range tiling information
“t1” debug T1 excitation amplitude dumping
“t2” debug T2 excitation amplitude dumping
“t3” debug T3 excitation amplitude dumping
“t4” debug T4 excitation amplitude dumping
“general information” default General information
“correlation information” default TCE information
“mbpt2” debug Caonical HF MBPT2 test
“get_block” debug I/O information
“put_block” debug I/O information
“add_block” debug I/O information
“files” debug File information
“offset” debug File offset information
“ao1e” debug AO one-electron integral evaluation
“ao2e” debug AO two-electron integral evaluation
“mo1e” debug One-electron integral transformation
“mo2e” debug Two-electron integral transformation
START h2o
CHARGE 1
GEOMETRY
O 0.00000000 0.00000000 0.12982363
H 0.75933475 0.00000000 -0.46621158
H -0.75933475 0.00000000 -0.46621158
END
BASIS
* library cc-pVTZ
END
SCF
ROHF
DOUBLET
THRESH 1.0e-10
TOL2E 1.0e-10
END
TCE
CCSD
END
START h2o
CHARGE 1
GEOMETRY
O 0.00000000 0.00000000 0.12982363
H 0.75933475 0.00000000 -0.46621158
H -0.75933475 0.00000000 -0.46621158
END
BASIS
* library cc-pVTZ
END
SCF
ROHF
DOUBLET
THRESH 1.0e-10
TOL2E 1.0e-10
END
EOM-CCSDT calculation for excitation energies, excited-state dipole, and transition moments.
START tce_h2o_eomcc
BASIS
* library sto-3g
END
SCF
SINGLET
RHF
END
TCE
CCSDT
DIPOLE
FREEZE CORE ATOMIC
NROOTS 1
END
Active-space CCSDt/EOMCCSDt calculations (version I) of several excited states of the Be3 molecule. Three
highest-lying occupied α and β orbitals (active_oa and active_ob) and nine lowest-lying unoccupied α and β
orbitals (active_va and active_vb) define the active space.
START TCE_ACTIVE_CCSDT
ECHO
BASIS spherical
# --- DEFINE YOUR BASIS SET ---
END
SCF
THRESH 1.0e-10
TOL2E 1.0e-10
SINGLET
15.6. SAMPLE INPUT 169
RHF
END
TCE
FREEZE ATOMIC
CCSDTA
TILESIZE 15
THRESH 1.0d-5
ACTIVE_OA 3
ACTIVE_OB 3
ACTIVE_VA 9
ACTIVE_VB 9
T3A_LVL 1
NROOTS 2
END
Completely renormalized EOMCCSD(T) (CR-EOMCCSD(T)) calculations for the ozone molecule as described by
the POL1 basis set. The CREOMSD(T) directive automatically initialize three-step procedure: (1) CCSD calculations;
(2) EOMCCSD calculations; (3) non-iterative CR-EOMCCSD(T) corrections.
START TCE_CR_EOM_T_OZONE
ECHO
BASIS SPHERICAL
O S
10662.285000000 0.00079900
1599.709700000 0.00615300
364.725260000 0.03115700
103.651790000 0.11559600
33.905805000 0.30155200
O S
12.287469000 0.44487000
4.756805000 0.24317200
O S
1.004271000 1.00000000
O S
0.300686000 1.00000000
O S
0.090030000 1.00000000
170 CHAPTER 15. TENSOR CONTRACTION ENGINE MODULE: CI, MBPT, AND CC
O P
34.856463000 0.01564800
7.843131000 0.09819700
2.306249000 0.30776800
0.723164000 0.49247000
O P
0.214882000 1.00000000
O P
0.063850000 1.00000000
O D
2.306200000 0.20270000
0.723200000 0.57910000
O D
0.214900000 0.78545000
0.063900000 0.53387000
END
SCF
THRESH 1.0e-10
TOL2E 1.0e-10
SINGLET
RHF
END
TCE
FREEZE ATOMIC
CREOMSD(T)
TILESIZE 20
THRESH 1.0d-6
NROOTS 2
END
The LR-CCSD(T) calculations for the glycine molecule in the aug-cc-pVTZ basis set. Option 2EORB is used in
order to minimize memory requirements associated with the storage of two-electron integrals.
START TCE_LR_CCSD_T
ECHO
BASIS
* library aug-cc-pVTZ
END
SCF
THRESH 1.0e-10
TOL2E 1.0e-10
SINGLET
RHF
END
TCE
FREEZE ATOMIC
2EORB
TILESIZE 15
LR-CCSD(T)
THRESH 1.0d-7
END
The CCSD calculations for the triplet state of the C20 molecule. New algorithms for 4-index tranformation are
used.
title "c20_cage"
echo
start c20_cage
memory stack 2320 mb heap 180 mb global 2000 mb noverify
basis spherical
* library cc-pvtz
end
scf
triplet
rohf
thresh 1.e-8
maxiter 200
end
tce
ccsd
maxiter 60
diis 5
thresh 1.e-6
2eorb
2emet 3
attilesize 40
tilesize 30
freeze atomic
end
15.7.1 Introduction
Response properties can be calculated within the TCE. The current functionality is limited to ground-state dipole
polarizabilities for the CCSD and CCSDT levels of theory. Like the rest of the TCE, properties can be calculated with
RHF, UHF, ROHF and DFT reference wavefunctions.
Specific details about the CCSD-LR implementation can be found in the following papers:
• J. R. Hammond, M. Valiev, W. A. deJong and K. Kowalski, J. Phys. Chem. A, 111, 5492 (2007).
An appropriate background on coupled-cluster linear response (CC-LR) can be found in the references of those papers.
15.7. TCE RESPONSE PROPERTIES 173
15.7.2 Performance
The coupled-cluster response codes were generated in the same manner as the rest of the TCE, thus all previous
comments on performance apply here as well. The improved offsets available in the CCSD and EOM-CCSD codes is
now also available in the CCSD-Λ and CCSD-LR codes. The bottleneck for CCSD-LR is the same as EOM-CCSD,
likewise for CCSDT-LR and EOM-CCSDT. The CCSD-LR code has been tested on as many as 720 processors for
systems with more than 1000 spin-orbitals, while the CCSDT-LR code has been tested on as many as 128 processors.
15.7.3 Input
The input commands for TCE response properties exclusively use set directives (see Section 5.7) instead of TCE input
block keywords. There are currently only three commands available:
The boolean variable lineresp invokes the linear response equations for the corresponding coupled-cluster
method (only CCSD and CCSDT possess this feature) and evaluates the dipole polarizability. When lineresp is
true, the Λ-equations will also be solved, so the dipole moment is also calculated. If no other options are set, the com-
plete dipole polarizability tensor will be calculated at zero frequency (static). Up to nine real frequencies can be set;
adding more should not crash the code but it will calculate meaningless quanities. If one desires to calculate more fre-
quencies at one time, merely change the line double precision afreq(9) in $(NWCHEM_TOP)/src/tce/include/tce.f
appropriately and recompile.
The user can choose to calculate response amplitudes only for certain axis, either because of redundancy due to
symmetry or because of memory limitations. The boolean vector of length three respaxis is used to determine
whether or not a given set of response amplitudes are allocated, solved for, and used in the polarizability tensor
evaluation. The logical variables represent the X, Y, Z axes, respectively. If respaxis is set to T F T, for example,
the responses with respect to the X and Z dipoles will be calculated, and the four (three unique) tensor components
will be evaluated. This feature is also useful for conserving memory. By calculating only one axis at a time, memory
requirements can be reduced by 25% or more, depending on the number of DIIS vectors used. Reducing the number
of DIIS vectors also reduces the memory requirements.
It is strongly advised that when calculating polarizabilities at high-frequencies, that user set the frequencies in
increasing order, preferably starting with zero or other small value. This approach is computationally efficient (the
initial guess for subsequent responses is the previously converged value) and mitigates starting from a zero vector for
the response amplitudes.
15.7.4 Examples
basis spherical
* library aug-cc-pvdz
end
tce
freeze atomic
ccsd
io ga
2eorb
tilesize 16
end
set tce:lineresp T
set tce:afreq 0.000 0.072
set tce:respaxis T T T
geometry units au
symmetry c2v
H 0 0 0
F 0 0 1.7328795
end
basis spherical
* library aug-cc-pvdz
end
tce
ccsdt
io ga
2eorb
end
set tce:lineresp T
set tce:afreq 0.0 0.1 0.2 0.3 0.4
set tce:respaxis T F T
MP2
There are (at least) three algorithms within NWChem that compute the Møller-Plesset (or many-body) perturbation
theory second-order correction to the Hartree-Fock energy (MP2). They vary in capability, the size of system that can
be treated and use of other approximations
• Semi-direct — this is recommended for most large applications (up to about 2800 basis functions), especially
on the IBM SP and other machines with significant disk I/O capability. Partially transformed integrals are stored
on disk, multi-passing as necessary. RHF and UHF references may be treated including computation of analytic
derivatives. This is selected by specifying mp2 on the task directive, e.g.
TASK MP2
• Fully-direct — this is of utility if only limited I/O resources are available (up to about 2800 functions). Only
RHF references and energies are available. This is selected by specifying direct_mp2 on the task directive,
e.g.
TASK DIRECT_MP2
• Resolution of the identity (RI) approximation MP2 (RI-MP2) — this uses the RI approximation and is therefore
only exact in the limit of a complete fitting basis. However, with some care, high accuracy may be obtained with
relatively modest fitting basis sets. An RI-MP2 calculation can cost over 40 times less than the corresponding
exact MP2 calculation. RHF and UHF references with only energies are available. This is selected by specifying
rimp2 on the task directive, e.g.,
TASK RIMP2
MP2
[FREEZE [[core] (atomic || <integer nfzc default 0>)] \
[virtual <integer nfzv default 0>]]
[TIGHT]
[PRINT]
[NOPRINT]
[VECTORS <string filename default scf-output-vectors> \
[swap [(alpha||beta)] <integer pair-list>] ]
175
176 CHAPTER 16. MP2
freeze atomic
For example, in a calculation on Si(OH)2 , by default the lowest seven orbitals would be frozen (the oxygen 1s, and
the silicon 1s, 2s and 2p).
Table 16.1: Number of orbitals considered “core” in the “freeze by atoms” algorithm.
Caution: The rule for freezing orbitals “by atoms” are rather unsophisticated: the number of orbitals to be frozen
is computed from the Table 16.1 by summing the number of core orbitals in each atom present. The corresponding
number of lowest-energy orbitals are frozen — if for some reason the actual core orbitals are not the lowest lying, then
correct results will not be obtained. From limited experience, it seems that special attention should be paid to systems
including third- and higher- period atoms.
The user may also specify the number of orbitals to be frozen by atom. Following the Si(OH)2 example, the user
could specify
freeze atomic O 1 Si 3
In this case only the lowest four orbitals would be frozen. If the user does not specify the orbitals by atom, the rules
default to Table 16.1.
Caution: The system does not check for a valid number of orbitals per atom. If the user specifies to freeze more
orbitals then are available for the atom, the system will not catch the error. The user must specify a logical number of
orbitals to be frozen for the atom.
The FREEZE directive may also be used to specify the number of core orbitals to freeze. For instance, to freeze
the first 10 orbitals
freeze 10
16.2. TIGHT — INCREASED PRECISION 177
freeze core 10
Again, note that if the 10 orbitals to be frozen do not correspond to the first 10 orbitals, then the swap keyword of the
VECTORS directive must be used to order the input orbitals correctly (Section 16.5).
To freeze the highest virtual orbitals, use the virtual keyword. For instance, to freeze the top 5 virtuals
freeze virtual 5
Again, note that this only works for the direct-MP2 and RI-MP2 energy codes.
scratchdisk 512
puts an upper limit of 512 MBytes to the semi-direct MP2 usage of disk (again, on a per process base).
vectors /tmp/h2o.movecs
178 CHAPTER 16. MP2
Table 16.2: Printable items in the MP2 modules and their default print levels.
Item Print Level Description
RI-MP2
As noted above (Section 16.1) if the SCF orbitals are not in the correct order, it is necessary to permute the
input orbitals using the swap keyword of the VECTORS directive. For instance, if it is desired to freeze a total six
orbitals corresponding to the SCF orbitals 1–5, and 7, it is necessary to swap orbital 7 into the 6th position. This is
accomplished by
vectors swap 6 7
Alternatively, using a standard capability of basis sets (Section 7) another named basis may be associated with the
fitting basis. For instance, the following input specifies a basis with the name "small fitting basis" and then
defines this to be the "ri-mp2 basis".
16.7. FILE3C — RI-MP2 3-CENTER INTEGRAL FILENAME 179
file3c /scratch/h2o.3c
Construction of the RI fit requires the inversion of a matrix of fitting basis integrals which is carried out via diago-
nalization. If the fitting basis includes near linear dependencies, there will be small eigenvalues which can ultimately
lead to non-physical RI-MP2 correlation energies. Eigenvectors of the fitting matrix are discarded if the correspond-
ing eigenvalue is less than $mineval$ which defaults to 10−8 . This parameter may be changed by setting the a
parameter in the database. For instance, to set it to 10−10
The user has the option of specifying that the RI-MP2 calculations are to be done with variations of the SCF reference
wavefunction. This is accomplished with a SET directive of the form,
Each element specified for array is the SCF spin case to be used for the corresponding spin case of the correlated
calculation. The number of elements set determines the overall type of correlated calculation to be performed. The
default is to use the unadulterated SCF reference wavefunction.
For example, to perform a spin-unrestricted calculation (two elements) using the alpha spin orbitals (spin case 1)
from the reference for both of the correlated reference spin cases, the SET directive would be as follows,
The SCF calculation to produce the reference wavefunction could be either RHF or UHF in this case.
The SET directive for a similar case, but this time using the beta-spin SCF orbitals for both correlated spin cases,
is as follows,
The SET directive for a spin-unrestricted calculation with the spins flipped from the original SCF reference wave-
function is as follows,
The user can control the size of each batch in the transformation and energy evaluation in the MP2 calculation, and
consequently the memory requirements and number of passes required. This is done using two SET directives of the
following form,
The default is for the code to determine the batch size based on the available memory. Should there be problems
with the program-determined batch sizes, these variables allow the user to override them. The program will always
use the smaller of the user’s value of these entries and the internally computed batch size.
The transformation batch size computed in the code is the number of occupied orbitals in the (occ vir| f it) three-
center integrals to be produced at a time. If this entry is less than the number of occupied orbitals in the system, the
transformation will require multiple passes through the two-electron integrals. The memory requirements of this stage
are two global arrays of dimension < batchsize > × vir × f it with the “fit” dimension distributed across all processors
(on shell-block boundaries). The compromise here is memory space versus multiple integral evaluations.
The energy evaluation batch sizes are computed in the code from the number of occupied orbitals in the two sets of
three-center integrals to be multiplied together to produce a matrix of approximate four-center integrals. Two blocks
of integrals of dimension (< batchisize > × vir) and (< batch jsize > × vir) by fit are read in from disk and multiplied
together to produce < batchisize >< batch jsize > vir2 approximate integrals. The compromise here is performance
of the distributed matrix multiplication (which requires large matrices) versus memory space.
16.10. ONE-ELECTRON PROPERTIES AND NATURAL ORBITALS 181
The user must choose a strategy for the memory allocation in the energy evaluation phase of the RI-MP2 calculation,
either by minimizing the amount of I/O, or minimizing the amount of computation. This can be accomplished using a
SET directive of the form,
A value of I entered for the string mem_opt means that a strategy to minimize I/O will be employed. A value of
C tells the code to use a strategy that minimizes computation.
When the option to minimize I/O is selected, the block sizes are made as large as possible so that the total number
of passes through the integral files is as small as possible. When the option to minimize computation is selected, the
blocks are chosen as close to square as possible so that permutational symmetry in the energy evaluation can be used
most effectively.
For most applications, the code will be able to size the blocks without help from the user. Therefore, it is unlikely
that users will have any reason to specify values for these entries except when doing very particular performance
measurements.
The size of xf3ci:AO 1 batch size is the most important of the three, in terms of the effect on performance.
Local memory usage in the first two steps of the transformation is controlled in the RI-MP2 calculation using the
following SET directives,
The size of the local arrays determines the sizes of the two matrix multiplications. These entries set limits on the
size of blocks to be used in each index. The listing above is in order of importance of the parameters to performance,
with xf3ci:AO 1 batch size being most important.
Note that these entries are only upper bounds and that the program will size the blocks according to what it
determines as the best usage of the available local memory. The absolute maximum for a block size is the number of
functions in the AO basis, or the number of fitting basis functions on a node. The absolute minimum value for block
size is the size of the largest shell in the appropriate basis. Batch size entries specified for max that are larger than
these limits are automatically reset to an appropriate value.
Note that the MP2 linear response density matrix is not necessarily positive definite so it is not unusual to see a few
small negative natural orbital occupation numbers.
Chapter 17
Multiconfiguration SCF
The NWChem multiconfiguration SCF (MCSCF) module can currently perform complete active space SCF (CASSCF)
calculations with at most 20 active orbitals and about 500 basis functions. It is planned to extend it to handle 1000+
basis functions.
MCSCF
STATE <string state>
ACTIVE <integer nactive>
ACTELEC <integer nactelec>
MULTIPLICITY <integer multiplicity>
[SYMMETRY <integer symmetry default 1>]
[VECTORS [[input] <string input_file default $file_prefix$.movecs>]
[swap <integer vec1 vec2> ...] \
[output <string output_file default input_file>] \
[lock]
[HESSIAN (exact||onel)]
[MAXITER <integer maxiter default 20>]
[THRESH <real thresh default 1.0e-4>]
[TOL2E <real tol2e default 1.0e-9>]
[LEVEL <real shift default 0.1d0>]
END
Note that the ACTIVE, ACTELEC, and MULTIPLICITY directives are required. The symmetry and multiplicity may
alternatively be entered using the STATE directive.
active 10
The input molecular orbitals (see the vectors directive, Sections 17.6 and 10.5) must be arranged in order
183
184 CHAPTER 17. MULTICONFIGURATION SCF
3. unoccupied orbitals.
The number of electrons in the CASSCF active space must be specified using the the ACTELEC directive. An error is
reported if the number of active electrons and the multiplicity are inconsistent.
The number of closed shells is determined by subtracting the number of active electrons from the total number of
electrons (which in turn is derived from the sum of the nuclear charges minus the total system charge).
17.3 MULTIPLICITY
The spin multiplicity must be specified and is enforced by projection of the determinant wavefunction.
E.g., to obtain a triplet state
multiplicity 3
This species the irreducible representation of the wavefunction as an integer in the range 1—8 using the same num-
bering of representations as output by the SCF program. Note that only Abelian point groups are supported.
E.g., to specify a B1 state when using the C2v group
symmetry 3
The electronic state (spatial symmetry and multiplicity) may alternatively be specified using the conventional notation
for an electronic state, such as 3 B2 for a triplet state of B2 symmetry. This would be accomplished with the input
state 3b2
which is equivalent to
symmetry 4
multiplicity 3
17.6. VECTORS — INPUT/OUTPUT OF MO VECTORS 185
• Doubly occupied and unoccupied orbitals diagonalize the corresponding blocks of an effective Fock operator.
Note that in the case of degenerate orbital energies this does not fully determine the orbtials.
• Active-space orbitals are chosen as natural orbitals by diagonalization of the active space 1-particle density
matrix. Note that in the case of degenerate occupations that this does not fully determine the orbitals.
hessian onel
level 0.5
Selected CI
The selected CI module is integrated into NWChem but as yet no input module has been written. The input thus
consists of setting the appropriate variables in the database.
It is assumed that an initial SCF/MCSCF calculation has completed, and that MO vectors are available. These will
be used to perform a four-index transformation, if this has not already been performed.
18.1 Background
This is a general spin-adapted, configuration-driven CI program which can perform arbitrary CI calculations, the only
restriction being that all spin functions are present for each orbital occupation. CI wavefunctions may be specified
using a simple configuration generation program, but the prime usage is intended to be in combination with pertur-
bation correction and selection of new configurations. The second-order correction (Epstein-Nesbet) to the CI energy
may be computed, and at the same time configurations that interact greater than a certain threshold with the current
CI wavefunction may be chosen for inclusion in subsequent calculations. By repeating this process (typically twice is
adequate) with the same threshold until no new configurations are added, the CI expansion may be made consistent
with the selection threshold, enabling tentative extrapolation to the full-CI limit.
A typical sequence of calculations is as follows:
4. Compute the perturbation correction and select additional configurations that interact greater than the current
threshold.
6. Lower the threshold (a factor of 10 is common) and repeat steps 3, 4, and 5. The first pass through step 4 will
yield the approximately self-consistent CI and CI+PT energies from the previous selection threshold.
To illustrate this, below is some abbreviated output from a calculation on water in an augmented cc-PVDZ basis
set with one frozen core orbital. The SCF was converged to high precision in C2v symmetry with the following input
187
188 CHAPTER 18. SELECTED CI
start h2o
geometry; symmetry c2v
O 0 0 0; H 0 1.43042809 -1.10715266
end
basis
H library aug-cc-pvdz; O library aug-cc-pvdz
end
task scf
scf; thresh 1d-8; end
The following input restarts from the SCF to perform a sequence of selected CI calculations with the specified
tolerances, starting with the SCF reference.
restart h2o
set fourindex:occ_frozen 1
set selci:mode select
set "selci:selection thresholds" \
0.001 0.001 0.0001 0.0001 0.00001 0.00001 0.000001
task selci
Table 18.1 summarizes the output from each of the major computational steps that were performed.
CI
Step Description dimension Energy
18.2 Files
Currently, no direct control is provided over filenames. All files are prefixed with the standard file-prefix, and any files
generated by all nodes are also postfixed with the processor number. Thus, for example the molecular integrals file,
18.3. CONFIGURATION GENERATION 189
used only by process zero, might be called h2o.moints whereas the off-diagonal Hamiltonian matrix element file
used by process number eight would be called h2o.hamil.8.
ciconf — the CI configuration file, which holds information about the current CI expansion, indexing vectors, etc.
This is the most important file and is required for all restarts. Note that the CI configuration generator is only
run if this file does not exist. Referenced only by process zero.
moints — the molecular integrals, generated by the four-index transformation. As noted above these must currently
be manually deleted, or the database entry selci:moints:force set, to force regeneration. Referenced
only by process zero.
civecs — the CI vectors. Referenced only by process zero.
wmatrx — temporary file used to hold coupling coefficients. Deleted at calculation end. Referenced only by process
zero.
rtname, roname — restart information for the PT selection. Should be automatically deleted if no restart is
necessary. Referenced only by process zero.
hamdg — diagonal elements of the Hamiltonian. Deleted at calculation end. Referenced only by process zero.
hamil — off-diagonal Hamiltonian matrix elements. All processes generate a file containing a subset of these
elements. These files can become very large. Deleted at calculation end.
ns (socc(i),i=1,ns) (docc(i),i=1,nd)
where ns specifies the number of singly occupied orbitals, socc() is the list of singly occupied orbitals, and docc()
is the list of doubly occupied orbitals (the number of doubly occupied orbitals, nd, is inferred from ns and the total
number of electrons). All occupations may be strung together and inserted into the database as a single integer array
with name "selci:conf". For example, the input
set "selci:conf" \
0 1 2 3 4 \
0 1 2 3 27 \
190 CHAPTER 18. SELECTED CI
0 1 3 4 19 \
2 11 19 1 3 4 \
2 8 27 1 2 3 \
0 1 2 4 25 \
4 3 4 25 27 1 2 \
4 2 3 19 20 1 4 \
4 2 4 20 23 1 3
The optional formatting of the input is just to make this arcane notation easier to read. Relatively few configurations
can be currently specified in this fashion because of the input line limit of 1024 characters.
Up to 10 sets of creation-annihilation operator pairs may be specified, each set containing up to 255 pairs. This suffices
to specify complete active spaces with up to ten electrons.
The number of sets is specified as follows,
set selci:ngen 4
which indicates that there will be four sets. Each set is then specified as a separate integer array
In the absence of friendly, input note that the names "selci:refgen n" must be formatted with n in I2 format.
Each set specifies a list of creation-annihilation operator pairs (in that order). So for instance, in the above example
each set is the same and causes the excitations
If orbitals 3 and 4 were initially doubly occupied, and orbitals 5 and 6 initially unoccupied, then the application of this
set of operators four times in succession is sufficient to generate the four electron in four orbital complete active space.
The precise sequence in which operators are applied is
4. apply the operator to the configuration, if the result is new add it to the new list
By default no excitation is applied to the reference configurations. If, for instance, you wanted to generate a single
excitation CI space from the current configuration list, specify
set selci:exci 1
Any excitation level may be applied, but since the list of configurations is explicitly generated, as is the CI Hamiltonian
matrix, you will run out of disk space if you attempt to use more than a few tens of thousands of configurations.
By default, only one root is generated in the CI diagonalization or perturbation selection. The following requests that
2 roots be generated
set selci:nroot 2
There is no imposed upper limit. If many roots are required, then, to minimize root skipping problems, it helps to
perform an initial approximate diagonalization with several more roots than required, and then resetting this parameter
once satisfied that the desired states are obtained.
By default, the CI wavefunctions are converged to a residual norm of 10−6 which provides similar accuracy in the
perturbation corrections to the energy, and much higher accuracy in the CI eigenvalues. This may be adjusted with
the example setting much lower precision, appropriate for the approximate diagonalization discussed in the preceding
section.
192 CHAPTER 18. SELECTED CI
18.7 Mode
By default the program runs in "ci+davids" mode and just determines the CI eigenvectors/values in the current
configuration space. To perform a selected-CI with perturbation correction use the following
• all unique two-electron integrals in the MO basis that are non-zero by symmetry, and
• all CI information, including the CI vectors.
These large data structures are allocated on the local stack. A fatal error will result if insufficient memory is available.
to accomplish this.
optimizing a geometry the reference list must be kept fixed to keep the potential energy surface continuous and well
defined. To do this specify
The NWChem coupled cluster energy module is primarily the work of Alistair Rendell and Rika Kobayashi, with
contributions from David Bernholdt.
The coupled cluster code can perform calculations with full iterative treatment of single and double excitations
and non-iterative inclusion of triple excitation effects. It is presently limited to closed-shell (RHF) references.
Note that symmetry is not used within most of the CCSD(T) code. This can have a profound impact on performance
since the speed-up from symmetry is roughly the square of the number of irreducible representations. In the absence
of symmetry, the performance of this code is competitive with other programs.
The operation of the coupled cluster code is controlled by the input block
CCSD
[MAXITER <integer maxiter default 20>]
[THRESH <real thresh default 10e-6>]
[TOL2E <real tol2e default min(10e-12 , 0.01*$thresh$)>]
[DIISBAS <integer diisbas default 5>]
[FREEZE [[core] (atomic || <integer nfzc default 0>)] \
[virtual <integer nfzv default 0>]]
[IPRT <integer IPRT default 0>]
[PRINT ...]
[NOPRINT ...]
END
Note that the keyword CCSD is used for the input block regardless of the actual level of theory desired (specified with
the TASK directive). The following directives are recognized within the CCSD group.
The maximum number of iterations is set to 20 by default. This should be quite enough for most calculations, although
particularly troublesome cases may require more.
195
196 CHAPTER 19. COUPLED CLUSTER CALCULATIONS
The variable tol2e is used in determining the integral screening threshold for the evaluation of the energy and
related quantities.
CAUTION! At the present time, the tol2e parameter only affects the three- and four-virtual contributions, and
the triples, all of which are done “on the fly”. The transformations used for the other parts of the code currently have
a hard-wired threshold of 10−12 . The default for tol2e is set to match this, and since user input can only make the
threshold smaller, setting this parameter can only make calculations take longer.
This directive is idential to that used in the MP2 module, Section 16.1.
The coupled cluster module supports the standard NWChem print control keywords, although very little in the code is
actually hooked into this mechanism yet.
Item Print Level Description
“reference” high Wavefunction information
“guess pair energies” debug MP2 pair energies
“byproduct energies” default Intermediate energies
“term debugging switches” debug Switches for individual terms
• CCSD+T(CCSD) – The fourth order triples contribution computed with converged singles and doubles ampli-
tudes
The calculation is invoked using the the TASK directive, so to perform a CCSD+T(CCSD) calculation, for example,
the input file should include the directive
TASK CCSD+T(CCSD)
Lower-level results which come as by-products (such as MP3/MP4) of the requested calculation are generally
also printed in the output file and stored on the run-time database, but the method specified in the TASK directive is
considered the primary result.
The information in this section is intended for use by experts (both with the methodology and with the code), primarily
for debugging and development work. Messing with stuff in listed in this section will probably make your calculation
quantitatively wrong! Consider yourself warned!
The /DEBUG/ common block contains a number of arrays which control the calculation of particular terms in the
program. These are 15-element integer arrays (although from the code only a few elements actually effect anything)
which can be set from the input deck. See the code for details of how the arrays are interpreted.
Printing of this data at run-time is controlled by the "term debugging switches" print option. The values
are checked against the defaults at run-time and a warning is printed to draw attention to the fact that the calculation
does not correspond precisely to the requested method.
198 CHAPTER 19. COUPLED CLUSTER CALCULATIONS
The DRIVER module is one of two drivers (see Section 21 for documentation on STEPPER) to perform a geometry
optimization function on the molecule defined by input using the GEOMETRY directive (see Section 6). Geometry op-
timization is either an energy minimization or a transition state optimization. The algorithm programmed in DRIVER
is a quasi-newton optimization with line searches and approximate energy Hessian updates.
DRIVER is selected by default out of the two available modules to perform geometry optimization. In order to
force use of DRIVER (e.g., because a previous optimization used STEPPER) provide a DRIVER input block (below)
— even an empty block will force use of DRIVER.
Optional input for this module is specified within the compound directive,
DRIVER
(LOOSE || DEFAULT || TIGHT)
GMAX <real value>
GRMS <real value>
XMAX <real value>
XRMS <real value>
CLEAR
REDOAUTOZ
199
200 CHAPTER 20. GEOMETRY OPTIMIZATION WITH DRIVER
PRINT ...
END
On each optimization step a line search is performed. To speed up calculations (up to two times), it may be
beneficial to turn off the line search using following directive:
set driver:linopt 0
In version 3.3 Gaussian-style convergence criteria have been adopted. The defaults may be used, or the directives
LOOSE, DEFAULT, or TIGHT specified to use standard sets of values, or the individual criteria adjusted. All criteria
are in atomic units. GMAX and GRMS control the maximum and root mean square gradient in the coordinates being
used (Z-matrix, redundant internals, or Cartesian). XMAX and XRMS control the maximum and root mean square of
the Cartesian step.
Note that GMAX and GRMS used for convergence of geometry may significantly vary in different coordinate
systems such as Z-matrix, redundant internals, or Cartesian. The coordinate system is defined in the input file (default
is Z-matrix). Therefore the choice of coordinate system may slightly affect converged energy. Although in most cases
XMAX and XRMS are last to converge which are always done in Cartesian coordinates, which insures convergence
to the same geometry in different coordinate systems.
The old criterion may be recovered with the input
In performing a line search the optimizer must know the precision of the energy (this has nothing to do with
convergence criteria). The default value of 1e-7 should be adjusted if less, or more, precision is available. Note that
the default EPREC for DFT calculations is 5e-6 instead of 1e-7.
A fixed trust radius (trust) is used to control the step during minimizations, and is also used for modes being
minimized during saddle-point searches. It defaults to 0.3 for minimizations and 0.1 for saddle-point searches. The
parameter sadstp is the trust radius used for the mode being maximized during a saddle-point search and defaults to
0.1.
By default at most 20 geometry optimization steps will be taken, but this may be modified with this directive.
By default Driver reuses Hessian information from a previous optimization, and, to facilitate a restart also stores
which mode is being followed for a saddle-point search. This option deletes all restart data.
Deletes Hessian data and regenerates internal coordinates at the current geometry. Useful if there has been a large
change in the geometry that has rendered the current set of coordinates invalid or non-optimal.
• 0 = Default ... use restart data if available, otherwise use diagonal guess.
• 2 = Use restart data if available, otherwise transform Cartesian Hessian from previous frequency calculation.
In addition, the diagonal elements of the initial Hessian for internal coordinates may be scaled using separate
factors for bonds, angles and torsions with the following
These values typically give a two-fold speedup over unit values, based on about 100 test cases up to 15 atoms using
3-21g and 6-31g* SCF. However, if doing many optimizations on physically similar systems it may be worth fine
tuning these parameters.
Finally, the entire Hessian from any source may be scaled by a factor using the directive
It might be of utility, for instance, when computing an initial Hessian using SCF to start a large MP2 optimization.
The SCF vibrational modes are expected to be stiffer than the MP2, so scaling the initial Hessian by a number less
than one might be beneficial.
When searching for a transition state the program, by default, will take an initial step uphill and then do mode
following using a fuzzy maximum overlap (the lowest eigen-mode with an overlap with the previous search direction
of 0.7 times the maximum overlap is selected). Once a negative eigen-value is found, that mode is followed regardless
of overlap.
The initial uphill step is appropriate if the gradient points roughly in the direction of the saddle point, such as
might be the case if a constrained optimization was performed at the starting geometry. Alternatively, the initial search
direction may be chosen to be along a specific internal variable (using the directive VARDIR) or along a specific eigen-
mode (using MODDIR). Following a variable might be valuable if the initial gradient is either very small or very large.
Note that the eigen-modes in the optimizer have next-to-nothing to do with the output from a frequency calculation.
You can examine the eigen-modes used by the optimizer with
The selection of the first negative mode is usually a good choice if the search is started in the vicinity of the
transition state and the initial search direction is satisfactory. However, sometimes the first negative mode might not
be the one of interest (e.g., transverse to the reaction direction). If NOFIRSTNEG is specified, the code will not take
the first negative direction and will continue doing mode-following until that mode goes negative.
The XYZ directive causes the geometry at each step (but not intermediate points of a line search) to be output into
separate files in the permanent directory in XYZ format. The optional string will prefix the filename. The NOXYZ
directive turns this off.
For example, the input
The STEPPER module performs a search for critical points on the potential energy surface of the molecule defined
by input using the GEOMETRY directive (see Section 6). Since STEPPER is not the primary geometry optimization
module in NWChem the compound directive is required; the DRIVER module is the default (see Section 20). Input
for this module is specified within the compound directive,
STEPPER
...
END
The presence of the STEPPER compound directive automatically turns off the default geometry optimization tool
DRIVER. Input specified for the STEPPER module must appear in the input file after the GEOMETRY directive, since
it must know the number of atoms that are to be used in the geometry optimization. In the current version of NWChem,
STEPPER can be used only with geometries that are defined in Cartesian coordinates. STEPPER removes translational
and rotational components before determining the step direction (5 components for linear systems and 6 for others)
using a standard Eckart algorithm. The default initial guess nuclear Hessian is the identity matrix.
The default in STEPPER is to minimize the energy as a function of the geometry with a maximum of 20 geometry
optimization iterations. When this is the desired calculation, no input is required other than the STEPPER compound
directive. However, the user also has the option of defining different tasks for the STEPPER module, and can vary the
number of iterations and the convergence criteria from the default values. The input for these options is described in
the following sections.
MIN
STEPPER can also be used to find the transition state by following the lowest eigenvector of the nuclear Hessian.
This is usually invoked by using the saddle keyword on the TASK directive (Section 5.10), but it may also be
selected by specifying the directive
TS
205
206 CHAPTER 21. GEOMETRY OPTIMIZATION WITH STEPPER
The keyword TRACK tells STEPPER to track the eigenvector corresponding to the integer value of <nmode> during a
transition state walk. (Note: this input is invalid for a minimization walk since following a specific eigenvector will not
necessarily give the desired local minimum.) The step is constructed to go up in energy along the nmode eigenvector
and down in all other degrees of freedom.
The value specified for the integer <maxiter> defines the maximum number of geometry optimization steps. The
geometry optimization will restart automatically.
The larger the value specified for the variable radius, the larger the steps that can be taken by STEPPER.
Experience has shown that for larger systems (i.e., those with 20 or more atoms), a value of 0.5, or greater, usually
should be entered for <radius>.
The keyword CONVGG allows the user to specify the convergence tolerance for the gradient norm for all degrees of
freedom. The input line is of the following form,
The entry for the real variable <convgg> should be approximately equal to the square root of the energy convergence
tolerance.
The energy convergence tolerance is the convergence criterion for the energy difference in the geometry optimiza-
tion in STEPPER. It can be specified by input using a line of the following form,
1 If you have done a geometry optimization and hessian generation in the same input deck using a small basis set, you must make sure you delete
the name.stpr41 file since stepper will by default use that hessian and not the one in the name.hess file
208 CHAPTER 21. GEOMETRY OPTIMIZATION WITH STEPPER
Chapter 22
The constraints directive allows the user to specify which constraints should be imposed on the system during the
geometry optimization. Currently such constraints are limited to fixed atom positions and harmonic restraints (springs)
on the distance between the two atoms. The general form of constraints block is presented below:
• name – optional keyword that associates a name with a given set of constraints. Any unnamed set of constraints
will be given a name ”default” and will be automatically loaded prior to a calculation. Any constraints with the
name other than ”default” will have to be loaded manually using SET directive. For example,
CONSTRAINTS one
spring bond 1 3 5.0 1.3
fix atom 1
END
• clear – destroys any prior constraint information. This may be useful when the same constraints have to be
redefined or completely removed from the runtime database.
• enable||disable – enables or disables particular set of constraints without actually removing the information
from the runtime database.
• fix atom – fixes atom positions during geometry optimization. This directive requires an integer list that specifies
which atoms are to be fixed. This directive can be repeated within a given constraints block. To illustrate the use
209
210 CHAPTER 22. CONSTRAINTS FOR GEOMETRY OPTIMIZATION
"fix atom" directive let us consider a situation where we would like to fix atoms 1, 3, 4, 5, 6 while performing
an optimization on some hypothetical system. There are actually several ways to enter this particular constraint.
There is a straightforward option which requires the most typing
constraints
fix atom 1 3 4 5 6
end
constraints
fix atom 1 3:6
end
constraints
fix atom 1
fix atom 3:6
end
• spring bond <i jkr0 > – places a spring with a spring constant k and equilibrium length r0 between atoms i and j
(all in atomic units). Please note that this type of constraint adds an additional term to the total energy expression
1
E = Etotal + k(ri j − r0 )2
2
This additional term forces the distance between atoms i and j to be in the vicinity of r0 but never exactly that.
In general the spring energy term will always have some nonzero residual value, and this has to be accounted
for when comparing total energies. The "spring bond" directive can be repeated within a given constraints
block. If the spring between the same pair of atoms is defined more than once, it will be replaced by the latest
specification in the order it appears in the input block.
Chapter 23
ONIOM is the hybrid method of Morokuma and co-workers that enables different levels of theory to be applied to
different parts of a molecule/system and combined to produce a consistent energy expression. The objective is to
perform a high-level calculation on just a small part of the system and to include the effects of the remainder at lower
levels of theory, with the end result being of similar accuracy to a high-level calculation on the full system.
1. M. Svensson, S. Humbel, R.D.J. Froese, T. Mastubara, S. Sieber, and K. Morokuma, J. Phys. Chem., 100, 19357
(1996).
2. S. Dapprich, I. Komaromi, K.S. Byun, K. Morokuma, and M.J. Frisch, J. Mol. Struct. (Theochem), 461-462, 1
(1999).
3. R.D.J. Froese and K. Morokuma in “Encylopedia of Computational Chemistry,” volume 2, pp.1244-1257, (ed.
P. von Rague Schleyer, John Wiley and Sons, Chichester, Sussex, 1998).
The NWChem ONIOM module implements two- and three-layer ONIOM models for use in energy, gradient,
geometry optimization, and vibrational frequency calculations with any of the pure quantum mechanical methods
within NWChem. At the present time, it is not possible to perform ONIOM calculations with either solvation models
or classical force fields. Nor is it yet possible to compute properties except as derivatives of the total energy.
Using the terminology of Morokuma et al., the full molecular geometry including all atoms is referred to as the
“real” geometry and it is treated using a “low”-level of theory. A subset of these atoms (referred to as the “model”
geometry) are treated using both the “low”-level and a “high”-level of theory. A three-layer model also introduces an
“intermediate” model geometry and a “medium” level of theory.
The two-layer model requires a high and low level of theory and a real and model molecular geometry. The energy
at the high-level of theory for the real geometry is estimated as
The three-layer model requires high, medium and low levels of theory, and real, intermediate and model geometries
and the corresponding energy estimate is
When does ONIOM work well? The approximation for a two-layer model will be good if
211
212 CHAPTER 23. HYBRID CALCULATIONS WITH ONIOM
• the model system includes the interactions that dominate the energy difference being computed and the high-
level of theory describes these to the required precision, and
• the interactions between the model and the rest of the real system (substitution effects) are described to sufficient
accuracy at the lower level of theory.
ONIOM is used to compute energy differences and the absolute energies are not all that meaningful even though they
are well defined. Due to cancellation of errors, ONIOM actually works better than you might expect, but a poorly
designed calculation can yield very bad results. Please read and heed the caution at the end of the article by Dapprich
et al.
The input options are as follows
ONIOM
HIGH <string theory> [basis <string basis default "ao basis">] \
[ecp <string ecp>] [input <string input>]
[MEDIUM <string theory> [basis <string basis default "ao basis">] \
[ecp <string ecp>] [input <string input>]]
LOW <string theory> [basis <string basis default "ao basis">] \
[ecp <string ecp>] [input <string input>]
MODEL <integer natoms> [charge <double charge>] \
[<integer i1 j1> <real g1> [<string tag1>] ...]
[INTER <integer natoms> [charge <double charge>] \
[<integer i1 j1> <real g1> [<string tag1>] ...]]
[VECTORS [low-real <string mofile>] [low-model <string mofile>] \
[high-model <string mofile>] [medium-model <string mofile]\
[medium-inter <string mofile>] [low-inter <string mofile>]]
[PRINT ...]
[NOPRINT ...]
END
The geometry and total charge of the full or real system should be specified as normal using the geometry directive
(see Section 6). If Nmodel of the atoms are to be included in the model system, then these should be specified first
in the geometry. Similarly, in a three-layer calculation, if there are Ninter atoms to be included in the intermediate
system, then these should also be arranged together at the beginning of the geometry. The implict assumption is that
the model system is a subset of the intermediate system which is a subset of the real system. The number of atoms to
be included in the model and intemediate systems are specified using the MODEL and INTER directives. Optionally,
the total charge of the model and intermediate systems may be adjusted. The default is that all three systems have the
same total charge.
Example 1. A two-layer calculation on K + (H2 O) taking the potassium ion as the model system. Note that no
bonds are broken so no link atoms are introduced. The real geometry would be specified with potassium (the model)
first.
23.1. REAL, MODEL AND INTERMEDIATE GEOMETRIES 213
geometry autosym
K 0 0.00 1.37
O 0 0.00 -1.07
H 0 -0.76 -1.68
H 0 0.76 -1.68
end
and the following directive in the ONIOM input block indicates that one atom (implicitly the first in the geometry) is
in the model system
model 1
Link atoms for bonds spanning two regions are automatically generated from the bond information. The additional
parameters on the MODEL and INTER directives describe the broken bonds including scale factors for placement of
the link atom and, optionally, the type of link atom. The type of link atom defaults to hydrogen, but any type may
be specified (actually here you are specifying a geometry tag which is used to associate a geometrical center with an
atom type and basis sets, etc.. See section 6.3). For each broken bond specify the numbers of the two atoms (i and j),
the scale factor (g) and optionally the tag of the link atom. Link atoms are placed along the vector connecting the the
first to the second atom of the bond according to the equation
Rlink = (1 − g)R1 + g ∗ R2
where g is the scale factor. If the scale factor is one, then the link atom is placed where the second atom was. More
usually, the scale factor is less than one, in which case the link atom is placed between the original two atoms. The
scale factor should be chosen so that the link atom (usually hydrogen) is placed near its equilibrium bond length from
the model atom. E.g., when breaking a single carbon-carbon bond (typical length 1.528 Angstrøms) using a hydrogen
link atom we will want a carbon-hydrogen bond length of about 1.084 Angstrøms, so the scale factor should be chosen
as 1.084/1.528 ≈ 0.709.
Example 2. A calculation on acetaldehyde (H3C − CHO) using aldehyde (H − CHO) as the model system. The
covalent bond between the two carbon atoms is broken and a link atom must be introduced to replace the methyl group.
The link atom is automatically generated — all you need to do is specify the atoms in the model system that are also
in the real system (here CHO) and the broken bonds. Here is the geometry of acetaldehyde with the CHO of aldehyde
first
geometry
C -0.383 0.288 0.021
H -1.425 0.381 0.376
O 0.259 1.263 -0.321
There are three atoms (the first three) of the real geometry included in the model geometry, and we are breaking the
bond between atoms 1 and 7, replacing atom 7 with a hydrogen link atom. This is all accomplished by the directive
214 CHAPTER 23. HYBRID CALCULATIONS WITH ONIOM
model 3 1 7 0.709 H
Since the default link atom is hydrogen there is actually no need to specify the “H”.
See also Section 23.6.3 for a more complex example.
The link atoms are appended to the atoms of the model or intermediate systems in the order that the broken bonds are
specified in the input. This is of importance only if manually constructing an initial guess.
The basis name on the theory directive (high, medium, or low) is that specified on a basis set directive (see Section 7)
and not the name of a standard basis in the library. If not specified, the basis set for the high-level theory defaults to the
standard "ao basis". That for the medium level defaults to the high-level basis, and the low-level basis defaults
to the medium-level basis. Other wavefunction parameters are obtained from the standard wavefunction input blocks.
See 23.6.2 for an example.
If an effective core potential is specified in the usual fashion (see Section 8) outside of the ONIOM input then this will
be used in all calculations. If an alternative ECP name (the name specified on the ECP directive in the same manner as
done for basis sets) is specified on one of the theory directives, then this ECP will be used in preference for that level
of theory. See Section 23.6.2 for sample input.
For many purposes, the ability to specify the theory, basis and effective core potential is adequate. All of the options
for each theory are determined from their independent input blocks. However, if the same theory (e.g., DFT) is to
be used with different options for the ONIOM theoretical models, then the general input strings must be used. These
strings are processed as NWChem input each time the theoretical model is invoked. The strings may contain any
NWChem input, except for options pertaining to ONIOM and the task directive. The intent that the strings be used
just to control the options pertaining to the theory being used.
A word of caution. Be sure to check that the options are producing the desired results. Since the NWChem
database is persistent and the ONIOM calculations happen in an undefined order, the input strings should fully define
the calculation you wish to have happen.
For instance, if the high model is DFT/B3LYP/6-311g** and the low model is DFT/LDA/3-21g, the ONIOM input
might look like this
23.3. USE OF SYMMETRY 215
oniom
model 3
low dft basis 3-21g input "dft\; xc\; end"
high dft basis 6-311g** input "dft\; xc b3lyp\; end"
end
The empty XC directive restores the default LDA exchange-correlation option (see Section 11.3). Note that semi-
colons and other quotation marks inside the input string must be preceded by a backslash to avoid special interpretation.
See Section 23.6.4 for another example.
• low-real — ".lrmos"
• low-inter — ".limos"
• low-model — ".lmmos"
• medium-inter — ".mimos"
• medium-model — ".mmmos"
• high-model — ".hmmos"
Each calculation will utilize the appropriate vectors which is more efficient during geometry optimizations and fre-
quency calculations, and is also useful for the initial calculation. In the absence of existing MO vectors files, the
default atomic guess is used (see Section 10.5).
If special measures must be taken to converge the initial SCF, DFT or MCSCF calculation for one or more of the
systems, then initial vectors may be saved in a file with the default name, or another name may be specified using the
VECTORS directive. Note that subsequent vectors (e.g., from a geometry optimization) will be written back to this
file, so take a copy if you wish to preserve it. To generate the initial guess for the model or intermediate systems it is
necessary to generate the geometries which is most readily done, if there are link atoms, by just running NWChem on
the input for the ONIOM calculation on your workstation. It will print these geometries before starting any calculations
which you can then terminate.
E.g., in a calculation on Fe(III) surrounded by some ligands, it is hard to converge the full (real) system from the
atomic guess so as to obtain a d 5 configuration for the iron atom since the d orbitals are often nominally lower in
energy than some of the ligand orbitals. The most effective mechanism is to converge the isolated Fe(III) and then to
216 CHAPTER 23. HYBRID CALCULATIONS WITH ONIOM
use the fragment guess (see Section 10.5.1) as a starting guess for the real system. The resulting converged molecular
orbitals can be saved either with the default name (as described above in this section), in which case no additional
input is necessary. If an alternative name is desired, then the VECTORS directive may be used as follows
23.5 Restarting
Restart of ONIOM calculations does not currently work as smoothly as we would like. For geometry optimizations
that terminated gracefully by running out of iterations, the restart will work as normal. Otherwise, specify in the input
of the restart job the last geometry of the optimization. The Hessian information will be reused and the calculation
should proceed losing at most the cost of one ONIOM gradient evaluation. For energy or frequency calculations,
restart may not currently be possible.
23.6 Examples
A simple two-layer model changing just the wavefunction with one link atom.
This reproduces the two-layer ONIOM (MP2:HF) result from Dapprich et al. for the reaction R − CH3 = R −
CH2 + H with R = CH3 using CH4 as the model . The geometries of R −CH3 and R −CH2 are optimized at the DFT-
B3LYP/6-311++G** level of theory, and then ONIOM is used to compute the binding energy using UMP2 for the
model system and HF for the real system. The results, including MP2 calculations on the full system for comparison,
are as given in Table 23.1
Table 23.1: Energies for ONIOM example 1, hydrocarbon bond energy using MP2:HF two-layer model.
The following input first performs a calculation on CH3 −CH2 , and then on CH3 −CH3 . Note that in the second
calculation we cannot use the full symmetry since we are breaking the C-C bond in forming the model system (the
non-equivalence of the methyl groups is perhaps more apparent if we write R −CH3 ).
start
basis spherical
H library 6-311++G**; C library 6-311++G**
end
geometry autosym
23.6. EXAMPLES 217
oniom
high mp2
low scf
model 3 3 7 0.724
end
task oniom
oniom
high mp2
low scf
model 4 4 8 0.724
end
task oniom
A two-layer model including modification of theory, basis, ECP and total charge and no link atoms.
This input reproduces the ONIOM optimization and vibrational frequency calculation of Rh(CO)2Cp of Dapprich
et al. The model system is Rh(CO)+2 . The low theory is the Gaussian LANL2MB model (Hay-Wadt n+1 ECP with
minimal basis on Rh, STO-3G on others) with SCF. The high theory is the Gaussian LANL2DZ model (another Hay-
218 CHAPTER 23. HYBRID CALCULATIONS WITH ONIOM
Wadt ECP with a DZ basis set on Rh, Dunning split valence on the other atoms) with DFT/B3LYP. Note that different
names should be used for the basis set and ECP since the same mechanism is used to store them in the database.
start
ecp LANL2DZ_ECP
rh library LANL2DZ_ECP
end
ecp Hay-Wadt_MB_(n+1)_ECP
rh library Hay-Wadt_MB_(n+1)_ECP
end
charge 0
geometry autosym
rh 0.00445705 -0.15119674 0.00000000
c -0.01380554 -1.45254070 1.35171818
c -0.01380554 -1.45254070 -1.35171818
o -0.01805883 -2.26420212 2.20818932
o -0.01805883 -2.26420212 -2.20818932
c 1.23209566 1.89314720 0.00000000
c 0.37739392 1.84262319 -1.15286640
c -1.01479160 1.93086461 -0.70666350
c -1.01479160 1.93086461 0.70666350
c 0.37739392 1.84262319 1.15286640
h 2.31251453 1.89903673 0.00000000
h 0.70378132 1.86131979 -2.18414218
h -1.88154273 1.96919306 -1.35203550
h -1.88154273 1.96919306 1.35203550
h 0.70378132 1.86131979 2.18414218
end
dft; grid fine; convergence gradient 1e-6 density 1e-6; xc b3lyp; end
23.6. EXAMPLES 219
oniom
low scf basis Hay-Wadt_MB_(n+1) ecp Hay-Wadt_MB_(n+1)_ECP
high dft basis LANL2DZ ecp LANL2DZ_ECP
model 5 charge 1
print low
end
A three layer example combining CCSD(T), and MP2 with two different quality basis sets, and using multiple link
atoms.
The full system is tetra-dimethyl-amino-ethylene (TAME) or (N(Me)2)2-C=C-(N(Me)2)2. The intermediate sys-
tem is (NH2)2-C=C-(NH2)2 and H2C=CH2 is the model system. CCSD(T)+aug-cc-pvtz is used for the model region,
MP2+aug-cc-pvtz for the intermediate region, and MP2+aug-cc-pvdz for everything.
In the real geometry the first two atoms (C, C) are the model system (link atoms will be added automatically). The
first six atoms (C, C, N, N, N, N) describe the intermediate system (again with link atoms to be added automatically).
The atoms have been numbered using comments to make the bonding input easier to generate.
To make the model system, four C-N bonds are broken between the ethylene fragment and the dimethyl-amino
groups and replaced with C-H bonds. To make the intermediate system, eight C-N bonds are broken between the
nitrogens and the methyl groups and replaced with N-H bonds. The scaling factor could be chosen differently for each
of the bonds.
start
geometry
C 0.40337795 -0.17516305 -0.51505208 # 1
C -0.40328664 0.17555927 0.51466084 # 2
N 1.87154979 -0.17516305 -0.51505208 # 3
N -0.18694782 -0.60488524 -1.79258692 # 4
N 0.18692927 0.60488318 1.79247594 # 5
N -1.87148219 0.17564718 0.51496494 # 6
C 2.46636552 1.18039452 -0.51505208 # 7
C 2.48067731 -1.10425355 0.46161675 # 8
C -2.46642715 -1.17982091 0.51473105 # 9
C -2.48054940 1.10495864 -0.46156202 # 10
C 0.30027136 0.14582197 -2.97072148 # 11
C -0.14245927 -2.07576980 -1.96730852 # 12
C -0.29948109 -0.14689874 2.97021079 # 13
C 0.14140463 2.07558249 1.96815181 # 14
H 0.78955302 2.52533887 1.19760764
H -0.86543435 2.50958894 1.88075113
... and 22 other hydrogen atoms on the methyl groups
end
220 CHAPTER 23. HYBRID CALCULATIONS WITH ONIOM
oniom
high ccsd(t) basis aug-cc-pvtz
medium mp2 basis aug-cc-pvtz
low mp2 basis aug-cc-pvdz
model 2 1 3 0.87 1 4 0.87 2 5 0.87 2 6 0.87
task oniom
1. The semi-colons and quotation marks inside the input string must be quoted with backslash.
2. The low level of theory sets the fitting basis set and the high level of theory unsets it.
start
geometry
symmetry d2h
C 0.71237329 -1.21458940 0.0
C -0.71237329 -1.21458940 0.0
C 0.71237329 1.21458940 0.0
C -0.71237329 1.21458940 0.0
C -1.39414269 0.00000000 0.0
C 1.39414269 0.00000000 0.0
H -2.47680865 0.00000000 0.0
H 2.47680865 0.00000000 0.0
C 1.40340535 -2.48997027 0.0
C -1.40340535 -2.48997027 0.0
C 1.40340535 2.48997027 0.0
C -1.40340535 2.48997027 0.0
C 0.72211503 3.64518615 0.0
23.6. EXAMPLES 221
basis small
h library DZVP_(DFT_Orbital)
c library DZVP_(DFT_Orbital)
end
basis fitting
h library DGauss_A1_DFT_Coulomb_Fitting
c library DGauss_A1_DFT_Coulomb_Fitting
end
basis big
h library TZVP_(DFT_Orbital)
c library TZVP_(DFT_Orbital)
end
oniom
model 8 1 9 0.75 2 10 0.75 3 11 0.75 4 12 0.75
high dft basis big input "unset \"cd basis\"\; dft\; xc b3lyp\; end"
low dft basis small input "set \"cd basis\" fitting\; dft\; xc\; end"
end
task oniom
222 CHAPTER 23. HYBRID CALCULATIONS WITH ONIOM
Chapter 24
Hessians
This section relates to the computation of analytic hessians which are available for open and closed shell SCF, except
ROHF and for closed shell DFT. Analytic hessians are not currently available for SCF or DFT calculations relativistic
all-electron methodologies or for charge fitting with DFT. The current algorithm is fully in-core and does not use
symmetry. This will be changed in the next release.
There is no required input for the Hessian module. This module only impacts the hessian calculation. For options
for calculating the frequencies, please see Section 25, the Vibrational module.
All input for the Hessian Module is optional since the default definitions are usually correct for most purposes. The
generic module input begins with hessian and has the form:
hessian
thresh <real tol default 1d-6>
print ...
profile
end
You may modify the default threshold for the wavefunction. This keyword is identical to THRESH in the SCF, Section
10.7, and the CONVERGENCE gradient in the DFT, Section 11.7. The usual defaults for the convergence of the
wavefunction for single point and gradient calculations is generally not tight enough for analytic hessians. Therefore,
the hessian, by default, tightens these up to 1d-6 and runs an additional energy point if needed. If, during an analytic
hessian calculation, you encounter an error:
223
224 CHAPTER 24. HESSIANS
24.1.2 Profile
The PROFILE keyword provides additional information concerning the computation times of different sections of the
hessian code. Summary information is given about the maximum, minimum and average times that a particular section
of the code took to complete. This is normally only useful for developers.
Vibrational frequencies
The nuclear hessian which is used to compute the vibrational frequencies can be computed by finite difference
for any ab initio wave-function that has analytic gradients or by analytic methods for SCF and DFT (see Sec-
tion 24 for details). The appropriate nuclear hessian generation algorithm is chosen based on the user input when
TASK <theory> frequencies is the task directive.
The vibrational package was integrated from the Utah Messkit and can use any nuclear hessian generated from
the driver routines, finite difference routines or any analytic hessian modules. There is no required input for the
“VIB” package. VIB computes the Infra Red frequencies and intensities1 for the computed nuclear hessian and the
“projected” nuclear hessian. The VIB module projects out the translations and rotations of the nuclear hessian using
the standard Eckart projection algorithm. It also computes the zero point energy for the molecular system based on
the frequencies obtained from the projected hessian.
The default mass of each atom is used unless an alternative mass is provided via the geometry input, (c.f., 6)
or redefined using the vibrational module input. The default mass is the mass of the most abundant isotope of each
element.2 If the abundance was roughly equal, the mass of the isotope with the longest half life was used.
In addition, the vibrational analysis is given at the default standard temperature of 298.15 degrees.
driver routines
2 c.f., "The Elements" by John Emsley, Oxford University Press, (C) 1989, ISBN 0-19-855237-8.
3 The geometry specification at the point where the hessian is computed must be the default “geometry” on the current run-time-data-base for
225
226 CHAPTER 25. VIBRATIONAL FREQUENCIES
end
By default the task <theory> frequencies directive will recompute the hessian. To reuse the previously
computed hessian you need only specify reuse in the module input block. If you have stored the hessian in an
alternate place you may redirect the reuse directive to that file by specifying the path to that file.
reuse /path_to_hessian_file
This will reuse your saved Hessian data but one caveat is that the geometry specification at the point where the hessian
is computed must be the default “geometry” on the current run-time-data-base for the projection to work properly.
You may also modify the mass of a specific center or a group of centers via the input.
To modify the mass of a specific center you can simply use:
mass 3 4.00260324
which will set the mass of center 3 to 4.00260324 AMUs. The lexical index of centers is determined by the geometry
object.
To modify all Hydrogen atoms in a molecule you may use the tag based mechanism:
The mass redefinitions always start with the default masses and change the masses in the order given in the input.
Care must be taken to change the masses properly. For example, if you want all hydrogens to have the mass of
Deuterium and the third hydrogen (which is the 6th atomic center) to have the mass of Tritium you must set the
Deuterium masses first with the tag based mechanism and then set the 6th center’s mass to that of Tritium using the
lexical center index mechanism.
The mass redefinitions are not fully persistent on the run-time-data-base. Each input block that redefines masses
will invalidate the mass definitions of the previous input block. For example,
freq
reuse
mass hydrogen 2.014101779
end
task scf frequencies
freq
reuse
mass oxygen 17.9991603
end
task scf frequencies
will use the new mass for all hydrogens in the first frequency analysis. The mass of the oxygen atoms will be redefined
in the second frequency analysis but the hydrogen atoms will use the default mass. To get a modified oxygen and
hydrogen analysis you would have to use:
25.1. VIBRATIONAL MODULE INPUT 227
freq
reuse
mass hydrogen 2.014101779
end
task scf frequencies
freq
reuse
mass hydrogen 2.014101779
mass oxygen 17.9991603
end
task scf frequencies
The “VIB” module can generate the vibrational analysis at various temperatures other than at standard room tempera-
ture. Either temp or temperature can be used to initiate this command.
To modify the temperature of the computation you can simply use:
At this point, the temperatures are persistant and so the user must "reset" the temperature if the standard behavior
is required after setting the temperatures in a previous “VIB” command, i.e.
temp 1 298.15
25.1.4 Animation
The “VIB” module also can generate mode animation input files in the standard xyz file format for graphics packages
like RasMol or XMol There are scripts to automate this for RasMol in $NWCHEM_TOP/contrib/rasmolmovie.
Each mode will have 20 xyz files generated that cycle from the equilibrium geometry to 5 steps in the positive direction
of the mode vector, back to 5 steps in the negative direction of the mode vector, and finally back to the equilibrium
geometry. By default these files are not generated. To activate this mechanism simply use the following input directive
animate
By default, the step size used is 0.15 a.u. which will give reliable animations for most systems. This can be changed
via the input directive
where <step_size> is the real number that is the magnitude of each step along the eigenvector of each nuclear
hessian mode in atomic units.
228 CHAPTER 25. VIBRATIONAL FREQUENCIES
This example input deck will optimize the geometry for the given basis set, compute the frequencies for H2 O, H2 O at
different temperatures, D2 O, HDO, and TDO.
start h2o
title Water
geometry units au autosym
O 0.00000000 0.00000000 0.00000000
H 0.00000000 1.93042809 -1.10715266
H 0.00000000 -1.93042809 -1.10715266
end
basis noprint
H library sto-3g
O library sto-3g
end
scf; thresh 1e-6; end
driver; tight; end
task scf optimize
freq
reuse; temp 4 298.15 300.0 350.0 400.0
end
task scf freq
freq
reuse; mass H 2.014101779
temp 1 298.15
end
task scf freq
freq
reuse; mass 2 2.014101779
end
task scf freq
freq
reuse; mass 2 2.014101779 ; mass 3 3.01604927
end
task scf freq
Chapter 26
DPLOT
DPLOT
...
END
This directive is used to obtain the plots of various types of electron densities (or orbitals) of the molecule. The
electron density is calculated on a specified set of grid points using the molecular orbitals from SCF or DFT calculation.
The output file is either in MSI Insight II contour format (default) or in the Gaussian Cube format. DPLOT is not
executed until the “task dplot” directive is given. Different sub-directives are described below.
A outputfile is generate in Gaussian Cube format. You can visualize this file using gOpenMol (after converting the
Gaussian Cube file with gcube2plt), Molden or Molekel.
This sub-directive specifies a title line for the generated input to the Insight program or for the Gaussian cube file.
Only one line is allowed.
229
230 CHAPTER 26. DPLOT
This sub-directive specifies the limits of the cell to be plotted. The grid is generated using No_Of_Spacings + 1
points along each direction. The known names for Units are angstroms, au and bohr.
This sub-directive specifies, what kind of density is to be computed. The known names for Spin are total,
alpha, beta and spindens, the last being computed as the difference between α and β electron densities.
This sub-directive specifies the name of the generated input to the Insight program or the generated Gaussian cube
file. The name OUTPUT is reserved for the standard NWChem output.
This sub-directive specifies the name of the molecular orbital file. If the second file is optionally given the density
is computed as the difference between the corresponding electron densities. The vector files have to match.
This sub-directive specifies where the density is to be computed. The known names for Where are grid (the
calculation of the density is performed on the set of a grid points specified by the sub-directive LimitXYZ and the file
specified by the sub-directive Output is generated), nuclei (the density is computed at the position of the nuclei
and written to the NWChem output) and g+n (both).
This sub-directive specifies the subset of the orbital space for the calculation of the electron density. The density is
computed using the occupation numbers from the orbital file modified according to the Spin directive. If the contours
26.9. EXAMPLES 231
of the orbitals are to be plotted Option should be set to view. Note, that in this case No_Of_Orbitals should be
set to 1 and sub-directive Where is automatically set to grid. Also specification of two orbital files conflicts with
the view option. α orbitals are always plotted unless Spin is set to beta.
26.9 Examples
Charge Density
start n2
geometry
n 0 0 0.53879155
n 0 0 -0.53879155
end
basis; n library cc-pvdz;end
scf
vectors output n2.movecs
end
dplot
TITLE HOMO
vectors n2.movecs
LimitXYZ
-3.0 3.0 10
-3.0 3.0 10
-3.0 3.0 10
spin total
gaussian
output lumo.cube
end
task scf
task dplot
Molecular Orbital
start n2
geometry
n 0 0 0.53879155
n 0 0 -0.53879155
end
basis; n library cc-pvdz;end
scf
vectors output n2.movecs
end
dplot
TITLE HOMO
232 CHAPTER 26. DPLOT
vectors n2.movecs
LimitXYZ
-3.0 3.0 10
-3.0 3.0 10
-3.0 3.0 10
spin total
orbitals view; 1; 7
output homo.grd
end
task scf
task dplot
Chapter 27
The NWChem electron transfer (ET) module calculates the electronic coupling energy (also called the electron transfer
matrix element) between ET reactant and product states. The electronic coupling (VRP ), activation energy (∆G∗ ), and
nuclear reorganization energy (λ) are all components of the electron transfer rate defined by Marcus’ theory, which
also depends on the temperature (reference 1):
−∆G∗
2π 2 1
kET = VRP √ exp (27.1)
h̄ 4πλkB T kB T
The ET module utilizes the method of Corresponding Orbital Transformation to calculate VRP . The only input
required are the names of the files containing the open-shell (UHF) MO vectors for the ET reactant and product states
(R and P).
The basis set used in the calculation of VRP must be the same as the basis set used to calculate the MO vectors of R
and P. The magnitude of VRP depends on the amount of overlap between R and P, which is important to consider when
choosing the basis set. Diffuse functions may be necessary to fill in the overlap, particularly when the ET distance is
long.
The MO’s of R and P must correspond to localized states. for instance, in the reaction A− B → A B− the transferring
electron is localized on A in the reactant state and is localized on B in the product state. To verify the localization
of the electron in the calculation of the vectors, carefully examine the Mulliken population analysis. In order to
determine which orbitals are involved in the electron transfer, use the print keyword "mulliken ao" which prints
the Mulliken population of each basis function.
An effective core potential (ECP) basis can be used to replace core electrons. However, there is one caveat: the
orbitals involved in electron transfer must not be replaced with ECP’s. Since the ET orbitals are valence orbitals, this
is not usually a problem, but the user should use ECP’s with care.
Suggested references are listed below. The first two references gives a good description of Marcus’ two-state ET
model, and the appendix of the third reference details the method used in the ET module.
2. J.R. Bolton, N. Mataga, and G. McLendon in “Electron Transfer in Inorganic, Organic and Biological Systems"
(American Chemical Society, Washington, D.C., 1991)
3. A. Farazdel, M. Dupuis, E. Clementi, and A. Aviram, J. Am. Chem. Soc., 112, 4206 (1990).
233
234 CHAPTER 27. ELECTRON TRANSFER CALCULATIONS WITH ET
In the VECTORS directive the user specifies the source of the molecular orbital vectors for the ET reactant and
product states. This is required input, as no default filename will be set by the program. In fact, this is the only required
input in the ET module, although there are other optional keywords described below.
This directive enables/disables the use of the NWChem’s Fock matrix routine in the calculation of the two-electron
portion of the ET Hamiltonian. Since the Fock matrix routine has been optimized for speed, accuracy and parallel
performance, it is the most efficient choice.
Alternatively, the user can calculate the two-electron contribution to the ET Hamiltonian with another subroutine
which may be more accurate for systems with a small number of basis functions, although it is slower.
The variable tol2e is used in determining the integral screening threshold for the evaluation of the two-electron
contribution to the Hamiltonian between the electron transfer reactant and product states. As a default, tol2e is
set depending on the magnitude of the overlap between the ET reactant and product states (SRP ), and is not less than
1.0d-12 or greater than 1.0d-7.
The input to specify the threshold explicitly within the ET directive is, for example:
tol2e 1e-9
27.4 Example
The following example is for a simple electron transfer reaction, He → He+ . The ET calculation is easy to execute,
but it is crucial that ET reactant and product wavefunctions reflect localized states. This can be accomplished using
either a fragment guess (shown in the example, see 10.5.1), or a charged atomic density guess (see 10.5.2). For self-
exchange ET reactions such as this one, you can use the REORDER keyword to move the electron from the first helium
to the second (see 10.5).
Example input :
27.4. EXAMPLE 235
#ET reactants:
charge 1
scf
doublet; uhf; vectors input fragment HeP.mo He.mo output HeA.mo
# HeP.mo are the vectors for He(+),
# He.mo are the vectors for neutral He.
end
task scf
#ET products:
charge 1
scf
doublet; uhf; vectors input HeA.mo reorder 2 1 output HeB.mo
end
task scf
et
vectors reactants HeA.mo
vectors products HeB.mo
end
task scf et
The overlap between the ET reactant and product states (SRP ) is small, so the magnitude of the coupling between
the states is also small. If the fragment guess or charged atomic density guess were not used, the Mulliken spin
population would be 0.5 on both He atoms, the overlap between the ET reactant and product states would be 100 %
and an infinite VRP would result.
Chapter 28
Properties
Properties can be calculated for both the Hartree-Fock and DFT wave functions. The properties that are available are:
The properties module is started when the task directive TASK <theory> property is defined in the user
input file. The input format has the form:
PROPERTY
[property keyword]
[CENTER ((com || coc || origin || arb <real x y z>) default coc)]
END
Most of the properties can be computed for Hartree-Fock (closed-shell RHF, open-shell ROHF, and open-shell
UHF), and DFT (closed-shell and open-shell spin unrestricted) wavefunctions. The NMR chemical shift is limited
to closed-shell wave functions, whereas the NMR hyperfine and indirect spin-spin coupling require a UHF or ODFT
wave function.
237
238 CHAPTER 28. PROPERTIES
NBOFILE
DIPOLE
QUADRUPOLE
OCTUPOLE
MULLIKEN
ESP
EFIELD
EFIELDGRAD
ELECTRONDENSITY
HYPERFINE
SHIELDING [<integer> number_of_atoms <integer> atom_list]
SPINSPIN [<integer> number_of_pairs <integer> pair_list]
ALL
com is the center of mass, coc is the center of charge, origin is (0.0, 0.0, 0.0) and arb is any arbitrary point
which must be accompanied by the coordinated to be used. Currently the x, y, and z coordinates must be given in the
same units as UNITS in GEOMETRY (See Section 6.1).
28.1.1 Nbofile
The keyword NBOFILE does not execute the Natural Bond Analysis code, but simply creates an input file to be used
as input to the stand-alone NBO code. All other properties are calculated upon request.
Following the successful completion of an electronic structure calculation, a Natural Bond Orbital (NBO) analysis
may be carried out by providing the keyword NBOFILE in the PROPERTY directive. NWChem will query the rtdb
and construct an ASCII file, <file_prefix>.gen, that may be used as input to the stand alone version of the NBO
program, gennbo. <file_prefix> is equal to string following the START directive. The input deck may be edited
to provide additional options to the NBO calculation, (see the NBO user’s manual for details.)
Chapter 29
VSCF
The VSCF module can be used to calculate the anharmonic contributions to the vibrational modes of the molecule of
interest. Energies are calculated on a one-dimensional grid along each normal mode, on a two-dimensional grid along
each pair of normal modes, and optionally on a three-dimensional grid along each triplet of normal modes. These
energies are then used to calculate the vibrational nuclear wavefunction at an SCF- (VSCF) and MP2-like (cc-VSCF)
level of theory.
VSCF can be used at all levels of theory, SCF and correlated methods, and DFT. For correlated methods, only the
SCF level dipole is evaluated and used to calculate the IR intensity values.
The VSCF module is started when the task directive TASK <theory> vscf is defined in the user input file.
The input format has the form:
VSCF
[coupling <string couplelevel default "pair">]
[ngrid <integer default 16 >]
[iexcite <integer default 1 >]
[vcfct <real default 1.0>]
END
The order of coupling of the harmonic normal modes included in the calculation is controlled by the specifying:
For coupling=diagonal a one-dimensional grid along each normal mode is computed. For coupling=pair
a two-dimensional grid along each pair of normal modes is computed. For coupling=triplet a three-dimensional
grid along each triplet of normal modes is computed.
The number of grid points along each normal mode, or pair of modes can be defined by specifying:
This VSCF module by default calculates the ground state (v=0), but can also calculate excited states (such as v=1).
The number of excited states calculated is defined by specifying:
239
240 CHAPTER 29. VSCF
With iexcite=1 the fundamental frequencies are calculated. With iexcite=2 the first overtones are calcu-
lated. With iexcite=3 the second overtones are calculated.
In certain cases the pair coupling potentials can become larger than those for a single normal mode. In this case
the pair potentials need to be scaled down. The scaling factor used can be defined by specifying:
Electrostatic potentials
The NWChem Electrostatic Potential (ESP) module derives partial atomic charges that fit the quantum mechanical
electrostatic potential on selected grid points.
The ESP module is specified by the NWChem task directive
task esp
The input for the module is taken from the ESP input block
ESP
...
END
• If a grid file is found, the grid will be read from that file. If no grid file is found, or the keyword
recalculate
where rcut is the maximum distance in nm between a grid point and any of the atomic centers. When omitted,
a default value for rcut of 0.3 nm is used.
• The grid spacing is specified by
241
242 CHAPTER 30. ELECTROSTATIC POTENTIALS
where spac is the grid spacing in nm for the regularly spaced grid points. If not specified, a default spacing of
0.05 nm is used.
• The van der Waals radius of an element can be specified by
where iatnum is the atomic number for which a van der Waals radius of atrad in nm will be used in the grid
point determination. Default values will be used for atoms not specified.
• The probe radius in nm determining the envelope around the molecule is specified by
• The distance between atomic center and probe center can be multiplied by a constant factor specified by
All grid points are discarded that lie within a distance factor*(radius(i)+probe) from any atom i.
• Schwarz screening is applied using
30.2 Constraints
Additional constraints to the partial atomic charges can be imposed during the fitting procedure.
where charge is the net charge of the set of atoms {iatom}. A negative atom number iatom can be used to
specify that the partial charge of that atom is substracted in the sum for the set.
• The net charge of a sequence of atoms can be constrained using
• The individual charge of a group of atoms can be constrained to be equal to those of a second group of atoms
with
constrain group <integer iatom> <integer jatom> to <integer katom> <integer latom>
resulting in the same charge for atoms iatom and katom, for atoms iatom+1 and katom+1, ... for atoms
jatom and latom.
30.3. RESTRAINTS 243
• A special constraint
can be used to constrain the set {iatom,{jatom}} to zero charge, and constrain all atoms in {jatom} to
have the same charge. This can be used, for example, to restrain a methyl group to zero charge, and have all
hydrogen carrying identical charges.
30.3 Restraints
Restraints can be applied to each partial charge using the RESP charge fitting procedure.
where hfree can be specified to exclude hydrogen atoms from the restaining procecure. Variable scale is
the strength of the restraint potential, with a default of 0.005au for the harmonic restraint and a default value
of 0.001au for the hyperbolic restraint. For the hyperbolic restraints the tightness tight can be specified to
change the default value of 0.1e. The iteration count that needs to be carried out for the hyperbolic restraint
is determined by the maximum number of allowed iterations maxiter, with a default value of 25, and the
tolerance in the convergence of the partial charges toler, with a default of 0.001e.
244 CHAPTER 30. ELECTROSTATIC POTENTIALS
Chapter 31
Prepare
The prepare module is used to set up the necessary files for a molecular dynamics simulation with NWChem. User
supplied coordinates can be used to generate topology and restart files. The topology file contains all static information
about a molecular system, such as lists of atoms, bonded interactions and force field parameters. The restart file
contains all dynamic information about a molecular system, such as coordinates, velocities and properties.
Without any input, the prepare module checks the existence of a topology and restart file for the molecular systems.
If these files exist, the module returns to the main task level without action. The module will generate these files when
they do not exist. Without any input to the module, the generated system will be for a non-solvated isolated solute
system.
To update existing files, including solvation, the module requires input directives read from an input deck,
prepare
...
end
* sequence generation
This sub-task analyzes the supplied coordinates from a PDB-formatted file or from the input geometry, and
generates a sequence file, containing the description of the system in terms of basic building blocks found as
fragment or segment files in the database directories for the force field used. If these files do not exist, they
are generated based on the supplied coordinates. This process constists of generating a fragment file with the
list of atoms with their force field dependent atom types, partial atomic charges calculated from a Hartree Fock
calculation for the fragment, followed by a restrained electrostatic potential fit, and a connectivity list. From
the information on this fragment file the lists of all bonded interactions are generated, and the complete lists are
written to a segment file.
* topology generation
Based on the generated or user-supplied sequence file and the force field specific segment database files, this
sub-task compiles the lists of atoms, bonded interactions, excluded pairs, and substitutes the force field param-
eters. Special commands may be given to specify interaction parameters that will be changing in a free energy
evaluation.
* restart generation
Using the user supplied coordinates and the topology file for the chemical system, this sub-task generates a
245
246 CHAPTER 31. PREPARE
restart file for the system with coordinates, velocities and other dynamic information. This step may include
solvation of the chemical system and specifying periodic boundary conditions.
* standards
The standard database files contain the original force field information. These files are to reside in a directory
that is specified in the file $HOME/.nwchemrc. There will be such a directory for each supported force field.
These directories contain fragment files (with extension frg), segment files (with extension sgm) and a parameter
file (with the name of the force field and with extension par).
* extensions
These database files contain generally accepted extensions to the original force field and are to reside in a
separate directory that is specified in the file $HOME/.nwchemrc. There will be such a directory for each
supported force field. These directories contain fragment files (with extension frg), segment files (with extension
sgm) and a parameter file (with the name of the force field and with extension par).
* contributed
These database files contain contributed definitions, also required for the quality assurance tests and are to reside
in a separate directory that is specified in the file $HOME/.nwchemrc. There will be such a directory for each
supported force field. These directories contain fragment files (with extension frg), segment files (with extension
sgm) and a parameter file (with the name of the force field and with extension par).
* user preferences
These database files contain user preferred extensions to the original force field and are to reside in a separate
directory that is specified in the file $HOME/.nwchemrc. Separate directories of this type should be defined for
each supported force field. This directory may contain fragment files (with extension frg), segment files (with
extension sgm) and a parameter file (with the name of the force field and with extension par).
* temporary files
Temporary database files contain user preferred extensions to the original force field and are to reside in a sepa-
rate directory that is specified in the file $HOME/.nwchemrc. There may be such a directory for each supported
force field. This directory may contain fragment files (with extension frg), segment files (with extension sgm)
and a parameter file (with the name of the force field and with extension par).
* current files
Database files that contain user preferred extensions to the original force field and are to reside in a separate
directory that is specified in the file $HOME/.nwchemrc. Typically this will be the current working directory,
although it may be defined as a specific directory. This directory may contain fragment files (with extension
frg), segment files (with extension sgm) and a parameter file (with the name of the force field and with extension
par). If not specified, files will be taken from the current directory.
Data is taken from the database files searched in the above order. If data is specified more than once, the last found
values are used. For example, if some standard segment is redefined in a temporary file, the latter one will be used.
This allows the user to redefine standards or extensions without having to modify those database files, which may
reside in a generally available, non-modifyable directory. If a filename is specified rather than a directory, the filename
indicates the parameter file definition. All other files (frg and sgm files) will be take from the specified directory.
The most common problems with the prepare module are
The format of the pdb file does not conform to the pdb standard. In particular, atom names need to correspond
with definitions in the fragment and segment database files, and should adhere to IUPAC recommendations as
adopted by the pdb standard. If this problem occurs, the pdb file will need to be corrected.
31.1. DEFAULT DATABASE DIRECTORIES 247
Non-standard segments may contain atoms that could not be atom typed with the existing typing rules in the
force field parameter files. When this happens, additional typing rules can be included in the parameter file, or
the fragment file may be manually typed.
Parameters for atom types or bonded interactions do not exist in the force field. When this happens, additional
parameters may be defined in the parameter files, or the segment file may be edited to include explicit parameters.
This entry specifies the default force field. Database files supplied with NWChem currently support values for
ffname of amber, referring to AMBER95, and charmm, referring to the academic CHARMM22 force field.
Entries of this type specify the directory ffdir in which force field database files can be found. Optionally the
parameterfile in this directory may be specified as parfile. The prepare module will only use files in directories
specified here. One exception is that files in the current work directory will be used if no directory with current files
is specified. The directories are read in the order 1-9 with duplicate parameters taken from the last occurrence found.
Note that multiple parameter files may be specified that will be read in the order in which they are specified.
This entry may be used to identify a pure solvent restart file solvfil by a name solvnam
An example file $HOME/.nwchemrc is:
ffield amber
amber_1 /soft/nwchem/share/amber/amber_s/amber99.par,spce.par
amber_2 /soft/nwchem/share/amber/amber_x/
amber_3 /usr/people/username/data/amber/amber_u/
spce /soft/nwchem/share/solvents/spce.rst
charmm_1 /soft/nwchem/share/charmm/charmm_s/
charmm_2 /soft/nwchem/share/charmm/charmm_x/
The system name can be explicitly specified for the prepare module. If not specified, the system name will be
taken from a specification in a previous md input block, or derived from the run time database name.
The source of the coordinates can be explicitly specified to be from a PDB formatted file sys.pdb, or from a
geometry object in the run time database. If not specified, a pdb file will be used when it exists in the current directory
or the rtdb geometry otherwise.
If a PDB formatted source file contains different MODELs, the model keyword can be used to specify which
MODEL will be used to generate the topology and restart file. If not specified, the first MODEL found on the PDB
file will be read.
The altloc keyword may be used to specify the use of alternate location coordinates on a PDB file.
The chain keyword may be used to specify the chain identifier for coordinates on a PDB file.
sscyx
Keyword sscyx may be used to rename cysteine residues that form sulphur bridges to CYX.
hbuild
Keyword hbuild may be used to add hydrogen atoms to the unknown segments of the structure found on the pdb
file. Placement of hydrogen atoms is based on geometric criteria, and the resulting fragment and segment files should
be carefully examined for correctness.
The database directories are used as specified in the file .nwchemrc. Specific definitions for the force field used
may be changed in the input file using
Variable maxscf specifies the maximum number of atoms in a segment for which partial atomic charges will be
determined from an SCF calculation followed by RESP charge fitting. For larger segments a crude partial charge
guestimation will be done.
31.3. SEQUENCE FILE GENERATION 249
Variable qscale specifies the factor with which SCF/RESP determined charges will be multiplied.
This command specifies that segment sgmnam should be used for segment with number sgmnum. This command
can be used to specify a particular protonation state. For example, the following command specifies that residue 114
is a hystidine protonated at the Nε site and residue 202 is a hystidine protonated at the Nδ site:
For example, to link atom SG in segment 20 with atom FE in segment 55, use:
The format of the sequence file is given in Table 35.9. In addition to the list of segments this file also includes
links between non-standard segments or other non-standard links. These links are generated based on distances found
between atoms on the pdb file. When atoms are involved in such non-standard links that have not been identified in
the fragment of segment files as a non-chain link atom, the prepare module will ignore these links and report them as
skipped. If one or more of these links are required, the user has to include them with explicit link directives in the
sequence file, making them forced links. Alternatively, these links can be made forced-links by changing link into
LINK in the sequence file.
Directive fraction can be used to separate solute molecules into fractions for which energies will be separately
reported during molecular dynamics simulations. The listed molecules will be the last molecule in a fraction. Up to
10 molecules may be specified in this directive.
Directive counter adds num counter ions of type ion to the sequence file. Up to 10 counter directives may
appear in the input block.
This directive scales the counter ion charge by the specified factor in the determination of counter ions positions.
250 CHAPTER 31. PREPARE
Keyword new_top is used to force the generation of a new topology file. An existing topology file for the system
in the current directory will be overwritten. If keyword new_seq is also specified, an existing sequence file will also
be overwritten with a newly generated file.
amber | charmm
The prepare module generates force field specific fragment, segment and topology files. The force field may be
explicitly specified in the prepare input block by specifying its name. Currently AMBER and CHARMM are the
supported force fields. A default force field may be specified in the file $HOME/.nwchemrc.
The user can explicitly specify the directories where force field specific databases can be found. These include
force field standards, extensions, quality assurance tests, user preferences, temporary , and current database files.
Defaults for the directories where database files reside may be specified in the file $HOME/.nwchemrc for each of
the supported force fields. Fragment, segment and sequence files generated by the prepare module are written in the
temporary directory. When not specified, the current directory will be used. Topology and restart files are always
created in the current directory.
The following directives control the modifications of a topology file. These directives are executed in the order
in which they appear in the prepare input deck. The topology modifying commands are not stored on the run-time
database and are, therefor, not persistent.
These modify commands change the atom type, partial atomic charge, atomic polarizability, specify a dummy,
self-interaction and quantum atom, respectively. If mset is specified, the modification will only apply to the spec-
ified set, which has to be 1, 2 or 3. If not specified, the modification will be applied to all three sets. The quan-
tum region in QM/MM simulations is defined by specifying atoms with the quantum or quantum_high label.
For atoms defined quantum_high basis sets labeled X_H will be used. The atomnam should be specified as
<integer isgm>:<string name>, where isgm is the segment number, and name is the atom name. A lead-
ing blank in an atom name should be substituted with an underscore. The modify commands may be combined. For
example, the following directive changes for the specified atom the charge and atom type in set 2 and specifies the
atom to be a dummy in set 3.
With the following directives modifications can be made for entire segments.
31.4. TOPOLOGY FILE GENERATION 251
where protonation specifies a modification of the default protonation state of the segment as specified in the
segment file. This option only applies to Q-HOP simulations.
Modifications to bonded interaction parameters can be made with the following modify commands.
where atomtyp and mset are defined as above, multip is the torsion ultiplicity for which the modification is
to be applied, value is the reference bond, angle, torsion angle of out-of-plane angle value respectively, and forcon
is the force constant for bond, angle, torsion angle of out-of-plane angle. When multip or mset are not defined the
modification will be applied to all multiplicities and sets, respectively, for the identified bonded interaction.
After modifying atoms to quantum atoms the bonded interactions in which only quantum atoms are involved are
removed from the bonded lists using
update lists
Error messages resulting from parameters not being defined for bonded interaction in which only quantum atoms
are involved are ignored using
ignore
To specify that a free energy calculation will be carried out using the topology file, the following keyword needs
to be specified,
free
To specify that a Q-HOP simulation will be carried out using the topology file, the following keyword needs to be
specified,
qhop
To specify that only the first set of parameters should be used, even if multiple sets have been defined in the
fragment or segment files, the following keyword needs to be specified,
first
Note that keywords free, qhop and qhop are mutually exclusive.
252 CHAPTER 31. PREPARE
This directive specifies a distance restraint potential between atoms atom1 and atom2, with a harmonic function
with force constant f orc1 between dist1 and dist2, and a harmonic function with force constant f orc2 between dist2
and dist3. For distances shorter than dist1 or larger than dist3, a constant force is applied such that force and energy
are continuous at dist1 and dist3, respectively. Distances are given in nm, force constants in kJ mol−1 nm−2 .
Directive select specifies a group of atoms used in the definition of potential of mean force potentials.
The selected atoms are specified by the string atoms which takes the form
For example, all carbon and oxygen atoms in segments 3 and 6 through 12 are selected for group 1 by
3,6-12:_C????,_O????
pmf [all] [bias] zalign <integer isel> <real forcon1> <real forcon2>
pmf [combine] [bias] xyplane <integer isel> <real forcon1> <real forcon2>
pmf [constraint] [bias] (distance | zdistance) <integer isel> <integer jsel> \
<real dist1> <real dist2> <real forcon1> <real forcon2>
pmf [bias] angle <integer isel> <integer jsel> <integer ksel> \
<real angle1> <real angle2> <real forcon1> <real forcon2>
pmf [bias] torsion <integer isel> <integer jsel> <integer ksel> <integer lsel> \
<real angle1> <real angle2> <real forcon1> <real forcon2>
pmf [bias] basepair <integer isel> <integer jsel> \
<real dist1> <real dist2> <real forcon1> <real forcon2>
pmf [bias] (zaxis | zaxis-cog) <integer isel> <integer jsel> <integer ksel> \
<real dist1> <real dist2> <real forcon1> <real forcon2>
Directive pmf specifies a potential of mean force potential in terms of the specified atom selection. Option
zalign specifies the atoms in the selection to be restrained to a line parallel to the z-axis. Option xyplane specifies
the atoms in the selection to be restrained to a plane perpendicular to the z-axis. Options distance, angle and
torsion, are defined in terms of the center of geometry of the specified atom selections. Keyword basepair is
used to specify a harmonic potential between residues isel and jsel. Keywords zaxis and zaxis-cog can be
used to pull atoms toward the z-axis. Option all may be specified to apply an equivalent pmf to each of the equiv-
alent solute molecules in the system. Option combine may be specified to apply the specified pmf to the atoms in
all of the equivalent solute molecules. Option constraint may be specified to a distance pmf to treat the distance
as a contraint. Option bias may be specified to indicate that this function should be treated as a biasing potential.
Appropriate corrections to free energy results will be evaluated.
Keyword new_rst will cause an existing restart file to be overwritten with a new file.
The follwing directives control the manipulation of restart files, and are executed in the order in which they appear
in the prepare input deck.
The solvent keyword can be used to specify the three letter solvent name as expected on the PDB formatted file,
and the name of the solvent model for which solvent coordinates will be used.
Solvation can be specified to be in a cubic box with specified edge, rectangular box with specified edges, or in
a sphere with specified radius. Solvation in a cube or rectangular box will automatically also set periodic boundary
conditions. Solvation in a sphere will only allow simulations without periodic boundary conditions. The size of the
cubic and rectangular boxes will be expanded by a length specified by the expand variable. If no shape is specified,
solvation will be done for a cubic box with an edge that leaves rshell nm between any solute atom and a periodic image
of any solute atom after the solute has been centered. An explicit write is not needed to write the restart file. The
solvate will write out a file sys_calc.rst. If not specified, the dimension of the solvation cell will be as large as
to have at least a distance of rshell nm between any solute atom and the edge of the cell. The experimental troct
directive generates a truncated octrahedral box.
The variable touch specifies the minimum distance between a solvent and solute atom for which a solvent
molecule will be accepted for solvation.
The variable xpndw specifies the size in nm with which the simulation volume will be increased after solvation.
These directives read and write the file filename in the specified format. The solute option instructs to write
out the coordinates for solute and all, or if specified the first nsolvent, crystal solvent molecules only. If no format
is specified, it will be derived from the extension of the filename. Recognized extensions are rst, rst_old (read only),
pdb, xyz (write only) and pov (write only). Reading and then writing the same restart file will cause the sub-block size
information to be lost. If this information needs to be retained a shell copy command needs to be used. The large
keyword allows PDB files to be written with more than 9999 residues. Since the PDB file will not conform to the PDB
convention, this option should only be used if required. NWChem will be able to read the resulting PDB file, but other
codes may not.
254 CHAPTER 31. PREPARE
This directive scales the volume and coordinates written to povray files. A negative value of scale (default) scales
the coordinates to lie in [-1:1].
This directive causes povray files to contain cpk model output. The optional value is used to scale the atomic radii.
A neagtive value of cpk resets the rendering to stick.
These directives center the solute center of geometry at the origin, in the y-z plane, in the x-z plane or in the x-y
plane, respectively.
orient
This directive translates solute atoms in the indicated range by xtran, without checking for bad contacts in the
resulting structure.
This directive rotates solute atoms in the indicated range by angle around the vector given by xrot„ without check-
ing for bad contacts in the resulting structure.
This directive removes solvent molecules inside or outside the specified coordinate range.
periodic
vacuo
This directive specifies the grid size of trial counter-ion positions and minimum distance between an atom in the
system and a counter-ion.
crop
boxsize
cube
The align directive orients the system such that atomi and atomj are on the z-axis, and atomk in the x=y
plane.
The repeat directive causes a subsequent write pdb directive to write out multiple copies of the system, with
nx copies in the x, ny copies in the y, and nz copies in the z-direction, with a minimum distance of dist between
any pair of atoms from different copies. If nz is -2, an inverted copy is placed in the z direction, with a separation
of zdist nm. If dist is negative, the box dimensions will be used. For systems with solvent, this directive should
be used with a negative dist. Optional keywords chains, molecules and fractions specify to write each
repeating solute unit as a chain, to repeat each solute molecule, or each solute fraction separately. Optional keywords
randomx, randomy, and randomz can be used to apply random rotations for each repeat unit around a vector
through the center of geometry of the solute in the x, y or z direction.
The skip directive can be used to skip single repeat unit from the repeat directive. Up to 100 skip directives
may be specified, and will only apply to the previously specified repeat directive.
specifies to move all solute molecules toward the z-axis or x=y-plane, respectively, to within a distance of touch
nm between any pair of atoms from different solute molecules. Parameter nmoves specifies the number of collapse
moves that will be made. Monatomic ions will move with the nearest multi-atom molecule.
256 CHAPTER 31. PREPARE
specifies that molecule jmol will move together with molecule imol in collapse operations.
specifies to merge the coordinates found on the specified pdb file into the current structure after translation by xtran(3).
Chapter 32
Molecular dynamics
32.1 Introduction
The molecular dynamics module of NWChem uses a distribution of data based on a spacial decomposition of the
molecular system, offering an efficient parallel implementation in terms of both memory requirements and communi-
cation costs, especially for simulations of large molecular systems.
Inter-processor communication using the global array tools and the design of a data structure allowing distribution
based on spacial decomposition are the key elements in taking advantage of the distribution of memory requirements
and computational work with minimal communication.
In the spacial decomposition approach, the physical simulation volume is divided into rectangular cells, each
of which is assigned to a processor. Depending on the conditions of the calculation and the number of available
processors, each processor contains one or more of these spacially grouped cells. The most important aspects of this
decomposition are the dependence of the cell sizes and communication cost on the number of processors and the
shape of the cells, the frequent reassignment of atoms to cells leading to a fluctuating number of atoms per cell, and
the locality of communication which is the main reason for the efficiency of this approach for very large molecular
systems.
To improve efficiency, molecular systems are broken up into separately treated solvent and solute parts. Solvent
molecules are assigned to the domains according to their center of geometry and are always owned by a one node. This
avoids solvent–solvent bonded interactions crossing node boundaries. Solute molecules are broken up into segments,
with each segment assigned to a processor based on its center of geometry. This limits the number of solute bonded
interactions that cross node boundaries. The processor to which a particular cell is assigned is responsible for the
calculation of all interactions between atoms within that cell. For the calculation of forces and energies in which
atoms in cells assigned to different processors are involved, data are exchanged between processors. The number of
neighboring cells is determined by the size and shape of the cells and the range of interaction. The data exchange that
takes place every simulation time step represents the main communication requirements. Consequently, one of the
main efforts is to design algorithms and data structures to minimize the cost of this communication. However, for very
large molecular systems, memory requirements also need to be taken into account.
To compromise between these requirements exchange of data is performed in successive point to point communi-
cations rather than using the shift algorithm which reduces the number of communication calls for the same amount
of communicated data.
257
258 CHAPTER 32. MOLECULAR DYNAMICS
For inhomogeneous systems, the computational load of evaluating atomic interactions will generally differ between
cell pairs. This will lead to load imbalance between processors. Two algorithms have been implemented that allow
for dynamically balancing the workload of each processor. One method is the dynamic resizing of cells such that cells
gradually become smaller on the busiest node, thereby reducing the computational load of that node. Disadvantages
of this method are that the efficiency depends on the solute distribution in the simulation volume and the redistribution
of work depends on the number of nodes which could lead to results that depend on the number of nodes used. The
second method is based on the dynamic redistribution of intra-node cell-cell interactions. This method represents a
more coarse load balancing scheme, but does not have the disadvantages of the cell resizing algorithm. For most
molecular systems the cell pair redistribution is the more efficient and preferred method.
The description of a molecular system consists of static and dynamic information. The static information does
not change during a simulation and includes items such as connectivity, excluded and third neighbor lists, equilibrium
values and force constants for all bonded and non-bonded interactions. The static information is called the topology
of the molecular system, and is kept on a separate topology file. The dynamic information includes coordinates and
velocities for all atoms in the molecular system, and is kept in a so-called restart file.
32.1.2 Topology
The static information about a molecular system that is needed for a molecular simulation is provided to the simulation
module in a topology file. Items in this file include, among many other things, a list of atoms, their non-bonded
parameters for van der Waals and electrostatic interactions, and the complete connectivity in terms of bonds, angles
and dihedrals.
In molecular systems, a distinction is made between solvent and solute, which are treated separately. A solvent
molecule is defined only once in the topology file, even though many solvent molecules usually are included in the
actual molecular system. In the current implementation only one solvent can be defined. Everything that is not solvent
in the molecular system is solute. Each solute atom in the system must be explicitly defined in the topology.
Molecules are defined in terms of one or more segments. Typically, repetitive parts of a molecule are each defined
as a single segment, such as the amino acid residues in a protein. Segments can be quite complicated to define and are,
therefore, collected in a set of database files. The definition of a molecular system in terms of segments is a sequence.
Topology files are created using the prepare module.
32.1.3 Files
File names used have the form $system$_$calc$.$ext$, with exception of the topology file (Section 32.1.2),
which is named $system$.top. Anything that refers to the definition of the chemical system can be used for
$system$, as long as no periods or underlines are used. The identifier $calc$ can be anything that refers to the
type of calculation to be performed for the system with the topology defined. This file naming convention allows
for the creation of a single topology file $system$.top that can be used for a number of different calculations,
each identified with a different $calc$. For example, if crown.top is the name of the topology file for a crown
ether, crown_em, crown_md, crown_ti could be used with appropriate extensions for the filenames for energy
minimization, molecular dynamics simulation and multi-configuration thermodynamic integration, respectively. All
of these calculations would use the same topology file crown.top.
The extensions <ext> identify the kind of information on a file, and are pre-determined.
32.1. INTRODUCTION 259
Table 32.1: List of file extensions for nwchem chemical system files.
32.1.4 Databases
Database file supplied with NWChem and used by the prepare module are found in directories with name $ffield$_$level$,
where $ffield$ is any of the supported force fields (Section 32.1.5). The source of the data is identified by
$level$, and can be
level Description
s original published data
x additional published data
q contributed data
The user is can replace these directories or add additional database files by specifying them in the .nwchemrc file.
or in the prepare input file.
The extension 1-9 defines the priority of database file.
frg fragments
par parameters
seq sequences
sgm segments
The paths of the different database directories should be defined in a file .nwchemrc in a user’s home directory,
and provides the user the option to select which database files are scanned.
260 CHAPTER 32. MOLECULAR DYNAMICS
The units for lengths, angles, and energies are correspondingly nanometers, radians, and kJ/mol.
The topology of a molecular system is generated by the prepare module from the sequence in terms of segments as
specified on the PDB file. For each unique segment specified in this file the segment database directories are searched
for the segment definition. For segments not found in one of the database directories a segment definition is generated
in the temporary directory if a fragment file was found. If a fragment file could not be found, it is generated by the
prepare module base on what is found on the PDB file.
When all segments are found or created, the parameter substitutions are performed, using force field parameters
taken from the parameter databases. After all lists have been generated the topology is written to a local topology file
$system$.top.
where the theory keyword md specifies use of the molecular dynamics module, and the operation keyword is one of
where the strings systemid and calcid are user defined names for the chemical system and the type of cal-
culation to ber performed, respectively. These names are used to derive the filenames used for the calculation. The
topoly file used will be systemid.top, while all other files are named systemid_calcid.ext.
262 CHAPTER 32. MOLECULAR DYNAMICS
resume
specifies that the current job will be an extension of a previous simulation, using most of the input data that have
been recorded by that previous run in the restart file. Typically the input in the current md input block defines a
larger number of steps than the previous job.
leapfrog | leapfrog_bc
specifies the integration algorithm, where leapfrog specifies the default leap frog integration, and leapfrog_bc
specifies the Brown-Clarke leap frog integrator.
f
fi = fi + f guide ∗ gi−1 (32.1)
Variable tguide defines the length of the averaging relative to the timestep ∆t.
∆t ∆t
gi = fi + 1 − gi−1 (32.2)
tguide tguide
isotherm [<real tmpext> [<real tmpext2>]] [trelax <real tmprlx> [<real tmsrlx>]] \
[anneal [<real tann1>] <real tann2>]
specifies a constant temperature ensemble using Berendsen’s thermostat, where <tmpext> is the external
temperature with a default of 298.15 K, and <tmprlx> and <tmsrlx> are temperature relaxation times in ps
with a default of 0.1. If only <tmprlx> is given the complete system is coupled to the heat bath with relaxation
time <tmprlx>. If both relaxation times are supplied, solvent and solute are independently coupled to the heat
bath with relaxation times <tmprlx> and <tmsrlx>, respectively. If keyword anneal is specified, the
external temperature will change from tmpext to tempext2 between simulation time tann1 and tann2
Cutoff radii can be specified for short range and long range interactions.
32.17 Polarization
First order and self consistent electronic polarization models have been implemented.
32.19 Constraints
Constraints are satisfied using the SHAKE coordinate resetting procedure.
of 10−4 at the short range cutoff radius, and morder is order of the Cardinal B-spline interpolation which
must be an even number and at least 4 (default value). A platform specific 3D fast Fourier transform is used, if
available, when imfft is set to 2. nprocs can be used to define a subset of processors to be used to do the
FFT calculations. If solvent is specified, the charge grid will be calculated from the solvent charges only.
( fix | free )
solvent ( [<integer idfirst> [<integer idlast>]] |
( within | beyond) <real rfix> <string atomname> ) | \
solute ( [<integer idfirst> [<integer idlast>]] [ heavy | {<string atomname>}] |
( within | beyond) <real rfix> <string atomname> )
[permanent]
For solvent the molecule numbers idfirst and idlastmay be specified to be the first and last molecule to
which the directive applies. If omitted, the directive applies to all molecules. For solute, the segment numbers
idfirst and idlastmay be specified to be the first and last segment to which the directive applies. If
omitted, the directive applies to all segments. In addition, the keyword heavy may be specified to apply to all
non hydrogen atoms in the solute, or a set of atom names may be specified in which a wildcard character ? may
be used. Keyword permanent is used to keep the specification on the restart file for subsequent simulations.
detail
specifies that moments of inertia and radii of gyration will be part of the recorded properties.
profile
specifies that execution time profiling data will be part of the recorded properties.
include fixed
specifies that energies will be evaluated between fixed atoms. Normally these interactions are excluded from the
pairlists.
atomlist
specifies that pairlists will be atom based. Normally pairlist are charge group based.
Keyword stat specifies the frequency <nfstat> of printing statistical information of properties that are
calculated during the simulation. For molecular dynamics simulation this frequency is in time steps, for multi-
configuration thermodynamic integration in λ-steps.
Keyword energies specifies the frequency nfener of printing solute bonded energies the output file for
energy/import calculations. The default for nfener is 0.
Keyword forces specifies the frequency nfforc of printing solute forces the output file for energy/import
calculations. The default for nfforc is 0.
Keyword matrix specifies that a solute distance matrix is to be printed.
Keyword expect is obsolete.
Keyword timing specifies that timing data is printed.
Keyword pmf specifies that pmf data is printed every iprpmf steps. Keyword out6 specifies that output is
written to standard out in stead of the output file with extension out.
Keyword dayout is obsolete.
Keyword rdf specifies the frequency <nfrdf> in molecular dynamics steps of calculating contributions to the
radial distribution functions. The default is 0. The range of the radial distribution functions is given by <rrdf>
in nm, with a default of the short range cutoff radius. Note that radial distribution functions are not evaluated
beyond the short range cutoff radius. The number of bins in each radial distribution function is given by <ngl>,
with a default of 1000. This option is no longer supported. If radial distribution function are to be calculated, a
rdi files needs to be available in which the contributions are specified as follows.
32.26 Recording
The following keywords control recording data to file. Record directives may be combined to a single directive.
Keyword wvelo specifies the frequency <nfvelo> in molecular dynamics steps of writing solvent velocitiesto
the trajectory file. This keyword takes precedent over veloc. This directive redefines previous veloc, wvelo
and svelo directives. The default is not to record.
Keyword svelo specifies the frequency <nfsvel> in molecular dynamics steps of writing solute velocities to
the trajectory file. This keyword takes precedent over veloc. This directive redefines previous veloc, wvelo
and svelo directives. The default is not to record.
Keyword force specifies the frequency <nfvelo> in molecular dynamics steps of writing forces to the
trajectory file. This directive redefines previous vforce, wforc and sforc directives. The default is not to
record.
Keyword wforc specifies the frequency <nfvelo> in molecular dynamics steps of writing solvent forcesto
the trajectory file. This keyword takes precedent over force. This directive redefines previous vforce,
wforc and sforc directives. The default is not to record.
Keyword sforc specifies the frequency <nfsvel> in molecular dynamics steps of writing solute forces to the
trajectory file. This keyword takes precedent over force. This directive redefines previous vforce, wforc
and sforc directives. The default is not to record.
Keyword prop specifies the frequency <nfprop> in molecular dynamics steps of writing information to the
property file, with extension prp. The default is not to record.
Keyword prop_average specifies the frequency <nfprop> in molecular dynamics steps of writing average
information to the property file, with extension prp. The default is not to record.
Keyword free specifies the frequency <nffree> in multi-configuration thermodynamic integration steps to
record data to the free energy data file, with extension gib. The default is 1, i.e. to record at every λ. This
option is obsolete. All data are required to do the final analysis.
Keyword sync specifies the frequency <nfsync> in molecular dynamics steps of writing information to the
synchronization file, with extension syn. The default is not to record. The information written is the simulation
time, the wall clock time of the previous MD step, the wall clock time of the previous force evaluation, the total
synchronization time, the largest synchronization time and the node on which the largest synchronization time
was found. The recording of synchronization times is part of the load balancing algorithm. Since load balancing
is only performed when pair-lists are updated, the frequency <nfsync> is correlated with the frequency of
pair-list updates <nfpair>. This directive is only needed for analysis of the load balancing performance. For
normal use this directive is not used.
Keyword times specifies the frequency <nfsync> in molecular dynamics steps of writing information to the
timings file, with extension tim. The default is not to record. The information written is wall clock time used
by each of the processors for the different components in the force evaluation. This directive is only needed for
analysis of the wall clock time distribution. For normal use this directive is not used.
Keywords acf, cnv and fet are obsolete.
Keywords binary, ascii, ecce and argos are obsolete.
[average]
[combination]
[iotime]
[experimental]
determines the type of dynamic load balancing performed, where the default is none. Load balancing option
size is resizing cells on a node, and pairs redistributes the cell-cell interactions over nodes. Keyword reset
will reset the load balancing read from the restart file. The level of cell resizing can be influenced with f actld.
The cells on the busiest node are resized with a factor
Tsync
1
min 3
np − tsync
1 − f actld ∗ (32.3)
twall
min is the
where Tsync is the accumulated synchronization time of all nodes, n p is the total number of nodes, tsync
synchronization time of the busiest node, and twall is the wall clock time of the molecular dynamics step. For
the combined load balancing, ldpair is the number of successive pair redistribution load balancing steps in
which the accumulated synchronization time increases, before a resizing load balancing step will be attempted.
Load balancing is only performed in molecular dynamics steps in which the pair-list is updated. The default
load balancing is equivalent to specifying
Keyword last specifies that the load balancing is based on the synchronization times of the last step. This
is the default. Keyword average specifies that the load balancing is based on the average synchronization
times since the last load balancing step. Keyword minimum specifies that the load balancing is based on the
minimum synchronization times since the last load balancing step. Keywords combination, iotime and
experimental are experimental load balancing options that should not be used in production runs.
(pack | nopack)
specifies if data are communicated in packed or unpacked form. The default is pack.
sets the number of additional cells for which memory is allocated. In rare events the amount of memory set
aside per node is insufficient to hold all atomic coordinates assigned to that node. This leads to execution which
aborts with the message that mwm or msa is too small. Jobs may be restarted with additional space allocated by
where <madbox> is the number of additional cells that are allocated on each node. The default for <madbox>
is 6. In some cases <madbox> can be reduced to 4 if memory usage is a concern. Values of 2 or less will
almost certainly result in memory shortage.
mwm <integer mwmreq>
sets the maximum number of solvent molecules <mwmreq> per node, allowing increased memory to be allo-
cated for solvent molecules. This option can be used if execution aborted because mwm was too small.
msa <integer msareq>
sets the maximum number of solute atoms <msareq> per node, allowing increased memory to be allocated for
solute atoms. This option can be used if execution aborted because msa was too small.
mcells <integer mbbreq>
sets the maximum number of cell pairs <mbbreq> per node, allowing increased memory to be allocated for the
cell pair lists. This option can be used if execution aborted because mbbl was too small.
boxmin <real rbox>
sets the minimum size of a cell. This directive is obsolete. The use of mcells is preferred.
segmentsize <real rsgm>
sets the maximum size of a segment. This value is used to determine which segments at the boundary of the
cutoff radius should be considered in the generation of the pairlists. This value is also determined by the prepare
module and written to the restart file. Use of this directive is not needed for simulations that use the current
prepare module to generate the restart file.
memory <integer memlim>
sets a limit <memlim> in kB on the allocated amount of memory used by the molecular dynamics module. Per
default all available memory is allocated. Use of this command is required for QM/MM simulations only.
expert
enables the use of certain combinations of features that are considered unsafe. This directive should not be used
for production runs.
develop <integer idevel>
enables the use of certain development options specified by the integer idevel. This option is for development
purposes only, and should not be used for production runs.
control <integer icntrl>
enables the use of certain development options specified by the integer icntrl. This option is for development
purposes only, and should not be used for production runs.
numerical
writes out analytical and finite difference forces for test purposes.
server <string servername> <integer serverport>
allows monitoring over a socket connection to the specified port on the named server of basic data as a simulation
is running.
For development purposes debug information can be written to the debug file with extension dbg with
274 CHAPTER 32. MOLECULAR DYNAMICS
membrane [ rotations ]
Constraining the center of mass of solute molecules in the xy plane is accomplished using
radius_gyration
diffusion
Analysis
The analysis module is used to analyze molecular trajectories generated by the NWChem molecular dynamics mod-
ule, or partial charges generated by the NWChem electrostatic potential fit module. This module should not de run in
parallel mode.
Directives for the analysis module are read from an input deck,
analysis
...
end
The analysis is performed as post-analysis of trajectory files through using the task directive
task analysis
or
task analyze
where the strings systemid and calcid are user defined names for the chemical system and the type of cal-
culation to ber performed, respectively. These names are used to derive the filenames used for the calculation. The
topoly file used will be systemid.top, while all other files are named systemid_calcid.ext.
275
276 CHAPTER 33. ANALYSIS
where filename is the name of an existing restart file. This input directive is required.
where filename is an existing trj trajectory file. If firstfile and lastfile are specified, the specified filename needs to
have a ? wild card character that will be substituted by the 3-character integer number from firstfile to lastfile, and the
analysis will be performed on the series of files. For example,
file tr_md?.trj 3 6
will instruct the analysis to be performed on files tr_md003.trj, tr_md004.trj, tr_md005.trj and tr_md006.trj.
From the specified files the subset of frames to be analyzed is specified by
For example, to analyze the first 100 frames from the specified trajectory files, use
frames 100
To analyze every 10-th frame between frames 200 and 400 recorded on the specified trajectory files, use
Solute coordinates of the reference set and ech subsequent frame read from a trajectory file are translated to have
the center of geometry of the specified solute molecule at the center of the simulation box. After this translation all
molecules are folded back into the box according to the periodic boundary conditions. The directive for this operation
is
Coordinates of each frame read from a trajectory file can be rotated using
If center was defined, rotation takes place after the system has been centered. The rotate directives only
apply to frames read from the trajectory files, and not to the reference coordinates. Upto 100 rotate directives can
be specified, which will be carried out in the order in which they appear in the input deck. rotate off cancels all
previously defined rotate directives.
To perform a hydrogen bond analysis:
33.4 Selection
Analyses can be applied to a selection of solute atoms and solvent molecules. The selection is determined by
where {atomlist} is the set of atom names selected from the specified residues. By default all solute atoms are
selected. When keyword super is specified the selecion applies to the superimposition option.
The selected atoms are specified by the string atomlist which takes the form
where isgm and jsgm are the first and last residue numbers, and aname is an atom name. In the atomname a question
mark may be used as a wildcard character.
For example, all protein backbone atoms are selected by
select _N,_CA,_C
select 20-80,90-100:_N,_CA,_C
This selection is reset to apply to all atoms after each file directive.
Solvent molecules within range nm from any selected solute atom are selected by
After solvent selection, the solute atom selection is reset to being all selected.
The current selection can be saved to, or read from a file using the save and read keywords, respectively.
Some analysis are performed on groups of atoms. These groups of atoms are defined by
where isgm and jsgm are the first and last residue numbers, and aname is an atom name. In the atomname a question
mark may be used as a wildcard character.
Multiple define directive can be used to define a single set of atoms.
rmsd
ramachandran
To define a distance:
To define an angle:
To define a torsion:
To define a vector:
where igroup specifies the group of atoms defined with a define directive. Keyword periodic can be used to
specify the periodicity, ipbc=1 for periodicity in z, ipbc=2 for periodicity in x and y, and ipbc=3 for periodicity
in x, y and z. Currently the only option is local which prints all selected solute atom with a distance between rsel
and rval from the atoms defined in igroup. The actual analysis is done by the scan deirective. A formatted report
is printed from group analyses using
groups [<integer igroup> [<integer jgroup>]] [periodic [<integer ipbc default 3>]] \
<string function> [<real value1> [<real value2>]] [<string filename>]
where igroup and jgroup are groups of atoms defined with a define directive. Keyword periodic specifies
that periodic boundary conditions need to be applied in ipbc dimensions. The type of analysis is define by f unction,
value1 and value2. If f ilename is specified, the analysis is applied to the reference coordinates and written to the spec-
ified file. If no filename is given, the analysis is applied to the specified trajectory and performed as part of the scan
directive. Implemented analyses defined by <string function> [<real value1> [<real value2>]]
include
distance to calculate the distance between the centers of geometry of the two specified groups of atoms, and
distances to calculate all atomic distances between atoms in the specified groups that lie between value1 and
value2.
Coordinate histograms are specified by
where ide f is the atom group definition number, length is the size of the histogram, zcoordinate is the cur-
rently only histogram option, and f ilename is the filname to which the histogram is written.
Order parameters are evalated using
which will create, depending on the specified analysis options files filename.rms and filename.ana. After the scan
directive previously defined coordinate analysis options are all reset. Optional keyword super specifies that frames
read from the trajectory file(s) are superimposed to the reference structure before the analysis is performed.
280 CHAPTER 33. ANALYSIS
essential
to project the trajectory onto the specified vector. This will create files filename with extensions frm or trj, val,
vec, _min.pdb and _max.pdb, with the projected trajectory, the projection value, the eigenvector, and the minimum
and maximum projection structure.
For example, an essential dynamics analysis with projection onto the first vector generating files firstvec.{trj, val,
vec, _min.pdb, _max.pdb} is generated by
essential
project 1 firstvec
To copy the selected frames from the specified trejctory file(s), onto a new file, use
To superimpose the selected atoms for each specified frame to the reference coordinates before copying onto a new
file, use
The rotate directive specifies that the structure will make a full ratation every tangle ps. This directive only has
effect when writing povray files.
The format of the new file is determined from the extension, which can be one of
amb AMBER formatted trajectory file (obsolete)
arc DISCOVER archive file
bam AMBER unformatted trajectory file
crd AMBER formatted trajectory file
dcd CHARMM formatted trajectory file
esp gOpenMol formatted electrostatic potential files
frm ecce frames file (obsolete)
pov povray input files
trj NWChem trajectory file
33.7. TRAJECTORY FORMAT CONVERSION 281
where tag number itag is set to the string tag for all atoms anam within a distance rtag from segments iatag
through jatag. A question mark can be used in anam as a wild card character.
Atom rendering is specified using
for all atoms anam within a distance rtag from segments iatag through jatag, and a scaling factor of rval. A
question mark can be used in anam as a wild card character.
Atom color is specified using
for all atoms anam within a distance rtag from segments iatag through jatag. A question mark can be used in
anam as a wild card character.
For example, to display all carbon atoms in segments 34 through 45 in green and rendered cpk in povray files can
be specified with
A zero or negative scaling factor will scale the coordinates to lie within [-1,1] in all dimensions.
The cpk rendering in povray files can be scaled by
A sequence of trajectory files with unequal lengths can be converted to files with all nclean frames using
The input coordinates are taken from the xyzq file that can be generated from a rst by the prepare module. Param-
eter spacing specifies the number of gridpoints per nm, rcut specifies extent of the charge grid beyond the molecule.
Periodic boundaries will be used if periodic is specified. If iper is set to 2, periodic boundary conditions are
applied in x and y dimensions only. If periodic is specified, a negative value of rcut will extend the grid in the
periodic dimensions by abs(rcut), otherwise this value will be ignored in the periodic dimensions. The resulting plt
formatted file pltfile can be viewed with the gOpenMol program. The resulting electrostatic potential grid is in units
of kJ mol−1 e−1 . If no files are specified, only the parameters are set. This analysis applies to solute(s) only.
The electrostatic potential at specific point are evaluated using
Combined or hybrid Quantum Mechanics and Molecular Mechanics (QM/MM) is a simulation methodology that is
about 15 years old but in all the literature there are cautions that calibration computations must be done to validate the
model for each particular chemical system studied. This is not a black box style computation and the NWChem users
are advised that without calibration QM/MM may not give the appropriate results1 . Since both quantum-mechanical
and classical molecular mechanics are involved in the calculation good working knowledge of the two methods is
required to ensure meaningful results.
The QM/MM module is invoked with the following task directive.
where qmtheory specifies quantum method for the calculation of the quantum region. It is expected that most of
QM/MM simulations will be performed with with HF or DFT theories, but any other QM theory supported by
NWChem should also work. Currently the supported operations for QM/MM runs are energy, optimize, saddle,
dynamics, numerical hessian, and numerical frequencies.
Unlike pure quantum mechanical calculations the information about the chemical system for QM/MM simulations
is contained not in the geometry block but in the externally prepared topology and restart files. These files have to be
present prior to any QM/MM simulation. The input file for QM/MM simulation can be divided into three major parts –
specification of the molecular mechanics parameters for the classical region, specification of the quantum mechanical
method for the quantum region, and the parameters of the interaction between quantum and classical methods. All this
discussed in detail in the sections below.
“Methods and Applications of Combined Quantum Mechanical and Molecular Mechanical Potentials.” In Reviews in Computational Chemistry;
K. B. Lipkowitz, D. B. Boyd, Eds.; VCH Publishers: New York; Vol. 7, pp 119-185 (1995); and M. A. Thompson and G. K. Schenter, J. Phys.
Chem 99 6374 (1995)
283
284 CHAPTER 34. COMBINED QUANTUM AND MOLECULAR MECHANICS
this "preparation stage" would be run separately from main QM/MM simulation. This will require a properly formatted
PDB file for the system. In more complex cases (e.g.non-standard residues or nucleotides) additional fragment and
parameter files might have to be provided by the user. The definition of the quantum region in the input for the prepare
module is specified by either modify atom directive (see Section 31):
Here isgm and atomname refer to the residue number and atom name record as given in the PDB file. It is
important to note that that the leading blanks in atom name record should be indicated with underscores. Per PDB
format quidelines the atom name record starts at column 13. If, for example, the atom name record "OW" starts
in the 14th column in PDB file, it will appear as "_OW" in the modify atom directive in the prepare block. In the
current implementation only solute atoms can be declared as quantum. If part of the solvent has to be treated quantum
mechanically then it has to redeclared to be solute. In addition to modify commands the prepare input block should
also contain update lists and ignore directives. There are other options that can be used in the input block for the
prepare module ( e.g. solvating the structure, etc ), those discussed in more details in Section 31. The successful run
of the prepare module will result in generation of topology and restart files. Similar to classical MD, both files are
required for QM/MM simulations and have to be placed in the same directory as the input file. Here is an example
input file that will generate QM/MM restart and topology files for the ethanol molecule
prepare
#--name of the pdb file
source etl0.pdb
#--generate new topology and sequence file
new_top new_seq
#--generate new restart file
new_rst
#--define quantum region (note the use of underscore)
modify atom 1:_C1 quantum
modify atom 1:2H1 quantum
modify atom 1:3H1 quantum
modify atom 1:4H1 quantum
#
update lists
ignore
end
task prepare
These are contents of etl0.pdb file used in the above input file.
Running the input shown above will produce (among other things) the topology file (etl.top) and the restart file
(etl_md.rst). The naming of the topology file follows after the rtdb name specified in the start directive in the input
(i.e. "start etl"), while the "_md" suffix in the restart file name is specific to the way prepare module works in this
particular case. If necessary, this particular naming scheme can be altered using system keyword in the prepare input
block (for more details see Section 31).
Here is more complicated example where the entire ethanol molecule is declared quantum and solvated in a box
of spce waters:
update lists
ignore
end
task prepare
Fixing atoms outside a certain distance from the QM region can also be accomplished using prepare module. These
constraints will then be permanently embedded in the resulting restart file, which may be advantageous for ceratain
types of QM/MM simulations. The actual format for the constraint directive to fix whole residues is
fix segments beyond <real radius> <integer residue number>:<string atom name>
fix atoms beyond <real radius> <integer residue number>:<string atom name>
The following input file illustrated the use of fix segments directive
start etl
286 CHAPTER 34. COMBINED QUANTUM AND MOLECULAR MECHANICS
prepare
source etl0.pdb
new_top new_seq
new_rst
center
orient
#solvation in 40 A cubic box
solvate cube 4.0
modify segment 1 quantum
#fix residues more than 20 A away from ethanol oxygen atom
fix segments beyond 2.0 1:_O
update lists
ignore
end
task prepare
The molecular mechanics parameters are given in the form of standard MD input block as used by the MD module (c.f.
Section 32). This input block is required for QM/MM simulations. It specifies the restart and topology file that will
be used in the calculation. It also contains information relevant to the calculation of the classical region (e.g. cutoff
distances, constraints, optimization and dynamics parameters, etc) in the system. In this input block one can also set
fixed atom constraints on both classical and quantum atoms. Continuing with our example for ethanol molecule here
is a simple input block that may be used for this system.
md
# this specifies that etl_md.rst will be used as a restart file
# and etl.top will be a topology file
system etl_md
# if we ever wanted to fix C1 atom
fix solute 1 _C1
end
The parameters defining calculation of the QM region (including basis sets) must be present in the traditional NWChem
input format except for the geometry block. The geometrical information will be constructed automatically using
information contained in the restart file
The QM/MM interface parameters define the interaction between classical and quantum regions. The input follows
standard NWChem format:
34.4. QM/MM INTERFACE PARAMETERS 287
qmmm
[ eref <double precision default 0.0d0>]
[ bqzone <double precision default 9.0d0>]
[ mm_charges [exclude <(none||all||linkbond||linkbond_H) default none>]
[ expand <none||all||solute||solvent> default none]
[ update <integer default 0>]
Detailed explanation of the subdirectives in the QM/MM input block is given below:
is recomended for both small and large regions and should be used whenever is possible (in many cases it
outperforms "bfgs"). Finally "sd" the most ineficient and slow way to optimize regions, yet it is the only option
available for the optimization of solvent regions. The default is to assign "sd" to optimization involving solvent
region (if any), and "lbfgs" to all others.
File formats
291
292 CHAPTER 35. FILE FORMATS
For each
z-matrix
definition
one card
VII
VII-1-1 i5 atom i
VII-1-2 i5 atom j
VII-1-3 i5 atom k
VII-1-4 i5 atom l
VII-1-5 f12.6 bond length i-j
VII-1-6 f12.6 angle i-j-k
VII-1-7 f12.6 torson i-j-k-l
The NWChem plane-wave (NWPW) module uses pseudopotentials and plane-wave basis sets to perform Density
Functional Theory calculations. This module complements the capabilities of the more traditional Gaussian function
based approaches by having an accuracy at least as good for many applications, yet is still fast enough to treat systems
containing hundreds of atoms. Another significant advantage is its ability to simulate dynamics on a ground state
potential surface directly at run-time using the Car-Parrinello algorithm. This method’s efficiency and accuracy make
it a desirable first principles method of simulation in the study of complex molecular, liquid, and solid state systems.
Applications for this first principles method include the calculation of free energies, search for global minima, explicit
simulation of solvated molecules, and simulations of complex vibrational modes that cannot be described within the
harmonic approximation.
The NWPW module is a collection of three modules.
• PSPW - (PSeudopotential Plane-Wave) A gamma point code for calculating molecules, liquids, crystals, and
surfaces.
• Band - A band structure code for calculating crystals and surfaces with small band gaps (e.g. semi-conductors
and metals).
• PAW - a (gamma point) projector augmented plane-wave code for calculating molecules, crystals, and surfaces
The PSPW, Band, and PAW modules can be used to compute the energy and optimize the geometry. Both the PSPW
and Band modules can also be used to find saddle points, and compute numerical second derivatives. In addition the
PSPW module can also be used to perform Car-Parrinello molecular dynamics.
Section 36.1 describes the tasks contained within the PSPW module, section 36.2 describes the tasks contained
within the Band module, section 36.3 describes the tasks contained within the PAW module, and section 36.4 de-
scribes the pseudopotential library included with NWChem. The datafiles used by the PSPW module are described in
section 36.5. Car-Parrinello output data files are described in section 36.5.7, and the minimization and Car-Parrinello
algorithms are described in section 36.6. Examples of how to setup and run a PSPW geometry optimization, a Car-
Parrinello simulation, a band structure minimization, and a PAW geometry optimization are presented in sections
36.7, 36.16, and 36.11, and 36.14. Finally in section 36.17 the capabilities and limitations of the NWPW module are
discussed.
If you are a first time user of this module it is recommended that you skip the next five sections and proceed directly
to the tutorials in sections 36.7-36.14.
301
302 CHAPTER 36. PSEUDOPOTENTIAL PLANE-WAVE DENSITY FUNCTIONAL THEORY (NWPW)
PSPW
...
END
TASK PSPW
there are additional directives that are specific to the PSPW module, which are:
Once a user has specified a geometry, the PSPW module can be invoked with no input directives (defaults invoked
throughout). However, the user will probably always specify the simulation cell used in the computation, since the
default simulation cell is not well suited for most systems. There are sub-directives which allow for customized
application; those currently provided as options for the PSPW module are:
PSPW
CELL_NAME <string cell_name default ’cell_default’>
INPUT_WAVEFUNCTION_FILENAME <string input_wavefunctions default input_movecs>
OUTPUT_WAVEFUNCTION_FILENAME <string output_wavefunctions default input_movecs>
FAKE_MASS <real fake_mass default 400000.0>
TIME_STEP <real time_step default 5.8>
LOOP <integer inner_iteration outer_iteration default 10 100>
TOLERANCES <real tole tolc default 1.0e-7 1.0e-7>
CUTOFF <real cutoff>
ENERGY_CUTOFF <real ecut default (see input description)>
WAVEFUNCTION_CUTOFF <real wcut default (see input description)>
EWALD_NCUT <integer ncut default 1>]
36.1. PSPW TASKS 303
END
The following list describes the keywords contained in the PSPW input block.
• <output_wavefunctions> - name of the file that will contain the one-electron orbitals at the end of the run.
• <fake_mass> - value for the electronic fake mass (µ). This parameter is not presently used in a conjugate
gradient simulation
• <time_step> - value for the time step (∆t). This parameter is not presently used in a conjugate gradient simu-
lation.
• <inner_iteration> - number of iterations between the printing out of energies and tolerances
304 CHAPTER 36. PSEUDOPOTENTIAL PLANE-WAVE DENSITY FUNCTIONAL THEORY (NWPW)
• <cutoff> - value for the cutoff energy used to define the wavefunction. In addition using the CUTOFF keyword
automatically sets the cutoff energy for the density to be twice the wavefunction cutoff.
• <ecut> - value for the cutoff energy used to define the density. Default is set to be the maximum value that will
fit within the simulation_cell <cell_name>.
• <wcut> - value for the cutoff energy used to define the one-electron orbitals. Default is set to be the maximum
value that will fit within the simulation_cell <cell_name>.
• <ncut> - value for the number of unit cells to sum over (in each direction) for the real space part of the Ewald
summation. Note Ewald summation is only used if the simulation_cell is periodic.
• <rcut> - value for the cutoff radius used in the Ewald summation. Note Ewald summation is only used if the
simulation_cell is periodic.
ai |)
Default set to be MIN(|~
π , i = 1, 2, 3.
• (Vosko || PBE96 || revPBE || ...) - Choose between Vosko et al’s LDA parameterization or the orginal and
revised Perdew, Burke, and Ernzerhof GGA functional. In addition, several hybrid options.
• MULT - optional keyword which if specified allows the user to define the spin multiplicity of the system
• MULLIKEN - optional keyword which if specified causes a Mulliken analysis to be performed at the end of the
simulation.
• ALLOW_TRANSLATION - By default the the center of mass forces are projected out of the computed forces.
This optional keyword if specified allows the center of mass forces to not be zero.
• SCF - optional keyword which sets the minimizer to be a band by band minimizer. Several options are available
for setting the density or potential mixing, and the type of Kohn-Sham minimizer.
• <mapping> - for a value of 1 slab FFT is used, for a value of 2 a 2d-hilbert FFT is used.
36.1. PSPW TASKS 305
A prototype limited memory BFGS (LMBFGS) minimizer can be used to minimize the energy. To use this new
optimizer the following SET directive needs to be specified:
Limited testing suggests that the Grassman LMBFGS minimizer is about twice as fast as the conjugate gradient
minimizer. However, there are several known cases where this optimizer fails, so it is currently not a default option,
and should be used with caution.
In addition the following SET directives can be specified:
set nwpw:lcao_skip .false. # Default - initial wavefunctions generated using an LCAO guess.
set nwpw:lcao_skip .true. # Initial wavefunctions generated using a random plane-wave gues
set nwpw:lcao_print .false. # Default - Ouput not produced during the generation of the LCA
set nwpw:lcao_print .true. # Output produced during the generation of the LCAO guess.
The simulation cell parameters are entered by defining a simulation_cell sub-block within the PSPW block. Listed
below is the format of a simulation_cell sub-block.
PSPW
...
SIMULATION_CELL
CELL_NAME <string name default ’cell_default’>
BOUNDARY_CONDITIONS (periodic || aperiodic default periodic)
LATTICE_VECTORS
<real a1.x a1.y a1.z default 20.0 0.0 0.0>
<real a2.x a2.y a2.z default 0.0 20.0 0.0>
<real a3.x a3.y a3.z default 0.0 0.0 20.0>
NGRID <integer na1 na2 na3 default 32 32 32>
END
...
END
Basically, the user needs to enter the dimensions, gridding and boundary conditions of the simulation cell. The
following list describes the input in detail.
• <a2.x a2.y a2.z> - user-supplied values for the second lattice vector
• <a3.x a3.y a3.z> - user-supplied values for the third lattice vector
• <na1 na2 na3> - user-supplied values for discretization along lattice vector directions.
Alternatively, instead of explicitly entering lattice vectors, users can enter the unit cell using the standard cell
parameters, a, b, c, α, β, and γ, by using the LATTICE block. The format for input is as follows:
PSPW
...
SIMULATION_CELL
...
LATTICE
[lat_a <real a default 20.0>]
[lat_b <real b default 20.0>]
[lat_c <real c default 20.0>]
[alpha <real alpha default 90.0>]
[beta <real beta default 90.0>]
[gamma <real gamma default 90.0>]
END
...
END
...
END
The user can also enter the lattice vectors of standard unit cells using the keywords SC, FCC, BCC, for simple
cubic, face-centered cubic, and body-centered cubic respectively. Listed below is an example of the format of this type
of input.
PSPW
...
SIMULATION_CELL
SC 20.0
....
END
...
END
Finally, the lattice vectors from the unit cell can also be defined using the fractional coordinate input in the GE-
OMETRY input (see section 6.7). Listed below is an example of the format of this type of input for an 8 atom silicon
carbide unit cell.
geometry units au
system crystal
lat_a 8.277d0
lat_b 8.277d0
lat_c 8.277d0
alpha 90.0d0
beta 90.0d0
gamma 90.0d0
36.1. PSPW TASKS 307
end
Si -0.50000d0 -0.50000d0 -0.50000d0
Si 0.00000d0 0.00000d0 -0.50000d0
Si 0.00000d0 -0.50000d0 0.00000d0
Si -0.50000d0 0.00000d0 0.00000d0
C -0.25000d0 -0.25000d0 -0.25000d0
C 0.25000d0 0.25000d0 -0.25000d0
C 0.25000d0 -0.25000d0 0.25000d0
C -0.25000d0 0.25000d0 0.25000d0
end
The PSPW module using the DRIVER geometry optimizer can optimize a crystal unit cell. Currently this type of
optimization works only if the geometry is specified in fractional coordinates. The following SET directive is used to
tell the DRIVER geometry optimizer to optimize the crystal unit cell in addition to the geometry.
36.1.3 DPLOT
The pspw dplot task is used to generate plots of various types of electron densities (or orbitals) of a molecule. The
electron density is calculated on the specified set of grid points from a PSPW calculation. The output file generated is
in the Gaussian Cube format. Input to the DPLOT task is contained within the DPLOT sub-block.
PSPW
...
DPLOT
...
END
...
END
PSPW
...
DPLOT
VECTORS <string input_wavefunctions default input_movecs>
DENSITY [total||difference||alpha||beta||laplacian||potential default total] <string d
ELF [restricted|alpha|beta] <string elf_name no default>
ORBITAL <integer orbital_number no default> <string orbital_name no default>
308 CHAPTER 36. PSEUDOPOTENTIAL PLANE-WAVE DENSITY FUNCTIONAL THEORY (NWPW)
END
...
END
The following list describes the input for the DPLOT sub-block.
This sub-directive specifies the name of the molecular orbital file. If the second file is optionally given the density is
computed as the difference between the corresponding electron densities. The vector files have to match.
This sub-directive specifies, what kind of density is to be plotted. The known names for total, difference, alpha, beta,
laplacian, and potential.
By default the grid spacing and the limits of the cell to be plotted are defined by the input wavefunctions. Alternatively
the user can use the LIMITXYZ sub-directive to specify other limits. The grid is generated using No_Of_Spacings +
1 points along each direction. The known names for Units are angstroms, au and bohr.
36.1.4 Wannier
The pspw wannier task is generate maximally localized (Wannier) molecular orbitals. The algorithm proposed by
Silvestrelli et al is use to generate the Wannier orbitals. The current version of this code works only for cubic cells.
Input to the Wannier task is contained within the Wannier sub-block.
36.1. PSPW TASKS 309
PSPW
...
Wannier
...
END
...
END
PSPW
...
Wannier
OLD_WAVEFUNCTION_FILENAME <string input_wavefunctions default input_movecs>
NEW_WAVEFUNCTION_FILENAME <string output_wavefunctions default input_movecs>
END
...
END
The following list describes the input for the Wannier sub-block.
The SET directive is used to specify the molecular orbitals contribute to the self-interaction-correction (SIC) term.
This defines only the molecular orbitals in the list as SIC active. All other molecular orbitals will not contribute to the
SIC term.
For example the following directive specifies that the molecular orbitals numbered 1,5,6,7,8, and 15 are SIC active.
or equivalently
set pspw:SIC_orbitals 1 5 6 7 8 15
Two types of solvers can be used and they are specified using the following SET directive
The parameters for the cutoff coulomb kernel are defined by the following SET directives:
The MULLIKEN option can be used to generate derived atomic point charges from a plane-wave density. This
analysis is based on a strategy suggested in the work of P.E. Blochl, J. Chem. Phys. vol. 103, page 7422 (1995).
In this strategy the low-frequency components a plane-wave density are fit to a linear combination of atom centered
Gaussian functions.
The following SET directives are used to define the fitting.
set pspw_APC:Gc <real Gc_cutoff> # specifies the maximum frequency component of the densit
set pspw_APC:nga <integer number_gauss> # specifies the the number of Gaussian functions pe
atom.
set pspw_APC:gamma <real gamma_list> # specifies the decay lengths of each atom centered Ga
36.1.7 Car-Parrinello
The Car-Parrinello task is used to perform ab initio molecular dynamics using the scheme developed by Car and
Parrinello. In this unified ab initio molecular dynamics scheme the motion of the ion cores is coupled to a fictitious
motion for the Kohn-Sham orbitals of density functional theory. Constant energy or constant temperature simulations
can be performed. A detailed description of this method is described in section 36.6.
Input to the Car-Parrinello simulation is contained within the Car-Parrinello sub-block.
PSPW
...
Car-Parrinello
...
END
...
END
The Car-Parrinello sub-block contains a great deal of input, including pointers to data, as well as parameter input.
Listed below is the format of a Car-Parrinello sub-block.
PSPW
...
Car-Parrinello
CELL_NAME <string cell_name default ’cell_default’>
INPUT_WAVEFUNCTION_FILENAME <string input_wavefunctions default input_movecs>
OUTPUT_WAVEFUNCTION_FILENAME <string output_wavefunctions default input_movecs>
INPUT_V_WAVEFUNCTION_FILENAME <string input_v_wavefunctions default input_vmovecs>
OUTPUT_V_WAVEFUNCTION_FILENAME <string output_v_wavefunctions default input_vmovecs>
FAKE_MASS <real fake_mass default default 1000.0>
TIME_STEP <real time_step default 5.0>
LOOP <integer inner_iteration outer_iteration default 10 1>
SCALING <real scale_c scale_r default 1.0 1.0>
ENERGY_CUTOFF <real ecut default (see input description)>
WAVEFUNCTION_CUTOFF <real wcut default (see input description)>
EWALD_NCUT <integer ncut default 1>
EWALD_RCUT <real rcut default (see input description)>
XC (Vosko || LDA || PBE96 || revPBE || HF || PBE0 || revPBE0 ||
LDA-SIC || LDA-SIC/2 || LDA-0.4SIC || LDA-SIC/4 || LDA-0.2SIC ||
PBE96-SIC || PBE96-SIC/2 || PBE96-0.4SIC || PBE96-SIC/4 || PBE96-0.2SIC ||
revPBE-SIC || revPBE-SIC/2 || revPBE-0.4SIC || revPBE-SIC/4 || revPBE-0.2SIC ||
default Vosko)
[Nose-Hoover <real Period_electron Temperature_electrion Period_ion Temperature_ion
default 100.0 298.15 100.0 298.15>]
[SA_decay <real sa_scale_c sa_scale_r default 1.0 1.0>]
XYZ_FILENAME <string xyz_filename default XYZ>
EMOTION_FILENAME <string emotion_filename default EMOTION>
HMOTION_FILENAME <string hmotion_filename default HMOTION>
OMOTION_FILENAME <string omotion_filename default OMOTION>
EIGMOTION_FILENAME <string eigmotion_filename default EIGMOTION>
ION_MOTION_FILENAME <string ion_motion_filename default MOTION>
END
...
END
The following list describes the input for the Car-Parrinello sub-block.
• <cell_name> - name of the the simulation_cell named <cell_name>. See section 36.1.1.
• <input_wavefunctions> - name of the file containing one-electron orbitals
• <output_wavefunctions> - name of the file that will contain the one-electron orbitals at the end of the run.
• <input_v_wavefunctions> - name of the file containing one-electron orbital velocities.
• <output_v_wavefunctions> - name of the file that will contain the one-electron orbital velocities at the end of
the run.
312 CHAPTER 36. PSEUDOPOTENTIAL PLANE-WAVE DENSITY FUNCTIONAL THEORY (NWPW)
• <scale_c> - value for the initial velocity scaling of the one-electron orbital velocities.
• <scale_r> - value for the initial velocity scaling of the ion velocities.
• <ecut> - value for the cutoff energy used to define the density. Default is set to be the maximum value that will
fit within the simulation_cell <cell_name>.
• <wcut> - value for the cutoff energy used to define the one-electron orbitals. Default is set to be the maximum
value that will fit within the simulation_cell <cell_name>.
• <ncut> - value for the number of unit cells to sum over (in each direction) for the real space part of the Ewald
summation. Note Ewald summation is only used if the simulation_cell is periodic.
• <rcut> - value for the cutoff radius used in the Ewald summation. Note Ewald summation is only used if the
simulation_cell is periodic.
ai |)
Default set to be MIN(|~
π , i = 1, 2, 3.
• (Vosko || PBE96 || revPBE || ...) - Choose between Vosko et al’s LDA parameterization or the orginal and
revised Perdew, Burke, and Ernzerhof GGA functional. In addition, several hybrid options.
• Nose-Hoover - optional subblock which if specified causes the simulation to perform Nose-Hoover dynamics.
If this subblock is not specified the simulation performs constant energy dynamics. See section 36.6.2 for a
description of the parameters.
• SA_decay - optional subblock which if specified causes the simulation to run a simulated annealing simulation.
For simulated annealing to work the Nose-Hoover subblock needs to be specified. The initial temperature are
taken from the Nose-Hoover subblock. See section 36.6.2 for a description of the parameters.
• <emotion_filename> - name of the emotion motion file. See section 36.5.7 for a description of the datafile.
• <hmotion_filename> - name of the hmotion motion file. See section 36.5.7 for a description of the datafile.
• <eigmotion_filename> - name of the eigmotion motion file. See section 36.5.7 for a description of the datafile.
• <ion_motion_filename> - name of the ion_motion motion file. See section 36.5.7 for a description of the
datafile.
• MULLIKEN - optional keyword which if specified causes an omotion motion file to be created.
36.1. PSPW TASKS 313
• <omotion_filename> - name of the omotion motion file. See section 36.5.7 for a description of the datafile.
When a DPLOT sub-block is specified the following SET directive can be used to output dplot data during a
Car-Parrinello simulation:
The Gaussian cube files specified in the DPLOT sub-block are appended with the specified iteration number.
For example, the following directive specifies that at the 3,10,11,12,13,14,15, and 50 iterations Gaussian cube files
are to be produced.
The Car-Parrinello module allows users to freeze the cartesian coordinates in a simulation (Note - the Car-Parrinello
code recognizes Cartesian constraints, but it does not recognize internal coordinate constraints). The SET directive
(Section 6.6) is used to freeze atoms, by specifying a directive of the form:
This defines only the centers in the list as active. All other centers will have zero force assigned to them, and will
remain frozen at their starting coordinates during a Car-Parrinello simulation.
For example, the following directive specifies that atoms numbered 1, 5, 6, 7, 8, and 15 are active and all other
atoms are frozen:
or equivalently,
set geometry:actlist 1 5 6 7 8 15
If this option is not specified by entering a SET directive, the default behavior in the code is to treat all atoms as
active. To revert to this default behavior after the option to define frozen atoms has been invoked, the UNSET directive
must be used (since the database is persistent, see Section 3.2). The form of the UNSET directive is as follows:
unset geometry:actlist
In addition, the Car-Parrinello module allows users to freeze bond lengths via a Shake algorithm. The following
SET directive shows how to do this.
This input fixes the bond length between atoms 2 and 6 to be 6.9334 bohrs. Note that this input only recognizes bohrs.
When using constraints it is usually necessary to turn off center of mass shifting. This can be done by the following
SET directive.
36.1.9 QM/MM
A preliminary QM/MM capability that can run Car-Parrinello molecular dynamics has been integrated into the PSPW
module. Currently, the input is not very robust but it is straightforward. The first step to run a QM/MM simulations is
to define the MM atoms in the geometry block. The MM atoms must be at the end of the geometry and a carat, " ˆ ",
must be appended to the end of the atom name, e.g.
Next the pseudopotentials have be defined for the every type of MM atom contained in the geometry blocks. The
following local pseudopotential suggested by Laio, VandeVondele and Rothlisberger can be automatically generated.
rc nσ − rnσ
V (~r) = −Zion (36.1)
−sign(Zion ) ∗ rc nσ +1 − rnσ +1
The following input To define this pseudopo the Oˆ MM atom using the following input
NWPW
QMMM
mm_psp O^ -0.8476 4 0.70
END
END
defines the local pseudopotential for the Oˆ MM atom , where Zion = −0.8476, nσ = 4, and rc = 0.7. The following
input can be used to define the local pseudopotentials for all the MM atoms in the geometry block defined above
NWPW
QMMM
mm_psp O^ -0.8476 4 0.70
mm_psp H^ 0.4238 4 0.40
END
END
Next the Lenard-Jones potentials for the QM and MM atoms need to be defined. This is done as as follows
NWPW
QMMM
36.1. PSPW TASKS 315
Note that the Lenard-Jones potential is not defined for the MM H atoms in this example. The final step is to define
the MM fragments in the simulation. MM fragments are a set of atoms in which bonds and angle harmonic potentials
are defined, or alternatively shake constraints are defined. The following input defines the fragments for the two water
molecules in the above geometry,
NWPW
QMMM
fragment spc
size 3 #size of fragment
index_start 6:9:3 #atom index list that defines the start of
# the fragments (start:final:stride)
NWPW
QMMM
fragment spc
size 3 #size of fragment
index_start 6:9:3 #atom index list that defines the start of
# the fragments (start:final:stride)
NWPW
QMMM
fragment spc1
size 3 #size of fragment
index_start 6 #atom index list that defines the start of
#the fragments
end
fragment spc2
size 3 #size of fragment
index_start 9 #atom index list that defines the start of
#the fragments
shake units angstroms 1 2 3 cyclic 1.0 1.632993125 1.0
end
END
END
36.1.10 PSP_GENERATOR
A one-dimensional pseudopotential code has been integrated into NWChem. This code allows the user to modify and
develop pseudopotentials. Currently, only the Hamann and Troullier-Martins norm-conserving pseudopotentials can
be generated. In future releases, the pseudopotential library (section 36.4) will be more complete, so that the user will
not have explicitly generate pseudopotentials using this module.
Input to the PSP_GENERATOR task is contained within the PSP_GENERATOR sub-block.
PSPW
...
PSP_GENERATOR
...
END
...
END
PSPW
...
PSP_GENERATOR
PSEUDOPOTENTIAL_FILENAME: <string psp_name>
ELEMENT: <string element>
CHARGE: <real charge>
MASS_NUMBER: <real mass_number>
ATOMIC_FILLING: <integer ncore nvalence>
( (1||2||...) (s||p||d||f||...) <real filling> \
...)
end
...
END
The following list describes the input for the PSP_GENERATOR sub-block.
• ATOMIC_FILLING:.....(see below)
• CUTOFF:....(see below)
ATOMIC_FILLING Block
This required block is used to define the reference atom which is used to define the pseudopotential. After the
ATOMIC_FILLING: <ncore> <nvalence> line, the core states are listed (one per line), and then the valence states
are listed (one per line). Each state contains two integer and a value. The first integer specifies the radial quantum
number, n, The second integer specifies the angular momentum quantum number, l, and the third value specifies the
occupation of the state.
For example to define a pseudopotential for the Neon atom in the 1s2 2s2 2p6 state could have the block
ATOMIC_FILLING: 1 2
1 s 2.0 #core state - 1s^2
2 s 2.0 #valence state - 2s^2
2 p 6.0 #valence state - 2p^6
ATOMIC_FILLING: 3 0
1 s 2.0 #core state
2 s 2.0 #core state
2 p 6.0 #core state
CUTOFF Block
This optional block specifies the cutoff distances used to match the all-electron atom to the pseudopotential atom. For
Hamann pseudopotentials rcut (l) defines the distance where the all-electron potential is matched to the pseudopotential,
and for Troullier-Martins pseudopotentials rcut (l) defines the distance where the all-electron orbital is matched to the
pseudowavefunctions. Thus the definition of the radii depends on the type of pseudopotential. The cutoff radii used in
Hamann pseudopotentials will be smaller than the cutoff radii used in Troullier-Martins pseudopotentials.
For example to define a softened Hamann pseudopotential for Carbon would be
ATOMIC_FILLING: 1 2
1 s 2.0
2 s 2.0
2 p 2.0
CUTOFF: 2
s 0.8
p 0.85
d 0.85
ATOMIC_FILLING: 1 2
1 s 2.0
2 s 2.0
2 p 2.0
CUTOFF: 2
s 1.200
p 1.275
d 1.275
SEMICORE_RADIUS Option
Specifying the SEMICORE_RADIUS option turns on the semicore correction approximation proposed by Louie et
al (S.G. Louie, S. Froyen, and M.L. Cohen, Phys. Rev. B, 26, 1738, (1982)). This approximation is known to
dramatically improve results for systems containing alkali and transition metal atoms.
The implementation in the PSPW module defines the semi-core density, ρsemicore in terms of the core density, ρcore ,
by using the sixth-order polynomial
ρcore
if r ≥ rsemicore
ρsemicore (r) = (36.2)
c0 + c3 r3 + c4 r4 + c5 r5 + c6 r6 if r < rsemicore
This expansion was suggested by Fuchs and Scheffler (M. Fuchs, and M. Scheffler, Comp. Phys. Comm.,119,67
(1999)), and is better behaved for taking derivatives (i.e. calculating ionic forces) than the expansion suggested by
Louie et al.
36.1.11 WAVEFUNCTION_INITIALIZER
The functionality of this task is now performed automatically. For backward compatibility, we provide a description
of the input to this task.
36.1. PSPW TASKS 319
The wavefunction_initializer task is used to generate an initial wavefunction datafile. Input to the WAVEFUNC-
TION_INITIALIZER task is contained within the WAVEFUNCTION_INITIALIZER sub-block.
PSPW
...
WAVEFUNCTION_INITIALIZER
...
END
...
END
PSPW
...
WAVEFUNCTION_INITIALIZER
CELL_NAME: <string cell_name>
WAVEFUNCTION_FILENAME: <string wavefunction_name default input_movecs>
(RESTRICTED||UNRESTRICTED)
if (RESTRICTED)
RESTRICTED_ELECTRONS: <integer restricted electrons>
if (UNRESTRICTED)
UP_ELECTRONS: <integer up_electrons>
DOWN_ELECTRONS: <integer down_electrons>
END
...
END
The following list describes the input for the WAVEFUNCTION_INITIALIZER sub-block.
For backward compatibility, the input to the WAVEFUNCTION_INITIALIZER sub-block can also be of the form
320 CHAPTER 36. PSEUDOPOTENTIAL PLANE-WAVE DENSITY FUNCTIONAL THEORY (NWPW)
PSPW
...
WAVEFUNCTION_INITIALIZER
CELL_NAME: <string cell_name>
WAVEFUNCTION_FILENAME: <string wavefunction_name default input_movecs>
(RESTRICTED||UNRESTRICTED)
where
• <up_filling> - number of restricted molecular orbitals if RESTRICTED and number of spin-up molecular
orbitals if UNRESTRICTED.
The values for the planewave (−2|| − 1||1||2) are used to represent whether the specified planewave is a cosine
or a sine function, in addition random noise can be added to these base functions. That is +1 represents a cosine
function, and −1 represents a sine function. The +2 and −2 values are used to represent a cosine function with
random components added and a sine function with random components added respectively.
36.1.12 V_WAVEFUNCTION_INITIALIZER
The functionality of this task is now performed automatically. For backward compatibility, we provide a description
of the input to this task.
The v_wavefunction_initializer task is used to generate an initial velocity wavefunction datafile. Input to the
V_WAVEFUNCTION_INITIALIZER task is contained within the V_WAVEFUNCTION_INITIALIZER sub-block.
PSPW
...
V_WAVEFUNCTION_INITIALIZER
...
36.1. PSPW TASKS 321
END
...
END
PSPW
...
V_WAVEFUNCTION_INITIALIZER
V_WAVEFUNCTION_FILENAME: <string v_wavefunction_name default input_vmovecs>
CELL_NAME: <string cell_name>
(RESTRICTED||UNRESTRICTED)
UP_FILLING: <integer up_filling>
DOWN_FILLING: <integer down_filling>
END
...
END
The following list describes the input for the V_WAVEFUNCTION_INITIALIZER sub-block.
36.1.13 WAVEFUNCTION_EXPANDER
The functionality of this task is now performed automatically. For backward compatibility, we provide a description
of the input to this task.
The wavefunction_expander task is used to convert a new wavefunction file that spans a larger grid space from
an old wavefunction file. Input to the WAVEFUNCTION_EXPANDER task is contained within the WAVEFUNC-
TION_EXPANDER sub-block.
PSPW
...
WAVEFUNCTION_EXPANDER
...
END
...
END
322 CHAPTER 36. PSEUDOPOTENTIAL PLANE-WAVE DENSITY FUNCTIONAL THEORY (NWPW)
PSPW
...
WAVEFUNCTION_EXPANDER
OLD_WAVEFUNCTION_FILENAME: <string old_wavefunction_name default input_movecs>
NEW_WAVEFUNCTION_FILENAME: <string new_wavefunction_name default input_movecs>
NEW_NGRID: <integer na1 na2 na3>
END
...
END
The following list describes the input for the WAVEFUNCTION_EXPANDER sub-block.
• <na1 na2 na3> - number of grid points in each dimension for the new wavefunction file.
36.1.14 STEEPEST_DESCENT
The functionality of this task is now performed automatically by the PSPW minimizer. For backward compatibility,
we provide a description of the input to this task.
The steepest_descent task is used to optimize the one-electron orbitals with respect to the total energy. In addition
it can also be used to optimize geometries. This method is meant to be used for coarse optimization of the one-electron
orbitals.
Input to the steepest_descent simulation is contained within the steepest_descent sub-block.
PSPW
...
STEEPEST_DESCENT
...
END
...
END
The steepest_descent sub-block contains a great deal of input, including pointers to data, as well as parameter input.
Listed below is the format of a STEEPEST_DESCENT sub-block.
36.1. PSPW TASKS 323
PSPW
...
STEEPEST_DESCENT
CELL_NAME <string cell_name>
[GEOMETRY_OPTIMIZE]
INPUT_WAVEFUNCTION_FILENAME <string input_wavefunctions default input_movecs>
OUTPUT_WAVEFUNCTION_FILENAME <string output_wavefunctions default input_movecs>
FAKE_MASS <real fake_mass default 400000.0>
TIME_STEP <real time_step default 5.8>
LOOP <integer inner_iteration outer_iteration default 10 1>
TOLERANCES <real tole tolc tolr default 1.0d-9 1.0d-9 1.0d-4>
ENERGY_CUTOFF <real ecut default (see input desciption)>
WAVEFUNCTION_CUTOFF <real wcut default (see input description)>
EWALD_NCUT <integer ncut default 1>
EWALD_RCUT <real rcut default (see input description)>
XC (Vosko || LDA || PBE96 || revPBE || HF || PBE0 || revPBE0 ||
LDA-SIC || LDA-SIC/2 || LDA-0.4SIC || LDA-SIC/4 || LDA-0.2SIC ||
PBE96-SIC || PBE96-SIC/2 || PBE96-0.4SIC || PBE96-SIC/4 || PBE96-0.2SIC ||
revPBE-SIC || revPBE-SIC/2 || revPBE-0.4SIC || revPBE-SIC/4 || revPBE-0.2SIC ||
default Vosko)
[MULLIKEN]
END
...
END
The following list describes the input for the STEEPEST_DESCENT sub-block.
• <ncut> - value for the number of unit cells to sum over (in each direction) for the real space part of the Ewald
summation. Note Ewald summation is only used if the simulation_cell is periodic.
• <rcut> - value for the cutoff radius used in the Ewald summation. Note Ewald summation is only used if the
simulation_cell is periodic.
ai |)
Default set to be MIN(|~
π , i = 1, 2, 3.
• (Vosko || PBE96 || revPBE || ...) - Choose between Vosko et al’s LDA parameterization or the orginal and
revised Perdew, Burke, and Ernzerhof GGA functional. In addition, several hybrid options.
• MULLIKEN - optional keyword which if specified causes a Mulliken analysis to be performed at the end of the
simulation.
NWPW
...
END
TASK Band
Once a user has specified a geometry, the Band module can be invoked with no input directives (defaults invoked
throughout). There are sub-directives which allow for customized application; those currently provided as options for
the Band module are:
NWPW
CELL_NAME <string cell_name default ’cell_default’>
ZONE_NAME <string zone_name default ’zone_default’>
INPUT_WAVEFUNCTION_FILENAME <string input_wavefunctions default input_movecs>
OUTPUT_WAVEFUNCTION_FILENAME <string output_wavefunctions default input_movecs>
FAKE_MASS <real fake_mass default 400000.0>
TIME_STEP <real time_step default 5.8>
LOOP <integer inner_iteration outer_iteration default 10 100>
TOLERANCES <real tole tolc default 1.0e-7 1.0e-7>
CUTOFF <real cutoff>
ENERGY_CUTOFF <real ecut default (see input description)>
WAVEFUNCTION_CUTOFF <real wcut default (see input description)>
EWALD_NCUT <integer ncut default 1>]
EWALD_RCUT <real rcut default (see input description)>
XC (Vosko || PBE96 || revPBE default Vosko)
DFT||ODFT||RESTRICTED||UNRESTRICTED
MULT <integer mult default 1>
CG
LMBFGS
SCF [Anderson|| simple || Broyden]
[CG || RMM-DIIS]
36.2. BAND TASKS 325
[density || potential]
[ALPHA real alpha default 0.25]
[ITERATIONS integer inner_iterations default 5]
[OUTER_ITERATIONS integer outer_iterations default 0]
END
• <output_wavefunctions> - name that will point to file containing the one-electron orbitals at the end of the run.
• <fake_mass> - value for the electronic fake mass (µ). This parameter is not presently used in a conjugate
gradient simulation
• <time_step> - value for the time step (∆t). This parameter is not presently used in a conjugate gradient simu-
lation.
• <inner_iteration> - number of iterations between the printing out of energies and tolerances
• <cutoff> - value for the cutoff energy used to define the wavefunction. In addition using the CUTOFF keyword
automatically sets the cutoff energy for the density to be twice the wavefunction cutoff.
• <ecut> - value for the cutoff energy used to define the density. Default is set to be the maximum value that will
fit within the simulation_cell <cell_name>.
• <wcut> - value for the cutoff energy used to define the one-electron orbitals. Default is set to be the maximum
value that will fix within the simulation_cell <cell_name>.
• <ncut> - value for the number of unit cells to sum over (in each direction) for the real space part of the Ewald
summation. Note Ewald summation is only used if the simulation_cell is periodic.
• <rcut> - value for the cutoff radius used in the Ewald summation. Note Ewald summation is only used if the
simulation_cell is periodic.
ai |)
Default set to be MIN(|~
π , i = 1, 2, 3.
326 CHAPTER 36. PSEUDOPOTENTIAL PLANE-WAVE DENSITY FUNCTIONAL THEORY (NWPW)
• (Vosko || PBE96 || revPBE) - Choose between Vosko et al’s LDA parameterization or the orginal and revised
Perdew, Burke, and Ernzerhof GGA functional.
• SCF - optional keyword which sets the minimizer to be a band by band minimizer. Several options are available
for setting the density or potential mixing, and the type of Kohn-Sham minimizer.
To supply the special points of the Brillouin zone, the user defines a brillouin_zone sub-block within the NWPW
block. Listed below is the format of a brillouin_zone sub-block.
NWPW
...
BRILLOUIN_ZONE
ZONE_NAME <string name default ’zone_default’>
(KVECTOR <real k1 k2 k3 no default> <real weight default (see input description)>
...)
END
...
END
The user enters the special points and weights of the Brillouin zone. The following list describes the input in detail.
• <k1 k2 k3> - user-supplied values for a special point in the Brillouin zone.
• <weight> - user-supplied weight. Default is to set the weight so that the sum of all the weights for the entered
special points adds up to unity.
36.2.2 BAND_DPLOT
The BAND BAND_DPLOT task is used to generate plots of various types of electron densities (or orbitals) of a crystal.
The electron density is calculated on the specified set of grid points from a Band calculation. The output file generated
is in the Gaussian Cube format. Input to the BAND_DPLOT task is contained within the BAND_DPLOT sub-block.
36.2. BAND TASKS 327
NWPW
...
BAND_DPLOT
...
END
...
END
NWPW
...
BAND_DPLOT
VECTORS <string input_wavefunctions default input_movecs>
DENSITY [total||difference||alpha||beta||laplacian||potential default total] <string d
ELF [restricted|alpha|beta] <string elf_name no default>
ORBITAL (density || real || complex default density) <integer orbital_number no defaul
END
...
END
The following list describes the input for the BAND_DPLOT sub-block.
This sub-directive specifies the name of the molecular orbital file. If the second file is optionally given the density is
computed as the difference between the corresponding electron densities. The vector files have to match.
This sub-directive specifies, what kind of density is to be plotted. The known names for total, difference, alpha, beta,
laplacian, and potential.
By default the grid spacing and the limits of the cell to be plotted are defined by the input wavefunctions. Alternatively
the user can use the LIMITXYZ sub-directive to specify other limits. The grid is generated using No_Of_Spacings +
1 points along each direction. The known names for Units are angstroms, au and bohr.
SMEAR <sigma default 0.001> [TEMPERATURE <temperature>] [FERMI || GAUSSIAN default FERMI]
[ORBITALS <integer orbitals default 4>]
Both Fermi-Dirac (FERMI) and Gaussian broadening functions are available. The ORBITALS keyword is used to
change the number of virtual orbitals to be used in the calculation. Note to use this option the user must currently use
the SCF minimizer. The following SCF option is recommended for running fractional occupation
SCF Anderson
NWPW
...
END
TASK PAW
there are additional directives that are specific to the PSPW module, which are:
Once a user has specified a geometry, the PAW module can be invoked with no input directives (defaults invoked
throughout). There are sub-directives which allow for customized application; those currently provided as options for
the PAW module are:
NWPW
CELL_NAME <string cell_name default ’cell_default’>
[GEOMETRY_OPTIMIZE]
INPUT_WAVEFUNCTION_FILENAME <string input_wavefunctions default input_movecs>
OUTPUT_WAVEFUNCTION_FILENAME <string output_wavefunctions default input_movecs>
FAKE_MASS <real fake_mass default 400000.0>
TIME_STEP <real time_step default 5.8>
LOOP <integer inner_iteration outer_iteration default 10 100>
TOLERANCES <real tole tolc default 1.0e-7 1.0e-7>
CUTOFF <real cutoff>
ENERGY_CUTOFF <real ecut default (see input description)>
WAVEFUNCTION_CUTOFF <real wcut default (see input description)>
EWALD_NCUT <integer ncut default 1>]
EWALD_RCUT <real rcut default (see input description)>
XC (Vosko || PBE96 || revPBE default Vosko)
DFT||ODFT||RESTRICTED||UNRESTRICTED
MULT <integer mult default 1>
INTEGRATE_MULT_L <integer imult default 1>
END
• <cell_name> - name of the the simulation_cell named <cell_name>. The current version of PAW only accepts
periodic unit cells. See section 36.1.1.
• GEOMETRY_OPTIMIZE - optional keyword which if specified turns on geometry optimization.
• <input_wavefunctions> - name of the file containing one-electron orbitals
• <output_wavefunctions> - name of the file that will contain the one-electron orbitals at the end of the run.
• <fake_mass> - value for the electronic fake mass (µ). This parameter is not presently used in a conjugate
gradient simulation
• <time_step> - value for the time step (∆t). This parameter is not presently used in a conjugate gradient simu-
lation.
330 CHAPTER 36. PSEUDOPOTENTIAL PLANE-WAVE DENSITY FUNCTIONAL THEORY (NWPW)
• <inner_iteration> - number of iterations between the printing out of energies and tolerances
• <cutoff> - value for the cutoff energy used to define the wavefunction. In addition using the CUTOFF keyword
automatically sets the cutoff energy for the density to be twice the wavefunction cutoff.
• <ecut> - value for the cutoff energy used to define the density. Default is set to be the maximum value that will
fit within the simulation_cell <cell_name>.
• <wcut> - value for the cutoff energy used to define the one-electron orbitals. Default is set to be the maximum
value that will fix within the simulation_cell <cell_name>.
• <ncut> - value for the number of unit cells to sum over (in each direction) for the real space part of the smooth
compensation summation.
• <rcut> - value for the cutoff radius used in the smooth compensation summation.
ai |)
Default set to be MIN(|~
π , i = 1, 2, 3.
• (Vosko || PBE96 || revPBE) - Choose between Vosko et al’s LDA parameterization or the orginal and revised
Perdew, Burke, and Ernzerhof GGA functional.
• MULT - optional keyword which if specified allows the user to define the spin multiplicity of the system
• INTEGRATE_MULT_L - optional keyword which if specified allows the user to define the angular XC integra-
tion of the augmented region
• <mapping> - for a value of 1 slab FFT is used, for a value of 2 a 2d-hilbert FFT is used.
H He
------- ------------------
Li Be B C N O F Ne
------- -------------------
Na Mg Al Si P S Cl Ar
-------------------------------------------------------
K Ca Sc Ti V Cr Mn Fe Co Ni Cu Zn Ga Ge As Se Br Kr
-------------------------------------------------------
Rb Sr Y Zr Nb Mo Tc Ru Rh Pd Ag Cd In Sn Sb Te I Xe
-------------------------------------------------------
36.4. PSEUDOPOTENTIAL AND PAW BASIS LIBRARIES 331
Cs Ba La Hf Ta W Re Os Ir Pt Au Hg Tl Pb Bi Po At Rn
-------------------------------------------------------
Fr Ra .
-----------------
------------------------------------------
. . . . . . Gd . . . . . . .
------------------------------------------
. . U . Pu . . . . . . . . .
------------------------------------------
The pseudopotential libraries are continually being tested and added. Also, the PSPW program can read in pseudopo-
tentials in CPI and TETER format generated with pseudopotential generation programs such as the OPIUM package
of Rappe et al. The user can request additional pseudopotentials from Eric J. Bylaska at ([email protected]).
Similarly, a library of PAW basis used by PAW is currently available in the directory
$NWCHEM_TOP/src/nwpw/libraryps/paw_default
H He
------- -----------------
Li Be B C N O F Ne
------- ------------------
Na Mg Al Si P S Cl Ar
------------------------------------------------------
K Ca Sc Ti V Cr Mn Fe Co Ni Cu Zn Ga Ge As Se Br Kr
------------------------------------------------------
. . . . . . . . . . . . . . . . . .
------------------------------------------------------
. . . . . . . . . . . . . . . . . .
------------------------------------------------------
. . .
-----------------
------------------------------------------
. . . . . . . . . . . . . .
------------------------------------------
. . . . . . . . . . . . . .
------------------------------------------
Currently there are not very many elements available for PAW. However, the user can request additional basis sets
from Eric J. Bylaska at ([email protected]).
A preliminary implementation of the HGH pseudopotentials (Hartwigsen, Goedecker, and Hutter) has been imple-
mented into the PSPW module. To access the pseudopotentials the pseudopotentials input block is used. For example,
to redirect the code to use HGH pseudopotentials for carbon and hydrogen, the following input would be used.
nwpw
...
332 CHAPTER 36. PSEUDOPOTENTIAL PLANE-WAVE DENSITY FUNCTIONAL THEORY (NWPW)
pseudopotentials
C library HGH_LDA
H library HGH_LDA
end
...
end
The implementation of HGH pseudopotentials is rather limited in this release. HGH pseudopotentials cannot be used
to optimize unit cells, and they do not work with the MULLIKEN option. They also have not yet been implemented
into the BAND structure code.
To read in pseudopotentials in CPI format the following input would be used.
nwpw
...
pseudopotentials
C CPI c.cpi
H CPI h.cpi
end
...
end
In order for the program to recognize the CPI format the CPI files, e.g. c.cpi have to be prepended with the “<CPI>”
keyword.
To read in pseudopotentials in TETER format the following input would be used.
nwpw
...
pseudopotentials
C TETER c.teter
H TETER h.teter
end
...
end
In order for the program to recognize the TETER format the TETER files, e.g. c.teter have to be prepended with the
“<TETER>” keyword.
If you wish to redirect the code to a different directory other than the default one, you need to set the environmental
variable NWCHEM_NWPW_LIBRARY to the new location of the libraryps directory.
Input to the PSPW and Band modules are contained in both the RTDB and datafiles. The RTDB is used to store input
that the user will need to directly specify. Input of this kind includes ion positions, ion velocities, and simulation
cell parameters. The datafiles are used to store input, such the one-electron orbitals, one-electron orbital velocities,
formatted pseudopotentials, and one-dimensional pseudopotentials, that the user will in most cases run a program to
generate.
36.5. NWPW RTDB ENTRIES AND DATAFILES 333
The positions of the ions are stored in the default geometry structure in the RTDB and must be specified using the
GEOMETRY directive.
The velocities of the ions are stored in the default geometry structure in the RTDB, and must be specified using the
GEOMETRY directive.
The one-electron orbitals are stored in a wavefunction datafile. This is a binary file and cannot be directly edited. This
datafile is used by steepest_descent and Car-Parrinello tasks and can be generated using the wavefunction_initializer
or wavefunction_expander tasks.
The one-electron orbital velocities are stored in a velocity wavefunction datafile. This is a binary file and cannot be di-
rectly edited. This datafile is used by the Car-Parrinello task and can be generated using the v_wavefunction_initializer
task.
The pseudopotentials in Kleinman-Bylander form expanded on a simulation cell (3d grid) are stored in a formatted
pseudopotential datafile. This is a binary file and cannot be directly edited. This datafile is used by steepest_descent
and Car-Parrinello tasks and can be generated using the pseudopotential_formatter task.
The one-dimensional pseudopotentials are stored in a one-dimensional pseudopotential file. This is an ASCII file and
can be directly edited with a text editor. However, the user will usually use the psp_generator task to generate this
datafile.
The data stored in the one-dimensional pseudopotential file is
[line 1: ] element
[line 2: ] charge mass lmax
[line 3: ] (rcut(l), l=1,lmax)
[line 4: ] nr dr
[line 5: ] r(1) (Vpsp(1,l), l=1,lmax)
[line 6: ] ....
[line nr+4: ] r(nr) (Vpsp(nr,l), l=1,lmax)
[line nr+5: ] r(1) (psi(1,l), l=1,lmax)
[line nr+6: ] ....
[line 2*nr+4:] r(nr) (psi(nr,l), l=1,lmax)
[line 2*nr+5:] r_semicore
if (r_semicore read) then
[line 2*nr+6:] r(1) rho_semicore(1)
[line 2*nr+7:] ....
[line 3*nr+5:] r(nr) rho_semicore(nr)
end if
Data file that stores ion positions and velocities as a function of time in XYZ format.
[line 1: ] n_ion
[line 2: ]
do ii=1,n_ion
[line 2+ii: ] atom_name(ii), x(ii),y(ii),z(ii),vx(ii),vy(ii),vz(ii)
end do
[line n_ion+3 ] n_nion
do ii=1,n_ion
[line n_ion+3+ii: ] atom_name(ii), x(ii),y(ii),z(ii), vx(ii),vy(ii),vz(ii)
end do
[line 2*n_ion+4: ] ....
do
do ii=1,n_ion
[line n_ion+3+ii: ] x(ii),y(ii),z(ii), vx(ii),vy(ii),vz(ii)
end do
[line 2*n_ion+4: ] ....
[line 1: ] time
[line 2: ] ms,ne(ms),ne(ms)
do i=1,ne(ms)
[line 2+i: ] (hml(i,j), j=1,ne(ms)
end do
[line 3+ne(ms): ] time
[line 4+ne(ms): ] ....
Datafile that stores the eigenvalues for the one-electron orbitals as a function of time.
Datafile that stores a reduced representation of the one-electron orbitals. To be used with a molecular orbital viewer
that will be ported to NWChem in the near future.
Car and Parrinello developed a unified scheme for doing ab initio molecular dynamics by combining the motion of the
ion cores and a fictitious motion for the Kohn-Sham orbitals of density-functional theory (R. Car and M. Parrinello,
Phys. Rev. Lett. 55, 2471, (1985)). At the heart of this method they introduced a fictitious kinetic energy functional
for the Kohn-Sham orbitals.
336 CHAPTER 36. PSEUDOPOTENTIAL PLANE-WAVE DENSITY FUNCTIONAL THEORY (NWPW)
occ Z
KE({ψi,σ (~r)}) = ∑ d~r µ |ψ̇i,σ (~r)|2 (36.3)
i,σ
Given this kinetic energy the constrained equations of motion are found by taking the first variation of the auxiliary
Lagrangian.
occ Z 2
1
∑ ∑ MI ~R˙ I − E {ψi,σ (~r)} , ~RI
h n oi
L = d~r µ |ψ̇i,σ (~r)|2 +
i,σ 2 I
Z
+ ∑ Λi j,σ d~r ψ∗i,σ (~r)ψ j,σ (~r) − δi j,σ (36.4)
i j,σ
Which generates a dynamics for the wavefunctions ψi,σ (~r) and atoms positions ~RI through the constrained equations
of motion:
δE
µψ̈i,σ (~r,t) = − + ∑ Λi j,σ ψ j,σ (~r,t) (36.5)
δψ∗i,σ (~r,t) j
∂E
MI ~R¨ I = − (36.6)
∂~RI
where µ is the fictitious mass for the electronic degrees of freedom and MI are the ionic masses. The adjustable pa-
rameter µ is used to describe the relative rate at which the wavefunctions change with time. Λi j,σ are the Lagrangian
multipliers for the orthonormalization of the single-particle orbitals ψi,σ (~r). They are defined by the orthonormaliza-
tion constraint conditions and can be rigorously found. However, the equations of motion for the Lagrange multipliers
depend on the specific algorithm used to integrate Eqs. 36.5-36.6.
For this method to give ionic motions that are physically meaningful the kinetic energy of the Kohn-Sham orbitals
must be relatively small when compared to the kinetic energy of the ions. There are two ways where this criterion can
fail. First, the numerical integrations for the Car-Parrinello equations of motion can often lead to large relative values
of the kinetic energy of the Kohn-Sham orbitals relative to the kinetic energy of the ions. This kind of failure is easily
fixed by requiring a more accurate numerical integration, i.e. use a smaller time step for the numerical integration.
Second, during the motion of the system a the ions can be in locations where there is an Kohn-Sham orbital level
crossing, i.e. the density-functional energy can have two states that are nearly degenerate. This kind of failure often
occurs in the study of chemical reactions. This kind of failure is not easily fixed and requires the use of a more
sophisticated density-functional energy that accounts for low-lying excited electronic states.
" #
(∆t)2 δE
δψ∗i,σ ∑
ψt+∆t
i,σ ← 2ψti,σ − ψt−∆t
i,σ + + ψ j,σ Λ ji,σ (36.7)
µ j t
36.6. CAR-PARRINELLO SCHEME FOR AB INITIO MOLECULAR DYNAMICS 337
(∆t)2 ∂E
~Rt+∆t
I ← 2~RtI − ~Rt−∆t
I + (36.8)
MI ∂~RI
δE
In this molecular dynamic procedure we have to know variational derivative δψ∗i,σ and the matrix Λi j,σ . The
δE
variational derivative δψ∗i,σ can be analytically found and is
δE 1
= − ∇2 ψi,σ (~r)
δψ∗i,σ 2
Z
+ d~r0Wext (~r,~r0 )ψi,σ (~r0 )
n(~r0 )
Z
+ d~r0 ψi,σ (~r)
|~r − ~r0 |
+ µσxc (~r)ψi,σ (~r)
≡ Ĥψi,σ (36.9)
Nose-Hoover Thermostats for the electrons and ions can also be added to the Car-Parrinello simulation. In this type
of simulation thermostats variables xe and xR are added to the simulation by adding the auxiliary energy functionals to
the total energy.
1
ION_T HERMOSTAT (xR ) = QR x˙R + ER0 xR (36.10)
2
1
ELECT RON_T HERMOSTAT (xe ) = Qe x˙e + Ee0 xe (36.11)
2
1
ER0 = f kB T (36.12)
2
where f is the number of atomic degrees of freedom, kB is Boltzmann’s constant, and T is the desired temperature.
Defining the average fictitious kinetic energy of the electrons is not as straightforward. Blöchl and Parrinello (P.E.
Blöchl and M. Parrinello, Phys. Rev. B, 45, 9413, (1992)) have suggested the following formula for determining the
average fictitious kinetic energy
µ 1
Ee0 = 4kB T ∑
M i
< ψi | − ∇2 |ψi >
2
(36.13)
where µ is the fictitious electronic mass, M is average mass of one atom, and ∑i < ψi | − 21 ∇2 |ψi > is the kinetic energy
of the electrons.
Blöchl and Parrinello suggested that the choice of mass parameters, Qe , and QR should be made such that the period
of oscillating thermostats should be chosen larger than the typical time scale for the dynamical events of interest but
338 CHAPTER 36. PSEUDOPOTENTIAL PLANE-WAVE DENSITY FUNCTIONAL THEORY (NWPW)
r
QR
Pion = 2π (36.14)
4ER0
r
Qe
Pelectron = 2π (36.15)
4Ee0
where Pion and Pelectron are the periods of oscillation for the ionic and fictitious electronic thermostats.
In simulated annealing simulations the electronic and ionic Temperatures are scaled according to an exponential
cooling schedule,
t
Te (t) = Te0 exp− τe (36.16)
0 − t
Tionic (t) = Tionic exp τionic (36.17)
In this section we show how use the PSPW module to optimize the geometry for a C2 molecule at the PBE96 levels.
In the following example we show the input needed to optimize the geometry for a C2 molecule at the LDA level.
In this example, default pseudopotentials from the pseudopotential library are used for C, the boundary condition is
free-space, the exchange correlation functional is PBE96, The boundary condition is free-space, and the simulation
cell cell is aperiodic and cubic with a side length of 10.0 Angstroms and has 40 grid points in each direction (cutoff
energy is 44 Ry).
start c2_pspw_pbe96
title "C2 restricted singlet dimer optimization - PBE96/44Ry"
geometry
C -0.62 0.0 0.0
C 0.62 0.0 0.0
end
pspw
simulation_cell units angstroms
boundary_conditions aperiodic
SC 10.0
ngrid 40 40 40
end
xc pbe96
end
set nwpw:minimizer 2
task pspw optimize
36.8. PSPW TUTORIAL 2: RUNNING A CAR-PARRINELLO SIMULATION 339
start c2_pspw_lda_md
title "C2 restricted singlet dimer, LDA/44Ry - constant energy Car-Parrinello simulation"
geometry
C -0.62 0.0 0.0
C 0.62 0.0 0.0
end
pspw
simulation_cell units angstroms
boundary_conditions aperiodic
lattice
lat_a 10.00d0
lat_b 10.00d0
lat_c 10.00d0
end
ngrid 40 40 40
end
Car-Parrinello
fake_mass 600.0
time_step 5.0
loop 10 10
end
end
set nwpw:minimizer 2
task pspw energy
task pspw Car-Parrinello
36.9 PSPW Tutorial 3: optimizing a unit cell and geometry for Silicon-
Carbide
The following example demonstrates how to uses the PSPW module to optimize the unit cell and geometry for a
silicon-carbide crystal.
340 CHAPTER 36. PSEUDOPOTENTIAL PLANE-WAVE DENSITY FUNCTIONAL THEORY (NWPW)
title "SiC 8 atom cubic cell - geometry and unit cell optimization"
start SiC
driver
clear
maxiter 40
end
set includestress .true. # this option tells driver to optimize the unit cell
In this section we show how use the PSPW module to perform a Car-Parrinello QM/MM simulation for a CCl4
molecule in a box of 64 H2 O. Before running a PSPW Car-Parrinello simulation the system should be on the Born-
Oppenheimer surface, i.e. the one-electron orbitals should be minimized with respect to the total energy (i.e. task
pspw energy).
36.10. PSPW TUTORIAL 4: QM/MM SIMULATION FOR CCL4 + 64H2 O 341
In the following example we show the input needed to run a Car-Parrinello QM/MM simulation for a CCl4
molecule in a box of 64 H2 O. In this example, default pseudopotentials from the pseudopotential library are used
for C, Cl, Oˆ and Hˆ exchange correlation functional is PBE96, The boundary condition is periodic, and with a side
length of 23.577 Bohrs and has a cutoff energy is 50 Ry). The time step and fake mass for the Car-Parrinello run are
specified to be 5.0 au and 600.0 au, respectively.
memory 1500 mb
start CCl4-water64
#scratch_dir ./perm
#permanent_dir ./perm
xyz_filename ccl4.00.xyz
ion_motion_filename ccl4.00.ion_motion
emotion_filename ccl4.00.emotion
end
end
task pspw car-parrinello
start SiC_band
title "SiC 8 atom cubic cell"
ewald_ncut 8
end
set nwpw:minimizer 2
set nwpw:psi_brillioun_check .false.
task pspw energy
task band energy
36.12 BAND Tutorial 2: optimizing a unit cell and geometry for Silicon-
Carbide
The following example demonstrates how to uses the BAND module to optimize the unit cell and geometry for a
silicon-carbide crystal.
title "SiC 8 atom cubic cell - geometry and unit cell optimization"
start SiC
driver
clear
346 CHAPTER 36. PSEUDOPOTENTIAL PLANE-WAVE DENSITY FUNCTIONAL THEORY (NWPW)
maxiter 40
end
set includestress .true. # this option tells driver to optimize the unit cell
#set nwpw:stress_numerical .true. #option to use numerical stresses
36.13 BAND Tutorial 3: optimizing a unit cell and geometry for Aluminum
with fractional occupation
The following example demonstrates how to uses the BAND module to optimize the unit cell and geometry for a
Aluminum.
start aluminumfrac
memory 900 mb
geometry noautoz
system crystal
lat_a 3.0
lat_b 3.0
lat_c 3.0
alpha 90.0
beta 90.0
gamma 90.0
end
Al 0.0 0.0 0.0
Al 0.0 0.5 0.5
Al 0.5 0.5 0.0
Al 0.5 0.0 0.5
end
set nwpw:cif_filename aluminum
nwpw
scf anderson
mult 1
smear temperature 3500.0 fermi
cutoff 15.0
monkhorst-pack 3 3 3
ewald_ncut 8
mapping 2
end
set nwpw:lcao_skip .true.
driver
clear
end
task band optimize ignore
The following input deck performs for a water molecule a PSPW energy calculation followed by a PAW energy
calculation and a PAW geometry optimization calculation. The default unit cell parameters are used (SC=20.0, ngrid
32 32 32). In this simulation, the first PAW run optimizes the wavefunction and the second PAW run optimizes the
wavefunction and geometry in tandem.
start paw_test
charge 0
nwpw
time_step 15.8
ewald_rcut 1.50
tolerances 1.0d-8 1.0d-8
end
set nwpw:lcao_iterations 1
set nwpw:minimizer 2
task pspw energy
nwpw
time_step 5.8
geometry_optimize
ewald_rcut 1.50
tolerances 1.0d-7 1.0d-7 1.0d-4
end
task paw steepest_descent
36.15 PAW Tutorial 2: optimizing a unit cell and geometry for Silicon-
Carbide
The following example demonstrates how to uses the PAW module to optimize the unit cell and geometry for a silicon-
carbide crystal.
title "SiC 8 atom cubic cell - geometry and unit cell optimization"
start SiC
driver
clear
maxiter 40
end
set includestress .true. # this option tells driver to optimize the unit cell
set nwpw:stress_numerical .true. #currently only numerical stresses implemented in paw
In this section we show how use the PAW module to perform a Car-Parrinello molecular dynamic simulation for a
C2 molecule at the LDA level. Before running a PAW Car-Parrinello simulation the system should be on the Born-
Oppenheimer surface, i.e. the one-electron orbitals should be minimized with respect to the total energy (i.e. task
pspw energy). The input needed is basically the same as for optimizing the geometry of a C2 molecule at the LDA
level, except that and additional Car-Parrinello sub-block is added.
In the following example we show the input needed to run a Car-Parrinello simulation for a C2 molecule at the LDA
level. In this example, default pseudopotentials from the pseudopotential library are used for C, the boundary condition
is free-space, the exchange correlation functional is LDA, The boundary condition is free-space, and the simulation
cell cell is aperiodic and cubic with a side length of 10.0 Angstroms and has 40 grid points in each direction (cutoff
energy is 44 Ry). The time step and fake mass for the Car-Parrinello run are specified to be 5.0 au and 600.0 au,
respectively.
start c2_paw_lda_md
title "C2 restricted singlet dimer, LDA/44Ry - constant energy Car-Parrinello simulation"
geometry
C -0.62 0.0 0.0
C 0.62 0.0 0.0
end
pspw
simulation_cell units angstroms
boundary_conditions aperiodic
lattice
lat_a 10.00d0
lat_b 10.00d0
lat_c 10.00d0
end
ngrid 40 40 40
end
Car-Parrinello
fake_mass 600.0
time_step 5.0
loop 10 10
end
end
set nwpw:minimizer 2
task paw energy
task paw Car-Parrinello
Python (version 1.5.1) programs may be embedded into the NWChem input and used to control the execution of
NWChem. Python is a very powerful and widely used scripting language that provides useful things such as variables,
conditional branches and loops, and is also readily extended. Example applications include scanning potential energy
surfaces, computing properties in a variety of basis sets, optimizing the energy w.r.t. parameters in the basis set,
computing polarizabilities with finite field, and simple molecular dynamics.
Look in the NWChem contrib directory for useful scripts and examples. Visit the Python web-site http://www.python.org
for a full manual and lots of useful code and resources.
python [print|noprint]
...
end
The END directive must be flush against the left margin (see the Troubleshooting section for the reason why).
The program is by default printed to standard output when read, but this may be disabled with the noprint key-
word. Python uses indentation to indicate scope (and the initial level of indentation must be zero), whereas NWChem
uses optional indentation only to make the input more readable. For example, in Python, the contents of a loop,
or conditionally-executed block of code must be indented further than the surrounding code. Also, Python attaches
special meaning to several symbols also used by NWChem. For these reasons, the input inside a PYTHON compound
directive is read verbatim except that if the first line of the Python program is indented, the same amount of indentation
is removed from all subsequent lines. This is so that a program may be indented inside the PYTHON input block for
improved readability of the NWChem input, while satisfying the constraint that when given to Python the first line has
zero indentation.
E.g., the following two sets of input specify the same Python program.
python
print ’Hello’
print ’Goodbye’
351
352 CHAPTER 37. CONTROLLING NWCHEM WITH PYTHON
end
python
print ’Hello’
print ’Goodbye’
end
whereas this program is in error since the indentation of the second line is less than that of the first.
python
print ’Hello’
print ’Goodbye’
end
The Python program is not executed until the following directive is encountered
task python
which is to maintain consistency with the behavior of NWChem in general. The program is executed by all nodes.
This enables the full functionality and speed of NWChem to be accessible from Python, but there are some gotchas
• Print statements and other output will be executed by all nodes so you will get a lot more output than probably
desired unless the output is restricted to just one node (by convention node zero).
• The calls to NWChem functions are all collective (i.e., all nodes must execute them). If these calls are not made
collectively your program may deadlock (i.e., cease to make progress).
• When writing to the database (rtdb_put()) it is the data from node zero that is written.
• NWChem overrides certain default signal handlers so care must be taken when creating processes (see Section
37.3.11).
• input_parse(string) — invokes the standard NWChem input parser with the data in string as input.
Note that the usual behavior of NWChem will apply — the parser only reads input up to either end of input or
until a TASK directive is encountered (the task directive is not executed by the parser).
• task_energy(theory) — returns the energy as if computed with the NWChem directive TASK ENERGY <THEORY>.
37.3. EXAMPLES 353
An example below (Section 37.3.10) explains, in lieu of a Python wrapper for the geometry object, how to obtain
the Cartesian molecular coordinates directly from the database.
37.3 Examples
Several examples will provide the best explanation of how the extensions are used, and how Python might prove useful.
python
print ’Hello world from process ’, ga_nodeid()
end
task python
This input prints the traditional greeting from each parallel process.
geometry units au
O 0 0 0; H 0 1.430 -1.107; H 0 -1.430 -1.107
end
python
exponent = 0.1
while (exponent <= 2.01):
input_parse(’’’
basis noprint
H library 3-21g; O library 3-21g; O d; %f 1.0
end
’’’ % (exponent))
354 CHAPTER 37. CONTROLLING NWCHEM WITH PYTHON
print none
task python
This program augments a 3-21g basis for water with a d-function on oxygen and varies the exponent from 0.1 to
2.0 in steps of 0.1, printing the exponent and energy at each step.
The geometry is input as usual, but the basis set input is embedded inside a call to input_parse() in the
Python program. The standard Python string substitution is used to put the current value of the exponent into the basis
set (replacing the %f) before being parsed by NWChem. The energy is returned by task_energy(’scf’) and
printed out. The print none in the NWChem input switches off all NWChem output so all you will see is the
output from your Python program.
Note that execution in parallel may produce unwanted output since all process execute the print statement inside
the Python program.
Look in the NWChem contrib directory for a routine that makes the above task easier.
geometry units au
O 0 0 0; H 0 1.430 -1.107; H 0 -1.430 -1.107
end
print none
python
if (ga_nodeid() == 0): plotdata = open("plotdata",’w’)
def energy_at_exponent(exponent):
input_parse(’’’
basis noprint
H library 3-21g; O library 3-21g; O d; %f 1.0
end
’’’ % (exponent))
return task_energy(’scf’)
exponent = 0.1
while exponent <= 2.01:
energy = energy_at_exponent(exponent)
if (ga_nodeid() == 0):
print ’ exponent = ’, exponent, ’ energy = ’, energy
plotdata.write(’%f %f\n’ % (exponent , energy))
exponent = exponent + 0.1
task python
This input performs exactly the same calculation as the previous one, but uses a slightly more sophisticated Python
program, also writes the data out to a file for easy visualization with a package such as gnuplot, and protects write
statements to prevent duplicate output in a parallel job. The only significant differences are in the Python program. A
file called "plotdata" is opened, and then a procedure is defined which given an exponent returns the energy. Next
comes the main loop that scans the exponent through the desired range and prints the results to standard output and to
the file. When the loop is finished the additional output file is closed.
python
geometry = ’’’
geometry noprint; symmetry d2h
C 0 0 %f; H 0 0.916 1.224
end
’’’
x = 0.6
while (x < 0.721):
input_parse(geometry % x)
energy = task_energy(’scf’)
print ’ x = %5.2f energy = %10.6f’ % (x, energy)
x = x + 0.01
end
print none
task python
This scans the bond length in ethene from 1.2 to 1.44 in steps of 0.2 computing the energy at each geometry. Since
it is using D2h symmetry the program actually uses a variable (x) that is half the bond length.
Look in the NWChem contrib directory for a routine that makes the above task easier.
basis spherical
Ne library cc-pvdz; BqNe library Ne cc-pvdz
He library cc-pvdz; BqHe library He cc-pvdz
end
print none
python noprint
356 CHAPTER 37. CONTROLLING NWCHEM WITH PYTHON
def energy(geometry):
input_parse(geometry + ’scf; vectors atomic; end\n’)
return task_energy(’mp2’)
def bsse_energy(z):
return energy(supermolecule % z) - \
energy(fragment1 % z) - \
energy(fragment2 % z)
z = 3.3
while (z < 4.301):
e = bsse_energy(z)
if (ga_nodeid() == 0):
print ’ z = %5.2f energy = %10.7f ’ % (z, e)
z = z + 0.1
end
task python
This example scans the He—Ne bond-length from 3.3 to 4.3 and prints out the BSSE counterpoise corrected MP2
energy.
The basis set is specified as usual, noting that we will need functions on ghost centers to do the counterpoise
correction. The Python program commences by defining strings containing the geometry of the super-molecule and
two fragments, each having one variable to be substituted. Next, a function is defined to compute the energy given a
geometry, and then a function is defined to compute the counterpoise corrected energy at a given bond length. Finally,
the bond length is scanned and the energy printed. When computing the energy, the atomic guess has to be forced in
the SCF since by default it will attempt to use orbitals from the previous calculation which is not appropriate here.
Since the counterpoise corrected energy is a linear combination of other standard energies, it is possible to compute
the analytic derivatives term by term. Thus, combining this example and the next could yield the foundation of a BSSE
corrected geometry optimization package.
37.3.6 Scan the geometry and compute the energy and gradient
python noprint
print ’ y z energy gradient’
print ’ ----- ----- ---------- ------------------------------------’
y = 1.2
while y <= 1.61:
z = 1.0
while z <= 1.21:
input_parse(’’’
geometry noprint units atomic
O 0 0 0
H 0 %f -%f
37.3. EXAMPLES 357
H 0 -%f -%f
end
’’’ % (y, z, y, z))
(energy,gradient) = task_gradient(’scf’)
print none
task python
This program illustrates evaluating the energy and gradient by calling task_gradient(). A water molecule
is scanned through several C2v geometries by varying the y and z coordinates of the two hydrogen atoms. At each
geometry the coordinates, energy and gradient are printed.
The basis set (sto-3g) is input as usual. The two while loops vary the y and z coordinates. These are then substituted
into a geometry which is parsed by NWChem using input_parse(). The energy and gradient are then evaluated
by calling task_gradient() which returns a tuple containing the energy (a scalar) and the gradient (a vector or
list). These are printed out exploiting the Python convention that a print statement ending in a comma does not print
end-of-line.
print none
python
energies = {}
c2h4 = ’geometry noprint; symmetry d2h; \
C 0 0 0.672; H 0 0.935 1.238; end\n’
ch4 = ’geometry noprint; symmetry td; \
C 0 0 0; H 0.634 0.634 0.634; end\n’
h2 = ’geometry noprint; H 0 0 0.378; H 0 0 -0.378; end\n’
input_parse(geometry)
return task_energy(’mp2’)
task python
In this example the reaction energy for 2H2 + C2 H4 → 2CH4 is evaluated using MP2 in several basis sets. The
geometries are fixed, but could be re-optimized in each basis. To illustrate the useful associative arrays in Python, the
reaction energies are put into the associative array energies — note its declaration at the top of the program.
python
rtdb_put("test_int2", 22)
rtdb_put("test_int", [22, 10, 3], INT)
rtdb_put("test_dbl", [22.9, 12.4, 23.908], DBL)
rtdb_put("test_str", "hello", CHAR)
rtdb_put("test_logic", [0,1,0,1,0,1], LOGICAL)
rtdb_put("test_logic2", 0, LOGICAL)
rtdb_print(1)
task python
geometry; he 0 0 0; he 0 0 2; end
basis; he library 3-21g; end
scf; maxiter 1; end
python
try:
task_energy(’scf’)
except NWChemError, message:
37.3. EXAMPLES 359
task python
The above test program shows how to handle exceptions generated by NWChem by forcing an SCF calculation on
He2 to fail due to insufficient iterations.
If an NWChem command fails it will raise the exception "NWChemError" (case sensitive) unless the error was
fatal. If the exception is not caught, then it will cause the entire Python program to terminate with an error. This Python
program catches the exception, prints out the message, and then continues as if all was well since the exception has
been handled.
If your Python program detects an error, raise an unhandled exception. Do not call exit(1) since this may
circumvent necessary clean-up of the NWChem execution environment.
In an ideal world the geometry and basis set objects would have full Python wrappers, but until then a back-door
solution will have to suffice. We’ve already seen how to use input_parse() to put geometry (and basis) data into
NWChem, so it only remains to get the geometry data back after it has been updated by a geometry optimzation or
some other operation.
The following Python procedure retrieves the coordinates in the same units as initially input for a geometry of a
given name. Its full source is included in the NWChem contrib directory.
def geom_get_coords(name):
try:
actualname = rtdb_get(name)
except NWChemError:
actualname = name
coords = rtdb_get(’geometry:’ + actualname + ’:coords’)
units = rtdb_get(’geometry:’ + actualname + ’:user units’)
if (units == ’a.u.’):
factor = 1.0
elif (units == ’angstroms’):
factor = rtdb_get(’geometry:’+actualname+’:angstrom_to_au’)
else:
raise NWChemError,’unknown units’
i = 0
while (i < len(coords)):
coords[i] = coords[i] / factor
i = i + 1
return coords
A geometry (see Section 6) with name NAME has its coordinates (in atomic units) stored in the database entry
geometry:NAME:coords. A minor wrinkle here is that indirection is possible (and used by the optimizers) so that
we must first check if NAME actually points to another name. In the program this is done in the first try...except
sequence. With the actual name of the geometry, we can get the coordinates. Any exceptions are passed up to the
caller. The rest of the code is just to convert back into the initial input units — only atomic units or Angstrøms are
handled in this simple example. Returned is a list of the atomic coordinates in the same units as your initial input.
360 CHAPTER 37. CONTROLLING NWCHEM WITH PYTHON
coords = geom_get_coords(’geometry’)
try:
coords = geom_get_coords(’geometry’)
except NWChemError,message:
print ’Coordinates for geometry not found ’, message
else:
print coords
This is very dirty and definitely not supported from one release to another, but, browsing the output of rtdb_print()
at the end of a calculation is a good way to find stuff. To be on safer ground, look in the programmers manual since
some of the high-level routines do pass data via the database in a well-defined and supported manner. Be warned
— you must be very careful if you try to modify data in the database. The input parser does many important things
that are not immediately apparent (e.g., ensure the geometry is consistent with the point group, mark the SCF as not
converged if the SCF options are changed, . . . ). Where at all possible your Python program should generate standard
NWChem input and pass it to input_parse() rather than setting parameters directly in the database.
37.3.11 Scaning a basis exponent yet again — plotting and handling child processes
geometry units au
O 0 0 0; H 0 1.430 -1.107; H 0 -1.430 -1.107
end
print none
python
import Gnuplot, time, signal
def energy_at_exponent(exponent):
input_parse(’’’
basis noprint
H library 3-21g; O library 3-21g; O d; %f 1.0
end
’’’ % (exponent))
return task_energy(’scf’)
data = []
exponent = 0.5
while exponent <= 0.6:
energy = energy_at_exponent(exponent)
print ’ exponent = ’, exponent, ’ energy = ’, energy
data = data + [[exponent,energy]]
exponent = exponent + 0.02
if (ga_nodeid() == 0):
signal.signal(signal.SIGCHLD, signal.SIG_DFL)
37.4. TROUBLESHOOTING 361
g = Gnuplot.Gnuplot()
g(’set data style linespoints’)
g.plot(data)
time.sleep(30) # 30s to look at the plot
end
task python
This illustrates how to handle signals from terminating child processes and how to generate simple plots on UNIX
systems. The example from Section 37.3.3 is modified so that instead of writing the data to a file for subsequent
visualization, it is saved for subsequent visualization with Gnuplot (you’ll need both Gnuplot and the corresponding
package for Python in your PYTHONPATH. Look at http://monsoon.harvard.edu/ mhagger/download).
The issue is that NWChem traps various signals from the O/S that usually indicate bad news in order to provide
better error handling and reliable clean-up of shared, parallel resources. One of these signals is SIGCHLD which
is generated whenever a child process terminates. If you want to create child processes within Python, then the
NWChem handler for SIGCHLD must be replaced with the default handler. There seems to be no easy way to restore
the NWChem handler after the child has completed, but this should have no serious side effect.
37.4 Troubleshooting
Common problems with Python programs inside NWChem.
This indicates that NWChem thinks that a line is less indented than the first line. If this is not the case then
perhaps there is a tab in your input which NWChem treats as a single space character but appears to you as more
spaces. Try running untabify in Emacs. The problem could also be the END directive that terminates the
PYTHON compound directive — since Python also has an end statement. To avoid confusion the END directive
for NWChem must be at the start of the line.
2. Your program hangs or deadlocks — most likely you have a piece of code that is restricted to executing on a
subset of the processors (perhaps just node 0) but is calling (perhaps indirectly) a function that must execute on
all nodes.
362 CHAPTER 37. CONTROLLING NWCHEM WITH PYTHON
Chapter 38
NWChem has interfaces to several different packages which are listed below. In general, the NWChem authors work
with the authors of the other packages to make sure that the interface works. However, any problems with the interface
should be reported to the [email protected] e-mail list.
by Bruce C. Garrett,
Environmental Molecular Sciences Laboratory,
Pacific Northwest Laboratory, Richland, Washington
Ricky A. Kendall,
Scalable Computing Laboratory,
Ames Laboratory and Iowa State University, Ames, IA 50011
Theresa L. Windus,
Environmental Molecular Sciences Laboratory,
Pacific Northwest Laboratory, Richland, Washington
If you use the DIRDYVTST portion of NWChem, please use following citation in addition to the usual NWChem
citation from Section 1:
DIRDYVTST, Yao-Yuan Chuang and Donald G. Truhlar, Department of Chemistry and Super Computer
Institute, University of Minnesota; Ricky A. Kendall,Scalable Computing Laboratory, Ames Laboratory
and Iowa State University; Bruce C. Garrett and Theresa L. Windus, Environmental Molecular Sciences
Laboratory, Pacific Northwest Laboratory.
363
364 CHAPTER 38. INTERFACES TO OTHER PROGRAMS
38.1.1 Introduction
By using DIRDYVTST, a user can carry out electronic structure calculations with NWChem and use the resulting
energies, gradients, and Hessians for direct dynamics calculations with POLYRATE. This program prepares the file30
input for POLYRATE from NWChem electronic structure calculations of energies, gradients and Hessians at the
reactant, product, and saddle point geometries and along the minimum energy path. Cartesian geometries for the
reactants, products, and saddle points need to be input to this program; optimization of geometries is not performed in
this program. Note that DIRDYVTST is based on the DIRDYGAUSS program and is similar to two other programs:
DDUTILITIES and GAUSSRATE. Users of this module are encouraged to read the POLYRATE manual since they
will need to create the file fu5 input to run calculations with POLYRATE.
Notes about the code:
Input. The code has been written to parallel, as much as possible, the POLYRATE code.
Output. There is one default output file for each DIRDYVTST run - .file30.
Integrators for following the reaction path. Currently the Euler and three Page-McIver (PM) methods are imple-
mented. The PM methods are the local quadratic approximation (LQA), the corrected LQA (CLQA), and the cubic
(CUBE) algorithm. The PM methods are implemented so that the Hessian can be reused at intermediate steps at which
only the gradient is updated.
38.1.2 Files
Test runs are located in directories in $NWCHEM_TOP/QA/tests. Test runs are available for two systems: H + H2
and OH + H2 .
The H + H2 test uses the Euler integration method at the SCF/3-21G level of theory to calculate points along the
reaction path. This test is located in the $NWCHEM_TOP/QA/tests/h3tr1 directory.
The OH + H2 test uses the Page-McIver CUBE algorithm to calculate points on the SCF/3-21G surface and does
additional single point calculations at the SCF/6-31G* level of theory. This test is located in the $NWCHEM_TOP/QA/tests/oh3tr3
directory.
Note: These tests are set up with SCF, however, other levels of theory can be used. The initial hessian calculations
at the reactants, products and saddle point can cause some problems when numerical hessians are required (especially
when there is symmetry breaking in the wavefunction).
The input consists of keywords for NWChem and keywords related to POLYRATE input. The first set of inputs are for
NWChem with the general input block of the form:
Use of symmetry
The use of symmetry in the calculation is controlled by the keyword autosym | noautosym which is used as
described in the geometry directive (see Section 6). Autosym is on by default. A couple words of warning here.
The tolerance related to autosym can cause problems when taking the initial step off of the transition state. If the
tolerance is too large and the initial step relatively small, the resulting geometry will be close to a higher symmetry
than is really wanted and the molecule will be symmetrized into the higher symmetry. To check this, the code prints
out the symmetry at each geometry along the path. It is up to the user to check the symmetry and make sure that it is
the required one. In preverse cases, the user may need to turn autosym off (noautosym) if changing the tolerance
doesn’t produce the desired results. In the case that autosym is used, the user does not need to worry about the different
alignment of the molecule between NWChem and POLYRATE, this is taken care of internally in the DIRDYVTST
module.
Basis specification
The basis name on the theory or sptheory directive is that specified on a basis set directive (see Section 7) and not the
name of a standard basis in the library. If not specified, the basis set for the sptheory defaults to the theory basis which
defaults to "ao basis".
If an effective core potential is specified in the usual fashion (see Section 8) outside of the DIRDYVTST input then
this will be used in all calculations. If an alternative ECP name (the name specified on the ECP directive in the same
manner as done for basis sets) is specified on one of the theory directives, then this ECP will be used in preference for
that level of theory.
For many purposes, the ability to specify the theory, basis and effective core potential is adequate. All of the options
for each theory are determined from their independent input blocks. However, if the same theory (e.g., DFT) is to
be used with different options for theory and sptheory, then the general input strings must be used. These strings are
processed as NWChem input each time the theoretical calculation is invoked. The strings may contain any NWChem
input, except for options pertaining to DIRDYVTST and the task directive. The intent is that the strings be used just to
control the options pertaining to the theory being used.
A word of caution. Be sure to check that the options are producing the desired results. Since the NWChem
database is persistent, the input strings should fully define the calculation you wish to have happen.
For instance, if the theory model is DFT/LDA/3-21g and the sptheory model is DFT/B3LYP/6-311g**, the
DIRDYVTST input might look like this
dirdyvtst
theory dft basis 3-21g input "dft\; xc\; end"
sptheory dft basis 6-311g** input "dft\; xc b3lyp\; end"
....
end
The empty XC directive restores the default LDA exchange-correlation option (see Section 11.3). Note that semi-
colons and other quotation marks inside the input string must be preceded by a backslash to avoid special interpretation.
366 CHAPTER 38. INTERFACES TO OTHER PROGRAMS
These keyword options are simlar to the POLYRATE input format, except there are no ENERGETICS, OPTIMIZA-
TION, SECOND, TUNNELING, and RATE sections.
*GENERAL
[TITLE <string title>]
ATOMS
<integer num> <string tag> [<real mass>]
...
END
[SINGLEPOINT]
[SAVEFILE (vecs || hess || spc)
Descriptions
TITLE is a keyword that allows the user to input a description of the calculation. In this version, the user can only
have a single-line description.
For example: TITLE Calculation of D + HCl reaction
ATOMS is a list keyword that is used to input a list of the atoms. It is similar to POLYRATE in that the order of
the atom and the atomic symbol are required in a single line. If isotope of the element is considered then the atomic
mass is required in units of amu.
For example:
ATOMS
1 H 2.014
2 H
3 Cl
END
SINGLEPOINT is a keyword that specifies that a single point calculation is to be performed at the reactants,
products and saddle point geometries. The type of single point calculation is specified in the sptheory line.
SAVEFILE is a keyword that specifies that NWChem files are to be saved. Allowed values of variable input
to SAVEFILE are vecs, hess, and spc for saving the files base theory movecs, base theory hessian and singlepoint
calculation movecs.
REACT1, REACT2, PROD1, PROD2, and START sections These sections have the following format:
REACT1 and REACT2 are input for each of the reactants and PROD1 and PROD2 are input for each of the
products. REACT1 and PROD1 are required. START is the input for the transition state if one exists, or starting point
to follow downhill the MEP.
Descriptions
GEOM is a list keyword that indicates the geometry of the molecule in Cartesian coordinates with atomic unit.
For example:
GEOM
1 0.0 0.0 0.0
2 0.0 0.0 1.5
END
SPECIES is a variable keyword that indicates the type of the molecule. Options are: ATOMIC (atomic reactant
or product), LINRP (linear reactant or product), NONLINRP (nonlinear reactant or product), LINTS (linear transition
state), and NONLINTS (nonlinear transition state).
For example: SPECIES atomic
*PATH
[SCALEMASS <real scalemass default 1.0>]
[SSTEP <real sstep default 0.01>]
[SSAVE <real ssave default 0.1>]
[SHESS <real shess default SSAVE>]
[SLP <real slp default 1.0>]
[SLM <real slm default -1.0>]
[SIGN (REACTANT || PRODUCT default REACTANT)]
[INTEGRA (EULER || LQA || CLQA || CUBE default EULER)]
[PRINTFREQ (on || off default off)]
Descriptions
SCALEMASS is a variable keyword that indicates the arbitrary mass (in amu) used for mass-scaled Cartesian
coordinates. This is the variable called mu in published papers. Normally, this is taken as either 1.0 amu or, for
bimolecular reactions, as the reduced mass of relative translation of the reactants.
SSTEP is a variable keyword that indicates the numerical step size (in bohrs) for the gradient grid. This is the step
size for following the minimum energy path.
SSAVE is a variable keyword that indicates the numerical step size (in bohrs) for saving the Hessian grid. At each
save point the potential and its first and second derivatives are recalculated and written to the .file30 file. For example,
if SSTEP=0.01 and SSAVE=0.1, then the potential information is written to .file30 every 10 steps along the gradient
grid.
SHESS is a variable keyword that indicates the numerical step size (in bohrs) for recomputing the Hessian when
using a Page-McIver integrator (e.g., LQA, CLQA, or CUBE). For Euler integration SHESS = SSAVE. For interme-
diate points along the gradient grid, the Hessian matrix from the last Hessian calculation is reused. For example, if
SSTEP=0.01 and SHESS=0.05, then the Hessian matrix is recomputed every 5 steps along the gradient grid.
SLP is a variable keyword that indicates the positive limit of the reaction coordinate (in bohrs).
368 CHAPTER 38. INTERFACES TO OTHER PROGRAMS
SLM is a variable keyword that indicates the negative limit of the reaction coordinate (in bohrs).
SIGN is a variable keyword used to ensure the conventional definition of the sign of s, s < 0 for the reactant side
and s > 0 for the product side, is followed. PRODUCT should be used if the eigenvector at the saddle point points
toward the product side and REACTANT if the eigenvector points toward the reactant side.
INTEGRA is a variable keyword that indicates the integration method used to follow the reaction path. Options
are: EULER, LQA, CLQA, and CUBE.
PRINTFREQ is a variable keyword that indicates that projected frequencies and eigenvectors will be printed along
the MEP.
Restart
DIRDYVTST calculations should be restarted through the normal NWChem mechanism (See Section 5.1). The user
needs to change the start directive to a restart directive and get rid of any information that will overwrite
important information in the RTDB. The file.db and file.file30 need to be available for the calculation to
restart properly.
Example
This is an example that creates the file30 file for POLYRATE for H + H2 . Note that the multiplicity is that of the
entire supermolecule, a doublet. In this example, the initial energies, gradients, and Hessians are calculated at the
UHF/3-21G level of theory and the singlepoint calculations are calculated at the MP2/cc-pVDZ level of theory with a
tighter convergence threshold than the first SCF.
start h3test
basis
h library 3-21G
end
basis singlepoint
h library cc-pVDZ
end
scf
uhf
doublet
thresh 1.0e-6
end
ATOMS
38.1. DIRDYVTST — DIRECT DYNAMICS FOR VARIATIONAL TRANSITION STATE THEORY 369
1 H
2 H
3 H
END
SINGLEPOINT
*REACT1
GEOM
1 0.0 0.0 0.0
2 0.0 0.0 1.3886144
END
SPECIES LINRP
*REACT2
GEOM
3 0.0 0.0 190.3612132
END
SPECIES ATOMIC
*PROD2
GEOM
1 0.0 0.0 190.3612132
END
SPECIES ATOMIC
*PROD1
GEOM
2 0.0 0.0 1.3886144
3 0.0 0.0 0.0
END
SPECIES LINRP
*START
GEOM
1 0.0 0.0 -1.76531973
2 0.0 0.0 0.0
3 0.0 0.0 1.76531973
END
SPECIES LINTS
*PATH
SSTEP 0.05
SSAVE 0.05
370 CHAPTER 38. INTERFACES TO OTHER PROGRAMS
SLP 0.50
SLM -0.50
SCALEMASS 0.6718993
INTEGRA CLQA
end
task dirdyvtst
Chapter 39
Acknowledgments
This work was supported by funds from the Environmental and Molecular Sciences Laboratory Construction Project
at Pacific Northwest National Laboratory. Development of some of the parallel programming tools and algorithms
employed by NWChem was performed under the auspices of the High Performance Computing and Communications
Program of the Mathematical, Information, and Computational Sciences Division, U.S. Department of Energy. Pacific
Northwest National Laboratory is operated by Battelle Memorial Institute for the U.S. Department of Energy under
Contract DE-AC05-76RL01830.
371
372 CHAPTER 39. ACKNOWLEDGMENTS
Appendix A
Basis sets and effective core potentials were obtained (1/1/2002) from the Extensible Computational Chemistry En-
vironment (ECCE) Basis Set Database, as developed and distributed by the Molecular Science Computing Facility,
Environmental and Molecular Sciences Laboratory which is part of the Pacific Northwest National Laboratory, P.O.
Box 999, Richland, Washington 99352, USA, and is funded by the U.S. Department of Energy. The Pacific Northwest
National Laboratory is a multi-program laboratory operated by Battelle Memorial Institute for the U.S. Department
of Energy under contract DE-AC05-76RL01830. Contact David Feller ([email protected]) or Deborah Gracio
([email protected]) for further information.
The names in the NWChem library are consistent with those in the ECCE database and thus may include spaces.
The standard NWChem input routines require that strings including spaces are enclosed in quotation marks ("...")
or that blanks are escaped with a backslash. As a convenience, basis set names may also have the blanks replaced with
underscores. Thus, the following all yield the same basis set for oxygen
Case may be ignored when specifying basis set names, but otherwise names should be specified exactly as provided
below. A good method is just to cut/paste from the WWW pages since they were generated electronically from the
library source.
Errors found in the basis set library of NWChem version 4.4 have been corrected in the current library of NWChem
version 4.6. The changes are listed below:
Errors found in the basis set library of NWChem version 4.0 have been corrected in the current library of NWChem
version 4.1. The changes are listed below:
373
374 APPENDIX A. STANDARD BASIS SETS
IMPORTANT NOTE: The Stuttgart basis set and ECP developers have recently developed a new set of Stuttgart
RSC ECP’s for the Lanthanide series of elements. This new set is now called "Stuttgart RSC ECP", while the previous,
and most widely used, set has been renamed to "Stuttgart RSC 1997 ECP" in both the EMSL Basis Set library and in
the NWChem basis set library.
Relativistic contractions of standard basis sets for use in the Douglas-Kroll and Dyall-modified-Dirac method have
also been included in the library. These are identified by tags following the standard basis set name.
For the Dyall-modified-Dirac (DMD) method three tags should be specified. The first is a tag for the nuclear
model, which can be pt or fi for a point or a finite Gaussian model (see Section6). The second is a tag for the
relativistic Hamiltonian: sf is for the spin-free modified Dirac Hamiltonian. The third tags the component type, fw
for the atomic FW transformed large component, lc for the large component and sc for the small component:
Basis sets which are available with either the DmD or DK contractions are indicated in the list below. Additional DmD
basis sets are available in the directory nwchem/contrib/basissets/Dyall_DMD.
Here is a list of known all-electron non-relativistic, DK and DmD basis sets, effective core potentials with their
respective basis sets, fitting basis sets, and Polarization, Diffuse and Core-valence sets of functions, along with the
elements included for each. Additional information about each basis set in the NWChem library can be obtained from
the online EMSL Gaussian Basis Set library.
Standard all-electron basis sets:
17. Basis Set "Stuttgart RSC 1997 ECP" (number of atoms 64)
K Ca Sc Ti V Cr Mn Fe Co Ni Cu Zn Rb Sr Y Zr Nb Mo Tc Ru Rh Pd Ag Cd Cs Ba Ce Pr Nd Pm Sm Eu Gd
Tb Dy Ho Er Tm Yb Hf Ta W Re Os Ir Pt Au Hg Ac Th Pa U Np Pu Am Cm Bk Cf Es Fm Md No Lr Db
Polarization functions:
Diffuse functions:
Core-valence functions:
start h2o
title "Water in 6-31g basis set"
geometry units au
O 0.00000000 0.00000000 0.00000000
H 0.00000000 1.43042809 -1.10715266
H 0.00000000 -1.43042809 -1.10715266
end
basis
H library 6-31g
O library 6-31g
end
task scf
restart h2o
title "Water geometry optimization"
There is no need to specify anything that has not changed from the previous input deck, though it will do no harm
to repeat it.
389
390 APPENDIX B. SAMPLE INPUT FILES
start ne
title "Neon"
geometry; ne 0 0 0; end
basis spherical
ne library aug-cc-pvdz
end
scf; thresh 1e-10; end
task scf
An external field may be simulated with point charges. The charges here apply a field of magnitude 0.01 atomic units
to the atom at the origin. Since the basis functions have not been reordered by the additional centers we can also restart
from the previous vectors, which is the default for a restart job.
restart ne
title "Neon in electric field"
geometry units atomic
bq1 0 0 100 charge 50
ne 0 0 0
bq2 0 0 -100 charge -50
end
task scf
The final energy should be -128.496441, which together with the previous field-free result yields an estimate for
the polarizability of 1.83 atomic units. Note that by default NWChem does not include the interaction between the
two point charges in the total energy (section 6).
start ecpchho
geometry units au
C 0.000000 0.000000 -1.025176
O 0.000000 0.000000 1.280289
H 0.000000 1.767475 -2.045628
H 0.000000 -1.767475 -2.045628
B.3. SCF ENERGY OF H2 CO USING ECPS FOR C AND O 391
end
basis
C SP
0.1675097360D+02 -0.7812840500D-01 0.3088908800D-01
0.2888377460D+01 -0.3741108860D+00 0.2645728130D+00
0.6904575040D+00 0.1229059640D+01 0.8225024920D+00
C SP
0.1813976910D+00 0.1000000000D+01 0.1000000000D+01
C D
0.8000000000D+00 0.1000000000D+01
C F
0.1000000000D+01 0.1000000000D+01
O SP
0.1842936330D+02 -0.1218775590D+00 0.5975796600D-01
0.4047420810D+01 -0.1962142380D+00 0.3267825930D+00
0.1093836980D+01 0.1156987900D+01 0.7484058930D+00
O SP
0.2906290230D+00 0.1000000000D+01 0.1000000000D+01
O D
0.8000000000D+00 0.1000000000D+01
O F
0.1100000000D+01 0.1000000000D+01
H S
0.1873113696D+02 0.3349460434D-01
0.2825394365D+01 0.2347269535D+00
0.6401216923D+00 0.8137573262D+00
H S 1 1.00
0.1612777588D+00 0.1000000000D+01
end
ecp
C nelec 2
C ul
1 80.0000000 -1.60000000
1 30.0000000 -0.40000000
2 0.5498205 -0.03990210
C s
0 0.7374760 0.63810832
0 135.2354832 11.00916230
2 8.5605569 20.13797020
C p
2 10.6863587 -3.24684280
2 23.4979897 0.78505765
O nelec 2
O ul
1 80.0000000 -1.60000000
1 30.0000000 -0.40000000
2 1.0953760 -0.06623814
O s
0 0.9212952 0.39552179
392 APPENDIX B. SAMPLE INPUT FILES
0 28.6481971 2.51654843
2 9.3033500 17.04478500
O p
2 52.3427019 27.97790770
2 30.7220233 -16.49630500
end
scf
vectors input hcore
maxiter 20
end
task scf
start n2
geometry
symmetry d2h
n 0 0 0.542
end
basis spherical
n library cc-pvtz
end
mp2
freeze core
end
ccsd
freeze core
end
task ccsd(t)
394 APPENDIX B. SAMPLE INPUT FILES
Appendix C
Below are examples of the use of the SYMMETRY directive in the compound GEOMETRY directive (Section 6). The z
axis is always the primary rotation axis. When in doubt about which axes and planes are used for the group elements,
the keyword print may be added to the SYMMETRY directive to obtain this information.
C.1 Cs methanol
The z axis is the C2 axis and the σv may be either the xz or the yz planes.
geometry units au
O 0.00000000 0.00000000 0.00000000
H 0.00000000 1.43042809 -1.10715266
symmetry group c2v
end
395
396 APPENDIX C. EXAMPLES OF GEOMETRIES USING SYMMETRY
Although acetylene has symmetry D∞h the subgroup D2h includes all operations that interchange equivalent atoms
which is what determines how much speedup you gain from using symmetry in building a Fock matrix.
The C2 axes are the x, y, and z axes. The σ planes are the xy, xz and yz planes. Generally, the unique atoms are
placed to use the z as the primary rotational axis and use the xz or yz planes as the σ plane.
geometry units au
symmetry group d2h
C 0.000000000 0.000000000 -1.115108538
H 0.000000000 0.000000000 -3.106737425
end
The C2 axes are the x, y, and z axes. The σ planes are the xy, xz and yz planes. Generally, the unique atoms are placed
to use the z as the primary rotational axis and use the xz or yz planes as the σ plane.
C.5 Td methane
For ease of use, the primary C3 axis should be the x=y=z axis. The 3 C2 axes are the x, y, and z.
geometry units au
c 0.0000000 0.0000000 0.0000000
h 1.1828637 1.1828637 1.1828637
symmetry group Td
end
C.6 Ih buckminsterfullerene
One of the C5 axes is the z axis and the point of inversion is the origin.
C.7 S4 porphyrin
The S4 and C2 rotation axis is the z axis. The reflection plane for the S4 operation is the xy plane.
The C3 axis is the z axis. The σh plane is the xy plane. One of the perpendicular C2 axes is the x=y axis. One of
the σv planes is the plane containing the x=y axis and the z axis. (The other axes and planes are generated by the C3
operation.)
geometry units au
symmetry group d3h
geometry units au
symmetry D3d
end
geometry units au
C 1.855 1.855 0
H 3.289 3.289 0
symmetry D6h
end
geometry units au
b 0 0 0
o 2.27238285 1.19464491 0.00000000
h 2.10895420 2.97347707 0.00000000
symmetry C3h
end
The C5 axis is the z axis. The center of inversion is the origin. One of the perpendicular C2 axes is the x axis. One of
the σd planes is the yz plane.
fe 0 0 0
c 0 1.194 1.789
h 0 2.256 1.789
end
The C4 axis is the z axis. The σv planes are the yz and the xz planes. The σd planes are: 1) the plane containing the
x=y axis and the z axis and 2) the plane containing the -x=y axis and the z axis.
geometry units au
S 0.00000000 0.00000000 -0.14917600
Cl 0.00000000 0.00000000 4.03279700
F 3.13694200 0.00000000 -0.15321800
F 0.00000000 0.00000000 -3.27074500
symmetry C4v
end
The C2 axis is the z axis. The origin is the inversion center. The σh plane is the xy plane.
charge -1
geometry units angstroms
symmetry d5h
c 0 1.1853 0
h 0 2.2654 0
end
geometry units au
Au 0 0 0
Cl 0 4.033 0
symmetry D4h
end
Appendix D
Running NWChem
The command required to invoke NWChem is machine dependent, whereas most of the NWChem input is machine
independent1 .
To run NWChem sequentially on nearly all UNIX-based platforms simply use the command nwchem and provide the
name of the input file as an argument (See section 2.1 for more information). This does assume that either nwchem is
in your path or you have set an alias of nwchem to point to the appropriate executable.
Output is to standard output, standard error and Fortran unit 6 (usually the same as standard output). Files are
created by default in the current directory, though this may be overridden in the input (section 5.2).
Generally, one will run a job with the following command:
nwchem input.nw >& input.out &
These platforms require the use of the TCGMSG2 parallel command and thus also require the definition of a
process-group (or procgroup) file. The process-group file describes how many processes to start, what program to run,
which machines to use, which directories to work in, and under which userid to run the processes. By convention the
process-group file has a .p suffix.
The process-group file is read to end-of-file. The character # (hash or pound sign) is used to indicate a comment
which continues to the next new-line character. Each line describes a cluster of processes and consists of the following
whitespace separated fields:
401
402 APPENDIX D. RUNNING NWCHEM
• userid – The user-name on the machine that will be executing the process.
• hostname – The hostname of the machine to execute this process. If it is the same machine on which parallel
was invoked the name must match the value returned by the command hostname. If a remote machine it must
allow remote execution from this machine (see man pages for rlogin, rsh).
• nslave – The total number of copies of this process to be executing on the specified machine. Only “clusters”
of identical processes specified in this fashion can use shared memory to communicate. If no shared memory is
supported on machine <hostname> then only the value one (1) is valid.
• executable – Full path name on the host <hostname> of the image to execute. If <hostname> is the
local machine then a local path will suffice.
• workdir – Full path name on the host <hostname> of the directory to work in. Processes execute a chdir()
to this directory before returning from pbegin(). If specified as a “.” then remote processes will use the login
directory on that machine and local processes (relative to where parallel was invoked) will use the current
directory of parallel.
then 4 processes running NWChem would be started on the machine pc running as user d3g681 in directory
"/scr22/rjh". To actually run this simply type:
N.B. : The first process specified (process zero) is the only process that
Thus, if your file systems are physically distributed (e.g., most workstation clusters) you must ensure that process zero
can correctly resolve the paths for the input and database files.
N.B. In releases of NWChem prior to 3.3 additional processes had to be created on workstation clusters to support
remote access to shared memory. This is no longer the case. The TCGMSG process group file now just needs to refer
to processes running NWChem.
• using mpirun:
• If you have all nodes connected via shared memory and you have installed the ch_shmem version of MPICH,
you can do
D.5 IBM SP
If using POE (IBM’s Parallel Operating Environment) interactively, simply create the list of nodes to use in the file
"host.list" in the current directory and invoke NWChem with
where n is the number of processes to use. Process 0 will run on the first node in "host.list" and must have
access to the input and other necessary files. Very significant performance gains may be had by setting the following
environment variables before running NWChem (or setting them using POE command line options).
• setenv MP_EUILIB us — dedicated user space communication over the switch (the default is IP over the
switch which is much slower).
• setenv MP_CSS_INTERRUPT yes — enable interrupts when a message arrives (the default is to poll
which significantly slows down global array accesses).
For batch execution, we recommend use of the llnw command which is installed in /usr/local/bin on the
EMSL/PNNL IBM SP. If you are not running on that system, the llnw script may be found in the NWChem distribu-
tion directory contrib/loadleveler. Interactive help may be obtained with the command llnw -help. Otherwise, the
very simplest job to run NWChem in batch using Load Leveller is something like this
#!/bin/csh -x
# @ job_type = parallel
# @ class = small
# @ network.lapi = css0,not_shared,US
# @ input = /dev/null
# @ output = <OUTPUT_FILE_NAME>
# @ error = <ERROUT_FILE_NAME>
# @ environment = COPY_ALL; MP_PULSE=0; MP_SINGLE_THREAD=yes; MP_WAIT_MODE=yield; r
# @ min_processors = 7
# @ max_processors = 7
# @ cpu_limit = 1:00:00
# @ wall_clock_limit = 1:00:00
# @ queue
#
cd /scratch
nwchem <INPUT_FILE_NAME>
404 APPENDIX D. RUNNING NWCHEM
# @ network.lapi = css0,shared,US
# @ node = NNODE
# @ tasks_per_node = NTASK
# @ network.lapi = css0,not_shared,US
# @ min_processors = 7
# @ max_processors = 7
where NNODE is the number of physical nodes to be used and NTASK is the number of tasks per node.
These files and the NWChem executable must be in a file system accessible to all processes. Put the above into a
file (e.g., "test.job") and submit it with the command
llsubmit test.job
It will run a 7 processor, 1 hour job in the queue small. It should be apparent how to change these values.
Note that on many IBM SPs, including that at EMSL, the local scratch disks are wiped clean at the beginning of
each job and therefore persistent files should be stored elsewhere. PIOFS is recommended for files larger than a few
MB.
where npes is the number of processors and input_file is the name of your input file.
where npes is the number of processors and input_file is the name of your input file.
where and input_file is the name of your input file. If you use WMPI, you must have a file named nw32.pg
in the $NWCHEM_TOP/bin/win32 directory; the file must only contains the following single line
local 0
D.9. TESTED PLATFORMS AND O/S VERSIONS 405
• HP DEC alpha workstation , Tru64 V5.1, Compaq Fortran V5.3, V5.4.2, V5.5.1
• Linux with Intel x86 cpus. NWChem Release 4.5 has been tested on RedHat 6.x and 7.x, Mandrake 7.x. We
have tested NWChem on Linux for the Power PC Macintosh with Yellow Dog 2.4. These all use the GCC
compiler at different levels. The Intel Fortran Compiler version 7 is supported. The Portland Group Compiler
has been tested in a less robust manner. Automatic generation of SSE2 optimized code is available when the
Intel compiler is used (ifc vs g77 performances gain of 40% in some benchmarks) A somewhat Athlon optimized
code can be generated under the GNU or Intel compilers by typing make _CPU=k7. GCC3 specific options
can be turned on by typing make GCC31=y
• HP 9000/800 workstations with HPUX B.11.00. f90 must be used for compilation.
• Intel x86 with Windows 2000 has been tested with Compaq Visual Fortran 6.0 and 6.1 with WMPI 1.3 or
NT-Mpich. NT-MPICH is available from http://www-unix.mcs.anl.gov/˜ashton/mpich.nt/
• Intel IA64 under Linux (with Intel compilers version 7 and later) and under HPUX.
• Fujitsu VPP computers.