Hands-On Graph Neural Networks Using Python: Practical techniques and architectures for building powerful graph and deep learning apps with PyTorch

About this ebook

Graph neural networks are a highly effective tool for analyzing data that can be represented as a graph, such as social networks, chemical compounds, or transportation networks. The past few years have seen an explosion in the use of graph neural networks, with their application ranging from natural language processing and computer vision to recommendation systems and drug discovery.
Hands-On Graph Neural Networks Using Python begins with the fundamentals of graph theory and shows you how to create graph datasets from tabular data. As you advance, you’ll explore major graph neural network architectures and learn essential concepts such as graph convolution, self-attention, link prediction, and heterogeneous graphs. Finally, the book proposes applications to solve real-life problems, enabling you to build a professional portfolio. The code is readily available online and can be easily adapted to other datasets and apps.
By the end of this book, you’ll have learned to create graph datasets, implement graph neural networks using Python and PyTorch Geometric, and apply them to solve real-world problems, along with building and training graph neural network models for node and graph classification, link prediction, and much more.

Language: English
Release date: April 14, 2023
ISBN: 9781804610701


    Book preview

    Hands-On Graph Neural Networks Using Python - Maxime Labonne


    BIRMINGHAM—MUMBAI

    Hands-On Graph Neural Networks Using Python

    Copyright © 2023 Packt Publishing

    All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

    Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

    Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

    Group Product Manager: Gebin George

    Publishing Product Manager: Dinesh Chaudhary

    Senior Editor: David Sugarman

    Technical Editor: Devanshi Ayare

    Copy Editor: Safis Editing

    Project Coordinator: Farheen Fathima

    Proofreader: Safis Editing

    Indexer: Tejal Daruwale Soni

    Production Designer: Joshua Misquitta

    First published: April 2023

    Production reference: 1240323

    Published by Packt Publishing Ltd.

    Livery Place

    35 Livery Street

    Birmingham

    B3 2PB, UK.

    ISBN 978-1-80461-752-6

    www.packtpub.com

    Contributors

    About the author

    Maxime Labonne is a senior applied researcher at J.P. Morgan with a Ph.D. in machine learning and cyber security from the Polytechnic Institute of Paris. During his Ph.D., Maxime worked on developing machine learning algorithms for anomaly detection in computer networks. He then joined the AI Connectivity Lab at Airbus, where he applied his expertise in machine learning to improve the security and performance of computer networks, before moving to J.P. Morgan, where he now develops techniques for solving a variety of challenging problems in finance and other domains. In addition to his research work, Maxime is passionate about sharing his knowledge and experience with others through Twitter (@maximelabonne) and his personal blog.

    About the reviewers

    Dr. Mürsel Taşgın is a computer scientist. He graduated from the Computer Engineering Department of Middle East Technical University in 2002 and completed his master of science and Ph.D. in the Computer Engineering Department of Bogazici University. During his Ph.D., he worked in the field of complex systems, graphs, and ML. He has also worked in industry in technical, research, and managerial roles (at Mostly.AI, KKB, Turkcell, and Akbank). His current focus is mainly on generative AI, graph machine learning, and financial applications of machine learning. He also teaches artificial intelligence (AI)/ML courses at universities.

    I would like to thank my dear wife Zehra and precious son Kerem for their support and understanding during my long working hours.

    Amir Shirian is a data scientist at Nokia, where he applies his expertise in multimodal signal processing and ML to solve complex problems. He received his Ph.D. in computer science from the University of Warwick, England, after completing his bachelor of science and master of science degrees in electrical engineering at the University of Tehran, Iran. Amir’s research focuses on developing algorithms and models for emotion and behavior understanding, with a particular interest in using graph neural networks to analyze and interpret data from multiple sources. His work has been published in several high-profile academic journals and presented at international conferences. Amir enjoys hiking, playing 3tar, and exploring new technologies in his free time.

    Lorenzo Giusti is a Ph.D. student in data science at La Sapienza, University of Rome, with a focus on extending graph neural networks through topological deep learning. He has extensive research experience as a visiting Ph.D. student at Cambridge, as a research scientist intern at NASA, where he supervised a team and led a project on synthesizing the Martian environment using images from spacecraft cameras, and as a research scientist intern at CERN, working on anomaly detection for particle physics accelerators. Lorenzo also has a master of science in data science from La Sapienza and a bachelor of engineering in computer engineering from Roma Tre University, where he focused on quantum technologies.

    Table of Contents

    Preface

    Part 1: Introduction to Graph Learning

    1

    Getting Started with Graph Learning

    Why graphs?

    Why graph learning?

    Why graph neural networks?

    Summary

    Further reading

    2

    Graph Theory for Graph Neural Networks

    Technical requirements

    Introducing graph properties

    Directed graphs

    Weighted graphs

    Connected graphs

    Types of graphs

    Discovering graph concepts

    Fundamental objects

    Graph measures

    Adjacency matrix representation

    Exploring graph algorithms

    Breadth-first search

    Depth-first search

    Summary

    3

    Creating Node Representations with DeepWalk

    Technical requirements

    Introducing Word2Vec

    CBOW versus skip-gram

    Creating skip-grams

    The skip-gram model

    DeepWalk and random walks

    Implementing DeepWalk

    Summary

    Further reading

    Part 2: Fundamentals

    4

    Improving Embeddings with Biased Random Walks in Node2Vec

    Technical requirements

    Introducing Node2Vec

    Defining a neighborhood

    Introducing biases in random walks

    Implementing Node2Vec

    Building a movie RecSys

    Summary

    Further reading

    5

    Including Node Features with Vanilla Neural Networks

    Technical requirements

    Introducing graph datasets

    The Cora dataset

    The Facebook Page-Page dataset

    Classifying nodes with vanilla neural networks

    Classifying nodes with vanilla graph neural networks

    Summary

    Further reading

    6

    Introducing Graph Convolutional Networks

    Technical requirements

    Designing the graph convolutional layer

    Comparing graph convolutional and graph linear layers

    Predicting web traffic with node regression

    Summary

    Further reading

    7

    Graph Attention Networks

    Technical requirements

    Introducing the graph attention layer

    Linear transformation

    Activation function

    Softmax normalization

    Multi-head attention

    Improved graph attention layer

    Implementing the graph attention layer in NumPy

    Implementing a GAT in PyTorch Geometric

    Summary

    Part 3: Advanced Techniques

    8

    Scaling Up Graph Neural Networks with GraphSAGE

    Technical requirements

    Introducing GraphSAGE

    Neighbor sampling

    Aggregation

    Classifying nodes on PubMed

    Inductive learning on protein-protein interactions

    Summary

    Further reading

    9

    Defining Expressiveness for Graph Classification

    Technical requirements

    Defining expressiveness

    Introducing the GIN

    Classifying graphs using GIN

    Graph classification

    Implementing the GIN

    Summary

    Further reading

    10

    Predicting Links with Graph Neural Networks

    Technical requirements

    Predicting links with traditional methods

    Heuristic techniques

    Matrix factorization

    Predicting links with node embeddings

    Introducing Graph Autoencoders

    Introducing VGAEs

    Implementing a VGAE

    Predicting links with SEAL

    Introducing the SEAL framework

    Implementing the SEAL framework

    Summary

    Further reading

    11

    Generating Graphs Using Graph Neural Networks

    Technical requirements

    Generating graphs with traditional techniques

    The Erdős–Rényi model

    The small-world model

    Generating graphs with graph neural networks

    Graph variational autoencoders

    Autoregressive models

    Generative adversarial networks

    Generating molecules with MolGAN

    Summary

    Further reading

    12

    Learning from Heterogeneous Graphs

    Technical requirements

    The message passing neural network framework

    Introducing heterogeneous graphs

    Transforming homogeneous GNNs to heterogeneous GNNs

    Implementing a hierarchical self-attention network

    Summary

    Further reading

    13

    Temporal Graph Neural Networks

    Technical requirements

    Introducing dynamic graphs

    Forecasting web traffic

    Introducing EvolveGCN

    Implementing EvolveGCN

    Predicting cases of COVID-19

    Introducing MPNN-LSTM

    Implementing MPNN-LSTM

    Summary

    Further reading

    14

    Explaining Graph Neural Networks

    Technical requirements

    Introducing explanation techniques

    Explaining GNNs with GNNExplainer

    Introducing GNNExplainer

    Implementing GNNExplainer

    Explaining GNNs with Captum

    Introducing Captum and integrated gradients

    Implementing integrated gradients

    Summary

    Further reading

    Part 4: Applications

    15

    Forecasting Traffic Using A3T-GCN

    Technical requirements

    Exploring the PeMS-M dataset

    Processing the dataset

    Implementing the A3T-GCN architecture

    Summary

    Further reading

    16

    Detecting Anomalies Using Heterogeneous GNNs

    Technical requirements

    Exploring the CIDDS-001 dataset

    Preprocessing the CIDDS-001 dataset

    Implementing a heterogeneous GNN

    Summary

    Further reading

    17

    Building a Recommender System Using LightGCN

    Technical requirements

    Exploring the Book-Crossing dataset

    Preprocessing the Book-Crossing dataset

    Implementing the LightGCN architecture

    Summary

    Further reading

    18

    Unlocking the Potential of Graph Neural Networks for Real-World Applications

    Index

    Other Books You May Enjoy

    Preface

    In just ten years, Graph Neural Networks (GNNs) have become an essential and popular deep learning architecture. They have already had a significant impact on various industries, such as drug discovery, where a GNN predicted a new antibiotic named halicin, and mapping, where GNNs have improved estimated time of arrival calculations on Google Maps. Tech companies and universities are exploring the potential of GNNs in various applications, including recommender systems, fake news detection, and chip design. GNNs have enormous potential and many yet-to-be-discovered applications, making them a critical tool for solving global problems.

    In this book, we aim to provide a comprehensive and practical overview of the world of GNNs. We will begin by exploring the fundamental concepts of graph theory and graph learning and then delve into the most widely used and well-established GNN architectures. As we progress, we will also cover the latest advances in GNNs and introduce specialized architectures that are designed to tackle specific tasks, such as graph generation, link prediction, and more.

    In addition to these specialized chapters, we will provide hands-on experience through three practical projects. These projects will cover critical real-world applications of GNNs, including traffic forecasting, anomaly detection, and recommender systems. Through these projects, you will gain a deeper understanding of how GNNs work and also develop the skills to implement them in practical scenarios.

    Finally, this book provides a hands-on learning experience with readable code for every chapter’s techniques and relevant applications, which are readily accessible on GitHub and Google Colab.

    By the end of this book, you will have a comprehensive understanding of the field of graph learning and GNNs and will be well-equipped to design and implement these models for a wide range of applications.

    Who this book is for

    This book is intended for individuals interested in learning about GNNs and how they can be applied to various real-world problems. It is ideal for data scientists, machine learning engineers, and artificial intelligence (AI) professionals who want to gain practical experience in designing and implementing GNNs. Although written for readers with prior knowledge of deep learning and machine learning, the book provides a comprehensive introduction to the fundamental concepts of graph theory and graph learning for those new to the field. It will also be useful for researchers and students in computer science, mathematics, and engineering who want to expand their knowledge in this rapidly growing area of research.

    What this book covers

    Chapter 1, Getting Started with Graph Learning, provides a comprehensive introduction to GNNs, including their importance in modern data analysis and machine learning. The chapter starts by exploring the relevance of graphs as a representation of data and their widespread use in various domains. It then delves into the importance of graph learning, including different applications and techniques. Finally, the chapter focuses on the GNN architecture and highlights its unique features and performance compared to other methods.

    Chapter 2, Graph Theory for Graph Neural Networks, covers the basics of graph theory and introduces various types of graphs, including their properties and applications. This chapter also covers fundamental graph concepts such as the adjacency matrix, graph measures such as centrality, and graph algorithms such as Breadth-First Search (BFS) and Depth-First Search (DFS).
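
    To give a flavor of these concepts in code, here is a minimal sketch using the networkx library (used throughout the book); the graph itself is a hypothetical example:

    import networkx as nx

    # Build a small undirected graph and inspect its adjacency matrix
    G = nx.Graph([('A', 'B'), ('A', 'C'), ('B', 'D')])
    print(nx.adjacency_matrix(G).todense())

    # Traverse the graph breadth-first, starting from node 'A'
    print(list(nx.bfs_tree(G, 'A')))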

    Chapter 3, Creating Node Representations with DeepWalk, focuses on DeepWalk, a pioneer in applying machine learning to graph data. The main objective of the DeepWalk architecture is to generate node representations that other models can utilize for downstream tasks such as node classification. The chapter covers two key components of DeepWalk – Word2Vec and random walks – with a particular emphasis on the Word2Vec skip-gram model.
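
    The core recipe can be sketched in a few lines: generate uniform random walks, then train a skip-gram Word2Vec model on them. This is an illustrative sketch (walk counts and dimensions are arbitrary), not the book's exact implementation:

    import random
    import networkx as nx
    from gensim.models import Word2Vec

    def random_walk(G, node, length):
        # At each step, hop to a uniformly chosen neighbor
        walk = [str(node)]
        for _ in range(length):
            node = random.choice(list(G.neighbors(node)))
            walk.append(str(node))
        return walk

    G = nx.karate_club_graph()
    walks = [random_walk(G, node, 10) for node in G.nodes for _ in range(80)]

    # sg=1 selects the skip-gram variant used by DeepWalk
    model = Word2Vec(walks, vector_size=64, window=5, sg=1, min_count=1)
    print(model.wv['0'][:5])  # first components of node 0's embedding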

    Chapter 4, Improving Embeddings with Biased Random Walks in Node2Vec, focuses on the Node2Vec architecture, which is based on the DeepWalk architecture covered in the previous chapter. The chapter covers the modifications made to the random walk generation in Node2Vec and how to select the best parameters for a specific graph. The implementation of Node2Vec is compared to DeepWalk on Zachary’s Karate Club to highlight the differences between the two architectures. The chapter concludes with a practical application of Node2Vec, building a movie recommendation system.
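
    For reference, the node2vec library listed in this book's requirements exposes the two bias parameters directly; a minimal sketch with illustrative values (q < 1 favors depth-first-like exploration):

    import networkx as nx
    from node2vec import Node2Vec

    G = nx.karate_club_graph()

    # p penalizes returning to the previous node; q trades off
    # breadth-first against depth-first exploration
    node2vec = Node2Vec(G, dimensions=64, walk_length=30, num_walks=200, p=1.0, q=0.5)
    model = node2vec.fit(window=10, min_count=1)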

    Chapter 5, Including Node Features with Vanilla Neural Networks, explores the integration of additional information, such as node and edge features, into the graph embeddings to produce more accurate results. The chapter starts with a comparison of vanilla neural networks’ performance on node features only, treated as tabular datasets. Then, we will experiment with adding topological information to the neural networks, leading to the creation of a simple vanilla GNN architecture.
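
    The vanilla graph layer boils down to multiplying node features by the adjacency matrix before the usual linear transformation, so each node aggregates its neighbors' features. A minimal PyTorch sketch of this idea (not the book's exact implementation):

    import torch
    from torch import nn

    class VanillaGNNLayer(nn.Module):
        def __init__(self, dim_in, dim_out):
            super().__init__()
            self.linear = nn.Linear(dim_in, dim_out, bias=False)

        def forward(self, x, adjacency):
            # A @ (X W): sum the transformed features of each node's neighbors
            return adjacency @ self.linear(x)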

    Chapter 6, Introducing Graph Convolutional Networks, focuses on the Graph Convolutional Network (GCN) architecture and its importance as a blueprint for GNNs. It covers the limitations of previous vanilla GNN layers and explains the motivation behind GCNs. The chapter details how the GCN layer works, its performance improvements over the vanilla GNN layer, and its implementation on the Cora and Facebook Page-Page datasets using PyTorch Geometric. The chapter also touches upon the task of node regression and the benefits of transforming tabular data into a graph.
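
    In PyTorch Geometric, this layer is available as GCNConv; a minimal two-layer model for node classification might look as follows (dimensions are illustrative):

    import torch.nn.functional as F
    from torch import nn
    from torch_geometric.nn import GCNConv

    class GCN(nn.Module):
        def __init__(self, dim_in, dim_h, dim_out):
            super().__init__()
            self.conv1 = GCNConv(dim_in, dim_h)
            self.conv2 = GCNConv(dim_h, dim_out)

        def forward(self, x, edge_index):
            h = F.relu(self.conv1(x, edge_index))
            # Log-probabilities over classes for every node
            return F.log_softmax(self.conv2(h, edge_index), dim=1)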

    Chapter 7, Graph Attention Networks, focuses on Graph Attention Networks (GATs), which are an improvement over GCNs. The chapter explains how GATs work by using the concept of self-attention and provides a step-by-step understanding of the graph attention layer. The chapter also implements a graph attention layer from scratch using NumPy. The final section of the chapter discusses the use of a GAT on two node classification datasets, Cora and CiteSeer, and compares the accuracy with that of a GCN.
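
    PyTorch Geometric implements this layer as GATConv, including multi-head attention; a minimal sketch (the number of heads and dimensions are illustrative):

    import torch.nn.functional as F
    from torch import nn
    from torch_geometric.nn import GATConv

    class GAT(nn.Module):
        def __init__(self, dim_in, dim_h, dim_out, heads=8):
            super().__init__()
            # The hidden layer concatenates the outputs of all heads
            self.gat1 = GATConv(dim_in, dim_h, heads=heads)
            self.gat2 = GATConv(dim_h * heads, dim_out, heads=1)

        def forward(self, x, edge_index):
            h = F.elu(self.gat1(x, edge_index))
            return F.log_softmax(self.gat2(h, edge_index), dim=1)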

    Chapter 8, Scaling up Graph Neural Networks with GraphSAGE, focuses on the GraphSAGE architecture and its ability to handle large graphs effectively. The chapter covers the two main ideas behind GraphSAGE, including its neighbor sampling technique and aggregation operators. You will learn about the variants proposed by tech companies such as Uber Eats and Pinterest, as well as the benefits of GraphSAGE’s inductive approach. The chapter concludes by implementing GraphSAGE for node classification and multi-label classification tasks.
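
    Both ideas map directly onto PyTorch Geometric: SAGEConv implements the aggregation and NeighborLoader performs neighbor sampling. A sketch with illustrative sampling sizes:

    from torch_geometric.datasets import Planetoid
    from torch_geometric.loader import NeighborLoader
    from torch_geometric.nn import SAGEConv

    data = Planetoid(root='.', name='PubMed')[0]

    # Sample at most 10 neighbors at the first hop and 5 at the second,
    # for mini-batches of 16 training nodes
    loader = NeighborLoader(data, num_neighbors=[10, 5], batch_size=16,
                            input_nodes=data.train_mask)

    conv = SAGEConv(data.num_features, 64)  # mean aggregator by default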

    Chapter 9, Defining Expressiveness for Graph Classification, explores the concept of expressiveness in GNNs and how it can be used to design better models. It introduces the Weisfeiler-Leman (WL) test, which provides the framework for understanding expressiveness in GNNs. The chapter uses the WL test to compare different GNN layers and determine the most expressive one. Based on this result, a more powerful GNN is designed and implemented using PyTorch Geometric. The chapter concludes with a comparison of different methods for graph classification on the PROTEINS dataset.
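
    In PyTorch Geometric, the resulting layer is GINConv, which wraps a learnable MLP; combined with global pooling, it yields a graph-level representation. A minimal sketch (feature sizes are illustrative):

    from torch import nn
    from torch_geometric.nn import GINConv, global_add_pool

    # GIN aggregates neighbors through an MLP to stay injective
    mlp = nn.Sequential(nn.Linear(7, 64), nn.ReLU(), nn.Linear(64, 64))
    conv = GINConv(mlp)

    def embed_graph(x, edge_index, batch):
        h = conv(x, edge_index)
        # Sum pooling preserves the injectivity argument from the WL test
        return global_add_pool(h, batch)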

    Chapter 10, Predicting Links with Graph Neural Networks, focuses on link prediction in graphs. It covers traditional techniques, such as matrix factorization, as well as GNN-based methods. The chapter explains the concept of link prediction and its importance in social networks and recommender systems. You will learn about the limitations of traditional techniques and the benefits of using GNN-based methods. We will explore three GNN-based techniques from two different families: node embeddings and subgraph representation. Finally, you will implement various link prediction techniques in PyTorch Geometric and choose the best method for a given problem.
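
    The simplest embedding-based scorer is a dot product between two node embeddings squashed through a sigmoid, which is essentially the decoder of a graph autoencoder. A toy sketch with hypothetical embeddings:

    import torch

    # z: (num_nodes, dim) embeddings produced by any encoder (random here)
    z = torch.randn(100, 64)

    def link_score(z, i, j):
        # Estimated probability that an edge exists between nodes i and j
        return torch.sigmoid((z[i] * z[j]).sum())

    print(link_score(z, 0, 1))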

    Chapter 11, Generating Graphs Using Graph Neural Networks, explores the field of graph generation, which involves finding methods to create new graphs. The chapter first introduces you to traditional techniques such as Erdős–Rényi and small-world models. Then you will focus on three families of solutions for GNN-based graph generation: VAE-based, autoregressive, and GAN-based models. The chapter concludes with an implementation of a GAN-based framework with Reinforcement Learning (RL) to generate new chemical compounds using the DeepChem library with TensorFlow.
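
    The two traditional baselines are one-liners in networkx; a quick sketch with illustrative parameters:

    import networkx as nx

    # Erdős–Rényi: each possible edge appears independently with probability p
    er = nx.erdos_renyi_graph(n=50, p=0.1)

    # Watts–Strogatz small-world: a ring lattice with k neighbors per node,
    # where each edge is rewired with probability p
    sw = nx.watts_strogatz_graph(n=50, k=4, p=0.1)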

    Chapter 12, Learning from Heterogeneous Graphs, focuses on heterogeneous GNNs. Heterogeneous graphs contain different types of nodes and edges, in contrast to homogeneous graphs, which only involve one type of node and one type of edge. The chapter begins by reviewing the Message Passing Neural Network (MPNN) framework for homogeneous GNNs, then expands the framework to heterogeneous networks. Finally, we introduce techniques for creating a heterogeneous dataset and transforming homogeneous architectures into heterogeneous ones, and discuss an architecture specifically designed for processing heterogeneous networks.
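
    PyTorch Geometric can perform this homogeneous-to-heterogeneous transformation automatically with to_hetero. A sketch, assuming data is a HeteroData object whose node and edge types are described by data.metadata():

    from torch import nn
    from torch_geometric.nn import SAGEConv, to_hetero

    class GNN(nn.Module):
        def __init__(self, dim_h, dim_out):
            super().__init__()
            # (-1, -1) lets PyG infer the input size of each node type lazily
            self.conv1 = SAGEConv((-1, -1), dim_h)
            self.conv2 = SAGEConv((-1, -1), dim_out)

        def forward(self, x, edge_index):
            h = self.conv1(x, edge_index).relu()
            return self.conv2(h, edge_index)

    # Replicate the layers per edge type and sum messages across types
    model = to_hetero(GNN(64, 4), data.metadata(), aggr='sum')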

    Chapter 13, Temporal Graph Neural Networks, focuses on Temporal GNNs, or Spatio-Temporal GNNs, which are a type of GNN that can handle graphs with changing edges and features over time. The chapter first explains the concept of dynamic graphs and the applications of temporal GNNs, focusing on time series forecasting. The chapter then moves on to the application of temporal GNNs to web traffic forecasting to improve results using temporal information. Finally, the chapter describes another temporal GNN architecture specifically designed for dynamic graphs and applies it to the task of epidemic forecasting.

    Chapter 14, Explaining Graph Neural Networks, covers various techniques to better understand the predictions and behavior of a GNN model. The chapter highlights two popular explanation methods: GNNExplainer and integrated gradients. Then, you will see the application of these techniques on a graph classification task using the MUTAG dataset and a node classification task using the Twitch social network.
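
    For orientation, this is roughly how GNNExplainer is invoked in PyTorch Geometric 2.0.x, the version pinned for this chapter; treat it as a sketch (model, x, and edge_index are assumed to come from a trained setup), since the API moved in later releases:

    from torch_geometric.nn import GNNExplainer

    # model: a trained GNN; x, edge_index: the graph it was trained on
    explainer = GNNExplainer(model, epochs=100)
    node_feat_mask, edge_mask = explainer.explain_node(10, x, edge_index)
    # The masks indicate which features and edges drove node 10's prediction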

    Chapter 15, Forecasting Traffic Using A3T-GCN, focuses on the application of Temporal Graph Neural Networks in the field of traffic forecasting. It highlights the importance of accurate traffic forecasts in smart cities and the challenges of traffic forecasting due to complex spatial and temporal dependencies. The chapter covers the steps involved in processing a new dataset to create a temporal graph and the implementation of a new type of temporal GNN to predict future traffic speed. Finally, the results are compared to a baseline solution to verify the relevance of the architecture.
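
    The architecture is available in the torch-geometric-temporal library listed in this book's requirements; a hedged sketch, assuming one input feature per node observed over a window of 12 periods (all sizes illustrative):

    import torch
    from torch_geometric_temporal.nn.recurrent import A3TGCN

    # One feature per node over 12 periods, mapped to 32 output channels
    model = A3TGCN(in_channels=1, out_channels=32, periods=12)

    x = torch.randn(228, 1, 12)                  # (num_nodes, features, periods)
    edge_index = torch.tensor([[0, 1], [1, 0]])  # toy connectivity
    h = model(x, edge_index)                     # (num_nodes, out_channels)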

    Chapter 16, Detecting Anomalies Using Heterogeneous GNNs, focuses on the application of GNNs in anomaly detection. Their ability to capture complex relationships makes GNNs well-suited for detecting anomalies, and they can handle large amounts of data efficiently. In this chapter, you will learn how to implement a GNN for intrusion detection in computer networks using the CIDDS-001 dataset. The chapter covers processing the dataset, building relevant features, implementing a heterogeneous GNN, and evaluating the results to determine its effectiveness in detecting anomalies in network traffic.

    Chapter 17, Building a Recommender System Using LightGCN, focuses on the application of GNNs in recommender systems. The goal of recommender systems is to provide personalized recommendations to users based on their interests and past interactions. GNNs are well-suited for this task as they can effectively incorporate complex relationships between users and items. In this chapter, the LightGCN architecture is introduced as a GNN specifically designed for recommender systems. Using the Book-Crossing dataset, the chapter demonstrates how to build a book recommender system with collaborative filtering using the LightGCN architecture.
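
    The distinguishing feature of LightGCN is that its convolution is a bare normalized neighborhood average, with no weight matrix or activation; PyTorch Geometric exposes it as LGConv. A toy sketch with hypothetical embeddings:

    import torch
    from torch_geometric.nn import LGConv

    conv = LGConv()  # parameter-free propagation over the normalized adjacency

    # Hypothetical embeddings and edges for a tiny user-item graph
    x = torch.randn(6, 16)
    edge_index = torch.tensor([[0, 1, 2, 3], [3, 4, 5, 0]])

    # LightGCN averages the embeddings produced at each propagation step
    h1 = conv(x, edge_index)
    h2 = conv(h1, edge_index)
    final = (x + h1 + h2) / 3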

    Chapter 18, Unlocking the Potential of Graph Neural Networks for Real-World Applications, summarizes what we have learned throughout the book and looks ahead to the future of GNNs.

    To get the most out of this book

    To maximize your learning experience, you should have a basic understanding of graph theory and of machine learning concepts such as supervised and unsupervised learning and the training and evaluation of models. Familiarity with deep learning frameworks such as PyTorch will also be useful, although not essential, as the book provides a comprehensive introduction to the mathematical concepts and their implementation.

    To install Python 3.8.15, you can download it from the official Python website: https://www.python.org/downloads/. We strongly recommend using a virtual environment, such as venv or conda.

    Optionally, if you want to use a Graphics Processing Unit (GPU) from NVIDIA to accelerate training and inference, you will need to install CUDA and cuDNN:

    CUDA is a parallel computing platform and API developed by NVIDIA for general computing on GPUs. To install CUDA, you can follow the instructions on the NVIDIA website: https://developer.nvidia.com/cuda-downloads.

    cuDNN is a library developed by NVIDIA, which provides highly optimized GPU implementations of primitives for deep learning algorithms. To install cuDNN, you need to create an account on the NVIDIA website and download the library from the cuDNN download page: https://developer.nvidia.com/cudnn.

    You can check out the list of CUDA-enabled GPU products on the NVIDIA website: https://developer.nvidia.com/cuda-gpus.

    To install PyTorch 1.13.1, you can follow the instructions on the official PyTorch website: https://pytorch.org/. You can choose the installation method that is most appropriate for your system (including CUDA and cuDNN).

    To install PyTorch Geometric 2.2.0, you can follow the instructions in the GitHub repository: https://pytorch-geometric.readthedocs.io/en/2.2.0/notes/installation.html. You will need to have PyTorch installed on your system first.

    Chapter 11 requires TensorFlow 2.4. To install it, you can follow the instructions on the official TensorFlow website: https://www.tensorflow.org/install. You can choose the installation method that is most appropriate for your system and the version of TensorFlow you want to use.

    Chapter 14 requires an older version of PyTorch Geometric (version 2.0.4). It is recommended to create a specific virtual environment for this chapter.

    Chapter 15, Chapter 16, and Chapter 17 require a high GPU memory usage. You can lower it by decreasing the size of the training set in the code.

    Other Python libraries are required in some or most chapters. You can install them using pip install <package>==<version> (an example command follows the list below), or using another installer depending on your configuration (such as conda). Here is the complete list of required packages with the corresponding versions:

    pandas==1.5.2

    gensim==4.3.0

    networkx==2.8.8

    matplotlib==3.6.3

    node2vec==0.4.6

    seaborn==0.12.2

    scikit-learn==1.2.0

    deepchem==2.7.1

    torch-geometric-temporal==0.54.0

    captum==0.6.0
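
    For example, the first few packages can be installed with pinned versions as follows; the same pattern applies to the rest of the list:

    pip install pandas==1.5.2 gensim==4.3.0 networkx==2.8.8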

    The complete list of requirements is available on GitHub at https://github.com/PacktPublishing/Hands-On-Graph-Neural-Networks-Using-Python. Alternatively, you can directly import notebooks in Google Colab at https://colab.research.google.com.

    If you are using the digital version of this book, we advise you to type the code yourself or access the code from the book’s GitHub repository (a link is available in the next section). Doing so will help you avoid any potential errors related to the copying and pasting of code.

    Download the example code files

    You can download the example code files for this book from GitHub at https://github.com/PacktPublishing/Hands-On-Graph-Neural-Networks-Using-Python. If there’s an update to the code, it will be updated in the GitHub repository.

    We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

    Download the color images

    We also provide a PDF file that has color images of the screenshots and diagrams used in this book. You can download it here: https://packt.link/gaFU6.

    Conventions used

    There are a number of text conventions used throughout this book.

    Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: We initialize two lists (visited and queue) and add the starting node.

    A block of code is set as follows:

    import networkx as nx

    # A directed graph: each tuple adds an edge from a parent to a child node
    DG = nx.DiGraph()
    DG.add_edges_from([('A', 'B'), ('A', 'C'), ('B', 'D'),
                       ('B', 'E'), ('C', 'F'), ('C', 'G')])

    Tips or important notes

    Appear like this.
