Dynamic Causal Discovery

Ulrich Schaechtle

Department of Computer Science

Research output: Thesis › Doctoral Thesis

Abstract

Discovering rich causal models computationally can be key to creating human-like artificial intelligence. Recent research has detached the notion of time from approaches to causal inference and has thus obscured the modelling of, and inference over, dynamic dependencies in causal systems. This thesis investigates how prior knowledge of temporal and dynamic dependencies, even if conceptually unrelated to causal inference, informs both modelling and inference over causal relations. Three novel methods are introduced incrementally, and their contributions are positioned in the wider area of causal discovery. The first method discovers causal relations within high-dimensional tensor data of the kind typically recorded in non-experimental databases. It allows numerous dimensions of the data, such as samples, time and domain variables, to be included simultaneously in the analysis by construing them as tensors. It relies on dynamically changing noise but does not model that noise explicitly. The second method handles these changing noise levels, a phenomenon known as heteroskedasticity, explicitly, interpreting them via a set of functional equations. This method not only exploits changing noise levels but also simplifies the assumptions made for causal discovery. However, because heteroskedastic noise is expected, it requires latent structural relations and latent noise variables that produce the heteroskedasticity. Learning such latent concepts in turn calls for the discovery of more expressive models. The third method addresses the discovery of more complex models by introducing time as an observed entity in the system, and it builds on probabilistic programming. gpmem, a probabilistic programming technique that uses Gaussian Processes, is proposed here to provide a statistical alternative to memoization. We test all three methods on synthetic and real-world data. The real-world data sets span a variety of domains, for example healthcare, social sciences and biology. For these data sets we achieve higher accuracy for causal discovery and more expressiveness than the current state of the art, using adequate and recent benchmarks for comparison.
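
The idea of a statistical alternative to memoization can be illustrated with a minimal sketch. It is not the thesis's gpmem implementation: the class name `GPMemo`, its methods, the squared-exponential kernel and the zero-mean, unit-variance prior are all assumptions chosen for illustration. An ordinary memo table only returns values for inputs it has already seen, whereas this wrapper also gives a Gaussian Process estimate, with uncertainty, for unseen inputs.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel between two 1-D arrays of inputs."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)


class GPMemo:
    """Hypothetical GP-backed cache around an expensive scalar function."""

    def __init__(self, func, length_scale=1.0, jitter=1e-8):
        self.func = func
        self.length_scale = length_scale
        self.jitter = jitter  # small diagonal term for numerical stability
        self.xs = []  # inputs evaluated so far
        self.ys = []  # corresponding exact outputs

    def evaluate(self, x):
        """Call the real function and store the result, like a memo table."""
        y = self.func(x)
        self.xs.append(float(x))
        self.ys.append(float(y))
        return y

    def predict(self, x):
        """Return the GP posterior mean and variance at x without calling func."""
        if not self.xs:
            return 0.0, 1.0  # assumed zero-mean, unit-variance prior
        train_x = np.array(self.xs)
        train_y = np.array(self.ys)
        query = np.array([float(x)])
        k_tt = rbf_kernel(train_x, train_x, self.length_scale)
        k_tt += self.jitter * np.eye(len(train_x))
        k_qt = rbf_kernel(query, train_x, self.length_scale)
        mean = k_qt @ np.linalg.solve(k_tt, train_y)
        var = 1.0 - k_qt @ np.linalg.solve(k_tt, k_qt.T)
        return float(mean[0]), max(float(var[0, 0]), 0.0)


memo = GPMemo(np.sin)
for point in (0.0, 1.0, 2.0):
    memo.evaluate(point)

near_mean, near_var = memo.predict(1.0)   # an input that was evaluated
far_mean, far_var = memo.predict(10.0)    # an input far from all evaluations
```

At an input that has already been evaluated, the prediction matches the stored value with near-zero variance, which is the memoization behaviour. Far from any evaluated input, the variance returns to the prior, signalling that the real function should be called.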

Original language	English
Qualification	Ph.D.
Awarding Institution	Royal Holloway, University of London
Supervisors/Advisors	Stathis, Kostas, Supervisor
Award date	1 Feb 2016
Publication status	Unpublished - 2016

Keywords

Causality
Gaussian Processes
Causal Discovery
Probabilistic Programming

Cite this

@phdthesis{cd3d3473702f41ff921d94af862c6aad,

title = "Dynamic Causal Discovery",

keywords = "Causality, Gaussian Processes, Causal Discovery, Probabilistic Programming",

author = "Ulrich Schaechtle",

year = "2016",

language = "English",

school = "Royal Holloway, University of London",

}

TY - BOOK

T1 - Dynamic Causal Discovery

AU - Schaechtle, Ulrich

PY - 2016

Y1 - 2016

KW - Causality

KW - Gaussian Processes

KW - Causal Discovery

KW - Probabilistic Programming

UR - http://www.schaechtle.com/

M3 - Doctoral Thesis

ER -
