---
title: "User guide for the noisysbmGGM package"
author: "Valentin Kilian & Fanny Villers"
date: "04/03/2024"
output:
  rmarkdown::html_vignette:
    toc: true
    toc_depth: 2
    number_sections: true
vignette: >
  %\VignetteIndexEntry{User guide for the noisysbmGGM package}
  \usepackage[utf8]{inputenc}
  %\VignetteEngine{knitr::rmarkdown}
editor_options:
  markdown:
    wrap: sentence
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

`noisysbmGGM` is an R package designed to perform graph inference from noisy data, and in particular to infer Gaussian Graphical Models (GGMs).
The package accompanies the article *Enhancing the Power of Gaussian Graphical Model Inference by Modeling the Graph Structure* by Kilian, Rebafka and Villers, available at .
The main goal of the package is to help users analyze complex networks and draw meaningful conclusions from noisy data.

# Introduction

The package offers two key functionalities:

1.  Generation of data according to the Noisy Stochastic Block Model (NSBM) and NSBM inference: the package provides functions to generate data in the noisy stochastic block model, a statistical model suited to situations where interactions between entities are described only through noisy information.
    The package includes a greedy algorithm to fit the model parameters and cluster the nodes, and a multiple testing procedure to detect significant interactions.
2.  Gaussian Graphical Model (GGM) inference: given a sample of a Gaussian vector, the package infers the GGM that encodes the direct relationships between the variables.

## What kind of data?

### Graph inference in the NSBM

We observe a noisy version of a graph between $p$ nodes, typically a $(p,p)$ matrix containing the values of a test statistic applied to all pairs of nodes.
The aim is to detect the significant edges in the graph while controlling the number of false discoveries.
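To make this data format concrete, the following base-R sketch (it does not call any `noisysbmGGM` function) simulates a symmetric $(p,p)$ matrix of test statistics. Pairs without an edge are $\mathcal{N}(0,1)$, while pairs belonging to a hypothetical true graph have their mean shifted. The block of five fully connected nodes and the shift of 3 are arbitrary illustrative choices. As a naive baseline, the sketch then applies the Benjamini-Hochberg procedure to the $p(p-1)/2$ one-sided $p$-values. This baseline ignores the graph structure and is not the procedure implemented in the package.

```{r}
# Toy illustration only (not the noisysbmGGM API).
set.seed(1)
p <- 20

# Hypothetical true graph: nodes 1..5 form a fully connected block, no self-loops
A <- matrix(0L, p, p)
A[1:5, 1:5] <- 1L
diag(A) <- 0L

# Noisy observations on the upper triangle: N(0, 1) for non-edges,
# N(3, 1) for edges (arbitrary shift), then symmetrised
up <- upper.tri(A)
X <- matrix(0, p, p)
X[up] <- rnorm(sum(up), mean = 3 * A[up], sd = 1)
X <- X + t(X)

# Naive baseline: one-sided p-values for each pair, Benjamini-Hochberg at level 0.1
pval <- pnorm(X[up], lower.tail = FALSE)
rejected <- p.adjust(pval, method = "BH") <= 0.1

# Estimated adjacency matrix, symmetrised in the same way
A_hat <- matrix(0L, p, p)
A_hat[up] <- as.integer(rejected)
A_hat <- A_hat + t(A_hat)
```

According to the introduction above, the package instead fits an NSBM to such a matrix, clustering the nodes, and relies on that fit in its multiple testing procedure.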
### GGM inference

In the GGM inference context, we observe an $n$-sample of a Gaussian vector of dimension $p$, and the goal is to infer the GGM that captures the direct relationships between the variables.
The GGM inference function starts by computing a $(p,p)$ matrix of well-chosen test statistics, each testing the conditional dependence between a pair of variables.
The same procedure as for the NSBM is then applied to this matrix.

## Link with the `noisySBM` package

`noisysbmGGM` improves on certain aspects of the earlier `noisySBM` package: its new greedy algorithm fits the model parameters more efficiently than the VEM algorithm used previously.
Moreover, the package introduces applications to GGM inference.

To start with, we load the package:

```{r}
library(noisysbmGGM)
```

# Noisy stochastic block model

## The model

The **noisy stochastic block model (NSBM)** is a random graph model suited to the problem of graph inference.
In this model, we assume that the observed matrix is a perturbation of an unobserved binary graph, which is the *true* graph to be inferred.
The binary graph is chosen to be a stochastic block model (SBM) for its capacity to model graph heterogeneity.

The definition is stated for an undirected graph without self-loops, but extensions are straightforward.
We also consider only the **Gaussian NSBM**, but this model can be extended to any parametric model.

Let $p\geq 2$ be the number of nodes and $\mathcal{A}=\{(i,j): 1\leq i