Bowowzahoya/gephi

gephi

Package for creating Gephi nodes and edges .csv/.xlsx files from scientific publication/patent data, as well as analyzing clusters generated by Gephi's Leiden algorithm.

Currently works with Scopus, Lens scholarly and Lens patent exports.

Background

Gephi can create force-directed graphs of nodes and edges. When using keywords, scientific journals or patent categories, this can be used to distill clusters of keywords that define an area of research/R&D.

Clusters can be identified by eye, or with the use of the Leiden algorithm available in Gephi.

See the example/ folder for an example of such a graph.

When clusters have been identified in Gephi, this module can be used to analyze the clusters and how well keywords fit within the clusters.

Nodes and Edges Generation

Usage

The main function is get_nodes_edges(). Nodes and edges can also be generated separately with get_nodes() and get_edges().

Pass a dictionary or Series as 'limited_node_sizes' to supply the true size of a node when its export was truncated by the database's export limit (Scopus: 20,000 papers, Lens: 50,000 patents/papers).

The flag 'includes_internal_similarity' additionally calculates the internal similarity within each node; this takes somewhat longer.

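import pandas as pd
# get_nodes is provided by this package; import it according to how the package is installed.
# True size of the "rna" node, whose export was cut off at the database limit
limited_node_sizes = pd.Series({"rna":102890})
# One export file per node
filepaths_of_exports = ["res/rna.csv", "res/crispr_cas.csv"]
nodes = get_nodes(filepaths_of_exports, limited_node_sizes=limited_node_sizes, includes_internal_similarity=True)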

To use Scopus export files, set database="scopus" (the default). To use Lens export files (patent or scholarly; both work), set database="lens".

edges = get_edges(filepaths_of_exports, includes_internal_similarity=True, database="lens")

Analysis

Export the nodes table, with clusters indicated, from Gephi: open the "Data Laboratory" tab and choose "Export table".
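Once the table is exported, a quick sanity check is to load it with pandas and count the nodes per cluster. The snippet below is a minimal sketch, not part of this package: the column names "Id", "Label" and "Cluster" and the helper load_gephi_nodes() are assumptions for illustration, so check the header of your own export.

```python
import io

import pandas as pd

# Toy stand-in for a nodes table exported from Gephi's Data Laboratory.
# The column names ("Id", "Label", "Cluster") are assumptions.
GEPHI_NODES_CSV = """Id,Label,Cluster
rna,rna,0
crispr_cas,crispr_cas,0
graphene,graphene,1
"""


def load_gephi_nodes(source, cluster_col="Cluster"):
    """Read an exported nodes table and check that it has a cluster column."""
    nodes = pd.read_csv(source)
    if cluster_col not in nodes.columns:
        raise KeyError(f"no '{cluster_col}' column; was clustering run in Gephi?")
    return nodes


# A file path works here as well; StringIO keeps the sketch self-contained.
nodes = load_gephi_nodes(io.StringIO(GEPHI_NODES_CSV))

# Number of nodes assigned to each cluster
cluster_sizes = nodes.groupby("Cluster")["Id"].count()
print(cluster_sizes)
```

If the cluster column is missing, clustering was probably not run (or not saved) before the export.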

Clustering Functionalities

Usage

There are two functions:

clusters = get_cluster_info(nodes, edges)

This function gathers the clusters from exported nodes and edges on which the Leiden algorithm has been run in Gephi. It then adds extra information per cluster: internal similarity, internal mean edge weight, minimum and maximum size (assuming full or no overlap between nodes, respectively), and mean edge weights to other clusters.

This information helps in judging how coherent and how well separated the clusters are.
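To make the cluster-level edge statistics concrete, here is a rough sketch of how mean edge weights within and between clusters could be computed with pandas. It is not the implementation of get_cluster_info(): the column names ("Id", "Cluster", "Source", "Target", "Weight"), the toy data and the helper mean_weights_between_clusters() are assumptions, and the mean is taken over existing edges only.

```python
import pandas as pd

# Toy nodes and edges tables; column names are assumptions, not the package's schema.
nodes = pd.DataFrame({"Id": ["a", "b", "c", "d"], "Cluster": [0, 0, 1, 1]})
edges = pd.DataFrame({
    "Source": ["a", "a", "c", "b"],
    "Target": ["b", "c", "d", "d"],
    "Weight": [0.8, 0.2, 0.6, 0.4],
})


def mean_weights_between_clusters(nodes, edges):
    """Mean edge weight for every pair of clusters that shares at least one edge."""
    cluster_of = nodes.set_index("Id")["Cluster"]
    pairs = pd.DataFrame({
        "source_cluster": edges["Source"].map(cluster_of),
        "target_cluster": edges["Target"].map(cluster_of),
    })
    # Treat edges as undirected: order each pair so (0, 1) and (1, 0) coincide.
    cluster_a = pairs.min(axis=1).rename("cluster_a")
    cluster_b = pairs.max(axis=1).rename("cluster_b")
    return edges.groupby([cluster_a, cluster_b])["Weight"].mean()


cluster_weights = mean_weights_between_clusters(nodes, edges)
print(cluster_weights)
```

Entries where cluster_a equals cluster_b are internal mean weights; the remaining entries are mean weights between two different clusters.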

nodes_with_info = get_cluster_info_nodes(nodes, edges)

This function adds extra information to exported nodes on which the Leiden algorithm has been run in Gephi, such as the mean edge weight of each node to the rest of its own cluster and to other clusters.

See the example/ folder for usage.
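The per-node idea can be sketched in the same way: for every node, compare the mean weight of its edges into its own cluster with the mean weight of its edges into other clusters. This is an illustrative approximation under assumed column names ("Id", "Cluster", "Source", "Target", "Weight"), with a hypothetical helper node_mean_weights(); it is not the code behind get_cluster_info_nodes().

```python
import pandas as pd

# Toy nodes and edges tables; column names are assumptions, not the package's schema.
nodes = pd.DataFrame({"Id": ["a", "b", "c", "d"], "Cluster": [0, 0, 1, 1]})
edges = pd.DataFrame({
    "Source": ["a", "a", "c", "b"],
    "Target": ["b", "c", "d", "d"],
    "Weight": [0.8, 0.2, 0.6, 0.4],
})


def node_mean_weights(nodes, edges):
    """Per node: mean edge weight to its own cluster and to all other clusters."""
    cluster_of = nodes.set_index("Id")["Cluster"]
    # List every undirected edge from both endpoints, so each node sees all its edges.
    forward = edges.rename(columns={"Source": "node", "Target": "neighbour"})
    backward = edges.rename(columns={"Source": "neighbour", "Target": "node"})
    both = pd.concat([forward, backward], ignore_index=True)
    both["same_cluster"] = both["node"].map(cluster_of) == both["neighbour"].map(cluster_of)
    table = both.groupby(["node", "same_cluster"])["Weight"].mean().unstack()
    return table.rename(columns={
        True: "mean_weight_own_cluster",
        False: "mean_weight_other_clusters",
    })


node_weights = node_mean_weights(nodes, edges)
print(node_weights)
```

A node whose mean weight to other clusters is close to, or higher than, its mean weight to its own cluster sits near a cluster boundary and may fit its assigned cluster poorly.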
