The Future of Scientific Literature: Papers Talking to Each Other

A NeuroLibre Case Study

Integrative literature demonstration

NeuroLibre Day | September 27, 2024 | Montreal, Canada

A meta-analysis on 5 ALBERTA (ALien Brain ExtRacTion Analytics) articles

NeuroxLink is a small Python package that parses mdast (the Markdown abstract syntax tree format that the MyST AST extends), so that article content served by MyST servers can be imported from one paper into another. It also adds some Plotly functionality for working with the data behind interactive figures!
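To make "parsing mdast" concrete: mdast represents a document as nested nodes, each with a `type` and optional `children`, with text stored in `value` fields. The sketch below walks a toy tree depth-first and collects heading summaries shaped like the `paper.headings` entries this notebook prints later. The tree, the helper names, and the assumption that MyST headings carry an `identifier` field are illustrative only; this is not NeuroxLink's implementation.

```python
# Hypothetical toy mdast-style tree; real MyST ASTs carry many more fields.
toy_tree = {
    "type": "root",
    "children": [
        {"type": "heading", "depth": 1, "identifier": "introduction",
         "children": [{"type": "text", "value": "Introduction"}]},
        {"type": "paragraph",
         "children": [{"type": "text", "value": "Aliens have brains."}]},
        {"type": "heading", "depth": 2, "identifier": "alienarity-index",
         "children": [{"type": "text", "value": "Alienarity "},
                      {"type": "emphasis",
                       "children": [{"type": "text", "value": "Index"}]}]},
    ],
}


def walk(node):
    """Yield every node in depth-first, document order."""
    yield node
    for child in node.get("children", []):
        yield from walk(child)


def node_text(node):
    """Concatenate the values of all text nodes at or below `node`."""
    return "".join(n.get("value", "") for n in walk(node) if n.get("type") == "text")


def collect_headings(tree):
    """Return one {depth, text, identifier} dict per heading node."""
    return [
        {"depth": n["depth"], "text": node_text(n), "identifier": n.get("identifier")}
        for n in walk(tree)
        if n.get("type") == "heading"
    ]


print(collect_headings(toy_tree))
```

Flattening nested inline nodes (like the `emphasis` above) is why the text is gathered recursively rather than read from the first child.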

pip install neuroxlink

# from neuroxlink.src.neuroxlink import NeuroxLink
from neuroxlink import NeuroxLink
import pandas as pd
import plotly.graph_objects as go
import plotly.express as px
from plotly.subplots import make_subplots
import plotly.io as pio
import numpy as np

# Emit figures as Plotly JSON MIME bundles so the front end can render them interactively
pio.renderers.default = "plotly_mimetype"

# Point NeuroxLink at the NeuroLibre CDN that serves the published papers
nlx = NeuroxLink(cdn_url="https://cdn.neurolibre.org")

# Define and create a list of DOIs and figures
doi1 = "10.55458/neurolibre.alberta1"
doi2 = "10.55458/neurolibre.alberta2"
doi3 = "10.55458/neurolibre.alberta3"
doi4 = "10.55458/neurolibre.alberta4"
doi5 = "10.55458/neurolibre.alberta5"
dois = [doi1, doi2, doi3, doi4, doi5]
figures = ['fig1', 'fig2', 'fig3']
nlx.import_papers(dois)
🔗 importing 10.55458/neurolibre.alberta1 from 🌎 https://cdn.neurolibre.org/content/alberta1/paper.json
The Role of Hippocampal Volume, Brain Density, and Network Efficiency in Alien Memory Function: ALien Brain ExtRacTion Analytics (ALBERTA) Consortium
-------------------------------------
🔗 importing 10.55458/neurolibre.alberta2 from 🌎 https://cdn.neurolibre.org/content/alberta2/paper.json
Size Matters, but So Does Connectivity: The Amygdala's Role in Emotional Intelligence: ALien Brain ExtRacTion Analytics (ALBERTA) Consortium
-------------------------------------
🔗 importing 10.55458/neurolibre.alberta3 from 🌎 https://cdn.neurolibre.org/content/alberta3/paper.json
More Than Just Words: Temporal Cortex Volume Correlates with Language Ability: ALien Brain ExtRacTion Analytics (ALBERTA) Consortium
-------------------------------------
🔗 importing 10.55458/neurolibre.alberta4 from 🌎 https://cdn.neurolibre.org/content/alberta4/paper.json
Navigating the Void: How Parietal Cortex Volume Predicts Spatial Orientation in Zero Gravity: ALien Brain ExtRacTion Analytics (ALBERTA) Consortium
-------------------------------------
🔗 importing 10.55458/neurolibre.alberta5 from 🌎 https://cdn.neurolibre.org/content/alberta5/paper.json
Attention is all you need, and a chunky prefrontal cortex: ALien Brain ExtRacTion Analytics (ALBERTA) Consortium
-------------------------------------
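The log above shows each DOI being resolved to a `paper.json` file on the CDN. The mapping can be sketched as below; it is inferred only from these five log lines, so `paper_json_url` is a hypothetical helper and not part of NeuroxLink's API, and real DOIs may not all follow this pattern.

```python
CDN_URL = "https://cdn.neurolibre.org"  # the same CDN passed to NeuroxLink above
DOI_PREFIX = "neurolibre."


def paper_json_url(doi, cdn_url=CDN_URL):
    """Guess where a NeuroLibre paper's paper.json lives (pattern inferred from the log)."""
    # "10.55458/neurolibre.alberta1" -> "neurolibre.alberta1"
    _, suffix = doi.split("/", 1)
    if not suffix.startswith(DOI_PREFIX):
        raise ValueError(f"not a NeuroLibre DOI: {doi}")
    slug = suffix[len(DOI_PREFIX):]  # -> "alberta1"
    return f"{cdn_url.rstrip('/')}/content/{slug}/paper.json"


for doi in ["10.55458/neurolibre.alberta1", "10.55458/neurolibre.alberta5"]:
    print(paper_json_url(doi))
```

A real importer would then download and parse that JSON; this sketch stops at building the URL so it runs without network access.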

2. Access some information about the first ALBERTA article

paper = nlx.papers[0]
print("\n------------> AUTHORS AND AFFILIATIONS")
print(paper.get_authors(), "\n------------> DEPENDENCIES")
print(paper.get_dependencies(), "\n------------> HEADINGS")
print(paper.headings)

------------> AUTHORS AND AFFILIATIONS
[{'nameParsed': {'literal': 'Alien Brain Consortium', 'given': 'Alien Brain', 'family': 'Consortium'}, 'name': 'Alien Brain Consortium', 'affiliations': ['Where aliens live'], 'id': 'contributors-myst-generated-uid-0'}, {'nameParsed': {'literal': 'Robo Neuro', 'given': 'Robo', 'family': 'Neuro'}, 'name': 'Robo Neuro', 'affiliations': ['Where robots live'], 'id': 'contributors-myst-generated-uid-1'}] 
------------> DEPENDENCIES
[{'url': '/alberta1/analysis', 'label': 'fig0cell', 'kind': 'Notebook', 'slug': 'analysis', 'location': '/content/analysis.ipynb'}] 
------------> HEADINGS
[{'depth': 1, 'text': 'Introduction', 'identifier': 'introduction'}, {'depth': 1, 'text': 'Methods', 'identifier': 'methods'}, {'depth': 1, 'text': 'Results', 'identifier': 'results'}, {'depth': 2, 'text': 'Alienarity Index', 'identifier': 'alienarity-index'}, {'depth': 2, 'text': 'Hippocampal Volume', 'identifier': 'hippocampal-volume'}, {'depth': 2, 'text': 'Brain Density', 'identifier': 'brain-density'}, {'depth': 2, 'text': 'Network Efficiency', 'identifier': 'network-efficiency'}, {'depth': 1, 'text': 'Conclusion', 'identifier': 'conclusion'}]
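Because the headings come back as plain dicts with a `depth`, they are easy to reshape. A minimal sketch, using the list printed above verbatim: render it as an indented outline and pull out the subsections of one heading. `outline` and `subsections` are hypothetical helpers written for this example.

```python
# The heading list printed above for the first ALBERTA paper
headings = [
    {'depth': 1, 'text': 'Introduction', 'identifier': 'introduction'},
    {'depth': 1, 'text': 'Methods', 'identifier': 'methods'},
    {'depth': 1, 'text': 'Results', 'identifier': 'results'},
    {'depth': 2, 'text': 'Alienarity Index', 'identifier': 'alienarity-index'},
    {'depth': 2, 'text': 'Hippocampal Volume', 'identifier': 'hippocampal-volume'},
    {'depth': 2, 'text': 'Brain Density', 'identifier': 'brain-density'},
    {'depth': 2, 'text': 'Network Efficiency', 'identifier': 'network-efficiency'},
    {'depth': 1, 'text': 'Conclusion', 'identifier': 'conclusion'},
]


def outline(headings, indent="  "):
    """Render heading dicts as an indented plain-text outline."""
    return "\n".join(f"{indent * (h['depth'] - 1)}- {h['text']}" for h in headings)


def subsections(headings, identifier):
    """Return the titles nested one level below the heading with `identifier`."""
    for i, h in enumerate(headings):
        if h["identifier"] != identifier:
            continue
        children = []
        for nxt in headings[i + 1:]:
            if nxt["depth"] <= h["depth"]:
                break  # reached a sibling or a higher-level heading
            if nxt["depth"] == h["depth"] + 1:
                children.append(nxt["text"])
        return children
    raise KeyError(identifier)


print(outline(headings))
print(subsections(headings, "results"))
# ['Alienarity Index', 'Hippocampal Volume', 'Brain Density', 'Network Efficiency']
```

In the notebook, `paper.headings` could presumably be passed in directly instead of the copied list.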

3. Alienarity Index (AI)

You cannot publish anything without AI these days, so here we go.

First, we check whether the article published any Plotly figures and, if so, capture their reference labels:

paper.inspect_plotly_figures()
These are the plotly figures I found:
-------------------------------------
- html-link [fig0] enumerated as (Figure 1)

- html-link [fig1] enumerated as (Figure 2)

- html-link [fig2] enumerated as (Figure 3)

- html-link [fig3] enumerated as (Figure 4)

Perfect! There are four Plotly figures, labeled fig0, fig1, fig2, and fig3.

Because we created these example “articles” ourselves, we know that each paper reports its AI as fig0.

Note that we are not performing any computation here. Instead, we take a chunk of the imported paper's structured data and turn it into a Plotly figure!
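Why no computation is needed: a Plotly figure is, at heart, a JSON document with a `data` list of traces plus a `layout`. The sketch below reads such a spec and summarizes each trace, assuming the structured chunk stored in a paper follows this standard schema; the toy spec and `summarize_traces` are hypothetical.

```python
import json

# Hypothetical figure spec; specs embedded in a real paper will carry more fields.
spec_json = """
{
  "data": [
    {"type": "scatter", "mode": "markers", "x": [1, 2, 3], "y": [4, 5, 6], "name": "AI"},
    {"type": "histogram", "x": [1, 1, 2]}
  ],
  "layout": {"title": {"text": "Alienarity Index"}}
}
"""


def summarize_traces(spec):
    """Return (trace type, number of points) for every trace in a Plotly figure spec."""
    summary = []
    for trace in spec.get("data", []):
        trace_type = trace.get("type", "scatter")  # Plotly treats a missing type as scatter
        n_points = len(trace.get("x") or trace.get("y") or [])
        summary.append((trace_type, n_points))
    return summary


spec = json.loads(spec_json)
print(summarize_traces(spec))  # [('scatter', 3), ('histogram', 3)]
```

Passing the parsed dict to `plotly.graph_objects.Figure`, or the JSON string to `plotly.io.from_json`, would rebuild an interactive figure from the same data without rerunning the original analysis.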

fig = paper.create_plotly_object_from('fig0')
fig.show()

Interesting... What about the AI of the second article? Let’s find out...

nlx.papers[doi2].create_plotly_object_from('fig0').show()

I wonder if there is something extraterrestrial going on here. Each AI has 6 points, but what message are they trying to send???

Let’s take a look at the third one...

nlx.papers[doi3].create_plotly_object_from('fig0').show()

👽 Okay, this is getting weird and I cannot picture what’s going on here.

Time to put these together and see whether AI has any insights hidden for us. Aren't those aliens weird!
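The combining step below uses a long (tidy) layout: one row per point, plus a column naming the paper it came from. That extra column is what lets `px.scatter(..., color="doi")` color the points by source. A stdlib sketch of the same reshaping, with made-up coordinates and a hypothetical `to_long_rows` helper:

```python
# Hypothetical per-paper points; in the notebook these come from each paper's fig0.
points_by_doi = {
    "10.55458/neurolibre.alberta1": {"x": [0.0, 1.0], "y": [0.0, 1.0]},
    "10.55458/neurolibre.alberta2": {"x": [2.0], "y": [3.0]},
}


def to_long_rows(points_by_doi):
    """Flatten {doi: {"x": [...], "y": [...]}} into one row per point, tagged with its DOI."""
    rows = []
    for doi, pts in points_by_doi.items():
        if len(pts["x"]) != len(pts["y"]):
            raise ValueError(f"x and y lengths differ for {doi}")
        for x, y in zip(pts["x"], pts["y"]):
            rows.append({"x": x, "y": y, "doi": doi})
    return rows


rows = to_long_rows(points_by_doi)
print(rows)
```

`pd.DataFrame(rows)` would give a table with the same columns as `combined_df`.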

# Collect the AI scatter points (fig0) from every imported paper
frames = []

for doi in dois:
    # Keep only the marker-mode scatter trace of this paper's fig0
    fig_data = nlx.papers[doi].get_plotly_data("fig0", select_trace_type="scatter", select_trace_mode="markers")

    # One row per point, tagged with the DOI of the paper it came from
    df = pd.DataFrame({'y': fig_data['y'], 'x': fig_data['x']})
    df['doi'] = doi
    frames.append(df)

# Concatenate once at the end instead of growing a DataFrame inside the loop
combined_df = pd.concat(frames, ignore_index=True)

combined_df.head()
# Overlay the AI points from all five papers, colored by source DOI
fig = px.scatter(combined_df, x="x", y="y", color="doi", template="plotly_dark")
fig.update_layout(
    width=800,
    height=800,
    title="Putting together AI from 5 articles",
    # Lock a 1:1 aspect ratio; anchoring y to x is enough (a reciprocal x-to-y anchor is redundant)
    xaxis=dict(constrain="domain"),
    yaxis=dict(scaleanchor="x", scaleratio=1, constrain="domain"),
    margin=dict(l=0, r=0, t=50, b=0),
    legend=dict(
        orientation="v",   # Vertical orientation
        yanchor="middle",  # Centered vertically
        xanchor="right",   # Right-aligned
        x=1.4,             # Position just outside the right side
        y=0.5,             # Centered on the y-axis
    ),
)
fig.update_traces(marker=dict(size=12, opacity=0.9))
fig.show()

Now we are talking! Or is it papers talking to each other?

#TalkAboutInsight

Let’s run a meta-analysis by importing 15 figures from 5 ALBERTA studies into this article!

# Define and create a list of DOIs and figures
doi1 = "10.55458/neurolibre.alberta1"
doi2 = "10.55458/neurolibre.alberta2"
doi3 = "10.55458/neurolibre.alberta3"
doi4 = "10.55458/neurolibre.alberta4"
doi5 = "10.55458/neurolibre.alberta5"
dois = [doi1, doi2, doi3, doi4, doi5]
figures = ['fig1', 'fig2', 'fig3']

Import them alien papers!

nlx.import_papers([doi1,doi2,doi3,doi4,doi5])
nlx.papers[doi2].inspect_plotly_figures()
These are the plotly figures I found:
-------------------------------------
- html-link [fig0] enumerated as (Figure 1)

- html-link [fig1] enumerated as (Figure 2)

- html-link [fig2] enumerated as (Figure 3)

- html-link [fig3] enumerated as (Figure 4)

We can dial into these figures!

Interactive figures can hold several representations of the same data, which is one of the many things that make them cool. Even so, we can select just the type of trace we are interested in and use it!
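The `select_trace_type` and `select_trace_mode` arguments suggest a filter over a figure's traces. A minimal sketch of such a filter, assuming exact string matching on each trace's `type` and `mode`; `select_traces` and the toy spec are hypothetical and not NeuroxLink's code.

```python
# Hypothetical figure spec holding several representations of the same data
spec = {
    "data": [
        {"type": "scatter", "mode": "markers", "name": "points", "x": [1, 2, 3], "y": [2, 4, 6]},
        {"type": "scatter", "mode": "lines", "name": "fit", "x": [1, 3], "y": [2, 6]},
        {"type": "histogram", "name": "distribution", "x": [2, 4, 6]},
        {"type": "box", "name": "summary", "y": [2, 4, 6]},
    ]
}


def select_traces(spec, trace_type=None, trace_mode=None):
    """Keep traces whose type and mode match exactly; None means 'any'."""
    selected = []
    for trace in spec.get("data", []):
        # Plotly treats a trace without an explicit type as a scatter trace
        if trace_type is not None and trace.get("type", "scatter") != trace_type:
            continue
        # Exact match only: "lines+markers" does not count as "markers" here
        if trace_mode is not None and trace.get("mode") != trace_mode:
            continue
        selected.append(trace)
    return selected


print([t["name"] for t in select_traces(spec, "scatter")])             # ['points', 'fit']
print([t["name"] for t in select_traces(spec, "scatter", "markers")])  # ['points']
```

Whether a library treats `"lines+markers"` as a match for `"markers"` is a design choice; the exact-match rule here is only the simplest option.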

# Rebuild fig1 of the second paper; uncomment an alternative to keep only traces of a given type (and mode)
nlx.papers["10.55458/neurolibre.alberta2"].create_plotly_object_from('fig1')
#nlx.papers["10.55458/neurolibre.alberta2"].create_plotly_object_from('fig1', select_trace_type="histogram")
#nlx.papers["10.55458/neurolibre.alberta2"].create_plotly_object_from('fig1', select_trace_type="box")
#nlx.papers["10.55458/neurolibre.alberta2"].create_plotly_object_from('fig1', select_trace_type="scatter")
#nlx.papers["10.55458/neurolibre.alberta2"].create_plotly_object_from('fig1', select_trace_type="scatter", select_trace_mode="markers")

Start the meta-analysis!

Now we capture the bivariate data for three types of measurements from 15 figures across the 5 articles: alien brain volume (ABV), alien brain density (ABD), and alien brain efficiency (ABE).
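The cells below assume that, within one paper, points line up by position across the three figures and that y carries the same score in each (the identical `vol`/`dens`/`eff` columns printed further down bear that out). A stdlib sketch of merging one paper's three figures into wide rows, with a check that the y-values really match; the numbers and `to_wide_rows` are made up for illustration.

```python
# Hypothetical per-figure data for ONE paper: x = measurement, y = shared score
figs = {
    "fig1": {"x": [10.0, 12.0], "y": [0.5, 0.7]},  # ABV
    "fig2": {"x": [1.1, 1.3], "y": [0.5, 0.7]},    # ABD
    "fig3": {"x": [0.8, 0.9], "y": [0.5, 0.7]},    # ABE
}
column_names = {"fig1": "vol", "fig2": "dens", "fig3": "eff"}


def to_wide_rows(figs, column_names):
    """Merge per-figure (x, y) pairs into rows {score, vol, dens, eff}, matching points by position."""
    if not figs:
        return []
    scores = None
    columns = {}
    for fig, pts in figs.items():
        if scores is None:
            scores = list(pts["y"])
        elif list(pts["y"]) != scores:
            raise ValueError(f"{fig} does not share the same y (score) values")
        columns[column_names[fig]] = list(pts["x"])
    return [
        {"score": s, **{name: values[i] for name, values in columns.items()}}
        for i, s in enumerate(scores)
    ]


for row in to_wide_rows(figs, column_names):
    print(row)
```

Raising on mismatched y-values guards against silently pairing measurements that belong to different subjects.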

data = []

for doi in dois:
    fig_data = []
    for fig in figures:
        print(f"Fetching data: {fig} from {doi}")
        fig_data.append(nlx.papers[doi].create_plotly_object_from(fig,select_trace_type="scatter",select_trace_mode="markers"))
    data.append(fig_data)
print("done")    
Fetching data: fig1 from 10.55458/neurolibre.alberta1
Fetching data: fig2 from 10.55458/neurolibre.alberta1
Fetching data: fig3 from 10.55458/neurolibre.alberta1
Fetching data: fig1 from 10.55458/neurolibre.alberta2
Fetching data: fig2 from 10.55458/neurolibre.alberta2
Fetching data: fig3 from 10.55458/neurolibre.alberta2
Fetching data: fig1 from 10.55458/neurolibre.alberta3
Fetching data: fig2 from 10.55458/neurolibre.alberta3
Fetching data: fig3 from 10.55458/neurolibre.alberta3
Fetching data: fig1 from 10.55458/neurolibre.alberta4
Fetching data: fig2 from 10.55458/neurolibre.alberta4
Fetching data: fig3 from 10.55458/neurolibre.alberta4
Fetching data: fig1 from 10.55458/neurolibre.alberta5
Fetching data: fig2 from 10.55458/neurolibre.alberta5
Fetching data: fig3 from 10.55458/neurolibre.alberta5
done
# One row per paper, one column per measurement (ABV, ABD, ABE)
fig = make_subplots(rows=5, cols=3, subplot_titles=("ABV", "ABD", "ABE") * 5)

for row in range(5):
    # Label the score axis on the left and the paper number on the right of each row
    fig.update_yaxes(title_text="Score", row=row + 1, col=1)
    fig.update_yaxes(title_text=f"Paper {row + 1}", row=row + 1, col=3, side="right")
    for col in range(3):
        # Copy every trace of the imported figure into its grid cell
        subplot_figure = data[row][col]
        for trace in subplot_figure.data:
            fig.add_trace(trace, row=row + 1, col=col + 1)

# Update layout
fig.update_layout(title_text="Mosaic Plot of 15 figures from 5 articles",
                  height=800, width=800,
                  showlegend=True,
                  template="plotly_dark")

# Show the figure
fig.show()
column_names = {'fig1': 'vol', 'fig2': 'dens', 'fig3': 'eff'}

# Initialize an empty DataFrame to store the combined data
combined_df = pd.DataFrame()

# Loop through each DOI and figure
for doi in dois:
    data = {}
    for fig in figures:
        # Get the data for the current DOI and figure
        fig_data = nlx.papers[doi].get_plotly_data(fig,select_trace_type="scatter",select_trace_mode="markers")
        
        #data['score'] = fig_data['x']
        data[column_names[fig]] = fig_data['y']
        
    # Create a DataFrame for the current DOI with DOI as a column
    df = pd.DataFrame(data)
    df['DOI'] = doi  # Add the DOI to each row
    
    # Append to the combined DataFrame
    combined_df = pd.concat([combined_df, df], ignore_index=True)

print(combined_df)
              vol         dens          eff                           DOI
0     1241.856859  1241.856859  1241.856859  10.55458/neurolibre.alberta1
1     -386.257828  -386.257828  -386.257828  10.55458/neurolibre.alberta1
2     1141.903393  1141.903393  1141.903393  10.55458/neurolibre.alberta1
3     2335.390648  2335.390648  2335.390648  10.55458/neurolibre.alberta1
4     -508.642687  -508.642687  -508.642687  10.55458/neurolibre.alberta1
...           ...          ...          ...                           ...
1995    11.076677    11.076677    11.076677  10.55458/neurolibre.alberta5
1996  -517.365866  -517.365866  -517.365866  10.55458/neurolibre.alberta5
1997   565.384082   565.384082   565.384082  10.55458/neurolibre.alberta5
1998    25.591227    25.591227    25.591227  10.55458/neurolibre.alberta5
1999   850.649546   850.649546   850.649546  10.55458/neurolibre.alberta5

[2000 rows x 4 columns]

Notice that vol, dens, and eff hold identical values: each figure's y-values carry the same score, so reading fig_data['y'] for every figure copies that score three times. The next cell fixes this by taking each measurement from x and storing y once as score.
# Map each DOI to the brain region its paper studies
doi_names = {
    "10.55458/neurolibre.alberta1": "Hippocampus",
    "10.55458/neurolibre.alberta2": "Amygdala",
    "10.55458/neurolibre.alberta3": "Temporal",
    "10.55458/neurolibre.alberta4": "Parietal",
    "10.55458/neurolibre.alberta5": "Prefrontal",
}

frames = []

for doi in dois:
    data = {}

    # Loop through each figure (ABV, ABD, ABE) for the current DOI
    for fig in figures:
        fig_data = nlx.papers[doi].get_plotly_data(fig, select_trace_type="scatter", select_trace_mode="markers")

        # y holds the score, which is the same in all three figures (overwriting it is harmless)
        data['score'] = fig_data['y']
        # x holds the figure-specific measurement: vol, dens, or eff
        data[column_names[fig]] = fig_data['x']

    # Convert the collected data into a DataFrame and tag each row with its region
    df = pd.DataFrame(data)
    df['Region'] = doi_names[doi]
    frames.append(df)

# Concatenate once instead of growing a DataFrame inside the loop
combined_df = pd.concat(frames, ignore_index=True)

combined_df.head()
# Optional: z-score the numeric columns so features on different scales become comparable
# numeric_cols = combined_df.select_dtypes(include=[np.number]).columns
# df_zscore = combined_df.copy()
# df_zscore[numeric_cols] = (combined_df[numeric_cols] - combined_df[numeric_cols].mean()) / combined_df[numeric_cols].std()

# Pairwise scatter plots of the three measurements and the score, colored by region
fig = px.scatter_matrix(combined_df,
    dimensions=["vol", "dens", "eff", "score"],
    color="Region", template="plotly_dark")
fig.update_traces(diagonal_visible=False)
fig.update_layout(margin=dict(l=0, r=0, t=0, b=0))
fig.show()
# PCA by eigen-decomposition of the covariance matrix (features are centered but NOT scaled)
features = ['score', 'vol', 'dens', 'eff']
X = combined_df[features].values
X_centered = X - X.mean(axis=0)
cov_matrix = np.cov(X_centered, rowvar=False)

# eigh is meant for symmetric matrices and returns real eigenvalues in ascending order
eigenvalues, eigenvectors = np.linalg.eigh(cov_matrix)
sorted_indices = np.argsort(eigenvalues)[::-1]
sorted_eigenvalues = eigenvalues[sorted_indices]
sorted_eigenvectors = eigenvectors[:, sorted_indices]

# Project the centered data onto the two leading components
top_2_eigenvectors = sorted_eigenvectors[:, :2]
X_pca = X_centered @ top_2_eigenvectors

pca_df = pd.DataFrame(data=X_pca, columns=['PCA1', 'PCA2'])
pca_df['Region'] = combined_df['Region']

fig = px.scatter(pca_df, x='PCA1', y='PCA2', color='Region', title='PCA of Regions based on Features', template="plotly_dark")
fig.update_traces(marker=dict(size=15, opacity=0.7))
# Leave room at the top so the title is not clipped
fig.update_layout(margin=dict(l=0, r=0, t=50, b=0))
fig.show()
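The PCA above works on centered but unscaled features, so a feature with a much larger variance can take over the first component. A minimal stdlib sketch for two synthetic features (data and helpers made up for illustration) computes PC1's share of the variance from the closed-form eigenvalues of a symmetric 2×2 covariance matrix, before and after z-scoring.

```python
import math


def sample_cov(xs, ys):
    """Sample covariance (n - 1 denominator), matching numpy.cov's default."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def pc1_share(xs, ys):
    """Fraction of total variance explained by PC1 for two features."""
    a, b, c = sample_cov(xs, xs), sample_cov(xs, ys), sample_cov(ys, ys)
    # Eigenvalues of the symmetric matrix [[a, b], [b, c]] in closed form
    center = (a + c) / 2
    radius = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    largest, smallest = center + radius, center - radius
    return largest / (largest + smallest)


def zscore(values):
    """Center and scale to unit sample standard deviation."""
    m = sum(values) / len(values)
    s = math.sqrt(sample_cov(values, values))
    return [(v - m) / s for v in values]


# Synthetic, uncorrelated features on very different scales
big = [1000.0, -1000.0, 1000.0, -1000.0]
small = [1.0, 1.0, -1.0, -1.0]

print(pc1_share(big, small))                  # close to 1: the large-scale feature dominates
print(pc1_share(zscore(big), zscore(small)))  # about 0.5: no single direction stands out
```

In the notebook, the equivalent change would be dividing each centered column by its standard deviation before computing the covariance; whether that is appropriate depends on whether the raw units carry meaning for the comparison.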
# If you don't believe me that the variability is coming from volume and score...

# Aliens are soo linear, omg. Just stick with the ones with bigger heads :) 
fig = px.scatter(combined_df,x="vol",y="score",color="Region",template="plotly_dark")
fig.update_traces(marker=dict(size=15, opacity=0.7))
fig.update_layout(margin=dict(l=0,r=0,t=0,b=0))
fig.show()
score_corr_pearson = combined_df.corr(numeric_only=True, method="pearson")
score_corr_kendall = combined_df.corr(numeric_only=True, method="kendall")
score_corr_spearman = combined_df.corr(numeric_only=True, method="spearman")

# Create a list of the correlation matrices and corresponding labels
corr_matrices = [score_corr_pearson, score_corr_kendall, score_corr_spearman]
methods = ['Pearson', 'Kendall', 'Spearman']

# Initialize the heatmap with the Pearson matrix
fig = px.imshow(corr_matrices[0],
                color_continuous_scale='Viridis',
                zmin=-0.5, zmax=0.9,
                title=f'{methods[0]} Correlation Matrix', template="ggplot2")

# One button per method; "update" swaps the heatmap values (z) and the title together
buttons = [
    dict(
        args=[{"z": [corr.values]}, {"title.text": f"{method} Correlation Matrix"}],
        label=method,
        method="update",
    )
    for method, corr in zip(methods, corr_matrices)
]

fig.update_layout(
    updatemenus=[dict(type="buttons", direction="down", buttons=buttons)],
    # Leave room at the top so the title stays visible
    margin=dict(l=0, r=0, t=50, b=0),
)
fig.show()
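The three buttons switch between different coefficients: Pearson measures linear association, Spearman is Pearson computed on ranks (so any strictly increasing relation scores 1), and Kendall compares concordant and discordant pairs. A stdlib sketch of the first two on a made-up monotonic but nonlinear relation; ties receive average ranks, a common convention. The helper names are illustrative.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman correlation: Pearson on the ranks."""
    return pearson(ranks(xs), ranks(ys))


xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]  # monotonic, but not linear
print(pearson(xs, ys))   # below 1: the relation is curved
print(spearman(xs, ys))  # 1.0: the ranks agree perfectly
```

This is why the three matrices can disagree on the same data: a curved but monotonic relation between features is penalized by Pearson but not by the rank-based methods.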