The Future of Scientific Literature: Papers Talking to Each Other
A NeuroLibre Case Study
Integrative literature demonstration
NeuroLibre Day | September 27, 2024 | Montreal, Canada
A meta-analysis on 5 ALBERTA (ALien Brain ExtRacTion Analytics) articles
- The Role of Hippocampal Volume, Brain Density, and Network Efficiency in Alien Memory Function
- Size Matters, but So Does Connectivity: The Amygdala’s Role in Emotional Intelligence
- More Than Just Words: Temporal Cortex Volume Correlates with Language Ability
- Navigating the Void: How Parietal Cortex Volume Predicts Spatial Orientation in Zero Gravity
- Attention is all you need, and a chunky prefrontal cortex
NeuroLibre cross-links
NeuroxLink is a mini Python package that parses mdast, facilitating cross-paper import of article content from MyST servers. It also introduces some Plotly functionality to work with data from interactive figures!
pip install neuroxlink
nlx = NeuroxLink()
nlx.import_paper("10.55458/neurolibre.alberta1")
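For orientation, here is a compact sketch of the calls this notebook relies on. Everything in it appears in the cells below; the figure label fig0 is specific to the ALBERTA preprints.
from neuroxlink import NeuroxLink

# Point the client at the NeuroLibre CDN and import a preprint by DOI
nlx = NeuroxLink(cdn_url="https://cdn.neurolibre.org")
nlx.import_paper("10.55458/neurolibre.alberta1")

paper = nlx.papers[0]                           # papers can also be indexed by DOI
paper.inspect_plotly_figures()                  # list the Plotly figures and their labels
paper.create_plotly_object_from("fig0").show()  # rebuild an interactive figure from the imported data
data = paper.get_plotly_data("fig0", select_trace_type="scatter", select_trace_mode="markers")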
# from neuroxlink.src.neuroxlink import NeuroxLink
from neuroxlink import NeuroxLink
import sys
import os
import pandas as pd
import plotly.graph_objects as go
import plotly.express as px
from plotly.subplots import make_subplots
import plotly.io as pio
import numpy as np
pio.renderers.default = "plotly_mimetype"  # render Plotly figures inline in the MyST/Jupyter output
1. Instantiate a neuroxlink object (nlx) and import all 5 ALBERTA papers
nlx = NeuroxLink(cdn_url="https://cdn.neurolibre.org")
# Define and create a list of DOIs and figures
doi1 = "10.55458/neurolibre.alberta1"
doi2 = "10.55458/neurolibre.alberta2"
doi3 = "10.55458/neurolibre.alberta3"
doi4 = "10.55458/neurolibre.alberta4"
doi5 = "10.55458/neurolibre.alberta5"
dois = [doi1, doi2, doi3, doi4, doi5]
figures = ['fig1', 'fig2', 'fig3']
nlx.import_papers(dois)
🔗 importing 10.55458/neurolibre.alberta1 from 🌎 https://cdn.neurolibre.org/content/alberta1/paper.json
The Role of Hippocampal Volume, Brain Density, and Network Efficiency in Alien Memory Function: ALien Brain ExtRacTion Analytics (ALBERTA) Consortium
-------------------------------------
🔗 importing 10.55458/neurolibre.alberta2 from 🌎 https://cdn.neurolibre.org/content/alberta2/paper.json
Size Matters, but So Does Connectivity: The Amygdala's Role in Emotional Intelligence: ALien Brain ExtRacTion Analytics (ALBERTA) Consortium
-------------------------------------
🔗 importing 10.55458/neurolibre.alberta3 from 🌎 https://cdn.neurolibre.org/content/alberta3/paper.json
More Than Just Words: Temporal Cortex Volume Correlates with Language Ability: ALien Brain ExtRacTion Analytics (ALBERTA) Consortium
-------------------------------------
🔗 importing 10.55458/neurolibre.alberta4 from 🌎 https://cdn.neurolibre.org/content/alberta4/paper.json
Navigating the Void: How Parietal Cortex Volume Predicts Spatial Orientation in Zero Gravity: ALien Brain ExtRacTion Analytics (ALBERTA) Consortium
-------------------------------------
🔗 importing 10.55458/neurolibre.alberta5 from 🌎 https://cdn.neurolibre.org/content/alberta5/paper.json
Attention is all you need, and a chunky prefrontal cortex: ALien Brain ExtRacTion Analytics (ALBERTA) Consortium
-------------------------------------
2. Access some information about the first ALBERTA article
paper = nlx.papers[0]
print("\n------------> AUTHORS AND AFFILIATIONS")
print(paper.get_authors(), "\n------------> DEPENDENCIES")
print(paper.get_dependencies(), "\n------------> HEADINGS")
print(paper.headings)
------------> AUTHORS AND AFFILIATIONS
[{'nameParsed': {'literal': 'Alien Brain Consortium', 'given': 'Alien Brain', 'family': 'Consortium'}, 'name': 'Alien Brain Consortium', 'affiliations': ['Where aliens live'], 'id': 'contributors-myst-generated-uid-0'}, {'nameParsed': {'literal': 'Robo Neuro', 'given': 'Robo', 'family': 'Neuro'}, 'name': 'Robo Neuro', 'affiliations': ['Where robots live'], 'id': 'contributors-myst-generated-uid-1'}]
------------> DEPENDENCIES
[{'url': '/alberta1/analysis', 'label': 'fig0cell', 'kind': 'Notebook', 'slug': 'analysis', 'location': '/content/analysis.ipynb'}]
------------> HEADINGS
[{'depth': 1, 'text': 'Introduction', 'identifier': 'introduction'}, {'depth': 1, 'text': 'Methods', 'identifier': 'methods'}, {'depth': 1, 'text': 'Results', 'identifier': 'results'}, {'depth': 2, 'text': 'Alienarity Index', 'identifier': 'alienarity-index'}, {'depth': 2, 'text': 'Hippocampal Volume', 'identifier': 'hippocampal-volume'}, {'depth': 2, 'text': 'Brain Density', 'identifier': 'brain-density'}, {'depth': 2, 'text': 'Network Efficiency', 'identifier': 'network-efficiency'}, {'depth': 1, 'text': 'Conclusion', 'identifier': 'conclusion'}]
3. Alienarity Index (AI)
You cannot publish anything without AI these days, so here we go.
First, we will check whether the article published any Plotly figures; if so, we will capture their reference labels:
paper.inspect_plotly_figures()
These are the plotly figures I found:
-------------------------------------
- html-link [fig0] enumerated as (Figure 1)
- html-link [fig1] enumerated as (Figure 2)
- html-link [fig2] enumerated as (Figure 3)
- html-link [fig3] enumerated as (Figure 4)
Perfect! There are four Plotly figures, linked using fig0, fig1, fig2, and fig3.
As we created these example “articles”, we know that each paper reported its AI as fig0.
Note that we are not performing any computation here; instead, we are taking a special chunk of the paper we imported as structured data and creating a Plotly figure out of it!
fig = paper.create_plotly_object_from('fig0')
fig.show()
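If you would rather have the underlying numbers than a rebuilt figure, the same chunk can be pulled out as plain arrays with get_plotly_data, which the meta-analysis below relies on:
# Same fig0 trace, but returned as structured data (x/y arrays) instead of a figure
ai_data = paper.get_plotly_data("fig0", select_trace_type="scatter", select_trace_mode="markers")
pd.DataFrame({"x": ai_data["x"], "y": ai_data["y"]}).head()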
Interesting... What about the AI of the second article? Let’s find out...
nlx.papers[doi2].create_plotly_object_from('fig0').show()
I wonder if there is something extraterrestrial going on here. Each AI has 6 points, but what’s the message they are trying to give???
Let’s take a look at the third one...
nlx.papers[doi3].create_plotly_object_from('fig0').show()
👽 Okay, this is getting weird and I cannot picture what’s going on here.
Time to put these together and see if AI has any insights hidden for us. Aren’t those aliens weird!
combined_df = pd.DataFrame()
for doi in dois:
    data = {}
    # Get the Alienarity Index (fig0) scatter data for the current DOI
    fig = "fig0"
    fig_data = nlx.papers[doi].get_plotly_data(fig, select_trace_type="scatter", select_trace_mode="markers")
    # Collect the x/y coordinates reported by this paper
    data['y'] = fig_data['y']
    data['x'] = fig_data['x']
    # Convert the collected data into a DataFrame
    df = pd.DataFrame(data)
    # Add the DOI to each row in the DataFrame
    df['doi'] = doi
    # Append to the combined DataFrame
    combined_df = pd.concat([combined_df, df], ignore_index=True)
combined_df.head()
fig = px.scatter(combined_df, x="x", y="y", color="doi", template="plotly_dark")
fig.update_layout(
    width=800,
    height=800,
    title="Putting together AI from 5 articles",
    xaxis=dict(scaleanchor="y", scaleratio=1, constrain="domain"),  # Link x and y axis scaling
    yaxis=dict(scaleanchor="x", scaleratio=1, constrain="domain"),
    margin=dict(l=0, r=0, t=50, b=0),
    legend=dict(
        orientation='v',   # Vertical orientation
        yanchor='middle',  # Centered vertically
        xanchor='right',   # Right-aligned
        x=1.4,             # Position just outside the right side
        y=0.5              # Centered on the y-axis
    )
)
fig.update_traces(marker=dict(size=12, opacity=0.9))
fig.show()
Now we are talking! Or, is it papers talking to each other?
#TalkAboutInsight
Let’s run a meta-analysis by importing 15 figures from 5 ALBERTA studies into this article!
# Define and create a list of DOIs and figures
doi1 = "10.55458/neurolibre.alberta1"
doi2 = "10.55458/neurolibre.alberta2"
doi3 = "10.55458/neurolibre.alberta3"
doi4 = "10.55458/neurolibre.alberta4"
doi5 = "10.55458/neurolibre.alberta5"
dois = [doi1, doi2, doi3, doi4, doi5]
figures = ['fig1', 'fig2', 'fig3']
Import them alien papers!
nlx.import_papers([doi1,doi2,doi3,doi4,doi5])
nlx.papers[doi2].inspect_plotly_figures()
These are the plotly figures I found:
-------------------------------------
- html-link [fig0] enumerated as (Figure 1)
- html-link [fig1] enumerated as (Figure 2)
- html-link [fig2] enumerated as (Figure 3)
- html-link [fig3] enumerated as (Figure 4)
We can dial into these figures!
Interactive figures can carry multiple representations of the data, which is one of the many things that makes them cool. Even so, we can select just the trace type we are interested in and work with it!
nlx.papers["10.55458/neurolibre.alberta2"].create_plotly_object_from('fig1')
#nlx.papers["10.55458/neurolibre.alberta2"].create_plotly_object_from('fig1', select_trace_type="histogram")
#nlx.papers["10.55458/neurolibre.alberta2"].create_plotly_object_from('fig1', select_trace_type="box")
#nlx.papers["10.55458/neurolibre.alberta2"].create_plotly_object_from('fig1', select_trace_type="scatter")
#nlx.papers["10.55458/neurolibre.alberta2"].create_plotly_object_from('fig1', select_trace_type="scatter",select_trace_mode="markers")
Start the meta-analysis!
Now we are going to capture the bivariate data for 3 types of measurements (alien brain volume (ABV), alien brain density (ABD), and alien brain efficiency (ABE)) from 15 figures across the 5 articles.
data = []
for doi in dois:
    fig_data = []
    for fig in figures:
        print(f"Fetching data: {fig} from {doi}")
        fig_data.append(nlx.papers[doi].create_plotly_object_from(fig, select_trace_type="scatter", select_trace_mode="markers"))
    data.append(fig_data)
print("done")
Fetching data: fig1 from 10.55458/neurolibre.alberta1
Fetching data: fig2 from 10.55458/neurolibre.alberta1
Fetching data: fig3 from 10.55458/neurolibre.alberta1
Fetching data: fig1 from 10.55458/neurolibre.alberta2
Fetching data: fig2 from 10.55458/neurolibre.alberta2
Fetching data: fig3 from 10.55458/neurolibre.alberta2
Fetching data: fig1 from 10.55458/neurolibre.alberta3
Fetching data: fig2 from 10.55458/neurolibre.alberta3
Fetching data: fig3 from 10.55458/neurolibre.alberta3
Fetching data: fig1 from 10.55458/neurolibre.alberta4
Fetching data: fig2 from 10.55458/neurolibre.alberta4
Fetching data: fig3 from 10.55458/neurolibre.alberta4
Fetching data: fig1 from 10.55458/neurolibre.alberta5
Fetching data: fig2 from 10.55458/neurolibre.alberta5
Fetching data: fig3 from 10.55458/neurolibre.alberta5
done
fig = make_subplots(rows=5, cols=3, subplot_titles=("ABV", "ABD", "ABE") * 5)
for row in range(5):
    fig.update_yaxes(title_text="Score", row=row+1, col=1)
    fig.update_yaxes(title_text=f"Paper {row+1}", row=row+1, col=3, side="right")
    for col in range(3):
        subplot_figure = data[row][col]
        for trace in subplot_figure.data:
            fig.add_trace(trace, row=row + 1, col=col + 1)
# Update layout
fig.update_layout(title_text="Mosaic Plot of 15 figures from 5 articles",
                  height=800, width=800,
                  showlegend=True,
                  template="plotly_dark")
# Show the figure
fig.show()
column_names = {'fig1': 'vol', 'fig2': 'dens', 'fig3': 'eff'}
# Initialize an empty DataFrame to store the combined data
combined_df = pd.DataFrame()
# Loop through each DOI and figure
for doi in dois:
    data = {}
    for fig in figures:
        # Get the data for the current DOI and figure
        fig_data = nlx.papers[doi].get_plotly_data(fig, select_trace_type="scatter", select_trace_mode="markers")
        #data['score'] = fig_data['x']
        data[column_names[fig]] = fig_data['y']
    # Create a DataFrame for the current DOI with DOI as a column
    df = pd.DataFrame(data)
    df['DOI'] = doi  # Add the DOI to each row
    # Append to the combined DataFrame
    combined_df = pd.concat([combined_df, df], ignore_index=True)
print(combined_df)
vol dens eff DOI
0 1241.856859 1241.856859 1241.856859 10.55458/neurolibre.alberta1
1 -386.257828 -386.257828 -386.257828 10.55458/neurolibre.alberta1
2 1141.903393 1141.903393 1141.903393 10.55458/neurolibre.alberta1
3 2335.390648 2335.390648 2335.390648 10.55458/neurolibre.alberta1
4 -508.642687 -508.642687 -508.642687 10.55458/neurolibre.alberta1
... ... ... ... ...
1995 11.076677 11.076677 11.076677 10.55458/neurolibre.alberta5
1996 -517.365866 -517.365866 -517.365866 10.55458/neurolibre.alberta5
1997 565.384082 565.384082 565.384082 10.55458/neurolibre.alberta5
1998 25.591227 25.591227 25.591227 10.55458/neurolibre.alberta5
1999 850.649546 850.649546 850.649546 10.55458/neurolibre.alberta5
[2000 rows x 4 columns]
doi_names = {'10.55458/neurolibre.alberta1': 'Hippocampus',
             '10.55458/neurolibre.alberta2': 'Amygdala',
             '10.55458/neurolibre.alberta3': 'Temporal',
             '10.55458/neurolibre.alberta4': 'Parietal',
             '10.55458/neurolibre.alberta5': 'Prefrontal'}
# Initialize an empty DataFrame for combining all DOIs' data
combined_df = pd.DataFrame()
# Loop through each DOI
for doi in dois:
    data = {}
    # Loop through each figure for the current DOI
    for fig in figures:
        # Get the data for the current DOI and figure
        fig_data = nlx.papers[doi].get_plotly_data(fig, select_trace_type="scatter", select_trace_mode="markers")
        # Add data for each figure, ensuring the column name is unique
        data['score'] = fig_data['y']
        data[column_names[fig]] = fig_data['x']  # Use the figure-specific column name
    # Convert the collected data into a DataFrame
    df = pd.DataFrame(data)
    # Add the region name (one per DOI) to each row in the DataFrame
    df['Region'] = doi_names[doi]
    # Append to the combined DataFrame
    combined_df = pd.concat([combined_df, df], ignore_index=True)
# Print or check the combined DataFrame
combined_df.head()
# numeric_cols = combined_df.select_dtypes(include=[np.number]).columns
# df_zscore[numeric_cols] = (combined_df[numeric_cols] - combined_df[numeric_cols].mean())/combined_df[numeric_cols].std()
# df_zscore["DOI"] = combined_df["DOI"]
fig = px.scatter_matrix(combined_df,
                        dimensions=["vol", "dens", "eff", "score"],
                        color="Region", template="plotly_dark")
fig.update_traces(diagonal_visible=False)
fig.update_layout(margin=dict(l=0, r=0, t=0, b=0))
fig.show()
features = ['score', 'vol', 'dens', 'eff']
X = combined_df[features].values
# Center the data and compute the covariance matrix (manual PCA)
X_mean = np.mean(X, axis=0)
X_std = X - X_mean
cov_matrix = np.cov(X_std.T)
# Eigendecomposition, sorted by decreasing eigenvalue
eigenvalues, eigenvectors = np.linalg.eig(cov_matrix)
sorted_indices = np.argsort(eigenvalues)[::-1]
sorted_eigenvalues = eigenvalues[sorted_indices]
sorted_eigenvectors = eigenvectors[:, sorted_indices]
# Project onto the top 2 principal components
top_2_eigenvectors = sorted_eigenvectors[:, :2]
X_pca = X_std.dot(top_2_eigenvectors)
pca_df = pd.DataFrame(data=X_pca, columns=['PCA1', 'PCA2'])
pca_df['Region'] = combined_df['Region']
fig = px.scatter(pca_df, x='PCA1', y='PCA2', color='Region', title='PCA of Regions based on Features', template="plotly_dark")
fig.update_traces(marker=dict(size=15, opacity=0.7))
fig.update_layout(margin=dict(l=0, r=0, t=0, b=0))
fig.show()
# If you don't believe me that the variability is coming from volume and score...
# Aliens are soo linear, omg. Just stick with the ones with bigger heads :)
fig = px.scatter(combined_df,x="vol",y="score",color="Region",template="plotly_dark")
fig.update_traces(marker=dict(size=15, opacity=0.7))
fig.update_layout(margin=dict(l=0,r=0,t=0,b=0))
fig.show()
score_corr_pearson = combined_df.corr(numeric_only=True, method="pearson")
score_corr_kendall = combined_df.corr(numeric_only=True, method="kendall")
score_corr_spearman = combined_df.corr(numeric_only=True, method="spearman")
# Create a list of the correlation matrices and corresponding labels
corr_matrices = [score_corr_pearson, score_corr_kendall, score_corr_spearman]
methods = ['Pearson', 'Kendall', 'Spearman']
# Initialize the first heatmap (Pearson by default)
fig = px.imshow(corr_matrices[0],
                color_continuous_scale='Viridis',
                zmin=-0.5, zmax=0.9,
                title=f'{methods[0]} Correlation Matrix', template="ggplot2")
# Update the layout to add buttons that switch between correlation methods
fig.update_layout(
    updatemenus=[
        dict(
            type="buttons",
            direction="down",
            buttons=[
                dict(
                    args=[{"z": [corr_matrices[0].values]}],
                    label="Pearson",
                    method="restyle"
                ),
                dict(
                    args=[{"z": [corr_matrices[1].values]}],
                    label="Kendall",
                    method="restyle"
                ),
                dict(
                    args=[{"z": [corr_matrices[2].values]}],
                    label="Spearman",
                    method="restyle"
                )
            ]
        )
    ]
)
fig.update_layout(margin=dict(l=0, r=0, t=0, b=0))
fig.show()