You can find individual report sections, figures, and a PDF with a rendered version of the report in `outputs/figures`.
Run `bash make_report.sh REPORT_DATE_HERE` to render the report.
Output CSVs will be saved in the `outputs/data/experts` folder.
Run `python ai_genomics/analysis/researchers/influential_researchers.py` to generate two CSVs of active and influential researchers based on the OpenAlex data.
Run `python ai_genomics/analysis/ai_genomics_experts/patent_assignees.py` to generate a CSV of the assignees that have been assigned the most AI and genomics patents.
Run `python ai_genomics/analysis/ai_genomics_experts/cb_orgs_and_people.py` to generate four CSVs covering AI and genomics Crunchbase investors, the most highly funded companies, and related people.
Run `python ai_genomics/analysis/ai_genomics_experts/gtr_people.py` to generate a CSV of people who have worked on AI and genomics projects (only projects with people information are included).
Run `python ai_genomics/analysis/gtr/gtr_cluster_analysis.py` to reproduce the prototype cluster analysis of GtR clusters. Note that this analysis uses the same sampled dataset as reported by JMG. You can change the `reproduce` parameter to re-run the analysis from scratch; this includes creating vector representations of all sampled projects, which takes around 1 hour locally.
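A `reproduce`-style flag typically follows a compute-or-load-from-cache pattern. The sketch below is illustrative only: the cache path and the toy `hash_text` "embedding" are hypothetical stand-ins for the repo's actual (and much slower) vectorisation step.

```python
import pickle
from pathlib import Path

# Hypothetical cache location; the repo's actual path will differ.
CACHE = Path("outputs/data/gtr_project_vectors.pkl")


def hash_text(text: str) -> list:
    """Toy 'embedding': a character-frequency vector over a-z.
    Stands in for the real vector representation step."""
    vec = [0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1
    return vec


def get_project_vectors(projects: dict, reproduce: bool = False) -> dict:
    """Return {project_id: vector}, recomputing only when reproduce=True
    or when no cached copy exists yet."""
    if not reproduce and CACHE.exists():
        return pickle.loads(CACHE.read_bytes())
    vectors = {pid: hash_text(text) for pid, text in projects.items()}
    CACHE.parent.mkdir(parents=True, exist_ok=True)
    CACHE.write_bytes(pickle.dumps(vectors))
    return vectors
```

With `reproduce=False` the expensive step runs at most once; passing `reproduce=True` forces a fresh computation and overwrites the cache.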
Run `python ai_genomics/analysis/influence/make_influence_tables.py` to calculate influence scores for documents in key datasets. The resulting `influence_scores` are saved locally for follow-on analysis.
Run `python ai_genomics/analysis/influence/make_influence_analysis.py` with `get_influence(local=False)` to reproduce the analysis in the report with newly computed influence scores. This includes an analysis of influence scores and an analysis of influence via citations.
Run `python ai_genomics/analysis/influence/make_influence_analysis.py` with `get_influence(local=True)` to read the original set of influence scores from S3 and reproduce the analysis with those.
All charts are saved in `outputs/figures/png`.
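As described above, `get_influence(local=True)` reuses the original scores stored on S3, while `local=False` loads the newly computed ones. A minimal sketch of that toggle follows; the file path and the placeholder standing in for the real S3 download are hypothetical, not the repo's actual implementation.

```python
import json
from pathlib import Path

# Hypothetical location of the freshly computed scores; the repo's
# actual path will differ.
NEW_SCORES_PATH = Path("outputs/data/influence_scores.json")


def fetch_original_scores_from_s3() -> dict:
    """Placeholder for downloading the original influence scores
    from S3; returns dummy data here."""
    return {"doc_1": 0.42, "doc_2": 0.17}


def get_influence(local: bool = False) -> dict:
    """Load influence scores for the analysis."""
    if local:
        # local=True: reuse the original set of scores stored on S3.
        return fetch_original_scores_from_s3()
    # local=False: use the newly computed scores saved by
    # make_influence_tables.py.
    return json.loads(NEW_SCORES_PATH.read_text())
```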
Run `python ai_genomics/analysis/integrated_emergence/make_emergence_analysis.py` to perform an emergence analysis of document clusters in the OpenAlex, patent, and GtR data, and to integrate the results across datasets. This works as follows:

- We calculate document cluster-year frequencies by dataset.
- We calculate the "recency" and "significance" of each cluster in the table.
- Recency captures the percentage of a cluster's activity that happens in recent years, i.e. the extent to which the cluster skews towards the past or the present.
- Significance captures a cluster's share of activity across all clusters, i.e. the extent to which the cluster is important within the population of clusters.
- We visualise the above in a two-by-two matrix that identifies different "types" of clusters based on their emergence/significance values.
All charts are saved in `outputs/figures/png`.
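The recency and significance steps above can be sketched in plain Python. The toy frequency table and the recent-year cutoff are illustrative assumptions, not values from the actual analysis.

```python
from collections import defaultdict

# Toy cluster-year frequency table: (cluster, year) -> document count.
freq = {
    ("A", 2018): 10, ("A", 2020): 30, ("A", 2021): 60,
    ("B", 2018): 40, ("B", 2019): 30, ("B", 2020): 10,
}

RECENT_FROM = 2020  # illustrative cutoff for "recent years"

totals = defaultdict(int)   # total activity per cluster
recent = defaultdict(int)   # activity per cluster in recent years
for (cluster, year), n in freq.items():
    totals[cluster] += n
    if year >= RECENT_FROM:
        recent[cluster] += n

grand_total = sum(totals.values())

# Recency: share of a cluster's activity that falls in recent years.
recency = {c: recent[c] / totals[c] for c in totals}

# Significance: a cluster's share of activity across all clusters.
significance = {c: totals[c] / grand_total for c in totals}
```

Plotting `recency` against `significance` then gives the two-by-two matrix: high/high clusters are both emergent and important, high recency with low significance flags small but fast-growing topics, and so on.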