- Science is a collective endeavor that involves actors such as researchers, students, communicators, administrators and the general public in various roles. In this post, we analyse a schematic science workflow, from conceptualization through literature research, actual research and peer-reviewed publication to dissemination. We identify opportunities and pitfalls for the use of Large Language Models (LLMs) and review contentious aspects of LLM use in the particular context of the Open Science movement.
- LLMs already provide substantial help across a range of use cases, including literature research (knowledge exploration, source discovery, citation matching, relevance assessment, summarization and synthesis), assessment (peer review, essay rating, CV scoring, citation valence rating), research design (brainstorming, experimental design, criticism), code (explanation, documentation, optimization, generation, translation and adaptation), data analysis and visualization (by writing bespoke code; see the first sketch below), and the writing of exercise sheets, grants, presentations, articles or dissemination pieces (outline expansion, grammatical and stylistic support, translation).
- Some unique characteristics of LLMs as research companions are their comprehensive databank of facts and an output that is sensitive to previous interactions and elicited by targeted prompts, which enables a conversational workflow with multiple rounds of refinement. Furthermore, LLMs generally produce high-quality, grammatical text with user-tunable style and length. Users can induce the LLM to perform orthographic, syntactic and semantic transformations on any text.
- We illustrate a number of use cases with full transcripts of interaction. LLMs perform well in many of the scenarios listed above, but they occasionally and unpredictably produce erroneous yet plausible-sounding and well-articulated prose. Real-time access of LLMs to authoritative sources of constraints or knowledge, such as compilers, computational engines, encyclopedias, databases and refereed work, will likely support and expedite the necessary cross-checking of LLM output (see the second sketch below).
- We reflect on LLMs from the perspective of Open Science. We find that LLMs need to come with an accessible training set, open-source code and a documented training protocol in order to align with the principles of transparency, accountability and accessibility, while acknowledging that the technology poses some structural interpretability problems. Similarly, while we salute the proliferation of compact models for consumer hardware, we recognize that LLMs and broader AI technologies pose serious ethical challenges that warrant a societal discussion about their deployment.
- LLMs and assistive AI technologies in general will have profound impacts on education and research. How the situation will unfold depends on technological, political and cultural feedback loops that are hard to anticipate. We close with a few open questions on broader issues: copyright, the value of static content and possible human empowerment from the perspective of inclusivity.
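
To make the "bespoke code for data analysis and visualization" use case above concrete, here is a minimal sketch of a single-turn request to an LLM. It is not taken from the post's transcripts: it assumes the OpenAI Python SDK (v1.x) with an API key in the `OPENAI_API_KEY` environment variable, and the model name, file name and prompt are illustrative placeholders.

```python
# Minimal sketch (not from the post's transcripts): asking an LLM to write
# bespoke plotting code. Assumes the OpenAI Python SDK (v1.x) and an API key
# in the OPENAI_API_KEY environment variable; model, file and prompt are
# illustrative placeholders.
from openai import OpenAI

client = OpenAI()

prompt = (
    "Write a self-contained Python script that reads 'measurements.csv' "
    "(columns: 'time', 'signal'), plots signal versus time with matplotlib, "
    "labels both axes and saves the figure as 'signal.png'."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # any chat-capable model would do here
    messages=[{"role": "user", "content": prompt}],
)

# The generated script still needs the cross-checking discussed above
# before it is run on real data.
print(response.choices[0].message.content)
```

Because each API call is stateless, the multi-round refinement described above amounts to appending the model's previous reply and the follow-up request to the `messages` list before calling the endpoint again.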
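
The cross-checking against authoritative sources mentioned above can start with something as simple as letting the language's own parser or compiler vet LLM-generated code. The sketch below is our illustration, not a mechanism described in the post: it catches only syntax errors, one narrow class of mistakes, and says nothing about whether the logic is correct.

```python
# Minimal sketch (an illustration, not a mechanism from the post): using the
# Python parser as an authoritative cross-check on LLM-generated source code.
# A clean parse only rules out syntax errors; logical errors still require
# human review, tests or comparison with refereed sources.
import ast

def syntax_problems(generated_source: str) -> list[str]:
    """Return a list of syntax errors found in LLM-generated Python code."""
    try:
        ast.parse(generated_source)
    except SyntaxError as err:
        return [f"line {err.lineno}: {err.msg}"]
    return []

# Example: a plausible-looking snippet with a missing closing parenthesis.
snippet = "def mean(xs):\n    return sum(xs) / len(xs"
print(syntax_problems(snippet) or "No syntax errors found.")
```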