PubTech Radar Scan: Issue 37
A somewhat delayed ‘back to school’ edition. The usual mix of launches, news, AI, and longer reads, haphazardly collected and curated.
📰 News
Wiley has announced that 1,000 scholarly journals have successfully transitioned to its Research Exchange platform, representing more than 50% of the company's journal portfolio. [I’ve always been a fan of REX and the thinking behind the service. It is no easy feat to move journals to a new peer review platform]
This LinkedIn thread and post by Chris Dicker, CEO of CANDR Media, explains how Trusted Reviews was scraped 1.6 million times in a single day (18.5 times a second), allegedly by OpenAI, despite having a robots.txt file in place. Out of 1.6 million scrapes, just 603 users arrived on site, a click-through rate of 0.037%. Chris wants to know: who is going to pay for this? [I have an interview with Jonathan Woahn from cashmere.io on one approach to solving this problem coming shortly.]
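For context, the directive in question is trivial to write. A minimal robots.txt blocking OpenAI's documented crawler token (GPTBot) looks like the sketch below, which is exactly why the thread stings: the Robots Exclusion Protocol is advisory, and compliance is voluntary.

```text
# Disallow OpenAI's documented crawler from the whole site
User-agent: GPTBot
Disallow: /
```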
A call for the US federal government to adopt persistent digital identifiers (DOIs for research awards and ORCIDs for researchers) across all agencies. The article outlines the problems of the current reporting burden, explains how digital identifiers would simplify everything, and then lays out steps to make it happen.
Sticking with the identifiers theme, the main Crossref systems have [finally] moved to the cloud 🎉!
Effective 8 August 2025, the Center for Open Science (COS) has suspended new submissions to its generalist OSF Preprints server, citing an increasing number of suspicious or low-quality submissions, among other reasons. See also Academia has a new preprints problem by Mark Hahnel.
🚀 Launches
Near Missives, a newsletter from Chris Reid & Laura Harvey: “It is often said that there are two ways to learn about a topic: from deep inside and from the outside. As much as we should all be listening to and learning from our peers and colleagues, we wanted to create a way to bring ideas, concepts and trends from other industries to a scholarly comms audience. With that thought, Near Missives was born.”
The Thomas Kuhn Foundation has developed the KGX3 engine, a tool that aims to predict the significance of scientific research before it is even published. Instead of waiting decades to see whether a paper is “paradigm shifting” or simply solid, normal science, KGX3 analyzes research in real time to reveal whether it confirms, stresses, or breaks existing ways of thinking. Using proven signal-detection methods similar to those behind Bloomberg, Dataminr, and Brandwatch, KGX3 aims to transform the scientific record into a dynamic map of emerging ideas, helping researchers and decision-makers spot the structural tensions that precede major scientific breakthroughs.
posters.science, a new open-source and free platform where researchers can easily share, find, and reuse posters.
Copilot Mode in Edge includes multi-tab RAG. You can use Copilot to analyze your open tabs, like Satya Nadella does here with papers his company has published in Nature journals over the last year.
AlphaXiv transforms dense arXiv papers into blog‑style summaries in seconds. Just swap ‘arxiv’ for ‘alphaxiv’ in the URL.
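The swap really is just a string replacement on the paper URL. A throwaway sketch (the example arXiv ID below is made up for illustration):

```python
def alphaxiv_url(arxiv_url: str) -> str:
    """Swap the arxiv.org host for alphaxiv.org, keeping the paper path."""
    return arxiv_url.replace("arxiv.org", "alphaxiv.org", 1)

print(alphaxiv_url("https://arxiv.org/abs/2406.01234"))
# → https://alphaxiv.org/abs/2406.01234
```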
Clear Skies is making genAI detection with Pangram available to subscribers in Oversight. Adam claims it’s “GenAI detection that actually works”.
Veracity from GroundedAI is now available as a standalone, self-service tool, which includes Citation Stacking Detection and Citation Quality Assessment.
🤖 AI
Editors using the FT's internal CMS now “get a suggested description automatically – short, accurate, and context-aware – generated using a mix of machine vision and agency captions. Every suggestion is flagged for fact-checking and reviewed before publication to ensure accuracy.” [LinkedIn Post]
This Science Advances study uses a novel ‘excess word’ analysis to track how AI, especially ChatGPT, has reshaped the language of biomedical research. It uncovers a measurable shift in vocabulary across over 15 million PubMed abstracts, revealing that at least 13.5% of 2024 abstracts were likely LLM-assisted.
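The ‘excess word’ idea can be illustrated with a toy frequency comparison: words whose relative frequency jumps after a cutoff date are flagged as excess. A loose sketch of the intuition only, not the study's actual method; the corpora and smoothing below are invented:

```python
from collections import Counter

def excess_ratio(corpus_now, corpus_base):
    """Relative word-frequency ratio; large ratios flag 'excess' usage."""
    now = Counter(w for doc in corpus_now for w in doc.lower().split())
    base = Counter(w for doc in corpus_base for w in doc.lower().split())
    n_now, n_base = sum(now.values()), sum(base.values())
    return {
        w: (now[w] / n_now) / ((base[w] + 1) / n_base)  # +1 smooths unseen words
        for w in now
    }

pre = ["we measured enzyme activity in mice", "results were compared across groups"]
post = ["we delve into the pivotal role of enzymes", "these findings underscore a pivotal shift"]
ratios = excess_ratio(post, pre)
print(max(ratios, key=ratios.get))  # the most 'excess' word in the toy corpus
```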
Researchers have tried to create tools that check whether a specific piece of text (like a paragraph) was included in a model’s training data. These tools are called membership inference attacks (MIAs), but they can be unreliable. Rather than trying to prove whether one paragraph was in the training set, dataset inference focuses on whether the model was trained on a larger dataset, like a whole book, better matching real copyright concerns, since authors usually care about their whole work being used, not just one page.
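The shift from per-paragraph MIAs to dataset inference is essentially a move from one noisy test to an aggregated statistical one. A hedged sketch of that aggregation step (the membership scores below are invented placeholders; real methods derive them from model loss or perplexity):

```python
from statistics import mean, stdev
from math import sqrt, erf

def dataset_inference(suspect_scores, holdout_scores):
    """Two-sample z-test: are membership scores over a whole work (e.g. a
    book's chapters) systematically higher than on known-unseen text?
    Each score alone is a weak signal; aggregating many gives power."""
    n1, n2 = len(suspect_scores), len(holdout_scores)
    m1, m2 = mean(suspect_scores), mean(holdout_scores)
    se = sqrt(stdev(suspect_scores) ** 2 / n1 + stdev(holdout_scores) ** 2 / n2)
    z = (m1 - m2) / se
    p = 1 - 0.5 * (1 + erf(z / sqrt(2)))  # one-sided p-value
    return z, p

suspect = [0.61, 0.66, 0.63, 0.69, 0.64, 0.67]  # scores across the whole book
holdout = [0.50, 0.55, 0.48, 0.53, 0.51, 0.49]  # scores on known-unseen text
z, p = dataset_inference(suspect, holdout)
print(f"z={z:.1f}, p={p:.3g}")
```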
I don’t know this group, so DYOR, but I thought it looked interesting: the Workshop for Artificial Intelligence for Scientific Publications (WASP) 2025 in December.
Michael Upshall explores Thomas Krichel’s publication alert system, which is in use both for economics (where it is entitled NEP, short for New Economics Papers), based on the content in RePEc, and for life sciences (where it is called Biomed News, or BIMs), using content from PubMed.
Rishabh Lohia contrasts two models of digital publishing: one where users must search and sift through static content, and another where content actively adapts and responds to the user’s needs.
I saw this and thought perhaps publishers don’t need to abandon their cherished spreadsheets: Why Paradigm built a spreadsheet with an AI agent in every cell
Great thread by Aaron Tay on Bluesky about the development of Academic Search. Aaron has recently been publishing some excellent articles on Deep Research, AI Search, and Deep Search and how they work - subscribe to his Substack.
An academic, Liudmila Zavolokina, discovered her name cited in fake papers published by a shady journal full of AI-generated articles, often authored by ghost writers (or no one at all) and riddled with hallucinated references.
Video showing how the Wiley/Anthropic content deal works.
📚 Longer reads
📰 The em dash has had enough. The em dash speaks out in its own defence, claiming centuries of usage and lamenting that it’s now unfairly viewed as a marker of soulless AI writing. It cites usage by literary giants—Mary Shelley, Emily Dickinson, David Foster Wallace—and scolds the accusers for not reading enough.
🎥 A 2017 talk by Martin Paul Eve, ‘The Great Automatic Grammatizator: writing, labour, computers’, led me to Roald Dahl's excellent 1953 story ‘The Great Automatic Grammatizator’, in which human writers are replaced by an automatic writing machine. [YouTube reading of the story]
📰 Subscribe-to-Open Is Doomed. Here’s Why by Rick Anderson. This post reminded me that around ten years ago, I explained Gold Open Access to a tech audience by comparing it - loosely and imperfectly - to a pyramid scheme. Publishing carries two kinds of costs: the one-off expense of producing a journal with a carefully managed number of papers, and the recurring expense of keeping the archive and related services online. In the subscription model, this is straightforward; many subscribers provide stability, and ongoing subscriber revenue covers both categories of cost, plus profit. In the Gold OA unlimited-papers model, each new paper must carry its own production costs, help offset the costs of rejected papers, and also contribute to maintaining the growing archive of everything published before. I have always wondered how sustainable this model is over the very long term. Storage costs may decline and economies of scale kick in, but labour, compliance, infrastructure, and the rising expenses linked to AI, from detection to blocking unwanted scraping, are all increasing recurring costs. All of these models have problems.
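To make the per-paper cost stacking concrete, here's a back-of-the-envelope sketch. Every number is made up for illustration; none comes from Rick's post or any real publisher:

```python
# Purely illustrative: an accepted paper's APC must cover its own production,
# a share of rejected-paper costs, and a slice of the archive-maintenance bill.
production_cost = 1500          # handling one accepted paper
rejection_rate = 0.7            # 70% of submissions rejected
rejected_paper_cost = 400      # triage/review cost per rejected paper
papers_this_year = 10_000
archive_size = 200_000          # everything published to date
archive_cost_per_paper = 5      # annual upkeep per archived paper

submissions = papers_this_year / (1 - rejection_rate)
rejected_share = (submissions - papers_this_year) * rejected_paper_cost / papers_this_year
archive_share = archive_size * archive_cost_per_paper / papers_this_year
break_even_apc = production_cost + rejected_share + archive_share
print(round(break_even_apc))
```

Note that `archive_share` is the term that creeps upward every year: the archive only ever grows, so each new cohort of papers inherits a slightly larger maintenance bill.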
📰 Stocking the Librarian’s Publishing Integrity Toolkit via Fabienne Michaud
🎥 NPTV Unscripted: In conversation with Martin Delahunty about Publishing, AI, and what the future holds.
📰 The PDF may no longer be the untouchable container it once was. But with foresight and action, scholarly publishers can still shape what comes next by Pascal Hetzscholdt and ChatGPT-4o. Interesting because of the changes to PDF but also because most Western publishers haven’t relied on PDFs for decades, and I think you can really see the limits of AI writing in this article. [I know this sounds daft, but I think the differences between HTML and PDF format online are really poorly understood by many users… I remember troubleshooting one librarian’s problem and realising that they didn’t realise that they were looking at an HTML page rather than the PDF… sadly, this wasn’t that long ago...]
🎧 Midnight at the Casablanca interviews Timo Hannay. There’s a section in here where Paul is asking about business models for some of the early work, and Timo’s response is a bit vague. It’s hard to appreciate now just how experimental web tech was then, how easy it was to put something together quickly, and how much Annette Thomas supported the team’s freedom to experiment.
Almost finally…
Thought-provoking piece from Hannah Shelley: Google Scholar Is Doomed. I’m inclined to agree, though I’m not sure about the timeline - a couple of years after Anurag, the founder, leaves? Mind you, 25 years is an awfully good innings for a technology product.
End Notes:
If you’re interested in AI, you can subscribe for free to my other Substack GenAI for Curious People, to see entries for my new book.
If you found this useful, you can always buy me a coffee
If you would like to employ me, you can find me at Maverick
If you’re in London on Monday, 22 September, join us for the next AI in Publishing Meetup. Registration form coming soon.