PubTech Radar Scan: Issue 37
A somewhat delayed ‘back to school’ edition. The usual mix of launches, news, AI, and longer reads, haphazardly collected and curated.
📰 News
Wiley has announced that 1,000 scholarly journals have successfully transitioned to its Research Exchange platform, representing more than 50% of the company's journal portfolio. [I’ve always been a fan of REX and the thinking behind the service. It is no easy feat to move journals to a new peer review platform]
This LinkedIn thread and post by Chris Dicker, CEO of CANDR Media, explains how Trusted Reviews was scraped 1.6 million times in a single day (18.5 times a second), allegedly by OpenAI, despite having a robots.txt file in place. Out of 1.6 million scrapes, just 603 users arrived on site, a click-through rate of 0.037%. Chris wants to know: who is going to pay for this? [I have an interview with Jonathan Woahn from cashmere.io on one approach to solving this problem coming shortly.]
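For context, the directive in question is trivial to write. A minimal robots.txt blocking OpenAI's documented crawler token (GPTBot) looks like the sketch below, which is exactly why the thread stings: the Robots Exclusion Protocol is advisory, and compliance is voluntary.

```text
# Disallow OpenAI's documented crawler from the whole site
User-agent: GPTBot
Disallow: /
```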
A call for the US federal government to adopt persistent digital identifiers (DOIs for research awards and ORCIDs for researchers) across all agencies. The article outlines the problems of the current reporting burden, explains how digital identifiers would simplify everything, and then lays out steps to make it happen.
Sticking with the identifiers theme, the main Crossref systems have [finally] moved to the cloud 🎉!
Effective 8 August 2025, the Center for Open Science (COS) has suspended new submissions to its generalist OSF Preprints server, citing an increasing number of suspicious or low-quality submissions, among other reasons. See also Academia has a new preprints problem by Mark Hahnel.
🚀 Launches
Near Missives, a newsletter from Chris Reid & Laura Harvey: “It is often said that there are two ways to learn about a topic: from deep inside and from the outside. As much as we should all be listening to and learning from our peers and colleagues, we wanted to create a way to bring ideas, concepts and trends from other industries to a scholarly comms audience. With that thought, Near Missives was born.”
The Thomas Kuhn Foundation has developed the KGX3 engine, a tool that aims to predict the significance of scientific research before it is even published. Instead of waiting decades to see whether a paper is “paradigm shifting” or simply solid, normal science, KGX3 analyzes research in real time to reveal whether it confirms, stresses, or breaks existing ways of thinking. Using proven signal-detection methods similar to those behind Bloomberg, Dataminr, and Brandwatch, KGX3 aims to transform the scientific record into a dynamic map of emerging ideas, helping researchers and decision-makers spot the structural tensions that precede major scientific breakthroughs.
posters.science, a new open-source and free platform where researchers can easily share, find, and reuse posters.
Copilot Mode in Edge includes multi-tab RAG. You can use Copilot to analyze your open tabs, like Satya Nadella does here with papers his company has published in Nature journals over the last year.
AlphaXiv transforms dense arXiv papers into blog‑style summaries in seconds. Just swap ‘arxiv’ for ‘alphaxiv’ in the URL.
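The swap really is just a string replacement on the paper URL. A throwaway sketch (the example arXiv ID below is made up for illustration):

```python
def alphaxiv_url(arxiv_url: str) -> str:
    """Swap the arxiv.org host for alphaxiv.org, keeping the paper path."""
    return arxiv_url.replace("arxiv.org", "alphaxiv.org", 1)

print(alphaxiv_url("https://arxiv.org/abs/2406.01234"))
# → https://alphaxiv.org/abs/2406.01234
```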
Clear Skies is making genAI detection with Pangram available to subscribers in Oversight. Adam claims it’s “GenAI detection that actually works”.
Veracity from GroundedAI is now available as a standalone, self-service tool, which includes Citation Stacking Detection and Citation Quality Assessment.
🤖 AI
Editors using the FT's internal CMS now “get a suggested description automatically – short, accurate, and context-aware – generated using a mix of machine vision and agency captions. Every suggestion is flagged for fact-checking and reviewed before publication to ensure accuracy.” [LinkedIn Post]
This Science Advances study uses a novel ‘excess word’ analysis to track how AI, especially ChatGPT, has reshaped the language of biomedical research. It uncovers a measurable shift in vocabulary across over 15 million PubMed abstracts, revealing that at least 13.5% of 2024 abstracts were likely LLM-assisted.
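The ‘excess word’ idea can be illustrated with a toy frequency comparison: words whose relative frequency jumps after a cutoff date are flagged as excess. A loose sketch of the intuition only, not the study's actual method; the corpora and smoothing below are invented:

```python
from collections import Counter

def excess_ratio(corpus_now, corpus_base):
    """Relative word-frequency ratio; large ratios flag 'excess' usage."""
    now = Counter(w for doc in corpus_now for w in doc.lower().split())
    base = Counter(w for doc in corpus_base for w in doc.lower().split())
    n_now, n_base = sum(now.values()), sum(base.values())
    return {
        w: (now[w] / n_now) / ((base[w] + 1) / n_base)  # +1 smooths unseen words
        for w in now
    }

pre = ["we measured enzyme activity in mice", "results were compared across groups"]
post = ["we delve into the pivotal role of enzymes", "these findings underscore a pivotal shift"]
ratios = excess_ratio(post, pre)
print(max(ratios, key=ratios.get))  # the most 'excess' word in the toy corpus
```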
Researchers have tried to create tools that check whether a specific piece of text (like a paragraph) was included in a model’s training data. These tools are called membership inference attacks (MIAs), but they can be unreliable. Rather than trying to prove whether one paragraph was in the training set, dataset inference focuses on whether the model was trained on a larger dataset, like a whole book, better matching real copyright concerns, since authors usually care about their whole work being used, not just one page.
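The shift from per-paragraph MIAs to dataset inference is essentially a move from one noisy test to an aggregated statistical one. A hedged sketch of that aggregation step (the membership scores below are invented placeholders; real methods derive them from model loss or perplexity):

```python
from statistics import mean, stdev
from math import sqrt, erf

def dataset_inference(suspect_scores, holdout_scores):
    """Two-sample z-test: are membership scores over a whole work (e.g. a
    book's chapters) systematically higher than on known-unseen text?
    Each score alone is a weak signal; aggregating many gives power."""
    n1, n2 = len(suspect_scores), len(holdout_scores)
    m1, m2 = mean(suspect_scores), mean(holdout_scores)
    se = sqrt(stdev(suspect_scores) ** 2 / n1 + stdev(holdout_scores) ** 2 / n2)
    z = (m1 - m2) / se
    p = 1 - 0.5 * (1 + erf(z / sqrt(2)))  # one-sided p-value
    return z, p

suspect = [0.61, 0.66, 0.63, 0.69, 0.64, 0.67]  # scores across the whole book
holdout = [0.50, 0.55, 0.48, 0.53, 0.51, 0.49]  # scores on known-unseen text
z, p = dataset_inference(suspect, holdout)
print(f"z={z:.1f}, p={p:.3g}")
```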
I don’t know this group, so DYOR, but I thought it looked interesting: the Workshop for Artificial Intelligence for Scientific Publications (WASP) 2025 in December.
Michael Upshall explores Thomas Krichel’s publication alert system, which is in use both for economics (where it is entitled NEP, short for New Economics Papers), based on the content in RePEc, and for life sciences (where it is called Biomed News, or BIMs), using content from PubMed.
Rishabh Lohia contrasts two models of digital publishing: one where users must search and sift through static content, and another where content actively adapts and responds to the user’s needs.
I saw this and thought perhaps publishers don’t need to abandon their cherished spreadsheets: Why Paradigm built a spreadsheet with an AI agent in every cell
Great thread by Aaron Tay on Bluesky about the development of Academic Search. Aaron has recently been publishing some excellent articles on Deep Research, AI Search, and Deep Search and how they work - subscribe to his Substack.
An academic, Liudmila Zavolokina, discovered her name cited in fake papers published by a shady journal full of AI-generated articles, often authored by ghost writers (or no one at all) and riddled with hallucinated references.
Video showing how the Wiley/Anthropic content deal works.
📚 Longer reads
📰 The em dash has had enough. The em dash speaks out in its own defence, claiming centuries of usage and lamenting that it’s now unfairly viewed as a marker of soulless AI writing. It cites usage by literary giants—Mary Shelley, Emily Dickinson, David Foster Wallace—and scolds the accusers for not reading enough.
🎥 A 2017 talk by Martin Paul Eve, ‘The Great Automatic Grammatizator: writing, labour, computers’, led me to Roald Dahl's excellent 1953 story ‘The Great Automatic Grammatizator’, in which human writers are replaced by an automatic writing machine. [YouTube reading of the story]
📰 Subscribe-to-Open Is Doomed. Here’s Why by Rick Anderson. This post reminded me that around ten years ago, I explained Gold Open Access to a tech audience by comparing it - loosely and imperfectly - to a pyramid scheme. Publishing carries two kinds of costs: the one-off expense of producing a journal with a carefully managed number of papers, and the recurring expense of keeping the archive and related services online. In the subscription model, this is straightforward; many subscribers provide stability, and ongoing subscriber revenue covers both categories of cost, plus profit. In the Gold OA unlimited-papers model, each new paper must carry its own production costs, help offset the costs of rejected papers, and also contribute to maintaining the growing archive of everything published before. I have always wondered how sustainable this model is over the very long term. Storage costs may decline and economies of scale kick in, but labour, compliance, infrastructure, and the rising expenses linked to AI, from detection to blocking unwanted scraping, are all increasing recurring costs. All of these models have problems.
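To make the per-paper cost stacking concrete, here's a back-of-the-envelope sketch. Every number is made up for illustration; none comes from Rick's post or any real publisher:

```python
# Purely illustrative: an accepted paper's APC must cover its own production,
# a share of rejected-paper costs, and a slice of the archive-maintenance bill.
production_cost = 1500          # handling one accepted paper
rejection_rate = 0.7            # 70% of submissions rejected
rejected_paper_cost = 400      # triage/review cost per rejected paper
papers_this_year = 10_000
archive_size = 200_000          # everything published to date
archive_cost_per_paper = 5      # annual upkeep per archived paper

submissions = papers_this_year / (1 - rejection_rate)
rejected_share = (submissions - papers_this_year) * rejected_paper_cost / papers_this_year
archive_share = archive_size * archive_cost_per_paper / papers_this_year
break_even_apc = production_cost + rejected_share + archive_share
print(round(break_even_apc))
```

Note that `archive_share` is the term that creeps upward every year: the archive only ever grows, so each new cohort of papers inherits a slightly larger maintenance bill.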
📰 Stocking the Librarian’s Publishing Integrity Toolkit via Fabienne Michaud
🎥 NPTV Unscripted: In conversation with Martin Delahunty about Publishing, AI, and what the future holds.
📰 The PDF may no longer be the untouchable container it once was. But with foresight and action, scholarly publishers can still shape what comes next by Pascal Hetzscholdt and ChatGPT-4o. Interesting because of the changes to PDF but also because most Western publishers haven’t relied on PDFs for decades, and I think you can really see the limits of AI writing in this article. [I know this sounds daft, but I think the differences between HTML and PDF format online are really poorly understood by many users… I remember troubleshooting one librarian’s problem and realising that they didn’t realise that they were looking at an HTML page rather than the PDF… sadly, this wasn’t that long ago...]
🎧 Midnight at the Casablanca interviews Timo Hannay. There’s a section in here where Paul is asking about business models for some of the early work, and Timo’s response is a bit vague. It’s hard to appreciate now just how experimental web tech was then, how easy it was to put something together quickly, and how much Annette Thomas supported the team’s freedom to experiment.
Almost finally…
Thought-provoking piece from Hannah Shelley: Google Scholar Is Doomed. I’m inclined to agree, though I’m not sure about the timeline - a couple of years after Anurag, the founder, leaves? Mind you, 25 years is an awfully good innings for a technology product.
End Notes:
If you’re interested in AI, you can subscribe for free to my other Substack GenAI for Curious People, to see entries for my new book.
If you found this useful, you can always buy me a coffee
If you would like to employ me, you can find me at Maverick
If you’re in London on Monday, 22 September, join us for the next AI in Publishing Meetup. Registration form coming soon.