Databases in academia

University research isn't always up on the latest in business IT


Last week I was at Cambridge, learning what Henslow taught Darwin (Kohn, Murrell, Parker and Whitehorn, Nature, vol. 436, 4 August 2005, p643 – available online if you subscribe/register).

Henslow, elected Professor of Botany at Cambridge in 1825, was a careful scientist, the first university lecturer to illustrate his lectures (yes, even before PowerPoint), and a creationist who investigated the variation within species in order to show that species were created as fundamentally stable things that just varied widely in response to conditions.

Darwin was his pupil (Henslow helped arrange for Darwin’s presence on the Beagle), but Darwin made the intellectual leap that allowed him to interpret Henslow’s records of variation not as evidence of a fixed set of created species with variations, but as evidence of the evolution of new species in action.

Why was I there representing Reg Developer? Well, John Parker’s research establishing exactly what Henslow was doing and its importance to Darwin’s work was assisted by Mark Whitehorn, Reg Developer columnist and database expert, who got his PhD with Parker many years ago.

[Photo: John Parker with Henslow samples]

The research team was cross-disciplinary in the first place – it included David Kohn, a historian from Drew University in New Jersey, USA (who “went white” when he learnt what Henslow had been doing, since he had to rewrite a chunk of his book, yet to be published, on Darwin); Gina Murrell from the Cambridge University Herbarium; as well as Parker, who is from the Cambridge University Botanic Garden.

However, it was largely chance that Mark was around to point out that using a card index to correlate Henslow’s plant collections with the time of collection, the people involved, Darwin’s published work and so on was woefully inefficient. He designed a database to hold all the information available from Henslow’s collections (found in sheds and attics around Cambridge, as I remember it) and advised and assisted with the extensive data cleansing needed.

He chose Microsoft SQL Server to store the data (although he says any reasonable relational database would have done), because he considers its query and analysis facilities to be unparalleled today. He used SQL Server 2005 in its beta incarnation, simply because it made the management of the database, and analysis, very much easier than with the previous version. The research team’s enthusiasm for the way they could now ask questions of their data and get immediate answers and visualisations was palpable.
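To give a flavour of what "asking questions of the data" means in practice, here is a minimal sketch of a relational layout for herbarium sheets. It is not Mark's actual design: the table names, column names and rows are all invented for illustration, and SQLite (via Python's standard library) stands in for SQL Server on the grounds that any reasonable relational database would do.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database using a hypothetical specimen schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE collector (
            collector_id INTEGER PRIMARY KEY,
            name         TEXT NOT NULL
        );
        CREATE TABLE sheet (
            sheet_id       INTEGER PRIMARY KEY,
            species        TEXT NOT NULL,
            collected_year INTEGER,
            collector_id   INTEGER REFERENCES collector(collector_id)
        );
        """
    )
    # Placeholder rows only: these are NOT real records from Henslow's herbarium.
    conn.executemany(
        "INSERT INTO collector VALUES (?, ?)",
        [(1, "Collector A"), (2, "Collector B")],
    )
    conn.executemany(
        "INSERT INTO sheet VALUES (?, ?, ?, ?)",
        [
            (1, "Species X", 1826, 1),
            (2, "Species X", 1826, 2),
            (3, "Species Y", 1827, 1),
            (4, "Species X", 1827, 1),
        ],
    )
    conn.commit()
    return conn


def sheets_per_species_and_year(conn):
    """Count sheets and distinct collectors for each species in each year."""
    return conn.execute(
        """
        SELECT s.species,
               s.collected_year,
               COUNT(*)                       AS n_sheets,
               COUNT(DISTINCT c.collector_id) AS n_collectors
        FROM sheet AS s
        JOIN collector AS c ON c.collector_id = s.collector_id
        GROUP BY s.species, s.collected_year
        ORDER BY s.species, s.collected_year
        """
    ).fetchall()


if __name__ == "__main__":
    for row in sheets_per_species_and_year(build_demo_db()):
        print(row)
```

With a card index, a question such as "how many sheets of each species were collected in each year, and by how many people?" means re-sorting the cards by hand. Here it is a single query, and changing the question means changing a few lines of SQL.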

[Photo: Henslow tree-planting in Cambridge]

Of course, Henslow’s sheets of paper with collections of plants stuck to them, illustrating variations within a single species, are also a database of sorts. These days, we’d photograph the plants and store them in an electronic database as an extended datatype (although whether recreating the database from a set of CDs in a box in a cupboard some 150 years later would be as feasible as recreating Henslow’s work is moot). But perhaps we wouldn’t.

Although computers are widely used in theoretical physics and similar research, the tools taken as routine in business are being overlooked in academia. If Mark hadn’t taken a PhD with John Parker and then moved into databases (he’s in the Department of Applied Computing at the University of Dundee), this research would have been based on shuffling cards in an index box (or, at best, on something like a spreadsheet).

Makes you think. And one thing it makes me think is that there are still unexplored opportunities for database specialists out there. And, frankly, 20 years or more after James Martin first excited me with the potential of Relational Databases, that rather surprises me.

Photographs by David Norfolk, who is also the author of IT Governance, published by Thorogood.


Biting the hand that feeds IT © 1998–2021