In my last blog post, I argued that Europe can't win the game of who can field the best-known, most widely used smart personal assistant – and since these assistants are really impersonal, we wouldn't want to. We want an ecosystem of companies doing what they do best: spotting a need, filling it well, and cooperating with each other.
We talked at the ROCKIT Roadmap Conference about data and infrastructure and necessary connections, but here I have to venture into dangerous territory – by tying lots of things together into a story about practical action for CITIA to take. All the bits of the story come from someone who was there, but do they work together to make what we need?
One of the recurring themes of the conference was that we need to share data with each other so that all of our algorithms and products improve. This needs to start with academics. Many academics feel that data collected using public resources should be public. That doesn't make releasing it easy. It's very difficult to make scientists, who are always chasing the next publishable result, stop before the end of a project so that they have enough resource left to package the data, set license terms, and release it.
Anonymization can be both necessary and expensive – especially since there isn't general agreement about what level of certainty is legally acceptable. In addition, postdoctoral researchers and students often don't have the skills to do a decent job on packaging. They have trouble thinking like someone who doesn't already understand what they've produced, so their documentation can be very poor indeed. I think the ability to document work for outsiders is an important part of any future job – academic or otherwise – so training is part of the solution. However, I also think data repositories need enough support to be able to sort the good from the bad and to quality-check packaging early enough for data producers to correct it.
I also think it's hard to engineer data sharing among companies – but we did generally agree that the best way to make headway is to start working together to target our customer contacts, picking off each vertical separately. I actually think if this were to happen, the data sharing would come as a side effect. So the main action here is finding out how CITIA can encourage this kind of working together, rather than thinking about data itself.
Another recurring theme was that we need open architectures and at least de facto standards, so that academics and businesses can each concentrate on the part they do best. That's great, but it has to get less vague very quickly. We've agreed that will be an important part of our work in the second year of our support action, but what are the best actions that actually fit the relatively small budget we have and still achieve the results we need? When I get stuck on big problems like this, I try to think about the nearest successful analogue to the problem at hand, and the history of how that collaborative system emerged. What is the most similar story we can think of – some major open source system like WordPress? Solutions from the logistics industry? I'm really not sure, myself, but someone must have a better vision than I do. Let us know!
Jean Carletta, Edinburgh University