In my last blog post, I argued that
Europe can't win the game of who can field the best-known, most
widely used smart personal assistant – and since these assistants
are really impersonal, we wouldn't want to. We want an ecosystem of
companies doing what they do best: spotting a need, filling it
well, and cooperating with each other.
At the ROCKIT Roadmap Conference we talked about data, infrastructure, and the connections we need,
but here I have to tread on dangerous ground by tying lots of
things together into a story about practical action for CITIA to
take. Every piece of the story comes from someone who was there, but
do the pieces fit together into what we need?
One of the recurring themes of the
conference was that we need to share data with each other so that all
of our algorithms and products improve. This needs to start with
academics. Many academics feel that data collected using public
resources should be public. That doesn't make releasing it easy.
It's very difficult to make scientists, who are always chasing the
next publishable result, stop before the end of a project so that
they have enough resources left to package the data, set license
terms, and release it.
Anonymization can be both necessary and
expensive – especially since there isn't general agreement about
what level of certainty is legally acceptable. In addition,
postdoctoral researchers and students often don't have the skills to
do a decent job on packaging. They have trouble thinking like
someone who doesn't already understand what they've produced, so
their documentation can be very poor indeed. I think this is an
important part of any future job – academic or otherwise – so
training is part of the solution. However, I also think data
repositories need enough support to sort the good from
the bad and to quality-check packaging early enough that data producers can
correct it.
I also think it's hard to engineer data
sharing among companies – but we did generally agree that the best
way to make headway is to start working together to target our
customer contacts, picking off each vertical separately. I actually
think if this were to happen, the data sharing would come as a side
effect. So the main action here is working out how CITIA can
encourage this kind of collaboration, rather than focusing on the
data itself.
Another recurring theme was that we
need open architectures and at least de facto standards, so that
academics and businesses can each concentrate on the part they do
best. That's great, but it has to become less vague very quickly.
We've agreed that this will be an important part of our work in the second
year of our support action, but which actions, within our relatively
small budget, will actually achieve the results we need?
When I get stuck on big problems like this, I try to think
about the nearest successful analogue to the problem at hand, and the
history of how that collaborative system emerged. What is the most
similar story we can think of – some major open source system like
WordPress? Solutions from the logistics industry? I'm really not
sure, myself, but someone must have a better vision than I do. Let us know!
Jean Carletta, Edinburgh University