What are the IP challenges for those building AI systems? Much will be familiar; today’s AI systems are based on software and all the usual IP issues associated with developing software products will apply.

However, in addition to the usual IP issues associated with developing code, AI systems have some unique features which give rise to new IP challenges. One of these is the need to train the system using large volumes of data. Training is often crucial, as it allows the system to develop and refine its decision-making abilities to the point where they start to become comparable with human decision making, putting the “intelligence” into AI.

How this need to train the system affects the IP issues will depend on the parties involved in the development. As with other software products, there are many ways in which AI systems can be developed; they can be purchased off-the-shelf as finished products, custom built by organisations to address their own particular requirements, or anything in between. We are also increasingly seeing AI and machine learning offered as a service using a cloud-based model, with providers such as AWS, Google Cloud and Microsoft Azure (to name just a few) offering access via an API to cloud-based AI services such as natural language understanding, real-time translation and emotion recognition.

Who owns the IP in an AI system?

The way an AI system is trained will influence who owns the IP in the finished product. If all development is done entirely in-house, using in-house data sets, then ownership of the resulting system should be straightforward. However, things will be more complicated where multiple parties are involved in its development. If one party supplies the initial code but a second trains it, there is the potential for conflict as to who owns the IP rights in the resulting system. Unlike a traditional software development situation, where every line of code is attributable to a human author, machine learning generates large parts of the finished system automatically during training. It’s easy to see the potential for disputes to arise about who should own the IP rights in the output of the training process.

The approach the parties take to IP ownership issues in their commercial agreements will need to adapt to reflect this process. In particular, parties entering agreements to develop AI systems will need to think carefully about how the IP rights in the resulting system are going to be owned and licensed going forward, and ensure this is recorded in their agreements.

Who owns the data used to train the system?

Training an AI system will generally require a very large dataset, which the system can use to test and refine its decision-making abilities. But could training the system lead to a claim for IP infringement? A key question will be who owns the IP rights in the dataset used to train the system.

Where the system is developed entirely in-house using internal datasets, the position is simple, as the organisation is likely to own all the IP rights in the data. However, things will be more complicated where some or all of the data has been collected from, or generated by, a third party. Data can be protected by an overlapping patchwork of different IP rights, together with contractual restrictions on the purposes for which the data can be used. For example, one common misapprehension is that data which is freely available online can be re-used for any purpose. This often isn’t the case; website terms and conditions, along with copyright and other IP rights (such as database right), can prevent the data from being used to train a machine learning system.

Using the data without permission presents a potential liability risk which could hinder the development and commercialisation of the system. It will therefore be important for those training AI systems to check the provenance of their data sources and any restrictions on use attached to the data. In some cases a licence agreement with the data owner might be required. There will also be data protection and privacy issues to consider where personal data is used as part of the training.
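For development teams, one practical way to support these checks is to keep a simple register of every dataset used in training. The following is a minimal sketch only: the `DataSource` fields, the `review_source` function and the example entries are all hypothetical, and the checks it runs are illustrative prompts for review rather than legal tests.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class DataSource:
    """One entry in a hypothetical register of training-data sources."""
    name: str
    owner: str                    # "internal" or the name of the third party
    licence: Optional[str]        # licence or agreement covering the data, if recorded
    training_permitted: bool      # do the recorded terms allow use for training?
    contains_personal_data: bool  # triggers a data protection review


def review_source(source: DataSource) -> List[str]:
    """Return the points to resolve before the source is used for training."""
    issues = []
    if source.owner != "internal" and source.licence is None:
        issues.append("no licence or terms recorded for third-party data")
    if not source.training_permitted:
        issues.append("terms do not clearly permit use for training")
    if source.contains_personal_data:
        issues.append("personal data: data protection review needed")
    return issues


# Illustrative entries only; the names and owners are made up.
sources = [
    DataSource("support_tickets", "internal", None, True, True),
    DataSource("scraped_reviews", "ExampleSite Ltd", None, False, False),
]

for source in sources:
    print(source.name, review_source(source))
```

A register like this does not answer the ownership question by itself, but it does make gaps visible early, for example a third-party dataset with no recorded licence, so that they can be raised before the system is trained and commercialised.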

Next time…

Part two of this series will build on this and look at the ownership of IP rights in content generated by AI systems.