Uses And Limitations Of AI In Chip Design
By Raik Brinkmann, OneSpin Solutions
Raik Brinkmann, president and CEO of OneSpin Solutions, sat down with Semiconductor Engineering to talk about AI changes and challenges, new opportunities for using existing technology to improve AI, and vice versa. What follows are excerpts of that conversation.
SE: What’s changing in AI?
Brinkmann: There are a couple of big changes underway. One involves AI in functional safety, where you use context to prove the system is doing something good and that it’s not going to fail.
Basically, it’s making sure that the data you use for training represents the scenarios you need to worry about. When you have many vectors of input, it’s difficult to cover all the relevant cases. People are looking into how to analyze the data itself for gaps, and for value distribution and vectors. I’ve seen some good research about this, and some papers looking at verification from different angles. People are taking this seriously, and we will see a lot of interesting research as a result.
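The gap analysis described here can be made concrete with a rough sketch: bin each input dimension of the training set and report the value ranges no sample ever falls in. The bin count and the synthetic data are illustrative assumptions, not anything from a real training set.

```python
# Hypothetical sketch: flag gaps in training-data coverage by binning
# each input dimension and reporting empty bins.
import numpy as np

def coverage_gaps(samples, low, high, bins=10):
    """Return, per input dimension, the value ranges no sample falls in."""
    samples = np.asarray(samples, dtype=float)
    gaps = []
    for dim in range(samples.shape[1]):
        hist, edges = np.histogram(samples[:, dim], bins=bins,
                                   range=(low[dim], high[dim]))
        gaps.append([(edges[i], edges[i + 1])
                     for i, count in enumerate(hist) if count == 0])
    return gaps

# Toy example: 2-D inputs that never exercise high values of dimension 1.
rng = np.random.default_rng(0)
data = np.column_stack([rng.uniform(0, 1, 500),
                        rng.uniform(0, 0.5, 500)])
for dim, g in enumerate(coverage_gaps(data, low=[0, 0], high=[1, 1])):
    print(f"dimension {dim}: {len(g)} empty bins")
```

Here the second dimension is never exercised above 0.5, so half of its bins come back empty — exactly the kind of gap that would matter if those values correspond to scenarios the system must handle.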
SE: What’s the second big shift?
Brinkmann: The use of AI at the edge. It’s a trend that we predicted earlier. Devices are capturing data at the edge. You see this with Amazon, Microsoft and others bringing edge devices into the home. That’s part of it. So is Apple’s self-driving car initiative. It’s not clear if they will have a self-driving car anytime soon, but there is serious research going on. And they only do things when they think it will work out and there’s a good chance of success.
SE: What do you see as the biggest challenges there?
Brinkmann: Who owns the data and how to secure it. Different companies are pursuing different goals. One piece of this will involve security at the device level. A second will be the security throughout the chain of data to make sure it’s not manipulated along the way. If you are pushing data from sensors into the cloud to improve machine learning or other algorithms, which you then put back onto a device, you want to make sure that data doesn’t get compromised along the way. There is research into using blockchains for that. Every device is going to add a little more to it, and you can verify the data hasn’t been compromised because it’s been distributed. At the same time, people find ways of saying, ‘Okay, this all may be true, but I’m not going to give you all the data. So I own a piece of the chain of data and you own something else.’
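The core mechanism behind ‘every device adds a little more and you can verify it hasn’t been compromised’ can be sketched as a simple hash chain, where each record commits to the previous one. The field names and the choice of SHA-256 are illustrative assumptions, not details from any specific scheme.

```python
# Minimal hash-chain sketch: each appended record's hash covers the
# previous record, so tampering with any earlier entry breaks
# verification of everything after it.
import hashlib
import json

def append(chain, payload):
    prev = chain[-1]["hash"] if chain else "0" * 64
    digest = hashlib.sha256(
        json.dumps({"payload": payload, "prev": prev},
                   sort_keys=True).encode()).hexdigest()
    chain.append({"payload": payload, "prev": prev, "hash": digest})

def verify(chain):
    prev = "0" * 64
    for record in chain:
        expected = hashlib.sha256(
            json.dumps({"payload": record["payload"], "prev": prev},
                       sort_keys=True).encode()).hexdigest()
        if record["hash"] != expected:
            return False
        prev = record["hash"]
    return True

chain = []
append(chain, {"sensor": "lidar", "value": 3.2})
append(chain, {"sensor": "lidar", "value": 3.4})
print(verify(chain))                  # True
chain[0]["payload"]["value"] = 9.9    # tamper with an early reading
print(verify(chain))                  # False
```

A real distributed ledger adds consensus, identities and access control on top, but the tamper-evidence property comes from this same chaining of hashes.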
SE: This starts crossing into the privacy and IP protection domains, right?
Brinkmann: Yes, because people want to retain that knowledge. There was an example I saw recently involving 3D printing, where they use a hyperledger infrastructure. Basically they want to build a system where you have someone putting a requirement out for certain components that are going to be printed in 3D, and then someone designs the component. So that’s IP that you want to protect. But in the end, you still have to send the data to the factory.
SE: What’s the solution? Partitioning and encryption?
Brinkmann: Yes, exactly. You know where the pieces are, what are the work products that everyone needs to see in this process, what are the things you can and need to protect, and what is the lifetime of the data? That’s what you can model, which is quite interesting.
SE: So how do you trace down partitioning from a formal verification perspective?
Brinkmann: Right now we are looking into hardware, and we are starting to look into firmware and other things. I’m not quite sure how this plays into this blockchain and connected world. For the time being, we will be focusing on the individual pieces rather than the big picture. I don’t see us verifying the whole chain of events in the system unless we use a different model — not a hardware model, but something that captures the whole process in some different languages that we can potentially support and use formal technology to do that.
SE: That’s an interesting challenge.
Brinkmann: When we look at safety in SoCs, which we are targeting with our tools, the first thing that happens is we break down the chip into manageable parts. From there, formal will give valuable input to the whole system verification.
SE: One of the problems with those chips, particularly the ones used for safety, is that the AI running on those systems is basically opaque. Is there any progress in understanding what can go wrong with those algorithms?
Brinkmann: The machine learning guys are working on that. They want to know what this thing is doing, and they are developing some inspection and back-propagation algorithms that can be analyzed to try to understand what the system has learned. That’s something they’re trying to do, because if you don’t know exactly what decisions are based on, you can’t really get a good feeling about whether it’s safe or not.
SE: That’s still not transparency, right?
Brinkmann: No, but at least they’re trying to get some visibility.
SE: Will formal play a role here?
Brinkmann: We’ll have to see. It’s certainly possible to verify algorithms and prove things mathematically with proof systems. Doing a full formal analysis of the chain of transformations is going to be very challenging. If you’re looking at how you take a machine learning algorithm into hardware, there are multiple steps that are not actually equivalent, in a sense. You’re losing information. Let’s say you have trained your algorithm on a compute farm using floating point, and now you switch to fixed point. It’s not equivalent. It’s not the same by construction. Hopefully it will give you the same response to the data by some measures, but there may be some degradation, and there will be some differences in some patterns. So it becomes a statistical expression of how equivalent these two models are.
SE: Basically what is random in this distribution, right?
Brinkmann: Right, and then it’s not the same. If you compute it with integers, it’s different than doing it with floating point. But you still want to retain a certain amount of what you have proven to be true in the original model in the reduced one. And then you go down to hardware and say, ‘Okay, maybe I can squeeze the precision further in some areas that I’m not so interested in, or can I even go to very low precision in some places so I can map it better to hardware.’ There’s no equivalence in the formal sense. You have to redefine that concept. And only once you have done that can we automate such a process with formal analysis.
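The float-to-fixed-point loss described above is easy to demonstrate: quantize both operands of a dot product to a fixed-point grid and the result no longer matches the floating-point computation, only approximates it. The bit width and the random data here are assumptions for demonstration only.

```python
# Illustrative sketch: quantizing a floating-point computation to fixed
# point is not equivalence-preserving, so "how close" becomes a
# statistical question rather than a formal one.
import numpy as np

def to_fixed(x, frac_bits=8):
    """Round to a fixed-point grid with 2**-frac_bits resolution."""
    scale = 2 ** frac_bits
    return np.round(x * scale) / scale

rng = np.random.default_rng(1)
weights = rng.normal(0, 1, 1000)   # stand-in for trained parameters
inputs = rng.normal(0, 1, 1000)    # stand-in for one input vector

exact = float(np.dot(weights, inputs))
approx = float(np.dot(to_fixed(weights), to_fixed(inputs)))

# The two results differ; the size of the gap, over many inputs, is
# what a statistical notion of equivalence would have to bound.
print(abs(exact - approx))
```

With per-element rounding error bounded by half the grid step, the worst-case deviation of the dot product is bounded, but the exact deviation varies input by input — which is why equivalence here has to be redefined statistically.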
SE: Flipping this around, will you use AI and machine learning in formal, as well?
Brinkmann: That’s one of the things we are looking at.
SE: How would that work?
Brinkmann: There are multiple things you can try. The most obvious is trying to optimize regressions. Many people integrate formal into their continuous integration and continuous deployment platforms in an Agile fashion. That means you re-run a lot of the tests, and you can look at which proof engines have been performing the best. But which are the right ones to pick? That’s a question you can answer with machine learning for this particular job.
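A minimal sketch of that engine-selection idea, under invented assumptions: record which engine finished fastest on each past property, describe properties by a couple of numeric features, and recommend the winner of the most similar property seen before. The engine names, features, and runtimes are all made up for illustration.

```python
# Hedged sketch: learn from past CI runs which proof engine tends to
# finish fastest for a given property, via 1-nearest-neighbour lookup.
from collections import defaultdict

# (property_features, engine, runtime_seconds) from earlier runs.
# Features here are hypothetical, e.g. (cone-of-influence size, depth).
history = [
    ((120, 3), "bmc", 14.0), ((120, 3), "ic3", 55.0),
    ((5000, 40), "bmc", 600.0), ((5000, 40), "ic3", 48.0),
    ((130, 4), "bmc", 12.0), ((4800, 35), "ic3", 52.0),
]

def recommend(features):
    """Copy the winning engine of the most similar past property."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    runs = defaultdict(list)
    for feat, engine, runtime in history:
        runs[feat].append((runtime, engine))
    # Fastest engine per historical property.
    winners = {feat: min(times)[1] for feat, times in runs.items()}
    nearest = min(winners, key=lambda feat: dist(feat, features))
    return winners[nearest]

print(recommend((110, 2)))    # small property: nearest winner is "bmc"
print(recommend((5200, 38)))  # large property: nearest winner is "ic3"
```

A production version would use richer features and a proper model, but the scheduling payoff is the same: run the engine most likely to converge first.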
SE: How about identifying patterns you didn’t expect to find, and re-using that model to say, ‘Hey, we found this and it showed up here?’
Brinkmann: I’m looking into this right now for one of our applications. What if you have a structural problem — a repetitive thing that you check for?
SE: And possibly only occasionally repetitive, as opposed to all the time, right?
Brinkmann: Exactly. You try to figure out what is the underlying pattern using machine learning. Then you can try to apply some outlier analysis to pinpoint some things that are strange so that a design engineer or verification engineer can look at it and say, ‘Okay, this is not good, or it’s actually expected.’
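The outlier-analysis step can be sketched very simply: given one measurement per instance of a repeated structure, flag the instances that deviate most from the common pattern for an engineer to review. The z-score threshold and the data are illustrative assumptions.

```python
# Sketch of the outlier step: flag instances of a repeated structure
# whose measurement deviates strongly from the rest.
import statistics

def outliers(values, threshold=3.0):
    """Return indices whose z-score exceeds the threshold."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []
    return [i for i, v in enumerate(values)
            if abs(v - mean) / stdev > threshold]

# 19 similar instances of a checked structure, plus one that differs.
depths = [4, 5, 4, 4, 5, 4, 4, 4, 5, 4, 4, 5, 4, 4, 4, 5, 4, 4, 4, 40]
print(outliers(depths))   # the last instance stands out
```

The tool only points at what is strange; as in the answer above, it is the design or verification engineer who decides whether the flagged instance is a bug or expected behavior.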
SE: So taking this from a different angle, what you’re doing is looking at variation in the design and trying to figure out exactly what caused it?
Brinkmann: Yes. And from a property checking sense, that’s going to be hard. If you have sequential execution of your designs, that will be difficult. But there may be ways of jumping over longer sequences in the state transition by just assuming certain conditions about the design and keeping them fairly general and open. That’s different than simulation, where you have a concrete state. Even if you randomize it, you have some set of concrete states that you want to parallelize at the same time. In formal, you can keep them symbolic. And analyzing designs from several different symbolic states and trying to find patterns that are similar, or which look interestingly different, could be a way to apply machine learning and say, ‘Okay, here’s something I found, or that a tool found, that is different from what it usually does.’
SE: In reality, formal has been looking at software code forever because that’s really what a design is. But does that allow you to take this technology and use it to analyze algorithms and other software and over-the-air updates?
Brinkmann: To some extent, yes, because there are some limitations to what hardware model checking can do today. You have to have a finite model. There’s some research into how to expand the scope of this. But even within this scope, you can take pieces of software/firmware and have a hardware-like model of it. If you take, for example, a low-level driver that interfaces with some IP in your design, you can analyze the C code. We do this today. So these are some of the things we are exploring right now with our C++ and SystemC-based model checking. We can apply this to low-level firmware code — typically interfaces with some memory-mapped registers, some interfaces with the hardware — and you have a very strict programming environment that doesn’t include a lot of the software stack that you would pull in. The difficulty when it comes to higher-level software is not the software that you write. It’s the underlying frameworks that you build it on. But it’s very difficult to formally analyze everything, so you need to focus the analysis on things that are relevant to you. But in order to do that you need models — formal models or semi-abstract pieces that model the underlying software stack, like the software libraries that you pull in. Let’s say you use a C library, and you call several C functions in there. That’s going to pull in a lot of software that you didn’t write, that you would normally trust to work, and that you normally wouldn’t want to verify. As a software engineer, the assumption is that the libraries you are using work. You know that’s not always the case, but you have to look for bugs in your own design and not in the libraries you use. And you usually don’t suspect a compiler bug when you map something to hardware or to some processor architecture. You usually think it’s user error. There are some good, lightweight formal analysis approaches that actually pull in everything, but they can’t give you the full model checking and contextual information.
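The hardware-like treatment of low-level firmware described above can be sketched in miniature, without any real tool: model firmware that pokes a memory-mapped control/status register pair as a finite-state machine, exhaustively explore every reachable state, and check an invariant in each one. The register names, transitions, and invariant are invented for illustration.

```python
# Loose sketch of explicit-state model checking: enumerate all
# reachable states of a tiny firmware/hardware interaction and check
# an invariant in every one of them.
from collections import deque

# State: (ctrl, status). Transitions model firmware writes and the
# hardware's responses.
def successors(state):
    ctrl, status = state
    yield (1, status)        # firmware enables the device
    if ctrl == 1:
        yield (ctrl, 1)      # an enabled device may go busy
    if status == 1:
        yield (ctrl, 0)      # a busy device may finish
    if status == 0:
        yield (0, status)    # firmware disables only when idle

def check(initial, invariant):
    """BFS over all reachable states; return a violating state or None."""
    seen, queue = {initial}, deque([initial])
    while queue:
        state = queue.popleft()
        if not invariant(state):
            return state
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return None

# Invariant: the device is never busy while disabled.
violation = check((0, 0), lambda s: not (s[0] == 0 and s[1] == 1))
print(violation)   # None: no reachable state violates the invariant
```

Because firmware at this level has a small, strict state space, this kind of exhaustive exploration is feasible — which is exactly what breaks down once large library and framework code gets pulled in.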
SE: Where does security fit into all of this?
Brinkmann: The security aspect is very crucial here. It’s similar to safety in the sense that you have to think it through from the start to analyze what you want to protect against, what your vectors of attack are, and what can go wrong. There’s a very elaborate way of doing a safety analysis. People have done this for many years. They have standards for it. We have tools now that can do this on different levels of the system. We are specializing at the chip level, but it’s going up the chain, of course, into the ECU or the whole car. For safety, this is not a solved problem in the sense that it still holds a lot of challenges. But for security, there is not even a standard or a good methodology established across industries.
SE: We used to think about this as an isolated system. No one worried about the infotainment system because who would want to hack into your radio? But the infotainment system may be a failover for the engine control system in the future, so it has to have the same level of security as everything else.
Brinkmann: Yes, and one interesting aspect we’re looking into from a security perspective is how things are connected. Is it possible to go from A to B, even if it’s not intentional? That’s actually the key piece. People build systems and they don’t understand all the aspects of how things can work together in their system anymore because they think, ‘This needs to talk to that and I need to make this work with this.’ But there may be scenarios they didn’t foresee that are breaching security by bringing things together that should be isolated.