“What really makes a measurement of high value is a lot of uncertainty combined
with a high cost of being wrong.”
Hubbard, Douglas W. (2010).
How to Measure Anything: Finding the Value of Intangibles in Business
Reliable Integrated Operations
These pages are still under development
The material you'll find here is about the services I think should be provided by
a platform for integrated operations. One way to look at integrated operations is as
the natural evolution of SCADA, since SCADA usually refers
to a system that coordinates, but does not control, processes in real time.
I look at the communications technologies usually in place at an operator,
and how they may be used to provide the reliable services they are capable of, and
how to add desirable features that enhance the overall reliability of the communications
infrastructure.
On the basis of those features I present an architecture that supports the development
of what I choose to call Reliable Integrated Operations. The internal architecture for
integration and communication is by far the most complicated part of the proposal,
but presents a simple and reliable programming model to the outside world.
It is based on design elements and software that have been successfully used by companies such as
Raytheon, Boeing, Lockheed-Martin, Siemens, Northrop, Ericsson, 3Com and many others
to create everything from the Ship Self-Defense System on the USS Ronald Reagan aircraft carrier
to television broadcasting and ATM switch signaling software.
This is a fair indication that we are looking at a mature, flexible and high-performance technological approach.
But first a brief overview, because the rest of what you'll find here is presented from
the bottom up to illustrate that each layer of the proposed platform builds upon, and
takes into account, the capabilities of the technological foundation it builds on.
In the Oil & Energy sector, integrated operations (IO) refers to work processes and ways of
doing oil and gas exploration and production, facilitated by information and communication
technology.
The most distinguishing features of integrated operations are:
- Real-Time Process Supervision
- Multi-site work environment
- Multi-disciplinary teams
- Collaboration with focus on production
- Seeks to optimize the whole value chain
To be efficient, integrated operations relies heavily on communications and information technology.
Broadband connections can be used to share process data, video-conferencing and
video-surveillance of the platform. This makes it possible to move some personnel onshore and use
the existing human resources more efficiently.
Instead of having an expert in production optimization on duty at every platform, the expert
can be stationed onshore and be available for consultation for several offshore platforms.
Integrated operations also enables a team at an office in a different time zone to consult
the night shift of a platform, so that no onshore workers need to work during the night.
Splitting the team between land and sea allows the operator to implement more efficient work processes
leveraging information and communication technology.
Capability Maturity Model Integration
CMMI is a framework used
to build process improvement systems.
Reliable Integrated Operations can be a valuable tool for:
- Causal Analysis and Resolution
- Organizational Performance Management
A platform for Integrated Operations would include features
that directly support the following CMMI process areas:
- Decision Analysis and Resolution
- Measurement and Analysis
- Organizational Process Focus
- Process and Product Quality Assurance
- Risk Management
And provide integration with the existing services for:
- Configuration Management
- Organizational Process Definition
- Organizational Process Performance
- Project Monitoring and Control
- Project Planning
- Quantitative Project Management
- Requirements Management
CMMI helps organizations to improve their performance and capability to consistently
and predictably deliver the products, services, and goods their customers want,
when they want them and at a price they're willing to pay.
From a purely inwardly-facing perspective, CMMI helps companies
improve operational performance by lowering the cost of production,
delivery, and sourcing.
The Norwegian Armed Forces Datatjenester (Data Services) chose the CMMI for
Services as a business process improvement model when faced with the challenge of
building one integrated unit with one unifying culture, achieving:
- Clear articulation of the unit's mission, role and vision
- Enhanced focus on leadership
- Enhancing the unit's operational capabilities
- Enhancing leadership
- Enhancing sharing
A platform for Integrated operations should support change and continuous improvement.
TOGAF builds on CMMI and uses these
methods and techniques in relation to enterprise architecture.
Integrated operations is an aspect of Enterprise Architecture
Technology & infrastructure enables architecture to provide meaning to available information.
The reliability of the IO solution depends on how reliable the chosen technologies that make up the
infrastructure are. As more control functions are transferred onshore, the reliability of
the integrated operations solution becomes mission critical to the operator.
Technologies that previously provided an adequate level of service may no longer be applicable,
as they are unable to provide the level of reliability required for the emerging uses
of integrated operations.
Integrated operations is an aspect of Enterprise Architecture (EA) for the process industry.
Enterprise architecture uses principles that have grown out of software architecture,
and applies them to management and organization science to provide a description
of the structure and work-flows of the enterprise. Enterprise architecture is an
emerging discipline based on four pillars:
- Business architecture: Defines the business strategy, governance,
organization, and business processes within the organization
- Applications architecture: Provides a high-level blueprint
for individual application/component systems, their relationships to
the business processes, the interactions between them,
and how they expose functionality for integration.
- Data architecture: Describes the structure of an organization's data assets
and the data management resources
- Technical architecture: Describes the hardware, software and network
infrastructure needed to support the applications
Business architecture includes people, responsibility, and interactions
between people.
Some enterprise architecture frameworks (EAFs), like TOGAF,
are centered around systems of software and their evolution, but
the principles of EA are applicable to many other aspects of the enterprise.
Interdependencies
Interdependencies give rise to numerous challenges that need to be taken into account to
build a reliable distributed platform for critical applications.
An interdependency is a bidirectional relationship between two infrastructures where the
state of each infrastructure influences the state of the other. Generally speaking,
two infrastructures are interdependent if each is dependent on the other.
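The definition above can be sketched as a check on a dependency graph. This is only an illustration: the infrastructure names and the `DEPENDS_ON` map are hypothetical, and the sketch assumes that indirect (transitive) dependencies count as dependencies.

```python
# Hypothetical example data: each infrastructure maps to the infrastructures it depends on.
DEPENDS_ON = {
    "power-grid": {"telecom"},   # grid supervision relies on telecom
    "telecom": {"power-grid"},   # telecom equipment needs electricity
    "water": {"power-grid"},     # pumps need electricity
}


def depends_on(source, target, graph):
    """Return True if `source` depends on `target`, directly or through other infrastructures."""
    seen, stack = set(), [source]
    while stack:
        node = stack.pop()
        for neighbour in graph.get(node, ()):
            if neighbour == target:
                return True
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return False


def interdependent(a, b, graph):
    """Two infrastructures are interdependent if each depends on the other."""
    return depends_on(a, b, graph) and depends_on(b, a, graph)


grid_and_telecom = interdependent("power-grid", "telecom", DEPENDS_ON)
water_and_grid = interdependent("water", "power-grid", DEPENDS_ON)
```

In this made-up data set the power grid and telecom are interdependent, while water merely depends on the power grid.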
Integration & Interdependencies
The Oil & Energy sector provides vital services to the community,
and as operators establish solutions for integrated operations onshore,
the requirements for stable operation of the onshore infrastructure take on
aspects of supervisory offshore systems. It follows that the security
and reliability requirements associated with the previously offshore operations
have to propagate onshore along with the operations.
Infrastructure interdependencies can be categorized according to various dimensions in
order to facilitate their identification, understanding and analysis. As Integrated operations
aims to integrate the capabilities of several existing infrastructures supporting
management, process supervision & control and maintenance functionality
it's important that the architecture addresses interdependency issues. It's also
possible that a platform for Integrated Operations would be a
candidate for European Public-Private Partnership for Resilience
or similar efforts towards establishing a reference framework for governance of critical information infrastructures.
EU defines critical information infrastructure (CII) as those systems that
provide the resources upon which all the functions of
society depend, such as telecommunications,
transportation, energy, water supplies, healthcare,
emergency services, manufacturing and financial services,
as well as essential governmental functions.
Establishment of a European Public-Private Partnership for Resilience (EP3R)
states that enhancing the security and resilience of CIIs is a joint responsibility shared among
a multiplicity of public and private stakeholders. The success of EP3R would depend on
the active participation and strong commitment of all relevant stakeholders.
Critical Information Infrastructure Protection (CIIP) underlines
the need for protecting critical information infrastructures.
CIIP builds on five pillars:
- Preparedness and prevention
- Detection and response
- Mitigation and recovery
- Cooperation
- Criteria for Critical Infrastructures
Research indicates that a security or safety incident in the ICT/SCADA systems may have complex and unanticipated consequences.
This is due to the increased number of interdependencies between systems in Integrated Operations,
the increased exploration of real-time data, and the different organizational silos of competence between IT and Automation.
Types of interdependencies
Four classes of interdependencies have been distinguished: Physical, cyber, geographic, and logical.
Physical interdependencies arise from physical linkages or connections among
elements of the infrastructures. In this context disruptions and perturbations in one
infrastructure can propagate to other infrastructures.
Cyber interdependencies occur when the state of an infrastructure depends on
information transmitted through the information infrastructure. Such
interdependencies result from the increased use of computer-based information
systems such as SCADA systems, to support control, monitoring and management
activities.
Geographic interdependencies exist between two infrastructures when a local
environmental event can create state changes in both of them. This generally
occurs when the elements of the infrastructures are in close spatial proximity.
Logical interdependencies gather all interdependencies that are not physical, cyber
or geographic, caused for example by regulatory, legal or policy constraints.
Environment
Any integration architecture for critical applications needs awareness of the
environment, including economic, legal/regulatory, technical,
social/political, business, public policy, security and health/safety issues.
Infrastructure characteristics
These concern in particular the structural composition of the infrastructures
and their temporal dynamics.
State of Operation
In order to provide a reliable platform for Integrated Operations it is necessary to
understand how the different components depend on each other, taking into account the
different operational states of each component and how they affect the operational state of
other components. A platform for critical applications generally features several
performance levels, and thus several modes of operation can be distinguished,
ranging from full capacity to emergency situation.
These modes of service depend on the workload and level of stress of the system, the
different error and failure conditions that might occur and their severity, and the error
recovery and restoration actions that can be applied to cope with these failures.
Type of Failure
Three types of failures are of particular interest when analyzing interdependent infrastructures:
Cascading failures occur when a disruption in one infrastructure causes the failure
of one or more components in a second infrastructure.
Escalating failures occur when an existing failure in one infrastructure exacerbates
an independent disruption in another infrastructure, increasing its severity or the
time for recovery and restoration from this failure.
Common cause failures occur when two or more infrastructures are affected
simultaneously because of some common cause.
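A cascading failure can be illustrated as propagation through a dependency graph. The sketch below is a deliberately pessimistic model, assuming that every component fails as soon as anything it depends on has failed; the component names and the `DEPENDENTS` map are hypothetical.

```python
from collections import deque

# Hypothetical example data: each component maps to the components that depend on it.
DEPENDENTS = {
    "power": ["satellite-link", "onshore-network"],
    "onshore-network": ["historian", "operator-station"],
    "satellite-link": [],
    "historian": ["operator-station"],
    "operator-station": [],
}


def cascade(initial_failure, dependents):
    """Return every component that fails, assuming failures always propagate to dependents."""
    failed = {initial_failure}
    queue = deque([initial_failure])
    while queue:
        component = queue.popleft()
        for dependent in dependents.get(component, []):
            if dependent not in failed:
                failed.add(dependent)
                queue.append(dependent)
    return failed


network_outage = cascade("onshore-network", DEPENDENTS)
power_outage = cascade("power", DEPENDENTS)
```

A real analysis would also have to account for the modes of operation described earlier, since a degraded component does not necessarily take its dependents down with it.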
Additional Resources
- Pattern-Oriented Software Architecture: A System of Patterns, Volume 1
- Pattern-Oriented Software Architecture: Patterns for Concurrent and Networked Objects, Volume 2
- Pattern-Oriented Software Architecture: Patterns for Resource Management, Volume 3
- Pattern-Oriented Software Architecture: A Pattern Language for Distributed Computing, Volume 4
- Patterns of Enterprise Application Architecture
- Enterprise Integration Patterns: Designing, Building, and Deploying Messaging Solutions
- US-CERT: Control Systems Security Program
- Managing emerging information security risks during transitions to Integrated Operations - Ying Qian, Yulin Fang, Martin Gilje Jaatun, Stig Ole Johnsen, Jose J. Gonzalez
- SINTEF Technology and Society: State of the art report – “SAFETY, SECURITY AND RESILIENCE IN INTEGRATED OPERATIONS” - Stig Ole Johnsen, Bjørn Axel Gran, Martin Gilje Jaatun, Sjur Larsen, Atoosa P-J. Thunem
- CRUTIAL
- IFIP Working Group 11.10 on Critical Infrastructure Protection
- IRRIIS – Integrated Risk Reduction of Information-based Infrastructure Systems
- Critical Information Infrastructure Protection
- Situation awareness and safety in offshore drill crews - Anne Sneddon, Kathryn Mearns, Rhona Flin
- Situation awareness and the cognitive management of complex systems - Adams M, Tenney Y, Pew R
- A Computational Model of Attention/Situation Awareness - Jason S. McCarley, Christopher D. Wickens, Juliana Goh, and William J. Horrey
- Situational Awareness and Safety - Neville A. Stanton, Peter R. G. Chambers, John Piggott
- Converged Communications for Pipeline Operations and Security - Upendra H. Manyam
- Change patterns and change support features – Enhancing flexibility in process-aware information systems - Barbara Weber, Manfred Reichert, Stefanie Rinderle-Ma
- Deadline-based Escalation in Process-Aware Information Systems - Wil M.P. van der Aalst, Michael Rosemann, Marlon Dumas
- Designing fault tolerant networks to prevent poison message failure - Xiaojiang Du, Mark A. Shayman, Ronald A. Skoog
- System Engineering and Software Exception Handling - Herbert Hecht
- SYSTEMS FAILURES: An approach to understanding what can go wrong - John Donaldson, John Jenkins
- Coordinated Atomic Actions
- Integrated Barrier Analysis in Operational Risk Assessment in Offshore Petroleum Operations - Jan Erik Vinnem, Terje Aven, Stein Hauge, Jorunn Seljelid, Gunnar Veire
“Quality is a direct experience independent of and prior to intellectual abstractions.”
Robert M. Pirsig
Communication Infrastructure Capabilities & Integrated Operations
Most operators have made considerable investments in their communication infrastructure,
an infrastructure that provides important features that are largely unused by the current
generation of platforms used to enable integrated operations.
To facilitate efficient communication for reliable integrated operations there are a number of key
factors that should be taken into account:
- Security
- Reliability
- The amount of real-time process data
- The amount of process alarm and event data
- The amount of process control data
- The amount of quality of service data
- The amount of video-surveillance data
- The amount of video-conference data
A typical oil field may have between 10 000 and 100 000 tags, where some tags may have a sub-second resolution.
For a tag there may exist a dead-band that defines the minimum change of interest to the operator. Only when a
change exceeds the dead-band will it be stored in the history of the tag.
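The dead-band rule can be sketched in a few lines. This is a minimal illustration, not how any particular historian implements it: the `TagHistory` class is hypothetical, and it assumes the change is measured against the last stored value.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class TagHistory:
    """History of one tag that only records changes exceeding the dead-band."""
    dead_band: float
    samples: List[Tuple[float, float]] = field(default_factory=list)  # (timestamp, value)

    def offer(self, timestamp: float, value: float) -> bool:
        """Store the sample if it differs from the last stored value by more than the dead-band."""
        if self.samples and abs(value - self.samples[-1][1]) <= self.dead_band:
            return False  # the change is too small to be of interest
        self.samples.append((timestamp, value))
        return True


history = TagHistory(dead_band=0.5)
for t, v in [(0.0, 20.0), (1.0, 20.2), (2.0, 20.4), (3.0, 20.6), (4.0, 21.5)]:
    history.offer(t, v)
stored_values = [v for _, v in history.samples]  # [20.0, 20.6, 21.5]
```

Of the five samples offered, only three are stored, which is how a dead-band reduces both storage and traffic for slowly changing tags.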
Under normal operation, a solution for integrated operations must be able to handle 100 000 alarms and events
each day, and during an emergency the system must be capable of handling far more than that.
Alarms and events require that the solution is capable of prioritizing the traffic based on the priority
of the alarm or event.
The amount of process control/command data is negligible, but the system must be able to prioritize control data
ahead of most data - normally just below emergency and critical alerts. Shutdown commands must progress
through the system at a higher priority than all other communication.
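The ordering described above can be sketched with a priority queue. The priority levels and their numeric values are illustrative assumptions; only their relative order (shutdown first, control just below emergency alerts) is taken from the text.

```python
import heapq
import itertools
from enum import IntEnum


class Priority(IntEnum):
    """Lower number is served first; the levels themselves are illustrative assumptions."""
    SHUTDOWN = 0
    EMERGENCY_ALERT = 1
    CONTROL = 2
    ALARM_EVENT = 3
    PROCESS_DATA = 4
    VIDEO = 5


class PriorityChannel:
    """Delivers messages by priority, and in arrival order within one priority."""

    def __init__(self):
        self._heap = []
        self._sequence = itertools.count()  # tie-breaker that preserves arrival order

    def send(self, priority, message):
        heapq.heappush(self._heap, (int(priority), next(self._sequence), message))

    def receive(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


channel = PriorityChannel()
channel.send(Priority.VIDEO, "camera frame")
channel.send(Priority.PROCESS_DATA, "PT-101 = 42.0")
channel.send(Priority.CONTROL, "set FV-7 to 30 %")
channel.send(Priority.EMERGENCY_ALERT, "gas detected")
channel.send(Priority.SHUTDOWN, "emergency shutdown")
delivery_order = [channel.receive() for _ in range(len(channel))]
```

Although the shutdown command was sent last, it is the first message to leave the channel.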
In integrated operations there is ideally no direct physical link between the PLC, SCADA, or any of the
other offshore components and the onshore IO components. Usually these components reside on separate
networks, and interaction between offshore and onshore components should be managed
by a reliable, secure data bridge. Information packages exchanged between components of an
IO solution should be signed using digital certificates. The data bridge should authenticate
the sender and verify the sender's authorization to communicate over the bridge using the
digital certificate attached to the message.
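The authenticate-then-authorize step at the bridge can be sketched as follows. Note the substitution: the Python standard library has no X.509 certificate signing, so this sketch uses HMAC with a shared key per sender as a stand-in for certificate-based signatures. The sender names, keys and the `REGISTERED_SENDERS` registry are hypothetical.

```python
import hashlib
import hmac
import json

# Hypothetical registry: sender id -> (shared key, allowed to use the bridge).
# A real bridge would validate a certificate chain instead of looking up a shared key.
REGISTERED_SENDERS = {
    "offshore-historian": (b"demo-key-1", True),
    "guest-laptop": (b"demo-key-2", False),
}


def _signature(sender, key, body):
    return hmac.new(key, sender.encode("utf-8") + b"|" + body, hashlib.sha256).hexdigest()


def sign(sender, key, payload):
    """Serialize the payload and attach a signature that covers sender and body."""
    body = json.dumps(payload, sort_keys=True).encode("utf-8")
    return {"sender": sender, "body": body.decode("utf-8"),
            "signature": _signature(sender, key, body)}


def bridge_accepts(message):
    """Accept a message only if the sender is known, the signature is valid, and the sender is authorized."""
    entry = REGISTERED_SENDERS.get(message["sender"])
    if entry is None:
        return False  # unknown sender
    key, authorized = entry
    expected = _signature(message["sender"], key, message["body"].encode("utf-8"))
    authentic = hmac.compare_digest(expected, message["signature"])
    return authentic and authorized


genuine = sign("offshore-historian", b"demo-key-1", {"tag": "PT-101", "value": 42.0})
tampered = dict(genuine, body=genuine["body"].replace("42.0", "99.0"))
unauthorized = sign("guest-laptop", b"demo-key-2", {"tag": "PT-101", "value": 42.0})
results = [bridge_accepts(m) for m in (genuine, tampered, unauthorized)]
```

Only the genuine message from an authorized sender passes; the tampered message fails authentication and the guest fails authorization.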
Augmented Quality of Service information can help to ensure that the components in an IO solution are aware
of the quality of service provided by the network components facilitating communication between the various
components. A reliable integrated operations solution could incorporate distributed network monitoring providing
integration between SNMP and alarms and events.
Ethernet may offer Quality of Service
through IEEE 802.1p.
While the Ethernet notion of quality of service provides desirable services for integrated operations, IO can benefit from
a more comprehensive mechanism that allows components in an IO network to be aware of communication and other component failures.
A broadband connection does not mean unlimited bandwidth. While 100 Gbit/s may seem to provide
near-unlimited bandwidth, usually only a fraction of that bandwidth will be available to any
single component of your integrated operations solution. If the broadband connection fails,
a satellite link often provides the remaining means of communication for integrated operations,
and the solution has to work with a 4 Mbit/s connection with increased latency.
Retrieving the current value of 34 359 tags requires just above 13 MB or about 112 Mbit when you account for
802.1Q, IPv6 and TCP timestamps (slightly less for IPv4 and TCP timestamps). When you add the bandwidth
requirements for alarms and events, commands, video-surveillance, video-conferences, SNMP, NTP, LDAP,
Kerberos and all the other services you need, you'll realize that we are looking at a lot of traffic.
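To put these figures in perspective, the sketch below estimates how long one snapshot of all tag values would occupy a link. The 112 Mbit, 4 Mbit/s and 100 Gbit/s figures come from the text; the 1% share of the broadband link is purely an assumption for illustration, and latency and protocol behaviour are ignored.

```python
def transfer_seconds(megabits: float, link_mbit_per_s: float) -> float:
    """Idealized transfer time: payload size divided by available bandwidth."""
    return megabits / link_mbit_per_s


SNAPSHOT_MBIT = 112.0              # one snapshot of 34 359 tags, as quoted in the text
SATELLITE_MBIT_PER_S = 4.0         # fallback satellite link, as quoted in the text
BROADBAND_MBIT_PER_S = 100_000.0   # 100 Gbit/s expressed in Mbit/s
ASSUMED_SHARE_PERCENT = 1          # assumption: share of the broadband link available to one component

satellite_seconds = transfer_seconds(SNAPSHOT_MBIT, SATELLITE_MBIT_PER_S)
broadband_seconds = transfer_seconds(
    SNAPSHOT_MBIT, BROADBAND_MBIT_PER_S * ASSUMED_SHARE_PERCENT / 100
)
```

Under these assumptions a single snapshot takes 28 seconds over the satellite link, compared with roughly a tenth of a second over the shared broadband connection, before any other traffic is taken into account.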
Prioritizing Traffic
Time-sensitive network traffic, such as real-time process data, alarms and events, and commands
can be prioritized ahead of bandwidth-intensive multimedia and remote session traffic.
The Internet Engineering Task Force (IETF) and the Institute of Electrical and Electronics Engineers (IEEE)
define a set of technologies that provide Quality of Service (QoS) for IPv4- and IPv6-based networks.
The technologies are designed to alleviate the problems caused by shared network resources and
finite bandwidth, and include mechanisms for prioritization and traffic shaping.
QoS can also be used to improve the throughput of traffic that crosses a slow link,
such as a satellite link.
802.1p allows eight different classes of service to be expressed through the 3-bit PCP field in an 802.1q header.
How traffic is treated when assigned to any particular class is undefined and left to the implementation, so extensive
end-to-end testing is recommended.
IEEE recommends the following:
| PCP | Network priority | Traffic characteristics |
| 1 | 0 (lowest) | Background |
| 0 | 1 | Best Effort |
| 2 | 2 | Excellent Effort |
| 3 | 3 | Critical Applications |
| 4 | 4 | Video, < 100 ms latency |
| 5 | 5 | Voice, < 10 ms latency |
| 6 | 6 | Internetwork Control |
| 7 | 7 (highest) | Network Control |
This means that voice and video will usually have a higher priority than process data, alarms and events, and
control/command data. This is perhaps not ideally suited for integrated operations, but there is nothing
that prevents a solution provider from assigning lower priorities to voice and video.
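The PCP value travels in the top three bits of the 16-bit Tag Control Information (TCI) field of the 802.1Q header, followed by a one-bit drop-eligible indicator and a 12-bit VLAN identifier. The sketch below packs such a tag; the `IO_PCP` assignment of traffic types to PCP values is a hypothetical example of a provider-specific mapping, not a recommendation from IEEE or from this text.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that identifies an 802.1Q tag

# Hypothetical assignment for an IO network that ranks voice and video lowest.
IO_PCP = {"command": 5, "alarm_event": 4, "process_data": 3, "voice_video": 0}


def vlan_tag(pcp: int, vlan_id: int, dei: int = 0) -> bytes:
    """Build the 4-byte 802.1Q tag: 16-bit TPID followed by PCP (3 bits), DEI (1 bit), VID (12 bits)."""
    if not 0 <= pcp <= 7:
        raise ValueError("PCP must fit in 3 bits")
    if dei not in (0, 1):
        raise ValueError("DEI is a single bit")
    if not 0 <= vlan_id <= 0xFFF:
        raise ValueError("VLAN id must fit in 12 bits")
    tci = (pcp << 13) | (dei << 12) | vlan_id
    return struct.pack("!HH", TPID_8021Q, tci)


def pcp_of(tag: bytes) -> int:
    """Extract the PCP value from a 4-byte 802.1Q tag."""
    _, tci = struct.unpack("!HH", tag)
    return tci >> 13


command_tag = vlan_tag(IO_PCP["command"], vlan_id=100)
```

Because the treatment of each class is left to the implementation, a mapping like this only has an effect if every switch and router on the path is configured to honor it.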
All network elements along the path that prioritized traffic takes must support QoS.
This includes:
- The sending and receiving hosts
- Layer 2 (Data Link layer) network devices (bridges and switches)
- Layer 3 (Network layer) network devices (routers), including routers used for wide area network (WAN) links
If a network device along this path does not support QoS, the traffic flow receives the standard first-come,
first-served treatment on that network segment.
Dropped packets
Dropping packets wastes the resources that have already been expended in
carrying these packets so far through the network.
The mechanism assumes that the congestion problem is resolved by the time
the packets are re-sent or, in the case of TCP traffic, that TCP will throttle
back transmission rates at the sender to reduce congestion in the network.
TCP streams tend to build up their transmission rates together, reach
the peak throughput of the network, and crash together to a lower
rate as packets are dropped, only to repeat the process.
Communication mechanisms such as SOAP, REST, and JSON over HTTP/HTTPS are particularly
vulnerable, as the size of the data exchanged between client and server tends to
be in the multi-megabyte range.
Careful use of the QoS mechanisms, together with appropriate communication mechanisms,
can help to provide a more reliable transport mechanism for critical information.
Efficient communication for Integrated Operations
Many distributed real-time and embedded systems (such as PLCs and SCADA) would benefit from an
event-based communication model.
Synchronous method invocation (SMI), where a client invokes a two-way operation on a server and then
blocks waiting for the response, is the most common communication model used by suppliers of
components for integrated operations. This model has limitations, however, stemming from the tight
coupling between client and server lifetimes, synchronous communication, and point-to-point communication.
The deficiencies of the currently prevalent communication model with regards to integrated operations
can be resolved by a distributed publish/subscribe service that decouples event sources and sinks -
providing a reliable asynchronous communication model that allows transparent
group communication. Obviously SMI is an appropriate architecture for most
software, and it's appropriate that an IO solution supports the SMI model for integration
with external systems.
The asynchronous communication model is appropriate for dealing with asynchronous, real-time events.
Other communication should not be allowed to disturb the scheduling and prioritization of
real-time data required by reliable integrated operations.
Here are some features that would enable efficient communication
for integrated operations:
- Persistent and Non-persistent Event Channels
- Scheduling & Priority Management
- Real-time deadline assurance
- Grouping
- Ordering
- Filters
- Shared Subscription Information
- Support streaming of voice and video data at "Best effort" or "Background" priorities
This communication model ensures that important information can be pushed
through the communication framework ahead of less important information
such as voice and video.
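A toy version of such an event channel shows how publishers and subscribers are decoupled, and how filters and priorities fit in. It is an in-process sketch under simplifying assumptions (no persistence, no deadlines, no network transport); the `EventChannel` class and the topic names are hypothetical.

```python
import heapq
import itertools
from collections import defaultdict


class EventChannel:
    """In-process publish/subscribe channel with per-subscriber filters and prioritized delivery."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # topic -> [(filter, callback)]
        self._pending = []                     # heap of (priority, sequence, topic, event)
        self._sequence = itertools.count()     # preserves publish order within one priority

    def subscribe(self, topic, callback, event_filter=None):
        """Register a callback; the optional filter decides which events it receives."""
        accepts = event_filter or (lambda event: True)
        self._subscribers[topic].append((accepts, callback))

    def publish(self, topic, event, priority):
        """Queue an event; the publisher never waits for, or knows about, the subscribers."""
        heapq.heappush(self._pending, (priority, next(self._sequence), topic, event))

    def dispatch(self):
        """Deliver all queued events, lowest priority number first; return the delivery count."""
        delivered = 0
        while self._pending:
            _, _, topic, event = heapq.heappop(self._pending)
            for accepts, callback in self._subscribers[topic]:
                if accepts(event):
                    callback(event)
                    delivered += 1
        return delivered


received = []
channel = EventChannel()
channel.subscribe("alarms", received.append, event_filter=lambda e: e["severity"] >= 3)
channel.subscribe("video", received.append)
channel.publish("video", {"frame": 1}, priority=7)
channel.publish("alarms", {"tag": "LT-200", "severity": 1}, priority=2)
channel.publish("alarms", {"tag": "PT-101", "severity": 4}, priority=1)
delivered = channel.dispatch()
```

The high-severity alarm is delivered before the video frame even though it was published last, and the low-severity alarm is filtered out for this subscriber.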
OPC
OPC does not support Quality of Service. The OPC Dictionary at OPC Training Institute mentions it.
So all data will ideally be assigned the default "Best Effort" priority, but may be assigned "Background"
priority. Microsoft has identified this as a problem with some NDIS drivers.
What is the problem with 802.1p?
While "Network Administrators might select a prioritization strategy where the most important data
is allowed to pass first. It is even possible to “drop” (lose) data altogether to ensure that significant
data is not queued behind optional information" has some truth to it, I feel that it's not reasonable
to demand this in a scenario as complex as integrated operations. I'm also a bit uncertain about how one can
implement logic in a switch that is capable of determining whether some piece of OPC data is optional.
OPC is based on DCOM, and DCOM uses MS-RPCE,
an extended version of DCE-RPC.
Cisco documents additional problems with DCOM in NBAR Versus RPC DCOM/W32/MS Blaster:
The DCOM protocol enables Microsoft software components to communicate with one another.
This is a core function of the Windows kernel and cannot be disabled. The vulnerability
results because the Windows RPC service does not properly check message inputs under certain
circumstances. By sending a malformed RPC message, an attacker can cause the RPC service on a
device to fail in such a way that arbitrary code could be executed. The typical exploit for
this vulnerability launches a reverse-telnet back to the attacker's host to gain complete
access to the target.
Successful exploitation of this vulnerability enables an attacker to run code with local
system privileges. This enables an attacker to install programs; view, change, or delete data;
and create new accounts with full privileges. Because RPC is active by default on all versions
of the Windows operating system, any user who can deliver a malformed TCP request to an RPC
interface of a vulnerable computer could attempt to exploit the vulnerability. It is even
possible to trigger this vulnerability through other means, such as logging into an affected
system and exploiting the vulnerable component locally.
HTTP/HTTP over TLS (https)
It's fairly well documented that HTTP and HTTPS
introduce a significant bandwidth overhead. It's not unusual to observe an overhead of one or even two orders of magnitude,
or between 1000% and 10 000%. Neither is there any provision for QoS at the protocol layer.
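A small calculation shows how an overhead of this size can arise for a single process value. Everything specific here is an assumption made for illustration: the 20-byte binary layout, the JSON field names, and the set of response headers. The request, TCP/IP headers and any TLS framing are left out, so the real overhead would be larger.

```python
import json
import struct

# Assumed compact encoding of one sample: 4-byte tag id, 8-byte timestamp, 8-byte value.
binary_sample = struct.pack("!Idd", 4711, 1300000000.0, 42.5)

# The same sample as a JSON body inside a hypothetical HTTP response.
body = json.dumps({"tagId": 4711, "timestamp": 1300000000.0, "value": 42.5}).encode("ascii")
headers = (
    "HTTP/1.1 200 OK\r\n"
    "Date: Tue, 15 Mar 2011 12:00:00 GMT\r\n"
    "Server: ExampleHistorian/1.0\r\n"
    "Content-Type: application/json; charset=utf-8\r\n"
    f"Content-Length: {len(body)}\r\n"
    "Cache-Control: no-cache\r\n"
    "Connection: keep-alive\r\n"
    "\r\n"
).encode("ascii")
http_message = headers + body

# Overhead relative to the compact binary encoding, in percent.
overhead_percent = 100 * (len(http_message) - len(binary_sample)) / len(binary_sample)
```

Even this modest response is more than ten times the size of the binary sample, which is consistent with the lower end of the range quoted above.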
For XML-based service-oriented architectures (SOAs), payload sizes in the 1 MB range are not unusual.
The reason for this is that the SOA architecture tends to be based on stateless agents,
and the entire state of a transaction therefore has to be contained in a document.
As an XML document flows through the SOA system, it will grow as it accumulates intermediate results as part of
its being processed.
These sizes are in distinct contrast to those usually required for integrated operations, where the primary payload
is process, control, and alarm and event data.
As standards-based integration is fairly important to integrated operations, SOAP, REST, or other XML-based services should
provide the means of communication with external systems, but should not serve as the backbone communication
technology for a reliable integrated operations solution. This SOA integration layer belongs at the logical edge
of the integrated operations solution.
Windows 7 and Windows Server 2008 R2 provide a feature called URL-based QoS,
which allows coarse prioritization of Web traffic based on the URL. Its intended use is to prevent visits to non-work-related Web sites
from consuming a large portion of the network’s bandwidth.
“Definition of risk
Long definition: The probability and magnitude of a loss, disaster, or other undesirable event
Shorter (equivalent) definition: Something bad could happen”
Hubbard, Douglas W. (2009).
The Failure of Risk Management: Why It’s Broken and How to Fix It
Cyber Threats
What's this got to do with Integrated Operations? I can somehow hear the response:
"Hackers are just college pranks ...", well not any more, and the
US Department of Defense decision
to allow the U.S. to respond to cyber attacks with physical force
in some cases only underscores this change. In the UK they are revving up their
cyber warfare plans,
as they are developing a cyber-weapons programme
that will give ministers an attacking capability to help counter growing threats
to national security from cyberspace.
According to Channel 4 News China admits that it has an elite
unit of cyber warriors in its army, and Germany is establishing
two high-level government agencies
devoted exclusively to cyber-war.
Investigators are typically unable to disclose information about investigations
into cyber threats because of non-disclosure agreements, so it is likely that
the problem is much greater than what we can be led to believe based on the
media coverage - and that's severe enough.
In March 2011 a security breach made it possible for criminals to enter
into EMC Corp's RSA security division's security systems by creating
duplicates to "SecurID" electronic keys.
SecurIDs are widely used electronic keys designed to thwart hackers who might use
key-logging viruses to capture passwords by constantly generating new passwords to
enter the system. EMC has disclosed that the hackers had broken into their network
and stolen some SecurID-related information that could be used to compromise the
effectiveness of those devices in securing customer networks.
RSA is among the best at securing networks, and even they can't keep their most sensitive
information out of the hands of hackers.
In May 2011, hackers broke into the security networks of the world's biggest defense
contractor Lockheed Martin Corp, and they are pretty savvy when it comes to
defending their networks too. It's fairly obvious that hackers have more resources
at their disposal and that they are getting more sophisticated.
Integrated operations exposes automation systems to the intranet, and indirectly to the Internet, leaving
mission-critical systems that were never really designed to cope with advanced cyber threats open to attack.
Stuxnet is a computer worm. It targets Siemens
industrial software and equipment running on Microsoft Windows. While it is not the first time that
crackers have targeted industrial systems, it is the first discovered malware that spies on and subverts
industrial systems, and the first to include a programmable logic controller (PLC) rootkit.
In his article How to Hijack a Controller - Why Stuxnet Isn't Just About Siemens' PLCs, Ralph Langner
provides an insight into just how serious the ramifications of Stuxnet are for the automation industry.
It can be expected that controller vendors will see this as a major business opportunity because
the outlook to replace millions of controllers before end-of-lifetime with upgraded product versions
means a multi-million dollar market.
Ralph Langner
The CRUTIAL Project showed that flooding based DoS attacks have severe
effects on IEC 60870-5-104 communications in terms of both loss of messages and total block.
Modbus servers are known to be relatively simple to attack.
In their white paper entitled “Effective OPC Security for Control Systems,” Eric Byres, chief technology
officer at Byres Security Inc., and Darek Kominek, Manager, OPC Marketing, MatrikonOPC, talk about the
security advantages of limiting network interfaces and protocols. The paper tries to give the impression
of OPC as a safe technology, but effectively documents that OPC relies on external security measures.
This is in sharp contrast to the Security Development Lifecycle (SDL) methodologies currently
being implemented by companies such as Microsoft.
OPC servers expose a standard COM event sink to client applications, making it easy to
implement code that will block the OPC server by entering an infinite loop.
Statements like
"This is yet another reason to only install or use software from a known and trusted OPC vendor"
only serve to underline how fragile this technology is.
During the winter of 2002/2003, numerous acts of sabotage targeted the SCADA system
responsible for loading oil tankers at a major marine terminal in Venezuela.
In one such attack, PLC code was erased, causing an eight-hour delay in loading tankers.
As documented by Cisco in Siemens Tecnomatix FactoryLink Multiple Vulnerabilities,
multiple Denial of Service (DoS) vulnerabilities have been reported in Siemens'
Tecnomatix FactoryLink SCADA system. The vulnerabilities are due to insufficient
verification of messages' data by the Siemens Tecnomatix FactoryLink's SCADA services,
while handling messages sent to the server. Remote attackers can exploit these
vulnerabilities by sending a specially crafted message to the affected server.
Successful exploitation of these vulnerabilities may lead to a DoS condition, which
may cause the server to become unresponsive. There may also be other vulnerabilities ...
According to IWA Publishing, a wholly owned subsidiary of the International Water Association:
Research from application security management firm Idappcom found 52 new threats in March 2011
targeted at Supervisory Control and Data Acquisition systems. Idappcom’s chief technology
officer Tony Haywood told specialist press that hackers could be attacking the systems
because they are typically less well protected than public-facing IT solutions.
He said the increase ‘may be an indicator towards a worrying trend’.
On the 15th of March 2011, Iranian crackers used a username and password to make certificate requests from the Comodo Certificate Authority.
These requests were successful and certificates were issued for 9 domains which are published on
the Comodo Fraud Incident Report page.
The fraudulent certificates are for the major Identity Provider sources on the Internet:
mail.google.com, www.google.com, login.yahoo.com, login.skype.com, addons.mozilla.org, login.live.com, global trustee.
These certificates may be used to spoof content, perform phishing attacks, or perform man-in-the-middle attacks against
all internet application users. Revocations of your computer’s trust of these certificates can be obtained via a
web browser update. All of these certificates were revoked immediately on discovery. Monitoring of OCSP responder traffic
has not detected any attempted use of these certificates after their revocation.
Comodo concludes that this was likely to be a state-driven attack.
On the 2nd of March 2011, Symantec's SecurityFocus site reported
an issue with IBM's Tivoli that allows attackers to use a browser to compromise the application,
access or modify data, or exploit latent vulnerabilities in the underlying database.
On the 25th of September 2008, Symantec's SecurityFocus site reported
a remote buffer-overflow vulnerability in ABB PCU400. The PCU400 handles communication with RTUs, IEDs and Substation Automation Systems.
On the 3rd of August 2011, Symantec's SecurityFocus site reported
an issue with the SIMATIC S7-300 controller. The S7-300 is marketed as an ideal universal automation system for centralized and
decentralized configurations, and an attacker can carry out this attack using readily available network utilities.
On the 7th of July 2011, Symantec's SecurityFocus site updated its report
on an issue with several controllers in the SIMATIC S7 line: the 200, 300, 400 and 1200.
Considerations
When you mix mission-critical infrastructure for automation with your intranet, you make the
assumption that the security of your intranet has not been compromised. Based on
available information that seems to be a risky assumption. By further assuming that
the major vendors of automation systems are on top of the situation, you ignore
that major security-conscious enterprises and organizations, with far more experience
in this area, still have their security breached on a regular basis.
When it comes to preventing malicious attacks, "Best Practices" for IT often balance
the cost of security measures against the cost of restoring lost data from a backup.
If an attacker can get into the automation infrastructure and reprogram PLCs and SCADA systems,
the possible damage can be of an entirely different order of magnitude. Considering
how Stuxnet worked its way into
PLCs, it's reasonable to consider that even the most farfetched of schemes may be worth
guarding against.
Many automation system providers are dangerously unaware of the real security issues
facing their installations, and this has led to a false sense of security within
the user community and a lack of urgency among automation system providers.
Customers and automation system providers see security as an operating system
problem or a network perimeter and firewall problem, but it has become obvious
that this is simply not true.
In short, if you build automation systems and software, and your systems and solutions can be
accessed by potentially malicious users inside or outside the firewall, the solution will
come under attack.
“Computers have enabled people to make more mistakes faster
than almost any invention in history, with the possible exception of
tequila and hand guns”
Mitch Ratcliffe
Infrastructure monitoring
Infrastructure monitoring provides the functionality that can enable Augmented Quality of Service
appropriate for reliable integrated operations. By monitoring a computer network for slow or
failing components, it's possible to provide an enhanced view of the real-time quality
of process and control capabilities of the integrated operations infrastructure.
By monitoring numerous parameters of the network and the health and integrity of servers
that play a role in the IO solution, the reliability of the IO infrastructure is tied to
the quality of the process information and the process command execution capabilities.
This will enable rapid identification of infrastructure problems, while at the same time
providing valuable information for capacity planning.
This requires a distributed and efficient monitoring solution capable of:
- Performance monitoring
- Availability monitoring
- Integrity monitoring
- Flexible condition based notification and filtering
- Logging
- Automatic discovery by IP range, services and SNMP
- Automatic monitoring of discovered devices
- Automatic execution of remote commands
- Automatic IPMI commands
To effectively support integrated operations, the monitoring framework
should be able to handle:
- Millions of monitored devices
- An order of magnitude more monitored metrics
- Agents capable of executing thousands of availability and performance checks per second
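As an illustration of condition-based notification and filtering, here is a minimal Python sketch. The names (Sample, Rule, Notifier), the device "switch-01" and the 200 ms threshold are all assumptions made for the example, not part of any existing monitoring product. The notifier reports a condition when it first becomes true and stays silent while it remains true, which is one simple way to filter repeated notifications.

```python
from dataclasses import dataclass
from typing import Callable, List, Set, Tuple


@dataclass(frozen=True)
class Sample:
    device: str   # hypothetical device name
    metric: str   # hypothetical metric name, e.g. "latency_ms"
    value: float


@dataclass(frozen=True)
class Rule:
    metric: str
    condition: Callable[[float], bool]  # returns True when the metric is unhealthy
    message: str


class Notifier:
    """Evaluates rules and suppresses repeats while a condition stays active."""

    def __init__(self, rules: List[Rule]) -> None:
        self.rules = list(rules)
        self.active: Set[Tuple[str, int]] = set()  # (device, rule index) in alert state
        self.sent: List[str] = []                  # notifications that were emitted

    def process(self, sample: Sample) -> None:
        for index, rule in enumerate(self.rules):
            if rule.metric != sample.metric:
                continue
            key = (sample.device, index)
            if rule.condition(sample.value):
                if key not in self.active:  # notify only on the transition
                    self.active.add(key)
                    self.sent.append(f"{sample.device}: {rule.message} ({sample.value})")
            else:
                self.active.discard(key)    # condition cleared, re-arm the rule


notifier = Notifier([Rule("latency_ms", lambda v: v > 200.0, "latency above 200 ms")])
for value in (50.0, 250.0, 300.0, 80.0, 220.0):
    notifier.process(Sample("switch-01", "latency_ms", value))
```

With these five samples the notifier emits two notifications: one when latency first exceeds the threshold, and one when it exceeds it again after having recovered.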
“Never worry about theory as long as the machinery does what it's supposed to do.”
Robert A. Heinlein
“[Defines quality as the] degree to which a set of inherent characteristics fulfills requirements -
where inherent as opposed to assigned, means existing in something,
especially as a permanent characteristic”
ISO 9000
Architecture
One way to look at integrated operations is as the natural evolution of SCADA, as a SCADA system always refers
to a system that coordinates, but does not control processes in real time.
Integrated operations is primarily about onshore supervisory control of offshore resources
and access to the current state of processes and equipment - augmented by Visualization, and
Optimization & Operations Research.
An architecture for integrated operations should be based on proven technologies,
as putting together a working IO framework of service components is a challenging
task in itself.
To achieve the primary goal of integrated operations, the architecture
should focus on the following areas:
- Reliable, prioritized, predictable and fault tolerant communication
- Integration of available data sources
- Integration of available services
Design for Failure
All systems have failure points, where the operating conditions
do not satisfy the runtime requirements. Software for critical systems is
expected to protect against a wide range of anomalies that can include:
- Unusual environmental conditions
- Erroneous inputs from operators
- Faults in the computer, the software and communication lines
- Sensor and actuator failures
The portions of the programs that provide this protection are
called exception handlers. Their purpose is to:
- Detect that an anomalous condition has been encountered
- Provide a recovery path that permits continued system operation, sometimes with reduced capabilities
In critical systems a large part of the development effort is devoted to
exception handling, and a significant part of the failures in these systems
have been traced to deficiencies in the exception handling code.
So, providing a reliable framework for exception handling is important
to the overall robustness of the solution.
A study of failures in AT&T's
Electronic Switching System showed the following distribution of causes:
- Recovery 35%
- Processing errors 30%
- Hardware 20%
- Software 15%
It appears that faults in error handling and recovery account for 65% of
system failures (recovery 35% plus processing errors 30%), while software faults during normal operation account for only 15%.
Similar studies show similar results, so it appears that error handling
and recovery are a cause for serious concern.
Considerations
Since recovery might not be possible it's sensible to provide some kind of
notification about the condition that has caused the system to enter the
exception handling state, and log this information for later analysis.
Critical components provide one or more critical services, otherwise they would not be critical, so it's
vital that the solution provides a mechanism for notifying components that depend on the services
provided by the faulting component that those services are not available.
During recovery the faulting component should provide notifications about the services
as they become available, allowing dependent components to adjust to the level of service
currently provided.
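The considerations above can be sketched in Python as follows. This is only an illustration under stated assumptions: ServiceRegistry, Component and the "historian" services are hypothetical names, and a real framework would distribute these notifications over the network. The exception handler detects the fault, logs it for later analysis and marks the component's services as unavailable; during recovery the services are announced again one at a time.

```python
from typing import Callable, Dict, List

Listener = Callable[[str, bool], None]


class ServiceRegistry:
    """Tracks service availability and notifies dependent components of changes."""

    def __init__(self) -> None:
        self._available: Dict[str, bool] = {}
        self._listeners: Dict[str, List[Listener]] = {}

    def subscribe(self, service: str, listener: Listener) -> None:
        self._listeners.setdefault(service, []).append(listener)

    def set_available(self, service: str, available: bool) -> None:
        if self._available.get(service) == available:
            return  # no change, so nothing to announce
        self._available[service] = available
        for listener in self._listeners.get(service, []):
            listener(service, available)


class Component:
    def __init__(self, name: str, services: List[str],
                 registry: ServiceRegistry, log: List[str]) -> None:
        self.name, self.services = name, list(services)
        self.registry, self.log = registry, log
        self.recover()  # announce all services as available at start-up

    def run(self, operation: Callable[[], object]) -> object:
        try:
            return operation()
        except Exception as error:  # detect the anomalous condition
            self.log.append(f"{self.name}: {error!r}")  # keep it for later analysis
            for service in self.services:
                self.registry.set_available(service, False)
            return None

    def recover(self) -> None:
        for service in self.services:  # services come back one at a time
            self.registry.set_available(service, True)


registry, log, seen = ServiceRegistry(), [], []
historian = Component("historian", ["historian.read", "historian.write"], registry, log)
for service in historian.services:
    registry.subscribe(service, lambda name, up: seen.append((name, up)))
historian.run(lambda: 1 / 0)  # a deliberately failing operation
historian.recover()
```

After the failing operation, the subscriber has been told that both services went down and then that each became available again, in that order.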
Integration
Integration enables separate components to work together to produce a unified
set of functionality. Some components may have to be developed, while others
can be bought from existing software vendors. The components are often distributed
among multiple computers, often not on the same local area network,
or even on the same continent.
Integration has a lot to do with communication, so it makes sense to handle
integration and communication at the same time.
Successful integration involves a wide range of considerations and consequences
that must be taken into account to provide for secure and reliable operation.
Usually an enterprise already has a number of software solutions - and any new
development effort has to work with the existing components.
Architectures for critical systems have to pay strict attention to
the predictability of service execution. Deterministic behavior of
the components of a critical system promotes the overall system
predictability. In order to decide if a critical requirement is
met, the system must behave predictably. This can only happen if
all the parts of the system behave deterministically and if they
combine predictably.
An architecture for critical systems needs to include the
following four major components, each of which must be
integrated in a way that promotes end-to-end
predictability in the system as a whole:
- The communication & transport infrastructure
- The scheduling mechanisms of the operating system
- The services provided by the architecture
- The services provided by the application components
The interfaces and mechanisms provided by an architecture for
critical systems should facilitate a predictable combination of
components. The components manage resources by using
the mechanisms provided by the architecture, and the architecture
also provides mechanisms to facilitate the coordination of
activities between components.
Considerations
Component coupling: Integrated components should minimize their
dependencies on each other so that each can evolve without causing problems
for the other. Tightly coupled components make numerous assumptions
about how the other components work; when one or more components are
altered or replaced, those assumptions break, and then the integration breaks.
The interface for integrating components should be specific enough to provide useful
functionality, but general enough to allow components to evolve and adapt to
requirement changes.
Simplicity: When integrating components, architects and developers should
try to minimize the work required to provide for integration.
Technology: Different integration technologies provide different capabilities.
While integration technologies like DCE-RPC and CORBA have a reputation for being
overly complex and hard to understand, this complexity is often a result
of features that target distributed solutions and are unmatched
by more recent RPC technologies such as SOAP, JSON and REST.
Data format: Usually integration is for exchange of data, and integrated components
must agree on the format of the data they exchange, or the solution must provide an intermediate
translator component that serves as a communication bridge between components that
insist on different data formats.
Timeliness: Integration should minimize the time between when
one component decides to share some data and when other components receive that data.
Components should be informed as soon as possible when data they depend on is changed
and ready for consumption. Latency in data communication has to be factored into
the integration design; the longer communication takes, the greater the opportunity for
the data to become stale, and the more complex the integration becomes.
Functionality: Components may want to share more than data,
they may wish to share functionality such that each component can invoke
functionality in other components. Invoking functionality remotely can be difficult
to achieve, and even though it seems the same as invoking local functionality,
it works quite differently, and that has significant consequences for how
successful the integration is.
Asynchronous events: Computer processing is usually synchronous:
a procedure waits while a sub-procedure executes. It is a given that the
sub-procedure is available when the procedure wants to invoke it.
Components, on the other hand, may not want, or need, to wait for
remote operations to complete; they just invoke the remote functionality and
then continue to execute without waiting for a result. This is especially
true for distributed components where the remote component may not
be running or the network may not be available.
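The difference between synchronous and asynchronous invocation can be sketched with Python's standard concurrent.futures module. The function remote_operation and the tag "valve-17" are hypothetical stand-ins for a call to a remote component; no real network traffic is involved. The caller submits the work, registers a callback for the result and carries on immediately.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def remote_operation(tag: str) -> str:
    """Hypothetical stand-in for functionality provided by a remote component."""
    return f"{tag} acknowledged"


results: List[str] = []

with ThreadPoolExecutor(max_workers=1) as executor:
    # Submit the call and return at once; the result arrives via the callback.
    future = executor.submit(remote_operation, "valve-17")
    future.add_done_callback(lambda done: results.append(done.result()))
    progress = "caller continued without waiting"
# Leaving the with-block waits for outstanding work, so results is now filled.
```

A thread pool only covers the case where the other side is reachable. When the remote component may not be running at all, a messaging system that stores the request is the more appropriate tool.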
Framework services
Service locator
Distributed components require a way to look up other components that
provide services they rely on. A service locator centralizes
distributed component lookups, provides a centralized point of control,
and may act as a cache that eliminates redundant lookups.
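A minimal sketch of a caching service locator, in Python, might look like this. The class name, the directory dictionary and the address "tcp://alarm-host:9000" are assumptions for the example; in a real deployment the lookup function would query a naming or directory service.

```python
from typing import Callable, Dict


class ServiceLocator:
    """Central lookup point that caches results to avoid redundant lookups."""

    def __init__(self, directory_lookup: Callable[[str], str]) -> None:
        self._lookup = directory_lookup  # assumed to be slow, e.g. a network query
        self._cache: Dict[str, str] = {}
        self.lookups = 0                 # number of lookups that reached the directory

    def locate(self, name: str) -> str:
        if name not in self._cache:
            self.lookups += 1
            self._cache[name] = self._lookup(name)
        return self._cache[name]

    def invalidate(self, name: str) -> None:
        """Forget a cached entry, for example after the service has moved."""
        self._cache.pop(name, None)


directory = {"alarm-service": "tcp://alarm-host:9000"}  # hypothetical directory content
locator = ServiceLocator(directory.__getitem__)
first = locator.locate("alarm-service")
second = locator.locate("alarm-service")  # served from the cache
```

The second call never reaches the directory. The invalidate method is there because a cache that cannot be refreshed would keep handing out the address of a service that has failed or moved.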
Fault Tolerance
Fault Tolerance enables a system to continue operation, sometimes at a
reduced level of service. The framework should provide for component redundancy,
fault detection, recovery and services that support replication of components.
Fault Tolerance should provide for a range of strategies, including request retry,
redirection to an alternative server, passive (primary/backup) replication,
and active replication which provides more rapid recovery from faults.
The framework should allow users to define fault tolerance properties for
each replicated component.
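Two of the strategies mentioned above, request retry and redirection to an alternative server, can be sketched in a few lines of Python. The function call_with_failover and the primary and backup replicas are hypothetical; the sketch assumes that a failed request raises ConnectionError and that retrying the request is safe.

```python
from typing import Callable, List, Optional, Sequence

Replica = Callable[[str], str]


def call_with_failover(replicas: Sequence[Replica], request: str,
                       retries_per_replica: int = 2) -> str:
    """Try each replica in order, retrying a few times before moving on."""
    last_error: Optional[Exception] = None
    for replica in replicas:
        for _ in range(retries_per_replica):
            try:
                return replica(request)
            except ConnectionError as error:  # fault detected, retry or redirect
                last_error = error
    raise RuntimeError("all replicas failed") from last_error


attempts: List[str] = []


def primary(request: str) -> str:
    attempts.append("primary")
    raise ConnectionError("primary is down")  # simulated fault


def backup(request: str) -> str:
    attempts.append("backup")
    return f"backup handled {request}"


result = call_with_failover([primary, backup], "read-tag")
```

The request is tried twice against the primary before being redirected to the backup. The number of retries is the kind of fault tolerance property that could be defined per replicated component.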
Life cycle
Life cycle should provide services and a protocol for creating, deleting,
copying and moving components.
Reflection
The framework should provide mechanisms that allow components to retrieve information
about other components, including information about provided services and data structures
that allows components to dynamically invoke operations on other components.
Transaction management
The framework should provide a Transaction Service enabling transaction synchronization
across the elements of a distributed component system. A transaction can involve multiple
components handling multiple requests. The scope of a transaction is defined by a transaction
context that is shared by the participating components. The Transaction Service should
place no constraints on the number of components involved, the topology of the system,
or the way in which the system is distributed across a network.
Logging
The framework should provide a Log Service to be able to store information about
events that may be of interest to the operator or component provider. The Log Service
should provide capabilities to form log networks for storing and forwarding events.
Remote Procedure Call (RPC)
RPC allows components to communicate across process boundaries using a development
style that makes remote procedures appear as local procedures. When a component
needs to modify data across process boundaries, it makes a call to the remote component.
Each component maintains the integrity of the data it owns, and each component can be
altered without affecting other components as long as its interface remains unchanged.
Most RPC technologies use an interface description language (IDL) to define the interfaces
available to remote components. The IDL files are then used to generate code that hides
the complexities of interprocess communication from the developers.
To behave predictably RPC technologies require:
- That components are running concurrently
- That components have available processing capacity
- That the network is reliable and has enough spare capacity
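To show what generated stub code hides from the developer, here is a hand-written Python sketch of a client stub and a server-side dispatcher. It is not the output of any IDL compiler; TagService, dispatch, Stub and the JSON wire format are all assumptions chosen for brevity, and the "transport" is an in-process function call where a real system would use a network connection.

```python
import json
from typing import Callable, Dict


class TagService:
    """Server-side component that owns its data."""

    def __init__(self) -> None:
        self._values: Dict[str, float] = {"pressure": 101.3}

    def read(self, tag: str) -> float:
        return self._values[tag]

    def write(self, tag: str, value: float) -> bool:
        self._values[tag] = value
        return True


INTERFACE = {"read", "write"}  # the operations the interface exposes


def dispatch(service: TagService, payload: bytes) -> bytes:
    """Server skeleton: unmarshal the request, invoke, marshal the reply."""
    request = json.loads(payload)
    if request["method"] not in INTERFACE:
        raise ValueError(f"unknown method {request['method']!r}")
    result = getattr(service, request["method"])(*request["params"])
    return json.dumps({"result": result}).encode()


class Stub:
    """Client stub: makes the remote procedure look like a local one."""

    def __init__(self, transport: Callable[[bytes], bytes]) -> None:
        self._transport = transport

    def __getattr__(self, method: str):
        def call(*params):
            payload = json.dumps({"method": method, "params": params}).encode()
            return json.loads(self._transport(payload))["result"]
        return call


service = TagService()
stub = Stub(lambda payload: dispatch(service, payload))  # in-process transport
stub.write("pressure", 99.5)
value = stub.read("pressure")
```

The caller writes stub.read("pressure") as if it were a local call. Note that the call blocks until the reply arrives, which is why the requirements listed above matter.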
Messaging
Messaging allows components to send messages to, and receive messages from, other components.
The technology allows components to be distributed over local and remote networks
and reduces the complexity of solutions that span multiple operating systems and
network protocols. Messaging is a middleware technology that provides a
distributed communications layer that insulates the developer from the details
of the various operating system and network interfaces.
Messaging makes components loosely coupled by communicating asynchronously;
this also makes the communication more reliable because the communicating components
do not have to be running at the same time. The messaging system is responsible for
transferring data between components.
Messaging technologies do not require:
- That components are running concurrently
- That components have available processing capacity
- That the network is reliable and has enough spare capacity
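The store-and-forward behaviour that makes this possible can be illustrated with a very small in-memory Python sketch. MessageBroker and the "historian" destination are hypothetical names, and a real messaging system would persist the queues and deliver across the network. The point of the example is that the sender completes its work before any receiver exists.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List, Optional


class MessageBroker:
    """Holds messages per destination until a receiver asks for them."""

    def __init__(self) -> None:
        self._queues: Dict[str, Deque[dict]] = defaultdict(deque)

    def send(self, destination: str, message: dict) -> None:
        # Returns immediately; the receiver does not have to be running.
        self._queues[destination].append(message)

    def receive(self, destination: str) -> Optional[dict]:
        queue = self._queues[destination]
        return queue.popleft() if queue else None


broker = MessageBroker()

# The sender runs first, while the receiving component is still offline.
broker.send("historian", {"tag": "pressure", "value": 101.3})
broker.send("historian", {"tag": "pressure", "value": 99.5})

# Later the receiver starts and drains its queue in the order sent.
received: List[dict] = []
message = broker.receive("historian")
while message is not None:
    received.append(message)
    message = broker.receive("historian")
```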
“We are hoarding potentials so great that they are just about unimaginable.”
Jack Schwartz
Information & Storage
Integrated operations can make effective use of information from a wide range of sources,
including - but definitely not limited to:
- Measurement & Analysis
- Corporate Dashboards, including KPI
- Online Equipment Monitoring
- Performance Optimization
- Substation Automation
- Predictive Maintenance
- Post-Incident Analysis
By leveraging a common information model this information can be
seamlessly integrated into the IO infrastructure, bringing all the
information to the right people at the right time.
Model Driven Architecture
InfoPoint is a Model Driven Architecture (MDA)
framework that enables efficient distributed access to information.
InfoPoint uses information from existing systems, and provides extensive meta modeling facilities
based on the design of the MetaObject Facility.
The modeling framework has many similarities to the Eclipse Modeling Framework.
Ontology
Ontology represents knowledge as a set of concepts within a domain, and
how those concepts relate to each other. It can be used to reason about the
entities within that domain, and may be used to describe the domain.
According to Thomas Robert Gruber an ontology is a "formal,
explicit specification of a shared conceptualization" providing a shared vocabulary and taxonomy
(classification), which models a domain — it allows for definition of objects, and their properties and relations.
InfoPoint is capable of expressing complex ontologies through its modeling facilities.
InfoPoint provides the tools required to generate a structured framework for organizing information
based on the existing information infrastructure. The technology provides an efficient platform
for the Semantic Web, systems engineering,
simulation, optimization,
and other efforts related to the information architecture.
Based on an extensive suite of code generators, InfoPoint accelerates development from
the model of the required business functionality and
behavior to a deployable extensible component framework.
Relational Data
InfoPoint automatically maps relational data sources. Relational databases
like Oracle RDBMS, IBM DB2
and Microsoft SQL Server provide information about
how data is structured, including properties and relations. InfoPoint can automatically import this information and generate
a distributed, event driven, change tracking framework providing access to the data under management.
InfoPoint is also capable of creating a relational database based on model information. The model can be extended
and InfoPoint will update existing databases with new tables and related definitions.
Real-time Data
Integrated operations relies on real-time data historians for acquiring,
storing, and displaying large amounts of operations and engineering information.
Access to data from all levels of the operations, with decades of
high-frequency time-series data available online in its original resolution,
provides cross-discipline teams with an invaluable tool for production
analysis and optimization.
InfoPoint provides a model for access to real-time data and historians, such as AspenTech IP.21
and OSIsoft PI. The communication and integration architecture provides the facilities
required to reliably access data and distribute change notification.
Alarm and Events
Alarms and events provide valuable information for integrated operations.
Effective alarm & event management is central to a continuous
improvement process.
Developing a continuous improvement model for alarm and events management is
an important step in attaining a higher level in operational excellence as it
has a direct impact on mitigating and preventing abnormal events as they develop.
Alarms are a signal to the operator that they should intervene in the process
operation to correct a condition and return the process to a normal state or
to prevent the process from going into an abnormal/unsafe condition.
Operators should view alarms in the context of the overall operation.
It makes no sense for an operator to have to respond to an alarm
for low pressure when the well is shut down for maintenance.
When alarms function as they should, they alert the operator to an unexpected
change, inform the operator of the nature of the change, and guide the operator
towards a corrective action.
Alarms should always require operator action; if not, they are not alarms but events.
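The low pressure example can be expressed as a small Python sketch that classifies a condition as an alarm or an event depending on the operational context. WellState, Condition, classify and the tag "PT-101" are hypothetical names introduced for the illustration; a real alarm management system would hold far richer context.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet


class WellState(Enum):
    PRODUCING = "producing"
    SHUT_DOWN = "shut down for maintenance"


@dataclass(frozen=True)
class Condition:
    tag: str
    description: str
    # Operating states in which the condition requires operator action.
    relevant_states: FrozenSet[WellState]


def classify(condition: Condition, well_state: WellState) -> str:
    """Return "alarm" if the operator must act in this context, else "event"."""
    return "alarm" if well_state in condition.relevant_states else "event"


low_pressure = Condition(
    tag="PT-101",  # hypothetical pressure transmitter
    description="low pressure",
    relevant_states=frozenset({WellState.PRODUCING}),
)

while_producing = classify(low_pressure, WellState.PRODUCING)
during_maintenance = classify(low_pressure, WellState.SHUT_DOWN)
```

Low pressure is an alarm while the well is producing, but only an event to be logged while the well is shut down for maintenance.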
ERP Integration
InfoPoint provides a common model for data usually managed by the ERP solution.
“Information sharing produces shared awareness among the participants,
and collaborative production relies on shared creation,
but collective action creates shared responsibility,
by tying the user's identity to the identity of the group.”
Clay Shirky
“[Visualization is] the visual means of resolving logical problems”
Jacques Bertin, 1967
Integrated Operations and Visualization
Efficient visualization tools are one of the cornerstones of integrated operations.
Cutting edge 3D visualization and simulation enables better and more accurate cross discipline communication,
leading to a more productive environment for collaboration.
Visualization can be used for anything from examining simple data sets to analyzing complex, time-dependent
data from disparate sources. Effective IO requires features and functions that let you easily
gain meaningful insight into your data.
In integrated operations visualization can be used to aid the process of inspecting, cleaning,
transforming, and modeling of data. The goal is to highlight useful information, suggest conclusions,
and support decision making.
Visualization can provide useful insights into analysis techniques that focus
on modeling and knowledge discovery for predictive purposes.
The quantity of information that we have to process on a regular basis has increased enormously.
The rapid advances in technology, primarily in the areas of communication and information,
put an ever increasing amount of data at our fingertips. To turn this vast amount of data into useful
information we need effective methods that allow us to analyze the data in ways that help us
make informed decisions.
To be effective, cross-disciplinary teams require effective means of communication.
When we want to communicate an idea, it's often useful to draw a picture.
It can be a sketch on paper, a drawing on a white-board, or a slide.
The visual representations help us to illustrate concepts that would be hard to explain
verbally to a listener. When we have data that illustrates concepts, ideas,
and properties intrinsic to that data, visualization is a valuable communication tool.
Collaboration Services
The communication and integration architecture enables
voice, video, and shared workspaces for collaboration.
InfoPoint provides extensive services for:
- Project management
- Equipment & Resource management
- Case management
“If you can define the outcome you really want, give examples of it, and identify how
those consequences are observable, then you can design measurements that will measure the
outcomes that matter.”
Hubbard, Douglas W. (2010).
How to Measure Anything: Finding the Value of Intangibles in Business
Optimization & Operations research
Operations research is a branch of mathematics that focuses on the effective use of available resources.
Based on techniques from other branches of mathematics — like modeling, statistics, optimization —
Operations research enables optimal or near-optimal solutions to complex decision-making problems.
Operations research emphasizes practical application of human-technology interaction and overlaps
with disciplines, like industrial engineering and management science, and draws on psychology and
organization science. Operations Research is usually concerned with determining the maximum
or minimum of some real-life objective. Maximize production and profit - minimize risk and loss.
Operations research applies problem-solving techniques and methods to improve efficiency
and support decision-making. Operational research involves statistics, optimization, probability theory,
queuing theory, game theory, graph theory, decision analysis, mathematical modeling and simulation.
Optimization
Operational research relies heavily on computer science and leverages the
mathematical theory of methods for optimization and their implementation
using techniques of formal programming and high performance computing.
Optimization can be performed using linear, quadratic, nonlinear, convex,
nonconvex, global, stochastic, parallel and distributed programming.
The effectiveness of optimization is determined by the techniques applied to solve the problem,
the goals for improvement, and available time and computing power.
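As a small worked example of linear programming, consider maximizing 3x + 2y subject to x + y <= 4, x + 3y <= 6, x <= 3, x >= 0 and y >= 0. The numbers are invented for the illustration. The Python sketch below relies on the fact that a bounded, feasible linear program attains its optimum at a vertex of the feasible region, and simply checks every intersection of two constraint boundaries. This brute-force approach only suits tiny two-variable problems; practical solvers use methods such as simplex or interior point.

```python
from itertools import combinations
from typing import List, Optional, Tuple

Constraint = Tuple[float, float, float]  # (a, b, c) meaning a*x + b*y <= c


def solve_lp_2d(objective: Tuple[float, float], constraints: List[Constraint],
                tolerance: float = 1e-9) -> Tuple[Optional[Tuple[float, float]], float]:
    """Maximize objective[0]*x + objective[1]*y by enumerating vertices.

    Assumes the feasible region is non-empty and bounded.
    """
    best_point, best_value = None, float("-inf")
    for (a1, b1, c1), (a2, b2, c2) in combinations(constraints, 2):
        det = a1 * b2 - a2 * b1
        if abs(det) < tolerance:
            continue  # parallel boundaries do not meet in a single point
        # Cramer's rule for the intersection of the two boundary lines.
        x = (c1 * b2 - c2 * b1) / det
        y = (a1 * c2 - a2 * c1) / det
        if all(a * x + b * y <= c + tolerance for a, b, c in constraints):
            value = objective[0] * x + objective[1] * y
            if value > best_value:
                best_point, best_value = (x, y), value
    return best_point, best_value


constraints = [
    (1.0, 1.0, 4.0),   # x + y <= 4
    (1.0, 3.0, 6.0),   # x + 3y <= 6
    (1.0, 0.0, 3.0),   # x <= 3
    (-1.0, 0.0, 0.0),  # x >= 0
    (0.0, -1.0, 0.0),  # y >= 0
]
best_point, best_value = solve_lp_2d((3.0, 2.0), constraints)
```

For this problem the optimum is at x = 3, y = 1, where the objective equals 11.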
Operational Research with Risk
Risk analysis deals with the assessment and management of risk from unlikely but costly or distressing events.
Risk management is becoming an increasingly important subject and operational research provides
methods and tools for producing and maintaining formal risk management strategies.