
Optimal organization of such a tree is query-dependent, and therefore may not be practical for the large-scale IoT network. In [ 52 ], an architecture based on distributed in-network query processing is discussed. Users input SQL-like queries at the server end describing the data needed and how it should be fused. Queries view data produced by a sensor as a virtual table with one column per sensor type. Tables are continuous streams of values, with periodic time-bounded buffering used for querying purposes. Sensors push data to designated view nodes where queries can pull the data or data is pushed further to higher levels in the network.

Power consumption is controlled via defining a lifetime attribute for the query, thus giving the user control over the data sampling rate. Intermediate nodes are used to execute partial aggregation and packet merging to lower the communication overhead incurred by the relay of query results.
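A minimal sketch of how a query's lifetime attribute can be translated into a sampling interval; the function name and the linear energy-cost model are illustrative assumptions, not the actual mechanism in [52]:

```python
# Sketch: derive a sensor sampling interval from a query's lifetime attribute.
# The cost model (fixed energy per sample) is an illustrative assumption.

def sample_interval(lifetime_s: float, energy_budget_j: float,
                    cost_per_sample_j: float) -> float:
    """Return the shortest interval (highest sampling rate) that still lets
    the node survive the query's declared lifetime on its energy budget."""
    max_samples = energy_budget_j / cost_per_sample_j
    return lifetime_s / max_samples  # seconds between samples

# A node with 100 J to spend and 0.02 J per sample, serving a 24 h query:
interval = sample_interval(lifetime_s=24 * 3600, energy_budget_j=100.0,
                           cost_per_sample_j=0.02)
print(round(interval, 2))  # 17.28 s between samples
```

A longer lifetime or a smaller energy budget stretches the interval, which is exactly the user-visible control over sampling rate described above.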

Sensors can determine the time, frequency, and order in which to acquire the samples needed to answer queries so as to minimize energy costs. Both proposals introduce special storage mechanisms and query processing customized for the distinct nature of smartcards and embedded systems, respectively. However, these solutions rely on the availability of energy supplies from hosting devices. Based on the discussions above, we can summarize the existing data management solutions for IoT or IoT subspaces with respect to whether and how they fulfill the aforementioned design primitives, as shown in Table 1.

Most current data management proposals target WSNs, which are only a subset of the global IoT space, and therefore do not explicitly address the more sophisticated architectural characteristics of IoT. WSNs are a mature networking paradigm whose data management solutions revolve mainly around in-network data processing and optimization. Sensors are mostly stationary and resource-constrained, which does not facilitate sophisticated analysis and services.

The main focus in WSN-based data management solutions is to harvest real-time data promptly for quick decision making, with limited permanent storage capacities for long-term usage. This represents only a subset of the more versatile IoT system, which aims at harnessing the data available from a variety of sources; stationary and mobile, smart and embedded, resource-constrained and resource-rich, real-time and archival.

The main focus of IoT-based data management therefore extends the provisions made for WSNs with a seamless way to tap into the volumes of heterogeneous data in order to find interesting global patterns and strategic opportunities. Some of the current proposals provide abstractions to support the integration of data from heterogeneous networks, thus paving the way for the adaptation and seamless integration of other IoT subsystems.

Solid and comprehensive data management solutions that support interoperability between diverse subsystems and integrate the overall lifecycle of data management with the presence of mobile objects and context-awareness requirements are yet to be developed. We propose a framework for IoT data management that is more compatible with the IoT data lifecycle and addresses the design primitives discussed earlier. The proposed IoT data management framework consists of six stacked layers, two of which include sub-layers and complementary or twin layers.

The Communication Layer provides support for the transmission of requests, queries, and data, and for the collection and delivery of results. The Data Layer also handles data and query processing for local, autonomous data repository sites (filtering, preprocessing, processing).

The Query Layer handles the details of query processing and optimization in cooperation with the Federation Layer as well as the complementary Transactions Layer (processing, delivery).

[Figure: Outline of the proposed IoT data management framework and mapping of its layers to the IoT data lifecycle.]

The layers of the proposed IoT data management framework and their respective functional modules are illustrated in Figure 4, and discussed in the following subsections. We start the framework description with the data layer, since it is the core element in data management.


Understanding where and how data is stored is essential to subsequent updates, queries, and access to the data. There are two main issues to be addressed in IoT data management with regard to the data itself: the placement of storage facilities (the where), and the format to be used for data storage (the how).

In our framework, we opt for a hybrid approach to data storage, with temporal, real-time data stored near the objects generating it, and persistent, long-term data that is to be used for analysis catalogued and stored at dedicated facilities. This approach is expected to yield a beneficial trade-off between the costs of storage space and data transmission on one hand, and the availability of data for sophisticated analysis and queries on the other. The data layer is concerned with the storage of persistent data, while the things layer is concerned with the storage of transient data, among other things to be discussed in later sections.
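The hybrid placement rule can be sketched as a small router that keeps transient readings in near-object buffers and sends analysis-bound data to a dedicated repository; all class and store names here are illustrative assumptions, not an interface from the paper:

```python
# Sketch of hybrid storage placement: transient, real-time data stays near
# its producing object; persistent, analysis-bound data goes to a dedicated
# long-term facility.

class HybridStore:
    def __init__(self):
        self.edge = {}        # near-object buffers, keyed by object id
        self.repository = []  # dedicated long-term storage facility

    def put(self, object_id: str, record: dict, persistent: bool):
        if persistent:
            self.repository.append({"source": object_id, **record})
        else:
            self.edge.setdefault(object_id, []).append(record)

store = HybridStore()
store.put("sensor-7", {"temp": 21.4}, persistent=False)      # real-time reading
store.put("sensor-7", {"daily_avg": 20.9}, persistent=True)  # archival value
print(len(store.edge["sensor-7"]), len(store.repository))    # 1 1
```

The `persistent` flag stands in for the design decision discussed next: which data goes where is driven by the nature of the data and the applications that consume it.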

Which data should go where is a design element that is subject to the nature of the data and the requirements of the applications that will utilize it the most. Due to the global nature of IoT, data is generated at different locations that may be far apart geographically, requiring a federated approach to the storage and access of real-time data.

Federated data management involves the autonomous control of local sites on their data, with optional participation in the data federation for query processing purposes and only when resources allow for such participation. A similar approach can be followed with IoT data management, with data generated by the different sub-systems or groups of related objects placed at locations designated by the owners of such sub-systems.

The different sub-systems can then participate in a database federation if they find their data relevant to a query, and only when their energy-constrained resources allow. The data layer therefore can be viewed as an abstract representation of the union of the data residing on the different IoT sites, with modules to locally handle this data, and with catalogs to identify the specifications of this data for the purposes of later integration into the federation at the federation layer.

The data layer modules have functionalities that are similar to their counterparts in upper layers. The Local Data Integrator Module performs simple integration processes on data generated within an autonomous IoT system, where possibly structured and unstructured data must be unified for query and analysis results.

The Data Layer catalogs hold the information necessary to access data and define sources. Semantic metadata can be used to describe how data can be accessed and linked in order to facilitate efficient queries, personalization of results, and heterogeneous data integration [ 54 ]. Furthermore, data can be associated with geo-location and time tags to facilitate semantic relevance as well as location and time context establishment; certain data is only relevant in specific locations and during specific time intervals.

Defining data sources in the catalog can take a hierarchical form that resembles the semantic structure of the sources, with inter-links to represent existing dependencies. In addition, sources may be associated with confidence degrees in the accuracy level of the data they generate, which can be used for query optimization. Catalog information will be the primary source that is used to provide schema mapping for interoperability between heterogeneous data stores.
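A catalog entry carrying the information just described (geo-location, time validity, and a confidence degree usable by the query optimizer) might look like the following sketch; the field names and record shape are assumptions, since the paper prescribes the information, not a schema:

```python
# Sketch of a Data Layer catalog entry with geo/time tags and a confidence
# degree. Field names are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class CatalogEntry:
    source_id: str
    modality: str          # e.g. "strain" or "temperature"
    location: tuple        # (lat, lon)
    valid_from: int        # epoch seconds: data relevant only in this window
    valid_to: int
    confidence: float      # accuracy degree, usable for query optimization

    def relevant(self, t: int) -> bool:
        """Time-context check: is this source's data relevant at time t?"""
        return self.valid_from <= t <= self.valid_to

entry = CatalogEntry("bridge-12/strain-3", "strain", (25.28, 55.30),
                     valid_from=1_700_000_000, valid_to=1_700_086_400,
                     confidence=0.92)
print(entry.relevant(1_700_050_000))  # True
```

An optimizer could, for instance, prefer sources with higher `confidence` when several entries can answer the same query.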

Domain-specific descriptions should be provided as an integral part of metadata, also for interoperability purposes. The things layer encompasses the entities that generate data for the IoT system as well as modules that perform in-network, real-time processes whose results are to be transmitted further up the system. Entities can be sensors, RFID tags, mobile devices, laptops, embedded chips, vehicles, ships, and any apparatus that has embedded intelligent devices with communication capabilities.

Entities can also be viewed as collections, or what can be called virtual entities [ 5 ]: a body area network linked to a patient, a vehicle with a multitude of intelligent devices, or an environmental sensor network. Each of these objects, whether operating autonomously or as part of an interlinked network system, is considered a data source for the IoT network. In order to have access to data at the things layer, there needs to be a mechanism to uniquely identify the data sources. Due to the global nature of IoT, location-based identification is pivotal to the efficient retrieval and querying of data generated by the geographically distributed things.

Furthermore, modal identification should be incorporated as well to identify entities of the same type that provide the same type of data. This is similar to data-centric naming mechanisms that are followed in some sensor networks. Platform-dependent identification can be cross-referenced with location-based identification, and is used to identify entities belonging to a specific platform or entity and serving a generic purpose pertaining to that entity or platform, as is the case with body area networks linked to a certain patient.

Therefore, this unique identification is tightly attached to the IoT objects and can be promptly accessed at the things layer, either by in-network processes or by crawling agents at upper layers in the framework that need to identify certain objects that may satisfy certain requests. Two modules are of importance at the things layer and are similar in functionality to other modules in upper layers, although at a less sophisticated complexity level: The local aggregation modules and the in-network query optimizers.
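A composite identifier that cross-references platform, location, and modality could be sketched as follows; the URN-like format and helper names are illustrative assumptions, not a scheme defined in the paper:

```python
# Sketch of a composite identifier combining platform, location, and modal
# identification, so crawlers or in-network processes can match objects to
# requests. The "iot:" format is an illustrative assumption.

def make_id(platform: str, location: str, modality: str, serial: str) -> str:
    return f"iot:{platform}:{location}:{modality}:{serial}"

def matches(identifier: str, *, location=None, modality=None) -> bool:
    """Check an identifier against optional location/modality constraints."""
    _, _platform, loc, mod, _serial = identifier.split(":")
    return (location is None or loc == location) and \
           (modality is None or mod == modality)

# A heart-rate sensor in a patient's body area network:
sensor = make_id("ban-patient42", "ward-3", "heart-rate", "0017")
print(matches(sensor, modality="heart-rate"))  # True
print(matches(sensor, location="ward-5"))      # False
```

Leaving a constraint as `None` gives the modal lookup ("all heart-rate sensors") or the platform-wide lookup described above.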

Local Aggregation Modules : To optimize transmission and storage costs, the aggregation modules at the Things layer are deployed to provide summaries and aggregates of data whose detailed values are of limited importance and can be discarded. Since the aggregation function aims at minimizing communication costs, the aggregation points are deployed closer to the data sources. This module will be activated for real-time queries that need to be executed at the network level and for which only end results are needed in upper layers.
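The partial-aggregation idea behind these modules can be sketched as leaf nodes computing partial states and intermediate nodes merging them, so only small summaries travel upstream; the three-function interface is an illustrative assumption:

```python
# Sketch of in-network aggregation: leaves compute (sum, count) partials,
# intermediate nodes merge them, and only the sink finalizes the average.

def partial(readings):
    """Partial aggregate computed at a node near the data sources."""
    return (sum(readings), len(readings))

def merge(*partials):
    """Merge step run at an intermediate node before relaying upstream."""
    total = sum(p[0] for p in partials)
    count = sum(p[1] for p in partials)
    return (total, count)

def finalize(p):
    """Final result, computed once at the sink; details are discarded."""
    return p[0] / p[1]

leaf_a = partial([20.0, 22.0])   # two readings aggregated near source A
leaf_b = partial([24.0])         # one reading near source B
print(finalize(merge(leaf_a, leaf_b)))  # 22.0
```

Only two numbers per subtree cross each hop, which is the communication saving the module exists for.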

The data that will supply such queries with results may still be reported periodically to permanent storage at the upper data layer, but are processed in real-time to provide prompt reporting for delay-sensitive queries. The communication layer connects the distributed data sources to more concentrated data storage and processing units.


Inter-object as well as object-to-infrastructure communication technologies are to be used, with interoperation guarantees provided at upper layers. Communication also takes place close to the federation layer, as multiple geographically dispersed data repositories can be engaged for sophisticated query and analysis purposes. In distributed database systems, database fragments are stored in predefined and finite locations. The system design dictates that metadata store the locations of database fragments beforehand for querying and update purposes.

The situation differs drastically for IoT data, where the sources that will generate and deliver data are (1) distributed over diverse locations that vary from dedicated objects to implants on objects; (2) not finite, with new sources becoming available continuously; and (3) autonomous, with no unification of schema or definition of metadata. These characteristics make it challenging to execute real-time queries placed to the things layer if there is no way to identify which sources or sub-systems can respond to the requests.

There needs to be a layer on top of the things layer that handles the seamless and transparent identification and unification of data sources for query processing purposes. This layer mainly targets requests on the downlink, and is not concerned with the periodic reporting of generated data to upper layers for storage and archiving purposes. Sources Orchestrator : The sources orchestrator is concerned with translating a query execution plan into the set of sources that should be involved in executing that plan.

It works closely with the query optimizer to find the sources that match the data requested by the query and to tweak the execution plan so that it becomes location-aware. In addition, it builds the plan of cooperation and the hierarchy of results delivery that is to be followed by the data sources. Sources Crawlers : The Internet of Things has a flexible architecture that accommodates new sub-systems as they are installed and activated. If seamless and scalable merging of such systems into the overall architecture is to be supported, their existence should be made visible to the system as a whole.

This can be done proactively, by having these systems announce their existence to the IoT infrastructure, or reactively, by running discovery routines whenever new queries are placed to the things layer. Proactive appending of new sources is discussed in the next module. Reactive discovery of new sources is done by having the crawler either perform periodic scans or scan only upon reception of data query requests. New Sources Notifier : As new sources are found by the sources crawler, there needs to be a notification mechanism through which real-time, continuously updated queries become aware of the existence of such sources.
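The two discovery modes can be contrasted in a small sketch: a source either announces itself (proactive) or is found by a crawl triggered on query arrival (reactive). The registry class and method names are hypothetical:

```python
# Sketch of proactive announcement vs. reactive crawling for new sources.
# Class and method names are illustrative assumptions.

class SourceRegistry:
    def __init__(self):
        self.known = set()

    def announce(self, source_id: str):
        """Proactive: a new sub-system registers itself."""
        self.known.add(source_id)

    def crawl(self, reachable_sources):
        """Reactive: scan on query arrival, return newly discovered sources."""
        discovered = set(reachable_sources) - self.known
        self.known |= discovered
        return discovered

reg = SourceRegistry()
reg.announce("clinic-9/ban-net")                      # proactive announcement
new = reg.crawl(["clinic-9/ban-net", "bus-fleet-2"])  # reactive scan
print(sorted(new))  # ['bus-fleet-2']
```

The crawl returns only the delta, which is what the notifier in the next module would forward to continuous queries and to the objects catalog.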

In addition, the data specifications of these new sources need to be reported to the data layer in order to be merged with the objects catalog and metadata store. The new sources notifier can also be used to proactively alert the IoT infrastructure about the specifications of new sub-systems or devices so their metadata can be included in the system for future reference and access. New sub-systems or devices make themselves known to the infrastructure by directly reporting their specifications to the new sources notifier, which in turn verifies the authenticity of these reports and passes the specifications to the objects catalog and metadata store that is either closest geographically or most related semantically.

The federation layer lies at the center of the framework, and provides the glue that joins dispersed IoT subsystems and data sources together to form a globalized view of the IoT system. It provides interoperability features for the diverse data types and repositories that are to be joined together to answer a specific query. The sources catalog is a finer-grained version of the repositories catalog, in which descriptions of the individual objects that provide data are defined. The moderator accordingly adds the repositories' descriptive information to the catalog.

Repositories can nevertheless be inactive intermittently if their local administrative needs are set to have higher priority and cannot accommodate participation at global views. The moderator can then negotiate participation in a specific query or analysis task based on the repositories' local workloads, processing priorities, and availability of the needed data. Repositories can announce their desire to permanently deactivate their participation in the IoT data federation so as not to be contacted for future query negotiations by the moderator.
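The moderator's negotiation step can be sketched as a simple participation test over repository state; the load threshold and record fields are illustrative assumptions, not values from the paper:

```python
# Sketch of the moderator's negotiation: a repository joins a federated
# query only if it is active, holds relevant data, and has load headroom.
# The 0.8 threshold and field names are illustrative assumptions.

def can_participate(repo: dict, needed_modality: str,
                    max_load: float = 0.8) -> bool:
    return (repo["active"]
            and needed_modality in repo["modalities"]
            and repo["load"] < max_load)

repos = [
    {"id": "hospital-a", "active": True,  "modalities": {"ecg"}, "load": 0.4},
    {"id": "hospital-b", "active": True,  "modalities": {"ecg"}, "load": 0.95},
    {"id": "hospital-c", "active": False, "modalities": {"ecg"}, "load": 0.1},
]
participants = [r["id"] for r in repos if can_participate(r, "ecg")]
print(participants)  # ['hospital-a']
```

A repository that permanently deactivates its participation would simply be dropped from `repos` so the moderator never contacts it again.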

The same piece of data can be available under different names at different repositories, and similar entities or classes may be represented differently in the varying databases' schemas. Therefore, schema matching may depend on comparing attribute values or defining ontologies for data attributes in order to mask semantic heterogeneity and provide a seamless schematic view of data.

Because schema matching must occur in a dynamic data repositories environment such as the one characteristic of IoT, it is not practical to define a unified global schema in advance. Schema-level matching based on schema information can be predefined for core attributes and object classes to support data-level matching.
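Predefined matching for core attributes can be sketched as a small synonym ontology that maps local attribute names to canonical ones; the mapping and normalization rule are illustrative assumptions:

```python
# Sketch of schema matching via a synonym ontology for core attributes.
# The mapping itself is an illustrative assumption.

ONTOLOGY = {  # canonical attribute -> names seen in local schemas
    "patient_id": {"patient_id", "pid", "patientnumber"},
    "body_temp":  {"body_temp", "temperature", "temp_c"},
}

def canonical(attr: str):
    """Map a local attribute name to its canonical form, masking naming
    heterogeneity across repositories; None means data-level matching or
    a manual mapping is still needed."""
    name = attr.lower().replace("_", "").replace(" ", "")
    for canon, variants in ONTOLOGY.items():
        if name in {v.replace("_", "") for v in variants}:
            return canon
    return None

print(canonical("Temp_C"))         # body_temp
print(canonical("PatientNumber"))  # patient_id
```

Unmatched names fall through to `None`, which is where comparing attribute values (data-level matching) would take over.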

The context profiler should determine, for each such query, the location of the data sources that have the most potential of holding results for that query. Although the main contextual support provided is geographic, other context specifications such as time frames or modalities can be supported as well. An example query with multiple context specifications is one that may reach the data sources layer asking for the identities of patients (modality) within a certain city (location) who have shown certain symptoms (query constraints) that are indicative of a quickly spreading (time frame) epidemic.
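The epidemic example can be sketched as the profiler filtering candidate sources by the query's location, modality, and time-frame context; the field names and data are illustrative assumptions:

```python
# Sketch of the context profiler: keep only the sources likely to hold
# results given the query's location, modality, and time-frame context.
# Field names and sample data are illustrative assumptions.

def profile_sources(sources, *, city, modality, since):
    return [s["id"] for s in sources
            if s["city"] == city
            and s["modality"] == modality
            and s["last_update"] >= since]

sources = [
    {"id": "clinic-1", "city": "Dubai", "modality": "symptoms", "last_update": 180},
    {"id": "clinic-2", "city": "Dubai", "modality": "symptoms", "last_update": 20},
    {"id": "lab-3",    "city": "Cairo", "modality": "symptoms", "last_update": 200},
]
# Symptom reports within one city during the recent time frame:
print(profile_sources(sources, city="Dubai", modality="symptoms", since=100))
# ['clinic-1']
```

Each filter clause corresponds to one of the context dimensions in the example query: location, modality, and time frame.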

Data Integrator: The data integrator closely collaborates with the query layer to facilitate access to heterogeneous data by processing the information included in the sources catalog. The data integrator will assist in the execution of semantic queries that do not state the repositories from which to fetch data, but rather provide location or time constraints for data sources. The data integrator will also work to provide a seamless results format for queries that join data from well-defined repositories with real-time data from dynamic data sources.

The query layer encapsulates the elements necessary for generating, optimizing, and executing queries on the IoT database. It is deployed both at the federated and local levels; the local level being that governing the subsystems deployed by individual organizations or agencies. This way, a global view of IoT data can be obtained while localized data views can still be generated by the underlying systems forming the IoT infrastructure.

Query Plan Generator : The query plan generator takes as input the specifications of the desired output, either from the application or directly from the user, and transforms them into a query written in a standard query language format. Possibly multiple query plans are then generated, either as command trees or as sets of steps; query representations that conform to the data entities defined in the database schema and detail the instructions of how to fetch the needed data. Semantic and context queries that dictate only the desired location(s), time interval, or business-specific semantic requirements need also to be transformed into executable plans.

This will be done in collaboration with the data adapter.


The plans for a query are then passed to the query optimizer to choose the best plan to execute. Each plan is assigned a cost that is estimated based on evaluation criteria that are either predefined in a data dictionary or fed at runtime by the user submitting the query. Dynamic real-time updates to query results need to account for the storage space allotted to the results.
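Cost-based plan choice can be sketched as scoring each candidate plan with criteria weights (predefined in a data dictionary or supplied at runtime) and keeping the cheapest; the weights and plan fields are illustrative assumptions:

```python
# Sketch of cost-based plan selection. Criteria weights stand in for the
# evaluation criteria from a data dictionary or runtime user input; the
# plan fields and numbers are illustrative assumptions.

def plan_cost(plan: dict, weights: dict) -> float:
    return sum(weights[k] * plan[k] for k in weights)

plans = [
    {"name": "push-down-agg", "io": 10, "network": 2, "energy": 1},
    {"name": "central-join",  "io": 4,  "network": 9, "energy": 3},
]
weights = {"io": 1.0, "network": 2.0, "energy": 5.0}  # user-tunable at runtime
best = min(plans, key=lambda p: plan_cost(p, weights))
print(best["name"])  # push-down-agg
```

Changing the weights at runtime (say, penalizing network transfer more heavily on a constrained link) can flip which plan the optimizer selects, which is the point of letting the user feed the criteria.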

This can be done either by summarizing results into synopses, by gracefully aging results according to some relevance requirements, or by providing approximations to query results with guarantees on the approximation accuracy in representing the actual data. A temporary storage buffer for results can be used to hold the step-by-step intermediate results of the query as it gets executed. IoT data repositories are globally distributed over geographic areas and hold massive volumes of data that is constantly generated and may change as time progresses.
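Graceful aging of a bounded result store can be sketched with a buffer that evicts the least relevant entries once its allotted space is exceeded; equating relevance with recency is an illustrative assumption:

```python
# Sketch of gracefully aging a bounded result buffer: once the space
# allotted to results is exceeded, the oldest (assumed least relevant)
# entries are evicted first.
from collections import OrderedDict

class AgingResultBuffer:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items = OrderedDict()   # insertion order doubles as age

    def add(self, key, value):
        self.items[key] = value
        while len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the oldest result

buf = AgingResultBuffer(capacity=2)
for t, v in [("t1", 5), ("t2", 7), ("t3", 9)]:
    buf.add(t, v)
print(list(buf.items))  # ['t2', 't3']
```

A synopsis or approximation strategy would replace the eviction line with a summarization step instead of a drop.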

Aggregation Engine: The aggregation engine is engaged according to query needs, and works closer to the repositories in order to perform summary operations on data based on any number of criteria such as time, location, or modal context. Fusion Engine : In addition to aggregation, which provides various forms of summaries for data, fusion functions should be provided at this layer as well to support more sophisticated data merging capabilities.


The placement of fusion engines is similar to that of aggregation engines, although the fusion techniques should be as simple and efficient as possible, and used mostly for data whose delivery can be delay-tolerant. The management layer is concerned with the mechanisms needed to provide access and security to the various data stores in the data layer of the framework.

Transaction Manager : The transaction manager handles the execution of transactions that are more related to business processes and services. Depending on the type of transaction submitted to the manager, it can deploy either a classical single-source execution mechanism, or deploy global or distributed execution strategies. The strict ACID properties that are required for successful transactions in order to guarantee data consistency may be relaxed in favour of the more trending eventual consistency guarantees [ 56 ].

Recovery Manager : The recovery manager is concerned with restoring the data repositories to the most recent consistent state after a failure occurs due to power outages, crashes, corrupted files, etc. This is usually done by rolling back all the transactions or operations that were taking place within the data management system and have not yet been committed successfully.
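The rollback step can be sketched with an undo log: after a crash, writes belonging to transactions that never committed are undone in reverse order. The log format is an illustrative assumption:

```python
# Sketch of recovery by rollback from an undo log: writes of uncommitted
# transactions are reverted in reverse order. Log format is an assumption.

def recover(store: dict, log: list) -> dict:
    committed = {e["txn"] for e in log if e["op"] == "commit"}
    for entry in reversed(log):
        if entry["op"] == "write" and entry["txn"] not in committed:
            store[entry["key"]] = entry["old"]  # undo the uncommitted write
    return store

store = {"x": 1, "y": 2}
log = [
    {"op": "write",  "txn": "T1", "key": "x", "old": 1, "new": 10},
    {"op": "commit", "txn": "T1"},
    {"op": "write",  "txn": "T2", "key": "y", "old": 2, "new": 20},
    # crash occurs before T2 commits
]
store.update({"x": 10, "y": 20})   # state on disk at crash time
print(recover(store, log))  # {'x': 10, 'y': 2}
```

T1's write survives because it committed; T2's write is rolled back, restoring the most recent consistent state.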

Archiving is one way to recover lost or damaged data in primary storage space, but replicas of data repositories that are updated concurrently with the primary repositories can be used for sensitive systems with strong data availability requirements. Replication can be demanding in terms of storage and may degrade performance if a concurrent updates strategy is enforced. Security Manager: The security manager should include data protection and privacy measures in accordance with legal frameworks that are of relevance to both the protected data and the final users.

Varying degrees of data privacy requirements can be defined, depending on the nature of data and the entity that generates the data [ 59 ]. Therefore, designing robust security and privacy measures that are tightly integrated with data management solutions is essential to the successful deployment of IoT. This information can be of use for the proper functioning of the system, future improvements, and novel or unconventional business opportunities.

In addition, different applications may need data access and processing at the lower Things layer or the higher data layer, depending on the real-time requirements and level of complexity needed for analysis. Pattern recognition and data mining techniques can be used for the multitude of IoT applications.

However, they need to take into consideration the three factors that distinguish IoT data: the increasingly large data volume, the highly unstructured and heterogeneous nature of the data itself, and the geographically distributed storage facilities. The IoT-A Architectural Reference Model (ARM) defines a reference model that is composed of a number of sub-models. The domain model defines the basic attributes and responsibilities of IoT devices, resources, services, and virtual entities (abstractions of physical sensors into physical devices and units: smartphones, vehicles, patients, etc.). The information model defines—on a conceptual level—the attributes and relations of the data that is handled in an IoT system, which includes modeling the information flow and storage and how they are related.

The functional model breaks up the IoT architecture into manageable parts, defines their interrelationships, and produces functionalities that can be used to build IoT systems. The communication model defines the communication paradigms that connect entities within the IoT. The trust, security, and privacy model defines concepts related to system dependability and compliance to expected behavior, security of communications, and protection of private data and control of information disclosure within the IoT context.

Of special importance to our work are the information and functional models. Our definitions of data types when discussing the IoT data lifecycle are compliant with the IoT-A definitions of data as either real-time; summarized or aggregated; inferred by in-depth processing; or reconciled (preprocessed and enhanced for sophisticated data analysis). The IoT-A functional model breaks IoT functionalities into functionality groups, analogous to the layered approach we have followed in our framework.

The device functionality group maps to our things layer, including the optional storage capacities on devices that we defined there. The communication functionality group maps to our communication layer. The IoT service functionality group maps to the data and sources layers in our proposed framework, although our framework details many modules that are abstracted in the IoT-A functional model.

The virtual entity functionality group can be mapped to the federation layer in our framework. However, since our framework is data-centric, the virtual entities in the federation can be of a scale bigger than that which is meant by the virtual entities in the IoT-A.


The federation layer also extends to the service organization functionality group, which additionally handles service requests from applications and businesses, and therefore can be mapped to the query layer. Two functionality groups in the IoT-A—management and security—are only partially addressed in our framework, and further elaboration is intended as future work. When it comes to data handling, the IoT-A information view that stems from the information model provides a more detailed outlook as to how data is stored.

Unlike our use of catalogs and discovery modules to manage the discovery and indexing of data sources, the IoT-A uses service descriptions that are either provided by the IoT services themselves or by special management components in order to make the services visible and discoverable within the IoT system. A service resolution component is responsible for service discovery, which is similar to what our proposed framework provides in the sources layer.

The patient's smartphone can be such a concentration point, and it can also interact with and control the patient's surrounding environment. Vital readings that are collected periodically by the smartphone are reported wirelessly to the respective caregiver's network (communication) and stored in the patient's respective health records in data repositories (data layer repositories). Emergency or abnormal events concerning the patient are also reported back to the caregiver's data repositories, and the caregiver is alerted for proper assessment and response (notification exchange pattern).

They can also analyze the collective patients' health profiles in order to find or infer interesting patterns related to possibly spreading health conditions, such as post-op infection incidents related to operations performed at a given hospital. The scale of analysis can be further widened to include health records that are accessed via the Internet on a city-wide, state-wide, country-wide, or even world-wide scale (sources discovery, then data federation).

This can serve to identify the existence of epidemics or seasonal symptoms, as well as visualize their spread rate, patients' severity levels, and locale. This can help in the prompt containment of life-threatening conditions. Longer-term uses of analysis findings can drive future health care policies and deployment strategies. This example highlights a data-intensive environment in which the availability of data for large-scale analysis is pivotal in discovering valuable knowledge that can shape strategic action plans.

The system is flexible in the sense that new sources of data can be incorporated as they become available. It is not mandatory that a certain level of data availability be the exact aggregate of all sub-levels; analysis of data about a certain municipality, for example, does not require all data from hospitals within that municipality. Overall, the framework's separation of data and processing, and its support for various levels of availability, allow various processing needs to be addressed by different entities with varying degrees of flexibility.

In this paper, we discussed some of the data management solutions proposed for the Internet of Things, with a focus on the required design elements that should be addressed in order to provide a comprehensive solution. The design primitives we propose cover the three main functions of handling data; how it is collected, how it is stored, and how it is processed. The current solutions are only partial in the sense that they address data management requirements of IoT subsystems such as WSNs, and include partial subsets of the desired design primitives.

To compensate for this shortage, we outlined the components of a comprehensive IoT data management framework with core data and sources layers and support for a federated architecture. The framework highlights the need for a two-way, cross-layered design approach that can address both real-time and archival query, analysis, and service needs. Future work involves mapping the details of the proposed framework more closely to the reference model in the IoT-A, in-depth investigation and development of a data management solution that builds upon the proposed framework, and adding data security and privacy considerations into the framework design, in compliance with the requirements of the dynamic and heterogeneous IoT environment.

Another dimension that the authors want to explore is the integration of heterogeneous data sources and systems within the IoT, where heterogeneity extends from the classical notion of different data types and formats to different data sources, time and geo tags, and globally distributed locations. The statements made herein are solely the responsibility of the authors.

Journal: Sensors (Basel). Authors: Mohammad Hayajneh, Najah Abu Ali.

Abstract: The Internet of Things (IoT) is a networking paradigm where interconnected, smart objects continuously generate data and transmit it over the Internet.

Keywords: Internet of Things, data management, sensor networks.

IoT Data Management: Traditional data management systems handle the storage, retrieval, and update of elementary data items, records, and files.

IoT Data Lifecycle: The lifecycle of data within an IoT system—illustrated in Figure 1—proceeds from data production to aggregation, transfer, optional filtering and preprocessing, and finally to storage and archiving.
