UMBC Center for Accelerated Real Time Analysis


Real-time analytics is the leading edge of a smart data revolution, driven by Internet advances in sensor hardware on one side and AI/ML streaming acceleration on the other. The Center for Accelerated Real Time Analytics (CARTA) explores the realm of streaming applications of Magna Analytics. The center works with next-generation hardware technologies, such as the IBM Minsky with onboard GPU-accelerated processors and Flash RAM, and Smart Cyber Physical Sensor Systems, to build cognitive analytics systems and active storage devices for real-time analytics. This will lead to the automated ingestion and simultaneous analytics of big datasets generated in various domains, including cyberspace, healthcare, the Internet of Things (IoT), and the scientific arena, and to the creation of self-learning, self-correcting “smart” systems.


Recent Submissions

  • Item
    Developing Ethics and Equity Principles, Terms, and Engagement Tools to Advance Health Equity and Researcher Diversity in AI and Machine Learning: Modified Delphi Approach
    (JMIR, 2023-12-06) Hendricks-Sturrup, Rachele; Simmons, Malaika; Anders, Shilo; Aneni, Kammarauche; Joshi, Karuna; et al.
    Background: Artificial intelligence (AI) and machine learning (ML) technology design and development continues to be rapid, despite major limitations in its current form as a practice and discipline to address all sociohumanitarian issues and complexities. From these limitations emerges an imperative to strengthen AI and ML literacy in underserved communities and build a more diverse AI and ML design and development workforce engaged in health research. Objective: AI and ML have the potential to account for and assess a variety of factors that contribute to health and disease and to improve prevention, diagnosis, and therapy. Here, we describe recent activities within the Artificial Intelligence/Machine Learning Consortium to Advance Health Equity and Researcher Diversity (AIM-AHEAD) Ethics and Equity Workgroup (EEWG) that led to the development of deliverables that will help put ethics and fairness at the forefront of AI and ML applications to build equity in biomedical research, education, and health care. Methods: The AIM-AHEAD EEWG was created in 2021 with 3 cochairs and 51 members in year 1 and 2 cochairs and ~40 members in year 2. Members in both years included AIM-AHEAD principal investigators, coinvestigators, leadership fellows, and research fellows. The EEWG used a modified Delphi approach with polling, ranking, and other exercises to facilitate discussions around tangible steps, key terms, and definitions needed to ensure that ethics and fairness are at the forefront of AI and ML applications to build equity in biomedical research, education, and health care. Results: The EEWG developed a set of ethics and equity principles, a glossary, and an interview guide. The ethics and equity principles comprise 5 core principles, each with subparts, which articulate best practices for working with stakeholders from historically and presently underrepresented communities.
The glossary contains 12 terms and definitions, with particular emphasis on optimal development, refinement, and implementation of AI and ML in health equity research. To accompany the glossary, the EEWG developed a concept relationship diagram that describes the logical flow of and relationship between the definitional concepts. Lastly, the interview guide provides questions that can be used or adapted to garner stakeholder and community perspectives on the principles and glossary. Conclusions: Ongoing engagement is needed around our principles and glossary to identify and predict potential limitations in their uses in AI and ML research settings, especially for institutions with limited resources. This requires time, careful consideration, and honest discussions around what classifies an engagement incentive as meaningful to support and sustain their full engagement. By slowing down to meet historically and presently underresourced institutions and communities where they are and where they are capable of engaging and competing, there is higher potential to achieve needed diversity, ethics, and equity in AI and ML implementation in health research.
  • Item
    IoT-Reg: A Comprehensive Knowledge Graph for Real-Time IoT Data Privacy Compliance
    (2023-12-15) Echenim, Kelvin; Joshi, Karuna
    The proliferation of the Internet of Things (IoT) has led to an exponential increase in data generation, especially from wearable IoT devices. While this data influx offers unparalleled insights and connectivity, it also brings significant privacy and security challenges. Existing regulatory frameworks like the United States (US) National Institute of Standards and Technology Interagency or Internal Report (NISTIR) 8228, the US Health Insurance Portability and Accountability Act (HIPAA), and the European Union (EU) General Data Protection Regulation (GDPR) aim to address these challenges but often operate in isolation, making their compliance in the vast IoT ecosystem inconsistent. This paper presents the IoT-Reg ontology, a holistic semantic framework that amalgamates these regulations, offering a stratified approach based on the IoT data lifecycle stages and providing a comprehensive yet granular approach to IoT data handling practices. The IoT-Reg ontology aims to transform the IoT domain into a realm where regulatory controls are seamlessly integrated system components by emphasizing risk management, compliance, and the pivotal role of manufacturers’ privacy policies, ensuring consistent adherence, enhancing user trust, and promoting a privacy-centric IoT environment. We include the results of validating this framework against risk mitigation for Wearable IoT devices.
  • Item
    Secure and Privacy-Compliant Data Sharing: An Essential Framework for Healthcare Organizations
    (2024-01-04) Walid, Redwan; Joshi, Karuna; Elluri, Lavanya
    Data integration from multiple sources can improve decision-making and predict epidemiological trends. While there are many benefits to data integration, there are also privacy concerns, especially in healthcare. The Health Insurance Portability and Accountability Act (HIPAA) is one of the essential regulations in healthcare, and it sets strict standards for the privacy and security of patient data. Often, data integration can be complex because different rules apply to different companies. Many existing data integration technologies are domain-specific and theoretical, while others rigorously adhere to unified data integration. Moreover, the integration systems do not have semantic access control, which causes privacy breaches. We propose a framework using a knowledge graph for sharing and integrating data across healthcare providers while protecting data privacy. We use an ontology to provide Attribute-Based Access Control (ABAC) to prevent excess or unwanted access based on the user attributes or central organization rules. The data is shared by removing sensitive attributes and anonymizing the rest using k-anonymity to strike a balance between data utility and secret information. A metadata layer describes the schema mapping to integrate data from multiple sources. Our framework is a promising approach to data integration in healthcare, and it addresses some of the critical challenges of data integration in this domain.
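As a rough illustration of the anonymization step this abstract describes, the sketch below (not the authors' implementation; all field names, attributes, and values are hypothetical) drops directly identifying attributes, generalizes quasi-identifiers, and checks the k-anonymity property:

```python
from collections import Counter

def drop_sensitive(record, sensitive=("name", "ssn")):
    """Remove directly identifying attributes before sharing."""
    return {k: v for k, v in record.items() if k not in sensitive}

def generalize(record):
    """Coarsen quasi-identifiers: bucket age by decade, truncate ZIP code."""
    out = dict(record)
    decade = (record["age"] // 10) * 10
    out["age"] = f"{decade}-{decade + 9}"
    out["zip"] = record["zip"][:3] + "**"
    return out

def is_k_anonymous(records, quasi=("age", "zip"), k=2):
    """Every quasi-identifier combination must occur at least k times."""
    groups = Counter(tuple(r[q] for q in quasi) for r in records)
    return all(count >= k for count in groups.values())

patients = [
    {"name": "A", "ssn": "111", "age": 34, "zip": "21250", "diagnosis": "flu"},
    {"name": "B", "ssn": "222", "age": 37, "zip": "21228", "diagnosis": "cold"},
    {"name": "C", "ssn": "333", "age": 31, "zip": "21201", "diagnosis": "flu"},
]

shared = [generalize(drop_sensitive(p)) for p in patients]
print(is_k_anonymous(shared, k=3))  # all three records generalize to the same group
```

A real deployment would generalize adaptively until the property holds; here the decade/ZIP-prefix scheme is fixed for brevity.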
  • Item
    Venous Access: National Guideline and Registry Development (VANGUARD): Advancing Patient-Centered Venous Access Care Through the Development of a National Coordinated Registry Network
    (JMIR Publications, 2023-11-24) Iorga, Andrea; Velezis, Marti J; Marinac-Dabic, Danica; Lario, Robert F; Huff, Stanley M; Gore, Beth; Mermel, Leonard A; Bailey, L Charles; Skapik, Julia; Willis, Debi; Lee, Robert E; Hurst, Frank P; Gressler, Laura E; Reed, Terrie L; Towbin, Richard; Baskin, Kevin M
    There are over 8 million central venous access devices inserted each year, many in patients with chronic conditions who rely on central access for life-preserving therapies. Central venous access device–related complications can be life-threatening and add tens of billions of dollars to health care costs, while their incidence is most likely grossly mis- or underreported by medical institutions. In this communication, we review the challenges that impair retention, exchange, and analysis of data necessary for a meaningful understanding of critical events and outcomes in this clinical domain. The difficulty is not only with data extraction and harmonization from electronic health records, national surveillance systems, or other health information repositories where data might be stored. The problem is that reliable and appropriate data are not recorded, or falsely recorded, at least in part because policy, payment, penalties, proprietary concerns, and workflow burdens discourage completeness and accuracy. We provide a roadmap for the development of health care information systems and infrastructure that address these challenges, framed within the context of research studies that build a framework of standardized terminology, decision support, data capture, and information exchange necessary for the task. This roadmap is embedded in a broader Coordinated Registry Network Learning Community, and facilitated by the Medical Device Epidemiology Network, a Public-Private Partnership sponsored by the US Food and Drug Administration, with the scope of advancing methods, national and international infrastructure, and partnerships needed for the evaluation of medical devices throughout their total life cycle.
  • Item
    Evaluating Machine Learning and Statistical Models for Greenland Subglacial Bed Topography
    (2023-11-05) Yi, Katherine; Dewar, Angelina; Tabassum, Tartela; Lu, Jason; Chen, Ray; Alam, Homayra; Faruque, Omar; Li, Sikan; Morlighem, Mathieu; Wang, Jianwu
    The purpose of this research is to study how different machine learning and statistical models can be used to predict bedrock topography under the Greenland ice sheet using ice-penetrating radar and satellite imagery data. Accurate bed topography representations are crucial for understanding ice sheet stability and vulnerability to climate change. We explore nine predictive models, including dense neural networks, long short-term memory (LSTM) networks, variational auto-encoders, extreme gradient boosting (XGBoost), Gaussian process regression, and kriging-based residual learning. Model performance is evaluated with mean absolute error (MAE), root mean squared error (RMSE), coefficient of determination (R²), and terrain ruggedness index (TRI). In addition to testing various models, different interpolation methods, including nearest neighbor, bilinear, and kriging, are also applied in preprocessing. The XGBoost model with kriging interpolation exhibits strong predictive capabilities but demands extensive resources. Alternatively, the XGBoost model with bilinear interpolation shows robust predictive capabilities and requires fewer resources. These models effectively capture the complexity of the terrain hidden under the Greenland ice sheet with precision and efficiency, making them valuable tools for representing spatial patterns in diverse landscapes.
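The four evaluation metrics named in this abstract can be sketched in a few lines. This is not the authors' code: it uses made-up 1-D toy elevations, whereas the paper evaluates 2-D gridded bed-topography predictions (a 2-D TRI would compare each cell with its eight neighbors):

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

def tri(elevations):
    """Terrain ruggedness index for a 1-D profile: mean absolute
    elevation change between adjacent cells."""
    return sum(abs(b - a) for a, b in zip(elevations, elevations[1:])) / (len(elevations) - 1)

# Hypothetical bed elevations (metres) along a short flight line.
bed_true = [120.0, 95.0, 140.0, 160.0, 110.0]
bed_pred = [118.0, 100.0, 135.0, 158.0, 115.0]

print(mae(bed_true, bed_pred), rmse(bed_true, bed_pred), r2(bed_true, bed_pred))
```

MAE and RMSE measure pointwise error, R² measures explained variance, and TRI checks whether a model reproduces realistic roughness rather than an over-smoothed bed.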
  • Item
    Ensuring Privacy Policy Compliance of Wearables with IoT Regulations
    (2023-11-01) Echenim, Kelvin; Elluri, Lavanya; Joshi, Karuna
    In an era where wearables, particularly those in non-hospital settings, collect and transmit sensitive personal data, it is imperative to implement stringent privacy safeguards. The National Institute of Standards and Technology (NIST) Internal Report 8228 provides regulations for securing Internet of Things (IoT) devices, data, and the privacy of individuals. We have developed a novel framework for examining the privacy policies governing the data and information utilized by wearable devices to ensure that these IoT devices work in adherence to the NIST controls. Our approach entails constructing an ontology of the pertinent NIST regulations, extracting key regulation terms, establishing clear annotation guidelines, and reasoning over the developed ontology. Our primary contribution is developing a novel method to accurately retrieve the expectations, privacy risk mitigation areas, and the associated regulations using Natural Language Processing and Semantic Web concepts. Ultimately, vendors and users can use our publicly available ontology to semi-automate the privacy compliance process for wearables, ensuring that the data collected and transmitted through the devices are secure, thereby protecting both the devices and the individuals who use them.
  • Item
    An Overview of Cybersecurity Knowledge Graphs Mapped to the MITRE ATT&CK Framework Domains
    (2023-10-03) Bolton, Joshua; Elluri, Lavanya; Joshi, Karuna
    A large volume of cybersecurity-related data sets are generated daily from systems following disparate protocols and standards. It is humanly impossible for cybersecurity experts to manually sieve through these large data sets, with different schema and metadata, to determine potential attacks or issues. A myriad of applications and tool sets are offered to automate the analysis of large cyber data sets. Semantic Web’s community has been studying the field of cybersecurity for over a decade and produced numerous knowledge graphs and frameworks. The Unified Cybersecurity Ontology (UCO) connected many of the leading knowledge representation frameworks, providing a holistic mapping of cyber data, beginning in 2016. MITRE ATT&CK is used by a wide variety of practitioners to understand how their current data and tooling prepare them to defend against both Advanced Persistent Threats (APTs) and less formal threat actors. The UCO and MITRE ATT&CK have provided researchers and practitioners, respectively, with tools to standardize data collection, correlation, and analysis. However, it is not apparent how current knowledge graphs and their applications in the cybersecurity domain utilize ATT&CK. In this paper, we present the results of our study on whether current cybersecurity knowledge graphs have mapped the main MITRE ATT&CK matrices.
  • Item
    CDFMR: A Distributed Statistical Analysis of Stock Market Data using MapReduce with Cumulative Distribution Function
    (IEEE, 2023-08-10) Dahiphale, Devendra; Wadkar, Abhijeet; Joshi, Karuna
    The stock market generates massive data daily on top of a deluge of historical data. Investors and traders look to analysis of the stock market, a prime indicator of our global economy, for assurance in their investments. This has made the topic immensely popular, and consequently, much research has been done on stock market predictions and future trends. However, due to relatively slow electronic trading systems and order-processing times, the velocity and variety of the data, and social factors, there is a need for speed, control, and continuity in data processing (real-time stream processing), given the amount of data produced daily. Unfortunately, processing this massive amount of data on a single node is inefficient, time-consuming, and unsuitable for real-time processing. Recently, there have been many advancements in Big Data processing technologies such as Hadoop, Cloud MapReduce, and HBase. This paper proposes a MapReduce algorithm for statistical stock market analysis with a Cumulative Distribution Function (CDF). We also highlight the challenges we faced during this work and their solutions. We further showcase how our algorithm spans multiple functions, which are run as multiple MapReduce jobs in a cascaded fashion.
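The map/reduce-plus-cumulative-sum idea behind a distributed CDF can be sketched as a single-process toy (this is an assumption-laden illustration, not the paper's algorithm: bucket size, field names, and prices are invented, and the real system runs these phases as cascaded MapReduce jobs over distributed data):

```python
from collections import defaultdict
from itertools import accumulate

def map_phase(prices, bucket_size=10):
    """Emit (bucket, 1) pairs, as a mapper would."""
    return [((int(p) // bucket_size) * bucket_size, 1) for p in prices]

def reduce_phase(pairs):
    """Sum counts per bucket, as a reducer would."""
    counts = defaultdict(int)
    for bucket, one in pairs:
        counts[bucket] += one
    return dict(counts)

def cdf(counts):
    """Running sum over sorted buckets, normalized to [0, 1]."""
    buckets = sorted(counts)
    total = sum(counts.values())
    running = accumulate(counts[b] for b in buckets)
    return {b: c / total for b, c in zip(buckets, running)}

# Hypothetical daily closing prices.
closing_prices = [12.5, 17.0, 23.4, 25.1, 38.9, 41.2, 44.7, 59.0]
result = cdf(reduce_phase(map_phase(closing_prices)))
print(result)
```

Because the reduce output is small (one count per bucket), the final cumulative pass is cheap even when the map/reduce phases run over very large distributed inputs.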
  • Item
    Policy Integrated Blockchain to Automate HIPAA Part 2 Compliance
    (2023-07-05) Clavin, James; Joshi, Karuna
    Healthcare organizations exchange sensitive health records, including behavioral health data, across peer-to-peer networks, and it is challenging to find and fix compliance issues proactively. The healthcare industry anticipates a growing need to audit substance use disorder patient data, commonly referred to as Part 2 data, that has been shared without a release of information signed by the patient. To address this need, we developed and evaluated a novel methodology that integrates Blockchain technologies with knowledge graphs to detect Part 2 data exchanged between organizations. We detect substance use disorder data in patient encounters exchanged using clinical terminology based upon the value sets provided by the National Institutes of Health for the Substance Abuse and Mental Health Services Administration. Generally, we consider sharing Part 2 data without consent a Byzantine medical fault: the data is exchanged between known and trusted network participants and is valid, but it is not relevant, and sharing it causes a breach. In this paper, we present our methodology in detail along with the experiment results. We model a medical network of hospitals based upon the most recent healthcare legislation, TEFCA, and generate synthetic patient encounter data dynamically in HL7 format. We convert exchanged encounter data into a knowledge graph data model so that we can use SNOMED-CT to identify Part 2 data. For cohorts of 1,000 patients, we detect Part 2 data in a subset of their encounter data shared between organizations and log it securely on an Ethereum-based blockchain.
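At its core, the detection step described above is a value-set membership check over exchanged encounters. The sketch below illustrates only that idea: the codes, field names, and consent flag are invented for illustration (the paper uses the NIH/SAMHSA value sets, SNOMED-CT, and a knowledge graph rather than Python dicts):

```python
# Hypothetical substance-use-disorder value set (NOT real SNOMED-CT codes).
PART2_VALUE_SET = {"SUD-0001", "SUD-0002", "SUD-0003"}

def flag_part2_breaches(encounters):
    """Return encounters that carry Part 2 codes but lack a signed release."""
    return [
        e for e in encounters
        if not e["consent_on_file"] and PART2_VALUE_SET & set(e["codes"])
    ]

exchanged = [
    {"id": "enc-1", "codes": ["DX-100"], "consent_on_file": False},
    {"id": "enc-2", "codes": ["SUD-0002", "DX-100"], "consent_on_file": False},
    {"id": "enc-3", "codes": ["SUD-0001"], "consent_on_file": True},
]

breaches = flag_part2_breaches(exchanged)
print([e["id"] for e in breaches])
```

In the paper's pipeline, each flagged exchange would then be logged to the Ethereum-based blockchain for audit.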
  • Item
    Developing minimum core data structure for the obesity devices Coordinated Registry Network (CRN)
    (BMJ, 2022-11-14) Long, Cynthia; Tcheng, James E; Marinac-Dabic, Danica; Iorga, Andrea; Krucoff, Mitchell; Fisher, Deborah
    Obesity continues to be a major public health issue, with more than two-thirds of adults in the USA categorized as overweight or obese. Bariatric surgery is effective and yields durable weight loss; however, few qualified candidates choose to undergo surgical treatment. Less-invasive alternatives to bariatric surgery are being developed to bridge the treatment gap. Recognizing the burden of conducting pivotal clinical trials and traditional post-approval studies for medical devices, the Food and Drug Administration (FDA) Center for Devices and Radiological Health has encouraged the development of real-world data content and quality that is sufficient to provide evidence for Total Product Life Cycle medical device evaluation. A key first step is to establish a minimum core data structure that provides a common lexicon for endoscopic obesity devices and its corresponding interoperable data elements. Such a structure would facilitate data capture across existing workflow with a ‘coordinated registry network’ capability. On July 29, 2016, a workshop entitled, ‘GI Coordinated Registry Network: A Case for Obesity Devices’ was held at the FDA White Oak Campus by the Medical Device Epidemiology Network public–private partnership and FDA to initiate the work of developing a common lexicon and core data elements in the metabolic device space, which marked the inauguration of the Gastrointestinal Coordinated Registry Network project. Several work groups were subsequently formed to address clinical issues, data quality issues, registry participation, and data sharing.
  • Item
    A policy-based cloud broker for the VCL platform
    (Inderscience Enterprises, 2013-01-10) Joshi, Karuna; Yesha, Yelena; Finin, Tim; Yesha, Yaacov
    Managing and delivering virtualised cloud-based services in systems like the virtual computing lab (VCL) is an open challenge. Current research is focused on enhancing specific components like service discovery, composition and monitoring and there is no holistic view of what would constitute a lifecycle of virtualised services delivered on a cloud environment. We have developed a policy-based integrated framework for automating acquisition and consumption of cloud services. This framework divides the cloud service lifecycle into five phases: requirements, discovery, negotiation, composition, and consumption. We have also developed a cloud broker tool to automatically discover, negotiate and reserve images from the VCL platform. It was developed using web technologies and is built upon our service lifecycle framework. This paper describes our methodology and the tool as implemented on the VCL platform.
  • Item
    Multidisciplinary Education on Big Data + HPC + Atmospheric Sciences
    (National Science Foundation, 2017-11-01) Wang, Jianwu; Gobbert, Matthias K.; Zhang, Zhibo; Gangopadhyay, Aryya; Page, Glenn G.
    We present a new initiative to create a training program or graduate-level course (cybertraining.umbc.edu) in big data, with atmospheric sciences as the application area and high-performance computing as an indispensable tool. The training consists of instruction in all three areas of "Big Data + HPC + Atmospheric Sciences", supported by teaching assistants and followed by faculty-guided project research in a multidisciplinary team with participants from each area. Participating graduate students, post-docs, and junior faculty from around the nation will be exposed to multidisciplinary research and have the opportunity for significant career impact. The paper discusses the challenges, proposed solutions, and practical issues of the initiative, and how to integrate high-quality developmental program evaluation from the start to aid in the ongoing improvement of the program.