21 Appendix 1: Extended Version of the Standard

Skill Area Evidential Requirements

The following sections set out the full evidential requirements for each Skill Area (A–E). These standards are defined by the Alliance for Data Science Professionals.


21.1 Skill Area A: Data Privacy and Stewardship

This skill relates to the security and protection of data, including design, creation, storage, distribution, and associated risk.

A.1 — Ensuring the Protection of Personal and Sensitive Data

  1. Assess risks and enact data protection policies and procedures.
  2. Ensure safe and secure management of sensitive data, models, and infrastructures.
  3. Apply appropriate data controls, such as encryption, (pseudo)anonymization, and synthetic data.
  4. Manage risks relating to the environment and infrastructure.

A.2 — Managing Loss of Sensitive Data

  1. Act with integrity, giving due regard to legal and regulatory requirements.
  2. Be aware of the actions that should be taken to respond to potential data loss in line with organizational, legal, and regulatory procedures.

A.3 — Data Stewardship and Standards

  1. Incorporate the FAIR Guiding Principles for scientific data management and stewardship into practices, where appropriate and practicable.
  2. Identify opportunities for efficient and creative reuse of data.
  3. Understand the relationship between technical standards and regulation/governance, and their benefits for interoperability and knowledge sharing.

21.2 Skill Area B: Definition, Acquisition, Engineering, Architecture, Storage and Curation

This skill relates to the collection, manipulation, and secure storage of data, and the application of data management and analytical techniques.

B.1 — Data Collection and Management

  1. Source and access data appropriate for the problem.
  2. Critically analyze the availability of appropriate data and resources to meet project requirements.
  3. Ensure data provenance processes are followed.
  4. Identify data characteristics (volume, velocity, and variety).
  5. Identify infrastructure requirements for data storage and analysis.
  6. Show familiarity or experience with tabular and non-tabular data (e.g., unstructured and streaming data).

B.2 — Data Engineering

  1. Source and access data appropriate for the problem.
  2. Construct data sets, potentially drawing from multiple disparate sources using data linkage.
  3. Perform data profiling and characterization to understand the surface properties of the data.
  4. Handle missing data through principled inclusion/exclusion criteria and imputation methods.
  5. Take a systematic approach to data curation and the application of data quality controls.
  6. Identify the most appropriate solutions (e.g., cloud vs. on-premises) in response to business and project needs.

B.3 — Deployment

  1. Plan the deployment of data products with their end-users.
  2. Develop monitoring and maintenance processes.
  3. Deliver secure, stable, and scalable data products (e.g., Application Programming Interfaces (APIs), derivative datasets, dashboards, reports) that meet the needs of the organization, following modern software development best practices.
  4. Design and deliver data products that meet appropriate accessibility standards for their users.

21.3 Skill Area C: Problem Definition and Communication with Stakeholders

This skill is about engaging stakeholders, demonstrating the ability to clearly define a problem, and agreeing on solutions.

C.1 — Problem Definition

  1. Identify and elicit project requirements.
  2. Determine success criteria and frame these in the context of the business.
  3. Clearly articulate the problem statement.
  4. Identify and critically evaluate assumptions.
  5. Recognize and quantify biases, and identify solutions to manage and mitigate them.
  6. Assess risk.
  7. Demonstrate sector/domain knowledge, including how data science can deliver value in these sectors/domains.

C.2 — Relationship Management

  1. Communicate effectively with diverse audiences, including technical colleagues, subject matter experts, and leadership.
  2. Effectively manage the expectations of diverse stakeholders with conflicting priorities to mediate equitable solutions.
  3. Use relevant communication techniques (written, oral, or visual), appropriate for the audience.
  4. Build appropriate and effective business relationships.
  5. Show experience of considering human factors in data-driven solutions.

21.4 Skill Area D: Problem Solving, Analysis, Statistical Modelling, Visualisation

This skill relates to the identification and presentation of solutions using a range of methods, tools, and techniques, demonstrating the ability to analyze a problem and define and present options.

D.1 — Identifying and Applying Technical Solutions and Project Management Approaches

  1. Identify viable solutions based on requirements and data available.
  2. Identify and provide guidance to technical and non-technical stakeholders on the most appropriate solution.
  3. Apply technical and project management methodologies appropriate for the organization and project.

D.2 — Data Preparation and Feature Modelling

  1. Identify appropriate solutions, including statistical and machine learning approaches, and demonstrate an understanding of the assumptions, strengths, and weaknesses of the selected approaches.
  2. Identify and assess appropriate evaluation metrics, including computational performance and accuracy.
  3. Manipulate data with due regard for differences in data characteristics.
  4. Create and evaluate new data features.

D.3 — Data Analysis and Model Building

  1. Apply appropriate solutions, including statistical and machine learning approaches. Demonstrate competence in a modern programming language.
  2. Use appropriate analysis platforms and tools.
  3. Adopt a systematic approach to exploratory data analysis to embrace and manage ambiguity and uncertainty.
  4. Critically analyze data and analytical results.
  5. Adopt appropriate methods to visualize data and communicate complex findings.

21.5 Skill Area E: Evaluation and Reflection

This skill is about reflecting on performance and outcomes, identifying development needs, and applying important principles associated with ethics and sustainability. Note: when completing your evidence for this Skill Area, you can refer to evidence provided in Skill Areas A–D; you should also ensure that ethical evaluation is reflected throughout Skill Areas A–D.

E.1 — Project Evaluation

  1. Monitor project performance and outcomes on an ongoing basis.
  2. Identify and feed forward lessons learned.
  3. Participate in and lead collaborative project evaluations, e.g., retrospectives.

E.2 — Ethical Behavior

  1. Identify and manage the risks of erroneous and biased data.
  2. Act with integrity with respect to legal and regulatory requirements.
  3. Uphold principles of ethical and safe use of data and AI technologies.
  4. Implement data use procedures to ensure sensitive data is only used for its agreed purpose.
  5. Implement data retention strategies in line with regulatory and legal requirements.

E.3 — Sustainability and Best Practices

  1. Incorporate the principles of open science and/or reproducible research within the organization and, where possible, beyond.
  2. Demonstrate competence in programmatic approaches to undertaking data science work.
  3. Apply the scientific method in delivering solutions.
  4. Ensure high technical standards in line with software development best practices (e.g., software testing, version control, and Continuous Integration and Continuous Delivery (CI/CD)).
  5. Apply automation to promote reproducibility of analyses.

E.4 — Reflective Practice and Ongoing Development

  1. Learn from experience through self-assessment of one’s own responses to practice situations.
  2. Identify learning opportunities to maintain knowledge and skills in the relevant area of data science.
  3. Take ownership of ongoing professional development.
  4. Contribute to knowledge sharing across the organization and/or the wider community.
  5. Contribute to the management and empowerment of the broader team.
  6. Engage with the latest developments across industry and academia and incorporate these into solutions.