52 changes: 52 additions & 0 deletions docs/setupcfg.md
@@ -0,0 +1,52 @@
The following metadata fields can be extracted from a setup.cfg file.
These fields are defined in the [setuptools declarative configuration specification](https://setuptools.pypa.io/en/latest/userguide/declarative_config.html), and are mapped according to the [CodeMeta crosswalk for Python Distutils](https://github.com/codemeta/codemeta/blob/master/crosswalks/Python%20Distutils%20(PyPI).csv).

| Software metadata category | SOMEF metadata JSON path | SETUP.CFG metadata file field |
|--------------------------------|-----------------------------|----------------------------------------|
| author - value | author[i].result.value | metadata.author |
| author - email | author[i].result.email | metadata.author_email |
| author - name | author[i].result.name | metadata.author |
| code_repository | code_repository[i].result.value | metadata.project_urls (Repository, Source, Code; labels matched case-insensitively) |
| description | description[i].result.value | metadata.description |
| documentation | documentation[i].result.value | metadata.project_urls (Documentation, Docs, Doc; labels matched case-insensitively) |
| license - value | license[i].result.value | metadata.license |
| license - name | license[i].result.name | metadata.license *(1)* |
| license - spdx id | license[i].result.spdx_id | metadata.license if "spdx.org/licenses/" *(1)* |
| has_package_file | has_package_file[i].result.value | URL of the setup.cfg file |
| homepage | homepage[i].result.value | metadata.url |
| keywords | keywords[i].result.value | metadata.keywords |
| package_id | package_id[i].result.value | metadata.name |
| requirements - value | requirements[i].result.value | options.install_requires or options.setup_requires *(2)* |
| requirements - name | requirements[i].result.name | options.install_requires or options.setup_requires -> name *(2)* |
| requirements - version | requirements[i].result.version | options.install_requires or options.setup_requires -> version *(2)* |
| runtime_platform - value | runtime_platform[i].result.value | options.python_requires -> "Python" + version *(3)* |
| runtime_platform - name | runtime_platform[i].result.name | options.python_requires -> "Python" *(3)* |
| runtime_platform - version | runtime_platform[i].result.version | options.python_requires *(3)* |
| version - value | version[i].result.value | metadata.version |
| version - tag | version[i].result.tag | metadata.version |
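
The `project_urls` rows of the table can be illustrated with a minimal sketch. It assumes the label sets accepted by the parser in this change (`documentation`, `docs`, `doc` and `repository`, `source`, `code`). The sample file contents, the example.org URLs, and the helper name `classify_project_urls` are hypothetical and exist only for illustration.

```python
import configparser

# Hypothetical setup.cfg contents used only for this illustration.
SETUP_CFG = """
[metadata]
name = example-package
url = https://example.org
project_urls =
    Documentation = https://example.org/docs
    Source = https://example.org/repo
    Bug Tracker = https://example.org/issues
"""

# Labels are compared case-insensitively.
DOC_LABELS = {"documentation", "docs", "doc"}
REPO_LABELS = {"repository", "source", "code"}


def classify_project_urls(cfg_text):
    """Return (documentation_urls, repository_urls) found under [metadata]."""
    config = configparser.ConfigParser()
    config.read_string(cfg_text)
    docs, repos = [], []
    raw = config.get("metadata", "project_urls", fallback="")
    for line in raw.splitlines():
        if "=" not in line:
            continue  # skip the empty first line of the multi-line value
        label, url = (part.strip() for part in line.split("=", 1))
        if label.lower() in DOC_LABELS:
            docs.append(url)
        elif label.lower() in REPO_LABELS:
            repos.append(url)
    return docs, repos


docs, repos = classify_project_urls(SETUP_CFG)
print(docs)
print(repos)
```

Labels outside the two sets, such as `Bug Tracker` here, are ignored in this sketch.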

---

*(1)*
- The license name and SPDX identifier are looked up in a local dictionary of known licenses.
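
A rough sketch of such a lookup follows. It is not the SOMEF implementation: the two-entry `LICENSES` dictionary, the `lookup_license` helper, and the handling of `spdx.org/licenses/` URLs are assumptions made for illustration, and the real dictionary is far larger.

```python
import re

# Hypothetical, heavily truncated stand-in for the local license dictionary.
LICENSES = {
    "mit": {"spdx_id": "MIT", "name": "MIT License"},
    "apache-2.0": {"spdx_id": "Apache-2.0", "name": "Apache License 2.0"},
}

# Assumed URL form: https://spdx.org/licenses/<id> with an optional ".html" suffix.
SPDX_URL = re.compile(r"spdx\.org/licenses/([^/\s]+?)(?:\.html)?/?$")


def lookup_license(declared):
    """Return {'spdx_id', 'name'} for a declared license, or None if unknown."""
    text = declared.strip()
    match = SPDX_URL.search(text)
    # Use the identifier from the URL when present, otherwise the raw value.
    key = (match.group(1) if match else text).lower()
    return LICENSES.get(key)


print(lookup_license("MIT"))
print(lookup_license("https://spdx.org/licenses/Apache-2.0.html"))
print(lookup_license("Some Custom License"))
```

When nothing matches, only the declared value would be kept, which mirrors the fallback branch of the parser in this change.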

*(2)*
- Examples of requirements
```
[options]
install_requires =
    astropy
    ctapipe >= 0.12
    h5py ~= 3.1.0

setup_requires =
    setuptools >= 40.6.0
    wheel
```
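
As a sketch of how the requirement lines above turn into name and version pairs, the snippet below reads the same options with `configparser` and splits each line at the first comparison operator. The helper names `split_requirement` and `read_requirements` are hypothetical. Extras and environment markers are not handled, so a line such as `pkg; python_version < "3.8"` would be split incorrectly.

```python
import configparser
import re

# Same requirements as in the example above (continuation lines must be indented).
SETUP_CFG = """
[options]
install_requires =
    astropy
    ctapipe >= 0.12
    h5py ~= 3.1.0
setup_requires =
    setuptools >= 40.6.0
    wheel
"""

# Two-character operators are listed before ">" and "<" so they match first.
OPERATORS = re.compile(r"(>=|<=|==|!=|~=|>|<)")


def split_requirement(requirement):
    """Split 'name <op> version' into (name, version constraint)."""
    parts = OPERATORS.split(requirement, maxsplit=1)
    name = parts[0].strip()
    version = "".join(parts[1:]).strip()  # empty string when no constraint is given
    return name, version


def read_requirements(cfg_text, option):
    """Return a list of (name, version) tuples for one [options] entry."""
    config = configparser.ConfigParser()
    config.read_string(cfg_text)
    raw = config.get("options", option, fallback="")
    return [split_requirement(line.strip()) for line in raw.splitlines() if line.strip()]


runtime = read_requirements(SETUP_CFG, "install_requires")
build = read_requirements(SETUP_CFG, "setup_requires")
print(runtime)
print(build)
```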

*(3)*
- Example:
```
[options]
python_requires = >= 3.10.0
```
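
A minimal sketch of the resulting `runtime_platform` entry, assuming the value is built as `"Python"` followed directly by the raw specifier, as the parser in this change does. The helper name `runtime_platform` is hypothetical.

```python
import configparser

SETUP_CFG = """
[options]
python_requires = >= 3.10.0
"""


def runtime_platform(cfg_text):
    """Build the runtime_platform fields from options.python_requires."""
    config = configparser.ConfigParser()
    config.read_string(cfg_text)
    spec = config.get("options", "python_requires", fallback=None)
    if spec is None:
        return None
    # "value" is "Python" followed directly by the raw specifier (no space added).
    return {"value": f"Python{spec}", "name": "Python", "version": spec}


platform = runtime_platform(SETUP_CFG)
print(platform)
```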
2 changes: 1 addition & 1 deletion docs/supported_languages.md
@@ -12,7 +12,7 @@ To know more about the extraction details for each type of file, click on it.
| JavaScript | [`package.json`](./packagejson.md), [`bower.json`](./bower.md) |
| Julia | [`Project.toml`](./julia.md) |
| PHP | [`composer.json`](./composer.md) |
- | Python | [`setup.py`](./setuppy.md), [`pyproject.toml`](./pyprojecttoml.md), [`requirements.txt`](./requirementstxt.md) |
+ | Python | [`setup.py`](./setuppy.md), [`setup.cfg`](./setupcfg.md), [`pyproject.toml`](./pyprojecttoml.md), [`requirements.txt`](./requirementstxt.md) |
| R | [`DESCRIPTION`](./description.md) |
| Ruby | [`*.gemspec`](./gemspec.md) |
| Rust | [`Cargo.toml`](./cargo.md) |
1 change: 1 addition & 0 deletions docs/supported_metadata_files.md
@@ -21,6 +21,7 @@ SOMEF can extract metadata from a wide range of files commonly found in software
| `pyproject.toml` | Python | Modern Python project configuration file used by tools like Poetry and Flit | [🔍](./pyprojecttoml.md)| [📄](https://packaging.python.org/en/latest/guides/writing-pyproject-toml/)| [PEP 621](https://peps.python.org/pep-0621/)| [Example](https://github.com/KnowledgeCaptureAndDiscovery/somef/blob/master/pyproject.toml) |
| `requirements.txt` | Python | Lists Python package dependencies | [🔍](./requirementstxt.md)| [📄](https://pip.pypa.io/en/stable/reference/requirements-file-format/)| [Latest](https://pip.pypa.io/en/stable/reference/requirements-file-format/)| [Example](https://github.com/oeg-upm/FAIR-Research-Object/blob/main/requirements.txt) |
| `setup.py` | Python | Package file format used in python projects | [🔍](./setuppy.md)| [📄](https://setuptools.pypa.io/en/latest/references/keywords.html)| [v75.0.0](https://github.com/pypa/setuptools)| [Example](https://github.com/oeg-upm/soca/blob/main/setup.py) |
| `setup.cfg` | Python | Configuration file for setuptools used to define package metadata and options in a declarative way | [🔍](./setupcfg.md)| [📄](https://setuptools.pypa.io/en/latest/userguide/declarative_config.html) | [v75.0.0](https://github.com/pypa/setuptools)|[Example](https://github.com/oeg-upm/soca/blob/main/setup.cfg)|
| `DESCRIPTION` | R | Metadata file for R packages including title, author, and version | [🔍](./description.md) | [📄](https://cran.r-project.org/doc/manuals/R-exts.html#The-DESCRIPTION-file)| [v4.4.1](https://cran.r-project.org/doc/manuals/r-release/R-exts.html) | [Example](https://github.com/cran/ggplot2/blob/master/DESCRIPTION) |
| `*.gemspec` | Ruby | Manifest file serves as the package descriptor used in Ruby gem projects. | [🔍](./gemspec.md)| [📄](https://guides.rubygems.org/specification-reference/)| [v3.5.22](https://github.com/rubygems/rubygems)|[Example](https://github.com/rubygems/rubygems/blob/master/bundler/bundler.gemspec) |
| `cargo.toml` | Rust | Manifest file serves as the package descriptor used in Rust projects | [🔍](./cargo.md) | [📄](https://doc.rust-lang.org/cargo/reference/manifest.html)| [v0.85.0](https://github.com/rust-lang/cargo) | [Example](https://github.com/rust-lang/cargo/blob/master/Cargo.toml) |
192 changes: 192 additions & 0 deletions src/somef/parser/setupcfg_parser.py
@@ -0,0 +1,192 @@
import re
import os
import logging
import configparser
from pathlib import Path
from ..process_results import Result
from ..utils import constants
from ..regular_expressions import detect_license_spdx, detect_spdx_from_declared

def parse_setup_cfg(file_path, metadata_result: Result, source):
    """
    Parse a setup.cfg file and add the extracted metadata to metadata_result.

    Very similar to the pyproject.toml parser, but uses configparser instead of a TOML library.
    """

    try:
        metadata_result.add_result(
            constants.CAT_HAS_PACKAGE_FILE,
            {"value": source, "type": constants.URL},
            1,
            constants.TECHNIQUE_CODE_CONFIG_PARSER,
            source
        )

        config = configparser.ConfigParser()
        config.read(file_path, encoding="utf-8")

        metadata = dict(config["metadata"]) if "metadata" in config else {}
        options = dict(config["options"]) if "options" in config else {}

        if "name" in metadata:
            metadata_result.add_result(
                constants.CAT_PACKAGE_ID,
                {"value": metadata["name"], "type": constants.STRING},
                1, constants.TECHNIQUE_CODE_CONFIG_PARSER, source
            )

        if "version" in metadata:
            version_value = metadata["version"]
            # "attr:" and "file:" are setuptools directives, not literal versions
            if not version_value.startswith(("attr:", "file:")):
                metadata_result.add_result(
                    constants.CAT_VERSION,
                    {"value": version_value, "type": constants.RELEASE, "tag": version_value},
                    1, constants.TECHNIQUE_CODE_CONFIG_PARSER, source
                )

        if "description" in metadata:
            metadata_result.add_result(
                constants.CAT_DESCRIPTION,
                {"value": metadata["description"], "type": constants.STRING},
                1, constants.TECHNIQUE_CODE_CONFIG_PARSER, source
            )

        if "author" in metadata or "author_email" in metadata:
            author_data = {
                "name": metadata.get("author"),
                "email": metadata.get("author_email"),
                "type": constants.AGENT,
                "value": metadata.get("author")
            }
            metadata_result.add_result(
                constants.CAT_AUTHORS, author_data,
                1, constants.TECHNIQUE_CODE_CONFIG_PARSER, source
            )

        if "license" in metadata:
            license_value = metadata["license"]
            # Try the declared value first, then fall back to text-based detection
            license_info_spdx = detect_spdx_from_declared(license_value)
            if not license_info_spdx:
                license_info_spdx = detect_license_spdx(license_value, 'JSON')
            if license_info_spdx:
                license_data = {
                    "value": license_value,
                    "spdx_id": license_info_spdx.get('spdx_id'),
                    "name": license_info_spdx.get('name'),
                    "type": constants.LICENSE
                }
            else:
                license_data = {"value": license_value, "type": constants.LICENSE}

            metadata_result.add_result(
                constants.CAT_LICENSE, license_data,
                1, constants.TECHNIQUE_CODE_CONFIG_PARSER, source
            )

        if "keywords" in metadata:
            # Keywords may be comma-separated or listed one per line
            for kw in re.split(r'[,\n]', metadata["keywords"]):
                kw = kw.strip()
                if kw:
                    metadata_result.add_result(
                        constants.CAT_KEYWORDS,
                        {"value": kw, "type": constants.STRING},
                        1, constants.TECHNIQUE_CODE_CONFIG_PARSER, source
                    )

        if "url" in metadata:
            metadata_result.add_result(
                constants.CAT_HOMEPAGE,
                {"value": metadata["url"], "type": constants.URL},
                1, constants.TECHNIQUE_CODE_CONFIG_PARSER, source
            )

        if "install_requires" in options:
            for req in options["install_requires"].strip().splitlines():
                req = req.strip()
                if req:
                    name, version = parse_dependency(req)
                    if name:
                        metadata_result.add_result(
                            constants.CAT_REQUIREMENTS,
                            {
                                "value": req,
                                "name": name,
                                "version": version,
                                "type": constants.SOFTWARE_DEPENDENCY,
                                "dependency_type": constants.DEPENDENCY_TYPE_RUNTIME,
                                "dependency_resolver": "python"
                            },
                            1, constants.TECHNIQUE_CODE_CONFIG_PARSER, source
                        )

        if "setup_requires" in options:
            for req in options["setup_requires"].strip().splitlines():
                req = req.strip()
                if req:
                    name, version = parse_dependency(req)
                    if name:
                        metadata_result.add_result(
                            constants.CAT_REQUIREMENTS,
                            {
                                "value": req,
                                "name": name,
                                "version": version,
                                "type": constants.SOFTWARE_DEPENDENCY,
                                "dependency_type": constants.DEPENDENCY_TYPE_DEVELOPMENT,
                                "dependency_resolver": "python"
                            },
                            1, constants.TECHNIQUE_CODE_CONFIG_PARSER, source
                        )

        if "python_requires" in options:
            metadata_result.add_result(
                constants.CAT_RUNTIME_PLATFORM,
                {
                    "value": f"Python{options['python_requires']}",
                    "name": "Python",
                    "version": options["python_requires"],
                    "type": constants.STRING
                },
                1, constants.TECHNIQUE_CODE_CONFIG_PARSER, source
            )

        if "project_urls" in metadata:
            # Each line has the form "Label = URL"; labels are matched case-insensitively
            lines = metadata["project_urls"].split('\n')
            for line in lines:
                if '=' in line:
                    label, url_val = [part.strip() for part in line.split('=', 1)]
                    label_lower = label.lower()

                    if label_lower in ["documentation", "docs", "doc"]:
                        metadata_result.add_result(
                            constants.CAT_DOCUMENTATION,
                            {"value": url_val, "type": constants.URL},
                            1, constants.TECHNIQUE_CODE_CONFIG_PARSER, source
                        )

                    elif label_lower in ["repository", "source", "code"]:
                        metadata_result.add_result(
                            constants.CAT_CODE_REPOSITORY,
                            {"value": url_val, "type": constants.URL},
                            1, constants.TECHNIQUE_CODE_CONFIG_PARSER, source
                        )

    except Exception as e:
        logging.error(f"Error parsing setup.cfg file {file_path}: {str(e)}")

    return metadata_result

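def parse_dependency(dependency_str):
    """Parse a dependency string to extract name and version."""
    if not dependency_str:
        return None, None

    # Split at the first comparison operator; the capturing group keeps the operator.
    # maxsplit is passed by keyword because the positional form is deprecated since Python 3.13.
    parts = re.split(r'(>=|<=|==|!=|>|<|~=)', dependency_str, maxsplit=1)
    name = parts[0].strip()
    if len(parts) > 1:
        version = ''.join(parts[1:])
    else:
        version = ""

    version = re.sub(r'[\[\]]', '', version)

    return name, version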
3 changes: 2 additions & 1 deletion src/somef/parser/toml_parser.py
@@ -118,7 +118,8 @@ def extract_common_version_field(data, metadata_result, source, file_type):
For Project.toml: data["version"]
"""
version_value = None

version_type = None

if file_type == "cargo" and "package" in data and "version" in data["package"]:
version_value = data["package"]["version"]
version_type = constants.RELEASE
6 changes: 5 additions & 1 deletion src/somef/process_files.py
@@ -25,6 +25,7 @@
from .parser.publiccode_parser import parse_publiccode_file
from .parser.codeowners_parser import parse_codeowners_file
from .parser.conda_environment_parser import parse_conda_environment_file
from .parser.setupcfg_parser import parse_setup_cfg
from chardet import detect


@@ -277,7 +278,8 @@ def process_repository_files(repo_dir, metadata_result: Result, repo_type, owner
(filename.lower() == "environment.yml" or filename.lower() == "environment.yaml") or \
(filename.lower() == ".zenodo.json") or \
(filename.lower() == "cargo.toml" and repo_relative_path == ".") or (filename.lower() == "composer.json" and repo_relative_path == ".") or \
- (filename == "Project.toml" or (filename.lower()== "publiccode.yml" or filename.lower()== "publiccode.yaml") and repo_relative_path == "."):
+ (filename == "Project.toml" or (filename.lower()== "publiccode.yml" or filename.lower()== "publiccode.yaml") and repo_relative_path == ".") or \
+ filename.lower() == "setup.cfg":
if filename.lower() in parsed_build_files and repo_relative_path != ".":
logging.info(f"Ignoring secondary {filename} in {dir_path}")
continue
@@ -318,6 +320,8 @@ def process_repository_files(repo_dir, metadata_result: Result, repo_type, owner
metadata_result = parse_publiccode_file(os.path.join(dir_path, filename), metadata_result, build_file_url)
if filename.lower() == "environment.yml" or filename.lower() == "environment.yaml":
metadata_result = parse_conda_environment_file(os.path.join(dir_path, filename), metadata_result, build_file_url)
if filename.lower() == "setup.cfg":
metadata_result = parse_setup_cfg(os.path.join(dir_path, filename), metadata_result, build_file_url)
# if filename.lower() == ".zenodo":
# metadata_result = parse_zenodo_file(os.path.join(dir_path, filename), metadata_result, build_file_url)
parsed_build_files.add(filename.lower())