Conversation
Mhafsia commented on Jan 6, 2026
- Added a SerpAPI sync script to fetch publications from Google Scholar
- Migrated 135 publications to a single YAML file (`_data/publications.yml`)
- 285 publications now synced from Google Scholar
- New sidebar UI with Type and Year filters (Category filter temporarily disabled)
- Pill-shaped filter buttons with color coding (blue, teal)
- Clickable tags on publications for filtering
- Normalized author names to "Initial. Lastname" format
- Added `clean_publications.py` and `normalize_publications.py` scripts
- Removed the old individual `.md` publication files
✅ Deploy Preview for ppsp ready!
Pull request overview
This PR implements a comprehensive publications management system that migrates from individual markdown files to a centralized YAML-based approach with automated syncing from Google Scholar via SerpAPI. The changes include four Python utility scripts for synchronization, normalization, and data cleaning, plus a redesigned UI with interactive filtering capabilities.
Key Changes:
- Automated publication syncing from Google Scholar using SerpAPI (285 total publications)
- Migration of 135 existing publications from individual .md files to single _data/publications.yml
- New sidebar-based filtering UI with Type, Year filters, and clickable tags
- Author name normalization to "Initial. Lastname" format
Reviewed changes
Copilot reviewed 141 out of 143 changed files in this pull request and generated 21 comments.
| File | Description |
|---|---|
| scripts/sync_publications.py | Main sync script that fetches publications from Google Scholar API and adds new entries to YAML |
| scripts/normalize_publications.py | Normalizes author names, journal titles, and publication types for consistency |
| scripts/clean_publications.py | Cleans publication data by removing duplicates, fixing doubled years, and extracting correct years |
| scripts/migrate_to_yaml.py | One-time migration script to convert .md frontmatter to YAML format |
| scripts/requirements.txt | Python dependencies (requests, pyyaml) |
| _includes/publications.html | Redesigned publications page with sidebar filters, pill-style buttons, and JavaScript filtering logic |
| .gitignore | Added .env files to prevent API key exposure |
| _papers/papers/*.md | Removed 135+ individual publication markdown files (migrated to YAML) |
scripts/normalize_publications.py (Outdated)
```python
if len(parts) >= 2:
    # Last part is lastname, first parts are first names
    lastname = parts[-1]
    firstnames = parts[:-1]
    # Convert first names to initials
    initials = ' '.join([f"{n[0]}." for n in firstnames if n])
    normalized.append(f"{initials} {lastname}")
```
The author name normalization logic assumes "last part is lastname, first parts are first names", but this doesn't handle compound last names (e.g., "Van Der Berg", "De La Cruz") or names with suffixes (e.g., "John Smith Jr."). This could result in incorrect name formatting for authors with these name patterns.
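A minimal sketch of one way to absorb compound prefixes and common suffixes into the last name, scanning backwards from the end of the name (the prefix and suffix sets here are illustrative, not exhaustive):

```python
# Illustrative sets; extend to cover the prefixes/suffixes in the actual corpus
COMPOUND_PREFIXES = {"van", "von", "der", "den", "de", "la", "le", "del", "di", "da"}
SUFFIXES = {"jr", "jr.", "sr", "sr.", "ii", "iii", "iv"}

def format_author(parts: list) -> str:
    """Format ['Jean', 'de', 'la', 'Fontaine'] as 'J. de la Fontaine'."""
    suffix = ""
    if parts and parts[-1].lower() in SUFFIXES:
        suffix = " " + parts[-1]
        parts = parts[:-1]
    # Scan backwards, absorbing consecutive compound prefixes into the last name
    start = len(parts) - 1
    while start > 0 and parts[start - 1].lower() in COMPOUND_PREFIXES:
        start -= 1
    lastname = " ".join(parts[start:]) + suffix
    initials = " ".join(f"{n[0]}." for n in parts[:start] if n)
    return f"{initials} {lastname}".strip()
```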
```python
# If year is 2025 (our fallback), try to find real year in publication field
if current_year == 2025 and publication:
    # Look for year pattern in publication
    year_match = re.search(r'\b(19\d{2}|20[0-2]\d)\b', publication)
```
The regex pattern for extracting years in clean_publications.py is `\b(19\d{2}|20[0-2]\d)\b`, which matches years from 1900-2029. However, the pattern in sync_publications.py uses `\b(19|20)\d{2}\b`, which matches 1900-2099. This inconsistency could lead to different behavior when extracting years. Consider using the same pattern in both scripts for consistency.
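One hedged way to keep the two scripts aligned is a single shared helper that both import; a sketch (the module layout and names are assumptions):

```python
import re
from datetime import date
from typing import Optional

# Match any 19xx/20xx token, then validate the upper bound numerically
YEAR_RE = re.compile(r"\b(19|20)\d{2}\b")

def extract_year(text: str) -> Optional[int]:
    """Return the first plausible publication year found in text, else None."""
    for match in YEAR_RE.finditer(text):
        year = int(match.group(0))
        if 1900 <= year <= date.today().year + 1:  # reject far-future matches
            return year
    return None
```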
_includes/publications.html (Outdated)
| <button class="filter-pill filter-pill-year" data-filter="2019-" | ||
| onclick="toggleFilter(this, 'year')">Before 2020</button> |
There was a problem hiding this comment.
The year filter "Before 2020" uses the filter value "2019-" which is checked with a special case in the applyFilters function. However, this naming is misleading - "2019-" suggests a range starting from 2019, when it actually means "before 2020". Consider using a more explicit value like "pre-2020" or "before-2020" to make the intent clearer.
```python
def normalize_title(title: str) -> str:
    """Normalize title for comparison."""
    return re.sub(r'[^a-z0-9]', '', title.lower())
```
The normalize_title function removes all non-alphanumeric characters for comparison, which could cause false positives. For example, "COVID-19 Study" and "COVID 19 Study" would be considered the same (correctly), but "Re-examination" and "Reexamination" would also match (potentially incorrectly if they're different papers). Consider a more nuanced normalization that preserves some punctuation context or use a fuzzy matching algorithm for better duplicate detection.
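A sketch of fuzzy matching with the standard library's difflib; the 0.92 threshold is an assumption to tune against real duplicates:

```python
from difflib import SequenceMatcher

def is_probable_duplicate(title_a: str, title_b: str, threshold: float = 0.92) -> bool:
    """Flag two titles as duplicates when their similarity ratio clears a threshold."""
    a = title_a.lower().strip()
    b = title_b.lower().strip()
    return SequenceMatcher(None, a, b).ratio() >= threshold
```

Pairwise comparison is quadratic in the number of publications, which should be acceptable at roughly 285 entries.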
```python
except requests.exceptions.RequestException as e:
    print(f"❌ API Error: {e}")
    return
```
The error handling for RequestException catches all request-related errors generically and simply returns, but doesn't distinguish between different error types (network errors, timeout, 401 unauthorized, 429 rate limit, etc.). Consider adding more specific error handling to provide better diagnostics, especially for common issues like invalid API keys (401/403) or rate limiting (429).
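A sketch of more granular handling, assuming the request call looks roughly like the script's (the `url` and `params` names are placeholders):

```python
try:
    response = requests.get(url, params=params, timeout=30)
    response.raise_for_status()
except requests.exceptions.Timeout:
    print("❌ Request timed out; try again later.")
    return
except requests.exceptions.HTTPError as e:
    status = e.response.status_code if e.response is not None else None
    if status in (401, 403):
        print("❌ Authentication failed: check SERPAPI_KEY.")
    elif status == 429:
        print("❌ Rate limit exceeded: wait before retrying.")
    else:
        print(f"❌ HTTP error {status}: {e}")
    return
except requests.exceptions.RequestException as e:
    print(f"❌ Network error: {e}")
    return
```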
_includes/publications.html (Outdated)
```js
// Check type filter (case insensitive)
const typeMatch = currentFilters.type === 'all' ||
    type.toLowerCase() === currentFilters.type.toLowerCase();

// Check category filter
const catMatch = currentFilters.category === 'all' ||
    categories.toLowerCase().includes(currentFilters.category.toLowerCase());
```
The case-insensitive type matching converts both sides to lowercase for comparison, but the category filter uses case-sensitive includes(). This inconsistency could lead to unexpected behavior. Either make both case-insensitive or both case-sensitive for consistency. Since publication types and categories are controlled values, case-sensitive comparison would be more predictable.
Suggested change:

```diff
- // Check type filter (case insensitive)
- const typeMatch = currentFilters.type === 'all' ||
-     type.toLowerCase() === currentFilters.type.toLowerCase();
- // Check category filter
- const catMatch = currentFilters.category === 'all' ||
-     categories.toLowerCase().includes(currentFilters.category.toLowerCase());
+ // Check type filter (case sensitive, controlled values)
+ const typeMatch = currentFilters.type === 'all' ||
+     type === currentFilters.type;
+ // Check category filter (case sensitive, controlled values)
+ const catMatch = currentFilters.category === 'all' ||
+     categories.includes(currentFilters.category);
```
scripts/clean_publications.py (Outdated)
```python
seen_titles = {}
unique_pubs = []
duplicates_removed = 0

for pub in pubs:
    title = pub.get('title', '').lower().strip()
    # Normalize title for comparison
    normalized = re.sub(r'[^a-z0-9]', '', title)

    if normalized and normalized not in seen_titles:
        seen_titles[normalized] = True
```
The duplicate removal uses a dictionary where keys are normalized titles but the value is just 'True'. Using a set would be more appropriate and memory-efficient for this use case. Consider changing 'seen_titles = {}' to 'seen_titles = set()' and updating the logic accordingly.
_includes/publications.html (Outdated)
```css
    top: 100px;
}
```
The sticky sidebar has 'top: 100px' which may not work well on mobile devices or if the site header height changes. Consider using a more flexible approach with proper responsive breakpoints and calculating the offset based on the actual header height, or ensure this is tested across different screen sizes and header configurations.
Suggested change:

```diff
-     top: 100px;
- }
+     top: 0;
+ }
+ @media (min-width: 992px) {
+     .pub-sidebar {
+         top: 100px;
+     }
+ }
```
scripts/migrate_to_yaml.py (Outdated)
```python
Run once to migrate, then delete this script.
"""

import os
```
Import of `os` is not used.
scripts/sync_publications.py (Outdated)
```python
import re
import yaml
import requests
from datetime import datetime
```
Import of `datetime` is not used.
- Use dynamic year fallback instead of hardcoded value
- Better error handling for API requests
- Handle compound last names and multiple initials
- Add ARIA labels for accessibility
- Use data-filter-group for robust filter handling
- Add responsive sticky sidebar
- Remove unused imports
Pull request overview
Copilot reviewed 141 out of 143 changed files in this pull request and generated 10 comments.
scripts/normalize_publications.py (Outdated)
```python
if re.match(r'^([A-Z]\.\s*)+[A-Za-z-]+$', author):
    normalized.append(author)
    continue
```
The regex pattern for detecting already-normalized author names on line 34 has a flaw. The pattern `^([A-Z]\.\s*)+[A-Za-z-]+$` will match names like "J. Smith-Jones" correctly, but it won't match compound last names with multiple parts separated by spaces (e.g., "J. van der Berg"). The pattern assumes the last name is a single continuous token, so it doesn't cover compound prefixes like "van", "von", or "de" that are already handled in the normalization logic starting at line 44.
scripts/clean_publications.py (Outdated)
```python
def remove_duplicates(pubs: list) -> list:
    """Remove duplicate publications based on title similarity."""
    seen_titles = set()  # Use set instead of dict for efficiency
    unique_pubs = []
    duplicates_removed = 0

    for pub in pubs:
        title = pub.get('title', '').lower().strip()
        # Normalize title for comparison
        normalized = re.sub(r'[^a-z0-9]', '', title)

        if normalized and normalized not in seen_titles:
            seen_titles.add(normalized)
            unique_pubs.append(pub)
        else:
            duplicates_removed += 1
            print(f"  🔄 Removed duplicate: {pub.get('title', '')[:60]}...")

    return unique_pubs, duplicates_removed
```
The duplicate removal logic uses a set to track normalized titles but returns a tuple on line 66 when the function signature on line 48 indicates it should return a list. While Python can handle this in most cases, the inconsistency between the declared return type (list) and actual return type (tuple) could cause issues. The function should consistently return a list as indicated by its signature.
| <button class="filter-pill active" data-filter="all" onclick="toggleFilter(this, 'type')" | ||
| aria-label="Show all publication types">All</button> | ||
| <button class="filter-pill" data-filter="Journal Paper" onclick="toggleFilter(this, 'type')" | ||
| aria-label="Filter by type: Journal Paper">Journal | ||
| Paper</button> | ||
| <button class="filter-pill" data-filter="Preprint" onclick="toggleFilter(this, 'type')" | ||
| aria-label="Filter by type: Preprint">Preprint</button> | ||
| <button class="filter-pill" data-filter="Book_Chapter" onclick="toggleFilter(this, 'type')" | ||
| aria-label="Filter by type: Book Chapter">Book | ||
| Chapter</button> | ||
| <button class="filter-pill" data-filter="Comment" onclick="toggleFilter(this, 'type')" | ||
| aria-label="Filter by type: Comment">Comment</button> | ||
| <button class="filter-pill" data-filter="Poster_Conference" onclick="toggleFilter(this, 'type')" | ||
| aria-label="Filter by type: Poster/Conference">Poster/Conference</button> |
There was a problem hiding this comment.
In the HTML template, the filter pills use inline onclick handlers which mix JavaScript with HTML. This is generally considered a maintainability anti-pattern. Consider moving to event listeners in the JavaScript section instead of inline event handlers. This would make the code more maintainable and follow modern web development best practices.
```python
def load_env():
    env_path = Path(__file__).parent / '.env'
    env_vars = {}
    if env_path.exists():
        with open(env_path) as f:
            for line in f:
                line = line.strip()
                if line and not line.startswith('#') and '=' in line:
                    key, value = line.split('=', 1)
                    # Strip quotes from values (handles "value" or 'value')
                    value = value.strip().strip('"').strip("'")
                    env_vars[key.strip()] = value
    return env_vars
```
The function load_env() on lines 20-32 manually parses the .env file. While this works, it doesn't handle all edge cases that standard .env parsers handle (e.g., escaped characters, multiline values, comments after values). Consider using a standard library like python-dotenv which is more robust and handles these edge cases correctly. Add it to requirements.txt if adopted.
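If the dependency is acceptable, a sketch of the python-dotenv route (after adding python-dotenv to requirements.txt):

```python
from pathlib import Path
from dotenv import dotenv_values

def load_env() -> dict:
    """Parse .env with python-dotenv, which handles quoting, escapes, and comments."""
    env_path = Path(__file__).parent / '.env'
    # dotenv_values returns None for keys declared without a value; drop those
    return {k: v for k, v in dotenv_values(env_path).items() if v is not None}
```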
_includes/publications.html (Outdated)
```js
// Fallback: legacy behavior based on specific filter values
document.querySelectorAll('.sidebar-section').forEach(section => {
    const buttons = section.querySelectorAll('.filter-pill[data-filter]');
    if (buttons.length > 0) {
        const hasType = section.querySelector('[data-filter="Journal Paper"]');
        const hasCategory = section.querySelector('[data-filter="Computational"]');

        if ((filterGroup === 'type' && hasType) || (filterGroup === 'category' && hasCategory)) {
            buttons.forEach(btn => {
                btn.classList.toggle('active', btn.dataset.filter === value);
            });
        }
    }
});
}
```
The JavaScript function filterByTag has a fallback mechanism (lines 317-331) in case the data-filter-group attributes are not found. However, this fallback uses brittle selectors that look for specific filter values like 'Journal Paper' and 'Computational'. If these values change or new filter types are added, this fallback code will break silently. Consider removing the fallback entirely and ensuring all filter sections have the required data-filter-group attribute, or document why this fallback is necessary.
| <div class="pub-tags"> | ||
| {% assign type_display = project.type | replace: "Poster_Conference", "Poster/Conference" | | ||
| replace: "_", " " %} | ||
| <button class="pub-tag" onclick="filterByTag('{{ project.type }}', 'type')">{{ type_display |
There was a problem hiding this comment.
The aria-label attributes on filter buttons (lines 35-48, 76-92) provide good accessibility support. However, the pub-tag buttons created in lines 119-120 that are clickable don't have aria-labels or roles, which could make them less accessible to screen reader users. Consider adding appropriate ARIA attributes to these interactive elements.
| <button class="pub-tag" onclick="filterByTag('{{ project.type }}', 'type')">{{ type_display | |
| <button class="pub-tag" onclick="filterByTag('{{ project.type }}', 'type')" aria-label="Filter by type: {{ type_display }}">{{ type_display |
```css
.pub-sidebar {
    background: #f8f9fa;
    border-radius: 8px;
    padding: 20px 0px;
    position: sticky;
    top: 0;
}

@media (min-width: 992px) {
    .pub-sidebar {
        top: 100px;
```
The sticky sidebar CSS uses top: 0 for small screens and top: 100px for large screens (lines 152-160). The 100px offset assumes a fixed header height, but if the header size changes or is responsive, this could cause layout issues. Consider using CSS variables or calculating the header height dynamically to make this more maintainable.
Suggested change:

```diff
- .pub-sidebar {
-     background: #f8f9fa;
-     border-radius: 8px;
-     padding: 20px 0px;
-     position: sticky;
-     top: 0;
- }
- @media (min-width: 992px) {
-     .pub-sidebar {
-         top: 100px;
+ :root {
+     /* Default sticky offset for publication sidebar (can be overridden globally) */
+     --pub-sidebar-offset: 0;
+ }
+ .pub-sidebar {
+     background: #f8f9fa;
+     border-radius: 8px;
+     padding: 20px 0px;
+     position: sticky;
+     top: var(--pub-sidebar-offset, 0);
+ }
+ @media (min-width: 992px) {
+     :root {
+         --pub-sidebar-offset: 100px;
```
```python
# Try to extract from citation info
citation = article.get('citation', '')
# Use consistent year regex pattern (1900-2029)
year_match = re.search(r'\b(19\d{2}|20[0-2]\d)\b', citation)
```
The regex pattern for year extraction is inconsistent across files. In sync_publications.py line 68, the pattern is `\b(19\d{2}|20[0-2]\d)\b` (1900-2029), while in clean_publications.py line 37, it's the same. However, this pattern will match years up to 2029, which may be too permissive. Consider using a more conservative pattern that only matches up to the current year plus one, to avoid false positives from text that happens to contain numbers in this range.
```python
SERPAPI_KEY = env.get('SERPAPI_KEY', os.getenv('SERPAPI_KEY', ''))
AUTHOR_ID = env.get('AUTHOR_ID', 'TakXk9MAAAAJ')  # Guillaume Dumas's Google Scholar ID
```
The SERPAPI_KEY is loaded from environment variables but there's no validation of its format or length. While there's a basic check for empty string on line 143, consider adding validation to ensure the API key meets minimum requirements (e.g., minimum length check) before making API requests, which could help catch configuration errors earlier.
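A sketch of an early sanity check; the 32-character minimum is an illustrative guess, not SerpAPI's documented key format:

```python
import sys

def validate_api_key(key: str, min_length: int = 32) -> None:
    """Fail fast on obviously malformed keys before spending an API call."""
    if not key:
        sys.exit("❌ SERPAPI_KEY is not set; add it to scripts/.env")
    if len(key) < min_length:
        sys.exit(f"❌ SERPAPI_KEY looks too short ({len(key)} chars); check the value")
```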
```python
# Filter out empty titles and publications with unreasonable years (0 or far future)
current_year = get_current_year()
fetched_pubs = [p for p in fetched_pubs if p['title'] and 0 < p.get('year', 0) <= current_year + 1]
```
The year filtering logic on line 193 uses current_year + 1 as the upper bound. This means publications dated one year in the future will be accepted. While there's a comment on lines 157-158 mentioning this is expected behavior, accepting future-dated publications could lead to incorrect data if the API returns publications with incorrect years. Consider whether this tolerance is necessary or if it should be limited to the current year only.
Pull request overview
Copilot reviewed 141 out of 143 changed files in this pull request and generated 9 comments.
scripts/sync_publications.py (Outdated)
```python
# Use current year as fallback instead of hardcoded value
fallback_year = get_current_year()
parsed_year = int(year) if year and str(year).isdigit() else fallback_year
```
The fallback year logic uses get_current_year() when year data is missing. This means publications with missing year information will always appear as if they were published in the current year, which could be misleading. Consider using a special sentinel value (like 0 or None) for unknown years, or at minimum add a flag to indicate the year is estimated.
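A minimal sketch of the sentinel approach, assuming downstream sorting treats None as oldest:

```python
from typing import Optional

def parse_year(year) -> Optional[int]:
    """Return the year as an int, or None when the source has no usable year."""
    return int(year) if year and str(year).isdigit() else None

# Newest-first sort that pushes unknown years to the end:
# pubs.sort(key=lambda p: p.get('year') or 0, reverse=True)
```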
scripts/migrate_to_yaml.py (Outdated)
```python
# Extract frontmatter between --- markers
match = re.match(r'^---\s*\n(.*?)\n---', content, re.DOTALL)
```
The frontmatter parsing uses a regex that may not handle edge cases well. For example, if the content contains --- within a code block in the frontmatter, the regex could match incorrectly. Consider using a YAML parser with proper frontmatter support or adding more robust boundary detection.
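A sketch of line-based boundary detection that only treats a line consisting of `---` as a delimiter (the python-frontmatter package is another option if a dependency is acceptable):

```python
import yaml

def parse_frontmatter(content: str) -> dict:
    """Parse YAML frontmatter delimited by lines containing exactly '---'."""
    lines = content.splitlines()
    if not lines or lines[0].strip() != '---':
        return {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == '---':  # closing delimiter must be a whole line
            return yaml.safe_load('\n'.join(lines[1:i])) or {}
    return {}  # unterminated frontmatter: treat as no metadata
```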
_includes/publications.html (Outdated)
```js
} else {
    // Fallback: legacy behavior based on specific filter values
    document.querySelectorAll('.sidebar-section').forEach(section => {
        const buttons = section.querySelectorAll('.filter-pill[data-filter]');
        if (buttons.length > 0) {
            const hasType = section.querySelector('[data-filter="Journal Paper"]');
            const hasCategory = section.querySelector('[data-filter="Computational"]');

            if ((filterGroup === 'type' && hasType) || (filterGroup === 'category' && hasCategory)) {
                buttons.forEach(btn => {
                    btn.classList.toggle('active', btn.dataset.filter === value);
                });
            }
        }
    });
```
The filter toggle logic has a fallback mechanism (lines 316-332) for when data-filter-group attributes are not present, but the HTML always includes these attributes. This fallback code is likely dead code and should be removed to improve maintainability.
Suggested change (remove the fallback):

```diff
- } else {
-     // Fallback: legacy behavior based on specific filter values
-     document.querySelectorAll('.sidebar-section').forEach(section => {
-         const buttons = section.querySelectorAll('.filter-pill[data-filter]');
-         if (buttons.length > 0) {
-             const hasType = section.querySelector('[data-filter="Journal Paper"]');
-             const hasCategory = section.querySelector('[data-filter="Computational"]');
-             if ((filterGroup === 'type' && hasType) || (filterGroup === 'category' && hasCategory)) {
-                 buttons.forEach(btn => {
-                     btn.classList.toggle('active', btn.dataset.filter === value);
-                 });
-             }
-         }
-     });
```
```js
const typeMatch = currentFilters.type === 'all' ||
    type === currentFilters.type;

// Check category filter (case sensitive, controlled values)
const catMatch = currentFilters.category === 'all' ||
    categories.includes(currentFilters.category);
```
The filtering logic performs case-sensitive exact matching for type and category filters (lines 374-379), which is correct for controlled values. However, the code should validate that the data attributes contain only expected values to prevent filter mismatches. Consider adding data validation or normalization when the page loads.
scripts/normalize_publications.py (Outdated)
```python
if re.match(r'^([A-Z]\.\s*)+[A-Za-z-]+$', author):
    normalized.append(author)
    continue
```
The regex pattern for author names `^([A-Z]\.\s*)+[A-Za-z-]+$` on line 34 doesn't handle compound last names with prefixes (like "van der Berg"). While compound prefixes are handled later in the code, names already in the format "J. van der Berg" won't be recognized by this regex and will be unnecessarily reprocessed. Consider updating the pattern to `^([A-Z]\.\s*)+([a-z]+\s+)*[A-Za-z-]+$`.
```python
for i in range(len(parts) - 1, 0, -1):
    if parts[i - 1].lower() in COMPOUND_PREFIXES:
        lastname_start = i - 1
    else:
        break
```
The compound last name detection logic (lines 44-48) iterates backwards but breaks on the first non-prefix word. This means it won't correctly handle names like "Jean de la Fontaine" where multiple compound prefixes appear consecutively. The current logic would only capture "de la" but miss earlier prefixes if they're separated by a non-prefix word.
scripts/clean_publications.py (Outdated)
```python
def remove_duplicates(pubs: list) -> list:
    """Remove duplicate publications based on title similarity."""
    seen_titles = set()  # Use set instead of dict for efficiency
    unique_pubs = []
    duplicates_removed = 0

    for pub in pubs:
        title = pub.get('title', '').lower().strip()
        # Normalize title for comparison
        normalized = re.sub(r'[^a-z0-9]', '', title)

        if normalized and normalized not in seen_titles:
            seen_titles.add(normalized)
            unique_pubs.append(pub)
        else:
            duplicates_removed += 1
            print(f"  🔄 Removed duplicate: {pub.get('title', '')[:60]}...")

    return unique_pubs, duplicates_removed
```
The duplicate removal function returns a tuple (unique_pubs, dup_count) but the function signature on line 48 doesn't indicate this. Consider adding type hints to make the return type explicit: `def remove_duplicates(pubs: list) -> tuple[list, int]:`
_includes/publications.html (Outdated)
| <div class="sidebar-section"> | ||
| <input type="search" class="form-control rounded" id="search-input" onkeyup="filterPublications()" | ||
| placeholder="Search..." autocapitalize="off" autocomplete="off" autocorrect="off" role="textbox" | ||
| spellcheck="false"> |
There was a problem hiding this comment.
Missing ARIA labels for the search input field. Add appropriate labels for screen reader accessibility. Example: aria-label="Search publications" on the input element.
Suggested change:

```diff
- spellcheck="false">
+ spellcheck="false" aria-label="Search publications">
```
```css
}

@media (min-width: 992px) {
    .pub-sidebar {
```
The sticky sidebar positioning uses top: 100px on large screens (line 158) but this value appears to be a magic number. This should either be documented or derived from the actual header height to ensure it works correctly with different header sizes.
Suggested change:

```diff
  .pub-sidebar {
+     /* Offset matches desktop header height to prevent overlap; update if header height changes */
```
…nd code quality

- Replace fallback year with None to avoid misleading publication dates
- Improve YAML frontmatter parsing with robust boundary detection
- Remove dead legacy code in publications filter
- Add aria-label for search input accessibility
- Add documentation for sticky sidebar top offset
- Fix author name regex to handle compound names (van der Berg, de la)
- Improve compound last name detection logic
- Add proper type hints for remove_duplicates function
- Add data validation note for publication filters
Pull request overview
Copilot reviewed 141 out of 143 changed files in this pull request and generated 10 comments.
```python
def save_publications(publications: list):
    """Save publications to YAML file."""
    # Sort by year (newest first)
    publications.sort(key=lambda x: x.get('year', 0) or 0, reverse=True)
```
The sorting key uses `x.get('year', 0) or 0`, which treats both missing years and years set to 0 the same way. However, line 193 filters out publications with year <= 0, so any publications with year=0 or None should have already been removed before this function is called. The `or 0` part is redundant defensive coding. Consider simplifying to just `x.get('year', 0)`, or `x['year']` if you're confident the key exists.
```python
    changes += 1

# Sort by year (newest first)
pubs.sort(key=lambda x: x.get('year', 0) or 0, reverse=True)
```
The sorting key uses `x.get('year', 0) or 0`, which is redundant: if the key doesn't exist, `get` returns 0, and `0 or 0` is still 0. Additionally, this pattern treats publications with missing years the same as publications from year 0. Consider a more explicit approach like `x.get('year') or 0` if None should be treated as 0, or raise an error if the year is expected to always be present.
```python
def remove_duplicates(pubs: list) -> tuple[list, int]:
    """Remove duplicate publications based on title similarity.

    Returns:
        tuple[list, int]: Tuple of (unique_publications, duplicates_removed_count)
```
The function return type annotation is `tuple[list, int]`, which uses the newer Python 3.9+ syntax for type hints. However, the scripts don't specify a minimum Python version in requirements.txt or in the shebang. For better compatibility with Python 3.7-3.8, consider using `Tuple[list, int]` from the typing module, or add a minimum Python version requirement (e.g., in a setup.py or README).
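For older interpreters, the annotation could come from the typing module instead; a sketch:

```python
from typing import List, Tuple

def remove_duplicates(pubs: List[dict]) -> Tuple[List[dict], int]:
    """Return (unique_publications, duplicates_removed_count)."""
    ...
```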
```python
    errors.append(md_file.name)

# Sort by year (newest first)
publications.sort(key=lambda x: x.get('year', 0) or 0, reverse=True)
```
The same redundant sorting pattern x.get('year', 0) or 0 appears here as well. This is the fourth occurrence across the scripts.
```js
    document.getElementById('noResult').style.display = visibleCount === 0 ? 'block' : 'none';
}

// Initialize search function for backwards compatibility
```
The backward-compatibility function `searchItem()` on lines 394-396 is defined, but it's unclear whether it's still called anywhere. If it exists for legacy code that's being removed, consider adding a deprecation comment, or remove it if it's no longer needed.
Suggested change:

```diff
  // Initialize search function for backwards compatibility
+ /**
+  * @deprecated Use filterPublications instead. Kept for backwards compatibility with legacy callers.
+  */
```
```python
# Try to extract from citation info
citation = article.get('citation', '')
# Use consistent year regex pattern (1900-2029)
year_match = re.search(r'\b(19\d{2}|20[0-2]\d)\b', citation)
```
The year range validation regex pattern `(19\d{2}|20[0-2]\d)` allows years up to 2029, but the validation on line 193 uses current_year + 1 as the upper bound. This creates an inconsistency where the regex could extract a year like 2029 from text, but it would then be filtered out by the validation check. Consider aligning these two approaches or documenting why they differ.
```python
lastname_start = len(parts) - 1
for i in range(len(parts) - 1, 0, -1):
    if parts[i - 1].lower() in COMPOUND_PREFIXES:
        lastname_start = i - 1
    elif i == len(parts) - 1:
        # If current part is not a prefix, stop looking
        break
    else:
        # If we found a prefix but then hit a non-prefix, keep the first prefix found
        continue
```
The compound last name detection logic has a flaw. In the loop on lines 45-53, when a prefix is found at position i-1, the code sets lastname_start = i - 1 but then continues the loop. This means if you have multiple consecutive prefixes like "de la Fontaine", the logic will correctly set lastname_start to the first prefix. However, the break condition on line 49-50 only breaks if the current part is not a prefix and we're at the last position. This could cause issues with names like "Jean de la Fontaine" where "de la" should both be part of the last name. Consider simplifying the logic to scan backwards until you find a non-prefix word.
```python
pubs, dup_count = remove_duplicates(pubs)

# Step 4: Sort by year (newest first)
pubs.sort(key=lambda x: x.get('year', 0) or 0, reverse=True)
```
The sorting key uses the same redundant pattern x.get('year', 0) or 0. This is the third occurrence of this pattern across the scripts. Consider creating a shared utility function to extract the year value consistently, which would also make it easier to handle edge cases uniformly.
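A sketch of the shared helper the comment suggests (the name and placement are assumptions):

```python
def year_sort_key(pub: dict) -> int:
    """Year key for newest-first sorting; missing or invalid years sort last."""
    year = pub.get('year')
    return year if isinstance(year, int) and year > 0 else 0

# Usage in each script:
# pubs.sort(key=year_sort_key, reverse=True)
```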
```css
@media (min-width: 992px) {
    .pub-sidebar {
        /* Offset matches desktop header height to prevent overlap; update if header height changes */
        top: 100px;
    }
```
The inline comment mentions that the top offset "matches desktop header height to prevent overlap" and suggests updating if header height changes. However, this creates a maintenance burden as the value is hardcoded. Consider using CSS custom properties or calculating the offset dynamically with JavaScript based on the actual header height, which would make the code more maintainable.
```js
// Check type filter (case sensitive, controlled values)
const typeMatch = currentFilters.type === 'all' ||
    type === currentFilters.type;

// Check category filter (case sensitive, controlled values)
// Note: Data should be validated on page load to ensure only expected values are present
const catMatch = currentFilters.category === 'all' ||
    categories.includes(currentFilters.category);
```
The filter comparison logic on lines 360-361 and 365-366 uses strict equality for type and category checks, which is correct. However, the comment on line 364 mentions "Data should be validated on page load to ensure only expected values are present" but there's no actual validation code implemented. Consider adding a validation function that runs on page load to check for unexpected values and log warnings or errors, which would help catch data quality issues early.