The Pageman in Kabul

From Predictions to Causal Insights: My Pilot Study on SplitUP for Unpaired Multi-Omics Data — Bug Fixes, Docker, and Why It Matters

pageman Jan 23, 2026


A quick story of rapid follow-up research, fixing a subtle bug, packaging everything reproducibly, and how this estimator could help turn machine learning predictions into trustworthy causal relationships in genomics and beyond.

Posted: January 24, 2026

On Jan 21, 2026, Schur et al. uploaded a beautiful theoretical paper to arXiv:

“Many Experiments, Few Repetitions, Unpaired Data, and Sparse Effects: Is Causal Inference Possible?”
arXiv:2601.15254 [stat.ML]
https://arxiv.org/abs/2601.15254

They introduced SplitUP — a cross-fold sample-splitting GMM estimator designed exactly for the kind of data we increasingly face in modern biology: unpaired measurements of exposures (X) and outcomes (Y) across many experimental conditions or environments (Z as high-dimensional instruments), with hidden confounding lurking everywhere.

The promise is huge: consistent causal effect estimates even when m (number of environments) → ∞ but r (replicates per environment) stays fixed and small. They also added an ℓ₁-regularized version for sparse effects — perfect for high-dimensional omics where only a few pathways or metabolites truly matter.

But the paper is asymptotic theory. What happens in finite samples — say m = 50–200, r = 5–20, d = 50–500, with weak instruments or sparsity — was left open.

Motivation for the Pilot: Finite-Sample Reality Check

In real multi-omics pipelines (especially unpaired settings like Perturb-seq, multi-site metabolomics/transcriptomics, or pan-cancer CRISPR screens), we rarely hit the asymptotic regime. We have moderate numbers of conditions/time points/cell lines, limited biological replicates, high sparsity, and plenty of confounding from batch effects or heterogeneity.

So I ran a quick Monte Carlo simulation study to ask:

Does SplitUP really outperform simpler two-sample IV (TS-IV) in these regimes?
How does splitting variance behave at moderate m?
Is the ℓ₁ extension practically useful for sparse signals?

The result: a 6-page pilot paper posted Jan 23, 2026:

“Finite-Sample Performance of SplitUP in Many-Environments Unpaired Instrumental Variables: A Simulation Study with Pilot Results”
https://www.researchgate.net/publication/400024656

Key early findings (limited reps 10–50/cell, m=100–200, d=50):

UP-GMM ℓ₁ often lower error at moderate scale (splitting variance stabilized estimates)
Analytic SplitUP showed preliminary bias reduction in projected high-m/low-r sparse/weak regimes
But broader validation needed — hence the reproducibility package promise.

The Bug Discovery & FigShare Correction (Transparency Matters)

Within 24 hours of posting, I found a subtle but critical bug in the analytic SplitUP implementation: the diagonal correction term in C_XX was missing the scaling factor m (number of environments). This underscaled the bias correction, making SplitUP appear worse than TS-IV in moderate-m regimes — exactly the opposite of theory.

The fix was simple but essential:

    # Before (buggy)
    C_XX = (n / (n - 1)) * BTB - correction / (n * (n - 1))
    # After (corrected)
    C_XX = (n / (n - 1)) * BTB - m * correction / (n * (n - 1))
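A small regression check can guard a fix like this. The sketch below wraps the two expressions in hypothetical helpers (`c_xx_buggy` and `c_xx_fixed` are illustrative names, not functions from the notebook) and treats `BTB` and `correction` as opaque inputs with made-up scalar values. It only verifies that the two versions differ by exactly (m - 1) * correction / (n * (n - 1)), i.e. that the factor m is really applied.

```python
import math


def c_xx_buggy(btb, correction, n, m):
    """Original expression: the correction term is missing the factor m."""
    return (n / (n - 1)) * btb - correction / (n * (n - 1))


def c_xx_fixed(btb, correction, n, m):
    """Corrected expression: the correction term is scaled by m."""
    return (n / (n - 1)) * btb - m * correction / (n * (n - 1))


# Hypothetical scalar inputs, chosen only to exercise the arithmetic.
n, m = 1000, 100
btb, correction = 2.0, 50.0

gap = c_xx_buggy(btb, correction, n, m) - c_xx_fixed(btb, correction, n, m)
expected_gap = (m - 1) * correction / (n * (n - 1))
print(math.isclose(gap, expected_gap, rel_tol=1e-9))
```

Because the gap grows linearly in m, the underscaling gets worse as the number of environments increases.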

I updated the FigShare reproducibility package (same DOI, new version):

https://figshare.com/articles/software/…/31135255

Replaced the buggy notebook with the corrected one
Added a prominent correction notice at the top of the description
Re-ran a few grids → results now align better with Schur et al.’s predictions (SplitUP competitive or better starting ~m=100–150)

Transparency first: better to fix and disclose within a day than let buggy code circulate.

Docker Reproducibility Package — One Command to Reproduce Everything

To make it dead simple for anyone to verify or extend the work, I packaged the corrected notebook + modular code into a Docker image:

Docker Hub: https://hub.docker.com/r/pageman/splitup-reproducibility

Pull & run:

    docker pull pageman/splitup-reproducibility:latest
    docker run --rm -v $(pwd)/results:/app/data pageman/splitup-reproducibility:latest

Outputs (HTML notebook, CSVs, plots) land in ./results/.
Everything pinned: Python 3.12, NumPy 1.26.4, SciPy 1.16.2, etc. No dependency hell.

Why This Matters: From Omics Predictions → Causal Relationships

Most omics ML today is predictive: train a classifier/regressor on unpaired multi-condition data → identify “important” modules (latent embeddings, metabolites, genes) associated with outcomes (neuroprotection, essentiality, toxicity).

But associations ≠ causation — hidden confounding is everywhere (batch effects, cell-type heterogeneity, technical noise).

SplitUP offers a way to go one step further:

Treat experimental conditions/time points/cell lines/mutations as many instruments Z
Use predictive modules as exposure X
Use phenotype/outcome as Y
Estimate causal β (X → Y) under hidden confounding, with consistency even in high-m/low-r regimes
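To make the setup concrete, here is a toy scalar simulation. It is not the SplitUP estimator from Schur et al. and not the code in my notebook. It is a minimal sketch under assumptions I chose for illustration: one exposure, zero-mean Gaussian environment shifts, a single hidden confounder, and hypothetical function names. It compares a plain two-sample IV on environment means with a variant that replaces the squared X-means by a cross-product of two independent half-sample means, which is the basic reason sample splitting can remove the small-r bias.

```python
import random


def simulate_unpaired(m, r, beta, rng):
    """Hypothetical DGP: m environments, r replicates, X and Y from separate units."""
    x_sample, y_sample = [], []
    for _ in range(m):
        mu = rng.gauss(0, 1)  # environment shift of X (instrument strength)
        xs, ys = [], []
        for _ in range(r):
            h = rng.gauss(0, 1)  # hidden confounder of an X-sample unit
            xs.append(mu + h + rng.gauss(0, 1))
        for _ in range(r):
            h = rng.gauss(0, 1)  # hidden confounder of a Y-sample unit
            x_unobserved = mu + h + rng.gauss(0, 1)
            ys.append(beta * x_unobserved + 2.0 * h + rng.gauss(0, 1))
        x_sample.append(xs)
        y_sample.append(ys)
    return x_sample, y_sample


def mean(values):
    return sum(values) / len(values)


def ts_iv(x_sample, y_sample):
    """Regress environment means of Y on environment means of X (no intercept)."""
    xbar = [mean(xs) for xs in x_sample]
    ybar = [mean(ys) for ys in y_sample]
    return sum(a * b for a, b in zip(xbar, ybar)) / sum(a * a for a in xbar)


def split_estimate(x_sample, y_sample):
    """Same numerator, but the denominator uses two independent half-means of X."""
    half = len(x_sample[0]) // 2
    x1 = [mean(xs[:half]) for xs in x_sample]
    x2 = [mean(xs[half:]) for xs in x_sample]
    xbar = [mean(xs) for xs in x_sample]
    ybar = [mean(ys) for ys in y_sample]
    return sum(a * b for a, b in zip(xbar, ybar)) / sum(a * b for a, b in zip(x1, x2))


rng = random.Random(0)
x_sample, y_sample = simulate_unpaired(m=5000, r=4, beta=1.0, rng=rng)
beta_ts = ts_iv(x_sample, y_sample)
beta_split = split_estimate(x_sample, y_sample)
print(f"TS-IV: {beta_ts:.3f}  split: {beta_split:.3f}  (true beta = 1.0)")
```

With only r = 4 replicates, the noise in each environment mean of X inflates the TS-IV denominator and pulls the estimate toward zero. The two half-means have independent noise, so their cross-product does not carry that extra term.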

In my pilot, the finite-sample checks show:

At moderate scale (typical in current omics), simpler TS-IV may be safer (lower variance)
In larger/sparser regimes (future DepMap-scale, Perturb-seq mega-datasets), SplitUP + ℓ₁ starts delivering real bias reduction for sparse causal pathways

That’s the bridge: turn “this metabolite is associated with neuroprotection” into “this metabolite causally contributes to neuroprotection (β = 0.42, p < 0.01)” — more actionable for downstream wet-lab validation or drug targeting.

Closing Thought

Rapid follow-up research + immediate bug fix + full Docker reproducibility — this is how open science should move in 2026.

The theory (Schur et al.) is elegant; the practice (finite-sample reality) needs pilots like this one.

Links:

Original theory: https://arxiv.org/abs/2601.15254
Pilot paper: https://www.researchgate.net/publication/400024656
FigShare (corrected + Docker info): https://figshare.com/articles/software/…/31135255
Docker image: https://hub.docker.com/r/pageman/splitup-reproducibility

Questions, extensions, or want to collaborate on omics applications? DM me (@pageman).

#causalinference #multiomics #reproducibility #splitup #docker

http://pageman.wordpress.com/?p=1505


The Amazing Journey of Portátil

pageman Jan 21, 2026


The Amazing Journey of Portátil – A Spanish Etymology Adventure

From ancient Proto-Indo-European roots to the laptop in your bag — discover the incredible 2,700-year story of how one Spanish word evolved through time, culture, and technology!

Beautiful illustration showing the journey of a Spanish word through time — Interactive Word Journey

The Word’s Journey in One Line

PIE *per-² → Latin *portāre* → *portābilis* → Medieval Latin *portātilis* → Old Fr. *portatile* → Castilian *portátil* (13th c.) → “laptop” (2000)

Travel Through Time

Ancient PIE Era

4000+ BCE

The Seed: *per-²

It all began with a tiny Proto-Indo-European root meaning “to lead, pass over.” Imagine our ancestors pointing the way forward!

Classical Latin

1st century BCE

Latin Takes Shape

*per-* blossomed into portāre — to carry, to convey! This magnificent verb traveled through Rome, inspiring words like porta (gate) and portus (harbor).

Imperial Latin

1st century CE

Enter *portābilis*

Engineers and scribes added the suffix -bilis to mean “that can be X-ed.” Thus portābilis — “carry-able” — was born! Imagine portable books carried by Roman scholars!

Medieval Era

6–9th century

The Romance Transformation

French speakers smoothed it to portatile, and Latin scholars remade it as portātilis. Monks cataloged portatile altare — altars they could carry!

Old Spanish

13th century

¡Hola, Castellano!

In Alfonso X’s legal codes (c. 1265), we find libros portátiles — books small enough to carry. The word had arrived in Spanish!

Modern Era

1950s–2000s

The Technology Leap

From portable radios in the 1950s to el portátil meaning “laptop” by 2000. One semantic shift in 2,700 years — from “carry-able” to “the carry-able computer!”

Timeline showing historical periods from ancient Rome through medieval ages to modern times

How “Portátil” Changed Over Time

Watch the meaning transform from ancient altars to modern laptops!

13–16th Century

carry-able, movable

altars, sundials, weapons, lamps

17th Century

musical instruments & mirrors

lutes, hand mirrors

18th Century

scientific machines

scientific devices

19th Century

everyday portable items

bombs, lanterns, desks

1950s

electronics age begins

portable radios, portable TVs

1987+

computer age revolution

laptops, portable computers

Elegant vintage books, ancient scrolls, and modern laptops showing evolution of knowledge

Meet the Relatives!

All these words share the same ancient root *per-* → *portāre*

transportar

Meaning: to transport

Origin: trans- + portare

importar

Meaning: to import

Origin: in- + portare

exportar

Meaning: to export

Origin: ex- + portare

reportar

Meaning: to report/carry back

Origin: re- + portare

soportar

Meaning: to support/bear

Origin: sub- + portare

oportuno

Meaning: opportune

Origin: ob- + portus (harbor, a relative of portāre) → “toward the harbor”

Magical word transformation from ancient Latin script to modern Spanish typography

Fun & Fascinating Facts

Romance Family

Spanish, Portuguese, Catalan, French, and Italian ALL developed similar forms from the same Latin root!

Gender Bender

As an adjective, it’s epicene (works with any gender). As a noun, it’s overwhelmingly masculine: el portátil!

One Big Change

In 2,700 years, the word made only ONE real semantic jump — from “portable” to “laptop” by ellipsis!

97% Recognition

Surveys show 97% of native Spanish speakers recognize portátil as “laptop” without any context needed!

The Word’s Secret Superpower

The word “portátil” has undergone only ONE real semantic mutation in 2,700 years: from the generic idea “carry-able” to, by ellipsis, “the carry-able computer.” Every other change — vowel stem, stress, suffix — was a mechanical adjustment inside the phonology of successive languages.

The reception history therefore mirrors technological history: when an object becomes small enough to be carried, the ancient word for “portable” naturally snaps onto it, no new lexeme required.

How amazing is that?!

An interactive journey through language, time, and technology
Built with love for word lovers

http://pageman.wordpress.com/?p=1501


Dynasties, Poverty, and Power Concentration in the Philippines (2004–2016): A Nuanced Assessment of Economic Inequality

pageman Nov 24, 2025


Figure 1: Statistical analysis of the relationship between political dynasty share and poverty incidence in Philippine provinces (2004-2010). The image includes four panels: (A) a scatter plot showing the weak negative correlation between dynasty share and poverty incidence with regression line (r=-0.119); (B) temporal trends of both variables across election years; (C) year-specific correlation coefficients; and (D) a summary box with key statistical findings including Pearson and Spearman correlations, regression results (R²=0.0141), and research conclusions.

1. Executive summary

This report synthesizes the Kosmos Data Analysis on political dynasties and poverty in Philippine provinces (2004–2016) and your detailed notes. It focuses on:

How dynasty share (DYNSHARE) relates to:
- Poverty incidence (FAMPOVINC – % of families that are poor), and
- Poverty magnitude (FAMPOVMAG – absolute number of poor families).
How these relationships change across time and when dynasty concentration and dominance are considered.
How the 20 additional analytical questions extend the research agenda.

Core conclusions

Dynasty share vs poverty incidence (main 2004–2010 analysis)
- Overall correlation is weak, negative, and only marginally significant:
  - Pearson r = –0.1187, p = 0.068 (not significant at α = 0.05).
  - Spearman ρ = –0.1572, p = 0.015 (significant at α = 0.05).
- A simple linear regression suggests that higher dynasty share is associated with slightly lower poverty incidence, but:
  - The slope is small: –13.52 percentage points per unit of DYNSHARE.
    → Roughly –1.35 percentage points in poverty incidence for every +10 percentage points in DYNSHARE.
  - The model explains only ~1.4% of variance (R² = 0.0141) – statistically and substantively weak.
Dynasty share vs poverty magnitude
- For FAMPOVMAG, DYNSHARE is positively and significantly correlated:
  - Pearson r = +0.1870, p = 0.0039.
- Provinces with higher dynasty share tend to have more poor families in absolute terms, even when their poverty rates may be slightly lower or similar.
Concentration and dominance matter more than raw share
- In a multivariate model including:
  - DYNSHARE (overall dynastic prevalence),
  - DYNLAR (largest family’s share of seats), and
  - DYNHERF (Herfindahl index of dynasty concentration), plus year controls:
    - DYNSHARE coefficient ≈ 0.78, p = 0.942 (no predictive power).
    - DYNLAR coefficient ≈ +517.26, p = 0.001 (highly significant).
    - DYNHERF coefficient ≈ –2,737.55, p = 0.005 (highly significant).
- Interpretation: where dynasties are more concentrated or a single family is dominant, poverty is more strongly related to dynasty structure than to the mere presence of dynasties. Overall dynastic prevalence (DYNSHARE) on its own becomes uninformative once these deeper structural metrics are considered.
Temporal instability
- Year-by-year correlations between DYNSHARE and poverty incidence range from weakly negative to slightly positive and are never statistically significant.
- The negative association in 2004 and 2007 (r ≈ –0.18) disappears by 2010, and in the extended 2004–2016 dataset the overall Pearson r weakens to –0.0737 (p = 0.144).
Inequality and the “two faces” of dynasties
- Poverty incidence: weak evidence that dynastic areas may have slightly lower rates of poverty, but effects are tiny, unstable, and not robust.
- Poverty magnitude: strong evidence that dynastic areas contain more poor people in absolute numbers, suggesting dynasties do not prevent large pockets of deprivation.
Big-picture answer to the research question
- Dynastic control does not convincingly mitigate poverty, and there is no strong evidence that it directly increases poverty rates either.
- The relationship is complex, small in magnitude, and likely mediated by many other factors (urbanization, regional economies, national programs, etc.).
- However, dynasties are associated with larger absolute poor populations and with structural features (concentration, dominance) that correlate more strongly with poverty, raising concerns about inequality and democratic accountability.

2. Data and variables

2.1 Unit of analysis and time frame

Unit: 79 Philippine provinces.
Time points:
- Elections: 2004, 2007, 2010, 2013, 2016.
- Poverty data: 2006, 2009, 2012.
Primary “properly matched” panel: 2004–2010 elections matched to subsequent poverty years:
- 2004 → 2006 poverty
- 2007 → 2009 poverty
- 2010 → 2012 poverty
  → This yields 79 × 3 = 237 observations, which form the core dataset for the main regression and correlation analyses.
Extended panel: Adds 2013 and 2016 elections using 2012 poverty data (concurrent/retrospective match), increasing to 395 observations but with less precise temporal matching.
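The two matching schemes can be written down as simple lookup tables. This is a minimal sketch; the dictionary and function names are illustrative and not taken from the underlying analysis.

```python
PROVINCES = 79  # number of provinces stated in the text

# Primary panel: each election is matched to the next available poverty survey.
LAGGED_MATCH = {2004: 2006, 2007: 2009, 2010: 2012}

# Extended panel: later elections reuse the 2012 survey (weaker, retrospective match).
EXTENDED_MATCH = {**LAGGED_MATCH, 2013: 2012, 2016: 2012}


def panel_size(match, provinces=PROVINCES):
    """Number of province-election observations implied by a matching scheme."""
    return provinces * len(match)


def is_lagged(election_year, poverty_year):
    """True when poverty is measured after the election it is matched to."""
    return poverty_year > election_year


print(panel_size(LAGGED_MATCH), panel_size(EXTENDED_MATCH))
```

The `is_lagged` check makes the weakness of the extended panel explicit: for 2013 and 2016 the poverty measurement precedes the election.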

2.2 Key variables

DYNSHARE: Share of elective positions in a province held by members of political dynasties. Values range roughly from <0.1 to ~0.7 in the dataset.
FAMPOVINC: Family poverty incidence – percentage of families considered poor in the province.
FAMPOVMAG: Family poverty magnitude – number of poor families (an absolute count).
DYNLAR: Largest dynasty’s vote or seat share (measure of dominance by a single family).
DYNHERF: Herfindahl index of dynasty concentration across families. Higher values imply a more concentrated dynasty structure (few families dominate); lower values mean dynastic power is spread across many families.
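To see how the three dynasty measures can diverge, consider the sketch below. It assumes all three are computed from each family's seats as a share of total seats in the province; the dataset may define or normalize them differently (for example, a Herfindahl index over dynastic seats only), and the seat counts are made up.

```python
def dynasty_metrics(seats_by_family, total_seats):
    """Compute illustrative DYNSHARE, DYNLAR, and DYNHERF from seats per dynastic family."""
    shares = [seats / total_seats for seats in seats_by_family.values()]
    return {
        "DYNSHARE": sum(shares),                 # overall dynastic prevalence
        "DYNLAR": max(shares, default=0.0),      # largest family's share
        "DYNHERF": sum(s * s for s in shares),   # sum of squared shares
    }


# Two hypothetical provinces with the same overall dynasty share (10 of 20 seats).
spread = dynasty_metrics({"A": 2, "B": 2, "C": 2, "D": 2, "E": 2}, total_seats=20)
dominated = dynasty_metrics({"A": 8, "B": 2}, total_seats=20)
print(spread)
print(dominated)
```

Both provinces have the same DYNSHARE, but the second has a much higher DYNLAR and DYNHERF, which is why raw share alone can hide differences in how power is distributed.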

2.3 Discretionary analytical decisions

The analysis made several explicit choices that shape the conclusions:

Time matching strategy
- Primary strategy: lagged match (election → subsequent poverty data), under the assumption that poverty reflects conditions that emerge after the election cycle.
- Extended strategy: concurrent/retrospective match for 2013 and 2016 with 2012 poverty, explicitly flagged as weaker.
Statistical test selection
- Used both Pearson and Spearman correlations:
  - Pearson for linear relationships with assumptions of normality.
  - Spearman for rank-based, monotonic relationships robust to outliers and non-linearity.
- Spearman deemed more appropriate given the scatter plots and potential non-linearities.
Significance thresholds
- Conventional α = 0.05 used throughout.
Treatment of 2013 and 2016
- Included only in secondary / robustness analyses with clear caveats.
Handling missing data
- Listwise deletion for two observations missing DYNHERF values; no imputation.
Model specification
- Multivariate model includes DYNSHARE, DYNLAR, DYNHERF, and year dummies, but excludes other socio-economic controls (e.g., HDI) due to missingness.
Effect size interpretation
- Standardized coefficients interpreted using Cohen’s benchmarks (|β| < 0.3 = small).
Visualization choices
- Four-panel (actually three-plot + summary-box) figure:
  - Panel A: scatter of DYNSHARE vs FAMPOVINC with regression line.
  - Panel B: temporal trends in mean DYNSHARE and mean FAMPOVINC.
  - Panel C: year-specific correlations.
  - Panel D: textual key findings summary.

These decisions are transparent and reasonable but should be kept in mind when interpreting the results, especially the limited socio-economic controls and the lag/concurrent matching choices.

3. Descriptive trends: dynasties and poverty over time

3.1 Mean dynasty share and poverty incidence

From the year-by-year table:

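Election year | Poverty year | N | Mean DYNSHARE | Mean FAMPOVINC
--- | --- | --- | --- | ---
2004 | 2006 | 79 | 0.3032 | 28.52%
2007 | 2009 | 79 | 0.3657 | 27.72%
2010 | 2012 | 79 | 0.4507 | 26.96%
2013 | 2012* | 79 | 0.4562 | 26.96%
2016 | 2012* | 79 | 0.5074 | 26.96%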

Key descriptive patterns:

Dynasty share rises steadily over time.
- From ~30% in 2004 to over 50% by 2016.
- This suggests continuous entrenchment and expansion of dynastic control.
Average poverty incidence declines modestly (2006–2012).
- From ~28.5% in 2006 to ~27.0% in 2012.
- This broad improvement is likely driven by macroeconomic trends and national policies, not necessarily dynasties.
The visual trend lines (Panel B) display:
- An upward-sloping line for DYNSHARE.
- A downward-sloping line for FAMPOVINC.
- Superficially, one might read this as “more dynasties, less poverty,” but this is a time-series coincidence, not a causal inference. Both could be trending due to independent factors.

4. Bivariate correlation: DYNSHARE and poverty incidence (2004–2010)

4.1 Correlation coefficients

For the properly matched 2004–2010 dataset:

Pearson correlation, r = –0.1187, p = 0.0680
- Weak negative association.
- Fails to reach conventional significance (α = 0.05).
Spearman rank correlation, ρ = –0.1572, p = 0.0154
- Slightly stronger negative association in ranks.
- Statistically significant at α = 0.05.
- Interpreted as marginal evidence of a monotonic but weak inverse relationship, robust to outliers.

The difference between Pearson and Spearman indicates that:

The relationship is not cleanly linear.
A few influential points or non-linearities matter.
Using a rank-based measure (Spearman) is prudent, but it still yields a small effect.
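The gap between the two coefficients is easy to reproduce on toy data. The sketch below uses hand-written helpers and made-up numbers (not the provincial data): a relationship that is perfectly monotonic but not linear gives a Spearman coefficient of 1 while Pearson stays below 1.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]  # monotonic but strongly non-linear
print(round(pearson(x, y), 3), round(spearman(x, y), 3))
```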

4.2 Simple linear regression

Model:
\[
\text{FAMPOVINC} = 32.78 - 13.52 \times \text{DYNSHARE}
\]

Coefficients and statistics

Intercept (β₀) ≈ 32.78
- Implied poverty incidence when DYNSHARE = 0 (hypothetical province without dynasties).
Slope (β₁) = –13.52 percentage points
- Standard error (SE) = 7.37
- 95% CI: [–28.04, +1.01]
- t = –1.833, p = 0.068
Model fit:
- R² = 0.0141 → 1.41% of variance explained.
- Adjusted R² = 0.0099.
- RMSE ≈ 12.9 percentage points, indicating substantial residual spread around the regression line.
Standardized effect size:
- Standardized β = –0.1187 (same as Pearson r).
- Cohen’s interpretation: small effect (far below 0.3).

Substantive interpretation

Translating β₁ into more intuitive units:
- A 10 percentage-point increase in DYNSHARE (e.g., from 30% to 40%) is associated with a 1.35 percentage-point decrease in poverty incidence.
Given that provincial poverty rates are in the 20–60% range, a 1.35-point change is small relative to typical inter-provincial differences.
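The unit conversion is just the fitted line evaluated at two points. A minimal sketch using the reported intercept and slope (the function name is illustrative):

```python
INTERCEPT = 32.78   # reported intercept, in percentage points
SLOPE = -13.52      # reported slope per unit of DYNSHARE (0 to 1 scale)


def predicted_poverty(dynshare):
    """Fitted poverty incidence (%) for a dynasty share given as a fraction."""
    return INTERCEPT + SLOPE * dynshare


# Moving from a 30% to a 40% dynasty share.
change = predicted_poverty(0.40) - predicted_poverty(0.30)
print(f"{change:.2f} percentage points")
```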

Statistical and practical significance

Statistically:
- The slope is not significant at α = 0.05 in a standard t-test.
- Close to the threshold (p ≈ 0.068) but still conventionally “not significant.”
Practically:
- Even if we took the point estimate at face value, the magnitude is modest and dwarfed by unexplained variation and likely confounders (economic structure, geography, conflict, etc.).
Robustness:
- Regression assumptions largely hold:
  - Residuals approximately normal (Shapiro-Wilk W = 0.9907, p = 0.133).
  - Homoscedasticity appears acceptable from residual plots.
  - Linearity is only weakly supported: the scatter shows a weak but roughly linear pattern.

4.3 Scatter plot insights (Panel A)

The scatterplot of DYNSHARE vs FAMPOVINC for 2004, 2007, 2010:

Shows wide dispersion of poverty rates within any narrow range of DYNSHARE.
Data points from different years overlap substantially, indicating that cross-sectional variation dominates over time-trend differences.
The regression line has a slight downward slope, visually confirming the small negative coefficient.
No clear threshold or tipping point is visible; the relationship is smooth and weak, not sharply segmented.

5. Temporal instability: year-by-year correlations

Year-specific Pearson correlations between DYNSHARE and FAMPOVINC:

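Election | Poverty year | r | p-value | Significance
--- | --- | --- | --- | ---
2004 | 2006 | –0.1848 | 0.1030 | Not sig.
2007 | 2009 | –0.1828 | 0.1069 | Not sig.
2010 | 2012 | +0.0305 | 0.7898 | Not sig.
2013* | 2012 | +0.0446 | 0.6965 | Not sig.
2016* | 2012 | –0.0140 | 0.9028 | Not sig.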

(*2013 & 2016 use 2012 poverty data.)

Key observations:

No single year shows a significant correlation, even at relaxed thresholds.
Early years (2004, 2007) show weak negative correlations (around –0.18), consistent with the overall negative sign but still not significant.
By 2010, the relationship nearly disappears (r ≈ +0.03).
Including the later years with temporal mismatch yields an overall Pearson r for 2004–2016 of –0.0737 (p = 0.144) – even weaker and clearly insignificant.
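Sample size explains part of this pattern. With only 79 provinces per year, a correlation has to be fairly large before it clears α = 0.05. The sketch below estimates the smallest |r| that would be significant for a given n, using a normal approximation to the t critical value (so the thresholds are slightly optimistic); the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def critical_r(n, alpha=0.05):
    """Approximate smallest |r| that is two-sided significant with n observations."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z / sqrt(z * z + n - 2)


per_year = critical_r(79)    # one election year
pooled = critical_r(237)     # pooled 2004-2010 panel
print(f"n=79: |r| > {per_year:.3f}   n=237: |r| > {pooled:.3f}")
```

Under this approximation, the yearly values of about –0.18 fall short of the n = 79 threshold, and the pooled r = –0.1187 sits just below the n = 237 threshold, consistent with the reported p-values.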

Panel C in the visualization likely depicts this as grey bars around zero, underscoring the message:

The dynasty–poverty association is not stable across time.

Temporal instability can arise from:

Changes in national programs (e.g., conditional cash transfers, infrastructure investments).
Measurement noise in poverty statistics.
Changes in how dynasties operate (e.g., switching from overt patronage to broader economic alliances).
Structural differences that vary by cohort of provincial elites.

The key result: any correlation we see in pooled data may be fragile and largely driven by specific periods or chance fluctuations.

6. Alternative outcome: poverty magnitude (FAMPOVMAG)

When focusing on absolute number of poor families instead of percentages:

Pearson r(DYNSHARE, FAMPOVMAG) = +0.1870, p = 0.0039.
This is a positive, statistically significant correlation with a small effect size by Cohen’s benchmarks.

Interpretation:

Provinces with larger dynasty shares tend to have more poor families in absolute terms, even if their rates of poverty are not much higher (or are slightly lower).
This likely reflects a mixture of:
- Population size effects: more populous (often more economically central) provinces can sustain a larger political class and hence more dynasties – and also naturally have more poor people even at the same incidence.
- Inequality within provinces: a province may exhibit relatively moderate poverty rates but still have very large pockets of deprivation in densely populated towns or urban peripheries.
- Urbanization and political stakes: dynastic competition is often fiercest where the political stakes (revenues, contracts, land values) are high; these areas can host both wealth and deep pockets of poverty.
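The population-size point is simple arithmetic. In the sketch below, both provinces and all numbers are hypothetical and serve only to show how a lower poverty rate can coexist with a much larger number of poor families.

```python
def poor_families(total_families, incidence_pct):
    """Poverty magnitude implied by a family count and a poverty incidence in percent."""
    return total_families * incidence_pct / 100


# Hypothetical provinces: a small one with a higher rate, a large one with a lower rate.
small_province = poor_families(50_000, 30.0)
large_province = poor_families(400_000, 25.0)
print(small_province, large_province)
```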

Substantively, this result is highly important:

It suggests that dynastic dominance coexists with large poor populations, challenging any quick narrative that dynasties “solve” poverty just because rates may look relatively contained.
From a policy perspective, poverty magnitude is what stresses social services and public budgets (number of families to support, not just percentage).

Together with the weak negative or null correlation with poverty incidence, we can say:

Dynasties do not prevent large-scale poverty in absolute terms and may be more common where a large poor population exists, even if the relative poverty rate is not extreme.

7. Multivariate regression: concentration and dominance

The multivariate regression controlling for different dimensions of dynastic structure and time found:

DYNSHARE coefficient ≈ 0.78, p = 0.942
- Once we account for DYNLAR and DYNHERF, DYNSHARE becomes irrelevant statistically.
- This suggests that the overall extent of dynastic presence is largely absorbed by more structurally informative metrics.
DYNLAR coefficient ≈ +517.26, p = 0.001
- A strong and positive association between the dominance of the largest dynasty and poverty incidence.
- Interpretation (qualitative due to scaling):
  - Provinces where a single family controls a larger share of seats tend to have higher poverty incidence, all else equal.
  - Consistent with concerns that monopolistic dynasties may stifle competition, accountability, and inclusive development.
DYNHERF coefficient ≈ –2,737.55, p = 0.005
- Negative and significant coefficient for dynasty concentration.
- On the face of it, this suggests that – holding DYNSHARE and DYNLAR constant – greater consolidation across dynastic families might be associated with lower poverty.
- However, interpretation here is tricky:
  - DYNHERF is highly collinear with both DYNSHARE and DYNLAR by construction (they are all derived from the same configuration of family shares).
  - Its effect may be sensitive to scaling and interactions – e.g., concentration could be capturing provinces where a few relatively capable families dominate in otherwise highly dynastic environments.
  - Without additional socio-economic controls, we cannot treat this as evidence that concentration per se reduces poverty.
Overall model fit:
- R² = 0.058, F-statistic = 2.83, p = 0.017.
- The model explains only ~5.8% of variance – statistically detectable but substantively small.
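The F-statistic can be roughly recovered from R². The sketch below assumes five predictors (the three dynasty measures plus two year dummies) and 235 observations (237 minus the two listwise deletions); both counts are my inference from the text, not reported values.

```python
def f_statistic(r_squared, n_obs, n_predictors):
    """Overall F-statistic of an OLS model with an intercept, computed from R^2."""
    df_model = n_predictors
    df_resid = n_obs - n_predictors - 1
    return (r_squared / df_model) / ((1 - r_squared) / df_resid)


f_value = f_statistic(0.058, n_obs=235, n_predictors=5)
print(f"F = {f_value:.2f}")
```

The result is close to the reported 2.83; the small difference is what one would expect from R² being rounded to three decimals.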

Takeaways:

Dynasty structure matters more than raw presence.
- Whether there are dynasties is less informative than how power is distributed among them.
Dominance by a single family (DYNLAR) is clearly associated with higher poverty.
- This aligns with normative concerns about monopolistic elites.
Concentration (DYNHERF) has a complex relationship with poverty and needs more careful modeling and interpretation (possibly including non-linear or interaction terms, or separate models for different regions).

8. Statistical robustness and assumptions

Checks reported:

Normality of residuals:
- Shapiro–Wilk W = 0.9907, p = 0.133 → do not reject normality.
Linearity:
- Scatter and residual plots suggest a weak but roughly linear pattern between DYNSHARE and FAMPOVINC.
Homoscedasticity:
- Residuals show no strong funnel shape; variance appears fairly constant across fitted values.

Given these diagnostics:

The main regression model is technically sound from a classical OLS standpoint.
The weak results are therefore not due to gross violations of assumptions but to genuinely small underlying effects and substantial unexplained variation.

9. Answering the core research question

Does dynastic control exacerbate or mitigate economic inequality?

Based on all the evidence:

Weak mitigation effect on poverty incidence (but not robust)
- Pooled 2004–2010 data show a small negative association between dynasty share and poverty incidence (r ≈ –0.12 to –0.16).
- Statistically:
  - Spearman correlation is significant; Pearson and the regression slope are only marginally so.
- Substantively:
  - The effect is tiny (around a 1–1.5 percentage point change in poverty incidence for a 10-point change in dynasty share).
  - The model explains only ~1–1.5% of variance.
- This is not strong evidence that dynasties systematically reduce poverty; at best, they might coincide with slightly lower poverty in certain time periods.
Exacerbation in absolute terms
- Dynastic provinces have significantly higher poverty magnitude (more poor families in absolute numbers).
- This is statistically significant (p ≈ 0.004) and substantively meaningful, particularly for resource planning and service delivery.
Concentration and dominance are the bigger concern
- Once we consider how dynastic power is distributed, overall dynasty share loses predictive power.
- Dominance by a single family (DYNLAR) is strongly associated with higher poverty incidence.
- This aligns with theories that uncompetitive elite structures can entrench exclusion, weaken checks and balances, and channel resources narrowly.
Temporal and contextual dependence
- Year-by-year correlations fluctuate and are never statistically significant.
- This suggests that the dynasty–poverty nexus depends heavily on specific contexts:
  - national and regional economic cycles,
  - policy interventions,
  - local governance quality,
  - conflict and security dynamics.
Overall judgment
- The data do not support a simple headline claim like “dynasties cause poverty” or “dynasties reduce poverty.”
- Instead, they portray a weak, inconsistent relationship for poverty incidence, a stronger association for poverty magnitude, and clearer links when looking at concentration and dominance measures.
- Dynasties appear compatible with both moderate improvements in average poverty rates and persisting or growing absolute numbers of poor, hinting at issues of inequality, distribution, and who actually benefits from development.

10. Policy implications for the Philippines

Even with all caveats, several policy-relevant insights emerge:

Regulating concentration, not just counting dynasts
- Potential anti-dynasty legislation may be more effective if it targets monopolistic control (e.g., caps on seats for any single family within a jurisdiction) rather than simply limiting the overall number of dynastic politicians.
Focus on inequality and magnitude of poverty
- Development programs should track both poverty incidence and poverty magnitude, especially in dynastic strongholds.
- High-DYNSHARE provinces with large poor populations may require targeted social protection, pro-poor infrastructure, and inclusive governance reforms.
Strengthening local accountability
- Where one family dominates, strengthening:
  - local civil society,
  - independent media,
  - citizen oversight bodies, and
  - competitive party systems
    becomes crucial to offset the risks of elite capture.
Improved data and monitoring
- Systematic collection of:
  - more frequent poverty measures (beyond 2012),
  - local governance quality indicators,
  - public expenditure and project data by province,
    would enable richer causal analysis (e.g., panel models, difference-in-differences, or instrumental-variable designs).
Communication with citizens
- The nuanced finding that dynasties coexist with large poor populations can help citizens:
  - Look beyond simple claims that “experienced dynasts bring development,” and
  - Demand evidence of broad-based, inclusive progress, not just visible projects.

11. Extending the research: interpreting the 20 analytical questions

Your list of 20 advanced analytical questions, each scored on usefulness and policy impact, provides a roadmap for future work. Here’s how they cluster and what they contribute, building on the present findings.

11.1 Deepening the dynasty–poverty–development link (Questions 1–3, 10–12)

Q1 (overall correlation and regression between DYNSHARE and FAMPOVINC, 2004–2016)
- This is essentially the core analysis already carried out, with a very high rubric score (98/100).
- It provides the baseline picture against which all other questions are compared.
Q2 (HDI vs DYNHERF, 1997–2009)
- Extends the focus from poverty alone to broader human development (education, health, income).
- By anchoring on DYNHERF, it speaks directly to concentration effects, where our multivariate results already show importance.
- This can answer: Do concentrated dynasties coincide with lower HDI, even when poverty rates look acceptable?
Q3 & Q11 (fat dynasties and changes in FAMPOVMAG)
- “Fat dynasties” represent families with multiple simultaneous officeholders.
- Linking their growth to changes in poverty magnitude sharpens the findings that dynasties are associated with large poor populations and can identify provincial hotspots of entrenched family power plus growing deprivation.
Q10 (difference tests between dynastic and non-dynastic provinces)
- Uses t-tests / ANOVA to complement correlations, explicitly contrasting groups of provinces with low vs high dynastic presence.
- This can deliver more straightforward statements for policy debates, such as “there is / is not a statistically significant difference in average poverty incidence between dynastic and non-dynastic provinces.”
Q12 (predictive models for fat dynasty emergence)
- Moves toward early-warning systems: given current poverty and dynasty metrics, what provinces are at risk of evolving into fat dynasty strongholds?
- This is valuable for proactive interventions and civic education campaigns.
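
For Q10, one way to contrast two groups of provinces is Welch's unequal-variance t-test. The sketch below is illustrative only: the split into "high" and "low" dynasty groups, the values, and the helper name `welch_t` are assumptions and do not come from the Kosmos analysis or the dataset.

```python
import math
from statistics import mean, variance

def welch_t(group_a, group_b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    # Squared standard errors of each group mean (sample variance / n).
    var_a = variance(group_a) / len(group_a)
    var_b = variance(group_b) / len(group_b)
    t_stat = (mean(group_a) - mean(group_b)) / math.sqrt(var_a + var_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (var_a + var_b) ** 2 / (
        var_a ** 2 / (len(group_a) - 1) + var_b ** 2 / (len(group_b) - 1)
    )
    return t_stat, df

# Synthetic poverty incidence values (%), NOT taken from the dataset.
high_dynasty = [31.0, 27.5, 35.2, 29.8, 33.1]
low_dynasty = [25.4, 30.1, 22.7, 28.9, 26.3]

t_stat, df = welch_t(high_dynasty, low_dynasty)
print(f"t = {t_stat:.3f}, df = {df:.1f}")
```

The t statistic would then be compared against a Student t distribution with `df` degrees of freedom to obtain a p-value.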

11.2 Political structure and geography (Questions 4, 7, 9, 14, 17)

These questions probe where and in what forms dynasties are entrenched:

Q4 (temporal trends in DYNLAR and forecasting dominance)
- Directly targets the dominance variable that strongly predicts poverty in the multivariate model.
- Time-series models can identify provinces on track to become single-family fiefdoms, even if current poverty levels are moderate.
Q7 (position-specific dynasty prevalence vs poverty thresholds)
- Distinguishes dynastic patterns by office type (Governor, Mayor, Councilor, etc.).
- This can reveal whether control over core executive positions is more problematic than dynastic clustering in less powerful posts.
Q9 (regional clustering of DYNSHARE with k-means and socio-economic covariates)
- Identifies regions with similar dynasty–poverty profiles, which can be targeted with region-specific reforms rather than one-size-fits-all policies.
Q14 (urban vs rural comparison)
- Clarifies whether dynasties behave differently in urbanized vs rural provinces, which may have contrasting economic structures and political markets.
Q17 (spatio-temporal heatmaps of DYNSHARE)
- Enhances public and media understanding, making trends in dynastic expansion or contraction visible at a glance.

11.3 Networks, parties, and social structure (Questions 5, 6, 8, 13, 16)

These questions open up the micro-structure of dynastic power:

Q5 (party–dynasty linkages and HDI)
- Explores whether certain political parties systematically harbor more dynastic candidates, and whether this is associated with development outcomes.
- Useful for party reform and candidate vetting.
Q6 (network structure of dynastic families in high-poverty provinces)
- Uses graph theory to visualize interlocking families, alliances, and positions.
- Particularly powerful for journalistic storytelling and civic education, highlighting how a few families may jointly control multiple offices.
Q8 (gender disparities among dynastic politicians and relation to HDI)
- Adds a gender lens: Are dynastic seats predominantly male? When women are elected from dynastic families, does that associate with different development outcomes?
Q13 (DYNHERF outliers and provincial characteristics)
- Focuses on exceptional cases – provinces with extreme concentration but relatively good or bad development profiles.
- Ideal for case studies that can inform theory and policy (what works, what fails).
Q16 (ethical implications with conflict/terrorism data)
- Extends the framework beyond economic outcomes into security and conflict, addressing arguments that dynasties might either mitigate or inflame local conflicts.

11.4 Methodological and meta-analytic tools (Questions 15, 18–20)

These questions strengthen the analytic toolkit:

Q15 (PCA for dimension reduction)
- Helps identify latent factors (e.g., “elite concentration,” “socio-economic disadvantage”) from many correlated variables, simplifying modelling and communication.
Q18 (bibliometric analysis of references)
- Maps the existing research landscape on dynasties and poverty, clarifying where this Philippine analysis contributes and where gaps remain.
Q19 (sentiment analysis of dynasty-related abstracts vs stats)
- Compares qualitative academic narratives with the quantitative data, examining whether scholarly discourse is overly optimistic or pessimistic relative to actual correlations.
Q20 (Monte Carlo simulations for robustness)
- Tests how sensitive the observed correlations are to measurement errors, missing data, and modelling choices, reinforcing or tempering any strong claims.
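
A minimal sketch of the Q20 idea, assuming measurement error can be modelled as Gaussian noise added to the poverty variable. The data, the noise level, and the function names are hypothetical; the point is only to show how the spread of simulated correlations indicates sensitivity.

```python
import random
from statistics import mean

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def monte_carlo_correlations(x, y, noise_sd, n_sims=1000, seed=42):
    """Recompute the correlation after adding Gaussian noise to y, n_sims times."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    results = []
    for _ in range(n_sims):
        noisy_y = [value + rng.gauss(0.0, noise_sd) for value in y]
        results.append(pearson(x, noisy_y))
    return results

# Synthetic values, NOT taken from the dataset.
dynshare = [0.12, 0.25, 0.31, 0.38, 0.44, 0.52, 0.60, 0.71]
poverty = [34.0, 22.5, 30.1, 27.8, 19.6, 29.3, 24.0, 26.2]

sims = monte_carlo_correlations(dynshare, poverty, noise_sd=2.0)
share_negative = sum(r < 0 for r in sims) / len(sims)
print(f"min r = {min(sims):.3f}, max r = {max(sims):.3f}")
print(f"share of simulations with r < 0: {share_negative:.2%}")
```

If the sign of the correlation flips in a sizeable share of simulations, the original estimate should be treated as fragile.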

Altogether, these 20 questions transform the current report from a descriptive study into a research program that can:

Probe causal mechanisms more deeply.
Address different dimensions of inequality (economic, political, gendered, spatial).
Improve both academic understanding and public communication about dynasties.

12. Limitations and cautions

Any interpretation of these findings should keep in mind:

Correlational, not causal
- No identification strategy (e.g., natural experiments, IVs, panel fixed-effects with strong exogeneity assumptions) is used.
- It is entirely possible that economic development affects dynastic success, not the other way around (or both respond to deeper structural variables).
Limited time coverage for poverty data
- Poverty data are available only up to 2012, forcing retrospective matching for the 2013 and 2016 elections.
- This reduces power to detect temporal dynamics and may obscure delayed effects.
Omitted variables
- Key factors such as:
  - industrial composition,
  - urbanization, migration,
  - infrastructure,
  - regional conflict,
    are not systematically controlled for due to data limitations.
Measurement issues
- DYNSHARE, DYNLAR, DYNHERF depend on correct identification of dynastic ties, which can be complex (marriages, alliances, changing surnames).
- Poverty estimates at the provincial level have survey and modeling errors.
Small effect sizes
- Even where statistically significant, many effects are small to very small in standardized terms.
- This means even robust correlations are unlikely to explain a large share of observed inequality.
Access to primary Kosmos report
- The web interface for the Kosmos trajectory is not fully visible from here; this synthesis relies heavily on the summary and data you provided rather than the full original HTML content.

13. Concluding reflections

The Kosmos-based analysis of Philippine political dynasties and poverty delivers a nuanced message:

Dynasties are deeply embedded and growing, with dynasty share rising from ~30% to ~50% between 2004 and 2016.
Poverty incidence has modestly improved over the same period, but these trends are not strongly tied to dynastic prevalence.
Cross-sectional correlations between dynasty share and poverty rates are weak, unstable, and mostly not significant, especially when extended to 2016.
Absolute poverty magnitude tells a different story: dynastic provinces have more poor families, reinforcing concerns that elite entrenchment coexists with mass deprivation.
Structural aspects of dynastic power—concentration and dominance—are more predictive of poverty than simple presence, highlighting the risks posed by monopolistic family rule.
The evidence is not strong enough to claim that dynasties directly cause poverty, but it is strong enough to argue that dynastic governance has not meaningfully mitigated it and may shape how poverty and inequality are distributed and sustained.

Moving forward, the rich set of 20 proposed analyses can turn this initial portrait into a comprehensive, multi-dimensional research program—one that informs not only academic debates, but also citizen advocacy, party reform, and the design of anti-dynasty legislation and poverty policies in the Philippines.

The original 20 questions:

Q1. What is the correlation between the dynasty share (DYNSHARE) in Philippine provinces and poverty incidence (FAMPOVINC) across election years from 2004 to 2016, including regression analysis to quantify how dynastic control exacerbates or mitigates economic inequality?
Rubric Score: 98/100 (High across dimensions like Impact on Economic Inequality, Correlation with Poverty Levels, Policy Reform Potential, Socio-Economic Correlates, Statistical Significance, Comparative Analysis, Research Gap Filling, Academic Contribution, Long-term Societal Impact).
Explanation: This question is most useful for Filipinos as it directly links political dynasties to poverty, empowering voters to advocate for anti-dynasty laws and informing policy to reduce inequality in resource allocation.
Probabilities: Probability of informing anti-poverty policy: 95%; Probability of raising public awareness on dynasties’ economic harm: 92%.

Q2. How does the human development index (HDI) vary in provinces with high dynasty herfindahl indices (DYNHERF) versus low ones from 1997 to 2009, using multivariate analysis to identify causal patterns?
Rubric Score: 96/100 (Strong in Influence on Human Development Index, Socio-Economic Correlates, Temporal Trends, Policy Implications, Ethical Implications, Civic Engagement Stimulation).
Explanation: Essential for Filipinos to understand how dynasties affect education, health, and income, guiding investments in non-dynastic regions for balanced national development.
Probabilities: Probability of influencing education/health reforms: 90%; Probability of voter mobilization against dynasties: 88%.

Q3. Which provinces show the highest increases in fat dynasties (marked by ‘fat=1’ in the Data sheet) from 2004 to 2016, and what are the associated changes in poverty magnitude (FAMPOVMAG)?
Rubric Score: 94/100 (Excels in Geographic Distribution, Temporal Trends, Correlation with Poverty, Comparative Analysis, Data Visualization Opportunities, Public Awareness Raising).
Explanation: Helps Filipinos identify “dynasty hotspots” linked to poverty growth, fostering targeted civic campaigns and regional equity discussions.
Probabilities: Probability of highlighting regional disparities: 92%; Probability of sparking local anti-dynasty movements: 85%.

Q4. What temporal trends exist in dynasty largest share (DYNLAR) across regions from 2004 to 2016, and how do these trends predict future dynasty dominance using time-series forecasting models?
Rubric Score: 92/100 (High in Temporal Trends, Predictive Modeling, Geographic Distribution, Statistical Significance, Long-term Societal Impact, Research Gap Filling).
Explanation: Useful for forecasting electoral patterns, enabling Filipinos to prepare for reforms like term limits to prevent entrenched family power.
Probabilities: Probability of accurate future predictions: 88%; Probability of aiding electoral reform advocacy: 90%.

Q5. How do political parties (from the Data sheet) correlate with dynasty prevalence, analyzing if certain parties foster more fat dynasties and their impact on provincial HDI?
Rubric Score: 90/100 (Strong in Party Affiliation Patterns, Relevance to Democratic Processes, Transparency in Governance, Socio-Economic Correlates, Ethical Implications).
Explanation: Informs Filipinos about party biases toward dynasties, promoting informed voting and calls for party diversification.
Probabilities: Probability of exposing party-dynasty ties: 85%; Probability of increasing voter scrutiny: 87%.

Q6. What is the network structure of dynastic families (using Last Name and First Name from Data sheet) in high-poverty provinces, visualized through graph theory to show interconnections?
Rubric Score: 88/100 (Excels in Family Network Mapping, Data Visualization, Geographic Distribution, Statistical Significance, Media Reportability).
Explanation: Visualizes family webs for Filipinos, revealing power concentrations and encouraging anti-nepotism education.
Probabilities: Probability of effective visualization impact: 90%; Probability of media amplification: 82%.

Q7. In which positions (e.g., Governor, Mayor, Councilor) are dynasties most prevalent from 2004 to 2016, and how does this vary by poverty threshold (POVTHRESH)?
Rubric Score: 86/100 (High in Position-Specific Dynasty Prevalence, Correlation with Poverty, Comparative Analysis, Policy Implications, Voter Education Value).
Explanation: Guides Filipinos on dynasty infiltration in local governance, aiding decisions on supporting non-dynastic candidates in key roles.
Probabilities: Probability of position-specific insights: 85%; Probability of influencing candidate selection: 80%.

Q8. What gender disparities exist in dynastic politicians (inferred from First Name in Data sheet) across provinces, and how do these relate to HDI improvements?
Rubric Score: 84/100 (Strong in Gender Representation, Socio-Economic Correlates, Ethical Implications, Research Gap Filling, Civic Engagement).
Explanation: Addresses gender inequality in politics for Filipinos, promoting inclusive representation and linking to broader development.
Probabilities: Probability of highlighting gender gaps: 88%; Probability of supporting women’s political empowerment: 78%.

Q9. How do dynasty shares (DYNSHARE) cluster by region using k-means clustering, and what socio-economic factors (e.g., FAMPOVINC) explain these clusters?
Rubric Score: 82/100 (Excels in Geographic Distribution, Statistical Significance, Comparative Analysis, Data Visualization, Academic Contribution).
Explanation: Clusters help Filipinos see regional patterns, informing national policies for dynasty regulation in clustered areas.
Probabilities: Probability of identifying clusters accurately: 85%; Probability of policy targeting: 75%.

Q10. What statistical tests (e.g., t-tests, ANOVA) show significant differences in poverty incidence between dynastic and non-dynastic provinces from 2006 to 2012?
Rubric Score: 80/100 (High in Statistical Significance, Correlation with Poverty, Comparative Analysis, Research Gap Filling, Academic Contribution).
Explanation: Provides evidence-based insights for Filipinos to challenge dynasties’ effectiveness in poverty reduction.
Probabilities: Probability of significant findings: 82%; Probability of academic/policy use: 80%.

Q11. How has the change in dynasty share from 2013 to 2016 (from DYNASTY STATS (2) sheet) correlated with shifts in family poverty magnitude across provinces?
Rubric Score: 78/100 (Strong in Temporal Trends, Correlation with Poverty, Socio-Economic Correlates, Policy Implications).
Explanation: Tracks recent changes, useful for Filipinos monitoring post-2016 trends and advocating timely interventions.
Probabilities: Probability of detecting correlations: 80%; Probability of recent policy relevance: 75%.

Q12. What predictive models (e.g., logistic regression) can forecast fat dynasty emergence based on prior election data and poverty metrics?
Rubric Score: 76/100 (High in Predictive Modeling, Temporal Trends, Statistical Significance, Long-term Impact).
Explanation: Enables Filipinos to anticipate dynasty growth, supporting proactive voter education and laws.
Probabilities: Probability of model accuracy: 78%; Probability of preventive action: 72%.

Q13. How do outliers in dynasty herfindahl (DYNHERF) relate to unique provincial characteristics like HDI or party dominance?
Rubric Score: 74/100 (Excels in Statistical Significance, Comparative Analysis, Research Gap Filling, Academic Contribution).
Explanation: Identifies exceptional cases for Filipinos, highlighting success stories or warnings in dynasty management.
Probabilities: Probability of outlier insights: 75%; Probability of case study value: 70%.

Q14. What comparative analysis reveals differences in dynasty prevalence between urban (e.g., cities) and rural municipalities, linked to poverty thresholds?
Rubric Score: 72/100 (Strong in Geographic Distribution, Comparative Analysis, Socio-Economic Correlates).
Explanation: Differentiates urban-rural dynamics for Filipinos, aiding tailored anti-dynasty strategies in diverse settings.
Probabilities: Probability of urban-rural distinctions: 80%; Probability of localized usefulness: 68%.

Q15. How can principal component analysis reduce dimensions in dynasty stats and poverty data to identify key drivers of inequality?
Rubric Score: 70/100 (High in Statistical Significance, Socio-Economic Correlates, Research Gap Filling).
Explanation: Simplifies complex data for Filipinos, making it accessible for public discourse on inequality drivers.
Probabilities: Probability of dimensionality reduction success: 75%; Probability of public comprehension: 65%.

Q16. What ethical implications arise from analyzing dynasty family networks in relation to terrorism or conflict data (mentioned in references)?
Rubric Score: 68/100 (Strong in Ethical Implications, Family Network Mapping, Public Awareness, Civic Engagement).
Explanation: Raises awareness of dynasties’ broader societal risks for Filipinos, though less directly economic.
Probabilities: Probability of ethical discussions: 70%; Probability of conflict-related insights: 62%.

Q17. How do visualizations (e.g., heatmaps) of dynasty shares by province over time enhance understanding of regional political evolution?
Rubric Score: 66/100 (Excels in Data Visualization, Temporal Trends, Geographic Distribution, Media Reportability).
Explanation: Visual aids make data engaging for Filipinos, boosting media coverage and public interest in political history.
Probabilities: Probability of visualization effectiveness: 85%; Probability of widespread sharing: 60%.

Q18. What bibliometric analysis of the dataset’s references reveals gaps in dynasty-poverty research for future studies?
Rubric Score: 64/100 (High in Research Gap Filling, Academic Contribution, Long-term Impact).
Explanation: Guides academic Filipinos toward new research, indirectly benefiting policy through evidence.
Probabilities: Probability of identifying gaps: 72%; Probability of inspiring studies: 58%.

Q19. How does sentiment analysis on dynasty-related references (e.g., abstracts) correlate with quantitative dynasty stats?
Rubric Score: 62/100 (Strong in Research Gap Filling, Academic Contribution, Media Reportability).
Explanation: Adds qualitative layer for Filipinos interested in scholarly discourse, though niche for general use.
Probabilities: Probability of sentiment insights: 65%; Probability of bridging qual-quant: 55%.

Q20. What arbitrary simulations (e.g., Monte Carlo) estimate the robustness of dynasty-poverty correlations under data perturbations?
Rubric Score: 60/100 (High in Statistical Significance, Predictive Modeling, Academic Contribution).
Explanation: Tests data reliability for expert Filipinos, ensuring findings’ credibility but least directly actionable for everyday citizens.
Probabilities: Probability of robustness confirmation: 70%; Probability of technical usefulness: 50%.

The Kosmos Data Analysis:

Correlation Between Dynasty Share and Poverty Incidence in Philippine Provinces (2004-2016)

PRIMARY FINDINGS

Overall Correlation (2004-2010 Period with Properly Matched Data):

Pearson correlation coefficient (r) = -0.1187 (p = 0.0680)
Spearman rank correlation (ρ) = -0.1572 (p = 0.0154)*
Statistical significance: Mixed. Not significant under Pearson correlation at α = 0.05 (p = 0.0680), but significant under Spearman rank correlation at α = 0.05 (p = 0.0154)

*The Spearman correlation is more appropriate given the non-linear relationships and potential outliers in the data.
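
To make the Pearson versus Spearman distinction concrete, here is a minimal pure-Python sketch. The six data points are synthetic and chosen only to show a monotone but non-linear pattern with one outlying province; the function names are illustrative, not from the Kosmos run.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Synthetic values, NOT taken from the dataset.
dynshare = [0.10, 0.20, 0.30, 0.40, 0.50, 0.95]
fampovinc = [40.0, 35.0, 33.0, 30.0, 29.0, 28.5]

r_pearson = pearson(dynshare, fampovinc)
r_spearman = spearman(dynshare, fampovinc)
print(f"Pearson r = {r_pearson:.3f}, Spearman rho = {r_spearman:.3f}")
```

Because the synthetic relationship is perfectly monotone but not linear, the rank-based coefficient is stronger in magnitude than the linear one, which is the kind of gap seen between the two reported coefficients.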

REGRESSION ANALYSIS

Simple Linear Regression Model: FAMPOVINC = 32.78 + (-13.52 × DYNSHARE)

Key Regression Statistics:

Slope (β₁) = -13.52 percentage points (SE = 7.37, 95% CI: [-28.04, 1.01])
t-statistic = -1.833, p-value = 0.068
R² = 0.0141 (explains only 1.41% of variance in poverty incidence)
Adjusted R² = 0.0099
RMSE = 12.90 percentage points
N = 237 observations (79 provinces × 3 election cycles)

Standardized Effect Size:

Standardized regression coefficient (β) = -0.1187
Effect size classification: Small (Cohen’s conventions: <0.3)

Practical Interpretation:

For every 10 percentage point increase in dynasty share, poverty incidence decreases by approximately 1.35 percentage points
However, this relationship is not statistically significant using parametric tests (Pearson p = 0.068), but is significant using non-parametric tests (Spearman p = 0.015)
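
The reported regression figures can be cross-checked with simple arithmetic. This sketch uses only the rounded numbers quoted above, so small differences in the last digit are expected; the critical value 1.97 is an approximation for 235 degrees of freedom and is an assumption here.

```python
# Rounded values as reported in the regression output above.
slope = -13.52   # change in FAMPOVINC per unit of DYNSHARE (0 to 1 scale)
se = 7.37        # standard error of the slope
r = -0.1187      # Pearson correlation (equals the standardized slope here)
n = 237          # observations

t_stat = slope / se
r_squared = r ** 2
# Adjusted R-squared for a model with one predictor.
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - 2)

T_CRIT = 1.97  # approximate two-sided 95% critical value, df = 235 (assumption)
ci_low, ci_high = slope - T_CRIT * se, slope + T_CRIT * se

# A 10 percentage point rise in dynasty share is a 0.10 change in DYNSHARE.
effect_per_10pp = slope * 0.10

print(f"t = {t_stat:.3f}")
print(f"R^2 = {r_squared:.4f}, adjusted R^2 = {adj_r_squared:.4f}")
print(f"95% CI = [{ci_low:.2f}, {ci_high:.2f}]")
print(f"effect per 10 pp = {effect_per_10pp:.2f} percentage points")
```

The results land close to the reported t-statistic, R², adjusted R², and confidence interval, which suggests the summary figures are internally consistent.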

MULTIVARIATE REGRESSION RESULTS

When controlling for other dynasty metrics (DYNLAR, DYNHERF) and temporal factors:

DYNSHARE coefficient = 0.78 (p = 0.942) – not significant
DYNLAR coefficient = 517.26 (p = 0.001) – highly significant***
DYNHERF coefficient = -2,737.55 (p = 0.005) – highly significant**
Overall model: R² = 0.058, F-statistic = 2.83 (p = 0.017)*

Key Finding: When accounting for concentration measures (DYNHERF) and largest family share (DYNLAR), the simple dynasty share measure loses predictive power, suggesting that the concentration and dominance of specific families matter more than overall dynasty prevalence.
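
To illustrate why prevalence and concentration can diverge, the sketch below computes three measures from seat counts per family. The formulas are assumptions for illustration (share of seats held by dynastic families, share held by the largest family, and a Herfindahl-style sum of squared family shares); the dataset's own definitions of DYNSHARE, DYNLAR, and DYNHERF may differ.

```python
def dynasty_metrics(seats_by_family, total_seats):
    """Compute illustrative prevalence and concentration measures.

    seats_by_family maps each dynastic family to its number of seats;
    total_seats is the number of elective seats in the province.
    """
    shares = [seats / total_seats for seats in seats_by_family.values()]
    return {
        "DYNSHARE": sum(shares),                  # overall dynastic presence
        "DYNLAR": max(shares, default=0.0),       # largest single family
        "DYNHERF": sum(s * s for s in shares),    # Herfindahl-style index
    }

# Two hypothetical provinces with 20 seats each and 10 dynastic seats.
concentrated = dynasty_metrics({"Family A": 8, "Family B": 1, "Family C": 1}, 20)
dispersed = dynasty_metrics({f"Family {i}": 2 for i in range(5)}, 20)

for name, metrics in (("concentrated", concentrated), ("dispersed", dispersed)):
    print(name, {k: round(v, 4) for k, v in metrics.items()})
```

Both hypothetical provinces have the same overall share, yet one is dominated by a single family, which is the distinction the multivariate result points to.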

YEAR-BY-YEAR CORRELATIONS

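| Election Year | Poverty Year | N | Mean DYNSHARE | Mean FAMPOVINC | Pearson r | P-value | Significant |
|---|---|---|---|---|---|---|---|
| 2004 | 2006 | 79 | 0.3032 | 28.52% | -0.1848 | 0.1030 | No |
| 2007 | 2009 | 79 | 0.3657 | 27.72% | -0.1828 | 0.1069 | No |
| 2010 | 2012 | 79 | 0.4507 | 26.96% | +0.0305 | 0.7898 | No |
| 2013 | 2012* | 79 | 0.4562 | 26.96% | +0.0446 | 0.6965 | No |
| 2016 | 2012* | 79 | 0.5074 | 26.96% | -0.0140 | 0.9028 | No |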

*2013 and 2016 elections matched with 2012 poverty data (concurrent/retrospective comparison)

Temporal Pattern: The relationship between dynasty share and poverty varies substantially by year and is not stable across time periods. The negative correlation observed in 2004 and 2007 disappears by 2010.
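
The year-by-year significance results can be sanity-checked from r and N alone, since a correlation converts to a t statistic with N − 2 degrees of freedom. In this sketch the p-value uses a normal approximation (an assumption made to stay within the standard library), so it will be slightly smaller than the exact Student t p-values reported above.

```python
import math

def r_to_t(r, n):
    """t statistic for testing a Pearson correlation against zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

def approx_two_sided_p(t_stat):
    """Two-sided p-value from the normal approximation to the t distribution."""
    return math.erfc(abs(t_stat) / math.sqrt(2))

# (election year, Pearson r) as reported in the year-by-year results; N = 79.
reported = [(2004, -0.1848), (2007, -0.1828), (2010, 0.0305),
            (2013, 0.0446), (2016, -0.0140)]

for year, r in reported:
    t_stat = r_to_t(r, 79)
    print(f"{year}: t = {t_stat:+.3f}, approx p = {approx_two_sided_p(t_stat):.3f}")
```

None of the five years comes close to the α = 0.05 threshold under this check, in line with the "No" entries in the significance column.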

ALTERNATIVE OUTCOME: POVERTY MAGNITUDE

DYNSHARE vs. FAMPOVMAG (Family Poverty Magnitude):

Pearson r = +0.1870 (p = 0.0039)**
This represents a positive and statistically significant correlation
Interpretation: Higher dynasty shares are associated with greater absolute numbers of poor families, even when poverty incidence rates show weak or negative associations

This finding reveals a critical nuance: provinces with higher dynasty shares tend to have larger poor populations in absolute terms, suggesting dynasties may exacerbate or fail to mitigate economic inequality at scale.

COMPREHENSIVE ASSESSMENT (2004-2016)

When including all election years (N = 395 observations):

Overall Pearson r = -0.0737 (p = 0.144) – not significant
The relationship weakens when later years with mismatched poverty data are included

STATISTICAL ROBUSTNESS

Regression Assumptions Check:

Normality of residuals: Satisfied (Shapiro-Wilk W = 0.9907, p = 0.133)
Linear relationship: Weak but present
Homoscedasticity: Appears reasonable from residual plots

ANSWER TO RESEARCH QUESTION

Does dynastic control exacerbate or mitigate economic inequality?

The evidence from 2004-2016 shows:

WEAK MITIGATION EFFECT (not robust): A weak negative correlation (r = -0.12 to -0.16) suggests provinces with higher dynasty shares have slightly lower poverty incidence rates, but this relationship:

Is only marginally significant (Spearman p = 0.015, Pearson p = 0.068)
Explains minimal variance (R² = 1.4%)
Varies substantially by year and is not stable
May reflect confounding factors rather than causal effects

EXACERBATION IN ABSOLUTE TERMS: Dynasty share is positively correlated with poverty magnitude (r = +0.19, p = 0.004), indicating that provinces with more dynasties have larger absolute numbers of poor families.
CONCENTRATION MATTERS MORE: Multivariate analysis reveals that dynasty concentration (DYNHERF) and dominance by specific families (DYNLAR) are stronger predictors of poverty than overall dynasty share.

CONCLUSION: The relationship between dynasty share and poverty incidence is weak, inconsistent across time, and not statistically significant using standard parametric methods. While there is marginal evidence of a weak negative association using rank-based methods, the effect size is very small and the practical significance is questionable. More importantly, dynasties appear to exacerbate inequality in absolute terms (poverty magnitude), even if poverty rates show weak associations. The data do not support strong claims that dynastic control meaningfully mitigates economic inequality, nor do they provide robust evidence that dynasties directly exacerbate poverty rates. The relationship is complex, context-dependent, and likely mediated by other factors not captured in simple correlation analyses.

DISCRETIONARY ANALYTICAL DECISIONS

• Time matching strategy: Matched election years to subsequent poverty measurements (2004→2006, 2007→2009, 2010→2012) based on the assumption that poverty data reflects conditions following elections. Alternative strategies could match concurrent years or use lagged effects.

• Statistical test selection: Prioritized Spearman rank correlation alongside Pearson correlation due to potential non-linear relationships and the presence of outliers. Both methods were reported for transparency.

• Significance threshold: Used conventional α = 0.05 for statistical significance testing.

• Treatment of 2013 and 2016 data: Included these years in extended analyses despite poverty data only being available through 2012, clearly noting this limitation. Could have excluded these years entirely.

• Multivariate model specification: Included DYNLAR, DYNHERF, and year dummy variables as controls. Other socioeconomic variables (HDI, regional indicators) could have been included but had substantial missing data.

• Handling of missing data: Used listwise deletion for observations with missing DYNHERF values (2 observations). Alternative approaches include imputation or excluding DYNHERF from analysis.

• Effect size interpretation: Applied Cohen’s conventions for small/medium/large effects (0.3/0.5 thresholds for standardized coefficients).

• Visualization choices: Created a four-panel figure focusing on 2004-2010 period for properly matched data, with temporal trends and year-specific correlations to show relationship instability.

• Primary analysis period: Focused on 2004-2010 as the primary analysis period due to properly temporally matched poverty data, while reporting extended analyses including 2013-2016 separately with appropriate caveats.
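
Two of the decisions above, the election-to-poverty year matching and listwise deletion, can be sketched as follows. The year mapping comes from the text; the row layout, field names, province labels, and values are hypothetical.

```python
# Election year -> poverty measurement year, as described above.
POVERTY_YEAR = {2004: 2006, 2007: 2009, 2010: 2012, 2013: 2012, 2016: 2012}
# Years where poverty data follows the election (the primary analysis period).
PROPERLY_MATCHED = {2004, 2007, 2010}

def build_panel(dynasty_rows, poverty_by_key):
    """Attach the matched poverty value to each province-election row."""
    panel = []
    for row in dynasty_rows:
        pov_year = POVERTY_YEAR[row["election_year"]]
        merged = dict(
            row,
            poverty_year=pov_year,
            fampovinc=poverty_by_key.get((row["province"], pov_year)),
            properly_matched=row["election_year"] in PROPERLY_MATCHED,
        )
        panel.append(merged)
    return panel

def listwise_delete(rows, required):
    """Drop any row that is missing a value for one of the required fields."""
    return [r for r in rows if all(r.get(k) is not None for k in required)]

# Synthetic rows, NOT taken from the dataset.
dynasty_rows = [
    {"province": "Province A", "election_year": 2004, "dynshare": 0.30, "dynherf": 0.05},
    {"province": "Province A", "election_year": 2013, "dynshare": 0.45, "dynherf": None},
    {"province": "Province B", "election_year": 2010, "dynshare": 0.50, "dynherf": 0.12},
]
poverty_by_key = {("Province A", 2006): 28.0, ("Province A", 2012): 26.5,
                  ("Province B", 2012): 31.0}

panel = build_panel(dynasty_rows, poverty_by_key)
complete = listwise_delete(panel, ["dynshare", "dynherf", "fampovinc"])
primary = [r for r in complete if r["properly_matched"]]
print(len(panel), len(complete), len(primary))
```

Keeping the `properly_matched` flag on each row makes it easy to report the primary and extended analyses separately, as the last bullet describes.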

Original dataset: the Ateneo Policy Center (APC) Political Dynasties Dataset, available through BetterGov.ph, with an alternative link through the Inclusive Democracy website. Description of the dataset:

The Ateneo Policy Center (APC) Political Dynasties Dataset is a continuing effort to track leadership patterns at the local government level in the Republic of the Philippines. In particular, it traces the presence and extent of political clans, comprised of elected politicians with family members also in elected office prior to or during their term in office. In the economic development literature, very few studies contain empirical estimates of dynastic patterns at the local government level.

Download the .xlsx file (dataset) here.

Dynasty vs. Poverty Correlation Analysis

http://pageman.wordpress.com/?p=1491

Jordan Peterson’s 42 Rules in Life

pageman Sep 19, 2022

Posting this because I saw some posts saying they took down Jordan Peterson’s original post in Quora. The post was originally an answer to the question: What are the most valuable things everyone should know? There’s also a website dedicated to it.

Tell the truth.
Do not do things that you hate.
Act so that you can tell the truth about how you act.
Pursue what is meaningful, not what is expedient.
If you have to choose, be the one who does things, instead of the one who is seen to do things.
Pay attention.
Assume that the person you are listening to might know something you need to know. Listen to them hard enough so that they will share it with you.
Plan and work diligently to maintain the romance in your relationships.
Be careful who you share good news with.
Be careful who you share bad news with.
Make at least one thing better every single place you go.
Imagine who you could be, and then aim single-mindedly at that.
Do not allow yourself to become arrogant or resentful.
Try to make one room in your house as beautiful as possible.
Compare yourself to who you were yesterday, not to who someone else is today.
Work as hard as you possibly can on at least one thing and see what happens.
If old memories still make you cry, write them down carefully and completely.
Maintain your connections with people.
Do not carelessly denigrate social institutions or artistic achievement.
Treat yourself as if you were someone that you are responsible for helping.
Ask someone to do you a small favour, so that he or she can ask you to do one in the future.
Make friends with people who want the best for you.
Do not try to rescue someone who does not want to be rescued, and be very careful about rescuing someone who does.
Nothing well done is insignificant.
Set your house in perfect order before you criticize the world.
Dress like the person you want to be.
Be precise in your speech.
Stand up straight with your shoulders back.
Don’t avoid something frightening if it stands in your way — and don’t do unnecessarily dangerous things.
Do not let your children do anything that makes you dislike them.
Do not transform your wife into a maid.
Do not hide unwanted things in the fog.
Notice that opportunity lurks where responsibility has been abdicated.
Read something written by someone great.
Pet a cat when you encounter one on the street.
Do not bother children when they are skateboarding.
Don’t let bullies get away with it.
Write a letter to the government if you see something that needs fixing — and propose a solution.
Remember that what you do not yet know is more important than what you already know.
Be grateful in spite of your suffering.

http://pageman.wordpress.com/?p=1485

Music Genres 1/5107 or 5044 – Central Asian Folk

pageman Dec 24, 2020

If you got roasted by the “How Bad was your Spotify” AI (try it here: https://pudding.cool/2020/12/judge-my-spotify/ ), you’ve probably asked yourself how you can discover other music genres so that you can broaden your musical horizons. Well, I found out about Every Noise a few weeks ago, when it listed 5,068 music genres on Spotify. By the time I checked on it today, it was at 5,107. If you divide 5,107 by 365 days, that’s about 14 new music genres a day that you can discover.

What I do is plug the numbers 1 and 5107 into random.org and let the random number generator (RNG) work its magic. When I clicked “generate”, it gave the number 5,044. You go to the original Every Noise list http://everynoise.com/everynoise1d.cgi?scope=all and find out which music genre the number 5,044 corresponds to; in this case the music genre is “Central Asian Folk”. If you click on that link, it will reorder the list with “Central Asian Folk” at #1 and related musical genres from #2 to #20, and at #5107 it’s “hard minimal techno” – which kind of makes sense, being at the other side of the music genre spectrum.
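If you would rather not open random.org every time, the same draw can be sketched in a few lines of Python. This is only an illustration: it uses Python's built-in pseudo-random generator instead of random.org, the names pick_genre_rank and genres_per_day are made up for this sketch, and 5,107 is just the count on the day of this post.

```python
import random

# Genre count reported in the post; the live Every Noise list changes over time.
TOTAL_GENRES = 5107


def pick_genre_rank(total=TOTAL_GENRES, rng=None):
    """Return a random 1-based position in the genre list (both ends included)."""
    rng = rng or random.Random()
    return rng.randint(1, total)


def genres_per_day(total=TOTAL_GENRES, days=365):
    """How many genres per day it takes to cover the whole list in `days` days."""
    return total / days


if __name__ == "__main__":
    rank = pick_genre_rank()
    print(f"Look up genre #{rank} on the Every Noise list")
    print(f"About {genres_per_day():.1f} genres a day covers the list in a year")
```

You still have to look up the number on the Every Noise list by hand; the sketch only replaces the random.org step and the division.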

The reordered list at http://everynoise.com/everynoise1d.cgi?root=central%20asian%20folk&scope=all conveniently has a Spotify playlist that you can open, so you can now sample a music genre that normal recommendation algorithms wouldn’t otherwise have suggested to you. Check it out here: https://open.spotify.com/playlist/6zZc6904zlMlWQqH16hjlx?si=X9VWbfMrTiKMOk0vHYOAqg and see which songs stand out.

I plan to take some notes on the rabbit holes I will get into via this method. Hopefully, after a year, I will have sampled a much more varied palette of musical genres and it will leave me more open-minded. Some of my discoveries this time are the album Nauryz by the group Roksonaki https://open.spotify.com/album/1so7YgnkpNjdn99uCGci4n?si=K3ZTgvBtRpK1WSJrtgv4Ng and also the album Arzu (Song of the Uyghurs) by Sanubar Tursun https://open.spotify.com/album/4mAufdX1k61TJ0X1lHojyf?si=wFyDPBUdRxWtw9rJOCM5-A who was supposed to go on tour but didn’t show up for reasons I won’t tackle in this post.

The ease with which one can fall into a rabbit hole is pleasantly surprising. Just research the instruments they use and you end up somewhere referencing Albert Kuvezin https://open.spotify.com/artist/2rMv2d9ermxwlNvZEYSWmA?si=Ru5Jn4Y8S2awQK8MJ0JB0A . But one of the advantages of a list reordered, presumably, in a way that maps musical proximity to a music genre is that there are Spotify playlists you can jump into right after Central Asian Folk:

kazakh traditional
dombra
tajik traditional
uzbek traditional
jaw harp
kyrgyz traditional
vietnamese traditional
lao traditional
classical clarinet
yunnan traditional
cambodian traditional
samba-jazz
jazz brass
mountain dulcimer
swedish jazz

The good thing is that, until swedish jazz, I have no idea what the other music genres would sound like, except maybe for classical clarinet, samba-jazz and jazz brass. This will surely push me to the very limits of my music listening comfort, but hopefully I will learn more about people, music and cultures that are all new to me. And yes, you do end up praying and interceding for the cultures behind these music genres because there’s nothing more visceral than listening to a new song from a new culture for the first time.

http://pageman.wordpress.com/?p=1478

Using Polly with Lambda

pageman Mar 1, 2020

Tried using Polly with Lambda and here’s the code:

import json
import boto3


def lambda_handler(event, context):
    # Create a Polly client; inside Lambda the credentials come from the execution role.
    polly_client = boto3.Session(region_name='us-east-1').client('polly')
    # Alternative with explicit credentials and another region:
    #     aws_access_key_id=,
    #     aws_secret_access_key=,
    #     region_name='us-west-2').client('polly')

    # Synthesize a short sample sentence as MP3 audio.
    response = polly_client.synthesize_speech(VoiceId='Joanna',
                                              OutputFormat='mp3',
                                              Text='This is a sample text to be synthesized.')

    # Write the audio stream to /tmp, the writable scratch space in Lambda.
    with open('/tmp/speech.mp3', 'wb') as file:
        file.write(response['AudioStream'].read())

    # Upload the MP3 file to the S3 bucket.
    s3 = boto3.client('s3')
    with open('/tmp/speech.mp3', 'rb') as f:
        s3.upload_fileobj(f, 'paulpajo.bucket.xxxxx', 'speechconvert.mp3')

    return {
        'statusCode': 200,
        'body': json.dumps('Hello from Lambda!')
    }
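
A possible variation, sketched here and not what I actually deployed: skip the /tmp file and hand the audio bytes straight to S3. The helper name synthesize_to_s3 and the 'text' event field are assumptions made up for this sketch. The Polly and S3 clients can be passed in, so the helper can be tried with stubs without touching AWS.

```python
import io
import json


def synthesize_to_s3(text, bucket, key, polly_client=None, s3_client=None):
    """Synthesize `text` with Polly and upload the MP3 to S3 without using /tmp."""
    if polly_client is None or s3_client is None:
        # Imported lazily so the helper can be exercised with stub clients.
        import boto3
        polly_client = polly_client or boto3.client('polly', region_name='us-east-1')
        s3_client = s3_client or boto3.client('s3')

    response = polly_client.synthesize_speech(
        VoiceId='Joanna', OutputFormat='mp3', Text=text)

    # Read the whole audio stream into memory; fine for short texts.
    data = response['AudioStream'].read()
    s3_client.upload_fileobj(io.BytesIO(data), bucket, key)
    return len(data)


def lambda_handler(event, context):
    # 'text' is a hypothetical event field, not something from the original code.
    text = event.get('text', 'This is a sample text to be synthesized.')
    size = synthesize_to_s3(text, 'paulpajo.bucket.xxxxx', 'speechconvert.mp3')
    return {
        'statusCode': 200,
        'body': json.dumps({'bytes_uploaded': size})
    }
```

Keeping the audio in memory avoids the temporary file, at the cost of holding the whole MP3 in RAM, so the file-based version above is probably the safer choice for long texts.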

http://pageman.wordpress.com/?p=1475

Happy New Year 2018!

pageman Jan 2, 2018

Time to start posting again – it’s 2018!

Been reading up on a few things so here’s a list:

How Lightning Network works
Demystifying Hashgraphs (it’s permissioned btw)
Fundamental Challenges to Public Blockchains by Preethi Kasireddy
Hugo Nguyen on Charlie Lee divesting his LTC
Naval Ravikant on how the blockchain will replace existing networks with […]

As you can see I’ve been having fun with the @ThreadReaderApp – when I see an interesting thread, I just reply to it with @ThreadReaderApp and type “unroll please”. Happy reading!

http://pageman.wordpress.com/?p=1472

The first 11 days of April 2016 so far

pageman Apr 11, 2016

Register Now at http://bilangpilipinohackathon.eventbrite.com #BilangPilipino #HACKSYON #CodeMore #HackMore #LiveMore

A post shared by Paul "The Pageman" Pajo (@paulpajo) on Apr 11, 2016 at 2:34am PDT

It’s the season of many hackathons again and last month I was a judge at the IoT (Internet of Things) hackathon. This weekend there will be two (!!!) – and since I can’t violate the laws of physics – I will be at the #ThinkOpenHealth Hackathon while another one, HACKSYON: #BilangPilipino Hackathon, is going on. The former will be at AIM and is already sold out, while the latter is already more than 50% full – so register now before it runs out of slots if you want to do a hackathon this weekend.

Btw, The Penthouse 8747 just opened!

Cherry Summer here at The Penthouse 8747 #Accelerate2016 @scrambled_gegs

A post shared by Paul "The Pageman" Pajo (@paulpajo) on Apr 8, 2016 at 6:17am PDT

There’s now a Foot Zone at Jupiter – Special thanks to Jen Ching-Tapuro and Elisa Tapuro for the warm welcome when I dropped by.

Soft opening of Foot Zone Jupiter! With Jen Ching-Tapuro and Elisa Tapuro – it's 50% off til tomorrow! #Accelerate2016

A post shared by Paul "The Pageman" Pajo (@paulpajo) on Apr 2, 2016 at 8:25am PDT

http://pageman.wordpress.com/?p=1384

Back from Bali

pageman Apr 28, 2014

I was wondering where everyone was going to go for their Holy Week and, for some reason, most of the friends I polled just said they would stay home. It seems everyone was either trying to enjoy the Holy Week break in the Philippines, when all traffic just disappears, or reserving their vacation leaves and/or plans for LaBoracay (the long weekend that sandwiches May 1 every year and serves as an excuse to go to Boracay). I’ve been to Boracay before, during and after Holy Week and essentially – no one from Manila is there before Maundy Thursday, the people you hang out with in Manila are there from Maundy Thursday to Easter Sunday, and then the people from the nearby provinces descend upon Boracay starting on the eve of Good Friday. They all miraculously disappear after Easter Sunday. I’m guessing LaBoracay is the same experience, without the Holy Week subtext of course. So when Diana Vodden and Kamrin Klauschie said there was going to be an AngelHack Ambassador Retreat during Holy Week in Bali, Indonesia – who was I to say no?

I tried to post via the wifi spots in Bali and figured out later that it might be easier to just get a Telkomsel SIMPati card and use it to post on social media. The downside of deciding late to do this was that I wasn’t able to cross-post a lot of my pictures on Flickr.com and had only two nice photos – one of Sabeen (the CEO of AngelHack) with one of the beach dogs and the other of a nice sunset at Batu Belig Beach from Mozaic.

Sabeen and the Dog

Beach and Sunset View at Batu Belig from Mozaic

Here are some photos via Instagram:

An #AngelHackGoneBali #regram via #dellaw – @narayanratish @kamrinklauschie @krism9999 @emm_aja @jess4men @angelhackhq @kawaiirei #Tremendous2014 #Seminyak #Bali #MadesWarung

A post shared by Paul "The Pageman" Pajo (@paulpajo) on Apr 19, 2014 at 7:21am PDT

With #AngelHackGoneBali ambassadors @jess4men @kamrinklauschie @eikoraquel #Bali #Kuta #PotatoHead

A post shared by Paul "The Pageman" Pajo (@paulpajo) on Apr 17, 2014 at 8:58am PDT

With Lucas & Jamie (and @eikoraquel and Ratish as photobombers!) #AngelHackGoneBali #Ubud #Bali #hackstrong

A post shared by Paul "The Pageman" Pajo (@paulpajo) on Apr 18, 2014 at 3:59am PDT

And just before the Ambassadors left, I bumped into IAM’s May Flores and her friends Priscilla & Dr. Richard:

#AngelHackGoneBali + #IAMAgency #Seminyak #Bali

A post shared by Paul "The Pageman" Pajo (@paulpajo) on Apr 19, 2014 at 8:15am PDT

I initially stayed at Ketut’s Place, just at the back of Ubud Palace, with Kris Marganti. It was a great place to be introduced to Bali and Ubud – it was a home stay and our room had a balcony overlooking a ravine, with a river at the bottom and the forest just across from us. It was non-stop white noise from then on, and we were even forced to open the windows and the doors the first day since the aircon was on the fritz. It turned out to be a blessing in disguise as the Bali breeze was really perfect. When the Ambassadors arrived we moved to a new hotel called Inata Bisma – new as in even the locals didn’t know it was already open. I wanted to extend at Inata but I got an offer to stay at Batu Belig and so decided to do just that.

I’ll update this post later but all in all – Bali is a great place just to take a vacation from the hustle and bustle of Metro Manila. Imagine if the whole of Cebu were all Boracay establishments. That would probably be Bali on a very rough scale. Visit it if you can – there’s a direct CebuPac flight from Manila, although for some reason I went there via Singapore on Singapore Airlines, which was very tiring, but I missed Changi Airport anyway haha.

http://pageman.wordpress.com/?p=1372

How did Ashton Kutcher prepare for his role as Steve Jobs in the new movie Jobs?

pageman Aug 4, 2013

Answer by Ashton Kutcher:

I spent about 3 months preparing the character.

I started by consuming content about Steve Jobs. The script was a fantastic resource but after reading it I was left with as many questions as answers as to why he was the way he was and why he made some of the decisions he made. I started by watching documentaries and interviews about him (Silicon Valley Historical Association) and collecting YouTube content (Inspired By Jobs: Technology) and SoundCloud files on Jobs.

This was to try to understand some of the broad themes of his persona. What I was looking for was patterns of consistent behavior and ideals. I heard him repeat his story about a computer being a tool for the mind and that we should all be bold enough to create the world we live in. I also picked up on his value for diverse education through experience. I then started to dissect the nuances of his behavior: the walk, the fact that he has an almost imperceptible lisp, his accent that was a combination of northern California and Wisconsin, the way he paused before answering and nodded in understanding, the way he bowed in namaste when receiving praise and stared with contempt when in conflict. I noticed how he used his hands to talk and how he counted with his fingers (pinky finger first), how he used the words "aaaaand" and "noooow" to think about what he was going to say next. But I quickly found that learning "how a person is" ultimately is not the key; you have to learn "why a person is".

Once armed with this external impression I wanted to get a better feel of why he saw the world the way he did. I wanted to know why he liked what he liked and pursued what he pursued. So I started to consume what he consumed.
Books he read: Autobiography of a Yogi: Paramahansa Yogananda, Mucusless Diet Healing System: Arnold Ehret, Be Here Now: Ram Dass
Researching the artists he admired: Bauhaus, Folon, Ansel Adams
Eating the food he ate: Grapes, Carrot Juice, Popcorn
Studying the Entrepreneurs he admired: Edison, Edwin Land

Then I met with the people that he knew and worked with to unravel some of the subtle quirks and conflicted decision making that I couldn't rationalize. Alan Kay, Avie Tevanian, Jeffrey Katzenberg, Mike Hawley, and many others were phenomenal resources.

I then worked with my acting coach Greta Seacat to relate his emotionality and behavior to my own. She helped me make it personal and authentic.

But in the end one of the greatest tells of the man were his creations. They were elegant, intelligent, thoughtful, precise, artistic, bold, visionary, complicated, efficient, fun, entertaining, powerful, imperfect, and beautiful on the inside and out…. Just like Steve.

View Answer on Quora

http://pageman.wordpress.com/?p=1370
