Locy Flagship: Semiconductor Yield Excursion Triage¶
This notebook uses a real manufacturing dataset (SECOM, UCI) and walks end-to-end through:
- DERIVE: materialize risk links into graph edges.
- ASSUME: run temporary containment scenarios.
- ABDUCE: propose minimal changes that alter outcomes.
- EXPLAIN RULE: inspect proof paths behind a conclusion.
It is schema-first (recommended) and designed for first-time Locy readers.
How To Read This Notebook¶
- Each code cell is preceded by intent, expected output shape, and reading tips.
- We use a curated slice from SECOM for quick execution in docs while preserving real data behavior.
- Commands are grouped so you can reason from facts -> inference -> counterfactual -> explanation.
1) Setup and Data Discovery¶
What this does: Initialize helper utilities, locate prepared data files, and create an isolated temporary database.
What to expect:
Printed DATA_DIR and DB_DIR paths.
from pathlib import Path
from pprint import pprint
import csv
import shutil
import tempfile
import os
import uni_db
def _read_csv(path: Path) -> list[dict[str, str]]:
with path.open('r', encoding='utf-8', newline='') as f:
return list(csv.DictReader(f))
def _esc(value: str) -> str:
    # Escape for single-quoted Cypher string literals: backslashes first,
    # then quotes, so the quote escape is not itself double-escaped.
    return str(value).replace('\\', '\\\\').replace("'", "\\'")
_default_candidates = [
Path('docs/examples/data/locy_semiconductor_yield_excursion'),
Path('website/docs/examples/data/locy_semiconductor_yield_excursion'),
Path('examples/data/locy_semiconductor_yield_excursion'),
Path('../data/locy_semiconductor_yield_excursion'),
]
if 'LOCY_DATA_DIR' in os.environ:
DATA_DIR = Path(os.environ['LOCY_DATA_DIR']).resolve()
else:
DATA_DIR = next(
(p.resolve() for p in _default_candidates if (p / 'secom_lots.csv').exists()),
_default_candidates[0].resolve(),
)
if not (DATA_DIR / 'secom_lots.csv').exists():
raise FileNotFoundError(
'Expected dataset under docs/examples/data/locy_semiconductor_yield_excursion. '
'Run from website/ (or repo root) or set LOCY_DATA_DIR to the dataset path.'
)
DB_DIR = tempfile.mkdtemp(prefix='uni_locy_semiconductor_')
db = uni_db.Database(DB_DIR)
print('DATA_DIR:', DATA_DIR)
print('DB_DIR:', DB_DIR)
DATA_DIR: /home/runner/work/uni-db/uni-db/website/docs/examples/data/locy_semiconductor_yield_excursion
DB_DIR: /tmp/uni_locy_semiconductor_hfow0m9f
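As an aside, the quoting helper defined above can be exercised on its own. This self-contained sketch re-defines it locally (as `esc`, so it runs outside the notebook) and shows why backslashes must be escaped before quotes:

```python
def esc(value: str) -> str:
    # Escape backslashes first so the quote escape added next is not doubled.
    return str(value).replace('\\', '\\\\').replace("'", "\\'")

# A value containing a quote survives embedding in a single-quoted literal,
# and a literal backslash is doubled rather than swallowed.
assert esc("O'Brien") == "O\\'Brien"
assert esc('path\\x') == 'path\\\\x'
```

Performing the replacements in the opposite order would corrupt values that contain quotes, since the backslash inserted for the quote would then be doubled.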
2) Load Real Data and Build a Focus Slice¶
What this does: Loads SECOM-derived CSVs and keeps a focused cohort (fail-heavy + pass references) for fast but meaningful execution.
What to expect: Counts for lots, selected features, tools/modules, and excursion events.
How to read it: The focused slice remains grounded in real measurements while keeping notebook runtime practical.
lots = _read_csv(DATA_DIR / 'secom_lots.csv')
features = _read_csv(DATA_DIR / 'secom_feature_catalog.csv')
excursions = _read_csv(DATA_DIR / 'secom_excursions.csv')
notebook_cases = _read_csv(DATA_DIR / 'secom_notebook_cases.csv')
# Keep only catalog features flagged selected=true.
selected_features = {r['feature_id']: r for r in features if r['selected'].lower() == 'true'}
# Focus cohort: curated fail lots plus a pass-reference sample.
focus_fail_ids = [r['lot_id'] for r in notebook_cases[:24]]
pass_reference_ids = [r['lot_id'] for r in lots if r['yield_outcome'] == 'PASS'][:72]
focus_ids = set(focus_fail_ids + pass_reference_ids)
focus_lots = [r for r in lots if r['lot_id'] in focus_ids]
focus_excursions = [
r for r in excursions
if r['lot_id'] in focus_ids and r['feature_id'] in selected_features
]
active_feature_ids = sorted({r['feature_id'] for r in focus_excursions})
feature_rows = [selected_features[fid] for fid in active_feature_ids]
tools = {}
modules = set()
for row in feature_rows:
tools[row['tool_id']] = row['module']
modules.add(row['module'])
print('focus lots:', len(focus_lots))
print('focus fail lots:', sum(1 for r in focus_lots if r['yield_outcome'] == 'FAIL'))
print('selected active features:', len(feature_rows))
print('tools:', len(tools), 'modules:', len(modules))
print('focus excursion rows:', len(focus_excursions))
focus lots: 96
focus fail lots: 24
selected active features: 16
tools: 13 modules: 5
focus excursion rows: 199
3) Define Schema (Recommended)¶
What this does: Creates explicit labels, typed properties, and edge types before ingest.
What to expect:
A single Schema created confirmation.
How to read it: Schema mode keeps demos and production behavior aligned and prevents implicit-shape drift.
(
db.schema()
.label('Lot')
.property('lot_id', 'string')
.property('yield_outcome', 'string')
.property('test_timestamp', 'string')
.property('cohort', 'string')
.done()
.label('Feature')
.property('feature_id', 'string')
.property('module', 'string')
.property('tool_id', 'string')
.property('effect_size', 'float64')
.property('selected', 'bool')
.done()
.label('Tool')
.property('tool_id', 'string')
.property('module', 'string')
.done()
.label('Module')
.property('name', 'string')
.done()
.edge_type('OBSERVED_EXCURSION', ['Lot'], ['Feature'])
.done()
.edge_type('MEASURED_ON', ['Feature'], ['Tool'])
.done()
.edge_type('PART_OF', ['Tool'], ['Module'])
.done()
.edge_type('IMPACTS_TOOL', ['Lot'], ['Tool'])
.done()
.edge_type('CONTAINED_BY', ['Lot'], ['Tool'])
.done()
.apply()
)
print('Schema created')
Schema created
4) Ingest the Manufacturing Graph¶
What this does: Inserts module/tool/feature/lot facts and excursion edges for the focused real-data slice.
What to expect: Graph counts for nodes and excursion edges.
How to read it:
Each Lot -> Feature -> Tool -> Module chain is the evidence path Locy will reason over.
for module in sorted(modules):
db.execute(f"CREATE (:Module {{name: '{_esc(module)}'}})")
for tool_id, module in sorted(tools.items()):
db.execute(
f"CREATE (:Tool {{tool_id: '{_esc(tool_id)}', module: '{_esc(module)}'}})"
)
db.execute(
f"MATCH (t:Tool {{tool_id: '{_esc(tool_id)}'}}), (m:Module {{name: '{_esc(module)}'}}) "
"CREATE (t)-[:PART_OF]->(m)"
)
for row in feature_rows:
selected_literal = 'true' if row['selected'].lower() == 'true' else 'false'
effect_size = float(row['effect_size']) if row['effect_size'] else 0.0
db.execute(
f"CREATE (:Feature {{feature_id: '{_esc(row['feature_id'])}', module: '{_esc(row['module'])}', "
f"tool_id: '{_esc(row['tool_id'])}', effect_size: {effect_size}, selected: {selected_literal}}})"
)
db.execute(
f"MATCH (f:Feature {{feature_id: '{_esc(row['feature_id'])}'}}), (t:Tool {{tool_id: '{_esc(row['tool_id'])}'}}) "
"CREATE (f)-[:MEASURED_ON]->(t)"
)
for row in focus_lots:
cohort = 'fail_focus' if row['yield_outcome'] == 'FAIL' else 'pass_reference'
db.execute(
f"CREATE (:Lot {{lot_id: '{_esc(row['lot_id'])}', yield_outcome: '{_esc(row['yield_outcome'])}', "
f"test_timestamp: '{_esc(row['test_timestamp'])}', cohort: '{cohort}'}})"
)
for row in focus_excursions:
db.execute(
f"MATCH (l:Lot {{lot_id: '{_esc(row['lot_id'])}'}}), (f:Feature {{feature_id: '{_esc(row['feature_id'])}'}}) "
"CREATE (l)-[:OBSERVED_EXCURSION]->(f)"
)
counts = db.query("""
MATCH (l:Lot)
WITH count(l) AS lots
MATCH (f:Feature)
WITH lots, count(f) AS features
MATCH ()-[e:OBSERVED_EXCURSION]->()
RETURN lots, features, count(e) AS excursion_edges
""")
print('Graph counts:')
pprint(counts[0])
outcome_counts = db.query("""
MATCH (l:Lot)
RETURN l.yield_outcome AS outcome, count(*) AS lots
ORDER BY lots DESC
""")
print('\nLot outcomes:')
pprint(outcome_counts)
Graph counts:
{'excursion_edges': 199, 'features': 16, 'lots': 96}
Lot outcomes:
[{'lots': 72, 'outcome': 'PASS'}, {'lots': 24, 'outcome': 'FAIL'}]
5) Baseline Inference + DERIVE Materialization¶
What this does:
Builds fail-lot relations, projects fail excursions to tools, and materializes :IMPACTS_TOOL edges via DERIVE.
What to expect:
- A `query` result listing `(lot_id, tool_id, module)` evidence rows.
- A `derive` result with affected mutation count.
- A `cypher` ranking of hotspot tools.
How to read Locy rules:
- `CREATE RULE ... YIELD` creates logical relations.
- `CREATE RULE ... DERIVE` defines graph mutations that `DERIVE <rule>` executes.
program_baseline = r'''
CREATE RULE fail_lot AS
MATCH (l:Lot)
WHERE l.yield_outcome = 'FAIL'
YIELD KEY l
CREATE RULE fail_tool_excursion AS
MATCH (l:Lot)-[:OBSERVED_EXCURSION]->(f:Feature)-[:MEASURED_ON]->(t:Tool)
WHERE l IS fail_lot
YIELD KEY l, KEY t
CREATE RULE impacts_tool AS
MATCH (l:Lot)-[:OBSERVED_EXCURSION]->(f:Feature)-[:MEASURED_ON]->(t:Tool)
WHERE l IS fail_lot
DERIVE (l)-[:IMPACTS_TOOL]->(t)
QUERY fail_tool_excursion WHERE l = l RETURN l.lot_id AS lot_id, t.tool_id AS tool_id, t.module AS module
DERIVE impacts_tool
MATCH (l:Lot)-[:IMPACTS_TOOL]->(t:Tool)
RETURN t.tool_id AS tool, t.module AS module, count(DISTINCT l) AS impacted_fail_lots
ORDER BY impacted_fail_lots DESC, tool
LIMIT 10
'''
baseline_out = db.locy_evaluate(
program_baseline,
{
'max_iterations': 300,
'timeout': 60.0,
'max_abduce_candidates': 40,
'max_abduce_results': 10,
},
)
stats = baseline_out['stats']
print('Iterations:', stats.total_iterations)
print('Strata:', stats.strata_evaluated)
print('Queries executed:', stats.queries_executed)
hot_tool_rows = []
for i, cmd in enumerate(baseline_out['command_results'], start=1):
print(f"\nCommand #{i}:", cmd.get('type'))
if cmd.get('type') in ('query', 'cypher'):
rows = cmd.get('rows', [])
print('rows:', len(rows))
pprint(rows[:5])
elif cmd.get('type') == 'derive':
print('affected:', cmd.get('affected'))
if cmd.get('type') == 'cypher':
hot_tool_rows = cmd.get('rows', [])
if not hot_tool_rows:
raise RuntimeError('Expected hotspot tool rows from baseline cypher ranking')
hot_tool = hot_tool_rows[0]['tool']
print('\nSelected hotspot tool for scenario analysis:', hot_tool)
Iterations: 0
Strata: 3
Queries executed: 5
Command #1: query
rows: 74
[{'lot_id': 'LOT_0003', 'module': 'etch', 'tool_id': 'etch-tool-08'},
{'lot_id': 'LOT_0003',
'module': 'lithography',
'tool_id': 'lithography-tool-04'},
{'lot_id': 'LOT_0003', 'module': 'test', 'tool_id': 'test-tool-07'},
{'lot_id': 'LOT_0011',
'module': 'lithography',
'tool_id': 'lithography-tool-04'},
{'lot_id': 'LOT_0011', 'module': 'implant', 'tool_id': 'implant-tool-03'}]
Command #2: derive
affected: 199
Command #3: cypher
rows: 10
[{'impacted_fail_lots': 26,
'module': 'lithography',
'tool': 'lithography-tool-04'},
{'impacted_fail_lots': 23, 'module': 'cmp', 'tool': 'cmp-tool-08'},
{'impacted_fail_lots': 21, 'module': 'implant', 'tool': 'implant-tool-03'},
{'impacted_fail_lots': 19, 'module': 'implant', 'tool': 'implant-tool-05'},
{'impacted_fail_lots': 17,
'module': 'lithography',
'tool': 'lithography-tool-06'}]
Selected hotspot tool for scenario analysis: lithography-tool-04
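The `count(DISTINCT l)` ranking pattern can also be cross-checked in pure Python over (lot, tool) evidence pairs; the pairs below are hypothetical, not taken from the dataset:

```python
from collections import Counter

# Deduplicate (lot, tool) pairs first, then count lots per tool --
# mirroring count(DISTINCT l) grouped by tool.
evidence = [
    ('LOT_1', 'tool-A'), ('LOT_1', 'tool-A'),  # duplicate excursion rows
    ('LOT_2', 'tool-A'), ('LOT_2', 'tool-B'),
]
ranking = Counter(tool for _, tool in set(evidence))
assert ranking.most_common(1)[0] == ('tool-A', 2)
```

Deduplicating before counting is what makes the count per-lot rather than per-excursion-row, which matters whenever a lot has several excursions measured on the same tool.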
6) EXPLAIN RULE for a Concrete Failed Lot¶
What this does: Builds a derivation tree for one failed lot so readers can see why it satisfied a target rule.
What to expect: A tree-like printout with rule name, clause index, and bindings.
How to read it: Parents are conclusions; children are supporting premises.
focus_lot = focus_fail_ids[0]
program_explain = f'''
CREATE RULE fail_lot AS
MATCH (l:Lot)
WHERE l.yield_outcome = 'FAIL'
YIELD KEY l
CREATE RULE fail_tool_excursion AS
MATCH (l:Lot)-[:OBSERVED_EXCURSION]->(f:Feature)-[:MEASURED_ON]->(t:Tool)
WHERE l IS fail_lot
YIELD KEY l, KEY t
EXPLAIN RULE fail_tool_excursion WHERE l.lot_id = '{focus_lot}' RETURN t
'''
explain_out = db.locy_evaluate(program_explain)
explain_cmd = next(cmd for cmd in explain_out['command_results'] if cmd.get('type') == 'explain')
tree = explain_cmd['tree']
def _print_tree(node, depth=0, max_depth=4):
indent = ' ' * depth
rule = node.get('rule')
clause = node.get('clause_index')
bindings = node.get('bindings', {})
print(f"{indent}- rule={rule}, clause={clause}, bindings={bindings}")
if depth >= max_depth:
return
for child in node.get('children', []):
_print_tree(child, depth + 1, max_depth=max_depth)
print('Explain tree for lot:', focus_lot)
_print_tree(tree)
Explain tree for lot: LOT_0024
- rule=fail_tool_excursion, clause=0, bindings={}
- rule=fail_tool_excursion, clause=0, bindings={'t': {'_id': '9', '_labels': ['Tool'], 'module': 'etch', 'tool_id': 'etch-tool-05'}, 'l': {'_id': '56', '_labels': ['Lot'], 'lot_id': 'LOT_0024', 'test_timestamp': '2008-07-25T15:23:00', 'yield_outcome': 'FAIL', 'cohort': 'fail_focus'}, 'f': {'_id': '24', '_labels': ['Feature'], 'effect_size': 0.3781298, 'selected': True, 'module': 'etch', 'feature_id': 'F125', 'tool_id': 'etch-tool-05'}}
- rule=fail_lot, clause=0, bindings={'l': {'_id': '56', '_labels': ['Lot'], 'yield_outcome': 'FAIL', 'lot_id': 'LOT_0024', 'test_timestamp': '2008-07-25T15:23:00', 'cohort': 'fail_focus'}}
- rule=fail_tool_excursion, clause=0, bindings={'f': {'_id': '31', '_labels': ['Feature'], 'tool_id': 'cmp-tool-08', 'selected': True, 'module': 'cmp', 'feature_id': 'F432', 'effect_size': 0.31209902}, 't': {'_id': '6', '_labels': ['Tool'], 'module': 'cmp', 'tool_id': 'cmp-tool-08'}, 'l': {'_id': '56', '_labels': ['Lot'], 'test_timestamp': '2008-07-25T15:23:00', 'cohort': 'fail_focus', 'lot_id': 'LOT_0024', 'yield_outcome': 'FAIL'}}
- rule=fail_lot, clause=0, bindings={'l': {'_id': '56', '_labels': ['Lot'], 'lot_id': 'LOT_0024', 'cohort': 'fail_focus', 'yield_outcome': 'FAIL', 'test_timestamp': '2008-07-25T15:23:00'}}
- rule=fail_tool_excursion, clause=0, bindings={'l': {'_id': '56', '_labels': ['Lot'], 'cohort': 'fail_focus', 'lot_id': 'LOT_0024', 'test_timestamp': '2008-07-25T15:23:00', 'yield_outcome': 'FAIL'}, 't': {'_id': '16', '_labels': ['Tool'], 'module': 'lithography', 'tool_id': 'lithography-tool-06'}, 'f': {'_id': '18', '_labels': ['Feature'], 'selected': True, 'feature_id': 'F022', 'effect_size': 0.34723614, 'module': 'lithography', 'tool_id': 'lithography-tool-06'}}
- rule=fail_lot, clause=0, bindings={'l': {'_id': '56', '_labels': ['Lot'], 'test_timestamp': '2008-07-25T15:23:00', 'lot_id': 'LOT_0024', 'cohort': 'fail_focus', 'yield_outcome': 'FAIL'}}
- rule=fail_tool_excursion, clause=0, bindings={'f': {'_id': '19', '_labels': ['Feature'], 'effect_size': 0.44549732, 'selected': True, 'tool_id': 'lithography-tool-05', 'module': 'lithography', 'feature_id': 'F029'}, 't': {'_id': '15', '_labels': ['Tool'], 'module': 'lithography', 'tool_id': 'lithography-tool-05'}, 'l': {'_id': '56', '_labels': ['Lot'], 'yield_outcome': 'FAIL', 'lot_id': 'LOT_0024', 'cohort': 'fail_focus', 'test_timestamp': '2008-07-25T15:23:00'}}
- rule=fail_lot, clause=0, bindings={'l': {'_id': '56', '_labels': ['Lot'], 'lot_id': 'LOT_0024', 'cohort': 'fail_focus', 'yield_outcome': 'FAIL', 'test_timestamp': '2008-07-25T15:23:00'}}
- rule=fail_tool_excursion, clause=0, bindings={'f': {'_id': '28', '_labels': ['Feature'], 'tool_id': 'implant-tool-03', 'selected': True, 'feature_id': 'F299', 'effect_size': 0.304041, 'module': 'implant'}, 'l': {'_id': '56', '_labels': ['Lot'], 'cohort': 'fail_focus', 'test_timestamp': '2008-07-25T15:23:00', 'yield_outcome': 'FAIL', 'lot_id': 'LOT_0024'}, 't': {'_id': '12', '_labels': ['Tool'], 'tool_id': 'implant-tool-03', 'module': 'implant'}}
- rule=fail_lot, clause=0, bindings={'l': {'_id': '56', '_labels': ['Lot'], 'lot_id': 'LOT_0024', 'cohort': 'fail_focus', 'test_timestamp': '2008-07-25T15:23:00', 'yield_outcome': 'FAIL'}}
7) Counterfactual Containment with ASSUME¶
What this does: Applies a hypothetical hold on the hotspot tool, then compares contained vs residual failed lots.
What to expect:
One assume result block listing failed lots that become contained in the hypothetical state.
How to read it:
Residual count is computed as total_fail_lots - contained_fail_lots.
program_assume = f'''
ASSUME {{
MATCH (l:Lot)-[:IMPACTS_TOOL]->(t:Tool {{tool_id: '{hot_tool}'}})
CREATE (l)-[:CONTAINED_BY]->(t)
}} THEN {{
MATCH (l:Lot {{yield_outcome: 'FAIL'}})-[:CONTAINED_BY]->(t:Tool)
RETURN l.lot_id AS lot_id, t.tool_id AS tool_id
}}
'''
assume_out = db.locy_evaluate(program_assume)
assume_cmd = next(cmd for cmd in assume_out['command_results'] if cmd.get('type') == 'assume')
contained_rows = assume_cmd.get('rows', [])
contained_lot_ids = sorted({row['lot_id'] for row in contained_rows})
total_fail_lots = sum(1 for row in focus_lots if row['yield_outcome'] == 'FAIL')
contained_fail_lots = len(contained_lot_ids)
residual_fail_lots = total_fail_lots - contained_fail_lots
abduce_target_lot = contained_lot_ids[0] if contained_lot_ids else focus_lot
print('Total fail lots in cohort:', total_fail_lots)
print('Contained fail lots under assumption:', contained_fail_lots)
print('Residual fail lots under assumption:', residual_fail_lots)
print('ABDUCE target lot:', abduce_target_lot)
print('\nContained sample:')
pprint(contained_rows[:10])
rollback_check = db.query("MATCH (:Lot)-[r:CONTAINED_BY]->(:Tool) RETURN count(r) AS c")
print('\nRollback check (should be 0):', rollback_check[0]['c'])
Total fail lots in cohort: 24
Contained fail lots under assumption: 6
Residual fail lots under assumption: 18
ABDUCE target lot: LOT_0003
Contained sample:
[{'lot_id': 'LOT_0003', 'tool_id': 'lithography-tool-04'},
{'lot_id': 'LOT_0011', 'tool_id': 'lithography-tool-04'},
{'lot_id': 'LOT_0039', 'tool_id': 'lithography-tool-04'},
{'lot_id': 'LOT_0190', 'tool_id': 'lithography-tool-04'},
{'lot_id': 'LOT_0369', 'tool_id': 'lithography-tool-04'},
{'lot_id': 'LOT_0635', 'tool_id': 'lithography-tool-04'}]
Rollback check (should be 0): 0
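The contained/residual bookkeeping above is plain set arithmetic; a minimal sketch with hypothetical lot IDs:

```python
# Total fail cohort vs. lots held by the hypothetical containment.
fail_lots = {'LOT_A', 'LOT_B', 'LOT_C', 'LOT_D'}
contained = {'LOT_A', 'LOT_C'}   # lots reached by the ASSUME scenario
residual = fail_lots - contained  # still unexplained under the assumption
assert residual == {'LOT_B', 'LOT_D'}
assert len(residual) == len(fail_lots) - len(contained)
```

The subtraction identity holds because the contained lots are drawn from the fail cohort; a containment rule that also matched pass lots would need an intersection first.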
8) Minimal Change Search with ABDUCE¶
What this does: Asks what minimal change would make a contained failed lot no longer satisfy the hotspot quarantine rule.
What to expect:
An abduce result with candidate modifications (remove_edge, change_property, etc.).
How to read it:
validated=true candidates satisfy the abductive goal in hypothetical validation.
program_abduce = f'''
CREATE RULE needs_quarantine AS
MATCH (l:Lot)-[:OBSERVED_EXCURSION]->(:Feature)-[:MEASURED_ON]->(t:Tool)
WHERE l.yield_outcome = 'FAIL', t.tool_id = '{hot_tool}'
YIELD KEY l
ABDUCE NOT needs_quarantine WHERE l.lot_id = '{abduce_target_lot}' RETURN l
'''
abduce_out = db.locy_evaluate(
program_abduce,
{'max_abduce_candidates': 120, 'max_abduce_results': 12, 'timeout': 60.0},
)
abduce_cmd = next(cmd for cmd in abduce_out['command_results'] if cmd.get('type') == 'abduce')
mods = abduce_cmd.get('modifications', [])
print('ABDUCE target lot:', abduce_target_lot)
print('Abduced modifications:', len(mods))
for i, item in enumerate(mods[:8], start=1):
print(f"\nCandidate #{i}")
pprint(item)
ABDUCE target lot: LOT_0003
Abduced modifications: 2
Candidate #1
{'cost': 1.0,
'modification': {'edge_type': 'OBSERVED_EXCURSION',
'edge_var': '',
'match_properties': {},
'source_var': 'l',
'target_var': '',
'type': 'remove_edge'},
'validated': True}
Candidate #2
{'cost': 1.0,
'modification': {'edge_type': 'MEASURED_ON',
'edge_var': '',
'match_properties': {},
'source_var': 'l',
'target_var': 't',
'type': 'remove_edge'},
'validated': True}
9) What To Expect¶
- Baseline ranking should surface one or more dominant tools for failed-lot excursions.
- `ASSUME` should contain at least one failed lot for the selected hotspot tool.
- Residual failed lots should be lower than total failed lots.
- `ABDUCE NOT` should return at least one validated modification candidate.
- `EXPLAIN RULE` should show a non-empty derivation tree for the focus failed lot.
10) Build-Time Assertions¶
These assertions make the notebook self-validating in CI/docs builds.
assert hot_tool_rows, 'Expected non-empty hotspot ranking output'
assert total_fail_lots > 0, 'Expected at least one failed lot'
assert contained_fail_lots > 0, 'Expected ASSUME to contain at least one failed lot'
assert residual_fail_lots < total_fail_lots, 'Expected residual fail lots to decrease under assumption'
assert mods, 'Expected ABDUCE to produce modification candidates'
assert any(item.get('validated') for item in mods), 'Expected at least one validated ABDUCE candidate'
assert tree.get('children'), 'Expected EXPLAIN RULE tree to include child derivations'
print('Notebook assertions passed.')
Notebook assertions passed.
11) Cleanup¶
Deletes the temporary on-disk database used for this notebook run.
shutil.rmtree(DB_DIR, ignore_errors=True)
print('Cleaned up', DB_DIR)
Cleaned up /tmp/uni_locy_semiconductor_hfow0m9f