Advanced Features
This guide covers RexF's advanced capabilities for power users and complex research workflows.
Intelligent Parameter Exploration
RexF automates parameter space exploration with three strategies: random, grid, and adaptive.
Random Exploration
Best for initial exploration of large parameter spaces:
import time
from rexf import experiment, run

@experiment
def hyperparameter_search(learning_rate, batch_size, dropout_rate=0.1):
    # Your model training code (create_model and train_model are placeholders)
    start = time.perf_counter()
    model = create_model(dropout_rate)
    accuracy = train_model(model, learning_rate, batch_size)
    training_time = time.perf_counter() - start
    return {"accuracy": accuracy, "training_time": training_time}

# Random exploration
run_ids = run.auto_explore(
    hyperparameter_search,
    strategy="random",
    budget=50,  # Number of experiments
    optimization_target="accuracy",
    # Parameter ranges (optional - RexF can infer reasonable ranges)
    parameter_ranges={
        "learning_rate": (0.0001, 0.1),
        "batch_size": [16, 32, 64, 128],
        "dropout_rate": (0.0, 0.5),
    },
)
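To make the two range forms concrete, here is a minimal sketch of how random sampling over a `parameter_ranges` dict could work, assuming tuples mean continuous (low, high) ranges and lists mean discrete choices. `sample_config` and its `log_scale` option are hypothetical illustrations, not RexF internals.

```python
import math
import random

def sample_config(ranges, rng, log_scale=()):
    """Draw one configuration from a parameter_ranges-style dict.

    Hypothetical: tuples are treated as continuous (low, high) ranges,
    lists as discrete choices.
    """
    config = {}
    for name, spec in ranges.items():
        if isinstance(spec, list):
            config[name] = rng.choice(spec)  # discrete: pick one listed value
        else:
            low, high = spec
            if name in log_scale:
                # Log-uniform: every decade between low and high is equally likely
                config[name] = math.exp(rng.uniform(math.log(low), math.log(high)))
            else:
                config[name] = rng.uniform(low, high)  # uniform over [low, high]
    return config

rng = random.Random(42)
ranges = {
    "learning_rate": (0.0001, 0.1),
    "batch_size": [16, 32, 64, 128],
    "dropout_rate": (0.0, 0.5),
}
configs = [sample_config(ranges, rng, log_scale={"learning_rate"}) for _ in range(50)]
print(configs[0])
```

Sampling the learning rate on a log scale is a common choice: a plain uniform draw over (0.0001, 0.1) would put about 90% of samples above 0.01. Whether RexF samples this way is not stated here.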
Grid Search
Systematic exploration of discrete parameter combinations:
# Grid search
run_ids = run.auto_explore(
    hyperparameter_search,
    strategy="grid",
    budget=20,
    optimization_target="accuracy",
    parameter_ranges={
        "learning_rate": [0.001, 0.01, 0.1],
        "batch_size": [32, 64, 128],
        "dropout_rate": [0.1, 0.2, 0.3],
    },
)
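A grid search enumerates every combination of the listed values. This plain-Python sketch with `itertools.product` counts them for the grid above: 3 × 3 × 3 = 27 combinations. Assuming `budget` caps the number of runs, a budget below 27 cannot cover the whole grid.

```python
from itertools import product

grid = {
    "learning_rate": [0.001, 0.01, 0.1],
    "batch_size": [32, 64, 128],
    "dropout_rate": [0.1, 0.2, 0.3],
}

# Cartesian product of the value lists, one dict per combination
names = list(grid)
combinations = [dict(zip(names, values)) for values in product(*grid.values())]

print(len(combinations))  # 27
print(combinations[0])    # {'learning_rate': 0.001, 'batch_size': 32, 'dropout_rate': 0.1}
```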
Adaptive Exploration
Learns from previous results to focus on promising regions:
# Adaptive exploration (Bayesian-style optimization)
run_ids = run.auto_explore(
    hyperparameter_search,
    strategy="adaptive",
    budget=30,
    optimization_target="accuracy",
    # The adaptive strategy learns as it goes and doesn't need predefined ranges
)
The adaptive strategy:
- Starts with random exploration
- Builds a model of the parameter-performance relationship
- Balances exploration of unknown regions with exploitation of good regions
- Recommends parameters likely to improve results
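The explore/exploit balance described above can be illustrated with a small loop. This is a deliberately simplified, hypothetical stand-in: it skips the surrogate-model step of real Bayesian optimization and uses an epsilon-greedy rule instead, so read it as an illustration of the trade-off, not as RexF's implementation. `adaptive_search` and `toy_objective` are invented for this example.

```python
import random

def adaptive_search(objective, ranges, budget=30, n_initial=5, explore_prob=0.3, seed=0):
    """Toy explore/exploit loop over continuous (low, high) ranges.

    Returns (best_params, best_score, history). Hypothetical sketch, not RexF's algorithm.
    """
    rng = random.Random(seed)

    def sample_random():
        return {k: rng.uniform(lo, hi) for k, (lo, hi) in ranges.items()}

    def perturb(params):
        # Exploit: jitter each value by ~10% of its range, clipped to the bounds
        return {
            k: min(hi, max(lo, params[k] + rng.gauss(0, 0.1 * (hi - lo))))
            for k, (lo, hi) in ranges.items()
        }

    history = []  # list of (params, score) pairs
    for i in range(budget):
        if not history or i < n_initial or rng.random() < explore_prob:
            params = sample_random()  # explore: random point anywhere in the space
        else:
            best_params, _ = max(history, key=lambda h: h[1])
            params = perturb(best_params)  # exploit: search near the best point so far
        history.append((params, objective(**params)))

    best_params, best_score = max(history, key=lambda h: h[1])
    return best_params, best_score, history


def toy_objective(learning_rate, dropout_rate):
    # Hypothetical score that peaks at learning_rate=0.01, dropout_rate=0.2
    return -1000 * (learning_rate - 0.01) ** 2 - (dropout_rate - 0.2) ** 2


best, score, history = adaptive_search(
    toy_objective,
    {"learning_rate": (0.0001, 0.1), "dropout_rate": (0.0, 0.5)},
    budget=30,
)
print(best, score)
```

Raising `explore_prob` spends more of the budget on unexplored regions; lowering it refines the current best faster but risks settling on a local optimum.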
Advanced Querying
Complex Query Expressions
Build sophisticated queries to find specific experiments:
# Complex accuracy and timing queries
efficient_models = run.find(
    "accuracy > 0.9 and training_time < 300 and param_batch_size >= 32"
)

# Range queries
moderate_lr = run.find("param_learning_rate between 0.01 and 0.1")

# Status and timing combinations
recent_successes = run.find(
    "status == 'completed' and start_time > '2024-01-01'"
)
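The examples suggest that names prefixed with `param_` refer to experiment parameters, while bare names such as `accuracy` and `training_time` refer to returned metrics. Under that assumption, the first two queries correspond roughly to these plain-Python filters over hypothetical in-memory records; treating `between` as inclusive is also an assumption here.

```python
from dataclasses import dataclass, field

@dataclass
class Record:  # hypothetical stand-in for a stored RexF run
    run_id: str
    parameters: dict = field(default_factory=dict)
    metrics: dict = field(default_factory=dict)

def efficient(rec):
    # "accuracy > 0.9 and training_time < 300 and param_batch_size >= 32"
    m, p = rec.metrics, rec.parameters
    return (m.get("accuracy", float("-inf")) > 0.9
            and m.get("training_time", float("inf")) < 300
            and p.get("batch_size", float("-inf")) >= 32)

def lr_between(rec, lo=0.01, hi=0.1):
    # "param_learning_rate between 0.01 and 0.1", assuming inclusive bounds
    lr = rec.parameters.get("learning_rate")
    return lr is not None and lo <= lr <= hi

records = [
    Record("a1", {"learning_rate": 0.01, "batch_size": 64}, {"accuracy": 0.93, "training_time": 120}),
    Record("b2", {"learning_rate": 0.2, "batch_size": 16}, {"accuracy": 0.95, "training_time": 90}),
    Record("c3", {"learning_rate": 0.05, "batch_size": 32}, {"accuracy": 0.88, "training_time": 400}),
]
print([r.run_id for r in records if efficient(r)])   # ['a1']
print([r.run_id for r in records if lr_between(r)])  # ['a1', 'c3']
```

Runs missing a referenced field are excluded rather than raising an error, which is usually what you want when different experiments log different metrics.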
Query Suggestions
Get intelligent query suggestions based on your data:
# Get suggested queries
suggestions = run.query_help()
print("Suggested queries for your experiments:")
for suggestion in suggestions:
    print(f"- {suggestion}")

# Example output:
# - accuracy > 0.9
# - param_learning_rate < 0.01
# - training_time between 100 and 500
Custom Query Functions
For complex analysis, work with the storage backend and query engine directly:
from rexf.intelligence.queries import SmartQueryEngine
from rexf.backends.intelligent_storage import IntelligentStorage

# Direct access to the storage backend and query engine
storage = IntelligentStorage("experiments.db")
query_engine = SmartQueryEngine(storage)

# Advanced filtering through the storage backend
results = storage.query_experiments(
    parameter_filters={"learning_rate": {"lt": 0.01}},  # learning_rate < 0.01
    metric_filters={"accuracy": {"gte": 0.9}},          # accuracy >= 0.9
    order_by="start_time",
    limit=10,
)
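The filters above are nested dictionaries that map a field name to operator/threshold pairs. A minimal sketch of evaluating such filters against in-memory records follows; the extra operators (`lte`, `gt`, `eq`) and ascending order for `order_by` are assumptions, since only `lt` and `gte` appear in the RexF example. `matches` and `query` are hypothetical helpers.

```python
import operator

# Hypothetical operator table; only "lt" and "gte" appear in the RexF example
OPS = {"lt": operator.lt, "lte": operator.le, "gt": operator.gt,
       "gte": operator.ge, "eq": operator.eq}

def matches(values, filters):
    """True if every {name: {op: threshold}} condition holds for `values`."""
    for name, conditions in filters.items():
        if name not in values:
            return False
        for op_name, threshold in conditions.items():
            if not OPS[op_name](values[name], threshold):
                return False
    return True

def query(runs, parameter_filters=None, metric_filters=None, order_by=None, limit=None):
    hits = [r for r in runs
            if matches(r["parameters"], parameter_filters or {})
            and matches(r["metrics"], metric_filters or {})]
    if order_by:
        hits.sort(key=lambda r: r[order_by])  # ascending
    return hits[:limit] if limit is not None else hits

runs = [
    {"run_id": "a", "start_time": "2024-03-02", "parameters": {"learning_rate": 0.005}, "metrics": {"accuracy": 0.92}},
    {"run_id": "b", "start_time": "2024-03-01", "parameters": {"learning_rate": 0.001}, "metrics": {"accuracy": 0.95}},
    {"run_id": "c", "start_time": "2024-03-03", "parameters": {"learning_rate": 0.05}, "metrics": {"accuracy": 0.97}},
]
result = query(runs, {"learning_rate": {"lt": 0.01}}, {"accuracy": {"gte": 0.9}},
               order_by="start_time", limit=10)
print([r["run_id"] for r in result])  # ['b', 'a']
```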
Intelligent Insights
Deep Pattern Analysis
Get comprehensive insights about your experiment patterns:
# Generate detailed insights
insights = run.insights(experiment_name="hyperparameter_search")

# Parameter impact analysis
param_insights = insights["parameter_insights"]
for param_name, analysis in param_insights.items():
    print(f"\n{param_name}:")
    print(f"  Impact score: {analysis['impact_score']:.3f}")
    print(f"  Optimal range: {analysis['optimal_range']}")
    print(f"  Correlation with accuracy: {analysis['correlation']:.3f}")

# Performance patterns
perf_insights = insights["performance_insights"]
print("\nPerformance Insights:")
print(f"  Best configuration: {perf_insights['best_configuration']}")
print(f"  Efficiency sweet spot: {perf_insights['efficiency_sweet_spot']}")

# Correlation insights
correlations = insights["correlation_insights"]
for metric_pair, correlation in correlations.items():
    if abs(correlation) > 0.5:
        print(f"Strong correlation: {metric_pair} = {correlation:.3f}")
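RexF computes the correlation figures above itself, and its exact method is not described here. As an illustration only, this sketch computes a Pearson correlation between each parameter and a metric by hand and applies the same `abs(...) > 0.5` threshold; the run records are hypothetical.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient; None if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    return sxy / (sxx * syy) ** 0.5


def param_metric_correlations(runs, metric="accuracy"):
    names = sorted({name for r in runs for name in r["parameters"]})
    result = {}
    for name in names:
        pairs = [(r["parameters"][name], r["metrics"][metric])
                 for r in runs if name in r["parameters"] and metric in r["metrics"]]
        if len(pairs) >= 2:
            xs, ys = zip(*pairs)
            result[name] = pearson(xs, ys)
    return result


runs = [  # hypothetical run records
    {"parameters": {"learning_rate": 0.001, "batch_size": 32}, "metrics": {"accuracy": 0.80}},
    {"parameters": {"learning_rate": 0.01, "batch_size": 64}, "metrics": {"accuracy": 0.88}},
    {"parameters": {"learning_rate": 0.1, "batch_size": 32}, "metrics": {"accuracy": 0.93}},
    {"parameters": {"learning_rate": 0.05, "batch_size": 64}, "metrics": {"accuracy": 0.91}},
]

for name, r in param_metric_correlations(runs).items():
    if r is not None and abs(r) > 0.5:
        print(f"Strong correlation: {name} vs accuracy = {r:.3f}")
```

Pearson correlation only measures linear association. Parameters such as learning rate often act on a log scale, so correlating the logarithm of the parameter can be more informative.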
Anomaly Detection
Identify unusual experiments or outliers:
insights = run.insights()
anomalies = insights["anomaly_insights"]

print("Detected anomalies:")
for anomaly in anomalies["outliers"]:
    print(f"  Run {anomaly['run_id'][:8]}: {anomaly['reason']}")
    print(f"    {anomaly['details']}")

# Performance anomalies
perf_anomalies = anomalies["performance_anomalies"]
for anomaly in perf_anomalies:
    print(f"  Unusually {anomaly['type']}: {anomaly['description']}")
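RexF's outlier criteria are not spelled out here. One common, robust rule you could use to sanity-check flagged runs is Tukey's fences: a value is an outlier if it lies more than 1.5 interquartile ranges below the first quartile or above the third. A minimal standard-library sketch on hypothetical accuracy values (`iqr_outliers` is an invented helper):

```python
from statistics import quantiles

def iqr_outliers(values_by_run, k=1.5):
    """Flag runs whose value lies outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(list(values_by_run.values()), n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return {run_id: v for run_id, v in values_by_run.items() if v < low or v > high}

accuracy_by_run = {  # hypothetical values
    "run_a": 0.90, "run_b": 0.91, "run_c": 0.89,
    "run_d": 0.92, "run_e": 0.90, "run_f": 0.55,
}
print(iqr_outliers(accuracy_by_run))  # {'run_f': 0.55}
```

Unlike a mean-and-standard-deviation rule, quartiles are barely affected by the outlier itself, which matters when only a handful of runs exist.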
Smart Recommendations
Get actionable recommendations for improving your experiments:
insights = run.insights()
recommendations = insights["recommendations"]

print("Recommendations:")
for rec in recommendations:
    print(f"  🎯 {rec['title']}")
    print(f"     {rec['description']}")
    print(f"     Priority: {rec['priority']}")
    if "action" in rec:
        print(f"     Action: {rec['action']}")
Advanced Experiment Management
Experiment Lineage and Relationships
Track relationships between experiments:
@experiment
def data_preprocessing(dataset_size=1000, normalization="standard"):
    # Data preprocessing (preprocess and evaluate_quality are placeholders)
    processed_data = preprocess(dataset_size, normalization)
    return {"data_quality": evaluate_quality(processed_data)}

@experiment
def model_training(data_run_id, model_type="cnn"):
    # Use data from previous experiment
    data_experiment = run.get_by_id(data_run_id)
    data_quality = data_experiment.metrics["data_quality"]

    # Train model
    accuracy = train_model(model_type, data_quality)
    return {
        "accuracy": accuracy,
        "parent_experiment": data_run_id,  # Track lineage
    }

# Run preprocessing
data_run_id = run.single(data_preprocessing, dataset_size=5000)

# Run training with reference to preprocessing
model_run_id = run.single(
    model_training,
    data_run_id=data_run_id,
    model_type="transformer",
)
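Because `model_training` returns `parent_experiment`, each run points back to the run it was derived from, forming a chain of runs. A minimal sketch of walking that chain, using a hypothetical dictionary that maps run IDs to each run's returned metrics (`lineage` is an invented helper, not a RexF API):

```python
def lineage(run_id, runs):
    """Follow 'parent_experiment' links from run_id back to the root run."""
    chain = []
    seen = set()
    while run_id is not None:
        if run_id in seen:
            raise ValueError(f"cycle detected at {run_id}")
        seen.add(run_id)
        chain.append(run_id)
        run_id = runs[run_id].get("parent_experiment")
    return chain

runs = {  # hypothetical run_id -> returned metrics
    "data01": {"data_quality": 0.97},
    "model01": {"accuracy": 0.91, "parent_experiment": "data01"},
    "tune01": {"accuracy": 0.93, "parent_experiment": "model01"},
}
print(lineage("tune01", runs))  # ['tune01', 'model01', 'data01']
```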
Batch Processing and Parallel Execution
For large-scale experiments, run independent parameter combinations in parallel:
import concurrent.futures
from functools import partial

def run_experiment_batch(param_combinations, experiment_func, max_workers=4):
    """Run experiments in parallel threads and return the run IDs that succeeded."""
    run_ids = []
    # Create a partial function with the experiment fixed
    run_func = partial(run.single, experiment_func)
    # Threads help mainly when experiments release the GIL (I/O, NumPy, GPU work).
    # Check that your storage backend tolerates concurrent writes before raising max_workers.
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Map each future back to its parameters so failures can be reported
        futures = {
            executor.submit(run_func, **params): params
            for params in param_combinations
        }
        # Collect results in completion order (not submission order)
        for future in concurrent.futures.as_completed(futures):
            try:
                run_ids.append(future.result())
            except Exception as e:
                print(f"Experiment with {futures[future]} failed: {e}")
    return run_ids

# Define parameter grid
param_grid = [
    {"learning_rate": lr, "batch_size": bs}
    for lr in [0.001, 0.01, 0.1]
    for bs in [32, 64, 128]
]

# Run batch
run_ids = run_experiment_batch(param_grid, hyperparameter_search)
print(f"Completed {len(run_ids)} of {len(param_grid)} experiments")
Advanced Analysis and Visualization
Custom Metrics and Analysis
Define custom analysis functions:
def analyze_learning_curves(experiment_runs):
    """Custom analysis of learning progression."""
    analysis = {}
    for exp in experiment_runs:
        # Extract learning curve data from metrics
        if "learning_curve" in exp.metrics:
            curve = exp.metrics["learning_curve"]
            # find_convergence and detect_overfitting are your own helper functions
            analysis[exp.run_id] = {
                "convergence_epoch": find_convergence(curve),
                "overfitting_detected": detect_overfitting(curve),
                "final_accuracy": curve[-1],
            }
    return analysis

# Get experiments from the last 24 hours and analyze them
recent_experiments = run.recent(hours=24)
learning_analysis = analyze_learning_curves(recent_experiments)

# Use analysis for recommendations
for run_id, analysis in learning_analysis.items():
    if analysis["overfitting_detected"]:
        print(f"Run {run_id[:8]}: Consider adding regularization")
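`find_convergence` and `detect_overfitting` are not defined in the example. Here is one hypothetical way to write them, assuming `learning_curve` is a list of per-epoch validation accuracies (consistent with `curve[-1]` being used as the final accuracy). The tolerance, patience, and drop thresholds are arbitrary illustrations.

```python
def find_convergence(curve, tol=0.01, patience=3):
    """First epoch (0-based) after which the next `patience` epochs stay within `tol`.

    Returns None if the curve never settles.
    """
    for i in range(len(curve) - patience):
        window = curve[i:i + patience + 1]
        if max(window) - min(window) < tol:
            return i
    return None

def detect_overfitting(curve, drop=0.02):
    """Flag a curve whose final value has fallen more than `drop` below its peak."""
    return max(curve) - curve[-1] > drop

# Hypothetical validation accuracies that plateau and then decline
curve = [0.60, 0.75, 0.84, 0.88, 0.885, 0.887, 0.886, 0.87, 0.85]
print(find_convergence(curve), detect_overfitting(curve))  # 3 True
```

A falling validation accuracy is only a proxy for overfitting; if you also log training accuracy, a widening train/validation gap is a more direct signal.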
Statistical Analysis
Perform statistical tests on experiment results:
import numpy as np
import scipy.stats as stats

def compare_experiment_groups(group1_query, group2_query, metric="accuracy"):
    """Compare a metric between two groups of experiments with Welch's t-test."""
    group1 = run.find(group1_query)
    group2 = run.find(group2_query)
    values1 = [exp.metrics[metric] for exp in group1 if metric in exp.metrics]
    values2 = [exp.metrics[metric] for exp in group2 if metric in exp.metrics]
    if len(values1) < 2 or len(values2) < 2:
        raise ValueError("Each group needs at least two runs with this metric")
    # equal_var=False selects Welch's t-test, which does not assume equal variances
    t_stat, p_value = stats.ttest_ind(values1, values2, equal_var=False)
    return {
        "group1_mean": np.mean(values1),
        "group2_mean": np.mean(values2),
        "t_statistic": t_stat,
        "p_value": p_value,
        "significant": p_value < 0.05,
    }

# Compare high vs low learning rates
comparison = compare_experiment_groups(
    "param_learning_rate > 0.01",
    "param_learning_rate <= 0.01",
    metric="accuracy",
)
print(f"High LR mean: {comparison['group1_mean']:.4f}")
print(f"Low LR mean: {comparison['group2_mean']:.4f}")
print(f"Statistically significant: {comparison['significant']}")
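By default `scipy.stats.ttest_ind` runs Student's t-test, which pools the two group variances; with `equal_var=False` it runs Welch's t-test instead. For group means x̄₁ and x̄₂, sample variances s₁² and s₂², and group sizes n₁ and n₂, Welch's statistic and its approximate (Welch–Satterthwaite) degrees of freedom are:

```latex
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}},
\qquad
\nu \approx \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}
{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}
```

Welch's version is the safer default when the two groups differ in spread or size, which is common when runs are split by a parameter threshold.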
Performance Optimization
Database Optimization
For large numbers of experiments, optimize database performance:
from rexf.backends.intelligent_storage import IntelligentStorage

# Create storage with optimizations
storage = IntelligentStorage(
    "experiments.db",
    optimize_for_analytics=True,  # Enables additional indexing
)

# Batch operations for efficiency
# (large_param_list and create_experiment_data are placeholders for your own code)
experiments_batch = [create_experiment_data(params) for params in large_param_list]

# Bulk insert (more efficient than individual saves)
storage.save_experiments_batch(experiments_batch)
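Much of the benefit of bulk inserts comes from transaction overhead. Assuming the `experiments.db` backend is SQLite, committing each row separately pays the commit cost every time, while a single transaction amortizes it. This sketch uses only the standard `sqlite3` module with a hypothetical table layout to show the pattern; it is not how RexF stores data.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE experiments (run_id TEXT PRIMARY KEY, parameters TEXT, metrics TEXT)"
)

# Hypothetical rows: parameters and metrics serialized as JSON text
rows = [
    (f"run{i:04d}", json.dumps({"learning_rate": 0.001 * (i + 1)}), json.dumps({"accuracy": 0.5}))
    for i in range(1000)
]

# One transaction for all rows: the connection context manager commits once on
# success and rolls back if any insert raises
with conn:
    conn.executemany("INSERT INTO experiments VALUES (?, ?, ?)", rows)

count = conn.execute("SELECT COUNT(*) FROM experiments").fetchone()[0]
print(count)  # 1000
```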
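Memory Management
For memory-intensive experiments, process data in chunks and keep only small per-chunk results:
import time

@experiment
def memory_intensive_experiment(dataset_size=1000000):
    start = time.perf_counter()
    chunk_size = 10000
    # Keep only compact per-chunk summaries, never the raw chunk data.
    # (process_data_chunk, aggregate_accuracy and get_peak_memory_usage are placeholders.)
    results = []
    for i in range(0, dataset_size, chunk_size):
        # Clip the last chunk so it never runs past dataset_size
        chunk_result = process_data_chunk(i, min(i + chunk_size, dataset_size))
        # Deleting chunk_result after this append would not free memory,
        # because results still holds a reference to it
        results.append(chunk_result)
    elapsed = time.perf_counter() - start

    # Return aggregated metrics only
    return {
        "accuracy": aggregate_accuracy(results),
        "throughput": dataset_size / elapsed,  # items per second
        "memory_peak": get_peak_memory_usage(),
    }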
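`get_peak_memory_usage` is not defined in the example. One standard-library option is `tracemalloc`, which reports the peak memory allocated through Python's allocators since tracing started; memory allocated elsewhere (for example by some native libraries or on a GPU) is not counted. A self-contained sketch with a hypothetical chunked workload:

```python
import tracemalloc

def process_in_chunks(dataset_size, chunk_size=10_000):
    """Sum of squares computed chunk by chunk, keeping only a running total."""
    total = 0
    for start in range(0, dataset_size, chunk_size):
        stop = min(start + chunk_size, dataset_size)
        chunk = list(range(start, stop))  # stand-in for loading one chunk of data
        total += sum(x * x for x in chunk)
        # `chunk` is rebound on the next iteration, so the previous list can be freed
    return total

tracemalloc.start()
result = process_in_chunks(1_000_000)
current, peak = tracemalloc.get_traced_memory()  # bytes: (current, peak)
tracemalloc.stop()
print(result, f"peak ~ {peak / 1024:.0f} KiB")
```

Because only one chunk is alive at a time, the peak stays near the size of a single chunk rather than the full dataset.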
Integration with External Tools
Export to External Analysis Tools
Export data for analysis in R, MATLAB, or other tools:
import pandas as pd

def export_for_r_analysis(csv_path="experiments.csv"):
    """Flatten experiment data into one row per run (assumes scalar values)."""
    rows = []
    for exp in run.all():
        row = {
            "run_id": exp.run_id,
            "status": exp.status,
            "duration": exp.duration,
            "start_time": exp.start_time.isoformat(),
        }
        # Prefix columns so parameters and metrics with the same name don't collide
        row.update({f"param_{k}": v for k, v in exp.parameters.items()})
        row.update({f"metric_{k}": v for k, v in exp.metrics.items()})
        rows.append(row)
    df = pd.DataFrame(rows)
    # A plain CSV can be read by R (read.csv), MATLAB (readtable) and most other tools
    df.to_csv(csv_path, index=False)
    return df

# Optional: save directly as an .RData file (requires R and rpy2)
import rpy2.robjects as robjects
from rpy2.robjects import pandas2ri
from rpy2.robjects.conversion import localconverter

df = export_for_r_analysis()
with localconverter(robjects.default_converter + pandas2ri.converter):
    r_data = robjects.conversion.py2rpy(df)
robjects.r.assign("experiment_data", r_data)
robjects.r("save(experiment_data, file='experiments.RData')")
Integration with MLflow/Weights & Biases
Use RexF alongside other experiment tracking tools:
import mlflow
from rexf import experiment, run

@experiment
def dual_tracking_experiment(learning_rate=0.01):
    # Start an MLflow run for the duration of this experiment
    with mlflow.start_run():
        # Your experiment code (train_model is a placeholder)
        accuracy = train_model(learning_rate)

        # Log to MLflow
        mlflow.log_param("learning_rate", learning_rate)
        mlflow.log_metric("accuracy", accuracy)

        # RexF records the parameters and returned metrics automatically
        return {"accuracy": accuracy}

# Both RexF and MLflow will track this experiment
run_id = run.single(dual_tracking_experiment, learning_rate=0.005)
This allows you to:
- Use RexF for quick analysis and exploration
- Use MLflow/W&B for detailed logging and team collaboration
- Compare and validate results across both platforms
Next Steps
You've mastered RexF's advanced features! Continue with:
- Web Dashboard - Interactive visualization and real-time monitoring
- 🔧 Command Line Tools - Powerful command-line analytics
- tutorials/machine_learning - Complete ML workflow tutorial
- api/intelligence - Detailed API reference for advanced features